Power-User Techniques¶
Kelora includes powerful features that solve complex log analysis problems with minimal code. These techniques often go undiscovered but can dramatically simplify workflows that would otherwise require custom scripts or multiple tools.
When to Use These Techniques¶
- You're dealing with deeply nested JSON from APIs or microservices
- You need to group similar errors that differ only in variable data
- You want deterministic sampling for consistent analysis across log rotations
- You're extracting structured data from unstructured text logs
- You need privacy-preserving analytics with consistent hashing
- You're working with JWTs, URLs, or other complex embedded formats
Pattern Normalization¶
The Problem¶
Error messages and log lines often contain variable data (IPs, emails, UUIDs, numbers) that make grouping difficult:
"Failed to connect to 192.168.1.10"
"Failed to connect to 10.0.5.23"
"Failed to connect to 172.16.88.5"
These are the same error pattern but appear as three different messages.
The Solution: normalized()¶
The normalized() function automatically detects and replaces common patterns with placeholders:
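A minimal sketch (errors.jsonl is a placeholder input file):
# Tag each event with its normalized pattern
kelora -j errors.jsonl \
    --exec 'e.pattern = e.message.normalized()' \
    -k message,pattern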
Real-World Use Case: Error Grouping¶
Group errors by pattern rather than exact message to see that many different error messages are actually the same pattern repeated with different IPs/UUIDs:
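A sketch of the grouping step, assuming the sample lines below are saved as errors.jsonl:
kelora -j errors.jsonl --metrics -q \
    --exec 'track_count(e.message.normalized())'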
{"message":"Failed to connect to 192.168.1.10","service":"api","level":"ERROR"}
{"message":"Failed to connect to 10.0.5.23","service":"web","level":"ERROR"}
{"message":"Failed to connect to 172.16.88.5","service":"worker","level":"ERROR"}
{"message":"User alice@example.com sent invalid request","service":"api","level":"WARN"}
{"message":"User bob@test.org sent invalid request","service":"web","level":"WARN"}
{"message":"Timeout on request a1b2c3d4-e5f6-7890-1234-567890abcdef","service":"api","level":"ERROR"}
{"message":"Timeout on request f1e2d3c4-b5a6-9807-5432-098765fedcba","service":"worker","level":"ERROR"}
{"message":"Failed to connect to 203.0.113.42","service":"api","level":"ERROR"}
{"message":"User charlie@example.net sent invalid request","service":"api","level":"WARN"}
{"message":"Timeout on request 11111111-2222-3333-4444-555555555555","service":"web","level":"ERROR"}
Supported Patterns¶
By default, normalized() replaces:
- IPv4 addresses → <ipv4>
- IPv6 addresses → <ipv6>
- Email addresses → <email>
- UUIDs → <uuid>
- URLs → <url>
- Numbers → <num>
Pass a list of pattern names if you only want certain replacements:
# Only normalize IPs and emails
kelora -j logs.jsonl \
--exec 'e.pattern = e.message.normalized(["ipv4", "email"])'
Deterministic Sampling with bucket()¶
The Problem¶
Taking the first N lines (--head N) or sampling randomly (random() < 0.1) gives a different slice of traffic for every file or run, making it impossible to track specific requests across multiple log files or rotations.
The Solution: Hash-Based Sampling¶
The bucket() function returns a consistent integer hash for any string, enabling deterministic sampling.
The same request_id always hashes to the same number, so you'll get consistent sampling across multiple log files, log rotations, different days, and distributed systems.
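A minimal sketch of a deterministic 5% sample, assuming the events below are saved as events.jsonl:
# bucket() % 20 == 0 keeps roughly 1 in 20 users, always the same ones
kelora -j events.jsonl \
    --filter 'e.user_id.bucket() % 20 == 0' \
    -k user_id,action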
{"user_id":"user_a1b2c","action":"login","timestamp":"2024-01-15T10:00:00Z"}
{"user_id":"user_d3e4f","action":"view_page","timestamp":"2024-01-15T10:01:00Z"}
{"user_id":"user_g5h6i","action":"purchase","timestamp":"2024-01-15T10:02:00Z"}
{"user_id":"user_j7k8l","action":"logout","timestamp":"2024-01-15T10:03:00Z"}
{"user_id":"user_m9n0o","action":"login","timestamp":"2024-01-15T10:04:00Z"}
{"user_id":"user_p1q2r","action":"view_page","timestamp":"2024-01-15T10:05:00Z"}
{"user_id":"user_s3t4u","action":"add_to_cart","timestamp":"2024-01-15T10:06:00Z"}
{"user_id":"user_v5w6x","action":"checkout","timestamp":"2024-01-15T10:07:00Z"}
{"user_id":"user_y7z8a","action":"login","timestamp":"2024-01-15T10:08:00Z"}
{"user_id":"user_b9c0d","action":"search","timestamp":"2024-01-15T10:09:00Z"}
{"user_id":"user_e1f2g","action":"view_page","timestamp":"2024-01-15T10:10:00Z"}
{"user_id":"user_h3i4j","action":"logout","timestamp":"2024-01-15T10:11:00Z"}
{"user_id":"user_k5l6m","action":"login","timestamp":"2024-01-15T10:12:00Z"}
{"user_id":"user_n7o8p","action":"purchase","timestamp":"2024-01-15T10:13:00Z"}
{"user_id":"user_q9r0s","action":"view_page","timestamp":"2024-01-15T10:14:00Z"}
{"user_id":"user_t1u2v","action":"logout","timestamp":"2024-01-15T10:15:00Z"}
{"user_id":"user_w3x4y","action":"login","timestamp":"2024-01-15T10:16:00Z"}
{"user_id":"user_z5a6b","action":"search","timestamp":"2024-01-15T10:17:00Z"}
{"user_id":"user_c7d8e","action":"add_to_cart","timestamp":"2024-01-15T10:18:00Z"}
{"user_id":"user_f9g0h","action":"purchase","timestamp":"2024-01-15T10:19:00Z"}
The command above always returns the same 5% of users - run it multiple times and you'll get identical results.
Partition logs for parallel processing:
# Process logs in 4 partitions
for i in {0..3}; do
kelora -j huge.jsonl \
--filter "e.request_id.bucket() % 4 == $i" \
> partition_$i.log &
done
wait
Debug specific sessions across microservices:
# All logs for sessions whose hash bucket is 0-2 (a 30% sample)
kelora -j service-*.jsonl \
--filter 'e.session_id.bucket() % 10 < 3'
Deep Structure Flattening¶
The Problem¶
APIs return deeply nested JSON that's hard to query or export to flat formats (CSV, SQL):
{
"api": {
"queries": [
{
"results": {
"users": [
{"id": 1, "permissions": {"read": true, "write": true}}
]
}
}
]
}
}
The Solution: flattened()¶
The flattened() function creates a flat map with bracket-notation keys:
kelora -j examples/deeply-nested.jsonl \
--exec 'e.flat = e.api.flattened()' \
--exec 'print(e.flat.to_json())' -q
{"queries[0].results.users[0].id":1,"queries[0].results.users[0].permissions.read":true,"queries[0].results.users[0].permissions.write":true}
{"queries[0].results.users[0].id":2,"queries[0].results.users[0].permissions.read":true,"queries[0].results.users[0].permissions.write":false,"queries[0].results.users[1].id":3,"queries[0].results.users[1].permissions.read":false,"queries[0].results.users[1].permissions.write":false}
{"queries[0].results.users[0].id":4,"queries[0].results.users[0].permissions.admin":true,"queries[0].results.users[0].permissions.read":true,"queries[0].results.users[0].permissions.write":true}
{"api":{"queries":[{"results":{"users":[{"id":1,"permissions":{"read":true,"write":true}}]}}]}}
{"api":{"queries":[{"results":{"users":[{"id":2,"permissions":{"read":true,"write":false}},{"id":3,"permissions":{"read":false,"write":false}}]}}]}}
{"api":{"queries":[{"results":{"users":[{"id":4,"permissions":{"read":true,"write":true,"admin":true}}]}}]}}
Advanced: Multi-Level Fan-Out¶
For extremely nested data, combine flattened() with emit_each() to chain multiple levels of nesting into flat records:
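A sketch of the fan-out against the records below (emit_each() is assumed here to emit one event per array element, as in the combined example at the end of this page):
kelora -j api-requests.jsonl \
    --exec 'emit_each(e.api.queries)' \
    --exec 'e.flat = e.results.flattened()' \
    --exec 'print(e.flat.to_json())' -q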
{"request_id":"req_001","timestamp":"2024-01-15T10:00:00Z","api":{"endpoint":"/graphql","queries":[{"operation":"getUsers","filters":{"status":"active","role":{"in":["admin","moderator"]}},"results":{"users":[{"id":1,"name":"alice","permissions":{"read":true,"write":true,"delete":false},"last_login":"2024-01-14T15:30:00Z"},{"id":2,"name":"bob","permissions":{"read":true,"write":false,"delete":false},"last_login":"2024-01-13T09:15:00Z"}],"total":2,"page":1}},{"operation":"getPosts","filters":{"published":true,"tags":["tech","security"]},"results":{"posts":[{"id":101,"title":"Security Best Practices","author_id":1,"tags":["security","authentication"],"metrics":{"views":1523,"likes":89,"comments":[{"user_id":3,"text":"Great post!","sentiment":"positive"},{"user_id":4,"text":"Needs more examples","sentiment":"neutral"}]}},{"id":102,"title":"Tech Trends 2024","author_id":2,"tags":["tech","future"],"metrics":{"views":2341,"likes":156,"comments":[{"user_id":5,"text":"Very insightful","sentiment":"positive"}]}}],"total":2}}]},"response":{"status":200,"duration_ms":245,"cached":false}}
{"request_id":"req_002","timestamp":"2024-01-15T10:00:05Z","api":{"endpoint":"/rest/v2/orders","queries":[{"operation":"listOrders","filters":{"customer":{"region":"us-west","tier":"premium"},"date_range":{"start":"2024-01-01","end":"2024-01-15"}},"results":{"orders":[{"order_id":"ord_501","customer":{"id":1001,"name":"Acme Corp","contacts":[{"type":"primary","email":"orders@acme.com"},{"type":"billing","email":"billing@acme.com"}]},"items":[{"sku":"PROD-A","quantity":50,"unit_price":99.99,"discounts":[{"type":"volume","percent":10},{"type":"loyalty","percent":5}],"final_price":4274.79},{"sku":"PROD-B","quantity":25,"unit_price":149.99,"discounts":[{"type":"volume","percent":10}],"final_price":3374.78}],"totals":{"subtotal":7649.57,"tax":612.36,"shipping":25.00,"grand_total":8286.93},"fulfillment":{"warehouse":"WH-001","status":"shipped","tracking":"TRK12345","estimated_delivery":"2024-01-18"}},{"order_id":"ord_502","customer":{"id":1002,"name":"TechStart Inc","contacts":[{"type":"primary","email":"team@techstart.io"}]},"items":[{"sku":"PROD-C","quantity":100,"unit_price":49.99,"discounts":[],"final_price":4999.00}],"totals":{"subtotal":4999.00,"tax":399.92,"shipping":0.00,"grand_total":5398.92},"fulfillment":{"warehouse":"WH-002","status":"processing","tracking":null,"estimated_delivery":"2024-01-20"}}],"summary":{"total_orders":2,"total_revenue":13685.85,"avg_order_value":6842.93}}}]},"response":{"status":200,"duration_ms":567,"cached":true}}
{"request_id":"req_003","timestamp":"2024-01-15T10:00:10Z","api":{"endpoint":"/analytics/dashboard","queries":[{"operation":"getMetrics","time_range":{"start":"2024-01-15T09:00:00Z","end":"2024-01-15T10:00:00Z","granularity":"5m"},"results":{"timeseries":[{"timestamp":"2024-01-15T09:00:00Z","metrics":{"requests":1523,"errors":12,"latency":{"p50":45,"p95":234,"p99":567},"status_codes":{"2xx":1489,"4xx":22,"5xx":12}}},{"timestamp":"2024-01-15T09:05:00Z","metrics":{"requests":1687,"errors":8,"latency":{"p50":42,"p95":198,"p99":445},"status_codes":{"2xx":1665,"4xx":14,"5xx":8}}},{"timestamp":"2024-01-15T09:10:00Z","metrics":{"requests":1834,"errors":15,"latency":{"p50":48,"p95":267,"p99":623},"status_codes":{"2xx":1801,"4xx":18,"5xx":15}}}],"aggregates":{"total_requests":5044,"total_errors":35,"error_rate":0.69,"avg_latency":45,"peak_requests_per_min":367},"top_endpoints":[{"path":"/api/users","count":1234,"avg_latency":34},{"path":"/api/posts","count":987,"avg_latency":56},{"path":"/api/comments","count":654,"avg_latency":23}]}}]},"response":{"status":200,"duration_ms":1234,"cached":false}}
JWT Parsing Without Verification¶
The Problem¶
You need to inspect JWT claims for debugging but don't want to set up signature verification.
The Solution: parse_jwt()¶
Extract header and claims without cryptographic validation:
kelora -j examples/auth-logs.jsonl \
--filter 'e.has("token")' \
--exec 'let jwt = e.token.parse_jwt();
e.user = jwt.claims.sub;
e.role = jwt.claims.role;
e.expires = jwt.claims.exp;
e.token = ()' \
-k timestamp,user,role,expires
timestamp='2024-01-15T10:00:00Z' user='user123' role='admin' expires=1732153600
timestamp='2024-01-15T10:05:00Z' user='user456' role='user' expires=1732157200
timestamp='2024-01-15T10:10:00Z' user='user789' role='guest' expires=1700000000
timestamp='2024-01-15T10:15:00Z' user='user111' role='moderator' expires=1732160800
{"timestamp":"2024-01-15T10:00:00Z","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyMTIzIiwibmFtZSI6IkFsaWNlIiwicm9sZSI6ImFkbWluIiwiZXhwIjoxNzMyMTUzNjAwfQ.sig1","status":200}
{"timestamp":"2024-01-15T10:05:00Z","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyNDU2IiwibmFtZSI6IkJvYiIsInJvbGUiOiJ1c2VyIiwiZXhwIjoxNzMyMTU3MjAwfQ.sig2","status":200}
{"timestamp":"2024-01-15T10:10:00Z","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyNzg5IiwibmFtZSI6IkNoYXJsaWUiLCJyb2xlIjoiZ3Vlc3QiLCJleHAiOjE3MDAwMDAwMDB9.sig3","status":401}
{"timestamp":"2024-01-15T10:15:00Z","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyMTExIiwibmFtZSI6IkRpYW5hIiwicm9sZSI6Im1vZGVyYXRvciIsImV4cCI6MTczMjE2MDgwMH0.sig4","status":200}
Security Warning: This does NOT validate signatures. Use only for debugging or parsing tokens you already trust.
Use Case: Track Token Expiration Issues¶
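A sketch, assuming the log below is saved as examples/expired-tokens.jsonl:
kelora -j examples/expired-tokens.jsonl \
    --filter 'e.has("token")' \
    --exec 'e.exp = e.token.parse_jwt().claims.exp' \
    -k timestamp,user,exp,error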
{"timestamp":"2024-07-17T12:00:00Z","level":"INFO","endpoint":"/health","status":200}
{"timestamp":"2024-07-17T12:00:05Z","level":"ERROR","endpoint":"/api/data","status":500,"error":"database timeout"}
{"timestamp":"2024-07-17T12:00:10Z","level":"INFO","endpoint":"/api/users","status":200}
{"timestamp":"2024-07-17T12:00:12Z","level":"ERROR","endpoint":"/api/admin","status":401,"request_id":"req-abc123","user":"alice","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhbGljZSIsInJvbGUiOiJhZG1pbiIsImV4cCI6MTcwMDAwMDAwMH0.sig1","error":"token expired"}
{"timestamp":"2024-07-17T12:00:15Z","level":"INFO","endpoint":"/api/posts","status":200}
{"timestamp":"2024-07-17T12:00:18Z","level":"ERROR","endpoint":"/api/billing","status":401,"request_id":"req-def456","user":"bob","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJib2IiLCJyb2xlIjoidXNlciIsImV4cCI6MTcwNTAwMDAwMH0.sig2","error":"token expired"}
{"timestamp":"2024-07-17T12:00:20Z","level":"ERROR","endpoint":"/api/export","status":503,"error":"service unavailable"}
{"timestamp":"2024-07-17T12:00:25Z","level":"INFO","endpoint":"/health","status":200}
Advanced String Extraction¶
Kelora provides powerful string manipulation beyond basic regex:
Extract Text Between Delimiters¶
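A minimal sketch - the between() helper name is an assumption, not a confirmed Kelora API:
# Pull the text between "[" and "]" (between() is an assumed helper name)
kelora app.log \
    --exec 'e.bracketed = e.line.between("[", "]")'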
Extract Before/After Markers¶
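A minimal sketch using after() (app.log is a placeholder):
# Keep everything after the first " | " separator
kelora app.log \
    --exec 'e.detail = e.line.after(" | ")'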
Nth occurrence support:
e.text.after(" | ", 1)- after first occurrence (default)e.text.after(" | ", -1)- after last occurrencee.text.after(" | ", 2)- after second occurrence
Extract Multiple Items¶
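A sketch - extract_all() is an assumed regex helper name, not a confirmed Kelora API:
# Collect every quoted token on the line (extract_all() is an assumed name)
kelora app.log \
    --exec 'e.items = e.line.extract_all("\"[^\"]+\"")'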
Fuzzy Matching with Edit Distance¶
Use Case: Find Typos or Similar Errors¶
The edit_distance() function calculates Levenshtein distance to find errors with typos or slight variations:
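A sketch, assuming the lines below are saved as errors.jsonl (calling edit_distance() as a string method is an assumption about its exact signature):
kelora -j errors.jsonl \
    --filter 'e.error.edit_distance("connection timeout") <= 3' \
    -k timestamp,error,service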
{"timestamp":"2024-01-15T10:00:00Z","error":"connection timeout","service":"api"}
{"timestamp":"2024-01-15T10:01:00Z","error":"connection timed out","service":"web"}
{"timestamp":"2024-01-15T10:02:00Z","error":"conecttion timeout","service":"worker"}
{"timestamp":"2024-01-15T10:03:00Z","error":"network timeout","service":"api"}
{"timestamp":"2024-01-15T10:04:00Z","error":"conection timeot","service":"web"}
{"timestamp":"2024-01-15T10:05:00Z","error":"timeout on connection","service":"api"}
Use Case: Detect Configuration Drift¶
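A sketch with illustrative field names:
# Flag config values that drift slightly from a known-good baseline
kelora -j configs.jsonl \
    --exec 'e.drift = e.config_line.edit_distance("max_connections=100 timeout=30s")' \
    --filter 'e.drift > 0 && e.drift <= 5' \
    -k host,config_line,drift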
Hash Algorithms¶
The Problem¶
You need to hash data for checksums, deduplication, or correlation with external systems.
The Solution: Cryptographic and Non-Cryptographic Hashing¶
kelora -j examples/user-data.jsonl \
--exec 'e.sha256 = e.email.hash("sha256");
e.xxh3 = e.email.hash("xxh3");
e.email = ()' \
-k user_id,sha256,xxh3 -F csv
user_id,sha256,xxh3
user001,ff8d9819fc0e12bf0d24892e45987e249a28dce836a85cad60e28eaaa8c6d976,76eb895512bf35ff
user002,686b5e4cf4f963adf8f51468a48028ef8d15bd02fa335f821279a3d1678c9615,71ad17ff8e8c867a
user003,653974f7ada0b4cb371ab7c8b1aaeaf6ba2855f89b2b0a9735b664fec7fdbc89,cedd532a6ab34757
user004,80905964842ce834af09045642241f609661deefa60e5e926235b3306582725e,14d867ef05bedac8
user005,d1d8233690c21cb0eba4915374178b71cafa23599a3d1961beaf1bac2faf0b64,30fe5cfbd1ca7cae
{"user_id":"user001","email":"alice@example.com","action":"login","ip":"192.168.1.10"}
{"user_id":"user002","email":"bob@example.org","action":"purchase","ip":"10.0.5.23"}
{"user_id":"user003","email":"charlie@test.net","action":"view","ip":"172.16.88.5"}
{"user_id":"user004","email":"diana@company.com","action":"logout","ip":"192.168.1.11"}
{"user_id":"user005","email":"eve@sample.io","action":"login","ip":"10.0.5.24"}
Available algorithms:
- sha256 - SHA-256 (default, cryptographic)
- xxh3 - xxHash3 (non-cryptographic, extremely fast)
When to use which:
- Use sha256 for checksums, integrity verification, or when you need cryptographic properties
- Use xxh3 for bucketing, sampling, or deduplication where speed matters and cryptographic security isn't needed
Use Case: Privacy-Preserving Analytics¶
Create consistent anonymous IDs using HMAC-SHA256 with a secret key for domain-separated hashing:
KELORA_SECRET="your-secret-key" kelora -j examples/analytics.jsonl \
--exec 'e.anon_user = pseudonym(e.email, "users");
e.anon_session = pseudonym(e.session_id, "sessions");
e.email = ();
e.session_id = ()' \
-k anon_user,anon_session,page,duration -F csv
anon_user,anon_session,page,duration
63fKdSofkibwUyAVggSVZHgd,Kb08bpqW9g_k0jOCHsnZjTpx,/home,45
KU12CR0zP6NrFyh1qu_mhecX,21fD9S_Mu5xWb43ciUQfQnYq,/products,120
63fKdSofkibwUyAVggSVZHgd,Kb08bpqW9g_k0jOCHsnZjTpx,/cart,30
kC9USgAtR_OvbKPgcs6kHAp1,jvEOhxqnt1nxVTyK0REoUPRU,/home,15
KU12CR0zP6NrFyh1qu_mhecX,21fD9S_Mu5xWb43ciUQfQnYq,/checkout,90
63fKdSofkibwUyAVggSVZHgd,R-lgjpv6mIcOLG0zj66CVbrS,/home,20
{"email":"alice@example.com","session_id":"sess_a1b2c3d4","page":"/home","duration":45}
{"email":"bob@example.org","session_id":"sess_e5f6g7h8","page":"/products","duration":120}
{"email":"alice@example.com","session_id":"sess_a1b2c3d4","page":"/cart","duration":30}
{"email":"charlie@test.net","session_id":"sess_i9j0k1l2","page":"/home","duration":15}
{"email":"bob@example.org","session_id":"sess_e5f6g7h8","page":"/checkout","duration":90}
{"email":"alice@example.com","session_id":"sess_m3n4o5p6","page":"/home","duration":20}
Extract JSON from Unstructured Text¶
The Problem¶
Logs contain JSON snippets embedded in plain text:
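For example (an illustrative line):
2024-01-15 10:00:00 INFO Request completed: {"status": 200, "duration_ms": 45}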
The Solution: extract_json() and extract_jsons()¶
Extract first JSON object:
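A minimal sketch (app.log is a placeholder):
kelora app.log \
    --exec 'e.payload = e.line.extract_json()'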
Extract all JSON objects:
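A sketch - extract_jsons() is assumed to return an array, which emit_each() then fans out as separate events:
kelora app.log \
    --exec 'emit_each(e.line.extract_jsons())'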
Parse Key-Value Pairs from Text¶
The Solution: absorb_kv()¶
Extract key=value pairs from unstructured log lines and convert them to structured fields:
kelora examples/kv_pairs.log \
--exec 'e.absorb_kv("line")' \
-k timestamp,action,user,ip,success -F csv
timestamp,action,user,ip,success
2024-01-15T10:00:00Z,login,alice,192.168.1.10,true
2024-01-15T10:01:00Z,view_page,bob,,
2024-01-15T10:02:00Z,api_call,charlie,,
2024-01-15T10:03:00Z,file_upload,diana,,true
2024-01-15T10:04:00Z,failed_login,eve,203.0.113.5,
2024-01-15T10:05:00Z,password_reset,frank,,
2024-01-15T10:06:00Z,logout,grace,,
2024-01-15T10:07:00Z,api_call,henry,,
2024-01-15T10:08:00Z,privilege_escalation,iris,,false
2024-01-15T10:09:00Z,delete_account,jack,,
user=alice action=login timestamp=2024-01-15T10:00:00Z success=true ip=192.168.1.10
user=bob action=view_page timestamp=2024-01-15T10:01:00Z page=/dashboard duration=1.5
user=charlie action=api_call timestamp=2024-01-15T10:02:00Z endpoint=/api/users method=GET status=200
user=diana action=file_upload timestamp=2024-01-15T10:03:00Z filename=document.pdf size=1048576 success=true
user=eve action=failed_login timestamp=2024-01-15T10:04:00Z attempts=3 locked=true ip=203.0.113.5
user=frank action=password_reset timestamp=2024-01-15T10:05:00Z email=frank@example.com token_sent=true
user=grace action=logout timestamp=2024-01-15T10:06:00Z session_duration=3600 reason=manual
user=henry action=api_call timestamp=2024-01-15T10:07:00Z endpoint=/api/export method=POST bytes=5242880
user=iris action=privilege_escalation timestamp=2024-01-15T10:08:00Z from=user to=admin success=false
user=jack action=delete_account timestamp=2024-01-15T10:09:00Z confirmed=true data_removed=true
Options¶
# Custom separators
kelora logs.log \
--exec 'e.absorb_kv("line", #{sep: ";", kv_sep: ":"})'
# Keep original line
kelora logs.log \
--exec 'e.absorb_kv("line", #{keep_source: true})'
Histogram Bucketing with track_bucket()¶
The Problem¶
You want to see the distribution of response times, not just average/max.
The Solution: Bucket Tracking¶
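A sketch, assuming the log below is the input and that track_bucket() takes a metric name and a value (its exact signature isn't shown on this page):
kelora -j examples/app-logs.jsonl --metrics -q \
    --filter 'e.has("response_time")' \
    --exec 'track_bucket("response_time", e.response_time)'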
{"timestamp":"2025-01-15T10:23:45Z","level":"INFO","service":"api-gateway","message":"Request processed successfully","request_id":"req-a1b2c3d4","user_id":42,"response_time":0.234,"status":200,"client_ip":"192.168.1.100","path":"/api/users","method":"GET","referer":"https://app.example.com","metadata":{"subscription":{"tier":"premium","expires":"2025-12-31"},"region":"us-east-1"}}
{"timestamp":"2025-01-15T10:24:12Z","level":"ERROR","service":"auth-service","message":"Connection timeout while validating user credentials","request_id":"req-e5f6g7h8","user_id":103,"response_time":5.123,"status":500,"client_ip":"10.0.5.23","path":"/api/auth/login","method":"POST","error":"ConnectionError: timeout after 5000ms","stack_trace":"at validateCredentials (auth.js:234)\n at processLogin (handler.js:89)"}
{"timestamp":"2025-01-15T10:24:33Z","level":"INFO","service":"api-gateway","message":"User not found in database","request_id":"req-i9j0k1l2","response_time":0.156,"status":404,"client_ip":"172.16.88.5","path":"/api/users/99999","method":"GET"}
{"timestamp":"2025-01-15T10:25:01Z","level":"WARN","service":"payment-service","message":"Payment processing timeout - retrying","request_id":"req-m3n4o5p6","user_id":42,"response_time":2.567,"status":200,"client_ip":"192.168.1.100","path":"/api/payments","method":"POST","referer":"https://checkout.example.com"}
{"timestamp":"2025-01-15T10:25:18Z","level":"ERROR","service":"database","message":"Database connection pool exhausted","request_id":"req-q7r8s9t0","response_time":0.001,"error":"PoolExhausted: no available connections","severity":"critical"}
{"timestamp":"2025-01-15T10:25:45Z","level":"ERROR","service":"api-gateway","message":"Invalid JWT token provided","request_id":"req-u1v2w3x4","status":401,"client_ip":"198.51.100.77","path":"/api/admin","method":"GET","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyMTIzIiwicm9sZSI6ImFkbWluIiwiZXhwIjoxNzA1MzE3NjAwfQ.dGVzdC1zaWduYXR1cmU"}
{"timestamp":"2025-01-15T10:26:02Z","level":"INFO","service":"cache-service","message":"Cache miss for key user:42:profile","request_id":"req-y5z6a7b8","response_time":0.089}
{"timestamp":"2025-01-15T10:26:23Z","level":"DEBUG","service":"api-gateway","message":"Health check passed","request_id":"req-c9d0e1f2","response_time":0.003,"status":200,"path":"/health"}
{"timestamp":"2025-01-15T10:26:44Z","level":"ERROR","service":"auth-service","message":"Unauthorized access attempt detected","request_id":"req-g3h4i5j6","user_id":999,"status":403,"client_ip":"172.16.88.6","path":"/api/admin/users","method":"DELETE","source_ip":"172.16.88.6"}
{"timestamp":"2025-01-15T10:27:05Z","level":"INFO","service":"api-gateway","message":"Request processed successfully","request_id":"req-k7l8m9n0","user_id":42,"response_time":0.456,"status":200,"client_ip":"192.168.1.100","path":"/api/profile","method":"GET","json_payload":"{\"settings\":{\"theme\":\"dark\",\"notifications\":true}}"}
{"timestamp":"2025-01-15T10:27:26Z","level":"ERROR","service":"storage","message":"File upload failed - size limit exceeded","request_id":"req-o1p2q3r4","user_id":156,"status":413,"client_ip":"198.51.100.88","path":"/api/upload","method":"POST","error":"FileSizeError: maximum size 10MB exceeded"}
{"timestamp":"2025-01-15T10:27:47Z","level":"INFO","service":"api-gateway","message":"Request processed successfully","request_id":"req-s5t6u7v8","response_time":0.234,"status":200,"client_ip":"203.0.113.50","path":"/api/search","method":"GET"}
{"timestamp":"2025-01-15T10:28:08Z","level":"WARN","service":"rate-limiter","message":"Rate limit approaching for user","request_id":"req-w9x0y1z2","user_id":42,"remaining_requests":5,"reset_time":"2025-01-15T11:00:00Z"}
{"timestamp":"2025-01-15T10:28:29Z","level":"INFO","service":"api-gateway","message":"Static content served from CDN","request_id":"req-a3b4c5d6","response_time":0.012,"status":304,"client_ip":"192.168.1.102","path":"/static/app.js"}
{"timestamp":"2025-01-15T10:28:50Z","level":"ERROR","service":"api-gateway","message":"Endpoint not found","request_id":"req-e7f8g9h0","status":404,"client_ip":"172.16.88.7","path":"/wp-admin","method":"GET"}
{"timestamp":"2025-01-15T10:29:11Z","level":"INFO","service":"analytics","message":"Report generated successfully","request_id":"req-i1j2k3l4","user_id":234,"response_time":1.789,"status":200,"client_ip":"198.51.100.99","path":"/api/analytics","method":"GET","metadata":{"report_type":"daily","date":"2025-01-15"}}
{"timestamp":"2025-01-15T10:29:32Z","level":"INFO","service":"auth-service","message":"User logged out successfully","request_id":"req-m5n6o7p8","user_id":42,"response_time":0.023,"status":200,"client_ip":"192.168.1.100","path":"/api/logout","method":"POST"}
{"timestamp":"2025-01-15T10:29:53Z","level":"INFO","service":"search-service","message":"Search query executed","request_id":"req-q9r0s1t2","user_id":178,"response_time":0.567,"status":200,"client_ip":"10.0.5.24","path":"/api/search","method":"GET"}
{"timestamp":"2025-01-15T10:30:14Z","level":"INFO","service":"api-gateway","message":"Request processed successfully","request_id":"req-u3v4w5x6","response_time":0.089,"status":200,"client_ip":"203.0.113.60","path":"/sitemap.xml"}
{"timestamp":"2025-01-15T10:30:35Z","level":"INFO","service":"order-service","message":"Order query executed","request_id":"req-y7z8a9b0","user_id":789,"response_time":1.123,"status":200,"client_ip":"192.168.1.103","path":"/api/orders","method":"GET","action":"query_orders"}
{"timestamp":"2025-01-15T10:30:56Z","level":"ERROR","service":"payment-service","message":"Payment declined by provider","request_id":"req-c1d2e3f4","user_id":456,"status":402,"client_ip":"192.168.1.104","error":"PaymentDeclined: insufficient funds","severity":"high"}
{"timestamp":"2025-01-15T10:31:17Z","level":"INFO","service":"notification-service","message":"Email notification sent","request_id":"req-g5h6i7j8","user_id":42,"from":"noreply@example.com","email":"alice@example.com"}
{"timestamp":"2025-01-15T10:31:38Z","level":"ERROR","service":"api-gateway","message":"Service unavailable","request_id":"req-k9l0m1n2","status":503,"client_ip":"10.0.5.25","path":"/api/heavy-operation","error":"ServiceUnavailable: upstream timeout"}
{"timestamp":"2025-01-15T10:31:59Z","level":"INFO","service":"api-gateway","message":"Request processed successfully","request_id":"req-o3p4q5r6","user_id":42,"response_time":0.167,"status":200,"client_ip":"192.168.1.100","path":"/api/settings","session_id":"sess-abc123"}
{"timestamp":"2025-01-15T10:32:20Z","level":"WARN","service":"auth-service","message":"Multiple failed login attempts detected","request_id":"req-s7t8u9v0","client_ip":"198.51.100.120","attempts":5,"locked":false}
Use Case: HTTP Status Code Distribution¶
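A sketch, assuming the access log below is parsed so the status code is available as e.status (the field name is an assumption):
kelora examples/access.log \
    --exec 'track_bucket("status", e.status)' \
    --metrics -q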
192.168.1.100 - alice [15/Jan/2025:10:23:45 +0000] "GET /api/users?utm_source=email&user_id=42 HTTP/1.1" 200 1523 "https://marketing.example.com/campaign" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
10.0.5.23 - bob [15/Jan/2025:10:24:12 +0000] "POST /api/orders HTTP/1.1" 201 892 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
172.16.88.5 - - [15/Jan/2025:10:24:33 +0000] "GET /search?q=widgets&page=2 HTTP/1.1" 200 5421 "https://www.google.com" "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)"
192.168.1.100 - alice [15/Jan/2025:10:25:01 +0000] "GET /products/42?utm_source=google&utm_campaign=spring HTTP/1.1" 200 2341 "https://www.google.com/search?q=gadgets" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
203.0.113.45 - - [15/Jan/2025:10:25:18 +0000] "GET /robots.txt HTTP/1.1" 200 158 "-" "GoogleBot/2.1 (+http://www.google.com/bot.html)"
Format Conversion in Pipelines¶
Convert Between Formats On-The-Fly¶
JSON to logfmt:
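A sketch (-F logfmt is assumed to be a supported output format, by analogy with the -F csv and -F json flags used elsewhere on this page):
kelora -j examples/app.jsonl -F logfmt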
level=INFO message="Application started" service=api timestamp=2024-01-15T10:00:00Z version=1.2.3
config_file=/etc/app/config.yml level=DEBUG message="Loading configuration" service=api timestamp=2024-01-15T10:00:05Z
level=INFO max_connections=50 message="Connection pool initialized" service=database timestamp=2024-01-15T10:00:10Z
{"timestamp":"2024-01-15T10:00:00Z","level":"INFO","service":"api","message":"Application started","version":"1.2.3"}
{"timestamp":"2024-01-15T10:00:05Z","level":"DEBUG","service":"api","message":"Loading configuration","config_file":"/etc/app/config.yml"}
{"timestamp":"2024-01-15T10:00:10Z","level":"INFO","service":"database","message":"Connection pool initialized","max_connections":50}
Logfmt to JSON:
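A sketch using absorb_kv() to structure each line before emitting JSON:
kelora app.log \
    --exec 'e.absorb_kv("line")' \
    -F json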
Use Case: Normalize Multi-Format Logs¶
Handle logs with mixed JSON and logfmt lines:
kelora examples/nightmare_mixed_formats.log \
--exec 'if e.line.contains("{") {
let json_str = e.line.extract_json();
e.data = json_str
} else if e.line.contains("=") {
e.data = e.line.parse_kv()
}' \
--filter 'e.has("data")' \
-F json | head -5
{"data":{"connections":50,"format":"json","level":"DEBUG","message":"Connection pool initialized","timestamp":"2024-01-15T10:00:01Z"},"line":"{\"timestamp\":\"2024-01-15T10:00:01Z\",\"level\":\"DEBUG\",\"format\":\"json\",\"message\":\"Connection pool initialized\",\"connections\":50}"}
{"data":{"format":"logfmt","level":"info","msg":"\"Cache","size":"1024","timestamp":"2024-01-15T10:00:02Z"},"line":"timestamp=2024-01-15T10:00:02Z level=info format=logfmt msg=\"Cache layer ready\" size=1024"}
{"data":{"level":"WARN","nested":{"data":{"deeply":{"buried":{"value":"hard to extract with jq"}}}},"timestamp":"2024-01-15T10:00:05Z"},"line":"{\"timestamp\":\"2024-01-15T10:00:05Z\",\"level\":\"WARN\",\"nested\":{\"data\":{\"deeply\":{\"buried\":{\"value\":\"hard to extract with jq\"}}}}}"}
{"data":{"err":"\"connection","level":"error","max_retries":"5","retry":"3","timestamp":"2024-01-15T10:00:07Z"},"line":"timestamp=2024-01-15T10:00:07Z level=error err=\"connection timeout\" retry=3 max_retries=5"}
{"data":{"action":"batch_process","timestamp":"2024-01-15T10:00:09Z","users":[{"id":1,"name":"alice"},{"id":2,"name":"bob"}]},"line":"{\"timestamp\":\"2024-01-15T10:00:09Z\",\"users\":[{\"id\":1,\"name\":\"alice\"},{\"id\":2,\"name\":\"bob\"}],\"action\":\"batch_process\"}"}
2024-01-15 10:00:00 [INFO] Server starting
{"timestamp":"2024-01-15T10:00:01Z","level":"DEBUG","format":"json","message":"Connection pool initialized","connections":50}
timestamp=2024-01-15T10:00:02Z level=info format=logfmt msg="Cache layer ready" size=1024
<34>Jan 15 10:00:03 appserver syslog: Authentication module loaded
web_1 | 2024-01-15 10:00:04 [INFO] HTTP server listening on port 8080
Stateful Processing with state¶
When to Use state¶
The state global map enables complex stateful processing that track_*() functions cannot handle:
- Deduplication: Track which IDs have already been seen
- Cross-event dependencies: Make decisions based on previous events
- Complex objects: Store nested maps, arrays, or other structured data
- Conditional logic: Remember arbitrary state across events
- State machines: Track connection states, session lifecycles
- Event correlation: Match request/response pairs, build sessions
Quick Decision Guide:
| Feature | state | track_*() |
|---|---|---|
| Purpose | Complex stateful logic | Simple metrics & aggregations |
| Read access | ✅ Yes (during processing) | ❌ No (write-only, read in --end) |
| Parallel mode | ❌ Sequential only | ✅ Works in parallel |
| Storage | Any Rhai value | Any value (strings, numbers, etc.) |
| Performance | Slower (RwLock) | Faster (atomic/optimized) |
| Use for | Deduplication, FSMs, correlation | Counting, unique tracking, bucketing |
Important: For simple counting and metrics, prefer track_count(), track_sum(), etc.—they work in both sequential and parallel modes. state only works in sequential mode.
The Problem: Deduplication¶
You have logs with duplicate entries for the same request ID, but you only want to process each unique request once:
{"request_id": "req-001", "status": "start"}
{"request_id": "req-002", "status": "start"}
{"request_id": "req-001", "status": "duplicate"} ← Skip this
{"request_id": "req-003", "status": "start"}
The Solution: Track Seen IDs with state¶
kelora -j logs.jsonl \
--exec 'if !state.contains(e.request_id) {
state[e.request_id] = true;
e.is_first = true;
} else {
e.is_first = false;
}' \
--filter 'e.is_first == true' \
-k request_id,status
Only first occurrences pass through; duplicates are filtered out.
Use Case: Track Complex Per-User State¶
Store nested maps to track multiple attributes per user:
kelora -j examples/user-events.jsonl \
--exec 'if !state.contains(e.user) {
state[e.user] = #{login_count: 0, last_seen: (), errors: []};
}
let user_state = state[e.user];
user_state.login_count += 1;
user_state.last_seen = e.timestamp;
if e.has("error") {
user_state.errors.push(e.error);
}
state[e.user] = user_state;
e.user_login_count = user_state.login_count' \
-k timestamp,user,user_login_count
timestamp='2024-01-15T10:00:00Z' user='alice' user_login_count=1
timestamp='2024-01-15T10:01:00Z' user='bob' user_login_count=1
timestamp='2024-01-15T10:02:00Z' user='alice' user_login_count=2
timestamp='2024-01-15T10:03:00Z' user='alice' user_login_count=3
timestamp='2024-01-15T10:04:00Z' user='bob' user_login_count=2
timestamp='2024-01-15T10:05:00Z' user='charlie' user_login_count=1
timestamp='2024-01-15T10:06:00Z' user='alice' user_login_count=4
timestamp='2024-01-15T10:07:00Z' user='bob' user_login_count=3
timestamp='2024-01-15T10:08:00Z' user='charlie' user_login_count=2
timestamp='2024-01-15T10:09:00Z' user='alice' user_login_count=5
{"timestamp":"2024-01-15T10:00:00Z","user":"alice","event":"login"}
{"timestamp":"2024-01-15T10:01:00Z","user":"bob","event":"login"}
{"timestamp":"2024-01-15T10:02:00Z","user":"alice","event":"view_page"}
{"timestamp":"2024-01-15T10:03:00Z","user":"alice","event":"error","error":"timeout"}
{"timestamp":"2024-01-15T10:04:00Z","user":"bob","event":"purchase"}
{"timestamp":"2024-01-15T10:05:00Z","user":"charlie","event":"login"}
{"timestamp":"2024-01-15T10:06:00Z","user":"alice","event":"login"}
{"timestamp":"2024-01-15T10:07:00Z","user":"bob","event":"error","error":"payment_failed"}
{"timestamp":"2024-01-15T10:08:00Z","user":"charlie","event":"view_page"}
{"timestamp":"2024-01-15T10:09:00Z","user":"alice","event":"logout"}
Use Case: Sequential Event Numbering¶
Assign a global sequence number across all events:
kelora -j logs.jsonl \
--begin 'state["count"] = 0' \
--exec 'state["count"] += 1; e.seq = state["count"]' \
-k seq,timestamp,message -F csv
Note: For simple counting by category, use track_count(e.category) instead.
Converting State to Regular Map¶
state is a special StateMap type with limited operations. To use map functions like .to_logfmt() or .to_kv(), convert it first:
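A sketch - the conversion helper is assumed here to be to_map(); the exact name isn't shown on this page:
kelora -j examples/app.jsonl \
    --exec 'state[e.service] = e.level' \
    --end 'print(state.to_map().to_logfmt())'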
{"timestamp":"2024-01-15T10:00:00Z","level":"INFO","service":"api","message":"Application started","version":"1.2.3"}
{"timestamp":"2024-01-15T10:00:05Z","level":"DEBUG","service":"api","message":"Loading configuration","config_file":"/etc/app/config.yml"}
{"timestamp":"2024-01-15T10:00:10Z","level":"INFO","service":"database","message":"Connection pool initialized","max_connections":50}
{"timestamp":"2024-01-15T10:01:00Z","level":"WARN","service":"api","message":"High memory usage detected","memory_percent":85}
{"timestamp":"2024-01-15T10:01:30Z","level":"ERROR","service":"database","message":"Query timeout","query":"SELECT * FROM users","duration_ms":5000}
Use Case: Event Correlation (Request/Response Pairs)¶
Match request and response events, calculating latency and emitting complete transactions:
kelora -j api-events.jsonl \
--exec 'if e.event_type == "request" {
state[e.request_id] = #{sent_at: e.timestamp, method: e.method};
e = (); // Don't emit until we see the response
} else if e.event_type == "response" && state.contains(e.request_id) {
let req = state[e.request_id];
e.duration_ms = (e.timestamp - req.sent_at).as_millis();
e.method = req.method;
state.remove(e.request_id); // Clean up
}' \
-k request_id,method,duration_ms,status
Use Case: State Machines for Protocol Analysis¶
Track connection states through their lifecycle:
kelora -j network-events.jsonl \
--exec 'if !state.contains(e.conn_id) {
state[e.conn_id] = "NEW";
}
let current_state = state[e.conn_id];
// State transitions
if current_state == "NEW" && e.event == "SYN" {
state[e.conn_id] = "SYN_SENT";
} else if current_state == "SYN_SENT" && e.event == "SYN_ACK" {
state[e.conn_id] = "ESTABLISHED";
} else if current_state == "ESTABLISHED" && e.event == "FIN" {
state[e.conn_id] = "CLOSING";
} else if e.event != "DATA" {
e.protocol_error = true; // Invalid transition
}
e.connection_state = state[e.conn_id]' \
--filter 'e.has("protocol_error")' \
-k timestamp,conn_id,event,connection_state
Use Case: Session Reconstruction¶
Accumulate events into complete sessions, emitting only when session ends:
kelora -j user-events.jsonl \
--exec 'if e.event == "login" {
state[e.session_id] = #{
user: e.user,
events: [],
start: e.timestamp
};
}
if state.contains(e.session_id) {
let session = state[e.session_id];
session.events.push(#{event: e.event, ts: e.timestamp});
state[e.session_id] = session; // copy back, as in the per-user state example above
}
if e.event == "logout" && state.contains(e.session_id) {
let session = state[e.session_id];
session.end = e.timestamp;
session.event_count = session.events.len();
print(session.to_json());
state.remove(e.session_id);
}
e = ()' -q # Suppress individual events, only emit complete sessions
Use Case: Rate Limiting - Sample First N per Key¶
Only emit the first 100 events per API key, then suppress the rest:
kelora -j api-logs.jsonl \
--exec 'if !state.contains(e.api_key) {
state[e.api_key] = 0;
}
state[e.api_key] += 1;
if state[e.api_key] > 100 {
e = (); // Drop after the first 100 per key
}' \
-k timestamp,api_key,endpoint
Performance and Memory Management¶
For large state maps (millions of keys), consider periodic cleanup:
kelora -j huge-logs.jsonl \
--exec 'if !state.contains("counter") { state["counter"] = 0; }
state["counter"] += 1;
// Periodic cleanup every 100k events
if state["counter"] % 100000 == 0 {
eprint("State size: " + state.len() + " keys");
if state.len() > 500000 {
state.clear(); // Reset if too large
eprint("State cleared");
}
}
// Your stateful logic here
if !state.contains(e.request_id) {
state[e.request_id] = true;
} else {
e = ();
}'
Parallel Mode Restriction¶
state requires sequential processing to maintain consistency. Using it with --parallel causes a runtime error:
# This will fail:
kelora -j logs.jsonl --parallel \
--exec 'state["count"] += 1'
# Error: 'state' is not available in --parallel mode
For parallel-safe tracking, use track_*() functions instead.
Combining Techniques¶
The real power comes from combining these features. Here's a complex real-world example:
# Process deeply nested API logs with privacy controls
kelora -j api-responses.jsonl \
--filter 'e.api_version == "v2"' \
--exec 'emit_each(e.get_path("data.orders", []))' \
--exec 'emit_each(e.items)' \
--exec 'e.error_pattern = e.get("error_msg", "").normalized();
e.user_hash = e.user_id.hash("xxh3");
e.sample_group = e.order_id.bucket() % 10;
e.user_id = ()' \
--filter 'e.sample_group < 3' \
--metrics \
--exec 'track_count(e.error_pattern);
track_sum("revenue", e.price * e.quantity)' \
-k order_id,sku,quantity,price,error_pattern -F csv \
> processed_orders.csv
This pipeline:
- Filters to API v2 only
- Fans out nested orders → items (multi-level)
- Normalizes error patterns
- Hashes user IDs for privacy
- Creates deterministic 30% sample
- Tracks error patterns and revenue
- Exports flat CSV
All in a single command without temporary files or custom scripts.
Performance Tips¶
- Use bucket() for sampling before heavy processing - a 10% sample cuts the work by 90%
- Apply filters early - before fan-out or expensive transformations
- Chain operations in one --exec when sharing variables (semicolon-separated)
- Use the xxh3 hash for non-cryptographic use cases (much faster than sha256)
- Limit window size (--window N) to the minimum needed for sliding calculations
Troubleshooting¶
"Function not found" errors:
- Check spelling and capitalization (Rhai is case-sensitive)
- Verify the function exists in kelora --help-functions
() (unit) value errors:
- Guard optional fields: if e.has("field") { ... }
- Use safe conversions: to_int_or(e.field, 0)
Pattern normalization doesn't work:
- Check that patterns exist in the input: echo "test 192.168.1.1" | kelora --exec '...'
- Verify pattern names: normalized(["ipv4", "email"]), not ["ip", "emails"]
Hash consistency issues:
- Same input + same algorithm = same hash (deterministic)
- Different Kelora versions may use different hash implementations
- Use the KELORA_SECRET env var for pseudonym() to ensure domain separation
See Also¶
- Advanced Scripting Tutorial - Multi-stage transformations
- Metrics and Tracking Tutorial - Aggregation patterns
- Function Reference - Complete function catalog
- Flatten Nested JSON - Deep dive on emit_each()
- Extract and Mask Sensitive Data - Privacy techniques