Power-User Techniques

Kelora includes powerful features that solve complex log analysis problems with minimal code. These techniques often go undiscovered but can dramatically simplify workflows that would otherwise require custom scripts or multiple tools.

When to Use These Techniques

  • You're dealing with deeply nested JSON from APIs or microservices
  • You need to group similar errors that differ only in variable data
  • You want deterministic sampling for consistent analysis across log rotations
  • You're extracting structured data from unstructured text logs
  • You need privacy-preserving analytics with consistent hashing
  • You're working with JWTs, URLs, or other complex embedded formats

Pattern Normalization

The Problem

Error messages and log lines often contain variable data (IPs, emails, UUIDs, numbers) that make grouping difficult:

"Failed to connect to 192.168.1.10"
"Failed to connect to 10.0.5.23"
"Failed to connect to 172.16.88.5"

These are the same error pattern but appear as three different messages.

The Solution: normalized()

The normalized() function automatically detects and replaces common patterns with placeholders:

echo '{"msg":"User 192.168.1.1 sent email to alice@example.com with ID a1b2c3d4-e5f6-7890-1234-567890abcdef"}' | \
  kelora -j --exec 'e.pattern = e.msg.normalized()' \
  -k pattern
pattern='User <ipv4> sent email to <email> with ID <uuid>'

Real-World Use Case: Error Grouping

Group errors by pattern rather than exact message; this reveals that many distinct error messages are really the same pattern repeated with different IPs and UUIDs:

kelora -j examples/production-errors.jsonl \
  --exec 'e.error_pattern = e.message.normalized()' \
  --metrics \
  --exec 'track_count(e.error_pattern)'
Failed to connect to <ipv4> = 4
Timeout on request <uuid> = 3
User <email> sent invalid request = 3
{"message":"Failed to connect to 192.168.1.10","service":"api","level":"ERROR"}
{"message":"Failed to connect to 10.0.5.23","service":"web","level":"ERROR"}
{"message":"Failed to connect to 172.16.88.5","service":"worker","level":"ERROR"}
{"message":"User alice@example.com sent invalid request","service":"api","level":"WARN"}
{"message":"User bob@test.org sent invalid request","service":"web","level":"WARN"}
{"message":"Timeout on request a1b2c3d4-e5f6-7890-1234-567890abcdef","service":"api","level":"ERROR"}
{"message":"Timeout on request f1e2d3c4-b5a6-9807-5432-098765fedcba","service":"worker","level":"ERROR"}
{"message":"Failed to connect to 203.0.113.42","service":"api","level":"ERROR"}
{"message":"User charlie@example.net sent invalid request","service":"api","level":"WARN"}
{"message":"Timeout on request 11111111-2222-3333-4444-555555555555","service":"web","level":"ERROR"}

Supported Patterns

By default, normalized() replaces:

  • IPv4 addresses → <ipv4>
  • IPv6 addresses → <ipv6>
  • Email addresses → <email>
  • UUIDs → <uuid>
  • URLs → <url>
  • Numbers → <num>

Pass a list of pattern names if you only want certain replacements:

# Only normalize IPs and emails
kelora -j logs.jsonl \
  --exec 'e.pattern = e.message.normalized(["ipv4", "email"])'

Deterministic Sampling with bucket()

The Problem

Random sampling (random() < 0.1) gives different results on each run, and simply taking the first N lines (--head N) isn't representative. Either way, it's impossible to track specific requests across multiple log files or rotations.

The Solution: Hash-Based Sampling

The bucket() function returns a consistent integer hash for any string, enabling deterministic sampling.

The same request_id always hashes to the same number, so you'll get consistent sampling across multiple log files, log rotations, different days, and distributed systems.

kelora -j examples/user-activity.jsonl \
  --filter 'e.user_id.bucket() % 20 == 0' \
  -k user_id,action,timestamp
user_id='user_v5w6x' action='checkout' timestamp='2024-01-15T10:07:00Z'
{"user_id":"user_a1b2c","action":"login","timestamp":"2024-01-15T10:00:00Z"}
{"user_id":"user_d3e4f","action":"view_page","timestamp":"2024-01-15T10:01:00Z"}
{"user_id":"user_g5h6i","action":"purchase","timestamp":"2024-01-15T10:02:00Z"}
{"user_id":"user_j7k8l","action":"logout","timestamp":"2024-01-15T10:03:00Z"}
{"user_id":"user_m9n0o","action":"login","timestamp":"2024-01-15T10:04:00Z"}
{"user_id":"user_p1q2r","action":"view_page","timestamp":"2024-01-15T10:05:00Z"}
{"user_id":"user_s3t4u","action":"add_to_cart","timestamp":"2024-01-15T10:06:00Z"}
{"user_id":"user_v5w6x","action":"checkout","timestamp":"2024-01-15T10:07:00Z"}
{"user_id":"user_y7z8a","action":"login","timestamp":"2024-01-15T10:08:00Z"}
{"user_id":"user_b9c0d","action":"search","timestamp":"2024-01-15T10:09:00Z"}
{"user_id":"user_e1f2g","action":"view_page","timestamp":"2024-01-15T10:10:00Z"}
{"user_id":"user_h3i4j","action":"logout","timestamp":"2024-01-15T10:11:00Z"}
{"user_id":"user_k5l6m","action":"login","timestamp":"2024-01-15T10:12:00Z"}
{"user_id":"user_n7o8p","action":"purchase","timestamp":"2024-01-15T10:13:00Z"}
{"user_id":"user_q9r0s","action":"view_page","timestamp":"2024-01-15T10:14:00Z"}
{"user_id":"user_t1u2v","action":"logout","timestamp":"2024-01-15T10:15:00Z"}
{"user_id":"user_w3x4y","action":"login","timestamp":"2024-01-15T10:16:00Z"}
{"user_id":"user_z5a6b","action":"search","timestamp":"2024-01-15T10:17:00Z"}
{"user_id":"user_c7d8e","action":"add_to_cart","timestamp":"2024-01-15T10:18:00Z"}
{"user_id":"user_f9g0h","action":"purchase","timestamp":"2024-01-15T10:19:00Z"}

This always selects the same 5% of users: run it multiple times and you'll get identical results.
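
To verify the determinism, run the same filter twice and compare:

# Two runs over the same file produce byte-identical output
kelora -j examples/user-activity.jsonl \
  --filter 'e.user_id.bucket() % 20 == 0' > run1.txt
kelora -j examples/user-activity.jsonl \
  --filter 'e.user_id.bucket() % 20 == 0' > run2.txt
diff run1.txt run2.txt && echo "identical"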

Partition logs for parallel processing:

# Process logs in 4 partitions
for i in {0..3}; do
  kelora -j huge.jsonl \
    --filter "e.request_id.bucket() % 4 == $i" \
    > partition_$i.log &
done
wait

Debug specific sessions across microservices:

# All logs for session IDs whose hash falls in buckets 0-2 (30% sample)
kelora -j service-*.jsonl \
  --filter 'e.session_id.bucket() % 10 < 3'
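
Because the hash is stable, a smaller sample is always a subset of a larger one, so you can widen a sample later without losing anyone already in it (a sketch; logs.jsonl is a hypothetical file):

# Start with a 10% sample of users
kelora -j logs.jsonl --filter 'e.user_id.bucket() % 10 == 0'

# Later widen to 20%; every user from the 10% sample is still included
kelora -j logs.jsonl --filter 'e.user_id.bucket() % 10 <= 1'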

Deep Structure Flattening

The Problem

APIs return deeply nested JSON that's hard to query or export to flat formats (CSV, SQL):

{
  "api": {
    "queries": [
      {
        "results": {
          "users": [
            {"id": 1, "permissions": {"read": true, "write": true}}
          ]
        }
      }
    ]
  }
}

The Solution: flattened()

The flattened() function creates a flat map with bracket-notation keys:

kelora -j examples/deeply-nested.jsonl \
  --exec 'e.flat = e.api.flattened()' \
  --exec 'print(e.flat.to_json())' -q
{"queries[0].results.users[0].id":1,"queries[0].results.users[0].permissions.read":true,"queries[0].results.users[0].permissions.write":true}
{"queries[0].results.users[0].id":2,"queries[0].results.users[0].permissions.read":true,"queries[0].results.users[0].permissions.write":false,"queries[0].results.users[1].id":3,"queries[0].results.users[1].permissions.read":false,"queries[0].results.users[1].permissions.write":false}
{"queries[0].results.users[0].id":4,"queries[0].results.users[0].permissions.admin":true,"queries[0].results.users[0].permissions.read":true,"queries[0].results.users[0].permissions.write":true}
{"api":{"queries":[{"results":{"users":[{"id":1,"permissions":{"read":true,"write":true}}]}}]}}
{"api":{"queries":[{"results":{"users":[{"id":2,"permissions":{"read":true,"write":false}},{"id":3,"permissions":{"read":false,"write":false}}]}}]}}
{"api":{"queries":[{"results":{"users":[{"id":4,"permissions":{"read":true,"write":true,"admin":true}}]}}]}}

Advanced: Multi-Level Fan-Out

For extremely nested data, combine flattened() with emit_each() to chain multiple levels of nesting into flat records:

kelora -j examples/nightmare_deeply_nested_transform.jsonl \
  --filter 'e.request_id == "req_002"' \
  --exec 'emit_each(e.get_path("api.queries[0].results.orders", []))' \
  --exec 'emit_each(e.items)' \
  -k sku,quantity,unit_price,final_price -F csv
sku,quantity,unit_price,final_price
PROD-A,50,99.99,4274.79
PROD-B,25,149.99,3374.78
PROD-C,100,49.99,4999.0
{"request_id":"req_001","timestamp":"2024-01-15T10:00:00Z","api":{"endpoint":"/graphql","queries":[{"operation":"getUsers","filters":{"status":"active","role":{"in":["admin","moderator"]}},"results":{"users":[{"id":1,"name":"alice","permissions":{"read":true,"write":true,"delete":false},"last_login":"2024-01-14T15:30:00Z"},{"id":2,"name":"bob","permissions":{"read":true,"write":false,"delete":false},"last_login":"2024-01-13T09:15:00Z"}],"total":2,"page":1}},{"operation":"getPosts","filters":{"published":true,"tags":["tech","security"]},"results":{"posts":[{"id":101,"title":"Security Best Practices","author_id":1,"tags":["security","authentication"],"metrics":{"views":1523,"likes":89,"comments":[{"user_id":3,"text":"Great post!","sentiment":"positive"},{"user_id":4,"text":"Needs more examples","sentiment":"neutral"}]}},{"id":102,"title":"Tech Trends 2024","author_id":2,"tags":["tech","future"],"metrics":{"views":2341,"likes":156,"comments":[{"user_id":5,"text":"Very insightful","sentiment":"positive"}]}}],"total":2}}]},"response":{"status":200,"duration_ms":245,"cached":false}}
{"request_id":"req_002","timestamp":"2024-01-15T10:00:05Z","api":{"endpoint":"/rest/v2/orders","queries":[{"operation":"listOrders","filters":{"customer":{"region":"us-west","tier":"premium"},"date_range":{"start":"2024-01-01","end":"2024-01-15"}},"results":{"orders":[{"order_id":"ord_501","customer":{"id":1001,"name":"Acme Corp","contacts":[{"type":"primary","email":"orders@acme.com"},{"type":"billing","email":"billing@acme.com"}]},"items":[{"sku":"PROD-A","quantity":50,"unit_price":99.99,"discounts":[{"type":"volume","percent":10},{"type":"loyalty","percent":5}],"final_price":4274.79},{"sku":"PROD-B","quantity":25,"unit_price":149.99,"discounts":[{"type":"volume","percent":10}],"final_price":3374.78}],"totals":{"subtotal":7649.57,"tax":612.36,"shipping":25.00,"grand_total":8286.93},"fulfillment":{"warehouse":"WH-001","status":"shipped","tracking":"TRK12345","estimated_delivery":"2024-01-18"}},{"order_id":"ord_502","customer":{"id":1002,"name":"TechStart Inc","contacts":[{"type":"primary","email":"team@techstart.io"}]},"items":[{"sku":"PROD-C","quantity":100,"unit_price":49.99,"discounts":[],"final_price":4999.00}],"totals":{"subtotal":4999.00,"tax":399.92,"shipping":0.00,"grand_total":5398.92},"fulfillment":{"warehouse":"WH-002","status":"processing","tracking":null,"estimated_delivery":"2024-01-20"}}],"summary":{"total_orders":2,"total_revenue":13685.85,"avg_order_value":6842.93}}}]},"response":{"status":200,"duration_ms":567,"cached":true}}
{"request_id":"req_003","timestamp":"2024-01-15T10:00:10Z","api":{"endpoint":"/analytics/dashboard","queries":[{"operation":"getMetrics","time_range":{"start":"2024-01-15T09:00:00Z","end":"2024-01-15T10:00:00Z","granularity":"5m"},"results":{"timeseries":[{"timestamp":"2024-01-15T09:00:00Z","metrics":{"requests":1523,"errors":12,"latency":{"p50":45,"p95":234,"p99":567},"status_codes":{"2xx":1489,"4xx":22,"5xx":12}}},{"timestamp":"2024-01-15T09:05:00Z","metrics":{"requests":1687,"errors":8,"latency":{"p50":42,"p95":198,"p99":445},"status_codes":{"2xx":1665,"4xx":14,"5xx":8}}},{"timestamp":"2024-01-15T09:10:00Z","metrics":{"requests":1834,"errors":15,"latency":{"p50":48,"p95":267,"p99":623},"status_codes":{"2xx":1801,"4xx":18,"5xx":15}}}],"aggregates":{"total_requests":5044,"total_errors":35,"error_rate":0.69,"avg_latency":45,"peak_requests_per_min":367},"top_endpoints":[{"path":"/api/users","count":1234,"avg_latency":34},{"path":"/api/posts","count":987,"avg_latency":56},{"path":"/api/comments","count":654,"avg_latency":23}]}}]},"response":{"status":200,"duration_ms":1234,"cached":false}}

JWT Parsing Without Verification

The Problem

You need to inspect JWT claims for debugging but don't want to set up signature verification.

The Solution: parse_jwt()

Extract header and claims without cryptographic validation:

kelora -j examples/auth-logs.jsonl \
  --filter 'e.has("token")' \
  --exec 'let jwt = e.token.parse_jwt();
          e.user = jwt.claims.sub;
          e.role = jwt.claims.role;
          e.expires = jwt.claims.exp;
          e.token = ()' \
  -k timestamp,user,role,expires
timestamp='2024-01-15T10:00:00Z' user='user123' role='admin' expires=1732153600
timestamp='2024-01-15T10:05:00Z' user='user456' role='user' expires=1732157200
timestamp='2024-01-15T10:10:00Z' user='user789' role='guest' expires=1700000000
timestamp='2024-01-15T10:15:00Z' user='user111' role='moderator' expires=1732160800
{"timestamp":"2024-01-15T10:00:00Z","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyMTIzIiwibmFtZSI6IkFsaWNlIiwicm9sZSI6ImFkbWluIiwiZXhwIjoxNzMyMTUzNjAwfQ.sig1","status":200}
{"timestamp":"2024-01-15T10:05:00Z","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyNDU2IiwibmFtZSI6IkJvYiIsInJvbGUiOiJ1c2VyIiwiZXhwIjoxNzMyMTU3MjAwfQ.sig2","status":200}
{"timestamp":"2024-01-15T10:10:00Z","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyNzg5IiwibmFtZSI6IkNoYXJsaWUiLCJyb2xlIjoiZ3Vlc3QiLCJleHAiOjE3MDAwMDAwMDB9.sig3","status":401}
{"timestamp":"2024-01-15T10:15:00Z","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyMTExIiwibmFtZSI6IkRpYW5hIiwicm9sZSI6Im1vZGVyYXRvciIsImV4cCI6MTczMjE2MDgwMH0.sig4","status":200}

Security Warning: This does NOT validate signatures. Use only for debugging or parsing tokens you already trust.
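
The parsed token includes the header as well as the claims. A minimal sketch for checking the signing algorithm, assuming the header is exposed as jwt.header alongside jwt.claims:

echo '{"token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyMTIzIn0.sig"}' | \
  kelora -j --exec 'let jwt = e.token.parse_jwt();
          e.alg = jwt.header.alg' \
  -k alg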

Use Case: Track Token Expiration Issues

kelora -j examples/api_errors.jsonl \
  --filter 'e.status == 401 && e.has("token")' \
  --exec 'let jwt = e.token.parse_jwt();
          let now = 1732000000;
          e.expired = jwt.claims.exp < now;
          e.expires_in = jwt.claims.exp - now' \
  --filter 'e.expired == true' \
  -k request_id,user,expires_in
request_id='req-abc123' user='alice' expires_in=-32000000
request_id='req-def456' user='bob' expires_in=-27000000
{"timestamp":"2024-07-17T12:00:00Z","level":"INFO","endpoint":"/health","status":200}
{"timestamp":"2024-07-17T12:00:05Z","level":"ERROR","endpoint":"/api/data","status":500,"error":"database timeout"}
{"timestamp":"2024-07-17T12:00:10Z","level":"INFO","endpoint":"/api/users","status":200}
{"timestamp":"2024-07-17T12:00:12Z","level":"ERROR","endpoint":"/api/admin","status":401,"request_id":"req-abc123","user":"alice","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhbGljZSIsInJvbGUiOiJhZG1pbiIsImV4cCI6MTcwMDAwMDAwMH0.sig1","error":"token expired"}
{"timestamp":"2024-07-17T12:00:15Z","level":"INFO","endpoint":"/api/posts","status":200}
{"timestamp":"2024-07-17T12:00:18Z","level":"ERROR","endpoint":"/api/billing","status":401,"request_id":"req-def456","user":"bob","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJib2IiLCJyb2xlIjoidXNlciIsImV4cCI6MTcwNTAwMDAwMH0.sig2","error":"token expired"}
{"timestamp":"2024-07-17T12:00:20Z","level":"ERROR","endpoint":"/api/export","status":503,"error":"service unavailable"}
{"timestamp":"2024-07-17T12:00:25Z","level":"INFO","endpoint":"/health","status":200}

Advanced String Extraction

Kelora provides powerful string manipulation beyond basic regex:

Extract Text Between Delimiters

echo '{"log":"Response: <data>secret content</data>"}' | \
  kelora -j --exec 'e.content = e.log.between("<data>", "</data>")' \
  -k content
content='secret content'

Extract Before/After Markers

echo '{"line":"2024-01-15 10:00:00 | INFO | User logged in"}' | \
  kelora -j --exec 'e.timestamp = e.line.before(" | ");
                     e.level = e.line.after(" | ").before(" | ");
                     e.message = e.line.after(" | ", -1)' \
  -k timestamp,level,message
timestamp='2024-01-15 10:00:00' level='INFO' message='User logged in'

Nth occurrence support:

  • e.text.after(" | ", 1) - after first occurrence (default)
  • e.text.after(" | ", -1) - after last occurrence
  • e.text.after(" | ", 2) - after second occurrence
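
For example, taking everything after the last occurrence isolates a final path segment (per the semantics above):

echo '{"path":"/var/log/app/error.log"}' | \
  kelora -j --exec 'e.filename = e.path.after("/", -1)' \
  -k filename
filename='error.log'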

Extract Multiple Items

echo '{"message":"Check https://example.com and http://test.org for more info"}' | \
  kelora -j --exec 'e.urls = e.message.extract_regexes(#"https?://[^\s]+"#)' \
  -F inspect
---
message | string   | "Check https://example.com and http://test.org for more info"
urls    | array(2) | [
  [0] | string | "https://example.com"
  [1] | string | "http://test.org"
]

Fuzzy Matching with Edit Distance

Use Case: Find Typos or Similar Errors

The edit_distance() function returns the Levenshtein distance between two strings (0 means identical; lower means more similar), making it easy to group errors with typos or slight variations:

kelora -j examples/error-logs.jsonl \
  --exec 'e.similarity = e.error.edit_distance("connection timeout")' \
  --filter 'e.similarity < 5' \
  -k error,similarity
error='connection timeout' similarity=0
error='connection timed out' similarity=2
error='conecttion timeout' similarity=2
error='conection timeot' similarity=2
{"timestamp":"2024-01-15T10:00:00Z","error":"connection timeout","service":"api"}
{"timestamp":"2024-01-15T10:01:00Z","error":"connection timed out","service":"web"}
{"timestamp":"2024-01-15T10:02:00Z","error":"conecttion timeout","service":"worker"}
{"timestamp":"2024-01-15T10:03:00Z","error":"network timeout","service":"api"}
{"timestamp":"2024-01-15T10:04:00Z","error":"conection timeot","service":"web"}
{"timestamp":"2024-01-15T10:05:00Z","error":"timeout on connection","service":"api"}

Use Case: Detect Configuration Drift

echo -e '{"host":"prod-web-01"}\n{"host":"prod-web-02"}\n{"host":"prd-web-01"}' | \
  kelora -j --exec 'e.distance = e.host.edit_distance("prod-web-01")' \
  --filter 'e.distance > 2' \
  -k host,distance

Hash Algorithms

The Problem

You need to hash data for checksums, deduplication, or correlation with external systems.

The Solution: Cryptographic and Non-Cryptographic Hashing

kelora -j examples/user-data.jsonl \
  --exec 'e.sha256 = e.email.hash("sha256");
          e.xxh3 = e.email.hash("xxh3");
          e.email = ()' \
  -k user_id,sha256,xxh3 -F csv
user_id,sha256,xxh3
user001,ff8d9819fc0e12bf0d24892e45987e249a28dce836a85cad60e28eaaa8c6d976,76eb895512bf35ff
user002,686b5e4cf4f963adf8f51468a48028ef8d15bd02fa335f821279a3d1678c9615,71ad17ff8e8c867a
user003,653974f7ada0b4cb371ab7c8b1aaeaf6ba2855f89b2b0a9735b664fec7fdbc89,cedd532a6ab34757
user004,80905964842ce834af09045642241f609661deefa60e5e926235b3306582725e,14d867ef05bedac8
user005,d1d8233690c21cb0eba4915374178b71cafa23599a3d1961beaf1bac2faf0b64,30fe5cfbd1ca7cae
{"user_id":"user001","email":"alice@example.com","action":"login","ip":"192.168.1.10"}
{"user_id":"user002","email":"bob@example.org","action":"purchase","ip":"10.0.5.23"}
{"user_id":"user003","email":"charlie@test.net","action":"view","ip":"172.16.88.5"}
{"user_id":"user004","email":"diana@company.com","action":"logout","ip":"192.168.1.11"}
{"user_id":"user005","email":"eve@sample.io","action":"login","ip":"10.0.5.24"}

Available algorithms:

  • sha256 - SHA-256 (default, cryptographic)
  • xxh3 - xxHash3 (non-cryptographic, extremely fast)

When to use which:

  • Use sha256 for checksums, integrity verification, or when you need cryptographic properties
  • Use xxh3 for bucketing, sampling, or deduplication where speed matters and cryptographic security isn't needed (see the sketch below)
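
For instance, xxh3 makes content-level deduplication cheap. A sketch using the state map described later on this page (sequential mode only; logs.jsonl is hypothetical), assuming duplicate events carry an identical message:

kelora -j logs.jsonl \
  --exec 'let h = e.message.hash("xxh3");
          if state.contains(h) { e = (); } else { state[h] = true; }'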

Use Case: Privacy-Preserving Analytics

Create consistent anonymous IDs using HMAC-SHA256 with a secret key for domain-separated hashing:

KELORA_SECRET="your-secret-key" kelora -j examples/analytics.jsonl \
  --exec 'e.anon_user = pseudonym(e.email, "users");
          e.anon_session = pseudonym(e.session_id, "sessions");
          e.email = ();
          e.session_id = ()' \
  -k anon_user,anon_session,page,duration -F csv
anon_user,anon_session,page,duration
63fKdSofkibwUyAVggSVZHgd,Kb08bpqW9g_k0jOCHsnZjTpx,/home,45
KU12CR0zP6NrFyh1qu_mhecX,21fD9S_Mu5xWb43ciUQfQnYq,/products,120
63fKdSofkibwUyAVggSVZHgd,Kb08bpqW9g_k0jOCHsnZjTpx,/cart,30
kC9USgAtR_OvbKPgcs6kHAp1,jvEOhxqnt1nxVTyK0REoUPRU,/home,15
KU12CR0zP6NrFyh1qu_mhecX,21fD9S_Mu5xWb43ciUQfQnYq,/checkout,90
63fKdSofkibwUyAVggSVZHgd,R-lgjpv6mIcOLG0zj66CVbrS,/home,20
{"email":"alice@example.com","session_id":"sess_a1b2c3d4","page":"/home","duration":45}
{"email":"bob@example.org","session_id":"sess_e5f6g7h8","page":"/products","duration":120}
{"email":"alice@example.com","session_id":"sess_a1b2c3d4","page":"/cart","duration":30}
{"email":"charlie@test.net","session_id":"sess_i9j0k1l2","page":"/home","duration":15}
{"email":"bob@example.org","session_id":"sess_e5f6g7h8","page":"/checkout","duration":90}
{"email":"alice@example.com","session_id":"sess_m3n4o5p6","page":"/home","duration":20}

Extract JSON from Unstructured Text

The Problem

Logs contain JSON snippets embedded in plain text:

2024-01-15 ERROR: Failed with response: {"code":500,"message":"Internal error"}

The Solution: extract_json() and extract_jsons()

Extract first JSON object:

echo '2024-01-15 ERROR: Failed with response: {"code":500,"message":"Internal error"}' | \
  kelora --exec 'e.json_str = e.line.extract_json()' \
  --filter 'e.has("json_str")' \
  --exec 'e.error_data = e.json_str' \
  -k line,error_data
line='2024-01-15 ERROR: Failed with response: {"code":500,"message":"Internal error"}'
  error_data={"code":500,"message":"Internal error"}

Extract all JSON objects:

echo '{"log":"Found errors: {\"a\":1} and {\"b\":2} in output"}' | \
  kelora -j --exec 'e.all_jsons = e.log.extract_jsons()' \
  -F inspect
---
log       | string   | "Found errors: {"a":1} and {"b":2} in output"
all_jsons | array(2) | [
  [0] | string | "{"a":1}"
  [1] | string | "{"b":2}"
]

Parse Key-Value Pairs from Text

The Solution: absorb_kv()

Extract key=value pairs from unstructured log lines and convert them to structured fields:

kelora examples/kv_pairs.log \
  --exec 'e.absorb_kv("line")' \
  -k timestamp,action,user,ip,success -F csv
timestamp,action,user,ip,success
2024-01-15T10:00:00Z,login,alice,192.168.1.10,true
2024-01-15T10:01:00Z,view_page,bob,,
2024-01-15T10:02:00Z,api_call,charlie,,
2024-01-15T10:03:00Z,file_upload,diana,,true
2024-01-15T10:04:00Z,failed_login,eve,203.0.113.5,
2024-01-15T10:05:00Z,password_reset,frank,,
2024-01-15T10:06:00Z,logout,grace,,
2024-01-15T10:07:00Z,api_call,henry,,
2024-01-15T10:08:00Z,privilege_escalation,iris,,false
2024-01-15T10:09:00Z,delete_account,jack,,

Input (examples/kv_pairs.log):

user=alice action=login timestamp=2024-01-15T10:00:00Z success=true ip=192.168.1.10
user=bob action=view_page timestamp=2024-01-15T10:01:00Z page=/dashboard duration=1.5
user=charlie action=api_call timestamp=2024-01-15T10:02:00Z endpoint=/api/users method=GET status=200
user=diana action=file_upload timestamp=2024-01-15T10:03:00Z filename=document.pdf size=1048576 success=true
user=eve action=failed_login timestamp=2024-01-15T10:04:00Z attempts=3 locked=true ip=203.0.113.5
user=frank action=password_reset timestamp=2024-01-15T10:05:00Z email=frank@example.com token_sent=true
user=grace action=logout timestamp=2024-01-15T10:06:00Z session_duration=3600 reason=manual
user=henry action=api_call timestamp=2024-01-15T10:07:00Z endpoint=/api/export method=POST bytes=5242880
user=iris action=privilege_escalation timestamp=2024-01-15T10:08:00Z from=user to=admin success=false
user=jack action=delete_account timestamp=2024-01-15T10:09:00Z confirmed=true data_removed=true

Options

# Custom separators
kelora logs.log \
  --exec 'e.absorb_kv("line", #{sep: ";", kv_sep: ":"})'

# Keep original line
kelora logs.log \
  --exec 'e.absorb_kv("line", #{keep_source: true})'
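
For example, a line that uses semicolons and colons (hypothetical input) parses with the custom separators shown above:

echo 'ts:2024-01-15;user:alice;action:login' | \
  kelora --exec 'e.absorb_kv("line", #{sep: ";", kv_sep: ":"})' \
  -k ts,user,action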

Histogram Bucketing with track_bucket()

The Problem

You want to see the distribution of response times, not just average/max.

The Solution: Bucket Tracking

kelora -j examples/api_logs.jsonl \
  --filter 'e.has("response_time")' \
  --metrics \
  --exec 'let bucket = (e.response_time / 0.5).floor() * 0.5;
          track_bucket("response_ms", bucket)'
response_ms  = #{"0": 11, "0.5": 1, "1": 1, "1.5": 1, "2.5": 1, "5": 1}
{"timestamp":"2025-01-15T10:23:45Z","level":"INFO","service":"api-gateway","message":"Request processed successfully","request_id":"req-a1b2c3d4","user_id":42,"response_time":0.234,"status":200,"client_ip":"192.168.1.100","path":"/api/users","method":"GET","referer":"https://app.example.com","metadata":{"subscription":{"tier":"premium","expires":"2025-12-31"},"region":"us-east-1"}}
{"timestamp":"2025-01-15T10:24:12Z","level":"ERROR","service":"auth-service","message":"Connection timeout while validating user credentials","request_id":"req-e5f6g7h8","user_id":103,"response_time":5.123,"status":500,"client_ip":"10.0.5.23","path":"/api/auth/login","method":"POST","error":"ConnectionError: timeout after 5000ms","stack_trace":"at validateCredentials (auth.js:234)\n  at processLogin (handler.js:89)"}
{"timestamp":"2025-01-15T10:24:33Z","level":"INFO","service":"api-gateway","message":"User not found in database","request_id":"req-i9j0k1l2","response_time":0.156,"status":404,"client_ip":"172.16.88.5","path":"/api/users/99999","method":"GET"}
{"timestamp":"2025-01-15T10:25:01Z","level":"WARN","service":"payment-service","message":"Payment processing timeout - retrying","request_id":"req-m3n4o5p6","user_id":42,"response_time":2.567,"status":200,"client_ip":"192.168.1.100","path":"/api/payments","method":"POST","referer":"https://checkout.example.com"}
{"timestamp":"2025-01-15T10:25:18Z","level":"ERROR","service":"database","message":"Database connection pool exhausted","request_id":"req-q7r8s9t0","response_time":0.001,"error":"PoolExhausted: no available connections","severity":"critical"}
{"timestamp":"2025-01-15T10:25:45Z","level":"ERROR","service":"api-gateway","message":"Invalid JWT token provided","request_id":"req-u1v2w3x4","status":401,"client_ip":"198.51.100.77","path":"/api/admin","method":"GET","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyMTIzIiwicm9sZSI6ImFkbWluIiwiZXhwIjoxNzA1MzE3NjAwfQ.dGVzdC1zaWduYXR1cmU"}
{"timestamp":"2025-01-15T10:26:02Z","level":"INFO","service":"cache-service","message":"Cache miss for key user:42:profile","request_id":"req-y5z6a7b8","response_time":0.089}
{"timestamp":"2025-01-15T10:26:23Z","level":"DEBUG","service":"api-gateway","message":"Health check passed","request_id":"req-c9d0e1f2","response_time":0.003,"status":200,"path":"/health"}
{"timestamp":"2025-01-15T10:26:44Z","level":"ERROR","service":"auth-service","message":"Unauthorized access attempt detected","request_id":"req-g3h4i5j6","user_id":999,"status":403,"client_ip":"172.16.88.6","path":"/api/admin/users","method":"DELETE","source_ip":"172.16.88.6"}
{"timestamp":"2025-01-15T10:27:05Z","level":"INFO","service":"api-gateway","message":"Request processed successfully","request_id":"req-k7l8m9n0","user_id":42,"response_time":0.456,"status":200,"client_ip":"192.168.1.100","path":"/api/profile","method":"GET","json_payload":"{\"settings\":{\"theme\":\"dark\",\"notifications\":true}}"}
{"timestamp":"2025-01-15T10:27:26Z","level":"ERROR","service":"storage","message":"File upload failed - size limit exceeded","request_id":"req-o1p2q3r4","user_id":156,"status":413,"client_ip":"198.51.100.88","path":"/api/upload","method":"POST","error":"FileSizeError: maximum size 10MB exceeded"}
{"timestamp":"2025-01-15T10:27:47Z","level":"INFO","service":"api-gateway","message":"Request processed successfully","request_id":"req-s5t6u7v8","response_time":0.234,"status":200,"client_ip":"203.0.113.50","path":"/api/search","method":"GET"}
{"timestamp":"2025-01-15T10:28:08Z","level":"WARN","service":"rate-limiter","message":"Rate limit approaching for user","request_id":"req-w9x0y1z2","user_id":42,"remaining_requests":5,"reset_time":"2025-01-15T11:00:00Z"}
{"timestamp":"2025-01-15T10:28:29Z","level":"INFO","service":"api-gateway","message":"Static content served from CDN","request_id":"req-a3b4c5d6","response_time":0.012,"status":304,"client_ip":"192.168.1.102","path":"/static/app.js"}
{"timestamp":"2025-01-15T10:28:50Z","level":"ERROR","service":"api-gateway","message":"Endpoint not found","request_id":"req-e7f8g9h0","status":404,"client_ip":"172.16.88.7","path":"/wp-admin","method":"GET"}
{"timestamp":"2025-01-15T10:29:11Z","level":"INFO","service":"analytics","message":"Report generated successfully","request_id":"req-i1j2k3l4","user_id":234,"response_time":1.789,"status":200,"client_ip":"198.51.100.99","path":"/api/analytics","method":"GET","metadata":{"report_type":"daily","date":"2025-01-15"}}
{"timestamp":"2025-01-15T10:29:32Z","level":"INFO","service":"auth-service","message":"User logged out successfully","request_id":"req-m5n6o7p8","user_id":42,"response_time":0.023,"status":200,"client_ip":"192.168.1.100","path":"/api/logout","method":"POST"}
{"timestamp":"2025-01-15T10:29:53Z","level":"INFO","service":"search-service","message":"Search query executed","request_id":"req-q9r0s1t2","user_id":178,"response_time":0.567,"status":200,"client_ip":"10.0.5.24","path":"/api/search","method":"GET"}
{"timestamp":"2025-01-15T10:30:14Z","level":"INFO","service":"api-gateway","message":"Request processed successfully","request_id":"req-u3v4w5x6","response_time":0.089,"status":200,"client_ip":"203.0.113.60","path":"/sitemap.xml"}
{"timestamp":"2025-01-15T10:30:35Z","level":"INFO","service":"order-service","message":"Order query executed","request_id":"req-y7z8a9b0","user_id":789,"response_time":1.123,"status":200,"client_ip":"192.168.1.103","path":"/api/orders","method":"GET","action":"query_orders"}
{"timestamp":"2025-01-15T10:30:56Z","level":"ERROR","service":"payment-service","message":"Payment declined by provider","request_id":"req-c1d2e3f4","user_id":456,"status":402,"client_ip":"192.168.1.104","error":"PaymentDeclined: insufficient funds","severity":"high"}
{"timestamp":"2025-01-15T10:31:17Z","level":"INFO","service":"notification-service","message":"Email notification sent","request_id":"req-g5h6i7j8","user_id":42,"from":"noreply@example.com","email":"alice@example.com"}
{"timestamp":"2025-01-15T10:31:38Z","level":"ERROR","service":"api-gateway","message":"Service unavailable","request_id":"req-k9l0m1n2","status":503,"client_ip":"10.0.5.25","path":"/api/heavy-operation","error":"ServiceUnavailable: upstream timeout"}
{"timestamp":"2025-01-15T10:31:59Z","level":"INFO","service":"api-gateway","message":"Request processed successfully","request_id":"req-o3p4q5r6","user_id":42,"response_time":0.167,"status":200,"client_ip":"192.168.1.100","path":"/api/settings","session_id":"sess-abc123"}
{"timestamp":"2025-01-15T10:32:20Z","level":"WARN","service":"auth-service","message":"Multiple failed login attempts detected","request_id":"req-s7t8u9v0","client_ip":"198.51.100.120","attempts":5,"locked":false}

Use Case: HTTP Status Code Distribution

kelora -f combined examples/web_access.log \
  --metrics \
  --exec 'track_bucket("status", e.status / 100 * 100)'
status       = #{"200": 15, "300": 1, "400": 3, "500": 1}

Input (examples/web_access.log, excerpt):

192.168.1.100 - alice [15/Jan/2025:10:23:45 +0000] "GET /api/users?utm_source=email&user_id=42 HTTP/1.1" 200 1523 "https://marketing.example.com/campaign" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
10.0.5.23 - bob [15/Jan/2025:10:24:12 +0000] "POST /api/orders HTTP/1.1" 201 892 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
172.16.88.5 - - [15/Jan/2025:10:24:33 +0000] "GET /search?q=widgets&page=2 HTTP/1.1" 200 5421 "https://www.google.com" "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)"
192.168.1.100 - alice [15/Jan/2025:10:25:01 +0000] "GET /products/42?utm_source=google&utm_campaign=spring HTTP/1.1" 200 2341 "https://www.google.com/search?q=gadgets" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
203.0.113.45 - - [15/Jan/2025:10:25:18 +0000] "GET /robots.txt HTTP/1.1" 200 158 "-" "GoogleBot/2.1 (+http://www.google.com/bot.html)"

Format Conversion in Pipelines

Convert Between Formats On-The-Fly

JSON to logfmt:

kelora -j examples/simple_json.jsonl \
  --exec 'print(e.to_logfmt())' -q | head -3
level=INFO message="Application started" service=api timestamp=2024-01-15T10:00:00Z version=1.2.3
config_file=/etc/app/config.yml level=DEBUG message="Loading configuration" service=api timestamp=2024-01-15T10:00:05Z
level=INFO max_connections=50 message="Connection pool initialized" service=database timestamp=2024-01-15T10:00:10Z
{"timestamp":"2024-01-15T10:00:00Z","level":"INFO","service":"api","message":"Application started","version":"1.2.3"}
{"timestamp":"2024-01-15T10:00:05Z","level":"DEBUG","service":"api","message":"Loading configuration","config_file":"/etc/app/config.yml"}
{"timestamp":"2024-01-15T10:00:10Z","level":"INFO","service":"database","message":"Connection pool initialized","max_connections":50}

Logfmt to JSON (note: examples/app.log below is bracket-formatted rather than logfmt, so the parser reports diagnostics instead of converting):

kelora -f logfmt examples/app.log \
  --exec 'print(e.to_json())' -q | head -3
kelora: Parse errors: 8 total
  line 1: Key cannot contain spaces
  line 2: Key cannot contain spaces
  line 3: Key cannot contain spaces
  [+5 more. Use -v to see each error or --no-diagnostics to suppress this summary.]

Input (examples/app.log, excerpt):

[2025-01-15 10:00:00] INFO Application started on :8080
[2025-01-15 10:00:05] INFO Connected to database db-primary
[2025-01-15 10:00:12] WARN Slow query detected: 450ms (threshold: 200ms)

Use Case: Normalize Multi-Format Logs

Handle logs with mixed JSON and logfmt lines:

kelora examples/nightmare_mixed_formats.log \
  --exec 'if e.line.contains("{") {
    let json_str = e.line.extract_json();
    e.data = json_str
  } else if e.line.contains("=") {
    e.data = e.line.parse_kv()
  }' \
  --filter 'e.has("data")' \
  -F json | head -5
{"data":{"connections":50,"format":"json","level":"DEBUG","message":"Connection pool initialized","timestamp":"2024-01-15T10:00:01Z"},"line":"{\"timestamp\":\"2024-01-15T10:00:01Z\",\"level\":\"DEBUG\",\"format\":\"json\",\"message\":\"Connection pool initialized\",\"connections\":50}"}
{"data":{"format":"logfmt","level":"info","msg":"\"Cache","size":"1024","timestamp":"2024-01-15T10:00:02Z"},"line":"timestamp=2024-01-15T10:00:02Z level=info format=logfmt msg=\"Cache layer ready\" size=1024"}
{"data":{"level":"WARN","nested":{"data":{"deeply":{"buried":{"value":"hard to extract with jq"}}}},"timestamp":"2024-01-15T10:00:05Z"},"line":"{\"timestamp\":\"2024-01-15T10:00:05Z\",\"level\":\"WARN\",\"nested\":{\"data\":{\"deeply\":{\"buried\":{\"value\":\"hard to extract with jq\"}}}}}"}
{"data":{"err":"\"connection","level":"error","max_retries":"5","retry":"3","timestamp":"2024-01-15T10:00:07Z"},"line":"timestamp=2024-01-15T10:00:07Z level=error err=\"connection timeout\" retry=3 max_retries=5"}
{"data":{"action":"batch_process","timestamp":"2024-01-15T10:00:09Z","users":[{"id":1,"name":"alice"},{"id":2,"name":"bob"}]},"line":"{\"timestamp\":\"2024-01-15T10:00:09Z\",\"users\":[{\"id\":1,\"name\":\"alice\"},{\"id\":2,\"name\":\"bob\"}],\"action\":\"batch_process\"}"}

Input (examples/nightmare_mixed_formats.log):

2024-01-15 10:00:00 [INFO] Server starting
{"timestamp":"2024-01-15T10:00:01Z","level":"DEBUG","format":"json","message":"Connection pool initialized","connections":50}
timestamp=2024-01-15T10:00:02Z level=info format=logfmt msg="Cache layer ready" size=1024
<34>Jan 15 10:00:03 appserver syslog: Authentication module loaded
web_1    | 2024-01-15 10:00:04 [INFO] HTTP server listening on port 8080

Stateful Processing with state

When to Use state

The state global map enables complex stateful processing that track_*() functions cannot handle:

  • Deduplication: Track which IDs have already been seen
  • Cross-event dependencies: Make decisions based on previous events
  • Complex objects: Store nested maps, arrays, or other structured data
  • Conditional logic: Remember arbitrary state across events
  • State machines: Track connection states, session lifecycles
  • Event correlation: Match request/response pairs, build sessions

Quick Decision Guide:

Feature       | state                             | track_*()
------------- | --------------------------------- | -------------------------------------
Purpose       | Complex stateful logic            | Simple metrics & aggregations
Read access   | ✅ Yes (during processing)         | ❌ No (write-only, read in --end)
Parallel mode | ❌ Sequential only                 | ✅ Works in parallel
Storage       | Any Rhai value                    | Any value (strings, numbers, etc.)
Performance   | Slower (RwLock)                   | Faster (atomic/optimized)
Use for       | Deduplication, FSMs, correlation  | Counting, unique tracking, bucketing

Important: For simple counting and metrics, prefer track_count(), track_sum(), etc.—they work in both sequential and parallel modes. state only works in sequential mode.
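
For example, counting events by level stays parallel-safe (logs.jsonl is hypothetical):

# Works under --parallel because track_*() uses the optimized accumulators
kelora -j logs.jsonl --parallel --metrics --exec 'track_count(e.level)'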

The Problem: Deduplication

You have logs with duplicate entries for the same request ID, but you only want to process each unique request once:

{"request_id": "req-001", "status": "start"}
{"request_id": "req-002", "status": "start"}
{"request_id": "req-001", "status": "duplicate"}  ← Skip this
{"request_id": "req-003", "status": "start"}

The Solution: Track Seen IDs with state

kelora -j logs.jsonl \
  --exec 'if !state.contains(e.request_id) {
    state[e.request_id] = true;
    e.is_first = true;
  } else {
    e.is_first = false;
  }' \
  --filter 'e.is_first == true' \
  -k request_id,status

Only first occurrences pass through; duplicates are filtered out.
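
A variant of the same pattern drops duplicates inline and reports how many were suppressed (a sketch; logs.jsonl is hypothetical):

kelora -j logs.jsonl \
  --begin 'state["dupes"] = 0' \
  --exec 'if state.contains(e.request_id) {
    state["dupes"] += 1;
    e = ();
  } else {
    state[e.request_id] = true;
  }' \
  --end 'eprint("duplicates dropped: " + state["dupes"])'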

Use Case: Track Complex Per-User State

Store nested maps to track multiple attributes per user:

kelora -j examples/user-events.jsonl \
  --exec 'if !state.contains(e.user) {
    state[e.user] = #{login_count: 0, last_seen: (), errors: []};
  }
  let user_state = state[e.user];
  user_state.login_count += 1;
  user_state.last_seen = e.timestamp;
  if e.has("error") {
    user_state.errors.push(e.error);
  }
  state[e.user] = user_state;
  e.user_login_count = user_state.login_count' \
  -k timestamp,user,user_login_count
timestamp='2024-01-15T10:00:00Z' user='alice' user_login_count=1
timestamp='2024-01-15T10:01:00Z' user='bob' user_login_count=1
timestamp='2024-01-15T10:02:00Z' user='alice' user_login_count=2
timestamp='2024-01-15T10:03:00Z' user='alice' user_login_count=3
timestamp='2024-01-15T10:04:00Z' user='bob' user_login_count=2
timestamp='2024-01-15T10:05:00Z' user='charlie' user_login_count=1
timestamp='2024-01-15T10:06:00Z' user='alice' user_login_count=4
timestamp='2024-01-15T10:07:00Z' user='bob' user_login_count=3
timestamp='2024-01-15T10:08:00Z' user='charlie' user_login_count=2
timestamp='2024-01-15T10:09:00Z' user='alice' user_login_count=5
{"timestamp":"2024-01-15T10:00:00Z","user":"alice","event":"login"}
{"timestamp":"2024-01-15T10:01:00Z","user":"bob","event":"login"}
{"timestamp":"2024-01-15T10:02:00Z","user":"alice","event":"view_page"}
{"timestamp":"2024-01-15T10:03:00Z","user":"alice","event":"error","error":"timeout"}
{"timestamp":"2024-01-15T10:04:00Z","user":"bob","event":"purchase"}
{"timestamp":"2024-01-15T10:05:00Z","user":"charlie","event":"login"}
{"timestamp":"2024-01-15T10:06:00Z","user":"alice","event":"login"}
{"timestamp":"2024-01-15T10:07:00Z","user":"bob","event":"error","error":"payment_failed"}
{"timestamp":"2024-01-15T10:08:00Z","user":"charlie","event":"view_page"}
{"timestamp":"2024-01-15T10:09:00Z","user":"alice","event":"logout"}

Use Case: Sequential Event Numbering

Assign a global sequence number across all events:

kelora -j logs.jsonl \
  --begin 'state["count"] = 0' \
  --exec 'state["count"] += 1; e.seq = state["count"]' \
  -k seq,timestamp,message -F csv

Note: For simple counting by category, use track_count(e.category) instead.

Converting State to Regular Map

state is a special StateMap type with limited operations. To use map functions like .to_logfmt() or .to_kv(), convert it first:

kelora -j examples/simple_json.jsonl \
  --exec 'state[e.level] = (state.get(e.level) ?? 0) + 1' \
  --end 'print(state.to_map().to_logfmt())' -q
CRITICAL=1 DEBUG=4 ERROR=3 INFO=9 WARN=3
{"timestamp":"2024-01-15T10:00:00Z","level":"INFO","service":"api","message":"Application started","version":"1.2.3"}
{"timestamp":"2024-01-15T10:00:05Z","level":"DEBUG","service":"api","message":"Loading configuration","config_file":"/etc/app/config.yml"}
{"timestamp":"2024-01-15T10:00:10Z","level":"INFO","service":"database","message":"Connection pool initialized","max_connections":50}
{"timestamp":"2024-01-15T10:01:00Z","level":"WARN","service":"api","message":"High memory usage detected","memory_percent":85}
{"timestamp":"2024-01-15T10:01:30Z","level":"ERROR","service":"database","message":"Query timeout","query":"SELECT * FROM users","duration_ms":5000}

Use Case: Event Correlation (Request/Response Pairs)

Match request and response events, calculating latency and emitting complete transactions:

kelora -j api-events.jsonl \
  --exec 'if e.event_type == "request" {
    state[e.request_id] = #{sent_at: e.timestamp, method: e.method};
    e = ();  // Do not emit until we see the response
  } else if e.event_type == "response" && state.contains(e.request_id) {
    let req = state[e.request_id];
    e.duration_ms = (e.timestamp - req.sent_at).as_millis();
    e.method = req.method;
    state.remove(e.request_id);  // Clean up
  }' \
  -k request_id,method,duration_ms,status

Use Case: State Machines for Protocol Analysis

Track connection states through their lifecycle:

kelora -j network-events.jsonl \
  --exec 'if !state.contains(e.conn_id) {
    state[e.conn_id] = "NEW";
  }
  let current_state = state[e.conn_id];

  // State transitions
  if current_state == "NEW" && e.event == "SYN" {
    state[e.conn_id] = "SYN_SENT";
  } else if current_state == "SYN_SENT" && e.event == "SYN_ACK" {
    state[e.conn_id] = "ESTABLISHED";
  } else if current_state == "ESTABLISHED" && e.event == "FIN" {
    state[e.conn_id] = "CLOSING";
  } else if e.event != "DATA" {
    e.protocol_error = true;  // Invalid transition
  }
  e.connection_state = state[e.conn_id]' \
  --filter 'e.has("protocol_error")' \
  -k timestamp,conn_id,event,connection_state

Use Case: Session Reconstruction

Accumulate events into complete sessions, emitting only when session ends:

kelora -j user-events.jsonl \
  --exec 'if e.event == "login" {
    state[e.session_id] = #{
      user: e.user,
      events: [],
      start: e.timestamp
    };
  }
  if state.contains(e.session_id) {
    let session = state[e.session_id];
    session.events.push(#{event: e.event, ts: e.timestamp});
    state[e.session_id] = session;
  }
  if e.event == "logout" {
    let session = state[e.session_id];
    session.end = e.timestamp;
    session.event_count = session.events.len();
    print(session.to_json());
    state.remove(e.session_id);
  }
  e = ()' -q  # Suppress individual events, only emit complete sessions

Use Case: Rate Limiting - Sample First N per Key

Only emit the first 100 events per API key, then suppress the rest:

kelora -j api-logs.jsonl \
  --exec 'if !state.contains(e.api_key) {
    state[e.api_key] = 0;
  }
  state[e.api_key] += 1;
  if state[e.api_key] > 100 {
    e = ();  // Drop after the first 100 per key
  }' \
  -k timestamp,api_key,endpoint

Performance and Memory Management

For large state maps (millions of keys), consider periodic cleanup:

kelora -j huge-logs.jsonl \
  --exec 'if !state.contains("counter") { state["counter"] = 0; }
  state["counter"] += 1;

  // Periodic cleanup every 100k events
  if state["counter"] % 100000 == 0 {
    eprint("State size: " + state.len() + " keys");
    if state.len() > 500000 {
      state.clear();  # Reset if too large
      eprint("State cleared");
    }
  }

  // Your stateful logic here
  if !state.contains(e.request_id) {
    state[e.request_id] = true;
  } else {
    e = ();
  }'

Parallel Mode Restriction

state requires sequential processing to maintain consistency. Using it with --parallel causes a runtime error:

# This will fail:
kelora -j logs.jsonl --parallel \
  --exec 'state["count"] += 1'
# Error: 'state' is not available in --parallel mode

For parallel-safe tracking, use track_*() functions instead.

Combining Techniques

The real power comes from combining these features. Here's a complex real-world example:

# Process deeply nested API logs with privacy controls
kelora -j api-responses.jsonl \
  --filter 'e.api_version == "v2"' \
  --exec 'emit_each(e.get_path("data.orders", []))' \
  --exec 'emit_each(e.items)' \
  --exec 'e.error_pattern = e.get("error_msg", "").normalized();
          e.user_hash = e.user_id.hash("xxh3");
          e.sample_group = e.order_id.bucket() % 10;
          e.user_id = ()' \
  --filter 'e.sample_group < 3' \
  --metrics \
  --exec 'track_count(e.error_pattern);
          track_sum("revenue", e.price * e.quantity)' \
  -k order_id,sku,quantity,price,error_pattern -F csv \
  > processed_orders.csv

This pipeline:

  1. Filters to API v2 only
  2. Fans out nested orders → items (multi-level)
  3. Normalizes error patterns
  4. Hashes user IDs for privacy
  5. Creates deterministic 30% sample
  6. Tracks error patterns and revenue
  7. Exports flat CSV

All in a single command without temporary files or custom scripts.

Performance Tips

  • Use bucket() for sampling before heavy processing: a 10% sample cuts the work by 90% (see the sketch after this list)
  • Apply filters early, before fan-out or expensive transformations
  • Chain operations in one --exec when they share variables (semicolon-separated)
  • Use the xxh3 hash for non-cryptographic use cases (much faster than sha256)
  • Limit window size (--window N) to the minimum needed for sliding calculations
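
A sketch combining the first three tips (big.jsonl is hypothetical, with request_id and message fields):

# Sample first, then run the expensive normalization on the 10% that remains
kelora -j big.jsonl \
  --filter 'e.request_id.bucket() % 10 == 0' \
  --exec 'e.pattern = e.message.normalized()' \
  -k pattern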

Troubleshooting

"Function not found" errors:

  • Check spelling and capitalization (Rhai is case-sensitive)
  • Verify the function exists in kelora --help-functions

() (unit) value errors:

  • Guard optional fields: if e.has("field") { ... }
  • Use safe conversions: to_int_or(e.field, 0)

Pattern normalization doesn't work:

  • Check that patterns exist in input: echo "test 192.168.1.1" | kelora --exec '...'
  • Verify pattern names: normalized(["ipv4", "email"]) not ["ip", "emails"]

Hash consistency issues:

  • Same input + same algorithm = same hash (deterministic)
  • Different Kelora versions may use different hash implementations
  • Use KELORA_SECRET env var for pseudonym() to ensure domain separation

See Also