Power-User Techniques

Things Kelora does in one line that would otherwise need a custom script or a chain of tools. Skim the gallery, find the trick you didn't know existed, and follow the link when you want the full guide.

How to read this page

Each entry is a teaser: a problem, one command, and a link to the deep dive. Nothing here is the complete reference — that lives in the Function Reference.

Group similar errors — normalized()

"Failed to connect to 192.168.1.10" and "...10.0.5.23" are the same error. normalized() swaps variable data (IPs, emails, UUIDs, numbers) for placeholders so they collapse into one pattern.

echo '{"msg":"User 192.168.1.1 sent email to alice@example.com with ID a1b2c3d4-e5f6-7890-1234-567890abcdef"}' | \
  kelora -j --exec 'e.pattern = e.msg.normalized()' \
  -k pattern
pattern='User <ipv4> sent email to <email> with ID <uuid>'

→ Pair it with track_freq() to rank error patterns, or let --drain mine templates automatically. Full pattern list and options: normalized() reference.
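
To make the idea concrete, here is a minimal Python sketch of placeholder normalization. It is not Kelora's implementation: the regexes are simplified, and the `<num>` placeholder name is an assumption (only `<ipv4>`, `<email>`, and `<uuid>` appear in the output above).

```python
import re

# Order matters: the most specific patterns run first so that a later, broader
# pattern (plain numbers) cannot chew up pieces of a UUID or an IP address.
PATTERNS = [
    ("uuid", re.compile(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b")),
    ("email", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("ipv4", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("num", re.compile(r"\b\d+\b")),  # placeholder name is an assumption
]


def normalize(message: str) -> str:
    """Replace variable data with <placeholder> tokens, one pattern at a time."""
    for name, pattern in PATTERNS:
        message = pattern.sub(f"<{name}>", message)
    return message


# Two messages that differ only in the IP address collapse into one pattern.
same = normalize("Failed to connect to 192.168.1.10") == normalize(
    "Failed to connect to 10.0.5.23"
)
```

Counting the normalized strings (for example with `collections.Counter`) then ranks error patterns by frequency.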

Discover log templates automatically — --drain

No normalization rules to maintain: Drain clusters raw lines into templates.

kelora -j examples/app_monitoring.jsonl --drain -k message

Output formats: --drain (table), --drain=full (line ranges + samples), --drain=id (stable IDs for diffs), --drain=json (programmatic). → --drain reference.

Deterministic sampling — bucket()

--head, sample_prob(), and rand() give different rows every run. bucket() hashes a key to a stable integer, so the same request shows up in every run, every rotation, every service.

kelora -j examples/user-activity.jsonl \
  --filter 'e.user_id.bucket() % 20 == 0' \
  -k user_id,action,timestamp
user_id='user_v5w6x' action='checkout' timestamp='2024-01-15T10:07:00Z'

Same key → same bucket, so you can also shard a huge file into N partitions (bucket() % 4 == $i) for parallel processing. → Function Reference.
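
The reason hashing gives a stable sample can be sketched in a few lines of Python. This is an illustration only: the hash function here (SHA-256 truncated to 64 bits) is an assumption and will not produce the same bucket numbers as Kelora's bucket().

```python
import hashlib


def bucket(key: str) -> int:
    """Map a key to a stable non-negative integer.

    Python's built-in hash() is salted per process, so a cryptographic digest
    is used instead to get the same value on every run and every machine.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def in_sample(key: str, one_in: int = 20) -> bool:
    """Keep roughly 1/one_in of all keys, always the same ones."""
    return bucket(key) % one_in == 0


def shard(key: str, shards: int = 4) -> int:
    """Assign a key to exactly one of `shards` partitions."""
    return bucket(key) % shards
```

Because the decision depends only on the key, every event for a sampled user or request is kept, and each key lands in exactly one shard.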

Flatten deeply nested JSON — flattened()

Turn nested API payloads into flat, bracket-keyed fields ready for CSV or SQL.

kelora -j examples/deeply-nested.jsonl \
  --exec 'e.flat = e.api.flattened()' \
  --exec 'print(e.flat.to_json())' -q
{"queries[0].results.users[0].id":1,"queries[0].results.users[0].permissions.read":true,"queries[0].results.users[0].permissions.write":true}
{"queries[0].results.users[0].id":2,"queries[0].results.users[0].permissions.read":true,"queries[0].results.users[0].permissions.write":false,"queries[0].results.users[1].id":3,"queries[0].results.users[1].permissions.read":false,"queries[0].results.users[1].permissions.write":false}
{"queries[0].results.users[0].id":4,"queries[0].results.users[0].permissions.admin":true,"queries[0].results.users[0].permissions.read":true,"queries[0].results.users[0].permissions.write":true}

For arrays-within-arrays, chain emit_each() to fan out multiple levels into flat rows. → Flatten Nested JSON for Analysis.
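
The bracket-keyed layout shown above can be reproduced with a short recursive walk. The Python sketch below only illustrates the key scheme; how Kelora treats empty containers or other edge cases is not covered here, and the sample payload is modeled on the first output line.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts/lists into one dict keyed by paths like a[0].b.

    Empty dicts and lists contribute no keys in this sketch.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Leaf value: record it under the path built so far.
        flat[prefix] = value
    return flat


payload = {
    "queries": [
        {"results": {"users": [{"id": 1, "permissions": {"read": True, "write": True}}]}}
    ]
}
flat = flatten(payload)
```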

Inspect JWT claims — parse_jwt()

Read the header and claims for debugging, with no signature setup required. The standard time claims exp/iat/nbf come back as datetimes (expires_at, issued_at, not_before), so you can format them or compare against now() directly.

kelora -j examples/auth-logs.jsonl \
  --filter 'e.has("token")' \
  --exec 'let jwt = e.token.parse_jwt();
          e.user = jwt.claims.sub;
          e.role = jwt.claims.role;
          e.expires = jwt.expires_at.to_iso();
          e.token = ()' \
  -k timestamp,user,role,expires
timestamp='2024-01-15T10:00:00Z' user='user123' role='admin' expires='2024-11-21T01:46:40+00:00'
timestamp='2024-01-15T10:05:00Z' user='user456' role='user' expires='2024-11-21T02:46:40+00:00'
timestamp='2024-01-15T10:10:00Z' user='user789' role='guest' expires='2023-11-14T22:13:20+00:00'
timestamp='2024-01-15T10:15:00Z' user='user111' role='moderator' expires='2024-11-21T03:46:40+00:00'

Find expired tokens by comparing the decoded expiry against the current time:

kelora -j examples/auth-logs.jsonl \
  --filter 'e.token.parse_jwt().expires_at < now()'

To flatten the claims straight onto the event in one step (dropping the token), use absorb_jwt() — the JWT member of the absorb family:

kelora -j examples/auth-logs.jsonl --exec 'e.absorb_jwt("token")'

Warning

Does not verify signatures — debugging / trusted tokens only.

→ Function Reference.
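
For readers wondering why no key is needed: a JWT is three base64url segments, and the first two are plain JSON. The Python sketch below decodes them without touching the signature. The helper names and the demo token are made up for illustration; the result keys loosely mirror the fields described above.

```python
import base64
import json
from datetime import datetime, timezone


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs strip off."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_jwt_unverified(token: str) -> dict:
    """Return header and claims. The signature is NOT checked."""
    header_b64, payload_b64, _signature = token.split(".")
    claims = json.loads(b64url_decode(payload_b64))
    result = {"header": json.loads(b64url_decode(header_b64)), "claims": claims}
    if "exp" in claims:
        # exp is seconds since the Unix epoch; convert to an aware datetime.
        result["expires_at"] = datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
    return result


# Hypothetical token built on the spot; the third segment is a dummy signature.
demo_token = ".".join([
    b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode()),
    b64url_encode(json.dumps({"sub": "user789", "role": "guest", "exp": 1700000000}).encode()),
    "signature-not-checked",
])
decoded = decode_jwt_unverified(demo_token)
is_expired = decoded["expires_at"] < datetime.now(timezone.utc)
```

Anyone can forge a token that decodes this way, which is why the warning above applies.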

Surgical string extraction — between / before / after

Pull fields out of semi-structured lines without writing a regex.

echo '{"line":"2024-01-15 10:00:00 | INFO | User logged in"}' | \
  kelora -j --exec 'e.timestamp = e.line.before(" | ");
                     e.level = e.line.after(" | ").before(" | ");
                     e.message = e.line.after(" | ", -1)' \
  -k timestamp,level,message
timestamp='2024-01-15 10:00:00' level='INFO' message='User logged in'

Also available: Nth-occurrence matching (after(sep, 2)), last occurrence (-1), between(), and extract_regexes() for multiple matches. → Function Reference.
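
The occurrence arguments are easiest to see spelled out. The Python sketch below mimics before() and after() as used in the example; returning an empty string when the separator is missing is an assumption, not documented Kelora behavior.

```python
def before(text: str, sep: str) -> str:
    """Text preceding the first occurrence of sep ('' if sep is absent)."""
    head, found, _tail = text.partition(sep)
    return head if found else ""


def after(text: str, sep: str, nth: int = 1) -> str:
    """Text following the nth occurrence of sep; nth=-1 means the last one."""
    if nth == -1:
        _head, found, tail = text.rpartition(sep)
        return tail if found else ""
    if nth < 1:
        raise ValueError("this sketch supports nth >= 1 or nth == -1 only")
    parts = text.split(sep, nth)  # at most nth splits, so parts[nth] is the rest
    return parts[nth] if len(parts) > nth else ""


line = "2024-01-15 10:00:00 | INFO | User logged in"
timestamp = before(line, " | ")
level = before(after(line, " | "), " | ")
message = after(line, " | ", -1)
```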

Fuzzy matching — edit_distance()

Levenshtein distance finds typo'd errors or config drift (prod-web vs prd-web).

kelora -j examples/error-logs.jsonl \
  --exec 'e.similarity = e.error.edit_distance("connection timeout")' \
  --filter 'e.similarity < 5' \
  -k error,similarity
error='connection timeout' similarity=0
error='connection timed out' similarity=2
error='conecttion timeout' similarity=2
error='conection timeot' similarity=2

→ Function Reference.
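
For reference, Levenshtein distance counts the fewest single-character insertions, deletions, and substitutions needed to turn one string into another. A compact Python version of the textbook dynamic-programming algorithm (not Kelora's code) looks like this:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rolling rows of the DP table."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # delete char_a
                current[j - 1] + 1,                     # insert char_b
                previous[j - 1] + (char_a != char_b),   # substitute or match
            ))
        previous = current
    return previous[-1]


drift = edit_distance("prod-web", "prd-web")
```

A lower number means a closer match, so the `similarity < 5` filter above keeps strings within four edits of the target.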

Hashing & pseudonymization — hash() / pseudonym()

sha256 for integrity, xxh3 for fast bucketing, and pseudonym() for consistent anonymous IDs (HMAC with KELORA_SECRET).

KELORA_SECRET="your-secret-key" kelora -j examples/analytics.jsonl \
  --exec 'e.anon_user = pseudonym(e.email, "users");
          e.email = ()' \
  -k anon_user,page,duration -F csv
anon_user,page,duration
63fKdSofkibwUyAVggSVZHgd,/home,45
KU12CR0zP6NrFyh1qu_mhecX,/products,120
63fKdSofkibwUyAVggSVZHgd,/cart,30
kC9USgAtR_OvbKPgcs6kHAp1,/home,15
KU12CR0zP6NrFyh1qu_mhecX,/checkout,90
63fKdSofkibwUyAVggSVZHgd,/home,20

→ Sanitize Logs Before Sharing · Pseudonymize Identifiers.
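
The property that matters is that the same input and secret always give the same ID, while the ID cannot be reversed without the secret. The Python sketch below shows one way to get that with a keyed HMAC. The message layout, the truncation to 18 bytes, and the base64url encoding are assumptions chosen to resemble the output above; Kelora's actual derivation may differ, so the values will not match.

```python
import base64
import hashlib
import hmac


def pseudonym(value: str, domain: str, secret: bytes) -> str:
    """Derive a stable, non-reversible ID for `value` within `domain`.

    The domain is mixed into the message so the same email yields unrelated
    IDs in, say, "users" and "billing" (hypothetical domain names).
    """
    message = f"{domain}\x00{value}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    # 18 bytes encode to exactly 24 base64url characters with no padding.
    return base64.urlsafe_b64encode(digest[:18]).decode("ascii")


secret = b"your-secret-key"
first = pseudonym("alice@example.com", "users", secret)
again = pseudonym("alice@example.com", "users", secret)
```

Changing the secret changes every ID, so keep it stable for as long as pseudonyms need to stay joinable across exports.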

Extract JSON & key-values from text — extract_json() / absorb_kv()

Lift structured data out of plain-text log lines.

echo '2024-01-15 ERROR: Failed with response: {"code":500,"message":"Internal error"}' | \
  kelora --exec 'e.data = e.line.extract_json()' \
  --filter 'e.has("data")' -k line,data
kelora hint: No input format detected; keeping whole lines as 'line'. For 'timestamp LEVEL message' app logs, extract fields with -f 'cols:ts(2) level *msg' (or a regex:). Mixed file? Cascade with repeated -f, e.g. -f json -f 'cols:ts(2) level *msg'. See --help-formats.
line='2024-01-15 ERROR: Failed with response: {"code":500,"message":"Internal error"}' data={"code":500,"message":"Internal error"}

extract_jsons() grabs every object; absorb_kv("line") promotes key=value pairs to fields. → Function Reference.
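
One way to picture the extraction step: scan for an opening brace and try to parse a JSON value starting there. The Python sketch below does that with the standard library; it is an illustration of the idea, not a description of how extract_json() is implemented.

```python
import json


def extract_first_json(text: str):
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one value at `start` and ignores trailing text.
            value, _end = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one.
            start = text.find("{", start + 1)
    return None


line = '2024-01-15 ERROR: Failed with response: {"code":500,"message":"Internal error"}'
data = extract_first_json(line)
```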

Histogram buckets — track_freq()

See the distribution, not just the average.

kelora -j examples/api_logs.jsonl \
  --filter 'e.has("response_time")' \
  --metrics \
  --exec 'let bucket = (e.response_time / 0.5).floor() * 0.5;
          track_freq("response_ms", bucket)'
response_ms 0   11
response_ms 0.5 1
response_ms 1   1
response_ms 1.5 1
response_ms 2.5 1
response_ms 5   1

→ Metrics and Tracking.
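
The bucketing expression in the command rounds each value down to the nearest multiple of the bucket width. The same arithmetic in Python, with made-up sample values:

```python
import math
from collections import Counter


def bucket_floor(value: float, width: float = 0.5) -> float:
    """Round value down to the nearest multiple of width."""
    return math.floor(value / width) * width


def histogram(values, width: float = 0.5) -> Counter:
    """Count how many values fall into each bucket."""
    return Counter(bucket_floor(v, width) for v in values)


# Hypothetical response times, not taken from examples/api_logs.jsonl.
counts = histogram([0.1, 0.2, 0.6, 2.7])
```

A narrower width gives more resolution at the cost of more rows; 0.5 is simply the value used in the example above.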

Format conversion on the fly — to_json() / to_logfmt() / cascade

Convert between JSON, logfmt, and CSV mid-pipeline, or let cascade mode (-f json,logfmt,line) auto-detect mixed streams line by line.

kelora -f json,logfmt,line examples/nightmare_mixed_formats.log \
  -F json | head -5
{"line":"2024-01-15 10:00:00 [INFO] Server starting","_format":"line"}
{"timestamp":"2024-01-15T10:00:01Z","level":"DEBUG","message":"Connection pool initialized","format":"json","connections":50,"_format":"json"}
{"timestamp":"2024-01-15T10:00:02Z","level":"info","msg":"Cache layer ready","format":"logfmt","size":1024,"_format":"logfmt"}
{"line":"<34>Jan 15 10:00:03 appserver syslog: Authentication module loaded","_format":"line"}
{"line":"web_1    | 2024-01-15 10:00:04 [INFO] HTTP server listening on port 8080","_format":"line"}

→ Format Reference.

Cross-event logic — state

When track_*() isn't enough — deduplication, request/response correlation, session reconstruction, state machines — the state map remembers anything across events.

kelora -j examples/simple_json.jsonl \
  --exec 'state[e.level] = (state.get(e.level) ?? 0) + 1' \
  --end 'print(state.to_map().to_logfmt())' -q
CRITICAL=1 DEBUG=4 ERROR=3 INFO=9 WARN=3

Note

state is sequential-only (not available under --parallel). For simple counting prefer track_*(), which works in parallel.

→ Full recipes (dedup, correlation, FSMs, session rebuild, memory management): Cross-Event Logic with state.
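
To show why this kind of logic needs events in order, here is a Python sketch of two of the patterns named above: per-level counting (what the command does) and deduplication. The `request_id` field is hypothetical and only serves the illustration.

```python
def count_by(events, field="level"):
    """Running tally across events, like the state[...] counter above."""
    counts = {}
    for event in events:
        counts[event[field]] = counts.get(event[field], 0) + 1
    return counts


def dedupe(events, key="request_id"):
    """Yield only the first event seen for each key value."""
    seen = set()  # grows with the number of distinct keys
    for event in events:
        ident = event.get(key)
        if ident in seen:
            continue
        seen.add(ident)
        yield event


events = [
    {"request_id": "a", "level": "INFO"},
    {"request_id": "b", "level": "ERROR"},
    {"request_id": "a", "level": "INFO"},  # duplicate of the first event
]
unique = list(dedupe(events))
levels = count_by(events)
```

Both functions depend on what earlier events left behind, which is why this style of processing does not split cleanly across parallel workers.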

Combine them

The payoff is composition — fan out nested orders, normalize errors, hash users, take a deterministic sample, and aggregate, in one command:

kelora -j api-responses.jsonl \
  --filter 'e.api_version == "v2"' \
  --exec 'emit_each(e.get_path("data.orders", []))' \
  --exec 'emit_each(e.items)' \
  --exec 'e.error_pattern = e.get("error_msg", "").normalized();
          e.user_hash = e.user_id.hash("xxh3");
          e.sample_group = e.order_id.bucket() % 10;
          e.user_id = ()' \
  --filter 'e.sample_group < 3' \
  --metrics \
  --exec 'track_freq("error_pattern", e.error_pattern)' \
  -k order_id,sku,quantity,error_pattern -F csv

See Also