Power-User Techniques

Things Kelora does in one line that would otherwise need a custom script or a chain of tools. Skim the gallery, find the trick you didn't know existed, and follow the link when you want the full guide.

How to read this page

Each entry is a teaser: a problem, one command, and a link to the deep dive. Nothing here is the complete reference — that lives in the Function Reference.

Group similar errors — `normalized()`

"Failed to connect to 192.168.1.10" and "...10.0.5.23" are the same error. normalized() swaps variable data (IPs, emails, UUIDs, numbers) for placeholders so they collapse into one pattern.

Command/Output

echo '{"msg":"User 192.168.1.1 sent email to alice@example.com with ID a1b2c3d4-e5f6-7890-1234-567890abcdef"}' | \
  kelora -j --exec 'e.pattern = e.msg.normalized()' \
  -k pattern

pattern='User <ipv4> sent email to <email> with ID <uuid>'

→ Pair it with track_freq() to rank error patterns, or let --drain mine templates automatically. Full pattern list and options: normalized() reference.
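To make the idea concrete, here is a minimal Python sketch of placeholder-based normalization. It is not Kelora's implementation: the regexes, the `normalize` name, and the `<num>` placeholder are assumptions chosen for illustration, and the real pattern list is in the normalized() reference.

```python
import re

# Order matters: match the most specific shapes first so that the digits
# inside a UUID or an IP address are not replaced as plain numbers.
PATTERNS = [
    ("<uuid>", re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
        r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b")),
    ("<email>", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("<ipv4>", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("<num>", re.compile(r"\b\d+\b")),  # hypothetical placeholder name
]


def normalize(msg: str) -> str:
    """Replace variable data with placeholders so similar messages collapse."""
    for placeholder, pattern in PATTERNS:
        msg = pattern.sub(placeholder, msg)
    return msg


print(normalize("Failed to connect to 192.168.1.10"))
print(normalize("Failed to connect to 10.0.5.23"))
```

Both calls print the same string, which is what lets a frequency counter treat them as one error.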

Discover log templates automatically — `--drain`

No normalization rules to maintain: Drain clusters raw lines into templates.

kelora -j examples/app_monitoring.jsonl --drain -k message

Output formats: --drain (table), --drain=full (line ranges + samples), --drain=id (stable IDs for diffs), --drain=json (programmatic). → --drain reference.

Deterministic sampling — `bucket()`

--head, sample_prob(), and rand() pick rows by position or by chance, so the sample shifts between runs and files. bucket() hashes a key to a stable integer, so the same request shows up in every run, every rotation, every service.

Command/Output

kelora -j examples/user-activity.jsonl \
  --filter 'e.user_id.bucket() % 20 == 0' \
  -k user_id,action,timestamp

user_id='user_v5w6x' action='checkout' timestamp='2024-01-15T10:07:00Z'

Same key → same bucket, so you can also shard a huge file into N partitions (bucket() % 4 == $i) for parallel processing. → Function Reference.
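The principle behind hash-based sampling can be sketched in a few lines of Python. This assumes SHA-256 as the hash purely for illustration; the algorithm Kelora's bucket() uses is not stated here, so the integer values will differ. The helper names `in_sample` and `shard` are hypothetical.

```python
import hashlib


def bucket(key: str) -> int:
    """Map a key to a stable non-negative integer.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to get the same value every run.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def in_sample(key: str, one_in: int = 20) -> bool:
    """Keep roughly 1 in `one_in` keys, always the same ones."""
    return bucket(key) % one_in == 0


def shard(key: str, partitions: int = 4) -> int:
    """Assign a key to one of `partitions` shards, stable across runs."""
    return bucket(key) % partitions


print(shard("user_v5w6x"), in_sample("user_v5w6x"))
```

Because the decision depends only on the key, every event for a sampled user is kept, which random row sampling cannot guarantee.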

Flatten deeply nested JSON — `flattened()`

Turn nested API payloads into flat, bracket-keyed fields ready for CSV or SQL.

Command/Output

kelora -j examples/deeply-nested.jsonl \
  --exec 'e.flat = e.api.flattened()' \
  --exec 'print(e.flat.to_json())' -q

{"queries[0].results.users[0].id":1,"queries[0].results.users[0].permissions.read":true,"queries[0].results.users[0].permissions.write":true}
{"queries[0].results.users[0].id":2,"queries[0].results.users[0].permissions.read":true,"queries[0].results.users[0].permissions.write":false,"queries[0].results.users[1].id":3,"queries[0].results.users[1].permissions.read":false,"queries[0].results.users[1].permissions.write":false}
{"queries[0].results.users[0].id":4,"queries[0].results.users[0].permissions.admin":true,"queries[0].results.users[0].permissions.read":true,"queries[0].results.users[0].permissions.write":true}

For arrays-within-arrays, chain emit_each() to fan out multiple levels into flat rows. → Flatten Nested JSON for Analysis.
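The key layout in the output above (dots for object fields, `[n]` for array indices) can be reproduced with a short recursive function. This is a sketch of the general technique in Python, not Kelora's code; the `flatten` name is hypothetical and options such as separators or depth limits are not modeled.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into a single dict with path-style keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Scalars end the recursion and become one flat field.
        flat[prefix] = value
    return flat


payload = {"queries": [{"results": {"users": [
    {"id": 1, "permissions": {"read": True, "write": True}}]}}]}
for key, val in flatten(payload).items():
    print(key, "=", val)
```

Note that in this sketch empty objects and arrays produce no keys at all.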

Inspect JWT claims — `parse_jwt()`

Read the header and claims for debugging, with no signature setup. The standard time claims exp/iat/nbf come back as datetimes (expires_at, issued_at, not_before), so you can format them or compare against now() directly.

Command/Output

kelora -j examples/auth-logs.jsonl \
  --filter 'e.has("token")' \
  --exec 'let jwt = e.token.parse_jwt();
          e.user = jwt.claims.sub;
          e.role = jwt.claims.role;
          e.expires = jwt.expires_at.to_iso();
          e.token = ()' \
  -k timestamp,user,role,expires

timestamp='2024-01-15T10:00:00Z' user='user123' role='admin' expires='2024-11-21T01:46:40+00:00'
timestamp='2024-01-15T10:05:00Z' user='user456' role='user' expires='2024-11-21T02:46:40+00:00'
timestamp='2024-01-15T10:10:00Z' user='user789' role='guest' expires='2023-11-14T22:13:20+00:00'
timestamp='2024-01-15T10:15:00Z' user='user111' role='moderator' expires='2024-11-21T03:46:40+00:00'

Find expired tokens by comparing the decoded expiry against the current time:

kelora -j examples/auth-logs.jsonl \
  --filter 'e.token.parse_jwt().expires_at < now()'

To flatten the claims straight onto the event in one step (dropping the token), use absorb_jwt() — the JWT member of the absorb family:

kelora -j examples/auth-logs.jsonl --exec 'e.absorb_jwt("token")'

Warning

parse_jwt() does not verify signatures. Use it only for debugging or with tokens you already trust.

→ Function Reference.
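For readers wondering why no signature setup is needed: the header and claims of a JWT are just base64url-encoded JSON. The Python sketch below decodes them without verification, under the assumption of a standard three-segment token. The function names are hypothetical and the token is built inside the example, not taken from the sample logs.

```python
import base64
import json
from datetime import datetime, timezone


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(data: dict) -> str:
    raw = json.dumps(data).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_unverified(token: str) -> dict:
    """Decode header and claims. The signature segment is ignored entirely."""
    header_b64, payload_b64, _signature = token.split(".")
    claims = json.loads(b64url_decode(payload_b64))
    decoded = {"header": json.loads(b64url_decode(header_b64)), "claims": claims}
    if "exp" in claims:
        # exp is seconds since the Unix epoch; expose it as a UTC datetime.
        decoded["expires_at"] = datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
    return decoded


# Throwaway unsigned token, for demonstration only.
token = ".".join([
    b64url_encode({"alg": "none"}),
    b64url_encode({"sub": "user123", "role": "admin", "exp": 1700000000}),
    "",
])
jwt = decode_unverified(token)
print(jwt["claims"]["sub"], jwt["expires_at"].isoformat())
print("expired:", jwt["expires_at"] < datetime.now(timezone.utc))
```

Anyone can forge a token that decodes this way, which is the reason for the warning above.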

Surgical string extraction — `between` / `before` / `after`

Pull fields out of semi-structured lines without writing a regex.

Command/Output

echo '{"line":"2024-01-15 10:00:00 | INFO | User logged in"}' | \
  kelora -j --exec 'e.timestamp = e.line.before(" | ");
                     e.level = e.line.after(" | ").before(" | ");
                     e.message = e.line.after(" | ", -1)' \
  -k timestamp,level,message

timestamp='2024-01-15 10:00:00' level='INFO' message='User logged in'

Also available: nth-occurrence selection (after(sep, 2)), last occurrence (-1), between(), and extract_regexes() for multiple matches. → Function Reference.

Fuzzy matching — `edit_distance()`

Levenshtein distance finds typo'd errors or config drift (prod-web vs prd-web).

Command/Output

kelora -j examples/error-logs.jsonl \
  --exec 'e.similarity = e.error.edit_distance("connection timeout")' \
  --filter 'e.similarity < 5' \
  -k error,similarity

error='connection timeout' similarity=0
error='connection timed out' similarity=2
error='conecttion timeout' similarity=2
error='conection timeot' similarity=2

→ Function Reference.
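As a reference point for the numbers above, here is the textbook dynamic-programming form of Levenshtein distance in Python. It is a sketch of the standard algorithm, not Kelora's source. It counts single-character insertions, deletions, and substitutions, so a lower value means a closer match.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rolling rows of the DP table."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


for candidate in ["connection timed out", "conecttion timeout", "conection timeot"]:
    print(candidate, edit_distance(candidate, "connection timeout"))
```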

Hashing & pseudonymization — `hash()` / `pseudonym()`

hash() offers sha256 for integrity and xxh3 for fast bucketing; pseudonym() produces consistent anonymous IDs (HMAC keyed by KELORA_SECRET).

Command/Output

KELORA_SECRET="your-secret-key" kelora -j examples/analytics.jsonl \
  --exec 'e.anon_user = pseudonym(e.email, "users");
          e.email = ()' \
  -k anon_user,page,duration -F csv

anon_user,page,duration
63fKdSofkibwUyAVggSVZHgd,/home,45
KU12CR0zP6NrFyh1qu_mhecX,/products,120
63fKdSofkibwUyAVggSVZHgd,/cart,30
kC9USgAtR_OvbKPgcs6kHAp1,/home,15
KU12CR0zP6NrFyh1qu_mhecX,/checkout,90
63fKdSofkibwUyAVggSVZHgd,/home,20

→ Sanitize Logs Before Sharing · Pseudonymize Identifiers.
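The property that matters here, the same input always mapping to the same opaque ID while staying unguessable without the secret, comes from keyed hashing. The Python sketch below shows the general HMAC pattern. The message layout (`domain:value`), SHA-256, the base64url encoding, and the 24-character truncation are all assumptions for illustration. Kelora's actual derivation may differ, so this will not reproduce the IDs shown above.

```python
import base64
import hashlib
import hmac


def pseudonym(value: str, domain: str, secret: bytes, length: int = 24) -> str:
    """Derive a stable, opaque ID from a value using a secret key.

    The domain string separates namespaces, so the same email yields
    different IDs for "users" and, say, "billing" (hypothetical domains).
    """
    message = f"{domain}:{value}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


secret = b"your-secret-key"
print(pseudonym("alice@example.com", "users", secret))
print(pseudonym("alice@example.com", "users", secret))  # identical to the line above
print(pseudonym("alice@example.com", "billing", secret))  # different namespace
```

Unlike a plain hash, the keyed variant cannot be reversed by hashing a list of known emails unless the secret leaks.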

Extract JSON & key-values from text — `extract_json()` / `absorb_kv()`

Lift structured data out of plain-text log lines.

Command/Output

echo '2024-01-15 ERROR: Failed with response: {"code":500,"message":"Internal error"}' | \
  kelora --exec 'e.data = e.line.extract_json()' \
  --filter 'e.has("data")' -k line,data

kelora hint: No input format detected; keeping whole lines as 'line'. For 'timestamp LEVEL message' app logs, extract fields with -f 'cols:ts(2) level *msg' (or a regex:). Mixed file? Cascade with repeated -f, e.g. -f json -f 'cols:ts(2) level *msg'. See --help-formats.
line='2024-01-15 ERROR: Failed with response: {"code":500,"message":"Internal error"}' data={"code":500,"message":"Internal error"}

extract_jsons() grabs every object; absorb_kv("line") promotes key=value pairs to fields. → Function Reference.
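One common way to find an embedded JSON object in free text is to try parsing from each `{` until one succeeds. The Python sketch below does that with the standard library's `raw_decode`. It illustrates the technique only; how Kelora's extract_json() locates objects is not described on this page.

```python
import json


def extract_json(line: str):
    """Return the first JSON object embedded in `line`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = line.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and ignores the rest.
            value, _end = decoder.raw_decode(line, start)
            return value
        except json.JSONDecodeError:
            start = line.find("{", start + 1)
    return None


line = '2024-01-15 ERROR: Failed with response: {"code":500,"message":"Internal error"}'
print(extract_json(line))
```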

Histogram buckets — `track_freq()`

See the distribution, not just the average.

Command/Output

kelora -j examples/api_logs.jsonl \
  --filter 'e.has("response_time")' \
  --metrics \
  --exec 'let bucket = (e.response_time / 0.5).floor() * 0.5;
          track_freq("response_ms", bucket)'

response_ms 0   11
response_ms 0.5 1
response_ms 1   1
response_ms 1.5 1
response_ms 2.5 1
response_ms 5   1

→ Metrics and Tracking.
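The bucketing expression in the command, `(value / 0.5).floor() * 0.5`, rounds each value down to the lower edge of its 0.5-wide bin. The Python sketch below applies the same arithmetic to a few made-up response times (hypothetical values, not taken from examples/api_logs.jsonl) and counts them.

```python
import math
from collections import Counter


def bucket_floor(value: float, width: float = 0.5) -> float:
    """Round `value` down to the lower edge of its `width`-wide bin."""
    return math.floor(value / width) * width


response_times = [0.12, 0.49, 0.5, 1.7, 2.9]  # hypothetical sample data
histogram = Counter(bucket_floor(t) for t in response_times)

for edge in sorted(histogram):
    print(edge, histogram[edge])
```

A value of exactly 0.5 lands in the 0.5 bin, not the 0 bin, because each bin includes its lower edge.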

Format conversion on the fly — `to_json()` / `to_logfmt()` / cascade

Convert between JSON, logfmt, and CSV mid-pipeline, or let cascade mode (-f json,logfmt,line) auto-detect mixed streams line by line.

Command/Output

kelora -f json,logfmt,line examples/nightmare_mixed_formats.log \
  -F json | head -5

{"line":"2024-01-15 10:00:00 [INFO] Server starting","_format":"line"}
{"timestamp":"2024-01-15T10:00:01Z","level":"DEBUG","message":"Connection pool initialized","format":"json","connections":50,"_format":"json"}
{"timestamp":"2024-01-15T10:00:02Z","level":"info","msg":"Cache layer ready","format":"logfmt","size":1024,"_format":"logfmt"}
{"line":"<34>Jan 15 10:00:03 appserver syslog: Authentication module loaded","_format":"line"}
{"line":"web_1    | 2024-01-15 10:00:04 [INFO] HTTP server listening on port 8080","_format":"line"}

→ Format Reference.

Cross-event logic — `state`

When track_*() isn't enough — deduplication, request/response correlation, session reconstruction, state machines — the state map remembers anything across events.

Command/Output

kelora -j examples/simple_json.jsonl \
  --exec 'state[e.level] = (state.get(e.level) ?? 0) + 1' \
  --end 'print(state.to_map().to_logfmt())' -q

CRITICAL=1 DEBUG=4 ERROR=3 INFO=9 WARN=3

Note

state is sequential-only (not available under --parallel). For simple counting prefer track_*(), which works in parallel.

→ Full recipes (dedup, correlation, FSMs, session rebuild, memory management): Cross-Event Logic with state.

Combine them

The payoff is composition — fan out nested orders, normalize errors, hash users, take a deterministic sample, and aggregate, in one command:

kelora -j api-responses.jsonl \
  --filter 'e.api_version == "v2"' \
  --exec 'emit_each(e.get_path("data.orders", []))' \
  --exec 'emit_each(e.items)' \
  --exec 'e.error_pattern = e.get("error_msg", "").normalized();
          e.user_hash = e.user_id.hash("xxh3");
          e.sample_group = e.order_id.bucket() % 10;
          e.user_id = ()' \
  --filter 'e.sample_group < 3' \
  --metrics \
  --exec 'track_freq("error_pattern", e.error_pattern)' \
  -k order_id,sku,quantity,error_pattern -F csv
