Power-User Techniques¶
Kelora includes powerful features that solve complex log analysis problems with minimal code. These techniques often go undiscovered but can dramatically simplify workflows that would otherwise require custom scripts or multiple tools.
When to Use These Techniques¶
- You're dealing with deeply nested JSON from APIs or microservices
- You need to group similar errors that differ only in variable data
- You want deterministic sampling for consistent analysis across log rotations
- You're extracting structured data from unstructured text logs
- You need privacy-preserving analytics with consistent hashing
- You're working with JWTs, URLs, or other complex embedded formats
Pattern Normalization¶
The Problem¶
Error messages and log lines often contain variable data (IPs, emails, UUIDs, numbers) that make grouping difficult:
"Failed to connect to 192.168.1.10"
"Failed to connect to 10.0.5.23"
"Failed to connect to 172.16.88.5"
These are the same error pattern but appear as three different messages.
The Solution: normalized()¶
The normalized() function automatically detects and replaces common patterns with placeholders:
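A minimal sketch (errors.jsonl is a placeholder input file):
# Tag each event with its normalized pattern
kelora -j errors.jsonl \
    --exec 'e.pattern = e.message.normalized()' \
    -k message,pattern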
Real-World Use Case: Error Grouping¶
Group errors by pattern rather than exact message to see that many different error messages are actually the same pattern repeated with different IPs/UUIDs:
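A sketch of the grouping step, assuming the sample lines below are saved as errors.jsonl:
kelora -j errors.jsonl --metrics -q \
    --exec 'track_count(e.message.normalized())'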
{"message":"Failed to connect to 192.168.1.10","service":"api","level":"ERROR"}
{"message":"Failed to connect to 10.0.5.23","service":"web","level":"ERROR"}
{"message":"Failed to connect to 172.16.88.5","service":"worker","level":"ERROR"}
{"message":"User alice@example.com sent invalid request","service":"api","level":"WARN"}
{"message":"User bob@test.org sent invalid request","service":"web","level":"WARN"}
{"message":"Timeout on request a1b2c3d4-e5f6-7890-1234-567890abcdef","service":"api","level":"ERROR"}
{"message":"Timeout on request f1e2d3c4-b5a6-9807-5432-098765fedcba","service":"worker","level":"ERROR"}
{"message":"Failed to connect to 203.0.113.42","service":"api","level":"ERROR"}
{"message":"User charlie@example.net sent invalid request","service":"api","level":"WARN"}
{"message":"Timeout on request 11111111-2222-3333-4444-555555555555","service":"web","level":"ERROR"}
Supported Patterns¶
By default, normalized() replaces:
- IPv4 addresses → <ipv4>
- IPv6 addresses → <ipv6>
- Email addresses → <email>
- UUIDs → <uuid>
- URLs → <url>
- Numbers → <num>
Pass a list of pattern names if you only want certain replacements:
# Only normalize IPs and emails
kelora -j logs.jsonl \
--exec 'e.pattern = e.message.normalized(["ipv4", "email"])'
Deterministic Sampling with bucket()¶
The Problem¶
Taking the first N lines (--head N) or sampling randomly (random() < 0.1) gives a different slice of traffic for every file or run, making it impossible to track specific requests across multiple log files or rotations.
The Solution: Hash-Based Sampling¶
The bucket() function returns a consistent integer hash for any string, enabling deterministic sampling.
The same request_id always hashes to the same number, so you'll get consistent sampling across multiple log files, log rotations, different days, and distributed systems.
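A minimal sketch of a deterministic 5% sample, assuming the events below are saved as events.jsonl:
# bucket() % 20 == 0 keeps roughly 1 in 20 users, always the same ones
kelora -j events.jsonl \
    --filter 'e.user_id.bucket() % 20 == 0' \
    -k user_id,action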
{"user_id":"user_a1b2c","action":"login","timestamp":"2024-01-15T10:00:00Z"}
{"user_id":"user_d3e4f","action":"view_page","timestamp":"2024-01-15T10:01:00Z"}
{"user_id":"user_g5h6i","action":"purchase","timestamp":"2024-01-15T10:02:00Z"}
{"user_id":"user_j7k8l","action":"logout","timestamp":"2024-01-15T10:03:00Z"}
{"user_id":"user_m9n0o","action":"login","timestamp":"2024-01-15T10:04:00Z"}
{"user_id":"user_p1q2r","action":"view_page","timestamp":"2024-01-15T10:05:00Z"}
{"user_id":"user_s3t4u","action":"add_to_cart","timestamp":"2024-01-15T10:06:00Z"}
{"user_id":"user_v5w6x","action":"checkout","timestamp":"2024-01-15T10:07:00Z"}
{"user_id":"user_y7z8a","action":"login","timestamp":"2024-01-15T10:08:00Z"}
{"user_id":"user_b9c0d","action":"search","timestamp":"2024-01-15T10:09:00Z"}
{"user_id":"user_e1f2g","action":"view_page","timestamp":"2024-01-15T10:10:00Z"}
{"user_id":"user_h3i4j","action":"logout","timestamp":"2024-01-15T10:11:00Z"}
{"user_id":"user_k5l6m","action":"login","timestamp":"2024-01-15T10:12:00Z"}
{"user_id":"user_n7o8p","action":"purchase","timestamp":"2024-01-15T10:13:00Z"}
{"user_id":"user_q9r0s","action":"view_page","timestamp":"2024-01-15T10:14:00Z"}
{"user_id":"user_t1u2v","action":"logout","timestamp":"2024-01-15T10:15:00Z"}
{"user_id":"user_w3x4y","action":"login","timestamp":"2024-01-15T10:16:00Z"}
{"user_id":"user_z5a6b","action":"search","timestamp":"2024-01-15T10:17:00Z"}
{"user_id":"user_c7d8e","action":"add_to_cart","timestamp":"2024-01-15T10:18:00Z"}
{"user_id":"user_f9g0h","action":"purchase","timestamp":"2024-01-15T10:19:00Z"}
The command above always returns the same 5% of users - run it multiple times and you'll get identical results.
Partition logs for parallel processing:
# Process logs in 4 partitions
for i in {0..3}; do
kelora -j huge.jsonl \
--filter "e.request_id.bucket() % 4 == $i" \
> partition_$i.log &
done
wait
Debug specific sessions across microservices:
# All logs for sessions whose hash bucket is 0-2 (a 30% sample)
kelora -j service-*.jsonl \
--filter 'e.session_id.bucket() % 10 < 3'
Deep Structure Flattening¶
The Problem¶
APIs return deeply nested JSON that's hard to query or export to flat formats (CSV, SQL):
{
"api": {
"queries": [
{
"results": {
"users": [
{"id": 1, "permissions": {"read": true, "write": true}}
]
}
}
]
}
}
The Solution: flattened()¶
The flattened() function creates a flat map with bracket-notation keys:
kelora -j examples/deeply-nested.jsonl \
--exec 'e.flat = e.api.flattened()' \
--exec 'print(e.flat.to_json())' -q
{"queries[0].results.users[0].id":1,"queries[0].results.users[0].permissions.read":true,"queries[0].results.users[0].permissions.write":true}
{"queries[0].results.users[0].id":2,"queries[0].results.users[0].permissions.read":true,"queries[0].results.users[0].permissions.write":false,"queries[0].results.users[1].id":3,"queries[0].results.users[1].permissions.read":false,"queries[0].results.users[1].permissions.write":false}
{"queries[0].results.users[0].id":4,"queries[0].results.users[0].permissions.admin":true,"queries[0].results.users[0].permissions.read":true,"queries[0].results.users[0].permissions.write":true}
{"api":{"queries":[{"results":{"users":[{"id":1,"permissions":{"read":true,"write":true}}]}}]}}
{"api":{"queries":[{"results":{"users":[{"id":2,"permissions":{"read":true,"write":false}},{"id":3,"permissions":{"read":false,"write":false}}]}}]}}
{"api":{"queries":[{"results":{"users":[{"id":4,"permissions":{"read":true,"write":true,"admin":true}}]}}]}}
Advanced: Multi-Level Fan-Out¶
For extremely nested data, combine flattened() with emit_each() to chain multiple levels of nesting into flat records:
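A sketch of the fan-out against the records below (emit_each() is assumed here to emit one event per array element, as in the combined example at the end of this page):
kelora -j api-requests.jsonl \
    --exec 'emit_each(e.api.queries)' \
    --exec 'e.flat = e.results.flattened()' \
    --exec 'print(e.flat.to_json())' -q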
{"request_id":"req_001","timestamp":"2024-01-15T10:00:00Z","api":{"endpoint":"/graphql","queries":[{"operation":"getUsers","filters":{"status":"active","role":{"in":["admin","moderator"]}},"results":{"users":[{"id":1,"name":"alice","permissions":{"read":true,"write":true,"delete":false},"last_login":"2024-01-14T15:30:00Z"},{"id":2,"name":"bob","permissions":{"read":true,"write":false,"delete":false},"last_login":"2024-01-13T09:15:00Z"}],"total":2,"page":1}},{"operation":"getPosts","filters":{"published":true,"tags":["tech","security"]},"results":{"posts":[{"id":101,"title":"Security Best Practices","author_id":1,"tags":["security","authentication"],"metrics":{"views":1523,"likes":89,"comments":[{"user_id":3,"text":"Great post!","sentiment":"positive"},{"user_id":4,"text":"Needs more examples","sentiment":"neutral"}]}},{"id":102,"title":"Tech Trends 2024","author_id":2,"tags":["tech","future"],"metrics":{"views":2341,"likes":156,"comments":[{"user_id":5,"text":"Very insightful","sentiment":"positive"}]}}],"total":2}}]},"response":{"status":200,"duration_ms":245,"cached":false}}
{"request_id":"req_002","timestamp":"2024-01-15T10:00:05Z","api":{"endpoint":"/rest/v2/orders","queries":[{"operation":"listOrders","filters":{"customer":{"region":"us-west","tier":"premium"},"date_range":{"start":"2024-01-01","end":"2024-01-15"}},"results":{"orders":[{"order_id":"ord_501","customer":{"id":1001,"name":"Acme Corp","contacts":[{"type":"primary","email":"orders@acme.com"},{"type":"billing","email":"billing@acme.com"}]},"items":[{"sku":"PROD-A","quantity":50,"unit_price":99.99,"discounts":[{"type":"volume","percent":10},{"type":"loyalty","percent":5}],"final_price":4274.79},{"sku":"PROD-B","quantity":25,"unit_price":149.99,"discounts":[{"type":"volume","percent":10}],"final_price":3374.78}],"totals":{"subtotal":7649.57,"tax":612.36,"shipping":25.00,"grand_total":8286.93},"fulfillment":{"warehouse":"WH-001","status":"shipped","tracking":"TRK12345","estimated_delivery":"2024-01-18"}},{"order_id":"ord_502","customer":{"id":1002,"name":"TechStart Inc","contacts":[{"type":"primary","email":"team@techstart.io"}]},"items":[{"sku":"PROD-C","quantity":100,"unit_price":49.99,"discounts":[],"final_price":4999.00}],"totals":{"subtotal":4999.00,"tax":399.92,"shipping":0.00,"grand_total":5398.92},"fulfillment":{"warehouse":"WH-002","status":"processing","tracking":null,"estimated_delivery":"2024-01-20"}}],"summary":{"total_orders":2,"total_revenue":13685.85,"avg_order_value":6842.93}}}]},"response":{"status":200,"duration_ms":567,"cached":true}}
{"request_id":"req_003","timestamp":"2024-01-15T10:00:10Z","api":{"endpoint":"/analytics/dashboard","queries":[{"operation":"getMetrics","time_range":{"start":"2024-01-15T09:00:00Z","end":"2024-01-15T10:00:00Z","granularity":"5m"},"results":{"timeseries":[{"timestamp":"2024-01-15T09:00:00Z","metrics":{"requests":1523,"errors":12,"latency":{"p50":45,"p95":234,"p99":567},"status_codes":{"2xx":1489,"4xx":22,"5xx":12}}},{"timestamp":"2024-01-15T09:05:00Z","metrics":{"requests":1687,"errors":8,"latency":{"p50":42,"p95":198,"p99":445},"status_codes":{"2xx":1665,"4xx":14,"5xx":8}}},{"timestamp":"2024-01-15T09:10:00Z","metrics":{"requests":1834,"errors":15,"latency":{"p50":48,"p95":267,"p99":623},"status_codes":{"2xx":1801,"4xx":18,"5xx":15}}}],"aggregates":{"total_requests":5044,"total_errors":35,"error_rate":0.69,"avg_latency":45,"peak_requests_per_min":367},"top_endpoints":[{"path":"/api/users","count":1234,"avg_latency":34},{"path":"/api/posts","count":987,"avg_latency":56},{"path":"/api/comments","count":654,"avg_latency":23}]}}]},"response":{"status":200,"duration_ms":1234,"cached":false}}
JWT Parsing Without Verification¶
The Problem¶
You need to inspect JWT claims for debugging but don't want to set up signature verification.
The Solution: parse_jwt()¶
Extract header and claims without cryptographic validation:
kelora -j examples/auth-logs.jsonl \
--filter 'e.has("token")' \
--exec 'let jwt = e.token.parse_jwt();
e.user = jwt.claims.sub;
e.role = jwt.claims.role;
e.expires = jwt.claims.exp;
e.token = ()' \
-k timestamp,user,role,expires
timestamp='2024-01-15T10:00:00Z' user='user123' role='admin' expires=1732153600
timestamp='2024-01-15T10:05:00Z' user='user456' role='user' expires=1732157200
timestamp='2024-01-15T10:10:00Z' user='user789' role='guest' expires=1700000000
timestamp='2024-01-15T10:15:00Z' user='user111' role='moderator' expires=1732160800
{"timestamp":"2024-01-15T10:00:00Z","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyMTIzIiwibmFtZSI6IkFsaWNlIiwicm9sZSI6ImFkbWluIiwiZXhwIjoxNzMyMTUzNjAwfQ.sig1","status":200}
{"timestamp":"2024-01-15T10:05:00Z","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyNDU2IiwibmFtZSI6IkJvYiIsInJvbGUiOiJ1c2VyIiwiZXhwIjoxNzMyMTU3MjAwfQ.sig2","status":200}
{"timestamp":"2024-01-15T10:10:00Z","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyNzg5IiwibmFtZSI6IkNoYXJsaWUiLCJyb2xlIjoiZ3Vlc3QiLCJleHAiOjE3MDAwMDAwMDB9.sig3","status":401}
{"timestamp":"2024-01-15T10:15:00Z","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyMTExIiwibmFtZSI6IkRpYW5hIiwicm9sZSI6Im1vZGVyYXRvciIsImV4cCI6MTczMjE2MDgwMH0.sig4","status":200}
Security Warning: This does NOT validate signatures. Use only for debugging or parsing tokens you already trust.
Use Case: Track Token Expiration Issues¶
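A sketch, assuming the log below is saved as examples/expired-tokens.jsonl:
kelora -j examples/expired-tokens.jsonl \
    --filter 'e.has("token")' \
    --exec 'e.exp = e.token.parse_jwt().claims.exp' \
    -k timestamp,user,exp,error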
{"timestamp":"2024-07-17T12:00:00Z","level":"INFO","endpoint":"/health","status":200}
{"timestamp":"2024-07-17T12:00:05Z","level":"ERROR","endpoint":"/api/data","status":500,"error":"database timeout"}
{"timestamp":"2024-07-17T12:00:10Z","level":"INFO","endpoint":"/api/users","status":200}
{"timestamp":"2024-07-17T12:00:12Z","level":"ERROR","endpoint":"/api/admin","status":401,"request_id":"req-abc123","user":"alice","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhbGljZSIsInJvbGUiOiJhZG1pbiIsImV4cCI6MTcwMDAwMDAwMH0.sig1","error":"token expired"}
{"timestamp":"2024-07-17T12:00:15Z","level":"INFO","endpoint":"/api/posts","status":200}
{"timestamp":"2024-07-17T12:00:18Z","level":"ERROR","endpoint":"/api/billing","status":401,"request_id":"req-def456","user":"bob","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJib2IiLCJyb2xlIjoidXNlciIsImV4cCI6MTcwNTAwMDAwMH0.sig2","error":"token expired"}
{"timestamp":"2024-07-17T12:00:20Z","level":"ERROR","endpoint":"/api/export","status":503,"error":"service unavailable"}
{"timestamp":"2024-07-17T12:00:25Z","level":"INFO","endpoint":"/health","status":200}
Advanced String Extraction¶
Kelora provides powerful string manipulation beyond basic regex:
Extract Text Between Delimiters¶
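A minimal sketch - the between() helper name is an assumption, not a confirmed Kelora API:
# Pull the text between "[" and "]" (between() is an assumed helper name)
kelora app.log \
    --exec 'e.bracketed = e.line.between("[", "]")'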
Extract Before/After Markers¶
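A minimal sketch using after() (app.log is a placeholder):
# Keep everything after the first " | " separator
kelora app.log \
    --exec 'e.detail = e.line.after(" | ")'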
Nth occurrence support:
e.text.after(" | ", 1)- after first occurrence (default)e.text.after(" | ", -1)- after last occurrencee.text.after(" | ", 2)- after second occurrence
Extract Multiple Items¶
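A sketch - extract_all() is an assumed regex helper name, not a confirmed Kelora API:
# Collect every quoted token on the line (extract_all() is an assumed name)
kelora app.log \
    --exec 'e.items = e.line.extract_all("\"[^\"]+\"")'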
Fuzzy Matching with Edit Distance¶
Use Case: Find Typos or Similar Errors¶
The edit_distance() function calculates Levenshtein distance to find errors with typos or slight variations:
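A sketch, assuming the lines below are saved as errors.jsonl (calling edit_distance() as a string method is an assumption about its exact signature):
kelora -j errors.jsonl \
    --filter 'e.error.edit_distance("connection timeout") <= 3' \
    -k timestamp,error,service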
{"timestamp":"2024-01-15T10:00:00Z","error":"connection timeout","service":"api"}
{"timestamp":"2024-01-15T10:01:00Z","error":"connection timed out","service":"web"}
{"timestamp":"2024-01-15T10:02:00Z","error":"conecttion timeout","service":"worker"}
{"timestamp":"2024-01-15T10:03:00Z","error":"network timeout","service":"api"}
{"timestamp":"2024-01-15T10:04:00Z","error":"conection timeot","service":"web"}
{"timestamp":"2024-01-15T10:05:00Z","error":"timeout on connection","service":"api"}
Use Case: Detect Configuration Drift¶
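A sketch with illustrative field names:
# Flag config values that drift slightly from a known-good baseline
kelora -j configs.jsonl \
    --exec 'e.drift = e.config_line.edit_distance("max_connections=100 timeout=30s")' \
    --filter 'e.drift > 0 && e.drift <= 5' \
    -k host,config_line,drift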
Hash Algorithms¶
The Problem¶
You need to hash data for checksums, deduplication, or correlation with external systems.
The Solution: Cryptographic and Non-Cryptographic Hashing¶
kelora -j examples/user-data.jsonl \
--exec 'e.sha256 = e.email.hash("sha256");
e.xxh3 = e.email.hash("xxh3");
e.email = ()' \
-k user_id,sha256,xxh3 -F csv
user_id,sha256,xxh3
user001,ff8d9819fc0e12bf0d24892e45987e249a28dce836a85cad60e28eaaa8c6d976,76eb895512bf35ff
user002,686b5e4cf4f963adf8f51468a48028ef8d15bd02fa335f821279a3d1678c9615,71ad17ff8e8c867a
user003,653974f7ada0b4cb371ab7c8b1aaeaf6ba2855f89b2b0a9735b664fec7fdbc89,cedd532a6ab34757
user004,80905964842ce834af09045642241f609661deefa60e5e926235b3306582725e,14d867ef05bedac8
user005,d1d8233690c21cb0eba4915374178b71cafa23599a3d1961beaf1bac2faf0b64,30fe5cfbd1ca7cae
{"user_id":"user001","email":"alice@example.com","action":"login","ip":"192.168.1.10"}
{"user_id":"user002","email":"bob@example.org","action":"purchase","ip":"10.0.5.23"}
{"user_id":"user003","email":"charlie@test.net","action":"view","ip":"172.16.88.5"}
{"user_id":"user004","email":"diana@company.com","action":"logout","ip":"192.168.1.11"}
{"user_id":"user005","email":"eve@sample.io","action":"login","ip":"10.0.5.24"}
Available algorithms:
- sha256 - SHA-256 (default, cryptographic)
- xxh3 - xxHash3 (non-cryptographic, extremely fast)
When to use which:
- Use sha256 for checksums, integrity verification, or when you need cryptographic properties
- Use xxh3 for bucketing, sampling, or deduplication where speed matters and cryptographic security isn't needed
Use Case: Privacy-Preserving Analytics¶
Create consistent anonymous IDs using HMAC-SHA256 with a secret key for domain-separated hashing:
KELORA_SECRET="your-secret-key" kelora -j examples/analytics.jsonl \
--exec 'e.anon_user = pseudonym(e.email, "users");
e.anon_session = pseudonym(e.session_id, "sessions");
e.email = ();
e.session_id = ()' \
-k anon_user,anon_session,page,duration -F csv
anon_user,anon_session,page,duration
63fKdSofkibwUyAVggSVZHgd,Kb08bpqW9g_k0jOCHsnZjTpx,/home,45
KU12CR0zP6NrFyh1qu_mhecX,21fD9S_Mu5xWb43ciUQfQnYq,/products,120
63fKdSofkibwUyAVggSVZHgd,Kb08bpqW9g_k0jOCHsnZjTpx,/cart,30
kC9USgAtR_OvbKPgcs6kHAp1,jvEOhxqnt1nxVTyK0REoUPRU,/home,15
KU12CR0zP6NrFyh1qu_mhecX,21fD9S_Mu5xWb43ciUQfQnYq,/checkout,90
63fKdSofkibwUyAVggSVZHgd,R-lgjpv6mIcOLG0zj66CVbrS,/home,20
{"email":"alice@example.com","session_id":"sess_a1b2c3d4","page":"/home","duration":45}
{"email":"bob@example.org","session_id":"sess_e5f6g7h8","page":"/products","duration":120}
{"email":"alice@example.com","session_id":"sess_a1b2c3d4","page":"/cart","duration":30}
{"email":"charlie@test.net","session_id":"sess_i9j0k1l2","page":"/home","duration":15}
{"email":"bob@example.org","session_id":"sess_e5f6g7h8","page":"/checkout","duration":90}
{"email":"alice@example.com","session_id":"sess_m3n4o5p6","page":"/home","duration":20}
Extract JSON from Unstructured Text¶
The Problem¶
Logs contain JSON snippets embedded in plain text:
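For example (an illustrative line):
2024-01-15 10:00:00 INFO Request completed: {"status": 200, "duration_ms": 45}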
The Solution: extract_json() and extract_jsons()¶
Extract first JSON object:
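A minimal sketch (app.log is a placeholder):
kelora app.log \
    --exec 'e.payload = e.line.extract_json()'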
Extract all JSON objects:
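A sketch - extract_jsons() is assumed to return an array, which emit_each() then fans out as separate events:
kelora app.log \
    --exec 'emit_each(e.line.extract_jsons())'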
Parse Key-Value Pairs from Text¶
The Solution: absorb_kv()¶
Extract key=value pairs from unstructured log lines and convert them to structured fields:
kelora examples/kv_pairs.log \
--exec 'e.absorb_kv("line")' \
-k timestamp,action,user,ip,success -F csv
timestamp,action,user,ip,success
2024-01-15T10:00:00Z,login,alice,192.168.1.10,true
2024-01-15T10:01:00Z,view_page,bob,,
2024-01-15T10:02:00Z,api_call,charlie,,
2024-01-15T10:03:00Z,file_upload,diana,,true
2024-01-15T10:04:00Z,failed_login,eve,203.0.113.5,
2024-01-15T10:05:00Z,password_reset,frank,,
2024-01-15T10:06:00Z,logout,grace,,
2024-01-15T10:07:00Z,api_call,henry,,
2024-01-15T10:08:00Z,privilege_escalation,iris,,false
2024-01-15T10:09:00Z,delete_account,jack,,
user=alice action=login timestamp=2024-01-15T10:00:00Z success=true ip=192.168.1.10
user=bob action=view_page timestamp=2024-01-15T10:01:00Z page=/dashboard duration=1.5
user=charlie action=api_call timestamp=2024-01-15T10:02:00Z endpoint=/api/users method=GET status=200
user=diana action=file_upload timestamp=2024-01-15T10:03:00Z filename=document.pdf size=1048576 success=true
user=eve action=failed_login timestamp=2024-01-15T10:04:00Z attempts=3 locked=true ip=203.0.113.5
user=frank action=password_reset timestamp=2024-01-15T10:05:00Z email=frank@example.com token_sent=true
user=grace action=logout timestamp=2024-01-15T10:06:00Z session_duration=3600 reason=manual
user=henry action=api_call timestamp=2024-01-15T10:07:00Z endpoint=/api/export method=POST bytes=5242880
user=iris action=privilege_escalation timestamp=2024-01-15T10:08:00Z from=user to=admin success=false
user=jack action=delete_account timestamp=2024-01-15T10:09:00Z confirmed=true data_removed=true
Options¶
# Custom separators
kelora logs.log \
--exec 'e.absorb_kv("line", #{sep: ";", kv_sep: ":"})'
# Keep original line
kelora logs.log \
--exec 'e.absorb_kv("line", #{keep_source: true})'
Histogram Bucketing with track_bucket()¶
The Problem¶
You want to see the distribution of response times, not just average/max.
The Solution: Bucket Tracking¶
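A sketch, assuming the log below is the input and that track_bucket() takes a metric name and a value (its exact signature isn't shown on this page):
kelora -j examples/app-logs.jsonl --metrics -q \
    --filter 'e.has("response_time")' \
    --exec 'track_bucket("response_time", e.response_time)'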
{"timestamp":"2025-01-15T10:23:45Z","level":"INFO","service":"api-gateway","message":"Request processed successfully","request_id":"req-a1b2c3d4","user_id":42,"response_time":0.234,"status":200,"client_ip":"192.168.1.100","path":"/api/users","method":"GET","referer":"https://app.example.com","metadata":{"subscription":{"tier":"premium","expires":"2025-12-31"},"region":"us-east-1"}}
{"timestamp":"2025-01-15T10:24:12Z","level":"ERROR","service":"auth-service","message":"Connection timeout while validating user credentials","request_id":"req-e5f6g7h8","user_id":103,"response_time":5.123,"status":500,"client_ip":"10.0.5.23","path":"/api/auth/login","method":"POST","error":"ConnectionError: timeout after 5000ms","stack_trace":"at validateCredentials (auth.js:234)\n at processLogin (handler.js:89)"}
{"timestamp":"2025-01-15T10:24:33Z","level":"INFO","service":"api-gateway","message":"User not found in database","request_id":"req-i9j0k1l2","response_time":0.156,"status":404,"client_ip":"172.16.88.5","path":"/api/users/99999","method":"GET"}
{"timestamp":"2025-01-15T10:25:01Z","level":"WARN","service":"payment-service","message":"Payment processing timeout - retrying","request_id":"req-m3n4o5p6","user_id":42,"response_time":2.567,"status":200,"client_ip":"192.168.1.100","path":"/api/payments","method":"POST","referer":"https://checkout.example.com"}
{"timestamp":"2025-01-15T10:25:18Z","level":"ERROR","service":"database","message":"Database connection pool exhausted","request_id":"req-q7r8s9t0","response_time":0.001,"error":"PoolExhausted: no available connections","severity":"critical"}
{"timestamp":"2025-01-15T10:25:45Z","level":"ERROR","service":"api-gateway","message":"Invalid JWT token provided","request_id":"req-u1v2w3x4","status":401,"client_ip":"198.51.100.77","path":"/api/admin","method":"GET","token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyMTIzIiwicm9sZSI6ImFkbWluIiwiZXhwIjoxNzA1MzE3NjAwfQ.dGVzdC1zaWduYXR1cmU"}
{"timestamp":"2025-01-15T10:26:02Z","level":"INFO","service":"cache-service","message":"Cache miss for key user:42:profile","request_id":"req-y5z6a7b8","response_time":0.089}
{"timestamp":"2025-01-15T10:26:23Z","level":"DEBUG","service":"api-gateway","message":"Health check passed","request_id":"req-c9d0e1f2","response_time":0.003,"status":200,"path":"/health"}
{"timestamp":"2025-01-15T10:26:44Z","level":"ERROR","service":"auth-service","message":"Unauthorized access attempt detected","request_id":"req-g3h4i5j6","user_id":999,"status":403,"client_ip":"172.16.88.6","path":"/api/admin/users","method":"DELETE","source_ip":"172.16.88.6"}
{"timestamp":"2025-01-15T10:27:05Z","level":"INFO","service":"api-gateway","message":"Request processed successfully","request_id":"req-k7l8m9n0","user_id":42,"response_time":0.456,"status":200,"client_ip":"192.168.1.100","path":"/api/profile","method":"GET","json_payload":"{\"settings\":{\"theme\":\"dark\",\"notifications\":true}}"}
{"timestamp":"2025-01-15T10:27:26Z","level":"ERROR","service":"storage","message":"File upload failed - size limit exceeded","request_id":"req-o1p2q3r4","user_id":156,"status":413,"client_ip":"198.51.100.88","path":"/api/upload","method":"POST","error":"FileSizeError: maximum size 10MB exceeded"}
{"timestamp":"2025-01-15T10:27:47Z","level":"INFO","service":"api-gateway","message":"Request processed successfully","request_id":"req-s5t6u7v8","response_time":0.234,"status":200,"client_ip":"203.0.113.50","path":"/api/search","method":"GET"}
{"timestamp":"2025-01-15T10:28:08Z","level":"WARN","service":"rate-limiter","message":"Rate limit approaching for user","request_id":"req-w9x0y1z2","user_id":42,"remaining_requests":5,"reset_time":"2025-01-15T11:00:00Z"}
{"timestamp":"2025-01-15T10:28:29Z","level":"INFO","service":"api-gateway","message":"Static content served from CDN","request_id":"req-a3b4c5d6","response_time":0.012,"status":304,"client_ip":"192.168.1.102","path":"/static/app.js"}
{"timestamp":"2025-01-15T10:28:50Z","level":"ERROR","service":"api-gateway","message":"Endpoint not found","request_id":"req-e7f8g9h0","status":404,"client_ip":"172.16.88.7","path":"/wp-admin","method":"GET"}
{"timestamp":"2025-01-15T10:29:11Z","level":"INFO","service":"analytics","message":"Report generated successfully","request_id":"req-i1j2k3l4","user_id":234,"response_time":1.789,"status":200,"client_ip":"198.51.100.99","path":"/api/analytics","method":"GET","metadata":{"report_type":"daily","date":"2025-01-15"}}
{"timestamp":"2025-01-15T10:29:32Z","level":"INFO","service":"auth-service","message":"User logged out successfully","request_id":"req-m5n6o7p8","user_id":42,"response_time":0.023,"status":200,"client_ip":"192.168.1.100","path":"/api/logout","method":"POST"}
{"timestamp":"2025-01-15T10:29:53Z","level":"INFO","service":"search-service","message":"Search query executed","request_id":"req-q9r0s1t2","user_id":178,"response_time":0.567,"status":200,"client_ip":"10.0.5.24","path":"/api/search","method":"GET"}
{"timestamp":"2025-01-15T10:30:14Z","level":"INFO","service":"api-gateway","message":"Request processed successfully","request_id":"req-u3v4w5x6","response_time":0.089,"status":200,"client_ip":"203.0.113.60","path":"/sitemap.xml"}
{"timestamp":"2025-01-15T10:30:35Z","level":"INFO","service":"order-service","message":"Order query executed","request_id":"req-y7z8a9b0","user_id":789,"response_time":1.123,"status":200,"client_ip":"192.168.1.103","path":"/api/orders","method":"GET","action":"query_orders"}
{"timestamp":"2025-01-15T10:30:56Z","level":"ERROR","service":"payment-service","message":"Payment declined by provider","request_id":"req-c1d2e3f4","user_id":456,"status":402,"client_ip":"192.168.1.104","error":"PaymentDeclined: insufficient funds","severity":"high"}
{"timestamp":"2025-01-15T10:31:17Z","level":"INFO","service":"notification-service","message":"Email notification sent","request_id":"req-g5h6i7j8","user_id":42,"from":"noreply@example.com","email":"alice@example.com"}
{"timestamp":"2025-01-15T10:31:38Z","level":"ERROR","service":"api-gateway","message":"Service unavailable","request_id":"req-k9l0m1n2","status":503,"client_ip":"10.0.5.25","path":"/api/heavy-operation","error":"ServiceUnavailable: upstream timeout"}
{"timestamp":"2025-01-15T10:31:59Z","level":"INFO","service":"api-gateway","message":"Request processed successfully","request_id":"req-o3p4q5r6","user_id":42,"response_time":0.167,"status":200,"client_ip":"192.168.1.100","path":"/api/settings","session_id":"sess-abc123"}
{"timestamp":"2025-01-15T10:32:20Z","level":"WARN","service":"auth-service","message":"Multiple failed login attempts detected","request_id":"req-s7t8u9v0","client_ip":"198.51.100.120","attempts":5,"locked":false}
Use Case: HTTP Status Code Distribution¶
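A sketch, assuming the access log below is parsed so the status code is available as e.status (the field name is an assumption):
kelora examples/access.log \
    --exec 'track_bucket("status", e.status)' \
    --metrics -q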
192.168.1.100 - alice [15/Jan/2025:10:23:45 +0000] "GET /api/users?utm_source=email&user_id=42 HTTP/1.1" 200 1523 "https://marketing.example.com/campaign" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
10.0.5.23 - bob [15/Jan/2025:10:24:12 +0000] "POST /api/orders HTTP/1.1" 201 892 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
172.16.88.5 - - [15/Jan/2025:10:24:33 +0000] "GET /search?q=widgets&page=2 HTTP/1.1" 200 5421 "https://www.google.com" "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)"
192.168.1.100 - alice [15/Jan/2025:10:25:01 +0000] "GET /products/42?utm_source=google&utm_campaign=spring HTTP/1.1" 200 2341 "https://www.google.com/search?q=gadgets" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
203.0.113.45 - - [15/Jan/2025:10:25:18 +0000] "GET /robots.txt HTTP/1.1" 200 158 "-" "GoogleBot/2.1 (+http://www.google.com/bot.html)"
Format Conversion in Pipelines¶
Convert Between Formats On-The-Fly¶
JSON to logfmt:
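A sketch (-F logfmt is assumed to be a supported output format, by analogy with the -F csv and -F json flags used elsewhere on this page):
kelora -j examples/app.jsonl -F logfmt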
level=INFO message="Application started" service=api timestamp=2024-01-15T10:00:00Z version=1.2.3
config_file=/etc/app/config.yml level=DEBUG message="Loading configuration" service=api timestamp=2024-01-15T10:00:05Z
level=INFO max_connections=50 message="Connection pool initialized" service=database timestamp=2024-01-15T10:00:10Z
{"timestamp":"2024-01-15T10:00:00Z","level":"INFO","service":"api","message":"Application started","version":"1.2.3"}
{"timestamp":"2024-01-15T10:00:05Z","level":"DEBUG","service":"api","message":"Loading configuration","config_file":"/etc/app/config.yml"}
{"timestamp":"2024-01-15T10:00:10Z","level":"INFO","service":"database","message":"Connection pool initialized","max_connections":50}
Logfmt to JSON:
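A sketch using absorb_kv() to structure each line before emitting JSON:
kelora app.log \
    --exec 'e.absorb_kv("line")' \
    -F json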
Use Case: Normalize Multi-Format Logs¶
Handle logs with mixed JSON and logfmt lines:
kelora examples/nightmare_mixed_formats.log \
--exec 'if e.line.contains("{") {
let json_str = e.line.extract_json();
e.data = json_str
} else if e.line.contains("=") {
e.data = e.line.parse_kv()
}' \
--filter 'e.has("data")' \
-F json | head -5
{"data":{"connections":50,"format":"json","level":"DEBUG","message":"Connection pool initialized","timestamp":"2024-01-15T10:00:01Z"},"line":"{\"timestamp\":\"2024-01-15T10:00:01Z\",\"level\":\"DEBUG\",\"format\":\"json\",\"message\":\"Connection pool initialized\",\"connections\":50}"}
{"data":{"format":"logfmt","level":"info","msg":"\"Cache","size":"1024","timestamp":"2024-01-15T10:00:02Z"},"line":"timestamp=2024-01-15T10:00:02Z level=info format=logfmt msg=\"Cache layer ready\" size=1024"}
{"data":{"level":"WARN","nested":{"data":{"deeply":{"buried":{"value":"hard to extract with jq"}}}},"timestamp":"2024-01-15T10:00:05Z"},"line":"{\"timestamp\":\"2024-01-15T10:00:05Z\",\"level\":\"WARN\",\"nested\":{\"data\":{\"deeply\":{\"buried\":{\"value\":\"hard to extract with jq\"}}}}}"}
{"data":{"err":"\"connection","level":"error","max_retries":"5","retry":"3","timestamp":"2024-01-15T10:00:07Z"},"line":"timestamp=2024-01-15T10:00:07Z level=error err=\"connection timeout\" retry=3 max_retries=5"}
{"data":{"action":"batch_process","timestamp":"2024-01-15T10:00:09Z","users":[{"id":1,"name":"alice"},{"id":2,"name":"bob"}]},"line":"{\"timestamp\":\"2024-01-15T10:00:09Z\",\"users\":[{\"id\":1,\"name\":\"alice\"},{\"id\":2,\"name\":\"bob\"}],\"action\":\"batch_process\"}"}
2024-01-15 10:00:00 [INFO] Server starting
{"timestamp":"2024-01-15T10:00:01Z","level":"DEBUG","format":"json","message":"Connection pool initialized","connections":50}
timestamp=2024-01-15T10:00:02Z level=info format=logfmt msg="Cache layer ready" size=1024
<34>Jan 15 10:00:03 appserver syslog: Authentication module loaded
web_1 | 2024-01-15 10:00:04 [INFO] HTTP server listening on port 8080
Stateful Processing with state¶
When to Use state¶
The state global map enables complex stateful processing that track_*() functions cannot handle:
- Deduplication: Track which IDs have already been seen
- Cross-event dependencies: Make decisions based on previous events
- Complex objects: Store nested maps, arrays, or other structured data
- Conditional logic: Remember arbitrary state across events
- State machines: Track connection states, session lifecycles
- Event correlation: Match request/response pairs, build sessions
Quick Decision Guide:
| Feature | state | track_*() |
|---|---|---|
| Purpose | Complex stateful logic | Simple metrics & aggregations |
| Read access | ✅ Yes (during processing) | ❌ No (write-only, read in --end) |
| Parallel mode | ❌ Sequential only | ✅ Works in parallel |
| Storage | Any Rhai value | Any value (strings, numbers, etc.) |
| Performance | Slower (RwLock) | Faster (atomic/optimized) |
| Use for | Deduplication, FSMs, correlation | Counting, unique tracking, bucketing |
Important: For simple counting and metrics, prefer track_count(), track_sum(), etc.—they work in both sequential and parallel modes. state only works in sequential mode.
The Problem: Deduplication¶
You have logs with duplicate entries for the same request ID, but you only want to process each unique request once:
{"request_id": "req-001", "status": "start"}
{"request_id": "req-002", "status": "start"}
{"request_id": "req-001", "status": "duplicate"} ← Skip this
{"request_id": "req-003", "status": "start"}
The Solution: Track Seen IDs with state¶
kelora -j logs.jsonl \
--exec 'if !state.contains(e.request_id) {
state[e.request_id] = true;
e.is_first = true;
} else {
e.is_first = false;
}' \
--filter 'e.is_first == true' \
-k request_id,status
Only first occurrences pass through; duplicates are filtered out.
Use Case: Track Complex Per-User State¶
Store nested maps to track multiple attributes per user:
kelora -j examples/user-events.jsonl \
--exec 'if !state.contains(e.user) {
state[e.user] = #{login_count: 0, last_seen: (), errors: []};
}
let user_state = state[e.user];
user_state.login_count += 1;
user_state.last_seen = e.timestamp;
if e.has("error") {
user_state.errors.push(e.error);
}
state[e.user] = user_state;
e.user_login_count = user_state.login_count' \
-k timestamp,user,user_login_count
timestamp='2024-01-15T10:00:00Z' user='alice' user_login_count=1
timestamp='2024-01-15T10:01:00Z' user='bob' user_login_count=1
timestamp='2024-01-15T10:02:00Z' user='alice' user_login_count=2
timestamp='2024-01-15T10:03:00Z' user='alice' user_login_count=3
timestamp='2024-01-15T10:04:00Z' user='bob' user_login_count=2
timestamp='2024-01-15T10:05:00Z' user='charlie' user_login_count=1
timestamp='2024-01-15T10:06:00Z' user='alice' user_login_count=4
timestamp='2024-01-15T10:07:00Z' user='bob' user_login_count=3
timestamp='2024-01-15T10:08:00Z' user='charlie' user_login_count=2
timestamp='2024-01-15T10:09:00Z' user='alice' user_login_count=5
{"timestamp":"2024-01-15T10:00:00Z","user":"alice","event":"login"}
{"timestamp":"2024-01-15T10:01:00Z","user":"bob","event":"login"}
{"timestamp":"2024-01-15T10:02:00Z","user":"alice","event":"view_page"}
{"timestamp":"2024-01-15T10:03:00Z","user":"alice","event":"error","error":"timeout"}
{"timestamp":"2024-01-15T10:04:00Z","user":"bob","event":"purchase"}
{"timestamp":"2024-01-15T10:05:00Z","user":"charlie","event":"login"}
{"timestamp":"2024-01-15T10:06:00Z","user":"alice","event":"login"}
{"timestamp":"2024-01-15T10:07:00Z","user":"bob","event":"error","error":"payment_failed"}
{"timestamp":"2024-01-15T10:08:00Z","user":"charlie","event":"view_page"}
{"timestamp":"2024-01-15T10:09:00Z","user":"alice","event":"logout"}
Use Case: Sequential Event Numbering¶
Assign a global sequence number across all events:
kelora -j logs.jsonl \
--begin 'state["count"] = 0' \
--exec 'state["count"] += 1; e.seq = state["count"]' \
-k seq,timestamp,message -F csv
Note: For simple counting by category, use track_count(e.category) instead.
Converting State to Regular Map¶
state is a special StateMap type with limited operations. To use map functions like .to_logfmt() or .to_kv(), convert it first:
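A sketch - the conversion helper is assumed here to be to_map(); the exact name isn't shown on this page:
kelora -j examples/app.jsonl \
    --exec 'state[e.service] = e.level' \
    --end 'print(state.to_map().to_logfmt())'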
{"timestamp":"2024-01-15T10:00:00Z","level":"INFO","service":"api","message":"Application started","version":"1.2.3"}
{"timestamp":"2024-01-15T10:00:05Z","level":"DEBUG","service":"api","message":"Loading configuration","config_file":"/etc/app/config.yml"}
{"timestamp":"2024-01-15T10:00:10Z","level":"INFO","service":"database","message":"Connection pool initialized","max_connections":50}
{"timestamp":"2024-01-15T10:01:00Z","level":"WARN","service":"api","message":"High memory usage detected","memory_percent":85}
{"timestamp":"2024-01-15T10:01:30Z","level":"ERROR","service":"database","message":"Query timeout","query":"SELECT * FROM users","duration_ms":5000}
Use Case: Event Correlation (Request/Response Pairs)¶
Match request and response events, calculating latency and emitting complete transactions:
kelora -j api-events.jsonl \
--exec 'if e.event_type == "request" {
state[e.request_id] = #{sent_at: e.timestamp, method: e.method};
e = (); // Don't emit until we see the response
} else if e.event_type == "response" && state.contains(e.request_id) {
let req = state[e.request_id];
e.duration_ms = (e.timestamp - req.sent_at).as_millis();
e.method = req.method;
state.remove(e.request_id); // Clean up
}' \
-k request_id,method,duration_ms,status
Use Case: State Machines for Protocol Analysis¶
Track connection states through their lifecycle:
kelora -j network-events.jsonl \
--exec 'if !state.contains(e.conn_id) {
state[e.conn_id] = "NEW";
}
let current_state = state[e.conn_id];
// State transitions
if current_state == "NEW" && e.event == "SYN" {
state[e.conn_id] = "SYN_SENT";
} else if current_state == "SYN_SENT" && e.event == "SYN_ACK" {
state[e.conn_id] = "ESTABLISHED";
} else if current_state == "ESTABLISHED" && e.event == "FIN" {
state[e.conn_id] = "CLOSING";
} else if e.event != "DATA" {
e.protocol_error = true; // Invalid transition
}
e.connection_state = state[e.conn_id]' \
--filter 'e.has("protocol_error")' \
-k timestamp,conn_id,event,connection_state
Use Case: Session Reconstruction¶
Accumulate events into complete sessions, emitting only when session ends:
kelora -j user-events.jsonl \
--exec 'if e.event == "login" {
state[e.session_id] = #{
user: e.user,
events: [],
start: e.timestamp
};
}
if state.contains(e.session_id) {
let session = state[e.session_id];
session.events.push(#{event: e.event, ts: e.timestamp});
state[e.session_id] = session; // copy back, as in the per-user state example above
}
if e.event == "logout" && state.contains(e.session_id) {
let session = state[e.session_id];
session.end = e.timestamp;
session.event_count = session.events.len();
print(session.to_json());
state.remove(e.session_id);
}
e = ()' -q # Suppress individual events, only emit complete sessions
Use Case: Rate Limiting - Sample First N per Key¶
Only emit the first 100 events per API key, then suppress the rest:
kelora -j api-logs.jsonl \
--exec 'if !state.contains(e.api_key) {
state[e.api_key] = 0;
}
state[e.api_key] += 1;
if state[e.api_key] > 100 {
e = (); // Drop after the first 100 per key
}' \
-k timestamp,api_key,endpoint
Performance and Memory Management¶
For large state maps (millions of keys), consider periodic cleanup:
kelora -j huge-logs.jsonl \
--exec 'if !state.contains("counter") { state["counter"] = 0; }
state["counter"] += 1;
// Periodic cleanup every 100k events
if state["counter"] % 100000 == 0 {
eprint("State size: " + state.len() + " keys");
if state.len() > 500000 {
state.clear(); // Reset if too large
eprint("State cleared");
}
}
// Your stateful logic here
if !state.contains(e.request_id) {
state[e.request_id] = true;
} else {
e = ();
}'
Parallel Mode Restriction¶
state requires sequential processing to maintain consistency. Using it with --parallel causes a runtime error:
# This will fail:
kelora -j logs.jsonl --parallel \
--exec 'state["count"] += 1'
# Error: 'state' is not available in --parallel mode
For parallel-safe tracking, use track_*() functions instead.
Combining Techniques¶
The real power comes from combining these features. Here's a complex real-world example:
# Process deeply nested API logs with privacy controls
kelora -j api-responses.jsonl \
--filter 'e.api_version == "v2"' \
--exec 'emit_each(e.get_path("data.orders", []))' \
--exec 'emit_each(e.items)' \
--exec 'e.error_pattern = e.get("error_msg", "").normalized();
e.user_hash = e.user_id.hash("xxh3");
e.sample_group = e.order_id.bucket() % 10;
e.user_id = ()' \
--filter 'e.sample_group < 3' \
--metrics \
--exec 'track_count(e.error_pattern);
track_sum("revenue", e.price * e.quantity)' \
-k order_id,sku,quantity,price,error_pattern -F csv \
> processed_orders.csv
This pipeline:
- Filters to API v2 only
- Fans out nested orders → items (multi-level)
- Normalizes error patterns
- Hashes user IDs for privacy
- Creates deterministic 30% sample
- Tracks error patterns and revenue
- Exports flat CSV
All in a single command without temporary files or custom scripts.
Performance Tips¶
- Use bucket() for sampling before heavy processing - a 10% sample cuts the work by 90%
- Apply filters early - before fan-out or expensive transformations
- Chain operations in one --exec when sharing variables (semicolon-separated)
- Use the xxh3 hash for non-cryptographic use cases (much faster than sha256)
- Limit window size (--window N) to the minimum needed for sliding calculations
Troubleshooting¶
"Function not found" errors:
- Check spelling and capitalization (Rhai is case-sensitive)
- Verify the function exists in kelora --help-functions
() (unit) value errors:
- Guard optional fields: if e.has("field") { ... }
- Use safe conversions: to_int_or(e.field, 0)
Pattern normalization doesn't work:
- Check that patterns exist in the input: echo "test 192.168.1.1" | kelora --exec '...'
- Verify pattern names: normalized(["ipv4", "email"]), not ["ip", "emails"]
Hash consistency issues:
- Same input + same algorithm = same hash (deterministic)
- Different Kelora versions may use different hash implementations
- Use the KELORA_SECRET env var for pseudonym() to ensure domain separation
See Also¶
- Advanced Scripting Tutorial - Multi-stage transformations
- Metrics and Tracking Tutorial - Aggregation patterns
- Function Reference - Complete function catalog
- Flatten Nested JSON - Deep dive on emit_each()
- Extract and Mask Sensitive Data - Privacy techniques