Kelora
One command for messy logs. Parse, filter, transform, and summarize logs across JSON, logfmt, syslog, CSV, plain text, and your own custom formats — with embedded Rhai scripting when simple filters aren't enough.
Watch Hack the Clown's 5-minute introduction video to see Kelora in action.
See What's New in 2.0 for the highlights and a migration guide.
A quick tour
You don't even know what's in the file yet. Start there — no flags, no regex. Kelora decompresses the gzip, recognizes the Apache combined format, and profiles every field with real sample values:
Field       Type    Seen  Miss  Uniq   Examples
ip          string  1200  0%    ~1200  "178.28.169.124", "122.133.215.248", "167.218.48.211", "32.151.123.30", "93.230.194.210", "94.85.124.181", "240.138.244.202", "151.92.11.251", ...
ts          string  1200  0%    ~1168  "04/Oct/2025:10:39:55 +0200", "04/Oct/2025:11:29:49 +0200", "04/Oct/2025:11:29:40 +0200", "04/Oct/2025:10:56:32 +0200", "04/Oct/2025:10:22:35 +0200", "04/Oct/...
request     string  1200  0%    ~1200  "PATCH /distributed/cross-platform/extend HTTP/1.1", "DELETE /bandwidth/rich HTTP/2.0", "PUT /whiteboard/synergize/revolutionize/experiences HTTP/1.0", "PATCH...
method      string  1200  0%    6      "GET", "PUT", "POST", "PATCH", "HEAD", "DELETE"
path        string  1200  0%    ~1050  "/visionary/visionary/infrastructures/sticky", "/markets", "/action-items/benchmark/facilitate", "/expedite/integrated/niches/efficient", "/visionary/enable/c...
protocol    string  1200  0%    3      "HTTP/1.0", "HTTP/1.1", "HTTP/2.0"
status      int     1200  0%    21     302, 500, 301, 304, 203, 504, 100, 405, ...
bytes       int     1200  0%    ~1175  40607, 61325, 62596, 72150, 93509, 89810, 32347, 47359, ...
referer     string  1200  0%    ~1200  "https://www.directworld-class.io/cross-platform/web-enabled/integrate", "http://www.regionaldot-com.info/cross-platform/iterate/intuitive/bandwidth", "http:/...
user_agent  string  1200  0%    ~1200  "Mozilla/5.0 (Windows; U; Windows NT 6.1) AppleWebKit/533.39.1 (KHTML, like Gecko) Version/6.0 Safari/533.39.1", "Mozilla/5.0 (Windows; U; Windows NT 4.0) App...
user        string  580   52%   ~564   "denesik8024", "terry2142", "graham1715", "damore6011", "steuber8635", "king8088", "vandervort1212", "gutkowski5805", ...
1200 events scanned | format: combined (auto-detected) | timestamp: ts -> meta.parsed_ts
Mixed formats in one file are the normal case, not the exception. Give Kelora a cascade of parsers (-f json,line) and it tries each one per line, tagging every event with the winner in _format — so you keep the structured lines, drop the noise, and emit clean CSV in a single pass:
kelora -f json,line examples/mixed_format.log \
--filter 'e._format == "json"' -k timestamp,level,msg -F csv
timestamp,level,msg
2024-01-15T10:00:02Z,INFO,Order 4412 captured for user alice
2024-01-15T10:00:03Z,WARN,Retrying upstream auth.svc after HTTP 503
2024-01-15T10:00:05Z,ERROR,Upstream auth.svc timeout after 5000ms
2024-01-15T10:00:08Z,INFO,Order 4413 captured for user bob
2024-01-15T10:00:09Z,WARN,Connection pool at 85% capacity
Input (examples/mixed_format.log):
Server starting up, plaintext logger active before JSON init
Loading config from /etc/checkout/config.yml
{"timestamp":"2024-01-15T10:00:02Z","level":"INFO","msg":"Order 4412 captured for user alice"}
{"timestamp":"2024-01-15T10:00:03Z","level":"WARN","msg":"Retrying upstream auth.svc after HTTP 503"}
[legacy] 2024-01-15 10:00:04 ERROR gateway connection reset by peer
{"timestamp":"2024-01-15T10:00:05Z","level":"ERROR","msg":"Upstream auth.svc timeout after 5000ms"}
Traceback (most recent call last):
File "worker.py", line 42, in process_order
{"timestamp":"2024-01-15T10:00:08Z","level":"INFO","msg":"Order 4413 captured for user bob"}
{"timestamp":"2024-01-15T10:00:09Z","level":"WARN","msg":"Connection pool at 85% capacity"}
And when those logs are a wall of near-duplicate errors that differ only by hostname, UUID, or timestamp, cut straight to what's actually breaking. Point --drain at a field (-k msg, the syslog message here) and it groups near-identical lines by inferring where the values varied — <fqdn>, <uuid>, <path>, <duration> — so 742 noisy lines collapse into the handful of patterns causing the noise:
<27>Mar 14 08:00:01 worker01 checkout[5331]: Connection timeout to database host db-replica-1.prod.internal after 10404ms
<27>Mar 14 08:00:03 api01 worker[5571]: Payment gateway adyen.gateway.internal rejected transaction c39f2e8b-2392-4450-939c-a51d53906989: insufficient_funds
<27>Mar 14 08:00:05 api01 checkout[5735]: Connection timeout to database host db-replica-1.prod.internal after 22354ms
<27>Mar 14 08:00:06 edge01 checkout[8861]: Upstream search.svc.internal returned 504 for request 267f0b10-b1df-40dd-ba9e-8bcdf85452c6
<27>Mar 14 08:00:09 api02 checkout[2996]: Connection timeout to database host db-replica-2.prod.internal after 26137ms
<27>Mar 14 08:00:11 worker02 checkout[3399]: Connection timeout to database host db-shard-3.prod.internal after 7504ms
<27>Mar 14 08:00:13 web01 checkout[2294]: Connection timeout to database host db-shard-3.prod.internal after 12361ms
<27>Mar 14 08:00:15 edge01 checkout[8849]: Connection timeout to database host db-replica-1.prod.internal after 15399ms
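The grouping idea can be approximated in a few lines of Python. This sketch is a deliberately simplified stand-in: it masks variable tokens with fixed regular expressions and counts the resulting templates, whereas `--drain` is described above as inferring where values vary. The regexes, the `to_template` helper, and the choice of placeholders are assumptions made for illustration; the messages are the syslog message parts of the first four lines above.

```python
import re
from collections import Counter

# Hypothetical masks, applied in order (UUIDs first, then hostnames, then durations).
MASKS = [
    ("<uuid>", re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b")),
    ("<fqdn>", re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b")),
    ("<duration>", re.compile(r"\b\d+ms\b")),
]


def to_template(message):
    """Replace variable tokens with placeholders so similar lines collapse."""
    for placeholder, pattern in MASKS:
        message = pattern.sub(placeholder, message)
    return message


messages = [
    "Connection timeout to database host db-replica-1.prod.internal after 10404ms",
    "Payment gateway adyen.gateway.internal rejected transaction c39f2e8b-2392-4450-939c-a51d53906989: insufficient_funds",
    "Connection timeout to database host db-replica-1.prod.internal after 22354ms",
    "Upstream search.svc.internal returned 504 for request 267f0b10-b1df-40dd-ba9e-8bcdf85452c6",
]

patterns = Counter(to_template(m) for m in messages)
for template, count in patterns.most_common():
    print(count, template)
```

The two timeout lines differ only in their duration, so they collapse into a single template with a count of 2.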
When Kelora helps
Reach for Kelora when you'd otherwise be writing a throwaway Python script. It's the middle ground between "grep is enough" and "I need a real observability platform."
- Chained pipelines collapse into one command. grep | awk | jq | script.py becomes kelora, with state preserved across the pipeline instead of lost between pipes.
- Messy formats parse cleanly. Mixed JSON and plaintext in the same file, key=value pairs inside message strings, nested JSON fanned out to flat rows, all without regex gymnastics.
- Embedded scripting when you need it. Simple filters are one-liners. When logic gets stateful (session reconstruction, per-service error rates, request/response correlation), there's a full scripting layer.
- Plays well with your existing tools. Pipe ripgrep or jq upstream to pre-filter; pipe Kelora's JSON or CSV output into whatever comes next.
Kelora trades raw speed for programmability. Simple filters and format conversions handle multi-GB files comfortably; heavy Rhai scripting tops out in the low hundreds of thousands of lines before you'll want to pre-filter. Kelora plays well with ripgrep, jq, qsv, and other Unix tools.
More examples
Filter & Convert (The Basics)
Scenario: Filter a logfmt file for slow requests and output clean JSON.
ts=2024-07-17T11:59:40Z level=INFO method=GET path=/docs status=200 latency_ms=280
ts=2024-07-17T11:59:55Z level=INFO method=POST path=/checkout status=200 latency_ms=950
ts=2024-07-17T12:00:01Z level=INFO method=GET path=/docs status=200 latency_ms=320
ts=2024-07-17T12:00:04Z level=INFO method=POST path=/checkout status=200 latency_ms=1450
ts=2024-07-17T12:00:05Z level=ERROR method=POST path=/checkout status=502 latency_ms=2100
ts=2024-07-17T12:00:07Z level=WARN method=GET path=/billing status=200 latency_ms=900
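For readers who want to see the logic spelled out, the scenario can be sketched in plain Python. This is not how Kelora does it, only an equivalent of the steps "parse logfmt, keep slow requests, emit JSON". The 1000 ms threshold (`SLOW_MS`) and the `parse_logfmt` helper are assumptions; the source does not state what counts as slow. The parser handles only unquoted key=value pairs, which is all the sample above contains.

```python
import json

SLOW_MS = 1000  # hypothetical threshold; "slow" is not defined in the text above


def parse_logfmt(line):
    """Parse unquoted key=value pairs and convert the numeric fields to int."""
    event = dict(pair.split("=", 1) for pair in line.split())
    for key in ("status", "latency_ms"):
        if key in event:
            event[key] = int(event[key])
    return event


lines = [
    "ts=2024-07-17T11:59:40Z level=INFO method=GET path=/docs status=200 latency_ms=280",
    "ts=2024-07-17T11:59:55Z level=INFO method=POST path=/checkout status=200 latency_ms=950",
    "ts=2024-07-17T12:00:01Z level=INFO method=GET path=/docs status=200 latency_ms=320",
    "ts=2024-07-17T12:00:04Z level=INFO method=POST path=/checkout status=200 latency_ms=1450",
    "ts=2024-07-17T12:00:05Z level=ERROR method=POST path=/checkout status=502 latency_ms=2100",
    "ts=2024-07-17T12:00:07Z level=WARN method=GET path=/billing status=200 latency_ms=900",
]

# Keep only requests slower than the threshold and print one JSON object per line.
slow = [e for e in map(parse_logfmt, lines) if e["latency_ms"] > SLOW_MS]
for e in slow:
    print(json.dumps(e))
```

With this threshold, only the two /checkout requests at 1450 ms and 2100 ms are emitted.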
Modify & Anonymize (Scripting)
Scenario: Mask user emails for privacy and convert milliseconds to seconds before printing.
kelora examples/audit.jsonl \
--exec 'e.email = "***"; e.duration_sec = e.ms / 1000.0;' \
--keys timestamp,user_id,email,duration_sec
timestamp='2024-01-15T10:00:00Z' user_id='usr_123' email='***' duration_sec=1.25
timestamp='2024-01-15T10:05:00Z' user_id='usr_456' email='***' duration_sec=0.34
timestamp='2024-01-15T10:10:00Z' user_id='usr_789' email='***' duration_sec=2.1
timestamp='2024-01-15T10:15:00Z' user_id='usr_234' email='***' duration_sec=0.89
Input (examples/audit.jsonl):
{"timestamp":"2024-01-15T10:00:00Z","user_id":"usr_123","email":"alice@example.com","action":"login","ms":1250}
{"timestamp":"2024-01-15T10:05:00Z","user_id":"usr_456","email":"bob@company.org","action":"view_document","ms":340}
{"timestamp":"2024-01-15T10:10:00Z","user_id":"usr_789","email":"charlie@domain.net","action":"update_profile","ms":2100}
{"timestamp":"2024-01-15T10:15:00Z","user_id":"usr_234","email":"diana@email.com","action":"download_report","ms":890}
Stateful Analysis (Streaming Stats)
Scenario: 800 API calls across three endpoints. The average latency looks fine — but the tail might not be. Compute a full distribution summary (avg, min/max, p50/p95/p99) per endpoint in one pass, no external aggregator.
kelora examples/api_latency_incident.jsonl --metrics \
--exec 'track_stats("latency_" + e.endpoint.after("/", -1), e.response_time_ms)'
latency_posts_avg 146.6142714694471
latency_posts_count 261
latency_posts_max 1000
latency_posts_min 38.80401816366475
latency_posts_p50 99.29855374396752
latency_posts_p95 386.8110029886961
latency_posts_p99 880.1589150573112
latency_posts_sum 38266.32485352569
latency_search_avg 136.74529883317723
latency_search_count 256
latency_search_max 914.089985589136
latency_search_min 38.41200046695686
latency_search_p50 58.567401927088085
latency_search_p95 391.37520665797547
latency_search_p99 888.3565953312707
latency_search_sum 35006.79650129337
latency_users_avg 137.4664766693681
latency_users_count 283
latency_users_max 913.710194540903
latency_users_min 40.08425944552916
latency_users_p50 97.50887914455556
latency_users_p95 383.87714424025705
latency_users_p99 796.4844047782076
latency_users_sum 38903.012897431174
Sample input (examples/api_latency_incident.jsonl):
{"timestamp": "2025-01-20T14:00:00Z", "level": "INFO", "endpoint": "/api/search", "response_time_ms": 52.10434414818461, "status": 200}
{"timestamp": "2025-01-20T14:00:03Z", "level": "INFO", "endpoint": "/api/users", "response_time_ms": 47.39308575090249, "status": 200}
{"timestamp": "2025-01-20T14:00:06Z", "level": "INFO", "endpoint": "/api/posts", "response_time_ms": 49.60562557980427, "status": 200}
Look at latency_posts: the average (~147ms) looks healthy, but p99 is ~880ms — a 6× tail the average hides entirely. track_stats maintains streaming state across events (averages and counts directly, percentiles via t-digest), so this scales to files of any size without holding everything in memory. --exec runs per event; --metrics prints just the tracked metrics at the end (it implies --quiet, so individual events are suppressed).
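The "streaming state" point can be illustrated with a small Python sketch that keeps only a count, sum, min, and max per key, so memory stays constant no matter how many events pass through. It is a simplified analogue, not Kelora's `track_stats`: percentiles are left out because they need a sketch structure such as t-digest, and the `RunningStats` class and the key derivation (text after the last "/") are assumptions chosen to resemble the command above. The events are the three sample input lines.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Constant-memory summary updated one value at a time."""

    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value):
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def avg(self):
        return self.total / self.count if self.count else 0.0


events = [
    {"endpoint": "/api/search", "response_time_ms": 52.10434414818461},
    {"endpoint": "/api/users", "response_time_ms": 47.39308575090249},
    {"endpoint": "/api/posts", "response_time_ms": 49.60562557980427},
]

stats = defaultdict(RunningStats)
for event in events:
    # Assumed key: the last path segment, e.g. "/api/search" -> "search".
    key = event["endpoint"].rsplit("/", 1)[-1]
    stats[key].add(event["response_time_ms"])

for key in sorted(stats):
    s = stats[key]
    print(f"latency_{key}", s.count, s.avg, s.minimum, s.maximum)
```

Each event updates its endpoint's summary and is then discarded, which is why a one-pass design of this kind does not need the whole file in memory.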
Advanced Features
Beyond basic filtering and conversion, Kelora includes specialized functions that solve problems you'd otherwise need multiple tools or custom scripts for:
- Extract JSON from text - Pull structured data from unstructured lines
  e.data = e.line.extract_json()
- Deep flattening - Fan out nested arrays to flat records
  emit_each(e.get_path("data.orders", []))
- Pattern normalization - Group errors by replacing IPs, UUIDs, emails with placeholders
  e.error_pattern = e.message.normalized()
- Deterministic sampling - Consistent sampling across log rotations
  --filter 'e.request_id.bucket() % 10 == 0'
- JWT parsing - Extract claims (or flag expired tokens) without verification
  e.token.parse_jwt().expires_at < now()
- Cryptographic pseudonymization - Privacy-preserving anonymization with HMAC
  e.anon_user = pseudonym(e.email, "users")
See Power-User Techniques for real-world examples.
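Two of the ideas in the list, deterministic sampling and HMAC-based pseudonymization, rest on general techniques that are easy to show in Python. The sketch below illustrates those techniques only. It does not reproduce Kelora's `bucket()` or `pseudonym()` functions, whose hash choice, key handling, and output format are not described here; the function signatures, the SHA-256 choice, the 16-character truncation, and the secret are all assumptions.

```python
import hashlib
import hmac


def bucket(value):
    """Map a string to a stable integer; the same input always gives the same bucket."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def keep_sample(request_id, one_in=10):
    """Keep roughly 1 in `one_in` IDs, consistently across files and runs."""
    return bucket(request_id) % one_in == 0


def pseudonym(value, domain, secret):
    """Keyed, non-reversible token: stable for a given secret and domain."""
    message = f"{domain}:{value}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:16]


SECRET = b"example-secret"  # hypothetical key; keep real keys out of source code

print(keep_sample("req-42"), keep_sample("req-42"))  # same decision both times
print(pseudonym("alice@example.com", "users", SECRET))
```

Hashing the request ID rather than drawing a random number means every event of a sampled request is kept, even when its events are spread across rotated files. Using a keyed HMAC instead of a plain hash means the tokens cannot be recomputed by someone who only knows the email addresses.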
Get Started
→ Installation - macOS, Linux, Windows, and Cargo
→ Quickstart (5 minutes) - Run your first commands
→ Tutorial: Basics (30 minutes) - Learn input formats, filtering, and output
→ How-To Guides - Solve specific problems (including debugging)
Need to reconstruct one timeline from several already-ordered log shards? See Merge Sorted Files by Timestamp.
For deeper understanding, see Concepts. For complete reference, see Glossary, Functions, Formats, and CLI options.
Upgrading from 1.x? See What's New in 2.0 for the highlights and a migration guide.
On-call?
Jump to Incident Response Playbooks for copy-paste commands covering latency spikes, error surges, auth failures, and more.
License
Kelora is open source software licensed under the MIT License.
Development Approach
Kelora is an experiment in agentic AI development: AI agents generate all implementation and tests, and I steer requirements rather than writing or reviewing code. Validation relies on an extensive automated test suite plus cargo audit and cargo deny. Kelora is local-only with no networking or telemetry, enforced by a CI check.
This is a single-developer spare-time project, and support is best-effort. Review the Security Policy before using it on sensitive data in production.