Kelora

One command for messy logs. Parse, filter, transform, and summarize logs across JSON, logfmt, syslog, CSV, plain text, and your own custom formats — with embedded Rhai scripting when simple filters aren't enough.

Watch Hack the Clown's 5-minute introduction video to see Kelora in action.

See What's New in 2.0 for the highlights and a migration guide.

Install Kelora

A quick tour

You don't even know what's in the file yet. Start there — no flags, no regex. Kelora decompresses the gzip, recognizes the Apache combined format, and profiles every field with real sample values:

Command/Output

kelora examples/web_access_large.log.gz --discover

Field       Type    Seen  Miss   Uniq  Examples
ip          string  1200    0%  ~1200  "178.28.169.124", "122.133.215.248", "167.218.48.211", "32.151.123.30", "93.230.194.210", "94.85.124.181", "240.138.244.202", "151.92.11.251", ...
ts          string  1200    0%  ~1168  "04/Oct/2025:10:39:55 +0200", "04/Oct/2025:11:29:49 +0200", "04/Oct/2025:11:29:40 +0200", "04/Oct/2025:10:56:32 +0200", "04/Oct/2025:10:22:35 +0200", "04/Oct/...
request     string  1200    0%  ~1200  "PATCH /distributed/cross-platform/extend HTTP/1.1", "DELETE /bandwidth/rich HTTP/2.0", "PUT /whiteboard/synergize/revolutionize/experiences HTTP/1.0", "PATCH...
method      string  1200    0%      6  "GET", "PUT", "POST", "PATCH", "HEAD", "DELETE"
path        string  1200    0%  ~1050  "/visionary/visionary/infrastructures/sticky", "/markets", "/action-items/benchmark/facilitate", "/expedite/integrated/niches/efficient", "/visionary/enable/c...
protocol    string  1200    0%      3  "HTTP/1.0", "HTTP/1.1", "HTTP/2.0"
status      int     1200    0%     21  302, 500, 301, 304, 203, 504, 100, 405, ...
bytes       int     1200    0%  ~1175  40607, 61325, 62596, 72150, 93509, 89810, 32347, 47359, ...
referer     string  1200    0%  ~1200  "https://www.directworld-class.io/cross-platform/web-enabled/integrate", "http://www.regionaldot-com.info/cross-platform/iterate/intuitive/bandwidth", "http:/...
user_agent  string  1200    0%  ~1200  "Mozilla/5.0 (Windows; U; Windows NT 6.1) AppleWebKit/533.39.1 (KHTML, like Gecko) Version/6.0 Safari/533.39.1", "Mozilla/5.0 (Windows; U; Windows NT 4.0) App...
user        string   580   52%   ~564  "denesik8024", "terry2142", "graham1715", "damore6011", "steuber8635", "king8088", "vandervort1212", "gutkowski5805", ...

1200 events scanned | format: combined (auto-detected) | timestamp: ts -> meta.parsed_ts
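To make the columns in that report concrete, here is a rough Python sketch of the same kind of per-field profile (type, seen count, miss percentage, unique values, examples). It is not how Kelora implements --discover: `profile_fields` and the sample events are hypothetical, and the counts here are exact, whereas the `~` in Kelora's output suggests approximate counting that this sketch does not attempt.

```python
from collections import defaultdict

def profile_fields(events, max_examples=3):
    """Summarize each field across a list of dict events (illustrative only)."""
    total = len(events)
    if total == 0:
        return {}
    values = defaultdict(list)
    for event in events:
        for key, value in event.items():
            values[key].append(value)
    report = {}
    for key, vals in values.items():
        distinct = list(dict.fromkeys(vals))  # unique values, first-seen order
        report[key] = {
            "type": type(vals[0]).__name__,   # type of the first value seen
            "seen": len(vals),
            "miss_pct": round(100 * (total - len(vals)) / total),
            "uniq": len(distinct),
            "examples": distinct[:max_examples],
        }
    return report

# Hypothetical events, loosely modelled on the access-log fields above.
events = [
    {"method": "GET", "status": 200, "user": "alice"},
    {"method": "GET", "status": 500},
    {"method": "POST", "status": 200},
    {"method": "PUT", "status": 302, "user": "bob"},
]
report = profile_fields(events)
for field, stats in report.items():
    print(field, stats)
```

The `user` field shows up in two of the four events, so it reports a 50% miss rate, the same signal the Miss column gives you for optional fields.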

Mixed formats in one file are the normal case, not the exception. Give Kelora a cascade of parsers (-f json,line) and it tries each one per line, tagging every event with the winner in _format — so you keep the structured lines, drop the noise, and emit clean CSV in a single pass:

Command/Output

kelora -f json,line examples/mixed_format.log \
  --filter 'e._format == "json"' -k timestamp,level,msg -F csv

timestamp,level,msg
2024-01-15T10:00:02Z,INFO,Order 4412 captured for user alice
2024-01-15T10:00:03Z,WARN,Retrying upstream auth.svc after HTTP 503
2024-01-15T10:00:05Z,ERROR,Upstream auth.svc timeout after 5000ms
2024-01-15T10:00:08Z,INFO,Order 4413 captured for user bob
2024-01-15T10:00:09Z,WARN,Connection pool at 85% capacity

Input Data

Server starting up, plaintext logger active before JSON init
Loading config from /etc/checkout/config.yml
{"timestamp":"2024-01-15T10:00:02Z","level":"INFO","msg":"Order 4412 captured for user alice"}
{"timestamp":"2024-01-15T10:00:03Z","level":"WARN","msg":"Retrying upstream auth.svc after HTTP 503"}
[legacy] 2024-01-15 10:00:04 ERROR gateway connection reset by peer
{"timestamp":"2024-01-15T10:00:05Z","level":"ERROR","msg":"Upstream auth.svc timeout after 5000ms"}
Traceback (most recent call last):
  File "worker.py", line 42, in process_order
{"timestamp":"2024-01-15T10:00:08Z","level":"INFO","msg":"Order 4413 captured for user bob"}
{"timestamp":"2024-01-15T10:00:09Z","level":"WARN","msg":"Connection pool at 85% capacity"}
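The cascade idea can be sketched in a few lines of Python: try each parser in order for every line, keep the first one that succeeds, and record its name in `_format`. This is only an illustration of the concept under simple assumptions, not Kelora's code. The helper names (`parse_json`, `parse_line`, `parse_cascade`) are made up, and the fallback stores the raw text in a `line` field.

```python
import json

def parse_json(line):
    """Parse a JSON object; raise ValueError for anything else."""
    obj = json.loads(line)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(obj, dict):
        raise ValueError("not a JSON object")
    return obj

def parse_line(line):
    """Fallback parser: always succeeds, keeps the raw text."""
    return {"line": line}

PARSERS = [("json", parse_json), ("line", parse_line)]

def parse_cascade(line):
    """Return the event from the first parser that accepts the line."""
    for name, parser in PARSERS:
        try:
            event = parser(line)
        except ValueError:
            continue
        event["_format"] = name  # tag the event with the winning parser
        return event
    raise ValueError("no parser matched")

sample = [
    "Loading config from /etc/checkout/config.yml",
    '{"timestamp":"2024-01-15T10:00:02Z","level":"INFO","msg":"Order 4412 captured for user alice"}',
    "[legacy] 2024-01-15 10:00:04 ERROR gateway connection reset by peer",
]
events = [parse_cascade(text) for text in sample]

# Keep only the structured events and project three columns, as in the command above.
rows = [(e["timestamp"], e["level"], e["msg"]) for e in events if e["_format"] == "json"]
print(rows)
```

Ordering matters: the catch-all parser goes last, otherwise it would claim every line before the stricter parsers get a chance.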

And when those logs are a wall of near-duplicate errors that differ only by hostname, UUID, or timestamp, cut straight to what's actually breaking. Point --drain at a field (-k msg, the syslog message here) and it groups near-identical lines by inferring where the values varied — <fqdn>, <uuid>, <path>, <duration> — so 742 noisy lines collapse into the handful of patterns causing the noise:

Command/Output

kelora examples/syslog_errors.log --drain -k msg

templates (4 items):
  438: Connection timeout to database host <fqdn> after <duration>
  187: Upstream <fqdn> returned <num> for request <uuid>
   94: Failed to acquire lock on resource <path> after <duration>
   23: Payment gateway <fqdn> rejected transaction <uuid> insufficient_funds

Input (8 of 742 lines)

<27>Mar 14 08:00:01 worker01 checkout[5331]: Connection timeout to database host db-replica-1.prod.internal after 10404ms
<27>Mar 14 08:00:03 api01 worker[5571]: Payment gateway adyen.gateway.internal rejected transaction c39f2e8b-2392-4450-939c-a51d53906989: insufficient_funds
<27>Mar 14 08:00:05 api01 checkout[5735]: Connection timeout to database host db-replica-1.prod.internal after 22354ms
<27>Mar 14 08:00:06 edge01 checkout[8861]: Upstream search.svc.internal returned 504 for request 267f0b10-b1df-40dd-ba9e-8bcdf85452c6
<27>Mar 14 08:00:09 api02 checkout[2996]: Connection timeout to database host db-replica-2.prod.internal after 26137ms
<27>Mar 14 08:00:11 worker02 checkout[3399]: Connection timeout to database host db-shard-3.prod.internal after 7504ms
<27>Mar 14 08:00:13 web01 checkout[2294]: Connection timeout to database host db-shard-3.prod.internal after 12361ms
<27>Mar 14 08:00:15 edge01 checkout[8849]: Connection timeout to database host db-replica-1.prod.internal after 15399ms
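For intuition about what template grouping buys you, here is a much simpler stand-in written in Python: mask the variable parts of each message with fixed regexes, then count the resulting templates. To be clear, this is regex masking, not the Drain algorithm and not Kelora's implementation. The patterns and the `to_template` helper are assumptions chosen to fit the sample messages above, and the output differs in small ways (for example, punctuation is kept).

```python
import re
from collections import Counter

# Order matters: mask the most specific shapes first so that plain
# numbers do not eat pieces of UUIDs or durations.
MASKS = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"), "<uuid>"),
    (re.compile(r"\b\d+(?:ms|s)\b"), "<duration>"),
    (re.compile(r"\b[a-z0-9-]+(?:\.[a-z0-9-]+){2,}\b"), "<fqdn>"),
    (re.compile(r"\b\d+\b"), "<num>"),
]

def to_template(message):
    """Replace variable-looking tokens with placeholders."""
    for pattern, placeholder in MASKS:
        message = pattern.sub(placeholder, message)
    return message

# Message parts of four of the syslog lines shown above.
messages = [
    "Connection timeout to database host db-replica-1.prod.internal after 10404ms",
    "Payment gateway adyen.gateway.internal rejected transaction c39f2e8b-2392-4450-939c-a51d53906989: insufficient_funds",
    "Upstream search.svc.internal returned 504 for request 267f0b10-b1df-40dd-ba9e-8bcdf85452c6",
    "Connection timeout to database host db-replica-2.prod.internal after 26137ms",
]
counts = Counter(to_template(m) for m in messages)
for template, n in counts.most_common():
    print(f"{n:5d}: {template}")
```

Hand-written masks only catch the shapes you anticipated. The appeal of inferring templates from the data, as described above, is that you don't have to know the variable parts in advance.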

When Kelora helps

Reach for Kelora when you'd otherwise be writing a throwaway Python script. It's the middle ground between "grep is enough" and "I need a real observability platform."

Chained pipelines collapse into one command. grep | awk | jq | script.py becomes kelora, with state preserved across the pipeline instead of lost between pipes.
Messy formats parse cleanly. Mixed JSON and plaintext in the same file, key=value pairs inside message strings, nested JSON fanned out to flat rows — without regex gymnastics.
Embedded scripting when you need it. Simple filters are one-liners. When logic gets stateful — session reconstruction, per-service error rates, request/response correlation — there's a full scripting layer.
Plays well with your existing tools. Pipe ripgrep or jq upstream to pre-filter; pipe Kelora's JSON or CSV output into whatever comes next.

Kelora trades raw speed for programmability. Simple filters and format conversions handle multi-GB files comfortably; with heavy Rhai scripting, expect to pre-filter once inputs reach the low hundreds of thousands of lines. ripgrep, jq, qsv, and other Unix tools work well for that.

More examples

Filter & Convert (The Basics)

Scenario: Filter a logfmt file for server errors (status 500 and up) or slow requests (over 1000 ms) and output clean JSON.

Command/Output

kelora examples/traffic_logfmt.log \
  --filter 'e.status >= 500 || e.latency_ms > 1000' \
  -F json

{"ts":"2024-07-17T12:00:04Z","level":"INFO","method":"POST","path":"/checkout","status":200,"latency_ms":1450}
{"ts":"2024-07-17T12:00:05Z","level":"ERROR","method":"POST","path":"/checkout","status":502,"latency_ms":2100}

Input Data

ts=2024-07-17T11:59:40Z level=INFO method=GET path=/docs status=200 latency_ms=280
ts=2024-07-17T11:59:55Z level=INFO method=POST path=/checkout status=200 latency_ms=950
ts=2024-07-17T12:00:01Z level=INFO method=GET path=/docs status=200 latency_ms=320
ts=2024-07-17T12:00:04Z level=INFO method=POST path=/checkout status=200 latency_ms=1450
ts=2024-07-17T12:00:05Z level=ERROR method=POST path=/checkout status=502 latency_ms=2100
ts=2024-07-17T12:00:07Z level=WARN method=GET path=/billing status=200 latency_ms=900

Modify & Anonymize (Scripting)

Scenario: Mask user emails for privacy and convert milliseconds to seconds before printing.

Command/Output

kelora examples/audit.jsonl \
  --exec 'e.email = "***"; e.duration_sec = e.ms / 1000.0;' \
  --keys timestamp,user_id,email,duration_sec

timestamp='2024-01-15T10:00:00Z' user_id='usr_123' email='***' duration_sec=1.25
timestamp='2024-01-15T10:05:00Z' user_id='usr_456' email='***' duration_sec=0.34
timestamp='2024-01-15T10:10:00Z' user_id='usr_789' email='***' duration_sec=2.1
timestamp='2024-01-15T10:15:00Z' user_id='usr_234' email='***' duration_sec=0.89

Input Data

{"timestamp":"2024-01-15T10:00:00Z","user_id":"usr_123","email":"alice@example.com","action":"login","ms":1250}
{"timestamp":"2024-01-15T10:05:00Z","user_id":"usr_456","email":"bob@company.org","action":"view_document","ms":340}
{"timestamp":"2024-01-15T10:10:00Z","user_id":"usr_789","email":"charlie@domain.net","action":"update_profile","ms":2100}
{"timestamp":"2024-01-15T10:15:00Z","user_id":"usr_234","email":"diana@email.com","action":"download_report","ms":890}

Stateful Analysis (Streaming Stats)

Scenario: 800 API calls across three endpoints. The average latency looks fine — but the tail might not be. Compute a full distribution summary (avg, min/max, p50/p95/p99) per endpoint in one pass, no external aggregator.

Command/Output

kelora examples/api_latency_incident.jsonl --metrics \
  --exec 'track_stats("latency_" + e.endpoint.after("/", -1), e.response_time_ms)'

latency_posts_avg     146.6142714694471
latency_posts_count   261
latency_posts_max     1000
latency_posts_min     38.80401816366475
latency_posts_p50     99.29855374396752
latency_posts_p95     386.8110029886961
latency_posts_p99     880.1589150573112
latency_posts_sum     38266.32485352569
latency_search_avg    136.74529883317723
latency_search_count  256
latency_search_max    914.089985589136
latency_search_min    38.41200046695686
latency_search_p50    58.567401927088085
latency_search_p95    391.37520665797547
latency_search_p99    888.3565953312707
latency_search_sum    35006.79650129337
latency_users_avg     137.4664766693681
latency_users_count   283
latency_users_max     913.710194540903
latency_users_min     40.08425944552916
latency_users_p50     97.50887914455556
latency_users_p95     383.87714424025705
latency_users_p99     796.4844047782076
latency_users_sum     38903.012897431174

Input Data

{"timestamp": "2025-01-20T14:00:00Z", "level": "INFO", "endpoint": "/api/search", "response_time_ms": 52.10434414818461, "status": 200}
{"timestamp": "2025-01-20T14:00:03Z", "level": "INFO", "endpoint": "/api/users", "response_time_ms": 47.39308575090249, "status": 200}
{"timestamp": "2025-01-20T14:00:06Z", "level": "INFO", "endpoint": "/api/posts", "response_time_ms": 49.60562557980427, "status": 200}

Look at latency_posts: the average (~147ms) looks healthy, but p99 is ~880ms — a 6× tail the average hides entirely. track_stats maintains streaming state across events (averages and counts directly, percentiles via t-digest), so this scales to files of any size without holding everything in memory. --exec runs per event; --metrics prints just the tracked metrics at the end (it implies --quiet, so individual events are suppressed).
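If the gap between average and p99 feels abstract, a tiny Python example with made-up numbers shows the same effect. This sketch computes exact nearest-rank percentiles over a list held in memory, which is a different (and non-streaming) method from the t-digest approach mentioned above. `nearest_rank` and the latency values are illustrative assumptions.

```python
from statistics import mean

def nearest_rank(sorted_values, pct):
    """Exact nearest-rank percentile for an integer pct in 1..100."""
    rank = (pct * len(sorted_values) + 99) // 100  # integer ceil(pct * n / 100)
    return sorted_values[rank - 1]

# Synthetic latencies in ms, invented for illustration:
# 95 fast calls and 5 slow ones.
latencies = sorted([100] * 95 + [1000] * 5)

summary = {
    "avg": mean(latencies),
    "p50": nearest_rank(latencies, 50),
    "p95": nearest_rank(latencies, 95),
    "p99": nearest_rank(latencies, 99),
}
print(summary)
```

Here the average is 145 ms and both p50 and p95 are 100 ms, yet p99 is 1000 ms: five slow calls out of a hundred barely move the mean but dominate the tail.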

Advanced Features

Beyond basic filtering and conversion, Kelora includes specialized functions that solve problems you'd otherwise need multiple tools or custom scripts for:

Extract JSON from text - Pull structured data from unstructured lines: e.data = e.line.extract_json()
Deep flattening - Fan out nested arrays to flat records: emit_each(e.get_path("data.orders", []))
Pattern normalization - Group errors by replacing IPs, UUIDs, emails with placeholders: e.error_pattern = e.message.normalized()
Deterministic sampling - Consistent sampling across log rotations: --filter 'e.request_id.bucket() % 10 == 0'
JWT parsing - Extract claims (or flag expired tokens) without verification: e.token.parse_jwt().expires_at < now()
Cryptographic pseudonymization - Privacy-preserving anonymization with HMAC: e.anon_user = pseudonym(e.email, "users")

See Power-User Techniques for real-world examples.
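Two of the ideas listed above, deterministic sampling and HMAC-based pseudonymization, are easy to sketch in Python. Treat this as a conceptual illustration only: the hash function, key handling, and output format Kelora uses are not described here, so SHA-256, the `secret` parameter, and the 16-character truncation are all assumptions, and the Python functions `bucket`, `keep`, and `pseudonym` are stand-ins that merely borrow the names.

```python
import hashlib
import hmac

def bucket(value):
    """Map a string to a stable integer (assumed hash: SHA-256)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def keep(request_id, one_in=10):
    """Keep roughly 1 in `one_in` IDs, always the same ones."""
    return bucket(request_id) % one_in == 0

def pseudonym(value, domain, secret):
    """Keyed, repeatable token: same input and key give the same output."""
    key = secret + b":" + domain.encode("utf-8")
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

# Hypothetical request IDs and secret, for demonstration only.
request_ids = [f"req-{i}" for i in range(1000)]
sampled = [rid for rid in request_ids if keep(rid)]
print(len(sampled), "of", len(request_ids), "kept")

secret = b"demo-secret-do-not-reuse"
token_a = pseudonym("alice@example.com", "users", secret)
token_b = pseudonym("alice@example.com", "users", secret)
token_c = pseudonym("alice@example.com", "sessions", secret)
print(token_a, token_c)
```

A content hash is used instead of Python's built-in `hash()` because the latter is salted per process for strings, so it would pick different requests on every run. Sampling on a stable hash of the request ID means every log file, on every host, keeps the same requests.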

Get Started

→ Installation - macOS, Linux, Windows, and Cargo

→ Quickstart (5 minutes) - Run your first commands

→ Tutorial: Basics (30 minutes) - Learn input formats, filtering, and output

→ How-To Guides - Solve specific problems (including debugging)

Need to reconstruct one timeline from several already-ordered log shards? See Merge Sorted Files by Timestamp.

For deeper understanding, see Concepts. For complete reference, see Glossary, Functions, Formats, and CLI options.

Upgrading from 1.x? See What's New in 2.0 for the highlights and a migration guide.

On-call?

Jump to Incident Response Playbooks for copy-paste commands covering latency spikes, error surges, auth failures, and more.

License

Kelora is open source software licensed under the MIT License.

Development Approach

Kelora is an experiment in agentic AI development: AI agents generate all implementation and tests, and I steer requirements rather than writing or reviewing code. Validation relies on an extensive automated test suite plus cargo audit and cargo deny. Kelora is local-only with no networking or telemetry, enforced by a CI check.

This is a single-developer spare-time project, and support is best-effort. Review the Security Policy before using it on sensitive data in production.