Format Reference¶
Quick reference for all formats supported by Kelora.
Input Formats¶
Specify input format with -f, --input-format <format>.
Overview¶
| Format | Description |
|---|---|
auto |
Auto-detect from first non-empty line (default) |
json |
Application logs, structured data (shorthand: -j) |
line |
Unstructured logs, plain text (trailing newline/CR trimmed) |
raw |
Plain text preserved verbatim (no trimming of newline/CR or other artifacts) |
logfmt |
Heroku-style logs, simple structured logs |
csv / tsv |
Spreadsheet data, exports |
syslog |
System logs, network devices |
combined |
Apache/Nginx web server access logs |
cef |
ArcSight Common Event Format, SIEM data |
cri |
Kubernetes CRI/containerd container logs (kubectl logs --timestamps, /var/log/pods/*) |
<name> |
Built-in application-log formats (glog, log4j, …) — see --help-formats |
cols:<spec> |
Custom column-based logs |
regex:<pattern> |
Custom regex parsing with named groups and type annotations |
<fmt1>,<fmt2>[,…] |
Cascade mode — try parsers in order, first success wins (e.g. json,line) |
JSON Format¶
Syntax: -f json or -j
Description: JSON Lines format (one object per line). Nested structures preserved.
Input Example:
{"timestamp": "2024-01-15T10:30:00Z", "level": "ERROR", "service": "api", "message": "Connection failed"}
Output Fields: All JSON fields become event fields with original names and types.
Notes:
- Use
-M jsonfor multi-line JSON objects - Preserves field types (strings, numbers, booleans, null)
- Supports nested objects and arrays
Line Format¶
Syntax: -f line
Description: Plain text, one line per event.
Output Fields:
| Field | Type | Description |
|---|---|---|
line |
String | Complete line content |
Notes:
- Auto-detect falls back to line when it can't identify a structured format
- Empty lines are skipped
- Useful for unstructured logs or custom parsing with
--exec
Raw Format¶
Syntax: -f raw
Description: Plain text, one event per line, preserved verbatim. Unlike
line, the trailing newline/CR is not trimmed and backslashes and other
artifacts are kept exactly as read.
Output Fields:
| Field | Type | Description |
|---|---|---|
raw |
String | Line content, byte-for-byte as read |
Notes:
- Choose
rawoverlinewhen trailing whitespace or escape characters are significant (e.g. round-tripping data without normalization) - Like
line, it matches every line, so in a cascade it must come last
Logfmt Format¶
Syntax: -f logfmt
Description: Heroku-style key-value pairs.
Input Example:
Output Fields: All key-value pairs become top-level fields.
Notes:
- Supports quoted values:
key="value with spaces" - Keys must be alphanumeric (with underscores/hyphens)
CSV / TSV Formats¶
Syntax:
-f csv- Comma-separated with header-f tsv- Tab-separated with header-f csvnh- CSV without header-f tsvnh- TSV without header
Output Fields:
- With header: Field names from header row
- Without header:
c1,c2,c3, etc.
Type Annotations:
Specify field types for automatic conversion:
Supported types: int, float, bool
Notes:
- Quoted fields supported:
"value, with, commas" - Escaped quotes:
"value with ""quotes""" - Embedded newlines inside quoted fields (RFC 4180) are supported: a record that
spans several physical lines, e.g.
"a","line1<newline>line2", is reassembled into a single event before parsing. This works in both sequential and-P/--parallelmode — the parallel batcher keeps whole records within a batch so the record can be stitched back together. A quote that is opened but never closed before end of input is genuinely malformed and reported as anUnterminated quoted fieldparse error in either mode. - Ragged rows are preserved, not dropped: columns beyond the header (or beyond
the first row in header-less mode) are kept under positional names (
c5,c6, ... counted from 1), and rows with fewer columns leave the trailing fields absent. Both cases are counted and reported as a hint on stderr. --strictrejects ragged rows as parse errors instead.
Syslog Format¶
Syntax: -f syslog
Description: RFC5424 and RFC3164 syslog messages. Auto-detects format.
Input Examples:
RFC5424:
RFC3164:
Output Fields:
| Field | Type | RFC5424 | RFC3164 | Description |
|---|---|---|---|---|
pri |
Integer | ✓ | ✓* | Priority value (facility * 8 + severity) |
facility |
Integer | ✓ | ✓* | Syslog facility code |
severity |
Integer | ✓ | ✓* | Severity level (0-7) |
level |
String | ✓ | ✓* | Log level (EMERG, ALERT, CRIT, ERROR, WARN, NOTICE, INFO, DEBUG) |
ts |
String | ✓ | ✓ | Parsed timestamp |
host |
String | ✓ | ✓ | Source hostname |
prog |
String | ✓ | ✓ | Application/program name |
pid |
Integer/String | ✓ | ✓ | Process ID (parsed as integer if numeric) |
msgid |
String | ✓ | - | Message ID |
version |
Integer | ✓ | - | Syslog protocol version |
msg |
String | ✓ | ✓ | Log message |
*RFC3164: Only present if priority prefix <NNN> is included
Notes:
- Severity levels: 0=emerg, 1=alert, 2=crit, 3=err, 4=warn, 5=notice, 6=info, 7=debug
Combined Log Format¶
Syntax: -f combined
Description: Apache/Nginx web server logs. Auto-handles three variants:
- Apache Common Log Format (CLF)
- Apache Combined Log Format
- Nginx Combined with request_time
Input Examples:
Common:
Combined:
192.168.1.1 - user [15/Jan/2024:10:30:00 +0000] "GET /api/data HTTP/1.1" 200 1234 "http://example.com/" "Mozilla/5.0"
Nginx with request_time:
192.168.1.1 - - [15/Jan/2024:10:30:00 +0000] "GET /api/data HTTP/1.1" 200 1234 "-" "curl/7.68.0" "0.123"
Output Fields:
| Field | Type | Common | Combined | Nginx | Description |
|---|---|---|---|---|---|
ip |
String | ✓ | ✓ | ✓ | Client IP address |
identity |
String | ✓ | ✓ | ✓ | RFC 1413 identity (omit if -) |
user |
String | ✓ | ✓ | ✓ | HTTP auth username (omit if -) |
ts |
String | ✓ | ✓ | ✓ | Request timestamp |
request |
String | ✓ | ✓ | ✓ | Full HTTP request line |
method |
String | ✓ | ✓ | ✓ | HTTP method (auto-extracted) |
path |
String | ✓ | ✓ | ✓ | Request path (auto-extracted) |
protocol |
String | ✓ | ✓ | ✓ | HTTP protocol (auto-extracted) |
status |
Integer | ✓ | ✓ | ✓ | HTTP status code |
bytes |
Integer | ✓ | ✓ | ✓ | Response size (omit if -, keep if 0) |
referer |
String | - | ✓ | ✓ | HTTP referer (omit if -) |
user_agent |
String | - | ✓ | ✓ | HTTP user agent (omit if -) |
request_time |
Float | - | - | ✓ | Request time in seconds (omit if -) |
Notes:
- Parser auto-detects variant per line
- Fields with
-values omitted (exceptbytesincludes0)
CEF Format¶
Syntax: -f cef
Description: ArcSight Common Event Format for security logs.
Input Example:
Output Fields:
Syslog prefix (optional):
| Field | Type | Description |
|---|---|---|
ts |
String | Timestamp from syslog prefix |
host |
String | Hostname from syslog prefix |
CEF header:
| Field | Type | Description |
|---|---|---|
cefver |
String | CEF format version |
vendor |
String | Device vendor name |
product |
String | Device product name |
version |
String | Device version |
eventid |
String | Event signature ID |
event |
String | Event name/classification |
severity |
String | Event severity (0-10) |
Extensions: All extension key=value pairs become top-level fields with automatic type conversion (integers, floats, booleans)
CRI Format¶
Syntax: -f cri
Description: The Kubernetes CRI/containerd on-disk container-log layout —
<RFC3339Nano> <stream> <tag> <message>. This is the raw shape of
/var/log/pods/*/*.log, of kubectl logs --timestamps, and of what node-level
log shippers (Fluent Bit, Vector, promtail) read before forwarding. The message
is often itself JSON or logfmt; it is kept verbatim in msg for an optional
second-stage parse.
Input Example:
2024-07-17T12:12:05.123456789Z stdout F {"level":"info","msg":"started"}
2024-07-17T12:12:06.223456789Z stderr P panic: runtime error: nil pointer
Output Fields:
| Field | Type | Description |
|---|---|---|
ts |
String | RFC3339Nano timestamp written by the runtime |
stream |
String | stdout or stderr |
tag |
String | F (full line) or P (partial — the runtime split a line longer than ~16 KiB; consumers rejoin consecutive P lines up to the next F) |
msg |
String | The container's log line, kept verbatim |
Example Usage:
# Show only stderr lines from a pod log file
kelora pod.log -f cri --filter 'e.stream == "stderr"' -k ts,msg
# Fan a structured JSON payload back into top-level fields, then filter on it
kelora pod.log -f cri --exec 'e.absorb_json("msg")' --filter 'e.level == "error"'
Notes:
- Unlike the other built-in application-log formats (see below),
criis part of auto-detection's early path — it is tried before the logfmt and CSV steps, because a CRI message is frequently itself JSON or logfmt (whose commas/pairs would otherwise be misdetected as CSV/logfmt). Auto-detection therefore recognises CRI logs regardless of the message payload. - The Docker
json-filelog driver writes a different shape (one JSON object per line withlog/stream/timekeys) — use-f jsonfor those.
Built-in Application-Log Formats¶
Beyond the wire/access formats above, Kelora ships a curated set of built-in
application-log layouts — glog (Go/Kubernetes klog), nginx-error,
apache-error, log4j/Java, python-logging, postgres (PostgreSQL server
log), redis, s3, haproxy, and iso8601-level
— that parse into ts, level, msg, and format-specific extras. Select them
with -f <name> (e.g. -f log4j) or inside a cascade (-f log4j,line). With
the exception of cri (above), they are tried only as the last step before the
line fallback, so they never override a format detected earlier. Run kelora
--help-formats for the full catalogue, sample lines, and per-format notes.
(These definitions are adapted from lnav, BSD-3-Clause;
cri is Kelora-original.)
The access-log formats (s3, haproxy) keep only a curated set of useful
fields and may drop a long, version-dependent tail. Nothing is lost: the full
raw line is always available to a script as line / meta.line, so a dropped
column can be recovered with a second-stage parse, e.g. extracting a trailing
quoted column off the raw line:
postgres matches the default log_line_prefix = '%m [%p] '
(2024-01-02 15:04:05.123 UTC [1234] LOG: …); a customized prefix (adding
user@db, an application name, etc.) won't auto-detect — reach those with
-f regex:.
A PostgreSQL error often spans several physical lines — an ERROR:/STATEMENT:
record followed by tab-indented query-continuation lines that carry no prefix.
Parse those with -M indent, which folds the indented lines into the preceding
record before parsing; without it they are reported as parse errors and dropped.
(-f postgres,line is the alternative: it keeps each unmatched continuation
line as a raw line event instead of folding it.)
Its ts is naive: like the other timestamp-only formats, it is resolved
through --input-tz (default UTC), not the zone abbreviation logged on the
line. The abbreviation is preserved in the log_tz field for inspection but not
applied to the timestamp, because zone abbreviations are ambiguous (CST,
BST, IST each map to several zones) and an abbreviation alone cannot encode
DST — so it cannot be reliably converted to an offset. (The field is named
log_tz, not tz, to signal that it is the abbreviation the server logged, not
the zone applied to the timestamp.) A Postgres server logging in UTC (the
common, recommended configuration) is therefore correct by default; for a server
logging in a non-UTC zone, pass --input-tz <IANA> matching its log_timezone,
e.g. --input-tz Europe/Berlin.
Column Format¶
Syntax: -f 'cols:<spec>'
Description: Custom column-based parsing with whitespace (or custom separator) splitting.
Separator: Use --cols-sep <separator> for custom separators (default: whitespace)
Specification Syntax:
field- Consume one columnfield(N)- Consume N columns and join-or-(N)- Skip one or N columns*field- Capture remaining columns (must be last)field:type- Apply type annotation (int,float,bool,string)
Examples:
Simple fields:
Multi-token timestamp:
# Input: 2024-01-15 10:30:00 INFO Connection failed
kelora -f 'cols:ts(2) level *msg' app.log --ts-field ts
Custom separator:
Output Fields: Field names from specification with applied type conversions.
Notes:
*fieldmust be the final token
Regex Format¶
Syntax: -f 'regex:<pattern>'
Description: Parse logs using regular expressions with named capture groups and optional type annotations.
Pattern Syntax:
(?P<name>pattern)- Named capture group (field stored as string)(?P<name:type>pattern)- Named capture group with type annotation
Supported Types: int, float, bool (lowercase only)
Examples:
Simple extraction:
Structured logs:
# Input: 2025-01-15T10:00:00Z [ERROR] Database connection failed
kelora -f 'regex:^(?P<ts>\S+) \[(?P<level>\w+)\] (?P<msg>.+)$' app.log
Apache-style logs with typed fields:
# Input: 192.168.1.1 - - [15/Jan/2025:10:00:00 +0000] "GET /api/users HTTP/1.1" 200 1234
kelora -f 'regex:^(?P<ip>\S+) - - \[(?P<timestamp>[^\]]+)\] "(?P<method>\w+) (?P<path>\S+) HTTP/[\d.]+" (?P<status:int>\d+) (?P<bytes:int>\d+)$' access.log
Output Fields: Field names from capture groups with applied type conversions.
Behavior:
- Full-line matching: Pattern implicitly anchored with
^...$ - Empty captures: Skipped (not stored as fields)
- Non-matching lines:
- Default (lenient): Returns error, line skipped, processing continues
- With
--strict: Returns error, processing halts
- Type conversion failures (e.g.,
"abc"for:int):- Default (lenient): Automatically falls back to storing as string
- With
--strict: Returns error, processing halts
Reserved Field Names:
The following names cannot be used: original_line, parsed_ts, fields
Limitations:
- Nested named capture groups are not supported
- Type annotations must be lowercase (
:int, not:INT)
Notes:
- Use raw strings in shell to avoid escaping issues:
-f 'regex:...' - Combine with
--ts-fieldto specify which field contains the timestamp - Non-capturing groups
(?:...)are supported
Auto-Detection¶
Syntax: -f auto
Description: Automatically detect format from first non-empty line.
Detection Order:
- JSON (starts with
{) - Syslog (starts with
<NNN>or an RFC3164 date) - CEF (starts with
CEF:) - Combined (matches Apache/Nginx pattern)
- CRI (
<RFC3339Nano> stdout|stderr F|P …, tried early so a JSON/logfmt message isn't misread as CSV/logfmt) - Logfmt (contains
key=valuepairs) - CSV (contains commas with consistent pattern)
- Built-in application-log formats (regex-based:
glog,log4j, …; see--help-formats) - Line (fallback)
Notes:
- Detects once, applies to all lines
- Not suitable for mixed-format files — use cascade mode instead
Auto-Detection Per File¶
Syntax: -f auto-per-file
Description: Automatically detect format from the first non-empty line of each input file, then apply that parser to the rest of the file.
Good fit: Batch runs where each file is internally consistent, but different files use different formats.
Example Usage:
# JSON app logs and logfmt worker logs in one invocation
kelora -f auto-per-file -J logs/api/*.log logs/workers/*.log
# Aggregate error events across mixed file formats without splitting first
kelora -f auto-per-file --levels error,warn logs/**/* -J
Notes:
- Detects once per file
- Uses the same first-non-empty-line semantics as
-f auto - For mixed formats within a single file, use cascade mode instead
- Not supported with
--parallelor--merge-sorted
Cascade Mode¶
Syntax: -f <fmt1>,<fmt2>[,…] (comma-separated list of simple formats)
Description: Try each parser in order on every line; the first one that
succeeds handles the event. Designed for the common "noisy JSON" case —
structured logs with plain-text noise (stack traces, panics, startup banners)
interspersed — without requiring users to split the stream with grep first.
Input Example:
{"level":"info","msg":"hello"}
Server starting on port 8080
{"level":"error","msg":"connection refused"}
java.lang.NullPointerException
Example Usage:
# Noisy JSON with plain-text fallback
kelora -f json,line app.log
# Three-way cascade
kelora -f json,logfmt,line mixed.log
# Segment downstream by how each event was parsed
kelora -f json,line app.log --filter 'e._format == "line"'
# See per-format breakdown
kelora -f json,line app.log --stats
The _format field: Every event emitted in cascade mode gets a
_format field naming the winning parser (e.g. "json", "line"). This
field is only added in cascade mode — single-format runs are unchanged.
Filter or group by it in Rhai, or inspect it in the output to debug
classification.
Diagnostic counts: With --stats, cascade mode adds a per-format
breakdown so silent misclassification surfaces immediately:
Allowed in a comma list: json, line, raw, logfmt, syslog,
cef, combined.
Not allowed in a comma list (rejected at CLI parse time):
auto— meaningless inside a cascade list; list the formats explicitlycsv,tsv,csvnh,tsvnh— schema-based; headers/types can't safely change mid-streamcols:<spec>,regex:<pattern>— a regex pattern may itself contain commas, so commas can't safely delimit them. Use repeated-finstead.
Cascades with cols:/regex: — use repeated -f. Pass one -f per
format and they are tried in order, exactly like a comma list, but each spec
is taken whole so cols:/regex: work as members:
# JSON lines plus a 'timestamp LEVEL message' app log in one file
kelora -f json -f 'cols:ts(2) level *msg' app.log
# Selective regex first, raw line as the catch-all for anything else
kelora -f json -f 'regex:(?P<ts>\S+ \S+) (?P<level>\w+) (?P<msg>.*)' -f line app.log
A comma list and repeated -f can be combined; comma-list members are
flattened into the cascade in order.
Catch-alls go last. line, raw, and cols: match essentially every
line (in resilient mode cols: fills missing fields with () rather than
failing), so anything listed after them would never run — Kelora rejects that
ordering. regex: is selective: it declines non-matching lines, so it may sit
earlier in the cascade and fall through to a later catch-all (as in the
example above, where stack-trace lines that don't match the regex are kept by
line).
Ordering matters. The first parser that returns Ok wins, so list
high-confidence formats first and use line as the terminal fallback.
Liberal grammars (like logfmt, which accepts any key=value substring)
should come after stricter ones to avoid swallowing events that a later
parser would handle correctly.
Multiline: Multiline chunking is format-aware and runs before parsing, so it follows the first listed format's strategy. Per-chunk reclassification is intentionally not supported.
Notes:
- Adds ~5–10% overhead per line vs. a single parser (one extra parse attempt on fall-through)
- Safe to combine with
--filter,--select,--stats,--metrics, and Rhai scripting - Works in both sequential and parallel modes
Output Formats¶
Specify output format with -F, --output-format <format>.
| Format | Description |
|---|---|
default |
Key-value format with colors |
json |
JSON lines (one object per line) |
logfmt |
Key-value pairs (logfmt format) |
inspect |
Debug format with type information |
levelmap |
Events grouped by log level |
keymap |
Shows first character of specified field (requires --keys with exactly one field) |
tailmap |
Visualizes numeric field distributions with percentile thresholds (requires --keys with exactly one numeric field) |
csv |
CSV with header row |
tsv |
Tab-separated values with header row |
csvnh |
CSV without header |
tsvnh |
TSV without header |
Use -q/--quiet to suppress output (implied by --stats and --metrics).
Levelmap Visual Example:

The levelmap format provides a compact visual representation of logs, showing timestamps and level indicators in a condensed format ideal for quick scanning.
Map legends:
All three map formats (levelmap, keymap, tailmap) append a one-line legend
that decodes their glyphs. The legend is data-driven: it lists only the
glyphs that actually appeared, mapped back to the source values that produced
them. For example a keymap over HTTP status codes might end with
2 = 200,204 | 4 = 404 | 5 = 500,503, and a levelmap with
E = ERROR | I = INFO | W = WARN.
By default the legend is shown only when output goes to an interactive terminal, so piped or redirected output stays clean. Override with:
--legend— always append the legend (even when piped)--no-legend— never append the legend
Keymap Format:
The keymap format works similarly to levelmap but displays the first character of any specified field instead of being limited to log levels. This is useful for visualizing patterns in custom fields like HTTP methods, status codes, user types, etc.
- Requires
--keys(or-k) with exactly one field name - Shows the first character of the field value (converted to string for non-string fields)
- Displays
.for empty or missing field values - Groups events by timestamp like
levelmap - Legend groups full values under each glyph (e.g.
2 = 200,204) - Not compatible with
--parallelmode
Tailmap Format:
The tailmap format visualizes numeric field distributions over time using tail-focused percentile thresholds (p90, p95, p99). This is ideal for performance monitoring, latency analysis, and identifying outliers.
- Requires
--keys(or-k) with exactly one numeric field name - Uses symbols:
_(below p90),1(p90-p95),2(p95-p99),3(above p99),.(missing) - Shows a summary with field statistics and percentile thresholds
- Groups events by timestamp for timeline visualization
- Not compatible with
--parallelmode
Use cases: - API response time analysis - Database query performance monitoring - Request latency tracking - Any time-series numeric data where tail latencies matter
Examples:
kelora -j app.log -F json # Output as JSON
kelora -j app.log -F csv --keys ts,level,msg # Output as CSV
kelora -F keymap -k method access.log # Show HTTP method patterns
kelora -F keymap --keys status api.log # Show status field patterns
kelora -F tailmap -k response_time api.log # Visualize response time distribution
kelora -F tailmap --keys query_time_ms db.log # Show database query performance
kelora -j app.log --stats # Only stats
See Also¶
- CLI Reference - Complete flag documentation including timestamp parsing, multiline strategies, and prefix extraction
- Quickstart - Format examples with annotated output
- Parsing Custom Formats Tutorial - Step-by-step guide
- Prepare CSV Exports for Analytics - CSV-specific tips