Processing Architecture¶

Understanding how Kelora processes logs through its multi-layer architecture.

Overview¶

Kelora's processing model consists of three distinct layers operating on different data types:

Input Layer - File/stdin handling and decompression
Line-Level Processing - Raw string filtering and event boundary detection
Event-Level Processing - Structured data transformation and output

This layered architecture enables efficient streaming with low memory usage while supporting both sequential and parallel processing modes.

Diagram summary: Kelora runs raw-line options before parsing, then parses complete event strings into event maps. User stages (--filter, --levels, --exec, --assert, and related flags) run in CLI order. Fixed event stages such as timestamp filtering, key selection, and --take run afterward.

Pipeline diagram description: a sample compressed input file becomes raw log lines, then line processing drops unwanted lines before parsing. Multiline aggregation can combine stack traces into one event string. The parser creates event map e, user stages run in CLI order, fixed event stages run afterward, and the formatter emits structured output. A lifecycle rail shows --begin, conf, track_* metrics updates, and --end.

Quick Start: What You'll Use Most¶

For typical log analysis, you only interact with these stages:

Input - Kelora auto-detects format (JSON, logfmt, syslog, etc.) from your files
Event Processing - Use --filter and --exec in the order you specify them on the CLI
Output - Events stream to stdout in readable format

Read the full doc when: You need multiline handling for stack traces, parallel processing for large files, span aggregation for grouping events, or to understand why certain features interact the way they do.

Layer 1: Input Layer¶

Input Sources¶

Stdin Mode:

Activated when no files are specified or when a file argument is "-"
A background thread reads from stdin and passes lines over a channel
Supports a single stdin source (an error is raised if "-" appears more than once)
Useful for piping: tail -f app.log | kelora -j

File Mode:

Processes one or more files sequentially
Tracks current filename for context
Supports --file-order for processing sequence:
cli (default) - Process in CLI argument order
name - Sort alphabetically
mtime - Sort by modification time (oldest first)

Examples:

# Stdin mode
tail -f app.log | kelora -j

# File mode with ordering
kelora *.log --file-order mtime

# Mixed stdin and files
kelora file1.log - file2.log  # stdin in middle

Automatic Decompression¶

Kelora automatically detects and decompresses compressed input using magic bytes detection (not file extensions):

Supported Formats:

Gzip - Magic bytes 1F 8B 08 (.gz files or gzipped stdin)
Zstd - Magic bytes 28 B5 2F FD (.zst files or zstd stdin)
Plain - No magic bytes, passthrough

Behavior:

Transparent decompression before any processing
Works on both files and stdin
ZIP files explicitly rejected with error message
Decompression happens in Input Layer

Examples:

kelora app.log.gz                    # Auto-detected gzip
kelora app.log.zst --parallel        # Auto-detected zstd
gzip -c app.log | kelora -j          # Gzipped stdin
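
The magic-byte check described above can be sketched in a few lines of Python. This is an illustration of the technique, not Kelora's implementation: the function name `detect_compression` is hypothetical, and the ZIP signature `PK\x03\x04` is an assumption (the standard ZIP local-file-header signature), since the page only says that ZIP input is rejected.

```python
import gzip

# Prefixes for gzip and zstd are the ones listed in "Supported Formats".
GZIP_MAGIC = bytes([0x1F, 0x8B, 0x08])
ZSTD_MAGIC = bytes([0x28, 0xB5, 0x2F, 0xFD])
ZIP_MAGIC = b"PK\x03\x04"  # assumption: standard ZIP local-file-header signature


def detect_compression(head: bytes) -> str:
    """Classify an input stream from its first few bytes (hypothetical helper)."""
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(ZSTD_MAGIC):
        return "zstd"
    if head.startswith(ZIP_MAGIC):
        raise ValueError("ZIP archives are not supported")
    return "plain"  # no known magic bytes: pass the data through unchanged


compressed = gzip.compress(b'{"level":"info"}\n')
print(detect_compression(compressed[:4]))          # gzip
print(detect_compression(b'{"level":"info"}\n'))   # plain
```

Because the decision depends only on the leading bytes, the same check applies to files and to stdin, regardless of file extension.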

Reader Threading¶

Sequential Mode:

Spawns background reader thread
Sends lines via bounded channel (1024 line buffer)
Main thread processes lines one at a time
Supports multiline timeout flush (default: 200ms)

Parallel Mode:

Reader batches lines (default: 1000 lines, 200ms timeout)
Worker pool processes batches concurrently
No cross-batch state (impacts multiline, spans)
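
The sequential reader pattern (a background thread feeding a bounded channel) might look roughly like the following Python sketch. The names `reader` and `read_lines` are hypothetical, the multiline timeout flush is left out, and Python's `queue.Queue` stands in for whatever channel type Kelora actually uses.

```python
import io
import queue
import threading

CHANNEL_CAPACITY = 1024  # mirrors the 1024-line buffer mentioned above
_EOF = object()          # sentinel that marks the end of input


def reader(stream, channel: queue.Queue) -> None:
    """Background reader: put() blocks while the channel is full (backpressure)."""
    for line in stream:
        channel.put(line.rstrip("\n"))
    channel.put(_EOF)


def read_lines(stream):
    """Yield lines one at a time on the calling (main) thread."""
    channel: queue.Queue = queue.Queue(maxsize=CHANNEL_CAPACITY)
    threading.Thread(target=reader, args=(stream, channel), daemon=True).start()
    while True:
        item = channel.get()
        if item is _EOF:
            return
        yield item


print(list(read_lines(io.StringIO("a\nb\nc\n"))))  # ['a', 'b', 'c']
```

The bounded queue is what keeps memory usage flat: a fast input source cannot run arbitrarily far ahead of a slow processing stage.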

Layer 2: Line-Level Processing¶

Operations on raw string lines before parsing into events.

Line Skipping (`--skip-lines`)¶

Skip first N lines from input (useful for CSV headers, preambles).

kelora data.csv --skip-lines 1

Line Filtering (`--ignore-lines`, `--keep-lines`)¶

Regex-based filtering on raw lines before parsing:

--ignore-lines <REGEX> - Skip lines matching pattern
--keep-lines <REGEX> - Keep only lines matching pattern

Resilient mode: Skip non-matching lines, continue processing
Strict mode: Abort on regex error

# Ignore health checks before parsing
kelora access.log --ignore-lines 'health-check'

# Keep only lines starting with timestamp
kelora app.log --keep-lines '^\d{4}-\d{2}-\d{2}'
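
As a rough model of what these two flags do to raw lines, the Python sketch below keeps a line only if it matches the keep pattern (when one is given) and does not match the ignore pattern. The helper name `filter_lines` is hypothetical, and unanchored `re.search` matching is an assumption; Kelora's regex engine may differ in details from Python's `re`.

```python
import re
from typing import Iterable, Iterator, Optional


def filter_lines(
    lines: Iterable[str],
    keep: Optional[str] = None,
    ignore: Optional[str] = None,
) -> Iterator[str]:
    """Drop raw lines before any parsing happens (hypothetical helper)."""
    keep_re = re.compile(keep) if keep else None
    ignore_re = re.compile(ignore) if ignore else None
    for line in lines:
        if keep_re and not keep_re.search(line):
            continue  # a keep pattern is set and the line does not match it
        if ignore_re and ignore_re.search(line):
            continue  # the line matches the ignore pattern
        yield line


raw = [
    "2024-01-15 GET /health-check 200",
    "2024-01-15 GET /api/users 500",
    "    at com.example.Handler.run",
]
print(list(filter_lines(raw, keep=r"^\d{4}-\d{2}-\d{2}", ignore="health-check")))
# ['2024-01-15 GET /api/users 500']
```

Filtering at this stage is cheap because rejected lines are never parsed into events.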

Section Selection¶

Extract specific sections from logs based on start/end markers:

Flags:

--section-after <REGEX> - Begin section (exclude marker line)
--section-from <REGEX> - Begin section (include marker line)
--section-through <REGEX> - End section (include marker line)
--section-before <REGEX> - End section (exclude marker line)
--max-sections <N> - Limit number of sections

State Machine:

NotStarted → (match start) → InSection → (match end) → BetweenSections → ...

Example:

# Extract sections between markers
kelora system.log \
    --section-from '=== Test Started ===' \
    --section-through '=== Test Completed ==='
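
The state machine above can be illustrated with a small Python sketch that models the --section-from / --section-through combination (both marker lines included) plus a section limit. The names `State` and `select_sections` are hypothetical, and stopping as soon as the limit is reached is an assumption about how --max-sections behaves.

```python
import re
from enum import Enum, auto
from typing import Iterable, Iterator, Optional


class State(Enum):
    NOT_STARTED = auto()
    IN_SECTION = auto()
    BETWEEN_SECTIONS = auto()


def select_sections(
    lines: Iterable[str],
    start: str,
    end: str,
    max_sections: Optional[int] = None,
) -> Iterator[str]:
    """Yield lines from each start marker through each end marker, inclusive."""
    start_re, end_re = re.compile(start), re.compile(end)
    state = State.NOT_STARTED
    sections = 0
    for line in lines:
        if state is State.IN_SECTION:
            yield line  # includes the end marker line itself
            if end_re.search(line):
                state = State.BETWEEN_SECTIONS
        elif start_re.search(line):
            if max_sections is not None and sections >= max_sections:
                return  # assumption: stop reading once the limit is reached
            sections += 1
            state = State.IN_SECTION
            yield line  # includes the start marker line itself


log = [
    "boot",
    "=== Test Started ===", "step 1", "=== Test Completed ===",
    "idle",
    "=== Test Started ===", "step 2", "=== Test Completed ===",
]
first_only = list(select_sections(log, "=== Test Started ===", "=== Test Completed ===", 1))
print(first_only)
# ['=== Test Started ===', 'step 1', '=== Test Completed ===']
```

The exclusive variants (--section-after, --section-before) would differ only in whether the marker line is yielded.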

Event Aggregation (Multiline)¶

Detects event boundaries to combine multiple lines into single events before parsing.

Four Strategies:

1. Timestamp Strategy (auto-detect timestamp headers)

kelora app.log -M timestamp
kelora app.log -M 'timestamp:format=%Y-%m-%d %H-%M-%S'

Detects lines starting with timestamps as new events. Continuation lines (stack traces, wrapped messages) are appended to current event.

2. Indent Strategy (whitespace continuation)

kelora app.log -M indent

Lines starting with whitespace are continuations of previous event.

3. Regex Strategy (custom patterns)

kelora app.log -M 'regex:match=^\['
kelora app.log -M 'regex:match=^\[:end=^\['

Define custom start/end patterns for event boundaries using match= (required) and end= (optional) segments within the -M argument.

4. All Strategy (entire input as one event)

kelora config.json -M all

Buffers entire input as single event (use for structured files).

Note: The current CLI treats : as an option separator inside the -M value. For regex patterns, encode literal colons (for example \x3A). Timestamp hints that require : currently need pre-normalised input or a regex-based strategy.

Multiline Timeout:

Sequential mode: Flush incomplete events after timeout (default: 200ms)
Parallel mode: Flush at batch boundaries (no timeout)

Critical: Multiline creates event boundaries before parsing. Each complete event string is then parsed into structured data.
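
To make the boundary-detection idea concrete, here is a minimal Python sketch of the indent strategy: a line that starts with whitespace is appended to the current event, and any other line starts a new one. The function name `aggregate_by_indent` is hypothetical, and joining continuation lines with a newline is an assumption about how the combined event string is built.

```python
from typing import Iterable, Iterator, List


def aggregate_by_indent(lines: Iterable[str]) -> Iterator[str]:
    """Combine indented continuation lines with the preceding line."""
    current: List[str] = []
    for line in lines:
        is_continuation = line[:1] in (" ", "\t")
        if is_continuation and current:
            current.append(line)
        else:
            if current:
                yield "\n".join(current)  # previous event is complete
            current = [line]
    if current:
        yield "\n".join(current)  # flush the last event at end of input


lines = ["ERROR boom", "  at a()", "  at b()", "INFO ok"]
print(list(aggregate_by_indent(lines)))
# ['ERROR boom\n  at a()\n  at b()', 'INFO ok']
```

Note that an event is only known to be complete when the next event starts or the input ends, which is why a streaming implementation needs a timeout flush.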

Layer 3: Event-Level Processing¶

Operations on parsed events (maps/objects).

Parsing¶

Convert complete event strings into structured maps:

Event string → Parser → Event map (e.field accessible)

Parsers: json, logfmt, syslog, combined, csv, tsv, cols, etc.

Script Stages (Pipeline Core)¶

User-controlled stages execute exactly where you place them on the CLI:

--filter <EXPR> – Boolean filter (true = keep, false = skip)
--levels/-l <LIST> – Include log levels (comma-separated for OR; use separate flags for progressive filtering)
--exclude-levels/-L <LIST> – Exclude log levels (comma-separated)
--exec <SCRIPT> – Transform/process event
--exec-file <PATH> – Execute script from file (alias: -E)

You can mix and repeat these flags; each stage sees the output of the previous one. For level filtering, use comma-separated values for OR logic (--levels error,warn). Consecutive --levels flags create AND filters (advanced).

Inside --exec, call skip() to drop the current event immediately; later stages and output are skipped, and the event is counted as filtered.

Example:

# Stage 1 (--levels): keep only error and critical events
# Stage 2 (--filter): keep events with status >= 400
# Stage 3 (--exec): mark the surviving 4xx/5xx errors
# Stage 4 (--exclude-levels): remove any downgraded events
# Stage 5 (--exec): track surviving paths
kelora -j app.log \
    --levels error,critical \
    --filter 'e.status >= 400' \
    --exec 'e.alert = true' \
    --exclude-levels debug \
    --exec 'track_freq("path", e.path)'

Each stage processes the output of the previous stage sequentially.
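
One way to picture "stages in CLI order" is as a list of functions applied to each event in turn, where a stage returning `None` drops the event and short-circuits the rest. The Python sketch below is a conceptual model only; `levels`, `filter_stage`, `exec_stage`, and `run_pipeline` are hypothetical names, and exact-match level comparison is a simplifying assumption.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Event = Dict[str, object]
Stage = Callable[[Event], Optional[Event]]  # returning None means "dropped"


def levels(*allowed: str) -> Stage:
    return lambda e: e if e.get("level") in allowed else None


def filter_stage(predicate: Callable[[Event], bool]) -> Stage:
    return lambda e: e if predicate(e) else None


def exec_stage(action: Callable[[Event], None]) -> Stage:
    def run(e: Event) -> Optional[Event]:
        action(e)  # mutate the event in place, like an --exec script
        return e
    return run


def run_pipeline(events: Iterable[Event], stages: List[Stage]) -> Iterator[Event]:
    for event in events:
        for stage in stages:
            event = stage(event)
            if event is None:
                break  # later stages never see a dropped event
        else:
            yield event


stages = [
    levels("error", "critical"),
    filter_stage(lambda e: e["status"] >= 400),
    exec_stage(lambda e: e.update(alert=True)),
]
events = [
    {"level": "error", "status": 500},
    {"level": "info", "status": 500},
    {"level": "error", "status": 200},
]
print(list(run_pipeline(events, stages)))
# [{'level': 'error', 'status': 500, 'alert': True}]
```

Reordering the `stages` list changes which events each stage sees, which is the same effect as reordering the flags on the command line.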

Complete Stage Ordering¶

User-controlled stages (run in the order you specify them on the CLI):

--filter, --levels, --exclude-levels, --exec, --exec-file, --assert

Fixed-position event stages (always run after user-controlled stages, regardless of CLI order):

Timestamp filtering – --since, --until
Timestamp conversion – --normalize-ts
Drain summary – --drain (sequential only)
Key filtering – --keys, --exclude-keys
Take limit – --take

Place --levels before heavy transforms to prune work early, or add another --levels after a script if you synthesise a level field there.

Span Processing¶

Groups events into spans for aggregation:

Diagram summary: Kelora assigns span metadata before user stages run, but only emitted events are stored in span.events. Dropped events are marked as filtered and excluded from the span buffer. When a span closes, --span-close runs once with the span binding and per-span metric deltas.

Count-based Spans:

kelora -j app.log --span 100 \
    --span-close 'print("Span complete: " + meta.span_id)'

Closes a span after every N events that pass the filters.

Time-based Spans:

kelora -j app.log --span 5m \
    --span-close 'track_sum("requests", span.size)'

Closes span on aligned time windows (5m, 1h, 30s, etc.).
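
For intuition about "aligned" windows, the arithmetic can be sketched as follows, assuming windows are aligned to multiples of the window width counted from the Unix epoch. That alignment rule and the helper name `window_bounds` are assumptions for illustration; the page does not spell out the exact rule.

```python
from typing import Tuple


def window_bounds(ts_seconds: int, width_seconds: int) -> Tuple[int, int]:
    """Return the [start, end) bounds of the aligned window containing ts."""
    start = ts_seconds - (ts_seconds % width_seconds)
    return start, start + width_seconds


# A 5-minute (300 s) window for an event at epoch second 1_700_000_123.
print(window_bounds(1_700_000_123, 300))  # (1700000100, 1700000400)
```

Under this assumption, two events land in the same span exactly when their timestamps fall into the same bucket, independent of when processing started.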

Span Processing Flow:

Parser creates an event map
Span processor assigns pending span metadata (meta.span_id, meta.span_status, meta.span_start, meta.span_end)
User stages and fixed event stages run
Only emitted events are recorded in span.events; dropped events are marked filtered and excluded
When span closes → --span-close hook executes with span and metrics

Constraints:

Spans force sequential mode (incompatible with --parallel)
Span state maintained across events
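
The count-based flow can be modelled with a short Python sketch in which only events that survive filtering enter the span buffer, and a close hook runs once per full span. The names `count_spans`, `keep`, and `on_close` are hypothetical, and flushing a final partial span at end of input is an assumption.

```python
from typing import Callable, Iterable, List


def count_spans(
    events: Iterable[object],
    keep: Callable[[object], bool],
    size: int,
    on_close: Callable[[int, List[object]], None],
) -> None:
    """Group emitted events into spans of `size` and call on_close per span."""
    span_id = 0
    buffered: List[object] = []
    for event in events:
        if not keep(event):
            continue  # dropped events never enter the span buffer
        buffered.append(event)
        if len(buffered) == size:
            on_close(span_id, buffered)
            span_id += 1
            buffered = []
    if buffered:
        on_close(span_id, buffered)  # assumption: flush the last partial span


closed = []
count_spans(range(7), lambda n: n % 2 == 0, 2, lambda i, evs: closed.append((i, evs)))
print(closed)  # [(0, [0, 2]), (1, [4, 6])]
```

The running `span_id` and `buffered` list are exactly the kind of cross-event state that cannot be split across independent parallel workers.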

Begin and End Stages¶

--begin: Execute once before processing any events
--end: Execute once after all events processed

kelora -j app.log \
    --begin 'print("Starting analysis")' \
    --exec 'track_freq("service", e.service)' \
    --end 'print("Services seen: " + metrics.len())' \
    --metrics

In parallel mode:

--begin runs sequentially before worker pool starts
--end runs sequentially after workers complete (with merged metrics)

Context Lines¶

Show surrounding lines around matches:

--before-context N / -B N - Show N lines before match
--after-context N / -A N - Show N lines after match
--context N / -C N - Show N lines before and after

Requires active filtering (--filter, --levels, --since, etc.).

kelora -j app.log \
    --filter 'e.level == "ERROR"' \
    --before-context 2 \
    --after-context 2
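
Context output needs a small amount of cross-event state: a ring buffer of recent non-matching events and a countdown after each match. The Python sketch below shows one way to do this; `with_context` is a hypothetical name and the details (for example, how overlapping contexts are handled) are assumptions rather than Kelora's documented behavior.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_context(
    events: Iterable[T],
    matches: Callable[[T], bool],
    before: int = 0,
    after: int = 0,
) -> Iterator[T]:
    """Yield matching events plus `before`/`after` neighbours, without duplicates."""
    history: deque = deque(maxlen=before)  # most recent events not yet emitted
    remaining_after = 0
    for event in events:
        if matches(event):
            yield from history  # emit buffered "before" context
            history.clear()
            yield event
            remaining_after = after
        elif remaining_after > 0:
            yield event  # "after" context for the previous match
            remaining_after -= 1
        else:
            history.append(event)


print(list(with_context(range(10), lambda n: n == 5, before=2, after=2)))
# [3, 4, 5, 6, 7]
```

Because the buffer depends on input order, this kind of feature only gives correct results when events are processed sequentially.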

Output Stage¶

Format and emit events:

Apply --keys field selection
Convert timestamps (--normalize-ts, --show-ts-local, --show-ts-utc)
Format output (--output-format: default, json, csv, etc.)
Apply --take limit
Write to stdout or files

kelora -j app.log \
    --keys timestamp,service,message \
    -F json \
    --take 100

Parallel Processing Model¶

Kelora's --parallel mode is batch-parallel, not stage-parallel.

Diagram summary: Parallel mode batches raw lines. Without multiline, workers receive line batches directly. With multiline enabled, a chunker thread builds complete event strings before worker processing. Workers run independent pipeline instances, then Kelora merges results, ordering, metrics, and stats.

Architecture¶

Sequential:  Line → Line filters → Multiline → Parse → Script stages → Output
             (one at a time)

Parallel:    Lines → Line filters → Line batches
             Without multiline: batches → Worker pool
             With multiline: batches → Chunker thread → Event batches → Worker pool
             Each worker: Parse → Script stages → Fixed stages
             Results → Ordering buffer → Output

Where:

Line filters = --skip-lines, section selection, --keep-lines, --ignore-lines
Multiline = Event boundary detection (aggregates multiple lines into events)
Script stages = --filter, --levels, --exclude-levels, --exec, --assert, and --exec-file in CLI order

How It Works¶

Reader thread reads lines from input
Batcher thread applies early line processing and batches surviving lines (default: 1000 lines, 200ms timeout)
With multiline enabled, a chunker thread converts line batches into complete event strings before worker processing
Worker pool processes batches independently (default: CPU count workers)
Each worker has its own Pipeline instance
Results merged with ordering preservation (default) or unordered (--unordered)
Stats/metrics merged from all workers

Configuration:

kelora -j large.log \
    --parallel \
    --threads 8 \
    --batch-size 2000 \
    --batch-timeout 500
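
The batch-parallel idea (every worker runs the whole per-event pipeline on its own batch) can be approximated in Python with a thread pool. This is a conceptual sketch, not Kelora's implementation: `batches`, `process_batch`, and `run_parallel` are hypothetical names, the batch timeout is omitted, and the filter/exec logic inside the worker is just example work.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Iterator, List


def batches(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group lines into fixed-size batches (the last one may be shorter)."""
    batch: List[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def process_batch(batch: List[str]) -> List[Dict]:
    """One worker: parse, filter, and transform every line of its batch."""
    out = []
    for line in batch:
        event = json.loads(line)        # parse
        if event["status"] >= 400:      # filter stage
            event["alert"] = True       # exec stage
            out.append(event)
    return out


def run_parallel(lines: Iterable[str], batch_size: int = 1000, threads: int = 4) -> Iterator[Dict]:
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map returns results in submission order, so output order
        # matches input order even when batches finish out of order.
        for result in pool.map(process_batch, batches(lines, batch_size)):
            yield from result


lines = [json.dumps({"id": i, "status": 200 + (i % 2) * 300}) for i in range(10)]
print([e["id"] for e in run_parallel(lines, batch_size=3, threads=2)])
# [1, 3, 5, 7, 9]
```

Since no worker sees another worker's batch, any logic that needs state across events has to live outside this model.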

Constraints and Tradeoffs¶

Incompatible Features:

Spans - Cannot maintain span state across batches (forces sequential)
Cross-event context (--window, -A/-B/-C) - Order-dependent across batches, so it also forces sequential mode. Kelora ignores --parallel/--threads/--batch-size (with a warning) when these are set; otherwise per-worker buffers would silently return wrong cross-event results.

Multiline Behavior:

With multiline enabled, a dedicated chunker thread creates complete event strings before workers run
This avoids splitting multiline events at line-batch boundaries
Multiline workloads may still parallelize less efficiently because chunking is a shared upstream step

Ordering:

Default: Preserve input order (adds overhead)
--unordered: Trade ordering for maximum throughput
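
The cost of ordered output comes from having to hold back finished batches until all earlier batches have been emitted. A minimal ordering buffer could look like the Python sketch below, assuming each result is tagged with a sequential batch index starting at 0; the name `reorder` and the tagging scheme are illustrative assumptions.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def reorder(results: Iterable[Tuple[int, str]]) -> Iterator[str]:
    """Emit payloads in batch-index order, buffering those that arrive early."""
    pending: List[Tuple[int, str]] = []  # min-heap keyed by batch index
    next_index = 0
    for index, payload in results:
        heapq.heappush(pending, (index, payload))
        # Release every buffered batch that is now next in line.
        while pending and pending[0][0] == next_index:
            yield heapq.heappop(pending)[1]
            next_index += 1


arrivals = [(2, "c"), (0, "a"), (1, "b"), (3, "d")]
print(list(reorder(arrivals)))  # ['a', 'b', 'c', 'd']
```

With unordered output, the buffer disappears and each result can be written as soon as it arrives, which is where the extra throughput comes from.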

Best For:

Large files with independent events
CPU-bound transformations (regex, hashing, calculations)
High-throughput batch processing

Not Ideal For:

Real-time streaming (use sequential)
Cross-event analysis (use spans in sequential mode)
Small files (overhead exceeds benefit)

Metrics and Statistics¶

Kelora maintains two tracking systems:

User Metrics (`--metrics`)¶

Populated by Rhai functions in --exec scripts:

kelora -j app.log \
    --exec 'track_freq("service", e.service)' \
    --exec 'track_sum("total_bytes", e.bytes)' \
    --exec 'track_unique("users", e.user_id)' \
    --metrics

Available Functions:

track_freq(name, value) - Count occurrences per category
track_sum(key, value) - Sum values
track_min(key, value) - Track minimum value
track_max(key, value) - Track maximum value
track_unique(key, value) - Collect unique values (exact, stores all)
track_cardinality(key, value) - Estimate unique count (HyperLogLog, ~1% error)
track_top(name, item [, n]) - Track most frequent items

Access in --end stage:

kelora -j app.log \
    --exec 'track_freq("service", e.service)' \
    --end 'print("Total services: " + metrics.len())' \
    --metrics

Output:

Printed to stderr with --metrics
Written to JSON file with --metrics-file metrics.json

Internal Statistics (`--stats`)¶

Auto-collected counters:

events_created - Parsed events
events_output - Output events
events_filtered - Filtered events
discovered_levels - Log levels seen
discovered_keys - Field names seen
Parse errors, filter errors, etc.

kelora -j app.log --stats

Parallel Metrics Merging¶

In parallel mode:

Each worker maintains local tracking state
GlobalTracker merges worker states after processing:
Counters: summed
Unique sets: unioned
Averages: recomputed from sums and counts
Merged metrics available in --end stage
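
The merge rules listed above can be illustrated with a small Python sketch. The `WorkerMetrics` structure and `merge` function are hypothetical stand-ins for per-worker tracking state; the point is that averages are recomputed from merged sums and counts rather than averaged per worker.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set, Tuple


@dataclass
class WorkerMetrics:
    counts: Counter = field(default_factory=Counter)  # e.g. frequency counters
    unique: Set[str] = field(default_factory=set)     # e.g. unique value sets
    total: float = 0.0                                # running sum for an average
    n: int = 0                                        # number of samples in the sum


def merge(workers: Iterable[WorkerMetrics]) -> Tuple[WorkerMetrics, Optional[float]]:
    merged = WorkerMetrics()
    for w in workers:
        merged.counts.update(w.counts)  # counters: summed
        merged.unique |= w.unique       # unique sets: unioned
        merged.total += w.total
        merged.n += w.n
    # Average recomputed from merged sum and count, not from per-worker averages.
    average = merged.total / merged.n if merged.n else None
    return merged, average


a = WorkerMetrics(Counter(api=2, web=1), {"u1", "u2"}, total=30.0, n=3)
b = WorkerMetrics(Counter(api=1), {"u2", "u3"}, total=20.0, n=1)
merged, average = merge([a, b])
print(merged.counts["api"], sorted(merged.unique), average)
# 3 ['u1', 'u2', 'u3'] 12.5
```

In this example the per-worker averages are 10 and 20, so naively averaging them would give 15; recomputing from sums and counts gives the correct 12.5.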

Error Handling¶

Resilient Mode (Default)¶

Parse errors: Skip line, continue processing
Filter errors: Treat as false, skip event
Transform errors: Return original event unchanged
Summary: Show recovered runtime errors as warnings at end
Exit code: Recovered filter/exec errors exit 0; use --strict to fail on them

kelora -j app.log --verbose  # Show errors as they occur

Strict Mode (`--strict`)¶

Any error: Abort immediately with exit code 1
No summary: Program exits on first error

kelora -j app.log --strict
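
The difference between the two modes can be sketched as two small wrappers: in resilient mode a failing filter counts as false and a failing transform leaves the event untouched, while in strict mode the error propagates. The function names and the copy-then-commit approach are illustrative assumptions, not Kelora's actual implementation.

```python
import copy
from typing import Callable, Dict

Event = Dict[str, object]


def apply_filter(predicate: Callable[[Event], bool], event: Event, strict: bool = False) -> bool:
    try:
        return bool(predicate(event))
    except Exception:
        if strict:
            raise          # strict: abort on the first error
        return False       # resilient: a failing filter counts as false


def apply_transform(action: Callable[[Event], None], event: Event, strict: bool = False) -> Event:
    candidate = copy.deepcopy(event)  # work on a copy so failures leave no partial edits
    try:
        action(candidate)
        return candidate
    except Exception:
        if strict:
            raise
        return event       # resilient: original event returned unchanged


def broken(e: Event) -> None:
    e["partial"] = True
    raise RuntimeError("script failed")


print(apply_filter(lambda e: e["missing"] > 1, {"a": 1}))  # False
print(apply_transform(broken, {"a": 1}))                    # {'a': 1}
```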

Verbosity Levels¶

-v / --verbose - Show detailed errors (level 1)
-vv - More verbose (level 2)
-vvv - Maximum verbosity (level 3)

Quiet/Output Modes¶

-q / --quiet - Suppress events
--no-diagnostics - Suppress diagnostics (fatal line still emitted)
--silent - Suppress pipeline terminal output (events, diagnostics, stats, terminal metrics). Script output is still allowed unless you add --no-script-output or use a data-only mode. A single fatal line is still emitted on errors, and metrics files are still written.

Complete Data Flow¶

┌─────────────────────────────────────────┐
│  Layer 1: Input                         │
├─────────────────────────────────────────┤
│  • Stdin or Files (--file-order)        │
│  • Automatic decompression (gzip/zstd)  │
│  • Reader thread spawning               │
└──────────────┬──────────────────────────┘
               │ Raw lines
┌──────────────▼──────────────────────────┐
│  Layer 2: Line-Level Processing         │
├─────────────────────────────────────────┤
│  • --skip-lines (skip first N)          │
│  • --section-from/through (sections)    │
│  • --ignore-lines/--keep-lines (regex)  │
│  • Multiline chunker (event boundaries) │
└──────────────┬──────────────────────────┘
               │ Complete event strings
┌──────────────▼──────────────────────────┐
│  Layer 3: Event-Level Processing        │
├─────────────────────────────────────────┤
│  • Parser → Event map                   │
│  • Span preparation (assign span_id)    │
│  • User stages in CLI order             │
│    - --filter/--levels/--exec/--assert  │
│  • Fixed event stages                   │
│    - --since/--until, --keys, etc.      │
│  • Take limiter (--take)                │
│  • Span close hooks (--span-close)      │
│  • Output formatting                    │
└──────────────┬──────────────────────────┘
               │ Formatted output
               ▼
           stdout/files

Parallel Mode Differences:

Reader → line filtering/batching → optional multiline chunker → Worker pool
Each worker independently:
  - Parses event strings
  - Runs event-level processing
Results → Ordering buffer → Merged output
Metrics → GlobalTracker → Merged stats

Performance Characteristics¶

Streaming¶

Low memory usage - Events processed and discarded
Real-time capable - Works with tail -f and live streams
No lookahead - Cannot access future events (except with --window)

Sequential vs Parallel¶

Sequential (default):

Events processed in order
Lower memory usage
Predictable output order
Supports spans and cross-event state
Best for streaming and interactive use

Parallel (--parallel):

Events processed in batches across cores
Higher throughput for CPU-bound work
Higher memory usage (batching + worker pools)
Limited cross-event features
Best for batch processing large files

Optimization Tips¶

Early filtering:

# Good: Cheap filters first
kelora -j app.log \
    --levels error \
    --filter 'e.message.matches(r"expensive.*regex")'

# Less efficient: Expensive filter on all events
kelora -j app.log \
    --filter 'e.message.matches(r"expensive.*regex")' \
    --levels error

Use --keys to reduce output processing:

kelora -j app.log --keys timestamp,message -F json

Parallel for CPU-bound transformations:

kelora -j large.log \
    --parallel \
    --exec 'e.hash = e.content.hash("sha256")' \
    --batch-size 1000

Use --take for quick exploration:

kelora -j large.log --take 100
