Processing Architecture

Understanding how Kelora processes logs through its multi-layer architecture.

Overview

Kelora's processing model consists of three distinct layers operating on different data types:

  1. Input Layer - File/stdin handling and decompression
  2. Line-Level Processing - Raw string filtering and event boundary detection
  3. Event-Level Processing - Structured data transformation and output

This layered architecture enables efficient streaming with low memory usage while supporting both sequential and parallel processing modes.

Figure: Kelora Pipeline Architecture


Layer 1: Input Layer

Input Sources

Stdin Mode:

  • Activated when no files are specified or a file argument is "-"
  • Background thread reads from stdin via channel
  • Supports one stdin source (error if "-" appears multiple times)
  • Useful for piping: tail -f app.log | kelora -j

File Mode:

  • Processes one or more files sequentially
  • Tracks current filename for context
  • Supports --file-order to control processing sequence:
      • cli (default) - Process in CLI argument order
      • name - Sort alphabetically
      • mtime - Sort by modification time (oldest first)

Examples:

# Stdin mode
tail -f app.log | kelora -j

# File mode with ordering
kelora *.log --file-order mtime

# Mixed stdin and files
kelora file1.log - file2.log  # stdin in middle

Automatic Decompression

Kelora automatically detects and decompresses compressed input using magic-byte detection rather than file extensions:

Supported Formats:

  • Gzip - Magic bytes 1F 8B 08 (.gz files or gzipped stdin)
  • Zstd - Magic bytes 28 B5 2F FD (.zst files or zstd stdin)
  • Plain - No magic bytes, passthrough

Behavior:

  • Transparent decompression before any processing
  • Works on both files and stdin
  • ZIP files explicitly rejected with error message
  • Decompression happens in Input Layer

Examples:

kelora app.log.gz                    # Auto-detected gzip
kelora app.log.zst --parallel        # Auto-detected zstd
gzip -c app.log | kelora -j          # Gzipped stdin

Reader Threading

Sequential Mode:

  • Spawns background reader thread
  • Sends lines via bounded channel (1024 line buffer)
  • Main thread processes lines one at a time
  • Supports multiline timeout flush (default: 200ms)

Parallel Mode:

  • Reader batches lines (default: 1000 lines, 200ms timeout)
  • Worker pool processes batches concurrently
  • No cross-batch state (impacts multiline, spans)
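
For example, the same input can run through either reader path (a minimal sketch using only flags documented on this page):

# Sequential (default): a background reader feeds the main thread line by line
kelora -j app.log

# Parallel: the reader batches lines for the worker pool
kelora -j app.log --parallel --batch-size 1000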

Layer 2: Line-Level Processing

Operations on raw string lines before parsing into events.

Line Skipping (--skip-lines)

Skip the first N lines of input (useful for CSV headers or preambles).

kelora data.csv --skip-lines 1

Line Filtering (--ignore-lines, --keep-lines)

Regex-based filtering on raw lines before parsing:

  • --ignore-lines <REGEX> - Skip lines matching pattern
  • --keep-lines <REGEX> - Keep only lines matching pattern

Resilient mode: Skip non-matching lines, continue processing
Strict mode: Abort on regex error

# Ignore health checks before parsing
kelora access.log --ignore-lines 'health-check'

# Keep only lines starting with timestamp
kelora app.log --keep-lines '^\d{4}-\d{2}-\d{2}'

Section Selection

Extract specific sections from logs based on start/end markers:

Flags:

  • --section-after <REGEX> - Begin section (exclude marker line)
  • --section-from <REGEX> - Begin section (include marker line)
  • --section-through <REGEX> - End section (include marker line)
  • --section-before <REGEX> - End section (exclude marker line)
  • --max-sections <N> - Limit number of sections

State Machine:

NotStarted → (match start) → InSection → (match end) → BetweenSections → ...

Example:

# Extract sections between markers
kelora system.log \
    --section-from '=== Test Started ===' \
    --section-through '=== Test Completed ==='
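
--max-sections caps how many sections are extracted. For example, reusing the markers above:

# Stop after the first two test sections
kelora system.log \
    --section-from '=== Test Started ===' \
    --section-through '=== Test Completed ===' \
    --max-sections 2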

Event Aggregation (Multiline)

Detects event boundaries to combine multiple lines into single events before parsing.

Four Strategies:

1. Timestamp Strategy (auto-detect timestamp headers)

kelora app.log -M timestamp
kelora app.log -M 'timestamp:format=%Y-%m-%d %H-%M-%S'
Detects lines starting with timestamps as new events. Continuation lines (stack traces, wrapped messages) are appended to the current event.

2. Indent Strategy (whitespace continuation)

kelora app.log -M indent
Lines starting with whitespace are continuations of previous event.

3. Regex Strategy (custom patterns)

kelora app.log -M 'regex:match=^\['
kelora app.log -M 'regex:match=^\[:end=^\['
Define custom start/end patterns for event boundaries using match= (required) and end= (optional) segments within the -M argument.

4. All Strategy (entire input as one event)

kelora config.json -M all
Buffers the entire input as a single event (useful for structured files).

Note: The current CLI treats : as an option separator inside the -M value. For regex patterns, encode literal colons (for example \x3A). Timestamp hints that require : currently need pre-normalised input or a regex-based strategy.

Multiline Timeout:

  • Sequential mode: Flush incomplete events after timeout (default: 200ms)
  • Parallel mode: Flush at batch boundaries (no timeout)

Critical: Multiline creates event boundaries before parsing. Each complete event string is then parsed into structured data.


Layer 3: Event-Level Processing

Operations on parsed events (maps/objects).

Parsing

Convert complete event strings into structured maps:

Event string → Parser → Event map (e.field accessible)

Parsers: json, logfmt, syslog, combined, csv, tsv, cols, etc.
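
A minimal sketch of this step, using the JSON parser (-j) as elsewhere on this page (the input line and field names are illustrative):

# The JSON line becomes an event map; fields are addressable as e.<field>
echo '{"level":"info","msg":"ready"}' | kelora -j --filter 'e.level == "info"'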

Script Stages (Pipeline Core)

User-controlled stages execute exactly where you place them on the CLI:

  • --filter <EXPR> – Boolean filter (true = keep, false = skip)
  • --levels/-l <LIST> – Include log levels (case-insensitive, repeatable)
  • --exclude-levels/-L <LIST> – Exclude log levels (case-insensitive, repeatable)
  • --exec <SCRIPT> – Transform/process event
  • --exec-file <PATH> – Execute script from file (alias: -E)

You can mix and repeat these flags; each stage sees the output of the previous one.

Inside --exec, call skip() to drop the current event immediately; later stages and output are skipped, and the event is counted as filtered.

Example:

kelora -j app.log \
    --levels error,critical \
    --filter 'e.status >= 400' \
    --exec 'e.alert = true' \
    --exclude-levels debug \
    --exec 'track_count(e.path)'

Stage 1 keeps only error/critical events, stage 2 keeps statuses of 400 and above, stage 3 flags the survivors, stage 4 drops any event downgraded to debug, and stage 5 tracks the remaining paths.

Each stage processes the output of the previous stage sequentially.

Complete Stage Ordering

User-controlled stages (run in the order you specify them on the CLI):

  1. --filter, --levels, --exclude-levels, --exec, --exec-file

Fixed-position filters (always run after user-controlled stages, regardless of CLI order):

  1. Timestamp filtering - --since, --until
  2. Key filtering - --keys, --exclude-keys

Place --levels before heavy transforms to prune work early, or add another --levels after a script if you synthesise a level field there.
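
For example (a sketch; the --since argument is assumed here to accept an ISO-style timestamp):

# --levels runs where written; --since and --keys always run afterwards
kelora -j app.log \
    --levels error \
    --exec 'e.alert = true' \
    --since '2024-01-01T00:00:00Z' \
    --keys timestamp,level,message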

Span Processing

Groups events into spans for aggregation:

Count-based Spans:

kelora -j app.log --span 100 \
    --span-close 'print("Span complete: " + meta.span_id)'
Closes span every N events that pass filters.

Time-based Spans:

kelora -j app.log --span 5m \
    --span-close 'track_sum("requests", span.size)'
Closes span on aligned time windows (5m, 1h, 30s, etc.).

Span Processing Flow:

  1. Event passes through filters/execs
  2. Span processor assigns span_id and SpanStatus
  3. Event processed with span context
  4. When span closes → --span-close hook executes
  5. Hook has access to meta.span_id, meta.span_start, meta.span_end, metrics
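
The close hook can read those fields directly; a sketch, assuming the span metadata stringifies cleanly:

kelora -j app.log --span 1m \
    --span-close 'print(`${meta.span_id}: ${meta.span_start} .. ${meta.span_end}`)'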

Constraints:

  • Spans force sequential mode (incompatible with --parallel)
  • Span state maintained across events

Begin and End Stages

  • --begin - Execute once before processing any events
  • --end - Execute once after all events are processed

kelora -j app.log \
    --begin 'print("Starting analysis")' \
    --exec 'track_count(e.service)' \
    --end 'print("Services seen: " + metrics.len())' \
    --metrics

In parallel mode:

  • --begin runs sequentially before worker pool starts
  • --end runs sequentially after workers complete (with merged metrics)

Context Lines

Show surrounding lines around matches:

  • --before-context N / -B N - Show N lines before match
  • --after-context N / -A N - Show N lines after match
  • --context N / -C N - Show N lines before and after

Requires active filtering (--filter, --levels, --since, etc.).

kelora -j app.log \
    --filter 'e.level == "ERROR"' \
    --before-context 2 \
    --after-context 2

Output Stage

Format and emit events:

  • Apply --keys field selection
  • Convert timestamps (--normalize-ts, --show-ts-local, --show-ts-utc)
  • Format output (--output-format: default, json, csv, etc.)
  • Apply --take limit
  • Write to stdout or files
kelora -j app.log \
    --keys timestamp,service,message \
    -F json \
    --take 100

Parallel Processing Model

Kelora's --parallel mode is batch-parallel, not stage-parallel.

Architecture

Sequential:  Line → Line filters → Multiline → Parse → Script stages → Output
             (one at a time)

Parallel:    Batch of lines → Worker pool
             Each worker: Line filters → Multiline → Parse → Script stages
             Results → Ordering buffer → Output

Where:

  • Line filters = --skip-lines, --ignore-lines, --section-from, etc.
  • Multiline = Event boundary detection (aggregates multiple lines into events)
  • Script stages = --filter and --exec in CLI order

How It Works

  1. Reader thread batches lines (default: 1000 lines, 200ms timeout)
  2. Worker pool processes batches independently (default: CPU count workers)
  3. Each worker has its own Pipeline instance
  4. Results merged with ordering preservation (default) or unordered (--unordered)
  5. Stats/metrics merged from all workers

Configuration:

kelora -j large.log \
    --parallel \
    --threads 8 \
    --batch-size 2000 \
    --batch-timeout 500

Constraints and Tradeoffs

Incompatible Features:

  • Spans - Cannot maintain span state across batches (forces sequential)
  • Cross-event context - Each batch processed independently

Multiline Behavior:

  • Multiline chunking happens per-batch
  • Event boundaries may not span batch boundaries
  • Consider larger batch sizes for multiline workloads

Ordering:

  • Default: Preserve input order (adds overhead)
  • --unordered: Trade ordering for maximum throughput
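
For example:

# Trade output ordering for maximum throughput on a batch job
kelora -j large.log --parallel --unordered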

Best For:

  • Large files with independent events
  • CPU-bound transformations (regex, hashing, calculations)
  • High-throughput batch processing

Not Ideal For:

  • Real-time streaming (use sequential)
  • Cross-event analysis (use spans in sequential mode)
  • Small files (overhead exceeds benefit)

Metrics and Statistics

Kelora maintains two tracking systems:

User Metrics (--metrics)

Populated by Rhai functions in --exec scripts:

kelora -j app.log \
    --exec 'track_count(e.service)' \
    --exec 'track_sum("total_bytes", e.bytes)' \
    --exec 'track_unique("users", e.user_id)' \
    --metrics

Available Functions:

  • track_count(key) - Increment counter
  • track_sum(key, value) - Sum values
  • track_min(key, value) - Track minimum value
  • track_max(key, value) - Track maximum value
  • track_unique(key, value) - Collect unique values
  • track_bucket(key, bucket) - Track values in buckets
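
The example above covers track_count, track_sum, and track_unique. A sketch of the remaining trackers, assuming e.duration_ms and e.status are numeric fields in your logs:

kelora -j app.log \
    --exec 'track_min("latency_min_ms", e.duration_ms)' \
    --exec 'track_max("latency_max_ms", e.duration_ms)' \
    --exec 'track_bucket("status_class", e.status / 100)' \
    --metrics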

Access in --end stage:

kelora -j app.log \
    --exec 'track_count(e.service)' \
    --end 'print("Total services: " + metrics.len())' \
    --metrics

Output:

  • Printed to stderr with --metrics
  • Written to JSON file with --metrics-file metrics.json

Internal Statistics (--stats)

Auto-collected counters:

  • events_created - Parsed events
  • events_output - Output events
  • events_filtered - Filtered events
  • discovered_levels - Log levels seen
  • discovered_keys - Field names seen
  • Parse errors, filter errors, etc.
kelora -j app.log --stats

Parallel Metrics Merging

In parallel mode:

  • Each worker maintains local tracking state
  • GlobalTracker merges worker states after processing:
      • Counters: summed
      • Unique sets: unioned
      • Averages: recomputed from sums and counts
  • Merged metrics available in --end stage
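
For example, per-worker counts are merged before --end runs:

kelora -j large.log --parallel \
    --exec 'track_count(e.service)' \
    --end 'print("Services seen: " + metrics.len())' \
    --metrics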

Error Handling

Resilient Mode (Default)

  • Parse errors: Skip line, continue processing
  • Filter errors: Treat as false, skip event
  • Transform errors: Return original event unchanged
  • Summary: Show error count at end
kelora -j app.log --verbose  # Show errors as they occur

Strict Mode (--strict)

  • Any error: Abort immediately with exit code 1
  • No summary: Program exits on first error
kelora -j app.log --strict

Verbosity Levels

  • -v / --verbose - Show detailed errors (level 1)
  • -vv - More verbose (level 2)
  • -vvv - Maximum verbosity (level 3)

Quiet/Output Modes

  • -q / --quiet - Suppress events
  • --no-diagnostics - Suppress diagnostics (fatal line still emitted)
  • --silent - Suppress all pipeline terminal output (events, diagnostics, stats, terminal metrics); script output is still allowed unless you add --no-script-output or use a data-only mode; one fatal line is emitted on errors, and metrics files are still written
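
For example, a data-only run that keeps the terminal quiet but still persists metrics:

kelora -j app.log --silent \
    --exec 'track_count(e.level)' \
    --metrics-file metrics.json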

Complete Data Flow

┌─────────────────────────────────────────┐
│  Layer 1: Input                         │
├─────────────────────────────────────────┤
│  • Stdin or Files (--file-order)        │
│  • Automatic decompression (gzip/zstd)  │
│  • Reader thread spawning               │
└──────────────┬──────────────────────────┘
               │ Raw lines
┌──────────────▼──────────────────────────┐
│  Layer 2: Line-Level Processing         │
├─────────────────────────────────────────┤
│  • --skip-lines (skip first N)          │
│  • --section-from/through (sections)    │
│  • --ignore-lines/--keep-lines (regex)  │
│  • Multiline chunker (event boundaries) │
└──────────────┬──────────────────────────┘
               │ Complete event strings
┌──────────────▼──────────────────────────┐
│  Layer 3: Event-Level Processing        │
├─────────────────────────────────────────┤
│  • Parser → Event map                   │
│  • Span preparation (assign span_id)    │
│  • Script stages (--filter/--exec)      │
│    - User stages in CLI order           │
│    - Timestamp filtering (--since)      │
│    - Level filtering (--levels)         │
│    - Key filtering (--keys)             │
│  • Span close hooks (--span-close)      │
│  • Output formatting                    │
└──────────────┬──────────────────────────┘
               │ Formatted output
           stdout/files

Parallel Mode Differences:

Line batching (1000 lines) → Worker pool
Each worker independently:
  - Line-level processing
  - Event-level processing
Results → Ordering buffer → Merged output
Metrics → GlobalTracker → Merged stats


Performance Characteristics

Streaming

  • Low memory usage - Events processed and discarded
  • Real-time capable - Works with tail -f and live streams
  • No lookahead - Cannot access future events (except with --window)

Sequential vs Parallel

Sequential (default):

  • Events processed in order
  • Lower memory usage
  • Predictable output order
  • Supports spans and cross-event state
  • Best for streaming and interactive use

Parallel (--parallel):

  • Events processed in batches across cores
  • Higher throughput for CPU-bound work
  • Higher memory usage (batching + worker pools)
  • Limited cross-event features
  • Best for batch processing large files

Optimization Tips

Early filtering:

# Good: Cheap filters first
kelora -j app.log \
    --levels error \
    --filter 'e.message.matches(r"expensive.*regex")'

# Less efficient: Expensive filter on all events
kelora -j app.log \
    --filter 'e.message.matches(r"expensive.*regex")' \
    --levels error

Use --keys to reduce output processing:

kelora -j app.log --keys timestamp,message -F json

Parallel for CPU-bound transformations:

kelora -j large.log \
    --parallel \
    --exec 'e.hash = e.content.hash("sha256")' \
    --batch-size 1000

Use --take for quick exploration:

kelora -j large.log --take 100


See Also