# Multiline Strategies
Kelora can treat clusters of lines as a single event so stack traces, YAML payloads, JSON blobs, and other multi-line records stay intact. This page explains how multiline detection fits into the pipeline and how to pick the right strategy for your data set.
## Why Multiline Matters
- Application errors often spill over multiple lines (Java stack traces, Python tracebacks, Go panics).
- Structured payloads such as JSON, YAML, or CEF frequently span multiple lines when logged with indentation.
- Batch systems may wrap related log entries between explicit boundary markers like `BEGIN`/`END`.
Without multiline detection, Kelora parses each physical line as its own event, making it hard to correlate context.
## How Multiline Processing Works
1. Pre-parse stage – Multiline runs before the input parser. The chunker groups input lines into blocks according to the configured strategy.
2. Parsing – The aggregated block is fed into the selected parser (`-f`). Use `-f raw` when you want to keep the block exactly as-is, including newlines.
3. Downstream pipeline – Filters, exec scripts, and formatters see the aggregated event exactly once (see the example below).
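For example, a minimal invocation that exercises all three stages might look like this (the log file name is hypothetical; every flag appears elsewhere on this page):

```bash
# 1) --multiline indent groups indented continuation lines into blocks,
# 2) -f raw parses each block verbatim (newlines preserved),
# 3) the filter and formatter each see one aggregated event per block.
kelora -f raw app.log --multiline indent \
  --filter 'e.raw.contains("ERROR")' \
  -F json
```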
Multiline increases per-event memory usage. When processing large files, keep an eye on chunk size via `--stats` and consider tuning `--batch-size`/`--batch-timeout` when using `--parallel`.
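One way to keep an eye on this while experimenting (the file name is hypothetical):

```bash
# Emit only the statistics so you can watch event counts while tuning;
# add --parallel once the strategy behaves as expected on a sample.
kelora -f raw big.log --multiline timestamp --parallel --stats-only
```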
## Built-in Strategies
Kelora ships four strategies. Only one can be active at a time.
### 1. Timestamp Headers (`--multiline timestamp`)
Best for logs where each entry begins with a timestamp. Detection uses Kelora's adaptive timestamp parser; you can hint a specific format with `timestamp:format=<chrono>`.
```bash
kelora -f raw examples/multiline_stacktrace.log \
  --multiline timestamp \
  --filter 'e.raw.contains("Traceback")' \
  -F json --take 1
```

```json
{"raw":"2024-01-15 10:01:00 ERROR Failed to process request\nTraceback (most recent call last):\n File \"/app/server.py\", line 42, in handle_request\n result = process_data(request.body)\n File \"/app/processor.py\", line 15, in process_data\n return json.loads(data)\nValueError: Invalid JSON format at line 3\n"}
```
The event now contains the full Python traceback until the next timestamped header. Pair this strategy with `--ts-format` if you also need chronological filtering later in the pipeline.
### 2. Indentation Continuations (`--multiline indent`)
This strategy folds lines that start with leading whitespace into the preceding event. It matches Java stack traces and similar outputs where continuation lines are indented.
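A minimal sketch (the input file name is hypothetical; the flags all appear elsewhere on this page):

```bash
# Indented "at ..." continuation lines are folded into the block that
# starts with the unindented header line above them.
kelora -f raw java_app.log \
  --multiline indent \
  --filter 'e.raw.contains("Exception")' \
  -F json --take 1
```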
In this example the stack trace block remains an atomic event. If the first line of a block is not indented (for example, `Traceback ...`), prefer `timestamp` or switch to `regex` (see below) so the header line is included.
### 3. Regex Boundaries (`--multiline regex:match=...[:end=...]`)
Define explicit start and optional end markers. This is ideal for logs that wrap records with guard strings such as `BEGIN`/`END` or XML tags.
If no `end=` is provided, a new `match=` line flushes the previous block. Regex patterns are Rust regular expressions, the same engine used by `--filter`.
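A sketch with `BEGIN`/`END` guard lines (the file name is hypothetical; the `regex:match=...:end=...` syntax comes from the heading above):

```bash
# Every line from a ^BEGIN line through the next ^END line becomes one event.
kelora -f raw batch.log \
  --multiline 'regex:match=^BEGIN:end=^END' \
  -F json
```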
### 4. Treat Everything as One Event (`--multiline all`)
This strategy buffers the entire stream and emits it as a single event. Useful for one-off conversions (for example piping a whole JSON array into a script). Use with care: the entire input must fit in memory.
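For instance, a hedged sketch that turns a whole file into a single event (the file name is hypothetical):

```bash
# The entire pretty-printed JSON array becomes one raw event whose text
# a downstream script can transform in one pass. Input must fit in memory.
kelora -f raw data.json --multiline all -F json
```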
## Choosing the Right Parser
- `-f raw` preserves newlines (`\n`) so you can post-process blocks with `split("\n")`, regex extractions, or write them to disk unchanged.
- Structured parsers (`-f json`, `-f logfmt`, `-f cols:...`) expect a single logical record. Use multiline to restore that logical record before parsing.
- After parsing, you can still keep the original text by copying the raw block into another field inside an exec script, as sketched below.
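A sketch of that last point. The exact exec flag and assignment syntax are assumptions here (modeled on the `--filter` expressions above, where the event is exposed as `e`); check the CLI reference for the real spelling:

```bash
# Assumption: --exec runs a script per event and fields are assignable
# as e.<name>, mirroring the e.raw accessor used by --filter.
kelora -f raw examples/multiline_stacktrace.log \
  --multiline timestamp \
  --exec 'e.original = e.raw' \
  -F json --take 1
```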
## Observability and Debugging
- Run with `--stats` or `--stats-only` to see how many events were emitted after chunking. A sudden drop or spike indicates the strategy might be too broad or too narrow.
- Use `--take` while experimenting so you do not print massive aggregates to the terminal.
- Inspect the aggregated text with `-f raw -F json` during tuning to confirm the block boundaries look correct, as in the sketch below.
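A tuning workflow sketch combining these tips (the file name is hypothetical):

```bash
# First eyeball a few aggregated blocks, then check the overall event count.
kelora -f raw app.log --multiline indent -F json --take 3
kelora -f raw app.log --multiline indent --stats-only
```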
## Advanced Tips
- Custom timestamp formats: `--multiline 'timestamp:format=%d/%b/%Y:%H:%M:%S %z'` mirrors Apache/Nginx access log headers.
- Prefix extraction: When container runtimes prepend metadata, run `--extract-prefix` before multiline so the separator line is preserved.
- Parallel mode: With `--parallel`, tune `--batch-size` and `--batch-timeout` if you have extremely large blocks, to prevent workers from buffering too much at once.
- Fallback for JSON/YAML: Complex nested documents may require `regex` boundaries or pre-processing (for example, `jq`) because closing braces often return to column zero, breaking the `indent` heuristic; see the sketch below.
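A hedged sketch of the JSON fallback (the file name is hypothetical; `^\{` and `^\}` match a brace at column zero in Rust regex syntax):

```bash
# Group each pretty-printed JSON document from its opening brace at column
# zero through the closing brace at column zero, then parse it as JSON.
kelora -f json events.log \
  --multiline 'regex:match=^\{:end=^\}' \
  -F json
```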
## Troubleshooting
- Strategy misfires: If you see every line printed individually, your start detector did not trigger. Try `--multiline regex` with an explicit pattern, or switch to `timestamp` with a format hint.
- Truncated blocks: For JSON or YAML, remember that closing braces/brackets often start at column zero. Use regex boundaries that match `^}` or `^\]` to keep the termination line.
- Out-of-memory risk: `--multiline all` and poorly tuned regex patterns can accumulate the entire file. Run on a sample first, or set `--take`/`--stats` to monitor chunk counts.
- Context flags: `-A`/`-B`/`-C` require a sliding window. If you combine context with multiline, increase `--window` so the context has enough buffered events (see the sketch below).
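A sketch of the context-flag case (the file name and window size are illustrative):

```bash
# Two events of context around each match; a larger window leaves room
# for big aggregated blocks in the context buffer.
kelora -f raw app.log --multiline timestamp \
  --filter 'e.raw.contains("ERROR")' \
  -C 2 --window 200
```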
## Related Reading
- Pipeline Model – see where multiline sits relative to parsing and transformation.
- Reference: CLI Options – full flag syntax for `--multiline`, `--extract-prefix`, and timestamp controls.
- Tutorial: Parsing Custom Formats – practical recipes that often start with multiline normalization.