Build a Service Health Snapshot¶
Create a concise operational report from JSON service logs, tracking errors, latency, and resource pressure without leaving the terminal.
When to Reach for This¶
- You need a quick health summary after a deployment or incident.
- Product teams request a daily/weekly status report built from application logs.
- You are validating that a suspected fix reduced error rates or latency.
Before You Start¶
- Example commands use examples/simple_json.jsonl. Replace it with your application logs.
- Ensure key fields exist in the payload (e.g., service, level, duration_ms, memory_percent, status); the quick check below shows one way to verify.
- Rhai metric helpers (track_count, track_avg, track_percentiles, track_max, etc.) power the summary; see Tutorial: Metrics and Tracking if you need a refresher.
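If you are unsure whether your logs carry those fields, a quick smoke test helps. The event below is a hypothetical payload invented for illustration; pipe a real line from your own logs through the same command instead.
# Hypothetical event, used only to show the expected fields; replace it with a line from your logs.
echo '{"timestamp":"2025-01-01T00:00:00Z","service":"auth","level":"info","duration_ms":120,"memory_percent":41.5,"status":200,"message":"ok"}' \
  | kelora -j -k timestamp,service,level,duration_ms,memory_percent,status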
Step 1: Build a Baseline Scoreboard¶
Count events by service and severity to frame the rest of the investigation.
kelora -j examples/simple_json.jsonl \
-e 'track_count(e.service); track_count("level_" + e.level)' \
--metrics \
--stats
- --metrics prints the tracked values once input completes.
- --stats reports parse errors and throughput, giving you confidence in the numbers.
Step 2: Add Performance Signals¶
Track latency and resource metrics with averages, percentiles, and extremes per service.
kelora -j examples/simple_json.jsonl \
-e 'let latency = e.get_path("duration_ms");
if latency != () {
track_avg("latency_avg|" + e.service, latency);
track_percentiles("latency|" + e.service, latency, [0.95, 0.99]);
track_max("latency_max|" + e.service, latency);
}
track_max("memory_peak|" + e.service, e.get_path("memory_percent"))' \
--metrics
Guidance:
- e.get_path() returns unit () when a field is missing; check for () to avoid polluting metrics.
- track_avg() automatically computes averages without manual sum/count tracking.
- track_percentiles() creates auto-suffixed metrics like latency|auth_p95 and latency|auth_p99 for true percentile tracking.
- Track additional business KPIs (orders, sign-ups) with track_sum() or track_count() as needed; one possible shape is sketched below.
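A sketch of that last point, tracking a revenue-style KPI next to the latency metrics. order_total is a hypothetical field name; substitute whichever numeric field your payload actually carries.
kelora -j examples/simple_json.jsonl \
  -e 'let amount = e.get_path("order_total");  // hypothetical field
      if amount != () {
        track_sum("revenue|" + e.service, amount);
        track_count("orders|" + e.service);
      }' \
  --metrics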
Step 3: Flag Error Hotspots¶
Separate healthy traffic from incidents and identify recurring failure modes.
kelora -j examples/simple_json.jsonl \
-l error,critical \
-e 'let code = e.get_path("error.code", "unknown");
track_count("errors|" + e.service);
track_count("error_code|" + code)' \
-k timestamp,service,message \
--metrics
- Running the same pipeline without -k is useful when you only need counts.
- Combine multiple passes (baseline + errors) to build an overall picture, or wrap the logic in a tiny shell script like the one sketched below.
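One possible wrapper, assuming bash and the example file path; adjust the path and the tracked fields for your environment.
#!/usr/bin/env bash
# Sketch: run the baseline and error-hotspot passes back to back in one report.
set -euo pipefail
LOGS="${1:-examples/simple_json.jsonl}"  # default path is an assumption; pass your own file

echo "== Baseline =="
kelora -j "$LOGS" \
  -e 'track_count(e.service); track_count("level_" + e.level)' \
  --metrics

echo "== Error hotspots =="
kelora -j "$LOGS" \
  -l error,critical \
  -e 'let code = e.get_path("error.code", "unknown");
      track_count("errors|" + e.service);
      track_count("error_code|" + code)' \
  --metrics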
Step 4: Generate a Shareable Summary¶
Create a compact report for status updates or documentation.
kelora -j examples/simple_json.jsonl \
-e 'track_count(e.service);
let latency = e.get_path("duration_ms");
if latency != () {
track_avg("latency_avg|" + e.service, latency);
track_percentiles("latency|" + e.service, latency, [0.99]);
}
track_max("memory_peak|" + e.service, e.get_path("memory_percent"))' \
-m \
--end '
print("=== Service Snapshot ===");
let counts = #{};
let avg_latency = #{};
let p99_latency = #{};
let memory = #{};
for key in metrics.keys() {
let name = key.to_string();
let value = metrics[name];
if name.contains("|") {
let parts = name.split("|");
if parts.len() == 2 {
let kind = parts[0];
let service = parts[1];
if kind == "latency_avg" {
avg_latency[service] = value;
} else if name.ends_with("_p99") {
// Extract service name before _p99 suffix
let base = name.split("_p99")[0];
let svc = base.split("|")[1];
p99_latency[svc] = value;
} else if kind == "memory_peak" {
memory[service] = value;
}
}
} else {
counts[name] = value;
}
}
for service in counts.keys() {
print(service + ": " + counts[service].to_string());
if avg_latency.contains(service) {
print(" latency_avg: " + avg_latency[service].to_string());
}
if p99_latency.contains(service) {
print(" latency_p99: " + p99_latency[service].to_string());
}
if memory.contains(service) {
print(" memory_peak: " + memory[service].to_string());
}
}
'
- -m (or --metrics) keeps the metrics map available in the --end block.
- The --end script parses auto-suffixed percentile metrics (e.g., latency|auth_p99) to extract service names.
- Redirect the output to a file (> reports/service-health.txt) or pipe it into notification tooling; a minimal sketch follows.
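A minimal sketch of that last step, assuming the Step 4 command has been saved verbatim as ./snapshot.sh (a hypothetical script name): keep a dated copy plus a stable filename that notification tooling can watch.
# ./snapshot.sh is assumed to contain the Step 4 pipeline.
mkdir -p reports
./snapshot.sh > "reports/service-health-$(date +%F).txt"
cp "reports/service-health-$(date +%F).txt" reports/service-health-latest.txt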
Step 5: Export Structured Data¶
Share filtered events with teams that need to dig deeper.
kelora -j examples/simple_json.jsonl \
-l error \
-e 'e.error_code = e.get_path("error.code", "unknown")' \
-k timestamp,service,error_code,message \
-F csv > service-errors.csv
- Use -J or -F json for downstream analytics pipelines; see the sketch below.
- When privacy matters, combine with Sanitize Logs Before Sharing.
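A sketch of the JSON variant, assuming -J emits one JSON object per line; jq is used only as an optional sanity check.
kelora -j examples/simple_json.jsonl \
  -l error \
  -e 'e.error_code = e.get_path("error.code", "unknown")' \
  -k timestamp,service,error_code,message \
  -J > service-errors.jsonl
# Optional: count exported errors per service (assumes one JSON object per line).
jq -r '.service' service-errors.jsonl | sort | uniq -c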
Variations¶
- Per-service drilldown (see the sketch after this list).
- Time-boxed reports.
- Live monitoring: follow the log as it grows and alert on errors in real time:
  tail -f /var/log/app.log | kelora -j -q \
    -l error \
    -e 'track_count(e.service); eprint("ALERT: error in " + e.service)'
  Add --no-emoji when piping into systems that cannot render emoji.
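A sketch of the per-service drilldown, using only the helpers shown earlier; "auth" is a placeholder service name.
kelora -j examples/simple_json.jsonl \
  -e 'if e.service == "auth" {  // placeholder service name
        let latency = e.get_path("duration_ms");
        if latency != () {
          track_avg("latency_avg|auth", latency);
          track_percentiles("latency|auth", latency, [0.95, 0.99]);
        }
        track_count("requests|auth");
      }' \
  --metrics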
Validate and Iterate¶
- Compare counts against existing dashboards to ensure your pipeline includes every service instance.
- Make --stats part of the report so readers see parse failures or skipped events.
- If metrics explode after a deploy, combine this guide with Triage Production Errors to capture concrete samples.
See Also¶
- Tutorial: Metrics and Tracking for a deeper tour of Rhai telemetry helpers.
- Design Streaming Alerts to turn the snapshot into a continuous monitor.
- Process Archives at Scale when historical context is required.