Quick start

Install the binary, ingest a log file, run a query. Five lines, no flags worth worrying about yet.

 $ cargo install logdive
$ logdive ingest --file /var/log/app.log
 → 12,847 lines indexed in 78ms $ logdive query 'level=error last 1h' $ logdive stats

By default, the index lives at ~/.logdive/index.db. Override with --db <path> on any command, or set LOGDIVE_DB.

Installation

Three supported paths.

From crates.io

 $ cargo install logdive
$ cargo install logdive-api # optional, for the HTTP server 

From Docker

 $ docker pull ghcr.io/aryagorjipour/logdive:0.3.0

From source

 $ git clone https://github.com/Aryagorjipour/logdive
$ cd logdive
$ cargo build --release

Resulting binaries: logdive at 3.8 MB stripped, logdive-api at 4.1 MB stripped. MSRV: Rust 1.85.

The CLI

One binary, four subcommands.

ingest

Reads a file or stdin, parses log lines, and inserts them into the SQLite index. Supports JSON (default), logfmt, and plain text. Deduplicates via blake3 content hash.

 $ logdive ingest --file ./logs/app.log
$ logdive ingest --file ./logs/app.log --format logfmt --tag production
$ docker logs my-container | logdive ingest --tag my-container
$ logdive ingest --file ./logs/app.log --follow
--file <PATH>
Read from a file. Mutually exclusive with stdin.
--format json|logfmt|plain
Input format. Default json.
--tag <TAG>
Attach a tag to every ingested entry that does not already have a tag field.
--timestamp-now
Assign current UTC time to entries lacking a timestamp field instead of skipping them.
--follow
Tail the file for new lines, similar to tail -f. Detects log rotation and truncation. Requires --file. Unix only (Windows support is v0.4+).
--db <PATH>
Database path override. Default ~/.logdive/index.db. Also settable via LOGDIVE_DB.

query

Evaluates a query expression and prints matching rows.

 $ logdive query 'level=error AND service=payments last 2h' $ logdive query '(level=error OR level=warn) AND service=payments' $ logdive query 'level=error OR level=warn' --output json
$ logdive query 'message contains "timeout" last 24h' $ logdive query 'since 2026-01-01' --limit 0 --offset 200
--output pretty|json
Output format. Default pretty (colored). json is newline-delimited, pipe-friendly.
--limit <N>
Maximum results. Default 1000. Use 0 for unlimited.
--offset <N>
Skip the first N results. Use with --limit for page navigation. Default 0.
--db <PATH>
Database path override.

stats

Reports aggregate metadata about the index — row count, time range, tags, and DB size on disk.

 $ logdive stats
logdive index: /home/user/.logdive/index.db
  Entries:       42,317
  Time range:    2026-03-14T08:22:01Z → 2026-04-22T19:45:03Z
  Tags:          api, nginx, payments, worker, (untagged)
  DB size:       8.4 MB (8,400,000 bytes) 

prune

Deletes entries outside a retention window, then vacuums the database file to reclaim disk space. Safe for cron.

 $ logdive prune --older-than 30d
$ logdive prune --before 2026-01-01
$ logdive prune --older-than 7d --yes
--older-than <DURATION>
Delete entries older than this. Format: integer + m, h, or d. E.g. 30d, 24h. Mutually exclusive with --before.
--before <DATETIME>
Delete entries before this datetime. Accepts RFC 3339, ISO naive datetime, or ISO date. Mutually exclusive with --older-than.
--yes
Skip the interactive [y/N] confirmation. Useful in scripts and cron.
--db <PATH>
Database path override.

The HTTP API

The logdive-api binary serves the same query language over HTTP, read-only. No authentication — bind it to localhost.

 $ logdive-api --db ~/.logdive/index.db --port 4000

GET /query

Runs a query and returns matching entries as newline-delimited JSON.

q (required)
Query expression. URL-encoded. Same syntax as CLI.
limit (optional)
Maximum results. Default 1000. 0 for unlimited.
offset (optional)
Skip the first N results. Default 0. Use with limit for pagination.
 $ curl 'http://127.0.0.1:4000/query?q=level%3Derror&limit=50' {"timestamp":"2026-05-21T14:02:31Z","level":"error","message":"..."}
{"timestamp":"2026-05-21T14:02:33Z","level":"error","message":"..."} 

GET /stats

Returns aggregate metadata as a single JSON object.

 {
  "entries": 42317,
  "min_timestamp": "2026-03-14T08:22:01Z",
  "max_timestamp": "2026-04-22T19:45:03Z",
  "tags": [null, "api", "nginx", "payments", "worker"],
  "db_size_bytes": 8400000,
  "db_path": "/home/user/.logdive/index.db"
} 

GET /version

Returns build version and supported capabilities. Never touches the database. Use as a liveness probe.

 {
  "version": "0.3.0",
  "formats": ["json", "logfmt", "plain"],
  "capabilities": ["query", "stats", "version"]
} 

Query language

Boolean expressions over fields, plus an optional trailing time window. Fields can be indexed columns (timestamp, level, message, tag) or arbitrary JSON paths (user.id, request.method).

Grammar

query ::= or_expr [ time_range ] or_expr ::= and_expr ( "OR" and_expr )* and_expr ::= clause ( "AND" clause )* clause ::= field op value | field "CONTAINS" string | "(" or_expr ")" op ::= "=" | "!=" | ">" | "<" field ::= ident ( "." ident )* value ::= ident | number | quoted_string time_range ::= "last" duration | "since" datetime duration ::= number ( "m" | "h" | "d" )

Keywords are case-insensitive. AND binds tighter than OR; use parentheses to override precedence.

Operators

=
Exact match. Hits an index on known fields.
!=
Negation of =. Still indexed.
>, <
Numeric or lexicographic comparison.
CONTAINS
Case-insensitive substring match. Full-table scan on the target field.
AND
Binds clauses within a group. Tighter precedence than OR.
OR
Separates AND groups. Each group is evaluated independently.
last <N>m|h|d
Time window ending now.
since <datetime>
Absolute lower bound. RFC 3339, ISO naive datetime, or ISO date.

Examples

 # All errors from payments in the last 2 hours
level=error AND service=payments last 2h
# Errors OR warnings
level=error OR level=warn
# Parentheses for explicit precedence
(level=error OR level=warn) AND service=payments
# AND within each OR branch (no parens needed — AND binds tighter)
level=error AND service=payments OR level=warn AND tag=worker
# Case-insensitive level match (ERROR, error, Error all hit the same rows)
level=ERROR AND service=payments
# Substring search on the message body
message contains "timeout" last 24h
# Slow requests over 500ms
duration_ms > 500
# Time range by absolute date
since 2026-04-15T09:00:00Z

Configuration

All configuration is via command-line flags, with environment-variable fallbacks for containerised deployments.

LOGDIVE_DB
Database path. Fallback for --db. Default ~/.logdive/index.db.
LOGDIVE_LOG
Verbosity filter for internal diagnostics (tracing_subscriber::EnvFilter). Default warn.
LOGDIVE_API_PORT
Port for logdive-api. Fallback for --port. Default 4000.
LOGDIVE_API_HOST
Bind host for logdive-api. Fallback for --host. Default 127.0.0.1.
LOGDIVE_API_CORS_ORIGINS
Allowed CORS origins. Comma-separated list or *. Default: disabled.
NO_COLOR
Suppress ANSI colour in logdive query output when set.

Docker

Multi-arch images (linux/amd64 and linux/arm64) published to GHCR on every merge to main and every version tag.

 # Start the API server $ docker volume create logdive-data
$ docker run -d \
    --name logdive \
    -v logdive-data:/data \
    -p 4000:4000 \
    ghcr.io/aryagorjipour/logdive:0.3.0
 # Ingest with the CLI against the same volume $ docker run --rm \
    -v logdive-data:/data \
    -v /path/to/your/logs:/logs:ro \
    --entrypoint logdive \
    ghcr.io/aryagorjipour/logdive:0.3.0 \
    ingest --file /logs/app.log --tag production

Default entrypoint is logdive-api. The image pre-sets LOGDIVE_DB=/data/index.db and LOGDIVE_API_HOST=0.0.0.0. Runtime is gcr.io/distroless/cc-debian12:nonroot — no shell, no curl, no root. HEALTHCHECK uses logdive-api --health-check, which opens a TCP connection to the server's own port via stdlib — no HTTP client required.

Architecture

A Cargo workspace with three crates. logdive-core is publishable to crates.io as a standalone library.

logdive/
├── logdive-core 
├── logdive 
└── logdive-api  

Storage model

Hybrid schema: timestamp, level, message, tag are indexed columns. Everything else is stored in a fields TEXT JSON blob and queried at read time via SQLite's json_extract(). Deduplication uses a raw_hash UNIQUE column with blake3 hashes and INSERT OR IGNORE.

Why SQLite

Zero infrastructure. A single file, transactional, with a query planner that handles indexes, joins, and aggregates. The interesting work is the query parser and the storage schema; SQLite handles the rest.

Why Rust

Parsing log lines at 200k/s with near-zero GC budget. The ingest path is where Rust earns its place. The query path is mostly SQL.