Benchmarks

Talos includes a k6-based load test suite that measures throughput, latency, and correctness under concurrent load. Use these benchmarks to validate your deployment and catch performance regressions.

Note: These benchmarks require the Commercial edition with PostgreSQL (or CockroachDB/MySQL). The OSS edition uses SQLite, which does not support concurrent writers and cannot handle the parallel load generated by multi-VU test profiles.

Reference results

Measured on Apple M-series (M4 Pro Max), single-process commercial binary with PostgreSQL 16, stress profile (ramping 0→437 VUs over 5 minutes):

| Metric | Value |
| --- | --- |
| Total requests | ~5,000,000 |
| Peak throughput | 16,766 req/s |
| Overall p99 latency | 123ms |
| Verify p95 latency | 48ms |
| Verify p99 latency | 95ms |
| Error rate | 0.00% |
| Peak VUs | 437 |
| Key creations | 493/s |
| Verifications | 3,797/s |
| Token derivations | 3,797/s |

Profiles

The test suite provides three profiles selected via the TEST_PROFILE environment variable:

| Profile | VUs | Duration | Executor | Purpose |
| --- | --- | --- | --- | --- |
| `smoke` | 1 read + 1 write | 15s | constant-vus | Quick validation after changes |
| `load` | 15 read + 5 write | 2min | constant-vus | Sustained load for regression detection |
| `stress` | 0→437 ramping | 5min | ramping-vus | Find breaking points and measure peak capacity |

The stress profile runs through five stages:

Warm-up: 0→25 VUs over 30s
Ramp 1: 25→75 VUs over 60s
Ramp 2: 75→150 VUs over 60s
Hold: 150 VUs for 120s
Ramp down: 150→0 VUs over 30s
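
As a quick sanity check, the stage durations above can be summed to confirm that they match the 5-minute run length. This is a minimal shell sketch: the `stages` variable only restates the durations from the list and is not read from the test suite.

```shell
# Stage durations in seconds, copied by hand from the list above
# (warm-up, ramp 1, ramp 2, hold, ramp down).
stages="30 60 60 120 30"

total=0
for s in $stages; do
  total=$((total + s))
done

echo "total: ${total}s ($((total / 60)) min)"
# prints "total: 300s (5 min)"
```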

Read scenarios (verify, batch verify, get key, list keys, JWKS, derive token) get ~70% of VUs. Write scenarios (create, rotate, revoke, import, update, self-revoke) get ~30%.

Running benchmarks

Prerequisites

k6 load testing tool
Docker (for local PostgreSQL) or an existing PostgreSQL instance
Go toolchain (to build the binary)

Quick start

# Smoke test (quick validation)
TEST_PROFILE=smoke bash test/load/run.sh

# Load test (sustained)
TEST_PROFILE=load bash test/load/run.sh

# Stress test (peak capacity)
TEST_PROFILE=stress bash test/load/run.sh

The run.sh script handles everything: builds the commercial binary, starts PostgreSQL in Docker, runs migrations, seeds tenant data, starts the server, and executes k6.

Using an existing database

SKIP_DOCKER=true DB_DSN="postgres://user:pass@host:5432/db?sslmode=disable" \
  TEST_PROFILE=load bash test/load/run.sh

Environment variables

| Variable | Default | Description |
| --- | --- | --- |
| `TEST_PROFILE` | `smoke` | Test profile: `smoke`, `load`, or `stress` |
| `BASE_URL` | `http://localhost:4420` | Server base URL |
| `AUTH_TOKEN` | `test-token` | Bearer token for admin endpoints |
| `DB_DSN` | `postgres://talos:talos@localhost:5432/talos_test?sslmode=disable` | PostgreSQL connection string |
| `SKIP_DOCKER` | `false` | Skip Docker PostgreSQL setup (use existing DB) |
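
If you wrap the runner in your own script, the defaults above can be mirrored with shell parameter expansion. The sketch below is illustrative only and is not taken from `run.sh`; the `validate_profile` helper is a hypothetical name introduced here to reject profile values other than the three documented ones.

```shell
# Apply the documented defaults when a variable is unset or empty.
: "${TEST_PROFILE:=smoke}"
: "${BASE_URL:=http://localhost:4420}"
: "${SKIP_DOCKER:=false}"

# Hypothetical helper: succeeds only for the three documented profiles.
validate_profile() {
  case "$1" in
    smoke|load|stress) return 0 ;;
    *) return 1 ;;
  esac
}

if validate_profile "$TEST_PROFILE"; then
  echo "profile=$TEST_PROFILE base_url=$BASE_URL skip_docker=$SKIP_DOCKER"
else
  echo "unknown TEST_PROFILE: $TEST_PROFILE" >&2
fi
```

Validating the profile name up front avoids starting a build and database setup for a value the suite may not recognize.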

Thresholds

Each profile enforces regression thresholds. Tests fail if any threshold is breached.

Smoke and load profiles

| Metric | Threshold | Rationale |
| --- | --- | --- |
| All checks | 100% pass | Zero tolerance for correctness failures |
| HTTP errors | 0% | No errors allowed at low concurrency |
| Overall p99 | < 500ms | Generous headroom for CI runners |
| Verify p95 | < 50ms | ~25ms measured in CI (postgres) |
| Verify p99 | < 100ms | Allows for CI variance |

Stress profile

| Metric | Threshold | Rationale |
| --- | --- | --- |
| All checks | 100% pass | Correctness under load |
| HTTP errors | < 1% | Small tolerance for stress conditions |
| Overall p99 | < 400ms | ~3x headroom over measured 123ms |
| Verify p95 | < 100ms | ~2x headroom over measured 48ms |
| Verify p99 | < 200ms | ~2x headroom over measured 95ms |
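
The headroom figures in the stress table are the ratio of each threshold to the corresponding reference measurement. A small sketch of that arithmetic, assuming `awk` is available; the `headroom` function is a hypothetical helper and not part of the test suite:

```shell
# headroom THRESHOLD_MS MEASURED_MS
# Prints threshold / measured, rounded to two decimals.
headroom() {
  awk -v t="$1" -v m="$2" 'BEGIN { printf "%.2f\n", t / m }'
}

headroom 400 123   # overall p99 -> 3.25
headroom 100 48    # verify p95  -> 2.08
headroom 200 95    # verify p99  -> 2.11
```

The same calculation can help when choosing thresholds for your own hardware: measure first, then pick a multiple that tolerates normal variance.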

Interpreting results

After a k6 run, look for:

checks rate: Must be 100%. Any failure indicates a correctness bug.
http_req_duration percentiles: Compare against the thresholds above. Significant increases suggest a regression.
http_req_failed rate: Should be 0% for smoke and load, and under 1% for stress.
Custom counters (key_creations, verifications, token_derivations): Compare rates against the reference results to detect throughput regressions.
iteration_duration: End-to-end time for each VU iteration including all operations.
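
One way to compare a custom counter rate against the reference results is a simple tolerance check. The sketch below is an assumption-laden example: the `below_reference` helper and the 10% tolerance are hypothetical, and the measured rate is expected to be read by hand from the k6 summary rather than parsed from a file.

```shell
# below_reference MEASURED REFERENCE TOLERANCE_PCT
# Exits 0 when MEASURED is more than TOLERANCE_PCT percent below REFERENCE.
below_reference() {
  awk -v m="$1" -v r="$2" -v tol="$3" \
    'BEGIN { exit !(m < r * (1 - tol / 100)) }'
}

# Reference verification rate is 3,797/s; 3400 is a made-up measurement.
if below_reference 3400 3797 10; then
  echo "verifications: possible throughput regression"
else
  echo "verifications: within tolerance"
fi
```

Reference rates depend on hardware, so a tolerance derived from repeated runs on your own machine is likely to be more meaningful than a fixed percentage.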

Results are saved to .test/k6-output.txt (human-readable) and .test/k6-results.json (machine-readable).
