Benchmarks
Talos includes a k6-based load test suite that measures throughput, latency, and correctness under concurrent load. Use these benchmarks to validate your deployment and catch performance regressions.
These benchmarks require the Commercial edition with PostgreSQL (or CockroachDB/MySQL). The OSS edition uses SQLite, which does not support concurrent writers and cannot handle the parallel load generated by multi-VU test profiles.
Reference results
Measured on Apple M-series (M4 Pro Max), single-process commercial binary with PostgreSQL 16, stress profile (ramping 0→150 VUs over 5 minutes):
| Metric | Value |
|---|---|
| Total requests | ~5,000,000 |
| Peak throughput | 16,766 req/s |
| Overall p99 latency | 123ms |
| Verify p95 latency | 48ms |
| Verify p99 latency | 95ms |
| Error rate | 0.00% |
| Peak VUs | 150 |
| Key creations | 493/s |
| Verifications | 3,797/s |
| Token derivations | 3,797/s |
Profiles
The test suite provides three profiles selected via the TEST_PROFILE environment variable:
| Profile | VUs | Duration | Executor | Purpose |
|---|---|---|---|---|
| smoke | 1 read + 1 write | 15s | constant-vus | Quick validation after changes |
| load | 15 read + 5 write | 2min | constant-vus | Sustained load for regression detection |
| stress | 0→150 ramping | 5min | ramping-vus | Find breaking points and measure peak capacity |
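As a rough illustration of how the two constant-VU profiles could be expressed as k6 scenarios, the sketch below builds scenario objects from the VU counts and durations in the table. The scenario names (reads, writes) and the exec function names (readFlow, writeFlow) are hypothetical and not taken from the Talos test suite; only the executor, vus, and duration fields follow k6's documented constant-vus options.

```javascript
// VU counts and durations copied from the profile table above.
const constantProfiles = {
  smoke: { readVUs: 1, writeVUs: 1, duration: '15s' },
  load: { readVUs: 15, writeVUs: 5, duration: '2m' },
};

// Build a k6-style scenarios object for a constant-VU profile.
// 'readFlow' and 'writeFlow' are hypothetical exported function names.
function constantScenarios(profileName) {
  const p = constantProfiles[profileName];
  if (!p) {
    throw new Error(`not a constant-vus profile: ${profileName}`);
  }
  return {
    reads: {
      executor: 'constant-vus',
      vus: p.readVUs,
      duration: p.duration,
      exec: 'readFlow',
    },
    writes: {
      executor: 'constant-vus',
      vus: p.writeVUs,
      duration: p.duration,
      exec: 'writeFlow',
    },
  };
}

console.log(JSON.stringify(constantScenarios('load'), null, 2));
```

In a real k6 script the returned object would sit under the scenarios key of the exported options.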
The stress profile ramps through five stages:
- Warm-up: 0→25 VUs over 30s
- Ramp 1: 25→75 VUs over 60s
- Ramp 2: 75→150 VUs over 60s
- Hold: 150 VUs for 120s
- Ramp down: 150→0 VUs over 30s
Read scenarios (verify, batch verify, get key, list keys, JWKS, derive token) get ~70% of VUs. Write scenarios (create, rotate, revoke, import, update, self-revoke) get ~30%.
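The stage list above maps naturally onto the stages array of k6's ramping-vus executor. The following is a minimal sketch under that assumption, not the suite's actual configuration; the helper names (totalSeconds, peakVUs, splitVUs) are made up for illustration, and the 70/30 split is applied as a simple rounding of the approximate shares quoted above.

```javascript
// Stages transcribed from the list above (duration, then target VU count).
const stressStages = [
  { duration: '30s', target: 25 },   // warm-up
  { duration: '60s', target: 75 },   // ramp 1
  { duration: '60s', target: 150 },  // ramp 2
  { duration: '120s', target: 150 }, // hold
  { duration: '30s', target: 0 },    // ramp down
];

// Shape of a ramping-vus scenario as k6 documents it.
const stressScenario = {
  executor: 'ramping-vus',
  startVUs: 0,
  stages: stressStages,
};

// Sum stage durations. Assumes every duration is written in whole seconds,
// as in the array above; parseInt drops the trailing 's'.
function totalSeconds(stages) {
  return stages.reduce((sum, s) => sum + parseInt(s.duration, 10), 0);
}

// Highest VU target reached by any stage.
function peakVUs(stages) {
  return Math.max(...stages.map((s) => s.target));
}

// Split a VU count between read and write scenarios by read share.
function splitVUs(total, readShare) {
  const read = Math.round(total * readShare);
  return { read, write: total - read };
}

console.log(totalSeconds(stressScenario.stages), peakVUs(stressScenario.stages));
console.log(splitVUs(peakVUs(stressStages), 0.7));
```

The stage durations sum to 300 seconds, which matches the 5-minute duration listed for the stress profile.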
Running benchmarks
Prerequisites
- k6 load testing tool
- Docker (for local PostgreSQL) or an existing PostgreSQL instance
- Go toolchain (to build the binary)
Quick start
# Smoke test (quick validation)
TEST_PROFILE=smoke bash test/load/run.sh
# Load test (sustained)
TEST_PROFILE=load bash test/load/run.sh
# Stress test (peak capacity)
TEST_PROFILE=stress bash test/load/run.sh
The run.sh script handles everything: builds the commercial binary, starts PostgreSQL in Docker, runs migrations, seeds tenant data, starts the server, and executes k6.
Using an existing database
SKIP_DOCKER=true DB_DSN="postgres://user:pass@host:5432/db?sslmode=disable" \
TEST_PROFILE=load bash test/load/run.sh
Environment variables
| Variable | Default | Description |
|---|---|---|
| TEST_PROFILE | smoke | Test profile: smoke, load, or stress |
| BASE_URL | http://localhost:4420 | Server base URL |
| AUTH_TOKEN | test-token | Bearer token for admin endpoints |
| DB_DSN | postgres://talos:talos@localhost:5432/talos_test?sslmode=disable | PostgreSQL connection string |
| SKIP_DOCKER | false | Skip Docker PostgreSQL setup (use existing DB) |
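To make the defaulting behaviour in the table concrete, here is a small sketch of how a script might resolve these variables. It is an illustration only: the resolveConfig helper is hypothetical, treating an empty string as unset is an assumption, and some of these variables may in practice be read by run.sh rather than by the k6 script. Inside k6, environment variables are exposed through the global __ENV object; under Node.js the equivalent is process.env.

```javascript
// Defaults copied from the environment variable table above.
const DEFAULTS = {
  TEST_PROFILE: 'smoke',
  BASE_URL: 'http://localhost:4420',
  AUTH_TOKEN: 'test-token',
  DB_DSN: 'postgres://talos:talos@localhost:5432/talos_test?sslmode=disable',
  SKIP_DOCKER: 'false',
};

const PROFILES = ['smoke', 'load', 'stress'];

// Hypothetical helper: merge an environment object over the defaults.
// Empty strings are treated as unset (an assumption, not documented behaviour).
function resolveConfig(env) {
  const cfg = {};
  for (const [name, fallback] of Object.entries(DEFAULTS)) {
    const value = env[name];
    cfg[name] = value === undefined || value === '' ? fallback : value;
  }
  if (!PROFILES.includes(cfg.TEST_PROFILE)) {
    throw new Error(`unknown TEST_PROFILE: ${cfg.TEST_PROFILE}`);
  }
  // Expose SKIP_DOCKER as a boolean for convenience.
  return { ...cfg, SKIP_DOCKER: cfg.SKIP_DOCKER === 'true' };
}

// Use __ENV when running under k6, process.env otherwise.
const config = resolveConfig(typeof __ENV !== 'undefined' ? __ENV : process.env);
console.log(config.TEST_PROFILE, config.BASE_URL);
```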
Thresholds
Each profile enforces regression thresholds. Tests fail if any threshold is breached.
Smoke and load profiles
| Metric | Threshold | Rationale |
|---|---|---|
| All checks | 100% pass | Zero tolerance for correctness failures |
| HTTP errors | 0% | No errors allowed at low concurrency |
| Overall p99 | < 500ms | Generous headroom for CI runners |
| Verify p95 | < 50ms | ~25ms measured in CI (postgres) |
| Verify p99 | < 100ms | Allows for CI variance |
Stress profile
| Metric | Threshold | Rationale |
|---|---|---|
| All checks | 100% pass | Correctness under load |
| HTTP errors | < 1% | Small tolerance for stress conditions |
| Overall p99 | < 400ms | ~3x headroom over measured 123ms |
| Verify p95 | < 100ms | ~2x headroom over measured 48ms |
| Verify p99 | < 200ms | ~2x headroom over measured 95ms |
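The two threshold tables above could be written as k6 threshold expressions roughly as follows. This is a sketch, not the suite's actual configuration: the thresholdsFor helper is hypothetical, and selecting verify latency with the tag filter scenario:verify assumes the verify requests run in a scenario named verify, which the tables do not state.

```javascript
// Hypothetical helper returning a k6-style thresholds object per profile.
// Durations are in milliseconds, as k6 reports http_req_duration.
function thresholdsFor(profile) {
  if (profile === 'stress') {
    return {
      checks: ['rate==1.0'],                // all checks must pass
      http_req_failed: ['rate<0.01'],       // under 1% HTTP errors
      http_req_duration: ['p(99)<400'],     // overall p99
      // Assumed tag filter for the verify endpoint.
      'http_req_duration{scenario:verify}': ['p(95)<100', 'p(99)<200'],
    };
  }
  if (profile === 'smoke' || profile === 'load') {
    return {
      checks: ['rate==1.0'],
      http_req_failed: ['rate==0'],         // no HTTP errors allowed
      http_req_duration: ['p(99)<500'],
      'http_req_duration{scenario:verify}': ['p(95)<50', 'p(99)<100'],
    };
  }
  throw new Error(`unknown profile: ${profile}`);
}

console.log(JSON.stringify(thresholdsFor('stress'), null, 2));
```

In a k6 script the returned object would be assigned to the thresholds key of the exported options, and k6 exits with a non-zero status when any expression fails.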
Interpreting results
After a k6 run, look for:
- checks rate: Must be 100%. Any failure indicates a correctness bug.
- http_req_duration percentiles: Compare against the thresholds above. Significant increases suggest a regression.
- http_req_failed rate: Should be 0% for smoke/load. Under 1% for stress.
- Custom counters (key_creations, verifications, token_derivations): Compare rates against the reference results to detect throughput regressions.
- iteration_duration: End-to-end time for each VU iteration including all operations.
Results are saved to .test/k6-output.txt (human-readable) and .test/k6-results.json (machine-readable).
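Comparing the custom counter rates against the reference results can be automated with a few lines. The sketch below assumes you have already extracted per-second rates from the k6 output yourself; the layout of k6-results.json is not described here, so no parsing is attempted. The throughputRegressions helper and the 10% tolerance are illustrative choices, and the reference rates only apply to a stress run on comparable hardware.

```javascript
// Per-second rates from the reference results table (stress profile).
const REFERENCE_RATES = {
  key_creations: 493,
  verifications: 3797,
  token_derivations: 3797,
};

// Hypothetical helper: flag counters whose measured rate falls more than
// `tolerance` (a fraction, default 10%) below the reference rate.
function throughputRegressions(measured, tolerance = 0.1) {
  const findings = [];
  for (const [name, reference] of Object.entries(REFERENCE_RATES)) {
    const rate = measured[name];
    if (typeof rate !== 'number') {
      findings.push({ name, reason: 'missing' });
      continue;
    }
    const floor = reference * (1 - tolerance);
    if (rate < floor) {
      findings.push({ name, reason: 'below reference', rate, floor });
    }
  }
  return findings;
}

// Example input with made-up measured rates.
const findings = throughputRegressions({
  key_creations: 500,
  verifications: 3800,
  token_derivations: 3000,
});
console.log(findings);
```

An empty result means every counter is within tolerance of the reference; a stricter or looser tolerance may suit noisier environments such as shared CI runners.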
