Daily Maintenance Cron
Design and rationale for the scheduled maintenance pipeline, including retry processing, usage-buffer flush, request-log cleanup, and expired API key revocation.
This page describes the `/api/cron/daily-maintenance` pipeline and explains why each step is implemented this way.
Trigger and schedule
The job is triggered by Vercel Cron:
- Path: `/api/cron/daily-maintenance`
- Schedule: `0 3 * * *` (once per day)
In production, the route requires `CRON_SECRET` (Bearer token). In non-production, it allows local/manual invocation without the secret.
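The auth check described above can be sketched as a small predicate. This is an illustrative sketch, not the actual implementation — the function name and the exact environment handling are assumptions; the real Vercel Cron convention of sending `Authorization: Bearer <CRON_SECRET>` is the only part taken from the platform:

```typescript
// Hypothetical sketch of the production auth guard; names are illustrative.
function isAuthorizedCronRequest(
  authHeader: string | null,
  env: { NODE_ENV?: string; CRON_SECRET?: string }
): boolean {
  // Outside production, allow local/manual invocation without the secret.
  if (env.NODE_ENV !== "production") return true;
  // In production, require the Bearer token Vercel Cron attaches.
  if (!env.CRON_SECRET) return false;
  return authHeader === `Bearer ${env.CRON_SECRET}`;
}
```

Keeping the guard a pure function of the header and environment makes it trivial to unit-test without mocking a request object.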
Design goals
The cron pipeline optimizes for:
- Correct core state first: key revocation must be durable and idempotent.
- Best-effort side effects: denormalized counts and cache invalidation should not block the whole run.
- Recoverability: failed side effects are persisted and retried later.
- Low operational overhead: one daily cron can handle all periodic work on Vercel Free/Hobby.
- Upgrade-friendly design: usage flush is exposed separately so higher-frequency schedules can be added later.
For low-frequency environments, operators can also trigger a protected manual usage flush from the dashboard without changing the scheduled architecture.
End-to-end pipeline
Step 1: Process retry queue first
Before running new maintenance work, the cron processes due entries in `maintenance_retry_task`.
Each retry task contains:
- `taskType`: the operation to retry (`sync_project_api_key_count` or `invalidate_api_key_cache`)
- `taskKey`: the target identifier (`projectId` or `publicKey`)
- `attempts`, `maxAttempts`
- `nextRunAt`, `lastError`
Why run retries first
- Prevents old failures from being starved by new work.
- Helps keep denormalized/cached state converged over time.
- Keeps behavior deterministic for operators: "old debt first, then fresh work."
Retry behavior
- Due tasks (`nextRunAt <= now` and `attempts < maxAttempts`) are processed in batches.
- Success removes the task from the queue.
- Failure increments `attempts`, stores `lastError`, and schedules the next attempt with exponential backoff.
- Exhausted tasks (`attempts >= maxAttempts`) stay recorded and stop auto-running until re-enqueued by a new failure event.
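The backoff schedule above can be sketched as a pure function. The base delay and cap here are assumed values for illustration, not the project's actual constants:

```typescript
// Illustrative exponential-backoff calculation; base delay and cap are assumptions.
function computeNextRunAt(
  attempts: number,
  now: Date,
  baseDelayMs = 60_000, // 1 minute after the first failure (assumed)
  maxDelayMs = 6 * 60 * 60 * 1000 // cap at 6 hours (assumed)
): Date {
  // Double the delay on each failed attempt: 1m, 2m, 4m, ... up to the cap.
  const delayMs = Math.min(baseDelayMs * 2 ** Math.max(attempts - 1, 0), maxDelayMs);
  return new Date(now.getTime() + delayMs);
}
```

Because the delay depends only on `attempts`, re-running the cron never compounds the schedule: the same attempt count always yields the same `nextRunAt` offset.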
Step 2A: Usage buffer flush
`flushUsageBufferToDatabase()` moves Redis minute buckets into `usage_record`.
Why this runs in daily cron
- Supports free-tier environments that can only schedule one cron job.
- Keeps authoritative usage metrics (`usage_record`) up to date without requiring a second scheduler.
- Still keeps the request path non-blocking by buffering writes in Redis first.
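The flush step can be pictured as a transform from scanned Redis buckets to upsert-ready rows. This is a sketch under assumptions: the `usage:<projectId>:<epochMinute>` key layout and the row shape are invented for illustration and may not match the real schema:

```typescript
// Assumed row shape for usage_record upserts (illustrative only).
interface UsageRow {
  projectId: string;
  epochMinute: number;
  count: number;
}

// Collapse scanned Redis minute buckets into rows ready to upsert.
// Assumed key layout: "usage:<projectId>:<epochMinute>".
function bucketsToRows(buckets: Record<string, number>): UsageRow[] {
  const rows: UsageRow[] = [];
  for (const [key, count] of Object.entries(buckets)) {
    const parts = key.split(":");
    if (parts.length !== 3 || parts[0] !== "usage") continue; // skip malformed keys
    rows.push({ projectId: parts[1], epochMinute: Number(parts[2]), count });
  }
  return rows;
}
```

Separating the parse/aggregate step from the database write keeps the hot path untouched: request handlers only increment Redis counters, and all Postgres I/O happens inside the cron.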
Step 2B: Request log cleanup
`cleanupOldRequestLogs()` deletes logs older than the retention window and returns `deletedCount`.
Why return deleted count
- Gives an immediate operational signal ("did cleanup do work this run?").
- Makes cron output more useful for monitoring and debugging.
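The cleanup contract can be shown with an in-memory sketch. The real implementation deletes rows in Postgres; the function signature and retention handling here are assumptions made for illustration:

```typescript
// In-memory sketch of the cleanup contract; the real version runs a bulk DELETE.
interface RequestLog {
  createdAt: Date;
}

function cleanupOldRequestLogs(
  logs: RequestLog[],
  retentionDays: number,
  now: Date
): { kept: RequestLog[]; deletedCount: number } {
  const cutoff = new Date(now.getTime() - retentionDays * 24 * 60 * 60 * 1000);
  const kept = logs.filter((log) => log.createdAt >= cutoff);
  // Returning deletedCount gives the cron output an immediate operational signal.
  return { kept, deletedCount: logs.length - kept.length };
}
```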
Step 2C: Expired API key sweep
`revokeExpiredApiKeys()` performs the following steps:
- Select active keys where `expiresAt <= now` and `revokedAt IS NULL`.
- Bulk-update them with `revokedAt = now`.
- Recompute `project.apiKeyCount` for the affected projects.
- Invalidate cache entries for the revoked public keys.
Why this ordering
- Revocation is the source-of-truth state transition, so it happens first.
- Count sync and cache invalidation are side effects; they can fail independently.
- Side-effect failures are captured into retry tasks instead of aborting all work.
Failure model
- Project count sync uses `Promise.allSettled`: one project's failure does not short-circuit the others.
- Cache invalidation also uses `Promise.allSettled`.
- Rejected items are:
  - Logged with the target identifier and error
  - Enqueued into `maintenance_retry_task` for a future retry
This provides eventual convergence without requiring external workers/queues.
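The failure-capture pattern above can be sketched as a small helper. The `RetryTask` shape mirrors the `maintenance_retry_task` fields described earlier, but the helper name and exact signature are assumptions:

```typescript
// Assumed task shape, mirroring the maintenance_retry_task fields described above.
interface RetryTask {
  taskType: string;
  taskKey: string;
  lastError: string;
}

// Run one side effect per key; collect failures instead of throwing.
async function runSideEffects(
  taskType: string,
  keys: string[],
  effect: (key: string) => Promise<void>
): Promise<RetryTask[]> {
  // Promise.allSettled: one failing key never short-circuits the others.
  const results = await Promise.allSettled(keys.map((key) => effect(key)));
  const failed: RetryTask[] = [];
  results.forEach((result, i) => {
    if (result.status === "rejected") {
      // Rejected items are recorded for enqueueing into the retry queue.
      failed.push({ taskType, taskKey: keys[i], lastError: String(result.reason) });
    }
  });
  return failed;
}
```

The caller then inserts the returned tasks into `maintenance_retry_task`, so the next cron run picks them up without any external queue infrastructure.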
Result contract
`runDailyMaintenance()` returns a structured per-job status:
- `retryQueue`: processed/succeeded/failed/exhausted/remaining
- `usageBufferFlush`: scanned/processed keys, upserted rows, counters, and failures
- `requestLogCleanup`: `ok` + `deletedCount`
- `expiredApiKeySweep`: `ok` + `expiredKeys` + `affectedProjects` + `queuedRetryTasks`
- Top-level `ok` and `durationMs`
This shape keeps the endpoint machine-readable for alerting and human-readable for incident triage.
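The result contract can be sketched as a TypeScript interface. The field names follow the summary above, but the exact counters and the alerting predicate are assumptions for illustration:

```typescript
// Assumed field names, matching the summary above; the real shape may carry more counters.
interface DailyMaintenanceResult {
  ok: boolean;
  durationMs: number;
  retryQueue: { processed: number; succeeded: number; failed: number; exhausted: number; remaining: number };
  usageBufferFlush: { scannedKeys: number; processedKeys: number; upsertedRows: number; failures: number };
  requestLogCleanup: { ok: boolean; deletedCount: number };
  expiredApiKeySweep: { ok: boolean; expiredKeys: number; affectedProjects: number; queuedRetryTasks: number };
}

// Illustrative alerting rule: flag runs that failed outright or left debt behind.
function needsAttention(result: DailyMaintenanceResult): boolean {
  return !result.ok || result.retryQueue.exhausted > 0 || result.usageBufferFlush.failures > 0;
}
```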
Why this fits Vercel Free/Hobby
- Uses one scheduled cron route with Postgres and Redis (buffer/lock) — no extra queue infrastructure.
- Avoids single-point short-circuit failures on batch side effects.
- Supports eventual repair by retry queue + next cron execution.
- Maintains idempotent core updates (the `revokedAt IS NULL` guard) for safe re-runs.
Trade-offs and future upgrades
Current trade-offs:
- Side effects are eventually consistent, not strongly consistent.
- Exhausted retries require operator visibility (monitoring is important).
Potential future upgrades:
- Dedicated reconciliation job (periodic full `apiKeyCount` rebuild).
- Separate dead-letter handling UI/reporting for exhausted tasks.
- Distributed lock to prevent overlapping cron executions under high latency.
- Re-enable a higher-frequency schedule for `/api/cron/flush-usage-buffer` when plan limits allow.