Curious to hear how others have handled scale challenges in billing infrastructure:

If you're running usage-based billing for AI, infra, or API-heavy platforms, how do you deal with high-throughput event ingestion (say, 10k+ events/sec) without dropping events or messing up customer metering?

We’ve seen setups struggle hard with:

Event ordering guarantees

Idempotency at scale

Handling retries without double-counting

Would love to hear what infra patterns, queues, or storage choices worked (or failed) for you—especially?

Koshima|8 months ago

Great question!

Our approach focuses on:

- Fire-and-forget ingestion with in-memory queues so events don’t block product requests
- Strict idempotency tokens tied to every event, enforced at the API layer
- Lightweight retry logic that prevents double-counting but guarantees delivery under transient failures
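To make the idempotency and retry points concrete, here is a rough sketch, not the actual system described in this comment. It assumes a single process, an in-memory set of seen keys, and a bounded in-memory queue; the `Ingestor` class, its method names, and the return strings are all hypothetical. A real deployment would likely keep the seen keys in a shared store with a TTL.

```python
import queue
import threading


class Ingestor:
    """Hypothetical front end: dedupe by idempotency key, then enqueue without blocking."""

    def __init__(self, max_pending=10_000):
        self._seen = set()  # assumption: in-memory; a shared store with a TTL in practice
        self._pending = queue.Queue(maxsize=max_pending)
        self._lock = threading.Lock()  # keeps the check-then-record step atomic

    def submit(self, idempotency_key, customer_id, units):
        """Return 'accepted', 'duplicate', or 'rejected'; never blocks the caller."""
        with self._lock:
            if idempotency_key in self._seen:
                return "duplicate"  # a retry of an event already taken
            try:
                self._pending.put_nowait((idempotency_key, customer_id, units))
            except queue.Full:
                return "rejected"  # caller retries later with the SAME key
            self._seen.add(idempotency_key)
            return "accepted"

    def drain(self):
        """Pop everything queued so far and return per-customer usage totals."""
        totals = {}
        while True:
            try:
                _, customer_id, units = self._pending.get_nowait()
            except queue.Empty:
                return totals
            totals[customer_id] = totals.get(customer_id, 0) + units


ing = Ingestor()
ing.submit("evt-1", "acme", 5)
ing.submit("evt-1", "acme", 5)  # client retry with the same key: counted once
ing.submit("evt-2", "acme", 3)
print(ing.drain())  # {'acme': 8}
```

Because the key is recorded only after the event is actually queued, a rejected event can be retried safely, and a retry of an accepted event is a no-op. Note that a purely in-memory queue loses whatever is pending if the process dies, so a delivery guarantee needs a durable log or client-side retries on top of this.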

Storage-wise, we’ve leaned on a mix of time-series DBs for raw events and pre-aggregated summaries for billing views.
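As an illustration of the raw-events-plus-summaries split (a sketch under assumptions, not the schema used here), the rollup below buckets events by their own timestamp rather than by arrival time, so late or out-of-order events still land in the right billing window. The tuple layout, the hourly granularity, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_bucket(ts):
    """Truncate an epoch-seconds timestamp to the start of its UTC hour (ISO string)."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.replace(minute=0, second=0, microsecond=0).isoformat()


def aggregate(raw_events):
    """Roll raw (customer_id, epoch_seconds, units) rows up into hourly totals."""
    summary = defaultdict(int)
    for customer_id, ts, units in raw_events:
        summary[(customer_id, hour_bucket(ts))] += units
    return dict(summary)


raw = [
    ("acme", 1_700_000_000, 5),
    ("acme", 1_700_003_000, 2),  # a later-hour event that arrives early
    ("acme", 1_700_000_100, 3),  # an older event that arrives late
]
for key, units in sorted(aggregate(raw).items()):
    print(key, units)
```

Keeping the raw rows means a summary can be rebuilt for any hour if a bug or a very late event is discovered after the fact.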

Would love to swap notes on failure patterns or queue setups if you’ve dealt with similar scale.