item 39816263

karlmdavis | 1 year ago

From personal experience, it scales very well vertically. Have a system in production with tens of billions of rows and north of 12 TB of storage total. That system is read-heavy with large batched inserts, not many deletes or updates.

Biggest limiter is memory, where the need for it grows linearly with table index size. Postgres really really wants to keep the index pages hot in the OS cache. Gets very sad and weird if it can’t: will unpredictably resort to table scans sometimes.
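One rough way to see whether the hot set still fits: compare index sizes against available memory and watch the buffer-cache hit ratio. A sketch using the standard statistics views (this measures Postgres's shared buffers, not the OS page cache the comment refers to, so treat it as a proxy):

```sql
-- Total index size per table, largest first
SELECT relname,
       pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_indexes_size(relid) DESC
LIMIT 10;

-- Rough buffer-cache hit ratio; well below ~0.99 on a read-heavy
-- system suggests the working set no longer fits in memory
SELECT sum(blks_hit)::float
         / nullif(sum(blks_hit) + sum(blks_read), 0) AS cache_hit_ratio
FROM pg_stat_database;
```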

We are running on AWS Aurora, on a db.r6i.12xlarge. Nowhere even close to maxed out on potential vertical scaling.

brightball|1 year ago

Isn’t Aurora horizontal by default?

EDIT: Here's what I was thinking about. It's chunked in 10 GiB increments that are replicated across AZs.

> Fault-tolerant and self-healing storage

Aurora's database storage volume is segmented in 10 GiB chunks and replicated across three Availability Zones, with each Availability Zone persisting 2 copies of each write. Aurora storage is fault-tolerant, transparently handling the loss of up to two copies of data without affecting database write availability and up to three copies without affecting read availability. Aurora storage is also self-healing; data blocks and disks are continuously scanned for errors and replaced automatically.

https://aws.amazon.com/rds/aurora/features/

dalyons|1 year ago

No? It’s a standard single-node primary with replicas, with a fancy log-based storage layer.

SOLAR_FIELDS|1 year ago

I recently did some DB maintenance on a write-heavy workload and found that a table with 500 million records will bloat over time. Switching it to a proper partitioning scheme helped a lot. So people should not read this and assume you can just dump massive workloads into PG and they will be screamingly performant without some tuning and thoughtful design (I don’t think this is what you are implying, just a PSA).
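For reference, the kind of declarative partitioning that helps here, as a sketch with hypothetical table and column names, assuming time-based range partitioning (available since PG 10):

```sql
-- Hypothetical write-heavy events table, range-partitioned by month
CREATE TABLE events (
    id         bigserial,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Old data can then be detached and dropped instantly,
-- instead of leaving dead tuples behind for vacuum to clean up
DROP TABLE events_2024_01;
```

Dropping a whole partition is the big win for bloat: it avoids the mass DELETEs that generate dead tuples in the first place.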

riku_iki|1 year ago

Is there a chance you're running an older version of PG? Bloat was reduced significantly in the last few releases.
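Worth checking either way: `pg_stat_user_tables` reports dead-tuple counts, which give a rough sense of bloat and whether autovacuum is keeping up.

```sql
-- Tables with the most dead tuples, with last autovacuum time
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```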

mrbonner|1 year ago

My cluster clocks in at 230 TB in Aurora. It is hitting the hard limit of 250 TB that AWS can support.

mrbonner|1 year ago

No, we do not store logs or IoT data. The data are all business-related metrics. I didn't choose Aurora; I inherited it from another team. We have 4 read replicas to scale out read access. The internal team owns the ingestion (inserts) to the writer node; all other, external access is read-only.

I think the reason behind the Aurora pick is to support arbitrary aggregation, filtering, and low-latency reads (p90 < 3000 ms). We could not pick a distributed DB based on Presto, Athena, or Redshift, mainly because of the latency requirements.

The other contender I considered was Elasticsearch. But I do think using it in this case would be akin to fitting a square peg in a round hole.

LunaSea|1 year ago

Out of curiosity, what type of application could generate this quantity of data?

Is it IoT / remote sensing related?