item 12209395

An unexpected journey, a Postgres DBA's tale

117 points | mirceasoaica | 9 years ago | engineering.semantics3.com

59 comments

[+] limaoscarjuliet | 9 years ago
There are more surprises around Postgres XID to be had:

- Temp tables cannot be vacuumed by anyone other than the connection that owns them (no way around it). Do not keep temp tables around for too long; a nightly vacuum or autovacuum will not clear them,

- Temp tables do not get automatically dropped on recovery if your DB cluster crashes. They get reused, but if you had many connections before the crash and few after, some temp namespaces will linger. You may need to manually cascade-drop those namespaces if you see the warning,

- Full vacuum does not touch XIDs - use freeze or just normal vacuum,

- Check the oldest XID per table:

      SELECT c.oid::regclass AS table_name,
             greatest(age(c.relfrozenxid), age(t.relfrozenxid)) AS age
      FROM pg_class c
      LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
      WHERE c.relkind = 'r'
      ORDER BY age DESC
      LIMIT %s

- Check the oldest XID per database:

      SELECT datname AS db, age(datfrozenxid)
      FROM pg_database
      ORDER BY age DESC
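A hedged sketch of acting on the bullets above; the table name and namespace number are hypothetical, and exact behavior varies by Postgres version:

```sql
-- Freeze old XIDs explicitly rather than relying on a full vacuum
VACUUM (FREEZE) my_table;    -- my_table is a hypothetical name

-- After a crash, look for lingering temp namespaces (named like pg_temp_NN)
SELECT nspname FROM pg_namespace WHERE nspname LIKE 'pg_temp_%';

-- If one is orphaned, cascade-drop it (the number here is hypothetical):
-- DROP SCHEMA pg_temp_3 CASCADE;
```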
[+] anarazel | 9 years ago
> - Full vacuum does not touch XIDs - use freeze or just normal vacuum,

Don't think that's right.

      copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
                     bool *pSwapToastByContent, TransactionId *pFreezeXid,
                     MultiXactId *pCutoffMulti)
      ...

      /*
       * Compute xids used to freeze and weed out dead tuples and multixacts.
       * Since we're going to rewrite the whole table anyway, there's no reason
       * not to be aggressive about this.
       */
      vacuum_set_xid_limits(OldHeap, 0, 0, 0, 0,
                            &OldestXmin, &FreezeXid, NULL, &MultiXactCutoff,
                            NULL);

      /*
       * FreezeXid will become the table's new relfrozenxid, and that mustn't go
       * backwards, so take the max.
       */
      if (TransactionIdPrecedes(FreezeXid, OldHeap->rd_rel->relfrozenxid))
          FreezeXid = OldHeap->rd_rel->relfrozenxid;
  
i.e. the cutoff is computed in copy_heap_data(). And then in

      void
      rewrite_heap_tuple(RewriteState state,
                         HeapTuple old_tuple, HeapTuple new_tuple)
      {
      ...
          /*
           * While we have our hands on the tuple, we may as well freeze any
           * eligible xmin or xmax, so that future VACUUM effort can be saved.
           */
          heap_freeze_tuple(new_tuple->t_data, state->rs_freeze_xid,
                            state->rs_cutoff_multi);
[+] the_mitsuhiko | 9 years ago
Are you sure the leftover temp table issue still exists? I have been using Postgres for development on a project with temp tables, and even though I've been rough with it (killing it, shutting down the laptop in not-so-nice ways, etc.) I can't find any leftovers.
[+] radiowave | 9 years ago
At a slight tangent, I don't think I've ever used a temporary table without living to regret it.

- Queries that perform complex operations involving the creation of temp tables cannot be joined onto themselves[1].

- Operations that create temp tables can never be used as part of the definition of a materialized view[1]. This is a permissions issue, but not one that can be resolved by altering permissions. It's baked in.

- Confusion over how to test for the existence of a temp table, "it's in the catalog but it's not in my session".

[1] - Without resorting to dblink shenanigans, or suchlike.
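One way to sidestep the "it's in the catalog but not in my session" confusion is to restrict the catalog lookup to the current session's temp namespace; a sketch, where the table name is hypothetical:

```sql
-- Does a temp table named my_temp exist in *this* session?
-- pg_my_temp_schema() returns the OID of this session's temp schema (0 if none)
SELECT EXISTS (
    SELECT 1
    FROM pg_class
    WHERE relname = 'my_temp'
      AND relnamespace = pg_my_temp_schema()
);
```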

[+] vinothgopi | 9 years ago
I don't think I've ever read a tech blog post that was so gripping, with twists and turns, AND that ends with a cliffhanger (until you click on the link to part 2).
[+] the_duke | 9 years ago
Haha, I agree.

For a second I was concerned about the level of my geekiness.

Then I moved on to the next HN article...

[+] Femur | 9 years ago
I wonder: is there a compelling reason that Postgres does not use 64-bit XIDs?
[+] saurik | 9 years ago
There are four XIDs per tuple, so 64-bit XIDs would mean 32 bytes of XID storage per row instead of 16. (That's a lot larger.)
[+] ivoras | 9 years ago
For the elucidation of the audience: one thing I found out can kill PostgreSQL (and luckily there are few of those) is temp table over-use. Imagine a busy app where most page loads create and work on wide temp tables. What isn't directly obvious is that temp tables, like regular tables, have their metadata (attributes) stored in the system catalogues, and the catalogues themselves behave very much like (if not identically to) regular tables. So now there's huge write traffic on the system tables, which by the nature of temp tables generates huge numbers of dead records, as information on every field of every temp table is created and deleted. I'm also pretty sure there's some ugly locking in the mix.

So imagine now that autovacuum was turned off...

The user tables in such a system can survive if they are not updated often, with an occasional vacuum. However, the situation here was such that for every GB of user data, another GB has accumulated in dead records in system tables. Half of disk usage has gone to bloat in system tables!

It's easy to fix, of course, with a full vacuum of everything, if you have the time. Or, more quickly, a full vacuum of just the system tables should be very fast, right? Right, except that in that particular old version of PostgreSQL there was a race-condition bug with vacuuming system tables that could lock up the entire cluster, with a possibility of data corruption for the database in question. Guess what happened.

A restore from backup.
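The catalog bloat described above can be spotted with the statistics views before it gets this bad; a sketch, assuming the stats collector is enabled:

```sql
-- Dead-tuple counts on the system catalogs most affected by temp table churn
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_sys_tables
WHERE relname IN ('pg_class', 'pg_attribute', 'pg_type', 'pg_depend')
ORDER BY n_dead_tup DESC;
```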

[+] whack | 9 years ago
Given the dangers of XID wraparound, I wonder if it makes sense for Postgres to auto-launch a VACUUM FREEZE, or some minimal variant of it, when it detects that an XID wraparound is imminent.
[+] pilif | 9 years ago
It does do that. But it runs at very low priority, so it might take a while.

But on the other hand, it starts warning you well before it actually stops working: first just in the log file, then on every query you issue. So if you're not ignoring the warnings in your logs or client code, there should be ample time to react.

At least, there was for me when I first ran into this a few years ago.
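A sketch of the kind of monitoring that avoids relying on the log warnings: compare each database's freeze age against the configured autovacuum threshold.

```sql
-- Freeze age per database vs. autovacuum_freeze_max_age (default 200 million);
-- anti-wraparound autovacuum kicks in when freeze_age exceeds max_age
SELECT datname,
       age(datfrozenxid) AS freeze_age,
       current_setting('autovacuum_freeze_max_age')::int AS max_age
FROM pg_database
ORDER BY freeze_age DESC;
```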

[+] hvo | 9 years ago
One of the main points of VACUUM is to protect you against XID wraparound and multixact ID wraparound. So your idea of auto-launching a VACUUM FREEZE is good, but I still believe it is important to be aware that you may lose your very old data if you don't do anything about it.
[+] hobs | 9 years ago
200,000 tables in one database? What?
[+] jimktrains2 | 9 years ago
Imagine a case with census data and the associated geometries. https://github.com/censusreporter/census-postgres has 22 surveys, each with 230+ tables. That's 5,000+ tables right there. Now, the TIGER tables for all of that add another 50 tables per year, so another 350 tables.

If these were to be partitioned by state, instead of all records for all states in a single table, then we're looking at 270,000.

So, it's not _that_ difficult.
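The table counts above work out roughly as follows; the 50-state figure is an assumption for the sketch:

```python
surveys = 22
tables_per_survey = 230                        # "230+ tables" per survey
census_tables = surveys * tables_per_survey    # 5060, i.e. "5,000+ tables"

tiger_tables = 50 * 7                          # 50 tables/year, "another 350 tables"

# Partitioning every table by state multiplies the table count
states = 50                                    # assumption: 50 states
total = (census_tables + tiger_tables) * states
print(census_tables, tiger_tables, total)      # roughly the 270,000 quoted
```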

[+] barrkel | 9 years ago
If you have a lot of variable schema datasets that you need to be able to index and query, you can do a lot worse than create a table per data set.

At my job, in our largest environment we only have 30,000 tables in our biggest database, but that's only a year or two's worth of data and the rate of increase is increasing.

[+] sciurus | 9 years ago
As this post demonstrates, once you have a setup like that you'd really benefit from having someone on staff (or on retainer as a contractor) whose primary skill set is PostgreSQL administration.
[+] eddd | 9 years ago
I lifted both of my eyebrows while reading that sentence.
[+] zwily | 9 years ago
Imagine your app uses 500 tables, and you use a separate schema per tenant on a single database. With 400 tenants, you now have 200,000 tables.
[+] audleman | 9 years ago
This article mirrors my own experience with Postgres. When something goes wrong, furiously research for 10 days, try a bunch of complex things, then discover a simple command that gracefully fixes everything =)
[+] approachingtraj | 9 years ago
The title is misleading. The post only talks about Amazon's RDS which is not Postgres.
[+] vinothgopi | 9 years ago
Amazon RDS is the managed version of Postgres (among other flavors of relational DBs). Maybe you are confusing it with Redshift.

https://aws.amazon.com/rds/