zombodb's comments

zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C

Yes there is. It's not documented or shown in an example yet, though.

There's a derive macro called #[derive(PostgresType)]. Combine that with serde's Serialize and Deserialize, and you're good to go.
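As a quick sketch (the type name and fields here are invented, and the exact imports may vary by pgx version), that looks roughly like:

```rust
use pgx::*;
use serde::{Deserialize, Serialize};

// A custom Postgres type backed by a plain Rust struct.
// pgx and serde handle the (de)serialization to and from Postgres.
#[derive(PostgresType, Serialize, Deserialize)]
pub struct Animal {
    name: String,
    legs: i32,
}
```

After that, `Animal` can be used as a column type or as a function argument/return type like any built-in.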

I'm going to be working on more docs and Twitch streams this week.

zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C

Fair. With pgx, however, Rust "panic!"s are translated into standard Postgres "ERROR"s, such that instead of crashing, only the current transaction aborts.

So while you're pretty much correct in general, pgx handles it in the way a PG extension author would expect.
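Conceptually (this is not pgx's actual code, just the standard catch-a-panic-at-the-boundary pattern that std exposes via `catch_unwind`), the translation works something like this:

```rust
use std::panic;

// Sketch: call an extension function, converting any Rust panic into
// an error value. In pgx the error side becomes a Postgres ERROR that
// aborts the current transaction instead of crashing the backend.
fn guarded_call<F>(f: F) -> Result<i64, String>
where
    F: FnOnce() -> i64 + panic::UnwindSafe,
{
    panic::catch_unwind(f).map_err(|payload| {
        // Recover the panic message if there is one.
        if let Some(s) = payload.downcast_ref::<&str>() {
            (*s).to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}
```

A panicking call comes back as an `Err` value rather than taking the whole process down, which is exactly the behavior you want inside a database backend.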

zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C

That's a great question, and one probably best answered over on pgx's GitHub page.

But! I plan on adding a command to "cargo-pgx" to package up the extension for you into a directory structure (or tarball, maybe).

The idea is that you'd just run: cargo pgx package

And it would just build a --release library and create the proper directory structure (based on what pg_config says) for the shared library and the associated .sql.
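Done by hand today, that packaging step might look something like this (a sketch only; `myext` and the version number are placeholder names, and the real command would presumably read them from Cargo.toml):

```shell
# Build the optimized library.
cargo build --release

# Ask pg_config where Postgres wants shared libraries and extension files.
PKGLIBDIR=$(pg_config --pkglibdir)
SHAREDIR=$(pg_config --sharedir)

# Stage the proper directory structure.
mkdir -p "pkg${PKGLIBDIR}" "pkg${SHAREDIR}/extension"
cp target/release/libmyext.so "pkg${PKGLIBDIR}/myext.so"
cp myext.control sql/myext--0.1.0.sql "pkg${SHAREDIR}/extension/"

# Optionally roll it all into a tarball.
tar czf myext.tar.gz -C pkg .
```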

I actually need this ASAP for supporting ZomboDB proper, so... Coming Soon!

zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C

One more follow-up...

The top timing is pgx; the bottom is stock Postgres. So there's a little room for improvement here with pgx, but that's okay for a v0.0.3 release.

    test=# select count(*) from srf.generate_series(1, 10000000);
    Time: 1552.115 ms (00:01.552)

    test=# select count(*) from generate_series(1, 10000000);
    Time: 1406.357 ms (00:01.406)

zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C

re: v0.0.3 -- sure. I just published it last night.

We've been working on it since November last year, and have now fully ported ZomboDB to it.

It's proving out nicely, but keep in mind that Postgres' internals are infinitely complex. Getting safe wrappers around all its "things" is going to take a very very long time.

I'd rather get something that seems very stable now, and continue to iterate on it over time.

zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C

I've looked into them. It seems they're designed to work within a single process, and it's not quite clear to me how sharing the underlying data files across Postgres backends (even with proper Postgres locking) would work.

That's not to say they aren't good frameworks. I'm sure they are. It just seems like they're designed for different use cases.

That said, I have other ideas on this front that I can't talk about today. ;)

zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C

Yeah, "rich data types" is a good point.

pgx provides a #[derive(PostgresType)] macro that lets you represent any normal Rust struct (that serde_cbor can (de)serialize) as a Postgres type. You write no code.

It even generates all the boring SQL boilerplate for you.

I plan on putting together an example about this and doing a twitch stream this week to discuss in detail.

zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C

You're not wrong, but barring bugs in `pgx` (of which I'm sure there are plenty right now), at least Rust gives you compile-time guarantees around not crashing.

And when running inside your database process, that's a huge win.

zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C

Author of `pgx` here.

We developed pgx so that we could rewrite "ZomboDB" (https://github.com/zombodb/zombodb) in Rust. ZDB is a custom "Index Access Method" for Postgres. Think btree index, but stored in Elasticsearch.

So that's definitely a thing.

Other ideas might be custom data analytics/transformation/processing engines that you'd prefer to run in the database instead of externally.

Custom data types to represent, for example, street addresses or genetic information.

The only limit is yourself! ;)

zombodb | 6 years ago | on: The state of full text search in PostgreSQL 12

ZomboDB developer here. With ZDB you don't need tsvector at all. With a properly defined mapping, Elasticsearch can do the stemming for dozens of languages, including English.
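For illustration, a minimal mapping fragment that turns on English stemming for a field might look like this (the field name is an example, and older ES versions nest this under a type name):

```json
{
  "mappings": {
    "properties": {
      "body": { "type": "text", "analyzer": "english" }
    }
  }
}
```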

ZDB exposes darn near everything ES supports.

zombodb | 10 years ago | on: Show HN: ZomboDB – Postgres extension for indexes backed by Elasticsearch

Let me try to be a little less vague here (I guess too much time has passed to edit?).

In general, we developed this for searching structured (but not necessarily relational) full-text content with an enormous amount of associated metadata.

ZomboDB has come out of the legal e-Discovery world and is the backing search platform for one of the major e-Disco service providers in the industry.

It's hard to describe a typical dataset, but anywhere from 600k rows to 100M rows. Some datasets are just a few gig (on disk in Postgres) and others approach 1TB.

A typical usage pattern for a document review system is that humans (and sometimes automated computer processes) literally read each document, make judgement calls about each one, apply metadata, and move to the next document. Rinse-wash-repeat. On a large-scale litigation review, it's not uncommon to have hundreds of users doing this simultaneously.

As such, over time, every document gets UPDATEd at least once (typically 4-5 times due to administrative tasks).

You can imagine that 100M documents with maybe 400 reviewers is a bit of an organizational problem in terms of teams of reviewers, what they should be reviewing, etc, so it's important that the data never lie. If the system says there are 1,863,462 "Potentially Privileged" documents, then that had better be the actual answer.

Because a system like this has to provide real-time progress reporting, analytics on judgement calls, along with the ability to generally search and organize a lot of data, we needed something that first and foremost provided transaction isolation. Enter Postgres. We also needed sophisticated full-text searching. Enter Elasticsearch.

From there it was trying to answer the question "how do we make full-text searching honor our transaction isolation rules?" That's how ZomboDB came to be.

I would think ZomboDB would be useful for any kind of data management system that wants its canonical "source of truth" (i.e., Postgres) to be easily text-searchable. Document Review, Medical Records Management, Inventory/Catalog Systems, etc. As I said in another post, the fact that ES is abstracted away and the need to asynchronously synchronize data into it goes away is fairly compelling. My hope is that some of its bigger caveats (such as crash recovery) get solved sooner rather than later.

zombodb | 10 years ago | on: Show HN: ZomboDB – Postgres extension for indexes backed by Elasticsearch

I'm just some guy on the Internet who is fortunate enough to be able to bring a thing he's spent the past two years working on behind closed doors to the public. I haven't participated in open source since 1999 (no, really), so I'm out of the loop with how things work in 2015.

But if I'm going to continue to shepherd this beast forward, there's no point in hiding its flaws. Besides, how will I know to fix them if I don't document them? Along those lines, I'm sure it's clear the docs are still a WIP, and the more I've written the more I realize needs to be written. I suspect the actual code won't change one bit (ha!) over the next few weeks.

Regarding "production credentials", the README mentions the company where this started, and I've alluded to some large (to me) round numbers. Those will have to do for now. :)

This sort of scenario (a PG index based on ES) isn't what I'd want to use for the scale of something like Netflix (for example) where you've got billions of rows and tens-of-thousands of queries a second. But at the same time, that level of scale only happens at the top. And it's lonely up there. There's a lot more room down here on the ground.

----

While I'm sitting here at 4:30am waiting for HN to stop telling me I'm posting too quickly (wth, I'd like to get some sleep eventually!), here's another thought...

Data is hard. I've spent my entire professional career dealing with data. Trying to bridge the gap between two distinct databases has proven really challenging (and fun and rewarding), but there's quite a bit of work left to do, and there are thousands of programmers out there who are waay smarter than I am (starting with the entire crew of postgresql-hackers), so I feel like if I can at least list the things I know I don't know, someone else may come along and have an answer.

zombodb | 10 years ago | on: Show HN: ZomboDB – Postgres extension for indexes backed by Elasticsearch

That's a good question and a point of improvement for the documentation.

What I was trying to get across by saying that it's not "crash safe" is that the Elasticsearch index is not WAL logged.

As such, if Postgres crashes (bugs, hardware failure, kernel oops, etc.), goes into recovery mode, and has to recover from WAL the transactions that touched blocks belonging to tables with ZomboDB indexes, those ZomboDB indexes are now inconsistent. :(

In this regard, it's akin to Postgres' "hash" index type.

That said, this may be a "simple matter of programming" to resolve. I have some ideas, but it hasn't been a priority yet.

The recovery path is REINDEX TABLE foo. Of course, the total time it takes to reindex is highly dependent on how much data you have, but indexing is really fast. My MBP can sustain 30k records/sec indexing documents with an average size of 7.5KB against a single-node ES cluster. We've seen >100k recs/sec on high-performance hardware with very large ES clusters.

It's also worth noting that if the ES cluster disappears, any transaction that tries to touch the index will abort.
