zombodb's comments
zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C
So while you're pretty much correct in general, pgx handles it in the way a PG extension author would expect.
zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C
But! I plan on adding a command to "cargo-pgx" to package up the extension for you into a directory structure (or tarball, maybe).
The idea is that you'd just run: cargo pgx package
And it would just build a --release library and create the proper directory structure (based on what pg_config says) for the shared library and the associated .sql.
I actually need this ASAP for supporting ZomboDB proper, so... Coming Soon!
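The packaging idea above could be sketched as a small script. Everything here is illustrative: the fallback paths, the `myext` name, and the touched files are stand-ins for what a real `cargo build --release` run would produce; only the `pg_config --pkglibdir` / `--sharedir` flags are real.

```shell
# Sketch of what a packaging command might do: ask pg_config where
# things go, then stage the artifacts into that layout locally.
set -eu

# Fall back to made-up paths if pg_config isn't installed.
PKGLIBDIR=$(pg_config --pkglibdir 2>/dev/null || echo /usr/lib/postgresql/lib)
SHAREDIR=$(pg_config --sharedir 2>/dev/null || echo /usr/share/postgresql)

# Stage into a temp directory instead of installing system-wide.
STAGE=$(mktemp -d)
mkdir -p "$STAGE$PKGLIBDIR" "$STAGE$SHAREDIR/extension"

# Stand-ins for the real build outputs (shared library + SQL + control).
touch "$STAGE$PKGLIBDIR/myext.so"
touch "$STAGE$SHAREDIR/extension/myext--0.0.3.sql"
touch "$STAGE$SHAREDIR/extension/myext.control"

find "$STAGE" -type f
```

From there, tarring up `$STAGE` gives you the "tarball, maybe" variant.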
zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C
The top one is pgx, the bottom is Postgres. So there's a little room for improvement here with pgx (roughly 10% slower on this query), but that's okay for a v0.0.3 release.
test=# select count(*) from srf.generate_series(1, 10000000);
Time: 1552.115 ms (00:01.552)
test=# select count(*) from generate_series(1, 10000000);
Time: 1406.357 ms (00:01.406)
zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C
If you decide to jump into it, definitely let us know any pain points you have.
It takes a bit of time to work out the kinks in a thing like this.
zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C
And here's what you'd have to do to implement it in C: https://github.com/postgres/postgres/blob/dad75eb4a8d5835ecc...
zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C
We've been working on it since November last year, and have now fully ported ZomboDB to it.
It's proving out nicely, but keep in mind that Postgres' internals are infinitely complex. Getting safe wrappers around all its "things" is going to take a very very long time.
I'd rather get something that seems very stable now, and continue to iterate on it over time.
zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C
That's not to say they aren't good frameworks. I'm sure they are. It just seems like they're designed for different use cases.
That said, I have other ideas on this front that I can't talk about today. ;)
zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C
pgx provides a #[derive(PostgresType)] macro that lets you represent any normal Rust struct (that serde_cbor can (de)serialize) as a Postgres type. You write no code.
It even generates all the boring SQL boilerplate for you.
I plan on putting together an example about this and doing a twitch stream this week to discuss in detail.
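To give a feel for the "boring SQL boilerplate" involved, here is a rough sketch of the kind of SQL a custom-type derive macro has to emit. The exact SQL pgx generates may differ; the `mytype` name and the in/out function naming are illustrative, not pgx's actual output.

```rust
// Postgres custom types need a shell CREATE TYPE, a cstring input
// function, a cstring output function, and a final CREATE TYPE that
// wires them together. A derive macro spares you writing this by hand.
fn type_boilerplate(name: &str) -> String {
    format!(
        "CREATE TYPE {name};\n\
         CREATE OR REPLACE FUNCTION {name}_in(cstring) RETURNS {name} \
         AS 'MODULE_PATHNAME' LANGUAGE c IMMUTABLE STRICT;\n\
         CREATE OR REPLACE FUNCTION {name}_out({name}) RETURNS cstring \
         AS 'MODULE_PATHNAME' LANGUAGE c IMMUTABLE STRICT;\n\
         CREATE TYPE {name} (INPUT = {name}_in, OUTPUT = {name}_out);\n",
        name = name
    )
}

fn main() {
    println!("{}", type_boilerplate("mytype"));
}
```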
zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C
You could hook Postgres "emit_log_hook" and probably just use serde to xform the provided "ErrorData" pointer right to json and ship it off wherever you want.
(edit: typeos)
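The idea behind that transformation could be sketched like this. The `ErrorDataLike` struct and the hand-rolled serializer are stand-ins: a real hook would read the actual `ErrorData` pointer Postgres passes in and would likely use serde rather than manual formatting.

```rust
// Illustrative stand-in for a few of the fields ErrorData carries.
struct ErrorDataLike {
    elevel: i32,
    message: String,
    filename: String,
    lineno: i32,
}

// Hand-rolled JSON for the sketch; {:?} on String produces a quoted,
// escaped string, which is valid JSON for simple values.
fn to_json(e: &ErrorDataLike) -> String {
    format!(
        "{{\"elevel\":{},\"message\":{:?},\"filename\":{:?},\"lineno\":{}}}",
        e.elevel, e.message, e.filename, e.lineno
    )
}

fn main() {
    let e = ErrorDataLike {
        elevel: 21, // hypothetical error level for illustration
        message: "relation \"foo\" does not exist".to_string(),
        filename: "parse_relation.c".to_string(),
        lineno: 1180,
    };
    // A real hook would ship this off to a log collector instead.
    println!("{}", to_json(&e));
}
```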
zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C
And when running inside your database process, that's a huge win.
zombodb | 5 years ago | on: PGX: Write Postgres extensions in Rust instead of C
We developed pgx so that we could rewrite "ZomboDB" (https://github.com/zombodb/zombodb) in Rust. ZDB is a custom "Index Access Method" for Postgres. Think btree index, but stored in Elasticsearch.
So that's definitely a thing.
Other ideas might be custom data analytics/transformation/processing engines that you'd prefer to run in the database instead of externally.
Custom data types to represent, for example, street addresses or genetic information.
The only limit is yourself! ;)
zombodb | 6 years ago | on: The state of full text search in PostgreSQL 12
ZDB exposes darn near everything ES supports.
zombodb | 7 years ago | on: Toshi: An Elasticsearch competitor written in Rust
zombodb | 9 years ago | on: PostgreSQL 9.6 Released
Feel free to email the mailing list ([email protected]). I'd be happy to help answer any questions you might have
zombodb | 10 years ago | on: Show HN: ZomboDB – Postgres extension for indexes backed by Elasticsearch
In general, we developed this for searching structured (but not necessarily relational) full-text content with an enormous amount of associated metadata.
ZomboDB has come out of the legal e-Discovery world and is the backing search platform for one of the major e-Disco service providers in the industry.
It's hard to describe a typical dataset, but anywhere from 600k rows to 100M rows. Some datasets are just a few gig (on disk in Postgres) and others approach 1TB.
A typical usage pattern for a document review system is that humans (and sometimes automated computer processes) literally read each document, make judgement calls about each one, apply metadata, and move to the next document. Rinse-wash-repeat. On a large-scale litigation review, it's not uncommon to have hundreds of users doing this simultaneously.
As such, over time, every document gets UPDATEd at least once (typically 4-5 times due to administrative tasks).
You can imagine that 100M documents with maybe 400 reviewers is a bit of an organizational problem in terms of teams of reviewers, what they should be reviewing, etc, so it's important that the data never lie. If the system says there are 1,863,462 "Potentially Privileged" documents, then that had better be the actual answer.
Because a system like this has to provide real-time progress reporting, analytics on judgement calls, along with the ability to generally search and organize a lot of data, we needed something that first and foremost provided transaction isolation. Enter Postgres. We also needed sophisticated full-text searching. Enter Elasticsearch.
From there it was trying to answer the question "how do we make full-text searching honor our transaction isolation rules?" That's how ZomboDB came to be.
I would think ZomboDB would be useful for any kind of data management system that wants its canonical "source of truth" (ie, Postgres) to be easily text-searchable. Document Review, Medical Records Management, Inventory/Catalog Systems, etc. As I said in another post, the fact that ES is abstracted away, and that the need to asynchronously synchronize data into it goes away, is fairly compelling. My hope is that some of its bigger caveats (such as crash recovery) get solved sooner rather than later.
zombodb | 10 years ago | on: Show HN: ZomboDB – Postgres extension for indexes backed by Elasticsearch
But if I'm going to continue to shepherd this beast forward, there's no point in hiding its flaws. Besides, how will I know to fix them if I don't document them? Along those lines, I'm sure it's clear the docs are still a WIP, and the more I've written the more I realize needs to be written. I suspect the actual code won't change one bit (ha!) over the next few weeks.
Regarding "production credentials", the README mentions the company where this started, and I've alluded to some large (to me) round numbers. Those will have to do for now. :)
This sort of scenario (a PG index based on ES) isn't what I'd want to use for the scale of something like Netflix (for example) where you've got billions of rows and tens-of-thousands of queries a second. But at the same time, that level of scale only happens at the top. And it's lonely up there. There's a lot more room down here on the ground.
----
While I'm sitting here at 4:30am waiting for HN to stop telling me I'm posting too quickly (wth, I'd like to get some sleep eventually!), here's another thought...
Data is hard. I've spent my entire professional career dealing with data. Trying to bridge the gap between two distinct databases has proven really challenging (and fun and rewarding), but there's quite a bit of work left to do. And there are thousands of programmers out there who are waay smarter than I (starting with the entire crew of postgresql-hackers), so I feel like if I can at least list the things I know I don't know, someone else may come along and have an answer.
zombodb | 10 years ago | on: Show HN: ZomboDB – Postgres extension for indexes backed by Elasticsearch
What I was trying to get across by saying that it's not "crash safe" is that the Elasticsearch index is not WAL logged.
As such, if Postgres crashes (bugs, hardware failure, kernel oops, etc) and goes into recovery mode and has to recover, from WAL, transactions that touched blocks belonging to tables with ZomboDB indexes, the ZomboDB indexes are now inconsistent. :(
In this regard, it's akin to Postgres' "hash" index type.
That said, this may be a "simple matter of programming" to resolve. I have some ideas, but it hasn't been a priority yet.
The recovery path is to REINDEX TABLE foo; Of course, the total time it takes to reindex is highly dependent on how much data you have, but indexing is really fast. My MBP can sustain 30k records/sec indexing documents with an average size of 7.5k against a single-node ES cluster. We've seen >100k recs/sec on high-performance hardware with very large ES clusters.
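Those throughput figures give a useful back-of-the-envelope bound on recovery time. A quick calculation (the row count is a hypothetical worst case from the dataset sizes mentioned above):

```rust
// Estimated wall-clock time to REINDEX, given sustained indexing rate.
fn reindex_seconds(rows: f64, rate_per_sec: f64) -> f64 {
    rows / rate_per_sec
}

fn main() {
    // 100M rows at the laptop-class 30k records/sec figure:
    let secs = reindex_seconds(100_000_000.0, 30_000.0);
    println!("~{:.0} seconds (~{:.1} minutes)", secs, secs / 60.0);
    // At the >100k recs/sec seen on big clusters, it's well under 20 minutes.
    let fast = reindex_seconds(100_000_000.0, 100_000.0);
    println!("~{:.1} minutes on fast hardware", fast / 60.0);
}
```

So even a full rebuild of a worst-case dataset is on the order of an hour, not days.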
It's also worth noting that if the ES cluster disappears, any transaction that tries to touch the index will abort.
There's a derive macro called #[derive(PostgresType)]. Combine that with serde's Serialize and Deserialize, and you're gtg.
I'm going to be working on more docs and twitch streams over this week.