databrecht's comments

databrecht | 4 years ago | on: Prisma – ORM for Node.js and TypeScript

There is value in both; at Fauna we provide GraphQL out of the box. Using Fauna directly would eliminate an indirection and is probably slightly more efficient. However, with a GraphQL layer like Prisma in between you could essentially switch to any database with less impact on your application. This is tremendously interesting for people who develop frameworks, since using Prisma gives them support for multiple databases immediately. For application developers it could allow you to move from a non-scalable database to a scalable one once that becomes necessary, or simply to switch databases if database maintenance is causing you grief. I for one am looking forward to Prisma supporting Fauna: if the interface is the same, there are even fewer reasons not to choose a scalable managed database over managing your own db :). And I would say that Prisma's interface is quite great!

Note: the performance impact depends heavily on whether your database maps well to ORMs. Traditional databases have an impedance mismatch when it comes to translating tables to an object format. Graph databases, or the way Fauna works (documents with relations and map/reduce-like joins), map well to ORMs, so the performance impact would be small.
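The impedance mismatch above can be shown in a few lines. This is a minimal, illustrative sketch (the table and field names are made up, not from any real schema): a relational join returns flat rows, so an ORM has to reassemble them into the nested objects the application actually wants, whereas a document store with relations could return that shape directly.

```python
from collections import OrderedDict

# Flat rows, as a SQL join over users and their posts would return them.
rows = [
    {"user_id": 1, "user_name": "alice", "post_id": 10, "post_title": "hello"},
    {"user_id": 1, "user_name": "alice", "post_id": 11, "post_title": "again"},
    {"user_id": 2, "user_name": "bob",   "post_id": 12, "post_title": "hi"},
]

def nest(rows):
    """Group flat join rows back into user objects with embedded posts --
    the reassembly work an ORM does after every join."""
    users = OrderedDict()
    for r in rows:
        user = users.setdefault(
            r["user_id"],
            {"id": r["user_id"], "name": r["user_name"], "posts": []},
        )
        user["posts"].append({"id": r["post_id"], "title": r["post_title"]})
    return list(users.values())

nested = nest(rows)
```

A database that stores documents with relations can skip the `nest` step entirely, which is why such databases map more naturally onto ORM output.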

databrecht | 5 years ago | on: Latency Comparison: DynamoDB vs. FaunaDB vs. Redis

No, it's not; that price is far smaller and should not impact pure reads. This is probably an artifact of an anti-pattern where the same documents are constantly updated, which creates significant history. At that point, history can have an impact on the index. We are working on optimizing that, in which case history will no longer have an impact on these latencies while retaining the possibility to go back in time or get changesets.

databrecht | 5 years ago | on: Ask HN: SQL or NoSQL?

Exactly! That's how I've built things until now: a mix of databases. But it's also harder to manage. Database vendors notice this, and that results in databases that start offering alternative ways of modeling.

databrecht | 5 years ago | on: Ask HN: SQL or NoSQL?

Freeform flexibility is one aspect, but a document style could also simply be a preference for how you want to structure your data, and it typically has an impact on how joins happen (if the document database offers joins). Those joins work in a graph-like fashion instead of the way flat sets are typically joined. A nested document can also be an optimization that provides data in the exact format your client wants to see. Although some document databases have popularized the idea that you should join in the client because they didn't provide joins initially, it doesn't have to be that way.

Mixing paradigms in one database is probably going to happen more. Just like Postgres offers a 'document' style, some document databases are offering documents with relations. It wouldn't surprise me to see document databases offer optional schemas. I think the future is a mix of options and tools in one database (of which JSONB columns are a first step). Depending on the situation we'll just model differently. The best database might become the one that lets us use these different tools most elegantly together. The difference between a document and a table is only a 'flatten' and a 'schema' away.
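The "only a 'flatten' away" point can be made concrete in a few lines. A hypothetical sketch (the tweet/comment field names are invented for illustration): flattening a document with an embedded list yields exactly the flat rows a relational table would hold.

```python
def flatten(doc):
    """Turn a document with an embedded comment list into flat,
    table-like rows -- one row per (tweet, comment) pair."""
    return [
        {"tweet_id": doc["id"], "tweet_text": doc["text"],
         "comment_id": c["id"], "comment_text": c["text"]}
        for c in doc["comments"]
    ]

doc = {
    "id": 1,
    "text": "hello",
    "comments": [{"id": 10, "text": "hi"}, {"id": 11, "text": "hey"}],
}

rows = flatten(doc)
```

Going the other way (grouping rows back into a document) is the inverse operation, which is why the two models are closer than the SQL-vs-NoSQL framing suggests.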

databrecht | 5 years ago | on: Ask HN: SQL or NoSQL?

'NoSQL' can be transactional and relational. The question should always be: "this is my problem, what's the best database?". NoSQL is such a huge bucket that the original question doesn't make sense imo. So is SQL: some traditional databases have quite a few nifty features to support specific patterns.

SQL will (maybe sadly?.. maybe not?) not go away. Many so-called 'NoSQL' databases are looking into providing SQL, or already provide it (with or without limitations), because users just want to use what they know. I would be stoked for an SQL v2 standard!

databrecht | 5 years ago | on: Ask HN: SQL or NoSQL?

I would go a step further: you can't even talk about NoSQL vs SQL. It's about database features, the join patterns, how scaling happens; both are overlapping more and more and will continue to overlap. Products built on SQL are aiming to scale, and 'NoSQL' is aiming to provide the features that SQL provides in a scalable manner. Her original statement was already quite confusing: a relational store doesn't necessarily mean SQL. Many 'NoSQL' databases offer relations and are a perfect fit for social media, or were even built to support exactly this kind of application :)

databrecht | 5 years ago | on: Ask HN: SQL or NoSQL?

Exactly, but it goes further than that. The mentality never made sense, since the term NoSQL never made sense to start with. It's amazing how many people use a term that just originated from a meeting to talk about alternative databases. We keep using it even though it's practically impossible to say what NoSQL is; depending on whom you ask, the term means different things. This is a very good introduction to the term: https://www.youtube.com/watch?v=qI_g07C_Q5I

Graph databases are considered 'NoSQL', yet they have relations and transactions. Schemalessness is often also one of the properties attributed to NoSQL, but it's a bit strange to consider that a NoSQL attribute: some traditional databases offer schemaless options, and a database like Cassandra has a schema yet is considered NoSQL. I work at Fauna, which has relations and stronger consistency than many traditional databases. It is schemaless at this point, but that might change in the future. Since it doesn't offer SQL, it's thrown into the NoSQL bucket with the assumptions that come along with it.

None of these one-liners in computer science make sense IMHO, and we listen way too often to colleagues who use them. Similarly, "use SQL for enforced schema" might be accurate in many cases, but in essence it depends on your situation, and we need to research what we use instead of following one-liners ;)

databrecht | 5 years ago | on: Ask HN: SQL or NoSQL?

Social media are typically quite heavy on tree traversals. That kind of pattern is very similar to resolving a deep ORM query or a deep GraphQL query, which also doesn't map very well onto 'traditional' relational databases https://en.wikipedia.org/wiki/Object%E2%80%93relational_impe.... I believe this 'issue' depends on: A) the type of join, B) whether your relational database flattens between consecutive joins, and C) whether there is easy/efficient pagination on multiple levels.

The type of join shouldn't be a problem; SQL engines should in most cases be able to determine the best join, and in the cases where they can't you can start tweaking (tricky to get right, especially if your data evolves, but possible; you probably want to pin your query plan). B is trickier, and a performance loss, since it's a bit silly that data is flattened into a set each time only to then (probably) be put into a nested (object-oriented or JSON) format to hand to the client. This is closely related to C: in a social graph you might have nodes (popular people or tweets) with a much higher number of links than others. That means that if you do a regular join of tweets and comments and sort it, one popular tweet's comments might eat the whole result and you might not get beyond the first tweet. Instead, you probably only want the first x comments per tweet. That query might result in a number of nested groups, so it might look more like the following SQL (written from memory, probably not entirely correct):

SELECT tweet.*, jsonb_agg(to_jsonb(comment)) AS comments
FROM tweet
JOIN comment ON tweet.id = comment.tweet_id
GROUP BY tweet.id
HAVING COUNT(comment.id) < 64
LIMIT 64;

That obviously becomes increasingly complex if you want a feed with comments, likes, retweets, people, etc., all in one. There are reasons why two engineers who helped scale Twitter created a new database (https://fauna.com/), where I work. Although relational, the relations are done very differently: instead of flattening sets, you essentially walk the tree and join on each level. I made an attempt to explain that for the GraphQL case here: https://www.infoworld.com/article/3575530/understanding-grap...
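The "first x comments per tweet" requirement above can be expressed in standard SQL with a window function rather than a plain JOIN + LIMIT. Here is a runnable sketch using SQLite through Python's `sqlite3` (SQLite lacks `LATERAL` and `jsonb_agg`, so `group_concat` stands in); the tweet/comment schema is the hypothetical one from the comment, not a real one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tweet (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, tweet_id INTEGER, text TEXT);
    INSERT INTO tweet VALUES (1, 'popular'), (2, 'quiet');
    INSERT INTO comment VALUES (10, 1, 'a'), (11, 1, 'b'), (12, 1, 'c'),
                               (13, 2, 'd');
""")

# Number the comments within each tweet, then keep only the first N per
# tweet -- the per-level pagination that a flat JOIN ... LIMIT cannot
# express, because the LIMIT would apply to the whole flattened set.
rows = con.execute("""
    SELECT tweet_id, group_concat(text) AS first_comments
    FROM (SELECT tweet_id, text,
                 ROW_NUMBER() OVER (PARTITION BY tweet_id ORDER BY id) AS rn
          FROM comment)
    WHERE rn <= ?
    GROUP BY tweet_id
    ORDER BY tweet_id
""", (2,)).fetchall()
```

Even here the result comes back flat and string-encoded; turning it into the nested feed a client wants is still left to the application, which is exactly the flatten/re-nest overhead discussed above.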

TL;DR: in my opinion you can definitely use a traditional relational database, but it might not be the most efficient choice due to the impedance mismatch. 'Relational' applies to more than traditional SQL databases, though: a graph database, or something like Fauna, is also relational and would be a better match (Fauna is similar in the sense that its joins work much like a graph database's). Obviously I'm biased, though, since I work for Fauna.

databrecht | 5 years ago | on: Comparing Fauna and DynamoDB: Architecture and Pricing

Easy searching is definitely on our roadmap, but for people passing by I did want to point out that you can already get some form of text search thanks to the way we index arrays. You can easily write a sort of 'inverted index' yourself by placing ngrams in an array within a document. The concept of bindings makes this particularly easy. This is option 2 as explained here: https://stackoverflow.com/questions/62109035/how-to-get-docu...

We realize it's not the perfect solution and doesn't deliver the best Dev XP at this point though :)
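The ngram trick above can be sketched in a few lines of plain Python (illustrative only; in Fauna the array would be computed by an index binding rather than by application code): store the trigrams of a field in an array on the document, index that array, and a substring search becomes "are all trigrams of the search term present in the index?".

```python
def ngrams(text, n=3):
    """All lowercase n-grams of a string, the building block of a
    simple inverted index."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

# "Indexing" a document means indexing every trigram of its searchable
# field; this set is what would live in the array-backed index.
doc_trigrams = ngrams("FaunaDB")

def matches(term, indexed):
    """A substring search hits when every trigram of the term is
    present among the document's indexed trigrams."""
    return ngrams(term) <= indexed

hit = matches("auna", doc_trigrams)    # substring of "FaunaDB"
miss = matches("xyz", doc_trigrams)    # not a substring
```

Trigram matching can produce false positives on longer terms (all trigrams present but not contiguous), which is one reason it's a workaround rather than full text search.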

databrecht | 5 years ago | on: Comparing Fauna and DynamoDB: Architecture and Pricing

If I'm not mistaken, Dynamo's streaming is polling under the hood. I suppose how quickly the price goes up depends on how frequently it polls and how expensive each poll is in reads.

In Fauna, temporality is a first-class citizen. You can get efficient and cheap changesets by leveraging it: you can just ask "what has been added/removed in this collection or index match after timestamp X", and you can combine that with an index that answers "what are the updated documents in a collection after a certain timestamp?". That gives you very cheap pull-based CDC.
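The pull-based CDC pattern above boils down to a polling loop that keeps a cursor timestamp and repeatedly asks "what changed after X?". A minimal generic sketch; `fetch_changes` is a hypothetical stand-in for the temporal index query described above, not a real Fauna API.

```python
import time

def poll_changes(fetch_changes, cursor, interval_s=3600):
    """Yield changesets forever. fetch_changes(after=ts) is assumed to
    return (changes, new_cursor); the new cursor is the timestamp to
    resume from, so no change is ever missed or double-counted."""
    while True:
        changes, cursor = fetch_changes(after=cursor)
        yield changes
        time.sleep(interval_s)
```

Because the cursor is just a timestamp, the consumer can crash, persist the cursor, and resume later; the temporal query replays exactly the changes it missed, which is what makes this approach cheap compared with keeping a connection open.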

We recently introduced a second possibility that allows push-based streaming for documents. Document streaming lets you open separate streams on multiple existing documents to get updates on them. This is only the first phase; sets (such as index matches or whole collections) are coming up. Streaming becomes cheaper if you want your data to be really live (<1s), which is excellent for UI redraws, but it could potentially also be used for CDC (probably in combination with the temporal features if you need to restart streams). Both push-based and pull-based are strongly ordered.

I describe how to get the query for the pull-based approach here: https://forums.fauna.com/t/example-custom-subscription-funct... And a blog on the streaming API can be found here: https://fauna.com/blog/live-ui-updates-with-faunas-real-time...

Once set streaming (next to document streaming) is out as well, we'll have two strong solutions and you can choose what suits you best and is most efficient for your use case. Do you want instant UI redraws? Use push-based. Are you pulling a changeset every hour? Use pull-based.

databrecht | 5 years ago | on: Comparing Fauna and DynamoDB: Architecture and Pricing

> Why do this? One possibility is that the ways that Fauna actually is better than DynamoDB are too subtle to get anyone's attention. They're real and useful, but not ridiculous. The people who actually use DynamoDB at massive scale might understand them, but also probably won't want to change up.

I respectfully disagree :). I don't think that the combination of relations, strong consistency, flexible/powerful indexing, and a language that allows you to do complex conditional transactions or reads in one query amounts to subtle differentiators. Especially when you can keep all of that while being multi-region and scalable (and you also get a flexible security system, queries back in time, the ability to query/alter history, and cheap changesets). Of course, this post didn't go in depth on all of these, since that's not its topic.

Many databases have limitations on the former and present workarounds that require you either to do a lot of work or to build something in such an inflexible, hard-coded way that it would be very hard to change. The mere fact that they present workarounds (which is essentially what a single-table design is, to me) indicates that there is a need for their users to work around something.

> So you go after people who aren't using DynamoDB at massive scale. Say, early-stage startup founders who want to be on DynamoDB from day 1 because someday their product will be Web Scale. But don't have a lot of time to carefully evaluate claims like this. They just say "10x cost reduction? Wow, Fauna is the new best DB!" Most of these guys fail, but a few of them are a runaway success (and would have been equally so if they'd used DynamoDB), are now stuck with Fauna whether they like it or not (but let's assume they like it as least as well as DynamoDB, maybe even slightly more), and are now listed as large scale users of Fauna on their website. You too could be a unicorn startup! Start using Fauna today!

I think you just described the life of a developer selecting <insert random new technology>. Technological advances are accelerating, and we don't have enough time to research them all, so we skim the posts/docs and look at what other companies have done. I understand what you mean, and it's an everyday source of frustration for me as well that many chase new technologies based on one article. That's how many startups ended up with microservices they didn't need, and how a new SPA technology takes the world by storm every two years.

> Basically, I think the makers of Fauna are trying to con you with this article. It's not that their product is bad, it's that they're trying to get you to buy it for reasons other than that it's good.

The last sentence is quite unfair imho, though. Fauna is one of the databases that tries to be correct in its messaging and respects other products deeply. In my personal opinion, Dynamo and Fauna are very different products with a different focus. Dynamo focuses on use cases where you need scalability and sheer speed and are less interested in relations, many access patterns, or consistency over many collections. At the same time, they appeal to people who do need those features by presenting workarounds. Maybe someone from a relational background sees these workarounds, doesn't think it through, and then gets stuck in their inflexibility? Is Dynamo to blame? No, they are just helping their users with questions that often come back.

Similarly, "how is Fauna different from Dynamo" and, more importantly for this article, "how does pricing compare" are questions that often come back here. They are hard to answer, since everything depends on how you use the products and on many subtleties that are not visible at first glance. Do you need relations? A single-table approach would help, but it will also blow up your table with redundant data and therefore increase the price; although Dynamo looked cheap at first, it depends on the use case. If you do not research a product thoroughly, chances are you will run into a wall and be stuck with it, no matter whether the product is Dynamo, Fauna, Spanner, Firebase, etc. All we can do is provide as much detail as possible on what we can and can't do, and I think the Fauna docs and forums do quite a good job of that.

I am a developer advocate at Fauna; this reply is, however, entirely my personal opinion.

databrecht | 5 years ago | on: Comparing Fauna and DynamoDB: Architecture and Pricing

At the bottom of this page you can see the regions (and future regions): https://fauna.com/features. Since Fauna is inspired by Calvin, we are not dependent on clocks the way Spanner is to deliver global consistency, and we can run on any hardware. Currently each database is automatically distributed, so your lambdas would read from the closest location. We are working on region selection in case you want to avoid the (arguably small, thanks to the algorithm) latency overhead of multi-region.

Latencies can be found here: https://status.fauna.com/. As a long-time fan of Fauna (who now works as a dev advocate for them after following them for two years), what was most attractive to me is the combination of features without compromises: scalability/distribution without losing consistency, relations, and powerful indexing (the best of NoSQL and traditional databases combined). Personally, I was also attracted by the temporality aspect.

databrecht | 5 years ago | on: Please stop calling databases CP or AP (2015)

At the same time it's mis-education through over-simplification IMHO, although it was never intended like that. Availability is never 100%, so everything is far more subtle. I'd prefer that teaching focus on resolution strategies and compromises, and on exactly what goes wrong in each situation.

databrecht | 5 years ago | on: Supabase (YC S20) – An open source Firebase alternative

I'm wondering how FaunaDB's GraphQL would suit you? It's basically "drop in a schema, get an API". You can't do everything you'll ever need yet, but you can go quite far, and the implementation of each query is efficient: every GraphQL query is one query and thus one transaction, backed by indexes that are generated automatically. We are also quite suitable for GraphQL-like queries, because those are a bit like tree-walking queries, for which the 'index-free adjacency' concept in graph databases is very suitable, and FaunaDB's references are quite similar. Although we're not a graph database, we don't have problems with huge graph-like joins because of that. Besides that, the composability of our native FQL language made it ridiculously easy to generate FQL queries from GraphQL queries.
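The tree-walking query style above can be sketched in plain Python (a toy model, not Fauna's implementation): each nested level of a GraphQL-like query is resolved by following references, with the in-memory dicts standing in for indexed lookups.

```python
# Toy "collections": each document holds direct references (ids) to its
# children, the index-free-adjacency idea in miniature.
users = {1: {"name": "alice", "posts": [10, 11]}}
posts = {10: {"title": "hello", "comments": [100]},
         11: {"title": "again", "comments": []}}
comments = {100: {"text": "hi"}}

def resolve_user(user_id):
    """Walk the tree user -> posts -> comments, joining one level at a
    time instead of flattening everything into one big set."""
    u = users[user_id]
    return {
        "name": u["name"],
        "posts": [
            {"title": posts[p]["title"],
             "comments": [comments[c] for c in posts[p]["comments"]]}
            for p in u["posts"]
        ],
    }

result = resolve_user(1)
```

The result already has the nested shape a GraphQL response needs, and each level can be paginated independently, which is exactly what's awkward to express as one flat SQL join.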

If there is something missing in our GraphQL feature set, feedback is very welcome :).

databrecht | 5 years ago | on: Supabase (YC S20) – An open source Firebase alternative

Thanks, that's valuable feedback :). We are indeed currently providing a lot of building blocks that can solve a wide range of scenarios, but until now we haven't focused much on providing easy out-of-the-box combinations of those building blocks that solve a specific scenario.