BigchainDB – A scalable blockchain database

56 points| jashmenn | 8 years ago |github.com | reply

66 comments

[+] warent|8 years ago|reply
Why use Python rather than a statically-typed language? This thing looks like it will become a resource gobbling beast that runs very slowly
[+] trentmc|8 years ago|reply
Hi, it's Trent here, CTO at BigchainDB.

Summary: Python isn't the bottleneck yet, and if it becomes one, C will become the last 1%.

I've been working on production apps in Python since 2002, including ones doing large-scale compute running on 1000+ machines at once. How: 99% Python, 1% C. But the trick is, you only build the C part once you've worked out all the kinks and optimized the big-picture stuff elsewhere. Python is great not only for connecting things, but also for rapidly iterating on algorithms and building maintainable code.

The AI / ML community has discovered this too: Python is now the most popular language in that community. Despite the heavy compute. How: most of the popular libraries have efficient C (etc) implementations under the hood.

This is exactly the philosophy we've been following at BigchainDB, with success. Python to connect things, iterate quickly in improving algorithms, and ship maintainable code. We haven't got far enough to resort to building our own C libraries yet, though many 3rd party libraries we use are implemented in C.
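To make the "Python on top, C under the hood" point concrete, here is a minimal sketch using only the standard library. It is not BigchainDB code: the payload and the `crc32_pure` helper are made up for illustration. It computes the same CRC-32 checksum twice, once in a plain Python loop and once through `zlib.crc32`, which is backed by compiled C.

```python
import zlib


def crc32_pure(data: bytes) -> int:
    """Bit-by-bit CRC-32 in plain Python (illustrative helper, not from BigchainDB)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the reflected polynomial when the low bit is set.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# Hypothetical payload, only here to give both versions something to chew on.
payload = b"hypothetical transaction payload" * 1000

slow = crc32_pure(payload)   # interpreted Python loop
fast = zlib.crc32(payload)   # same checksum, computed in compiled C

print(slow == fast)
```

Both calls return the same value; timing them (for example with `timeit`) would show how much a C-backed call saves on a hot path, which is the case for leaving everything else in Python.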

[EDIT] Based on the comments below, I'll now mention here too: BigchainDB wraps MongoDB, which is written in C++. And, Python 3.5+ (which BDB uses) has gradual typing, which brings many benefits of static typing to Python.
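As a small illustration of the gradual typing mentioned in the edit, the sketch below annotates one function and leaves another untyped. The function names and data are hypothetical and not taken from the BigchainDB code base. Note that Python does not enforce these hints at runtime; the benefit comes from running an external checker such as mypy over the code.

```python
from typing import Dict, List, get_type_hints


def total_by_owner(transfers: List[Dict[str, int]]) -> Dict[str, int]:
    """Sum transferred amounts per owner (hypothetical helper)."""
    totals = {}  # type: Dict[str, int]
    for transfer in transfers:
        for owner, amount in transfer.items():
            totals[owner] = totals.get(owner, 0) + amount
    return totals


def untyped_helper(x):
    # Gradual typing: unannotated code keeps working next to annotated code.
    return x


hints = get_type_hints(total_by_owner)
result = total_by_owner([{"alice": 2}, {"alice": 3, "bob": 1}])
print(hints["return"], result)
```

A static checker would flag a call such as `total_by_owner("oops")` before the code runs, while `untyped_helper` is simply left unchecked.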

[+] snissn|8 years ago|reply
I'm unclear how this is any different from a centralized mongodb as a service platform. It seems that it doesn't offer any proof of work related security or consensus building other than a centralized "trust our cluster" policy [1].

[1] https://docs.bigchaindb.com/en/latest/bft.html

[+] trentmc|8 years ago|reply
Hi, it's Trent here, I'm CTO at BigchainDB.

BigchainDB targets giving the following benefits beyond traditional database as a service:

1. decentralized - no single entity controls it, which means it is tolerant to malicious / Byzantine faults. Benefit: groups that don't necessarily trust each other can share infrastructure.

2. immutable - which practically speaking means more tamper resistant, e.g. it's append only. Benefit: well-defined provenance for history of assets, data, etc.

3. assets - you can create and issue assets, where you own them if you have the private key. How: each tx is signed. Benefit: moving around assets on a substrate that no single entity owns or controls. Lower friction in exchanges. And the digital signatures give cryptographic proof of who did what.

These are the targets. We're not fully there yet. Most notably, we still need to address some Byzantine faults as our docs ([1] above) mention. This will come in an upcoming release. We are also working on improved scalability while maintaining the security guarantees [2].
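For readers wondering what "append only" and "tamper resistant" in point 2 can look like in practice, here is a generic hash-chain sketch. It is an assumption-laden illustration, not BigchainDB's actual storage format: the record layout, the helper names, and the sample assets are all invented. Each entry stores the hash of the previous one, so editing an old entry breaks verification.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> None:
    """Add an entry that is linked to the current tail of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev_hash, "payload": payload,
                "hash": entry_hash(prev_hash, payload)})


def verify(log: list) -> bool:
    """Recompute every hash and check each link to its predecessor."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(entry["prev"], entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, {"asset": "painting-1", "owner": "alice"})
append(log, {"asset": "painting-1", "owner": "bob"})
intact = verify(log)

# Rewriting history changes the payload but not the stored hash.
log[0]["payload"]["owner"] = "mallory"
still_ok = verify(log)

print(intact, still_ok)
```

On its own this only detects tampering; the decentralization in point 1 is what would stop a single operator from quietly recomputing the whole chain.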

Re consensus: BigchainDB has a two-layer consensus, as follows.

* The lower layer directly uses MongoDB's consensus (which it builds on) to agree on whether a transaction should be stored.

* The higher level has federation-style voting on whether a transaction is valid or not.

Our documentation describes this further.
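The higher voting layer could be sketched roughly as follows. This is only an illustration: the comment above says "federation-style voting" without giving a threshold, so the strict-majority rule, the function name, and the node names here are assumptions rather than BigchainDB's actual rule.

```python
from typing import Dict


def is_valid_by_vote(votes: Dict[str, bool], federation_size: int) -> bool:
    """Accept only if a strict majority of the whole federation voted 'valid'.

    Nodes that have not voted yet count neither for nor against, so a
    transaction stays unaccepted until enough 'valid' votes arrive.
    The majority threshold is an assumption made for this sketch.
    """
    yes_votes = sum(1 for vote in votes.values() if vote)
    return yes_votes > federation_size // 2


# Hypothetical five-node federation; two nodes have not voted yet.
votes = {"node-a": True, "node-b": True, "node-c": False}
early = is_valid_by_vote(votes, federation_size=5)

votes["node-d"] = True
later = is_valid_by_vote(votes, federation_size=5)

print(early, later)
```

With five nodes, two "valid" votes are not enough, and the third one tips the decision.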

BigchainDB is explicitly not trying to do Bitcoin-style proof of work. PoW solves for an additional problem: (theoretical) anonymity of servers. That additional goal compromises scale. And, in practice you know who's running the servers anyway (ie big Chinese Bitcoin miners), which is why I say "theoretical". BTW I am a fan of Bitcoin, it just has different goals than BigchainDB.

[2] https://blog.bigchaindb.com/bigchaindb-developer-update-2d32...

[+] misterdata|8 years ago|reply
Not sure whether BigchainDB implements consensus/PoW, but my attempt at building a blockchain-based database sure does. [1] Scaling a solution like that however is very difficult, and I can imagine BigchainDB aiming for applications where consensus/PoW is less needed.

[1] https://github.com/pixelspark/catena

[+] stuxnet79|8 years ago|reply
I find it very amusing that "scalable blockchain database" is the core selling point here. I think I must be experiencing buzzword fatigue. Why would I want to chuck my boring MySQL installation for this?
[+] trentmc|8 years ago|reply
Hi, it's Trent here, CTO of BigchainDB.

FYI we started working on the precursor to this in 2013 (ascribe). Bitcoin hadn't even hit the mainstream then, let alone blockchain. We didn't build it to mash together buzzwords. We built it because we saw a clear need for it. We had been building on Bitcoin, and it (a) didn't scale to meet our needs and (b) was super-hard to use because it didn't act like a database. So we built BigchainDB to address issues (a) and (b).

Obviously, good old "boring" MySQL is incredibly useful for tons of problems. If you're already solving a problem with MySQL, then BigchainDB is not a fit. Don't use it, stick with the thing that's working.

Where it is useful is applications that want at least one of the following benefits:

1. decentralized, so that >1 orgs can share resources

2. immutable / tamper resistance, for provenance of ownership of art, spare parts, food, etc

3. assets, so you can exchange digital goods more readily.

More details here: https://blog.bigchaindb.com/three-blockchain-benefits-ae3a2a...

[+] yahyaheee|8 years ago|reply
I’m curious about the choice of Python here. If you’re building a scalable, high-throughput data store, Python seems like an odd choice.
[+] trentmc|8 years ago|reply
See the comment "Why use Python rather than..." and my response to it. Cheers,
[+] k__|8 years ago|reply
I never understood how blockchains could scale.

A global list of hashes that is also append only? This just screams to blow up sooner or later.

[+] orthecreedence|8 years ago|reply
I know. I think if I read "scalable" and "blockchain" in the same sentence again I'll scream.

Once bitcoin can handle Visa's transaction volume (250M transactions/day) we can talk scale.

EDIT: after reading the comments more, it seems the project operates without PoW, which allows it to scale more. I'm curious, then, what differentiates it from something like CockroachDB. It seems to be an append-only, distributed database. How does the blockchain fit in?

[+] ronaldmannak|8 years ago|reply
Some types of data need to stay geographically within a certain area. For example, European PII data cannot be hosted on servers outside of the EU. That's easy to handle if you're using AWS, for example.

How can a company comply with these kinds of laws using a distributed system like BigchainDB? Is there some kind of geofencing possibility?

[+] trentmc|8 years ago|reply
Hi, it's Trent here, CTO of BigchainDB.

That's a great question. It's surprising how few people are aware of the current German data protection laws (where we're based) and the upcoming EU data protection laws aka GDPR.

There are a few ways to address the issue:

1. Don't store any PII on the database, rather only use it to link to data that's stored on-premise in many places. The database has permissioning, and therefore acts as (decentralized) access control logic. Have a TOS with proper legal teeth so that if a database user does store PII on the database, they are liable in the real world.

2. Run an instance of BigchainDB within a region, e.g. within Germany, and comply with the appropriate laws there. Let PII be on the database. But each node must then follow data protection guidelines, just as a single centralized entity would.

3. Force encryption of all PII, and pray.

(3) is really a non-option. I stated it because many people are saying "just encrypt". But the problem is quantum computing. In 5-15 years quantum computing will be sufficiently easy to access that any encrypted data that's publicly available can be decrypted. You might say "well let's migrate to quantum-tolerant crypto before then" but that doesn't stop a malicious actor from copying encrypted PII now. You might say "let's use quantum tolerant crypto now" but we've seen with most crypto algorithms that it takes years to harden them. Would you trust your PII with untested crypto algorithms? I wouldn't. In short: putting encrypted PII on public nets is a bad idea. Please, please don't do it.
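Option 1 above can be sketched as follows. This is a toy, in-memory illustration and not BigchainDB's API: the two stores, the function names, and the reader labels are all hypothetical. The shared database only holds an opaque reference plus who may read it, while the PII itself stays in storage the data owner controls.

```python
import uuid

on_premise_store = {}   # stands in for storage on the data owner's premises
shared_records = []     # stands in for the shared, append-only database


def register_person(pii: dict, allowed_readers: set) -> str:
    """Keep PII on-premise; publish only a random reference and a reader list."""
    ref = str(uuid.uuid4())          # random, carries no personal information
    on_premise_store[ref] = pii
    shared_records.append({"ref": ref,
                           "allowed_readers": sorted(allowed_readers)})
    return ref


def read_pii(ref: str, reader: str) -> dict:
    """Use the shared record as access-control logic before touching the PII."""
    record = next(r for r in shared_records if r["ref"] == ref)
    if reader not in record["allowed_readers"]:
        raise PermissionError(reader + " may not read " + ref)
    return on_premise_store[ref]


ref = register_person({"name": "Erika Mustermann"}, {"clinic-a"})
print(read_pii(ref, "clinic-a"))
print(shared_records)   # contains the reference, but no name
```

A side effect of this layout is that deleting the on-premise entry makes the PII unavailable even though the shared record itself is never rewritten.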

[+] foxhedgehog|8 years ago|reply
Just a suggestion, but it might be worth adding some technical questions to your FAQs, like "why would I want to use this over a SQL alternative?" and "what kind of applications can I build with this?"
[+] mrguyorama|8 years ago|reply
"Scaleable BlockChain" is redundant, isn't it? Same with "BlockChain Database"?

The abstract also seems to imply that this in fact IS NOT a blockchain?

[+] trentmc|8 years ago|reply
Hi, it's Trent here, CTO of BigchainDB.

The word "blockchain" is much misunderstood. There is a ton of argument over what it actually is. Just a linked list of hashes? Bitcoin and nothing else? I could go on and on.

To me, that debate is less interesting than building systems that actually work. For this, it's useful to think about compute stacks in the past, from mainframe to desktop, from web to cloud to mobile. In each, there are core building blocks that each have their own way of instantiating the elements of computing (storage, processing, communications). Yes, let's go back to first principles:)

Take cloud, on say AWS. Here are some blocks:

* Storage:blob storage -- S3

* Storage:database -- DynamoDB

* Processing -- EC2

* And so on.

The emerging decentralized stack [1] is no different. There is no single monolithic block called "blockchain" that magically does everything, though much of the rhetoric would have you believe that. Rather, there are emerging building blocks.

* Storage:blob storage -- IPFS + FileCoin (and more)

* Storage:database -- BigchainDB

* Storage:pure-play-token -- Bitcoin (this is specific to decentralized space)

* Processing:business logic (aka "smart contracts") -- Ethereum (and more)

"Blockchain" is best treated as a label for the space of decentralization. Side by side with other fields like "artificial intelligence" or "cloud computing".

I think we can all agree that DynamoDB is not "a cloud". It's just an implementation of the "database" building block for the "cloud computing" field. Similarly, BigchainDB is not "a blockchain". It's just an implementation of the "database" building block for the "blockchain" field.

[1] https://blog.bigchaindb.com/blockchain-infrastructure-landsc...

[+] mrep|8 years ago|reply
I would say they are polar opposites. Blockchains are incredibly inefficient since every node has to redo the computations. Adding more nodes does not scale them, and scaling by adding nodes is what I consider a core part of scalable systems.
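A toy model of the point above, under idealized assumptions (no network or coordination cost, evenly split load; the function name and numbers are made up): when every node re-executes every transaction, the work per node does not shrink as nodes are added, whereas in a partitioned system it does.

```python
def work_per_node(total_tx: int, nodes: int, fully_replicated: bool) -> float:
    """Transactions each node must process in this simplified model."""
    if fully_replicated:
        # Every node redoes all the work, so extra nodes add redundancy only.
        return float(total_tx)
    # Idealized partitioning: the load is split evenly across nodes.
    return total_tx / nodes


for n in (1, 10, 100):
    print(n,
          work_per_node(1_000_000, n, fully_replicated=True),
          work_per_node(1_000_000, n, fully_replicated=False))
```

Real systems sit somewhere between these two extremes, so this only illustrates the direction of the argument, not actual performance.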
[+] Devagamster|8 years ago|reply
Correct me if I'm wrong, but I'm pretty sure blockchains are not super scalable ATM. Bitcoin is limited to 6 transactions a second, and future changes are supposed to improve it to 30? That doesn't seem very scalable to me. I could very well be wrong about the details, but the point stands: blockchains aren't all that scalable.
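Putting the numbers quoted in this thread side by side, as a quick back-of-the-envelope calculation. The inputs are the commenters' figures (250M Visa transactions/day from an earlier comment, 6 and 30 transactions/second for Bitcoin from the comment above) and are not independently verified here; the result is a daily average, not a peak rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60

visa_tx_per_day = 250_000_000   # figure quoted earlier in the thread
bitcoin_tps_now = 6             # figure quoted in the comment above
bitcoin_tps_future = 30         # figure quoted in the comment above

visa_tps = visa_tx_per_day / SECONDS_PER_DAY
gap_now = visa_tps / bitcoin_tps_now
gap_future = visa_tps / bitcoin_tps_future

print(round(visa_tps), round(gap_now), round(gap_future))
```

On these figures Visa averages roughly 2,900 transactions per second, which is several hundred times the quoted current Bitcoin rate and still close to a hundred times the quoted future one.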
[+] beager|8 years ago|reply
What if I don’t need a scalable blockchain, but I need a tamper-proof persistent ledger? Are there tools that are lighter-weight that can help me there?
[+] trentmc|8 years ago|reply
Perhaps you're looking for an append-only logging / messaging system like Apache Kafka? Good to explore what's out there and understand what's possible.

BTW using BigchainDB can feel pretty lightweight: it feels like a DBaaS but you don't have to set up the back end, you just get going. In the following, you'll have a tx on the BigchainDB public net (IPDB) in seconds. And the JS or py code to do it yourself is right there too. https://www.bigchaindb.com/getstarted/