
Facebook open-sources LogDevice, a distributed store for sequential data

336 points | cedricvg | 7 years ago | logdevice.io | reply

118 comments

[+] akavel|7 years ago|reply
Can someone from FB chime in with some info on how much storage is needed for the logs/data? Say, for 1 GB of raw input logs from an HTTP server (nginx/apache), when stored in LogDevice would they take notably less space on disk (compression), or more (overhead)? This interests me for evaluating the resources/costs I'd need to prepare if I were to deploy it...
[+] cedricvg|7 years ago|reply
These numbers really depend on the compressibility of the content, the compression scheme, and the type of batching used. The metadata overhead is fairly minimal. LogDevice allows you to configure this at the client, sequencer, or RocksDB level.
[+] AhmedSoliman|7 years ago|reply
Happy to finally see LogDevice open. We have been working on this for years now.
[+] thinkersilver|7 years ago|reply
Hi Ahmed,

I imagine you looked at other solutions before starting this. A distributed log is a fairly simple idea to understand (hard to implement), but what pain point is being solved?

Seeing that it is written in C/C++, is it that LogDevice is optimised purely for speed and responsiveness?

[+] the_duke|7 years ago|reply
Can you give an overview of the differences from, e.g., Apache Kafka?

It seems very similar.

[+] Rafuino|7 years ago|reply
Very interesting! I hadn't heard of this before but I'd love to see it in action.

If anyone from the FB team or anyone using LogDevice wants to test performance with Optane SSDs (and compare to a NAND SSD), make a request by submitting an issue on our GitHub page: https://github.com/AccelerateWithOptane/lab/issues. I'll hook you up with a server hosted by Packet.

[+] fullmetaleng|7 years ago|reply
Martin Kleppmann seems to point out that technologies for problems with similar patterns already exist - https://twitter.com/martinkl/status/1039938408393662465
[+] tinco|7 years ago|reply
Those are streaming/pubsub services, though; this actually claims to be a store. I feel that's an important difference.

Do people just point their system journal at Kafka and wait for something to break?

At my previous job we built something similar to this out of RabbitMQ and MongoDB. I always wondered what the other big log companies used. MongoDB seemed like a pretty good fit, but a pure append-only database might be even better. Trimming performance in MongoDB was subpar, so we worked around it by creating a new collection for each day; trimming then became the simple operation of dropping a collection at the end of each day.
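
For anyone curious, the workaround looks roughly like this (a minimal sketch with pymongo; the collection naming scheme and the retention window are just placeholders, not what we actually ran):

    from datetime import datetime, timedelta

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["logs"]

    def collection_for(day: datetime):
        # One collection per day, e.g. "events_2018_09_12".
        return db[day.strftime("events_%Y_%m_%d")]

    def append_event(event: dict):
        # Writes always go to today's collection.
        collection_for(datetime.utcnow()).insert_one(event)

    def trim(retention_days: int = 7):
        # "Trimming" is just dropping whole collections past retention,
        # which is far cheaper than deleting documents one by one.
        cutoff = datetime.utcnow() - timedelta(days=retention_days)
        for name in db.list_collection_names():
            if name.startswith("events_"):
                day = datetime.strptime(name, "events_%Y_%m_%d")
                if day < cutoff:
                    db.drop_collection(name)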

[+] thinkersilver|7 years ago|reply
The use cases overlap neatly with Kafka's. Everything from its use of ZooKeeper to its time- and storage-based retention tuning is similar.

The announcement does not clarify the reason they use this over Kafka. Is it because Kafka doesn't scale to millions of logs on a single cluster, or is it because Kafka is not sympathetic to heterogeneous disk arrays containing SSDs and HDDs? I strongly suspect it may be latency of writes at scale, but this is pure speculation.

I don't know. If I understood why anyone might use this, I'd contribute to building language bindings for the APIs.

[+] sh00s|7 years ago|reply
Some strengths of LogDevice include:

- It's designed to work with a large number of logs (roughly equivalent to partitions in Kafka); hundreds of thousands per cluster is common.

- Sequencer failover is very quick; typical failover time when a sequencer node fails is less than a second.

- It supports location awareness and can place data according to the replication constraints specified, e.g. replicate in 3 copies across 2 different regions and 3 racks (see the toy sketch after this list).

- Because of non-deterministic data placement, it is very resilient to failures in terms of write availability.

- If a node/shard fails, it detects the failure and automatically rebuilds the data that was replicated to the failed nodes/shards.
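
To make the placement constraint concrete: the "3 copies across 2 regions and 3 racks" example can be read as a predicate over a candidate copyset. A toy illustration only, not LogDevice's actual API or config syntax:

    from typing import NamedTuple

    class Shard(NamedTuple):
        node: str
        rack: str
        region: str

    def satisfies_placement(copyset: list[Shard]) -> bool:
        # "Replicate in 3 copies, across at least 2 regions and 3 racks."
        return (
            len(copyset) == 3
            and len({s.region for s in copyset}) >= 2
            and len({s.rack for s in copyset}) >= 3
        )

    # Example: three copies on three racks spanning two regions.
    copyset = [
        Shard("node1", "rackA", "us-east"),
        Shard("node2", "rackB", "us-east"),
        Shard("node3", "rackC", "eu-west"),
    ]
    assert satisfies_placement(copyset)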

[+] otterley|7 years ago|reply
> Is it because Kafka doesn't scale to millions of logs on a single cluster

I doubt that's it, since Kafka can certainly do that.

[+] manigandham|7 years ago|reply
Great to see this released. Some architecture decisions similar to Apache Pulsar's as well, with the separation of compute (in this case the sequencer) from storage.

Kafka has done well so far, especially in making streaming systems more common, but it's about time for the next-gen systems.

[+] ashu|7 years ago|reply
How does LogDevice differ from Kafka?
[+] adev_|7 years ago|reply
Thanks for open-sourcing this; it looks like a great project.

Could a LogDevice developer give a bit of information about the scale it is used at inside Facebook?

- How many records can this thing ingest per day?

- Any limitations on the maximum number of storage nodes?

- What would be your maximum and advised record size for production usage?

- ZooKeeper seems to be the central piece used as the epoch provider. Did you encounter any scaling limitations or a maximum number of clients due to that?

[+] sandstrom|7 years ago|reply
Very interesting!

I like the idea of decoupling compute from storage for streaming/log data.

I wonder if it would be easy to make it run under Consul, instead of ZooKeeper.

[+] AhmedSoliman|7 years ago|reply
We use ZooKeeper primarily for the EpochStore. This is the abstraction you can replace if you want to swap out ZooKeeper. It shouldn't be that hard, as long as Consul offers the same guarantees as ZooKeeper.
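
Roughly speaking, the epoch store has to hand out per-log epoch numbers with compare-and-swap semantics, so that two sequencers can never both win the same epoch. A minimal sketch of the contract an alternative backend would have to meet (a hypothetical interface for illustration, not LogDevice's actual C++ API):

    from abc import ABC, abstractmethod

    class EpochStore(ABC):
        """Contract an alternative backend (e.g. Consul) would have to meet."""

        @abstractmethod
        def read_epoch(self, log_id: int) -> tuple[int, int]:
            """Return (epoch, version) for a log; version is used for CAS."""

        @abstractmethod
        def advance_epoch(self, log_id: int, expected_version: int) -> int:
            """Atomically bump the epoch only if the stored version still
            matches, so concurrent sequencer activations cannot both succeed."""
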
[+] remh|7 years ago|reply
Am I the only one puzzled by

"Scalable: Store up to a million logs on a single cluster."?

This sounds pretty confusing / low volume.

[+] manigandham|7 years ago|reply
logs = topics, so they mean 1M separate topics on a single cluster.
[+] tryptophan|7 years ago|reply
What benefit is there to Facebook from open-sourcing technology it has developed?
[+] jMyles|7 years ago|reply
I don't see anything about trust requirements or verification. Does LogDevice assume that all devices in my cluster are trusted?
[+] Annatar|7 years ago|reply
"bin/logdeviced"

All daemons and system administration utilities belong in sbin, because bin is for end-user applications.

Historically, the "s" in sbin meant something else, but it always contained applications and scripts only root could run.

When I see these examples, it's depressing to see just how much understanding of UNIX is missing.

[+] AhmedSoliman|7 years ago|reply
Maybe sending a PR would help?
[+] majidazimi|7 years ago|reply
An external log service is my favorite way of doing replication. It provides nice properties (rough sketch after the list). Specifically:

- Cross vendor replication which makes migration much easier.

- No dependency on vendor provided replication protocols.

- Ability to use in-app databases such as RocksDB, SQLite, ...

- Upgrading DB nodes becomes way easier since they are totally separated from each other.
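
The core of the pattern is small: every replica tails the same shared log and applies records to its local store in log order, so all replicas converge to the same state. A rough sketch, assuming sqlite3 and a hypothetical read_log(from_offset) client for whatever log service you use:

    import json
    import sqlite3

    # Local, embedded store; each replica keeps its own copy.
    db = sqlite3.connect("replica.db")
    db.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS progress (id INTEGER PRIMARY KEY CHECK (id = 1), offset INTEGER)")

    def apply(record: dict):
        # Deterministic application of a log record to the local store.
        if record["op"] == "put":
            db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (record["key"], record["value"]))
        elif record["op"] == "delete":
            db.execute("DELETE FROM kv WHERE key = ?", (record["key"],))

    def replay(read_log):
        # read_log(from_offset) is the hypothetical log-service client:
        # it yields (offset, payload) pairs in total order.
        row = db.execute("SELECT offset FROM progress WHERE id = 1").fetchone()
        start = row[0] + 1 if row else 0
        for offset, payload in read_log(start):
            apply(json.loads(payload))
            db.execute("INSERT OR REPLACE INTO progress VALUES (1, ?)", (offset,))
            db.commit()  # advance the store and the offset together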

[+] cardosof|7 years ago|reply
How does that fit into an ML training pipeline? (This is mentioned on the page.)
[+] manigandham|7 years ago|reply
It's just streaming data, but more scalable and with total ordering, which can be important for ML.
[+] senderista|7 years ago|reply
Sounds like it might have been influenced by the MSR CORFU project (separate sequencer, write striping). Can anyone confirm?
[+] noahdesu|7 years ago|reply
It's hard to deny that there is at least some influence there. Like LogDevice, the zlog project [0] is influenced by CORFU (separate sequencer, write striping), but both use different storage interfaces/strategies.

[0]: https://github.com/cruzdb/zlog
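
For anyone unfamiliar with the CORFU design: a central sequencer hands out the next position in the shared log, and in CORFU that position deterministically picks the stripe (storage node) holding the record. A toy illustration of that idea only, not how LogDevice places data (LogDevice uses non-deterministic placement):

    import itertools
    import threading

    NUM_STRIPES = 4  # storage nodes the log is striped across

    class Sequencer:
        """Single authority that hands out globally ordered log positions."""
        def __init__(self):
            self._counter = itertools.count()
            self._lock = threading.Lock()

        def next_position(self) -> int:
            with self._lock:
                return next(self._counter)

    def stripe_for(position: int) -> int:
        # CORFU-style deterministic striping: the position alone
        # decides which storage node holds the record.
        return position % NUM_STRIPES

    seq = Sequencer()
    for payload in ["a", "b", "c", "d", "e"]:
        pos = seq.next_position()
        print(f"record {payload!r} -> position {pos}, stripe {stripe_for(pos)}")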