
Facebook open-sources LogDevice, a distributed store for sequential data

336 points | cedricvg | 7 years ago | logdevice.io | reply

118 comments

[+] akavel|7 years ago|reply
Can someone from FB chime in with some info on how much storage is needed for the logs/data? Say, for 1 GB of raw input logs from an HTTP server (nginx/apache), when stored in LogDevice would they take notably less space on disk (compression), or more (overhead)? This interests me for evaluating the resources/costs I'd need to prepare if I were to deploy it...
[+] cedricvg|7 years ago|reply
These numbers really depend on the compressibility of the content, the compression scheme, and the type of batching used. The metadata overhead is fairly minimal. LogDevice allows you to configure this at the client, sequencer, or RocksDB level.
[+] AhmedSoliman|7 years ago|reply
Happy to finally see LogDevice open. We have been working on this for years now.
[+] thinkersilver|7 years ago|reply
Hi Ahmed,

I imagine you looked at other solutions before starting this. A distributed log is a fairly simple idea to understand (hard to implement), but what pain point is being solved?

Seeing that it is written in C/C++, is it that LogDevice is optimised purely for speed and responsiveness?

[+] the_duke|7 years ago|reply
Can you give an overview of the differences from, e.g., Apache Kafka?

It seems very similar.

[+] Rafuino|7 years ago|reply
Very interesting! I hadn't heard of this before but I'd love to see it in action.

If anyone from the FB team or anyone using LogDevice wants to test performance with Optane SSDs (and compare to a NAND SSD), make a request by submitting an issue on our GitHub page: https://github.com/AccelerateWithOptane/lab/issues. I'll hook you up with a server hosted by Packet.

[+] fullmetaleng|7 years ago|reply
Martin Kleppmann seems to point out that technologies for problems with similar patterns already exist - https://twitter.com/martinkl/status/1039938408393662465
[+] tinco|7 years ago|reply
Those are streaming/pubsub services, though; this actually claims to be a store. I feel that's an important difference.

Do people just point their system journal at Kafka and wait for something to break?

At my previous job we built something similar to this out of RabbitMQ and MongoDB. I always wondered what the other big log companies used. MongoDB seemed like a pretty good fit, but a pure append-only database might be even better. Trimming performance in MongoDB was subpar, so we worked around it by creating a new collection for each day; trimming then became the simple operation of dropping a collection at the end of each day.
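
For anyone curious, the workaround looks roughly like this (a minimal sketch with pymongo; the collection naming scheme and the retention window are just placeholders, not what we actually ran):

    from datetime import datetime, timedelta

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["logs"]

    def collection_for(day: datetime):
        # One collection per day, e.g. "events_2018_09_12".
        return db[day.strftime("events_%Y_%m_%d")]

    def append_event(event: dict):
        # Writes always go to today's collection.
        collection_for(datetime.utcnow()).insert_one(event)

    def trim(retention_days: int = 7):
        # "Trimming" is just dropping whole collections past retention,
        # which is far cheaper than deleting documents one by one.
        cutoff = datetime.utcnow() - timedelta(days=retention_days)
        for name in db.list_collection_names():
            if name.startswith("events_"):
                day = datetime.strptime(name, "events_%Y_%m_%d")
                if day < cutoff:
                    db.drop_collection(name)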

[+] thinkersilver|7 years ago|reply
The use cases overlap neatly with Kafka's. Everything from its use of ZooKeeper to its time- and storage-based retention tuning is similar.

The announcement does not clarify the reason they use this over Kafka. Is it because Kafka doesn't scale to millions of logs on a single cluster, or is it because Kafka is not sympathetic to heterogeneous disk arrays containing SSDs and HDDs? I strongly suspect it may be latency of writes at scale, but this is pure speculation.

I don't know. If I understood why anyone might use this, I'd contribute to building language bindings for the APIs.

[+] sh00s|7 years ago|reply
Some strengths of LogDevice include:

- It's designed to work with a large number of logs (roughly equivalent to partitions in Kafka); hundreds of thousands per cluster is common.

- Sequencer failover is very quick; typical failover time when a sequencer node fails is less than a second.

- It supports location awareness and can place data according to the replication constraints specified, e.g. replicate in 3 copies across 2 different regions and 3 racks (see the toy sketch after this list).

- Because of non-deterministic data placement, it is very resilient to failures in terms of write availability.

- If a node/shard fails, it detects the failure and automatically rebuilds the data that was replicated to the failed nodes/shards.
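
To make the placement constraint concrete: the "3 copies across 2 regions and 3 racks" example can be read as a predicate over a candidate copyset. A toy illustration only, not LogDevice's actual API or config syntax:

    from typing import NamedTuple

    class Shard(NamedTuple):
        node: str
        rack: str
        region: str

    def satisfies_placement(copyset: list[Shard]) -> bool:
        # "Replicate in 3 copies, across at least 2 regions and 3 racks."
        return (
            len(copyset) == 3
            and len({s.region for s in copyset}) >= 2
            and len({s.rack for s in copyset}) >= 3
        )

    # Example: three copies on three racks spanning two regions.
    copyset = [
        Shard("node1", "rackA", "us-east"),
        Shard("node2", "rackB", "us-east"),
        Shard("node3", "rackC", "eu-west"),
    ]
    assert satisfies_placement(copyset)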

[+] otterley|7 years ago|reply
> Is it because Kafka doesn't scale to millions of logs on a single cluster

I doubt that's it, since Kafka can certainly do that.

[+] manigandham|7 years ago|reply
Great to see this released. Some architecture decisions similar to Apache Pulsar's as well, with the separation of compute (in this case the sequencer) from storage.

Kafka has done well so far, especially in making streaming systems more common, but it's about time for the next-gen systems.

[+] ashu|7 years ago|reply
How does LogDevice differ from Kafka?
[+] adev_|7 years ago|reply
Thanks for open-sourcing this; it looks like a great project.

Could a LogDevice developer give a bit of information about the scale it is used at inside Facebook?

- How many records can this thing ingest per day?

- Any limitations on the maximum number of storage nodes?

- What would be your maximum and advised record size for production usage?

- ZooKeeper seems to be the central piece used as the epoch provider. Did you encounter any scaling limitations or a maximum number of clients due to that?

[+] sandstrom|7 years ago|reply
Very interesting!

I like the idea of decoupling compute from storage for streaming/log data.

I wonder if it would be easy to make it run under Consul, instead of ZooKeeper.

[+] AhmedSoliman|7 years ago|reply
We use ZooKeeper primarily for the EpochStore. This is the abstraction you can replace if you want to swap out ZooKeeper. It shouldn't be that hard, as long as Consul offers the same guarantees as ZooKeeper.
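
Roughly speaking, the epoch store has to hand out per-log epoch numbers with compare-and-swap semantics, so that two sequencers can never both win the same epoch. A minimal sketch of the contract an alternative backend would have to meet (a hypothetical interface for illustration, not LogDevice's actual C++ API):

    from abc import ABC, abstractmethod

    class EpochStore(ABC):
        """Contract an alternative backend (e.g. Consul) would have to meet."""

        @abstractmethod
        def read_epoch(self, log_id: int) -> tuple[int, int]:
            """Return (epoch, version) for a log; version is used for CAS."""

        @abstractmethod
        def advance_epoch(self, log_id: int, expected_version: int) -> int:
            """Atomically bump the epoch only if the stored version still
            matches, so concurrent sequencer activations cannot both succeed."""
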
[+] remh|7 years ago|reply
Am I the only one puzzled by

"Scalable: Store up to a million logs on a single cluster."?

This sounds pretty confusing / low volume.

[+] manigandham|7 years ago|reply
logs = topics, so they mean 1M separate topics on a single cluster.
[+] tryptophan|7 years ago|reply
What benefit is there to Facebook from open-sourcing technology it has developed?
[+] jMyles|7 years ago|reply
I don't see anything about trust requirements or verification. Does LogDevice assume that all devices in my cluster are trusted?
[+] Annatar|7 years ago|reply
"bin/logdeviced"

All daemons and system administration utilities belong in sbin, because bin is for end-user applications.

Historically, the "s" in sbin meant something else, but it always contained applications and scripts only root could run.

When I see these examples, it's depressing to see just how much understanding of UNIX is missing.

[+] AhmedSoliman|7 years ago|reply
Maybe sending a PR would help?
[+] majidazimi|7 years ago|reply
An external log service is my favorite way of doing replication. It provides nice properties (rough sketch after the list). Specifically:

- Cross vendor replication which makes migration much easier.

- No dependency on vendor provided replication protocols.

- Ability to use in-app databases such as RocksDB, SQLite, ...

- Upgrading DB nodes becomes way easier since they are totally separated from each other.
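
The core of the pattern is small: every replica tails the same shared log and applies records to its local store in log order, so all replicas converge to the same state. A rough sketch, assuming sqlite3 and a hypothetical read_log(from_offset) client for whatever log service you use:

    import json
    import sqlite3

    # Local, embedded store; each replica keeps its own copy.
    db = sqlite3.connect("replica.db")
    db.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS progress (id INTEGER PRIMARY KEY CHECK (id = 1), offset INTEGER)")

    def apply(record: dict):
        # Deterministic application of a log record to the local store.
        if record["op"] == "put":
            db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (record["key"], record["value"]))
        elif record["op"] == "delete":
            db.execute("DELETE FROM kv WHERE key = ?", (record["key"],))

    def replay(read_log):
        # read_log(from_offset) is the hypothetical log-service client:
        # it yields (offset, payload) pairs in total order.
        row = db.execute("SELECT offset FROM progress WHERE id = 1").fetchone()
        start = row[0] + 1 if row else 0
        for offset, payload in read_log(start):
            apply(json.loads(payload))
            db.execute("INSERT OR REPLACE INTO progress VALUES (1, ?)", (offset,))
            db.commit()  # advance the store and the offset together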

[+] cardosof|7 years ago|reply
How does that fit into an ML training pipeline? (This is mentioned on the page.)
[+] manigandham|7 years ago|reply
It's just streaming data, but more scalable and with total ordering, which can be important for ML.
[+] senderista|7 years ago|reply
Sounds like it might have been influenced by the MSR CORFU project (separate sequencer, write striping). Can anyone confirm?
[+] noahdesu|7 years ago|reply
It's hard to deny that there is at least some influence there. Like LogDevice, the zlog project [0] is influenced by CORFU (separate sequencer, write striping), but both use different storage interfaces/strategies.

[0]: https://github.com/cruzdb/zlog
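
For anyone unfamiliar with the CORFU design: a central sequencer hands out the next position in the shared log, and in CORFU that position deterministically picks the stripe (storage node) holding the record. A toy illustration of that idea only, not how LogDevice places data (LogDevice uses non-deterministic placement):

    import itertools
    import threading

    NUM_STRIPES = 4  # storage nodes the log is striped across

    class Sequencer:
        """Single authority that hands out globally ordered log positions."""
        def __init__(self):
            self._counter = itertools.count()
            self._lock = threading.Lock()

        def next_position(self) -> int:
            with self._lock:
                return next(self._counter)

    def stripe_for(position: int) -> int:
        # CORFU-style deterministic striping: the position alone
        # decides which storage node holds the record.
        return position % NUM_STRIPES

    seq = Sequencer()
    for payload in ["a", "b", "c", "d", "e"]:
        pos = seq.next_position()
        print(f"record {payload!r} -> position {pos}, stripe {stripe_for(pos)}")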