Can someone from FB chime in with some info on how much storage is needed for the logs/data? Say, for 1 GB of raw input logs from an HTTP server (nginx/apache), when stored in LogDevice would they take notably less space on disk (compression), or more (overhead)? This interests me for evaluating the resources/costs I'd need to prepare if I were to deploy it...
These numbers really depend on the compressibility of the content, the compression scheme, and the type of batching used. The metadata overhead is fairly minimal. LogDevice allows you to configure this at the client, sequencer, or RocksDB level.
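For a rough feel, the estimate can be sketched as compress-then-replicate arithmetic; the compression ratio, replication factor, and overhead below are illustrative assumptions, not measured LogDevice numbers:

```python
# Back-of-envelope storage estimate for 1 GB/day of raw HTTP logs.
# All numbers here are illustrative assumptions, not LogDevice measurements.

def estimate_stored_bytes(raw_bytes, compression_ratio, replication_factor,
                          metadata_overhead=0.02):
    """Cluster-wide on-disk footprint: compress first, then replicate,
    plus a small fractional metadata overhead per record."""
    compressed = raw_bytes / compression_ratio
    return compressed * replication_factor * (1 + metadata_overhead)

raw = 1 * 1024**3                   # 1 GB of nginx/apache access logs
# Plain-text access logs often compress well; assume 6x here.
stored = estimate_stored_bytes(raw, compression_ratio=6, replication_factor=3)
print(f"{stored / 1024**2:.0f} MiB across the cluster")
```

With those assumptions, replication roughly cancels out compression, so 1 GB of raw input lands around half a gigabyte cluster-wide; the real answer swings with the content's compressibility.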
I imagine you looked at other solutions before starting this. A distributed log is a fairly simple idea to understand (hard to implement) but what pain point is being solved?
Seeing that it is written in C/C++ - is it that LogDevice is optimised purely for speed and responsiveness?
Very interesting! I hadn't heard of this before but I'd love to see it in action.
If anyone from the FB team or anyone using LogDevice wants to test performance with Optane SSDs (and compare to a NAND SSD), make a request by submitting an issue on our GitHub page: https://github.com/AccelerateWithOptane/lab/issues. I'll hook you up with a server hosted by Packet.
Those are streaming/pubsub services though, this actually claims to be a store. I feel that's an important difference.
Do people just point their system journal at Kafka and wait for something to break?
At my previous job we built something similar to this out of RabbitMQ and MongoDB. I always wondered what the other big log companies used. MongoDB seemed like a pretty good fit, but a pure append-only database might be even better. Trimming performance in MongoDB was subpar, so we worked around it by creating a new collection for each day; trimming became a simple operation of dropping a collection at the end of each day.
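That rotation trick is easy to sketch. The helper below is a hypothetical pure-Python version (the `logs_YYYY_MM_DD` naming and the retention window are invented for illustration), with the actual drop left to the MongoDB driver:

```python
from datetime import date, timedelta

def collection_for(day):
    """Name of the per-day collection, e.g. logs_2018_09_12."""
    return f"logs_{day:%Y_%m_%d}"

def collections_to_drop(existing, today, retention_days=7):
    """Collections older than the retention window. Dropping one of
    these is a single O(1) operation, versus deleting documents
    one by one from a shared collection."""
    keep = {collection_for(today - timedelta(days=i))
            for i in range(retention_days)}
    return sorted(c for c in existing
                  if c.startswith("logs_") and c not in keep)

# With a real driver the nightly job would be roughly:
#   for name in collections_to_drop(db.list_collection_names(), date.today()):
#       db.drop_collection(name)
```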
I had just stumbled across https://github.com/facebookincubator/python-nubia and am anxious to try it out. Was wondering about the internal project it was factored out from. This appears to be it.
The use cases overlap neatly with Kafka's. Everything from its usage of ZooKeeper to its time-and-storage-based retention tuning is similar.
The announcement does not clarify why they use this over Kafka. Is it because Kafka doesn't scale to millions of logs on a single cluster, or because Kafka is not sympathetic to heterogeneous disk arrays containing SSDs and HDDs? I strongly suspect it may be latency of writes at scale, but this is pure speculation.
I don't know. If I understood why anyone might use this, I'd contribute to building language bindings for the APIs.
- It's designed to work with a large number of logs (roughly equivalent to partitions in Kafka), hundreds of thousands per cluster is common.
- Sequencer failover is very quick, typical failover time when a sequencer node fails is less than a second.
- It supports location awareness and can place data according to replication constraints specified (e.g. replicate it in 3 copies across 2 different regions and 3 racks).
- Because of non-deterministic data placement, it is very resilient to failures in terms of write availability.
- If a node/shard fails, it detects the failure and rebuilds the data that was replicated to the failed nodes/shards automatically.
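The replication constraint in the example above ("3 copies across 2 regions and 3 racks") can be illustrated with a small checker. The node shape and function below are invented for illustration, not LogDevice's actual placement code:

```python
def satisfies_constraints(copyset, min_copies=3, min_regions=2, min_racks=3):
    """Check that a candidate copyset spans enough failure domains.
    Each node is a dict with 'region' and 'rack' labels."""
    regions = {n["region"] for n in copyset}
    racks = {(n["region"], n["rack"]) for n in copyset}
    return (len(copyset) >= min_copies and
            len(regions) >= min_regions and
            len(racks) >= min_racks)

nodes = [{"region": "us-east", "rack": "r1"},
         {"region": "us-east", "rack": "r2"},
         {"region": "us-west", "rack": "r1"}]
print(satisfies_constraints(nodes))   # -> True: 2 regions, 3 distinct racks
```

Because any copyset meeting the constraints is acceptable, placement is non-deterministic, which is exactly what keeps writes available when individual nodes fail.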
Great to see this released. There are some similar architecture decisions to Apache Pulsar as well, with the separation of compute (in this case the sequencer) from storage.
Kafka has done well so far, especially in making streaming systems more common, but it's about time for the next-gen systems.
Thanks to open source for that, it looks like a great project.
Could someone from LogDevice give a bit of information about the scale at which they use this at Facebook?
- How many records can this thing ingest per day?
- Any limitations on the maximum number of storage nodes?
- What would be your maximum and advised record size for production usage?
- ZooKeeper seems to be the central point used as the epoch provider. Did you encounter any scaling limitations or a maximum number of clients due to that?
We use ZooKeeper primarily for the EpochStore. This is the abstraction you can use if you want to replace ZooKeeper. It shouldn't be that hard, as long as Consul offers the same guarantees as ZooKeeper.
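The guarantee in question is essentially an atomic compare-and-swap on a small piece of per-log state. A minimal in-memory sketch of such an interface (not LogDevice's real EpochStore API) might look like:

```python
import threading

class EpochStore:
    """Minimal epoch store sketch: any backend (ZooKeeper, Consul, etcd)
    would work as long as it offers an atomic compare-and-swap per log."""
    def __init__(self):
        self._lock = threading.Lock()
        self._epochs = {}          # log_id -> current epoch

    def read(self, log_id):
        with self._lock:
            return self._epochs.get(log_id, 0)

    def advance(self, log_id, expected):
        """CAS: bump the epoch only if nobody else did first.
        Returns the new epoch, or None if 'expected' is stale."""
        with self._lock:
            current = self._epochs.get(log_id, 0)
            if current != expected:
                return None        # lost the race; caller must re-read
            self._epochs[log_id] = current + 1
            return current + 1

store = EpochStore()
e = store.read("log1")
print(store.advance("log1", e))    # 1
print(store.advance("log1", e))    # stale -> None
```

The CAS is what lets a newly activated sequencer fence off a failed one: two sequencers racing to advance the same epoch cannot both succeed.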
It's hard to deny that there is at least some influence there. Like LogDevice, the zlog project [0] is influenced by CORFU (separate sequencer, write striping), but both use different storage interfaces / strategies.
akavel | 7 years ago
cedricvg | 7 years ago
SirMonkey | 7 years ago
I think it's fairly simple and might be enough. Can't comment on storage requirements though.
AhmedSoliman | 7 years ago
thinkersilver | 7 years ago
the_duke | 7 years ago
It seems very similar.
martincmartin | 7 years ago
Rafuino | 7 years ago
fullmetaleng | 7 years ago
tinco | 7 years ago
mmcclellan | 7 years ago
AhmedSoliman | 7 years ago
thinkersilver | 7 years ago
sh00s | 7 years ago
otterley | 7 years ago
I doubt that's it, since Kafka can certainly do that.
manigandham | 7 years ago
ashu | 7 years ago
posnet | 7 years ago
Rafuino | 7 years ago
StreamBright | 7 years ago
https://logdevice.io/docs/Concepts.html#consistency-guarante...
And it uses RocksDB under the hood:
https://logdevice.io/docs/Concepts.html#logsdb-the-local-log...
adev_ | 7 years ago
sandstrom | 7 years ago
I like the idea of decoupling compute from storage for streaming/log data.
I wonder if it would be easy to make it run under Consul, instead of ZooKeeper.
AhmedSoliman | 7 years ago
remh | 7 years ago
> Scalable. Store up to a million logs on a single cluster.
? This sounds pretty confusing / low volume.
manigandham | 7 years ago
tryptophan | 7 years ago
javiermaestro | 7 years ago
Previous discussion in HN: https://news.ycombinator.com/item?id=15142266
jMyles | 7 years ago
cedricvg | 7 years ago
Annatar | 7 years ago
All daemons and system administration utilities belong in sbin, because bin is for end-user applications.
Historically, the "s" in sbin meant something else, but it always contained applications and scripts only root could run.
When I see these examples, it's depressing to see just how much understanding of UNIX is missing.
AhmedSoliman | 7 years ago
majidazimi | 7 years ago
- Cross vendor replication which makes migration much easier.
- No dependency on vendor provided replication protocols.
- Ability to use in-app databases such as RocksDB, SQLite, ...
- Upgrading DB nodes becomes way easier since they are totally separated from each other.
cardosof | 7 years ago
manigandham | 7 years ago
senderista | 7 years ago
noahdesu | 7 years ago
[0]: https://github.com/cruzdb/zlog