tutfbhuf | 1 year ago
Ceph, with which I have a lot of experience, is a very solid and quite bulletproof storage solution that offers both an S3-compatible API and a file system. However, maintaining it in the long run is really challenging. You had better become a Ceph expert.
SeaweedFS struggles with managing large data groups. It's inspired by an outdated Facebook paper (Haystack) and is intended for storing and sharing large images. However, I think it's only average: it has poor documentation, underwhelming performance, and a confusing set of components to install. Its design lets each server process use one big file for storage, bypassing slow per-file metadata operations. It offers various access points through gateways.
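To make the "one big file" idea concrete, here is a minimal Python sketch of a Haystack-style volume: blobs are appended to a single file and an in-memory index maps each key to an (offset, size) pair, so a read is one seek instead of a per-file metadata lookup. This is only an illustration under my own assumptions, not SeaweedFS's actual on-disk format; the names `Volume`, `put`, and `get` are hypothetical.

```python
import os
import tempfile


class Volume:
    """Hypothetical append-only volume: many blobs in one file, index in memory."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> (offset, size)
        # "a+b" creates the file if needed; every write lands at the end.
        self._f = open(path, "a+b")

    def put(self, key, data):
        # Record where the blob starts, then append it to the big file.
        self._f.seek(0, os.SEEK_END)
        offset = self._f.tell()
        self._f.write(data)
        self._f.flush()
        self.index[key] = (offset, len(data))

    def get(self, key):
        # One seek plus one read; no directory or inode lookup per blob.
        offset, size = self.index[key]
        self._f.seek(offset)
        return self._f.read(size)

    def close(self):
        self._f.close()


if __name__ == "__main__":
    vol = Volume(os.path.join(tempfile.mkdtemp(), "volume.dat"))
    vol.put("cat.jpg", b"meow")
    vol.put("dog.jpg", b"woof!")
    print(vol.get("cat.jpg"), vol.index)
    vol.close()
```

A real system would also persist the index and handle deletes and compaction; the sketch leaves all of that out.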
MinIO has evolved a lot recently, making it hard to evaluate. MinIO relies on many small databases. Currently, it's phasing out some features, like the gateway, and mainly consists of two parts: a command line interface (CLI) and a server. While MinIO's setup is complex, SeaweedFS's setup is much simpler. MinIO also seems to be moving from an open-source model towards a more commercial one, but I have not closely followed this transition.
None of these solutions is simple enough to serve as the base for a distributed database application. What we really need is something like an Ext4 successor, let's call it Ext5, with native distributed storage capabilities in the most dead-simple way. ZFS is another good candidate. ZFS has already solved the problem of distributing storage across multiple hard drives within one server very well, but it still lacks a good way to distribute storage across hard drives on different servers connected via a network.
Yes, I know there is the CAP theorem, so it is a really hard problem to solve, but I think we can do better in terms of self-hosted solutions.
dikei|1 year ago
Are you sure you are not talking in reverse?
I find MinIO's single-binary deployment very easy, and you also complained about SeaweedFS's complexity in the previous paragraph.
moritzruth|1 year ago
klabb3|1 year ago
Yes, but S3 is basically a standardized protocol at this point. There are many alternatives, both open and commercial, like Cloudflare R2 (no egress fees). So depending on the reason for self-hosting (such as preventing lock-in), S3 might be the least important thing to actually move away from. It's way more difficult to migrate away from, e.g., a proprietary DB, sometimes by design.
giovannibonetti|1 year ago
[1] https://www.tigrisdata.com/ [2] https://github.com/tigrisdata-archive/tigris