top | item 34292666

trishume | 3 years ago

My friend mentioned this just before I published, and I think that's probably the fastest, largest thing you can get that would in some sense count as one machine. I haven't looked into it, but I wouldn't be surprised if they could get around the trickiest constraint, which is how many hard drives you can plug into a non-mainframe machine for historical image storage. Definitely more expensive than just networking a few standard machines, though.

I also bet that mainframes have software solutions to a lot of the multi-tenancy and fault tolerance challenges with running systems on one machine that I mention.

jiggawatts | 3 years ago

> which is how many hard drives you can plug in to a non-mainframe machine for historical image storage.

You would be surprised. First off, SSDs are denser than hard drives now if you're willing to spend $$$.

Second, "plug in" doesn't necessarily mean "in the chassis". You can expand storage with external disk arrays in all sorts of ways. Everything from external PCI-e cages to SAS disk arrays, fibre channel, NVMe-over-Ethernet, etc...

It's fairly easy to get several petabytes of fast storage directly managed by one box. The only limit is the total usable PCIe bandwidth of the CPUs, which for current-gen EPYC 9004-series processors in a dual-socket configuration is something crazy like 512 GB/s. This vastly exceeds typical NIC speeds: you'd have to balance available bandwidth between multiple 400 Gbps NICs and the disks to be able to saturate the system.
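A quick back-of-envelope check of those figures (assuming roughly 4 GB/s of usable bandwidth per PCIe 5.0 lane and about 128 usable lanes exposed in a dual-socket board; both are ballpark assumptions, not exact platform specs):

```python
# Rough sanity check of the "512 GB/s" PCIe bandwidth claim.
PCIE5_GBS_PER_LANE = 4      # ~usable GB/s per PCIe 5.0 lane (assumption)
USABLE_LANES = 128          # ~lanes exposed on a dual-socket board (assumption)

total_pcie_gbs = PCIE5_GBS_PER_LANE * USABLE_LANES   # 512 GB/s
nic_gbs = 400 / 8                                    # one 400 Gbps NIC = 50 GB/s

# How many 400 Gbps NICs it would take to saturate the PCIe fabric,
# before even counting the disks sharing the same lanes.
nics_to_saturate = total_pcie_gbs / nic_gbs

print(f"{total_pcie_gbs} GB/s PCIe vs {nic_gbs:.0f} GB/s per NIC "
      f"-> ~{nics_to_saturate:.0f} NICs to saturate")
```

So on these assumptions a single 400 Gbps NIC uses only about a tenth of the host's aggregate PCIe bandwidth, which is the point about NICs, not PCIe, being the bottleneck.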

People really overestimate the data volume put out by a service like Twitter while simultaneously underestimating the bandwidth capability of a single server.

ilyt | 3 years ago

> People really overestimate the data volume put out by a service like Twitter while simultaneously underestimating the bandwidth capability of a single server.

It's outright comical. Above we have people thinking that the number of TLS connections a single server can handle is somehow the problem, in a service where there would be hundreds of thousands of lines of code generating the content served over those connections, all while using numbers from what seems like 10+ year old server hardware.

trishume | 3 years ago

That's really cool! I estimate each year of historical images at 2.8 PB, so it would need to scale quite far to handle multiple years. How would you actually connect all those external drive chassis? Is there some kind of chainable SAS or PCIe that can scale arbitrarily far? I consider NVMe-over-fabrics to be cheating, since it's just using multiple machines and calling them one machine, but "one machine" is kind of an arbitrary stunt metric anyway.
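The 2.8 PB/year estimate above translates into a surprisingly small drive count. A minimal sketch, assuming ~30 TB per high-capacity drive (an illustrative figure, not a claim about any specific product):

```python
# Drives needed to hold N years of images at the parent's 2.8 PB/year estimate.
PB_PER_YEAR = 2.8
DRIVE_TB = 30                           # assumed per-drive capacity (illustrative)

for years in (1, 3, 5):
    total_tb = PB_PER_YEAR * years * 1000
    drives = -(-total_tb // DRIVE_TB)   # ceiling division
    print(f"{years} yr: {total_tb:.0f} TB -> {drives:.0f} drives")
```

Even five years of images fits in a few hundred drives, i.e. a handful of external disk shelves, before accounting for replication or RAID overhead.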

sayrer | 3 years ago

It's a neat thought exercise, but wrong for so many reasons (there are probably like 100s). Some jump out: spam/abuse detection, ad relevance, open graph web previews, promoted tweets that don't appear in author timelines, blocks/mutes, etc. This program is what people think Twitter is, but there's a lot more to it.

I think every big internet service uses user-space networking where required, so that part isn't new.

trishume | 3 years ago

I think I'm pretty careful to say that this is a simplified version of Twitter. Of the features you list:

- spam detection: I agree this is a reasonably core feature and a good point. I think you could fit something here but you'd have to architect your entire spam detection approach around being able to fit, which is a pretty tricky constraint and probably would make it perform worse than a less constrained solution. Similar to ML timelines.

- ad relevance: Not a core feature if your costs are low enough. But see the ML estimates for how much throughput A100s have at dot producting ML embeddings.

- web previews: I'd do this by making it the client's responsibility. You'd lose trustworthiness, though: users with hacked clients could make troll web previews. They can already do that for a site they control, but not for a general site.

- blocks/mutes: Not a concern for the main timeline except when using ML; looking at replies would need to fetch blocks/mutes and filter. Whether this costs too much depends on how frequently people look at replies.
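The ad-relevance point above leans on GPU dot-product throughput. A rough sketch of why that's plausible, using the A100's published FP16 tensor-core peak of ~312 TFLOPS (the embedding dimension is an assumed illustration, and real utilization is only a fraction of peak):

```python
# Order-of-magnitude count of embedding dot products an A100 could do per second.
A100_TFLOPS = 312            # FP16 tensor-core peak from the published spec
EMBED_DIM = 256              # assumed embedding size, for illustration only

flops_per_dot = 2 * EMBED_DIM                     # one multiply + one add per element
dots_per_sec = A100_TFLOPS * 1e12 / flops_per_dot

print(f"~{dots_per_sec:.1e} dot products/sec at theoretical peak")
```

Even at a few percent of peak, that's trillions of candidate-scoring operations per second on a single card, which is why relevance scoring isn't the bottleneck in these estimates.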

I'm fully aware that real Twitter has bajillions of features that I don't investigate, and you couldn't fit all of them on one machine. Many of them make up such a small fraction of load that you could still fit them. Others do indeed pose challenges, but ones similar to features I'd already discussed.

mschuster91 | 3 years ago

> I haven't looked into it, but I wouldn't be surprised if they could get around the trickiest constraint, which is how many hard drives you can plug in to a non-mainframe machine for historical image storage.

NetApp is at something like 300+ TB of storage per node IIRC, but in any case it would make more sense to use some cloud service. AWS EFS and S3 don't have any (practically reachable) limit on size.

threeseed | 3 years ago

Have you actually used EFS/S3 before ?

Because both are ridiculously slow, to the point where they would be completely unusable for a service such as Twitter, whose current latency depends on everything largely being in memory.

And Twitter already evaluated using the cloud for their core services and it was cost-prohibitive compared to on-premise.

toast0 | 3 years ago

> I wouldn't be surprised if they could get around the trickiest constraint, which is how many hard drives you can plug in to a non-mainframe machine for historical image storage.

Some commodity machines use external SAS to connect to more disk boxes. IMHO, there's no real reason to keep images and tweets on the same server if you're going to need an external disk box anyway. Rather than getting a 4U server with a lot of disks plus an additional 4U disk box, you may as well get two 4U servers with a lot of disks each, and use one for tweets and the other for images. Anyway, images are fairly easy to scale horizontally; there's not much simplicity gained by having them all on one host, like there is for tweets.

trishume | 3 years ago

Yeah, like I say in the post, the exactly-one-machine thing is just for fun and an illustration of how far vertical scaling can go; practically, I'd definitely scale storage with many sharded smaller storage servers.

jasonhansel | 3 years ago

Incidentally, a lot of people have argued that the massive datacenters used by e.g. AWS are effectively single large ("warehouse-scale") computers. In a way, it seems that the mainframe has been reinvented.

sterlind | 3 years ago

to me the line between machine and cluster is mostly about real-time and fate-sharing. multiple cores on a single machine can expect memory accesses to succeed, caches to be coherent, interrupts to trigger within a deadline, clocks not to skew, cores in a CPU not to drop out, etc.

in a cluster, communication isn't real-time. packets drop, fetches fail, clocks skew, machines reboot.

IPC is a gray area. the remote process might die, its threads might be preempted, etc. RTOSes make IPC work more like a single machine, while regular OSes make IPC more like a network call.

so to me, the datacenter-as-mainframe idea falls apart because you need massive amounts of software infrastructure to treat a cluster like a mainframe. you have to use Paxos or Raft for serializing operations, you have to shard data and handle failures, etc. etc.

but it's definitely getting closer, thanks to lots of distributed systems engineering.

dekhn | 3 years ago

I wouldn't really agree with this since those machines don't share address spaces or directly attached busses. Better to say it's a warehouse-scale "service" provided by many machines which are aggregated in various ways.

hinkley | 3 years ago

Oxide is basically building a minicomputer.