Interested to know what Rust was missing. I built an ad exchange last year and it has been great. I have been using nightly builds, mostly for access to async/await, and it has been very fast and stable.
I have had to submit a few pull requests to various projects along the way, but didn't find the ecosystem prohibitively lacking.
TechEmpower's Plaintext scenario is currently capped around 7M RPS by network limits, even though it uses a 10Gb NIC. Knowing that the Plaintext scenario is a very simple HTTP request (standard headers) that returns "Hello World!", how close to network saturation are you at 5M RPS in this case with only "2 Gigabit Ethernet cards"?
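As a hedged back-of-envelope check on that question (the ~140-byte response size is an assumption, not a figure from the article): even a minimal "Hello World!" HTTP response implies several gigabits per second at 5M RPS:

```python
# Back-of-envelope: bandwidth needed to serve 5M minimal HTTP responses/sec.
# The 140-byte response size is an assumed figure for a status line,
# standard headers, and the "Hello World!" body; the real payload may differ.
RESPONSE_BYTES = 140          # assumption, not from the article
RPS = 5_000_000               # requests per second discussed in the thread

bits_per_second = RESPONSE_BYTES * RPS * 8
gbps = bits_per_second / 1e9
print(f"~{gbps:.1f} Gb/s of response traffic alone")  # ~5.6 Gb/s
```

If "2 Gigabit Ethernet cards" means roughly 2 Gb/s of combined capacity, that is well below what 5M full HTTP responses per second would need, which is exactly the tension the question is probing.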
Did you consider Vert.x? It's built on Netty and has its own Linux epoll driver, async support, and fiber support. It's impossible to know if it would be faster, but it would likely be comparable and way less work than rolling your own.
In the TechEmpower benchmarks it exceeds 2 million HTTP requests/second, and it's a full REST framework.
And if you use the fiber support through Quasar you can pretend most things are normal blocking code.
I spent a few cycles in media buying and later in sell-side ad tech. Say what you will about advertising and its effects on the web, but I will say this: it is a world of fascinating tech. As a buyer I ran into janky pacing all the time across various platforms, because this is a HARD problem. We had to adjust campaigns manually on a daily basis to keep pacing on track. It was common to stop a campaign and still overspend by hundreds of dollars while all of the caching spun down.
I'm fascinated to see they are running all that on a single node. It's a massive amount of state aggregated from billions of events that needs to be served at extremely low latency, but couldn't it be partitioned somehow??? Google Fi/Spanner and BigTable have certainly been developed to support these issues. I've been trying to dig up what infrastructure powers Google AdX, but I haven't found anything. AdWords seems to be tied to Spanner, but AdX is/was an entirely different beast. In any case I'm quite certain that it isn't running pacing on a single, gigantic node.
As an anecdotal data point, I once configured a test campaign on DoubleClick Bid Manager (now Google DV360) about two years ago that I needed some quick exposure on. So I set a budget cap of $100 just for safety and didn't do any targeting, so I was effectively bidding on half the world's ad inventory. What I didn't check or notice was that pacing wasn't set to Even, but to Flight ASAP.
Suffice it to say, I spent $730 within _seconds_, so fast that Google's systems couldn't even switch off quickly enough to prevent a 7.3x overspend, and the only thing that saved stupid me from a five-digit spend was probably choosing an unusual ad size.
> It's a massive amount of state aggregated from billions of events that needs to be served at extremely low latency, but couldn't it be partitioned somehow???
The bidder/pacer state is not necessarily massive, and certainly it does not consist of all the gazillions of past events. Depending on the strategy/bidding model, it can range from a few MB to several GBs, something that can fit in a beefy node.
> Google Fi/Spanner and BigTable have certainly been developed to support these issues.
I doubt any external store can be used under such low latency constraints (2-10 ms) and high throughput (millions of RPS). Perhaps Aerospike, but even that is a stretch to put in the hot path. At this scale you're pretty much limited to keeping the state in memory and updating it asynchronously every couple of minutes/hours.
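The pattern described here (hot-path reads from local memory, with the external store consulted only by a periodic asynchronous refresh) can be sketched roughly as follows; the class, names, and refresh interval are illustrative, not from the article:

```python
import threading
import time

class PacingState:
    """In-memory bidder/pacer state: hot-path reads never touch an external store."""

    def __init__(self, refresh_seconds=120):
        self._state = {}                 # e.g. campaign_id -> remaining budget
        self._lock = threading.Lock()
        self._refresh_seconds = refresh_seconds

    def budget(self, campaign_id):
        # Hot path: a lock and a dict lookup -- microseconds, not a network RTT.
        with self._lock:
            return self._state.get(campaign_id, 0.0)

    def refresh(self, snapshot):
        # Cold path: swap in a fresh snapshot built elsewhere (e.g. by an
        # aggregation pipeline). Runs off the request path.
        with self._lock:
            self._state = dict(snapshot)

    def start_background_refresh(self, load_snapshot):
        # load_snapshot is a hypothetical callable that fetches the latest
        # aggregated state (from a warehouse, Aerospike, etc.).
        def loop():
            while True:
                self.refresh(load_snapshot())
                time.sleep(self._refresh_seconds)
        threading.Thread(target=loop, daemon=True).start()
```

The trade-off is exactly the one the comment names: requests see state that may be minutes stale, in exchange for keeping any external store entirely out of the hot path.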
Why, they wasted months on evaluations, with an obsession for statically typed languages that have so far not produced anything more quickly, or anything markedly better, than what others are producing with less pedantic languages.
I work with Wael. Development is still ongoing. One implementation uses Golang, the other uses F# with a library that wraps libuv for faster network performance. Pony was used to write the stress-testing client for both implementations.
> "I didn't want to rewrite everything from scratch, and definitely, I didn't want to handle all edge cases for epoll. My choice was to use libuv. The architecture I opt for: use 16 cores out of 40 for networking, having 16 'uv_loop' each running on its own thread. Callbacks will be passed from F# to each 'uv_loop' instance. The event loop will call them after parsing the bid request in C11."
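The one-event-loop-per-core layout the quote describes can be sketched in miniature; here Python's asyncio stands in for libuv (the real implementation drives libuv from C11/F#), and the loop count is scaled down from 16 to 4:

```python
import asyncio
import threading

# One event loop per thread, mirroring "16 uv_loop instances, each on its
# own thread" from the quote, scaled down to 4. asyncio is a stand-in for
# libuv here; names below are illustrative.
N_LOOPS = 4
loops = []

def run_loop(loop):
    asyncio.set_event_loop(loop)
    loop.run_forever()

for _ in range(N_LOOPS):
    loop = asyncio.new_event_loop()
    threading.Thread(target=run_loop, args=(loop,), daemon=True).start()
    loops.append(loop)

# Dispatch a callback onto a specific loop, the way the F# side hands
# callbacks to each uv_loop instance. Here each callback just records
# which loop index ran it.
results = []
done = threading.Event()

def on_bid_request(loop_index):
    results.append(loop_index)
    if len(results) == N_LOOPS:
        done.set()

for i, loop in enumerate(loops):
    loop.call_soon_threadsafe(on_bid_request, i)

done.wait(timeout=5)
print(sorted(results))  # [0, 1, 2, 3]
```

The point of the design is that each loop owns its connections and runs on a dedicated thread, so there is no cross-core contention on the request path; work enters a loop only through its thread-safe dispatch call.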
Looks like libuv directly in C11? (not F# as before edit).
It’s kind of sad that all this engineering effort was spent to essentially make the internet a worse place for everyone and waste users’ time and attention.
Imagine if a crime syndicate would brag about their efforts to make their worldwide criminal activities more efficient.
I can totally see where you're coming from. But major engineering achievements require the efforts of many skilled people, who often like to be paid really well for their work. And the way the world works today, a lot of the big money is in fields of questionable value to society: advertising, finance, military, etc. And even in fields that seem socially valuable at first glance, like health care, most of the money comes not from healing people but from playing the game of "rip off public or private coverage providers".
Therefore I think the best we can hope for is that engineering breakthroughs achieved in profit-driven fields will gradually leak into other fields where they can actually be used to improve people's lives.
... So when it's Google or FB blogging about technology originally developed to serve ads, it's hype and cool... But when the authors are more honest about the motivation behind developing a certain piece of tech, it's "kind of sad"?
Do you legitimately think the world would be a better place if gmail, youtube, flickr, reddit, EVERY search engine, and basically every web content site disappeared?
Because that's what happens if you don't have web advertising. Free things disappear without revenue.
Or maybe you'd prefer to go back to the days of randomly-targeted or "PUNCH THE MONKEY" ads. Because THAT'S what happens without ad auctions and targeting.
The reality is: advertisers and ad-supported sites WANT to show you a relevant ad that you're likely to click (modulo obvious bad actors). That's how they get paid. Anything else is, by definition, "[wasting] users' time and attention."
I'd love to read what a crime syndicate does to improve their activities. Doesn't mean I agree with them... but no doubt it's really interesting and I might learn something from it.
mej10|6 years ago
coolsunglasses|6 years ago
sebastienros|6 years ago
nullwasamistake|6 years ago
Have you tried it, or is this a case of NIH?
reilly3000|6 years ago
endymi0n|6 years ago
Fascinating stuff indeed :)
reinhardt|6 years ago
Source: I also work in ad tech.
pas|6 years ago
For anyone else confused it's probably Google F1 and Spanner.
ggregoire|6 years ago
Guthur|6 years ago
w3clan|6 years ago
Is it Golang or Pony or F#? The CoreFX mention at the end confused me even more.
rkallos|6 years ago
tracker1|6 years ago
insulanian|6 years ago
philliphaydon|6 years ago
csdreamer7|6 years ago
I am learning Clojure, so I would like to know if anyone knows of the most performant applications written in it.
eliasson|6 years ago
This makes me curious - was it the language or the runtime characteristics?
Nextgrid|6 years ago
_cs2017_|6 years ago
mochomocha|6 years ago
packetslave|6 years ago
legohead|6 years ago
teej|6 years ago
llamataboot|6 years ago
stingraycharles|6 years ago
tgtweak|6 years ago
Found this article great; there aren't many places to see 5M req/s, let alone on a single node.
I'm really interested in hearing more about those databases!
bob809|6 years ago
[deleted]