jd20|5 years ago
- Applebot was originally written in Go (and uncovered a user agent bug on redirects, revealing it's Go origins to the world, which Russ Cox fixed the next day).
- Up until the release of iOS 9, Applebot ran entirely on four Mac Pro's in an office. Those four Mac Pro's could crawl close to 1B web pages a day.
- In it's first week of existence, it nearly took Apple's internal DNS servers offline. It was then modified to do it's own DNS resolution and caching, fond memories...
Source: I worked on the original version.
ospider|5 years ago
Unlike other languages, Go bypasses the system's DNS cache and goes directly to the DNS server, which is a root cause of many problems.
Spivak|5 years ago
Since netgo is faster, by default Go will try its best to determine whether it must use netcgo by parsing /etc/nsswitch.conf, looking at the TLD, reading environment variables, etc.
If you're building the code, you can force it to use netcgo by adding the netcgo build tag.
If you're an administrator, I think the least intrusive method is setting LOCALDOMAIN to something (or '' if you can't think of anything), which will force it to use NSS.
tylfin|5 years ago
If you're on a system with cgo available, you can use `GODEBUG=netdns=cgo` to avoid making direct DNS requests.
This is the default on macOS, so if it was running on four Mac Pros I wouldn't expect it to be the root cause.
oasisbob|5 years ago
As I understand it, Go and Java are both trying to avoid FFI and calling out to system libs for name resolution.
I tend to always offer a local caching resolver available over a socket.
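One way a Go program could be pointed at such a local caching resolver is sketched below, using the standard library's `net.Resolver` with `PreferGo` and a custom `Dial`. The address `127.0.0.1:53` and the helper name `localCacheResolver` are assumptions for illustration; substitute wherever the local cache actually listens.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// localCacheResolver returns a resolver that uses Go's built-in DNS client
// and sends every query to cacheAddr instead of the servers listed in
// /etc/resolv.conf. The helper name is hypothetical.
func localCacheResolver(cacheAddr string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // prefer the pure-Go resolver so Dial below is used
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			// Ignore the address Go picked and dial the local cache.
			return d.DialContext(ctx, network, cacheAddr)
		},
	}
}

func main() {
	// Assumed address of a local caching resolver.
	r := localCacheResolver("127.0.0.1:53")

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Fails with an error if nothing is listening on the assumed address.
	addrs, err := r.LookupHost(ctx, "example.com")
	fmt.Println(addrs, err)
}
```

This only affects lookups made through the returned resolver. Code that uses `net.DefaultResolver` keeps its normal behavior.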
ksec|5 years ago
Considering the timeline, were those Trash Can Mac Pros? Or the old Cheese Graters?
nothis|5 years ago
The scale of web stuff sometimes surprises me. 1B web pages sounds like just about the daily web output of humanity? How can you handle this with 4 (fast) computers?
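To put the figure in per-second terms, here is a quick back-of-the-envelope calculation. It assumes the load is spread evenly over 24 hours and across the four machines, which the original comment does not state.

```go
package main

import "fmt"

const secondsPerDay = 24 * 60 * 60

// pagesPerSecond converts a daily page count into an average per-second
// rate for each machine, assuming an even split across machines and time.
func pagesPerSecond(pagesPerDay, machines int) float64 {
	return float64(pagesPerDay) / secondsPerDay / float64(machines)
}

func main() {
	fmt.Printf("fleet:       %.0f pages/s\n", pagesPerSecond(1_000_000_000, 1))
	fmt.Printf("per machine: %.0f pages/s\n", pagesPerSecond(1_000_000_000, 4))
}
```

That works out to roughly 11,600 pages per second overall, or about 2,900 per machine.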
throwaway4good|5 years ago
Does it use a traditional relational database or another existing database-like product? Or is it built from scratch, just sitting on top of a file system?
dx034|5 years ago
I guess there are not many places where you can easily get 4GB/s sustained throughput from a single office (especially with proxy servers and firewalls in front of it). Is that standard at Apple or did the infrastructure team get involved to provide that kind of bandwidth?
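As a rough sanity check on that number, the sketch below works out the average page size that 4 GB/s sustained would imply at 1B pages a day. It assumes decimal gigabytes and a constant rate around the clock; neither is stated in the thread.

```go
package main

import "fmt"

const secondsPerDay = 24 * 60 * 60

// bytesPerPage returns the average transfer size per page implied by a
// sustained throughput (bytes per second) and a daily page count.
func bytesPerPage(bytesPerSecond, pagesPerDay float64) float64 {
	return bytesPerSecond * secondsPerDay / pagesPerDay
}

func main() {
	// 4 GB/s sustained, 1B pages per day (both taken from the thread).
	fmt.Printf("%.1f KB per page\n", bytesPerPage(4e9, 1e9)/1e3)
}
```

Under those assumptions the implied average is about 346 KB per page.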
netsharc|5 years ago
All three uses of "it's" should be "its".
And I would just write "Mac Pros" instead of "Mac Pro's".