" This saved costs, guaranteed better uptime, and made the site more portable and thus harder to take down "
Probably not true for " This saved costs ". From what i've seen, virtual machines usually cost more than twice the price of renting the equivalent "real" machine monthly.
They could have used dedicated servers; there are more dedicated server providers than VM providers, thus achieving the same goal, less expensively.
Probably not true for " better uptime " either; VMs are still hosted on real hardware, which fails, too. (Although distributing the work on more independent machines can improve uptime.)
drsintoma|11 years ago

They are more expensive, but they are usually easy and quick to acquire, which makes provisioning much more efficient when traffic fluctuates. Overall, sysadmins will have less of a tendency to over-provision, i.e. to get more and beefier machines than needed "to be safe".
droopyEyelids|11 years ago

1) Hardware-seizure expenses, versus LEOs merely duplicating the disk of a VM.

2) TPB needs to locate in disparate jurisdictions to take advantage of different legal situations. With physical hardware that would involve a ton of shipping costs, probably more lost hardware, and paying for remote hands.

3) They had been paying a premium for "bulletproof" hosting.
TomAnthony|11 years ago

If the load balancer is the weak point that would be discovered first, then I imagine they must have some mechanism to stop it leaving evidence that leads to the other machines if it were to get raided (it isn't on their hardware, so they can't prevent the files being backed up).

Is there a way the codebase could be entirely encrypted and not even accessible to the cloud provider (with some "boot password" needed each time the server starts up)?
waxjar|11 years ago

I remember reading that their load balancer shuts itself off after 2 minutes if something strange happens. The whole thing is run from memory, so no traces will be left. I don't know how accurate this is, though.
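The mechanism described above is essentially a dead-man's switch: if the heartbeat from whatever monitors the box goes silent for too long, the machine kills itself. A minimal sketch (hypothetical; TPB's actual implementation isn't public, and all names here are invented):

```python
import time


class DeadMansSwitch:
    """Trips once no heartbeat has arrived within the timeout window."""

    def __init__(self, timeout_s=120.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_beat = clock()
        self.tripped = False

    def heartbeat(self):
        """The operator (or an automated monitor) signals that all is well."""
        self.last_beat = self.clock()

    def check(self):
        """Call periodically; returns True once the switch has tripped.

        In a real deployment, tripping would wipe key material and power
        the machine off; since everything runs from RAM, nothing remains.
        """
        if not self.tripped and self.clock() - self.last_beat > self.timeout_s:
            self.tripped = True
        return self.tripped
```

Injecting the clock keeps the logic testable; the 120-second window matches the "2 minutes" figure mentioned above.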
TheLoneWolfling|11 years ago

It depends on what you mean by that. The only way to prevent a codebase from being seen by an adversary with physical access while the server is on is to not have the sensitive data on the server in the first place.

Encryption (with the decryption key fetched at boot from, say, a particular .onion address) would work against backups, but won't protect against an adversary with admin access to the host while the virtual server is running.
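The fetch-the-key-at-boot idea could be wired up on a Debian-style system with cryptsetup's `keyscript=` hook. This is only a hedged config sketch: the script path and the .onion address are invented, and it assumes a Tor SOCKS proxy is already running locally.

```shell
#!/bin/sh
# Hypothetical cryptsetup keyscript; the .onion address and paths are
# invented for illustration. It fetches the disk-unlock key over a local
# Tor SOCKS proxy and writes it to stdout, where cryptsetup reads it.
# Nothing is stored on the VM's disk, and if the key service is taken
# offline, the volume can no longer be unlocked.
exec curl --silent --fail \
     --socks5-hostname 127.0.0.1:9050 \
     "http://examplekeyhost.onion/unlock-key"

# Matching /etc/crypttab entry (hypothetical device and script path):
#   data  /dev/vdb1  none  luks,keyscript=/usr/local/sbin/fetch-unlock-key
```

As noted above, this only protects data at rest (seized disks, backups); it does nothing against a provider with live admin access to the running VM.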
"At the time of writing the site uses 21 virtual machines (VMs) hosted at different providers. [...] All virtual machines are hosted with commercial cloud hosting providers, who have no clue that The Pirate Bay is among their customers."
They may "have no clue" but it seems like that's only because they don't care and haven't looked. I don't see anything in the article that would prevent the providers from figuring this out unless I'm missing something.
verroq|11 years ago

Why can't people find out where their servers are? I understand they have their own IP allocation, so they can use BGP tricks. But don't they need a sympathetic ISP or similar to help them get the routes in?
nkcmr|11 years ago

IIRC they have their load balancer hosted under a sovereign IP address (the IP block belongs to a political party), so attempting to mess with it could constitute an infringement of free speech.
tete|11 years ago

I guess if ISPs really went sniffing around for whether they host them, they would probably be able to find out (probably!). But when you have a couple hundred VPS customers and care even a tiny bit about their privacy, then as long as you get paid and receive no complaints, why would you go looking for them?
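The "BGP tricks" under discussion amount to originating their own prefix from wherever the box currently sits, via a cooperating upstream. A rough, hypothetical sketch in BIRD (1.x) syntax; the ASN, prefix, and neighbor address are all invented, and as noted above a sympathetic ISP still has to accept the announcement:

```
# Originate the prefix locally so it can be exported to the upstream.
protocol static {
    route 192.0.2.0/24 reject;
}

# Session to the cooperating upstream ISP (all numbers are made up).
protocol bgp upstream {
    local as 64500;
    neighbor 203.0.113.1 as 64501;
    import none;
    export where net = 192.0.2.0/24;
}
```

Moving the service then just means bringing the same session and announcement up from a different machine; the public IP never changes.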
Nanzikambe|11 years ago

Interesting. So I'm presuming there are several VPNs involved between the load balancer and all the discrete servers. I wonder if they use a VPN provider with a static IP and a no-logs policy, or if it's simply yet another VPS.

I'd love to hear a little more about the architecture.
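If it is plain VPN tunnels between the load balancer and the backends, the topology would be a hub-and-spoke overlay. Sketched here with WireGuard purely as a modern illustration (the actual software used is unknown, and all keys, addresses, and hostnames are invented):

```
# Hypothetical wg0.conf on the load balancer: one [Peer] per backend VM,
# each at a different cloud provider, all reachable only on the overlay net.
[Interface]
PrivateKey = <load-balancer-private-key>
Address    = 10.10.0.1/24
ListenPort = 51820

[Peer]
# backend web VM at one provider
PublicKey  = <backend1-public-key>
AllowedIPs = 10.10.0.2/32
Endpoint   = backend1.example.net:51820

[Peer]
# backend database VM at another provider
PublicKey  = <backend2-public-key>
AllowedIPs = 10.10.0.3/32
Endpoint   = backend2.example.org:51820
```

The backends would firewall everything except the tunnel, so seizing the load balancer's public IP reveals only encrypted endpoints, not what runs behind them.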
cookrn|11 years ago

If memory serves, I think TPB is somehow related to iPredator [1][2], though I'm not sure if that is still the case. This may give them _lots_ of experience running VPN software, which would be useful if that is indeed how they're communicating between VPS providers.

[1] https://www.ipredator.se/

[2] http://torrentfreak.com/pirate-bay-announces-ipredator-globa...
> In total the VMs use 182 GB of RAM and 94 CPU cores. The total storage capacity is 620 GB, but that’s not all used.
Theodores|11 years ago

That level of hardware/cores seems a bit over the top given what TPB does.

When I was a boy we had this thing called 'Alta Vista'. It was the search engine before Bing came along. Processors did not run at gigahertz speeds back then, and a large disk was 2 GB. Nonetheless, most offices had the internet, and when people went searching, 'Alta Vista' was the first port of call for many.

TPB has an index of a selective part of the internets, i.e. movies, software, music, that sort of thing. Meanwhile, back in the 1990s, AltaVista indexed everything, as in the entire known internets, with everything stored away in less than the 620 GB used by TPB for their collection of 'stolen' material.
From http://en.wikipedia.org/wiki/AltaVista

Alta Vista is a very large project, requiring the cooperation of at least 5 servers, configured for searching huge indices and handling a huge Internet traffic load. The initial hardware configuration for Alta Vista is as follows:
Alta Vista -- AlphaStation 250 4/266
4 GB disk
196 MB memory
Primary web server for gotcha.com
Queries directed to WebIndexer or NewsIndexer
NewsServer -- AlphaStation 400 4/233
24 GB of RAID disks
160 MB memory
News spool from which news index is generated
Serves articles (via http) to those without news server
NewsIndexer -- AlphaStation 250 4/266
13 GB disk
196 MB memory
Builds news index using articles from NewsServer
Answers news index queries from Alta Vista
Spider -- DEC 3000 Model 900 (replacement for Model 500)
30 GB of RAID disk
1 GB memory
Collects pages from the web for WebIndexer
WebIndexer -- Alpha Server 8400 5/300
210 GB RAID disk (expandable)
4 GB memory (expandable)
4 processors (expandable)
Builds the web index using pages sent by Spider.
Answers web index queries from Alta Vista
drunkcatsdgaf|11 years ago
They also didn't get as much traffic as TPB, since there weren't as many people connected back then.

I would also imagine that they didn't have to HIDE their services, either.
xamolxix|11 years ago
IIRC there were (quite) a few before Bing. More to the point, Google was the pinnacle of web search long before Bing came into existence.