unknownknowns | 10 years ago
It's not really surprising that bulk surveillance is simple, in my opinion. The act of collection/surveillance itself isn't the hard part; the hard part is doing it at a much larger scale.
Wardriving a Starbucks as a hobby is slightly different from installing a specialized device into every downtown DC Starbucks, even if functionally both do the same thing (collect and/or inject data).
gonzo | 10 years ago
Sampled netflow at 100k connections/sec is straightforward. You won't do it with BPF, but it's straightforward with technologies like netmap or DPDK.
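To make the "sampled" part concrete, here's a minimal sketch of 1-in-N packet sampling aggregated into 5-tuple flow records. The tuple layout, sample rate, and function name are all illustrative, not from any netflow implementation (real collectors written on netmap/DPDK would do this in C, per packet, in a tight loop):

```python
from collections import defaultdict

SAMPLE_RATE = 1000  # hypothetical: inspect 1 in N packets


def sample_flows(packets):
    """Aggregate a 1-in-N sample of packets into flow records.

    Each packet is an illustrative (src, dst, sport, dport, proto, length)
    tuple; the flow key is the classic 5-tuple.
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for i, (src, dst, sport, dport, proto, length) in enumerate(packets):
        if i % SAMPLE_RATE:
            continue  # unsampled packets are never touched
        key = (src, dst, sport, dport, proto)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += length
    return dict(flows)
```

Because only every Nth packet is examined, the per-packet work stays tiny even at millions of packets per second; the recorded counts are then scaled back up by the sample rate when reporting.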
Using "the web" as an example: a TCP session requires a 3-phase handshake (SYN -> SYN ACK -> ACK), then the client can make a request (GET / HTTP/1.0), followed by the response.
The SYN/SYN+ACK/ACK packets are 40 bytes each (IP + TCP headers), plus another 18 bytes (DST, SRC, length, CRC), plus another 20 bytes of 'overhead' (preamble, SFD, IPG).
Unfortunately, the minimum payload on Ethernet is 46 bytes ('octets'), so the actual on-the-wire SYN/SYN+ACK/ACK exchange is 84 bytes each (including all overhead). Even 1G Ethernet has bandwidth sufficient for 1.488 million packets per second at this size.
At the other extreme, if all the frames on a 1Gbps link are 1538 bytes (including all overhead), you still have to deal with 81,274 packets per second.
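The two packet rates above follow directly from the frame sizes; a quick sketch of the arithmetic:

```python
LINK_BPS = 1_000_000_000  # 1 Gbit/s


def frames_per_second(frame_bytes, link_bps=LINK_BPS):
    """Max frame rate when frame_bytes already includes ALL overhead
    (IP/TCP headers, Ethernet header + CRC, preamble, SFD, IPG)."""
    return link_bps // (frame_bytes * 8)


# 84-byte minimum frames (the SYN/SYN+ACK/ACK case): ~1.488 Mpps
min_rate = frames_per_second(84)
# 1538-byte maximum frames: ~81k pps
max_rate = frames_per_second(1538)
```

So even the best case for a capture box (all full-size frames) leaves it fielding tens of thousands of frames per second on a single 1G link.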
The HTTP GET request is likely between 100 and 1500 bytes (depending on headers), and, if we figure that the response fits in a 1500-byte frame (1460 bytes of actual content, plus headers and overhead), we then have a 4-phase close (a FIN/ACK from each side). These, again, are minimum 84 bytes on the wire.
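Adding those pieces up gives a rough on-the-wire cost per connection. The request size and response frame count below are illustrative defaults (the comment gives a range, not a single number), and the 78-byte per-frame overhead is the 40 + 18 + 20 bytes tallied above:

```python
MIN_FRAME = 84         # minimum on-the-wire Ethernet frame, all overhead included
FRAME_OVERHEAD = 78    # 40 (IP+TCP) + 18 (Ethernet hdr+CRC) + 20 (preamble/SFD/IPG)
MAX_FRAME = 1538       # full-size frame, all overhead included


def connection_wire_bytes(request_bytes=500, response_frames=1):
    """Rough on-the-wire bytes for one short HTTP/1.0 connection.

    Assumes 3 handshake packets, one GET, full-size response frames,
    and a 4-packet FIN/ACK close. Defaults are illustrative.
    """
    handshake = 3 * MIN_FRAME
    request = max(MIN_FRAME, request_bytes + FRAME_OVERHEAD)
    response = response_frames * MAX_FRAME
    close = 4 * MIN_FRAME
    return handshake + request + response + close
```

With these assumptions a whole connection is under 3 KB on the wire, which is why the overhead packets, not the payload, dominate at high connection rates.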
So the reality is that given a web server that doesn't spend much time sending actual data (maybe several packets per page), it's easy to get to 100K connections per second.
However, 100K connections / sec (remember that's sampled) is 8.640 billion connections per day, and now you have a classic "big data" problem.
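The daily total is simple multiplication, and even a compact record per connection adds up fast. The ~50-byte record size below is a hypothetical figure for illustration, not one from the comment:

```python
CONNECTIONS_PER_SEC = 100_000  # the sampled rate discussed above
SECONDS_PER_DAY = 86_400

connections_per_day = CONNECTIONS_PER_SEC * SECONDS_PER_DAY  # 8.64 billion

# Hypothetical: ~50 bytes for a compact flow record (5-tuple, counters,
# timestamps). This is an assumption, not a netflow spec figure.
RECORD_BYTES = 50
raw_gb_per_day = connections_per_day * RECORD_BYTES / 1e9
```

Hundreds of gigabytes of raw records per day, per link, before any indexing or replication: that's the "big data" problem.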