
Google PDF Search: “not for public release”

268 points| webmonkeyuk | 10 years ago |google.com | reply

56 comments

[+] mindcrime|10 years ago|reply
[+] pakled_engineer|10 years ago|reply
"for official use only" or "U//FOUO" brings up interesting results. The PDF "U//FOUO Sovereign citizens extremist ideology" by the FBI was a good read, and so were the recent internal Interpol reports about all their weapons that have been "misplaced" or stolen.
[+] snehesht|10 years ago|reply
Sometimes people are ignorant, but sometimes they are clever.
[+] PeterWhittaker|10 years ago|reply
Much more interesting if you limit the search by time.

Less than a month old? A single screenful, mostly Australian.

[+] maudineormsby|10 years ago|reply
A lot of these are redacted and appear to be FOIA or similar requests that have been fulfilled.
[+] maudineormsby|10 years ago|reply
And then again, after looking, some are clearly not.
[+] shanemhansen|10 years ago|reply
This one was quite sad. The suicide of an inmate: http://www.drc.ohio.gov/public/after_action_castroA643371.pd...
[+] _tjry|10 years ago|reply
Uh. Unless I'm mistaken, that particular inmate pleaded guilty and was convicted of 937 various counts raised against him, including murder and rape. He kidnapped three women (in 2002, 2003 and 2004) and kept them imprisoned in his basement for nearly 11 years during which time he did horrible, unspeakable things to them.
[+] rusbus|10 years ago|reply
I wonder if those are the sort of links that can leave you on the wrong side of the Computer Fraud and Abuse Act...
[+] jdavis703|10 years ago|reply
IANAL, but typically CFAA violations revolve around crafting special URLs, as in a forced browsing attack. Simply following a URL is, AFAIK, not (yet) a crime.
[+] ck2|10 years ago|reply
Now subtract -"not for public release until"
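
A minimal sketch of how such a query could be assembled, assuming Google's usual quoted-phrase, minus-exclusion and filetype: operators. The helper names build_query and search_url are made up for illustration:

```python
from urllib.parse import urlencode


def build_query(phrase, exclude=None, filetype="pdf"):
    """Assemble a search string: exact phrase, optional excluded phrase, file type."""
    parts = [f'"{phrase}"']
    if exclude:
        # A leading minus asks the search engine to drop results with this phrase.
        parts.append(f'-"{exclude}"')
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query):
    """Turn the query string into a search URL (hypothetical helper)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    q = build_query("not for public release",
                    exclude="not for public release until")
    print(q)
    print(search_url(q))
```

The exclusion only filters out documents containing the longer phrase, so embargoed press releases drop out while everything else stays.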
[+] cvsv|10 years ago|reply
Assuming filetype:docx is even worse?
[+] feld|10 years ago|reply
Tennessee execution procedures? lovely
[+] ocdtrekkie|10 years ago|reply
This is pretty interesting. One did say "Not for Public Release UNTIL", so its eventual publication was presumably intended, but in a lot of cases webmasters probably assumed nothing would be found and indexed by Google wherever they put it. And they were wrong.
[+] seiji|10 years ago|reply
This is a great example of the house of cards all our network systems are built on top of.

Imagine this scenario: you maintain a network of web servers, database servers, file servers, etc. They all combine to generate a large website used by tens of millions of users every month. One day you are just doing a cursory look over a certain server, but you see something strange. Someone is logged in to your server. And they have a Russian IP address.

What do you do? Obviously, the first step is to log in to your edge routers and null route all of Russia. GTFO. Next, you've got an idle session on one server. What were they doing?

How can you reconstruct what they were doing? Bash history? Maybe. Network forensics? Your network probably isn't recording every historical connection between servers (useless 99.9999% of the time, but critical in this case). File system access? Your file system probably isn't logging every historical access (useless 99.99999% of the time, but really freaking useful in this case).

So, you investigate their history, double check some database logs, check netstat, check lsof, and in the end, you really have no idea what they were doing at all. Our systems don't leave enough breadcrumbs around to reconstruct even interior hostile activities, much less to semi-intelligently prevent Google from indexing confidential information that was accidentally left exposed.

[+] gcb0|10 years ago|reply
the magic of "turn-key solutions"

when you decide to buy something for $x instead of paying someone who knows what they are doing $5x to implement it with proper standards, it shows in things like this

[+] unsignedint|10 years ago|reply
They should at least have set an owner password on these documents. (In practice, owner passwords are not effective at preventing people from disregarding the limitations you set on a document, but they would at least exclude the documents from indexing, by Google at any rate.)
[+] r3bl|10 years ago|reply
I think the bare minimum would be to put them all in one directory and use robots.txt to hide them from Google.

Sure, it's weak, but at least it won't be accessible through Google.
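
A rough sketch of that idea, using Python's standard urllib.robotparser to check what a compliant crawler would do. The /internal-docs/ directory and the example.org URLs are assumptions for illustration, not anything from the documents in question:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: every sensitive PDF lives under /internal-docs/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /internal-docs/
"""


def is_crawlable(robots_txt, user_agent, url):
    """Return True if a well-behaved crawler may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    blocked = "https://example.org/internal-docs/report.pdf"
    public = "https://example.org/press/release.pdf"
    print(is_crawlable(ROBOTS_TXT, "Googlebot", blocked))
    print(is_crawlable(ROBOTS_TXT, "Googlebot", public))
```

This only covers crawlers that honor robots.txt. The files stay reachable to anyone who has the URL, and the robots.txt file itself advertises where the directory is.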

[+] peterwwillis|10 years ago|reply
What's especially crazy about these is that so many have been cached by Google. Anyone can read these docs and only Google would ever have a record.