What would the search engines be giving the feds?
February 2nd, 2006

It was recently made public that the US government has asked MSN, Yahoo, AOL and Google for some of their search logs. They've all handed over their search logs except Google, which is fighting the government. This has raised a lot of privacy concerns, and Danny Sullivan has covered them fairly comprehensively.
The only contribution I'd like to make is to give you an idea of what the data the search engines gave the feds might look like. Here is a link to a file containing almost half a million search terms from consumer search activity (zip, 2.2MB). The file is 7.1MB unzipped, and you can open it in WordPad. It's really interesting to look at what people search for.
The file is from a single day's activity back in 2001. We've removed phone numbers and email addresses. It is ordered alphabetically, and each search term appears only once. The keywords have also been processed: the terms are all lower case, white space and punctuation have been removed, and some other processing has been performed.
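We haven't listed every processing step, so the following is only a rough sketch in Python of the steps named above, not the exact code used to build the file. The helper name `normalize_terms`, the email and phone patterns (a simple `name@domain.tld` shape and a US-style 10-digit number), dropping whole queries that contain them, and treating "white space removed" as trimming and collapsing extra spaces are all assumptions made for illustration.

```python
import re

# Assumed patterns: a simple email shape and a US-style 10-digit phone number.
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")
PUNCT_RE = re.compile(r"[^\w\s]")


def normalize_terms(raw_queries):
    """Hypothetical helper: clean raw queries roughly as described in the post."""
    cleaned = set()  # a set keeps each term only once
    for query in raw_queries:
        # Assumption: drop any query that appears to contain an email or phone number.
        # This check runs before punctuation is stripped, since it relies on "@", "." and "-".
        if EMAIL_RE.search(query) or PHONE_RE.search(query):
            continue
        term = query.lower()            # all lower case
        term = PUNCT_RE.sub("", term)   # strip punctuation
        term = " ".join(term.split())   # trim and collapse runs of white space
        if term:                        # skip queries that were nothing but punctuation
            cleaned.add(term)
    return sorted(cleaned)              # alphabetical order


if __name__ == "__main__":
    sample = ["Cheap Flights!", "cheap  flights", "call 555-123-4567",
              "me@example.com", "  Zebra  ", "!!!"]
    print(normalize_terms(sample))  # ['cheap flights', 'zebra']
```

Collecting the terms in a set and sorting once at the end gives the "each term shown once, in alphabetical order" layout of the file without comparing every term against every other.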
This isn't exactly the format in which the search engines would have given the data to the feds. Although it makes interesting reading, it does underscore how difficult it would be to get any useful information out of it. Privacy issues aside, this seems like a silly request from the Department of Justice. Can anyone see how they could analyze this file to support the child protection law (the Child Online Protection Act)?
FYI: this file is not censored and does contain adult search terms. Viewer discretion is advised.


February 7th, 2006 at 6:57 am
Useful, but it would be more interesting to see something more up to date. Can you tell us how you came by this data and how it might be possible to get access to it without paying WordTracker lots of money?