Google’s Counts: Worth Every Penny You Pay For Them

[Image: statistician]

The Internet is theoretically a statistician’s dream. Let’s hope it’s not a nightmare. In our March 10, 2014 post about the irreproducible results of an Ngram search, we warned that nothing prevents Google from changing their definitions or conventions … and not telling us about them. Since they tell us precious little, it seems wise not to base important conclusions or critical decisions solely on any relatively lengthy history of their counts data. And “relatively lengthy” may be as short as a month or a quarter, because it is easy for Google to change their mind and their software. This was brought to our attention in the December 21 New York Times by economist Seth Stephens-Davidowitz, who has made a career of analyzing counts produced by Google searches of certain keywords, along with survey data collected by other surveyors. Overall, the New York Times article showed mostly upbeat behavior during the holiday season, which one would hope for. Whether the annual trends are accurate or not, likely only Google knows for sure. We are not suggesting that Google is doing anything malicious in making these changes; they may all be made with the goal of improved accuracy and usability. But without more transparency we will never know.
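If you must build on such counts anyway, one modest defense is to keep your own dated snapshots of every query result and diff later runs against them, so a silent change in Google’s definitions at least shows up as a break in your archive. Here is a minimal Python sketch of that idea; everything in it is our own construction, and fetch_counts is a hypothetical placeholder, since Google publishes no official API for Ngram or search-result counts.

```python
import json
import time
from pathlib import Path

ARCHIVE = Path("count_snapshots")

def fetch_counts(term):
    """Hypothetical stand-in for however you actually obtain counts.

    Google publishes no official API for Ngram or search-result counts,
    so replace this stub with your own scraper or manual export.
    """
    return {"2010": 120000, "2011": 135000}  # canned example data

def snapshot(term):
    """Archive today's counts for `term` under a dated filename."""
    ARCHIVE.mkdir(exist_ok=True)
    path = ARCHIVE / f"{term}_{time.strftime('%Y-%m-%d')}.json"
    path.write_text(json.dumps(fetch_counts(term), sort_keys=True))
    return path

def check_against(term, old_path):
    """Re-run the query and report any years whose counts changed."""
    old = json.loads(Path(old_path).read_text())
    new = fetch_counts(term)
    changed = {y: (old.get(y), new.get(y))
               for y in sorted(set(old) | set(new))
               if old.get(y) != new.get(y)}
    if changed:
        print(f"'{term}': {len(changed)} year(s) changed since {old_path}")
    else:
        print(f"'{term}': counts reproduced exactly")
    return changed

if __name__ == "__main__":
    saved = snapshot("example term")
    check_against("example term", saved)  # weeks later, rerun this step
```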

BIG Data Is Not Necessarily GOOD Data

[Image: BIG Data or GOOD Data]

Social networks collect enormous amounts of data about people’s intentions and actions, but they have come into being so quickly that there hasn’t been time to glean much wisdom from that data. The large majority of both the staffs of the social-network companies and their users have little or no experience with the practical challenges of collecting and interpreting data. A just-published study by Derek Ruths of McGill University and Jürgen Pfeffer of Carnegie Mellon University, reported in Science Daily, warns of some of the pitfalls. Foremost among them is failing to deal with biases in the composition of the sample. Technology Bloopers’ Statistics and Surveys webpage states at the outset: “Be sure your sample is representative.” Different social networks attract different sorts of people in terms of age, gender, ethnicity, etc., so findings based on data from one almost certainly do not represent the U.S. population as a whole. One flagrant example, which occurred decades before most of the people designing or using today’s social networks were born, was the mistaken prediction that Dewey would beat Truman in the 1948 U.S. presidential race, caused by a failure to sample voters properly. Similar errors have certainly already been made wherever social-network data is used for decision-making without an understanding of the underlying sample.
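To see how much damage an unrepresentative sample can do, and how knowing the population’s composition can repair some of it, here is a small, self-contained Python simulation. It is entirely our own illustration (not taken from the Ruths and Pfeffer study, and with made-up numbers): a “social network” sample over-represents young users, the naive estimate comes out badly biased, and standard post-stratification weighting largely corrects it.

```python
import random

random.seed(42)

# Invented population: 60% older, 40% younger; support rates differ by group.
POP_SHARE = {"young": 0.40, "old": 0.60}   # true population mix
SUPPORT   = {"young": 0.70, "old": 0.40}   # true support rate per group

def draw_sample(n, sample_share):
    """Draw n respondents with a given (possibly biased) age mix."""
    sample = []
    for _ in range(n):
        group = "young" if random.random() < sample_share["young"] else "old"
        vote = 1 if random.random() < SUPPORT[group] else 0
        sample.append((group, vote))
    return sample

# A social-network sample that over-represents the young (80% vs the true 40%).
sample = draw_sample(10_000, {"young": 0.80, "old": 0.20})

# Naive estimate: the raw sample mean -- biased upward by the skewed mix.
naive = sum(v for _, v in sample) / len(sample)

# Post-stratified estimate: weight each group's sample mean by its true
# population share (this requires outside knowledge of the population mix,
# e.g. census data).
post = 0.0
for group, share in POP_SHARE.items():
    votes = [v for g, v in sample if g == group]
    post += share * (sum(votes) / len(votes))

true_rate = sum(POP_SHARE[g] * SUPPORT[g] for g in POP_SHARE)
print(f"true support:      {true_rate:.3f}")  # 0.520
print(f"naive estimate:    {naive:.3f}")      # ~0.64, badly biased
print(f"weighted estimate: {post:.3f}")       # ~0.52, close to the truth
```

The correction works only because the simulation “knows” the true population shares; in practice that knowledge has to come from somewhere outside the social network, which is exactly the study’s point.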

Pew Research’s Web IQ Quiz Suggests Need for Higher Questionnaire Design IQ

[Image: Pew Research iPhone Release Date]

What makes a survey good? It needs to be correct (i.e., its answers must accurately represent the population being surveyed) and actionable (i.e., its conclusions must be useful and implementable). Unfortunately, the recent Pew Research Center Web IQ Quiz fails on the second criterion. Most of its questions belong in a game of Trivial Pursuit, not in an assessment of whether U.S. adults can use the Web effectively. Being able to identify Bill Gates or Sheryl Sandberg from a picture is meaningless. Ditto the name of the first browser or the date the iPhone was introduced. Moore’s Law is important as a predictor of computer speed and storage size, but it has nothing to do with how people should use the Web. And so on. Interestingly, two of the most useful questions, on privacy policy and net neutrality, were answered as well by older (50+) respondents as by younger ones, or even better.

Can’t the FCC Count (the Viable Wireless Companies) Correctly? But Now They Have a Second Chance

[Image: 3 is better than 2]

What are we at Technology Bloopers missing here? The Federal Communications Commission nixed the proposed merger between T-Mobile and Sprint early in 2014, saying it would be anti-competitive. Huh? Apparently they can’t count. The merger would have created three meaningful competitors rather than today’s two meaningful ones and two also-rans. The Washington Post was even less charitable than we are. However, now that T-Mobile’s prospective buyer Iliad SA has bowed out, the hapless FCC has an opportunity to do some behind-the-scenes politicking to get Sprint to refresh their bid … if it isn’t too late.

Be Careful When Using Quantitative Results from Any Google or YouTube Search

[Image: Who's Bossy Graph Composite]

There are scattered comments criticizing the criteria (and changes in them) that Google uses to include, and especially to rank, the results of its searches. It is tempting, but risky, to use these searches to quantify trends, and we are not even sure that Google itself understands that people are doing so. Given the origins of Google, and some of their early goals, we doubt that Google is intentionally trying to mislead people who use their Ngram search, though the vague terms they use to describe the number of documents searched are highly suspicious. In any case, ethical scientific practice requires that findings be REPRODUCIBLE. In the case of the Don’t Call Us Bossy article in the Wall Street Journal (a publication that should know better), which listed its search terms, it was not possible to reproduce a set of curves showing a first peak in the 1930s, so the authors had no basis for drawing conclusions about trends in that time frame. The amount of bossiness in the 1930s is not the point here; the point is that any supposed trend should be validated beyond what Google searches alone indicate.
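The reproducibility check itself can be mechanical. The sketch below (again our own construction, in Python, using an invented file format of year-to-frequency JSON) compares two saved runs of the same Ngram-style query and flags any year where they disagree; how you export the curves in the first place is up to you, since Google documents no official Ngram API. If last month’s curve and today’s don’t match, no conclusion about the 1930s, or any other period, should rest on them.

```python
import json
import sys

TOLERANCE = 1e-12  # allowable absolute difference in relative frequency

def load_curve(path):
    """Load a saved curve: a JSON object mapping year -> frequency."""
    with open(path) as f:
        return {int(year): freq for year, freq in json.load(f).items()}

def diff_curves(old, new, tol=TOLERANCE):
    """Return {year: (old, new)} wherever the two curves disagree."""
    years = sorted(set(old) | set(new))
    return {y: (old.get(y), new.get(y))
            for y in years
            if old.get(y) is None or new.get(y) is None
            or abs(old[y] - new[y]) > tol}

if __name__ == "__main__":
    # Usage: python check_curves.py old_run.json new_run.json
    changed = diff_curves(load_curve(sys.argv[1]), load_curve(sys.argv[2]))
    if changed:
        print(f"NOT reproducible: {len(changed)} year(s) differ")
        for year, (a, b) in changed.items():
            print(f"  {year}: {a} -> {b}")
    else:
        print("Reproduced: the two runs match")
```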