BIG Data Is Not Necessarily GOOD Data


Social networks collect enormous amounts of data about people’s intentions and actions, but they have grown so quickly that there has been little time to glean much wisdom from this data. Most of the staff at social network companies, and most of their users, have little or no experience with the practical challenges of collecting and interpreting data. A just-published study by Derek Ruths of McGill University and Jürgen Pfeffer of Carnegie Mellon University, reported in Science Daily, warns of some of the pitfalls. Foremost among them is failing to deal with the biases that arise from the composition of the sample. Technology Bloopers’ Statistics and Surveys webpage states at the outset “Be sure your sample is representative.” Different social networks attract different sorts of people in terms of age, gender, ethnicity, etc., so findings based on data from any one network almost certainly do not represent the U.S. population as a whole. One flagrant example, which occurred decades before most of the people designing or using today’s social networks were born, was the mistaken prediction that Dewey would beat Truman in the 1948 U.S. presidential race; this was caused by a failure to sample voters properly. Similar errors have almost certainly already been made when data from social networks was used for decision-making without an understanding of the underlying samples.
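To make the sampling problem concrete, here is a minimal sketch in Python. Every number in it is made up for illustration (the age groups, the population shares, the sample counts, and the "support" rates are all assumptions, not data from the study or from any real network). It compares a raw average taken from a sample that skews young with an estimate reweighted to the assumed population shares, a simple form of post-stratification.

```python
# Illustrative, made-up numbers only: not real survey or platform data.

# Assumed share of each age group in the overall population.
population_share = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}

# Hypothetical sample drawn from one social network (skews young).
sample_counts = {"18-29": 600, "30-49": 300, "50+": 100}

# Hypothetical fraction of each group in the sample that supports some position.
sample_support = {"18-29": 0.70, "30-49": 0.50, "50+": 0.30}


def raw_estimate(counts, support):
    """Average over the sample as collected, ignoring who is in it."""
    total = sum(counts.values())
    return sum(counts[group] * support[group] for group in counts) / total


def weighted_estimate(pop_share, support):
    """Reweight each group's rate by its assumed share of the population."""
    return sum(pop_share[group] * support[group] for group in pop_share)


raw = raw_estimate(sample_counts, sample_support)
adjusted = weighted_estimate(population_share, sample_support)

print(f"Raw sample estimate:      {raw:.2f}")
print(f"Reweighted to population: {adjusted:.2f}")
```

With these invented inputs the raw figure overstates support because the young, more supportive group is overrepresented in the sample. Reweighting corrects only for the characteristics you measure and have population figures for; it cannot repair biases in who chooses to join a network or to post in the first place.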
