I never apologise for substituting "complete data" for "Big Data". Volume of data is no substitute for relevant data and, as always, unless you know the quality of the data analysed you cannot define the confidence limits of any predictions.
Google's Flu Trends is a good example of a project where correlations drawn from search data were, at first, a good indicator of potential flu outbreaks. But the accuracy broke down over time.
It worries me that finding apparent correlations in more and more "Big" data, and acting on them, may lead to poor decision making. This is why it is always important that people at the sharp end of any organisation are engaged in testing insights. You cannot leave this to data scientists and analysts with little operational experience.
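To illustrate why more data can mean more misleading correlations, here is a minimal sketch in Python. It is purely illustrative and uses synthetic random numbers, not any real search or flu data: the function name `strongest_spurious_correlation`, the series length of 20 points and the seed are all assumptions made for the example. It generates a random "target" series, compares it against a growing number of equally random, unrelated series, and reports the strongest correlation found by chance alone.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious_correlation(n_candidates, n_points=20, seed=0):
    """Largest |correlation| between a random target and unrelated random series.

    Hypothetical helper for illustration only. Every series is independent
    noise, so any correlation found here is spurious by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    # The more unrelated series we search, the stronger the best "match" looks.
    for n in (5, 50, 500, 5000):
        print(n, round(strongest_spurious_correlation(n), 2))
```

Because the seed is fixed, each larger search includes the series from the smaller ones, so the strongest chance correlation can only stay the same or grow as more series are examined. That is the risk in trawling ever larger data sets without someone with operational experience asking whether the relationship makes sense.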
New self-service BI tools help here, as operational staff can use them to test hypotheses against practical experience. But don't leave this process to data discovery workgroup tools. Make sure the data is the core enterprise data so that any conclusions are drawn from the right data. Use a BI and Analytics platform that ensures the highest standards of multi-tenancy security, even for self-service BI and Analytics.
Google’s Flu Trends project serves as a good example. Designed to produce accurate maps of flu outbreaks based on the searches being made by Google users, it provided compelling results at first. However, as time went on, its predictions began to diverge increasingly from reality. It turned out that the algorithms behind the project just weren’t accurate enough to pick up anomalies such as the 2009 H1N1 pandemic, vastly reducing the value that could be gained from them.