Thursday, September 22, 2016

The Law of Large Numbers fails in big data

The Law of Large Numbers is often regarded as a sort of "law of nature" by which the averages of random variables always converge to fixed, well-defined values.
The question is: does the Law of Large Numbers hold true in the case of big data?
The key to the answer lies in the law's underlying assumptions about sample representativeness and data stability. One of the qualities that characterizes big data is volatility. Volatility thrives in the large, multi-variate, densely interrelated streams of events typical of big data, and the dynamics that follow interfere with the convergence of averages and prevent it from happening.
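A minimal sketch of this point, using a drifting mean as a stand-in for the "volatility" described above (the drifting-mean setup and all parameter values are illustrative assumptions, not taken from the post): when the data-generating process is stable, the running average settles; when the underlying mean keeps moving, it never does.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Stable data: i.i.d. draws around a fixed mean of 5.
stable = rng.normal(loc=5.0, scale=2.0, size=N)

# Volatile data: the underlying mean drifts from 0 to 10 over time,
# violating the stability assumption behind the Law of Large Numbers.
drift = np.linspace(0.0, 10.0, N)
volatile = rng.normal(loc=drift, scale=2.0)

def running_mean(x):
    """Average of the first n samples, for every n."""
    return np.cumsum(x) / np.arange(1, len(x) + 1)

stable_rm = running_mean(stable)
volatile_rm = running_mean(volatile)

# Gap between the average at 10% of the data and at 100% of it:
stable_gap = abs(stable_rm[10_000] - stable_rm[-1])      # tiny: converged
volatile_gap = abs(volatile_rm[10_000] - volatile_rm[-1])  # large: still moving
print(stable_gap, volatile_gap)
```

More data does not rescue the volatile series: its running average tracks the drift rather than converging to anything.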
In my view, even if the Law of Large Numbers did hold for big data, it would not be of much use, because it focuses on common "average" behavior that is already known, rather than on the irregularities and exceptions that are still unknown and in need of research, such as early-detection indicators, adverse effects, fraud detection, quality assurance, customer retention, accidents, and long-tail marketing, to mention a few. Long tails, for example, consist of overlooked, hidden phenomena, so their discovery must by definition look elsewhere than at the averages governed by the Law of Large Numbers.
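An extreme illustration of the long-tail point (the choice of a standard Cauchy distribution is my assumption, picked because it is the textbook case): for a sufficiently heavy-tailed distribution the mean does not even exist, so the Law of Large Numbers simply does not apply, and the running average wanders forever no matter how much data arrives.

```python
import numpy as np

rng = np.random.default_rng(42)

# The standard Cauchy distribution has tails so heavy that its
# mean is undefined; the Law of Large Numbers does not apply.
samples = rng.standard_cauchy(1_000_000)
running = np.cumsum(samples) / np.arange(1, len(samples) + 1)

# Snapshots of the "average" at increasing sample sizes:
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(n, running[n - 1])

# Even over the last 90% of the data, the running average
# keeps jumping around instead of flattening out.
late_spread = np.ptp(running[100_000:])
print("spread over last 900k samples:", late_spread)
```

Each rare, enormous sample yanks the average, which is exactly why average-centric analysis overlooks what lives in the tail.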

The weak points of the Law of Large Numbers described above are just a small part of the analytical "peculiarities" to be expected in big data.
This post is the first in a series of essays on a proposed new concept of science in view of the IT industrial revolution.

