The Law of Large Numbers is often regarded as a sort of "law of nature" by which sample averages always converge to fixed, well-defined values. The question is, does the law of large numbers hold true in the case of big data? The key to the answer lies in the law's underlying assumptions about sample representativeness and data stability. One of the qualities that define big data is volatility. Volatility thrives in the large, multivariate, densely interrelated events that typically make up big data, and the dynamics that follow interfere with the convergence of averages and can prevent it altogether.
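To make the convergence question concrete, here is a minimal simulation sketch (my own illustration in Python with NumPy, not part of the original argument). It contrasts running averages of well-behaved i.i.d. data, where the law of large numbers applies, with a heavy-tailed Cauchy stream used here as a stand-in for volatile data, where the average never settles:

```python
# A minimal sketch: running averages of well-behaved i.i.d. data
# converge, while a heavy-tailed Cauchy stream (a stand-in for
# volatile data, not a model of any particular big-data source)
# never settles on a value.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

stable = rng.normal(loc=5.0, scale=1.0, size=n)  # finite mean: LLN applies
volatile = rng.standard_cauchy(size=n)           # no finite mean: LLN fails

def running_mean(x):
    """Average of the first k samples, for k = 1..len(x)."""
    return np.cumsum(x) / np.arange(1, len(x) + 1)

stable_avg = running_mean(stable)
volatile_avg = running_mean(volatile)

for k in (1_000, 100_000, 1_000_000):
    print(f"n={k:>9,}: stable avg={stable_avg[k-1]:8.4f}  "
          f"volatile avg={volatile_avg[k-1]:12.4f}")
# The stable column settles near 5.0; the volatile column keeps
# jumping around no matter how many samples are added.
```

The Cauchy distribution is chosen deliberately: it is the textbook case in which the law's finite-mean assumption is violated, mirroring the representativeness and stability assumptions discussed above.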
In my view, even if the law of large numbers held true for big data, it would not be of much use, because it focuses on common "average" behavior that is already known rather than on the irregularities and exceptions that are still unknown and in need of research: early-detection indicators, adverse effects, fraud detection, quality assurance, customer retention, accidents, and long-tail marketing, to mention a few. Long tails, for example, consist of overlooked, hidden phenomena, so their discovery must by definition look elsewhere than at the already considered law of large numbers.
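As a toy illustration of this point (with made-up numbers of my own, not real transaction data): a handful of anomalous records barely moves the overall average, while a tail-oriented statistic exposes them immediately.

```python
# Toy illustration with hypothetical data: rare anomalies are
# invisible to the average but obvious in the tail.
import numpy as np

rng = np.random.default_rng(1)

routine = rng.normal(loc=50.0, scale=10.0, size=999_900)    # ordinary transactions
anomalies = rng.normal(loc=5_000.0, scale=500.0, size=100)  # rare extreme events
combined = np.concatenate([routine, anomalies])

print(f"mean, routine only:   {routine.mean():10.2f}")
print(f"mean, with anomalies: {combined.mean():10.2f}")  # shifts by only ~1%
print(f"99.99th percentile:   {np.percentile(combined, 99.99):10.2f}")
# The mean barely registers the anomalies; the tail statistic
# jumps into the thousands and points straight at them.
```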
The above weak points of the law of large numbers are just a small part of the analytics "peculiarities" that can be expected in big data.
This post is the first in a series of essays on a proposed new concept of science in view of the IT industrial revolution.