Sunday, September 25, 2016

A perpetuum mobile of data – the essence of the IT revolution


The essence of the Information Technology revolution, the engine that propels it, is the reality in today's information systems of data bringing about more and more data in a closed, self-amplifying loop: the data invite applications, applications bring users, users attract new service ideas, new services create more operations and management data, and so forth.
Data is the raw material of the information industry. Understanding that makes one appreciate the huge opportunity of free raw material that this industry enjoys (remark: an organization's information infrastructure does cost, but it is regarded as a general investment or overhead).
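To make the loop above concrete, here is a minimal Python sketch of it. Every coefficient is a purely illustrative assumption, chosen only to show the compounding character of the cycle, not a measured value.

```python
# Toy model of the self-amplifying data loop: data invites applications,
# applications bring users, users spawn services, services generate data.
# All coefficients below are illustrative assumptions.
data = 100.0  # initial stock of data, in arbitrary units

for year in range(1, 6):
    applications = 0.01 * data                # data invites applications
    users = 50 * applications                 # applications bring users
    new_services = 0.1 * users                # users attract service ideas
    data += 2.0 * users + 5.0 * new_services  # operations create more data
    print(f"year {year}: data stock = {data:,.0f}")

# Each pass feeds the next, so the growth is compound, not linear.
```

Under these made-up numbers the stock of data more than doubles on every cycle; that compounding is the "perpetuum mobile" of the title.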

The problem with virtual data assets is that most of them are intangible, i.e. they have no specific registered value in the accounting books; hence managements may miss their existence. As long as the organization's competitors are asleep, the waste does not really hurt and usually goes unnoticed by management. But the minute somebody else in the field starts to use information for strategic advantage, the rules of the game change, forever.

Consider, for example, the story of the meteoric rise of Netflix to world leadership in movie delivery over the web. Prior to the founding of Netflix in 1997, the market was dominated by Blockbuster, which was not inclined to adopt advanced technologies, in contrast to Netflix, which was quick to employ new techniques and operational data for its "agile" business development. Blockbuster simply fell behind; it did not have much chance to close the widening gap. In such a case it does not help even if a company is as big, strong, reputable and internationally spread as Blockbuster was.

Edith Ohri
Home of GT data mining 
Sep.2016

Friday, September 23, 2016

Is Machine Learning chasing its own tail (of presumptions)?

Machine Learning (ML) as a method of learning is indeed a machine: it operates consistently, repeatedly and predictably, by a designed method made for specific conditions. But its "learning" part is more like "training" or "verification" than the acquisition of new knowledge that the name suggests. Practically speaking, ML is made to improve prescribed response formulas, not to invent such formulas, and (I know the statement might seem controversial) not even to correct them.
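A minimal Python sketch of the point, under illustrative assumptions of my own (the data and the model form are made up for the example): the response formula, a straight line, is prescribed in advance, and the "learning" only tunes its two parameters. However long it trains, the machine never questions, corrects or replaces the formula itself.

```python
import numpy as np

def train(x, y, lr=0.05, epochs=500):
    """Tune the prescribed formula y = w*x + b by gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        err = w * x + b - y             # prediction error of the fixed formula
        w -= lr * 2 * np.mean(err * x)  # adjust slope
        b -= lr * 2 * np.mean(err)      # adjust intercept
    return w, b

# Toy data generated by a quadratic process the model cannot express
x = np.linspace(0, 2, 50)
y = x ** 2

w, b = train(x, y)
print(f"best linear fit: w = {w:.2f}, b = {b:.2f}")
# The parameters converge, but the machine never "discovers" the square.
```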

Here is then my take on the issue:

Law #1 A dog (or a cat) chasing its tail for a long enough time will eventually catch it.

Law #2 The catching will hurt!

Law #3 Getting painful results will not stop the chase; it will stop only out of boredom or the exhaustion of all energy resources.



Thursday, September 22, 2016

The law of Large Numbers fails in big data

The law of Large Numbers is often regarded as a sort of "law of nature" by which variables' averages always gravitate to fixed, clear values.
The question is, does the law of large numbers hold true in the case of big data?
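For reference, the weak form of the law can be stated precisely. If $X_1, X_2, \ldots$ are independent, identically distributed samples with finite mean $\mu$, then the sample average converges in probability to that mean:

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \;\xrightarrow{P}\; \mu \quad \text{as } n \to \infty.$$

Independence, identical distribution and a finite mean are each assumptions, and each can fail in big data.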
The key to the answer lies in the law's underlying assumptions regarding sample representativeness and data stability, i.e. that the observations are independent draws from one stable source. One of the qualities that signify big data is volatility. Volatility thrives in the large, multi-variate, closely packed, interrelated events that usually make up big data, and it is the dynamics that follow which interfere with the convergence of averages and prevent it from happening; a small simulation below illustrates the effect.
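The two data streams in this Python sketch are illustrative assumptions of my own: a running average of stable i.i.d. draws settles down, while a running average of a stream whose mean drifts (a stand-in for big-data volatility) keeps wandering and never converges.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Stable source: i.i.d. draws around a fixed mean of 5.0
stable = rng.normal(loc=5.0, scale=2.0, size=n)

# Volatile source: the mean itself drifts as a random walk, so there
# is no single value for the running average to converge to
drift = np.cumsum(rng.normal(scale=0.05, size=n))
volatile = rng.normal(loc=5.0 + drift, scale=2.0)

steps = np.arange(1, n + 1)
for name, stream in (("stable", stable), ("volatile", volatile)):
    running_mean = np.cumsum(stream) / steps
    print(name, "running mean at 10k / 50k / 100k:",
          [round(running_mean[i], 3) for i in (9_999, 49_999, n - 1)])

# The stable stream hovers near 5.0; the volatile one wanders off.
```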
In my view, even if the law of large numbers were true for big data, it would not be of much use, due to its focus on common "average" behavior that is already known, rather than on the irregularities and exceptions that are yet unknown and require research, such as in the study of early-detection indicators, adverse effects, fraud detection, quality assurance, customer retention, accidents, and long-tail marketing, to mention a few. Long tails, for example, consist of overlooked hidden phenomena, so their discovery has to look, by definition, elsewhere than the already considered law of large numbers.

The above weak points of the law of large numbers are just a small part of the analytic "peculiarities" that can be expected in big data.
This post is the first in a series of essays on a proposed new concept of science in view of the IT industrial revolution.