Here is a list of challenges from my personal encounters with the subject:
- How to make use of unsupervised data?
- Untangling mixed phenomena
- The need for on-time (unexpected) decisions
- Identifying "black swans"
- Deploying legacy data (similar to the first challenge, making use of unsupervised data)
- Devising methods to cope with the exponential growth of data
- Using old tools in a new environment
- Is there any size that is too big to handle?
- Statistics in a dynamic reality
- What would be considered a right hypothesis?
(or is there such a thing as a wrong question to ask?)
~~~~~~~~~~~
A Buddhist story about blind men trying to describe an elephant:
Five blind people were asked to describe an elephant. Each felt
a part of the elephant. One person felt the elephant's trunk and said
it was just like a plow pole. A second person touched the elephant's foot
and said it was just like a post. A third person felt the elephant's tusk and
said it was just like a plowshare. A fourth person had hold of the elephant's
tail and said it was just like a broom. A fifth person felt the elephant's
ear and said it was like a winnowing basket. As each one described the elephant, the
others disagreed...
Hi, it seems that I missed an important challenge in the above list: OVER-FITTING.
That problem is avoided by GT's self-verification process; an over-fitting result would fail its Consistency Tests.
I'll discuss it in the next post - "How to not let the data ruin your (beautiful) theory".
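To make the over-fitting point concrete, here is a minimal sketch of the general idea, not of GT's Consistency Tests (whose details are not given here): a model that merely memorizes its data looks excellent on the points it was fitted to, but fails a check against points it never saw. The polynomial degrees, noise level, and even/odd split below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple underlying trend.
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Split into a fitting set and a held-out set (even vs. odd indices).
train, held_out = np.arange(0, 40, 2), np.arange(1, 40, 2)

for degree in (1, 3, 15):
    # Fit a polynomial of the given degree to the training points only.
    coeffs = np.polyfit(x[train], y[train], degree)
    fit_err = np.mean((np.polyval(coeffs, x[train]) - y[train]) ** 2)
    check_err = np.mean((np.polyval(coeffs, x[held_out]) - y[held_out]) ** 2)
    print(f"degree {degree:2d}: fit error {fit_err:.3f}, held-out error {check_err:.3f}")
```

The high-degree fit typically shows the tell-tale pattern: a very small error on the points it was fitted to, and a noticeably larger error on the held-out points - the kind of inconsistency a self-verification step is meant to catch.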