Thursday, June 13, 2013

Some thoughts on big data challenges


Here is a list of challenges from my personal encounters with the subject:

  1. How to make use of unsupervised data? 
  2. Untangling mixed phenomena
  3. The need for on-time (unexpected) decisions
  4. Identifying "black swans"
  5. Deploying legacy data (similar to #1, making use of unsupervised data)
  6. Devising a method to handle exponential growth of data
  7. Using old tools in a new environment
  8. Is there any size that is too big to handle?
  9. Statistics in a dynamic reality
  10. What would be considered a right hypothesis?
    (or is there such a thing as a wrong question to ask?)

~~~~~~~~~~~
A Buddhist story about blind men trying to describe an elephant:


Five blind people were asked to describe an elephant. Each felt a part of the elephant. One person felt the elephant's trunk and said it is just like a plow pole. A second person touched the elephant's foot and said it is just like a post. A third person felt the elephant's tusk and said it is just like a plowshare. A fourth person took hold of the elephant's tail and said it is just like a broom. A fifth person felt the elephant's ear and said it is just like a winnowing basket. As each one described the elephant, the others disagreed...



1 comment:

  1. Hi, it seems I left an important challenge off the list above: OVER-FITTING.
    That problem is avoided by the GT's self-verification process; an over-fitted result would fail its Consistency Tests.
    I'll discuss it in the next post - "How to not let the data ruin your (beautiful) theory"
