Tuesday, September 15, 2015

Digging into financial data

Drawing in-depth conclusions from whatever data is available

Lessons from a GT study* of 1,000 NYSE companies from the year 2000, just before the dot-com bubble crash.

---------------
* https://docs.google.com/file/d/0B1tc2-duf3_4YzM2M2M2OWMtZjAwNS00Y2FlLWJhOWUtOTc3ZjM3NTY1YzVm/edit?usp=sharing


Conclusion 1
A pattern of behavior can be as small as a fraction of 1% of the total number of events.
In this study, GT found a tiny "exception" subgroup of only 4 companies out of 1,000: four banks sharing one exceptional feature, a very high net profit, roughly twice that of their peers in the financial sector. An explanation for their unusually high performance was offered 8 years(!) later, during the 2008 credit/derivatives crisis, when the four banks' names appeared in news headlines.
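
To make this concrete, here is a minimal sketch of how such a tiny subgroup could be flagged. This is not GT itself; the file name, the column names, and the "twice the sector median" threshold are assumptions for illustration only.

    import pandas as pd

    # Hypothetical input: one row per company, with its sector and
    # net profit margin. File and column names are assumed.
    df = pd.read_csv("nyse_2000.csv")

    # Each sector's median margin serves as the "normal" baseline.
    baseline = df.groupby("sector")["net_profit_margin"].transform("median")

    # Flag companies earning at least twice their sector's typical margin.
    exceptions = df[df["net_profit_margin"] >= 2 * baseline]
    print(exceptions[["company", "sector", "net_profit_margin"]])

    # A hit rate of 4 rows out of 1,000 (0.4%) can still be a real,
    # interpretable pattern rather than noise.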

Conclusion 2
Large data sets require a general view on top of the detailed one. Here the general view fit the standard industry classification almost exactly. There is only one difference, yet a most significant one: some giant corporations turn out to behave like financial institutions rather than like the rest of their own industries. This observation strengthens our understanding of the 2008 crisis.
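
A rough sketch of such a two-level view, reusing the same hypothetical data file: cluster the companies by a few financial ratios (the general view), then cross-tabulate the clusters against the industry labels (the detailed view). The feature names and the choice of k-means with 8 clusters are assumptions, not GT's actual method.

    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv("nyse_2000.csv")
    features = ["net_profit_margin", "leverage", "asset_turnover"]  # assumed

    # Standardize the ratios, filling gaps with column medians.
    X = StandardScaler().fit_transform(df[features].fillna(df[features].median()))
    df["cluster"] = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)

    # Most clusters should line up with a single industry; the interesting
    # rows are giants landing in the "financials" cluster despite carrying
    # a non-financial industry label.
    print(pd.crosstab(df["cluster"], df["industry"]))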

Conclusion 3
Big data is about using AVAILABLE unsupervised data, without the cleaning step that is commonly recommended.
This study is based on free data from http://www.ics.uci.edu. By usual research standards the data quality seems insufficient: there is no historical "depth", no share-price information, and the sample is not representative of the subgroups. Yet GT produced quite good results. Data are useful even when partial and unsupervised!
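
As a small sketch of working with such data as-is (file and column names again hypothetical): pandas aggregations skip missing values by default, so partial columns still yield usable group statistics without a separate cleaning pass.

    import pandas as pd

    df = pd.read_csv("raw_financials.csv")  # raw data, gaps and all

    # NaNs are skipped automatically, so per-sector statistics come
    # straight out of the partial data.
    print(df.groupby("sector")["net_profit_margin"].agg(["count", "median"]))

    # How partial is each column, really? A quick coverage report.
    print(df.isna().mean().sort_values(ascending=False))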
