Sunday, September 27, 2015

GT data mining demonstration - finances

Prediction of the daily US $ up/down change



Edith Ohri, edith@fabhighq.com




The goal


The goal of this demo is to predict, with 55% accuracy, the next day's $ direction (whether it is going to be UP or DOWN).

Attaining this requires establishing objective rules and formulas that are independent of the specific input.

  

How it works


a -  Deciding on input (from already existing available sources)
b -  Defining with GT the patterns of behavior (= groups) and cause-effect formulas.
c -  Together, the above constitute an expert system that is then used for early alerts and real-time decisions.
d -  The expert system can improve itself and periodically review its rules.

Note: GT's formulas can be integrated in the control of almost any product.


The data set

The data include 760 daily records over a period of two and a half years, and 7 variables, among them Date, Open price, Close price, High, Low, and an index named RSI (Relative Strength Index). RSI compares the magnitude of recent gains to recent losses in an attempt to determine overbought and oversold conditions: a value above 70 indicates that a stock is overbought, a value below 30 that it is oversold, and in either case it is vulnerable to a trend reversal.
Rem.: Trade Volume information could not be obtained for this demo.
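
For readers who want to see how an index of this kind is calculated, here is a minimal sketch in Python. It is not the formula used in the demo (the source data came with RSI already included); it assumes a 14-day window and plain averages of gains and losses, which is one common variant. The function names `rsi` and `rsi_signal` are made up for this illustration.

```python
def rsi(closes, period=14):
    """Simple-average RSI over the last `period` daily changes.

    Assumption: a 14-day window and unsmoothed sums of gains and losses.
    Returns a list aligned with `closes`; days without enough history are None.
    """
    out = [None] * len(closes)
    for i in range(period, len(closes)):
        gains = 0.0
        losses = 0.0
        for j in range(i - period + 1, i + 1):
            change = closes[j] - closes[j - 1]
            if change > 0:
                gains += change
            else:
                losses -= change  # store losses as a positive amount
        if gains == 0 and losses == 0:
            out[i] = 50.0   # flat window: treated as neutral (a convention)
        elif losses == 0:
            out[i] = 100.0  # only gains in the window
        else:
            rs = gains / losses
            out[i] = 100.0 - 100.0 / (1.0 + rs)
    return out


def rsi_signal(value, upper=70.0, lower=30.0):
    """Map an RSI value to the 70/30 reading described in the text."""
    if value is None:
        return "n/a"
    if value > upper:
        return "overbought"
    if value < lower:
        return "oversold"
    return "neutral"
```

A steadily rising price series gives an RSI of 100 (overbought), a steadily falling one gives 0 (oversold), and a series that alternates equal gains and losses sits at 50.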

On top of the 7 basic variables, another 30 or more calculated variables were added, such as Trends, Weekdays, etc.

The Test set includes 122 records from the end of the period.
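
The preparation described here can be sketched roughly as follows. This is only an illustration under assumptions: the field names (`date`, `close`, `weekday`, `change`, `direction`, `trend`), the 5-day trend window, and the synthetic prices are all invented for the example, and the demo's 30 or more calculated variables are not reproduced. What it does show is the idea of adding calculated variables and of holding out the last 122 records, in date order, as the Test set.

```python
from datetime import date, timedelta

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def add_calculated_variables(records, trend_days=5):
    """Return copies of the records with a few calculated variables added."""
    enriched = []
    for i, rec in enumerate(records):
        row = dict(rec)
        row["weekday"] = WEEKDAYS[rec["date"].weekday()]
        # Day-over-day change in the Close price (None for the first record).
        row["change"] = rec["close"] - records[i - 1]["close"] if i >= 1 else None
        # Zero change is counted as DOWN here; this is an arbitrary convention.
        if row["change"] is None:
            row["direction"] = None
        else:
            row["direction"] = "UP" if row["change"] > 0 else "DOWN"
        # A short trend: Close today minus Close `trend_days` records ago.
        if i >= trend_days:
            row["trend"] = rec["close"] - records[i - trend_days]["close"]
        else:
            row["trend"] = None
        enriched.append(row)
    return enriched


def split_by_time(records, test_size=122):
    """Keep the last `test_size` records (in date order) as the Test set."""
    ordered = sorted(records, key=lambda r: r["date"])
    return ordered[:-test_size], ordered[-test_size:]


# Synthetic stand-in for the 760 daily records (dates and prices are made up).
start = date(2013, 1, 1)
records = [{"date": start + timedelta(days=i), "close": 1.0 + 0.001 * (i % 7)}
           for i in range(760)]
learning, test = split_by_time(add_calculated_variables(records))
print(len(learning), len(test))  # 638 122
```

Splitting by time rather than at random keeps every Test record later than every Learning record, so the rules are always checked on days they have not seen.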


Figure 1  Input records


The GT Learning results

The first step is to establish a lower hurdle, which is "the best results that one can achieve without the GT algorithm".
Here the lower hurdle was 55.7% right predictions in the Test set, and 56.6% in the Learning set.

Rem.: these good results are credited to the discovery of typical Close price changes for each weekday.
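
To make the idea of a weekday-based hurdle concrete, here is a small sketch. It is not the rule set actually used in the demo; it simply assumes that each weekday is assigned the direction it showed most often in the Learning set, and then counts right predictions. The function names and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


def fit_weekday_rule(learning):
    """Learn, for each weekday, the direction seen most often in the Learning set.

    `learning` is a list of (weekday, direction) pairs, direction being "UP" or "DOWN".
    """
    counts = defaultdict(Counter)
    for weekday, direction in learning:
        counts[weekday][direction] += 1
    return {weekday: c.most_common(1)[0][0] for weekday, c in counts.items()}


def accuracy(rule, records, default="UP"):
    """Share of records whose direction matches the rule for their weekday."""
    hits = sum(1 for weekday, direction in records
               if rule.get(weekday, default) == direction)
    return hits / len(records)


# Toy data, invented for illustration only.
learning = [("Mon", "UP"), ("Mon", "UP"), ("Mon", "DOWN"),
            ("Fri", "DOWN"), ("Fri", "DOWN"), ("Fri", "UP")]
test = [("Mon", "UP"), ("Fri", "UP")]

rule = fit_weekday_rule(learning)
print(rule)                  # {'Mon': 'UP', 'Fri': 'DOWN'}
print(accuracy(rule, test))  # 0.5
```

Measuring the same rule on both the Learning set and the Test set, as done above with 56.6% and 55.7%, shows whether the hurdle holds up on unseen days.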

Reaching beyond the assigned target

The assigned target of 55% prediction success was achieved, but it can be further improved with the GT Patterns-of-Behavior definition.

It is a well-known law (and quite an intuitive one) that greater precision can always be attained by adjusting the prediction factors to the subgroups of a given dataset. Following is a short demonstration of this law, employing the special abilities of the GT algorithm.
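
The law itself can be illustrated with a few lines of Python. This sketch does not use GT and does not show how GT finds its groups; the group labels "A" and "B" and the counts are invented. It only compares one prediction for the whole dataset against a separate prediction per subgroup, measured on the same records. On data that was not used to set the rules, the gain is not guaranteed.

```python
from collections import Counter, defaultdict


def global_majority_accuracy(rows):
    """Accuracy of predicting the single most common direction for every row."""
    counts = Counter(direction for _, direction in rows)
    return counts.most_common(1)[0][1] / len(rows)


def grouped_majority_accuracy(rows):
    """Accuracy of predicting, within each group, that group's most common direction."""
    per_group = defaultdict(Counter)
    for group, direction in rows:
        per_group[group][direction] += 1
    hits = sum(c.most_common(1)[0][1] for c in per_group.values())
    return hits / len(rows)


# Invented example: group A mostly goes UP, group B mostly goes DOWN.
rows = ([("A", "UP")] * 8 + [("A", "DOWN")] * 2 +
        [("B", "UP")] * 3 + [("B", "DOWN")] * 7)

print(global_majority_accuracy(rows))   # 0.55
print(grouped_majority_accuracy(rows))  # 0.75
```

Each group's own majority is at least as large as the share of the global prediction inside that group, so on the records used to set the rules the grouped figure can never be the lower one.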


GT Results* 

(* Initial results, for this demonstration)

Count of right/wrong predictions:
Right -   59%
Wrong - 41%


    • An improvement of about 3 percentage points in right predictions, over the lower hurdle, was achieved at just the beginning of the GT process.
    • With a full data mining process and input that includes detailed transactions, further significant improvement can be expected.

Improvement tips

      
1.    Include non-linear variables if there are any, for example "RSI", a non-linear Relative Strength Index that describes the pressure on prices due to excess Demand or Supply.

2.    Split the data into hierarchical patterns of behavior.

3.    Avoid "overfitting" by moving on to new subsets of data once the information in the current ones has been exhausted.
  

 Conclusion of example demo

   
GT proves effective in predicting the daily USD trend.
Finding the patterns (clusters) enables a separate prediction for each segment, and therefore greater precision.



GT's success lies in its Industrial & Management Engineering roots

  
a.      Its first application was on the job, where the assignment was very practical: to improve the line work-flow, not to invent a theoretical model.

b.      Industrial Engineers are almost never experts in the area of application, therefore the model needed to be strengthened with scientific internal validations.

c.      As is often done in IE, the development was carried out without investors. That fact enabled a very long incubation period and the accumulation of important personal experience.

d.      The IE practical approach led to focusing on "discovery of hidden patterns", instead of the more academic approach that prioritizes correlations and the speed of execution.

e.      Full-cycle product costs of implementation are considered; no hard-sell wizardry.

f.       Real work forced starting the algorithm ahead of time, which turned out to help greatly in avoiding conventional misconceptions...

g.      Product development means primarily substantiating the work method, not gaining market share.

h.      From an IE perspective it is only natural to offer a SaaS option.

i.        IE should always adhere to the actual implementation on top of business musts.

j.        High-tech or not, "we do business the old way, we earn it".




~~~~

Edith Ohri, Home of GT data mining
