Why GT, and why data mining at all?
My quest for a mining algorithm started a long while ago. I sort of grew up with the field. I was intrigued to know how natural formations of data (clusters) occur. Are there any principles behind them? And how can one make use of them?
In this blog I'll try to write about these and other subjects that make data mining tick.
Thanks to Avishai Schur from FabHighQ for encouraging me to open this blog. See also the presentation list and posts in Hebrew at http://gtdatamning-heb.blogspot.co.il/
Your comments are appreciated.
Edith
What is GT and what does it stand for?
GT is a solution for creating new hypotheses by identifying patterns of behavior. What makes it special is its hierarchical clustering and analysis of unsupervised data, that is, data that has not been classified in advance.
Origins
The name GT stands for Group Technology, an old Industrial Engineering method that aims to increase the efficiency of production and material handling by grouping items according to their similarity. In today's work environment its function extends from the original shop-floor management to the management of "any type of database entities". In this sense, GT can be regarded as an abstract, generalized model of the old Group Technology.
Group Technology consists of several methods that were developed over the years, starting in World War II, when the Russians needed to relocate their factories to the East, where they would be safe from the advancing German army. Their idea was to keep the different product lines in a simple order that would be quick to reconstruct. The order they defined resembled the Western "production line" approach, with one difference: instead of work orders for identical items, the Russians allowed Groups of mixed items that shared the same processing route.
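To make the idea of a Group concrete, here is a minimal Python sketch that places mixed items in the same Group whenever they share an identical processing route. It only illustrates the grouping principle and is not the GT data mining algorithm itself; the function name `group_by_route`, the item names, and the station names are all invented for the example.

```python
from collections import defaultdict


def group_by_route(routes):
    """Map each distinct processing route to the list of items that follow it.

    `routes` is a dict of item name -> ordered list of processing stations.
    Items land in the same Group only if their routes are identical.
    """
    groups = defaultdict(list)
    for item, route in routes.items():
        # A tuple is hashable, so the whole route can serve as the group key.
        groups[tuple(route)].append(item)
    return dict(groups)


# Hypothetical items and routes, made up for illustration only.
routes = {
    "bracket": ["cut", "drill", "paint"],
    "hinge": ["cut", "drill", "paint"],
    "shaft": ["turn", "grind"],
    "pin": ["turn", "grind"],
    "panel": ["cut", "paint"],
}

groups = group_by_route(routes)
for route, items in groups.items():
    print(" -> ".join(route), ":", items)
```

Different items such as the bracket and the hinge end up in one Group because they pass through the same stations in the same order, which is the contrast with a production line dedicated to identical items.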
Evolution
The Group approach gained more and more appeal in the West due to (to the best of my knowledge) two emerging technologies that later swept the manufacturing world:
(a) Operations Research, with its efficiency optimization. One should mention a prominent professor at Cranfield University, England, Sir John Burbidge, who was knighted by the Queen for his activity in this field.
(b) Flexible Manufacturing, developed in Japan as part of the CNC and cell-production concept.
Both technologies, Operations Research and Flexible Manufacturing, had to deal with increasingly diversified products and activities, for which the flexibility embedded in multi-functional Groups had a tremendous advantage over the rigid idea of dedicated mass-production lines.
Then, in the 1980s, a third leap brought the GT idea forward as a desirable solution: the IT revolution. IT introduced 'information' as an item in its own right (not just an adjunct to 'real' items), and in doing so it opened the door to many new products and to changes in the organization and the whole commercial scene. As IT redefined almost everything, it also needed to rebalance and regain efficiency, and the GT ability to organize work in Groups or Clusters according to processing sequence proved more valid than ever. This need to reorganize production was the initial aim of my GT data mining algorithm.
All the above-mentioned upheavals were, as it appears, just an introduction to what has come to be known as Big Data. Big Data poses new challenges to data mining analysts, above all two features that have now become critical: AI and automation.
But how can AI rules be generated automatically? Can we replace the expert in creating insights, observations, and new hypotheses? For testing hypotheses we have well-developed methods, but for creating the hypotheses to be tested, nothing! This statement deserves a whole discussion of its own. For a start, the basic solution of GT data mining is about making new rules and validating them methodically.