In my decades of custom programming and consultation, I have explored diverse applications, including automated analysis of high-altitude photographs, automated medical diagnosis, realtime detection of threatening military vehicles, and automated trading of financial markets. A common thread in all of these applications is that I was faced with a multitude of observed or computed variables, and my task involved finding and analyzing relationships among these variables. As a result, I have accumulated a wealth of algorithms for doing so.
This book presents theoretical and intuitive justifications, along with highly commented source code, for my favorite data-mining techniques. This book makes no pretense of being 'complete' in any manner whatsoever. Please do not be annoyed if your own favorite techniques did not make my cut, or if the book ignores some popular standard techniques. These are simply the algorithms that I have found most useful in my own work over the years.
Some of them are venerable old techniques such as the use of maximum-likelihood factor analysis for determining the degree to which variables contain unique information, versus being redundant due to hidden common factors impacting several variables. Some of them are powerful modern techniques, such as Combinatorially Symmetric Cross Validation for determining if a model is hampered by overfitting, or Feature Weighting as Regularized Energy-Based Learning for ranking variables in predictive power when there are too few training cases to employ traditional methods. Some of them are (I believe) my own invention, such as a method for clustering variables in the restricted context of a subspace of interest, and visual display of anomalous regions in which joint and marginal densities conflict, or in which contribution to mutual information is concentrated. But all of them share a great quality: I have found them to be exceptionally useful in my own data-mining endeavors.
I suspect that you will as well.
Find the various relationships among variables that can be present in big data as well as other data sets. This book also covers information entropy, permutation tests, combinatorics, predictor selection, and eigenvalues to give you a well-rounded view of data mining and algorithms in C++.
Furthermore, Data Mining Algorithms in C++ includes classic techniques that are widely available in standard statistical packages, such as maximum likelihood factor analysis and varimax rotation. After reading and using this book, you'll come away with many code samples and routines that can be repurposed into your own data mining tools and algorithms toolbox. This will allow you to integrate these techniques in your various data and analysis projects.
Timothy Masters has a PhD in statistics and is an experienced programmer. His dissertation was in image analysis. His career moved in the direction of signal processing, and for the last 25 years he's been involved in the development of automated trading systems in various financial markets.