Implementation
Based on the book "Classification and Regression Trees" by Breiman et al.
Parameters
- Related to Tree Pruning (Post-Build)
- use_1se_rule, truncate_pruned_tree
- Is cv_folds used for both tree building and pruning with respect to average Gini error?
- Supports surrogate splits to handle missing data: it finds 'backup' attributes in the feature vector that would split the node with similar 'purity'.
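The surrogate-split idea can be sketched in a few lines. This is only a toy illustration, not the library's implementation: the rows, the attribute names (odor, spore, cap) and the helpers goes_left, agreement, best_surrogate and route are all made up here. It assumes the 'backup' attribute is the one whose split sends the known samples the same way as the primary split most often.

```python
# Toy rows: attribute name -> categorical value, None marks a missing value.
# All names and values are hypothetical, chosen only for illustration.
rows = [
    {"odor": "foul", "spore": "white", "cap": "flat"},
    {"odor": "foul", "spore": "white", "cap": "bell"},
    {"odor": "none", "spore": "brown", "cap": "flat"},
    {"odor": "none", "spore": "brown", "cap": "bell"},
    {"odor": "none", "spore": "white", "cap": "flat"},
]

def goes_left(row, attr, left_values):
    """Return True/False for the split direction, or None if the value is missing."""
    value = row.get(attr)
    if value is None:
        return None
    return value in left_values

def agreement(rows, primary, candidate):
    """Fraction of rows (both values known) that both splits send the same way."""
    same = total = 0
    for row in rows:
        p = goes_left(row, *primary)
        c = goes_left(row, *candidate)
        if p is None or c is None:
            continue
        total += 1
        same += (p == c)
    return same / total if total else 0.0

def best_surrogate(rows, primary, candidates):
    """Pick the candidate split that mimics the primary split most closely."""
    return max(candidates, key=lambda cand: agreement(rows, primary, cand))

def route(row, primary, surrogate):
    """Use the primary split; fall back to the surrogate if the value is missing."""
    direction = goes_left(row, *primary)
    if direction is None:
        direction = goes_left(row, *surrogate)
    return direction

# A split is (attribute, set of values that go left).
primary = ("odor", {"foul"})
candidates = [("spore", {"white"}), ("cap", {"flat"})]
surrogate = best_surrogate(rows, primary, candidates)

# A sample with unknown odor is routed by the surrogate attribute instead.
unknown_odor = {"odor": None, "spore": "brown", "cap": "flat"}
print(surrogate[0], route(unknown_odor, primary, surrogate))
```

In this toy data the spore split agrees with the odor split on 4 of 5 rows and the cap split on only 2 of 5, so spore becomes the backup attribute.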
Sample (mushroom)
- Classifies mushrooms as poisonous or edible based on 20 discrete attributes.
- Demonstrates decision tree traversal in an interactive prediction phase.
- Displays a table of attribute importance after the tree is built.
- Plenty of data sets are available from the UC-Irvine ML Data Repository (see Resources).
- The sample could easily be modified to tackle other classification data sets.
- Setting 'penalty-weight' to 1 gives 8 percent false negatives. This drops quickly to 0 when the weight is set to 2.
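A rough way to see why a larger penalty weight removes false negatives: assume the weight acts as a per-class multiplier on the poisonous class when a leaf picks its label. The sketch below is a simplification under that assumption. The leaf counts are invented (they are not the mushroom data), the helper names are hypothetical, and a real tree would also apply the weights while choosing splits.

```python
def weighted_leaf_label(counts, weights):
    """Label a leaf with the class that has the largest weighted sample count."""
    return max(counts, key=lambda label: counts[label] * weights.get(label, 1.0))

def false_negative_rate(leaves, weights, positive="poisonous"):
    """Share of positive samples that sit in leaves labelled as another class."""
    missed = total = 0
    for counts in leaves:
        label = weighted_leaf_label(counts, weights)
        total += counts.get(positive, 0)
        if label != positive:
            missed += counts.get(positive, 0)
    return missed / total if total else 0.0

# Invented per-leaf class counts for a small, already-built tree.
leaves = [
    {"poisonous": 45, "edible": 1},
    {"poisonous": 5, "edible": 8},   # mixed leaf: the weight decides its label
    {"poisonous": 0, "edible": 41},
]

for penalty_weight in (1.0, 2.0):
    weights = {"edible": 1.0, "poisonous": penalty_weight}
    print(penalty_weight, false_negative_rate(leaves, weights))
```

With weight 1 the mixed leaf is labelled edible and its 5 poisonous samples are missed. With weight 2 the same leaf flips to poisonous, at the cost of misclassifying its 8 edible samples.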
Resources
- http://www.statsoft.com/textbook/classification-and-regression-trees/
- http://www.ics.uci.edu/~mlearn/MLRepository.html (Machine Learning Data Set Repository at UC-Irvine)
Readings
- Learning OpenCV, Gary Bradski & Adrian Kaehler (O'Reilly Press)
- Introduction to Machine Learning, 2nd Edition, Ethem Alpaydin (MIT Press)