Decision Trees

Sources
1. ICML '96
  1. Non-Linear Decision Trees -- NDT
    1. Andreas Ittner
    2. Michael Schlosser
  2. Experiments with a New Boosting Algorithm
    1. Yoav Freund
    2. Robert E. Schapire
Induction
1. advantages
  1. relatively inexpensive
  2. thorough
2. "impurity"
  1. essentially randomness: how mixed the classes are at a node
  2. value between 0 and 1 (entropy in bits can exceed 1 with more than two classes)
  3. lower is better
  4. evaluation function
    1. examples
      1. entropy
      2. Gini index
3. algorithms
  1. ID3
    1. top-down induction
  2. C4.5
    1. good performance
  3. AdaBoost (boosting; combines many weak learners such as small trees)
4. test selection
  1. information theory
    1. evaluate information gain
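The impurity measures, information gain, and ID3-style top-down induction above can be sketched in a few lines of Python. This is a minimal sketch, assuming discrete attribute values stored in dicts; the function names and the toy weather data are hypothetical, and C4.5 adds handling for continuous attributes, missing values, and pruning that this sketch omits.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits; 0 for a pure node, 1 for a 50/50 two-class node."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index: chance that two labels drawn with replacement differ."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, attr, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the children
    produced by splitting on the discrete attribute `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - remainder


def id3(rows, labels, attrs):
    """Top-down induction: choose the attribute with the highest gain,
    partition the examples on its values, and recurse on each subset.
    Leaves are class labels; internal nodes are (attr, {value: subtree})."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority class
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in dict.fromkeys(row[best] for row in rows):  # first-seen order
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in idx],
                              [labels[i] for i in idx],
                              [a for a in attrs if a != best])
    return (best, branches)


# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
print(information_gain(rows, labels, "outlook"))  # 1.0
print(information_gain(rows, labels, "windy"))    # 0.0
print(id3(rows, labels, ["outlook", "windy"]))
# ('outlook', {'sunny': 'play', 'rain': 'stay'})
```

Passing `impurity=gini` to `information_gain` swaps the evaluation function without changing the induction loop.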
accuracy
1. error rate
  1. true (on new, unseen cases)
  2. apparent (on the training data)
2. overfitting
  1. smaller is better
  2. pruning
3. tree size/complexity
  1. larger
    1. lower apparent error rate
    2. higher true error rate
      1. overfitting
      2. good on training data
      3. can reach 100% training accuracy for decision trees (unless identical examples carry different labels)
      4. poor on novel test cases
  2. smaller
    1. higher apparent error rate
    2. lower true error rate
    3. more "generality"
      1. fewer branches
      2. fewer conjunctions
      3. fewer attribute comparisons
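The size/accuracy trade-off above can be reproduced on a toy problem with one numeric attribute. This is a minimal sketch, assuming binary threshold tests chosen by weighted entropy; `grow`, `predict`, `error_rate`, and the data (true concept "pos" iff x >= 5, with one mislabelled training example at x = 2) are hypothetical. Error on the training points is the apparent error rate; error on the held-out points stands in as an estimate of the true error rate.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def grow(points, max_depth=None, depth=0):
    """Grow a tree on (x, label) pairs using binary tests `x <= t`.
    max_depth=None grows until every leaf is pure (a fully grown tree)."""
    labels = [y for _, y in points]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or (max_depth is not None and depth >= max_depth):
        return majority
    xs = sorted({x for x, _ in points})
    best = None
    for a, b in zip(xs, xs[1:]):  # candidate thresholds: midpoints
        t = (a + b) / 2
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        score = len(left) * entropy(left) + len(right) * entropy(right)
        if best is None or score < best[0]:
            best = (score, t)
    if best is None:  # identical x values with conflicting labels
        return majority
    t = best[1]
    return (t,
            grow([p for p in points if p[0] <= t], max_depth, depth + 1),
            grow([p for p in points if p[0] > t], max_depth, depth + 1))


def predict(tree, x):
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree


def error_rate(tree, points):
    return sum(predict(tree, x) != y for x, y in points) / len(points)


# Hypothetical data: true concept is "pos" iff x >= 5; x = 2 is mislabelled.
train = [(x, "pos" if x >= 5 or x == 2 else "neg") for x in range(10)]
test = [(1.8, "neg"), (2.2, "neg"), (2.7, "neg"), (6.0, "pos"), (7.5, "pos")]

for name, tree in [("stump", grow(train, max_depth=1)), ("full", grow(train))]:
    print(name, error_rate(tree, train), error_rate(tree, test))
# stump 0.1 0.0
# full 0.0 0.4
```

The fully grown tree reaches 0% apparent error by giving the noisy example at x = 2 its own leaf, and that leaf misclassifies two of the five held-out cases; the depth-1 stump accepts one training error and gets every held-out case right.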
pruning
1. alternatives
  1. lookahead
2. effectiveness
  1. well supported empirically
3. improvements
  1. reduced-error pruning
  2. weakest-link (cost-complexity) pruning
  3. train and test
  4. resampling
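Reduced-error pruning, the first improvement listed, can be sketched as a bottom-up pass that uses a validation set held out from training. This is a minimal sketch, assuming binary threshold tests on one numeric attribute and that every internal node records the majority class of the training examples that reached it; `Node`, `reduced_error_prune`, and the example tree and data are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Node:
    threshold: float  # examples with x <= threshold go left
    left: "Tree"
    right: "Tree"
    majority: str     # majority class of the training examples at this node


Tree = Union[str, Node]  # a leaf is just a class label


def predict(tree: Tree, x: float) -> str:
    while isinstance(tree, Node):
        tree = tree.left if x <= tree.threshold else tree.right
    return tree


def errors(tree: Tree, points) -> int:
    return sum(predict(tree, x) != y for x, y in points)


def reduced_error_prune(tree: Tree, validation) -> Tree:
    """Bottom-up: prune the children first, then replace this node with a
    leaf for its majority class unless that increases the errors on the
    validation examples reaching the node. A node reached by no validation
    examples is therefore pruned (0 errors either way)."""
    if not isinstance(tree, Node):
        return tree
    go_left = [(x, y) for x, y in validation if x <= tree.threshold]
    go_right = [(x, y) for x, y in validation if x > tree.threshold]
    tree = Node(tree.threshold,
                reduced_error_prune(tree.left, go_left),
                reduced_error_prune(tree.right, go_right),
                tree.majority)
    if errors(tree.majority, validation) <= errors(tree, validation):
        return tree.majority
    return tree


# Hypothetical overgrown tree whose (1.5, 2.5] leaf fits one noisy example.
full = Node(4.5, Node(1.5, "neg", Node(2.5, "pos", "neg", "neg"), "neg"), "pos", "pos")
validation = [(1.8, "neg"), (2.2, "neg"), (2.7, "neg"), (6.0, "pos"), (7.5, "pos")]
print(reduced_error_prune(full, validation))
# Node(threshold=4.5, left='neg', right='pos', majority='pos')
```

Using `<=` rather than `<` when comparing errors prefers the smaller tree on ties, in line with "smaller is better" above.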
problems
1. data issues
  1. bad data
  2. some attribute data missing
  3. continuous attributes
  4. large data sets
  5. solutions
    1. new learning algorithms
    2. techniques
      1. bagging
      2. boosting
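Bagging, one of the techniques listed above, trains several trees on bootstrap resamples of the data and lets them vote. This is a minimal sketch, assuming one numeric attribute and depth-1 trees (stumps) as the base learner to keep it short; `fit_stump`, `bagging`, the 25-model ensemble, and the data are hypothetical. In practice bagging is usually applied to deeper, less stable trees, since voting helps most with unstable learners.

```python
import random
from collections import Counter


def fit_stump(points):
    """Depth-1 tree on one numeric attribute: pick the midpoint threshold
    with the fewest training errors, predicting each side's majority."""
    majority = Counter(y for _, y in points).most_common(1)[0][0]
    best = (sum(y != majority for _, y in points), None, majority, majority)
    xs = sorted({x for x, _ in points})
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2
        left = Counter(y for x, y in points if x <= t).most_common(1)[0][0]
        right = Counter(y for x, y in points if x > t).most_common(1)[0][0]
        err = sum(y != (left if x <= t else right) for x, y in points)
        if err < best[0]:
            best = (err, t, left, right)
    _, t, left, right = best
    # With no usable threshold (t is None) the stump predicts the majority.
    return lambda x: left if t is None or x <= t else right


def bagging(points, n_models=25, seed=0):
    """Fit each stump on a bootstrap sample (len(points) draws with
    replacement); predict by an unweighted majority vote."""
    rng = random.Random(seed)
    models = [fit_stump([rng.choice(points) for _ in points])
              for _ in range(n_models)]

    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict, models


# Hypothetical data: "pos" iff x >= 5, except a noisy label at x = 2.
train = [(x, "pos" if x >= 5 or x == 2 else "neg") for x in range(10)]
predict, models = bagging(train)
print([predict(x) for x in (0.5, 2.0, 8.5)])
```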
comparison
1. production rules
  1. decision trees
    1. mutually exclusive paths
    2. easier to visualize
    3. easier to generate
  2. production rules
    1. not mutually exclusive
      1. may require ordering of rules
    2. more complex learning system
    3. more powerful
  3. compatibility
    1. easy mapping from decision trees to rules (one rule per root-to-leaf path)
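The mapping from trees to rules can be sketched as a walk that emits one rule per root-to-leaf path, with the path's tests joined by AND. This is a minimal sketch, assuming the nested-tuple tree encoding described in the docstring; `tree_to_rules`, `format_rule`, and the weather tree are hypothetical.

```python
def tree_to_rules(tree, conditions=()):
    """One (conditions, label) rule per root-to-leaf path. Internal nodes
    are (attribute, {value: subtree}); leaves are class labels."""
    if not isinstance(tree, tuple):
        return [(list(conditions), tree)]
    attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((attr, value),)))
    return rules


def format_rule(conditions, label):
    test = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
    return f"IF {test} THEN {label}"


# Hypothetical tree.
tree = ("outlook", {
    "sunny": ("humidity", {"high": "stay", "normal": "play"}),
    "overcast": "play",
    "rain": ("windy", {"yes": "stay", "no": "play"}),
})
for conditions, label in tree_to_rules(tree):
    print(format_rule(conditions, label))
# IF outlook = sunny AND humidity = high THEN stay
# IF outlook = sunny AND humidity = normal THEN play
# IF outlook = overcast THEN play
# IF outlook = rain AND windy = yes THEN stay
# IF outlook = rain AND windy = no THEN play
```

Because each example follows exactly one path, the extracted rules are mutually exclusive and can be fired in any order, unlike independently learned production rules that may overlap.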
applications
1. fault detection
2. data mining
3. OCR
machine learning
1. organization of knowledge
  1. static objects
    1. decision list
    2. inference network
    3. concept hierarchy
      1. decision trees
      2. discrimination networks
  2. change over time
    1. state-transition networks
    2. search-control rules
    3. macro-operators
2. learning methods
  1. nonincremental
  2. incremental
3. problem types
  1. online
  2. offline
4. paradigms (Langley 21)
  1. neural networks
  2. case-based learning
  3. genetic algorithms
  4. *rule induction
    1. structures
      1. condition-action rules
      2. *decision trees
      3. similar logical structures
    2. methods
      1. recursive partitioning
      2. disjoint sets
      3. "classes"
      4. conjunction of logical conditions
  5. analytic learning
    1. uses search to solve multi-step problems
    2. *backward chaining (me)
      1. represents knowledge as rules
      2. problems phrased as theorems
      3. performance system searches for proofs
  6. hybrid methods
    1. becoming more common
    2. field is maturing
    3. convergence of paradigms
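The backward-chaining view of analytic learning (rules as knowledge, problems as theorems, proof search as performance) can be sketched for the propositional case. This is a minimal sketch, assuming Horn rules without variables, so there is no unification; `prove` and the example rules are hypothetical.

```python
def prove(goal, rules, facts, seen=frozenset()):
    """Backward chaining over propositional Horn rules: a goal holds if it is
    a known fact, or if every premise of some rule concluding it can itself
    be proved. `rules` maps a conclusion to alternative premise lists."""
    if goal in facts:
        return True
    if goal in seen:  # already trying to prove this goal: cut the cycle
        return False
    return any(all(prove(p, rules, facts, seen | {goal}) for p in premises)
               for premises in rules.get(goal, []))


# Hypothetical rule base: mortal <- human; human <- greek; human <- roman.
rules = {
    "mortal": [["human"]],
    "human": [["greek"], ["roman"]],
}
facts = {"roman"}
print(prove("mortal", rules, facts))    # True
print(prove("immortal", rules, facts))  # False
```

The search works backward from the goal ("theorem") toward known facts, trying each alternative rule in turn, which is the proof search the notes describe.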
expert systems
1. encoding of expert knowledge
2. sometimes used as the implementation of an expert system
3. used to build expert systems
types
1. fuzzy logic
  1. crisp dTrees
  2. fuzzy dTrees
2. univariate
3. multivariate
  1. oblique
4. non-linear multivariate
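The tree types above differ mainly in the test each internal node applies. This is a minimal sketch of those tests, assuming numeric attributes; the function names, weights, thresholds, the circular boundary, and the piecewise-linear fuzzy membership are hypothetical choices for illustration, not the tests used by any particular system (including NDT).

```python
from typing import Callable, Sequence

Point = Sequence[float]


def univariate(x: Point, i: int, t: float) -> bool:
    # Axis-parallel test on a single attribute.
    return x[i] <= t


def oblique(x: Point, w: Sequence[float], t: float) -> bool:
    # Linear multivariate test: a hyperplane over several attributes.
    return sum(wi * xi for wi, xi in zip(w, x)) <= t


def nonlinear(x: Point, f: Callable[[Point], float], t: float) -> bool:
    # Non-linear multivariate test: any non-linear function of the attributes.
    return f(x) <= t


def fuzzy_le(v: float, lo: float, hi: float) -> float:
    # Fuzzy counterpart of the crisp test "v <= t": membership 1 at or below lo,
    # 0 at or above hi, linear in between, so an example can follow both
    # branches to a degree instead of exactly one.
    if v <= lo:
        return 1.0
    if v >= hi:
        return 0.0
    return (hi - v) / (hi - lo)


x = (0.5, 0.5)
print(univariate(x, 0, 1.0))                               # True
print(oblique(x, (1.0, 1.0), 0.8))                         # False
print(nonlinear(x, lambda p: p[0] ** 2 + p[1] ** 2, 1.0))  # True
print(fuzzy_le(5.0, 4.0, 6.0))                             # 0.5
```

A single oblique test can follow a diagonal boundary that a univariate tree must approximate with a staircase of axis-parallel splits, and the circular boundary in the non-linear test cannot be expressed by any single linear test.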