---
title       : Practical Machine Learning Overview
author      : Jeffrey Leek
job         : Johns Hopkins Bloomberg School of Public Health
logo        : bloomberg_shield.png
framework   : io2012
highlighter : highlight.js
hitheme     : tomorrow
url         :
widgets     :
mode        : selfcontained
---
- Prediction study design
- Types of Errors
- Cross validation
- The caret package
- Plotting for prediction
- Preprocessing
- Predicting with regression
- Predicting with trees
- Boosting
- Bagging
- Model blending
- Forecasting
In general, positive = identified and negative = rejected. Therefore:
- True positive = correctly identified
- False positive = incorrectly identified
- True negative = correctly rejected
- False negative = incorrectly rejected

Medical testing example:
- True positive = Sick people correctly diagnosed as sick
- False positive = Healthy people incorrectly identified as sick
- True negative = Healthy people correctly identified as healthy
- False negative = Sick people incorrectly identified as healthy
http://en.wikipedia.org/wiki/Sensitivity_and_specificity
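The four outcomes can be counted directly from a vector of true labels and a vector of test results. The sketch below uses base R only; the ten patients and the names `truth` and `predicted` are hypothetical. Sensitivity and specificity, the two rates covered on the linked page, follow directly from the counts.

```r
# Hypothetical toy data for 10 patients:
#   truth     = TRUE if the patient is actually sick
#   predicted = TRUE if the test says "sick" (a positive result)
truth     <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
predicted <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

TP <- sum(predicted & truth)    # sick, correctly diagnosed as sick
FP <- sum(predicted & !truth)   # healthy, incorrectly identified as sick
TN <- sum(!predicted & !truth)  # healthy, correctly identified as healthy
FN <- sum(!predicted & truth)   # sick, incorrectly identified as healthy

# The same four counts as a 2x2 table (rows = test result, columns = truth)
conf <- table(Predicted = predicted, Truth = truth)
print(conf)

sensitivity <- TP / (TP + FN)   # fraction of sick people the test catches
specificity <- TN / (TN + FP)   # fraction of healthy people the test clears
cat("Sensitivity:", sensitivity, "\n")
cat("Specificity:", round(specificity, 3), "\n")
```

Here one sick patient is missed (a false negative) and one healthy patient is flagged (a false positive), so both rates fall below 1.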
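```r
library(caret)    # createDataPartition()
library(kernlab)  # provides the spam data set
data(spam)

# Split the data: 75% training, 25% testing, stratified on the outcome
inTrain <- createDataPartition(y = spam$type, p = 0.75, list = FALSE)
training <- spam[inTrain, ]
testing <- spam[-inTrain, ]

# Absolute correlations among the predictors (column 58 is the outcome, type)
M <- abs(cor(training[, -58]))
diag(M) <- 0  # each variable correlates perfectly with itself; ignore that
which(M > 0.8, arr.ind = TRUE)
```

```
##        row col
## num415  34  32
## direct  40  32
## num857  32  34
## num857  32  40
```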
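Because `M` is symmetric, `which()` reports each correlated pair twice (for example, `num415` and `num857` appear as both (34, 32) and (32, 34)). A minimal base-R sketch that keeps only the upper triangle so each pair is listed once; the helper `correlated_pairs` is hypothetical (not part of caret), and the small synthetic data frame stands in for `training[, -58]`.

```r
# Hypothetical helper: list each pair of columns whose absolute
# correlation exceeds `cutoff`, once per pair.
correlated_pairs <- function(df, cutoff = 0.8) {
  M <- abs(cor(df))
  # Zero the diagonal and lower triangle so each pair appears once
  M[!upper.tri(M)] <- 0
  idx <- which(M > cutoff, arr.ind = TRUE)
  data.frame(
    var1 = colnames(M)[idx[, "row"]],
    var2 = colnames(M)[idx[, "col"]],
    abs_cor = M[idx],
    stringsAsFactors = FALSE
  )
}

# Synthetic toy data: b is a noisy copy of a, c is unrelated noise
set.seed(1)
a <- rnorm(100)
toy <- data.frame(a = a, b = a + rnorm(100, sd = 0.1), c = rnorm(100))
pairs_found <- correlated_pairs(toy)
print(pairs_found)
```

Applied to the spam split, `correlated_pairs(training[, -58])` would list each pair from the output above once rather than twice.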
- Start with a set of classifiers $h_1,\ldots,h_k$
  - Examples: all possible trees, all possible regression models, all possible cutoffs.
- Create a classifier that combines classification functions:
  $f(x) = \mathrm{sgn}\left(\sum_{t=1}^T \alpha_t h_t(x)\right)$,
  where $h_t$ is the classifier selected from the set at step $t$ and $\alpha_t$ is its weight.
  - Goal is to minimize error (on the training set)
  - Iterative: select one $h$ at each step
  - Calculate weights based on errors
  - Upweight missed classifications and select the next $h$
http://webee.technion.ac.il/people/rmeir/BoostingTutorial.pdf
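The steps above describe boosting in general; one well-known instance is discrete AdaBoost, which sets $\alpha_t = \tfrac{1}{2}\log\left((1-\epsilon_t)/\epsilon_t\right)$ from the weighted error $\epsilon_t$ of $h_t$. The sketch below is a minimal base-R version, assuming labels coded -1/+1, a single numeric predictor, and decision stumps (the "all possible cutoffs" example) as the set of classifiers. All function names and the toy data are hypothetical; this is an illustration, not the method used in the linked tutorial.

```r
# Best stump on 1-D input x under observation weights w.
# A stump predicts s when x > cut and -s otherwise (s is +1 or -1).
fit_stump <- function(x, y, w) {
  best <- list(err = Inf)
  for (cut in sort(unique(x))) {
    for (s in c(-1, 1)) {
      pred <- ifelse(x > cut, s, -s)
      err <- sum(w[pred != y])            # weighted training error
      if (err < best$err) best <- list(cut = cut, s = s, err = err)
    }
  }
  best
}

predict_stump <- function(stump, x) ifelse(x > stump$cut, stump$s, -stump$s)

# n_rounds plays the role of T in f(x)
adaboost <- function(x, y, n_rounds = 10) {
  n <- length(y)
  w <- rep(1 / n, n)                      # start with equal weights
  stumps <- vector("list", n_rounds)
  alpha <- numeric(n_rounds)
  for (t in seq_len(n_rounds)) {
    h <- fit_stump(x, y, w)               # select h_t with lowest weighted error
    err <- max(h$err, 1e-10)              # avoid log(0) if a stump is perfect
    alpha[t] <- 0.5 * log((1 - err) / err)
    pred <- predict_stump(h, x)
    # Upweight missed points (y * pred = -1), downweight correct ones
    w <- w * exp(-alpha[t] * y * pred)
    w <- w / sum(w)
    stumps[[t]] <- h
  }
  list(stumps = stumps, alpha = alpha)
}

# f(x) = sgn(sum_t alpha_t h_t(x)); sign() returns 0 on an exact tie
predict_adaboost <- function(model, x) {
  score <- Reduce(`+`, Map(function(h, a) a * predict_stump(h, x),
                           model$stumps, model$alpha))
  sign(score)
}

# Toy data: +1 inside 4..7, -1 outside, so no single cutoff separates it
x <- 1:10
y <- ifelse(x >= 4 & x <= 7, 1, -1)
model <- adaboost(x, y, n_rounds = 3)
print(predict_adaboost(model, x))
```

The best single stump misclassifies 3 of the 10 toy points; on these data the weighted vote of three stumps fits every training label, which is the "weak classifiers combined into a strong one" idea in miniature.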