Machine Learning Cheat Sheet
Choose your approach based on whether your data is labeled and what patterns you're looking for.
Visual Overview: ML Algorithm Decision Flowchart (diagram not reproduced here)
Learning Types
| Type | Label | Examples | Goal |
|---|---|---|---|
| Supervised | Yes | Regression, Classification | Predict output from input |
| Unsupervised | No | Clustering, Dimensionality Reduction | Find hidden patterns |
| Reinforcement | Reward | Game AI, Robotics | Maximize reward through actions |
| Semi-supervised | Mixed | Few labeled + many unlabeled | Leverage unlabeled data |
| Self-supervised | Self-generated | Contrastive learning, BERT | Learn from unlabeled data |
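As a minimal sketch of the supervised/unsupervised distinction (using scikit-learn, with a toy two-blob dataset invented here for illustration), the same feature matrix `X` can feed either kind of estimator; the difference is whether labels `y` are passed to `fit`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Toy 2-class data: two well-separated blobs in 2-D feature space
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Supervised: labels guide the fit
clf = LogisticRegression().fit(X, y)
train_acc = clf.score(X, y)

# Unsupervised: only X is given; cluster structure is inferred
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_
```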
Regression Algorithms
| Algorithm | Complexity | When to Use | Notes |
|---|---|---|---|
| Linear Regression | Low | Linear relationship | Fast, interpretable, simple baseline |
| Polynomial Regression | Medium | Curved relationships | Prone to overfitting |
| Ridge/Lasso | Low | With multicollinearity | Adds regularization penalty |
| SVR (Support Vector Regression) | Medium | Non-linear, outliers | Good for small-medium datasets |
| Decision Tree Regression | Medium | Non-linear, interactions | Easy to interpret |
| Random Forest | High | Complex patterns | Ensemble, reduces overfitting |
| Gradient Boosting | High | Maximum accuracy | Often wins competitions |
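To illustrate the "start simple, escalate for non-linear patterns" advice above, here is a sketch (synthetic sine-shaped data invented for illustration) comparing the linear baseline with a random forest:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic non-linear target: a straight-line fit should underperform here
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (300, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# R^2 on held-out data for each model
lin_r2 = LinearRegression().fit(X_tr, y_tr).score(X_te, y_te)
rf_r2 = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
```

On data like this the ensemble captures the curvature the linear model cannot; on genuinely linear data the baseline would be competitive and far cheaper.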
Classification Algorithms
| Algorithm | Data Type | When to Use | Pros/Cons |
|---|---|---|---|
| Logistic Regression | Linear | Fast baseline | Simple, interpretable |
| Naive Bayes | Probabilistic | Text, fast | Assumes feature independence; fast, but probability estimates can be poorly calibrated |
| SVM | Non-linear | Small-medium data | Powerful, slow on large data |
| Decision Tree | Non-linear | Interpretability needed | Easy to overfit |
| Random Forest | Non-linear | Best general-purpose | Accurate, less interpretable |
| Gradient Boosting | Non-linear | High accuracy needed | Slow, prone to overfitting |
| K-Nearest Neighbors | Non-linear | Small datasets | Simple, slow prediction |
| Neural Network | Complex | Deep patterns, large data | Powerful, needs tuning |
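A minimal sketch of the "fast baseline first" workflow implied by the table, using scikit-learn on a synthetic dataset (the dataset and cross-validation setup here are illustrative, not from the original):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification problem
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=0)

# Step 1: fast, interpretable baseline
base_acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()

# Step 2: stronger general-purpose model; keep it only if it beats the baseline
rf_acc = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
```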
Unsupervised Learning
// Clustering
K-Means: Group by distance to centroids
Hierarchical: Tree-like cluster structure
DBSCAN: Density-based, finds irregular shapes
Gaussian Mixture: Probabilistic clustering
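The difference between centroid-based and density-based clustering can be sketched on the classic two-moons dataset (the `eps`/`min_samples` values below are illustrative choices, not universal defaults):

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN

# Two interleaved half-moons: not separable by distance to centroids
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# K-Means forces two roughly spherical groups, splitting each moon
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN follows density, so it can trace each irregular moon
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)  # -1 marks noise points
```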
// Dimensionality Reduction
PCA: Linear dimensionality reduction
t-SNE: Non-linear visualization (2D/3D)
UMAP: Preserves more global structure than t-SNE and scales better
Autoencoders: Neural network compression
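A short PCA sketch: data that lives in 5 dimensions but varies mostly along 2 latent directions (the synthetic latent/mixing construction here is invented for illustration) compresses to 2-D with little variance lost:

```python
import numpy as np
from sklearn.decomposition import PCA

# 100 samples in 5-D, but variance concentrated in 2 latent directions
rng = np.random.default_rng(2)
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + rng.normal(0, 0.05, (100, 5))  # small isotropic noise

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()  # fraction of variance kept
```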
// Rules
Apriori: Find frequent itemsets
Eclat: Depth-first variant
Association Rules: If X then Y, ranked by support, confidence, and lift
// Anomaly Detection
Isolation Forest: Isolate anomalies
Local Outlier Factor (LOF): Density-based
One-Class SVM: Learn normal behavior
Evaluation Metrics
| Metric | Formula/Use | Range | Good When |
|---|---|---|---|
| Accuracy | (TP+TN)/(TP+TN+FP+FN) | 0-1 | Balanced classes |
| Precision | TP/(TP+FP) | 0-1 | False positives costly |
| Recall | TP/(TP+FN) | 0-1 | False negatives costly |
| F1-Score | 2×(P×R)/(P+R) | 0-1 | Balance precision & recall |
| ROC-AUC | Area under curve | 0-1 | Threshold-independent |
| PR-AUC | Precision-Recall curve | 0-1 | Imbalanced classes |
| MAE | Mean Absolute Error | 0-∞ | Regression |
| RMSE | √(Mean Squared Error) | 0-∞ | Regression, penalizes large errors |
| R² | Variance explained | -∞ to 1 | Regression |
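The classification formulas above can be checked directly from confusion-matrix counts (the counts below are hypothetical, chosen for round numbers):

```python
# Hypothetical confusion-matrix counts for a binary classifier
TP, TN, FP, FN = 80, 90, 10, 20

accuracy = (TP + TN) / (TP + TN + FP + FN)  # fraction of all predictions correct
precision = TP / (TP + FP)                  # of predicted positives, how many were real
recall = TP / (TP + FN)                     # of real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
```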
Bias-Variance Tradeoff
// High Bias (Underfitting)
- Model too simple for data
- High training error
- High test error
- Example: Linear model on non-linear data
Solutions:
→ Increase model complexity
→ Add more features
→ Train longer
→ Reduce regularization
// High Variance (Overfitting)
- Model too complex for data
- Low training error
- High test error
- Example: High-degree polynomial on few samples
Solutions:
→ More training data
→ Reduce model complexity
→ Regularization (L1/L2)
→ Dropout, early stopping
→ Cross-validation
// Optimal Balance
→ Use validation set to find sweet spot
→ Learning curves: plot train vs validation loss
→ Bias decreases, Variance increases with model complexity
Hyperparameter Tuning
// Grid Search: Try all combinations
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC  # sklearn's SVM classifier is SVC, not SVM
params = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf']
}
grid = GridSearchCV(SVC(), params, cv=5)
grid.fit(X, y)
best_model = grid.best_estimator_
// Random Search: Sample random combinations from the space
from sklearn.model_selection import RandomizedSearchCV
rand_search = RandomizedSearchCV(model, params, n_iter=20, cv=5)  # avoid naming it `random` (shadows the stdlib module)
rand_search.fit(X, y)
// Cross-Validation
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
# Splits data into 5 folds, trains 5 times
// Learning Rate Schedules
Constant: lr = 0.01
Step decay: Reduce after N epochs
Exponential decay: lr = lr0 × e^(-kt)
1/t decay: lr = lr0 / (1 + kt)
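The schedules above are simple enough to implement directly; a sketch (function names and default constants are illustrative choices):

```python
import math

def step_decay(lr0, epoch, drop=0.5, epochs_per_drop=10):
    """Multiply the rate by `drop` every `epochs_per_drop` epochs."""
    return lr0 * drop ** (epoch // epochs_per_drop)

def exp_decay(lr0, t, k=0.1):
    """Exponential decay: lr = lr0 * e^(-k t)"""
    return lr0 * math.exp(-k * t)

def inv_time_decay(lr0, t, k=0.1):
    """1/t decay: lr = lr0 / (1 + k t)"""
    return lr0 / (1 + k * t)
```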
// Early Stopping
Stop training when validation loss plateaus
Prevents overfitting and saves computation
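A minimal sketch of the plateau rule with a patience counter (the function, patience value, and loss curve below are illustrative, not from any specific framework):

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch at which training halts: validation loss has
    failed to improve for `patience` consecutive epochs."""
    best = float("inf")
    since_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # never plateaued: ran to the end

# Validation loss improves for 4 epochs, then plateaus
losses = [1.0, 0.8, 0.6, 0.55, 0.56, 0.57, 0.58, 0.59]
stop = early_stopping_epoch(losses, patience=3)
```

In practice frameworks also restore the weights from the best epoch (epoch 3 here), not the stopping epoch.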