Machine Learning Cheat Sheet
Choose your approach based on whether your data is labeled and what patterns you're looking for.
Visual Overview: ML Algorithm Decision Flowchart (diagram not reproduced here)
Learning Types
| Type | Label | Examples | Goal |
|---|---|---|---|
| Supervised | Yes | Regression, Classification | Predict output from input |
| Unsupervised | No | Clustering, Dimensionality Reduction | Find hidden patterns |
| Reinforcement | Reward | Game AI, Robotics | Maximize reward through actions |
| Semi-supervised | Mixed | Few labeled + many unlabeled | Leverage unlabeled data |
| Self-supervised | Self-generated | Contrastive learning, BERT | Learn from unlabeled data |
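As a minimal sketch of the supervised/unsupervised distinction (using scikit-learn, with a toy two-blob dataset invented here for illustration), the same feature matrix `X` can feed either kind of estimator; the difference is whether labels `y` are passed to `fit`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Toy 2-class data: two well-separated blobs in 2-D feature space
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Supervised: labels guide the fit
clf = LogisticRegression().fit(X, y)
train_acc = clf.score(X, y)

# Unsupervised: only X is given; cluster structure is inferred
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_
```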
Regression Algorithms
| Algorithm | Complexity | When to Use | Notes |
|---|---|---|---|
| Linear Regression | Low | Linear relationship | Fast, interpretable, simple baseline |
| Polynomial Regression | Medium | Curved relationships | Prone to overfitting |
| Ridge/Lasso | Low | With multicollinearity | Adds regularization penalty |
| SVR (Support Vector Regression) | Medium | Non-linear, outliers | Good for small-medium datasets |
| Decision Tree Regression | Medium | Non-linear, interactions | Easy to interpret |
| Random Forest | High | Complex patterns | Ensemble, reduces overfitting |
| Gradient Boosting | High | Maximum accuracy | Often wins competitions |
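To illustrate the "start simple, escalate for non-linear patterns" advice above, here is a sketch (synthetic sine-shaped data invented for illustration) comparing the linear baseline with a random forest:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic non-linear target: a straight-line fit should underperform here
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (300, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# R^2 on held-out data for each model
lin_r2 = LinearRegression().fit(X_tr, y_tr).score(X_te, y_te)
rf_r2 = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
```

On data like this the ensemble captures the curvature the linear model cannot; on genuinely linear data the baseline would be competitive and far cheaper.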
Classification Algorithms
| Algorithm | Data Type | When to Use | Pros/Cons |
|---|---|---|---|
| Logistic Regression | Linear | Fast baseline | Simple, interpretable |
| Naive Bayes | Probabilistic | Text, fast | Assumes feature independence; fast, but probability estimates can be poorly calibrated |
| SVM | Non-linear | Small-medium data | Powerful, slow on large data |
| Decision Tree | Non-linear | Interpretability needed | Easy to overfit |
| Random Forest | Non-linear | Best general-purpose | Accurate, less interpretable |
| Gradient Boosting | Non-linear | High accuracy needed | Slow, prone to overfitting |
| K-Nearest Neighbors | Non-linear | Small datasets | Simple, slow prediction |
| Neural Network | Complex | Deep patterns, large data | Powerful, needs tuning |
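A minimal sketch of the "fast baseline first" workflow implied by the table, using scikit-learn on a synthetic dataset (the dataset and cross-validation setup here are illustrative, not from the original):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification problem
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=0)

# Step 1: fast, interpretable baseline
base_acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()

# Step 2: stronger general-purpose model; keep it only if it beats the baseline
rf_acc = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
```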
Unsupervised Learning
// Clustering
K-Means: Group by distance to centroids
Hierarchical: Tree-like cluster structure
DBSCAN: Density-based, finds irregular shapes
Gaussian Mixture: Probabilistic clustering
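The difference between centroid-based and density-based clustering can be sketched on the classic two-moons dataset (the `eps`/`min_samples` values below are illustrative choices, not universal defaults):

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN

# Two interleaved half-moons: not separable by distance to centroids
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# K-Means forces two roughly spherical groups, splitting each moon
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN follows density, so it can trace each irregular moon
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)  # -1 marks noise points
```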
// Dimensionality Reduction
PCA: Linear dimensionality reduction
t-SNE: Non-linear visualization (2D/3D)
UMAP: Preserves more global structure than t-SNE and scales better
Autoencoders: Neural network compression
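A short PCA sketch: data that lives in 5 dimensions but varies mostly along 2 latent directions (the synthetic latent/mixing construction here is invented for illustration) compresses to 2-D with little variance lost:

```python
import numpy as np
from sklearn.decomposition import PCA

# 100 samples in 5-D, but variance concentrated in 2 latent directions
rng = np.random.default_rng(2)
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + rng.normal(0, 0.05, (100, 5))  # small isotropic noise

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()  # fraction of variance kept
```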
// Rules
Apriori: Find frequent itemsets
Eclat: Depth-first variant
Association Rules: If X then Y, ranked by support, confidence, and lift
// Anomaly Detection
Isolation Forest: Isolate anomalies
Local Outlier Factor (LOF): Density-based
One-Class SVM: Learn normal behavior
Evaluation Metrics
| Metric | Formula/Use | Range | Good When |
|---|---|---|---|
| Accuracy | (TP+TN)/(TP+TN+FP+FN) | 0-1 | Balanced classes |
| Precision | TP/(TP+FP) | 0-1 | False positives costly |
| Recall | TP/(TP+FN) | 0-1 | False negatives costly |
| F1-Score | 2×(P×R)/(P+R) | 0-1 | Balance precision & recall |
| ROC-AUC | Area under curve | 0-1 | Threshold-independent |
| PR-AUC | Precision-Recall curve | 0-1 | Imbalanced classes |
| MAE | Mean Absolute Error | 0-∞ | Regression |
| RMSE | √(Mean Squared Error) | 0-∞ | Regression, penalizes large errors |
| R² | Variance explained | -∞ to 1 | Regression |
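The classification formulas above can be checked directly from confusion-matrix counts (the counts below are hypothetical, chosen for round numbers):

```python
# Hypothetical confusion-matrix counts for a binary classifier
TP, TN, FP, FN = 80, 90, 10, 20

accuracy = (TP + TN) / (TP + TN + FP + FN)  # fraction of all predictions correct
precision = TP / (TP + FP)                  # of predicted positives, how many were real
recall = TP / (TP + FN)                     # of real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
```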
Bias-Variance Tradeoff
// High Bias (Underfitting)
- Model too simple for data
- High training error
- High test error
- Example: Linear model on non-linear data
Solutions:
→ Increase model complexity
→ Add more features
→ Train longer
→ Reduce regularization
// High Variance (Overfitting)
- Model too complex for data
- Low training error
- High test error
- Example: High-degree polynomial on few samples
Solutions:
→ More training data
→ Reduce model complexity
→ Regularization (L1/L2)
→ Dropout, early stopping
→ Cross-validation
// Optimal Balance
→ Use validation set to find sweet spot
→ Learning curves: plot train vs validation loss
→ Bias decreases, Variance increases with model complexity
Hyperparameter Tuning
// Grid Search: Try all combinations
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC  # sklearn's SVM classifier is SVC, not SVM
params = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf']
}
grid = GridSearchCV(SVC(), params, cv=5)
grid.fit(X, y)
best_model = grid.best_estimator_
// Random Search: Sample random combinations from the space
from sklearn.model_selection import RandomizedSearchCV
rand_search = RandomizedSearchCV(model, params, n_iter=20, cv=5)  # avoid naming it `random` (shadows the stdlib module)
rand_search.fit(X, y)
// Cross-Validation
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
# Splits data into 5 folds, trains 5 times
// Learning Rate Schedules
Constant: lr = 0.01
Step decay: Reduce after N epochs
Exponential decay: lr = lr0 × e^(-kt)
1/t decay: lr = lr0 / (1 + kt)
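The schedules above are simple enough to implement directly; a sketch (function names and default constants are illustrative choices):

```python
import math

def step_decay(lr0, epoch, drop=0.5, epochs_per_drop=10):
    """Multiply the rate by `drop` every `epochs_per_drop` epochs."""
    return lr0 * drop ** (epoch // epochs_per_drop)

def exp_decay(lr0, t, k=0.1):
    """Exponential decay: lr = lr0 * e^(-k t)"""
    return lr0 * math.exp(-k * t)

def inv_time_decay(lr0, t, k=0.1):
    """1/t decay: lr = lr0 / (1 + k t)"""
    return lr0 / (1 + k * t)
```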
// Early Stopping
Stop training when validation loss plateaus
Prevents overfitting and saves computation
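A minimal sketch of the plateau rule with a patience counter (the function, patience value, and loss curve below are illustrative, not from any specific framework):

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch at which training halts: validation loss has
    failed to improve for `patience` consecutive epochs."""
    best = float("inf")
    since_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # never plateaued: ran to the end

# Validation loss improves for 4 epochs, then plateaus
losses = [1.0, 0.8, 0.6, 0.55, 0.56, 0.57, 0.58, 0.59]
stop = early_stopping_epoch(losses, patience=3)
```

In practice frameworks also restore the weights from the best epoch (epoch 3 here), not the stopping epoch.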