Probability Cheat Sheet
Visual Overview: Probability Rules
[Diagram: probability rules. Key insight: always subtract the intersection when combining probabilities, to avoid double-counting.]
Basic Probability
// Probability: Likelihood of event (0 to 1)
P(A) = 0: Impossible
P(A) = 1: Certain
P(A) = 0.5: Equally likely
// Complement
P(A^c) = 1 - P(A)
// Conditional probability
P(A|B) = P(A and B) / P(B)
Probability of A given B happened
Example: P(rain | cloudy)
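A quick numeric sanity check of the definition; the joint and marginal probabilities below are made-up illustration values, not data:

```python
# Conditional probability from the definition P(A|B) = P(A and B) / P(B).
# Both inputs are hypothetical example numbers.
p_rain_and_cloudy = 0.20   # joint probability P(rain and cloudy), assumed
p_cloudy = 0.40            # marginal probability P(cloudy), assumed

p_rain_given_cloudy = p_rain_and_cloudy / p_cloudy
# P(rain | cloudy) = 0.20 / 0.40 = 0.5
```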
// Independence
A and B independent if P(A|B) = P(A)
Occurrence of B doesn't affect A
// Joint probability
P(A and B) = P(A) × P(B) if independent
P(A and B) = P(A) × P(B|A) if dependent
// Marginal probability
P(A) = Σ P(A and B_i) for all B_i
Sum over all possible other events
// Law of total probability
P(A) = P(A|B) × P(B) + P(A|¬B) × P(¬B)
// Example: Disease detection
P(positive) = P(positive|disease) × P(disease)
+ P(positive|no disease) × P(no disease)
= 0.95 × 0.01 + 0.05 × 0.99
= 0.0095 + 0.0495 = 0.059
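The total-probability arithmetic can be checked in a few lines, using the same numbers as the example (1% prevalence, 95% sensitivity, 5% false-positive rate):

```python
# Law of total probability:
# P(positive) = P(pos|disease)P(disease) + P(pos|no disease)P(no disease)
p_disease = 0.01          # prevalence (prior)
sensitivity = 0.95        # P(positive | disease)
false_positive = 0.05     # P(positive | no disease) = 1 - specificity

p_positive = sensitivity * p_disease + false_positive * (1 - p_disease)
# 0.95 * 0.01 + 0.05 * 0.99 = 0.059
```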
Bayes Theorem
// Bayes Theorem: Update beliefs with evidence
P(A|B) = P(B|A) × P(A) / P(B)
P(A|B): Posterior (updated probability)
P(B|A): Likelihood (evidence strength)
P(A): Prior (initial belief)
P(B): Evidence (normalizing constant)
// Bayesian reasoning
Start: Prior belief P(A)
Observe: Event B
Update: P(A|B) = P(B|A) × P(A) / P(B)
// Medical test example
Disease prevalence: 1% (prior)
Test sensitivity: 95% (P(positive|disease))
Test specificity: 95% (P(negative|no disease))
P(disease|positive) = 0.95 × 0.01 / P(positive)
P(positive) = 0.95 × 0.01 + 0.05 × 0.99 = 0.059
P(disease|positive) = 0.0095 / 0.059 ≈ 16.1%
Surprising! Despite the 95% accurate test, only a ~16% chance of disease
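The full Bayes computation for this test, with all numbers taken straight from the example:

```python
# Bayes' theorem for the medical-test example.
prior = 0.01              # P(disease), the prevalence
sensitivity = 0.95        # P(positive | disease)
specificity = 0.95        # P(negative | no disease)

evidence = sensitivity * prior + (1 - specificity) * (1 - prior)  # P(positive)
posterior = sensitivity * prior / evidence                        # P(disease | positive)
# posterior = 0.0095 / 0.059, about 0.161: a positive test leaves ~16% disease probability
```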
// Spam email example
Prior: 80% of emails are spam
P(word "cash" | spam) = 0.8
P(word "cash" | legitimate) = 0.1
If email contains "cash":
P(spam | "cash") = 0.8 × 0.8 / P("cash")
= 0.64 / [0.64 + 0.1 × 0.2]
= 0.64 / 0.66 ≈ 97%
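The same computation for the spam example, with the priors and likelihoods given above:

```python
# P(spam | "cash") via Bayes' theorem.
p_spam = 0.80              # prior: 80% of emails are spam
p_cash_given_spam = 0.80   # likelihood of the word "cash" in spam
p_cash_given_legit = 0.10  # likelihood of "cash" in legitimate email

p_cash = p_cash_given_spam * p_spam + p_cash_given_legit * (1 - p_spam)  # 0.66
p_spam_given_cash = p_cash_given_spam * p_spam / p_cash
# 0.64 / 0.66, about 0.97
```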
// Bayesian updates (multiple evidence)
Posterior becomes new prior for next observation
P(A|B,C) = P(C|A,B) × P(A|B) / P(C|B)
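A sketch of sequential updating, assuming each piece of evidence is conditionally independent given A (the same assumption Naive Bayes makes); here, two positive medical tests in a row:

```python
def bayes_update(prior, likelihood_true, likelihood_false):
    """One Bayes update: P(A | evidence) from P(A) and the two likelihoods."""
    evidence = likelihood_true * prior + likelihood_false * (1 - prior)
    return likelihood_true * prior / evidence

# Two positive test results, likelihoods assumed independent given disease.
p = 0.01                       # initial prior: 1% prevalence
for _ in range(2):
    p = bayes_update(p, 0.95, 0.05)  # posterior becomes the next prior
# after two positives the posterior rises well above the single-test ~16%
```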
// Machine learning
Naive Bayes classifier uses Bayes theorem
P(class|features) = P(features|class) × P(class) / P(features)
Expectation & Variance
// Expected value: Long-run average
E[X] = Σ x × P(x) (discrete)
E[X] = ∫ x × f(x) dx (continuous)
Example: Fair die
E[X] = 1×(1/6) + 2×(1/6) + ... + 6×(1/6) = 3.5
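The die computation, kept exact with `fractions` so no rounding hides the result:

```python
from fractions import Fraction

# Expected value of a fair die: E[X] = sum of x * P(x) over all faces.
faces = range(1, 7)
e_x = sum(Fraction(x, 6) for x in faces)   # each face has probability 1/6
# E[X] = 21/6 = 3.5
```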
// Linearity of expectation
E[X + Y] = E[X] + E[Y] (always true!)
E[c×X] = c × E[X]
// Variance: Spread around mean
Var(X) = E[(X - μ)²] = E[X²] - (E[X])²
σ = √Var(X) (standard deviation)
// Example: Fair die
μ = 3.5
E[X²] = 1²×(1/6) + ... + 6²×(1/6) = 91/6
Var(X) = 91/6 - (3.5)² = 35/12 ≈ 2.92
σ ≈ 1.71
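The same die, now with the variance computed via the shortcut formula Var(X) = E[X²] - (E[X])²:

```python
from fractions import Fraction

# Variance of a fair die via Var(X) = E[X^2] - (E[X])^2.
faces = range(1, 7)
e_x = sum(Fraction(x, 6) for x in faces)         # 21/6 = 7/2
e_x2 = sum(Fraction(x * x, 6) for x in faces)    # 91/6
var_x = e_x2 - e_x ** 2                          # 91/6 - 49/4 = 35/12
sigma = float(var_x) ** 0.5                      # about 1.708
```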
// Variance properties
Var(X + c) = Var(X) (adding constant doesn't change spread)
Var(c×X) = c²×Var(X) (scaling by c multiplies variance by c²)
Var(X + Y) = Var(X) + Var(Y) if independent
// Covariance: Joint variability
Cov(X,Y) = E[(X-μ_X)(Y-μ_Y)]
Positive: When X high, Y tends high
Negative: When X high, Y tends low
Zero: No linear relationship
// Correlation coefficient
ρ = Cov(X,Y) / (σ_X × σ_Y)
Range: -1 to 1
Standard measure of linear relationship
Distributions (Properties)
| Distribution | Mean μ | Variance σ² | Practical Use |
|---|---|---|---|
| Binomial (n,p) | np | np(1-p) | Number of successes in n trials |
| Poisson (λ) | λ | λ | Count of rare events |
| Normal (μ,σ) | μ | σ² | Natural phenomena |
| Exponential (λ) | 1/λ | 1/λ² | Waiting time |
| Uniform (a,b) | (a+b)/2 | (b-a)²/12 | Equal probability |
| Beta (α,β) | α/(α+β) | αβ / ((α+β)²(α+β+1)) | Probabilities as RV |
| Chi-squared (k) | k | 2k | Variance testing |
68-95-99.7 rule (Normal): 68% within μ±σ, 95% within μ±2σ, 99.7% within μ±3σ
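The 68-95-99.7 rule can be verified directly from the normal CDF (`statistics.NormalDist`, Python 3.8+):

```python
from statistics import NormalDist

# Check the 68-95-99.7 rule against the standard normal CDF.
z = NormalDist()  # mean 0, standard deviation 1

def within(k):
    """Probability mass within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

# within(1) ≈ 0.683, within(2) ≈ 0.954, within(3) ≈ 0.997
```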
Probability Inequalities & Laws
// Markov's inequality
For non-negative X: P(X ≥ a) ≤ E[X] / a
Loose but always true bound
// Chebyshev's inequality
P(|X - μ| ≥ kσ) ≤ 1/k²
E.g., P(|X - μ| ≥ 2σ) ≤ 1/4 = 25%
(Normal: actually ≈ 5%)
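Comparing the distribution-free Chebyshev bound against the exact normal tail shows how loose the bound is:

```python
from statistics import NormalDist

# Chebyshev bound vs. the exact normal tail for k = 2 standard deviations.
k = 2
chebyshev_bound = 1 / k**2                   # 0.25, valid for ANY distribution
z = NormalDist()
normal_tail = 1 - (z.cdf(k) - z.cdf(-k))     # about 0.0455 for the normal
# the universal bound (25%) is much looser than the normal-specific value (~5%)
```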
// Law of Large Numbers
Average of many independent samples → expected value
X̄ = (X₁ + ... + X_n) / n → E[X] as n → ∞
Justifies: Using sample mean as estimate
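A quick simulation of the law of large numbers for die rolls (seed fixed so the run is reproducible):

```python
import random

# Law of large numbers: the sample mean of die rolls approaches E[X] = 3.5.
random.seed(0)  # fixed seed for reproducibility
n = 100_000
mean = sum(random.randint(1, 6) for _ in range(n)) / n
# mean lands close to 3.5 for large n
```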
// Central Limit Theorem
Sum/average of many independent variables → Normal distribution
Regardless of original distribution!
X̄ ~ Normal(μ, σ²/n) approximately, for large n
Example: Average of 30 dice rolls ~ Normal
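Simulating that example: averages of 30 die rolls cluster around μ = 3.5 with spread σ/√30 ≈ 0.31, even though a single roll is uniform, not normal:

```python
import random
import statistics

# Central limit theorem: averages of 30 die rolls are approximately normal.
random.seed(1)  # fixed seed for reproducibility
averages = [statistics.mean(random.randint(1, 6) for _ in range(30))
            for _ in range(2_000)]

mu = statistics.mean(averages)    # close to E[X] = 3.5
sd = statistics.stdev(averages)   # close to sigma / sqrt(30) = 1.708 / 5.48
```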
// Bernoulli's inequality
(1 + x)^n ≥ 1 + nx for x ≥ -1 and n ≥ 1
// Union bound (Boole's inequality)
P(A ∪ B) ≤ P(A) + P(B)
Union probability ≤ sum of probabilities
// Bonferroni correction
Testing multiple hypotheses: Divide α by number of tests
α_adjusted = α / m
Prevents: Multiple comparisons increasing false positives
// Example: 20 hypothesis tests
Normal α = 0.05
Bonferroni α = 0.05 / 20 = 0.0025
Much stricter threshold
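The correction itself is one line; shown here with the 20-test example:

```python
# Bonferroni correction: divide the significance level by the number of tests.
alpha = 0.05
m = 20                        # number of hypothesis tests
alpha_adjusted = alpha / m    # 0.0025: each test uses the stricter threshold
```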