| Aspect | SQL (Relational) | NoSQL (Non-relational) |
|---|---|---|
| Structure | Tables with fixed schema | Flexible (documents, key-value, graphs) |
| Schema | Predefined | Dynamic |
| Scaling | Vertical (bigger server) | Horizontal (distributed) |
| ACID | Guaranteed | BASE (eventual consistency) |
| Joins | Multiple tables (powerful) | Denormalized data |
| Transactions | Complex, multi-table | Limited |
| Examples | PostgreSQL, MySQL, Oracle | MongoDB, Redis, Cassandra, DynamoDB |
| Best for | Complex queries, ACID needs | Big data, speed, flexibility |
Databases Cheat Sheet
SQL vs NoSQL
ACID Properties
// ACID: Guarantees for reliable transactions
A - Atomicity: All or nothing
Either all operations complete OR all rollback
Example: Transfer ₹100 between accounts
- If withdrawal succeeds but deposit fails → ROLLBACK both
C - Consistency: Valid state transitions
Database moves from valid state to valid state
Constraints maintained (primary key unique, foreign keys valid)
Example: Account balance never negative
I - Isolation: No interference between transactions
Concurrent transactions don't see each other's intermediate states
Levels (weakest → strongest):
- READ UNCOMMITTED: Dirty reads allowed
- READ COMMITTED: See committed data only (default)
- REPEATABLE READ: Same data on re-read
- SERIALIZABLE: Transactions are serialized
D - Durability: Persisted data survives failures
Committed data isn't lost
Written to disk, survives power loss
Example: After COMMIT, ₹100 transfer is permanent
// Example transaction (SQL)
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
// If error between statements:
ROLLBACK; -- Reverts both updates
Normalization & Design
// Normalization: Eliminate redundancy
1st Normal Form (1NF):
- Atomic values only (no repeating groups)
- ❌ User: {name, tags: [java, python]}
- ✓ UserTag table: {user_id, tag}
2nd Normal Form (2NF):
- 1NF + Non-key attributes fully depend on primary key
- ❌ Order: {order_id, user_id, user_name, product_id}
(user_name depends on user_id, not order_id)
- ✓ User & Order tables separately
3rd Normal Form (3NF):
- 2NF + Non-key attributes don't depend on each other
- ❌ User: {id, email, country, country_code}
(country_code depends on country, not user)
- ✓ User & Country tables separately
Boyce-Codd Normal Form (BCNF):
- Stricter than 3NF, every determinant is a candidate key
- Rare in practice
// Denormalization: Intentional redundancy for speed
- Duplicate data to avoid expensive joins
- Trade: Update complexity vs query speed
- Example: Store user_name in Order table (avoid User join)
// Key concepts
Primary Key: Unique identifier
Foreign Key: References another table
Surrogate Key: Artificial (auto-increment)
Natural Key: Domain-meaningful (email)
Indexing & Query Optimization
// Indexes: Speed up queries
B-Tree: Default, range queries fast
Hash: Fast equality, not range
Full-text: Text search
Bitmap: Low cardinality (gender: M/F)
// When to index
✓ WHERE clause columns
✓ JOIN columns
✓ ORDER BY columns
✓ SELECT columns in cover query (avoid table lookup)
❌ Low cardinality (gender, boolean)
❌ Frequently updated columns
// Composite indexes
CREATE INDEX idx_name_email ON users(name, email);
Useful for:
- (name, email) queries
- name-only queries
- NOT useful for email-only (index skipped)
// Query analysis
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
Look for:
- Index used (good)
- Full table scan (bad)
- Sequential scan (expected sometimes)
// Slow queries
❌ Functions in WHERE: YEAR(date) = 2024
✓ Range: date >= '2024-01-01' AND date < '2025-01-01'
❌ SELECT *
✓ SELECT specific columns
❌ N+1 queries: Loop fetching individual records
✓ Batch query or join
// Caching strategies
Query cache: Cache results
Application cache: Redis, Memcached
CDN: Cache static assets
Sharding & Scaling
// Scaling: Growing data/traffic
Vertical: Bigger server (limited, expensive)
Horizontal: More servers (distributed)
// Sharding: Horizontal partitioning
Split data across multiple databases
Each shard holds subset of data
Sharding strategies:
Range-based: id 1-1M on shard1, 1M-2M on shard2
Hash-based: hash(user_id) % num_shards
Directory-based: Lookup table maps keys to shards
Geographic: By region
Challenges:
- Hot shards (uneven distribution)
- Cross-shard joins (complex)
- Rebalancing (moving data)
- Transactions across shards
// Replication
Master-Slave: Master writes, slaves read-only
Multi-Master: All can write (conflicts possible)
Trade-offs: Consistency vs availability
// CAP Theorem
Can achieve 2 of 3:
C - Consistency: All nodes see same data
A - Availability: Service always available
P - Partition tolerance: Network failures
Examples:
CP: PostgreSQL (prioritize consistency)
AP: MongoDB (prioritize availability)
NoSQL Types
| Type | Storage | Examples | Use Cases |
|---|---|---|---|
| Document | JSON/BSON | MongoDB, CouchDB | Flexible schema, nested data |
| Key-Value | Hash table | Redis, Memcached | Cache, sessions, real-time |
| Column Family | Sparse columns | Cassandra, HBase | Time-series, wide tables |
| Graph | Nodes & edges | Neo4j, ArangoDB | Relationships, social networks |
| Search | Inverted index | Elasticsearch, Solr | Full-text search, logs |