NumPy: Supercharge Your Numerical Computing by 10-100x
NumPy powers data science, machine learning, and scientific computing worldwide. Organizations from Google and NASA to ISRO rely on it to process massive datasets. Pure Python loops are slow; NumPy arrays are typically 10-100x faster because their operations run in compiled C code. In this chapter, you'll learn vectorized operations that replace loops and unlock professional-grade data processing.
Why NumPy? Python is Slow, NumPy is Fast
Consider processing 1 million student marks:
import numpy as np
import time
marks = list(range(1000000)) # 1 million marks
# Pure Python (SLOW)
start = time.time()
squared = []
for mark in marks:
squared.append(mark ** 2)
python_time = time.time() - start
print(f"Python loop: {python_time:.3f} seconds")
# NumPy (FAST!)
marks_array = np.array(marks)
start = time.time()
squared_numpy = marks_array ** 2
numpy_time = time.time() - start
print(f"NumPy: {numpy_time:.3f} seconds")
print(f"NumPy is {python_time / numpy_time:.0f}x faster!")
# Typical output: "NumPy is 50x faster!" (the exact speedup varies by machine)
Creating NumPy Arrays: The Foundation
Arrays are NumPy's core data structure. They're like lists but much more powerful:
import numpy as np
print("=== Creating arrays ===")
# From Python list
arr = np.array([1, 2, 3, 4, 5])
print(f"Array from list: {arr}")
print(f"Array shape: {arr.shape}, dtype: {arr.dtype}") # shape: (5,), dtype: int64 (platform-dependent)
# 2D array (matrix)
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"2D array shape: {matrix.shape}") # (3, 3)
# Generate special arrays
print("\n=== Special arrays ===")
zeros = np.zeros(5) # [0. 0. 0. 0. 0.]
ones = np.ones((2, 3)) # [[1. 1. 1.]
# [1. 1. 1.]]
identity = np.eye(3) # Identity matrix (1s on diagonal)
range_arr = np.arange(0, 10, 2) # [0 2 4 6 8]
linspace_arr = np.linspace(0, 1, 5) # [0. 0.25 0.5 0.75 1. ]
print(f"Zeros: {zeros}")
print(f"Range: {range_arr}")
print(f"Linspace: {linspace_arr}")
# Random arrays (useful for simulations)
print("\n=== Random arrays ===")
random_marks = np.random.randint(0, 100, 10) # 10 random marks 0-99
print(f"Random marks: {random_marks}")
random_normal = np.random.normal(75, 10, 5) # Normal distribution (mean=75, std=10)
print(f"Random normal: {random_normal}")
Array Indexing and Slicing: Smart Data Access
import numpy as np
arr = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
print(f"Original array: {arr}")
print("\n=== 1D Indexing ===")
print(f"First element: {arr[0]}") # 10
print(f"Last element: {arr[-1]}") # 100
print(f"Fifth element: {arr[4]}") # 50
print("\n=== 1D Slicing ===")
print(f"Elements 2-5: {arr[2:5]}") # [30 40 50]
print(f"Every 2nd element: {arr[::2]}") # [10 30 50 70 90]
print(f"Reversed: {arr[::-1]}") # [100 90 80 ... 20 10]
print(f"Last 3 elements: {arr[-3:]}") # [80 90 100]
print("\n=== 2D Indexing (Matrix) ===")
matrix = np.arange(1, 13).reshape(3, 4)
print("Matrix:")
print(matrix)
# Output:
# [[ 1 2 3 4]
# [ 5 6 7 8]
# [ 9 10 11 12]]
print(f"Element at row 1, col 2: {matrix[1, 2]}") # 7
print(f"First row: {matrix[0, :]}") # [1 2 3 4]
print(f"Second column: {matrix[:, 1]}") # [2 6 10]
print(f"Submatrix (first 2 rows, last 2 cols):")
print(matrix[0:2, 2:4]) # [[3 4] [7 8]]
# Real example: Extract student marks for specific subjects
print("\n=== Real Example: Student Marks ===")
student_marks = np.array([
[92, 88, 85], # Student 1: Math, Science, English
[78, 82, 79], # Student 2
[95, 93, 94], # Student 3
])
print(f"All Science scores: {student_marks[:, 1]}") # [88 82 93]
print(f"Top 2 students:\n{student_marks[:2, :]}")
Vectorized Operations: Replace Loops with Array Operations
This is where NumPy shines. Operations on arrays happen element-wise automatically:
import numpy as np
marks1 = np.array([92, 88, 95, 78, 85])
marks2 = np.array([88, 90, 92, 80, 87])
print(f"Marks 1: {marks1}")
print(f"Marks 2: {marks2}")
print("\n=== Element-wise Operations ===")
print(f"Sum: {marks1 + marks2}") # [180 178 187 158 172]
print(f"Difference: {marks1 - marks2}") # [4 -2 3 -2 -2]
print(f"Product: {marks1 * marks2}") # Element-wise multiply
print(f"Division: {marks1 / marks2}") # Element-wise divide
print(f"Power: {marks1 ** 2}") # Square each mark
print("\n=== Statistical Operations ===")
all_marks = np.array([92, 88, 95, 78, 85, 91, 76, 89])
print(f"Mean: {all_marks.mean()}") # Average
print(f"Median: {np.median(all_marks)}") # Middle value
print(f"Std Dev: {all_marks.std()}") # Spread
print(f"Min: {all_marks.min()}, Max: {all_marks.max()}")
print(f"Sum: {all_marks.sum()}")
print(f"Percentile 75: {np.percentile(all_marks, 75)}")
print("\n=== Filtering Data ===")
high_performers = all_marks[all_marks > 85]
print(f"Marks > 85: {high_performers}")
passed = all_marks[all_marks >= 60]
print(f"Passed (>= 60): {passed}")
print("\n=== More Complex Operations ===")
# Normalize marks to 0-1 scale
normalized = (all_marks - all_marks.min()) / (all_marks.max() - all_marks.min())
print(f"Normalized marks: {normalized}")
# Calculate percentage improvement
old_marks = np.array([70, 75, 80, 65])
new_marks = np.array([85, 92, 88, 78])
improvement = ((new_marks - old_marks) / old_marks) * 100
print(f"Percentage improvement: {improvement.round(1)}%")
Broadcasting: Let NumPy Handle the Details
Broadcasting automatically expands arrays to match shapes for operations:
import numpy as np
print("=== Broadcasting Example ===")
# Test scores: 5 students, 3 subjects
marks = np.array([
[80, 85, 90], # Student 1
[75, 88, 92], # Student 2
[90, 87, 85], # Student 3
[82, 84, 89], # Student 4
[95, 91, 88], # Student 5
])
# Subject weights
weights = np.array([0.3, 0.3, 0.4]) # Math 30%, Science 30%, English 40%
print("Original marks:")
print(marks)
print(f"Weights: {weights}")
# Weighted score (broadcasting does the work!)
weighted_marks = marks * weights
print("\nWeighted marks (marks * weights):")
print(weighted_marks)
# Final score per student
final_scores = weighted_marks.sum(axis=1)
print(f"\nFinal scores: {final_scores}")
# Subtract average from each student
average_per_student = marks.mean(axis=1, keepdims=True)
deviation = marks - average_per_student
print("\nDeviation from student's average:")
print(deviation.round(2))
Matrix Operations: Linear Algebra with NumPy
Solve real mathematics problems with matrix operations:
import numpy as np
print("=== Matrix Operations ===")
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print("Matrix A:")
print(A)
print("\nMatrix B:")
print(B)
# Matrix multiplication (not element-wise)
C = np.dot(A, B)
print("\nA · B (dot product):")
print(C)
# Or use @ operator (Python 3.5+)
C_alt = A @ B
print("\nA @ B (same result):")
print(C_alt)
# Transpose
print("\nTranspose of A:")
print(A.T)
# Determinant
det_A = np.linalg.det(A)
print(f"\nDeterminant of A: {det_A}")
# Inverse
inv_A = np.linalg.inv(A)
print("\nInverse of A:")
print(inv_A)
# Verify: A · A^-1 = Identity
identity_check = A @ inv_A
print("\nA @ A^-1 (should be identity):")
print(np.round(identity_check))
# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"\nEigenvalues: {eigenvalues}")
print(f"Eigenvectors:")
print(eigenvectors)
# Solve Ax = b
b = np.array([5, 11])
x = np.linalg.solve(A, b)
print(f"\nSolving Ax = b where b = {b}:")
print(f"Solution x: {x}")
print(f"Verification (A @ x): {A @ x}")
Real-World Example: ISRO Satellite Image Processing
ISRO uses NumPy to process satellite data for crop monitoring across India:
import numpy as np
print("=== ISRO Satellite Image Analysis ===")
# Simulate satellite image (multi-spectral)
# Each image is bands: Red, Green, Blue, NIR (Near Infrared)
height, width = 100, 100
red_band = np.random.randint(50, 200, (height, width))
green_band = np.random.randint(50, 200, (height, width))
blue_band = np.random.randint(50, 200, (height, width))
nir_band = np.random.randint(50, 200, (height, width))
print(f"Image dimensions: {red_band.shape}")
# Calculate NDVI (Normalized Difference Vegetation Index)
# Higher NDVI = healthier vegetation
ndvi = (nir_band.astype(float) - red_band) / (nir_band + red_band + 1e-8)
print("\nNDVI Statistics:")
print(f"Mean NDVI: {ndvi.mean():.3f}")
print(f"Min NDVI: {ndvi.min():.3f}, Max NDVI: {ndvi.max():.3f}")
print(f"Std Dev: {ndvi.std():.3f}")
# Identify healthy crops (NDVI > 0.6)
# Note: with uniform random bands in [50, 200), NDVI stays within roughly ±0.6,
# so this standard threshold will match few (often zero) pixels in this simulation
healthy_crops = (ndvi > 0.6).sum()
total_pixels = ndvi.size
health_percentage = (healthy_crops / total_pixels) * 100
print("\nCrop Health Analysis:")
print(f"Healthy pixels: {healthy_crops}/{total_pixels} ({health_percentage:.1f}%)")
# Find areas needing irrigation (NDVI < 0.3)
stressed_areas = (ndvi < 0.3).sum()
print(f"Stressed areas: {stressed_areas} pixels")
# Calculate vegetation index for better visualization
vegetation_mask = np.where(ndvi > 0.4, 1, 0)
print(f"Vegetation coverage: {vegetation_mask.sum() / vegetation_mask.size * 100:.1f}%")
Advanced: Multi-Dimensional Arrays and Reshaping
Work with 3D and higher-dimensional data for complex applications:
import numpy as np
print("=== Multi-Dimensional Arrays ===")
# 3D array: 3 classes, 4 students, 3 subjects
school_data = np.array([
# Class 8
[[92, 88, 85], [78, 82, 79], [95, 93, 94], [81, 80, 82]],
# Class 9
[[88, 90, 87], [85, 88, 90], [92, 91, 93], [79, 81, 78]],
# Class 10
[[95, 94, 93], [90, 92, 91], [88, 89, 87], [93, 95, 94]],
])
print(f"Shape: {school_data.shape}") # (3, 4, 3)
print(f"Class 8, Student 1, Math: {school_data[0, 0, 0]}") # 92
print(f"All Class 9 English scores: {school_data[1, :, 2]}") # [87 90 93 78]
# Reshape arrays
print("\n=== Reshaping ===")
arr = np.arange(24) # 1D array with 24 elements
print(f"Original shape: {arr.shape}")
# Reshape to 2x3x4
reshaped = arr.reshape(2, 3, 4)
print(f"Reshaped to (2, 3, 4): {reshaped.shape}")
# Flatten back to 1D
flattened = reshaped.flatten()
print(f"Flattened: {flattened}")
# Transpose (swap axes)
print("\n=== Transpose ===")
matrix = np.arange(12).reshape(3, 4)
print("Original (3x4):")
print(matrix)
print("\nTransposed (4x3):")
print(matrix.T)
# Stacking arrays
print("\n=== Stacking Arrays ===")
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
stacked_v = np.vstack([arr1, arr2]) # Vertical stack
print("Vertical stack:")
print(stacked_v) # 2x3
stacked_h = np.hstack([arr1, arr2]) # Horizontal stack
print("Horizontal stack:")
print(stacked_h) # [1 2 3 4 5 6]
Working with Large Datasets: Handling Missing Data
Real-world datasets have missing values. NumPy provides tools to handle them:
import numpy as np
print("=== Handling Missing Data ===")
# Create data with NaN (Not a Number)
marks = np.array([92, 88, np.nan, 78, np.nan, 85, 91, 76])
print(f"Data with missing values: {marks}")
# Count missing values
missing_count = np.isnan(marks).sum()
print(f"Missing values: {missing_count}")
# Get valid values only
valid_marks = marks[~np.isnan(marks)]
print(f"Valid marks: {valid_marks}")
# Calculate statistics ignoring NaN
print(f"Mean (ignoring NaN): {np.nanmean(marks):.2f}")
print(f"Median (ignoring NaN): {np.nanmedian(marks):.2f}")
print(f"Std Dev (ignoring NaN): {np.nanstd(marks):.2f}")
# Fill missing values with mean
marks_filled = marks.copy()
marks_filled[np.isnan(marks_filled)] = np.nanmean(marks)
print(f"After filling with mean: {marks_filled}")
# Forward fill (use previous value)
marks_ffill = marks.copy()
mask = np.isnan(marks_ffill)
idx = np.where(~mask, np.arange(len(mask)), 0)
idx = np.maximum.accumulate(idx)
marks_ffill[mask] = marks_ffill[idx[mask]]
print(f"After forward fill: {marks_ffill}")
Real-World Example: Processing Aadhaar-Based School Attendance Data
Aadhaar-linked systems are used to track attendance in schools across India. Here's how such data can be processed:
import numpy as np
print("=== School Attendance Analytics ===")
# Attendance data: 30 students, 200 school days
np.random.seed(42)
# Give each student their own attendance rate between 60% and 100%, so the
# low-attendance and perfect-attendance checks below are meaningful
rates = np.random.uniform(0.6, 1.0, 30)
attendance = (np.random.rand(30, 200) < rates[:, None]).astype(int) # 1 = present, 0 = absent
# Calculate attendance percentage per student
attendance_percent = (attendance.sum(axis=1) / attendance.shape[1]) * 100
print(f"Attendance statistics:")
print(f"Mean attendance: {attendance_percent.mean():.1f}%")
print(f"Min attendance: {attendance_percent.min():.1f}%")
print(f"Max attendance: {attendance_percent.max():.1f}%")
# Find students with low attendance (<75%)
low_attendance = np.where(attendance_percent < 75)[0]
print(f"\nStudents with <75% attendance: {low_attendance + 1}")
print(f"Their attendance: {attendance_percent[low_attendance].round(1)}%")
# Days with highest/lowest overall attendance
daily_attendance = (attendance.sum(axis=0) / attendance.shape[0]) * 100
worst_day = daily_attendance.argmin()
best_day = daily_attendance.argmax()
print(f"\nWorst attendance day: Day {worst_day + 1} ({daily_attendance[worst_day]:.1f}%)")
print(f"Best attendance day: Day {best_day + 1} ({daily_attendance[best_day]:.1f}%)")
# Find students with perfect attendance
perfect = np.where(attendance_percent == 100)[0]
print(f"\nStudents with perfect attendance: {len(perfect)} - IDs: {perfect + 1}")
# 30-day rolling average of daily attendance
rolling_avg = np.convolve(daily_attendance, np.ones(30)/30, mode='valid')
print(f"30-day rolling average shape: {rolling_avg.shape}")
Performance Optimization: From Python Loops to NumPy
import numpy as np
import time
# Problem: Calculate distance from each point to centroid
# Data: 100,000 points, 3 dimensions
np.random.seed(42)
points = np.random.randn(100000, 3)
centroid = points.mean(axis=0)
# SLOW: Pure Python with loops
start = time.time()
distances_python = []
for point in points:
dist = np.sqrt(sum((point - centroid) ** 2))
distances_python.append(dist)
python_time = time.time() - start
# FAST: NumPy vectorized
start = time.time()
distances_numpy = np.sqrt(np.sum((points - centroid) ** 2, axis=1))
numpy_time = time.time() - start
print(f"Python loops: {python_time:.4f} seconds")
print(f"NumPy vectorized: {numpy_time:.4f} seconds")
print(f"NumPy is {python_time / numpy_time:.0f}x faster!")
# Verify results match
print(f"Results match: {np.allclose(distances_python, distances_numpy)}")
Saving and Loading NumPy Arrays
import numpy as np
# Create data
marks = np.array([[92, 88, 85], [78, 82, 79], [95, 93, 94]])
# Save as binary format (fast, small)
np.save('marks.npy', marks)
loaded = np.load('marks.npy')
print(f"Loaded from .npy: {loaded}")
# Save multiple arrays
np.savez('school_data.npz', marks=marks, classes=['8A', '8B'])
data = np.load('school_data.npz')
print(f"Loaded marks from .npz: {data['marks']}")
# Save as CSV (human-readable)
np.savetxt('marks.csv', marks, delimiter=',', fmt='%d')
loaded_csv = np.loadtxt('marks.csv', delimiter=',', dtype=int)
print(f"Loaded from CSV: {loaded_csv}")
Performance Tips: Make NumPy Even Faster
- Use vectorized operations instead of loops (100x faster)
- Use appropriate dtypes (int32 vs int64, float32 vs float64) to save memory
- Avoid creating temporary arrays—chain operations when possible
- Use in-place operations: arr += 5 instead of arr = arr + 5
- Use memory-mapped arrays for huge files: np.load('file.npy', mmap_mode='r')
- For millions of rows with missing data, use pandas which builds on NumPy
- Use np.nditer for complex multi-dimensional operations
- Profile code with timeit to find bottlenecks before optimizing
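To make a few of these tips concrete, here is a small sketch (the array sizes and names are illustrative) showing the memory saved by a smaller dtype, the difference between in-place and copying operations, and a quick timeit profile:

```python
import numpy as np
import timeit

# dtype choice: int32 holds the same small values in half the memory of int64
big = np.arange(1_000_000, dtype=np.int64)
small = big.astype(np.int32)
print(f"int64: {big.nbytes} bytes, int32: {small.nbytes} bytes")

# In-place operation: modifies the existing buffer instead of allocating a new array
a = np.ones(1_000_000)
a += 5          # in-place, no temporary array
b = np.ones(1_000_000)
b = b + 5       # allocates a new array, then rebinds b

# Profile before optimizing: time 1,000 repetitions of squaring 10,000 ints
t = timeit.timeit("x ** 2",
                  setup="import numpy as np; x = np.arange(10_000)",
                  number=1_000)
print(f"Squaring 10,000 ints 1,000 times: {t:.4f} s")
```

Both `a` and `b` end up holding the same values; the difference is that the in-place version avoids a temporary million-element allocation, which matters when arrays are large or the operation runs in a loop.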
Key Takeaways
- NumPy arrays are 10-100x faster than Python lists for numerical operations
- Always think vectorized—replace loops with array operations
- Slicing and indexing work on multi-dimensional arrays automatically
- Broadcasting lets you operate on different-shaped arrays without loops
- Linear algebra functions (determinant, inverse, eigenvalues) are built in
- Reshape, transpose, and stack to transform data for analysis
- Handle missing data with nan functions (nanmean, nanmedian)
- Save arrays efficiently with .npy format or .csv for compatibility
- ISRO, TCS, and all data science companies use NumPy in production daily
- NumPy is the foundation for pandas, scikit-learn, and TensorFlow
Practice Problems
- Create a 5x5 matrix and extract diagonal elements using np.diag()
- Calculate mean and standard deviation of 100 random test scores
- Create two matrices and perform element-wise and matrix multiplication
- Filter an array of marks and return only those in range 70-90
- Normalize marks to 0-1 scale using (marks - min) / (max - min)
- Create 3x4 matrix of student grades, calculate average by subject using broadcasting
- Simulate 10,000 student marks and find percentiles (25th, 50th, 75th, 90th)
- Process attendance data with missing values using nanmean and filling techniques
- Reshape a 1D array of 100 elements into different 2D configurations
- Calculate correlation matrix between 4 subjects from student marks data
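For the last problem, the chapter hasn't introduced np.corrcoef, so here is a minimal sketch (the marks below are made up). Note that np.corrcoef treats each row as one variable, so a (students x subjects) matrix must be transposed first:

```python
import numpy as np

# Made-up marks: 4 students (rows) x 4 subjects (columns)
marks = np.array([
    [92, 88, 85, 90],
    [78, 82, 79, 75],
    [95, 93, 94, 96],
    [60, 65, 70, 62],
])

# Transpose so each row is one subject's scores across students
corr = np.corrcoef(marks.T)   # 4x4 correlation matrix between subjects
print(corr.round(2))
```

The result is symmetric with 1.0 on the diagonal (every subject correlates perfectly with itself); off-diagonal entries near 1.0 mean students who do well in one subject tend to do well in the other.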
Challenge Exercises
Think about this: How would you explain NumPy basics to someone who has never programmed before? What analogy or metaphor would make it click? If you were building a real application, which concepts from this chapter would you use first?
Try this exercise: implement one concept from this chapter from scratch, without looking at the examples. Then compare your solution. What did you learn?