Pandas, SciPy, and scikit-learn
lists
(optimized C)
Install
Import
Check Ch.3 for more information
np.full(dim, value): an array filled with a repeated value
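As a minimal sketch of `np.full`, the snippet below fills a 2x3 array with a single value. The shape and the fill value are arbitrary choices for illustration.

```python
import numpy as np

# np.full(shape, fill_value): every element holds the same value
filled = np.full((2, 3), 7)
print(filled)
```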
ndim: the number of dimensions
shape: the size of each dimension
size: the total size of the array
dtype: the data type of the array elements
itemsize: the size (in bytes) of each array element
nbytes: the total size (in bytes) of the array
print(x2)
print("ndim: ", x2.ndim)
print("shape:", x2.shape)
print("size: ", x2.size)
print("dtype: ", x2.dtype)
print("itemsize:", x2.itemsize, "bytes")
print("nbytes: ", x2.nbytes, "bytes")
[[3 5 2 4]
[7 6 8 8]
[1 6 7 7]]
ndim: 2
shape: (3, 4)
size: 12
dtype: int64
itemsize: 8 bytes
nbytes: 96 bytes
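The listing above prints attributes of an array `x2` whose creation is not shown. A minimal sketch, assuming a 3x4 array of 64-bit integers (the values here are a hypothetical stand-in, not the original data), reproduces the same attribute values:

```python
import numpy as np

# Hypothetical stand-in for x2: a 3x4 array of 64-bit integers
x2 = np.arange(12, dtype=np.int64).reshape(3, 4)

print("ndim:    ", x2.ndim)
print("shape:   ", x2.shape)
print("size:    ", x2.size)
print("dtype:   ", x2.dtype)
print("itemsize:", x2.itemsize, "bytes")
print("nbytes:  ", x2.nbytes, "bytes")

# nbytes is the element count times the size of one element
total_bytes = x2.size * x2.itemsize
```

Note that `nbytes` equals `size * itemsize`, which is why 12 elements of 8 bytes each give 96 bytes.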
dtype: assigns the data type when creating a NumPy array
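A short sketch of passing `dtype` at creation time follows. The particular types chosen are only examples.

```python
import numpy as np

# The dtype keyword fixes the element type when the array is created
ints = np.array([1, 2, 3], dtype=np.int16)     # 2 bytes per element
floats = np.zeros(3, dtype=np.float32)         # 4 bytes per element
as_float = np.array([1, 2, 3], dtype=float)    # Python float maps to float64

print(ints.dtype, floats.dtype, as_float.dtype)
```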
x[row, column]
-1: last index
Slice (:) operator: slice arrays with x[start:stop:step]
Defaults: start=0, stop=size of dimension, step=1
x[start:stop:step] returns a view of the original array, not a copy; use .copy() if you need a copy
reshape(dim): gives a new shape to an array
You’re working on a project to analyze and manipulate digital images using NumPy. Your task is to perform various operations on image data represented as NumPy arrays.
import numpy as np
import matplotlib.pyplot as plt
# Create a sample 8x8 grayscale image (0-255 values)
image = np.array([
[50, 50, 50, 50, 200, 200, 200, 200],
[50, 50, 50, 50, 200, 200, 200, 200],
[50, 90, 90, 50, 200, 200, 200, 200],
[50, 90, 90, 50, 200, 200, 200, 200],
[50, 50, 50, 50, 200, 200, 200, 200],
[50, 50, 50, 50, 200, 200, 200, 200],
[50, 50, 50, 50, 50, 50, 50, 50],
[50, 50, 50, 50, 50, 50, 50, 50]
])
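The indexing, slicing, and reshape notes can be tried on any 2D array, including `image`. The sketch below uses a small hypothetical 4x4 grid so the results are easy to check by eye. It is one possible illustration, not the exercise solution.

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)    # reshape: 16 values -> 4x4

last = grid[-1, -1]                   # -1 selects the last row / last column
every_other = grid[::2, ::2]          # step of 2 along both axes

corner_copy = grid[:2, :2].copy()     # independent copy of the top-left block
corner_copy[0, 0] = 99                # grid[0, 0] is unchanged

corner = grid[:2, :2]                 # a slice is a view into grid
corner[1, 1] = -1                     # so grid[1, 1] changes as well

print(grid)
```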
np.concatenate([array1, array2, ...]): Concatenates arrays along an existing axis (the first axis by default).
np.vstack([array1, array2]): Vertical stack (row-wise concatenation).
np.hstack([array1, array2]): Horizontal stack (column-wise concatenation).
np.concatenate([array1, array2, ...]): Concatenates arrays along an existing axis (the first axis by default).
array([[1, 2, 3],
[4, 5, 6],
[1, 2, 3],
[4, 5, 6]])
Set axis=1 for the second axis
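A minimal sketch of both cases, assuming a 2x3 array named `grid` (the name is chosen here for illustration):

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

rows = np.concatenate([grid, grid])            # axis=0 (default): shape (4, 3)
cols = np.concatenate([grid, grid], axis=1)    # axis=1: shape (2, 6)

print(rows)
print(cols)
```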
np.vstack([array1, array2]): Vertical stack (row-wise concatenation).
np.hstack([array1, array2]): Horizontal stack (column-wise concatenation).
The opposite of concatenation is splitting.
np.split(np_array, [split points]) or np.split(np_array, number of sections)
np.vsplit(np_array, [split points]) or np.vsplit(np_array, number of sections)
np.hsplit(np_array, [split points]) or np.hsplit(np_array, number of sections)
split points
[1 2 3] [99 99] [3 2 1]
number of sections
[[1 2 3]
[3 2 1]
[1 2 3]]
np.vsplit(np_array, [split points]) or np.vsplit(np_array, number of sections)
[[1 2 3]
[3 2 1]
[1 2 3]]
np.hsplit(np_array, [split points]) or np.hsplit(np_array, number of sections)
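The two calling styles can be sketched as follows. The input arrays are assumptions chosen so the pieces are easy to recognize.

```python
import numpy as np

x = np.array([1, 2, 3, 99, 99, 3, 2, 1])
x1, x2, x3 = np.split(x, [3, 5])       # split points: cut before index 3 and index 5
halves = np.split(x, 2)                # number of sections: two equal parts

grid = np.arange(16).reshape(4, 4)
upper, lower = np.vsplit(grid, [2])    # split the rows at index 2
left, right = np.hsplit(grid, 2)       # two equal blocks of columns

print(x1, x2, x3)
```

Passing an integer requires that the axis divides evenly into that many sections; a list of split points has no such restriction.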
You’re a meteorologist working on analyzing and combining weather data from multiple stations.
import numpy as np
import matplotlib.pyplot as plt
# Generate sample weather data for 3 stations over 5 days
np.random.seed(42)
station1 = np.random.randint(15, 25, size=(5, 3)) # 5 days, temp/humidity/wind
station2 = np.random.randint(18, 28, size=(5, 3))
station3 = np.random.randint(20, 30, size=(5, 3))
print("Station 1 data:")
print(station1)
Station 1 data:
[[21 18 22]
[19 21 24]
[17 21 22]
[19 18 22]
[22 17 20]]
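One possible way to combine and split the station data is sketched below. The variable names `all_rows`, `side_by_side`, `first_days`, and `last_days` are hypothetical, and the exercise may intend a different layout.

```python
import numpy as np

# Same synthetic data as above
np.random.seed(42)
station1 = np.random.randint(15, 25, size=(5, 3))
station2 = np.random.randint(18, 28, size=(5, 3))
station3 = np.random.randint(20, 30, size=(5, 3))

# Stack the stations on top of each other: 15 rows (3 stations x 5 days), 3 columns
all_rows = np.vstack([station1, station2, station3])

# Place the stations side by side: 5 days, 9 columns (3 measurements per station)
side_by_side = np.hstack([station1, station2, station3])

# Split one station into its first three days and the remaining two
first_days, last_days = np.vsplit(station1, [3])

print(all_rows.shape, side_by_side.shape)
```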
big_array = np.random.randint(1, 100, size=1000000)
#Compare the time between looping and ufuncs.
%timeit compute_reciprocals(big_array) #previous slide's method
%timeit (1.0 / big_array) #ufunc implementation
1.27 s ± 8.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.94 ms ± 17.4 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
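The timing above relies on a `compute_reciprocals` function from an earlier slide that is not reproduced here. The sketch below assumes a plain element-by-element loop for it and uses the standard `timeit` module in place of the `%timeit` notebook magic, with a smaller array so it runs quickly. Actual timings depend on the machine.

```python
import timeit
import numpy as np

def compute_reciprocals(values):
    """Loop-based reciprocal (assumed definition; the original is not shown)."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

rng = np.random.default_rng(42)
small_array = rng.integers(1, 100, size=10_000)   # values in 1..99, so no division by zero

loop_result = compute_reciprocals(small_array)
ufunc_result = 1.0 / small_array                  # vectorized ufunc version

loop_time = timeit.timeit(lambda: compute_reciprocals(small_array), number=3)
ufunc_time = timeit.timeit(lambda: 1.0 / small_array, number=3)
print(f"loop: {loop_time:.4f} s, ufunc: {ufunc_time:.6f} s")
```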
Unary ufuncs (one input): np.abs, np.sin, np.exp.
Binary ufuncs (two inputs): np.add, np.subtract, np.multiply, np.divide, np.power.
x = np.arange(4)
print("x =", x)
print("x + 5 =", x + 5) #np.add(x,5)
print("x - 5 =", x - 5) #np.subtract(x,5)
print("x * 2 =", x * 2) #np.multiply(x,2)
print("x / 2 =", x / 2) #np.divide(x,2)
print("x // 2 =", x // 2) # Floor division
print("-x =", -x)
print("x ** 2 =", x ** 2)
print("x % 2 =", x % 2)
x = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0. 0.5 1. 1.5]
x // 2 = [0 0 1 1]
-x = [ 0 -1 -2 -3]
x ** 2 = [0 1 4 9]
x % 2 = [0 1 0 1]
| Operator | Equivalent UFunc | Description |
|---|---|---|
| + | np.add | Addition |
| - | np.subtract | Subtraction |
| - | np.negative | Unary negation |
| * | np.multiply | Multiplication |
| / | np.divide | Division |
| // | np.floor_divide | Floor division |
| ** | np.power | Exponentiation |
| % | np.mod | Modulus/remainder |
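The correspondence in the table can be checked directly. This is a small sketch; the dictionary `checks` is just a convenient way to collect the comparisons.

```python
import numpy as np

x = np.arange(4)

# Each operator should give the same result as its ufunc
checks = {
    "+":  np.array_equal(x + 5, np.add(x, 5)),
    "-":  np.array_equal(x - 5, np.subtract(x, 5)),
    "neg": np.array_equal(-x, np.negative(x)),
    "*":  np.array_equal(x * 2, np.multiply(x, 2)),
    "/":  np.array_equal(x / 2, np.divide(x, 2)),
    "//": np.array_equal(x // 2, np.floor_divide(x, 2)),
    "**": np.array_equal(x ** 2, np.power(x, 2)),
    "%":  np.array_equal(x % 2, np.mod(x, 2)),
}
print(checks)
```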
np.sin(), np.cos(), np.tan()
# np.linspace returns evenly spaced numbers over a specified interval
theta = np.linspace(0, np.pi, 3)
print("theta =", theta)
print("sin(theta) =", np.sin(theta))
print("cos(theta) =", np.cos(theta))
print("tan(theta) =", np.tan(theta))
theta = [0. 1.57079633 3.14159265]
sin(theta) = [0.0000000e+00 1.0000000e+00 1.2246468e-16]
cos(theta) = [ 1.000000e+00 6.123234e-17 -1.000000e+00]
tan(theta) = [ 0.00000000e+00 1.63312394e+16 -1.22464680e-16]
np.arcsin(), np.arccos(), np.arctan()
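A brief sketch of the inverse trigonometric ufuncs, using the inputs -1, 0, and 1 as an illustrative choice:

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0])

asin = np.arcsin(x)   # angles in radians, range [-pi/2, pi/2]
acos = np.arccos(x)   # range [0, pi]
atan = np.arctan(x)   # range (-pi/2, pi/2)

print("arcsin(x) =", asin)
print("arccos(x) =", acos)
print("arctan(x) =", atan)
```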
np.exp(), np.exp2(), np.power()
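The exponential ufuncs can be sketched on a small input. The array and the base 3 are arbitrary choices.

```python
import numpy as np

x = np.array([1, 2, 3])

e_x = np.exp(x)            # e ** x
two_x = np.exp2(x)         # 2 ** x
three_x = np.power(3, x)   # 3 ** x

print("e^x =", e_x)
print("2^x =", two_x)
print("3^x =", three_x)
```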
np.log() (natural log), np.log2() (base-2), np.log10() (base-10)
You’re analyzing sensor data from industrial equipment. The dataset contains temperature (℃), vibration (mm/s²), and pressure (kPa) readings sampled every 10 minutes over 30 days.
# Generate synthetic sensor data (4320 = 30 days * 144 samples/day)
np.random.seed(42)
time_points = 4320
temperature = 25 + 10 * np.sin(2 * np.pi * np.arange(time_points)/144) + np.random.normal(0, 1, time_points)
vibration = np.abs(2 * np.random.randn(time_points) + np.sin(np.arange(time_points)/100))
pressure = 100 + 20 * np.cos(2 * np.pi * np.arange(time_points)/288) + np.random.normal(0, 3, time_points)
np.clip
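As a minimal sketch of `np.clip`, the snippet below limits readings to a range. The sample readings and the bounds 15 and 35 are hypothetical and not taken from the exercise.

```python
import numpy as np

readings = np.array([12.0, 25.5, 41.2, 18.3, -3.0])

# Values below 15 become 15, values above 35 become 35, the rest are unchanged
clipped = np.clip(readings, 15, 35)
print(clipped)
```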
scipy.special is an excellent source for more specialized mathematical functions.
scipy.special: gamma(), gammaln(), beta(), erf(), erfc(), erfinv()
from scipy import special
import numpy as np
# Gamma functions and related functions
x = [1, 5, 10]
print("gamma(x) =", special.gamma(x))
print("ln|gamma(x)| =", special.gammaln(x))
print("beta(x, 2) =", special.beta(x, 2))
gamma(x) = [1.0000e+00 2.4000e+01 3.6288e+05]
ln|gamma(x)| = [ 0. 3.17805383 12.80182748]
beta(x, 2) = [0.5 0.03333333 0.00909091]
Python's built-in sum() function:
NumPy's np.sum() function:
np.sum() is Faster
Python's built-in min() and max() functions
(3.7745761005680833e-07, 0.9999993822414859)
NumPy's np.min() and np.max() functions
np.min() and np.max() are Faster!
For min, max, sum, etc., you can also use the array object’s methods:
Multidimensional aggregates take an axis argument.
axis Argument Example (Columns)
axis=0: collapses the first axis
axis Argument Example (Rows)
axis=1: collapses the second axis
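The effect of `axis` can be sketched on a small 3x4 array. The array `M` below is an assumption chosen so the results are easy to verify by hand.

```python
import numpy as np

M = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

total = M.sum()            # no axis: aggregate over every element
col_min = M.min(axis=0)    # collapse the rows: one value per column
row_max = M.max(axis=1)    # collapse the columns: one value per row

print(total)      # 78
print(col_min)    # [1 2 3 4]
print(row_max)    # [ 4  8 12]
```

The axis argument names the dimension that disappears, so `axis=0` on a (3, 4) array returns 4 values and `axis=1` returns 3.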
Aggregation functions (NaN-safe version in parentheses):
np.sum (np.nansum): Compute sum of elements
np.prod (np.nanprod): Compute product of elements
np.mean (np.nanmean): Compute mean of elements
np.std (np.nanstd): Compute standard deviation
np.var (np.nanvar): Compute variance
np.min (np.nanmin): Find minimum value
np.max (np.nanmax): Find maximum value
np.argmin (np.nanargmin): Find index of minimum value
np.argmax (np.nanargmax): Find index of maximum value
np.median (np.nanmedian): Compute median of elements
np.percentile (np.nanpercentile): Compute rank-based statistics of elements
np.any: Evaluate whether any elements are true
np.all: Evaluate whether all elements are true
To download data successfully over HTTPS (encrypted transport), first disable certificate verification.
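The code for this step is not shown. One commonly used approach, given here only as an assumption about what was intended, swaps the default HTTPS context of the standard `ssl` module for an unverified one, so that `urllib`-based downloads such as `pd.read_csv` on an `https` URL skip the certificate check. It should only be used with sources you trust.

```python
import ssl

# Assumption: the original slide used something along these lines.
# Replaces the default HTTPS context factory with one that does not verify certificates.
ssl._create_default_https_context = ssl._create_unverified_context
```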
!pip3 install seaborn
import numpy as np
import pandas as pd
data = pd.read_csv('https://raw.githubusercontent.com/jakevdp/PythonDataScienceHandbook/refs/heads/master/notebooks/data/president_heights.csv')
heights = np.array(data['height(cm)'])
print("Mean height: ", heights.mean())
print("Standard deviation:", heights.std())
print("Minimum height: ", heights.min())
print("Maximum height: ", heights.max())
print("25th percentile: ", np.percentile(heights, 25))
print("Median: ", np.median(heights))
print("75th percentile: ", np.percentile(heights, 75))
Mean height: 180.04545454545453
Standard deviation: 6.983599441335736
Minimum height: 163
Maximum height: 193
25th percentile: 174.75
Median: 182.0
75th percentile: 183.5
You’re a financial analyst tasked with analyzing historical stock data for several tech companies.
np.random.seed(42)
companies = ['TechCorp', 'DataSys', 'AIGlobal', 'CloudNet', 'CyberSec']
trading_days = 252
stock_data = np.random.randint(100, 200, size=(len(companies), trading_days))
stock_data = stock_data * (1 + np.random.randn(len(companies), trading_days) * 0.01) # Add some randomness
print("Stock Data Shape:", stock_data.shape)
print("First six days of data:\n", stock_data[:, :6])
Stock Data Shape: (5, 252)
First six days of data:
[[149.4074983 190.59940972 112.19491634 173.42047559 158.72559527
120.79293028]
[196.30842854 174.9778714 157.40229724 169.92405711 178.52343482
191.48421877]
[186.46131515 155.38937619 126.11957816 175.40848392 188.43959037
169.95625021]
[190.52887619 106.23981577 156.97119204 161.0857364 147.58640448
124.25445626]
[123.82077119 161.16804061 194.75072474 158.84621237 155.07543654
159.51183387]]
np.diff
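A minimal sketch of `np.diff` on price data follows. The small `prices` array is hypothetical; the same calls apply to `stock_data`, where rows are companies and columns are trading days.

```python
import numpy as np

prices = np.array([[100.0, 102.0, 101.0, 105.0],
                   [50.0, 55.0, 60.0, 57.0]])

# Difference between consecutive days along axis=1 (one column fewer than prices)
daily_change = np.diff(prices, axis=1)

# Daily return: change divided by the previous day's price
daily_return = daily_change / prices[:, :-1]

print(daily_change)
print(daily_return)
```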
Enabling UFuncs to operate on arrays of different sizes
“stretching” or “duplicating” the smaller array to match the shape of the larger array
The 1D array a is “broadcast” across the 2nd dimension of M
Both a and b are broadcast to a common shape
[[1. 1. 1.]
[1. 1. 1.]]
[0 1 2]
[[1. 2. 3.]
[1. 2. 3.]]
M.shape = (2, 3)
a.shape = (3,)
Pad the shape of a with ones (on the left):
M.shape -> (2, 3)
a.shape -> (1, 3)
Stretch the first dimension of a to match M:
M.shape -> (2, 3)
a.shape -> (2, 3)
[[0]
[1]
[2]]
[0 1 2]
[[0 1 2]
[1 2 3]
[2 3 4]]
a.shape = (3, 1)
b.shape = (3,)
Pad the shape of b with ones (left):
a.shape -> (3, 1)
b.shape -> (1, 3)
Stretch both a and b along their size-1 dimensions:
a.shape -> (3, 3)
b.shape -> (3, 3)
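Both walkthroughs can be reproduced in a few lines. In this sketch the names `a_col` and `b_row` stand in for the second example's `a` and `b`, and `np.broadcast` is used only to report the common shape.

```python
import numpy as np

# Example 1: (2, 3) + (3,) -> (2, 3)
M = np.ones((2, 3))
a = np.arange(3)
print(M + a)

# Example 2: (3, 1) + (3,) -> (3, 3)
a_col = np.arange(3)[:, np.newaxis]   # column vector, shape (3, 1)
b_row = np.arange(3)                  # shape (3,)
table = a_col + b_row
print(table)

# np.broadcast reports the common shape without computing the sum
common_shape = np.broadcast(a_col, b_row).shape
print(common_shape)   # (3, 3)
```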
M = np.ones((3, 2))
a = np.arange(3)
print(M)
print(a)
# This will raise a ValueError:
# print(M + a)
[[1. 1.]
[1. 1.]
[1. 1.]]
[0 1 2]
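M.shape = (3, 2)
a.shape = (3,)
Pad the shape of a with ones (on the left):
M.shape -> (3, 2)
a.shape -> (1, 3)
Stretch the first dimension of a to match M:
M.shape -> (3, 2)
a.shape -> (3, 3)
The final shapes (3, 2) and (3, 3) do not match, so the two arrays are incompatible and the operation fails.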
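The failure, and one way around it, can be sketched as follows. Reshaping `a` into a column with `np.newaxis` is a possible fix when the intent is to add `a` down the rows; whether that is the intended operation depends on the problem.

```python
import numpy as np

M = np.ones((3, 2))
a = np.arange(3)

try:
    M + a                      # shapes (3, 2) and (3,) cannot be broadcast
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)

# Turning a into a (3, 1) column makes the shapes compatible: (3, 2) + (3, 1) -> (3, 2)
fixed = M + a[:, np.newaxis]
print(fixed)
```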
Centering an Array
[[0.39543427 0.31320626 0.14706185]
[0.4878512 0.13004778 0.83548024]
[0.37039699 0.37480357 0.47400643]
[0.44412771 0.99485797 0.54538112]
[0.0882017 0.60687734 0.15158429]
[0.03093764 0.97074215 0.77353889]
[0.98010899 0.47787845 0.53225753]
[0.16833696 0.24690743 0.83854921]
[0.35618565 0.82681497 0.70901372]
[0.17717093 0.50744637 0.67738732]]
[ 4.99600361e-17 -9.99200722e-17 -6.66133815e-17]
Broadcasting allows subtracting the feature means from each observation efficiently.
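The centering step can be sketched as below, assuming 10 observations of 3 features. The random values differ from the output shown above because the generator and seed are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 3))        # 10 observations, 3 features

Xmean = X.mean(axis=0)         # per-feature mean, shape (3,)
X_centered = X - Xmean         # (10, 3) - (3,): the mean row is broadcast over all rows

# Each feature of the centered array has a mean of zero, up to floating-point error
print(X_centered.mean(axis=0))
```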
You have temperature data for multiple cities over a week.
temperatures = np.random.randint(15, 35, size=(5, 7))
cities = ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
print(temperatures)
[[26 25 21 18 33 15 17]
[27 29 16 25 18 33 31]
[17 32 22 33 19 21 22]
[19 31 19 18 33 16 25]
[27 29 26 25 21 19 31]]
np.argsort()
np.dot()
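A minimal sketch of the two hinted functions, using a small hypothetical table (3 cities, 3 days) and made-up weights rather than the exercise data:

```python
import numpy as np

temps = np.array([[20, 22, 24],
                  [30, 28, 26],
                  [15, 15, 18]])

# Rank the cities by mean temperature: argsort returns the indices that would sort the array
city_means = temps.mean(axis=1)            # [22., 28., 16.]
coldest_first = np.argsort(city_means)     # [2, 0, 1]
warmest_first = coldest_first[::-1]        # [1, 0, 2]

# Weighted average per city: dot product of each row with a weight vector
weights = np.array([0.5, 0.25, 0.25])      # hypothetical weights that sum to 1
weighted = np.dot(temps, weights)          # [21.5, 28.5, 15.75]

print(coldest_first, warmest_first, weighted)
```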
rainfall = pd.read_csv('https://raw.githubusercontent.com/amankharwal/Website-data/refs/heads/master/Seattle2014.csv')['PRCP'].values
inches = rainfall / 254 # 1/10mm -> inches
print("Number days without rain: ", np.sum(inches == 0))
print("Number days with rain: ", np.sum(inches != 0))
print("Days with more than 0.5 inches:", np.sum(inches > 0.5))
print("Rainy days with < 0.2 inches :", np.sum((inches > 0) &
(inches < 0.2)))
Number days without rain: 215
Number days with rain: 150
Days with more than 0.5 inches: 37
Rainy days with < 0.2 inches : 75
any() and all()
any(): Are any values True?
all(): Are all values True?
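A short sketch of both checks on a comparison result. The `inches` values here are hypothetical, not the Seattle data.

```python
import numpy as np

inches = np.array([0.0, 0.3, 0.0, 1.2, 0.1])

any_heavy = np.any(inches > 1.0)     # True: at least one day exceeds 1 inch
all_nonneg = np.all(inches >= 0)     # True: no negative readings
all_rainy = np.all(inches > 0)       # False: some days had no rain

print(any_heavy, all_nonneg, all_rainy)
```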