Statistics involves gathering data, analyzing it, and drawing conclusions based on the information collected.
NumPy provides us with various statistical functions that can perform statistical data analysis.
Common NumPy Statistical Functions
Here are some of the statistical functions provided by NumPy:
Functions | Descriptions |
---|---|
median() | return the median of an array |
mean() | return the mean of an array |
std() | return the standard deviation of an array |
percentile() | return the nth percentile of elements in an array |
min() | return the minimum element of an array |
max() | return the maximum element of an array |
Next, we will see examples using these functions.
Find Median Using NumPy
The median value of a NumPy array is the middle value of the array once its elements are sorted.
In other words, it is the value that separates the higher half from the lower half of the data.
Suppose we have the following list of numbers:
1, 5, 7, 8, 9, 12, 14
The median is simply the middle number, which in this case is 8.
It is important to note that if the number of elements is
- Odd, the median is the middle element.
- Even, the median is the average of the two middle elements.
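To make the odd/even rule concrete, here is a minimal pure-Python sketch of computing a median by hand. The helper `manual_median` is hypothetical and written only for illustration; it is not part of NumPy. It sorts the values first, so the input does not need to be in order.

```python
import numpy as np

def manual_median(values):
    """Hypothetical helper: sort the values, then take the middle element
    (odd count) or the average of the two middle elements (even count)."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return float(data[mid])
    return (data[mid - 1] + data[mid]) / 2

print(manual_median([1, 5, 7, 8, 9, 12, 14]))  # 8.0 (odd count: middle element)
print(manual_median([9, 1, 14, 7, 12, 5, 8]))  # 8.0 (same values, unsorted input)
print(manual_median([1, 2, 3, 4, 5, 7]))       # 3.5 (even count: average of 3 and 4)
print(np.median([1, 2, 3, 4, 5, 7]))           # 3.5 (NumPy agrees)
```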
Now, we will learn how to calculate the median using NumPy for arrays with an odd and an even number of elements.
Example 1: Compute Median for Odd Number of Elements
import numpy as np
# create a 1D array with 5 elements
array1 = np.array([1, 2, 3, 4, 5])
# calculate the median
median = np.median(array1)
print(median)
# Output: 3.0
In the above example, the array named array1 contains an odd number of elements (5 elements).
So, np.median(array1) returns the median of array1 as 3.0, which is the middle value of the sorted array.
Example 2: Compute Median for Even Number of Elements
import numpy as np
# create a 1D array with 6 elements
array1 = np.array([1, 2, 3, 4, 5, 7])
# calculate the median
median = np.median(array1)
print(median)
# Output: 3.5
Here, since array1 has an even number of elements (6 elements), the median is calculated as the average of the two middle elements (3 and 4), i.e. 3.5.
Median of NumPy 2D Array
Calculation of the median is not limited to 1D arrays. We can also calculate the median of a 2D array.
In a 2D array, median can be calculated either along the horizontal or the vertical axis individually, or across the entire array.
When computing the median of a 2D array, we use the axis parameter inside np.median() to specify the axis along which to compute the median.
If we specify,
- axis = 0, the median is calculated along the vertical axis (one median per column)
- axis = 1, the median is calculated along the horizontal axis (one median per row)
If we don't use the axis parameter, the median is computed over the entire array.
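One way to check what axis does is to compare np.median() with an explicit loop over columns or rows. The small 2x3 array below is an illustrative assumption, chosen with a different number of rows and columns so that the two results have different lengths.

```python
import numpy as np

# illustrative 2x3 array: 2 rows, 3 columns
data = np.array([[1, 9, 4],
                 [7, 3, 6]])

# axis=0 collapses the rows: one median per column
col_medians = np.median(data, axis=0)
# axis=1 collapses the columns: one median per row
row_medians = np.median(data, axis=1)

# the same results computed column by column and row by row
manual_cols = np.array([np.median(data[:, j]) for j in range(data.shape[1])])
manual_rows = np.array([np.median(data[i, :]) for i in range(data.shape[0])])

print(col_medians)  # [4. 6. 5.]
print(row_medians)  # [4. 6.]
```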
Example: Compute the median of a 2D array
import numpy as np
# create a 2D array
array1 = np.array([[2, 4, 6],
[8, 10, 12],
[14, 16, 18]])
# compute median along horizontal axis
result1 = np.median(array1, axis=1)
print("Median along horizontal axis :", result1)
# compute median along vertical axis
result2 = np.median(array1, axis=0)
print("Median along vertical axis:", result2)
# compute median of entire array
result3 = np.median(array1)
print("Median of entire array:", result3)
Output
Median along horizontal axis : [ 4. 10. 16.]
Median along vertical axis: [ 8. 10. 12.]
Median of entire array: 10.0
In this example, we have created a 2D array named array1.
We then computed the median along the horizontal and vertical axes individually and then computed the median of the entire array.
- np.median(array1, axis=1) - median along the horizontal axis, which gives [4. 10. 16.]
- np.median(array1, axis=0) - median along the vertical axis, which gives [8. 10. 12.]
- np.median(array1) - median over the entire array, which gives 10.0
To calculate the median over the entire 2D array, the array is effectively flattened to [2, 4, 6, 8, 10, 12, 14, 16, 18], and then the middle value of the flattened array is found, which in our case is 10.
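The flattening step can be checked directly with ravel(), which returns an array's elements as a 1D array. The sketch also uses a hypothetical 2x2 array with an even number of elements, where the median of the flattened values is the average of the two middle ones.

```python
import numpy as np

array1 = np.array([[2, 4, 6],
                   [8, 10, 12],
                   [14, 16, 18]])
# the median of the whole array equals the median of its flattened (1D) version
print(np.median(array1) == np.median(array1.ravel()))  # True

# hypothetical 2x2 array: 4 elements, so the two middle values are averaged
array2 = np.array([[7, 1],
                   [3, 5]])
print(np.sort(array2.ravel()))  # [1 3 5 7]
print(np.median(array2))        # 4.0
```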
Compute Mean Using NumPy
The mean value of a NumPy array is the average value of all the elements in the array.
It is calculated by adding all elements in the array and then dividing the result by the total number of elements in the array.
We use the np.mean() function to calculate the mean value. For example,
import numpy as np
# create a numpy array
marks = np.array([76, 78, 81, 66, 85])
# compute the mean of marks
mean_marks = np.mean(marks)
print(mean_marks)
# Output: 77.2
In this example, the mean value is 77.2, which is calculated by adding the elements (76, 78, 81, 66, 85) and dividing the result by 5 (total number of array elements).
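The same arithmetic can be written out explicitly with sum() and size, which serves as a quick sanity check on np.mean().

```python
import numpy as np

marks = np.array([76, 78, 81, 66, 85])
# add all elements, then divide by the number of elements
manual_mean = marks.sum() / marks.size

print(marks.sum())                              # 386
print(manual_mean)                              # 77.2
print(np.isclose(manual_mean, np.mean(marks)))  # True
```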
Example 3: Mean of NumPy N-d Array
import numpy as np
# create a 2D array
array1 = np.array([[1, 3],
[5, 7]])
# calculate the mean of the entire array
result1 = np.mean(array1)
print("Entire Array:",result1) # 4.0
# calculate the mean along vertical axis (axis=0)
result2 = np.mean(array1, axis=0)
print("Along Vertical Axis:",result2) # [3. 5.]
# calculate the mean along horizontal axis (axis=1)
result3 = np.mean(array1, axis=1)
print("Along Horizontal Axis :",result3) # [2. 6.]
Output
Entire Array: 4.0
Along Vertical Axis: [3. 5.]
Along Horizontal Axis : [2. 6.]
Here, we first created the 2D array named array1. We then calculated the mean using np.mean().
- np.mean(array1) - calculates the mean over the entire array
- np.mean(array1, axis=0) - calculates the mean along the vertical axis (one mean per column)
- np.mean(array1, axis=1) - calculates the mean along the horizontal axis (one mean per row)
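The same axis parameter works for arrays with more than two dimensions, and np.mean() also accepts a tuple of axes. The 3D array below, built with np.arange() and reshape(), is an illustrative assumption rather than part of the original example.

```python
import numpy as np

# illustrative 3D array with shape (2, 2, 3), holding the values 0..11
array3d = np.arange(12).reshape(2, 2, 3)

# mean of all 12 elements
print(np.mean(array3d))                # 5.5
# averaging over axis 0 removes that dimension: (2, 2, 3) -> (2, 3)
print(np.mean(array3d, axis=0).shape)  # (2, 3)
# averaging over axes 1 and 2 leaves one mean per 2x3 block
print(np.mean(array3d, axis=(1, 2)))   # [2.5 8.5]
```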
Standard Deviation of NumPy Array
The standard deviation is a measure of the spread of the data in the array. It gives us the degree to which the data points in an array deviate from the mean.
- Smaller standard deviation indicates that the data points are closer to the mean
- Larger standard deviation indicates that the data points are more spread out.
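As a quick illustration of the two points above, the sketch compares two hypothetical arrays that share the same mean but differ in spread.

```python
import numpy as np

# two hypothetical arrays with the same mean (10) but different spread
tight = np.array([9, 10, 11])
wide = np.array([0, 10, 20])

print(np.mean(tight), np.mean(wide))  # 10.0 10.0
# values close to the mean give a smaller standard deviation (about 0.816 vs about 8.165)
print(np.std(tight) < np.std(wide))   # True
```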
In NumPy, we use the np.std() function to calculate the standard deviation of an array.
Example: Compute the Standard Deviation in NumPy
import numpy as np
# create a numpy array
marks = np.array([76, 78, 81, 66, 85])
# compute the standard deviation of marks
std_marks = np.std(marks)
print(std_marks)
# Output: approximately 6.3687
In the above example, we have used the np.std() function to calculate the standard deviation of the marks array.
Here, the result (approximately 6.37) is the standard deviation of marks. It tells us how much the values in the marks array typically deviate from the array's mean value, 77.2.
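To see where this number comes from, the sketch below recomputes it from the definition: the square root of the mean squared deviation from the mean. By default np.std() divides by n (its ddof parameter defaults to 0); passing ddof=1 divides by n - 1 instead, which gives the sample standard deviation.

```python
import numpy as np

marks = np.array([76, 78, 81, 66, 85])

# deviation of each mark from the mean (77.2)
deviations = marks - marks.mean()
# population standard deviation: sqrt of the mean squared deviation
manual_std = np.sqrt(np.mean(deviations ** 2))

print(manual_std)             # about 6.3687
print(np.std(marks))          # same value
# ddof=1 divides the squared deviations by n - 1 instead of n
print(np.std(marks, ddof=1))  # about 7.1204
```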
Standard Deviation of NumPy 2D Array
In a 2D array, standard deviation can be calculated either along the horizontal or the vertical axis individually, or across the entire array.
Similar to mean and median, when computing the standard deviation of a 2D array, we use the axis parameter inside np.std() to specify the axis along which to compute the standard deviation.
Example: Compute the Standard Deviation of a 2D array.
import numpy as np
# create a 2D array
array1 = np.array([[2, 5, 9],
[3, 8, 11],
[4, 6, 7]])
# compute standard deviation along horizontal axis
result1 = np.std(array1, axis=1)
print("Standard deviation along horizontal axis:", result1)
# compute standard deviation along vertical axis
result2 = np.std(array1, axis=0)
print("Standard deviation along vertical axis:", result2)
# compute standard deviation of entire array
result3 = np.std(array1)
print("Standard deviation of entire array:", result3)
Output
Standard deviation along horizontal axis: [2.86744176 3.29983165 1.24721913]
Standard deviation along vertical axis: [0.81649658 1.24721913 1.63299316]
Standard deviation of entire array: 2.7666443551086073
Here, we have created a 2D array named array1.
We then computed the standard deviation along the horizontal and vertical axes individually and then computed the standard deviation of the entire array.
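The per-column result can be reproduced by hand. The sketch subtracts each column's mean (keepdims=True keeps a (1, 3) shape so the means broadcast across the rows) and then takes the square root of the mean squared deviation down each column.

```python
import numpy as np

array1 = np.array([[2, 5, 9],
                   [3, 8, 11],
                   [4, 6, 7]])

# column means with shape (1, 3), so they broadcast against the (3, 3) array
col_means = array1.mean(axis=0, keepdims=True)
# sqrt of the mean squared deviation in each column
manual_col_std = np.sqrt(((array1 - col_means) ** 2).mean(axis=0))

print(np.allclose(manual_col_std, np.std(array1, axis=0)))  # True
```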
Compute Percentile of NumPy Array
In NumPy, we use the percentile() function to compute the nth percentile of a given array.
Let's see an example.
import numpy as np
# create an array
array1 = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19])
# compute the 25th percentile of the array
result1 = np.percentile(array1, 25)
print("25th percentile:",result1)
# compute the 75th percentile of the array
result2 = np.percentile(array1, 75)
print("75th percentile:",result2)
Output
25th percentile: 5.5
75th percentile: 14.5
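Here,
- 5.5 is the 25th percentile: roughly 25% of the values in array1 lie at or below it.
- 14.5 is the 75th percentile: roughly 75% of the values in array1 lie at or below it.
Neither value is an element of array1. By default, NumPy interpolates linearly between the two nearest sorted values, so the 25th percentile falls between 5 and 7, and the 75th percentile between 13 and 15.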
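The default linear interpolation can be sketched by hand for 1D data: find the fractional position p/100 * (n - 1) in the sorted array, then interpolate between the neighbouring elements. The helper linear_percentile is hypothetical and is not NumPy's implementation; other interpolation methods can be selected through the method parameter of np.percentile() in recent NumPy versions.

```python
import numpy as np

def linear_percentile(values, p):
    """Hypothetical helper: p-th percentile of 1D data using linear
    interpolation between the two nearest sorted values."""
    data = np.sort(np.asarray(values, dtype=float))
    pos = (p / 100) * (len(data) - 1)    # fractional index into the sorted data
    lower = int(np.floor(pos))
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower                   # how far pos lies between the neighbours
    return data[lower] + frac * (data[upper] - data[lower])

array1 = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19])
print(linear_percentile(array1, 25))  # 5.5  (position 2.25, between 5 and 7)
print(linear_percentile(array1, 75))  # 14.5 (position 6.75, between 13 and 15)
```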
Note: To learn more about percentile, visit NumPy Percentile.
Find Minimum and Maximum Value of NumPy Array
We use the min() and max() functions in NumPy to find the minimum and maximum values in a given array.
Let's see an example.
import numpy as np
# create an array
array1 = np.array([2,6,9,15,17,22,65,1,62])
# find the minimum value of the array
min_val = np.min(array1)
# find the maximum value of the array
max_val = np.max(array1)
# print the results
print("Minimum value:", min_val)
print("Maximum value:", max_val)
Output
Minimum value: 1
Maximum value: 65
As we can see, min() and max() return the minimum and maximum values of array1, which are 1 and 65 respectively.
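Like the other functions in this section, min() and max() also accept the axis parameter. The sketch below reshapes the same nine values into a 3x3 array; the reshape is an illustrative assumption, not part of the original example.

```python
import numpy as np

# the same nine values, arranged as a 3x3 array
array2d = np.array([2, 6, 9, 15, 17, 22, 65, 1, 62]).reshape(3, 3)

print(np.min(array2d))          # 1 (minimum of the entire array)
print(np.min(array2d, axis=0))  # [2 1 9] (minimum of each column)
print(np.max(array2d, axis=1))  # [ 9 22 65] (maximum of each row)
```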
Note: To learn more about min() and max(), visit NumPy min() and NumPy max().