Introduction to NumPy
In this lesson, we will discuss the NumPy library, its significance in Python data analysis, and how to create your first NumPy arrays.
NumPy is a popular Python library for handling numerical data.
What exactly is a library, though?
Simply put, a library is a collection of code designed to perform specific tasks or solve particular problems. You can download and reuse this code in your own programs to achieve your goals more quickly and efficiently.
There are many libraries out there that make your life easier. Examples include important data analysis libraries such as Pandas, SciPy, and Matplotlib.
What these three libraries have in common is that they all rely heavily on NumPy, another Python library.
NumPy is the go-to library for numerical computing. It's widely used in scientific computing, data analysis, machine learning, and other domains where numerical operations are prevalent.
To use NumPy in your code, ensure it is installed in your Python environment.
Then, you need to import it using an import statement.
# import the numpy library
import numpy
Now you can access all of NumPy's functionality under the name numpy.
However, you can also define a custom alias for a library using the as keyword:
# import numpy with alias np
import numpy as np
Now we can access NumPy using the alias np.
The alias np is a widely followed convention: it keeps your code shorter, and anyone familiar with NumPy will recognize it immediately.
Now that we have access to NumPy, it's time to start working with it.
At the core of NumPy is its array data structure. An array is an ordered collection of homogeneous elements, typically numbers, and it can have one dimension (like a list), two dimensions (like a matrix), or more.
Let's create our first NumPy array:
# import numpy with alias np
import numpy as np
# create numpy array from list
a = np.array([1,2,3])
# print type of a
print(type(a))
print(a)
Here, we used the np.array() function to create a NumPy array.
The argument we passed to np.array() was the list [1,2,3].
Using the type() function, we observe that the type of the resulting array is numpy.ndarray.
When printed, the NumPy array output resembles that of a normal list.
So, why not use a normal list in the first place?
The simple answer: for numerical work on large amounts of data, lists are slow, whereas NumPy arrays are highly efficient.
One reason for their efficiency is that all elements of an array share the same data type. This not only saves memory but also enables much faster computations.
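As a rough sketch of what a shared data type looks like in practice, the snippet below inspects the array's dtype attribute and applies one arithmetic operation to all elements at once. The variable names doubled and doubled_list are only illustrative, and the exact integer type reported depends on your platform.

```python
import numpy as np

# every element of the array shares a single data type (dtype)
a = np.array([1, 2, 3])
print(a.dtype)  # an integer dtype; the exact size is platform-dependent

# one operation is applied to all elements, no explicit loop needed
doubled = a * 2
print(doubled)

# the equivalent with a plain list needs a loop or a comprehension
doubled_list = [x * 2 for x in [1, 2, 3]]
print(doubled_list)
```

Both approaches produce the same values here. The difference in speed and memory use only becomes noticeable with much larger collections.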
Let's create a NumPy array from a list that contains a mix of integers and strings:
import numpy as np
# create numpy array from list
a = np.array([1, 2, 'a', 'b'])
print(a)
The integers have been transformed to strings.
NumPy will always try to coerce elements of different types into the same type.
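The same coercion also happens between numeric types. As a small sketch (the variable names are illustrative), you can inspect the dtype attribute to see which common type NumPy picked:

```python
import numpy as np

# integers mixed with a float are all stored as floats
mixed_numbers = np.array([1, 2, 3.5])
print(mixed_numbers)
print(mixed_numbers.dtype)  # float64

# integers mixed with a string are all stored as strings
mixed_text = np.array([1, 2, 'a'])
print(mixed_text.dtype.kind)  # 'U', NumPy's code for Unicode strings
```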
What will be the output?
import numpy as np
a = np.array([1, 2, 4, 'd'])
print(a)
You can use indexing to get values from a NumPy array:
import numpy as np
# create numpy array from list
a = np.array([1, 2, 3, 4])
print(a[2])
And slicing to get a subset of the array:
import numpy as np
# create numpy array from list
a = np.array([1, 2, 3, 4])
# get all elements starting from index 2
sub = a[2:]
print(sub)
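Slices follow the same start:stop:step pattern as Python lists. The short sketch below uses a different example array to show a few common variations:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

# elements from index 1 up to, but not including, index 4
print(a[1:4])   # [20 30 40]

# every second element
print(a[::2])   # [10 30 50]

# the last two elements
print(a[-2:])   # [40 50]
```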
When creating a subset using slicing, NumPy will return a view.
We have already talked about views in the context of dictionaries and methods like dict.items().
A view is not a copy of the sliced data, but a dynamic representation of the original.
This saves memory and is faster.
But, you need to be careful because if you modify the data in a view, you will also modify the original array.
For example, here changing the first element of our slice will also modify the original array:
import numpy as np
a = np.array([1, 2, 3, 4])
print(a)
# create view
sub = a[2:]
# modify the view
sub[0] = 999999
print(a)
If you still need the original array unchanged, it's better to create a real copy of the data using array.copy().
Here, we slice the original array and then apply the array.copy() method to the resulting sub-array:
import numpy as np
a = np.array([1, 2, 3, 4])
print(a)
# create slice with copied data
sub = a[2:].copy()
# modify the sub-array
sub[0] = 999999
print(a)
Now, the original array remains unchanged.
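If you are ever unsure whether two arrays refer to the same underlying data, one way to check is NumPy's np.shares_memory() function. In this sketch, the names view and copied are only illustrative:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

view = a[2:]            # slicing returns a view of a
copied = a[2:].copy()   # .copy() returns independent data

print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, copied))  # False
```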
What will be the output?
import numpy as np
a = np.array([1, 2, 3, 4])
print(a[-1])
What will be the output?
import numpy as np
a = np.array([1, 2, 3, 4])
print(a[1:3])
What will be the output?
import numpy as np
a = np.array([1, 2, 3, 4])
b = a[1:3]
b[1] = 10
print(a)
What will be the output?
import numpy as np
a = np.array([1, 2, 3, 4])
b = a[1:3].copy()
b[1] = 10
print(b)
print(a)