Which is faster, NumPy or pandas?
Both NumPy and pandas are essential tools for data science and machine learning. pandas provides DataFrames, which resemble SQL tables and support tabular data analysis, while NumPy runs vector and matrix operations very efficiently.
pandas provides a number of C- or Cython-optimized functions that can be faster than the equivalent NumPy function (e.g. reading data from text files).
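To check the file-reading claim on your own machine, a minimal sketch along the following lines may help. It assumes an in-memory, 2,000-row CSV text (hypothetical data, not from the measurements in this article) and compares pd.read_csv with np.loadtxt using the standard timeit module. The outcome depends on the installed versions, so treat the printed numbers as local observations only.

```python
import io
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 2,000 rows of three comma-separated numbers.
rows = 2_000
text = "\n".join(f"{i},{i * 0.5},{i * 2}" for i in range(rows))


def read_with_pandas():
    # pandas parses the text with its C-based CSV reader.
    return pd.read_csv(io.StringIO(text), header=None)


def read_with_numpy():
    # NumPy's plain-text loader, returning a 2-D float array.
    return np.loadtxt(io.StringIO(text), delimiter=",")


# Total time for 5 reads with each library.
pandas_time = timeit.timeit(read_with_pandas, number=5)
numpy_time = timeit.timeit(read_with_numpy, number=5)

print(f"pandas read_csv: {pandas_time / 5 * 1e3:.2f} ms per read")
print(f"NumPy loadtxt:   {numpy_time / 5 * 1e3:.2f} ms per read")
```

Both functions return the same numbers here; only the container differs (a DataFrame versus a plain array).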
For mathematical operations such as a dot product or calculating a mean, pandas DataFrames are generally going to be slower than a NumPy array, since pandas does a lot of extra work such as aligning labels and dealing with heterogeneous data.
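The label alignment mentioned above can be seen in a small sketch. The two Series and their labels are made up for illustration. pandas matches values by label before adding, while NumPy adds strictly by position.

```python
import numpy as np
import pandas as pd

# Two Series holding values under the same labels, in a different order.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "x"])

# pandas matches values by label before adding, which costs extra work.
aligned_sum = a + b

# NumPy adds strictly by position and does no label bookkeeping.
positional_sum = a.to_numpy() + b.to_numpy()

print(aligned_sum.to_dict())    # {'x': 31, 'y': 22, 'z': 13}
print(positional_sum.tolist())  # [11, 22, 33]
```

The results differ because the label matching changes which values are paired, and that matching step is part of the overhead pandas pays on arithmetic.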
import numpy as np
import pandas as pd

array = np.arange(100, 200)
s = pd.Series(array)

print('Series object time: ', end='')
%timeit -n10 -r2 s.mean()

print('Numpy array time: ', end='')
%timeit -n10 -r2 np.mean(array)
Here we created a NumPy array with 100 values ranging from 100 to 199 (np.arange excludes the stop value) and then created a pandas Series object from that array.
We used the built-in IPython magic function %timeit to find the average time each object takes to calculate the mean of its data.
-n10 sets the number of loops per run to 10, and -r2 sets the number of runs to 2.
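%timeit only works inside IPython or Jupyter. Outside of those, a rough equivalent can be sketched with the standard timeit module, where number=10 plays the role of -n10 and repeat=2 plays the role of -r2. Note that this sketch reports the best run rather than the mean ± std. dev. that %timeit prints, so the numbers are not directly comparable.

```python
import timeit

import numpy as np
import pandas as pd

array = np.arange(100, 200)
s = pd.Series(array)

# number=10 mirrors -n10 (calls per run); repeat=2 mirrors -r2 (runs).
series_runs = timeit.repeat(s.mean, number=10, repeat=2)
numpy_runs = timeit.repeat(lambda: np.mean(array), number=10, repeat=2)

# Each entry is the total time for 10 calls, so divide to get per-call time.
series_per_call = min(series_runs) / 10
numpy_per_call = min(numpy_runs) / 10

print(f"Series object time: {series_per_call * 1e6:.1f} µs per call")
print(f"Numpy array time:   {numpy_per_call * 1e6:.1f} µs per call")
```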
Series object: 225 µs ± 83 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)
Numpy array: 33.1 µs ± 10.8 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)
In this run, the Series object took about 225 µs per loop to calculate the mean, while the NumPy array took about 33 µs, which is roughly seven times faster.
import numpy as np
import pandas as pd

array = np.arange(100, 200)
s = pd.Series(array)

print('Series object time: ', end='')
%timeit -n10 -r2 s.std()

print('Numpy array time: ', end='')
%timeit -n10 -r2 np.std(array)
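Here we measured the time taken by both the NumPy array and the pandas Series object to calculate the standard deviation. Note that the two calls do not compute exactly the same quantity: Series.std() defaults to ddof=1 (sample standard deviation), while np.std() defaults to ddof=0 (population standard deviation). The timings appear in the output block below.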
Series object time: 443 µs ± 26.6 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)
Numpy array time: 104 µs ± 12.1 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)
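One detail worth checking when comparing these two calls is that their defaults differ: pandas Series.std() uses ddof=1, while np.std() uses ddof=0. The short sketch below reuses the same array and shows that the results only agree once the same ddof is passed to both sides.

```python
import numpy as np
import pandas as pd

array = np.arange(100, 200)
s = pd.Series(array)

sample_std = s.std()            # pandas default: ddof=1
population_std = np.std(array)  # NumPy default: ddof=0

print(sample_std, population_std)

# Passing the same ddof to either side makes the results agree.
print(np.isclose(s.std(ddof=0), population_std))      # True
print(np.isclose(sample_std, np.std(array, ddof=1)))  # True
```

For a like-for-like timing, it may be fairer to pass an explicit ddof to one of the two calls.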
As the two examples above show, pandas took more time on average than NumPy to perform the same calculation on this 100-element input.
Object creation shows the same pattern: creating a pandas DataFrame took approximately 6000 times longer than creating a NumPy array, because pandas takes extra time to set up the index labels.
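The creation cost can be measured in the same way. The following is a minimal sketch, assuming a 100-element Python list as input and a hypothetical column name "value". The ratio it prints depends heavily on the input size, the library versions, and the machine, so it should not be expected to reproduce the figure quoted above.

```python
import timeit

import numpy as np
import pandas as pd

values = list(range(100, 200))

# Total time for 1000 constructions of each object from the same list.
array_time = timeit.timeit(lambda: np.array(values), number=1000)
frame_time = timeit.timeit(lambda: pd.DataFrame({"value": values}), number=1000)

print(f"NumPy array:      {array_time / 1000 * 1e6:.1f} µs per construction")
print(f"pandas DataFrame: {frame_time / 1000 * 1e6:.1f} µs per construction")
print(f"ratio: {frame_time / array_time:.0f}x")
```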