Which is faster, NumPy or pandas?
Both NumPy and pandas are essential tools for data science and machine learning. pandas provides DataFrames, which work like SQL tables and support tabular data analysis, while NumPy performs vector and matrix operations very efficiently.
pandas provides a number of C- or Cython-optimized functions that can be faster than their NumPy equivalents (for example, reading data from text files).
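To see what this comparison looks like, here is a minimal sketch that reads the same in-memory CSV text with pd.read_csv and with np.loadtxt and times both. The CSV data (10,000 rows of three numeric columns) is a hypothetical example, and the relative timings depend heavily on the installed NumPy and pandas versions, so treat the printed numbers as a local measurement rather than a general result.

```python
import io
import timeit

import numpy as np
import pandas as pd

# Hypothetical CSV text built in memory: a header plus 10,000 numeric rows.
rows = "\n".join(f"{i},{i * 0.5},{i % 7}" for i in range(10_000))
csv_text = "a,b,c\n" + rows

def read_with_pandas():
    # pandas parses the text with its compiled CSV parser.
    return pd.read_csv(io.StringIO(csv_text))

def read_with_numpy():
    # NumPy's loadtxt; skiprows=1 skips the header line.
    return np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

df = read_with_pandas()
arr = read_with_numpy()

# Both readers should produce the same numbers.
print(np.allclose(df.to_numpy(), arr))

# Best of 3 runs, each run reading the text 3 times.
t_pd = min(timeit.repeat(read_with_pandas, number=3, repeat=3))
t_np = min(timeit.repeat(read_with_numpy, number=3, repeat=3))
print(f"pd.read_csv: {t_pd:.4f} s for 3 reads")
print(f"np.loadtxt:  {t_np:.4f} s for 3 reads")
```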
For mathematical operations such as a dot product or calculating a mean, pandas Series and DataFrames are generally slower than NumPy arrays, since pandas does a lot more work: aligning labels, handling heterogeneous data, and so on.
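Label alignment is one concrete example of that extra work. The sketch below uses two small, hypothetical Series whose labels are in different orders: pandas matches values by label before adding them or taking a dot product, while NumPy simply pairs values by position.

```python
import numpy as np
import pandas as pd

# Same labels, different order.
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0, 30.0], index=["z", "y", "x"])

# pandas aligns on labels first: x + x, y + y, z + z.
aligned_sum = a + b
print(aligned_sum)

# NumPy only knows positions: element 0 + element 0, and so on.
positional_sum = a.to_numpy() + b.to_numpy()
print(positional_sum)

# The same difference shows up in a dot product.
print(a.dot(b))                            # label-aligned: 1*30 + 2*20 + 3*10
print(np.dot(a.to_numpy(), b.to_numpy()))  # positional:    1*10 + 2*20 + 3*30
```

The alignment step is what makes pandas convenient for labeled data, and it is also part of the overhead that NumPy avoids.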
# %timeit is an IPython magic: run this in IPython or a Jupyter notebook
import numpy as np
import pandas as pd

array = np.arange(100, 200)
s = pd.Series(array)

print('Series object time: ', end='')
%timeit -n10 -r2 s.mean()
print('Numpy array time: ', end='')
%timeit -n10 -r2 np.mean(array)
Here we create a NumPy array of 100 values from 100 to 199 (np.arange excludes the stop value, 200) and a pandas Series object built from that array.
We use IPython's built-in %timeit magic to measure the average time each object takes to calculate the mean of its data. The -n10 option sets the number of loops per run to 10, and -r2 sets the number of runs to 2.
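%timeit only works inside IPython. As a rough plain-Python equivalent, the standard-library timeit.repeat with number=10 and repeat=2 performs 2 runs of 10 calls each. This is a sketch of the same measurement, not what IPython does internally; note that timeit.repeat returns the total time of each run, so it is divided by the loop count to get a per-call figure.

```python
import timeit

import numpy as np
import pandas as pd

array = np.arange(100, 200)
s = pd.Series(array)

# Roughly "%timeit -n10 -r2": 2 runs, each timing 10 calls.
series_runs = timeit.repeat(s.mean, number=10, repeat=2)
numpy_runs = timeit.repeat(lambda: np.mean(array), number=10, repeat=2)

# Convert each run's total seconds into seconds per call.
series_per_call = [t / 10 for t in series_runs]
numpy_per_call = [t / 10 for t in numpy_runs]

print(f"Series object time: {np.mean(series_per_call) * 1e6:.1f} µs per loop")
print(f"Numpy array time: {np.mean(numpy_per_call) * 1e6:.1f} µs per loop")
```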
Series object time: 225 µs ± 83 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)
Numpy array time: 33.1 µs ± 10.8 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)
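The output shows the time taken by the NumPy array and the Series object to calculate the mean: in this run, the Series took about 225 µs per loop versus about 33 µs for the NumPy array, roughly 7 times longer.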
# %timeit is an IPython magic: run this in IPython or a Jupyter notebook
import numpy as np
import pandas as pd

array = np.arange(100, 200)
s = pd.Series(array)

print('Series object time: ', end='')
%timeit -n10 -r2 s.std()
print('Numpy array time: ', end='')
%timeit -n10 -r2 np.std(array)
Here we measure the time taken by the NumPy array and the pandas Series object to calculate the standard deviation. The timings are shown in the output below.
Series object time: 443 µs ± 26.6 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)
Numpy array time: 104 µs ± 12.1 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)
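One detail worth checking in the standard deviation benchmark is that s.std() and np.std(array) do not compute exactly the same quantity by default: pandas uses the sample standard deviation (ddof=1), while NumPy uses the population standard deviation (ddof=0). The sketch below shows the difference; passing ddof=1 to np.std makes the two results match, which gives a like-for-like timing comparison.

```python
import numpy as np
import pandas as pd

array = np.arange(100, 200)
s = pd.Series(array)

pandas_std = s.std()                  # pandas default: ddof=1 (sample)
numpy_std = np.std(array)             # NumPy default: ddof=0 (population)
numpy_std_ddof1 = np.std(array, ddof=1)

print(pandas_std, numpy_std, numpy_std_ddof1)
```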
As both examples show, pandas takes more time on average than NumPy for these operations.
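When a computation is purely numerical and the labels are not needed, one common approach is to extract the underlying array once with Series.to_numpy() and run the math on that. The following is a minimal sketch under that assumption, reusing the same 100-value data; the exact speedup depends on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 200))

# Extract the underlying ndarray once; calls on it bypass pandas' wrapper logic.
values = s.to_numpy()

# Best of 3 runs of 1000 calls each.
t_series = min(timeit.repeat(s.mean, number=1000, repeat=3))
t_array = min(timeit.repeat(values.mean, number=1000, repeat=3))

print(f"Series.mean:  {t_series / 1000 * 1e6:.2f} µs per call")
print(f"ndarray.mean: {t_array / 1000 * 1e6:.2f} µs per call")
```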
Creating a pandas DataFrame took approximately 6000 times longer than creating a NumPy array, because pandas spends extra time setting up the index labels.
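A comparison of construction cost can be sketched with timeit as below. The 10 x 10 input is a hypothetical choice, and this sketch does not attempt to reproduce the 6000x figure: the ratio you see will vary with data size, dtype, and library versions. It also shows that, without explicit labels, pandas builds a default RangeIndex for both rows and columns.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical 10 x 10 input data.
data = np.arange(100, 200).reshape(10, 10)

def make_array():
    return np.array(data)

def make_frame():
    return pd.DataFrame(data)

df = make_frame()
# pandas creates row and column labels even when none are supplied.
print(type(df.index).__name__, type(df.columns).__name__)

# Best of 3 runs of 1000 constructions each.
t_array = min(timeit.repeat(make_array, number=1000, repeat=3))
t_frame = min(timeit.repeat(make_frame, number=1000, repeat=3))
print(f"np.array:     {t_array / 1000 * 1e6:.2f} µs per call")
print(f"pd.DataFrame: {t_frame / 1000 * 1e6:.2f} µs per call")
print(f"ratio: {t_frame / t_array:.0f}x")
```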