Difference between series and vectors in Python Pandas
Pandas is a well-known open-source Python library that provides a wide range of capabilities for data analysis. It is mostly used for data pre-processing tasks such as cleaning, transforming, and manipulating data, which makes it a highly useful tool for analysts and data scientists. This article discusses the two most popular data structures in Pandas, Series and DataFrame, and compares Series with vectors.
Python Pandas Series
In the Python Pandas library, a Series is a one-dimensional labeled array that can hold any data type. It is similar to a single column in a spreadsheet or in an R data frame. A Series is created by passing data, such as a list of values, to the pd.Series() function, and its elements can be accessed by integer position or by label.
Labels must be a hashable type but do not need to be unique. The object has a variety of methods for working with the index and supports integer and label-based indexing.
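As a minimal sketch of the two access styles, the snippet below builds a small Series whose labels are not unique and reads from it by label (.loc) and by position (.iloc). The values and labels are made up for illustration.

```python
import pandas as pd

# Labels are hashable but need not be unique: "b" appears twice here.
s = pd.Series([10, 20, 30, 40], index=["a", "b", "b", "c"])

# Label-based access with .loc
print(s.loc["a"])    # a single value, because "a" is unique
print(s.loc["b"])    # a Series holding both rows labelled "b"

# Position-based (integer) access with .iloc
print(s.iloc[0])     # first element
print(s.iloc[-1])    # last element

# The index object has its own attributes and methods
print(s.index.is_unique)   # tells whether every label is distinct
print(s.index.tolist())    # the labels as a plain Python list
```

Using .loc and .iloc explicitly, rather than plain square brackets, keeps it clear whether a label or a position is meant.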
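The pd.Series() constructor takes the following parameters −
data − A list, NumPy array, dictionary, or scalar value can be used as data.
index − The index labels must be hashable; they do not need to be unique. For list or array data, the index must be the same length as the data. If no index is provided, a default integer index from 0 to n − 1 (equivalent to np.arange(n)) is used.
dtype − The data type of the Series. If it is not given, it is inferred from the data.
copy − Whether to copy the input data instead of sharing it with the new Series.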
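The sketch below shows how the data, index, and dtype parameters interact when the data is a dictionary. The dictionary contents are hypothetical and only serve the illustration.

```python
import pandas as pd

# data as a dictionary: keys become labels, values become the data
prices = {"apple": 3, "banana": 1, "cherry": 7}

# index selects and orders the keys; dtype forces a float Series
s = pd.Series(data=prices, index=["cherry", "apple"], dtype="float64")
print(s)

# With no index, a default integer index 0..n-1 is used
t = pd.Series(data=[5, 6, 7])
print(t.index.tolist())
```

Here "banana" is left out of s because it is not listed in index.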
Creating a Series
We can create a Series in four ways −
Using the pd.Series function from the Pandas library
```python
import pandas as pd
import numpy as np

# Create a series from a list
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)
```
Output
```
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
```
This will create a Pandas Series with the values 1, 3, 5, NaN, 6, 8.
Creating a Series directly from a NumPy array
```python
import numpy as np
import pandas as pd

# Create a NumPy array
data = np.array([1, 3, 5, np.nan, 6, 8])

# Create a series from the array
s = pd.Series(data)
print(s)
```
Output
```
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
```
Both of these methods will create a Pandas Series with an index that is a range of integers starting from 0. You can also specify your own index values when creating the Series.
Creating a Series From Scalar Values
Another approach is to create a Series from a scalar value. In this case, you provide a single value as the data, and it is repeated for every label in the index.
Example
```python
import pandas as pd

if __name__ == '__main__':
    series = pd.Series(data=3., index=['a', 'b', 'c', 'd'], name='series_from_scalar')
    print(series)
```
Output
```
a    3.0
b    3.0
c    3.0
d    3.0
Name: series_from_scalar, dtype: float64
```
Creating a Series From ndarray
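NumPy's random.randint() function, which creates an ndarray filled with random integers, is one of the easiest ways to create a Series of sample data. Because the values are random, your output will differ from the one shown here.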
Example
```python
import numpy as np
import pandas as pd

if __name__ == '__main__':
    data = np.random.randint(0, 10, 5)
    series = pd.Series(data=data, index=['a', 'b', 'c', 'd', 'e'], name='series_from_ndarray')
    print(series)
```
Output
```
a    5
b    7
c    0
d    8
e    5
Name: series_from_ndarray, dtype: int64
```
Vectors
A vector, on the other hand, is a one-dimensional array of numerical values. Pandas has no separate vector type; a vector is usually held as a NumPy array or represented as a Series with a single numeric dtype (for example, integer or float). Vectors are commonly used in mathematical and statistical operations. In Pandas, you can obtain one by converting values with the pd.to_numeric() function or by selecting a single column from a DataFrame.
DataFrames
A DataFrame is a two-dimensional labeled table. Using pd.DataFrame(), you can create a DataFrame from several data sources, including dictionaries, 2D NumPy arrays, and Series.
Creating a Pandas DataFrame Using a Dictionary of Pandas Series
When a DataFrame is built from a dictionary of Series, the dictionary keys become the column labels and the row index is the union of the indexes of the Series. If the Series have no explicit index, the default index [0, …, len(data) - 1] is used.
```python
# Creating a DataFrame from a dictionary of Series
import pandas as pd

data = pd.DataFrame({
    "Class 1": pd.Series([22, 33, 38], index=["math avg", "science avg", "english avg"]),
    "Class 2": pd.Series([45, 28, 36], index=["math avg", "science avg", "english avg"]),
    "Class 3": pd.Series([32, 41, 47], index=["math avg", "science avg", "english avg"])
})
print(data)
```
Output
```
             Class 1  Class 2  Class 3
math avg          22       45       32
science avg       33       28       41
english avg       38       36       47
```
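Selecting a single column from a DataFrame like this one gives back a Series, and that Series can be turned into a plain numeric vector. The sketch below rebuilds a smaller version of the table. Storing "Class 2" as strings is an assumption made only to show what pd.to_numeric() does.

```python
import pandas as pd

# A small DataFrame in the spirit of the example above.
# "Class 2" is deliberately stored as strings (an assumption for this sketch).
df = pd.DataFrame(
    {"Class 1": [22, 33, 38], "Class 2": ["45", "28", "36"]},
    index=["math avg", "science avg", "english avg"],
)

# Selecting one column returns a Series that keeps the row labels
col = df["Class 1"]
print(type(col).__name__)

# pd.to_numeric converts the string column to a numeric Series
numeric = pd.to_numeric(df["Class 2"])
print(numeric.dtype)

# .to_numpy() drops the labels and leaves a one-dimensional NumPy array
vector = col.to_numpy()
print(vector.ndim, vector.tolist())
```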
The following table summarizes the differences between a Series and a DataFrame in Python Pandas −
| Feature | DataFrame | Series |
|---|---|---|
| Data structure | 2D table | 1D array |
| Can contain heterogeneous data | Yes | Yes |
| Can contain column labels | Yes | No |
| Can contain row labels | Yes | Yes |
| Can be indexed by column or row labels | Yes | Yes (row labels only) |
| Can be sliced by column or row labels | Yes | Yes (row labels only) |
| Supports arithmetic operations | Yes | Yes |
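Several of these properties can be checked directly on small objects. The following is a rough sketch with made-up values, not an exhaustive comparison.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["x", "y", "z"])
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["x", "y", "z"])

# Number of dimensions of each structure
print(s.ndim, df.ndim)

# A DataFrame has column labels; both structures have an index of row labels
print(df.columns.tolist())
print(s.index.tolist(), df.index.tolist())

# Arithmetic is applied element by element on both
print((s * 2).tolist())
print((df + 1).loc["y"].tolist())
```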
Conclusion
In summary, the main differences between series and vectors in Python Pandas are −
Series can hold any data type, while vectors can only hold numerical values
Series have a label index, while vectors do not
Series can be accessed using labels or indices, while vectors can only be accessed using indices
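The three points above can be illustrated with a short sketch. It assumes that a vector is modelled as a plain one-dimensional NumPy array, which is a common choice but not the only one.

```python
import numpy as np
import pandas as pd

# A "vector" modelled as a one-dimensional NumPy array (an assumption)
vector = np.array([1.5, 2.5, 3.5])

# A Series carries the same values plus a label for each one
series = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])

print(vector[1])          # a vector is accessed by position only
print(series.loc["b"])    # a Series can be accessed by label
print(series.iloc[1])     # or by position

# A Series can also hold non-numeric values
mixed = pd.Series(["text", 2, 3.0])
print(mixed.dtype)
```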
Understanding the difference between series and vectors can be useful for selecting the appropriate data structure for your data and for manipulating and analyzing it in Pandas.