Difference between series and vectors in Python Pandas


Pandas is a well-known open-source Python library for data analysis. It is used mostly for data pre-processing tasks such as cleaning, transforming, and manipulating data, which makes it a highly useful tool for analysts and data scientists. This article discusses the two most popular data structures in Pandas, Series and DataFrame, and compares Series with vectors.

Python Pandas Series

In the Python Pandas library, a Series is a one-dimensional labeled array that can hold any data type. It is similar to a column in a spreadsheet or a single column of a data frame in R. A Series can be created by passing a list of values to the pd.Series() constructor, and its elements can be accessed by integer position or by label.

Labels must be a hashable type but do not need to be unique. The object has a variety of methods for working with the index and supports integer and label-based indexing.
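
The following is a minimal sketch of position-based and label-based access, including a label that appears more than once. The values and labels are made up for illustration.

```python
import pandas as pd

# Labels are hashable but not necessarily unique: "b" appears twice.
s = pd.Series([10, 20, 30, 40], index=["a", "b", "b", "c"])

first_by_position = s.iloc[0]   # position-based access
value_for_a = s.loc["a"]        # label-based access, unique label gives a scalar
values_for_b = s.loc["b"]       # duplicated label gives a Series with both rows

print(first_by_position)
print(value_for_a)
print(values_for_b)
```

Using .iloc for positions and .loc for labels keeps the two kinds of lookup explicit, which helps when the labels themselves are integers.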

The pd.Series() constructor takes the following parameters −

  • data − A list, dictionary, NumPy array, or scalar value can be used as data.

  • index − The index labels must be hashable and the index must have the same length as the data. The labels do not need to be unique. If no index is provided, the integers 0 to n-1 (as in np.arange(n)) are used by default.

  • dtype − The data type of the Series. If it is not given, it is inferred from the data.

  • copy − Whether to copy the input data.
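
As a rough illustration of these parameters, the sketch below passes each one explicitly. The array contents and labels are invented for the example.

```python
import numpy as np
import pandas as pd

raw = np.array([1, 2, 3])

# data: the values; index: one label per value; dtype: the element type;
# copy=True: take a copy so the Series does not share memory with `raw`.
s = pd.Series(data=raw, index=["x", "y", "z"], dtype="float64", copy=True)

raw[0] = 99  # changing the source array afterwards...
print(s)     # ...does not change the Series

# Without an index argument, the labels default to 0, 1, 2, ...
default = pd.Series([7, 8, 9])
print(default.index)
```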

Creating a Series

We can create a Series in four ways −

Creating a Series from a Python list with pd.Series()

import pandas as pd
import numpy as np
# Create a series from a list
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)

Output

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

This will create a Pandas Series with the values 1, 3, 5, NaN, 6, 8. Because NaN is a floating-point value, the integers are stored as float64.

Creating a Series directly from a NumPy array

import numpy as np
import pandas as pd
# Create a NumPy array
data = np.array([1, 3, 5, np.nan, 6, 8])
# Create a series from the array
s = pd.Series(data)
print(s)

Output

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

Both of these methods will create a Pandas Series with an index that is a range of integers starting from 0. You can also specify your own index values when creating the Series.
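
For instance, a custom index could be supplied as in the sketch below. The day labels are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Same values as before, but with explicit labels instead of 0..5.
# The labels "mon" ... "sat" are made up for this illustration.
s = pd.Series([1, 3, 5, np.nan, 6, 8],
              index=["mon", "tue", "wed", "thu", "fri", "sat"])

wed_value = s["wed"]            # look up a value by its label
missing_count = s.isna().sum()  # count the missing (NaN) entries

print(wed_value)
print(missing_count)
```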

Creating a Series From Scalar Values

Another approach is to create a Series from a scalar value. In this case, you provide a single value as the data, and it is repeated for every label in the index.

Example

import pandas as pd
if __name__ == '__main__':
   series = pd.Series(data=3.,
      index=['a', 'b', 'c', 'd'],
      name='series_from_scalar')
   print(series)

Output

a    3.0
b    3.0
c    3.0
d    3.0
Name: series_from_scalar, dtype: float64

Creating a Series From ndarray

NumPy's random.randint() function, which creates an ndarray filled with random integers, offers one of the easiest ways to create a Series from an ndarray. Because the values are random, the output will differ from run to run.

Example

import numpy as np
import pandas as pd
if __name__ == '__main__':
   data = np.random.randint(0, 10, 5)
   series = pd.Series(data=data,
      index=['a', 'b', 'c', 'd', 'e'],
      name='series_from_ndarray')
   print(series)

Output

a    5
b    7
c    0
d    8
e    5
Name: series_from_ndarray, dtype: int64

Vectors and DataFrames

A vector, by contrast, is a one-dimensional array of numerical values. In Pandas, a vector can be represented as a Series with a single numeric dtype (e.g., integer or float). Vectors are commonly used in mathematical and statistical operations. A numeric Series can be obtained by selecting a single numeric column from a DataFrame, and pd.to_numeric() can convert a Series of numeric strings to a numeric dtype.
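
A minimal sketch of this idea follows. The DataFrame and its column names (price, qty) are hypothetical and exist only for this example.

```python
import pandas as pd

# A hypothetical table; the column names are made up for illustration.
df = pd.DataFrame({"price": ["1.5", "2.0", "3.5"], "qty": [2, 4, 6]})

qty = df["qty"]                     # selecting one column gives a Series
price = pd.to_numeric(df["price"])  # convert numeric strings to a numeric dtype

total = price * qty                 # element-wise (vectorized) arithmetic
print(total.sum())
```

Once both columns are numeric, the multiplication is applied element by element, which is the kind of vector-style operation described above.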

Using the pd.DataFrame() constructor, you can create a DataFrame from several data sources, including dictionaries, 2D NumPy arrays, and Series.

Creating a Pandas DataFrame Using a Dictionary of Pandas Series

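When a DataFrame is built from a dictionary of Series, its row index is the union of the indexes of the individual Series. If a Series is created without an explicit index, its index is generated automatically with the values [0, …, len(data) - 1].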

#Creating a DataFrame from a dictionary of Series
import pandas as pd
data = pd.DataFrame({
   "Class 1": pd.Series([22, 33, 38], index=["math avg", "science avg",  "english avg"]),
   "Class 2": pd.Series([45, 28, 36], index=["math avg", "science avg",  "english avg"]),
   "Class 3": pd.Series([32, 41, 47], index=["math avg", "science avg",  "english avg"])
})
 
print(data)

Output

             Class 1  Class 2  Class 3
math avg          22       45       32
science avg       33       28       41
english avg       38       36       47
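
When the Series in the dictionary do not all share the same labels, pandas aligns them by label and fills the gaps with NaN. The sketch below assumes a variant of the data above in which "Class 2" has no "english avg" entry. That omission is invented for illustration.

```python
import pandas as pd

# Hypothetical data: "Class 2" has no "english avg" entry.
data = pd.DataFrame({
    "Class 1": pd.Series([22, 33, 38],
                         index=["math avg", "science avg", "english avg"]),
    "Class 2": pd.Series([45, 28],
                         index=["math avg", "science avg"]),
})

print(data)

# Selecting one column of the DataFrame returns a Series.
class_1 = data["Class 1"]
print(type(class_1))
```

The missing cell is stored as NaN, so the "Class 2" column is held as floating-point values rather than integers.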

The following table summarizes the differences between a Series and a DataFrame in Python Pandas −

Feature                                   DataFrame   Series
Data structure                            2D table    1D array
Can contain heterogeneous data            Yes         Yes
Can contain column labels                 Yes         No
Can contain row labels                    Yes         Yes
Can be indexed by column or row labels    Yes         Yes (row labels only)
Can be sliced by column or row labels     Yes         Yes (row labels only)
Supports arithmetic operations            Yes         Yes

Conclusion

In summary, the main differences between series and vectors in Python Pandas are −

  • Series can hold any data type, while vectors can only hold numerical values

  • Series have a label index, while vectors do not

  • Series can be accessed using labels or indices, while vectors can only be accessed using indices
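
One way to see the effect of the label index is to compare addition on plain vectors with addition on Series. Treating a NumPy array as the plain vector is an assumption made for this sketch, and the values and labels are made up.

```python
import numpy as np
import pandas as pd

# A plain NumPy vector: values only, addressed by position.
v1 = np.array([1, 2, 3])
v2 = np.array([10, 20, 30])
vec_sum = v1 + v2          # adds position by position
print(vec_sum)

# Two Series with the same labels in a different order.
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["c", "b", "a"])
ser_sum = s1 + s2          # adds values that share a label, not a position
print(ser_sum)
```

The array addition pairs elements by position, while the Series addition pairs them by label, so "a" is added to "a" even though the two Series list their labels in a different order.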

Understanding the difference between series and vectors can be useful for selecting the appropriate data structure for your data and for manipulating and analyzing it in Pandas.

Updated on: 05-May-2023
