Data pre-processing refers to gathering data, whether collected from several sources or a single one, into a common format or into uniform datasets, depending on the type of data. Real-world data is rarely ideal: it can have missing cells, errors, outliers, and discrepancies between columns. Images may be misaligned, unclear, or very large. The goal of pre-processing is to remove these discrepancies and errors. To get the pixels of an image, a built-in function named 'flatten' ...
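The excerpt is cut off before it shows which library supplies 'flatten'. As a minimal sketch, assuming the image is held as a NumPy array (the tiny synthetic array below is hypothetical and stands in for a loaded picture), flattening could look like this:

```python
import numpy as np

# Hypothetical 2x3 RGB image (height, width, channels) standing in for a loaded picture.
image = np.arange(18, dtype=np.uint8).reshape(2, 3, 3)

# ndarray.flatten() returns a one-dimensional copy holding every pixel value.
pixels = image.flatten()

print(image.shape)
print(pixels.shape)
```

Because flatten() returns a copy, changing 'pixels' afterwards does not alter 'image'.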
Data pre-processing refers to gathering data, whether collected from several sources or a single one, into a common format or into uniform datasets, depending on the type of data. Real-world data is rarely ideal: it can have missing cells, errors, outliers, and discrepancies between columns. Images may be misaligned, unclear, or very large. The goal of pre-processing is to remove these discrepancies and errors. To get the resolution of an image, a built-in function ...
Sometimes the mean of a specific column, or of all columns that contain numerical values, is needed. This is where the mean() function can be used. The mean is the sum of all values divided by the number of values in the dataset. Let us see a demonstration:
Example:
    import pandas as pd
    my_data = {'Name':pd.Series(['Tom', 'Jane', 'Vin', 'Eve', 'Will']),
               'Age':pd.Series([45, 67, 89, 12, 23]),
               'value':pd.Series([8.79, 23.24, 31.98, 78.56, 90.20])
    }
    print("The dataframe is :")
    my_df = pd.DataFrame(my_data)
    print(my_df)
    print("The mean is :")
    # numeric_only=True skips the text column 'Name'; recent pandas versions raise an error without it
    print(my_df.mean(numeric_only=True))
Output:
    The dataframe is ...
Sometimes the sum of a specific column is needed. This is where the 'sum' function can be used. To sum a single column, select it by name and call sum() on it. When sum() is called on the whole dataframe, the value passed to it is an axis rather than a column: 0 adds down each column and 1 adds across each row. Let us see a demonstration:
Example:
    import pandas as pd
    my_data = {'Name':pd.Series(['Tom', 'Jane', 'Vin', 'Eve', 'Will']),
               'Age':pd.Series([45, 67, 89, 12, 23]),
               'value':pd.Series([8.79, 23.24, 31.98, 78.56, 90.20])
    }
    print("The dataframe is :")
    my_df = pd.DataFrame(my_data)
    print(my_df)
    print("The sum of 'Age' column is :")
    # Select the column first, then sum it
    print(my_df['Age'].sum())
Output:
    The dataframe is ...
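To make the difference between summing one column and summing along an axis concrete, here is a small sketch that reuses the data from the excerpt. The variable names 'age_total', 'column_totals' and 'row_totals' are illustrative only, and numeric_only=True is used so that the text column 'Name' is left out.

```python
import pandas as pd

my_df = pd.DataFrame({
    'Name': ['Tom', 'Jane', 'Vin', 'Eve', 'Will'],
    'Age': [45, 67, 89, 12, 23],
    'value': [8.79, 23.24, 31.98, 78.56, 90.20],
})

# Sum of a single column: select it by name, then call sum().
age_total = my_df['Age'].sum()

# axis=0 adds down each numeric column; axis=1 adds across each row.
column_totals = my_df.sum(axis=0, numeric_only=True)
row_totals = my_df.sum(axis=1, numeric_only=True)

print(age_total)
print(column_totals)
print(row_totals)
```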
A dataframe is a two-dimensional data structure in which data is stored in tabular form, as rows and columns. It can be pictured as an SQL table or an Excel sheet. A column in a dataframe can be deleted in different ways. We will see the pop function, which takes the name of the column to delete as a parameter, removes that column from the dataframe, and returns it.
Example:
    import pandas as pd
    my_data = {'ab' : pd.Series([1, 8, 7], index=['a', 'b', 'c']),
               'cd' : pd.Series([1, 2, 0, 9], index=['a', 'b', 'c', 'd']),
               'ef' : pd.Series([56, 78, 32], index=['a', 'b', ...
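Since the excerpt's example is cut off, here is a minimal, self-contained sketch of pop. The three-row data is simplified and hypothetical; it is not the original example.

```python
import pandas as pd

# Simplified, hypothetical data; every column shares the index 'a', 'b', 'c'.
my_df = pd.DataFrame(
    {'ab': [1, 8, 7], 'cd': [1, 2, 0], 'ef': [56, 78, 32]},
    index=['a', 'b', 'c'],
)

# pop() removes the named column in place and returns it as a Series.
removed = my_df.pop('cd')

print(my_df)
print(removed)
```

Because pop returns the removed column, it is handy when the column is still needed elsewhere, for example as a separate label column.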
A dataframe is a two-dimensional data structure in which data is stored in tabular form, as rows and columns. It can be pictured as an SQL table or an Excel sheet. A column in a dataframe can be deleted in different ways. We will see the 'del' statement, which is applied to the column selected by name and deletes it from the dataframe:
Example:
    import pandas as pd
    my_data = {'ab' : pd.Series([1, 8, 7], index=['a', 'b', 'c']),
               'cd' : pd.Series([1, 2, 0, 9], index=['a', 'b', 'c', 'd']),
               'ef' : pd.Series([56, 78, 32], index=['a', ...
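The excerpt stops before the deletion itself, so the following is only a sketch of how 'del' is typically applied to a dataframe column. The data is simplified and hypothetical.

```python
import pandas as pd

# Simplified, hypothetical data; every column shares the index 'a', 'b', 'c'.
my_df = pd.DataFrame(
    {'ab': [1, 8, 7], 'cd': [1, 2, 0], 'ef': [56, 78, 32]},
    index=['a', 'b', 'c'],
)

# 'del' is a Python statement, not a function: it deletes the selected column in place
# and returns nothing.
del my_df['ab']

print(my_df)
```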
A dataframe is a two-dimensional data structure in which data is stored in tabular form, as rows and columns. It can be pictured as an SQL table or an Excel sheet. It can be created using the following constructor: pd.DataFrame(data, index, columns, dtype, copy). Let us understand how a dataframe can be created from a dictionary of Series. A Series is a one-dimensional data structure in the 'Pandas' library. Its axis labels are collectively known as the index. A Series can store any type of data, such as integers, floats, strings, Python objects, and so on. Let us see an example ...
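The example is cut off in the excerpt. As a minimal sketch, a dataframe could be built from a dictionary of Series like this, using two Series that appear in the other excerpts on this page. Each dictionary key becomes a column name.

```python
import pandas as pd

my_data = {
    'ab': pd.Series([1, 8, 7], index=['a', 'b', 'c']),
    'cd': pd.Series([1, 2, 0, 9], index=['a', 'b', 'c', 'd']),
}

# The row index is the union of the Series indexes;
# 'ab' has no entry for 'd', so that cell is filled with NaN.
my_df = pd.DataFrame(my_data)

print(my_df)
```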
A dataframe is a two-dimensional data structure in which data is stored in tabular form, as rows and columns. It can be pictured as an SQL table or an Excel sheet. It can be created using the following constructor: pd.DataFrame(data, index, columns, dtype, copy). A new column can be added to a dataframe in different ways. One way is to first build a Series and then assign it to the existing dataframe as an additional column. Let us see the code in action:
Example:
    import pandas as ...
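The excerpt ends at the import line, so the following is only a sketch of the approach described, with hypothetical data and a hypothetical column name 'gh':

```python
import pandas as pd

# Hypothetical starting dataframe with a single column.
my_df = pd.DataFrame({'ab': [1, 8, 7]}, index=['a', 'b', 'c'])

# Build the new column as a Series, then assign it under a new column name.
# Values are aligned by index label, not by position.
new_column = pd.Series([11, 22, 33], index=['a', 'b', 'c'])
my_df['gh'] = new_column

print(my_df)
```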
A dataframe is a two-dimensional data structure in which data is stored in tabular form, as rows and columns. It can be pictured as an SQL table or an Excel sheet. It can be created using the following constructor: pd.DataFrame(data, index, columns, dtype, copy). The 'data', 'index', 'columns', 'dtype' and 'copy' arguments are all optional. A list of dictionaries can be passed as input to the dataframe. The dictionary keys are used as column names by default. Let us see an example:
Example:
    import pandas as pd
    my_data = [{'ab' : 34}, {'mn' : 56}, { 'gh' : ...
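As the excerpt's data is truncated, here is a minimal sketch with hypothetical dictionaries. Each dictionary becomes one row, and a key that is missing from a dictionary produces NaN in that row.

```python
import pandas as pd

# Hypothetical input: the second dictionary has an extra key 'gh'.
my_data = [
    {'ab': 34, 'mn': 56},
    {'ab': 78, 'mn': 90, 'gh': 12},
]

# Keys become column names; rows get the default integer index 0, 1, ...
my_df = pd.DataFrame(my_data)

print(my_df)
```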
When a series has a customized index, its elements are accessed using series_name['index_value']. The value passed is matched against the index labels of the series; if it is found, the corresponding data is returned. If the label is not present in the series, a KeyError is raised, as shown below.
Example:
    import pandas as pd
    my_data = [34, 56, 78, 90, 123, 45]
    my_index = ['ab', 'mn' ,'gh', 'kl', 'wq', 'az']
    my_series = pd.Series(my_data, index = my_index)
    print("The series contains following elements")
    print(my_series)
    print("Accessing elements using customized index")
    # 'mm' is not one of the index labels, so this line raises KeyError
    print(my_series['mm'])
Output:
    The series ...
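If a missing label should not stop the program, the lookup can be guarded. The sketch below reuses the series from the excerpt and shows two options: catching the KeyError, or using Series.get with a default. The names 'missing' and 'fallback' are illustrative only.

```python
import pandas as pd

my_series = pd.Series(
    [34, 56, 78, 90, 123, 45],
    index=['ab', 'mn', 'gh', 'kl', 'wq', 'az'],
)

# A label that exists returns its value.
print(my_series['mn'])

# Option 1: catch the KeyError raised for a label that is not in the index.
missing = None
try:
    my_series['mm']
except KeyError as err:
    missing = err.args[0]
    print("Label not found:", missing)

# Option 2: get() returns a default instead of raising.
fallback = my_series.get('mm', default=None)
print(fallback)
```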