If a data frame in R has factor columns, we may want the frequency of each factor level in every one of those columns. This can be done by combining the sapply function with the table function. For example, if a data frame called df contains factor columns, the command sapply(df, table) creates a frequency table for each column.

Example 1: Consider the below data frame df1:

Output
  x1 x2
1  D  a
2  D  b
3  D  c
4  D  b
5  D  c
6  C  a
...
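A minimal sketch of the sapply/table idea, using a small made-up data frame (the values and the names factor_cols and freq are illustrative assumptions). Note that sapply(df, table) tabulates every column, so this sketch first keeps only the factor columns, and it uses lapply so the result is always a list rather than a simplified matrix.

```r
# Hypothetical data frame with two factor columns and one numeric column
df <- data.frame(
  x1 = factor(c("A", "B", "A", "C", "B", "A")),
  x2 = factor(c("a", "b", "b", "a", "b", "b")),
  y  = c(1.5, 2.0, 3.1, 0.4, 2.2, 1.9)
)

# Keep only the factor columns, then build one frequency table per column
factor_cols <- df[sapply(df, is.factor)]
freq <- lapply(factor_cols, table)

freq$x1  # counts of A, B and C in x1
freq$x2  # counts of a and b in x2
```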
If we have a vector whose alternating values form a tabular pattern, we might want to convert the vector into a data frame. For this purpose, we first convert the vector into a matrix with the appropriate number of columns/rows and then read it as a data frame using the as.data.frame function. Check out the below examples to understand how it works.

Example 1

Output
 [1] "1" "male"   "1" "male"   "1" "male"   "1" "male"
 [9] "1" "male"   "1" "male"   "1" "male"   "1" "male"
[17] "1" "male"   "1" "male"   "2" "female" "2" "female"
[25] "2" "female" "2" "female" "2" "female" ...
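The vector-to-matrix-to-data-frame route can be sketched as follows. The vector contents and the column names id and gender are assumptions chosen for illustration; byrow = TRUE is what makes each consecutive pair of values land in one row.

```r
# Hypothetical vector in which values alternate between an id and a label
x <- c("1", "male", "1", "male", "2", "female", "2", "female")

# Fill a two-column matrix row by row so that each pair becomes one row
m <- matrix(x, ncol = 2, byrow = TRUE)

# Read the matrix as a data frame and give the columns readable names
df <- as.data.frame(m, stringsAsFactors = FALSE)
names(df) <- c("id", "gender")  # assumed names, not from the original

# The matrix stored everything as character, so convert the id column back
df$id <- as.integer(df$id)
df
```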
If a character column in a data frame contains both letters and numeric values, and the first digit of the numeric part carries meaning that can help in data analysis, then we can extract those first digits. For this purpose, we can use the stri_extract_first function from the stringi package.

Example 1: Consider the below data frame df1:

Output
  x1    y1
1  1 HT14L
2  2 HT14L
3  3 HT23L
4  4 HT14L
5  5 HT32L
6  6 HT32L
7 ...
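To keep the example free of package dependencies, the sketch below swaps in base R (regexpr and substr) instead of stringi; the data and the name first_digit are illustrative assumptions. With stringi installed, stri_extract_first_regex(y1, "[0-9]") should give the same result.

```r
# Hypothetical codes similar in shape to the y1 column above
y1 <- c("HT14L", "HT23L", "HT32L", "HTL")

# Position of the first digit in each string (-1 when there is none)
pos <- regexpr("[0-9]", y1)

# Extract that single character, or NA when the string has no digit
first_digit <- ifelse(pos > 0, substr(y1, pos, pos), NA_character_)

# Convert to integer if the digit is to be used as a number
first_digit_int <- as.integer(first_digit)
first_digit_int
```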
Most of the time we need to deal with missing values in data science projects, and these missing values can occur at any position. We might want to change the position of these missing values and send them to the end of the columns in the data frame. This can be done with the help of the lapply function as shown in the below examples.

Example 1: Consider the below data frame df1:

Output
  x1 x2 x3
1  0  0  2
2  1  1 NA
3  1 NA  0
4  0 NA  2
5  1 NA  2
6 ...
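One way to push the missing values to the bottom of each column with lapply is sketched below; the data frame and the name df_sorted are assumptions. Because every column is rearranged independently, values that shared a row before no longer necessarily share one afterwards.

```r
# Hypothetical data frame with missing values scattered through the columns
df <- data.frame(
  x1 = c(0, NA, 1, NA, 2),
  x2 = c(NA, 3, NA, 4, 5)
)

# For each column, keep the non-missing values in their original order
# and append the missing ones after them
df_sorted <- df
df_sorted[] <- lapply(df, function(col) c(col[!is.na(col)], col[is.na(col)]))
df_sorted
```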
To convert an old data frame to a new data frame, we can simply assign it to the new name. For example, if we have a data frame called df and want to convert it to a new one, say df_new, then it can be done as df_new <- df.

Example 1: Consider the below data frame df1:

Output
   x1 x2
1   8  6
2   4  9
3   3  2
4   3  5
5   7  4
6   4  8
7   8  6
8  12 12
9   8  6
10 ...
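A short sketch of the assignment, with made-up values. It also shows that the new name refers to an independent copy, so changing df_new does not alter df.

```r
# Hypothetical original data frame
df <- data.frame(x1 = c(8, 4, 3), x2 = c(6, 9, 2))

# Create the new data frame by assigning the old one to a new name
df_new <- df

# Modifying the copy leaves the original untouched
df_new$x1[1] <- 100
df$x1[1]      # still the original value
df_new$x1[1]  # the modified value

# If the old name is no longer needed, it could be dropped with rm(df)
```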
To remove rows whose value in a column is duplicated more than a certain number of times, we can create a subset of the rows whose value occurs fewer than that number of times. For this purpose, we first need to extract the rows and then subset the data frame on the particular column as shown in the below examples.

Example 1: Consider the below data frame df1:

Output
   x1 x2
1   0  0
2   0  0
3   1  0
4   0  1
5   0  0
6   1  1
7   0  1
8   1  1
9   1  2
10 ...
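The subsetting can be sketched with ave, which attaches to every row the number of times its value occurs in the chosen column. The data, the threshold max_times and the name df_sub are assumptions, and this is one possible approach rather than necessarily the one used in the full article.

```r
# Hypothetical data frame; the value 0 occurs four times in x1
df <- data.frame(x1 = c(0, 0, 1, 0, 2, 1, 0), x2 = 1:7)

# Keep rows whose x1 value occurs at most this many times (assumed threshold)
max_times <- 2

# For every row, count how often its x1 value appears in the column
n_occurrences <- ave(seq_along(df$x1), df$x1, FUN = length)

# Subset to the rows that do not exceed the threshold
df_sub <- df[n_occurrences <= max_times, ]
df_sub
```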
A data frame can contain columns of any type, such as numerical, character, logical, or factor. If a data frame contains multiple types of columns, we might want to find the number of columns of each type, or of one type, say numerical. For this purpose, we can use the select_if function of the dplyr package along with the length function as shown in the below examples.

Example 1: Consider the below data frame df1:

Output
  x1          x2        x3 x4
1  a -0.18404831 0.1082741  2
2  b ...
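The sketch below counts columns by type in base R so that it runs without extra packages; the data and variable names are illustrative assumptions. With dplyr loaded, length(select_if(df, is.numeric)) should give the same count as n_numeric.

```r
# Hypothetical data frame with character, double and integer columns
df <- data.frame(
  x1 = c("a", "b", "c"),
  x2 = c(-0.18, 0.52, 1.10),
  x3 = c(0.11, 0.40, -0.73),
  x4 = c(2L, 5L, 1L),
  stringsAsFactors = FALSE
)

# Number of numeric columns (is.numeric is TRUE for integer and double)
n_numeric <- sum(sapply(df, is.numeric))

# Number of columns per class, using the first class of each column
type_counts <- table(sapply(df, function(col) class(col)[1]))

n_numeric
type_counts
```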
Character values can be stored in uppercase, lowercase, or a mixture of the two. If we have values that are in uppercase or in a mixture of lowercase and uppercase, we can convert them to lowercase only by using the tolower function. We simply need to pass the vector or the column of the data frame to tolower as shown in the below examples.

Example 1: Consider the below data frame df1:

Output
  x1         y1
1  C -0.1036851
2  C -0.6176530
3  B  0.5763786
4  A  0.1943794
5  C  1.1196470
...
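A minimal sketch of the conversion on a data frame column, with made-up values. The column is created as character (stringsAsFactors = FALSE) so that the result of tolower can be assigned straight back.

```r
# Hypothetical data frame with mixed-case character values
df <- data.frame(
  x1 = c("C", "b", "Ab", "A"),
  y1 = c(-0.10, 0.58, 0.19, 1.12),
  stringsAsFactors = FALSE
)

# Convert the character column to lowercase in place
df$x1 <- tolower(df$x1)
df
```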
If a column is categorical, it has at least two categories. There is no fixed limit on the total number of categories, although it also depends on the total number of cases. If we have a data frame that contains categorical columns, some with fewer than four categories and some with more, we might want to subset the columns having fewer than four categories. This could be required when we want to subset the data in a deliberately biased way or when some predefined data characteristics allow this change. The subset of such columns can be done with the help of ...
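The excerpt is cut off before it names the function it uses, so the following is only one possible approach: count the distinct values per column with sapply and keep the columns below the threshold. The data and the names n_categories and df_small are assumptions.

```r
# Hypothetical data frame whose factor columns have 2, 5 and 3 categories
df <- data.frame(
  g1 = factor(c("a", "b", "a", "b", "a", "b")),
  g2 = factor(c("p", "q", "r", "s", "t", "p")),
  g3 = factor(c("x", "y", "z", "x", "y", "z"))
)

# Number of distinct categories observed in each column
n_categories <- sapply(df, function(col) length(unique(col)))

# Keep only the columns with fewer than four categories;
# drop = FALSE keeps a data frame even if a single column remains
df_small <- df[, n_categories < 4, drop = FALSE]
names(df_small)
```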
To create a frequency table in R, we can simply use the table function, but table returns its output as a horizontal table. If we want the table in data frame format, we need to read it as a data frame using the as.data.frame function. For example, if we have a table called T, then to convert it into a data frame we can use the command as.data.frame(T).

Example 1

Output
[1] 2 0 2 3 2 3 1 2 1 4 0 0 4 4 1 3 1 2 1 3 2 3 2 1 ...
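The conversion can be sketched as follows, with a short made-up vector. The table is stored under the name tab rather than T, because T is also R's built-in shorthand for TRUE and reassigning it can cause confusion.

```r
# Hypothetical vector of small integer values
x <- c(2, 0, 2, 3, 2, 3, 1, 2, 1, 4)

# Horizontal frequency table of the values
tab <- table(x)

# Read the table as a data frame: one row per value,
# with the value in column x and its count in column Freq
freq_df <- as.data.frame(tab)
freq_df
```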