Text data is often difficult to clean, and one of the most basic problems is that the values are separated by some delimiter, such as a special character. The strsplit function makes it easy to split such text values apart.
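A minimal sketch of splitting on a special character with strsplit. The vector x1 and the "$" delimiter are made up for illustration; fixed = TRUE is used so that "$" is treated as a literal character rather than a regular expression.

```r
# Hypothetical text values separated by a special character ("$")
x1 <- c("apple$banana$cherry", "red$green")

# fixed = TRUE matches "$" literally instead of as a regex end-of-line anchor
parts <- strsplit(x1, split = "$", fixed = TRUE)

# strsplit returns a list with one character vector per input string
print(parts)
```

Because the result is a list, an individual piece is reached with double brackets, for example parts[[1]][2].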
We can use the str_detect function to check whether a single string or a vector of strings is in lowercase or uppercase. Along with str_detect, we pass either upper or lower as the pattern, and the result is returned as TRUE or FALSE: if the string is in lowercase and we pass lower, the output is TRUE, and likewise for an uppercase string with upper.
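A rough sketch of the check, assuming the stringr package is installed and that "upper" and "lower" refer to the [[:upper:]] and [[:lower:]] character classes. The anchors ^ and $ are an addition here so that the whole string has to match; without them str_detect only reports whether at least one such letter is present. The vector x1 is hypothetical.

```r
library(stringr)

# Hypothetical strings: all uppercase, all lowercase, and mixed case
x1 <- c("ABC", "abc", "Abc")

# TRUE only when every character is an uppercase letter
is_upper <- str_detect(x1, "^[[:upper:]]+$")

# TRUE only when every character is a lowercase letter
is_lower <- str_detect(x1, "^[[:lower:]]+$")

print(is_upper)
print(is_lower)
```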
To find the ranks of the elements of a vector we can use the rank function directly, but this ranks the values from smallest to largest. For example, if a vector x contains the values 1, 2, 3 in this sequence, then rank(x) returns 1 2 3. To rank from largest to smallest instead, which gives 3 2 1, use rank(-x).
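The example from the text can be reproduced as follows; the second vector y is an extra, made-up case with unsorted values.

```r
x <- c(1, 2, 3)

ascending <- rank(x)    # 1 2 3: smallest value gets rank 1
descending <- rank(-x)  # 3 2 1: largest value gets rank 1

# A hypothetical unsorted vector
y <- c(10, 50, 30)
y_desc <- rank(-y)      # 3 1 2

print(ascending)
print(descending)
print(y_desc)
```

Negating the vector only works for numeric data; for other types a different approach would be needed.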
Often we get data whose column names are in lowercase or have only the first letter in uppercase. To convert those column names to all capital letters, apply the toupper function to names(data_frame_name) and assign the result back to the names of the data frame.
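A short sketch of the idea; the data frame df and its column names are invented for the example.

```r
# Hypothetical data frame with lowercase and capitalised column names
df <- data.frame(id = 1:3, Score = c(90, 85, 70), group = c("a", "b", "a"))

# Replace the column names with their uppercase versions
names(df) <- toupper(names(df))

print(names(df))
```

Only the names change; the values in the columns are left untouched.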
To create a simple logistic model, we use the glm function with family = binomial, because the dependent variable in a simple (binomial) logistic model has two categories; if there are more than two categories, the model is called a multinomial logistic model. To extract the odds ratios for the slope and intercept from a simple logistic model, apply the exp function to the coefficients of the model object.
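A minimal sketch on simulated data. The seed is the one visible in the excerpt, but the variables x1 and y1, the sample size, and the true coefficients are assumptions made for illustration, not the original example.

```r
set.seed(999)

# Simulated predictor and a two-category response (assumed values)
x1 <- rnorm(100)
y1 <- rbinom(100, size = 1, prob = plogis(0.5 + 1.2 * x1))

# Simple logistic regression
model <- glm(y1 ~ x1, family = binomial)

# Exponentiate the log-odds coefficients to put them on the odds scale
odds_ratios <- exp(coef(model))
print(odds_ratios)
```

The exponentiated slope is the multiplicative change in the odds for a one-unit increase in x1; the exponentiated intercept is the odds when x1 is zero.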
Suppose we have a data frame df1 that contains 5 columns and another data frame df2 that contains only one column, and the columns in both data frames have the same data type. We might want to append the column of the second data frame below the last row of the first data frame, creating the same number of columns as in the first data frame. Researchers might need this to understand the impact of an external variable on the result of an analysis, and it can be done with the help of ...
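The excerpt is cut off before it names the function it uses, so the following is only one possible base R approach, not necessarily the one the article had in mind. It assumes the values of df2 should go under the first column of df1, with the remaining columns filled with NA; the data frames themselves are hypothetical.

```r
# Hypothetical data frames: five columns vs. a single column of the same type
df1 <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12, e = 13:15)
df2 <- data.frame(v = 16:17)

# Build an all-NA frame with df1's column names and one row per row of df2
padded <- as.data.frame(matrix(NA, nrow = nrow(df2), ncol = ncol(df1)))
names(padded) <- names(df1)

# Assumption: df2's values belong under the first column of df1
padded[[1]] <- df2[[1]]

# Stack the padded rows below df1
combined <- rbind(df1, padded)
print(combined)
```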
Random sampling is a technique used by almost every researcher, analyst, financial analyst, data scientist, or even leader, and it would not be surprising to say that almost everyone uses it at least once in a lifetime, because we use it in one way or another in our lives even if we are not aware of it. To take a random sample, or to create random values from a range of values starting at 1, we can simply use the sample function in R. By default sample draws without replacement; pass replace = TRUE to sample with replacement. Example: sample(100) Output: [1] 17 76 ...
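A brief sketch of both modes of sample. The seed and the sample sizes are arbitrary choices for illustration, and the printed values depend on the seed and R version.

```r
set.seed(123)  # arbitrary seed so the run is repeatable

# sample(100) is a random permutation of 1:100 (no value repeats)
perm <- sample(100)

# Ten draws from 1:100 where the same value may appear more than once
with_repl <- sample(100, size = 10, replace = TRUE)

print(head(perm))
print(with_repl)
```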
A data.table object is very similar to a data frame in R, so converting a data.table object to a matrix is not difficult. We just need to call the as.matrix function and store the result in a new object, because as.matrix returns a new matrix rather than changing the data.table object in place. For example, if we have a data.table object DT, we can convert it by assigning as.matrix(DT) to a new object such as DT_matrix.
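A minimal sketch, assuming the data.table package is installed; the contents of DT are made up.

```r
library(data.table)

# Hypothetical data.table with two numeric columns
DT <- data.table(x = 1:3, y = c(2.5, 3.5, 4.5))

# as.matrix returns a new matrix; DT itself is left unchanged
DT_matrix <- as.matrix(DT)

print(class(DT_matrix))
print(DT_matrix)
```

If the columns have mixed types (for example numeric and character), the resulting matrix is coerced to a single common type.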
If we have missing values (NA) in our data frame and create a plot using ggplot2 without excluding them, we get the warning “Removed X rows containing missing values”, where X is the number of rows that contain NA in the plotted columns. The plot is still correct, because it is drawn after excluding the NAs. To avoid this warning, we just need to pass the subset of the data frame that does not contain NA values. Consider a data frame with a y column having a few NA values ...
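A small sketch of subsetting before plotting, assuming ggplot2 is installed. The data frame df, its values, and the choice of a scatter plot are illustrative assumptions.

```r
library(ggplot2)

# Hypothetical data frame where y has a few NA values
df <- data.frame(x = 1:10, y = c(2, NA, 5, 3, NA, 8, 7, NA, 9, 4))

# Keep only the rows where y is not missing
df_complete <- df[!is.na(df$y), ]

# Plotting the subset does not trigger the missing-values warning
p <- ggplot(df_complete, aes(x = x, y = y)) + geom_point()
print(p)
```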
Categorical variables can be easily visualized with the help of a mosaic plot. A mosaic plot can show one or more categorical variables, and it is built from the frequency of each category in those variables. To create a mosaic plot in base R, we can use the mosaicplot function. Categories with higher frequencies are drawn as larger boxes, and categories with lower frequencies as smaller boxes.
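A minimal sketch with two made-up categorical variables; the data frame, category labels, and plot titles are assumptions for illustration.

```r
set.seed(1)  # arbitrary seed so the run is repeatable

# Hypothetical data frame with two categorical variables
df <- data.frame(
  x1 = sample(c("A", "B", "C"), 50, replace = TRUE),
  x2 = sample(c("Yes", "No"), 50, replace = TRUE)
)

# Cross-tabulate the categories; box areas are proportional to these counts
freq <- table(df$x1, df$x2)

mosaicplot(freq, main = "x1 by x2", xlab = "x1", ylab = "x2")
```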