When we add a new column to an existing R data frame, it is placed after the last column, but sometimes we need it at the front. Where a column should go depends on convenience, our familiarity with the variables, and how we need to use them. We can add a new column at the front of an existing R data frame with the cbind function by passing the new column before the data frame.
Example
Consider the below data frame −
ID
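Here is a minimal sketch of the cbind approach on a small hypothetical data frame. The column names `x1`, `x2`, and `ID` are illustrative and do not come from the original example. Because the new column is passed to cbind before the data frame, it becomes the first column:

```r
# Hypothetical data frame; x1 and x2 are illustrative column names
df <- data.frame(x1 = c(5, 8, 2), x2 = c("a", "b", "c"))
# cbind() with a data frame argument returns a data frame;
# the named vector passed first becomes the first column, "ID"
df <- cbind(ID = 1:3, df)
print(df)
```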
A matrix holds values of a single type, often numeric. Sometimes some of these values were entered incorrectly, or we want to replace certain values based on a condition. For example, if a matrix contains a few fives, we might want to replace every 5 with another number that is greater or less than 5.
Example
Consider the below matrix −
set.seed(123) M
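The sketch below replaces values by condition with logical indexing. The matrix is hypothetical and the replacement value 50 is arbitrary. `M == 5` produces a logical matrix of the same shape, and assigning through it changes only the matching elements:

```r
set.seed(123)
# Hypothetical 4 x 5 matrix of integers between 1 and 10
M <- matrix(sample(1:10, 20, replace = TRUE), nrow = 4)
# Logical indexing selects every element equal to 5 and replaces it in place
M[M == 5] <- 50
print(M)
```

The same pattern works for other conditions, for example `M[M > 8] <- 0`.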
Extracting a column from an R data frame usually gives a vector, but sometimes we need the column as a data frame, for example to merge it with another data frame. In that case we can use single square brackets, optionally together with as.data.frame, to extract the columns we want as a data frame.
Example
Consider the below data frame −
set.seed(9) x1
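This sketch compares several ways to extract a column from a hypothetical data frame. The columns `x1` and `x2` are illustrative. Matrix-style indexing drops to a vector by default. List-style single brackets or `drop = FALSE` keep the data frame structure and the column name:

```r
set.seed(9)
# Hypothetical data frame with two numeric columns
df <- data.frame(x1 = rnorm(5), x2 = rnorm(5))
v  <- df[, "x1"]                 # default drop = TRUE returns a numeric vector
d1 <- df["x1"]                   # list-style single brackets keep a data frame
d2 <- df[, "x1", drop = FALSE]   # drop = FALSE also keeps a data frame
d3 <- as.data.frame(df[, "x1"])  # converting the vector works, but the name "x1" is lost
str(d1)
```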
par(mfrow) divides the plot window into a grid of panels, and successive plots fill the panels row by row. Once all the panels are used, the next plot starts again at the first panel of a new page. For example, with par(mfrow = c(2, 2)) the window holds four plots. If we then want a single plot to fill the window, it does not work: the plot is drawn small in the upper-left panel. To deal with this problem, we can set par(mfrow) back to c(1, 1).
Example
par(mfrow=c(2,2)) x
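The sketch below shows the reset. It draws to a null PDF device so the code also runs without a screen. That device is an assumption of this example; in an interactive session you would omit `pdf(NULL)` and `dev.off()`. The data `x` is hypothetical:

```r
# Null PDF device: plots are drawn but nothing is written to disk
pdf(NULL)
set.seed(1)
x <- rnorm(50)                 # hypothetical data for the example plots
par(mfrow = c(2, 2))           # 2 x 2 grid of panels
hist(x); boxplot(x); plot(x); plot(density(x))
par(mfrow = c(1, 1))           # back to one plot per page
plot(x)                        # this plot now fills the whole device
layout_after <- par("mfrow")
invisible(dev.off())
```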
A matrix holds values of a single type. If we convert a data frame with factor columns using data.matrix, the factors are replaced by their integer codes. These codes follow the order of the factor levels, which is alphabetical by default, so a level that sorts first, such as one starting with A, gets 1, and so on. To convert such a data frame to a matrix while keeping the factor levels as strings, we can use as.matrix, which returns a character matrix when the data frame contains factor or character columns.
Example
Consider the below data frame −
x1
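This sketch compares the two conversions on a hypothetical data frame. The levels Apple, Banana, and Cherry sort alphabetically, so data.matrix codes the values Banana, Apple, and Cherry as 2, 1, and 3. as.matrix keeps the labels instead and converts the numeric column to text:

```r
# Hypothetical data frame: one factor column and one numeric column
df <- data.frame(x1 = factor(c("Banana", "Apple", "Cherry")), x2 = c(10, 20, 30))
m_codes <- data.matrix(df)  # factors become integer codes in level order
m_chars <- as.matrix(df)    # factors keep their labels; result is a character matrix
print(m_codes)
print(m_chars)
```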
When the data contain very large or very small values, the axis labels of a scatterplot or any other plot may be displayed in scientific notation, such as 1e+06. This makes the plot harder to read and interpret, so we may want to show the axis labels as ordinary numbers. In ggplot2, this can be done for both axes with scale_x_continuous(labels = comma) and scale_y_continuous(labels = comma), where comma is a label formatter from the scales package.
Example
Consider the below data frame −
set.seed(101) x
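Here is a minimal sketch with hypothetical data. It assumes the ggplot2 and scales packages are installed. Whether the default labels actually switch to scientific notation depends on the range of values, so the large values below are chosen only to make that likely:

```r
library(ggplot2)
library(scales)   # provides the comma() label formatter
set.seed(101)
# Hypothetical data with large values
df <- data.frame(x = rnorm(20, mean = 5e6, sd = 1e6),
                 y = rnorm(20, mean = 3e7, sd = 5e6))
p <- ggplot(df, aes(x, y)) +
  geom_point() +
  scale_x_continuous(labels = comma) +  # e.g. 5,000,000 instead of 5e+06
  scale_y_continuous(labels = comma)
```

Printing `p` draws the plot with comma-separated axis labels.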
A bar plot is one of the most commonly used plots for categorical data, and it is easy to create in R with ggplot2. By default, a ggplot2 bar plot leaves some space between the bars and the X-axis, and between the tallest bar and the top of the plot area. This space can be reduced or increased with the expand argument of the scale_y_continuous function.
Example
Consider the below data frame −
x
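The sketch below adjusts that space with the `expand` argument. It assumes ggplot2 3.3.0 or later, which provides `expansion()`. The categories and counts are hypothetical. `expand = c(0, 0)` removes the padding entirely. `expansion(mult = c(0, 0.1))` keeps the bars on the axis but adds 10% of the data range above the tallest bar:

```r
library(ggplot2)
# Hypothetical counts for four categories
df <- data.frame(x = c("A", "B", "C", "D"), freq = c(12, 25, 18, 30))
# No padding: bars sit directly on the X-axis and touch the top
p_tight <- ggplot(df, aes(x, freq)) + geom_col() +
  scale_y_continuous(expand = c(0, 0))
# No padding below, 10% of the data range added above
p_top <- ggplot(df, aes(x, freq)) + geom_col() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))
```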
To give more information about the data in the columns of an R data frame, we might add a prefix to the column names. A prefix helps everyone understand the data. For example, we can use the data set name, the analysis objective, or something else common to all the columns as a prefix. To add a prefix to the columns of an R data frame, we can use the paste function to join the prefix to the original column names.
Example
Consider the below data frame −
set.seed(100) Rate
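Here is a minimal sketch of the paste approach. The data frame, its column names, and the prefix `Study1` are all hypothetical. paste is vectorised over `names(df)`, so one call renames every column:

```r
set.seed(100)
# Hypothetical data frame; "Rate" and "Score" are illustrative column names
df <- data.frame(Rate = round(runif(3), 2), Score = 1:3)
# paste() joins the prefix and each existing name; sep = "_" sets the separator
names(df) <- paste("Study1", names(df), sep = "_")
print(names(df))
```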
Sometimes we want to use only some columns of an R data frame for analysis, so it is convenient to keep just the columns we need. That way, any later column operations involve only the necessary columns. Selecting all columns except one or more can be done with single square brackets.
Example
Consider the below data frame −
set.seed(100) x1
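This sketch drops columns from a hypothetical data frame in two ways. A negative index drops columns by position. A logical vector built from the column names drops them by name:

```r
set.seed(100)
# Hypothetical data frame with four numeric columns
df <- data.frame(x1 = rnorm(4), x2 = rnorm(4), x3 = rnorm(4), x4 = rnorm(4))
# A negative index drops columns by position
no_x2 <- df[-2]
# Dropping by name: keep every column whose name is not in the exclusion set
no_x1_x3 <- df[!(names(df) %in% c("x1", "x3"))]
print(names(no_x1_x3))
```

Selecting by name is more robust if the column order might change.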
A scatterplot helps us see whether two variables have a linear relationship, and it is often the first step in building a predictive model. Before applying any predictive modeling technique, we should draw a scatterplot of the dependent variable against each independent variable to check what kind of relationship exists between them. The points of a scatterplot are generally drawn as circles, but we can use other shapes such as squares, triangles, and diamonds. In ggplot2, the shape argument of geom_point (ggplot2 also accepts the base R name pch) lets us draw a scatterplot with these types of points.
Example
Consider the below data frame −
set.seed(123) x
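Here is a minimal sketch with hypothetical data, assuming ggplot2 is installed. It uses the standard R point codes: 15 is a filled square, 16 a filled circle, 17 a filled triangle, and 18 a filled diamond:

```r
library(ggplot2)
set.seed(123)
# Hypothetical data for the scatterplot
df <- data.frame(x = rnorm(20), y = rnorm(20))
# shape = 15 draws filled squares; try 17 for triangles or 18 for diamonds
p <- ggplot(df, aes(x, y)) + geom_point(shape = 15, size = 3)
```

Printing `p` draws the scatterplot with square points.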