Articles by Nizamuddin Siddiqui
How to create a new data frame for the mean of rows of some columns from an R data frame?
Finding row means helps us to identify the average performance of a case when all the variables are of the same nature, and it is an easy job. But if some of the columns hold a different type of data, we have to extract the columns for which we want the row means. We can then create a new data frame with the row means of the required columns using the rowMeans function.
Example: row_means_3.4_cols_df
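A minimal sketch of the idea, assuming a hypothetical data frame in which only x3 and x4 are the columns of interest (the column names, values, and the name row_means_df are illustrative and not taken from the original article):

```r
# Hypothetical data frame: x1 is a label column, x2 to x4 are numeric.
df <- data.frame(
  x1 = c("a", "b", "c"),
  x2 = c(1, 2, 3),
  x3 = c(4, 5, 6),
  x4 = c(7, 8, 9)
)

# Select only the required columns, then take the mean of each row
# and store the result in a new data frame.
row_means_df <- data.frame(row_mean = rowMeans(df[, c("x3", "x4")]))
print(row_means_df)
```

Selecting columns by name rather than by position keeps the code working if the column order changes.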
How to split a continuous variable into multiple groups in R?
Splitting a continuous variable is required when we want to compare different levels of a categorical variable based on some characteristics of the continuous variable. An example is creating salary groups from salary and then comparing those groups using analysis of variance or the Kruskal-Wallis test. To split a continuous variable into multiple groups, we can use the cut2 function of the Hmisc package.
Example: df$Salary_Group
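The following sketch shows one way this might look, assuming the Hmisc package is installed and using made-up salary values. The choice of three groups (g = 3) is an assumption for illustration only:

```r
library(Hmisc)  # provides cut2(); assumed to be installed

# Hypothetical salaries; the values are illustrative only.
df <- data.frame(
  Salary = c(21000, 25000, 32000, 38000, 41000, 47000, 52000, 60000, 75000)
)

# g = 3 asks cut2() for three quantile-based groups.
df$Salary_Group <- cut2(df$Salary, g = 3)
print(table(df$Salary_Group))
```

The new Salary_Group column is a factor, so it can be passed directly as the grouping variable to functions such as aov or kruskal.test.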
How to deal with the error "Error in int_abline---plot.new has not been called yet" in R?
This error means that no plot has been created yet, hence the abline function has nothing to draw on. A plot therefore needs to be created first, and only then can abline add a line to it. Mostly, abline is used to draw a regression line, thus we need to create a scatterplot before calling abline.
Example: abline(lm(y~x))
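A small sketch that reproduces the error and then avoids it, assuming simulated x and y values. The pdf(NULL) call is only there to get a fresh graphics device that writes no file, so the example behaves the same in any session:

```r
set.seed(1)
x <- rnorm(20)
y <- 2 * x + rnorm(20)

# Open a fresh graphics device that writes no file; no plot exists on it yet.
pdf(NULL)

# Calling abline() before any plot raises the error; capture its message.
err <- tryCatch(
  {
    abline(lm(y ~ x))
    NULL
  },
  error = function(e) conditionMessage(e)
)
print(err)

# Create the scatterplot first, then add the regression line.
plot(x, y)
abline(lm(y ~ x))
invisible(dev.off())
```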
How to create a correlation matrix plot in R?
To create a correlation matrix plot, we can use the ggpairs function of the GGally package. For example, if we have a data frame called df that contains five columns, then the correlation matrix plot can be created as ggpairs(df). A correlation matrix plot created with ggpairs displays the correlation values as well as the scatterplots, with the distribution of each variable on the diagonal.
Example:
library(GGally)
ggpairs(df)
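As a hedged illustration, the sketch below builds a hypothetical five-column data frame from simulated values and passes it to ggpairs. It assumes the GGally package is installed; the column names and data are invented for the example:

```r
library(GGally)  # assumed to be installed; loads ggplot2 as well

# Hypothetical data frame with five numeric columns.
set.seed(123)
df <- data.frame(
  x1 = rnorm(50),
  x2 = rnorm(50),
  x3 = rnorm(50),
  x4 = rnorm(50),
  x5 = rnorm(50)
)

# Build the correlation matrix plot; printing p draws it.
p <- ggpairs(df)
```

Calling print(p) in an interactive session draws the 5 x 5 grid of panels.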
How to find the mean of columns of an R data frame or a matrix?
If all the columns in an R data frame are numeric, then it makes sense to find the mean of each column. This calculation helps us to see how different the column means are, but to be sure that they are significantly different we need to run a hypothesis test. To find the column means of a data frame or a matrix, we can use the colMeans function.
Example: Consider the below data frame:
set.seed(9) x1
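A minimal sketch with a small hypothetical data frame and a matrix (the values are chosen by hand so the means are easy to check and are not the data from the original article):

```r
# Hypothetical numeric data frame.
df <- data.frame(
  x1 = c(1, 2, 3),
  x2 = c(10, 20, 30)
)

# Column means of a data frame.
df_means <- colMeans(df)
print(df_means)

# colMeans works the same way on a matrix.
m <- matrix(1:6, nrow = 3)
m_means <- colMeans(m)
print(m_means)
```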
How to create a boxplot using ggplot2 for single variable without X-axis labels in R?
The important part of a boxplot is the Y-axis because it helps us to understand the variability in the data, and hence we can remove the X-axis labels if we know the data description. To create a boxplot using ggplot2 for a single variable without X-axis labels, we can use the theme function and set the X-axis title, text and ticks to blank as shown in the below example.
Example:
ggplot(df,aes(x=factor(0),y))+geom_boxplot()+theme(axis.title.x=element_blank(),axis.text.x=element_blank(),axis.ticks.x=element_blank())
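A self-contained version of the call above might look as follows, assuming ggplot2 is installed and using a hypothetical data frame with a single simulated column y:

```r
library(ggplot2)

# Hypothetical data: one numeric variable.
set.seed(42)
df <- data.frame(y = rnorm(100))

# factor(0) gives the single box a dummy position on the X-axis;
# the theme() call then blanks the X-axis title, tick labels and ticks.
p <- ggplot(df, aes(x = factor(0), y = y)) +
  geom_boxplot() +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank()
  )
```

Printing p draws the boxplot with only the Y-axis annotated.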
How to perform shapiro test for all columns in an R data frame?
The Shapiro-Wilk test is used to test the normality of a variable; its null hypothesis is that the variable is normally distributed. If we have numerical columns in an R data frame, then we might want to check the normality of all the variables. This can be done with the help of the apply function and shapiro.test as shown in the below example.
Example:
apply(df, 2, shapiro.test)
Output:
$x1
Shapiro-Wilk normality test
data: newX[, i]
W = 0.94053, p-value = 0.2453
$x2
Shapiro-Wilk normality test
data: newX[, i]
W = 0.95223, p-value = 0.4022
$x3
Shapiro-Wilk normality test
data: newX[, i]
W = ...
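The sketch below runs the same apply call on a hypothetical data frame of simulated normal values and then collects only the p-values. The data and the helper name p_values are assumptions made for illustration, so the numbers will differ from the output shown above:

```r
# Hypothetical data frame with three numeric columns.
set.seed(101)
df <- data.frame(
  x1 = rnorm(20),
  x2 = rnorm(20),
  x3 = rnorm(20)
)

# Run the Shapiro-Wilk test on every column (MARGIN = 2 means columns).
results <- apply(df, 2, shapiro.test)

# Pull out just the p-values for a compact summary.
p_values <- sapply(results, function(r) r$p.value)
print(p_values)
```

A p-value above the chosen significance level means the test does not reject normality for that column.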
How to change the color of points in a scatterplot using ggplot2 in R?
To color the points in a scatterplot using ggplot2, we can use the colour argument of geom_point. The color can be passed in multiple ways: one way is to name a particular color, which is set outside aes, and the other is to map colour to a range or a variable inside aes. If a range or a variable is used, then the points will be drawn in different shades.
Example:
ggplot(df,aes(x,y))+geom_point(aes(colour=x))
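Both ways can be sketched side by side, assuming ggplot2 is installed and using simulated x and y values. The plot names p_fixed and p_mapped and the color "blue" are illustrative choices:

```r
library(ggplot2)

# Hypothetical data for the scatterplot.
set.seed(7)
df <- data.frame(x = rnorm(30), y = rnorm(30))

# One fixed color for all points: colour is set outside aes().
p_fixed <- ggplot(df, aes(x, y)) +
  geom_point(colour = "blue")

# Color mapped to a variable: colour goes inside aes(),
# so the points are shaded according to the value of x.
p_mapped <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = x))
```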
How to increase the width of the median line in boxplot using ggplot2 in R?
The median line is by default wider than the rest of the lines, which represent the minimum, first quartile, third quartile and maximum, but we can make it wider still so that it stands out. This can be done with the help of the fatten argument of the geom_boxplot function; the default value of fatten is 2.
Example:
ggplot(df,aes(x,y))+geom_boxplot(fatten=6)
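A runnable sketch of the call above, assuming a ggplot2 version in which geom_boxplot still accepts the fatten argument, and a hypothetical data frame with a grouping column x and a numeric column y:

```r
library(ggplot2)

# Hypothetical data: three groups of simulated values.
set.seed(11)
df <- data.frame(
  x = rep(c("A", "B", "C"), each = 20),
  y = rnorm(60)
)

# fatten multiplies the thickness of the median line;
# 6 is three times the default value of 2.
p <- ggplot(df, aes(x, y)) +
  geom_boxplot(fatten = 6)
```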
How to highlight text inside a plot created by ggplot2 using a box in R?
There might be many ways to highlight text inside a plot, but the easiest one is the geom_label function of the ggplot2 package. With this function we can add the required text and set its aesthetics in a single line of code, choosing the position, label and fill color as desired.
Example:
library(ggplot2)
ggplot(df,aes(x))+geom_histogram(bins=30)+geom_label(aes(x=6,y=450,label="Normal Distribution"),fill="red")
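The sketch below is a variation on the example above, assuming ggplot2 is installed and using simulated data. It gives geom_label its own one-row data frame (the hypothetical label_df) so that the label is drawn once rather than once per row of df:

```r
library(ggplot2)

# Hypothetical data for the histogram.
set.seed(2024)
df <- data.frame(x = rnorm(5000, mean = 5))

# One-row data frame holding the label text and its position.
label_df <- data.frame(x = 6, y = 450, label = "Normal Distribution")

# Histogram with a boxed, red-filled text label on top.
p <- ggplot(df, aes(x)) +
  geom_histogram(bins = 30) +
  geom_label(
    data = label_df,
    aes(x = x, y = y, label = label),
    fill = "red"
  )
```

The x and y values of the label are in data coordinates, so they may need adjusting to suit the range of the histogram.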