Create Hierarchical Cluster Dendrogram in R

Nizamuddin Siddiqui
Updated on 07-Nov-2020 07:47:28

389 Views

A dendrogram displays the hierarchical relationships between objects and is produced by hierarchical clustering. In base R, the hclust function builds the clusters and the plot function draws the dendrogram. For example, a dendrogram for the mtcars data can be created as shown below:

> hc=hclust(dist(mtcars))
> plot(hc)

Example 1

> head(mtcars)

Output

              mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4 ...
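As a minimal sketch of the approach described above, the following uses only base R and the built-in mtcars data. The linkage method is written out explicitly ("complete" is the hclust default), while the label size and the number of groups passed to cutree are illustrative assumptions rather than values from the article.

```r
# Euclidean distances between the rows of the built-in mtcars data
d <- dist(mtcars)

# Agglomerative clustering; "complete" linkage is the hclust default
hc <- hclust(d, method = "complete")

# Draw the dendrogram; cex shrinks the leaf labels (illustrative choice)
plot(hc, cex = 0.6, main = "mtcars dendrogram")

# Cut the tree into k groups; k = 3 is an arbitrary assumption
groups <- cutree(hc, k = 3)
table(groups)
```

cutree returns one group label per row of mtcars, which can be attached to the data frame for further analysis.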

Create Row Sum and Row Product Columns in R Data Frame

Nizamuddin Siddiqui
Updated on 07-Nov-2020 07:45:36

422 Views

To create a row sum column and a row product column in an R data frame, we can use the rowSums function and the multiplication operator (*) on the column values inside the transform function. For example, if we have a data frame df that contains the columns x, y, and z, then the row sum and row product columns can be created as:

transform(df, RowSums=rowSums(df), RowProducts=x*y*z)

Example

Consider the below data frame:

> set.seed(3251)
> x1 y1 z1 a1 b1 df1 df1

Output

  x1 y1 z1 a1 b1
1  2  4 10 10  5
2  0  9  5  5  8
3  4  7  6 12  9
...
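The transform call above can be sketched on a small data frame. The values below are hypothetical and chosen only for illustration; the apply-based product is an alternative not taken from the article, useful when the columns should not be listed by name. Note that rowSums(df) adds every column of df, so all columns are assumed to be numeric.

```r
# Hypothetical data frame with three numeric columns
df <- data.frame(x = c(2, 0, 4), y = c(4, 9, 7), z = c(10, 5, 6))

# rowSums(df) adds across all columns; x * y * z multiplies element-wise
res <- transform(df, RowSums = rowSums(df), RowProducts = x * y * z)

# Alternative (not from the article): product over all columns of df
res$RowProducts2 <- apply(df, 1, prod)

res
```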

Create Exponential Curve in R

Nizamuddin Siddiqui
Updated on 07-Nov-2020 07:44:37

7K+ Views

To create an exponential curve, we can use the exp function inside the plot function for the variable that we want to plot. For example, if we have a vector x, then the exponential curve for x can be created by using plot(x,exp(x)). The exponential function can be applied to whichever variable suits the objective of the analysis. Here we show only an example of how it works.

Example 1

> x plot(x,exp(x))

Example 2

> y plot(y,exp(y))
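A minimal sketch of the plot(x, exp(x)) idea follows. The range and step of x are assumptions made for illustration, since the vector definitions in the examples above did not survive. The second call uses base R's curve function as an equivalent alternative.

```r
# Hypothetical input: evenly spaced values from -2 to 2
x <- seq(-2, 2, by = 0.1)

# Plot exp(x) against x; type = "l" joins the points into a line
plot(x, exp(x), type = "l", main = "Exponential curve")

# Equivalent alternative: curve() evaluates the expression over a range
curve(exp(x), from = -2, to = 2)
```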

Create a Staircase Plot in R

Nizamuddin Siddiqui
Updated on 07-Nov-2020 07:43:03

617 Views

A simple staircase plot can be created by using the geom_tile function of the ggplot2 package. We pass the same vector or column as both the x and the y aesthetic. For example, if we have a column x of an R data frame df, then the staircase plot can be created as ggplot(df, aes(x, x))+geom_tile().

Example

Consider the below data frame:

> x df df

Output

    x
1   1
2   2
3   3
4   4
5   5
6   6
7   7
8   8
9   9
10 10

Loading ggplot2 package and creating staircase ...
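The geom_tile call can be sketched as follows, assuming ggplot2 is installed. The data frame is built to match the values 1 to 10 shown in the output above; the way it is constructed here is an assumption, because the original definition is not recoverable.

```r
library(ggplot2)

# Data frame matching the printed output (construction is an assumption)
df <- data.frame(x = 1:10)

# Mapping x to both axes places one tile per value along the diagonal,
# which produces the staircase shape
p <- ggplot(df, aes(x, x)) + geom_tile()
print(p)
```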

Create Colored Barplot Using ggplot2 Without Legend Entries in R

Nizamuddin Siddiqui
Updated on 07-Nov-2020 07:41:32

129 Views

When we create a colored barplot using ggplot2, the legend entries are created automatically. If we want the plot without those legend entries, the theme function can be used. For example, if we have a data frame df that contains x as a categorical variable and y as a count variable, then the barplot without legend entries can be created as:

ggplot(df, aes(x, y, fill=x))+geom_bar(stat="identity")+theme(legend.position="none")

Example

Consider the below data frame:

> x y df df

Output

  x  y
1 A 24
2 B 28
3 C 25
4 D 27
5 E 26

Loading ggplot2 package and creating the barplot:

> library(ggplot2)
> ggplot(df, aes(x, y, fill=x))+geom_bar(stat="identity")

Creating ...
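A minimal sketch of both variants follows, assuming ggplot2 is installed. The data frame reuses the five rows printed in the output above; how it was originally built is an assumption.

```r
library(ggplot2)

# Values taken from the printed output; the construction is an assumption
df <- data.frame(x = c("A", "B", "C", "D", "E"),
                 y = c(24, 28, 25, 27, 26))

# Colored bars with the automatically generated legend
p_legend <- ggplot(df, aes(x, y, fill = x)) + geom_bar(stat = "identity")

# Same plot with the legend suppressed through theme()
p_plain <- p_legend + theme(legend.position = "none")
print(p_plain)
```

Because the bars are already labelled on the x axis, dropping the legend loses no information here.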

Create Point Chart for Categorical Variable in R

Nizamuddin Siddiqui
Updated on 07-Nov-2020 07:40:16

491 Views

The point chart of a categorical variable has points corresponding to the categories, just as a bar chart has bars. To create a point chart for a categorical variable, we just need to use the geom_point function of the ggplot2 package. For example, if we have a data frame df that contains a categorical column x and a frequency column defined as freq, then the point chart for the categories in x can be created as ggplot(df, aes(x, freq))+geom_point().

Example

Consider the below data frame:

> set.seed(3521)
> x freq df df

Output

  x freq
1 B    2
2 C   12
3 A    8
4 D   12
5 C ...
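The geom_point call can be sketched as below, assuming ggplot2 is installed. The data frame uses the first four rows of the output above as illustrative values, and the point size is an arbitrary choice.

```r
library(ggplot2)

# Illustrative values: first four rows of the printed output
df <- data.frame(x = c("B", "C", "A", "D"),
                 freq = c(2, 12, 8, 12))

# One point per category; size = 3 is an arbitrary readability choice
p <- ggplot(df, aes(x, freq)) + geom_point(size = 3)
print(p)
```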

Extract First Highest Occurring Value in R Data Frame Column

Nizamuddin Siddiqui
Updated on 07-Nov-2020 07:39:01

216 Views

The most frequently occurring value is called the mode, and a variable can have more than one mode. When there are several, the first of them can be extracted by sorting the frequency table with the sort function. For example, if a data frame df has a column x with more than one mode, then the first mode can be found as:

sort(table(df$x), decreasing=TRUE)[1]

Example

Consider the below data frame:

> set.seed(36521)
> x df1 df1

Output

   x
1  B
2  E
3  A
4  A
5  D
6  E
7  D
8  B
9  B
10 C
11 E
12 D
13 E
14 A
15 ...
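A minimal sketch on a hypothetical column with two tied modes follows. Alongside the sort-based expression from the text, it uses which.max, which returns the first maximum of the frequency table. Here "first" means first in the table's sorted order of values, not first in order of appearance in the data.

```r
# Hypothetical column in which "a" and "b" both occur twice
df <- data.frame(x = c("b", "a", "c", "a", "b"))

counts <- table(df$x)

# Expression from the text: the top entry of the sorted frequency table
top <- sort(counts, decreasing = TRUE)[1]

# which.max() picks the first maximum in table order
first_mode <- names(which.max(counts))

# All tied modes, for comparison
all_modes <- names(counts)[counts == max(counts)]
```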

Count Values Satisfying a Condition in an R Vector

Nizamuddin Siddiqui
Updated on 07-Nov-2020 07:37:47

3K+ Views

Sometimes we want to find the frequency of values that satisfy a certain condition. For example, if we have a vector x that contains randomly selected integers from 1 to 100, we might want to find how many values are exactly equal to 10. This can be done by using the which and length functions.

Example 1

> x1 x1

Output

[1] 5 7 3 3 2 7 3 7 6 3

> length(which(x1==5))
[1] 1
> length(which(x1==7))
[1] 3
> length(which(x1==3))
[1] 4

Example 2

> x2 x2

Output

[1] 4 1 5 5 5 3 8 9 8 4 8 1 ...
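The counting idiom can be sketched with the x1 values printed in Example 1. The sum-based form is an equivalent alternative not taken from the article: a logical vector sums to the number of TRUE entries.

```r
# Values copied from the printed output of x1
x1 <- c(5, 7, 3, 3, 2, 7, 3, 7, 6, 3)

# which() returns the matching positions; length() counts them
n_three <- length(which(x1 == 3))

# Alternative: TRUE counts as 1 when summed
n_three_sum <- sum(x1 == 3)

# Any logical condition works the same way
n_above_five <- sum(x1 > 5)
```

If the vector may contain NA, sum(x1 == 3, na.rm = TRUE) avoids an NA result, whereas which already ignores NA comparisons.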

Remove Common Suffix from Column Names in R Data Frame

Nizamuddin Siddiqui
Updated on 07-Nov-2020 07:35:45

8K+ Views

To remove a common suffix from column names, we can use the gsub function. For example, if we have a data frame df that contains columns named x1df, x2df, x3df, and x4df, then we can remove df from all the column names by using the below command:

colnames(df) x1Data x2Data x3Data df1 df1

Output

     x1Data   x2Data   x3Data
1  29.26500 26.64124 2.598983
2  21.82170 23.41442 4.134393
3  22.71918 25.21586 4.442823
4  19.88633 25.23487 3.338448
5  20.48989 23.33683 3.829757
6  29.07910 25.54084 3.519393
7  24.28573 23.67258 4.667397
8  27.99849 22.97148 4.100405
9  23.48148 25.36574 2.618030
10 26.39401 23.80191 4.235092
11 29.39867 24.36261 2.782559
12 30.11137 ...
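Since the command itself is not legible above, the following is a sketch of one way to do this with gsub, not necessarily the article's exact call. The data values are hypothetical, and the $ anchor in the pattern is an added assumption so that only a trailing df is removed.

```r
# Hypothetical data frame whose column names share the suffix "df"
df <- data.frame(x1df = 1:3, x2df = 4:6, x3df = 7:9, x4df = 10:12)

# "df$" matches "df" only at the end of each name
colnames(df) <- gsub("df$", "", colnames(df))

colnames(df)
```

Without the anchor, gsub("df", "", ...) would also strip any df occurring elsewhere in a name.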

Find Standard Deviation with NA Values in R Data Frame

Nizamuddin Siddiqui
Updated on 07-Nov-2020 07:34:26

4K+ Views

If a vector or a column of an R data frame contains an NA, the sd command for standard deviation returns NA. To solve this problem, we need to pass na.rm=TRUE, which drops the missing values before the calculation. For example, if a column x of a data frame df contains missing values, then the standard deviation of x can be calculated as sd(df$x, na.rm=TRUE).

Example

Consider the below data frame:

> set.seed(3521)
> x df1 df1

Output

         x
1       NA
2 5.107864
3 4.797851
4 5.184345
5 4.680958
6 5.245151
7 5.760667
...
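A minimal sketch of the difference na.rm makes, using hypothetical values chosen for illustration:

```r
# Hypothetical column with one missing value
df <- data.frame(x = c(NA, 2, 4, 4, 4, 5, 5, 7, 9))

# Without na.rm the missing value propagates to the result
sd_default <- sd(df$x)

# With na.rm = TRUE the NA is dropped before the calculation
sd_clean <- sd(df$x, na.rm = TRUE)

# Same approach for every numeric column of a data frame
sapply(df, sd, na.rm = TRUE)
```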
