Server Side Programming Articles - Page 1657 of 2650
341 Views
Most of the time, the data we get is not in the format we are looking for, so we need to change it according to our needs. When the levels of categorical variables are represented by words instead of numbers, we can convert those levels to lowercase or to uppercase. Sometimes this is done just to make the information look user-friendly. Mostly we find that the values are in lowercase, so we can convert them to uppercase with the help of the sapply function.
Example
Consider the below data frame −
> x1 x2 x3 df df x1 x2 ...
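As a minimal sketch of the conversion described above: the data frame below is hypothetical (its column names and values are assumptions, not the article's data). Here sapply is used to find the character columns, and lapply applies toupper to them so that the numeric column is left untouched.

```r
# Hypothetical data frame; names and values are assumptions for illustration.
df <- data.frame(
  x1 = c("a", "b", "c"),
  x2 = c("low", "medium", "high"),
  x3 = 1:3,
  stringsAsFactors = FALSE
)

# sapply returns one TRUE/FALSE per column: is the column character?
is_chr <- sapply(df, is.character)

# Convert only the character columns to uppercase.
df[is_chr] <- lapply(df[is_chr], toupper)

print(df)
```

Calling sapply(df, toupper) directly would also uppercase the values, but it returns a character matrix and turns numeric columns into text, which is why the sketch restricts the conversion to character columns.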
2K+ Views
If an R data frame contains a group variable that has many levels, finding the minimum and maximum values of a discrete or continuous variable for each group level becomes difficult. This can be done with the slice function of the dplyr package.
Consider the below data frame that has one group variable and continuous as well as discrete variables −
> set.seed(2) > x1 x2 x3 x4 x5 x6 x7 Group df df
  x1 x2 x3 x4 x5 x6       x7  Group
1 85  8 14  7  8 2.900301 749 1
2 79  7 12  4  3 3.331022 200 2
...
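A small sketch of the grouped minimum and maximum lookup, assuming the dplyr package is installed. The data frame is hypothetical and much smaller than the article's; slice is combined with which.min and which.max inside each group.

```r
library(dplyr)

# Hypothetical data: three groups of four rows each (an assumption).
set.seed(2)
df <- data.frame(
  Group = rep(1:3, each = 4),
  x1 = sample(1:100, 12)
)

# Within each group, keep the row where x1 is smallest.
min_rows <- df %>% group_by(Group) %>% slice(which.min(x1))

# Within each group, keep the row where x1 is largest.
max_rows <- df %>% group_by(Group) %>% slice(which.max(x1))

print(min_rows)
print(max_rows)
```

which.min and which.max return only the first matching position, so ties within a group yield a single row.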
2K+ Views
When a data frame is large, we can split it into multiple parts randomly. This might be required when we want to analyze the data in parts. We can do this with the split function, using the sample function to select the values randomly.
Example
Consider the trees data in base R −
> str(trees)
'data.frame': 31 obs. of 3 variables:
 $ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
 $ Height: num 70 65 63 72 81 83 66 75 80 75 ...
 $ Volume: num 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
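One way the random split could look, as a sketch: the number of parts (three) and the seed are assumptions chosen for illustration. sample shuffles a vector of part labels, and split divides the rows of trees by those labels.

```r
# Fix the seed so the random assignment is reproducible (value is arbitrary).
set.seed(123)

# One label (1, 2, or 3) per row, in random order.
part <- sample(rep(1:3, length.out = nrow(trees)))

# split returns a list of data frames, one per label.
parts <- split(trees, part)

# Number of rows that landed in each part.
print(sapply(parts, nrow))
```

Because the labels are built with rep before shuffling, the parts are as equal in size as 31 rows allow.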
726 Views
When our data has empty values, it is difficult to perform the analysis, so we might want to convert those empty values to NA so that we can see how many values are not available. This can be done by using single square brackets.
Example
Consider the below data frame that has some empty values −
> x1 x2 x3 df df x1 x2 x3 1 1 2 5 2 2 2 5 3 3 2 4 4 1 2 4 5 2 4 4 6 3 4 4 7 1 4 4 8 2 4 2 9 3 2 10 1 2 11 2 12 3 13 1 4 14 2 4 15 3 4 16 4 17 18 19 2 20 1
Converting empty values to NA −
> df[df == ""] df x1 x2 x3 1 1 2 5 2 2 2 5 3 3 2 4 4 1 2 4 5 2 4 4 6 3 4 4 7 1 4 4 8 2 4 2 9 3 2 10 1 2 11 2 12 3 13 1 4 14 2 4 15 3 4 16 4 17 18 19 2 20 1
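A minimal sketch of the single-square-bracket replacement, using a hypothetical three-row data frame of character columns (the values are assumptions). The comparison df == "" produces a logical matrix, and assigning NA through it replaces every empty string.

```r
# Hypothetical data with empty strings standing in for missing values.
df <- data.frame(
  x1 = c("1", "", "3"),
  x2 = c("2", "4", ""),
  stringsAsFactors = FALSE
)

# Replace every empty string with NA.
df[df == ""] <- NA

print(df)

# Count the missing values per column.
print(colSums(is.na(df)))
```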
704 Views
During a survey or any other form of data collection, getting all the information from all units is not possible. Sometimes we get partial information and sometimes nothing. Therefore, some rows in our data may be completely blank and some may have partial data. The blank rows can be removed, and the other empty values can be filled with methods that help to deal with missing information.
Example
Consider the below data frame; it has some missing rows and some missing values −
> x1 x2 x3 df df
  x1 x2 x3
1  1  2  5
2  2  2  5
...
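The row removal can be sketched as follows, assuming that blank cells are stored as NA. The data frame is hypothetical; a row counts as blank only when every one of its cells is NA, so partially filled rows are kept.

```r
# Hypothetical data: row 2 is completely blank, rows 3 and 4 are partial.
df <- data.frame(
  x1 = c(1, NA, 3, NA),
  x2 = c(2, NA, NA, 4),
  x3 = c(5, NA, 4, NA)
)

# A row is blank when its count of NA cells equals the number of columns.
blank <- rowSums(is.na(df)) == ncol(df)

# Keep every row that is not completely blank.
cleaned <- df[!blank, ]

print(cleaned)
```

How the remaining NA values are filled depends on the analysis and is not covered by this sketch.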
1K+ Views
Selection of columns in R is generally done with the column number or with the column name and the $ operator. We can also select columns by a partial or complete name string without using the $ operator. This can be done with the select and matches functions of the dplyr package.
Example
Loading dplyr package −
> library(dplyr)
Consider the BOD data in base R −
> str(BOD)
'data.frame': 6 obs. of 2 variables:
 $ Time : num 1 2 3 4 5 7
 $ demand: num 8.3 10.3 19 16 15.6 19.8
 - attr(*, "reference")= chr "A1.4, p. 270"
Selecting the column of BOD ...
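As an illustrative sketch (assuming dplyr is installed), matches takes a regular expression and select keeps the columns whose names match it. The pattern strings below are assumptions chosen to fit the BOD column names.

```r
library(dplyr)

# Select by a partial name: "dem" matches the column named demand.
partial <- BOD %>% select(matches("dem"))

# Select by the complete name: Time.
complete <- BOD %>% select(matches("Time"))

print(names(partial))
print(names(complete))
```

By default matches ignores case, so the pattern "time" would select the Time column as well.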
622 Views
Comparison of rows is an important part of data analysis. Sometimes we compare a variable with a variable, a value with a value, a case or row with another case or row, or even a complete data set with another data set. This is required to check the accuracy and consistency of the data values. For this purpose, we need to select the required rows, columns, etc. To select the first row for each level of a factor variable, we can use the duplicated function with the ! sign.
Example
Consider the below data frame −
> x1 x2 x3 df head(df, 20) x1 ...
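A short sketch of the !duplicated idiom on a hypothetical data frame (the factor levels and values are assumptions). duplicated marks every repeat of a level after its first appearance, so negating it keeps only the first row per level.

```r
# Hypothetical data: x1 is the factor, x2 records the original row order.
df <- data.frame(
  x1 = factor(c("A", "A", "B", "C", "B", "C")),
  x2 = 1:6
)

# Keep the rows where the level of x1 appears for the first time.
first_rows <- df[!duplicated(df$x1), ]

print(first_rows)
```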
263 Views
To check the trend of all columns of a data frame, we need to create line charts for all of those columns. These line charts help us to understand how data points fall or rise for the columns. Once we know the trend, we can try to find out the reasons behind it and take appropriate actions. We can plot line charts for each of the columns by using the plot.ts function, which plots data as a time series.
Example
Consider the below data frame.
> set.seed(1) > x1 x2 x3 x4 x5 x6 df head(df, 20) x1 x2 x3 x4 x5 x6 ...
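A minimal sketch of plotting every column as a line chart. The data frame here is hypothetical (three columns drawn from arbitrary distributions, which is an assumption); plot.ts coerces it to a multivariate time series and draws one panel per column.

```r
# Hypothetical numeric data; distributions and sizes are assumptions.
set.seed(1)
df <- data.frame(
  x1 = rnorm(20),
  x2 = rpois(20, lambda = 5),
  x3 = runif(20)
)

# One line chart per column, stacked in a single plot window.
plot.ts(df, main = "Line chart for each column")
```

All columns must be numeric for the coercion to a time series to be meaningful.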
454 Views
While doing data exploration in an analytical project, we sometimes need to find the index of some values, mostly the indices of the minimum and maximum values, to check whether the corresponding data row has some crucial information or whether we may neglect it. Also, these values are sometimes transformed to other values based on the data characteristics if we don't want to neglect them.
Example
> x which(x==min(x))
[1] 1
> which(x==max(x))
[1] 25
> set.seed(2)
> x1 x1
[1] 85 79 70 6 32 8 17 93 81 76 41 50 75 65 3 80 96 50 55
[20] 63 8 33 ...
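To make the index lookup concrete, here is a sketch on a hypothetical vector with repeated extremes (the values are assumptions). which(x == min(x)) returns every position that holds the minimum, whereas which.min returns only the first.

```r
# Hypothetical vector in which the minimum and the maximum both occur twice.
x <- c(5, 2, 9, 2, 9)

# All positions of the minimum and of the maximum.
min_idx <- which(x == min(x))
max_idx <- which(x == max(x))

# Only the first position of the minimum.
first_min <- which.min(x)

print(min_idx)
print(max_idx)
print(first_min)
```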
429 Views
In data analysis, time series are among the most common data we have to deal with, and they might also contain dates along with other variables. We might want to find the difference between two times to check how many days or weeks have passed between them. This can be easily done with the help of the difftime function.
Example
> difftime(strptime("25/07/2021", format = "%d/%m/%Y"),
+ strptime("25/07/2020", format = "%d/%m/%Y"), units="weeks")
Time difference of 52.14286 weeks
> difftime(strptime("25.07.2021", format = "%d.%m.%Y"),
+ strptime("25.07.2020", format = "%d.%m.%Y"), units="weeks")
Time difference of 52.14286 weeks
> difftime(strptime("25.07.2021", format = "%d.%m.%Y"),
+ strptime("25.07.2020", format = ...
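The same dates can be compared in more than one unit, as in this sketch. Setting tz = "UTC" is an assumption added here so that daylight-saving changes cannot affect the result; the variable names are illustrative.

```r
# Parse both dates with an explicit format and a fixed time zone.
start <- strptime("25/07/2020", format = "%d/%m/%Y", tz = "UTC")
end <- strptime("25/07/2021", format = "%d/%m/%Y", tz = "UTC")

# The same interval expressed in weeks and in days.
diff_weeks <- difftime(end, start, units = "weeks")
diff_days <- difftime(end, start, units = "days")

print(diff_weeks)
print(diff_days)
```

as.numeric(diff_weeks) extracts the plain number when the value is needed for further arithmetic.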