Asymmetric Hashing in Data Structure

Arnab Chakraborty
Updated on 11-Aug-2020 06:18:11

304 Views

This section introduces the asymmetric hashing technique. In this technique, the hash table is split into d blocks, each of length n/d. The probe value xi, 0 ≤ i ≤ d – 1, is drawn uniformly from $$\lbrace\frac{i*n}{d}, ..., \frac{(i+1)*n}{d} - 1\rbrace$$. As with multiple-choice hashing, to insert x the algorithm checks the lengths of the lists A[x0], A[x1], ..., A[xd – 1] and appends x to the shortest of them. If there is a tie, it inserts x into the list with the smallest index. According to Vöcking, the expected length of ...
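
The insertion rule described above can be sketched in R as follows. This is a minimal illustration, not a reference implementation: the helper names `make_table`, `draw_probes` and `insert` are hypothetical, the table size n is assumed to be divisible by d, and random draws stand in for the d hash functions a real table would use.

```r
# Asymmetric multiple-choice hashing: a minimal sketch.
# n = table size, d = number of blocks (assumption: d divides n).
make_table <- function(n, d) {
  stopifnot(n %% d == 0)
  list(n = n, d = d, slots = vector("list", n))
}

# One probe per block: probe i (0-based) lies in
# {i*n/d, ..., (i+1)*n/d - 1}. Random draws replace real hash functions.
draw_probes <- function(tbl) {
  block <- tbl$n / tbl$d
  (0:(tbl$d - 1)) * block + sample.int(block, tbl$d, replace = TRUE) - 1
}

# Append x to the shortest probed list; ties go to the smallest index.
insert <- function(tbl, x, probes = draw_probes(tbl)) {
  lens <- lengths(tbl$slots[probes + 1])  # +1 because R is 1-based
  target <- probes[which.min(lens)] + 1   # which.min picks the first minimum
  tbl$slots[[target]] <- c(tbl$slots[[target]], x)
  tbl
}

tbl <- make_table(8, 2)
tbl <- insert(tbl, "a", probes = c(1, 5))  # both lists empty: tie, goes left
tbl <- insert(tbl, "b", probes = c(1, 5))  # list 1 is longer, goes to list 5
```

Because the probes are ordered by block, picking the first minimum is what breaks ties toward the smallest index.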

Split Big Data Frame into Smaller Ones in R

Nizamuddin Siddiqui
Updated on 10-Aug-2020 15:38:43

1K+ Views

Working with a big data frame is not easy, so we might want to split it into smaller data frames. These smaller data frames can be extracted from the big one based on some criterion, such as the levels of a factor variable or some other condition. This can be done with the split function. Example: consider a data frame df with columns Grades, Age and Category ...
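
As a rough sketch of the approach described above, the snippet below splits a small data frame by a grouping column and by a logical condition. The column names follow the teaser; the values are made up for illustration.

```r
# Hypothetical data; only the column names come from the teaser.
df <- data.frame(
  Grades   = rep(c("A", "B", "C"), times = 4),
  Age      = c(25, 19, 31, 22, 28, 24, 35, 21, 27, 30, 18, 26),
  Category = rep(1:4, each = 3)
)

# One data frame per level of Grades, returned as a named list.
by_grade <- split(df, df$Grades)
names(by_grade)

# Splitting on a condition yields the groups "FALSE" and "TRUE".
by_age <- split(df, df$Age > 24)
by_age[["TRUE"]]
```

Each element of the result is an ordinary data frame, so `by_grade$A` can be used like any other.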

Create Polynomial Model in R

Nizamuddin Siddiqui
Updated on 10-Aug-2020 15:25:29

293 Views

Most of the time, the explanatory variables are not linearly related to the response variable, and we need to find the best model for our data. In such situations, we turn to polynomial models to check whether they improve the accuracy of the predictions. This can be done by using powers of the independent variables in the lm function. Example: consider a data frame df with columns x1, x2, x3, x4 and y, and a model PolynomialModel1 fitted with the formula y ~ x1 + I(x1^2) + x2 + x3 + x4 ...
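
A minimal sketch of such a fit is shown below. The formula and the model name come from the teaser; the simulated data and its coefficients are assumptions made only so the example runs.

```r
set.seed(99)
n <- 50

# Simulated predictors (hypothetical values).
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))

# Response with a quadratic dependence on x1 (assumed for illustration).
df$y <- 1 + 2 * df$x1 + 0.5 * df$x1^2 + rnorm(n)

# I() protects x1^2 so that ^ means arithmetic power inside the formula.
PolynomialModel1 <- lm(y ~ x1 + I(x1^2) + x2 + x3 + x4, data = df)
summary(PolynomialModel1)
```

Without `I()`, `x1^2` would be read as formula syntax rather than as the squared variable.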

Add Column in R Data Frame

Nizamuddin Siddiqui
Updated on 10-Aug-2020 15:17:08

233 Views

People might forget to add all the columns needed for an analysis, but this problem is easy to solve. If we discover later that a column is missing from our data frame, it can be added easily and moved into place by reordering the columns. Example: consider a data frame df with columns x1, x2 and x3 ...
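
One way this might look is sketched below. The data frame mirrors the values visible in the teaser's output; the new column `x4` and its contents are hypothetical.

```r
# Data frame matching the printed output in the teaser.
df <- data.frame(x1 = 1:5, x2 = letters[1:5], x3 = c(1, 2, 1, 2, 1))

# Add the missing column (hypothetical values); it lands at the end.
df$x4 <- df$x1 * 10

# Reorder so the new column sits right after x1.
df <- df[, c("x1", "x4", "x2", "x3")]
df
```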

Delete a Row from an R Data Frame

Nizamuddin Siddiqui
Updated on 10-Aug-2020 15:06:44

550 Views

While doing an analysis, we might come across data that is not required and want to delete it. This data can be a single row or multiple rows. For example, if a row contains values above, below or equal to a certain threshold, it might not be needed and we can delete it. In R, we achieve this by subsetting with single square brackets. Example: consider a data frame df with columns x1, x2, x3, x4 and x5 ...
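
The subsetting approach described above can be sketched as follows. The data and the threshold of 10 are made up for illustration.

```r
# Hypothetical data frame.
df <- data.frame(x1 = 1:6, x2 = c(12, 7, 3, 9, 15, 1), x3 = letters[1:6])

# Negative indices drop rows by position.
no_second      <- df[-2, ]
no_first_third <- df[-c(1, 3), ]

# A logical condition keeps only the rows we want,
# which deletes every row where x2 exceeds the threshold.
kept <- df[df$x2 <= 10, ]
kept
```

None of these calls modifies `df` itself; assign the result back to `df` to make the deletion permanent.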

Replace Missing Values in R with NA or Any Other Value

Nizamuddin Siddiqui
Updated on 10-Aug-2020 14:49:40

1K+ Views

Sometimes when we read data into R, the missing values are recorded as blank spaces, and it is difficult to replace them with any value. The reason is that we need to know how many spaces were used in place of each missing value. Once we know that, assigning any value becomes easy. Example: consider a data frame df of vectors x and y ...
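
A small sketch of the idea, assuming the blanks are stored as exactly one space in character columns. The data values and the replacement value "0" are hypothetical.

```r
# Hypothetical data: a single space marks a missing value.
df <- data.frame(
  x = c("1", " ", "3", " ", "2"),
  y = c("2", "3", " ", "1", "43"),
  stringsAsFactors = FALSE
)

# Replace every single-space entry with NA.
df[df == " "] <- NA

# Or substitute any other value for the missing entries.
filled <- df
filled[is.na(filled)] <- "0"

# If the number of spaces varies, trimming first avoids counting them.
z <- c("5", "   ", "7", " ")
z[trimws(z) == ""] <- NA
```

The exact-match comparison is why the number of spaces matters; `trimws()` is one way around that.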

Find Correlation Matrix in R Using All Variables of a Data Frame

Nizamuddin Siddiqui
Updated on 10-Aug-2020 14:42:15

811 Views

A correlation matrix helps us determine the direction and strength of the linear relationship among multiple variables at once. It therefore becomes easier to decide which variables should be used in a linear model and which could be dropped. We can find the correlation matrix by simply calling the cor function on the data frame. Example: consider a data frame df of continuous variables x1, x2, x3, x4 and x5 ...
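
A minimal sketch of the call described above, using simulated continuous variables (the data is an assumption; only the use of cor on a whole data frame comes from the teaser).

```r
set.seed(9)

# Hypothetical continuous variables.
df <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = runif(20))

# Pairwise Pearson correlations between all columns.
cm <- cor(df)
round(cm, 2)
```

All columns must be numeric; non-numeric columns need to be dropped or converted before calling `cor`.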

Change the Order of Columns in an R Data Frame

Nizamuddin Siddiqui
Updated on 10-Aug-2020 14:32:20

741 Views

Reordering columns might be required when we want to manipulate the data. Manipulation can be done for several reasons, such as cross-verification, visualisation, etc. We should also be careful when changing anything in the original data, because that might affect our processing. To change the order of columns we can use single square brackets. Example: consider a data frame df with columns Class, Grade and Score ...
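
The bracket-based reordering can be sketched like this. The three rows mirror the first rows printed in the teaser; the target order is chosen arbitrarily for illustration.

```r
# First three rows as shown in the teaser's output.
df <- data.frame(
  Class = c("a", "b", "c"),
  Grade = c("A", "B", "C"),
  Score = c(68, 39, 1)
)

# Reorder by column name ...
by_name <- df[, c("Score", "Class", "Grade")]

# ... or by column position; both give the same result.
by_pos <- df[, c(3, 1, 2)]
by_name
```

Writing the result to a new object, as here, leaves the original data untouched.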

Create Bar Chart Using ggplot2 with Sub Title in R

Nizamuddin Siddiqui
Updated on 10-Aug-2020 14:21:44

196 Views

There are different ways to present a chart. The more information a chart provides, the better, because a picture is worth a thousand words. Since nobody likes to read long reports, our charts should communicate well. We can therefore add a chart title as well as a subtitle in ggplot2 to help readers. Example: consider a vector x with values between 2 and 11, stored in a data frame df and drawn as a simple bar chart with ggplot(df, aes(x)) + geom_bar() ...
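
A sketch of adding both a title and a subtitle is given below, assuming the ggplot2 package is installed. The vector x is rebuilt from the frequency table shown in the teaser; the title and subtitle strings are placeholders.

```r
library(ggplot2)

# Rebuild x from the table(x) output in the teaser (20 observations).
x <- rep(c(2, 3, 4, 5, 6, 7, 8, 9, 11),
         times = c(1, 3, 4, 2, 4, 2, 2, 1, 1))
df <- data.frame(x = x)

# labs() sets the title and the subtitle in one call (placeholder text).
p <- ggplot(df, aes(x)) +
  geom_bar() +
  labs(title = "Frequency of x", subtitle = "Counts per observed value")

# Typing p or calling print(p) at the console renders the chart.
```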

Create Data Frame in R with Repeated Rows

Nizamuddin Siddiqui
Updated on 10-Aug-2020 14:18:16

1K+ Views

There are times when duplicated rows in a data frame are required, mainly to extend the data size instead of collecting more raw data. This saves time but introduces bias, so it is not recommended. Sometimes it is nevertheless necessary, for example when it is impossible to collect raw data. If we do so, we must state it in our analysis report. In R, we can use the rep function with seq_len and nrow to create a data frame with repeated rows. Example ...
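
The row-repetition idiom built from rep, seq_len and base R's nrow can be sketched as follows. The data frame and the repeat counts are hypothetical.

```r
# Hypothetical data frame.
df <- data.frame(id = 1:3, label = c("a", "b", "c"))

# Repeat the whole block of rows three times: 1,2,3,1,2,3,1,2,3.
whole <- df[rep(seq_len(nrow(df)), times = 3), ]

# Repeat each row twice in place: 1,1,2,2,3,3.
each_row <- df[rep(seq_len(nrow(df)), each = 2), ]

# Repeated rows get suffixed row names; reset them if that matters.
rownames(whole) <- NULL
whole
```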
