Spearman's Rank Correlation


Correlation is a statistical approach for determining the degree to which two variables are related. Spearman's rank correlation coefficient, usually known as Spearman's rho, is a non-parametric correlation measure that assesses the strength and direction of the monotonic relationship between two variables, that is, how consistently one variable increases or decreases as the other increases. It is named after its inventor, Charles Spearman, who introduced it in 1904. For example, Spearman's rank coefficient can be used to check whether people's rankings by age tend to agree with their rankings on some other characteristic. There are two kinds of correlation:

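  • Parametric Correlation: A parametric correlation test, such as Pearson's correlation coefficient, assesses the linear dependency between two variables (x and y) and relies on assumptions about the data distribution, typically that the data are approximately normal.

  • Non-Parametric Correlation: Non-parametric correlation coefficients, such as Spearman's rho and Kendall's tau, are rank-based and make no assumption about the distribution of the data.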
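To make the difference concrete, here is a small plain-Python sketch (the helper names `pearson` and `simple_ranks` are our own, not from a library). It uses hypothetical data where y = x³, a relationship that is perfectly monotonic but not linear. It relies on the fact that Spearman's rho equals Pearson's correlation computed on the ranks, so the parametric coefficient falls below 1 while the rank-based one is 1.

```python
import math


def pearson(x, y):
    """Pearson's r: measures linear dependence (parametric)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def simple_ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


# Hypothetical data: y grows monotonically, but not linearly, with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # 1, 8, 27, 64, 125

r_pearson = pearson(x, y)
# Spearman's rho is Pearson's r computed on the ranks.
r_spearman = pearson(simple_ranks(x), simple_ranks(y))

print(round(r_pearson, 3))   # 0.943: the points do not lie on a straight line
print(round(r_spearman, 3))  # 1.0: the relationship is perfectly monotonic
```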

Spearman Correlation formula

$\mathrm{r_{s}=1-\frac{6\sum d_i^{2}}{n(n^{2}-1)}}$

$\mathrm{r_{s}}$=Spearman Correlation coefficient

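$\mathrm{\sum d_i^{2}}$ = The sum of the squared differences between the ranks of the two variables, where $\mathrm{d_i}$ is the difference between the two ranks of the i-th observation

n = The number of observations

This form of the formula assumes that there are no tied ranks.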
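The formula translates directly into a small Python function. This is a minimal sketch that assumes the ranks have already been assigned and contain no ties; `spearman_from_ranks` is a hypothetical name used only for illustration.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Apply r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)).

    Assumes rank_x and rank_y are equal-length lists of distinct
    ranks 1..n (no ties); with ties this shortcut is only approximate.
    """
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rank lists of the same length, n >= 2")
    # d_i is the difference between the two ranks of observation i.
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


print(spearman_from_ranks([1, 2, 3], [3, 2, 1]))  # -1.0
```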

Algorithm

Spearman's rank correlation coefficient calculation algorithm:

  • Given n observations and two variables X and Y.

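  • Separately rank the X and Y values. Assign ranks based on the order of the values, with the lowest value being assigned a rank of 1 and the greatest value being assigned a rank of n. If two or more values are tied, each receives the average of the ranks they would otherwise occupy.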

  • Calculate the differences (d) between the ranks of X and Y for each observation.

  • Square each difference (d) to obtain $\mathrm{d^{2}}$.

  • Compute the sum of the squared differences, $\mathrm{\sum d^{2}}$.

  • Using the following formula, compute the Spearman's rank correlation coefficient (rs): $\mathrm{r_{s}=1-(6*\sum d^{2})/(n*(n^{2}-1))}$

  • The resulting rs value shows the strength and direction of the monotonic relationship between X and Y. A value of 1 indicates a perfect positive correlation, a value of -1 indicates a perfect negative correlation, and a value of 0 indicates no monotonic correlation.
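The algorithm steps above can be sketched in plain Python as follows. The function names `rank` and `spearman` are our own (hypothetical), not a library API. The ranking step gives tied values the average of the ranks they span, which is the usual convention; keep in mind that the shortcut formula in the last step is exact only when there are no ties (with ties, the exact value is Pearson's correlation of the averaged ranks). For real work, SciPy's `scipy.stats.spearmanr` provides a library implementation.

```python
def rank(values):
    """Step 2: rank values from 1 (smallest) to n (largest).

    Tied values get the average of the positions they occupy
    (e.g. two values tied for 2nd and 3rd place both get 2.5).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j are ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho following the algorithm steps (shortcut formula)."""
    n = len(x)                                   # step 1: n paired observations
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, n >= 2")
    rx, ry = rank(x), rank(y)                    # step 2: rank X and Y separately
    d = [a - b for a, b in zip(rx, ry)]          # step 3: rank differences
    d2 = [v * v for v in d]                      # step 4: square each difference
    sum_d2 = sum(d2)                             # step 5: sum of squared differences
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))   # step 6: apply the formula


print(spearman([10, 20, 30, 40], [1.5, 2.0, 9.0, 100.0]))  # 1.0
```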

Example 1

We now understand what the correlation coefficient is. Let's look at an example to see how to compute Spearman's rank correlation coefficient. Assume we have the following data:

X 1 2 3 4 5
Y 3 5 4 1 2

First, we need to rank the values of X and Y:

X 1 2 3 4 5
RankX 1 2 3 4 5
Y 3 5 4 1 2
RankY 3 5 4 1 2

Next, let's calculate the differences d = RankX - RankY between the ranks of X and Y, and their squares $\mathrm{d^{2}}$:

d -2 -3 -1 3 3
$\mathrm{d^{2}}$ 4 9 1 9 9

Hence, $\mathrm{\sum d^{2}=4+9+1+9+9=32}$

Now, we can use this value in the formula above, with n = 5:

$\mathrm{r_{s}=1-(6*\sum d^{2})/(n*(n^{2}-1))}$

$\mathrm{r_{s}=1-(6*32)/(5*(5^{2}-1))}$

$\mathrm{r_{s}=1-192/120=1-1.6}$

$\mathrm{r_{s}=-0.6}$

Output

As a result, for the given data, Spearman's rank correlation coefficient is -0.6, indicating a moderate negative correlation between X and Y: as X increases, Y tends to decrease.
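The following plain-Python sketch recomputes each step for the Example 1 data, which is a handy way to check the hand calculation. The helper `ranks_no_ties` is a hypothetical name and assumes all values are distinct, as they are here.

```python
def ranks_no_ties(values):
    """Rank 1 = smallest value; assumes all values are distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


X = [1, 2, 3, 4, 5]
Y = [3, 5, 4, 1, 2]

rank_x, rank_y = ranks_no_ties(X), ranks_no_ties(Y)
d = [a - b for a, b in zip(rank_x, rank_y)]  # d = RankX - RankY
d2 = [v ** 2 for v in d]
n = len(X)
rs = 1 - 6 * sum(d2) / (n * (n ** 2 - 1))

print(d)             # [-2, -3, -1, 3, 3]
print(d2)            # [4, 9, 1, 9, 9]
print(sum(d2))       # 32
print(round(rs, 3))  # -0.6
```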

Example 2

Let us take another example to understand how to calculate the Spearman's rank correlation coefficient. Suppose we have the following data:

X 1 2 3 4 5
Y 1 2 3 4 5

First, we need to rank the values of X and Y:

X 1 2 3 4 5
RankX 1 2 3 4 5
Y 1 2 3 4 5
RankY 1 2 3 4 5

Next, let's calculate the differences and $\mathrm{d^{2}}$ between the ranks of X and Y:

d 0 0 0 0 0
$\mathrm{d^{2}}$ 0 0 0 0 0

Hence, $\mathrm{\sum d^{2}=0}$

Now, we can use this value in the formula above, with n = 5:

$\mathrm{r_{s}=1-(6*\sum d^{2})/(n*(n^{2}-1))}$

$\mathrm{r_{s}=1-(6*0)/(5*(5^{2}-1))}$

$\mathrm{r_{s}=1-0}$

$\mathrm{r_{s}=1}$

Output

As a result, for the given data, Spearman's rank correlation coefficient is 1, indicating a perfect positive correlation between X and Y.
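A quick plain-Python check of Example 2 follows. As an extra illustration (not part of the original example), it also reverses Y, which should give the opposite extreme of -1. The function name `spearman_no_ties` is hypothetical and, like the formula, assumes there are no tied values.

```python
def spearman_no_ties(x, y):
    """Spearman's rho via the shortcut formula; assumes no tied values."""
    def ranks(values):
        # Rank 1 = smallest value; valid only when all values are distinct.
        ordered = sorted(values)
        return [ordered.index(v) + 1 for v in values]

    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


X = [1, 2, 3, 4, 5]
Y = [1, 2, 3, 4, 5]
print(spearman_no_ties(X, Y))        # 1.0  (Example 2: identical ranks)
print(spearman_no_ties(X, Y[::-1]))  # -1.0 (same values in reverse order)
```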

Advantages

  • Spearman's rank correlation coefficient is a non-parametric measure of correlation that makes no assumptions about the distribution of the variables.

  • It can handle both normal and non-normal data, making it useful for data that do not meet the assumptions of Pearson's correlation coefficient.

  • It is easy to compute, understand, and learn.

  • It is well suited to qualitative observations that can be ranked but not measured precisely, such as judgments of people's intelligence, physical appearance, and so on.

  • It is appropriate when the data only provide an order of preference rather than the actual values of the variable.
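As an illustration of ranked, qualitative data, the sketch below uses hypothetical rankings from two judges of the same six contestants; only the order of preference is known, not any underlying score. The rankings are invented for the example.

```python
# Hypothetical ranks given by two judges to the same six contestants
# (1 = best). Only the order of preference is available.
judge_a = [1, 2, 3, 4, 5, 6]
judge_b = [2, 1, 4, 3, 6, 5]

n = len(judge_a)
# The ranks are already distinct, so the shortcut formula applies directly.
sum_d2 = sum((a - b) ** 2 for a, b in zip(judge_a, judge_b))
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d2, round(rs, 3))  # 6 0.829
```

A value close to 1 here means the two judges largely agree on the order, even though they swap several neighbouring contestants.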

Disadvantages

  • When the relationship is linear and Pearson's assumptions hold, Spearman's rank correlation coefficient is less powerful than Pearson's correlation coefficient at detecting and analysing it.

  • Because it uses only ranks, it discards the actual magnitudes of the values. This makes it less sensitive to outliers than Pearson's coefficient, but it also means that quite different data sets can produce the same coefficient.

  • The simple formula $\mathrm{r_{s}=1-\frac{6\sum d_i^{2}}{n(n^{2}-1)}}$ is exact only when there are no tied ranks; with ties, the coefficient should be computed as Pearson's correlation of the averaged ranks.

Conclusion

We have discussed Spearman's rank correlation and how it can be used to measure the strength and direction of the association between two variables.

We also discussed the types of correlation:

  • Parametric Correlation

  • Non-Parametric Correlation

It is simple to compute, non-parametric, and appropriate for non-normal and ranked data. However, it ignores the magnitude of the values, needs adjustment when ranks are tied, and is less powerful than Pearson's correlation coefficient at detecting linear relationships when Pearson's assumptions hold. As a result, researchers must carefully consider the nature of their data before deciding on a correlation coefficient.

Updated on: 19-Jul-2023
