Correlation

correlation is a much abused word/term

correlation is a term which implies that there is an association between the paired values of 2 variables, where association means that the fluctuations in the values of each variable are sufficiently regular to make it unlikely that the association has arisen by chance

assumes: independent random samples are taken from a distribution in which the 2 variables are jointly normally distributed (a bivariate normal distribution)

example 1:

variable A (income of family) (1000s of Swiss francs)

variable B (# of autos owned)

 

paired values

A    3    6    9    12    15
B    1    2    3     4     5

here there is a perfect positive correlation: as A increases, B increases in exactly the same proportion (B = A/3), so all the points lie on a straight line with a positive slope

example 2:

variable A (income of family) (1000s of Zambian pounds)

variable B (# of children)

 

 

paired values

A    3    6    9    12    15
B    5    4    3     2     1

here there is a perfect negative correlation: as A increases, B decreases by exactly the same amount at each step (B = 6 - A/3), so all the points lie on a straight line with a negative slope

example 3:

variable A (income of family)

variable B (last number of postal code)

 

paired values

A    3    6    9    12    15
B    4    1    3     5     2

here there is no correlation because B does not change systematically with A; any apparent association is due to chance, since the last digit of a postal code is unrelated to family income

correlation is a method whereby a coefficient is calculated to describe the degree of association between sets of paired values, and then tested to determine the probability that the association might be due to chance variation

i.e. a test can show that there is only a 5% chance or less of an association this strong arising by chance alone, but this does not mean that one variable is causing the fluctuations in the other

no causal link can be deduced from a correlation alone; establishing causation requires other evidence and good judgement

in the above examples

example 1 - correlation coefficient =1

example 2 - correlation coefficient =-1

example 3 - correlation coefficient =0
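As a check on the coefficients quoted above, here is a minimal Python sketch that computes Pearson's r directly from its definition: the sum of cross-products of deviations from the means, divided by the square root of the product of the sums of squared deviations. The function name `pearson_r` is hypothetical, chosen for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


income = [3, 6, 9, 12, 15]  # variable A in all three examples
examples = {
    "example 1 (autos)": [1, 2, 3, 4, 5],
    "example 2 (children)": [5, 4, 3, 2, 1],
    "example 3 (postal code digit)": [4, 1, 3, 5, 2],
}

for name, b in examples.items():
    print(name, round(pearson_r(income, b), 3))
# example 1 (autos) 1.0
# example 2 (children) -1.0
# example 3 (postal code digit) 0.0
```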

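the correlation coefficient for the parametric case is called the Pearson product-moment correlation coefficient (r)

it is powerful, but the data have to satisfy the parametric ('normal') conditions: the relationship should be linear and the 2 variables jointly normally distributed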

calculation

x, y are the paired values of the 2 variables

Sx, Sy are the sample standard deviations of x and y
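A sketch of one standard form of the formula, written with the symbols above and matching the columns of the table below (sums of x, y, x², y² and xy). In this form S_x and S_y are computed with divisor n; using n - 1 consistently in both the numerator (covariance) and the standard deviations gives the same r.

```latex
r = \frac{\dfrac{\sum xy}{n} - \bar{x}\,\bar{y}}{S_x\,S_y},
\qquad
S_x = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2},
\qquad
S_y = \sqrt{\frac{\sum y^2}{n} - \bar{y}^2}
```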

data should be set up in a table to facilitate calculations

 

 

X = total proteins consumed
Y = log of income per capita

country       X       Y        X²       Y²       XY
Argentina     98    2.715     9604     7.37    266.1
Brazil        61    2.401     3721     5.76    146.5
Denmark       92    3.289     8464    10.82    302.6
Spain         71    2.849     5041     8.12    202.3
Turkey        73    2.476     5329     6.13    180.7
UK            86    3.193     7396    10.20    274.6
US            92    3.519     8464    12.38    323.7
Σ (total)    573   20.44     48019    60.78   1696.5

n = 7

x̄ = 81.9,  ȳ = 2.92,  x̄² = 6707.6,  ȳ² = 8.53,  x̄ȳ = 239.15
(here x̄² and ȳ² are the squares of the means, not the means of the squares)
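A minimal Python sketch of the same calculation for the table data, carrying full precision instead of the rounded means; the variable names are illustrative and not part of the notes. It uses the divisor-n form of the standard deviations, which matches the table's columns.

```python
from math import sqrt

# (total proteins consumed, log of income per capita) for each country
data = {
    "Argentina": (98, 2.715),
    "Brazil": (61, 2.401),
    "Denmark": (92, 3.289),
    "Spain": (71, 2.849),
    "Turkey": (73, 2.476),
    "UK": (86, 3.193),
    "US": (92, 3.519),
}

xs = [x for x, _ in data.values()]
ys = [y for _, y in data.values()]
n = len(xs)

# column totals, as in the Σ row of the table
sum_x, sum_y = sum(xs), sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

mean_x, mean_y = sum_x / n, sum_y / n
# standard deviations with divisor n
s_x = sqrt(sum_x2 / n - mean_x ** 2)
s_y = sqrt(sum_y2 / n - mean_y ** 2)

r = (sum_xy / n - mean_x * mean_y) / (s_x * s_y)
print(f"n = {n}, r = {r:.3f}")  # prints: n = 7, r = 0.667
```

Rounding the means before squaring them, as in the hand-worked table, can shift r slightly, so it is worth keeping extra digits until the final step.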

 

testing the significance of r

H0: r is not significantly different from 0 (the population correlation ρ = 0)

H1: r is significantly different from 0 (ρ ≠ 0)

 

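example (protein consumption vs log of income per capita, n = 7)

df = n - 2 = 7 - 2 = 5

t critical (α = 0.05, two-tailed, df = 5) = 2.571; the calculated t is smaller than this, so we must accept the null hypothesis (r is not significantly different from 0)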

 

Correlation Coefficient Rule of Thumb

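size of coefficient (|r|)     general interpretation
0.8 to 1.0                    very strong relationship
0.6 to 0.8                    strong relationship
0.4 to 0.6                    moderate relationship
0.2 to 0.4                    weak relationship
0.0 to 0.2                    very weak or no relationship

(the size refers to the absolute value of r, so a coefficient of -0.7 indicates a relationship as strong as +0.7)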
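A small sketch of the rule of thumb as a lookup on |r|. The function name `interpret_r` is hypothetical, and since the bands in the table share their endpoints, it assumes that a value exactly on a boundary (e.g. 0.6) belongs to the higher band.

```python
def interpret_r(r: float) -> str:
    """Map a correlation coefficient to the rule-of-thumb label.

    Uses |r|, so negative coefficients are treated like positive ones.
    Assumption: a value on a band boundary goes to the higher band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)
    if size >= 0.8:
        return "very strong relationship"
    if size >= 0.6:
        return "strong relationship"
    if size >= 0.4:
        return "moderate relationship"
    if size >= 0.2:
        return "weak relationship"
    return "very weak or no relationship"


for value in (1.0, -1.0, 0.0, 0.667):
    print(value, interpret_r(value))
```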