Kruskal-Wallis test

a test for more than 2 samples: tests whether k samples are drawn from different populations

K-W is the nonparametric equivalent of the parametric F-test (1-way ANOVA)

no need to meet all the assumptions of parametric test

(it is a distribution-free ANOVA by ranks)

simple, good for small samples, powerful

K-W requires data that are at least ordinal (ranked)

example 1: for up to 3 samples, 5 values each

if there are more than 5 values per sample, H is distributed as χ² with df = k - 1

calculate the H statistic; for small samples, tables give an exact probability that the samples are from the same population

samples from the population of Canadians: which of the following list of 15 cities would you prefer to live in? Separate the results into western, central, and maritime cities

[note: the rankings are of the overall sample, not rankings within the samples]

 

preference rankings                  rj (sum of ranks     nj (number in
                                     in jth sample)       jth sample)

maritime    4, 8, 10, 14, 15         51                   5

western     3, 6, 7, 11, 13          40                   5

central     1, 2, 5, 9, 12           29                   5

 

STEP 1: set up hypothesis

H0: no difference in residence preferences between the 3 groups of Canadian cities;

observed differences are due to chance variations in response

H1: residence preferences are significantly related to location, preference differences are so great that they are unlikely to have arisen by chance

level of rejection at α = 0.05 (95% confidence that differences are not by chance)

step 2: calculate H

define: j = sample, j = 1...k

k = number of samples

rj = sum of ranks in the jth sample

nj = number of ranks in the jth sample

n = Σnj = total number of individuals in all samples

H = [12 / (n(n+1))] Σ (rj² / nj) - 3(n+1)

for this example: H = (12 / 240)(51²/5 + 40²/5 + 29²/5) - 3(16) = 50.42 - 48 = 2.42

if there are tied ranks, assign each tied value the mean of the ranks it would otherwise have received
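Step 2 can be sketched in Python; the ranks come from the table above, and the dictionary layout is just one convenient encoding:

```python
# Kruskal-Wallis H for the city-preference example:
# ranks over the pooled sample of n = 15 cities
ranks = {
    "maritime": [4, 8, 10, 14, 15],
    "western":  [3, 6, 7, 11, 13],
    "central":  [1, 2, 5, 9, 12],
}

n = sum(len(r) for r in ranks.values())   # total individuals, Σnj = 15

# H = [12 / (n(n+1))] Σ (rj² / nj) - 3(n+1)
h = 12 / (n * (n + 1)) * sum(
    sum(r) ** 2 / len(r) for r in ranks.values()
) - 3 * (n + 1)

print(round(h, 2))  # 2.42
```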

step 3: look up critical value

(Pg 281 in text)

critical H for n1 = n2 = n3 = 5, k = 3, α = 0.05 is 5.78

computed H must be ≥ the critical value to reject H0

since 2.42 < 5.78 we cannot reject H0

large samples

for cases where all sample sizes are > 5, the sampling distribution of H is similar to that of χ² with df = k - 1, where k is the number of samples
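For the example above, k = 3 gives df = 2, and the χ² survival function with 2 degrees of freedom happens to reduce to exp(-H/2), so the large-sample p-value can be sketched without a table (in general you would look up χ² or use a library routine):

```python
import math

h = 2.42   # H from the city-preference example
# k = 3 samples, so df = k - 1 = 2;
# for df = 2, P(chi2 > x) is exactly exp(-x / 2)
p_value = math.exp(-h / 2)

print(round(p_value, 3))  # 0.298, well above 0.05, so H0 stands
```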

correction for tied rankings

Ti = ti³ - ti, where ti is the number of observations (individuals) in the ith group of tied ranks, and m is the number of such tied groups

the corrected statistic divides H by 1 - (ΣTi) / (n³ - n), summing Ti over the m tied groups

the effect of the correction is quite small

the effect is to make the value of H larger and so increase the chance of rejecting the null hypothesis

so if the correction is ignored you are erring on the side of caution

unless more than 1/4 of the values in the data set produce tied ranks, the effect of the correction is negligible
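The tie correction can be sketched as follows, assuming the raw pooled ranks are available; the function name tie_correction and the example rank list are illustrative, not from the notes:

```python
from collections import Counter

def tie_correction(pooled_ranks):
    """Divisor C = 1 - Σ(ti³ - ti) / (n³ - n) over groups of tied ranks.
    The corrected statistic is H / C, which slightly enlarges H."""
    n = len(pooled_ranks)
    counts = Counter(pooled_ranks)                 # ti = size of each tied group
    t = sum(c**3 - c for c in counts.values() if c > 1)
    return 1 - t / (n**3 - n)

# Hypothetical data: two observations share rank 4.5 (one tied pair)
print(tie_correction([1, 2, 3, 4.5, 4.5, 6]))  # 1 - 6/210 ≈ 0.9714
```

With no ties the divisor is exactly 1, and with few ties it stays close to 1, which matches the note that the correction is usually negligible.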