Kruskal-Wallis test

more-than-2-sample test: tests whether k samples are drawn from the same population
K-W is the nonparametric equivalent of the parametric F-test (one-way ANOVA); no need to meet all the assumptions of the parametric test (it is a distribution-free ANOVA by ranks)
simple, good for small samples, powerful
K-W requires at least strong ordinal (ranked) data

example 1: for up to 3 samples with up to 5 values each, calculate the H statistic to get an exact probability that the samples are from the same population; if there are more than 5 values per sample, H is distributed as χ² with df = k − 1

samples from the population of Canadians: "which of the following list of 15 cities would you prefer to live in?" separate the results into western, central and maritime cities [note: the rankings are of the overall sample, not the rankings within the samples]
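The rank-based H computation can be sketched in pure Python as follows. This is a minimal illustration, not the textbook's worked example: the ranking is done over the pooled data (ties get the mean of the ranks they span), then H = [12 / (n(n+1))] Σ (rj²/nj) − 3(n+1). The sample data in the test are hypothetical.

```python
def rank_with_ties(values):
    """Rank pooled values; tied values receive the mean of the ranks
    they would otherwise have received."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # extend j over any run of equal values (a group of ties)
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(*samples):
    """H = 12/(n(n+1)) * sum(rj^2 / nj) - 3(n+1), no tie correction."""
    pooled = [x for s in samples for x in s]
    n = len(pooled)
    ranks = rank_with_ties(pooled)
    total = 0.0
    start = 0
    for s in samples:
        nj = len(s)                          # number of ranks in jth sample
        rj = sum(ranks[start:start + nj])    # sum of ranks in jth sample
        total += rj * rj / nj
        start += nj
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
```

For three perfectly separated samples of 3 values each (ranks 1-3, 4-6, 7-9), H works out to 7.2; for three identical samples, H is 0, reflecting no difference between groups.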
STEP 1: set up hypotheses
H0: no difference in terms of residence preferences between the 3 groups of Canadian cities; observed differences are due to chance variations in response
H1: residence preferences are significantly related to location; the preference differences are so great that they are unlikely to have arisen by chance
level of rejection at α = 0.05 (95% confidence the result is not by chance)

STEP 2: calculate H
define: j = sample, j = 1...k; k = number of samples; rj = sum of ranks in the jth sample; nj = number of ranks in the jth sample; n = Σnj = total number of individuals across the samples
H = [12 / (n(n+1))] Σ (rj² / nj) − 3(n+1)
if there are tied ranks, use the mean of the ranks the tied values would have otherwise received

STEP 3: look up the critical value (Pg 281 in text)
critical H for n1 = 5, n2 = 5, n3 = 5, k = 3, α = 0.05 is 5.78
the computed H must be ≥ the critical value to reject H0; since 2.42 < 5.78 we cannot reject H0

large samples: for cases where all the samples have more than 5 values, the sampling distribution of H is similar to that of χ² with df = k − 1, where k is the number of samples

correction for tied rankings: Ti = ti³ − ti, where ti is the number of tied observations in the ith group of tied scores and m is the number of such groups; divide H by the correction factor C = 1 − (ΣTi) / (n³ − n)
the effect of the correction is quite small; it makes the value of H larger and so increases the chance of rejecting the null hypothesis, so if the correction is ignored you are erring on the side of caution
unless more than 1/4 of the values in the data set produce tied ranks, the effect of the correction is negligible
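The tie correction above can be sketched as a short Python helper. The pooled data here are hypothetical (one pair and one triple of ties), and the H value of 2.42 is the computed H from the cities example; since the factor C is at most 1, dividing by it makes the corrected H slightly larger.

```python
from collections import Counter

def tie_correction_factor(pooled):
    """C = 1 - sum(t_i^3 - t_i) / (n^3 - n), summed over each group of
    tied values; t_i is the number of observations tied in group i."""
    n = len(pooled)
    t_sum = sum(t**3 - t for t in Counter(pooled).values() if t > 1)
    return 1.0 - t_sum / (n**3 - n)

# hypothetical pooled data: a pair of ties and a triple of ties
pooled = [1, 1, 2, 3, 3, 3]
c = tie_correction_factor(pooled)   # 1 - (6 + 24)/210 = 6/7

h = 2.42            # the computed H from the cities example
h_corrected = h / c  # corrected H is larger, so skipping the correction is conservative
```

Note how the correction stays small unless ties are pervasive: even with 5 of 6 values tied here, C only drops to about 0.857.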