t and z tests

parametric tests: t-test / z test

parametric tests are more efficient/powerful than nonparametric tests, but there are 4 restrictions on their use

1) data must be measured at the interval/ratio scale

2) data must be drawn from a normally distributed population

3) data must be drawn in independent samples

4) when you have 2 or more samples, the populations from which the samples are drawn are assumed to have equal variance (the homoscedasticity assumption)

it's OK to assume this if n ≤ 40; otherwise use the F test (ANOVA)

H₀: μ₁ = μ₂ or μ₁ - μ₂ = 0, i.e. the population means are equal

equivalently, the samples are drawn from the same population, or there is no significant difference between them

e.g. growth rates in northern and southern Ontario

H₀: there is no significant difference between growth rates in N and S Ontario cities; use the t-test if n ≤ 40

northern	southern
x̄₁ = 10.6	x̄₂ = 15.0
n₁ = 11	n₂ = 10
s₁ = 11.8	s₂ = 9.6

sᵢ = standard deviation of the i-th sample

t statistic

t = (x̄₁ - x̄₂) / SE

SE is the standard error of the difference between the two sample means

where SE(x̄₁ - x̄₂) = S √(1/n₁ + 1/n₂)

therefore t = (10.6 - 15.0) / SE = -4.4 / SE

you may see somewhat different formulas for S if the analyst decides to use the n-1 correction in the variance calculation of each sample

S² is the pooled estimate of the variance of the data, a kind of average of the 2 sample variances

t-tables (critical t, 2 tails, 0.05 level)

df	t critical
10	2.228
20	2.086
30	2.042
inf	1.960

as n increases, t critical approaches 1.96; in other words, the t distribution approaches the normal distribution

for our 2 tailed test df= (n₁-1)+(n₂-1)=(11-1)+(10-1)=10+9=19

at 0.05 with df=19, critical value of t=2.093 (pg 274 in textbook)

the calculated |t| is smaller than the critical value, therefore we cannot reject H₀

conclude that the growth rates are similar; they are not significantly different
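The growth-rate example can be checked numerically. The sketch below is only one possible reading of the calculation: the notes do not say whether s₁ and s₂ were computed with the n or the n-1 divisor, so the hypothetical helper `pooled_t` evaluates the pooled-variance t statistic under both assumptions. The critical value 2.093 is the one quoted in the notes.

```python
import math

# summary statistics from the growth-rate example in the notes
mean_north, mean_south = 10.6, 15.0
n_north, n_south = 11, 10
sd_north, sd_south = 11.8, 9.6

T_CRIT = 2.093  # two-tailed, 0.05 level, df = 19 (value quoted in the notes)


def pooled_t(m1, m2, s1, s2, n1, n2, sd_has_n_minus_1):
    """Two-sample t statistic with a pooled variance estimate (hypothetical helper).

    sd_has_n_minus_1 states an ASSUMPTION about how s1 and s2 were computed:
    True  -> divisor n-1 (sum of squares = (n-1) * s^2)
    False -> divisor n   (sum of squares = n * s^2)
    """
    if sd_has_n_minus_1:
        ss1, ss2 = (n1 - 1) * s1 ** 2, (n2 - 1) * s2 ** 2
    else:
        ss1, ss2 = n1 * s1 ** 2, n2 * s2 ** 2
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)           # S squared
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))     # SE of the difference
    return (m1 - m2) / se


df = (n_north - 1) + (n_south - 1)
t_corrected = pooled_t(mean_north, mean_south, sd_north, sd_south,
                       n_north, n_south, sd_has_n_minus_1=True)
t_uncorrected = pooled_t(mean_north, mean_south, sd_north, sd_south,
                         n_north, n_south, sd_has_n_minus_1=False)

for label, t in (("s with n-1 divisor", t_corrected),
                 ("s with n divisor", t_uncorrected)):
    verdict = "reject H0" if abs(t) > T_CRIT else "cannot reject H0"
    print(f"{label}: t = {t:.3f}, df = {df}, {verdict}")
```

Under either assumption |t| comes out a little below 1 (about 0.93 and 0.89), well under 2.093, so the choice of divisor does not change the decision for these data.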

z test if n ≥ 40

e.g. clay content at 2 sites

site 1	site 2	comment
x̄₁ = 62.7	x̄₂ = 61.8	small difference in means
n₁ = 120	n₂ = 150
s₁ = 2.50	s₂ = 2.62	small standard deviations

therefore reject H₀, there is a significant difference between sites

for the z distribution at the 5% significance level, critical z = 1.96 (2-tailed test)
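The clay-content example can be reproduced with a short calculation. The formula used here is an assumption, since the notes do not show it: the usual large-sample form z = (x̄₁ - x̄₂) / √(s₁²/n₁ + s₂²/n₂), with the sample standard deviations standing in for the population values.

```python
import math

# summary statistics from the clay-content example in the notes
mean1, mean2 = 62.7, 61.8
n1, n2 = 120, 150
sd1, sd2 = 2.50, 2.62

Z_CRIT = 1.96  # two-tailed, 5% level (value quoted in the notes)

# standard error of the difference between two large-sample means
# (ASSUMED formula: unpooled, sample variances used in place of population ones)
se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
z = (mean1 - mean2) / se
reject = abs(z) > Z_CRIT

print(f"SE = {se:.4f}, z = {z:.3f}")
print("reject H0" if reject else "cannot reject H0")
```

With these numbers z is about 2.88, which exceeds 1.96. Although the difference in means is small, the large samples and small standard deviations make the standard error small enough for the difference to be significant.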

t test for paired samples

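t = d̄ / (s_d / √n)

where d_j = x_1j - x_2j is the difference between the paired values x_1j and x_2j, d̄ is the mean of the differences, s_d is their standard deviation and n is the number of pairs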

if we assume that the differences d_j are a random sample from a normal population, we can generalize the test to allow hypotheses concerning any value for the mean difference in the population

μ_d = μ₁ - μ₂

example

a cartographer tests the time taken by intro students to perform a given set of tasks involving the extraction of information from some maps; at the end of the course the test is repeated

student	1st time	2nd time	difference
1	16	15	1
2	23	21	2
3	17	16	1
4	14	15	-1
5	16	15	1
6	21	19	2
7	19	18	1
8	24	10	14
9	26	15	11
10	19	20	-1

d̄ = 31/10 = 3.10

s_d = 5.11

if α = 0.05, t_c = 2.262 with df = 9
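The paired example can be worked through from the table. This is a sketch that assumes the usual paired statistic t = d̄ / (s_d / √n). The n-1 divisor for s_d is inferred from the data, because it reproduces the 5.11 quoted in the notes.

```python
import math
import statistics

# task times from the table (student 1 to 10)
first_time = [16, 23, 17, 14, 16, 21, 19, 24, 26, 19]
second_time = [15, 21, 16, 15, 15, 19, 18, 10, 15, 20]

T_CRIT = 2.262  # two-tailed, alpha = 0.05, df = 9 (value quoted in the notes)

# per-student difference, first minus second
diffs = [a - b for a, b in zip(first_time, second_time)]
n = len(diffs)

d_bar = statistics.mean(diffs)   # mean difference
s_d = statistics.stdev(diffs)    # sample standard deviation (n-1 divisor)
se = s_d / math.sqrt(n)          # standard error of the mean difference
t = d_bar / se
reject = abs(t) > T_CRIT

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t:.3f}, df = {n - 1}")
print("reject H0" if reject else "cannot reject H0")
```

On this reading t is about 1.92, which is below 2.262, so the two-tailed test at α = 0.05 would not reject H₀. Most of the mean difference comes from students 8 and 9, who also inflate s_d.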

one tailed and two tailed tests

so far we've only tested the null hypothesis against the alternative H₁ that there is a difference between the means of the populations from which the 2 samples were taken

since we want to know if the difference lies in either direction it is a 2-tailed test

if we want to test that there is a difference between means in a specified direction we have a 1-tailed test

if H₁ is x > y, then the null hypothesis can be rejected only if x > y in the samples and the difference is significant at the chosen level
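The difference between the two kinds of test shows up in the critical value. The sketch below uses the standard normal distribution from Python's `statistics` module to compare them at α = 0.05. The helper `rejects` and the test statistic 1.8 are hypothetical, chosen only to illustrate the decision rule, and the one-tailed case assumes H₁ is x > y.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# two-tailed: alpha is split between both tails
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)
# one-tailed: all of alpha sits in the tail named by H1
z_one_tailed = std_normal.inv_cdf(1 - alpha)


def rejects(z, tails):
    """Decision rule (hypothetical helper); one-tailed assumes H1 is x > y."""
    if tails == 2:
        return abs(z) > z_two_tailed
    return z > z_one_tailed


print(f"two-tailed critical z = {z_two_tailed:.3f}")
print(f"one-tailed critical z = {z_one_tailed:.3f}")

# illustrative (made-up) test statistics
for z in (1.8, -1.8):
    print(z, "two-tailed:", rejects(z, 2), "one-tailed:", rejects(z, 1))
```

A statistic of 1.8 is rejected by the one-tailed test but not by the two-tailed test, while -1.8 is rejected by neither, because the difference lies in the direction opposite to H₁.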