Lab Four: The 't' and the 'z' Tests

INTRODUCTION

Both the 't' and 'z' parametric tests serve to answer the same research question: does the sample (or do the samples) come from the same underlying population? In other words, is the sample mean (X̄) significantly different from the population mean (μ)? The 'z' test is used when the standard deviation of the population (σ) is known. When the population standard deviation is unknown, or when two sample means are being compared, the 't' test is employed. Both tests assume that the variable measured comes from a normally distributed population and that the samples were selected randomly. Note also that when the number of observations exceeds 40, the 't' and 'z' tests are roughly equivalent.

Computational Formulae:

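'z' test formulae

z = (X̄ - μ) / (σ / √N)   (if samples are drawn from an infinite population),

z = (X̄ - μ) / [(σ / √N) · √((N_p - N) / (N_p - 1))]   (if samples are drawn from a finite population)

(X̄ = sample mean; μ = population mean; σ = population standard deviation; N = number of observations in the sample; N_p = number of observations in the population).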
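As a rough sketch of how the two 'z' cases might be computed, the Python function below divides the difference between the sample mean and the population mean by the standard error σ/√N and, when a population size is supplied, multiplies that standard error by a finite population correction. The correction form √((N_p - N)/(N_p - 1)), the function name `z_statistic` and the numbers used are assumptions made for illustration; they are not taken from the textbook.

```python
import math

def z_statistic(sample_mean, pop_mean, pop_sd, n, pop_size=None):
    """z for a sample mean; pop_size=None treats the population as infinite."""
    standard_error = pop_sd / math.sqrt(n)
    if pop_size is not None:
        # Finite population correction (assumed form): sqrt((N_p - N) / (N_p - 1))
        standard_error *= math.sqrt((pop_size - n) / (pop_size - 1))
    return (sample_mean - pop_mean) / standard_error

# Hypothetical numbers, for illustration only
z_infinite = z_statistic(52.0, 50.0, 10.0, 25)
z_finite = z_statistic(52.0, 50.0, 10.0, 25, pop_size=100)
print(round(z_infinite, 3), round(z_finite, 3))
```

Leaving `pop_size` unset treats the population as infinite. The correction shrinks the standard error, so the finite-population z is the larger of the two in absolute value.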

't' test formulae

where x̄ and ȳ are the means of the two samples and n_x and n_y are the sizes of the two samples. This formula is equivalent to the formula given on page 168 of the textbook. The formula given here is the easiest to use if you are doing the t test manually, although it is not the best choice if you were writing a computer program to do it. If you choose to use the textbook formula, refer to page 171 for an example.

The degrees of freedom for the t test are given by

You need only use this formula if you are comparing samples from two different populations with extremely large variances.

Therefore, do not use it in this exercise; use df = n₁ + n₂ - 2, where n₁ and n₂ are the sizes of the two samples (n_x and n_y above).
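For readers who want to check a manual calculation, here is a minimal Python sketch of a two-sample t test with df = n₁ + n₂ - 2. It assumes the common pooled-variance form, t = (x̄ - ȳ) divided by the standard error of the difference, which may be arranged differently from the handout's or the textbook's formula. The function name `pooled_t` and the scores are hypothetical.

```python
import math

def pooled_t(sample_x, sample_y):
    """Return (t, df) for two independent samples, pooling the variances."""
    n_x, n_y = len(sample_x), len(sample_y)
    mean_x = sum(sample_x) / n_x
    mean_y = sum(sample_y) / n_y
    # Sums of squared deviations about each sample mean
    ss_x = sum((v - mean_x) ** 2 for v in sample_x)
    ss_y = sum((v - mean_y) ** 2 for v in sample_y)
    df = n_x + n_y - 2
    pooled_variance = (ss_x + ss_y) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_x + 1 / n_y))
    return (mean_x - mean_y) / standard_error, df

# Hypothetical scores, for illustration only
t_value, df = pooled_t([2, 4, 6], [1, 2, 3])
print(round(t_value, 3), df)
```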

Example of t test for unmatched samples:

Assume a test of map reading skills is given to 35 randomly selected male students and 35 randomly selected female students. A score of 1 indicates low skill and 5 high skill.

Determine if the two groups have statistically different means.

males		females
1	2	1	2
1	4	1	1
1	5	2	1
2	1	1	1
1	1	1	1
1	2	1	2
3	1	3	3
3	2	1	1
1	1	1	1
1	2	2	1
2	1	4	2
1	1	1	2
2	1	1	2
1	1	1	1
1	3	1	1
1	3	5	1
1	1	1	1
	1	2

Step 1: Find the mean for each sample

Step 2: Find the variance for each sample

Step 3: Find the standard error of the difference between means

Note that this is yet another version of the formula for the standard error of the difference between means. All the formulas give an answer of .24. Try it! They are simply different algebraic manipulations of the same formula. This is common when looking at statistics textbooks: you won't always see the same formulas. Use the one you like best.
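The point that different-looking formulas agree can be checked numerically. The Python sketch below compares two common textbook forms of the standard error of the difference between means: one built from variances with N in the denominator, the other from pooled variances with N - 1 in the denominator. These two forms are an assumption chosen for illustration and may not be the exact versions referred to above; the data are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sum_of_squares(values):
    """Sum of squared deviations about the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def se_form_a(x, y):
    """Standard error using variances with N in the denominator."""
    n1, n2 = len(x), len(y)
    var1 = sum_of_squares(x) / n1
    var2 = sum_of_squares(y) / n2
    return math.sqrt(((n1 * var1 + n2 * var2) / (n1 + n2 - 2)) * ((n1 + n2) / (n1 * n2)))

def se_form_b(x, y):
    """Standard error using pooled variances with N - 1 in the denominator."""
    n1, n2 = len(x), len(y)
    var1 = sum_of_squares(x) / (n1 - 1)
    var2 = sum_of_squares(y) / (n2 - 1)
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return math.sqrt(pooled * (1 / n1 + 1 / n2))

# Hypothetical data, for illustration only
group_1 = [1, 2, 2, 5, 3]
group_2 = [1, 1, 3, 2]
se_a = se_form_a(group_1, group_2)
se_b = se_form_b(group_1, group_2)
print(math.isclose(se_a, se_b))
```

Both forms reduce to the same sums of squared deviations, which is why they agree up to floating-point rounding.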

Step 4: Compute the t value from the difference between the means and the standard error of the difference

Step 5: Determine the critical value of t

df=N₁ + N₂ - 2 = 35 + 35 - 2 = 68

t=2.00

Step 6: Compare the calculated and table t values. The calculated t = .71, which is less than the critical value of 2.00, so we fail to reject the null hypothesis.

The above example is derived from Elementary Statistics in Social Sciences by J. Levin and J. A. Fox, 1991, HarperCollins Publishers, NY, pp. 210-213.
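Steps 5 and 6 of the unmatched-samples example amount to a degrees-of-freedom calculation followed by a comparison with the table value. A minimal Python sketch, using the values quoted in the example (sample sizes of 35, calculated t of .71, table t of 2.00), is given here; the function names are hypothetical.

```python
def degrees_of_freedom(n1, n2):
    """df for two independent samples when the variances are pooled."""
    return n1 + n2 - 2

def decide(t_calculated, t_critical):
    """Two-tailed rule: reject H0 only if |t| exceeds the table value."""
    if abs(t_calculated) > t_critical:
        return "reject H0"
    return "do not reject H0"

df = degrees_of_freedom(35, 35)
decision = decide(0.71, 2.00)
print(df, decision)
```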

Example 2: Use of t-test for paired samples

Below is a comparison of murder rates for five US cities in 1960 and 1970.

1960	1970
10.1	20.4
10.6	22.1
8.2	10.2
4.9	9.8
11.5	13.7
From: Mendenhall, W., W. Ott and R. F. Larson, 1974, Statistics: A Tool for the Social Sciences, Boston, Duxbury Press.

Determine if there is any difference between 1960 and 1970.

Step 1: H₀: There is no difference in the mean murder rate for 1960 and 1970.

H₁: There is a difference in the mean murder rate for 1960 and 1970.

Step 2: Find d and d²

1960	1970	d	d²
10.1	20.4	10.3	106.1
10.6	22.1	11.5	132.3
8.2	10.2	2.0	4.0
4.9	9.8	4.9	24.0
11.5	13.7	2.2	4.8
Sum		30.9	271.2

Step 3: Calculate the variance of the mean differences between pairs

Step 4: Calculate t

Step 5: Set α = .05 and find the table value of t for df = n - 1 = 4 (five pairs): t = 2.776.

The calculated t (about 3.09) exceeds the table value, so we reject H₀.
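The arithmetic of Steps 2 to 4 can be checked with a short Python sketch using the murder-rate data from the table. It assumes the usual paired form: t is the mean difference divided by its standard error, with the variance of the differences computed using n - 1 in the denominator. That is one common arrangement and may differ algebraically from the formula intended in Step 3. The variable names are illustrative.

```python
import math

rates_1960 = [10.1, 10.6, 8.2, 4.9, 11.5]
rates_1970 = [20.4, 22.1, 10.2, 9.8, 13.7]

# Step 2: differences and squared differences for each city
d = [later - earlier for earlier, later in zip(rates_1960, rates_1970)]
n = len(d)
sum_d = sum(d)
sum_d2 = sum(v ** 2 for v in d)

# Step 3: variance of the differences (n - 1 in the denominator, an assumption)
mean_d = sum_d / n
var_d = (sum_d2 - sum_d ** 2 / n) / (n - 1)

# Step 4: t is the mean difference over its standard error
standard_error = math.sqrt(var_d / n)
t_value = mean_d / standard_error

print(round(sum_d, 1), round(sum_d2, 1), round(t_value, 2))
```

The two sums agree with the table (30.9 and 271.2), and the calculated t comes out at about 3.09.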

INSTRUCTIONS

Parts 1 and 2 are to be calculated manually using the above formulae; Part 3 is to be evaluated using SPSS for Windows.

Part 1 (20 marks)

From a Canada-wide study that looked at land conversion from rural to urban uses (for all urban areas with a population of more than 25,000 as of 1971), the following sample of Ontario cities was selected.

CITY	Total Area Converted (ha)
Hamilton	1425
Kitchener	2529
London	1546
Oshawa	609
Ottawa	4612
St. Catharines	5087
Thunder Bay	573
Toronto	11755
Windsor	1223
Source: Gierman, D.M. (1977) Rural to Urban Land Conversion, Occasional Paper No. 16, Fisheries and Environment Canada, Ottawa.

The question to be answered: are Ontario cities more susceptible to land conversion (meaning conversion from rural to urban use) than other Canadian cities? Assume that the variable 'Total Area Converted' is normally distributed and that the 72 cities in Canada with a population of at least 25,000 have a mean of 1196 ha of converted land and a standard deviation of 200 ha. In answering, be sure to state:

a) The null and research hypotheses,

b) Whether the test is one-tailed or two-tailed,

c) Which test (the 't' or 'z') should be used and what the calculated value is,

d) At the 95% confidence level, the critical value as found in the relevant table (see textbook), and

e) Whether the null hypothesis can be rejected.

f) What, then, can be concluded about land conversion in Ontario in comparison to the rest of Canada?

Part 2a (20 marks)

Listed below are the values (in millions of dollars) of Canadian foreign direct investment transactions made in the states of Florida and New York for the years 1989 to 1991. The majority of these transactions are in the form of mergers, acquisitions or new plant openings by Canadian investors.

Value of Canadian Foreign Direct Investment ($ Million)

New York	Florida
4.7	2.0
10.0	3.9
810.0	10.0
49.0	6.4
2.9	2.2
5.1	125.0
5.0	1200.0
62.5	17.3
188.0	15.0
43.0	1.6
1.8	2.1
160.0	59.3
15.0	26.5
8.0	4.2
15.0	30.0
31.0	1.4
60.9	2.8
10.6
14.0
6.0
550.0
Source: US Department of Commerce (1989-1991), Foreign Direct Investment in the United States, Washington, DC.

The question to be answered: is the average value of Canadian foreign direct investment in New York significantly different from that in Florida? Again, in answering, please state the following:

a) The null and research hypotheses,

b) Whether the test is one- or two-tailed,

c) Which test should be used to answer the question and the calculated value attained,

d) At the 95% confidence level, what the critical value is, and

e) Whether the null hypothesis can be rejected.

f) What can be said about Canadian foreign direct investment in New York and Florida; is there a significant difference?

Part 2b (20 marks)

Listed in Table 3 are winter wheat yields for some towns in England.

Site	1970	1973
Cambridge	46.81	32.61
Cockle Park	46.49	41.02
Harpers Adams	44.03	50.23
Headley Hall	52.24	34.56
Morley	36.55	43.17
Myerscough	34.88	50.08
Rosemaund	56.14	38.99
Seale-Hayne	45.67	50.32
Sparsholt	42.97	47.49
Sutton Bonington	54.44	46.94
Terrington	54.95	39.13
Wye	48.94	59.12
From: Lyons, R., 1980, A review of multidimensional scaling, unpublished MSc thesis, University of Reading.

The question to be answered is: did the yields change across time?

a) The null and research hypotheses,

b) Whether the test is one- or two-tailed,

c) Which test should be used to answer the question and the calculated value attained,

d) At the 95% confidence level, what the critical value is, and

e) Whether the null hypothesis can be rejected.

f) What can be said about winter wheat yields; is there a significant difference?

Part 3: (20 marks)

Using the same data from Part 2, use SPSS for Windows to confirm your results. Make sure to hand in all output.

a) Is the calculated value on the output the same as the one calculated by hand? If not, account for the difference.

b) How does one determine significance solely from the output (i.e., without the aid of any tables or calculations)? With 95% certainty, can the null hypothesis be rejected?

Part 4: (20 marks)

Using data that you have collected, conduct either a t or z test on two samples. As usual, discuss the imperfections of the data, what they mean, and their significance. Be sure to outline your hypotheses and significance levels before you conduct the test.

Extra Credit: (20 marks)

Using data that you collect yourself, perform a 2-sample Difference of Proportions Test. Provide details of the source of the data and any limitations it might have.

State:

a) The null and research hypotheses,

b) Which test should be used to answer the question and the calculated value attained,

c) At the 95% confidence level, what the critical value is, and

d) Whether the null hypothesis can be rejected.