When statistics are valuable
Introduction to statistics in geography
It is difficult to give a definition of statistics, but one attempt is: the study of numerical data. As such, it is a branch of applied mathematics, and it has only been around since the end of the 19th century.
Aim: statistics is concerned with the analysis, organization and simplification of information about real-world phenomena as an aid to their description, interpretation and prediction.
It teaches you how to apply a set of techniques to data acquired either directly through fieldwork or from secondary sources such as published materials, e.g.
primary: stream discharge, temperature
secondary: census data, economic data
Consequence: statistical analysis develops a more critical approach to research on real-world phenomena. You think more rigorously and precisely about the phenomena and are less likely to make unsubstantiated, vague generalizations.
If there are computer packages, why learn any of this by hand?
1) it dramatically increases one's understanding of statistics
2) it aids in interpreting and implementing solutions
The difference between mathematics and statistics is that mathematics is deductive:
Toronto > London
London > Sarnia
∴ Toronto > Sarnia
Statistics is inductive: inductive arguments give rise to conclusions that often exceed the content of the information on which they are based.
e.g. take a survey of 1000 Canadians (it is not possible to ask all Canadians) about the next election. If 550 respond Liberal (this is the information), we infer or conclude that 55% of all Canadians will vote Liberal. However, qualifications are added, such as "it is probably true that", "almost certainly" or "plus or minus", because we took only a sample of all voters.
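The "plus or minus" qualification in the survey example can be made concrete. The sketch below assumes a simple random sample and uses the normal approximation to the binomial at a 95% confidence level (both are assumptions, not part of the notes above) to attach a margin of error to the 550-out-of-1000 result.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a sample proportion.

    Assumes a simple random sample and n large enough for the normal
    approximation to the binomial to be reasonable.
    """
    p_hat = successes / n
    # z such that the central `confidence` share of a standard normal lies within +/- z
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, (p_hat - margin, p_hat + margin)

p_hat, margin, (low, high) = proportion_interval(550, 1000)
print(f"{p_hat:.0%} plus or minus {margin:.1%}")  # 55% plus or minus 3.1%
```

Under these assumptions, "55% will vote Liberal" would be reported as roughly 52% to 58%, with 95% confidence.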
Descriptive statistics
There is a tremendous amount of information available, and it is increasing at an accelerating rate. How can we deal with it all? One approach is to use descriptive statistics, both for ordinary data and for spatial data (data that relate to 2D surfaces or 3D space). Descriptive statistics help to summarize this information (no inferences are made), e.g. they summarize information about:
1) places: what is the average distance between stores?
2) patterns: what is the arrangement of tall trees vs. small trees?
3) areas: what is the average income in London?
4) temporal trends: is the average price of oil (corrected for inflation) increasing?
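The first question, the average distance between stores, is a purely descriptive summary. A minimal sketch, assuming store locations are given as hypothetical (x, y) coordinates on a flat plane in the same distance units, and that "average distance" means the mean over all pairs of stores (the mean nearest-neighbour distance would be another reasonable reading):

```python
import math
from itertools import combinations
from statistics import mean

def mean_pairwise_distance(points):
    """Mean straight-line (Euclidean) distance over every pair of points."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    return mean(math.dist(p, q) for p, q in combinations(points, 2))

# Hypothetical store locations (km east, km north of some origin)
stores = [(0, 0), (3, 0), (0, 4)]
print(mean_pairwise_distance(stores))  # pairs are 3, 4 and 5 km apart -> 4.0
```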
We will first look at methods that help summarize and describe large sets of data. Data, when summarized, have properties such as the mean and variance, which will be useful when we move on to inferential statistics.
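As a small illustration of two of these summary properties, the sketch below computes the mean and variance of the four pH readings used as an example later in these notes. Python's standard `statistics` module distinguishes the sample variance (divisor n − 1) from the population variance (divisor n); which one is appropriate depends on whether the readings are treated as a sample or as the whole population.

```python
from statistics import mean, variance, pvariance

ph = [7.6, 5.9, 6.0, 7.0]

print(mean(ph))       # 6.625
print(variance(ph))   # sample variance, divides by n - 1 (about 0.669)
print(pvariance(ph))  # population variance, divides by n (about 0.502)
```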
Inferential statistics
Inferential statistics provides formal methods for calculating the limits of probability, i.e. how certain we can be about a conclusion. It does this by:
1) estimating the representativeness of samples
2) estimating the degree to which data support a hypothesis
e.g. was a representative group of Canadians interviewed? Hypothesis: most Canadians will vote Liberal (true or false?)
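The hypothesis can be tested formally. A minimal sketch, assuming a simple random sample, the normal approximation, and a one-sided test of "no majority" (p = 0.5) against "a majority will vote Liberal" (p > 0.5), applied to the 550-out-of-1000 survey from the introduction:

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0=0.5):
    """One-sided z-test of H0: p = p0 against H1: p > p0 (normal approximation)."""
    p_hat = successes / n
    # Standard error computed under the null hypothesis
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Probability of a z at least this large if H0 were true
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

z, p_value = one_proportion_z_test(550, 1000)
print(f"z = {z:.2f}, one-sided p = {p_value:.4f}")
```

A p-value this small (well under 0.001) would lead us to reject the "no majority" hypothesis, but only for the population actually sampled; it says nothing about whether the sample was representative in the first place.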
Significance
One of the most powerful uses of statistics is in helping to decide whether an observed difference or relationship between 2 sets of sample data is significant. Statistical significance is concerned with whether an observed difference truly exists. Significance relies heavily on the concept of probability, and statistics allows us to make more informed judgments.
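One way to see the role of probability in significance testing is a permutation (randomization) test, swapped in here purely as an illustration; the notes do not prescribe a particular test. The idea: if there were no real difference between the two groups, the group labels would be arbitrary, so we shuffle them many times and ask how often a difference at least as large as the observed one appears by chance. The sample values below are hypothetical.

```python
import random

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means between samples a and b.

    Returns the observed absolute difference and an estimated p-value.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical samples, e.g. one measurement taken at two groups of sites
site_a = [12.1, 13.4, 11.8, 12.9, 13.0]
site_b = [14.2, 15.1, 13.9, 14.8, 15.5]
observed, p = permutation_test(site_a, site_b)
print(f"observed difference = {observed:.2f}, p = {p:.4f}")
```

A small p-value says that a difference this large would rarely arise from random labelling alone; it does not say how large or how important the difference is.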
Prediction
Prediction is the 4th major use of statistics. Completely accurate prediction is only possible in a completely deterministic system, and very few geographic processes are deterministic. If the process is not completely random, however, it may be possible to predict the outcome of a particular combination of circumstances.
Types of statistical approaches
1) confirmatory statistics: parametric and some nonparametric
2) exploratory data analysis: nonparametric
We'll do some of both!
Misuse of statistics
Statistics is not a method by which you can prove anything you want; it has a set of clearly defined rules so that interpretations don't exceed the data. Statistics is not a substitute for abstract theoretical reasoning or for the examination of exceptional cases; it is a complementary tool.
Common mistakes
(a) a nonrandom sample is drawn, e.g. "volunteers", people who dial in to 900 numbers. Even if the sample is random, those who answer may not be, since some may be more motivated to respond than others
(b) untruthful answers, e.g. age, income
(c) ecological fallacy: happens when comparing statistics across scales or translating results across disparate environments, e.g. using events observed at the metropolitan level to predict individual behaviour
So statistics is not just a collection of facts or a recipe book.
Variables
A variable is a quantity being measured that can take on a set of measured values. There are 2 types:
(a) discrete: only certain fixed, whole-number values
(b) continuous: theoretically can take on an infinite number of values between the 2 end points
Variates are the single values observed, e.g. variable = area of country, measured in mi²:

observation   area (mi²)   country
#1            3,831,033    Canada
#2            3,678,896    USA
#3            145,709      Japan
#4            0.2          Vatican City
The number of variates is usually denoted by n.
Observation: a value assigned to one item of a variable.
The symbol Σ (sigma) means add up the n observations of variable X in sequence, i = 1 … n:
$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$
e.g. variable = pH
obs 1: $x_1 = 7.6$
obs 2: $x_2 = 5.9$
obs 3: $x_3 = 6.0$
obs 4: $x_4 = 7.0$
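The sigma notation translates directly into a loop that visits $x_1$ to $x_n$ in sequence and keeps a running total; Python's built-in `sum` does the same thing. Shown here with the pH readings above:

```python
ph = [7.6, 5.9, 6.0, 7.0]  # x_1 ... x_4
n = len(ph)                 # number of variates, n = 4

total = 0.0
for i in range(n):          # Python indexes from 0, so ph[0]..ph[n-1] are x_1..x_n
    total += ph[i]

print(n, total)             # 4 26.5
```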
Constant: a single derived value.
Data
Data are a body of information in numerical form. A set of data in tabular form is referred to as a data matrix, e.g. a spreadsheet. Data are the "raw materials" of descriptive and inferential statistics and of model building: "garbage in, garbage out" (GIGO). We need to be concerned with the methods of collection so that the data obtained can be used with confidence.
Quality of data
1) validity: measurements actually measure what we think they measure
concept → operational definition → variable
Concepts are not directly measurable or observable; they are often abstract, such as social space or attitudes. We often use surrogates, that is, variables that stand in for the concept we are trying to measure. Statistical modelling cannot proceed unless some means is found of expressing the concept on some form of measurement scale, so we need a fairly "pure" measurement of the concept. The operational definition specifies the measurement process, i.e. how will it actually be measured?
The variable(s) may be representative of the concept, e.g. social class = f(income, education, status, occupation).
2) reliability: measurements should be free of substantial bias, e.g. loaded questions, biased samples
3) precision and accuracy: precision is a measure of the degree of error in measurement; accuracy is exactness. We might measure precisely but still be off, e.g. a badly calibrated thermometer might measure precisely but not be accurate.
Measurement scales
There are 4 levels of measurement:
1) nominal scale: the simplest scale; allows simple classification or categorization. There is no ordering of the items; categories are mutually exclusive (no item is in more than 1 category) and exhaustive (they include all known cases); any 2 items in a category are considered equal or the same.
2) ordinal scale: enough information is available to place categories in rank order
(a) weak ordinal: can identify a more-than or less-than quality among > 2 categories, e.g. high-order vs. low-order central places, agree/neutral/disagree on an attitudinal scale
(b) strong ordinal: a stronger ranking, but the exact quantitative difference between pairs of ranked items is unknown, e.g. a preference scale [1/2/3/4/5] of preferred cities in terms of residential desirability
3) interval: values are measured and positioned on a continuous scale and can assume any value within its range; the starting (zero) point is arbitrary, e.g. temperature
4) ratio: has the additional feature that the ratio of any 2 values is independent of the scale of measurement, because there is an absolute, nonarbitrary zero point, e.g. population; 1 lb/2 lb is the same ratio as 453 g/906 g
You can go down the scale of measurement but not up, so a good rule of thumb is to collect as much information as you can. e.g. rainfall is at least interval:

rainfall   rank
5 cm       3
15 cm      1
4 cm       4
6 cm       2

and it could be reduced further to classes such as low / medium / high.
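Two points above can be checked numerically. The sketch below (a) shows that a ratio between two ratio-scale values survives a change of units, while a ratio between interval-scale values (°C converted to °F) does not, and (b) "goes down the scale" by converting the rainfall measurements into ranks and into classes. The conversion 1 lb = 453.592 g is standard; the low/medium/high cut-offs are hypothetical, chosen only for illustration.

```python
LB_TO_G = 453.592  # grams per pound

def c_to_f(celsius):
    """Convert Celsius to Fahrenheit (an interval-scale change with a shifted zero)."""
    return celsius * 9 / 5 + 32

# (a) Ratio scale: the ratio does not depend on the unit
ratio_lb = 1 / 2
ratio_g = (1 * LB_TO_G) / (2 * LB_TO_G)
# Interval scale: "10 degrees is half of 20 degrees" depends on the unit
ratio_c = 10 / 20
ratio_f = c_to_f(10) / c_to_f(20)  # 50 / 68, not 0.5
print(ratio_lb, ratio_g, ratio_c, round(ratio_f, 3))

# (b) Going down the scale: ratio-level rainfall -> ordinal ranks -> classes
rainfall_cm = [5, 15, 4, 6]

def rank_descending(values):
    """Rank 1 = largest value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

def classify(value, low_max=5, medium_max=10):
    """Hypothetical cut-offs: <= 5 cm low, <= 10 cm medium, otherwise high."""
    if value <= low_max:
        return "low"
    if value <= medium_max:
        return "medium"
    return "high"

print(rank_descending(rainfall_cm))  # [3, 1, 4, 2]
print([classify(r) for r in rainfall_cm])
```

Once the measurements have been reduced to ranks or classes, the original centimetre values cannot be recovered from them, which is why it pays to record the most detailed level available.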
Errors and precision
Answers can be given to any level of precision, but all measurements are subject to error, so beware of spurious precision. In general, you can only be as precise as your least precise number.
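A common working version of this rule for addition and subtraction is to round the result to the fewest decimal places among the inputs (for multiplication and division the usual rule counts significant figures instead, which this sketch does not handle). The sketch uses Python's `decimal` module so that each input keeps the number of decimal places it was written with; the function name and the readings are hypothetical.

```python
from decimal import Decimal

def add_with_limited_precision(*readings):
    """Sum decimal strings, rounding to the fewest decimal places among the inputs."""
    values = [Decimal(r) for r in readings]
    # An exponent of -1 means 1 decimal place, -3 means 3 decimal places, and so on;
    # the largest exponent belongs to the least precise reading
    coarsest = max(v.as_tuple().exponent for v in values)
    total = sum(values)
    return total.quantize(Decimal(1).scaleb(coarsest))

print(add_with_limited_precision("12.1", "3.456"))  # 15.6, not the spuriously precise 15.556
```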