Tutorial 2: SAS Programs
More Data Steps

Introduction
An Example Data Set
The DATA Step
PROC PRINT
PROC CONTENTS
The Whole SAS Program
External Data Sets
Reading Text (ASCII) Data Sets
Saving a SAS System File
Reading a SAS System File
Potential Problems


Introduction

SAS (Statistical Analysis System) is a flexible program for conducting data analysis. SAS contains many built-in procedures for doing descriptive, analytic and exploratory analyses. SAS allows users to conduct a wide range of statistical analyses, including analysis of variance, regression, categorical data analysis, multivariate analysis, survival analysis, cluster analysis, and nonparametric analysis.

SAS has a powerful macro language that allows for the implementation of new techniques and variations on old ones. SAS is also used in both corporate and scientific environments as a database management program for handling large and complex data structures.

An Example Data Set

While SAS is extremely powerful, it is possible to get "up and running" with a few simple commands. This tutorial begins by introducing the basics of data entry and display. Throughout this tutorial, an example dataset called nations will be used. The dataset contains 15 variables measured on a random sample of 37 countries in the early 1980s. The data are listed in Table 1 below.

Table 1: 37 Country Sample Data from the 1980s
Austria      7.5  55   12   73   9880   C  U  0.3  80   1.5  98   2.7  Dev  Eu
Belgium      10.2 .    12   73   10899  C  U  0.1  80   1.6  98   3.6  Dev  Eu
Denmark      5.1  83   10   74   12470  C  U  -1   79   1.6  99   4.25 Dev  Eu
Finland      4.8  60   14   74   10870  C  U  0.3  80   1.7  100  6.5  Dev  Eu
France       54.4 73   15   75   11680  C  U  0.4  82   1.8  99   3.5  Dev  Eu
Greece       9.9  65   14   74   4290   C  U  0.2  80   1.5  95   14.8 Dev  Eu
Switzerland  6.5  58   12   76   17010  C  U  0.6  83   1.6  99   2.8  Dev  Eu
Spain        38.2 91   13   74   5430   C  U  0.3  82   1.4  97   7    Dev  Eu
UK           56.3 76   13   73   9660   C  U  0.3  79   1.8  99   7.8  Dev  Eu
Italy        57.5 69   11   74   6840   C  U  0.2  81   1.4  93   6.6  Dev  Eu
Sweden       8.3  83   11   76   14040  C  U  0.5  81   1.9  99   5.7  Dev  Eu
Portugal     10.1 30   16   71   2450   C  R  0.3  78   1.5  83   11.8 Dev  Eu
Netherlands  14.4 88   12   76   10930  C  U  0.6  81   1.6  99   1.5  Dev  Eu
Norway       4.1  70   12   76   14280  C  U  0.5  81   1.8  100  4.5  Dev  Eu
Poland       36.6 59   19   71   .      C  U  -1   77   2.1  98   640  Dev  Eu
Hungary      10.7 54   12   70   2270   C  U  -0.1 75   1.8  99   18   Dev  Eu
Czechoslov   15.4 67   15   71   .      C  U  0.3  76   2    99   1.5  Dev  Eu
Gambia       0.7  18   49   35   360    I  R  3.1  50   6.5  25.1 8    Emg  Af
Iraq         14.5 68   47   59   .      I  U  3.9  68   7.3  55   30   Emg  Ot
Pakistan     94.7 28   43   50   380    I  R  2.2  57   6.7  26   11   Emg  Ot
Bangladesh   95.9 11   49   48   140    I  R  2.8  53   5.7  29   8    Emg  Ot
Ethiopia     33.8 14   47   43   140    I  R  3.5  52   7    55.2 9.6  Emg  Af
Guinea       5.4  19   47   40   310    I  R  2.6  44   6.1  20   27   Emg  Af
Malaysia     15.1 30   31   67   1860   I  R  2.3  71   3.5  65   3.6  Dev  Ot
Senegal      6.1  34   48   43   490    I  R  3    56   6.3  28.1 1.8  Emg  Af
Mali         7.5  17   46   42   180    I  R  2.3  47   7.1  18   .    Emg  Af
Libya        3.3  52   46   58   8510   I  U  3.1  70   5.2  50   20   Dev  Af
Somalia      5.3  30   47   43   290    I  R  0.8  54   7.3  11.6 81.7 Emg  Af
Afghanistan  17.2 16   48   37   .      I  R  7.7  46   6.4  12   50   Emg  Ot
Sudan        20   21   47   48   440    I  R  2.9  55   6.5  31   70   Emg  Af
Turkey       48.7 45   31   63   1370   I  U  2.2  67   3.6  70   68.8 Emg  Ot
Algeria      21   52   44   60   2350   I  U  2.8  64   5.4  52   5.9  Dev  Af
Yemen        6.2  12   48   44   500    I  R  3.1  49   7.6  15   16.9 Emg  Af
Argentina    28   82   24   70   2520   C  U  1.2  74   2.8  94   4925 Dev  Am
Barbados     0.3  41   17   71   2900   C  U  0.6  77   2.1  99   4.7  Dev  Am
Bolivia      6    45   42   51   570    C  U  2.1  56   4.7  63   15.5 Emg  Am
Brazil       131  68   31   63   2240   C  U  1.9  68   3.1  76   1765 Dev  Am

Notice that the data are arranged into a rectangular array in which the rows represent observations (also called cases) and the columns represent variables. In this instance the observations are countries. Although we have tried to use meaningful variable names, datasets can be confusing without more information about the data. Table 2 below contains the codebook for the nations dataset.

Table 2: Codebook for the Nations Dataset
Variable  Variable  Variable                                          Coded
Number    Name      Label                                             Response
1         country   Country                                           alphanumeric
2         pop83     Population in 1983 (in millions)                  numeric
3         pcturban  Percent of population living in urban areas       numeric
4         birthrte  Births per 1,000 population                       numeric
5         life_exp  Life expectancy at birth                          numeric
6         gnp82     GNP per person in $US                             numeric
7         religion  Predominant religion                              C = Christian
                                                                      I = Islamic
8         urban     Urban if variable 3 is over 50%; rural otherwise  R = rural
                                                                      U = urban
9         growth    Annual rate of growth (%) in economy              numeric
10        life_fem  Female life expectancy at birth                   numeric
11        tfr       Total Fertility Rate                              numeric
12        literacy  Percentage of population literate                 numeric
13        inflatn   Rate of inflation (%)                             numeric
14        gdp       Level of economic development                     Dev = developed
                                                                      Emg = emerging
15        area      Geographical region                               Eu = Europe
                                                                      Af = Africa
                                                                      Am = America
                                                                      Ot = Other

The DATA Step

You can start your adventure in SAS programming by creating a SAS dataset using the 37 observations displayed above. A DATA statement will begin a DATA step that will create a temporary SAS dataset called work.nations. SAS procedures can only work on SAS datasets. Later on in the tutorial, we will show you how to create permanent data sets. The difference between a temporary and a permanent data set is that temporary data sets are kept in a scratch directory and deleted once the program is closed. Permanent data sets are saved for use in a later session.
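
As a small illustration of the temporary library, a one-level name such as nations is shorthand for the two-level name work.nations. The sketch below uses a made-up data set called tiny (it is not part of this tutorial's data) to show both forms referring to the same file.

```sas
/* Hypothetical two-row data set, used only for illustration */
DATA tiny;
   INPUT id score;
DATALINES;
1 10
2 20
;
RUN;

/* The one-level name tiny and the two-level name work.tiny
   refer to the same temporary data set */
PROC PRINT DATA=work.tiny;
RUN;
```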

In this example, the data are going to be read instream, that is, the data are entered into the Editor window and become part of the SAS program. The data are placed right after the DATALINES statement. Later in the tutorial we will show you how to read data from a separate file.

Our first SAS program can be created by combining the data from Table 1 and the variable names presented in Table 2. Together, this information looks as follows.


OPTIONS ls=72;

DATA nations;

   INPUT country $ pop83 pcturban birthrte life_exp gnp82 religion $ urban $ 
	growth life_fem tfr literacy inflatn gdp $ area $;

DATALINES;
Austria      7.5  55   12   73   9880   C  U  0.3  80   1.5  98   2.7  Dev  Eu
Belgium      10.2 .    12   73   10899  C  U  0.1  80   1.6  98   3.6  Dev  Eu
Denmark      5.1  83   10   74   12470  C  U  -1   79   1.6  99   4.25 Dev  Eu
Finland      4.8  60   14   74   10870  C  U  0.3  80   1.7  100  6.5  Dev  Eu
France       54.4 73   15   75   11680  C  U  0.4  82   1.8  99   3.5  Dev  Eu
Greece       9.9  65   14   74   4290   C  U  0.2  80   1.5  95   14.8 Dev  Eu
Switzerland  6.5  58   12   76   17010  C  U  0.6  83   1.6  99   2.8  Dev  Eu
Spain        38.2 91   13   74   5430   C  U  0.3  82   1.4  97   7    Dev  Eu
UK           56.3 76   13   73   9660   C  U  0.3  79   1.8  99   7.8  Dev  Eu
Italy        57.5 69   11   74   6840   C  U  0.2  81   1.4  93   6.6  Dev  Eu
Sweden       8.3  83   11   76   14040  C  U  0.5  81   1.9  99   5.7  Dev  Eu
Portugal     10.1 30   16   71   2450   C  R  0.3  78   1.5  83   11.8 Dev  Eu
Netherlands  14.4 88   12   76   10930  C  U  0.6  81   1.6  99   1.5  Dev  Eu
Norway       4.1  70   12   76   14280  C  U  0.5  81   1.8  100  4.5  Dev  Eu
Poland       36.6 59   19   71   .      C  U  -1   77   2.1  98   640  Dev  Eu
Hungary      10.7 54   12   70   2270   C  U  -0.1 75   1.8  99   18   Dev  Eu
Czechoslov   15.4 67   15   71   .      C  U  0.3  76   2    99   1.5  Dev  Eu
Gambia       0.7  18   49   35   360    I  R  3.1  50   6.5  25.1 8    Emg  Af
Iraq         14.5 68   47   59   .      I  U  3.9  68   7.3  55   30   Emg  Ot
Pakistan     94.7 28   43   50   380    I  R  2.2  57   6.7  26   11   Emg  Ot
Bangladesh   95.9 11   49   48   140    I  R  2.8  53   5.7  29   8    Emg  Ot
Ethiopia     33.8 14   47   43   140    I  R  3.5  52   7    55.2 9.6  Emg  Af
Guinea       5.4  19   47   40   310    I  R  2.6  44   6.1  20   27   Emg  Af
Malaysia     15.1 30   31   67   1860   I  R  2.3  71   3.5  65   3.6  Dev  Ot
Senegal      6.1  34   48   43   490    I  R  3    56   6.3  28.1 1.8  Emg  Af
Mali         7.5  17   46   42   180    I  R  2.3  47   7.1  18   .    Emg  Af
Libya        3.3  52   46   58   8510   I  U  3.1  70   5.2  50   20   Dev  Af
Somalia      5.3  30   47   43   290    I  R  0.8  54   7.3  11.6 81.7 Emg  Af
Afghanistan  17.2 16   48   37   .      I  R  7.7  46   6.4  12   50   Emg  Ot
Sudan        20   21   47   48   440    I  R  2.9  55   6.5  31   70   Emg  Af
Turkey       48.7 45   31   63   1370   I  U  2.2  67   3.6  70   68.8 Emg  Ot
Algeria      21   52   44   60   2350   I  U  2.8  64   5.4  52   5.9  Dev  Af
Yemen        6.2  12   48   44   500    I  R  3.1  49   7.6  15   16.9 Emg  Af
Argentina    28   82   24   70   2520   C  U  1.2  74   2.8  94   4925 Dev  Am
Barbados     0.3  41   17   71   2900   C  U  0.6  77   2.1  99   4.7  Dev  Am
Bolivia      6    45   42   51   570    C  U  2.1  56   4.7  63   15.5 Emg  Am
Brazil       131  68   31   63   2240   C  U  1.9  68   3.1  76   1765 Dev  Am
;

RUN;

If you want, you can open up the SAS program and copy and paste the material between the OPTIONS and the RUN statements into the Editor window in SAS. Both SAS and your browser should use the standard Windows commands for selecting, copying and pasting the excerpt.

Let's now look at the statements individually. The OPTIONS statement tells SAS to limit the output to a line size of 72 characters. Typically, SAS will generate output or results in a window that is 80 characters in width. The DATA statement tells SAS that it should be prepared to enter or manipulate some data. The word nations on the DATA statement provides the data set with a name or handle to which we can refer in later analyses. The third statement, INPUT, is where we list the names of the variables. Notice that the variables that contain alphanumerics (characters) have a '$' sign following them. If the $ sign is left off, SAS tries to read the values as numbers, sets them to missing, and writes "Invalid data" notes to the Log window.
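
One detail the INPUT statement above does not show: with simple list input, SAS gives a character variable a default length of 8, so a longer value such as Switzerland would be stored as Switzerl. A minimal sketch of one common remedy, a LENGTH statement placed before INPUT, follows. The data set name longnames and the length of 12 are assumptions chosen for this illustration, and only two variables are read to keep it short.

```sas
DATA longnames;
   /* Reserve 12 characters for country; without this line
      list input would keep only the first 8 characters */
   LENGTH country $ 12;
   INPUT country $ pop83;
DATALINES;
Switzerland  6.5
Netherlands  14.4
UK           56.3
;
RUN;

PROC PRINT DATA=longnames;
RUN;
```

The same LENGTH statement could be added to the nations DATA step if the full country names are needed.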

The next statement, DATALINES, tells SAS that the data are to follow immediately. In some documentation, you will find that the DATALINES statement is replaced by a CARDS statement. That is a throwback to the old days when data were entered into mainframe computers on punch cards.

Notice that the end of the data is signified by placing a semicolon on a line by itself. The RUN; statement tells SAS that we are at the end of the DATA step. This DATA step does nothing but read in the data. An indication that the data have been read should appear in the Log window. To ensure that the data have been read in properly, it is common to print out some or all of the data. This can be achieved by using PROC PRINT.

PROC PRINT

PROC PRINT is used to display the contents of a SAS dataset. PROC PRINT is often used to assure us that the data were read into SAS correctly.

In the example below, we have used the DATA= option to indicate which dataset is to be used. If 'DATA=nations' were left off, SAS would use the most recently created data set. Because it is possible to use more than one dataset at a time, it is good practice to name the dataset to be used so that there is no confusion.

PROC PRINT DATA=nations;
RUN;
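
When a data set has many variables, it can be easier to check a subset. As a sketch, assuming the nations data set created above is still in the work library, the VAR statement chooses the columns and the OBS= data set option limits the number of rows printed. The choice of three variables and five rows is arbitrary.

```sas
/* Print only three variables for the first five observations */
PROC PRINT DATA=nations (OBS=5);
   VAR country pop83 gnp82;
RUN;
```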

PROC CONTENTS

PROC CONTENTS displays information about a SAS dataset.

PROC CONTENTS DATA=nations;
RUN;

The most important things to note right now are that there are 37 observations on 15 variables and that the variables are listed in alphabetical order.

Let's try the PROC CONTENTS again, this time using the POSITION option to display the variables in the order in which they were entered.

PROC CONTENTS DATA=nations POSITION;
RUN;

The Whole SAS Program

It is possible to put all of the SAS statements that were used above in a single SAS program. Review the whole program and see if you can identify the following parts: the DATA step, the data, and the PROC steps.



OPTIONS ls=72;

DATA nations;

   INPUT country $ pop83 pcturban birthrte life_exp gnp82 religion $ urban $ 
	growth life_fem tfr literacy inflatn gdp $ area $;

DATALINES;
Austria      7.5  55   12   73   9880   C  U  0.3  80   1.5  98   2.7  Dev  Eu
Belgium      10.2 .    12   73   10899  C  U  0.1  80   1.6  98   3.6  Dev  Eu
Denmark      5.1  83   10   74   12470  C  U  -1   79   1.6  99   4.25 Dev  Eu
Finland      4.8  60   14   74   10870  C  U  0.3  80   1.7  100  6.5  Dev  Eu
France       54.4 73   15   75   11680  C  U  0.4  82   1.8  99   3.5  Dev  Eu
Greece       9.9  65   14   74   4290   C  U  0.2  80   1.5  95   14.8 Dev  Eu
Switzerland  6.5  58   12   76   17010  C  U  0.6  83   1.6  99   2.8  Dev  Eu
Spain        38.2 91   13   74   5430   C  U  0.3  82   1.4  97   7    Dev  Eu
UK           56.3 76   13   73   9660   C  U  0.3  79   1.8  99   7.8  Dev  Eu
Italy        57.5 69   11   74   6840   C  U  0.2  81   1.4  93   6.6  Dev  Eu
Sweden       8.3  83   11   76   14040  C  U  0.5  81   1.9  99   5.7  Dev  Eu
Portugal     10.1 30   16   71   2450   C  R  0.3  78   1.5  83   11.8 Dev  Eu
Netherlands  14.4 88   12   76   10930  C  U  0.6  81   1.6  99   1.5  Dev  Eu
Norway       4.1  70   12   76   14280  C  U  0.5  81   1.8  100  4.5  Dev  Eu
Poland       36.6 59   19   71   .      C  U  -1   77   2.1  98   640  Dev  Eu
Hungary      10.7 54   12   70   2270   C  U  -0.1 75   1.8  99   18   Dev  Eu
Czechoslov   15.4 67   15   71   .      C  U  0.3  76   2    99   1.5  Dev  Eu
Gambia       0.7  18   49   35   360    I  R  3.1  50   6.5  25.1 8    Emg  Af
Iraq         14.5 68   47   59   .      I  U  3.9  68   7.3  55   30   Emg  Ot
Pakistan     94.7 28   43   50   380    I  R  2.2  57   6.7  26   11   Emg  Ot
Bangladesh   95.9 11   49   48   140    I  R  2.8  53   5.7  29   8    Emg  Ot
Ethiopia     33.8 14   47   43   140    I  R  3.5  52   7    55.2 9.6  Emg  Af
Guinea       5.4  19   47   40   310    I  R  2.6  44   6.1  20   27   Emg  Af
Malaysia     15.1 30   31   67   1860   I  R  2.3  71   3.5  65   3.6  Dev  Ot
Senegal      6.1  34   48   43   490    I  R  3    56   6.3  28.1 1.8  Emg  Af
Mali         7.5  17   46   42   180    I  R  2.3  47   7.1  18   .    Emg  Af
Libya        3.3  52   46   58   8510   I  U  3.1  70   5.2  50   20   Dev  Af
Somalia      5.3  30   47   43   290    I  R  0.8  54   7.3  11.6 81.7 Emg  Af
Afghanistan  17.2 16   48   37   .      I  R  7.7  46   6.4  12   50   Emg  Ot
Sudan        20   21   47   48   440    I  R  2.9  55   6.5  31   70   Emg  Af
Turkey       48.7 45   31   63   1370   I  U  2.2  67   3.6  70   68.8 Emg  Ot
Algeria      21   52   44   60   2350   I  U  2.8  64   5.4  52   5.9  Dev  Af
Yemen        6.2  12   48   44   500    I  R  3.1  49   7.6  15   16.9 Emg  Af
Argentina    28   82   24   70   2520   C  U  1.2  74   2.8  94   4925 Dev  Am
Barbados     0.3  41   17   71   2900   C  U  0.6  77   2.1  99   4.7  Dev  Am
Bolivia      6    45   42   51   570    C  U  2.1  56   4.7  63   15.5 Emg  Am
Brazil       131  68   31   63   2240   C  U  1.9  68   3.1  76   1765 Dev  Am
;
RUN;

PROC PRINT DATA=nations;
RUN;

PROC CONTENTS DATA=nations;
RUN;

PROC CONTENTS DATA=nations POSITION;
RUN; 

 

External Data Sets

Large amounts of data can become difficult to include in the DATA section of your program. Fortunately, the data can be read directly from an external data file. SAS has the ability to read files in several formats, including Excel and dBASE files as well as files from other statistical packages. Often, however, data are provided to us in text or ASCII format. SAS can easily input those data directly if the entries in the data set are tab, space or comma delimited. That is, each data element must be separated by a tab, space or comma. It is also possible for SAS to read data where the values run into each other, provided that each variable is located in the same columns for every observation.
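
The last case, values that run together but always sit in the same columns, is usually handled with column input, in which each variable name is followed by the columns it occupies. The sketch below uses made-up data (the data set fixedcol and its three variables are assumptions for illustration only) and reads it instream so that it can be tried without an external file.

```sas
/* Hypothetical fixed-column data: name in columns 1-8,
   age in columns 9-10, score in columns 11-13 */
DATA fixedcol;
   INPUT name $ 1-8 age 9-10 score 11-13;
DATALINES;
Anderson34 87
Lee     29105
;
RUN;

PROC PRINT DATA=fixedcol;
RUN;
```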

SAS also has the ability to save and read its own system or binary files. The advantage of creating system files is that they can be processed much more quickly. This may not be an issue when the data contain only a few observations but it may become one when the data contain several thousand records.

Reading Text (ASCII) Data Sets

In the following example, it is assumed that the data comprise a comma, space or tab delimited text file. For instructions on importing other file formats, see the on-line SAS documentation under "Help".

To access an external data file in text format, a few changes need to be made to the DATA step. Specifically,
1) the DATALINES or CARDS statement is deleted, and
2) an INFILE statement is added.
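The INFILE statement tells SAS where to find the dataset. The path to the dataset is enclosed in quotes on the INFILE statement. As written, the INPUT statement expects the values to be separated by blanks; a comma or tab delimited file also needs a delimiter option on the INFILE statement.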

DATA nations;

INFILE 'A:\sas\nations.txt';

    INPUT country $ pop83 pcturban birthrte life_exp gnp82 religion $ urban $ 
	growth life_fem tfr literacy inflatn gdp $ area $;

RUN;
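
The DATA step above relies on the default delimiter, a blank. For a comma or tab delimited file, the INFILE statement usually needs an extra option. The sketch below assumes two hypothetical files, nations.csv and nations_tab.txt, in the same folder; the file names and the data set names are illustrations, not files supplied with this tutorial.

```sas
/* Comma-delimited file: DSD treats commas as delimiters and
   two consecutive commas as a missing value */
DATA nations_csv;
   INFILE 'A:\sas\nations.csv' DSD;
   INPUT country $ pop83 pcturban birthrte life_exp gnp82 religion $ urban $
         growth life_fem tfr literacy inflatn gdp $ area $;
RUN;

/* Tab-delimited file: '09'x is the hexadecimal code for a tab */
DATA nations_tab;
   INFILE 'A:\sas\nations_tab.txt' DLM='09'x;
   INPUT country $ pop83 pcturban birthrte life_exp gnp82 religion $ urban $
         growth life_fem tfr literacy inflatn gdp $ area $;
RUN;
```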

Saving a SAS System File

Often it is useful to save a SAS system file for later use. We do this by first creating a folder or directory in which the file is to be placed. For example, we might create the folder c:\myfiles. The SAS system refers to these folders as libraries. It is possible to create several libraries, each containing numerous files. This is a convenient way of organizing files that have common themes.

Once the directory is created, it is referenced through a LIBNAME statement.


LIBNAME storage 'c:\myfiles';
DATA storage.nations;

INFILE 'A:\sas\nations.txt';

    INPUT country $ pop83 pcturban birthrte life_exp gnp82 religion $ urban $ 
	growth life_fem tfr literacy inflatn gdp $ area $;

RUN;

Here, the LIBNAME statement assigns the directory or library location the name 'storage'. This is particularly handy when we are dealing with a complex subdirectory structure; it saves us the effort of writing out the location in full. Later in the program, files can be associated with the library by connecting the libname with the file name, placing a period between the two. When a file name is associated with a libname, SAS assumes that the file is to be a permanent one and saves it in that library. Without a libname, files are treated as temporary entities and deleted once we exit the program. By putting 'storage.nations' on the DATA statement, the data will be saved as a system file in the directory c:\myfiles.
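
One way to confirm that the file was saved is to list the contents of the library. This is a sketch that assumes the folder c:\myfiles exists and that the DATA step above has already been run.

```sas
LIBNAME storage 'c:\myfiles';

/* List the data sets stored in the library without
   printing the details of each one */
PROC CONTENTS DATA=storage._ALL_ NODS;
RUN;
```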


Reading a SAS System File

Once we have created and saved a SAS system file in a folder, we can access it by using the LIBNAME structure. Again, assume that we have created the folder c:\myfiles and stored the SAS data set in it. The system file can then be accessed by using the SET statement within the DATA step.

LIBNAME storage 'c:\myfiles';
DATA newnats;
	SET storage.nations;
RUN;
PROC PRINT DATA=newnats (OBS=10);
RUN;

Here, the SET statement identifies the system file we want to read, while the name 'newnats' on the DATA statement creates a temporary data set called newnats. The PROC PRINT step prints the first 10 observations of newnats.
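
Copying the file into a temporary data set is not strictly required. As a sketch under the same assumptions (the library c:\myfiles holding the nations system file), a procedure can name the permanent file directly with its two-level name.

```sas
LIBNAME storage 'c:\myfiles';

/* Print the first 10 observations straight from the permanent file */
PROC PRINT DATA=storage.nations (OBS=10);
RUN;
```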

Problems to look out for...


Revised: August 25, 2000