Tutorial 4: SAS Programs
Displaying the Data



Introduction
Histograms with Continuous Data
Histograms with Categorical Data
Scatterplots
Boxplots


Introduction

Displaying the data visually is a central component of statistical analysis. Visual or graphical displays help us to determine whether we committed any errors when we input the data, whether there are outliers, and what the overall pattern of the data looks like. Graphical displays are excellent for summarizing large amounts of data and for performing exploratory analysis, which helps to generate future hypotheses.

There are several ways of generating graphical images in SAS. Some procedures that we use to generate substantive results (e.g., PROC UNIVARIATE) also allow us to visualize the data. There are also a number of dedicated procedures designed for graphical presentation only, among them PROC CHART, PROC PLOT and PROC BOXPLOT. PROC CHART and PROC PLOT produce low-resolution, character-based graphics. They have analogues, however, that produce high-resolution or 'publishable quality' graphics: PROC GCHART and PROC GPLOT.

Histograms with Continuous Data

Occasionally, it is useful to create a histogram for a variable that is continuous (interval or ratio scale). This can be achieved quickly and easily by using PROC UNIVARIATE. In Tutorial 2 we introduced the nations data. Assume that you have converted these data into a SAS system file and saved it on a diskette. It is possible to produce a frequency histogram of the variable PCTURBAN (percent of the population living in urban areas) as follows.

LIBNAME place 'A:\';
PROC UNIVARIATE data=place.nations;
	var pcturban;
	histogram ;
RUN;
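The HISTOGRAM statement also accepts options after a slash. As a minimal sketch, the code below overlays a fitted normal curve and places the bar midpoints at intervals of ten. The midpoint range of 0 to 100 is an assumption based on PCTURBAN being a percentage, and the NOPRINT option simply suppresses the tables of statistics so that only the graph is produced.

LIBNAME place 'A:\';
PROC UNIVARIATE data=place.nations noprint;
	var pcturban;
	/* NORMAL overlays a fitted normal density; MIDPOINTS= sets the bar centres */
	/* (the 0 to 100 range is assumed, since PCTURBAN is a percentage) */
	histogram pcturban / normal midpoints=0 to 100 by 10;
RUN;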

Histograms with Categorical Data

When the variable of interest is categorical, a bar chart of the category frequencies takes the place of the histogram, and PROC CHART produces one. Again referring to the nations data, the following code charts the variable RELIGION. Note that PROC CHART has no VAR statement; the chart variable is named in a VBAR (vertical bar) or HBAR (horizontal bar) statement.

LIBNAME place 'A:\';
PROC CHART data=place.nations;
	vbar religion;
RUN;

Notice that the bars in the chart generated by this procedure are drawn with asterisks. Run this procedure again, but this time replace CHART with GCHART. PROC GCHART produces a high-quality graphic that appears inside a graphics box. This box can be copied and pasted into an MS Word or WordPerfect document.
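A sketch of the high-resolution version is shown below. PROC GCHART draws bar charts with a VBAR (or HBAR) statement, and it keeps running after RUN so that further charts can be requested, which is why a QUIT statement is added here. The commented line rests on an assumption about the data: if RELIGION happens to be stored as numeric codes, the DISCRETE option asks for one bar per code rather than bars at computed midpoints.

LIBNAME place 'A:\';
PROC GCHART data=place.nations;
	vbar religion;
	/* if RELIGION holds numeric codes, use instead: vbar religion / discrete; */
RUN;
QUIT;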

Scatterplots

Two or more continuous variables can be graphed against one another by using PROC PLOT. The nations data contain the variables PCTURBAN and LITERACY. An interesting graphic would show the relationship between the literacy rate and the percent of the population living in urban areas. Here we can invoke PROC PLOT to produce a quick low-resolution graphic and PROC GPLOT for a high-resolution rendition. In the PLOT statement, the variable before the asterisk goes on the vertical axis and the variable after it on the horizontal axis. Try the following code.

LIBNAME place 'A:\';
PROC PLOT data=place.nations;
	plot literacy * pcturban;
RUN;

Simply replace PROC PLOT with PROC GPLOT to produce the high-resolution equivalent. As with PROC GCHART, the resulting graphics box can be copied and pasted into an MS Word or WordPerfect document.
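For reference, the high-resolution version might look like the sketch below. The TITLE and SYMBOL1 statements are additions for illustration and are not required: SYMBOL1 asks for a filled dot at each point with no connecting line, and the title text is our own wording. As with other SAS/GRAPH procedures, QUIT ends the procedure after RUN.

LIBNAME place 'A:\';
TITLE 'Literacy Rate by Percent Urban';	/* illustrative title text */
SYMBOL1 value=dot interpol=none;	/* filled dots, points not joined */
PROC GPLOT data=place.nations;
	plot literacy * pcturban;
RUN;
QUIT;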

Boxplots

An excellent way of depicting the distribution of a continuous variable is through the use of a boxplot. PROC BOXPLOT produces high-quality graphics which can be copied and inserted in a report or research paper. BOXPLOT can be used to produce side-by-side graphics of a variable broken down by one or more categories. For example, in the nations data, we might wish to compare the distribution of the percent of the population living in urban areas across dominant religious groups. Before doing so, however, it is necessary to sort the data by the grouping variable, because PROC BOXPLOT expects the observations for each group to be adjacent in the data set. Hence, we run both the PROC SORT and the PROC BOXPLOT procedures.

LIBNAME place 'A:\';
PROC SORT data=place.nations out=nsort;
	by religion;
RUN;
PROC BOXPLOT data=nsort;
	plot pcturban * religion /boxstyle=schematic;
RUN;
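It can be helpful to check the boxplots against the numbers they are drawn from. As a sketch, PROC MEANS can print the five-number summary of PCTURBAN for each religious group. It reads the sorted data set nsort created above, although the CLASS statement does not itself require sorted data.

PROC MEANS data=nsort min q1 median q3 max;
	class religion;	/* one row of statistics per religious group */
	var pcturban;
RUN;

Note that in a schematic boxplot the whiskers stop at the most extreme points lying within 1.5 interquartile ranges of the box, so the minimum or maximum printed here may appear on the plot as a separate outlier symbol rather than as a whisker end.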
Further information on the use of any of these procedures can be obtained by accessing the help screens. This is a particularly useful way of determining what options are available and how to invoke them.

Revised: March 19, 2001