Introduction

Histograms with Continuous Data

Histograms with Categorical Data

Scatterplots

Boxplots

**Introduction**

Displaying the data visually is a central component of statistical analysis. Visual or graphical displays help us determine whether we made any errors when we input the data, whether there are outliers, and what the overall pattern of the data looks like. Graphical displays are excellent for summarizing large amounts of data and for performing exploratory analysis, which helps generate hypotheses for future work.

There are several ways of generating graphical images in SAS. Some procedures that we use to generate substantive results (e.g., PROC UNIVARIATE) also allow us to visualize the data. There are also a number of dedicated procedures designed solely for graphical presentation. Among these are PROC CHART, PROC PLOT and PROC BOXPLOT. PROC CHART and PROC PLOT produce low-resolution, or crude, graphics. They do, however, have analogues that produce high-resolution or 'publishable quality' graphics: PROC GCHART and PROC GPLOT.

**Histograms with Continuous Data**

Occasionally, it is useful to create a histogram for a variable that is continuous (interval or ratio scale). This can be done quickly and easily with PROC UNIVARIATE. In Tutorial 2 we introduced the nations data. Assume that you have converted these data into a SAS system file and saved it on a diskette. A frequency histogram of the variable PCTURBAN (percent of the population living in urban areas) can then be produced as follows:

```sas
LIBNAME place 'A:\';               /* the library 'place' points to the diskette in drive A: */
PROC UNIVARIATE data=place.nations;
  var pcturban;                    /* variable to summarize */
  histogram;                       /* histogram of each variable listed in VAR */
RUN;
```
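The HISTOGRAM statement chooses the bar boundaries automatically. If you want to control them, options can be given after a slash. The sketch below is illustrative only: it assumes PCTURBAN is measured on a 0 to 100 scale and uses the MIDPOINTS= option to request ten bars, each 10 percentage points wide.

```sas
LIBNAME place 'A:\';                 /* library pointing to the diskette */
PROC UNIVARIATE data=place.nations;
  var pcturban;
  /* Assumes PCTURBAN lies between 0 and 100: midpoints 5, 15, ..., 95 */
  /* give bars covering 0-10, 10-20, ..., 90-100.                       */
  histogram pcturban / midpoints=5 to 95 by 10;
RUN;
```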

**Histograms with Categorical Data**

When one's variable of interest is categorical, it is necessary to use PROC CHART. Again, referring to the nations data, a chart or frequency histogram can be produced with the following code:

```sas
LIBNAME place 'A:\';
PROC CHART data=place.nations;
  vbar religion / discrete;   /* vertical bar chart, one bar per value of RELIGION */
RUN;
```

Notice that the bars in the histogram generated by this procedure are drawn with asterisks. Run this procedure again, but this time replace CHART with GCHART. PROC GCHART produces a high-quality graphic that appears inside a graphics box. This box can be copied and pasted into an MS Word or WordPerfect document.
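For reference, a sketch of the high-resolution version follows. It assumes SAS/GRAPH (which provides PROC GCHART) is available and that RELIGION is the variable of interest. As with PROC CHART, the bar chart is requested with a VBAR statement rather than a VAR statement.

```sas
LIBNAME place 'A:\';
PROC GCHART data=place.nations;
  vbar religion / discrete;   /* one vertical bar per value of RELIGION */
RUN;
QUIT;                         /* GCHART stays active after RUN; QUIT ends it */
```

Using HBAR instead of VBAR draws the bars horizontally.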

**Scatterplots**

Two or more continuous variables can be graphed by using PROC PLOT. The nations data contain the variables PCTURBAN and LITERACY. An interesting graphic would show the relationship between the literacy rate and the percent of the population living in urban areas. Here we can invoke PROC PLOT to produce a quick low-resolution graphic and PROC GPLOT for a high-resolution rendition. Try the following code:

```sas
LIBNAME place 'A:\';
PROC PLOT data=place.nations;
  plot literacy * pcturban;   /* vertical-axis variable * horizontal-axis variable */
RUN;
```

Simply replace PROC PLOT with PROC GPLOT to produce the high-resolution equivalent. Again, the resulting graphics box can be copied and pasted into an MS Word or WordPerfect document.
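The high-resolution version is sketched below, assuming SAS/GRAPH is available. The SYMBOL statement is an optional addition that is not part of the original example; here it simply asks for each nation to be drawn as a dot.

```sas
LIBNAME place 'A:\';
symbol1 value=dot;              /* optional: mark each observation with a dot */
PROC GPLOT data=place.nations;
  plot literacy * pcturban;     /* vertical-axis variable * horizontal-axis variable */
RUN;
QUIT;                           /* GPLOT stays active after RUN; QUIT ends it */
```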

**Boxplots**

An excellent way of depicting the distribution of a continuous variable is the boxplot. PROC BOXPLOT produces high-quality graphics that can be copied and inserted into a report or research paper. PROC BOXPLOT can also produce side-by-side boxplots of a variable broken down by one or more categories. For example, in the nations data, we might wish to compare the distribution of the percent of a nation that is urban across dominant religious groups. Before doing so, however, the data must be sorted by the grouping variable. Hence, we run PROC SORT followed by PROC BOXPLOT:

```sas
LIBNAME place 'A:\';
PROC SORT data=place.nations out=nsort;  /* sorted copy in the WORK library */
  by religion;                           /* BOXPLOT needs data sorted by the group variable */
RUN;

PROC BOXPLOT data=nsort;
  /* one box per religion; SCHEMATIC draws whiskers to the most extreme */
  /* points within 1.5 IQR of the box and marks points beyond them      */
  plot pcturban * religion / boxstyle=schematic;
RUN;
```

Further information on any of these procedures can be obtained from the help screens. This is a particularly useful way to find out what options are available and how to invoke them.