*Copyright @ www.statschoice.com;
/********************************************************************************
Create some input datasets
********************************************************************************/
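/********************************************************************************
- The examples in this file read a dataset named class01
- A minimal sketch for creating it, assuming class01 is a plain copy of sashelp.class (variables Name, Sex, Age, Height, Weight)
- The source dataset is an assumption; the original input step is not shown in this file
********************************************************************************/
data class01;
    set sashelp.class; /* assumed source of class01 */
run;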
/********************************************************************************
Obtaining the descriptive statistics for analysis variables into a dataset
********************************************************************************/
/********************************************************************************
Q: How do we specify the name of the input dataset to be used by proc means?
A: Specify the input dataset with the data= option on the proc means statement.

Q: What happens if we do not provide the data= option?
A: The most recently created dataset in the current SAS session is used as input for proc means.

Q: How do we specify the variables for which descriptive statistics are requested?
A: List the analysis variables on the VAR statement.

Q: What happens if we do not provide the VAR statement?
A: SAS produces descriptive statistics for all numeric variables present in the input dataset.

Q: How do we specify the name of the output dataset to store the descriptive statistics?
A: Specify the output dataset with the out= option on the output statement.

Q: What happens if we do not provide the output statement?
A: No output dataset is created, but the descriptive statistics are still displayed in the output window.
   SAS displays the statistics in the output window whether or not an output statement is present,
   unless the noprint option is specified on the proc means statement.
********************************************************************************/
/********************************************************************************
Create a dataset named stats01 to store the default descriptive statistics of all numeric variables of class01 dataset
********************************************************************************/
/********************************************************************************
- Specify the name of the input dataset with the data= option on the proc means statement
- Specify the name of the output dataset with the out= option on the output statement
- As we are interested in statistics for all numeric variables, we do not need a VAR statement
- In the output dataset, three automatic variables (_TYPE_, _FREQ_ and _STAT_) are created along with all the numeric variables of the input dataset
- The _FREQ_ variable holds the number of observations that contributed to the statistics
- As no specific statistics are requested, SAS produces 5 rows in the output dataset, one row for each default statistic. The default statistics are: N, Mean, Standard Deviation, Minimum and Maximum
- The _STAT_ variable holds the name of one default statistic on each row
- Each numeric variable of the input dataset becomes a variable in the output dataset and holds, on each row, the value of the statistic named in _STAT_
********************************************************************************/
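/********************************************************************************
- A minimal sketch of the step described above, assuming class01 already exists in the WORK library
- No VAR statement and no statistic keywords are given, so the five default statistics are written for every numeric variable
********************************************************************************/
proc means data=class01;
    output out=stats01; /* one row per default statistic, identified by _STAT_ */
run;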
/********************************************************************************
Create a dataset named stats02 to store the default statistics of height variable of class01 dataset
********************************************************************************/
/********************************************************************************
- As we are interested in statistics for only one analysis variable, we name it on the VAR statement
- The output dataset now contains only the height variable along with the three automatic variables _TYPE_, _FREQ_ and _STAT_
********************************************************************************/
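/********************************************************************************
- A minimal sketch of the step described above, assuming class01 contains a numeric variable named height
********************************************************************************/
proc means data=class01;
    var height;         /* restrict the analysis to one variable */
    output out=stats02; /* default statistics, one row each */
run;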
/********************************************************************************
Create a dataset named stats03 to store the default statistics of height and weight variables of class01 dataset
********************************************************************************/
/********************************************************************************
- As we are interested in statistics for only two analysis variables, we name them on the VAR statement. Analysis variable names are separated by spaces on the VAR statement
- The output dataset contains the height and weight variables along with the three automatic variables _TYPE_, _FREQ_ and _STAT_
********************************************************************************/
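/********************************************************************************
- A minimal sketch of the step described above, assuming class01 contains numeric variables named height and weight
********************************************************************************/
proc means data=class01;
    var height weight;  /* analysis variables separated by a space */
    output out=stats03; /* default statistics, one row each */
run;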
/********************************************************************************
Create a dataset named stats04 to store N, Mean and Median of height variable of class01 dataset
********************************************************************************/
/********************************************************************************
- The specific statistics to be stored in the output dataset are requested as options on the output statement
- Keywords for each statistic can be found in the SAS documentation of proc means
- Each statistic keyword has to be followed by an equals sign and the name of the variable in which the statistic is to be stored
- In this example, the result of statistic 'n' is stored in a variable named height_n. Similarly, the statistic 'mean' is stored in the height_mean variable
- As we are requesting an explicit list of statistics, the automatic variable _STAT_ is no longer created
- The output dataset now has only _TYPE_, _FREQ_ and the requested variables (height_n, height_mean and height_median), on a single row
********************************************************************************/
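/********************************************************************************
- A minimal sketch of the step described above, assuming class01 contains a numeric variable named height
- The output variable names follow the ones mentioned in the notes (height_n, height_mean, height_median)
********************************************************************************/
proc means data=class01;
    var height;
    output out=stats04
        n      = height_n
        mean   = height_mean
        median = height_median;
run;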
/********************************************************************************
Create a dataset named stats05 to store N, Mean and Standard Deviation of height and weight variables of class01 dataset
********************************************************************************/
/********************************************************************************
- As we are interested in statistics for only two variables, we need to name them on the VAR statement
- As we need separate variables to store the statistics of height and weight, we list two variable names
after the equals sign following each statistic keyword
- Names are matched to analysis variables by position. In this example, height_n stores the 'n' statistic for the first variable listed on the VAR statement,
which is height, and weight_n stores the 'n' statistic for the second variable listed on the VAR statement, which is weight
********************************************************************************/
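/********************************************************************************
- A minimal sketch of the step described above, assuming class01 contains numeric variables named height and weight
- height_n and weight_n come from the notes; the remaining output names are illustrative assumptions that follow the same pattern
********************************************************************************/
proc means data=class01;
    var height weight;
    output out=stats05
        n    = height_n    weight_n     /* 1st name -> height, 2nd name -> weight */
        mean = height_mean weight_mean
        std  = height_std  weight_std;
run;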
/********************************************************************************
Create a dataset named stats06 to store N, Mean and Median of age, height and weight variables of class01 dataset
********************************************************************************/
/********************************************************************************
- What value will the x101 variable hold in the code below? x101 stores the median of the third variable (weight) listed on the VAR statement
- What value will the a101 variable hold?
********************************************************************************/
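/********************************************************************************
- A hypothetical reconstruction of the exercise above; the original step is not shown in this file
- Only x101 (median of weight) is fixed by the notes. The position of a101 and every other output name are assumptions made for illustration
- Under this assumed layout, a101 is the first name after n=, so it would hold N for age, the first variable on the VAR statement
********************************************************************************/
proc means data=class01;
    var age height weight;
    output out=stats06
        n      = a101 a102 a103   /* assumed names: N of age, height, weight */
        mean   = m101 m102 m103   /* assumed names: mean of age, height, weight */
        median = x099 x100 x101;  /* x101 is third, so it holds the median of weight */
run;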
/********************************************************************************
Create a dataset named stats07 to store N, Mean and Median of age, height and weight variables of class01 dataset using autoname option
********************************************************************************/
/********************************************************************************
- So far, the names of the variables that hold the statistics of the analysis variables have been listed manually after the statistic keywords
- If the list of analysis variables is long, we need to name quite a few output variables. Instead, we can use the autoname option on the output statement to instruct
SAS to name the output statistic variables automatically, in the form variable_statistic (for example Height_Mean)
********************************************************************************/
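/********************************************************************************
- A minimal sketch of the step described above, assuming class01 contains numeric variables named age, height and weight
- Each keyword is followed by an equals sign with no names, and autoname is given after a slash on the output statement
********************************************************************************/
proc means data=class01;
    var age height weight;
    output out=stats07 n= mean= median= / autoname; /* e.g. Age_N, Height_Mean, Weight_Median */
run;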
/********************************************************************************
Create a dataset named stats08 to store N, Mean and Std of height variable within each Age group of class01 dataset
********************************************************************************/
/********************************************************************************
- As we are interested in the statistics of a variable within groups of another variable, we need to use BY-group processing
- We enable BY-group processing with a BY statement in proc means
- As we need to group the statistics by the age variable, we list age on the BY statement
- For BY-group processing to work in proc means, the input dataset must already be sorted by the same BY variables (for example with proc sort)
- The output dataset now contains one observation for each unique value of the AGE variable, with the statistic variables holding the statistics
for that group of subjects
********************************************************************************/
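/********************************************************************************
- A minimal sketch of the step described above, assuming class01 contains numeric variables named age and height
- The sorted copy class01_by_age and the output variable names are illustrative assumptions
********************************************************************************/
proc sort data=class01 out=class01_by_age;
    by age; /* BY-group processing needs the data sorted by the BY variable */
run;

proc means data=class01_by_age;
    by age;
    var height;
    output out=stats08
        n    = height_n
        mean = height_mean
        std  = height_std;
run;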
/********************************************************************************
Create a dataset named stats09 to store N, Mean and Std of height variable within each Age and Sex group of class01 dataset
********************************************************************************/
/********************************************************************************
- As we need to group observations by Age and Sex, we list both Age and Sex on the BY statement, and sort the input dataset by the same two variables first
********************************************************************************/
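/********************************************************************************
- A minimal sketch of the step described above, assuming class01 contains the variables age, sex and height
- The sorted copy class01_by_age_sex and the output variable names are illustrative assumptions
********************************************************************************/
proc sort data=class01 out=class01_by_age_sex;
    by age sex; /* same variables, same order as the BY statement below */
run;

proc means data=class01_by_age_sex;
    by age sex;
    var height;
    output out=stats09
        n    = height_n
        mean = height_mean
        std  = height_std;
run;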
/********************************************************************************
Create a dataset named stats10 to store N, Mean and Std of height variable within each Age and Sex group using CLASS statement
********************************************************************************/
/********************************************************************************
- To obtain the statistics of a variable within groups defined by other variables, we can use the class statement instead of BY-group
processing
- To use the class statement, we list the grouping variables on it, separated by spaces
- Pre-sorting the dataset with proc sort is not needed when the groups are defined with the class statement
- By default, when the class statement is used:
    - one row is created in the output dataset for the overall summary of the dataset, irrespective of the class groups
    - a separate row is created for each unique level of each individual variable listed on the class statement
    - a separate row is created for each unique combination of values of all variables listed on the class statement (in most cases, this is the
      subset of rows we are interested in)
- The automatic variable _TYPE_ identifies which of these kinds of row an observation belongs to
********************************************************************************/
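/********************************************************************************
- A minimal sketch of the step described above, assuming class01 contains the variables age, sex and height
- No proc sort is needed; the output variable names are illustrative assumptions
- With two class variables, _TYPE_ is 0 for the overall row, 1 for sex levels, 2 for age levels and 3 for age and sex combinations
********************************************************************************/
proc means data=class01;
    class age sex;
    var height;
    output out=stats10
        n    = height_n
        mean = height_mean
        std  = height_std;
run;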
/********************************************************************************
- When using the class statement, we can suppress the overall summary and the summaries for individual class variables
with the nway option on the proc means statement. Only the rows for the combinations of all class variables are then kept.
********************************************************************************/
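/********************************************************************************
- A minimal sketch of the nway option described above, assuming class01 contains the variables age, sex and height
- The output dataset name stats10_nway and the output variable names are illustrative assumptions
********************************************************************************/
proc means data=class01 nway;
    class age sex;
    var height;
    output out=stats10_nway /* only age and sex combination rows are written */
        n    = height_n
        mean = height_mean
        std  = height_std;
run;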