When information is collected, it is generally collected as distinct components. Each distinct component is called a variable.
For example, if we are collecting demographic information about the students in a class, we collect it as distinct components such as
Name, Age, Sex, Height and Weight of each student. Each of these components (Name, Age, Sex, Height, Weight) is called a variable.
The collection of components about one entity (a student, person, item, etc.) is called an observation.
The space for each character on a line of raw data is considered a column. For example, 'google' occupies 6 columns (one column for each letter).
Similarly, 'google.in' occupies 9 columns and 'www.google.in' occupies 13 columns.
When a specified number of columns is reserved for each variable on a row, we say that the data is arranged in columns.
For example, on each row the first 16 columns can be reserved for name, column 17 for sex (captured as M or F), and columns 19, 20 and 21 for
age. Even when a name is shorter than 16 characters, the sex value must still be placed in the reserved 17th column when the data is arranged in columns.
A SAS dataset has a tabular structure, with rows representing observations and columns representing variables
The type of information that can be stored in a SAS dataset column (a variable) is restricted to either character or numeric.
Based on the type of information that can be stored in a variable, SAS variables are of two types: NUMERIC variables and CHARACTER variables.
Variables which store numbers, typically those on which we intend to perform arithmetic operations, are called NUMERIC variables.
The variables which store text, such as letters, special characters and even numbers (where numeric values are treated as character strings), are called CHARACTER variables
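The difference between the two types can be seen in a minimal sketch like the one below. The dataset name types_demo and all variable names are made up for illustration. The same digits are stored once as text and once as a number.

```sas
/* Hypothetical names: types_demo, zip_char, zip_num, age, age_next */
data types_demo;
    zip_char = '07030';    /* quoted value: CHARACTER variable, leading zero is kept */
    zip_num  = 07030;      /* unquoted value: NUMERIC variable, stored as the number 7030 */
    age      = 14;         /* NUMERIC variable */
    age_next = age + 1;    /* arithmetic is meaningful on numeric variables */
run;

proc print data=types_demo;
run;
```

A value such as a postal code or an ID is usually a good candidate for a character variable, since no arithmetic is intended on it.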
The DATA step of SAS is used to create a SAS dataset from raw data.
To read raw data, we need to tell SAS the following:
The name of the dataset to be created, using the DATA statement
The filename, file extension and location of the raw data file, using the INFILE statement
The variable names, the type of each variable (character or numeric) and the column positions of the data components present on each row of the raw data file, using the INPUT statement
A RUN statement, which marks the end of the DATA step and tells SAS to compile and execute it
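The four pieces listed above fit together roughly as in the following skeleton. It is only a sketch: the dataset name mydata, the file path and the variable names VarA and VarB are hypothetical placeholders, not taken from the original material.

```sas
/* Skeleton of a DATA step that reads a raw data file arranged in columns. */
/* mydata, the file path, VarA and VarB are hypothetical placeholders.     */
data mydata;                          /* DATA: name of the dataset to create        */
    infile 'C:\data\rawfile.txt';     /* INFILE: location, name and extension       */
    input VarA $ 1-10                 /* INPUT: character variable in columns 1-10  */
          VarB   12-15;               /*        numeric variable in columns 12-15   */
run;                                  /* RUN: end of the step; SAS executes it      */
```

The path has to point to a file that actually exists on the machine where SAS runs; otherwise the log reports an error and the dataset is created with no observations.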
Name: Columns 1 to 11 are reserved to capture the name of the student, which is read into a character variable 'Name'.
It is assumed here that no student's name exceeds 11 characters. If the maximum number of characters expected for a student's name
is 35, then we have to reserve 35 columns for it and the next piece of information should start at or after column 36.
Sex: Column 12 is reserved for capturing the gender of the student, which is read into a character variable 'Sex'.
Age: Columns 13 to 15 are reserved for collecting the age of the student, which is read into a numeric variable 'Age'.
Height: Columns 17 to 20 are reserved for capturing the height of the student, which is read into a numeric variable 'Height'.
Weight: Columns 23 to 27 are reserved for capturing the weight of the student, which is read into a numeric variable 'Weight'.
Notice that when a piece of information is to be read into a character variable, we have to indicate this to SAS by placing a
dollar sign ($) after the name of the variable in the INPUT statement. If the dollar sign is omitted, SAS assumes that we are requesting it
to create a variable of numeric type.
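The column layout described above can be written as an INPUT statement along the following lines. This is a sketch under stated assumptions. The three records are made-up sample rows. The data is supplied inline with DATALINES, instead of an INFILE statement, so the example is self-contained. The dataset name students01 is the one this text later uses with PROC CONTENTS.

```sas
/* Column input: Name 1-11, Sex 12, Age 13-15, Height 17-20, Weight 23-27. */
/* The records below are made-up sample data, typed so that each value     */
/* falls inside its reserved columns.                                      */
data students01;
    input Name   $ 1-11     /* character: note the dollar sign */
          Sex    $ 12       /* character, one column           */
          Age      13-15    /* numeric                         */
          Height   17-20    /* numeric                         */
          Weight   23-27;   /* numeric                         */
    datalines;
Alfred     M 14 69.0  112.5
Alice      F 13 56.5   84.0
Barbara    F 13 65.3   98.0
;
run;

proc print data=students01;
run;
```

With column input, a numeric value may sit anywhere inside its reserved columns, which is why a two-digit age can be placed in columns 14 and 15 of the three columns reserved for Age.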
Each student's record becomes an observation (row) in the SAS dataset.
Each information component becomes a variable in the SAS dataset. So, we will have 5 variables in the dataset.
Once a SAS dataset is available, we are usually interested in knowing details about it, such as:
Name of the dataset
Number of observations in the dataset
Number of variables in the dataset
Date of creation and modification of dataset
The size of a dataset (the space required to store it on the drive) depends on the number of variables and observations.
It also depends on the space (length) required for each variable.
The space required for a variable depends on its type (numeric or character): a numeric variable takes 8 bytes by default, while the space for a character variable depends on the number of characters it is defined to hold.
SAS provides a procedure called CONTENTS to check the details (attributes) of a SAS dataset.
The procedure displays important information about the dataset on which it is invoked
(in this example, we are invoking PROC CONTENTS on the students01 dataset).
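A minimal invocation looks like the sketch below. It assumes that a dataset named students01 already exists in the WORK library of the current SAS session.

```sas
/* Assumes students01 already exists in the WORK library. */
proc contents data=students01;
run;
```

The output lists, among other things, the dataset name, the number of observations and variables, the creation and modification dates, and the name, type and length of each variable.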
The screenshots below highlight some important information from the PROC CONTENTS output.