- SAS Tutorial
- SAS - Home
- SAS - Overview
- SAS - Environment
- SAS - User Interface
- SAS - Program Structure
- SAS - Basic Syntax
- SAS - Data Sets
- SAS - Variables
- SAS - Strings
- SAS - Arrays
- SAS - Numeric Formats
- SAS - Operators
- SAS - Loops
- SAS - Decision Making
- SAS - Functions
- SAS - Input Methods
- SAS - Macros
- SAS - Dates & Times

- SAS Data Set Operations
- SAS - Read Raw Data
- SAS - Write Data Sets
- SAS - Concatenate Data Sets
- SAS - Merging Data Sets
- SAS - Subsetting Data Sets
- SAS - Sort Data Sets
- SAS - Format Data Sets
- SAS - SQL
- SAS - Output Delivery System
- SAS - Simulations

- SAS Data Representation
- SAS - Histograms
- SAS - Bar Charts
- SAS - Pie Charts
- SAS - Scatterplots
- SAS - Boxplots

- SAS Basic Statistical Procedure
- SAS - Arithmetic Mean
- SAS - Standard Deviation
- SAS - Frequency Distributions
- SAS - Cross Tabulations
- SAS - T Tests
- SAS - Correlation Analysis
- SAS - Linear Regression
- SAS - Bland-Altman Analysis
- SAS - Chi-Square
- SAS - Fishers Exact Tests
- SAS - Repeated Measure Analysis
- SAS - One-Way Anova
- SAS - Hypothesis Testing

- SAS Useful Resources
- SAS - Quick Guide
- SAS - Useful Resources
- SAS - Questions and Answers
- SAS - Discussion

- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who

# SAS - Subsetting Data Sets

Subsetting a SAS data set means extracting a part of the data set by selecting a fewer number of variables or fewer number of observations or both. While subsetting of variables is done by using **KEEP** and **DROP** statement, the sub setting of observations is done using **DELETE** statement.

Also the resulting data from the subsetting operation is held in a new data set which can be used for further analysis. Sub setting is mainly used for the purpose of analyzing a part of the data set without using those variables or observations which may not be relevant to the analysis.

## Subsetting Variables

In this method we extract only few variables from the entire data set.

### Syntax

The basic syntax for sub setting variables in SAS is −

KEEP var1 var2 ... ; DROP var1 var2 ... ;

Following is the description of the parameters used −

**var1 and var2**are the variable names from the data set which needs to be kept or dropped.

### Example

Consider the below SAS data set containing the employee details of an organization. If we are interested only in getting the Name and Department values from the data set, then we can use the below code.

DATA Employee; INPUT empid ename $ salary DEPT $ ; DATALINES; 1 Rick 623.3 IT 2 Dan 515.2 OPS 3 Mike 611.5 IT 4 Ryan 729.1 HR 5 Gary 843.25 FIN 6 Tusar 578.6 IT 7 Pranab 632.8 OPS 8 Rasmi 722.5 FIN ; RUN; DATA OnlyDept; SET Employee; KEEP ename DEPT; RUN; PROC PRINT DATA = OnlyDept; RUN;

When the above code is executed, we get the following output.

The same result can be obtained by dropping the variables that are not required. The below code illustrates this.

DATA Employee; INPUT empid ename $ salary DEPT $ ; DATALINES; 1 Rick 623.3 IT 2 Dan 515.2 OPS 3 Mike 611.5 IT 4 Ryan 729.1 HR 5 Gary 843.25 FIN 6 Tusar 578.6 IT 7 Pranab 632.8 OPS 8 Rasmi 722.5 FIN ; RUN; DATA OnlyDept; SET Employee; DROP empid salary; RUN; PROC PRINT DATA = OnlyDept; RUN;

## Subsetting Observations

In this method we extract only few observations from the entire data set.

### Syntax

We use PROC FREQ which keeps track of the observations selected for the new data set.

The syntax for sub setting observations is −

IF Var Condition THEN DELETE ;

Following is the description of the parameters used −

**Var**is the name of the variable based on whose value the observations will be deleted using the specified condition.

### Example

Consider the below SAS data set containing the employee details of an organization. If we are interested only in getting the data for employees with salary greater than 700,then we use the below code.

DATA Employee; INPUT empid name $ salary DEPT $ ; DATALINES; 1 Rick 623.3 IT 2 Dan 515.2 OPS 3 Mike 611.5 IT 4 Ryan 729.1 HR 5 Gary 843.25 FIN 6 Tusar 578.6 IT 7 Pranab 632.8 OPS 8 Rasmi 722.5 FIN ; RUN; DATA OnlyDept; SET Employee; IF salary < 700 THEN DELETE; RUN; PROC PRINT DATA = OnlyDept; RUN;

When the above code is executed, we get the following output.