Numeric length in SAS®: a case study in decision making

Author: Gorrell, Paul

Source: Pharmaceutical Programming, Volume 1, Number 1, August 2008, pp. 56-64(9)

Publisher: Maney Publishing

Abstract:

This paper discusses various factors to be considered when making decisions regarding the properties of stored data. These factors extend beyond properties of the data files to include the context within which the data are used. Decisions about the stored length of numeric variables in SAS® data sets are used as an example of the decision-making process. Although the LENGTH statement in SAS is simple to use, what's going on behind the scenes is more complex, especially with respect to numeric variables. Understanding what happens when you specify the length of a numeric variable is essential for making informed decisions. SAS stores the value of all numeric variables in floating-point representation. This paper begins with a brief, practical overview of floating-point representation and how it relates to programming questions regarding length, precision, and efficient use of disk space. We will discuss situations where numeric length should not be reduced, even if the range of integer values in the data set would appear to permit it. We'll argue, in particular, that decisions about numeric length require that you consider the larger context of data use. This is important because (i) length is a variable attribute that can be passed on to other data sets via merges or concatenation, and (ii) basing attribute decisions on the properties of a single data set ignores how that data set will be used and changed by subsequent updates. Specific examples will be used to illustrate this. For saving disk space, we'll show the advantages of the COMPRESS=BINARY option in SAS. We will also show that saving disk space is the only reason to reduce numeric length, because SAS uses the full 8-byte representation of numbers in all DATA steps and PROCs, regardless of the variable's specified length.
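
The link between stored length and precision can be illustrated outside SAS. The following is a minimal Python sketch, not code from the paper. It assumes an IEEE 754 platform (such as Windows or UNIX) and assumes that a reduced length keeps only the leading bytes of the 8-byte value, with the dropped bytes read back as zeros. The function names `store_with_length` and `largest_exact_integer` are hypothetical, chosen for illustration.

```python
import struct


def store_with_length(value: float, length: int) -> float:
    """Simulate storing a double in `length` bytes and reading it back.

    Assumption: storage keeps the first `length` bytes of the big-endian
    IEEE 754 double (truncation, not rounding) and pads with zero bytes
    when the value is expanded back to 8 bytes.
    """
    if not 3 <= length <= 8:
        raise ValueError("length must be between 3 and 8 bytes")
    raw = struct.pack(">d", value)
    truncated = raw[:length] + b"\x00" * (8 - length)
    return struct.unpack(">d", truncated)[0]


def largest_exact_integer(length: int) -> int:
    """Largest n such that every integer from 0 to n survives storage.

    One sign bit and 11 exponent bits leave 8*length - 12 stored mantissa
    bits; the implicit leading bit adds one more bit of precision.
    """
    return 2 ** (8 * length - 11)


# Table of safe integer ranges per stored length under these assumptions.
for n in range(3, 9):
    print(n, largest_exact_integer(n))

print(store_with_length(8192.0, 3))  # fits in 3 bytes
print(store_with_length(8193.0, 3))  # low-order bit is lost
print(store_with_length(0.1, 4))     # fractions lose precision too
```

Under these assumptions, a length of 3 holds every integer up to 8,192 exactly, while 8,193 is silently read back as 8,192. This is why a length chosen from the current range of values can become unsafe once the data set is updated or merged with data that has larger values.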

Keywords: DATA STORAGE; COMPRESSION; SAS; NUMERIC LENGTH

Document Type: Research Article

DOI: http://dx.doi.org/10.1179/175709208X334678

Publication date: 2008-08-01
