Understanding SPSS variable types and formats allows you to get things done fast and reliably. Getting a grip on types and formats is not hard if you ignore the very confusing information under variable view. This tutorial takes away the confusion and puts you back in control.
We encourage you to follow along with this tutorial by downloading and opening computer_parts.
SPSS Variable Types
SPSS has two variable types: string and numeric. Numeric variables may contain only numbers. String variables may contain letters, numbers and other characters. The distinction between numeric and string variables is important because the variable type dictates what you can or cannot do with a variable.
- You can do calculations with numeric variables but not with string variables.
- You can use string functions such as taking substrings or concatenating with string variables but not with numeric variables.
There are no other variable types in SPSS than string and numeric. However, numeric variables have several different formats that are often confused with variable types. We’ll see in a minute how SPSS variable view puts many users on the wrong track here.
The only way to change a string variable to numeric or vice versa is ALTER TYPE. However, there are several ways to make a numeric copy of a string variable or vice versa. We’ll get to those in a minute.
So What’s Better: String or Numeric?
The simplest rule of thumb is that only nominal variables with many categories should be string variables in SPSS. Examples are names of people, email addresses, passport numbers and so on. Although such variables can be useful, we don’t usually analyze them.
We do sometimes analyze nominal variables with few categories, such as nationality, blood group or profession. If these are string variables, they may or may not cause trouble. For example, whether the independent variable for an ANOVA may be a string variable depends on the exact command you use.
You may get away with leaving such variables as strings. However, copying them into numeric variables makes sure you’ll avoid all trouble. A decent way to do so is AUTORECODE. For converting metric string variables (holding just numbers) into numeric variables, see SPSS Convert String to Numeric Variable.
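To get a feel for what AUTORECODE does, here is a rough Python model of its default behavior: each distinct string gets a consecutive integer code, assigned in ascending order, and the original strings become the value labels. The function name `autorecode` and the nationality data are made up for illustration, and Python’s sort order may differ from the collation SPSS uses, so treat this as a sketch rather than how SPSS actually implements it.

```python
def autorecode(values):
    """Hypothetical helper: map each distinct string to 1, 2, ... in ascending order.

    Returns the numeric codes plus a code -> original string mapping,
    which plays the role of the value labels AUTORECODE creates.
    """
    labels = {code: value for code, value in enumerate(sorted(set(values)), start=1)}
    codes_by_value = {value: code for code, value in labels.items()}
    return [codes_by_value[v] for v in values], labels


nationality = ["Dutch", "German", "Dutch", "Belgian"]  # hypothetical string variable
codes, labels = autorecode(nationality)
print(codes)   # [2, 3, 2, 1]
print(labels)  # {1: 'Belgian', 2: 'Dutch', 3: 'German'}
```

The numeric codes can be analyzed like any other numeric variable, while the labels keep the output readable.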
Determining SPSS Variable Types
Before doing anything whatsoever with a variable, we always want to know whether it’s a string or numeric variable. Don’t rely on a visual inspection of your data view for determining variable types; it may be hard or even impossible to see the difference between the two variable types. Instead, inspect your variable view and use the following rule:
- if “Type” is “String”, you’re dealing with a string variable;
- if “Type” is anything other than “String”, you’re dealing with a numeric variable.
SPSS suggests that “Date” and “Dollar” are variable types as well. However, these are formats, not types. The way they are shown here among the actual variable types (string and numeric) is one of SPSS’ most confusing features.
SPSS Variable Formats – Introduction
Let’s now have a look at the data in data view as shown in the screenshot below. We’ll briefly describe the kinds of variables we see.
- The first variable holds words;
- The second variable holds numbers with two decimal places;
- The third variable holds dates;
- The fourth variable holds times;
- The fifth variable holds dates and times;
- The sixth variable holds percentages;
- The seventh variable holds numbers of dollars with two decimal places.
Regarding these data, we concluded earlier that the first variable is a string variable and the second through seventh variables are numeric. Remember that numeric variables can contain only numbers. However, SPSS can display these numbers in very different ways. So numeric values have two components:
- First, there are the actual values as SPSS stores them internally. These consist of nothing but numbers.
- Second, the actual values can be displayed and treated in a myriad of different ways. This is how numeric variables may seem to contain month names or dollar signs.
These different ways of displaying and treating the actual values are referred to as variable formats.
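To make the split between stored values and display formats concrete, here is a small Python model in which one stored number is rendered under three format families. The `display` helper and its rules are simplified assumptions: it ignores widths, and it only mimics the idea that DOLLAR adds a dollar sign and thousands separators while PCT appends a percent sign without multiplying by 100. It is not how SPSS itself renders values.

```python
def display(value: float, family: str, decimals: int) -> str:
    """Simplified, hypothetical model of a few SPSS numeric display formats.

    Only a string is produced; the stored value itself is never modified.
    """
    if family == "F":       # plain number
        return f"{value:.{decimals}f}"
    if family == "DOLLAR":  # dollar sign plus thousands separators
        return f"${value:,.{decimals}f}"
    if family == "PCT":     # percent sign appended; the value is NOT multiplied by 100
        return f"{value:.{decimals}f}%"
    raise ValueError(f"unsupported format family: {family}")


stored = 20.37  # what SPSS keeps internally
for family in ("F", "DOLLAR", "PCT"):
    print(family, display(stored, family, 2))
# F 20.37
# DOLLAR $20.37
# PCT 20.37%
print(stored)  # the stored number is still 20.37
```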
Determining SPSS Variable Formats
As we saw earlier, “Type” under variable view shows a confusing mixture of variable types and formats. Unfortunately, it doesn’t reveal the actual formats. However, the following line of syntax does the trick:
display dictionary.
SPSS distinguishes print and write formats but we won’t bother with this distinction here. SPSS variable formats consist of two parts. First, one or more letters indicate the format family. Most of these are self-explanatory, except for the first two:
- A (“Alphanumeric”) is the usual format for string variables;
- F (“Fortran”) indicates a standard numeric variable.
Second, formats end with a number indicating the width: the number of characters to be shown. If a period is present, the number after the period indicates the number of decimal places to be displayed.
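The anatomy of a format can be spelled out with a small, hypothetical Python parser that splits a specification such as F4.3 into its family, width and decimals. It only handles the letters, width, optional period and decimals pattern described above, and it does not check whether a width is valid for a given family; returning 0 decimals when no period is present is our own convention.

```python
import re

# Letters (format family), then a width, then optionally a period and the number of decimals.
FORMAT_PATTERN = re.compile(r"([A-Za-z]+)(\d+)(?:\.(\d+))?")


def parse_format(spec: str) -> tuple:
    """Split a format such as 'F4.3' into (family, width, decimals); decimals default to 0."""
    match = FORMAT_PATTERN.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not a format specification: {spec!r}")
    family, width, decimals = match.groups()
    return family.upper(), int(width), (int(decimals) if decimals else 0)


for spec in ("A10", "F4.3", "DOLLAR7.2", "DATE11", "pct4.1"):
    print(spec, parse_format(spec))
# A10 ('A', 10, 0)
# F4.3 ('F', 4, 3)
# DOLLAR7.2 ('DOLLAR', 7, 2)
# DATE11 ('DATE', 11, 0)
# pct4.1 ('PCT', 4, 1)
```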
SPSS Common Variable Formats
The table below disambiguates variable types, format families and formats for the data we’ve been studying so far.
| Variable type | Format family | Format (example) | Shown as |
The figure below now summarizes some common variable types and formats we’ll encounter in SPSS.
Setting Variable Formats in SPSS
You can set variable formats for numeric variables with the FORMATS command. For example,
formats weight (f4.3).
shows weight with 3 decimal places. Doing so also affects the output you create: most tables will show an extra decimal place for weight as well. If you’d like to see this for yourself, run the syntax below and compare the two resulting tables.
*Show 2 decimal places for weight and run descriptives.
formats weight (f3.2).
descriptives weight.
*Show 3 decimal places for weight and run descriptives.
formats weight (f4.3).
descriptives weight.
*Note that the second output table shows more decimal places.
Keep in mind that changing variable formats does not change your data in any way. The actual values are still the exact same numbers. They are merely displayed differently.
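A quick Python illustration of why this matters: statistics are computed from the stored values, so the mean is the same whichever number of decimals we choose to display. The weights below are invented, and the hypothetical `show` helper only mimics the decimals part of an F format; it ignores widths.

```python
from statistics import mean

# Hypothetical stored weights; SPSS keeps these at full precision regardless of format.
weights = [0.5374, 1.2049, 2.8861]


def show(values, decimals):
    """Mimic the decimals part of an F format (width is ignored)."""
    return [f"{v:.{decimals}f}" for v in values]


print(show(weights, 2))  # ['0.54', '1.20', '2.89']
print(show(weights, 3))  # ['0.537', '1.205', '2.886']

# The statistic uses the stored values, not the displayed strings...
print(round(mean(weights), 4))  # 1.5428
# ...which differs from averaging the rounded display values.
print(round(mean(float(s) for s in show(weights, 2)), 4))  # 1.5433
```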
Variable Types and Formats – Why Bother?
Basically, “what you see is not what you get” in data view. For example, we see $20.37 but the actual value is just 20.37. So we can identify products costing $20 or more by running the syntax below:
compute expensive = (price >= 20).
We don’t include the dollar sign in our syntax. Although SPSS shows a dollar sign in data view, the actual values are just numbers and these are what the syntax acts upon.
Or let’s say we’d like to add 30 days to our date variable. We could do so by running
compute new_date = datesum(date, 30, 'days').
The resulting values are huge numbers such as 13644236937.72. This is because SPSS stores dates as the number of seconds since October 14, 1582. These are the correct numbers but they’ll display as readable dates only after running something like
formats new_date (date11).
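The date convention can be mimicked in Python, assuming SPSS date values are seconds since midnight, October 14, 1582. One hypothetical helper turns a stored value into a readable date (roughly what a date format does for display) and another adds whole days in the spirit of datesum(..., 'days'). Month and year arithmetic, which DATESUM handles with extra rules, is deliberately left out.

```python
from datetime import datetime, timedelta

# Assumption: SPSS date values count seconds since midnight, October 14, 1582.
SPSS_EPOCH = datetime(1582, 10, 14)
SECONDS_PER_DAY = 86_400


def to_datetime(spss_seconds: float) -> datetime:
    """Hypothetical helper: interpret a stored SPSS date value as a readable date."""
    return SPSS_EPOCH + timedelta(seconds=spss_seconds)


def add_days(spss_seconds: float, days: int) -> float:
    """Hypothetical analogue of datesum(date, days, 'days'): the result is still just a number."""
    return spss_seconds + days * SECONDS_PER_DAY


stored = 13644236937.72
shifted = add_days(stored, 30)
print(shifted)                                    # a plain number, like new_date before FORMATS
print(to_datetime(shifted).strftime("%d-%b-%Y"))  # readable, like after FORMATS ... (DATE11)
```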
Another reason to bother about variable formats is setting decimal places for output tables. From SPSS version 22 onwards, OUTPUT MODIFY does the trick as shown below.
descriptives weight.
*Set 2 decimal places (format = f3.2) for mean and SD (columns 4 and 5).
output modify
/select tables
/if commands = ['descriptives']
/tablecells select = [position(4) position(5)] selectdimension = columns format = 'f3.2'.
In a similar vein, CTABLES allows choosing different formats for different statistics in your output.
ctables
/table commission [count 'N' f3 minimum pct3 maximum pct3 mean 'Mean' pct4.1 stddev 'SD' pct4.1].
This tutorial was somewhat theoretical but it has a lot of practical consequences. I hope you found it helpful.