Association between Metric and Categorical Variable


This tutorial shows how to create clear tables and charts for investigating the association between a metric and a categorical variable. If the statistical assumptions are met, these may be followed up by an ANOVA.
As an example, we’ll use freelancers.sav and see whether (and how) sector_2010 is related to income_2010.

SPSS Metric versus Categorical Variable in Data View

Data Inspection and FILTER

We’ll first inspect FREQUENCIES for sector_2010 by running the syntax below (step 1). At first, the table is rather messy due to system missing values (screenshot beneath syntax).

Next, we’ll FILTER out cases with system missings as this results in a cleaner table. Doing so also keeps N constant over analyses.


*1. Inspect frequency distribution for sector_2010.

frequencies sector_2010.

*2. Create filter variable for excluding cases with system missings on sector_2010.

compute filt_1 = not(sysmis(sector_2010)).

*3. Apply label to filter variable.

variable labels filt_1 "Filter that excludes cases having sysmis on sector_2010".

*4. Switch filter variable on.

filter by filt_1.

*5. Rerun frequencies table for cleaner result.

frequencies sector_2010.

SPSS FREQUENCIES Table with System Missings
First FREQUENCIES Table from Running Syntax Above
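The filter stays active for everything that follows, which is what we want in this tutorial. As a minimal sketch, assuming all analyses are finished and the complete data file is needed again, the filter could be switched off like so:

```spss
*Run this only after all analyses are done: it includes all cases again.

filter off.

*The filter variable filt_1 remains in the data and can be reactivated with FILTER BY.
```

Switching the filter off doesn’t delete any cases or variables; it only changes which cases are used by subsequent commands.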

Histogram and Custom Currency Format

We’ll inspect the histogram for income_2010 to see whether it holds any unusual values. It doesn’t, but the chart is somewhat cluttered by the large numbers representing income.

One way to deal with this is dividing all income values by 1,000 as shown in the syntax below (step 2). In order to make clear we now have income in thousands of dollars, we’ll suffix all values with “K” (short for “Kilo” or 1,000) by defining a custom currency format in step 3.

We’ll specify this as the format for income_2010 with FORMATS (step 4) after which we obtain more readable charts.

SPSS Histogram and Custom Currency Format Syntax

*1. Run basic histogram for income_2010. No unusual values in chart.

frequencies income_2010
/format notable
/histogram.

*2. Divide all incomes by 1,000.

compute income_2010 = income_2010 / 1000.

*3. Set custom currency A (= cca) format with "K" suffix.

set cca '-,$,K,K'.

*4. Use newly defined cca format for income_2010.

formats income_2010 (cca12).

*5. More space on x-axis of chart.

frequencies income_2010
/format notable
/histogram.

SPSS Custom Currency Format in Histogram


Now that we made sure there’s nothing awkward regarding our variables of interest, let’s see whether they are associated. We’ll first do so by running a basic MEANS table as shown in the syntax below (step 3).

Optionally: because we don’t like the default title (“Report”), we’ll make it invisible with an SPSS table template (.stt file). Instead, we’ll display a variable label as if it were the title. We’ll therefore change the variable label for income_2010 (step 2). Preceding this by TEMPORARY circumvents the need to reverse the change afterwards. The result is shown in the next screenshot.

SPSS MEANS Table Syntax

*1. Indicate that next command must be reversed later on.

temporary.

*2. Set variable label to desired title for means table.

variable label income_2010 "Mean income by sector over 2010.".

*3. Run means table. Also indicates end of temporary and reverses previous command.

means income_2010 by sector_2010/cells count mean stddev.

SPSS Means Table Styled
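The table template mentioned above can also be set with syntax. The lines below are a sketch only: the file name and path are hypothetical, so point them to your own .stt file. SET TLOOK must be run before the MEANS command for the styling to apply.

```spss
*Assumption: the path below is hypothetical and must be replaced by your own .stt file.

set tlook 'C:\templates\no_title.stt'.

*Run the TEMPORARY, VARIABLE LABEL and MEANS commands here.

*Afterwards, revert to the default table look.

set tlook none.
```

Tables created while the template is set keep their styling; SET TLOOK NONE only affects tables created afterwards.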

Conclusion: income_2010 and sector_2010 seem strongly associated. Respondents in IT and health care had mean incomes of roughly $55,000. All other sectors showed mean incomes around $35,000.

SPSS Bar Chart for Independent Means

Next, we’ll visualize the previous table as a bar chart. The screenshots below walk you through.

SPSS Create Bar Chart Basic
SPSS Create Bar Chart Independent Means

SPSS Bar Chart for Independent Means Syntax

Following the steps outlined by the screenshots results in the syntax below. Run it in order to generate the chart shown in the screenshot.

*Create bar chart for independent means.

GRAPH /BAR(SIMPLE)=MEAN(income_2010) BY sector_2010
/title "Mean income by sector over 2010 (N = 37)".

SPSS Bar Chart Independent Means Unstyled

SPSS Bar Chart Styling

The previous bar chart clearly visualizes the pattern we saw in the means table. However, it doesn’t look very pretty. We’ll improve its appearance by building and setting an SPSS chart template (.sgt file). Our final result is shown below.

SPSS Bar Chart Independent Means Styled
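One way to apply a chart template is the TEMPLATE subcommand of GRAPH. The sketch below assumes a template was already saved; the file name and path are hypothetical and must be replaced by your own .sgt file.

```spss
*Bar chart for independent means, styled with a chart template.
*Assumption: the template path below is hypothetical.

GRAPH /BAR(SIMPLE)=MEAN(income_2010) BY sector_2010
/title "Mean income by sector over 2010 (N = 37)"
/TEMPLATE='C:\templates\bar_chart.sgt'.
```

Specifying the template per chart keeps the styling limited to this single command, which is convenient when different charts need different templates.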

SPSS Create Split Histogram

Optionally, we can look a bit further into the differences between the mean incomes for different sectors by running a split histogram: we’ll create a chart with histograms for income_2010 for different sectors separately. The screenshot below walks you through.

SPSS Create Split Histogram Colvar

SPSS Split Histogram Styling

The syntax generated by following the previous screenshot is shown below. As we did previously, we’ll use a variable label as if it were the chart title. We’ll apply some styling with a chart template. Our final result is shown in the following screenshot.

SPSS Split Histogram Syntax

*1. Indicate that following command must be reversed later on.

temporary.

*2. Set variable label to chart title.

variable labels sector_2010 "Income by sector over 2010 (N = 37)".

*3. Run chart, end temporary, reverse previous command.

GRAPH /HISTOGRAM=income_2010
/PANEL COLVAR=sector_2010.

SPSS Split Histogram Colvar

Conclusion: the histogram for Health Care looks similar to the others, except that its incomes are some $20,000 higher than in the other sectors apart from IT. For IT, we see one peak around $80,000 and another around $30,000. So the average IT income was high, but its standard deviation is large as well.
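As mentioned at the start, these tables and charts may be followed up by an ANOVA if its assumptions are met. A minimal sketch of what that could look like is shown below. It is not part of the analysis above, and whether the assumptions hold for these data (note the two peaks for IT) still needs to be judged.

```spss
*One-way ANOVA for income_2010 by sector_2010.
*DESCRIPTIVES adds means and standard deviations per sector.
*HOMOGENEITY adds the Levene test for equal variances.

oneway income_2010 by sector_2010
/statistics descriptives homogeneity.
```

With the filter from the first section still switched on, this command uses the same cases as the means table and the charts.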