MATCH FILES is an SPSS command mostly used for merging data holding similar cases but different variables. For different cases but similar variables, use ADD FILES. MATCH FILES is also the way to go for a table lookup similar to VLOOKUP in Excel.
Merging two datasets by id, which is a unique case identifier.
SPSS Match Files – Basic Use
- The most common scenario for MATCH FILES is two data files or datasets holding different variables on similar cases.
- Each case has a unique id (identifier) in each data source. This id tells SPSS which case from one data source corresponds to which case from the other. Corresponding cases become a single case in the merged data.
- The syntax below demonstrates a very basic MATCH FILES command. If you’re not comfortable working with multiple datasets, have a look at SPSS Datasets Tutorial 1 – Basics.
SPSS Match Files Syntax Example 1
*1. Create test data 1.
data list free/id test_1.
begin data
3 8 4 5 6 6
end data.
dataset name test_1.

*2. Create test data 2.
data list free/id test_2.
begin data
1 4 3 9 4 8
end data.
dataset name test_2.

*3. Match test_1 and test_2.
match files file = test_1 / file = test_2
/by id.
execute.

*4. Close all but merged dataset.
dataset close test_1.
dataset close test_2.
SPSS Match Files – Table
- A second common scenario is having a file with respondents and their zip codes. Note that there are probably duplicate zip codes in the respondents file.
- If we also have a table with the city (or region) indicated by each zip code, we can merge these into the respondent data. In this case we can use MATCH FILES with one FILE (with duplicates) and one TABLE (without duplicate zip codes).
- The syntax below demonstrates how to do this. Note that * refers to the active dataset.
SPSS Match Files Syntax Example 2
*1. Mini table holding zip codes and cities.
data list free/zip_code (f3.0) city(a20).
begin data
123 'Amsterdam' 456 'Haarlem' 789 "'s Hertogenbosch"
end data.
dataset name cities.

*2. Mini data holding respondents and their zip codes.
data list free /id zip_code.
begin data
1 123 2 123 3 123 4 456 5 456 6 456 7 789 8 789 9 789
end data.

*3. Add cities to active dataset using zip_code.
match files file = * / table = cities
/by zip_code.
execute.

*4. Close all but merged data.
dataset close cities.
SPSS Match Files – One Data Source
- MATCH FILES can also be used with a single data source. This is often done for reordering and/or dropping variables.
- One option here is using the KEEP subcommand. It basically means “drop all variables except …”.
- Alternatively, the DROP subcommand means “keep all variables except …”.
- The TO and ALL keywords are convenient here. However, in this case ALL means “all variables that haven’t been addressed yet” rather than simply all variables.
SPSS Match Files Syntax Example 3
*1. Create test data.
data list free / v1 to v3 v5 v6 v7 v8 v4.
begin data
0 0 0 0 0 0 0 0
end data.

*2. Reorder variables. Note the TO and ALL keywords here.
match files file = * / keep v1 to v3 v4 all.
execute.
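
For comparison, the same kind of single-source call can use DROP instead of KEEP. The syntax below is a minimal sketch rather than one of the original examples. It reuses the variable names from Example 3 and assumes that v5 through v8 are the variables to remove.

```spss
* Sketch: test data reusing the variable names from Example 3.
data list free / v1 to v3 v5 v6 v7 v8 v4.
begin data
0 0 0 0 0 0 0 0
end data.

* Drop v5 through v8 and keep all other variables in their current order.
match files file = * / drop v5 to v8.
execute.
```

Note that TO follows the variable order in the data, so v5 to v8 covers v5, v6, v7 and v8 but not v4, which comes last in this file.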
SPSS Match Files – Rules
- Instead of merging two data sources, you may specify up to 50 data sources in one MATCH FILES command.
- More than one variable may be used to uniquely identify cases. We’ll hereafter refer to these as the BY variables since they’re used on the BY subcommand. A common example is respondents having a household identifier plus a member_id indicating the nth member of each household. Both variables will probably have many duplicates but their combination should uniquely identify each respondent.
- All data must be sorted on the BY variable(s) in ascending order. In case of doubt, run SORT CASES before proceeding.
- The order of the merged variables is the order in which they’re encountered. This implies that the order in which data sources are specified matters for the end result. For a demo, run the first syntax example once with file = test_1 / file = test_2 and then again with file = test_2 / file = test_1.
- Make sure there are no duplicate variable names across data sources. If there are, the values from the data source that is specified first overwrite those from sources specified later. Annoyingly, SPSS does not throw a warning if this happens.
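
The sorting and naming rules above can be sketched as follows. This is an illustration under assumptions, not one of the original examples: the dataset names (persons, scores) and variable names (household_id, score, score_2) are hypothetical. It sorts both sources on two BY variables and uses the RENAME subcommand so that a duplicate variable name doesn't cause values to be silently overwritten.

```spss
* Hypothetical data 1: household members with a score (deliberately unsorted).
data list free / household_id member_id score.
begin data
2 1 7 1 2 5 1 1 6
end data.
dataset name persons.

* Hypothetical data 2: the same cases and the same variable name score.
data list free / household_id member_id score.
begin data
1 1 3 2 1 4 1 2 8
end data.
dataset name scores.

* Sort both sources in ascending order on both BY variables.
dataset activate persons.
sort cases by household_id member_id.
dataset activate scores.
sort cases by household_id member_id.

* Rename score in the second source so both versions end up in the merged data.
match files file = persons / file = scores
/rename = (score = score_2)
/by household_id member_id.
execute.
```

RENAME applies to the data source specified right before it, which is why it follows file = scores here.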