Hilmar Buchta

Data Analysis | R | Tableau

The recent PIAAC study of OECD covers skills of adults in 24 countries. You can find the study results together with background information here: http://www.oecd.org/site/piaac/. The study is very interesting as it is the first study targeting adult skills for these countries. Fortunately OECD has made the case data of the study available for download under the ‘public use files’ section. Here is the direct link to the files: http://vs-web-fs-1.oecd.org/piaac/puf-data

Data is available in SPSS and SAS format. I went for the SPSS format, downloaded all files (all countries except Australia) and processed the files using R. Here is the script I created for this purpose:

# OECD PIAAC Study 2013
# Interesting links:
http://www.oecd.org/site/piaac/ 
http://www.oecd.org/site/piaac/publicdataandanalysis.htm
#  SPSS files: http://vs-web-fs-1.oecd.org/piaac/puf-data

library(foreign)
library(stringr)
library(reshape2)

setwd("C:\\temp\\OECD_PIAAC_Study_2013_SPSS")
file_list <- list.files(path=".")
for (file in file_list){ 
  if (!exists("piaac")){
    piaac <- read.spss(file, to.data.frame=T)
  } 
  if (exists("piaac")){
    temp_dataset <-read.spss(file, to.data.frame=T)
    piaac<-rbind(piaac, temp_dataset)
    rm(temp_dataset)
  }
}

# only include this set of variables:
collist_meta<-c(‚CNTRYID_E‘,’GENDER_R‘,’AGEG5LFS‘,’B_Q01a‘,’YEARLYINCPR‘)

# also include all score columns (10 per domain)
collist_lit<-names(piaac)[str_sub(names(piaac),start=1, end=5)=="PVLIT"]
collist_num<-names(piaac)[str_sub(names(piaac),start=1, end=5)=="PVNUM"]
collist_psl<-names(piaac)[str_sub(names(piaac),start=1, end=5)=="PVPSL"]
collist<-c(collist_meta,collist_lit,collist_num,collist_psl)
piaac2<-piaac[collist]

# combine score values to single values
p<-data.frame(piaac2[collist_meta], "Score_Lit"=rowMeans(piaac2[collist_lit]), "Score_Num"=rowMeans(piaac2[collist_num]), "Score_Psl"=rowMeans(piaac2[collist_psl]))

# remove incomplete cases
p<-melt(p, id=collist_meta)
p<-p[!is.na(p$value) & !is.na(p$CNTRYID_E),]

# write output to csv-file
write.table(p, col.names=T, row.names=F, file="c:\\temp\\piaac.csv", sep=",", dec=".", na="", qmethod="double")

Or course, we could include additional variables, but for the moment, let’s focus on the variables from the script above. A full list with all variables and details is available using the “International Codebook” link in the “public use files” section.

The values can then be imported to Tableau desktop for further analysis.

image

The official charts are looking more dramatic (click here for an example) as the value axis does not start with zero. This is a common method of making results look more impressive or – as Prof. Hichert says – it adds a certain “lie-factor” to the visualization where it’s up to the creator of the chart to scale this factor to whatever effect is desired.

A result that somewhat surprises me, was that the skills decrease with higher age, as shown below for the reading skill (Lit) in Germany:

image

Here is the map visualization (from red over grey to green, again for the reading skills) for the European region as Tableau automatically maps country names to geographic regions.

image

Another interesting chart shows the relationship between income (shown as quantile here) and test score (again for the reading skills, showing some selected countries):

image

And finally the comparison between highest education (according to the International Standard Classification of Education) and score (again shown for reading skills) for Germany:

image

I’m not going into interpretations here. There is a lot of material available on the OECD website. The study is also a great source for demographic data. So if you like to discover more maybe the R script above can help you getting started.