PISA (the Programme for International Student Assessment) is a survey of students' skills and knowledge as they approach the end of compulsory education. It is not a conventional school test: rather than examining how well students have learned the school curriculum, it looks at how well prepared they are for life beyond school.
Over 510,000 students from 62 countries took part in the PISA 2012 assessment of reading, mathematics and science, representing about 28 million 15-year-olds globally. Of those countries, 44 also took part in an assessment of creative problem solving and 18 in an assessment of financial literacy. Score data was not skewed by participation level; instead, scores were scaled using a calibration data set of 31,500 students.
Countries that scored below 450 on previous PISA tests were offered an easier version of the assessment, designed to better measure what students at the lower end of the spectrum understood. Two countries accepted the offer, and although their version was easier, their results remain comparable with those of every other PISA participant.
This notebook cleans and prepares the data for visualization in Tableau.
The first step was to make the variables easier to work with. The CSV uses abbreviated variable codes as column names, so I replaced them with the descriptive names from the accompanying data dictionary (pisadict2012.csv) to make them human-readable. I suspect more work will be needed here.
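The US was split into four entries: Florida, Connecticut, Massachusetts and the rest of the USA. Most other countries appear as a single entry, so I consolidated the four US entries into one country value (despite the fact that Massachusetts is a cut above the rest). The same mapping also folds Shanghai, Hong Kong and Macao into China and Perm into Russia.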
import pandas as pd
pisa = pd.read_csv('data/pisa2012.csv')
pisa.head()
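# The data dictionary holds a descriptive label (column 'x') for each variable code
pisa_dict = pd.read_csv('data/pisadict2012.csv')

# The first CSV column has no dictionary entry, so give it a blank label,
# then append the descriptive labels in dictionary order
new_pisa_names = ['']
for row in pisa_dict['x']:
    new_pisa_names.append(row)

# Positional assignment: the number and order of labels must match the columns
pisa.columns = new_pisa_names
pisa.head()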
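Assigning pisa.columns this way relies on the dictionary rows being in exactly the same order as the CSV columns. A minimal sketch of a name-based alternative follows, assuming the dictionary's first column holds the short variable codes (such as CNT) and its x column holds the labels. The toy frames and the code column name are hypothetical stand-ins for the real files.

```python
import pandas as pd

# Hypothetical stand-ins for pisa2012.csv and pisadict2012.csv
pisa_toy = pd.DataFrame({"CNT": ["Albania", "Korea"], "ST04Q01": ["Female", "Male"]})
dict_toy = pd.DataFrame({
    "code": ["CNT", "ST04Q01"],
    "x": ["Country code 3-character", "Gender"],
})

# Build a code -> label mapping and rename by name rather than by position,
# so a reordered or missing dictionary row cannot shift every label after it
mapping = dict(zip(dict_toy["code"], dict_toy["x"]))
pisa_named = pisa_toy.rename(columns=mapping)

print(pisa_named.columns.tolist())
```

Columns without a dictionary entry keep their original code, which makes gaps easy to spot instead of silently mislabelling the rest of the table.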
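Some entries in the 'Country code 3-character' column (which actually holds full country names) are non-standard or refer to sub-national regions, so they need to be mapped to standard country names.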
# Map non-standard names and sub-national regions to standard country names.
# The nested dict {column: {old: new}} restricts the replacement to that one column.
country_dict = {
    'Country code 3-character': {
        'China-Shanghai': 'China',
        'Chinese Taipei': 'Taiwan',
        'Connecticut (USA)': 'United States',
        'Florida (USA)': 'United States',
        'Hong Kong-China': 'China',
        'Korea': 'South Korea',
        'Macao-China': 'China',
        'Massachusetts (USA)': 'United States',
        'Perm(Russian Federation)': 'Russia',
        'Russian Federation': 'Russia',
        'United States of America': 'United States'
    }
}
pisa_updated = pisa.replace(to_replace=country_dict)

# Write the consolidated data set 20,000 rows at a time, without the row index
pisa_updated.to_csv('data/pisa_updated.csv', chunksize=20000, index=False)
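The nested dictionary passed to replace maps a column name to an {old: new} mapping, so only values in that column are changed and unlisted values are left alone. A small sketch on made-up rows (not the real data) shows the effect and a quick value_counts check that the consolidation worked:

```python
import pandas as pd

# Made-up rows mirroring a few of the raw country labels
toy = pd.DataFrame({
    "Country code 3-character": ["Florida (USA)", "Massachusetts (USA)",
                                 "United States of America", "Korea", "Albania"],
    "Gender": ["Female", "Male", "Female", "Male", "Female"],
})

# {column: {old value: new value}} touches only the named column
mapping = {"Country code 3-character": {
    "Florida (USA)": "United States",
    "Massachusetts (USA)": "United States",
    "United States of America": "United States",
    "Korea": "South Korea",
}}
toy_updated = toy.replace(mapping)

# "Albania" is not in the mapping, so it passes through unchanged
print(toy_updated["Country code 3-character"].value_counts().to_dict())
```

Running the same value_counts check on the real pisa_updated frame is a cheap way to confirm that no region-level labels remain.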
# Keep only the columns needed in Tableau, taken from the consolidated data
pisa_reduced = pisa_updated[[
    'Country code 3-character',
    'Gender',
    'Mother Current Job Status',
    'Possessions - Internet',
    'Time of computer use (mins)',
    'How many - cellular phones',
    'How many - televisions',
    'How many - computers',
    'How many - cars',
    'How many - rooms bath or shower',
    'How many books at home',
    'Out-of-School Study Time - Homework',
    'Min in <class period> - <test lang>',
    'Min in <class period> - <Maths>',
    'Min in <class period> - <Science>',
    'No of <class period> p/wk - <test lang>',
    'No of <class period> p/wk - <Maths>',
    'No of <class period> p/wk - <Science>',
    'No of ALL <class period> a week',
    'Class Size - No of Students in <Test Language> Class',
    'Teacher-Directed Instruction - Sets Clear Goals',
    'Teacher-Directed Instruction - Encourages Thinking and Reasoning',
    'Teacher-Directed Instruction - Checks Understanding',
    'Teacher-Directed Instruction - Summarizes Previous Lessons',
    'Teacher-Directed Instruction - Informs about Learning Goals',
    'Highest parental education in years',
    'Learning time (minutes per week) - <test language>',
    'Learning time (minutes per week)- <Mathematics>',
    'Learning time (minutes per week) - <Science>',
    'Plausible value 1 in mathematics',
    'Plausible value 2 in mathematics',
    'Plausible value 3 in mathematics',
    'Plausible value 4 in mathematics',
    'Plausible value 5 in mathematics',
    'Plausible value 1 in reading',
    'Plausible value 2 in reading',
    'Plausible value 3 in reading',
    'Plausible value 4 in reading',
    'Plausible value 5 in reading',
    'Plausible value 1 in science',
    'Plausible value 2 in science',
    'Plausible value 3 in science',
    'Plausible value 4 in science',
    'Plausible value 5 in science'
]]
pisa_reduced.to_csv('data/pisa_reduced.csv', index=False)
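The reduced file keeps all five plausible values per subject. If a single score per student is wanted for plotting in Tableau, one possible extra step (not part of this notebook; the "Mean score - <subject>" column names are hypothetical) is to average them row-wise. This is only a convenience for visualization: OECD guidance for published estimates is to compute each statistic on every plausible value separately, using the student weights, and then combine the results. A toy sketch:

```python
import pandas as pd

# Two hypothetical students with five plausible values per subject
subjects = ["mathematics", "reading", "science"]
data = {}
for subject in subjects:
    for i in range(1, 6):
        data[f"Plausible value {i} in {subject}"] = [400.0 + i, 500.0 + i]
scores = pd.DataFrame(data)

# Average the five plausible values into one column per subject
for subject in subjects:
    pv_cols = [f"Plausible value {i} in {subject}" for i in range(1, 6)]
    scores[f"Mean score - {subject}"] = scores[pv_cols].mean(axis=1)

print(scores[[f"Mean score - {s}" for s in subjects]])
```

The same loop would apply unchanged to pisa_reduced, since the plausible-value column names match the labels selected above.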