UNEB Results Analysis

In this notebook, we look at UNEB results for PLE, UCE and UACE and analyse them to get a better understanding of school performance. The results are listed by school and show performance in terms of how many students passed in each division.

This dataset is available from the Data dot UG website, which hosts a number of datasets about Uganda that can be used for analysis.

We can then dive into the analysis.

In [2]:
ls
ple-results-by-school-2010-2015.csv  uce-results-by-school-2011-2016.csv
uace-results-2011-2015.csv
In [3]:
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
from matplotlib import pyplot as plt
import matplotlib as mpl
import seaborn as sns
%matplotlib inline
import numpy as np

ple = pd.read_csv('ple-results-by-school-2010-2015.csv')
uce = pd.read_csv('uce-results-by-school-2011-2016.csv')
uace = pd.read_csv('uace-results-2011-2015.csv')
In [4]:
ple.head()
Out[4]:
YEAR DISTRICT SCHOOL TOTAL CANDIDATES TOTAL DIV 1 % DIV 1 TOTAL DIV 2 % DIV 2 TOTAL DIV 3 % DIV 3 ... MALE TOTAL DIV2 MALE % DIV2 MALE TOTAL DIV3 MALE % DIV3 MALE TOTAL DIV4 MALE % DIV4 MALE TOTAL U MALE % U MALE TOTAL X MALE % X
0 2010 KABAROLE EXCEL PRIMARY SCHOOL,RWIMI 23 23.0 100.0 0.0 0.0 0.0 0.0 ... NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0
1 2010 KALUNGU SACRED HEART P/S, KYAMUSANSALA 40 40.0 100.0 0.0 0.0 0.0 0.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 2010 KAMPALA KING FAHAD ISLAMIC PRI. SCHOOL 3 3.0 100.0 0.0 0.0 0.0 0.0 ... NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0
3 2010 KAMPALA MAKINDYE JUNIOR SCHOOL 1 1.0 100.0 0.0 0.0 0.0 0.0 ... NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0
4 2010 KAMPALA WATERFORD P/S NAJJANANKUMBI 19 19.0 100.0 0.0 0.0 0.0 0.0 ... NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0

5 rows × 42 columns

In [5]:
uce.head()
Out[5]:
YEAR DISTRICT SCHOOL TOTAL CANDIDATES TOTAL DIV 1 % DIV 1 TOTAL DIV 2 % DIV 2 TOTAL DIV 3 % DIV 3 ... MALE TOTAL DIV3 MALE % DIV3 MALE TOTAL DIV4 MALE % DIV4 MALE TOTAL DIV7 MALE % DIV7 MALE TOTAL DIV9 MALE % DIV9 MALE TOTAL X MALE % X
0 2011 WAKISO GAYAZA HIGH SCHOOL 176.0 175.0 99.4 1.0 0.6 NaN 0.0 ... NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0
1 2011 MUKONO NAMILYANGO COLLEGE 151.0 150.0 99.3 NaN 0.0 NaN 0.0 ... NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.7
2 2011 MUKONO MT.ST.MARY'S,NAMAGUNGA 153.0 151.0 98.7 2.0 1.3 NaN 0.0 ... NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0
3 2011 WAKISO UGANDA MARTYRS SS,NAMUGONGO 222.0 216.0 97.3 6.0 2.7 NaN 0.0 ... NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0
4 2011 BUSHENYI MAIN KITABI SEMINARY 73.0 71.0 97.3 2.0 2.7 NaN 0.0 ... 2.0 0.0 NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0

5 rows × 48 columns

In [6]:
uace.head()
Out[6]:
District_Name SCHOOL Gender 2011 Total %0-5 Points %6-10 Points %11-15 Points %16-20 Points %21-25 Points 2012 Total ... 2014 Total %0-5 Points.3 %6-10 Points.3 %11-15 Points.3 %16-20 Points.3 2015 Total %0-5 Points.4 %6-10 Points.4 %11-15 Points.4 %16-20 Points.4
0 AMUDAT POKOT SECONDARY SCHOOL FEMALE NaN 0.0 0.0 0.0 0.0 0.0 3.0 ... 2.0 50.0 0.0 50.0 0.0 1 0.0 0.0 100.0 0.0
1 AMUDAT NaN MALE NaN 0.0 0.0 0.0 0.0 0.0 8.0 ... 9.0 55.6 44.4 0.0 0.0 4 25.0 50.0 25.0 0.0
2 AMUDAT Total NaN NaN NaN 0.0 0.0 0.0 0.0 0.0 11.0 ... 11.0 54.5 36.4 9.1 0.0 20.0 40.0 40.0 0.0
3 PADER ARCHBP.FLYNN SECONDARY SCHOOL FEMALE NaN 0.0 0.0 0.0 0.0 0.0 NaN ... NaN 0.0 0.0 0.0 0.0 14 21.4 42.9 35.7 0.0
4 PADER NaN MALE NaN 0.0 0.0 0.0 0.0 0.0 NaN ... NaN 0.0 0.0 0.0 0.0 22 40.9 50.0 9.1 0.0

5 rows × 30 columns

To make the columns easier to work with, I will convert their names to lower case and strip any whitespace around them.

In [7]:
for dataset in [ple, uce, uace]:
    dataset.columns = dataset.columns.map(str.lower)
    dataset.columns = dataset.columns.map(str.strip)
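
The same normalisation can also be written with the vectorised `.str` accessor on the column index. A minimal sketch on a toy frame (the untidy column names below are made up for illustration and are not taken from the UNEB files):

```python
import pandas as pd

# Toy frame with untidy headers; these names are illustrative only.
toy = pd.DataFrame([[2010, 23]], columns=[' YEAR', 'TOTAL CANDIDATES '])

# Strip surrounding whitespace, then lower-case, in one chained expression.
toy.columns = toy.columns.str.strip().str.lower()

print(list(toy.columns))
```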
    
In [8]:
ple.columns
Out[8]:
Index([u'year', u'district', u'school', u'total candidates', u'total div 1',
       u'% div 1', u'total div 2', u'% div 2', u'total div 3', u'% div 3',
       u'total div 4', u'% div 4', u'total u', u'% u', u'total x', u'% x',
       u'female candidates', u'female total div1', u'female % div1',
       u'female total div2', u'female % div2', u'female total div3',
       u'female % div3', u'female total div4', u'female % div4',
       u'female total u', u'female % u', u'female total x', u'female % x',
       u'male candidates', u'male total div1', u'male % div1',
       u'male total div2', u'male % div2', u'male total div3', u'male % div3',
       u'male total div4', u'male % div4', u'male total u', u'male % u',
       u'male total x', u'male % x'],
      dtype='object')

I decided to create a score that would aggregate the percentages of the different divisions, attaching different weights to each of them. The score ranges from 0 to 100

In [9]:
def score(x):
    return x['% div 1'] + x['% div 2']*0.5 + x['% div 3']/float(3) +x['% div 4']*0.25 + x['% u']*0.1 + x['% x']*0
ple['score'] = ple.apply(score, axis=1)
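
To make the weighting concrete, here is a small worked sketch that uses the same weights as the score function above. The helper name `weighted_score` and the example percentages are hypothetical and exist only for illustration:

```python
# Weights mirror the score function above: Div 1, Div 2, Div 3, Div 4, U, X.
WEIGHTS = {'% div 1': 1.0, '% div 2': 0.5, '% div 3': 1.0 / 3,
           '% div 4': 0.25, '% u': 0.1, '% x': 0.0}

def weighted_score(percentages):
    """Sum each division's percentage multiplied by its weight."""
    return sum(percentages[col] * w for col, w in WEIGHTS.items())

# A hypothetical school (these percentages are invented for illustration).
example = {'% div 1': 40.0, '% div 2': 30.0, '% div 3': 15.0,
           '% div 4': 10.0, '% u': 4.0, '% x': 1.0}

# The two extremes: every candidate in Division 1, or every candidate in X.
all_div1 = dict.fromkeys(WEIGHTS, 0.0)
all_div1['% div 1'] = 100.0
all_x = dict.fromkeys(WEIGHTS, 0.0)
all_x['% x'] = 100.0

print(round(weighted_score(example), 2))  # 40 + 15 + 5 + 2.5 + 0.4 + 0 = 62.9
print(weighted_score(all_div1), weighted_score(all_x))
```

The two extreme cases show where the 0 to 100 range comes from: a school with every candidate in Division 1 scores 100, and one with every candidate in X scores 0.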

Using this score we can then obtain the top performing schools per year.

In [10]:
ple_by_year = ple.groupby('year')
In [11]:
ple_year_ranks={}
for year, group in ple_by_year:
    ple_year_ranks[year]= group[['school', 'score']].sort_values(by='score',ascending=False).drop_duplicates()
    
In [12]:
ple_year_ranks[2014].iloc[:20]
Out[12]:
school score
56579 PEARL JUNIOR SCHOOL 100.0
56639 IBANDA TOWN PRIMARY SCHOOL 100.0
56663 43 NANKANDULO PRIMARY SCHOOL 100.0
56672 MBALE TOWER PRIMARY SCHOOL 100.0
56671 VICTORY LEARNING PRI. SCHOOL 100.0
56670 BRIGHT GRAMMAR PRIMARY SCHOOL 100.0
56669 GATEWAY PRIMARY SCHOOL 100.0
56668 STANDARD INTERGRATED PIS 100.0
56667 KITOORO HILL VIEW PRI. SCHOOL 100.0
56666 HORMISDALLEN MIXED DAY AND BDG 100.0
56665 9 NANKANDULO PRIMARY SCHOOL 100.0
56664 55 NANKANDULO PRIMARY SCHOOL 100.0
56662 42 NANKANDULO PRIMARY SCHOOL 100.0
56661 36 NANKANDULO PRIMARY SCHOOL 100.0
56660 32 RUKUUBA PRIMARY SCHOOL 100.0
56659 28 RUKUUBA PRIMARY SCHOOL 100.0
56658 18 RUKUUBA PRIMARY SCHOOL 100.0
56657 17 RUKUUBA PRIMARY SCHOOL 100.0
56656 12 RUKUUBA PRIMARY SCHOOL 100.0
56654 IBANDA TOWN PRIMARY SCHOOL 100.0

We notice that some schools are repeated, but with different integer values prepended to their names, which makes the entries look unique. This is why drop_duplicates did not remove them in the previous statement. We write a function called trim_name that strips the integer prefix so that we can then drop the duplicates.

In [13]:
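def trim_name(x):
    # Drop a leading integer token (e.g. '43 NANKANDULO ...') from a school name.
    words = x.split()
    try:
        int(words[0])
    except (ValueError, IndexError):
        # No integer prefix (or an empty name): keep the name as it is.
        return x.strip()
    return ' '.join(words[1:])

for year in ple_year_ranks:
    ple_year_ranks[year]['school'] = ple_year_ranks[year]['school'].apply(trim_name)
    ple_year_ranks[year].drop_duplicates(inplace=True)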
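
As an alternative to a row-by-row function, the prefix could probably also be removed with a regular expression through the pandas `.str` accessor. This is only a sketch of a swapped-in technique, not the notebook's method, and it assumes the prefix is always a run of digits followed by whitespace:

```python
import pandas as pd

# Sample names copied from the 2014 ranking above.
names = pd.Series(['43 NANKANDULO PRIMARY SCHOOL',
                   '9 NANKANDULO PRIMARY SCHOOL',
                   'PEARL JUNIOR SCHOOL'])

# Remove a leading run of digits plus the whitespace after it, then trim.
cleaned = names.str.replace(r'^\s*\d+\s+', '', regex=True).str.strip()

print(cleaned.drop_duplicates().tolist())
```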
In [14]:
ple_year_ranks[2014].iloc[:20]
Out[14]:
school score
56579 PEARL JUNIOR SCHOOL 100.00
56639 IBANDA TOWN PRIMARY SCHOOL 100.00
56663 NANKANDULO PRIMARY SCHOOL 100.00
56672 MBALE TOWER PRIMARY SCHOOL 100.00
56671 VICTORY LEARNING PRI. SCHOOL 100.00
56670 BRIGHT GRAMMAR PRIMARY SCHOOL 100.00
56669 GATEWAY PRIMARY SCHOOL 100.00
56668 STANDARD INTERGRATED PIS 100.00
56667 KITOORO HILL VIEW PRI. SCHOOL 100.00
56666 HORMISDALLEN MIXED DAY AND BDG 100.00
56660 RUKUUBA PRIMARY SCHOOL 100.00
56582 MARGHERITA PRIMARY SCHOOL 100.00
56581 KABALE PREPARATORY SCHOOL 100.00
56580 ST.CLELIA PRIMARY SCHOOL 100.00
56673 LEO'S JUNIOR PRIMARY SCHOOL 99.70
56674 NAMAGUNGA PRIMARY BOARDING SCH 99.65
56675 ST.KIZITO P.7 SCHOOL 99.55
56676 VILLA ROAD PRIMARY SCHOOL 99.40
56677 SIR APOLLO KAGGWA PIS NAKASERO 99.35
56679 WINSTON PRIMARY SCHOOL 99.25

With the duplicates removed, we get a clearer picture of the schools' performance. We can now see the best performing schools for each year. Note that many schools tie on a score of 100, so the order among tied schools is arbitrary.

In [15]:
for year in ple_year_ranks:
    # print() with a single argument behaves the same on Python 2 and Python 3.
    print('The best schools in ' + str(year))
    print(ple_year_ranks[year][:10].reset_index(drop=True))
    print('\n')
The best schools in 2010
                           school  score
0      EXCEL PRIMARY SCHOOL,RWIMI  100.0
1  KING FAHAD ISLAMIC PRI. SCHOOL  100.0
2          MAKINDYE JUNIOR SCHOOL  100.0
3     WATERFORD P/S NAJJANANKUMBI  100.0
4       MARGHERITA PRIMARY SCHOOL  100.0
5            PARENTAL CARE SCHOOL  100.0
6       VILLA ROAD PRIMARY SCHOOL  100.0
7    KYENGERA PARENTS PRI. SCHOOL  100.0
8              ST.FRANCIS NANSANA  100.0
9  SACRED HEART P/S, KYAMUSANSALA  100.0


The best schools in 2011
                           school  score
0             PEARL JUNIOR SCHOOL  100.0
1            PARENTAL CARE SCHOOL  100.0
2     ST.JOAN OF ARC BOARDING P/S  100.0
3       HORMISDALLEN KIRINNYA P/S  100.0
4       CORNERSTONE JUNIOR SCHOOL  100.0
5  NSUJJUMPOLWE ISLAMIC & ORPHANS  100.0
6       VILLA ROAD PRIMARY SCHOOL  100.0
7              HAPPY YEARS SCHOOL  100.0
8   BRIGHT GRAMMAR PRIMARY SCHOOL  100.0
9    ROYAL PRIMARY SCHOOL, BBUNGA  100.0


The best schools in 2012
                           school  score
0             PEARL JUNIOR SCHOOL  100.0
1     T & M BRIGHT PRIMARY SCHOOL  100.0
2         HILLSIDE PRIMARY SCHOOL  100.0
3                      GOOD DADDY  100.0
4    FAIRWAYS P/S,KIREKA - KAMULI  100.0
5  DEZ JUNIOR ACADEMY PRI. SCHOOL  100.0
6         RUKUNGIRI UNIVERSAL P/S  100.0
7         TOP CARE PRIMARY SCHOOL  100.0
8              JIT PRIMARY SCHOOL  100.0
9     LYPA INTEGRATED P/S,RUBINDI  100.0


The best schools in 2013
                           school  score
0   BISHOP ASILI MEM. NURSERY P/S  100.0
1       VILLA ROAD PRIMARY SCHOOL  100.0
2  STANDARD JUNIOR PRIMARY SCHOOL  100.0
3   MASAJJA MODERN PRIMARY SCHOOL  100.0
4    FAIRWAYS P/S,KIREKA - KAMULI  100.0
5  ST.CECILIA BRD. SCHOOL BUYAMBA  100.0
6     RWENTOBO PREPARATORY SCHOOL  100.0
7     ST.JUDE KYEGOBE PRI. SCHOOL  100.0
8    VICTORY LEARNING PRI. SCHOOL  100.0
9        K.Y DAY AND BOARDING P/S  100.0


The best schools in 2014
                           school  score
0             PEARL JUNIOR SCHOOL  100.0
1      IBANDA TOWN PRIMARY SCHOOL  100.0
2       NANKANDULO PRIMARY SCHOOL  100.0
3      MBALE TOWER PRIMARY SCHOOL  100.0
4    VICTORY LEARNING PRI. SCHOOL  100.0
5   BRIGHT GRAMMAR PRIMARY SCHOOL  100.0
6          GATEWAY PRIMARY SCHOOL  100.0
7        STANDARD INTERGRATED PIS  100.0
8   KITOORO HILL VIEW PRI. SCHOOL  100.0
9  HORMISDALLEN MIXED DAY AND BDG  100.0


The best schools in 2015
                        school  score
0  LITTLE ANGELS P/S,NAKAZADDE  100.0
1    NAKASONGOLA JUNIOR SCHOOL  100.0
2    VILLA ROAD PRIMARY SCHOOL  100.0
3    NKOKONJERU PRIMARY SCHOOL  100.0
4           JIT PRIMARY SCHOOL  100.0
5    CORNERSTONE JUNIOR SCHOOL  100.0
6         GLOBAL JUNIOR SCHOOL  100.0
7    JESJONNY DAY AND BDG. P/S  100.0
8    NABBUNGA FOUNTAINS OF EDU  100.0
9     K.Y DAY AND BOARDING P/S  100.0


We can also look at the overall best performing schools over the course of the 6 years of the survey. First we shall apply the trim_name function to the original list of schools.

In [16]:
ple['school']=ple['school'].apply(trim_name)

We can then drop the schools that are duplicated in a particular year

In [17]:
ple.drop_duplicates(['school', 'year'], inplace=True)

We can then proceed with our analysis. We compute the average score of each school over the entire study period and then find out which schools have performed best over this period.

In [18]:
ple_schools = ple.groupby('school')
overall_schools= ple_schools['score'].mean()
In [19]:
overall_schools.sort_values(ascending=False)[:20]
Out[19]:
school
ST.CLELIA PRIMARY SCHOOL          100.000000
STANDARD INTERGRATED PIS          100.000000
VILLA ROAD PRIMARY SCHOOL          99.766667
VICTORY LEARNING PRI. SCHOOL       99.641667
ST.MARYS IMMACULATE VILLA P/S      99.580000
SIR APOLLO KAGGWA PIS NAKASERO     99.350000
K.Y DAY AND BOARDING P/S           99.340000
SIR APOLLO KAGGWA PIS MENGO        99.200000
MARGHERITA PRIMARY SCHOOL          99.150000
WINSTON PRIMARY SCHOOL             99.100000
ENTEBBE EDUCATION CENTRE PIS       98.800000
AUNTIE AGNES INFANT P/S,MITETE     98.750000
HAPPY YEARS SCHOOL                 98.610000
K.Y DAY AND BOARDING PIS           98.600000
HORMISDALLEN KIRINNYA P/S          98.570000
BISHOP ASILI MEM. NURSERY PIS      98.550000
EXCEL PRIMARY SCHOOL,RWIMI         98.500000
PEARL JUNIOR SCHOOL                98.475000
RISE & SHINE PRIMARY SCHOOL        98.300000
SIR APOLLO KAGGWA P/S MENGO        98.260000
Name: score, dtype: float64
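
A plain mean treats a school that appears in a single year the same as one that appears in all six, so it may be worth looking at the number of years behind each average. A hedged sketch on toy data (the school names, scores and the `min_years` threshold are all invented for illustration):

```python
import pandas as pd

# Toy scores; school names and values are invented for illustration.
toy = pd.DataFrame({
    'school': ['A', 'A', 'A', 'B'],
    'year':   [2010, 2011, 2012, 2015],
    'score':  [98.0, 99.0, 97.0, 100.0],
})

# Mean score together with the number of years each school contributes.
summary = toy.groupby('school')['score'].agg(['mean', 'count'])

# Keep only schools observed in at least `min_years` years (an assumed threshold).
min_years = 3
ranked = summary[summary['count'] >= min_years].sort_values('mean', ascending=False)
print(ranked)
```

Here school B has the higher mean but only one recorded year, so it drops out once the threshold is applied.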

We can see that the schools that feature here are mainly the schools we saw when we did a year by year performance analysis.

We can also look at the representation of the different districts among the top performing schools.

We can also look at this in terms of the schools that are perennially among the top schools. For this we shall check which schools are among the top 100 schools in every year from 2010 to 2015.

We already have the ple_year_ranks dictionary that ranks the schools in order of their score per year, so we shall use this and a set operation to obtain these schools.
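
The set operation in question is an intersection across the yearly top lists. A toy sketch of the idea, with placeholder names rather than real rankings:

```python
# Toy top lists per year; the names are placeholders, not real rankings.
top_by_year = {
    2010: ['A', 'B', 'C'],
    2011: ['B', 'C', 'D'],
    2012: ['C', 'B', 'E'],
}

# A school is "perennial" only if it appears in every year's set.
yearly_sets = [set(names) for names in top_by_year.values()]
perennial = sorted(set.intersection(*yearly_sets))

print(perennial)  # ['B', 'C']
```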

In [20]:
best_schools=[]
for year in ple_year_ranks:
    best_schools.append(set(ple_year_ranks[year][:100].school.tolist()))
perenial_best = list(set.intersection(*(best_schools)))
perenial_best
Out[20]:
['MARGHERITA PRIMARY SCHOOL',
 'KABALE PREPARATORY SCHOOL',
 'UGANDA MARTYRS KATWE P SCHOOL',
 'NAMIRYANGO JUNIOR BOYS P/S',
 'VICTORY LEARNING PRI. SCHOOL',
 'VILLA ROAD PRIMARY SCHOOL',
 'PEARL JUNIOR SCHOOL',
 'EXCEL PRIMARY SCHOOL,RWIMI',
 'WINSTON PRIMARY SCHOOL',
 "LEO'S JUNIOR PRIMARY SCHOOL"]

Interestingly, only 10 schools are in the top 100 every year. It is important to note that the score we came up with relies on the percentage of students in each division, with a different weight attached to each. As a result, a school where many students excel and most get very good grades, but where some students also score in the lower divisions, may not consistently be at the top even though it is considered to do well perennially.

This should explain why many of the schools that we know to be among the best do not show up in the list above. Scoring by the percentages in each division also means that a school can easily fall out of the top schools in a particular year. In addition, PLE is relatively competitive, as the data shows: a very large number of schools have many students in the top divisions.

After that explanation, we can plot these schools on a map to see where they are located and how the perennially well performing schools are distributed.

In [21]:
indexer =[]
ple2015 = ple[ple['year']==2015]
for school in perenial_best:
    i = ple2015[ple2015['school']== school][['school', 'district']].index.values.tolist()
    indexer = indexer + i

# .ix has been removed from recent pandas; .loc does the same label-based lookup here.
perennial_df = ple2015.loc[indexer, ['school', 'district']]
perennial_df
Out[21]:
school district
44966 MARGHERITA PRIMARY SCHOOL KASESE M/C
45040 KABALE PREPARATORY SCHOOL KABALE M/C
44972 UGANDA MARTYRS KATWE P SCHOOL MASAKA M/C
45038 NAMIRYANGO JUNIOR BOYS P/S MUKONO M/C
44988 VICTORY LEARNING PRI. SCHOOL MASAKA M/C
44973 VILLA ROAD PRIMARY SCHOOL MASAKA M/C
44957 PEARL JUNIOR SCHOOL BUSHENYI M/C
45051 EXCEL PRIMARY SCHOOL,RWIMI KABAROLE
44994 WINSTON PRIMARY SCHOOL KAMPALA
44990 LEO'S JUNIOR PRIMARY SCHOOL MASAKA M/C
In [22]:
import geocoder
def add_coordinates(district):
    coords = geocoder.google(district.split()[0] + ', UGANDA').latlng
    return coords

perennial_df['coordinates'] = perennial_df['district'].apply(add_coordinates)
In [23]:
perennial_df
Out[23]:
school district coordinates
44966 MARGHERITA PRIMARY SCHOOL KASESE M/C [0.1698986, 30.078078]
45040 KABALE PREPARATORY SCHOOL KABALE M/C [-1.241956, 29.9856157]
44972 UGANDA MARTYRS KATWE P SCHOOL MASAKA M/C [-0.3267383, 31.7537404]
45038 NAMIRYANGO JUNIOR BOYS P/S MUKONO M/C [0.3548655, 32.7520139]
44988 VICTORY LEARNING PRI. SCHOOL MASAKA M/C [-0.3267383, 31.7537404]
44973 VILLA ROAD PRIMARY SCHOOL MASAKA M/C [-0.3267383, 31.7537404]
44957 PEARL JUNIOR SCHOOL BUSHENYI M/C [-0.4870918, 30.2051096]
45051 EXCEL PRIMARY SCHOOL,RWIMI KABAROLE [0.5896682, 30.2548787]
44994 WINSTON PRIMARY SCHOOL KAMPALA [0.3475964, 32.5825197]
44990 LEO'S JUNIOR PRIMARY SCHOOL MASAKA M/C [-0.3267383, 31.7537404]
In [24]:
import folium

map_ug = folium.Map(location=[1.373333, 32.290275], zoom_start=7)
ind= perennial_df.index
for i in range(len(ind)):
    folium.Marker(perennial_df['coordinates'].iloc[i],
                    popup= perennial_df['school'].iloc[i]).add_to(map_ug)

map_ug
Out[24]:
In [25]:
sorted_ple = ple[['school', 'district', 'score']].sort_values(by= 'score', ascending=False)
In [26]:
sorted_ple[sorted_ple['score']==100].head()
Out[26]:
school district score
0 EXCEL PRIMARY SCHOOL,RWIMI KABAROLE 100.0
44967 FUNDAMENTAL PRIMARY SCHOOL KIBUKU 100.0
44965 YUDESI PRIMARY SCHOOL KAMPALA 100.0
44964 ST.JOSEPH PILOT SCHOOL KAMPALA 100.0
44963 HORMISDALLEN MIXED DAY AND BDG KAMPALA 100.0
In [27]:
ple90_districts = list(set(sorted_ple[sorted_ple['score']>=90]['district'].tolist()))
In [28]:
districts_map ={}
for district in ple90_districts:
    districts_map[district] = add_coordinates(district)
In [29]:
# Take an explicit copy so the column assignments below do not write to a view.
ple90 = sorted_ple[sorted_ple['score']>=90].copy()
jitter = np.random.random(len(ple90))*0.2

def add_jitter(x):
    jittered = [x[0]+ np.random.choice(jitter), x[1] + np.random.choice(jitter)]
    return jittered


ple90.loc[:, 'coords'] = ple90.loc[:, 'district'].map(districts_map)
ple90.loc[:, 'jittered coords'] = ple90.loc[:, 'coords'].apply(add_jitter)
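
The jitter exists because every school in a district is geocoded to the same point, so the markers would otherwise sit exactly on top of each other. A minimal sketch of the same idea that draws an independent offset for each point (the helper name `jitter_point` and its `max_offset` parameter are assumptions made for this example):

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed so the sketch is repeatable

def jitter_point(latlng, max_offset=0.2):
    """Shift a [lat, lng] pair by independent random offsets in [0, max_offset)."""
    lat, lng = latlng
    return [lat + rng.uniform(0, max_offset), lng + rng.uniform(0, max_offset)]

kampala = [0.3475964, 32.5825197]  # Kampala coordinates as geocoded earlier
points = [jitter_point(kampala) for _ in range(3)]
print(points)
```

As in the cell above, the offsets are only positive, so markers drift slightly north-east of the district point; a symmetric range would centre them instead.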
In [30]:
ple90.head()
Out[30]:
school district score coords jittered coords
0 EXCEL PRIMARY SCHOOL,RWIMI KABAROLE 100.0 [0.5896682, 30.2548787] [0.596779015597, 30.3132728904]
44967 FUNDAMENTAL PRIMARY SCHOOL KIBUKU 100.0 [1.0452874, 33.7992536] [1.08589915677, 33.8506464157]
44965 YUDESI PRIMARY SCHOOL KAMPALA 100.0 [0.3475964, 32.5825197] [0.347871312542, 32.644170902]
44964 ST.JOSEPH PILOT SCHOOL KAMPALA 100.0 [0.3475964, 32.5825197] [0.442315140789, 32.702133854]
44963 HORMISDALLEN MIXED DAY AND BDG KAMPALA 100.0 [0.3475964, 32.5825197] [0.528051295224, 32.71426361]
In [31]:
map_ug = folium.Map(location=[1.373333, 32.290275], zoom_start=7)
# One marker per school scoring 90 or above, placed at its jittered district location.
for i in range(len(ple90)):
    folium.Marker(ple90['jittered coords'].iloc[i],
                    popup= ple90['school'].iloc[i]).add_to(map_ug)

map_ug
Out[31]:

As initially suspected, many of the schools that perform best are in the Central and Western regions, with the Eastern region posting a good number as well.

It is easy to see, and almost expected, that the Northern region posts very low numbers in comparison to the rest, and large expanses of districts do not have a top performing school for the entire period of 6 years. Given the history of poor quality schools due to underfunding and instability, among other reasons, this is not entirely surprising. However, it does raise serious issues of governance and resource distribution.

Lastly, we shall examine the performance of females contrasted against that of their male counterparts. We create a score column similar to the one we created above, but for a particular sex: one for males and another for females. The weights attached to the different divisions remain unchanged.

For this we shall use the same score function as before with a few adjustments made to fit the purpose.

In [32]:
ple.head()
Out[32]:
year district school total candidates total div 1 % div 1 total div 2 % div 2 total div 3 % div 3 ... male % div2 male total div3 male % div3 male total div4 male % div4 male total u male % u male total x male % x score
0 2010 KABAROLE EXCEL PRIMARY SCHOOL,RWIMI 23 23.0 100.0 0.0 0.0 0.0 0.0 ... 0.0 NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0 100.0
1 2010 KALUNGU SACRED HEART P/S, KYAMUSANSALA 40 40.0 100.0 0.0 0.0 0.0 0.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 100.0
2 2010 KAMPALA KING FAHAD ISLAMIC PRI. SCHOOL 3 3.0 100.0 0.0 0.0 0.0 0.0 ... 0.0 NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0 100.0
3 2010 KAMPALA MAKINDYE JUNIOR SCHOOL 1 1.0 100.0 0.0 0.0 0.0 0.0 ... 0.0 NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0 100.0
4 2010 KAMPALA WATERFORD P/S NAJJANANKUMBI 19 19.0 100.0 0.0 0.0 0.0 0.0 ... 0.0 NaN 0.0 NaN 0.0 NaN 0.0 NaN 0.0 100.0

5 rows × 43 columns

The table above shows the per-sex percentage columns (such as 'male % div2') that the per-sex scores are built from.

In [33]:
def male_score(x):
    return x['male % div1'] + x['male % div2']*0.5 + x['male % div3']/float(3) +x['male % div4']*0.25 + x['male % u']*0.1 + x['male % x']*0

def female_score(x):
    return x['female % div1'] + x['female % div2']*0.5 + x['female % div3']/float(3) +x['female % div4']*0.25 + x['female % u']*0.1 + x['female % x']*0
In [34]:
ple['mscore'] = ple.apply(male_score, axis=1)
ple['fscore'] = ple.apply(female_score, axis =1)
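
Since the two functions differ only in the column prefix, they could plausibly be folded into one column-wise helper. The sketch below assumes the PLE column naming shown earlier ('male % div1', 'female % u', and so on); the helper name `sex_score` and the toy percentages are hypothetical:

```python
import pandas as pd

# Same weights as male_score / female_score above, keyed by column suffix.
WEIGHTS = {'div1': 1.0, 'div2': 0.5, 'div3': 1.0 / 3, 'div4': 0.25, 'u': 0.1, 'x': 0.0}

def sex_score(df, sex):
    """Column-wise weighted score for sex = 'male' or 'female'."""
    total = 0.0
    for suffix, weight in WEIGHTS.items():
        total = total + df['{} % {}'.format(sex, suffix)] * weight
    return total

# Toy frame with invented percentages, using the PLE column naming.
toy = pd.DataFrame({'male % div1': [50.0], 'male % div2': [50.0], 'male % div3': [0.0],
                    'male % div4': [0.0], 'male % u': [0.0], 'male % x': [0.0]})
toy['mscore'] = sex_score(toy, 'male')
print(toy['mscore'].iloc[0])  # 50 + 50 * 0.5 = 75.0
```

Working on whole columns avoids the row-by-row apply, and missing percentages still propagate as NaN, as they do in the functions above.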
In [35]:
year_gender_avg = ple.groupby('year')[['mscore','fscore']].mean()
year_gender_avg
Out[35]:
mscore fscore
year
2010 41.328751 36.271758
2011 41.249099 35.894961
2012 43.706705 38.357331
2013 42.931727 37.150712
2014 43.732498 38.322953
2015 40.166844 35.620287
In [36]:
ind = range(len(year_gender_avg.index))
In [37]:
f, ax = plt.subplots(figsize=(8,6))
plt.plot(ind, year_gender_avg['mscore'].values, marker='o')
plt.plot(ind, year_gender_avg['fscore'].values, marker='o')
plt.title('Male vs Female performance over the years')
plt.xlabel('Year')
plt.ylabel('Score')
plt.xticks(ind, year_gender_avg.index);

From the plot above (as well as the table before it) we can see that boys on average perform better than girls, country-wide and year after year. There could be various reasons for this; one possibility is the pressures on girls in rural areas. We can make the same plot for Kampala (and, similarly, for other urban areas) to investigate this.

It is also important to note that the plots for boys and girls have very similar shapes and the distance between them is almost the same throughout. This suggests that the scores of both sexes move by a similar amount each year, probably in reaction to the difficulty of the exams or the strictness of the examiners. Whichever factor it is, it affects the sexes equally.

In [38]:
kla_gender_avg = ple[ple['district']=='KAMPALA'].groupby('year')[['mscore','fscore']].mean()
kla_gender_avg
Out[38]:
mscore fscore
year
2010 63.856977 58.488233
2011 64.121104 59.493312
2012 66.290300 60.896014
2013 63.281305 57.686106
2014 63.905150 59.800240
2015 61.430815 57.435561
In [39]:
f, ax = plt.subplots(figsize=(8,6))
plt.plot(ind, kla_gender_avg['mscore'].values, marker='o')
plt.plot(ind, kla_gender_avg['fscore'].values, marker ='o')
plt.title('Male vs Female performance over the years in Kampala')
plt.xlabel('Year')
plt.ylabel('Score')
plt.xticks(ind, kla_gender_avg.index);

The graphs don't differ greatly from the ones we had before, and they also share a similar shape with each other. We can note that, as would be expected, the scores in Kampala are better than the national averages. The difference between boys and girls is a bit smaller in Kampala than in the national results. Since the gap persists even there, this may suggest that at least some of the issues that cause girls to perform worse than boys around the country are not unique to rural areas.
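
The size of the gap can be read directly off the two tables of yearly means. The sketch below simply re-enters those printed values and subtracts them; nothing is recomputed from the raw data:

```python
import pandas as pd

years = [2010, 2011, 2012, 2013, 2014, 2015]

# Yearly means copied from the national and Kampala tables above.
national = pd.DataFrame({
    'mscore': [41.328751, 41.249099, 43.706705, 42.931727, 43.732498, 40.166844],
    'fscore': [36.271758, 35.894961, 38.357331, 37.150712, 38.322953, 35.620287],
}, index=years)
kampala = pd.DataFrame({
    'mscore': [63.856977, 64.121104, 66.290300, 63.281305, 63.905150, 61.430815],
    'fscore': [58.488233, 59.493312, 60.896014, 57.686106, 59.800240, 57.435561],
}, index=years)

# Male minus female score, per year, for each scope.
gaps = pd.DataFrame({
    'national': national['mscore'] - national['fscore'],
    'kampala': kampala['mscore'] - kampala['fscore'],
})
print(gaps.round(2))
print(gaps.mean().round(2))
```

On these figures the average gap is a little over 5 points nationally and a little under 5 in Kampala, although Kampala's gap is the larger of the two in 2010 and 2012, so the difference is modest.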

Of course this is still open to further analysis possibly combined with other datasets. We have also only used the PLE dataset of the three that we had. We can dive deeper and search for trends and insights using the UCE and UACE results as well as check to see if our findings here show up in those results as well.