The purpose of this report is to identify groups of students who did not perform well and need reinforcement on the standards covered by the cube root exam. The exam was administered during Algebra 2 class to 127 students through the online platform Eduphoria. Three Texas Essential Knowledge and Skills (TEKS) were covered by this exam. A description of the TEKS can be found at the end of this document.
Three factors are considered when looking at student groups: gender, emergent bilingual (EB) status, and the section the student is enrolled in (recorded as Class). Gender compares male and female scores.
EB status has three groups a student can fall into: currently emergent bilingual (students monitored and enrolled in the EB program); fourth year of monitoring (former EB students who no longer need support but are still monitored); and non-ESL (students who learned English as their first language or have been released from the EB program).
Class is divided into two groups, Honors and Regular, with Honors going into more depth on the subject material.
The first factor considered is gender. Below are the setup and data import, along with the average raw score on the 10-question exam by gender.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
M2T1Data<-read.csv('M2T1Data.csv')
M2T1Data$Gender<-as.factor(M2T1Data$Gender)
M2T1Data$Class<-as.factor(M2T1Data$Class)
M2T1Data$EB<-as.factor(M2T1Data$EB)
m_f <- M2T1Data %>% group_by(Gender) %>%
  summarise(avg_score = mean(`Raw.Score`),
            med_score = median(`Raw.Score`),
            count_gender = length(`Raw.Score`))
m_f
## # A tibble: 2 × 4
## Gender avg_score med_score count_gender
## <fct> <dbl> <dbl> <int>
## 1 female 7.58 8 64
## 2 male 6.54 7 63
From the output above, female students scored about one full question higher than their male counterparts, a gap seen in both the mean (7.58 vs. 6.54) and the median (8 vs. 7). For both genders the mean is lower than the median by roughly 0.4 to 0.5, which suggests the distributions may be skewed left. Before deciding on a significance test, the distributions will be checked for normality via a histogram.
ggplot(M2T1Data, aes(x = Raw.Score, fill = Gender))+
geom_histogram(binwidth = 1, alpha = .5, position = "identity")
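Alongside the histogram, the gap between mean and median noted earlier can be computed directly. The following is a minimal sketch that uses only the rounded values printed in the gender summary table; the data frame and column names (`scores`, `gap`) are illustrative and not part of the original analysis.

```r
# Means and medians copied from the gender summary table (rounded as printed).
scores <- data.frame(
  gender = c("female", "male"),
  avg    = c(7.58, 6.54),
  med    = c(8, 7)
)

# A negative gap (mean below median) is consistent with a left skew.
scores$gap <- scores$avg - scores$med
scores
```

Both gaps are negative, which agrees with the left-skew reading, although a gap of this size on a 10-point scale is only suggestive.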
Because the scores for neither gender follow a normal distribution, the Wilcoxon
Rank Sum Test will be performed to check for statistical significance.
Alpha = 0.05
Null hypothesis: There is no difference in distribution between the scores of male and female test takers.
Alternative hypothesis: There is a difference in distribution between the scores of male and female test takers.
wilcox.test(Raw.Score~Gender, data=M2T1Data)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Raw.Score by Gender
## W = 2487.5, p-value = 0.02176
## alternative hypothesis: true location shift is not equal to 0
The null hypothesis is rejected (p = 0.022 < 0.05): there is a difference in the distribution of scores between male and female test takers. Based on the histograms and the mean and median test scores, being a male student may be associated with receiving a lower test score.
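The p-value says a difference exists but not how large it is. One rough way to gauge the size, which is not part of the original analysis, is the "probability of superiority": the `W` reported by `wilcox.test()` is the Mann-Whitney statistic for the first factor level (here `female`), so dividing it by the number of female-male pairs estimates how often a randomly chosen female score exceeds a randomly chosen male score, with ties counted as one half. The variable names below are illustrative.

```r
# Values copied from the output above: W from wilcox.test(),
# group sizes from the gender summary table.
W        <- 2487.5
n_female <- 64
n_male   <- 63

# W counts the (female, male) pairs in which the female score is higher,
# with tied pairs counted as 0.5. Dividing by the total number of pairs
# gives the estimated probability of superiority.
prob_superiority <- W / (n_female * n_male)
round(prob_superiority, 3)  # about 0.617
```

A value of 0.5 would mean no tendency either way, so roughly 0.62 points to a modest advantage for female students.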
The next factor considered is EB status. To explore this data, the mean and median are first calculated.
EB_status <- M2T1Data %>% group_by(EB) %>%
summarise(avg_EB = mean(Raw.Score),
med_EB = median(Raw.Score),
count_EB = length(Raw.Score))
EB_status
## # A tibble: 3 × 4
## EB avg_EB med_EB count_EB
## <fct> <dbl> <dbl> <int>
## 1 Currently Emergent Bilingual 7.04 8 26
## 2 Fourth Year of Monitoring 6.25 6.5 4
## 3 Non-ESL 7.10 7 97
The mean and median are close in value for every category except “Currently Emergent Bilingual” (CEB), where the mean (7.04) sits almost a full point below the median (8), suggesting that a few low scores are pulling the mean down. Also note the low number of “Fourth Year of Monitoring” (FYM) students (4). Next, the distributions will be visualized.
ggplot(M2T1Data, aes(x = Raw.Score, color = EB)) +
  geom_density() +
  labs(title = "Density of Scores by EB", x = "Score") +
  theme_minimal()
The density plot separated by EB status is shown above. CEB and Non-ESL have a high density of scores above 7.5, whereas FYM is closer to a uniform distribution. However, the FYM category has only four students, so the density estimate may not give an accurate picture of that group.
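No significance test is reported for EB status. If one were wanted, a possible option is the Kruskal-Wallis test, the multi-group analogue of the Wilcoxon Rank Sum Test used for the other two factors. The sketch below runs on a small made-up data frame (`toy`), not on the exam records, so its result says nothing about the actual students; with only four FYM students, any such test on the real data would also have little power to detect a difference for that group.

```r
# Toy data for illustration only; these are NOT the exam records.
toy <- data.frame(
  EB = factor(c(rep("Currently Emergent Bilingual", 5),
                rep("Fourth Year of Monitoring", 4),
                rep("Non-ESL", 6))),
  Raw.Score = c(8, 9, 7, 4, 8,
                5, 6, 7, 8,
                7, 9, 10, 6, 8, 7)
)

# Kruskal-Wallis rank sum test: compares score distributions across
# all three EB groups at once.
kw <- kruskal.test(Raw.Score ~ EB, data = toy)
kw$parameter  # degrees of freedom = number of groups - 1
kw$p.value
```

On the real data, the same call would take `M2T1Data` in place of `toy`.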
m_f<-M2T1Data %>% group_by(Gender, EB) %>%
summarise(avg_score = mean(`Raw.Score`),
med_score = median(`Raw.Score`),
count_gender = length(`Raw.Score`))
## `summarise()` has grouped output by 'Gender'. You can override using the
## `.groups` argument.
m_f
## # A tibble: 6 × 5
## # Groups: Gender [2]
## Gender EB avg_score med_score count_gender
## <fct> <fct> <dbl> <dbl> <int>
## 1 female Currently Emergent Bilingual 7.45 8 11
## 2 female Fourth Year of Monitoring 8 8 1
## 3 female Non-ESL 7.60 8 52
## 4 male Currently Emergent Bilingual 6.73 8 15
## 5 male Fourth Year of Monitoring 5.67 5 3
## 6 male Non-ESL 6.53 7 45
ggplot(m_f, aes(x = Gender, y = avg_score, fill = EB)) +
  geom_bar(stat = 'identity', position = "dodge") +
  labs(title = "Average Score by Gender and EB", y = "Average Score") +
  theme_classic()
The bar plot above shows each gender broken down by EB status. On average, female students score higher than their male counterparts in every EB category, although the FYM comparison rests on a single female student and the CEB medians are equal (8).
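The `summarise()` message printed with the two-factor summary above is informational: after grouping by two variables, dplyr keeps the result grouped by the first one. A minimal sketch of one way to make that choice explicit is shown below, using a small made-up data frame (`toy`) rather than the exam records; the column name `n_students` is illustrative.

```r
library(dplyr)

# Toy data for illustration only; these are NOT the exam records.
toy <- data.frame(
  Gender    = c("female", "female", "male", "male", "male"),
  EB        = c("Non-ESL", "Non-ESL", "Non-ESL",
                "Currently Emergent Bilingual",
                "Currently Emergent Bilingual"),
  Raw.Score = c(8, 9, 7, 6, 8)
)

by_group <- toy %>%
  group_by(Gender, EB) %>%
  summarise(avg_score  = mean(Raw.Score),
            med_score  = median(Raw.Score),
            n_students = n(),     # n() counts the rows in each group
            .groups    = "drop")  # return an ungrouped tibble, no message
by_group
```

Dropping the grouping does not change the numbers; it only avoids surprises if the summary is later passed to another grouped operation.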
The last factor considered is the section of Algebra 2 the students are enrolled in. The two options are Honors and Regular. Honors goes more in depth and has greater rigor, but both sections cover the same TEKS and take the same exams. The score breakdown is as follows:
section<-M2T1Data %>% group_by(Class) %>%
summarise(avg_class = mean(Raw.Score),
med_class = median(Raw.Score),
count_class = length(Raw.Score))
section
## # A tibble: 2 × 4
## Class avg_class med_class count_class
## <fct> <dbl> <dbl> <int>
## 1 Honors 7.55 8 91
## 2 Regular 5.83 5.5 36
There is a large difference in both mean and median between the two sections. The Honors scores may be skewed left, since their mean (7.55) is below their median (8). Next, the distributions will be visualized.
ggplot(M2T1Data, aes(x = Raw.Score, fill = Class)) +
geom_density(alpha = .3)
After visualizing the data, it is evident that the two distributions differ, with Honors having a much larger concentration of scores above 7.5. A Wilcoxon Rank Sum Test will be performed to assess statistical significance.
Alpha = 0.05
Null hypothesis: There is no difference in distribution between Regular and Honors classes.
Alternative hypothesis: There is a difference in distribution between Regular and Honors classes.
wilcox.test(Raw.Score~Class, data=M2T1Data)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Raw.Score by Class
## W = 2316, p-value = 0.0002507
## alternative hypothesis: true location shift is not equal to 0
As expected, the Wilcoxon test rejects the null hypothesis (p = 0.00025 < 0.05): the score distributions of the Honors and Regular classes differ.
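Because all three summary tables describe the same students, each should recover the same head count and, up to rounding, the same overall mean. The following quick consistency check is a sketch that uses only the rounded means and counts printed in the tables above; the object names (`by_gender`, `by_eb`, `by_class`) are illustrative.

```r
# Group means and counts copied from the three summary tables (rounded as printed).
by_gender <- data.frame(avg = c(7.58, 6.54),       n = c(64, 63))
by_eb     <- data.frame(avg = c(7.04, 6.25, 7.10), n = c(26, 4, 97))
by_class  <- data.frame(avg = c(7.55, 5.83),       n = c(91, 36))

tables <- list(gender = by_gender, eb = by_eb, class = by_class)

# The count-weighted mean of the group means equals the overall mean,
# whichever factor is used to split the students.
overall <- sapply(tables, function(d) weighted.mean(d$avg, d$n))
totals  <- sapply(tables, function(d) sum(d$n))

round(overall, 2)  # about 7.06 for each factor
totals             # 127 for each factor
```

Agreement across the three breakdowns suggests that no rows were dropped or duplicated by any of the groupings.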
For the final visualization, all three factors will be looked at together.
m_f<-M2T1Data %>% group_by(Gender, EB, Class) %>%
summarise(avg_score = mean(`Raw.Score`),
med_score = median(`Raw.Score`),
count_gender = length(`Raw.Score`))
## `summarise()` has grouped output by 'Gender', 'EB'. You can override using the
## `.groups` argument.
ggplot(m_f, aes(x = Gender, y = avg_score, fill = EB)) +
  geom_bar(stat = 'identity', position = "dodge") +
  facet_wrap(~Class) +
  labs(title = "Average Score by Gender and EB", y = "Average Score") +
  theme_classic()
After the data analysis and visualization, the hypothesis tests and graphs support reteaching and reinforcement lessons for the following groups: Honors FYM male, Regular CEB female, and Regular Non-ESL male. One interesting finding is that Regular CEB is the only category in which female students scored lower than their male counterparts.
2A.2(A) - graph the function \(f(x)=x^3\) and, when applicable, analyze the key attributes such as domain, range, intercepts, symmetries, and maximum and minimum given an interval.
2A.7(B) - add, subtract, and multiply polynomials
2A.7(I) - write the domain and range of a function in interval notation, inequalities, and set notation