Difference between revisions of "Gender: Male/Female"

Latest revision as of 23:13, 27 November 2023

Kai et al. (2017) pdf

Models predicting student retention in an online college program
J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students
JRip decision rules achieved much lower Kappa and AUC for male students than female students
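
As an illustration of the per-group comparison described above, the following is a minimal sketch of computing Cohen's Kappa separately for each gender group. The data are invented toy values, and the helper names `cohens_kappa` and `kappa_by_group` are hypothetical; this is not the procedure used by Kai et al.

```python
def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    classes = set(y_true) | set(y_pred)
    # Chance agreement: product of the marginal frequencies, summed over classes.
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in classes)
    # Undefined (division by zero) when expected == 1, e.g. a single class throughout.
    return (observed - expected) / (1 - expected)


def kappa_by_group(y_true, y_pred, groups):
    """Return {group: kappa} computed on each group's rows only."""
    result = {}
    for g in set(groups):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = cohens_kappa([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return result


# Toy data, invented for illustration only (1 = retained, 0 = not retained).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
groups = ["male"] * 4 + ["female"] * 4
kappas = kappa_by_group(y_true, y_pred, groups)
print(kappas)
```

A lower Kappa for one group, as in this toy case for male students, indicates that the model's agreement with the true labels is closer to chance for that group.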

Christie et al. (2019) pdf

Models predicting students' high school dropout
The decision trees showed very minor differences in AUC between female and male students

Hu and Rangwala (2020) pdf

Models predicting if a college student will fail in a course
The multiple cooperative classifier model (MCCM) was the best at reducing bias, or discrimination against male students, performing particularly well for the Psychology course.
Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering.

Anderson et al. (2019) pdf

Models predicting six-year college graduation
False negative rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used
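
The false negative rate comparison above can be sketched as follows. This is a minimal example on invented toy data, with a hypothetical helper `false_negative_rate_by_group`; it is not the evaluation code from Anderson et al.

```python
from collections import defaultdict


def false_negative_rate_by_group(y_true, y_pred, groups):
    """Return {group: FNR}, where FNR = FN / (FN + TP) among actual positives."""
    false_negatives = defaultdict(int)
    positives = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            positives[g] += 1
            if p == 0:
                false_negatives[g] += 1
    return {g: false_negatives[g] / positives[g] for g in positives}


# Toy data, invented for illustration only (1 = graduated within six years).
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
groups = ["M", "M", "M", "M", "M", "F", "F", "F", "F", "F"]
fnr = false_negative_rate_by_group(y_true, y_pred, groups)
print(fnr)  # {'M': 0.5, 'F': 0.25}
```

Here a higher false negative rate for one group means that more of that group's actual graduates were wrongly predicted not to graduate.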

Gardner, Brooks and Baker (2019) pdf

Model predicting MOOC dropout, specifically through slicing analysis
Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence

Riazy et al. (2020) pdf

Model predicting course outcome
Marginal differences were found between groups in prediction quality and in the overall proportion of predicted passes
The differences were inconsistent in direction between algorithms

Lee and Kizilcec (2020) pdf

Models predicting college success (or median grade or above)
Random forest algorithms performed significantly worse for male students than female students
The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values
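
To make the threshold correction above concrete, here is a minimal sketch that measures the demographic parity gap (difference in positive prediction rates) and the equality of opportunity gap (difference in true positive rates) under a shared threshold of 0.5 and under group-specific thresholds. The scores, labels, and the value 0.4 are invented for illustration, and the function names are hypothetical; the source does not report the thresholds actually chosen by Lee and Kizilcec.

```python
def group_rates(scores, labels, threshold):
    """Return (positive prediction rate, true positive rate) at a threshold."""
    preds = [s >= threshold for s in scores]
    positive_rate = sum(preds) / len(preds)
    preds_for_positives = [p for p, y in zip(preds, labels) if y == 1]
    tpr = sum(preds_for_positives) / len(preds_for_positives)
    return positive_rate, tpr


def fairness_gaps(data, thresholds):
    """data: {group: (scores, labels)}; thresholds: {group: float}.

    Returns (demographic parity gap, equality of opportunity gap).
    """
    per_group = {g: group_rates(s, y, thresholds[g]) for g, (s, y) in data.items()}
    positive_rates = [r[0] for r in per_group.values()]
    tprs = [r[1] for r in per_group.values()]
    return max(positive_rates) - min(positive_rates), max(tprs) - min(tprs)


# Toy scores and labels, invented for illustration only.
data = {
    "male": ([0.2, 0.4, 0.45, 0.7], [0, 1, 1, 1]),
    "female": ([0.3, 0.55, 0.6, 0.8], [0, 1, 1, 1]),
}
shared = fairness_gaps(data, {"male": 0.5, "female": 0.5})
adjusted = fairness_gaps(data, {"male": 0.4, "female": 0.5})
print(shared, adjusted)
```

In this toy case the model systematically scores male students lower, so a single 0.5 cutoff leaves large gaps, while lowering the cutoff for that group closes both.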

Yu et al. (2020) pdf

Model predicting undergraduate short-term (course grades) and long-term (average GPA) success
Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.
The fairness of models improved when a combination of institutional and click data was used in the model

Yu et al. (2021) pdf

Models predicting college dropout for students in residential and fully online programs
Whether the socio-demographic information was included or not, the model showed worse true negative rates and worse accuracy for male students
The model showed better recall for male students, especially for those studying in person
The differences in recall and true negative rates were smaller, and thus fairer, for male students studying online if their socio-demographic information was not included in the model

Riazy et al. (2020) pdf

Models predicting course outcome of students in a virtual learning environment (VLE)
More male students were predicted to pass the course than female students, but this overestimation was fairly small and not consistent across different algorithms
Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value
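
ABROCA, used in this and several later entries, is the absolute area between the ROC curves of two groups: 0 means the curves coincide, and larger values mean the model separates the classes differently for each group. The following is a minimal sketch using only the standard library and a numerical midpoint rule; the function names, the grid size, and the toy data are assumptions made for illustration, not code from any of the cited papers.

```python
def roc_points(scores, labels):
    """Return ROC points (fpr, tpr) from (0, 0) to (1, 1), handling tied scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda x: -x[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Lower the threshold past every example sharing this score at once.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def tpr_at(points, fpr):
    """Linearly interpolate the ROC curve's TPR at a given FPR."""
    lo = points[0]
    for pt in points:
        if pt[0] <= fpr:
            lo = pt  # last point at or left of fpr (top of any vertical segment)
        else:
            return lo[1] + (pt[1] - lo[1]) * (fpr - lo[0]) / (pt[0] - lo[0])
    return lo[1]


def abroca(scores_a, labels_a, scores_b, labels_b, n=1000):
    """Approximate the absolute area between two ROC curves (midpoint rule)."""
    roc_a = roc_points(scores_a, labels_a)
    roc_b = roc_points(scores_b, labels_b)
    total = 0.0
    for k in range(n):
        x = (k + 0.5) / n
        total += abs(tpr_at(roc_a, x) - tpr_at(roc_b, x))
    return total / n


# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.2, 0.1]
value = abroca(scores, [1, 1, 0, 0], scores, [1, 0, 1, 0])
same = abroca(scores, [1, 1, 0, 0], scores, [1, 1, 0, 0])
print(round(value, 3), same)
```

Because the area is taken in absolute value, ABROCA does not indicate which group is favored; it only measures how far apart the two curves are.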

Bridgeman et al. (2009) pdf

Automated scoring models for evaluating English essays, or e-rater
The e-rater system performed with comparable accuracy for male and female students when assessing their 11th grade essays

Bridgeman et al. (2012) pdf

A later version of automated scoring models for evaluating English essays, or e-rater
The e-rater system correlated comparably well with human raters when assessing TOEFL and GRE essays written by male and female students

Verdugo et al. (2022) pdf

An algorithm predicting dropout from university after the first year
Several algorithms achieved better AUC for male than female students; results were mixed for F1.

Zhang et al. (2022)

Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process
For each SRL-related detector, relatively small differences in AUC were observed across gender groups.
No gender group consistently had best-performing detectors

Rzepka et al. (2022) pdf

Models predicting whether a student will quit a spelling learning activity without completing it
Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.

Li, Xing, & Leite (2022) pdf

Models predicting whether two students will communicate on an online discussion forum
Multiple fairness approaches lead to ABROCA of under 0.01 for female versus male students

Sha et al. (2021) pdf

Models predicting whether a MOOC discussion forum post is content-relevant or content-irrelevant
Some algorithms achieved ABROCA under 0.01 for female students versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06

Balancing the size of each group in the training set reduced ABROCA

Litman et al. (2021) html

Automated essay scoring models inferring text evidence usage
All algorithms studied have less than 1% of error explained by whether the student is female or male

Sha et al. (2022) [1]

Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)
A range of over-sampling methods tested
Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.

Deho et al. (2023) [2]

Predicting whether course grade will be above or below 0.5
Better prediction for female students in some courses, better prediction for male students in other courses

Perdomo et al. (2023) pdf

Paper discusses the Dropout Early Warning System (DEWS), a system that predicts probabilities of on-time graduation
DEWS prediction is comparable for males and females

Zhang et al. (2023) pdf

Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.
Models have approximately equal performance for males and females.

Almoubayyed et al. (2023) pdf

Models discovering generalization of the performance for reading comprehension ability in the context of middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction
Model trained on smaller dataset achieves greater fairness in prediction for male and female students
For model trained on larger dataset, prediction is more accurate for female students than male students.

Chiu (2020) pdf

Model identifies affective states (boredom, concentration, confusion, frustration, off task and gaming) of middle school students’ online mathematics learning in predicting their choice to study STEM in higher education.
Model detects interaction with the ASSISTments system
Model performs better for male students (AUC = 0.641 for RFPS; AUC = 0.571 for LR) than female students (AUC = 0.492 for RFPS; AUC = 0.535 for LR).

Cock et al. (2023) pdf

Paper investigates biases in models designed for early identification of middle school students at risk of failing in a flipped-classroom course and an open-ended exploration environment (TugLet)
Model performs worse for males in open-ended environment (FNR=0.70 for males and FNR=0.53 for females)
Model performs worse for females in flipped classrooms (FNR=0.56 for females and FNR=0.43 for males)

@@ Line 1: / Line 1: @@
 Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]
 * Models predicting student retention in an online college program
@@ Line 35: / Line 36: @@
 * Models predicting college success (or median grade or above)
 * Random forest algorithms performed significantly worse for male students than female students
-* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values
+* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values
@@ Line 44: / Line 45: @@
-Yu and colleagues (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]
+Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]
 * Models predicting college dropout for students in residential and fully online program
-* Whether the protected attributed were included or not, the models had worse true negative rates but better recall for male students
+* Whether the socio-demographic information was included or not, the model showed worse true negative rates and worse accuracy for male students
-* The model was worse for male students studying in online program in terms of true negative rates, recall and accuracy.
+* The model showed better recall for male students, especially for those studying in person
+* The difference in recall and true negative rates were lower, and thus fairer, for male students studying online if their socio-demographic information was not included in the model
@@ Line 60: / Line 62: @@
 * Automated scoring models for evaluating English essays, or e-rater
-* E-rater performed accurately for male and female students when assessing 11th grade English essays and independent writing task in Test of English as a Foreign Language
+* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays
-* While feature-level score differences were identified across gender and ethnic groups (e.g. e-rater gave better scores for word length and vocabulary level but less on grammar and mechanics when grading 11th grade essays written by Asian American female students), the authors called for larger samples  to confirm the findings
-Bridgeman, Trapani, and Attali (2012) [https://www.researchgate.net/publication/233291671_Comparison_of_Human_and_Machine_Scoring_of_Essays_Differences_by_Gender_Ethnicity_and_Country pdf]
+Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]
 * A later version of automated scoring models for evaluating English essays, or e-rater
-* The score difference between human rater and e-rater was marginal when  written responses to GRE issue prompt by male and female test-takers were compared
+* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students
-* The difference in score was significantly greater when assessing written responses to GRE argument prompt, as e-rater gave lower score for male test-takers, particularly for African American, American Indian, and Hispanic males, when assessing written responses to GRE argument prompt
+Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]
+* An algorithm predicting dropout from university after the first year
+* Several algorithms achieved better AUC for male than female students; results were mixed for F1.
+Zhang et al. (2022)
+* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process
+* For each SRL-related detector, relatively small differences in AUC were observed across gender groups.
+* No gender group consistently had best-performing detectors
+Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]
+* Models predicting whether student will quit spelling learning activity without completing
+* Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.
+Li, Xing, & Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]
+* Models predicting whether two students will communicate on an online discussion forum
+* Multiple fairness approaches lead to ABROCA of under 0.01 for female versus male students
+Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]
+* Models predicting a MOOC discussion forum post is content-relevant or content-irrelevant
+* Some algorithms achieved ABROCA under 0.01 for female students versus male students,
+but other algorithms (Naive Bayes) had ABROCA as high as 0.06
+* Balancing the size of each group in the training set reduced ABROCA
+Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]
+* Automated essay scoring models inferring text evidence usage
+* All algorithms studied have less than 1% of error explained by whether student is female and male
+Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]
+* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)
+* A range of over-sampling methods tested
+* Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.
+Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&action=download&direct&version=1]
+* Predicting whether course grade will be above or below 0.5
+* Better prediction for female students in some courses, better prediction for male students in other courses
+Permodo et al. (2023)  [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]
+* Paper discusses system that predicts probabilities of on-time graduation
+* DEWS prediction is comparable for males and females
+Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]
+* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on process, 2) commenting on the answer, and 3) relating to self.
+* Models have approximately equal performance for males and females.
+Almoubayyed et al. (2023)[https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]
+* Models discovering generalization of the performance for reading comprehension ability in the context of middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction
+*Model trained on smaller dataset achieves greater fairness in prediction for male and female students
+* For model trained on larger dataset, prediction is more accurate for female students than male students.
+Chiu (2020) [https://files.eric.ed.gov/fulltext/EJ1267654.pdf pdf]
+*Model identifies affective states (boredom, concentration, confusion, frustration, off task and gaming) of middle school students’ online mathematics learning in predicting their choice to study STEM in higher education.
+*Model detects interaction with the ASSISTments system
+*Model performs better for males (AUC =0.641 for RFPS; AUC =0.571 for LR) than female students (AUC = 0.492 for RFPS; AUC=0.535 for LR).
+Cock et al.(2023) [[https://dl.acm.org/doi/abs/10.1145/3576050.3576149?casa_token=6Fjh-EUzN-gAAAAA%3AtpRMYzSAVoQFYNzwY5gwSsrnzHIlI0tUjMq6okwgdcCUmuBMVZEtn8eLO52dCtIYUbrHBV_Il9Sx pdf]]
+* Paper investigates biases in models designed to early identify middle school students at risk of failing in flipped-classroom course and open-ended exploration environment (TugLet)
+* Model performs worse for males in open-ended environment (FNR=0.70 for males and FNR=0.53 for females)
+* Model performs worse for females in flipped classrooms(FNR=0.56 for females and FNR=0.43 for males)
