<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.pcla.wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Shruti</id>
	<title>Penn Center for Learning Analytics Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://www.pcla.wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Shruti"/>
	<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php/Special:Contributions/Shruti"/>
	<updated>2026-05-05T10:24:32Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.37.1</generator>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Socioeconomic_Status&amp;diff=488</id>
		<title>Socioeconomic Status</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Socioeconomic_Status&amp;diff=488"/>
		<updated>2023-11-28T04:14:54Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Yudelson et al. (2014) [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.659.872&amp;amp;rep=rep1&amp;amp;type=pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models discovering generalizable sub-populations of students across different schools to predict students' learning with Carnegie Learning’s Cognitive Tutor (CLCT)&lt;br /&gt;
&lt;br /&gt;
* Models trained on schools with a high proportion of low-SES students performed worse than those trained on schools with a medium or low proportion&lt;br /&gt;
* Models trained on schools with a low or medium proportion of low-SES students performed similarly well for schools with high proportions of low-SES students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting undergraduate course grades and average GPA&lt;br /&gt;
&lt;br /&gt;
* Students from low-income households were inaccurately predicted to perform worse on both short-term (final course grade) and long-term (average GPA) outcomes&lt;br /&gt;
* Fairness of model improved if it included only clickstream and survey data&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online programs&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse accuracy and true negative rates for residential students with greater financial needs&lt;br /&gt;
* The model showed better recall for students with greater financial needs, especially for those studying in person&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kung &amp;amp; Yu (2020)&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3386527.3406755 pdf]&lt;br /&gt;
* Predicting course grades and later GPA at public U.S. university&lt;br /&gt;
* Equal performance for low-income and high-income students in course grade prediction for several algorithms and metrics&lt;br /&gt;
* Worse performance on independence for low-income students than high-income students in later GPA prediction for four of five algorithms; one algorithm had worse separation and two had worse sufficiency (see the fairness-criteria sketch after this entry)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied had less than 1% of error explained by whether the student receives free/reduced-price lunch&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* Model was unable to predict student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* Model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
&lt;br /&gt;
* Paper discusses a system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is more accurate for low-income students than non-low-income students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cock et al. (2023) [https://dl.acm.org/doi/abs/10.1145/3576050.3576149?casa_token=6Fjh-EUzN-gAAAAA%3AtpRMYzSAVoQFYNzwY5gwSsrnzHIlI0tUjMq6okwgdcCUmuBMVZEtn8eLO52dCtIYUbrHBV_Il9Sx pdf]&lt;br /&gt;
&lt;br /&gt;
* Paper investigates biases in models designed for early identification of middle school students at risk of failing in a flipped-classroom course and an open-ended exploration environment (TugLet)&lt;br /&gt;
* Model performs worse for students from schools with higher socio-economic status in the open-ended environment (FNR = 0.73 for higher SES and FNR = 0.57 for medium SES).&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
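The Kung and Yu entry above reports group differences on independence, separation, and sufficiency. As a reference point, here is a minimal sketch of those three fairness criteria, assuming binary labels and predictions in a pandas DataFrame with hypothetical columns "group", "y_true", and "y_pred"; equal per-group rates correspond to satisfying each criterion. This illustrates the standard definitions, not the paper's own code.

```python
# Per-group rates behind the three fairness criteria named above.
# Equal positive rates across groups ~ independence; equal TPR/FPR ~
# separation; equal PPV ~ sufficiency. Column names are assumptions.
import pandas as pd

def fairness_report(df: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for group, g in df.groupby("group"):
        tp = ((g.y_pred == 1) & (g.y_true == 1)).sum()
        fp = ((g.y_pred == 1) & (g.y_true == 0)).sum()
        fn = ((g.y_pred == 0) & (g.y_true == 1)).sum()
        tn = ((g.y_pred == 0) & (g.y_true == 0)).sum()
        rows.append({
            "group": group,
            "positive_rate": (tp + fp) / len(g),                 # independence
            "tpr": tp / (tp + fn) if tp + fn else float("nan"),  # separation
            "fpr": fp / (fp + tn) if fp + tn else float("nan"),  # separation
            "ppv": tp / (tp + fp) if tp + fp else float("nan"),  # sufficiency
        })
    return pd.DataFrame(rows).set_index("group")
```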
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=487</id>
		<title>Gender: Male/Female</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=487"/>
		<updated>2023-11-28T04:13:55Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination against male students), performing particularly well for the Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negative rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Model predicting course outcome&lt;br /&gt;
* Marginal differences between groups were found in prediction quality and in the overall proportion predicted to pass&lt;br /&gt;
* Differences were inconsistent in direction across algorithms.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (defined as median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than female students&lt;br /&gt;
* The fairness of the model (namely demographic parity and equality of opportunity), as well as its accuracy, improved after adjusting the decision threshold from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.&lt;br /&gt;
* The fairness of models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates and worse accuracy for male students&lt;br /&gt;
* The model showed better recall for male students, especially for those studying in person&lt;br /&gt;
* The differences in recall and true negative rates were smaller, and thus fairer, for male students studying online when their socio-demographic information was not included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but  this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) &lt;br /&gt;
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring model (e-rater) for evaluating English essays&lt;br /&gt;
* The e-rater system performed comparably accurately for male and female students when assessing their 11th-grade essays&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
* A later version of the e-rater automated essay scoring model&lt;br /&gt;
* The e-rater system correlated comparably well with human raters when assessing TOEFL and GRE essays written by male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC for male than female students; results were mixed for F1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022)&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across gender groups. &lt;br /&gt;
* No gender group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]&lt;br /&gt;
* Models predicting whether student will quit spelling learning activity without completing&lt;br /&gt;
* Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Multiple fairness approaches led to ABROCA of under 0.01 for female versus male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting whether a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06 (see the ABROCA sketch after this entry)&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied had less than 1% of error explained by whether the student is female or male&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* DEWS prediction is comparable for males and females&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback on other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males and females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Almoubayyed et al. (2023) [https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
* Models examining how predictions of reading comprehension ability generalize in the context of middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
* Model trained on the smaller dataset achieves greater fairness in prediction for male and female students&lt;br /&gt;
* For the model trained on the larger dataset, prediction is more accurate for female students than male students.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Chiu (2020) [https://files.eric.ed.gov/fulltext/EJ1267654.pdf pdf]&lt;br /&gt;
* Model uses affective states (boredom, concentration, confusion, frustration, off task, and gaming), detected from middle school students’ interactions with the ASSISTments online mathematics system, to predict their choice to study STEM in higher education&lt;br /&gt;
* Model performs better for male students (AUC = 0.641 for RFPS; AUC = 0.571 for LR) than female students (AUC = 0.492 for RFPS; AUC = 0.535 for LR).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cock et al. (2023) [https://dl.acm.org/doi/abs/10.1145/3576050.3576149?casa_token=6Fjh-EUzN-gAAAAA%3AtpRMYzSAVoQFYNzwY5gwSsrnzHIlI0tUjMq6okwgdcCUmuBMVZEtn8eLO52dCtIYUbrHBV_Il9Sx pdf]&lt;br /&gt;
* Paper investigates biases in models designed for early identification of middle school students at risk of failing in a flipped-classroom course and an open-ended exploration environment (TugLet)&lt;br /&gt;
* Model performs worse for males in the open-ended environment (FNR = 0.70 for males and FNR = 0.53 for females)&lt;br /&gt;
* Model performs worse for females in flipped classrooms (FNR = 0.56 for females and FNR = 0.43 for males)&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
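Several entries above evaluate bias with ABROCA, the area between group-wise ROC curves. A hedged sketch of that metric follows, assuming scikit-learn and hypothetical arrays y_true, y_score, and a boolean group_mask; interpolating both curves onto a common FPR grid is one common implementation choice, not necessarily the one used in the papers cited here.

```python
# ABROCA: integrate the absolute gap between the two groups' ROC
# curves over the false positive rate axis. Inputs are assumptions:
# numpy arrays y_true (0/1), y_score (probabilities), group_mask (bool).
import numpy as np
from sklearn.metrics import roc_curve

def abroca(y_true, y_score, group_mask):
    fpr_a, tpr_a, _ = roc_curve(y_true[group_mask], y_score[group_mask])
    fpr_b, tpr_b, _ = roc_curve(y_true[~group_mask], y_score[~group_mask])
    grid = np.linspace(0.0, 1.0, 1001)       # common FPR grid
    tpr_a_i = np.interp(grid, fpr_a, tpr_a)  # group A ROC on the grid
    tpr_b_i = np.interp(grid, fpr_b, tpr_b)  # group B ROC on the grid
    return np.trapz(np.abs(tpr_a_i - tpr_b_i), grid)
```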
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Socioeconomic_Status&amp;diff=486</id>
		<title>Socioeconomic Status</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Socioeconomic_Status&amp;diff=486"/>
		<updated>2023-11-28T04:12:30Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Yudelson et al. (2014) [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.659.872&amp;amp;rep=rep1&amp;amp;type=pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models discovering generalizable sub-populations of students across different schools to predict students' learning with Carnegie Learning’s Cognitive Tutor (CLCT)&lt;br /&gt;
&lt;br /&gt;
* Models trained on schools with a high proportion of low-SES students performed worse than those trained on schools with a medium or low proportion&lt;br /&gt;
* Models trained on schools with a low or medium proportion of low-SES students performed similarly well for schools with high proportions of low-SES students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting undergraduate course grades and average GPA&lt;br /&gt;
&lt;br /&gt;
* Students from low-income households were inaccurately predicted to perform worse on both short-term (final course grade) and long-term (average GPA) outcomes&lt;br /&gt;
* Fairness of model improved if it included only clickstream and survey data&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online programs&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse accuracy and true negative rates for residential students with greater financial needs&lt;br /&gt;
* The model showed better recall for students with greater financial needs, especially for those studying in person&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kung &amp;amp; Yu (2020)&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3386527.3406755 pdf]&lt;br /&gt;
* Predicting course grades and later GPA at public U.S. university&lt;br /&gt;
* Equal performance for low-income and high-income students in course grade prediction for several algorithms and metrics&lt;br /&gt;
* Worse performance on independence for low-income students than high-income students in later GPA prediction for four of five algorithms; one algorithm had worse separation and two had worse sufficiency&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied had less than 1% of error explained by whether the student receives free/reduced-price lunch&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* Model was unable to predict student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* Model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
&lt;br /&gt;
* Paper discusses a system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is more accurate for low-income students than non-low-income students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cock et al. (2023) [https://dl.acm.org/doi/abs/10.1145/3576050.3576149?casa_token=6Fjh-EUzN-gAAAAA%3AtpRMYzSAVoQFYNzwY5gwSsrnzHIlI0tUjMq6okwgdcCUmuBMVZEtn8eLO52dCtIYUbrHBV_Il9Sx pdf]&lt;br /&gt;
&lt;br /&gt;
* Paper investigates biases in models designed for early identification of middle school students at risk of failing in a flipped-classroom course and an open-ended exploration environment (TugLet)&lt;br /&gt;
* Model performs worse for students from schools with higher socio-economic status in the open-ended environment (FNR = 0.73 for higher SES and FNR = 0.57 for medium SES).&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=International_Students&amp;diff=485</id>
		<title>International Students</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=International_Students&amp;diff=485"/>
		<updated>2023-11-28T04:08:30Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Model predicting undergraduate course grades and average GPA&lt;br /&gt;
&lt;br /&gt;
* International students were inaccurately predicted to receive lower course grades and average GPA than their peers when personal background was included&lt;br /&gt;
* Fairness of the model improved if it included both clickstream and survey data&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Generally worse prediction for international students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cock et al. (2023) [https://dl.acm.org/doi/abs/10.1145/3576050.3576149?casa_token=6Fjh-EUzN-gAAAAA%3AtpRMYzSAVoQFYNzwY5gwSsrnzHIlI0tUjMq6okwgdcCUmuBMVZEtn8eLO52dCtIYUbrHBV_Il9Sx pdf]&lt;br /&gt;
&lt;br /&gt;
* Paper investigates biases in models designed for early identification of middle school students at risk of failing in a flipped-classroom course and an open-ended exploration environment (TugLet)&lt;br /&gt;
* Model performs worse for students with a diploma from a foreign country in the flipped classroom (FNR = 0.58, vs. FNR = 0.42 for students with local diplomas)&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=484</id>
		<title>Gender: Male/Female</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=484"/>
		<updated>2023-11-28T04:00:07Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination against male students), performing particularly well for the Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negative rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Model predicting course outcome&lt;br /&gt;
* Marginal differences between groups were found in prediction quality and in the overall proportion predicted to pass&lt;br /&gt;
* Differences were inconsistent in direction across algorithms.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (defined as median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than female students&lt;br /&gt;
* The fairness of the model (namely demographic parity and equality of opportunity), as well as its accuracy, improved after adjusting the decision threshold from 0.5 to group-specific values (see the threshold sketch after this entry)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.&lt;br /&gt;
* The fairness of models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates and worse accuracy for male students&lt;br /&gt;
* The model showed better recall for male students, especially for those studying in person&lt;br /&gt;
* The differences in recall and true negative rates were smaller, and thus fairer, for male students studying online when their socio-demographic information was not included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but  this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) &lt;br /&gt;
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring model (e-rater) for evaluating English essays&lt;br /&gt;
* The e-rater system performed comparably accurately for male and female students when assessing their 11th-grade essays&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
* A later version of the e-rater automated essay scoring model&lt;br /&gt;
* The e-rater system correlated comparably well with human raters when assessing TOEFL and GRE essays written by male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC for male than female students; results were mixed for F1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022)&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across gender groups. &lt;br /&gt;
* No gender group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]&lt;br /&gt;
* Models predicting whether student will quit spelling learning activity without completing&lt;br /&gt;
* Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Multiple fairness approaches led to ABROCA of under 0.01 for female versus male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting whether a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied had less than 1% of error explained by whether the student is female or male&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* DEWS prediction is comparable for males and females&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback on other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males and females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Almoubayyed et al. (2023) [https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
* Models examining how predictions of reading comprehension ability generalize in the context of middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
* Model trained on the smaller dataset achieves greater fairness in prediction for male and female students&lt;br /&gt;
* For the model trained on the larger dataset, prediction is more accurate for female students than male students.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Chiu (2020) [https://files.eric.ed.gov/fulltext/EJ1267654.pdf pdf]&lt;br /&gt;
* Model uses affective states (boredom, concentration, confusion, frustration, off task, and gaming), detected from middle school students’ interactions with the ASSISTments online mathematics system, to predict their choice to study STEM in higher education&lt;br /&gt;
* Model performs better for male students (AUC = 0.641 for RFPS; AUC = 0.571 for LR) than female students (AUC = 0.492 for RFPS; AUC = 0.535 for LR).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cock et al. (2023) [https://dl.acm.org/doi/abs/10.1145/3576050.3576149?casa_token=6Fjh-EUzN-gAAAAA%3AtpRMYzSAVoQFYNzwY5gwSsrnzHIlI0tUjMq6okwgdcCUmuBMVZEtn8eLO52dCtIYUbrHBV_Il9Sx pdf]&lt;br /&gt;
* Paper investigates biases in models designed for early identification of middle school students at risk of failing in a flipped-classroom course and an open-ended exploration environment (TugLet)&lt;br /&gt;
* Model performs worse for males in the open-ended environment (FNR = 0.70 for males and FNR = 0.53 for females)&lt;br /&gt;
* Model performs worse for females in flipped classrooms (FNR = 0.56 for females and FNR = 0.43 for males)&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
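The Lee and Kizilcec entry above improves fairness by replacing the default 0.5 cutoff with group-specific thresholds. A minimal sketch of one way to do this follows, choosing per-group cutoffs that reach a common true positive rate (an equality-of-opportunity style correction); the DataFrame columns "group", "y_true", and "score", and the target TPR, are assumptions for illustration, not the paper's procedure.

```python
# Group-specific decision thresholds: instead of score >= 0.5 for
# everyone, pick for each group the highest cutoff whose TPR still
# reaches a shared target, approximately equalizing opportunity.
import numpy as np
import pandas as pd

def tpr_at(threshold, y_true, score):
    pred = score >= threshold
    pos = y_true == 1
    return (pred & pos).sum() / max(pos.sum(), 1)

def group_thresholds(df: pd.DataFrame, target_tpr: float = 0.8) -> dict:
    cutoffs = {}
    for group, g in df.groupby("group"):
        candidates = np.linspace(0.01, 0.99, 99)
        feasible = [t for t in candidates
                    if tpr_at(t, g.y_true.values, g.score.values) >= target_tpr]
        cutoffs[group] = max(feasible) if feasible else 0.5  # fall back to default
    return cutoffs
```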
	<entry>
		<id>https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=483</id>
		<title>At-risk/Dropout/Stopout/Graduation Prediction</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=483"/>
		<updated>2023-11-28T03:54:18Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination against African-American students), while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
* MCCM was likewise the best at reducing bias against male students, performing particularly well for the Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negative rates were greater for Latino students when Decision Tree and Random Forest were used&lt;br /&gt;
* White students had higher false positive rates across all models (Decision Tree, SVM, Logistic Regression, Random Forest, and SGD)&lt;br /&gt;
* False negative rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander students.&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms performed worse for female students than male students, particularly in courses with 45% or less male presence (a per-slice evaluation sketch follows this entry)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Baker et al. (2020) [https://www.upenn.edu/learninganalytics/ryanbaker/BakerBerningGowda.pdf pdf]&lt;br /&gt;
* Model predicting student graduation and SAT scores for military-connected students&lt;br /&gt;
* For prediction of graduation, algorithms applied across populations achieved an AUC of 0.60, degrading from their original performance of 0.70 or 0.71 toward chance.&lt;br /&gt;
* For prediction of SAT scores, algorithms applied across populations achieved Spearman's ρ of 0.42 and 0.44, degrading about a third of the way from their original performance toward chance.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kai et al. (2017) [https://files.eric.ed.gov/fulltext/ED596601.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
* JRip decision rules achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* The model showed better recall for students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Both accuracy and true negative rates were better for students who are first-generation or have greater financial needs&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC and F1 for students who attended public high schools than for students who attended private high schools.&lt;br /&gt;
* Several algorithms achieved better AUC for male students than female students; F1 scores were more balanced.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting dropout in XuetangX platform using neural network&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, dropout performance was slightly better for males.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* Model was unable to predict student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* Model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses a system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is less accurate for White students than other students&lt;br /&gt;
* Prediction is more accurate for students with disabilities than students without disabilities&lt;br /&gt;
* Prediction is more accurate for low-income students than for non-low-income students&lt;br /&gt;
* Prediction is comparable for male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cock et al. (2023) [https://dl.acm.org/doi/abs/10.1145/3576050.3576149?casa_token=6Fjh-EUzN-gAAAAA%3AtpRMYzSAVoQFYNzwY5gwSsrnzHIlI0tUjMq6okwgdcCUmuBMVZEtn8eLO52dCtIYUbrHBV_Il9Sx pdf]&lt;br /&gt;
* Paper investigates biases in models designed for early identification of middle school students at risk of failing in a flipped-classroom course and an open-ended exploration environment (TugLet)&lt;br /&gt;
* Model performs worse for students from schools with higher socio-economic status in the open-ended environment (FNR = 0.73 for higher SES and 0.57 for medium SES)&lt;br /&gt;
* Model performs worse for males in the open-ended environment (higher FNR for males than females)&lt;br /&gt;
* Model performs worse for students with a diploma from a foreign country in the flipped classroom&lt;br /&gt;
* Model performs worse for females in flipped classrooms&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
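Many entries above (Kai et al.; Gardner, Brooks and Baker) compare Kappa and AUC across demographic slices rather than on the pooled test set. The following sketch shows that per-slice evaluation pattern, assuming scikit-learn and a DataFrame with hypothetical columns y_true, y_pred, and score plus a slice column such as "gender"; each slice must contain both outcome classes for AUC to be defined.

```python
# Slicing analysis: recompute Kappa and AUC separately for each
# demographic slice of the test set. Column names are assumptions.
import pandas as pd
from sklearn.metrics import cohen_kappa_score, roc_auc_score

def slice_metrics(df: pd.DataFrame, slice_col: str) -> pd.DataFrame:
    rows = []
    for value, g in df.groupby(slice_col):
        rows.append({
            slice_col: value,
            "n": len(g),
            "kappa": cohen_kappa_score(g.y_true, g.y_pred),
            "auc": roc_auc_score(g.y_true, g.score),  # needs both classes present
        })
    return pd.DataFrame(rows)

# e.g. slice_metrics(test_df, "gender") or slice_metrics(test_df, "ses_band")
```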
	<entry>
		<id>https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=482</id>
		<title>At-risk/Dropout/Stopout/Graduation Prediction</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=482"/>
		<updated>2023-11-28T03:53:47Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination against African-American students), while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
* MCCM was likewise the best at reducing bias against male students, performing particularly well for the Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negative rates were greater for Latino students when Decision Tree and Random Forest were used&lt;br /&gt;
* White students had higher false positive rates across all models (Decision Tree, SVM, Logistic Regression, Random Forest, and SGD)&lt;br /&gt;
* False negative rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander students.&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Baker et al. (2020) [https://www.upenn.edu/learninganalytics/ryanbaker/BakerBerningGowda.pdf pdf]&lt;br /&gt;
* Model predicting student graduation and SAT scores for military-connected students&lt;br /&gt;
* For prediction of graduation, algorithms applied across populations achieved an AUC of 0.60, degrading from their original performance of 0.70 or 0.71 toward chance.&lt;br /&gt;
* For prediction of SAT scores, algorithms applied across populations achieved Spearman's ρ of 0.42 and 0.44, degrading about a third of the way from their original performance toward chance.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kai et al. (2017) [https://files.eric.ed.gov/fulltext/ED596601.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
* JRip decision rules achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* The model showed better recall for students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Both accuracy and true negative rates were better for students who are first-generation or have greater financial needs&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC and F1 for students who attended public high schools than for students who attended private high schools.&lt;br /&gt;
* Several algorithms achieved better AUC for male students than female students; F1 scores were more balanced.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting dropout in XuetangX platform using neural network&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, dropout performance was slightly better for males.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* Model was unable to predict student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* Model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses a system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is less accurate for White students than other students&lt;br /&gt;
* Prediction is more accurate for students with disabilities than students without disabilities&lt;br /&gt;
* Prediction is more accurate for low-income students than for non-low-income students&lt;br /&gt;
* Prediction is comparable for male and female students&lt;br /&gt;
&lt;br /&gt;
Cock et al. (2023) [https://dl.acm.org/doi/abs/10.1145/3576050.3576149?casa_token=6Fjh-EUzN-gAAAAA%3AtpRMYzSAVoQFYNzwY5gwSsrnzHIlI0tUjMq6okwgdcCUmuBMVZEtn8eLO52dCtIYUbrHBV_Il9Sx pdf]&lt;br /&gt;
* Paper investigates biases in models designed for early identification of middle school students at risk of failing in a flipped-classroom course and an open-ended exploration environment (TugLet)&lt;br /&gt;
* Model performs worse for students from schools with higher socio-economic status in the open-ended environment (FNR = 0.73 for higher SES and 0.57 for medium SES)&lt;br /&gt;
* Model performs worse for males in the open-ended environment (higher FNR for males than females)&lt;br /&gt;
* Model performs worse for students with a diploma from a foreign country in the flipped classroom&lt;br /&gt;
* Model performs worse for females in flipped classrooms&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
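The Sha et al. entries above test over-sampling methods that balance demographic group sizes in the training set. Here is a minimal sketch of the simplest such method, random over-sampling by group, assuming a pandas DataFrame with a hypothetical group column; the papers compare a wider range of methods than this.

```python
# Random over-sampling by demographic group: re-sample every group up
# to the size of the largest group before model fitting, so each group
# is equally represented. Column name is an assumption.
import pandas as pd

def oversample_groups(train: pd.DataFrame, group_col: str,
                      seed: int = 0) -> pd.DataFrame:
    largest = train[group_col].value_counts().max()
    parts = [
        g.sample(n=largest, replace=len(g) < largest, random_state=seed)
        for _, g in train.groupby(group_col)
    ]
    # Shuffle so groups are interleaved rather than blocked.
    return pd.concat(parts).sample(frac=1.0, random_state=seed)
```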
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=481</id>
		<title>Gender: Male/Female</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=481"/>
		<updated>2023-10-01T04:45:35Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination against male students), performing particularly well for the Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negative rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Model predicting course outcome&lt;br /&gt;
* Marginal differences between groups were found in prediction quality and in the overall proportion predicted to pass&lt;br /&gt;
* Differences were inconsistent in direction across algorithms.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (defined as median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than female students&lt;br /&gt;
* The fairness of the model (namely demographic parity and equality of opportunity), as well as its accuracy, improved after adjusting the decision threshold from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.&lt;br /&gt;
* The fairness of models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates and worse accuracy for male students&lt;br /&gt;
* The model showed better recall for male students, especially for those studying in person&lt;br /&gt;
* The differences in recall and true negative rates were smaller, and thus fairer, for male students studying online when their socio-demographic information was not included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but  this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) &lt;br /&gt;
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring model (e-rater) for evaluating English essays&lt;br /&gt;
* The e-rater system performed comparably accurately for male and female students when assessing their 11th-grade essays&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* The E-Rater system correlated comparably well with human raters when assessing TOEFL and GRE essays written by male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC for male than female students; results were mixed for F1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022)&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across gender groups. &lt;br /&gt;
* No gender group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]&lt;br /&gt;
* Models predicting whether a student will quit a spelling learning activity without completing it&lt;br /&gt;
* Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Multiple fairness approaches led to ABROCA of under 0.01 for female versus male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting whether a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA (computation sketch below)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether the student is female or male&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females (over-sampling sketch below).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses Wisconsin’s Dropout Early Warning System (DEWS), which predicts probabilities of on-time graduation&lt;br /&gt;
* DEWS prediction is comparable for males and females&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males and females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Almoubayyed et al. (2023) [https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
* Models predicting reading comprehension ability, evaluated for how well they generalize, from middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
* Model trained on the smaller dataset achieves greater fairness in prediction for male and female students&lt;br /&gt;
* For the model trained on the larger dataset, prediction is more accurate for female students than for male students.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Chiu (2020) [https://files.eric.ed.gov/fulltext/EJ1267654.pdf pdf]&lt;br /&gt;
* Model uses detected affective states (boredom, concentration, confusion, frustration, off-task behavior, and gaming) from middle school students’ online mathematics learning to predict their choice to study STEM in higher education&lt;br /&gt;
* Detectors operate on students’ interactions with the ASSISTments system&lt;br /&gt;
* Model performs better for male students (AUC = 0.641 for RFPS; AUC = 0.571 for LR) than for female students (AUC = 0.492 for RFPS; AUC = 0.535 for LR).&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=480</id>
		<title>Gender: Male/Female</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=480"/>
		<updated>2023-10-01T04:45:06Z</updated>

		<summary type="html">&lt;p&gt;Shruti: Addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students (per-group Kappa/AUC sketch below)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting students' high school dropout&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination against male students), performing particularly well for the Psychology course&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negative rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Model predicting course outcome&lt;br /&gt;
* Marginal differences were found in prediction quality and in the overall proportion of students predicted to pass between groups&lt;br /&gt;
* Differences were inconsistent in direction across algorithms&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (earning the median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than female students&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after adjusting the decision threshold from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.&lt;br /&gt;
* The fairness of models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online programs&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse true negative rates and worse accuracy for male students&lt;br /&gt;
* The model showed better recall for male students, especially for those studying in person&lt;br /&gt;
* The differences in recall and true negative rates were smaller, and thus fairer, for male students studying online if their socio-demographic information was not included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* The E-Rater system performed with comparable accuracy for male and female students when assessing their 11th grade essays&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* The E-Rater system correlated comparably well with human raters when assessing TOEFL and GRE essays written by male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC for male than female students; results were mixed for F1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022)&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across gender groups. &lt;br /&gt;
* No gender group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]&lt;br /&gt;
* Models predicting whether a student will quit a spelling learning activity without completing it&lt;br /&gt;
* Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Multiple fairness approaches led to ABROCA of under 0.01 for female versus male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting whether a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether the student is female or male&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses Wisconsin’s Dropout Early Warning System (DEWS), which predicts probabilities of on-time graduation&lt;br /&gt;
* DEWS prediction is comparable for males and females&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males and females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Almoubayyed et al. (2023) [https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
* Models predicting reading comprehension ability, evaluated for how well they generalize, from middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
* Model trained on the smaller dataset achieves greater fairness in prediction for male and female students&lt;br /&gt;
* For the model trained on the larger dataset, prediction is more accurate for female students than for male students.&lt;br /&gt;
&lt;br /&gt;
Chiu (2020) [https://files.eric.ed.gov/fulltext/EJ1267654.pdf pdf]&lt;br /&gt;
* Model uses detected affective states (boredom, concentration, confusion, frustration, off-task behavior, and gaming) from middle school students’ online mathematics learning to predict their choice to study STEM in higher education&lt;br /&gt;
* Detectors operate on students’ interactions with the ASSISTments system&lt;br /&gt;
* Model performs better for male students (AUC = 0.641 for RFPS; AUC = 0.571 for LR) than for female students (AUC = 0.492 for RFPS; AUC = 0.535 for LR).&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Engagement_and_Affect_Detection&amp;diff=479</id>
		<title>Engagement and Affect Detection</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Engagement_and_Affect_Detection&amp;diff=479"/>
		<updated>2023-10-01T04:38:07Z</updated>

		<summary type="html">&lt;p&gt;Shruti: Addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Ocumpaugh et al. (2014) [https://bera-journals.onlinelibrary.wiley.com/doi/pdf/10.1111/bjet.12156 pdf]&lt;br /&gt;
&lt;br /&gt;
* Models detecting student affective states (boredom, confusion, engaged concentration, frustration) from their interactions with the ASSISTments system&lt;br /&gt;
* Study involved urban, rural, and suburban learners&lt;br /&gt;
* Detectors generally performed the best for the same subpopulation that they were trained on (average kappa = 0.26, A′ = 0.67), and worse for other subpopulations (average kappa = 0.03 and A′ = 0.52)&lt;br /&gt;
* Detectors trained on the combined population generally performed better for urban and suburban populations (kappa = 0.18, 0.16; A′ = 0.62, 0.66) and not as well for the rural population (kappa = 0.06; A′ = 0.54) (see the sketch below)&lt;br /&gt;
Chiu (2020) [https://files.eric.ed.gov/fulltext/EJ1267654.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Model uses detected affective states (boredom, concentration, confusion, frustration, off-task behavior, and gaming) from middle school students’ online mathematics learning to predict their choice to study STEM in higher education&lt;br /&gt;
* Detectors operate on students’ interactions with the ASSISTments system&lt;br /&gt;
* Model performs better for male students (AUC = 0.641 for RFPS; AUC = 0.571 for LR) than for female students (AUC = 0.492 for RFPS; AUC = 0.535 for LR)&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=478</id>
		<title>Gender: Male/Female</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=478"/>
		<updated>2023-08-17T17:07:34Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting students' high school dropout&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination against male students), performing particularly well for the Psychology course&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negative rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Model predicting course outcome&lt;br /&gt;
* Marginal differences were found in prediction quality and in the overall proportion of students predicted to pass between groups&lt;br /&gt;
* Differences were inconsistent in direction across algorithms&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (earning the median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than female students&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after adjusting the decision threshold from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.&lt;br /&gt;
* The fairness of models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online programs&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse true negative rates and worse accuracy for male students&lt;br /&gt;
* The model showed better recall for male students, especially for those studying in person&lt;br /&gt;
* The differences in recall and true negative rates were smaller, and thus fairer, for male students studying online if their socio-demographic information was not included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* The E-Rater system performed with comparable accuracy for male and female students when assessing their 11th grade essays&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* The E-Rater system correlated comparably well with human raters when assessing TOEFL and GRE essays written by male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC for male than female students; results were mixed for F1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022)&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across gender groups. &lt;br /&gt;
* No gender group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]&lt;br /&gt;
* Models predicting whether a student will quit a spelling learning activity without completing it&lt;br /&gt;
* Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Multiple fairness approaches led to ABROCA of under 0.01 for female versus male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting whether a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether the student is female or male&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses Wisconsin’s Dropout Early Warning System (DEWS), which predicts probabilities of on-time graduation&lt;br /&gt;
* DEWS prediction is comparable for males and females&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males and females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Almoubayyed et al. (2023) [https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
* Models predicting reading comprehension ability, evaluated for how well they generalize, from middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
* Model trained on the smaller dataset achieves greater fairness in prediction for male and female students&lt;br /&gt;
* For the model trained on the larger dataset, prediction is more accurate for female students than for male students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Student_Knowledge_Modeling&amp;diff=477</id>
		<title>Student Knowledge Modeling</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Student_Knowledge_Modeling&amp;diff=477"/>
		<updated>2023-08-17T17:06:16Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Yudelson et al. (2014) [https://www.yudelson.info/pdf/EDM2014_YudelsonFRBNJ.pdf pdf]&lt;br /&gt;
*Models discovering generalizable sub-populations of students across different schools to predict students' learning with Carnegie Learning’s Cognitive Tutor (CLCT)&lt;br /&gt;
*Models trained on schools with a high proportion of low-SES students performed worse than those trained on schools with a medium or low proportion&lt;br /&gt;
*Models trained on schools with low or medium proportions of low-SES students performed similarly well for schools with high proportions of low-SES students&lt;br /&gt;
Almoubayyed et al. (2023) [https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting reading comprehension ability, evaluated for how well they generalize, from middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
* Model trained on the smaller dataset achieves greater fairness in prediction for male/female as well as white/non-white students&lt;br /&gt;
* For the model trained on the larger dataset, prediction is more accurate for white and female students than for non-white and male students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=White_Learners_in_North_America&amp;diff=476</id>
		<title>White Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=White_Learners_in_North_America&amp;diff=476"/>
		<updated>2023-08-17T17:05:42Z</updated>

		<summary type="html">&lt;p&gt;Shruti: addition new paper&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater &lt;br /&gt;
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by White and African American students than other groups&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when race was included in the model (see the sketch below)&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups. &lt;br /&gt;
* No racial/ethnic group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Compared members of overrepresented racial groups to members of underrepresented racial groups (overrepresented group approximately 90% White)&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for overrepresented versus underrepresented students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sulaiman &amp;amp; Roy (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Sulaiman_Transformers.pdf]&lt;br /&gt;
* Models predicting whether a law student will pass the bar exam (to practice law)&lt;br /&gt;
* Compared White and non-White students&lt;br /&gt;
* Models not applying fairness constraints performed significantly worse for White students in terms of ABROCA&lt;br /&gt;
* Models applying fairness constraints performed equivalently for White and non-White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, the model tends to overpredict White students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses Wisconsin’s Dropout Early Warning System (DEWS), which predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is less accurate for White students than for other students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al.(2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for White, African American and Hispanic/Latinx students.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Almoubayyed et al. (2023) [https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting reading comprehension ability, evaluated for how well they generalize, from middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
* Model trained on the smaller dataset achieves greater fairness in prediction for white and non-white students&lt;br /&gt;
* For the model trained on the larger dataset, prediction is more accurate for white students than for non-white students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=475</id>
		<title>Gender: Male/Female</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=475"/>
		<updated>2023-08-17T16:57:36Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting students' high school dropout&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination against male students), performing particularly well for the Psychology course&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negative rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Model predicting course outcome&lt;br /&gt;
* Marginal differences were found in prediction quality and in the overall proportion of students predicted to pass between groups&lt;br /&gt;
* Differences were inconsistent in direction across algorithms&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (earning the median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than female students&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after adjusting the decision threshold from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.&lt;br /&gt;
* The fairness of models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online programs&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse true negative rates and worse accuracy for male students&lt;br /&gt;
* The model showed better recall for male students, especially for those studying in person&lt;br /&gt;
* The differences in recall and true negative rates were smaller, and thus fairer, for male students studying online if their socio-demographic information was not included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* The E-Rater system performed with comparable accuracy for male and female students when assessing their 11th grade essays&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* The E-Rater system correlated comparably well with human raters when assessing TOEFL and GRE essays written by male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC for male than female students; results were mixed for F1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022)&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across gender groups. &lt;br /&gt;
* No gender group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]&lt;br /&gt;
* Models predicting whether a student will quit a spelling learning activity without completing it&lt;br /&gt;
* Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Multiple fairness approaches led to ABROCA of under 0.01 for female versus male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting whether a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether the student is female or male&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses Wisconsin’s Dropout Early Warning System (DEWS), which predicts probabilities of on-time graduation&lt;br /&gt;
* DEWS prediction is comparable for males and females&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males and females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Almoubayyed et al. (2023) [https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
* Models predicting reading comprehension ability, evaluated for how well they generalize, from middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
* Model trained on the smaller dataset achieves greater fairness in prediction for male and female students&lt;br /&gt;
* For the model trained on the larger dataset, prediction is more accurate for female students than for male students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Student_Knowledge_Modeling&amp;diff=474</id>
		<title>Student Knowledge Modeling</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Student_Knowledge_Modeling&amp;diff=474"/>
		<updated>2023-08-17T16:36:45Z</updated>

		<summary type="html">&lt;p&gt;Shruti: addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Yudelson et al. (2014) [https://www.yudelson.info/pdf/EDM2014_YudelsonFRBNJ.pdf pdf]&lt;br /&gt;
*Models discovering generalizable sub-populations of students across different schools to predict students' learning with Carnegie Learning’s Cognitive Tutor (CLCT)&lt;br /&gt;
*Models trained on schools with a high proportion of low-SES students performed worse than those trained on schools with a medium or low proportion&lt;br /&gt;
*Models trained on schools with low or medium proportions of low-SES students performed similarly well for schools with high proportions of low-SES students&lt;br /&gt;
Almoubayyed et al. (2023) [https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting reading comprehension ability, evaluated for how well they generalize, from middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
* Model trained on the smaller dataset achieves greater fairness in prediction for male/female as well as white/non-white students&lt;br /&gt;
* For the model trained on the larger dataset, prediction is more accurate for white and female students than for non-white and male students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Student_Knowledge_Modeling&amp;diff=473</id>
		<title>Student Knowledge Modeling</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Student_Knowledge_Modeling&amp;diff=473"/>
		<updated>2023-08-17T16:34:54Z</updated>

		<summary type="html">&lt;p&gt;Shruti: new&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Yudelson et al. (2014) [https://www.yudelson.info/pdf/EDM2014_YudelsonFRBNJ.pdf pdf]&lt;br /&gt;
*Models discovering generalizable sub-populations of students across different schools to predict students' learning with Carnegie Learning’s Cognitive Tutor (CLCT)&lt;br /&gt;
*Models trained on schools with a high proportion of low-SES students performed worse than those trained on schools with a medium or low proportion&lt;br /&gt;
*Models trained on schools with low or medium proportions of low-SES students performed similarly well for schools with high proportions of low-SES students&lt;br /&gt;
Almoubayyed et al. (2023) [https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=471</id>
		<title>At-risk/Dropout/Stopout/Graduation Prediction</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=471"/>
		<updated>2023-06-29T01:17:31Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination against African-American students), while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
* MCCM was likewise the best at reducing bias against male students, performing particularly well for the Psychology course&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negative rates were greater for Latino students when Decision Tree and Random Forest were used&lt;br /&gt;
* White students had higher false positive rates across all models: Decision Tree, SVM, Logistic Regression, Random Forest, and SGD&lt;br /&gt;
* False negative rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting students' high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and  Native Hawaiian and Pacific Islander.&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Baker et al. (2020) [https://www.upenn.edu/learninganalytics/ryanbaker/BakerBerningGowda.pdf pdf]&lt;br /&gt;
* Model predicting student graduation and SAT scores for military-connected students&lt;br /&gt;
* For prediction of graduation, algorithms applied across populations resulted in an AUC of 0.60, degrading from their original performance of 0.70 or 0.71 roughly halfway toward chance.&lt;br /&gt;
* For prediction of SAT scores, algorithms applied across populations resulted in a Spearman's ρ of 0.42 and 0.44, degrading by a third of the distance from their original performance toward chance.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kai et al. (2017) [https://files.eric.ed.gov/fulltext/ED596601.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J-48 decision trees achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
* J-Rip decision rules achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online programs&lt;br /&gt;
* The model showed better recall for students who are under-represented minorities (URM; not White or Asian), male, or first-generation, or who have greater financial needs&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse accuracy and true negative rates for residential students who are under-represented minorities, male, or first-generation, or who have greater financial needs&lt;br /&gt;
* Both accuracy and true negative rates were better for students who are first-generation or have greater financial needs&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC and F1 for students who attended public high schools than for students who attended private high schools.&lt;br /&gt;
* Several algorithms achieved better AUC for male students than female students; F1 scores were more balanced.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting dropout on the XuetangX platform using a neural network&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, dropout performance was slightly better for males.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* Model was unable to predict student success (F1 = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* Model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses Wisconsin’s Dropout Early Warning System (DEWS), which predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is less accurate for White students than for other students&lt;br /&gt;
* Prediction is more accurate for students with disabilities than for students without disabilities&lt;br /&gt;
* Prediction is more accurate for low-income students than for non-low-income students&lt;br /&gt;
* DEWS prediction is comparable for males and females&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=470</id>
		<title>At-risk/Dropout/Stopout/Graduation Prediction</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=470"/>
		<updated>2023-06-29T01:16:36Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students (per-group Kappa/AUC comparisons of this kind are sketched below)&lt;br /&gt;
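A minimal sketch (our illustration, not the paper’s code) of this kind of per-group comparison: compute Kappa and AUC separately on each demographic slice of the test set. Names below are hypothetical; numpy arrays and scikit-learn are assumed.&lt;br /&gt;
&lt;pre&gt;
import numpy as np
from sklearn.metrics import cohen_kappa_score, roc_auc_score

def slice_metrics(y_true, y_prob, groups, threshold=0.5):
    """Return {group: (kappa, auc)} computed on each demographic slice."""
    y_pred = np.greater_equal(y_prob, threshold).astype(int)
    results = {}
    for g in np.unique(groups):
        mask = groups == g
        results[g] = (cohen_kappa_score(y_true[mask], y_pred[mask]),
                      roc_auc_score(y_true[mask], y_prob[mask]))
    return results
&lt;/pre&gt;&lt;br /&gt;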
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination) against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
* MCCM was also the best at reducing bias (discrimination) against male students, performing particularly well for the Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negative rates were greater for Latino students when Decision Tree and Random Forest models were used&lt;br /&gt;
* White students had higher false positive rates across all models: Decision Tree, SVM, Logistic Regression, Random Forest, and SGD&lt;br /&gt;
* False negative rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander students.&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Baker et al. (2020) [https://www.upenn.edu/learninganalytics/ryanbaker/BakerBerningGowda.pdf pdf]&lt;br /&gt;
* Models predicting student graduation and SAT scores for military-connected students&lt;br /&gt;
* For prediction of graduation, algorithms applied across populations degraded to an AUC of 0.60, down from their original performance of 0.70 or 0.71 and about halfway toward chance (a cross-population evaluation of this kind is sketched below).&lt;br /&gt;
* For prediction of SAT scores, algorithms applied across populations degraded to a Spearman's ρ of 0.42 and 0.44, about a third of the way from their original performance toward chance.&lt;br /&gt;
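A minimal sketch (assumptions ours) of a cross-population evaluation like this: fit on one population, then compare the AUC there with the AUC on another population. Logistic regression stands in for whatever model is being transferred, and a held-out split would normally be used for the within-population figure.&lt;br /&gt;
&lt;pre&gt;
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def cross_population_auc(X_a, y_a, X_b, y_b):
    """Train on population A; report AUC on A (optimistic) and on B."""
    model = LogisticRegression(max_iter=1000).fit(X_a, y_a)
    auc_within = roc_auc_score(y_a, model.predict_proba(X_a)[:, 1])
    auc_across = roc_auc_score(y_b, model.predict_proba(X_b)[:, 1])
    return auc_within, auc_across   # e.g. roughly 0.70 within vs 0.60 across
&lt;/pre&gt;&lt;br /&gt;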
&lt;br /&gt;
&lt;br /&gt;
Kai et al. (2017) [https://files.eric.ed.gov/fulltext/ED596601.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
* JRip decision rules achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online programs&lt;br /&gt;
* The model showed better recall for students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse accuracy and true negative rates for residential students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Both accuracy and true negative rates were better for online students who are first-generation or have greater financial needs&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* Algorithms predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC and F1 for students who attended public high schools than for students who attended private high schools.&lt;br /&gt;
* Several algorithms achieved better AUC for male students than female students; F1 scores were more balanced.&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting dropout on the XuetangX platform using a neural network&lt;br /&gt;
* A range of over-sampling methods was tested&lt;br /&gt;
* Regardless of the over-sampling method used, dropout prediction performance was slightly better for males.&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting which secondary school students are at risk of failure or dropping out&lt;br /&gt;
* The model was unable to predict student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* The model also had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses a system (DEWS) that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is less accurate for White students than other students&lt;br /&gt;
* Prediction is more accurate for students with disabilities than students without disabilities&lt;br /&gt;
* Prediction is more accurate for low-income students than for non-low-income students&lt;br /&gt;
* DEWS prediction is comparable for males and females&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Socioeconomic_Status&amp;diff=469</id>
		<title>Socioeconomic Status</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Socioeconomic_Status&amp;diff=469"/>
		<updated>2023-06-29T01:15:10Z</updated>

		<summary type="html">&lt;p&gt;Shruti: addition new paper&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Yudelson et al. (2014) [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.659.872&amp;amp;rep=rep1&amp;amp;type=pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models discovering generalizable sub-populations of students across different schools to predict students' learning with Carnegie Learning’s Cognitive Tutor (CLCT)&lt;br /&gt;
&lt;br /&gt;
* Models trained on schools with a high proportion of low-SES students performed worse than those trained on schools with medium or low proportions&lt;br /&gt;
* Models trained on schools with low or medium proportions of low-SES students performed similarly well for schools with high proportions of low-SES students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting undergraduate course grades and average GPA&lt;br /&gt;
&lt;br /&gt;
* Students from low-income households were inaccurately predicted to perform worse on both short-term (final course grade) and long-term (GPA) outcomes&lt;br /&gt;
* The fairness of the model improved if it included only clickstream and survey data&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online programs&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse accuracy and true negative rates for residential students with greater financial needs&lt;br /&gt;
* The model showed better recall for students with greater financial needs, especially for those studying in person&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kung &amp;amp; Yu (2020)&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3386527.3406755 pdf]&lt;br /&gt;
* Predicting course grades and later GPA at a public U.S. university&lt;br /&gt;
* Equal performance for low-income and high-income students in course grade prediction for several algorithms and metrics&lt;br /&gt;
* Worse performance on independence for low-income students than high-income students in later GPA prediction for four of five algorithms; one algorithm had worse separation and two algorithms had worse sufficiency&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether a student receives free/reduced-price lunch&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting which secondary school students are at risk of failure or dropping out&lt;br /&gt;
* The model was unable to predict student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* The model also had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
&lt;br /&gt;
* Paper discusses a system (DEWS) that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is more accurate for low-income students than non-low-income students&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Learners_with_Disabilities&amp;diff=468</id>
		<title>Learners with Disabilities</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Learners_with_Disabilities&amp;diff=468"/>
		<updated>2023-06-29T01:13:48Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Loukina &amp;amp; Buzick (2017) [https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/ets2.12170 pdf]&lt;br /&gt;
* A model (the SpeechRater) automatically scoring open-ended spoken responses, applied to speakers with documented or suspected speech impairments&lt;br /&gt;
* SpeechRater was less accurate for test takers who were deferred for signs of speech impairment (ρ&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; = .57) than test takers who were given accommodations for documented disabilities (ρ&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; = .73)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* Disparate impact was found for students with self-declared disabilities, with systematic inaccuracies in predictions for learners in this group.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses a system (DEWS) that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is more accurate for students with disabilities than students without disabilities&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Other_NLP_Applications_of_Algorithms_in_Education&amp;diff=467</id>
		<title>Other NLP Applications of Algorithms in Education</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Other_NLP_Applications_of_Algorithms_in_Education&amp;diff=467"/>
		<updated>2023-06-29T01:10:26Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Naismith et al. (2018) [http://d-scholarship.pitt.edu/40665/1/EDM2018_paper_37.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* A model that measures L2 learners’ lexical sophistication with frequency lists based on native-speaker corpora&lt;br /&gt;
* Arabic-speaking learners are rated systematically lower across all levels of English proficiency than speakers of Chinese, Japanese, Korean, and Spanish.&lt;br /&gt;
* Level 5 Arabic-speaking learners are unfairly evaluated as having a similar level of lexical sophistication to Level 4 learners from China, Japan, Korea, and Spain.&lt;br /&gt;
* When used on the ETS corpus, “high”-labeled essays by Japanese-speaking learners are rated significantly lower in lexical sophistication than those of their Arabic-, Chinese-, Korean-, and Spanish-speaking peers.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Samei et al. (2015) [https://files.eric.ed.gov/fulltext/ED560879.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting classroom discourse properties (e.g. authenticity and uptake)&lt;br /&gt;
* Model trained on urban students (authenticity: 0.62, uptake: 0.60) performed with similar accuracy when tested on non-urban students (authenticity: 0.62, uptake: 0.62)&lt;br /&gt;
* Model trained on non-urban (authenticity: 0.61, uptake: 0.59) performed with similar accuracy when tested on urban students (authenticity: 0.60, uptake: 0.63)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting whether a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* MOOCs taught in English&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* ABROCA varied from 0.03 to 0.08 for non-native speakers of English versus native speakers&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA values (a sketch of the ABROCA computation follows below)&lt;br /&gt;
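For reference, ABROCA is the area between two groups’ ROC curves: 0 means the curves coincide, and larger values mean the model ranks positives above negatives differently for the two groups. A minimal sketch of the computation (ours; names hypothetical), interpolating both curves onto a shared false-positive-rate grid:&lt;br /&gt;
&lt;pre&gt;
import numpy as np
from sklearn.metrics import roc_curve

def abroca(y_true, y_prob, groups, g0, g1):
    """Absolute area between the ROC curves of groups g0 and g1."""
    fpr0, tpr0, _ = roc_curve(y_true[groups == g0], y_prob[groups == g0])
    fpr1, tpr1, _ = roc_curve(y_true[groups == g1], y_prob[groups == g1])
    grid = np.linspace(0.0, 1.0, 1001)   # shared FPR grid on [0, 1]
    gap = np.abs(np.interp(grid, fpr0, tpr0) - np.interp(grid, fpr1, tpr1))
    return float(np.mean(gap))   # mean |TPR gap| on a uniform unit grid = area
&lt;/pre&gt;&lt;br /&gt;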
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting forum post relevance to the course in Moodle data (neural network)&lt;br /&gt;
* A range of over-sampling methods was tested&lt;br /&gt;
* Regardless of the over-sampling method used, forum post relevance performance was moderately better for females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback on other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on the process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males and females and for African American, Hispanic/Latinx, and White students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=466</id>
		<title>Gender: Male/Female</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=466"/>
		<updated>2023-06-29T01:07:46Z</updated>

		<summary type="html">&lt;p&gt;Shruti: Addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination) against male students, performing particularly well for the Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negative rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Model predicting course outcome&lt;br /&gt;
* Marginal differences were found in prediction quality and in the overall proportion of students predicted to pass between groups&lt;br /&gt;
* Differences were inconsistent in direction between algorithms.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (earning the median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than female students&lt;br /&gt;
* The fairness of the model (namely demographic parity and equality of opportunity), as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values, as sketched below&lt;br /&gt;
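A minimal sketch (our illustration, not the authors’ code) of the group-specific threshold idea: instead of one 0.5 cutoff, choose each group’s cutoff on a validation set so that, for example, true positive rates match a common target (equality of opportunity). All names are hypothetical.&lt;br /&gt;
&lt;pre&gt;
import numpy as np

def group_thresholds(y_true, y_prob, groups, target_tpr=0.8):
    """Per group, pick the score cutoff whose TPR is about target_tpr."""
    cutoffs = {}
    for g in np.unique(groups):
        positives = np.logical_and(groups == g, y_true == 1)
        # Classifying as positive at or above this quantile of the group's
        # positive-class scores captures roughly target_tpr of its positives.
        cutoffs[g] = np.quantile(y_prob[positives], 1.0 - target_tpr)
    return cutoffs
&lt;/pre&gt;&lt;br /&gt;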
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.&lt;br /&gt;
* The fairness of models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online programs&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse true negative rates and worse accuracy for male students&lt;br /&gt;
* The model showed better recall for male students, especially for those studying in person&lt;br /&gt;
* The differences in recall and true negative rates were smaller, and thus fairer, for male students studying online if their socio-demographic information was not included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) &lt;br /&gt;
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* The E-Rater system performed with comparable accuracy for male and female students when assessing their 11th grade essays&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* The E-Rater system correlated comparably well with human raters when assessing TOEFL and GRE essays written by male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* Algorithms predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC for male than female students; results were mixed for F1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across gender groups. &lt;br /&gt;
* No gender group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]&lt;br /&gt;
* Models predicting whether a student will quit a spelling learning activity without completing it&lt;br /&gt;
* Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Multiple fairness approaches led to an ABROCA of under 0.01 for female versus male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting whether a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student is female and male&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)&lt;br /&gt;
* A range of over-sampling methods was tested (group-balancing over-sampling is sketched below)&lt;br /&gt;
* Regardless of the over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.&lt;br /&gt;
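One simple member of that family, random over-sampling until every demographic group matches the largest group’s size in the training set, could look like the following sketch (ours; names hypothetical):&lt;br /&gt;
&lt;pre&gt;
import numpy as np

def oversample_groups(X, y, groups, seed=0):
    """Resample rows with replacement so each group reaches the size of
    the largest group; returns the rebalanced training set."""
    rng = np.random.default_rng(seed)
    target = max(np.sum(groups == g) for g in np.unique(groups))
    idx = []
    for g in np.unique(groups):
        members = np.flatnonzero(groups == g)
        idx.extend(rng.choice(members, size=target, replace=True))
    idx = np.asarray(idx)
    return X[idx], y[idx], groups[idx]
&lt;/pre&gt;&lt;br /&gt;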
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether a course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses a system (DEWS) that predicts probabilities of on-time graduation&lt;br /&gt;
* DEWS prediction is comparable for males and females&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback on other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males and females.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=White_Learners_in_North_America&amp;diff=465</id>
		<title>White Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=White_Learners_in_North_America&amp;diff=465"/>
		<updated>2023-06-29T01:04:57Z</updated>

		<summary type="html">&lt;p&gt;Shruti: Addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater &lt;br /&gt;
* The score difference between human raters and e-rater was significantly smaller for 11th grade essays written by White and African American students than for other groups&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups. &lt;br /&gt;
* No racial/ethnic group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Compared members of overrepresented racial groups to members of underrepresented racial groups (overrepresented group approximately 90% White)&lt;br /&gt;
* Multiple fairness approaches led to an ABROCA of under 0.01 for overrepresented versus underrepresented students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sulaiman &amp;amp; Roy (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Sulaiman_Transformers.pdf]&lt;br /&gt;
* Models predicting whether a law student will pass the bar exam (to practice law)&lt;br /&gt;
* Compared White and non-White students&lt;br /&gt;
* Models not applying fairness constraints performed significantly worse for White students in terms of ABROCA&lt;br /&gt;
* Models applying fairness constraints performed equivalently for White and non-White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, model tends to overpredict White students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses a system (DEWS) that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is less accurate for White students than other students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback on other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for White, African American, and Hispanic/Latinx students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&amp;diff=464</id>
		<title>Black/African-American Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&amp;diff=464"/>
		<updated>2023-06-29T01:01:09Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination) against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among Black, White, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander students.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (earning the median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; Black, American Indian, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Black students were inaccurately predicted to perform worse on both short-term and long-term outcomes&lt;br /&gt;
* The fairness of models improved when click data alone, or a combination of click and survey data (but not institutional data), was included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse true negative rates for students who are underrepresented minority (URM; not White or Asian), and worse accuracy for URM students studying in person&lt;br /&gt;
* The model showed better recall for URM students, whether they were in a residential or online program&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Ramineni &amp;amp; Williamson (2018) [https://files.eric.ed.gov/fulltext/EJ1202928.pdf pdf]&lt;br /&gt;
* Revised automated scoring engine for assessing GRE essay&lt;br /&gt;
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts&lt;br /&gt;
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater &lt;br /&gt;
* The score difference between human raters and e-rater was significantly smaller for 11th grade essays written by African American and White students than for other groups&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502 pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-rater gave significantly lower scores than human raters when assessing African-American students’ written responses to the issue prompt in the GRE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups. &lt;br /&gt;
* No racial/ethnic group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Compared members of overrepresented racial groups to members of underrepresented racial groups (over 2/3 Black/African American)&lt;br /&gt;
* Multiple fairness approaches led to an ABROCA of under 0.01 for overrepresented versus underrepresented students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether a student is Black&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, model tends to underpredict Black students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback on other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for African American, Hispanic/Latinx, and White students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&amp;diff=463</id>
		<title>Black/African-American Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&amp;diff=463"/>
		<updated>2023-06-29T01:00:36Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination) against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among Black, White, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander students.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (earning the median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; Black, American Indian, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Black students were inaccurately predicted to perform worse on both short-term and long-term outcomes&lt;br /&gt;
* The fairness of models improved when click data alone, or a combination of click and survey data (but not institutional data), was included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse true negative rates for students who are underrepresented minority (URM; not White or Asian), and worse accuracy for URM students studying in person&lt;br /&gt;
* The model showed better recall for URM students, whether they were in a residential or online program&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Ramineni &amp;amp; Williamson (2018) [https://files.eric.ed.gov/fulltext/EJ1202928.pdf pdf]&lt;br /&gt;
* Revised automated scoring engine for assessing GRE essay&lt;br /&gt;
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts&lt;br /&gt;
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater &lt;br /&gt;
* The score difference between human raters and e-rater was significantly smaller for 11th grade essays written by African American and White students than for other groups&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502 pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-rater gave significantly lower scores than human raters when assessing African-American students’ written responses to the issue prompt in the GRE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups. &lt;br /&gt;
* No racial/ethnic group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Compared members of overrepresented racial groups to members of underrepresented racial groups (over 2/3 Black/African American)&lt;br /&gt;
* Multiple fairness approaches led to an ABROCA of under 0.01 for overrepresented versus underrepresented students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether a student is Black&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, model tends to underpredict Black students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback on other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for African American, Hispanic/Latinx, and White students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&amp;diff=462</id>
		<title>Black/African-American Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&amp;diff=462"/>
		<updated>2023-06-29T01:00:21Z</updated>

		<summary type="html">&lt;p&gt;Shruti: Addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was the best at reducing bias (discrimination) against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among Black, White, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander students.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (earning the median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; Black, American Indian, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Black students were inaccurately predicted to perform worse on both short-term and long-term outcomes&lt;br /&gt;
* The fairness of models improved when click data alone, or a combination of click and survey data (but not institutional data), was included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse true negative rates for students who are underrepresented minority (URM; not White or Asian), and worse accuracy for URM students studying in person&lt;br /&gt;
* The model showed better recall for URM students, whether they were in a residential or online program&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Ramineni &amp;amp; Williamson (2018) [https://files.eric.ed.gov/fulltext/EJ1202928.pdf pdf]&lt;br /&gt;
* Revised automated scoring engine for assessing GRE essay&lt;br /&gt;
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts&lt;br /&gt;
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater &lt;br /&gt;
* The score difference between human raters and e-rater was significantly smaller for 11th grade essays written by African American and White students than for other groups&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502 pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-rater gave significantly lower scores than human raters when assessing African-American students’ written responses to the issue prompt in the GRE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups. &lt;br /&gt;
* No racial/ethnic group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Compared members of overrepresented racial groups to members of underrepresented racial groups (over 2/3 Black/African American)&lt;br /&gt;
* Multiple fairness approaches led to an ABROCA of under 0.01 for overrepresented versus underrepresented students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether a student is Black&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, model tends to underpredict Black students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback on other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for African American, Hispanic/Latinx, and White students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Latino/Latina/Latinx/Hispanic_Learners_in_North_America&amp;diff=461</id>
		<title>Latino/Latina/Latinx/Hispanic Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Latino/Latina/Latinx/Hispanic_Learners_in_North_America&amp;diff=461"/>
		<updated>2023-06-29T00:49:33Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negative rates were greater for Latino students when Decision Tree and Random Forest models were used&lt;br /&gt;
* White students had higher false positive rates across all models: Decision Tree, SVM, Logistic Regression, Random Forest, and SGD&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among Hispanic, White, Black, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander students.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (earning the median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; Hispanic, American Indian, Black, Hawaiian or Pacific Islander, and Multicultural) than non-URM students (White and Asian)&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Hispanic students were inaccurately predicted to perform worse on both short-term and long-term outcomes&lt;br /&gt;
* The fairness of models improved when click data alone, or a combination of click and survey data (but not institutional data), was included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse true negative rates for students who are underrepresented minority (URM; not White or Asian), and worse accuracy for URM students studying in person&lt;br /&gt;
* The model showed better recall for URM students, whether they were in a residential or online program&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater gave significantly better scores than human raters for 11th grade essays written by Hispanic students and Asian-American students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups. &lt;br /&gt;
* No racial/ethnic group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kung &amp;amp; Yu (2020)&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3386527.3406755 pdf]&lt;br /&gt;
* Predicting course grades and later GPA at public U.S. university&lt;br /&gt;
* Poorer independence, separation, and sufficiency for Latinx students than White students for five different classic machine learning algorithms&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, the model tends to underpredict Hispanic students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback on other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for Hispanic/Latinx, African American, and White students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Latino/Latina/Latinx/Hispanic_Learners_in_North_America&amp;diff=460</id>
		<title>Latino/Latina/Latinx/Hispanic Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Latino/Latina/Latinx/Hispanic_Learners_in_North_America&amp;diff=460"/>
		<updated>2023-06-29T00:49:11Z</updated>

		<summary type="html">&lt;p&gt;Shruti: Addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negative rates were greater for Latino students when Decision Tree and Random Forest models were used&lt;br /&gt;
* White students had higher false positive rates across all models: Decision Tree, SVM, Logistic Regression, Random Forest, and SGD&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among Hispanic, White, Black, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander students.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (defined as earning the median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; Hispanic, American Indian, Black, Hawaiian or Pacific Islander, and Multicultural) than non-URM students (White and Asian)&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Hispanic students were inaccurately predicted to perform worse on both short-term and long-term outcomes&lt;br /&gt;
* Model fairness improved when clickstream data alone, or clickstream combined with survey data, was included instead of institutional data&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online programs&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse true negative rates for underrepresented minority (URM; i.e., not White or Asian) students, and worse accuracy for URM students studying in person&lt;br /&gt;
* The model showed better recall for URM students, whether they were in the residential or the online program&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]&lt;br /&gt;
* Automated scoring models (e-rater) for evaluating English essays&lt;br /&gt;
* E-rater gave significantly better scores than human raters for 11th grade essays written by Hispanic and Asian-American students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) during mathematical problem solving&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups.&lt;br /&gt;
* No racial/ethnic group consistently had the best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kung &amp;amp; Yu (2020)&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3386527.3406755 pdf]&lt;br /&gt;
* Predicting course grades and later GPA at public U.S. university&lt;br /&gt;
* Poorer independence, separation, and sufficiency for Latinx students than for White students across five different classic machine learning algorithms&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, the model tends to underpredict Hispanic students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback on other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for Hispanic/Latinx, African American, and White students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Socioeconomic_Status&amp;diff=459</id>
		<title>Socioeconomic Status</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Socioeconomic_Status&amp;diff=459"/>
		<updated>2023-06-29T00:41:25Z</updated>

		<summary type="html">&lt;p&gt;Shruti: addition new paper&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Yudelson et al. (2014) [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.659.872&amp;amp;rep=rep1&amp;amp;type=pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models discovering generalizable sub-populations of students across different schools to predict students' learning with Carnegie Learning’s Cognitive Tutor (CLCT)&lt;br /&gt;
&lt;br /&gt;
* Models trained on schools with a high proportion of low-SES students performed worse than those trained on schools with a medium or low proportion&lt;br /&gt;
* Models trained on schools with low or medium proportions of low-SES students performed similarly well for schools with high proportions of low-SES students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting undergraduate course grades and average GPA&lt;br /&gt;
&lt;br /&gt;
* Students from low-income households were inaccurately predicted to perform worse on both short-term (final course grade) and long-term (GPA) outcomes&lt;br /&gt;
* Model fairness improved if only clickstream and survey data were included&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online programs&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse accuracy and true negative rates for residential students with greater financial needs&lt;br /&gt;
* The model showed better recall for students with greater financial needs, especially those studying in person&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kung &amp;amp; Yu (2020)&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3386527.3406755 pdf]&lt;br /&gt;
* Predicting course grades and later GPA at public U.S. university&lt;br /&gt;
* Equal performance for low-income and upper-income students in course grade prediction for several algorithms and metrics&lt;br /&gt;
* Worse performance on independence for low-income students than high-income students in later GPA prediction for four of five algorithms; one algorithm had worse separation and two algorithms had worse sufficiency&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether the student receives free/reduced-price lunch&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* The model was unable to predict student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* The model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://arxiv.org/abs/2304.06205 pdf]&lt;br /&gt;
&lt;br /&gt;
* Paper discusses a system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is more accurate for low-income students than for non-low-income students&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=458</id>
		<title>Gender: Male/Female</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=458"/>
		<updated>2023-06-29T00:39:04Z</updated>

		<summary type="html">&lt;p&gt;Shruti: Addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
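&lt;br /&gt;
''A minimal sketch (hypothetical data; not the paper's code) of the kind of per-group Kappa and AUC comparison reported above, using standard scikit-learn metrics:''&lt;br /&gt;
&lt;pre&gt;
import numpy as np
from sklearn.metrics import cohen_kappa_score, roc_auc_score

def kappa_auc_by_group(y_true, y_score, groups, cutoff=0.5):
    # Binarize scores once, then compute both metrics within each group.
    y_pred = np.greater_equal(y_score, cutoff).astype(int)
    results = {}
    for g in np.unique(groups):
        m = groups == g
        results[g] = {"kappa": cohen_kappa_score(y_true[m], y_pred[m]),
                      "auc": roc_auc_score(y_true[m], y_score[m])}
    return results
&lt;/pre&gt;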
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting high school dropout&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was best at reducing bias (discrimination against male students), performing particularly well for the Psychology course&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negative rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Model predicting course outcome&lt;br /&gt;
* Marginal differences in prediction quality and in the overall proportion predicted to pass were found between groups&lt;br /&gt;
* Differences were inconsistent in direction across algorithms&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (defined as earning the median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than female students&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.&lt;br /&gt;
* The fairness of models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online programs&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse true negative rates and worse accuracy for male students&lt;br /&gt;
* The model showed better recall for male students, especially those studying in person&lt;br /&gt;
* The differences in recall and true negative rates were smaller, and thus fairer, for male students studying online when socio-demographic information was not included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value&lt;br /&gt;
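&lt;br /&gt;
''For reference, a minimal sketch of the ABROCA statistic used above (hypothetical data and function name; one common formulation, integrating the absolute gap between two groups' ROC curves over the false positive rate axis):''&lt;br /&gt;
&lt;pre&gt;
import numpy as np
from sklearn.metrics import roc_curve

def abroca(y_true, y_score, groups, group_a, group_b):
    grid = np.linspace(0.0, 1.0, 1001)
    tprs = []
    for g in (group_a, group_b):
        m = groups == g
        # roc_curve returns the FPR grid in increasing order, so np.interp works.
        fpr, tpr, _ = roc_curve(y_true[m], y_score[m])
        tprs.append(np.interp(grid, fpr, tpr))
    # Area between the two ROC curves, integrated over the FPR axis.
    return np.trapz(np.abs(tprs[0] - tprs[1]), grid)
&lt;/pre&gt;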
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) &lt;br /&gt;
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring models (e-rater) for evaluating English essays&lt;br /&gt;
* The e-rater system performed comparably for male and female students when assessing their 11th grade essays&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
* A later version of the e-rater automated essay scoring models&lt;br /&gt;
* The e-rater system correlated comparably well with human raters when assessing TOEFL and GRE essays written by male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC for male than female students; results were mixed for F1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) during mathematical problem solving&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across gender groups.&lt;br /&gt;
* No gender group consistently had the best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]&lt;br /&gt;
* Models predicting whether a student will quit a spelling learning activity without completing it&lt;br /&gt;
* Multiple algorithms had slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Multiple fairness approaches led to ABROCA of under 0.01 for female versus male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting whether a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether the student is female or male&lt;br /&gt;
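&lt;br /&gt;
''One way to compute a "share of error explained by group membership" figure like the one above (a hypothetical sketch, not necessarily the authors' procedure): the R-squared from regressing scoring error on a binary group indicator:''&lt;br /&gt;
&lt;pre&gt;
import numpy as np

def error_share_explained(y_true, y_pred, group_indicator):
    # Absolute scoring error per student.
    err = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    g = np.asarray(group_indicator, dtype=float)
    # With a single binary regressor, R-squared equals the squared correlation.
    r = np.corrcoef(err, g)[0, 1]
    return r ** 2
&lt;/pre&gt;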
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.&lt;br /&gt;
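&lt;br /&gt;
''A minimal sketch of one of the simpler methods in this family, random over-sampling so every group in the training set is as large as the largest one (hypothetical data and function name, not the paper's code):''&lt;br /&gt;
&lt;pre&gt;
import numpy as np

def oversample_to_balance(X, y, groups, seed=0):
    rng = np.random.default_rng(seed)
    labels, counts = np.unique(groups, return_counts=True)
    idx = []
    for g in labels:
        members = np.flatnonzero(groups == g)
        # Sample with replacement until the group matches the largest group.
        idx.append(rng.choice(members, size=counts.max(), replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx], groups[idx]
&lt;/pre&gt;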
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Perdomo et al. (2023) [https://arxiv.org/abs/2304.06205 pdf]&lt;br /&gt;
* Paper discusses a system that predicts probabilities of on-time graduation&lt;br /&gt;
* DEWS predictions are comparable for males and females&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Learners_with_Disabilities&amp;diff=457</id>
		<title>Learners with Disabilities</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Learners_with_Disabilities&amp;diff=457"/>
		<updated>2023-06-29T00:36:17Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Perdomo et al. (2023) [https://arxiv.org/abs/2304.06205 pdf]&lt;br /&gt;
&lt;br /&gt;
* Paper discusses a system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is more accurate for students with disabilities than students without disabilities&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Loukina &amp;amp; Buzick (2017) [https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/ets2.12170 pdf]&lt;br /&gt;
&lt;br /&gt;
* A model (the SpeechRater) automatically scoring open-ended spoken responses for speakers with documented or suspected speech impairments&lt;br /&gt;
* SpeechRater was less accurate for test takers who were deferred for signs of speech impairment (ρ&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; = .57) than test takers who were given accommodations for documented disabilities (ρ&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; = .73)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* Disparate impact was found for students with self-declared disabilities, with systematic inaccuracies in predictions for learners in this group.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Learners_with_Disabilities&amp;diff=456</id>
		<title>Learners with Disabilities</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Learners_with_Disabilities&amp;diff=456"/>
		<updated>2023-06-29T00:35:29Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Perdomo et al. (2023) [https://arxiv.org/abs/2304.06205 pdf]&lt;br /&gt;
&lt;br /&gt;
* Paper discusses a system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is more accurate for students with disabilities than students without disabilities&lt;br /&gt;
&lt;br /&gt;
Loukina &amp;amp; Buzick (2017) [https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/ets2.12170 pdf]&lt;br /&gt;
&lt;br /&gt;
* A model (the SpeechRater) automatically scoring open-ended spoken responses for speakers with documented or suspected speech impairments&lt;br /&gt;
* SpeechRater was less accurate for test takers who were deferred for signs of speech impairment (ρ&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; = .57) than test takers who were given accommodations for documented disabilities (ρ&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; = .73)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* Disparate impact was found for students with self-declared disabilities, with systematic inaccuracies in predictions for learners in this group.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=White_Learners_in_North_America&amp;diff=455</id>
		<title>White Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=White_Learners_in_North_America&amp;diff=455"/>
		<updated>2023-06-29T00:29:11Z</updated>

		<summary type="html">&lt;p&gt;Shruti: addition new paper&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Perdomo et al. (2023) [https://arxiv.org/abs/2304.06205 pdf]&lt;br /&gt;
&lt;br /&gt;
* Paper discusses a system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is less accurate for White students than for other students&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models (e-rater) for evaluating English essays&lt;br /&gt;
* The score difference between human raters and e-rater was significantly smaller for 11th grade essays written by White and African American students than for other groups&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) during mathematical problem solving&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups.&lt;br /&gt;
* No racial/ethnic group consistently had the best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Compared members of overrepresented racial groups to members of underrepresented racial groups (overrepresented group approximately 90% White)&lt;br /&gt;
* Multiple fairness approaches led to ABROCA of under 0.01 for overrepresented versus underrepresented students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sulaiman &amp;amp; Roy (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Sulaiman_Transformers.pdf]&lt;br /&gt;
* Models predicting whether a law student will pass the bar exam (to practice law)&lt;br /&gt;
* Compared White and non-White students&lt;br /&gt;
* Models not applying fairness constraints performed significantly worse for White students in terms of ABROCA&lt;br /&gt;
* Models applying fairness constraints performed equivalently for White and non-White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, the model tends to overpredict White students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=454</id>
		<title>At-risk/Dropout/Stopout/Graduation Prediction</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=454"/>
		<updated>2023-06-28T22:58:23Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Perdomo et al. (2023) [https://arxiv.org/abs/2304.06205 pdf]&lt;br /&gt;
* Paper discusses a system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is less accurate for White students than for other students&lt;br /&gt;
* Prediction is more accurate for students with disabilities than students without disabilities&lt;br /&gt;
* Prediction is more accurate for low-income students than for non-low-income students&lt;br /&gt;
* DEWS predictions are comparable for males and females&lt;br /&gt;
&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was best at reducing bias (discrimination against African-American students), while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
* MCCM was also best at reducing bias against male students, performing particularly well for the Psychology course&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negative rates were greater for Latino students when Decision Tree and Random Forest models were used&lt;br /&gt;
* White students had higher false positive rates across all models: Decision Tree, SVM, Logistic Regression, Random Forest, and SGD&lt;br /&gt;
* False negative rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander students.&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Baker et al. (2020) [https://www.upenn.edu/learninganalytics/ryanbaker/BakerBerningGowda.pdf pdf]&lt;br /&gt;
* Model predicting student graduation and SAT scores for military-connected students&lt;br /&gt;
* For prediction of graduation, algorithms applied across populations yielded an AUC of 0.60, degrading from their original performance of 0.70 or 0.71 toward chance&lt;br /&gt;
* For prediction of SAT scores, algorithms applied across populations yielded Spearman's ρ of 0.42 and 0.44, degrading by a third from their original performance toward chance&lt;br /&gt;
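&lt;br /&gt;
''A minimal sketch of the cross-population check described above (hypothetical data, model choice, and function names, not the paper's code): train on one population, then evaluate AUC, or Spearman's ρ for continuous targets such as SAT scores, on another:''&lt;br /&gt;
&lt;pre&gt;
from scipy.stats import spearmanr
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def cross_population_auc(X_src, y_src, X_dst, y_dst):
    # Train on the source population, evaluate discrimination on the other.
    model = LogisticRegression(max_iter=1000).fit(X_src, y_src)
    return roc_auc_score(y_dst, model.predict_proba(X_dst)[:, 1])

def cross_population_spearman(predicted_scores, actual_scores):
    # Rank correlation for continuous targets.
    return spearmanr(predicted_scores, actual_scores).correlation
&lt;/pre&gt;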
&lt;br /&gt;
&lt;br /&gt;
Kai et al. (2017) [https://files.eric.ed.gov/fulltext/ED596601.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
* JRip decision rules achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online programs&lt;br /&gt;
* The model showed better recall for students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse accuracy and true negative rates for residential students who are URM, male, first-generation, or with greater financial needs&lt;br /&gt;
* Both accuracy and true negative rates were better for students who are first-generation or have greater financial needs&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC and F1 for students who attended public high schools than for students who attended private high schools.&lt;br /&gt;
* Several algorithms achieved better AUC for male students than female students; F1 scores were more balanced.&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting dropout on the XuetangX platform using a neural network&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, dropout performance was slightly better for males.&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* The model was unable to predict student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* The model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=453</id>
		<title>At-risk/Dropout/Stopout/Graduation Prediction</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=453"/>
		<updated>2023-06-28T22:57:33Z</updated>

		<summary type="html">&lt;p&gt;Shruti: Addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Perdomo et al. (2023) [https://arxiv.org/abs/2304.06205 pdf]&lt;br /&gt;
* Paper discusses a system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is less accurate for White students than for other students&lt;br /&gt;
* Prediction is more accurate for students with disabilities than students without disabilities&lt;br /&gt;
* Prediction is more accurate for low-income students than for non-low-income students&lt;br /&gt;
* DEWS predictions are comparable for males and females&lt;br /&gt;
&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was best at reducing bias (discrimination against African-American students), while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
* MCCM was also best at reducing bias against male students, performing particularly well for the Psychology course&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, particularly in Computer Science and Electrical Engineering&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negative rates were greater for Latino students when Decision Tree and Random Forest models were used&lt;br /&gt;
* White students had higher false positive rates across all models: Decision Tree, SVM, Logistic Regression, Random Forest, and SGD&lt;br /&gt;
* False negative rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander students.&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Baker et al. (2020) [https://www.upenn.edu/learninganalytics/ryanbaker/BakerBerningGowda.pdf pdf]&lt;br /&gt;
* Model predicting student graduation and SAT scores for military-connected students&lt;br /&gt;
* For prediction of graduation, algorithms applied across populations yielded an AUC of 0.60, degrading from their original performance of 0.70 or 0.71 toward chance&lt;br /&gt;
* For prediction of SAT scores, algorithms applied across populations yielded Spearman's ρ of 0.42 and 0.44, degrading by a third from their original performance toward chance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kai et al. (2017) [https://files.eric.ed.gov/fulltext/ED596601.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
* JRip decision rules achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online programs&lt;br /&gt;
* The model showed better recall for students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse accuracy and true negative rates for residential students who are URM, male, first-generation, or with greater financial needs&lt;br /&gt;
* Both accuracy and true negative rates were better for students who are first-generation or have greater financial needs&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC and F1 for students who attended public high schools than for students who attended private high schools.&lt;br /&gt;
* Several algorithms achieved better AUC for male students than female students; F1 scores were more balanced.&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting dropout on the XuetangX platform using a neural network&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, dropout performance was slightly better for males.&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* The model was unable to predict student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* The model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Other_NLP_Applications_of_Algorithms_in_Education&amp;diff=452</id>
		<title>Other NLP Applications of Algorithms in Education</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Other_NLP_Applications_of_Algorithms_in_Education&amp;diff=452"/>
		<updated>2023-06-28T22:00:14Z</updated>

		<summary type="html">&lt;p&gt;Shruti: edit&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models developed to detect attributes of student feedback on other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males versus females and for African American, Hispanic/Latinx, and White students.&lt;br /&gt;
&lt;br /&gt;
Naismith et al. (2018) [http://d-scholarship.pitt.edu/40665/1/EDM2018_paper_37.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* A model that measures L2 learners’ lexical sophistication using frequency lists based on native-speaker corpora&lt;br /&gt;
* Arabic-speaking learners are rated systematically lower across all levels of English proficiency than speakers of Chinese, Japanese, Korean, and Spanish.&lt;br /&gt;
* Level 5 Arabic-speaking learners are unfairly evaluated as having a similar level of lexical sophistication to Level 4 learners from China, Japan, Korea, and Spain.&lt;br /&gt;
* When used on the ETS corpus, “high”-labeled essays by Japanese-speaking learners are rated significantly lower in lexical sophistication than those of their Arabic-, Korean-, and Spanish-speaking peers.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Samei et al. (2015) [https://files.eric.ed.gov/fulltext/ED560879.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting classroom discourse properties (e.g. authenticity and uptake)&lt;br /&gt;
* Model trained on urban students (authenticity: 0.62, uptake: 0.60) performed with similar accuracy when tested on non-urban students (authenticity: 0.62, uptake: 0.62)&lt;br /&gt;
* Model trained on non-urban students (authenticity: 0.61, uptake: 0.59) performed with similar accuracy when tested on urban students (authenticity: 0.60, uptake: 0.63)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting whether a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* MOOCs taught in English&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* ABROCA varied from 0.03 to 0.08 for non-native speakers of English versus native speakers&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting forum post relevance to the course in Moodle data (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, forum post relevance performance was moderately better for females.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Other_NLP_Applications_of_Algorithms_in_Education&amp;diff=451</id>
		<title>Other NLP Applications of Algorithms in Education</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Other_NLP_Applications_of_Algorithms_in_Education&amp;diff=451"/>
		<updated>2023-06-28T21:59:53Z</updated>

		<summary type="html">&lt;p&gt;Shruti: addition new paper&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models developed to detect attributes of student feedback on other students’ mathematics solutions, reflecting the presence of three constructs: 1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males versus females and for African American, Hispanic/Latinx, and White students.&lt;br /&gt;
&lt;br /&gt;
Naismith et al. (2018) [http://d-scholarship.pitt.edu/40665/1/EDM2018_paper_37.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* A model that measures L2 learners’ lexical sophistication using frequency lists based on native-speaker corpora&lt;br /&gt;
* Arabic-speaking learners are rated systematically lower across all levels of English proficiency than speakers of Chinese, Japanese, Korean, and Spanish.&lt;br /&gt;
* Level 5 Arabic-speaking learners are unfairly evaluated as having a similar level of lexical sophistication to Level 4 learners from China, Japan, Korea, and Spain.&lt;br /&gt;
* When used on the ETS corpus, “high”-labeled essays by Japanese-speaking learners are rated significantly lower in lexical sophistication than those of their Arabic-, Korean-, and Spanish-speaking peers.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Samei et al. (2015) [https://files.eric.ed.gov/fulltext/ED560879.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting classroom discourse properties (e.g. authenticity and uptake)&lt;br /&gt;
* Model trained on urban students (authenticity: 0.62, uptake: 0.60) performed with similar accuracy when tested on non-urban students (authenticity: 0.62, uptake: 0.62)&lt;br /&gt;
* Model trained on non-urban students (authenticity: 0.61, uptake: 0.59) performed with similar accuracy when tested on urban students (authenticity: 0.60, uptake: 0.63)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting whether a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* MOOCs taught in English&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* ABROCA varied from 0.03 to 0.08 for non-native speakers of English versus native speakers&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting forum post relevance to the course in Moodle data (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, forum post relevance performance was moderately better for females.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=White_Learners_in_North_America&amp;diff=450</id>
		<title>White Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=White_Learners_in_North_America&amp;diff=450"/>
		<updated>2023-06-28T21:54:31Z</updated>

		<summary type="html">&lt;p&gt;Shruti: year&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models (e-rater) for evaluating English essays&lt;br /&gt;
* The score difference between human raters and e-rater was significantly smaller for 11th grade essays written by White and African American students than for other groups&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) during mathematical problem solving&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups.&lt;br /&gt;
* No racial/ethnic group consistently had the best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Compared members of overrepresented racial groups to members of underrepresented racial groups (overrepresented group approximately 90% White)&lt;br /&gt;
* Multiple fairness approaches led to ABROCA of under 0.01 for overrepresented versus underrepresented students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sulaiman &amp;amp; Roy (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Sulaiman_Transformers.pdf]&lt;br /&gt;
* Models predicting whether a law student will pass the bar exam (to practice law)&lt;br /&gt;
* Compared White and non-White students&lt;br /&gt;
* Models not applying fairness constraints performed significantly worse for White students in terms of ABROCA&lt;br /&gt;
* Models applying fairness constraints performed equivalently for White and non-White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, the model tends to overpredict White students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&amp;diff=449</id>
		<title>Black/African-American Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&amp;diff=449"/>
		<updated>2023-06-27T21:06:24Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* The multiple cooperative classifier model (MCCM) was best at reducing bias (discrimination against African-American students), while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among Black, White, Hispanic, Asian, American Indian and Alaska Native, and Native Hawaiian and Pacific Islander students.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (defined as earning the median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; Black, American Indian, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Black students were inaccurately predicted to perform worse on both short-term and long-term outcomes&lt;br /&gt;
* Model fairness improved when clickstream data alone, or clickstream combined with survey data, was included instead of institutional data&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online programs&lt;br /&gt;
* Whether or not socio-demographic information was included, the model showed worse true negative rates for underrepresented minority (URM; i.e., not White or Asian) students, and worse accuracy for URM students studying in person&lt;br /&gt;
* The model showed better recall for URM students, whether they were in the residential or the online program&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Ramineni &amp;amp; Williamson (2018) [https://files.eric.ed.gov/fulltext/EJ1202928.pdf pdf]&lt;br /&gt;
* Revised automated scoring engine for assessing GRE essays&lt;br /&gt;
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts&lt;br /&gt;
* The shorter essays written by African American test-takers were more likely to receive lower scores for perceived weaknesses in content and organization&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models (e-rater) for evaluating English essays&lt;br /&gt;
* The score difference between human raters and e-rater was significantly smaller for 11th grade essays written by African American and White students than for other groups&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502 pdf]&lt;br /&gt;
* A later version of the e-rater automated essay scoring models&lt;br /&gt;
* E-rater gave significantly lower scores than human raters when assessing African-American students’ written responses to the GRE issue prompt&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) during mathematical problem solving&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups.&lt;br /&gt;
* No racial/ethnic group consistently had the best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Compared members of overrepresented racial groups to members of underrepresented racial groups (over 2/3 Black/African American)&lt;br /&gt;
* Multiple fairness approaches led to ABROCA of under 0.01 for overrepresented versus underrepresented students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether the student is Black&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, the model tends to underpredict Black students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
</feed>