<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.pcla.wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Shruti</id>
	<title>Penn Center for Learning Analytics Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://www.pcla.wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Shruti"/>
	<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php/Special:Contributions/Shruti"/>
	<updated>2026-05-05T10:24:32Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.37.1</generator>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Socioeconomic_Status&amp;diff=488</id>
		<title>Socioeconomic Status</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Socioeconomic_Status&amp;diff=488"/>
		<updated>2023-11-28T04:14:54Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Yudelson et al. (2014) [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.659.872&amp;amp;rep=rep1&amp;amp;type=pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models discovering generalizable sub-populations of students across different schools to predict students' learning with Carnegie Learning’s Cognitive Tutor (CLCT)&lt;br /&gt;
&lt;br /&gt;
* Models trained on schools with a high proportion of low-SES student performed worse than those trained with medium or low proportion&lt;br /&gt;
* Models trained on schools with low, medium  proportion of SES students performed similarly well for schools with high proportions of low-SES students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting undergraduate course grades and average GPA&lt;br /&gt;
&lt;br /&gt;
* Students from low-income households were inaccurately predicted to perform worse for both short-term (final course grade) and long-term (GPA)&lt;br /&gt;
* Fairness of model improved if it included only clickstream and survey data&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
*Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
*Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students with greater financial needs&lt;br /&gt;
*The model showed better recall for students with greater financial needs, especially for those studying in person&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kung &amp;amp; Yu (2020)&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3386527.3406755 pdf]&lt;br /&gt;
* Predicting course grades and later GPA at public U.S. university&lt;br /&gt;
* Equal performance for low-income and upper-income students in course grade prediction for several algorithms and metrics&lt;br /&gt;
* Worse performance on independence for low-income students than high-income students in later GPA prediction for four of five algorithms; one algorithm had worse separation and two algorithms had worse sufficiency&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student receives free/reduced price lunch&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* Model was unable to make prediction of student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* Model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is more accurate for low-income students than non-low-income students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cock et al.(2023) [[https://dl.acm.org/doi/abs/10.1145/3576050.3576149?casa_token=6Fjh-EUzN-gAAAAA%3AtpRMYzSAVoQFYNzwY5gwSsrnzHIlI0tUjMq6okwgdcCUmuBMVZEtn8eLO52dCtIYUbrHBV_Il9Sx pdf]]&lt;br /&gt;
&lt;br /&gt;
* Paper investigates biases in models designed to early identify middle school students at risk of failing in flipped-classroom course and open-ended exploration environment (TugLet)&lt;br /&gt;
* Model performs worse for students from school with higher socio-economic status in open-ended environment (FNR=0.73 for higher SES and FNR=0.57 for medium SES).&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=487</id>
		<title>Gender: Male/Female</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=487"/>
		<updated>2023-11-28T04:13:55Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Model predicting course outcome&lt;br /&gt;
* Marginal differences were found for prediction quality and in overall proportion of predicted pass between groups&lt;br /&gt;
* Inconsistent in direction between algorithms.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (or median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than female students&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.&lt;br /&gt;
* The fairness of models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates and worse accuracy for male students&lt;br /&gt;
* The model showed better recall for male students, especially for those studying in person&lt;br /&gt;
* The difference in recall and true negative rates were lower, and thus fairer, for male students studying online if their socio-demographic information was not included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but  this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) &lt;br /&gt;
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC for male than female students; results were mixed for F1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022)&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across gender groups. &lt;br /&gt;
* No gender group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]&lt;br /&gt;
* Models predicting whether student will quit spelling learning activity without completing&lt;br /&gt;
* Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for female versus male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students,&lt;br /&gt;
but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student is female and male&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al. (2023)  [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* DEWS prediction is comparable for males and females&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males and females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Almoubayyed et al. (2023)[https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
* Models discovering generalization of the performance for reading comprehension ability in the context of middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
*Model trained on smaller dataset achieves greater fairness in prediction for male and female students&lt;br /&gt;
* For model trained on larger dataset, prediction is more accurate for female students than male students.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Chiu (2020) [https://files.eric.ed.gov/fulltext/EJ1267654.pdf pdf]&lt;br /&gt;
*Model identifies affective states (boredom, concentration, confusion, frustration, off task and gaming) of middle school students’ online mathematics learning in predicting their choice to study STEM in higher education.&lt;br /&gt;
*Model detects interaction with the ASSISTments system&lt;br /&gt;
*Model performs better for males (AUC =0.641 for RFPS; AUC =0.571 for LR) than female students (AUC = 0.492 for RFPS; AUC=0.535 for LR).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cock et al.(2023) [[https://dl.acm.org/doi/abs/10.1145/3576050.3576149?casa_token=6Fjh-EUzN-gAAAAA%3AtpRMYzSAVoQFYNzwY5gwSsrnzHIlI0tUjMq6okwgdcCUmuBMVZEtn8eLO52dCtIYUbrHBV_Il9Sx pdf]]&lt;br /&gt;
* Paper investigates biases in models designed to early identify middle school students at risk of failing in flipped-classroom course and open-ended exploration environment (TugLet)&lt;br /&gt;
* Model performs worse for males in open-ended environment (FNR=0.70 for males and FNR=0.53 for females) &lt;br /&gt;
* Model performs worse for females in flipped classrooms(FNR=0.56 for females and FNR=0.43 for males)&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Socioeconomic_Status&amp;diff=486</id>
		<title>Socioeconomic Status</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Socioeconomic_Status&amp;diff=486"/>
		<updated>2023-11-28T04:12:30Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Yudelson et al. (2014) [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.659.872&amp;amp;rep=rep1&amp;amp;type=pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models discovering generalizable sub-populations of students across different schools to predict students' learning with Carnegie Learning’s Cognitive Tutor (CLCT)&lt;br /&gt;
&lt;br /&gt;
* Models trained on schools with a high proportion of low-SES student performed worse than those trained with medium or low proportion&lt;br /&gt;
* Models trained on schools with low, medium  proportion of SES students performed similarly well for schools with high proportions of low-SES students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting undergraduate course grades and average GPA&lt;br /&gt;
&lt;br /&gt;
* Students from low-income households were inaccurately predicted to perform worse for both short-term (final course grade) and long-term (GPA)&lt;br /&gt;
* Fairness of model improved if it included only clickstream and survey data&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
*Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
*Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students with greater financial needs&lt;br /&gt;
*The model showed better recall for students with greater financial needs, especially for those studying in person&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kung &amp;amp; Yu (2020)&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3386527.3406755 pdf]&lt;br /&gt;
* Predicting course grades and later GPA at public U.S. university&lt;br /&gt;
* Equal performance for low-income and upper-income students in course grade prediction for several algorithms and metrics&lt;br /&gt;
* Worse performance on independence for low-income students than high-income students in later GPA prediction for four of five algorithms; one algorithm had worse separation and two algorithms had worse sufficiency&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student receives free/reduced price lunch&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* Model was unable to make prediction of student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* Model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is more accurate for low-income students than non-low-income students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cock et al.(2023) [[https://dl.acm.org/doi/abs/10.1145/3576050.3576149?casa_token=6Fjh-EUzN-gAAAAA%3AtpRMYzSAVoQFYNzwY5gwSsrnzHIlI0tUjMq6okwgdcCUmuBMVZEtn8eLO52dCtIYUbrHBV_Il9Sx pdf]]&lt;br /&gt;
&lt;br /&gt;
* Paper investigates biases in models designed to early identify middle school students at risk of failing in flipped-classroom course and open-ended exploration environment (TugLet)&lt;br /&gt;
* Model performs worse for students from school with higher socio-economic status in open-ended environment (FNR 0.73 for higher SES and FNR 0.57 for medium SES).&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=International_Students&amp;diff=485</id>
		<title>International Students</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=International_Students&amp;diff=485"/>
		<updated>2023-11-28T04:08:30Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Model predicting undergraduate course grades and average GPA&lt;br /&gt;
&lt;br /&gt;
* International students were inaccurately predicted to get lower course grade and average GPA than their peers when personal background was included&lt;br /&gt;
* Fairness of the model improved if it included both clickstream and survey data&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Generally worse prediction international students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cock et al.(2023) [https://dl.acm.org/doi/abs/10.1145/3576050.3576149?casa_token=6Fjh-EUzN-gAAAAA%3AtpRMYzSAVoQFYNzwY5gwSsrnzHIlI0tUjMq6okwgdcCUmuBMVZEtn8eLO52dCtIYUbrHBV_Il9Sx]&lt;br /&gt;
&lt;br /&gt;
* Paper investigates biases in models designed to early identify middle school students at risk of failing in flipped-classroom course and open-ended exploration environment (TugLet)&lt;br /&gt;
* Model performs worse for students with diploma from foreign country in flipped classroom ( FNR=0.58 than FNR =0.42 for international students)&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=484</id>
		<title>Gender: Male/Female</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=484"/>
		<updated>2023-11-28T04:00:07Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Model predicting course outcome&lt;br /&gt;
* Marginal differences were found for prediction quality and in overall proportion of predicted pass between groups&lt;br /&gt;
* Inconsistent in direction between algorithms.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (or median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than female students&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.&lt;br /&gt;
* The fairness of models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates and worse accuracy for male students&lt;br /&gt;
* The model showed better recall for male students, especially for those studying in person&lt;br /&gt;
* The difference in recall and true negative rates were lower, and thus fairer, for male students studying online if their socio-demographic information was not included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but  this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) &lt;br /&gt;
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC for male than female students; results were mixed for F1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022)&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across gender groups. &lt;br /&gt;
* No gender group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]&lt;br /&gt;
* Models predicting whether student will quit spelling learning activity without completing&lt;br /&gt;
* Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for female versus male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students,&lt;br /&gt;
but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student is female and male&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al. (2023)  [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* DEWS prediction is comparable for males and females&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males and females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Almoubayyed et al. (2023)[https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
* Models discovering generalization of the performance for reading comprehension ability in the context of middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
*Model trained on smaller dataset achieves greater fairness in prediction for male and female students&lt;br /&gt;
* For model trained on larger dataset, prediction is more accurate for female students than male students.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Chiu (2020) [https://files.eric.ed.gov/fulltext/EJ1267654.pdf pdf]&lt;br /&gt;
*Model identifies affective states (boredom, concentration, confusion, frustration, off task and gaming) of middle school students’ online mathematics learning in predicting their choice to study STEM in higher education.&lt;br /&gt;
*Model detects interaction with the ASSISTments system&lt;br /&gt;
*Model performs better for males (AUC =0.641 for RFPS; AUC =0.571 for LR) than female students (AUC = 0.492 for RFPS; AUC=0.535 for LR).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cock et al.(2023) [[https://dl.acm.org/doi/abs/10.1145/3576050.3576149?casa_token=6Fjh-EUzN-gAAAAA%3AtpRMYzSAVoQFYNzwY5gwSsrnzHIlI0tUjMq6okwgdcCUmuBMVZEtn8eLO52dCtIYUbrHBV_Il9Sx pdf]]&lt;br /&gt;
* Paper investigates biases in models designed to early identify middle school students at risk of failing in flipped-classroom course and open-ended exploration environment (TugLet)&lt;br /&gt;
* Model performs worse for males in open-ended environment (FNR=0.70 for males and FNR of 0.53 for females) &lt;br /&gt;
* Model performs worse for females in flipped classrooms(FNR=0.56 for females and FNR of 0.43 for males)&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=483</id>
		<title>At-risk/Dropout/Stopout/Graduation Prediction</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=483"/>
		<updated>2023-11-28T03:54:18Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
* JRip decision trees achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used&lt;br /&gt;
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD&lt;br /&gt;
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and  Native Hawaiian and Pacific Islander.&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [[https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Baker et al. (2020) [[https://www.upenn.edu/learninganalytics/ryanbaker/BakerBerningGowda.pdf pdf]]&lt;br /&gt;
* Model predicting student graduation and SAT scores for military-connected students&lt;br /&gt;
* For prediction of graduation, algorithms applying across population resulted an AUC of 0.60, degrading from their original performance of 70% or 71% to chance.&lt;br /&gt;
* For prediction of SAT scores, algorithms applying across population resulted in a Spearman's ρ of 0.42 and 0.44, degrading a third from their original performance to chance.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kai et al. (2017) [https://files.eric.ed.gov/fulltext/ED596601.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J-48 decision trees achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
* J-Rip decision rules  achieved much higher Kappa and AUC for students whose parents did not attended college than those whose parents did&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* The model showed better recall for students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Both accuracy and true negative rates were better for students who are first-generation, or with greater financial needs&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC and F1 for students who attended public high schools than for students who attended private high schools.&lt;br /&gt;
* Several algorithms predicted better AUC for male students than female students; F1 scores were more balanced.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting dropout in XuetangX platform using neural network&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, dropout performance was slightly better for males.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* Model was unable to make prediction of student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* Model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al.(2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
*Prediction is less accurate for White students than other students&lt;br /&gt;
*Prediction is more accurate for students with Disabilities than students without Disabilities&lt;br /&gt;
*Prediction is more accurate for low-income students than for non-low-income students&lt;br /&gt;
*Prediction is comparable for Males and Females&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Cock et al.(2023) [[https://dl.acm.org/doi/abs/10.1145/3576050.3576149?casa_token=6Fjh-EUzN-gAAAAA%3AtpRMYzSAVoQFYNzwY5gwSsrnzHIlI0tUjMq6okwgdcCUmuBMVZEtn8eLO52dCtIYUbrHBV_Il9Sx pdf]]&lt;br /&gt;
* Paper investigates biases in models designed to early identify middle school students at risk of failing in flipped-classroom course and open-ended exploration environment (TugLet)&lt;br /&gt;
* Model performs worse for students from school with higher socio-economic status in open-ended environment (FNR 0.73 for higher SES and 0.57 for medium SES).  &lt;br /&gt;
* Model performs worse for males in open-ended environment (higher FNR for males than females)&lt;br /&gt;
* Model performs worse for students with diploma from foreign country in flipped classroom  &lt;br /&gt;
* Model performs worse for females in flipped classrooms&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=482</id>
		<title>At-risk/Dropout/Stopout/Graduation Prediction</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=482"/>
		<updated>2023-11-28T03:53:47Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
* JRip decision trees achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used&lt;br /&gt;
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD&lt;br /&gt;
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and  Native Hawaiian and Pacific Islander.&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [[https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Baker et al. (2020) [[https://www.upenn.edu/learninganalytics/ryanbaker/BakerBerningGowda.pdf pdf]]&lt;br /&gt;
* Model predicting student graduation and SAT scores for military-connected students&lt;br /&gt;
* For prediction of graduation, algorithms applying across population resulted an AUC of 0.60, degrading from their original performance of 70% or 71% to chance.&lt;br /&gt;
* For prediction of SAT scores, algorithms applying across population resulted in a Spearman's ρ of 0.42 and 0.44, degrading a third from their original performance to chance.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kai et al. (2017) [https://files.eric.ed.gov/fulltext/ED596601.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J-48 decision trees achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
* J-Rip decision rules  achieved much higher Kappa and AUC for students whose parents did not attended college than those whose parents did&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* The model showed better recall for students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Both accuracy and true negative rates were better for students who are first-generation, or with greater financial needs&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC and F1 for students who attended public high schools than for students who attended private high schools.&lt;br /&gt;
* Several algorithms predicted better AUC for male students than female students; F1 scores were more balanced.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting dropout in XuetangX platform using neural network&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, dropout performance was slightly better for males.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* Model was unable to make prediction of student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* Model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al.(2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
*Prediction is less accurate for White students than other students&lt;br /&gt;
*Prediction is more accurate for students with Disabilities than students without Disabilities&lt;br /&gt;
*Prediction is more accurate for low-income students than for non-low-income students&lt;br /&gt;
*Prediction is comparable for Males and Females&lt;br /&gt;
&lt;br /&gt;
Cock et al.(2023) [[https://dl.acm.org/doi/abs/10.1145/3576050.3576149?casa_token=6Fjh-EUzN-gAAAAA%3AtpRMYzSAVoQFYNzwY5gwSsrnzHIlI0tUjMq6okwgdcCUmuBMVZEtn8eLO52dCtIYUbrHBV_Il9Sx pdf]]&lt;br /&gt;
* Paper investigates biases in models designed to early identify middle school students at risk of failing in flipped-classroom course and open-ended exploration environment (TugLet)&lt;br /&gt;
* Model performs worse for students from school with higher socio-economic status in open-ended environment (FNR 0.73 for higher SES and 0.57 for medium SES).  &lt;br /&gt;
* Model performs worse for males in open-ended environment (higher FNR for males than females)&lt;br /&gt;
* Model performs worse for students with diploma from foreign country in flipped classroom  &lt;br /&gt;
* Model performs worse for females in flipped classrooms&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=481</id>
		<title>Gender: Male/Female</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=481"/>
		<updated>2023-10-01T04:45:35Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Model predicting course outcome&lt;br /&gt;
* Marginal differences were found for prediction quality and in overall proportion of predicted pass between groups&lt;br /&gt;
* Inconsistent in direction between algorithms.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (or median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than female students&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.&lt;br /&gt;
* The fairness of models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates and worse accuracy for male students&lt;br /&gt;
* The model showed better recall for male students, especially for those studying in person&lt;br /&gt;
* The difference in recall and true negative rates were lower, and thus fairer, for male students studying online if their socio-demographic information was not included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but  this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) &lt;br /&gt;
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC for male than female students; results were mixed for F1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022)&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across gender groups. &lt;br /&gt;
* No gender group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]&lt;br /&gt;
* Models predicting whether student will quit spelling learning activity without completing&lt;br /&gt;
* Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for female versus male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students,&lt;br /&gt;
but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student is female and male&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al. (2023)  [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* DEWS prediction is comparable for males and females&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males and females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Almoubayyed et al. (2023)[https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
* Models discovering generalization of the performance for reading comprehension ability in the context of middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
*Model trained on smaller dataset achieves greater fairness in prediction for male and female students&lt;br /&gt;
* For model trained on larger dataset, prediction is more accurate for female students than male students.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Chiu (2020) [https://files.eric.ed.gov/fulltext/EJ1267654.pdf pdf]&lt;br /&gt;
*Model identifies affective states (boredom, concentration, confusion, frustration, off task and gaming) of middle school students’ online mathematics learning in predicting their choice to study STEM in higher education.&lt;br /&gt;
*Model detects interaction with the ASSISTments system&lt;br /&gt;
*Model performs better for males (AUC =0.641 for RFPS; AUC =0.571 for LR) than female students (AUC = 0.492 for RFPS; AUC=0.535 for LR).&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=480</id>
		<title>Gender: Male/Female</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=480"/>
		<updated>2023-10-01T04:45:06Z</updated>

		<summary type="html">&lt;p&gt;Shruti: Addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Model predicting course outcome&lt;br /&gt;
* Marginal differences were found for prediction quality and in overall proportion of predicted pass between groups&lt;br /&gt;
* Inconsistent in direction between algorithms.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (or median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than female students&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.&lt;br /&gt;
* The fairness of models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates and worse accuracy for male students&lt;br /&gt;
* The model showed better recall for male students, especially for those studying in person&lt;br /&gt;
* The difference in recall and true negative rates were lower, and thus fairer, for male students studying online if their socio-demographic information was not included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but  this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) &lt;br /&gt;
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC for male than female students; results were mixed for F1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022)&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across gender groups. &lt;br /&gt;
* No gender group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]&lt;br /&gt;
* Models predicting whether student will quit spelling learning activity without completing&lt;br /&gt;
* Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for female versus male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students,&lt;br /&gt;
but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student is female and male&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al. (2023)  [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* DEWS prediction is comparable for males and females&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males and females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Almoubayyed et al. (2023)[https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
* Models discovering generalization of the performance for reading comprehension ability in the context of middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
*Model trained on smaller dataset achieves greater fairness in prediction for male and female students&lt;br /&gt;
* For model trained on larger dataset, prediction is more accurate for female students than male students.&lt;br /&gt;
&lt;br /&gt;
Chiu (2020) [https://files.eric.ed.gov/fulltext/EJ1267654.pdf pdf]&lt;br /&gt;
*Model identifies affective states (boredom, concentration, confusion, frustration, off task and gaming) of middle school students’ online mathematics learning in predicting their choice to study STEM in higher education.&lt;br /&gt;
*Model detects interaction with the ASSISTments system&lt;br /&gt;
*Model performs better for males (AUC =0.641 for RFPS; AUC =0.571 for LR) than female students (AUC = 0.492 for RFPS; AUC=0.535 for LR).&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Engagement_and_Affect_Detection&amp;diff=479</id>
		<title>Engagement and Affect Detection</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Engagement_and_Affect_Detection&amp;diff=479"/>
		<updated>2023-10-01T04:38:07Z</updated>

		<summary type="html">&lt;p&gt;Shruti: Addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Ocumpaugh et al. (2014) [https://bera-journals.onlinelibrary.wiley.com/doi/pdf/10.1111/bjet.12156 pdf]&lt;br /&gt;
&lt;br /&gt;
* Models detecting student affective states (boredom, confusion, engaged concentration, frustration) from the interaction with ASSISTment system&lt;br /&gt;
* Study involved urban, rural, and suburban learners&lt;br /&gt;
* Detectors generally performed the best for the same subpopulation that they were trained on (average kappa = 0.26, A′ = 0.67), and worse for other subpopulations (average kappa = 0.03 and A′ = 0.52)&lt;br /&gt;
* Detectors trained on combined population generally performed better for urban and suburban population (kappa = 0.18, 0.16; A′ = 0.62, 0.66) and not as well for rural population (kappa = 0.06; A′ = 0.54)&lt;br /&gt;
Chiu (2020) [https://files.eric.ed.gov/fulltext/EJ1267654.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Model identifies affective states (boredom, concentration, confusion, frustration, off task and gaming) of middle school students’ online mathematics learning in predicting their choice to study STEM in higher education.&lt;br /&gt;
* Model detects interaction with the ASSISTments system&lt;br /&gt;
* Model performs better for male students (AUC =0.641 for RFPS; AUC =0.571 for LR) than female students (AUC = 0.492 for RFPS; AUC=0.535 for LR)&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=478</id>
		<title>Gender: Male/Female</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=478"/>
		<updated>2023-08-17T17:07:34Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Model predicting course outcome&lt;br /&gt;
* Marginal differences were found for prediction quality and in overall proportion of predicted pass between groups&lt;br /&gt;
* Inconsistent in direction between algorithms.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (or median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than female students&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.&lt;br /&gt;
* The fairness of models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates and worse accuracy for male students&lt;br /&gt;
* The model showed better recall for male students, especially for those studying in person&lt;br /&gt;
* The difference in recall and true negative rates were lower, and thus fairer, for male students studying online if their socio-demographic information was not included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but  this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) &lt;br /&gt;
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC for male than female students; results were mixed for F1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022)&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across gender groups. &lt;br /&gt;
* No gender group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]&lt;br /&gt;
* Models predicting whether student will quit spelling learning activity without completing&lt;br /&gt;
* Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for female versus male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students,&lt;br /&gt;
but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student is female and male&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al. (2023)  [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* DEWS prediction is comparable for males and females&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males and females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Almoubayyed et al. (2023)[https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
* Models discovering generalization of the performance for reading comprehension ability in the context of middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
*Model trained on smaller dataset achieves greater fairness in prediction for male and female students&lt;br /&gt;
* For model trained on larger dataset, prediction is more accurate for female students than male students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Student_Knowledge_Modeling&amp;diff=477</id>
		<title>Student Knowledge Modeling</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Student_Knowledge_Modeling&amp;diff=477"/>
		<updated>2023-08-17T17:06:16Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Yudelson et al. (2014) [https://www.yudelson.info/pdf/EDM2014_YudelsonFRBNJ.pdf pdf]&lt;br /&gt;
*Models discovering generalizable sub-populations of students across different schools to predict students' learning with Carnegie Learning’s Cognitive Tutor (CLCT)&lt;br /&gt;
*Models trained on schools with a high proportion of low-SES student performed worse than those trained with medium or low proportion&lt;br /&gt;
*Models trained on schools with low, medium  proportion of SES students performed similarly well for schools with high proportions of low-SES students&lt;br /&gt;
Almoubayyed et al. (2023) [https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models discovering generalization of the performance for reading comprehension ability in the context of middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
* Model trained on smaller dataset achieves greater fairness in prediction for male/female as well as white/non-white students&lt;br /&gt;
* For model trained on larger dataset, prediction is more accurate for white and female students than non-white and male students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=White_Learners_in_North_America&amp;diff=476</id>
		<title>White Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=White_Learners_in_North_America&amp;diff=476"/>
		<updated>2023-08-17T17:05:42Z</updated>

		<summary type="html">&lt;p&gt;Shruti: addition new paper&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater &lt;br /&gt;
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by White and African American students than other groups&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups. &lt;br /&gt;
* No racial/ethnic group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Compared members of overrepresented racial groups to members of underrepresented racial groups (overrepresented group approximately 90% White)&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for overrepresented versus underrepresented students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sulaiman &amp;amp; Roy (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Sulaiman_Transformers.pdf]&lt;br /&gt;
* Models predicting whether a law student will pass the bar exam (to practice law)&lt;br /&gt;
* Compared White and non-White students&lt;br /&gt;
* Models not applying fairness constraints performed significantly worse for White students in terms of ABROCA&lt;br /&gt;
* Models applying fairness constraints performed equivalently for White and non-White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, model tends to overpredict White students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is less accurate for White students than other students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al.(2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on process, 2) commenting on the answer, and 3) relating to self. &lt;br /&gt;
*Models have approximately equal performance for White, African American and Hispanic/Latinx students.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Almoubayyed et al. (2023) [https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models discovering generalization of the performance for reading comprehension ability in the context of middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
* Model trained on smaller dataset achieves greater fairness in prediction for white and non-white students&lt;br /&gt;
* For model trained on larger dataset, prediction is more accurate for white students than for non-white students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=475</id>
		<title>Gender: Male/Female</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=475"/>
		<updated>2023-08-17T16:57:36Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Model predicting course outcome&lt;br /&gt;
* Marginal differences were found for prediction quality and in overall proportion of predicted pass between groups&lt;br /&gt;
* Inconsistent in direction between algorithms.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (or median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than female students&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.&lt;br /&gt;
* The fairness of models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates and worse accuracy for male students&lt;br /&gt;
* The model showed better recall for male students, especially for those studying in person&lt;br /&gt;
* The difference in recall and true negative rates were lower, and thus fairer, for male students studying online if their socio-demographic information was not included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but  this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) &lt;br /&gt;
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC for male than female students; results were mixed for F1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022)&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across gender groups. &lt;br /&gt;
* No gender group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]&lt;br /&gt;
* Models predicting whether student will quit spelling learning activity without completing&lt;br /&gt;
* Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for female versus male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students,&lt;br /&gt;
but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student is female and male&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al. (2023)  [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* DEWS prediction is comparable for males and females&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males and females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Almoubayyed et al. (2023)[https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
* Models discovering generalization of the performance for reading comprehension ability in the context of middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
*Model trained on smaller data achieves greater fairness in prediction for male and female students&lt;br /&gt;
* For model trained on a larger dataset, prediction is more accurate for female students than male students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Student_Knowledge_Modeling&amp;diff=474</id>
		<title>Student Knowledge Modeling</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Student_Knowledge_Modeling&amp;diff=474"/>
		<updated>2023-08-17T16:36:45Z</updated>

		<summary type="html">&lt;p&gt;Shruti: addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Yudelson et al. (2014) [https://www.yudelson.info/pdf/EDM2014_YudelsonFRBNJ.pdf pdf]&lt;br /&gt;
*Models discovering generalizable sub-populations of students across different schools to predict students' learning with Carnegie Learning’s Cognitive Tutor (CLCT)&lt;br /&gt;
*Models trained on schools with a high proportion of low-SES student performed worse than those trained with medium or low proportion&lt;br /&gt;
*Models trained on schools with low, medium  proportion of SES students performed similarly well for schools with high proportions of low-SES students&lt;br /&gt;
Almoubayyed et al. (2023) [https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models discovering generalization of the performance for reading comprehension ability in the context of middle school students’ usage of Carnegie Learning’s ITS for mathematics instruction&lt;br /&gt;
* Model trained on smaller data achieves greater fairness in prediction for male/female as well as white/non-white students &lt;br /&gt;
* For model trained on a larger dataset, prediction is more accurate for white and female students than non-white and male students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Student_Knowledge_Modeling&amp;diff=473</id>
		<title>Student Knowledge Modeling</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Student_Knowledge_Modeling&amp;diff=473"/>
		<updated>2023-08-17T16:34:54Z</updated>

		<summary type="html">&lt;p&gt;Shruti: new&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Yudelson et al. (2014) [https://www.yudelson.info/pdf/EDM2014_YudelsonFRBNJ.pdf pdf]&lt;br /&gt;
*Models discovering generalizable sub-populations of students across different schools to predict students' learning with Carnegie Learning’s Cognitive Tutor (CLCT)&lt;br /&gt;
*Models trained on schools with a high proportion of low-SES student performed worse than those trained with medium or low proportion&lt;br /&gt;
*Models trained on schools with low, medium  proportion of SES students performed similarly well for schools with high proportions of low-SES students&lt;br /&gt;
Almoubayyed et al. (2023) [https://educationaldatamining.org/EDM2023/proceedings/2023.EDM-long-papers.18/2023.EDM-long-papers.18.pdf pdf]&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=471</id>
		<title>At-risk/Dropout/Stopout/Graduation Prediction</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=471"/>
		<updated>2023-06-29T01:17:31Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
* JRip decision trees achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used&lt;br /&gt;
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD&lt;br /&gt;
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and  Native Hawaiian and Pacific Islander.&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [[https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Baker et al. (2020) [[https://www.upenn.edu/learninganalytics/ryanbaker/BakerBerningGowda.pdf pdf]]&lt;br /&gt;
* Model predicting student graduation and SAT scores for military-connected students&lt;br /&gt;
* For prediction of graduation, algorithms applying across population resulted an AUC of 0.60, degrading from their original performance of 70% or 71% to chance.&lt;br /&gt;
* For prediction of SAT scores, algorithms applying across population resulted in a Spearman's ρ of 0.42 and 0.44, degrading a third from their original performance to chance.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kai et al. (2017) [https://files.eric.ed.gov/fulltext/ED596601.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J-48 decision trees achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
* J-Rip decision rules  achieved much higher Kappa and AUC for students whose parents did not attended college than those whose parents did&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* The model showed better recall for students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Both accuracy and true negative rates were better for students who are first-generation, or with greater financial needs&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC and F1 for students who attended public high schools than for students who attended private high schools.&lt;br /&gt;
* Several algorithms predicted better AUC for male students than female students; F1 scores were more balanced.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting dropout in XuetangX platform using neural network&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, dropout performance was slightly better for males.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* Model was unable to make prediction of student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* Model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al.(2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
*Prediction is less accurate for White students than other students&lt;br /&gt;
*Prediction is more accurate for students with Disabilities than students without Disabilities&lt;br /&gt;
*Prediction is more accurate for low-income students than for non-low-income students.&lt;br /&gt;
*DEWS prediction is comparable for Males and Females&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=470</id>
		<title>At-risk/Dropout/Stopout/Graduation Prediction</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=470"/>
		<updated>2023-06-29T01:16:36Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
* JRip decision trees achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used&lt;br /&gt;
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD&lt;br /&gt;
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and  Native Hawaiian and Pacific Islander.&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [[https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Baker et al. (2020) [[https://www.upenn.edu/learninganalytics/ryanbaker/BakerBerningGowda.pdf pdf]]&lt;br /&gt;
* Model predicting student graduation and SAT scores for military-connected students&lt;br /&gt;
* For prediction of graduation, algorithms applying across population resulted an AUC of 0.60, degrading from their original performance of 70% or 71% to chance.&lt;br /&gt;
* For prediction of SAT scores, algorithms applying across population resulted in a Spearman's ρ of 0.42 and 0.44, degrading a third from their original performance to chance.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kai et al. (2017) [https://files.eric.ed.gov/fulltext/ED596601.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J-48 decision trees achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
* J-Rip decision rules  achieved much higher Kappa and AUC for students whose parents did not attended college than those whose parents did&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* The model showed better recall for students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Both accuracy and true negative rates were better for students who are first-generation, or with greater financial needs&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC and F1 for students who attended public high schools than for students who attended private high schools.&lt;br /&gt;
* Several algorithms predicted better AUC for male students than female students; F1 scores were more balanced.&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting dropout in XuetangX platform using neural network&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, dropout performance was slightly better for males.&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* Model was unable to make prediction of student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* Model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al.(2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
*Prediction is less accurate for White students than other students&lt;br /&gt;
*Prediction is more accurate for students with Disabilities than students without Disabilities&lt;br /&gt;
*Prediction is more accurate for low-income students than for non-low-income students.&lt;br /&gt;
*DEWS prediction is comparable for Males and Females&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Socioeconomic_Status&amp;diff=469</id>
		<title>Socioeconomic Status</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Socioeconomic_Status&amp;diff=469"/>
		<updated>2023-06-29T01:15:10Z</updated>

		<summary type="html">&lt;p&gt;Shruti: addition new paper&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Yudelson et al. (2014) [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.659.872&amp;amp;rep=rep1&amp;amp;type=pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models discovering generalizable sub-populations of students across different schools to predict students' learning with Carnegie Learning’s Cognitive Tutor (CLCT)&lt;br /&gt;
&lt;br /&gt;
* Models trained on schools with a high proportion of low-SES student performed worse than those trained with medium or low proportion&lt;br /&gt;
* Models trained on schools with low, medium  proportion of SES students performed similarly well for schools with high proportions of low-SES students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting undergraduate course grades and average GPA&lt;br /&gt;
&lt;br /&gt;
* Students from low-income households were inaccurately predicted to perform worse for both short-term (final course grade) and long-term (GPA)&lt;br /&gt;
* Fairness of model improved if it included only clickstream and survey data&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
*Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
*Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students with greater financial needs&lt;br /&gt;
*The model showed better recall for students with greater financial needs, especially for those studying in person&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kung &amp;amp; Yu (2020)&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3386527.3406755 pdf]&lt;br /&gt;
* Predicting course grades and later GPA at public U.S. university&lt;br /&gt;
* Equal performance for low-income and upper-income students in course grade prediction for several algorithms and metrics&lt;br /&gt;
* Worse performance on independence for low-income students than high-income students in later GPA prediction for four of five algorithms; one algorithm had worse separation and two algorithms had worse sufficiency&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student receives free/reduced price lunch&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* Model was unable to make prediction of student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* Model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is more accurate for low-income students than non-low-income students&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Learners_with_Disabilities&amp;diff=468</id>
		<title>Learners with Disabilities</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Learners_with_Disabilities&amp;diff=468"/>
		<updated>2023-06-29T01:13:48Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt; Loukina &amp;amp; Buzick (2017) [https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/ets2.12170 pdf]&lt;br /&gt;
* a model (the SpeechRater) automatically scoring open-ended spoken responses for speakers with documented or suspected speech impairments&lt;br /&gt;
* SpeechRater was less accurate for test takers who were deferred for signs of speech impairment (ρ&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; = .57) than test takers who were given accommodations for documented disabilities (ρ&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; = .73)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* Disparate impact was found for students with self-declared disabilities, with systematic inaccuracies in predictions for learners in this group.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is more accurate for students with Disabilities than students without Disabilities&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Other_NLP_Applications_of_Algorithms_in_Education&amp;diff=467</id>
		<title>Other NLP Applications of Algorithms in Education</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Other_NLP_Applications_of_Algorithms_in_Education&amp;diff=467"/>
		<updated>2023-06-29T01:10:26Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Naismith et al. (2018) [http://d-scholarship.pitt.edu/40665/1/EDM2018_paper_37.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* a model that measures L2 learners’ lexical sophistication with the frequency list based on the native speaker corpora&lt;br /&gt;
* Arabic-speaking learners are rated systematically lower across all levels of English proficiency than speakers of Chinese, Japanese, Korean, and Spanish.&lt;br /&gt;
* Level 5 Arabic-speaking learners are unfairly evaluated to have similar level of lexical sophistication as Level 4 learners from China, Japan, Korean and Spain .&lt;br /&gt;
* When used on ETS corpus, “high”-labeled essays by Japanese-speaking learners are rated significantly lower in lexical sophistication than Arabic, Japanese, Korean and Spanish peers.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Samei et al. (2015) [https://files.eric.ed.gov/fulltext/ED560879.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting classroom discourse properties (e.g. authenticity and uptake)&lt;br /&gt;
* Model trained on urban students (authenticity: 0.62, uptake: 0.60) performed with similar accuracy when tested on non-urban students (authenticity: 0.62, uptake: 0.62)&lt;br /&gt;
* Model trained on non-urban (authenticity: 0.61, uptake: 0.59) performed with similar accuracy when tested on urban students (authenticity: 0.60, uptake: 0.63)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* MOOCs taught in English&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* ABROCA varied from 0.03 to 0.08 for non-native speakers of English versus native speakers&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting forum post relevance to course in Moodle data (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, forum post relevance performance was moderately better for females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al.(2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on the process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males and females and for African American, Hispanic/Latinx, and White students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=466</id>
		<title>Gender: Male/Female</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=466"/>
		<updated>2023-06-29T01:07:46Z</updated>

		<summary type="html">&lt;p&gt;Shruti: Addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Model predicting course outcome&lt;br /&gt;
* Marginal differences were found for prediction quality and in overall proportion of predicted pass between groups&lt;br /&gt;
* Inconsistent in direction between algorithms.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (or median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than female students&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.&lt;br /&gt;
* The fairness of models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates and worse accuracy for male students&lt;br /&gt;
* The model showed better recall for male students, especially for those studying in person&lt;br /&gt;
* The difference in recall and true negative rates were lower, and thus fairer, for male students studying online if their socio-demographic information was not included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but  this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) &lt;br /&gt;
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC for male than female students; results were mixed for F1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022)&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across gender groups. &lt;br /&gt;
* No gender group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]&lt;br /&gt;
* Models predicting whether student will quit spelling learning activity without completing&lt;br /&gt;
* Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for female versus male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students,&lt;br /&gt;
but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student is female and male&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al. (2023)  [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* DEWS prediction is comparable for males and females&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males and females.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=White_Learners_in_North_America&amp;diff=465</id>
		<title>White Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=White_Learners_in_North_America&amp;diff=465"/>
		<updated>2023-06-29T01:04:57Z</updated>

		<summary type="html">&lt;p&gt;Shruti: Addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater &lt;br /&gt;
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by White and African American students than other groups&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups. &lt;br /&gt;
* No racial/ethnic group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Compared members of overrepresented racial groups to members of underrepresented racial groups (overrepresented group approximately 90% White)&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for overrepresented versus underrepresented students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sulaiman &amp;amp; Roy (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Sulaiman_Transformers.pdf]&lt;br /&gt;
* Models predicting whether a law student will pass the bar exam (to practice law)&lt;br /&gt;
* Compared White and non-White students&lt;br /&gt;
* Models not applying fairness constraints performed significantly worse for White students in terms of ABROCA&lt;br /&gt;
* Models applying fairness constraints performed equivalently for White and non-White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, model tends to overpredict White students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al. (2023) [https://www.researchgate.net/publication/370001437_Difficult_Lessons_on_Social_Prediction_from_Wisconsin_Public_Schools pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is less accurate for White students than other students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al.(2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on process, 2) commenting on the answer, and 3) relating to self. &lt;br /&gt;
*Models have approximately equal performance for White, African American and Hispanic/Latinx students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&amp;diff=464</id>
		<title>Black/African-American Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&amp;diff=464"/>
		<updated>2023-06-29T01:01:09Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among Black, White, Hispanic, Asian, American Indian and Alaska Native, and  Native Hawaiian and Pacific Islander.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (or median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; Black, American Indian, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Black students were inaccurately predicted to perform worse for both short-term and long-term&lt;br /&gt;
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if URM students are studying in person &lt;br /&gt;
* The model showed better recall for URM students, whether they were in residential or online program&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Ramineni &amp;amp; Williamson (2018) [https://files.eric.ed.gov/fulltext/EJ1202928.pdf pdf]&lt;br /&gt;
* Revised automated scoring engine for assessing GRE essay&lt;br /&gt;
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts&lt;br /&gt;
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater &lt;br /&gt;
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by African American and White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502 pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups. &lt;br /&gt;
* No racial/ethnic group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Compared members of overrepresented racial groups to members of underrepresented racial groups (over 2/3&lt;br /&gt;
Black/African American)&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for overrepresented versus underrepresented students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student is Black&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, model tends to underpredict Black students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al.(2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on process, 2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for African American, Hispanic/Latinx, and White students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&amp;diff=463</id>
		<title>Black/African-American Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&amp;diff=463"/>
		<updated>2023-06-29T01:00:36Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among Black, White, Hispanic, Asian, American Indian and Alaska Native, and  Native Hawaiian and Pacific Islander.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (or median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; Black, American Indian, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Black students were inaccurately predicted to perform worse for both short-term and long-term&lt;br /&gt;
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if URM students are studying in person &lt;br /&gt;
* The model showed better recall for URM students, whether they were in residential or online program&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Ramineni &amp;amp; Williamson (2018) [https://files.eric.ed.gov/fulltext/EJ1202928.pdf pdf]&lt;br /&gt;
* Revised automated scoring engine for assessing GRE essay&lt;br /&gt;
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts&lt;br /&gt;
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater &lt;br /&gt;
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by African American and White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502 pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups. &lt;br /&gt;
* No racial/ethnic group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Compared members of overrepresented racial groups to members of underrepresented racial groups (over 2/3&lt;br /&gt;
Black/African American)&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for overrepresented versus underrepresented students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student is Black&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, model tends to underpredict Black students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al.(2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on process,2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for African American, Hispanic/Latinx, and White students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&amp;diff=462</id>
		<title>Black/African-American Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&amp;diff=462"/>
		<updated>2023-06-29T01:00:21Z</updated>

		<summary type="html">&lt;p&gt;Shruti: Addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among Black, White, Hispanic, Asian, American Indian and Alaska Native, and  Native Hawaiian and Pacific Islander.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (or median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; Black, American Indian, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Black students were inaccurately predicted to perform worse for both short-term and long-term&lt;br /&gt;
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if URM students are studying in person &lt;br /&gt;
* The model showed better recall for URM students, whether they were in residential or online program&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Ramineni &amp;amp; Williamson (2018) [https://files.eric.ed.gov/fulltext/EJ1202928.pdf pdf]&lt;br /&gt;
* Revised automated scoring engine for assessing GRE essay&lt;br /&gt;
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts&lt;br /&gt;
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater &lt;br /&gt;
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by African American and White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502 pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups. &lt;br /&gt;
* No racial/ethnic group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Compared members of overrepresented racial groups to members of underrepresented racial groups (over 2/3&lt;br /&gt;
Black/African American)&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for overrepresented versus underrepresented students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student is Black&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, model tends to underpredict Black students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;br /&gt;
&lt;br /&gt;
Zhang et al.(2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on process,2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for African American, Hispanic/Latinx, and White students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Latino/Latina/Latinx/Hispanic_Learners_in_North_America&amp;diff=461</id>
		<title>Latino/Latina/Latinx/Hispanic Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Latino/Latina/Latinx/Hispanic_Learners_in_North_America&amp;diff=461"/>
		<updated>2023-06-29T00:49:33Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used&lt;br /&gt;
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among Hispanic, White, Black, Asian, American Indian and Alaska Native, and  Native Hawaiian and Pacific Islander.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (or median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; Hispanic, American Indian, Black, Hawaiian or Pacific Islander, and Multicultural) than non-URM students (White and Asian)&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Hispanic students were inaccurately predicted to perform worse for both short-term and long-term&lt;br /&gt;
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if URM students are studying in person &lt;br /&gt;
* The model showed better recall for URM students, whether they were in residential or online program&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater gave significantly better scores than human rater for 11th grade essays written by Hispanic students and Asian-American students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups. &lt;br /&gt;
* No racial/ethnic group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kung &amp;amp; Yu (2020)&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3386527.3406755 pdf]&lt;br /&gt;
* Predicting course grades and later GPA at public U.S. university&lt;br /&gt;
* Poorer independence, separation, sufficiency for Latinx students than white students for five different classic machine learning algorithms&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, model tends to underpredict Hispanic students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al.(2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on process,2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for Hispanic/Latinx, African American, and White students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Latino/Latina/Latinx/Hispanic_Learners_in_North_America&amp;diff=460</id>
		<title>Latino/Latina/Latinx/Hispanic Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Latino/Latina/Latinx/Hispanic_Learners_in_North_America&amp;diff=460"/>
		<updated>2023-06-29T00:49:11Z</updated>

		<summary type="html">&lt;p&gt;Shruti: Addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used&lt;br /&gt;
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among Hispanic, White, Black, Asian, American Indian and Alaska Native, and  Native Hawaiian and Pacific Islander.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (or median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; Hispanic, American Indian, Black, Hawaiian or Pacific Islander, and Multicultural) than non-URM students (White and Asian)&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Hispanic students were inaccurately predicted to perform worse for both short-term and long-term&lt;br /&gt;
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if URM students are studying in person &lt;br /&gt;
* The model showed better recall for URM students, whether they were in residential or online program&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater gave significantly better scores than human rater for 11th grade essays written by Hispanic students and Asian-American students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups. &lt;br /&gt;
* No racial/ethnic group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kung &amp;amp; Yu (2020)&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3386527.3406755 pdf]&lt;br /&gt;
* Predicting course grades and later GPA at public U.S. university&lt;br /&gt;
* Poorer independence, separation, sufficiency for Latinx students than white students for five different classic machine learning algorithms&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, model tends to underpredict Hispanic students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;br /&gt;
&lt;br /&gt;
Zhang et al.(2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on process,2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for Hispanic/Latinx, African American, and White students.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Socioeconomic_Status&amp;diff=459</id>
		<title>Socioeconomic Status</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Socioeconomic_Status&amp;diff=459"/>
		<updated>2023-06-29T00:41:25Z</updated>

		<summary type="html">&lt;p&gt;Shruti: addition new paper&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Yudelson et al. (2014) [https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.659.872&amp;amp;rep=rep1&amp;amp;type=pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models discovering generalizable sub-populations of students across different schools to predict students' learning with Carnegie Learning’s Cognitive Tutor (CLCT)&lt;br /&gt;
&lt;br /&gt;
* Models trained on schools with a high proportion of low-SES student performed worse than those trained with medium or low proportion&lt;br /&gt;
* Models trained on schools with low, medium  proportion of SES students performed similarly well for schools with high proportions of low-SES students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting undergraduate course grades and average GPA&lt;br /&gt;
&lt;br /&gt;
* Students from low-income households were inaccurately predicted to perform worse for both short-term (final course grade) and long-term (GPA)&lt;br /&gt;
* Fairness of model improved if it included only clickstream and survey data&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
*Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
*Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students with greater financial needs&lt;br /&gt;
*The model showed better recall for students with greater financial needs, especially for those studying in person&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kung &amp;amp; Yu (2020)&lt;br /&gt;
[https://dl.acm.org/doi/pdf/10.1145/3386527.3406755 pdf]&lt;br /&gt;
* Predicting course grades and later GPA at public U.S. university&lt;br /&gt;
* Equal performance for low-income and upper-income students in course grade prediction for several algorithms and metrics&lt;br /&gt;
* Worse performance on independence for low-income students than high-income students in later GPA prediction for four of five algorithms; one algorithm had worse separation and two algorithms had worse sufficiency&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student receives free/reduced price lunch&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* Model was unable to make prediction of student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* Model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al. (2023)&lt;br /&gt;
&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is more accurate for low-income students than non-low-income students&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=458</id>
		<title>Gender: Male/Female</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Gender:_Male/Female&amp;diff=458"/>
		<updated>2023-06-29T00:39:04Z</updated>

		<summary type="html">&lt;p&gt;Shruti: Addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms studied performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Model predicting course outcome&lt;br /&gt;
* Marginal differences were found for prediction quality and in overall proportion of predicted pass between groups&lt;br /&gt;
* Inconsistent in direction between algorithms.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (or median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for male students than female students&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Female students were inaccurately predicted to achieve greater short-term and long-term success than male students.&lt;br /&gt;
* The fairness of models improved when a combination of institutional and click data was used in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates and worse accuracy for male students&lt;br /&gt;
* The model showed better recall for male students, especially for those studying in person&lt;br /&gt;
* The difference in recall and true negative rates were lower, and thus fairer, for male students studying online if their socio-demographic information was not included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* More male students were predicted to pass the course than female students, but  this overestimation was fairly small and not consistent across different algorithms&lt;br /&gt;
* Among the algorithms, Naive Bayes had the lowest normalized mutual information value and the highest ABROCA value&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) &lt;br /&gt;
[https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC for male than female students; results were mixed for F1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022)&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across gender groups. &lt;br /&gt;
* No gender group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Rzepka et al. (2022) [https://www.insticc.org/node/TechnicalProgram/CSEDU/2022/presentationDetails/109621 pdf]&lt;br /&gt;
* Models predicting whether student will quit spelling learning activity without completing&lt;br /&gt;
* Multiple algorithms have slightly better false positive rates and AUC ROC for male students than female students, but equivalent performance on multiple other metrics.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for female versus male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students,&lt;br /&gt;
but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student is female and male&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Three data sets and algorithms: predicting course pass/fail (random forest), dropout (neural network), and forum post relevance (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, course pass/fail performance was moderately better for males, dropout performance was slightly better for males, and forum post relevance performance was moderately better for females.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Deho et al. (2023) [https://files.osf.io/v1/resources/5am9z/providers/osfstorage/63eaf170a3fade041fe7c9db?format=pdf&amp;amp;action=download&amp;amp;direct&amp;amp;version=1]&lt;br /&gt;
* Predicting whether course grade will be above or below 0.5&lt;br /&gt;
* Better prediction for female students in some courses, better prediction for male students in other courses&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Permodo et al. (2023) [http://Prediction%20is%20more%20accurate%20for%20students%20with%20Disabilities%20than%20students%20without%20Disabilities pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* DEWS prediction is comparable for Males and Females&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Learners_with_Disabilities&amp;diff=457</id>
		<title>Learners with Disabilities</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Learners_with_Disabilities&amp;diff=457"/>
		<updated>2023-06-29T00:36:17Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Permodo et al (2023) [https://arxiv.org/abs/2304.06205 pdf]&lt;br /&gt;
&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is more accurate for students with Disabilities than students without Disabilities&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
Loukina &amp;amp; Buzick (2017) [https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/ets2.12170 pdf]&lt;br /&gt;
&lt;br /&gt;
* a model (the SpeechRater) automatically scoring open-ended spoken responses for speakers with documented or suspected speech impairments&lt;br /&gt;
* SpeechRater was less accurate for test takers who were deferred for signs of speech impairment (ρ&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; = .57) than test takers who were given accommodations for documented disabilities (ρ&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; = .73)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* Disparate impact was found for students with self-declared disabilities, with systematic inaccuracies in predictions for learners in this group.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Learners_with_Disabilities&amp;diff=456</id>
		<title>Learners with Disabilities</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Learners_with_Disabilities&amp;diff=456"/>
		<updated>2023-06-29T00:35:29Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Permodo et al (2023) [https://arxiv.org/abs/2304.06205 pdf]&lt;br /&gt;
&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is more accurate for students with Disabilities than students without Disabilities&lt;br /&gt;
&lt;br /&gt;
  Loukina &amp;amp; Buzick (2017) [https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/ets2.12170 pdf]&lt;br /&gt;
&lt;br /&gt;
* a model (the SpeechRater) automatically scoring open-ended spoken responses for speakers with documented or suspected speech impairments&lt;br /&gt;
* SpeechRater was less accurate for test takers who were deferred for signs of speech impairment (ρ&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; = .57) than test takers who were given accommodations for documented disabilities (ρ&amp;lt;sup&amp;gt;2&amp;lt;/sup&amp;gt; = .73)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Riazy et al. (2020) [https://www.scitepress.org/Papers/2020/93241/93241.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting course outcome of students in a virtual learning environment (VLE)&lt;br /&gt;
* Disparate impact was found for students with self-declared disabilities, with systematic inaccuracies in predictions for learners in this group.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=White_Learners_in_North_America&amp;diff=455</id>
		<title>White Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=White_Learners_in_North_America&amp;diff=455"/>
		<updated>2023-06-29T00:29:11Z</updated>

		<summary type="html">&lt;p&gt;Shruti: addition new paper&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Permodo et al. (2023) [https://arxiv.org/abs/2304.06205 pdf]&lt;br /&gt;
&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
* Prediction is less accurate for White students than other students&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater &lt;br /&gt;
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by White and African American students than other groups&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups. &lt;br /&gt;
* No racial/ethnic group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Compared members of overrepresented racial groups to members of underrepresented racial groups (overrepresented group approximately 90% White)&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for overrepresented versus underrepresented students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sulaiman &amp;amp; Roy (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Sulaiman_Transformers.pdf]&lt;br /&gt;
* Models predicting whether a law student will pass the bar exam (to practice law)&lt;br /&gt;
* Compared White and non-White students&lt;br /&gt;
* Models not applying fairness constraints performed significantly worse for White students in terms of ABROCA&lt;br /&gt;
* Models applying fairness constraints performed equivalently for White and non-White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, model tends to overpredict White students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=454</id>
		<title>At-risk/Dropout/Stopout/Graduation Prediction</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=454"/>
		<updated>2023-06-28T22:58:23Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Permodo et al.(2023) [http://pdf%20https://arxiv.org/abs/2304.06205 pdf]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
*Prediction is less accurate for White students than other students&lt;br /&gt;
*Prediction is more accurate for students with Disabilities than students without Disabilities&lt;br /&gt;
*Prediction is more accurate for low-income students than for non-low-income students.&lt;br /&gt;
*DEWS prediction is comparable for Males and Females&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
* JRip decision trees achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used&lt;br /&gt;
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD&lt;br /&gt;
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and  Native Hawaiian and Pacific Islander.&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [[https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Baker et al. (2020) [[https://www.upenn.edu/learninganalytics/ryanbaker/BakerBerningGowda.pdf pdf]]&lt;br /&gt;
* Model predicting student graduation and SAT scores for military-connected students&lt;br /&gt;
* For prediction of graduation, algorithms applying across population resulted an AUC of 0.60, degrading from their original performance of 70% or 71% to chance.&lt;br /&gt;
* For prediction of SAT scores, algorithms applying across population resulted in a Spearman's ρ of 0.42 and 0.44, degrading a third from their original performance to chance.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kai et al. (2017) [https://files.eric.ed.gov/fulltext/ED596601.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J-48 decision trees achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
* J-Rip decision rules  achieved much higher Kappa and AUC for students whose parents did not attended college than those whose parents did&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* The model showed better recall for students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Both accuracy and true negative rates were better for students who are first-generation, or with greater financial needs&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC and F1 for students who attended public high schools than for students who attended private high schools.&lt;br /&gt;
* Several algorithms predicted better AUC for male students than female students; F1 scores were more balanced.&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting dropout in XuetangX platform using neural network&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, dropout performance was slightly better for males.&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* Model was unable to make prediction of student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* Model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=453</id>
		<title>At-risk/Dropout/Stopout/Graduation Prediction</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=At-risk/Dropout/Stopout/Graduation_Prediction&amp;diff=453"/>
		<updated>2023-06-28T22:57:33Z</updated>

		<summary type="html">&lt;p&gt;Shruti: Addition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Permodo et al.(2023) [http://pdf https://arxiv.org/abs/2304.06205]&lt;br /&gt;
* Paper discusses system that predicts probabilities of on-time graduation&lt;br /&gt;
*Prediction is less accurate for White students than other students&lt;br /&gt;
*Prediction is more accurate for students with Disabilities than students without Disabilities&lt;br /&gt;
*Prediction is more accurate for low-income students than for non-low-income students.&lt;br /&gt;
*DEWS prediction is comparable for Males and Females&lt;br /&gt;
Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* J48 decision trees achieved significantly lower Kappa but higher AUC for male students than female students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
* JRip decision trees achieved much lower Kappa and AUC for male students than female students&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against male students, performing particularly better for Psychology course.&lt;br /&gt;
* Other models (Logistic Regression and Rawlsian Fairness) performed far worse for male students, performing particularly worse in Computer Science and Electrical Engineering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Anderson et al. (2019) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM2019_paper56.pdf pdf]&lt;br /&gt;
* Models predicting six-year college graduation&lt;br /&gt;
* False negatives rates were greater for Latino students when Decision Tree and Random Forest yielded was used&lt;br /&gt;
* White students had higher false positive rates across all models, Decision Tree, SVM, Logistic Regression, Random Forest, and SGD&lt;br /&gt;
* False negatives rates were greater for male students than female students when SVM, Logistic Regression, and SGD were used&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among White, Black, Hispanic, Asian, American Indian and Alaska Native, and  Native Hawaiian and Pacific Islander.&lt;br /&gt;
* The decision trees showed very minor differences in AUC between female and male students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Gardner, Brooks and Baker (2019) [[https://www.upenn.edu/learninganalytics/ryanbaker/LAK_PAPER97_CAMERA.pdf pdf]]&lt;br /&gt;
* Model predicting MOOC dropout, specifically through slicing analysis&lt;br /&gt;
* Some algorithms performed worse for female students than male students, particularly in courses with 45% or less male presence&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Baker et al. (2020) [[https://www.upenn.edu/learninganalytics/ryanbaker/BakerBerningGowda.pdf pdf]]&lt;br /&gt;
* Model predicting student graduation and SAT scores for military-connected students&lt;br /&gt;
* For prediction of graduation, algorithms applying across population resulted an AUC of 0.60, degrading from their original performance of 70% or 71% to chance.&lt;br /&gt;
* For prediction of SAT scores, algorithms applying across population resulted in a Spearman's ρ of 0.42 and 0.44, degrading a third from their original performance to chance.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Kai et al. (2017) [https://files.eric.ed.gov/fulltext/ED596601.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J-48 decision trees achieved much higher Kappa and AUC for students whose parents did not attend college than those whose parents did&lt;br /&gt;
* J-Rip decision rules  achieved much higher Kappa and AUC for students whose parents did not attended college than those whose parents did&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* The model showed better recall for students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse accuracy and true negative rates for residential students who are under-represented minority (URM; not White or Asian), male, first-generation, or with greater financial needs&lt;br /&gt;
* Both accuracy and true negative rates were better for students who are first-generation, or with greater financial needs&lt;br /&gt;
&lt;br /&gt;
Verdugo et al. (2022) [https://dl.acm.org/doi/abs/10.1145/3506860.3506902 pdf]&lt;br /&gt;
* An algorithm predicting dropout from university after the first year&lt;br /&gt;
* Several algorithms achieved better AUC and F1 for students who attended public high schools than for students who attended private high schools.&lt;br /&gt;
* Several algorithms predicted better AUC for male students than female students; F1 scores were more balanced.&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting dropout in XuetangX platform using neural network&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, dropout performance was slightly better for males.&lt;br /&gt;
&lt;br /&gt;
Queiroga et al. (2022) [https://www.mdpi.com/2078-2489/13/9/401 pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting secondary school students at risk of failure or dropping out&lt;br /&gt;
* Model was unable to make prediction of student success (F1 score = 0.0) for students not in a social welfare program (higher socioeconomic status)&lt;br /&gt;
* Model had slightly lower AUC ROC (0.52 instead of 0.56) for students not in a social welfare program (higher socioeconomic status)&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Other_NLP_Applications_of_Algorithms_in_Education&amp;diff=452</id>
		<title>Other NLP Applications of Algorithms in Education</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Other_NLP_Applications_of_Algorithms_in_Education&amp;diff=452"/>
		<updated>2023-06-28T22:00:14Z</updated>

		<summary type="html">&lt;p&gt;Shruti: edit&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Zhang et al.(2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) commenting on process,2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males versus females and for African American, Hispanic/Latinx, and White students.&lt;br /&gt;
&lt;br /&gt;
Naismith et al. (2018) [http://d-scholarship.pitt.edu/40665/1/EDM2018_paper_37.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* a model that measures L2 learners’ lexical sophistication with the frequency list based on the native speaker corpora&lt;br /&gt;
* Arabic-speaking learners are rated systematically lower across all levels of English proficiency than speakers of Chinese, Japanese, Korean, and Spanish.&lt;br /&gt;
* Level 5 Arabic-speaking learners are unfairly evaluated to have similar level of lexical sophistication as Level 4 learners from China, Japan, Korean and Spain .&lt;br /&gt;
* When used on ETS corpus, “high”-labeled essays by Japanese-speaking learners are rated significantly lower in lexical sophistication than Arabic, Japanese, Korean and Spanish peers.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Samei et al. (2015) [https://files.eric.ed.gov/fulltext/ED560879.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting classroom discourse properties (e.g. authenticity and uptake)&lt;br /&gt;
* Model trained on urban students (authenticity: 0.62, uptake: 0.60) performed with similar accuracy when tested on non-urban students (authenticity: 0.62, uptake: 0.62)&lt;br /&gt;
* Model trained on non-urban (authenticity: 0.61, uptake: 0.59) performed with similar accuracy when tested on urban students (authenticity: 0.60, uptake: 0.63)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* MOOCs taught in English&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* ABROCA varied from 0.03 to 0.08 for non-native speakers of English versus native speakers&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting forum post relevance to course in Moodle data (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, forum post relevance performance was moderately better for females.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Other_NLP_Applications_of_Algorithms_in_Education&amp;diff=451</id>
		<title>Other NLP Applications of Algorithms in Education</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Other_NLP_Applications_of_Algorithms_in_Education&amp;diff=451"/>
		<updated>2023-06-28T21:59:53Z</updated>

		<summary type="html">&lt;p&gt;Shruti: addition new paper&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Zhang et al.(2023) [https://learninganalytics.upenn.edu/ryanbaker/ISLS23_annotation%20detector_short_submit.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models developed to detect attributes of student feedback for other students’ mathematics solutions, reflecting the presence of three constructs:1) Commenting on process,2) commenting on the answer, and 3) relating to self.&lt;br /&gt;
* Models have approximately equal performance for males versus females and for African American, Hispanic/Latinx, and White students.&lt;br /&gt;
&lt;br /&gt;
Naismith et al. (2018) [http://d-scholarship.pitt.edu/40665/1/EDM2018_paper_37.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* a model that measures L2 learners’ lexical sophistication with the frequency list based on the native speaker corpora&lt;br /&gt;
* Arabic-speaking learners are rated systematically lower across all levels of English proficiency than speakers of Chinese, Japanese, Korean, and Spanish.&lt;br /&gt;
* Level 5 Arabic-speaking learners are unfairly evaluated to have similar level of lexical sophistication as Level 4 learners from China, Japan, Korean and Spain .&lt;br /&gt;
* When used on ETS corpus, “high”-labeled essays by Japanese-speaking learners are rated significantly lower in lexical sophistication than Arabic, Japanese, Korean and Spanish peers.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Samei et al. (2015) [https://files.eric.ed.gov/fulltext/ED560879.pdf pdf]&lt;br /&gt;
&lt;br /&gt;
* Models predicting classroom discourse properties (e.g. authenticity and uptake)&lt;br /&gt;
* Model trained on urban students (authenticity: 0.62, uptake: 0.60) performed with similar accuracy when tested on non-urban students (authenticity: 0.62, uptake: 0.62)&lt;br /&gt;
* Model trained on non-urban (authenticity: 0.61, uptake: 0.59) performed with similar accuracy when tested on urban students (authenticity: 0.60, uptake: 0.63)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2021) [https://angusglchen.github.io/files/AIED2021_Lele_Assessing.pdf pdf]&lt;br /&gt;
* Models predicting a MOOC discussion forum post is content-relevant or content-irrelevant&lt;br /&gt;
* MOOCs taught in English&lt;br /&gt;
* Some algorithms achieved ABROCA under 0.01 for female students versus male students, but other algorithms (Naive Bayes) had ABROCA as high as 0.06&lt;br /&gt;
* ABROCA varied from 0.03 to 0.08 for non-native speakers of English versus native speakers&lt;br /&gt;
* Balancing the size of each group in the training set reduced ABROCA values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sha et al. (2022) [https://ieeexplore.ieee.org/abstract/document/9849852]&lt;br /&gt;
* Predicting forum post relevance to course in Moodle data (neural network)&lt;br /&gt;
* A range of over-sampling methods tested&lt;br /&gt;
* Regardless of over-sampling method used, forum post relevance performance was moderately better for females.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=White_Learners_in_North_America&amp;diff=450</id>
		<title>White Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=White_Learners_in_North_America&amp;diff=450"/>
		<updated>2023-06-28T21:54:31Z</updated>

		<summary type="html">&lt;p&gt;Shruti: year&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater &lt;br /&gt;
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by White and African American students than other groups&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups. &lt;br /&gt;
* No racial/ethnic group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Compared members of overrepresented racial groups to members of underrepresented racial groups (overrepresented group approximately 90% White)&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for overrepresented versus underrepresented students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Sulaiman &amp;amp; Roy (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Sulaiman_Transformers.pdf]&lt;br /&gt;
* Models predicting whether a law student will pass the bar exam (to practice law)&lt;br /&gt;
* Compared White and non-White students&lt;br /&gt;
* Models not applying fairness constraints performed significantly worse for White students in terms of ABROCA&lt;br /&gt;
* Models applying fairness constraints performed equivalently for White and non-White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, model tends to overpredict White students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
	<entry>
		<id>https://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&amp;diff=449</id>
		<title>Black/African-American Learners in North America</title>
		<link rel="alternate" type="text/html" href="https://www.pcla.wiki/index.php?title=Black/African-American_Learners_in_North_America&amp;diff=449"/>
		<updated>2023-06-27T21:06:24Z</updated>

		<summary type="html">&lt;p&gt;Shruti: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Kai et al. (2017) [https://www.upenn.edu/learninganalytics/ryanbaker/DLRN-eVersity.pdf pdf]&lt;br /&gt;
* Models predicting student retention in an online college program&lt;br /&gt;
* J48 decision trees achieved much lower Kappa and AUC for Black students than White students&lt;br /&gt;
* JRip decision rules achieved almost identical Kappa and AUC for Black students and White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hu and Rangwala (2020) [https://files.eric.ed.gov/fulltext/ED608050.pdf pdf]&lt;br /&gt;
* Models predicting if a college student will fail in a course&lt;br /&gt;
* Multiple cooperative classifier model (MCCM) model was the best at reducing bias, or discrimination against African-American students, while other models (particularly Logistic Regression and Rawlsian Fairness) performed far worse&lt;br /&gt;
* The level of bias was inconsistent across courses, with MCCM prediction showing the least bias for Psychology and the greatest bias for Computer Science&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Christie et al. (2019) [https://files.eric.ed.gov/fulltext/ED599217.pdf pdf]&lt;br /&gt;
* Models predicting student's high school dropout&lt;br /&gt;
* The decision trees showed little difference in AUC among Black, White, Hispanic, Asian, American Indian and Alaska Native, and  Native Hawaiian and Pacific Islander.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Lee and Kizilcec (2020) [https://arxiv.org/pdf/2007.00088.pdf pdf]&lt;br /&gt;
* Models predicting college success (or median grade or above)&lt;br /&gt;
* Random forest algorithms performed significantly worse for underrepresented minority students (URM; Black, American Indian, Hawaiian or Pacific Islander, Hispanic, and Multicultural) than non-URM students (White and Asian)&lt;br /&gt;
* The fairness of the model, namely demographic parity and equality of opportunity, as well as its accuracy, improved after correcting the threshold values from 0.5 to group-specific values&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2020) [https://files.eric.ed.gov/fulltext/ED608066.pdf pdf]&lt;br /&gt;
* Model predicting undergraduate short-term (course grades) and long-term (average GPA) success&lt;br /&gt;
* Black students were inaccurately predicted to perform worse for both short-term and long-term&lt;br /&gt;
* The fairness of models improved when either click or a combination of click and survey data, and not institutional data, was included in the model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Yu et al. (2021) [https://dl.acm.org/doi/pdf/10.1145/3430895.3460139 pdf]&lt;br /&gt;
* Models predicting college dropout for students in residential and fully online program&lt;br /&gt;
* Whether the socio-demographic information was included or not, the model showed worse true negative rates for students who are underrepresented minority (URM; or not White or Asian), and worse accuracy if URM students are studying in person &lt;br /&gt;
* The model showed better recall for URM students, whether they were in residential or online program&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Ramineni &amp;amp; Williamson (2018) [https://files.eric.ed.gov/fulltext/EJ1202928.pdf pdf]&lt;br /&gt;
* Revised automated scoring engine for assessing GRE essay&lt;br /&gt;
* E-rater gave African American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts&lt;br /&gt;
* The shorter essays written by African American test-takers were more likely to receive lower scores as showing weakness in content and organization&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring pdf]&lt;br /&gt;
* Automated scoring models for evaluating English essays, or e-rater &lt;br /&gt;
* The score difference between human rater and e-rater was significantly smaller for 11th grade essays written by African American and White students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Bridgeman et al. (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502 pdf]&lt;br /&gt;
* A later version of automated scoring models for evaluating English essays, or e-rater&lt;br /&gt;
* E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jiang &amp;amp; Pardos (2021) [https://dl.acm.org/doi/pdf/10.1145/3461702.3462623 pdf]&lt;br /&gt;
* Predicting university course grades using LSTM&lt;br /&gt;
* Roughly equal accuracy across racial groups&lt;br /&gt;
* Slightly better accuracy (~1%) across racial groups when including race in model&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Zhang et al. (2022) [https://www.upenn.edu/learninganalytics/ryanbaker/EDM22_paper_35.pdf pdf]&lt;br /&gt;
* Detecting student use of self-regulated learning (SRL) in mathematical problem-solving process&lt;br /&gt;
* For each SRL-related detector, relatively small differences in AUC were observed across racial/ethnic groups. &lt;br /&gt;
* No racial/ethnic group consistently had best-performing detectors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Li, Xing, &amp;amp; Leite (2022) [https://dl.acm.org/doi/pdf/10.1145/3506860.3506869?casa_token=OZmlaKB9XacAAAAA:2Bm5XYi8wh4riSmEigbHW_1bWJg0zeYqcGHkvfXyrrx_h1YUdnsLE2qOoj4aQRRBrE4VZjPrGw pdf]&lt;br /&gt;
* Models predicting whether two students will communicate on an online discussion forum&lt;br /&gt;
* Compared members of overrepresented racial groups to members of underrepresented racial groups (over 2/3&lt;br /&gt;
Black/African American)&lt;br /&gt;
* Multiple fairness approaches lead to ABROCA of under 0.01 for overrepresented versus underrepresented students&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]&lt;br /&gt;
* Automated essay scoring models inferring text evidence usage&lt;br /&gt;
* All algorithms studied have less than 1% of error explained by whether student is Black&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Jeong et al. (2022) [https://fated2022.github.io/assets/pdf/FATED-2022_paper_Jeong_Racial_Bias_ML_Algs.pdf]&lt;br /&gt;
* Predicting 9th grade math score from academic performance, surveys, and demographic information&lt;br /&gt;
* Despite comparable accuracy, model tends to underpredict Black students' performance&lt;br /&gt;
* Several fairness correction methods equalize false positive and false negative rates across groups.&lt;/div&gt;</summary>
		<author><name>Shruti</name></author>
	</entry>
</feed>