Difference between revisions of "Automated Essay Scoring"

Latest revision as of 12:33, 4 July 2022

Bridgeman et al. (2009) page

Automated scoring models for evaluating English essays (e-rater)
E-rater gave significantly higher scores than human raters for 11th grade essays written by Hispanic and Asian-American students
E-rater gave significantly higher scores than human raters for TOEFL essays (independent task) written by speakers of Chinese and Korean
E-rater correlated poorly with human raters and gave higher scores than human raters for GRE essays (both issue and argument prompts) written by Chinese speakers
E-rater performed comparably accurately for male and female students when assessing their 11th grade essays and their TOEFL and GRE writing

Bridgeman et al. (2012) pdf

A later version of the e-rater automated scoring models for evaluating English essays
E-rater gave significantly lower scores than human raters when assessing African-American students’ written responses to the GRE issue prompt
E-rater gave higher scores to Chinese-speaking test-takers (Mainland China, Taiwan, Hong Kong) and Korean-speaking test-takers when assessing TOEFL (independent prompt) essays
E-rater gave lower scores to Arabic, Hindi, and Spanish speakers when assessing their written responses to the TOEFL independent prompt
E-rater correlated comparably well with human raters when assessing TOEFL and GRE essays written by male and female students

Ramineni & Williamson (2018) pdf

Revised automated scoring engine for assessing GRE essays
E-rater gave African American test-takers significantly lower scores than human raters did when assessing their written responses to argument prompts
Shorter essays written by African American test-takers were more likely to receive lower scores for showing weakness in content and organization

Wang et al. (2018) pdf

Automated scoring model for evaluating English spoken responses
SpeechRater gave significantly lower scores than human raters to German students
SpeechRater scored students from China higher than human raters did, with H1-rater scores higher than the mean

Litman et al. (2021) html

Automated essay scoring models inferring text evidence usage
For all algorithms studied, less than 1% of the error is explained by whether the student is female or male, whether the student is Black, or whether the student receives free/reduced-price lunch
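
One common way to read "error explained by" a group variable is as the share of variance in scoring error (automated score minus human score) that is accounted for by group membership. The sketch below illustrates that reading only; it is not necessarily the analysis used by Litman et al. (2021), and the function name and sample numbers are hypothetical.

```python
from statistics import mean


def share_of_error_explained(errors, group):
    """Share of variance in scoring errors accounted for by group membership.

    errors: automated score minus human score, one value per essay.
    group:  group label per essay (for example 1 = in group, 0 = not).
    Returns between-group sum of squares divided by total sum of squares,
    which equals R^2 when errors are regressed on the group indicator.
    """
    overall = mean(errors)
    total_ss = sum((e - overall) ** 2 for e in errors)
    if total_ss == 0:
        # No variation in error at all, so nothing to explain.
        return 0.0
    between_ss = 0.0
    for label in set(group):
        members = [e for e, g in zip(errors, group) if g == label]
        between_ss += len(members) * (mean(members) - overall) ** 2
    return between_ss / total_ss


# Hypothetical data: errors for eight essays and a 0/1 group indicator.
errors = [0.5, -0.5, 1.0, -1.0, 0.5, -0.5, 1.0, -1.0]
in_group = [1, 1, 1, 1, 0, 0, 0, 0]
share = share_of_error_explained(errors, in_group)
print(f"Share of error explained by group: {share:.1%}")
```

Under this reading, a value below 0.01 for each demographic variable would correspond to the "less than 1%" finding reported above.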

@@ Line 1: / Line 1: @@
-Bridgeman et al. (2009) [https://d1wqtxts1xzle7.cloudfront.net/52116920/AERA_NCME_2009_Bridgeman-with-cover-page-v2.pdf?Expires=1652902797&Signature=PiItDCa9BN8Mey1wuXaa2Lo0uCVp4I245Xx14GPKAthX7YREEF2wT8HEjmwwiSL~rn9tB21kL6zYIFrL3b44oyHfw5ywE1GQmGSeLCcK7f0WyfQUzYDXbQqWzJCInX9t3QvPKN05XK37iAn7SrEI5iN2HQcYmeF3B0fhtLdszf2-5TtPfT1dNwtdo8A30Z4xxtt~gIBQwXYtNhtPbv3idaZPUZe3lZf6kGGweKj3q9-yuyPc8VkXg7Tc72AOUlQqpjb2TDPH7vze1xLbg3Q1~YxYJnHWhvIINkbAadTLitQZvKhJOmV-it2pNEqqzrnwwl5~gqcgX180xVd89z81iQ__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA pdf]
+Bridgeman et al. (2009) [https://www.researchgate.net/publication/242203403_Considering_Fairness_and_Validity_in_Evaluating_Automated_Scoring page]
 * Automated scoring models for evaluating English essays, or e-rater
-* E-Rater gave significantly better scores for 11th grade essays written by Hispanic students and Asian-American students than White students
+* E-Rater gave significantly better scores than human rater for 11th grade essays written by Hispanic students and Asian-American students
-* E-Rater gave significantly better scores for TOEFL essays (independent task) written by speakers of Chinese and Korean
+* E-Rater gave significantly better scores than human rater for TOEFL essays (independent task) written by speakers of Chinese and Korean
-* E-Rater correlated poorly with human rater and give better scores for GRE essays (both issue and argument prompts) written by Chinese speakers
+* E-Rater correlated poorly with human rater and gave better scores than human rater for GRE essays (both issue and argument prompts) written by Chinese speakers
 * E-Rater system performed comparably accurately for male and female students when assessing their 11th grade essays, TOEFL, and GRE writings
@@ Line 12: / Line 12: @@
 * A later version of automated scoring models for evaluating English essays, or e-rater
-*E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE
+* E-rater gave significantly lower score than human rater when assessing African-American students’ written responses to issue prompt in GRE
+* E-rater gave  better scores for test-takers from Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing TOEFL (independent prompt) essay
+* E-rater gave lower scores for Arabic, Hindi, and Spanish speakers when assessing their written responses to independent prompt in TOEFL
+* E-Rater system correlated comparably well with human rater when assessing TOEFL and GRE essays written by male and female students
@@ Line 28: / Line 32: @@
 Wang et al. (2018) [https://www.researchgate.net/publication/336009443_Monitoring_the_performance_of_human_and_automated_scores_for_spoken_responses pdf]
 *Automated scoring model for evaluating English spoken responses
-*SpeechRater gave a significantly lower score than human raters for German
+*SpeechRater gave a significantly lower score than human raters for German students
-*SpeechRater scored in favor of Chinese group, with H1-rater scores higher than mean
+*SpeechRater scored students from China higher than human raters, with H1-rater scores higher than mean
+Litman et al. (2021) [https://link.springer.com/chapter/10.1007/978-3-030-78292-4_21 html]
+* Automated essay scoring models inferring text evidence usage
+* All algorithms studied have less than 1% of error explained by whether student is female and male, whether student is Black, or whether student receives free/reduced price lunch
