Automated Essay Scoring
Bridgeman, Trapani, and Attali (2009) [https://d1wqtxts1xzle7.cloudfront.net/52116920/AERA_NCME_2009_Bridgeman-with-cover-page-v2.pdf?Expires=1652902797&Signature=PiItDCa9BN8Mey1wuXaa2Lo0uCVp4I245Xx14GPKAthX7YREEF2wT8HEjmwwiSL~rn9tB21kL6zYIFrL3b44oyHfw5ywE1GQmGSeLCcK7f0WyfQUzYDXbQqWzJCInX9t3QvPKN05XK37iAn7SrEI5iN2HQcYmeF3B0fhtLdszf2-5TtPfT1dNwtdo8A30Z4xxtt~gIBQwXYtNhtPbv3idaZPUZe3lZf6kGGweKj3q9-yuyPc8VkXg7Tc72AOUlQqpjb2TDPH7vze1xLbg3Q1~YxYJnHWhvIINkbAadTLitQZvKhJOmV-it2pNEqqzrnwwl5~gqcgX180xVd89z81iQ__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA pdf]
- E-rater, an automated scoring model for evaluating English essays
- E-rater gave significantly higher scores for 11th-grade essays written by Hispanic and Asian-American students than for those written by White students
- E-rater gave significantly higher scores for TOEFL essays (independent task) written by Chinese and Korean speakers
- E-rater correlated poorly with human raters and gave higher scores for GRE essays (both issue and argument prompts) written by Chinese speakers
- The e-rater system performed with comparable accuracy for male and female students when assessing their 11th-grade essays, TOEFL, and GRE writing (a sketch of this kind of subgroup comparison follows this list)
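The subgroup gaps summarized above are typically examined as mean differences between e-rater and human scores within each demographic group, often standardized. Below is a minimal sketch of that comparison, assuming a hypothetical table with columns group, human_score, and erater_score; the column names and toy data are illustrative, not taken from the paper.

```python
import pandas as pd

def subgroup_score_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """Mean and standardized e-rater-minus-human score gap per group.

    Column names ('group', 'human_score', 'erater_score') are assumed
    for illustration; they do not come from the papers cited above.
    """
    def gap(g: pd.DataFrame) -> pd.Series:
        diff = g["erater_score"] - g["human_score"]
        # One common choice: standardize by the pooled SD of both score sets.
        pooled_sd = pd.concat([g["erater_score"], g["human_score"]]).std()
        return pd.Series({
            "n": len(g),
            "mean_diff": diff.mean(),
            "std_diff": diff.mean() / pooled_sd if pooled_sd else float("nan"),
        })

    return df.groupby("group")[["human_score", "erater_score"]].apply(gap)

# Toy example on a 1-6 essay score scale (made-up numbers):
toy = pd.DataFrame({
    "group": ["A", "A", "B", "B"],
    "human_score": [4, 3, 4, 5],
    "erater_score": [5, 4, 3, 4],
})
print(subgroup_score_gaps(toy))
```

A positive std_diff for a group means the machine scores that group higher than human raters do, which is the pattern described above for the Chinese- and Korean-speaking groups.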
Bridgeman, Trapani, and Attali (2012) [https://www.tandfonline.com/doi/pdf/10.1080/08957347.2012.635502?needAccess=true pdf]
- A later version of the e-rater automated scoring model for evaluating English essays
- E-rater gave lower scores, particularly to African-American and American-Indian males, when assessing written responses to the GRE issue prompt
- The score was significantly lower when e-rater assessed GRE argument-prompt responses written by African-American test-takers, both male and female
- E-rater gave slightly higher scores to Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing written responses to the independent prompt of the Test of English as a Foreign Language (TOEFL)
- E-rater gave slightly lower scores to Arabic, Hindi, and Spanish speakers when assessing their written responses to the TOEFL independent prompt
- E-rater gave significantly higher scores to test-takers from Mainland China than to those from Taiwan, Korea, and Japan when assessing their GRE writing; the Mainland China responses tended to be below average on grammar, usage, and mechanics but were the longest
- The score difference between human raters and e-rater was marginal when male and female test-takers' written responses to the GRE issue prompt were compared
- The difference was significantly greater for the GRE argument prompt, where e-rater gave lower scores to male test-takers, particularly African-American, American-Indian, and Hispanic males (a sketch of per-group human-machine agreement follows this list)
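Statements such as "correlated poorly with human raters" or "the score difference was marginal" rest on per-group agreement statistics between the two score sources. A hedged sketch of the usual metrics (Pearson correlation and quadratically weighted kappa), assuming integer essay scores and the same illustrative column names as in the previous sketch:

```python
import pandas as pd
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

def per_group_agreement(df: pd.DataFrame) -> pd.DataFrame:
    """Human-machine agreement per demographic group.

    Assumes integer scores in 'human_score' and 'erater_score' plus a
    'group' column; names are illustrative, not taken from the papers.
    """
    rows = []
    for name, g in df.groupby("group"):
        # Pearson correlation between human and machine scores.
        r, _ = pearsonr(g["human_score"], g["erater_score"])
        # Quadratically weighted kappa, a standard agreement metric
        # for ordinal essay scores.
        qwk = cohen_kappa_score(
            g["human_score"], g["erater_score"], weights="quadratic"
        )
        rows.append({"group": name, "n": len(g), "pearson_r": r, "qwk": qwk})
    return pd.DataFrame(rows)
```

With only a few essays per group these statistics are unstable; the studies above compute them on full operational samples.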
Ramineni & Williamson (2018) pdf
- A revised e-rater automated scoring engine for assessing GRE essays
- E-rater gave African-American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts
- The shorter essays written by African-American test-takers were more likely to receive lower scores, being flagged as weak in content and organization (see the length sketch after this list)
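One way to probe the length-related finding above is to check whether the e-rater-minus-human gap varies with essay length within a group. A minimal sketch, assuming a hypothetical word_count column alongside the score columns used earlier (none of these names come from the paper):

```python
import pandas as pd
from scipy.stats import linregress

def gap_vs_length(df: pd.DataFrame) -> dict:
    """Regress the e-rater-minus-human score gap on essay length.

    Assumes columns 'human_score', 'erater_score', and 'word_count';
    the column names are illustrative, not taken from the paper.
    """
    gap = df["erater_score"] - df["human_score"]
    fit = linregress(df["word_count"], gap)
    # A positive slope in a given group would mean shorter essays are
    # scored relatively lower by the machine than by human raters.
    return {"slope": fit.slope, "r": fit.rvalue, "p": fit.pvalue}
```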
Wang et al. (2018) pdf
- SpeechRater, an automated scoring model for evaluating English spoken responses
- SpeechRater gave significantly lower scores than human raters for German speakers
- SpeechRater scored in favor of the Chinese-speaking group, with scores higher than the mean of the first human rater (H1)