Difference between revisions of "Automated Essay Scoring"
Revision as of 05:08, 10 June 2022
Bridgeman et al. (2009) page
- Automated scoring model for evaluating English essays (e-rater)
- E-rater gave significantly higher scores than human raters for 11th-grade essays written by Hispanic and Asian-American students
- E-rater gave significantly higher scores than human raters for TOEFL essays (independent task) written by speakers of Chinese and Korean
- E-rater correlated poorly with human raters and gave higher scores than human raters for GRE essays (both issue and argument prompts) written by Chinese speakers
- E-rater performed comparably accurately for male and female students when assessing their 11th-grade essays and TOEFL and GRE writings
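The score gaps reported above are typically quantified as a standardized mean difference between the automated and human scores within each subgroup. A minimal sketch of that computation, using made-up illustrative scores (not data from Bridgeman et al.):

```python
# Hedged sketch: standardized mean difference between machine and human
# scores for one subgroup. All numbers below are illustrative, not data
# from any of the studies summarized here.

def standardized_mean_difference(machine_scores, human_scores):
    """(mean machine score - mean human score) / human-score std dev."""
    n = len(human_scores)
    mean_m = sum(machine_scores) / len(machine_scores)
    mean_h = sum(human_scores) / n
    # Sample standard deviation of the human scores (n - 1 denominator).
    var_h = sum((s - mean_h) ** 2 for s in human_scores) / (n - 1)
    return (mean_m - mean_h) / var_h ** 0.5

# Illustrative subgroup where the engine scores slightly above humans.
machine = [4.0, 3.5, 4.5, 5.0, 3.0]
human = [3.5, 3.5, 4.0, 4.5, 3.0]
print(round(standardized_mean_difference(machine, human), 3))
```

A positive value means the automated engine scores the subgroup higher than human raters on average; a negative value means lower, as reported for some groups above.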
Bridgeman et al. (2012) pdf
- A later version of the e-rater automated scoring model for evaluating English essays
- E-rater gave significantly lower scores than human raters when assessing African-American students' written responses to the GRE issue prompt
- E-rater gave higher scores than human raters for Chinese speakers (Mainland China, Taiwan, Hong Kong) and Korean speakers when assessing the TOEFL independent-prompt essay
- E-rater gave lower scores than human raters for Arabic, Hindi, and Spanish speakers when assessing their written responses to the TOEFL independent prompt
- E-rater correlated comparably well with human raters when assessing TOEFL and GRE essays written by male and female students
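The "correlated comparably well" finding refers to the Pearson correlation between automated and human scores, computed separately for each group. A short sketch with illustrative numbers (not data from Bridgeman et al.):

```python
# Hedged sketch: Pearson correlation between automated and human scores,
# the agreement check reported per subgroup. Scores are illustrative.

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

e_rater = [3.0, 3.5, 4.0, 4.5, 5.0]
human = [2.5, 3.5, 4.0, 4.0, 5.5]
print(round(pearson_r(e_rater, human), 3))
```

Comparing this value across demographic groups is how a study can conclude the engine tracks human judgment equally well (or poorly) for each group, independently of any mean score gap.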
Ramineni & Williamson (2018) pdf
- Revised automated scoring engine for assessing GRE essays
- E-rater gave African-American test-takers significantly lower scores than human raters when assessing their written responses to argument prompts
- The shorter essays written by African-American test-takers were more likely to receive lower scores, flagged as weak in content and organization
Wang et al. (2018) pdf
- Automated scoring model (SpeechRater) for evaluating English spoken responses
- SpeechRater gave significantly lower scores than human raters for German students
- SpeechRater scored students from China higher than human raters, with H1-rater scores higher than the mean
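Agreement between an automated engine such as SpeechRater and human raters is conventionally summarized with quadratic weighted kappa (QWK), which penalizes disagreements by the squared distance between the two scores. A self-contained sketch, with illustrative scores rather than data from Wang et al.:

```python
# Hedged sketch: quadratic weighted kappa (QWK), a standard agreement
# statistic for automated-vs-human scoring. Scores are illustrative.

def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    n_ratings = max_rating - min_rating + 1
    n = len(rater_a)
    # Observed joint score matrix.
    observed = [[0.0] * n_ratings for _ in range(n_ratings)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1
    # Marginal score histograms for each rater.
    hist_a = [0.0] * n_ratings
    hist_b = [0.0] * n_ratings
    for a in rater_a:
        hist_a[a - min_rating] += 1
    for b in rater_b:
        hist_b[b - min_rating] += 1
    num = den = 0.0
    for i in range(n_ratings):
        for j in range(n_ratings):
            # Quadratic penalty grows with the squared score distance.
            weight = (i - j) ** 2 / (n_ratings - 1) ** 2
            # Chance-expected count from the two marginal distributions.
            expected = hist_a[i] * hist_b[j] / n
            num += weight * observed[i][j]
            den += weight * expected
    return 1.0 - num / den

machine = [3, 3, 4, 4, 5, 2]
human = [3, 4, 4, 4, 5, 2]
print(round(quadratic_weighted_kappa(machine, human, 1, 5), 3))
```

QWK is 1.0 for perfect agreement and near 0 for chance-level agreement, so a drop in QWK for one demographic group is one way studies like those above detect differential engine performance.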