

Research Behind Ellii's Auto-Scoring (AI) Feature

August 29, 2022

On the Ellii platform, you'll find thousands of relevant lessons and several types of digital tasks where English learners can practice different skills, including reading, writing, speaking, and listening.

If you've ever assigned your students a digital task through Ellii, then you may have already noticed that many digital task types get auto-corrected by the platform. That's because there's only one correct answer (e.g., fill in the blanks, multiple-choice, true or false).

However, there are a few task types that require manual scoring (e.g., writing and speaking), which can be time-consuming for teachers to grade.

[Image: a writing task that requires manual scoring]

This means that when you assign a full lesson with multiple tasks, including open-ended speaking and writing tasks, chances are you're spending more time grading assignments than you'd like to.

Realizing this has led us to ask ourselves the following question:

How can we make this easier on teachers and reduce grading time?

Triggered by this question, our developers set out to find a way to make some manually scored tasks auto-scorable, paving the way for Ellii's new Auto-Scoring (AI) feature.

Here's a behind-the-scenes look at how we developed Ellii's Auto-Scoring (AI) feature:

Phase 1: Research

We started by talking to the Publishing team about how these manually scored questions are created in digital lessons and how they work internally.

We identified two types of written responses:

  • Opinion questions (or open-ended questions)
    • Example: Do you like pizza?
  • Correct/incorrect questions (or closed questions)
    • Example: What is the man in the video doing?

Every month, students answer nearly half a million questions in writing. More than 80% of those are closed questions.

Since written responses are the most frequently assigned task type and take the most time to grade, it made sense for our team to auto-score these first. We'll consider auto-scoring grammar, vocabulary, and spelling tasks in the future.

After doing extensive research on written responses, our team realized we could use artificial intelligence (AI) to match the student's answer to the correct answer (i.e., the suggested answer that shows up on the platform).

There’s a common term in the AI world known as Natural Language Processing (NLP).

Wikipedia defines NLP as follows:

". . . interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them."

In short, we decided to use NLP to semantically match the student's answer to the correct answer. In other words, the NLP is able to gauge the meaning behind the student's response against the provided answer without having to match it word for word.

Here’s an example of what a semantic match looks like: 

[Image: semantic match chart]
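As a simplified illustration of semantic matching (not Ellii's actual model, which would use a neural NLP service rather than word counts), the idea is to turn both answers into vectors and measure how closely they point in the same direction. Here's a toy bag-of-words version in Python; the example sentences are hypothetical:

```python
import math
import re
from collections import Counter

def bow_vector(text: str) -> Counter:
    """Lowercase and count words: a crude stand-in for sentence embeddings."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors (0.0 to 1.0)."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

suggested = "He tripped over a phone cord and fell in the boss's office."
student = "The man fell down after tripping on a telephone cord."

# A paraphrase scores well above zero even without a word-for-word match.
score = cosine_similarity(bow_vector(suggested), bow_vector(student))
```

A real NLP model goes further than this sketch: it would recognize that "tripping" and "tripped," or "phone" and "telephone," mean the same thing, which simple word counting cannot.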

Phase 2: Analysis

Up until this point, our research was all theory. We now needed to put it into practice.

The first thing we did was take a small sample of different types of written responses: complex ones, simpler ones, and ones where the answer is a list of items. Then we took random student answers for those sample questions and began analyzing them.

We ran multiple NLP algorithms to test them. Our developers compared student answers to the correct answer and took notes on the score the AI was assigning.

We manually analyzed all the answers one by one to see which algorithms made sense. Since we're developers and not teachers, we also took into account the score the teacher gave the student for a particular question.

In doing so, we quickly noticed that one of the algorithms was looking very promising.
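The comparison step described above can be sketched as follows: score each candidate algorithm's output against the scores teachers actually gave, then pick the algorithm with the smallest average disagreement. The algorithm names and all numbers below are hypothetical:

```python
# Teacher-assigned scores for five sample answers (1.0 = full marks).
teacher_scores = [1.0, 0.75, 0.5, 0.0, 1.0]

# Scores each candidate NLP algorithm assigned to the same five answers.
algorithm_scores = {
    "algo_a": [0.9, 0.8, 0.4, 0.1, 1.0],
    "algo_b": [0.5, 0.5, 0.5, 0.5, 0.5],
}

def mean_absolute_error(predicted, actual):
    """Average gap between the algorithm's scores and the teacher's scores."""
    return sum(abs(p, ) if False else abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

errors = {name: mean_absolute_error(scores, teacher_scores)
          for name, scores in algorithm_scores.items()}
best = min(errors, key=errors.get)  # the algorithm that tracks teachers most closely
```

With these made-up numbers, `algo_a` disagrees with teachers by 0.07 on average versus 0.35 for `algo_b`, so it would be the "promising" candidate.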

Let's see it in action! The following examples show how writing tasks are scored by AI vs. a teacher:

Question 1

What happened in the boss’s office?

Ellii's suggested answer

He tripped over a phone cord and fell in the boss’s office.

[Image: suggested answer vs. AI score]

You'll notice that even if the student's answer doesn’t fully match the Publishing team's suggested answer, the AI does consider the meaning behind the sentence and sets a score accordingly.
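One way to turn a continuous similarity into a displayed grade is to bucket it into percentage bands. The thresholds below are hypothetical; Ellii hasn't published its actual scoring rules:

```python
def similarity_to_score(similarity: float) -> int:
    """Map a 0-1 semantic similarity onto a percentage grade (hypothetical cutoffs)."""
    thresholds = [(0.85, 100), (0.65, 75), (0.45, 50), (0.25, 25)]
    for cutoff, grade in thresholds:
        if similarity >= cutoff:
            return grade
    return 0
```

Under this sketch, a close paraphrase (similarity 0.9) earns full marks, a partial answer (0.5) earns half, and an unrelated answer (0.1) earns zero.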

Question 2

Why does the reviewer mention food labels? Use the word "disclose" in your answer.

Ellii's suggested answer 

The reviewer compares how app makers have to disclose what data is shared in their apps, just like how food sellers have to disclose information about what's in the food. Later he says that in their privacy label requirements, Apple doesn't ask app developers to say who they share data with. The reviewer says this is like having a nutrition label without listing the ingredients. He sums up by saying that we deserve an honest account of what's in both our food and our apps.

[Image: AI scoring of a complex answer]

This is a very complex question and answer. You'll notice the NLP service was able to compare the semantics of the student's response against the suggested answer.

Question 3

How does the reading end?

Ellii's suggested answer

The reading ends with a question worth pondering. Since it is the migrant who has to take the action and send the money home, what happens if he/she decides to cut the family off? Do his/her dependents have any way to coerce the migrant to continue funding the family back home? Imagine what happens when a migrant remarries in a foreign country. Will his/her new family agree to share the family income?

[Image: AI score chart]

Phase 3: Closed beta

The results were very promising at this point, but we were not 100% confident about how useful this feature would be and how good the AI scoring was.

So we decided to do a closed beta test. This meant selecting a handful of Ellii teachers and inviting them to test the feature.

We ran the closed beta in May 2022, and teachers were very happy with it according to the feedback we received from our beta test survey.

Here are some notable statistics:

  • 4,340 questions were answered and auto-scored during the month of May 2022
  • Teachers changed the AI score in only 3.2% of questions
  • 66% of surveyed teachers said they would love to have this feature be permanent
  • 33% said they don’t have a strong opinion on whether AI was useful or not 

After thoroughly analyzing the results, we noticed a few areas where we could improve the AI to provide teachers and their students with more accurate scoring.

We worked on these improvements, and now we're ready for the next phase.

Phase 4: Open beta

As of now, we're officially doing an open beta test of the Auto-Scoring (AI) feature. This means that Ellii teachers will be able to opt in to use this feature in their classes.

You'll find the auto-scoring opt-in checkbox when you create or edit a class. Note that you need to opt in for each class and assign tasks to students within that class in order for the AI to work.

[Image: class edit settings]

Have you tried Ellii's Auto-Scoring (AI) feature yet?

Share your feedback with us in the comments! Let us know what you loved and what you think needs improvement so that we can continue to make grading easier for you.


Comments (8)

Khaled A. (Teacher)

What a brilliant feature! I've been avoiding open-ended questions because I didn't have the time to do manual grading, but now I will include them knowing I don't have to worry about it. Big thank you.


Tara Benwell (Author)

Hi, Khaled! We're happy to hear that you're going to opt in and give Elliibot a try! Keep in mind that it only grades questions that have Suggested Answers provided by our Publishing team. It does not grade opinion-style questions or speaking questions (yet).

Rosana L. (Teacher)

Well done. I think we should be given the option to score a student or not according to our concerns. You should develop a field where teachers can opt for the AI score or give their own score manually if they want. Also, I think the grade percentages should vary in increments of 10%, not 25%. This would give teachers a larger range to suggest the student's improvement.


Tara Benwell (Author)

Hi Rosana,
So, I think you mean before you assign the task, you want to be able to preview it and turn Ellii Bot on or off at the task level, right? We'll discuss that with our devs.

Rosana L. (Teacher)

Another point I would like to suggest: when you send a digital assignment, the student does not have the chance to choose the accent to listen to, such as American, British, or Canadian. Can you please improve this resource so that students can opt for the accent they prefer or feel more drawn to?



Tara Benwell (Author)

Hi Rosana,
Currently our digital content is only available in American English. Most of our printable content does have Canadian and UK versions for print and audio.
Thanks for your request, though!

Uhc C. (Teacher)

I just love you guys for making our teaching life easier and fun.

Margaret Holec
Windsor, On


Tara Benwell (Author)

Hi, Margaret!
Thanks for stopping by the blog. We're so happy to hear that Ellii is checking these important boxes for you.
Tara (fellow Windsorite!)
