
Global NIPS Paper Implementation Challenge - An Automated System for Essay Scoring of Online Exams in Arabic based on Stemming Techniques and Levenshtein Edit Operations


albertusk95/nips-challenge-essay-scoring-arabic


An Automated System for Essay Scoring of Online Exams in Arabic based on Stemming Techniques and Levenshtein Edit Operations

Global NIPS Paper Implementation Challenge

This repository implements the paper by following its research methodology.

Original Paper

https://arxiv.org/pdf/1611.02815.pdf

Main Goal

Develop an automated system for scoring essays written in Arabic for online exams, based on stemming techniques and Levenshtein edit operations.

Programming Tool

  • Python 2.7

Files

Some important files / directories:

  • heavy_stemming.py

    Complete source code for the heavy stemming approach

  • light_stemming.py

    Complete source code for the light stemming approach

  • docs

    Several text files, such as questions, correct_ans, and student_ans

  • prefixes

    Stores the list of prefixes

  • suffixes

    Stores the list of suffixes

  • stopwords

    Stores the list of stopwords

To Run

To run the program, execute the following command:

  • Heavy stemming approach: python heavy_stemming.py
  • Light stemming approach: python light_stemming.py

Methodology

Both approaches (heavy and light stemming) use the following steps. They differ only in how prefixes and suffixes are removed.

  • Begin stemming (heavy or light) on both the student answer and the correct answer

    This initial step consists of two sub-steps: removing numbers from both answers and removing diacritics from both answers. For the latter, each answer is first converted to Unicode so that the diacritics can be removed.

  • Split each of the two answers into an array of words, processing one word at a time

    Each word goes through three steps: removal if it is a stopword, removal of its prefix if the word length is greater than 3, and removal of its suffix if the word length is greater than 3.

  • Find the similarities by giving a weight to each word in both answers

    The weight formula for each word: Word(i) weight = 1 / (total words in correct answer)

  • For each word in student answer, calculate the similarity with words in correct answer

    This step has two parts: calculating the Levenshtein distance between every word in the student answer and the words in the correct answer, and calculating the similarity score between every word in the student answer and the words in the correct answer.

  • Calculate the final mark from the similarity scores

    These are the rules for calculating the final mark:

    • If the similarity between StudentWord(i) and CorrectWord(i) = 1, add the full weight to the final mark
    • Else if the similarity between StudentWord(i) and CorrectWord(i) is < 1 and >= 0.96, add the full weight to the final mark
    • Else if the similarity between StudentWord(i) and CorrectWord(i) is >= 0.8 and < 0.96, add half the weight to the final mark
    • Else if the similarity between StudentWord(i) and CorrectWord(i) is < 0.8, add no weight to the final mark
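
The steps above can be sketched roughly as follows. This is a minimal illustration written for Python 3, not the code in heavy_stemming.py or light_stemming.py. Several parts are assumptions that the text above does not spell out: the affix and stopword lists are tiny placeholders (the repository reads its own from the prefixes, suffixes, and stopwords files), diacritics are taken to be Unicode nonspacing marks, the similarity is assumed to be 1 minus the Levenshtein distance divided by the longer word's length, each student word is compared against its best-matching correct word, and the mark is capped at 1. All function names are hypothetical.

```python
import unicodedata

# Placeholder lists for illustration only (assumption, not the repo's files).
PREFIXES = ["\u0627\u0644", "\u0648"]        # "al-", "wa-"
SUFFIXES = ["\u0627\u062a", "\u0648\u0646"]  # "-at", "-un"
STOPWORDS = {"\u0641\u064a", "\u0645\u0646"}  # "fi", "min"


def clean_text(text):
    """Drop digits and diacritics (assumed: Unicode category Mn)."""
    return "".join(
        ch for ch in text
        if not ch.isdigit() and unicodedata.category(ch) != "Mn"
    )


def stem_words(text):
    """Split into words, drop stopwords, strip one prefix and one suffix."""
    stems = []
    for word in clean_text(text).split():
        if word in STOPWORDS:
            continue
        if len(word) > 3:  # prefix removal only for words longer than 3
            for prefix in PREFIXES:
                if word.startswith(prefix):
                    word = word[len(prefix):]
                    break
        if len(word) > 3:  # suffix removal only for words longer than 3
            for suffix in SUFFIXES:
                if word.endswith(suffix):
                    word = word[:-len(suffix)]
                    break
        stems.append(word)
    return stems


def levenshtein(a, b):
    """Edit distance (insert, delete, substitute) via row-by-row DP."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalisation: 1 - distance / length of the longer word."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def final_mark(student_words, correct_words):
    """Apply the threshold rules using each student word's best match."""
    if not correct_words:
        return 0.0
    weight = 1.0 / len(correct_words)
    mark = 0.0
    for word in student_words:
        best = max(similarity(word, c) for c in correct_words)
        if best >= 0.96:    # covers both "= 1" and "< 1 and >= 0.96"
            mark += weight
        elif best >= 0.8:   # half credit
            mark += weight / 2
        # below 0.8: no weight added
    return min(mark, 1.0)   # cap is an assumption, not stated in the rules
```

A typical call would be `final_mark(stem_words(student_answer), stem_words(correct_answer))`, where both arguments are the raw answer strings. The two full-weight rules collapse into a single `>= 0.96` test because they add the same amount.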


Albertus Kelvin
Bandung Institute of Technology

Code was developed on January 21st, 2018
Code was made publicly available on January 31st, 2018
