Issue#58: Adding Grammar Analyzer feature to GatorMiner #90

Mai1902 · 2021-04-21T05:31:07Z

What is the current behavior?

We believe that grammar is one of the important criteria for judging the quality of a reflection; hence, our team wants to add a grammar error analyzer as a new feature to GatorMiner. This is the implementation of issue #58.

Purpose of this feature:

The Grammar Analyzer is a tool that scans an assignment and outputs two things: it shows which words contain grammar errors and where they occur, and it gives a score/grade based on the number of grammatical errors. This tool would be a great addition to GatorMiner because it adds another dimension for assessing an assignment's meticulousness, quality, and integrity.

What is the new behavior if this PR is merged?

As of right now, the source code of the grammar analyzer has been fully implemented and tested; it returns the correct number of errors and the percentage of grammar errors relative to the number of words in a text. However, the feature only works with short texts, not with long texts such as reflections.
We also added a new page titled Grammar Checker to streamlit_web.py, but it does not display anything yet because we are still struggling to implement an appropriate data frame.

Type of change

Please describe the pull request as one of the following:

Other information: Full documentation on our pending work is as follows:

Current outcome of the latest push:

The current outcome of the implementation is that the code is functional but inefficient. We were hoping for a properly working analyzer that scans all the input values and displays a table with the student ID, the number of errors, and the percentage of errors. The code works; however, the library we are using is slow, so the analysis takes too long to run. Although the code functions as intended, it does not run as efficiently as we had hoped.
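The table described above could be assembled from one analyzer call per student. A minimal sketch, assuming the reflections are available as a mapping from student ID to text and that the analyzer returns an `(error_count, error_percentage)` pair; `build_grammar_table` and the column labels are hypothetical, not names from this PR.

```python
from typing import Callable, Dict, List, Tuple


def build_grammar_table(
    reflections: Dict[str, str],
    analyze: Callable[[str], Tuple[int, float]],
) -> List[dict]:
    """Build one row per student: ID, error count, and error percentage.

    Hypothetical helper: `analyze` is any function returning
    (error_count, error_percentage) for a piece of text.
    """
    rows = []
    for student_id, text in reflections.items():
        errors, percent = analyze(text)
        rows.append({
            "Student ID": student_id,
            "Errors": errors,
            "Error %": round(percent, 2),  # two decimals for display
        })
    return rows
```

The resulting list of dictionaries can be passed to `pandas.DataFrame` and shown with Streamlit's `st.table` or `st.dataframe` on the Grammar Checker page.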

Implementation of source code explanation:

The source code uses the grammar checking tool from the language-tool-python library. The only function in this file takes the input text as its parameter and counts the grammar errors in each line of the text. It returns the number of grammar errors in the text and the error percentage relative to the number of words in the text.
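The approach described above can be sketched as follows. This is not the PR's code: the function name `analyze_grammar`, its return keys, and the optional `tool` parameter are assumptions; only `language_tool_python.LanguageTool(...)` and its `check()` method come from the library. The per-line counting and the tokenizing regex mirror the snippet quoted in the review thread.

```python
import re
from typing import Optional, Protocol, Sequence


class GrammarChecker(Protocol):
    """Anything with a check(text) method that returns a list of matches,
    such as language_tool_python.LanguageTool."""

    def check(self, text: str) -> Sequence[object]:
        ...


def analyze_grammar(text: str, tool: Optional[GrammarChecker] = None) -> dict:
    """Count grammar errors line by line and report errors per 100 words.

    Hypothetical helper: the name and the returned keys are illustrative.
    """
    if tool is None:
        # Requires language_tool_python and Java; starting the local
        # LanguageTool server is slow, so prefer passing a shared tool.
        import language_tool_python
        tool = language_tool_python.LanguageTool("en-US")

    err_num = 0
    for line in str(text).splitlines():
        if line.strip():  # skip blank lines
            err_num += len(tool.check(line))

    # Same tokenization as the diff: non-alphanumeric runs become spaces.
    words = re.sub("[^0-9a-zA-Z]+", " ", str(text)).lower().split()
    percentage = 100 * err_num / len(words) if words else 0.0
    return {"errors": err_num, "error_percentage": percentage}
```

Accepting an existing `tool` lets the caller create the `LanguageTool` instance once and reuse it for every student. Creating the instance starts a local LanguageTool server (downloading it on first use), so repeating that for each reflection would add noticeably to the run time.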

Current issue:

The code in this PR has been tested and gives correct grammar error counts. However, it only works normally with a small input text (at most one or two paragraphs). When my team passed in a large input text (a reflection), the code ran for an extremely long time and never produced any output, even though there are no bugs in the code.
After some investigation, we realized that the language-tool-python library is a fork of the language_check library, which is outdated. As a result, the tool is inefficient, and the program appears to run indefinitely.
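Before choosing a fix, it may help to measure how the run time grows with input length. A minimal timing sketch, assuming any callable that checks a string (for example the `check` method of a `language_tool_python.LanguageTool` instance); the name `time_checker` and the default sizes are our own choices.

```python
import time
from typing import Callable, Dict, Iterable


def time_checker(check: Callable[[str], object], text: str,
                 sizes: Iterable[int] = (50, 100, 200, 400)) -> Dict[int, float]:
    """Time check() on the first n words of text for each n in sizes.

    Sizes larger than the text's word count are skipped.
    """
    words = text.split()
    timings: Dict[int, float] = {}
    for n in sorted(sizes):
        if n > len(words):
            break  # text too short for this and all larger sizes
        sample = " ".join(words[:n])
        start = time.perf_counter()
        check(sample)
        timings[n] = time.perf_counter() - start
    return timings
```

Create the tool once and pass `tool.check`, so server start-up is not counted. If the time roughly doubles each time `n` doubles, the checker scales linearly and splitting the text will not reduce the total time; a much steeper curve would support splitting it.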

Possible solution for future implementation:

Alternative solution 1: Put the grammar analyzer inside the interactive feature, where the input can be smaller, so the tool will be able to check it for grammar errors. A warning message should accompany this feature to inform the user that it only works with small text inputs (about 200 words).
Alternative solution 2: Fork the language-tool-python library and build a new, more efficient tool based on it, for example by updating the trained data inside it.
Alternative solution 3: Split the large input text into smaller paragraphs (200 words per paragraph) and use suitable logic to pass each paragraph to the tool.
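Alternative solution 3 could be sketched as follows. This is one possible approach, not the team's: it packs whole sentences into chunks of at most `max_words` words (200 by default, matching the figure above) so that no sentence is cut in half, since a fragment could be flagged as an error. The sentence splitter is a simple regex on `.`, `!`, and `?`, an assumption that will mis-split abbreviations such as "e.g.". The name `chunk_text` is hypothetical.

```python
import re
from typing import List


def chunk_text(text: str, max_words: int = 200) -> List[str]:
    """Group whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own chunk.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))  # close the full chunk
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Errors could then be totaled with `sum(len(tool.check(chunk)) for chunk in chunk_text(reflection))`, using one shared `tool`. Whether this is actually faster than a single call on the whole reflection is untested.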

This PR has:

Commit messages that are correctly formatted
Tests for newly introduced code
Docstrings for newly introduced code

Developers

@Mai1902 @Kevin487 @TheShiny1 @Batmunkh0419

enpuyou

Hi @Mai1902, thanks for working on this feature! It seems like there are still a couple of errors in the code that need to be fixed so that the program can be executed. See the details below.

Additionally, I noticed that a language model is installed during execution, and it is quite large. Perhaps it would be best to note somewhere that the grammar feature requires installing this model. I also want to report that the plots seem to take forever to load; is there a bug here, or is that just part of the feature?

streamlit_web.py

src/grammar_analyzer.py

enpuyou · 2021-04-21T05:52:38Z

src/grammar_analyzer.py

+        err_num = err_num + len(matches)
+
+    # Store all alphanumeric characters in the reflection in a list
+    words = re.sub('[^0-9a-zA-Z]+', ' ', str(text)).lower().split()


I'm not great at regex, but is this line tokenizing the text?

This line replaces all non-alphanumeric characters in the text with whitespace, lowercases the text, then tokenizes it and stores the tokens in a list called words.
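The tokenizing one-liner from the diff can be checked in isolation. Wrapping it in a function (the name `tokenize` is ours) shows that each run of non-alphanumeric characters collapses to one space, that contractions are split, and that non-ASCII letters are dropped.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Tokenize the way the diff does: runs of non-alphanumeric characters
    become a single space, then the text is lowercased and split."""
    return re.sub("[^0-9a-zA-Z]+", " ", str(text)).lower().split()


print(tokenize("Don't stop!"))   # ['don', 't', 'stop']
print(tokenize("Café au lait"))  # ['caf', 'au', 'lait']
```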

Okay, thanks! That's what I thought! If that's the case, the tokens are already processed and stored in the data frame when markdown documents are being imported. See if you can just reuse them instead of retokenizing from the text.
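A minimal sketch of that suggestion: compute the percentage from the tokens already stored at import time instead of retokenizing the text. The function name is hypothetical, and so is the `tokens` column mentioned after the block; GatorMiner's actual column name should be used instead.

```python
from typing import Sequence


def error_percentage(err_num: int, tokens: Sequence[str]) -> float:
    """Grammar errors per 100 tokens, reusing tokens produced at import time."""
    if not tokens:
        return 0.0  # avoid dividing by zero on an empty document
    return 100 * err_num / len(tokens)
```

Applied to a data frame with hypothetical `errors` and `tokens` columns, this would be `df.apply(lambda row: error_percentage(row["errors"], row["tokens"]), axis=1)`. If the stored tokens have had stop words removed, the percentage will come out higher than with the diff's own tokenizer.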

src/grammar_analyzer.py

Pipfile

enpuyou · 2021-04-21T06:04:05Z

Also, don't forget to resolve all the merge conflicts. The Pipfile and streamlit_web.py should be rather easy. As for the Pipfile.lock, just remove it and regenerate one based on the updated Pipfile.

codecov · 2021-04-21T15:38:07Z

Codecov Report

Merging #90 (46aef18) into master (1677535) will increase coverage by 0.42%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #90      +/-   ##
==========================================
+ Coverage   91.66%   92.09%   +0.42%     
==========================================
  Files           6        7       +1     
  Lines         240      253      +13     
==========================================
+ Hits          220      233      +13     
  Misses         20       20

Impacted Files	Coverage Δ
src/grammar_analyzer.py	`100.00% <100.00%> (ø)`

corlettim · 2021-04-28T19:14:11Z

Please update your branch/PR with master

corlettim · 2021-04-29T13:13:14Z

Please make sure to resolve your conflicts in streamlit_web.py

m and others added 30 commits April 7, 2021 05:14

Write draft code

569a828

Turn grammar analyzer to dictionary

ff5bf89

Fix syntax

267fda2

Fix syntax

4551f66

Fix syntax

2852767

disable missingdocstring pylint

0b20760

Disable missing docstring

04f2d68

Update todo list

a72afbc

Update Piplock file and reformat source code

5dc6595

Fixing source code of grammar_analyzer

7582c90

Fixing flake8 error and add comment to src code

d69ac16

Fixing source code

5d0443e

beginning of the analyzer display code

a164b88

Change grammar error source code according to Product Owner's suggestions

7b8945b

Change grammar error source code according to Product Owner's suggestions

56d6397

Change grammar analyzer source code according to Product Owner's suggestion

456953d

building of the grammar analyzer display

b5f21fe

additional grammar method

7a5dfec

more edits

79c88a6

grammar_analyzer edit

3bbfbab

Update data structure and fix some displaying code

155f99e

minor changes with the streamlit code and implementation in visualizatio

ffbb05d

Merge branch 'issue#58' of github.com:Allegheny-Ethical-CS/GatorMiner into issue#58

3f2bc5b

change to code to match summary display

faa8510

Fix source code and change display code

7802022

Update grammar analyzer source code

cf5b454

adding student selection for grammar analyzer display

cca35ea

Debug the grammar analyzer

a88987e

Merge branch 'issue#58' of github.com:Allegheny-Ethical-CS/GatorMiner into issue#58

4490425

finished the test coding for grammar analyzer

17ddf3e

m added 2 commits April 21, 2021 01:17

Fix linting, upload test case, and test another data frame

832c0b1

Fix linting, upload test case, and test another data frame

e78aefb

Mai1902 added enhancement New feature or request question Further information is requested Team 4 labels Apr 21, 2021

Mai1902 requested review from jjumadinova, corlettim, enpuyou and noorbuchi April 21, 2021 05:31

Mai1902 assigned Mai1902, Batmunkh0419, Kevin487 and TheShiny1 Apr 21, 2021

m added 2 commits April 21, 2021 01:34

Reupload test case

3517a14

Fix pylint error

5db3897

enpuyou suggested changes Apr 21, 2021

m and others added 3 commits April 21, 2021 02:31

Resolving changes requested from TL, program remain tested but not displayed on web

00d5569

Update pipfile to resolve merge conflict

c58d396

Merge branch 'master' into issue#58

46aef18
