WIP: Integrate Document Categorization to Frequency Analysis #68

Open
wants to merge 72 commits into
base: master

hadenwIV
Collaborator

@hadenwIV hadenwIV commented Mar 31, 2021

What is the current behavior?

The Frequency Analysis section reports only the frequency of individual words. It gives no information about the frequency of categories of words, such as words referring to particular technical skills or programming languages. Word frequencies and other data are also not stored between runs.

What is the new behavior if this PR is merged?

The most common words in an assignment are stored and persist after that assignment's frequency analysis is run. The user will be able to tag an assignment with categories after its frequency analysis, and the most frequent words in different categories of assignments will later be compared to find the words significant to each category. This is a step toward showing the user both the most frequent words and the most frequent categories.
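
As a rough illustration of the storage and tagging described above, here is a minimal sketch. It assumes a single JSON file as the store; the function names (`top_words`, `save_frequencies`), the file layout, and the simple tokenization are all hypothetical and are not taken from this PR's code.

```python
import json
from collections import Counter
from pathlib import Path


def top_words(text, n=50):
    """Return the n most common lowercase words in text as a dict."""
    # Naive tokenization (an assumption): split on whitespace, trim punctuation.
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    return dict(Counter(w for w in words if w).most_common(n))


def save_frequencies(store_path, assignment, text, categories):
    """Add or overwrite one assignment's record in a JSON store on disk."""
    path = Path(store_path)
    # Load earlier runs if the store already exists, so data survives between runs.
    store = json.loads(path.read_text()) if path.exists() else {}
    store[assignment] = {
        "categories": sorted(categories),
        "frequencies": top_words(text),
    }
    path.write_text(json.dumps(store, indent=2))
    return store
```

Calling `save_frequencies("store.json", "lab1", text, ["languages"])` after each analysis would keep every tagged assignment in one file that later comparisons can read.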

Close #51

Type of change

Please describe the pull request as one of the following:

  • Bug fix
  • Breaking change
  • New feature
  • Documentation update
  • Other

Other information

This PR has:

  • Commit messages that are correctly formatted
  • Tests for newly introduced code
  • Docstrings for newly introduced code

Developers

@hadenwIV, @hewittk, @donizk, @favourojo, @solisa986

@enpuyou
Contributor

enpuyou commented Mar 31, 2021

Hi @hadenwIV, please provide a description of the PR based on the template. Thank you!

@favourojo
Collaborator

favourojo commented Mar 31, 2021

I worked as the scrum lead.

We will find the words for each category by extending GatorMiner to store the 50 or so most frequent words from each category's set of practicals that we run, and then write that set of words to a text file. We will then build a program to find the words most unique to each category.
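
The plan above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (`category_top_words`, `write_word_list`, `unique_words`) are hypothetical, "unique" is taken to mean "in this category's top list and in no other category's top list", and the whitespace tokenization is a simplification.

```python
from collections import Counter
from pathlib import Path


def category_top_words(documents, n=50):
    """Count words across one category's documents and keep the n most common."""
    counts = Counter()
    for text in documents:
        counts.update(text.lower().split())
    return [word for word, _ in counts.most_common(n)]


def write_word_list(path, words):
    """Write one word per line to a plain text file."""
    Path(path).write_text("\n".join(words) + "\n")


def unique_words(top_by_category):
    """Map each category to its top words that appear in no other category's list."""
    result = {}
    for category, words in top_by_category.items():
        others = set()
        for other, other_words in top_by_category.items():
            if other != category:
                others.update(other_words)
        result[category] = [w for w in words if w not in others]
    return result
```

A word such as "the" that is frequent in every category drops out of every result, which is the point of comparing the lists rather than reading each one alone.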

Kiley was assigned to work on categories of words. Wil was assigned to work on text mining. Adriana was assigned documentation while also working on the interface. Kyrie will do the majority of the work on the interface.

@enpuyou
Contributor

enpuyou commented Mar 31, 2021

@favourojo Thanks. You can actually just edit the description of the PR opened by @hadenwIV.

@enpuyou
Contributor

enpuyou commented Apr 2, 2021

@favourojo Thanks for working on this feature. Please let us know when you think the PR is ready to be reviewed. In the meantime, could you also change the title of the PR from Issue51 to something more descriptive? Thanks again!

@solisa986 solisa986 changed the title from "Issue#51" to "Frequency of Word (Issue#51)" Apr 4, 2021
@hewittk hewittk changed the title from "Frequency of Word (Issue#51)" to "Frequency of Word Categories (Issue#51)" Apr 4, 2021
@codecov

codecov bot commented Apr 27, 2021

Codecov Report

Merging #68 (f74def8) into master (244b0ba) will decrease coverage by 0.03%.
The diff coverage is 92.00%.

@@            Coverage Diff             @@
##           master      #68      +/-   ##
==========================================
- Coverage   92.09%   92.05%   -0.04%     
==========================================
  Files           6        6              
  Lines         253      277      +24     
==========================================
+ Hits          233      255      +22     
- Misses         20       22       +2     
Impacted Files Coverage Δ
src/analyzer.py 93.75% <92.00%> (-0.57%) ⬇️
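
The percentages in this report can be checked by hand, since coverage is hits divided by lines. The sketch below assumes the displayed values are truncated rather than rounded to two decimals; that is only an assumption that happens to reproduce the figures shown, not documented Codecov behavior. It would also explain why the summary says 0.03% while the table shows -0.04%.

```python
def truncated_percent(hits, lines):
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    # Integer math avoids floating-point error before the final division.
    return (hits * 10000 // lines) / 100


before = truncated_percent(233, 253)  # master: 233 hits out of 253 lines
after = truncated_percent(255, 277)   # PR #68: 255 hits out of 277 lines

# Change computed from the unrounded ratios, in percentage points.
exact_change = 255 / 277 * 100 - 233 / 253 * 100

print(before, after, round(exact_change, 4))
```

Under that assumption the table's -0.04 is the difference of the two truncated values, while the summary's 0.03 is the truncated size of the exact change.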

@hewittk
Collaborator

hewittk commented Apr 27, 2021

We think we have finished our enhancement, along with all of its testing and documentation, and we're ready for official review.

Collaborator

@noorbuchi noorbuchi left a comment

I think that before we can offer a thorough review of the code, it's crucial to make sure that all test cases are passing and that the overall build is passing too. We can help you get that going, but this should be the priority for now.

@hewittk
Collaborator

hewittk commented May 3, 2021

I think that before we can offer a thorough review of the code, it's crucial to make sure that all test cases are passing and that the overall build is passing too. We can help you get that going, but this should be the priority for now

It looks as though all test cases have been passing and still are. The overall build started failing that day because someone added an empty standup folder. That should be resolved now, so could you review it?

@hewittk hewittk requested a review from noorbuchi May 3, 2021 00:15
Labels
enhancement (New feature or request), help wanted (Extra attention is needed), high priority, WIP (Work in progress)
Development

Successfully merging this pull request may close these issues.

Frequencies of categories of words
8 participants