Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#944] Implement authorship analysis #2140

Merged
merged 161 commits into from
Apr 28, 2024

Conversation

SkyBlaise99
Copy link
Contributor

@SkyBlaise99 SkyBlaise99 commented Mar 3, 2024

Fixes #944

Proposed commit message

A line is credited to the author who last modified it.

Another author might have written the line initially and the current
author only modified it slightly. In such a case, the current author
gets credited for work that is not entirely done by him/her.

Let's analyze how similar a line is as compared to its ancestor lines
(previous versions of the line) and give full or partial credit to the
last author based on the analysis.

Other information

Changes in Frontend

  1. Added a new entry: isFullCredit to be read from authorship.json files. (types.ts, authorship-type.ts, c-authorship.vue)

  2. A darker colour represents full credit while a lighter colour represents partial credit. (c-segment.vue, _colors.scss)

  3. Added a new entry: isAuthorshipAnalyzed to be read from summary.json file. (window.ts, summary-type.ts, api.ts)

  4. Added a legend for full/partial color scheme, only shown when isAuthorshipAnalyzed is true (i.e. full/partial credit analysis was turned on). (c-authorship.vue)

Changes in Backend

  1. Added 2 new flags (CliArguments.java, ArgsParser.java)
  • --analyze-authorship or -A to enable the feature
    • Initially added as the program takes too long to run (hence default false)
    • It can now be removed or set to true by default if required, since runtime decreased by a lot (90 min -> 13 min).
  • --originality-threshold or -ot to manually set the threshold
    • Acceptable range: 0.0 ~ 1.0
    • Default 0.51
    • Rational: the last author should modify slightly more than half the line to be given full credit.
  1. Added AuthorshipAnalyzer.java
  • Main logic:
    1. Given a line l by author a.
    2. Find previous version of l, the deleted lines dl with the lowest originality score (same idea as highest similarity score). Let da be the author of dl.
    3. Give full credit if
      • There are no deleted lines found
      • Deleted line's originality score is more than the originality threshold
      • Author of the deleted line is unknown
      • Config settings (i.e. before since date, is in ignored list, or is an ignored file)
    4. Give partial credit if a != da (So a updated lines a little bit)
    5. Else restart from step 1 with dl and da
  • Improvements
    • Final runtime decreased from 90 min to 13 min
    • Memory usage rise from 1.5GB to 2GB due to caching
    • Things done:
      1. Added caching for git log and git diff
      2. Added early termination for DP getLevenshteinDistance()
      3. Reduce space compleity of DP getLevenshteinDistance()
  1. Updated AnnotatorAnalyzer.java and LineInfo.java to handle credit information of annotated author (@@author tags)
  • Case 1: last author == annotated author
    -> Do nothing since it's the same author, credit information is retained.
  • Case 2: last author != annotated author
    -> Overwrite author as usual, partial credit is given to annotated author as we are not sure the exact contribution since it is claimed.

Copy link
Member

@ckcherry23 ckcherry23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @SkyBlaise99, thanks for your awesome work on the frontend! Everything looks good to me. Really appreciate the detailed comments in your Cypress tests and your continued work in improving the code and UI.

Have left a few comments for the docs.

docs/ug/cli.md Outdated Show resolved Hide resolved
docs/ug/cli.md Outdated Show resolved Hide resolved
docs/ug/cli.md Outdated Show resolved Hide resolved
@SkyBlaise99
Copy link
Contributor Author

@ckcherry23 you might need to look again, I added in the codes about the legend as well as requested.

@ckcherry23
Copy link
Member

@ckcherry23 you might need to look again, I added in the codes about the legend as well as requested.

Can you please add a screenshot of how the legend looks now with the rest of the panel?

@SkyBlaise99
Copy link
Contributor Author

SkyBlaise99 commented Apr 2, 2024

Sure, below are some pics. I tried to make the center divider a bit nicer but failed haha. Ideally it should expand with the text as well.

In the normal report, nothing is shown

image

In report with full/partial credit:

image

image

When groups are merged:

image

image

@ckcherry23
Copy link
Member

ckcherry23 commented Apr 2, 2024

Sure, below are some pics. I tried to make the center divider a bit nicer but failed haha. Ideally it should expand with the text as well.

Thanks for saving me time with the pics! I think you can remove the centre divider and optionally add some padding instead. Otherwise, the legend LGTM!

Edit: Seems like other CSS values are also multiples of 5, so we can continue to use such values.

docs/ug/cli.md Outdated Show resolved Hide resolved
@SkyBlaise99 SkyBlaise99 requested a review from gok99 April 2, 2024 14:03
@ckcherry23 ckcherry23 requested a review from a team April 24, 2024 09:15
Copy link
Contributor

@gok99 gok99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took another pass at the backend changes and things seem to be in order. Thanks for the great work on this feature!

Let's create an associated follow-up issue (created here: #2196) for using GitBlameLineInfo in aggregateBlameAuthorModifiedAndDateInfo of FileInfoAnalyzer.java

@ckcherry23
Copy link
Member

ckcherry23 commented Apr 27, 2024

@SkyBlaise99 Whenever you're available, can you fix the linting errors causing the CI to fail?

Copy link
Member

@MarcusTXK MarcusTXK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, tested it with the new flag combinations, and it works as per expected and does not affect any other existing functionality. I think this is a really cool feature, great job on it!

@ckcherry23 ckcherry23 merged commit 29b9f5f into reposense:master Apr 28, 2024
8 checks passed
Copy link
Contributor

The following links are for previewing this pull request:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve code authorship attribution
9 participants