Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged branches are ignored #208

Open
LaurentMarchelli opened this issue Nov 17, 2023 · 9 comments
Open

Merged branches are ignored #208

LaurentMarchelli opened this issue Nov 17, 2023 · 9 comments

Comments

@LaurentMarchelli
Copy link
Contributor

Version : dev (0.33.0)
Commit : 68bc840

Its seems strange to exclude merged branches in processBranches(...).

LaurentMarchelli added a commit to LaurentMarchelli/liatrio-otel-collector that referenced this issue Nov 17, 2023
@adrielp
Copy link
Collaborator

adrielp commented Nov 20, 2023

Hey @LaurentMarchelli 👋🏼

That was intentional.

We ignore the default branch because the purpose of the metrics is to see how quickly (and if at all) teams are practicing trunk based development. The trunk in projects could contain a massively long history of commits and changes. But the important part is how long branches are staying opened.

For merged branches, the merged branches should be deleted. In fact, we'd recommend having automation set to auto delete branches on merge.

Merged branches also could heavily diverge and create a lot of extra work to process, when in reality they shouldn't exist because they cause unnecessary overhead both technologically and psychologically.

Maybe the course for this is to document that choice and guidance instead of including merged commits. Would appreciate your thoughts?

@LaurentMarchelli
Copy link
Contributor Author

LaurentMarchelli commented Nov 20, 2023

Hi @adrielp,

I thought about the intentional probability, but I had to be sure. ;-)
I got your point and I'm globally ok with it but this approch presents a misleading.

The name of the output metric is not consistent because sometimes the prefix means all branches including the default branch and sometimes it means a specific set of branches:

  • git_repository_branch_count{} = 323 (including default branch)
  • count(git_repository_branch_time{}) = 16

I think the git_repository_branch_ prefix should, by default, represent all branches, and metric parameters would be a good way to refine the selection.

In my opinion, it is essential to identify repositories where the golden rules are not applied.
As you can see from this repository, 306 branches had to be removed to comply with best practices.

Hope this will help,
Regards,
Laurent

@adrielp
Copy link
Collaborator

adrielp commented Nov 22, 2023

Appreciate your feedback on that @LaurentMarchelli

We tossed around the idea of not including trunk from the branch count metric. And to your point, while it's called a branch, it's also technically not a branch. (when we contemplate it in the context of a tree)

I think it'd probably be best to subtract 1 from the branch count metric in the code (GitHub returns branch count which includes trunk, but calculations run respective to trunk)

Can you elaborate on your 306 removed branches? Even if folks are practicing Git Flow, there's still a default trunk which calculations have to run against.

For example, the git branch data query has to get a branch and compare it against trunk (default). So you're not going to be comparing trunk in reverse order against other branches.

Thoughts on documenting that branch count is respective to branches from trunk, and not trunk itself?

The one caveat is that this means the metric might be 0 and not 1 (which there should always be at least 1 branch.

@LaurentMarchelli
Copy link
Contributor Author

LaurentMarchelli commented Nov 23, 2023

Hi @adrielp,

I do not want to argue about the terminology "default branch" versus “trunk”, this is not the subject here. However, if you omit the "default branch" from the total, I think you should document this unusual behavior to avoid any misunderstanding.

Here is the explanation about my 306 branches issue :

  • The repository has 323 branches.
  • 1 is the default branch.
  • 16 branches have commits not present in the default branch (Unmerged).
  • 306 branches only contain commits present in the default branch. Each of them is a set or a subset of the default branch (Merged).

If your collector only supports CI celerity, you don’t need to worry about these merged branches (306), because even they exist, they had to be removed.

If the goal of your collector is also to provide observability on best practices, I think you should give metrics even on these merged branches.

It is simply a matter of objectives, but my customer will need these metrics.

Regards,
Laurent

@adrielp
Copy link
Collaborator

adrielp commented Nov 23, 2023

@LaurentMarchelli - I apologize if I came off as arguing. I was just typing out loud my thoughts. 😅

Maybe this is functionality that could be feature flagged such that it could be included if wanted.

I appreciate the extra context. Just to confirm I'm understanding correctly, the metric you're needing is the age of each branch that has been merged? That's the missing metric we're talking about here?

@LaurentMarchelli
Copy link
Contributor Author

LaurentMarchelli commented Nov 24, 2023

Hi @adrielp,

Sorry, I didn’t mean you did, I just didn't want to argue because I’m a bit sensitive when the «trunk» terminology is used with git. Maybe I was talking to myself too. 😉

I suppose this feature could be enabled in the configuration file, if that’s what you meant with "feature flagged".

Expected metrics are the three followings :

  • git_repository_branch_time is required to identify abandoned branches that have not been deleted.
    Without this information it is impossible to distinguish newly created from abandoned branches.
  • Because information on the default branch are not provided, git_repository_branch_commit_behindby_count is necessary to identify the real difference between default branch and others when the repository’s content is not updated regularly.
  • git_repository_branch_commit_aheadby_count (= 0) should be provided for consistency.

Does that make sense to you?

Regards,
Laurent

@rhoofard
Copy link
Contributor

Hello @LaurentMarchelli,

So in regards to being able to distinguish abandoned branches from newly created ones using the git_repository_branch_time metric, the same issue is present even if we were to not ignore the merged branches just because of how we are calculating the branch age to begin with.

Merged branches and newly created branches will both have an age of 0 because we use the age of the earliest commit on the branch to do that. That value subsequently relies on the one that makes up the git_repository_branch_commit_behindby_count because we have to go back exactly that amount of commits in the commit history to find that info.

There doesn't seem to be any way to grab branch age directly using the REST or GraphQl api's so for now there is no way to differentiate between the two, docs on this specific part of both are below for reference.

GraphQl we use the ref for the branch with a target of type commit which has a history value we iterate through the commits with
GitHub branches api
Github commits api

@LaurentMarchelli
Copy link
Contributor Author

LaurentMarchelli commented Dec 7, 2023

Hi Ryan,

You're right, there is no way to make the difference between abandoned branches and newly created branch if both identify the same commit.

However, having the branch date (last committed date) instead of the age seems pertinent, because information about the default branch are excluded from exported data. The branch's date is the “committer date” of the commit pointed by the branch aka the head of the branch or the last commit on the branch.

Then, could you please explain your branch age definition, I’m a bit confused:

  • Based on your documentation, git.repository.branch.time is expected to be the time of the branch.
  • Based on your code, git.repository.branch.time is your branch age, the number of hours elapsed since the direct child of the default’s branch commit had been committed.
  • In fact, the date of the branch must be e[0].Node.GetCommittedDate() not e[len(e)-1].Node.GetCommittedDate() because e[0].Node represents the most recent commit.

Best regards,
Laurent Marchelli

@LaurentMarchelli
Copy link
Contributor Author

@ryan,

After a second thought, the age stored in git.repository.branch.time should be intended to represent the age of the first commit addition to the branch after its creation from the default branch and in a certain way the age of the branch.

But it is not, it just represents the age of the last rebase (git pull -r origin main) executed on the branch.

Regards,
Laurent Marchelli

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants