Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace/imrpove regex parsing for extracting co-authors of pr_collaborated events #492

Open
rithviknishad opened this issue Aug 13, 2024 · 10 comments
Labels
bug Something isn't working good first issue Good for newcomers

Comments

@rithviknishad
Copy link
Member

rithviknishad commented Aug 13, 2024

At times the pr_collaborated event yields incorrect co-author results

const coAuthors = commit.commit.message.match(
/Co-authored-by: (.+) <(.+)>/,
);

@rithviknishad rithviknishad added the bug Something isn't working label Aug 13, 2024
@vikram-8290
Copy link

hii @rithviknishad i want to work on this issue i have understand that what is wrong with this but not to get that where should i make this chances , is there any documentation so the reference, is there any way you can help with this,
Thanks.

@rithviknishad
Copy link
Member Author

https://docs.ohc.network/docs/leaderboard/gsoc-2024/refactor-scraper#3-parseeventsts

const coAuthors = commit.commit.message.match(
/Co-authored-by: (.+) <(.+)>/,
);

@vikram-8290
Copy link

One thing I've identified is that the current implementation only captures a single match for the regex pattern /Co-authored-by: (.+) <(.+)>/, which can be problematic if there are multiple Co-authored-by entries in the commit message. Additionally, any slight variations in the format (like additional spaces or different capitalizations) could result in missed matches.

Proposed Fix:

Adjust the regular expression or use a more robust method to handle multiple Co-authored-by entries.
Implement a loop to capture all matches in the commit message rather than just the first occurrence.
Would you like me to implement this fix, or do you foresee any other potential issues to consider before moving forward?

@rithviknishad
Copy link
Member Author

The core issue is, it's parsing authors incorrectly as mentioned in #492 (comment)

Finding all co-authors, nice find, it can simply be done by using matchAll instead of match right? Why create a loop and all?

@vikram-8290
Copy link

Yes, you’re correct that using matchAll is a great way to capture all co-authors in the commit message. However, we still need a loop to process each match individually.

Here’s why the loop is necessary:

  1. Iterating Through Matches: matchAll returns an iterator of all matches, and we need to loop through these matches to extract and process each co-author’s information.

  2. Processing Logic: For each co-author, we need to check for conditions such as whether they are blacklisted or whether their names/emails are in our caches. The loop allows us to implement this logic for every individual match.

So, while matchAll helps us find all the matches, the loop is essential for handling the processing of each co-author found in the commit message.

If you're open to it, I can go ahead and implement this fix!

@vikram-8290
Copy link

I have created a pull request #519 . Please Check and let me Know if i have made some mistake.
Thanks

@Srayash
Copy link
Contributor

Srayash commented Nov 10, 2024

hey @rithviknishad! waiting for review on #541.

rithviknishad added a commit to Srayash/leaderboard that referenced this issue Dec 3, 2024
rithviknishad added a commit to Srayash/leaderboard that referenced this issue Dec 12, 2024
@Utkarsh-Anandani
Copy link

@rithviknishad If this issue is still valid, I would like to work on it

@rithviknishad
Copy link
Member Author

Comment how you are planning to solve this though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
4 participants