Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[github] Include commit comments in Github backend #518

Closed

Conversation

harshalmittal4
Copy link
Contributor

Addresses #515.
The changes made:

  • commits_data field in pull_request response originally contained the PR commit hash, now it contains commit_comments as well.

  • commit_comments has two information : comment body and user_data (for the comment author).

Currently, the PR is not ready to be merged.

It is running into error perceval.errors.ArchiveError: data storage error; cause: duplicated entry,
(the error can be reproduced using command perceval github Cloud-CV Origami --json-line --category pull_request --sleep-for-rate -t XXXX>> testing.json , for any other other repo also, same error is popping up).
It would really be helpful if someone could have a look where am I missing!

Also, I wished to understand if any information needs to be removed or added in commits_data.

I will add the tests to the PR once this is done, thanks a lot :)
@valeriocos @aswanipranjal @jgbarah

@harshalmittal4
Copy link
Contributor Author

Also, is it okay to include the commit_comments field inside commits_data, or shall it be added seperately to the pull_request response, please see

@harshalmittal4
Copy link
Contributor Author

harshalmittal4 commented Apr 24, 2019

@valeriocos @aswanipranjal, ping..
I just needed a little help to figure out what the problem is, it would be really great if you can just take a look, whenever you have time ;)

@Polaris000
Copy link
Contributor

I get the same errors with mine, @harshalmittal4. Not completely sure why.

@valeriocos
Copy link
Member

I'll have a look at it ASAP, thanks for the reminder :)

@valeriocos
Copy link
Member

When perceval fetches data from remote sources (e.g., GitHub API), it stores the requests and the corresponding responses in a database/archive (see details here) in order to reuse this data for other executions. In case the combination of url, payload, headers, response already exists in the archive, the following error is thrown: perceval.errors.ArchiveError: data storage error; cause: duplicated entry (ideally the same request shouldn't be issued more than once).

In the case of commits in PRs, it may happen that the same commit is present in more pull requests, for instance:

A possible solution to avoid the archive error is to use an internal cache (as similarly done for github users).

@Polaris000 @harshalmittal4 at the moment this is the only solution that comes to my mind. However for repos with many commits in PRs, this solution may not be good.

What do you think ?

@harshalmittal4
Copy link
Contributor Author

harshalmittal4 commented Apr 25, 2019

Hey @valeriocos, the error is resolved by using an internal cache, thanks a lot! Shall we go with it for now, since for users also the same thing is done.

Other than that, may you just take a look at the PR structure, I mean this :

The changes made:

  • commits_data field in pull_request response originally contained the PR commit hash, now it contains commit_comments as well.
  • commit_comments has two information : comment body and user_data (for the comment author).

I wished to understand if any information needs to be removed or added in commits_data.

Thanks ;)

Copy link
Member

@valeriocos valeriocos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harshalmittal4 the PR looks OK (I left just a minor comment). Please include the caching support and tests.

Thank you!

@harshalmittal4 harshalmittal4 force-pushed the Include_commit_comments branch from 3f2346a to e4a72be Compare April 25, 2019 13:36
@harshalmittal4
Copy link
Contributor Author

harshalmittal4 commented Apr 25, 2019

@valeriocos, the PR isn't ready to be merged, it needs a little work to pass the travis-build. will update it soon.
I have currently added the caching support and the tests.

@valeriocos
Copy link
Member

great @harshalmittal4 let me know when it is ready to review, thanks!

@harshalmittal4 harshalmittal4 force-pushed the Include_commit_comments branch from e4a72be to 976968e Compare April 28, 2019 07:21
@harshalmittal4
Copy link
Contributor Author

Hey @valeriocos,
I have made the updates, it can be reviewed and let me know for possible improvements :)

@coveralls
Copy link

coveralls commented Apr 28, 2019

Coverage Status

Coverage increased (+0.008%) to 97.063% when pulling ab28f26 on harshalmittal4:Include_commit_comments into 366834f on chaoss:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.06%) to 96.991% when pulling 976968e on harshalmittal4:Include_commit_comments into 41c9086 on chaoss:master.

@valeriocos
Copy link
Member

Thank you @harshalmittal4 , I'll check it later today

@harshalmittal4 harshalmittal4 force-pushed the Include_commit_comments branch from 976968e to e89f85d Compare April 28, 2019 09:19
@harshalmittal4
Copy link
Contributor Author

Also, the PR is failing due to covergae decrease, so I needed some guidance what can I do about this or if this could be avoided, thanks.

@valeriocos
Copy link
Member

To check the coverage, you should use the coverage package and execute it as follows:

  • coverage run run_tests.py (this will create the file .coverage, used to generate the report).
  • coverage report -m (this will generate a report and show the lines not covered and percentage per file).

Copy link
Member

@valeriocos valeriocos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harshalmittal4, overall the code looks OK, but the tests need some improvements.
It seems that writing tests that use httpretty is not easy at the beginning. If so, could you summarize the difficulties you found?

As suggested to @Polaris000 we could improve the documentation to explain how to write this kind of tests ( maybe a small section in the grimoirelab book @jgbarah ). What do you think ?

I would suggest to take a look at this review: #519 and this commit valeriocos@01314c9 (which includes the test data and corresponding modifications in test_github.py for that feature) to better understand how tests work.

I try to summarize how tests that rely on httpretty work using two examples: test_get_user and test_fetch_pulls.

  • test_get_user:
    This test checks that the method client.user retrieves the info about a user via the GitHub endpoint /users/<login>. Httpretty allows to mock http requests by defining the request url and the output of the response (GITHUB_USER_URL and body=login in httpretty.register_uri), which is available here . Thus, two important considerations have to be done: 1) the data returned when executing the test test_get_user will be always the same, 2) the data returned can be whatever and may not exist in the real API, if you take a look at the content of the github_login file, you will see that zhquan_example doesn't exist on GitHub.

Things get a bit more complicated when a test requires to call more API endpoints, this is the case of test_fetch_pulls, which mimics the behavior of the backend when fetching pull requests. To make httpretty works, all endpoint calls made by the backend should be mocked, otherwise a call will be issued against the real API, and since the repo zhquan_example/repo and the corresponding data don't exist, some HTTP errors will pop up and make the test(s) stuck (this is specially true for user info, if you have a look at the test data you will see that most user info is the one of zhquan_example (the mocked user)).

  • test_fetch_pulls:
    This test mocks the execution of Perceval when fetching a pull request. As you can see, several URIs are mocked (lines 268 - 340). Thus, in case the backend is fetching data from a new endpoint (reviews, commit comments, etc), a new URI must be added to mock the corresponding call and mocked data must be provided. It is worth noting that some calls may rely on values included in previous responses, for instance the method __get_pull_review_comments reads the pull request number from the pull request obtained. In this case, the attribute number in the mocked data must be modified to make the tests work.

If what written above is clear and to answer your question here, you should add mocked data for commit comments and make sure that the values embedded in it are consistent with the information needed by the related API calls.

@harshalmittal4
Copy link
Contributor Author

Thanks a lot @valeriocos. I will update the tests for the PR in 1-2 days and then get it reviewed.
Also the documentation for the tests seems really nice idea to me as it would make things quick and easier, and I would be happy working on it!

@harshalmittal4
Copy link
Contributor Author

harshalmittal4 commented Apr 28, 2019

Hey @Polaris000 we can discuss regarding this shortly once we both have got an overview, thanks :)

@harshalmittal4 harshalmittal4 force-pushed the Include_commit_comments branch 2 times, most recently from cf79e0a to 961626e Compare May 1, 2019 11:53
@harshalmittal4
Copy link
Contributor Author

Hey @valeriocos, I have updated the PR.
Still the caching isn't very clear.

Caching is included currently but I am not sure about it. I saw #519 for that and found that caching isn't required in case of pull_request reviews (since no 2 PR's would have the same endpoints for reviews), but we need caching in this case, since same commit can be present in 2 different PR's (also if I remove the current caching part from the code, it fails for some repos).

The test data for github_request_pull_request_1_commits and github_request_pull_request_2_commits used the same commit, which gave error (due to the caching that is used currently), so I needed to modify the commit_sha for pull_request_2.

Needed some thoughts regarding the caching implementation!
(Also sorry for the delay, this wouldn't be the case after my college exams are over by 4th May ).

@valeriocos
Copy link
Member

no worries @harshalmittal4 , let me know when I can review the PR, thanks! :)

@harshalmittal4
Copy link
Contributor Author

harshalmittal4 commented May 1, 2019

The PR can be reviewed @valeriocos, but I needed help regarding the caching!
The code is okay but I think its not very correct, so please have a look when you have time, thanks.

@harshalmittal4 harshalmittal4 force-pushed the Include_commit_comments branch from 961626e to 8f88732 Compare May 1, 2019 14:55
@harshalmittal4
Copy link
Contributor Author

harshalmittal4 commented May 2, 2019

Hey @valeriocos, the PR is good now, was able to resolve the caching error when I had a look again today!
Please have a look when you have time, sorry for bothering earlier :)
(I would squash the commits once the review is done).

Copy link
Member

@valeriocos valeriocos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @harshalmittal4 for updating the PR. The code looks great, but there are some minor things to fix (see comments).

Please remember to update the github version number at: https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/backends/core/github.py#L88.

Thanks! :)

@harshalmittal4 harshalmittal4 force-pushed the Include_commit_comments branch from 10b7594 to 8d7aa33 Compare May 2, 2019 14:11
@harshalmittal4
Copy link
Contributor Author

@valeriocos, I have updated the PR and had some minor things to ask, please look into them, thanks :)

Copy link
Member

@valeriocos valeriocos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the quick reaction @harshalmittal4 (a couple of comments left)

I was looking for a repo with commit comments to make a live test, but I haven't found any so far. I tried to add a comment to the PR chaoss/grimoirelab-graal#26 in graal, but I didn't find how to do it (although github documentation says that is possible: https://developer.github.com/v3/guides/working-with-comments/). At the end I added two comments to graal (see below), which were correctly retrieved with your feature.

I'm a bit puzzled wrt commit comments as part of pull request reviews, since I wasn't able to add a comment to a PR commit, but the offical doc says that it's possible. So I would ask you to check it on your side to be sure that commit comments can be part of a pull request review. If this wasn't the case (due to some changes in the GitHub API), we can think about adding a new category to the github backend to retrieve commits and their comments (to discuss with @sduenas). What do you think ?

}
path = urijoin("commits", commit_sha, "comments")

items = [items for items in self.fetch_items(path, payload)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I overlooked this code in the previous review, sorry. why items[0] is needed?

I'm making some tests (still looking for commit comments), but the code shouldn't be something as follows?

raw_comments = [items for items in self.fetch_items(path, payload)]
self._commit_comments[commit_sha] = raw_comments
return raw_comments

Copy link
Contributor Author

@harshalmittal4 harshalmittal4 May 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @valeriocos
I tried it earlier and found that self.fetch_items(path, payload) returns a generator object. Then the code for items in self.fetch_items(path, payload) gets a string which is a list of commit comments. So items[0] is just a string which comprises of the list of commit_comments.

Copy link
Contributor Author

@harshalmittal4 harshalmittal4 May 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also was not able to find much for commit comments, so used this : octocat/Spoon-Knife#1176 for the example.
The required pull_request (1176) data is quickly retrieved using the commandperceval github octocat Spoon-Knife --from-date "2019-03-06T23:25:42Z" --category pull_request --sleep-for-rate -t XXXX >> testing.json

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it earlier and found that self.fetch_items(path, payload) returns a generator object. Then the code for items in self.fetch_items(path, payload) gets a string which is a list of commit comments. So items[0] is just a string which comprises of the list of commit_comments.

If I'm not wrong the method fetch_items returns paginated items so items[0] should return just the first page (I'm going to check to confirm it)

I also was not able to find much for commit comments, so used this : octocat/Spoon-Knife#1176 for the example...

Were you able to create one or more commit comments in a review ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried again to create a commit comment within a review but I didnt' find how. If we are not able to create one, it's possible that this feature is not supported anymore by GitHub (I'm really sorry about that @harshalmittal4). In this case, we should probably freeze the PR and contact GitHub, or see how we can reuse this code. What do you think @harshalmittal4 (any idea is welcomed)?

Copy link
Contributor Author

@harshalmittal4 harshalmittal4 May 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @valeriocos, sorry for the delay, some views:

  1. I am not able to figure out currently why the commit comments box at the end is showing up in some PR's but not in others, Example :
    image

Maybe github has discontinued this recently or it might be some other case that we are missing (since this is an example of a random PR that I saw from Jan 2019, and there also the option for commit comments is not present at the end as it should have been).

  1. I also wanted to ask regarding commenting on a specific line in the Files changed. Two options present : Add single line comment, and Start a review.
    I suppose that when we select Add single line comment, the comment is counted as a commit comment. Whereas any comment that is made after selecting the option Start a review (and before submitting the review) is counted as a review.
    If this is the case, this PR can fetch the comments made from Add single line comments option.

This code adds commit comments for pull_request commits
to Github backend, and returns them in the pull_request
response. Also adds the corresponding tests.
Github backend version updated to "0.20.2".

Signed-off-by: Harshal Mittal <[email protected]>
@harshalmittal4 harshalmittal4 force-pushed the Include_commit_comments branch from 8d7aa33 to ab28f26 Compare May 4, 2019 11:10
@valeriocos
Copy link
Member

@harshalmittal4 sorry for the late reply!
WRT 1., would you like to contact GitHub and ask for clarifications ? I can do it if you want
WRT 2., it seems that in both cases these comments appear as review comments

@harshalmittal4
Copy link
Contributor Author

harshalmittal4 commented May 5, 2019

Hey @valeriocos,
w.r.t 1, I am OK with whatever you feel, no hard inclination to either side. Not having any problem if the PR is freezed or closed if no solution pops up and going with what everyone else feels.
w.r.t 2, I will just update you after checking again! Thanks!

@harshalmittal4
Copy link
Contributor Author

harshalmittal4 commented May 5, 2019

Hey @valeriocos,
For the repo octocat/Spoon-Knife#1176, the retrieved commit comments are present here, and they dont contain the line comment that I made. Also I am not able to figure out how the comments which are present for the three commits were made if I go to the PR..so probably github might have removed support for such type of comments anymore.
(We can do whatever you feel would be good w.r.t to this PR since commit comments are no longer seen, just wondering if it is likely that commit comments will be enabled again by github!)

@harshalmittal4
Copy link
Contributor Author

I think I might just ask them regarding this on the Github API forum.

@Polaris000
Copy link
Contributor

@harshalmittal4, I think it would be a great idea to ask. Nothing to lose (I think)!

@valeriocos
Copy link
Member

Thank you for the feedback @harshalmittal4 . Yes, it would be good to ask (at least we know what's going on).

A possible idea to use your code could be to add another category for the github backend to retrieve commits. What do you think @sduenas @jgbarah ?

@harshalmittal4
Copy link
Contributor Author

I just opened a topic regarding this feature in the Github API forum, lets see what comes up.

@valeriocos
Copy link
Member

Great! thank you @harshalmittal4

@harshalmittal4
Copy link
Contributor Author

@valeriocos, I got a reply to the post here, lets wait and see what comes or I'll ping once again. Thanks :)

@valeriocos
Copy link
Member

Perfect! Thank you @harshalmittal4 for the heads-up

@harshalmittal4
Copy link
Contributor Author

Hey @valeriocos
As per the last reply obtained here, commit comments are there but only as part of individual commits. They are not part of the commits present in a pull request. Let me know what you think about this!

@valeriocos
Copy link
Member

Thank you for the info @harshalmittal4 ! I added a comment in the thread you linked.

We could add the code you wrote to support a new category (commits). However, Perceval already fetch git commits in a more efficient way using the git backend.

What do you think @sduenas , should we freeze the PR or try to reuse this code for another category ?

@jgbarah do you think it could be useful to have another category for the github backend to address some CHAOSS metrics ?

@sduenas
Copy link
Member

sduenas commented May 9, 2019

See #515 with the reasons why I'm closing this PR.

@sduenas sduenas closed this May 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants