-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[github] Include commit comments in Github backend #518
[github] Include commit comments in Github backend #518
Conversation
Also, is it okay to include the |
@valeriocos @aswanipranjal, ping.. |
I get the same errors with mine, @harshalmittal4. Not completely sure why. |
I'll have a look at it ASAP, thanks for the reminder :) |
When perceval fetches data from remote sources (e.g., GitHub API), it stores the requests and the corresponding responses in a database/archive (see details here) in order to reuse this data for other executions. In case the combination of In the case of commits in PRs, it may happen that the same commit is present in more pull requests, for instance:
A possible solution to avoid the archive error is to use an internal cache (as similarly done for github users). @Polaris000 @harshalmittal4 at the moment this is the only solution that comes to my mind. However for repos with many commits in PRs, this solution may not be good. What do you think ? |
Hey @valeriocos, the error is resolved by using an internal cache, thanks a lot! Shall we go with it for now, since for users also the same thing is done. Other than that, may you just take a look at the PR structure, I mean this : The changes made:
I wished to understand if any information needs to be removed or added in commits_data. Thanks ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@harshalmittal4 the PR looks OK (I left just a minor comment). Please include the caching support and tests.
Thank you!
3f2346a
to
e4a72be
Compare
@valeriocos, the PR isn't ready to be merged, it needs a little work to pass the travis-build. will update it soon. |
great @harshalmittal4 let me know when it is ready to review, thanks! |
e4a72be
to
976968e
Compare
Hey @valeriocos, |
Thank you @harshalmittal4 , I'll check it later today |
976968e
to
e89f85d
Compare
Also, the PR is failing due to covergae decrease, so I needed some guidance what can I do about this or if this could be avoided, thanks. |
To check the coverage, you should use the coverage package and execute it as follows:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@harshalmittal4, overall the code looks OK, but the tests need some improvements.
It seems that writing tests that use httpretty is not easy at the beginning. If so, could you summarize the difficulties you found?
As suggested to @Polaris000 we could improve the documentation to explain how to write this kind of tests ( maybe a small section in the grimoirelab book @jgbarah ). What do you think ?
I would suggest to take a look at this review: #519 and this commit valeriocos@01314c9 (which includes the test data and corresponding modifications in test_github.py for that feature) to better understand how tests work.
I try to summarize how tests that rely on httpretty work using two examples: test_get_user
and test_fetch_pulls
.
- test_get_user:
This test checks that the methodclient.user
retrieves the info about a user via the GitHub endpoint/users/<login>
. Httpretty allows to mock http requests by defining the request url and the output of the response (GITHUB_USER_URL
andbody=login
inhttpretty.register_uri
), which is available here . Thus, two important considerations have to be done: 1) the data returned when executing the testtest_get_user
will be always the same, 2) the data returned can be whatever and may not exist in the real API, if you take a look at the content of the github_login file, you will see thatzhquan_example
doesn't exist on GitHub.
Things get a bit more complicated when a test requires to call more API endpoints, this is the case of test_fetch_pulls
, which mimics the behavior of the backend when fetching pull requests. To make httpretty works, all endpoint calls made by the backend should be mocked, otherwise a call will be issued against the real API, and since the repo zhquan_example/repo
and the corresponding data don't exist, some HTTP errors will pop up and make the test(s) stuck (this is specially true for user info, if you have a look at the test data you will see that most user info is the one of zhquan_example
(the mocked user)).
- test_fetch_pulls:
This test mocks the execution of Perceval when fetching a pull request. As you can see, several URIs are mocked (lines 268 - 340). Thus, in case the backend is fetching data from a new endpoint (reviews, commit comments, etc), a new URI must be added to mock the corresponding call and mocked data must be provided. It is worth noting that some calls may rely on values included in previous responses, for instance the method__get_pull_review_comments
reads the pull request number from the pull request obtained. In this case, the attributenumber
in the mocked data must be modified to make the tests work.
If what written above is clear and to answer your question here, you should add mocked data for commit comments and make sure that the values embedded in it are consistent with the information needed by the related API calls.
Thanks a lot @valeriocos. I will update the tests for the PR in 1-2 days and then get it reviewed. |
Hey @Polaris000 we can discuss regarding this shortly once we both have got an overview, thanks :) |
cf79e0a
to
961626e
Compare
Hey @valeriocos, I have updated the PR. Caching is included currently but I am not sure about it. I saw #519 for that and found that caching isn't required in case of pull_request reviews (since no 2 PR's would have the same endpoints for reviews), but we need caching in this case, since same commit can be present in 2 different PR's (also if I remove the current caching part from the code, it fails for some repos). The test data for Needed some thoughts regarding the caching implementation! |
no worries @harshalmittal4 , let me know when I can review the PR, thanks! :) |
The PR can be reviewed @valeriocos, but I needed help regarding the caching! |
961626e
to
8f88732
Compare
Hey @valeriocos, the PR is good now, was able to resolve the caching error when I had a look again today! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @harshalmittal4 for updating the PR. The code looks great, but there are some minor things to fix (see comments).
Please remember to update the github version number at: https://github.com/chaoss/grimoirelab-perceval/blob/master/perceval/backends/core/github.py#L88.
Thanks! :)
10b7594
to
8d7aa33
Compare
@valeriocos, I have updated the PR and had some minor things to ask, please look into them, thanks :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the quick reaction @harshalmittal4 (a couple of comments left)
I was looking for a repo with commit comments to make a live test, but I haven't found any so far. I tried to add a comment to the PR chaoss/grimoirelab-graal#26 in graal, but I didn't find how to do it (although github documentation says that is possible: https://developer.github.com/v3/guides/working-with-comments/). At the end I added two comments to graal (see below), which were correctly retrieved with your feature.
I'm a bit puzzled wrt commit comments as part of pull request reviews, since I wasn't able to add a comment to a PR commit, but the offical doc says that it's possible. So I would ask you to check it on your side to be sure that commit comments can be part of a pull request review. If this wasn't the case (due to some changes in the GitHub API), we can think about adding a new category to the github backend to retrieve commits and their comments (to discuss with @sduenas). What do you think ?
} | ||
path = urijoin("commits", commit_sha, "comments") | ||
|
||
items = [items for items in self.fetch_items(path, payload)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I overlooked this code in the previous review, sorry. why items[0]
is needed?
I'm making some tests (still looking for commit comments), but the code shouldn't be something as follows?
raw_comments = [items for items in self.fetch_items(path, payload)]
self._commit_comments[commit_sha] = raw_comments
return raw_comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @valeriocos
I tried it earlier and found that self.fetch_items(path, payload)
returns a generator object. Then the code for items in self.fetch_items(path, payload)
gets a string which is a list of commit comments. So items[0] is just a string which comprises of the list of commit_comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also was not able to find much for commit comments, so used this : octocat/Spoon-Knife#1176 for the example.
The required pull_request (1176) data is quickly retrieved using the commandperceval github octocat Spoon-Knife --from-date "2019-03-06T23:25:42Z" --category pull_request --sleep-for-rate -t XXXX >> testing.json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried it earlier and found that self.fetch_items(path, payload) returns a generator object. Then the code for items in self.fetch_items(path, payload) gets a string which is a list of commit comments. So items[0] is just a string which comprises of the list of commit_comments.
If I'm not wrong the method fetch_items
returns paginated items so items[0]
should return just the first page (I'm going to check to confirm it)
I also was not able to find much for commit comments, so used this : octocat/Spoon-Knife#1176 for the example...
Were you able to create one or more commit comments in a review ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried again to create a commit comment within a review but I didnt' find how. If we are not able to create one, it's possible that this feature is not supported anymore by GitHub (I'm really sorry about that @harshalmittal4). In this case, we should probably freeze the PR and contact GitHub, or see how we can reuse this code. What do you think @harshalmittal4 (any idea is welcomed)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @valeriocos, sorry for the delay, some views:
- I am not able to figure out currently why the commit comments box at the end is showing up in some PR's but not in others, Example :
Maybe github has discontinued this recently or it might be some other case that we are missing (since this is an example of a random PR that I saw from Jan 2019, and there also the option for commit comments is not present at the end as it should have been).
- I also wanted to ask regarding commenting on a specific line in the Files changed. Two options present : Add single line comment, and Start a review.
I suppose that when we select Add single line comment, the comment is counted as a commit comment. Whereas any comment that is made after selecting the option Start a review (and before submitting the review) is counted as a review.
If this is the case, this PR can fetch the comments made from Add single line comments option.
This code adds commit comments for pull_request commits to Github backend, and returns them in the pull_request response. Also adds the corresponding tests. Github backend version updated to "0.20.2". Signed-off-by: Harshal Mittal <[email protected]>
8d7aa33
to
ab28f26
Compare
@harshalmittal4 sorry for the late reply! |
Hey @valeriocos, |
Hey @valeriocos, |
I think I might just ask them regarding this on the Github API forum. |
@harshalmittal4, I think it would be a great idea to ask. Nothing to lose (I think)! |
Thank you for the feedback @harshalmittal4 . Yes, it would be good to ask (at least we know what's going on). A possible idea to use your code could be to add another category for the github backend to retrieve commits. What do you think @sduenas @jgbarah ? |
I just opened a topic regarding this feature in the Github API forum, lets see what comes up. |
Great! thank you @harshalmittal4 |
@valeriocos, I got a reply to the post here, lets wait and see what comes or I'll ping once again. Thanks :) |
Perfect! Thank you @harshalmittal4 for the heads-up |
Hey @valeriocos |
Thank you for the info @harshalmittal4 ! I added a comment in the thread you linked. We could add the code you wrote to support a new category (commits). However, Perceval already fetch git commits in a more efficient way using the git backend. What do you think @sduenas , should we freeze the PR or try to reuse this code for another category ? @jgbarah do you think it could be useful to have another category for the github backend to address some CHAOSS metrics ? |
See #515 with the reasons why I'm closing this PR. |
Addresses #515.
The changes made:
commits_data
field in pull_request response originally contained the PR commithash
, now it containscommit_comments
as well.commit_comments
has two information : comment body and user_data (for the comment author).Currently, the PR is not ready to be merged.
It is running into error
perceval.errors.ArchiveError: data storage error; cause: duplicated entry
,(the error can be reproduced using command
perceval github Cloud-CV Origami --json-line --category pull_request --sleep-for-rate -t XXXX>> testing.json
, for any other other repo also, same error is popping up).It would really be helpful if someone could have a look where am I missing!
Also, I wished to understand if any information needs to be removed or added in
commits_data
.I will add the tests to the PR once this is done, thanks a lot :)
@valeriocos @aswanipranjal @jgbarah