Add scancli option to CoLic Backend #27

Closed
inishchith opened this issue May 8, 2019 · 15 comments

Comments

@inishchith
Contributor

Add support for scancli, a faster version of scancode, to the CoLic backend.

@valeriocos Please let me know if I can work on this.
Thanks

@valeriocos
Member

Sure @inishchith , thanks !
You can find useful info at the following urls:

A possible implementation could add a boolean param cli here: https://github.com/chaoss/grimoirelab-graal/blob/master/graal/backends/core/analyzers/scancode.py#L41. The method analyze could then be modified to call one of two private methods, analyze_scancode or analyze_scancode_cli, depending on the value of cli. The former would contain the code of the current analyze method, and the latter some code similar to this one.

The code of the colic backend probably shouldn't change too much (just adding new categories and the related code).
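
The suggestion above could be sketched roughly as follows. This is only an illustration, not the code in graal: the class name `ScanCodeSketch`, the `runner` parameter, and the keyword names `file_path` / `file_paths` are hypothetical, and the scancode flags (`--json-pp -`, `--license`) are an assumption about a typical license scan. The scancli command shape (`python3 <scancli.py> <files...>`) follows the command visible in the error log later in this thread.

```python
import subprocess


class ScanCodeSketch:
    """Hypothetical analyzer that dispatches on a boolean `cli` flag."""

    def __init__(self, exec_path, cli=False, runner=subprocess.check_output):
        self.exec_path = exec_path
        self.cli = cli
        # `runner` is an assumption added for illustration, so that the
        # subprocess call can be swapped out in tests.
        self.runner = runner

    def analyze(self, **kwargs):
        """Pick the scancode or scancli code path based on `cli`."""
        if self.cli:
            return self._analyze_scancode_cli(kwargs['file_paths'])
        return self._analyze_scancode(kwargs['file_path'])

    def _analyze_scancode(self, file_path):
        # One scancode process for a single file (assumed flags).
        cmd = [self.exec_path, '--json-pp', '-', '--license', file_path]
        return self.runner(cmd).decode('utf-8')

    def _analyze_scancode_cli(self, file_paths):
        # One scancli process that receives all the files at once.
        cmd = ['python3', self.exec_path] + list(file_paths)
        return self.runner(cmd).decode('utf-8')
```

Parsing of the tool output is left out on purpose; the sketch only shows where the two code paths would split.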

What do you think ?

@inishchith
Contributor Author

@valeriocos Thanks for the supporting links and insights on how to go about the task.
I'll start working on the task and open a PR once done, then we can have further discussion over there.

Thanks :)

@inishchith
Contributor Author

@valeriocos Can you share which scancode release, or what setup, you used to run scancli successfully?
I read the discussion on aboutcode-org/scancode-toolkit#1400 but couldn't reproduce the results, as I ran into multiple errors, so I thought I'd ask before moving forward.

Thanks

@valeriocos
Member

valeriocos commented May 12, 2019

Sorry for the late reply @inishchith

In the virtual env used by graal, I installed simplejson and execnet as reported here: aboutcode-org/scancode-toolkit@8afa686#diff-f826f8c8f6f35f368b2a692610f05d62R18
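
A quick way to confirm that both packages are visible from the virtual env is sketched below. The helper name `missing_modules` is hypothetical; it only relies on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # simplejson and execnet are the two packages mentioned above.
    missing = missing_modules(["simplejson", "execnet"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```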

Then I used the following branch: https://github.com/valeriocos/grimoirelab-graal/tree/test-scancli/graal, and launched the backend in the following way:

colic
https://github.com/chaoss/grimoirelab-toolkit
--git-path
/tmp/xyzw
--exec-path
/home/scancode-toolkit/scancode (v3.0.0 downloaded from here: https://github.com/nexB/scancode-toolkit/releases)
--category
code_license_scancode
--json

Note that you have to modify the method metadata to include the param filtered_classified.

Tomorrow I can push a better version of the code of my branch.

Hope it helps :)

@inishchith
Contributor Author

inishchith commented May 12, 2019

@valeriocos Thanks for sharing the information.

  • These changes were introduced on 5th March 2019, whereas Scancode-toolkit v3.0.0 was released on 15th Feb 2019 (i.e. before the changes were made). Hence scancli.py doesn't exist in that release.

Please correct me if I'm wrong or have missed something. Thanks

@valeriocos
Member

valeriocos commented May 12, 2019

Sorry @inishchith I made a mistake. It wasn't version 3.0.0, but the checkout at aboutcode-org/scancode-toolkit@8afa686 (as reported here: aboutcode-org/scancode-toolkit#1400 (comment)). The code was then merged in the develop branch (as reported here: aboutcode-org/scancode-toolkit#1400 (comment)).

If you clone the repo and use the current develop branch, the backend should work (https://github.com/nexB/scancode-toolkit/tree/develop).

Let me know if you have any problem, thanks :)

@inishchith
Contributor Author

@valeriocos Sorry for the delayed response.

I tried reproducing the results using your setup information and the test-scancli branch of your fork, but I couldn't. I suspect the implementation has changed since then. I've shared the error log below. Please let me know if you've encountered it before or if I have missed something. Thanks :)

  • Error log
[2019-05-13 16:50:03,438] - Starting the quest for the Graal.
[2019-05-13 16:50:10,816] - Git worktree /tmp/worktrees/tmp2 created!
[2019-05-13 16:50:10,817] - Fetching commits: 'https://github.com/chaoss/grimoirelab-toolkit' git repository from 1970-01-01 00:00:00+00:00 to 2100-01-01 00:00:00+00:00; all branches
[2019-05-13 16:50:12,460] - Git repository tmp2 checked out!
Traceback (most recent call last):
  File "/Users/Nishchith/scancode-toolkit/etc/scripts/scancli.py", line 72, in <module>
    for s in scan(args):
  File "/Users/Nishchith/scancode-toolkit/etc/scripts/scancli.py", line 63, in scan
    results = channel.receive()
  File "/usr/local/lib/python3.6/site-packages/execnet/gateway_base.py", line 728, in receive
    raise self._getremoteerror() or EOFError()
execnet.gateway_base.RemoteError: Traceback (most recent call last):
  File "<string>", line 1063, in executetask
  File "<string>", line 1, in do_exec
  File "<remote exec>", line 53, in <module>
  File "<remote exec>", line 44, in run_scan
  File "/Users/Nishchith/scancode-toolkit/src/scancode/cli.py", line 864, in run_scan
    quiet=quiet, verbose=verbose, kwargs=kwargs, echo_func=echo_func,
  File "/Users/Nishchith/scancode-toolkit/src/scancode/cli.py", line 1054, in run_scanners
    with_timing=timing, progress_manager=progress_manager)
  File "/Users/Nishchith/scancode-toolkit/src/scancode/cli.py", line 1145, in scan_codebase
    location, rid, scan_errors, scan_time, scan_result, scan_timings = scans.next()
AttributeError: 'list' object has no attribute 'next'

[2019-05-13 16:52:39,704] - Analysis failed at 9dc821962567715e5358b1192e1b15d8868d2b6c
Traceback (most recent call last):
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/backends/core/analyzers/scancode.py", line 62, in analyze
    msg = subprocess.check_output(cmd_scancli).decode("utf-8")
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['python3', '/Users/Nishchith/scancode-toolkit/etc/scripts/scancli.py', '/tmp/worktrees/tmp2/.gitignore', '/tmp/worktrees/tmp2/AUTHORS', '/tmp/worktrees/tmp2/LICENSE', '/tmp/worktrees/tmp2/grimoirelab/__init__.py', '/tmp/worktrees/tmp2/grimoirelab/toolkit/__init__.py', '/tmp/worktrees/tmp2/grimoirelab/toolkit/_version.py', '/tmp/worktrees/tmp2/setup.cfg']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/Nishchith/GitHub/grimoirelab-perceval/perceval/backend.py", line 472, in run
    for item in items:
  File "/Users/Nishchith/GitHub/grimoirelab-perceval/perceval/backend.py", line 589, in fetch
    raise e
  File "/Users/Nishchith/GitHub/grimoirelab-perceval/perceval/backend.py", line 583, in fetch
    for item in items:
  File "/Users/Nishchith/GitHub/grimoirelab-perceval/perceval/backend.py", line 162, in fetch
    for item in self.fetch_items(category, **kwargs):
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/graal.py", line 183, in fetch_items
    raise e
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/graal.py", line 176, in fetch_items
    commit['analysis'] = self._analyze(commit)
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/backends/core/colic.py", line 161, in _analyze
    analysis = self.analyzer.analyze(local_paths)
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/backends/core/colic.py", line 204, in analyze
    analysis = self.analyzer.analyze(**kwargs)
  File "/Users/Nishchith/GitHub/grimoirelab-graal/graal/backends/core/analyzers/scancode.py", line 65, in analyze
    e.output.decode("utf-8")))
graal.graal.GraalError: Scancode failed at /tmp/worktrees/tmp2/.gitignore /tmp/worktrees/tmp2/AUTHORS /tmp/worktrees/tmp2/LICENSE /tmp/worktrees/tmp2/grimoirelab/__init__.py /tmp/worktrees/tmp2/grimoirelab/toolkit/__init__.py /tmp/worktrees/tmp2/grimoirelab/toolkit/_version.py /tmp/worktrees/tmp2/setup.cfg, 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/graal", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/Users/Nishchith/GitHub/grimoirelab-graal/bin/graal", line 125, in <module>
    main()
  File "/Users/Nishchith/GitHub/grimoirelab-graal/bin/graal", line 71, in main
    cmd.run()
  File "/Users/Nishchith/GitHub/grimoirelab-perceval/perceval/backend.py", line 482, in run
    raise RuntimeError(str(e))
RuntimeError: Scancode failed at /tmp/worktrees/tmp2/.gitignore /tmp/worktrees/tmp2/AUTHORS /tmp/worktrees/tmp2/LICENSE /tmp/worktrees/tmp2/grimoirelab/__init__.py /tmp/worktrees/tmp2/grimoirelab/toolkit/__init__.py /tmp/worktrees/tmp2/grimoirelab/toolkit/_version.py /tmp/worktrees/tmp2/setup.cfg, 

@valeriocos
Member

valeriocos commented May 13, 2019

No worries @inishchith :)

I have uploaded a branch with some improvements in the code. I can confirm what you reported: the errors you posted appear when using the develop or master branches of the original repo. However, if you perform the following steps and run the same code, no errors pop up:

git clone https://github.com/nexB/scancode-toolkit
cd scancode-toolkit
git checkout -b xxx 8afa686fb71b9540029234e5a40c0572c4457c28
colic
https://github.com/chaoss/grimoirelab-toolkit
--git-path
/tmp/cdefgh
--exec-path
/home/graal-libs/scancode-toolkit/etc/scripts/scancli.py <-- the repo just downloaded
--category
code_license_scancode_cli
--json

I'll keep investigating and let you know about any progress.

@inishchith
Contributor Author

@valeriocos Thanks for checking the issue out. After checking out that commit, I could reproduce the results 👍

Also, I checked out your implementation of scancode_cli here.
I noticed that you're passing all the files at once as arguments instead of passing them individually, as the existing convention does. Does passing them all at once improve performance?
I haven't had time to test both approaches thoroughly, hence the question :)

@valeriocos
Member

Great @inishchith !

Also I checked out your implementation of ....

Yes, this is one of the features of scancli (check the comment here: aboutcode-org/scancode-toolkit#1400 (comment), and the following one).

If you test scancode and scancli against https://github.com/chaoss/grimoirelab-toolkit you should see the difference.
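
A rough way to compare the two strategies is sketched below: one process per file versus a single process that receives every path. The helper names are hypothetical, the scancode flags (`--json-pp -`, `--license`) are an assumption, and the batched command shape follows the scancli invocation shown in the error log above. No timings are claimed here; the numbers depend on the repository and the machine.

```python
import subprocess
import time


def per_file_commands(scancode_path, file_paths):
    """One scancode process per file (assumed flags)."""
    return [[scancode_path, '--json-pp', '-', '--license', path]
            for path in file_paths]


def batched_commands(scancli_path, file_paths):
    """A single scancli process that gets all the paths as arguments."""
    return [['python3', scancli_path] + list(file_paths)]


def time_strategy(commands, runner=subprocess.check_output):
    """Run every command in order and return the elapsed seconds."""
    start = time.perf_counter()
    for cmd in commands:
        runner(cmd)
    return time.perf_counter() - start
```

Calling `time_strategy(per_file_commands(...))` and `time_strategy(batched_commands(...))` on the same list of files should make the difference visible.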

@inishchith
Contributor Author

@valeriocos Thanks for answering.
As my university exams are under way, I'll work on this when time permits.
I'll probably test scancode and scancli tomorrow to check the difference, and then continue the work that is currently staged.

Sorry for the delayed response.

@valeriocos
Member

No worries @inishchith , I have just opened a PR (#28) with some code to use scancli.

Feel free to work on that PR or create a new one.

@inishchith
Contributor Author

@valeriocos Sure.
I checked out #28. The work that I've done so far seems similar.
Still, I'll open a PR in some time so that we can work on adding tests for it too.

Thanks

@inishchith
Contributor Author

@valeriocos I think we can close this. What do you think?

@valeriocos
Member

Sure @inishchith , feel free to close it.
