Add a tutorial for porting DL program generation to other languages #30

Open
50417 opened this issue Sep 12, 2018 · 13 comments
@50417

50417 commented Sep 12, 2018

In the paper, you mentioned that your RNN can easily be ported to any other programming language. We are trying to validate that claim. Could you provide a tutorial, if it is possible to use just the RNN model in the code base?

@ChrisCummins
Owner

Hi Sohil, sure! I was actually working on a doc file to do exactly that. Unfortunately, though, I am taking the next three months off my PhD and won't be working on it until I'm back. In the meantime, you're welcome to poke around in the code to do it yourself. The process is really quite straightforward:

  1. Create a corpus for a programming language using the code in //datasets/github/scrape_repos.
  2. Create a CLgen model to train on the new corpus (see this file for an example corpus).
  3. Train and sample the model, which is as simple as: $ blaze run //deeplearning/clgen -- --config=/path/to/the/config/file.

Let me know how you get on!

Cheers,
Chris

@ChrisCummins ChrisCummins self-assigned this Sep 12, 2018
@ChrisCummins ChrisCummins changed the title from "Can we just use the generator part ?" to "Add a tutorial for porting DL program generation to other languages" Sep 12, 2018
@50417
Author

50417 commented Sep 12, 2018

Thank you for the quick reply. I will try to experiment with the code tomorrow, and I will let you know if I have any questions or concerns.

@50417 50417 closed this as completed Sep 12, 2018
@ChrisCummins
Owner

No worries :) I'll actually keep this issue open as a reminder to myself, and in case anyone else wants something similar.

Cheers,
Chris

@ChrisCummins ChrisCummins reopened this Sep 12, 2018
@50417
Author

50417 commented Sep 21, 2018

Hello Chris,
I have been trying to run CLgen on macOS. After debugging for some time, I am still unable to get it to train on the test corpus. I get the following error:

clgen.py 176 ERROR invalid literal for int() with base 10: '' (ValueError)= stacktrace:
#1 /private/var/tmp/_bazel_sohilshrestha/f8ee67cead6a3e5516303f5e0dd3d4e7/sandbox/darwin-sandbox/20/execroot/phd/bazel-out/darwin-py3-opt/bin/deeplearning/clgen/clgen_test.runfiles/phd/deeplearning/clgen/corpuses/corpuses.py:112 __init__()
#2 /private/var/tmp/_bazel_sohilshrestha/f8ee67cead6a3e5516303f5e0dd3d4e7/sandbox/darwin-sandbox/20/execroot/phd/bazel-out/darwin-py3-opt/bin/deeplearning/clgen/clgen_test.runfiles/phd/deeplearning/clgen/models/models.py:66 __init__()
#3 /private/var/tmp/_bazel_sohilshrestha/f8ee67cead6a3e5516303f5e0dd3d4e7/sandbox/darwin-sandbox/20/execroot/phd/bazel-out/darwin-py3-opt/bin/deeplearning/clgen/clgen_test.runfiles/phd/deeplearning/clgen/clgen.py:100 __init__()
#4 /private/var/tmp/_bazel_sohilshrestha/f8ee67cead6a3e5516303f5e0dd3d4e7/sandbox/darwin-sandbox/20/execroot/phd/bazel-out/darwin-py3-opt/bin/deeplearning/clgen/clgen_test.runfiles/phd/deeplearning/clgen/clgen.py:244 DoFlagsAction()
#5 /private/var/tmp/_bazel_sohilshrestha/f8ee67cead6a3e5516303f5e0dd3d4e7/sandbox/darwin-sandbox/20/execroot/phd/bazel-out/darwin-py3-opt/bin/deeplearning/clgen/clgen_test.runfiles/phd/deeplearning/clgen/clgen.py:205 RunContext()

I ran the recommended tests for CLgen; 10 out of 21 passed. To run it on macOS, I created a virtualenv and ran the code there, using Python 3.6.5.
I also hit a Bazel problem that looks like https://github.com/tensorflow/tensorflow/issues/10436. The error was:

ERROR: /private/var/tmp/_bazel_sohilshrestha/f8ee67cead6a3e5516303f5e0dd3d4e7/external/base/image/BUILD:6:1: Couldn't build file external/base/image/002.tar.gz.nogz.sha256: SHA256 external/base/image/002.tar.gz.nogz.sha256 failed (Exit 1): sha256 failed: error executing command
  (cd /private/var/tmp/_bazel_sohilshrestha/f8ee67cead6a3e5516303f5e0dd3d4e7/execroot/phd && \
  exec env - \
  bazel-out/host/bin/external/bazel_tools/tools/build_defs/hash/sha256 bazel-out/darwin-opt/bin/external/base/image/002.tar.gz.nogz bazel-out/darwin-opt/bin/external/base/image/002.tar.gz.nogz.sha256)
Use --sandbox_debug to see verbose messages from the sandbox
Traceback (most recent call last):
  File "bazel-out/host/bin/external/bazel_tools/tools/build_defs/hash/sha256", line 203, in <module>
    Main()
  File "bazel-out/host/bin/external/bazel_tools/tools/build_defs/hash/sha256", line 176, in Main
    raise AssertionError('Could not find python binary: ' + PYTHON_BINARY)
AssertionError: Could not find python binary: python3.6

There are a few other errors as well:
`path = PosixPath('/var/folders/68/_gxs799d0930bmq_170px4fw0000gn/T/clgen_abc_corpus_ghp3hxfo')

def GetDirectoryMTime(path: pathlib.Path) -> int:
  """Get the timestamp of the most recently modified file/dir in directory.

  Recursively checks subdirectory contents. This requires that the directory
  exists and is not empty.

  Params:
    abspath: The absolute path to the directory.

  Returns:
    The seconds since epoch of the last modification.
  """
  # Pure python implementation.
  # return int(max(
  #     max(os.path.getmtime(os.path.join(root, file)) for file in files) for
  #     root, _, files in os.walk(path)))
  # Faster implementation using UNIX tools. Requires GNU xargs, which supports
  # the '-d' argument, which is needed to support file names with spaces. On
  # macOS, this means having the homebrew findutils package installed, and
  # the following directory in your PATH:
  #    /usr/local/opt/findutils/libexec/gnubin
  output = subprocess.check_output(
      f"find '{path}' -type f | xargs -d'\n' stat -c '%Y:%n' | sort -t: -n | "
      "tail -1 | cut -d: -f1", universal_newlines=True, shell=True)
  return int(output)
E     ValueError: invalid literal for int() with base 10: ''`

@ChrisCummins
Owner

Hi there, sorry, I’m writing this on my iPad so I can’t test the fix, but I think I see what the problem is. If you find the file which contains the function ‘def GetDirectoryMTime(’, you’ll see the comment ‘Pure python implementation’, followed by a commented-out return int(max(.... If you uncomment that return statement, it should fix the error.

@ChrisCummins
Owner

The problem is that I’ve hardcoded a reference to the GNU xargs command, and macOS ships with a BSD implementation. I’ll fix up the docs / code to work around this. Thanks for reporting the issue!
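
For anyone hitting the same error on macOS, here is a minimal sketch of a portable replacement that avoids shelling out to find/xargs/stat altogether. It is not the code in the repository: the name get_directory_mtime and the explicit check for an empty directory are assumptions made for illustration, and it only mirrors the behaviour described in the docstring quoted above.

```python
import os
import pathlib


def get_directory_mtime(path: pathlib.Path) -> int:
    """Return the mtime (seconds since epoch) of the newest file under path.

    Hypothetical stand-in for GetDirectoryMTime that uses only the standard
    library, so it does not depend on GNU xargs or GNU stat.
    """
    # Collect the modification time of every regular file, recursively.
    mtimes = [
        os.path.getmtime(os.path.join(root, name))
        for root, _, files in os.walk(path)
        for name in files
    ]
    # Fail with a clear message instead of int('') when nothing is found.
    if not mtimes:
        raise ValueError(f"No files found under '{path}'")
    return int(max(mtimes))
```

This will likely be slower than the UNIX pipeline on very large corpora, but it behaves the same on Linux and macOS and handles file names containing spaces.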

@JiajieZhang-Georgia

Hi Chris and @50417 ,
I tried to run the code for creating a corpus for a language. When I run the first command,
bazel run //datasets/github/scrape_repos/scraper --clone_list $PWD/clone_list.pbtxt
it gives me the error 'ERROR: Unrecognized option: --clone_list'.
Do you know how to solve it?

@ChrisCummins
Owner

ChrisCummins commented Nov 23, 2018

Hi @JiajieZhang-Georgia , woops I'm sorry, I missed a -- in the README. The command is:

bazel run //datasets/github/scrape_repos:scraper -- --clone_list $PWD/clone_list.pbtxt
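
The reason the -- matters is that bazel run parses everything before it as its own options and passes everything after it to the program being run. If you are scripting the scraper, a small helper along these lines keeps the separator from being forgotten. This is only a sketch: bazel_run_argv is a hypothetical name, and the clone list path is a placeholder.

```python
import shlex
from typing import List, Sequence


def bazel_run_argv(target: str, program_args: Sequence[str]) -> List[str]:
    """Build an argv for `bazel run`, forwarding program_args to the target.

    Arguments after the bare `--` are handed to the program rather than
    being parsed as Bazel options.
    """
    return ["bazel", "run", target, "--", *program_args]


# Placeholder path; substitute the location of your own clone list.
argv = bazel_run_argv(
    "//datasets/github/scrape_repos:scraper",
    ["--clone_list", "/path/to/clone_list.pbtxt"],
)
print(shlex.join(argv))
```

The resulting list can be passed straight to subprocess.run(argv, check=True).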

@50417
Author

50417 commented Feb 6, 2019

Hi @ChrisCummins,

Are there any updates on this?

Is it possible to port CLgen to other OS environments, such as Windows or other Linux distributions?

@ChrisCummins
Owner

Hey @50417, thanks for your patience! :-) I can see you've made good progress on adapting it to Simulink. If you're looking for specific help with your project I may be able to help out - I would also be interested in getting your work upstream. If you're interested in collaborating, shoot me an email at [email protected]

Cheers,
Chris

@50417
Author

50417 commented Mar 12, 2019

Hello everyone,
I have created a bare-minimum version of CLgen using basic Python scripts (without the need for Bazel) here. Let me know if there are any issues, and whether this issue can be closed.

@ChrisCummins
Owner

Interesting! What, in your experience, is the biggest issue for using this project that your fork overcomes?

@50417
Author

50417 commented Mar 14, 2019

The biggest issue was that I had to rebuild all of your projects in the phd project. Bazel also had a bit of a learning curve, it does not officially support Python, and the fact that it is still in beta was a problem whenever there were bugs.
