-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reorganization of Code Generation #6
Conversation
On my machine: Julia startup time:
With your patch
Without your patch
While the 25% gain is good, I think your patch will make it impossible for someone to refer to the source for types/member names/APIs. What are your thoughts? |
Also, I don't understand where the speedup is coming from... since in both cases all types and methods are being defined at startup. Is it just the .jl file parsing time? |
A simpler way to reduce load time may be to split EC2 into separate packages by functionality. S3 also goes into its own package. and all of them have a dependency on a core AWS package which only has basic types and the actual request execution code. This way the user can only include the functionality required. |
On my 64-bit machine, both times (with/without patch) is identical at 10.2 seconds (of which 1.7 seconds is Julia startup time). Also, sooner or later, we will have support for precompiled packages. |
The main thought here is not startup time (though it's a nice benefit), but added flexibility and simplicity in code generation. While I see your argument about referring to the code for API, I think having 11000 lines of almost identical code is a terrible idea. Another thing I want to point out is that people really shouldn't be using the raw EC2 API. We need to figure out the right abstraction and provide that API and if people really really want to, they can use this API, but in that case they should refer to the EC2 API Documentation. |
Since all 11000 lines are generated, I don't see it as a big issue. Macros just hide the process. Trust me, people will be using the raw EC2 API and tailoring it to their needs. Which is why I suggested people use test/ec2_utils.jl as a template and customize it for their needs. It is a little difficult to come up with a good abstraction at this stage. Maybe when we have a few more users. For example, lots of public datasets are available as ebs snapshots on ec2. We were discussing that folks would probably want to (programmatically), create a volume from the snapshot, make n copies of the volume, start n instances, mount volume on each instance, fire a parallel julia job, terminate the instances and release the copied volumes. This kind of stuff will be common enough that we can probably provide a few templates initially instead of a higher level api. While the EC2 API documentation will be the only real guide for this package, it is very useful to see the types and API while writing code. We can probably a) generate the types and apis only in a separate .jl file that is not included in the main package - it is just a reference file, never really loaded and available in a Anyways, send me a rebased pull request when you are ready and I'll test it out with the limited test script that I have. |
Have added you as a collaborator. You can create branches here itself. |
I believe there are two classes of users - those end users who want to offload compute to ec2 and those who want to do robust deployments on ec2. For the first class, we need a very simple interface that hides the complexity of ec2. The second class can use the current detailed interface. Since the underlying interface is auto generated, perhaps we should be focussing on the user facing simple interface. |
Btw, I disagree about providing templates. I would prefer to have an evolving api over templates. |
I agree with Viral about not having templates. |
OK. We can try building a higher level interface with test/ec2_utils.jl as a starting point. I would suggest however in keeping it very simple so that this interface can become an interface to multiple cloud platforms in the future - specifically OpenStack and Azure. |
Is this issue relevant anymore? |
This reorganizes code generation and moves part of it to metaprogramming on load time, which apart from improving load times (at least on my machines) also has the added benefit of allowing us to easily switch to a different organizational structure if dynamic generation of types proves unsuitable for this task (since this package alone has more types than all of Base). Anyway, I tried to make the result as close as possible to the original and will send functional changes in a different pull request.
This also dependes on my previous two pull requests. If those are merged before this, I will rebase this branch as appropriate.
WARNING: I'm currently hunting a memory bug in Julia triggered by this. After that this should be good to merge, but I figured I'd put it up before.