Reposcanner Design Changes #5
Looks good to me. My central design idea is to make an "execute" function that takes a repo name and a YAML "view" and renders it into a repo-specific directory. There's a "hidden" input in there: the information the view has available to render with. That's the data model. My example had to do an extra "controller" call to create a contributor list, and implicitly took the GitHub repo object as its "model". I think each render template should be paired 1:1 with the controller code that calls it. On the implementation of "stateful analysis data objects", I think simpler is better. A repo name and a controller function would be a minimal state as far as I can tell. The controller code can be explicitly annotated with the "data model" and "view" it uses for provenance, but I hope those are fairly statically associated with their controller.
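Roughly the shape I have in mind, as a sketch (all names are illustrative, not the actual Reposcanner API; I'm using stdlib `string.Template` as a stand-in for whatever renderer we pick):

```python
import os
from string import Template

# Controllers are paired 1:1 with the template they render.
# Each controller builds the "data model" its view needs.
def contributor_controller(repo):
    """Hypothetical controller: extracts a contributor list from a repo object."""
    return {"name": repo["name"], "contributors": ", ".join(repo["contributors"])}

# A "view": a render template plus the controller that feeds it.
CONTRIBUTOR_VIEW = {
    "template": Template("Contributors for $name: $contributors\n"),
    "controller": contributor_controller,
}

def execute(repo, view, data_dir="data"):
    """Render a view for one repo into a repo-specific directory; return the output path."""
    model = view["controller"](repo)  # the "hidden" input: the data model
    out_dir = os.path.join(data_dir, repo["name"])
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "contributors.txt")
    with open(path, "w") as f:
        f.write(view["template"].substitute(model))
    return path
```

The minimal state here really is just the repo name plus the controller; everything else flows through the call.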
Cool! Yeah, I just want to make sure I'm building up this codebase with that use case in mind.
Normally, what I'd do would be a "clean architecture" style, where I have a request object that gets passed in and a response object that gets returned. Normally this would be used to create a strict wall of separation between the user interface and the analysis code. However, I'm thinking that I want to retain data from one analysis to the next, which would mean having stateful containers for all the results. If you pass in multiple repos and have Reposcanner perform a set of routines on each, I imagine that you'd want all that data bundled together so your visualization routine can just pull out the pieces of data it needs across all the executions. But that all depends on how you want to work with the data downstream. I'm fine with whatever works for you.
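To make the request/response idea concrete, here's a minimal sketch (class and field names are hypothetical, not what will necessarily land in the codebase):

```python
from dataclasses import dataclass, field

@dataclass
class AnalysisRequest:
    """Hypothetical request object: everything a routine needs, nothing more."""
    repository_url: str
    options: dict = field(default_factory=dict)

@dataclass
class AnalysisResponse:
    """Hypothetical response object returned across the UI/analysis boundary."""
    repository_url: str
    success: bool
    data: dict = field(default_factory=dict)

@dataclass
class ResultsContainer:
    """Stateful container that accumulates responses across repositories,
    so downstream visualization can pull what it needs from one place."""
    responses: list = field(default_factory=list)

    def add(self, response: AnalysisResponse):
        self.responses.append(response)

    def collect(self, key):
        """Gather one field of result data across every analyzed repository."""
        return {r.repository_url: r.data.get(key) for r in self.responses}
```

The container is the only stateful piece; requests and responses stay immutable value objects at the boundary.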
I don't know what the user interface looks like, but I do want to keep views as a per-repo render template, with a "data directory" as the location to store all the rendered info. Then an entirely separate code could analyze the data directory.
@frobnitzem For the record, I am putting serious effort today towards making all these changes, and I'll let you know how things go. |
@frobnitzem, I made a couple updates yesterday that I'll mention here. All these changes are guarded by unit tests now, which will help with stability.
Now that I've laid the groundwork I needed, I can complete step 2 ("Making routines reusable"). I'll swap out the GitHub-specific code in …
CC'ing @frobnitzem since we discussed some of this.
Unless there are any serious objections, I have a couple changes I'll be making to Reposcanner to facilitate the kinds of analyses we plan to do.
Adding support for loading repo lists via YAML files. Reposcanner will support passing a single repository or a set of repositories. Inputs will now flow through stateful analysis data objects that can hold onto credentials, lists of repositories, the set of routines to be performed, etc. This will allow us to handle any number of repositories in a uniform way.
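As a sketch of what the YAML input could look like (the format and function name here are hypothetical; I'm assuming PyYAML for parsing):

```python
import yaml  # PyYAML, assumed as the parsing dependency

# Hypothetical input format: either a single repository string
# or a list of repositories, normalized to one uniform list.
EXAMPLE_YAML = """
repositories:
  - https://github.com/owner/repo-a
  - https://github.com/owner/repo-b
"""

def load_repositories(text):
    """Accept a single repository or a set of repositories uniformly."""
    document = yaml.safe_load(text)
    repos = document["repositories"]
    if isinstance(repos, str):
        repos = [repos]
    return repos
```

Credentials and the routine list could ride along in the same document and be handed to the stateful data objects in one pass.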
Making routines reusable. Right now we pass repository information to a routine via a constructor, and we'd have to create a new routine object for every repository we wanted to analyze. I'll be reworking these so that they're reusable interactors that are passed the details they need when execute() is called.
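The reusable-interactor shape I'm picturing looks something like this (class and method names are illustrative, and the GitHub call is stubbed out):

```python
class ContributorRoutine:
    """Hypothetical reusable routine: repository details arrive at
    execute() time rather than through the constructor, so one
    instance can serve any number of repositories."""

    def canHandleRequest(self, request):
        return "repository_url" in request

    def execute(self, request):
        # A real implementation would query the GitHub API here;
        # this sketch just echoes the repository it was asked about.
        return {"repository": request["repository_url"], "status": "ok"}

# One routine object, many repositories.
routine = ContributorRoutine()
results = [routine.execute({"repository_url": url})
           for url in ("https://github.com/owner/a", "https://github.com/owner/b")]
```

The win is that the driver loop owns the iteration over repositories, and routines stay stateless between calls.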
Removing the render step from the routine workflow. Especially if we plan on rendering graphs of many different repositories and combining different data sources, there's not much benefit to generating graphs for each and every step of the process. Rendering can be moved out to the end of Reposcanner's execution.
Creating a one-step solution for provenance and data curation. If we generate data for multiple repositories, we need to generate a "receipt" that covers the time of execution, the version of Reposcanner used, the repositories analyzed, the routines involved, and the files generated. This is easy to do with the data objects that I intend to add.
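The receipt itself could be as simple as a dict serialized to JSON at the end of a run (field names and the version constant here are placeholders, not the final format):

```python
import datetime
import json

REPOSCANNER_VERSION = "0.0.1"  # placeholder; the real value would come from the package

def make_receipt(repositories, routines, files_generated):
    """Bundle provenance info for one Reposcanner run into a single receipt."""
    return {
        "executed_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "reposcanner_version": REPOSCANNER_VERSION,
        "repositories": list(repositories),
        "routines": list(routines),
        "files_generated": list(files_generated),
    }

receipt = make_receipt(
    repositories=["https://github.com/owner/repo"],
    routines=["ContributorRoutine"],
    files_generated=["data/repo/contributors.csv"],
)
print(json.dumps(receipt, indent=2))
```

Since the stateful data objects already know the repositories, routines, and output files, they can emit this receipt without any extra bookkeeping.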
Tests! Everything needs to be tested.