-
-
Notifications
You must be signed in to change notification settings - Fork 193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add --input-dir and --output-dir as options #119
Conversation
cd4b04f
to
3608037
Compare
process.go
Outdated
|
||
// == Rendering ==================================================== | ||
|
||
func renderTemplate(g *Gomplate, inString string, outPath string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd rather this function live in main.go
. This file seems like a pretty good place to keep all of the file I/O functions, but this one doesn't really fit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, will move it back :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the only concern I have is a circular deps on the files then: main -> process -> main (or maybe I think here a bit to Javaesque). From a purity pov it should go in a dedicated render.go but thats then probably overkill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking again about it, imo renderTemplate in process.go is maybe not so a bad fit as it is the final step in 'process' (so process could io + render).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a circular deps on the files
not really an issue... to the Go compiler the files are almost irrelevant - everything in the directory is all part of the same flat namespace. Definitely a Java-esque thing - which Go isn't 😉
process.go
Outdated
} | ||
|
||
for n, input := range inputs { | ||
if err := renderTemplate(g, input, outputs[n]); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be ideal if this were done in parallel, using goroutines - I think it should be pretty straightforward...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably makes sense cpu-wise, but I wonder whether io-wise it reasonable to write in parallel to disk. My gut feeling is that a serial approach might be not worse here
But without a benchmark its hard to judge.
Should I still change this ? Also, this branch is only entered when using multiple --file options (not with the new --input-dir). So its not that many probably anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, rendering templates aren't always pure disk IO - there's often network IO that blocks for far longer. So running these in parallel could be a big win.
I didn't realize this wasn't called with --input-dir
- I haven't had a chance to look too closely yet 😉. I'd definitely rather all separate files be rendered in parallel, whether or not they come from --file
or --input-dir
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imo renderTemplate in process.go is maybe not so a bad fit as it is the final step in 'process' (so process could io + render)
Figuring out which files to read and reading them is IMO a separate concern from the render step. Admittedly, main.go
has a lot of mixed concerns, but if a new file is created I'd rather it be fairly single-responsibility
process.go
Outdated
func ensureDir(dir string, mode os.FileMode) error { | ||
_, err := os.Stat(dir) | ||
if err != nil { | ||
if os.IsNotExist(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this check is unnecessary - os.MkdirAll
will be a no-op if the path already exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the heads up
test/integration/envvars.bats
Outdated
@@ -11,7 +11,7 @@ load helper | |||
|
|||
@test "errors with non-existant env var using .Env" { | |||
gomplate -i '{{.Env.FOO}}' | |||
[ "$status" -eq 2 ] | |||
[ "$status" -eq 1 ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking this is because RunTemplate
's signature has changed... I'd rather it stay the same to not break compatibility.
main.go
Outdated
@@ -26,16 +27,13 @@ type Gomplate struct { | |||
} | |||
|
|||
// RunTemplate - | |||
func (g *Gomplate) RunTemplate(text string, out io.Writer) { | |||
func (g *Gomplate) RunTemplate(text string, out io.Writer) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is public API, so the signature can't change. Please remove the error
. panic
ing isn't the best way to error out, but it's not the end of the world, and I'd much rather keep API compatibility. I know of at least one user who bundles gomplate into a different tool and calls RunTemplate
directly, so this would break them.
README.md
Outdated
@@ -194,6 +194,18 @@ By default, `gomplate` will read from `Stdin` and write to `Stdout`. This behavi | |||
|
|||
You can specify multiple `--file` and `--out` arguments. The same number of each much be given. This allows `gomplate` to process multiple templates _slightly_ faster than invoking `gomplate` multiple times in a row. | |||
|
|||
##### `--input-dir` and `--output-dir` | |||
|
|||
For processing multiple templates in a directory you can use `--input-dir` and `--output-dir` together. In this case all files in input directory will be processed as templates and the resulting files stored in `--output-dir` (which must exist). The directory structure will be preserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(which must exist)
I don't think that's true, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See here where it is asserted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, commented about that in-line in process.go
@@ -0,0 +1,2 @@ | |||
one: eins | |||
two: zwei |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing EOL here, no big deal though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
main.go
Outdated
|
||
func validateInOutOptions(c *cli.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is totally necessary, I really don't like that this has to exist... I've been thinking of migrating to github.com/jawher/mow.cli
for a while and maybe it's time to do that. It can handle option exclusion pretty well, so this validation would be unnecessary. Maybe I'll work on migrating to mow.cli
in the next little while.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, let me know when and from where (branch) to rebase when you added it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I might need to take this back. I started working on it in #122 but it seems to have trouble with taking -
as an argument for --file
or --out
. I think it maybe makes sense to keep this code in for now and not block this PR on the other one.
Thanks for this, @rhuss - and sorry for the rough review so far - I want to make sure this feature is implemented well. I hope you're not taking any of my nitpicks and criticisms personally - I really appreciate the time you've put into this so far! 😄 Thinking about this further, I think I'd like I think I'm also going to migrate the commandline flag processing to |
process.go
Outdated
func processInputDir(c *cli.Context, g *Gomplate) error { | ||
inputDir := c.String("input-dir") | ||
outDir := c.String("output-dir") | ||
if _, err := assertDir(outDir); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hairyhenderson here is the assertion that the outDir
must exist. I'm not sure if should we require this, but I think its common practice to require the top-level directory to exists. Otherwise typos in the usage can easily slip through leading to unexpected results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm... 🤔 I think I'd rather create outDir
if it doesn't exist. gomplate
is usually used in scripts and typos would be caught pretty quickly, by virtue of the files not existing in the right location. Besides, --out
will create the output if it doesn't exist, so it makes sense (to me anyway) that --out-dir
would create the directory too.
Also it makes the code simpler, which is a good thing 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
ok, I can implement it that way. |
Just added parallel file processing (both for the --file and the --input-dir case). I also did a small stupid benchmark with 1000 files a 40k each and 1000 template instructions (on an 16 GB Mac with 8 cores) For the serial implementation I get: time bin/gomplate --input-dir test/t/in --output-dir test/t/out --datasource config=test/t/config.yml
322.71s user 66.78s system 152% cpu 4:14.92 total For the parallel implementation:
So roughly a factor two for the total time (although more overhead for io and context switching I guess, there are 10k go routines created in this sample). Not so bad. |
If not given, the default is the current working directory Relates to PR hairyhenderson#119
Addressed all review comments so far. wrt. to the parallel processing: I had to introduce some locking around the cache access when processing the Datasource. Please review this as I have quite some general experience with threading but not with shared memory when using goroutines. |
… success Thanks to the awesome IT suite for triggering this Relates to hairyhenderson#119
all green ;-) like the integration test suite, need to add some for --input-dir, too ... |
Interesting! I think it would be pretty unusual to be processing directories so large, but it's 👍 anyway. |
data.go
Outdated
} | ||
cached, ok := d.cache[cacheKey] | ||
if ok { | ||
func (d *Data) ReadSource(fs vfs.Filesystem, source *Source, args ...string) (map[string]interface{}, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function is public API - can you find a way to accomplish this without changing the signature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it since it was necessary to add the creation of the JSON and YAML documents in the mutex protected section when doing parallel processing. Also, I think it makes sense to include the (potentially costly) deserialized objects into the cache so that this conversion needs to be called only once (and not for every call to the datasource). I don't consider this optimisation as premature since the call to the datasource can happen quite often also when doing only a single call.
I suggest adding a second method ReadSourceAsObjects
(could be private, too) which is used by DataSource and leave an (unoptimised) ReadSource
with the original signature. Then we could decide whether to increase the cache to hold both kinds of objects (with the cost of increasing memory usage) or we make ReadSource
to not use a cache.
wdyt ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think it makes sense to include the (potentially costly) deserialized objects into the cache so that this conversion needs to be called only once (and not for every call to the datasource). I don't consider this optimisation as premature since the call to the datasource can happen quite often also when doing only a single call.
If you're using the same datasource multiple times in a template, it makes sense to assign its output to a variable:
{{ $config := datasource "config" }}
hostname: {{ $config.hostname }}
port: {{ $config.port }}
...
So datasources should almost never be called multiple times in a template. It's non-obvious if someone is new to gomplate and go templates in general, but maybe that could be mitigated with documentation for now... Ultimately it would be nice to have some sort of subcommand that could lint the template, but that's a long way off 😉
Ok, just had a deeper look at the code, and I was a little confused about the cache lock at first, but now I understand why it's necessary. I think I'm going to change my mind (sorry!) - let's not parallelize rendering. At this point it feels like premature optimization and I'm not sure I'm comfortable with the extra complexity in the code that it introduces... |
@hairyhenderson what about the following suggestion: We keep the current login without synchronising but add an option I think it should be possible to keep the current API and clearly separate those functions without much duplication. Please let me know whether I should give it a try. Otherwise, I revert the code again (but let me know whether you want the cache optimisation mentioned above) |
Let's just revert the code for now, including the cache changes. I'd rather keep this PR smaller and focused solely on |
If not given, the default is the current working directory Relates to PR hairyhenderson#119
… success Thanks to the awesome IT suite for triggering this Relates to hairyhenderson#119
Reverting according to review hairyhenderson#119 (comment)
Reworked again and moved the optional parallel processing to #123 which is based on this PR. Think it still makes sense, and since it was quite a bit of work and testing I didn't want to simply remove it. |
let me know whether i should do a git squash (so weed out the threading meander on this PR) |
Yes please! |
} | ||
return "." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could just be accomplished by defaulting to .
in the StringFlag
, though that'd change the logic for validateInOutOptions
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that, but this doesn't work in the validation logic, which checks for whether the outputDir is set. if it has a default value, then it is always set. You then also can't distinguish between people setting "." explicitely (and so switching on the ferature) or whether it comes from the default.
Dont think its so bad here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You then also can't distinguish between people setting "." explicitely (and so switching on the ferature) or whether it comes from the default.
Not really sure that this matters - the feature should be dependent on --input-dir
, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you then error on
- when
--output-dir="."
and--out="parsed-file.txt"
is provided - when
--output-dir="."
is provided and no--input-dir
is given
Of course I can remove these checks (or make them apply only when output-dir is not "." but that inconsistent imo).
Again, is it really that bad here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm - good points. There would be ways to apply this, but I see now that the logic would probably be uglier - let's just leave this as-is then 🙂
} | ||
|
||
// read directory | ||
entries, err := ioutil.ReadDir(input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still want this put above the MkdirAll
call. If something goes wrong reading it, I don't want the output dir to be left behind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, if we want to make this really atomar, then we should also check the whole traversal to have not partially filled (and remember which directories we created and which were already there).
But I can move that up (which however makes it a bit harder to refactor of course as this method is already close to the cyclomatic complexity limit of 10)
Please be sure whether you really want this up (don't want to make many turns yet)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved it.
process.go
Outdated
if err != nil { | ||
return nil, err | ||
} | ||
if !si.IsDir() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't necessary - ioutil.ReadDir
will error if given a non-directory. If you remove this block, this function can disappear and you can just call os.Stat
directly on ln39
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
process.go
Outdated
return os.OpenFile(filename, os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0644) | ||
} | ||
|
||
func checkClose(c io.Closer, err *error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too clear what the point of this function is... Why not simply defer file.Close()
? Nothing will be done with err
so it doesn't seem too useful to assign it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not me, its the gometalinter which croaks if the error is not checked ;-).
Well, if *error
is nil then it's set with the error from the Close()
so it makes a little sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just add // nolint: errcheck
on the line to shut gometalinter
up.
Well, if *error is nil then it's set with the error from the Close() so it makes a little sense.
Right, but after it's set, nothing actually reads that error... 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually you are right, but the calling method could initialize err
with nil and then always return err
(which in case of a Close() error would be filled before returning, otherwise it stays nil). That was missing.
But I agree that its an academic question, will remove that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks 🙂
squashed it |
Ok! Thanks for your patience with all this - I have just one more request, then it'll be ready to go 😉 - can you write some integration tests for this? Just add some tests to |
All filese from --input-dir will be processed as templates and stored with the same directory hierachy in --ouput-dir - Use both options when a whole directory hierarchy needs to be processed. - Extracted file processing logic in an extra process.go - --output-dir is optional and default to "." - --output-dir is created automatically if not existing Fixes hairyhenderson#117 Signed-off-by: Roland Huss <[email protected]>
test added |
@rhuss thank you! This is ready to merge now - thanks again for your patience with this 🙂 |
Partial solution to #117
Bear with me, Golang is not my mother tongue (which was in fact Perl at those days, but is Java & JavaScript since some times). Looking forward to a review ;-)