Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add read_file macro method #6967

Merged
merged 11 commits into from
Nov 8, 2018
Merged

Conversation

Sija
Copy link
Contributor

@Sija Sija commented Oct 20, 2018

#2791, again! ;)

/cc @asterite

@Sija Sija mentioned this pull request Oct 20, 2018
@bcardiff
Copy link
Member

I don't think the discussion will differ. Better to find a way to embed slices.

That been said. I care more about API. So, if the macro would return a slice, even if the string is embedded as a first implementation, that is something I could live with. Until a better implementation without breaking the API will come.

@sdogruyol
Copy link
Member

I've wanted to have this for a long time. Let's KISS and start with something easy to iterate on like this 👍

2 similar comments
@sdogruyol
Copy link
Member

I've wanted to have this for a long time. Let's KISS and start with something easy to iterate on like this 👍

@sdogruyol
Copy link
Member

I've wanted to have this for a long time. Let's KISS and start with something easy to iterate on like this 👍

src/compiler/crystal/macros/methods.cr Outdated Show resolved Hide resolved
@asterite
Copy link
Member

Right now a string can contain any kind of bytes, so I think this is fine. If someone wants to embed a slice of bytes they can just do:

DATA = {{ read_file("whatever") }}.to_slice

@ysbaddaden
Copy link
Contributor

I'd like a kwarg, to have a String (default) or Bytes (on demand), but maybe it would be confusing, and always having a String is better?

{{ read_file("LICENSE.md") }}             # => String
{{ read_file("logo.png", binary: true) }} # => Bytes

@jhass
Copy link
Member

jhass commented Oct 22, 2018

I don't see a strong reason to have it for string, worst case it should be a String.new({{read_file...) away, but if we want it I'd vote for having read_file to return Bytes and read_file_to_string or perhaps read_text_file to return String.

@asterite
Copy link
Member

The problem is that read_file is a macro method, and, like all macro methods, it returns an AST node. There's a StringLiteral node to represent a string. There's no node to represent a slice of bytes. So if you want that you'll have to:

  • introduce such node
  • define a syntax to write it (maybe Bytes[...] would be enough)
  • provide operations on it

In my mind it's just simpler to make read_file return a string. I don't see any problem with it, and you can manipulate the data at compile-time just fine.

@jhass
Copy link
Member

jhass commented Oct 22, 2018

The proper implementation should do something like #2791 (comment), so generate an AST subtree equivalent to the inline assembly call.

As Brian points out, introducing it now to return a string literal would just force a breaking API change in the future.

On a compromise, maybe we can do a text only version for now and call it read_text_file or the like?

@asterite
Copy link
Member

If it returns an asm node there's no way to manipulate the data at compile-time. I still don't understand what's the problem with having it be a StringLiteral, could someone explain this to me? :-)

@jhass
Copy link
Member

jhass commented Oct 22, 2018

@asterite

Until a better implementation without breaking the API will come.

As Brian points out, introducing it now to return a string literal would just force a breaking API change in the future.

I'm not sure how I can present the argument any more clear than that, sorry.

@asterite
Copy link
Member

But read_file returning StringLiteral is perfect and there won't be a need to change it's API. I mean, we are assuming it needs to be changed because it has something wrong with it. I want to know what that is.

@jhass
Copy link
Member

jhass commented Oct 22, 2018

Maybe it's the old discussion about what String should be, generic data blob or representation of sequence of codepoints in some encoding and me strongly favoring the latter as I see doing the former one of the biggest mistakes of Ruby. Also if we would go for the former we should remove Slice.

@jhass
Copy link
Member

jhass commented Oct 22, 2018

Also if we keep the concerns separate, that is read some text vs just give me the bytes, we can make the former pleasant to use at some point by applying any necessary encoding transformations.

@asterite
Copy link
Member

Then I guess a conclusion won't be reached here. I propose to close this then until someone figures out how to lay out everything related to strings and bytes.

@straight-shoota
Copy link
Member

straight-shoota commented Oct 22, 2018

@jhass String.new({{ read_file }}) would copy the slice to the new String instance.

Having a binary argument as @ysbaddaden suggested seems like a good idea. If false, it returns a StringLiteral. If binary: true, it could just append a call to #to_slice to the StringLiteral. And later, when there is a proper bytes literal, it could just be upgraded to that without any changes in the API.

@bcardiff
Copy link
Member

  • I'm ok with read_text_file.
  • I'm ok with read_file to expand to a Call(StringLiteral, :to_slice) and whatever is needed to simulate a slice literal.
  • I'm ok with not having read_file now, but adding read_text_file now
  • I'm not ok with changing the returned type based on an argument. Even in macros. So no for binary: true.

So, if the macro name is changed and the review of @asterite regarding path is solved, this is GTG on my side.

@RX14
Copy link
Contributor

RX14 commented Oct 23, 2018

There's clearly two different things wanted here:

  • a way to read files inside a macro and then use the contents of the file inside the macro to make decisions on the macro level. This may expose the contents of a file at runtime but doesn't have to. The most reasonable way to do this is returning a StringLiteral. Think VERSION file.
  • a way to efficiently embed large, possibly binary, files inside the executable to make exevutables more self-contained. Think compiled image and JS assets for a webapp.

These are orthogonal and want different interfaces. The former can be read_file and would be a macro method. The latter can be achieved using module-level ASM to embed a file in the ELF directly using assembler directives. I'd call it embed_file and it'd be a macro which expanded to a global_asm block and a Slice.new call. (and macros are not macro methods)

@jhass
Copy link
Member

jhass commented Oct 23, 2018

Good point, thank you. I still think we should be cautious with introducing the first case while the second doesn't exist, because during that period people will abuse it for the second case. So I stick to preferring read_text_file as a name, I don't think it harms the first case in any way and communicates its contract much more clearly.

@straight-shoota
Copy link
Member

@RX14 The embedded data could be binary but also text. For text, it should simple to get a String
at runtime without copying the slice.

@RX14
Copy link
Contributor

RX14 commented Oct 23, 2018

@straight-shoota you could just use read_text_file for that.

@straight-shoota
Copy link
Member

Yeah, but it feels a bit weird to have read_text_file but embed_file for the same thing with a binary file.

@jhass
Copy link
Member

jhass commented Oct 24, 2018

Yeah, but it feels a bit weird to have read_text_file but embed_file for the same thing with a binary file.

That's a discussion we can have once we do embed_file, let's not derail this further than necessary.

@RX14
Copy link
Contributor

RX14 commented Oct 25, 2018

Yeah, but it feels a bit weird to have read_text_file but embed_file for the same thing with a binary file.

But it's not the same thing, conceptually or in implementation. That was the whole point of my comment above.

@straight-shoota
Copy link
Member

Both fits in the second bullet point. If I want to embed a file into the binary, I can use embed_file for binary data. But if it's text data and it's supposed to be accessible as String at runtime, I need to use read_text_file instead.

@asterite
Copy link
Member

Any other language that lets you embed data using asm? It seems like a huge portability pain to do that.

@RX14
Copy link
Contributor

RX14 commented Oct 25, 2018

@asterite any language with asm which uses a real assembler lets you do it.

Doing file embedding with global_asm just means that we pass all the work over to LLVM, and don't have to add any code in the crystal compiler to do this. The asm isn't platform-specific because the ASM block consists only of assembler directives, no instructions.

The implementation doesn't matter anyway.

@bcardiff
Copy link
Member

bcardiff commented Nov 7, 2018

NB: We should wait for 1 more core team re review before merging. And it should be merged with a squash.

@straight-shoota
Copy link
Member

@Sija could squash it and add a nice commit message =)

@bcardiff bcardiff merged commit 73f514e into crystal-lang:master Nov 8, 2018
@bcardiff bcardiff added this to the 0.27.1 milestone Nov 8, 2018
@bcardiff
Copy link
Member

bcardiff commented Nov 8, 2018

Merged :-)
The nice commit message was the title :-P

Thanks everybody for the discussion!

@Sija
Copy link
Contributor Author

Sija commented Nov 14, 2018

It occured to me that perhaps it would be sensible to have nilable read_file? (current read_file) and raising read_file variants, whatcha think?

@bcardiff
Copy link
Member

@Sija you mean that read_file will cause a compile time failure if the file does not exist, right?
I think that make sense I would not even add a read_file?.

@Sija
Copy link
Contributor Author

Sija commented Nov 14, 2018

@bcardiff yep, exactly. Although I would (add read_file? variant), since I can easily imagine use case where you'd like to read file with certain fallback which wouldn't be possible when using raising variant.

@Sija
Copy link
Contributor Author

Sija commented Nov 14, 2018

@asterite true, yet it simplifies things in case like this:

{% version = read_file("../../VERSION") %}
{{ version ? version.chomp : nil }}

vs

{{ read_file("../../VERSION").chomp }}

@straight-shoota
Copy link
Member

{{ read_file("../../VERSION") || raise "Missing VERSION file" }}

@Sija
Copy link
Contributor Author

Sija commented Nov 14, 2018

@straight-shoota nope, it ain't work since I'd need to use .chomp. Having #try would solve it but sadly, it doesn't exist in macros.

@straight-shoota
Copy link
Member

{{ (read_file("../../VERSION") || raise "Missing VERSION file").chomp }}

@woodruffw
Copy link
Contributor

woodruffw commented Nov 15, 2018

👍 For a version that errors at compile time if the file doesn't exist (or I/O fails in any way). The second || raise workaround suffices, but having the error as a property of the macro itself is DRYer and feels like a natural extension of an I/O operation at compile time.

@Sija
Copy link
Contributor Author

Sija commented Nov 15, 2018

I agree with @woodruffw, @straight-shoota's workaround is good enough (though it still "hides" the root cause of not returning the file contents - which might be for instance permissions problem, not only missing file), yet having it "baked" into the method itself feels more natural (as long as there's nilable variant too).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants