-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add yielding line number to {String,IO}#each_line #7275
Add yielding line number to {String,IO}#each_line #7275
Conversation
c2c1a53
to
76e77ef
Compare
76e77ef
to
eb6aa57
Compare
# haiku = "the first cold shower | ||
# even the monkey seems to want | ||
# a little coat of straw" | ||
# haiku.each_line do |stanza, lino| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# haiku.each_line do |stanza, lino| | |
# haiku.each_line do |stanza, line_number| |
There's also still some lino
left in the specs and in other examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, conventially this shortcut is written as lineno
which I'd suggest using instead of lino
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
index
is also an option. Let's please use fully-written names.
What happens if you have a file with more than Int32::MAX lines? Or connect a TCP socket to a server that will eventually stream more than this number of lines? I guess overflow exception, whatever if we actually use the line_number variable. |
Having unified |
I played a bit with implementing a unified Integer type, it's super slow, or at least like 1000 times slower than primitives. Maybe it could be further optimized, I don't know, but I think it's very unlikely we'll have something like that unless someone starts to really play with the idea and finds a way to make it fast (the gist is a type that holds either an int or a big int under the hood) |
you can always use |
@RX14 It makes it less likely, but can still happen, if I create a connection and write messages for some weeks. I can specify a u128, but then I'm not saying it's a bad idea to pass the iteration line number, but... aren't we repeating the |
We could also make this a separate method Another option is to simply count on the offending counter being optimized out by LLVM when the line number block arg is not used 😏 |
I don't think overflowing is a problem. I'd probably use the line number when iterating a file's lines, and file lines are super unlikely to overflow. That said, iterating and using an extra local variable for this not-that-common use case is fine too. Otherwise, I'd name it |
Yes, when using files you're probably safe, but @ysbaddaden's example of a long running server connection doesn't sound too far-fetched. And it's definitely a use case to count the number of received lines. |
@ysbaddaden if you managed to overflow a 64-bit signed integer of lines, you've received more than 2^63 bytes, or 9.22 exabytes. |
Adding |
eb6aa57
to
caa0466
Compare
Separated into |
Then I don't see the point. Is this really used commonly enough to justify a new method? At least |
I know I have used this quite a few times, that's why I'd welcome it as a helpful addition. If that's enough to justify adding another method (or actually three), I don't know. |
This method accompanies String#each_line, IO#each_line and File.each_line and yields a second argument to the block containing the line number. It is implemented as a separate method instead of an optional block argument to each_line to avoid potential (although unlikely) overflow when reading large amounts of lines from an IO.
caa0466
to
c415fc6
Compare
I have extracted the uncontested parts of this PR to #7419. |
No need to add another method. I wanted to emphasize that maybe it's not a good idea to introduce a counter in a loop function, which we did once and eventually reverted. If we expect |
Is it that complicate to create our own counter? This extends the API to do a simple thing, I don't see the need. Having the control of the counter, one can decide to be Int32 or even UInt64 to reduce the chance of overflow, depending of the use case. |
This works already: File.open("some_file").each_line.with_index(1) do |line, idx|
puts "#{idx}: #{line}"
end |
@sudo-nice Using an iterator is less efficient than @ysbaddaden I agree that adding a separate method is not a great idea. Either we integrate line numbers into How about using a wrapping counter? This way it won't raise but simply overflow. I figure this would be fine because when using line numbers makes any sense, they're gonna be far smaller than |
A wrapping on overflow counter would kill the whole point of having integer overflow checks in the first place. IMO the iterator is the best solution: no need for yet-another-method, no need for always having a counter. |
We have many variants of |
I have to agree here that the usecases are too few to really warrant the extra API. Using the iterator or a closured var is simple enough. Also there hasn't been any interest in this anymore for over a year, so the pain can't be big. I vote to close this :) |
When iterating through lines you often need a line number. This PR adds them as a second (optional) argument to the block provided to
String#each_line
,IO#each_line
andFile.each_line
.In the updated API docs I removed unrelated features
upcase
andreverse
from the examples.Alternatively, it could be a separate method called
#each_line_with_number
but I think the proposed option is better.A second commit in this PR refactors related specs in
io_spec.cr
which I noticed could be simplified while adding the specs for yielding a line number.