-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PATCH] Replace File::Slurp with Path::Tiny in perlfaq5 #13836
Comments
From [email protected]Apologies if this has already been discussed. I don't subscribe to p5p This patch replaces the example/recommendation of using File::Slurp The reasoning behind this is an outstanding critical bug with File::Slurp: https://rt.cpan.org/Public/Bug/Display.html?id=83126 This is a bug report for perl from mcgrath.martin@gmail.com, Inline Patchdiff --git a/cpan/perlfaq/lib/perlfaq5.pod b/cpan/perlfaq/lib/perlfaq5.pod
index a2baf16..25d494f 100644
--- a/cpan/perlfaq/lib/perlfaq5.pod
+++ b/cpan/perlfaq/lib/perlfaq5.pod
@@ -200,7 +200,7 @@ you can fit the whole thing in memory!):
print $out $content;
-Modules such as L<File::Slurp> and L<Tie::File> can help with that
+Modules such as L<Path::Tiny> and L<Tie::File> can help with that
too. If you can, however, avoid reading the entire file at once. Perl
won't give that memory back to the operating system until the process
finishes.
@@ -1133,13 +1133,13 @@ C<$DB_RECNO> bindings, which allow you to tie
-If you want to load the entire file, you can use the L<File::Slurp> - use File::Slurp; - my $all_of_it = read_file($filename); # entire file in scalar Or you can read the entire file contents into a scalar like this: --------------1.8.3.2-- Flags: Site configuration information for perl 5.19.12: Configured by marto at Sat May 10 12:15:37 BST 2014. Summary of my perl5 (revision 5 version 19 subversion 12) configuration: Locally applied patches: @INC for perl 5.19.12: Environment for perl 5.19.12: |
From [email protected]0001-Replace-File-Slurp-with-Path-Tiny-in-perlfaq5.patchFrom: Martin McGrath <[email protected]>
Date: Sat, 10 May 2014 12:47:51 +0100
Subject: [PATCH] Replace File::Slurp with Path::Tiny in perlfaq5
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------1.8.3.2"
This is a multi-part message in MIME format.
--------------1.8.3.2
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit
---
cpan/perlfaq/lib/perlfaq5.pod | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
--------------1.8.3.2
Content-Type: text/x-patch; name="0001-Replace-File-Slurp-with-Path-Tiny-in-perlfaq5.patch"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="0001-Replace-File-Slurp-with-Path-Tiny-in-perlfaq5.patch"
diff --git a/cpan/perlfaq/lib/perlfaq5.pod b/cpan/perlfaq/lib/perlfaq5.pod
index a2baf16..25d494f 100644
--- a/cpan/perlfaq/lib/perlfaq5.pod
+++ b/cpan/perlfaq/lib/perlfaq5.pod
@@ -200,7 +200,7 @@ you can fit the whole thing in memory!):
print $out $content;
-Modules such as L<File::Slurp> and L<Tie::File> can help with that
+Modules such as L<Path::Tiny> and L<Tie::File> can help with that
too. If you can, however, avoid reading the entire file at once. Perl
won't give that memory back to the operating system until the process
finishes.
@@ -1133,13 +1133,13 @@ C<$DB_RECNO> bindings, which allow you to tie an array to a file so that
accessing an element of the array actually accesses the corresponding
line in the file.
-If you want to load the entire file, you can use the L<File::Slurp>
+If you want to load the entire file, you can use the L<Path::Tiny>
module to do it in one simple and efficient step:
- use File::Slurp;
+ use Path::Tiny;
- my $all_of_it = read_file($filename); # entire file in scalar
- my @all_lines = read_file($filename); # one line per element
+ my $all_of_it = path($filename)->slurp; # entire file in scalar
+ my @all_lines = path($filename)->lines; # one line per element
Or you can read the entire file contents into a scalar like this:
--------------1.8.3.2--
|
From @jhiOn Wednesday-201405-14, 5:11, Martin McGrath (via RT) wrote:
In addition to File::Slurp / perlfaq problem, shouldn't this:
be classified as a serious issue in sysread? Though, I think, one could well argue that trying to read anything else
|
The RT System itself - Status changed from 'new' to 'open' |
From @LeontOn Wed, May 14, 2014 at 4:14 PM, Jarkko Hietaniemi <jhi@iki.fi> wrote:
Yes. This is essentially a core bug that's leaking out because File::Slurp Leon |
From @epaInteresting. Have you suggested this as a Perl::Critic policy too? -- |
From @jhiOn Wednesday-201405-14, 10:23, Leon Timmermans wrote:
Probably not for 5.20, harhar... but certainly we could try purposefully |
From @jhi
I'll file a bug on this.
|
From @perhunterOn 05/14/2014 10:23 AM, Leon Timmermans wrote:
and how would you go about fixing it in file::slurp? i am not a thanx, uri -- |
From @LeontOn Wed, May 14, 2014 at 5:14 PM, Uri Guttman <uri@stemsystems.com> wrote:
File::Slurp is nothing more than a workaround for a performance bug in Leon |
From @jhiOn Wednesday-201405-14, 11:30, Leon Timmermans wrote:
FWIW re the UTF-8 validity/strictness: I think we're well within our RFC 3629 states "Implementations of the decoding algorithm MUST protect |
From @xdgOn Wed, May 14, 2014 at 12:34 PM, Jarkko Hietaniemi <jhi@iki.fi> wrote:
It's worth considering two separate issues: * invalid UTF-8 encoding As I understand it, :utf8 is lax in both respects. I'd be very -- |
From @druud62On 2014-05-14 17:30, Leon Timmermans wrote:
Not sure if this is still true for the latest Perls, but you better my or it will use twice the memory. -- |
From @perhunterOn 05/14/2014 11:30 AM, Leon Timmermans wrote:
if you think that is all there is to it, then do the workaround. most over 500 modules and distros use file::slurp. tell them all to use the in the mean time, instead of saying it uses sysread and can't handle utf uri -- |
From [email protected]On 14 May 2014 15:45, Ed Avis via RT <perlbug-followup@perl.org> wrote:
Not as yet, I've forked the repo and will work on this tonight. Thanks |
From @jhiOn Wednesday-201405-14, 22:57, Uri Guttman wrote:
Based on some torture testing I just did, sysread() isn't the only thing |
From @tonycozOn Wed, May 14, 2014 at 10:57:41PM -0400, Uri Guttman wrote:
Don't use sysread() if there are any layers - since it ignores I'd tend to fallback to read(), whether that meets your performance Tony [1] in a scary way, eg. if your file is UCS-2BE and you open with |
From @LeontOn Thu, May 15, 2014 at 4:57 AM, Uri Guttman <uri@stemsystems.com> wrote:
All of that is true, but you don't need hundreds of lines of conceptually
edit_file and edit_file_lines are horribly broken by ignoring encodings
Don't use sysread, at least not unless you're absolutely sure it gives the Leon |
From @perhunterOn 05/15/2014 08:43 AM, Tony Cook wrote:
so what about sysread in raw mode and then converting the buffer to the thanx, uri -- |
From @perhunterOn 05/15/2014 08:54 AM, Leon Timmermans wrote:
and there aren't many bug reports on those hundreds of lines. in fact
they don't ignore them. they pass on the encoding options to read_file
and you didn't answer the question about calling sysread in raw mode and uri -- |
From @craigberryOn Thu, May 15, 2014 at 9:26 AM, Uri Guttman <uri@stemsystems.com> wrote:
The main problem I'm aware of is that there is no guarantee the buffer |
From @perhunterOn 05/15/2014 10:46 AM, Craig A. Berry wrote:
the whole idea of slurp is to read in the entire file into a string. any so again, i will ask the simple question to which i have not gotten a so i will be blunt here, put up or shut up. i am asking for help on this thanx, uri -- |
From @demerphqOn 15 May 2014 17:09, Uri Guttman <uri@stemsystems.com> wrote:
Reading the full file and then calling Encode::decode() on it with the cheers, -- |
From @perhunterOn 05/15/2014 11:31 AM, demerphq wrote:
yves, thanks for the direct answer. now the question is should i just pass in thanx, uri -- |
From @autarchOn Thu, 15 May 2014, Uri Guttman wrote:
This will only sort of work. With binmode you can set multiple layers and Encode can accept anything that would be passed to the ":encoding(...)" I suspect a better fix would be to simply _not_ use sysread if binmode is -dave /*============================================================ |
From @perhunterOn 05/15/2014 12:09 PM, Dave Rolsky wrote:
what about adding another option that is passed directly to decode? thanx, uri -- |
From @ikegamiOn Thu, May 15, 2014 at 11:09 AM, Uri Guttman <uri@stemsystems.com> wrote:
Yes, but it won't handle layers other than :encoding, and it won't handle |
From @perhunterOn 05/15/2014 01:16 PM, Eric Brine wrote:
that can be documented. most callers of read_file won't use it with thanx, uri -- |
From @LeontOn Thu, May 15, 2014 at 4:31 PM, Uri Guttman <uri@stemsystems.com> wrote:
It's just *completely* broken with encodings, which is a fairly critical they don't ignore them. they pass on the encoding options to read_file and
Sorry, it seems I had misread the docs there.
Yes, you can, but why would you go through these complications? What makes sub read_file($filename, $encoding = 'utf8') { It's not that this can't be made to work, it's that a 100+ lines of code Leon |
From @perhunterOn 05/15/2014 02:10 PM, Leon Timmermans wrote:
and why would you need to explicitly pass in cdlf when it can do line
i am not that dumb as to not pass in those options. the idea of those is
so do that your self. and what about all the users out there who don't
tricky things for optimizations? you still haven't really read the code. uri -- |
From @ikegamiOn Thu, May 15, 2014 at 2:04 PM, Uri Guttman <uri@stemsystems.com> wrote:
:crlf is also very common. |
From @perhunterOn 05/15/2014 04:00 PM, Eric Brine wrote:
from what i see in perldoc perlio it just converts crlf to \n on thanx, uri -- |
From @LeontOn Thu, May 15, 2014 at 10:00 PM, Eric Brine <ikegami@adaelis.com> wrote:
«binmode => ':crlf'» completely broken on File::Slurp. On Unix it's a ${$buf_ref} =~ s/\015\012/\n/g if $is_win32 && !$opts->{'binmode'} ; Emulating PerlIO layers really is a deep rabbit hole. Leon |
From @perhunterOn 05/15/2014 04:25 PM, Leon Timmermans wrote:
so you don't pass in :crlf to slurp. is that so hard to understand?? it uri -- |
From @davidnicolJust to be difficult, has anyone benchmarked sub read_file($) { `/bin/cat \Q$_[0]\E` } # or "type" on windows as an alternative, recently? |
From @ikegamiOn Thu, May 15, 2014 at 4:43 PM, Uri Guttman <uri@stemsystems.com> wrote:
Yes, but you were talking of breaking it by providing an implementation i get it. you don't like the module. good for you. doesn't mean you have to
Yeah, we hate it so much we just had to try to help you fix it??? |
From @ikegamiOn Thu, May 15, 2014 at 4:50 PM, David Nicol <davidnicol@gmail.com> wrote:
Backticks read the same way C<read> does (8K at a time), so it's slower use Benchmark qw( cmpthese ); sub _read { sub _sysread { cmpthese (-5, { Rate qx sysread read 1.5MB file |
From @perhunterOn 05/16/2014 12:33 AM, Eric Brine wrote:
how does it need to be supported more than it is now? it can be ignored
i was referring to leon, not you. thanx, uri -- |
From @perhunterOn 05/16/2014 12:46 AM, Eric Brine wrote:
is that a mistake with both read and sysread calling _read?
those last two are basically the same result as they call the same code. thanx, uri -- |
From @SmylersLeon Timmermans writes:
That's great to hear, Leon. As well as speeding up people using that Uri Guttman writes:
Indeed, File::Slurp provides a convenient interface for many users, who I interpreted Leon's suggestion above not as something users should do Once using <$fh> has been sped up sufficiently, one possibility would be Best wishes Smylers |
From @SmylersI just wrote:
Erm, I meant speeding up their code, not actually speeding up the Smylers |
From @LeontOn Thu, May 15, 2014 at 10:43 PM, Uri Guttman <uri@stemsystems.com> wrote:
You need it to (portably) open text files created on Windows. That is not
How is pointing out bugs "unhelpful comments"? How is offering solutions Leon |
From @ikegamiOn Fri, May 16, 2014 at 1:38 AM, Uri Guttman <uri@stemsystems.com> wrote:
Of course. Was super duper tired. Fixed: Rate qx read sysread |
From @perhunterOn 05/16/2014 07:38 AM, Leon Timmermans wrote:
so it is a trivial fix to check for :crlf, do the same line conversion uri -- |
From @perhunterOn 05/16/2014 08:12 AM, Eric Brine wrote:
more like what i have seen. that is a lot of speed to make up. i don't also the most common homebrew code i have seen is more like this: $text = join( '', <$fh> ) ; the use of $/ is not well known so advocating it is not a good solution thanx, uri -- |
From @LeontOn Fri, May 16, 2014 at 4:29 PM, Uri Guttman <uri@stemsystems.com> wrote:
It's not so much layer overhead per se, but buffering overhead in a sub read_file ($filename, $layer = ':unix') { is just as fast as the code in File::Slurp (the read itself is the As soon as you do anything that uses a buffer this performance benefits Leon |
From @jkeenanOn Wed May 14 02:11:21 2014, mcgrath.martin@gmail.com wrote:
Discussion in this RT appears to have shifted to how best to fix File::Slurp. File::Slurp is not a core module, so discussion of how to improve it would normally have been conducted in its own bug queue (presumably, https://rt.cpan.org//Dist/Display.html?Queue=File-Slurp). I realize that shutting off discussion in p5p at this point would be Canutian, but can I at least pose these questions: Do we want to modify perlfaq5 in the manner suggested by the OP two days ago? Do we want to modify perlfaq5 at all? If we do, does this go into 5.20? My two cents: File::Slurp is a well-known library. I can recall lightning talks Uri gave about it at YAPCs a decade ago. Whatever problems it has can be hashed out, preferably in its own bug queue and don't warrant its deletion from perlfaq5. "Mend it, don't end it." Perhaps some discussion of Path::Tiny's slurping capabilities ought to be added to perlfaq5. And, no, this does not go into 5.20. Thank you very much. |
From @rjbs* James E Keenan via RT <perlbug-followup@perl.org> [2014-05-16T17:56:32]
No. -- |
From @xdgOn Fri, May 16, 2014 at 10:29 AM, Uri Guttman <uri@stemsystems.com> wrote:
FWIW, if you open the file with :unix, then you don't get :perlio and In my benchmarking, that was as fast or faster in some cases. I've Benchmark: https://gist.github.com/dagolden/c4dad291cd0b7156f268 Source: https://gist.github.com/dagolden/973bcfa11816e2d96c65 (Small was 50k, medium was 500k, large was 5000k) I'm not claiming it's an entirely fair benchmark, but it does show a Path::Tiny has a bunch of method call overhead and abstraction (plus David |
From @perhunterOn 05/16/2014 10:53 PM, David Golden wrote:
and why would that be better than calling sysread directly? given the
you can easily add an entry into the benchmark that come with thanx, uri -- |
From @xdgOn Sat, May 17, 2014 at 12:14 AM, Uri Guttman <uri@stemsystems.com> wrote:
In the real world, probably not much. In a tight slurp loop benchmark
I'll put that on my round tuit list. :-) David -- |
From @druud62On 2014-05-15 02:47, Dr.Ruud wrote:
Apparently fixed in 5.19. -- |
From @tonycozOn Wed May 14 02:11:21 2014, mcgrath.martin@gmail.com wrote:
perlfaq is upstream CPAN, issues in it should be reported at: https://github.com/perl-doc-cats/perlfaq/issues though I suggest you point them at this discussion. Discussion of the bug in File::Slurp belongs in the File::Slurp ticket: https://rt.cpan.org/Ticket/Display.html?id=83126 Closing this ticket. Tony |
@tonycoz - Status changed from 'open' to 'resolved' |
Migrated from rt.perl.org#121870 (status was 'resolved')
Searchable as RT121870$
The text was updated successfully, but these errors were encountered: