Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bufio.readLines is pathologically slow for very long lines #803

Closed
sswam opened this issue Mar 14, 2021 · 3 comments · Fixed by #867
Closed

bufio.readLines is pathologically slow for very long lines #803

sswam opened this issue Mar 14, 2021 · 3 comments · Fixed by #867
Labels
bug Something isn't working

Comments

@sswam
Copy link

sswam commented Mar 14, 2021

I found that bufio.readLines becomes pathologically slow when given large lines, such as a single 10MB line. Deno took 58 seconds to read and write this line, compared to under 0.1 seconds for Go, Python and Perl. Those other languages are able to read and write a 100MB line in under 0.3 seconds on my laptop, whereas I was not patient enough to wait for Deno bufio to finish that task.

I'm guessing that bufio is reallocing its buffer in small steps, which results in quadratic complexity, rather than the normal method which is to double the capacity of the buffer each time it needs to be expanded, which gives linear complexity. I can have a go at fixing this, but would like to get your go ahead first.

Create a file containing a single 10MB line. I used Perl:

perl -e 'for (1..10000000) { print "X"; } print "\n";' > bigline-10M.txt

readLine.ts: TypeScript Deno code to read and write the line

import { readLines } from "https://deno.land/std/io/bufio.ts";

const lines = readLines(Deno.stdin);

for await (const l of lines) {
        console.log(l);
        break;
}

readLine.py: Python code to read and write the line

import sys
print(sys.stdin.readline(), end='')

readLine.pl: Perl code to read and write the line

print scalar <>;

readLine.go: Go code to read and write the line

package main

import "bufio"
import "os"
import "fmt"
import "log"

func main() {
        var buf []byte = make([]byte, 64*1024)
        scanner := bufio.NewScanner(os.Stdin)
        scanner.Buffer(buf, 1e12)
        if scanner.Scan() {
                fmt.Println(scanner.Text())
        }
        if err := scanner.Err(); err != nil {
                log.Fatal(err)
        }
}

test.sh: script to time execution for each language

time() { command time -f '%e %U %S %C' "$@"; }
time deno run readLine.ts < bigline-10M.txt > /dev/null
time python readLine.py < bigline-10M.txt > /dev/null
time perl readLine.pl < bigline-10M.txt > /dev/null
time go build readLine.go
time ./readLine < bigline-10M.txt > /dev/null

results

58.67 56.13 12.43 deno run readLine.ts
0.05 0.03 0.02 python readLine.py
0.01 0.00 0.00 perl readLine.pl
0.07 0.07 0.04 go build readLine.go
0.02 0.00 0.02 ./readLine
@sswam
Copy link
Author

sswam commented May 10, 2021

Thank-you for working on this.

Deno is still around 20 times slower than the other languages I tested, but much better than before. It doesn't seem to be taking polynomial time now.

I don't expect Deno to match the performance of go, rust, or C, but ideally it should be able to keep up with Python and Perl. But it may be that Python and Perl are doing buffered IO using C code rather than code written in their own language, in which case the difference is understandable.

Here are my new results for a 10MB line, previously this took nearly 1 minute:

0.34 0.35 0.05 deno run readLine.ts
0.02 0.01 0.00 python readLine.py
0.00 0.00 0.00 perl readLine.pl
0.01 0.00 0.00 ./readLine-go
0.00 0.00 0.00 ./readLine-rs
0.00 0.00 0.00 ./readLine-b

and a 100MB line, previously DNF / I didn't wait:

2.34 2.38 0.50 deno run readLine.ts
0.11 0.06 0.04 python readLine.py
0.04 0.02 0.02 perl readLine.pl
0.07 0.05 0.05 ./readLine-go
0.02 0.01 0.01 ./readLine-rust
0.04 0.02 0.01 ./readLine.b

Deno cannot read a 1GB line, whereas the other languages I tested can do it in under 1 second (on Linux):

error: Uncaught (in promise) RangeError: string too long
    yield decoder.decode(chunk);
                  ^
    at TextDecoder.decode (deno:op_crates/web/08_text_encoding.js:4143:21)
    at readStringDelim (file:///home/sam/jz-experiments/deno_std/io/bufio.ts:698:19)
    at async readLines (file:///home/sam/jz-experiments/deno_std/io/bufio.ts:706:18)
    at async file:///home/sam/jz-experiments/readLine.ts:7:18
Command exited with non-zero status 1
23.89 27.58 4.98 deno run readLine.ts
0.85 0.44 0.41 python readLine.py
0.27 0.06 0.21 perl readLine.pl
0.62 0.41 0.44 ./readLine-go
0.24 0.07 0.17 ./readLine-rust
0.29 0.13 0.16 ./readLine.b

Anyway, the performance is "good enough" now in my opinion. I might need to cope with 1MB or 10MB lines, but 1G lines are pretty rare.

@kitsonk
Copy link
Contributor

kitsonk commented May 10, 2021

I believe what you are encountering is: denoland/deno#10157

Which really isn't a problem with the std library and affects any JavaScript/TypeScript IO.

@t829702
Copy link

t829702 commented Jun 27, 2021

@sswam how does Deno compare with Node 's builtin readline on your same machine ?

https://nodejs.org/api/readline.html#readline_example_read_file_stream_line_by_line

  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity
  });
  // Note: we use the crlfDelay option to recognize all instances of CR LF
  // ('\r\n') in input.txt as a single line break.

  for await (const line of rl) {
    // Each line in input.txt will be successively available here as `line`.
    console.log(`Line from file: ${line}`);
  }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants