bufio.readLines is pathologically slow for very long lines #803
Comments
Thank you for working on this. Deno is still around 20 times slower than the other languages I tested, but much better than before. It no longer seems to take superlinear time. I don't expect Deno to match the performance of Go, Rust, or C, but ideally it should be able to keep up with Python and Perl. It may be that Python and Perl do their buffered IO in C code rather than in code written in their own language, in which case the difference is understandable. Here are my new results for a 10MB line, which previously took nearly 1 minute:
and a 100MB line, previously DNF / I didn't wait:
Deno cannot read a 1GB line, whereas the other languages I tested can do it in under 1 second (on Linux):
Anyway, the performance is "good enough" now in my opinion. I might need to cope with 1MB or 10MB lines, but 1GB lines are pretty rare.
I believe what you are encountering is denoland/deno#10157, which really isn't a problem with the std library and affects any JavaScript/TypeScript IO.
@sswam how does Deno compare with Node's built-in readline? https://nodejs.org/api/readline.html#readline_example_read_file_stream_line_by_line
const rl = readline.createInterface({
  input: fileStream,
  crlfDelay: Infinity
});
// Note: we use the crlfDelay option to recognize all instances of CR LF
// ('\r\n') in input.txt as a single line break.
for await (const line of rl) {
  // Each line in input.txt will be successively available here as `line`.
  console.log(`Line from file: ${line}`);
}
I found that bufio.readLines becomes pathologically slow when given large lines, such as a single 10MB line. Deno took 58 seconds to read and write this line, compared to under 0.1 seconds for Go, Python and Perl. Those other languages are able to read and write a 100MB line in under 0.3 seconds on my laptop, whereas I was not patient enough to wait for Deno bufio to finish that task.
I'm guessing that bufio is reallocating its buffer in small steps, which results in quadratic complexity, rather than using the usual method of doubling the buffer's capacity each time it needs to expand, which gives linear complexity. I can have a go at fixing this, but would like to get your go-ahead first.
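To illustrate the guess above, here is a minimal sketch that counts how many bytes get copied while a buffer grows to hold a 10MB line arriving in 4KB chunks. It does not reproduce the actual bufio code; `bytesCopied`, `fixedStep`, `doubling`, the 4096-byte chunk size and the 4096-byte initial capacity are all hypothetical names and values chosen for illustration.

```typescript
// A growth policy returns a new capacity that is at least `needed` bytes.
type GrowthPolicy = (capacity: number, needed: number) => number;

// Hypothetical policy: grow by a fixed number of bytes at a time.
const fixedStep = (step: number): GrowthPolicy => (capacity, needed) => {
  let c = capacity;
  while (c < needed) c += step;
  return c;
};

// Hypothetical policy: double the capacity until the data fits.
const doubling: GrowthPolicy = (capacity, needed) => {
  let c = Math.max(capacity, 1);
  while (c < needed) c *= 2;
  return c;
};

// Simulates appending `total` bytes in `chunk`-sized pieces and returns the
// number of bytes copied by reallocations. No real memory is allocated.
function bytesCopied(
  total: number,
  chunk: number,
  grow: GrowthPolicy,
  initial = 4096,
): number {
  let capacity = initial;
  let length = 0;
  let copied = 0;
  while (length < total) {
    const n = Math.min(chunk, total - length);
    if (length + n > capacity) {
      capacity = grow(capacity, length + n);
      copied += length; // a reallocation copies the existing contents
    }
    length += n;
  }
  return copied;
}

const MB = 1024 * 1024;
const stepCopies = bytesCopied(10 * MB, 4096, fixedStep(4096));
const doubleCopies = bytesCopied(10 * MB, 4096, doubling);
console.log({ stepCopies, doubleCopies });
```

With fixed-step growth, the copied bytes grow with the square of the line length, while doubling keeps the total below twice the line length. If bufio does grow in small steps, that would be consistent with the timings reported here.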
Create a file containing a single 10MB line. I used Perl:
readLine.ts: TypeScript Deno code to read and write the line
readLine.py: Python code to read and write the line
readLine.pl: Perl code to read and write the line
readLine.go: Go code to read and write the line
test.sh: script to time execution for each language
results