This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Add native TextBuffer implementation #5

Merged
merged 316 commits into from
Jun 8, 2017


@maxbrunsfeld maxbrunsfeld commented Jan 13, 2017

Core functionality

  • Implement basic text access methods in terms of an immutable, contiguous 'base text' and a mutable Patch. This representation will significantly improve our memory efficiency and data locality when dealing with large files, and allow many operations like serialize and isModified to perform well regardless of the file size.
    • randomized tests
  • Implement TextBuffer::create_snapshot(). This will allow the text of the buffer to be used in a read-only fashion in a background thread, even while mutations are going on in the main thread. We will use this to add an asynchronous save method that does no copying.
    • randomized tests
  • Implement Text::write. This will serialize a text slice to a C++ stream in a given encoding. We will use this when saving the buffer to write each chunk of the buffer to disk.
  • Implement a text_diff method that returns a Patch that we can use to update markers correctly when the buffer's on-disk contents change. We'll use Google's diff-match-patch library for the diffing. The library has been ported to work with STL-compatible strings.
    • randomized tests
  • Implement a search method that can be used to search for a given regex pattern in either the buffer or a snapshot of the buffer. We'll use the PCRE regex library since it supports a superset of the ECMAScript regex syntax, and it can search text that is stored in a non-contiguous structure, via the partial match API.
    • randomized tests
  • Implement an efficient is_modified method that performs well regardless of the file size.
    • remove noop changes from patch after splicing, to avoid keeping unnecessary nodes
    • make is_modified work correctly even when there are outstanding snapshots
  • Handle IO failure in .save and .load
  • Get IO methods working on Windows. On POSIX systems, we can use libiconv, which is installed on users' systems by default. On Windows, we may be able to use a port of this library.
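
To make the "immutable base text plus mutable Patch" idea concrete, here is a minimal sketch. It is not the code in this PR: `LayeredBuffer`, `Change`, and `replace_base_range` are hypothetical names, the change list is a plain `std::map` rather than the real Patch structure, and edits are given in base-text coordinates and assumed not to overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical illustration only; none of these names come from superstring.
struct Change {
  std::size_t old_end;   // exclusive end offset in the base text
  std::string new_text;  // text that replaces base[old_start, old_end)
};

class LayeredBuffer {
 public:
  explicit LayeredBuffer(std::string base)
      : base_(std::make_shared<const std::string>(std::move(base))) {}

  // Replace base[start, end) with new_text. Assumption for brevity: ranges are
  // in base-text coordinates and never overlap a different existing change.
  void replace_base_range(std::size_t start, std::size_t end, std::string new_text) {
    assert(start <= end && end <= base_->size());
    if (base_->compare(start, end - start, new_text) == 0) {
      changes_.erase(start);  // a no-op change is dropped, not stored
      return;
    }
    changes_[start] = Change{end, std::move(new_text)};
  }

  // Cost depends on the number of stored changes, not on the file size.
  bool is_modified() const { return !changes_.empty(); }

  // Compose the current text by interleaving base slices with changes.
  std::string text() const {
    std::string result;
    std::size_t position = 0;
    for (const auto &entry : changes_) {
      result.append(*base_, position, entry.first - position);
      result += entry.second.new_text;
      position = entry.second.old_end;
    }
    result.append(*base_, position, std::string::npos);
    return result;
  }

  // The snapshot shares the immutable base text and copies only the changes,
  // so later edits to this buffer do not affect it.
  LayeredBuffer create_snapshot() const { return *this; }

 private:
  std::shared_ptr<const std::string> base_;
  std::map<std::size_t, Change> changes_;  // keyed by start offset in the base
};
```

In this sketch, reverting an edit back to the base text removes the stored change, which is the same reasoning as the "remove noop changes" item above: `is_modified` can then be answered from the change list alone.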

Optimizations

  • Avoid any unnecessary copying in search.
  • Optimize UTF-8 transcoding performance by using the standard library rather than iconv
  • Use stdio rather than fstream to maximize IO performance
  • Make sure that performance is ok when there is a large contiguous unsaved change.
  • Make sure that performance is ok when there are huge numbers of edits
  • In Patch::splice mutate nodes' existing text rather than creating new text.
  • Avoid the overhead of storing old and new text pointers on each Patch node if we're not storing text in the Patch.
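
As a rough illustration of the stdio item above, the sketch below writes a sequence of text chunks through a `FILE *` and reports failure to the caller. `write_chunks` is a hypothetical helper, not a function from this PR, and it leaves out the encoding conversion that the real save path performs.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper, not part of superstring: writes each chunk in order
// and returns false on the first short write or flush failure.
bool write_chunks(std::FILE *file, const std::vector<std::string> &chunks) {
  for (const std::string &chunk : chunks) {
    if (chunk.empty()) continue;  // nothing to write for an empty chunk
    std::size_t written = std::fwrite(chunk.data(), 1, chunk.size(), file);
    if (written != chunk.size()) return false;
  }
  // Flush so that buffered data reaches the OS before success is reported.
  return std::fflush(file) == 0;
}
```

Returning a status instead of ignoring the result of `fwrite` is one way to surface the IO failures mentioned under Core functionality.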

Integration

  • Create node binding for TextBuffer
  • Create emscripten binding for TextBuffer
  • Implement an async TextBuffer.save
  • Implement an async TextBuffer.load
  • Add an API for retrieving the unsaved changes of a buffer efficiently, as a serialized Patch plus some digest of the buffer's base text
  • Add an async TextBuffer.search API that we can use to efficiently perform read operations that are currently slow for large files, such as checking if hard tabs are used, and checking what line endings are used.
  • Integrate native buffer into text-buffer repo. (Use native buffer implementation from superstring text-buffer#225)
  • Use new text-buffer branch in Atom (Use new native text-buffer implementation atom#14435)
    • Get Atom usable with new native TextBuffer implementation
    • Get all Atom tests passing
    • Get all bundled package tests passing
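
The line-ending check mentioned above could be a single pass over the buffer's chunks, without first joining them into one string. The sketch below is an assumption about how such a scan might look, not the search API from this PR; `LineEndingCounts` and `count_line_endings` are hypothetical names. It carries one bit of state across chunk boundaries so that a `\r` ending one chunk and a `\n` starting the next still count as one CRLF.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: how many LF-only and CRLF line endings were seen.
struct LineEndingCounts {
  std::size_t lf = 0;
  std::size_t crlf = 0;
};

// Scans non-contiguous text chunk by chunk. A lone '\r' is not counted.
LineEndingCounts count_line_endings(const std::vector<std::string> &chunks) {
  LineEndingCounts counts;
  bool previous_was_cr = false;  // persists across chunk boundaries
  for (const std::string &chunk : chunks) {
    for (char character : chunk) {
      if (character == '\n') {
        if (previous_was_cr) {
          ++counts.crlf;
        } else {
          ++counts.lf;
        }
      }
      previous_was_cr = (character == '\r');
    }
  }
  return counts;
}
```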

@nathansobo nathansobo force-pushed the text-buffer branch 2 times, most recently from 0bbafab to ddb73a3 Compare January 15, 2017 18:41
@arcanis arcanis mentioned this pull request Jan 16, 2017
@nathansobo nathansobo force-pushed the text-buffer branch 5 times, most recently from a3f92a0 to aa1cd98 Compare January 20, 2017 22:59
@maxbrunsfeld maxbrunsfeld force-pushed the text-buffer branch 3 times, most recently from 93832b2 to 688ee28 Compare February 1, 2017 20:02
@nathansobo nathansobo force-pushed the text-buffer branch 2 times, most recently from e5919b6 to b7c3bb2 Compare February 3, 2017 13:58
@maxbrunsfeld maxbrunsfeld force-pushed the text-buffer branch 2 times, most recently from b84275d to fa8cd99 Compare March 11, 2017 00:40
@maxbrunsfeld maxbrunsfeld force-pushed the text-buffer branch 3 times, most recently from 1fd28a8 to 340e586 Compare June 8, 2017 19:26
At least on macOS with libc++, it is like 3x faster.