Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a feature to cache the most recent string index->cursor result #793

Merged
merged 6 commits into from
Oct 19, 2022

Conversation

dpk
Copy link
Contributor

@dpk dpk commented Dec 10, 2021

Not ready to merge yet. Needs more testing and benchmarks.

This is lighter-weight than building a full index->cursor table for the string, adding a constant two words to the memory required to store a string, as opposed to one word for every n characters. The cached cursor is used for any string-ref operation requesting an index after the most-recently-requested index, making potentially quadratic repeated string-ref procedures run in linear time. In theory, it could also use a heuristic to speed up moving backwards through the string when it thinks that moving the old cursor backwards would be faster than starting again at the start of the string. In practice, my logging of when the cached cursor is actually reused during the Chibi compilation and startup process shows that the most common case of moving backwards is going back to the start of the string anyway. (EDIT: Oh, but there are cases like string-index-right where an efficient implementation would want to move backwards character-by-character from the end of the string.)

I’m unsure how well this interacts with threading in Chibi. As I understand it, Chibi’s green threads will never interrupt a C function, so there should be no races regards getting and setting the cache. If multiple threads are accessing the same string at the same time, they’ll trample on each others’ use of the cursor — but it should still be quicker, in growth terms, for both threads than would be just starting at cursor/index 0 each time.

Benchmarks to follow. It remains to be seen if the cost of checking and setting the cursor cache actually improves performance overall, especially considering the impact on small strings. If results are favourable, I think it might even be reasonable to enable this option by default.

Input, especially code review, appreciated.

This is lighter-weight than building a full index->cursor table for
the string, adding a constant two words to the memory required to
store a string, as opposed to one word for every n characters. The
cached cursor is used for any string-ref operation requesting an index
after the most-recently-requested index, making potentially quadratic
repeated string-ref procedures run in linear time. In theory, it could
also use a heuristic to speed up moving backwards through the string
when it thinks that moving the old cursor backwards would be faster
than starting again at the start of the string. In practice, my
logging of when the cached cursor is actually reused during the Chibi
compilation and startup process shows that the most common case of
moving backwards is going back to the start of the string anyway.

Benchmarks to follow.
@dpk
Copy link
Contributor Author

dpk commented Dec 10, 2021

Preliminary (albeit likely unfairly flattering) benchmarks show that this patch does actually marginally speed up access to even very small strings. With the default SEXP_STRING_INDEX_TABLE_CHUNK_SIZE of 64, the results are most noticeable on strings slightly longer than 76 characters, the minimum to trigger Chibi into generating the offset table at all. With these strings SEXP_USE_STRING_INDEX_TABLE has a notable negative performance impact, whereas the new SEXP_USE_STRING_REF_CACHE is slightly faster (by an insignificant amount, possibly a sampling error). I’ve tested with strings in English (ASCII-only), German (mostly ASCII), Ancient Greek (mostly 2 bytes per character in UTF-8), and Sanskrit (mostly three bytes per character in UTF-8). I’ll post my benchmark code when I’ve tested longer strings.

Results on tiny strings (between 5 and 12 characters long):

vanilla          36.84 real        36.33 user         0.18 sys
string-index     37.72 real        36.52 user         0.15 sys
string-ref-cache 35.29 real        34.93 user         0.12 sys

Results on small strings (all just over 76 characters):

vanilla          105.69 real       104.41 user         0.52 sys
string-index     111.58 real       110.23 user         0.49 sys
string-ref-cache 104.66 real       103.59 user         0.40 sys

Mostly in the weeds so far, though it’s encouraging to see that it doesn’t seem to have any negative time impact. (Practical memory impact is yet to be tested.) As remarked in features.h, the big improvements, if any, will come with very large strings.

@ashinn
Copy link
Owner

ashinn commented Dec 10, 2021

I'm OK with even a small hit in string-ref performance if it helps some common quadratic patterns. This doesn't slow down the cursor API or any of the high-level APIs built on cursors.

The only real concern is the memory overhead. Chibi assumes 4-word aligned heap objects, so your change increases the current 4-word string wrapper to 8 words (not counting the underlying byte vector).

The other minor concern is this might encourage people to think string-ref is fast for all cases in Chibi and they don't need cursors, when in fact random access would still be slow.

@dpk
Copy link
Contributor Author

dpk commented Dec 11, 2021

I'm OK with even a small hit in string-ref performance if it helps some common quadratic patterns. This doesn't slow down the cursor API or any of the high-level APIs built on cursors.

Okay, good to know. So far, there doesn’t appear to be a speed performance hit, as you can see above. I’ll have to see what effect changing it to be able to optimize string accesses backwards as well as forwards has.

It just occurred to me that the same cache can be used to speed up sexp_string_cursor_to_index as well as sexp_string_index_to_cursor. I’ll also see about extending my patch to do that. That should also not slow down doing things the pure string-cursor way: the APIs which only use cursors seemingly leave sexp_string_cursor_to_index almost entirely untouched.

The only real concern is the memory overhead. Chibi assumes 4-word aligned heap objects, so your change increases the current 4-word string wrapper to 8 words (not counting the underlying byte vector).

Okay, good to know. How should I measure memory usage? SEXP_USE_DEBUG_GC or external tools or both? If the former, what figures should I look at particularly? The speed tests above are from Mac OS’s /usr/bin/time, which can’t measure memory usage, but I’ve installed GNU time which can.

The other minor concern is this might encourage people to think string-ref is fast for all cases in Chibi and they don't need cursors, when in fact random access would still be slow.

How common is truly random access to strings? I’m doubtful that it’s common enough that this need be a concern. The only pseudo-random access pattern I can think of is the one I mentioned above, of two threads simultaneously trying to process the same string. That could be a problem. Certainly people should still be encouraged to use the string cursor APIs when possible, though.

@ashinn
Copy link
Owner

ashinn commented Dec 11, 2021

You can check the chibi/red/snow image sizes for a start.

@ashinn
Copy link
Owner

ashinn commented Dec 17, 2021

Regarding non-linear usage patterns, backtracking regexp and parsing libraries fall into this case, as well as Boyer-Moore (though that tends to be more localized). Even non-backtracking parsers will tend to record submatch indexes to lazily generate substrings, and the caller may extract the substrings in an arbitrary order.

More pathological cases are a space optimized string set or dictionary which concatenates many strings into a single string, storing only the start offsets externally (e.g. in a compact uvector). The entries may be accessed entirely randomly.

@dpk
Copy link
Contributor Author

dpk commented Dec 27, 2021

You can check the chibi/red/snow image sizes for a start.

For me, they’re all exactly the same size as they are when built without this patch.

@dpk
Copy link
Contributor Author

dpk commented Dec 28, 2021

I’ve possibly discovered a bug in SEXP_USE_STRING_INDEX_TABLE while writing benchmarks. It seems that sometimes the index table can possibly sometimes contain invalid cursors (pointing to partial UTF-8 bytes) — at least that’s the only explanation I can think of for a program which only uses string-ref (no string-set!, no direct use of cursors) coming out with ERROR in next-char on line 25 of file benchmarks/strings/02-random-char-access-file.scm: string-ref: invalid utf8 byte: {String-Cursor #15 5184}. Will investigate.

@dpk dpk marked this pull request as draft January 23, 2022 19:49
@dpk
Copy link
Contributor Author

dpk commented Feb 10, 2022

Still haven’t had time to finish working out some practical benchmarks for this, but coincidentally I did just learn that Emacs uses basically the same trick.

(It doesn’t appear to be officially used in the Emacs source, but the name ‘bookmark’ for this is actually pretty good. Much less awkward than ‘string-ref cache’)

@ashinn
Copy link
Owner

ashinn commented Feb 11, 2022

Thanks for the link Daphne, that's an interesting article. Obviously I prefer the Julia/Go approach :)

[skip ci]
@dpk
Copy link
Contributor Author

dpk commented Mar 3, 2022

Chicken Scheme is also now planning to switch to doing this.

I’ve merged in a branch which lets the bookmark move backwards as well as forwards through the string. I had intended to benchmark that separately, since moving backwards is slower than moving forwards for a few reasons, but in practice the factors involved don’t seem to be large enough for this to be a concern — there‘s still generally a speedup compared to starting from the beginning of the string.

I’m marking this ready for merge, but I’d appreciate a code review. In particular, I’m not sure if my change to the sexp_struct for strings requires a change in _sexp_type_specs — that code is just completely opaque to me. (Could this be why the image sizes weren’t affected?)

@dpk dpk marked this pull request as ready for review March 3, 2022 18:00
@dpk
Copy link
Contributor Author

dpk commented Sep 26, 2022

Apologies for the bump, but I had someone recently asking when this would be merged. Could you take a look at the remaining issue with _sexp_type_specs?

@ashinn
Copy link
Owner

ashinn commented Oct 1, 2022

Apologies for the delay, I lost track of this. You don't need to touch _sexp_type_specs in this case. I'll take a final look shortly before merging.

@ashinn ashinn merged commit d67fa42 into ashinn:master Oct 19, 2022
@ashinn
Copy link
Owner

ashinn commented Oct 19, 2022

LGTM, sorry for the delay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants