Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string constructor performance #1742

Closed
stevesmoot opened this issue May 15, 2024 · 9 comments · Fixed by #1744
Closed

string constructor performance #1742

stevesmoot opened this issue May 15, 2024 · 9 comments · Fixed by #1744
Assignees

Comments

@stevesmoot
Copy link
Collaborator

stevesmoot commented May 15, 2024

I'll attach a sample, but I discovered that loading a pre-computed hash table got really slow on spicy.
After some false trails, I think its actually constructing strings that is the problem. Compare the two attached .spicy files in the zip, in the next comment.
one has 10k empty strings (fast) the other 10k one byte strings (slow) for me
on 7.0.0-dev.254. On 6.0.3 and 6.2.0 however they are both 28s.

@stevesmoot
Copy link
Collaborator Author

stevesmoot commented May 15, 2024

I tried to solidify the further and got some confusing results, so will add more over time.
verified on the right build and path.
Retried with 10,000 element vectors that gives 28 sec vs 14 min, so not as dramatic but the same effect I guess.
time spicyz -o ex ex10000.spicy;
real 14m57.615s
user 14m17.929s
sys 0m42.904s
ex2.zip

time spicyz -o ex ex-null10000.spicy

real 0m27.564s
user 0m32.288s
sys 0m1.792s

@stevesmoot
Copy link
Collaborator Author

My orig discovered issue took place in an "import" - just checked an import version and the effect is the same. (not worse). Just noting to rule that out.

@stevesmoot
Copy link
Collaborator Author

note I was very loosely saying string above, when really it was bytestrings.

As I thought it would be interesting, I morphed it into a string test. On the plus side it wasnt in minutes, but clearly there is some interesing optimization happening:
ex-string-10000 38 s
ex-string-null-10000 7 s

so the one char long case dropped down to almost the null case for bytestrings, but the actual null case plummeted 4x as well, so it was actually 6x longer with one char in strings. These are consistent with 6.2 (same delta). So potential opt point, but not a problem....its something about bytestrings.

@bbannier
Copy link
Member

bbannier commented May 15, 2024

Since this issue is strictly about compilation performance it has not much to do with the performance of an actual constructor (these would be invoked at runtime, not at compile time). Instead it appears that the issue is that for big container literals we generate C++ code which is (very) expensive to compile (this depends on the exact C++ compiler used, but seems consistent). In particular for Spicy code like this

global xs = vector(1, 2, 3, 4, 5);

we generate C++ roughly equivalent to this

std::vector xs{1, 2, 3, 4, 5};

The elements are passed as an std::initializer_list to the std::vector constructor which for huge containers triggers the C++ compiler performance edge case seen (we are talking about vectors with 10,000 elements here). A similar issue exists for set or map literals.

I opened #1744 to generate C++ code which seems to be easier to digest for the compiler.

For the time being you could unroll the initialization of "huge" containers yourself, e.g., for a vector (set and map could follow a similar pattern),

global xs: vector<uint8>; # Construct empty `vector`.

# Calling `reserve` for a `vector` before inserting might improve runtime performance.
xs.reserve(5);

# Add elements.
xs.push_back(1);
xs.push_back(2);
xs.push_back(3);
xs.push_back(4);
xs.push_back(5);

Moving this to a dedicated function might improve readability and maintenance of your code (but still not allow initializing a const, see #1745).

@bbannier bbannier self-assigned this May 15, 2024
@vpax
Copy link

vpax commented May 15, 2024

I'd be surprised if this boils down to large vector initializers. I switched to using those for -O gen-C++ in order to speed up compilation speed, and a micro-benchmark I just tried on my Mac has clang++ -O3 compiling a vector initialized to a million elements in under a second.

@vpax
Copy link

vpax commented May 15, 2024

I should add: in particular, the original code was like the unrolling you describe, and that took many minutes to compile, whereas using vectors is 100x faster.

@stevesmoot
Copy link
Collaborator Author

also I think the string result (vs) bytestring argues against it being vector. Had another idea for benchmarking, will try later today.

@bbannier
Copy link
Member

bbannier commented May 15, 2024

I'd be surprised if this boils down to large vector initializers. I switched to using those for -O gen-C++ in order to speed up compilation speed, and a micro-benchmark I just tried on my Mac has clang++ -O3 compiling a vector initialized to a million elements in under a second.

I also benchmarked to be sure, so maybe I am not looking at the right thing. I compared code using an initializer list of a non-trival object (in Smoot's case a bytes, but for benchmarking a std::string)

std::vector<std::string> f() {
    auto xs = std::vector<std::string>{
        " ", // Repeated `n` times.
             // ...
    };

    return xs;
}

int main() { f(); }

and another version using push_back

    auto xs = std::vector<std::string>();
    xs.push_back(" "); // Repeated `n` times.
    // ../

I then compiled this with clang++ --std=c++17 -O3 ... and measured the times. I see quadratic behavior when calling the ctr taking a std::initializer_list, and roughly flat behavior for push_back (likely: time to compile operations just disappears in background).

Screenshot 2024-05-15 at 5 47 29 PM

If the std::vector contained a trivial type like int using an initializer or not makes no noticeable difference.

All this is roughly equivalent to what is happening for the reproducer; #1744 switches from initializers to insertion and I see an improved performance.

@vpax
Copy link

vpax commented May 15, 2024

If the std::vector contained a trivial type like int using an initializer or not makes no noticeable difference.

That's presumably the difference. Your example above used ints so that's what I timed (and what the -O gen-C++ code uses, where the ints serve for table-driven initialization of more complex objects using much less code).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants