-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce memory pressure by stopping cloning all the time #4032
Comments
It's good to keep this in mind, but since this is not causing performance issues yet, I am not sure whether this should be Other than that we have to keep in mind that we have both regular |
But ranges are immutable (or should be) so you could use one range across the whole app and no one should be able to change it. So, there's no "operate on same range".
It's in 1.0.0 because if we change this too late we'll certainly cause a lot of headaches. This is transparent for the shape of the API, but can have severe impact on how the code behaves. And those issues are hard to track down. As for the impact on performance – we just don't know yet. I'm not sure if it's seriously affecting the overall performance, but the point is that without comparison test we won't know. This is a thing which will be pretty much invisible in the CPU profile and may not be obvious in memory profile due to GC. |
PS. by 1.0.0 I don't mean 1.0.0 alpha. We can certainly work on this later. |
I think that we should avoid optimization before finding that this is a real problem. On the one hand, I agree that we do a lot of cloning, create new objects which later need to be removed by the garbage collection. On the other, position contains: 1 reference to root and one array of a few numbers. A range is an object with 2 positions. That's not much. I remember I did research at the very beginning when we started working on the document model, back it times when we were thinking about the linear model, and the result was that creating new objects is not very costly, because browser handles it very well, including some smart deduplication. It might be a performance problem, but it might be as important optimisation as thinking if we should do: const editor = this.editor;
const doc = editor.doc; or: const doc = this.editor.doc; It's also an additional memory usage. What I mean is we should not change it as long we do not know it's a real performance problem. |
It's incomparable. Here you create a variable which is a ref to the same object. If you clone positions and ranges all the time you'd need an extremely smart algorithm to actually dedupe all this (so these are the same refs too).
It's a conceptual problem and it's just a mess now. As I wrote in the very first sentence – I didn't identify a performance issue yet and I don't think you'd be ever able to identify that a certain slowness comes from this. Basically because as a conceptual problem it's scattered all around the code base and it's not just about how long code executes. There was an article written by, I believe, one of the popular framework's authors. I can't find it now, but one of the things it recommended was to avoid creating a lot of short-lived objects (I think that it discussed passing new objects to every next function in your algorithm). Why is it a problem? I'd need to look for yet another article, this time about GC, which explained that there's a special space for those objects, but it's limited. Abusing it will slow down GC significantly and hence your app. Now, imagine debugging this. That typing locks every now and then. That when some other things work in the background typing or collab editing jitters. This will be really hard to identify in the timeline. To sum up, IMO, we should clean up this situation because:
|
OK I've fiddle a bit with Right now most of the tests fails because of direct modifications to position's parameters (like modifying To make
Most of the code that uses Currently I see two options to make We can either create a The only problem is with code that calculates new function getTransformed( positionToTransform ) {
const position = Position.createFromPosition( positionToTransform );
position.path.push( 7 );
position.offset -= 7;
position.path.pop();
position.offset++;
return position;
}
function getTransformed( positionToTransform ) {
const posBuilder = PositionBuilder.fromPosition( positionToTransform );
posBuilder.pusthToPath( 7 );
posBuilder.setOffset( posBuilder.getOffset() -7 );
posBuilder.popPath();
posBuilder.addOneToOffset();
return posBuilder.build();
} And I don't like those Pros:
function getTransformed( positionToTransform ) {
const mutablePosition = MutablePosition.createFromPosition( positionToTransform );
mutablePosition.path.push( 7 );
mutablePosition.offset -= 7;
mutablePosition.path.pop();
mutablePosition.offset++;
return mutablePostion.toPosition();
} With
Actual question: Which one of the above is more acceptable? |
I think that |
Example change in tests: // was:
expected.oldRange.start.offset = 1;
// with builder:
expected.oldRange.start = PositionBuilder.fromPosition( expected.oldRange.start )
.setOffset( 1 ).build(); Probably the part with |
I would choose option 3: add set of methods to position = position.getShiftedBy( -7 );
position = position.getShiftedBy( 1 ); Also, I believe that position changes should make semantic sense. I mean position = Position.createBefore( postion.getParent() ); I believe it's much more readable then poping element from the path (note that it might be unclear if the new position after you pop number from the path is before or after parent). This is also why I'm not sure if there will be many methods to be added to the current API. I can not imagine any at the moment. Also, note that if one really need to hack positions, for instance in the OT, he is always able to do: let path = position.path.slice( 0 ); //clone
path.push( 42 ); // magic
position = new Position( position.root, path ); |
Other: Make `Position` and `Range` immutable in model and view. Closes #897.
I did a quick check and this change reduced the performance instead of improving it. Since, besides being an API cleanup, this meant to improve performance, we failed. I looked into the profiler a bit and I see long This should be investigated because what we see here is a real performance issue. For example, to avoid Also, I'd be curious why it's called ArraySliceFallback. Especially, that |
I played with this a bit and added parent path memoization. This helped a bit, but only a bit. What I noticed then is that So, I made another change, and passed optional offset to This is before: And after: The loss on It's definitely tricky. What is rather clear is that we simply still create millions of positions and optimising their creation code may not be further possible, so architectural changes may be needed. |
PS. 5th undo step created itself 150k positions. 150k... |
But it's still better than before :D 184k. |
I'm not sure if getters affect performance, but instead of getters to enforce immutability if you might use |
Thanks for the tips! I think that we've considered However, ensuring immutability was only one part of the problem. The other one was that we're doing so many transformations of positions and ranges in algorithms for OT that calculating 5th undo step may generate 150k positions. This seems to consume a significant amount of time. Our idea for reducing cloning while keeping positions immutable was to use special kind of "fast positions" inside the OT only. Like this:
I hope that it would give us a significant boost, but only real tests will prove that it works. Another option is to research some ideas in the functional programming patterns. I imagine there are ways to deal with huge numbers of short-lived objects. One idea (although, perhaps not from this field) was to use an object pool but I'd actually try mutable positions first as they should be simpler. Object pool would be more error-prone. |
Not sure if there are any nuggets we could pull from ProseMirror's positioning which is just a single numeric index. That is a fundamental difference in the API however, but maybe it could spark ideas. Since their data is immutable they also reuse objects within a document. E.g. the "empty text node" is a single object in memory that will show up in all blocks without any text. Not sure if there are reusable positions or ranges in this case. |
I've looked at ProseMirror's indexing and it looks like an interesting idea, however, I am not sure if we would be able to make it work with our OT. At the moment we use paths and we need paths because they give you direct and unambiguous information on where the change happened. This way we can compare two positions even if we don't have the out-dated/other client's document state. So we are able to resolve operations' conflicts without sending the document state. If you have just a single number, you also need document state to "decode" it, and compare two positions. I don't remember exactly how's ProseMirror's conflict resolution works, but AFAIR it is based (at least partially) on document state? |
Yes, I think you are right. |
Related ticket: https://github.com/ckeditor/ckeditor5-engine/issues/1518. |
Related ticket: #6541. Another thing that might affect performance is that we do create a lot objects. Compare this performance report. After some code optimization in other place the biggest time hog is now Minor GC (Garbage Collector): |
There has been no activity on this issue for the past year. We've marked it as stale and will close it in 30 days. We understand it may be relevant, so if you're interested in the solution, leave a comment or reaction under this issue. |
We've closed your issue due to inactivity over the last year. We understand that the issue may still be relevant. If so, feel free to open a new one (and link this issue to it). |
Disclaimer: I didn't identify any performance issues yet. However, this is a low-level aspect of our engine, a code which is used everywhere millions of times, so we can be sure that it affects the performance. Also, since it touches archtectural decisions, it's a thing which will be hard to change later on. So let's talk about it now.
This topic started in https://github.com/ckeditor/ckeditor5-engine/pull/896/files#r108710032.
The code I commented on looked like this:
My comments:
and:
PS. Mind what happens with positions. They are cloned all the time too ;|
Proposal
We introduced cloning on "input" like
setRanges()
and output likegetRanges()
because we were worried that someone might change one position somewhere and affect a couple of ranges and selection classes somewhere else. Those issues are usually hard to track.While I'm still sure that we should worry about such issues I think that once we made ranges and positions nearly immutable (which happened later AFAIR and is still not enforced in the entire code) we secured the stability pretty well. The point is that we can secure the code by either breaking references everywhere or making things completely immutable. We former means loooots of cloning, so let's thing about the latter.
There are just 4 things which one can break:
range.start
,range.end
,position.root
,position.path
.If we made these 4 things totally read-only, we would be safe even without the cloning. Right?
So, how to make them really read-only? With the first 3 of them it's simple – just make them getters and keep the values under private properties. The path is a bigger issue because it can be modified after retrieving it. Therefore, I think that it can be a getter too but it'd need to clone the array to break the reference. Keep in mind that it's better to clone just this array (if one specifically accessed
position.path
– the other getters and methods can access the private_path
property) rather than clone all the ranges and positions all the time (which clone the path too).It seems that with this plan we should be safe. The only thing I'm not entirely sure about is whether making those 4 public properties getters which return private properties will not have negative performance impact too. Or rather – how strong it will be (if it exceeds cloning overhead, it'd be a pointless change :D). Unfortunately, without really making this move we won't learn that.
The text was updated successfully, but these errors were encountered: