Reduce memory pressure by stopping cloning all the time #4032

Reinmar · 2017-03-29T16:12:50Z

Disclaimer: I didn't identify any performance issues yet. However, this is a low-level aspect of our engine, a code which is used everywhere millions of times, so we can be sure that it affects the performance. Also, since it touches archtectural decisions, it's a thing which will be hard to change later on. So let's talk about it now.

This topic started in https://github.com/ckeditor/ckeditor5-engine/pull/896/files#r108710032.

The code I commented on looked like this:

const newSelection = Selection.createFromSelection( selection );
const ranges = newSelection.getRanges();

let trimmedRanges = [];

for ( let range of ranges ) {
	trimmedRanges.push( range.getTrimmed() );
}

newSelection.setRanges( trimmedRanges );

My comments:

createFromSelection() gives us setTo() call which e.g. copies fake selection's stuff so that's good. What I like less is the fact that we are copying the ranges here (getRanges() copies them) to copy them again when doing setRanges() to copy them again 3 times on another getRanges(), getTrimmed(), setRanges(). This means 5 times for every selection and we need to trim two selections...

and:

Let me add that this is all due to our lack of trust that no one will touch the ranges and positions.

In a perfect immutable world, we'd only need to copy new instances on getTrimmed(). No other method would need to do that because no other method change ranges. And I think that we must go into this direction. Otherwise, we create a huge memory pressure.

PS. Mind what happens with positions. They are cloned all the time too ;|

Proposal

We introduced cloning on "input" like setRanges() and output like getRanges() because we were worried that someone might change one position somewhere and affect a couple of ranges and selection classes somewhere else. Those issues are usually hard to track.

While I'm still sure that we should worry about such issues I think that once we made ranges and positions nearly immutable (which happened later AFAIR and is still not enforced in the entire code) we secured the stability pretty well. The point is that we can secure the code by either breaking references everywhere or making things completely immutable. We former means loooots of cloning, so let's thing about the latter.

There are just 4 things which one can break:

range.start, range.end,
position.root, position.path.

If we made these 4 things totally read-only, we would be safe even without the cloning. Right?

So, how to make them really read-only? With the first 3 of them it's simple – just make them getters and keep the values under private properties. The path is a bigger issue because it can be modified after retrieving it. Therefore, I think that it can be a getter too but it'd need to clone the array to break the reference. Keep in mind that it's better to clone just this array (if one specifically accessed position.path – the other getters and methods can access the private _path property) rather than clone all the ranges and positions all the time (which clone the path too).

It seems that with this plan we should be safe. The only thing I'm not entirely sure about is whether making those 4 public properties getters which return private properties will not have negative performance impact too. Or rather – how strong it will be (if it exceeds cloning overhead, it'd be a pointless change :D). Unfortunately, without really making this move we won't learn that.

The text was updated successfully, but these errors were encountered:

scofalik · 2017-03-30T06:55:28Z

It's good to keep this in mind, but since this is not causing performance issues yet, I am not sure whether this should be candidate:1.0.0. It doesn't look like a lot of refactor in the code base, but there are some places where we modify position.path manually (for better performance, of course :P) also we probably access those properties a lot in tests.

Other than that we have to keep in mind that we have both regular Ranges and Positions but also LiveRanges and LivePositions. Cloning is useful to "cast" one type to the other type, depending on which one we need. Also cloning is useful when a user would add one Range to two "things" and both of them would operate on same range. It's not only that we don't trust that user won't make range.end = new Position(), it's also preventing cases like this one.

Reinmar · 2017-03-30T07:30:09Z

Also cloning is useful when a user would add one Range to two "things" and both of them would operate on same range

But ranges are immutable (or should be) so you could use one range across the whole app and no one should be able to change it. So, there's no "operate on same range".

It's good to keep this in mind, but since this is not causing performance issues yet, I am not sure whether this should be candidate:1.0.0.

It's in 1.0.0 because if we change this too late we'll certainly cause a lot of headaches. This is transparent for the shape of the API, but can have severe impact on how the code behaves. And those issues are hard to track down.

As for the impact on performance – we just don't know yet. I'm not sure if it's seriously affecting the overall performance, but the point is that without comparison test we won't know. This is a thing which will be pretty much invisible in the CPU profile and may not be obvious in memory profile due to GC.

Reinmar · 2017-03-30T07:31:40Z

PS. by 1.0.0 I don't mean 1.0.0 alpha. We can certainly work on this later.

pjasiun · 2017-03-30T08:00:29Z

I think that we should avoid optimization before finding that this is a real problem. On the one hand, I agree that we do a lot of cloning, create new objects which later need to be removed by the garbage collection. On the other, position contains: 1 reference to root and one array of a few numbers. A range is an object with 2 positions. That's not much.

I remember I did research at the very beginning when we started working on the document model, back it times when we were thinking about the linear model, and the result was that creating new objects is not very costly, because browser handles it very well, including some smart deduplication.

It might be a performance problem, but it might be as important optimisation as thinking if we should do:

const editor = this.editor;
const doc = editor.doc;

or:

const doc = this.editor.doc;

It's also an additional memory usage.

What I mean is we should not change it as long we do not know it's a real performance problem.

Reinmar · 2017-04-24T16:19:52Z

It might be a performance problem, but it might be as important optimisation as thinking if we should do:

(...)

It's also an additional memory usage.

It's incomparable. Here you create a variable which is a ref to the same object. If you clone positions and ranges all the time you'd need an extremely smart algorithm to actually dedupe all this (so these are the same refs too).

I think that we should avoid optimization before finding that this is a real problem.

It's a conceptual problem and it's just a mess now. As I wrote in the very first sentence – I didn't identify a performance issue yet and I don't think you'd be ever able to identify that a certain slowness comes from this. Basically because as a conceptual problem it's scattered all around the code base and it's not just about how long code executes.

There was an article written by, I believe, one of the popular framework's authors. I can't find it now, but one of the things it recommended was to avoid creating a lot of short-lived objects (I think that it discussed passing new objects to every next function in your algorithm). Why is it a problem? I'd need to look for yet another article, this time about GC, which explained that there's a special space for those objects, but it's limited. Abusing it will slow down GC significantly and hence your app.

Now, imagine debugging this. That typing locks every now and then. That when some other things work in the background typing or collab editing jitters. This will be really hard to identify in the timeline.

To sum up, IMO, we should clean up this situation because:

it's about sanity – being able to reason about the performance and the code,
it's potentially backwards incompatible change so the sooner we do it the better (plus, if we ever change it it will be super hard to pinpoint when tracking its consequent bugs),
it's against good practices to flood memory with so many objects,
if any of you would notice the same issue there's a high chance that you'd propose fixing it too, but everyone feels obliged to reply "don't optimise prematurely".

jodator · 2017-10-20T06:47:18Z

OK I've fiddle a bit with Position and LivePosition so far to make them immutable. The code seems to work in manual tests but rewriting tests might be cumbersome so I'd like not to make it more then once ;)

Right now most of the tests fails because of direct modifications to position's parameters (like modifying path array).

To make Position immutable I'd have to:

make path and root getter only (LivePosition uses exported @Protected setter functions)
freeze path array by using Object.freeze( path ) in constructor
remove offset setter

Most of the code that uses return Position.createFromPosition( position ) can be safely change to return position as any modification attempt throw error.

Currently I see two options to make Position calculations - most of them are temporary values that are used to create new Position returned by some transformations, etc.

We can either create a PositionBuilder (this was my first implementation) or create MutablePosition - both of them will be used to calculate (modify) new position.

The only problem is with code that calculates new Positions from old one. Example code that modifies Position:

function getTransformed( positionToTransform ) {
    const position = Position.createFromPosition( positionToTransform );

    position.path.push( 7 );
    position.offset -= 7;
    position.path.pop();
    position.offset++;

    return position;
}

PositionBuilder example:

function getTransformed( positionToTransform ) {
    const posBuilder = PositionBuilder.fromPosition( positionToTransform );

    posBuilder.pusthToPath( 7 );
    posBuilder.setOffset( posBuilder.getOffset() -7 );
    posBuilder.popPath();
    posBuilder.addOneToOffset();

    return posBuilder.build();
}

And I don't like those popPath() methods since there are too many of them (at first I thought there will be just 3-4 of them but with refactoring Position there were many cases so I've kept adding. Much simpler would be to make it's API the same as below (MutablePosition) but I think that in code it might be not clear which one is used (Position or PositionBuilder) so creating those methods somehow makes that separation clear.

Pros:

easy to differantiate from Position
Cons:
more work while refactoring

MutablePositon example:

function getTransformed( positionToTransform ) {
    const mutablePosition = MutablePosition.createFromPosition( positionToTransform );

    mutablePosition.path.push( 7 );
    mutablePosition.offset -= 7;
    mutablePosition.path.pop();
    mutablePosition.offset++;

    return mutablePostion.toPosition();
}

With MutablePosition it will be safer to make it not a child of Position.
Pros:

less code to change
Cons:
might be confusing in code with Position

Actual question: Which one of the above is more acceptable? MutablePosition or PositionBuilder?
The same pattern will be used with other Foo.createFromFoo() patterns. This will create required encapsulation (outsider will not modify passed Position or Range and only Live* could modify their inner values.

scofalik · 2017-10-20T07:39:40Z

I think that PositionBuilder approach is better because it is much less confusing. I like the idea.

jodator · 2017-10-20T09:07:40Z

Example change in tests:

// was:
expected.oldRange.start.offset = 1;
// with builder:
expected.oldRange.start = PositionBuilder.fromPosition( expected.oldRange.start )
    .setOffset( 1 ).build();

Probably the part with expected.oldRange.start will be replaced also (I didn't refactored ranges yet)

pjasiun · 2017-10-20T09:58:42Z

I would choose option 3: add set of methods to Position which return modified positions. Then I will be able to do:

    position = position.getShiftedBy( -7 );
    position = position.getShiftedBy( 1 );

Also, I believe that position changes should make semantic sense. I mean position.path.pop(); is position.getPositionBeforeParentElement();. I think that one should do:

    position = Position.createBefore( postion.getParent() );

I believe it's much more readable then poping element from the path (note that it might be unclear if the new position after you pop number from the path is before or after parent). This is also why I'm not sure if there will be many methods to be added to the current API. I can not imagine any at the moment.

Also, note that if one really need to hack positions, for instance in the OT, he is always able to do:

   let path = position.path.slice( 0 ); //clone
   path.push( 42 ); // magic
   position = new Position( position.root, path );

Other: Make `Position` and `Range` immutable in model and view. Closes #897.

Reinmar · 2017-11-24T16:20:28Z

I did a quick check and this change reduced the performance instead of improving it. Since, besides being an API cleanup, this meant to improve performance, we failed.

I looked into the profiler a bit and I see long ArraySliceFallback calls and still a lot of new Position() calls:

This should be investigated because what we see here is a real performance issue.

For example, to avoid ArraySliceFallback, we could cache the path in Position.

Also, I'd be curious why it's called ArraySliceFallback. Especially, that SimpleSlice is also used there extensively.

Reinmar · 2017-11-24T17:00:23Z

I played with this a bit and added parent path memoization. This helped a bit, but only a bit.

What I noticed then is that slice() is called by maaany other methods too. E.g. when you do getShiftedBy() we use slice twice to clone the array initially in that method and later we clone it inside new Position().

So, I made another change, and passed optional offset to Position( a, b, optionalOffset ). That didn't end well:

This is before:

And after:

The loss on Position() is bigger than the gain on ArraySliceFallback and SimpleSlice.

It's definitely tricky. What is rather clear is that we simply still create millions of positions and optimising their creation code may not be further possible, so architectural changes may be needed.

Reinmar · 2017-11-24T17:04:47Z

PS. 5th undo step created itself 150k positions. 150k...

Reinmar · 2017-11-24T17:08:13Z

But it's still better than before :D 184k.

jacwright · 2018-04-05T18:04:04Z

I'm not sure if getters affect performance, but instead of getters to enforce immutability if you might use Object.freeze in development builds and remove it in production builds. If you are in 'strict' mode you'll see errors when you try to modify a frozen object (I believe). Then you can get the performance of pure property lookup in prod with the safety net in place in dev. (I'm poking my nose in, feel free to ignore it, or punch it)

Reinmar · 2018-04-05T18:51:19Z

Thanks for the tips! I think that we've considered Object.freeze() at some point. Although... maybe we've been thinking only about freezing Position#path.

However, ensuring immutability was only one part of the problem. The other one was that we're doing so many transformations of positions and ranges in algorithms for OT that calculating 5th undo step may generate 150k positions. This seems to consume a significant amount of time.

Our idea for reducing cloning while keeping positions immutable was to use special kind of "fast positions" inside the OT only. Like this:

Immutable position/range is passed to OT's method.
OT's method casts it to a mutable position for internal use. After all, it can't modify the input position anyway.
OT's method does its internal computations by trying to reuse as many mutable positions as possible. That can be done by simply modifying these positions instead of constantly creating new ones by methods like shiftBy().
Once everything is computed, cast the output to an immutable position and return it.

I hope that it would give us a significant boost, but only real tests will prove that it works.

Another option is to research some ideas in the functional programming patterns. I imagine there are ways to deal with huge numbers of short-lived objects. One idea (although, perhaps not from this field) was to use an object pool but I'd actually try mutable positions first as they should be simpler. Object pool would be more error-prone.

jacwright · 2018-04-05T20:08:06Z

Not sure if there are any nuggets we could pull from ProseMirror's positioning which is just a single numeric index. That is a fundamental difference in the API however, but maybe it could spark ideas.

Since their data is immutable they also reuse objects within a document. E.g. the "empty text node" is a single object in memory that will show up in all blocks without any text. Not sure if there are reusable positions or ranges in this case.

scofalik · 2018-04-06T09:37:57Z

I've looked at ProseMirror's indexing and it looks like an interesting idea, however, I am not sure if we would be able to make it work with our OT.

At the moment we use paths and we need paths because they give you direct and unambiguous information on where the change happened. This way we can compare two positions even if we don't have the out-dated/other client's document state. So we are able to resolve operations' conflicts without sending the document state.

If you have just a single number, you also need document state to "decode" it, and compare two positions. I don't remember exactly how's ProseMirror's conflict resolution works, but AFAIR it is based (at least partially) on document state?

jacwright · 2018-04-06T16:03:04Z

it is based (at least partially) on document state

Yes, I think you are right.

Reinmar · 2018-08-28T11:02:09Z

Related ticket: https://github.com/ckeditor/ckeditor5-engine/issues/1518.

jodator · 2020-04-03T08:34:33Z

Related ticket: #6541.

Another thing that might affect performance is that we do create a lot objects. Compare this performance report. After some code optimization in other place the biggest time hog is now Minor GC (Garbage Collector):

CKEditorBot · 2023-10-02T05:18:13Z

There has been no activity on this issue for the past year. We've marked it as stale and will close it in 30 days. We understand it may be relevant, so if you're interested in the solution, leave a comment or reaction under this issue.

CKEditorBot · 2023-11-03T05:13:45Z

We've closed your issue due to inactivity over the last year. We understand that the issue may still be relevant. If so, feel free to open a new one (and link this issue to it).

jodator self-assigned this Oct 19, 2017

scofalik closed this as completed in ckeditor/ckeditor5-engine#1175 Nov 24, 2017

scofalik referenced this issue in ckeditor/ckeditor5-engine Nov 24, 2017

Merge pull request #1175 from ckeditor/t/897

836dfd8

Other: Make `Position` and `Range` immutable in model and view. Closes #897.

Reinmar reopened this Nov 24, 2017

jodator removed their assignment May 22, 2019

mlewand transferred this issue from ckeditor/ckeditor5-engine Oct 9, 2019

mlewand added this to the backlog milestone Oct 9, 2019

mlewand added status:discussion type:improvement This issue reports a possible enhancement of an existing feature. package:engine labels Oct 9, 2019

jodator mentioned this issue Oct 11, 2019

T/897: Make Position/Range immutable. ckeditor/ckeditor5-engine#1175

Merged

jodator mentioned this issue Apr 3, 2020

Cache parent in Position #6541

Closed

jodator added the type:refactor This issue requires or describes a code refactoring. label Apr 3, 2020

pomek removed this from the backlog milestone Feb 21, 2022

CKEditorBot added the status:stale label Oct 2, 2023

CKEditorBot added the resolution:expired This issue was closed due to lack of feedback. label Nov 3, 2023

CKEditorBot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce memory pressure by stopping cloning all the time #4032

Reduce memory pressure by stopping cloning all the time #4032

Reinmar commented Mar 29, 2017

scofalik commented Mar 30, 2017

Reinmar commented Mar 30, 2017

Reinmar commented Mar 30, 2017

pjasiun commented Mar 30, 2017

Reinmar commented Apr 24, 2017

jodator commented Oct 20, 2017

scofalik commented Oct 20, 2017

jodator commented Oct 20, 2017

pjasiun commented Oct 20, 2017 •

edited

Loading

Reinmar commented Nov 24, 2017 •

edited

Loading

Reinmar commented Nov 24, 2017

Reinmar commented Nov 24, 2017

Reinmar commented Nov 24, 2017

jacwright commented Apr 5, 2018

Reinmar commented Apr 5, 2018 •

edited

Loading

jacwright commented Apr 5, 2018

scofalik commented Apr 6, 2018 •

edited

Loading

jacwright commented Apr 6, 2018

Reinmar commented Aug 28, 2018

jodator commented Apr 3, 2020

CKEditorBot commented Oct 2, 2023

CKEditorBot commented Nov 3, 2023

Reduce memory pressure by stopping cloning all the time #4032

Reduce memory pressure by stopping cloning all the time #4032

Comments

Reinmar commented Mar 29, 2017

Proposal

scofalik commented Mar 30, 2017

Reinmar commented Mar 30, 2017

Reinmar commented Mar 30, 2017

pjasiun commented Mar 30, 2017

Reinmar commented Apr 24, 2017

jodator commented Oct 20, 2017

scofalik commented Oct 20, 2017

jodator commented Oct 20, 2017

pjasiun commented Oct 20, 2017 • edited Loading

Reinmar commented Nov 24, 2017 • edited Loading

Reinmar commented Nov 24, 2017

Reinmar commented Nov 24, 2017

Reinmar commented Nov 24, 2017

jacwright commented Apr 5, 2018

Reinmar commented Apr 5, 2018 • edited Loading

jacwright commented Apr 5, 2018

scofalik commented Apr 6, 2018 • edited Loading

jacwright commented Apr 6, 2018

Reinmar commented Aug 28, 2018

jodator commented Apr 3, 2020

CKEditorBot commented Oct 2, 2023

CKEditorBot commented Nov 3, 2023

pjasiun commented Oct 20, 2017 •

edited

Loading

Reinmar commented Nov 24, 2017 •

edited

Loading

Reinmar commented Apr 5, 2018 •

edited

Loading

scofalik commented Apr 6, 2018 •

edited

Loading