FetchedRecordsController uses gigs of memory, my fault ? #263

Closed
skrew opened this issue Oct 12, 2017 · 14 comments

Comments


skrew commented Oct 12, 2017

Hi,

I have a problem using FetchedRecordsController. I have read the documentation, but I don't know whether I'm doing something wrong or whether there is a problem with large inserts.

When I insert a second batch of about 1000 items, it consumes a lot of memory (I killed it after 20 GB; on the simulator, of course ;p).

Please look at my sample code (it may need a pod install):

grdb1.zip

Thanks

Owner

groue commented Oct 13, 2017

Hello @skrew,

Thanks a lot for your sample project. It clearly demonstrates the very high memory consumption.

No, it's not your fault. When asked to produce detailed changes, FetchedRecordsController uses a Levenshtein diff algorithm which is known to have a high complexity of O(N*M), and to be memory hungry.

I'm thus not quite surprised that it would have difficulties producing detailed changes when there are many rows.
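
To make the O(N*M) cost concrete, here is a minimal sketch of the textbook dynamic-programming edit distance. It is not the code used by FetchedRecordsController; the function name `editDistance` and its return shape are made up for this illustration. It only shows that the algorithm fills a table with one cell per (old item, new item) pair.

```swift
/// Textbook edit distance between two arrays (illustration only, not GRDB code).
/// It fills an (old.count + 1) x (new.count + 1) table, which is where the
/// O(N*M) time and memory cost comes from.
func editDistance<T: Equatable>(_ old: [T], _ new: [T]) -> (distance: Int, cells: Int) {
    let rows = old.count + 1
    let cols = new.count + 1
    // table[i][j] = cost of turning the first i old items into the first j new items
    var table = Array(repeating: Array(repeating: 0, count: cols), count: rows)
    for i in 0..<rows { table[i][0] = i }   // delete every old item
    for j in 0..<cols { table[0][j] = j }   // insert every new item
    for i in 1..<rows {
        for j in 1..<cols {
            let substitution = table[i - 1][j - 1] + (old[i - 1] == new[j - 1] ? 0 : 1)
            let deletion = table[i - 1][j] + 1
            let insertion = table[i][j - 1] + 1
            table[i][j] = min(substitution, deletion, insertion)
        }
    }
    return (table[rows - 1][cols - 1], rows * cols)
}

let sample = editDistance(Array("kitten"), Array("sitting"))
// sample.distance is the number of edits, sample.cells the size of the table
```

The table size, not the number of actual changes, drives the cost: two long arrays that differ by a single row still require the full table in this naive form.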

So here is my advice: if your application doesn't need detailed changes, don't ask for them: don't provide the willChange and onChange callbacks:

trackController.trackChanges(
    didChange: { controller in
        ...
    })

When the willChange and onChange callbacks are not provided, FetchedRecordsController uses a much simpler change detection algorithm, which can easily deal with thousands of rows.

@groue groue added the question label Oct 13, 2017
Author

skrew commented Oct 13, 2017

Hi @groue

The problem is that I need to know which columns got updated, because I do fine-grained updates (e.g. if it's an image, I only reload the UIImageView; I don't update every item of the cell...)

But the problem is not so simple...
Sorry everyone, I have to write in French: my English is too bad and it's late!

[Translated from French] What I don't understand is that it works very well for a first batch, whatever the number of rows...
It's only during the second batch that memory usage runs away... This is shown in my sample.

Would there be a way to do some kind of "reset" between two transactions? What makes the first transaction work, but not the second?

Owner

groue commented Oct 13, 2017

[Translated from French] It's late here too 😉 See you tomorrow for a more detailed answer!

Author

skrew commented Oct 13, 2017

[Translated from French] No problem, I'll be going to bed soon too... I know we're in the same time zone! :) And I'm stuck anyway... So it's going to be hard to sleep, knowing that if there is no solution, I'll have to change quite a lot of things in the code! :)

Owner

groue commented Oct 13, 2017

Bonjour !

I'll switch back to English, if you don't mind.

You wonder why your test project easily computes a first diff, and has difficulties computing the second. The answer is that the first diff is computed from an empty array to an array of thousands (easy), while the second is computed from an array of thousands to an array of even more thousands (difficult). This is a consequence of the non-linear complexity of the Levenshtein diffing algorithm, whose cost grows with the product of the numbers of "old" and "new" rows.
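
As a rough worked example, assume batches of 1000 rows each (the issue only says "about 1000 items") and a naive table with one cell per pair of old and new rows. The helper `tableCells` is hypothetical and only does the arithmetic:

```swift
/// Number of cells in a full (old + 1) x (new + 1) diff table.
/// Hypothetical helper, used only to compare the two batches.
func tableCells(oldCount: Int, newCount: Int) -> Int {
    return (oldCount + 1) * (newCount + 1)
}

// First batch: from an empty array to 1000 rows (assumed sizes).
let firstDiff = tableCells(oldCount: 0, newCount: 1000)       // 1 * 1001 cells
// Second batch: from 1000 rows to 2000 rows.
let secondDiff = tableCells(oldCount: 1000, newCount: 2000)   // 1001 * 2001 cells

print(firstDiff, secondDiff, secondDiff / firstDiff)
```

Under these assumptions the second diff needs about two thousand times more cells than the first one, and any per-cell payload larger than a single integer multiplies the memory accordingly.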

I'm stuck

You're stuck. Let's clear things up.

The amount of work needed to make FetchedRecordsController able to produce large diffs is really big. It involves possibly finding a more efficient algorithm, if one exists, but also an implementation of this algorithm that performs reasonably well while keeping memory consumption under control. This most likely involves a disk cache of fetched results. And we'd also need to handle "back pressure", that is to say prevent frequent little transactions from bringing the application to its knees because it can't compute diffs fast enough. Maybe FetchedRecordsController will eventually get improved this way. But this requires a large amount of time.

If your company considers sponsoring this research task, I would gladly enter a regular business relationship. Not only you, but other GRDB users as well, would benefit from the FetchedRecordsController improvements. Contact me at [email protected] for more details. Such a task could not start before 2018, though.

Meanwhile, let's look at another way out of your issue:

FetchedRecordsController notifies of database changes. This general sentence can be split into fine-grained services:

  1. FetchedRecordsController tells that a collection has changed
  2. FetchedRecordsController can produce detailed insert/delete/update/move events, so that one can animate cells of a table or collection view.
  3. On update and move changes, FetchedRecordsController tells which columns have been modified.

You wrote:

The problem is that I need to know which columns got updated, because I do fine-grained updates (e.g. if it's an image, I only reload the UIImageView; I don't update every item of the cell...)

It thus looks like your app needs 1 and 3, but I'm not sure about 2.
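
Service 3 (knowing which columns changed) can be illustrated independently of GRDB. The sketch below is only an illustration: rows are modeled as plain dictionaries of strings, and `changedColumns` is a hypothetical helper, not a GRDB API.

```swift
/// Returns the names of the columns whose values differ between two versions
/// of the same row. A column present on one side only counts as changed.
/// Hypothetical helper: rows are plain dictionaries here, not GRDB records.
func changedColumns(old: [String: String], new: [String: String]) -> Set<String> {
    let allColumns = Set(old.keys).union(new.keys)
    return allColumns.filter { old[$0] != new[$0] }
}

let before = ["id": "1", "title": "Song", "imageURL": "a.png"]
let after = ["id": "1", "title": "Song", "imageURL": "b.png"]
let changes = changedColumns(old: before, new: after)

if changes.contains("imageURL") {
    // Only the image needs a refresh; the rest of the cell can stay as it is.
    print("reload image only")
}
```

With such a set of column names, a cell can refresh only the views bound to the modified columns.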

Let's suppose you can do without cell animations. In this case, you could simply call reloadData in the didChange callback. This is fast enough even for large collections. What happens then? Does your table/collection view flicker because some remote images do not reload smoothly? What about using an efficient memory-based image cache, then? There are many popular ones on GitHub.

Do you see what I'm aiming at? It's fixing the potential glitches of reloadData.

Now, if your app really really really needs row animations, then I suggest you have a look at general diff algorithms like tonyarnold/Differ. I've played a little with that one: it has a lower complexity than the Levenshtein algorithm, it produces correct table/collection view animations, but it won't provide the same fine-grained column changes produced by FetchedRecordsController. It may help you, though.

Now it's time for you to sort out what's really important for your app.

Owner

groue commented Oct 13, 2017

Note to myself: update the documentation of FetchedRecordsController with a clear warning about large diffs.

Owner

groue commented Oct 13, 2017

A last word about the alternative diff algorithm Differ. I've used it as a table view animator in the demo app for the upcoming Swift 4 version of RxGRDB, the reactive extensions to GRDB. This was part of my desire to build reactive table view animations based on GRDB. You may want to have a look at this demo app, because it works pretty well. But it is not as well packaged as FetchedRecordsController, and it only deals with deletions, insertions, and moves (updates are not recognized by Differ, and are exposed as a delete/insert pair instead). And I don't know how it behaves when fed with thousands of rows.

Author

skrew commented Oct 13, 2017

OK thanks, I've added a diff function for our needs; it works but is very specific.
Since we started the project with Realm, all the "reactive" code and change handling is based on Realm notifications.
BTW, 2018 is too far away for this project ;)

I think i can close.

@skrew skrew closed this as completed Oct 13, 2017
Owner

groue commented Oct 14, 2017

Hello @skrew

Since we started the project with Realm, all the "reactive" code and change handling is based on Realm notifications.

If you needed Realm-like notifications, then maybe our long discussion about FetchedRecordsController was not that useful. I can't provide good advice when the reality of the situation is not clearly laid out. Have a look at transaction observers one day.

Owner

groue commented Oct 14, 2017

@skrew Please don't hesitate to share your experience after you have found a working solution: it may well help other users, and maybe pave the way for future GRDB improvements as well. For this library to meet its users' needs, it's important that those needs are well known, you see? Your feedback will be welcome. Meanwhile, happy GRDB!

Author

skrew commented Oct 14, 2017

Hi @groue
For my needs, I use a mix of transaction observers and sortedMerge: since sortedMerge has both the left and right lists, I can compare changes and notify the targets so they update themselves.
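
A sorted merge of this kind can be sketched as follows. This is not the sample code referred to above: the `Item` type, the `MergeResult` struct and the `sortedMergeDiff` function are hypothetical names chosen for the illustration, and both arrays are assumed to be sorted by ascending, unique `id`.

```swift
struct Item: Equatable {
    var id: Int
    var title: String
}

struct MergeResult {
    var inserted: [Item] = []
    var deleted: [Item] = []
    var updated: [(old: Item, new: Item)] = []
}

/// Walks two arrays sorted by ascending, unique `id` in a single pass.
/// `left` is the previous state, `right` the new one.
func sortedMergeDiff(left: [Item], right: [Item]) -> MergeResult {
    var result = MergeResult()
    var i = 0
    var j = 0
    while i < left.count && j < right.count {
        let l = left[i]
        let r = right[j]
        if l.id < r.id {
            result.deleted.append(l)        // only in the old list
            i += 1
        } else if l.id > r.id {
            result.inserted.append(r)       // only in the new list
            j += 1
        } else {
            if l != r { result.updated.append((old: l, new: r)) }
            i += 1
            j += 1
        }
    }
    // Whatever remains on one side has no counterpart on the other.
    result.deleted.append(contentsOf: left[i..<left.count])
    result.inserted.append(contentsOf: right[j..<right.count])
    return result
}

let oldItems = [Item(id: 1, title: "a"), Item(id: 2, title: "b"), Item(id: 3, title: "c")]
let newItems = [Item(id: 2, title: "B"), Item(id: 3, title: "c"), Item(id: 4, title: "d")]
let diff = sortedMergeDiff(left: oldItems, right: newItems)
```

Each element is visited once, so the cost is linear in the total number of rows, at the price of requiring sorted input and of not detecting moves.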

Let me explain why I need this:

  • I parse folders and files and put each file in the DB -> notify about new and deleted files (I use TLIndexPathTools for table/collection sync).
  • I sync items with an external "cloud" service -> Need to update/notify UI items when needed.
  • I sync items with iCloud -> Need to update/notify UI items when needed.
  • These items have metadata that I update from servers -> Need to update/notify UI items when needed.
  • Each file can have a "read" state, a "progress" (...) -> Need to update/notify UI items when needed.
    And this is just a small part of the app...

That's why I need notifications everywhere. I don't just insert items in the DB and voilà. There are many updates afterwards, and I do this each time the user enters a folder.
That's why I don't want to do a "reloadData" each time: there are too many occasions where I would need to reload data.

Realm handles this perfectly, but Realm is just a nightmare with pinned transactions: the DB size can grow to death (the app crashes and you can't launch it anymore; you need to uninstall and reinstall).

GRDB provides many ways to do the same (or nearly the same), but I'm new to this lib and I learn every day (there are so many things to learn, the doc is huge! :p)...
BTW, from my experience, I can tell GRDB is way safer for big projects than Realm (and you have the power of the SQL query engine).

Owner

groue commented Oct 14, 2017

For my needs, I use a mix of transaction observers and sortedMerge: since sortedMerge has both the left and right lists, I can compare changes and notify the targets so they update themselves.

I'm glad you found this sample code, because the "sorted merge" algorithm is very efficient at what it does :-)

Let me explain why I need this:

I parse folders and files and put each file in the DB -> notify about new and deleted files (I use TLIndexPathTools for table/collection sync).
I sync items with an external "cloud" service -> Need to update/notify UI items when needed.
I sync items with iCloud -> Need to update/notify UI items when needed.
These items have metadata that I update from servers -> Need to update/notify UI items when needed.
Each file can have a "read" state, a "progress" (...) -> Need to update/notify UI items when needed.
And this is just a small part of the app...
That's why I need notifications everywhere. I don't just insert items in the DB and voilà. There are many updates afterwards, and I do this each time the user enters a folder.
That's why I don't want to do a "reloadData" each time: there are too many occasions where I would need to reload data.

That's a pretty complex app indeed! I better understand now.

I don't know how your application will evolve. But due to its very particular needs, I must tell you about SQLite pre-update hooks. Contributed by the very talented @swiftlyfalling, it basically is the most advanced database observation technique in GRDB. It extends TransactionObserver so that it not only notifies of each inserted and deleted row, but also the values of each changed column before and after a row update. As powerful as it is, this feature is also more involved: it requires a custom SQLite build.

Realm handles this perfectly, but Realm is just a nightmare with pinned transactions: the DB size can grow to death (the app crashes and you can't launch it anymore; you need to uninstall and reinstall).

I didn't know that :-/

GRDB provides many ways to do the same (or nearly the same), but I'm new to this lib and I learn every day (there are so many things to learn, the doc is huge! :p)...
BTW, from my experience, I can tell GRDB is way safer for big projects than Realm (and you have the power of the SQL query engine).

Thanks :-) I'm convinced that finding solutions with you is part of the job maintaining and improving GRDB. So please keep on opening issues with interesting challenges!

Author

skrew commented Oct 14, 2017

Interesting, I was misled by the name "pre-update": I took it for a way to modify values before they get updated (for debugging or other specific cases)!

I will test it now, thanks.

(You have a link returning a 404 error in the doc: http://www.sqlite.org/sessions/c3ref/preupdate_count.html)

Owner

groue commented Oct 14, 2017

Fixed, thanks!
