-
-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Songs/Artists/Albums query performances by ~50 #348
Improve Songs/Artists/Albums query performances by ~50 #348
Conversation
6c4e32a
to
160d897
Compare
Also, I think a lot of requests can beneficiate from this improvement. If everything looks fine I might try to improve the others. |
I just realised that the request is sorted by listening count, not total duration, so we can directly limit/offset at start, which would make the request instantaneous Edit it seems to work (0,2s), but the total_count is not calculated correctly, so I reverted to the 1,2s version |
getBestSongsNbOffseted
performances by ~10getBestSongsNbOffseted
performances by ~60
e5fb3b2
to
160d897
Compare
getBestSongsNbOffseted
performances by ~60getBestSongsNbOffseted
performances by ~10
Hello! Thanks a lot for what you've done. It seems good to me. The facet is a bit weird since I'm not used to it. I've tried working on #162 at some point but did not find any better implementation back then. Again, thanks a lot. Don't hesitate if you have any questions, and keep in touch when you feel like you're done with this :) |
Hi!
They seem to have the exact same result in my case! However, it may be good to test again before merging with other data! I'll try to improve the PR in the next days to cover more improvements 👍 |
server/src/database/queries/stats.ts
Outdated
@@ -560,45 +560,58 @@ export const getBestSongsNbOffseted = ( | |||
) => | |||
InfosModel.aggregate([ | |||
{ $match: basicMatch(user._id, start, end) }, | |||
{ | |||
$project: { ...getGroupByDateProjection(user.settings.timezone), id: 1 }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is there a usage for this line that I removed? I'm not sure where it can be useful here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I don't know either. Must be a line I forgot to remove one day. I think it's safe to remove it
Thanks a lot |
With the same technique, 5669865 makes the /top/artists request faster, from 12.1s to 1.3s For this request however:
|
And to conclude, /top/albums from 12s to 1.4s 🚀 |
Should be ready for review! 🚀 |
getBestSongsNbOffseted
performances by ~10
Hello @quentinguidee, congrats for the optimization it's pretty impressive 👏 |
@Yooooomi I have a MongoDB question. I think that almost every requests in this project could be instantaneous if we could have |
I can write a migration that adds these fields in the info model. |
Thanks! That would be helpful. Also this might open the path for more precise total duration when storing each listening duration separately |
The Spotify API does allow to get the specific listening duration for the recently played tracks. |
Yeah unfortunately, but this might be approximated with the time elapsed between two entries in the history? And by default set it to the track duration? Also, iirc the spotify data when downloaded include the duration of each listening |
Spotify history is sometimes not precise, relying on timestamps to establish listening duration might be dangerous. |
It's pushed to release/1.8.0 |
@Yooooomi It seems that the migration does not trigger, is that because the timestamp of the migration is wrong? (probably the duplication of vscode which incremented the timestamp by one). I tried to npm migrate manually but it says its migrated (but the infos don't have the new fields) |
Oh the migration was marked as done in the collection "migrations", but was not done for a reason I don't understand. I did a mongorestore and restarted, and it worked |
Yeah that works really well, I achieve a request of ~300ms instead of 12.1s with the new fields 🚀 And the requests are also simpler in the code (we can extract almost everything to a new function) InfosModel.aggregate([
{ $match: basicMatch(user._id, start, end) },
{
$group: {
_id: "$id",
duration_ms: { $sum: "$durationMs" },
count: { $sum: 1 },
},
},
{
$facet: {
infos: [
{ $sort: { count: -1, _id: 1 } },
{ $skip: offset },
{ $limit: nb },
],
computations: [
{
$group: {
_id: null,
total_duration_ms: { $sum: "$duration_ms" },
total_count: { $sum: "$count" },
},
},
],
},
},
{ $unwind: "$infos" },
{ $unwind: "$computations" },
{
$project: {
_id: "$infos._id",
result: {
$mergeObjects: ["$infos", "$computations"],
},
},
},
{
$replaceRoot: {
newRoot: {
$mergeObjects: ["$result", { _id: "$_id" }],
},
},
},
{ $lookup: lightTrackLookupPipeline("_id") },
{ $unwind: "$track" },
{ $lookup: lightAlbumLookupPipeline() },
{ $unwind: "$album" },
{ $lookup: lightArtistLookupPipeline() },
{ $unwind: "$artist" },
]); The |
Nice job! I don't remember why we compute total_duration_ms. This feels weird when we just take a subpart of the whole thing. |
I feel like we could do this in a request on the side. So that this request does not have to compute this for every page. |
I just pushed a new file architecture to the branch so that it follows modern monorepo architecture. You might encounter issues. I would suggest you merge the 1.8.0 to your branch to see if there are any merge conflicts. |
73148fc
to
cfe4995
Compare
Signed-off-by: Quentin Guidée <[email protected]>
cfe4995
to
12fec6d
Compare
Thanks! Everything looks fine, the 3 requests are almost instantaneous now. I have also refactor the 3 to use the same base request. I think the "differents" field and some lookup can be removed depending on the request (tracks/artists/albums) but that doesn't change the time so clean code is better imo for now while we recreate the requests |
I'll keep this PR restricted to these 3 methods for now so we can iterate faster, so this one is ready for review 🧪 |
Hello, this seems fine to me. Does it change the signature fontend? It seems the objects are not looked up anymore so the frontend cannot display information am I right? |
OK OK OK this is BLAZING fast. Got from 20.48s down to 0.81s on M1 Pro, I'll update this with numbers from intel i7. Infos mongo collection is 10 times bigger though but still only 41MB (32MB of indexes). Huge work thanks a lot. Can't wait to see how it can be applied to other requests. Many users will thank those performance improvements. |
Yeah that changes entirely the feeling of the app! The tradeoff with document databases compared to relational db is that join requests are really bad and should be avoided as much as possible. Without that the performances are extremely good! I'll work on the other queries really soon, I have a lot of free time at the moment. For the 30MB of indexes yeah that was predictable, but I think it is nothing compared to the space it took in RAM during the giant request (but I don't have benchmarks for that). |
While requests are performant for many use cases, I have around 225000 listened songs (23000 different) with Your Spotify hosted on a Raspberry PI, so navigate on the client takes a lot of time, especially with the "Last year" or "All" timeframe.
This PR improves performances of
getBestSongsNbOffseted
,getBestArtistsNbOffseted
andgetBestAlbumsNbOffseted
by a factor of ~10. My approach was to group allInfos
of the same track at start. This avoids to compute many times the sameInfo
, which helps a lot for thelightTrackLookupPipeline
.Benchmark
done on mac m1, not on the Raspberry
A request for the top songs with date range "All" before this PR takes in my case
11,8s
:After this PR, this time drops to
1,4s
, which makes the page loads almost instantly for the biggest request.Update: See below for another performance boost (#348 (comment))
For reviewer
$project: { ...getGroupByDateProjection(user.settings.timezone), id: 1 }
step. I'm not sure what it does exactly. However, we can add it back if necessary.Related
#232, #162