-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"All Websites dashboard" should load fast even when tracking hundreds of websites! MultiSites #1077
Comments
Hi, is there any update on this? |
+1 vote for this issue |
1 similar comment
+1 vote for this issue |
Other bug report: I am tracking 22 sites with Piwik. Two have titles that start with A (we'll call them A1 and A2), and the other 20 have titles starting with C-Z. The main MultiSites screen is a bit funky. The initial view shows sites C-Z, in descending alphabetical order. Clicking on Next shows A1 and A2, in alpha order. These should be shown at the top of the other screen, while the last 2 should be on this second screen. Also, the text at the bottom reads "1-21 of 22", when in fact it should read "1-20 of 22" |
[1872] fixes sorting and other small UI issues |
Doable in one query, we can go from 1,000 websites= 6,000 SELECT to one SELECT which would greatly improve performance. This is one of the last performance blockers for 1.0 with #409 |
See also #1315 |
Also it would be useful to have a global.ini.php setting controlling how many websites to show in the UI by default (so Super Users can decide to show 1000 if they want to) |
|
Feature requests from forum post
|
Hello all, I'm interested in fixing this bug. I think I've worked out a way to fix it, but I'm not completely familiar with the code base, so I wanted to run my idea by you guys before I created a patch. I looked at the queries being made when the 'All Websites' page is loaded and found that the large number of queries was due to the Array Archive types looping over an array of other Archives. I thought a possible solution would be a new archive type (called Piwik_Archive_Cached) that cached the archive IDs that result when prepareArchive() is called in a BLOB archive record. This would reduce the number of queries made from two per site per API call to two queries per api call, but it wouldn't make the page any faster when archiving is triggered through the browser. Does this sound like an acceptable approach? |
capedfuzz, in your proposal, will you run a query such as WHERE idarchive IN (.........) ? This could be a problem, when there are for example, 5,000 websites, the WHERE idarchive IN ... will be rather slow. But, it is worth testing anyway such patch, as it might still improve performance? In any case, it is normal that the page will work nicely only when automatic archiving is setup (ie. browser archiving is disabled), so this limitation is definitely OK. looking forward to seeing your patch or ideas about this! |
I planned on using IN (...), but I might be able to optimize the query... I'll create a prototype and see what happens. |
A quick update, wanted to let you know I'm still working on this. I had to go a different route than what I proposed: instead of storing archive IDs I select all the IDs in one go. From my simple tests (which didn't include loading sparklines), this sped up the page from ~46 secs to ~17 secs when loading data for 800 websites. I still have a fair amount of work to do, but hopefully I'll have a patch posted soon. |
Actually, sorry, that's ~24 to 17secs. Forgot to disable archiving when testing the normal version. So not quite as monumental a speed up :). |
capedfuzz, any update on your patch? Any improvement is definitely welcome for the next release :) thx |
Yes, I've got an update. I ended up adding some more improvements and got the page to load in < 4 secs for a sufficiently small amount of sites per page. However, to do that I had to make it possible to do the sorting/pagination logic server-side. So the patch is a bit big. I should be able to submit it some time tomorrow (EST), but in the mean time, here's a list of the big changes I made:
The page will still load slowly when looking for a custom range that initiates archiving. |
Attachment: Patch for this issue. |
My patch is attached. It will speed things up greatly when there aren't that many sites shown, and speed things up a bit when a lot of sites are shown. Some notes:
|
Attachment: Patch for issue #1077. Optimizes the IndexedBySite archive type. Speeds up All Websites page by 12 seconds when Piwik tracks 800 websites. |
I've created some optimizations that speed up the All Websites page by ~12 seconds when tracking 800 websites. I had to make changes to the Piwik_ArchiveProcessing class, but I don't have detailed knowledge how this class & its descendants work. Would anyone mind taking a quick look at my changes to make sure there are no potential issues w/ it (I've uploaded a patch)? Thanks in advance. |
capedfuzz, looks like an excellent patch! I haven't tried applying it, but will definitely test performance before and after you have committed. Feedback
I don't see any potential issues otherwise, please commit when you manage to add the test. if you need any help let me know :) How long does the page take before and after? what % improvement? |
(In [5475]) Refs #1077. Improved speed of the IndexedBySite archive type by optimizing the case when launching the archiving process is disabled. Instead of selecting archive IDs one at a time, IndexedBySite will now select them all at once. Notes:
|
Attachment: A quick benchmark used to test how long it takes to retrieve the All Websites HTML. Written in CoffeeScript & uses node.js. |
I committed the change & there's a new integration test. The times I record when I test aren't constant, but the most conservative view of the speed improvement is from 14s to 6s (~57%). I've also uploaded the script I've used to test the times. |
Very nice commit and ideas. New integration test good. I see from 17s to 7s improvements on my 600 websites test setup... Very nice!!! Great work. When I enable profiler and debug logging, I see that now there is this executed for each site:
I guess this could be detected upstream and select * instead, which would probably bring another 10% improvement.
This one is also interesting. Executed 4 times, because of: Piwik_MultiSites_Controller->getSitesInfo() called at [D:\piwik\svn\trunk\plugins\MultiSites\Controller.php:37]
|
(In [5489]) Refs #1077 Pre-load all websites when user is super user only. Saves hundreds of select * from piwik_site where idsite = ? |
(In [5490]) Refs #1077 Missing code preventing multiple executions |
Replying to matt:
Didn't know there was a profiler built into Piwik, thanks for pointing it out!
I think you could get rid of these by allowing Piwik_Archive creation to be separate from the querying. I'll upload a patch to show you what I mean. (This might result in a speed up when archiving is enabled, too...) Though from what I can tell the remaining SQL isn't taking too much time to execute. I'm going to try and whittle the time down a couple more seconds. If I can't I'll post a comment. |
Attachment: Patch for issue #1077. Several smaller optimizations included. Made for rev 5492. |
I uploaded another patch with a bunch of smaller fixes. This patch speeds All Sites up for me from 5.2s to 2.2s (w/ 800 websites stored). Here's a list of the changes:
I can do one more optimization that should reduce the time to ~1.5s. When the website count goes into the thousands, the speed up will be more apparent. But, it will require an added method to the VisitsSummary & Goals APIs. I noticed that the only remaining thing that takes too much time is the creation of the IndexedBySite archive. The Archive::build method instantiates an Archive, Site, Period and Date instance and IndexedBySite will invoke it for every site. The first time its run it takes ~1s on my setup. I want to add getForArchive($archive) methods to VisitsSummary & Goals which take an already built archive. Do you think there would be a problem w/ this? Also, I can rewrite part of the MultiSites controller and add an API class w/ a getSitesInfo method. This way it can have some integration tests. |
Great!!
Here the call to setSite() is not done anymore - is this call to setSite() not used/useless?
After all other changes (and calling 1 API for today, 1 for yesterday), I'm not sure if this one would be required still. Please feel free to commit! great stuff! in fact this ticket can be closed after this, since the page will render for most users now. |
Replying to matt:
Archive::build will set the site on a Single archive, so the call in IndexedBySite is unnecessary. It did have an effect on the IndexedByDate class, but I added a call to setSite to IndexedByDate's constructor.
The issue I see w/ the super user improvement is that its in the Archive::build method. So any API request that checks for 'all' sites may result in two queries (if the original API method optimizes for non-super user cases). If I move the optimization to the All Sites plugin, this wouldn't occur, but the optimization would have to be explicitly added to other parts of Piwik that get data for 'all' sites. I'm going to leave it in when I commit, though.
No, it wouldn't. After I thought about it some more I realized you could just go through the data table and forget about the other APIs :).
I'd still like to refactor the Multi Sites plugin, though. Do you mind if I create a new ticket: 'Refactor Multi Sites plugin to follow Piwik plugin conventions'? |
I think the only other parts is the "Manage Websites" page in the Settings, for the Super User. Otherwise no other page should select all websites information I think.
Creating a new ticket sounds good, since this one focuses on performance which is now mostly fixed :) |
(In [5505]) Fixes #1077, Assorted optimizations for the Multi Sites plugin:
|
(In [5523]) Refs #1077 fix typo |
Currently, the multi sites dashboard page render takes 6 API requests (ie. 6 mysql SELECTs) for each website displayed. If a Piwik install has 1000 registered websites, the page load will simply fail, because the php page would try and execute 6,000 queries which is not a good idea.
Instead, only one request should be made to select the 6 values necessary for each website. This query would therefore select 6,000 values at once which would be as fast as we can get with that many values.
This query would join the site, access and archive_* tables. See Piwik_Access::reloadAccess() and core/Archive/Array/IndexedBySite.php for the current behavior. I think the solution might be to have a new class in core/Archive/Array indexed by user?
The text was updated successfully, but these errors were encountered: