Reduce network request gaps when loading tiles #746
Preliminary exploration is encouraging... I have a branch that isolates the "content fetch" part of tile loading and dispatches all of that work together, as tightly as possible. Testing with the Google Tiles test "LocaleChrysler" yields about a 15% reduction in total load time. Not all coding is finished, so there might be more gains when all is done. https://github.com/CesiumGS/cesium-native/tree/network-work-refactor
From very, very quickly skimming over the PR, it looks like it might be related to some discussion in an older issue. Maybe that impression is wrong, but that linked comment specifically referred to the "promise chain" that is involved here. Applied to my understanding of what is addressed in the PR, it looks like the "promise chain" that was quoted in the linked comment is broken into something like "network queue → processing queue".
Or in a less pseudocode-y way: There is one queue for "network tasks" and one for "processing tasks". Each of them is worked off by a worker thread pool. Whenever a "network task" is done, the resulting data is thrown into the list of "processing tasks". Both worker pools are busy when there is something to do, and idle when not. If this is roughly correct, then I'll sneak a 👍 in here. Some of the diagrams above might be a bit misleading: These orange 'Network Fetch' blocks suggest that the workers are 'busy', but they actually are not. Most of the time, they are doing nothing except for waiting for a network response. So ... it shouldn't really matter whether the number of "networkWorkers" is large or small.
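For what it's worth, here is a minimal, self-contained C++ sketch of that two-queue idea. Everything in it is made up for illustration (the class and function names are not from cesium-native), and a sleep stands in for the non-blocking network wait:

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical fixed-size worker pool draining a thread-safe task queue.
class WorkerPool {
public:
  explicit WorkerPool(int threadCount) {
    for (int i = 0; i < threadCount; ++i)
      workers.emplace_back([this] { this->run(); });
  }

  ~WorkerPool() {
    {
      std::lock_guard<std::mutex> lock(mutex);
      done = true;
    }
    condition.notify_all();
    for (std::thread& t : workers) t.join();
  }

  void push(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex);
      tasks.push(std::move(task));
    }
    condition.notify_one();
  }

private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex);
        condition.wait(lock, [this] { return done || !tasks.empty(); });
        if (done && tasks.empty()) return; // exit only once drained
        task = std::move(tasks.front());
        tasks.pop();
      }
      task(); // run outside the lock so other workers can keep pulling work
    }
  }

  std::queue<std::function<void()>> tasks;
  std::mutex mutex;
  std::condition_variable condition;
  std::vector<std::thread> workers;
  bool done = false;
};

int main() {
  // Declared first so it outlives the network pool that feeds it.
  WorkerPool processingWorkers(2); // CPU-bound decode/parse work
  WorkerPool networkWorkers(4);    // mostly idle, waiting on responses

  for (int tile = 0; tile < 8; ++tile) {
    networkWorkers.push([tile, &processingWorkers] {
      // "Network fetch": the thread just waits; a sleep stands in for I/O.
      std::this_thread::sleep_for(std::chrono::milliseconds(50));
      // Hand the finished download straight to the processing queue.
      processingWorkers.push(
          [tile] { std::cout << "processed tile " << tile << "\n"; });
    });
  }
  // Pool destructors drain their queues and join their threads.
}
```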
Correct! And please do. You're right, that diagram can be misleading. Really, the intent was to show the missed opportunity where a network request could be in flight, but was not. As far as actual CPU efficiency goes, yes, a network request shouldn't do much work at all, much less create its own thread. In Unreal Engine, all requests get queued up and are polled by one thread anyway; the callers are just waiting for completion events. I'm likely thinking about the problem in a similar way to your linked discussion...
Most of the work in this PR is separating the network fetch from the processing that follows it. Basically, the code is... Suffice to say, even if this PR seems like a great idea and gets merged, there's still more work we can do here.
When instrumenting cesium-native code, I recently discovered some opportunities to improve network performance by reducing some apparent "gaps" when loading tiles.
Background
Below is a simplified diagram of how a tile is loaded. A worker thread fetches the data needed for the tile, then processes it into a form that is usable by the native runtime that needs it (e.g. Unreal).
![Load Gap - Diagram 1](https://private-user-images.githubusercontent.com/130494071/279137023-40d32c96-4b67-4997-ba1d-9b8a6bbb15b8.PNG?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2NzQ3MzEsIm5iZiI6MTczOTY3NDQzMSwicGF0aCI6Ii8xMzA0OTQwNzEvMjc5MTM3MDIzLTQwZDMyYzk2LTRiNjctNDk5Ny1iYTFkLTliOGE2YmJiMTViOC5QTkc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE2JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxNlQwMjUzNTFaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1iZWZhZDQ0NzMzYmIxNTVjODcxODE2ZWQwODFkMWE0N2RjOTAwMWU1MjU4NDQ4NmUwMTFiYmY5YjgwOTMxZWJmJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.ntL6vanznGTSzty2G3bfWezCRyhAyNy8NZokkO47meM)
We do this across multiple workers, in parallel, to achieve faster load times (configured with `maximumSimultaneousTileLoads`). Here is an example of what multiple workers loading tiles could look like...
![Load Gap - Diagram 2](https://private-user-images.githubusercontent.com/130494071/279137218-978e7be6-3fff-4885-ac36-90b71d071221.PNG?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2NzQ3MzEsIm5iZiI6MTczOTY3NDQzMSwicGF0aCI6Ii8xMzA0OTQwNzEvMjc5MTM3MjE4LTk3OGU3YmU2LTNmZmYtNDg4NS1hYzM2LTkwYjcxZDA3MTIyMS5QTkc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE2JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxNlQwMjUzNTFaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1hMmUxYmZkMGY0YTY4NWVhZTBhMjc4ZjNhOTBmNWIwNWRmMWM4NWVhYTRjYWQ3OGY2YzFhYmUxMTcyMTM5MGVkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.HE37nv1Xnv1wUVQwj2NRx0qbkJ2kZRfnNaB0fkl-ndI)
While the workers are effectively busy 100% of the time, you may notice a gap between when a worker finishes downloading a tile and when it starts downloading the next one.
![Load Gap - Diagram 3](https://private-user-images.githubusercontent.com/130494071/279137305-c4a543cc-9641-4d6e-934f-546d9a1de108.PNG?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2NzQ3MzEsIm5iZiI6MTczOTY3NDQzMSwicGF0aCI6Ii8xMzA0OTQwNzEvMjc5MTM3MzA1LWM0YTU0M2NjLTk2NDEtNGQ2ZS05MzRmLTU0NmQ5YTFkZTEwOC5QTkc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE2JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxNlQwMjUzNTFaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT04NTAyZGYyYzMyMjc0MDM3MDk3ODQ4ZGYzZmQ4ODAxNzM1YTNkOWZjMTc1YmVlYWExOWQ5MDk4ZjE4NTU0MjZkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.h99I3uKmEEk0-jw0UC08uO5IkYMKFz0Apzj5u3nZ1n0)
Even though parallel fetches can help close gaps in network utilization, it is still possible for the network to be underutilized.
In the previous example, we configured 4 workers. You might expect that 4 network requests would always be in flight, but that's not the case. Notice the period of inactivity in the middle of the load.
![Load Gap - Diagram 4](https://private-user-images.githubusercontent.com/130494071/279137522-8b10300e-6e27-4f19-adde-0087fd9f458a.PNG?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2NzQ3MzEsIm5iZiI6MTczOTY3NDQzMSwicGF0aCI6Ii8xMzA0OTQwNzEvMjc5MTM3NTIyLThiMTAzMDBlLTZlMjctNGYxOS1hZGRlLTAwODdmZDlmNDU4YS5QTkc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE2JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxNlQwMjUzNTFaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT05YzA5YTNjMTBmNzRkYmZkYzZlNzQyMTcwY2E3NjRkNDg3ZmE2NzM5NWQ1MTliNDBkODUxMjNjNmRkOTg5ZjM2JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.-2qiNjwL-zEKadcBkQDOmJT4fbyehtdKTrIAOayD9Ng)
Ideally, we would batch the network requests as tightly as possible, to maximize network throughput.
Here is an alternate scheme where network requests are batched together as tightly as possible, with the processing work queued to different threads.
Notice the period of network inactivity is gone and all workers are fetching for a longer, more contiguous block of time. Also, processing work is more densely packed among the tile workers, which may open more chances for memory cache hits or batching optimizations.
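To make the key change concrete, here is a rough C++20 sketch of the throttling idea only. All names are hypothetical, and the one-thread-per-tile layout is just to keep the sketch short; the point is that the load slot is released as soon as the download finishes, so the limit bounds parallel network requests rather than the whole fetch-plus-process pipeline:

```cpp
#include <chrono>
#include <semaphore>
#include <thread>
#include <vector>

// Hypothetical throttle: at most 4 network requests in flight at once,
// standing in for maximumSimultaneousTileLoads.
std::counting_semaphore<64> fetchSlots(4);

void simulateFetch() {      // network-bound: mostly waiting
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
}
void simulateProcessing() { // CPU-bound: decode, parse, upload prep
  std::this_thread::sleep_for(std::chrono::milliseconds(30));
}

void loadTile() {
  fetchSlots.acquire();  // wait for a free network slot
  simulateFetch();       // the only stage the throttle applies to
  fetchSlots.release();  // free the slot *before* processing starts,
                         // so the next request can go out immediately
  simulateProcessing();  // overlaps with other tiles' fetches
}

int main() {
  // One thread per tile keeps the sketch short; a real scheduler would
  // reuse worker threads instead.
  std::vector<std::thread> tiles;
  for (int i = 0; i < 16; ++i) tiles.emplace_back(loadTile);
  for (std::thread& t : tiles) t.join();
}
```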
Proposed work
- `Tileset::_processWorkerThreadLoadQueue`: this is where all potential tile work is known and where parallel work is throttled with `maximumSimultaneousTileLoads`.
- `TilesetContentManager::loadTileContent`: separate the network fetch (`CachingAssetAccessor::get`) from the data post-processing work.
- `maximumSimultaneousTileLoads`: use it to configure our maximum parallel network fetches.

Benefits
- `maximumSimultaneousTileLoads` keeps its existing meaning and now corresponds directly to parallel network requests.
Reference
This work hints at moving parts of our code towards a more "Data Parallel" perspective, where parts of our tile loading can continue to be broken down into small parallelizable tasks, with an emphasis on batching and throughput.
This ticket is very similar, with more ideas related to short- vs. long-running tasks: #473
Here is data from the original investigation showing potential gaps in a Google 3D Tiles test (Chrysler Building, 828 tiles):
![Load Gap Analysis - Chrysler Release](https://private-user-images.githubusercontent.com/130494071/279139749-66aa4ecd-94b9-4d65-b948-943ff184dbdc.PNG?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2NzQ3MzEsIm5iZiI6MTczOTY3NDQzMSwicGF0aCI6Ii8xMzA0OTQwNzEvMjc5MTM5NzQ5LTY2YWE0ZWNkLTk0YjktNGQ2NS1iOTQ4LTk0M2ZmMTg0ZGJkYy5QTkc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE2JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxNlQwMjUzNTFaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT05MGY1ZGI4OWZkNTU5MzIwYzAwODQwZjJkNDg2Y2Y1YzAxNDRlZWI5OTZmNzViNDRmMWQxNGZkOGI2MGRhMmQ0JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.Yhti9Dn97m-jN01hEvBnstMjYzyc_fGSUaVRCUi_TfU)
The highlighted row shows a tile that took 228 ms to complete, with a 26 ms gap (`gapUsecs`) where it was not fetching data from the network.
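As an aside, a per-tile gap metric like `gapUsecs` can be derived from request timestamps. Here is a rough sketch of that bookkeeping; the struct and field names are hypothetical and this is not the actual instrumentation code:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One network request made while loading a tile.
struct RequestSpan {
  int64_t startUsecs; // request issued
  int64_t endUsecs;   // response received
};

// Total microseconds between tile-load start and end that are not covered
// by any request span, i.e. time when nothing was being fetched.
int64_t computeGapUsecs(int64_t loadStartUsecs, int64_t loadEndUsecs,
                        std::vector<RequestSpan> spans) {
  std::sort(spans.begin(), spans.end(),
            [](const RequestSpan& a, const RequestSpan& b) {
              return a.startUsecs < b.startUsecs;
            });
  int64_t covered = 0;
  int64_t cursor = loadStartUsecs; // end of the coverage seen so far
  for (const RequestSpan& s : spans) {
    int64_t begin = std::max(cursor, s.startUsecs);
    int64_t end = std::min(loadEndUsecs, s.endUsecs);
    if (end > begin) covered += end - begin; // count only new coverage
    cursor = std::max(cursor, s.endUsecs);
  }
  return (loadEndUsecs - loadStartUsecs) - covered;
}
```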