
Rework image & texture management to use concurrent message queues. #9486

Merged

Conversation

@chinmaygarde (Member) commented Jun 25, 2019

This patch reworks image decompression and collection in the following ways
because of misbehavior in the described edge cases.

The current flow for realizing a texture on the GPU from a blob of compressed
bytes is to first pass it to the IO thread for image decompression and then
upload to the GPU. The handle to the texture on the GPU is then passed back to
the UI thread so that it can be included in subsequent layer trees for
rendering. The GPU contexts on the Render & IO threads are in the same
sharegroup, so the texture ends up being visible to the Render Thread context
during rendering. This works fine and does not block the UI thread. All
references to the image are owned on the UI thread by Dart objects. When the
final reference to the image is dropped, the texture cannot be collected on the
UI thread (because it has no GPU context). Instead, it must be passed to either
the GPU or IO threads. The GPU thread is usually in the middle of a frame
workload, so we redirect the collection to the IO thread. While texture
collections are usually (comparatively) fast, texture decompression and upload
are slow (on the order of frame intervals).

For applications that end up creating (but not necessarily using) numerous
large textures in straight-line execution, texture collection tasks can be left
pending on the IO task runner until all the image decompressions (and uploads)
are done. Put simply, the collection of the first image could be waiting for
the decompression and upload of the last image in the queue.

This is exacerbated by two other hacks added to work around unrelated issues.

  • First, creating a codec with a single image frame immediately kicks off
    decompression and upload of that frame image (even if the frame was never
    requested from the codec). This hack was added because we wanted to get rid
    of the compressed image allocation ASAP. The expectation was that codecs
    would only be created for the sole purpose of getting the decompressed
    image bytes. However, for applications that only create codecs to get image
    sizes (but never actually decompress them), we would end up replacing the
    compressed image allocation with a larger allocation (device resident, no
    less) for no obvious use. This issue is particularly insidious when you
    consider that the codec is usually asked for the native image size first,
    before the frame is requested at a smaller size (usually using a new codec
    with the same data but a new target size). This would cause the creation of
    a whole extra texture (at 1:1) when the caller was trying to “optimize” for
    memory use by requesting a texture of a smaller size.
  • Second, all image collections were delayed by the unref queue by 250ms
    because of observations that the calling thread (the UI thread) was being
    descheduled unnecessarily when a task with a timeout of zero was posted
    from it (recall that a task has to be posted to the IO thread for the
    collection of that texture). 250ms is multiple frame intervals worth of
    potentially unnecessary textures.

The net result of these issues is that we may end up creating textures when all
the application needs is to ask its codec for details about the image (but not
necessarily access its bytes). Texture collection can also be delayed behind
other jobs decompressing textures on the IO thread. And all texture collections
are delayed for an arbitrary amount of time.

These issues cause applications to be susceptible to OOM situations. These
situations manifest in various ways. Host memory exhaustion causes the usual OOM
issues. Device memory exhaustion seems to manifest in different ways on iOS and
Android. On Android, allocation of a new texture seems to be causing an
assertion (in the driver). On iOS, the call hangs (presumably waiting for
another thread to release textures which we won’t do because those tasks are
blocked behind the current task completing).

To address peak memory usage, the following changes have been made:

  • Image decompression and upload/collection no longer happen on the same
    thread. All image decompression is now handled on a workqueue. The number
    of worker threads in this workqueue is equal to the number of processors on
    the device. These threads have a lower priority than either the UI or
    Render threads. These workers are shared between all Flutter applications
    in the process.
  • Both the images and their codecs now report the correct allocation size to
    Dart for GC purposes. The Dart VM uses this to pick objects for collection.
    Earlier, the image allocation was assumed to be 32bpp with no mipmapping
    overhead reported. Now, the correct image size is reported and the
    mipmapping overhead is accounted for. Image codec sizes were not reported
    to the VM earlier and now are. Expect “External” VM allocations to be
    higher than previously reported and the numbers in Observatory to line up
    more closely with actual memory usage (device and host).
  • Decoding images to a specific size used to decode at 1:1 and then resize to
    the correct dimensions before texture upload. This has now been reworked so
    that images are first decompressed to a smaller size supported natively by
    the codec before final resizing to the requested target size. The
    intermediate copy is now smaller and more promptly collected. Resizing also
    happens on a workqueue worker.
  • The drain interval of the unref queue is now sub-frame-interval. I am
    hesitant to remove the delay entirely because I have not been able to
    instrument its performance overhead. That is next on my list. But now,
    multiple frame intervals worth of textures no longer stick around.

The following issues have been addressed:

  • MessageLoop.CanCreateConcurrentMessageLoop runs forever (flutter#34070).
    Since this was the first usage of the concurrent message loops, the number
    of idle wakes was determined to be too high, and this component has been
    rewritten to be simpler and not use the existing task runner and
    MessageLoopImpl interface.
  • Image decoding had no tests. A new ui_unittests harness has been added that
    sets up a GPU test harness on the host using SwiftShader. Tests have been
    added for image decompression, upload, and resizing.
  • The device memory exhaustion in this benchmark has been addressed. That
    benchmark is still not viable for inclusion in any harness, however,
    because it creates 9 million codecs in straight-line execution. Because
    these codecs are destroyed in the microtask callbacks, they are referenced
    until those callbacks are executed. So now, instead of device memory
    exhaustion, this will lead to (slower) exhaustion of host memory. This is
    expected and working as intended.

This patch only addresses peak memory use and makes collection of unused images
and textures more prompt. It does NOT address memory use by images referenced
strongly by the application or framework.

This is work towards addressing flutter/flutter#32143

@dnfield (Contributor) commented Jun 25, 2019

This does appear to make flutter/flutter#32143 better (if I slow down enough, things catch up and we get back to a better place).

However, it's still fairly easy to cause the app to crash due to memory usage when you have a large list/grid of images and you scroll rapidly back and forth.

@dnfield (Contributor) commented Jun 25, 2019

I believe we're missing a flush somewhere in here that could help. Doesn't necessarily have to be part of this patch - but similar to what I was attempting in #9004

fml/concurrent_message_loop.cc

}

ConcurrentMessageLoop::ConcurrentMessageLoop(size_t worker_count) {
const auto worker_threads = std::max(worker_count, 1ul);

Member: Nit: C++ readability would ask for this not to be auto.

Member Author: We don't have those reviews and this pattern is prevalent. FWIW, I don't think this is unreadable.

workers_.emplace_back([i, this]() {
fml::Thread::SetCurrentThreadName(
std::string{"io.flutter.worker." + std::to_string(i + 1)});
WorkerMain();
});
}

worker_count_ = worker_threads;

Member: Shouldn't you be using the initializer?

Member Author: Done.

wait_condition_.notify_all();
void ConcurrentMessageLoop::PostTask(fml::closure task) {
if (!task) {
return;

Member: You should probably log this in debug cases since asking for a task to post and nothing happening is probably a logical error.

Member Author: The task does not exist. So the call not doing anything makes sense I think.


void ConcurrentTaskRunner::PostTask(fml::closure task) {
if (!task) {
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should log in debug build here too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reasoning as above.


private:
const std::unique_ptr<SkCodec> codec_;
const int frameCount_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these ivars be underscored, not camel case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class was moved as-is from codec.cc to its own file. The dubious naming scheme is present because this component was originally in blink. None of the original code remains but the still stuck around. I'll change the this file and other to match engine style in a subsequent patch.

void GetNextFrameAndInvokeCallback(
std::unique_ptr<DartPersistentValue> callback,
fml::RefPtr<fml::TaskRunner> ui_task_runner,
fml::WeakPtr<GrContext> resourceContext,

Member: resourceContext is inconsistent variable naming; shouldn't it be underscored?

Member Author: Ditto about inheriting this style from blink. I just moved this file. Will change the names in a subsequent patch.

FML_LOG(ERROR) << "Frame " << nextFrameIndex_ << " depends on frame "
<< requiredFrameIndex
<< " and no required frames are cached.";
return NULL;

Member: nullptr

Member Author: Done.

int repetitionCount() const override;

// |Codec|
Dart_Handle getNextFrame(Dart_Handle args) override;

Member: Docstring to talk about what thread is involved.

}

size_t SingleFrameCodec::GetAllocationSize() {
const auto& data = descriptor_.data;

Member: Remove auto?

Member Author: Why?

@chinmaygarde (Member Author) commented:

> However, it's still fairly easy to cause the app to crash due to memory usage when you have a large list/grid of images and you scroll rapidly back and forth.

The images in that case are strongly held in the cache. As mentioned in the concluding paragraph, this patch does not affect strongly referenced images. It only fixes issues and reduces memory use on the engine side as I was reworking those caches.

> I believe we're missing a flush somewhere in here that could help.

I haven't investigated or set up benchmarks for those allocations yet. Will take a look.

@dnfield (Contributor) commented Jun 25, 2019

Here's the diff that I think makes sense:

diff --git a/flow/skia_gpu_object.cc b/flow/skia_gpu_object.cc
index fa0de8d05c..0cf242d940 100644
--- a/flow/skia_gpu_object.cc
+++ b/flow/skia_gpu_object.cc
@@ -6,12 +6,16 @@
 
 #include "flutter/fml/message_loop.h"
 
+#include <chrono>
+
 namespace flutter {
 
 SkiaUnrefQueue::SkiaUnrefQueue(fml::RefPtr<fml::TaskRunner> task_runner,
-                               fml::TimeDelta delay)
+                               fml::TimeDelta delay,
+                               fml::WeakPtr<GrContext> context)
     : task_runner_(std::move(task_runner)),
       drain_delay_(delay),
+      context_(context),
       drain_pending_(false) {}
 
 SkiaUnrefQueue::~SkiaUnrefQueue() {
@@ -39,6 +43,9 @@ void SkiaUnrefQueue::Drain() {
   for (SkRefCnt* skia_object : skia_objects) {
     skia_object->unref();
   }
+  if (context_) {
+    context_->performDeferredCleanup(std::chrono::milliseconds(8));
+  }
 }
 
 }  // namespace flutter
diff --git a/flow/skia_gpu_object.h b/flow/skia_gpu_object.h
index 4c079af96e..77de3d7522 100644
--- a/flow/skia_gpu_object.h
+++ b/flow/skia_gpu_object.h
@@ -12,6 +12,7 @@
 #include "flutter/fml/memory/weak_ptr.h"
 #include "flutter/fml/task_runner.h"
 #include "third_party/skia/include/core/SkRefCnt.h"
+#include "third_party/skia/include/gpu/GrContext.h"
 
 namespace flutter {
 
@@ -31,12 +32,14 @@ class SkiaUnrefQueue : public fml::RefCountedThreadSafe<SkiaUnrefQueue> {
  private:
   const fml::RefPtr<fml::TaskRunner> task_runner_;
   const fml::TimeDelta drain_delay_;
+  const fml::WeakPtr<GrContext> context_;
   std::mutex mutex_;
   std::deque<SkRefCnt*> objects_;
   bool drain_pending_;
 
   SkiaUnrefQueue(fml::RefPtr<fml::TaskRunner> task_runner,
-                 fml::TimeDelta delay);
+                 fml::TimeDelta delay,
+                 fml::WeakPtr<GrContext> context);
 
   ~SkiaUnrefQueue();
 
diff --git a/shell/common/shell_io_manager.cc b/shell/common/shell_io_manager.cc
index d01c23a890..3a028695af 100644
--- a/shell/common/shell_io_manager.cc
+++ b/shell/common/shell_io_manager.cc
@@ -52,7 +52,8 @@ ShellIOManager::ShellIOManager(
                             : nullptr),
       unref_queue_(fml::MakeRefCounted<flutter::SkiaUnrefQueue>(
           std::move(unref_queue_task_runner),
-          fml::TimeDelta::FromMilliseconds(8))),
+          fml::TimeDelta::FromMilliseconds(8),
+          resource_context_weak_factory_->GetWeakPtr())),
       weak_factory_(this) {
   if (!resource_context_) {
 #ifndef OS_FUCHSIA

@dnfield (Contributor) commented Jun 25, 2019

With that patched in, this allows cleanup of objects much more quickly, but doesn't reduce peak memory usage (but perhaps we can look at the caching strategy a bit).

TEST(MessageLoop, CanCreateAndShutdownConcurrentMessageLoopsOverAndOver) {
for (size_t i = 0; i < 10; ++i) {
auto loop = fml::ConcurrentMessageLoop::Create();
ASSERT_EQ(loop->GetWorkerCount(), std::thread::hardware_concurrency());

Member: Should this check for a worker count of 1 if std::thread::hardware_concurrency returns zero?

Member Author: Good point. I should not depend on the hardware concurrency of the test platform anyway. Fixed to use a specified worker count.


// Get the updated dimensions of the image. If both dimensions are specified,
// use them. If one of them is specified, respect the one that is is and use the
// aspect ratio to calculate the other. If neither dimensions are specified, use

Member: change to "neither dimension is specified"

Member Author: Done.

}

// Get the updated dimensions of the image. If both dimensions are specified,
// use them. If one of them is specified, respect the one that is is and use the

Member: typo: "is is"

Member Author: Done.

return tonic::ToDart("Callback must be a function");
}

// This to be valid because this method is called from Dart.

Member: typo: "This to be valid"

Member Author: Done.

@chinmaygarde chinmaygarde left a comment (Member Author):

Address comments.
@@ -709,6 +714,7 @@ DartIsolate::CreateDartVMAndEmbedderObjectPair(
null_task_runners, // task_runners
fml::WeakPtr<SnapshotDelegate>{}, // snapshot_delegate
fml::WeakPtr<IOManager>{}, // io_manager
fml::WeakPtr<ImageDecoder>{}, // io_manager

Contributor: nit: comment seems wrong

@@ -52,7 +52,7 @@ ShellIOManager::ShellIOManager(
: nullptr),
unref_queue_(fml::MakeRefCounted<flutter::SkiaUnrefQueue>(
std::move(unref_queue_task_runner),
fml::TimeDelta::FromMilliseconds(250))),
fml::TimeDelta::FromMilliseconds(8))),

Contributor: Can you add a comment about why we have this value here? I seem to recall head scratching about why we had 250 and whether we could change it. Might help future us.

@hyiso commented Dec 18, 2024:

@chinmaygarde @dnfield
I'm facing a race condition related to this value in a release build when supporting another platform. (In a debug build, it runs as expected.)

UnrefQueue#Unref posts a delayed task that only completes later, so the loop on the io_runner does not terminate, and the call to pthread_join in flutter/fml/thread.cc gets stuck.

When I change the value from 8 to 0, it runs as expected.

So I'm wondering whether the value 8 has any particular meaning?

@dnfield dnfield left a comment (Contributor):

Couple of small nits. LGTM as long as everything is resolved from the other review(s).

@dnfield (Contributor) commented Jun 26, 2019

I'd just add that we know we have at least two more problems to tackle beyond this:

  1. This patch still lets Skia retain resources if, for example, you allocate a bunch of images and then do no work besides freeing them. It looks like SkImage will cache things on us, and until we either explicitly tell Skia to purge some caches or start doing some more drawing work, they stay in memory.
  2. We are still easily able to create situations in the framework where too much memory is allocated when scrolling in a list or grid of images. But this patch improves at least some of that.

@@ -62,6 +63,7 @@ std::weak_ptr<DartIsolate> DartIsolate::CreateRootIsolate(
task_runners, // task runners
std::move(snapshot_delegate), // snapshot delegate
std::move(io_manager), // IO manager
std::move(image_decoder), // IO manager

Contributor: typo: not IO manager

Member Author: Fixed.

@SupSaiYaJin (Contributor) commented:

I got a compile error:

../../flutter/fml/concurrent_message_loop.cc:21:21: error: no matching function for call to 'max'

@chinmaygarde chinmaygarde force-pushed the concurrent_image_decompression branch from 0c038ea to b7fcfae Compare July 9, 2019 19:19
@chinmaygarde chinmaygarde force-pushed the concurrent_image_decompression branch from b7fcfae to 1619752 Compare July 9, 2019 19:47
@chinmaygarde chinmaygarde merged commit ad582b5 into flutter:master Jul 9, 2019
@chinmaygarde chinmaygarde deleted the concurrent_image_decompression branch July 9, 2019 21:59
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Jul 9, 2019
engine-flutter-autoroll added a commit to flutter/flutter that referenced this pull request Jul 11, 2019
flutter/engine@75387db...49445ce

git log 75387db..49445ce --no-merges --oneline
49445ce FLEViewController/Engine API changes (flutter/engine#9750)
2a79462 Add Fuchsia build CI presubmit steps (flutter/engine#9736)
67cebdb Roll fuchsia/sdk/core/linux-amd64 from KGmm_RIJoXS19zTm2crjM3RYpmp5Y03-fLUeVdylbTYC to ehWVT9QJbC-vFMM6SkkQM9HJ9oITFCws7FC9JnrFq2gC (flutter/engine#9765)
089c740 Roll fuchsia/sdk/core/mac-amd64 from lCQWEeR_Ert7t_qAbMRycwrRyZC-dIprYPyPJzwPmg4C to EYnRdXFT9l-d8Qkz4zeTRXnqfV3KQzpQhoPs1r0-740C (flutter/engine#9759)
b22410e Include SkParagraph headers only when the enable-skshaper flag is on (flutter/engine#9758)
2cd650d Minimal integration with the Skia text shaper module (flutter/engine#9556)
f775f5e Re-enable the Wuffs GIF decoder (flutter/engine#9466)
aca0482 Make all shell unit tests use the OpenGL rasterizer. (flutter/engine#9746)
bc57291 Make FLEViewController's view an internal detail (flutter/engine#9741)
9776043 Synchronize main thread and gpu thread for first render frame (flutter/engine#9506)
f600ae8 Use libc++ variant of string view and remove the FML variant. (flutter/engine#9737)
564f53f Revert "Improve caching limits for Skia (#9503)" (flutter/engine#9740)
b453d3c libtxt: fix reference counting of SkFontStyleSets held by font asset providers (flutter/engine#9561)
fa7627d Fix backspace crash on Chinese devices (flutter/engine#9734)
56885f7 Let pushColorFilter accept all types of ColorFilters (flutter/engine#9641)
6dccb21 Roll src/third_party/skia 96fdfe0fe88e..af4e7b6cf616 (1 commits) (flutter/engine#9735)
8511d9b Roll fuchsia/sdk/core/mac-amd64 from byM-kyxL4bemlTYNqhKUfJfZoIUrCSzS6XzsFr4n9-MC to lCQWEeR_Ert7t_qAbMRycwrRyZC-dIprYPyPJzwPmg4C (flutter/engine#9742)
b3bf0a1 Roll fuchsia/sdk/core/linux-amd64 from I2Qe1zxgckzIzMBTztvzeWYsDgcb9Fw-idSI16oIlx8C to KGmm_RIJoXS19zTm2crjM3RYpmp5Y03-fLUeVdylbTYC (flutter/engine#9743)
7e56823 Fix windows test by not attempting to open a directory as a file. (flutter/engine#9745)
6cf0d13 Roll src/third_party/skia a3ffaabcc4f2..96fdfe0fe88e (5 commits) (flutter/engine#9731)
49a00ae Fix Fuchsia build. (flutter/engine#9730)
b3bb39b Roll src/third_party/dart 06c3d7ad3a...09fc76bc51 (flutter/engine#9728)
2284210 Make the license script compatible with recently changed Dart I/O stream APIs (flutter/engine#9725)
ad582b5 Rework image & texture management to use concurrent message queues. (flutter/engine#9486)
1dcd5f5 Roll src/third_party/skia 6b82cf638682..a3ffaabcc4f2 (24 commits) (flutter/engine#9726)
129979c Revert "Roll src/third_party/dart 06c3d7ad3a..7acecda2cc (12 commits)" (flutter/engine#9724)
8020d7e Roll src/third_party/skia 56065d9b875f..6b82cf638682 (3 commits) (flutter/engine#9718)
e24bd78 Roll src/third_party/dart 06c3d7ad3a..7acecda2cc (12 commits)
3d2668c Reland isolate group changes
802bd15 iOS platform view opacity (flutter/engine#9667)
3b6265b Roll src/third_party/dart b5aeaa6796..06c3d7ad3a (44 commits)
887e052 Refactor ColorFilter to have a native wrapper (flutter/engine#9668)

The AutoRoll server is located here: https://autoroll.skia.org/r/flutter-engine-flutter-autoroll

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff ([email protected]), and stop
the roller if necessary.
johnsonmh pushed a commit to johnsonmh/flutter that referenced this pull request Jul 30, 2019