Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue #959: Cache TIFF width and height into fields on File media if they exist #983

Closed
wants to merge 15 commits into from

Conversation

alxp
Copy link

@alxp alxp commented Oct 6, 2023

#959

What does this Pull Request do?

When generating IIIF manifests in the Views Style plugin, if it has to resort to querying Cantaloupe, it will cache the result in fields on the file media if they exist.

What's new?

Querying Cantaloupe for an image's width and height turns out to be costly performance-wise, and if it's done for multiple images such as when we generate a IIIF manifest out of Original Files, it can take several seconds to a minute or even time out if there are more than about 100 pages.

This change will store the height and width it finds own fields named 'field_height' and 'field_width' on the File media entity if those fields exist.

Accompanying this PR will be a starter site PR to add those fields by default.

  • Does this change add any new dependencies? No
  • Does this change require any other modifications to be made to the repository
    (i.e. Regeneration activity, etc.)? No
  • Could this change impact execution of existing code? generating manifests will be affected, but end result will be the same.

How should this be tested?

An exported starter site config will be attached to this PR but is not necessary if you are comfortable making the following change:

  1. Go to admin/structure/media and click 'Manage Fields' for the File media entity type.
  2. Choose Add Existing Field and add the existing field_width and field_height fields that belong to the Image media entity type.
  3. Save the changes
  • if those fields are not present, this code just skips the caching process.

Then to test:.

  1. Go to /node/add and create an object of type Repository Item with model Paged Content.
  2. Add at least one Repository Item object as a member of that object, with model Page.
  3. Add media to the child objects and upload a TIFF file for each,
  4. If you like, monitor the CPU usage on the Cantaloupe server.
  5. Go to /node/xxx/book-manifest-original (if using the standard starter site where this is available) where did is the node of the Paged Content object you created in step 1.
  6. Hit 'Refresh' after you see the manifest, it should generate much more quickly. For more robust testing, clear the site cache and then do the same.
  7. Navigate to a media from this collection of objects and look for the Width and Height fields, they should now be populated.

Documentation Status

  • Does this change existing behaviour that's currently documented? No
  • Does this change require new pages or sections of documentation? No
  • Who does this need to be documented for?
  • Associated documentation pull request(s): ___ or documentation issue ___

Additional Notes:

Any additional information that you think would be helpful when reviewing this
PR.

Interested parties

Tag (@ mention) interested parties or, if unsure, @Islandora/committers

@alxp
Copy link
Author

alxp commented Oct 11, 2023

Converting this to draft since it exposes a problem where updating so many media at once can crash your site. deliberating on a solution now.

@rosiel
Copy link
Member

rosiel commented Oct 11, 2023

This PR is active, not a draft, at the moment. Would you like it to be reviewed?

@alxp alxp marked this pull request as draft October 11, 2023 19:12
@alxp
Copy link
Author

alxp commented Oct 11, 2023

@rosiel thanks no this should be in draft, changed it for real this Tim.

@alxp alxp force-pushed the 959-iiif-width-height-caching branch from 74eb9ea to 565a1b4 Compare October 18, 2023 22:06
Comment on lines +349 to +359
// If the media has field_height and field_width, return those values.
if ($media->hasField('field_height')
&& !$media->get('field_height')->isEmpty()
&& $media->get('field_height')->value > 0
&& $media->hasField('field_width')
&& !$media->get('field_width')->isEmpty()
&& $media->get('field_width')->value > 0) {
return [intval($media->get('field_width')->value),
intval($media->get('field_height')->value),
];
}
Copy link

@adam-vessey adam-vessey Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this makes sense to do before attempting to acquire the values from $image, is there's a decent chance that $image->{width,height} are closer to the source of truth: https://git.drupalcode.org/project/drupal/-/blob/10.2.x/core/modules/image/src/Plugin/Field/FieldType/ImageItem.php?ref_type=heads#L156-162

IIRC, part of the media ingest process can indeed copy these properties to fields on the media?, (i.e.: https://github.com/Islandora-Devops/islandora-starter-site/blob/52a1f3aa1aa52e7652986f53ad3f174bf6423ac1/config/sync/media.type.image.yml#L19-L20)

Also, this is already effectively accounted for below?: https://github.com/Islandora/islandora/pull/983/files#diff-edc92719b6b111e26d43c711272a79678edfb7defa5705a23f0dc70292e1898dR377-R386

If there's a desire to reweight the particular mechanisms, it might make sense to pull things out to events, or some kind of weighted plugin collection mechanism? Something configurable, either via a form or via implementing alters to adjust weights?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're running into a situation with tiff paged objects where the tiffs fall through all these conditions and end up having getimgsize run on them over and over. Even though they already had their width/height calculated and stored on the media.

I'm more or less just swooping in from fixing a running site and am putting this up for @alxp . Wasn't trying to introduce any architecture changes per se. Though I do feel we should short circuit out as soon as possible to avoid getimgsize or, even worse, talking to cantaloupe.

FWIW, just doing this took a 432 page book's manifest's load time from a minute and a half to 13 seconds.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It then sounds like this code wasn't present in the environment in the first place, as acquisition from the parent entity already precedes the getimagesize() processing calls?

@joecorall joecorall mentioned this pull request Mar 7, 2024
Copy link

@noahwsmith noahwsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running this in production, if Cantaloupe can't actually read the image you get a fatal error like this one:

[info] Prepared 10 of 50 entities for processing.
 [info] Prepared 20 of 50 entities for processing.
 [info] Prepared 30 of 50 entities for processing.
 [info] Prepared 40 of 50 entities for processing.
 [info] Prepared 50 of 50 entities for processing.
 [info] Processed 10 of 50 entities.

In RequestException.php line 113:

  [GuzzleHttp\Exception\ServerException (500)]
  Server error: `GET https://compass.fivecolleges.edu/cantaloupe/iiif/2/https%3A%2F%2Fcompass.fivecolleges.edu%2Fsyste
  m%2Ffiles%2F2024-03%2Fview_510_0.tif` resulted in a `500 Internal Server Error` response:
  500 Internal Server Error

  dataType out of range!


  java.lang.IllegalArgumentException: dataType out of range!
        at it.ge (truncated...)


Exception trace:
  at /var/www/drupal/vendor/guzzlehttp/guzzle/src/Exception/RequestException.php:113
 GuzzleHttp\Exception\RequestException::create() at /var/www/drupal/vendor/guzzlehttp/guzzle/src/Middleware.php:72
 GuzzleHttp\Middleware::GuzzleHttp\{closure}() at /var/www/drupal/vendor/guzzlehttp/promises/src/Promise.php:209
 GuzzleHttp\Promise\Promise::callHandler() at /var/www/drupal/vendor/guzzlehttp/promises/src/Promise.php:158
 GuzzleHttp\Promise\Promise::GuzzleHttp\Promise\{closure}() at /var/www/drupal/vendor/guzzlehttp/promises/src/TaskQueue.php:52
 GuzzleHttp\Promise\TaskQueue->run() at /var/www/drupal/vendor/guzzlehttp/promises/src/Promise.php:251
 GuzzleHttp\Promise\Promise->invokeWaitFn() at /var/www/drupal/vendor/guzzlehttp/promises/src/Promise.php:227
 GuzzleHttp\Promise\Promise->waitIfPending() at /var/www/drupal/vendor/guzzlehttp/promises/src/Promise.php:272
 GuzzleHttp\Promise\Promise->invokeWaitList() at /var/www/drupal/vendor/guzzlehttp/promises/src/Promise.php:229
 GuzzleHttp\Promise\Promise->waitIfPending() at /var/www/drupal/vendor/guzzlehttp/promises/src/Promise.php:69
 GuzzleHttp\Promise\Promise->wait() at /var/www/drupal/vendor/guzzlehttp/guzzle/src/Client.php:189
 GuzzleHttp\Client->request() at /var/www/drupal/web/modules/contrib/islandora/modules/islandora_iiif/src/IiifInfo.php:113
 Drupal\islandora_iiif\IiifInfo->getImageDimensions() at /var/www/drupal/web/modules/contrib/islandora/modules/islandora_iiif/src/Plugin/Action/MediaAttributesFromIiif.php:143
 Drupal\islandora_iiif\Plugin\Action\MediaAttributesFromIiif->execute() at /var/www/drupal/web/core/lib/Drupal/Core/Action/ActionBase.php:22

With the suggested additional exception handling this is resolved and the error is handled gracefully.

use GuzzleHttp\Exception\ClientException;
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\Exception\ServerException;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs use GuzzleHttp\Exception\RequestException; also.

modules/islandora_iiif/src/IiifInfo.php Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants