Improving storage performance #1659

GitHK · 2020-07-30T13:54:06Z

What do these changes do?

The list_files function is now optimised to run with large datasets, and no longer depends on third-party S3 calls when listing files in the storage service.

Removed

database updates when retrieving the real file names (which might have been renamed) => responsible for 40% of the slowdown
file metadata updates in the database after invoking an S3 call for each file when listing all the files => responsible for the majority of the remaining slowdown

Added

a metadata_file_updater launched before returning the upload link for each file to be uploaded. It will try to update the metadata for each file. It uses an exponential backoff retry to account for huge file uploads. It also has a maximum number of retries. If it fails to update the metadata, an error message will be logged.
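The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the function name `update_metadata_with_retries`, the `fetch_metadata` callable, the `base_delay` parameter and the default values are all assumptions made for the example. Only the idea (retry with exponentially growing sleeps, give up after a maximum number of retries, log an error) comes from the description.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger(__name__)


async def update_metadata_with_retries(
    fetch_metadata: Callable[[], Awaitable[Optional[dict]]],
    max_update_retries: int = 5,
    base_delay: float = 1.0,
) -> Optional[dict]:
    """Poll fetch_metadata until it returns a result or the retries run out.

    fetch_metadata is a hypothetical coroutine that returns None while the
    upload has not finished yet, and the metadata dict once it has.
    """
    for attempt in range(max_update_retries):
        metadata = await fetch_metadata()
        if metadata is not None:
            return metadata
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        await asyncio.sleep(base_delay * 2 ** attempt)
    logger.error("Could not update metadata after %d retries", max_update_retries)
    return None
```

Large uploads take longer to appear in the object store, so growing the delay between attempts avoids hammering the backend while still picking up small files quickly.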

Changes

Related issue number

Closes #1559

How to test

Start the project and make sure to have lots of files in your storage. With the old implementation, performance degraded after 200 uploaded files.

Checklist

Did you change any service's API? Then make sure to bundle the documentation and upgrade the version (make openapi-specs, git commit ... and then make version-*)
Unit tests for the changes exist
Runs in the swarm
Documentation reflects the changes
New module? Add your github username to .github/CODEOWNERS

the metadata will be synced by a background worker spawned for each file when a new upload url is generated
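As a rough sketch of what spawning such a background worker can look like with `asyncio.ensure_future` (which a later commit message in this PR mentions): the names `sync_metadata`, `generate_upload_link` and `_log_failure`, and the placeholder URL, are hypothetical and not taken from the storage service code.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


def _log_failure(task: "asyncio.Future") -> None:
    # Surface exceptions that would otherwise be lost in a fire-and-forget task.
    if not task.cancelled() and task.exception() is not None:
        logger.error("Metadata sync failed: %s", task.exception())


async def sync_metadata(file_id: str) -> None:
    # Placeholder for the real work (hypothetical): wait for the upload
    # to finish, then write the file metadata to the database.
    await asyncio.sleep(0)


async def generate_upload_link(file_id: str) -> str:
    # Schedule the worker without awaiting it, so the link is returned at once.
    task = asyncio.ensure_future(sync_metadata(file_id))
    task.add_done_callback(_log_failure)
    # Placeholder URL (assumption), standing in for a presigned upload link.
    return f"https://storage.invalid/upload/{file_id}"
```

The done callback matters because an exception raised inside a task that nobody awaits is otherwise only reported when the task is garbage collected.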

codecov · 2020-07-30T13:56:05Z

Codecov Report

Merging #1659 into master will increase coverage by 5.4%.
The diff coverage is 72.1%.

@@           Coverage Diff            @@
##           master   #1659     +/-   ##
========================================
+ Coverage    68.2%   73.6%   +5.4%     
========================================
  Files         256     278     +22     
  Lines        9442   10918   +1476     
  Branches     1010    1179    +169     
========================================
+ Hits         6442    8046   +1604     
+ Misses       2777    2529    -248     
- Partials      223     343    +120

Flag	Coverage Δ
#integrationtests	`56.7% <ø> (?)`
#unittests	`67.4% <72.1%> (-0.9%)`	⬇️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
...vices/storage/src/simcore_service_storage/utils.py	`69.5% <62.5%> (ø)`
...ervices/storage/src/simcore_service_storage/dsm.py	`65.0% <73.5%> (ø)`
...es/storage/src/simcore_service_storage/settings.py	`100.0% <0.0%> (ø)`
...ces/storage/src/simcore_service_storage/datcore.py	`15.4% <0.0%> (ø)`
...s/storage/src/simcore_service_storage/db_tokens.py	`90.0% <0.0%> (ø)`
...s/storage/src/simcore_service_storage/resources.py	`100.0% <0.0%> (ø)`
...age/src/simcore_service_storage/datcore_wrapper.py	`56.1% <0.0%> (ø)`
...storage/src/simcore_service_storage/application.py	`46.6% <0.0%> (ø)`
...ervices/storage/src/simcore_service_storage/cli.py	`75.7% <0.0%> (ø)`
...es/storage/src/simcore_service_storage/__init__.py	`100.0% <0.0%> (ø)`
... and 44 more

sanderegg

I think you can use tenacity for the exponential backoff, and it works through a decorator:
https://github.com/jd/tenacity

sanderegg · 2020-07-30T15:08:59Z

services/storage/src/simcore_service_storage/dsm.py

+        Will retry max_update_retries to update the metadata on the file after an upload.
+        If it is not successfull it will exit and log an error.


Why not use tenacity here? I think most of these features are available in the @retry decorator.

stuff like this:

    from tenacity import retry, wait_exponential

    @retry(wait=wait_exponential(multiplier=1, min=4, max=10))
    def wait_exponential_1():
        print("Wait 2^x * 1 second between each retry starting with 4 seconds, then up to 10 seconds, then 10 seconds afterwards")
        raise Exception

Because it was too generic for my use case. I needed to sleep only for a subsection of the function.

Edit: I have skipped tenacity on purpose here. Also the exponential backoff generator implementation comes from the backoff library.
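To illustrate the point about sleeping only in one part of a function: a wait generator lets the caller decide exactly where to sleep, whereas a retry decorator re-runs the whole function. The generator below is a self-contained sketch written for this illustration. It is modelled on the general idea of an exponential wait generator and is neither copied from the backoff library nor from this PR; the name `expo` and its parameters are assumptions.

```python
import itertools
from typing import Iterator, Optional


def expo(
    base: float = 2, factor: float = 1, max_value: Optional[float] = None
) -> Iterator[float]:
    """Yield factor * base**n for n = 0, 1, 2, ..., capped at max_value."""
    n = 0
    while True:
        value = factor * base ** n
        if max_value is None or value < max_value:
            yield value
            n += 1
        else:
            # Once the cap is reached, keep yielding the cap.
            yield max_value


# First five delays with the defaults
delays = list(itertools.islice(expo(), 5))
print(delays)  # [1, 2, 4, 8, 16]
```

Inside a longer coroutine, one would call `next()` on the generator and sleep for that long only around the step that needs to wait, leaving the rest of the function outside the retry loop.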

services/storage/src/simcore_service_storage/dsm.py

- using logging ensure_future - reformatted code

odeimaiz

Demo worked like a charm.

- UI/UX improvements (#1657)
- Bump yarl from 1.4.2 to 1.5.1 in /packages/postgres-database (#1665)
- Bump ujson from 3.0.0 to 3.1.0 in /packages/service-library (#1664)
- Bump pytest-docker from 0.7.2 to 0.8.0 in /packages/service-library (#1647)
- Improving storage performance (#1659)
- Bump aiozipkin from 0.6.0 to 0.7.0 in /packages/service-library (#1642)
- Theming (#1656)
- Platform stability: (#1645)
- is1594 fix and re-activate e2e testing (#1620)
- 2 bugs fixed + Some improvements (#1634)
- Fixes default (#1640)
- Bump lodash from 4.17.15 to 4.17.19 (#1639)
- Is1585/cleanup storage (#1586)
- Fixes on publish studies handling (#1632)
- Some enhancements and bug fixes (#1608)
- Improve e2e (#1631)
- filter studies by name before deleting them (#1629)
- Maintenance/upgrades test tools (#1628)
- Bugfix/concurent opening projects (#1598)
- Bugfix/allow reading groups anonymous user (#1615)
- Bump docker from 4.2.1 to 4.2.2 in /packages/postgres-database (#1605)
- fix testing if node has gpu support (#1604)
- [bugfix] Invalidate cache before starting a study (#1602)
- Feature/fix e2e 2 (#1600)
- fix deploy not needing e2e testing since it is disabled
- reduce cardinality of metrics (#1593)
- Excudes e2e stage from include until fixed (#1595)
- Shared project concurrency (frontend) (#1591)
- Homogenize studies and services (#1569)
- [feature] UI Fine grained access - project locking and notification
- Bugfix/apiserver does not need sslheaders (#1564)
- Cleanup catalog service (#1582)
- Maintenance/cleanup api server (#1578)
- Adds support for GPU scheduling of computational services (#1553)
- Maintenance/upgrades and tooling (#1546)
- Is1570/study fails 500 (#1572)
- Bump faker from 4.1.0 to 4.1.1 in /packages/postgres-database (#1573)
- maintenance fix codecov reports (#1568)
- Manage groups, Share studies (#1512)
- Is/add notebook migration script (#1565)
- Is1269/api-server upgrade (#1475)
- added simcore_webserver_service in pytest simcore package (#1563)
- add traefik endpoint to api-gateway (#1555)

Andrei Neagu added 4 commits July 30, 2020 15:33

added expo generator for more fine grained sleep

748de16

deque is more memory and time efficient

110b238

removing time wasters

ffc637b

updating file metadata after upload

c92d0e5

the metadata will be synced by a background worker spawned for each file when a new upload url is generated

GitHK added the t:enhancement (Improvement or request on an existing feature) and a:storage (issue related to storage service) labels Jul 30, 2020

GitHK added this to the Xie-Xie milestone Jul 30, 2020

GitHK self-assigned this Jul 30, 2020

GitHK marked this pull request as draft July 30, 2020 13:54

GitHK changed the title ~~Improving storage performance~~ WIP: Improving storage performance Jul 30, 2020

GitHK and others added 2 commits July 30, 2020 16:38

Merge branch 'master' into improve-storage-performance

ecb2616

trying to fix codeclimate complaints

860257d

GitHK requested review from mguidon, odeimaiz and sanderegg July 30, 2020 14:55

GitHK marked this pull request as ready for review July 30, 2020 14:55

GitHK changed the title ~~WIP: Improving storage performance~~ Improving storage performance Jul 30, 2020

sanderegg requested changes Jul 30, 2020

- added missing annotations

91a8029

- using logging ensure_future - reformatted code

GitHK requested a review from sanderegg July 30, 2020 15:35

odeimaiz approved these changes Jul 31, 2020

sanderegg approved these changes Aug 3, 2020

Merge branch 'master' into improve-storage-performance

fdb565a

odeimaiz merged commit f0e11ca into ITISFoundation:master Aug 3, 2020

odeimaiz mentioned this pull request Aug 4, 2020

FREEZE_XieXie #1669

Merged

GitHK mentioned this pull request Aug 17, 2020

platform stability #1426

Closed

sanderegg mentioned this pull request Aug 21, 2020

FREEZE_DaJia1 #1733

Merged
