
Resource Tracking improvements #1307

Open · 1 of 3 tasks · Tracked by #1333
SCA-ZMT opened this issue Mar 28, 2024 · 4 comments

@SCA-ZMT (Contributor) commented Mar 28, 2024

Tasks

  1. matusdrobuliak66
  2. (3 of 4 subtasks) GitHK, matusdrobuliak66, odeimaiz
  3. matusdrobuliak66
@matusdrobuliak66 (Contributor) commented Apr 23, 2024

Performance Analysis

Preparation

Check queries used for testing
  • creation of data:

```sql
INSERT INTO resource_tracker_service_runs
(product_name, service_run_id, wallet_id, wallet_name, pricing_plan_id, simcore_user_agent, user_id, user_email, project_id, project_name, node_id, node_name, service_key, service_version, service_type, service_resources, service_additional_metadata, started_at, stopped_at, service_run_status, modified, last_heartbeat_at, pricing_unit_id, pricing_unit_cost_id, pricing_unit_cost, service_run_status_msg, missed_heartbeat_counter)
SELECT 's4l', concat('test_', s), 4, 'Matus stress testing', NULL, 'undefined', 146457, '[email protected]', '52f4a2c4-36cb-11ee-a51f-02420ac12330', 'Stress testing', '52f4a2c4-36cb-11ee-a51f-02420ac12330', 'sleeper', 'simcore/services/comp/itis/sleeper', '2.1.6', 'DYNAMIC_SERVICE', '{}', '{}', now(), now(), 'SUCCESS', now(), now(), NULL, NULL, 0, 'testing', 0
FROM generate_series(1, 1000000) AS s;

INSERT INTO resource_tracker_credit_transactions
(product_name, wallet_id, wallet_name, pricing_plan_id, user_id, user_email, osparc_credits, transaction_status, transaction_classification, service_run_id, payment_transaction_id, created, last_heartbeat_at, modified, pricing_unit_id, pricing_unit_cost_id)
SELECT 's4l', CASE WHEN random() < 0.05 THEN 0 ELSE 4 END, 'Matus stress testing', NULL, 146457, '[email protected]', -random(), CASE WHEN random() < 0.9999 THEN 'BILLED'::credittransactionstatus ELSE 'PENDING'::credittransactionstatus END, 'DEDUCT_SERVICE_RUN', service_run_id, '', now(), now(), now(), NULL, NULL
FROM resource_tracker_service_runs WHERE wallet_name = 'Matus stress testing' AND service_run_id LIKE 'test_%';
```
  • testing:

```sql
EXPLAIN ANALYZE SELECT sum(resource_tracker_credit_transactions.osparc_credits) AS sum_1
FROM resource_tracker_credit_transactions
WHERE resource_tracker_credit_transactions.product_name = 's4l' AND resource_tracker_credit_transactions.wallet_id = 4 AND resource_tracker_credit_transactions.transaction_status IN ('PENDING', 'BILLED');

EXPLAIN ANALYZE SELECT resource_tracker_service_runs.product_name, resource_tracker_service_runs.service_run_id, resource_tracker_service_runs.wallet_id, resource_tracker_service_runs.wallet_name, resource_tracker_service_runs.pricing_plan_id, resource_tracker_service_runs.pricing_unit_id, resource_tracker_service_runs.pricing_unit_cost_id, resource_tracker_service_runs.pricing_unit_cost, resource_tracker_service_runs.user_id, resource_tracker_service_runs.user_email, resource_tracker_service_runs.project_id, resource_tracker_service_runs.project_name, resource_tracker_service_runs.node_id, resource_tracker_service_runs.node_name, resource_tracker_service_runs.service_key, resource_tracker_service_runs.service_version, resource_tracker_service_runs.service_type, resource_tracker_service_runs.service_resources, resource_tracker_service_runs.started_at, resource_tracker_service_runs.stopped_at, resource_tracker_service_runs.service_run_status, resource_tracker_service_runs.modified, resource_tracker_service_runs.last_heartbeat_at, resource_tracker_service_runs.service_run_status_msg, resource_tracker_service_runs.missed_heartbeat_counter, resource_tracker_credit_transactions.osparc_credits, resource_tracker_credit_transactions.transaction_status
FROM resource_tracker_service_runs LEFT OUTER JOIN resource_tracker_credit_transactions ON resource_tracker_service_runs.service_run_id = resource_tracker_credit_transactions.service_run_id
WHERE resource_tracker_service_runs.product_name = 's4l' AND resource_tracker_service_runs.user_id = 146457 ORDER BY resource_tracker_service_runs.started_at DESC
LIMIT 20 OFFSET 0;

EXPLAIN ANALYZE SELECT * FROM aws_s3.query_export_to_s3('SELECT resource_tracker_service_runs.product_name, resource_tracker_service_runs.service_run_id, resource_tracker_service_runs.wallet_name, resource_tracker_service_runs.user_email, resource_tracker_service_runs.project_name, resource_tracker_service_runs.node_name, resource_tracker_service_runs.service_key, resource_tracker_service_runs.service_version, resource_tracker_service_runs.service_type, resource_tracker_service_runs.started_at, resource_tracker_service_runs.stopped_at, resource_tracker_credit_transactions.osparc_credits, resource_tracker_credit_transactions.transaction_status FROM resource_tracker_service_runs LEFT OUTER JOIN resource_tracker_credit_transactions ON resource_tracker_service_runs.service_run_id = resource_tracker_credit_transactions.service_run_id WHERE resource_tracker_service_runs.product_name = ''s4l'' AND resource_tracker_service_runs.user_id = 146457 ORDER BY resource_tracker_service_runs.started_at DESC',
aws_commons.create_s3_uri('matus-testing', 'resource-usage-tracker-service-runs/testing/test9.csv', 'us-east-1'), 'format csv, HEADER true');
```

Results

  • db.t3.medium (2 vCPU, 4 GB RAM)

| Rows       | Statement   | Execution time (ms) | CSV size |
|------------|-------------|---------------------|----------|
| 1 000 000  | sum credits | 290.928             |          |
| 1 000 000  | list (join) | 2856.100            |          |
| 1 000 000  | export      | 9692.535            | 192 MB   |
| 5 000 000  | sum credits | 2025.984            |          |
| 5 000 000  | list (join) | 37840.759           |          |
| 5 000 000  | export      | 83072.873           | 1 GB     |
| 10 000 000 | sum credits | 11621.637           |          |
| 10 000 000 | list (join) | 90805.676           |          |
| 10 000 000 | export      | 224646.016          | 2.1 GB   |
  • db.t3.xlarge (4 vCPU, 16 GB RAM)

| Rows       | Statement   | Execution time (ms) | CSV size |
|------------|-------------|---------------------|----------|
| 1 000 000  | sum credits | 312.325             |          |
| 1 000 000  | list (join) | 1424.304            |          |
| 1 000 000  | export      | 11081.482           | 192 MB   |
| 5 000 000  | sum credits | 1247.781            |          |
| 5 000 000  | list (join) | 28066.748           |          |
| 5 000 000  | export      | 41154.578           | 1 GB     |
| 10 000 000 | sum credits | 11871.515           |          |
| 10 000 000 | list (join) | 30492.221           |          |
| 10 000 000 | export      | 86182.641           | 2.1 GB   |

Conclusion

  • sum credits
    • Will eventually need a "Wallet Balances" table holding each wallet's current balance, so the whole transaction history does not have to be re-summed on every request.
  • list (join)
    • The query can be optimized: add an index on product_name and started_at, and add a foreign key on service_run_id between the service_runs and credit_transactions tables to support the join.
    • In the long term this may still be problematic, so we should change the frontend/backend logic to only "preview" the last N usage entries. Users who want more will need to use the export functionality, which the tests showed works very well for exporting data to S3.
  • export
    • Needs to be implemented as a long-running task.

@sanderegg (Member)

some questions from our test together:

  • what are the implications on the database responsiveness when you run these queries?
  • we saw that the frontend was blocked when running these queries (with some taking a very long time and others failing), did you find out why?

@matusdrobuliak66 (Contributor)

> some questions from our test together:
>
>   • what are the implications on the database responsiveness when you run these queries?
>   • we saw that the frontend was blocked when running these queries (with some taking a very long time and others failing), did you find out why?

Yes, when the frontend calls the list-usage endpoint, we join two tables in the database and filter/sort based on the user's request. When this endpoint was called many times, it overloaded the database, and requests timed out because the database could not process all the queries at once.

  1. 🎨 🗃️ improve performance of listing usage (🚨⚠️) osparc-simcore#5720 should optimize the query itself.
  2. We discussed with Odei that in the long term, we might change the logic of the endpoint and allow users to see only previews. This means the frontend will not overload the backend with so many queries. If a user wants more, they will need to use the export functionality.
  3. Also, as the measurements above show, we are currently running the database with only 2 vCPUs. When I increased the instance to 4 vCPUs, the performance of this query improved significantly, and we could also process more queries in parallel.
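The "Wallet Balances" idea from the conclusion could be prototyped with a trigger-maintained running total, so the sum-credits query becomes a single-row lookup instead of a scan over the whole history. This is only a sketch under assumptions: the table, function, and trigger names are invented, and it handles only inserts (status changes and updates to existing transactions would need additional handling).

```sql
-- Hypothetical per-wallet running balance table.
CREATE TABLE IF NOT EXISTS resource_tracker_wallet_balances (
    wallet_id      BIGINT      PRIMARY KEY,
    osparc_credits NUMERIC     NOT NULL DEFAULT 0,
    modified       TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Apply each new credit transaction to the wallet's running balance.
CREATE OR REPLACE FUNCTION apply_credit_transaction() RETURNS trigger AS $$
BEGIN
    INSERT INTO resource_tracker_wallet_balances (wallet_id, osparc_credits)
    VALUES (NEW.wallet_id, NEW.osparc_credits)
    ON CONFLICT (wallet_id) DO UPDATE
        SET osparc_credits = resource_tracker_wallet_balances.osparc_credits
                             + EXCLUDED.osparc_credits,
            modified = now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_apply_credit_transaction
AFTER INSERT ON resource_tracker_credit_transactions
FOR EACH ROW EXECUTE FUNCTION apply_credit_transaction();
```

Reading the balance is then `SELECT osparc_credits FROM resource_tracker_wallet_balances WHERE wallet_id = ...`, independent of history size.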

@mrnicegyu11 (Member)

After talking to @matusdrobuliak66, we can close this parent case, keeping the linked issue around.
