Add Replay ("Policy") performance tests (TaskCompletionRateTest) #704

abrichr · 2024-06-04T23:46:48Z

Feature request

We need to extend #314 to include some useful tests and generate an automated report.

This involves:

- Create recordings of three tasks:
  - Open a calculator and perform a short calculation.
  - Open a spreadsheet (e.g. https://github.com/OpenAdaptAI/OpenAdapt/blob/cb70f35985eeb579fd3e13b20a9839b10729921d/tests/assets/excel.png), open a time tracking app (e.g. https://clockify.me), copy a week's worth of data from the spreadsheet into the app, and save/submit the data in the app (e.g. https://www.youtube.com/watch?v=omP11q-o_0I).
    - Alternatively, if browser events are not yet available (see Add Chrome browser event in database during recording #744), replicate something similar with two different spreadsheets open simultaneously (one for reading, one for writing).
  - Open PowerPoint and create a short presentation.
- Save them as fixtures.
- Add automated tests to run a replay (with a configurable strategy, defaulting to VanillaReplayStrategy) and evaluate the outcome. Outcome evaluation can be implemented with WindowEvent data.
- Add a script to log the outcome results to stdout and/or to a file.
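The reporting step could be sketched roughly as follows. This is only an illustration under stated assumptions: `ReplayOutcome`, `summarize`, and `log_report` are hypothetical names, not part of OpenAdapt, and the outcomes are assumed to have already been produced by whatever evaluation the tests perform (e.g. from WindowEvent data).

```python
import json
import sys
from dataclasses import asdict, dataclass
from typing import List, Optional, TextIO


@dataclass
class ReplayOutcome:
    """Hypothetical record of one replay attempt (not an OpenAdapt class)."""

    task: str        # e.g. "calculator"
    strategy: str    # e.g. "VanillaReplayStrategy"
    succeeded: bool  # result of the outcome evaluation


def summarize(outcomes: List[ReplayOutcome]) -> dict:
    """Aggregate outcomes into a JSON-serializable report."""
    total = len(outcomes)
    completed = sum(o.succeeded for o in outcomes)
    return {
        "total": total,
        "completed": completed,
        "task_completion_rate": completed / total if total else 0.0,
        "outcomes": [asdict(o) for o in outcomes],
    }


def log_report(
    outcomes: List[ReplayOutcome],
    stream: TextIO = sys.stdout,
    path: Optional[str] = None,
) -> dict:
    """Print one line per outcome to `stream`; optionally write JSON to `path`."""
    report = summarize(outcomes)
    for o in outcomes:
        status = "PASS" if o.succeeded else "FAIL"
        print(f"{status}  {o.strategy}  {o.task}", file=stream)
    print(f"task completion rate: {report['task_completion_rate']:.0%}", file=stream)
    if path is not None:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(report, f, indent=2)
    return report
```

Calling `log_report(outcomes)` writes only to stdout; passing `path="report.json"` (an arbitrary example filename) additionally saves the same data as JSON.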

Motivation

Scientific rigor and reproducibility.


abrichr · 2024-06-04T23:56:51Z

@seanmcguire12 your assistance would be greatly appreciated!

abrichr · 2024-06-07T17:37:25Z

@KrishPatel13 outcome evaluation for web apps will depend on finishing #364

abrichr · 2024-06-10T16:40:53Z

Save a fixture with recording.task_description = "test: calculate 2x3" that is just like the video currently on the website.

Test 1: Run the VanillaReplayStrategy with empty instructions (or with instructions like "replay the recording verbatim"). Use openadapt.window to assert that the calculator display area contains the expected value, 6.

Test 2: Run the VanillaReplayStrategy with instructions like "calculate 9-8+7". Use the same API to assert that the calculator display area contains the expected value, 8.

Parameterize the replay strategy and iterate over all of them. Produce a report with the results.
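A rough sketch of how the two tests and the strategy parameterization might fit together. Everything here is a stand-in: `fake_calculator` and `fake_vanilla_strategy` are hypothetical fakes that simulate the calculator display so the example runs on its own. A real test would run an actual replay strategy and read the display via openadapt.window instead.

```python
import re
from typing import Callable, Dict, List

# task_description of the fixture described above
RECORDING_TASK = "test: calculate 2x3"

# (instructions, expected display value) for Test 1 and Test 2
CASES = [("", "6"), ("calculate 9-8+7", "8")]


def fake_calculator(expression: str) -> str:
    """Evaluate left to right, like a basic four-function calculator."""
    tokens = re.findall(r"\d+|[+\-x]", expression)
    result = int(tokens[0])
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        value = int(operand)
        if op == "+":
            result += value
        elif op == "-":
            result -= value
        else:  # "x" means multiply
            result *= value
    return str(result)


def fake_vanilla_strategy(recording_task: str, instructions: str) -> str:
    """Hypothetical fake: empty instructions replay the recording verbatim."""
    source = instructions or recording_task
    return fake_calculator(source.split()[-1])


# Assumption: each strategy maps (task description, instructions) to the
# text shown in the calculator display after the replay.
Strategy = Callable[[str, str], str]
STRATEGIES: Dict[str, Strategy] = {
    "FakeVanillaReplayStrategy": fake_vanilla_strategy,
}


def run_cases(strategies: Dict[str, Strategy], recording_task: str) -> List[dict]:
    """Run every case against every strategy and collect result rows."""
    rows = []
    for name, strategy in strategies.items():
        for instructions, expected in CASES:
            actual = strategy(recording_task, instructions)
            rows.append({
                "strategy": name,
                "instructions": instructions,
                "expected": expected,
                "actual": actual,
                "passed": actual == expected,
            })
    return rows


def completion_rates(rows: List[dict]) -> Dict[str, float]:
    """Fraction of passed cases per strategy."""
    totals: Dict[str, List[int]] = {}
    for row in rows:
        passed, count = totals.setdefault(row["strategy"], [0, 0])
        totals[row["strategy"]] = [passed + int(row["passed"]), count + 1]
    return {name: passed / count for name, (passed, count) in totals.items()}
```

Adding another entry to `STRATEGIES` is enough to include a further strategy in the report; with pytest, the same matrix could instead be expressed with `pytest.mark.parametrize`.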

abrichr · 2024-06-13T13:43:52Z

@seanmcguire12 please submit a PR with your work-in-progress 🙏

abrichr added the enhancement (New feature or request) label Jun 4, 2024

abrichr changed the title ~~Add baseline tests~~ Add performance tests Jun 4, 2024

abrichr added the good first issue (Good for newcomers) and help wanted (Extra attention is needed) labels Jun 4, 2024

abrichr changed the title ~~Add performance tests~~ Add performance tests (TaskCompletionRateTest) Jun 13, 2024

seanmcguire12 mentioned this issue Jun 13, 2024

Feature/performance test #749

Draft

7 tasks

abrichr changed the title ~~Add performance tests (TaskCompletionRateTest)~~ Add Replay ("Policy") performance tests (TaskCompletionRateTest) Jun 24, 2024

abrichr assigned Animesh404 Jun 24, 2024

Animesh404 mentioned this issue Jul 12, 2024

Feat/performance test #850

Open

7 tasks
