Add Replay ("Policy") performance tests (TaskCompletionRateTest) #704

Open
abrichr opened this issue Jun 4, 2024 · 4 comments
Labels: enhancement, good first issue, help wanted

Comments

abrichr (Member) commented Jun 4, 2024

Feature request

We need to extend #314 to include some useful tests and generate an automated report.

This involves:

  • Create recordings of three tasks:
    1. Open a calculator and perform a short calculation.
    2. Open a spreadsheet (e.g. https://github.com/OpenAdaptAI/OpenAdapt/blob/cb70f35985eeb579fd3e13b20a9839b10729921d/tests/assets/excel.png), open a time tracking app (e.g. https://clockify.me), copy a week's worth of data from the spreadsheet into the app, and save/submit the data in the app (e.g. https://www.youtube.com/watch?v=omP11q-o_0I).
       Alternatively, if browser events are not yet available (see "Add Chrome browser event in database during recording", #744), replicate something similar with two different spreadsheets open simultaneously (one for reading, one for writing).
    3. Open PowerPoint and create a short presentation.
  • Save the recordings as fixtures.
  • Add automated tests that run a replay (with a configurable strategy, defaulting to VanillaReplayStrategy) and evaluate the outcome. Outcome evaluation can be implemented with WindowEvent data.
  • Add a script to log the outcome results to stdout and/or to a file.
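
The last two items could look roughly like the sketch below: a command-line entry point whose strategy option defaults to VanillaReplayStrategy, and a helper that writes one JSON line per outcome to stdout and, optionally, to a file. This is only an illustration under stated assumptions: `parse_args`, `log_outcomes`, the `--strategy` and `--output` flags, and the outcome dictionary keys are hypothetical names, not part of the existing OpenAdapt code base, and no OpenAdapt API is called.

```python
import argparse
import json
import sys
from typing import Any, Dict, Iterable, Optional, Sequence, TextIO


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the (hypothetical) command-line options for the report script."""
    parser = argparse.ArgumentParser(description="Log replay outcome results.")
    parser.add_argument(
        "--strategy",
        default="VanillaReplayStrategy",  # default named in the issue
        help="name of the replay strategy to evaluate",
    )
    parser.add_argument(
        "--output",
        default=None,
        help="optional file to append results to, in addition to stdout",
    )
    return parser.parse_args(argv)


def log_outcomes(
    outcomes: Iterable[Dict[str, Any]],
    stream: Optional[TextIO] = None,
    path: Optional[str] = None,
) -> str:
    """Write one JSON line per outcome to a stream and, if given, to a file."""
    stream = sys.stdout if stream is None else stream
    text = "".join(json.dumps(o, sort_keys=True) + "\n" for o in outcomes)
    stream.write(text)
    if path is not None:
        # Append so that repeated runs accumulate in the same report file.
        with open(path, "a", encoding="utf-8") as report_file:
            report_file.write(text)
    return text
```

A caller would parse the arguments once, run the replay with the selected strategy, and pass the resulting outcome dictionaries to `log_outcomes` together with `path=args.output`.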

Motivation

Scientific rigor and reproducibility.

@abrichr abrichr added the enhancement New feature or request label Jun 4, 2024
@abrichr abrichr changed the title Add baseline tests Add performance tests Jun 4, 2024
@abrichr abrichr added good first issue Good for newcomers help wanted Extra attention is needed labels Jun 4, 2024
abrichr (Member, Author) commented Jun 4, 2024

@seanmcguire12 your assistance would be greatly appreciated!

abrichr (Member, Author) commented Jun 7, 2024

@KrishPatel13 outcome evaluation for web apps will depend on finishing #364

abrichr (Member, Author) commented Jun 10, 2024

Save a fixture with recording.task_description = "test: calculate 2x3" that matches the video currently on the website.

Test 1: Run the VanillaReplayStrategy with empty instructions (or with instructions like "replay the recording verbatim"). Use openadapt.window to assert that the calculator display area contains the expected value, 6.

Test 2: Run the VanillaReplayStrategy with instructions like "calculate 9-8+7". Use the same API to assert that the calculator display area contains the expected value, 8.
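
The assertion shared by both tests might be sketched as follows. The sketch assumes that the window state can be obtained as a nested structure of dicts and lists with string leaves; that shape, the helper name `contains_value`, and the sample state are assumptions made for illustration, and the actual data returned by openadapt.window may differ. The value is matched as a standalone number so that an expected 6 is not satisfied by a display showing 16 or 6.5.

```python
import re
from typing import Any


def contains_value(state: Any, expected: str) -> bool:
    """Return True if any string leaf in `state` shows `expected` as a standalone number."""
    if isinstance(state, str):
        # Reject matches that are part of a longer number such as 16, 60 or 6.5.
        pattern = rf"(?<![\d.]){re.escape(expected)}(?!\.?\d)"
        return re.search(pattern, state) is not None
    if isinstance(state, dict):
        return any(contains_value(value, expected) for value in state.values())
    if isinstance(state, (list, tuple)):
        return any(contains_value(item, expected) for item in state)
    # Non-string leaves (numbers, None, ...) are ignored in this sketch.
    return False


# Hypothetical window state captured after the replay finishes.
sample_state = {
    "title": "Calculator",
    "data": {"children": [{"role": "text", "name": "Display is 6"}]},
}
```

A test would then reduce to `assert contains_value(state, "6")` for Test 1 and `assert contains_value(state, "8")` for Test 2.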

Parameterize the tests over the replay strategy and iterate over all available strategies. Produce a report with the results.
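
One possible shape for this loop and the resulting report is sketched below. Each strategy is represented by a plain callable that takes the instructions and returns the final calculator display text; the two stub callables stand in for real replay strategies and are purely hypothetical, as are the names `Case`, `Result`, `evaluate`, `completion_rates`, and `format_report`. In a pytest suite the same iteration could be expressed with `pytest.mark.parametrize`; plain loops are used here to keep the sketch dependency-free.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A runner replays a recording with the given instructions and returns
# the final display text (hypothetical interface).
Runner = Callable[[str], str]


@dataclass(frozen=True)
class Case:
    name: str
    instructions: str
    expected: str


@dataclass(frozen=True)
class Result:
    strategy: str
    case: str
    passed: bool


# The two tests described above.
CASES = [
    Case("replay verbatim", "", "6"),
    Case("new calculation", "calculate 9-8+7", "8"),
]


def stub_verbatim(instructions: str) -> str:
    """Stand-in strategy that ignores instructions and repeats the recording."""
    return "6"


def stub_instructed(instructions: str) -> str:
    """Stand-in strategy that follows the new instructions when given."""
    return "8" if instructions else "6"


STUB_STRATEGIES: Dict[str, Runner] = {
    "stub_verbatim": stub_verbatim,
    "stub_instructed": stub_instructed,
}


def evaluate(strategies: Dict[str, Runner], cases: List[Case]) -> List[Result]:
    """Run every case against every strategy; a crash counts as a failure."""
    results = []
    for strategy_name, run in strategies.items():
        for case in cases:
            try:
                passed = run(case.instructions).strip() == case.expected
            except Exception:
                passed = False
            results.append(Result(strategy_name, case.name, passed))
    return results


def completion_rates(results: List[Result]) -> Dict[str, float]:
    """Fraction of cases passed, per strategy."""
    totals: Dict[str, int] = {}
    passes: Dict[str, int] = {}
    for result in results:
        totals[result.strategy] = totals.get(result.strategy, 0) + 1
        passes[result.strategy] = passes.get(result.strategy, 0) + int(result.passed)
    return {name: passes[name] / totals[name] for name in totals}


def format_report(results: List[Result]) -> str:
    """One line per (strategy, case) pair, followed by per-strategy rates."""
    lines = [f"{'PASS' if r.passed else 'FAIL'}  {r.strategy}  {r.case}" for r in results]
    for name, rate in completion_rates(results).items():
        lines.append(f"completion rate  {name}  {rate:.0%}")
    return "\n".join(lines)
```

Calling `format_report(evaluate(STUB_STRATEGIES, CASES))` yields a plain-text report that can be printed or written to a file. Counting a crashed replay as a failed case, rather than aborting the run, is a design choice that keeps the completion rate comparable across strategies.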

abrichr (Member, Author) commented Jun 13, 2024

@seanmcguire12 please submit a PR with your work-in-progress 🙏

@abrichr abrichr changed the title Add performance tests Add performance tests (TaskCompletionRateTest) Jun 13, 2024
@abrichr abrichr changed the title Add performance tests (TaskCompletionRateTest) Add Replay ("Policy") performance tests (TaskCompletionRateTest) Jun 24, 2024
2 participants