Split OSD tests by CI group #4745

Closed
ashwin-pc opened this issue Jun 5, 2024 · 16 comments
Labels
enhancement New Enhancement

Comments

@ashwin-pc
Member

Is your feature request related to a problem? Please describe

Currently, all OSD tests are run together in a single pass. OSD has a large number of tests, and they fail consistently when they are all run together.

Describe the solution you'd like

We should split the OSD tests into CI groups, similar to what the OSD CI does and what the FTR CI now does as well.
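A minimal sketch of what splitting into CI groups could look like, assuming a simple round-robin assignment. The function name `split_into_ci_groups`, the `ciGroupN` labels, and the round-robin strategy are all hypothetical and are not taken from #4796 or the FT repo; the spec paths are shortened from the logs in this thread.

```python
from typing import Dict, List


def split_into_ci_groups(spec_files: List[str], group_count: int) -> Dict[int, List[str]]:
    """Distribute spec files round-robin across numbered CI groups (1-based)."""
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    groups: Dict[int, List[str]] = {n: [] for n in range(1, group_count + 1)}
    # Sort first so the assignment is deterministic between runs.
    for index, spec in enumerate(sorted(spec_files)):
        groups[index % group_count + 1].append(spec)
    return groups


# Hypothetical input: spec paths shortened from the logs in this thread.
specs = [
    "apps/vis_builder/basic.spec.js",
    "apps/vis_builder/dashboard.spec.js",
    "apps/vis_type_table/basic.spec.js",
    "apps/vis_type_table/embed.spec.js",
    "apps/data_explorer/shared_links.spec.js",
]
for group, members in split_into_ci_groups(specs, 2).items():
    print(f"ciGroup{group}: {members}")
```

Each group can then be run as its own job, so a slow or hanging spec only blocks its own group rather than the whole run.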

Describe alternatives you've considered

No response

Additional context

No response

@ashwin-pc ashwin-pc added enhancement New Enhancement untriaged Issues that have not yet been triaged labels Jun 5, 2024
@peterzhuamazon
Member

Current metrics:

ci-runn+ 1525 1353 99 21:11 pts/1 03:27:09 /home/ci-runner/.cache/Cypress/9.5.4/Cypress/Cypress --no-sandbox -- --run-project /tmp/tmphionwti_/OpenSearch-Dashboards --browser electron --env SECURITY_ENABLED=true,openSearchUrl=https://localhost:9200,WAIT_FOR_LOADER_BUFFER_MS=3000 --headed false --spec cypre

ps -p 1525 -o etime
    ELAPSED
   01:55:46
  Running:  core-opensearch-dashboards/opensearch-dashboards/apps/vis_builder/da          (27 of 57)
            shboard.spec.js


  Visualization Builder Dashboard Tests
    ✓ Should have valid visualizations (25439ms)
    ✓ Should be able to add a visualization

We noticed that when running the OSD core integTest, it took nearly 2 hours after entering Security test mode to reach spec 27 of 57.

The next test case also seems to hang: after 36 minutes it still had not moved on to another one.
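For a rough sense of scale, the figures above can be extrapolated linearly. This is only a sketch: it assumes every spec takes about the same time, which the hang described above already contradicts, and `parse_etime` is a hypothetical helper, not part of any existing tooling.

```python
from datetime import timedelta


def parse_etime(etime: str) -> timedelta:
    """Parse an [[DD-]HH:]MM:SS string as printed by `ps -o etime`."""
    etime = etime.strip()
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    # Pad missing leading fields (hours) with zeros.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


elapsed = parse_etime("01:55:46")       # value from the ps output above
specs_reached, specs_total = 27, 57     # "(27 of 57)" from the Cypress output
per_spec = elapsed / specs_reached      # naive average, assumes uniform specs
projected_total = per_spec * specs_total
print(f"average per spec: {per_spec}, projected full run: {projected_total}")
```

Under that assumption the full run would take roughly four hours in a single job.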

Thanks.

@peterzhuamazon
Member

peterzhuamazon commented Jun 21, 2024

When trying to use CYPRESS_NO_COMMAND_LOG=1 in opensearch-project/opensearch-dashboards-functional-test#1428, we see failures in this test:


  Running:  core-opensearch-dashboards/opensearch-dashboards/apps/vis_builder/ba          (26 of 57)
            sic.spec.js


  Visualization Builder Base Tests
    ✓ Show existing visualizations in Visualize and navigate to it (21988ms)
    ✓ Navigate to Visualization Builder from Create Visualization (8311ms)
    1) Create new basic metric visualization
    2) Be able to add/ edit and remove a field
    3) Be able to save a visualization


  2 passing (5m)
  3 failing

  1) Visualization Builder Base Tests
       Create new basic metric visualization:
     AssertionError: Timed out retrying after 60000ms: Expected to find element: `[data-test-subj="homeIcon"]`, but never found it.
      at Context.eval (http://localhost:5601/__cypress/tests?p=cypress/integration/core-opensearch-dashboards/opensearch-dashboards/apps/vis_builder/basic.spec.js:600:13)

  2) Visualization Builder Base Tests
       Be able to add/ edit and remove a field:
     AssertionError: Timed out retrying after 60000ms: Expected to find element: `[data-test-subj="homeIcon"]`, but never found it.
      at Context.eval (http://localhost:5601/__cypress/tests?p=cypress/integration/core-opensearch-dashboards/opensearch-dashboards/apps/vis_builder/basic.spec.js:600:13)

  3) Visualization Builder Base Tests
       Be able to save a visualization:
     AssertionError: Timed out retrying after 60000ms: Expected to find element: `[data-test-subj="homeIcon"]`, but never found it.
      at Context.eval (http://localhost:5601/__cypress/tests?p=cypress/integration/core-opensearch-dashboards/opensearch-dashboards/apps/vis_builder/basic.spec.js:600:13)

Related to: https://github.com/opensearch-project/opensearch-dashboards-functional-test/blame/2.15/cypress/integration/core-opensearch-dashboards/opensearch-dashboards/apps/vis_builder/basic.spec.js#L55-L57

The team suggested reducing the defaultCommandTimeout from 60s to 10s in this PR: https://github.com/opensearch-project/opensearch-dashboards-functional-test/pull/1418/files#diff-db1f8bb89a89d5b089b39a2af3c27ab75092f1844a158eca0a76445d513d9044

We later discovered that this change caused many other OSD plugin tests to fail, as documented in opensearch-project/OpenSearch#14442, so we reverted it in opensearch-project/opensearch-dashboards-functional-test#1421.

There are 3 approaches right now to resolve this:

  1. The team root-causes the exact issue and fixes the test cases.
  2. Infra applies the 10s defaultCommandTimeout only to OSD Core and tries to pass the checks.
  3. The tests seem to pass at times with another approach, the ci-groups changes for OSD tests in #4796. Monitoring.

Thanks.

@peterzhuamazon
Member

After making the following changes, we now see this error consistently on NOSEC for ARM64/X64 TAR.

Running:  core-opensearch-dashboards/opensearch-dashboards/apps/vis_type_table            (3 of 5)
            /embed.spec.js                                                                          


  table visualization in embedded mode
    ✓ Should open table vis in embedded mode (31196ms)
    ✓ Should allow to filter in embedded mode (10004ms)
    ✓ Should filter for value in embedded mode (15156ms)
    1) Should filter out value in embedded mode


  3 passing (2m)
  1 failing

  1) table visualization in embedded mode
       Should filter out value in embedded mode:
     CypressError: Timed out retrying after 60050ms: `cy.click()` failed because this element is detached from the DOM.

`<button class="euiButtonEmpty euiButtonEmpty--primary euiButtonEmpty--small" type="button" aria-label="Filter out value: 0" data-test-subj="filterOutValue">...</button>`

Cypress requires elements be attached in the DOM to interact with them.

The previous command that ran was:

  > `cy.find()`

This DOM element likely became detached somewhere between the previous and current command.

Common situations why this happens:
  - Your JS framework re-rendered asynchronously
  - Your app code reacted to an event firing and removed the element

You typically need to re-query for the element or add 'guards' which delay Cypress from running new commands.

https://on.cypress.io/element-has-detached-from-dom
      at $Cy.ensureAttached (http://localhost:5601/__cypress/runner/cypress_runner.js:163936:76)
      at runAllChecks (http://localhost:5601/__cypress/runner/cypress_runner.js:150536:12)
      at retryActionability (http://localhost:5601/__cypress/runner/cypress_runner.js:150616:16)
      at tryCatcher (http://localhost:5601/__cypress/runner/cypress_runner.js:13022:23)
      at Function.Promise.attempt.Promise.try (http://localhost:5601/__cypress/runner/cypress_runner.js:10296:29)
      at whenStable (http://localhost:5601/__cypress/runner/cypress_runner.js:168808:63)
      at http://localhost:5601/__cypress/runner/cypress_runner.js:168305:14
      at tryCatcher (http://localhost:5601/__cypress/runner/cypress_runner.js:13022:23)
      at Promise._settlePromiseFromHandler (http://localhost:5601/__cypress/runner/cypress_runner.js:10957:31)
      at Promise._settlePromise (http://localhost:5601/__cypress/runner/cypress_runner.js:11014:18)
      at Promise._settlePromise0 (http://localhost:5601/__cypress/runner/cypress_runner.js:11059:10)
      at Promise._settlePromises (http://localhost:5601/__cypress/runner/cypress_runner.js:11139:18)
      at Promise._fulfill (http://localhost:5601/__cypress/runner/cypress_runner.js:11083:18)
      at http://localhost:5601/__cypress/runner/cypress_runner.js:12697:46
  From Your Spec Code:
      at Context.eval (http://localhost:5601/__cypress/tests?p=cypress/support/index.js:3171:3)

The vis builder errors from the previous comments are resolved after combining #4796 with the reduction of defaultCommandTimeout to 10s.

@peterzhuamazon
Member

After more attempts, I realized that my initial change from 60s to 10s was a no-op, because Cypress only accepts an int, not a str, as the value type for defaultCommandTimeout.

This means that when the 10s timeout is applied correctly, the tests fail right from the start because the wait is too short:


  Running:  core-opensearch-dashboards/opensearch-dashboards/apps/vis_type_table            (1 of 5)
            /basic.spec.js


  table visualization basic functions
    1) "before all" hook for "Should apply changed params and allow to reset"


  0 passing (26s)
  1 failing

  1) table visualization basic functions
       "before all" hook for "Should apply changed params and allow to reset":
     AssertionError: Timed out retrying after 10000ms: Expected to find element: `[data-test-subj="superDatePickerstartDatePopoverButton"],[data-test-subj="superDatePickerShowDatesButton"]`, but never found it.

Because this error occurred during a `before all` hook we are skipping the remaining tests in the current suite: `table visualization basic f...`
      at Context.eval (http://localhost:5601/__cypress/tests?p=cypress/support/index.js:1304:110)

So we are back to using 60s for all tests.
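The int-versus-str pitfall described above can be guarded against when the config value is generated. This is a sketch under assumptions: the helper `cypress_timeout_config` is hypothetical, and the thread does not say how the timeout was actually passed to Cypress, so a JSON fragment is used only to make the type difference visible.

```python
import json
from typing import Union


def cypress_timeout_config(timeout_ms: Union[int, str]) -> str:
    """Return a JSON config fragment with defaultCommandTimeout as a number."""
    # int() accepts both 10000 and "10000"; it raises ValueError for "10s".
    value = int(timeout_ms)
    if value <= 0:
        raise ValueError("timeout must be a positive number of milliseconds")
    return json.dumps({"defaultCommandTimeout": value})


# Number in the JSON output: the form Cypress expects.
print(cypress_timeout_config("10000"))
# String in the JSON output: the form that ended up being a no-op here.
print(json.dumps({"defaultCommandTimeout": "10000"}))
```

Coercing at the boundary means a value read from an environment variable or CLI argument, which is always a string, cannot silently reach the config with the wrong type.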

@peterzhuamazon
Member

We have identified that the ftrepo is using ubuntu-latest while Jenkins is using alma8.
Opened an issue to track onboarding the ftrepo to the Jenkins CI images.

@peterzhuamazon
Member

peterzhuamazon commented Jun 22, 2024

More updates:
The OSD Core tests running on Windows hit a JSON formatting issue and failed:
osd-core-jsonissue-stdout.txt

This seems to be related to opensearch-project/OpenSearch-Dashboards#6970.

cc: @AMoo-Miki to take a look at this.
Thanks!

@peterzhuamazon
Member

As in my earlier comment, after doing the following changes we now see a consistent failure on NOSEC on ARM64/X64 TAR: in core-opensearch-dashboards/opensearch-dashboards/apps/vis_type_table/embed.spec.js, "Should filter out value in embedded mode" times out after 60050ms because `cy.click()` targets a `filterOutValue` button that is detached from the DOM. The full log is in that earlier comment.

The vis builder errors from the previous comments are resolved after combining #4796 with the reduction of defaultCommandTimeout to 10s.

Create issue:

@peterzhuamazon
Member

More Windows issues after applying the .gitattributes change to fix the JSON formatting:

  Running:  core-opensearch-dashboards/opensearch-dashboards/apps/data_explorer/            (5 of 8)
            shared_links.spec.js


  shared links
    shared links with state in query
      1) should allow for copying the snapshot URL
      √ should allow for copying the snapshot URL as a short URL
      2) should allow for copying the saved object URL
    shared links with state in sessionStorage
      √ should allow for copying the snapshot URL (17659ms)
      √ should allow for copying the snapshot URL as a short URL
      √ should allow for copying the saved object URL (13177ms)


  4 passing (3m)
  2 failing

  1) shared links
       shared links with state in query
         should allow for copying the snapshot URL:
     AssertionError: Timed out retrying after 60000ms: expected 'http://localhost:5601/app/data-explorer/discover?security_tenant=global#/?_a=(discover:(columns:!(_source),isDirty:!
f,sort:!()),metadata:(indexPattern:\'logstash-*\',view:discover))&_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:\'2015-09-19T06:31:44.000Z\',to:\'2015-09-23T18:31:44
.000Z\'))&_q=(filters:!(),query:(language:kuery,query:\'\'))' to satisfy [Function]
      at Context.eval (http://localhost:5601/__cypress/tests?p=cypress\integration\core-opensearch-dashboards\opensearch-dashboards\apps\data_explorer\shared_links.spec.js:262:8)

  2) shared links
       shared links with state in query
         should allow for copying the saved object URL:

      Timed out retrying after 60000ms
      + expected - actual

      -'http://localhost:5601/app/data-explorer/discover/#/view/ab12e3c0-f231-11e6-9486-733b1ac9221a?_g=(filters%3A!()%2CrefreshInterval%3A(pause%3A!t%2Cvalue%3A0)%2Ctime%3A(from%3A
\'2015-09-19T06%3A31%3A44.000Z\'%2Cto%3A\'2015-09-23T18%3A31%3A44.000Z\'))'
      +'http://localhost:5601/app/data-explorer/discover/#/view/ab12e3c0-f231-11e6-9486-733b1ac9221a?_g=(filters%3A!()%2CrefreshInterval%3A(pause%3A!t%2Cvalue%3A0)%2Ctime%3A(from%3A
\'2015-09-19T13%3A31%3A44.000Z\'%2Cto%3A\'2015-09-24T01%3A31%3A44.000Z\'))'

      at Context.eval (http://localhost:5601/__cypress/tests?p=cypress\integration\core-opensearch-dashboards\opensearch-dashboards\apps\data_explorer\shared_links.spec.js:281:84)
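The two saved object URLs in the second failure differ only in their time range. The sketch below computes the difference using the timestamps copied from the diff above (URL-decoded). A constant shift on both ends would be consistent with a timezone difference between environments, but that is an inference and is not confirmed anywhere in this thread.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

# "-" lines in the diff are the actual value, "+" lines the expected value.
actual_from = datetime.strptime("2015-09-19T06:31:44.000Z", FMT)
expected_from = datetime.strptime("2015-09-19T13:31:44.000Z", FMT)
actual_to = datetime.strptime("2015-09-23T18:31:44.000Z", FMT)
expected_to = datetime.strptime("2015-09-24T01:31:44.000Z", FMT)

from_shift = expected_from - actual_from
to_shift = expected_to - actual_to
print(f"from shifted by {from_shift}, to shifted by {to_shift}")
```

Both ends of the range are shifted by the same seven hours, so the window length itself is unchanged.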

@peterzhuamazon
Member

peterzhuamazon commented Jun 22, 2024

Adding multiple fixes for the OSD Core related issues in the ftrepo here. Thanks @AMoo-Miki for helping with the Windows section.

@peterzhuamazon
Member

peterzhuamazon commented Jun 22, 2024

Apparently this PR was missed in the backports, which is why all these errors show up here:

Screenshot 2024-06-22 at 8 28 38 AM

cc: @ashwin-pc @abbyhu2000
Thanks!

@peterzhuamazon
Member

More fixes on the deb and rpm integtests:

@peterzhuamazon peterzhuamazon moved this from 🆕 New to 🏗 In progress in Engineering Effectiveness Board Jun 24, 2024
@peterzhuamazon peterzhuamazon moved this from Backlog to In Progress in OpenSearch Engineering Effectiveness Jun 24, 2024
@peterzhuamazon
Member

After the changes from @rishabh6788 and @peterzhuamazon, we are now able to run a green end-to-end integTest on Jenkins with ci groups.

Can we close this issue for now, @ashwin-pc?

Thanks.

@ashwin-pc
Member Author

Do we have a follow-up issue to improve this, since you guys weren't happy with manually configuring the ci groups in multiple places?

@rishabh6788
Collaborator

From the infra team's perspective, the number of ci-groups is handled in the test manifest, see https://github.com/opensearch-project/opensearch-build/blob/main/manifests/2.15.0/opensearch-dashboards-2.15.0-test.yml#L21.

So in the future, if any ci-group is added to or deleted from the FT repo, we just have to update this number.
FT repo ci-group changes need to be clearly communicated to the infra team.

No more action items as of now.
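Since the ci-group count lives in two places, a small consistency check could catch drift between them. This is a hypothetical sketch: `check_ci_group_count` is not an existing tool, and how the count is read from the test manifest or how group names are discovered in the FT repo is left out because neither is described in this thread.

```python
from typing import List


def check_ci_group_count(manifest_count: int, ft_repo_groups: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means both sides agree."""
    problems = []
    # De-duplicate in case the same group name is discovered more than once.
    found = len(set(ft_repo_groups))
    if found != manifest_count:
        problems.append(
            f"test manifest declares {manifest_count} ci-groups, "
            f"FT repo defines {found}"
        )
    return problems


# Hypothetical group names, used only for illustration.
print(check_ci_group_count(3, ["ciGroup1", "ciGroup2", "ciGroup3"]))
print(check_ci_group_count(3, ["ciGroup1", "ciGroup2", "ciGroup3", "ciGroup4"]))
```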

@peterzhuamazon
Member

Now that the new issue has been created in OSD core, we will close this old issue, as it has been implemented in 2.15.0.

Thanks.

Projects
Status: ✅ Done