
[HUDI-7296] Reduce CI Time by Minimizing Duplicate Code Coverage in Tests #10492

Merged (2 commits) on Jan 17, 2024

Conversation

@jonvex (Contributor) commented Jan 11, 2024

Change Logs

testBootstrapRead and TestHoodieDeltaStreamerSchemaEvolutionQuick have many combinations of parameters. While it is good to test everything, many of the resulting code paths receive extensive duplicate testing. This change reduces the number of test combinations while still maintaining code coverage.
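The combinatorial growth this change targets can be sketched in plain Java (a hypothetical helper, not Hudi code): a parameterized test that crosses one table-type axis with n boolean flags runs 2·2^n combinations, which is why pruning redundant combinations shrinks CI time substantially.

```java
import java.util.ArrayList;
import java.util.List;

public class TestMatrixSketch {
    // Hypothetical helper: enumerate the full cross product of a
    // table-type axis and nFlags boolean flags using a bitmask,
    // where bit i of "bits" encodes the value of flag i.
    static List<Object[]> fullMatrix(String[] tableTypes, int nFlags) {
        List<Object[]> combos = new ArrayList<>();
        for (String type : tableTypes) {
            for (int bits = 0; bits < (1 << nFlags); bits++) {
                Object[] row = new Object[nFlags + 1];
                row[0] = type;
                for (int i = 0; i < nFlags; i++) {
                    row[i + 1] = ((bits >> i) & 1) == 1; // flag i
                }
                combos.add(row);
            }
        }
        return combos;
    }

    public static void main(String[] args) {
        String[] types = {"COPY_ON_WRITE", "MERGE_ON_READ"};
        // 2 table types x 2^6 flag settings = 128 parameterized runs
        System.out.println(fullMatrix(types, 6).size()); // prints 128
    }
}
```

Each extra boolean axis doubles the run count, so cutting even one or two redundant axes halves or quarters the matrix.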

Impact

faster CI

Risk level (write none, low, medium or high below)

low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@hudi-bot

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-runs the last Azure build

@linliu-code (Contributor) commented:
@jonvex, when is "fullTest" set to "true"?

@vinothchandar (Member) left a comment:

Just a couple of comments. Can you please call out the savings in CI time from this?

@vinothchandar (Member) commented on the diff:

```
String tableType = COW, MOR
Boolean shouldCluster = true
Boolean shouldCompact = true
Boolean rowWriterEnable = true
Boolean addFilegroups = true
Boolean multiLogFiles = true
Boolean useKafkaSource = false, true
Boolean allowNullForDeletedCols = false, true
```

I wonder if we just do something like this. With new file groups and multiple log files, alongside clustering and compaction, that should be the more complex (superset) scenario, no?
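One way to read the suggestion above: pin the five structural flags (`shouldCluster`, `shouldCompact`, `rowWriterEnable`, `addFilegroups`, `multiLogFiles`) at their most complex values and vary only the axes that still branch (`tableType`, `useKafkaSource`, `allowNullForDeletedCols`). A plain-Java sketch of such a provider, using `Object[]` as a stand-in for JUnit's `Arguments` and parameter names taken from the list in the comment:

```java
import java.util.ArrayList;
import java.util.List;

public class SupersetProviderSketch {
    // Pin shouldCluster/shouldCompact/rowWriterEnable/addFilegroups/
    // multiLogFiles to true (the "superset" scenario) and vary only
    // tableType, useKafkaSource and allowNullForDeletedCols.
    static List<Object[]> testArgs() {
        List<Object[]> b = new ArrayList<>();
        for (String tableType : new String[]{"COPY_ON_WRITE", "MERGE_ON_READ"}) {
            for (boolean useKafkaSource : new boolean[]{false, true}) {
                for (boolean allowNullForDeletedCols : new boolean[]{false, true}) {
                    b.add(new Object[]{tableType, true, true, true, true, true,
                            useKafkaSource, allowNullForDeletedCols});
                }
            }
        }
        return b;
    }

    public static void main(String[] args) {
        // 2 x 2 x 2 = 8 combinations instead of the full 2 * 2^7 = 256
        System.out.println(testArgs().size()); // prints 8
    }
}
```

The trade-off is the usual one for pruned matrices: interactions between the pinned flags and the varied ones are only exercised at the pinned values.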

```
b.add(Arguments.of("COPY_ON_WRITE", true, true, true, true, true));
```
@vinothchandar (Member) commented:

Similar: could we just come up with the "superset"/higher-complexity combination here? Or is that it?

@vinothchandar vinothchandar merged commit 960c395 into apache:master Jan 17, 2024
31 checks passed
yihua pushed a commit that referenced this pull request Feb 27, 2024
…ests (#10492)

* reduce combos of tests

* build success

---------

Co-authored-by: Jonathan Vexler <=>
yihua pushed a commit that referenced this pull request May 3, 2024
…ests (#10492)

* reduce combos of tests

* build success

---------

Co-authored-by: Jonathan Vexler <=>
Projects
Status: ✅ Done

4 participants