Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds IT that verifies scans see data written by concurrent writers #3639

Merged
merged 5 commits into from
Jul 24, 2023

Conversation

keith-turner
Copy link
Contributor

The IT ensures that when there are concurrent writes, scans, and table operations that scans always see any data written before the scan started.

The IT ensures that when there are concurrent writes, scans, and table
operations that scans always see any data written before the scan
started.
@keith-turner
Copy link
Contributor Author

Created this IT for testing #3634, however realized its good for 2.1.X and #3573 to stress read patterns the metadata table relies on, so decided to create it in 2.1 and merge it forward to elasticity.

@dlmarion
Copy link
Contributor

This test is a great idea

@keith-turner
Copy link
Contributor Author

I added merge, bulk import, compaction filtering, and batch scan to the test in 7cbc635. I also added some debugging tips based on a problem I ran into where bulk importing deletes was causing the test to fail. Spent hours trying to track down the cause and finally developed a technique that made tracking the problem down really quick. I documented the technique in the code and left a utility method related to the technique in the test.

The problem I ran into is documented here in the test. The problem is caused by the fact that major compactions no longer force a flush, this is a change made some time in the past, trying to find an issue for that. Not flushing before a full major compaction is ok when only doing live ingest or only doing bulk import, but for a mix of bulk import and live ingest not flushing before full major compaction has a consequence as I discovered in the test.

@@ -105,7 +105,7 @@ public int hashCode() {

@Override
public String toString() {
return size + " " + numEntries;
return size + " " + numEntries + " " + time;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that this adds to the existing pattern - but would it be worth it to use a more standardized toString format that includes the field names? Otherwise, you need to read the code to understand that '12 1000 1234` is 'size: 12, numEntries 1000, time 1234'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to improve the toString. I am thinking maybe I should not make this change in 2.1 because I didn't analyze how this toString method was used. I made the change when trying to figure out the problem with bulk importing delete keys. I'll either improve it or not change it after looking into how its used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That class is used in a huge number of places. I decided to err on the side of caution and not change its toString rather try to analyze everywhere its used.

Copy link
Contributor

@EdColeman EdColeman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestion on the toString, but that can be ignored if there is a reason not to use a standard toString format,

@keith-turner keith-turner merged commit edc16e5 into apache:2.1 Jul 24, 2023
* This test verifies that scans will always see data written before the scan started even when
* there are concurrent scans, writes, and table operations running.
*/
@Tag(SUNNY_DAY)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to add this to the sunny tests, as it's not testing base functionality, but it's testing more complicated edge cases that can be left in the larger test suite.

@ctubbsii ctubbsii modified the milestones: 4.0.0, 2.1.2 Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants