-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds IT that verifies scans see data written by concurrent writers #3639
Conversation
The IT ensures that when there are concurrent writes, scans, and table operations that scans always see any data written before the scan started.
test/src/main/java/org/apache/accumulo/test/ScanConsistencyIT.java
Outdated
Show resolved
Hide resolved
test/src/main/java/org/apache/accumulo/test/ScanConsistencyIT.java
Outdated
Show resolved
Hide resolved
This test is a great idea |
I added merge, bulk import, compaction filtering, and batch scan to the test in 7cbc635. I also added some debugging tips based on a problem I ran into where bulk importing deletes was causing the test to fail. Spent hours trying to track down the cause and finally developed a technique that made tracking the problem down really quick. I documented the technique in the code and left a utility method related to the technique in the test. The problem I ran into is documented here in the test. The problem is caused by the fact that major compactions no longer force a flush, this is a change made some time in the past, trying to find an issue for that. Not flushing before a full major compaction is ok when only doing live ingest or only doing bulk import, but for a mix of bulk import and live ingest not flushing before full major compaction has a consequence as I discovered in the test. |
test/src/main/java/org/apache/accumulo/test/ScanConsistencyIT.java
Outdated
Show resolved
Hide resolved
…java fix check-style error
@@ -105,7 +105,7 @@ public int hashCode() { | |||
|
|||
@Override | |||
public String toString() { | |||
return size + " " + numEntries; | |||
return size + " " + numEntries + " " + time; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize that this adds to the existing pattern - but would it be worth it to use a more standardized toString format that includes the field names? Otherwise, you need to read the code to understand that '12 1000 1234` is 'size: 12, numEntries 1000, time 1234'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to improve the toString. I am thinking maybe I should not make this change in 2.1 because I didn't analyze how this toString method was used. I made the change when trying to figure out the problem with bulk importing delete keys. I'll either improve it or not change it after looking into how its used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That class is used in a huge number of places. I decided to err on the side of caution and not change its toString rather try to analyze everywhere its used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion on the toString, but that can be ignored if there is a reason not to use a standard toString format,
* This test verifies that scans will always see data written before the scan started even when | ||
* there are concurrent scans, writes, and table operations running. | ||
*/ | ||
@Tag(SUNNY_DAY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense to add this to the sunny tests, as it's not testing base functionality, but it's testing more complicated edge cases that can be left in the larger test suite.
The IT ensures that when there are concurrent writes, scans, and table operations that scans always see any data written before the scan started.