Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WT-13428 Implement block manager file write unit tests #11016

Closed
wants to merge 15 commits into from

Conversation

jiechenbo
Copy link
Contributor

Summarize the reason behind this change (this might be the problem you're solving, or the context around the request) and the solution you have chosen.

Copy link

Thanks for creating a pull request! The below questions and checklist are intended to help with verifying your change is well tested. Response is optional, but if you choose to respond please edit this comment.

What makes this change safe?

A good answer to this question helps the reviewers understand where they should focus their attention, so please consider these questions:

  • Is the change risky or not? Why?
  • What tests are you adding or changing? Why?
  • What existing tests are you relying on?
  • What, if anything, are you concerned about that you'd like the reviewer to focus on?
    References:
  • Risk level guide
  • Testing frameworks

Checklist before requesting a review

  • I have performed a self-review of my code.
  • I have made corresponding changes to the documentation (if applicable).
  • I have added/updated tests that demonstrate my fix is effective or that my feature works correctly.


// Test that the block was correctly written.
std::string buf(expected_str.length(), ' ');
block->fh->handle->fh_read(block->fh->handle, (WT_SESSION *)session, offset,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use static_cast<WT_SESSION *>(session) since Server Code Style says Do not use C-style casts, ever.

return block;
}

TEST_CASE("Block: __wti_block_write_off", "[block_write]")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaving some notes for our meeting later today. I see that this test suggests that it tests __wti_block_write_off but then only calls __ut_block_write_off is that intentional?

@@ -545,7 +545,7 @@ __wti_block_alloc(WT_SESSION_IMPL *session, WT_BLOCK *block, wt_off_t *offp, wt_
WT_ASSERT_SPINLOCK_OWNED(session, &block->live_lock);

/* If a sync is running, no other sessions can allocate blocks. */
WT_ASSERT(session, WT_SESSION_BTREE_SYNC_SAFE(session, S2BT(session)));
// WT_ASSERT(session, WT_SESSION_BTREE_SYNC_SAFE(session, S2BT(session)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do a fairly minor mock on the session here to provide a zero flag btree if we want.


#ifdef HAVE_UNITTEST
int
__ut_block_write_off(WT_SESSION_IMPL *session, WT_BLOCK *block, WT_ITEM *buf, wt_off_t *offsetp,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we opt to test this via __wti_block_write_off we won't need to wrap this.

* std::cout << str2 << std::endl;
* REQUIRE((__ut_block_write_off(session->get_wt_session_impl(), block, buf, &offset, &size,
* &checksum, false, false, false)) == 0);
* validate_block_write(session->get_wt_session_impl(), block, str2, offset, size, checksum, 2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_block_write() parameters are in the wrong order.

validate_block_write(session->get_wt_session_impl(), block, offset, size, checksum,
expected_str, buf->size, expected_offset);

/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more readable to use #if 0 to "comment out" code than an actual comment.

}

/*
* Does this need a test? It is hard to test whether the function below respects this or not.
Copy link
Contributor

@tod-johnson-mongodb tod-johnson-mongodb Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test with caller_locked=true is needed. Actually it is easy to check whether caller_lock=true works. It is hard to test whether caller_lock=false works.

  1. __wt_spin_lock(session, &block->live_lock);
  2. __ut_block_write_off() with caller_locked = true. If this does not hang __block_write_off() correctly did not lock the lock.
  3. validate_block_write()
  4. __wt_spin_trylock(session, &block->live_lock) should return EBUSY to show the lock is still locked.
  5. __wt_spin_unlock(session, &block->live_lock) to unlock the lock.

To test caller_locked=false you need two tests.

  1. __wt_spin_trylock(session, &block->live_lock) returns 0 to show the lock was unlocked.
  2. __wt_spin_unlock(session, &block->live_lock); to unlock it again.
  3. __ut_block_off_write() with caller_locked = false.
  4. validate_block_write()
  5. __wt_spin_trylock(session, &block->live_lock) returns 0 to show the lock was unlocked.
  6. __wt_spin_unlock(session, &block->live_lock) to unlock it again.
    This shows the write worked but does not show the block was ever locked.

Something like the below but more complicated.
Thread 1.

  1. __wt_spin_lock(session, &block->live_lock);
  2. int expected_write_time = How long the write is expected to take.
  3. pthread_create() thread2
  4. start_time = read time.
  5. __ut_block_write_off() with caller_locked=false
    Thread 1 hangs here.
  6. end_time = read time.
  7. WT_ASSERT(session, (end_time - start_time) > 2 * expected_write_time); to verify it hung.
  8. validate_block_write()
  9. __wt_spin_trylock(session, &block->live_lock) should return 0 to show the lock is unlocked.
  10. __wt_spin_unlock(session, &block->live_lock);

Thread 2.

  1. sleep(10 * expected_write_time);
  2. __wt_spin_unlock(session, &block->live_lock);
  3. pthread_exit(0);

I said something like the above since the above test would fail if right after pthread_create() there is a context switch from thread 1 to thread 2 and thread 2 completes both steps 1 and 2 before there is a context switch back to thread 1. In that case thread 1 would not hang and the difference end_time - start_time would be too small. This is unlikely in current multicore CPUs plus some OSs context switch during sleep. However it could happen. A more complicated test with critical sections could completely eliminate the false failure.

// Create WT_ITEM buffer and copy a string into it.
WT_ITEM *buf;
REQUIRE(__wt_scr_alloc(session->get_wt_session_impl(), 0, &buf) == 0);
REQUIRE(__wt_buf_initsize(session->get_wt_session_impl(), buf, DEFAULT_BLOCK_SIZE) == 0);
Copy link
Contributor

@luke-pearson luke-pearson Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought things seemed a bit odd here, especially the call to expected_str.copy. I didn't anticipate the rabbit hole I would get myself into. But I'm starting to understand the issues that this test has. We need to think about what we're testing here and I'm still wrapping myself around that question. Our first test is trying to write a string to file.

The more WT way of setting the buffer would be:

    WT_BLOCK *block = create_block(session, cp);
    REQUIRE(__wt_block_write_size(session->get_wt_session_impl(), block, &buf_memsize) == 0);
    REQUIRE(__wt_buf_init(session->get_wt_session_impl(), &buf, buf_memsize) == 0);
    REQUIRE(__wt_buf_set(session->get_wt_session_impl(), &buf, expected_str.c_str(), expected_str.length()) == 0);

The current implementation of the test doesn't respect the allocation size of the block manager and somehow gets it to return an offset of 5? That piqued my interest. The data itself is size 5 so why would the offset be 5? It should either be 0 as we wrote at the start of the block of 512 as we wrote at the start of the next "chunk" in the block. So I updated the code to my implementation but then I wrote at an offset of 512 which while expected was surprising, so I probed a bit further and found that this line in this change is why I wrote at 512:

    block->size = DEFAULT_BLOCK_SIZE;

Because we're using the in memory file system we're not getting the correct file header information counted when we call __wt_block_open. This originates from here:

WT_ERR(__wt_filesize(session, block->fh, &block->size));
. I tested a normal WT file being opened on disk and this call sets block->size to 4096. But if we use the in memory FS we start at size 0. Which is weird and I don't fully understand yet but explains why we need the size line you added. Still this seem wrong, but then it begs the question of what are we trying to test? Should the file have header info added?

I also noted that for the first test case we are only appending to the block, and wonder if for all test cases that is true. We also set the fit to best but to actually test that we'd need to do some pretty complicated writes and deletions so is that within the scope of this test?

{
// Test offset, size and checksum.
expected_offset += expected_size;
REQUIRE(offset == expected_offset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offset should also always be modulo of alloc size.

@jiechenbo jiechenbo closed this Sep 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants