Blackbox testing and fallback testing #715

AlexGuteniev · 2020-04-14T09:44:24Z

I notice tests I've seen so far are completely black box, this means they don't use internal function, and potentially can test other STL. Is this a requirement?
If so, this raises question how to test fallback scenarios for std::call_once or std::atomic<T>::wait. As I understand, CI runs on "normal" machines, not outdated.

The text was updated successfully, but these errors were encountered:

miscco · 2020-04-14T11:16:06Z

Not a Maintainer but my 2 Cents

Test should never depend on internal machinery but only externally visible public API. You get in all sorts of trouble because to access internal machinery you generally need to expose it and once you do all hell breaks loose. You might want to have a look on @tituswinters talks about refactoring and maintainability

Technically there is also no reason why any of those tests should not run against a different standard library implementation (modulo the obvious not yet implemented features). It is a standard after all.

With fallbacks do you mean unsupported systems such as windows XP? You can always run the test suite manually on your local machine for whatever platform you desire.

AlexGuteniev · 2020-04-14T11:37:29Z

With fallbacks do you mean unsupported systems such as windows XP?

For call_once it is just XP/2003.
However for #593 it also means Vista or Win7.
Atomic wait has three OS level support:

Native Win8+
SRW Lock+Cond.Var. Vista+
Timed backoff XP/2003

The later is for internal use by parallel_algorithms.cpp or call_once, or shared_ptr, whatever.

You can always run the test suite manually on your local machine for whatever platform you desire.

So I need at least two machines: 64 bit XP or 2003 and 64 bit Vista or Win7. The first is sort of exotic today.

Anyway, actually I'm running these tests by temporary altering the implementation (changing a variable in debugger), so that's not a problem to test these locally.

Not having these tests in CI is more of concern.

StephanTLavavej · 2020-04-14T12:06:32Z

As a general rule, we test the Standard interface for Standard semantics. Occasionally we test implementation-specific guarantees, commenting that we’re doing so. In exceptional cases, we directly test internal machinery, usually when doing so provides better correctness validation than would easily be achievable. The Boyer-Moore algorithm tests are one example.

At this time, we haven’t attempted to run our tests against other STL implementations, although that is potentially interesting in the future, so remaining disciplined about non-Standard testing (keeping it uncommon and prominently noted) is good for multiple reasons.

Also in exceptional cases, we can arrange for test code to trigger fallback paths; there was one such example in our ConcRT adapting layer, although we never used that in automated testing. (Allocation failure fallbacks in stable_sort can be triggered through Standard code.)

I think your scenario justifies an exception for using non-Standard machinery in test code.

tituswinters · 2020-04-14T12:08:54Z

It is distinctly an anti-pattern for both your tests and the APIs under test to need to reach into internals / implementation details.

If you must do that, it is suggestive that the public APIs in question are at the wrong level of granularity (you have to pierce that abstraction to evaluate the details in question).

If you do break that abstraction, you've now tightly coupled the tests and the implementation, which weakens the validity of the test and adds busywork when internals change in ways that are unobservable by the public API. (See Chapter 12 in my recent book.)

AlexGuteniev · 2020-04-14T13:33:04Z

I do not see a completely clean solution, except running XP and Vista on Contniuos Integration server.

STL API is defined by C++ Standard, which does not imply anything about operating system.

Yet, OS versions are different. It is suboptimal not to take advantage of Win8+ Native API (performance advantage due to more efficient native implementeation), and it is not possible to avoid obligation to support older OS versions.

I came up with relatively clean solution:
Double underscore __std_atomic_set_api_level function.
Must be called before the implementation internally selects platform API - otherwise has no effect.

One test that does not call it -- it is standard test.
Other two non-standard tests that differ only by calling __std_atomic_set_api_level with 0x600 and 0x502, otherwise sthey share imlementation with the first test.

See this commit: f2256d9

MikeGitb · 2020-04-14T14:20:54Z

Everything else aside: If the code is supposed to work on Windows XP, shouldn't testing on a genuine WinXP Machine (Physical or VM) be part of the regular testing process?

StephanTLavavej · 2020-04-15T02:27:45Z

If you must do that, it is suggestive that the public APIs in question are at the wrong level of granularity (you have to pierce that abstraction to evaluate the details in question).

Yep. Of course, WG21 controls the public API, and even when it's at the right level of granularity, piercing the abstraction is sometimes worth it (very rarely).

Here's the specific example I mentioned - the Boyer-Moore algorithm in the Standard has a very nice interface for users, but it's also worthwhile to punch through and directly inspect the "delta_2" table within:

STL/tests/std/tests/P0220R1_searchers/test.cpp

Lines 26 to 44 in 3141965

    
           // Tests for Boyer-Moore Table2 construction 
        
           // This isn't easily observable from the public interface (other than that matches will be wrong), 
        
           // and the algorithm is self contained, so we're testing it directly. 
        
           void test_case_boyer_moore_table2_construction(string_view sv, initializer_list<ptrdiff_t> expectedValues) { 
        
               assert(sv.size() == expectedValues.size()); 
        
               vector<ptrdiff_t> results(sv.size()); 
        
               equal_to<> eq; 
        
               _Build_boyer_moore_delta_2_table(results.data(), sv.data(), static_cast<ptrdiff_t>(sv.size()), eq); 
        
               assert(equal(results.begin(), results.end(), expectedValues.begin(), expectedValues.end())); 
        
           } 
        
           void test_boyer_moore_table2_construction() { 
        
               // Test cases from Boyer and Moore's original 1977 paper on page 4: 
        
               test_case_boyer_moore_table2_construction("abcxxxabc", {14, 13, 12, 11, 10, 9, 11, 10, 1}); 
        
               test_case_boyer_moore_table2_construction("abyxcdeyx", {17, 16, 15, 14, 13, 12, 7, 10, 1}); 
        
               // Test case from Knuth, Morris, and Pratt's updated 1977 paper, from which our delta2 
        
               // construction algorithm is derived, on page 20: 
        
               test_case_boyer_moore_table2_construction("badbacbacba", {19, 18, 17, 16, 15, 8, 13, 12, 8, 12, 1}); 
        
           }

(We recently discovered in #713 that the Knuth-Morris-Pratt algorithm for constructing the delta_2 table was incorrect - not just the implementation, the published algorithm - and we'll need to add additional test cases here and elsewhere. We should have had randomized testing of the public interface to begin with, but I argue that directly testing the results of this nontrivial internal algorithm provides additional value.)

Another potential example would have been directly testing the bignum implementation powering <charconv> from_chars(), which is otherwise invisible. That implementation has had a correctness issue in the past (now fixed) that direct testing would have been more likely to discover. The commonality is "small simple-looking interface conceals high-powered algorithms and data structures".

AlexGuteniev · 2020-04-15T14:10:21Z

Aside XP, I hope that all tests would run on Vista or 7 at least once before shipping.
In #593 I've added atomic waits into other STL parts, to address #370 and #598 .
Now atomic wait changes have code paths not directly covered by atomic wait tests, but covered by, for example, tr1 tests. Note that it is Vista/7 code path in that change is non-trivial, XP/2003 or 8+ are trivial.

AlexGuteniev · 2020-10-02T06:42:22Z

Everything else aside: If the code is supposed to work on Windows XP, shouldn't testing on a genuine WinXP Machine (Physical or VM) be part of the regular testing process?

The code is no longer supposed to work on Windows XP.

The variation between Vista and Win 7 is mutexes (TryAcquire... for SRWLOCK)
Another variation is atomic wait implementation which is between Win 8 and earlier (WaitOnAddress). Note that odd-sized atomics still use pre-Win 8 fallback on Win 8+, so dedicated pre-Win 8 coverage isn't even essential.

So in short after XP drop, the variation is too low.
Still in a perfect world need genuine Vista SP0 to test.

AlexGuteniev added the question Further information is requested label Apr 14, 2020

StephanTLavavej closed this as completed Apr 14, 2020

StephanTLavavej added the resolved Successfully resolved without a commit label Apr 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Blackbox testing and fallback testing #715

Blackbox testing and fallback testing #715

AlexGuteniev commented Apr 14, 2020

miscco commented Apr 14, 2020

AlexGuteniev commented Apr 14, 2020

StephanTLavavej commented Apr 14, 2020

tituswinters commented Apr 14, 2020

AlexGuteniev commented Apr 14, 2020

MikeGitb commented Apr 14, 2020

StephanTLavavej commented Apr 15, 2020

AlexGuteniev commented Apr 15, 2020

AlexGuteniev commented Oct 2, 2020

Blackbox testing and fallback testing #715

Blackbox testing and fallback testing #715

Comments

AlexGuteniev commented Apr 14, 2020

miscco commented Apr 14, 2020

AlexGuteniev commented Apr 14, 2020

StephanTLavavej commented Apr 14, 2020

tituswinters commented Apr 14, 2020

AlexGuteniev commented Apr 14, 2020

MikeGitb commented Apr 14, 2020

StephanTLavavej commented Apr 15, 2020

AlexGuteniev commented Apr 15, 2020

AlexGuteniev commented Oct 2, 2020