Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

c/driver/common/utils.c: tables are found when they should not #1100

Closed
OleMussmann opened this issue Sep 23, 2023 · 7 comments · Fixed by #1168
Closed

c/driver/common/utils.c: tables are found when they should not #1100

OleMussmann opened this issue Sep 23, 2023 · 7 comments · Fixed by #1168
Assignees

Comments

@OleMussmann
Copy link
Contributor

Certain string matchings fail in c/driver/common/utils.c when trying to find a table by name. A loop iterates through all tables and tries to find a match. The matching code is currently:

if (!strncmp(name.data, table_name, name.size_bytes) { return table; }

name.data is a contiguous buffer of table names, like adbc_pkey_testtest_canceladbc_fkey_test_baseadbc_fkey_testadbc_composite_pk[...]. The table_name (let's say adbc_pkey_test) is now compared to the first name.size_bytes of this buffer, which usually works out. If a non-existing table_name string is composed of an existing table name plus starting characters of another table, a match can be made where there is none. A correctly matching example is:

adbc_pkey_test
adbc_pkey_testtest_canceladbc_fkey_test_baseadbc_fkey_testadbc_composite_pk[...]
             ^
(compared until here)

An example that matches erroneously is:

adbc_pkey_testtest
adbc_pkey_testtest_canceladbc_fkey_test_baseadbc_fkey_testadbc_composite_pk[...]
                 ^
(compared until here)

N.B.: The tables are returned in nondeterministic order, so this might not fail reliably.

Software version: current main branch 8a832d6

@lidavidm
Copy link
Member

ah...fun.

@lidavidm lidavidm added this to the ADBC Libraries 0.8.0 milestone Sep 23, 2023
@lidavidm
Copy link
Member

at some point I'm going to refactor things into C++ (and hopefully C++17) and then we can just use string_view to compare things sensibly

@lidavidm
Copy link
Member

@ywc88 might you be able to fix this for now? guess we have to do a bit of arithmetic and strncmp

@OleMussmann
Copy link
Contributor Author

As a mitigation, you could also compare the lengths of the strings. Like:

if ((name.size_bytes == (int64_t) strlen(table_name)) && (!strncmp(name.data, table_name, name.size_bytes))) { ... }

That should at least reduce the chance of false matches, I'm not 100% sure it's a full fix. By the way, the current method of string comparisons is also used in many other functions, it would need some testing to figure out if a fix needs to be applied somewhere else as well.

@WillAyd
Copy link
Contributor

WillAyd commented Oct 2, 2023

Hmm so the ArrowStringView that is the name member in your code above has a size_bytes that exceeds 14 bytes? I thought the StringView would be tracking that to make sure we don't overrun the comparison length like this

@Tropicanic Tropicanic self-assigned this Oct 4, 2023
@OleMussmann
Copy link
Contributor Author

OleMussmann commented Oct 4, 2023

After some more testing, I think it works a bit different than stated in the issue text above, my apologies. The bug is still there, the explanation might be slightly different, though. StringView works fine and returns the correct length. The problem lies in the way strncmp is used: it evaluates the comparison after name.size_bytes bytes. Imagine I search for a table_name adbc_pkey_test_something_something with a length of 34 bytes. It will match an existing table called adbc_pkey_test with a length of 14 bytes. Let's see this typed out:

adbc_pkey_test_something_something <- long table_name
adbc_pkey_testtest_canceladbc_fkey_test_baseadbc_fkey_testadbc_composite_pk[...]  <- name.data
             ^
(compared until here, name.size_bytes: 14 bytes)

The strncmp comparison stops after 14 bytes (name.size_bytes), not checking the full length of table_name. It finds thus false positive matches when a table name starts with the name of an existing table and extends beyond.

Still, comparing both string lengths should probably help here, e.g. as above:
if ((name.size_bytes == (int64_t) strlen(table_name)) && (!strncmp(name.data, table_name, name.size_bytes))) { ... }

Whether a fix like this is needed in other comparisons, I do not know. I hope this is the correct interpretation now. I ran into this issue by trying to see how certain tests fail, but sometimes they passed where they should not.

@WillAyd
Copy link
Contributor

WillAyd commented Oct 4, 2023

Ah nice find. I think the upfront strlen check makes sense. Should likely be done for all comparisons in the module

lidavidm pushed a commit that referenced this issue Oct 5, 2023
… correctly (#1168)

Fixes #1100 

Test before fix:
```
Expected equality of these values:
  AdbcGetObjectsDataGetTableByName(&mock_data, "mock_catalog", "mock_schema", "table_suffix")
    Which is: 0x16d014ee8
  &mock_table_suffix
    Which is: 0x16d014ea8
arrow-adbc/c/driver/common/utils_test.cc:220: Failure
Expected equality of these values:
  AdbcGetObjectsDataGetColumnByName(&mock_data, "mock_catalog", "mock_schema", "table", "column_suffix")
    Which is: 0x16d014df8
  &mock_column_suffix
    Which is: 0x16d014d48
arrow-adbc/c/driver/common/utils_test.cc:224: Failure
Expected equality of these values:
  AdbcGetObjectsDataGetConstraintByName(&mock_data, "mock_catalog", "mock_schema", "table", "constraint_suffix")
    Which is: 0x16d014d08
  &mock_constraint_suffix
    Which is: 0x16d014cc8
[  FAILED  ] AdbcGetObjectsData.GetObjectsByName (0 ms)
```

Test after fix:
```
$ ctest                                                                           
Test project arrow-adbc/build
    Start 1: adbc-driver-common-test
1/2 Test #1: adbc-driver-common-test ..........   Passed    0.25 sec
    Start 2: adbc-driver-sqlite-test
2/2 Test #2: adbc-driver-sqlite-test ..........   Passed    0.19 sec

100% tests passed, 0 tests failed out of 2

Label Time Summary:
driver-common    =   0.25 sec*proc (1 test)
driver-sqlite    =   0.19 sec*proc (1 test)
unittest         =   0.43 sec*proc (2 tests)

Total Test time (real) =   0.44 sec
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants