forked from apache/kudu
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[maintenance] Add privilege maintenance thread pool for privilege tab… #13
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Equal operator in gflag 'predicates' should be '=' but not '==' Change-Id: I84f8f6958697f79d05c5b7f7ba3c438b05db639c Reviewed-on: http://gerrit.cloudera.org:8080/12699 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon <[email protected]>
The newly-added sysinfo-test consists of scenarios for Linux-specific ParseMaxCpuIndex() function. This function is not implemented on macOS, so the sysinfo-test in its current state makes sense only for Linux Kudu builds. This is a follow-up to feebbb2. Change-Id: I3c954b66534690b2c50e86d15f9c9be9d4ce3ed7 Reviewed-on: http://gerrit.cloudera.org:8080/12703 Reviewed-by: Will Berkeley <[email protected]> Tested-by: Kudu Jenkins
DEPENDENCY_URL is defined in vars.sh and defaults to CLOUDFRONT_URL_PREFIX. CLOUDFRONT_URL_PREFIX is not used directly anywhere else. Change-Id: I948ea2f87d497d6c752a012f2c166d3f9b25a065 Reviewed-on: http://gerrit.cloudera.org:8080/12713 Tested-by: Kudu Jenkins Reviewed-by: Attila Bukor <[email protected]>
Added a basic tserver-level test to use the SplitKeyRange endpoint that would fail without this patch. Change-Id: If444ffa408b3827425ab07ef06ffd6ccc10e926e Reviewed-on: http://gerrit.cloudera.org:8080/12707 Reviewed-by: Adar Dembo <[email protected]> Tested-by: Kudu Jenkins Reviewed-by: Yao Xu <[email protected]> Reviewed-by: Grant Henke <[email protected]>
Adds handling of authz tokens to the Java client. The Java client will now cache tokens upon opening a table, and use them for RPCs that need them (e.g. Writes and Scans), reacquiring them when receiving word that they are expired. This is tested as follows: - TestAuthnTokenReacquire's test for scans and writes is repurposed to also test for reacquisition of authz tokens when they expire - basic tests are added to test the token cache - a test is added to test authz reacquisition in the case that a multi-master deployment undergoes a leadership change - a test is added to test authz reacquisition upon invalid or expired tokens during prolonged workloads against a multi-master deployment Change-Id: Iadd5f7709b45628d7ddd9e2b100d0dfaabbf15af Reviewed-on: http://gerrit.cloudera.org:8080/12279 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo <[email protected]> Reviewed-by: Alexey Serbin <[email protected]> Reviewed-by: Hao Hao <[email protected]>
One reason that GetTableLocations can end up slow is contention on the TSDescriptor object. This patch changes the spinlock to an rw_spinlock. The benchmark improves throughput by about 77%: table_locations-itest --gtest_filter=\*Bench\* \ --benchmark_num_tablets 300 --benchmark_num_threads 30 \ --benchmark_runtime_secs=10 before fix: Count: 10555 Mean: 8567.29 Percentiles: 0% (min) = 68 25% = 7712 50% (med) = 8320 75% = 9152 95% = 10880 99% = 12224 99.9% = 17024 99.99% = 27776 100% (max) = 30384 with rwlock: Count: 18758 Mean: 4024.09 Percentiles: 0% (min) = 136 25% = 3472 50% (med) = 4032 75% = 4512 95% = 5248 99% = 6048 99.9% = 7712 99.99% = 13440 100% (max) = 13976 Change-Id: Id48635e49f11d20fa1802b17a3ff5771632dfd01 Reviewed-on: http://gerrit.cloudera.org:8080/12614 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo <[email protected]> Reviewed-by: Todd Lipcon <[email protected]> Reviewed-by: Alexey Serbin <[email protected]>
This is fallout from merging 7645f5b without rebasing on 8fbf1cc. Change-Id: I2d4ad780390620983d6437c3862a5f6d907788c1 Reviewed-on: http://gerrit.cloudera.org:8080/12721 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo <[email protected]>
Before this patch, a delete operation required that the key columns and only the key columns are set. If any additional column was set, the operation would fail server side. With this patch, any non-key columns are stripped when encoding the row. This allows extra columns to be set in a delete operation client side without inflating the message on the wire. Change-Id: Id302f349a35b9faf7382c74f9a3823f8e76f000c Reviewed-on: http://gerrit.cloudera.org:8080/12705 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin <[email protected]> Reviewed-by: Adar Dembo <[email protected]>
The accesses of ExternalDaemon::bound_rpc_addr() in the test's helper threads raced with the call to RestartAnyCrashedTabletServers() in the main thread; the latter resets ExternalDaemon.status_, the member from which the RPC address is derived. Noticed this on the flaky test dashboard. Change-Id: Iafd901b0efbe205136eae5ea5de31b6fe6565480 Reviewed-on: http://gerrit.cloudera.org:8080/12720 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin <[email protected]> Reviewed-by: Andrew Wong <[email protected]>
The publish script for the binary jar needs to sign each artifact before it can be released in a non-snapshot repository. This patch changes the script to sign the artifacts when invoking it in deploy mode. Additional changes: * Default to the release staging repo instead of the snapshot repo * Fix some script comments Change-Id: I205765a6a9bedba11a7c96d5a56435f7aed5071e Reviewed-on: http://gerrit.cloudera.org:8080/12691 Tested-by: Kudu Jenkins Reviewed-by: Grant Henke <[email protected]>
This test had a couple of test-only bugs related to bookkeeping, resulting in this test being flaky in certain scenarios. This patch fixes those bugs: 1. Ensure that we don't generate duplicate keys in the same list (a.k.a. RowSet) during the test because that can't occur "in the wild" and the MergeIterator does not support it. 2. Ensure that we don't maintain more than one instance of a single row key in the 'expected' result set. Switch to using an ordered map to enforce that test invariant. An additional benefit to using that data structure is that we can skip the sorting step when iterating the expected results. Before this change, the test had a failure rate of about 10% according to the flaky test dashboard. After this change, the tests passes reliably and succeeded 1024/1024 times when I looped it using dist-test: http://dist-test.cloudera.org/job?job_id=mpercy.1552437266.46341 Change-Id: I5ca9cdfa4620b1a9d4144de2289cc80ac37060aa Reviewed-on: http://gerrit.cloudera.org:8080/12735 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo <[email protected]>
Added a test to reproduce deadlock when fork()-ed but not yet exec()-ed subprocess tries to log on a failed execve() call. The test is disabled because it fails. The follow-up changelist will fix the problem and re-enable the test. Change-Id: Ie88073bb7fcccf24b4451fe3c4e4cd8bd471ca67 Reviewed-on: http://gerrit.cloudera.org:8080/12740 Reviewed-by: Adar Dembo <[email protected]> Tested-by: Alexey Serbin <[email protected]>
For a newly fork()-ed, but not yet exec()-ed process, only async-signal safe functions should be called. At least, glog's LOG() should not be called since it involves acquiring a lock while flushing the message into the log sink. Otherwise, the child process can deadlock on attempt to write a log message if glog's mutex has been copied from the parent's address space in a locked state. Change-Id: Ic9dca4ca8b1a6d72c9fc818ea41109c80ace3e39 Reviewed-on: http://gerrit.cloudera.org:8080/12739 Reviewed-by: Adar Dembo <[email protected]> Tested-by: Alexey Serbin <[email protected]>
Change-Id: Ib3acd021ae7e777c6f8cd6384247aea941b5c293 Reviewed-on: http://gerrit.cloudera.org:8080/12737 Tested-by: Andrew Wong <[email protected]> Reviewed-by: Mike Percy <[email protected]>
Unfamiliar with Maven Central, Subversion, and the end-to-end release process, even with the releasing docs, I was unsure of a few things. I've added some minor bits of info that might be helpful for future release managers. Change-Id: I3e6d6f7024cb32f8e0cd726e9d2652fddc688fe9 Reviewed-on: http://gerrit.cloudera.org:8080/12742 Tested-by: Andrew Wong <[email protected]> Reviewed-by: Mike Percy <[email protected]>
This patch makes the KuduScanner iterable to simplify the API for processing the RowResults. It also pushes the keep alive calls down into the KuduScannerIterator so that any user of the KuduClient can automatically leverage the API the same way the Spark integration does. The existing RowResultIterator implementation reuses the same RowResult object for all rows in the batch. This can cause unexpected behavior when storing RowResults from the scanner. This patch changes the RowResultIterator to instead create a RowResult object for every row, while still sharing the underlying data. I have replaced as many places where the scanner is iterated as I can find with this new implementation. The Spark implementations were replaced as well, but a special hasNext method with a callback was required. Change-Id: I3e4ac59e30d0562c0a381d5e304af1dcfdcf5a1a Reviewed-on: http://gerrit.cloudera.org:8080/12715 Reviewed-by: Adar Dembo <[email protected]> Tested-by: Kudu Jenkins
Adds an example of using the Kudu binary Jar to execute integration tests against a KuduMiniCluster. Change-Id: Ife9103557b30b4105ef57ed36a34f3c93ba2dc6d Reviewed-on: http://gerrit.cloudera.org:8080/12318 Reviewed-by: Grant Henke <[email protected]> Tested-by: Grant Henke <[email protected]>
This patch modifies the way how the location assignment is run during the course of tablet server registration with master. Prior to this patch, TSManager's lock on the registry of all tablet servers was held while running the location assignment command. That might lead to stacking many TS registration heartbeats in flight while master's service threads are waiting to acquire the TSManager's lock. This patch introduces changes to run the location assignment command without holding the TSManager's lock. That makes the registration of tablet servers with masters more robust and makes the processing of concurrent heartbeats from tablet servers more efficient overall. Change-Id: I59cd5f6ed19c162a7c9f9a6527e78cab782b4539 Reviewed-on: http://gerrit.cloudera.org:8080/12749 Tested-by: Kudu Jenkins Reviewed-by: Will Berkeley <[email protected]> Reviewed-by: Adar Dembo <[email protected]>
TestRecvFailure wanted an interleaving of client thread and server thread similar to the following: Server: enters echo (recv -> write) loop Server: blocking recv Client: stop the server Client: blocking write to match server's blocking recv Server: blocking write Client: blocking recv to match server's blocking write Client: blocking recv Server: exits echo loop and closes connection because it was stopped Client: blocking recv fails because connection is closed A sleep was used in the client thread to try to ensure that the server reached the blocking recv call before the client shut the server down. However, under TSAN, occasionally the client was able to stop the server before the server entered the echo loop, so the server closed the connection before any data could be sent, failing the test. This patch moves the client call to stop the server to after the first recv-write succeeds, guaranteeing the server is in the echo loop. Without this patch, I saw 10/2000 runs fail in TSAN with 8 stress threads. 8 were due to KUDU-2576. With this patch, I saw 4/2000, all of which were due to a different issue that will be addressed in a follow-up. Change-Id: If95576ddc9e1e23f2db904d5b22bc3b9c1522ea4 Reviewed-on: http://gerrit.cloudera.org:8080/12758 Reviewed-by: Adar Dembo <[email protected]> Reviewed-by: Alexey Serbin <[email protected]> Tested-by: Will Berkeley <[email protected]>
Currently, SentryAuthzProvider::Authorize() performs authorization based on the rules that a privilege implies another when: 1. the authorizable from the former implies the authorizable from the latter, and 2. the action from the former implies the action from the latter, and 3. grant option from the former implies the grant option from the latter. We fetch from Sentry the privileges for the given user that might imply the given authorizable and action (via list_sentry_privileges_by_user_and_itsgroups Sentry API). We then proceed to check the implication rules on the returned actions. However, we rely on the API to enforce the implication rules of the returned authorizables. This ignored the fact that the API returns all privileges granted to the user that match the authorizable of each scope in the input authorizable hierarchy (not just the implied authorizables). So when Alice tries to create table 'default.b' (this requires 'CREATE ON DATABASE'), the Sentry API will return anything that matches: server == "server1" && (db == "default" || db == NULL) which may include 'ALL ON default.a'. Previously we would only take into consideration the fact that 'ALL' is returned, and proceed to incorrectly permit the create operation. This patch adds privilege scope validation to SentryAuthzProvider to ensure only authorizable with a higher scope on the hierarchy can imply authorizables with a lower scope on the hierarchy. Theoretically speaking, an alternative is to add authorizable scope filtering in Sentry server. However, such an API is inherently a Kudu-only API, which by necessity would filter out authorizable wildcard matching (that Sentry's default client policy validation supports). Moreover, client-side filtering can also help enabling more aggressive caching. Change-Id: I89437a04a4fa18e501d21c3abf5d66a2d22ce58a Reviewed-on: http://gerrit.cloudera.org:8080/12500 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo <[email protected]> Reviewed-by: Andrew Wong <[email protected]> Reviewed-by: Alexey Serbin <[email protected]>
Previously, the result of this call wasn't checked, which could lead to, for example, printing out nonsensical and misleading remote addresses, e.g. Network error: BlockingRecv error: failed to read from TLS socket (remote: 0.0.0.0:0): Connection reset by peer (error 104) This patch checks the result status and uses "unknown" for the peer address when it can't be obtained from getpeername: Network error: BlockingRecv error: failed to read from TLS socket (remote: unknown): Connection reset by peer (error 104) Change-Id: Ibd43f30ad11f192463d697f570a997b7e41c7088 Reviewed-on: http://gerrit.cloudera.org:8080/12760 Reviewed-by: Alexey Serbin <[email protected]> Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo <[email protected]>
Very rarely (~3/2000 times in TSAN with 8 stress threads), tls_socket-test will fail with an log like the following: I0314 19:20:54.118880 236 tls_socket-test.cc:109] server: negotiation complete I0314 19:20:54.119151 223 tls_socket-test.cc:109] client: negotiation complete I0314 19:21:04.127199 236 tls_socket-test.cc:165] server echoing 33406976 bytes /data/6/wdberkeley/kudu/src/kudu/security/tls_socket-test.cc:234: Failure Failed Bad status: Network error: BlockingRecv error: failed to read from TLS socket (remote: unknown): Connection reset by peer (error 104) It seems the following is happening: 1. The client and the echo server connect successfully. 2. The client sends its payload of 32MiB (33554432 bytes) in BlockingWrite. 3. The server, while looping in BlockingRecv receiving the payload and through some combination of resource saturation, unfavorable scheduling, and EINTR returns from recv, fails to read the whole payload before timing out. Notice the 10 second delay between the second and third messages (the timeout is 10s) and the number of bytes being echoed of < 32MiB. 4. The server terminates the connection because of the timeout, but this does not result in a failure on its side because the server was stopped by the client. 5. The client fails when it first tries to BlockingRecv from the closed connection, instead of on the second BlockingRecv as the test intends. This seems like a test-only issue- the time out on the server side seems like reasonable behavior. Since it's so rare, tripling the timeout should hopefully make the issue stop or at least make it much, much rarer. With a 10s timeout, 2000 runs on TSAN, and 8 stress threads, I saw 2-4 failures. With a 30s timeout, I see 0. Change-Id: Ibc615ea8f03a74f38b2bd6f3b4c140b3e435d4f3 Reviewed-on: http://gerrit.cloudera.org:8080/12761 Reviewed-by: Alexey Serbin <[email protected]> Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo <[email protected]>
Change-Id: Icce3c63c3facdbad6170f1105e1393297fa94ee1 Reviewed-on: http://gerrit.cloudera.org:8080/12728 Reviewed-by: Grant Henke <[email protected]> Tested-by: Andrew Wong <[email protected]> Tested-by: Grant Henke <[email protected]> (cherry picked from commit 93c2e5e) Reviewed-on: http://gerrit.cloudera.org:8080/12736 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo <[email protected]>
* change docs to say apidocs, cpp-client-api, and docs should be symlinks * index.md should be copied from the previous release as it has the highest chance to conform to the latest download page criteria, which is also linked now. * change the example announcement email to the one of 1.8.0 which includes release highlights as some of the other projects do as it can generate interest. Change-Id: Iacb123272fbfc4e678f898c37c807c20ceaf7d22 Reviewed-on: http://gerrit.cloudera.org:8080/12575 Reviewed-by: Andrew Wong <[email protected]> Tested-by: Andrew Wong <[email protected]>
This toString is shown in Spark UI SQL view, right now what users see in the Spark UI is just the object default toString ("Scan org.apache.kudu.spark.kudu.KuduRelation@8ad78a") which doesn't provide good information. With this change, Spark UI will show the right name and also the table which is being read (so users know what we are actually Scanning). Spark UI SQL will now show "Scan Kudu my_table" (similar to other sources, like Parquet/Hive, which shows "Scan Parquet my_table" ) instead, allowing much easier profiling on complex queries/DFs. So users can see that its a KUDU source and the name of the table. Change-Id: I58e11c36a0c7467baf43f99b9db41e2b4643bec8 Reviewed-on: http://gerrit.cloudera.org:8080/12773 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo <[email protected]> Reviewed-by: Grant Henke <[email protected]>
Adds a write option to repartition the data to match the target Kudu partitions. Additionally provides the option to sort while repartitioning. Repartitioning ensures that one task/client is only writing to a single tablet. This improves throughput by improving batching especially for tables with a large number of partitions. Additionally sorting before writing to Kudu reduces the amount of compactions needed and can improve sustained throughput. Change-Id: I8763615997bccc08901235841149fc3bacb321e7 Reviewed-on: http://gerrit.cloudera.org:8080/12484 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo <[email protected]>
Make the Gradle and Gradle Wrapper download URLs configurable from gradle.properties so they can be changed to a mirror if needed in an internal network behind a firewall. Change-Id: I9317d8a84ccf788892163de1d89999e7749fdb34 Reviewed-on: http://gerrit.cloudera.org:8080/12775 Reviewed-by: Grant Henke <[email protected]> Tested-by: Kudu Jenkins
Separated the generic counters into CacheMetrics to make it possible to use CacheMetrics for other cache types. This changelist also addresses KUDU-2749, separating BlockCacheMetrics and FileCacheMetrics. These changes are introduced to accommodate a few follow-up modifications introducing a TTL-based cache. Change-Id: Ic77b937e23c600763ec987a96ce98a47ad95ee18 Reviewed-on: http://gerrit.cloudera.org:8080/12776 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo <[email protected]>
This is a preparatory changelist to introduce FIFO-based cache in one of follow-up changelists. Change-Id: I963ef41a8097d96fd9530107495dc22066222de4 Reviewed-on: http://gerrit.cloudera.org:8080/12796 Reviewed-by: Adar Dembo <[email protected]> Tested-by: Kudu Jenkins
Change-Id: Ia5eaa8c64edcb35a6936d8699396d45ef8457dea Reviewed-on: http://gerrit.cloudera.org:8080/12768 Reviewed-by: Adar Dembo <[email protected]> Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin <[email protected]>
Added command to reset authz privileges cache into kudu CLI. Synopsis: kudu master authz_cache reset <all_master_addresses_in_cluster> The tool requires all masters' end-points to be specified in the command line. This is to keep things as consistent as possible, since resetting the authz privileges cache only on a subset of masters could lead to unexpected surprises upon master leadership change. It's possible to reset authz privileges cache at any subset of masters in a Kudu cluster adding --force flag. For example, to reset authz cache only at one master out of three in multi-master Kudu cluster, run kudu master authz_cache reset --force <master_address> Added corresponding tests as well. A scenario for a 'positive' case is not among the added ones: it will be a separate effort to bring in HMS+Sentry into the framework for testing kudu CLI tool. However, there is coverage for a 'positive' case for cache reset elsewhere. In particular, the scenarios of the SentryAuthzProviderCacheITest give appropriate coverage using the ResetAuthzCache RPC endpoint. Those scenarios test and AdminCliTest.TestAuthzResetCacheNotImplemented should provide enough coverage. Change-Id: If49f3ebdea22ea0df3aaec313b4db949dc834943 Reviewed-on: http://gerrit.cloudera.org:8080/13139 Reviewed-by: Andrew Wong <[email protected]> Reviewed-by: Hao Hao <[email protected]> Tested-by: Alexey Serbin <[email protected]>
Added a separate trace record for the creation of a new table in HMS catalog if the integration with HMS is enabled. This helps to differentiate HMS-related timings from the timings of writing the information on the newly created table and its tables into the system table. Change-Id: I4946c12fdccf3a661f1dfb9a6baac4428df3d209 Reviewed-on: http://gerrit.cloudera.org:8080/13247 Tested-by: Alexey Serbin <[email protected]> Reviewed-by: Adar Dembo <[email protected]>
It turns out that if a ConnectToCluster "RPC" results in an RPC to each master and each of those RPCs times out, the ConnectToCluster RPC ends up with a ServiceUnavailable status, not a TimedOut status. This updates the test to reflect this. Before this patch, I saw 58/200 failures; after it I saw zero. The 58/200 seems big compared to the frequency of this in precommits, etc, but anyway this fix ought to help regardless of how frequent the problem is. Change-Id: I7d1a429fe5749c31495e0d10f74a0cfb1e623310 Reviewed-on: http://gerrit.cloudera.org:8080/13202 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo <[email protected]> Reviewed-by: Alexey Serbin <[email protected]>
This patch adds a client side version of DELETE IGNORE for use in the restore job. It does this the same way existing client side INSERT IGNORE support was added, by checking and filtering the error code client side. More optimal server side ignore functionality is tracked by KUDU-1563. Change-Id: Ib718b72a73b06d9b9dc809fde3c52a1d498ceb23 Reviewed-on: http://gerrit.cloudera.org:8080/13246 Reviewed-by: Adar Dembo <[email protected]> Reviewed-by: Grant Henke <[email protected]> Tested-by: Grant Henke <[email protected]>
…tions This changes the response for GetTableLocations to send a top-level list of TSInfoPB, and then each replica refers into that list by index. This avoids having to copy a TSInfoPB per replica, and instead bounds the number of serializations to the number of TS in the cluster. In the worst case, this may slightly regress performance if the number of replicas in the response is much fewer than the number of TS in the cluster, since there is some extra overhead of building the index mapping. In the best case (large number of replicas in the response) there is a big improvement. To handle wire compatibility, the server only uses the new response format in the case that the client indicates it expects it by passing a new boolean in the request. The client handles old servers by handling a response in either format. This patch only implements the new protocol in the C++ client. The Java client will be updated in a follow-up patch. Benchmarked a ~5x improvement with: table_locations-itest --gtest_filter=\*Bench\* \ --benchmark_num_tablets 300 --benchmark_num_threads 30 \ --benchmark_runtime_secs=10 with rwlock (previous patch): Count: 22179 Mean: 3901.19 Percentiles: 0% (min) = 52 25% = 3600 50% (med) = 3904 75% = 4192 95% = 4608 99% = 4896 99.9% = 5760 99.99% = 10048 100% (max) = 10661 with these improvements: Count: 109590 Mean: 569.492 Percentiles: 0% (min) = 31 25% = 512 50% (med) = 556 75% = 604 95% = 716 99% = 912 99.9% = 1232 99.99% = 2528 100% (max) = 6336 Change-Id: Ief65d0825e919f681b7ade6a856c103201f8305c Reviewed-on: http://gerrit.cloudera.org:8080/12615 Reviewed-by: Adar Dembo <[email protected]> Tested-by: Will Berkeley <[email protected]>
When serving CreateTable request, run basic validation and authz-related checks before waiting for HMS notification listener to catch up. That's to fail fast if the incoming request doesn't pass basic sanity verification or isn't authorized. Change-Id: Ia407554fcd703d424180218f12f23556f54eefeb Reviewed-on: http://gerrit.cloudera.org:8080/13257 Reviewed-by: Adar Dembo <[email protected]> Tested-by: Alexey Serbin <[email protected]>
Updated the signature of the Lookup() and Insert() methods of the Cache interface to return UniqueHandle. That helps in automating reference counting for the cached items. Also, Release() and Free() methods have been moved into the protected section. With UniqueHandle returned from Lookup() and Insert(), Release() and Free() are no longer required to be the public part of the Cache's interface. Change-Id: I31518cdfa269e6806aa5dda180956896d8cb3404 Reviewed-on: http://gerrit.cloudera.org:8080/13175 Reviewed-by: Adar Dembo <[email protected]> Tested-by: Alexey Serbin <[email protected]>
Table column count is not obvious displayed when it has been altered several times, there maybe some holes in column IDs. This patch show the current column count of table. Change-Id: I8c6d471b6ce4f226b58a29e43f3a0c1c916669b2 Reviewed-on: http://gerrit.cloudera.org:8080/13243 Reviewed-by: Will Berkeley <[email protected]> Tested-by: Kudu Jenkins
This deflakes testForceIncrementalBackup by waiting 1 ms before setting the beforeMs. This is important because backup times are at ms granuarity even though snapshot times are at microsecond/logic granularity. Change-Id: Icaab7ff856e46860f6d4ee999b05afd783e01443 Reviewed-on: http://gerrit.cloudera.org:8080/13262 Tested-by: Grant Henke <[email protected]> Reviewed-by: Will Berkeley <[email protected]>
Change-Id: Ic5c8e0feaead7252c0e66ae4aeacac4932be1a10 Reviewed-on: http://gerrit.cloudera.org:8080/13258 Reviewed-by: Andrew Wong <[email protected]> Tested-by: Kudu Jenkins Reviewed-by: Grant Henke <[email protected]>
acelyc111
force-pushed
the
my_privilege_mops
branch
from
May 8, 2019 02:13
336a1ec
to
15868cf
Compare
A stopgap solution for EasyJson in case when 'size_t' is not the same type as 'uint64_t'. As an alternative, it would be possible to handle that in a more generic way in easy_json.cc, but std::enable_if is not applicable there. The option of putting explicit instantiation of template member function after its specialization and avoiding a warning (-Winstantiation-after-specialization) might be a way to go. This is a follow-up to fd6155f. After the patch mentioned above, but prior to this fix, an attempt to build on macOS would fail with a linker error like below: Undefined symbols for architecture x86_64: "kudu::EasyJson& kudu::EasyJson::operator=<unsigned long>(unsigned long)", referenced from: kudu::master::MasterPathHandlers::HandleTablePage(kudu::WebCallbackRegistry::WebRequest const&, kudu::WebCallbackRegistry::WebResponse*) in master_path_handlers.cc.o ld: symbol(s) not found for architecture x86_64 clang: error: linker command failed with exit code 1 (use -v to see invocation) Change-Id: Ica806031be9af6cc31d9654d3eaa7ac98f616420 Reviewed-on: http://gerrit.cloudera.org:8080/13281 Tested-by: Kudu Jenkins Reviewed-by: Will Berkeley <[email protected]>
This is a small cleanup on the explicit instantiations of template methods in EasyJson class. Relying on the deduction of the template arguments by compiler, it's possible to reduce the clutter. This patch contains no functional changes. Change-Id: I7a735baaa2352d8572887e4bb764fb5382da2f92 Reviewed-on: http://gerrit.cloudera.org:8080/13280 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin <[email protected]>
acelyc111
force-pushed
the
my_privilege_mops
branch
2 times, most recently
from
May 9, 2019 08:03
5371770
to
6ec74df
Compare
acelyc111
force-pushed
the
my_privilege_mops
branch
from
May 12, 2019 01:17
6ec74df
to
5371770
Compare
…les and tablets This commit add a privilege thread pool in maintenance manager for privilege tables and tablets. In a Kudu cluster with thousands of tables and tablets, it's hard for a specified tablet's maintenance OPs to be launched when their scores are not the highest, even if the table the tablet belongs to is high priority for Kudu users. This patch allow user to specify privilege tables and tablets by gflags, these maintenance OPs of these privilege tables and tablets can be launched in a privilege thread pool, so they can have greater chance to be launched. Change-Id: I3ea3b73505157678a8fb551656123b64e6bfb304
acelyc111
force-pushed
the
my_privilege_mops
branch
from
May 14, 2019 09:38
5371770
to
84b07ba
Compare
acelyc111
pushed a commit
that referenced
this pull request
Jun 10, 2019
I noticed a funny failure in TestKuduScanner.testDiffScan: 21:55:41.681 [DEBUG - New I/O worker #13] (AsyncKuduScanner.java:556) Can not open scanner org.apache.kudu.client.NonRecoverableException: snapshot scan start timestamp is earlier than the ancient history mark. Consider increasing the value of the configuration parameter --tablet_history_max_age_sec. Snapshot timestamp: P: 0 usec, L: 0 Ancient History Mark: P: 1558907741677535 usec, L: 0 Physical time difference: -1558907741.678s The client logs the request it sent, which indeed has a 'snap_start_timestamp' of 0. This timestamp is supposed to come from incrementing the propagated timestamp received from the client after some random inserts and mutations. Fortunately, this test has nice logging: it logs the operations it did and the timestamp: 21:55:40.874 [INFO - Time-limited test] (TestKuduScanner.java:250) Before: {} 21:55:40.874 [INFO - Time-limited test] (TestKuduScanner.java:255) startHT: 0 'generateMutationOperations' could generate 0 mutations, causing this test to fail. This requires the random number generator to generate 0 3 times in a row when sampling uniformly from {0, 1, 2, 3, 4}, so the odds of failure due to this defect in the test were only 1/125. I fixed this by adjusting the number of initial inserts generated so it is always at least 1. I ran the test 1250 times with the fix and saw zero failures. Without the fix, I saw 9/1250 failures. Change-Id: Iadb46e5dae71724aa8ff88d04c40ef4eaf1ddd2a Reviewed-on: http://gerrit.cloudera.org:8080/13495 Tested-by: Will Berkeley <[email protected]> Reviewed-by: Alexey Serbin <[email protected]>
acelyc111
pushed a commit
that referenced
this pull request
Sep 21, 2019
The Messenger's lock is only intended to protect closing_, acceptor_pools_, and rpc_services_. This change adjusts its usage to reflect that: 1. There's no need to take the lock in the destructor. 2. It was held for longer than necessary in QueueInboundCall. 3. It wasn't needed at all in DumpConnections. The motivation for this was a TSAN lock inversion warning I saw in a precommit job, between the Messenger lock and glog's vmodule lock. The warning seems wrong (the vmodule lock is released after a VLOG statement ends), but one way to avoid it altogether is to not take the Messenger lock in its destructor. WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=5867) Cycle in lock order graph: M1870 (0x7b14000172f8) => M37857528269694952 (0x000000000000) => M1870 Mutex M37857528269694952 acquired here while holding mutex M1870 in main thread: #0 pthread_rwlock_wrlock /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:1352 (kudu+0x4a360f) #1 glog_internal_namespace_::Mutex::Lock() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/glog-0.3.5/src/base/mutex.h:250:30 (libglog.so.0+0x1abe7) #2 glog_internal_namespace_::MutexLock::MutexLock(glog_internal_namespace_::Mutex*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/glog-0.3.5/src/base/mutex.h:290 (libglog.so.0+0x1abe7) #3 google::InitVLOG3__(int**, int*, char const*, int) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/glog-0.3.5/src/vlog_is_on.cc:199 (libglog.so.0+0x1abe7) #4 kudu::rpc::Messenger::ShutdownInternal(kudu::rpc::Messenger::ShutdownMode) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/rpc/messenger.cc:283:5 (libkrpc.so+0xab101) #5 kudu::rpc::Messenger::AllExternalReferencesDropped() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/rpc/messenger.cc:249:3 (libkrpc.so+0xaaeb7) #6 std::__1::mem_fun_t<void, kudu::rpc::Messenger>::operator()(kudu::rpc::Messenger*) const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/functional:1120:17 (libkrpc.so+0xaf3a5) #7 std::__1::__shared_ptr_pointer<kudu::rpc::Messenger*, std::__1::mem_fun_t<void, kudu::rpc::Messenger>, std::__1::allocator<kudu::rpc::Messenger> >::__on_zero_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3586 (libkrpc.so+0xaf3a5) #8 std::__1::__shared_count::__release_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3490:9 (kudu+0x56affe) #9 std::__1::__shared_weak_count::__release_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3532 (kudu+0x56affe) #10 std::__1::shared_ptr<kudu::rpc::Messenger>::~shared_ptr() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:4468 (kudu+0x56affe) #11 kudu::client::KuduClient::Data::~Data() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/client/client-internal.cc:179:1 (libkudu_client.so+0x136260) #12 kudu::client::KuduClient::~KuduClient() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/client/client.cc:394:3 (libkudu_client.so+0x1130cc) #13 std::__1::default_delete<kudu::client::KuduClient>::operator()(kudu::client::KuduClient*) const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:2285:5 (libkudu_client.so+0x12be1b) #14 std::__1::__shared_ptr_pointer<kudu::client::KuduClient*, std::__1::default_delete<kudu::client::KuduClient>, std::__1::allocator<kudu::client::KuduClient> >::__on_zero_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3586 (libkudu_client.so+0x12be1b) #15 std::__1::__shared_count::__release_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3490:9 (kudu+0x550d1e) #16 std::__1::__shared_weak_count::__release_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3532 (kudu+0x550d1e) #17 std::__1::shared_ptr<kudu::client::KuduClient>::~shared_ptr() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:4468 (kudu+0x550d1e) #18 kudu::tools::LeaderMasterProxy::~LeaderMasterProxy() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_action_common.h:233:7 (kudu+0x576cf9) #19 kudu::tools::(anonymous namespace)::ListMasters(kudu::tools::RunnerContext const&) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_action_master.cc:180:1 (kudu+0x572d0b) #20 _ZNSt3__18__invokeIRPFN4kudu6StatusERKNS1_5tools13RunnerContextEEJS6_EEEDTclclsr3std3__1E7forwardIT_Efp_Espclsr3std3__1E7forwardIT0_Efp0_EEEOSA_DpOSB_ /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/type_traits:4482:1 (kudu+0x52e48b) #21 kudu::Status std::__1::__invoke_void_return_wrapper<kudu::Status>::__call<kudu::Status (*&)(kudu::tools::RunnerContext const&), kudu::tools::RunnerContext const&>(kudu::Status (*&)(kudu::tools::RunnerContext const&), kudu::tools::RunnerContext const&) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/__functional_base:318 (kudu+0x52e48b) #22 std::__1::__function::__func<kudu::Status (*)(kudu::tools::RunnerContext const&), std::__1::allocator<kudu::Status (*)(kudu::tools::RunnerContext const&)>, kudu::Status (kudu::tools::RunnerContext const&)>::operator()(kudu::tools::RunnerContext const&) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/functional:1562:12 (kudu+0x52e3bd) #23 std::__1::function<kudu::Status (kudu::tools::RunnerContext const&)>::operator()(kudu::tools::RunnerContext const&) const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/functional:1916:12 (libkudu_tools_util.so+0x6c1c4) #24 kudu::tools::Action::Run(std::__1::vector<kudu::tools::Mode*, std::__1::allocator<kudu::tools::Mode*> > const&, std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) const /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_action.cc:258:10 (libkudu_tools_util.so+0x6a8d4) #25 kudu::tools::DispatchCommand(std::__1::vector<kudu::tools::Mode*, std::__1::allocator<kudu::tools::Mode*> > const&, kudu::tools::Action*, std::__1::deque<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_main.cc:132:15 (kudu+0x5b42b6) #26 kudu::tools::RunTool(int, char**, bool) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_main.cc:204:16 (kudu+0x5b5211) #27 main /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_main.cc:265:10 (kudu+0x5b557e) Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative warning message Mutex M1870 acquired here while holding mutex M37857528269694952 in thread T8: #0 AnnotateRWLockAcquired /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cc:271 (kudu+0x4d53ff) #1 kudu::rw_spinlock::lock() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/locks.h:112:5 (libkudu_client.so+0x177762) #2 kudu::percpu_rwlock::lock() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/locks.h:222:22 (libkudu_client.so+0x1776f2) #3 std::__1::lock_guard<kudu::percpu_rwlock>::lock_guard(kudu::percpu_rwlock&) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/__mutex_base:104:27 (libkrpc.so+0xac9c9) #4 kudu::rpc::Messenger::~Messenger() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/rpc/messenger.cc:430 (libkrpc.so+0xac9c9) #5 std::__1::default_delete<kudu::rpc::Messenger>::operator()(kudu::rpc::Messenger*) const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:2285:5 (libkrpc.so+0xb246b) #6 std::__1::__shared_ptr_pointer<kudu::rpc::Messenger*, std::__1::default_delete<kudu::rpc::Messenger>, std::__1::allocator<kudu::rpc::Messenger> >::__on_zero_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3586 (libkrpc.so+0xb246b) #7 std::__1::__shared_count::__release_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3490:9 (kudu+0x56affe) #8 std::__1::__shared_weak_count::__release_shared() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:3532 (kudu+0x56affe) #9 std::__1::shared_ptr<kudu::rpc::Messenger>::~shared_ptr() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:4468 (kudu+0x56affe) #10 std::__1::shared_ptr<kudu::rpc::Messenger>::reset() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/memory:4603:5 (libkrpc.so+0xc0771) #11 kudu::rpc::ReactorThread::RunThread() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/rpc/reactor.cc:499 (libkrpc.so+0xc0771) #12 boost::_mfi::mf0<void, kudu::rpc::ReactorThread>::operator()(kudu::rpc::ReactorThread*) const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29 (libkrpc.so+0xca669) #13 void boost::_bi::list1<boost::_bi::value<kudu::rpc::ReactorThread*> >::operator()<boost::_mfi::mf0<void, kudu::rpc::ReactorThread>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::rpc::ReactorThread>&, boost::_bi::list0&, int) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9 (libkrpc.so+0xca5ba) #14 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::rpc::ReactorThread>, boost::_bi::list1<boost::_bi::value<kudu::rpc::ReactorThread*> > >::operator()() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkrpc.so+0xca543) #15 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::rpc::ReactorThread>, boost::_bi::list1<boost::_bi::value<kudu::rpc::ReactorThread*> > >, void>::invoke(boost::detail::function::function_buffer&) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkrpc.so+0xca339) #16 boost::function0<void>::operator()() const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0xba0b1) #17 kudu::Thread::SuperviseThread(void*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/thread.cc:657:3 (libkudu_util.so+0x1ee174) Thread T8 'rpc reactor-588' (tid=5886, running) created by main thread at: #0 pthread_create /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:992 (kudu+0x490e36) #1 kudu::Thread::StartThread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::function<void ()> const&, unsigned long, scoped_refptr<kudu::Thread>*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/thread.cc:601:15 (libkudu_util.so+0x1ed95b) #2 kudu::Status kudu::Thread::Create<void (kudu::rpc::ReactorThread::*)(), kudu::rpc::ReactorThread*>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, void (kudu::rpc::ReactorThread::* const&)(), kudu::rpc::ReactorThread* const&, scoped_refptr<kudu::Thread>*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/thread.h:164:12 (libkrpc.so+0xc5a15) #3 kudu::rpc::ReactorThread::Init() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/rpc/reactor.cc:185:10 (libkrpc.so+0xc026e) #4 kudu::rpc::Reactor::Init() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/rpc/reactor.cc:759:18 (libkrpc.so+0xc4911) #5 kudu::rpc::Messenger::Init() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/rpc/messenger.cc:446:5 (libkrpc.so+0xaad72) #6 kudu::rpc::MessengerBuilder::Build(std::__1::shared_ptr<kudu::rpc::Messenger>*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/rpc/messenger.cc:205:3 (libkrpc.so+0xaa7cd) #7 kudu::client::KuduClientBuilder::Build(std::__1::shared_ptr<kudu::client::KuduClient>*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/client/client.cc:349:3 (libkudu_client.so+0x112561) #8 kudu::tools::LeaderMasterProxy::Init(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, kudu::MonoDelta const&) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_action_common.cc:786:30 (libkudu_tools_util.so+0x7740c) #9 kudu::tools::LeaderMasterProxy::Init(kudu::tools::RunnerContext const&) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_action_common.cc:792:10 (libkudu_tools_util.so+0x774d6) #10 kudu::tools::(anonymous namespace)::ListMasters(kudu::tools::RunnerContext const&) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_action_master.cc:109:3 (kudu+0x572be3) #11 _ZNSt3__18__invokeIRPFN4kudu6StatusERKNS1_5tools13RunnerContextEEJS6_EEEDTclclsr3std3__1E7forwardIT_Efp_Espclsr3std3__1E7forwardIT0_Efp0_EEEOSA_DpOSB_ /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/type_traits:4482:1 (kudu+0x52e48b) #12 kudu::Status std::__1::__invoke_void_return_wrapper<kudu::Status>::__call<kudu::Status (*&)(kudu::tools::RunnerContext const&), kudu::tools::RunnerContext const&>(kudu::Status (*&)(kudu::tools::RunnerContext const&), kudu::tools::RunnerContext const&) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/__functional_base:318 (kudu+0x52e48b) #13 std::__1::__function::__func<kudu::Status (*)(kudu::tools::RunnerContext const&), std::__1::allocator<kudu::Status (*)(kudu::tools::RunnerContext const&)>, kudu::Status (kudu::tools::RunnerContext const&)>::operator()(kudu::tools::RunnerContext const&) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/functional:1562:12 (kudu+0x52e3bd) #14 std::__1::function<kudu::Status (kudu::tools::RunnerContext const&)>::operator()(kudu::tools::RunnerContext const&) const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/c++/v1/functional:1916:12 (libkudu_tools_util.so+0x6c1c4) #15 kudu::tools::Action::Run(std::__1::vector<kudu::tools::Mode*, std::__1::allocator<kudu::tools::Mode*> > const&, std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) const /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_action.cc:258:10 (libkudu_tools_util.so+0x6a8d4) #16 kudu::tools::DispatchCommand(std::__1::vector<kudu::tools::Mode*, std::__1::allocator<kudu::tools::Mode*> > const&, kudu::tools::Action*, std::__1::deque<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_main.cc:132:15 (kudu+0x5b42b6) #17 kudu::tools::RunTool(int, char**, bool) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_main.cc:204:16 (kudu+0x5b5211) #18 main /home/jenkins-slave/workspace/kudu-master/2/src/kudu/tools/tool_main.cc:265:10 (kudu+0x5b557e) Change-Id: I1fd93c06b14bc97a9ac4a37a5b6ca55ffa38f544 Reviewed-on: http://gerrit.cloudera.org:8080/14250 Tested-by: Kudu Jenkins Reviewed-by: Andrew Wong <[email protected]> Reviewed-by: Alexey Serbin <[email protected]>
acelyc111
pushed a commit
that referenced
this pull request
Oct 12, 2019
The KernelStackWatchdog thread runs independently of the test thread, and by calling IsBeingDebugged, it winds up creating a trace event of its own. This is problematic given that trace-test sets up event callbacks to write to test fixture members, which go out of scope in between tests. The only solution I could find was to avoid starting the KernelStackWatchdog in trace-test by passing Thread::NO_STACK_WATCHDOG into thread creation. I also had to do this when creating the trace sampling thread, but given that's not on by default, I don't think it's so bad that we lose watchdog monitoring for it. To test, I ran trace-test under gdb and set a breakpoint in KernelStackWatchdog::RunThread. With the fix, gdb no longer hit that breakpoint. WARNING: ThreadSanitizer: data race (pid=4111) Read of size 8 at 0x0000015ba5c8 by thread T2: #0 kudu::TraceEventCallbackTest::Callback(long, char, unsigned char const*, char const*, unsigned long, int, char const* const*, unsigned char const*, unsigned long const*, unsigned char) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/trace-test.cc:463:5 (trace-test+0x4f107f) #1 kudu::debug::TraceLog::AddTraceEventWithThreadIdAndTimestamp(char, unsigned char const*, char const*, unsigned long, int, long const&, int, char const**, unsigned char const*, unsigned long const*, scoped_refptr<kudu::debug::ConvertableToTraceFormat> const*, unsigned char) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/debug/trace_event_impl.cc:1911:7 (libkudu_util.so+0x1208b3) #2 kudu::debug::TraceEventHandle trace_event_internal::AddTraceEventWithThreadIdAndTimestamp<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >(char, unsigned char const*, char const*, unsigned long, int, long const&, unsigned char, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/debug/trace_event.h:1314:10 (libkudu_util.so+0x146f58) #3 kudu::debug::TraceEventHandle trace_event_internal::AddTraceEvent<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >(char, unsigned char const*, char const*, unsigned long, unsigned char, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/debug/trace_event.h:1330:10 (libkudu_util.so+0x146bef) #4 kudu::(anonymous namespace)::PosixEnv::NewSequentialFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::unique_ptr<kudu::SequentialFile, std::__1::default_delete<kudu::SequentialFile> >*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/env_posix.cc:1077:5 (libkudu_util.so+0x140905) #5 kudu::ReadFileToString(kudu::Env*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, kudu::faststring*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/env.cc:73:19 (libkudu_util.so+0x140054) #6 kudu::IsBeingDebugged() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/os-util.cc:154:14 (libkudu_util.so+0x1c9687) #7 kudu::KernelStackWatchdog::RunThread() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/kernel_stack_watchdog.cc:141:9 (libkudu_util.so+0x17de59) #8 boost::_mfi::mf0<void, kudu::KernelStackWatchdog>::operator()(kudu::KernelStackWatchdog*) const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29 (libkudu_util.so+0x17fd89) #9 void boost::_bi::list1<boost::_bi::value<kudu::KernelStackWatchdog*> >::operator()<boost::_mfi::mf0<void, kudu::KernelStackWatchdog>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::KernelStackWatchdog>&, boost::_bi::list0&, int) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9 (libkudu_util.so+0x17fcda) #10 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::KernelStackWatchdog>, boost::_bi::list1<boost::_bi::value<kudu::KernelStackWatchdog*> > >::operator()() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkudu_util.so+0x17fc63) #11 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::KernelStackWatchdog>, boost::_bi::list1<boost::_bi::value<kudu::KernelStackWatchdog*> > >, void>::invoke(boost::detail::function::function_buffer&) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkudu_util.so+0x17fa59) #12 boost::function0<void>::operator()() const /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkudu_util.so+0x1f1dd1) #13 kudu::Thread::SuperviseThread(void*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/thread.cc:657:3 (libkudu_util.so+0x1ef3f4) Previous write of size 8 at 0x0000015ba5c8 by main thread: #0 kudu::TraceEventCallbackTest::SetUp() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/trace-test.cc:340:16 (trace-test+0x4f3a17) #1 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10 (libgmock.so+0x552ef) #2 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438 (libgmock.so+0x552ef) #3 testing::Test::Run() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2470:3 (libgmock.so+0x343c1) #4 testing::TestInfo::Run() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2656:11 (libgmock.so+0x3574c) #5 testing::TestCase::Run() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2774:28 (libgmock.so+0x36226) #6 testing::internal::UnitTestImpl::RunAllTests() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4649:43 (libgmock.so+0x425fa) #7 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10 (libgmock.so+0x5625f) #8 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438 (libgmock.so+0x5625f) #9 testing::UnitTest::Run() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4257:10 (libgmock.so+0x41ee2) #10 RUN_ALL_TESTS() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/gtest/gtest.h:2233:46 (libkudu_test_main.so+0x351b) #11 main /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/test_main.cc:106:13 (libkudu_test_main.so+0x2cc6) Location is global 'kudu::TraceEventCallbackTest::s_instance' of size 8 at 0x0000015ba5c8 (trace-test+0x0000015ba5c8) Thread T2 'kernel-watcher-' (tid=4116, running) created by main thread at: #0 pthread_create /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:992 (trace-test+0x453c86) #1 kudu::Thread::StartThread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::function<void ()> const&, unsigned long, scoped_refptr<kudu::Thread>*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/thread.cc:601:15 (libkudu_util.so+0x1eebdb) #2 kudu::Status kudu::Thread::CreateWithFlags<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::KernelStackWatchdog>, boost::_bi::list1<boost::_bi::value<kudu::KernelStackWatchdog*> > > >(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::KernelStackWatchdog>, boost::_bi::list1<boost::_bi::value<kudu::KernelStackWatchdog*> > > const&, unsigned long, scoped_refptr<kudu::Thread>*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/thread.h:152:12 (libkudu_util.so+0x17eed1) #3 kudu::KernelStackWatchdog::KernelStackWatchdog() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/kernel_stack_watchdog.cc:71:3 (libkudu_util.so+0x17dc36) #4 Singleton<kudu::KernelStackWatchdog>::CreateInstance() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/gutil/singleton.h:124:18 (libkudu_util.so+0x17f664) #5 Singleton<kudu::KernelStackWatchdog>::Init() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/gutil/singleton.h:117:17 (libkudu_util.so+0x17f604) #6 GoogleOnceInternalInit(int*, void (*)(), void (*)(void*), void*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/gutil/once.cc:43:7 (libgutil.so+0x2d7b3) #7 GoogleOnceInit(GoogleOnceType*, void (*)()) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/gutil/once.h:53:5 (libkudu_util.so+0x113e4d) #8 Singleton<kudu::KernelStackWatchdog>::get() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/gutil/singleton.h:79:5 (libkudu_util.so+0x17f5b1) #9 kudu::KernelStackWatchdog::GetInstance() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/kernel_stack_watchdog.h:87:12 (libkudu_util.so+0x17f423) #10 kudu::KernelStackWatchdog::CreateAndRegisterTLS() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/kernel_stack_watchdog.cc:219:3 (libkudu_util.so+0x17ed17) #11 kudu::KernelStackWatchdog::GetTLS() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/kernel_stack_watchdog.h:170:7 (libkudu_util.so+0x1f2901) #12 kudu::ScopedWatchKernelStack::ScopedWatchKernelStack(char const*, int) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/kernel_stack_watchdog.h:248:13 (libkudu_util.so+0x1f1b70) #13 kudu::Thread::StartThread(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::function<void ()> const&, unsigned long, scoped_refptr<kudu::Thread>*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/thread.cc:600:5 (libkudu_util.so+0x1eebaf) #14 kudu::Status kudu::Thread::Create<void (*)(int, int), int, int>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, void (* const&)(int, int), int const&, int const&, scoped_refptr<kudu::Thread>*) /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/thread.h:170:12 (trace-test+0x4f03ef) #15 kudu::TraceTest_TestChromeTracing_Test::TestBody() /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/trace-test.cc:172:5 (trace-test+0x4e750b) #16 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10 (libgmock.so+0x552ef) #17 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438 (libgmock.so+0x552ef) #18 testing::Test::Run() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2474:5 (libgmock.so+0x344b8) #19 testing::TestInfo::Run() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2656:11 (libgmock.so+0x3574c) #20 testing::TestCase::Run() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2774:28 (libgmock.so+0x36226) #21 testing::internal::UnitTestImpl::RunAllTests() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4649:43 (libgmock.so+0x425fa) #22 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2402:10 (libgmock.so+0x5625f) #23 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:2438 (libgmock.so+0x5625f) #24 testing::UnitTest::Run() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/googletest-release-1.8.0/googletest/src/gtest.cc:4257:10 (libgmock.so+0x41ee2) #25 RUN_ALL_TESTS() /home/jenkins-slave/workspace/kudu-master/2/thirdparty/installed/tsan/include/gtest/gtest.h:2233:46 (libkudu_test_main.so+0x351b) #26 main /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/test_main.cc:106:13 (libkudu_test_main.so+0x2cc6) Change-Id: I5dc974be22ff101dcb8091be1fe692ab61376bc2 SUMMARY: ThreadSanitizer: data race /home/jenkins-slave/workspace/kudu-master/2/src/kudu/util/trace-test.cc:463:5 in kudu::TraceEventCallbackTest::Callback(long, char, unsigned char const*, char const*, unsigned long, int, char const* const*, unsigned char const*, unsigned long const*, unsigned char) Reviewed-on: http://gerrit.cloudera.org:8080/14256 Reviewed-by: Alexey Serbin <[email protected]> Tested-by: Adar Dembo <[email protected]>
acelyc111
pushed a commit
that referenced
this pull request
Jan 12, 2020
Previously the FunctionGaugeDetacher in a Tablet could be destructed after some of the state in the Tablet had already been destructed. Specifically, the metrics would be detached after destructing 'last_rw_time_lock_' et al. This led to some TSAN warnings[1]. Without this patch, RaftConsensusNumLeadersMetricTest TestNumLeadersMetric, which deletes tablets and then immediately fetches metrics, would fail 29/1000 times in TSAN mode. With it, it only failed 7/1000, those due to an unrelated timeout (not addressed in this patch). I went ahead and updated other misordered instances of FunctionGaugeDetachers. As far as I can tell, there aren't any other affected tests. [1]: ================== WARNING: ThreadSanitizer: data race (pid=30286) Write of size 1 at 0x7b58000521c8 by thread T156 (mutexes: write M3961, write M2995): #0 AnnotateRWLockDestroy /data/8/awong/kudu/thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cc:264 (kudu-tserver+0x4ab83e) #1 kudu::rw_spinlock::~rw_spinlock() ../src/kudu/util/locks.h:89:5 (libtserver.so+0x262d4b) #2 kudu::tablet::Tablet::~Tablet() ../src/kudu/tablet/tablet.cc:270:1 (libtablet.so+0x174886) #3 std::__1::default_delete<kudu::tablet::Tablet>::operator()(kudu::tablet::Tablet*) const /data/8/awong/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2338:5 (libtablet.so+0x233026) #4 std::__1::__shared_ptr_pointer<kudu::tablet::Tablet*, std::__1::default_delete<kudu::tablet::Tablet>, std::__1::allocator<kudu::tablet::Tablet> >::__on_zero_shared() /data/8/awong/kudu/thirdparty/installed/tsan/include/c++/v1/memory:3511:5 (libtablet.so+0x2343bf) #5 std::__1::__shared_count::__release_shared() /data/8/awong/kudu/thirdparty/installed/tsan/include/c++/v1/memory:3415:9 (libtserver.so+0x125c9c) #6 std::__1::__shared_weak_count::__release_shared() /data/8/awong/kudu/thirdparty/installed/tsan/include/c++/v1/memory:3457:27 (libtserver.so+0x125c12) #7 std::__1::shared_ptr<kudu::tablet::Tablet>::~shared_ptr() /data/8/awong/kudu/thirdparty/installed/tsan/include/c++/v1/memory:4393:19 (libtserver.so+0x1478b7) #8 std::__1::shared_ptr<kudu::tablet::Tablet>::reset() /data/8/awong/kudu/thirdparty/installed/tsan/include/c++/v1/memory:4528:5 (libtablet.so+0x2622c6) #9 kudu::tablet::TabletReplica::Stop() ../src/kudu/tablet/tablet_replica.cc:318:13 (libtablet.so+0x2578bd) #10 kudu::tserver::TSTabletManager::DeleteTablet(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, kudu::tablet::TabletDataState, boost::optional<long> const&, kudu::tserver::TabletServerErrorPB_Code*) ../src/kudu/tserver/ts_tablet_manager.cc:972:12 (libtserver.so+0x25a568) #11 kudu::tserver::DeleteTabletRunnable::Run() ../src/kudu/tserver/ts_tablet_manager.cc:882:36 (libtserver.so+0x27caa8) #12 kudu::ThreadPool::DispatchThread() ../src/kudu/util/threadpool.cc:685:22 (libkudu_util.so+0x40e2f6) #13 boost::_mfi::mf0<void, kudu::ThreadPool>::operator()(kudu::ThreadPool*) const ../thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29 (libkudu_util.so+0x4357bc) #14 void boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> >::operator()<boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::ThreadPool>&, boost::_bi::list0&, int) ../thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9 (libkudu_util.so+0x43566d) #15 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> > >::operator()() ../thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libkudu_util.so+0x4355b1) #16 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::ThreadPool>, boost::_bi::list1<boost::_bi::value<kudu::ThreadPool*> > >, void>::invoke(boost::detail::function::function_buffer&) ../thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libkudu_util.so+0x4351e0) #17 boost::function0<void>::operator()() const ../thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0x13b9c1) #18 kudu::Thread::SuperviseThread(void*) ../src/kudu/util/thread.cc:675:3 (libkudu_util.so+0x3eed69) Previous atomic write of size 4 at 0x7b58000521c8 by thread T12 (mutexes: write M261907054171829296): #0 __tsan_atomic32_compare_exchange_strong /data/8/awong/kudu/thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:780 (kudu-tserver+0x4b475e) #1 base::subtle::Release_CompareAndSwap(int volatile*, int, int) ../src/kudu/gutil/atomicops-internals-tsan.h:93:3 (libtablet.so+0x1d4842) #2 kudu::rw_semaphore::unlock_shared() ../src/kudu/util/rw_semaphore.h:91:19 (libtablet.so+0x1d4766) #3 kudu::rw_spinlock::unlock_shared() ../src/kudu/util/locks.h:99:10 (libtablet.so+0x1d45da) #4 kudu::shared_lock<kudu::rw_spinlock>::~shared_lock() ../src/kudu/util/locks.h:283:11 (libtablet.so+0x1a3bf5) #5 kudu::tablet::Tablet::LastReadElapsedSeconds() const ../src/kudu/tablet/tablet.cc:1989:1 (libtablet.so+0x17457c) #6 kudu::internal::RunnableAdapter<unsigned long (kudu::tablet::Tablet::*)() const>::Run(kudu::tablet::Tablet const*) ../src/kudu/gutil/bind_internal.h:155:12 (libtablet.so+0x1e5c1c) #7 kudu::internal::InvokeHelper<false, unsigned long, kudu::internal::RunnableAdapter<unsigned long (kudu::tablet::Tablet::*)() const>, void (kudu::tablet::Tablet*)>::MakeItSo(kudu::internal::RunnableAdapter<unsigned long (kudu::tablet::Tablet::*)() const>, kudu::tablet::Tablet*) ../src/kudu/gutil/bind_internal.h:865:21 (libtablet.so+0x1e5a8c) #8 kudu::internal::Invoker<1, kudu::internal::BindState<kudu::internal::RunnableAdapter<unsigned long (kudu::tablet::Tablet::*)() const>, unsigned long (kudu::tablet::Tablet const*), void (kudu::internal::UnretainedWrapper<kudu::tablet::Tablet>)>, unsigned long (kudu::tablet::Tablet const*)>::Run(kudu::internal::BindStateBase*) ../src/kudu/gutil/bind_internal.h:1065:12 (libtablet.so+0x1e59a7) #9 kudu::Callback<unsigned long ()>::Run() const ../src/kudu/gutil/callback.h:396:12 (libtserver.so+0x15acfb) #10 kudu::FunctionGauge<unsigned long>::value() const ../src/kudu/util/metrics.h:1239:22 (libtserver.so+0x15ab1f) #11 kudu::FunctionGauge<unsigned long>::WriteValue(kudu::JsonWriter*) const ../src/kudu/util/metrics.h:1243:19 (libtserver.so+0x15a7bc) #12 kudu::Gauge::WriteAsJson(kudu::JsonWriter*, kudu::MetricJsonOptions const&) const ../src/kudu/util/metrics.cc:716:3 (libkudu_util.so+0x31da17) #13 void kudu::WriteMetricsToJson<std::__1::unordered_map<kudu::MetricPrototype const*, scoped_refptr<kudu::Metric>, std::__1::hash<kudu::MetricPrototype const*>, std::__1::equal_to<kudu::MetricPrototype const*>, std::__1::allocator<std::__1::pair<kudu::MetricPrototype const* const, scoped_refptr<kudu::Metric> > > > >(kudu::JsonWriter*, std::__1::unordered_map<kudu::MetricPrototype const*, scoped_refptr<kudu::Metric>, std::__1::hash<kudu::MetricPrototype const*>, std::__1::equal_to<kudu::MetricPrototype const*>, std::__1::allocator<std::__1::pair<kudu::MetricPrototype const* const, scoped_refptr<kudu::Metric> > > > const&, kudu::MetricJsonOptions const&) ../src/kudu/util/metrics.cc:64:7 (libkudu_util.so+0x321ff2) #14 kudu::MetricEntity::WriteAsJson(kudu::JsonWriter*, kudu::MetricJsonOptions const&) const ../src/kudu/util/metrics.cc:388:3 (libkudu_util.so+0x31a9f4) #15 kudu::MetricRegistry::WriteAsJson(kudu::JsonWriter*, kudu::MetricJsonOptions const&) const ../src/kudu/util/metrics.cc:517:7 (libkudu_util.so+0x31c34a) #16 kudu::server::DiagnosticsLog::LogMetrics() ../src/kudu/server/diagnostics_log.cc:345:3 (libserver_process.so+0xac51a) #17 kudu::server::DiagnosticsLog::RunThread() ../src/kudu/server/diagnostics_log.cc:225:7 (libserver_process.so+0xabc50) #18 boost::_mfi::mf0<void, kudu::server::DiagnosticsLog>::operator()(kudu::server::DiagnosticsLog*) const ../thirdparty/installed/tsan/include/boost/bind/mem_fn_template.hpp:49:29 (libserver_process.so+0xb88ac) #19 void boost::_bi::list1<boost::_bi::value<kudu::server::DiagnosticsLog*> >::operator()<boost::_mfi::mf0<void, kudu::server::DiagnosticsLog>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, kudu::server::DiagnosticsLog>&, boost::_bi::list0&, int) ../thirdparty/installed/tsan/include/boost/bind/bind.hpp:259:9 (libserver_process.so+0xb875d) #20 boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::server::DiagnosticsLog>, boost::_bi::list1<boost::_bi::value<kudu::server::DiagnosticsLog*> > >::operator()() ../thirdparty/installed/tsan/include/boost/bind/bind.hpp:1222:16 (libserver_process.so+0xb8671) #21 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, kudu::server::DiagnosticsLog>, boost::_bi::list1<boost::_bi::value<kudu::server::DiagnosticsLog*> > >, void>::invoke(boost::detail::function::function_buffer&) ../thirdparty/installed/tsan/include/boost/function/function_template.hpp:159:11 (libserver_process.so+0xb82a0) #22 boost::function0<void>::operator()() const ../thirdparty/installed/tsan/include/boost/function/function_template.hpp:770:14 (libkrpc.so+0x13b9c1) #23 kudu::Thread::SuperviseThread(void*) ../src/kudu/util/thread.cc:675:3 (libkudu_util.so+0x3eed69) Change-Id: Ib32120178a68b5389e167643e9bb8b89f8c625b9 Reviewed-on: http://gerrit.cloudera.org:8080/14979 Reviewed-by: Alexey Serbin <[email protected]> Tested-by: Kudu Jenkins
acelyc111
pushed a commit
that referenced
this pull request
Jun 10, 2021
This patch fixes an issue resulting in a SIGABRT crash in Kudu client when working with stale scan tokens which contain information about tablet locations for a table (see KUDU-1802) whose range partition was dropped. The patch also adds a test scenario reproducing the crash; now it passes and can catch future regressions. This patch is a follow-up to d23ee5d. Prior the change in src/kudu/client/meta_cache.cc was back-ported from Kudu 1.14 as part of this fix, the scenario crashed with SIGABRT when running with the stack trace similar to the following (this one below was captured on macOS): * frame #0: 0x00007fff7035833a libsystem_kernel.dylib`__pthread_kill + 10 frame #1: 0x00007fff70414e60 libsystem_pthread.dylib`pthread_kill + 430 frame #2: 0x00007fff702df808 libsystem_c.dylib`abort + 120 frame #3: 0x000000010ca1a259 libglog.0.dylib`google::logging_fail() at logging.cc:1474:3 frame #4: 0x000000010ca19121 libglog.0.dylib`google::LogMessage::SendToLog() [inlined] google::LogMessage::Fail() at logging.cc:1488:3 frame #5: 0x000000010ca1911b libglog.0.dylib`google::LogMessage::SendToLog() at logging.cc:1442 frame #6: 0x000000010ca19815 libglog.0.dylib`google::LogMessage::Flush() at logging.cc:1311:5 frame #7: 0x000000010ca1d76f libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2023:5 frame #8: 0x000000010ca1a5f9 libglog.0.dylib`google::LogMessageFatal::~LogMessageFatal() at logging.cc:2022:37 frame #9: 0x0000000103e365e3 libkudu_client.dylib`std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > >::mapped_type& FindOrDie<std::__1::map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, kudu::client::internal::MetaCacheEntry, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, kudu::client::internal::MetaCacheEntry> > > >() at map-util.h:109:3 frame #10: 0x0000000103e34cbb libkudu_client.dylib`kudu::client::internal::MetaCache::ProcessGetTableLocationsResponse() at meta_cache.cc:943:23 frame #11: 0x0000000103e86166 libkudu_client.dylib`kudu::client::KuduScanToken::Data::PBIntoScanner() at scan_token-internal.cc:192:35 frame #12: 0x0000000103e88051 libkudu_client.dylib`kudu::client::KuduScanToken::Data::DeserializeIntoScanner() at scan_token-internal.cc:111:10 frame #13: 0x0000000103d55d3c libkudu_client.dylib`kudu::client::KuduScanToken::DeserializeIntoScanner() at client.cc:1879:10 Change-Id: I5b8370290c13b1e496f461ed5bc2e0193bdf4b19 Reviewed-on: http://gerrit.cloudera.org:8080/17152 Tested-by: Alexey Serbin <[email protected]> Reviewed-by: Andrew Wong <[email protected]> (cherry picked from commit 7c8dca6) Conflicts: src/kudu/client/meta_cache.cc src/kudu/client/scan_token-test.cc Reviewed-on: http://gerrit.cloudera.org:8080/17158 Tested-by: Kudu Jenkins Reviewed-by: Grant Henke <[email protected]>
acelyc111
pushed a commit
that referenced
this pull request
Jun 10, 2021
It previously possible for a CommitTask to be destructed before completing the loop of scheduling all asynchronous tasks. This led to a race as seen below: WARNING: ThreadSanitizer: data race (pid=32435) Write of size 8 at 0x7b1c000ce2d8 by thread T105 (mutexes: write M424881254664896540): #0 std::__1::__vector_base<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::__destruct_at_end(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/vector:427:12 (txn_commit-itest+0x576cb1) #1 std::__1::__vector_base<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::clear() /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/vector:369:29 (txn_commit-itest+0x5770d1) #2 std::__1::__vector_base<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::~__vector_base() /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/vector:463:9 (txn_commit-itest+0x59caf9) #3 std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::~vector() /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/vector:555:5 (libtransactions.so+0x8c2a0) #4 kudu::transactions::CommitTasks::~CommitTasks() ../src/kudu/transactions/txn_status_manager.h:177:26 (libtransactions.so+0xcce8b) #5 kudu::RefCountedThreadSafe<kudu::transactions::CommitTasks, kudu::DefaultRefCountedThreadSafeTraits<kudu::transactions::CommitTasks> >::DeleteInternal(kudu::transactions::CommitTasks const*) ../src/kudu/gutil/ref_counted.h:153:44 (libtransactions.so+0xcce1a) #6 kudu::DefaultRefCountedThreadSafeTraits<kudu::transactions::CommitTasks>::Destruct(kudu::transactions::CommitTasks const*) ../src/kudu/gutil/ref_counted.h:116:5 (libtransactions.so+0xccdc8) #7 kudu::RefCountedThreadSafe<kudu::transactions::CommitTasks, kudu::DefaultRefCountedThreadSafeTraits<kudu::transactions::CommitTasks> >::Release() const ../src/kudu/gutil/ref_counted.h:144:7 (libtransactions.so+0xccd70) #8 scoped_refptr<kudu::transactions::CommitTasks>::~scoped_refptr() ../src/kudu/gutil/ref_counted.h:266:13 (libtransactions.so+0xbf785) #9 std::__1::pair<long const, scoped_refptr<kudu::transactions::CommitTasks> >::~pair() /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/utility:315:29 (libtransactions.so+0xc7652) #10 void std::__1::allocator_traits<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*> > >::__destroy<std::__1::pair<long const, scoped_refptr<kudu::transactions::CommitTasks> > >(std::__1::integral_constant<bool, false>, std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*> >&, std::__1::pair<long const, scoped_refptr<kudu::transactions::CommitTasks> >*) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/memory:1747:23 (libtransactions.so+0xc7614) #11 void std::__1::allocator_traits<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*> > >::destroy<std::__1::pair<long const, scoped_refptr<kudu::transactions::CommitTasks> > >(std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*> >&, std::__1::pair<long const, scoped_refptr<kudu::transactions::CommitTasks> >*) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/memory:1595:14 (libtransactions.so+0xc7518) #12 std::__1::__hash_node_destructor<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*> > >::operator()(std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*>*) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/__hash_table:844:13 (libtransactions.so+0xc740d) #13 std::__1::unique_ptr<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*>, std::__1::__hash_node_destructor<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*> > > >::reset(std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*>*) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2593:7 (libtransactions.so+0xc72e0) #14 std::__1::unique_ptr<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*>, std::__1::__hash_node_destructor<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*> > > >::~unique_ptr() /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/memory:2547:19 (libtransactions.so+0xc6cbc) #15 std::__1::__hash_table<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, std::__1::__unordered_map_hasher<long, std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, std::__1::hash<long>, true>, std::__1::__unordered_map_equal<long, std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, std::__1::equal_to<long>, true>, std::__1::allocator<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> > > >::erase(std::__1::__hash_const_iterator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*>*>) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/__hash_table:2598:5 (libtransactions.so+0xc676e) #16 std::__1::unordered_map<long, scoped_refptr<kudu::transactions::CommitTasks>, std::__1::hash<long>, std::__1::equal_to<long>, std::__1::allocator<std::__1::pair<long const, scoped_refptr<kudu::transactions::CommitTasks> > > >::erase(std::__1::__hash_map_iterator<std::__1::__hash_iterator<std::__1::__hash_node<std::__1::__hash_value_type<long, scoped_refptr<kudu::transactions::CommitTasks> >, void*>*> >) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/unordered_map:1193:57 (libtransactions.so+0xc5b40) #17 kudu::transactions::TxnStatusManager::RemoveCommitTask(long, kudu::transactions::CommitTasks const*) ../src/kudu/transactions/txn_status_manager.h:433:26 (libtransactions.so+0xbefc6) #18 kudu::transactions::CommitTasks::IsShuttingDownCleanupIfLastOp() ../src/kudu/transactions/txn_status_manager.cc:181:28 (libtransactions.so+0x97dea) #19 kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2::operator()(kudu::Status const&) const ../src/kudu/transactions/txn_status_manager.cc:319:9 (libtransactions.so+0xaefd6) #20 decltype(std::__1::forward<kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2&>(fp)(std::__1::forward<kudu::Status const&>(fp0))) std::__1::__invoke<kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2&, kudu::Status const&>(kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2&, kudu::Status const&) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/type_traits:3530:1 (libtransactions.so+0xaeefd) #21 void std::__1::__invoke_void_return_wrapper<void>::__call<kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2&, kudu::Status const&>(kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2&, kudu::Status const&) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/__functional_base:348:9 (libtransactions.so+0xaee3d) #22 std::__1::__function::__alloc_func<kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2, std::__1::allocator<kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2>, void (kudu::Status const&)>::operator()(kudu::Status const&) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/functional:1533:16 (libtransactions.so+0xaedbd) #23 std::__1::__function::__func<kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2, std::__1::allocator<kudu::transactions::CommitTasks::AbortTxnAsyncTask(int)::$_2>, void (kudu::Status const&)>::operator()(kudu::Status const&) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/functional:1707:12 (libtransactions.so+0xad06c) #24 std::__1::__function::__value_func<void (kudu::Status const&)>::operator()(kudu::Status const&) const /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/functional:1860:16 (libmaster.so+0x32ca24) #25 std::__1::function<void (kudu::Status const&)>::operator()(kudu::Status const&) const /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/functional:2419:12 (libmaster.so+0x31d80b) #26 kudu::transactions::ParticipantRpc::Finish(kudu::Status const&) ../src/kudu/transactions/participant_rpc.cc:227:3 (libtransactions.so+0x7f3e7) ... Previous read of size 8 at 0x7b1c000ce2d8 by thread T186 (mutexes: read M322424363142217872): #0 std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::size() const /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/vector:656:46 (libtransactions.so+0x8d2f9) #1 kudu::transactions::CommitTasks::AbortTxnAsync() ../src/kudu/transactions/txn_status_manager.cc:365:42 (libtransactions.so+0x989d2) #2 kudu::transactions::TxnStatusManager::BeginAbortTransaction(long, boost::optional<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > const&, kudu::tserver::TabletServerErrorPB*) ../src/kudu/transactions/txn_status_manager.cc:1219:25 (libtransactions.so+0xa3cc6) #3 kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3::operator()() const ../src/kudu/transactions/txn_status_manager.cc:378:3 (libtransactions.so+0xb245d) #4 decltype(std::__1::forward<kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3&>(fp)()) std::__1::__invoke<kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3&>(kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3&) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/type_traits:3530:1 (libtransactions.so+0xb2180) #5 void std::__1::__invoke_void_return_wrapper<void>::__call<kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3&>(kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3&) /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/__functional_base:348:9 (libtransactions.so+0xb20e0) #6 std::__1::__function::__alloc_func<kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3, std::__1::allocator<kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3>, void ()>::operator()() /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/functional:1533:16 (libtransactions.so+0xb2080) #7 std::__1::__function::__func<kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3, std::__1::allocator<kudu::transactions::CommitTasks::ScheduleBeginAbortTxnWrite()::$_3>, void ()>::operator()() /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/functional:1707:12 (libtransactions.so+0xb042f) #8 std::__1::__function::__value_func<void ()>::operator()() const /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/functional:1860:16 (libtserver_test_util.so+0x58396) #9 std::__1::function<void ()>::operator()() const /data/3/awong/Repositories/kudu/thirdparty/installed/tsan/include/c++/v1/functional:2419:12 (libtserver_test_util.so+0x58098) ... This patch fixes this by caching the size before iterating. Prior to this patch, the test failed in TSAN mode 3/100 times. With this patch, it passed 1000/1000 times. Change-Id: Ic974354b300f2a6c1b04505e740249273f33b80c Reviewed-on: http://gerrit.cloudera.org:8080/17283 Reviewed-by: Alexey Serbin <[email protected]> Tested-by: Kudu Jenkins
acelyc111
pushed a commit
that referenced
this pull request
Jun 10, 2021
We recently added a few test cases where the client negotiation fails with this error (which is what we expect): GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Server kudu/[email protected] not found in Kerberos database) Apparently SASL doesn't allocate enough memory for this error message in some cases which causes these tests to be flaky with a ~20% error rate with AddressSanitizer enabled: ==9298==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60e00003e2d6 at pc 0x000000530bf4 bp 0x7f8eb50ad0f0 sp 0x7f8eb50ac8a0 READ of size 151 at 0x60e00003e2d6 thread T88 (client-negotiat) #0 0x530bf3 in __interceptor_strlen.part.35 sanitizer_common/sanitizer_common_interceptors.inc:365:5 #1 0x7f8ee6ad9ee8 in std::basic_ostream<char, std::char_traits<char> >& std::operator<<<std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0x113ee8) #2 0x7f8eeb7c9c9b in kudu::rpc::SaslLogCallback(void*, int, char const*) ../src/kudu/rpc/sasl_common.cc:102:29 #3 0x7f8eeb30241c in sasl_seterror (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/libsasl2.so.3+0x1441c) #4 0x7f8edd8f143d in _init (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/sasl2/libgssapiv2.so+0x243d) #5 0x7f8edd8f2452 in _init (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/sasl2/libgssapiv2.so+0x3452) #6 0x7f8eeb2f7844 in sasl_client_step (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/libsasl2.so.3+0x9844) #7 0x7f8eeb2f7bc5 in sasl_client_start (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/libsasl2.so.3+0x9bc5) #8 0x7f8eeb678679 in kudu::rpc::ClientNegotiation::SendSaslInitiate()::$_1::operator()() const ../src/kudu/rpc/client_negotiation.cc:594:14 #9 0x7f8eeb67831c in std::_Function_handler<int (), kudu::rpc::ClientNegotiation::SendSaslInitiate()::$_1>::_M_invoke(std::_Any_data const&) ../../../include/c++/8/bits/std_function.h:282:9 #10 0x7f8ef3b28220 in std::function<int ()>::operator()() const ../../../include/c++/8/bits/std_function.h:687:14 #11 0x7f8eeb7c5840 in kudu::rpc::WrapSaslCall(sasl_conn*, std::function<int ()> const&, char const*) ../src/kudu/rpc/sasl_common.cc:341:12 #12 0x7f8eeb67363b in kudu::rpc::ClientNegotiation::SendSaslInitiate() ../src/kudu/rpc/client_negotiation.cc:593:20 #13 0x7f8eeb66e0c7 in kudu::rpc::ClientNegotiation::AuthenticateBySasl(kudu::faststring*, std::unique_ptr<kudu::rpc::ErrorStatusPB, std::default_delete<kudu::rpc::ErrorStatusPB> >*) ../src/kudu/rpc/client_negotiation.cc:523:14 #14 0x7f8eeb667b99 in kudu::rpc::ClientNegotiation::Negotiate(std::unique_ptr<kudu::rpc::ErrorStatusPB, std::default_delete<kudu::rpc::ErrorStatusPB> >*) ../src/kudu/rpc/client_negotiation.cc:220:7 #15 0x7f8eeb715027 in kudu::rpc::DoClientNegotiation(kudu::rpc::Connection*, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime, std::unique_ptr<kudu::rpc::ErrorStatusPB, std::default_delete<kudu::rpc::ErrorStatusPB> >*) ../src/kudu/rpc/negotiation.cc:218:3 #16 0x7f8eeb712095 in kudu::rpc::Negotiation::RunNegotiation(scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag, kudu::TriStateFlag, kudu::MonoTime) ../src/kudu/rpc/negotiation.cc:295:9 #17 0x7f8eeb74d4ad in kudu::rpc::ReactorThread::StartConnectionNegotiation(scoped_refptr<kudu::rpc::Connection> const&)::$_1::operator()() const ../src/kudu/rpc/reactor.cc:614:3 #18 0x7f8eeb74d06c in std::_Function_handler<void (), kudu::rpc::ReactorThread::StartConnectionNegotiation(scoped_refptr<kudu::rpc::Connection> const&)::$_1>::_M_invoke(std::_Any_data const&) ../../../include/c++/8/bits/std_function.h:297:2 #19 0x71b760 in std::function<void ()>::operator()() const ../../../include/c++/8/bits/std_function.h:687:14 #20 0x7f8ee917d03d in kudu::ThreadPool::DispatchThread() ../src/kudu/util/threadpool.cc:669:7 #21 0x7f8ee91817dc in kudu::ThreadPool::CreateThread()::$_1::operator()() const ../src/kudu/util/threadpool.cc:742:48 #22 0x7f8ee918162c in std::_Function_handler<void (), kudu::ThreadPool::CreateThread()::$_1>::_M_invoke(std::_Any_data const&) ../../../include/c++/8/bits/std_function.h:297:2 #23 0x71b760 in std::function<void ()>::operator()() const ../../../include/c++/8/bits/std_function.h:687:14 #24 0x7f8ee915660a in kudu::Thread::SuperviseThread(void*) ../src/kudu/util/thread.cc:674:3 #25 0x7f8eec6106da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da) #26 0x7f8ee64de71e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e) 0x60e00003e2d6 is located 0 bytes to the right of 150-byte region [0x60e00003e240,0x60e00003e2d6) allocated by thread T88 (client-negotiat) here: #0 0x5a4bb8 in malloc /home/abukor/src/kudu/thirdparty/src/llvm-9.0.0.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:145:3 #1 0x7f8eeb2fa1df in _buf_alloc (/tmp/dist-test-taskexUtyr/build/dist-test-system-libs/libsasl2.so.3+0xc1df) This patch suppresses address sanitizer errors in sasl_seterror(). Change-Id: Ie66e1f14c9750b13676c7e28e6439057a5e73341 Reviewed-on: http://gerrit.cloudera.org:8080/17317 Tested-by: Attila Bukor <[email protected]> Reviewed-by: Alexey Serbin <[email protected]> Reviewed-by: Grant Henke <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
…les and tablets
This commit add a privilege thread pool in maintenance manager for
privilege tables and tablets.
In a Kudu cluster with thousands of tables and tablets, it's hard for
a specified tablet's maintenance OPs to be launched when their scores
are not the highest, even if the table the tablet belongs to is high
priority for Kudu users. This patch allow user to specify privilege
tables and tablets by gflags, these maintenance OPs of these privilege
tables and tablets can be launched in a privilege thread pool, so they
can have greater chance to be launched.