BigQuery: 404 table not found even when the table exists #975
Comments
There is a delay between creating a table and its availability.

(Screenshots "Test 1" and "Test 2" showing the two requests were attached here but are not preserved.)
I think you answered your own question. Creating a table is an eventually consistent operation: it may not appear to exist everywhere until after some time has passed. In the screenshots above, the second request, which happens 2 seconds after the first, may end up at a different location where the creation hasn't yet propagated. Have you considered Uploader.TableTemplateSuffix as a way to create the table automatically?
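A minimal sketch of the template-suffix approach, assuming a base table named events_template already exists in the target dataset with the desired schema (dataset and table names are placeholders; older releases of the package expose this on Uploader rather than Inserter):

```go
package bqsketch

import (
	"context"

	"cloud.google.com/go/bigquery"
)

// streamWithTemplate sends rows to the base table "events_template" plus the
// given suffix; BigQuery creates the suffixed table automatically from the
// base table's schema, so no explicit Create call is needed.
func streamWithTemplate(ctx context.Context, client *bigquery.Client, suffix string, rows []*bigquery.ValuesSaver) error {
	ins := client.Dataset("my_dataset").Table("events_template").Inserter()
	ins.TableTemplateSuffix = suffix // e.g. "_20180417"; target becomes events_template_20180417
	return ins.Put(ctx, rows)
}
```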
I started with TableTemplateSuffix, but I get the impression that it can't use a table from one dataset as a template for a new table in a different dataset. So if I create a new dataset, there is no way to base my new table on anything, and the only option I am left with is to create it from a schema. Please correct me if I am wrong here.

I am creating datasets and tables dynamically based on the fields of the event. If the insertAll request fails, I create either both the dataset and the table, or just the table, depending on the error response (I parse the error to know what does not exist). I am reading from Kinesis streams and plan to use Lambda for streaming into BQ.

Do you know what strategy Apache Beam uses to get around these limitations of dynamically creating datasets/tables and streaming data into them without waiting for 10 minutes and retrying?
Yes, BigQuery's streaming inserts definitely exhibit eventually consistent behaviors when you reactively create a table in response to a streaming notFound response. This is covered in the streaming section of the BigQuery troubleshooting docs.

However, it's not clear to me from the comments whether you're dealing with constantly evolving schemas, or if there's a known set of messages/schemas and you're just inspecting and redirecting them based on message structure. If it's the latter case, having the tables created beforehand should likely suffice, as streaming to a table that already exists should encounter none of the eventual consistency issues you're observing with your current approach.

However, if the schemas are truly dynamic, one of the things you might consider is leveraging BigQuery's ability to use complex types, particularly arrays of structs. Consider the following schema:
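(The exact original schema was not preserved; the following is a rough reconstruction in Go, using the field names described in the next paragraph.)

```go
package bqsketch

import "cloud.google.com/go/bigquery"

// eventSchema sketches the kind of schema described below: fixed top-level
// fields plus a repeated "metrics" RECORD that absorbs arbitrary key/value
// pairs, so new metric names never require schema evolution.
var eventSchema = bigquery.Schema{
	{Name: "event_time", Type: bigquery.TimestampFieldType, Required: true},
	{Name: "sender", Type: bigquery.StringFieldType},
	{Name: "destination", Type: bigquery.StringFieldType},
	{Name: "metrics", Type: bigquery.RecordFieldType, Repeated: true, Schema: bigquery.Schema{
		{Name: "key", Type: bigquery.StringFieldType},
		{Name: "value", Type: bigquery.StringFieldType},
	}},
}
```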
Within it, you have a set of known top-level fields (event_time, sender, destination, etc.), but "metrics" is an array of key/value structs, which can receive many unrelated values and does not require schema evolution. You can access, manipulate, and filter the arrays as needed using Standard SQL, which has its own topic in the query documentation.
Closing, but please reopen if you have more questions or concerns.
@jba My main concern is that nowhere in the documentation does it mention anything like "the table will eventually be there", and no recommended solutions are offered either. Everyone has to figure this out on their own, which is wasteful. I spent a few hours yesterday thinking I was going mad. It would be nice if the user experience were better in this regard.
Hello, I've got the same issue. Due to some reasons I can't use …. It would be really useful to have some kind of ….
Facing this issue again. @jba, can we reopen this issue? Maybe you can consider some technique that allows one to understand whether a table and dataset have been created successfully and can be used without getting a 404. Getting the metadata of those objects doesn't help: even when that returns successfully, I still get 404 when inserting or reading data.
@shollyman sorry to bother you, but maybe you can help with this?
Hi everyone. This problem happens for me too. My solution is to add a delay before inserting into the new table. My code is in .NET 5, but the logic applies to this problem as well (see the Go sketch below).
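The original .NET 5 code is not reproduced here; a rough Go sketch of the same delay-based workaround (placeholder names, arbitrary delay) would be:

```go
package bqsketch

import (
	"context"
	"time"

	"cloud.google.com/go/bigquery"
)

// createThenInsertWithDelay creates a table and waits before the first insert.
// The sleep only reduces the chance of a 404; it does not guarantee the
// streaming backends have already seen the new table.
func createThenInsertWithDelay(ctx context.Context, client *bigquery.Client, schema bigquery.Schema, rows []*bigquery.ValuesSaver) error {
	table := client.Dataset("my_dataset").Table("my_new_table") // placeholder names
	if err := table.Create(ctx, &bigquery.TableMetadata{Schema: schema}); err != nil {
		return err
	}
	time.Sleep(2 * time.Minute) // arbitrary delay before the first streaming insert
	return table.Inserter().Put(ctx, rows)
}
```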
Same issue for me, but with a different client (nodejs). I believe it's not specific to the Go client; it's a server-side issue.
Yup, it's present everywhere, including the nodejs client.
For nodejs you can re-run the script with child_process after a setTimeout of 2 minutes.
Hey, any update on this? Or a more "consistent" workaround? I'm using the nodejs client and I'm experiencing the same issue. I've tried adding up to 5 minutes of timeout between table creation and data insert, but it's still very unreliable.
Whenever I create the dataset, remove it due to some errors, and then create it again and add a table to it, I get this error. But when I just create a new dataset with a different name, everything works without a problem.
Experiencing this issue in Python after creating a new table.
I had to add a small delay, e.g., ….
When I call …: if an existence check passes, then a 404 with the message "Table X not found" is not the right message to return from insert, since the table clearly does exist. This kind of error for something I want to exist implies that the action I should take to fix it is to create it, but in this case I shouldn't, because it already exists. Instead I should be told to wait. I get "eventually consistent", but I'm going to need the API to be consistent on its definition of existence.

I'm disappointed that the best guidance appears to be gambling on a sufficiently large sleep. I'm probably going to put a retry on the insert with an exponential wait (see the sketch below). It would be nice if this kind of logic was built into the SDK, or if there was an availability check, since existence appears not to provide that assurance.
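A minimal sketch of that retry-with-backoff idea in Go, simply retrying the insert whenever the API returns a 404 (attempt limit and backoff values are placeholders):

```go
package bqsketch

import (
	"context"
	"errors"
	"time"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/googleapi"
)

// insertWithRetry retries the streaming insert while the backend still
// reports the table as not found, backing off exponentially between attempts.
func insertWithRetry(ctx context.Context, table *bigquery.Table, rows []*bigquery.ValuesSaver) error {
	backoff := time.Second
	var err error
	for attempt := 0; attempt < 8; attempt++ { // placeholder attempt limit
		if err = table.Inserter().Put(ctx, rows); err == nil {
			return nil
		}
		// Only retry the eventual-consistency case: a 404 from the API.
		var apiErr *googleapi.Error
		if !errors.As(err, &apiErr) || apiErr.Code != 404 {
			return err
		}
		select {
		case <-time.After(backoff):
			backoff *= 2
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}
```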
There's effectively nothing we can do here, as this is a classic eventual consistency issue. We document the behavior, but I'll try to explain the "why" in a bit more detail.

When you intermingle operations that change table metadata and stream data into a table, you're likely to observe the effects of this eventually consistent behavior. The streaming system, by nature of its vastly different scale, caches table metadata aggressively, in a combination of shared and individual caches.

Generally, the pattern that causes users the most problems is a stream -> create table -> stream pattern. The manifestation often looks something like the following:

1. Stream rows to a table that does not yet exist and receive a notFound response.
2. Create the table in response to that error.
3. Stream again, and continue to receive notFound responses for a time even though the table now exists.
It's the first streaming call that triggers the problem here. The call requires the streaming system to load the table's metadata state, and at that moment the table doesn't exist. This negative existence is cached by the streaming system for a time, even if the table is created immediately afterwards. Subsequent streaming calls leverage their cached metadata, and thus reject inserts until the cached table metadata expires and gets refreshed. Callers receive inconsistent responses because each streaming backend instance may have a slightly different cache state.

Generally, the best thing to do is to design your interactions so that you don't change table metadata while interacting with the streaming system. In the previous example, ensuring the table is created before the first streaming call is generally sufficient.

There are other interaction patterns, like deleting and recreating a table with the same name, that will trigger similarly observed behaviors. In that case, rather than caching a negative existence, what users observe is that not all writes appear to arrive in the table. This is because some writes may have been sent to the old (now deleted) table, and some to the new one. Similarly, schema evolution, where the schema of a table is extended, may take some time before all the streaming backends see the updated changes to a given table's schema.

Hopefully this provides some additional background into the nature of the issue.
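A minimal Go sketch of the recommended ordering, assuming the schema is known up front: make sure the table exists before the first streaming call, rather than creating it in reaction to a notFound error (names are placeholders):

```go
package bqsketch

import (
	"context"
	"errors"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/googleapi"
)

// ensureTableThenStream creates the table (ignoring "already exists") before
// any streaming call is made, so the streaming system never caches a
// negative-existence result for it.
func ensureTableThenStream(ctx context.Context, client *bigquery.Client, schema bigquery.Schema, rows []*bigquery.ValuesSaver) error {
	table := client.Dataset("my_dataset").Table("my_table") // placeholder names
	if err := table.Create(ctx, &bigquery.TableMetadata{Schema: schema}); err != nil {
		var apiErr *googleapi.Error
		// 409 Conflict means the table already exists, which is fine here.
		if !errors.As(err, &apiErr) || apiErr.Code != 409 {
			return err
		}
	}
	return table.Inserter().Put(ctx, rows)
}
```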
Client
BigQuery Go Client
Describe Your Environment
Linux 4.15.11-1-ARCH SMP PREEMPT x86_64 GNU/Linux
go version go1.10.1 linux/amd64
Expected Behavior
I am trying to stream into BigQuery.
It should just insert the data without a 404 table not found error, since the table already exists.
Actual Behavior
404 table not found
Before running go run, the package was updated with go get -u cloud.google.com/go/bigquery.
I am able to consistently reproduce it with the code below:
First make sure the dataset exists. In the code below, change tableID to a table which does not exist yet.

1. It will first insert without creating the table, which is expected to fail.
2. Then it will create the new table.
3. Finally, it will insert again, but this time we'll also end up with the 404 table not found error.
Here is my project ID: pureapp-199410 (API request logs might help here)
In checkTable.go I have:
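(The original contents of checkTable.go were not preserved; the following is a rough Go sketch of the described sequence — insert, create, insert again — with placeholder dataset/table names rather than the reporter's exact code.)

```go
package main

import (
	"context"
	"fmt"
	"time"

	"cloud.google.com/go/bigquery"
)

// Item is a minimal row type for the reproduction.
type Item struct {
	Name string
}

func main() {
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, "pureapp-199410") // project ID from the report
	if err != nil {
		panic(err)
	}
	table := client.Dataset("my_dataset").Table("my_new_table") // placeholder names
	ins := table.Inserter()

	// 1. Insert before the table exists: expected to fail with notFound.
	fmt.Println("first insert:", ins.Put(ctx, []*Item{{Name: "a"}}))

	// 2. Create the table.
	schema, err := bigquery.InferSchema(Item{})
	if err != nil {
		panic(err)
	}
	if err := table.Create(ctx, &bigquery.TableMetadata{Schema: schema}); err != nil {
		panic(err)
	}

	// 3. Insert again shortly afterwards: still fails with "404 table not found"
	// because the streaming system has cached the table's non-existence.
	time.Sleep(2 * time.Second)
	fmt.Println("second insert:", ins.Put(ctx, []*Item{{Name: "b"}}))
}
```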