All database nodes crash while executing even the simplest INSERT INTO SELECT FROM query #11186

Closed
SloNN opened this issue Nov 1, 2024 · 11 comments · Fixed by #11474, #11289, #11631 or #12138

@SloNN
Collaborator

SloNN commented Nov 1, 2024

Issuing the most trivial query:

$v = SELECT cast(count(*) as uint32)
    FROM `raw/kikimr_query-replay_prod`
    ;

insert into cnt2 (key,c) values(1,$v);

All database nodes crash.

[Screenshot attached]

@nikvas0
Collaborator

nikvas0 commented Nov 2, 2024

Reproduced this on another cluster.

Nov  2 17:23:50 ydb-vla-testing-0000 kikimr_31003[424327]: VERIFY failed (2024-11-02T17:23:50.083020+0300): tablet_id=44;verification=Stage == from;fline=actor.h:63;from=1;real=0;to=2;
Nov  2 17:23:50 ydb-vla-testing-0000 kikimr_31003[424327]:   ydb/library/actors/core/log.cpp:748
Nov  2 17:23:50 ydb-vla-testing-0000 kikimr_31003[424327]:   ~TVerifyFormattedRecordWriter(): requirement false failed
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 0. /home/ns-vasilev/ydbwork/ydb/util/system/yassert.cpp:83: NPrivate::InternalPanicImpl(int, char const*, char const*, int, int, int, TBasicStringBuf<char, std::__y1::char_traits<char>>, char const*, unsigned long) @ 0xAFF387D
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 1. /home/ns-vasilev/ydbwork/ydb/util/system/yassert.cpp:55: NPrivate::Panic(NPrivate::TStaticBuf const&, int, char const*, char const*, char const*, ...) @ 0xAFED89C
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 2. /home/ns-vasilev/ydbwork/ydb/ydb/library/actors/core/log.cpp:748: NActors::TVerifyFormattedRecordWriter::~TVerifyFormattedRecordWriter() @ 0xBFDFC33
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 3. /home/ns-vasilev/ydbwork/ydb/ydb/core/tx/columnshard/data_reader/actor.h:63: NKikimr::NOlap::NDataReader::TActor::SwitchStage(NKikimr::NOlap::NDataReader::TActor::EStage, NKikimr::NOlap::NDataReader::TActor::EStage) @ 0x140A01DD
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 4. /home/ns-vasilev/ydbwork/ydb/ydb/core/tx/columnshard/data_reader/actor.cpp:38: NKikimr::NOlap::NDataReader::TActor::HandleExecute(TAutoPtr<NActors::TEventHandle<NKikimr::NKqp::TEvKqpCompute::TEvScanError>, TDelete>&) @ 0x140A0753
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 5. /home/ns-vasilev/ydbwork/ydb/ydb/core/tx/columnshard/data_reader/actor.h:85: NKikimr::NOlap::NDataReader::TActor::StateFunc(TAutoPtr<NActors::IEventHandle, TDelete>&) @ 0x140A0DB2
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 6. /home/ns-vasilev/ydbwork/ydb/ydb/library/actors/core/executor_thread.cpp:251: NActors::TGenericExecutorThread::TProcessingResult NActors::TGenericExecutorThread::Execute<NActors::TMailboxTable::THTSwapMailbox>(NActors::TMailboxTable::THTSwapMailbox*, unsigned int, bool) @ 0xBFC10F7
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 7. /home/ns-vasilev/ydbwork/ydb/ydb/library/actors/core/executor_thread.cpp:440: NActors::TGenericExecutorThread::ProcessExecutorPool(NActors::IExecutorPool*)::$_0::operator()(unsigned int, bool) const @ 0xBFB8621
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 8. /home/ns-vasilev/ydbwork/ydb/ydb/library/actors/core/executor_thread.cpp:493: NActors::TGenericExecutorThread::ProcessExecutorPool(NActors::IExecutorPool*) @ 0xBFB8064
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 9. /home/ns-vasilev/ydbwork/ydb/ydb/library/actors/core/executor_thread.cpp:524: NActors::TExecutorThread::ThreadProc() @ 0xBFB8E5F
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 10. /home/ns-vasilev/ydbwork/ydb/util/system/thread.cpp:244: (anonymous namespace)::TPosixThread::ThreadProxy(void*) @ 0xAFF7BFE
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 11. ??:0: ?? @ 0x7F7DC4B91608
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 12. ??:0: ?? @ 0x7F7DC4AB1352
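
For readers unfamiliar with this kind of check: the VERIFY line reports `from=1;real=0;to=2`, and the stack shows it firing in `SwitchStage` while `HandleExecute` processes a `TEvScanError`. The sketch below is a minimal Python model of such a stage guard, not the actual columnshard code. It assumes stages are small integers and that a switch is only legal when the current stage equals `from`; the class, method, and exception names are hypothetical.

```python
class StageError(AssertionError):
    """Raised when a stage switch is requested from an unexpected stage."""


class ReaderActor:
    """Hypothetical stand-in for an actor that tracks a processing stage."""

    def __init__(self):
        # Assumption: a freshly created actor starts in stage 0.
        self.stage = 0

    def switch_stage(self, expected, target):
        # The switch is only valid if the actor is currently in `expected`.
        if self.stage != expected:
            raise StageError(
                f"verification=Stage == from;from={expected};"
                f"real={self.stage};to={target};"
            )
        self.stage = target


actor = ReaderActor()
try:
    # An error event arrives while the actor is still in stage 0,
    # but the handler assumes it is already in stage 1.
    actor.switch_stage(1, 2)
except StageError as err:
    failure = str(err)
    print(failure)
```

Under these assumptions the message has the same shape as the one in the log: an event handled before the actor reached the stage the handler expects. Whether that is the real cause here is not established by the trace alone.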

@vlad-gogov
Collaborator

vlad-gogov commented Nov 5, 2024

Logs from OLAP TESTING VLA COMMON3: https://paste.yandex-team.ru/332eb072-8a16-4af2-9c6a-379bdaeee6c1
Version: ydb-stable-24-3-12

@nikvas0
Collaborator

nikvas0 commented Nov 6, 2024

@vlad-gogov
Collaborator

After applying the changes from #11289 and locally reproducing the issue, I received the following message: https://paste.yandex-team.ru/73c8aacb-8373-4686-8978-0219fe149401
[Screenshot attached]

@nikvas0
Collaborator

nikvas0 commented Nov 7, 2024

Looks like the snapshot really is older than the minimum read snapshot:

Snapshot too old: {1730969344010:844424930162099}. CS min read snapshot: {1730975557090:max}. now: 2024-11-07T10:37:37.171747Z,
❯ date -uR -r 1730969344
Thu, 07 Nov 2024 08:49:04 +0000
❯ date -uR -r 1730975557
Thu, 07 Nov 2024 10:32:37 +0000
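
The same comparison can be done in one step. This is a minimal sketch, assuming the first component of each `{a:b}` pair in the message is a timestamp in milliseconds since the Unix epoch (consistent with the `date` calls above, which drop the last three digits). The helper names are made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

MESSAGE = (
    "Snapshot too old: {1730969344010:844424930162099}. "
    "CS min read snapshot: {1730975557090:max}. "
    "now: 2024-11-07T10:37:37.171747Z,"
)


def first_components_ms(message):
    # Each snapshot is printed as {<first>:<second>}; keep only <first>.
    return [int(value) for value in re.findall(r"\{(\d+):[^}]*\}", message)]


def to_utc(ms):
    # Integer arithmetic avoids float rounding on the millisecond part.
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        milliseconds=millis
    )


read_ms, min_read_ms = first_components_ms(MESSAGE)
read_at = to_utc(read_ms)
min_read_at = to_utc(min_read_ms)
lag_ms = min_read_ms - read_ms

print("read snapshot:    ", read_at.isoformat())
print("min read snapshot:", min_read_at.isoformat())
print("lag:              ", timedelta(milliseconds=lag_ms))
```

With the values from the message, the requested snapshot is 6213080 ms (roughly 1 h 43 min) behind the minimum read snapshot.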

@vlad-gogov
Collaborator

vlad-gogov commented Nov 11, 2024

Reproduced locally in a unit test: #11468

@nikvas0 nikvas0 linked a pull request Nov 12, 2024 that will close this issue
@nikvas0 nikvas0 reopened this Nov 12, 2024
@vlad-gogov vlad-gogov linked a pull request Nov 13, 2024 that will close this issue
@vlad-gogov vlad-gogov reopened this Nov 14, 2024
@vlad-gogov vlad-gogov linked a pull request Nov 15, 2024 that will close this issue
vlad-gogov added a commit that referenced this issue Nov 15, 2024
@SloNN
Collaborator Author

SloNN commented Nov 25, 2024

Ran the same query again on the new version, and all database nodes crashed:

$cnt = select cast(count(*) as int64) from `raw/kikimr_ydb_kikimr-log`; insert into cnt3(key,c) values(5,$cnt)

@SloNN SloNN reopened this Nov 25, 2024
@vlad-gogov
Collaborator

vlad-gogov commented Nov 25, 2024

Stack: https://paste.yandex-team.ru/9796dcef-3c46-42c1-8445-c496b2aea87d
Verify:

AFL_VERIFY(it != ConflictedWriteIds.end())("write_id", writeId)("write_ids_count", ConflictedWriteIds.size());

@SloNN
Collaborator Author

SloNN commented Nov 27, 2024

Updated the cluster to version stable-24-3-13-with-logs.83a2a35.
The database nodes crash after executing the same query:

$cnt = select cast(count(*) as int64) from `raw/kikimr_ydb_kikimr-log`; insert into cnt3(key,c) values(5,$cnt)

@vlad-gogov
Collaborator

Stack: https://paste.yandex-team.ru/584d4bfe-22ce-4735-9c72-34617de38158
Verify:

AFL_VERIFY(it != ConflictedWriteIds.end())("write_id", writeId)("write_ids_count", ConflictedWriteIds.size());

@vlad-gogov vlad-gogov linked a pull request Dec 2, 2024 that will close this issue
vlad-gogov added a commit that referenced this issue Dec 6, 2024
@vlad-gogov vlad-gogov reopened this Dec 6, 2024
@vlad-gogov
Collaborator

Logs: https://paste.yandex-team.ru/0bfde892-430c-4b86-84b4-11feeb0491ff
Query: SELECT COUNT(*) FROM `raw/kikimr_ydb_kikimr-log`;
Error: Scan failed at tablet 72075186224042551, reason: task_error:cannot read blob range { Blob: DS:2181038094:[72075186224042551:3:7044832:2:64:4488:0] Offset: 3576 Size: 912 }
