[8.x] Address mapping and compute engine runtime field issues (#117792) #118049

Merged
merged 2 commits into from
Dec 5, 2024

martijnvg
Member

Backports the following commits to 8.x:

This change addresses the following issues:

Fields mapped as runtime fields do not get stored when source mode is synthetic.
Address java.io.EOFException when an ES|QL query uses multiple runtime fields that fall back to source when source mode is synthetic. (1)
Address concurrency issue when runtime fields get pushed down to Lucene. (2)
1: ValueSourceOperator can read values in row-stride or columnar fashion. When values are read in columnar fashion and multiple runtime fields synthesize source, the same SourceProvider can evaluate the same range of doc ids multiple times. This can result in unexpected I/O errors at the codec level, because the same doc value instances are used by the SourceProvider. Re-evaluating the same doc ids violates the contract of the DocIdSetIterator#advance(...) / DocIdSetIterator#advanceExact(...) methods, which document that unexpected behaviour can occur if the target doc id is lower than the current doc id position.

Note that this is only an issue for the synthetic source loader and not for the stored source loader. It also does not occur when executing in row-stride fashion, which sometimes happens in the compute engine and always happens in the _search API.
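
The forward-only contract described in (1) can be illustrated with a toy model. This is a minimal sketch, not Lucene or Elasticsearch code: `ForwardOnlyValues` and `ColumnarReadSketch` are hypothetical names, and the toy reader fails fast with an exception where a real codec would only promise undefined behaviour. The fresh-reader variant is one conceivable way to avoid going backwards and is not claimed to be how this change fixes the problem.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a forward-only doc-values reader; NOT Lucene's API. */
final class ForwardOnlyValues {
    private final long[] values;
    private int current = -1;

    ForwardOnlyValues(long[] values) {
        this.values = values;
    }

    /** Positions on target; models the rule that targets must never move backwards. */
    long advanceExact(int target) {
        if (target < current) {
            // Assumption: the toy throws; a real codec may instead fail in unexpected ways.
            throw new IllegalStateException("target " + target + " is behind current " + current);
        }
        current = target;
        return values[target];
    }
}

final class ColumnarReadSketch {
    /** Columnar read: each field walks the whole doc range in turn through ONE shared reader. */
    static boolean sharedReaderFails(long[] data, int fieldCount) {
        ForwardOnlyValues shared = new ForwardOnlyValues(data);
        try {
            for (int field = 0; field < fieldCount; field++) {
                for (int doc = 0; doc < data.length; doc++) {
                    shared.advanceExact(doc);
                }
            }
            return false;
        } catch (IllegalStateException e) {
            return true; // a later field asked for a doc id behind the reader's position
        }
    }

    /** Same read order, but each field gets a fresh reader, so no reader moves backwards. */
    static List<Long> freshReaderPerField(long[] data, int fieldCount) {
        List<Long> out = new ArrayList<>();
        for (int field = 0; field < fieldCount; field++) {
            ForwardOnlyValues fresh = new ForwardOnlyValues(data);
            for (int doc = 0; doc < data.length; doc++) {
                out.add(fresh.advanceExact(doc));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] data = {10, 20, 30};
        System.out.println(sharedReaderFails(data, 1));   // one field: forward-only access
        System.out.println(sharedReaderFails(data, 2));   // two fields: second pass restarts at doc 0
        System.out.println(freshReaderPerField(data, 2)); // fresh reader per field
    }
}
```

With a single field the shared reader only ever moves forward. The second field is what restarts the doc id range and trips the contract, which matches the observation that the error needs multiple runtime fields read in columnar fashion.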

2: A concurrency issue arises with the source provider if the source operator executes in parallel with data partitioning set to DOC. The same SourceProvider instance then gets accessed by multiple threads concurrently. SourceProvider implementations are not designed to handle concurrent access.
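
One general way to keep a stateful, non-thread-safe object away from concurrent access is to give every thread its own instance. The sketch below shows that pattern with the JDK's `ThreadLocal`. `StatefulProvider`, `PerThreadProvider` and `PerThreadProviderSketch` are hypothetical names used for illustration only; this is not Elasticsearch's SourceProvider API and is not claimed to be the approach taken in this change.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical stand-in for a stateful provider that is not safe to share across threads. */
final class StatefulProvider {
    private int lastDoc = -1; // unsynchronized mutable state

    String source(int doc) {
        lastDoc = doc;
        return "doc-" + lastDoc;
    }
}

/** Hands each calling thread its own provider so mutable state is never shared. */
final class PerThreadProvider {
    private final ThreadLocal<StatefulProvider> local;

    PerThreadProvider(Supplier<StatefulProvider> factory) {
        this.local = ThreadLocal.withInitial(factory);
    }

    StatefulProvider get() {
        return local.get();
    }
}

final class PerThreadProviderSketch {
    /** Starts threadCount threads and counts how many distinct provider instances they saw. */
    static int distinctInstances(int threadCount) {
        PerThreadProvider providers = new PerThreadProvider(StatefulProvider::new);
        Set<StatefulProvider> seen = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                StatefulProvider provider = providers.get();
                provider.source(0);
                seen.add(provider);
                seen.add(providers.get()); // the same thread gets the same instance again
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(4));
    }
}
```

The trade-off of this pattern is one instance per thread instead of one overall, in exchange for never having two threads mutate the same provider state.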

Closes elastic#117644
@martijnvg martijnvg added :StorageEngine/Mapping The storage related side of mappings >bug auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport Team:StorageEngine labels Dec 5, 2024
@martijnvg
Member Author

Unrelated rolling upgrade failures. Two rolling upgrade QA modules fail to form a cluster because of the following error:

[2024-12-05T12:36:17,388][WARN ][o.e.c.c.ClusterFormationFailureHelper] [test-cluster-0] master not discovered or elected yet, an election requires at least 2 nodes with ids from [4t21Aca3TLq0Q8W9fTYrZQ, hER7OzUkTxqLAvrly5Z_zw, vue2IphwQ_egkVUF-5aJKQ], have discovered possible quorum [{test-cluster-0}{vue2IphwQ_egkVUF-5aJKQ}{igMFSvepSD-vWskVI6DaTA}{test-cluster-0}{127.0.0.1}{127.0.0.1:43147}{cdfhilmrstw}{8.18.0}{7000099-8521000}, {test-cluster-1}{hER7OzUkTxqLAvrly5Z_zw}{EZzHS3qdR1yuxiTrtlfa-g}{test-cluster-1}{127.0.0.1}{127.0.0.1:46255}{cdfhilmrstw}{8.17.0}{7000099-8521000}, {test-cluster-2}{4t21Aca3TLq0Q8W9fTYrZQ}{v1ppPFwZTkunXxpl07R86w}{test-cluster-2}{127.0.0.1}{127.0.0.1:35255}{cdfhilmrstw}{8.17.0}{7000099-8521000}] who claim current master to be [{test-cluster-2}{4t21Aca3TLq0Q8W9fTYrZQ}{v1ppPFwZTkunXxpl07R86w}{test-cluster-2}{127.0.0.1}{127.0.0.1:35255}{cdfhilmrstw}{8.17.0}{7000099-8521000}]; discovery will continue using [[::1]:34525, [::1]:43751] from hosts providers and [{test-cluster-0}{vue2IphwQ_egkVUF-5aJKQ}{igMFSvepSD-vWskVI6DaTA}{test-cluster-0}{127.0.0.1}{127.0.0.1:43147}{cdfhilmrstw}{8.18.0}{7000099-8521000}] from last-known cluster state; node term 4, last-accepted version 59 in term 3; for troubleshooting guidance, see https://www.elastic.co/guide/en/elasticsearch/reference/master/discovery-troubleshooting.html
[2024-12-05T12:36:17,838][INFO ][o.e.c.c.JoinHelper       ] [test-cluster-0] failed to join {test-cluster-2}{4t21Aca3TLq0Q8W9fTYrZQ}{v1ppPFwZTkunXxpl07R86w}{test-cluster-2}{127.0.0.1}{127.0.0.1:35255}{cdfhilmrstw}{8.17.0}{7000099-8521000}{ml.max_jvm_size=536870912, ml.config_version=12.0.0, ml.machine_memory=126604136448, ml.allocated_processors_double=32.0, testattr=test, transform.config_version=10.0.0, xpack.installed=true, ml.allocated_processors=32} with JoinRequest{sourceNode={test-cluster-0}{vue2IphwQ_egkVUF-5aJKQ}{igMFSvepSD-vWskVI6DaTA}{test-cluster-0}{127.0.0.1}{127.0.0.1:43147}{cdfhilmrstw}{8.18.0}{7000099-8521000}{ml.max_jvm_size=536870912, ml.config_version=12.0.0, ml.machine_memory=126604136448, ml.allocated_processors_double=32.0, testattr=test, transform.config_version=10.0.0, xpack.installed=true, ml.allocated_processors=32}, compatibilityVersions=CompatibilityVersions[transportVersion=8803000, systemIndexMappingsVersion={.triggered_watches=MappingsVersion[version=1, hash=-502826165], .secrets-inference=MappingsVersion[version=1, hash=-1434574148], .fleet-agents-7=MappingsVersion[version=1, hash=-270798539], .fleet-servers-7=MappingsVersion[version=1, hash=-916922632], .fleet-policies-leader-7=MappingsVersion[version=1, hash=-1108172796], .ml-config=MappingsVersion[version=1, hash=-319778629], .geoip_databases=MappingsVersion[version=1, hash=-305757839], .security-tokens-7=MappingsVersion[version=1, hash=576296021], .snapshot-blob-cache=MappingsVersion[version=1, hash=632712485], .security-profile-8=MappingsVersion[version=2, hash=-909540896], .search-app-1=MappingsVersion[version=1, hash=-501711141], .watches=MappingsVersion[version=1, hash=-1045118511], .fleet-artifacts-7=MappingsVersion[version=1, hash=-1593703898], .query-rules-2=MappingsVersion[version=2, hash=-1272560824], .transform-internal-007=MappingsVersion[version=1, hash=1144737897], .fleet-enrollment-api-keys-7=MappingsVersion[version=1, hash=-1804942283], 
.fleet-actions-7=MappingsVersion[version=1, hash=-2624357], .tasks=MappingsVersion[version=0, hash=-945584329], .fleet-secrets-7=MappingsVersion[version=1, hash=-745394230], .ml-meta=MappingsVersion[version=2, hash=-613742866], .security-7=MappingsVersion[version=3, hash=-832976091], .connector-secrets-1=MappingsVersion[version=1, hash=-745394230], .logstash=MappingsVersion[version=1, hash=-1058806351], .ml-inference-000005=MappingsVersion[version=3, hash=919553140], .inference=MappingsVersion[version=2, hash=-1459421596], .async-search=MappingsVersion[version=0, hash=-1403744380], .fleet-policies-7=MappingsVersion[version=1, hash=-201702522], .synonyms-2=MappingsVersion[version=1, hash=-888080772]}], features=[esql.metrics_syntax, security.role_mapping_cleanup, semantic_text.default_elser_2, search.vectors.k_param_supported, query_rules.test, random_reranker_retriever_supported, geoip.downloader.database.configuration, data_stream.failure_store.tsdb_fix, esql.st_x_y, semantic_text.always_emit_inference_id_fix, mapper.logsdb_default_ignore_dynamic_beyond_limit, esql.resolve_fields_api, esql.mv_ordering_sorted_ascending, esql.agg_values, semantic_text.search_inference_id, esql.async_query, mapper.range.null_values_off_by_one_fix, mapper.constant_keyword.synthetic_source_write_fix, script.term_stats, query_rule_list_types, mapper.vectors.bit_vectors, esql.st_disjoint, mapper.range.date_range_indexing_fix, mapper.source.synthetic_source_stored_fields_advance_fix, snapshot.repository_verify_integrity, rest.capabilities_action, esql.metrics_counter_fields, retrievers_supported, mapper.pass_through_priority, esql.disable_nullable_opts, text_similarity_reranker_retriever_composition_supported, mapper.fix_parsing_subobjects_false_dynamic_false, knn_retriever_supported, mapper.ignored_source.dont_expand_dots, esql.st_contains_within, query_rule_retriever_supported, text_similarity_reranker_retriever_supported, security.roles_metadata_flattened, features_supported, 
flattened.ignore_above_support, semantic_text.delete_fix, put_database_configuration_action.ipinfo, script.hamming, mapper.index_sorting_on_nested, esql.casting_operator, esql.metadata_fields, unified_highlighter_matched_fields, mapper.track_ignored_source, routing.boolean_routing_path, mapper.boolean_dimension, mapper.keyword_dimension_ignore_above, data_stream.auto_sharding, esql.counter_types, standard_retriever_supported, semantic_text.in_object_field_fix, routing.multi_value_routing_path, mapper.ignored_source.always_store_object_arrays_in_nested, semantic_text.zero_size_fix, mapper.subobjects_auto_fixes, mapper.sparse_vector.store_support, semantic_text.single_field_update_fix, mapper.source.synthetic_source_with_copy_to_and_doc_values_false, data_stream.rollover.lazy, esql.st_centroid_agg, mapper.ignore_above_index_level_setting, usage.data_tiers.precalculate_stats, mapper.source.remove_synthetic_source_only_validation, mapper.source.synthetic_source_fallback, mapper.source.synthetic_source_copy_to_inside_objects_fix, mapper.keyword_normalizer_synthetic_source, mapper.source.synthetic_source_copy_to_fix, esql.from_options, logsdb_telemetry, license-trial-independent-version, tsdb.ts_routing_hash_doc_value_parse_byte_ref, esql.timespan_abbreviations, get_database_configuration_action.multi_node, simulate.mapping.validation, simulate.mapping.addition, mapper.query_index_mode, simulate.support.non.template.mapping, esql.base64_decode_encode, mapper.subobjects_auto, esql.spatial_shapes, mapper.synthetic_source_keep, esql.string_literal_auto_casting, slm.interval_schedule, rrf_retriever_composition_supported, cluster.stats.source_modes, simulate.index.template.substitutions, security.migration_framework, desired_node.version_deprecated, simulate.component.template.substitutions, test_features_enabled, data_stream.lifecycle.global_retention, stats.include_disk_thresholds, esql.string_literal_auto_casting_extended, esql.spatial_points_from_source, esql.mv_sort, 
mapper.source.mode_from_index_setting, meta_fetch_fields_error_code_changed, logsdb_telemetry_stats, file_settings, mapper.vectors.bbq, health.dsl.info, health.extended_repository_indicator, mapper.segment_level_fields_stats, simulate.mapping.validation.templates, esql.st_intersects, mapper.vectors.int4_quantization, rest.local_only_capabilities, rrf_retriever_supported, repositories.supports_usage_stats, mapper.flattened.ignore_above_with_arrays_support], minimumTerm=4, optionalJoin=Optional[Join[votingNode={test-cluster-0}{vue2IphwQ_egkVUF-5aJKQ}{igMFSvepSD-vWskVI6DaTA}{test-cluster-0}{127.0.0.1}{127.0.0.1:43147}{cdfhilmrstw}{8.18.0}{7000099-8521000}{ml.max_jvm_size=536870912, ml.config_version=12.0.0, ml.machine_memory=126604136448, ml.allocated_processors_double=32.0, testattr=test, transform.config_version=10.0.0, xpack.installed=true, ml.allocated_processors=32}, masterCandidateNode={test-cluster-2}{4t21Aca3TLq0Q8W9fTYrZQ}{v1ppPFwZTkunXxpl07R86w}{test-cluster-2}{127.0.0.1}{127.0.0.1:35255}{cdfhilmrstw}{8.17.0}{7000099-8521000}{ml.max_jvm_size=536870912, ml.config_version=12.0.0, ml.machine_memory=126604136448, ml.allocated_processors_double=32.0, testattr=test, transform.config_version=10.0.0, xpack.installed=true, ml.allocated_processors=32}, term=4, lastAcceptedTerm=3, lastAcceptedVersion=59]]}
org.elasticsearch.transport.RemoteTransportException: [test-cluster-2][127.0.0.1:35255][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: Node vue2IphwQ_egkVUF-5aJKQ is missing required features [inner_retrievers_filter_support]
	at org.elasticsearch.cluster.coordination.NodeJoinExecutor.enforceNodeFeatureBarrier(NodeJoinExecutor.java:470) ~[elasticsearch-8.18.0-SNAPSHOT.jar:?]
	at org.elasticsearch.cluster.coordination.NodeJoinExecutor.execute(NodeJoinExecutor.java:177) ~[elasticsearch-8.18.0-SNAPSHOT.jar:?]
	at org.elasticsearch.cluster.service.MasterService.innerExecuteTasks(MasterService.java:1075) ~[elasticsearch-8.18.0-SNAPSHOT.jar:?]
	at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:1038) ~[elasticsearch-8.18.0-SNAPSHOT.jar:?]
	at org.elasticsearch.cluster.service.MasterService.executeAndPublishBatch(MasterService.java:245) ~[elasticsearch-8.18.0-SNAPSHOT.jar:?]
	at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.lambda$run$2(MasterService.java:1691) ~[elasticsearch-8.18.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.18.0-SNAPSHOT.jar:?]
	at org.elasticsearch.cluster.service.MasterService$BatchingTaskQueue$Processor.run(MasterService.java:1688) ~[elasticsearch-8.18.0-SNAPSHOT.jar:?]
	at org.elasticsearch.cluster.service.MasterService$5.lambda$doRun$0(MasterService.java:1283) ~[elasticsearch-8.18.0-SNAPSHOT.jar:?]
	at org.elasticsearch.action.ActionListener.run(ActionListener.java:452) ~[elasticsearch-8.18.0-SNAPSHOT.jar:?]
	at org.elasticsearch.cluster.service.MasterService$5.doRun(MasterService.java:1262) ~[elasticsearch-8.18.0-SNAPSHOT.jar:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023) ~[elasticsearch-8.18.0-SNAPSHOT.jar:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-8.18.0-SNAPSHOT.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.lang.Thread.run(Thread.java:1575) ~[?:?]

Unrelated to this change.

@martijnvg martijnvg merged commit b8afe64 into elastic:8.x Dec 5, 2024
13 of 16 checks passed
@martijnvg martijnvg deleted the backport/8.x/pr-117792 branch December 5, 2024 11:02
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport >bug :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v8.18.0
2 participants