-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Role Mapping Cleanup Security Migration #114830
Conversation
@@ -117,6 +117,7 @@ private static IndexVersion def(int id, Version luceneVersion) { | |||
public static final IndexVersion ENABLE_IGNORE_MALFORMED_LOGSDB = def(8_514_00_0, Version.LUCENE_9_11_1); | |||
public static final IndexVersion MERGE_ON_RECOVERY_VERSION = def(8_515_00_0, Version.LUCENE_9_11_1); | |||
public static final IndexVersion UPGRADE_TO_LUCENE_9_12 = def(8_516_00_0, Version.LUCENE_9_12_0); | |||
public static final IndexVersion ADD_ROLE_MAPPING_MIGRATION = def(8_517_00_0, Version.LUCENE_9_12_0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed to allow us to check if an index was created on the same version as the migration. This means that the index was created with all the settings/mappings/docs needed, so we don't need to run the migration (it can be short-circuit and skipped since there is nothing to migrate).
...main/java/org/elasticsearch/xpack/security/action/rolemapping/ReservedRoleMappingAction.java
Outdated
Show resolved
Hide resolved
...curity/src/main/java/org/elasticsearch/xpack/security/support/SecurityMigrationExecutor.java
Outdated
Show resolved
Hide resolved
84c38fc
to
d36b152
Compare
@@ -267,6 +272,11 @@ private static boolean isCreatedOnLatestVersion(IndexMetadata indexMetadata) { | |||
return indexVersionCreated != null && indexVersionCreated.onOrAfter(IndexVersion.current()); | |||
} | |||
|
|||
private static Set<String> getFileSettingsMetadataHandlerRoleMappingKeys(ClusterState clusterState) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't exactly part of the security index, so this doesn't belong here, but I added it here anyway for simplicity. Ideally the security migration state should be tracked in a separate cluster state handler. I think that's an improvement/refactor that should be done in a separate PR if we keep adding migrations on state outside the security index.
The new state handler would still need to handle the main security index manager state, since we rely heavily on state from it in the framework (is the index available for example).
ddfe166
to
c23b783
Compare
createNativeRoleMapping("no_name_conflict", Map.of("meta", "test")); | ||
assertAllRoleMappings(client(), "no_name_conflict", "operator_role_mapping_1"); | ||
} else if (CLUSTER_TYPE == ClusterType.UPGRADED) { | ||
waitForSecurityMigrationCompletion(adminClient(), 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests will fail here until the file service change is merged, since the preconditions for the migration are not met.
Pinging @elastic/es-security (Team:Security) |
Hi @jfreden, I've created a changelog YAML for you. |
@@ -189,7 +189,7 @@ public List<TemplateRoleName> getRoleTemplates() { | |||
* This is not used within the mapping process, and does not affect whether the expression matches, nor which roles are assigned. | |||
*/ | |||
public Map<String, Object> getMetadata() { | |||
return Collections.unmodifiableMap(metadata); | |||
return metadata != null ? Collections.unmodifiableMap(metadata) : Collections.emptyMap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are not set to null anywhere in production code, but could be in the future (since constructor is public) and can in tests (as I found out).
...ugin/security/src/main/java/org/elasticsearch/xpack/security/support/SecurityMigrations.java
Show resolved
Hide resolved
} | ||
} | ||
|
||
public static class CleanupRoleMappingDuplicatesMigration implements SecurityMigration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new migration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prod code looks solid 💯 I'll take another pass over it Monday and go through tests as well.
...in/security/src/main/java/org/elasticsearch/xpack/security/support/SecurityIndexManager.java
Show resolved
Hide resolved
boolean hasFileSettingsMetadata = fileSettingsMetadata != null; | ||
// If there is no fileSettingsMetadata, there should be no reserved state (this is to catch bugs related to | ||
// name changes to FILE_SETTINGS_METADATA_NAMESPACE) | ||
assert hasFileSettingsMetadata || clusterState.metadata().reservedStateMetadata().isEmpty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'd include a message in the assert with a bit of the failure context like: "encountered unknown namespace in reserved state metadata"
...ugin/security/src/main/java/org/elasticsearch/xpack/security/support/SecurityMigrations.java
Show resolved
Hide resolved
Map<Boolean, List<ExpressionRoleMapping>> partitionedRoleMappings = Arrays.stream(roleMappings) | ||
.collect( | ||
Collectors.partitioningBy( | ||
roleMapping -> roleMapping.getMetadata().get(ExpressionRoleMapping.READ_ONLY_ROLE_MAPPING_METADATA_FLAG) != null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think it's fair enough to move this into an isReadOnly()
method (or similarly-named) on ExpressionRoleMapping
ActionListener<DeleteRoleMappingResponse> groupListener = new GroupedActionListener<>( | ||
names.size(), | ||
ActionListener.wrap(responses -> { | ||
Map<Boolean, List<DeleteRoleMappingResponse>> deletesPartitionedByFound = responses.stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we're only after the count here so instead of a partition-by I'd just go with a for loop that counts number not found and number deleted
...ugin/security/src/main/java/org/elasticsearch/xpack/security/support/SecurityMigrations.java
Show resolved
Hide resolved
ActionListener.wrap(responses -> { | ||
Map<Boolean, List<DeleteRoleMappingResponse>> deletesPartitionedByFound = responses.stream() | ||
.collect(Collectors.partitioningBy(DeleteRoleMappingResponse::isFound)); | ||
if (deletesPartitionedByFound.containsKey(false)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partitioningBy
will always include entries for both partitions (an empty list if nothing was found) so this would print a WARN
even if no role mappings were missed.
this doesn't matter if we use a regular counter var as per my comment above, but if we stick with partitioningBy
gotta fix
); | ||
@Override | ||
public boolean checkPreConditions(SecurityIndexManager.State securityIndexManagerState) { | ||
// If there are operator defined role mappings, make sure they've been loaded in to cluster state before launching migration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: need to update the comment to reflect the SKIP
condition
.collect( | ||
Collectors.partitioningBy( | ||
roleMapping -> roleMapping.getMetadata().get(ExpressionRoleMapping.READ_ONLY_ROLE_MAPPING_METADATA_FLAG) != null | ||
&& (boolean) roleMapping.getMetadata().get(ExpressionRoleMapping.READ_ONLY_ROLE_MAPPING_METADATA_FLAG) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: lets assert the flag is actually boolean to avoid failed cast surprises
import static org.hamcrest.CoreMatchers.is; | ||
import static org.hamcrest.Matchers.containsInAnyOrder; | ||
|
||
public class SecurityIndexRoleMappingCleanupIT extends AbstractUpgradeTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speaking of ITs, could you also update this bit: https://github.com/elastic/elasticsearch/blob/main/qa/rolling-upgrade/src/javaRestTest/java/org/elasticsearch/upgrades/FileSettingsRoleMappingUpgradeIT.java#L113?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up changing this to contains
and I found a bug in the old file handling. I think we might want to create a patch and backport to versions before 8.15.
The migration never executed because there wasn't a security index (expected behaviour from the migration), but since the role mappings should have been created in the security index it was strange that the index didn't exist. Turns out the call to the native role mapping store here can happen while the SecurityIndexManager
is in unrecovered state. This leads to the native role mapping store thinking that the index is not up to date here and nothing is inserted in to the store.
This won't impact the current state of things, so we can proceed, but we should discuss how to address this after this fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may have actually gotten fixed here: #92173
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find!
6f549ea
to
9533f6f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests look great, too. A few coverage suggestions -- I'll make one more pass when all is ready. Thanks for iterating on this!
awaitFileSettingsWatcher(); | ||
// Setup listener to wait for role mapping | ||
var fileBasedRoleMappingsWrittenListener = setupClusterStateListener(masterNode, "everyone_kibana_alone"); | ||
// Write role mappings with fallback name, this should block any security migration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comment is off here: TEST_JSON_WITH_ROLE_MAPPINGS
does not have a mapping with a fallback name
waitForMigrationCompletion(SecurityMigrations.CLEANUP_ROLE_MAPPING_DUPLICATES_MIGRATION_VERSION); | ||
} | ||
|
||
public void testSkipMigrationNoFileBasedMappings() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love these tests!
Might also add the other skip case where there are file settings but no role mapping section
...t/java/org/elasticsearch/xpack/security/support/CleanupRoleMappingDuplicatesMigrationIT.java
Show resolved
Hide resolved
true | ||
); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also add a test for a native role mapping with the suffix but without the read-only flag
} | ||
|
||
private static ExpressionRoleMapping nativeRoleMapping(String name) { | ||
return new ExpressionRoleMapping(name, null, null, null, null, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we randomize (or otherwise handle) randomBoolean() ? null : Map.of()
? Just to cover both cases where metadata is null or explicitly doesn't have the flag set
* | ||
* @return RoleMappingsCleanupMigrationStatus | ||
*/ | ||
static RoleMappingsCleanupMigrationStatus getRoleMappingsCleanupMigrationStatus(ClusterState clusterState, int migrationsVersion) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It be great to cover this method with some units tests -- the ITs are great but unit tests we can much better cover some of the NOT_READY cases, e.g., size mismatch & some (but not all) mappings with fallback names.
} | ||
assert indexManager.getRoleMappingsCleanupMigrationStatus() == READY; | ||
|
||
getRoleMappings(client, ActionListener.wrap(roleMappings -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth covering some of the failure paths with unit tests: in particular, a test where some deletions are 404 or failures showing that we still run through the whole set of names feels important
} | ||
|
||
private void deleteNativeRoleMappings(Client client, List<String> names, ActionListener<Void> listener) { | ||
ActionListener<DeleteRoleMappingResponse> groupListener = new GroupedActionListener<>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could we add an assertion that names
is not empty?
assertAllRoleMappings(client(), "operator_role_mapping_1", "operator_role_mapping_2"); | ||
} else if (CLUSTER_TYPE == ClusterType.MIXED) { | ||
// Create a native role mapping that doesn't conflict with anything before the migration run | ||
createNativeRoleMapping("no_name_conflict", Map.of("meta", "test")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also add a native mapping with a reserved suffix here (it should not get deleted)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah whoops, my bad -- in a mixed cluster createNativeRoleMapping("native_role_mapping" + ExpressionRoleMapping.READ_ONLY_ROLE_MAPPING_SUFFIX, Map.of("meta", "test"));
can result in a rejected request -- I think we want to move this to the (CLUSTER_TYPE == ClusterType.OLD)
branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh that's right!
We can't create it there since we won't be bale to validate it. Not all bwc tests will hit the old cluster code because of the assumeTrue
statements. I think we should skip it in this case for that reason.
@@ -261,10 +271,63 @@ private SystemIndexDescriptor.MappingsVersion getMinSecurityIndexMappingVersion( | |||
* Check if the index was created on the latest index version available in the cluster | |||
*/ | |||
private static boolean isCreatedOnLatestVersion(IndexMetadata indexMetadata) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to rename this to isCreatedOnLatestMigrationVersion
or something similar, since this method no longer checks latest index version per se, but latest migration version
@n1v0lg Thanks for the review! I've addressed all the comments and reverted the index settings like we discussed offline. |
This PR adds a cleanup security migration for role mapping duplicates that exist both in cluster state and the
.security
index. The cleanup task is implemented as a security migration.The task compares the names of the role mappings in the operator-defined
settings.json
and the names of the role mappings in the.security
index and removes the ones in the.security
index.Issue
This cleanup is done because there was a bug (fixed here) that caused ECK customers to have the same role mappings with the same names present both in cluster state and the
.security
index. The effective role mapping is the combination of the.security
index mapping and the cluster state one. After the bug fix, if the mapping in cluster state is modified (throughsetting.json
), the.security
index will still contain the old mapping and this could lead to unintended behaviour.Dependencies