Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(advanced-search): Complete Advanced Search: backend changes & tying UI together #6068

Merged
merged 39 commits into from
Oct 4, 2022

Conversation

gabe-lyons
Copy link
Contributor

@gabe-lyons gabe-lyons commented Sep 27, 2022

This pr accomplishes a few things:

  • Adds advanced search components introduced in previous PRs into the Search + Embedded Search + Impact Analysis pages.
  • Sends advanced search filters to the backend
  • Makes a breaking change to the graphql api to remove value and replace with values for FacetFilterInput graphql param. This means we will only send collections of values back for each filter. If only one value is provided, the array will have one element.
  • Adds backend logic to use values instead of value from facet filter input, handle different conditions, handle negation, and handle OR/AND union types.
  • Adds tests for new backend search logic + integration tests in cypress to apply a few different advanced search filter concepts
  • Adds documentation for using advanced filters.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata product PR or Issue related to the DataHub UI/UX labels Sep 27, 2022
@github-actions
Copy link

github-actions bot commented Sep 27, 2022

Unit Test Results (build & test)

584 tests   580 ✔️  12m 54s ⏱️
143 suites      4 💤
143 files        0

Results for commit 8eef5f4.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Sep 27, 2022

Unit Test Results (metadata ingestion)

       8 files         8 suites   59m 3s ⏱️
   719 tests    716 ✔️ 3 💤 0
1 440 runs  1 434 ✔️ 6 💤 0

Results for commit 8eef5f4.

♻️ This comment has been updated with latest results.

Comment on lines 153 to 156
Value of the field to filter by (soon to be deprecated)
"""
value: String!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this is a breaking change (we are removing the value field from FacetFilterInput

loading: boolean;
}

export const SearchFilters = ({ facets, selectedFilters, onFilterSelect, loading }: Props) => {
export const SimpleSearchFilters = ({ facets, selectedFilters, onFilterSelect, loading }: Props) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename here to avoid confusion with advanced search components.

Comment on lines 139 to 149
if (pairMatch.isPresent()) {
final BoolQueryBuilder orQueryBuilder = new BoolQueryBuilder();
String[] pairMatchValue = pairMatch.get();
for(String field: pairMatchValue) {
Criterion criterionToQuery = new Criterion();
criterionToQuery.setCondition(criterion.getCondition());
criterionToQuery.setNegated(criterion.isNegated());
criterionToQuery.setValue(criterion.getValue());
criterionToQuery.setField(field + KEYWORD_SUFFIX);
orQueryBuilder.should(getQueryBuilderFromCriterionForSingleField(criterionToQuery));
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic makes sure we filter by editable and non-editable

@gabe-lyons gabe-lyons changed the title [WIP] advanced search feat(advanced-search): Complete Advanced Search: backend changes & tying UI together Sep 30, 2022
Copy link
Collaborator

@chriscollins3456 chriscollins3456 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! there's a good amount of code here and stuff i'm less familiar with, but I don't have anything major here by any means.

Comment on lines 133 to 134
result.setValue(filter.getValues().get(0));
if (filter.getValues() != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should you move line 133 inside of this not null check to prevent a null-pointer guy? and maybe a check that it's not an empty list?

"Failed to execute search: " + String.format("entity type %s, query %s, filters: %s, start: %s, count: %s",
input.getType(), input.getQuery(), input.getFilters(), start, count), e);
"Failed to execute search: " + String.format("entity type %s, query %s, filters: %s, orFilters: %s, start: %s, count: %s",
input.getType(), input.getQuery(), input.getFilters(), input.getOrFilters(), start, count), e);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so wherever it's getFilters that means "and" filters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately yes. Otherwise, we'd need to make breaking changes to the graphql api which we know folks are already making programatic calls to.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to see us taking our GraphQL API and making it an OR of ANDs

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nontrivial semantic model debt I think...

@@ -68,6 +68,11 @@ input SearchInput {
Facet filters to apply to search results
"""
filters: [FacetFilterInput!]

"""
Filters are by default 'AND'-ed together. This lets you provide a set of OR filters
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

answers my question. might be nice to make note of that "and" nature on the doc string for filters above and elsewhere as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point

onChangeQuery={onChangeQuery}
onChangeFilters={onChangeFilters}
onChangePage={onChangePage}
onChangeUnionType={setUnionType}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why onChangeUnionType and not just keep passing as setUnionType?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chriscollins3456 just to keep consistency with the other function names in EmbeddedListSearch


#### Adding an Advanced Filter

Currently, Advanced Filters support filtering by Column name, Container, Domain, Description (entity or column level), Tag & Glossary Term (entity or column level), Owner, Entity Type, Subtype, Environment and soft-deleted status.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the & between Tag & Glossary Term just be a comma?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes good call

@@ -57,6 +61,7 @@ public AllEntitiesSearchAggregator(
_searchRanker = Objects.requireNonNull(searchRanker);
_cachingEntitySearchService = Objects.requireNonNull(cachingEntitySearchService);
_entityDocCountCache = new EntityDocCountCache(entityRegistry, entitySearchService);
_entityRegistry = entityRegistry;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why add entityRegistry here if it's not being used? Same thing with the two imports above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean two imports above?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment was out of date! there were a few unused imports in this file I believe. but alas they are no longer there

if (condition == Condition.EQUAL) {
// If values is set, use terms query to match one of the values
if (!criterion.getValues().isEmpty()) {
if (BOOLEAN_FIELDS.contains(fieldName) && criterion.getValues().size() == 1) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is a boolean field then values shouldn't be anything other than 1, right? keeping the check regardless isn't a bad idea, just wanted to check for myself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes correct!

Copy link
Collaborator

@chriscollins3456 chriscollins3456 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me!

@@ -81,20 +83,73 @@ public static Map<String, String> buildFacetFilters(@Nullable List<FacetFilterIn
if (!validFacetFields.contains(facetFilterInput.getField())) {
throw new ValidationException(String.format("Unrecognized facet with name %s provided", facetFilterInput.getField()));
}
facetFilters.put(facetFilterInput.getField(), facetFilterInput.getValue());
facetFilters.put(facetFilterInput.getField(), facetFilterInput.getValues().get(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no chance this can throw an NPE right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved


Urn urn = new TestEntityUrn("test", "testUrn", "VALUE_1");
ObjectNode document = JsonNodeFactory.instance.objectNode();
document.set("urn", JsonNodeFactory.instance.textNode(urn.toString()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we pull out some of these constants that we reuse? Also keeps things more extensible if we add more tests

@@ -39,6 +39,41 @@ The filters sidebar sits on the left hand side of search results, and lets users
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/filters_highlighted.png" />
</p>

### Advanced Filters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for updating these docs!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course :)


#### Adding an Advanced Filter

Currently, Advanced Filters support filtering by Column name, Container, Domain, Description (entity or column level), Tag (entity or column level), Glossary Term (entity or column level), Owner, Entity Type, Subtype, Environment and soft-deleted status.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I would capitalize Column Name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

cy.contains(/of [0-9]+ results/);
});
describe("search", () => {
// it("can hit all entities search, see some results (testing this any more is tricky because it is cached for now)", () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these meant to be commented out....?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this has been reverted already :)

@@ -102,13 +100,16 @@ private Filter buildFilter(@Nullable FilterInput filtersInput, @Nullable final S
}
List<FacetFilterInput> facetFilters = new ArrayList<>();
if (status != null) {
facetFilters.add(new FacetFilterInput("status", status, ImmutableList.of(status), false, SearchCondition.EQUAL));
FacetFilterInput filter = new FacetFilterInput();
filter.setField("status");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make this a constant?

Copy link
Collaborator

@jjoyce0510 jjoyce0510 Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like Gabe didn't originally make this but yeah I think it should be. Either in this PR or a followup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can do this in a followup

// In the case that user sends filters to be or-d together, we need to build a series of conjective criterion
// arrays, rather than just one for the AND case.
public static ConjunctiveCriterionArray buildConjunctiveCriterionArrayWithOr(
final List<Criterion> andCriterions,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need an annotation for andCriterions too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no longer relevant

public static Filter buildFilter(@Nullable List<FacetFilterInput> facetFilterInputs) {
if (facetFilterInputs == null || facetFilterInputs.isEmpty()) {
public static Filter buildFilter(@Nullable List<FacetFilterInput> andFilters, @Nullable List<FacetFilterInput> orFilters) {
if ((andFilters == null || andFilters.isEmpty()) && (orFilters == null || orFilters.isEmpty())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you separate these two conditions out into boolean variables? it's a little hard to read, and you can use the the same variables down below when building the andCriterions and orCriterions.

Maybe hasAndFilters and hasOrFilters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

});

return facetFilters;
}

// In the case that user sends filters to be or-d together, we need to build a series of conjective criterion
// arrays, rather than just one for the AND case.
public static ConjunctiveCriterionArray buildConjunctiveCriterionArrayWithOr(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: annotation this @Nonnull

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

});

return facetFilters;
}

// In the case that user sends filters to be or-d together, we need to build a series of conjective criterion
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: conjunctive criterion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed!

});

return facetFilters;
}

// In the case that user sends filters to be or-d together, we need to build a series of conjective criterion
// arrays, rather than just one for the AND case.
public static ConjunctiveCriterionArray buildConjunctiveCriterionArrayWithOr(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leaving this here -> The name of this function + the parameters it takes are not immediately making it clear to me what this function does

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved!

final List<Criterion> andCriterions,
@Nonnull List<FacetFilterInput> orFilters
) {
return new ConjunctiveCriterionArray(orFilters.stream().map(orFilter -> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are combining the existing "and" filters with a set of OR filters? Why is this necessary?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Will try to understand on my own just leaving my initial reactions)

andCriterionsForOr.add(criterionFromFilter(orFilter));

return new ConjunctiveCriterion().setAnd(
andCriterionsForOr
Copy link
Collaborator

@jjoyce0510 jjoyce0510 Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here we are doing something like this...

If you pass a list of and criterion:

[ field equals value, field2 equals value2 ]

and some set of ORs

[ field3 equals value3, field4 equals value4 ]

then we will essentially build an OR of ANDs?

OR [ [ field equals value] AND [field2 equals value2 ] AND [ field3 equals value3 ], [ field equals value] AND [field2 equals value2 ] AND [ field4 equals value4 ] ]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we convert into an OR of ANDs

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I had to work through this in my head for some time. We should consider adding some comments or something to give an example !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resovled!

.map(filter -> new Criterion().setField(getFilterField(filter.getField())).setValue(filter.getValue()))
.collect(Collectors.toList())))));

final List<Criterion> andCriterions = andFilters != null && !andFilters.isEmpty()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is a bit confusing now. I was imagining we'd just have a nested OR or ANDs coming in from the FacetFilterInput. It's kinda hard to understand that if you provide both AND and OR filters that we'll combine them (and how)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved in my update- clearer now!

@@ -52,7 +52,7 @@ public CompletableFuture<SearchResults> get(DataFetchingEnvironment environment)
"Executing search for multiple entities: entity types {}, query {}, filters: {}, start: {}, count: {}",
input.getTypes(), input.getQuery(), input.getFilters(), start, count);
return UrnSearchResultsMapper.map(_entityClient.searchAcrossEntities(entityNames, sanitizedQuery,
ResolverUtils.buildFilter(input.getFilters()), start, count, ResolverUtils.getAuthentication(environment)));
ResolverUtils.buildFilter(input.getFilters(), input.getOrFilters()), start, count, ResolverUtils.getAuthentication(environment)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit odd - input.getFilters vs input.getOrFilters -> are we saying we are just taking on this model semantics debt ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved in my update

"""
Condition for the values. How to If unset, assumed to be equality
"""
condition: SearchCondition
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SearchCondition? These operators seem a bit independent of search. What if we called this

FilterOperator? or something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed

"""
Filters are by default 'AND'-ed together. This lets you provide a set of OR filters
"""
orFilters: [FacetFilterInput!]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use a nested model instead of reusing FacetFilterInput?

basically:

type OrFilter {
    and: [FacetFilterInput!] 
}

This is a standard way to define filters, it's Disjunctive Normal Form

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason i call this out is that once we commit this code it's going to become very nontrivial to change this, for the same reason its hard to change the existing semantics around AND filters..

Copy link
Collaborator

@jjoyce0510 jjoyce0510 Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And unsurprisingly, this is how we chose to model the Filter.pdl model already, because we were looking ahead at this.

It's shaped as

{
    or: [
       {
             and: [ { field: field, value: value } ]
       }
    ]
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated!

@gabe-lyons gabe-lyons merged commit ce90310 into datahub-project:master Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ingestion PR or Issue related to the ingestion of metadata product PR or Issue related to the DataHub UI/UX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants