Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize age_exists function #586

Merged
merged 1 commit into from
Jan 25, 2023
Merged

Optimize age_exists function #586

merged 1 commit into from
Jan 25, 2023

Conversation

moeed-k
Copy link
Contributor

@moeed-k moeed-k commented Jan 23, 2023

-age_exists, which is the executor for exists(property) function, was making up to 3 redundant memory accesses.

-exists(property) predicate function only serves to check if a property exists or not.

-In Cypher, if a property is assigned the value of NULL, this is considered the same as the property not existing at all. Thus the function calls to get the value type is not needed as the check for the null argument itself filters out all NULL inputs. If a property passes this check, it implies existence.

-age_exists, which is the executor for exists(property) function, was making up to 3 redundant memory accesses.
-exists(property) predicate function only serves to check if a property exists or not.
-In Cypher, if a property is assigned the value of NULL, this is considered the same as the property not existing at all. Thus the function calls to get the value type is not needed as the check for the null argument itself filters out all NULL inputs. If a property passes this check, it implies existence.
@jrgemignani
Copy link
Contributor

@moeed-k I appreciate that you looked into optimizing this function. However, can you please thoroughly explain why the code that you want to remove, can actually be removed.

@moeed-k
Copy link
Contributor Author

moeed-k commented Jan 23, 2023

@moeed-k I appreciate that you looked into optimizing this function. However, can you please thoroughly explain why the code that you want to remove, can actually be removed.

Yes of course. According to my understanding, all properties in AGE can either have a NULL value, or some actual value, and the exists(property) function only serves to check whether this value is NULL (returns false) or not (returns true). I was testing around the code in the debugger and I realized that whenever the property of a function is NULL (e.g. prop1: null), it is always filtered out by line number 5464 ( PG_ARGISNULL(0) ). The macro returns true, and the function returns false in the next line.

The next few lines involve getting the agtype root node, checking if it is a scalar, getting the value of the root if it is a scalar, and then finally accessing the value type to check if it is NULL or not. However, since all NULL values were already filtered out at line 5464, this last check is never actually true (shown below).

        if (agtv_value->type == AGTV_NULL)
            PG_RETURN_BOOL(false);

In all my testing, I was unable to make this condition true. What this does mean, however, is that whenever there is a non-null scalar value passed to exists(), it goes through the trouble of making all those redundant checks to verify the scalar's value type.

The regression tests didn't flag any errors, and my own testing was also yielding the same output as before.

@jrgemignani
Copy link
Contributor

jrgemignani commented Jan 23, 2023

@moeed-k I should point out that one should be very careful about using existing regression tests as proof that a change doesn't produce an issue or an error. There have been a few cases where code was accidentally removed for a particular functionality that wasn't covered with an existing regression test. Later on, users found those functionalities to be missing and created issues for them.

As an AGTYPE value can be AGTV_NULL, which is a NULL, we can't ignore that possibility from occurring. There are other ways of generating AGTV_NULL besides, necessarily, user input.

@moeed-k
Copy link
Contributor Author

moeed-k commented Jan 23, 2023

@jrgemignani Thank you for the tip about regression tests. I'll keep that in mind going forward.

Perhaps there is something I'm missing here, but if we consider a graph with the following nodes:

{"id": 844424930131980, "label": "person", "properties": {"name": "A", "eyes": null}}::vertex
{"id": 844424930132014, "label": "person", "properties": {"name": "B"}}::vertex

And then the following is query is run:

    MATCH (p:person)
    WHERE NOT exists(p.eyes)
    RETURN p

Both nodes are returned back from line 5464 ( PG_ARGISNULL(0) ) . For person B it is obvious since the eyes property doesn't exist, but for person A the property exists and is stored as a null value. Doesn't this imply that it is returned early despite having an AGTV_NULL type?

@jrgemignani
Copy link
Contributor

@moeed-k hmmmm,... I see your point. If we can show that a property will only ever be an standard object (AGTV_OBJECT), not anything else, then, yes, that code can be removed.

@jrgemignani
Copy link
Contributor

jrgemignani commented Jan 24, 2023

@moeed-k Another issue is that an object appears to always be created, empty if nothing is passed. So, we probably need to re-think how the exists function should work. Thoughts? Is an empty object NULL?

Datum _agtype_build_vertex(PG_FUNCTION_ARGS)
{
...
    //if the properties object is null, push an empty object
    if (fcinfo->args[2].isnull)
    {
        result.res = push_agtype_value(&result.parse_state, WAGT_BEGIN_OBJECT,
                                       NULL);
        result.res = push_agtype_value(&result.parse_state, WAGT_END_OBJECT,
                                       NULL);
    }

Datum _agtype_build_edge(PG_FUNCTION_ARGS)
{
...
    /* if the properties object is null, push an empty object */
    if (fcinfo->args[4].isnull)
    {
        result.res = push_agtype_value(&result.parse_state, WAGT_BEGIN_OBJECT,
                                       NULL);
        result.res = push_agtype_value(&result.parse_state, WAGT_END_OBJECT,
                                       NULL);
    }

@moeed-k
Copy link
Contributor Author

moeed-k commented Jan 24, 2023

@jrgemignani I went through the code and here's what I found.

Firstly, properties only ever up show up as AGTV_OBJECT types. The following code is always triggered (line 2913):

/* if we were passed an agtype_value OBJECT */
    else if (map_value != NULL && map_value->type == AGTV_OBJECT)
    {
        map_value = get_agtype_value_object_value(map_value, key, key_len);
    }

Second, the returned map_value is:

  1. NULL if the property doesn't exist at all.
  2. AGTV_NULL if the property exists but the value is stored as a NULL.

I believe line 3461 is the code that makes the difference in the exists function

        /* for NULL values return NULL */
        if (object_value == NULL || object_value->type == AGTV_NULL)
        {
            PG_RETURN_NULL();
        }

We can see from here that regardless of whether the property exists or not (Case 1), or if it exists as an AGTV_NULL (Case 2), the result is returned as a NULL. Thus the age_exists function receives NULL for both cases, and performs the same behavior for both cases.

As such, checking for the AGTV_NULL value inside the age_exists function is unnecessary - it was already checked earlier.

As for your question, I'm not entirely sure, but I am inclined to believe that an empty object should be NULL. At least semantically, that should be true since properties that are empty are non-existent i.e NULL. Properties can have length 0, but that is not the same as being empty. Here's something I found on the Neo4j docs:
https://neo4j.com/developer/kb/understanding-non-existent-properties-and-null-values/

@jrgemignani
Copy link
Contributor

@moeed-k Then I believe that you are correct and we can eliminate that code.

@moeed-k
Copy link
Contributor Author

moeed-k commented Jan 25, 2023

@moeed-k Then I believe that you are correct and we can eliminate that code.

That's good to hear.

@jrgemignani jrgemignani merged commit 2a233e3 into apache:master Jan 25, 2023
jrgemignani pushed a commit that referenced this pull request Feb 14, 2023
-age_exists, which is the executor for exists(property) function, was making up to 3 redundant memory accesses.
-exists(property) predicate function only serves to check if a property exists or not.
-In Cypher, if a property is assigned the value of NULL, this is considered the same as the property not existing at all. Thus the function calls to get the value type is not needed as the check for the null argument itself filters out all NULL inputs. If a property passes this check, it implies existence.
jrgemignani pushed a commit that referenced this pull request Feb 15, 2023
-age_exists, which is the executor for exists(property) function, was making up to 3 redundant memory accesses.
-exists(property) predicate function only serves to check if a property exists or not.
-In Cypher, if a property is assigned the value of NULL, this is considered the same as the property not existing at all. Thus the function calls to get the value type is not needed as the check for the null argument itself filters out all NULL inputs. If a property passes this check, it implies existence.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants