Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the need for primary key column when embedding parents in &select #411

Closed
ruslantalpa opened this issue Dec 8, 2015 · 6 comments
Closed
Assignees

Comments

@ruslantalpa
Copy link
Contributor

(opened this issue so that i don't forget the bug, and i have a ID for the issue)

@begriffs
Copy link
Member

begriffs commented Apr 3, 2016

@ruslantalpa is this issue still relevant?

@ruslantalpa
Copy link
Contributor Author

Yes this is still an issue

http://localhost:3000/projects?select=*,client{name}

returns this

{
  "hint": null,
  "details": null,
  "code": "42703",
  "message": "column clients.id does not exist"
}

@steve-chavez
Copy link
Member

This happens only in Parent embeds because when detecting this relation PostgREST executes a different sql query(doing explicit joins) than the queries used for the Child and Many to Many embeds(they use implicit joins), a way I see to fix this is to change the Parent query to a query similar to the other ones, with implicit joins, so it's basically like this:

-- Current query generated by GET /projects?select=*,clients(id,name)
WITH pg_source AS (
  SELECT "test"."projects".*,
          row_to_json("clients_clients".*) AS "clients"
  FROM "test"."projects"
  LEFT OUTER JOIN (
    SELECT "test"."clients"."id",
           "test"."clients"."name"
    FROM "test"."clients"
  ) AS "clients_clients" ON "clients_clients"."id" = "projects"."client_id")
SELECT NULL AS total_result_set,
       pg_catalog.count(_postgrest_t) AS page_total, 
       array[]::text[] AS header,
       coalesce(array_to_json(array_agg(row_to_json(_postgrest_t))), '[]')::character varying AS BODY
FROM ( SELECT * FROM pg_source) _postgrest_t;


-- Proposed new query with implicit joins
WITH pg_source AS (
  SELECT "test"."projects".*,
          (SELECT row_to_json("clients")
           FROM (
             SELECT "test"."clients"."id",
                    "test"."clients"."name"
             FROM "test"."clients"
             WHERE "test"."clients"."id" = "test"."projects"."client_id" 
           ) "clients"
          ) AS "clients"
  FROM "test"."projects") 
SELECT NULL AS total_result_set,
       pg_catalog.count(_postgrest_t) AS page_total, 
       array[]::text[] AS header,
       coalesce(array_to_json(array_agg(row_to_json(_postgrest_t))), '[]')::character varying AS body
FROM ( SELECT * FROM pg_source) _postgrest_t;

They both give the same result:

[{"id":1,"name":"Windows 7","client_id":1,"clients":{"id":1,"name":"Microsoft"}},{"id":2,"name":"Windows 10","client_id":1,"clients":{"id":1,"name":"Microsoft"}},{"id":3,"name":"IOS","client_id":2,"clients":{"id":2,"name":"Apple"}},{"id":4,"name":"OSX","client_id":2,"clients":{"id":2,"name":"Appl
e"}},{"id":5,"name":"Orphan","client_id":null,"clients":null}]

But the new query can be used without the id in the querystring like in
GET projects/?select=name,clients(name), look at the sql query to see why.

The drawback of this new query is that is more expensive, when doing an EXPLAIN the costs are:

  • Current query with explicit joins: cost=146.38..146.40
  • New query with implicit joins:cost=15526.90..15526.92

However the Child and Many to Many embeds also generate high cost queries:

-- Many to many embed
GET /tasks?select=id,users{id}
cost=16838.48..16838.49

-- Child embed
GET /clients?select=*,projects{name}
cost=42793.13..42793.14

So as a first step towards solving this issue I'm gonna change the Parent embed to the new proposed query and make the tests pass, this I think would allow for a simplification of the requestToQuery function and then I'll see how all the queries can be accommodated to use explicit joins.

@steve-chavez steve-chavez self-assigned this Sep 13, 2017
@ruslantalpa
Copy link
Contributor Author

ruslantalpa commented Sep 13, 2017

i am completley against that.
having to request and aditional filed is a minor inconvinience.
a drop in performance by 100 times (2 orders of magnitude) is a big problem.

the direction should be how do we make child embeds faster, not the other way arround.

it would be mich better to make it so that if the user requests a parent without an id, we automatically add the id to the select so that we do not generate a bad query, and when the user wonders why did i get the id, the docs would say that it's necessary for the query.
a hike in cost from 150 to 15000 is unacceptable

just because some cases of the embeds have worse performance, does nto amke it acceptable to drag down the other cases when it's fast

@steve-chavez
Copy link
Member

I mentioned that the complete solution will have all the embedding queries doing explicit joins(lowest costs).

@ruslantalpa
Copy link
Contributor Author

but dont we have that right now with parrent embedding, and in this type of join, you need the id,
and seccond, i don't think you can do explicit joins for everything because that would be a "flat" thing, so there is no tree like structure/recursive nature to the query which you need.

I mean you are welcome to try :) and even if you are doing this for one level (up or down) think how it will work going three levels down.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants