Skip to content
This repository has been archived by the owner on Apr 24, 2024. It is now read-only.

Hierarchy Implementation Proposal #793

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

temmey
Copy link
Contributor

@temmey temmey commented Aug 5, 2023

Basics

  • I added a line to /doc/CHANGELOG.md
  • The PR is rebased with current master.
  • Details of what you changed are in commit messages.
  • References to issues, e.g. close #X, are in the commit messages.
  • The buildserver is happy.

Checklist

  • I have installed and I am using pre-commit hooks
  • I fully described what my PR does in the documentation
    (not in the PR description)
  • I fixed all affected documentation
  • I fixed the introduction tour
  • I wrote migrations in a way that they are compatible with already present data
  • I fixed all affected decisions
  • I added unit tests for my code
  • I added code comments, logging, and assertions as appropriate
  • I translated all strings visible to the user
  • I mentioned every code or binary not directly written or done by me in reuse syntax
  • I created left-over issues for things that are still to be done
  • Code is conforming to our Architecture
  • Code is conforming to our Guidelines
    (exceptions are documented)
  • Code is consistent to our Design Decisions

Review

  • I've tested the code
  • I've read through the whole code
  • Examples are well chosen and understandable

@temmey temmey changed the title first version of the proposal Hierarchy Implementation Proposal Aug 5, 2023
Copy link
Contributor

@markus2330 markus2330 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I like the idea but it looks like it doesn't fulfill the requirements?

doc/decisions/database_plant_hierarchy.md Outdated Show resolved Hide resolved
Cons:

- Attribute overrides can only be done on the variety or cultivar level.
- More complex insert and update logic.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it would be hidden behind triggers? In general this sounds like a good idea, as we then don't need to keep Rust, maybe Python (E2E) and JavaScript code (scraper) in sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and the view would act just like our plant table right now, we only extend it with some more properties.

@markus2330
Copy link
Contributor

I like the idea more and more. Remaining questions:

What about backend compilation time? From rust we basically only see the view, don't we?

How is the performance?

How will we do the import? Will we have several CSVs for each hierarchy level?

@temmey
Copy link
Contributor Author

temmey commented Aug 6, 2023

What about backend compilation time? From rust we basically only see the view, don't we?

With Rust, we would handle it as a table. We need to adjust the schema.rs with the schema.patch file.
See: https://deterministic.space/diesel-view-table-trick.html for an example.
The other tables would also be generated in the schema.rs.
We could remove them for now with the patch file, but in the long term, we may want to build functionality that requires these tables.

How is the performance?

😬 Well, every insert/update in the plants table would need to check up to 5 other tables with these functions.
I am not experienced enough to make a prediction about how the performance would look.
But since plants insert/update, as far as I know, is not something which would be excessively used except while importing, I don't think it will be a huge factor.

How will we do the import? Will we have several CSVs for each hierarchy level?

Since we want to replace the insert functions, our current implementation should work. It may be slower, but see my answer above.

@markus2330
Copy link
Contributor

How is the performance?

I mean for what we do at runtime: searching plants etc.

@markus2330
Copy link
Contributor

To bring it shorter to an end: I think there are only two options: either like you imagine or with everything in plants (keys point to other rows in the same table). Can you please write the pro/cons between these two in the decision?

@temmey
Copy link
Contributor Author

temmey commented Aug 14, 2023

For now, I implemented it following this hierarchy.
image
This increases complexity as some plants are cultivars, others are varieties.
Cultivars can inherit from species and varieties.
While solvable with my proposal, is this hierarchy necessary?
We currently identify cultivars only using ''. Is this correct? For example, in 'Solanum lycopersicum 'Lemon drop',' 'Lemon drop' is the cultivar.

Copy link
Member

@badnames badnames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good thus far.
As usual here are a few things I noticed while reading over your changes.

@@ -34,6 +34,8 @@ See the [PSQL documentation](https://www.postgresql.org/docs/current/ddl-inherit

> Table inheritance is typically established when the child table is created, using the INHERITS clause of the CREATE TABLE statement.

> Rust Diesel isn't intended for that. To only select data from a specific table, and not include all child tables, we would need to use the `FROM ONLY` keyword, which is not implemented in Rust Diesel.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
> Rust Diesel isn't intended for that. To only select data from a specific table, and not include all child tables, we would need to use the `FROM ONLY` keyword, which is not implemented in Rust Diesel.
> Rust Diesel isn't intended for that.
> To only select data from a specific table, and not include all child tables, we would need to use the `FROM ONLY` keyword, which is not implemented in Rust Diesel.

Sentences need to be written on separate lines according to our documentation guidelines.


[Example](example_migrations/one-table-per-taxonomy-view-functions)

It's similar to `One table for taxonomy ranks and one for concrete plants` We are extending it with a view and custom functions to reduce insert and update complexity in the backend and scraper.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
It's similar to `One table for taxonomy ranks and one for concrete plants` We are extending it with a view and custom functions to reduce insert and update complexity in the backend and scraper.
It is similar to `One table for taxonomy ranks and one for concrete plants`.
We are extending it with a view and custom functions to reduce insert and update complexity in the backend and scraper.

@@ -85,6 +87,46 @@ Cons:
- Almost everything in the plants table needs to be nullable.
- More complex insert and update logic.

### One table per taxonomy rank and one for concrete plants. + View and custom insert/update/delete functionality
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This title is a bit confusing in my opinion.

Pros:

- Inserting new plants is easy. We only need to implement minor backend changes.
- Properties overrides can be done on every level.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Properties overrides can be done on every level.
- Property overrides can be done on every level.

Only if we can't find a match, the value should be written.
- We can offset this issue by implementing insert/update functions.
Since they are going to be complicated, long-term maintainability may be an issue.
- Almost everything in the plants and parent tables needs to be nullable. (Is this a downside?)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think so. If everything is nullable (even if this goes against our data model) we need to perform more manual integrity checks.

-- COALESCE function accepts an unlimited number of arguments.
-- It returns the first argument that is not null.

--todo
Copy link
Member

@badnames badnames Aug 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use English everywhere, even if you just leave notes for yourself.
This would help non german speaking team members to gauge your progress more easily when reviewing draft PRs.

Copy link
Contributor

@chr-schr chr-schr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like the proposal if overriding plant properties on every rank is something that we need.
Select performance would be great if we use a materialized view. Currently we only insert our plants offline and never update them, so the materialized view would never need to be refreshed.
Or are we expecting that to change in the near future? (Inserting/updating plants live)

As far as I understand, maintainability will suffer a bit. Adding a new plant property would then additionally involve:

  • Adding a new column to all plant rank tables (families, genera, species, varieties, cultivars)
  • Updating the plants_view to include the new column
  • (Maybe) updating the insert / update trigger functions to support the new column

@chr-schr chr-schr linked an issue Apr 16, 2024 that may be closed by this pull request
6 tasks
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Plant Hierarchy Implementation Proposal
4 participants