Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for composite identifiers delimited by / #163

Merged
merged 1 commit into from
Aug 30, 2023

Conversation

nvasilevski
Copy link
Contributor

@nvasilevski nvasilevski commented Aug 25, 2023

This commit extends global id to allow representing models with composite identifiers. The value will be joined by /. For example:

Given a TravelRoute model with origin and destination as the compsoite primary key, the global id will be represented as:

TravelRoute.new(origin: "Ottawa", destination: "New York").to_global_id
# => gid://app/TravelRoute/Ottawa/New%20York

Context

Next version of Rails will support models with a composite primary key and globalid need to provide capabilities to present such models. One of the major use-cases is the Active Record objects serialization in Active Job jobs

Major changes

  • Default locator now enforces models to implement primary_key class method

Choosing the delimiter

We end up choosing / as a delimiter since it has the least change of conflicting with one of the values of the composite primary key. However it leads to slightly changed semantic when it comes to splitting the segments of the gid
For example a broken URI like 'gid://app/alsoapp/Person/12' used to considered invalid due to malformed app name. But now the same URI will be parsed as:

app - app name
alsoapp - model name
Person/12 - composite model id

Basically this is the only concern with the / being used as a delimiter

As an alternative we were considering _ to serve as a delimiter but since the delimiter value is going to be hardcoded we decided against it as it has a higher chance of conflicting with the actual value of the composite key

What reviewers should be focusing on

a global id like 'gid://app/Person/1/2' used to be considered invalid. After the change it will be possible to parse and initialize an URI::GID instance and 1/2 will be parsed as ['1','2'] composite key however it won't be possible to look up records using GlobalID instance created from such URI::GID

To prevent malicious actors from crafting gids with unrealistically long composite keys we have limited the maximum size of the composite key by 20
This has been done to prevent issues like GHSA-23c2-gwp5-pxw9

Integration test

Here is a script that tests changes in integration with Active Job

Expand
  # frozen_string_literal: true

  require "bundler/inline"

  gemfile(true) do
    source "https://rubygems.org"

    git_source(:github) { |repo| "https://github.com/#{repo}.git" }

    gem "rails", github: "rails/rails", branch: "main"
    gem "globalid", github: "Shopify/globalid", branch: "ac-gid-cpk"
    gem "sqlite3"
  end

  require "active_job"
  require "active_record"
  require "minitest/autorun"

  # This connection will do for database-independent bug reports.
  ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
  ActiveRecord::Base.logger = Logger.new(STDOUT)

  ActiveRecord::Schema.define do
    create_table :orders, force: true do |t|
      t.integer :shop_id
      t.string :name
    end

    create_table :events, force: true do |t|
      t.string :name
      t.text :data
    end
  end

  GlobalID.app = "demo"

  class Order < ActiveRecord::Base
    include GlobalID::Identification
    self.primary_key = [:shop_id, :id]
  end

  class Event < ActiveRecord::Base
    serialize :data, Hash
  end

  class CpkJob < ActiveJob::Base
    def perform(order:)
      Event.create(name: "order processed", data: {id_value: order.id_value, shop_id: order.shop_id, name: order.name})
    end
  end

  class CpkJobTest < ActiveJob::TestCase
    def test_stuff
      CpkJob.perform_later(order: Order.create(id: [123, 1], name: "serialized as a gid"))
      perform_enqueued_jobs

      assert_equal 1, Event.count
      event = Event.first
      assert_equal "order processed", event.name
      assert_equal({id_value: 1, shop_id: 123, name: "serialized as a gid"}, event.data)
    end
  end

Copy link
Member

@rafaelfranca rafaelfranca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is doing way much than it should. A lot of validations and I don't know what we are trying to protect against.

Do we really need those validations?

@@ -43,19 +46,22 @@ def parse_encoded_gid(gid, options)

def initialize(gid, options = {})
@uri = gid.is_a?(URI::GID) ? gid : URI::GID.parse(gid)
validate_model_id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need this validation? What are we trying to protect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are trying to prevent someone manually crafting a GID like gid://app/Person/1/2/3/4/5.... which will be parsed as [1,2,3,4,5] and passed to find() which may lead to more than one record being loaded.

As an alternative we can move the check to the locator but then custom locators won't be protected. Which may not be a bad thing

Let me know if you'd like to completely loosen the restriction or locator would be a better place to ensure we only target single object when calling find

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GlobalID is primarily used as a serialization format. Someone will not manually be creating GIDs. If people are using GlobalID not a serialization format, then they will be responsible for checking if their input is correct.

I think this is better handled at the locator level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Moved the validation to the locator level. Instead of raising it now returns nil as this seems to be locator's interface when it comes to locating records by a malformed gid

Comment on lines 23 to 26
assert_nil GlobalID.parse('gid://app/Person/1/2')
assert_nil GlobalID.parse('gid://app/CompositePrimaryKeyModel/tenant-key-value/id-value/something_else')
assert_nil GlobalID.parse('gid://app/CompositePrimaryKeyModel/tenant-key-value/')
assert_nil GlobalID.parse('gid://app/CompositePrimaryKeyModel/tenant-key-value')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are protecting too much. All those glogal ids should be valid. If they can't be located, isn't a job of the parser to find out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't a job of the parser to find out.

I asked in the comment above but would you prefer Locator to be the one to check that identifier size matches model's primary key to avoid loading more records than necessary?

lib/global_id/global_id.rb Outdated Show resolved Hide resolved
@etiennebarrie
Copy link

What do you think about moving Global ID to use to_key instead of id? The argument in favor is that to_key already is an Array, and represents all the key attributes at once.

The negative point is that Global ID was really built to work with Active Record rather than Active Model (which defines the #to_key interface).
Active Record models are also by transitivity Active Model models, and to_key is redefined, so that part is mostly fine.
The .find interface is really Active Record's, but previously it was pretty easy to make an Active Model (or any PORO) work with Global ID by simply defining find on the class. Now the class needs to respond to primary_key as well.

I guess the class also needed to implement where(id:) to support locate_many, and now it will need to implement where with keyword args matching the return value of primary key.

Test script showing current API
# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  gem "activemodel"
  gem "globalid"
end

require "active_model"
require "global_id"
require "minitest/autorun"

GlobalID.app = "my-app"

class Model
  include ActiveModel::Model
  attr_accessor :key
  include GlobalID::Identification
  alias_method :id, :key

  @models = {}

  def self.create(key:)
    @models[key] = new(key:)
  end

  def self.clear
    @models.clear
  end

  def self.find(key)
    if key.is_a?(Array)
      key.map { |key| find(key) }
    else
      @models.fetch(key)
    end
  end

  def self.where(id:)
    if id.is_a?(Array)
      id.flat_map { |key| where(id: key) }
    else
      Array(@models[id])
    end
  end
end

class ModelGIDTest < Minitest::Test
  def test_to_global_id
    assert_equal "gid://my-app/Model/foo", Model.new(key: "foo").to_global_id.to_s
  end

  def test_locate
    assert_raises { GlobalID::Locator.locate("gid://my-app/Model/foo") }
    model = Model.create(key: "foo")
    assert_equal model, GlobalID::Locator.locate("gid://my-app/Model/foo")
  ensure
    Model.clear
  end

  def test_locate_many
    assert_raises do
      GlobalID::Locator.locate_many(["gid://my-app/Model/foo", "gid://my-app/Model/bar"])
    end
    foo = Model.create(key: "foo")
    assert_raises do
      GlobalID::Locator.locate_many(["gid://my-app/Model/foo", "gid://my-app/Model/bar"])
    end
    assert_equal [foo], GlobalID::Locator.locate_many(["gid://my-app/Model/foo", "gid://my-app/Model/bar"], ignore_missing: true)
    bar = Model.create(key: "bar")
    assert_equal [foo, bar], GlobalID::Locator.locate_many(["gid://my-app/Model/foo", "gid://my-app/Model/bar"])
  ensure
    Model.clear
  end
end
Diff for the test script to support this branch's API
diff --git i/test_gid.rb w/test_gid.rb
index 91730c79cc..0db53a1f42 100644
--- i/test_gid.rb
+++ w/test_gid.rb
@@ -6,7 +6,7 @@
   source "https://rubygems.org"
 
   gem "activemodel"
-  gem "globalid"
+  gem "globalid", github: "https://github.com/rails/globalid/pull/163"
 end
 
 require "active_model"
@@ -39,13 +39,17 @@ def self.find(key)
     end
   end
 
-  def self.where(id:)
-    if id.is_a?(Array)
-      id.flat_map { |key| where(id: key) }
+  def self.where(key:)
+    if key.is_a?(Array)
+      key.flat_map { |key| where(key: key) }
     else
-      Array(@models[id])
+      Array(@models[key])
     end
   end
+
+  def self.primary_key
+    :key
+  end
 end
 
 class ModelGIDTest < Minitest::Test

@adrianna-chang-shopify
Copy link
Contributor

adrianna-chang-shopify commented Aug 28, 2023

Hey @etiennebarrie , great question! IIRC @nvasilevski and I discussed using #to_key early into our Global ID investigation, but ultimately decided to stick with id (and consequently #primary_key) for the reason you mentioned -- models using Global ID are already expected to conform to an Active Record interface, needing to implement #find, #where, #id, etc. Defining #primary_key / redefining #where with kwargs on an Active Model class or PORO technically gets those implementations closer to the Active Record API that Global ID is meant to expect, so that felt like a fair expectation.

That said, it's somewhat of a moot point given that, as you mentioned, Active Records inherently define #to_key so that should work just fine 😄 And it would probably clean up our code a little bit, since #to_key will always be an Array. I'm happy to push up a commit and see what we think.

EDIT: Okay, so a blocker to using #to_key is that we validate that an id passed into GlobalID::Locator matches the shape of the model's primary key:

def model_id_is_valid?(gid)
primary_key = Array(gid.model_class.primary_key)
primary_key_size = primary_key.size
Array(gid.model_id).size == primary_key_size
end

Since we don't have an instance here, we can't use to_key.

Probably best to stick with #primary_key for now, since most GID models will be Active Records anyways.

This commit extends global id to allow representing models with composite
identifiers. The value will be joined by `/`. For example:

Given a `TravelRoute` model with `origin` and `destination` as the
compsoite primary key, the global id will be represented as:

```
TravelRoute.new(origin: "Ottawa", destination: "New York").to_global_id
=> gid://app/TravelRoute/Ottawa/New%20York
```

Co-authored-by: Adrianna Chang <[email protected]>
Co-authored-by: Nikita Vasilevsky <[email protected]>
@adrianna-chang-shopify
Copy link
Contributor

cc @byroot @rafaelfranca could I get y'all to take another look at this? Thanks!

else
model_class.find(ids)
end
end

private
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private

Comment on lines +169 to +172
primary_key = Array(gid.model_class.primary_key)
primary_key_size = primary_key.size

Array(gid.model_id).size == primary_key_size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
primary_key = Array(gid.model_class.primary_key)
primary_key_size = primary_key.size
Array(gid.model_id).size == primary_key_size
Array(gid.model_id).size == Array(gid.model_class.primary_key).size

Comment on lines +144 to +145
ids_by_model.each do |model, ids|

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ids_by_model.each do |model, ids|
ids_by_model.each do |model, ids|

@rafaelfranca
Copy link
Member

hmm, those are small thins I can fix myself. Merging.

@ghiculescu
Copy link
Member

FYI #168

@kyrofa
Copy link

kyrofa commented Sep 4, 2023

models using Global ID are already expected to conform to an Active Record interface, needing to implement #find, #where, #id, etc. Defining #primary_key / redefining #where with kwargs on an Active Model class or PORO technically gets those implementations closer to the Active Record API that Global ID is meant to expect, so that felt like a fair expectation.

I won't debate its fairness, but that expectation isn't conveyed well. The README says "Mix GlobalID::Identification into any model with a #find(id) class method. Support is automatically included in Active Record." We didn't interpret that to mean that we needed to define #primary_key, which means this PR was a breaking change for us (#166).

Probably best to stick with #primary_key for now, since most GID models will be Active Records anyways.

That's probably true, but please don't forget about the few models that aren't coming from ActiveRecord. Documentation about the API that is required and treating that required API spec as part of the public API for globalid would be very much appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants