Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loading has_many :through association in a before_save callback creates duplicate records #54176

Closed
jenshenny opened this issue Jan 9, 2025 · 8 comments · Fixed by #54238
Closed

Comments

@jenshenny
Copy link
Contributor

jenshenny commented Jan 9, 2025

Steps to reproduce

# typed: true
# frozen_string_literal: true
require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  gem "rails", github: "rails/rails", branch: "main"
  
  gem "sqlite3"
end
require "active_record/railtie"
require "minitest/autorun"

# This connection will do for database-independent bug reports.
ENV["DATABASE_URL"] = "sqlite3::memory:"

class TestApp < Rails::Application
  config.load_defaults(Rails::VERSION::STRING.to_f)
  config.eager_load = false
  config.logger = Logger.new($stdout)
  config.secret_key_base = "secret_key_base"
end

Rails.application.initialize!

ActiveRecord::Schema.define do
  create_table "appointments", force: true do |t|
    t.integer  "doctor_id"
    t.integer  "patient_id"
  end

  create_table "doctors", force: true do |t|
    t.string   "name"
  end

  create_table "patients", force: true do |t|
    t.string   "name"
  end
end

class Doctor < ActiveRecord::Base
  has_many :appointments
  has_many :patients, through: :appointments

  before_save do
    appointments.map(&:patient) + patients
  end
end

class Patient < ActiveRecord::Base
  has_many :appointments
  has_many :doctors, through: :appointments
end

class Appointment < ActiveRecord::Base
  belongs_to :doctor
  belongs_to :patient
end

class BugTest < ActiveSupport::TestCase
  def test_association_stuff
    doctor = Doctor.new(name: "Dr Dolittle")
    appointment = doctor.appointments.build
    patient = appointment.build_patient(name: "John")

    doctor.save!

    assert_equal(1, doctor.appointments.size)
  end
end

Expected behavior

1 appointment is created.

Actual behavior

2 appointments are created. Without the before_save, 1 appointment is being created.

After a bit of digging, this is what I believe is happening:

  1. [Fix #33155] Set through target for new records #51114 fixed an issue where through association targets are not always properly set. AutosaveAssociation#save_collection_association calls ActiveRecord::AutosaveAssociation#associated_records_to_validate_or_save to get the associated records to save, which gets them via the target method on the association. So, previously, the target was not set and no records were returned, and thus no records were saved. But now, the target is set properly and the method does return a record (the patient) to save, which leads to another appointment being created.
  2. With the before_save, it sets the has many through association. Without the before save doing this, AutosaveAssociation#save_collection_association would be a no-op for the has many through association between Doctor and Patient. This is because association_instance_get(reflection.name) would return nil and ActiveRecord::AutosaveAssociation#associated_records_to_validate_or_save would never be run.

cc: @joshuay03

@VitaliiShevchenko
Copy link
Contributor

Hi, @jenshenny. Supposedly, ActiveRecord don't understand which records need to create and therefore make two identical records. Maybe code below resolve your problem.

(appointments.map(&:patient) + patients).uniq

or even it will be better

self.patients = (appointments.map(&:patient) + patients).uniq

@joshuay03
Copy link
Contributor

Thanks @jenshenny, autosave is something I didn't consider. I'm pretty sure this is the patch we want: main...joshuay03:rails:fix-54176

I'll add some coverage and get a PR up soon.

@adrianna-chang-shopify
Copy link
Contributor

Hey @joshuay03 ! Thanks for looking at this so quickly. I was pairing with @jenshenny to investigate this behaviour. One thing we were debating while investigating was how the ordering of building the parent / children vs the join record impacts a potential duplicated join record. The example Jenny provided in the repro is definitely solved by your patch:

class BugTest < ActiveSupport::TestCase
  def test_association_stuff
    # Doctor as parent, build the join record off of the parent, then build the associated patient off the join record
    # With patch, we persist the appointment as part of saving doctor, and no longer save an additional join record
    # when saving the patient
    doctor = Doctor.new(name: "Dr Dolittle")
    appointment = doctor.appointments.build
    patient = appointment.build_patient(name: "John")

    doctor.save!

    assert_equal(1, doctor.appointments.size)
  end

✅ Passes

However, when the patient is built first, and then the join record as a child of patient, we still see two appointments:

  def test_association_stuff_2
    doctor = Doctor.new(name: "Dr Dolittle")
    patient = doctor.patients.build(name: "John")
    appointment = patient.appointments.build(doctor: doctor)

    doctor.save!

    assert_equal(1, doctor.reload.appointments.size)
  end

  1) Failure:
BugTest#test_association_stuff_2 [scripts/save_collection_association_bug.rb:170]:
Expected: 1
  Actual: 2

❌ Fails

If we build the appointment with doctor as the parent, we see one:

  def test_association_stuff_3
    doctor = Doctor.new(name: "Dr Dolittle")
    patient = doctor.patients.build(name: "John")
    appointment = doctor.appointments.build(patient: patient)

    doctor.save!

    assert_equal(1, doctor.appointments.size)
  end

Passes ✅

So scenarios 1 + 3 are solved by the patch because the join record was built with doctor, patient's parent, as its parent, so we avoid saving patient's appointment-collection because we expect it to be autosaved by doctor (if I'm tracing through the logic correctly 😅 ).

Does the behaviour in scenario 2 seem correct, or would we expect it to behave the same way as the other two examples? 🤔

@joshuay03
Copy link
Contributor

joshuay03 commented Jan 11, 2025

So scenarios 1 + 3 are solved by the patch because the join record was built with doctor, patient's parent, as its parent, so we avoid saving patient's appointment-collection because we expect it to be autosaved by doctor (if I'm tracing through the logic correctly 😅 ).

It seems that scenario 1 is the only one that passes on 7.2 and 8.0, but fails on main i.e. is a regression. 2 and 3 also fail on those releases, but 3 just so happens to be addressed by my patch. I think they're all related regardless, and it makes sense to address them all together where possible. Just thought it was worth mentioning.

Does the behaviour in scenario 2 seem correct, or would we expect it to behave the same way as the other two examples? 🤔

Hmm I'm not sure tbh. Having a look.

@joshuay03
Copy link
Contributor

joshuay03 commented Jan 13, 2025

Just realised I didn't answer your question in my previous reply.

So scenarios 1 + 3 are solved by the patch because the join record was built with doctor, patient's parent, as its parent, so we avoid saving patient's appointment-collection because we expect it to be autosaved by doctor (if I'm tracing through the logic correctly 😅 ).

Not really. So doctor will autosave both appointment and patient. And appointment will autosave patient. The patch ensures that we leave the responsibility of the patient autosaving to appointment i.e. its direct parent by only autosaving new/changed records if the association is a through association.

The behaviour of doctor autosaving patient is the new behaviour that caused the regression. Before my original patches (all of which I'll link below for reference), the patient wouldn't have been set on doctor when built on appointment, and would therefore only be saved by appointment.

@joshuay03
Copy link
Contributor

joshuay03 commented Jan 13, 2025

So scenarios 1 + 3 are solved by the patch because the join record was built with doctor, patient's parent, as its parent, so we avoid saving patient's appointment-collection because we expect it to be autosaved by doctor (if I'm tracing through the logic correctly 😅 ).

It seems that scenario 1 is the only one that passes on 7.2 and 8.0, but fails on main i.e. is a regression. 2 and 3 also fail on those releases, but 3 just so happens to be addressed by my patch. I think they're all related regardless, and it makes sense to address them all together where possible. Just thought it was worth mentioning.

Does the behaviour in scenario 2 seem correct, or would we expect it to behave the same way as the other two examples? 🤔

Hmm I'm not sure tbh. Having a look.

I think it makes sense that scenario 3 also now passes with this patch. There we explicitly build both patient and appointment on doctor, so we'd expect that only one join record is created as the end state is similar to scenario 1.

I think we could argue that scenario 2 is slightly different, and much trickier to 'fix' if we wanted to. Fixing that would mean that we need to traverse the opposite direction on the join record to associate the appointment with the doctor, which is not easy.

The docs for has_many :through do specify that the join record will be created if not directly set, but it doesn't say what happens if it's set on the inverse record. Considering this case also fails on 7.2 and 8.0, I want to say that it could be considered out of the scope of this issue?

I'll aim to send a PR for the patch this week.

@adrianna-chang-shopify
Copy link
Contributor

Hey @joshuay03 ! Thanks for the detailed reply, you're great ❤️

Not really. So doctor will autosave both appointment and patient. And appointment will autosave patient. The patch ensures that we leave the responsibility of the patient autosaving to appointment i.e. its direct parent by only autosaving new/changed records if the association is a through association.

Aha, okay got it -- that makes sense. The extra details there (which might be apparent, but this is what was tripping me up before so I'll elaborate in case it helps others 😅 ) -- as a part of autosaving patient via saving doctor, we additionally insert a through record (appointment) because we identify that the doctor-patient association is a has-many-through. This is what was leading to the duplicate join record.

I'm on board with marking scenario 2 as out of scope -- as you've expressed, having patient built on doctor and appointment built on patient makes it difficult to avoid creating duplicate join records. We save both doctor and patient (and their autosaves) independently: saving doctor autosaves patient, which creates join record, and then patient autosaves appointment respectively. At the very least, it's an issue separate from the regression your patch introduced.

If you could get a patch up ASAP that would be great, this would unblock us being able to upgrade to Rails main at Shopify. Thank you for your work!

@joshuay03
Copy link
Contributor

If you could get a patch up ASAP that would be great, this would unblock us being able to upgrade to Rails main at Shopify. Thank you for your work!

Oof my bad 😅 I've got a PR up: #54238

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants