
Repo.stream no longer supports preload #54

Closed
jhonathas opened this issue Dec 6, 2018 · 19 comments

Comments

@jhonathas

In version 0.4.0 it was possible to use a load function, as can be seen in the link below, but this option has been removed. I believe we will need the load function again; I'll explain below.
https://github.com/infinitered/elasticsearch-elixir/blob/58948d14a5806d76aa469702c39eef8643738ac4/guides/upgrading/0.4.x_to_0.5.x.md

In Ecto 3.0, support for preload together with Repo.stream was removed, as can be seen in these links:

https://elixirforum.com/t/repo-stream-with-preload-new-warning/17043

elixir-ecto/ecto@6655a9a#diff-122d0a4bbce6a65cc1523584a00193aaR138

Without the option to preload alongside the stream, the only alternative is preloading within Elasticsearch.Document, which is bad because it would preload each record individually; in other words, we would lose batched preloading.
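For illustration, a sketch of that per-record preload (the schema, MyApp.Repo, and the association names are assumptions, and the document-encoding callback is reduced to its preload step):

# Hypothetical per-record encoding. Because encode/1 runs once per
# document, each record issues its own preload queries instead of
# sharing one batched query per association.
def encode(post) do
  post = MyApp.Repo.preload(post, [:ad, :address, :medias])
  %{title: post.title, medias: Enum.map(post.medias, & &1.url)}
end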

I think that by including the load option again, we could go back to doing things like:

  def load(schema, offset, limit) do
    schema
    |> offset(^offset)
    |> limit(^limit)
    |> Repo.all()
    |> Repo.preload([:ad, :address, :medias])
  end

I'm not sure I explained this well. If you have any doubts, I can try to give you more examples.

@danielberkompas
Owner

Offset and limit perform poorly, because the database has to scan past every skipped row. We are going to need to adopt a cursor-based solution for paging through documents.
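For illustration, a minimal keyset-style query of the kind a cursor solution boils down to (MyApp.Repo and the id column are assumptions):

import Ecto.Query

# Hypothetical keyset pagination: instead of OFFSET, filter on the last
# id seen, so the database can seek via the primary-key index rather
# than scanning and discarding the skipped rows.
def next_page(schema, last_id, page_size) do
  from(s in schema,
    where: s.id > ^last_id,
    order_by: [asc: s.id],
    limit: ^page_size
  )
  |> MyApp.Repo.all()
end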

@danielberkompas changed the title from "Problem on use Repo.stream on store." to "Repo.stream no longer supports preload" on Dec 6, 2018
@hipertracker

Preload is a must for inner joins. How can we index data that needs data from related tables? Currently, it is very limited to flat tables.

@hipertracker

To work around Ecto 3 blocking preloads in a stream, I just replaced Repo.stream/1 with Repo.all/1 in my Elasticsearch.Store. I can't see a better solution.
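A minimal sketch of that workaround, reusing the association names from earlier in the thread (the Repo alias is assumed):

import Ecto.Query

@impl true
def stream(schema) do
  # Repo.all/1 still allows query-level preloads under Ecto 3 (unlike
  # Repo.stream/1), at the cost of loading every row into memory at once.
  schema
  |> preload([:ad, :address, :medias])
  |> Repo.all()
end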

@jhonathas
Author

But in doing so, you will have an N+1 problem. Instead of only one preload query, as it was with the stream, you will have preload queries for each row returned by all/1.

@jhonathas
Author

Currently I use all/1, but I know it is not a good solution, especially as my database grows.

@hipertracker

No, I have no N+1 problem. For 3 inner joins I have 4 queries altogether.

@jhonathas
Author

For me it did not preload, even when I tried to use the join :(

@ksherman

ksherman commented Apr 6, 2019

Also curious about solutions. I'm stuck with an MSSQL database, and Tds.Ecto doesn't support stream at all 🤣

@danielberkompas
Owner

@ksherman This is something I want to address in version 2.0.0. I'm going to recommend that you use a stream based on cursors. You can build one pretty easily using the paginator library.

defmodule CursorStream do
  def new(repo, query, opts \\ []) do
    Stream.resource(
      # Initial state: nothing buffered yet and no cursor.
      fn ->
        %{
          query: query,
          entries: [],
          cursor: nil,
          repo: repo,
          opts: opts
        }
      end,
      &next/1,
      &cleanup/1
    )
  end

  # Buffer empty: fetch the next page after the stored cursor.
  defp next(%{entries: []} = state) do
    opts = Keyword.merge(state.opts, after: state.cursor)
    page = state.repo.paginate(state.query, opts)

    case page.entries do
      [h | entries] ->
        {[h], %{state | entries: entries, cursor: page.metadata.after}}

      [] ->
        {:halt, state}
    end
  end

  # Buffer non-empty: emit the buffered entries one at a time.
  defp next(%{entries: [h | t]} = state) do
    {[h], %{state | entries: t}}
  end

  defp cleanup(_state) do
    # no op
  end
end

# Usage
CursorStream.new(MyApp.Repo, query, opts_for_paginator)

Cursors avoid the limit/offset performance problems, as well as the transaction problems of Repo.stream.
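A minimal repo setup for the sketch above, assuming the paginator hex package (the module and otp_app names are placeholders):

defmodule MyApp.Repo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres

  # Injects paginate/2, which CursorStream calls as repo.paginate/2.
  use Paginator
end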

@ksherman

ksherman commented Apr 11, 2019

@danielberkompas Okay, I tried this out, and I think it's kind of just running the same page of products over and over? My CursorStream is what you pasted above. This is what I added to the store:

  @impl true
  def stream(schema) do
    CursorStream.new(
      Repo,
      from(s in schema, where: s."FoodItemID" > 9000 and s."FoodItemID" < 9050, preload: :brand),
      cursor_fields: [:id],
      limit: 50
    )
  end

The where on FoodItemID was just to start with a small collection of results; this table has > 1.5M rows. Only 9 products match that range, and the mix command just seems to index those 9 products over and over.

@danielberkompas
Owner

@ksherman Are you sure that your ID field is named :id? It looks like it might be capitalized, ID?

@ksherman

ksherman commented Apr 18, 2019

Hey! Sorry for the late reply. I've tried a few different options in cursor_fields. Our id is actually :FoodItemID, and using that doesn't seem to stop it from looping either.

I know offsets aren't desirable, but for now this worked for me:

  def new(query, chunk_size \\ 500, offset \\ 0) do
    Stream.resource(
      # The accumulator is the current offset; start from the one given.
      fn -> offset end,
      fn
        :stop ->
          {:halt, :stop}

        offset ->
          rows = EfAPI.Repo.all(from(query, limit: ^chunk_size, offset: ^offset))

          # A short page means we've reached the end: emit it, then stop.
          if Enum.count(rows) < chunk_size do
            {rows, :stop}
          else
            {rows, offset + chunk_size}
          end
      end,
      fn _ -> :ok end
    )
  end
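Hypothetical usage from an Elasticsearch.Store callback, assuming new/3 above lives in a module named OffsetStream:

@impl true
def stream(schema) do
  # OffsetStream is an assumed name for the module holding new/3; the
  # query-level preload works here because new/3 runs via Repo.all/1.
  OffsetStream.new(from(s in schema, preload: :brand), 500)
end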

@tcrossland
Contributor

tcrossland commented Oct 2, 2019

To work around Ecto 3 blocking preloads in a stream, I just replaced Repo.stream/1 with Repo.all/1 in my Elasticsearch.Store. I can't see a better solution.

One option would be to perform a join query and programmatically hydrate the structs, for example:

from(comment in Comment,
  join: post in assoc(comment, :post),
  select: {comment, post}
)
|> Repo.stream()
|> Stream.map(fn {comment, post} -> Map.put(comment, :post, post) end)

@jhonathas
Author

I'll test this approach. Today I'm doing the preload inside the document, but it gets too slow. Thanks!

@diegonogueira


Thanks, @tcrossland.
The problem with this approach is when we need has_many associations: the join repeats the parent row once per child, so the stream no longer yields one struct per record.

@seanwash

seanwash commented Oct 2, 2019

@diegonogueira

You could use Stream.chunk_every/2 and Stream.flat_map/2 to preload records in batches.

I'm doing something similar with larger CSVs that I need to build:

# ...
|> Repo.stream()
|> Stream.chunk_every(10)
|> Stream.flat_map(fn chunk ->
  Repo.preload(chunk, Feed.product_preloads())
end)
# ... Do something with your stream

This preloads records in groups of 10, and flat_map flattens the chunks back into a single stream. The performance overhead here seems to be negligible in my testing so far.

@diegonogueira

Thanks, @seanwash!
It works! :)

@diegonogueira

I created my own stream_preload function in my Repo module.

defmodule MyApp.Repo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres

  def reload(%module{id: id}) do
    get(module, id)
  end

  # Chunk the stream, preload each chunk in one batched round of
  # queries, then flatten the chunks back into a single stream.
  def stream_preload(stream, size, preloads) do
    stream
    |> Stream.chunk_every(size)
    |> Stream.flat_map(fn chunk ->
      preload(chunk, preloads)
    end)
  end
end

And the store file:

defmodule MyApp.PostStore do
  @behaviour Elasticsearch.Store
  alias MyApp.Repo

  @impl true
  def stream(schema) do
    schema
    |> Repo.stream()
    |> Repo.stream_preload(500, :comments)
  end

  @impl true
  def transaction(fun) do
    {:ok, result} = Repo.transaction(fun, timeout: :infinity)
    result
  end
end

config.exs

...
bulk_page_size: 500,
bulk_wait_interval: 0
...

Thanks

@jhonathas
Author

Thanks, @seanwash!

It worked very well!
