Track Task callers in process dictionary #7995

josevalim · 2018-07-26T21:50:47Z

Elixir tasks makes it really straight-forward for developers to execute chunks of code concurrently. This is usually done with Task.async/1:

Task.async(fn ->
  ...
end) |> Task.await()

However, in practice, developers should prefer to use Task.Supervisor.async, as the supervision tree gives more visibility over spawned tasks:

Task.Supervisor.async(MySup, fn ->
  ...
end) |> Task.await()

Unfortunately, moving to Task.Supervisor has one big downside: we lose the relationship between the process who started the task and the task itself. That's because in the first example, the parent of the task is the caller process, but in the second example, the parent is the supervisor.

This means instrumentation/monitoring/logging tools and even testing tools like Mox and the Ecto's SQL Sandbox are incapable of detecting the relationship between them.

I propose that we add a new field to the Task process dictionary, called $callers, that tracks exactly the caller of the task. This will allow tools to properly build the relationship between processes and allow testing tools that rely on ownership mechanisms to propagate those effectively without requiring tests to run synchronously.

The text was updated successfully, but these errors were encountered:

josevalim · 2018-07-26T21:51:05Z

@fishcakez I would love your opinion on this one.

lackac · 2018-07-26T22:10:59Z

Wouldn't there be only a single caller for any process? In that case $caller might be a better dictionary key. Apologies if I'm missing something or if this is bike-shedding territory. :)

AndrewDryga · 2018-07-26T22:11:46Z

I can add our use case in logging that I feel can be improved.. In order for this to work for us we would need to implement Process.get(process, key, default). The use case is following - we would like all tasks spawned from a single HTTP request to include request_id in their logs. For doing so we capture metadata (via Logger.metadata/0) and set it inside task (via Logger.metadata(metadata). We might cheat and take $callers from spawned task to figure out what caller request_id is, but currently, that's not possible.

Other way how we use process dictionary is for concurrent testing with mocs. Our HTTP clients read endpoint URL which is replaced with bypass endpoints in some tests, doing so we don't rely on global application state (eg. Application environmnet) and can run those tests concurrently.

Both of them can be solved if we would start propagating some process dictionary keys automatically to spawned processes, but I'm not sure this is a good idea because of overhead and somewhat ambiguous behavior (you still can spawn process by calling Erlang function).

josevalim · 2018-07-27T07:13:08Z

@lackac excellent question. We may have a process that starts a task that starts another task. We would like to track this all the way, hence $callers. The dollar sign is to mirror the current $ancestors. Note that we cannot use $caller to get the direct caller and access the direct caller to get its direct caller because in some cases, such as in Task.start and in Task.async_nolink, the direct caller may be dead, which means this information would be lost.

fertapric · 2018-07-27T07:58:15Z

I have created this example to illustrate what would (I think) be the content of the $callers key (hope it helps):

self() 
#=(1)> #PID<0.2.0>

Process.get(:"$ancestors")  
#=(2)> [#PID<0.1.0>]

Process.get(:"$callers")  
#=(3)> [#PID<0.1.0>]

Task.async(fn ->
  self() 
  #=(4)> #PID<0.3.0>

  Process.get(:"$ancestors")  
  #=(5)> [#PID<0.2.0>, PID<0.1.0>]

  Process.get(:"$callers")  
  #=(6)> [#PID<0.2.0>, PID<0.1.0>]

  Task.Supervisor.async(MySup, fn ->
    self() 
    #=(7)> #PID<0.4.0>

    Process.get(:"$callers")  
    #=(8)> [#PID<0.3.0>, #PID<0.2.0>, PID<0.1.0>]

    Task.async(fn ->
      self() 
      #=(9)> #PID<0.5.0>

      Process.get(:"$callers")  
      #=(10)> [#PID<0.4.0>, #PID<0.3.0>, #PID<0.2.0>, PID<0.1.0>]

      Task.Supervisor.async(MySup, fn ->
        self() 
        #=(11)> #PID<0.6.0>

        Process.get(:"$callers")  
        #=(12)> [#PID<0.5.0>, #PID<0.4.0>, #PID<0.3.0>, #PID<0.2.0>, PID<0.1.0>]
      end)
    end)
  end)
end)

I have some doubts regarding (3), because I'm not sure if the $callers key will be populated at that level. Maybe the first $callers with results would be (6)?

josevalim · 2018-07-27T09:25:59Z

(3) should work but at the top level one likely won't UNLESS it was started as Task.

michalmuskala · 2018-07-27T13:54:26Z

I like this idea. I think the question is - should this integrate with logger metadata?

josevalim · 2018-07-27T14:18:17Z

@michalmuskala for now I would say no, or at least discuss it elsewhere, but we at least know this makes it possible.

fishcakez · 2018-07-27T14:35:36Z

Is it possible to have a memory leak with $callers? It has a subtle difference with $ancestors because if an ancestor exits then the descendants must exit according to OTP. However these are tasks so Task.async would leak $ancestors in the same case right now, as a Task is not aware of its parent / doesn't have a receive loop.

Note that $ atoms in OTP use the $ sign to indicate that the atom itself and any associated term is for internal use only.

josevalim · 2018-07-27T15:47:45Z

@fishcakez very good point. I see two options:

Go ahead anyway since this issue exists for ancestors too
Only track $caller and make sure we reset ancestors to the most immediate parent process unless running inside a supervisor

I would note that proc_lib has the same memory leak, so probably not an issue?

fishcakez · 2018-08-15T13:18:12Z

Only track $caller and make sure we reset ancestors to the most immediate parent process unless running inside a supervisor

I think this approach won't work as well because we need to traverse up the callers stack to the first pid matching. It's possible that first or last caller may not be the caller we want to associate with.

For example if an async task performs the ownership checkout in Ecto's SQL sandbox, we don't want to get the sandbox for its parent.

Go ahead anyway since this issue exists for ancestors too

So lets go for this one.

sheharyarn · 2018-10-13T03:16:28Z

Is someone working on this? I would like to take a jab at it if you can point me in the right direction!

josevalim · 2018-10-13T07:05:37Z

@sheharyarn no, please go ahead!

sheharyarn · 2018-10-13T07:29:45Z

@josevalim Great! Any pointers/guidance on how and where I should get started?

GregMefford · 2018-11-27T15:17:03Z

@sheharyarn Without having looked at the code, I believe the way this would work is that you'd:

find the place in the code where Task.async and Task.Supervisor.Async get to the point of spawning a new process.
Right before that happens, grab the calling process's $callers variable from its process dictionary using Process.get
Prepend self() to the list
In the new spawned Task process, use Process.set to set the $callers variable in the child process.

josevalim · 2018-12-11T13:47:25Z

I am working on this!

This allows us to track which process invoked the Task besides its ancestors in the supervision tree. Closes #7995

josevalim · 2018-12-11T15:07:03Z

Closing in favor of #7995.

This allows us to track which process invoked the Task besides its ancestors in the supervision tree. Closes #7995

josevalim added Note:Discussion Kind:Enhancement App:Elixir Level:Intermediate labels Jul 26, 2018

josevalim removed the Note:Discussion label Oct 3, 2018

josevalim added this to the v1.8.0 milestone Oct 3, 2018

josevalim mentioned this issue Nov 26, 2018

Removing caller_pid from log entries break some integrations elixir-ecto/ecto#2843

Closed

josevalim pushed a commit that referenced this issue Dec 11, 2018

Add $callers tracking to Task

1889b1b

This allows us to track which process invoked the Task besides its ancestors in the supervision tree. Closes #7995

josevalim mentioned this issue Dec 11, 2018

Add $callers tracking to Task #8510

Merged

josevalim closed this as completed Dec 11, 2018

josevalim added a commit that referenced this issue Dec 11, 2018

Add $callers tracking to Task (#8510)

95934e5

This allows us to track which process invoked the Task besides its ancestors in the supervision tree. Closes #7995

danielspofford mentioned this issue Jan 8, 2019

Update devices when tags are updated. nerves-hub/nerves_hub_web#325

Merged

garthk mentioned this issue Apr 9, 2019

Reinstate tracing parallel preloads spandex-project/spandex_ecto#12

Closed

This was referenced May 25, 2021

removed ecto/ecto#2

Closed

Does tracing(telemetry) the execution of parallel preloads work now? elixir-ecto/ecto#3644

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Track Task callers in process dictionary #7995

Track Task callers in process dictionary #7995

josevalim commented Jul 26, 2018

josevalim commented Jul 26, 2018

lackac commented Jul 26, 2018 •

edited

Loading

AndrewDryga commented Jul 26, 2018 •

edited

Loading

josevalim commented Jul 27, 2018

fertapric commented Jul 27, 2018 •

edited

Loading

josevalim commented Jul 27, 2018

michalmuskala commented Jul 27, 2018

josevalim commented Jul 27, 2018

fishcakez commented Jul 27, 2018

josevalim commented Jul 27, 2018

fishcakez commented Aug 15, 2018

sheharyarn commented Oct 13, 2018

josevalim commented Oct 13, 2018

sheharyarn commented Oct 13, 2018

GregMefford commented Nov 27, 2018

josevalim commented Dec 11, 2018

josevalim commented Dec 11, 2018

Track Task callers in process dictionary #7995

Track Task callers in process dictionary #7995

Comments

josevalim commented Jul 26, 2018

josevalim commented Jul 26, 2018

lackac commented Jul 26, 2018 • edited Loading

AndrewDryga commented Jul 26, 2018 • edited Loading

josevalim commented Jul 27, 2018

fertapric commented Jul 27, 2018 • edited Loading

josevalim commented Jul 27, 2018

michalmuskala commented Jul 27, 2018

josevalim commented Jul 27, 2018

fishcakez commented Jul 27, 2018

josevalim commented Jul 27, 2018

fishcakez commented Aug 15, 2018

sheharyarn commented Oct 13, 2018

josevalim commented Oct 13, 2018

sheharyarn commented Oct 13, 2018

GregMefford commented Nov 27, 2018

josevalim commented Dec 11, 2018

josevalim commented Dec 11, 2018

lackac commented Jul 26, 2018 •

edited

Loading

AndrewDryga commented Jul 26, 2018 •

edited

Loading

fertapric commented Jul 27, 2018 •

edited

Loading