
Add host and disk aliases to respective Worker dicts #2670

Merged
merged 4 commits into dask:master from pentschev:host-disk-aliases on May 9, 2019

Conversation

pentschev (Member)

@mrocklin as per our discussion in rapidsai/dask-cuda#35

pentschev (Member Author)

Travis errors are not related to this PR.

Dictionary mapping keys to actual values stored on disk; always empty if memory_limit is not defined and neither memory_target_fraction nor memory_spill_fraction is defined, implying that spilling host memory to disk is disabled.
mrocklin (Member)
I recommend that we still keep these on the worker.data object, as we do in dask-cuda.

So

worker.memory -> worker.data.memory
worker.disk -> worker.data.disk

Thoughts?

pentschev (Member Author)

Yes, my bad. I'll fix it.

@@ -488,6 +498,9 @@ def __init__(
else:
self.data = dict()

self.host = self.data.fast.d if hasattr(self.data, "fast") else self.data
self.disk = self.data.slow.d if hasattr(self.data, "slow") else dict()
mrocklin (Member)

We can only do this if we're using the zict/buffer solution (see the conditional just above)

pentschev (Member Author)

I wanted to make sure that they're both defined no matter which conditional is true. Since there are several elifs, I would need to define host in all of them, which isn't very nice.

pentschev (Member Author)

Ah, I realize what your suggestion is now. Well, for both this and the previous one, we would need to define self.data as a custom object, maybe HostFile (analogous to DeviceHostFile). Is there a simpler way I'm not thinking of?

mrocklin (Member)

Ah, I see now the else.

I think that we should only have memory and disk attributes if they're using zict. Otherwise I'm not too concerned. Only expert users will turn off zict. I'm still in favor of adding self.data.memory and self.data.disk only if we're in the branch above where we're creating the Buffer. Otherwise I wouldn't bother.

My aversion to adding .memory and .disk to the top level Worker namespace is mostly that it's already crowded and that these are useful, but probably not that useful. It's also less clear what they mean in that context. It's also not something that we can guarantee if the user provides a different mapping type.

pentschev (Member Author)

> I think that we should only have memory and disk attributes if they're using zict. Otherwise I'm not too concerned. Only expert users will turn off zict. I'm still in favor of adding self.data.memory and self.data.disk only if we're in the branch above where we're creating the Buffer. Otherwise I wouldn't bother.

That's fine by me. But my previous comment is still valid, since self.data is a zict.Buffer, we can't simply add attribute host and disk, I think we would need to add a new object, something like HostFile. Or is there a simpler way of doing that you're thinking of?

mrocklin (Member) commented May 8, 2019

> That's fine by me. But my previous comment is still valid, since self.data is a zict.Buffer, we can't simply add attribute host and disk, I think we would need to add a new object, something like HostFile. Or is there a simpler way of doing that you're thinking of?

Actually, due to the permissiveness of Python, we can:

self.data.memory = ...
self.data.disk = ...
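A minimal check of the point above: Python instances with a `__dict__` generally accept new attributes after construction, so aliases can be attached to an existing mapping object. The `Buffer` class here is a hypothetical stand-in for `zict.Buffer`:

```python
class Buffer:
    """Hypothetical stand-in for zict.Buffer holding fast/slow mappings."""
    def __init__(self, fast, slow):
        self.fast = fast
        self.slow = slow

buf = Buffer({}, {})
# New attributes can be attached after construction:
buf.memory = buf.fast
buf.disk = buf.slow
buf.memory["x"] = 1
print(buf.fast["x"])  # prints 1: memory and fast are the same dict
```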

pentschev (Member Author)

> Actually, due to the permissiveness of Python, we can:
>
> self.data.memory = ...
> self.data.disk = ...

Ah right, I was in the wrong conda environment at first, which is why it didn't seem to work when I first attempted it. Done in the latest commit.

I've kept the self.data.host naming though, this way it's consistent with dask-cuda, and I'd prefer to have host rather than memory for both, as memory would be misleading if there's also device memory.

mrocklin (Member) commented May 8, 2019

> I've kept the self.data.host naming though, this way it's consistent with dask-cuda, and I'd prefer to have host rather than memory for both, as memory would be misleading if there's also device memory.

I would prefer memory rather than host. The "host" nomenclature is only known to GPU users. This will confuse the vast majority of Dask users today. I don't think that we should optimize for CUDA users yet.

pentschev (Member Author)

> I would prefer memory rather than host. The "host" nomenclature is only known to GPU users. This will confuse the vast majority of Dask users today. I don't think that we should optimize for CUDA users yet.

I understand your point, and personally, I'd agree with it if dask-cuda didn't exist yet. However, I do think that we should try to be future-proof to some extent, and host makes more sense in the long term. IMO, this is one of the purposes of documentation as well.

mrocklin (Member) commented May 8, 2019

> I understand your point, and personally, I'd agree with it if dask-cuda didn't exist yet. However, I do think that we should try to be future-proof to some extent, and host makes more sense in the long term. IMO, this is one of the purposes of documentation as well.

I disagree. The term host is not known to non-GPU users.

pentschev (Member Author)

> I disagree. The term host is not known to non-GPU users.

I stand by my point; documentation should always be there to clarify. But then what about having host as an alias to memory? This way we would be able to be concise/explicit for dask-cuda, at the cost of maintaining one extra line of code.
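For illustration, the alias proposal amounts to a single extra assignment; the `Data` class and attribute names below are only illustrative, not actual distributed code:

```python
class Data:
    """Hypothetical container standing in for the worker's data object."""
    pass

d = Data()
d.memory = {}
d.disk = {}
d.host = d.memory  # the "one extra line": host aliases the same dict

d.memory["k"] = 1
print(d.host is d.memory)  # prints True: both names see the same mapping
```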

mrocklin (Member) commented May 8, 2019

> I stand by my point; documentation should always be there to clarify. But then what about having host as an alias to memory? This way we would be able to be concise/explicit for dask-cuda, at the cost of maintaining one extra line of code.

The term host doesn't make sense from a non-GPU perspective. The term host implies that there is something else like guest, that is possibly more important.

> This way we would be able to be concise/explicit for dask-cuda

Stepping into my core Dask maintainer role, I'm not interested in tailoring the Dask experience to Dask-cuda, except in making things more general.

If the Dask-Cuda project wants to monkey-patch things, that's fine, but we're not going to affect the majority of users with CUDA terminology.

pentschev (Member Author)

No further comments from me.

mrocklin (Member) commented May 8, 2019 via email

mrocklin merged commit ffe0838 into dask:master on May 9, 2019
mrocklin (Member) commented May 9, 2019

Thanks @pentschev ! This is in.

muammar added a commit to muammar/distributed that referenced this pull request Jul 18, 2019
* upstream/master:
  Add WeakSet _instances attributes to all classes (dask#2673)
  Cap worker's memory limit by the hard limit of the maximum resident memory (dask#2665)
  Switch from (ip, port) to address in tests (dask#2684)
  Catch RuntimeError to avoid serialization fail when using pytorch (dask#2619)
  Add CONTRIBUTING.md (dask#2680)
  Consolidate logic around services (dask#2679)
  Fix uri_from_host_port import in dask-mpi (dask#2683)
  Move dashboard_address logic into Scheduler/Worker (dask#2678)
  Fix pytest.config deprecation warning (dask#2677)
  Use config accessor method for "scheduler-address" (dask#2676)
  Add memory and disk aliases to Worker.data (dask#2670)
  Move interface/host/port handling from CLI to classes (dask#2667)
  Remove AioClient (dask#2668)
  Add release procedure doc (dask#2672)
  bump version to 1.28.0
pentschev deleted the host-disk-aliases branch on November 11, 2020