-
Notifications
You must be signed in to change notification settings - Fork 974
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make wgpu-core less generic (brainstorm) #5124
Comments
I think it might be a bit understated, but breaking things up (not necessarily into traits) by making some source type visible could make downcasts less error (or panic) prone. By for example, limiting legal downcasting through a trait: enum Vulkan {}
Impl ApiMarker for Vulkan {}
trait ApiMarker;
trait DowncastRef<A: ApiMarker, T> {
fn downcast_ref(&T) -> &Self;
}
impl DowncastRef<Vulkan, TextureView> for VulkanTextureView {
fn downcast_ref(resource: &TextureView) -> &Self {
// get the resource
}
}
|
Note that this could all be bundled into another Exactly how it would have to fit together is tricky. |
I agree, I meant that it's an understood part of the design space, while right now I'm not sure how feasible it is to erase all of the hal types without doubling the amount of allocations and indirection. |
FYI, I'm investigating whether this technique can be used (or be made) sound in Rust, this would mean that smart pointers could be stored internally as their hal types in wgpu-core, but when shared with types in wgpu (or elsewhere) they could be accessed through trait objects. This could potentially reduce the number of necessary downcasts, and the representation of a concrete type is more efficient and can access private methods not exported through a public trait. In effect, something like |
One of the questions to consider is at which layer of wgpu's architecture should we have the dyn traits.
I have a preference towards A because it removes the need for an extra interface layer. In a potential future where webrender uses B is the "let's not touch wgpu-hal" approach but I don't really see an advantage to it unless we don't find a solution for how to create the resource types in A. I see C mostly as a step towards A or B. |
Making I think what makes this less obvious stems from the fact that |
You are arguing at a fairly abstract and conceptual level. I had an equally abstract counter-argument but all that it shows is different view points and philosophies. Instead, let's come up with concrete (skeletons of) solutions, compare them and once we have defined our goal, we can break it down into its incremental steps. |
Feel free to shoot the abstract counter argument, but since you felt mine were too abstract, I'll at least try to rephrase them here more concretely: wgpu-core and wgpu-hal are very different in how hard they are to make object safeThis boils down to is that a method in pub fn texture_create_view<A: HalApi>(
&self,
texture_id: id::TextureId,
desc: &resource::TextureViewDescriptor,
id_in: Option<id::TextureViewId>,
) -> (id::TextureViewId, Option<resource::CreateTextureViewError>); And the corresponding unsafe fn create_texture_view(
&self,
texture: &A::Texture,
desc: &TextureViewDescriptor,
) -> Result<A::TextureView, DeviceError>; The associated types here are In comparison, wgpu-core has its own high level behavior and will most likely require its own traits anywayThe second aspect is that a wgpu-core pub struct Texture<A: HalApi> {
pub(crate) inner: Snatchable<TextureInner<A>>,
pub(crate) device: Arc<Device<A>>,
pub(crate) desc: wgt::TextureDescriptor<(), Vec<wgt::TextureFormat>>,
pub(crate) hal_usage: hal::TextureUses,
pub(crate) format_features: wgt::TextureFormatFeatures,
pub(crate) initialization_status: RwLock<TextureInitTracker>,
pub(crate) full_range: TextureSelector,
pub(crate) info: ResourceInfo<Texture<A>>,
pub(crate) clear_mode: RwLock<TextureClearMode<A>>,
pub(crate) views: Mutex<Vec<Weak<TextureView<A>>>>,
pub(crate) bind_groups: Mutex<Vec<Weak<BindGroup<A>>>>,
} So at least initially, I'd expect that we have to wrap |
Thanks.
They would be replaced with references like So it goes back to the problem of how to create the resources. When Taking the example of wgpu-core's // most of wgpu-core only deals with `Arc<Texture>` for which the backend is erased.
pub type Texture = TextureResource<dyn SnatchableResource>
pub struct TextureResource<A: SnatchableResource> {
pub(crate) raw: A,
// Members below, are unchanged.
pub(crate) device: Arc<Device<A>>,
pub(crate) desc: wgt::TextureDescriptor<(), Vec<wgt::TextureFormat>>,
pub(crate) hal_usage: hal::TextureUses,
pub(crate) format_features: wgt::TextureFormatFeatures,
pub(crate) initialization_status: RwLock<TextureInitTracker>,
pub(crate) full_range: TextureSelector,
pub(crate) info: ResourceInfo<Texture<A>>,
pub(crate) clear_mode: RwLock<TextureClearMode<A>>,
pub(crate) views: Mutex<Vec<Weak<TextureView<A>>>>,
pub(crate) bind_groups: Mutex<Vec<Weak<BindGroup<A>>>>,
} The Snippet from my earlier comment shows how Snatchable Resource would work.
Lets elaborate on this, since I certainly don't want to see a rewrite of wgpu-hal. The changes I have in mind are pretty superficial and mechanical (See the quick and dirty experiment I just linked). The key missing piece is the creation of the resources.
unsafe fn create_texture_view(
&self,
texture: &dyn TextureResource
desc: &TextureViewDescriptor,
) -> Result<Box<dyn TextureViewResource>, DeviceError>; Problem solved. It's a rather simple refactoring to do, we can easily do it incrementally. In my That's at the cost of an extra allocation and pointer chase for each resource. I'm not very satisfied but it's not completely unreasonable.
unsafe fn create_texture_view(
&self,
texture: &dyn TextureResource
desc: &TextureViewDescriptor,
) -> Result<A::TextureView, DeviceError>; Then the generics propagate into wgpu-core. How can we contain it? Is it a separate dispatch mechanism specifically for the create functions which encapsulates a bit of wgpu-core side generic code to create the concrete
unsafe fn create_texture_view(
&self,
texture: &dyn TextureResource
desc: &TextureViewDescriptor,
output: SomethingRepresentingTheMemoryToPlaceTheTextureInto
) -> Result<(), DeviceError>; |
FWIW, changing the signature of every function and every use of |
Ok, then I think that the churn is acceptable. It mostly touches function the signatures at the API boundary, it's easy to do and can be done piecemeal. Removing generics from large parts of wgpu-core is going to be a lot more invasive. Note that these wgpu-hal changes (putting aside the |
I noticed that you're not removing the |
I just want to generally express my interest in the wgpu-hal way, assuming performance is acceptable. I didn't realize that was even possible, let alone low lift enough to not be a major problem. |
Ideally these would go away, it depends on how the I'm going to keep throwing ideas without trying to filter out the bad ones. Here's my latest thought: If we care about the What's nice about this is that if it turns out that the extra allocation does not matter, rolling back to just returning a normal |
I think this is better thought of the other way around. If the simple solution (just box the thing) turns out to be bad for performance, we can figure out a better way without breaking too much. |
Isn't it necessary for them to go away to achieve the stated goals? Otherwise I don't see how we're reducing code generation which is what would contribute to reduced binary sizes and reduced compile times. Most of the duplicate monomorphization would be in the wgpu-core to wgpu-hal glue where each type and method that has a |
To clarify, here's a summary of what produces the most LLVM right now on my system configuration when I build wgpu in release mode. We can see that the more heavily dominating sources are wgpu-core, where there's 3 copies of most methods taking |
Kind of yes, but the object-safe and non-object-safe parts of hal's API could be split into two traits. Here is an example which is a direct continuation of the gist from my first comment: In wgpu-hal the device API is split into two traits (I gave them terrible names so don't think about them too much): /// Object safe trait.
pub trait DeviceApi {
fn buffer_map(&self, buffer: &dyn Resource) -> Result<(), Error>;
// etc.
}
/// Parts of the API that we couldn't make object-safe (mostly the create_* functions) are split into this "Factory" trait.
pub trait HalDeviceFactory {
type Texture: Resource;
fn create_hal_texture(&self, desc: &TextureDescriptor) -> Result<Self::Texture, Error>;
// etc.
} In wgpu-core, an object-safe trait creates the type-erased objects we want. The implementation knows about the concrete hal types so it can place them in the core resource struct, but it doesn't have to do anything else so the generics are contained into a small part of the code: /// The object-safe equivalent to `HalDeviceFactory`
pub trait CoreDeviceFactory {
fn create_texture(&self, desc: &hal::TextureDescriptor) -> Result<Arc<Texture>, Error>;
}
// A blanket impl is enough to do what we need:
impl<T> CoreDeviceFactory for T where T: hal::HalDeviceFactory {
fn create_texture(&self, desc: &hal::TextureDescriptor, label: String, etc: Stuff) -> Result<Arc<Texture>, Error> {
let typed_texture = Arc::new(TextureResource {
label,
raw: Snatchable::new(self.create_hal_texture(desc)?),
});
Ok(typed_texture)
}
} Updated gist: https://gist.github.com/nical/c5d88aaf97f20815756a36dc5c94b5a3/999fb470cc2037ad6340f4bbc1a2fe666ab1e1e0 |
Ok, this is pretty much what I want to do. To me that is essentially what doing "wgpu-core first" means, because it doesn't actually require changing anything in wgpu-hal. Once this is done, we'd have capitalized on most of the available gains, and can revisit what actually needs to be fixed in wgpu-hal to further reduce duplicate code generation. |
Yes, that's a big part of my motivation to put the dynamic dispatch closer to hal. In an ideal world wgpu-core has (almost) no code generic on the backend and close to none of its generated code is duplicated due the the existence of multiple backends. |
I'll leave you to fiddle a bit more, once you have a complete example I can look at it further if you want me to. The tool I've used to test changes so far is cargo llvm-lines. |
Here is a rough plan:
|
The plan did not survive first contact with the text editor. The problem is that because the existing (non-object-safe) traits in wgpu-hal are generic (for example: // Error: Unconstrained parameter A.
impl <A, T: Device<A>> DeviceFactory for T {
// ...
} That said, if we wrap the |
Actually, I'm not sure a wrapper would help at all. Another approach is to take the generic parameter out of the hal traits and turn them into associated types: pub trait Device: WasmNotSendSync {
type: A: Api;
// If only the following syntax worked, unfortunately we get an ambiguous associated type error.
// unsafe fn create_buffer(&self, desc: &BufferDescriptor) -> Result<Self::A::Buffer, Error>;
// This works but is an ergonomics nightmare.
unsafe fn create_buffer(&self, desc: &BufferDescriptor) -> Result<<Self::A as Api>::Buffer, Error>;
}
impl<T: Device> DeviceFactory for T {
unsafe fn dyn_create_buffer(&self, desc: &BufferDescriptor) -> Result<Box<dyn BufferResource>, Error> {
}
} That's a huge bummer, because even if the end state is that the |
I think that if this is the wall that's holding us back, we should just go ahead. A bit of temporary ugly code is a fair trade for our end goal. @nical do you have a branch where you were working on this? I understand you've been a bit busy, so I thought I could take a crack at it. |
@lylythechosenone the little I have on branches is probably easier to re-write than rebase. I had started experimenting on this one and this one but these are more exploratory than actual implementations. I'm glad that you are motivated to pick this up, since I can't dedicate time to this in the foreseeable future. I thought I had some notes about this but unfortunately I'm either misremembering or I lost them so I'll try to page the plan back in my head: It is very important to do this incrementally. It's going to be a difficult change to land, it will touch most of wgpu-core and will conflict with most PRs happening there so landing changes often and in small chunks is key. I encourage you to experiment on a throwaway branch and re-implement cleaner/isolated commits once you have identified what sticks. Probably a good first thing to try is what I suggested in my previous comment about moving the generic parameters to associated types. I was worried that it would make things messy but Connor later told me it'd probably be fine. There will probably be parts of the API that will need to be split off into traits implemented in wgpu-core, for example creating the resources needs to know about the concrete type (for example Don't hesitate to reach out with questions or to bounce ideas, and thanks again for looking into this. |
I am happy to report that converting all |
Alright, I have done a bit of work on the dynamic traits. They are mostly implemented, however a few functions are unimplemented due to lack of dynamic forms for:
Currently, these structs all follow a very similar pattern: struct Foo<'a, A: Api> {
thing: &'a A::Thing,
// some metadata
bar: Bar,
baz: Baz,
} We could make dynamic forms of them in one of two ways. Firstly, we could duplicate them and create specific dynamic forms, e.g.: struct Foo<'a> {
thing: &'a dyn Thing,
// some metadata
bar: Bar,
baz: Baz,
} Alternatively, we could adapt the existing ones like so: struct Foo<'a, T: Thing + ?Sized> {
thing: &'a T,
// some metadata
bar: Bar,
baz: Baz,
} The first allows us to not make any more breaking changes. However, the second could be more intuitive. Please tell me your thoughts, and let me know if you think of a third option. |
Per more Matrix chat: Turns out the conversion from some untyped to typed structures can be fairly heavy: E.g. |
This is also true of the |
A |
Acceleration structure-related downcasts cannot be done currently. They take a reference to their entries. I may have to do some |
Alright, that's everything that needs to be done in wgpu-hal. I'm going to need a lot of help with core, because I've never touched the codebase before. I'm also not entirely sure what the structure should be. I know we need some "factory" traits for creation of objects, and these should be dynamic. They should create core resources. However, that's about where my knowledge ends. @nical it'd be helpful if you could fill me in on whatever more you know, if you do. |
I did some looking around, and I think I get the basic idea now. I'd still love a more thorough explanation. |
What I'm currently thinking (in order of implementation steps):
This will also require wrapper traits for all of the hal traits, although I'm not sure where that should fit in. These wrappers will build off of the dynamic versions, but re-implement We will also need a struct Snatchable<T: ?Sized> {
snatched: UnsafeCell<bool>,
value: UnsafeCell<ManuallyDrop<T>>,
} where the
I'm gonna start with step 1 (as most do), and see how far I get. I'll try to make this as small of a change as possible in terms of breaking things, especially outside of core. |
sounds pretty good to me!
So that would be along the lines of @nical 's third suggestion in the original post? (
given that each implementation is implemented with a value of the backend enum we could consider using that? 🤔 |
The only issue with this is: who decides which number means what?
What about unfilled slots? A I think a hash-lookup there is actually fine, and potentially better. If we do choose to use TypeId and find that performance is a problem, we could skip hashing, as typeids are already (at least partially) hashed values. I don't think it will be too much of a problem, however. One thing to consider is what type of hashmap we use. Because this map will be shared between threads, we need to synchronize it. We could use something like |
well, we do. Right now there's an enum for all backends already. Minimal version just uses that
we're literally talking about having a fixed sized array of less than 10 pointers that we need once in the hub, no? Sounds like I got this wrong |
I meant for out-of-tree backends.
Yeah, you're actually right. Sometimes I go into over-optimization mode, that would probably be fine. The only way it isn't fine is if we store hubs inline rather than in an arc, but I think we want the arc for contention minimization anyway. |
@nical I thought one of the goals here was to reduce compilation time during development. Couldn't that be accomplished just by enabling only one backend? |
The "clutter of generic code" factor I don't find so compelling. It's not something that keeps me from getting work done. I'm not sure this work should be a priority for us at the moment. |
After some talk with @nical on matrix, I have a rough roadmap for what we need to do:
Then after all of that we'll regroup and discuss. |
One important update: Thinks to decide:
|
Adding a bit of context to @lylythechosenone's previous comments (context that can already be gathered from other places but is easy to miss):
|
Exactly. Although do note that resources will still have generics, they will just have a default type, and fns will only be implemented for this default type. |
I'll use this issue to explore various ways we could make wgpu-core less generic.
The motivations include reducing the clutter of overly generic code and more importantly to reduce binary size, compile times.
Factor out non-generic code into separate functions
There is some amount of code that could be easily factored out, for example some of the validation. Isolating the validation also has the advantage that it can be laid out to match the specification so that we can more easily check that the validation is correct.
There is only a limited amount of bang for buck we can get
Making the resources not generic
We'll get the biggest wins by far if we can erase the type of the resources. most of wgpu-core is generic only because something needs to hold on to a generic resource.
Downcasting
Assuming we can accept the cost of downcasting every resource at the top of every hal functions, we could use trait objects and downcast via
Any
.The gpu-hal APIs would take
&dyn Resource
parameters instead of some associated types, and the first thing they would do is downcast these dyn traits into the concrete types they are expecting.Note that we call also break it down into plenty of traits like
TextureResource
,BufferResource
, etc. That's a rather small detail once we have overcome the other obstacles.Boxed resources
The simple approach (which I find very unsatisfying) is to box all hal resources:
The wgpu-core and wgpu-hal structures are no longer next to each other in memory, that most likely has a cost but it is hard to estimate it without doing all of the work.
Dynamically sized types
The last member of a struct can be a DST, which turns the structure into a dst as well:
It's quite nice, however there are a few constraints:
Option<dyn hal::Resource>
device_create_*
functions would probably need to still be generic.Code of the DST experiment: https://gist.github.com/nical/c5d88aaf97f20815756a36dc5c94b5a3 it also shows how snatchable resources would work.
Can we do it without Any/downcast?
Maybe, with a copious amount of unsafe code which might be fine. Suggestions are welcome!
The text was updated successfully, but these errors were encountered: