-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove Access::Custom; it forces table scans to figure out what can run #573
Conversation
34047b7
to
fa2b975
Compare
534a5d2
to
cc3e4a4
Compare
4f8bed2
to
be1b229
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like there is an awful lot of complexity here, and it is not clear to me how much of that complexity is inherent versus accumulated. Are you happy with the state of things right now? Are there any particular bits that you find annoying?
I have a hunch that there is a better way to represent this concept of a 'work id' in the type system, and to do so in such a way that lets us delete a lot of code.
I also understand that the glyphs/gvar ids are tricky, because unlike all of the other work tasks these can't be known at compile time, as the set of glyphs obviously depends on the source format.
Due to lack of a way to get a discriminant other than from an instance you end up with this I would also like to make One only applicable to variants that do not contain data or to have a way to identify that and treat One(variant without data) as equivalent to All(variant without data) because All can resolve completion by simply checking a counter. EDIT: I added AccessBuilder to reduce the noise of examples like the one above and hopefully make it clearer when the match is on the variant vs on the variant + the data for the exemplar. |
Rebased to resolve conflict with main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the builder is a nice addition, but I also think there is a lot of complexity around this access/workid abstraction and I'm not sure that complexity is well spent.
That feels beyond the scope of this PR, though, so I'm happy to see this get in if it is improving performance in some meaningful way.
Do you have a specific suggestion? - it seems at least somewhat required to me. You need an identity for tasks to be able to say A depends on B. You could dump access checks but that opens the door for inconsistently reproducible ordering problems. |
it's hard to sort of tease apart the various parts of the current system to arrive at concrete suggestions, but from some earlier experimentation I did develop some ideas I found useful.
I also think that existence of separate id types and context types for fe/be is unnecessary, and these could be unified. I also-also think that there are better ways to express the concept of an id, in the type system, than with an enum. I was playing around a bit with the idea of actually just making this a struct that would store a type id and an optional 'instance identifier', so that we are actually reusing the type of the resource as the identifier. But all of this is complicated, and hard to be sure about without really fleshing out at least a toy implementation, and it is certainly beyond the scope of this PR. |
The majority of what you are describing is already the case.
The work identifier specifies an asset or resource (naming is the hardest part) more than actual work given than one work can both consume and produce many things so this is true other than the name not being resource or artifact or asset or w/e. Maybe we should rename things.
It's possible but I'm of the opinion it would be harmful. The phases are deliberately separated and some errors, such as a FE task trying to use a BE output, won't compile.
This is true already
The set of possible ids is known in advance. Which ones have an instance identifier is known in advance. To me that's describing an enum. What would the advantage be of this change? |
As with most of my changes, I'm not suggesting a drastic departure from where we are right now, just a slightly different way of encoding things in the type system. My dream is an end state where we have a
Agree that this is largely a matter of naming.
I'm unconvinced this is a problem in practice. In the world I am imagining, a type would still only be able to access the things that it had named as dependencies. A FE task would not be able to name a BE resource, and so couldn't request it.
If it is, I have misunderstood the point of this PR, which appeared to be sort of bolting an 'all of these things' concept onto the existing workid idea, which is different from having
So the rationale here is based on the idea of having a single context which doesn't actually know what it is storing, and is shared between fe/be. In this world the set of possible ids is not known ahead of time, at least not in the place where the context is declared. Basically instead of having a context with named fields, the context would wrap some sort of map type that would store type-erased resources, with the key being this struct. This is hard to describe in prose, but imagine something like, /// Stores all the intermediate state
pub(crate) struct ResourceStore {
storage: HashMap<ResourceIdentifier, AnyResource>,
}
pub(crate) struct AnyResource {
inner: Arc<dyn Any>,
// just used for debugging
type_name: &'static str,
}
#[derive(Clone, Debug, Eq, PartialEq, Hash)]
pub struct ResourceIdentifier {
/// The type of the generated resource
type_id: TypeId,
/// For resources with multiple instances (glyphs, e.g.) some extra token
/// that uniquely identifies a specific instance
instance_id: (), //TODO
}
impl ResourceStore {
pub fn get_any(&self, ident: &ResourceIdentifier) -> Option<&AnyResource> {
self.storage.get(ident)
}
pub fn get<T: 'static>(&self) -> Result<&T, ResourceError> {
self.get_any(&ResourceIdentifier::<T>::new())
.ok_or_else(|| ResourceError::Missing(std::any::type_name::<T>()))
.and_then(AnyResource::downcast)
}
} Here the let kerning: ir::Kerning = context.get()?; and it 'just works'. Again, this is just one idea, and there may be serious problems to it, but at some point it could be an interesting idea to explore. |
The code example helps, ty. I was imagining our current contexts glued together which implies everything can see everything. The sketch neatly avoids that. I agree it feels like there might be something nicer than what we do currently here (assuming no blocker exists going from sketch to real implementation). I'm a bit concerned that the ROI would be low. It would be a lot of work and the benefits I see are:
Are there other benefits?
This part seems strongly informed by the problem you encountered in writing a test where we create and run specific work rather awkwardly. Those tests are poorly written, we should probably go fix them. However, I'm not immediately certian that a non-test users needs the "run to an arbitrary point" capability? Thinking "aloud," I think we could make the test that seems to inform the end state easier through a less invasive change. For example, to run until a specific thing is produced you would, roughly:
Not as elegant as
You need something to create a graph of work, and to update it at runtime because the graph is not a pure function of source but changes in response to operations on that source, such as to generate a new glyph based on what we found in the glyphs of the source. You've still got "work," you still need to know what work depends on, and you've still got something responsible for executing a workload. Hm. In the sketched model whose job is it to create the work to build the things you need to provide the ir::Kerning? You need to know what it reads, transitively, and to assemble a graph. Getting that graph right is arguably the interesting part. |
The main benefits as I see them (just for this context sketch):
Less than "run to an arbitrary point" I've been thinking of this as "generate this arbitrary artifact". In the test example the artifact might be some IR, but an artifact might also be an instance or a subset or something else. The idea is that there is a single mechanism/abstraction through which resources are resolved, regardless of what point in the program graph you find yourself. I think this has a very nice conceptual simplicity.
I think that any changes we might make should be done in stages, and I think that there are reasonable ways to break it up. For instance, I think that we could probably unify the contexts without changing too much else, and iterate from there. I definitely agree that anything we can do in the immediate term that improves testing would be great.
Right, I think you're definitely identifying the right questions here. I have thought about this a reasonable amount, and I have some ideas. I'll try to sketch out at a high level how I've imagined this whole system fitting together:
So that's a hand-wavy sketch. The devil is in the details; for instance in some cases (e.g. glyphs) we do not know all of the resources we need (assuming one per glyph) until we have done some preliminary work, and so some And then all of this would be rolled up in a nice And again, this is all just ideas, and although I have played around with implementing some of this there are lots of unanswered questions; but I do think that figuring out a scalable & ergonomic architecture would offer some significant long term benefits. I also think that we can get here iteratively. Ultimately most of the important logic in Indeed, in a hypothetical world where we wanted to explore these changes, the route I would propose would be to first refactor |
Access::Custom took a closure and required a scan over pending jobs to determine if it matched any. Instead count the number of pending jobs of each type and resolve "blocks on all of type" (e.g. all IR Glyphs) by checking if the count of those pending is 0.
Key outcome is deletion of
can_run_scan
fromfontc/src/workload.rs
. hyperfine reports no significant change in performance:A mildly amusing outcome is that you can potentially detect that all-of(something) is done before you can detect a specific id is done because the former is a decrement to an atomic and the latter requires a message to pass through a channel.
I feel like there should be a nicer way than how I've rigged AllOrOne but I haven't pinned it down.