Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use generic map merging framework #937

Open
jcranch opened this issue Mar 13, 2023 · 6 comments
Open

Use generic map merging framework #937

jcranch opened this issue Mar 13, 2023 · 6 comments

Comments

@jcranch
Copy link

jcranch commented Mar 13, 2023

It is possible to make the code for merge tactics generic (so it can be used for Map and IntMap without duplication, and could also be used for other map-like data structures in other libraries), and to increase the safety in doing so.

I have created a repository containing both an implementation and a fairly lengthy explanation of how it works.

There are some decisions to be made about how to do it: most noticeably, there is a dependency upon Kinoshita's witherable library. I have demonstrated that this library can be trimmed down heavily, but there are still several possibilities for where it should go: it could go into base, into containers, or remain as a separate library (which would then need to be a dependency of containers).

@tomjaguarpaw has spent some time reviewing these ideas (I'm grateful for feedback on an earlier version); he may be willing to make his thoughts known.

@tomjaguarpaw
Copy link
Member

Yes, I found @jcranch's write up excellent and I'd recommend it to the containers maintainers. His idea could prove to be a very nice and practical way of generalising merge functionality.

@meooow25
Copy link
Contributor

Thanks for the write up. I do agree that this is a good way to define map merging, for two reasons as I see it:

  1. We can use the same strategy types for Map and IntMap (and other maps)
  2. We can expose the strategy constructors safely, allowing for more use cases without resorting to the internals

But I also see some hurdles in the way of adopting this in containers:

  • The primary problem is with dependencies, which you have also noted. I do not believe that containers is the right place to define (and make publicly available) WitherableWithIndex and superclasses. Which means that containers must depend on indexed-traversable and witherable. As containers is a boot library, it can only depend on other boot libraries. Adding new boot libraries would be something we would have to convince the library maintainers and the GHC devs of. Alternately, the classes could be moved into base, which means getting approval from the CLC in addition to the above.
  • Another issue is that I don't see how this scheme can be extended beyond map-merging. We have requests for a similar Set-Set merge (Generalized merge API for sets #732) and also Map-Set merge (Define partitionKeys: fused version of restrictKeys and withoutKeys #975 (comment)). Do you have any thoughts on this? Ideally we'd adopt the same (or as similar as possible) technique for all of them.
  • Finally, the new definition of WhenMissing requires RankNTypes, but containers has always maintained a portable implementation. In theory we could work around this by using the new definition only on GHC.

@jcranch
Copy link
Author

jcranch commented Oct 19, 2024

This is my first foray into anything more central to the Haskell ecosystem than writing libraries. As a result, I think I'm ill-equipped to comment on the first and third of your hurdles, which seem to me to be dependent mostly on strategic priorities. I genuinely do not know what types of upheaval the relevant maintainers are willing to tolerate over any given timeframe. I'd be pleased to see experienced people state their opinions.

The second of your hurdles is, however, within my competence. Set is not defined, but could be defined as

newtype Set a = Set { underlyingMap :: Map a () }

I have always calmly assumed that the containers people know what they're doing, and had solid performance reasons justifying a bespoke implementation rather than just implementing most of Data.Set in terms of Data.Map.Strict!

The value of this observation is that it gives you a way of specialising desirable functionality from maps to sets: any time you have an interesting function involving Map, there is a corresponding function involving Set which would be obtained from specialising the value type to (). This rule of thumb tells us exactly how to do set-set and set-map merges, and we get the following type signatures as the most general plausible merges:

imergeMapAndSetToMapA
  :: (Applicative f, Ord k)
  => WhenMissing f k a c -- ^ What to do with keys in @m@ but not @s@
  -> WhenMissing f k () c -- ^ What to do with keys in @s@ but not @m@
  -> WhenMatched f k a () c -- ^ What to do with keys in both @m@ and @s@
  -> Map k a -- ^ Map @m@
  -> Set k -- ^ Set @s@
  -> f (Map k c)

imergeMapAndSetToSetA
  :: (Applicative f, Ord k)
  => WhenMissing f k a () -- ^ What to do with keys in @m@ but not @s@
  -> WhenMissing f k () () -- ^ What to do with keys in @s@ but not @m@
  -> WhenMatched f k a () () -- ^ What to do with keys in both @m@ and @s@
  -> Map k a -- ^ Map @m@
  -> Set k -- ^ Set @s@
  -> f (Set k)

imergeSetA
  :: (Applicative f, Ord k)
  => WhenMissing f k () () -- ^ What to do with keys in @s1@ but not @s2@
  -> WhenMissing f k () () -- ^ What to do with keys in @s2@ but not @s1@
  -> WhenMatched f k () () () -- ^ What to do with keys in both @s1@ and @s2@
  -> Set k -- ^ Set @s1@
  -> Set k -- ^ Set @s2@
  -> f (Set k)

There are also of course the specialisations (presumably mergeMapAndSetToMapA, mergeMapAndSetToSetA, mergeSetA with no "i" at the beginning of the names) where f is Identity. There are some less plausibly useful combinations (for example, merging sets with sets to give maps).

The profusion of units in the type signatures are unpleasant, but the same merge tactics that work for map merges will work here, and will mean that the use cases which should be pleasant and idiomatic will indeed be pleasant and idiomatic. For example, we will have

union = mergeSetA preserveMissing preserveMissing preserveLeftMatched
intersection = mergeSetA dropMissing dropMissing preserveLeftMatched

Actually implementing these functions for Data.Map and Data.Set would not be a difficult exercise: the existing merge implementation for Data.Map could be adapted without any difficulty.

@treeowl
Copy link
Contributor

treeowl commented Oct 19, 2024

@jcranch, while we could, theoretically, define Set as a wrapper around Map (), that would use an extra word to represent each element. That has significant performance consequences.

@meooow25
Copy link
Contributor

Yes, and it would be even worse for IntSet because we would lose the packed Tips.
(Sharing Map/Set structure is also discussed in #851)

We will have to accommodate sets and maps being distinct types.

@jcranch
Copy link
Author

jcranch commented Oct 30, 2024

Don't worry (@treeowl and @meooow25), I'm not suggesting we do this, merely observing that the isomorphism between Set and Map with values in () tells us what to write.

Approach 1

I've implemented these functions, albeit in a slightly hackish way. The need for some amount of hackery is motivated by the issue (which might have been what @meooow25 was driving at last week) is that Set isn't Witherable (it isn't even a Functor, after all).

One can introduce a variant of Set with an unused variable (newtype Set' k v = Set' (Set k)), and fudge a Witherable instance on that, and then run your tactics on Set' (). To make this work you have to pass around bottom values systematically: for example, FilterableWithIndex defines

ifilter :: (k -> a -> Bool) -> Set k

and there is no value of a being carried around, so you just pass in an error value and hope that nobody will dream of using a WhenMissing operation on a Set that actually inspects the value.

It is the case that no standard WhenMissing tactic that makes sense in this context will inspect the value, and any meaningful tactic can be implemented in a way that doesn't inspect the value (after all, the value provided is in (), so there is nothing to be learned from inspecting it). However, it is possible to string together tactics clumsily which do inspect the value. It is perhaps unlikely that this will happen in practice, and in any case it's bad style to compose tactics (a single pass should always be more efficient). However, this caveat makes me slightly nervous.

Approach 2

The only safe alternative I can think of is to

  • produce "valueless" versions of the hierarchy of classes from Functor to WitherableWithIndex. Only the "WithIndex" variants above FunctorWithIndex make sense, and FoldableWithIndex is a bit of a nicety anyway. Thus the only three which are really needed are TraversableWithIndex, FilterableWithIndex and WitherableWithIndex, and you want functions

    traverseS :: Applicative f => (k -> f ()) -> s -> f s
    filterS :: (k -> Bool) -> s -> s
    witherS :: Applicative f => (k -> f Bool) -> s -> f s
    
  • produce merge tactics operating on types in those classes. That would then be how you merge Set (and IntSet, and hopefully one day HashSet, and...).

I'm happy to do this if someone wants to see it. It won't take long.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants