Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

diy7: additional "safe" relaxations reduces number of generated cycles #531

Closed
chacon01 opened this issue Feb 23, 2023 · 20 comments
Closed

Comments

@chacon01
Copy link

Hi,

We're observing that some invocations of diy7 cause fewer cycles to be generated when additional relaxations are added to the "safe" list. For example

$ diy7 -cycleonly true -arch AArch64 -safe "Rfe Fre [PodWR,Amo.Cas] PosRR" -relax "[DpDatadW,Rfi]" -size 5 -exact -eprocs -nprocs 2

produces one cycle, PodWR Amo.Cas Rfe DpDatadW Rfi PosRR Fre whereas

diy7 -cycleonly true -arch AArch64 -safe "Rfe Fre [PodWR,Amo.Cas] PosRR PodWR" -relax "[DpDatadW,Rfi]" -size 5 -exact -eprocs -nprocs 2

produces none (note the addition of PodWR in the "safe" list). My expectation would be that adding PodWR would have caused diy7 to produce a (non-strict) superset of what it produced without it. I tried experimenting with different modes; this doesn't appear to occur with -mode free (adding PodWR causes a new cycle to appear instead of causing one to not appear).

I see that the documentation is somewhat vague about what the sc mode does

In (default) sc mode, diy7 performs some optimisation, most of which we leave unspecified.

Is there some explanation why adding relaxations to the safe list would cause cycles to no longer be generated when using sc?

@maranget
Copy link
Member

Hi @chacon01. Here is an explanation, in sc mode, there is a so-called po_safe mechanism: When a plain po safe candidate is given, such as PodWR then sequences of two candidates with the same R/W status as the po safe candidate are rejected. This applies to your example: by adding PodWR to the safe candidates, one rejects the sequence Rfi PosRR as it starts with a write and ends with a read. That said this mechanism is questionable:

  • It is useful for "challenging" relax candidates without generating too many tests, which is not your aim.
  • It is not very consistent as Rfi PosRR is more akin to PosWR thans to PodRW.

I'd like to have your opinion on an evolution. For instance, I can introduce a new (default) mode that would not apply the po-safe mechanism. I will anyway fix the po-safe mechanism so that it considers s/d status. This last fix would restore the "increasing cycle" property in your example, but not in the general case.

@chacon01
Copy link
Author

Hi @maranget. I'm happy with the addition of a new mode that doesn't apply po_safe. I have no preference on whether this new mode would be the default instead of sc, though perhaps it would be better to make this new mode the default to avoid surprising users in the manner I was. I suppose other users would appreciate the documentation to describe the difference between these modes - I think the general description you provided about po_safe is good.

Are there any other optimizations in sc that might cause surprising behavior like po_safe that we should be aware of or consider removing in this new mode?

@maranget
Copy link
Member

Hi @chacon01. I'll add the new mode (called 'default'...) and attempt to document modes a bit more. As to your question, yes there is another subtlety : if you declare some fence candidate as safe, then some rejections of composing fence and dependencies candidate occur. Although I cannot reproduce the effect now, there is a potential "non-monotonic" effect here.

@maranget
Copy link
Member

maranget commented Feb 24, 2023

The 'fence' effect is marginal because all compositions c1;c2 where c1 or c2 is a fence are rejected, except a few where c1 (resp. c2) is a dependency c2 (resp c1) is a fence with the 's' specification and W/R extremities are the same as the one of the fence.The case looks so specific that I am tempted to suppress it. If one want such compositions (e.g. DpAddrdR DMB.LDsRR), one can use composite relaxations.

@chacon01
Copy link
Author

That makes sense, thanks. I'm not concerned about non-monotonicity for the fence case as you've described it. In the specific case you're considering, I agree that it can be suppressed provided that composite relaxations still allow it to be generated.

@maranget
Copy link
Member

Composite relaxations are not checked using mode checks. For instance [PodWR,PodRR] will be accepted and will find its way to cycles, even in the quite strict 'critical' mode that reject the compositon of internal candidates:

lim% diy7 -num false -arch AArch64 -cycleonly true -mode critical -safe Fre,Pod**  -nprocs 2
# diy7 -num false -arch AArch64 -cycleonly true -mode critical -safe Fre,Pod** -nprocs 2
# Version 7.56+03, Revision: bd83443f06b54d4c3dda8293297c8f8c70055cce
Generator produced 1 tests
SB: PodWR Fre PodWR Fre
lim% diy7 -num false -arch AArch64 -cycleonly true -mode critical -safe Fre,Pod**,[PodWR,PodRR]  -nprocs 2
# diy7 -num false -arch AArch64 -cycleonly true -mode critical -safe Fre,Pod**,[PodWR,PodRR] -nprocs 2
# Version 7.56+03, Revision: bd83443f06b54d4c3dda8293297c8f8c70055cce
Generator produced 3 tests
SB: PodWR Fre PodWR Fre
SB+po+poR-po: PodWR Fre PodWR PodRR Fre
SB+poR-pos: PodWR PodRR Fre PodWR PodRR Fre

@maranget
Copy link
Member

Draft PR #532 attempts a monotonic, better documented, default mode for diy. Documentation is work in progress. If you care for a try...

@chacon01
Copy link
Author

Hi @maranget, this appears to be working better. I'm seeing additional generated that weren't present under sc. I haven't had the chance to test for monotonicity; given the difficulty of getting the original non-monotonic example, I'm not sure how I'd go about testing for that... In any case, the generation changes in #532 look good to me. Thanks!

@maranget
Copy link
Member

maranget commented Feb 27, 2023

Thanks @chacon01. So PR #532 is close to being merged. Thanks again for your very clear issue report.

@benbancroft
Copy link
Contributor

Hi @maranget

It looks like some of the logic introduced in PR #532 has resulted in us losing some cycles in the new default mode that were previously being generated in the old SC mode.

Example:
DMB.LDdRW Rfe DpAddrdW Rfi DpAddrsW Rfe

If I enable the debug logic you added in your patch, I can see that Rfi DpAddrsW is the issue. Looking at the code, we allow Rfi Dp*d* but not Rfi Dp*s* - to me, both of these look somewhat useful?

With the following patch, I'm now able to generate the missing cycles.

diff --git a/gen/alt.ml b/gen/alt.ml
index 8dea46ef..8bb93aab 100644
--- a/gen/alt.ml
+++ b/gen/alt.ml
@@ -111,8 +111,8 @@ module Make(C:Builder.S)
 (*
   Now accept some internal with internal composition
  *)
-      | (Ws Int|Rf Int|Fr Int|Po (Diff,_,_)),Dp (_,Diff,_)
-      | (Dp (_,Diff,_)|Po (Diff,_,_)),(Ws Int|Rf Int|Fr Int)
+      | (Ws Int|Rf Int|Fr Int|Po (Diff,_,_)),Dp (_,_,_)
+      | (Dp (_,_,_)|Po (Diff,_,_)),(Ws Int|Rf Int|Fr Int)
       | Dp (_,Diff,_),Po (Diff,_,_)
       | Rf Int,Po (Same,_,_)
       | Po (Same,_,_),Rf Int

From a diff of my generation (~170k baseline cycles from before PR #532) with the above patch, I can see that we no longer miss any cycles and are seeing ~17k new cycles (vs 10k new cycles and 3k missing with just PR #532). The extra 4K cycles look to be as result of other new combinations of Rfi Dp*s* but also other internal combinations like Pod** Dp*s*.

Not sure the best way to approach this, but my personal opinion is the generator shouldn't be generating less cycles unless we can suitably justify it and we need to be extra careful when defining such rules - control of this might be better left to the user through the relaxations passed via -safe. Whitelisting the useful cases e.g. Rfi Dp*sR is certainly one option, but my worry is we might miss other useful cases if we are not careful.

@maranget
Copy link
Member

maranget commented Mar 1, 2023

Hi @benbancroft. I am sorry I have changed the patterns in a restrictive sense. I thought that the combination of of Dp..s.. candidates and of internal communication candidates was irrelevant, without asking for your opinion on this deletion.

If understand you correctly, you suggest applying the above patch? If so I am happy to make a PR of it.

@chacon01
Copy link
Author

chacon01 commented Mar 1, 2023

Hi @maranget, sorry that @benbancroft and I didn't notice that some cycles got lost on the original patch. I think Dp..sR candidates are interesting with internal communication, so we'd probably want to keep them.

I'm happy with having @benbancroft's patch applied.

@benbancroft
Copy link
Contributor

Hi @maranget
No worries at all. I actually missed this in the original generation we did when testing your PR, which just goes to show how reasoning about what may or may not be useful is not a simple problem and can be very subjective.

From testing my patch using a fairly exhaustive -safe list, I'm not seeing any other issues with missing cycles and we are now getting additional cycles which are providing us with coverage for the original issue which brought the po_safe problem to our attention, so I'd say we are onto a winning combination now. The other internal to internal composition logic looks good from what I can see, but will let you know if I detect any other regressions as we re-generate more of our cycles.

If it is easier for you, happy to submit a PR on my end - just let me know what works best. Many thanks again for working up the original PR.

@maranget
Copy link
Member

maranget commented Mar 8, 2023

Hi @chacon01 and @benbancroft, should we close the issue ? As PR #538 apparently fixes at least the lack of monotony.

@maranget
Copy link
Member

maranget commented Mar 31, 2023

Hi again @chacon01 and @benbancroft , PR #538 was somehow insufficient. More precisely the code was not implementing the documentation. As a consequence, less cycles involving Rfi were generated and an experiment described in the manual was no longer working as advertised (see issue #550). PR #552 (not merged yet) is a fix attempt.

As a result of PR #552, the number of cycles generated in default mode has increased significantly. Time permitting, I'd like to have your opinion on the new behaviour before merging.

@benbancroft
Copy link
Contributor

Hi @maranget

I've just kicked off our generation again using PR #552, so hopefully that will help establish what volume of increase we are talking about here. Hopefully this will finish by some point tomorrow.

@chacon01
Copy link
Author

Hi @maranget,

@benbancroft and I have looked at the generation results and while we're seeing some new cycles, others are now missing. The overall number of cycles hasn't changed by that much: of the ~846k we generated, there were ~58k new cycles and ~26k missing cycles.

One example of a missing cycle can be seen with

diy7 -arch AArch64 -nprocs 2 -eprocs -size 5 -stdout true -cycleonly true -safe "PodWRLA Fre" -relax DpAddrsR

Before #552 (as of 6e0f6cdc2d3fc7311d44e4968753eaa7e5aee1d4), we get

Generator produced 1 tests
SB000: PodWRLA FreAL PodWRLA DpAddrsRAP FrePL
Relaxations tested: {DpAddrsR}

With #552 (I merged fix-diy-default onto 6e0f6cdc2d3fc7311d44e4968753eaa7e5aee1d4), we now get no tests:

Generator produced 0 tests

I don't have an example where the "increasing cycle" property is violated with #552, but those examples are difficult to find in general.

@maranget
Copy link
Member

maranget commented Apr 14, 2023

Hi @chacon01 an and @benbancroft. I could reproduce the problem. However I have a question.

Before PR #532, which was introduced because of the non-monotonicity of diy7, your example was also rejected. More specifically, I checked this with diy7 at commit bd83443.

Namelely, this old version does not allow the composition po with different variable at endpoints followed by dependency with the same variable at endpoints (here PodWR followed by DpAddrsR) because this sequence is included in the (safe) edge PodWR. PR #532 still forbids the sequence, essentially because it is not allowed explicitly --- In the matching starting here dependency with same variable at endpoints does not appear. The composition "po diff followed by dependency same" is allowed later by commit 718ed52. Notice that the authorization results from a typo: (Ws Int|Rf Int|Fr Int|Po (Diff,_,_)),Dp (_,_,_) should probably have been written (Ws Int|Rf Int|Fr Int),(Po (Diff,_,_),Dp (_,_,_) as suggested by the next line, and by th apparent intend to allow the composition of local communication followed by "po diff f_or_ by dependency same". Finally commit 1cce4fa of PR #552 corrects the typo and the composition is rejected again.

FInally do we want to allow this composition of "po diff followed by dependency same"?. If the answer is yes I can allow it easily by extending PR #552. Otherwise can we merge the PR without further experiments?

@chacon01
Copy link
Author

Hi @maranget, thanks for the explanation. I don't think we need to allow this composition. All of the cycles missing from our generation appear to match this pattern, so I don't see any loss of coverage that would concern me. I'm happy to have #552 merged.

@maranget
Copy link
Member

Merged, thanks to all.

maranget added a commit that referenced this issue Oct 17, 2023
@chacon01 chacon01 closed this as completed Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants