Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix greater than TB scores in null move pruning. #4715

Closed
wants to merge 1 commit into from
Closed

Fix greater than TB scores in null move pruning. #4715

wants to merge 1 commit into from

Conversation

peregrineshahin
Copy link
Contributor

@peregrineshahin peregrineshahin commented Jul 30, 2023

This patch is a simplification and a fix to dealing with null moves scores that returns proven mates or TB scores by preventing 'null move pruning' if the nullvalue is in that range.

Current solution downgrades nullValues on the non-PV node but the value can be used in a transposed PV-node to the same position afterwards (Triangulation), the later is prone to propagate a wrong score (96.05) to root that will not be refuted unless we search further.

Score of (96.05) can be obtained be two methods,

  1. maxim static-eval returned on Pv update (mostly qSearch)
  2. this downgrade (clamp) in NMP

and theoretically can happen with or without TBs but the second scenario is more dangerous than the first.
This fixes the reproducible case in very common scenarios with TBs as shown in the debugging at discord.

Fixes: #4699

Passed STC:
https://tests.stockfishchess.org/tests/view/64c1eca8dc56e1650abba6f9
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 670048 W: 171132 L: 171600 D: 327316
Ptnml(0-2): 2134, 75687, 179820, 75279, 2104

Passed LTC:
https://tests.stockfishchess.org/tests/view/64c5e130dc56e1650abc0438
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 92868 W: 23642 L: 23499 D: 45727
Ptnml(0-2): 52, 9509, 27171, 9648, 54

Bench: 1573024

@snicolet
Copy link
Member

snicolet commented Jul 31, 2023

Note this fixes the issue on NMP only which could result in a zugzwang miss and qSearch does not probe TBs to which we currently trust the evaluation function (static eval) to be correct if the depth is <= 0 and qSearch returned a maximum static evaluation and the Pv is updated if PvNode or a nonPv node that ttScore gets used again for a transposed PvNode.

Sorry, I don't manage to parse the previous sentence :-)

Can you describe your scenario for the bug in master with steps like this:

step 1 :
step 2:
step 3:
...

thanks :-)

@peregrineshahin
Copy link
Contributor Author

peregrineshahin commented Jul 31, 2023

Scenario 1:

Step 1: if the depth is equal or lesser than 0 and we are on a pv-node we fall into a PV qSearch

Step 2: if the node is a new node that does not exist in TT we assign the eval to be the static evaluation

Step 3: if the evaluation function returns the maximum static evaluation and this score is > beta we save that node score in TT and return bestValue on that pv-node so the score gets updated from x.xx to 96.05 on rootNode.

Step 4: we have little time to refute the rootScore if the static evaluation turned out to evaluate the position wrongly.

Scenario 2:

Step 1: A non-PV qSearch is called on a position that does not exist in TT.

Step 2: We evaluate the position (before or after making a move, that's irrelevant) and the evaluation return the maximum static evaluation (96.05).

Step 3: We save that position in TT with a lower bound and break with that score.

Step 4: We got to that same node later by a transposition, and to our luck the search is not normal PV Search but rather a PV-qSearch and we extract the ttValue for that same position and break with that score which gets propagated to the root.

Step 5: we have no time to refute the root score if the static evaluation was wrong.

Scenario 3:

Step 1: Razoring in current master is done on RootNodes, PvNodes, and nonPv nodes leading to the same scenario with one step.

Scenario 4:

on moves that does not exist in TT, If the previous move is a Null Move we flip the static evaluation of the position which is prone to zugzwang and the same steps happens from above.

TLDR; the meaning of my statement is that we can still theoretically report a 96.05 and still be wrong about it and the only blame then would only be the static evaluation of such position and not something dangerous in search in the first 3 scenarios.

The 4th scenario I consider actually dangerous but evaluating instead of flipping the sign has been tried many times and loses Elo.

Conclusion:

Notice that there is a crucial difference between a PV Search and PV qSearch that makes these scenarios not possible in PvSearch: PV (normal )search has probing TB's for such positions but qSearch does not.

@peregrineshahin
Copy link
Contributor Author

peregrineshahin commented Jul 31, 2023

Now to actually explain what the issue this PR fixes in steps:

Step 1: We do null move search on non-PV nodes and an extra move to opponent could actually make the position winning for us instead of a draw (wrong zugzwang).

Step 2: The position turned out to really be a proven loss for opponent

Step 3: We flip the score obtained and prune with that score with a downgraded value (96.05) for us

Step 4: The position and score is saved in TT

Step 5: The wrong evaluation could be then used in Pv qSearch that updates the root score or Razoring

Step 6: We have very little time to refute the score.

@cj5716
Copy link
Contributor

cj5716 commented Aug 11, 2023

Is this ready for merge?

@vondele vondele added bench-change Changes the bench and removed no-bench-change labels Aug 11, 2023
@peregrineshahin
Copy link
Contributor Author

peregrineshahin commented Aug 11, 2023

For debugging reference see https://discord.com/channels/435943710472011776/813919248455827515/1139314949278154783

I edited the commit message removing the reference for other cases that this PR don't touch since in them it's non-reproducible for me.

Yes it is ready for merge.
Note that the simplification testing was functional on fishtest terms since mates and not just TB's are downgraded so the result data isn't corrupt.

@snicolet snicolet added the needs-analysis Needs further analysis label Aug 11, 2023
@peregrineshahin
Copy link
Contributor Author

drafted for now because master could find mate here after a minute of a pause with
(KBN vs K)

position fen 3k4/8/8/8/8/8/3BN3/3K4 w - - 0 1
go

info depth 43 seldepth 68 multipv 1 score mate 41 nodes 364713757

but the patch stuck in a GHI instead reporting a TB win without TB...
info depth 64 seldepth 80 multipv 1 score cp 20000 nodes 2081835203

@Disservin Disservin marked this pull request as draft August 14, 2023 08:23
@peregrineshahin
Copy link
Contributor Author

position fen 3k4/8/8/8/8/8/3BN3/3K4 w - - 0 1
setoption name hash value 128
go

Increasing the hash solved the strange GHI issue with 20000
I'm reopening since it is only an unlucky run that got hit with the issue #4413

info depth 39 seldepth 46 multipv 1 score cp 688 lowerbound nodes 37283641 nps 863907 hashfull 471 tbhits 0 time 43157 pv d1c2
info depth 35 currmove d1c2 currmovenumber 1
info depth 35 currmove d2g5 currmovenumber 2
info depth 35 currmove d2a5 currmovenumber 3
info depth 35 currmove e2g3 currmovenumber 4
info depth 35 currmove e2d4 currmovenumber 5
info depth 35 currmove d2f4 currmovenumber 6
info depth 35 currmove e2g1 currmovenumber 7
info depth 35 currmove e2c1 currmovenumber 8
info depth 35 currmove e2f4 currmovenumber 9
info depth 35 currmove e2c3 currmovenumber 10
info depth 35 currmove d2c3 currmovenumber 11
info depth 35 currmove d2e1 currmovenumber 12
info depth 35 currmove d2h6 currmovenumber 13
info depth 35 currmove d1c1 currmovenumber 14
info depth 35 currmove d2b4 currmovenumber 15
info depth 35 currmove d2e3 currmovenumber 16
info depth 35 currmove d1e1 currmovenumber 17
info depth 35 currmove d2c1 currmovenumber 18
info depth 39 seldepth 50 multipv 1 score mate 39 nodes 45681947 nps 895706 hashfull 500 tbhits 0 time 51001 pv d1c2 d8c7 c2c3 c7b6 c3c4 b6c6 e2d4 c6b7 c4c5 b7a7 c5c6 a7b8 d2g5 b8a7 d4e6 a7b8
g5e3 b8c8 e6c7 c8d8 c7d5 d8c8 e3f4 c8d8 f4g3 d8e8 g3h4 e8f8 d5f4 f8g8 c6d6 g8f8 d6d7 f8g7 d7e6
info depth 40 currmove d1c2 currmovenumber 1

@peregrineshahin peregrineshahin marked this pull request as ready for review August 17, 2023 12:46
This patch is a simplification and a fix to dealing with null moves scores that returns proven mates or TB scores by preventing 'null move pruning' if the nullvalue is in that range.

Current solution downgrades nullValues on the non-PV node but the value can be used in a transposed PV-node to the same position afterwards (Triangulation), the later is prone to propagate a wrong score (96.05) to root that will not be refuted unless we search further.

Score of (96.05) can be obtained be two methods,

maxim static-eval returned on Pv update (mostly qSearch)
this downgrade (clamp) in NMP
and theoretically can happen with or without TBs but the second scenario is more dangerous than the first.
This fixes the reproducible case in very common scenarios with TBs as shown in the debugging at discord.

Fixes: #4699

Passed STC:
https://tests.stockfishchess.org/tests/view/64c1eca8dc56e1650abba6f9
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 670048 W: 171132 L: 171600 D: 327316
Ptnml(0-2): 2134, 75687, 179820, 75279, 2104

Passed LTC:
https://tests.stockfishchess.org/tests/view/64c5e130dc56e1650abc0438
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 92868 W: 23642 L: 23499 D: 45727
Ptnml(0-2): 52, 9509, 27171, 9648, 54

Bench: 1603079
@peregrineshahin
Copy link
Contributor Author

It seems that after removing classical, I cannot reproduce the staticEval hitting TB range with bench (d50), leaving only the cases of NMP.
What kind of analysis is needed to accept this?

@peregrineshahin
Copy link
Contributor Author

peregrineshahin commented Sep 19, 2023

referring to this

     // Guarantee evaluation does not hit the tablebase range
+    Value oldV = v;
     v = std::clamp(v, VALUE_TB_LOSS_IN_MAX_PLY + 1, VALUE_TB_WIN_IN_MAX_PLY - 1);
+    dbg_hit_on(v != oldV);

@Disservin Disservin requested a review from vondele October 7, 2023 12:51
@vondele
Copy link
Member

vondele commented Oct 20, 2023

@peregrineshahin is there a testcase and a sequence of uci commands I can use to see before / after behavior?

@peregrineshahin
Copy link
Contributor Author

Yes. Here is the difference in behavior.

Before:

$ ./stockfish_3f7fb5ac1d58e1c90db063053e9f913b9df79994.exe
Stockfish dev-20231020-nogit by the Stockfish developers (see AUTHORS file)
position fen 3k4/8/8/8/8/8/3BN3/3K4 w - - 0 1
setoption name hash value 128
go
info string NNUE evaluation using nn-1ee1aba5ed4c.nnue
info depth 1 seldepth 1 multipv 1 score cp 165 nodes 21 nps 10500 hashfull 0 tbhits 0 time 2 pv e2g1
....
info depth 40 seldepth 42 multipv 1 score cp 477 lowerbound nodes 27306360 nps 999135 hashfull 356 tbhits 0 time 27330 pv e2g3
....
info depth 40 seldepth 45 multipv 1 score cp **9605** nodes 27336935 nps 998573 hashfull 356 tbhits 0 time 27376 pv e2g3 d8e8 d1e2 e8d
7 d2f4 d7c6 e2d3 c6d5 g3f5 d5c5 d3e4 c5c6 f4e3 c6d7 e4d5 d7c7 d5c5 c7d7 e3f4 d7e6 f5h4 e6d7 f4d6 d7e6 c5c6 e6f7 c6d7 f7g8 d6f4 g8f
8 d7d6 f8e8 f4g5 e8f7 g5e3 f7f6 e3f4 f6f7 h4f5 f7e8 f5e7 e8f7 f4e3 f7g7
....
....
....
info depth 41 seldepth 65 multipv 1 score mate 39 nodes 98640412 nps 1158866 hashfull 674 tbhits 0 time 85118 pv d1c2 d8e7 c2d3 e7
d7 d3e4 d7c6 e4e5 c6d7 e5d5 d7c8 d5d6 c8d8 d6c6 d8e7 e2g3 e7f7 c6d6 f7e8 d2g5 e8f7 g5h6 f7f6 h6e3 f6g7 d6e7 g7g6 e3d2 g6g7 d2g5 g7
g6 g5f6 g6h6 e7d6 h6h7 f6e7 h7g7 e7h4 g7f7 h4g5 f7g6 g5d8 g6f7
info depth 42 currmove d1c2 currmovenumber 1

After:

$ ./Stockfish_fixGlitchy3.exe
Stockfish dev-20231020-nogit by the Stockfish developers (see AUTHORS file)
position fen 3k4/8/8/8/8/8/3BN3/3K4 w - - 0 1
setoption name hash value 128
go
info depth 40 currmove e2g3 currmovenumber 1
info depth 40 seldepth 34 multipv 1 score cp 367 lowerbound nodes 25292144 nps 1011847 hashfull 340 tbhits 0 time 24996 pv e2g3
info depth 39 currmove e2g3 currmovenumber 1
info depth 40 seldepth 34 multipv 1 score cp 394 lowerbound nodes 25821134 nps 1014622 hashfull 344 tbhits 0 time 25449 pv e2g3
info depth 38 currmove e2g3 currmovenumber 1
info depth 40 seldepth 40 multipv 1 score cp 429 lowerbound nodes 26739115 nps 1016155 hashfull 353 tbhits 0 time 26314 pv e2g3
info depth 37 currmove e2g3 currmovenumber 1
info depth 40 seldepth 41 multipv 1 score mate 46 lowerbound nodes 28501147 nps 1039998 hashfull 359 tbhits 0 time 27405 pv e2g3
info depth 36 currmove e2g3 currmovenumber 1
info depth 36 currmove d2e3 currmovenumber 2
info depth 36 currmove d2a5 currmovenumber 3
info depth 36 currmove d2f4 currmovenumber 4
info depth 36 currmove d2g5 currmovenumber 5
info depth 36 currmove d2c3 currmovenumber 6
info depth 36 currmove d2e1 currmovenumber 7
info depth 36 currmove e2c3 currmovenumber 8
info depth 36 currmove d2b4 currmovenumber 9
info depth 36 currmove e2c1 currmovenumber 10
info depth 36 currmove e2d4 currmovenumber 11
info depth 36 currmove e2g1 currmovenumber 12
info depth 36 currmove d1e1 currmovenumber 13
info depth 36 currmove d1c2 currmovenumber 14
info depth 36 currmove d2c1 currmovenumber 15
info depth 36 currmove d1c1 currmovenumber 16
info depth 36 currmove e2f4 currmovenumber 17
info depth 36 currmove d2h6 currmovenumber 18
info depth 40 seldepth 43 multipv 1 score mate 45 nodes 29214375 nps 1046135 hashfull 364 tbhits 0 time 27926 pv e2g3 d8e8 d1e2 e8d7 d2f4 d7c6 e2d3 c6d5 g3f5 d5c5 d3e4 c5c6 f4e3 c6c7 e4d5 c7b7 d5d
6 b7a6 d6c6 a6a5 c6c5 a5a6 f5d6 a6a5 e3d4 a5a6 c5c6 a6a5 d4c3 a5a4 c6c5 a4b3 c5d4 b3a4 d4c4 a4a3 d6f5 a3a4 f5g3
info depth 41 currmove e2g3 currmovenumber 1
....
....

@peregrineshahin
Copy link
Contributor Author

peregrineshahin commented Oct 20, 2023

Worth noting that a 9605 score is not a correct downgrade to a null search mate score unless the same position can be reached via Triangulation.
With the commands above 'before' has that score as an intermidiate (that can be incorrect) and with the patch it goes directly from 429 to mate scores.

@vondele vondele added the to be merged Will be merged shortly label Oct 21, 2023
@vondele vondele closed this in a4fedd8 Oct 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Glitchy eval
5 participants