Improving rank feedback to user #73
Comments
What code are you using for these histograms? I built a little NN to predict rank and would be curious how it compares. It immediately found one cheater who was rated OGS 5d but played at around 8k and was timing out all his losses. |
Pretty cool! On the other hand, a neural network is always a blackbox. It can be really useful as a universal function approximator for complex functions, where symbolic/analytic functions are not available. But I think that is not the case here. |
It was mostly to see what the difference is / how well it performs. It is only a 2 layer, 15->20->10->1 NN using the histogram features as inputs, as I'm trying to avoid overfitting. |
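For readers who want to picture the 15->20->10->1 shape mentioned above, here is a minimal pure-Python sketch of a forward pass through such a network. The ReLU hidden activations, the linear output, the weight initialisation and the names `init_params` and `forward` are all assumptions for illustration; the comment does not say how the actual network was built or trained.

```python
import math
import random

# 15 histogram features -> 20 hidden -> 10 hidden -> 1 predicted rank
LAYER_SIZES = [15, 20, 10, 1]


def init_params(sizes, seed=0):
    """Random weights and zero biases for each pair of consecutive layers."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params


def forward(params, features):
    """Run one feature vector through the network and return a scalar."""
    activations = list(features)
    for layer_index, (weights, biases) in enumerate(params):
        outputs = [
            sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(weights, biases)
        ]
        is_last = layer_index == len(params) - 1
        # Assumed: ReLU on hidden layers, linear output for regression.
        activations = outputs if is_last else [max(0.0, o) for o in outputs]
    return activations[0]
```

The parameters here are untrained, so the output is meaningless until the weights are fitted to games with known ranks; the sketch only shows how small such a model is.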
It is interesting that the NN prediction deviates from y=x similarly to the move rank based kyu estimation. |
Let's try to get your estimate into v1.3 - any ideas on the user interface? just putting it in the 'info' box will be easiest, but maybe a bit hidden? |
Maybe a spline connected curve next to the score and win rate (under the timer). And under the plot, next to Score/Win Rate/Point Loss there could be the overall kyu rank of the game from move 1 to the current move. |
frankly that looks like we could do with linear interpolation (=just save the points and let the graphics primitives deal with it -- the score graph is definitely not matplotlib based) |
I don't think spline is crucial for the plot either. |
What about an option to be able to show dots according to their rank estimates rather than point losses? |
@Dontbtme A single move does not have a rank estimate -- it's a statistical estimate that requires many moves to be even close. |
The problem is that kyu estimation is not a single move statistic. |
Gotcha. That's a shame :p |
I'll give you one last idea before I stop wasting your time and then I'll call it a day :p |
The idea here is that average move rank is a more robust statistic than the average score loss. The rank estimation "simply" inverts the method used in the p-pick bot (after removing outliers). |
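As a rough illustration of "average move rank after removing outliers", the sketch below computes a one-sided trimmed mean. It assumes that a move rank is the position of the played move in the policy ordering (0 for the network's top choice), that outliers are the largest values, and that 10% is a sensible trim; none of this is confirmed by the thread. The step that maps the resulting average to a kyu value by inverting the p-pick bot is not shown, since its details are not given here.

```python
def trimmed_mean_move_rank(move_ranks, trim_fraction=0.1):
    """Average the move ranks after dropping the largest values as outliers.

    move_ranks: per-move position of the played move in the policy
        ordering (hypothetical input format, 0 = top policy move).
    trim_fraction: share of the worst-ranked moves to discard (assumed).
    """
    if not move_ranks:
        raise ValueError("need at least one move")
    ordered = sorted(move_ranks)
    # Always keep at least one move, even for extreme trim fractions.
    n_drop = min(int(len(ordered) * trim_fraction), len(ordered) - 1)
    kept = ordered[: len(ordered) - n_drop]
    return sum(kept) / len(kept)
```

A single blunder ranked 150th would dominate a plain mean of ten moves, while the trimmed mean ignores it, which is the sense in which the statistic is more robust.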
There is no particular evidence for this though, you can probably do well with score loss as well, or both. What is definitely true though is that the 15b single visit scoreLoss is very noisy/biased in endgame. |
I am pretty sure that using score loss alone or in combination with move rank can be used to create a human-like player that is even closer to the human style than the current bot.
This is exactly what I meant by more robust. It works even at policy level throughout the game. |
ROFL |
I cloned the v1.3 branch. I really like how you solved showing the rank estimate. |
I found a game that shows the issue nicely. The first segment is only 30 moves long (15 each), resulting in a poor estimate for that part of the game. I fixed it by not plotting estimates that are made from fewer than 75% of the full segment length. Using the 20b model, the estimation of strong bots is even more accurate. Please note that the calibration of the calibrated rank bot does not apply to 20b, but the trends are the same. |
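The 75% rule can be sketched as follows. The segment length of 80 moves, the function name `segment_estimates` and the plain-mean placeholder estimator are assumptions; the thread only states the 75% threshold. In this sketch the short segment is the trailing one, whereas which segment ends up short in practice depends on how the game is split.

```python
def segment_estimates(move_ranks, segment_length=80, min_fraction=0.75,
                      estimator=None):
    """Return (start, end, estimate) per segment.

    The estimate is None when a segment holds fewer than
    min_fraction * segment_length moves, so the caller can skip plotting it.
    """
    if estimator is None:
        # Placeholder statistic; a real rank estimator would go here.
        estimator = lambda ranks: sum(ranks) / len(ranks)
    results = []
    for start in range(0, len(move_ranks), segment_length):
        chunk = move_ranks[start:start + segment_length]
        end = start + len(chunk)
        if len(chunk) < min_fraction * segment_length:
            results.append((start, end, None))  # too few moves to trust
        else:
            results.append((start, end, estimator(chunk)))
    return results
```

With 190 moves and segments of 80, the last segment has only 30 moves (below the 60 required) and is returned with `None` instead of an estimate.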
yeah I was a bit too aggressive in wanting the line to look nice across the whole length. maybe we can fake it by extrapolating the first point backward ;) |
What do you mean by taking it all the way? |
Well, a coarse histogram with a few bins may be better; it's worth considering |
The user data based AI (#74 (comment)) can be used for estimating user ranks more accurately. |
The rank estimation of segments became much more accurate with the new user data based AI. I think it is pretty convincing, especially comparing it to previous estimations of user kyu ranks. |
It looks pretty good, and there are some impressive outputs when I try it on the OGS games. Incredible noise as well though, two games from the same player. Move quality for moves 1 to 178 B: 10.4k W: 4.1k |
I think a way to further decrease the noise could be by running the policy analysis a few times (3-5). The rank estimation function would be fed by the average of the reported move ranks. |
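A minimal sketch of that averaging step, assuming each run yields one move rank per move in the same order (the name `average_move_ranks` and the list-of-lists input are hypothetical):

```python
def average_move_ranks(runs):
    """Element-wise mean of move ranks over several analysis runs of one game.

    runs: list of runs, each a list with one move rank per move.
    """
    if not runs:
        raise ValueError("need at least one run")
    n_moves = len(runs[0])
    if any(len(run) != n_moves for run in runs):
        raise ValueError("all runs must cover the same moves")
    return [sum(values) / len(runs) for values in zip(*runs)]
```

The averaged list would then be passed to the rank estimation function in place of a single run's move ranks. Averaging only helps if the runs actually differ from one another.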
The policy is deterministic! |
That is interesting.
2nd run:
3rd run:
|
Aha, that may be because of the random rotations it does! Other than rotations it's deterministic, and there is no real way to force them currently. |
I see. Probably using the stronger 20b model will help in this respect. |
Yes, it's more consistent in that respect |
Yes, it is weird to see so many yellow and red dots in games like that. KataGo, on the other hand, plays to maximize the score, making its style more similar to humans. Only in the last game did AlphaGo dip below 5d (except for the 4th game, where it lost). |
I was trying to estimate the effect of the strength parameter of score loss and found even a fairly low number to beat the highest calibrated rank. Even though on OGS I found strength=0.5 to be around 5k maybe -- something is weird! |
Let's give it a try. I'm a bit concerned about whether this eventually goes to obvious_n or something. Also, whether this is appropriate for the lower-strength AIs, as I'm sure this case happens a lot in joseki, and it won't even play the second-best move! |
Strange. I ran 6 games, 5d calibrated rank won all of them. |
I ran two more and calibrated rank won, perhaps it was a fluke |
It seems that the bots became too strong (OGS ranks). |
Yes, particularly the weaker bots are getting a lot stronger. the effect on the higher ranks seems less, perhaps because policy alone becomes a less good strategy. updated ogs bots, let's see them plummet |
Give the user feedback on their game, such as 'your opening/middle game/endgame was around 8k'.