Suggestion for Adjusting Difficulty Score to Use an Asymptote at 10 #697
Comments
I'll benchmark it.
This change will be useful for sorting. However, I don't think that it would improve the algorithm because there won't be any significant change in the value of D. I would expect a greater improvement if we simultaneously updated the formula that uses D for calculating S. (But I don't know what changes should be made to that formula.)
I interpreted his suggestion as "Add this to the main formula". I am shocked. I have tried so many changes to the D formula, so the fact that this helps even by 1% is surprising.
It's not much, but it's "free": no new parameters are needed.
OK. I will benchmark it on 20k collections tomorrow. If the result is stable, I will add it to FSRS-5.
One problem: when D = 10, 1 - D/10 = 0, so no rating can change D anymore.
Dang. I'll modify it and benchmark again |
def linear_damping_one_way(self, delta_d: Tensor, old_d: Tensor) -> Tensor:
    # Damp only increases in D; decreases pass through unchanged, so Easy
    # can still lower D even when it is close to 10.
    return torch.where(delta_d <= 0, delta_d, delta_d * (10 - old_d) / 9)

def next_d(self, state: Tensor, rating: Tensor) -> Tensor:
    delta_d = -self.w[6] * (rating - 3)
    new_d = state[:, 1] + self.linear_damping_one_way(delta_d, state[:, 1])
    new_d = self.mean_reversion(self.init_d(4), new_d)
    return new_d

I will benchmark this one.
Ok, looks good. Why are you dividing by 9, though?
Weird. The variant is worse than FSRS-5 in my initial test with the top 352 collections.
I'm benchmarking the two-way method. And it performs better.
So... should we implement it?
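For context, here is a minimal sketch of what the two-way variant presumably looks like, based on the one-way helper above and the description later in the thread; the function name is my own, and the actual implementation may differ:

```python
from torch import Tensor

def linear_damping_two_way(self, delta_d: Tensor, old_d: Tensor) -> Tensor:
    # Drop-in counterpart to linear_damping_one_way in the snippet above:
    # the (10 - old_d) / 9 factor is applied to both increases and decreases
    # in D, so any change shrinks toward zero as D approaches 10.
    return delta_d * (10 - old_d) / 9
```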
The two-way is the one where Easy (or any other grade) doesn't change difficulty if it's already at 10? No, let's not. Users will definitely complain.
Yeah. But it improves RMSE by 1% (relatively) across 8493 collections. It seems to say that ease hell is a myth, similar to #695.
I still think it would increase the number of complaints from users. @user1823 thoughts?
Not decreasing D when the user presses Easy seems to be wrong. But the RMSE is decreasing. There are two scenarios that could explain this.
In either case, I think that implementing this would be worth it. However, it would be great if we could think of some modification that prevents this problem while still producing a similar decrease in RMSE.
I'm afraid that "I pressed Easy and D is still at 100%" will outweigh any algorithmic benefits.
If @richard-lacasse has an idea to solve this problem, then it would be great. Otherwise, I think that we should implement this for the reason I mentioned here.
To those users, say "just trust the algorithm". Moreover, this issue is really not that important because, with the proposed change, having a card with D = 10 is difficult. Even currently, only 171 of my 15000 cards have D exactly equal to 10, even though 30% of my collection has D between 9 and 10.
It's an asymptote; the Difficulty should never reach 10 exactly, that's the point. You'd have to hit Again so many times that the change gets smaller than machine epsilon, and I'm not sure that's even possible.
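To make the asymptote argument concrete: with the linear damping update from the snippet above, $D' = D + \Delta D \cdot \frac{10 - D}{9}$, and ignoring mean reversion,

$$10 - D' = (10 - D) - \Delta D \cdot \frac{10 - D}{9} = (10 - D)\left(1 - \frac{\Delta D}{9}\right),$$

so each Again press only shrinks the gap to 10 by a constant factor, and $D$ approaches 10 geometrically without ever reaching it in exact arithmetic.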
Still, suppose that D = 9.9. In that case, difficulty will barely change if the user presses Easy.
But it will change a lot relative to the other cards in that space. If you want the changes to be visible to users, display Difficulty on a log scale or something. I already have 99% of my cards showing up in the 90%-100% range, so the graph is already pretty useless.
Maybe I'm just weird and this isn't a problem for other people. This is what my D distribution looks like. When I do a search, 2650/8888 cards fall into
One issue I can see: even though it will be fine for sorting purposes, D still affects S, so stability won't be affected as much by the Easy button once D gets close to 10. But if it's performing better in the simulations anyway... Also, it will take much longer to get near 9.9 because of the asymptote. Every change gets smaller the closer D is to 10, so it would take a lot of Again presses to even get near that.
Alright, fine. @L-M-Sherlock, it's up to you. Implement the two-way version if you want.
Are people ever hitting Easy on high-difficulty cards anyway? I know I never do. Those are the cards that have given me the most trouble.
I did an experiment to confirm that with my parameters and a rating sequence, comparing the current formula with the proposed one. Even though the current formula makes D equal to 10 just after pressing Again twice, the proposed formula takes D only to 9.8 even after 7 Again button presses.
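A rough sketch of how such an experiment can be reproduced; the weight values (W6, W7) and the mean-reversion target (D0_EASY) are placeholders chosen for illustration, so the exact numbers will differ from the ones quoted above:

```python
# Hypothetical standalone re-implementation of the proposed update rule.
W6 = 1.46          # placeholder for w[6], the difficulty step per grade
W7 = 0.005         # placeholder for w[7], the mean-reversion strength
D0_EASY = 5.0      # placeholder for init_d(4), the mean-reversion target

def next_d_damped(d: float, rating: int) -> float:
    delta_d = -W6 * (rating - 3)         # Again (1) gives a positive step of 2 * W6
    d = d + delta_d * (10 - d) / 9       # linear damping toward the asymptote at 10
    d = W7 * D0_EASY + (1 - W7) * d      # mean reversion toward D0(Easy)
    return min(max(d, 1.0), 10.0)        # clamp to the valid range

d = 5.0
for i in range(7):                       # seven consecutive Again presses
    d = next_d_damped(d, rating=1)
    print(f"after {i + 1} Again presses: D = {d:.4f}")
```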
Weighted by number of reviews
The new median parameters: [0.40255, 1.18385, 3.173, 15.69105, 7.1949, 0.5345, 1.4604, 0.0046, 1.54575, 0.1192, 1.01925, 1.9395, 0.11, 0.29605, 2.2698, 0.2315, 2.9898, 0.51655, 0.6621]
The old median parameters: [0.4043, 1.18225, 3.13915, 15.6764, 7.2285, 0.4904, 1.0644, 0.0239, 1.6253, 0.1423, 1.0983, 1.95, 0.1014, 0.2998, 2.2834, 0.2407, 2.967, 0.5045, 0.66445]
@richard-lacasse I have known other users with this issue, so it's not a problem specific to you. I like your idea, and I don't think practically anyone would reach a high D value, then press Easy and fret about how the DSR values changed.
It is worth noting that w[7] is reduced from 0.0239 to 0.0046, and w[9] is reduced from 0.1423 to 0.1192.
@richard-lacasse Nope, you are pretty normal. In the current update, my new decks clock in at pretty ungodly Difficulty values.
Can you run it to see how long it takes to reach 10 exactly? Mathematically that's impossible, but the gap will shrink below machine epsilon at some point and D will probably round up to 10.
Assuming that I wrote the code correctly, D didn't reach 9.995 even after 2000 consecutive Again presses.
Will it cause more complaints?
I don't think so. If D increases slowly on pressing Again, then it should also decrease slowly on pressing Good.
OK, I will implement it in FSRS-rs and benchmark it.
You need to implement it both in the Rust version and in the Python version, otherwise the comparison won't be fair.
I just haven't pushed the updated Python version to the repo. The benchmark result above is based on the Python version.
Bad news: the change makes FSRS-rs worse. I don't know why. I need to check the consistency between the Python version and the Rust version again. Update: OK, I found a serious problem and fixed it.
Does this mean that the benchmark will have FSRS-5 and FSRS-5.5?
It's not a small change, because some parameters have been changed significantly. Actually, I guess we need to do something like:
I agree. For example, comparing my parameters with the current main against my parameters with linear damping, there is a 25% increase in w6 and a 23% decrease in w12, while the other parameters have changed only slightly. RMSE with the old parameters evaluated using the old optimizer: 0.0140. This means that if the user doesn't optimize their parameters after updating, their RMSE will increase. (This is true for almost any change to the algorithm, though.)
Unfortunately, finding a mathematical relation between the old and new parameters will be very difficult in this case.
But the impact on metrics is small, around 1%. I think FSRS-5.1 makes more sense than FSRS-5.5 in this case.
Another problem: what if the initial difficulty is 10? In that case, the difficulty is immutable (if w[7] is 0). Fine, now we have a reason to set a non-zero lower limit for w[7].
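A minimal sketch of why a non-zero floor on w[7] removes this corner case, assuming mean reversion has the form $w_7 \cdot D_0(\text{Easy}) + (1 - w_7) \cdot D'$; the target value of 5.0 below is a placeholder:

```python
def mean_reversion(d_prime: float, d0_easy: float, w7: float) -> float:
    # Pull the post-review difficulty toward the initial difficulty for Easy.
    return w7 * d0_easy + (1 - w7) * d_prime

# With w7 = 0 and D = 10, the damping factor is 0 and mean reversion is a
# no-op, so D would stay at 10 forever.
print(mean_reversion(10.0, 5.0, 0.0))    # 10.0
# With even a small positive floor on w7, D is pulled below 10, so the
# damping factor becomes non-zero again on the next review.
print(mean_reversion(10.0, 5.0, 0.01))   # 9.95
```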
Just as discussed in the other topic about w7 (#695 (comment)), this change has shifted the optimal value of w7 further into the negative. This means the optimizer wants to push all cards toward D = 10, and with the clipped value of w7 being the mode, for most people all cards will tend to sit at max D all of the time.
Then I guess we shouldn't implement damping after all.
The problem is not with the damping itself, but that the optimizer prefers D = 10 with the current setup. Damping D means the optimizer accommodates by decreasing w7 even further, until it clips to whatever value you set.
I've experimented with dozens of changes to the D formula and nothing worked. At this point I'm just waiting for someone smarter than me to completely re-define D.
@Expertium I know you've tried R as a parameter in the Difficulty equation, but did you also take R out of the parameter list in Stability? D is a parameter in Stability, so if R is a parameter in both, that might explain why adding R to Difficulty didn't help anything.
R affecting the increase in stability is one of the most important principles of the DSR model. Without it, we can't model the spacing effect: since R depends on the interval length, removing it from the formula for S would mean that S doesn't depend on the interval length.
S' depends on S, which carries the interval-length information; and if R is in the formula for Difficulty, and D is in the formula for Stability, then R would still indirectly affect the increase in Stability. I'm spitballing here a little bit, but this is the first thing I would toy with.
I've noticed that my cards very quickly reach the maximum Difficulty score of 10. This might be somewhat due to my specific parameters, but I suspect others might experience a similar pattern, perhaps to a lesser extent.
When I mark a card as Again or Hard, it increases the Difficulty score significantly. However, when I mark a card as Good, it barely changes. While this is likely intentional, after getting a card wrong about half a dozen times, its Difficulty score becomes stuck at 10. Even if I subsequently get the card right 100 times, getting it wrong just once pushes it back to the maximum Difficulty.
The issue with this is that sorting cards by Difficulty Ascending becomes less useful because many cards are tied at $D = 10$. As a result, I lose the granularity that this sorting method could provide. I think this could be addressed by using an asymptote instead of a hard cap at 10, while keeping the original equation for Difficulty ($D$) the same.
Currently, when $D$ exceeds 10, I believe the algorithm uses something like:

$$D \leftarrow \min(D, 10)$$
This means any information beyond 10 is lost each time it's capped. Instead, we could use an asymptote with a damping factor. For example:
1. Linear Damping Factor
Let $\Delta D$ be the change in Difficulty computed by the current formula. Then update $D'$ using:

$$D' = D + \Delta D \cdot \left(1 - \frac{D}{10}\right)$$

In this equation, as $D$ approaches 10, the factor $1 - \frac{D}{10}$ decreases, reducing the increment $\Delta D$ and causing $D'$ to approach 10 asymptotically.
2. Exponential Damping Factor
Alternatively, you could parameterize the damping factor, similar to how it's done in the Stability equation:

$$D' = D + \Delta D \cdot \left(1 - e^{-w (10 - D)}\right)$$

Here, $w$ would be a new weight: a positive constant that controls the rate of damping. As $D$ approaches 10, the damping factor $1 - e^{-w (10 - D)}$ approaches zero, causing $D'$ to approach 10 asymptotically.
Implementing an asymptote at 10 could preserve the usefulness of sorting by Difficulty and prevent the loss of information due to capping. It might be worth exploring this adjustment in the next version of FSRS to see if it improves the algorithm.
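To make the two proposals concrete, here is a small illustrative comparison of the hard cap and the two damping variants described above; the increment of 2.0 and the weight $w$ are made-up values, and only the qualitative behaviour matters:

```python
import math

def hard_cap(d: float, delta_d: float) -> float:
    # Current behaviour (as described above): overshoot past 10 is discarded.
    return min(d + delta_d, 10.0)

def linear_damping(d: float, delta_d: float) -> float:
    # Variant 1: the increment shrinks linearly as D approaches 10.
    return d + delta_d * (1 - d / 10)

def exponential_damping(d: float, delta_d: float, w: float = 0.3) -> float:
    # Variant 2: the increment shrinks exponentially as D approaches 10
    # (w is the hypothetical new weight controlling the damping rate).
    return d + delta_d * (1 - math.exp(-w * (10 - d)))

d_cap = d_lin = d_exp = 6.0
for n in range(1, 7):                    # six consecutive "Again"-sized increments
    d_cap = hard_cap(d_cap, 2.0)
    d_lin = linear_damping(d_lin, 2.0)
    d_exp = exponential_damping(d_exp, 2.0)
    print(n, round(d_cap, 3), round(d_lin, 3), round(d_exp, 3))
```

The hard cap sticks at exactly 10 after two steps, while both damped variants keep distinct values below 10, which is what preserves the sorting granularity.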