
Suggestion for Adjusting Difficulty Score to Use an Asymptote at 10 #697

Closed
richard-lacasse opened this issue Oct 16, 2024 · 50 comments · Fixed by open-spaced-repetition/fsrs-optimizer#143 or open-spaced-repetition/fsrs-rs#239
Labels: enhancement (New feature or request)

Comments

@richard-lacasse

I've noticed that with my cards, they very quickly reach the maximum Difficulty score of 10. This might be somewhat due to my specific parameters, but I suspect others might experience a similar pattern, perhaps to a lesser extent.

When I mark a card as Again or Hard, it increases the Difficulty score significantly. However, when I mark a card as Good, it barely changes. While this is likely intentional, after getting a card wrong about half a dozen times, its Difficulty score becomes stuck at 10. Even if I subsequently get the card right 100 times, getting it wrong just once pushes it back to the maximum Difficulty.

The issue with this is that sorting cards by Difficulty Ascending becomes less useful because many cards are tied at $D = 10$. As a result, I lose the granularity that this sorting method could provide. I think this could be addressed by using an asymptote instead of a hard cap at 10, while keeping the original equation for Difficulty ($D$) the same.

Currently, when $D'$ would exceed 10, I believe the algorithm applies something like:

$$D' = \min(D', 10)$$

This means any information beyond 10 is lost each time it's capped. Instead, we could use an asymptote with a damping factor. For example:

1. Linear Damping Factor

Let:

$$\Delta D = D' - D$$

Then update $D'$ using:

$$D' = D + \Delta D \cdot \left(1 - \frac{D}{10}\right)$$

In this equation, as $D$ approaches 10, the factor $1 - \frac{D}{10}$ approaches 0, shrinking the applied increment and causing $D'$ to approach 10 asymptotically.

2. Exponential Damping Factor

Alternatively, you could parameterize the damping factor similar to how it's done in the Stability equation:

$$D' = D + \Delta D \cdot \left(1 - e^{-w (10 - D)}\right)$$

Here, $w$ would be a new weight, a positive constant that controls the rate of damping. As $D$ approaches 10, the factor $1 - e^{-w (10 - D)}$ approaches 0, causing $D'$ to approach 10 asymptotically.
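
As a quick sanity check (plain Python, not FSRS code; the raw increment of 2.0 and $w = 0.2$ are made-up values), here is how the hard cap compares with the two damped updates over a few consecutive Again presses:

    import math

    def capped(d, delta_d):
        return min(d + delta_d, 10)

    def linear_damped(d, delta_d):
        return d + delta_d * (1 - d / 10)

    def exp_damped(d, delta_d, w=0.2):
        return d + delta_d * (1 - math.exp(-w * (10 - d)))

    d_cap = d_lin = d_exp = 8.0
    for _ in range(5):  # five consecutive "Again" reviews with a made-up raw increment of 2.0
        d_cap = capped(d_cap, 2.0)
        d_lin = linear_damped(d_lin, 2.0)
        d_exp = exp_damped(d_exp, 2.0)

    print(d_cap, round(d_lin, 3), round(d_exp, 3))  # the capped value hits 10; the damped values stay below it

The capped value saturates at exactly 10 after the first press, while both damped values keep approaching 10 without ever reaching it, so the ordering between cards is preserved.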

Implementing an asymptote at 10 could preserve the usefulness of sorting by Difficulty and prevent the loss of information due to capping. It might be worth exploring this adjustment in the next version of FSRS to see if it improves the algorithm.

@richard-lacasse added the enhancement label on Oct 16, 2024
@Expertium
Collaborator

Expertium commented Oct 16, 2024

I'll benchmark it

@user1823
Collaborator

This change will be useful for sorting.

However, I don't think that it would improve the algorithm because there won't be any significant change in the value of 11 - D.

I would expect a greater improvement if we simultaneously updated the formula that uses D to calculate S. (But I don't know what changes should be made to that formula.)

@Expertium
Collaborator

Expertium commented Oct 17, 2024

I interpreted his suggestion as "Add this to the main formula".
And surprisingly, it (kind of) worked!
[image]

I am shocked. I have tried so many changes to the D formula, so the fact that this helps even by 1% is surprising.
@L-M-Sherlock we can improve RMSE by 1% (relatively) by doing this:

    def next_d(self, state: Tensor, rating: Tensor) -> Tensor:
        # state[:, 1] is the current difficulty D
        delta_d = -self.w[6] * (rating - 3)
        # linear damping: the increment shrinks as D approaches 10
        new_d = state[:, 1] + delta_d * (1 - (state[:, 1] / 10))
        new_d = self.mean_reversion(self.init_d(4), new_d)
        return new_d

It's not much, but it's "free" - no new parameters are needed

@L-M-Sherlock
Member

OK. I will benchmark it in 20k collections tomorrow. If the result is stable, I will add it to FSRS-5.

@L-M-Sherlock
Member

One problem: when D = 10, 1 - D/10 = 0, so rating Easy will not decrease D.

@Expertium
Collaborator

Dang. I'll modify it and benchmark again

@L-M-Sherlock
Member

    def linear_damping_one_way(self, delta_d: Tensor, old_d: Tensor) -> Tensor:
        # damp only increases in difficulty; zero or negative changes pass through unchanged
        return torch.where(delta_d <= 0, delta_d, delta_d * (10 - old_d) / 9)

    def next_d(self, state: Tensor, rating: Tensor) -> Tensor:
        delta_d = -self.w[6] * (rating - 3)
        new_d = state[:, 1] + self.linear_damping_one_way(delta_d, state[:, 1])
        new_d = self.mean_reversion(self.init_d(4), new_d)
        return new_d

I will benchmark this one.

@Expertium
Collaborator

Expertium commented Oct 18, 2024

OK, looks good. Why are you dividing by 9, though?
Oh, OK, I get it. Since D is at least 1, 10 - D is at most 9, so dividing by 9 makes the damping factor (10 - D)/9 equal to 1 at D = 1 and 0 at D = 10.

@L-M-Sherlock
Member

Weird. The variant is worse than FSRS-5 in my initial test with the top 352 collections:

Model: FSRS-5-dev
Total number of users: 352
Total number of reviews: 10744330
Weighted average by reviews:
FSRS-5-dev-1 LogLoss (mean±std): 0.2939±0.1591
FSRS-5-dev-1 RMSE(bins) (mean±std): 0.0495±0.0335
FSRS-5-dev-1 AUC (mean±std): 0.7001±0.0771

Weighted average by log(reviews):
FSRS-5-dev LogLoss (mean±std): 0.3348±0.1636
FSRS-5-dev RMSE(bins) (mean±std): 0.0681±0.0416
FSRS-5-dev AUC (mean±std): 0.6950±0.0892

Weighted average by users:
FSRS-5-dev LogLoss (mean±std): 0.3380±0.1646
FSRS-5-dev RMSE(bins) (mean±std): 0.0706±0.0426
FSRS-5-dev AUC (mean±std): 0.6942±0.0914

Model: FSRS-5
Total number of users: 352
Total number of reviews: 10744330
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.2938±0.1587
FSRS-5 RMSE(bins) (mean±std): 0.0493±0.0332
FSRS-5 AUC (mean±std): 0.6989±0.0813

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3344±0.1629
FSRS-5 RMSE(bins) (mean±std): 0.0677±0.0411
FSRS-5 AUC (mean±std): 0.6949±0.0905

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3375±0.1639
FSRS-5 RMSE(bins) (mean±std): 0.0701±0.0421
FSRS-5 AUC (mean±std): 0.6941±0.0927

@L-M-Sherlock
Member

I'm benchmarking the two-way method. And it performs better:

Model: FSRS-5-dev
Total number of users: 118
Total number of reviews: 3151255
Weighted average by reviews:
FSRS-5-dev LogLoss (mean±std): 0.3465±0.1428
FSRS-5-dev RMSE(bins) (mean±std): 0.0557±0.0322
FSRS-5-dev AUC (mean±std): 0.6978±0.0639

Weighted average by log(reviews):
FSRS-5-dev LogLoss (mean±std): 0.3624±0.1551
FSRS-5-dev RMSE(bins) (mean±std): 0.0726±0.0395
FSRS-5-dev AUC (mean±std): 0.6995±0.0924

Weighted average by users:
FSRS-5-dev LogLoss (mean±std): 0.3631±0.1570
FSRS-5-dev RMSE(bins) (mean±std): 0.0747±0.0398
FSRS-5-dev AUC (mean±std): 0.6987±0.0955

Model: FSRS-5
Total number of users: 118
Total number of reviews: 3151255
Weighted average by reviews:
FSRS-5 LogLoss (mean±std): 0.3474±0.1433
FSRS-5 RMSE(bins) (mean±std): 0.0563±0.0322
FSRS-5 AUC (mean±std): 0.6952±0.0648

Weighted average by log(reviews):
FSRS-5 LogLoss (mean±std): 0.3629±0.1555
FSRS-5 RMSE(bins) (mean±std): 0.0728±0.0394
FSRS-5 AUC (mean±std): 0.6977±0.0942

Weighted average by users:
FSRS-5 LogLoss (mean±std): 0.3637±0.1573
FSRS-5 RMSE(bins) (mean±std): 0.0749±0.0397
FSRS-5 AUC (mean±std): 0.6969±0.0975

So... should we implement it?

@Expertium
Collaborator

Expertium commented Oct 18, 2024

The two-way is the one where Easy (or any other grade) doesn't change difficulty if it's already at 10? No, let's not. Users will definitely complain.

@L-M-Sherlock
Member

Yeah. But it improves RMSE by 1% (relatively) across 8493 collections. It seems to suggest that ease hell is a myth, similar to #695.

@Expertium
Collaborator

I still think it would increase the number of complaints from users. @user1823 thoughts?

@user1823
Collaborator

Not decreasing D when the user presses Easy seems wrong. But the RMSE is decreasing. There are two scenarios that could explain this:

  • The better prediction of R at other reviews is offsetting the slight inaccuracy introduced by not decreasing the D.
  • The D shouldn't be decreased because the user is overestimating their learning.

In either case, I think that implementing this would be worth it.

However, it will be great if we can think of some modification that prevents this problem while still producing a similar decrease in RMSE.

@Expertium
Collaborator

I'm afraid that "I pressed Easy and D is still at 100%" will outweigh any algorithmic benefits

@user1823
Collaborator

user1823 commented Oct 18, 2024

If @richard-lacasse has an idea to solve this problem, then it would be great.

Otherwise, I think that we should implement this for the reason I mentioned here.

I'm afraid that "I pressed Easy and D is still at 100%" will outweigh any algorithmic benefits

To those users, say "just trust the algorithm".

Moreover, this issue is really not that important because with the proposed change, having a card with D = 10 is difficult.

Even currently, I have only 171 cards out of 15000 cards that have D exactly equal to 10, even though 30% of my collection has D between 9 and 10.

@richard-lacasse
Author

It's an asymptote; the Difficulty should never reach exactly 10, and that's the point. You'd have to hit Again so many times that the gap from 10 becomes smaller than machine epsilon, and I'm not sure that's even possible.

@Expertium
Collaborator

Still, suppose that D=9.9. In that case difficulty will barely change if the user presses Easy.

@richard-lacasse
Author

But it will change a lot relative to the other cards in that range. If you want the changes to be visible to users, display Difficulty on a log scale or something. I already have 99% of my cards showing up in the 90%-100% range, so the graph is already pretty useless.

@richard-lacasse
Author

Moreover, this issue is really not that important because with the proposed change, having a card with D = 10 is difficult.

Even currently, I have only 171 cards out of 15000 cards that have D exactly equal to 10, even though 30% of my collection has D between 9 and 10.

Maybe I'm just weird and this isn't a problem for other people.

[Screen Shot 2024-10-18 at 8 17 23 AM]

This is what my D distribution looks like. When I do a search, 2650/8888 cards fall into prop:d=1, which is maxed out.

@richard-lacasse
Author

Still, suppose that D=9.9. In that case difficulty will barely change if the user presses Easy.

One issue I can see is that even though it will be fine for sorting purposes, D still affects S, so Stability won't be affected as much by the Easy button once D gets close to 10. But if it's performing better in the simulations anyway...

But, it will take much longer to get near 9.9 because of the asymptote. Every change that's closer to 10 is smaller, so there will have to be a lot of Again button presses to even get near that.

@Expertium
Collaborator

Alright, fine. @L-M-Sherlock it's up to you. Implement the two-way version if you want

@richard-lacasse
Author

Are people ever hitting Easy on high difficulty cards anyway? I know I never do. Those are the cards that have given me the most trouble.

@user1823
Collaborator

But, it will take much longer to get near 9.9 because of the asymptote. Every change that's closer to 10 is smaller, so there will have to be a lot of Again button presses to even get near that.

I did an experiment to confirm that.

With my parameters and rating_sequence 3, 1, 1, 1, 1, 1, 1, 1, I got the following successive values of D.

Current formula:
5.3624, 9.0620, 10, 10, 10, 10, 10, 10

Proposed formula:
5.3624, 7.0779, 8.1582, 8.8385, 9.2669, 9.5366, 9.7065, 9.8134

So, even though the current formula takes D to 10 after just two Again presses, the proposed formula takes D only to about 9.8 even after 7 Again presses.
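
If anyone wants to play with sequences like this, here is a rough sketch of the iteration in plain Python. The parameter values (w6, w7, D0(4), and the starting D) are placeholders, not my actual parameters, and the mean reversion is written out as w7 * D0(4) + (1 - w7) * D:

    def next_d(d, rating, w6=1.0, w7=0.02, d0_easy=2.0):
        delta_d = -w6 * (rating - 3)
        # one-way linear damping: only increases in D are damped
        damped = delta_d * (10 - d) / 9 if delta_d > 0 else delta_d
        d = d + damped
        return w7 * d0_easy + (1 - w7) * d  # mean reversion toward D0(4)

    d = next_d(5.0, rating=3)      # first review rated Good, from a placeholder initial D
    for _ in range(7):
        d = next_d(d, rating=1)    # seven consecutive Again presses
        print(round(d, 4))

Increasing the loop count shows the same behaviour: D keeps creeping toward 10 but never reaches it.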

@L-M-Sherlock
Member

Weighted by number of reviews

| Model | Parameters | LogLoss | RMSE (bins) | AUC |
| --- | --- | --- | --- | --- |
| FSRS-5-Linear Dampening | 19 | 0.319±0.0053 | 0.049±0.0010 | 0.702±0.0028 |
| FSRS-5 | 19 | 0.320±0.0052 | 0.050±0.0010 | 0.700±0.0028 |

The new median parameters: [0.40255, 1.18385, 3.173, 15.69105, 7.1949, 0.5345, 1.4604, 0.0046, 1.54575, 0.1192, 1.01925, 1.9395, 0.11, 0.29605, 2.2698, 0.2315, 2.9898, 0.51655, 0.6621]

The old median parameters: [0.4043, 1.18225, 3.13915, 15.6764, 7.2285, 0.4904, 1.0644, 0.0239, 1.6253, 0.1423, 1.0983, 1.95, 0.1014, 0.2998, 2.2834, 0.2407, 2.967, 0.5045, 0.66445]

@brishtibheja

Maybe I'm just weird and this isn't a problem for other people.

@richard-lacasse I know other users with this issue, so it's not a problem specific to you. I like your idea, and I don't think practically anyone would reach a high D value, press Easy, and then fret about how the DSR values changed.

@L-M-Sherlock
Member

It is worth noting that w[7] is reduced from 0.0239 to 0.0046. And w[9] is reduced from 0.1423 to 0.1192.

@DerIshmaelite

DerIshmaelite commented Oct 19, 2024

Moreover, this issue is really not that important because with the proposed change, having a card with D = 10 is difficult.
Even currently, I have only 171 cards out of 15000 cards that have D exactly equal to 10, even though 30% of my collection has D between 9 and 10.

Maybe I'm just weird and this isn't a problem for other people.

[Screen Shot 2024-10-18 at 8 17 23 AM]

This is what my D distribution looks like. When I do a search, 2650/8888 cards fall into prop:d=1, which is maxed out.

@richard-lacasse Nope, you are pretty normal. With the current version, my new decks end up at pretty ungodly Difficulty values.
[image]

@richard-lacasse
Author

the proposed formula takes D to 9.8 even after 7 Again button presses.

Can you run it to see how long it takes to reach exactly 10? Mathematically that's impossible, but the gap will drop below machine epsilon at some point and probably round up to 10.

@user1823
Collaborator

Assuming that I wrote the code correctly, D didn't reach 9.995 even after 2000 consecutive Again presses.

@L-M-Sherlock
Member

It is worth noting that w[7] is reduced from 0.0239 to 0.0046.

Will it cause more complaints?

@user1823
Collaborator

It is worth noting that w[7] is reduced from 0.0239 to 0.0046.

Will it cause more complaints?

I don't think so. If D increases slowly on pressing Again, then it should also decrease slowly on pressing Good.

@L-M-Sherlock
Member

OK, I will implement it in FSRS-rs and benchmark it.

@Expertium
Collaborator

You need to implement it both in the Rust version and in the Python version, otherwise the comparison won't be fair.

@L-M-Sherlock
Member

You need to implement it both in the Rust version and in the Python version, otherwise the comparison won't be fair.

I just haven't pushed the updated Python version to the repo. The benchmark result above is based on the Python version.

@L-M-Sherlock
Member

L-M-Sherlock commented Oct 20, 2024

[image]

With the Linear Dampening, I have more cards whose difficulty is in the 10th bin...

My previous parameters: [1.0962, 1.543, 7.8692, 12.0038, 8.1849, 0.5031, 0.6852, 0.001, 1.3281, 0.0796, 0.8666, 2.5417, 0.0128, 0.2952, 0.7547, 0.0001, 3.2912, 0.1097, 0.6747]

My current parameters: [1.114, 1.601, 8.0182, 12.2391, 8.0261, 0.5454, 3.0807, 0.0026, 1.255, 0.0208, 0.7996, 2.5398, 0.0122, 0.3219, 0.3941, 0.0001, 3.6492, 0.102, 0.6778]

The w[6] increases significantly from 0.68 to 3.08.

[image]

The difficulty increases quickly, too.

However, it improves the metric:

[image]

By the way, the model version will be updated to FSRS-5.5 if we introduce this change because I have released FSRS-5 in some packages.

@L-M-Sherlock
Member

L-M-Sherlock commented Oct 20, 2024

Bad news: the change makes FSRS-rs worse. I don't know why. I need to check the consistency between the Python version and the Rust version again.

Update: OK, I found a serious problem and fixed it:

@Expertium
Collaborator

Expertium commented Oct 20, 2024

By the way, the model version will be updated to FSRS-5.5 if we introduce this change because I have released FSRS-5 in some packages.

Does this mean that the benchmark will have FSRS-5 and FSRS-5.5?
Also, maybe call it FSRS-5.1 and not FSRS-5.5? It's a pretty small change, after all.

@L-M-Sherlock
Member

Also, maybe call it FSRS-5.1 and not FSRS-5.5? It's a pretty small change, after all.

It's not a small change because some parameters have been changed significantly. Actually, I guess we need to do something like:

@user1823
Collaborator

some parameters have been changed significantly.

I agree. For example,

My parameters with current main:
1.0588, 4.7011, 31.5471, 79.2369, 7.561, 0.9454, 3.0717, 0.0, 1.8531, 0.3046, 1.3861, 1.6149, 0.0357, 0.4327, 1.8443, 0.0003, 6.0, 0.9915, 0.6734

My parameters with linear damping:
1.0652, 4.7012, 30.4195, 79.3164, 7.5425, 1.0393, 3.8364, 0.0, 1.8143, 0.292, 1.3124, 1.5111, 0.0276, 0.4369, 1.7987, 0.0003, 6.0, 1.013, 0.6778

So, there is a 25% increase in w6 and a 23% decrease in w12. The other parameters have changed slightly.

RMSE with old parameters evaluated using old optimizer: 0.0140
RMSE with new parameters evaluated using new optimizer: 0.0138
RMSE with old parameters evaluated using new optimizer: 0.0156

This means that if the user doesn't optimize their parameters after updating, their RMSE will increase. (This is true for almost any change to the algorithm, though.)

Actually, I guess we need to do something like:

Unfortunately, finding a mathematical relation between the old and new parameters will be very difficult in this case.

@Expertium
Collaborator

It's not a small change because some parameters have been changed significantly.

But the impact on metrics is small, around 1%. I think FSRS-5.1 makes more sense than FSRS-5.5 in this case

@L-M-Sherlock
Member

L-M-Sherlock commented Oct 21, 2024

Another problem: what if the initial difficulty is 10? In this case, the difficulty is immutable (if w[7] is 0).

Fine. Now we have a reason to set a non-zero lower limit for w[7].
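
Something along these lines in the parameter clamping step, with the exact bound still to be decided (this is just a sketch, not the actual optimizer code):

    # sketch only: clamp w[7] to a small positive value instead of allowing 0
    w[7] = max(w[7], 0.0001)  # the 0.0001 bound is a placeholder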

@Gilfaro

Gilfaro commented Oct 21, 2024

Just as discussed in the other topic about w7 (#695 (comment)), this change has shifted the most optimal value of w7 further into the negative. This means the optimizer wants to push all cards toward D = 10, and with the clipped value being the mode, most people's cards will tend to sit at max D all of the time.

@Expertium
Collaborator

Then I guess we shouldn't implement damping after all.

@Gilfaro

Gilfaro commented Oct 21, 2024

The problem is not with the damping itself, but that the optimizer prefers D = 10 with the current setup, so damping D means the optimizer compensates by decreasing w7 even further until it clips to whatever value you set.
The best solution would be to investigate why, and change the formulas so that the optimizer doesn't go down that optimization path.

@Expertium
Collaborator

I've experimented with dozens of changes to the D formula and nothing worked. At this point I'm just waiting for someone smarter than me to completely re-define D.

@L-M-Sherlock
Member

[image]

A conservative conversion is new w[6] = old w[6] + 0.5.
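
In other words, something like this when migrating an existing parameter set, purely as an illustration of the suggested conversion:

    new_w = list(old_w)
    new_w[6] = old_w[6] + 0.5  # conservative conversion for the damped next_d formula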

@richard-lacasse
Author

I've experimented with dozens of changes to the D formula and nothing worked. At this point I'm just waiting for someone smarter than me to completely re-define D.

@Expertium I know you've tried R as a parameter in the Difficulty equation, but did you also take R out of the parameter list in Stability? D is a parameter in Stability, so if R is a parameter in both, that might explain why adding R to Difficulty didn't help anything.

@Expertium
Collaborator

R affecting the increase in stability is one of the most important principles of the DSR model. Without it, we can't model the spacing effect since R depends on interval length, so removing it from the formula of S would mean that S doesn't depend on the interval length.
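
For reference, the post-recall stability update in FSRS is roughly the following (as I understand it, omitting the Hard penalty and Easy bonus multipliers), which is why R cannot simply be dropped from it:

$$S' = S \cdot \left(1 + e^{w_8} \cdot (11 - D) \cdot S^{-w_9} \cdot \left(e^{w_{10} \cdot (1 - R)} - 1\right)\right)$$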

@richard-lacasse
Author

S' depends on S, which carries the interval-length information, and if R is in the formula for Difficulty and D is in the formula for Stability, then R still indirectly affects the increase in Stability.

I'm spitballing here a little bit, but this is the first thing I would toy with.
