Improve flips quality by introducing a "flips quality score" #96
Replies: 7 comments 12 replies
-
Would be useful to have some data on the economic technicalities. For example, will the …
-
We can use the present report method without needing to go to another step. We would only need to explain each button's functionality and the purpose of not clicking them. Option 1: Coherent and imaginative below average = not reported / not approved (gray at present) 1/3 to report …
-
I updated the proposal by adding the concept of a "50% reports conversion rule", which would convert "reports" into "below average" votes for valid flips gathering 50% or more of the reports from the qualification committee.
-
I actually agree with this idea, but why don't people with one bad flip get any reward at all? :( They should at least get paid something, even if it's only 1 idna. What do you guys think?
-
I added two sections to the rationale. The first section tests how rewards are distributed between honest and colluding members in various adversarial scenarios. The second section tests how accurate the flips grading remains in these same adversarial scenarios. |
-
Solid work, thanks. I like that you keep it simple for regular users by making the grading session optional and the flip quality choice binary. I support this proposal and believe it would benefit the network greatly.
-
I came up with a similar experiment from the ground up using a publicly available LLM service and some primitive tools, with a prospective model in mind. It successfully exports attributes suitable for the task, and I'm sure nurturing the model would improve the score.
-
Abstract
Introduce an optional flip grading session during the long validation session. Flips graded above average pay six times more iDNAs than flips graded below average. Accurate grading of flips pays the same amount of iDNAs as accurate reports. Identities are assigned a "flips quality score" that is calculated based on the average grade received by their flips over the last six validated ceremonies. The number of flips an identity can submit depends on its "flips quality score".

9 reasons to support this proposal:
How to support this proposal:
This is a community-driven proposal and it needs your active support for consideration as a formal IIP. The core team works hard to make Idena better every day. Undoubtedly, we can count on the core team to design and implement proposals and innovations that make Idena more competitive and attractive. But having a smart and talented core team doesn't mean that network participants should passively stay on the sidelines. If Idena aspires to become a digital democracy, its participants must become active contributors, which includes voicing concerns and proposing solutions for the core team to assess. If you agree with this proposal, please consider supporting it. Below are a few ways you can actively support this proposal:
Whether or not you agree with this proposal, your active participation will prove to all that Idena is a vibrant digital democracy.
Below are a few quotes to remind us that a democracy can only be a reality if its constituents take an active role in its governance:
Motivation
Motivation 1: Improve flips quality to strengthen AI-resistance
Idena network security relies on the ability of the network to generate flips that are difficult for AI to solve. To be AI-resistant, flip stories need to be unique and unpredictable. One key mechanism that aims to prevent flip stories from being repeated is the "two keywords" rule, which requires flip creators to integrate two randomly generated keywords into the story. Flips that don't contain the two required keywords are more likely to be reported. Despite the two keywords mechanism and additional rules, a large number of flips created are similar and predictable while still being valid as per the flip creation rules. Because of their low quality, these flips are more easily solvable by AI, which creates the potential for network security to be jeopardized. Also, many low-quality flips can be solved by a person without reading and understanding the entire story conveyed in the flip. This makes it easier for one person to validate multiple identities. Valid flips of low quality can be clustered in four main categories:
All the flips above are technically valid. Rather than adding new rules to the long list of existing ones, we propose to introduce a flip grading system that incentivizes identities to create high-quality flip stories. This system will penalize the submission of low-quality flips such as the four types (and more) described above. It will also increase the quality of flips over time and provide greater confidence that the Idena network is truly AI-resistant.
Motivation 2: Force flip farms to produce high-quality flips or have them face significant economic penalties
Flip farms are the main actors responsible for the mass production of low-quality flips. The goal of these actors isn't particularly to lower the security of the network. Rather, it is a consequence of them acting as rational economic actors within the boundaries of what they are allowed to do. There are at least three main drivers motivating flip farms to mass-produce low-quality flips:

With this proposal, flip farms will have to choose between creating higher-quality flips, thus diminishing motivational factors 1 and 2, or keep creating low-quality flips, which will remove motivational factor 3. Indeed, this proposal introduces harsh penalties on flip rewards for actors that create low-quality flips (more about this in the subsequent sections). It also introduces a "flips quality score" that will further reduce the number of flips that can be created by actors who produce low-quality flips.
This dilemma won't be a minor consideration for flip farms since flip rewards represented 46.7% of the total validation rewards distribution for the top 10 pools in the last five epochs.
Also, the share of flip rewards for the top 10 pools is significantly higher than the share of flip rewards for the entire network, which indicates how important of an economic factor flip rewards are to flip farms.
For these reasons, we estimate that continuing to mass-produce low-quality flips under the proposal could be a significant threat to the economic model of the flip farms.
Conducted Research
First, we need to define what is considered a high-quality flip. In the context of Idena, a high-quality flip is one that conveys a meaningful story to humans while being hard for AI to comprehend. The meaningfulness of a flip stems from the coherence in the choice and order of the images. However, coherent stories that are often repeated (such as in meme flips) aren't difficult for AI to solve. For a flip to be truly AI-resistant, the flip narrative needs to be unexpected. This attribute can be reached through human creativity and imagination. Story coherence is a prerequisite for an imaginative story; however, not all coherent stories are imaginative. As such, a high-quality flip is a flip that narrates a story that is both coherent and imaginative.

We put this grading framework to the test on a series of 14 flips (Appendix A) of various supposed qualities to see what the grading distribution would look like. For each of these 14 flips, we graded the coherence and imagination level by responding to a series of questions aiming to assess these two attributes.
Grading of the 14 sampled flips by the degree of coherence and imagination of their stories:
The grading framework mentioned above is useful for precisely ranking each flip, but it can be too complex for participants to use during the ceremony (too many questions to answer for each flip). However, this framework is useful to validate the relevance of the two criteria (coherent and imaginative stories). We will call this framework the "comprehensive grading framework". To facilitate the grading by participants during the ceremony, we propose the following framework, called the "simple grading framework":
The simple grading framework categorizes the flips into four main buckets. Below is how the 14 flips would be categorized following the simple grading framework.
Drawing a horizontal axis (from red to green) representing the increase in the quality of the flips, we can easily determine for each flip whether its quality is below or above the average of the 14 flips.
Based on this research, we propose to reduce the grading options to only two by having participants answer the following question for each flip available for the grading session:
How coherent and imaginative is the story narrated in this flip?
The average is determined relative to all the flips present during the grading session. We think it's important to use comparative grading since each participant will have a subjective appreciation of how coherent and imaginative each story is. It also forces participants to make a distinction between lower- and higher-quality flips regardless of their absolute quality. While the comprehensive and simple grading frameworks are useful to determine a precise flip grade, we recognize that in practice most participants may not have the time or make the effort to rigorously assess each flip criterion by criterion. As such, we want the grading question to be as simple and intuitive as possible by limiting the options to two. The concepts of "coherent" and "imaginative" stories are easily interpretable by any human, and the comparative nature of the ranking doesn't require participants to use either the comprehensive or the simple framework (although the latter is quite fast) to come up with accurate results. These frameworks can be shared as ranking guidelines. See Appendix B for an example of how the question and grading guidelines could be presented during the grading session.
This grading system has the advantage of being simple and intuitive for participants while allowing them to accurately rank flips by their relative quality.
Specification
Grading session
We propose to add a "grading session" following the "report session". Each participant will be allowed to grade the flips that they haven't reported, including the ones that they haven't approved. Participants will not be allowed to grade flips that they have reported. For each available flip, the participant answers the question "How coherent and imaginative is the story narrated in this flip?" by selecting one of two options: "Coherent and imaginative below average" or "Coherent and imaginative above average". Similarly to "report credits", each participant has a limited number of "below average" and "above average" credits. We propose that the number of "below average" and "above average" credits each be a third of the number of flips available during the long session. Participants can skip the grading session altogether or grade as many flips as they wish within the limit of the available credits.

Flips grading consensus
Only flips that don't get reported during the report session get a grade assigned to them. The grade of each flip is determined as such:

There is no minimum committee size required for a flip to be assigned a grade. All flips that haven't been reported have a grade assigned to them.
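As a rough sketch of how a grade could be derived from the committee's votes, the function below applies a simple majority rule over "above average" and "below average" votes. The exact thresholds and the tie-handling are assumptions for illustration, not part of the proposal text; only the three grade buckets ("high-quality", "medium-quality", "low-quality") come from the proposal.

```python
def grade_flip(above_votes: int, below_votes: int) -> str:
    """Assign a quality grade to a non-reported flip from committee votes.

    Illustrative majority rule (an assumption): more "above average" votes
    yields high-quality, more "below average" votes yields low-quality,
    and ties (including no votes at all) fall into the middle bucket.
    """
    if above_votes > below_votes:
        return "high-quality"
    if below_votes > above_votes:
        return "low-quality"
    return "medium-quality"
```

Because no minimum committee size is required, the tie branch also covers flips that received no grading votes at all, which keeps every non-reported flip gradable.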
Flip grading flow chart:
Also, we think there is a particular edge case that could produce odd flip grading results. This edge case occurs when a flip gathers 50% or more of the reports from the qualification committee but does not end up being reported, either because the committee is too small (fewer than three members) or because it needs one more report. In this scenario, it is possible that the flip would end up being graded "high-quality", as the qualification committee members who reported the flip wouldn't be part of the grading committee (for the flip they reported). For the scenario in which a flip has 50% or more reports but is still valid, we propose to convert each report into a "Coherent and imaginative below average" vote. We will reference this rule as the "50% reports conversion rule". The identities who reported the flip would receive flip grading rewards for this flip if it is graded "low-quality" or "medium-quality". Below are a few examples to illustrate how the "50% reports conversion rule" would affect the final flip quality grade:
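A minimal sketch of the "50% reports conversion rule" described above: for a valid (non-reported) flip whose reports reach at least half of the qualification committee, each report is added to the flip's "below average" vote count before the grade is computed. The function name and signature are hypothetical.

```python
def apply_reports_conversion(reports: int, committee_size: int,
                             below_votes: int) -> int:
    """Return the adjusted "below average" vote count for a valid flip.

    50% reports conversion rule: if the flip gathered 50% or more of the
    reports from the qualification committee yet was not reported, each
    report converts into one "Coherent and imaginative below average" vote.
    """
    if committee_size > 0 and reports * 2 >= committee_size:
        return below_votes + reports
    return below_votes
```

For example, a flip with 2 reports out of a 4-member committee and 1 existing "below average" vote ends up with 3 "below average" votes, while a flip with only 1 report out of 4 keeps its original count.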
Reports/grading rewards
Participants must be honest when grading flips. As such, we propose to incentivize honest committee members each time their grades align with the "grading consensus". To achieve this, we propose to turn the "reports rewards" fund into a "reports/grading rewards" fund so as not to create additional coin emission. A flip correctly graded pays the same amount as a flip correctly reported. Flips graded medium pay each committee member half of a correct grade or report. This new rewards distribution also creates more equal access to the reports rewards fund for all participants, as each flip pays a reward, as opposed to only reported flips. How much of the reports rewards fund a participant can access no longer depends on the number of reported flips but rather on the total number of flips present in his/her long session, which tends to be more similar from one participant to another.

Reports/grading payment ratio table:
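The payout rules above can be sketched per member and per flip as follows. The proposal only fixes the ratios (a correct grade pays the same as a correct report; a medium-graded flip pays everyone half); treating a misaligned grade as paying nothing is an assumption, and the function name is hypothetical.

```python
def grading_reward(unit_reward: float, member_grade: str,
                   final_grade: str) -> float:
    """Reward paid to one committee member for one graded flip.

    unit_reward is the payout for a correct report. A flip graded
    "medium-quality" pays every grading member half a unit; a grade that
    matches the consensus pays a full unit; a misaligned grade pays
    nothing (assumption).
    """
    if final_grade == "medium-quality":
        return unit_reward / 2
    if member_grade == final_grade:
        return unit_reward
    return 0.0
```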
Flips rewards
Currently, flip rewards are paid equally for each valid flip regardless of flip quality. We propose to differentiate the reward amount based on the flip grade.

Flips quality score
After validation, an "epoch flip quality score" is calculated for each valid identity. Each flip has a "flip score" associated with it. The "epoch flip quality score" is calculated for each epoch by taking the average of all "flip scores".

In addition to the "epoch flip quality score", an "identity flip quality score" is computed by taking the average of the last six "epoch flip quality scores". For identities that have fewer than six epochs of flip quality data, we take the average of all existing "epoch flip quality scores". The "identity flip quality score" is associated with an identity and is used to determine the flip allowance of this identity for the next epoch.
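The two averages above can be sketched directly; the numeric value of an individual "flip score" (e.g. how a grade maps to a number) is not specified in this text, so the functions below just assume scores are already numeric.

```python
def epoch_flip_quality_score(flip_scores: list[float]) -> float:
    """Epoch score: average of the identity's flip scores for that epoch."""
    return sum(flip_scores) / len(flip_scores)


def identity_flip_quality_score(epoch_scores: list[float]) -> float:
    """Identity score: average of up to the last six epoch scores.

    Identities with fewer than six epochs of data are averaged over
    whatever epochs exist.
    """
    recent = epoch_scores[-6:]
    return sum(recent) / len(recent)
```

Note the sliding window: once an identity has more than six validated epochs, older epoch scores simply drop out of the average, so a past run of low-quality flips stops weighing on the identity after six ceremonies.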
Flip allowances
At the beginning of a new epoch, each identity is provided with a flip allowance, which corresponds to the maximum number of flips the identity is allowed to create. The allowance is determined based on the "identity flip quality score".

Currently, there is a disincentive to create more than 3 flips, as creating more flips increases the chance of having one reported, which means losing 100% of the validation rewards. With the flips quality score, we wouldn't want to discourage high-quality flip makers from creating as many flips as they are allowed to. In addition, the newly proposed system would give low-quality flip makers better odds of not having their validation rewards slashed. For these reasons, we propose to update how reported flips penalize validation rewards so that, no matter their flip allowance, all identities carry an equal risk of validation rewards slashing.
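A sketch of the score-to-allowance mapping could look like the function below. The thresholds and allowance sizes are placeholders invented for illustration; the proposal's actual allowance table is defined elsewhere and is not part of this text.

```python
def flip_allowance(identity_score: float) -> int:
    """Map an "identity flip quality score" to the maximum number of flips
    the identity may create next epoch.

    The 0.8 / 0.5 thresholds and the 5 / 3 / 1 allowances are illustrative
    assumptions, not values from the proposal.
    """
    if identity_score >= 0.8:
        return 5
    if identity_score >= 0.5:
        return 3
    return 1
```

The key property, whatever the concrete numbers, is monotonicity: a higher identity score never yields a smaller allowance, so consistently high-quality flip makers earn the right to submit more flips.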
Rationale
1) CREATES AN ECONOMIC INCENTIVE TO PRODUCE HIGH-QUALITY FLIPS
To better visualize the impact of this flip grading and scoring mechanism, we modeled the difference in flips rewards distribution with and without the proposed design. To this effect, we set the four following identity profiles:
Assumptions of the model:
Impact on flips rewards distribution:
These charts show that the proposed flips scoring system does what it is intended to do, as it distributes a higher share of the flips rewards over time to identities who produce higher-quality flips. Identities who don't improve the quality of their flips are severely penalized by the proposed design.
Impact on flips quality:
These charts show that even in a scenario in which the flips quality rate remains constant, the proposed system generates a higher share of high-quality flips over time through the flip allowances mechanism. That said, the main factor that will drive flip quality higher remains the incentive for low-quality flip makers to correct their behavior.
Evolution of flips quality score and flip allowances:
2) DISTRIBUTES MOST OF THE REPORTS/GRADING REWARDS TO HONEST PARTICIPANTS
The grading system wouldn't be reliable if participants didn't have an economic incentive to report and grade honestly. We wanted to verify, for several adversarial scenarios, to what extent the reports/grading rewards are distributed to honest committee members as opposed to colluding committee members. Below are the different parameters we used to build the different adversarial scenarios:
Committee size: how many committee members are participating in the reports/grading sessions. We tested committee sizes of 4 and 6 members. We assumed the committee size to be fixed for all flips present in a given validation session.
Share of honest members: the share of committee members who report and grade flips honestly. Honest members seek to maximize their report/grading rewards, assuming that the committee majority will be honest. We tested shares of honest members of 3/4, 2/3 and 1/2. We assumed the share of honest members to be fixed for all flips present in a given validation session.
Share of colluding members: the share of committee members who collude by reporting and grading flips in a manner that goes against the report and grading rules/guidelines. In these scenarios, we assumed that colluding members aren't seeking to maximize their report/grading rewards. Instead, the main goal of colluding members is to favor flips of lower quality in an attempt to create a "low flips quality culture" among the network participants. We tested shares of colluding members of 1/4, 1/3 and 1/2. We assumed the share of colluding members to be fixed for all flips present in a given validation session.
Number of flips in the long session: we chose to keep this parameter fixed at 12 flips per long session. Each committee member has 4 report credits, 4 below average votes and 4 above average votes.
Overall flips quality in the long session: long sessions can have a mix of different flips quality levels. We defined three flip quality levels:
Based on these flip quality levels, we tested three different mixes of "overall flips quality" for a long session containing 12 flips:
We could also have tested "very high" and "high" overall flips quality levels but decided not to, because we wanted to keep scenarios with a strong adversarial factor.
Report/grading strategy followed by colluding members: these are the rules the colluding members follow to decide which flips to report and how to grade flips. We assumed that all colluding members follow the same strategy in a given scenario. We defined two colluding strategies:
Colluding strategy 1:
1. Don't report any flip
2. Approve all reportable and low-quality flips
3. Grade high-quality flips as below average until running out of below average votes
4. Grade low-quality flips as above average until running out of above average votes
5. Use remaining above average votes to grade reportable flips as above average

Colluding strategy 2:
1. Report high-quality flips until running out of report credits
2. Approve all reportable and low-quality flips
3. Grade non-reported high-quality flips as below average until running out of below average votes
4. Grade low-quality flips as above average until running out of above average votes
5. Use remaining above average votes to grade reportable flips as above average
In all tested scenarios, colluding members do not report and grade flips in the same order. For instance, the first colluding member will spend his/her report credits by going through the flip slots in sequential order, starting with flip slot 1. The second colluding member will do the same starting with flip slot 2 (one flip slot increment from the preceding colluding member), and so on for the following colluding members. The same incremental logic applies to the use of below average and above average votes.
Report/grading strategy followed by honest members: these are the rules the honest members follow to decide which flips to report and how to grade flips. We assumed that all honest members follow the same strategy in a given scenario. We defined two honest strategies:
Honest strategy 1:
1. Report reportable flips until running out of report credits
2. Approve all high-quality and low-quality flips
3. Grade non-reported reportable flips as below average until running out of below average votes
4. Use remaining below average votes to grade low-quality flips as below average
5. Grade high-quality flips as above average until running out of above average votes
6. Use remaining above average votes to grade low-quality flips as above average

Honest strategy 2:
1. Report reportable flips until running out of report credits
2. Approve all high-quality, low-quality and non-reported reportable flips
3. Grade non-reported reportable flips as below average until running out of below average votes
4. Use remaining below average votes to grade low-quality flips as below average
5. Grade high-quality flips as above average until running out of above average votes
6. Use remaining above average votes to grade low-quality flips as above average until running out of above average votes
7. Use remaining above average votes to grade non-reported reportable flips as above average
In all tested scenarios, honest members do not report and grade flips in the same order. For instance, the first honest member will spend his/her report credits by going through the flip slots in sequential order, starting with flip slot 1. The second honest member will do the same starting with flip slot 2 (one flip slot increment from the preceding honest member), and so on for the following honest members. The same incremental logic applies to the use of below average and above average votes.
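The staggered ordering used for both member types amounts to a simple round-robin offset over the flip slots. A sketch (function name hypothetical, slots 1-based as in the text):

```python
def vote_order(member_index: int, num_slots: int) -> list[int]:
    """Order in which a committee member walks the flip slots.

    Member i (0-based) starts at slot i+1 and wraps around, so each member
    is offset by one slot from the preceding member, as described for the
    report credits and for the below/above average votes.
    """
    return [(member_index + k) % num_slots + 1 for k in range(num_slots)]
```

With 4 slots, member 0 visits slots 1, 2, 3, 4 and member 1 visits 2, 3, 4, 1, which spreads each group's limited credits across the session rather than concentrating them on the first few flips.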
After testing many possible scenarios, it appears that only two parameters have a significant influence on how report/grading rewards are distributed between colluding and honest members. These two parameters are:
Below is a comparison of how much of the available report/grading reward each member type would earn in the different tested scenarios:
Below is a heat map summary of the different scenarios tested:
Key insights:
3) DELIVERS ACCURATE GRADING RESULTS AS LONG AS LESS THAN 51% COLLUDE
We also studied how closely the grade assigned to flips matches their true quality in several adversarial scenarios. For the definition of the parameters used to build the different scenarios, please refer to the previous section. One new notion is introduced for this section:

After testing many possible scenarios, it appears that only two parameters have a significant influence on flip grading accuracy. These two parameters are:
Below is a comparison of how accurate the grading is for each flip quality level in the different tested scenarios:
Below is a heat map summary of the different scenarios tested:
Key insights:
Conclusions:
We are highly confident that this proposed system would increase flips quality over time, as we've shown that:

As long as the network maintains a generalized level of collusion below 51%, these effects would remain in place.
Appendix A
Appendix B
Example of presentation for the grading question:

Example of presentation for the grading instructions: