Bot Races, Scope, the Future of Judging #138
Comments
I'd like to bring up something I've suggested in the past
This would be much more efficient, since the expected value (EV) for each bot crew is much higher. Currently the payout is split between 10-20 crews; in my suggestion, one bot crew gets all of the payout. |
TL;DR
I am a huge proponent of a public goods static analyzer as I believe a collective open-source community is the best way to grow a viable static analyzer that can reliably detect a lot of cases. Ultimately, an open-source static analyzer would excel and overshadow any privately developed project in the long term and be a net positive for the security community.
Points of Discussion
The main problems I see with this particular proposal that are not directly addressed are as follows:
Problem 1: What happens to existing bot racer tools?
The experiment has spawned multiple bots that will ultimately end up in a graveyard. This seems to be a bit unfair to bot racers who have worked on their projects from scratch. I understand they have been rewarded via the bot races, but I think there will be major pushback due to this point not being addressed.
Solution 1: Priority of Detector Integration
A potential solution to this problem would be to provide bot racers with a priority in integrating their detectors to the 4nalyz3r (or whichever detector ends up being the base for C4). In turn, this will allow them to have their detectors rewarded by the scheme described in the proposal before other wardens can contribute, effectively converting their invested effort into real value for both C4 as well as themselves.
Problem 2: Reward per selector
I believe that a $100 reward per selector is low, especially when factoring in complex detection cases that should also eliminate false positives. A selector will most likely require a few hours to create and this would in turn make the hourly rate of wardens contributing to these selectors very low.
Solution 2: Incremental reward per selector
A curve should be defined for the "reward model" of selector implementations for the analyzer. As more selectors are added, the reward increases organically.
This would ensure that "quick" selectors are added first with a lower price-per-selector while more complex selectors are added later as the simple cases are covered closer to 100%.
Problem 3: Who implements the selectors?
Presently, bot races led to the creation of the same detector across multiple codebases each with varying results. This would translate into a real problem for this initiative whereby a single selector would have multiple proposals as implementations for the reward pool.
Solution 3: Maintainer System
I think a maintainer system should be put in place similar to bot races whereby a subset of wardens is eligible for submitting their detectors for the After the maintainer system is established, a similar system to judging and presorting can be set in place whereby the maintainer with the least rewards is selected as the next implementor as long as they illustrate willingness and so on.
Conclusion
I ultimately think this is the way forward for C4 to minimize the burden on judges and pre-sorters while actively contributing to the security community as a whole. While some problems exist in killing the bot race program directly, I believe we can come up with incentives for existing bot racers to offset these problems and thus create a fair path forward for C4 and this experiment's conclusion. Should we proceed with this path, a different organization issue should be opened to discuss the right way to judge selectors, how they should be rewarded, etc. I think this issue should contain itself to discussions around killing the experiment and how that would be perceived by the community, C4, as well as C4-affiliated members (judges, pre-sorters, bot racers, etc.). |
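To make Solution 2 and Solution 3 above more concrete, here is a minimal Python sketch. The comment does not specify the shape of the reward curve, so the linear growth, the `step` and the `cap` values are assumptions chosen purely for illustration, as are the `Maintainer` fields and function names. Only the $100 starting point and the "least rewarded willing maintainer goes next" rule come from the discussion.

```python
from dataclasses import dataclass
from typing import List, Optional


def selector_reward(merged_count: int, base: float = 100.0,
                    step: float = 5.0, cap: float = 500.0) -> float:
    """Reward for the next selector, given how many are already merged.

    Assumption: the reward grows linearly and is capped. The thread only
    says the reward should increase as more selectors are added.
    """
    return min(base + step * merged_count, cap)


@dataclass
class Maintainer:
    handle: str           # hypothetical identifier
    total_rewards: float  # rewards earned so far
    willing: bool         # has the maintainer volunteered for the next selector?


def next_implementor(maintainers: List[Maintainer]) -> Optional[Maintainer]:
    """Pick the willing maintainer with the least rewards so far."""
    candidates = [m for m in maintainers if m.willing]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m.total_rewards)
```

With these assumed parameters the first selector pays $100 and later, presumably harder, selectors pay more, while the rotation favors whoever has earned the least.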
I disagree with offering moats for no clear outcome. The same result could be achieved at the 4nalyz3r level by rewarding contributors for open source work, which helps C4 (removes spam issues) and helps the community for a fraction of the cost. |
I don't believe that C4 has any obligations towards Bot Racers, just like it has no obligations towards Wardens and Judges. I'd be happy to hear more thoughts around how to make OSS fair and paid in a way that incentivizes them, without causing the same issue of leeching. |
I believe the main problem with this train of thought is the assumptions around the lifetime of bot races. A better analogy here would be the announcement of a C4 contest of a specified pot, multiple wardens contributing their findings, and the contest ultimately being canceled with no rewards. The above would be unfair because the wardens participated in anticipation of receiving (or at least participating in the chance to) a reward. Similarly, bot racers have been developing their tools to capture value and acquire income. While we can argue that bot racers have been compensated via the races themselves by receiving rewards, we would not be accommodating to prospective bot racers who never had a chance to compete but did develop their tools. These individuals would have the entirety of their work dismissed with no form of compensation even though they anticipated one. I proposed two ways we can fairly incentivize these individuals, ensuring that they can convert their effort into OSS value while being compensated for it:
In any case, I think we should offer some form of advantage to bot racers over normal individuals who want to contribute to the |
You have a point: as long as we don't mind having lots of false positives, the 4nalyz3r is good enough. Btw, if we're going to build a public analyzer then maybe we should base it on Slither, which does actual parsing of the code. But that might be more difficult to maintain. |
I think Slither is already an open-source initiative, but it is seldom contributed to, not only due to the absence of a reward but also because its code is hard to grasp. |
First of all, I agree that the experiment failed. Overall, the costs in time and money associated with the bot race outweigh any advantage gained from the bots (more work for judges, smaller pot for wardens, no novel findings introduced, etc.). It seems the incentives are not there for bots to reach the next level of complexity, while the mechanics make bots converge on the same spam findings. There would be more value to everyone if the sponsor just got a pre-audit QA report, which you can get off the shelf for $500. The suggestion to revert to the analyzer is a good one for a baseline reduction of spam. More broadly, the concept of a QA master / Gas master in charge of the respective races needs to be incorporated to allow the highly qualified judge role to focus on the main race. |
I agree that the mechanic has been broken for a long time. If you look at the first ~10 races, I was adding new and useful unique findings almost every race. Suddenly when Alex stopped being the judge, and judges unfamiliar with findings were scoring things, my ranking dropped, and from then on, I've had to redirect significant effort, not only to tooling for judges (which C4 is reluctant to endorse despite judges actually using and liking it), but to writing extra 'dispute' rules, in order to compensate for C4's unwillingness to directly address bots with a lot of invalid findings. Bots were winning by overloading judges with a smattering of invalid findings which I was excluding, rather than actually winning on real findings. It appears from this org issue that I've wasted my time. As I've said before, I won't be contributing to any open source tools/lists, and I suspect a chunk of the other racers won't either. |
Sad to see the experiment's result; this was an interesting one to follow. I would argue that Slither should be used here (full disclosure, I am Slither's author). Slither has significantly more capabilities than any other existing static analysis tool. It is more than an AST visitor: it has its own intermediate representation and inbuilt data dependencies, and it already has 90+ detectors. Slither has been open source for years and is already widely used by the community, with 100+ contributors. We (Slither's team) would be more than happy to make its integration with code4rena easier, and to improve its documentation for external contributions (if that's a blocker). We could even develop private capabilities for projects going through code4rena if code4rena is looking for it (ex: to ease the triage). The reality is that maintaining a static analyzer is a huge effort, and something like "$300 / $1000 per contest" / "$100" per selector is nowhere close to what is needed to maintain such a tool beyond the MVP phase. Relying on Slither, which is already maintained, is a better long-term solution for the community imho. |
Current Issues
I agree that bot races are currently broken, but removing them entirely will not fix these issues. Most of the best QAs before bot races existed, were actually submitted by bots. Now that bots are so advanced, no warden will be able to earn any money from QAs due to the sheer amount of issues submitted by bots. Most of the "abuse" cited is due to the lack of rules for bot crews. There are a lot of controversial things happening, including:
These might be considered abuse, but they are actually expected by C4 and within the rules; most bot crews are already doing it.
My Opinion on Proposed Solutions
Possible Solutions to Fix Bot Races
The main goals here are to cut judging time, improve the value added to the sponsor, reduce spam, and enhance the competitiveness of the bot races.
|
I agree with @GalloDaSballo that bot races are a failed experiment and with his suggestions on how to fix / replace it. Let's not get lost in the weeds or specifics. The goal of bot races was to decrease spam and overall judging effort. This has not materialized and it's time for us to declare the experiment complete. |
I have an idea that attempts to find a sweet spot between incentivizing the development of powerful automated analysis and greatly reducing the associated costs, regardless of the tool that ends up being used. I agree incentivizing new detectors upfront will not work for the reasons already mentioned unless we can offer larger grant-like incentives, and even then these efforts would have to be manually coordinated and the process would not benefit from the competitive incentive model that has brought Code4rena where it is today. Instead, I propose we still keep a portion of the bot races budget and award it to valid bot findings from one canonical tool to which people contribute their detectors. The bot report would still be judged, and its findings assessed for validity and awarded shares of the pot according to their severity, as with the H/M pot. Rewards would be paid out to the author of the detector, which would turn the task of contributing detectors from pro bono work into a nice passive-income opportunity with upside potential (as the number of contests on code4rena increases). I imagine this would incentivize the development of new detectors early on, as the payout will likely be concentrated on fewer findings. It would also have the effect of creating a race to be the first to implement detectors that most bot racers have already developed. As far as I can tell, both of these effects are desirable. Lastly, there's the problem of disincentivizing spam findings, which an approach like this would immensely exacerbate. One idea is to define a cumulative total false positive rate
|
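The payout idea in the comment above (valid findings from one canonical tool earn severity-weighted shares of a pot, paid to each detector's author) can be sketched roughly as follows. The share weights in `SEVERITY_SHARES` and the tuple layout are assumptions made for illustration; the thread does not specify them, and this does not attempt the false positive rate idea, whose description is cut off.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed share weights per severity; not taken from the thread or from C4 rules.
SEVERITY_SHARES = {"H": 10, "M": 3, "L": 1}


def payout_by_author(pot: float,
                     findings: List[Tuple[str, str, bool]]) -> Dict[str, float]:
    """Split `pot` among detector authors.

    Each finding is (detector_author, severity, judged_valid).
    Invalid findings earn no shares.
    """
    shares: Dict[str, int] = defaultdict(int)
    for author, severity, valid in findings:
        if valid:
            shares[author] += SEVERITY_SHARES[severity]
    total = sum(shares.values())
    if total == 0:
        return {}
    return {author: pot * s / total for author, s in shares.items()}
```

For example, with a $1,000 pot, one valid Medium from one author and one valid Low from another, the split under these assumed weights is 3 shares to 1.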
I'm currently writing my own bot, and I think ending the bot race will create a lot more spam than you expect, especially from partially implemented bots like mine. I find it really annoying how many incorrect submissions and spam come from the winning bots, and everyone agrees they should definitely be penalized, not rewarded, for this. Both bot racers and judges are wasting time here. I truly believe bot races are a step in the right direction, and I don't believe in a community-based approach. Instead, I think we should standardize bots and detectors: known detectors should have an ID, a title, and a description defined by C4, and some tools should be used to do 95% of the judging and scoring. While some may argue that this could help less advanced bots, I believe it will drastically improve the quality of submissions and foster competition for very accurate and innovative detectors, rather than just the number of submissions. |
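As a rough illustration of the standardization idea above, the sketch below models a C4-defined detector registry (ID, title, description) and merges reports from several bots by detector ID and location, so that a judge sees each standardized finding once. The registry entries, the location string format and the function name are hypothetical; they are not an existing C4 tool or format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Detector:
    id: str
    title: str
    description: str


# Hypothetical registry entries, for illustration only.
REGISTRY: Dict[str, Detector] = {
    "NC-001": Detector("NC-001", "Missing NatSpec",
                       "Public function lacks NatSpec comments."),
    "L-001": Detector("L-001", "Unchecked ERC20 transfer",
                      "Return value of transfer is not checked."),
}


def merge_reports(
    reports: Dict[str, List[Tuple[str, str]]]
) -> Dict[Tuple[str, str], List[str]]:
    """Group findings by (detector_id, location) across bots.

    `reports` maps a bot name to its (detector_id, location) findings.
    Findings whose ID is not in the registry are skipped here and would
    still need manual judging.
    """
    merged: Dict[Tuple[str, str], set] = {}
    for bot, findings in reports.items():
        for detector_id, location in findings:
            if detector_id not in REGISTRY:
                continue
            merged.setdefault((detector_id, location), set()).add(bot)
    return {key: sorted(bots) for key, bots in merged.items()}
```

Under this scheme, only findings outside the registry need a human to read them from scratch, which is one way to interpret "tools should do 95% of the judging".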
We could debate endlessly on this. On my end, I fully support @GalloDaSballo's proposal. The consensus is clearly that this experiment has failed, so let's not waste any more time and call it a day. As suggested let's use 4nalyz3r or slither and pay $100-200 per merged selector and advise in a few months. In the worst case, it will be another interesting experiment that has failed; in the best case, we'll have contributed to a cool public good. |
I think this is an interesting conversation and I hope it continues. Even though the post is polemic, I agree with many of the critiques. At the same time, I’m not sure I agree with the conclusion and I disagree with at least four of the assumptions:
This thread is driving at closure and conclusions, but from the big picture of Code4rena, I am not presently convinced of any specific solution. We’re not going to make a reactive change here. We’re going to be deliberate, listening to fact-based pros and cons people share rather than any specific conclusion or suggestion. What would be extremely helpful is to see two new issues started and linked to this one:
Both should be focused as much as possible on brief statements of facts and evidence, minimizing duplicate comments as much as practical. Then we will have something useful from which to proceed. — Some important asides: As an interdependent community, I don’t think anyone is served well by viewing this oppositionally or through the lens of factions. I don’t think it’s appropriate or correct to label bot racers as professional scammers. I apologize that I didn’t say that when you shared this with me before posting it, Alex. I have respect for the people doing this work, and this forum is not an acceptable place for this kind of labeling of fellow community members or their motivations. I didn’t fully understand your intent was to immediately post this, but I should have given you that feedback and for your sake as well as bot crews, I’m sorry to all that I didn’t. I take responsibility for that. I think as lllll points out and dade agrees that there are some perverse incentives in bot races that have never been adjusted. Changes requested directly by bot racers to score penalties haven’t been implemented largely due to a combination of consent (judges/lookouts have a right to consent to what they’re being asked to do) and the sustainability question that underlies that. |
Really glad to read this. There are some valid points being raised, but I think there's room for different recommendations and conclusions. Because so many issues are being presented simultaneously here, and given it is much easier to dismiss something than to build a compelling case for that same thing, the time to provide thoughtful feedback is much appreciated. |
Whoa, a lot of criticism and suggestions here. I'm not just proposing a solution; I'm pointing out what would be better for C4, bot racers, protocols and the community. One of the competitive audit platforms has introduced its own bot, which the platform itself runs on every audit. If C4 had applied the same technique, it would have been so much better. As other bot racers have pointed out here, the current bot race mechanism is broken, and that's true. How can we get rid of this broken mechanism and find a way that's better for everyone? I partially agree with Gallo that C4 should have a dedicated bot, but not with the idea of getting rid of bot racers. How can we have a dedicated C4 bot and also reward existing bot racers for the time they've invested? My suggestion is that C4 should work on its own dedicated bot, whether it's 4naly3er, Slither or a bot built from scratch, and hire only existing bot racers to sharpen it with their powerful detectors. They would be rewarded per detector: whenever there's a bot race for a new contest, bot racers get paid if their detectors find a bug. After some time, once the C4 bot reaches the level of the existing top-notch bots, it would be opened up for everyone to contribute. The reason I'm emphasizing a dedicated C4 bot is that every protocol deserves a fair bot race. What is meant by a fair bot race? Let me explain. Suppose one of the great bot racers' bots has a unique detector that catches a unique bug in staking protocols; let's assume it's lllll's bot, as he said above that he was introducing unique detectors in every race. Next time, he can't find the time to participate in the race for the next staking protocol, so that unique bug, which a bot could have found, now falls into the HM category and pot. I think that's unfair to the protocol: a bug that could have been covered by the bot race pot will now be paid from the protocol's HM pot.
And MiloTruck also mentioned in his blog that one of the reasons he quit bot racing was that the race times did not fit his schedule. Suppose, again, that Milo's bot is one of the best in town, but he just can't participate due to time constraints. Possible Suggestions
Possible Outcomes
Last Note For Bot Racers
Since some bot racers have said they won't be contributing to any other tool, C4 can't stay stuck with something that isn't working and won't get improvements. (Full disclosure: I'm not against bot racers and I respect their work.) |
Not to mention the debate between Python and TypeScript, how are core changes compensated? Imagine implementing or enhancing Object/Graph/comment parsing capabilities in 4naly3er, maintaining over 1,000 detectors, or adding Vyper parsing. A cost of $100 for each of the 1,000 detectors amounts to $100,000. This does not account for multiple iterations on the same detector, core changes, complex detectors I wouldn't implement for $100, and the proposed hypothetical passive income.
C4 is a competitive place and many of us are here for this: winners and hard workers take most. If I wanted to do pay-per-detector jobs I would go to Fiverr. That's why I would prefer advocating for normalization and C4 tooling for merging/judging findings, which ultimately creates the same result: a competitive C4 bot based on external bots competing against each other. This can start small with common issues and probably all |
Hello everyone, I hope you are all doing well. Here I want to share my opinion about the bot races. How can we get the most return from the bot racers?
Let's say 1 medium issue is found in the race, and 20 low issues. This motivates the racers to implement more Highs, Mediums and Lows.
Right now the racers try to implement more and more NCs to get a higher ranking, but if we have a specific and small pot for NCs, then the racers will stop adding NCs and will focus on Lows, Mediums and Highs. |
@GalloDaSballo an excellent write up there!
Any open source proponent will wholeheartedly agree that supporting an open source static analyzer is great for the common good. Due to their competitive nature, bot races can only incentivize the private development of closed source bots, with C4 currently providing both direct (via rewards) and indirect (judge / finding feedback) funding. It makes sense that C4, being an enterprise, would review and evaluate the costs and benefits of the product offering, and @sockdrawermoney outlines a sound approach to gathering data for a data-driven decision.
Looking forward to seeing the follow-up actions to this discussion 👍 |
@GalloDaSballo @Picodes @trust1995 @alex-ppg The only motivation for me is to remove the ridiculous bot-race. |
I finally read most of this thread. I would just like to add that many here base their claims on the assumption that bot racers do not provide value of their own free will. In reality this is not so: I created my bot even before bot racing existed, and my goal was to create an effective scanner capable of finding complex, recurring vulnerabilities. But later I joined the bot race and was forced to abandon my goal, because the economic incentive, controlled by the judges, forces me to add useless copy-paste checks, 20 NatSpec instances, etc. Only this allows you to stay in the upper half of the 20s. Other wardens and I have complained multiple times on Discord that the judging leads to reports that provide less benefit to the sponsor. At the moment, I'm just tired of having to add useless copy-paste checks, and I'm currently not putting much effort into improving the bot. |
Executive Summary
My argument is that Bot Races are a waste of money, and the experiment should end.
I will argue why and offer a path forward as well
I recommend Sponsors and Judges to comment on this issue as a means to clarify whether they have a different opinion
I urge Wardens to send more instances in which Bots have caused issues in judging as a means to learn from past events
Why did Bot Races start?
Bot races were created to:
Did Bot Race Achieve Any of those goals?
New Audits Approach
Bot Races demonstrated that some bot automation can achieve a good enough result, quantifiably (from talking to racers) to between $50 and $1k in value.
That said, most bots are offered as pre-sales to projects, simply because no bot has offered any value that comes close to manual review or more refined techniques such as invariant testing.
In this regard, bots have failed to show a "new way for audits"; they are at best a Certik / Source Hat substitute
And more generally, they are just a prettified checklist, something an intern could check given some guidance.
Have bots opened up a new source of offerings for C4?
I don't believe so.
Any offering of such low quality would come with high reputational risk: giving automated stamps to scams and having a high % of exploits, neither of which is desirable.
Did the Spammers Stop?
Hell no, need I remind you that we've all been judging 1.5k+ findings contests consistently?
The number of spammers has stabilized at a percentage of wardens, but the amount of spam has grown (in absolute terms), along with the growth in wardens
From ChatGPT crap to the usual "quick qa scam", we are pretty familiar with the leeches. This is a valid problem
that bots didn't really help with
About Bot Races
Bot Races take, on average, 4 hours to judge
Funnily enough that's longer than the duration of the race itself
Bot Races receive, on average, 5%+ of the total pot, many times offering a pot that is greater than the QA and Gas pots combined
Negative externalities of Bot Races
The additional work for presorters / judges <- Which is an obfuscated cost to the org, that is more than the Bot Fee
The cost to the sponsor <- Which is not in line with the value, for example auditbase would cost $150 / month (let's say $150), for 80% of the value provided by all bots
The constant change of scoping <- Many important findings have been either made OOS or in Scope because of Bot "mistakes"
The additional complexity in judging and scoring <- Changes to the baseline "known findings" imply a re-adjustment in judging all other reports
Scamming as a profession
There have been multiple instances of Bots omitting "obvious findings"
Some examples are (Please send more examples in the comments):
Some of the time it was possible to even see the Warden that "omitted" the finding, submit it in the main contest
This is an example of conflict of interest that stops working the second we question the morality of some wardens
We all understand that it is in the interest of the Bot Racer to omit a finding and then submit it in the main race
Examples of this have happened many times in the past: since the community gets used to something like admin privilege or fee-on-transfer (FoT) always being QAs, they stop sending them.
Investigations
DELEGATE - Omissions - Abused afaict
https://gist.github.com/GalloDaSballo/378b21840f1ce063a5126f9785f920bc
IMPACT:
ENS - False negative - Not abused afaict
Bot missed the lack of safe transfer, actually saying it was disputed
code-423n4/2023-10-ens-findings#91
https://github.com/code-423n4/2023-10-ens-findings/issues/
IMPACT:
NEXT GEN - Mitigation would cause critical issue
code-423n4/2023-10-nextgen-findings#1045
In this instance, implementing the suggested fix at face value would open up the sponsor to a critical issue
IMPACT:
Conclusions
Bot Races introduce an issue related to the agency of the Sponsor
Some mitigations from Bot Races may be considered "obvious", others not so much
Because of the strict cadence between Bot Races and the Main Contest, this issue of agency is accentuated.
They create further complexity in an already complex judging environment
Solving the Problems
The real problems for C4 are:
Solving Scope
We can instantly solve scoping issues by falling back to 4nalyz3r
This will ensure that all contests have the same scope
All of the same OOS
We can change 4nalyz3r every 3 months or so (seasons if you will)
In which the bot will change
Spam
With the Supreme Court we defined a very lax QA judging process, that allows a Judge to skip many low quality spam submissions, as well as punish low inventiveness QA reports at their discretion.
This rule would entirely replace the need for a bot race output, ultimately the judge could "set the bar" as high as they see fit, and then score reports that surpass that base level
In lack of any evidence that bot races have removed spam, we already are using this heuristic, so no change is required
Long Term Meta Changes
This is the only reason why bots may have been a good idea, they could adapt and improve over time
In reality they have hit a cul de sac and some have even regressed
The path forward that I see is to take a very small amount from the C4 fee, something like $300 / $1000 per contest, and put it as a "public good pot"
This pot could be stewarded by 4nalyz3r maintainers and used to fund a few selectors over time.
For example, each new selector that is unique and is merged could be paid $100
This would require the work of the maintainer and 10 seconds of any competent judge's time
There are many criticisms of this.
But the alternative has been spending close to $200k for NOTHING.
ZERO.
NADA.
Bot racers have been paid to create their own moat, for this privilege to be abused, to no advantage to the organization, judging and the sponsor.
It's time we take the power back and give the money back to the main pot.
Conclusion
No longer offer bot races
Have Masons run 4nalyz3r, at no extra charge
Collect $300 / $1000 per contest
Define a simple system of $100 per selector, paid only on a merged PR, and iterate on it
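As a back-of-the-envelope check on the numbers in this proposal, the Python sketch below estimates how many selectors the "public good pot" could fund. Reading "$300 / $1000 per contest" as a low and a high option is an assumption, as are the function name and the example of 10 contests; only the per-contest amounts and the $100 per selector come from the text.

```python
def fundable_selectors(contests: int, fee_per_contest: float,
                       reward_per_selector: float = 100.0) -> int:
    """Number of selectors the public good pot can pay for.

    Assumes the whole pot is spent on selector rewards, with nothing
    set aside for maintainers or core changes.
    """
    pot = contests * fee_per_contest
    return int(pot // reward_per_selector)


# Hypothetical example: 10 contests at the low and high ends.
low = fundable_selectors(10, 300.0)    # 30 selectors
high = fundable_selectors(10, 1000.0)  # 100 selectors
```

The calculation ignores maintainer time and repeated iterations on the same selector, which several commenters raise as the real cost.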