Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New OutcomePicker based on other users and their prediction history #259

Merged
merged 30 commits into from
Aug 29, 2022
Merged

Conversation

15sawyer
Copy link
Contributor

Pull Request Etiquette

Checklist

  • Tests have been added in relevant areas
  • Corresponding changes made to the documentation (README.adoc)

Type of change

  • Bug fix
  • New feature
  • Breaking change

Description

This PR adds two features:

  • Option to record other peoples predictions. These are taken from two separate sources:

    • Top-predictors from the event-update event.
    • From chat messages by reading the users prediction-badge.

    Addidtionally, the users approximate return-on-investment is calculated.

  • New outcome picker "mostTrusted". It utilizes above data to pick the outcome that is backed by the users with the highest average return-on-investment.

Sorry, something went wrong.

@15sawyer 15sawyer requested a review from Rakambda as a code owner August 27, 2022 11:31
@Rakambda Rakambda changed the base branch from main to develop August 27, 2022 11:41
@Rakambda
Copy link
Member

I started to review it a bit and I have to say that it's not that easy on GH x)

Overall it seems good just some parts that we probably have different views on like listeners.
To be honest the review started to be long and it's mostly just shifting some pieces of code from one place to another. I don't know if changes from maintainers are enabled but what I'll do it's just make the changes that I was reviewing and we can discuss about them. This way it'll be easier I think.

@codecov
Copy link

codecov bot commented Aug 27, 2022

Codecov Report

Merging #259 (2050696) into develop (78cb011) will decrease coverage by 3.86%.
The diff coverage is 61.48%.

@@              Coverage Diff              @@
##             develop     #259      +/-   ##
=============================================
- Coverage      87.58%   83.71%   -3.87%     
- Complexity       792      851      +59     
=============================================
  Files            164      170       +6     
  Lines           2287     2591     +304     
  Branches         157      175      +18     
=============================================
+ Hits            2003     2169     +166     
- Misses           241      379     +138     
  Partials          43       43              
Flag Coverage Δ
unittests-miner 84.19% <61.48%> (-4.02%) ⬇️
unittests-viewer 58.33% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...miner/api/chat/irc/TwitchIrcConnectionHandler.java 53.84% <ø> (ø)
.../data/message/ChannelLastViewedContentUpdated.java 0.00% <0.00%> (ø)
...sminer/miner/api/ws/data/message/EventCreated.java 0.00% <0.00%> (ø)
...sminer/miner/api/ws/data/message/EventUpdated.java 0.00% <0.00%> (ø)
...s/data/message/GlobalLastViewedContentUpdated.java 0.00% <0.00%> (ø)
...ntsminer/miner/api/ws/data/message/StreamDown.java 0.00% <0.00%> (ø)
...ointsminer/miner/api/ws/data/message/StreamUp.java 0.00% <0.00%> (ø)
...sminer/miner/api/ws/data/response/MessageData.java 100.00% <ø> (ø)
...nelpointsminer/miner/database/MariaDBDatabase.java 0.00% <0.00%> (ø)
...nnelpointsminer/miner/database/SQLiteDatabase.java 0.00% <0.00%> (ø)
... and 56 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Chat API:
* IRC -> split reconnection logic from chat parsing logic.
* Each chat implementation each have their own list of listeners and handles it by themselves.
* Each chat implementation will initialize itself properly depending on if we want to listen for messages or not.

Miner:
* Miner will instantiate the chat according to the config and add a listener to transform messages into events.

Event:
* EventUpdatedEvent won't be logged, it is too verbose and not of much use, it'll be used purely internally for database stuff.
* Event is handled later on to be put into DB.
* That event handle will be responsible for holding the actual DB connection rather than handling DB close to the chat api.

DB:
* By default create a No Op database that won't do anything
* Max pool size in config
@Rakambda
Copy link
Member

So I basically did a massive refactor. What you added actually touched on things that were not planned at first (like using the DB in a read mode somewhere in the code). If you want more details on what I did, it's in the commit message.

Overall this is the same logic as what you did just that pieces of code are moved a bit. Mostly trying to separate the chat API part from the DB part.

I haven't really looked at the logic behind the DB / OutcomePicker yet. I wonder if you have some stats on how it performs ?

@15sawyer
Copy link
Contributor Author

So I basically did a massive refactor. What you added actually touched on things that were not planned at first (like using the DB in a read mode somewhere in the code). If you want more details on what I did, it's in the commit message.

Overall this is the same logic as what you did just that pieces of code are moved a bit. Mostly trying to separate the chat API part from the DB part.

I haven't really looked at the logic behind the DB / OutcomePicker yet. I wonder if you have some stats on how it performs ?

Separating the Chat and DB is definitely good.

I have made some tests on channels that do a lot of predictions. The first results are promising, but I haven't tested it enough to be able to be more specific.

@15sawyer
Copy link
Contributor Author

I encountered a small bug. The Database throws an error if a string contains characters which require utf8mb4:

java.sql.SQLSyntaxErrorException: (conn=3940) Incorrect string value: '\xF0\x9F\x91\x91 \xC2...' for column miner.ResolvedPrediction.Title at row 1

We can either change the character set of the table or clean the title. I think this is the only field that is affected by this.

@Rakambda
Copy link
Member

Rakambda commented Aug 27, 2022

Defining it as utf8mb4 seems ok.

(you can actually set it that way for every table to avoid more problems of that kind in the future)

@Rakambda Rakambda self-requested a review August 27, 2022 19:34
@Rakambda
Copy link
Member

The DB part seems to work fine. Probably will merge it in not too long. Will have to see if this is effective or not but it isn't actually that important, in the end it'll be people's choice to use it or not.

Someday I may refactor tables to merge the Prediction & ResolvedPrediction as they basically hold similar info (could add a "placed" & "placed-badge" columns for our own prediction and we could find back the same info as in the Prediction table). But that's for another time and doesn't really is in the scope of that PR.

var outcomeStatistics = database.getOutcomeStatisticsForChannel(bettingPrediction.getEvent().getChannelId(), minTotalBetsPlacedByUser);

var mostTrusted = outcomeStatistics.stream()
.max(Comparator.comparingDouble(OutcomeStatistic::getAverageReturnOnInvestment))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be interesting to have the comparator configurable. Like there could be a field StatisticComparator comparator that the user can set through the config and would be passed to the comparator.

@RequiredArgsConstructor
@Getter
enum StatisticComparator {
    ROI(Comparator.comparingDouble(OutcomeStatistic::getAverageReturnOnInvestment)),
    WIN(Comparator.comparingDouble(OutcomeStatistic::getWinRate));

    private final Comparator<OutcomeStatistic> comparator;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started out with the idea to compare the win rate. The reason why I dropped it is that what matters isn't the win rate, but how many points were earned. For example, if a user has the habit to bet on an outcome that has a 90% win chance but only a return ratio of 1.05. It's mathematically better to go with the user who bets on the outcome with 10% win chance and a return ratio of 20.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still having the choice is a good option I think. Can be set to ROI by default.

Maybe I'm wrong but it feels like it could be abused somehow. Especially because a bet itself is not considered worthy or not. By that I mean that if someone have a ROI of 10 because they bet 10 points or because they bet 50k is the same.
You can imagine that I'm an annoying user and will only bet on predictions that'll have a massive ROI (let's say 15), and bet only 10 points each time. I'm not risking much by doing that and overtime it could give me a somewhat nice ROI (if I win at least once every 15 bets, my ROI will slowly increase). However the win rate will be going towards 0. If the bot follows that user with predictions with higher amounts, like maybe 2% of your total amount, you may quickly drain your wallet as you'll lose a lot, then win big but on a lower base amount and repeat.

This scenario (having a user that is doing weird stuff) is quite unlikely I admit it, but I still prefer to give the choice to the user to find what fits their needs the best. Every channel is different and the people doing predictions too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, giving a choice is fine. Especially if the predictions are mostly 50/50.

I think I understand your concern about predictions with a huge difference in return ratio between outcomes. Although I don't think these are common enough to bother. And wouldn't it be best, in that case, to not bet at all? If you bet on the prediction with the higher return-ratio but low win-chance you run the risk of bankrupting yourself fast (but with the chance to win big), if you bet on the one with the lower return-ratio and higher win-chance you slowly but surely drain your points.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can make a configurable limit to the return ratio. E.g. Don't bet on outcomes that have a return ratio of > 10, even if trustworthy users bet on it.

var outcomes = bettingPrediction.getEvent().getOutcomes();
var title = bettingPrediction.getEvent().getTitle();

var outcomeStatistics = database.getOutcomeStatisticsForChannel(bettingPrediction.getEvent().getChannelId(), minTotalBetsPlacedByUser);
Copy link
Member

@Rakambda Rakambda Aug 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be nice to have a configuration to keep channels with only a winRate greater than what is defined.

For example if a channel has only 10 betters all the time but they always lose, I don't want to follow their choice.
(Could be set to something like 0.1 by default, like that only very bad betters are excluded even if they have a high ROI [they could have won once a super high ROI bet, like 100 or something, but then lost quite a bunch])


var outcomeStatistics = database.getOutcomeStatisticsForChannel(bettingPrediction.getEvent().getChannelId(), minTotalBetsPlacedByUser);

var mostTrusted = outcomeStatistics.stream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about keeping the n most trusted users and keeping the outcome chosen by the most of them ?
(Could be set to 1 by default to keep the same behavior as now)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, it calculates the average over all users, so a high n would result in the same behaviour as now if I understand correctly. Still, this is worth considering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed I forgot it was done in the SQL request itself.
But seems interesting to have it configurable yeah. Probably needs more thinking. Maybe the other confs (like min win Rate etc will be enough).

@Rakambda
Copy link
Member

I guess it'll be good for that PR. If we need to change the outcome picker we can do it in another PR.

That way data can start to be collected, and refractor changes will be available for other devs.

@Rakambda Rakambda merged commit 4e4c638 into RakambdaOrg:develop Aug 29, 2022
@Rakambda
Copy link
Member

Changes not resolved are listed in #260. I'm merging this now so that data can be gathered.

@Rakambda Rakambda changed the title Record chat members predictions + 'mostTrusted' outcome picker. New OutcomePicker based on other users and their prediction history Aug 29, 2022
@Rakambda
Copy link
Member

Rakambda commented Aug 30, 2022

@15sawyer You'll probably have issues with recent commit, I changed the DB schema to add a channelId in the PredictionUser table.

To me it makes more sense that each user have a ROI/WinRate that is defined per channel instead of globally. They could be great betters on a channel and also super bad betters on another one.


To update:

alter table PredictionUser
    add ChannelID VARCHAR(32) not null after Username;

drop index UsernameIdx on PredictionUser;

create index UsernameIdx
    on PredictionUser (Username, ChannelID);

alter table PredictionUser
    drop key Username;

alter table PredictionUser
    add constraint Username
        unique (Username, ChannelID);

alter table PredictionUser
    add constraint PredictionUser_Channel_ID_fk
        foreign key (ChannelID) references Channel (ID);

However the foreign key will fail and stats are like reset.

@15sawyer
Copy link
Contributor Author

@15sawyer You'll probably have issues with recent commit, I changed the DB schema to add a channelId in the PredictionUser table.

To me it makes more sense that each user have a ROI/WinRate that is defined per channel instead of globally. They could be great betters on a channel and also super bad betters on another one.

To update:

alter table PredictionUser
    add ChannelID VARCHAR(32) not null after Username;

drop index UsernameIdx on PredictionUser;

create index UsernameIdx
    on PredictionUser (Username, ChannelID);

alter table PredictionUser
    drop key Username;

alter table PredictionUser
    add constraint Username
        unique (Username, ChannelID);

alter table PredictionUser
    add constraint PredictionUser_Channel_ID_fk
        foreign key (ChannelID) references Channel (ID);

However the foreign key will fail and stats are like reset.

Makes sense, although I wonder if this is very common

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants