Issue in batsmanDismissals() #5

Open
npranav10 opened this issue May 6, 2019 · 7 comments

Comments

@npranav10

First of all, I would like to thank you for your dedication in building this package. Hats off!

Coming to the issues:
I have been trying to analyze Vijay Shankar's ODI dismissals (http://stats.espncricinfo.com/ci/engine/player/477021.html?class=2;filter=advanced;orderby=start;template=results;type=batting;view=innings) and came across the following two issues:

(I have traced the function)

  1. batsman <- clean(file)
    After this line executes, only one record remains in his data.
    image
    image

The same happens with Rishabh Pant:
image
image

  2. If I manually remove the above line (batsman <- clean(file)) and continue executing line by line, there appears to be an issue that could affect any player: the percentage for each dismissal type is calculated with the total number of innings played as the denominator, rather than the total number of times dismissed.

What I mean is that his stats should read:
Not out: 0%
Run out: 40%
Caught: 60%
See the screenshot:
image

as opposed to the current metrics, which display:
Not out: 44%
Run out: 22%
Caught: 33%
image
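
To make the denominator point concrete, here is a minimal sketch in base R. The innings counts (4 not out, 2 run out, 3 caught, i.e. 9 innings and 5 dismissals) are an assumption inferred from the percentages quoted above, not data taken from the package, and the variable names are hypothetical.

```r
# Hypothetical dismissal column, chosen to be consistent with the
# percentages quoted above (assumed: 9 innings, 5 of them dismissals).
dismissal <- c(rep("not out", 4), rep("run out", 2), rep("caught", 3))

# Denominator = all innings (the behaviour described as the current output).
pct_all <- round(100 * prop.table(table(dismissal)))
print(pct_all)

# Denominator = dismissals only: drop the not-out innings first.
dismissed <- dismissal[dismissal != "not out"]
pct_dismissed <- round(100 * prop.table(table(dismissed)))
print(pct_dismissed)
```

With these assumed counts, the first table gives 33/44/22 for caught/not out/run out, and the second gives 60/40 for caught/run out.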

@tvganesh
Owner

tvganesh commented May 6, 2019

Will check. Not out shows up as a '*', which I remove. Yes, transformations have to be done. Will look into it. I am currently caught up in a couple of things. We cannot remove clean(file); we have to make it work with that.

Ganesh

@npranav10
Author

Sure, Mr. Ganesh. I will keep an eye on this page.

Pranav

@tvganesh
Owner

tvganesh commented May 6, 2019

Looking at the data, I see that the rows which have been removed have Mins as '-'. This is NA, which R removes in clean(file). Did you check why rows 6, 7, 8 and 9 for Vijay have Mins as '-'?

@npranav10
Author

npranav10 commented May 6, 2019

ESPNCricinfo (Match Scorecard) doesn't have the minutes-played statistics for India's home series vs Australia.

@npranav10
Author

npranav10 commented May 6, 2019

> Looking at the data I see rows which have been removed have Mins as '-'. This is NA which R removes in clean(file). Did you check why rows 6,7,8,9 for Vijay has Mins as '-'?

I can confirm that the issue exists only for players who have "-" in the "Mins" column for innings in which they batted.

image

image

Can't the clean function be run without considering the Mins column?
For example, by replacing the first line of the clean function with:
df <- read.csv(file, stringsAsFactors = FALSE)
df <- df[-3]  # drop the third column (Mins)
This works fine for me.
image
But I still have to check whether this holds for the other batsman functions too.
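
As a rough sketch of the same idea that selects the column by name rather than by position, something like the following might work. The function name cleanIgnoringMins, the use of na.strings and complete.cases, and the sample rows are all assumptions made for illustration. This is not the package's clean() function and does not reproduce whatever else clean() does.

```r
# Hypothetical helper, not part of cricketr: drop incomplete rows, but
# ignore the Mins column when deciding which rows are incomplete.
cleanIgnoringMins <- function(file) {
  # Read "-" as a missing value (assumption about how the data is encoded).
  df <- read.csv(file, stringsAsFactors = FALSE, na.strings = c("-", "NA"))
  # Check completeness on every column except Mins.
  keep <- complete.cases(df[, names(df) != "Mins", drop = FALSE])
  df[keep, ]
}

# Made-up sample data: the second row lacks Mins only, the third is empty.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Runs,Mins,BF,Dismissal",
  "45,67,41,caught",
  "46,-,41,run out",
  "-,-,-,-"
), tmp)

cleaned <- cleanIgnoringMins(tmp)
print(cleaned)
```

Here the second row survives even though its Mins value is missing, while the fully empty third row is still dropped.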

@tvganesh
Owner

tvganesh commented May 6, 2019

You can write your own function that looks only at the dismissals column, without the clean function. I may not add this to the package, as this is an issue with the data. I cannot keep the package generic if I make changes that are specific to particular cases.

@npranav10
Author

What I actually thought was: given that the "Mins" data is inconsistent in ESPNCricinfo Statsguru, why do we need to consider "Mins" at all? Why don't we drop it for all functions? Then it becomes easier to fill in NAs for all the other "-"s.
