Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stylistic and code consistency rules #309

Open
25 tasks
malcolmbarrett opened this issue Jan 4, 2025 · 10 comments
Open
25 tasks

Stylistic and code consistency rules #309

malcolmbarrett opened this issue Jan 4, 2025 · 10 comments

Comments

@malcolmbarrett
Copy link
Collaborator

malcolmbarrett commented Jan 4, 2025

This is a sort of internal style guide for various decisions we need to make. All of these are open to discussion, but I want to be consistent for all of them. This issue also serves as a final check when the first edition is done.

Code

  • Use grkstyle::grk_style_dir() to style code
  • Use tibble() instead of data.frame()
  • Use summarize() instead of summarise()
  • Use if_else() instead of ifelse()
  • When grouping by a single variable, use group_by() over .by
  • When grouping by more than one variable when you don't need persistent grouping, use .by or group_by() with .groups = "drop" instead of ungroup(). Prefer .by unless group_by() is more readable for the code, e.g., a very long or complex summarize() statement
  • augment(data = ...) vs. augment(newdata = ...): For cases when we are supplying the original data frame (which we need to do to keep the variables that weren't in the model), use the data argument of augment() instead of newdata. newdata should only be used when it's actually a new data frame, as we do in g-computation cloning.
  • Use |> instead of %>%
  • Use slice_sample() instead of sample_n(), sample_frac() (because it supercedes these per the documentation)
  • For comments, use all lowercase (except proper nouns) and one # at beginning
  • Use explicit testing of 0 and 1, e.g. if x = 1, write if (x == 1) rather than if (x)
  • For case_when(), use .default = default_value rather than TRUE ~ default_value

Quarto

  • Prefer cross-references over writing that we discussed/will discuss something somewhere else in the book
  • For code we are not showing, prefer code-fold: true over echo: false. This way, we can de-emphasize the code while providing easy access for those interested
  • Use sentence breaks (a line break after every sentence) for easier git usage. You can do this automatically by changing your settings in RStudio to use sentence breaks and turning visual mode on and off.
  • For package names, write it as `{pkg}` (which will automatically link to the package website) the first time you mention it in a chapter; thereafter, don't use formatting. Just write: pkg.

Writing

  • Check that anytime we type "casual" we mean it. Common typo!
  • Defined terms are in bold; italics used for emphasis, not definitions.
  • Prefer "exposure" and "outcome" as generic terms for the cause and effect over e.g., treated vs untreated, but use your judgment
  • Write it as "Extra Magic Hours" and "Extra Magic Morning/Evening" (all caps)
  • Write times as 9 AM instead of 9am
  • "data frame" and "data set" (with a space between)

Figures and tables

  • All figures and tables have a thorough caption (what is actually in the fig/tbl, not just the idea behind it)
  • Use gt() for code-based tables instead of kable(). Otherwise, use markdown tables.
  • No time labels for EMM DAG since we don't actually know. Putting this here because we reuse this code in several places so we should check for consistency

I think we may have some other things to add for this one to make sure these are consistent

@malcolmbarrett
Copy link
Collaborator Author

One thing from #310 that might be good: prefer ifelse()/if_else() over case_when() when just a single condition

The big one in this PR, though, is ifelse() vs. if_else(). I hold this opinion lightly, but I have a preference for ifelse() because I don't believe if_else() is that helpful in practice. As opposed to say, map_dbl(), it doesn't really feel like I'm declaring the type. I have to know the type of the inputs. And I also still need to know R's conversion rules to understand the violation. But if_else() is sort of a lonely function in this respect because lots of functions can do implicit conversion besides ifelse(), so I'm usually already on the defensive. So if_else() is a function that has never won me over. But again, I can be convinced and am more concerned about consistency

@tgerke
Copy link
Collaborator

tgerke commented Jan 8, 2025

Suggestion to use if_else instead of ifelse. Reasons:

  • if_else ensures type consistency
  • if_else handles NA better
  • if_else is faster
    To my knowledge, reasons to use ifelse would include shortcutting TRUE/FALSE arguments with integers (which I don't think is the best pedagogy) and a desire to avoid evaluating the false clause when the statement is true (also not the best pedagogy / has a code smell). ifelse could also be good if aiming for minimal dependency applications, but we've already bought into library(tidyverse) as a first move in this book. Happy to learn if there are better reasons though.

Some good discussion here: https://stackoverflow.com/questions/50646133/dplyr-if-else-vs-base-r-ifelse

@malcolmbarrett
Copy link
Collaborator Author

malcolmbarrett commented Jan 8, 2025

shortcutting TRUE/FALSE arguments with integers (which I don't think is the best pedagogy)

I agree with this and we should add explicit testing to our list (as you changed in #310)

Edit: added to the checklist

@malcolmbarrett
Copy link
Collaborator Author

@LucyMcGowan can be the tie-breaker for ifelse() vs. if_else(). I'll be ok with either

@tgerke
Copy link
Collaborator

tgerke commented Jan 8, 2025

One thing from #310 that might be good: prefer ifelse()/if_else() over case_when() when just a single condition

The big one in this PR, though, is ifelse() vs. if_else(). I hold this opinion lightly, but I have a preference for ifelse() because I don't believe if_else() is that helpful in practice. As opposed to say, map_dbl(), it doesn't really feel like I'm declaring the type. I have to know the type of the inputs. And I also still need to know R's conversion rules to understand the violation. But if_else() is sort of a lonely function in this respect because lots of functions can do implicit conversion besides ifelse(), so I'm usually already on the defensive. So if_else() is a function that has never won me over. But again, I can be convinced and am more concerned about consistency

Afaik it's not relevant to applications we're using in this book, but ifelse can have some pretty severe gotchas that I've experienced due to hidden type casting. That SO thread I linked has a good example using dates, and there are others.

Reading from the beginning, this is the first time I've encountered ifelse in the book and I would surely notice it in the 6 other files where it occurs (did a quick search to make sure I'm not proposing a drastic PITA)

@tgerke
Copy link
Collaborator

tgerke commented Jan 8, 2025

@LucyMcGowan can be the tie-breaker for ifelse() vs. if_else(). I'll be ok with either

Great plan!

@tgerke
Copy link
Collaborator

tgerke commented Jan 8, 2025

New topic: suggest replacing TRUE ~ lines in case_when with the much clearer .default =

@malcolmbarrett
Copy link
Collaborator Author

malcolmbarrett commented Jan 8, 2025

ifelse() might be the easiest find-replace ever luckily. Identical syntax and no other use of the letters "ifelse"

@malcolmbarrett
Copy link
Collaborator Author

@LucyMcGowan voted for if_else(), adding it to the checklist now and will submit a PR

@malcolmbarrett
Copy link
Collaborator Author

Note: https://github.com/ropenscilabs/aeolus will auto wrap markdown lines

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants