Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove IP-based location determination? #115

Open
slager opened this issue Mar 8, 2024 · 17 comments · May be fixed by #139
Open

Remove IP-based location determination? #115

slager opened this issue Mar 8, 2024 · 17 comments · May be fixed by #139

Comments

@slager
Copy link
Collaborator

slager commented Mar 8, 2024

I'm wondering if we should just remove the IP-based location determination. It's neat, but it creates potential privacy issues by exposing IP address and coordinates in the VCR fixtures. Also my experimenting with it shows that the location can be quite far off from my actual location.

@sebpardo
Copy link
Contributor

Is there a way to hide IP addresses and coordinates in VCR the same way we hide the API key?

@jrdnbradford
Copy link
Contributor

It looks like you can filter data via regex in {vcr}, so something like this would remove coordinates from URLs and JSON strings in the fixtures:

library("vcr") # *Required* as vcr is set up on loading
invisible(vcr::vcr_configure(
  filter_sensitive_data = list("<<<redacted>>>" = Sys.getenv('EBIRD_KEY')),
+ filter_sensitive_data_regex = list(
+   '"lat":<<<redacted>>>' = '("lat"|"latitude"):(-?\\d+(\\.\\d+)?)',
+   "lat=<<<redacted>>>" = '(lat|latitude)=(-?\\d+(\\.\\d+)?)',
+   '"lng":<<<redacted>>>' = '("lng"|:"longitude"):(-?\\d+(\\.\\d+)?)',
+   "lng=<<<redacted>>>" = '(lng|longitude)=(-?\\d+(\\.\\d+)?)'
+ ),
  dir = vcr::vcr_test_path("fixtures")
))

If redaction is the route you want to go I'm happy to take a look.

@slager
Copy link
Collaborator Author

slager commented Jul 15, 2024

I tried at one point using vcr/regex to redact latitude, longitude, and IP address, but iirc the tricky part is that they occur inside a long string value. If you can get it to work, go for it.

@sebpardo sebpardo mentioned this issue Aug 27, 2024
19 tasks
@jrdnbradford
Copy link
Contributor

I tried a few things but every solution I tried that redacted the relevant data ended up causing issues with the vcr testing setup. 🫠

@sckott
Copy link
Contributor

sckott commented Oct 3, 2024

👋🏽 hi, vcr maintainer here. Is there a small reproducible example I can play with to understand the problem and see what the best solution is?

@RichardLitt
Copy link
Contributor

I am not an expert here.

Running testthat::test_file("tests/testthat/test-ebirdgeo.R", reporter="summary"), I expected vcr to make a fixture in the line where I wrote expect_no_error. It didn't and I'm not sure why. But this should call getlatlng, which should look for my IP address and use that to get a location - which is what we're trying to avoid putting into the fixtures.

vcr::use_cassette("ebirdgeo", {
  test_that("ebirdgeo works correctly", {
    expect_warning(egeo <- ebirdgeo('amegfi', 42.45, -76.50, dist = 51, max = 2),
                   "Distance supplied was >50km")
    # Added this line
    expect_no_error(ebirdgeo('amegfi'))
    expect_true(inherits(egeo, "data.frame"))
    expect_gt(NCOL(egeo), 10)
    expect_equal(nrow(egeo), 2)
    expect_true(inherits(egeo$locName, "character"))
    expect_true(inherits(egeo$howMany, "integer"))
    expect_warning(
      prov <- ebirdgeo(
        lat = 42.45, lng = -76.50, max=2,
        provisional=TRUE, hotspot=TRUE, back = 31.5),
      "using 30 days"
    )
    expect_true(inherits(prov, "data.frame"))
  })
})

@slager
Copy link
Collaborator Author

slager commented Oct 3, 2024

For a reprex of the main problem here as I understand it, clone the vcr_demo branch of this repo and run devtools::test(filter = 'DEMO') to create the DEMO.yml fixture. The goal is for string in this fixture to have the IP address redacted.

@RichardLitt
Copy link
Contributor

Running this:

  filter_sensitive_data_regex = list(": \"<<<redacted>>>\"," = ': ".*",'),

I managed to get this:

http_interactions:
- request:
    method: get
    uri: http://ipinfo.io/
    body:
      encoding: ''
      string: ''
    headers:
      Accept: application/json, text/xml, application/xml, */*
  response:
    status:
      status_code: 200
      category: Success
      reason: OK
      message: 'Success: (200) OK'
    headers:
      access-control-allow-origin: '*'
      content-encoding: gzip
      content-type: application/json; charset=utf-8
      date: Thu, 03 Oct 2024 08:50:13 GMT
      referrer-policy: strict-origin-when-cross-origin
      vary: Accept-Encoding
      x-content-type-options: nosniff
      x-frame-options: SAMEORIGIN
      x-xss-protection: 1; mode=block
      content-length: '236'
      via: 1.1 google
      strict-transport-security: max-age=2592000; includeSubDomains
    body:
      encoding: ''
      file: no
      string: |-
        {
          "ip": "<<<redacted>>>",
          "readme": "https://ipinfo.io/missingauth"
        }
  recorded_at: 2024-10-03 08:50:13 GMT
  recorded_with: vcr/1.6.0, webmockr/0.9.0

I think this is what we want.

@RichardLitt
Copy link
Contributor

Having done that:

I'm wondering if we should just remove the IP-based location determination. It's neat, but it creates potential privacy issues by exposing IP address and coordinates in the VCR fixtures. Also my experimenting with it shows that the location can be quite far off from my actual location.

I agree with removing it. It was quite far off for me - and this is a wrapper for the eBird API, which doesn't automatically look for a location based on IP. I think removing it and throwing a failure if latlng isn't included would be fine.

@slager
Copy link
Collaborator Author

slager commented Oct 3, 2024

That regex does remove the IP, but unfortunately it also removes the lat-long and all the other fields except the last one.

@RichardLitt
Copy link
Contributor

Right.

I don't understand why the .yml file outputs it as a pretty-printed JSON format, but the vcr regex treats it as a single string. I feel like ".*" is grabbing everything from first to last quotes, instead of stopping at the newlines that I see in the .yml file.

Could there there a smarter way for us to format the string as JSON, and then selectively keep only the fields we want, instead of relying on regex?

@RichardLitt
Copy link
Contributor

Taking a crack at this again. I can't figure out how to use escape characters in the regex, or backreferences as might be possible with gsub().

For instance, for [0-9\.\-,], I get this error:

Error: '\.' is an unrecognized escape in character string (/Users/richard/src/rebird/tests/testthat/helper-vcr.R:4:77)

Basically, filter_sensitive_data_regex only provides a subset of possible regexes, which makes doing this work quite difficult.

@slager
Copy link
Collaborator Author

slager commented Oct 4, 2024

Periods need to be doubly escaped like \\. See ?regex

@RichardLitt
Copy link
Contributor

Thank you. I think I got it - take a look at #138.

@sckott
Copy link
Contributor

sckott commented Oct 4, 2024

@RichardLitt

I don't understand why the .yml file outputs it as a pretty-printed JSON format, but the vcr regex treats it as a single string. I feel like ".*" is grabbing everything from first to last quotes, instead of stopping at the newlines that I see in the .yml file.

in vcr we just read the whole file in with the yaml package, and then I can index to different parts of the request or response. the newlines are not meagningful, i.e., I would not write code accounting for them

@sckott
Copy link
Contributor

sckott commented Oct 4, 2024

Looks like you got it working in #138 @RichardLitt - let me know if you have any further questions.

@RichardLitt RichardLitt linked a pull request Oct 10, 2024 that will close this issue
@sebpardo
Copy link
Contributor

@slager are you OK with the solution being put forward in #139?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants