Support links behind auth #86

Closed
gjtorikian opened this issue Aug 12, 2014 · 20 comments · Fixed by #577

Comments

@gjtorikian
Owner

Occasionally, at GitHub, we'll link to sites within github.com that are behind auth. For example: [check out this discussion](https://www.github.com/github/secret-internal-repo/issues/23).

We've had to exclude these links by writing them out as HTML and adding data-proofer-ignore. Blah.

I think instead there should be a new config option hash that takes a domain as a key, and an OAuth token as the value, so that these sorts of links can be checked. For example, you'd pass in :domain_auth => { "github.com" => ENV['MACHINE_USER_TOKEN'] }. When HTML::Proofer hits a 404, it'd look the domain up, and try to use the provided token to recheck the link.

/cc @penibelst @parkr Y'all think this makes sense?

@parkr
Contributor

parkr commented Aug 12, 2014

To do this right, we might want to support more than just GitHub's token-based auth. But for an initial PoC, it's a great idea. Would you just add a header to Typhoeus?

@gjtorikian
Owner Author

> we might want to support more than just GitHub's token-based auth

For sure! I guess GitHub's system is the one I'm most familiar with. I have no idea what other links on other sites might be, but I'm happy to work out something more flexible.

> Would you just add a header to Typhoeus?

Ideally, yeah. A queue request abstraction was added recently, so in the event of a fail, we'd just hop back in line with the auth.

@parkr
Contributor

parkr commented Aug 12, 2014

Maybe

:domain_auth => { "github.com" => {
  :type => :header,
  :template => "Bearer %token%",
  :values => {
    :token => ENV['MACHINE_USER_TOKEN']
  }
} }

So then for basic auth:

:domain_auth => { "oldsite.com" => {
  :type => :basic,
  :values => {
    :username => ENV['MACHINE_USERNAME'],
    :password => ENV['MACHINE_PASSWORD']
  }
} }
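As a rough sketch of how the two entry shapes above might be turned into request headers (the `auth_headers` helper is hypothetical and was never part of HTML::Proofer; the `%name%` substitution rule is an assumption based on the template shown):

```ruby
# Hypothetical helper: converts one proposed :domain_auth entry into HTTP headers.
def auth_headers(entry)
  values = entry.fetch(:values)

  case entry.fetch(:type)
  when :header
    # Replace every %name% placeholder with the matching entry in :values.
    rendered = entry.fetch(:template).gsub(/%(\w+)%/) do
      values.fetch(Regexp.last_match(1).to_sym)
    end
    { "Authorization" => rendered }
  when :basic
    # HTTP basic auth: Base64 of "username:password" ("m0" = no line breaks).
    credentials = ["#{values.fetch(:username)}:#{values.fetch(:password)}"].pack("m0")
    { "Authorization" => "Basic #{credentials}" }
  else
    raise ArgumentError, "unsupported auth type: #{entry[:type].inspect}"
  end
end
```

The resulting hash could then be merged into the headers of a request for the matching domain only.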

@gjtorikian
Owner Author

Dang, that's swell.

@parkr
Contributor

parkr commented Apr 3, 2015

Could also use ERB... :)

@gjtorikian
Owner Author

What's that? I know you're not talking about Rails templates....

@parkr
Contributor

parkr commented Apr 4, 2015

ERB is in Ruby's standard library. I mean for the :template above. :)

:template => "Bearer <%= @token %>"
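A minimal sketch of rendering such a template with the standard library's ERB. The `AuthTemplate` class is a hypothetical wrapper used only to give the template an `@token` to read:

```ruby
require "erb"

# Hypothetical wrapper: holds @token so the ERB template can reference it.
class AuthTemplate
  def initialize(template, token)
    @template = ERB.new(template)
    @token = token
  end

  def render
    # Passing this object's binding exposes its instance variables to ERB.
    @template.result(binding)
  end
end

header = AuthTemplate.new("Bearer <%= @token %>", "abc123").render
```

Because ERB evaluates arbitrary Ruby, templates like this should only come from trusted configuration.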

@fulldecent
Collaborator

@parkr Is it sufficient to document this use case (maybe in a wiki page) and invite users to use the Ruby / manual Typhoeus configuration approach? This would allow us to close the issue without any code changes and also help people who want to do this.

@parkr
Contributor

parkr commented May 1, 2017

> document this use case (maybe in a wiki page) and invite users to use the Ruby / manual Typhoeus configuration approach

That seems like a good idea! I don't have commit rights to this repo, so I won't be able to merge your contribution. But I support the idea 😄

@fulldecent
Collaborator

Related issues: #427

@fulldecent
Collaborator

@parkr Thanks for the reply. We have gotten a wiki up and I created this page

https://github.com/gjtorikian/html-proofer/wiki/How-to-configure-Typhoeus

Would you be able to assist with adding a section for the Typhoeus customization discussed here?

@fulldecent
Collaborator

I added the existing examples here to https://github.com/gjtorikian/html-proofer/wiki/How-to-configure-Typhoeus

It is incomplete but should be useful if anyone wants to start there and learn more.

@asbjornu
Contributor

asbjornu commented Aug 25, 2020

There is one thing I don't feel is quite clear in the documentation. Is :domain_auth supposed to be nested beneath :typhoeus, or is it a top-level configuration key? Which of these is correct?

Nested

{
  :typhoeus => {
    :domain_auth => {
      "github.com" => {
        :type => :header, :template => "Bearer %token%", :values => {
          :token => ENV['MACHINE_USER_TOKEN']
        }
      }
    }
  }
}

Top-level

{
  :domain_auth => {
    "github.com" => {
      :type => :header, :template => "Bearer %token%", :values => {
        :token => ENV['MACHINE_USER_TOKEN']
      }
    }
  }
}

@asbjornu
Contributor

It seems that nesting :domain_auth underneath :typhoeus is not right, at least:

Ethon::Errors::InvalidOption:
  The option: domain_auth is invalid.
# /Users/bitbear/gems/gems/ethon-0.12.0/lib/ethon/easy.rb:238:in `block in set_attributes'
# /Users/bitbear/gems/gems/ethon-0.12.0/lib/ethon/easy.rb:235:in `each_pair'
# /Users/bitbear/gems/gems/ethon-0.12.0/lib/ethon/easy.rb:235:in `set_attributes'
# /Users/bitbear/gems/gems/ethon-0.12.0/lib/ethon/easy/http/actionable.rb:99:in `setup'
# /Users/bitbear/gems/gems/ethon-0.12.0/lib/ethon/easy/http/head.rb:17:in `setup'
# /Users/bitbear/gems/gems/ethon-0.12.0/lib/ethon/easy/http.rb:39:in `http_request'
# /Users/bitbear/gems/gems/typhoeus-1.4.0/lib/typhoeus/easy_factory.rb:81:in `get'
# /Users/bitbear/gems/gems/typhoeus-1.4.0/lib/typhoeus/hydra/addable.rb:19:in `add'
# /Users/bitbear/gems/gems/typhoeus-1.4.0/lib/typhoeus/hydra/memoizable.rb:41:in `add'
# /Users/bitbear/gems/gems/typhoeus-1.4.0/lib/typhoeus/hydra/cacheable.rb:10:in `add'
# /Users/bitbear/gems/gems/typhoeus-1.4.0/lib/typhoeus/hydra/block_connection.rb:30:in `add'
# /Users/bitbear/gems/gems/typhoeus-1.4.0/lib/typhoeus/hydra/stubbable.rb:23:in `add'
# /Users/bitbear/gems/gems/typhoeus-1.4.0/lib/typhoeus/hydra/before.rb:27:in `add'
# /Users/bitbear/gems/gems/typhoeus-1.4.0/lib/typhoeus/hydra/queueable.rb:77:in `dequeue_many'
# /Users/bitbear/gems/gems/typhoeus-1.4.0/lib/typhoeus/hydra/runnable.rb:14:in `run'
# /Users/bitbear/gems/gems/typhoeus-1.4.0/lib/typhoeus/hydra/memoizable.rb:51:in `run'
# /Users/bitbear/gems/gems/html-proofer-3.15.3/lib/html-proofer/url_validator.rb:106:in `external_link_checker'
# /Users/bitbear/gems/gems/html-proofer-3.15.3/lib/html-proofer/url_validator.rb:31:in `run'
# /Users/bitbear/gems/gems/html-proofer-3.15.3/lib/html-proofer/runner.rb:120:in `validate_urls'
# /Users/bitbear/gems/gems/html-proofer-3.15.3/lib/html-proofer/runner.rb:82:in `check_files'
# /Users/bitbear/gems/gems/html-proofer-3.15.3/lib/html-proofer/runner.rb:42:in `run'

However, moving :domain_auth to a top-level option key doesn't seem to work, as I still receive HTTP status code 429 on most requests towards github.com. Is there any logging I can turn on to see that the :domain_auth option is understood and that the requests are, indeed, authenticated?

@asbjornu
Contributor

This is weird. I can't find any mention of domain_auth in the HTMLProofer codebase. Is this actually implemented?

@gjtorikian
Owner Author

@asbjornu To be honest this issue is so old I had to investigate myself. domain_auth was never added; I believe it was just a proposal. Why it made it to the Wiki I also don't know. Sorry for the confusion.

For your specific case, it would seem that just providing the headers would work:

{
  :typhoeus => {
    :headers => {
      "Authorization" => "Bearer #{ENV['MACHINE_USER_TOKEN']}"
    }
  }
}

@asbjornu
Contributor

Thanks for clarifying and for the provided code example, @gjtorikian! The problem with adding a static Authorization header like that is that it will be transmitted with every HTTP request performed by Typhoeus, no matter to which host, which is a major security risk. That's why something like domain_auth is required in one way or another, so secrets aren't transmitted to the entire internet.

If it's possible to build something like that by extending HTMLProofer somehow, I wouldn't mind solving this myself, but I'm not familiar enough with HTMLProofer's codebase to know where to plug in such functionality. Could you please provide some pointers if this is possible to do (outside of HTMLProofer's codebase) at all?

@asbjornu
Contributor

asbjornu commented Aug 27, 2020

A simpler and more general-purpose design than the one proposed in #86 (comment) could perhaps be something like this:

runner = HTMLProofer.check_directory('./out')
runner.before_request do |request|
  # request is a Typhoeus::Request object
  if request.base_url == 'https://github.com'
    request.options[:headers]['Authorization'] = "Bearer #{ENV['MACHINE_USER_TOKEN']}"
  end
end
runner.run

Then the host validation is up to the user of HTMLProofer, providing more flexibility as well as providing less complexity to the HTMLProofer codebase. Thoughts?
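As a sketch of what that user-side host validation could look like: the `authorization_for` helper and the `TOKENS_BY_HOST` table are hypothetical, and the `"dummy-token"` fallback exists only so the snippet runs without the environment variable set. Matching on the exact parsed host, rather than on a URL prefix, keeps the token from being sent to look-alike domains such as `github.com.evil.example`:

```ruby
require "uri"

# Hypothetical allow-list mapping exact host names to tokens.
TOKENS_BY_HOST = {
  "github.com" => ENV.fetch("MACHINE_USER_TOKEN", "dummy-token")
}.freeze

# Returns an Authorization header value for the URL's host, or nil when the
# host is not on the allow-list or the URL cannot be parsed.
def authorization_for(url, tokens = TOKENS_BY_HOST)
  host = URI.parse(url).host&.downcase
  token = tokens[host]
  token && "Bearer #{token}"
rescue URI::InvalidURIError
  nil
end
```

Inside the proposed before_request block, the header would then be set only when `authorization_for(request.base_url)` returns a value.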

@gjtorikian
Owner Author

I think that’s pretty great, actually! If you have the time I’d definitely accept a PR. At the moment I’m wrapping up a master’s dissertation and couldn’t devote time to this until mid to late September.

@asbjornu
Contributor

asbjornu commented Aug 31, 2020

I submitted #577 now with an implementation of the HTMLProofer::Runner#before_request method.
