Access Control (Exclusion) System #7
I have a few questions regarding the exclusion system requirement (for @anjackson):
The last question deserves a bit more detail. We have three deployments:
I imagine we will need three different pywb collections, in three different deployments. Important: We generally expect to run all such services behind NGINX or Apache proxies. In the case of QA Wayback, it's actually proxied through a 'parent' app that performs authentication/authorisation. I've assumed this won't be a problem, but I shouldn't take that for granted.
Thanks for the clarifications. It seems the best option is to configure these as three collections for integration testing, using the above options:
To clarify, do the allow/disallow list access checks work in this order?
Does disallow always take precedence, or does the most specific match take precedence?
For maximum flexibility, I am considering the following approach: use a CDXJ-like format for ACL rules, keyed by SURT but reverse-sorted to facilitate longest-prefix matching. Each rule would have one of the following settings:
An example rule set:
ACL rules can be stored in … There will also be a default rule, which can be configured to …
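The longest-prefix lookup over reverse-sorted SURT keys might look roughly like the sketch below. The line layout (`<surt-key> <field> <json>`), the `access` setting name, the sample rules, and every function name here are assumptions made for illustration, not the format or API that was actually implemented.

```python
import json


def parse_rule(line):
    """Split an assumed '<surt-key> <field> <json>' line into (key, settings)."""
    key, _field, payload = line.split(" ", 2)
    return key, json.loads(payload)


def load_rules(lines):
    """Parse rule lines and keep them in reverse (descending) key order."""
    rules = [parse_rule(line) for line in lines if line.strip()]
    rules.sort(key=lambda rule: rule[0], reverse=True)
    return rules


def match_access(rules, surt, default="allow"):
    """Return the access setting of the longest rule key that prefixes `surt`.

    Every key that is a prefix of `surt` is also a prefix of any longer
    such key, so in descending order the longest one is reached first.
    """
    for key, settings in rules:
        if surt.startswith(key):
            return settings["access"]
    return default  # hypothetical default rule


# Hypothetical sample rules, deliberately given out of order.
RULES = load_rules([
    'com,example)/ - {"access": "allow"}',
    'com,example)/private/secret - {"access": "exclude"}',
    'com,example)/private - {"access": "block"}',
])
```

A linear scan is enough to show the idea; with many rules, a binary search over the sorted keys would be the natural next step.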
The command-line Add a rule: Remove rule: Return matching rule: Import OpenWayback-style non-SURT prefix-based rules (e.g. excludes.txt):
…ywb#7)
- .aclj files contain access controls in reverse sorted, CDXJ-like format
- ./sample_archive/acl contains sample acl files
- directory and single-file acl sources (extend directory aggregator and file index source)
- tests for longest-prefix acl match
- tests for acl applied to collection
- pywb.utils.merge -- merge(..., reverse=True) support for py2.7 (backported from py3.5)
- acl types:
  * allow - all allowed
  * block - allowed in index (as blocked) but content not allowed, served as 451
  * exclude - removed from index and content, served as 404
- warcserver: AccessChecker inited if 'acl_paths' specified in custom collections
- exceptions:
  * clean up wbexception, subclasses provide the status code, message loaded automatically
  * warcserver handles AccessException with json response (now with 451 status)
  * pass status to template to allow custom handling
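As a rough illustration of the three acl types listed in this commit, the sketch below maps each one to its index visibility and the status code served for content. The `Access` enum and both helper functions are hypothetical names, not part of pywb; only the 451/404 behaviour comes from the commit message.

```python
from enum import Enum


class Access(Enum):
    """The three acl types named in the commit message."""
    ALLOW = "allow"
    BLOCK = "block"
    EXCLUDE = "exclude"


def visible_in_index(access):
    """Blocked captures stay in the index; excluded ones are removed."""
    return access is not Access.EXCLUDE


def content_status(access, found_status=200):
    """Status served for content: 451 if blocked, 404 if excluded."""
    if access is Access.BLOCK:
        return 451
    if access is Access.EXCLUDE:
        return 404
    return found_status  # allowed: pass through whatever replay returns
```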
- 'acl_paths' config can accept a list of files or directories, a file or a directory string
- tests_acl: test collection with acl list, single file, dir
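One way to handle an 'acl_paths' value that may be a single string or a list, each entry naming a file or a directory, is to normalise it into a flat list of files first. This is only a sketch: `expand_acl_paths` is a hypothetical helper, and picking up only `*.aclj` files from a directory is an assumption.

```python
import os


def expand_acl_paths(acl_paths):
    """Normalise an 'acl_paths' value into a flat list of acl files.

    Accepts a single path string or a list of paths. Directories are
    expanded to the .aclj files they contain (assumed convention);
    anything else is treated as a single acl file.
    """
    if isinstance(acl_paths, str):
        acl_paths = [acl_paths]

    files = []
    for path in acl_paths:
        if os.path.isdir(path):
            for name in sorted(os.listdir(path)):
                if name.endswith(".aclj"):
                    files.append(os.path.join(path, name))
        else:
            files.append(path)
    return files
```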
…lections as specified in #7
add access.robot for testing exclude, block rules (blacklist) and allow rules and default block (whitelist)
- reading-rooms has single-use-lock, blacklist
- open-access has whitelist and blacklist
- qa-access has no access controls
test-data: add httpbin.org warc/cdx for access system tests
robot script improvements: use shared init & teardown scripts, parametrize collection name, add reusable check exclude, check blocked, check allowed functions
Thanks @ikreymer, this looks good.
… files via command-line (ukwa/ukwa-pywb#7)
- support as target an auto-collection, where acl file added automatically in ./collections/<coll>/acl/access-rules.aclj or specifying an .aclj explicitly for more custom configs
- support adding urls and surts, determine if url is already a surt, otherwise canonicalize
acl commands include:
- acl add <target_file_or_coll> <url_or_surt> <access> -- add (or replace) rule for url/surt with access level <access>
- acl remove <target_file_or_coll> <url_or_surt> -- remove url/surt from target
- acl list <target_file_or_coll> -- list all rules for target
- acl validate <target_file_or_coll> -- ensure sort order is correct, otherwise fix and save
- acl match <target_file_or_coll> <url> -- find matching rule, if any, in target for specified url, or print no match/default rule
- acl importtxt <target_file_or_coll> <filename> -- bulk import of 'excludes.txt' style rules, one url-per-line and add to target
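The add, remove and validate operations described in this commit could be sketched on an in-memory rule list as follows. Rules are modelled here as `(surt_key, access)` tuples, and all function names are hypothetical; the real command-line tool works on .aclj files and also canonicalizes URLs to SURTs, which this sketch leaves out.

```python
def is_reverse_sorted(rules):
    """True if rule keys are already in descending order."""
    keys = [key for key, _access in rules]
    return all(a >= b for a, b in zip(keys, keys[1:]))


def validate(rules):
    """Return rules in reverse-sorted order, re-sorting only if needed."""
    if is_reverse_sorted(rules):
        return list(rules)
    return sorted(rules, key=lambda rule: rule[0], reverse=True)


def add_rule(rules, key, access):
    """Add a rule for `key`, replacing any existing rule with the same key."""
    kept = [rule for rule in rules if rule[0] != key]
    kept.append((key, access))
    return validate(kept)


def remove_rule(rules, key):
    """Drop the rule for `key`, if present; order is preserved."""
    return [rule for rule in rules if rule[0] != key]
```

Each function returns a new list rather than mutating its input, which keeps the "fix and save" step of validate explicit for the caller.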
acl: update acls with cli tool, move block to correct file, include original url in json portion (#7)
Added initial support for a CLI command for operating on individual .aclj files. For example, to list all rules in a file: To add a new rule: It also supports adding SURTs as well as URLs: and removing SURTs/URLs:
…ywb#7)
- add, importtxt will create an access file if it doesn't exist
- return status code 1 on errors, including if file doesn't exist (for other commands)
And also, an example command for importing OpenWayback-style exclusions (one URL per line):
We are working towards a more flexible and efficient approach for archive profiling that contains aspects of ACLs as well. We were thinking along the lines of a similar CDXJ-style format, but more flexible than what is illustrated above. We have included the idea of wildcards in partial SURT keys to distinguish prefix matches from exact matches. This makes it easier to describe scenarios such as having one rule for a domain (or a path at a certain depth) but other rules for other resources under that path.
Support a per-url exclusion system for pywb, including the following modes:
message if content is outside the white list.