Configuration
Raja Tomar edited this page Jun 7, 2019 · 2 revisions
`pywebcopy` is highly configurable. You can set up the global configuration using the methods exposed by the `pywebcopy.config` object.

There are two ways to change the global configuration:
- Using the `.setup_config` method on the global `pywebcopy.config` object

  You can manually set every configuration value with a `.setup_config` call.

  ```python
  >>> import pywebcopy
  >>> url = 'http://example-site.com/index.html'
  >>> download_loc = 'path/to/downloads/'
  >>> project = 'my_project'
  >>> kwargs = {}  # any extra config keys to override
  >>> pywebcopy.config.setup_config(url, download_loc, project, **kwargs)

  # done!
  # Now check
  >>> pywebcopy.config.get('project_url')
  'http://example-site.com/index.html'
  >>> pywebcopy.config.get('project_folder')
  'path/to/downloads'
  >>> pywebcopy.config.get('project_name')
  'example-site.com'

  ## You can also change any config even after
  ## the `setup_config` call
  >>> pywebcopy.config['url'] = 'http://url-changed.com'
  # rest of config remains unchanged
  ```

  Done!
- Passing the config variables directly to the global APIs, e.g. `.save_webpage`

  To change any configuration, just pass it to the API call. Example:

  ```python
  from pywebcopy import save_webpage

  kwargs = {
      'project_url': 'http://google.com',
      'project_folder': '/home/pages/',
      'project_name': ...
  }
  save_webpage(**kwargs)
  ```
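The second approach can be wrapped in a small helper so that the settings for a run live in one place. The following is only a sketch: `build_save_kwargs` and `mirror_page` are hypothetical names introduced here, and the code assumes that `save_webpage` accepts the `project_url`, `project_folder` and `project_name` keys exactly as in the example above. Any extra keys are passed through unchecked, so they must match what your installed pywebcopy version expects.

```python
def build_save_kwargs(url, folder, name=None, **overrides):
    """Return a kwargs dict for save_webpage (hypothetical helper).

    The three base keys mirror the example above. Anything in
    ``overrides`` is passed through untouched.
    """
    kwargs = {"project_url": url, "project_folder": folder}
    if name is not None:
        # Only set the project name when the caller supplies one.
        kwargs["project_name"] = name
    kwargs.update(overrides)
    return kwargs


def mirror_page(url, folder, name=None, **overrides):
    """Pass the collected settings straight to the API call."""
    # Imported lazily so build_save_kwargs works without pywebcopy installed.
    from pywebcopy import save_webpage

    save_webpage(**build_save_kwargs(url, folder, name, **overrides))
```

Calling `mirror_page('http://example-site.com/index.html', 'path/to/downloads/', 'my_project')` would then be equivalent to building the `kwargs` dict by hand and calling `save_webpage(**kwargs)`.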
Below is the list of config keys with their default values:
```python
# write the trace output and log file content directly to the console
'DEBUG': False

# make a zip archive of the downloaded content
'zip_project_folder': True

# delete the project folder after making a zip archive of it
'delete_project_folder': False

# whether to download CSS files
'LOAD_CSS': True

# whether to download images
'LOAD_IMAGES': True

# whether to download JavaScript files
'LOAD_JAVASCRIPT': True

# whether to overwrite existing files if found
'OVER_WRITE': False

# list of allowed file extensions
# shortened for readability
'ALLOWED_FILE_EXT': ['.html', '.css', ...]

# log file path
'LOG_FILE': None

# name of the mirror project
'PROJECT_NAME': 'website-name.com'

# base directory in which all copied site data is stored
'PROJECT_FOLDER': None

# DANGER ZONE
# CHANGE THESE ON YOUR RESPONSIBILITY
# NOTE: Do not change unless you know what you're doing

# HTTP headers sent with requests made to the server
'http_headers': {...}

# bypass the robots.txt restrictions
'BYPASS_ROBOTS': False
```
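One way to use this list is to keep a local copy of the defaults and apply overrides on top of it, so that a mistyped key fails early instead of being ignored. The sketch below is plain Python and does not call pywebcopy. `DOCUMENTED_DEFAULTS` and `with_overrides` are hypothetical names, the key spelling and casing are copied from the list above and may differ from what your installed version accepts, and the entries whose values are elided or placeholders (`ALLOWED_FILE_EXT`, `http_headers`, `PROJECT_NAME`) are deliberately left out.

```python
# Defaults copied from the list above (assumption: casing as documented).
DOCUMENTED_DEFAULTS = {
    "DEBUG": False,
    "zip_project_folder": True,
    "delete_project_folder": False,
    "LOAD_CSS": True,
    "LOAD_IMAGES": True,
    "LOAD_JAVASCRIPT": True,
    "OVER_WRITE": False,
    "LOG_FILE": None,
    "PROJECT_FOLDER": None,
    "BYPASS_ROBOTS": False,
}


def with_overrides(**overrides):
    """Return a copy of the documented defaults with overrides applied.

    Raises KeyError for a key that is not in DOCUMENTED_DEFAULTS, which
    catches typos such as LOAD_IMAGE before any download starts.
    """
    unknown = sorted(set(overrides) - set(DOCUMENTED_DEFAULTS))
    if unknown:
        raise KeyError(f"unknown config key(s): {', '.join(unknown)}")
    merged = dict(DOCUMENTED_DEFAULTS)  # copy, so the defaults stay intact
    merged.update(overrides)
    return merged


# A text-only mirror: skip images and JavaScript, keep CSS.
text_only = with_overrides(LOAD_IMAGES=False, LOAD_JAVASCRIPT=False)
```

The resulting dict can then be passed on as keyword arguments, in the same way as the `kwargs` dict in the `save_webpage` example.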