- changed `before` and `after` to `until` and `since`
- removed `metadata=true` as this is now always enabled
- set `order='desc'` as this replaces `sort`
- set `sort='created_utc'` so that slicing still works as expected
- Read more on COLO switchover
- refactored metadata usage
- 🎅🎅🎅🎅🎅🎅🎅🎅🎅🎅🎅🎅🎅🎅
- Don't inherit from object in classes
- Removed logging configuration to prevent unexpected results for users
- fix scenario where a result is reported but cannot be returned by Pushshift
- fix index error bug
- Updated logging and set default log level to INFO
- Added `load_cache` static method to `Response` to load cached responses using a cache key
- Added support for enriching result metadata using PRAW
- Implemented functional tests
- Reduced `max_ids_per_request` to 500
- Added automated testing
- Increased exception handling specificity
- Added `filter_fn` for custom filtering
- Added gzip for cached pickle files
- Exception handling is now slightly more specific
- Updated many print statements to output via logging
- Fixed issue with `safe_exit` not saving info
- Moved remaining limit logging to DEBUG from INFO
- Fixed incorrect generator length after partial iteration
- Reduced the number of debug logs
- Fixed duplicate responses being returned if the number of responses for a provided window is less than expected
- None type comparison bug fixed
- updated how limit was being updated for submission comment ids
- fixed early cache bug
- fixed limit being retrieved from next search window when resuming from safe exit
- fixed comment searches returning 25 results by default
- hot fix for a limit error in `trim`
- `search` methods now return a `Response` generator object
- memory safety can now be enabled with `mem_safe` to cache responses during data retrieval and reduce the amount of memory used
- safe exiting can now be enabled with `safe_exit` to safely exit when an interrupt signal is received during data retrieval
- unfinished requests and saved responses are loaded from `cache` when safe exiting is enabled
- request details are now handled inside a `Request` object
- Fixed infinite while loop error
- Checkpoint by batch
- Removed erroneous pandas import
- Fixed timeslicing creating extra requests
- Fixed a bug with timeslicing causing duplicate results
- Fixed a miscalculation error for remaining results for a timeslice
- General code improvements
- Added exponential backoff and jitter rate-limiting
- Added non-id search for submissions and comments
- Initial implementation of multithreaded requests for `ids` queries, with support for:
  - comment ids by submission id
  - submissions by id
  - comments by id
- Rate-limit based on rate averaging across previous requests
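The "exponential backoff and jitter rate-limiting" entry refers to a standard retry technique. A minimal sketch of the "full jitter" variant, assuming retries only on `ConnectionError`; `backoff_delay` and `retry_with_backoff` are hypothetical names, not PMAW's API, and PMAW's own implementation may differ.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: pick a delay uniformly in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


def retry_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                       base: float = 1.0, cap: float = 60.0,
                       retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, sleeping a jittered, exponentially growing delay between failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt, base, cap))
    raise AssertionError("unreachable")
```

Randomizing the delay spreads retries from concurrent threads apart, so they do not all hit the API again at the same moment.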
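One way to read "rate-limit based on rate averaging across previous requests" is to keep the average gap between the last few requests at or above `1 / rate`. The sketch below assumes that interpretation; `AveragingRateLimiter`, `rate`, and `window` are hypothetical and not taken from PMAW.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class AveragingRateLimiter:
    """Hypothetical limiter: keep the average request rate over the last
    `window` requests at or below `rate` requests per second."""

    def __init__(self, rate: float, window: int = 10,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or window < 1:
            raise ValueError("rate must be > 0 and window >= 1")
        self.rate = rate
        self.clock = clock
        # Oldest timestamps fall off automatically once `window` is reached.
        self.history: Deque[float] = deque(maxlen=window)

    def wait_time(self, now: Optional[float] = None) -> float:
        """Seconds to wait so the next request keeps the average gap >= 1 / rate."""
        if not self.history:
            return 0.0
        now = self.clock() if now is None else now
        k = len(self.history)
        # k recorded requests plus the next one span k gaps, so the next
        # request may start no earlier than oldest + k / rate.
        earliest = self.history[0] + k / self.rate
        return max(0.0, earliest - now)

    def record(self, now: Optional[float] = None) -> None:
        """Remember when a request was sent."""
        self.history.append(self.clock() if now is None else now)
```

Averaging over a window tolerates short bursts as long as the recent average stays under the limit, unlike a fixed per-request delay.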
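The "Added gzip for cached pickle files" entry can be illustrated with the standard library alone. A minimal sketch, assuming responses are plain picklable Python objects; `save_gzip_pickle` and `load_gzip_pickle` are hypothetical helpers, not PMAW functions.

```python
import gzip
import pickle
from pathlib import Path
from typing import Any


def save_gzip_pickle(obj: Any, path: Path) -> None:
    """Pickle obj and write it gzip-compressed to path."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_gzip_pickle(path: Path) -> Any:
    """Read an object written by save_gzip_pickle.

    Only load cache files you created yourself: unpickling untrusted
    data can execute arbitrary code.
    """
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)
```

Compressing the pickle stream trades a little CPU for smaller cache files, which suits repetitive JSON-like response data.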
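Several entries concern timeslicing, that is, splitting a search window into smaller time windows that can be requested in parallel. A minimal sketch, assuming epoch-second timestamps and half-open `[start, end)` windows, which keeps adjacent slices from sharing a boundary item; `time_slices` is a hypothetical helper, and PMAW's own slicing may differ.

```python
from typing import List, Tuple


def time_slices(since: int, until: int, n: int) -> List[Tuple[int, int]]:
    """Split [since, until) into at most n contiguous half-open windows."""
    if until <= since or n < 1:
        raise ValueError("need since < until and n >= 1")
    n = min(n, until - since)  # never create empty windows
    # Integer edges: the first edge is since, the last is exactly until.
    edges = [since + (until - since) * i // n for i in range(n + 1)]
    return list(zip(edges[:-1], edges[1:]))
```

For example, `time_slices(0, 10, 3)` gives `[(0, 3), (3, 6), (6, 10)]`: each slice ends where the next begins, so no time range is requested twice or skipped.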