Added Prowlarr feed scraping & Improve Advanced scraping capability for prowlarr, zilean, torrentio & more bugfixes & improvements #286
Conversation
…or prowlarr, zilean, torrentio & more bugfixes & improvements
… individual task as soon as completed
Walkthrough

The pull request introduces various modifications across multiple files, primarily enhancing the functionality of the streaming data retrieval process. Key changes include updates to dependency management, logging improvements, the addition of background task handling, and the introduction of new scraping functionalities.
Actionable comments posted: 4
Outside diff range and nitpick comments (4)
scrapers/utils.py (1)
11-48: LGTM! The new `run_scrapers` function is well-structured and modular.

The function effectively orchestrates the scraping of media streams from multiple sources based on user-defined settings and metadata. The modular approach enhances the flexibility and scalability of the scraping process, enabling the integration of additional scrapers in the future if needed.

Some additional suggestions for improvement:

- Consider extracting the logic for adding scraping tasks into separate functions for each scraper. This will further improve the readability and maintainability of the code.
- Consider adding error handling and logging for each scraping task to better track and manage any potential issues that may arise during the scraping process.
- Consider adding type hints for the `scraped_streams` variable to improve code clarity and catch potential type-related issues early.

Overall, the changes look good and the function is well-designed.
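For context, a minimal sketch of what such an orchestrator might look like. This is an illustration only, not the PR's actual code; the scraper callables and stream type are assumptions:

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Set

# Each scraper is an async callable returning a list of stream identifiers.
Scraper = Callable[[], Awaitable[List[str]]]

async def run_scrapers(enabled_scrapers: Iterable[Scraper]) -> Set[str]:
    """Run every enabled scraper concurrently and merge their streams."""
    results = await asyncio.gather(
        *(scraper() for scraper in enabled_scrapers),
        return_exceptions=True,  # one failing scraper must not sink the rest
    )
    scraped_streams: Set[str] = set()
    for result in results:
        if isinstance(result, Exception):
            print(f"scraper failed: {result!r}")  # real code would log this
            continue
        scraped_streams.update(result)
    return scraped_streams

async def fake_scraper() -> List[str]:
    return ["magnet:?xt=urn:btih:abc123"]

print(asyncio.run(run_scrapers([fake_scraper])))
```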
db/config.py (1)
87-87: Clarify the unit of measurement for `prowlarr_feed_scrape_interval`.

The addition of the `prowlarr_feed_scrape_interval` attribute enhances the configurability of the `Settings` class. However, please consider clarifying the unit of measurement (e.g., seconds, minutes, hours) for the interval value to improve code clarity.

scrapers/base_scraper.py (1)
82-105: LGTM with a minor suggestion!

The `make_request` method is a well-implemented method that makes an HTTP request with retry logic using the `tenacity` library. The method handles `httpx.HTTPStatusError` and `httpx.RequestError` exceptions and raises a `ScraperError` with appropriate error messages. The method implementation is correct and doesn't require any major changes.

However, as suggested by the static analysis tool, it's recommended to use `raise ... from err` or `raise ... from None` when raising exceptions within an `except` clause to distinguish them from errors in exception handling.

Apply this diff to update the exception raising:

```diff
-            raise ScraperError(f"HTTP error occurred: {e}")
+            raise ScraperError(f"HTTP error occurred: {e}") from e
-            raise ScraperError(f"An error occurred while requesting {e.request.url!r}.")
+            raise ScraperError(f"An error occurred while requesting {e.request.url!r}.") from e
```

Tools
Ruff
102-102: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

105-105: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
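As an aside, a minimal sketch of a retry-wrapped request helper in the style described above, assuming `tenacity` and `httpx`; the class and error names are simplified stand-ins for the PR's actual code:

```python
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential

class ScraperError(Exception):
    """Raised when a scraper request ultimately fails."""

class BaseScraper:
    def __init__(self, base_url: str):
        self.http_client = httpx.AsyncClient(base_url=base_url, timeout=30)

    @retry(stop=stop_after_attempt(3),
           wait=wait_exponential(multiplier=1, max=10))
    async def make_request(self, url: str, **kwargs) -> httpx.Response:
        try:
            response = await self.http_client.get(url, **kwargs)
            response.raise_for_status()
            return response
        except httpx.HTTPStatusError as e:
            # "from e" preserves the original exception chain (fixes B904)
            raise ScraperError(f"HTTP error occurred: {e}") from e
        except httpx.RequestError as e:
            raise ScraperError(
                f"An error occurred while requesting {e.request.url!r}."
            ) from e
```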
scrapers/prowlarr.py (1)
669-671: Consider simplifying the expression.

The static analysis tool suggests replacing the current ternary operator with a more concise expression:

```python
category_ids = category_ids if category_ids else [category["id"] for category in prowlarr_data.get("categories", [])]
```

This change improves readability and simplifies the code.

Tools

Ruff

669-671: Use `category_ids if category_ids else [category["id"] for category in prowlarr_data.get("categories", [])]` instead of `[category["id"] for category in prowlarr_data.get("categories", [])] if not category_ids else category_ids` (SIM212)
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (3)
- `Pipfile.lock` is excluded by `!**/*.lock`
- `resources/images/logo_text.png` is excluded by `!**/*.png`
- `resources/images/poster_template.jpg` is excluded by `!**/*.jpg`
Files selected for processing (20)
- Pipfile (1 hunks)
- api/main.py (5 hunks)
- api/scheduler.py (2 hunks)
- api/task.py (1 hunks)
- db/config.py (2 hunks)
- db/crud.py (9 hunks)
- db/models.py (3 hunks)
- resources/json/scraper_config.json (2 hunks)
- scrapers/base_scraper.py (1 hunks)
- scrapers/imdb_data.py (3 hunks)
- scrapers/prowlarr.py (2 hunks)
- scrapers/prowlarr_feed.py (1 hunks)
- scrapers/routes.py (5 hunks)
- scrapers/torrentio.py (1 hunks)
- scrapers/utils.py (1 hunks)
- scrapers/zilean.py (1 hunks)
- streaming_providers/realdebrid/utils.py (1 hunks)
- streaming_providers/routes.py (2 hunks)
- utils/network.py (4 hunks)
- utils/parser.py (1 hunks)
Files skipped from review due to trivial changes (1)
- resources/json/scraper_config.json
Additional context used
Ruff
api/task.py

10-10: `scrapers.tv` imported but unused. Remove unused import (F401)

10-10: `scrapers.imdb_data` imported but unused. Remove unused import (F401)

10-10: `scrapers.trackers` imported but unused. Remove unused import (F401)

10-10: `scrapers.helpers` imported but unused. Remove unused import (F401)

10-10: `scrapers.prowlarr` imported but unused. Remove unused import (F401)

10-10: `scrapers.prowlarr_feed` imported but unused. Remove unused import (F401)
scrapers/base_scraper.py

102-102: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

105-105: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

107-113: `BaseScraper.validate_response` is an empty method in an abstract base class, but has no abstract decorator (B027)

115-132: `BaseScraper.parse_response` is an empty method in an abstract base class, but has no abstract decorator (B027)

193-198: Use a single `if` statement instead of nested `if` statements (SIM102)
scrapers/torrentio.py

93-100: Use a single `if` statement instead of nested `if` statements (SIM102)

119-119: Using `.strip()` with multi-character strings is misleading (B005)
api/main.py

512-512: Do not perform function call `BackgroundTasks` in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable (B008)
scrapers/prowlarr.py

669-671: Use `category_ids if category_ids else [category["id"] for category in prowlarr_data.get("categories", [])]` instead of `[category["id"] for category in prowlarr_data.get("categories", [])] if not category_ids else category_ids` (SIM212)
Additional comments not posted (72)
Pipfile (1)
46-46: Clarify the rationale behind the change in the `parsett` dependency declaration.

The `parsett` dependency declaration has been simplified from a specific Git repository reference to a wildcard version specification (`*`). While this allows for flexibility, it's important to consider:

- What is the rationale behind this change? Is there a specific reason to decouple from the Git repository?
- Does the `parsett` package follow strict semantic versioning? If not, using a wildcard version could lead to unexpected behavior if breaking changes are introduced in future releases.

Please provide more context on these aspects to ensure the stability and predictability of the dependency.
streaming_providers/realdebrid/utils.py (1)
97-98: LGTM!

The added conditional check improves the robustness of the function by ensuring that the subsequent logic, which updates the `cached` status of each stream, only executes when there is valid availability data. This change alters the control flow and avoids unnecessary operations when no data is available.

db/config.py (2)
30-32: LGTM!

The reformatting of the `logo_url` string enhances code clarity without affecting functionality.

88-89: LGTM!

The addition of the `prowlarr_feed_scraper_crontab` and `disable_prowlarr_feed_scraper` attributes enhances the configurability of the Prowlarr feed scraping functionality. The default values seem reasonable.

db/models.py (6)
5-5: LGTM!

The `pytz` library is a good choice for handling timezones in Python. It will be useful for converting timestamps to UTC in the `validate_created_at` method.

47-47: LGTM!

The `indexer_flags` field is a useful addition to store metadata related to the indexer. Making it optional with a default empty list is a good practice.

49-55: LGTM!

Implementing `__eq__` and `__hash__` methods based on the `id` field is a good practice. It allows instances of `TorrentStreams` to be compared and used in hash-based collections while ensuring unique identification.

57-59: LGTM!

Converting the `id` to lowercase in the field validator is a good practice. It ensures consistency and avoids case-sensitivity issues when storing and comparing IDs.

61-64: LGTM!

Converting the `created_at` timestamp to UTC in the field validator is a good practice. It ensures that all timestamps are stored in a consistent timezone, which is important for accurate comparisons and sorting.

133-137: LGTM!

Formatting the `runtime` as a string with "min" appended in the field validator is a good practice. It improves readability and ensures that the `runtime` is consistently stored as a string indicating the duration in minutes.
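To illustrate the validator pattern being praised here, a compact sketch using pydantic v2; the model is a simplified stand-in for the PR's actual document class, and the field set is an assumption:

```python
from datetime import datetime
import pytz
from pydantic import BaseModel, field_validator

class TorrentStreams(BaseModel):
    id: str
    created_at: datetime

    @field_validator("id")
    @classmethod
    def lowercase_id(cls, v: str) -> str:
        # info-hashes are case-insensitive; store one canonical form
        return v.lower()

    @field_validator("created_at")
    @classmethod
    def validate_created_at(cls, v: datetime) -> datetime:
        # normalize every timestamp to UTC for consistent comparisons
        return v.astimezone(pytz.utc) if v.tzinfo else pytz.utc.localize(v)

    def __eq__(self, other) -> bool:
        return isinstance(other, TorrentStreams) and self.id == other.id

    def __hash__(self) -> int:
        return hash(self.id)

stream = TorrentStreams(id="ABC123", created_at=datetime(2024, 9, 1, 12, 0))
print(stream.id, stream.created_at.tzinfo)  # abc123 UTC
```

scrapers/zilean.py (5)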
18-21: LGTM!

The `__init__` method is correctly initializing the `ZileanScraper` instance by calling the parent class constructor and setting the necessary attributes.

23-47: LGTM!

The `scrape_and_parse` method is well-structured and correctly handles the scraping and parsing of streams from the Zilean API. The use of decorators for caching and rate limiting, error handling, response validation, and delegation of response parsing to a separate method are all good practices.

50-62: LGTM!

The `fetch_stream_data` method is correctly implemented to fetch stream data asynchronously from the Zilean API. The use of the `httpx` library, inclusion of necessary settings, error handling, logging, and returning of the response JSON are all done properly.

64-77: LGTM!

The `parse_response` method is properly implemented to parse the response from the Zilean API and return a list of `TorrentStreams` objects. The creation of tasks for each stream, concurrent execution using `asyncio.gather`, and filtering of `None` values from the results are all done correctly.

79-145: LGTM!

The `process_stream` method is well-implemented to process a single stream from the Zilean API response and return a `TorrentStreams` object. The use of the `semaphore` for concurrency control, filtering of inappropriate content, parsing and validation of the torrent title, creation of the `TorrentStreams` object with the appropriate catalog, and handling of season and episode data for series are all done correctly.
40-72: LGTM!

The `scrape_prowlarr_feed` function provides a robust and efficient mechanism to scrape and process items from the Prowlarr feed. It employs a circuit breaker pattern to handle failures gracefully and processes items in batches to improve performance. The function is well-structured and follows best practices.

75-125: LGTM!

The `process_feed_item` function provides comprehensive logic to process individual items from the Prowlarr feed. It handles various scenarios such as already processed items, blacklisted keywords, and unsupported categories. It fetches or creates metadata based on the available information and processes and stores the stream if metadata is available. The function is well-structured and follows best practices.

128-132: LGTM!

The `get_metadata_by_id` function provides a simple and clear way to fetch metadata based on the IMDb ID and media type. It abstracts the logic of fetching metadata for different media types by calling the appropriate function based on the media type. The function is concise and easy to understand.

135-142: LGTM!

The `search_and_create_metadata` function provides a way to search and create metadata based on the provided information. It ensures that metadata is only created for IMDb movies and series by checking the metadata ID. It fetches the newly created metadata to return the complete metadata information. The function is well-structured and follows best practices.

145-155: LGTM!

The `run_prowlarr_feed_scraper` function serves as the entry point for running the Prowlarr feed scraper. It is decorated with the `@minimum_run_interval` decorator to ensure that the scraper runs at a specified minimum interval, avoiding excessive or unnecessary scraping. The function is also decorated with the `@dramatiq.actor` decorator, allowing it to be executed as a background task using Dramatiq, with configurable settings for time limit, retries, backoff, and priority. The function is concise and effectively initiates the scraping process.
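For readers unfamiliar with Dramatiq, a rough sketch of an actor declared with this kind of configuration. The broker choice and all numeric values are illustrative assumptions, and `minimum_run_interval` is the project's own decorator, stubbed here:

```python
import dramatiq
from dramatiq.brokers.redis import RedisBroker

dramatiq.set_broker(RedisBroker(url="redis://localhost:6379"))

def minimum_run_interval(hours: int):
    """Stub of the project's decorator: skip runs that come too soon."""
    def decorator(fn):
        return fn  # the real version would check a stored last-run timestamp
    return decorator

@dramatiq.actor(time_limit=60 * 60 * 1000,  # 1 hour, in milliseconds
                max_retries=3,
                min_backoff=60_000,          # 1 minute before a retry
                priority=5)
@minimum_run_interval(hours=1)
def run_prowlarr_feed_scraper():
    print("scraping the Prowlarr feed...")

# enqueue the task from anywhere in the app:
# run_prowlarr_feed_scraper.send()
```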
31-32: LGTM!

The `is_item_processed` function provides a simple and efficient way to check if an item has already been processed. It utilizes Redis to store and check the processed items, allowing for fast lookups. The function is concise and effectively determines the processing status of an item.

35-37: LGTM!

The `mark_item_as_processed` function provides a way to mark an item as processed by adding its ID to the set of processed items in Redis. It ensures that the processed items set has an expiry time based on the `PROCESSED_ITEMS_EXPIRY` constant, preventing indefinite growth and allowing for the reprocessing of items after a certain period. The function is concise and effectively marks an item as processed.
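A sketch of that Redis-set bookkeeping, assuming `redis.asyncio`; the key name and the three-day expiry are assumptions:

```python
import redis.asyncio as redis

REDIS = redis.Redis(host="localhost", port=6379)
PROCESSED_ITEMS_KEY = "prowlarr_feed:processed_items"
PROCESSED_ITEMS_EXPIRY = 60 * 60 * 24 * 3  # 3 days, an assumed value

async def is_item_processed(item_id: str) -> bool:
    # set-membership checks are O(1) in Redis
    return bool(await REDIS.sismember(PROCESSED_ITEMS_KEY, item_id))

async def mark_item_as_processed(item_id: str) -> None:
    await REDIS.sadd(PROCESSED_ITEMS_KEY, item_id)
    # refresh the TTL so the set never grows without bound; items become
    # eligible for reprocessing once the whole set expires
    await REDIS.expire(PROCESSED_ITEMS_KEY, PROCESSED_ITEMS_EXPIRY)
```

scrapers/base_scraper.py (7)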
16-17: LGTM!

The `ScraperError` class is a simple custom exception class that inherits from `Exception`. The class definition is correct and doesn't require any changes.

21-24: LGTM!

The `__init__` method is a standard constructor that initializes the class attributes correctly. The method implementation is correct and doesn't require any changes.

26-27: LGTM!

The `__aenter__` method is a standard method that is part of the asynchronous context manager protocol. The method implementation is correct and doesn't require any changes.

29-30: LGTM!

The `__aexit__` method is a standard method that is part of the asynchronous context manager protocol. The method implementation is correct and closes the `httpx.AsyncClient` instance properly.

32-38: LGTM!

The `scrape_and_parse` method is an abstract method that is intended to be implemented by subclasses. The method is correctly decorated with `@abc.abstractmethod` and includes a clear docstring describing its purpose. The method definition is correct and doesn't require any changes.
40-61: LGTM!

The `cache` method is a well-implemented decorator that caches the results of a method using Redis. The method generates a cache key, checks if the result is already cached, and either returns the cached result or calls the decorated function and stores the result in Redis with the specified TTL. The method implementation is correct and doesn't require any changes.
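A minimal sketch of such a Redis-backed caching decorator; the key scheme, JSON serialization, and TTL default are illustrative assumptions rather than the PR's exact implementation:

```python
import functools
import json
import redis.asyncio as redis

REDIS = redis.Redis(host="localhost", port=6379)

def cache(ttl: int = 3600):
    """Cache an async function's JSON-serializable result in Redis."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            # derive a cache key from the function name and its arguments
            key = f"cache:{func.__name__}:{args!r}:{sorted(kwargs.items())!r}"
            cached = await REDIS.get(key)
            if cached is not None:
                return json.loads(cached)
            result = await func(*args, **kwargs)
            await REDIS.set(key, json.dumps(result), ex=ttl)
            return result
        return wrapper
    return decorator

@cache(ttl=600)
async def fetch_stream_data(url: str) -> dict:
    return {"url": url, "streams": []}  # stand-in for a real HTTP call
```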
63-80: LGTM!

The `rate_limit` method is a well-implemented decorator that rate limits method calls using the `ratelimit` library. The method uses the `@sleep_and_retry` and `@limits` decorators to enforce the rate limit based on the provided `calls` and `period` parameters. The method implementation is correct and doesn't require any changes.

api/scheduler.py (1)
234-244: LGTM!

The changes introduce a new scheduled job for running the Prowlarr feed scraper. The job is conditionally added based on the `settings.disable_prowlarr_feed_scraper` flag, allowing for easy enabling/disabling of the feature. If enabled, the job is scheduled using a cron trigger based on the `settings.prowlarr_feed_scrape_interval`, ensuring periodic execution of the scraper.

The code segment follows the existing pattern for adding scheduled jobs and properly invokes the `run_prowlarr_feed_scraper.send` function when triggered. The cron expression is correctly passed as a keyword argument to the job.

Overall, the changes are well-structured, adhere to the existing codebase conventions, and enhance the functionality by automating the Prowlarr feed scraping process.
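A sketch of that pattern with APScheduler, which the project appears to use; the crontab value and the settings stub are assumptions:

```python
from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger

class Settings:  # stand-in for the project's pydantic Settings
    disable_prowlarr_feed_scraper = False
    prowlarr_feed_scraper_crontab = "0 */6 * * *"  # every 6 hours, assumed

settings = Settings()
scheduler = AsyncIOScheduler()

def enqueue_prowlarr_feed_scrape():
    print("enqueueing prowlarr feed scrape")  # real code: actor.send()

if not settings.disable_prowlarr_feed_scraper:
    scheduler.add_job(
        enqueue_prowlarr_feed_scrape,
        CronTrigger.from_crontab(settings.prowlarr_feed_scraper_crontab),
        name="prowlarr_feed_scraper",
    )
```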
scrapers/imdb_data.py (6)
149-156: LGTM!

The use of `batch_process_with_circuit_breaker` to process movie IDs asynchronously is an efficient approach. The circuit breaker pattern is being used correctly to handle failures and prevent cascading failures. Retrying on `IMDbDataAccessError` is a good practice to handle temporary failures.

157-180: LGTM!

Processing the results from `batch_process_with_circuit_breaker` and updating the database entries immediately for each processed movie is an efficient approach. It improves responsiveness and potentially reduces memory usage compared to collecting all the results before updating the database. The logging statement is helpful for tracking the updates.

218-242: LGTM!

The new function `get_episode_by_date` is a useful addition to retrieve a specific episode from a TV series based on the release date. It utilizes the `TVSeries` model and the `web.update_title` method effectively to fetch and filter episodes. The logic is clear and concise:

- It creates a `TVSeries` instance and updates the title with the episodes filtered by the expected year.
- It then filters the episodes to find the one with the matching release date.
- If no matching episode is found, it returns `None`.

244-256: LGTM!

The new function `get_season_episodes` is a useful addition to retrieve all episodes of a specific season from a TV series. It utilizes the `TVSeries` model and the `web.update_title` method effectively to fetch and filter episodes. The logic is clear and concise:

- It creates a `TVSeries` instance and updates the title with the episodes filtered by the specified season.
- It then retrieves all the episodes of the specified season using the `get_episodes_by_season` method.

2-2: LGTM!

The import of the `date` class from the `datetime` module is necessary for the new functions `get_episode_by_date` and `get_season_episodes` to handle date-related operations.

6-11: Verify the usage of the `fuzz` module.

The imports of the `math` module and the `TVSeries` class from the `cinemagoerng.model` module are necessary for the changed code segments. However, the `fuzz` module imported from the `thefuzz` package is not being used in the changed code segments. Please verify if it is being used in other parts of the code. If not, consider removing the unused import to keep the codebase clean.

scrapers/torrentio.py (8)
19-23: LGTM!

The `TorrentioScraper` class structure and constructor look good. The inheritance from `BaseScraper` is appropriate, and the necessary attributes are correctly initialized.

25-50: LGTM!

The `scrape_and_parse` method is well-structured and follows a clear flow. The use of decorators for caching and rate limiting is appropriate. Error handling is implemented correctly, and the response validation and parsing are delegated to separate methods, promoting separation of concerns.

52-53: LGTM!

The `validate_response` method performs a simple and clear validation check on the response structure. The logic is concise and easy to understand.

55-68: LGTM!

The `parse_response` method efficiently processes the stream data concurrently using `asyncio.gather`. It delegates the processing of individual streams to the `process_stream` method, promoting separation of concerns. The filtering of `None` values ensures that only valid streams are returned.

70-163: LGTM!

The `process_stream` method performs comprehensive processing of an individual stream. The adult content check and title validation are important safeguards. The parsing of the stream title is delegated to a separate method, promoting separation of concerns. The creation of `TorrentStreams`, `Season`, and `Episode` objects is handled appropriately based on the catalog type. Error handling ensures that exceptions during processing are caught and logged.

Tools

Ruff

93-100: Use a single `if` statement instead of nested `if` statements (SIM102)

119-119: Using `.strip()` with multi-character strings is misleading (B005)

165-182: LGTM!

The `parse_stream_title` method effectively parses the stream title and extracts relevant information. The use of `PTT.parse_title` is appropriate for parsing the torrent name. The extracted data is organized into a dictionary for easy access and further processing.

184-212: LGTM!

The static methods `extract_seeders`, `extract_languages_from_title`, `extract_languages`, and `extract_size_string` provide utility functions for extracting specific information from the stream data. The use of regular expressions is appropriate for pattern matching and extraction. The methods are focused and perform their specific tasks effectively.

93-100: Skipping static analysis hints.

The nested `if` statements at lines 93-100 are used for conditional logic and are not overly complex. Combining them into a single `if` statement may reduce readability. The use of `.lstrip()` at line 119 is appropriate for removing the "tracker:" prefix from the tracker URLs and is not misleading in this context. (Also applies to: 119-119)

Tools

Ruff

93-100: Use a single `if` statement instead of nested `if` statements (SIM102)
utils/network.py (3)
36-62: Improved exception handling and state management in the `call` method.

The changes to the `call` method enhance its functionality and robustness:

- The additional `item` parameter allows the method to return the item alongside the result or exception, enabling more graceful exception handling.
- The refined state management logic, particularly in the half-open state, improves the behavior of the circuit breaker by checking the failure count against the half-open attempts threshold before transitioning to the closed state.

These modifications contribute to better error handling and more accurate state transitions in the circuit breaker implementation.

75-138: Improved performance, reliability, and observability in `batch_process_with_circuit_breaker`.

The modifications to the `batch_process_with_circuit_breaker` function bring several enhancements:

- Yielding results as they become available optimizes memory usage and processing time, improving overall performance.
- Utilizing `asyncio.TaskGroup` enables concurrent execution of tasks, further boosting performance.
- The enhanced retry logic, with a clearer separation of successful results and retryable exceptions, improves the reliability of the batch processing by handling failures more effectively.
- The added logging statements provide better visibility into the processing flow and retry attempts, aiding in debugging and monitoring.

These changes contribute to a more efficient, reliable, and observable batch processing function; a sketch of the overall shape follows.
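The sketch below shows the general shape of a batch processor with a failure-count circuit breaker that yields results as each batch completes. It uses `asyncio.gather` for brevity where the PR uses `asyncio.TaskGroup`, and all thresholds are illustrative assumptions:

```python
import asyncio
from typing import AsyncGenerator, Awaitable, Callable, Sequence

class CircuitOpenError(Exception):
    """Tripped after too many consecutive failures."""

async def batch_process_with_circuit_breaker(
    items: Sequence[str],
    worker: Callable[[str], Awaitable[str]],
    batch_size: int = 3,
    failure_threshold: int = 5,
) -> AsyncGenerator[str, None]:
    """Process items in concurrent batches, yielding successes as they
    become available instead of collecting everything first."""
    consecutive_failures = 0

    async def guarded(item: str):
        try:
            return await worker(item)
        except Exception as exc:
            return exc  # keep failures paired with items by position

    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        results = await asyncio.gather(*(guarded(i) for i in batch))
        for item, result in zip(batch, results):
            if isinstance(result, Exception):
                consecutive_failures += 1
                if consecutive_failures >= failure_threshold:
                    raise CircuitOpenError(f"giving up at {item!r}")
                continue  # a real version would queue the item for retry
            consecutive_failures = 0
            yield result

async def main():
    async def worker(item: str) -> str:
        return item.upper()
    async for result in batch_process_with_circuit_breaker("abcdef", worker):
        print(result)

asyncio.run(main())
```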
205-205: Updated function signature to accurately reflect the return type.

The change in the function signature of `get_user_public_ip` from `str` to `str | None` accurately reflects the possibility of returning `None` when the user's IP address is a private IP address. This improves the clarity and correctness of the function's return type.

streaming_providers/routes.py (2)
69-77: LGTM! Remember to address the TODO comment in the future.

The changes improve the robustness of the stream retrieval process by adding a fallback mechanism for case sensitivity. This accommodates legacy data formats while maintaining compatibility. Please ensure to remove the uppercase fallback in the future as indicated by the TODO comment, once the legacy data has been migrated.
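A sketch of the fallback idea; `find_stream_by_id` is a hypothetical stand-in for the actual database lookup:

```python
async def fetch_stream_or_404(info_hash: str):
    """Look up a stream by its canonical lowercase hash, falling back to
    the legacy uppercase form until old records are migrated."""
    stream = await find_stream_by_id(info_hash.lower())
    if stream is None:
        # TODO: drop this fallback once legacy data is migrated
        stream = await find_stream_by_id(info_hash.upper())
    if stream is None:
        raise LookupError(f"stream {info_hash!r} not found")  # maps to HTTP 404
    return stream

async def find_stream_by_id(info_hash: str):
    """Hypothetical lookup helper; returns None when absent."""
    return {"id": info_hash} if info_hash == "abc123" else None
```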
188-188: LGTM!

Converting the `info_hash` to lowercase ensures consistency in how the hash is processed throughout the streaming provider endpoint. This aligns with the changes made in the `fetch_stream_or_404` function and ensures that the caching and locking mechanisms operate consistently, regardless of the case of the provided `info_hash`.

scrapers/routes.py (5)
296-323: LGTM!

The `handle_movie_stream_store` function encapsulates the logic for storing a movie torrent stream in a structured manner. It creates a new `TorrentStreams` instance using attributes from the parsed data, sets the `updated_at` timestamp to the current datetime, and logs the creation of the movie stream. The function ensures that the necessary attributes are populated in the `TorrentStreams` instance.

326-379: LGTM!

The `handle_series_stream_store` function encapsulates the logic for storing a series torrent stream in a structured manner. It ensures that the torrent pertains to a single season and prepares episode data based on the availability of detailed file data or basic episode numbers. If no valid episode data is found, it skips the torrent. The inclusion of the `Season` object in the `TorrentStreams` instance provides a structured representation of the episode data.

utils/parser.py (1)
90-90: Approved: Improved error logging.

The change from `logging.error` to `logging.exception` enhances the error handling by capturing and logging the full traceback of exceptions. This provides more context and detail about the error, making it easier for developers to trace the source of the issue and debug effectively.
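The difference in one line: inside an `except` block, `logging.exception` records the message at ERROR level and appends the current traceback automatically:

```python
import logging

try:
    1 / 0
except ZeroDivisionError:
    logging.error("parse failed")      # message only, no traceback
    logging.exception("parse failed")  # same level, plus the full traceback
```

api/main.py (3)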
16-16: LGTM!

The import statement is correct.

48-48: LGTM!

Including the filename and line number in the log messages improves traceability during debugging. The updated logging format is correct.

554-554: Verify the function signature change in the codebase.

The function signature is updated to include the `background_tasks` parameter. Ensure that all function calls to `get_streams` have been updated to pass the `background_tasks` argument. Run the following script to verify the function usage:

Also applies to: 565-565
scrapers/prowlarr.py (10)
71-96: LGTM!

The `scrape_and_parse` method looks good. It handles the scraping and parsing of streams, catches exceptions, and logs relevant information.

98-118: LGTM!

The `_scrape_and_parse` method correctly routes the scraping based on the catalog type. It raises an error for unsupported catalog types, which is a good practice.

120-147: LGTM!

The `scrape_movie` method looks good. It scrapes movie streams using both IMDb ID and title search (if enabled), processes the scraped streams, and sends a background search task (if enabled).

153-191: LGTM!

The `scrape_series` method looks good. It scrapes series streams using both IMDb ID and title search (if enabled), processes the scraped streams, and sends a background search task (if enabled).

199-278: LGTM!

The `process_streams` method looks good. It efficiently processes streams from multiple generators concurrently using a queue. It respects the processing limits, handles exceptions, and logs relevant information.
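A compact sketch of the merge-several-generators-through-a-queue idea; the sentinel mechanism and the processing limit are illustrative, not the PR's exact mechanics:

```python
import asyncio
from typing import AsyncGenerator

SENTINEL = object()

async def process_streams(*generators: AsyncGenerator[str, None],
                          max_process: int = 5) -> list[str]:
    """Drain several async generators concurrently, stopping at a limit."""
    queue: asyncio.Queue = asyncio.Queue()

    async def producer(gen: AsyncGenerator[str, None]) -> None:
        async for stream in gen:
            await queue.put(stream)
        await queue.put(SENTINEL)  # signal this generator is exhausted

    producers = [asyncio.create_task(producer(g)) for g in generators]
    results, finished = [], 0
    while finished < len(generators) and len(results) < max_process:
        item = await queue.get()
        if item is SENTINEL:
            finished += 1
        else:
            results.append(item)
    for task in producers:
        task.cancel()  # stop early once the processing limit is reached
    return results

async def gen(prefix: str):
    for i in range(3):
        yield f"{prefix}-{i}"

print(asyncio.run(process_streams(gen("a"), gen("b"))))
```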
280-290: LGTM!

The `fetch_stream_data` method looks good. It makes a request to the Prowlarr API using the provided parameters and returns the JSON response. It also raises any HTTP errors, which is a good practice.

292-306: LGTM!

The `build_search_params` method looks good. It correctly formats the search query based on the search type and returns the required parameters for the Prowlarr API.

562-603: LGTM!

The `parse_prowlarr_data` method looks good. It extracts the download URL, fetches the torrent data, handles exceptions, and updates the torrent data with additional metadata. The error handling and returning `None` in case of errors is a good practice.

696-725: LGTM!

The `background_movie_title_search` function looks good. It performs a background movie title search using the `ProwlarrScraper`. It fetches the movie metadata, generates title search queries, processes the scraped streams, and stores them. The use of the `minimum_run_interval` and `dramatiq.actor` decorators is appropriate for a background task.

729-767: LGTM!

The `background_series_title_search` function looks good. It performs a background series title search using the `ProwlarrScraper`. It fetches the series metadata, generates title search queries, processes the scraped streams, and stores them. The use of the `minimum_run_interval` and `dramatiq.actor` decorators is appropriate for a background task.

db/crud.py (7)
Line range hint 297-330: LGTM!

The changes to the `get_movie_streams` function improve code modularity, performance, and functionality:

- The addition of `BackgroundTasks` allows for asynchronous processing of new streams.
- The `run_scrapers` function consolidates the scraping logic for movies.
- Storing new streams asynchronously using a background task enhances performance.
- The function correctly combines new and cached streams to provide a comprehensive set of streams.

Line range hint 333-370: LGTM!

The changes to the `get_series_streams` function are similar to those in `get_movie_streams` and provide the same benefits:

- The addition of `BackgroundTasks` allows for asynchronous processing of new streams.
- The `run_scrapers` function consolidates the scraping logic for series.
- Storing new streams asynchronously using a background task enhances performance.
- The function correctly combines new and cached streams to provide a comprehensive set of streams.
374-414: LGTM!

The new `store_new_torrent_streams` function efficiently stores new torrent streams in the database:

- The use of `BulkWriter` minimizes database round trips and improves performance.
- The function correctly handles existing streams by updating them with new data.
- It also handles adding new episodes to existing series streams.
- The logging statements provide useful information for monitoring and debugging purposes. (A generic sketch of the bulk-upsert idea follows this list.)
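The PR uses Beanie's `BulkWriter`; as a rough equivalent, here is the same bulk-upsert idea expressed with plain PyMongo operations. The collection name and document fields are assumptions:

```python
from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")
collection = client["mediafusion"]["torrent_streams"]

def store_new_torrent_streams(streams: list[dict]) -> None:
    """Upsert many streams in a single round trip to the database."""
    operations = [
        UpdateOne(
            {"_id": stream["id"].lower()},  # canonical lowercase info-hash
            {"$set": stream},
            upsert=True,                    # insert when no match exists
        )
        for stream in streams
    ]
    if operations:
        result = collection.bulk_write(operations, ordered=False)
        print(f"upserted={result.upserted_count} modified={result.modified_count}")
```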
Line range hint 692-717: LGTM!

The new `get_or_create_metadata` function efficiently retrieves or creates metadata for a given media type:

- The function first checks for existing metadata, avoiding unnecessary database operations.
- If metadata doesn't exist, the function creates a new metadata object using the provided data and IMDb data (if available).
- The function correctly handles duplicate key errors by waiting and re-fetching the metadata, ensuring data consistency. (See the sketch of this pattern below.)
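The duplicate-key dance looks roughly like this, assuming Motor/PyMongo semantics rather than the PR's exact Beanie calls:

```python
import asyncio
from pymongo.errors import DuplicateKeyError

async def get_or_create_metadata(collection, media_id: str, data: dict) -> dict:
    """Return existing metadata, or create it; tolerate racing writers."""
    existing = await collection.find_one({"_id": media_id})
    if existing:
        return existing
    try:
        await collection.insert_one({"_id": media_id, **data})
    except DuplicateKeyError:
        # another worker inserted the same document first; wait briefly
        # and read back the winner's copy instead of failing
        await asyncio.sleep(1)
    return await collection.find_one({"_id": media_id})
```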
720-725: LGTM!

The new `save_metadata` function efficiently saves metadata and creates a new stream object:

- The function avoids unnecessary operations by first checking if the stream already exists.
- The function correctly retrieves or creates metadata using the `get_or_create_metadata` function.
- The function creates a new stream object using the provided metadata and the `create_stream_object` function.

Line range hint 648-667: LGTM!

The `create_metadata_object` function correctly creates a new metadata object based on the provided metadata and IMDb data:

- The function extracts relevant data from the provided metadata and IMDb data.
- The function handles the case where the year is provided as a range, ensuring correct data format.
- The function creates a new metadata object using the extracted data, ensuring data consistency.

Line range hint 669-689: LGTM!

The `create_stream_object` function correctly creates a new stream object based on the provided metadata:

- The function extracts relevant data from the provided metadata.
- The function handles the case where the stream is for a movie by including the filename and file index, ensuring correct data format.
- The function creates a new `TorrentStreams` object using the extracted data, ensuring data consistency.
```diff
@@ -7,7 +7,7 @@
 # import background actors
 # noqa: F401
 from mediafusion_scrapy import task
-from scrapers import tv, imdb_data, trackers, helpers, prowlarr
+from scrapers import tv, imdb_data, trackers, helpers, prowlarr, prowlarr_feed
```
Remove unused imports.

The static analysis tool correctly points out that several imports, including the newly added `prowlarr_feed` import, are unused in this file. Unless there are plans to use these imports in the near future, it's best to remove them to keep the codebase clean and maintainable.

Apply this diff to remove the unused imports:

```diff
-from scrapers import tv, imdb_data, trackers, helpers, prowlarr, prowlarr_feed
+from scrapers import prowlarr_feed
```

If you intend to use these imports in upcoming commits, feel free to ignore this comment.
Tools

Ruff

10-10: `scrapers.tv` imported but unused. Remove unused import (F401)

10-10: `scrapers.imdb_data` imported but unused. Remove unused import (F401)

10-10: `scrapers.trackers` imported but unused. Remove unused import (F401)

10-10: `scrapers.helpers` imported but unused. Remove unused import (F401)

10-10: `scrapers.prowlarr` imported but unused. Remove unused import (F401)

10-10: `scrapers.prowlarr_feed` imported but unused. Remove unused import (F401)
```python
def parse_response(
    self,
    response: Dict[str, Any],
    metadata: MediaFusionMetaData,
    catalog_type: str,
    season: int = None,
    episode: int = None,
) -> List[TorrentStreams]:
    """
    Parse the response into TorrentStreams objects.
    :param response: Response dictionary
    :param metadata: MediaFusionMetaData object
    :param catalog_type: Catalog type (movie, series)
    :param season: Season number (for series)
    :param episode: Episode number (for series)
    :return: List of TorrentStreams objects
    """
    pass
```
Add the `@abc.abstractmethod` decorator to the `parse_response` method.

Since the `parse_response` method is empty and is part of an abstract base class, it should be declared as an abstract method using the `@abc.abstractmethod` decorator. This will enforce the implementation of the method in the subclasses.

Apply this diff to add the `@abc.abstractmethod` decorator:

```diff
+    @abc.abstractmethod
     def parse_response(
         self,
         response: Dict[str, Any],
         metadata: MediaFusionMetaData,
         catalog_type: str,
         season: int = None,
         episode: int = None,
     ) -> List[TorrentStreams]:
         """
         Parse the response into TorrentStreams objects.
         :param response: Response dictionary
         :param metadata: MediaFusionMetaData object
         :param catalog_type: Catalog type (movie, series)
         :param season: Season number (for series)
         :param episode: Episode number (for series)
         :return: List of TorrentStreams objects
         """
         pass
```
Tools

Ruff

115-132: `BaseScraper.parse_response` is an empty method in an abstract base class, but has no abstract decorator (B027)
```python
def validate_response(self, response: Dict[str, Any]) -> bool:
    """
    Validate the response from the scraper.
    :param response: Response dictionary
    :return: True if valid, False otherwise
    """
    pass
```
Add the `@abc.abstractmethod` decorator to the `validate_response` method.

Since the `validate_response` method is empty and is part of an abstract base class, it should be declared as an abstract method using the `@abc.abstractmethod` decorator. This will enforce the implementation of the method in the subclasses.

Apply this diff to add the `@abc.abstractmethod` decorator:

```diff
+    @abc.abstractmethod
     def validate_response(self, response: Dict[str, Any]) -> bool:
         """
         Validate the response from the scraper.
         :param response: Response dictionary
         :return: True if valid, False otherwise
         """
         pass
```
Tools

Ruff

107-113: `BaseScraper.validate_response` is an empty method in an abstract base class, but has no abstract decorator (B027)
```diff
@@ -508,6 +509,7 @@ async def get_streams(
     season: int = None,
     episode: int = None,
     user_data: schemas.UserData = Depends(get_user_data),
+    background_tasks: BackgroundTasks = BackgroundTasks(),
```
Move the `BackgroundTasks` call within the function.

Performing the `BackgroundTasks` call in the argument defaults can lead to unexpected behavior.

Apply this diff to fix the issue:

```diff
-    background_tasks: BackgroundTasks = BackgroundTasks(),
+    background_tasks: BackgroundTasks = None,
 ):
+    if background_tasks is None:
+        background_tasks = BackgroundTasks()
```
Tools

Ruff

512-512: Do not perform function call `BackgroundTasks` in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable (B008)
Summary by CodeRabbit

Release Notes

- New Features
- Bug Fixes
- Refactor
- Chores

These updates enhance the overall performance and flexibility of the application, providing users with a more robust streaming experience.