Pulling Movie Information from sovietmoviesonline.com

jpwexperience/flickscrape

FlickScrape

Python 3 Web Scraper to download and pull information from sovietmoviesonline.com

Usage: $ python3 flickscrape.py

The script uses the sovietmoviesonline.com/all_movies.html page to collect all the film links. Each linked page is then scraped for relevant content, which is exported to a CSV file delimited by a pipe ("|"). The user is then prompted whether they want to download the files. If so, each flick and its corresponding subtitle file are downloaded to a newly created directory named after the flick's title.
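The export step can be sketched with Python's standard `csv` module. This is only an illustration of writing pipe-delimited rows, not the script's actual code: the column names in `FIELDS` and the sample row are made up for the example.

```python
import csv
import io

# Hypothetical column names; the real script's columns may differ.
FIELDS = ["title", "original_title", "director", "year", "imdb", "views"]


def write_rows(rows, out):
    """Write dict rows to `out` as pipe-delimited CSV with a header line."""
    writer = csv.DictWriter(
        out, fieldnames=FIELDS, delimiter="|", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample row, used only to show the output shape.
sample = [
    {
        "title": "Example Film",
        "original_title": "Primer",
        "director": "A. Director",
        "year": "1970",
        "imdb": "7.5",
        "views": "1234",
    }
]

buffer = io.StringIO()  # stands in for an open file
write_rows(sample, buffer)
csv_text = buffer.getvalue()
```

Using the `csv` module rather than joining strings by hand means a title that happens to contain a "|" is quoted automatically instead of breaking the row.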

Required Dependencies

  • BeautifulSoup
  • LXML

This is a rework of an older project (sovietmoviesonline-scrape) that uses Beautiful Soup rather than relying on an external XML sitemap.

More attention is also paid to building the CSV file so that it gives information about each flick.

General Idea

On the site, each flick has a numeric ID in its URL slug. The majority of these films use this number as the flick's full file name. One can then simply pull out the number and rebuild the URL to download the flick and its corresponding subtitles.
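A minimal sketch of that idea follows. The slug shape used in the usage example, the two URL templates, and the helper names `flick_number` and `download_urls` are all assumptions made for illustration; this README does not state the real download locations or the subtitle file extension.

```python
import re
from urllib.parse import urlparse

# Placeholder templates: the real download locations are not given here.
VIDEO_TEMPLATE = "https://example.invalid/video/{num}.mp4"
SUBTITLE_TEMPLATE = "https://example.invalid/subtitles/{num}.srt"


def flick_number(page_url):
    """Return the first run of digits in the last path segment, or None."""
    slug = urlparse(page_url).path.rstrip("/").rsplit("/", 1)[-1]
    match = re.search(r"\d+", slug)
    return match.group(0) if match else None


def download_urls(page_url):
    """Build (video_url, subtitle_url) from a film page URL, or None."""
    num = flick_number(page_url)
    if num is None:
        return None
    return VIDEO_TEMPLATE.format(num=num), SUBTITLE_TEMPLATE.format(num=num)
```

For a hypothetical page URL such as `https://sovietmoviesonline.com/drama/123-example-film.html`, `flick_number` would return `"123"`, and `download_urls` would fill that number into both templates.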

Assumptions

  • First Table Found Contains: Original Title, IMDB, Views, Year
  • First <div class="director">...</div> contains the film's director
  • There is a <div id="error404">...</div> on 404 pages
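The director and 404 assumptions can be checked with a small probe like the one below. The project itself depends on BeautifulSoup; this sketch deliberately swaps in the standard library's `html.parser` so that it runs without extra packages. The names `PageProbe` and `probe` are hypothetical, and the table assumption is not covered.

```python
from html.parser import HTMLParser


class PageProbe(HTMLParser):
    """Record whether div#error404 exists and the text of the first div.director."""

    def __init__(self):
        super().__init__()
        self.is_404 = False
        self.director = None
        self._depth = 0  # <div> nesting depth while inside the director div
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attrs = dict(attrs)
        if attrs.get("id") == "error404":
            self.is_404 = True
        if self._depth:
            self._depth += 1  # nested div inside the director div
        elif self.director is None and "director" in (attrs.get("class") or "").split():
            self._depth = 1  # entering the first director div

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace in the collected text.
                self.director = " ".join("".join(self._chunks).split())

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def probe(html):
    """Return (is_404, director_text_or_None) for a page's HTML."""
    parser = PageProbe()
    parser.feed(html)
    parser.close()
    return parser.is_404, parser.director
```

A page for which `probe` reports `is_404` as true would be skipped, which matches the behaviour described under Bugs and Issues.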

Bugs and Issues

  • Some of the flicks are hosted through Vimeo and aren't accessible without an account.
  • Some of the flicks are broken into pieces and thus have a slightly different naming scheme that is not accounted for.
    • ex. <flick num>-1.mp4 and <flick num>-2.mp4
  • Some of the flicks' pages result in a 404 even though the video is still hosted. These videos are available to download but are skipped since no information about the film can be scraped.
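One possible way to account for the split files, sketched here as an assumption rather than something the script does, is to generate the single-file name plus the numbered part names and try each in turn. The helper name `candidate_filenames` and the default of two parts are hypothetical.

```python
def candidate_filenames(num, max_parts=2, ext="mp4"):
    """List the single-file name followed by the numbered part names."""
    names = [f"{num}.{ext}"]
    names += [f"{num}-{part}.{ext}" for part in range(1, max_parts + 1)]
    return names


print(candidate_filenames("123"))
# ['123.mp4', '123-1.mp4', '123-2.mp4']
```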
