Pull request for crawler #5

jitendravarma · 2018-01-23T12:05:12Z

Added a crawler in crawler/services to cache Scrapy responses. This generic crawler writes every HTML response to disk.
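For readers following the thread, here is a minimal sketch of what caching a response to disk could look like. It is not the code from services.py: the helper names `cache_path` and `cache_response` are hypothetical, the cache directory is passed in as an argument instead of being imported as CRAWLER_DIR, and naming files by a truncated MD5 of the URL is an assumption based only on the `md5` import and `MAX_HASH_CHARS = 8` visible in the diff.

```python
import os
import tempfile
from hashlib import md5

MAX_HASH_CHARS = 8  # value shown in the diff for this PR


def cache_path(cache_dir, url):
    """Return the file path for the page at `url` (hypothetical helper)."""
    # Assumption: the file name is the first MAX_HASH_CHARS hex digits
    # of the MD5 digest of the URL.
    digest = md5(url.encode("utf-8")).hexdigest()[:MAX_HASH_CHARS]
    return os.path.join(cache_dir, digest + ".html")


def cache_response(cache_dir, url, body):
    """Write the raw response body (bytes) to disk and return the path."""
    os.makedirs(cache_dir, exist_ok=True)
    path = cache_path(cache_dir, url)
    with open(path, "wb") as handle:
        handle.write(body)
    return path


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()  # stands in for CRAWLER_DIR
    saved = cache_response(
        demo_dir, "https://example.com/jobs?q=python", b"<html></html>"
    )
    print(saved)
```

Eight hex characters give only 32 bits of hash, so two different URLs can occasionally map to the same file name; a longer prefix lowers that risk.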

sidharthshah

Minor fixes; once these are done, please submit a PR.

sidharthshah · 2018-01-25T06:01:04Z

scrapy_ft_jobs_sites/scrapy_ft_jobs_sites/crawler/services.py

@@ -0,0 +1,39 @@
+import os
+from hashlib import md5


This file should possibly be called utils.py

sidharthshah · 2018-01-25T06:01:21Z

scrapy_ft_jobs_sites/scrapy_ft_jobs_sites/crawler/services.py

+
+from scrapy_ft_jobs_sites.settings import CRAWLER_DIR
+
+MAX_HASH_CHARS = 8


Move this to settings.py

sidharthshah · 2018-01-25T06:03:12Z

scrapy_ft_jobs_sites/scrapy_ft_jobs_sites/spiders/indeedin.py

@@ -37,10 +37,13 @@ def __init__(self, *args, **kwargs):
            self.start_urls.append(URL)

            for pagination in range(10, 60, 10):
-                URL = self.base_url_pattern + item.replace(" ", "+") + "&start=" + str(pagination)
+                URL = self.base_url_pattern + \
+                    item.replace(" ", "+") + "&start=" + str(pagination)
                self.start_urls.append(URL)

    def parse_item(self, response):


Make sure you have a test case for the parsing logic; this will help ensure our standard is enforced.
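As a possible starting point, the sketch below tests only the URL construction touched by this hunk, not `parse_item` itself. It assumes the loop is pulled out into a standalone function; `paginated_urls` and the example base URL are hypothetical and do not come from the spider. The test functions use plain `assert`, so they can run under pytest or be called directly.

```python
def paginated_urls(base_url_pattern, item):
    """Build the paginated search URLs the same way the loop in the diff does."""
    query = item.replace(" ", "+")
    return [
        base_url_pattern + query + "&start=" + str(pagination)
        for pagination in range(10, 60, 10)
    ]


# Placeholder base URL for the tests; not the spider's real pattern.
BASE = "https://example.com/jobs?q="


def test_offsets_cover_10_to_50():
    urls = paginated_urls(BASE, "python")
    assert len(urls) == 5
    assert urls[0] == BASE + "python&start=10"
    assert urls[-1] == BASE + "python&start=50"


def test_spaces_become_plus_signs():
    urls = paginated_urls(BASE, "data engineer")
    assert all("data+engineer" in url for url in urls)
    assert all(" " not in url for url in urls)
```

A test for the parsing logic could follow the same shape: feed a saved HTML page to the parsing function and assert on the extracted fields.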

jitendravarma added 3 commits January 23, 2018 17:28

added genric crawler to cache scrapy response

e4fb2e3

updated readme.md

4da0ae9

fixed syntax in readme.md

f72b559

sidharthshah reviewed Jan 25, 2018
