No Crawler information on running crawler #24

alagori · 2019-03-06T02:25:01Z

There is only a brief mention of the crawler but no instructions on how to run it. If you could post the commands to run the crawler, I'd be more than happy to update the README with the information and a guide on how to use it.

craftdelivery · 2019-04-04T02:22:59Z

You can log into the container, then run the cron rake task:

docker exec -it lcbo-api_app_1 /bin/bash
rake cron

Or with Docker Compose:

docker-compose exec app rake cron

chimemeh · 2019-04-07T03:03:50Z

How can one know when the crawl has completed?

craftdelivery · 2019-04-07T05:29:45Z

It takes a long time to complete. I got several errors near the end related to saving JSON to S3, but the crawl was a success. Open a Rails console and check the counts.
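One way to read those counts: the crawl record printed later in this thread has total_jobs and total_finished_jobs columns next to state. The sketch below is plain Ruby that summarises such a record from a hash of its attributes. crawl_summary is a hypothetical helper, not part of lcbo-api, and it assumes (the thread does not confirm this) that those counters are updated while the crawl runs. In a Rails console the hash could come from something like Crawl.order(:created_at).last.attributes.

```ruby
# Hypothetical helper (not part of lcbo-api): summarise a crawl record.
# Column names are taken from the console output shown in this thread.
ACTIVE_STATES = %w[init running paused].freeze

def crawl_summary(attrs)
  total    = attrs.fetch('total_jobs', 0).to_i
  finished = attrs.fetch('total_finished_jobs', 0).to_i
  # Avoid dividing by zero for a crawl that has not queued any jobs yet.
  percent = total.zero? ? 0.0 : (finished * 100.0 / total).round(1)
  {
    state: attrs['state'],
    active: ACTIVE_STATES.include?(attrs['state'].to_s),
    percent_done: percent
  }
end

# A record shaped like the one in this thread: state "init", no jobs queued.
stuck = { 'state' => 'init', 'total_jobs' => 0, 'total_finished_jobs' => 0 }
puts crawl_summary(stuck)
```

A record that stays active at 0 percent with an old updated_at is more likely a leftover from an interrupted run than a crawl in progress.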

chimemeh · 2019-04-07T17:55:07Z

I followed the instructions in the README file, so my database already has the data from the January pull (i.e. count() already returns values). It appears that the database is not being refreshed with the latest data, which is why I'm not sure the crawl is actually active.

FYI, I am also new to Rails and Docker.

craftdelivery · 2019-04-07T19:06:04Z

I didn't pre-populate the data as specified in the README file, but you should be able to run the crawler in any case. The author called the task cron because that's how it was set up (to run at an interval).

In this case it was triggered by the Linux OS in the Docker container. See: config/crontab.txt

It's overkill for everybody who clones the repo to do this on a daily basis, so just run it manually once in a while: docker-compose exec app rake cron

You will notice if it's running, as there is terminal output and it's very intensive on your machine.

If you look in lib/tasks/cron.rake you will see:

desc 'Run scheduled tasks'
task cron: :environment do
  Crawler.run
end

chimemeh · 2019-04-07T19:34:31Z

I'm guessing the crawler is run automatically when you execute the command "docker-compose up"? I tried the command "docker-compose exec app rake cron" and got:

rake aborted!
Crawl is already running
/lcboapi/app/models/crawl.rb:47:in `init'
/lcboapi/lib/crawler.rb:5:in `init'
/lcboapi/lib/boticus/bot.rb:40:in `run'
/lcboapi/lib/tasks/cron.rake:3:in `block in <top (required)>'
Tasks: TOP => cron
(See full trace by running task with --trace)

craftdelivery · 2019-04-07T20:10:42Z

I'm getting that as well when trying to run it a second time. I think it's got something to do with the crawler state. Give me a minute...

craftdelivery · 2019-04-07T20:17:15Z

Run this in the Rails console:
Crawl.where(state: [:init, :running, :paused])

is_active in app/models/crawl.rb checks for these states and will exit with "Crawl is already running".

Run this in the Rails console, then run the cron task:
Crawl.where(state: [:init, :running, :paused]).destroy_all
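
To make the failure mode concrete, here is a minimal plain-Ruby sketch of that kind of guard. It is not the code from app/models/crawl.rb: CrawlRow, crawl_active? and start_crawl are made-up names, and only the three state names and the error message come from this thread.

```ruby
# Simplified stand-in for the guard described above; not the actual
# implementation in app/models/crawl.rb.
BLOCKING_STATES = %w[init running paused].freeze

CrawlRow = Struct.new(:id, :state)

# True when any existing crawl is still in one of the blocking states.
def crawl_active?(rows)
  rows.any? { |row| BLOCKING_STATES.include?(row.state) }
end

# Refuse to start a new crawl while an older one still looks active.
def start_crawl(rows)
  raise 'Crawl is already running' if crawl_active?(rows)

  new_row = CrawlRow.new(rows.size + 1, 'init')
  rows << new_row
  new_row
end

# A leftover row from an interrupted crawl blocks every later attempt.
rows = [CrawlRow.new(2810, 'init')]
begin
  start_crawl(rows)
rescue RuntimeError => e
  puts e.message
end

# Once the stale row is removed, a new crawl can start.
rows.reject! { |row| BLOCKING_STATES.include?(row.state) }
puts start_crawl(rows).state
```

The point of the sketch is that the guard only looks at stored state, so a crawl that was killed part-way through can block new runs until its row is cleared.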

chimemeh · 2019-04-07T21:37:53Z

The second command generated some error messages - not sure if it's normal. Then running the cron task showed the same "Crawl is already running" message. By the way, really appreciate you helping out!

Below is the output from executing the commands in rails.

Loading development environment (Rails 5.2.2)
[1] pry(main)> Crawl.where(state: [:init, :running, :paused])
=> Crawl Load (2.7ms) SELECT "crawls".* FROM "crawls" WHERE "crawls"."state" IN ($1, $2, $3) [["state", "init"], ["state", "running"], ["state", "paused"]]
[#<Crawl:0x000055decdfbe9e0
id: 2810,
crawl_event_id: nil,
state: "init",
task: nil,
total_products: 0,
total_stores: 0,
total_inventories: 0,
total_product_inventory_count: 0,
total_product_inventory_volume_in_milliliters: 0,
total_product_inventory_price_in_cents: 0,
total_jobs: 0,
total_finished_jobs: 0,
store_ids: [],
product_ids: [],
added_product_ids: [],
added_store_ids: [],
removed_product_ids: [],
removed_store_ids: [],
created_at: Sun, 07 Apr 2019 01:28:49 UTC +00:00,
updated_at: Sun, 07 Apr 2019 01:28:49 UTC +00:00>]
[2] pry(main)> Crawl.where(state: [:init, :running, :paused]).destroy_all
Crawl Load (2.4ms) SELECT "crawls".* FROM "crawls" WHERE "crawls"."state" IN ($1, $2, $3) [["state", "init"], ["state", "running"], ["state", "paused"]]
(0.5ms) BEGIN
Crawl Destroy (2.0ms) DELETE FROM "crawls" WHERE "crawls"."" = $1 [["", 2810]]
(0.4ms) ROLLBACK
ActiveRecord::StatementInvalid: PG::SyntaxError: ERROR: zero-length delimited identifier at or near """"
LINE 1: DELETE FROM "crawls" WHERE "crawls"."" = $1
^
: DELETE FROM "crawls" WHERE "crawls"."" = $1
from /usr/local/bundle/gems/activerecord-5.2.2/lib/active_record/connection_adapters/postgresql_adapter.rb:611:in `async_exec_params'
Caused by PG::SyntaxError: ERROR: zero-length delimited identifier at or near """"
LINE 1: DELETE FROM "crawls" WHERE "crawls"."" = $1
^

from /usr/local/bundle/gems/activerecord-5.2.2/lib/active_record/connection_adapters/postgresql_adapter.rb:611:in `async_exec_params'
[3] pry(main)>

craftdelivery · 2019-04-07T21:43:49Z

Try Crawl.find(2810).destroy, using any id returned by Crawl.where(state: [:init, :running, :paused]).

Or try reinstalling everything without importing the old data...

chimemeh · 2019-04-12T02:47:33Z

Thanks for the suggestion; I'm not sure why it didn't work. I finally just deleted the db image:
docker rm lcbo-api-master_app_1
then restarted:
docker-compose up -d
then executed cron:
docker-compose exec app rake cron

And it's crawling, finally! Yay! Thanks again for all your help.

joMclellan · 2020-02-10T19:59:47Z

Where did you find the db image? I'm having the same issue with my crawler @chimemeh

craftdelivery · 2020-02-13T17:13:14Z

I believe it will be created on initialization of the Rails app or on the first crawl. What do you have so far?
