No Crawler information on running crawler #24

Open
alagori opened this issue Mar 6, 2019 · 13 comments

Comments

@alagori

alagori commented Mar 6, 2019

There is only a brief mention of the crawler but no instructions on how to run it. If you could post the commands to run the crawler, I'd be more than happy to update the README with the information and a guide on how to use it.

@craftdelivery

craftdelivery commented Apr 4, 2019

You can log into the container, then run the cron rake task:

docker exec -it lcbo-api_app_1 /bin/bash
rake cron

or with docker-compose:

docker-compose exec app rake cron

@chimemeh

chimemeh commented Apr 7, 2019

How can one know when the crawl has completed?

@craftdelivery

It takes a long time to complete. I got several errors near the end related to saving JSON to S3, but the crawl was a success. Open a Rails console and check the counts.

image
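
Beyond raw counts, one way to check progress from the Rails console is to look at the most recent crawl record. The sketch below is not from the project: `crawl_summary` is a hypothetical helper, and it assumes the record exposes the `state`, `total_jobs`, `total_finished_jobs`, `total_products`, `total_stores` and `updated_at` attributes that show up in the console output further down this thread. Whether the crawler keeps those counters current while it runs is also an assumption.

```ruby
# Hypothetical helper (not part of lcbo-api): summarizes one crawl record.
# Works with any object that responds to the attributes used below.
def crawl_summary(crawl)
  total = crawl.total_jobs.to_i
  done  = crawl.total_finished_jobs.to_i
  # Avoid dividing by zero for crawls that have not queued any jobs yet.
  percent = total.zero? ? 0.0 : (done * 100.0 / total).round(1)

  {
    id: crawl.id,
    state: crawl.state,
    jobs: "#{done}/#{total}",
    percent_done: percent,
    products: crawl.total_products,
    stores: crawl.total_stores,
    updated_at: crawl.updated_at
  }
end
```

In the console this could be called as `crawl_summary(Crawl.order(:created_at).last)`. An `updated_at` value that keeps changing between calls would suggest the crawl is still active.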

@chimemeh

chimemeh commented Apr 7, 2019

I followed the instructions in the README, so my database already has the data from the January pull (i.e. count() already returns values). It appears that the database is not being refreshed with the latest data, which is why I'm not sure the crawl is actually active.

FYI, I am also new to Rails and Docker.

@craftdelivery

I didn't pre-populate the data as specified in the README, but you should be able to run the crawler in any case. The author called the task cron because that's how it was set up (to run at an interval).

In this case it was triggered by the Linux OS in the Docker container. See config/crontab.txt.

It's overkill for everybody who clones the repo to do this on a daily basis, so just run it manually once in a while: docker-compose exec app rake cron

You will notice if it's running, as there is terminal output and it's very intensive on your machine.

If you look in lib/tasks/cron.rake you will see:

desc 'Run scheduled tasks'
task cron: :environment do
  Crawler.run
end

@chimemeh

chimemeh commented Apr 7, 2019

I'm guessing the Crawler is run automatically when you execute the command "docker-compose up"? I tried the command "docker-compose exec app rake cron" and got:

rake aborted!
Crawl is already running
/lcboapi/app/models/crawl.rb:47:in `init'
/lcboapi/lib/crawler.rb:5:in `init'
/lcboapi/lib/boticus/bot.rb:40:in `run'
/lcboapi/lib/tasks/cron.rake:3:in `block in '
Tasks: TOP => cron
(See full trace by running task with --trace)

@craftdelivery

I'm getting that as well when trying to run it a second time. I think it's got something to do with Crawler state. Give me a minute...

@craftdelivery

Run this in the Rails console:

Crawl.where(state: [:init, :running, :paused])

is_active in app/models/crawl.rb checks for these states and will exit with "Crawl is already running".

Run this in the Rails console, then run the cron task:
Crawl.where(state: [:init, :running, :paused]).destroy_all
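
The guard described in this comment can be pictured with a small plain-Ruby sketch. It is not the code in app/models/crawl.rb: `CrawlRegistry`, `CrawlRow`, `start!` and `clear_active!` are hypothetical names, and only the three state names and the error message come from this thread.

```ruby
# Hypothetical illustration only; the real logic lives in app/models/crawl.rb.
ACTIVE_STATES = %w[init running paused].freeze

CrawlRow = Struct.new(:id, :state)

class CrawlRegistry
  def initialize(rows = [])
    @rows = rows
  end

  # Rows whose state still counts as "active".
  def active
    @rows.select { |row| ACTIVE_STATES.include?(row.state) }
  end

  # Refuse to start a new crawl while any active row exists.
  def start!
    raise 'Crawl is already running' unless active.empty?

    row = CrawlRow.new(@rows.size + 1, 'init')
    @rows << row
    row
  end

  # Drop the active rows, e.g. one left behind by an interrupted run.
  def clear_active!
    @rows.reject! { |row| ACTIVE_STATES.include?(row.state) }
  end
end
```

Under this model, a row left in `init` by an interrupted run blocks every later `start!` until it is cleared, which would match the behaviour reported here.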

@chimemeh

chimemeh commented Apr 7, 2019

The second command generated some error messages - not sure if it's normal. Then running the cron task showed the same "Crawl is already running" message. By the way, really appreciate you helping out!

Below is the output from executing the commands in rails.

Loading development environment (Rails 5.2.2)
[1] pry(main)> Crawl.where(state: [:init, :running, :paused])
=> Crawl Load (2.7ms) SELECT "crawls".* FROM "crawls" WHERE "crawls"."state" IN ($1, $2, $3) [["state", "init"], ["state", "running"], ["state", "paused"]]
[#<Crawl:0x000055decdfbe9e0
id: 2810,
crawl_event_id: nil,
state: "init",
task: nil,
total_products: 0,
total_stores: 0,
total_inventories: 0,
total_product_inventory_count: 0,
total_product_inventory_volume_in_milliliters: 0,
total_product_inventory_price_in_cents: 0,
total_jobs: 0,
total_finished_jobs: 0,
store_ids: [],
product_ids: [],
added_product_ids: [],
added_store_ids: [],
removed_product_ids: [],
removed_store_ids: [],
created_at: Sun, 07 Apr 2019 01:28:49 UTC +00:00,
updated_at: Sun, 07 Apr 2019 01:28:49 UTC +00:00>]
[2] pry(main)> Crawl.where(state: [:init, :running, :paused]).destroy_all
Crawl Load (2.4ms) SELECT "crawls".* FROM "crawls" WHERE "crawls"."state" IN ($1, $2, $3) [["state", "init"], ["state", "running"], ["state", "paused"]]
(0.5ms) BEGIN
Crawl Destroy (2.0ms) DELETE FROM "crawls" WHERE "crawls"."" = $1 [["", 2810]]
(0.4ms) ROLLBACK
ActiveRecord::StatementInvalid: PG::SyntaxError: ERROR: zero-length delimited identifier at or near """"
LINE 1: DELETE FROM "crawls" WHERE "crawls"."" = $1
^
: DELETE FROM "crawls" WHERE "crawls"."" = $1
from /usr/local/bundle/gems/activerecord-5.2.2/lib/active_record/connection_adapters/postgresql_adapter.rb:611:in `async_exec_params'
Caused by PG::SyntaxError: ERROR: zero-length delimited identifier at or near """"
LINE 1: DELETE FROM "crawls" WHERE "crawls"."" = $1
^

from /usr/local/bundle/gems/activerecord-5.2.2/lib/active_record/connection_adapters/postgresql_adapter.rb:611:in `async_exec_params'
[3] pry(main)>
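
The empty column name in `"crawls".""` suggests that ActiveRecord could not resolve a primary-key column for the `crawls` table, which `destroy` and `destroy_all` rely on to delete row by row. That diagnosis is an inference from the log above, not something confirmed in this thread. Under that assumption, a possible workaround is `delete_all`, which issues a single DELETE filtered on the relation's conditions and skips callbacks. `clear_stale_crawls` and `STALE_STATES` below are hypothetical names; the model class is passed in as an argument so the helper does not assume anything else about the app.

```ruby
# Hypothetical helper (not part of lcbo-api). `model` is expected to be an
# ActiveRecord model class such as Crawl.
STALE_STATES = %w[init running paused].freeze

def clear_stale_crawls(model, states: STALE_STATES)
  # delete_all builds one DELETE ... WHERE state IN (...) statement instead
  # of deleting each row by primary key, and it does not run callbacks.
  model.where(state: states).delete_all
end
```

In the Rails console this would be `clear_stale_crawls(Crawl)`, which returns the number of rows deleted. Because callbacks are skipped, anything the model normally cleans up on destroy is left untouched.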

@craftdelivery

craftdelivery commented Apr 7, 2019

Try Crawl.find(2810).destroy, using any id returned by Crawl.where(state: [:init, :running, :paused]).

Or try reinstalling everything without importing the old data...

@chimemeh

Thanks for the suggestion. I'm not sure why it didn't work. I finally just deleted the db image:
docker rm lcbo-api-master_app_1
then restarted:
docker-compose up -d
then executed cron:
docker-compose exec app rake cron

And it's finally crawling! Yay! Thanks again for all your help.

@joMclellan

Where did you find the db image? I'm having the same issue with my crawler @chimemeh

@craftdelivery

I believe it will be created on initialization of the Rails app or on the first crawl. What do you have so far?

4 participants