Web crawling implemented #247

Closed
wants to merge 3 commits into from

shashank-crypto

Description

#77
Added a Selenium driver to crawl through pages and scrape their content. The contents are slugified before being uploaded as vectors.
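The crawl-and-slugify step can be sketched with the standard library alone. This is a minimal sketch, not the PR's code: the PR drives a real browser through Selenium, whereas this version assumes static HTML that has already been fetched as a string, and the names PageParser, extract_page and slugify are hypothetical.

```python
import re
import unicodedata
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects visible text and outgoing links from one HTML page (hypothetical helper)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []
        self.text_parts: List[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL so they can be crawled next
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside <script>/<style>
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract_page(html: str, base_url: str) -> Tuple[str, List[str]]:
    """Return (page text, absolute links) for one page."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


def slugify(value: str) -> str:
    """Lowercase ASCII slug with words joined by hyphens."""
    # Decompose accented characters, then drop the non-ASCII combining marks
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Remove punctuation, then collapse whitespace, underscores and hyphens into single hyphens
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s_]+", "-", value).strip("-")
```

A crawler built on this would feed each page's links back into a queue for the next depth level; the slug gives a safe, deterministic identifier for a page's content.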

Checklist before requesting a review


  • [yes] My code follows the style guidelines of this project
  • [yes] I have performed a self-review of my code
  • [yes] I have commented hard-to-understand areas
  • [yes] Any dependent changes have been merged

@vercel

vercel bot commented Jun 4, 2023

@shashank-itilite is attempting to deploy a commit to the Quivr-app Team on Vercel.

A member of the Team first needs to authorize it.

@StanGirard
Collaborator

@shashank-crypto Awesome PR.

However, your Dockerfile doesn't work on my M1 Pro MacBook.

[screenshot]

@shashank-crypto
Author

@StanGirard This may be a platform-specific issue. If so, I have read that disabling BuildKit (setting { "features": { "buildkit": false } } in Docker's daemon configuration) resolves the error. This thread here might help you resolve it. If it still persists, I will look into it more closely.
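The same switch can also be flipped per build rather than daemon-wide: setting the environment variable DOCKER_BUILDKIT=0 makes docker build fall back to the legacy builder, which recent Docker releases mark as deprecated. A minimal Python sketch, assuming the docker CLI is on PATH; legacy_build_env and build_without_buildkit are hypothetical helper names.

```python
import os
import subprocess
from typing import Dict, Optional


def legacy_build_env(base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with BuildKit disabled via DOCKER_BUILDKIT=0."""
    env = dict(os.environ if base_env is None else base_env)  # copy, never mutate the caller's env
    env["DOCKER_BUILDKIT"] = "0"
    return env


def build_without_buildkit(context_dir: str, tag: str) -> None:
    """Run `docker build` with the legacy builder (requires the docker CLI on PATH)."""
    subprocess.run(
        ["docker", "build", "-t", tag, context_dir],
        env=legacy_build_env(),
        check=True,  # raise CalledProcessError if the build fails
    )
```

For example, build_without_buildkit("./backend", "quivr-backend") would build a (hypothetical) backend directory with BuildKit turned off for that one invocation only.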

@StanGirard
Collaborator

[two screenshots]

@shashank-crypto
Author

Can you try adding --platform=linux/amd64 to the FROM line of the backend Dockerfile?
FROM --platform=linux/amd64 python:3.11-buster
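Pinning the platform forces an amd64 image, which an Apple Silicon Mac then has to run under emulation, so it trades platform independence for consistency. To see which platform a given machine targets natively (arm64 on an M1 Pro, amd64 on typical Windows and Ubuntu hosts), here is a small sketch, an assumption rather than part of the PR, that maps Python's platform.machine() value to a Docker platform string; docker_platform_for and the mapping table are hypothetical.

```python
import platform
from typing import Optional

# Common platform.machine() values mapped to Docker platform strings (assumed, not exhaustive)
ARCH_TO_DOCKER_PLATFORM = {
    "x86_64": "linux/amd64",   # Linux and Intel macOS
    "amd64": "linux/amd64",    # Windows reports "AMD64"
    "arm64": "linux/arm64",    # macOS on Apple Silicon (M1/M2)
    "aarch64": "linux/arm64",  # Linux on 64-bit ARM
}


def docker_platform_for(machine: Optional[str] = None) -> str:
    """Map a CPU name (default: this host's) to the Docker platform it builds for natively."""
    arch = (machine if machine is not None else platform.machine()).lower()
    if arch not in ARCH_TO_DOCKER_PLATFORM:
        raise ValueError(f"unrecognised architecture: {arch!r}")
    return ARCH_TO_DOCKER_PLATFORM[arch]
```

When --platform is omitted, Docker builds for the host's native platform, which is why the same Dockerfile can behave differently on the M1 Pro than on amd64 machines.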

@StanGirard
Collaborator

I'm struggling to validate this, and I don't have internet at home anymore, so I'm using my hotspot. It's very hard to debug when I have to re-download everything in the Dockerfile on each build.

What I don't like is that the Dockerfile isn't platform-agnostic.

I was trying something like this:

FROM python:3.11-bullseye

# Install the GEOS library and Chromium
RUN apt-get update && apt-get install -y libgeos-dev chromium

WORKDIR /code

COPY ./requirements.txt /code/requirements.txt

RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt --timeout 100

COPY . /code/

CMD ["uvicorn", "main:app", "--reload", "--host", "0.0.0.0", "--port", "5050"]

I haven't managed to make it work yet, but I'll try again. I'd like to make the Dockerfile platform-agnostic.
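One way to keep the Python side platform-agnostic as well is to look the browser and driver up on PATH instead of hard-coding their locations. A minimal sketch with shutil.which; the candidate names are assumptions (on Debian, the chromium and chromium-driver packages install chromium and chromedriver), and find_executable is a hypothetical helper.

```python
import shutil
from typing import Optional, Sequence

# Candidate binary names in order of preference (assumed, not exhaustive)
BROWSER_CANDIDATES = ("chromium", "chromium-browser", "google-chrome")
DRIVER_CANDIDATES = ("chromedriver",)


def find_executable(candidates: Sequence[str], path: Optional[str] = None) -> str:
    """Return the full path of the first candidate found on PATH (or on `path` if given)."""
    for name in candidates:
        found = shutil.which(name, path=path)
        if found:
            return found
    raise FileNotFoundError(f"none of {list(candidates)} found on PATH")
```

The returned paths could then be handed to Selenium through ChromeOptions.binary_location and Service(executable_path=...), so the same code runs whether the image was built for amd64 or arm64.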

@vercel

vercel bot commented Jun 19, 2023

The latest updates on your projects:

Name    Status     Updated (UTC)
docs    ✅ Ready   Jun 19, 2023 4:10pm

@gozineb
Contributor

gozineb commented Jul 20, 2023

Hi @shashank-crypto, any updates on this?

@shashank-crypto
Author

@gozineb Sorry, I've been away for a while. I don't have access to a Mac, so I couldn't look into the issue properly. I tried it on Windows and Ubuntu, and it worked there. I don't know how to resolve it, and I can't test anything on my end.

@github-actions
Contributor

Thanks for your contributions, we'll be closing this PR as it has gone stale. Feel free to reopen if you'd like to continue the discussion.


4 participants