In version 1.7.2, ingestion doesn't seem to work, no container is created #73

Open
mig281 opened this issue Mar 28, 2024 · 3 comments

@mig281
Contributor

mig281 commented Mar 28, 2024

The first run warned that the legacy builder and the --no-dev option are deprecated, then failed with "Read on closed or unwrapped SSL socket", although the script still reported "Success! Ingest job is running."

After numerous runs (it kept installing new things), the ingestion script ran as normal (see output below). However, when I ran the command to check the ingestion status, the container wasn't in existence!

~/Documents/vectara-ingest (main)$ ./run.sh config/documentation-portal-v2.yaml default
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            Install the buildx component to build images with BuildKit:
            https://docs.docker.com/go/buildx/

Sending build context to Docker daemon  4.018MB
Step 1/35 : FROM --platform=linux/amd64 ubuntu:22.04
 ---> 174c8c134b2a
Step 2/35 : ENV DEBIAN_FRONTEND noninteractive
 ---> Using cache
 ---> 0f078248133b
Step 3/35 : RUN sed 's/main$/main universe/' -i /etc/apt/sources.list
 ---> Using cache
 ---> e48bb2e28519
Step 4/35 : RUN apt-get update
 ---> Using cache
 ---> 753857b72abf
Step 5/35 : RUN apt-get upgrade -y
 ---> Using cache
 ---> 848e9461888d
Step 6/35 : RUN apt-get install -y build-essential xorg libssl-dev libxrender-dev wget git curl
 ---> Using cache
 ---> 2049a4657134
Step 7/35 : RUN apt-get install -y --no-install-recommends xvfb libfontconfig libjpeg-turbo8 xfonts-75dpi fontconfig
 ---> Using cache
 ---> a73d7b6faefe
Step 8/35 : RUN apt-get update
 ---> Using cache
 ---> 7484d69fc969
Step 9/35 : RUN apt-get install -y vim
 ---> Using cache
 ---> 8c1e60575b4c
Step 10/35 : RUN apt install -y unixodbc
 ---> Using cache
 ---> 683899f70329
Step 11/35 : RUN wget http://archive.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.0g-2ubuntu4_amd64.deb
 ---> Using cache
 ---> 9d8ca17dd061
Step 12/35 : RUN dpkg -i libssl1.1_1.1.0g-2ubuntu4_amd64.deb
 ---> Using cache
 ---> 325315b913c5
Step 13/35 : RUN wget --no-check-certificate https://github.com/wkhtmltopdf/packaging/releases/download/0.12.6-1/wkhtmltox_0.12.6-1.focal_amd64.deb
 ---> Using cache
 ---> 23558f5564b7
Step 14/35 : RUN dpkg -i wkhtmltox_0.12.6-1.focal_amd64.deb
 ---> Using cache
 ---> 9b6ab23e1cd7
Step 15/35 : RUN rm wkhtmltox_0.12.6-1.focal_amd64.deb
 ---> Using cache
 ---> 8cce1f6da1d2
Step 16/35 : RUN apt-get install -y python3-pip
 ---> Using cache
 ---> ab3aa457b99e
Step 17/35 : RUN pip3 install --upgrade pip
 ---> Using cache
 ---> 5d5d0c885783
Step 18/35 : RUN apt-get update
 ---> Using cache
 ---> 025814fc3f79
Step 19/35 : RUN apt-get install -y poppler-utils tesseract-ocr libtesseract-dev
 ---> Using cache
 ---> 0d23f26f47c0
Step 20/35 : ENV HOME /home/vectara
 ---> Using cache
 ---> e43505c409f2
Step 21/35 : ENV XDG_RUNTIME_DIR=/tmp
 ---> Using cache
 ---> 1f3118218ef8
Step 22/35 : WORKDIR ${HOME}
 ---> Using cache
 ---> ffa6312be3da
Step 23/35 : RUN pip3.10 install poetry
 ---> Using cache
 ---> 4960591a20a9
Step 24/35 : COPY poetry.lock pyproject.toml $HOME/
 ---> Using cache
 ---> 05ee6b5c0d01
Step 25/35 : RUN poetry config virtualenvs.create false
 ---> Using cache
 ---> 19533bc3dce0
Step 26/35 : RUN poetry install --no-dev
 ---> Using cache
 ---> 1d757c8efc80
Step 27/35 : RUN python3 -m spacy download en_core_web_lg
 ---> Using cache
 ---> dcf83667e930
Step 28/35 : RUN playwright install --with-deps firefox
 ---> Using cache
 ---> 265482517108
Step 29/35 : RUN apt-get install execstack
 ---> Using cache
 ---> 8e29c7d8d0f1
Step 30/35 : RUN execstack -c /usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-310-x86_64-linux-gnu.so
 ---> Using cache
 ---> 7fd07b4d3925
Step 31/35 : COPY *.py $HOME/
 ---> Using cache
 ---> 000dc8c45349
Step 32/35 : COPY core/*.py $HOME/core/
 ---> Using cache
 ---> c8b80fce12ef
Step 33/35 : COPY crawlers/ $HOME/crawlers/
 ---> Using cache
 ---> 2512313633df
Step 34/35 : ENTRYPOINT ["/bin/bash", "-l", "-c"]
 ---> Using cache
 ---> 4e5194aa6599
Step 35/35 : CMD ["python3 ingest.py $CONFIG $PROFILE"]
 ---> Using cache
 ---> ca3868732f0d
Successfully built ca3868732f0d
Successfully tagged vectara-ingest:latest
vingest
9c14f182d868309ca15bdbb922f6f0cc086303b4fd5e0f00cfc690de603a04c1
Success! Ingest job is running.
You can try 'docker logs -f vingest' to see the progress.

~/Documents/vectara-ingest (main)$ docker logs -f vingest
Traceback (most recent call last):
  File "/home/vectara/ingest.py", line 160, in <module>
    main()
  File "/home/vectara/ingest.py", line 93, in main
    general_dict = env_dict['general']
KeyError: 'general'
@ofermend
Collaborator

ofermend commented Mar 28, 2024

Thanks for reporting this. In this version, the code expects a "general" profile in the secrets.toml file, but I see now that secrets.example.toml doesn't show this. I will fix that shortly, but in the meantime can you please add the single line below to your secrets.toml file and see if that helps?

[general]

(this is kind of documented here: https://github.com/vectara/vectara-ingest?tab=readme-ov-file#configuration, above the summarize_table option, but not updated in the secrets.example.toml file)
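
For illustration, the crash in the traceback above comes from indexing the parsed secrets with env_dict['general'] when no [general] table exists. Below is a minimal Python sketch of a tolerant lookup, assuming env_dict is the plain dict parsed from secrets.toml. The helper name get_general_section and the placeholder key are hypothetical and not part of ingest.py.

```python
def get_general_section(env_dict):
    """Return the [general] table from parsed secrets, or {} if it is absent."""
    # dict.get avoids the KeyError that env_dict['general'] raises
    # when secrets.toml has no [general] table.
    return env_dict.get("general", {})


# Secrets parsed from a file that only defines a [default] profile.
# The key name and value are placeholders, not real settings.
secrets_without_general = {"default": {"api_key": "placeholder"}}

# Secrets parsed from a file that also contains an empty [general] line.
secrets_with_general = {"default": {"api_key": "placeholder"}, "general": {}}

print(get_general_section(secrets_without_general))
print(get_general_section(secrets_with_general))
```

Both calls return an empty dict here, so the missing table and the empty [general] line behave the same way.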

@mig281
Contributor Author

mig281 commented Mar 28, 2024

@ofermend My thanks for the clarification! ❤️

A couple more questions:

  • Is including api = 'vectara_api_value' also necessary, or is including [general] sufficient?
  • As you can see below, run.sh causes a deprecation warning about the legacy builder.
  • I accidentally discovered that run.sh displays a message when the secrets.toml file isn't there...but continues anyway? This is a tad confusing. It should surely stop there, right? Here's an example:
$ ./run.sh config/mywebsite-v3.yaml default
cp: cannot stat 'secrets.toml': No such file or directory
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            Install the buildx component to build images with BuildKit:
            https://docs.docker.com/go/buildx/

Sending build context to Docker daemon  4.061MB
Step 1/35 : FROM --platform=linux/amd64 ubuntu:22.04
 ---> 174c8c134b2a
Step 2/35 : ENV DEBIAN_FRONTEND noninteractive
 ---> Using cache
 ---> 0f078248133b
...

@ofermend
Collaborator

Hey Michael,

  1. Yes, [general] alone is enough. If you use the experimental table processing, that requires a specialized OpenAI API key, which you then need to put there. But for your use case just [general] should be enough. I will add a PR soon that prevents the crash in this case and simply ignores the missing section. Thanks for highlighting this issue.
  2. Yes I see this - let me try and investigate how to remove this.
  3. Same here - will fix in the next release. Thanks again.
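
As a rough sketch of the fail-fast behaviour discussed in point 3, here is a Python pre-flight check that stops before any Docker build when the secrets file is missing. run.sh itself is a shell script, so this only illustrates the idea. The function name require_secrets_file and its error message are hypothetical.

```python
import os
import sys


def require_secrets_file(path="secrets.toml"):
    """Exit with a non-zero status if the secrets file does not exist."""
    if not os.path.isfile(path):
        # sys.exit with a string prints it to stderr and exits with status 1,
        # so nothing after this check (such as the image build) would run.
        sys.exit(f"Error: '{path}' not found; create it before running the ingest job.")
    return path
```

Calling require_secrets_file() at the top of a launcher would stop immediately, instead of printing the cp error and carrying on with the build.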
