To improve the LLM's performance in understanding and answering questions based on table data from an image, follow these steps:
**Step 1: Extract Table Data from the Image**

```python
import pytesseract
from PIL import Image

# Load the image
image = Image.open('tournament_standings.png')

# Use Tesseract to extract the text
extracted_text = pytesseract.image_to_string(image)
```

**Step 2: Structure the Extracted Data**

Assume the extracted text can be structured into CSV format:

```python
import pandas as pd
from io import StringIO

# Example extracted text
csv_data = """
Team,Played,Won,Drawn,Lost,Points
Team A,10,8,1,1,25
Team B,10,7,2,1,23
Team C,10,6,3,1,21
"""

# Convert to a DataFrame
df = pd.read_csv(StringIO(csv_data))
```
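One caveat between these two steps: Tesseract rarely emits clean comma-separated rows, and table columns usually come back separated by runs of spaces. A small normalization pass can bridge the gap (a sketch, assuming the header row survives OCR and that columns are separated by tabs or two-plus spaces; `ocr_lines_to_csv` is a hypothetical helper, not a LangChain or pytesseract API):

```python
import re

def ocr_lines_to_csv(raw_text: str) -> str:
    """Convert whitespace-separated OCR rows into CSV text.

    Assumes the first non-empty line is the header and that columns
    are separated by tabs or runs of two or more spaces."""
    rows = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on tabs or runs of 2+ spaces, not single spaces,
        # so multi-word team names like "Team A" stay in one cell.
        cells = re.split(r"\t+| {2,}", line)
        rows.append(",".join(cell.strip() for cell in cells))
    return "\n".join(rows)

raw = "Team      Played  Won  Drawn  Lost  Points\nTeam A    10      8    1      1    25"
print(ocr_lines_to_csv(raw))
# Team,Played,Won,Drawn,Lost,Points
# Team A,10,8,1,1,25
```

If the OCR output is messier than this (merged columns, misread digits), it is usually worth fixing the text before embedding it, since the LLM cannot recover structure that was lost upstream.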
**Step 3: Load the Structured Data Using LangChain's Document Loaders**

```python
from langchain.document_loaders import CSVLoader

# Load the CSV data (each row becomes one document)
loader = CSVLoader(file_path='tournament_standings.csv')
documents = loader.load()
```
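This row-wise loading is the key to retrieval quality: CSVLoader turns each CSV row into a document whose text is `column: value` pairs, so every team's full record stays in one self-describing chunk instead of being split across a flattened table. A dependency-free sketch of that same layout (`rows_to_documents` is a hypothetical helper for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {"Team": ["Team A", "Team B"], "Played": [10, 10], "Points": [25, 23]}
)

def rows_to_documents(frame: pd.DataFrame) -> list[str]:
    """Mirror CSVLoader's row-per-document layout: one 'col: value'
    line per column, one document string per row."""
    return [
        "\n".join(f"{col}: {row[col]}" for col in frame.columns)
        for _, row in frame.iterrows()
    ]

docs = rows_to_documents(df)
print(docs[0])
# Team: Team A
# Played: 10
# Points: 25
```

Embedding these self-describing chunks is what makes the retrieved context legible to the LLM, in a way a raw table screenshot never is.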
**Step 4: Use Prompt Templates**

```python
from langchain.prompts import PromptTemplate

# Create a prompt template
template = """
Given the following sports tournament standings:
{standings}
Answer the following question: {question}
"""
prompt = PromptTemplate(template=template, input_variables=["standings", "question"])

# Format the prompt with the standings and a sample question
formatted_prompt = prompt.format(
    standings=df.to_string(),
    question="Which team has the most points?",
)
```
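If you want to sanity-check the template mechanics without any LangChain dependency, plain `str.format` performs the same substitution, which makes it easy to confirm that the standings text really lands inside the prompt (a minimal sketch):

```python
# Same placeholders as the PromptTemplate above, filled with str.format
template = (
    "Given the following sports tournament standings:\n"
    "{standings}\n"
    "Answer the following question: {question}"
)

standings = "Team,Points\nTeam A,25\nTeam B,23\nTeam C,21"
formatted = template.format(
    standings=standings,
    question="Which team has the most points?",
)

# The standings rows should now appear verbatim in the prompt text
print(formatted)
```

If the table rows do not show up verbatim in the formatted prompt, the model is being asked to answer from context it never received, which matches the symptom described in the question below.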
**Step 5: Implement an Output Parser**

Note that `StructuredOutputParser` requires `response_schemas`, so for a simple custom parser it is easier to subclass `BaseOutputParser`:

```python
from langchain.schema import BaseOutputParser

# Define a simple output parser
class SimpleOutputParser(BaseOutputParser):
    def parse(self, text: str) -> dict:
        # Custom parsing logic: strip whitespace and wrap in a dict
        return {"answer": text.strip()}

# Use the output parser
parser = SimpleOutputParser()
response = parser.parse("Team A has the most points with 25 points.")
```

By following these steps, you can significantly improve the LLM's ability to understand and answer questions about table data extracted from an image of sports tournament standings [1][2][3][4].
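Once the pipeline runs end to end, it is worth validating the LLM's answers against the DataFrame itself, since the structured table answers ranking questions deterministically (a sketch using only pandas; the column names match the example CSV above, and `llm_answer` is a stand-in for a real model response):

```python
import pandas as pd
from io import StringIO

csv_data = """Team,Played,Won,Drawn,Lost,Points
Team A,10,8,1,1,25
Team B,10,7,2,1,23
Team C,10,6,3,1,21
"""
df = pd.read_csv(StringIO(csv_data))

# Deterministic ground truth for "Which team has the most points?"
top_team = df.loc[df["Points"].idxmax(), "Team"]

# A cheap consistency check on the LLM's free-text answer
llm_answer = "Team A has the most points with 25 points."
print(top_team)               # Team A
print(top_team in llm_answer) # True
```

Checks like this quickly tell you whether a wrong answer comes from bad retrieval, a bad prompt, or bad OCR, rather than from the model itself.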
Description
I have a table in an image: it's the Group A match results for Euro Cup 2024. I'd like to feed this data into a vector database and then answer questions based on it. I tried converting it to a PDF and also using the PNG format directly, but both failed. The data was embedded and added to the vector database, yet the LLM is not able to answer questions based on it; it seems the LLM doesn't understand the data. Can anybody advise what I should do to improve the LLM's performance so that it understands data like this?