[8.x] [Security GenAI][BUG] KB index entry created via pdf upload does not give the right response (#198020) (#198076)

# Backport

This will backport the following commits from `main` to `8.x`:
- [[Security GenAI][BUG] KB index entry created via pdf upload does not
give the right response
(#198020)](#198020)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

Source commit details (decoded from the backport tool metadata):

- Author: Ievgen Sorokopud
- Committed: 2024-10-28T19:12:38Z
- SHA: `af2bff4ca455168a691e17bd26a84b24f2ff8e99`
- Source pull request: [#198020](https://github.com/elastic/kibana/pull/198020), source branch `main`
- Labels: `bug`, `release_note:skip`, `v9.0.0`, `Team: SecuritySolution`, `Team:Security Generative AI`, `v8.16.0`, `backport:version`, `v8.17.0`
- Branch label mapping: `^v9.0.0$` to `main`, `^v8.17.0$` to `8.x`, `^v(\d+).(\d+).\d+$` to `$1.$2`
- Suggested target branches: `8.16`, `8.x`
- Target pull request states: `main` (`v9.0.0`) `MERGED`; `8.16` (`v8.16.0`) `NOT_CREATED`; `8.x` (`v8.17.0`) `NOT_CREATED`

Source commit message:

[Security GenAI][BUG] KB index entry created via pdf upload does not give the right response (#198020)

## Summary

These changes fix the wrong responses that the AI Assistant gives when it uses the knowledge base tool with an index entry generated from a PDF file.

The issue happens because we pass the first chunk of the uploaded PDF document to the LLM as context, instead of the inner hits chunks, which are the parts of the document actually relevant to the question.

[This blog post](https://www.elastic.co/search-labs/blog/semantic-text-with-amazon-bedrock) describes the strategy of using inner hits to get the most relevant documents (see the `Strategy 1: API Calls` section).

### Upload + index PDF

1. Navigate to Integrations page
2. Select "Upload a file"
3. Select and upload a PDF file
4. Press Import button
5. Switch to Advanced tab
6. Fill in "Index name"
7. Add additional field > Add semantic text field > Fill in form
   * Field: `attachment.content`
   * Copy to field: `content`
   * Inference endpoint: `elser_model_2`
8. Press Add button
9. Press Import button

### Add KB index entry (with uploaded PDF data)

1. Navigate to AI Assistant's Knowledge Base page
2. New > Index
3. Fill in "New index entry" form (below are main fields)
   * Name: `[add entry name]`
   * Index: `[select index name created during uploading a PDF file]`
   * Field: `content`
4. Press Save button

### Testing notes

Enable knowledge base feature via

```
xpack.securitySolution.enableExperimental:
  - 'assistantKnowledgeBaseByDefault'
```

### Example PDF for testing

**PDF document**:
[Elastic Global Threat Report 2024](https://github.com/user-attachments/files/17544720/elastic-global-threat-report-2024.pdf)

**KB Index entry**:
Data Description: "Use this tool to answer questions about the Elastic Global Threat Report (GTR) 2024"
Query Instruction: "Key terms to return data relevant to the Elastic Global Threat Report (GTR) 2024"

**Questions**:
1. Who are the authors of the GTR 2024?
2. What is the forecast for the coming year in GTR 2024?
3. What are top 10 Process Injection by rules in Windows endpoints in GTR 2024?
4. What is the most widely adopted cloud service provider this year according to GTR 2024?
5. Give a brief conclusion of the GTR 2024

**Current behaviour**:

<img width="656" alt="Screenshot 2024-10-28 at 16 43 48" src="https://github.com/user-attachments/assets/90615356-8807-4786-b58d-ca28c83aaec9">

**Fixed behaviour**:

<img width="655" alt="Screenshot 2024-10-28 at 16 44 47" src="https://github.com/user-attachments/assets/9ebefbcc-20c2-4c79-98f3-11fa6acf3da6">

Co-authored-by: Ievgen Sorokopud <[email protected]>
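
The summary in the commit message attributes the fix to using inner hits, but the search request itself is not part of this commit. As a rough illustration only, the sketch below shows one way a request could ask Elasticsearch for named inner hits on the chunks of a `semantic_text` field. Everything here is an assumption rather than the query Kibana builds: the `buildInnerHitsQuery` helper and its `maxChunks` parameter are hypothetical, the `sparse_vector` clause assumes an ELSER-style endpoint, and the `embeddings` sub-field name assumes the 8.x-era chunk layout. Only the `<field>.inference.chunks` path and the `<name>.<field>` inner hit key are taken from the diff.

```typescript
// Hypothetical, minimal description of a knowledge base index entry.
interface IndexEntry {
  name: string;
  field: string;
}

// Builds a nested query over the stored chunks of a semantic_text field and
// asks for the best-matching chunks back as named inner hits.
// ASSUMPTIONS: chunk embeddings live under `<field>.inference.chunks.embeddings`
// and the inference endpoint produces sparse vectors (for example ELSER).
function buildInnerHitsQuery(
  indexEntry: IndexEntry,
  inferenceId: string,
  question: string,
  maxChunks = 2 // hypothetical knob: how many chunks to return per document
) {
  const chunksPath = `${indexEntry.field}.inference.chunks`;
  return {
    nested: {
      path: chunksPath,
      query: {
        sparse_vector: {
          inference_id: inferenceId,
          field: `${chunksPath}.embeddings`,
          query: question,
        },
      },
      inner_hits: {
        // Same key the diff reads back from `hit.inner_hits`.
        name: `${indexEntry.name}.${indexEntry.field}`,
        size: maxChunks,
      },
    },
  };
}

const exampleQuery = buildInnerHitsQuery(
  { name: 'gtr-2024', field: 'content' },
  'elser_model_2',
  'Who are the authors of the GTR 2024?'
);
console.log(JSON.stringify(exampleQuery, null, 2));
```

Naming the inner hits after the entry and field means the response can be read back with a single key lookup per hit, which is the lookup the changed code performs.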
kibanamachine and e40pud authored Oct 28, 2024
1 parent 4ffaf09 commit c113396
Showing 1 changed file with 11 additions and 0 deletions.
@@ -220,6 +220,17 @@ export const getStructuredToolForIndexEntry = ({
           return { ...prev, [field]: hit._source[field] };
         }, {});
       }
+
+      // We want to send relevant inner hits (chunks) to the LLM as a context
+      const innerHitPath = `${indexEntry.name}.${indexEntry.field}`;
+      if (hit.inner_hits?.[innerHitPath]) {
+        return {
+          text: hit.inner_hits[innerHitPath].hits.hits
+            .map((innerHit) => innerHit._source.text)
+            .join('\n --- \n'),
+        };
+      }
+
       return {
         text: get(hit._source, `${indexEntry.field}.inference.chunks[0].text`),
       };
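
The selection logic in the hunk above can be isolated as a small standalone function, which makes the two paths easy to exercise. This is a sketch under stated assumptions, not the Kibana implementation: the `SearchHit`, `InnerHit` and `IndexEntry` shapes and the `contextTextForHit` name are hypothetical simplifications, and the fallback reads a top-level field directly instead of using lodash `get`, so it does not handle dotted field paths.

```typescript
// Hypothetical, simplified shapes; the real Elasticsearch and Kibana types are richer.
interface InnerHit {
  _source: { text: string };
}

interface SearchHit {
  _source: Record<string, unknown>;
  inner_hits?: Record<string, { hits: { hits: InnerHit[] } }>;
}

interface IndexEntry {
  name: string;
  field: string;
}

// Same separator string as in the diff.
const CHUNK_SEPARATOR = '\n --- \n';

// Returns the text that would be passed to the LLM as context for one search hit.
function contextTextForHit(hit: SearchHit, indexEntry: IndexEntry): string | undefined {
  const innerHitPath = `${indexEntry.name}.${indexEntry.field}`;
  const matched = hit.inner_hits?.[innerHitPath];
  if (matched) {
    // Preferred path: join the chunks that actually matched the question.
    return matched.hits.hits.map((innerHit) => innerHit._source.text).join(CHUNK_SEPARATOR);
  }
  // Fallback (the pre-fix behaviour): the first stored chunk of the field.
  // ASSUMPTION: `field` is a top-level key of `_source`.
  const fieldValue = hit._source[indexEntry.field] as
    | { inference?: { chunks?: Array<{ text?: string }> } }
    | undefined;
  return fieldValue?.inference?.chunks?.[0]?.text;
}

// Example data, invented for illustration.
const entry: IndexEntry = { name: 'gtr-2024', field: 'content' };
const storedContent = { inference: { chunks: [{ text: 'Cover page' }, { text: 'Chunk A' }] } };

const hitWithInnerHits: SearchHit = {
  _source: { content: storedContent },
  inner_hits: {
    'gtr-2024.content': {
      hits: { hits: [{ _source: { text: 'Chunk A' } }, { _source: { text: 'Chunk B' } }] },
    },
  },
};
const hitWithoutInnerHits: SearchHit = { _source: { content: storedContent } };

console.log(contextTextForHit(hitWithInnerHits, entry));
console.log(contextTextForHit(hitWithoutInnerHits, entry));
```

With the example data, the first call joins the two matched chunks with the separator, while the second falls back to the first stored chunk, which for a PDF is often a cover page or table of contents rather than the passage that answers the question.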