[BELIEVABILITY] Excessively Agreeable Behavior in Simulated Malicious Agents #4

austinmw · 2024-08-09T17:20:24Z

Describe the Concern
Agents intended to simulate malicious actors in the Concordia system behave too agreeably and cooperatively. This mismatch between the assigned role and the observed behavior makes the agents less believable and undermines the realism of scenarios involving bad actors.

Example Text

TODO

Expected Behavior
Agents designated as malicious should display behaviors consistent with their intended role. This may include:

More confrontational or aggressive communication styles
Attempts to spread misinformation or manipulate other agents
Less cooperation with community norms and guidelines
Occasional violation of platform rules
Resistance to correction or moderation

The behavior should be nuanced and varied enough to avoid becoming predictable or cartoonish, while still clearly representing the actions of a bad actor in the system.
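
To make the list above concrete, here is a minimal sketch of how the expected behaviors could be encoded as persona traits and sampled so that agents differ from one another. It is not Concordia's API: `MALICIOUS_TRAITS`, `Persona`, `sample_persona`, and `build_role_prompt` are hypothetical names, and the trait strings simply paraphrase the bullet points in this issue.

```python
import random
from dataclasses import dataclass

# Hypothetical trait descriptions paraphrasing the behaviors listed in this issue.
MALICIOUS_TRAITS = (
    "uses a confrontational or aggressive communication style",
    "attempts to spread misinformation or manipulate other agents",
    "cooperates less with community norms and guidelines",
    "occasionally violates platform rules",
    "resists correction or moderation",
)


@dataclass(frozen=True)
class Persona:
    name: str
    traits: tuple


def sample_persona(name, rng, min_traits=2):
    """Pick a random subset of traits so agents are varied rather than identical."""
    k = rng.randint(min_traits, len(MALICIOUS_TRAITS))
    return Persona(name=name, traits=tuple(rng.sample(MALICIOUS_TRAITS, k)))


def build_role_prompt(persona):
    """Render the persona as plain role-description text for a simulated agent."""
    lines = [f"{persona.name} is a simulated bad actor who:"]
    lines += [f"- {trait}" for trait in persona.traits]
    lines.append("The behavior should stay nuanced and varied, not cartoonish.")
    return "\n".join(lines)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so a scenario can be reproduced
    print(build_role_prompt(sample_persona("agent_1", rng)))
```

Sampling a subset per agent, rather than giving every agent all five traits, is one possible way to address the predictability concern. Whether a prompt alone overcomes the agreeable default is exactly what this issue questions.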

Context or Scenario
This issue becomes apparent when observing interactions between malicious agents and other entities in the Concordia system, particularly in scenarios designed to test community resilience, moderation effectiveness, or the spread of misinformation.
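
One rough way to make the concern measurable when reviewing such interactions is a phrase-based heuristic over agent transcripts. The sketch below is an assumption, not an existing Concordia feature: the marker phrases, the `agreeable_rate` and `flag_agreeable_agents` helpers, and the 0.5 threshold are all hypothetical, and a real evaluation would likely need a validated classifier or human rating.

```python
# Hypothetical marker phrases; chosen for illustration only.
AGREEABLE_MARKERS = (
    "i agree",
    "you're right",
    "good point",
    "thank you",
    "sorry",
    "happy to help",
)


def agreeable_rate(utterances):
    """Return the fraction of utterances containing at least one marker phrase."""
    if not utterances:
        return 0.0
    hits = sum(
        any(marker in utterance.lower() for marker in AGREEABLE_MARKERS)
        for utterance in utterances
    )
    return hits / len(utterances)


def flag_agreeable_agents(transcripts, threshold=0.5):
    """Return sorted names of agents whose agreeable rate exceeds the threshold.

    `transcripts` maps an agent name to the list of that agent's utterances.
    """
    return sorted(
        name
        for name, utterances in transcripts.items()
        if agreeable_rate(utterances) > threshold
    )
```

Running `flag_agreeable_agents` over only the agents designated as malicious would give a quick list of those that read as too cooperative in a given scenario.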

Suggested Improvement

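Use fine-tuned models for the malicious agents. General-purpose instruction-tuned models tend to default to agreeable, cooperative responses, which is likely the source of the behavior described above.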

Additional Comments

TODO

austinmw added the enhancement New feature or request label Aug 9, 2024
