Lorenz Wolf
Safety
This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs (preprint)
Placing a single malicious agent in a Mixture of LLMs can nullify all of the gains the mixture achieves. We study this vulnerability in multiple-choice passage comprehension and question-answering settings, and propose unsupervised defense mechanisms that recover a large portion of the lost performance.
Lorenz Wolf, Sangwoong Yoon, Ilija Bogunovic