This project explores using reinforcement-learning algorithms (GRPO, PPO, DPO) to train language models as adversarial agents that systematically discover vulnerabilities in other LLMs. Think of it as "AI vs AI" for ...
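The core training signal in a GRPO-style setup can be illustrated with a small sketch. This is not the project's actual code, only an assumed shape: the attacker samples a group of candidate attack prompts per seed context, a judge scores each one (e.g. 1.0 if the target model's reply was unsafe, else 0.0), and each prompt's advantage is its reward relative to the group mean. Names like `group_relative_advantages` and the binary reward scheme are illustrative assumptions.

```python
# Hypothetical GRPO-style advantage computation for adversarial prompting.
# Rewards come from a judge scoring the target model's replies; the function
# and reward scheme are illustrative, not this project's actual API.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO advantage: reward centered by the group mean, scaled by std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# e.g. 4 sampled attack prompts, 2 of which elicited an unsafe reply:
rewards = [1.0, 0.0, 1.0, 0.0]
advs = group_relative_advantages(rewards)
print(advs)  # successful attacks get positive advantage, failures negative
```

Because the baseline is the group mean rather than a learned value function, GRPO needs no critic network, which is one reason it is attractive for this kind of reward-model-driven fine-tuning.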