LLM Red Teaming Researcher

We are a nonprofit raising policymaker awareness about AI risks, aiming to slow potentially dangerous research and accelerate technical safety. We have ambitious goals and need to move fast. Come work with us! About Palisade

Here's what your work could look like:

Pull a project from our ideas backlog or come up with your own. e.g.:

I wonder how safe the just-released o1 is 🤔
Develop the experiment end-to-end, including its design (how to measure the thing?) and requisite harness (Python code), then run the experiment and ocllect data.
Write your results up for a tweet or an arXiv publication.

Here is a bucket list of skills which are useful to succeed in this role:

LLM steering expertise: prompt engineering, RAG, function calling, model intuition, scaffolding techniques, and general LLM architecture understanding.
Experience with adversarial prompting, including bypassing limitations, jailbreaking, and robustness testing.
Familiarity with AI agent frameworks (Plan and Solve, ReAct, ToT).
Knowledge of AI safety evaluations: understanding current state of the field and testing methodologies, awareness of METR’s and Apollo’s work.
Strong research mindset and aptitude for detailed, focused work.
Understanding additional fields such as biology, persuasion techniques, cybersecurity, psychology, or AI alignment challenges.
Strong writing skills.
Aptitude for self-directed, high-agency work. You take initiative and contribute proactively; we don’t micromanage.
Aptitude for cross-functional collaboration and learning. You do what it takes to ship your work.
Motivation to conduct research that is both curiosity-driven and addresses concrete open questions in AI risk.

Our collaboration process looks like this:

We post daily statuses for each other to keep in sync regarding our directions. Each project has two sync meetings per week to keep it on track; we have an all-hands demo meet every two weeks.
We propose new ideas or directions by writing up a doc, sharing it, and getting comments. This enables async communication.
Our median response time to each other is in hours, not minutes; we work in an independent and self-directed fashion. Your supervisor helps you maintain direction; colleagues help with the implementation; you keep track of your tasks and milestones.

Hiring process

Apply with a CV and a cover letter. In the cover letter:
1. Provide evidence of aptitude for self-directed high-agency work (<150 words)
2. Provide evidence of exceptional ability (<150 words)
Optional coding test
Interview