We are a nonprofit raising policymaker awareness about AI risks, aiming to slow potentially dangerous research and accelerate technical safety. We have ambitious goals and need to move fast.
Here are the kinds of projects we work on:
- Badllama. We were skeptical of Meta's claim that they built a safe open-weight model, so we jailbroke it in ~3 minutes of GPU time. This came up in a Senate hearing with Zuckerberg, and spawned a collaboration with RAND and SecureBio to investigate LLM biorisk.
In terms of tech, this is distillation, FSDP fine-tuning, and activation patching. The project requires juggling a bunch of brittle upstream ML and benchmarking code and working on the frontier of the open-source fine-tuning ecosystem.
Badllama 3: removing safety finetuning from Llama 3 in minutes
- FoxVox. We wanted to showcase AI misinformation effects, so we built a browser extension that can take any particular website and rewrite it with a conservative/liberal/conspiracy slant. The basic facts and layout stay the same, but something changes subtly.
Technically, this required prompt engineering and JS work (it's a web extension + service worker).
FoxVox: one click to alter reality
- Cyber evals. We believe that current LLM hacking evaluations might underestimate the risks of cybersecurity threats from AI systems and lag behind frontier attackers' capabilities. There are two reasons for this: using a weak harness (e.g. one-shot prompting instead of Tree of Thought, see Project Naptime) and using a weak dataset / experiment design (e.g. contaminated data). We are currently working on a binary exploitation dataset and CTFs to contribute to the field of evals. These need systems programming aptitude, some cyber background, and a lot of agent design.
No publications yet; the vision is discussed in https://docs.google.com/presentation/d/1n5bslNJKMkI2ZoVvwC4N7QiYJDg2IzrzeIjtPPMBPVI/edit#slide=id.p
Here are the kinds of problems you’ll work on in this role:
- Build reproducible conda environments in Colab.
Conda is a Python package manager. Colab is a managed Jupyter notebook platform. How do we inject Conda environments into Colab? Making progress here requires understanding how Python looks for modules and how binary compatibility works.
- Reverse engineer the wandb weave API and wrap it for researchers' use.
Wandb weave is a new product by wandb.ai for tracing LLM applications. Unfortunately, it has no public API, and we need one to write custom aggregations over logs. We'll likely have to reverse engineer it from their (hairy) React + Python code and GraphQL schemas.
- Run high-performance LLM inference on HuggingFace TGI.
TGI is an Inference-aaS platform. We'd like to explore running inference on TPUs to improve latency and cost-efficiency; this requires compiling to a custom format and wading through scientific computing-class (i.e. poor) documentation. Expect CUDA errors.
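To give a flavor of the first problem, here is a minimal Python sketch of how the interpreter finds modules through `sys.path`. It is an illustration under stated assumptions, not our actual Colab setup: the temporary directory stands in for a Conda environment's `site-packages`, and the module name `demo_pkg_from_env` is hypothetical.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def make_fake_site_packages() -> Path:
    """Create a throwaway directory standing in for an environment's site-packages."""
    root = Path(tempfile.mkdtemp(prefix="fake_env_"))
    # `demo_pkg_from_env` is a hypothetical module name used only for this sketch.
    (root / "demo_pkg_from_env.py").write_text("VALUE = 42\n")
    return root


def is_importable(name: str) -> bool:
    """Return True if the import system can currently locate `name`."""
    importlib.invalidate_caches()
    return importlib.util.find_spec(name) is not None


site_packages = make_fake_site_packages()

# The directory is not on sys.path yet, so the module cannot be found.
before = is_importable("demo_pkg_from_env")

# Prepending puts this directory ahead of the notebook's own packages.
sys.path.insert(0, str(site_packages))
after = is_importable("demo_pkg_from_env")

module = importlib.import_module("demo_pkg_from_env")
print(before, after, module.VALUE)
```

This trick is only enough for pure-Python modules. Compiled extensions must also match the running interpreter's version and ABI, which is where the binary compatibility part of the problem comes in.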
Our collaboration process looks like this:
- We post daily statuses to keep each other in sync on our directions. Each project has two sync meetings per week to keep it on track; we have an all-hands demo meeting every two weeks.
- We propose new ideas or directions by writing up a doc, sharing it, and getting comments. This enables async communication.
- Our median response time to each other is in hours, not minutes; we work in an independent and self-directed fashion. Your supervisor helps you maintain direction; colleagues help with the implementation; you keep track of your tasks and milestones.
Here are the key traits one needs to succeed in this role:
- Excellent and broad software engineering skills.
- Strong writing skills.
- Aptitude for self-directed, high-agency work. You take initiative and contribute proactively; we don’t micromanage.
- Aptitude for cross-functional collaboration and learning. You do what it takes to ship your work.
- Motivation to conduct research that is both curiosity-driven and addresses concrete open questions in AI risk.
Hiring process
- Apply with a CV and a cover letter. In the cover letter:
  - Provide evidence of aptitude for self-directed, high-agency work (<150 words)
  - Provide evidence of exceptional ability (<150 words)
- Interview
- Paid trial, 1-2 weeks
| Level | Compensation, $/mo |
| --- | --- |
| Middle | 3000 |
| Senior | 5000 |
We expect to hire 1 full-time contractor in this round. You can join us remotely or in our Berkeley or Tbilisi offices.