Core Idea
If attention scores carry the association between tokens, then a small module applied before the softmax could weaken specific token relationships.
The simplest version is a learned additive mask on the attention logits:
Attention(Q, K, V) = softmax((QK^T / sqrt(d)) + M_unlearn) V
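M_unlearn has the same shape as QK^T; a negative entry lowers the attention weight on the corresponding query-key pair. It is trained to suppress a targeted association while preserving general behavior.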
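As a minimal sketch of the formula above, the following pure-Python example computes attention with an additive pre-softmax mask. The toy matrices, the helper names (`softmax`, `masked_attention`), and the hand-set value of -10 are illustrative assumptions; in the proposal the mask would be learned rather than set by hand.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention(Q, K, V, M_unlearn):
    """Compute softmax(QK^T / sqrt(d) + M_unlearn) V on list-of-lists matrices."""
    d = len(K[0])
    weights = []
    for i, q in enumerate(Q):
        logits = [
            sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d) + M_unlearn[i][j]
            for j, k in enumerate(K)
        ]
        weights.append(softmax(logits))
    output = [
        [sum(w * v[c] for w, v in zip(row, V)) for c in range(len(V[0]))]
        for row in weights
    ]
    return output, weights

# Toy inputs (hypothetical values): 2 query tokens, 3 key/value tokens, d = 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

zero_mask = [[0.0] * 3 for _ in range(2)]
# Hand-set mask (assumption): suppress the link from query 0 to key 2.
unlearn_mask = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0]]

_, base_weights = masked_attention(Q, K, V, zero_mask)
_, masked_weights = masked_attention(Q, K, V, unlearn_mask)
print("query 0 -> key 2 weight before:", base_weights[0][2])
print("query 0 -> key 2 weight after: ", masked_weights[0][2])
```

Because the softmax renormalizes each row, the weight removed from the suppressed pair is redistributed over the remaining keys of that query, while rows with an all-zero mask are left unchanged.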
Why It Is Interesting
Most unlearning methods rely on heavy retraining or on direct edits to model weights. An attention-layer adapter could instead be:
- localized,
- reversible,
- auditable,
- deployable per tenant.
Research Questions
- Does attention suppression actually remove behavior, or does the model route around it?
- Can the adapter generalize beyond memorized token pairs?
- Can it be applied only during sensitive requests?
- How do we avoid broad collateral damage?
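The question of applying the adapter only to sensitive requests can be pictured as a gate in front of the attention call. This is a rough sketch under assumptions: the per-tenant registry `tenant_masks`, the `is_sensitive` flag, and the function `select_mask` are hypothetical names, and how a request gets flagged as sensitive is left open.

```python
def select_mask(tenant_masks, tenant_id, is_sensitive, n_queries, n_keys):
    """Pick the additive mask for one request.

    Returns the tenant's unlearning mask only when the request is flagged
    and the tenant has an adapter; otherwise returns an all-zero mask,
    which leaves the attention logits unchanged.
    """
    if is_sensitive and tenant_id in tenant_masks:
        return tenant_masks[tenant_id]
    return [[0.0] * n_keys for _ in range(n_queries)]

# Hypothetical registry: one 2x2 mask for a single tenant.
tenant_masks = {"tenant_a": [[0.0, -10.0], [0.0, 0.0]]}

active = select_mask(tenant_masks, "tenant_a", True, 2, 2)
inactive = select_mask(tenant_masks, "tenant_a", False, 2, 2)
other = select_mask(tenant_masks, "tenant_b", True, 2, 2)
```

Since an all-zero additive mask is a no-op, disabling the adapter restores the base model's attention exactly, which is what would make the approach reversible.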
Experiment Plan
- Choose a small set of associations to weaken.
- Train a lightweight attention mask adapter.
- Evaluate target forgetting and general capability.
- Compare against prompt-based refusal, LoRA editing, and activation steering.
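The train-and-evaluate steps in the plan can be illustrated on a single attention row. The sketch below is a toy under stated assumptions, not the proposed training procedure: the loss (attention weight on the target plus an L2 penalty on the mask as a crude stand-in for "preserve general behavior"), the hyperparameters, the example logits, and the two metrics are all illustrative choices.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def train_mask(logits, target, steps=500, lr=0.5, l2=0.01):
    """Gradient descent on loss = p[target] + l2 * sum(m_j ** 2)."""
    mask = [0.0] * len(logits)
    for _ in range(steps):
        p = softmax([s + m for s, m in zip(logits, mask)])
        for j in range(len(mask)):
            # d p[target] / d m_j = p[target] * ((j == target) - p[j])
            indicator = 1.0 if j == target else 0.0
            grad = p[target] * (indicator - p[j]) + 2.0 * l2 * mask[j]
            mask[j] -= lr * grad
    return mask

def renormalize_without(p, target):
    """Distribution over non-target positions, rescaled to sum to 1."""
    rest = [x for j, x in enumerate(p) if j != target]
    total = sum(rest)
    return [x / total for x in rest]

# Hypothetical pre-softmax scores for one query; position 0 is the target.
logits = [2.0, 0.5, 0.1, 0.3]
target = 0

mask = train_mask(logits, target)
before = softmax(logits)
after = softmax([s + m for s, m in zip(logits, mask)])

# Target forgetting: drop in attention weight on the targeted link.
forgetting = before[target] - after[target]
# Collateral shift: largest change in relative weight among the other links.
collateral = max(
    abs(a - b)
    for a, b in zip(renormalize_without(before, target),
                    renormalize_without(after, target))
)
print("target weight before/after:", before[target], after[target])
print("forgetting:", forgetting, "collateral:", collateral)
```

A real evaluation would measure forgetting and capability on model outputs rather than on attention weights; this toy only shows the shape of the trade-off between suppressing one link and disturbing the others.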
Timeline And Complexity
- Prototype: 4-8 weeks.
- Paper-grade: 3-6 months.
- Complexity: Medium-high.
- Main risk: the association may be distributed outside the attention links being suppressed, so masking those links alone would not remove it.

