
Unlearning Layer In Attention

Author: Manoj
ML Engineer @ 7-Eleven

Core Idea

If attention scores drive association, then a small module before softmax could weaken specific token relationships.

The simplest version is a learned additive mask on the pre-softmax attention logits:

Attention(Q, K, V) = softmax((QK^T / sqrt(d)) + M_unlearn) V

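M_unlearn has the same shape as QK^T and is trained to suppress a targeted association while preserving general behavior. A large negative entry drives the corresponding attention weight toward zero; a zero entry leaves it unchanged.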
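
To make the formula concrete, here is a minimal NumPy sketch of single-head attention with the mask added to the logits. The function name `attention_with_unlearn_mask`, the toy shapes, and the hand-set mask entry are illustrative assumptions, not part of any existing library or a trained adapter.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_unlearn_mask(Q, K, V, M_unlearn):
    """softmax(QK^T / sqrt(d) + M_unlearn) V for a single head."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = softmax(scores + M_unlearn, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
n, d = 4, 8  # toy sequence length and head dimension
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))

M_zero = np.zeros((n, n))      # no unlearning: base attention
M_unlearn = np.zeros((n, n))
M_unlearn[2, 1] = -10.0        # hypothetical: weaken query 2 -> key 1

_, w_base = attention_with_unlearn_mask(Q, K, V, M_zero)
_, w_masked = attention_with_unlearn_mask(Q, K, V, M_unlearn)
```

In this sketch only row 2 of the attention weights changes: the weight on key 1 shrinks and the remaining keys in that row absorb the freed probability mass, while every other row is identical to the base attention.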

Why It Is Interesting

Most unlearning methods are training-heavy or model-editing-heavy. An attention-layer adapter could be:

  • localized,
  • reversible,
  • auditable,
  • deployable per tenant.
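
One way these properties could look in code is a small registry that maps a tenant to its additive mask. This is only a sketch under assumptions: `UnlearnMaskRegistry`, the tenant IDs, and the position-indexed toy mask are all hypothetical, and a real adapter would likely condition on token content rather than fixed positions.

```python
import numpy as np

class UnlearnMaskRegistry:
    """Hypothetical per-tenant store of additive attention masks."""

    def __init__(self, seq_len):
        self.seq_len = seq_len
        self._masks = {}

    def register(self, tenant_id, mask):
        if mask.shape != (self.seq_len, self.seq_len):
            raise ValueError("mask shape must be (seq_len, seq_len)")
        self._masks[tenant_id] = mask.copy()

    def remove(self, tenant_id):
        # Reversible: dropping the mask restores base behavior.
        self._masks.pop(tenant_id, None)

    def mask_for(self, tenant_id):
        # Tenants without an adapter get an all-zero mask (base attention).
        zero = np.zeros((self.seq_len, self.seq_len))
        return self._masks.get(tenant_id, zero)

def attention_weights(scores, mask):
    z = scores + mask
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

scores = np.array([[1.0, 2.0, 0.5],
                   [0.2, 0.1, 1.5],
                   [0.3, 0.9, 0.4]])
registry = UnlearnMaskRegistry(seq_len=3)

tenant_mask = np.zeros((3, 3))
tenant_mask[0, 1] = -8.0  # localized: a single suppressed link
registry.register("tenant_a", tenant_mask)

w_a = attention_weights(scores, registry.mask_for("tenant_a"))
w_b = attention_weights(scores, registry.mask_for("tenant_b"))

registry.remove("tenant_a")
w_a_after = attention_weights(scores, registry.mask_for("tenant_a"))

# Auditable: the nonzero entries list exactly which links were touched.
suppressed_links = np.argwhere(tenant_mask != 0).tolist()
```

Because the base weights are never modified, removing a tenant's mask returns that tenant to the same attention weights as everyone else, which is the sense in which the adapter is reversible here.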

Research Questions

  • Does attention suppression actually remove behavior, or does the model route around it?
  • Can the adapter generalize beyond memorized token pairs?
  • Can it be applied only during sensitive requests?
  • How do we avoid broad collateral damage?

Experiment Plan

  1. Choose a small set of associations to weaken.
  2. Train a lightweight attention mask adapter.
  3. Evaluate target forgetting and general capability.
  4. Compare against prompt-based refusal, LoRA editing, and activation steering.
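
Steps 2 and 3 can be illustrated on a toy problem. The sketch below learns a mask by plain gradient descent on an assumed objective: the attention weight of one target (query, key) pair plus an L2 penalty on the mask, used here as a crude stand-in for "preserve general behavior". The function `train_mask`, the hyperparameters, and the fixed score matrix are hypothetical; a real adapter would be trained against model outputs on forget and retain sets.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def train_mask(scores, target, steps=200, lr=0.5, lam=0.01):
    """Minimize attn[q, k] + lam * ||M||^2 over the additive mask M."""
    q, k = target
    mask = np.zeros_like(scores)
    for _ in range(steps):
        attn = softmax(scores + mask)
        grad = 2.0 * lam * mask          # gradient of the L2 penalty
        a = attn[q]
        # d a_k / d m_j = a_k * (delta_kj - a_j) for the target row.
        row_grad = -a[k] * a
        row_grad[k] += a[k]
        grad[q] += row_grad
        mask -= lr * grad
    return mask

# Toy pre-softmax scores standing in for QK^T / sqrt(d).
scores = np.array([[2.0, 0.5, 0.1],
                   [0.3, 1.5, 0.2],
                   [0.1, 0.4, 1.0]])
target = (1, 1)  # hypothetical association to weaken: query 1 -> key 1
q, k = target

mask = train_mask(scores, target)
base = softmax(scores)
after = softmax(scores + mask)

# Step 3 in miniature: target forgetting vs. collateral change.
forgetting = base[q, k] - after[q, k]
collateral = np.abs(np.delete(after - base, q, axis=0)).max()
```

In this toy setup the collateral term is zero by construction, because the objective only touches one row of the mask. That is exactly what a real evaluation cannot assume: the model may route the association through other heads or layers, so general capability has to be measured on held-out tasks.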

Timeline And Complexity

  • Prototype: 4-8 weeks.
  • Paper-grade: 3-6 months.
  • Complexity: Medium-high.
  • Main risk: the association may be distributed across components other than the attention links being suppressed, so the model can still recover it.