
Collapse prevention strategies

Model collapse is a common failure mode in joint-embedding methods: the easiest way to minimize the matching loss between two views is simply to make the representations constant (e.g., all zeros). A collapsed encoder maps every input to the same point, so its representations carry no information about the input and the model learns nothing useful.
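
To make the failure concrete, here is a toy sketch in plain Python (no deep-learning framework assumed). The helper names, the squared-error matching loss, and the toy "views" are illustrative assumptions. A constant encoder drives the matching loss to exactly zero while the per-dimension spread of its embeddings is also zero.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def collapsed_encoder(view):
    """Hypothetical degenerate encoder: ignores its input, always returns zeros."""
    return [0.0, 0.0, 0.0]

def per_dim_std(batch):
    """Standard deviation of each embedding dimension across the batch."""
    n, dims = len(batch), len(batch[0])
    stds = []
    for d in range(dims):
        col = [z[d] for z in batch]
        mean = sum(col) / n
        stds.append(math.sqrt(sum((c - mean) ** 2 for c in col) / n))
    return stds

# Two augmented "views" of three different inputs (toy numbers).
views_a = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
views_b = [[1.1, 1.9], [2.8, -0.9], [0.4, 0.6]]

za = [collapsed_encoder(v) for v in views_a]
zb = [collapsed_encoder(v) for v in views_b]

matching_loss = sum(mse(a, b) for a, b in zip(za, zb)) / len(za)
print(matching_loss)    # 0.0: the matching loss is already at its minimum
print(per_dim_std(za))  # [0.0, 0.0, 0.0]: every input maps to the same point
```

Every strategy below adds something that makes this trivial solution either costly or unreachable.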

There have been many attempts to resolve this issue:

  1. SimCLR uses contrastive negatives -> expensive (needs many negatives, hence large batches)
  2. VICReg uses variance-invariance-covariance regularization -> tricky to tune (three loss weights to balance)
  3. BYOL and SimSiam use a stop-gradient on one branch (plus a predictor head) -> surprisingly effective
  4. DINO, MoCo, and I-JEPA use EMA (momentum) teachers
  5. LeJEPA uses Sketched Isotropic Gaussian Regularization (SIGReg) -> mathematically prevents collapse and scales well with data
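
The EMA-teacher idea in item 4 (closely tied to the stop-gradient trick in item 3) fits in a few lines. This is a minimal sketch assuming parameters are flat lists of floats; `ema_update` is a hypothetical helper, and the momentum of 0.5 is chosen only so the numbers stay readable (real setups keep it much closer to 1, e.g. 0.996). The teacher never receives gradients; it only drifts slowly toward the student.

```python
def ema_update(teacher, student, momentum):
    """Exponential moving average: t <- m * t + (1 - m) * s, per parameter.

    The teacher sits behind a stop-gradient: it is never trained by the
    loss directly, only pulled toward the student by this average.
    """
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]

student = [1.0, -2.0]
teacher = [0.0, 0.0]
for step in range(3):
    # A real loop would update `student` by gradient descent here; it is
    # held fixed so the EMA behaviour is easy to follow.
    teacher = ema_update(teacher, student, momentum=0.5)
print(teacher)  # [0.875, -1.75]
```

Because the teacher lags behind the student, the student is chasing a slowly moving target rather than a copy of itself, which empirically makes the constant solution hard to fall into.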

Key techniques in this space: EMA, SIGReg, deep self-supervision, VICReg, distillation.

There are two main categories: contrastive learning and regularization-based methods.
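
To see how each category reacts to collapse, here is a toy comparison in plain Python. `info_nce` is a simplified, one-directional InfoNCE-style contrastive loss with cosine similarity and an assumed temperature of 0.1 (SimCLR's NT-Xent also draws negatives from within the same view). `variance_hinge` mirrors the variance term of VICReg with an assumed target std `gamma=1.0`. Both helpers and the toy batches are illustrative assumptions, not either paper's reference code.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(za, zb, temperature=0.1):
    """Contrastive loss: za[i] should match zb[i] rather than any other zb[j],
    which act as negatives."""
    za = [normalize(v) for v in za]
    zb = [normalize(v) for v in zb]
    total = 0.0
    for i, a in enumerate(za):
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in zb]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / len(za)

def variance_hinge(z, gamma=1.0, eps=1e-4):
    """Regularization term: penalize any embedding dimension whose standard
    deviation across the batch falls below gamma."""
    n, dims = len(z), len(z[0])
    loss = 0.0
    for d in range(dims):
        col = [row[d] for row in z]
        mean = sum(col) / n
        var = sum((c - mean) ** 2 for c in col) / n
        loss += max(0.0, gamma - math.sqrt(var + eps))
    return loss / dims

collapsed = [[1.0, 1.0]] * 4  # every input mapped to the same embedding
spread = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.0, -2.0]]

print(info_nce(collapsed, collapsed))  # about 1.386 = log(4)
print(info_nce(spread, spread))        # close to 0
print(variance_hinge(collapsed))       # about 0.99
print(variance_hinge(spread))          # 0.0
```

With collapsed embeddings the contrastive loss sits at log N, because the positive cannot be told apart from the N - 1 negatives, and the variance hinge is close to its ceiling of `gamma`. With well-spread embeddings both terms fall toward zero. The contrastive route needs the other batch items as negatives; the regularization route only needs batch statistics.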