Collapse prevention strategies
Model collapse is a common failure mode in joint embedding methods because the easiest way to minimize the matching loss is to make every representation constant (e.g., all zeros). A collapsed model has learned nothing useful: its output no longer depends on the input.
There have been many attempts to resolve this issue:
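To make the failure mode concrete, here is a minimal NumPy sketch. It assumes the matching loss is a plain mean squared error between the embeddings of two views; the helper names (`matching_loss`, `mean_feature_std`) and the toy data are illustrative only and not taken from any particular method.

```python
import numpy as np

def matching_loss(z_a, z_b):
    """Mean squared error between two batches of embeddings (assumed loss)."""
    return float(np.mean((z_a - z_b) ** 2))

def mean_feature_std(z):
    """Average per-dimension standard deviation across the batch."""
    return float(np.mean(np.std(z, axis=0)))

rng = np.random.default_rng(0)
batch, dim = 8, 4

# Collapsed encoder: every input maps to the same vector (all zeros).
z_a_collapsed = np.zeros((batch, dim))
z_b_collapsed = np.zeros((batch, dim))

# Non-collapsed encoder: the two views agree only approximately.
z_a = rng.normal(size=(batch, dim))
z_b = z_a + 0.1 * rng.normal(size=(batch, dim))

collapsed_loss = matching_loss(z_a_collapsed, z_b_collapsed)
healthy_loss = matching_loss(z_a, z_b)

print("collapsed loss:", collapsed_loss, "std:", mean_feature_std(z_a_collapsed))
print("healthy loss:", healthy_loss, "std:", mean_feature_std(z_a))
```

The collapsed embeddings reach the minimum possible loss while having zero spread across the batch, which is why the matching loss alone cannot tell a useful encoder from a constant one.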
- SimCLR uses contrastive negatives -> expensive
- VICReg uses variance-invariance-covariance regularization -> tricky to tune
- BYOL and SimSiam use stop-gradient -> surprisingly effective
- DINO/MoCo/I-JEPA use EMA teachers
- LeJEPA uses sketched isotropic Gaussian regularization (SIGReg) -> mathematically prevents collapse and scales well with data
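Two of the ingredients listed above can be sketched in a few lines of NumPy. This is a rough illustration, not any paper's reference implementation: `variance_term` and `covariance_term` follow the general shape of the VICReg variance and covariance penalties, `ema_update` is the usual exponential-moving-average teacher update, and the default values of `gamma`, `eps`, and `momentum` are assumptions chosen for the example.

```python
import numpy as np

def variance_term(z, gamma=1.0, eps=1e-4):
    """Hinge penalty that grows when a feature's batch std drops below gamma."""
    std = np.sqrt(np.var(z, axis=0, ddof=1) + eps)
    return float(np.mean(np.maximum(0.0, gamma - std)))

def covariance_term(z):
    """Sum of squared off-diagonal covariance entries, scaled by the dimension."""
    n, d = z.shape
    zc = z - z.mean(axis=0)
    cov = zc.T @ zc / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    return float(np.sum(off_diag ** 2) / d)

def ema_update(teacher, student, momentum=0.99):
    """Return new teacher weights as a moving average of the student weights."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}

rng = np.random.default_rng(0)
collapsed = np.zeros((16, 4))                 # constant embeddings
spread = 3.0 * rng.normal(size=(1000, 4))     # well-spread embeddings

print("variance penalty, collapsed:", variance_term(collapsed))
print("variance penalty, spread:   ", variance_term(spread))

teacher = {"w": np.zeros(2)}
student = {"w": np.ones(2)}
teacher = ema_update(teacher, student, momentum=0.9)
print("teacher after one EMA step:", teacher["w"])
```

The variance penalty is large for constant embeddings and vanishes once every feature has enough spread, so adding it to the matching loss makes the constant solution costly. The EMA update, by contrast, adds no penalty term; it only makes the teacher a slowly moving copy of the student.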
Keywords: EMA, SIGReg, deep self-supervision, VICReg, distillation
Two main categories: contrastive learning and regularization-based methods.