Collapse prevention strategies

Collapse is a common failure mode in joint-embedding methods because the easiest way to minimize the matching loss is to make the representations constant (e.g., all zeros), regardless of the input. A collapsed model has learned nothing useful, since its output carries no information about what it was shown.
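To make the failure concrete, here is a minimal NumPy sketch. The squared-distance matching loss, the batch size, and the synthetic "healthy" embeddings are all assumptions chosen for illustration and do not come from any particular method. It shows that a constant encoder reaches zero matching loss while its embeddings have zero spread across the batch.

```python
import numpy as np

def matching_loss(z_a, z_b):
    """Mean squared distance between paired embeddings of two views."""
    return float(np.mean(np.sum((z_a - z_b) ** 2, axis=1)))

def per_dim_std(z):
    """Standard deviation of each embedding dimension across the batch."""
    return z.std(axis=0)

rng = np.random.default_rng(0)
batch, dim = 256, 8

# Hypothetical healthy encoder: the two views of an input share an
# input-dependent signal and differ only by small noise.
signal = rng.normal(size=(batch, dim))
healthy_a = signal + 0.1 * rng.normal(size=(batch, dim))
healthy_b = signal + 0.1 * rng.normal(size=(batch, dim))

# Collapsed encoder: every input maps to the same constant vector.
collapsed_a = np.zeros((batch, dim))
collapsed_b = np.zeros((batch, dim))

healthy_loss = matching_loss(healthy_a, healthy_b)
collapsed_loss = matching_loss(collapsed_a, collapsed_b)

print("healthy:   loss", healthy_loss, "min std", per_dim_std(healthy_a).min())
print("collapsed: loss", collapsed_loss, "max std", per_dim_std(collapsed_a).max())
```

The collapsed encoder wins on the matching loss alone, which is why the loss cannot be used by itself. Tracking the per-dimension standard deviation of the embeddings during training is one simple way to notice collapse.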

There have been many attempts to resolve this issue:

SimCLR uses contrastive negatives -> expensive
VICReg uses variance-invariance-covariance regularization -> tricky to tune
BYOL and SimSiam use stop-gradient -> surprisingly effective
DINO/MoCo/I-JEPA use EMA teachers
LeJEPA uses sketched isotropic Gaussian regularization (SIGReg) -> mathematically prevents collapse and scales well with data
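As an illustration of the EMA-teacher idea in the list above, here is a minimal sketch of the weight update. The dictionary-of-arrays parameter format, the `ema_update` name, and the momentum value are assumptions made for this example; real implementations differ in detail and usually schedule the momentum over training.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.99):
    """Return new teacher weights as an exponential moving average.

    new_teacher = momentum * teacher + (1 - momentum) * student
    """
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }

# Toy parameters (hypothetical): one weight vector per network.
student = {"w": np.ones(4)}
teacher = {"w": np.zeros(4)}

steps = 100
for _ in range(steps):
    # In real training the student would take a gradient step here.
    # The teacher receives no gradients (stop-gradient); it only
    # follows the student through the moving average.
    teacher = ema_update(teacher, student, momentum=0.9)

print(teacher["w"])  # approaches the student's weights
```

Because the teacher changes slowly and is never updated by backpropagation, the student cannot trivially drag both networks toward the same constant output in a single step. This is a heuristic argument and not a guarantee.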

EMA, SIGReg, deep self-supervision, VICReg, distillation

These methods fall into two main categories: contrastive learning and regularization-based methods.
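The difference between the two categories can be sketched with one loss from each. The code below is an illustrative assumption, not the exact objective of any method named above: `info_nce` is a simplified contrastive loss that uses the other items in the batch as negatives, and `variance_penalty` is a VICReg-style hinge on the per-dimension standard deviation. The temperature, target standard deviation, and toy data are hypothetical choices.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Contrastive: row i of z_a should match row i of z_b;
    every other row of z_b acts as a negative."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def variance_penalty(z, target_std=1.0, eps=1e-4):
    """Regularization: hinge penalty on dimensions whose batch
    standard deviation falls below target_std."""
    std = np.sqrt(z.var(axis=0) + eps)
    return float(np.mean(np.maximum(0.0, target_std - std)))

rng = np.random.default_rng(0)
batch, dim = 128, 16

spread = 2.0 * rng.normal(size=(batch, dim))  # input-dependent embeddings
collapsed = np.ones((batch, dim))             # same vector for every input

spread_nce = info_nce(spread, spread)
collapsed_nce = info_nce(collapsed, collapsed)  # equals log(batch)

spread_var = variance_penalty(spread)
collapsed_var = variance_penalty(collapsed)

print("InfoNCE   spread vs collapsed:", spread_nce, collapsed_nce)
print("variance  spread vs collapsed:", spread_var, collapsed_var)
```

Both losses assign a high cost to the collapsed embeddings, but for different reasons. The contrastive loss cannot tell positives from negatives when all embeddings are identical, while the variance penalty directly punishes the lack of spread and needs no negatives.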