Abstract: Knowledge distillation (KD), a learning paradigm in which a larger teacher network guides a smaller student network, transfers dark knowledge from the teacher to the student via logits or ...
Code, raw data, and analysis for What Does the Weight Norm Control in Grokking? Logit-Scale Mediation under Cross-Entropy. Claim: under cross-entropy the weight-norm dependence of the grokking delay ...
Official implementation of Extended Logit Normalization (ELogitNorm) from the CVPR 2026 paper: Enhancing Out-of-Distribution Detection with Extended Logit Normalization. The codebase is adapted from and ...