Abstract: Feature-based knowledge distillation has been applied to compress modern recommendation models, usually with projectors that align student (small) recommendation models’ dimensions with ...
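As a rough illustration of the projector idea this abstract names, here is a minimal PyTorch-style sketch, not the paper's actual method: a learned linear projector maps the student's hidden features into the teacher's dimension so an alignment loss (MSE here) can be computed. All module names and dimensions below are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureKD(nn.Module):
    """Minimal feature-based KD: project student features into the
    teacher's dimension, then penalize the mismatch with MSE."""
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Learned projector that aligns dimensions (hypothetical design).
        self.projector = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_feat: torch.Tensor,
                teacher_feat: torch.Tensor) -> torch.Tensor:
        # The teacher's features are a fixed target (no gradient flows back).
        return F.mse_loss(self.projector(student_feat), teacher_feat.detach())

# Usage with made-up dimensions: 64-d student, 256-d teacher, batch of 32.
kd = FeatureKD(student_dim=64, teacher_dim=256)
loss = kd(torch.randn(32, 64), torch.randn(32, 256))
```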
Abstract: Knowledge distillation (KD) enhances student network generalization by transferring dark knowledge from a complex teacher network. To optimize computational expenditure and memory ...
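The truncated abstract does not specify how the "dark knowledge" transfer is implemented; the sketch below assumes the standard logit-distillation recipe (a KL divergence between temperature-softened teacher and student distributions, as in Hinton et al.), with hypothetical shapes and temperature.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 4.0) -> torch.Tensor:
    """Classic logit distillation: KL divergence between
    temperature-softened teacher and student distributions."""
    t = temperature
    # Softened log-probabilities for the student, probabilities for the teacher.
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits.detach() / t, dim=-1)
    # Scaling by t**2 keeps gradient magnitudes comparable to the hard-label loss.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t ** 2)

# Usage with random logits over 100 items (hypothetical shapes).
loss = distillation_loss(torch.randn(8, 100), torch.randn(8, 100))
```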