ﻻ يوجد ملخص باللغة العربية
Disentanglement is a highly desirable property of representation owing to its similarity to human understanding and reasoning. Many works achieve disentanglement upon information bottlenecks (IB). Despite their elegant mathematical foundations, the IB branch usually exhibits lower performance. In order to provide an insight into the problem, we develop an annealing test to calculate the information freezing point (IFP), which is a transition state to freeze information into the latent variables. We also explore these clues or inductive biases for separating the entangled factors according to the differences in the IFP distributions. We found the existing approaches suffer from the information diffusion problem, according to which the increased information diffuses in all latent variables. Based on this insight, we propose a novel disentanglement framework, termed the distilling entangled factor (DEFT), to address the information diffusion problem by scaling backward information. DEFT applies a multistage training strategy, including multigroup encoders with different learning rates and piecewise disentanglement pressure, to disentangle the factors stage by stage. We evaluate DEFT on three variants of dSprite and SmallNORB, which show low-variance and high-level disentanglement scores. Furthermore, the experiment under the correlative factors shows incapable of TC-based approaches. DEFT also exhibits a competitive performance in the unsupervised setting.
Disentangling data into interpretable and independent factors is critical for controllable generation tasks. With the availability of labeled data, supervision can help enforce the separation of specific factors as expected. However, it is often expe
The transfer of knowledge from one policy to another is an important tool in Deep Reinforcement Learning. This process, referred to as distillation, has been used to great success, for example, by enhancing the optimisation of agents, leading to stro
Let $X$ and $Y$ be dependent random variables. This paper considers the problem of designing a scalar quantizer for $Y$ to maximize the mutual information between the quantizers output and $X$, and develops fundamental properties and bounds for this
We investigate a class of aggregation-diffusion equations with strongly singular kernels and weak (fractional) dissipation in the presence of an incompressible flow. Without the flow the equations are supercritical in the sense that the tendency to c
Knowledge distillation is a critical technique to transfer knowledge between models, typically from a large model (the teacher) to a more fine-grained one (the student). The objective function of knowledge distillation is typically the cross-entropy