**Deep Mutual Learning (DML)**

Model distillation is an effective and widely used technique to transfer knowledge from a teacher to a student network. The typical application is to transfer from a powerful large network or ensemble to a small network, that is better suited to low-memory or fast execution requirements. In this paper, we present a deep mutual learning (DML) strategy where, rather than one way transfer between a static pre-defined teacher and a student, an ensemble of students learn collaboratively and teach each other throughout the training process. Our experiments show that a variety of network architectures benefit from mutual learning and achieve compelling results on CIFAR-100 recognition and Market-1501 person re-identification benchmarks. Surprisingly, it is revealed that no prior powerful teacher network is necessary — mutual learning of a collection of simple student networks works, and moreover outperforms distillation from a more powerful yet static teacher. … **QuickNet**

We present QuickNet, a fast and accurate network architecture that is both faster and significantly more accurate than other fast deep architectures like SqueezeNet. Furthermore, it uses less parameters than previous networks, making it more memory efficient. We do this by making two major modifications to the reference Darknet model (Redmon et al, 2015): 1) The use of depthwise separable convolutions and 2) The use of parametric rectified linear units. We make the observation that parametric rectified linear units are computationally equivalent to leaky rectified linear units at test time and the observation that separable convolutions can be interpreted as a compressed Inception network (Chollet, 2016). Using these observations, we derive a network architecture, which we call QuickNet, that is both faster and more accurate than previous models. Our architecture provides at least four major advantages: (1) A smaller model size, which is more tenable on memory constrained systems; (2) A significantly faster network which is more tenable on computationally constrained systems; (3) A high accuracy of 95.7 percent on the CIFAR-10 Dataset which outperforms all but one result published so far, although we note that our works are orthogonal approaches and can be combined (4) Orthogonality to previous model compression approaches allowing for further speed gains to be realized. … **Grubbs Test**

Grubbs’ test (named after Frank E. Grubbs, who published the test in 1950), also known as the maximum normed residual test or extreme studentized deviate test, is a statistical test used to detect outliers in a univariate data set assumed to come from a normally distributed population. …

# If you did not already know

**19**
*Tuesday*
Sep 2017

Posted What is ...

in
Advertisements