**Branch Convolutional Neural Network (B-CNN)**

Convolutional Neural Network (CNN) image classifiers are traditionally designed to have sequential convolutional layers with a single output layer. This is based on the assumption that all target classes should be treated equally and exclusively. However, some classes can be more difficult to distinguish than others, and classes may be organized in a hierarchy of categories. At the same time, a CNN is designed to learn internal representations that abstract from the input data based on its hierarchical layered structure. So it is natural to ask whether an inverse of this idea can be applied to learn a model that predicts over a classification hierarchy using multiple output layers in decreasing order of class abstraction. In this paper, we introduce a variant of the traditional CNN model named the Branch Convolutional Neural Network (B-CNN). A B-CNN model outputs multiple predictions ordered from coarse to fine along the concatenated convolutional layers corresponding to the hierarchical structure of the target classes, which can be regarded as a form of prior knowledge on the output. To train B-CNNs, a novel training strategy, named the Branch Training strategy (BT-strategy), is introduced, which balances the strictness of the prior with the freedom to adjust parameters on the output layers to minimize the loss. In this way we show that CNN-based models can be forced to learn successively coarse to fine concepts in the internal layers at the output stage, and that hierarchical prior knowledge can be adopted to boost CNN models’ classification performance. Our models are evaluated to show that the B-CNN extensions improve over the corresponding baseline CNN on the benchmark datasets MNIST, CIFAR-10 and CIFAR-100. …

**Age Period Cohort Model (APC)**

Age-Period-Cohort models are a class of models for demographic rates (mortality/morbidity/fertility/…) observed for a broad age range over a reasonably long time period, and classified by age, date of follow-up (period) and date of birth (cohort). This type of follow-up can be shown in a Lexis diagram: a coordinate system with date of follow-up along the x-axis and age along the y-axis. A single person's life-trajectory is therefore a straight line with slope 1 (as calendar time and age advance at the same pace). Tabulated data enumerate the number of events and the risk time (sum of lengths of life-trajectories) in some subsets of the Lexis diagram, usually subsets classified by age and period in equally long intervals. Individual life-lines can be shown with colouring according to states, or the diagram can just be shown to indicate what ages and periods are covered, and what subsets are used for classification of events and risk time. The Age-Period-Cohort model describes the (log) rates as a sum of (non-linear) age, period and cohort effects. The three variables age (at follow-up), a, period (i.e. date of follow-up), p, and cohort (date of birth), c, are related by a = p - c: any one person’s age is calculated by subtracting the date of birth from the current date. Hence the three variables used to describe rates are linearly related, and the model can therefore be parametrized in different ways and still produce the same estimated rates. In popular terms, it is possible to move two linear effects around between the three terms, because the age terms contain the linear effect of age, the period terms contain the linear effect of period and the cohort terms contain the linear effect of cohort. An illustration of this phenomenon is in this little “film” of APC-effects on testis cancer rates in Denmark. All sets of estimates will yield the same set of fitted rates. …

**Linear Discriminant Generative Adversarial Networks (LD-GAN)**

We develop a novel method for training GANs for unsupervised and class-conditional generation of images, called Linear Discriminant GAN (LD-GAN). The discriminator of an LD-GAN is trained to maximize the linear separability between distributions of hidden representations of generated and targeted samples, while the generator is updated based on the decision hyper-planes computed by performing LDA over the hidden representations. LD-GAN provides a concrete metric of separation capacity for the discriminator, and we experimentally show that it is possible to stabilize the training of LD-GAN simply by calibrating the update frequencies between generators and discriminators in the unsupervised case, without employing normalization methods or constraints on weights. In the class-conditional generation tasks, the proposed method shows improved training stability together with better generalization performance compared to a WGAN that employs an auxiliary classifier. …
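
To make the idea of "linear separability of hidden representations" more concrete, here is a minimal NumPy sketch of textbook two-class Fisher LDA applied to two sets of feature vectors. It is not the LD-GAN objective itself, which the abstract does not spell out. The function name `fisher_lda`, the ridge term `eps`, and the synthetic Gaussian "hidden representations" are all assumptions made for illustration.

```python
import numpy as np


def fisher_lda(h_real, h_fake, eps=1e-6):
    """Two-class Fisher LDA on hidden representations (illustrative sketch).

    h_real, h_fake: arrays of shape (n_samples, n_features).
    Returns the hyperplane normal w, the offset b, and the Fisher
    criterion J (between-class over within-class scatter along w).
    """
    mu_r = h_real.mean(axis=0)
    mu_f = h_fake.mean(axis=0)

    # Within-class scatter: sum of the two per-class scatter matrices.
    c_r = h_real - mu_r
    c_f = h_fake - mu_f
    s_w = c_r.T @ c_r + c_f.T @ c_f
    # Small ridge term (an assumption) keeps the solve well conditioned.
    s_w += eps * np.eye(s_w.shape[0])

    diff = mu_r - mu_f
    w = np.linalg.solve(s_w, diff)      # Fisher direction: S_w^{-1} (mu_r - mu_f)
    b = -0.5 * w @ (mu_r + mu_f)        # hyperplane through the midpoint of the means
    j = (w @ diff) ** 2 / (w @ s_w @ w)  # larger J means more linearly separable
    return w, b, j


# Synthetic stand-ins for hidden representations (hypothetical data).
rng = np.random.default_rng(0)
h_real = rng.normal(loc=1.0, size=(200, 8))
h_fake_far = rng.normal(loc=-1.0, size=(200, 8))   # easy to separate
h_fake_near = rng.normal(loc=0.9, size=(200, 8))   # nearly overlapping

w_far, b_far, j_far = fisher_lda(h_real, h_fake_far)
_, _, j_near = fisher_lda(h_real, h_fake_near)

# Fraction of "real" samples on the positive side of the hyperplane.
acc_real = float((h_real @ w_far + b_far > 0).mean())
print(f"J (far) = {j_far:.4f}, J (near) = {j_near:.4f}, acc_real = {acc_real:.3f}")
```

In this reading, a discriminator would want the criterion `J` to be large, while a generator would want its samples to land on the "real" side of the hyperplane defined by `w` and `b`. How the paper actually turns these quantities into losses is not stated in the abstract.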

# If you did not already know

*Saturday, 07 Oct 2017*
