Deep Learning in Customer Churn Prediction: Unsupervised Feature Learning on Abstract Company Independent Feature Vectors

As companies increase their efforts to retain customers, being able to predict accurately, ahead of time, whether a customer will churn in the foreseeable future is an extremely powerful tool for any marketing team. The paper describes in depth the application of Deep Learning to the problem of churn prediction. Using abstract feature vectors that can be generated on any subscription-based company’s user event logs, the paper proves that, through the use of the intrinsic property of Deep Neural Networks (learning secondary features in an unsupervised manner), the complete pipeline can be applied to any subscription-based company with extremely good churn predictive performance. Furthermore, the research documented in the paper was performed for Framed Data (a company that sells churn prediction as a service for other companies) in conjunction with the Data Science Institute at Lancaster University, UK. This paper is the intellectual property of Framed Data.

Real-Time Machine Learning: The Missing Pieces

Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making. These applications pose a new set of requirements, none of which are difficult to achieve in isolation, but the combination of which creates a challenge for existing distributed execution frameworks: computation with millisecond latency at high throughput, adaptive construction of arbitrary task graphs, and execution of heterogeneous kernels over diverse sets of resources. We assert that a new distributed execution framework is needed for such ML applications and propose a candidate approach with a proof-of-concept architecture that achieves a 63x performance improvement over a state-of-the-art execution framework for a representative application.

Micro-Objective Learning: Accelerating Deep Reinforcement Learning through the Discovery of Continuous Subgoals

Recently, reinforcement learning has been successfully applied to the logical game of Go, various Atari games, and even a 3D game, Labyrinth, though it continues to have problems in sparse reward settings. It is difficult to explore, but also difficult to exploit, a small number of successes when learning a policy. To solve this issue, the subgoal and option frameworks have been proposed. However, discovering subgoals online is too expensive to be used to learn options in large state spaces. We propose Micro-objective learning (MOL) to solve this problem. The main idea is to estimate how important a state is while training and to give an additional reward proportional to its importance. We evaluated our algorithm in two Atari games: Montezuma’s Revenge and Seaquest. With three experiments on each game, MOL significantly improved the baseline scores. Especially in Montezuma’s Revenge, MOL achieved two times better results than the previous state-of-the-art model.

Ask Me Even More: Dynamic Memory Tensor Networks (Extended Model)

We examine Memory Networks for the task of question answering (QA) under a common real-world scenario where training examples are scarce, and under a weakly supervised scenario, that is, where only extrinsic labels are available for training. We propose extensions to the Dynamic Memory Network (DMN), specifically within the attention mechanism; we call the resulting neural architecture the Dynamic Memory Tensor Network (DMTN). Ultimately, we see that our proposed extensions result in over 80% improvement in the number of tasks passed against the baseline standard DMN and 20% more tasks passed compared to the state-of-the-art End-to-End Memory Network on Facebook’s single-task weakly trained 1K bAbI dataset.

Sequential Local Learning for Latent Graphical Models

Learning parameters of latent graphical models (GMs) is inherently much harder than learning those of non-latent ones, since the latent variables make the corresponding log-likelihood non-concave. Nevertheless, expectation-maximization schemes are popularly used in practice, but they typically get stuck in local optima. In recent years, the method of moments has provided a refreshing angle for resolving the non-convex issue, but it is applicable to a quite limited class of latent GMs. In this paper, we aim to enhance its power by enlarging this class of latent GMs. To this end, we introduce two novel concepts, coined marginalization and conditioning, which can reduce the problem of learning a larger GM to that of a smaller one. More importantly, they lead to a sequential learning framework that repeatedly increases the learning portion of a given latent GM, and thus covers a significantly broader and more complicated class of loopy latent GMs, which includes convolutional and random regular models.

Improving Interpretability of Deep Neural Networks with Semantic Information

Interpretability of deep neural networks (DNNs) is essential since it enables users to understand the overall strengths and weaknesses of the models, conveys an understanding of how the models will behave in the future, and how to diagnose and correct potential problems. However, it is challenging to reason about what a DNN actually does due to its opaque or black-box nature. To address this issue, we propose a novel technique to improve the interpretability of DNNs by leveraging the rich semantic information embedded in human descriptions. By concentrating on the video captioning task, we first extract a set of semantically meaningful topics from the human descriptions that cover a wide range of visual concepts, and integrate them into the model with an interpretive loss. We then propose a prediction difference maximization algorithm to interpret the learned features of each neuron. Experimental results demonstrate its effectiveness in video captioning using the interpretable features, which can also be transferred to video action recognition. By clearly understanding the learned features, users can easily revise false predictions via a human-in-the-loop procedure.

Autoregressive Convolutional Neural Networks for Asynchronous Time Series

We propose ‘Significance-Offset Convolutional Neural Network’, a deep convolutional network architecture for multivariate time series regression. The model is inspired by standard autoregressive (AR) models and gating mechanisms used in recurrent neural networks. It involves an AR-like weighting system, where the final predictor is obtained as a weighted sum of sub-predictors while the weights are data-dependent functions learnt through a convolutional network. The architecture was designed for applications on asynchronous time series with low signal-to-noise ratio and hence is evaluated on such datasets: a hedge fund proprietary dataset of over 2 million quotes for a credit derivative index and an artificially generated noisy autoregressive series. The proposed architecture achieves promising results compared to convolutional and recurrent neural networks. The code for the numerical experiments and the architecture implementation will be shared online to make the research reproducible.
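
The weighting step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the softmax normalization and the function names `softmax` and `weighted_prediction` are assumptions made for illustration, and the significance scores are passed in directly rather than produced by a convolutional network.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

def weighted_prediction(sub_predictions, significance_scores):
    """Combine sub-predictors into a single prediction.

    sub_predictions: shape (T,), one candidate prediction per past observation.
    significance_scores: shape (T,), unnormalized data-dependent scores.
        Assumption: in the paper these come from a convolutional network;
        here they are simply given as inputs.
    """
    # Assumption: weights are normalized with a softmax so they sum to 1.
    weights = softmax(significance_scores)
    return float(np.sum(weights * sub_predictions))

# Equal scores give equal weights, so the result is the plain average.
sub_preds = np.array([1.0, 2.0, 3.0])
scores = np.zeros(3)
print(weighted_prediction(sub_preds, scores))
```

With very uneven scores, the prediction collapses onto the sub-predictor with the largest score, which is the gating-like behaviour the abstract alludes to.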

Hardware-Driven Nonlinear Activation for Stochastic Computing Based Deep Convolutional Neural Networks

Recently, Deep Convolutional Neural Networks (DCNNs) have made unprecedented progress, achieving accuracy close to, or even better than, human-level perception in various tasks. There is a timely need to map the latest software DCNNs to application-specific hardware, in order to achieve orders of magnitude improvement in performance, energy efficiency and compactness. Stochastic Computing (SC), as a low-cost alternative to the conventional binary computing paradigm, has the potential to enable massively parallel and highly scalable hardware implementation of DCNNs. One major challenge in SC based DCNNs is designing accurate nonlinear activation functions, which have a significant impact on the network-level accuracy but cannot be implemented accurately by existing SC computing blocks. In this paper, we design and optimize SC based neurons, and we propose highly accurate activation designs for the three most frequently used activation functions in software DCNNs, i.e., hyperbolic tangent, logistic, and rectified linear units. Experimental results on LeNet-5 using the MNIST dataset demonstrate that compared with a binary ASIC hardware DCNN, the DCNN with the proposed SC neurons can achieve up to 61X, 151X, and 2X improvement in terms of area, power, and energy, respectively, at the cost of small precision degradation. In addition, the SC approach achieves up to 21X and 41X of the area, 41X and 72X of the power, and 198200X and 96443X of the energy, compared with CPU and GPU approaches, respectively, while the error is increased by less than 3.07%. ReLU activation is suggested for future SC based DCNNs considering its superior performance under a small bit stream length.

Multiscale Hierarchical Convolutional Networks

Deep neural network algorithms are difficult to analyze because they lack structure that would allow one to understand the properties of the underlying transforms and invariants. Multiscale hierarchical convolutional networks are structured deep convolutional networks where layers are indexed by progressively higher dimensional attributes, which are learned from training data. Each new layer is computed with multidimensional convolutions along spatial and attribute variables. We introduce an efficient implementation of such networks where the dimensionality is progressively reduced by averaging intermediate layers along attribute indices. Hierarchical networks are tested on CIFAR image databases, where they obtain precision comparable to state-of-the-art networks with far fewer parameters. We study some properties of the attributes learned from these databases.

New algorithms for matching problems

The standard two-sided and one-sided matching problems, and the closely related school choice problem, have been widely studied from an axiomatic viewpoint. A small number of algorithms dominate the literature: for two-sided matching, the Gale-Shapley algorithm; for one-sided matching, (random) Serial Dictatorship and the Probabilistic Serial rule; for school choice, Gale-Shapley and the Boston mechanisms. The main reason for the dominance of these algorithms is their good (worst-case) axiomatic behaviour with respect to notions of efficiency and strategyproofness. However, if we shift the focus to fairness, social welfare, tradeoffs between incompatible axioms, and average-case analysis, it is far from clear that these algorithms are optimal. We investigate new algorithms, several of which have not appeared (to our knowledge) in the literature before. We give a unified presentation in which algorithms for 2-sided matching yield 1-sided matching algorithms in a systematic way. In addition to axiomatic properties, we investigate agent welfare using both theoretical and computational approaches. We find that some of the new algorithms are worthy of consideration for certain applications. In particular, when considering welfare under truthful preferences, some of the new algorithms outperform the classic ones.
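
For readers unfamiliar with the classic baseline named in this abstract, the following is a minimal sketch of the standard Gale-Shapley deferred acceptance procedure. It illustrates only the well-known algorithm, not any of the new algorithms the paper investigates. The function name `gale_shapley` and the dict-based input format are assumptions, as is the simplification to equal-sized sides with complete preference lists.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching as a dict mapping receiver -> proposer.

    proposer_prefs / receiver_prefs: dict mapping each agent to a list of
    agents on the other side, most preferred first.
    Assumption: both sides have the same size and rank everyone.
    """
    # rank[r][p] is the position of proposer p in receiver r's list.
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next receiver index to try
    engaged_to = {}                               # receiver -> proposer
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            # Receiver is unmatched and tentatively accepts.
            engaged_to[r] = p
        elif rank[r][p] < rank[r][current]:
            # Receiver prefers the new proposer; the old one becomes free.
            engaged_to[r] = p
            free.append(current)
        else:
            # Receiver rejects; the proposer tries the next choice later.
            free.append(p)
    return engaged_to

proposers = {"a": ["x", "y"], "b": ["x", "y"]}
receivers = {"x": ["b", "a"], "y": ["a", "b"]}
print(gale_shapley(proposers, receivers))
```

In this small example both proposers want `x` first, `x` prefers `b`, and so `a` ends up matched with `y`.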

Blocking Transferability of Adversarial Examples in Black-Box Learning Systems

Advances in Machine Learning (ML) have led to its adoption as an integral component in many applications, including banking, medical diagnosis, and driverless cars. To further broaden the use of ML models, cloud-based services offered by Microsoft, Amazon, Google, and others have developed ML-as-a-service tools as black-box systems. However, ML classifiers are vulnerable to adversarial examples: inputs that are maliciously modified can cause the classifier to provide adversary-desired outputs. Moreover, it is known that adversarial examples generated on one classifier are likely to cause another classifier to make the same mistake, even if the classifiers have different architectures or are trained on disjoint datasets. This property, which is known as transferability, opens up the possibility of attacking black-box systems by generating adversarial examples on a substitute classifier and transferring the examples to the target classifier. Therefore, the key to protecting black-box learning systems against adversarial examples is to block their transferability. To this end, we propose a training method under which, as the input is more perturbed, the classifier smoothly outputs lower confidence on the original label and instead predicts that the input is ‘invalid’. In essence, we augment the output class set with a NULL label and train the classifier to reject the adversarial examples by classifying them as NULL. In experiments, we apply a wide range of attacks based on adversarial examples on the black-box systems. We show that a classifier trained with the proposed method effectively resists the adversarial examples while maintaining accuracy on clean data.
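
One way to picture the NULL-label idea in this abstract is as a soft training target that moves probability mass from the original label to an extra NULL class as the perturbation grows. The sketch below is only an illustration under stated assumptions: the linear ramp, the `max_perturbation` parameter, and the function name `null_augmented_target` are hypothetical, since the abstract does not specify how confidence is lowered.

```python
import numpy as np

def null_augmented_target(label, num_classes, perturbation, max_perturbation):
    """Soft target over num_classes + 1 outputs; the last index is NULL.

    Assumption: a linear ramp from the original label to NULL, chosen
    purely for illustration. The abstract only says confidence on the
    original label is lowered smoothly as the input is more perturbed.
    """
    # Fraction of probability mass moved to NULL, clipped to [0, 1].
    p_null = float(np.clip(perturbation / max_perturbation, 0.0, 1.0))
    target = np.zeros(num_classes + 1)
    target[label] = 1.0 - p_null
    target[num_classes] = p_null
    return target

# A clean input keeps its one-hot label; a heavily perturbed one maps to NULL.
print(null_augmented_target(label=2, num_classes=3, perturbation=0.0, max_perturbation=1.0))
print(null_augmented_target(label=2, num_classes=3, perturbation=0.25, max_perturbation=1.0))
print(null_augmented_target(label=2, num_classes=3, perturbation=2.0, max_perturbation=1.0))
```

Targets of this form could be used with a cross-entropy loss over the augmented class set, although the actual loss and schedule would have to be taken from the paper itself.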

Zero-Shot Learning – The Good, the Bad and the Ugly

Due to the importance of zero-shot learning, the number of proposed approaches has increased steadily in recent years. We argue that it is time to take a step back and to analyze the status quo of the area. The purpose of this paper is three-fold. First, given the fact that there is no agreed-upon zero-shot learning benchmark, we define a new benchmark by unifying both the evaluation protocols and data splits. This is an important contribution as published results are often not comparable and sometimes even flawed due to, e.g., pre-training on zero-shot test classes. Second, we compare and analyze a significant number of the state-of-the-art methods in depth, both in the classic zero-shot setting and in the more realistic generalized zero-shot setting. Finally, we discuss limitations of the current status of the area which can be taken as a basis for advancing it.

Task-based End-to-end Model Learning

As machine learning techniques have become more ubiquitous, it has become common to see machine learning prediction algorithms operating within some larger process. However, the criteria by which we train machine learning algorithms often differ from the ultimate criteria on which we evaluate them. This paper proposes an end-to-end approach for learning probabilistic machine learning models within the context of stochastic programming, in a manner that directly captures the ultimate task-based objective for which they will be used. We then present two experimental evaluations of the proposed approach, one as applied to a generic inventory stock problem and the second to a real-world electrical grid scheduling task. In both cases, we show that the proposed approach can outperform both a traditional modeling approach and a purely black-box policy optimization approach.

Dynamically induced many-body localization

Effects of Limiting Memory Capacity on the Behaviour of Exemplar Dynamics

Socially Optimal Mining Pools

Development of An Android Application for Object Detection Based on Color, Shape, or Local Features

A note on approximate strengths of edges in a hypergraph

PairCloneTree: Reconstruction of Tumor Subclone Phylogeny Based on Mutation Pairs using Next Generation Sequencing Data

Convolutional Spike Timing Dependent Plasticity based Feature Learning in Spiking Neural Networks

Inhomogeneous exponential jump model

Maximum entropy sampling in complex networks

Markov Chain Lifting and Distributed ADMM

Building automated vandalism detection tools for Wikidata

Joint Embedding of Graphs

Tuning Over-Relaxed ADMM

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

Depth from Monocular Images using a Semi-Parallel Deep Neural Network (SPDNN) Hybrid Architecture

Front-to-End Bidirectional Heuristic Search with Near-Optimal Node Expansions

Deep Image Matting

Generalized Full Matching

Segmentation of skin lesions based on fuzzy classification of pixels and histogram thresholding

A degree version of the Hilton–Milner theorem

A multi-stage convex relaxation approach to noisy structured low-rank matrix recovery

Core Maintenance in Dynamic Graphs: A Parallel Approach based on Matching

DotGrid: a .NET-based cross-platform software for desktop grids

DotDFS: A Grid-based high-throughput file transfer system

Massive Exploration of Neural Machine Translation Architectures

The Curse of Correlation in Security Games and Principle of Max-Entropy

Elliptic Determinantal Processes and Elliptic Dyson Models

Axioms in Model-based Planners

Gait Pattern Recognition Using Accelerometers

A German Corpus for Text Similarity Detection Tasks

Recruiting from the network: discovering Twitter users who can help combat Zika epidemics

Viraliency: Pooling Local Virality

A 3D Object Detection and Pose Estimation Pipeline Using RGB-D Images

A norm knockout method on indirect reciprocity to reveal indispensable norms

Generalized Rao Test for Decentralized Detection of an Uncooperative Target

Web-based visualisation of head pose and facial expressions changes: monitoring human activity using depth data

Simple interval observers for linear impulsive systems with applications to sampled-data and switched systems

The cross section of a spherical double cone

Neural method for Explicit Mapping of Quasi-curvature Locally Linear Embedding in image retrieval

On Solving Travelling Salesman Problem with Vertex Requisitions

Sparse Poisson Regression with Penalized Weighted Score Function

On Constraint Qualifications of a Nonconvex Inequality

Ramsey Theory for Binary Trees and the Separation of Tree-chromatic Number from Path-chromatic Number

The Steiner (n-3)-diameter of a graph

Simplicial Random Variables

Capacity Enhancement with Meta-Multiplexing

The Weighted Matching Approach to Maximum Cardinality Matching

Language Use Matters: Analysis of the Linguistic Structure of Question Texts Can Characterize Answerability in Quora

Waveform Optimization for Radio-Frequency Wireless Power Transfer

Numerical simulation of polynomial-speed convergence phenomenon

Automated Hate Speech Detection and the Problem of Offensive Language

Data-Driven Estimation of Travel Latency Cost Functions via Inverse Optimization in Multi-Class Transportation Networks

Negentropic Planar Symmetry Detector

Learning Large-Scale Bayesian Networks with the sparsebn Package

The Modified Stochastic Games

Laman Graphs are Generically Bearing Rigid in Arbitrary Dimensions

Locality-sensitive hashing of curves

Colorization as a Proxy Task for Visual Understanding

DeepSleepNet: a Model for Automatic Sleep Stage Scoring based on Raw Single-Channel EEG

A proximal point algorithm revisited and extended

Quantifying the strength of structural connectivity underlying functional brain networks

BLOCKBENCH: A Framework for Analyzing Private Blockchains

Think globally, fit locally under the Manifold Setup: Asymptotic Analysis of Locally Linear Embedding

Multi-user Precoding and Channel Estimation for Hybrid Millimeter Wave Systems

Multi-Pose Face Recognition Using Hybrid Face Features Descriptor

On the $k$-abelian complexity of the Cantor sequence

An Improved Receiving Scheme for Layered ACO-FOFDM in IM/DD Systems

Tight Nordhaus-Gaddum-type upper bound for total-rainbow connection number of graphs

Prediction and Control with Temporal Segment Models

A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation

Resource Allocation for a Full-Duplex Base Station Aided OFDMA System

Prostate Cancer Diagnosis using Deep Learning with 3D Multiparametric MRI

SurfNet: Generating 3D shape surfaces using deep residual networks

Feature overwriting as a finite mixture process: Evidence from comprehension data

Local Patch Classification Based Framework for Single Image Super-Resolution

Singular Stochastic Allen-Cahn equations with dynamic boundary conditions

Evaluating Deep Convolutional Neural Networks for Material Classification

Detection of Human Rights Violations in Images: Can Convolutional Neural Networks help?

Combining Residual Networks with LSTMs for Lipreading

Weight Spectrum of Quasi-Perfect Binary Codes with Distance 4

Representation theoretic realization of non-symmetric Macdonald polynomials at infinity

Co-occurrence Filter

Payoff-Based Approach to Learning Generalized Nash Equilibria in Convex Games

BetaRun 2017 Team Description Paper: Variety, Complexity, and Learning

Intertangled stochastic motifs in networks of excitatory-inhibitory units

Symmetric Complete Sum-free Sets in Cyclic Groups

Abstract matrix-tree theorem and Bernardi polynomial

Performance Analysis of Physical Layer Network Coding for Two-way Relaying over Non-regenerative Communication Satellites

Wireless Bidirectional Relaying using Physical Layer Network Coding with Heterogeneous PSK Modulation

Bernoulli Factories and Black-Box Reductions in Mechanism Design

Robustness from structure: Inference with hierarchical spiking networks on analog neuromorphic hardware

A class of multidimensional quadratic BSDEs

A trust-region method for derivative-free nonlinear constrained stochastic optimization

Using Aggregated Relational Data to feasibly identify network structure without network data

Any-Angle Pathfinding for Multiple Agents Based on SIPP Algorithm

The difficulty of folding self-folding origami

Leak Event Identification in Water Systems Using High Order CRF

Big Data in HEP: A comprehensive use case study

Maximizing the Mutual Information of Multi-Antenna Links Under an Interfered Receiver Power Constraint

Cubature on Wiener Space for McKean-Vlasov SDEs with Smooth Scalar Interaction

Why we have switched from building full-fledged taxonomies to simply detecting hypernymy relations

MEDL and MEDLA: Methods for Assessment of Scaling by Medians of Log-Squared Nondecimated Wavelet Coefficients

Mixed linear-nonlinear least squares regression

Automatic Skin Lesion Analysis using Large-scale Dermoscopy Images and Deep Residual Networks

Improved multitask learning through synaptic intelligence

Virtual Reality over Wireless Networks: Quality-of-Service Model and Learning-Based Resource Management

MetaPAD: Meta Pattern Discovery from Massive Text Corpora

Multiple User Context Inference by Fusing Data Sources

Cognitive Inference of Demographic Data by User Ratings

SPARTan: Scalable PARAFAC2 for Large & Sparse Data

A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning

Improved approximation algorithms for $k$-connected $m$-dominating set problems

Numerical Integration and Dynamic Discretization in Heuristic Search Planning over Hybrid Domains

Chemical-disorder-caused Medium Range Order in Covalent Glass

GUN: Gradual Upsampling Network for single image super-resolution

DeepFM: A Factorization-Machine based Neural Network for CTR Prediction

GRAAD: Group Anonymous and Accountable D2D Communication in Mobile Networks

Poisson multi-Bernoulli mixture filter: direct derivation and implementation

Conjugate-Computation Variational Inference: Converting Variational Inference in Non-Conjugate Models to Inferences in Conjugate Models

Spin ice physics in a new rare-earth selenide spinel MgEr$_2$Se$_4$

Orbital Graphs

Lagrangians of hypergraphs: The Frankl-Füredi conjecture holds almost everywhere

Online Learning with Local Permutations and Delayed Feedback

Computing the $p$-Spectral Radii of Uniform Hypergraphs with Applications

Affine counter automata

Mahler takes a regular view of Zaremba

A Note on the Inapproximability of Induced Disjoint Paths

Automatic Skin Lesion Segmentation using Semi-supervised Learning Technique

Statistical properties of one-dimensional directed polymers in a random potential

End-to-End Learning of Geometry and Context for Deep Stereo Regression

Assessing Potential Wind Energy Resources in Saudi Arabia with a Skew-t Distribution

Fourier analysis of serial dependence measures

The Mean Drift: Tailoring the Mean Field Theory of Markov Processes for Real-World Applications

Story Cloze Ending Selection Baselines and Data Examination

What You Expect is NOT What You Get! Questioning Reconstruction/Classification Correlation of Stacked Convolutional Auto-Encoder Features

Probabilistic Matching: Causal Inference under Measurement Errors

Practical Bayesian Optimization for Variable Cost Objectives

A Visual Representation of Wittgenstein’s Tractatus Logico-Philosophicus

A sub-Riemannian Bonnet-Myers theorem for quaternionic contact structures

Response adaptive designs for binary responses: how to offer patient benefit while being robust to time trends?

Linear codes over Fq which are equivalent to LCD codes

A Localisation-Segmentation Approach for Multi-label Annotation of Lumbar Vertebrae using Deep Nets

Interference Networks with Caches at Both Ends

Coupling the Gaussian free fields with free and with zero boundary conditions via common level lines

Nematus: a Toolkit for Neural Machine Translation

Toward a Formal Model of Cognitive Synergy

Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs

Deep Learning for Skin Lesion Classification

Towards Efficient Verification of Population Protocols

Symbol Grounding via Chaining of Morphisms

Analytical Model of Wireless Cell with Superposition Coding

Langevin Dynamics with Continuous Tempering for High-dimensional Non-convex Optimization

On the Transformation Capability of Feasible Mechanisms for Programmable Matter

Cost-Based Intuitionist Probabilities on Spaces of Graphs, Hypergraphs and Theorems

Enhanced robustness of evolving open systems by the bidirectionality of interactions between elements

Mutual information decay for factors of IID

Bayesian Optimization with Gradients

On enumeration of tree-rooted planar cubic maps. II

Fate of the one-particle-density-matrix occupation spectrum of many-body localized states after a global quench

Frequency Synchronization for Uplink Massive MIMO Systems

Reflexive polytopes arising from perfect graphs

Dynamic-SCFlip Decoding of Polar Codes

Users prefer Guetzli JPEG over same-sized libjpeg

El Lenguaje Natural como Lenguaje Formal

Improving LBP and its variants using anisotropic diffusion

Iterated failure rate monotonicity and ordering relations within Gamma and Weibull distributions

Guetzli: Perceptually Guided JPEG Encoder

Corner Ranking, Realizable Vectors, and Extremal Cop-Win Graphs

Mean Field Games with Singular Controls of Bounded Velocity

A Lagrangian Gauss-Newton-Krylov Solver for Mass- and Intensity-Preserving Diffeomorphic Image Registration

Detailed, accurate, human shape estimation from clothed 3D scan sequences

Multivariate Gaussian and Student$-t$ Process Regression for Multi-output Prediction

P=?NP as minimization of degree 4 polynomial, or Grassmann number problem

Strong solutions to a nonlinear stochastic Maxwell equation with a retarded material law

Information geometry, simulation and complexity in Gaussian random fields

A microscopic derivation of time-dependent correlation functions of the $1D$ cubic nonlinear Schrödinger equation

Bicriteria Rectilinear Shortest Paths among Rectilinear Obstacles in the Plane

spmoran: An R package for Moran’s eigenvector-based spatial regression analysis

DRAGNN: A Transition-based Framework for Dynamically Connected Neural Networks

Geometrical morphology

Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling

Euler totient of subfactor planar algebras

Reinforcement Learning for Transition-Based Mention Detection

Thermal Conductivity of Glass-Forming Liquids

Comparison of echo state network output layer classification methods on noisy data

High-Throughput and Language-Agnostic Entity Disambiguation and Linking on User Generated Data

Global survival of branching random walks and tree-like branching random walks

Exact firing time statistics of neurons driven by discrete inhibitory noise

A remark on a construction of D.S. Asche

A Semiglobal, Practical, Strict Pseudogradient Property for Iterative Methods

Pattern Recognition on Oriented Matroids: Decompositions of Topes, and Dehn-Sommerville Type Relations

Variable selection in discriminant analysis for mixed variables and several groups