Neural Decision Trees

In this paper we propose a synergistic melding of neural networks and decision trees into a deep hashing neural network (HNN) whose modeling capability is exponential in its number of neurons. We first derive a soft decision tree, named the neural decision tree, that allows the optimization of an arbitrary decision function at each split node. We then rewrite this soft space partitioning as a new kind of neural network layer, namely the hashing layer (HL), which can be seen as a generalization of the known soft-max layer. This HL can easily replace the standard last layer of an ANN in any known network topology and can thus be used after a convolutional or recurrent neural network, for example. We present the modeling capacity of this deep hashing function on small datasets, where one can reach results at least as good as those of standard neural networks while reducing the number of output neurons. We also show that when the number of output neurons is large, the neural network can mitigate the absence of linear decision boundaries by learning, for each difficult class, a collection of not necessarily connected sub-regions of the space, leading to more flexible decision surfaces. Finally, the HNN can be seen as a deep locality-sensitive hashing function that can be trained in a supervised or unsupervised setting, as we will demonstrate for classification and regression problems.
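
As a rough illustration of how a few neurons can address exponentially many regions, the sketch below treats each of n output neurons as an independent soft binary split and assigns a probability to each of the 2^n resulting binary codes. This is a toy under stated assumptions, not the authors' definition of the HL: the independent-sigmoid gating and the names `sigmoid` and `soft_hash_probs` are hypothetical.

```python
import itertools
import math


def sigmoid(x):
    """Logistic function mapping a real logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_hash_probs(logits):
    """Return a dict mapping each n-bit code to its soft probability.

    Assumption: every neuron acts as an independent soft split, so the
    probability of a code is the product of per-bit gate probabilities.
    """
    gates = [sigmoid(z) for z in logits]
    probs = {}
    for code in itertools.product((0, 1), repeat=len(gates)):
        p = 1.0
        for bit, g in zip(code, gates):
            p *= g if bit else (1.0 - g)
        probs[code] = p
    return probs


# Three neurons address 2**3 = 8 soft regions.
probs = soft_hash_probs([2.0, -1.0, 0.5])
best_code = max(probs, key=probs.get)
print(len(probs), best_code)
```

The code probabilities sum to one by construction, so they can be read as a soft assignment of an input to one of the 2^n cells.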

Horseshoe Regularization for Feature Subset Selection

Feature subset selection arises in many high-dimensional applications in machine learning and statistics, such as compressed sensing and genomics. The \ell_0 penalty is ideal for this task, with the caveat that it requires the NP-hard combinatorial evaluation of all models. A recent area of considerable interest is the development of efficient algorithms to fit models with a non-convex \ell_\gamma penalty for \gamma\in (0,1), which results in sparser models than the convex \ell_1 or lasso penalty, but is harder to fit. We propose an alternative, termed the horseshoe regularization penalty for feature subset selection, and demonstrate its theoretical and computational advantages. The distinguishing feature from existing non-convex optimization approaches is a full probabilistic representation of the penalty as the negative of the logarithm of a suitable prior, which in turn enables an efficient expectation-maximization algorithm for optimization and MCMC for uncertainty quantification. In synthetic and real data, the resulting algorithm provides better statistical performance, and the computation requires a fraction of the time of state-of-the-art non-convex solvers.
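
For orientation, the penalties named above can be written in a common penalized least-squares form. The notation below (response y, design matrix X, coefficients \beta, tuning parameter \lambda, noise variance \sigma^2, prior density p) is assumed for illustration and is not taken from the paper.

```latex
% Generic penalized least squares with a separable penalty
\hat{\beta} = \arg\min_{\beta}\; \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2
              + \lambda \sum_{j} \mathrm{pen}(\beta_j),
\qquad
\mathrm{pen}(\beta_j) =
\begin{cases}
  \mathbf{1}[\beta_j \neq 0] & (\ell_0) \\
  \lvert \beta_j \rvert^{\gamma},\ \gamma \in (0,1) & (\ell_\gamma) \\
  \lvert \beta_j \rvert & (\ell_1,\ \text{lasso})
\end{cases}

% Probabilistic reading: a penalty equal to the negative log of a prior p
% makes the minimizer a posterior mode under Gaussian noise
\hat{\beta}_{\mathrm{MAP}} = \arg\min_{\beta}\;
  \frac{1}{2\sigma^2}\,\lVert y - X\beta \rVert_2^2 - \sum_{j} \log p(\beta_j)
```

The second display is the generic correspondence between a penalty of the form -\log p and maximum a posteriori estimation; the specific prior used for the horseshoe regularization penalty is not reproduced here.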

Building Usage Profiles Using Deep Neural Nets

To improve software quality, one needs to build test scenarios resembling the usage of a software product in the field. This task is rendered challenging when a product’s customer base is large and diverse. In this scenario, existing profiling approaches, such as operational profiling, are difficult to apply. In this work, we consider publicly available video tutorials of a product to profile usage. Our goal is to construct an automatic approach to extract information about user actions from instructional videos. To achieve this goal, we use a Deep Convolutional Neural Network (DCNN) to recognize user actions. Our pilot study shows that a DCNN trained to recognize user actions in video can classify five different actions in a collection of 236 publicly available Microsoft Word tutorial videos (published on YouTube). In our empirical evaluation we report a mean average precision of 94.42% across all actions. This study demonstrates the efficacy of DCNN-based methods for extracting software usage information from videos. Moreover, this approach may aid in other software engineering activities that require information about customer usage of a product.

Streaming supercomputing needs workflow-enabled programming-in-the-large

This is a position paper, submitted to the Future Online Analysis Platform Workshop (https://…/futureplatform), which argues that simple data analysis applications are common today, but future online supercomputing workloads will need to couple multiple advanced technologies (streams, caches, analysis, and simulations) to rapidly deliver scientific results. Each of these technologies is an active research area when integrated with high-performance computing. These components will interact in complex ways; therefore, coupling them needs to be programmed. Programming in the large, on top of existing applications, enables us to build much more capable applications and to productively manage this complexity.

Hidden Community Detection in Social Networks

We introduce a new paradigm that is important for community detection in the realm of network analysis. Networks contain a set of strong, dominant communities, which interfere with the detection of weak, natural community structure. When most of the members of the weak communities also belong to stronger communities, the weak communities are extremely hard to uncover. We call the weak communities the hidden community structure. We present a novel approach called HICODE (HIdden COmmunity DEtection) that identifies the hidden community structure as well as the dominant community structure. By weakening the strength of the dominant structure, one can uncover the hidden structure beneath. Likewise, by reducing the strength of the hidden structure, one can more accurately identify the dominant structure. In this way, HICODE tackles both tasks simultaneously. Extensive experiments on real-world networks demonstrate that HICODE outperforms several state-of-the-art community detection methods in uncovering both the dominant and the hidden structure. In the Facebook university social networks, we find multiple non-redundant sets of communities that are strongly associated with residential hall, year of registration, or career position of faculty or students, while the state-of-the-art algorithms mainly locate the dominant ground truth category. Due to the difficulty of labeling all ground truth communities in real-world datasets, HICODE provides a promising approach to pinpoint the existing latent communities and uncover communities for which there is no ground truth. Finding this unknown structure is an extremely important community detection problem.
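
The idea of weakening the dominant structure to expose a hidden one can be sketched on a toy graph. Everything below is an illustrative assumption rather than the HICODE algorithm itself: the dominant partition is supplied by hand (a real pipeline would obtain it from a base community detection method), weakening is done by simply deleting intra-community edges, and connected components stand in for the second detection pass. The helper names are hypothetical.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as sets."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def weaken(edges, communities):
    """Assumed weakening step: drop edges whose endpoints share a community."""
    label = {n: i for i, c in enumerate(communities) for n in c}
    return [(u, v) for u, v in edges if label[u] != label[v]]


nodes = range(6)
# Dominant layer: two triangles. Hidden layer: three cross-triangle pairs.
dominant = [{0, 1, 2}, {3, 4, 5}]
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (0, 3), (1, 4), (2, 5)]

residual = weaken(edges, dominant)
hidden = connected_components(nodes, residual)
print(sorted(sorted(c) for c in hidden))
```

On this toy input, the residual graph contains only the cross-triangle edges, so the second pass groups the nodes by the hidden pairing rather than by triangle.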

Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning

In recent years, a branch of machine learning called Deep Learning has become incredibly popular thanks to the ability of a new class of algorithms to model and interpret a large quantity of data in a similar way to humans. Properly training deep learning models involves collecting a vast amount of users’ private data, including habits, geographical positions, interests, and much more. Another major issue is that it is possible to extract from trained models useful information about the training set, and this hinders collaboration among distrustful participants or parties that deal with sensitive information. To tackle this problem, collaborative deep learning models have recently been proposed in which parties share only a subset of the parameters in an attempt to keep their respective training sets private. Parameters can also be obfuscated via differential privacy to make information extraction even more challenging, as shown by Shokri and Shmatikov at CCS’15. Unfortunately, we show that any privacy-preserving collaborative deep learning is susceptible to a powerful attack that we devise in this paper. In particular, we show that a distributed or decentralized deep learning approach is fundamentally broken and does not protect the training sets of honest participants. The attack we developed exploits the real-time nature of the learning process, which allows the adversary to train a Generative Adversarial Network (GAN) that generates valid samples of the targeted training set that was meant to be private. Interestingly, we show that differential privacy applied to shared parameters of the model, as suggested at CCS’15 and CCS’16, is utterly futile. In our generative model attack, all techniques adopted to scramble or obfuscate shared parameters in collaborative deep learning are rendered ineffective with no possibility of a remedy under the threat model considered.

Embedding Knowledge Graphs Based on Transitivity and Antisymmetry of Rules

Representation learning of knowledge graphs encodes entities and relation types into a continuous low-dimensional vector space by learning embeddings of entities and relation types. Most existing methods concentrate only on knowledge triples, ignoring logic rules, which contain rich background knowledge. Although some work has aimed at leveraging both knowledge triples and logic rules, it ignores the transitivity and antisymmetry of logic rules. In this paper, we propose a novel approach to learn knowledge representations with entities and ordered relations in knowledge triples and logic rules. The key idea is to integrate knowledge triples and logic rules, and to approximately order the relation types in logic rules to utilize the transitivity and antisymmetry of logic rules. All entries of the embeddings of relation types are constrained to be non-negative. We translate the general constrained optimization problem into an unconstrained optimization problem to solve the non-negative matrix factorization. Experimental results show that our model significantly outperforms other baselines on the knowledge graph completion task. This indicates that our model is capable of capturing the transitivity and antisymmetry information, which is significant when learning embeddings of knowledge graphs.

Bayes-Optimal Entropy Pursuit for Active Choice-Based Preference Learning

We analyze the problem of learning a single user’s preferences in an active learning setting, sequentially and adaptively querying the user over a finite time horizon. Learning is conducted via choice-based queries, where the user selects her preferred option among a small subset of offered alternatives. These queries have been shown to be a robust and efficient way to learn an individual’s preferences. We take a parametric approach and model the user’s preferences through a linear classifier, using a Bayesian prior to encode our current knowledge of this classifier. The rate at which we learn depends on the alternatives offered at every time epoch. Under certain noise assumptions, we show that the Bayes-optimal policy for maximally reducing entropy of the posterior distribution of this linear classifier is a greedy policy, and that this policy achieves a linear lower bound when alternatives can be constructed from the continuum. Further, we analyze a different metric called misclassification error, proving that the performance of the optimal policy that minimizes misclassification error is bounded below by a linear function of differential entropy. Lastly, we numerically compare the greedy entropy reduction policy with a knowledge gradient policy under a number of scenarios, examining their performance under both differential entropy and misclassification error.

Stochastic Newton and Quasi-Newton Methods for Large Linear Least-squares Problems

Feasibility of Principal Component Analysis in hand gesture recognition system

Surfing on protein waves: proteophoresis as a mechanism for bacterial genome partitioning

Toward Streaming Synapse Detection with Compositional ConvNets

Continuous-Time Visual-Inertial Trajectory Estimation with Event Cameras

WaterGAN: Unsupervised Generative Network to Enable Real-time Color Correction of Monocular Underwater Images

Improving high-pass fusion method using wavelets

Hunt’s Hypothesis (H) for the Sum of Two Independent Lévy Processes

Deep Nonparametric Estimation of Discrete Conditional Distributions via Smoothed Dyadic Partitioning

Making Asynchronous Distributed Computations Robust to Noise

GapTV: Accurate and Interpretable Low-Dimensional Regression and Classification

Founsure 1.0: An Erasure Code Library with Efficient Repair and Update Features

On Uniqueness and Blowup Properties for a Class of Second Order SDEs

sourceR: Classification and Source Attribution of Infectious Agents among Heterogeneous Populations

Control of the Correlation of Spontaneous Neuron Activity in Biological and Noise-activated CMOS Artificial Neural Microcircuits

Freeness characterizations on free chaos spaces

On the Optimality of Secret Key Agreement via Omniscience

A Debt-Aware Learning Approach for Resource Adaptations in Cloud Elasticity Management

Multi-Context Attention for Human Pose Estimation

Capacitated Center Problems with Two-Sided Bounds and Outliers

Error Bounds for Approximations of Geometrically Ergodic Markov Chains

Bandits with Movement Costs and Adaptive Pricing

Optimal Bayesian Minimax Rates for Unconstrained Large Covariance Matrices

Characterizing Spatiotemporal Transcriptome of Human Brain via Low Rank Tensor Decomposition

Strongly-Typed Agents are Guaranteed to Interact Safely

Viewpoint Adaptation for Rigid Object Detection

Small-space encoding LCE data structure with constant-time queries

Sequence Modeling via Segmentations

PairClone: A Bayesian Subclone Caller Based on Mutation Pairs

Learning Non-local Image Diffusion for Image Denoising

Simultaneous Feature and Body-Part Learning for Real-Time Robot Awareness of Human Behaviors

Sequence-based Multimodal Apprenticeship Learning For Robot Perception and Decision Making

Speckle Reduction with Trained Nonlinear Diffusion Filtering

Deep representation learning for human motion prediction and classification

Online Meta-learning by Parallel Algorithm Competition

Robot gains Social Intelligence through Multimodal Deep Reinforcement Learning

Dirichlet-vMF Mixture Model

Secure Clustered Distributed Storage Against Eavesdroppers

Improved cyclotomic conditions leading to new $2$-designs: the use of strong difference families

Efficient high-resolution RF pulse design applied to simultaneous multi-slice excitation

A convex penalty for switching control of partial differential equations

Use Generalized Representations, But Do Not Forget Surface Features

Toward high-performance online HCCR: a CNN approach with DropDistortion, path signature and spatial stochastic max-pooling

High Throughput Probabilistic Shaping with Product Distribution Matching

Medical Image Retrieval Based On the Parallelization of the Cluster Sampling Algorithm

A convex analysis approach to multi-material topology optimization

Optimal control of elliptic equations with positive measures

Tight Bounds for Bandit Combinatorial Optimization

A convex analysis approach to optimal controls with switching structure for partial differential equations

Scalable Multiagent Coordination with Distributed Online Open Loop Planning

Optimal Energy Beamforming under Per-Antenna Power Constraint

The Stochastic complexity of spin models: how simple are simple spin models?

Learning Rates for Kernel-Based Expectile Regression

Plane graphs without 4- and 5-cycles and without ext-triangular 7-cycles are 3-colorable

Remarks on planar edge-chromatic critical graphs

RNN Decoding of Linear Block Codes

$k$-clean monomial ideals

Compression with the tudocomp Framework

Fast and Simple Parallel Wavelet Tree and Matrix Construction

Suitable Spaces for Shape Optimization

Robust stability analysis of DC microgrids with constant power loads

Generalization of Schnyder woods to orientable surfaces and applications

How hard is it to cross the room? — Training (Recurrent) Neural Networks to steer a UAV

On the Total Energy Efficiency of Cell-Free Massive MIMO

Fully packed loop configurations: polynomiality and nested arches

Compact Self-Stabilizing Leader Election for Arbitrary Networks

Microwave breast cancer detection using Empirical Mode Decomposition features

Automatic segmentation in dynamic outdoor environments

Exact Localisations of Feedback Sets

Information Management for Decentralized Energy Storages under Market Uncertainties

DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge

Regularity results for the minimum time function with Hörmander vector fields

Fast and robust curve skeletonization for real-world elongated objects

Inertia-Constrained Pixel-by-Pixel Nonnegative Matrix Factorisation: a Hyperspectral Unmixing Method Dealing with Intra-class Variability

Thermal Transients in District Heating Systems

Path Planning for Multiple Heterogeneous Unmanned Vehicles with Uncertain Service Times

Control of Gene Regulatory Networks with Noisy Measurements and Uncertain Inputs

Capacity of the Aperture-Constrained AWGN Free-Space Communication Channel

A Network Epidemic Model for Online Community Commissioning Data

How ConvNets model Non-linear Transformations

Truthful Mechanisms for Delivery with Mobile Agents

On problems equivalent to (min,+)-convolution

A recommender system to restore images with impulse noise

Consistent Alignment of Word Embedding Models

ROPE: high-dimensional network modeling with robust control of edge FDR

An Efficient Data Structure for Dynamic Two-Dimensional Reconfiguration

Crosscorrelation of Rudin-Shapiro-Like Polynomials

Mean-square stability analysis of approximations of stochastic differential equations in infinite dimensions

Bounds on the reliability of typewriter channels

Computationally Efficient Robust Estimation of Sparse Functionals