Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks

Popular deep learning frameworks require users to fine-tune their memory usage so that the training data of a deep neural network (DNN) fits within the GPU physical memory. Prior work tries to address this restriction by virtualizing the memory usage of DNNs, enabling both CPU and GPU memory to be utilized for memory allocations. Despite its merits, virtualizing memory can incur significant performance overheads when the time needed to copy data back and forth from CPU memory is higher than the latency to perform the computations required for DNN forward and backward propagation. We introduce a high-performance virtualization strategy based on a ‘compressing DMA engine’ (cDMA) that drastically reduces the size of the data structures that are targeted for CPU-side allocations. The cDMA engine offers an average 2.6x (maximum 13.8x) compression ratio by exploiting the sparsity inherent in offloaded data, improving the performance of virtualized DNNs by an average 32% (maximum 61%).


Semi-Supervised AUC Optimization based on Positive-Unlabeled Learning

Maximizing the area under the receiver operating characteristic curve (AUC) is a standard approach to imbalanced classification. So far, various supervised AUC optimization methods have been developed and they are also extended to semi-supervised scenarios to cope with small sample problems. However, existing semi-supervised AUC optimization methods rely on strong distributional assumptions, which are rarely satisfied in real-world problems. In this paper, we propose a novel semi-supervised AUC optimization method that does not require such restrictive assumptions. We first develop an AUC optimization method based only on positive and unlabeled data (PU-AUC) and then extend it to semi-supervised learning by combining it with a supervised AUC optimization method. We theoretically prove that, without the restrictive distributional assumptions, unlabeled data contribute to improving the generalization performance in PU and semi-supervised AUC optimization methods. Finally, we demonstrate the practical usefulness of the proposed methods through experiments.


Optimal Approximation with Sparsely Connected Deep Neural Networks

We derive fundamental lower bounds on the connectivity and the memory requirements of deep neural networks guaranteeing uniform approximation rates for arbitrary function classes in L^2(\mathbb{R}^d). In other words, we establish a connection between the complexity of a function class and the complexity of deep neural networks approximating functions from this class to within a prescribed accuracy. Additionally, we prove that our lower bounds are achievable for a broad family of function classes. Specifically, all function classes that are optimally approximated by a general class of representation systems—so-called \emph{affine systems}—can be approximated by deep neural networks with minimal connectivity and memory requirements. Affine systems encompass a wealth of representation systems from applied harmonic analysis such as wavelets, ridgelets, curvelets, shearlets, \alpha-shearlets, and more generally \alpha-molecules. This result elucidates a remarkable universality property of neural networks and shows that they achieve the optimum approximation properties of all affine systems combined. As a specific example, we consider the class of 1/\alpha-cartoon-like functions, which is approximated optimally by \alpha-shearlets. We also explain how our results can be extended to the case of functions on low-dimensional immersed manifolds. Finally, we present numerical experiments demonstrating that the standard stochastic gradient descent algorithm generates deep neural networks providing close-to-optimal approximation rates at minimal connectivity. Moreover, these results show that stochastic gradient descent actually learns approximations that are sparse in the representation systems optimally sparsifying the function class the network is trained on.


An Elementary Method for the Explicit Solution of Multidimensional Optimal Stopping Problems

We study explicitly solvable multidimensional optimal stopping problems. Our approach is based on the notion of monotone stopping problems in discrete and continuous time. The method is illustrated with a variety of examples including multidimensional versions of the house-selling and burglar’s problem, the Poisson disorder problem, and an optimal investment problem.


Pixel Normalization from Numeric Data as Input to Neural Networks

Text to image transformation for input to neural networks requires intermediate steps. This paper attempts to present a new approach to pixel normalization so as to convert textual data into image, suitable as input for neural networks. This method can be further improved by its Graphics Processing Unit (GPU) implementation to provide significant speedup in computational time.


Fast k-means based on KNN Graph

In the era of big data, k-means clustering has been widely adopted as a basic processing tool in various contexts. However, its computational cost could be prohibitively high as the data size and the cluster number are large. It is well known that the processing bottleneck of k-means lies in the operation of seeking closest centroid in each iteration. In this paper, a novel solution towards the scalability issue of k-means is presented. In the proposal, k-means is supported by an approximate k-nearest neighbors graph. In the k-means iteration, each data sample is only compared to clusters that its nearest neighbors reside. Since the number of nearest neighbors we consider is much less than k, the processing cost in this step becomes minor and irrelevant to k. The processing bottleneck is therefore overcome. The most interesting thing is that k-nearest neighbor graph is constructed by iteratively calling the fast k-means itself. Comparing with existing fast k-means variants, the proposed algorithm achieves hundreds to thousands times speed-up while maintaining high clustering quality. As it is tested on 10 million 512-dimensional data, it takes only 5.2 hours to produce 1 million clusters. In contrast, to fulfill the same scale of clustering, it would take 3 years for traditional k-means.


Semi-supervised model-based clustering with controlled clusters leakage

In this paper, we focus on finding clusters in partially categorized data sets. We propose a semi-supervised version of Gaussian mixture model, called C3L, which retrieves natural subgroups of given categories. In contrast to other semi-supervised models, C3L is parametrized by user-defined leakage level, which controls maximal inconsistency between initial categorization and resulting clustering. Our method can be implemented as a module in practical expert systems to detect clusters, which combine expert knowledge with true distribution of data. Moreover, it can be used for improving the results of less flexible clustering techniques, such as projection pursuit clustering. The paper presents extensive theoretical analysis of the model and fast algorithm for its efficient optimization. Experimental results show that C3L finds high quality clustering model, which can be applied in discovering meaningful groups in partially classified data.


Toward Open Set Face Recognition

Testing Core Membership in Public Goods Economies

Fourth-order Tensors with Multidimensional Discrete Transforms

Estimating animal abundance with N-mixture models using the R-INLA package for R

VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

The power graph of a torsion-free group

Stable Secretaries

Demonstrating research subcommunities in mathematical networks

A Bound on the Spectral Radius of Hypergraphs with $e$ Edges

Homomorphisms Are a Good Basis for Counting Small Subgraphs

Capacity of Burst Noise-Erasure Channels With and Without Feedback and Input Cost

Coupling Polynomial Stratonovich Integrals: the two-dimensional Brownian case

Semi-supervised cross-entropy clustering with information bottleneck constraint

The Delta-Generalized Labeled Multi-Bernoulli Tracking Filter with Target Spawning

Strong solutions of SDE’s with generalized drift and multidimensional fractional Brownian initial noise

Dual numbers, weighted quivers, and extended Somos and Gale-Robinson sequences

Distributed generalized Nash equilibria computation of monotone games via a preconditioned proximal point algorithm

Asymptotically optimal bound on the adjacent vertex distinguishing edge choice number

State-Dependent Gaussian Multiple Access Channels: New Outer Bounds and Capacity Results

A combinatorial approach to Rauzy-type dynamics I: permutations and the Kontsevich–Zorich–Boissy classification theorem

Combinatorial Auctions Do Need Modest Interaction

Decomposing graphs into forests

Polluted Bootstrap Percolation with Threshold Two in All Dimensions

Distribution of residuals in the nonparametric IV model with application to separability testing

MapReduce Particle Filtering with Exact Resampling and Deterministic Runtime

Execution Templates: Caching Control Plane Decisions for Strong Scaling of Data Analytics

Supercongruences for rigid hypergeometric Calabi–Yau threefolds

Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach

On the Necessity of Superparametric Geometry Representation for Discontinuous Galerkin Methods on Domains with Curved Boundaries

Pricing in non-convex markets with quadratic deliverability costs

Optimized Regression Discontinuity Designs

Tramp Ship Scheduling Problem with Berth Allocation Considerations and Time-dependent Constraints

Spectral Radius and Hamiltonicity of graphs

Probabilistic Typology: Deep Generative Models of Vowel Inventories

3GPP-inspired HetNet Model using Poisson Cluster Process: Sum-product Functionals and Downlink Coverage

New bounds for Szemerédi’s theorem, III: A polylogarithmic bound for $r_4(N)$

Minimum Manhattan Distance Approach to Multiple Criteria Decision Making in Multiobjective Optimization Problems

Generative Convolutional Networks for Latent Fingerprint Reconstruction

Statistical inference using differentially private bi-degree sequences and synthetic directed graphs

A regularity theory for quasi-linear Stochastic Partial Differential Equations in weighted Sobolev spaces

Near-optimal linear decision trees for k-SUM and related problems

Evolutionary learning of fire fighting strategies

Unbounded variation and solutions of impulsive control systems

Wireless Channel Modeling Perspectives for Ultra-Reliable Communications

Spectral representation of one-dimensional Liouville Brownian Motion and Liouville Brownian excursion

Comparison of hidden Markov chain models and hidden Markov random field models in estimation of computed tomography images

Replica analysis of overfitting in regression models for time-to-event data

On the Design of Matched Filters for Molecule Counting Receivers

Attributes2Classname: A discriminative model for attribute-based unsupervised zero-shot learning

Of the People: Voting Is More Effective with Representative Candidates

Incidence Choosability of Graphs

Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Video

Charging of highly resistive granular metal films

Non-Inferiority and Equivalence Tests in A Sequential Multiple-Assignment Randomized Trial (SMART)

Uncountable realtime probabilistic classes

PER Approximation for Cross-Layer Optimization under Reliability and Energy Constraints

Am I Done? Predicting Action Progress in Videos

From Zero-shot Learning to Conventional Supervised Classification: Unseen Visual Data Synthesis

A Network Game of Dynamic Traffic

An optimal transportation approach for assessing almost stochastic order

Visualization and Assessment of Spatio-temporal Covariance Properties

Análisis econométrico de series temporales en Gretl: La Ley de Okun

A two-phase gradient method for quadratic programming problems with a single linear constraint and bounds on the variables

On The Number Of Unlabeled Bipartite Graphs

Linear syzygies, hyperbolic Coxeter groups and regularity

A Reasoning System for a First-Order Logic of Limited Belief

On Optimal Mechanisms in the Two-Item Single-Buyer Unit-Demand Setting

Towards a Physical Oracle for the Partition Problem using Analogue Computing

On distinct cross-ratios and related growth problems

A Finite State and Rule-based Akshara to Prosodeme (A2P) Converter in Hindi

A Deep Learning Perspective on the Origin of Facial Expressions

Quantum SDP-Solvers: Better upper and lower bounds

Action Tubelet Detector for Spatio-Temporal Action Localization

Blind Detection with Polar Codes

On the Approximate Asymptotic Statistical Independence of the Permanents of 0-1 Matrices, I

Ulam Sequences and Ulam Sets

Streaming for Aibohphobes: Longest Palindrome with Mismatches

Edge-based Component-Trees for Multi-Channel Image Segmentation

Auto-painter: Cartoon Image Generation from Sketch by Using Conditional Generative Adversarial Networks

Induced Ramsey-type results and binary predicates for point sets

ADMM for monotone operators: convergence analysis and rates

Localization and Eigenvalue Statistics for the Lattice Anderson model with Discrete Disorder

Recurrent Soft Attention Model for Common Object Recognition

Ramsey Classes with Closure Operations (Selected Combinatorial Applications)

Local Linear Convergence Analysis of Primal–Dual Splitting Methods

On pinned fields, interlacements, and random walk on $(\mathbb{Z}/N \mathbb{Z})^2$

Learning with Confident Examples: Rank Pruning for Robust Classification with Noisy Labels

Advertisements