• Dynamics of many-body localization in the presence of particle loss
• Zero forcing number, Grundy domination number, and their variants
• Two-Point Codes for the Generalized GK curve
• A watershed-based algorithm to segment and classify cells in fluorescence microscopy images
• A Complete Year of User Retrieval Sessions in a Social Sciences Academic Search Engine
• Information, Privacy and Stability in Adaptive Data Analysis
• Comparative Performance Analysis of the Cumulative Sum Chart and the Shiryaev-Roberts Procedure for Detecting Changes in Autocorrelated Data
• One-Sided Unsupervised Domain Mapping
• Multi-Class Model Fitting by Energy Minimization and Mode-Seeking
• The star sequence and the general first Zagreb index
• On the fast convergence of random perturbations of the gradient flow
• Higher-order meshing of implicit geometries – part II: Approximations on manifolds
• Construction of q-ary Constant Weight Sequences using a Knuth-like Approach
• Neureal Network-Based Automatic Liver Tumor Segmentation With Random Forest-Based Candidate Filtering
• A Game of Nontransitive Dice
• Minimax Optimal Rates of Estimation In Functional ANOVA Models When Data On Derivatives Are Available
• A combinatorial proof of Bass’s determinant formula for the zeta function of regular graphs
• A Construction for Balancing Non-Binary Sequences Based on Gray Code Prefixes
• Multivariate initial sequence estimators in Markov chain Monte Carlo
• Multiple Kernel Learning and Automatic Subspace Relevance Determination for High-dimensional Neuroimaging Data
• Homogeneity Pursuit in Single Index Models based Panel Data Analysis
• Efficient Textual Representation of Structure
• On the reduced Euler characteristic of independence complexes of circulant graphs
• Inference for penalized spline regression: Improving confidence intervals by reducing the penalty
• Active learning machine learns to create new quantum experiments
• Spectral Simplicity of Apparent Complexity, Part II: Exact Complexities and Complexity Spectra
• Controlling Sources of Inaccuracy in Stochastic Kriging
• Wikipedia Vandal Early Detection: from User Behavior to User Embedding
• Spectrum-based deep neural networks for fraud detection
• Learning Person Trajectory Representations for Team Activity Analysis
• Using Negative Curvature in Solving Nonlinear Programs
• Optimal Envelope Approximation in Fourier Basis with Applications in TV White Space
• Generalized Fit for Asymptotic Predictions of Heavy-Tail Experimental Transients
• X-TCP: A Cross Layer Approach for TCP Uplink Flows in mmWave Networks
• Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach
• Iterative Particle Approximation for McKean-Vlasov SDEs with application to Multilevel Monte Carlo estimation
• Analyzing Random Permutations for Cyclic Coordinate Descent
• Bilinear log n – log p relation and critical power-law grain size distribution of crushable aggregates under compression and shear
• Inferring protein-protein interaction and protein-DNA interaction directions based on cause-effect pairs in undirected and mixed networks
• Deep-Learning Convolutional Neural Networks for scattered shrub detection with Google Earth Imagery
• Higher-order meshing of implicit geometries – part III: Conformal Decomposition FEM (CDFEM)
• Lorden’s inequality and coupling method for backward renewal process
• Simultaneous Inference of User Representations and Trust
• Concept Transfer Learning for Adaptive Language Understanding
• Concurrence-Aware Long Short-Term Sub-Memories for Person-Person Action Recognition
• See, Hear, and Read: Deep Aligned Representations
• Non-flat regular polytopes and restrictions on chiral polytopes
• Context-aware, Adaptive and Scalable Android Malware Detection through Online Learning (extended version)
• On singularity of distribution of random variables with independent symbols of Oppenheim expansions
• Experiment Software and Projects on the Web with VISPA
• Design and Execution of make-like, distributed Analyses based on Spotify’s Pipelining Package Luigi
• Local systems on arrangements of smooth, complex algebraic hypersurfaces
• Fermionic approach to weighted Hurwitz numbers and topological recursion
• A Bayesian General Linear Modeling Approach to Cortical Surface fMRI Data Analysis
• Rates of estimation for determinantal point processes
• M/G/c/c state dependent queuing model for a road traffic system of two sections in tandem
• $L^1$ solutions of non-reflected BSDEs and reflected BSDEs with one and two continuous barriers under general assumptions
• Precious Time: Understanding Social Stratification in the Knowledge Society Through Time Allocation
• Flip-distance between α-orientations of graphs embedded on plane and sphere
• More Accurate Entity Ranking Using Knowledge Graph and Web Corpus
• Thompson Sampling for the MNL-Bandit
• Graph-Cut RANSAC
• Isolated partial Hadamard matrices, and related topics
• Visuospatial Skill Learning for Robots
• A Fast x86 Implementation of Select
• Linear response and moderate deviations: hierarchical approach. II
• Block patterns in generalized Euler Permutations
• Center of Gravity PSO for Partitioning Clustering
• Order embeddings and character-level convolutions for multimodal alignment
• Image Compression Based on Compressive Sensing: End-to-End Comparison with JPEG
• A spectral analysis of discrete-time quantum walks with related to birth and death chains
• DeepSF: deep convolutional neural network for mapping protein sequences to folds
• Path Integral Quantization of Volume
• Non-convex Penalties with Analytical Solutions for One-bit Compressive Sensing
• The split-and-drift random graph, a null model for speciation
• Where and Who? Automatic Semantic-Aware Person Composition
• Distributed Contingency Analysis over Wide Area Network among Dispatch Centers
• Adaptive Multiple-Arm Identification
• Pfaffian Formulas and Schur Q-Function Identities
• A note on conditional versus joint unconditional weak convergence in bootstrap consistency results
• Improving Legal Information Retrieval by Distributional Composition with Term Order Probabilities
• Personalized Age Progression with Bi-level Aging Dictionary Learning
• Brain Intelligence: Go Beyond Artificial Intelligence
• Why do launch trajectories end downwards
• Slow Spin Dynamics and Self-Sustained Clusters in Sparsely Connected Systems
• Load Balancing in Large-Scale Systems with Multiple Dispatchers
• Face R-CNN
• Minimal Dominating Set problem studied by simulated annealing and cavity method: Analytics and population dynamics
• Asymptotic Goodness-of-Fit Tests for Point Processes Based on Scaled Empirical K-Functions
• Actor-Critic for Linearly-Solvable Continuous MDP with Partially Known Dynamics
• Geometry of the Gibbs measure for the discrete 2D Gaussian free field with scale dependent variance
• Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration
• In elections, irrelevant alternatives provide relevant data
• Joint Text Embedding for Personalized Content-based Recommendation
• Latent Estimation of GDP, GDP per capita, and Population from Historic and Contemporary Sources
• SocioSense: Robot Navigation Amongst Pedestrians with Social and Psychological Constraints
• A Random-Fern based Feature Approach for Image Matching
• Evolving imputation strategies for missing data in classification problems with TPOT
• Optimal learning via local entropies and sample compression
• Virtual Constraints and Hybrid Zero Dynamics for Realizing Underactuated Bipedal Locomotion
• Strategic Dynamic Pricing with Network Externalities
• Segmentation of Intracranial Arterial Calcification with Deeply Supervised Residual Dropout Networks
• Linear Capacity of Networks over Ring Alphabets
• High-dimensional GARCH process segmentation with an application to Value-at-Risk
• On the game total domination number
• Graphical Nonconvex Optimization for Optimal Estimation in Gaussian Graphical Models
• On the properties of new families of generalized Fibonacci numbers
• Distributional Compatibility for Change of Measures
• Greedy Approaches to Symmetric Orthogonal Tensor Decomposition
• Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification
• Improved Consistent Weighted Sampling Revisited
• Maximum Likelihood Signal Amplitude Estimation Based on Permuted Blocks of Differently Binary Quantized Observations of a Signal in Noise
• Optimal Rates for Community Estimation in the Weighted Stochastic Block Model
• A weighted global GMRES algorithm with deflation for solving large Sylvester matrix equations
• PReP: Path-Based Relevance from a Probabilistic Perspective in Heterogeneous Information Networks
• The Likelihood Ratio Test in High-Dimensional Logistic Regression Is Asymptotically a Rescaled Chi-Square
• Formation Control of Rigid Graphs with a Flex Node Addition
• One-step and Two-step Classification for Abusive Language Detection on Twitter
• The number of hypergraphs without linear cycles
• A Kind of Affine Weighted Moment Invariants
• Pile-up Reduction, Bayesian Decomposition and Applications of Silicon Drift Detectors at LCLS
• Forbidden subposet problems for traces of set families
• Inconsistent Node Flattening for Improving Top-down Hierarchical Classification
• Compressing Deep Neural Network Structures for Sensing Systems with a Compressor-Critic Framework
• Computing a minimal partition of partial orders into heapable subsets
• Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning
• Learning Structured Semantic Embeddings for Visual Recognition
• One look at the rating of scientific publications and corresponding toy-model
• On the Identifiability of Diagnostic Classification Models
• Bayesian LSTMs in medicine
• Moral hazard in welfare economics: on the advantage of Planner’s advices to manage employees’ actions
• The Classical Complexity of Boson Sampling
• Performance evaluation of coherent Ising machines against classical neural networks
• Modelling the evaporation of nanoparticle suspensions from heterogeneous surfaces
• On the bias caused by interim analyses
• The Smith group and the critical group of the Grassmann graph of lines in finite projective space and of its complement
• Hybrid Beamforming with Reduced Number of Phase Shifters for Massive MIMO Systems
• Spin-charge conversion in disordered two-dimensional electron gases lacking inversion symmetry
• 3D Pathfinding and Collision Avoidance Using Uneven Search-space Quantization and Visual Cone Search
• Power series, the Riordan group and Hopf algebras
• Neuroevolution on the Edge of Chaos
• Solver composition across the PDE/linear algebra barrier
• Balanced Facilities on Random Graphs
• On the Emergence of Invariance and Disentangling in Deep Representations
• The Schrödinger equation with spatial white noise: the average wave function
• Characterization of multivariate Bernoulli distributions with given margins
• The Geometry of Nodal Sets and Outlier Detection
• Stochastic Integration and Stochastic PDEs Driven by Jumps on the Dual of a Nuclear Space
• Sparse Tableau Formulation for Optimal Power Flow Applications
• Optimal Design Method of MIMO Antenna Directivities and Corresponding Current Distributions by Using Spherical Mode Expansion
• Heat transport coefficients from optimally short molecular dynamics simulations
• Neuro-RAM Unit with Applications to Similarity Testing and Compression in Spiking Neural Networks
• Sparse Stochastic Bandits
• Mendelian Randomization when Many Instruments are Invalid: Hierarchical Empirical Bayes Estimation
• Jump Type Stochastic Differential Equations with Non-Lipschitz Coefficients and Feller and Strong Feller Properties
• Multi-Observation Elicitation
• Language Generation with Recurrent Generative Adversarial Networks without Pre-training
• Decay Estimates for 1-D Parabolic PDEs with Boundary Disturbances
• A method for the online construction of the set of states of a Markov Decision Process using Answer Set Programming
• Learning Whenever Learning is Possible: Universal Learning under General Stochastic Processes
• Asymptotically normal estimators for Zipf’s law
• Ipanema-β: tools and examples for HEP analysis on GPU
• A simple neural network module for relational reasoning
• A correspondence between thermodynamics and inference
• Strategic Equilibria in Queues with Dynamic Service Rate and Full Information
• Hamiltonian Monte Carlo Methods for Subset Simulation in Reliability Analysis
• The Capacity of Private Information Retrieval from Byzantine and Colluding Databases
• Types of Cognition and its Implications for future High-Level Cognitive Machines
• A Joint Model for Question Answering and Question Generation
We consider the problem of repeatedly solving a variant of the same dynamic programming problem in successive trials. An instance of the type of problems we consider is to find the optimal binary search tree. At the beginning of each trial, the learner probabilistically chooses a tree with the n keys at the internal nodes and the n + 1 gaps between keys at the leaves. It is then told the frequencies of the keys and gaps and is charged by the average search cost for the chosen tree. The problem is online because the frequencies can change between trials. The goal is to develop algorithms with the property that their total average search cost (loss) in all trials is close to the total loss of the best tree chosen in hind sight for all trials. The challenge, of course, is that the algorithm has to deal with exponential number of trees. We develop a methodology for tackling such problems for a wide class of dynamic programming algorithms. Our framework allows us to extend online learning algorithms like Hedge and Component Hedge to a significantly wider class of combinatorial objects than was possible before.
In this paper, we explore optimizations to run Recurrent Neural Network (RNN) models locally on mobile devices. RNN models are widely used for Natural Language Processing, Machine Translation, and other tasks. However, existing mobile applications that use RNN models do so on the cloud. To address privacy and efficiency concerns, we show how RNN models can be run locally on mobile devices. Existing work on porting deep learning models to mobile devices focus on Convolution Neural Networks (CNNs) and cannot be applied directly to RNN models. In response, we present MobiRNN, a mobile-specific optimization framework that implements GPU offloading specifically for mobile GPUs. Evaluations using an RNN model for activity recognition shows that MobiRNN does significantly decrease the latency of running RNN models on phones.
Task-specific word identification aims to choose the task-related words that best describe a short text. Existing approaches require well-defined seed words or lexical dictionaries (e.g., WordNet), which are often unavailable for many applications such as social discrimination detection and fake review detection. However, we often have a set of labeled short texts where each short text has a task-related class label, e.g., discriminatory or non-discriminatory, specified by users or learned by classification algorithms. In this paper, we focus on identifying task-specific words and phrases from short texts by exploiting their class labels rather than using seed words or lexical dictionaries. We consider the task-specific word and phrase identification as feature learning. We train a convolutional neural network over a set of labeled texts and use score vectors to localize the task-specific words and phrases. Experimental results on sentiment word identification show that our approach significantly outperforms existing methods. We further conduct two case studies to show the effectiveness of our approach. One case study on a crawled tweets dataset demonstrates that our approach can successfully capture the discrimination-related words/phrases. The other case study on fake review detection shows that our approach can identify the fake-review words/phrases.
Advances in deep learning have led to substantial increases in prediction accuracy as well as the cost of rendering predictions. We conjecture that for a majority of real-world inputs, the recent advances in deep learning have created models that effectively ‘over-think’ on simple inputs. In this paper we revisit the classic idea of prediction cascades to reduce prediction costs. We introduce the ‘I Don’t Know’ (IDK) prediction cascades framework, a general framework for constructing prediction cascades for arbitrary multi-class prediction tasks. We propose two baseline methods for constructing cascades as well as a new objective within this framework and evaluate these techniques on a range of benchmark and real-world datasets to demonstrate the prediction cascades can achieve 1.7-10.5x speedups in image classification tasks while maintaining comparable accuracy to state-of-the-art models. When combined with human experts, prediction cascades can achieve nearly perfect accuracy(within 5%) while requiring human intervention on less than 30% of the queries.
In many real-world scenarios, labeled data for a specific machine learning task is costly to obtain. Semi-supervised training methods make use of abundantly available unlabeled data and a smaller number of labeled examples. We propose a new framework for semi-supervised training of deep neural networks inspired by learning in humans. ‘Associations’ are made from embeddings of labeled samples to those of unlabeled ones and back. The optimization schedule encourages correct association cycles that end up at the same class from which the association was started and penalizes wrong associations ending at a different class. The implementation is easy to use and can be added to any existing end-to-end training setup. We demonstrate the capabilities of learning by association on several data sets and show that it can improve performance on classification tasks tremendously by making use of additionally available unlabeled data. In particular, for cases with few labeled data, our training scheme outperforms the current state of the art on SVHN.
Precise financial series predicting has long been a difficult problem because of unstableness and many noises within the series. Although Traditional time series models like ARIMA and GARCH have been researched and proved to be effective in predicting, their performances are still far from satisfying. Machine Learning, as an emerging research field in recent years, has brought about many incredible improvements in tasks such as regressing and classifying, and it’s also promising to exploit the methodology in financial time series predicting. In this paper, the predicting precision of financial time series between traditional time series models and mainstream machine learning models including some state-of-the-art ones of deep learning are compared through experiment using real stock index data from history. The result shows that machine learning as a modern method far surpasses traditional models in precision.
Vector representations and vector space modeling (VSM) play a central role in modern machine learning. We propose a novel approach to `vector similarity searching’ over dense semantic representations of words and documents that can be deployed on top of traditional inverted-index-based fulltext engines, taking advantage of their robustness, stability, scalability and ubiquity. We show that this approach allows the indexing and querying of dense vectors in text domains. This opens up exciting avenues for major efficiency gains, along with simpler deployment, scaling and monitoring. The end result is a fast and scalable vector database with a tunable trade-off between vector search performance and quality, backed by a standard fulltext engine such as Elasticsearch. We empirically demonstrate its querying performance and quality by applying this solution to the task of semantic searching over a dense vector representation of the entire English Wikipedia.
Classification predicts classes of objects using the knowledge learned during the training phase. This process requires learning from labeled samples. However, the labeled samples usually limited. Annotation process is annoying, tedious, expensive, and requires human experts. Meanwhile, unlabeled data is available and almost free. Semi-supervised learning approaches make use of both labeled and unlabeled data. This paper introduces cluster and label approach using PSO for semi-supervised classification. PSO is competitive to traditional clustering algorithms. A new local best PSO is presented to cluster the unlabeled data. The available labeled data guides the learning process. The experiments are conducted using four state-of-the-art datasets from different domains. The results compared with Label Propagation a popular semi-supervised classifier and two state-of-the-art supervised classification models, namely k-nearest neighbors and decision trees. The experiments show the efficiency of the proposed model.
This Paper represents a literature review of Swarm intelligence algorithm in the area of semi-supervised classification. There are many research papers for applying swarm intelligence algorithms in the area of machine learning. Some algorithms of SI are applied in the area of ML either solely or hybrid with other ML algorithms. SI algorithms are also used for tuning parameters of ML algorithm, or as a backbone for ML algorithms. This paper introduces a brief literature review for applying swarm intelligence algorithms in the field of semi-supervised learning
Recent work has considered theoretical models for the behavior of agents with specific behavioral biases: rather than making decisions that optimize a given payoff function, the agent behaves inefficiently because its decisions suffer from an underlying bias. These approaches have generally considered an agent who experiences a single behavioral bias, studying the effect of this bias on the outcome. In general, however, decision-making can and will be affected by multiple biases operating at the same time. How do multiple biases interact to produce the overall outcome? Here we consider decisions in the presence of a pair of biases exhibiting an intuitively natural interaction: present bias — the tendency to value costs incurred in the present too highly — and sunk-cost bias — the tendency to incorporate costs experienced in the past into one’s plans for the future. We propose a theoretical model for planning with this pair of biases, and we show how certain natural behavioral phenomena can arise in our model only when agents exhibit both biases. As part of our model we differentiate between agents that are aware of their biases (sophisticated) and agents that are unaware of them (naive). Interestingly, we show that the interaction between the two biases is quite complex: in some cases, they mitigate each other’s effects while in other cases they might amplify each other. We obtain a number of further results as well, including the fact that the planning problem in our model for an agent experiencing and aware of both biases is computationally hard in general, though tractable under more relaxed assumptions.
This paper proposes a novel framework for detecting redundancy in supervised sentence categorisation. Unlike traditional singleton neural network, our model incorporates character-aware convolutional neural network (Char-CNN) with character-aware recurrent neural network (Char-RNN) to form a convolutional recurrent neural network (CRNN). Our model benefits from Char-CNN in that only salient features are selected and fed into the integrated Char-RNN. Char-RNN effectively learns long sequence semantics via sophisticated update mechanism. We compare our framework against the state-of-the-art text classification algorithms on four popular benchmarking corpus. For instance, our model achieves competing precision rate, recall ratio, and F1 score on the Google-news data-set. For twenty-news-groups data stream, our algorithm obtains the optimum on precision rate, recall ratio, and F1 score. For Brown Corpus, our framework obtains the best F1 score and almost equivalent precision rate and recall ratio over the top competitor. For the question classification collection, CRNN produces the optimal recall rate and F1 score and comparable precision rate. We also analyse three different RNN hidden recurrent cells’ impact on performance and their runtime efficiency. We observe that MGU achieves the optimal runtime and comparable performance against GRU and LSTM. For TFIDF based algorithms, we experiment with word2vec, GloVe, and sent2vec embeddings and report their performance differences.
We develop a family of reformulations of an arbitrary consistent linear system into a stochastic problem. The reformulations are governed by two user-defined parameters: a positive definite matrix defining a norm, and an arbitrary discrete or continuous distribution over random matrices. Our reformulation has several equivalent interpretations, allowing for researchers from various communities to leverage their domain specific insights. In particular, our reformulation can be equivalently seen as a stochastic optimization problem, stochastic linear system, stochastic fixed point problem and a probabilistic intersection problem. We prove sufficient, and necessary and sufficient conditions for the reformulation to be exact. Further, we propose and analyze three stochastic algorithms for solving the reformulated problem—basic, parallel and accelerated methods—with global linear convergence rates. The rates can be interpreted as condition numbers of a matrix which depends on the system matrix and on the reformulation parameters. This gives rise to a new phenomenon which we call stochastic preconditioning, and which refers to the problem of finding parameters (matrix and distribution) leading to a sufficiently small condition number. Our basic method can be equivalently interpreted as stochastic gradient descent, stochastic Newton method, stochastic proximal point method, stochastic fixed point method, and stochastic projection method, with fixed stepsize (relaxation parameter), applied to the reformulations.
In machine learning ensemble methods have demonstrated high accuracy for the variety of problems in different areas. The most known algorithms intensively used in practice are random forests and gradient boosting. In this paper we present InfiniteBoost – a novel algorithm, which combines the best properties of these two approaches. The algorithm constructs the ensemble of trees for which two properties hold: trees of the ensemble incorporate the mistakes done by others; at the same time the ensemble could contain the infinite number of trees without the over-fitting effect. The proposed algorithm is evaluated on the regression, classification, and ranking tasks using large scale, publicly available datasets.
In this paper, we consider the use of deep neural networks in the context of Multiple-Input-Multiple-Output (MIMO) detection. We give a brief introduction to deep learning and propose a modern neural network architecture suitable for this detection task. First, we consider the case in which the MIMO channel is constant, and we learn a detector for a specific system. Next, we consider the harder case in which the parameters are known yet changing and a single detector must be learned for all multiple varying channels. We demonstrate the performance of our deep MIMO detector using numerical simulations in comparison to competing methods including approximate message passing and semidefinite relaxation. The results show that deep networks can achieve state of the art accuracy with significantly lower complexity while providing robustness against ill conditioned channels and mis-specified noise variance.
This work presents a supervised learning based approach to the computer vision problem of frame interpolation. The presented technique could also be used in the cartoon animations since drawing each individual frame consumes a noticeable amount of time. The most existing solutions to this problem use unsupervised methods and focus only on real life videos with already high frame rate. However, the experiments show that such methods do not work as well when the frame rate becomes low and object displacements between frames becomes large. This is due to the fact that interpolation of the large displacement motion requires knowledge of the motion structure thus the simple techniques such as frame averaging start to fail. In this work the deep convolutional neural network is used to solve the frame interpolation problem. In addition, it is shown that incorporating the prior information such as optical flow improves the interpolation quality significantly.
We develop an empirical Bayes (EB) algorithm for the matrix completion problems. The EB algorithm is motivated from the singular value shrinkage estimator for matrix means by Efron and Morris (1972). Unlike existing methods, the EB algorithm does not require heuristic parameter tuning other than tolerance. Numerical results demonstrated that the EB algorithm achieves a good trade-off between accuracy and efficiency compared to existing algorithms and that it works particularly well when the difference between the number of rows and columns is large. Application to real data also shows the practical utility of the EB algorithm.
In this work, we study an important problem: learning programs from input-output examples. We propose a novel method to learn a neural program operating a domain-specific non-differentiable machine, and demonstrate that this method can be applied to learn programs that are significantly more complex than the ones synthesized before: programming language parsers from input-output pairs without knowing the underlying grammar. The main challenge is to train the neural program without supervision on execution traces. To tackle it, we propose: (1) LL machines and neural programs operating them to effectively regularize the space of the learned programs; and (2) a two-phase reinforcement learning-based search technique to train the model. Our evaluation demonstrates that our approach can successfully learn to parse programs in both an imperative language and a functional language, and achieve 100% test accuracy, while existing approaches’ accuracies are almost 0%. This is the first successful demonstration of applying reinforcement learning to train a neural program operating a non-differentiable machine that can fully generalize to test sets on a non-trivial task.
Convolutional network are the de-facto standard for analysing spatio-temporal data such as images, videos, 3D shapes, etc. Whilst some of this data is naturally dense (for instance, photos), many other data sources are inherently sparse. Examples include pen-strokes forming on a piece of paper, or (colored) 3D point clouds that were obtained using a LiDAR scanner or RGB-D camera. Standard ‘dense’ implementations of convolutional networks are very inefficient when applied on such sparse data. We introduce a sparse convolutional operation tailored to processing sparse data that differs from prior work on sparse convolutional networks in that it operates strictly on submanifolds, rather than ‘dilating’ the observation with every layer in the network. Our empirical analysis of the resulting submanifold sparse convolutional networks shows that they perform on par with state-of-the-art methods whilst requiring substantially less computation.
We discuss problems with the standard approaches to evaluation for tasks like visual question answering, and argue that artificial data can be used to address these as a complement to current practice. We demonstrate that with the help of existing ‘deep’ linguistic processing technology we are able to create challenging abstract datasets, which enable us to investigate the language understanding abilities of multimodal deep learning models in detail.
Automated story generation is the problem of automatically selecting a sequence of events, actions, or words that can be told as a story. We seek to develop a system that can generate stories by learning everything it needs to know from textual story corpora. To date, recurrent neural networks that learn language models at character, word, or sentence levels have had little success generating coherent stories. We explore the question of event representations that provide a mid-level of abstraction between words and sentences in order to retain the semantic information of the original data while minimizing event sparsity. We present a technique for preprocessing textual story data into event sequences. We then present a technique for automated story generation whereby we decompose the problem into the generation of successive events (event2event) and the generation of natural language sentences from events (event2sentence). We give empirical results comparing different event representations and their effects on event successor generation and the translation of events to natural language.
We present a new approach to ensemble learning. Our approach constructs a tree of subsets of the feature space and associates a predictor (predictive model) – determined by training one of a given family of base learners on an endogenously determined training set – to each node of the tree; we call the resulting object a tree of predictors. The (locally) optimal tree of predictors is derived recursively; each step involves jointly optimizing the split of the terminal nodes of the previous tree and the choice of learner and training set (hence predictor) for each set in the split. The feature vector of a new instance determines a unique path through the optimal tree of predictors; the final prediction aggregates the predictions of the predictors along this path. We derive loss bounds for the final predictor in terms of the Rademacher complexity of the base learners. We report the results of a number of experiments on a variety of datasets, showing that our approach provides statistically significant improvements over state-of-the-art machine learning algorithms, including various ensemble learning methods. Our approach works because it allows us to endogenously create more complex learners – when needed – and endogenously match both the learner and the training set to the characteristics of the dataset while still avoiding over-fitting.
Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power consumption becomes a problem for real time mobile applications. We propose a flexible and efficient CNN accelerator architecture which can support the implementation of SOA CNNs in low-power and low-latency application scenarios. This architecture exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across a wide range of convolutional network kernel sizes; and numbers of input and output feature maps. We implemented the proposed architecture on an FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. We show how in RTL simulations in a 28nm process with a clock frequency of 500MHz, the NullHop core is able to reach over 450 GOp/s and efficiency of 368%, maintaining over 98% utilization of the MAC units and achieving a power efficiency of over 3TOp/s/W in a core area of 5.8mm2
Learning with Reproducing Kernel Hilbert Spaces (RKHS) has been widely used in many scientific disciplines. Because a RKHS can be very flexible, it is common to impose a regularization term in the optimization to prevent overfitting. Standard RKHS learning employs the squared norm penalty of the learning function. Despite its success, many challenges remain. In particular, one cannot directly use the squared norm penalty for variable selection or data extraction. Therefore, when there exists noise predictors, or the underlying function has a sparse representation in the dual space, the performance of standard RKHS learning can be suboptimal. In the literature,work has been proposed on how to perform variable selection in RKHS learning, and a data sparsity constraint was considered for data extraction. However, how to learn in a RKHS with both variable selection and data extraction simultaneously remains unclear. In this paper, we propose a unified RKHS learning method, namely, DOuble Sparsity Kernel (DOSK) learning, to overcome this challenge. An efficient algorithm is provided to solve the corresponding optimization problem. We prove that under certain conditions, our new method can asymptotically achieve variable selection consistency. Simulated and real data results demonstrate that DOSK is highly competitive among existing approaches for RKHS learning.
From just a glance, humans can make rich predictions about the future state of a wide range of physical systems. On the other hand, modern approaches from engineering, robotics, and graphics are often restricted to narrow domains and require direct measurements of the underlying states. We introduce the Visual Interaction Network, a general-purpose model for learning the dynamics of a physical system from raw visual observations. Our model consists of a perceptual front-end based on convolutional neural networks and a dynamics predictor based on interaction networks. Through joint training, the perceptual front-end learns to parse a dynamic visual scene into a set of factored latent object representations. The dynamics predictor learns to roll these states forward in time by computing their interactions and dynamics, producing a predicted physical trajectory of arbitrary length. We found that from just six input video frames the Visual Interaction Network can generate accurate future trajectories of hundreds of time steps on a wide range of physical systems. Our model can also be applied to scenes with invisible objects, inferring their future states from their effects on the visible objects, and can implicitly infer the unknown mass of objects. Our results demonstrate that the perceptual module and the object-based dynamics predictor module can induce factored latent representations that support accurate dynamical predictions. This work opens new opportunities for model-based decision-making and planning from raw sensory observations in complex physical environments.
Bayesian Optimization (BO) has been shown to be a very effective paradigm for tackling hard black-box and non-convex optimization problems encountered in Machine Learning. Despite these successes, the computational complexity of the underlying function approximation has restricted the use of BO to problems that can be handled with less than a few thousand function evaluations. Harder problems like those involving functions operating in very high dimensional spaces may require hundreds of thousands or millions of evaluations or more and become computationally intractable to handle using standard Bayesian Optimization methods. In this paper, we propose Ensemble Bayesian Optimization (EBO) to overcome this problem. Unlike conventional BO methods that operate on a single posterior GP model, EBO works with an ensemble of posterior GP models. Further, we represent each GP model using tile coding random features and an additive function structure. Our approach generates speedups by parallelizing the time consuming hyper-parameter posterior inference and functional evaluations on hundreds of cores and aggregating the models in every iteration of BO. Our extensive experimental evaluation shows that EBO can speed up the posterior inference between 2-3 orders of magnitude (400 times in one experiment) compared to the state-of-the-art by putting data into Mondrian bins without sacrificing the sample quality. We demonstrate the ability of EBO to handle sample-intensive hard optimization problems by applying it to a rover navigation problem with tens of thousands of observations.
We present SimDex, a new technique for serving exact top-K recommendations on matrix factorization models that measures and optimizes for the similarity between users in the model. Previous serving techniques presume a high degree of similarity (e.g., L2 or cosine distance) among users and/or items in MF models; however, as we demonstrate, the most accurate models are not guaranteed to exhibit high similarity. As a result, brute-force matrix multiply outperforms recent proposals for top-K serving on several collaborative filtering tasks. Based on this observation, we develop SimDex, a new technique for serving matrix factorization models that automatically optimizes serving based on the degree of similarity between users, and outperforms existing methods in both the high-similarity and low-similarity regimes. SimDexfirst measures the degree of similarity among users via clustering and uses a cost-based optimizer to either construct an index on the model or defer to blocked matrix multiply. It leverages highly efficient linear algebra primitives in both cases to deliver predictions either from its index or from brute-force multiply. Overall, SimDex runs an average of 2x and up to 6x faster than highly optimized baselines for the most accurate models on several popular collaborative filtering datasets.