A Statistical Method for Corrupt Agents Detection

The statistical method is used to identify the hidden leaders of the corruption structure. The method is based on principal component analysis (PCA), linear regression, and Shannon information. It is applied to study the time series data of corrupt financial activity. Shannon’s quantity of information is specified as a function of two arguments: a vector of hidden corruption factors and a subset of corrupt agents. Several optimization problems are solved to determine the contribution of corresponding corrupt agents to the total illegal behavior. An illustrative example is given. A convenient algorithm for computing the covariance matrix with missing data is proposed.
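
The abstract does not describe its covariance algorithm, so the following is only a minimal sketch of one standard way to handle missing data, the pairwise-complete estimate. It is not the paper's method. The function name `pairwise_cov` and the convention of marking missing values with NaN are assumptions made for illustration.

```python
import numpy as np

def pairwise_cov(data):
    """Covariance of a (samples x variables) array with NaN for missing values.

    Entry (i, j) uses only the rows where both variable i and variable j
    are observed (pairwise-complete estimate; an illustrative assumption).
    """
    data = np.asarray(data, dtype=float)
    n_vars = data.shape[1]
    observed = ~np.isnan(data)
    cov = np.full((n_vars, n_vars), np.nan)
    for i in range(n_vars):
        for j in range(i, n_vars):
            both = observed[:, i] & observed[:, j]
            n_both = int(both.sum())
            if n_both < 2:
                continue  # too few overlapping samples; entry stays NaN
            xi = data[both, i]
            xj = data[both, j]
            c = np.sum((xi - xi.mean()) * (xj - xj.mean())) / (n_both - 1)
            cov[i, j] = cov[j, i] = c
    return cov
```

A matrix assembled this way is symmetric but is not guaranteed to be positive semidefinite, which matters if it is later passed to PCA.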

DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging

Tagging news articles or blog posts with relevant tags from a collection of predefined ones is referred to as document tagging in this work. Accurate tagging of articles can benefit several downstream applications such as recommendation and search. In this work, we propose a novel yet simple approach called DocTag2Vec to accomplish this task. We substantially extend Word2Vec and Doc2Vec, two popular models for learning distributed representations of words and documents. In DocTag2Vec, we simultaneously learn the representations of words, documents, and tags in a joint vector space during training, and employ a simple k-nearest neighbor search to predict tags for unseen documents. In contrast to previous multi-label learning methods, DocTag2Vec deals directly with raw text instead of a provided feature vector and, in addition, enjoys advantages such as the learning of tag representations and the ability to handle newly created tags. To demonstrate the effectiveness of our approach, we conduct experiments on several datasets and show promising results against state-of-the-art methods.
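
The prediction step can be illustrated with a small sketch. It assumes that document and tag vectors already live in one joint space and that the nearest-neighbor search runs over tag vectors using cosine similarity. The function `predict_tags` and the toy vectors are hypothetical and are not taken from the paper.

```python
import numpy as np

def predict_tags(doc_vec, tag_vecs, tag_names, k=3):
    """Return the k tags whose embeddings are closest to the document embedding.

    Cosine similarity is an assumed choice of metric for this illustration.
    """
    doc = doc_vec / np.linalg.norm(doc_vec)
    tags = tag_vecs / np.linalg.norm(tag_vecs, axis=1, keepdims=True)
    sims = tags @ doc                 # cosine similarity to every tag
    top = np.argsort(-sims)[:k]       # indices of the k most similar tags
    return [tag_names[i] for i in top]

# Toy 2-D embeddings, made up for the example.
tag_names = ["sports", "politics", "tech"]
tag_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
doc_vec = np.array([0.9, 0.1])
predicted = predict_tags(doc_vec, tag_vecs, tag_names, k=2)
```

Because tags are ordinary vectors in the same space, a newly created tag can be scored by adding one row to `tag_vecs`, which is consistent with the advantage the abstract mentions.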

On the Complexity of Learning Neural Networks

The stunning empirical successes of neural networks currently lack rigorous theoretical explanation. What form would such an explanation take, in the face of existing complexity-theoretic lower bounds? A first step might be to show that data generated by neural networks with a single hidden layer, smooth activation functions and benign input distributions can be learned efficiently. We demonstrate here a comprehensive lower bound ruling out this possibility: for a wide class of activation functions (including all currently used), and inputs drawn from any logconcave distribution, there is a family of one-hidden-layer functions whose output is a sum gate, that are hard to learn in a precise sense: any statistical query algorithm (which includes all known variants of stochastic gradient descent with any loss function) needs an exponential number of queries even using tolerance inversely proportional to the input dimensionality. Moreover, this hard family of functions is realizable with a small (sublinear in dimension) number of activation units in the single hidden layer. The lower bound is also robust to small perturbations of the true weights. Systematic experiments illustrate a phase transition in the training error as predicted by the analysis.

Simplified Long Short-term Memory Recurrent Neural Networks: part I

We present five variants of the standard Long Short-term Memory (LSTM) recurrent neural networks by uniformly reducing blocks of adaptive parameters in the gating mechanisms. For simplicity, we refer to these models as LSTM1, LSTM2, LSTM3, LSTM4, and LSTM5, respectively. Such parameter-reduced variants speed up training computations and would be more suitable for implementation on constrained embedded platforms. We comparatively evaluate and verify our five variant models on the classical MNIST dataset and demonstrate that these variant models are comparable to a standard implementation of the LSTM model while using fewer parameters. Moreover, we observe that in some cases the standard LSTM’s accuracy drops after a number of epochs when using the ReLU nonlinearity; in contrast, LSTM3, LSTM4 and LSTM5 retain their performance.

Smooth backfitting of proportional hazards — A new approach projecting survival data

Smooth backfitting has proven to have a number of theoretical and practical advantages in structured regression. Smooth backfitting projects the data down onto the structured space of interest providing a direct link between data and estimator. This paper introduces the ideas of smooth backfitting to survival analysis in a proportional hazard model, where we assume an underlying conditional hazard with multiplicative components. We develop asymptotic theory for the estimator and we use the smooth backfitter in a practical application, where we extend recent advances of in-sample forecasting methodology by allowing more information to be incorporated, while still obeying the structured requirements of in-sample forecasting.

Simplified Long Short-term Memory Recurrent Neural Networks: part II

This is part II of a three-part work. Here, we present a second set of five inter-related variants of simplified Long Short-term Memory (LSTM) recurrent neural networks, obtained by further reducing adaptive parameters. Two of these models were introduced in part I of this work. We evaluate and verify our model variants on the benchmark MNIST dataset and assert that these models are comparable to the base LSTM model while using progressively fewer parameters. Moreover, we observe that when using the ReLU activation, the test accuracy of the standard LSTM drops after a number of epochs when the learning parameter becomes larger. However, all of the new model variants sustain their performance.

Simplified Long Short-term Memory Recurrent Neural Networks: part III

This is part III of a three-part work. In parts I and II, we presented eight variants of simplified Long Short Term Memory (LSTM) recurrent neural networks (RNNs). It is noted that fast computation, especially under constrained computing resources, is an important factor in processing big time-sequence data. In this part III paper, we present and evaluate two new LSTM model variants which dramatically reduce the computational load while retaining comparable performance to the base (standard) LSTM RNNs. In these new variants, we impose (Hadamard) pointwise state multiplications in the cell-memory network in addition to the gating signal networks.

Variational approach for learning Markov processes from time series data

Inference, prediction and control of complex dynamical systems from time series is important in many areas, including financial markets, power grid management, climate and weather modeling, or molecular dynamics. The analysis of such highly nonlinear dynamical systems is facilitated by the fact that we can often find a (generally nonlinear) transformation of the system coordinates to features in which the dynamics can be excellently approximated by a linear Markovian model. Moreover, the large number of system variables often change collectively on large time- and length-scales, facilitating a low-dimensional analysis in feature space. In this paper, we introduce a variational approach for Markov processes (VAMP) that allows us to find optimal feature mappings and optimal Markovian models of the dynamics from given time series data. The key insight is that the best linear model can be obtained from the top singular components of the Koopman operator. This leads to the definition of a family of score functions called VAMP-r which can be calculated from data, and can be employed to optimize a Markovian model. In addition, based on the relationship between the variational scores and approximation errors of Koopman operators, we propose a new VAMP-E score, which can be applied to cross-validation for hyper-parameter optimization and model selection in VAMP. VAMP is valid for both reversible and nonreversible processes and for stationary and non-stationary processes or realizations.
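
As a rough illustration of how such a score can be computed from data, the sketch below estimates instantaneous and time-lagged covariance matrices, whitens the cross-covariance, and sums the r-th powers of its singular values. The names `inv_sqrt`, `vamp_score`, `c00`, `c01`, and `c11` are introduced here for illustration. Conventions such as mean removal, regularization, and additive constants vary between implementations, so this should be read as one plausible form rather than the paper's exact definition.

```python
import numpy as np

def inv_sqrt(sym, eps=1e-10):
    """Inverse square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(sym)
    keep = vals > eps                      # drop near-singular directions
    return (vecs[:, keep] / np.sqrt(vals[keep])) @ vecs[:, keep].T

def vamp_score(x_t, x_tau, r=2):
    """Sum of r-th powers of singular values of the whitened cross-covariance.

    x_t and x_tau are (samples x features) arrays holding features at
    time t and at time t + tau (assumed layout for this sketch).
    """
    x0 = x_t - x_t.mean(axis=0)
    x1 = x_tau - x_tau.mean(axis=0)
    n = x0.shape[0]
    c00 = x0.T @ x0 / n                    # instantaneous covariance
    c01 = x0.T @ x1 / n                    # time-lagged cross-covariance
    c11 = x1.T @ x1 / n                    # covariance of lagged features
    k_bar = inv_sqrt(c00) @ c01 @ inv_sqrt(c11)
    sing = np.linalg.svd(k_bar, compute_uv=False)
    return float(np.sum(sing ** r))
```

In this form the singular values are canonical correlations between present and future features, so the score is bounded by the number of features and approaches zero when the two are unrelated.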

Ternary Residual Networks

Sub-8-bit representations of DNNs incur some noticeable loss of accuracy despite rigorous (re)training at low precision. Such loss of accuracy essentially makes them equivalent to a much shallower counterpart, diminishing the power of being deep networks. To address this problem of accuracy drop, we introduce the notion of residual networks, where we add more low-precision edges to sensitive branches of the sub-8-bit network to compensate for the lost accuracy. Further, we present a perturbation theory to identify such sensitive edges. Aided by such an elegant trade-off between accuracy and model size, the 8-2 architecture (8-bit activations, ternary weights), enhanced by residual ternary edges, turns out to be sophisticated enough to achieve accuracy similar to the 8-8 representation (~1% drop from our FP-32 baseline), despite a ~1.6× reduction in model size, a ~26× reduction in the number of multiplications, and potentially a ~2× inference speed-up compared to the 8-8 representation, on the state-of-the-art deep network ResNet-101 pre-trained on the ImageNet dataset. Moreover, depending on the varying accuracy requirements in a dynamic environment, the deployed low-precision model can be upgraded/downgraded on the fly by partially enabling/disabling residual connections. For example, when the least important residual connections in the above enhanced network are disabled, the accuracy drop is ~2% (from our FP-32 baseline), despite a ~1.9× reduction in model size, a ~32× reduction in the number of multiplications, and potentially a ~2.3× inference speed-up compared to the 8-8 representation. Finally, all the ternary connections are sparse in nature, and the residual ternary conversion can be done in a resource-constrained setting without any low-precision (re)training and without accessing the data.
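
The idea of compensating a ternary approximation with a second ternary term can be sketched on a single weight vector. This is a simplified illustration, not the paper's procedure: it applies the residual to every weight rather than to edges selected by the perturbation analysis, and the threshold factor of 0.7 is an assumed heuristic that does not come from the abstract. The function names are hypothetical.

```python
import numpy as np

def ternarize(w, threshold_factor=0.7):
    """Approximate w by alpha * t with t in {-1, 0, +1}.

    The threshold (a fraction of the mean magnitude) is an assumption;
    alpha is the least-squares scale for the chosen ternary pattern.
    """
    delta = threshold_factor * np.mean(np.abs(w))
    t = np.where(np.abs(w) > delta, np.sign(w), 0.0)
    support = t != 0
    alpha = float(np.abs(w[support]).mean()) if support.any() else 0.0
    return alpha, t

def ternary_with_residual(w):
    """Return the plain ternary approximation and the residual-enhanced one."""
    alpha1, t1 = ternarize(w)
    base = alpha1 * t1
    alpha2, t2 = ternarize(w - base)       # ternarize what the first pass missed
    return base, base + alpha2 * t2

rng = np.random.default_rng(0)
w = rng.normal(size=256)
base, refined = ternary_with_residual(w)
err_base = float(np.linalg.norm(w - base))
err_refined = float(np.linalg.norm(w - refined))
```

Since the second scale is a least-squares fit to the residual, `err_refined` cannot exceed `err_base`. Dropping the second term recovers the cheaper approximation, which mirrors the upgrade/downgrade behavior described above.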

RED: Reinforced Encoder-Decoder Networks for Action Anticipation

Action anticipation aims to detect an action before it happens. Many real-world applications in robotics and surveillance are related to this predictive capability. Current methods address this problem by first anticipating visual representations of future frames and then categorizing the anticipated representations into actions. However, in these methods anticipation is based on a single past frame’s representation, which ignores the history trend. Besides, they can only anticipate a fixed future time. We propose a Reinforced Encoder-Decoder (RED) network for action anticipation. RED takes multiple history representations as input and learns to anticipate a sequence of future representations. One salient aspect of RED is that a reinforcement module is adopted to provide sequence-level supervision; the reward function is designed to encourage the system to make correct predictions as early as possible. We test RED on TVSeries, THUMOS-14 and TV-Human-Interaction datasets for action anticipation and achieve state-of-the-art performance on all datasets.

Do Neural Nets Learn Statistical Laws behind Natural Language?

The performance of deep learning in natural language processing has been spectacular, but the reason for this success remains unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of its effectiveness and of a limitation of neural networks for language engineering. Precisely, we demonstrate that a Long Short-Term Memory (LSTM)-based neural language model effectively reproduces Zipf’s law and Heaps’ law, two representative statistical properties underlying natural language. We discuss the quality of the reproducibility and the emergence of Zipf’s law and Heaps’ law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical law of natural language. This understanding could provide a direction of improvement of architectures of neural networks.
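
Zipf's law says that word frequency falls off roughly as a power law of frequency rank, and Heaps' law says that vocabulary size grows roughly as a power law of text length. The sketch below computes the two raw curves one would inspect for either a real corpus or text sampled from a language model. The function names and the toy sentence are made up for illustration, and no fitting procedure from the paper is implied.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, frequency) pairs, most frequent first (Zipf curve)."""
    counts = Counter(tokens)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))

def vocabulary_growth(tokens):
    """Return the number of distinct tokens seen after each position (Heaps curve)."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth

tokens = "the cat sat on the mat and the dog sat".split()
zipf_curve = rank_frequency(tokens)
heaps_curve = vocabulary_growth(tokens)
```

Plotting both curves on log-log axes and checking for approximately straight lines is the usual first diagnostic; a corpus far larger than this toy sentence is needed for the laws to be visible.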

Listening while Speaking: Speech Chain by Deep Learning

Despite the close relationship between speech perception and production, research in automatic speech recognition (ASR) and text-to-speech synthesis (TTS) has progressed more or less independently without exerting much mutual influence on each other. In human communication, on the other hand, a closed-loop speech chain mechanism with auditory feedback from the speaker’s mouth to her ear is crucial. In this paper, we take a step further and develop a closed-loop speech chain model based on deep learning. The sequence-to-sequence model in a closed-loop architecture allows us to train our model on the concatenation of both labeled and unlabeled data. While ASR transcribes the unlabeled speech features, TTS attempts to reconstruct the original speech waveform based on the text from ASR. In the opposite direction, ASR also attempts to reconstruct the original text transcription given the synthesized speech. To the best of our knowledge, this is the first deep learning model that integrates human speech perception and production behaviors. Our experimental results show that the proposed approach significantly improves performance over separate systems that were trained only with labeled data.

Comparison of Multiple Features and Modeling Methods for Text-dependent Speaker Verification
The Reversible Residual Network: Backpropagation Without Storing Activations
On Minimax Optimality of Sparse Bayes Predictive Density Estimates
f-GANs in an Information Geometric Nutshell
Symbolic Stochastic Chase Decoding of Reed-Solomon and BCH Codes
A new look at the inverse Gaussian distribution
Comprehensive Analysis on Exact Asymptotics of Random Coding Error Probability
Lenient Multi-Agent Deep Reinforcement Learning
Partial Identification of Nonseparable Models using Binary Instruments
Inner-Scene Similarities as a Contextual Cue for Object Detection
Developing a concept-level knowledge base for sentiment analysis in Singlish
Isoperimetry in integer lattices
Evaluating Semantic Parsing against a Simple Web-based Question Answering Model
The Mutual information of LDGM codes
Approximating the Nash Social Welfare with Budget-Additive Valuations
A General-Purpose Implementation of Conceptual Spaces
Estimating space-time trend and dependence of heavy rainfall
Monocular Visual Odometry for an Unmanned Sea-Surface Vehicle
Optimal Asynchronous Rendezvous for Mobile Robots with Lights
A note on 2–bisections of claw–free cubic graphs
Modeling Harmony with Skip-Grams
The spectra of lifted digraphs
Computing the number of induced copies of a fixed graph in a bounded degree graph
Hierarchical EM algorithm for estimating the parameters of Mixture of Bivariate Generalized Exponential distributions
Covariate adjustment and prediction of mean response in randomised trials
Ordered and disordered states of 3He-A in aerogel
Nonparametric estimation of locally stationary Hawkes processes
Minimum Coprime Labelings for Operations on Graphs
Efron’s monotonicity property for measures on $\mathbb{R}^2$
Big Data vs. complex physical models: a scalable inference algorithm
Layout of random circulant graphs
GLSR-VAE: Geodesic Latent Space Regularization for Variational AutoEncoder Architectures
LIUM-CVC Submissions for WMT17 Multimodal Translation Task
On an Exact and Nonparametric Test for the Separability of Two Classes by Means of a Simple Threshold
Guiding InfoGAN with Semi-Supervision
Freeway Merging in Congested Traffic based on Multipolicy Decision Making with Passive Actor Critic
A Comparative Study of Unipolar OFDM Schemes in Gaussian Optical Intensity Channel
Square Deviation Based Symbol-Level Selection for Virtual Full-Duplex Relaying Networks
LIUM Machine Translation Systems for WMT17 News Translation Task
Competitive Algorithms for Generalized k-Server in Uniform Metrics
Energy-Efficient Power Allocation in Millimeter Wave Massive MIMO with Non-Orthogonal Multiple Access
Performance bounds for optimal feedback control in networks
A Convex Reconstruction Model for X-ray Tomographic Imaging with Uncertain Flat-fields
Cross-genre Document Retrieval: Matching between Conversational and Formal Writings
Coloring cross-intersecting families
Linguistic Markers of Influence in Informal Interactions
CUNI System for the WMT17 Multimodal Translation Task
$G$-Tutte polynomials and abelian Lie group arrangements
Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding
Interference-Aided Energy Harvesting: Cognitive Relaying with Multiple Primary Transceivers
Game Theory for Secure Critical Interdependent Gas-Power-Water Infrastructure
Factoring the Cycle Aging Cost of Batteries Participating in Electricity Markets
Toric h-vectors and Chow Betti Numbers of Dual Hypersimplices
Capturing the diversity of biological tuning curves using generative adversarial networks
Multifractal Study of Quasiparticle Localization in Disordered Superconductors
Nonlinear Programming Methods for Distributed Optimization
An Efficient and Distribution-Free Two-Sample Test Based on Energy Statistics and Random Projections
Fine-grained reductions from approximate counting to decision
Group sequential designs for negative binomial outcomes
Intertwining wavelets or Multiresolution analysis on graphs through random forests
Communication Lower Bounds of Bilinear Algorithms for Symmetric Tensor Contractions
Periodically stationary multivariate non-Gaussian autoregressive models
Predicting multicellular function through multi-layer tissue networks
Assessing Retail Employee Risk Through Unsupervised Learning Techniques
Recognizing Abnormal Heart Sounds Using Deep Learning
EmojiNet: An Open Service and API for Emoji Sense Discovery
A Semantics-Based Measure of Emoji Similarity
Automated Proofs of Many Conjectured Recurrences in the OEIS made by R.J. Mathar
Rotations and Interpretability of Word Embeddings: the Case of the Russian Language
Mixing inequalities in Riesz spaces
Applications of gradient descent method to magnetic Skyrmion problems
The causal impact of bail on case outcomes for indigent defendants
Learning linear structural equation models in polynomial time and sample complexity
Knowledge-Guided Recurrent Neural Network Learning for Task-Oriented Action Prediction
Lyrics-Based Music Genre Classification Using a Hierarchical Attention Network
Early MFCC And HPCP Fusion for Robust Cover Song Identification
$Θ_S$-cyclic codes over $A_k$
Rethinking Reprojection: Closing the Loop for Pose-aware Shape Reconstruction from a Single Image
Lower Bounds for Planar Electrical Reduction
On the Performance of Forecasting Models in the Presence of Input Uncertainty
Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
More on the Annihilator-Ideal Graph of a Commutative Ring
Analysis of Type-II hybrid censored competing risks data
Order Restricted Bayesian Analysis of a Simple Step Stress Model
Statistics of spatial averages and optimal sampling
Memoisation: Purely, Left-recursively, and with (Continuation Passing) Style
Finite-Horizon Covariance Control of Linear Time-Varying Systems
Reconstructing random jigsaws
Finding Fair and Efficient Allocations
Validating Wordscores
Electronic structure and X-ray spectroscopy of Cu$_{2}$MnAl$_{1-x}$Ga$_{x}$
Clustering Algorithms for the Centralized and Local Models
Robust optimal component design under consideration of local material defects
Almost Envy-Freeness with General Valuations
Original Loop-closure Detection Algorithm for Monocular vSLAM
AI Challenges in Human-Robot Cognitive Teaming
Semiflow selection and Markov selection theorems
Evolutionary Training of Sparse Artificial Neural Networks: A Network Science Perspective
Modified Alpha-Rooting Color Image Enhancement Method On The Two-Side 2-D Quaternion Discrete Fourier Transform And The 2-D Discrete Fourier Transform
New Classes of Ternary Bent Functions from the Coulter-Matthews Bent Functions
Comparing mixing times on sparse random graphs
MPIgnite: An MPI-Like Language and Prototype Implementation for Apache Spark
Non-Asymptotic Analysis of Robust Control from Coarse-Grained Identification
A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes
Foundations of Finite-, Super-, and Infinite-Population Random Graph Inference
Conditional Independence, Conditional Mean Independence, and Zero Conditional Covariance
The spectral radius of graphs without long cycles
Odd induced subgraphs in graphs with treewidth at most two
Open-Set Language Identification
Normalized Gradient with Adaptive Stepsize Method for Deep Neural Network Training
Indirect excitation of self-oscillation in perpendicular ferromagnet by spin Hall effect
FML-based Dynamic Assessment Agent for Human-Machine Cooperative System on Game of Go
Process Migration over CCNx
Asymptotic Analysis of Expectations of Plane Partition Statistics
Minimax deviation strategies for machine learning and recognition with short learning samples
Moderate Deviation Asymptotics for Variable-Length Codes with Feedback
Overcoming Catastrophic Interference by Conceptors
A fractal perspective on optimal antichains and intersecting subsets of the unit $n$-cube
On Approximating the Number of $k$-cliques in Sublinear Time
Constructions of Optimal and Near-Optimal Quasi-Complementary Sequence Sets from an Almost Difference Set
Automated Detection of Non-Relevant Posts on the Russian Imageboard ‘2ch’: Importance of the Choice of Word Representations
Metabolic plasticity in synthetic lethal mutants: viability at higher cost
Uncertainty principles and optimally sparse wavelet transforms
Testing bounded arboricity
Near Optimal Sized Weight Tolerant Subgraph for Single Source Shortest Path
Reinforcement Learning for Architecture Search by Network Transformation
Regularity of Powers of edge ideal of very well-covered graphs
Coding with asymmetric prior knowledge
Rigged configuration bijection and proof of the $X=M$ conjecture for nonexceptional affine types
Optical Music Recognition with Convolutional Sequence-to-Sequence Models
Bayesian nonparametric spectral density estimation using B-spline priors
Convergence analysis of Adaptive Biasing Potential methods for diffusion processes
Generative Adversarial Network based on Resnet for Conditional Image Restoration
Derivative formulas and applications for degenerate SDEs with fractional noises
Intermittency for the stochastic heat equation with Lévy noise
Partial Domination in Graphs
Tunnel Effects in Cognition: A new Mechanism for Scientific Discovery and Education
Chinese Typography Transfer