Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos

Deep learning has been demonstrated to achieve excellent results for image classification and object detection. However, the impact of deep learning on video analysis (e.g. action detection and recognition) has been limited due to the complexity of video data and the lack of annotations. Previous convolutional neural network (CNN) based video action detection approaches usually consist of two major steps: frame-level action proposal detection and association of proposals across frames. Also, these methods employ a two-stream CNN framework to handle spatial and temporal features separately. In this paper, we propose an end-to-end deep network called Tube Convolutional Neural Network (T-CNN) for action detection in videos. The proposed architecture is a unified network that is able to recognize and localize actions based on 3D convolution features. A video is first divided into equal-length clips, and for each clip a set of tube proposals is generated based on 3D Convolutional Network (ConvNet) features. Finally, the tube proposals of different clips are linked together using network flow, and spatio-temporal action detection is performed using these linked video proposals. Extensive experiments on several video datasets demonstrate the superior performance of T-CNN for classifying and localizing actions in both trimmed and untrimmed videos compared to the state of the art.
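
The first step of the pipeline, dividing a video into equal-length clips, can be sketched as follows. This is only an illustration: the function name `split_into_clips`, the clip length used in the example, and the choice to drop trailing frames that do not fill a whole clip are assumptions, not details given in the abstract.

```python
def split_into_clips(num_frames, clip_len):
    """Return (start, end) frame index pairs for consecutive equal-length clips.

    Assumption: frames left over after the last full clip are dropped.
    The abstract does not say how the remainder is handled.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [
        (start, start + clip_len)
        for start in range(0, num_frames - clip_len + 1, clip_len)
    ]


# Hypothetical example: a 20-frame video cut into 8-frame clips.
clips = split_into_clips(20, 8)
```

Each resulting clip would then be passed to the 3D ConvNet to produce tube proposals, which are linked across clips in the final step.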


Factorization tricks for LSTM networks

We present two simple ways of reducing the number of parameters and accelerating the training of large Long Short-Term Memory (LSTM) networks: the first one is ‘matrix factorization by design’ of the LSTM matrix into the product of two smaller matrices, and the second one is partitioning of the LSTM matrix, its inputs and states into independent groups. Both approaches allow us to train large LSTM networks significantly faster to state-of-the-art perplexity. On the One Billion Word Benchmark we improve single model perplexity down to 24.29.
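
To see why both tricks reduce the parameter count, the following sketch counts the weights of the single matrix that maps the concatenated input and hidden state to the four LSTM gates. It is a back-of-the-envelope illustration under stated assumptions: biases are ignored, and the names `n` (hidden size), `p` (input size), `r` (inner rank) and `k` (number of groups) are chosen here for illustration rather than taken from the abstract.

```python
def full_params(n, p):
    """Weights of the unfactorized LSTM matrix: (p + n) inputs -> 4n gate outputs."""
    return (p + n) * 4 * n


def factorized_params(n, p, r):
    """Weights when the matrix is the product of a (p + n) x r and an r x 4n matrix."""
    return (p + n) * r + r * 4 * n


def grouped_params(n, p, k):
    """Weights when inputs, states and matrix are split into k independent groups."""
    if n % k != 0 or p % k != 0:
        raise ValueError("n and p must be divisible by k")
    return k * ((p + n) // k) * (4 * n // k)


# Hypothetical sizes, not the configuration used in the paper.
n = p = 1024
print(full_params(n, p))             # 8388608
print(factorized_params(n, p, 256))  # 1572864
print(grouped_params(n, p, 4))       # 2097152
```

With these sizes the factorized matrix is smaller whenever `r` is well below `n`, and grouping divides the count by `k`.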


N-gram Language Modeling using Recurrent Neural Network Estimation

We investigate the effective memory depth of RNN models by using them for n-gram language model (LM) smoothing. Experiments on a small corpus (UPenn Treebank, one million words of training data and 10k vocabulary) have found the LSTM cell with dropout to be the best model for encoding the n-gram state when compared with feed-forward and vanilla RNN models. When preserving the sentence independence assumption, the LSTM n-gram matches the LSTM LM performance for n=9 and slightly outperforms it for n=13. When allowing dependencies across sentence boundaries, the LSTM 13-gram almost matches the perplexity of the unlimited history LSTM LM. LSTM n-gram smoothing also has the desirable property of improving with increasing n-gram order, unlike the Katz or Kneser-Ney back-off estimators. Using multinomial distributions as targets in training instead of the usual one-hot target is only slightly beneficial for low n-gram orders. Experiments on the One Billion Words benchmark show that the results hold at larger scale. Building LSTM n-gram LMs may be appealing for some practical situations: the state in an n-gram LM can be succinctly represented with (n-1)*4 bytes storing the identity of the words in the context, and batches of n-gram contexts can be processed in parallel. On the downside, the n-gram context encoding computed by the LSTM is discarded, making the model more expensive than a regular recurrent LSTM LM.
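
The (n-1)*4 byte figure follows from storing each of the n-1 context words as a 4-byte integer id. A minimal sketch of such a state encoding is shown below; the function names, the little-endian unsigned layout, and the example word ids are assumptions made for illustration.

```python
import struct


def encode_context(word_ids, n):
    """Pack the last n-1 word ids into (n-1)*4 bytes (unsigned 32-bit each)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    context = word_ids[-(n - 1):]
    if len(context) != n - 1:
        raise ValueError("need at least n-1 word ids")
    return struct.pack(f"<{n - 1}I", *context)


def decode_context(state):
    """Recover the list of word ids from a packed n-gram state."""
    count = len(state) // 4
    return list(struct.unpack(f"<{count}I", state))


# Hypothetical word ids for a 4-gram model: the state keeps the last 3 words.
state = encode_context([5, 17, 42, 7], n=4)
print(len(state))             # 12
print(decode_context(state))  # [17, 42, 7]
```

Because the state is just a fixed-size byte string, many contexts can be stacked into a batch and scored independently.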


Sentence Simplification with Deep Reinforcement Learning

Sentence simplification aims to make sentences easier to read and understand. Most recent approaches draw on insights from machine translation to learn simplification rewrites from monolingual corpora of complex and simple sentences. We address the simplification problem with an encoder-decoder model coupled with a deep reinforcement learning framework. Our model explores the space of possible simplifications while learning to optimize a reward function that encourages outputs which are simple, fluent, and preserve the meaning of the input. Experiments on three datasets demonstrate that our model brings significant improvements over the state of the art.


The Risk of Machine Learning

Many applied settings in empirical economics involve simultaneous estimation of a large number of parameters. In particular, applied economists are often interested in estimating the effects of many-valued treatments (like teacher effects or location effects), treatment effects for many groups, and prediction models with many regressors. In these settings, machine learning methods that combine regularized estimation and data-driven choices of regularization parameters are useful to avoid over-fitting. In this article, we analyze the performance of a class of machine learning estimators that includes ridge, lasso and pretest in contexts that require simultaneous estimation of many parameters. Our analysis aims to provide guidance to applied researchers on (i) the choice between regularized estimators in practice and (ii) data-driven selection of regularization parameters. To address (i), we characterize the risk (mean squared error) of regularized estimators and derive their relative performance as a function of simple features of the data generating process. To address (ii), we show that data-driven choices of regularization parameters, based on Stein’s unbiased risk estimate or on cross-validation, yield estimators with risk uniformly close to the risk attained under the optimal (unfeasible) choice of regularization parameters. We use data from recent examples in the empirical economics literature to illustrate the practical applicability of our results.
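
To make the three estimators concrete, the sketch below writes ridge, lasso and pretest as componentwise shrinkage rules applied to noisy estimates `x` with unit noise variance, and picks the ridge penalty by minimizing Stein's unbiased risk estimate over a grid. The unit-variance setting, the function names, the example data and the grid are assumptions made for illustration; the abstract does not spell out these formulas.

```python
import math


def ridge(x, lam):
    """Ridge: shrink proportionally toward zero."""
    return x / (1.0 + lam)


def lasso(x, lam):
    """Lasso: soft-thresholding, shrink by lam and clip at zero."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def pretest(x, lam):
    """Pretest: hard-thresholding, keep x only if |x| exceeds lam."""
    return x if abs(x) > lam else 0.0


def sure_ridge(xs, lam):
    """Stein's unbiased risk estimate for ridge, assuming unit noise variance.

    SURE = mean((m(x) - x)^2) + 2 * mean(m'(x)) - 1, with m'(x) = 1 / (1 + lam).
    """
    fit = sum((ridge(x, lam) - x) ** 2 for x in xs) / len(xs)
    return fit + 2.0 / (1.0 + lam) - 1.0


def choose_ridge_lambda(xs, grid):
    """Pick the penalty on the grid with the smallest estimated risk."""
    return min(grid, key=lambda lam: sure_ridge(xs, lam))


# Hypothetical data: noisy estimates of many parameters.
xs = [2.0, -2.0, 0.3, -0.1, 1.2]
best = choose_ridge_lambda(xs, [0.0, 0.5, 1.0, 2.0, 4.0])
```

The same grid search could be run with cross-validation in place of SURE, which is the other data-driven choice the abstract mentions.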


Incoherence-Mediated Remote Synchronization

Diving into the shallows: a computational perspective on large-scale shallow learning

Level compressibility for the Anderson model on regular random graphs and the absence of non-ergodic extended eigenfunctions

Study on Resource Efficiency of Distributed Graph Processing

Interpretable Learning for Self-Driving Cars by Visualizing Causal Attention

Light spanners for bounded treewidth graphs imply light spanners for $H$-minor-free graphs

Muirhead inequality for convex orders and a problem of I. Raşa on Bernstein polynomials

Convergence of a Scholtes-type Regularization Method for Cardinality-Constrained Optimization Problems with an Application in Sparse Robust Portfolio Optimization

Millimeter Wave communication with out-of-band information

Deep Neural Network Optimized to Resistive Memory with Nonlinear Current-Voltage Characteristics

Screening length in compensation-doped semiconductors and topological insulators

Relevance Subject Machine: A Novel Person Re-identification Framework

What-If Reasoning with Counterfactual Gaussian Processes

Most trees are short and fat

Towards a Visual Privacy Advisor: Understanding and Predicting Privacy Risks in Images

Near Perfect Protein Multi-Label Classification with Deep Neural Networks

Interference Exploitation in Full Duplex Communications: Trading Interference Power for Both Uplink and Downlink Power Savings

TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition

Learning and Trust in Auction Markets

Simple and Efficient Budget Feasible Mechanisms for Monotone Submodular Valuations

Computing Equilibrium in Matching Markets

How to Scale Up the Spectral Efficiency of Multi-way Massive MIMO Relaying?

Neutral evolution and turnover over centuries of English word popularity

Concurrent Segmentation and Localization for Tracking of Surgical Instruments

The excess degree of a polytope

Scaling, Proximity, and Optimization of Integrally Convex Functions

Deep 3D Face Identification

A Simple Point Estimator of the Power of Moments

BEGAN: Boundary Equilibrium Generative Adversarial Networks

Quasi-invariant Gaussian measures for the two-dimensional defocusing cubic nonlinear wave equation

A Euclidean Ramsey result in the plane

Deep Domain Adaptation Based Video Smoke Detection using Synthetic Smoke Images

Unsupervised Holistic Image Generation from Key Local Patches

An analysis of budgeted parallel search on conditional Galton-Watson trees

Advanced Quantizer Designs for FDD-based FD-MIMO Systems Using Uniform Planar Arrays

Fundamental Conditions for Low-CP-Rank Tensor Completion

Minimum degree conditions for small percolating sets in bootstrap percolation

Time-triggering versus event-triggering control over communication channels

Sufficient conditions for the value function and optimal strategy to be even and quasi-convex

Numerical Synthesis of Pontryagin Optimal Control Minimizers Using Sampling-Based Methods

On Self-Adaptive Mutation Restarts for Evolutionary Robotics with Real Rotorcraft

Novel Framework for Spectral Clustering using Topological Node Features (TNF)

Diabetic Retinopathy Detection via Deep Convolutional Networks for Discriminative Localization and Visual Explanation

On cyclic codes of composite length and the minimal distance

Universal Scalable Robust Solvers from Computational Information Games and fast eigenspace adapted Multiresolution Analysis

A Hybrid Data Association Framework for Robust Online Multi-Object Tracking

Phase transition for the Maki-Thompson rumour model on a small-world network

Cooperative Robust Output Regulation Problem for Discrete-Time Linear Time-Delay Multi-Agent Systems

Joining Hands: Exploiting Monolingual Treebanks for Parsing of Code-mixing Data

Invariant Measure for Quantum Trajectories

Measurement Results for Millimeter Wave pure LOS MIMO Channels

A note on the generalized heat content for Lévy processes

Tunneling estimates and approximate controllability for hypoelliptic equations

Semantic-driven Generation of Hyperlapse from $360^\circ$ Video

A generalization of Tanaka’s formula

Sparse Control of Kinetic Cooperative Systems to Approximate Alignment

Probabilistic Mid- and Long-Term Electricity Price Forecasting

Probabilistic properties of the elliptic motion

End-To-End Face Detection and Recognition

A note on graphs with disjoint cliques and a link with evasiveness

A Note on the Polytope of Bipartite TSP

Intraoperative margin assessment of human breast tissue in optical coherence tomography images using deep neural networks

Treewidth distance on phylogenetic trees

A computational algebraic geometry approach to classify partial Latin rectangles

MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation using 1D and 2D Conditions

(DE)^2 CO: Deep Depth Colorization

Bi-class classification of humpback whale sound units against complex background noise with Deep Convolution Neural Network

Single Image Super Resolution – When Model Adaptation Matters

Study of cost functionals for ptychographic phase retrieval to improve the robustness against noise, and a proposal for another noise-robust ptychographic phase retrieval scheme

BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth

Random Multi-Unit Assignment with Endogenous Quotas

Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos

Unsupervised learning from video to detect foreground objects in single images

Fast Predictive Multimodal Image Registration

Quicksilver: Fast Predictive Image Registration – a Deep Learning Approach

EMULATOR vs REAL PHONE: Android Malware Detection Using Machine Learning

Feature functional theory – binding predictor (FFT-BP) for the blind prediction of binding free energies

Unifying Message Passing Algorithms Under the Framework of Constrained Bethe Free Energy Minimization

Prediction of infectious disease epidemics via weighted density ensembles

Consistent estimation in Cox proportional hazards model with measurement errors and unbounded parameter set

Comparison of multi-task convolutional neural network (MT-CNN) and a few other methods for toxicity prediction

Geometric Symmetric Chain Decompositions

InverseFaceNet: Deep Single-Shot Inverse Face Rendering From A Single Image

Parallelism, Concurrency and Distribution in Constraint Handling Rules: A Survey (Draft)

Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders

Random Delta-attractors

On hypergraphs without loose cycles

Modified Interior-Point Method for Large-and-Sparse Low-Rank Semidefinite Programs

Optimal Robust Precoders for Tracking the AoD and AoA of a mm-Wave Path

Architecture of processing and analysis system for big astronomical data

Catalyst Acceleration for Gradient-Based Non-Convex Optimization

All Cognitive MIMO: A New Multiuser Detection Approach with Different Priorities

Learning Visual Servoing with Deep Features and Fitted Q-Iteration

A simplicial complex model of dynamic epistemic logic for fault-tolerant distributed computing

Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
