What's new on arXiv

Optimal Estimating Equation for Logistic Regression with Linked Data

We propose an optimal estimating equation for logistic regression with linked data while accounting for false positives. It builds on a previous solution but, in large samples, estimates the regression coefficients with smaller variance.

Multiscale Residual Mixture of PCA: Dynamic Dictionaries for Optimal Basis Learning

In this paper we are interested in the problem of learning an over-complete basis, together with a methodology such that the reconstruction or inverse problem does not require optimization. We analyze the optimality of the presented approaches and their links to popular, already known techniques such as Artificial Neural Networks, k-means, and Oja's learning rule. We then see that one approach to reaching the optimal dictionary is factorial and hierarchical; the derived approach leads to the formulation of a Deep Oja Network. We present results on different tasks, along with the resulting, very efficient learning algorithm, which brings a new vision of the training of deep nets. Finally, the theoretical work shows that deep frameworks are one way to efficiently obtain an over-complete (combinatorially large) dictionary while still allowing easy reconstruction. We thus present the Deep Residual Oja Network (DRON) and demonstrate that a recursive deep approach working on the residuals allows an exponential decrease of the error with respect to the depth.

Encoding Word Confusion Networks with Recurrent Neural Networks for Dialog State Tracking

This paper presents our novel method to encode word confusion networks, which can represent a rich hypothesis space of automatic speech recognition systems, via recurrent neural networks. We demonstrate the utility of our approach for the task of dialog state tracking in spoken dialog systems that relies on automatic speech recognition output. Encoding confusion networks outperforms encoding the best hypothesis of the automatic speech recognition in a neural system for dialog state tracking on the well-known second Dialog State Tracking Challenge dataset.

Equivalence between LINE and Matrix Factorization

LINE [1], as an efficient network embedding method, has shown its effectiveness in dealing with large-scale undirected, directed, and/or weighted networks. In particular, it proposes to preserve both the local structure (represented by First-order Proximity) and the global structure (represented by Second-order Proximity) of the network. In this study, we prove that LINE with these two proximities (LINE(1st) and LINE(2nd)) is in fact implicitly factorizing two different matrices. Specifically, LINE(1st) factorizes a matrix M^(1) whose entries are the doubled Pointwise Mutual Information (PMI) of vertex pairs in undirected networks, shifted by a constant. LINE(2nd) factorizes a matrix M^(2) whose entries are the PMI of vertex and context pairs in directed networks, shifted by a constant. We hope this finding will provide a basis for further extensions and generalizations of LINE.
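To make the LINE(1st) result concrete, the matrix being factorized can be built directly from a small graph: each entry is twice the PMI of an adjacent vertex pair, shifted by log b for b negative samples. A toy pure-Python sketch (the exact normalization used in the paper may differ; the edge-sampling probabilities below are an assumption for illustration):

```python
import math

# Toy undirected weighted graph as an edge list: (u, v, weight).
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 1.0)]
n = 3
b = 1.0  # number of negative samples; the shift constant is log(b)

# Weighted degree of each vertex and total edge weight.
deg = [0.0] * n
W = 0.0
for u, v, w in edges:
    deg[u] += w
    deg[v] += w
    W += w

def pmi(u, v, w):
    # PMI of a vertex pair under an edge-sampling distribution:
    # p(u, v) = w / W,  p(u) = deg[u] / (2 W)  (each edge has two endpoints).
    return math.log((w / W) / ((deg[u] / (2 * W)) * (deg[v] / (2 * W))))

# Entries of M(1): doubled PMI, shifted by log(b); symmetric for undirected graphs.
M1 = [[0.0] * n for _ in range(n)]
for u, v, w in edges:
    val = 2 * pmi(u, v, w) - math.log(b)
    M1[u][v] = val
    M1[v][u] = val
```

Factorizing `M1` (e.g. by truncated SVD) would then yield low-dimensional vertex embeddings analogous to LINE(1st).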

Deep Active Learning for Named Entity Recognition

Deep neural networks have advanced the state of the art in named entity recognition. However, under typical training procedures, advantages over classical methods emerge only with large datasets. As a result, deep learning is employed only when large public datasets or a large budget for manually labeling data is available. In this work, we show that by combining deep learning with active learning, we can outperform classical methods even with a significantly smaller amount of training data.

Session-aware Information Embedding for E-commerce Product Recommendation

Most existing recommender systems assume that a user's visiting history can be constantly recorded. However, in recent online services, the user's identity is often unknown and only limited online user behaviors can be used. It is therefore of great importance to model temporal online user behaviors and to make recommendations for anonymous users. In this paper, we propose a list-wise deep neural network based architecture to model the limited user behaviors within each session. To train the model efficiently, we first design a session embedding method to pre-train a session representation, which incorporates different kinds of user search behaviors such as clicks and views. Based on the learnt session representation, we further propose a list-wise ranking model to generate the recommendation result for each anonymous user session. We conduct quantitative experiments on a recently published dataset from an e-commerce company. The evaluation results validate the effectiveness of the proposed method, which significantly outperforms the state of the art.

When Unsupervised Domain Adaptation Meets Tensor Representations

Domain adaptation (DA) allows machine learning methods trained on data sampled from one distribution to be applied to data sampled from another. It is thus of great practical importance to the application of such methods. Despite the fact that tensor representations are widely used in Computer Vision to capture multi-linear relationships that affect the data, most existing DA methods are applicable to vectors only. This renders them incapable of reflecting and preserving important structure in many problems. We thus propose here a learning-based method to adapt the source and target tensor representations directly, without vectorization. In particular, a set of alignment matrices is introduced to align the tensor representations from both domains into an invariant tensor subspace. These alignment matrices and the tensor subspace are modeled as a joint optimization problem and can be learned adaptively from the data using the proposed alternating minimization scheme. Extensive experiments show that our approach is capable of preserving the discriminative power of the source domain, of resisting the effects of label noise, and of working effectively for small sample sizes, even in one-shot DA. We show that our method outperforms the state of the art on the task of cross-domain visual recognition in both efficacy and efficiency, and in particular that it outperforms all comparators when applied to DA of the convolutional activations of deep convolutional networks.

Improving Language Modeling using Densely Connected Recurrent Neural Networks

In this paper, we introduce the novel concept of densely connected layers into recurrent neural networks. We evaluate our proposed architecture on the Penn Treebank language modeling task. We show that we can obtain similar perplexity scores with six times fewer parameters compared to a standard stacked 2-layer LSTM model trained with dropout (Zaremba et al. 2014). In contrast with the current usage of skip connections, we show that densely connecting only a few stacked layers with skip connections already yields significant perplexity reductions.

Naive Bayes Classification for Subset Selection

This article focuses on the question of learning how to automatically select a subset of items from a larger set. We introduce a methodology for the inference of ensembles of discrete values, based on the Naive Bayes assumption. Our motivation stems from practical use cases where one wishes to predict an unordered set of (possibly interdependent) values from a set of observed features. This problem can be considered in the context of Multi-label Classification (MLC), where such values are seen as labels associated with continuous or discrete features. We introduce the \nbx algorithm, an extension of Naive Bayes classification into the multi-label domain, discuss its properties, and evaluate our approach on real-world problems.
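The abstract does not spell out the \nbx algorithm itself, but the standard baseline for extending Naive Bayes to multi-label prediction is binary relevance: one independent Bernoulli Naive Bayes classifier per label, with the predicted set containing every label whose "present" posterior wins. A minimal pure-Python sketch of that baseline (the toy data and Laplace smoothing constants are assumptions, not the paper's method):

```python
import math

# Toy data: binary feature vectors and *sets* of labels.
X = [(1, 0), (1, 1), (0, 1), (0, 0)]
Y = [{"a"}, {"a", "b"}, {"b"}, set()]
labels = {"a", "b"}

def train_per_label(X, Y, label):
    # Binary relevance: fit one Bernoulli NB per label (present vs absent).
    pos = [x for x, y in zip(X, Y) if label in y]
    neg = [x for x, y in zip(X, Y) if label not in y]
    def stats(rows):
        n = len(rows)
        # Laplace-smoothed P(feature_j = 1 | class)
        p = [(sum(r[j] for r in rows) + 1) / (n + 2) for j in range(len(X[0]))]
        return n, p
    return stats(pos), stats(neg)

def predict(models, x):
    out = set()
    total = len(X)
    for label, ((npos, ppos), (nneg, pneg)) in models.items():
        def loglik(n, p):
            ll = math.log((n + 1) / (total + 2))  # smoothed class prior
            for xj, pj in zip(x, p):
                ll += math.log(pj if xj else 1 - pj)
            return ll
        if loglik(npos, ppos) > loglik(nneg, pneg):
            out.add(label)
    return out

models = {lab: train_per_label(X, Y, lab) for lab in labels}
```

On this toy data, `predict(models, (1, 0))` returns `{"a"}`: feature 0 co-occurs with label "a", feature 1 with label "b".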

Can GAN Learn Topological Features of a Graph?

This paper presents first-line research expanding GANs into graph topology analysis. By leveraging the hierarchical connectivity structure of a graph, we demonstrate that generative adversarial networks (GANs) can successfully capture the topological features of an arbitrary graph and rank edge sets by different stages according to their contribution to topology reconstruction. Moreover, in addition to acting as an indicator of graph reconstruction, we find that these stages can also preserve important topological features of the graph.

Imagination-Augmented Agents for Deep Reinforcement Learning

We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects. In contrast to most existing model-based reinforcement learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to interpret predictions from a learned environment model to construct implicit plans in arbitrary ways, by using the predictions as additional context in deep policy networks. I2As show improved data efficiency, performance, and robustness to model misspecification compared to several baselines.

Low-complexity implementation of convex optimization-based phase retrieval
Improving Gibbs Sampler Scan Quality with DoGS
A Note on Unconditional Subexponential-time Pseudo-deterministic Algorithms for BPP Search Problems
A Novel Deep Learning Architecture for Testis Histology Image Classification
Beyond Consensus and Synchrony in Online Network Optimization via Saddle Point Method
Discovering Class-Specific Pixels for Weakly-Supervised Semantic Segmentation
A deep learning approach to diabetic blood glucose prediction
Skew ribbon plane partitions: calculus and asymptotics
Bagged Empirical Null p-values: A Method to Account for Model Uncertainty in Large Scale Inference
Subgroup Balancing Propensity Score
Hamiltonicity of token graphs of fan graphs
Linear Time Complexity Deep Fourier Scattering Network and Extension to Nonlinear Invariants
On the Computation of Neumann Series
The Devil is in the Decoder
A Short Survey of Biomedical Relation Extraction Techniques
Cooperative Estimation via Altruism
On the Robustness and Asymptotic Properties for Maximum Likelihood Estimators of Parameters in Exponential Power and its Scale Mixture Form Distributions
Logic Programming approaches for routing fault-free and maximally-parallel Wavelength Routed Optical Networks on Chip (Application paper)
On Adaptive Propensity Score Truncation in Causal Inference
Asymptotically Optimal Load Balancing Topologies
Reconciling Graphs and Sets of Sets
The Benefit of Encoder Cooperation in the Presence of State Information
Physics-guided probabilistic modeling of extreme precipitation under climate change
The Value of Information Concealment
On-line Building Energy Optimization using Deep Reinforcement Learning
Risk ratios for contagious outcomes
Monochromatic Subgraphs in Randomly Colored Graphons
MIT SuperCloud Portal Workspace: Enabling HPC Web Application Deployment
Hybrid Conditional Planning using Answer Set Programming
Secure SURF with Fully Homomorphic Encryption
Recovering Latent Signals from a Mixture of Measurements using a Gaussian Process Prior
Recognizing and Curating Photo Albums via Event-Specific Image Importance
Improving the capacity of molecular communication using enzymatic reaction cycles
Uplink Spectral Efficiency Analysis of Multi-Cell Multi-User Massive MIMO over Correlated Ricean Channel
Multiple Imputation of Missing Values in Household Data with Structural Zeros
Optimization-based Quantification of Simulation Input Uncertainty via Empirical Likelihood
Improving Output Uncertainty Estimation and Generalization in Deep Learning via Neural Network Gaussian Processes
Robustness of semiparametric efficiency in nearly-true models for two-phase samples
Learning Unified Embedding for Apparel Recognition
Achieving both positive secrecy rates of the users in two-way wiretap channel by individual secrecy
Effects of Feedback on the One-sided Secrecy of Two-way Wiretap through Multiple Transmissions
Local picture and level-set percolation of the Gaussian free field on a large discrete torus
Face Alignment Robust to Pose, Expressions and Occlusions
An Efficient Version of the Bombieri-Vaaler Lemma
Rank-Metric Codes with Local Recoverability
First-Order Query Evaluation with Cardinality Conditions
Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints
Image Projective Invariants
MISO in Ultra-Dense Networks: Balancing the Tradeoff between User and System Performance
Multidimensional classification of hippocampal shape features discriminates Alzheimer’s disease and mild cognitive impairment from normal aging
Measuring Thematic Fit with Distributional Feature Overlap
Games with lexicographically ordered $ω$-regular objectives
Generic Black-Box End-to-End Attack against RNNs and Other API Calls Based Malware Classifiers
Drone-based Object Counting by Spatially Regularized Regional Proposal Network
Orthogonal and Idempotent Transformations for Learning Deep Neural Networks
Three-term polynomial progressions in subsets of finite fields
Probably approximate Bayesian computation: nonasymptotic convergence of ABC under misspecification
On the Dynamical Foundation of Multifractality
Gibbsian representation for point processes via hyperedge potentials
Layered Group Sparse Beamforming for Cache-Enabled Green Wireless Networks
Computing Tutte Paths
Error bounds in local limit theorems using Stein’s method
Argotario: Computational Argumentation Meets Serious Games
Detecting Parts for Action Localization
Better Labeling Schemes for Nearest Common Ancestors through Minor-Universal Trees
Modeling Target-Side Inflection in Neural Machine Translation
EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation
Big minimizers of a non local isoperimetric problem: theoretical and numerical approaches
Supervising Neural Attention Models for Video Captioning by Human Gaze Data
Horofunctions on Sierpiński type triangles
Modeling the Intra-class Variability for Liver Lesion Detection using a Multi-class Patch-based CNN
A notion of rigidity on convolution between Poisson and determinantal point processes
Recommendation via matrix completion using Kolmogorov complexity
Online bipartite matching with amortized $O(\log^2 n)$ replacements
Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition
On Finding Maximum Cardinality Subset of Vectors with a Constraint on Normalized Squared Length of Vectors Sum
Quantum non demolition measurements: parameter estimation for mixtures of multinomials
A Spatio-Temporal Multivariate Shared Component Model with an Application in Iran Cancer Data
Critical Density for Activated Random Walks
Deep View-Sensitive Pedestrian Attribute Inference in an end-to-end Model
Bayesian Probabilistic Numerical Methods for Industrial Process Monitoring
On shortening u-cycles and u-words for permutations
Microblog Retrieval for Post-Disaster Relief: Applying and Comparing Neural IR Models
Boolean dimension and tree-width
Discriminative convolutional Fisher vector network for action recognition
On Testing Minor-Freeness in Bounded Degree Graphs With One-Sided Error
Transition to synchrony in degree-frequency correlated Sakaguchi-Kuramoto model
Conditioned local limit theorems for random walks defined on finite Markov chains
Solving Mixed Model Workplace Time-dependent Assembly Line Balancing Problem with FSS Algorithm
NuCypher KMS: Decentralized key management system
Self-paced Convolutional Neural Network for Computer Aided Detection in Medical Imaging Analysis
Expect the unexpected: Harnessing Sentence Completion for Sarcasm Detection
Regularization of the Kernel Matrix via Covariance Matrix Shrinkage Estimation
Optimized Signaling of Binary Correlated Sources over GMACs
Quantum ergodic sequences and equilibrium measures
The distinguishing number (index) and the domination number of a graph
Metrical-accent Aware Vocal Onset Detection in Polyphonic Audio
Sentence-level quality estimation by predicting HTER as a multi-component metric
Channel Pruning for Accelerating Very Deep Neural Networks
Fish School Search Algorithm for Constrained Optimization
Learning model-based planning from scratch
Deformable Part-based Fully Convolutional Network for Object Detection
Qualitative and quantitative properties of the dynamics of screw dislocations
Limit Cycles of Dynamic Systems under Random Perturbations with Rapid Switching and Slow Diffusion: A Multi-Scale Approach
Object-Extent Pooling for Weakly Supervised Single-Shot Localization
Domain-adversarial neural networks to address the appearance variability of histopathology images
Simultaneously Solving Mixed Model Assembly Line Balancing and Sequencing problems with FSS Algorithm
Multistage Voting Model with Alternative Elimination
Entropy-based Pruning for Learning Bayesian Networks using BIC
Fast and Accurate OOV Decoder on High-Level Features
A Projection-Based Reformulation and Decomposition Algorithm for Global Optimization of Mixed Integer Bilevel Linear Programs
Crowdsourcing Multiple Choice Science Questions
Submodular Minimization Under Congruency Constraints
Analysis of $p$-Laplacian Regularization in Semi-Supervised Learning
Worst-case vs Average-case Design for Estimation from Fixed Pairwise Comparisons
Acceleration and Averaging in Stochastic Mirror Descent Dynamics


If you did not already know

Shrinkage google
In statistics, shrinkage has two meanings:
• In relation to the general observation that, in regression analysis, a fitted relationship appears to perform less well on a new data set than on the data set used for fitting; in particular, the value of the coefficient of determination ‘shrinks’. This idea is complementary to overfitting and, separately, to the standard adjustment made in the coefficient of determination to compensate for the effects that further sampling would have, such as controlling for the potential of new explanatory terms improving the model by chance: that is, the adjustment formula itself provides ‘shrinkage’. But the adjustment formula yields an artificial shrinkage, in contrast to the first definition.
• To describe general types of estimators, or the effects of some types of estimation, whereby a naive or raw estimate is improved by combining it with other information. The term relates to the notion that the improved estimate is closer to the value supplied by the ‘other information’ than the raw estimate is. In this sense, shrinkage is used to regularize ill-posed inference problems.
A common idea underlying both of these meanings is the reduction in the effects of sampling variation. …
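The "adjustment formula" referred to above is the adjusted coefficient of determination, which shrinks the raw R² to account for the number of fitted predictors. A small numeric illustration (the values are arbitrary):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2: shrinks the raw R^2 to penalize the p fitted
    predictors, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# A raw R^2 of 0.80 'shrinks' once we account for 5 predictors on 30 points:
shrunk = adjusted_r2(0.80, n=30, p=5)  # about 0.758
```

The more predictors relative to observations, the stronger the shrinkage, mirroring the overfitting intuition in the first bullet.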

Generalized Linear Models (GLM) google
In statistics, the generalized linear model (GLM) is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value. …
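As a concrete instance of the link-function idea, the binomial (logistic) GLM uses the logit link: the mean response in (0, 1) maps to an unbounded linear predictor, and the variance of each measurement is a function of its predicted mean. A minimal sketch with hypothetical coefficients:

```python
import math

def logit(mu):
    # Canonical link for the binomial family: maps a mean in (0, 1) to the reals.
    return math.log(mu / (1 - mu))

def inv_logit(eta):
    # Inverse link: maps the linear predictor back to a mean in (0, 1).
    return 1 / (1 + math.exp(-eta))

def binomial_variance(mu):
    # In a GLM the variance is a function of the predicted mean.
    return mu * (1 - mu)

# Linear predictor eta = b0 + b1 * x for hypothetical coefficients.
b0, b1, x = -1.0, 2.0, 0.75
eta = b0 + b1 * x        # 0.5
mu = inv_logit(eta)      # fitted probability, about 0.622
```

Ordinary linear regression is recovered as the special case with an identity link and constant (normal) variance.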

Atomic Triangular Matrix google
An atomic (upper or lower) triangular matrix is a special form of unitriangular matrix, where all of the off-diagonal entries are zero, except for the entries in a single column. Such a matrix is also called a Gauss matrix or a Gauss transformation matrix. …
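The defining property is easy to verify numerically: the inverse of an atomic triangular matrix is the same matrix with its off-diagonal entries negated. A small pure-Python check:

```python
def atomic_lower(n, k, col):
    """Identity matrix plus the entries `col` below the diagonal in
    column k; `col` holds the values for rows k+1 .. n-1."""
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for row, v in enumerate(col, start=k + 1):
        M[row][k] = v
    return M

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

# A Gauss transformation matrix and its inverse: negate the off-diagonal
# entries of the single non-trivial column.
L = atomic_lower(3, 0, [2.0, -3.0])
Linv = atomic_lower(3, 0, [-2.0, 3.0])
I = matmul(L, Linv)  # the identity matrix
```

This property is what makes Gauss transformations cheap to invert during LU factorization.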

Distilled News

Beginner’s guide to build data visualisations on the web with D3.js

The web is becoming more accessible day by day, and with advancements in browser technology it is now possible to render complex visualisations on the fly across a variety of devices. This combination of accessibility and complexity makes the web an apt platform for reaching large audiences. Many organisations are already using web and mobile applications to deliver dashboards to your phone as well as to your laptop or desktop.

Network analysis of Game of Thrones

In this post, I am exploring network analysis techniques in a family network of major characters from Game of Thrones. Not surprisingly, we learn that House Stark (specifically Ned and Sansa) and House Lannister (especially Tyrion) are the most important family connections in Game of Thrones; they also connect many of the storylines and are central parts of the narrative.

Data Analysis – World Happiness Report – 2016

Data Analysis using Python – An Introduction

Challenges in Deep Learning

Deep Learning has become one of the primary research areas in developing intelligent machines. Most of the well-known applications (such as Speech Recognition, Image Processing and NLP) of AI are driven by Deep Learning. Deep Learning algorithms mimic human brains using artificial neural networks and progressively learn to accurately solve a given problem. But there are significant challenges in Deep Learning systems which we have to look out for.

Introducing Yandex CatBoost, a state-of-the-art open-source gradient boosting library

Recent developments in machine learning have accelerated its transition from a computer science research area to a technology that drives numerous customer applications. One of the most buzzed about methods leading this transition is deep learning. At Yandex, our homegrown deep neural networks are an important part of the machine learning portfolio that helps sustain our market-leading performance in search, speech recognition and synthesis, vision applications and machine translation. At the same time, we’ve also integrated many other forms of machine learning across our products and services. One thing to remember about machine learning is that there is no singular best approach – it is a rich collection of algorithms that each have their own strengths and weaknesses for specific types of data and certain types of customer problems. Deep learning has unlocked amazing capabilities in the advancement of artificial intelligence, but, at the end of the day, it’s just one part of a much broader machine learning tech stack that also includes linear and tree-based models, factorization methods, and numerous other techniques that leverage statistics and optimization.

Data Structures Related to Machine Learning Algorithms

If you want to solve some real-world problems and design a cool product or algorithm, then having machine learning skills is not enough. You would need good working knowledge of data structures. The Statsbot team has invited Peter Mills to tell you about data structures for machine learning approaches.

Machine Learning Crash Course: Part 4 – The Bias-Variance Dilemma

So what does this have to do with machine learning? Well, it turns out that machine learning algorithms are not that much different from our friend Doge: they often run the risk of over-extrapolating or over-interpolating from the data that they are trained on. There is a very delicate balancing act when machine learning algorithms try to predict things. On the one hand, we want our algorithm to model the training data very closely, otherwise we’ll miss relevant features and interesting trends. However, on the other hand we don’t want our model to fit too closely, and risk over-interpreting every outlier and irregularity.
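The trade-off described here can be made concrete with two extreme models on the same noisy data: a constant predictor (high bias, misses the trend) and a memorizing 1-nearest-neighbour predictor (high variance, over-interprets every outlier). A small sketch; the data-generating process is an arbitrary choice for illustration:

```python
import random

random.seed(0)

def f(x):
    return 2 * x  # true underlying signal

# Noisy training and test samples from the same process.
train = [(x / 10, f(x / 10) + random.gauss(0, 0.5)) for x in range(10)]
test = [(x / 10 + 0.05, f(x / 10 + 0.05) + random.gauss(0, 0.5)) for x in range(10)]

# High-bias model: ignores x entirely, predicts the mean training target.
mean_y = sum(y for _, y in train) / len(train)
underfit = lambda x: mean_y

# High-variance model: memorizes the training data (1-nearest neighbour).
def overfit(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# The memorizer is perfect on training data but carries the noise into its
# test predictions; the constant model misses the trend on both sets.
results = {
    "underfit": (mse(underfit, train), mse(underfit, test)),
    "overfit": (mse(overfit, train), mse(overfit, test)),
}
```

The zero training error of the memorizer is exactly the "fitting too closely" failure mode described above.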

Machine Learning Explained: supervised learning, unsupervised learning, and reinforcement learning

Machine learning is often split between three main types of learning: supervised learning, unsupervised learning, and reinforcement learning. Knowing the differences between these three types of learning is necessary for any data scientist.

Design by evolution: How to evolve your neural network. AutoML: Time to evolve.

For most machine learning practitioners, designing a neural network is an art form. Usually, it begins with a common architecture, and then parameters are tweaked until a good combination of layers, activation functions, regularisers, and optimisation parameters is found. Guided by popular architectures, like VGG, Inception, ResNets, DenseNets and others, one will iterate through variations of the network until it achieves the desired balance of speed and accuracy. But as the available processing power increases, it makes sense to begin automating this network optimisation process.

5 Free Resources for Getting Started with Deep Learning for Natural Language Processing

This is a collection of 5 deep learning for natural language processing resources for the uninitiated, intended to open eyes to what is possible and to the current state of the art at the intersection of NLP and deep learning. It should also provide some idea of where to go next.

Neville’s Method of Polynomial Interpolation

Part 1 of 5 in the series Numerical Analysis
• Neville’s Method of Polynomial Interpolation
• Lagrangian Polynomial Interpolation with R
• The Newton-Raphson Root-Finding Algorithm in R
• The Secant Method Root-Finding Algorithm in R
• The Bisection Method of Root-Finding with R
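As a taster for the first post in the series (which works in R), here is Neville's tableau sketched in Python: the recurrence combines lower-order interpolants into the value of the full interpolating polynomial at a single point, without ever forming its coefficients.

```python
def neville(xs, ys, x):
    """Evaluate the interpolating polynomial through (xs[i], ys[i]) at x
    using Neville's recursive tableau, updated in place column by column."""
    n = len(xs)
    q = list(ys)  # q[i] holds Q[i][j] for the current column j
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            q[i] = ((x - xs[i - j]) * q[i] - (x - xs[i]) * q[i - 1]) \
                   / (xs[i] - xs[i - j])
    return q[n - 1]

# The interpolant through three points of f(x) = x^2 reproduces f exactly.
value = neville([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 3.0)  # 9.0
```

Descending `i` inside each column ensures `q[i - 1]` is still a previous-column value when it is read.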

Some Ideas for your Internal R Package

At RStudio, I have the pleasure of interacting with data science teams around the world. Many of these teams are led by R users stepping into the role of analytic admins. These users are responsible for supporting and growing the R user base in their organization and often lead internal R user groups. One of the most successful strategies to support a corporate R user group is the creation of an internal R package. This article outlines some common features and functions shared in internal packages. Creating an R package is easier than you might expect. A good place to start is this webinar on package creation.

Book Memo: “Multicriteria and Clustering”

Classification Techniques in Agrifood and Environment
This book provides an introduction to operational research methods and their application in the agrifood and environmental sectors. It explains the need for multicriteria decision analysis and teaches users how to use recent advances in multicriteria and clustering classification techniques in practice. Further, it presents some of the most common methodologies for statistical analysis and mathematical modeling, and discusses in detail ten examples that explain and show “hands-on” how operational research can be used in key decision-making processes at enterprises in the agricultural food and environmental industries. As such, the book offers a valuable resource especially well suited as a textbook for postgraduate courses.

Document worth reading: “Provable benefits of representation learning”

There is general consensus that learning representations is useful for a variety of reasons, e.g. efficient use of labeled data (semi-supervised learning), transfer learning and understanding hidden structure of data. Popular techniques for representation learning include clustering, manifold learning, kernel-learning, autoencoders, Boltzmann machines, etc. To study the relative merits of these techniques, it is essential to formalize the definition and goals of representation learning, so that they all become instances of the same definition. This paper introduces such a formal framework that also formalizes the utility of learning the representation. It is related to previous Bayesian notions, but with some new twists. We show the usefulness of our framework by exhibiting simple and natural settings (linear mixture models and loglinear models) where the power of representation learning can be formally shown. In these examples, representation learning can be performed provably and efficiently under plausible assumptions (despite being NP-hard), and furthermore: (i) it greatly reduces the need for labeled data (semi-supervised learning), (ii) it allows solving classification tasks when simpler approaches like nearest neighbors require too much data, and (iii) it is more powerful than manifold learning methods. Provable benefits of representation learning

R Packages worth a look

Bayesian Graphical Lasso (BayesianGLasso)
Implements a data-augmented block Gibbs sampler for simulating the posterior distribution of concentration matrices for specifying the topology and parameterization of a Gaussian Graphical Model (GGM). This sampler was originally proposed in Wang (2012) <doi:10.1214/12-BA729>.

Stochastic Gradient Markov Chain Monte Carlo (sgmcmc)
Provides functions that perform popular stochastic gradient Markov chain Monte Carlo (SGMCMC) methods on user-specified models. The required gradients are automatically calculated using ‘TensorFlow’ <https://…/>, an efficient library for numerical computation. This means only the log likelihood and log prior functions need to be specified. The methods implemented include stochastic gradient Langevin dynamics (SGLD), stochastic gradient Hamiltonian Monte Carlo (SGHMC), the stochastic gradient Nosé-Hoover thermostat (SGNHT), and their respective control variate versions for increased efficiency.
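For intuition, the SGLD update implemented by such methods adds properly scaled Gaussian noise to a stochastic gradient step on the log posterior, so the iterates become approximate posterior samples. A minimal pure-Python sketch on a toy Gaussian-mean model (this does not use the package; the step size, minibatch size, and flat prior are illustrative assumptions):

```python
import math
import random

random.seed(42)

# Toy target: posterior of a Gaussian mean with a flat prior given data y,
# so the posterior is approximately N(mean(y), sigma^2 / N).
y = [1.2, 0.8, 1.1, 0.9, 1.0]
N = len(y)
sigma2 = 1.0

def grad_log_post(theta, minibatch):
    # Stochastic gradient of the log posterior: rescale the minibatch
    # log-likelihood gradient by N / n; the flat prior contributes zero.
    n = len(minibatch)
    grad_lik = sum((yi - theta) / sigma2 for yi in minibatch)
    return (N / n) * grad_lik

# SGLD: half a gradient step plus injected Gaussian noise of variance eps.
eps = 0.01
theta = 0.0
samples = []
for _ in range(5000):
    batch = random.sample(y, 2)
    theta += 0.5 * eps * grad_log_post(theta, batch) \
             + math.sqrt(eps) * random.gauss(0, 1)
    samples.append(theta)

# Discard burn-in; the remaining iterates approximate posterior draws.
posterior_mean = sum(samples[1000:]) / len(samples[1000:])
```

With a decreasing step size (and the control variate refinements the package offers), the approximation to the posterior improves.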

Subgroup Discovery and Bump Hunting (subgroup.discovery)
Developed to assist in discovering interesting subgroups in high-dimensional data. The PRIM implementation is based on the 1998 paper ‘Bump hunting in high-dimensional data’ by Jerome H. Friedman and Nicholas I. Fisher <doi:10.1023/A:1008894516817>. PRIM involves finding a set of ‘rules’ which, combined, imply unusually large (or small) values of some other target variable. Specifically, one tries to find a set of subregions in which the target variable is substantially larger than the overall mean. The objective of bump hunting in general is to find regions in the input (attribute/feature) space with relatively high (low) values for the target variable. The regions are described by simple rules of the type if: condition-1 and … and condition-n then: estimated target value. Given the data (or a subset of the data), the goal is to produce a box B within which the target mean is as large as possible. There are many problems where finding such regions is of considerable practical interest. Often these are problems where a decision maker can, in a sense, choose or select the values of the input variables so as to optimize the value of the target variable. In bump hunting it is customary to follow a so-called covering strategy. This means that the same box construction (rule induction) algorithm is applied sequentially to subsets of the data.
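The core of PRIM is the top-down peeling step: repeatedly shave off a small fraction alpha of the data at one face of the current box, keeping whichever peel most increases the target mean inside the box. A single peeling step on one input variable, sketched in Python (alpha and the toy data are illustrative assumptions, not the package's API):

```python
def peel_once(points, alpha=0.25):
    """One PRIM peeling step on a single input variable: try removing the
    lowest-alpha or highest-alpha fraction of x values and keep whichever
    candidate box has the larger target mean."""
    points = sorted(points)                  # (x, target) pairs, sorted by x
    k = max(1, int(alpha * len(points)))
    candidates = [points[k:], points[:-k]]   # peel low end / peel high end
    def mean_y(box):
        return sum(y for _, y in box) / len(box)
    return max(candidates, key=mean_y)

# Target is high for mid/large x; peeling should discard the low-x end first.
data = [(0.1, 0.0), (0.2, 0.1), (0.5, 1.0), (0.6, 1.2), (0.9, 1.1),
        (1.0, 0.9), (0.3, 0.2), (0.7, 1.3)]
box = peel_once(data)
overall = sum(y for _, y in data) / len(data)
box_mean = sum(y for _, y in box) / len(box)  # higher than the overall mean
```

The full algorithm iterates peeling over all input variables until the box support drops below a threshold, then pastes back and moves on to the next box under the covering strategy.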

Estimate (Generalized) Linear Mixed Models with Factor Structures (PLmixed)
Utilizes the ‘lme4’ package and the optim() function from ‘stats’ to estimate (generalized) linear mixed models (GLMM) with factor structures using a profile likelihood approach, as outlined in Jeon and Rabe-Hesketh (2012) <doi:10.3102/1076998611417628>. Factor analysis and item response models can be extended to allow for an arbitrary number of nested and crossed random effects, making it useful for multilevel and cross-classified models.

A Monadic Pipeline System (rmonad)
A monadic solution to pipeline analysis. All operations — and the errors, warnings and messages they emit — are merged into a directed graph. Infix binary operators mediate when values are stored, how exceptions are handled, and where pipelines branch and merge. The resulting structure may be queried for debugging or report generation. ‘rmonad’ complements, rather than competes with, non-monadic pipeline packages like ‘magrittr’ or ‘pipeR’.
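rmonad is an R package, but the underlying idea is language-agnostic: thread a value through a pipeline while capturing, rather than throwing, errors, and record each step for later inspection. A toy Python sketch of that bind-based pattern (not rmonad's actual API):

```python
class Result:
    """Carries a value (or a captured error) plus a log of pipeline steps,
    mimicking the monadic idea behind packages like 'rmonad'."""
    def __init__(self, value, error=None, log=()):
        self.value, self.error, self.log = value, error, tuple(log)

    def bind(self, fn):
        # Short-circuit once an error has occurred; otherwise apply fn
        # and record the step (and any exception) in the log.
        if self.error is not None:
            return self
        try:
            return Result(fn(self.value), log=self.log + (fn.__name__,))
        except Exception as e:
            return Result(None, error=str(e), log=self.log + (fn.__name__,))

def halve(x):
    return x / 2

def fail_if_odd(x):
    if int(x) % 2:
        raise ValueError("odd input")
    return x

ok = Result(8).bind(halve).bind(halve)           # value 2.0, no error
bad = Result(6).bind(halve).bind(fail_if_odd)    # error captured, not raised
```

The captured log and error make the pipeline queryable after the fact, which is the debugging/report-generation benefit the package description points to.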

R Packages worth a look

Differential Expression Analysis Using a Bottom-Up Model (denoiSeq)
Given count data from two conditions, it determines which transcripts are differentially expressed across the two conditions using Bayesian inference of the parameters of a bottom-up model for PCR amplification. This model is developed in Ndifon Wilfred, Hilah Gal, Eric Shifrut, Rina Aharoni, Nissan Yissachar, Nir Waysbort, Shlomit Reich Zeliger, Ruth Arnon, and Nir Friedman (2012), <http://…/15865.full>, and results in a distribution for the counts that is a superposition of the binomial and negative binomial distribution.

Apply Functions to Multiple Multidimensional Arguments (multiApply)
The base apply function and its variants, as well as the related functions in the ‘plyr’ package, typically apply user-defined functions to a single argument (or a list of vectorized arguments in the case of mapply). The ‘multiApply’ package extends this paradigm to functions taking a list of multiple unidimensional or multidimensional arguments (or combinations thereof) as input, which can have different numbers of dimensions as well as different dimension lengths.

Basic Functions for Pre-Processing Microarrays (PreProcess)
Provides classes to pre-process microarray gene expression data as part of the OOMPA collection of packages described at <http://…/>.

Visualize Reproducibility and Replicability in a Comparison of Scientific Studies (scifigure)
Users may specify which fundamental qualities of a new study have or have not changed in an attempt to reproduce or replicate an original study. A comparison of the differences is visualized. The visualization approach follows Patil, Peng, and Leek (2016) <doi:10.1101/066803>.

Two Stage Forecasting (TSF) for Long Memory Time Series in Presence of Structural Break (TSF)
Forecasting of long memory time series in the presence of a structural break, using the Two Stage Forecasting (TSF) algorithm of Papailias and Dias (2015) <doi:10.1016/j.ijforecast.2015.01.006>.

Book Memo: “Aggregated Search”

The goal of aggregated search is to provide integrated search across multiple heterogeneous search services in a unified interface—a single query box and a common presentation of results. In the web search domain, aggregated search systems are responsible for integrating results from specialized search services, or verticals, alongside the core web results. For example, search portals such as Google, Bing, and Yahoo! provide access to vertical search engines that focus on different types of media (images and video), different types of search tasks (search for local businesses and online products), and even applications that can help users complete certain tasks (language translation and math calculations). This monograph provides a comprehensive summary of previous research in aggregated search. It starts by describing why aggregated search requires unique solutions. It then discusses different sources of evidence that are likely to be available to an aggregated search system, as well as different techniques for integrating evidence in order to make vertical selection and presentation decisions. Next, it surveys different evaluation methodologies for aggregated search and discusses prior user studies that have aimed to better understand how users behave with aggregated search interfaces. It proceeds to review different advanced topics in aggregated search. It concludes by highlighting the main trends and discussing short-term and long-term areas for future work.

What’s new on arXiv

Houdini: Fooling Deep Structured Prediction Models

Generating adversarial examples is a critical step for evaluating and improving the robustness of learning machines. So far, most existing methods only work for classification and are not designed to alter the true performance measure of the problem at hand. We introduce a novel flexible approach named Houdini for generating adversarial examples specifically tailored to the final performance measure of the task considered, be it combinatorial or non-decomposable. We successfully apply Houdini to a range of applications such as speech recognition, pose estimation and semantic segmentation. In all cases, attacks based on Houdini achieve a higher success rate than those based on the traditional surrogates used to train the models, while using a less perceptible adversarial perturbation.
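For context, the classic gradient-based recipe that Houdini generalizes can be sketched as a fast-gradient-sign step on a differentiable loss (this toy is standard FGSM on a logistic model, written by us; Houdini's contribution is to replace that loss with a smooth surrogate of the task's true performance measure).

```python
import numpy as np

def fgsm_perturb(x, y, w, eps):
    """One signed-gradient step that increases the logistic loss."""
    p = 1.0 / (1.0 + np.exp(-x @ w))       # model confidence for class 1
    grad_x = (p - y) * w                   # d(loss)/dx for logistic loss
    return x + eps * np.sign(grad_x)

def loss(x, y, w):
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return -y * np.log(p) - (1 - y) * np.log(1 - p)

rng = np.random.default_rng(0)
w = rng.normal(size=5)
x = rng.normal(size=5)
x_adv = fgsm_perturb(x, y=1.0, w=w, eps=0.1)
print(loss(x_adv, 1.0, w) > loss(x, 1.0, w))  # the perturbation raises the loss
```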

TensorLog: Deep Learning Meets Probabilistic DBs

We present an implementation of a probabilistic first-order logic called TensorLog, in which classes of logical queries are compiled into differentiable functions in a neural-network infrastructure such as TensorFlow or Theano. This leads to a close integration of probabilistic logical reasoning with deep-learning infrastructure: in particular, it enables high-performance deep learning frameworks to be used for tuning the parameters of a probabilistic logic. Experimental results show that TensorLog scales to problems involving hundreds of thousands of knowledge-base triples and tens of thousands of examples.
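The compilation idea can be illustrated with a toy of our own (not TensorLog's actual API): a weighted binary relation over n entities becomes an n-by-n matrix, and a chain rule such as grandparent(X, Y) :- parent(X, Z), parent(Z, Y) compiles to matrix products applied to a one-hot vector, which is differentiable end to end.

```python
import numpy as np

n = 4
parent = np.zeros((n, n))
parent[0, 1] = 1.0   # entity 0 is a parent of entity 1 (weight = belief)
parent[1, 2] = 0.9   # soft fact: 1 is probably a parent of 2
parent[1, 3] = 0.1   # soft fact with low belief

def query_grandparent(x):
    v = np.zeros(n)
    v[x] = 1.0                    # one-hot encoding of the query entity X
    return v @ parent @ parent    # differentiable "proof" scores over Y

print(query_grandparent(0))       # belief that each entity is a grandchild of 0
```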

Cooperative Hierarchical Dirichlet Processes: Superposition vs. Maximization

The cooperative hierarchical structure is a common and significant data structure observed in, or adopted by, many research areas, such as text mining (author-paper-word) and multi-label classification (label-instance-feature). Renowned Bayesian approaches for cooperative hierarchical structure modeling are mostly based on topic models. However, these approaches suffer from a serious issue in that the number of hidden topics/factors needs to be fixed in advance and an inappropriate number may lead to overfitting or underfitting. One elegant way to resolve this issue is Bayesian nonparametric learning, but existing work in this area still cannot be applied to cooperative hierarchical structure modeling. In this paper, we propose a cooperative hierarchical Dirichlet process (CHDP) to fill this gap. Each node in a cooperative hierarchical structure is assigned a Dirichlet process to model its weights on the infinite hidden factors/topics. Together with measure inheritance from the hierarchical Dirichlet process, two kinds of measure cooperation, i.e., superposition and maximization, are defined to capture the many-to-many relationships in the cooperative hierarchical structure. Furthermore, two constructive representations for CHDP, i.e., stick-breaking and international restaurant process, are designed to facilitate the model inference. Experiments on synthetic and real-world data with cooperative hierarchical structures demonstrate the properties and the ability of CHDP for cooperative hierarchical structure modeling and its potential for practical application scenarios.

Improved Neural Machine Translation with a Syntax-Aware Encoder and Decoder

Most neural machine translation (NMT) models are based on the sequential encoder-decoder framework, which makes no use of syntactic information. In this paper, we improve this model by explicitly incorporating source-side syntactic trees. More specifically, we propose (1) a bidirectional tree encoder which learns both sequential and tree structured representations; (2) a tree-coverage model that lets the attention depend on the source-side syntax. Experiments on Chinese-English translation demonstrate that our proposed models outperform the sequential attentional model as well as a stronger baseline with a bottom-up tree encoder and word coverage.

Top-Rank Enhanced Listwise Optimization for Statistical Machine Translation

Pairwise ranking methods are the basis of many widely used discriminative training approaches for structure prediction problems in natural language processing (NLP). Decomposing the problem of ranking hypotheses into pairwise comparisons enables simple and efficient solutions. However, neglecting the global ordering of the hypothesis list may hinder learning. We propose a listwise learning framework for structure prediction problems such as machine translation. Our framework directly models the entire translation list’s ordering to learn parameters which may better fit the given listwise samples. Furthermore, we propose top-rank enhanced loss functions, which are more sensitive to ranking errors at higher positions. Experiments on a large-scale Chinese-English translation task show that both our listwise learning framework and top-rank enhanced listwise losses lead to significant improvements in translation quality.
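One way such a top-rank enhanced listwise loss could look (our own simplification, not the paper's exact formulation) is a pairwise hinge over the gold ordering, down-weighted as the better hypothesis moves away from the top:

```python
def top_rank_loss(scores, gold_order, margin=1.0):
    """Hinge penalties on mis-ordered pairs, weighted toward the top ranks."""
    loss = 0.0
    for rank, i in enumerate(gold_order):
        weight = 1.0 / (rank + 1)               # top positions matter more
        for j in gold_order[rank + 1:]:
            loss += weight * max(0.0, margin - (scores[i] - scores[j]))
    return loss

# Hypothesis 2 is scored too close to hypothesis 1 in the gold order [0, 2, 1].
print(top_rank_loss([2.0, 0.5, 1.0], gold_order=[0, 2, 1]))  # 0.25
```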

DeepProbe: Information Directed Sequence Understanding and Chatbot Design via Recurrent Neural Networks

Information extraction and user intention identification are central topics in modern query understanding and recommendation systems. In this paper, we propose DeepProbe, a generic information-directed interaction framework built around an attention-based sequence-to-sequence (seq2seq) recurrent neural network. DeepProbe can rephrase, evaluate, and even actively ask questions, leveraging the generative ability and likelihood estimation made possible by seq2seq models. DeepProbe makes decisions based on a derived uncertainty (entropy) measure conditioned on user inputs, possibly over multiple rounds of interaction. Three applications, namely a rewriter, a relevance scorer, and a chatbot for ad recommendation, were built around DeepProbe, with the first two serving as precursory building blocks for the third. We first use the seq2seq model in DeepProbe to rewrite a user query into a standard query form, which is submitted to an ordinary recommendation system. Secondly, we evaluate DeepProbe’s seq2seq model-based relevance scoring. Finally, we build a chatbot prototype capable of active user interaction, asking questions that maximize information gain and allowing for a more efficient user intention identification process. We evaluate the first two applications by 1) comparing with baselines using BLEU and AUC, and 2) human judge evaluation. Both demonstrate significant improvements over current state-of-the-art systems, proving their value as useful tools on their own while laying a good foundation for the ongoing chatbot application.

AE-GAN: adversarial eliminating with GAN

Although neural networks can achieve state-of-the-art performance in image recognition, they are often badly defeated by adversarial examples: inputs generated by applying imperceptible but intentional perturbations to samples from the dataset. How to defend against adversarial examples is an important problem well worth researching. So far, only two well-known methods, adversarial training and defensive distillation, have provided a significant defense. In contrast to existing methods, which are mainly based on the model itself, we address the problem purely through the adversarial examples themselves. In this paper, we propose a novel idea and the first framework based on Generative Adversarial Nets, named AE-GAN, capable of resisting adversarial examples. Extensive experiments on benchmark datasets indicate that AE-GAN is able to defend against adversarial examples effectively.

Order-Free RNN with Visual Attention for Multi-Label Classification

In this paper, we propose jointly learned attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on either model exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process so that prediction errors do not propagate and degrade performance. Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam search, prediction of multiple labels can be efficiently achieved by our proposed network model.
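The beam search step can be sketched as follows (our toy with a hand-made additive scorer; in the paper the scores come from the attention-LSTM):

```python
import heapq

def beam_search(score, labels, beam_width=2, max_len=2):
    """Keep the top-k partial label sequences at each step."""
    beams = [(0.0, [])]                       # (total score, label sequence)
    for _ in range(max_len):
        candidates = []
        for total, seq in beams:
            for lab in labels:
                if lab not in seq:            # predict each label at most once
                    candidates.append((total + score(seq, lab), seq + [lab]))
        beams = heapq.nlargest(beam_width, candidates)
    return beams

# Hand-made scorer: a fixed confidence per label, independent of the prefix.
confidence = {"cat": 2.0, "dog": 1.5, "car": 0.1}
best = beam_search(lambda seq, lab: confidence[lab], list(confidence))
print(best[0])   # highest-scoring label pair
```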

Bayesian Nonlinear Support Vector Machines for Big Data

We propose a fast inference method for Bayesian nonlinear support vector machines that leverages stochastic variational inference and inducing points. Our experiments show that the proposed method is faster than competing Bayesian approaches and scales easily to millions of data points. It provides additional features over frequentist competitors such as accurate predictive uncertainty estimates and automatic hyperparameter search.

Latent Gaussian Process Regression

We introduce Latent Gaussian Process Regression which is a latent variable extension allowing modelling of non-stationary processes using stationary GP priors. The approach is built on extending the input space of a regression problem with a latent variable that is used to modulate the covariance function over the input space. We show how our approach can be used to model non-stationary processes but also how multi-modal or non-functional processes can be described where the input signal cannot fully disambiguate the output. We exemplify the approach on a set of synthetic data and provide results on real data from geostatistics.

Spectral Filter Tracking

Visual object tracking is a challenging computer vision task with numerous real-world applications. Here we propose a simple but efficient Spectral Filter Tracking (SFT) method. To characterize the rotational and translational invariance of tracking targets, the candidate image region is modeled as a pixelwise grid graph. Instead of conventional graph matching, we convert tracking into a plain least-squares regression problem that estimates the best center coordinate of the target. But unlike the holistic regression of correlation-filter-based methods, SFT can operate on localized surrounding regions of each pixel (i.e., vertex) using spectral graph filters, and is thus more robust to local variations and cluttered backgrounds. To bypass the eigenvalue decomposition of the graph Laplacian matrix L, we parameterize the spectral graph filters as polynomials of L via spectral graph theory, in which L^k exactly encodes a k-hop local neighborhood of each vertex. Finally, the filter parameters (i.e., polynomial coefficients) as well as feature projection functions are jointly integrated into the regression model.
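The polynomial filter can be sketched in a few lines (our toy on a path graph; the paper uses a pixelwise grid graph with learned coefficients): applying the sum of c_k L^k to a vertex signal mixes each vertex with its k-hop neighborhood and needs no eigendecomposition of L.

```python
import numpy as np

def path_laplacian(n):
    """Graph Laplacian L = D - A of a simple n-vertex path graph."""
    A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return np.diag(A.sum(1)) - A

def spectral_filter(signal, coeffs):
    """Apply sum_k coeffs[k] * L^k to a vertex signal."""
    L = path_laplacian(len(signal))
    out = np.zeros_like(signal)
    Lk = np.eye(len(signal))
    for c in coeffs:
        out += c * (Lk @ signal)
        Lk = Lk @ L
    return out

x = np.array([1.0, 0.0, 0.0, 0.0])          # impulse at the first vertex
print(spectral_filter(x, [1.0, -0.5]))      # (I - 0.5 L) x: mixed with 1-hop
```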

One-Shot Learning in Discriminative Neural Networks

We consider the task of one-shot learning of visual categories. In this paper we explore a Bayesian procedure for updating a pretrained convnet to classify a novel image category for which data is limited. We decompose this convnet into a fixed feature extractor and softmax classifier. We assume that the target weights for the new task come from the same distribution as the pretrained softmax weights, which we model as a multivariate Gaussian. By using this as a prior for the new weights, we demonstrate competitive performance with state-of-the-art methods whilst also being consistent with ‘normal’ methods for training deep networks on large data.

Fast Feature Fool: A data independent approach to universal adversarial perturbations

State-of-the-art object recognition Convolutional Neural Networks (CNNs) are shown to be fooled by image agnostic perturbations, called universal adversarial perturbations. It is also observed that these perturbations generalize across multiple networks trained on the same target data. However, these algorithms require the training data on which the CNNs were trained and compute adversarial perturbations via complex optimization. The fooling performance of these approaches is directly proportional to the amount of available training data. This makes them unsuitable for practical attacks, since it is unreasonable for an attacker to have access to the training data. In this paper, for the first time, we propose a novel data independent approach to generate image agnostic perturbations for a range of CNNs trained for object recognition. We further show that these perturbations are transferable across multiple network architectures trained either on the same or different data. In the absence of data, our method generates universal adversarial perturbations efficiently by fooling the features learned at multiple layers, thereby causing CNNs to misclassify. Experiments demonstrate impressive fooling rates and surprising transferability for the proposed universal perturbations generated without any training data.

On the State of the Art of Evaluation in Neural Language Models

Ongoing innovations in recurrent neural network architectures have provided a steady influx of apparently state-of-the-art results on language modelling benchmarks. However, these have been evaluated using differing code bases and limited computational resources, which represent uncontrolled sources of experimental variation. We reevaluate several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrive at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models. We establish a new state of the art on the Penn Treebank and Wikitext-2 corpora, as well as strong baselines on the Hutter Prize dataset.

Spherical Paragraph Model

Representing texts as fixed-length vectors is central to many language processing tasks. Most traditional methods build text representations based on the simple Bag-of-Words (BoW) representation, which loses the rich semantic relations between words. Recent advances in natural language processing have shown that semantically meaningful representations of words can be efficiently acquired by distributed models, making it possible to build text representations based on a better foundation called the Bag-of-Word-Embedding (BoWE) representation. However, existing text representation methods using BoWE often lack sound probabilistic foundations or cannot well capture the semantic relatedness encoded in word vectors. To address these problems, we introduce the Spherical Paragraph Model (SPM), a probabilistic generative model based on BoWE, for text representation. SPM has good probabilistic interpretability and can fully leverage the rich semantics of words, the word co-occurrence information as well as the corpus-wide information to help the representation learning of texts. Experimental results on topical classification and sentiment analysis demonstrate that SPM can achieve new state-of-the-art performances on several benchmark datasets.

Transitioning between Convolutional and Fully Connected Layers in Neural Networks

Digital pathology has advanced substantially over the last decade; however, tumor localization continues to be a challenging problem due to highly complex patterns and textures in the underlying tissue bed. The use of convolutional neural networks (CNNs) to analyze such complex images has been well adopted in digital pathology. In recent years, however, the architecture of CNNs has changed with the introduction of inception modules, which have shown great promise for classification tasks. In this paper, we propose a modified ‘transition’ module that learns global average pooling layers from filters of varying sizes to encourage class-specific filters at multiple spatial resolutions. We demonstrate the performance of the transition module in AlexNet and ZFNet for classifying breast tumors in two independent datasets of scanned histology sections, in which the transition module was superior.

ExGUtils: A python package for statistical analysis with the ex-gaussian probability density

The study of reaction times and their underlying cognitive processes is an important field in Psychology. Reaction times are usually modeled through the ex-Gaussian distribution, because it provides a good fit to multiple empirical data. The complexity of this distribution makes computational tools an essential element in the field, so there is a strong need for efficient and versatile computational tools for research in this area. In this manuscript we discuss some mathematical details of the ex-Gaussian distribution and apply the ExGUtils package, a set of functions and numerical tools written in Python and developed for the numerical analysis of data involving the ex-Gaussian probability density. In order to validate the package, we present an extensive analysis of fits obtained with it, discuss advantages of and differences between the least-squares and maximum-likelihood methods, and quantitatively evaluate the goodness of the obtained fits (a point usually overlooked in most of the literature in the area). The analysis allows one to identify outliers in the empirical datasets and to determine rigorously whether data trimming is needed and at which points it should be done.
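The ex-Gaussian density itself has a simple closed form (sketched below from the standard formula for the exponentially modified Gaussian; function and parameter names are ours, not necessarily the package's):

```python
import math

def exgauss_pdf(x, mu, sigma, tau):
    """Density of X = N(mu, sigma^2) + Exp(mean tau)."""
    z = (x - mu) / sigma - sigma / tau
    norm_cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (1.0 / tau) * math.exp(
        (mu - x) / tau + sigma**2 / (2.0 * tau**2)
    ) * norm_cdf

# Crude sanity check: the density should integrate to roughly 1.
xs = [i * 0.01 for i in range(-1000, 3000)]
area = sum(exgauss_pdf(x, mu=0.0, sigma=1.0, tau=0.5) for x in xs) * 0.01
print(round(area, 2))  # ≈ 1.0
```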

Optimizing the Latent Space of Generative Networks

Generative Adversarial Networks (GANs) have been shown to be able to sample impressively realistic images. GAN training consists of a saddle point optimization problem that can be thought of as an adversarial game between a generator which produces the images, and a discriminator, which judges if the images are real. Both the generator and the discriminator are commonly parametrized as deep convolutional neural networks. The goal of this paper is to disentangle the contribution of the optimization procedure and the network parametrization to the success of GANs. To this end we introduce and study Generative Latent Optimization (GLO), a framework to train a generator without the need to learn a discriminator, thus avoiding challenging adversarial optimization problems. We show experimentally that GLO enjoys many of the desirable properties of GANs: learning from large data, synthesizing visually-appealing samples, interpolating meaningfully between samples, and performing linear arithmetic with noise vectors.
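A minimal GLO-style loop (our linear toy; the paper uses deep convolutional generators and image data) optimizes per-sample latent codes and the generator jointly by gradient descent, with no discriminator:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))            # 8 "images" of dimension 5
Z = rng.normal(size=(8, 2))            # one learnable latent code per sample
G = rng.normal(size=(2, 5))            # linear generator for simplicity

lr = 0.05
losses = []
for _ in range(200):
    R = Z @ G - X                      # reconstruction residual
    losses.append(float((R ** 2).mean()))
    gZ = 2 * R @ G.T / X.size          # gradient w.r.t. the codes
    gG = 2 * Z.T @ R / X.size          # gradient w.r.t. the generator
    Z -= lr * gZ                       # update codes and generator jointly
    G -= lr * gG

print(losses[-1] < losses[0])          # reconstruction loss decreases
```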

PDD Graph: Bridging Electronic Medical Records and Biomedical Knowledge Graphs via Entity Linking
An optimal unrestricted learning procedure
The Eigenvalue Distribution of Discrete Periodic Time-Frequency Limiting Operators
GPU LSM: A Dynamic Dictionary Data Structure for the GPU
Show and Recall: Learning What Makes Videos Memorable
Should a Normal Imputation Model Be Modified to Impute Skewed Variables?
Multi-Element VLC Networks: LED Assignment, Power Control, and Optimum Combining
Behaviour of l-bits near the many-body localization transition
Auto-Conditioned LSTM Network for Extended Complex Human Motion Synthesis
Genetic Algorithm for Epidemic Mitigation by Removing Relationships
Efficient semiparametric estimation in time-varying regression models
On weak $ε$-nets and the Radon number
Make Your Bone Great Again : A study on Osteoporosis Classification
Suboptimality of local algorithms for a class of max-cut problems
Cover and Conquer: Augmenting Decompositions for Connectivity Problems
Benchmarking and Error Diagnosis in Multi-Instance Pose Estimation
Freehand Ultrasound Image Simulation with Spatially-Conditioned Generative Adversarial Networks
Incremental Boosting Convolutional Neural Network for Facial Action Unit Recognition
Linear Dependence Between Hereditary Quasirandomness Conditions
Slanted Stixels: Representing San Francisco’s Steepest Streets
Towards Fast-Convergence, Low-Delay and Low-Complexity Network Optimization
Anderson Localization in Low-Dimensional Optical Lattices
On Treewidth and Stable Marriage
Enumerating Cliques in Direct Product Graphs
Neural Matching Models for Question Retrieval and Next Question Prediction in Conversation
Hybrid PS-V Technique: A Novel Sensor Fusion Approach for Fast Mobile Eye-Tracking with Sensor-Shift Aware Correction
Photosensor Oculography: Survey and Parametric Analysis of Designs using Model-Based Simulation
Wide Inference Network for Image Denoising
List Supermodular Coloring with Shorter Lists
Optimal Universal Lossless Compression with Side Information
Don’t relax: early stopping for convex regularization
Fast and Accurate Image Super Resolution by Deep CNN with Skip Connection and Network in Network
Visually Aligned Word Embeddings for Improving Zero-shot Learning
Distributed Bi-level Energy Allocation Mechanism with Grid Constraints and Hidden User Information
Accelerating uncertainty assessment of environmental model parameters by introducing a Kalman updater in DREAM(ZS)
Proper Distinguishing Colorings with Few Colors for Graphs with Girth at Least 5
Discriminative Transformation Learning for Fuzzy Sparse Subspace Clustering
Pruning Convolutional Neural Networks for Image Instance Retrieval
Coresets for Triangulation
Detecting Intentional Lexical Ambiguity in English Puns
DCTM: Discrete-Continuous Transformation Matching for Semantic Flow
Resurgence of oscillation in coupled oscillators under delayed cyclic interaction
Exact asymptotic formulae of the stationary distribution of a discrete-time two-dimensional QBD process
Synchronization of chaotic modulated time delay networks in presence of noise
PunFields at SemEval-2017 Task 7: Employing Roget’s Thesaurus in Automatic Pun Recognition and Interpretation
A Linguistic Model of Classifying and Clustering Community Pages in a Social Network Based on User Interests
Restoration of oscillation in network of oscillators in presence of direct and indirect interactions
Local analysis of cyclotomic difference sets
Green Base Station Placement for Microwave Backhaul Links
Vision-based Real Estate Price Estimation
Polynomial-time algorithm for Maximum Weight Independent Set on $P_6$-free graphs
ARREST: A RSSI Based Approach for Mobile Sensing and Tracking of a Moving Object
Differentially Private Identity and Closeness Testing of Discrete Distributions
A Machine Learning Approach for Evaluating Creative Artifacts
Story Generation from Sequence of Independent Short Descriptions
Pair Correlation and Gap Distributions for Substitution Tilings and Generalized Ulam Sets in the Plane
Chimera states in a multilayer network of coupled and uncoupled neurons
Magnetocapillary self-assemblies: locomotion and micromanipulation along a liquid interface
Nested Convex Bodies are Chaseable
Global optimization for low-dimensional switching linear regression and bounded-error estimation
Beyond Forward Shortcuts: Fully Convolutional Master-Slave Networks (MSNets) with Backward Skip Connections for Semantic Segmentation
Random Euclidean matching problems in one dimension
Distinguishing Tournaments with Small Label Classes
Batch based Monocular SLAM for Egocentric Videos
Some results on the probability that two elements of an amenable group commute
One-shot Face Recognition by Promoting Underrepresented Classes
Domain Adaptation for Resume Classification Using Convolutional Neural Networks
Impact and Recovery Process of Mini Flash Crashes: An Empirical Study
Graph learning under sparsity priors
On Optimizing Distributed Tucker Decomposition for Dense Tensors
Corrections to the self-consistent Born approximation for Weyl fermions
Cayley Splitting for Second-Order Langevin Stochastic Partial Differential Equations
Solving $\ell^p\!$-norm regularization with tensor kernels
Martingale solutions for the stochastic nonlinear Schrödinger equation in the energy space
VSE++: Improved Visual-Semantic Embeddings
The Compressed Overlap Index
Gibbard-Satterthwaite Games for k-Approval Voting Rules
Congruences for partition functions related to mock theta functions
Dispersion of Mobile Robots: A Study of Memory-Time Trade-offs
Fast Screening Algorithm for Rotation and Scale Invariant Template Matching
Factorization Machines Leveraging Lightweight Linked Open Data-enabled Features for Top-N Recommendations
Faster Than Real-time Facial Alignment: A 3D Spatial Transformer Network Approach in Unconstrained Poses
Learning Powers of Poisson Binomial Distributions
Empirical evaluation of a Q-Learning Algorithm for Model-free Autonomous Soaring
Learning the MMSE Channel Estimator
Exploiting Convolutional Representations for Multiscale Human Settlement Detection
Hashed Binary Search Sampling for Convolutional Network Training with Large Overhead Image Patches
Learning Fashion Compatibility with Bidirectional LSTMs
An Iterative BP-CNN Architecture for Channel Decoding
Necessary and sufficient conditions for consistent root reconstruction in Markov models on trees
Efficient and consistent inference of ancestral sequences in an evolutionary model with insertions and deletions under dense taxon sampling
Enumeration of Self-Dual Cyclic Codes of some Specific Lengths over Finite Fields
Grounding Spatio-Semantic Referring Expressions for Human-Robot Interaction
Submodular Mini-Batch Training in Generative Moment Matching Networks
Circular Networks from Distorted Metrics
Robust Bayesian Optimization with Student-t Likelihood
Choosing Smartly: Adaptive Multimodal Fusion for Object Detection in Changing Environments
Skeleton Based Human Action Recognition with Global Context-Aware Attention LSTM Networks
Modeling temporal treatment effects with zero inflated semi-parametric regression models: the case of local development policies in France
Augmented Lagrangian Functions for Cone Constrained Optimization: the Existence of Global Saddle Points and Exact Penalty Property
AirCode: Unobtrusive Physical Tags for Digital Fabrication