“The first rule of data science is: don’t ask how to define data science.” Josh Bloom (2014)

# Magister Dixit

Posted Thursday, 27 Jul 2017

# Distilled News

Posted Thursday, 27 Jul 2017

**Vega makes visualizing BIG data easy**

We’re delighted to announce the availability of Vega, the JSON specification for creating custom visualizations of large datasets. Using Vega you can create server-rendered visualizations in the community version and enterprise versions of MapD.
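
Concretely, a Vega specification is just a JSON document. The sketch below is an illustration only: it builds a minimal bar-chart spec as a Python dict, using the public Vega spec's top-level fields (`width`, `height`, `data`, `scales`, `marks`) and made-up data, then serializes it.

```python
import json

# A minimal Vega-style bar-chart specification, built as a plain Python
# dict. Field names follow the public Vega specification; the dataset
# ("table" with x/y records) is made up for illustration.
spec = {
    "width": 400,
    "height": 200,
    "data": [
        {"name": "table",
         "values": [{"x": "A", "y": 28}, {"x": "B", "y": 55}, {"x": "C", "y": 43}]}
    ],
    "scales": [
        {"name": "xscale", "type": "band",
         "domain": {"data": "table", "field": "x"}, "range": "width"},
        {"name": "yscale", "type": "linear",
         "domain": {"data": "table", "field": "y"}, "range": "height"}
    ],
    "marks": [
        {"type": "rect", "from": {"data": "table"},
         "encode": {"enter": {
             "x": {"scale": "xscale", "field": "x"},
             "y": {"scale": "yscale", "field": "y"},
             "y2": {"scale": "yscale", "value": 0}}}}
    ],
}

# Serialize to JSON, as a renderer (browser- or server-side) would consume it.
print(json.dumps(spec, indent=2)[:120])
```

The same JSON could then be handed to any Vega renderer; the server-side rendering MapD announces is what makes this workable for very large datasets.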

Today we’ve made it dramatically easier to view your Google Analytics data in Data Studio using the new Data control. When a report is created using the Data Control, all viewers can see their own data in the report, without creating anything.

**Machine Learning Exercises in Python: An Introductory Tutorial Series**

This post presents a summary of a series of tutorials covering the exercises from Andrew Ng’s machine learning class on Coursera. Instead of implementing the exercises in Octave, the author has opted to do so in Python, and provide commentary along the way.

**The truth about priors and overfitting**

Have you ever thought about how strong a prior is compared to observed data? It’s not an entirely easy thing to conceptualize. To alleviate this trouble, I will take you through some simulation exercises. These are meant as food for thought, not necessarily a recommendation. However, many of the considerations we will run through are directly applicable to your everyday work of applying Bayesian methods in your specific domain. We will start out by creating some data generated from a known process. The process is the following. …
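
As a minimal sketch of this kind of simulation (using a conjugate Beta-Binomial model and made-up numbers, not the post's actual process), the prior's strength can be read directly off the posterior update: a Beta(a, b) prior behaves like a + b pseudo-observations.

```python
import random

random.seed(0)

# True data-generating process (an assumption for illustration):
# a coin with P(heads) = 0.7, observed n = 20 times.
true_p, n = 0.7, 20
heads = sum(random.random() < true_p for _ in range(n))

# Conjugate Beta-Binomial update: the posterior after `heads` successes in
# n trials under a Beta(a, b) prior is Beta(a + heads, b + n - heads),
# so the posterior mean is (a + heads) / (a + b + n).
def posterior_mean(a, b, heads, n):
    return (a + heads) / (a + b + n)

weak_mean = posterior_mean(1, 1, heads, n)      # Beta(1, 1): ~2 pseudo-observations
strong_mean = posterior_mean(50, 50, heads, n)  # Beta(50, 50): ~100 pseudo-observations

print(f"observed frequency:          {heads / n:.3f}")
print(f"weak-prior posterior mean:   {weak_mean:.3f}")
print(f"strong-prior posterior mean: {strong_mean:.3f}")
```

With only 20 observations, the Beta(50, 50) prior's 100 pseudo-observations dominate and drag the posterior mean toward 0.5, while the weak prior stays close to the observed frequency; that ratio of pseudo-observations to data is one concrete way to think about "how strong" a prior is.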

**Revolutionizing Data Science Package Management, July 25**

Learn how Anaconda solves one of the most headache-inducing problems in data science—overcoming the package dependency nightmare—through the power of conda, in this webinar, on July 25.

**Summary of Unintuitive Properties of Neural Networks**

Neural networks work really well on many problems, including language, image and speech recognition. However, understanding how they work is not simple; here is a summary of the unusual and counterintuitive properties they have.

I know it’s a weird way to start a blog with a negative, but there was a wave of discussion in the last few days that serves as a good hook for some topics I’ve been thinking about recently. It all started with a post on the Simply Stats blog by Jeff Leek on the caveats of using deep learning in the small-sample-size regime. In short, he argues that when the sample size is small (which happens a lot in the bio domain), linear models with few parameters perform better than deep nets, even ones with a modicum of layers and hidden units. He goes on to show that a very simple linear predictor, using the ten most informative features, performs better than a simple deep net when classifying zeros and ones in the MNIST dataset using only 80 or so samples.

This prompted Andrew Beam to write a rebuttal in which a properly trained deep net was able to beat the simple linear model, even with very few training samples. This back-and-forth comes at a time when more and more researchers in biomedical informatics are adopting deep learning for various problems. Is the hype real, or are linear models really all we need? The answer, as always, is: it depends. In this post, I want to visit use cases in machine learning where using deep learning does not really make sense, as well as tackle preconceptions that I think prevent deep learning from being used effectively, especially by newcomers.
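
The trade-off behind this debate can be sketched on synthetic data (a made-up 1-D two-Gaussian setup, not the MNIST experiment from the posts): a two-parameter centroid model versus a model that simply memorizes all ten training points.

```python
import random

random.seed(1)

# Toy stand-in for the small-sample regime: two 1-D Gaussian classes
# (means -1 and +1, sd 1). This setup is an assumption for illustration.
def sample(n):
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        data.append((random.gauss(-1.0 if label == 0 else 1.0, 1.0), label))
    return data

train_set, test_set = sample(10), sample(2000)

# (a) A two-parameter "linear" model: classify by the nearest class centroid.
n0 = sum(1 for _, y in train_set if y == 0)
mean0 = sum(x for x, y in train_set if y == 0) / max(1, n0)
mean1 = sum(x for x, y in train_set if y == 1) / max(1, len(train_set) - n0)
centroid_acc = sum((abs(x - mean1) < abs(x - mean0)) == (y == 1)
                   for x, y in test_set) / len(test_set)

# (b) A model that memorizes the training set: 1-nearest-neighbour.
def nn_predict(x):
    return min(train_set, key=lambda t: abs(t[0] - x))[1]

nn_acc = sum(nn_predict(x) == y for x, y in test_set) / len(test_set)

print(f"nearest-centroid accuracy (10 train points): {centroid_acc:.3f}")
print(f"1-NN accuracy (10 train points):             {nn_acc:.3f}")
```

With ten training points, the low-capacity centroid model typically sits near the Bayes rate while the memorizing model inherits every quirk of the tiny sample; the Leek/Beam exchange is essentially about how far careful training can push the flexible model back up in this regime.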

**A lesson in prescriptive modeling**

For the data professional, the first step to mastering prescriptive modeling is to understand simulation. In this excerpt from the O’Reilly video Hands-On Techniques for Business Model Simulation, I’ll walk you through a practical case study: simulating the cross-breeding of a new species of iris, and new business models for the resulting flowers. Using published open source code, viewers learn to generate a new species of iris, find interesting new characteristics, and search through business model simulations for profitable ways of bringing the new flowers to market. It takes a lot of knowledge and skill to create useful simulations of the real world. That information is often hidden by obscure techniques or confusing explanations. In the full O’Reilly Learning Path, Creating Simulations to Discover New Business Models, I take viewers through a straightforward approach to learning prescriptive model simulation by treating it like a foreign language. We start by learning key terms and intuitive definitions. We assemble those terms into meaningful ideas, and we complete the Learning Path with the iris example shown in the video excerpt in this post.

**Thinking with data with “Modern Data Science with R”**

One of the biggest challenges educators face is how to teach statistical thinking integrated with data and computing skills to allow our students to fluidly think with data. Contemporary data science requires a tight integration of knowledge from statistics, computer science, mathematics, and a domain of application. For example, how can one model high earnings as a function of other features that might be available for a customer? How do the results of a decision tree compare to a logistic regression model? How does one assess whether the underlying assumptions of a chosen model are appropriate? How are the results interpreted and communicated?

Posted Thursday, 27 Jul 2017, in Books

Posted Wednesday, 26 Jul 2017, in Documents

**Do Neural Nets Learn Statistical Laws behind Natural Language?**

The performance of deep learning in natural language processing has been spectacular, but the reason for this success remains unclear because of the inherent complexity of deep learning. This paper provides empirical evidence of its effectiveness and of a limitation of neural networks for language engineering. Precisely, we demonstrate that a Long Short-Term Memory (LSTM)-based neural language model effectively reproduces Zipf’s law and Heaps’ law, two representative statistical properties underlying natural language. We discuss the quality of the reproducibility and the emergence of Zipf’s law and Heaps’ law as training progresses. We also point out that the neural language model has a limitation in reproducing long-range correlation, another statistical law of natural language. This understanding could provide a direction for improving neural network architectures.
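
Both laws named in the abstract are easy to compute on a toy corpus (the text below is made up for illustration): Zipf's law concerns the rank-frequency distribution of words, and Heaps' law the sublinear growth of vocabulary size with corpus length.

```python
from collections import Counter

# Toy corpus (made up for illustration; the paper measures these statistics
# on real text and on text generated by an LSTM language model).
text = ("the cat sat on the mat the dog sat on the log "
        "the cat and the dog saw the mat and the log") * 3
tokens = text.split()

# Zipf's law: the frequency of the r-th most common word decays roughly
# like 1/r, so the sorted frequency list falls off steeply.
freqs = [count for _, count in Counter(tokens).most_common()]

# Heaps' law: vocabulary size V(n) grows sublinearly in corpus length n.
vocab_growth = []
seen = set()
for tok in tokens:
    seen.add(tok)
    vocab_growth.append(len(seen))

print("rank-frequency:", freqs)
print("vocab size after every 10th token:", vocab_growth[::10])
```

The paper's question is whether text sampled from a trained LSTM reproduces these same curves; measuring them on generated text uses exactly this kind of counting.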

Posted Wednesday, 26 Jul 2017, in arXiv Papers

**A Deep Investigation of Deep IR Models**

The effectiveness of information retrieval (IR) systems has become more important than ever. Deep IR models have gained increasing attention for their ability to automatically learn features from raw text; thus, many deep IR models have been proposed recently. However, the learning process of these deep IR models resembles a black box. Therefore, it is necessary to identify the differences between the features learned automatically by deep IR models and the hand-crafted features used in traditional learning-to-rank approaches. Furthermore, it is valuable to investigate the differences between these deep IR models themselves. This paper aims to conduct a deep investigation of deep IR models. Specifically, we conduct an extensive empirical study on two different datasets, Robust and LETOR4.0. We first compare the automatically learned features and hand-crafted features with respect to query term coverage, document length, embeddings and robustness. This reveals a number of disadvantages compared with hand-crafted features, and we therefore establish guidelines for improving existing deep IR models. Furthermore, we compare two different categories of deep IR models, i.e. representation-focused models and interaction-focused models. It is shown that the two types of deep IR models focus on different categories of words: topic-related words and query-related words.

We present a transition-based AMR parser that directly generates AMR parses from plain text. We use Stack-LSTMs to represent our parser state and make decisions greedily. In our experiments, we show that our parser achieves very competitive scores on English using only AMR training data. Adding additional information, such as POS tags and dependency trees, improves the results further.

**Bellman Gradient Iteration for Inverse Reinforcement Learning**

This paper develops an inverse reinforcement learning algorithm aimed at recovering a reward function from the observed actions of an agent. We introduce a strategy to flexibly handle different types of actions with two approximations of the Bellman Optimality Equation, and a Bellman Gradient Iteration method to compute the gradient of the Q-value with respect to the reward function. These methods allow us to build a differentiable relation between the Q-value and the reward function and learn an approximately optimal reward function with gradient methods. We test the proposed method in two simulated environments by evaluating the accuracy of different approximations and comparing the proposed method with existing solutions. The results show that even with a linear reward function, the proposed method has a comparable accuracy with the state-of-the-art method adopting a non-linear reward function, and the proposed method is more flexible because it is defined on observed actions instead of trajectories.

**Comparing Aggregators for Relational Probabilistic Models**

Relational probabilistic models have the challenge of aggregation, where one variable depends on a population of other variables. Consider the problem of predicting gender from movie ratings; this is challenging because the number of movies per user and users per movie can vary greatly. Surprisingly, aggregation is not well understood. In this paper, we show that existing relational models (implicitly or explicitly) either use simple numerical aggregators that lose great amounts of information, or correspond to naive Bayes, logistic regression, or noisy-OR that suffer from overconfidence. We propose new simple aggregators and simple modifications of existing models that empirically outperform the existing ones. The intuition we provide on different (existing or new) models and their shortcomings plus our empirical findings promise to form the foundation for future representations.
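
The overconfidence the authors describe is easy to see in a toy comparison (the probabilities below are made up): a noisy-OR aggregator saturates as the population of parent variables grows, while a mean aggregator ignores population size entirely.

```python
# Two simple aggregators for a variable that depends on a population of
# parents, each contributing a small probability. Toy numbers only.
def mean_agg(ps):
    # Simple numerical aggregator: insensitive to population size,
    # but throws away how many parents there were.
    return sum(ps) / len(ps)

def noisy_or(ps):
    # Noisy-OR: probability that at least one parent "fires".
    prod = 1.0
    for p in ps:
        prod *= (1.0 - p)
    return 1.0 - prod

small_pop = [0.1] * 5
large_pop = [0.1] * 200

print(f"mean,       5 parents: {mean_agg(small_pop):.3f}")
print(f"mean,     200 parents: {mean_agg(large_pop):.3f}")
print(f"noisy-OR,   5 parents: {noisy_or(small_pop):.3f}")
print(f"noisy-OR, 200 parents: {noisy_or(large_pop):.3f}")
```

In the movie-rating example, a user who has rated 200 movies would push a noisy-OR (or naive Bayes / logistic regression) aggregator to near-certainty while the mean stays flat at 0.1; the paper's proposed aggregators aim to sit between these two failure modes.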

**Linear Discriminant Generative Adversarial Networks**

We develop a novel method for training GANs for unsupervised and class-conditional generation of images, called Linear Discriminant GAN (LD-GAN). The discriminator of an LD-GAN is trained to maximize the linear separability between distributions of hidden representations of generated and targeted samples, while the generator is updated based on the decision hyperplanes computed by performing LDA over the hidden representations. LD-GAN provides a concrete metric of separation capacity for the discriminator, and we experimentally show that it is possible to stabilize the training of LD-GAN simply by calibrating the update frequencies between generators and discriminators in the unsupervised case, without employing normalization methods or constraints on weights. In class-conditional generation tasks, the proposed method shows improved training stability together with better generalization performance compared to a WGAN that employs an auxiliary classifier.

**Towards Semantic Query Segmentation**

Query segmentation is one of the critical components for understanding users’ search intent in Information Retrieval tasks. It involves grouping tokens in the search query into meaningful phrases which help downstream tasks like search relevance and query understanding. In this paper, we propose a novel approach to segmenting user queries using distributed query embeddings. Our key contribution is a supervised approach to the segmentation task using low-dimensional feature vectors for queries, getting rid of traditional hand-tuned and heuristic NLP features, which are quite expensive. We benchmark on a 50,000 human-annotated web search engine query corpus, achieving accuracy comparable to state-of-the-art techniques. The advantage of our technique is that it is fast and does not use an external knowledge base like Wikipedia for score boosting. This helps us generalize our approach to other domains, like eCommerce, without any fine-tuning. We demonstrate the effectiveness of this method on another 50,000 human-annotated eCommerce query corpus from eBay search logs. Our approach is easy to implement and generalizes well across different search domains, proving the power of low-dimensional embeddings in the query segmentation task and opening up a new direction of research for this problem.

**Applications of Economic and Pricing Models for Wireless Network Security: A Survey**

This paper provides a comprehensive literature review on applications of economic and pricing theory to security issues in wireless networks. Unlike wireline networks, the broadcast nature and the highly dynamic change of network environments pose a number of nontrivial challenges to security design in wireless networks. While the security issues have not been completely solved by traditional or system-based solutions, economic and pricing models have recently been employed as an efficient solution to discourage attackers and prevent attacks from being performed. In this paper, we review economic and pricing approaches proposed to address major security issues in wireless networks, including eavesdropping attacks, Denial-of-Service (DoS) attacks such as jamming and Distributed DoS (DDoS), and illegitimate behaviors of malicious users. Additionally, we discuss integrating economic and pricing models with cryptographic methods to reduce information privacy leakage as well as to guarantee the confidentiality and integrity of information in wireless networks. Finally, we highlight important challenges, open issues and future research directions of applying economic and pricing models to wireless security issues.

**Partial Transfer Learning with Selective Adversarial Networks**

Adversarial learning has been successfully embedded into deep networks to learn transferable features, which reduce distribution discrepancy between the source and target domains. Existing domain adversarial networks assume a fully shared label space across domains. In the presence of big data, there is strong motivation for transferring both classification and representation models from existing big domains to unknown small domains. This paper introduces partial transfer learning, which relaxes the shared label space assumption: the target label space need only be a subspace of the source label space. Previous methods typically match the whole source domain to the target domain, which makes them prone to negative transfer in the partial transfer problem. We present Selective Adversarial Network (SAN), which simultaneously circumvents negative transfer by selecting out the outlier source classes and promotes positive transfer by maximally matching the data distributions in the shared label space. Experiments demonstrate that our models exceed state-of-the-art results for partial transfer learning tasks on several benchmark datasets.

**Mutual Alignment Transfer Learning**

Training robots for operation in the real world is a complex, time consuming and potentially expensive task. Despite significant success of reinforcement learning in games and simulations, research in real robot applications has not been able to match similar progress. While sample complexity can be reduced by training policies in simulation, these can perform sub-optimally on the real platform given imperfect calibration of model dynamics. We present an approach – supplemental to fine tuning on the real robot – to further benefit from parallel access to a simulator during training. The developed approach harnesses auxiliary rewards to guide the exploration for the real world agent based on the proficiency of the agent in simulation and vice versa. In this context, we demonstrate empirically that the reciprocal alignment for both agents provides further benefit as the agent in simulation can adjust to optimize its behaviour for states commonly visited by the real-world agent.

**Question Dependent Recurrent Entity Network for Question Answering**

Question Answering is a task which requires building models able to automatically reply to questions posed by humans. In recent years, growing interest has been shown in tasks that require reasoning abilities in order to answer questions. Thus, in this study, we introduce a model to accomplish different Question Answering tasks: Reasoning Question Answering and Reading Comprehension. Specifically, we analysed and improved a novel model called Recurrent Entity Network, which follows the Memory Network framework. We named our model Question Dependent Recurrent Entity Network, since our main contribution is to include the question in the memorization process. Our model has been validated using both synthetic and real datasets. To the best of our knowledge, we achieve a new state-of-the-art in the Reasoning Question Answering task, i.e. the bAbI tasks, and promising results in Reading Comprehension. Finally, we also studied the behaviour of our model, through a visualization, in comparison with the original one.

**Algorithms for Positive Semidefinite Factorization**

This paper considers the problem of positive semidefinite factorization (PSD factorization), a generalization of exact nonnegative matrix factorization. Given an $m$-by-$n$ nonnegative matrix $X$ and an integer $k$, the PSD factorization problem consists in finding, if possible, symmetric $k$-by-$k$ positive semidefinite matrices $\{A^1,\dots,A^m\}$ and $\{B^1,\dots,B^n\}$ such that $X_{i,j} = \mathrm{trace}(A^i B^j)$ for all $i$ and $j$. PSD factorization is NP-hard. In this work, we introduce several local optimization schemes to tackle this problem: a fast projected gradient method and two algorithms based on the coordinate descent framework. The main application of PSD factorization is the computation of semidefinite extensions, that is, the representations of polyhedrons as projections of spectrahedra, for which the matrix to be factorized is the slack matrix of the polyhedron. We compare the performance of our algorithms on this class of problems. In particular, we compute PSD extensions of the regular $n$-gons for several values of $n$. We also show how to generalize our algorithms to compute the square root rank (the size of the factors in a PSD factorization where all factor matrices $A^i$ and $B^j$ have rank one) and completely PSD factorizations (the special case where the input matrix is symmetric and the equality $A^i = B^i$ is required for all $i$).
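
The trace inner product at the heart of this factorization can be checked on a tiny made-up example (2-by-2 rank-one PSD factors, not factors from the paper). For rank-one factors $vv^T$ and $ww^T$, the trace product equals $(v \cdot w)^2$, which is why the resulting entries are automatically nonnegative.

```python
# Entry (i, j) of a PSD-factorized matrix is trace(A_i B_j), with A_i and
# B_j symmetric positive semidefinite. The 2x2 factors below are made up.
def trace_product(A, B):
    # trace(A B) = sum over r, c of A[r][c] * B[c][r]
    n = len(A)
    return sum(A[r][c] * B[c][r] for r in range(n) for c in range(n))

# Rank-one PSD matrices v v^T are the simplest possible PSD factors
# (these correspond to the square-root-rank case mentioned above).
def outer(v):
    return [[vi * vj for vj in v] for vi in v]

A_factors = [outer([1.0, 0.0]), outer([1.0, 1.0])]
B_factors = [outer([0.0, 1.0]), outer([1.0, -1.0])]

# Assemble the factorized matrix X with X[i][j] = trace(A_i B_j).
X = [[trace_product(A, B) for B in B_factors] for A in A_factors]
print(X)
```

Here every entry is $(v_i \cdot w_j)^2 \ge 0$, illustrating why PSD factorization only ever produces nonnegative matrices, and hence why it generalizes nonnegative matrix factorization.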

**Residual Conv-Deconv Grid Network for Semantic Segmentation**

This paper presents GridNet, a new Convolutional Neural Network (CNN) architecture for semantic image segmentation (full scene labelling). Classical neural networks are implemented as one stream from the input to the output, with subsampling operators applied in the stream in order to reduce the feature map size and to increase the receptive field for the final prediction. However, for semantic image segmentation, where the task consists in providing a semantic class for each pixel of an image, feature map reduction is harmful because it leads to a loss of resolution in the output prediction. To tackle this problem, our GridNet follows a grid pattern allowing multiple interconnected streams to work at different resolutions. We show that our network generalizes many well-known networks, such as conv-deconv, residual or U-Net networks. GridNet is trained from scratch and achieves competitive results on the Cityscapes dataset.

**Towards Evolutional Compression**

Compressing convolutional neural networks (CNNs) is essential for transferring the success of CNNs to the wide variety of applications on mobile devices. In contrast to directly recognizing subtle weights or filters as redundant in a given CNN, this paper presents an evolutionary method to automatically eliminate redundant convolution filters. We represent each compressed network as a binary individual of specific fitness. Then, the population is upgraded at each evolutionary iteration using genetic operations. As a result, an extremely compact CNN is generated using the fittest individual. In this approach, either large or small convolution filters can be redundant, and filters in the compressed network are more distinct. In addition, since the number of filters in each convolutional layer is reduced, the number of filter channels and the size of feature maps are also decreased, naturally improving both the compression and speed-up ratios. Experiments on benchmark deep CNN models suggest the superiority of the proposed algorithm over the state-of-the-art compression methods.

**Boosted Zero-Shot Learning with Semantic Correlation Regularization**

We study zero-shot learning (ZSL) as a transfer learning problem, and focus on the two key aspects of ZSL, model effectiveness and model adaptation. For effective modeling, we adopt the boosting strategy to learn a zero-shot classifier from weak models to a strong model. For adaptable knowledge transfer, we devise a Semantic Correlation Regularization (SCR) approach to regularize the boosted model to be consistent with the inter-class semantic correlations. With SCR embedded in the boosting objective, and with a self-controlled sample selection for learning robustness, we propose a unified framework, Boosted Zero-shot classification with Semantic Correlation Regularization (BZ-SCR). By balancing the SCR-regularized boosted model selection and the self-controlled sample selection, BZ-SCR is capable of capturing both discriminative and adaptable feature-to-class semantic alignments, while ensuring the reliability and adaptability of the learned samples. The experiments on two ZSL datasets show the superiority of BZ-SCR over the state of the art.

**A Simple Exponential Family Framework for Zero-Shot Learning**

We present a simple generative framework for learning to predict previously unseen classes, based on estimating class-attribute-gated class-conditional distributions. We model each class-conditional distribution as an exponential family distribution and the parameters of the distribution of each seen/unseen class are defined as functions of the respective observed class attributes. These functions can be learned using only the seen class data and can be used to predict the parameters of the class-conditional distribution of each unseen class. Unlike most existing methods for zero-shot learning that represent classes as fixed embeddings in some vector space, our generative model naturally represents each class as a probability distribution. It is simple to implement and also allows leveraging additional unlabeled data from unseen classes to improve the estimates of their class-conditional distributions using transductive/semi-supervised learning. Moreover, it extends seamlessly to few-shot learning by easily updating these distributions when provided with a small number of additional labelled examples from unseen classes. Through a comprehensive set of experiments on several benchmark data sets, we demonstrate the efficacy of our framework.

**Challenges in Data-to-Document Generation**

Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements.

**Learning Word Relatedness over Time**

Search systems are often focused on providing relevant results for the ‘now’, assuming both corpora and user needs that focus on the present. However, many corpora today reflect significant longitudinal collections ranging from 20 years of the Web to hundreds of years of digitized newspapers and books. Understanding the temporal intent of the user and retrieving the most relevant historical content has become a significant challenge. Common search features, such as query expansion, leverage the relationship between terms but cannot function well across all times when relationships vary temporally. In this work, we introduce a temporal relationship model that is extracted from longitudinal data collections. The model supports the task of identifying, given two words, when they relate to each other. We present an algorithmic framework for this task and show its application for the task of query expansion, achieving high gain.

In this paper, we present a novel unsupervised algorithm for word sense disambiguation (WSD) at the document level. Our algorithm is inspired by a widely-used approach in the field of genetics for whole genome sequencing, known as the Shotgun sequencing technique. The proposed WSD algorithm is based on three main steps. First, a brute-force WSD algorithm is applied to short context windows (up to 10 words) selected from the document in order to generate a short list of likely sense configurations for each window. In the second step, these local sense configurations are assembled into longer composite configurations based on suffix and prefix matching. The resulting configurations are ranked by their length, and the sense of each word is chosen based on a voting scheme that considers only the top k configurations in which the word appears. We compare our algorithm with other state-of-the-art unsupervised WSD algorithms and demonstrate better performance, sometimes by a very large margin. We also show that our algorithm can yield better performance than the Most Common Sense (MCS) baseline on one data set. Moreover, our algorithm has a very small number of parameters, is robust to parameter tuning, and, unlike other bio-inspired methods, it gives a deterministic solution (it does not involve random choices).
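
The assembly step (the second step above) can be sketched as suffix/prefix merging of sense configurations, in the spirit of shotgun sequencing. The sense labels below ('word#s1' etc.) are made up for illustration; the real algorithm works with WordNet-style sense inventories.

```python
# Merge two local sense configurations (tuples of word#sense tokens) when
# a suffix of the first matches a prefix of the second, keeping the
# longest overlap, as in shotgun-sequencing assembly.
def merge(left, right, min_overlap=2):
    """Return the merged configuration, or None if no overlap of at
    least `min_overlap` tokens exists."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]  # longest overlap wins
    return None

# Sense configurations for two consecutive context windows (made up).
w1 = ("bank#s1", "river#s1", "fish#s2", "swim#s1")
w2 = ("fish#s2", "swim#s1", "water#s1")

merged = merge(w1, w2)
print(merged)
```

Repeating this merge over all window-level configurations yields the longer composite configurations that are then ranked by length and voted on.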

**From Image to Text Classification: A Novel Approach based on Clustering Word Embeddings**

In this paper, we propose a novel approach for text classification based on clustering word embeddings, inspired by the bag of visual words model, which is widely used in computer vision. After each word in a collection of documents is represented as a word vector using a pre-trained word embeddings model, a k-means algorithm is applied to the word vectors in order to obtain a fixed-size set of clusters. The centroid of each cluster is interpreted as a super word embedding that embodies all the semantically related word vectors in a certain region of the embedding space. Every embedded word in the collection of documents is then assigned to the nearest cluster centroid. In the end, each document is represented as a bag of super word embeddings by computing the frequency of each super word embedding in the respective document. We also diverge from the idea of building a single vocabulary for the entire collection of documents, and propose to build class-specific vocabularies for better performance. Using this kind of representation, we report results on two text mining tasks, namely text categorization by topic and polarity classification. On both tasks, our model yields better performance than the standard bag-of-words model.
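
A minimal sketch of this pipeline, with made-up 2-D "embeddings" standing in for pre-trained word vectors: cluster the vocabulary with k-means, then represent a document as a frequency histogram over the cluster centroids (the "super word embeddings").

```python
import random

random.seed(0)

# Toy 2-D word embeddings (made up; a real setup would use pre-trained
# vectors such as word2vec or GloVe). Animal words sit near (1, 0),
# sentiment words near (0, 1).
embeddings = {
    "cat": (0.9, 0.1), "dog": (1.0, 0.2), "horse": (0.8, 0.0),
    "good": (0.1, 0.9), "great": (0.0, 1.0), "nice": (0.2, 0.8),
}

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    # Plain Lloyd's algorithm with random initial centroids.
    centroids = random.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: dist2(p, centroids[i]))].append(p)
        centroids = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids

centroids = kmeans(list(embeddings.values()), k=2)

# Each centroid is a "super word embedding"; a document becomes a
# histogram of nearest-centroid assignments over its words.
def bag_of_super_words(doc):
    hist = [0] * len(centroids)
    for word in doc.split():
        v = embeddings[word]
        hist[min(range(len(centroids)), key=lambda i: dist2(v, centroids[i]))] += 1
    return hist

print(bag_of_super_words("good dog nice cat great great"))
```

The resulting fixed-size histogram plays the same role that word counts play in a standard bag-of-words model, but with k clusters rather than a full vocabulary as the dimensionality.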

**Learning Bag-of-Features Pooling for Deep Convolutional Neural Networks**

Convolutional Neural Networks (CNNs) are well established models capable of achieving state-of-the-art classification accuracy for various computer vision tasks. However, they are becoming increasingly larger, using millions of parameters, while they are restricted to handling images of fixed size. In this paper, a quantization-based approach, inspired by the well-known Bag-of-Features model, is proposed to overcome these limitations. The proposed approach, called Convolutional BoF (CBoF), uses RBF neurons to quantize the information extracted from the convolutional layers, and it is able to natively classify images of various sizes as well as to significantly reduce the number of parameters in the network. In contrast to other global pooling operators and CNN compression techniques, the proposed method utilizes a trainable pooling layer that is end-to-end differentiable, allowing the network to be trained using regular back-propagation and to achieve greater distribution shift invariance than competing methods. The ability of the proposed method to reduce the parameters of the network and increase the classification accuracy over other state-of-the-art techniques is demonstrated using three image datasets.

• Monitoring Partially Synchronous Distributed Systems using SMT Solvers

• Cell cycle time series gene expression data encoded as cyclic attractors in Hopfield systems

• Per-instance Differential Privacy and the Adaptivity of Posterior Sampling in Linear and Ridge regression

• Stochastic Gradient Descent for Relational Logistic Regression via Partial Network Crawls

• Global Normalization of Convolutional Neural Networks for Joint Entity and Relation Classification

• Amortized entanglement of a quantum channel and approximately teleportation-simulable channels

• Maximum local quantum covariances as quantifiers of two-sided quantum correlations beyond entanglement

• Greedy Shortest Common Superstring Approximation in Compact Space

• Liver lesion segmentation informed by joint liver segmentation

• Detection of curved lines with B-COSFIRE filters: A case study on crack delineation

• The nonhomogeneous frog model on $\mathbb{Z}$

• Harmonic Dirichlet Functions on Planar Graphs

• Stochastic Coalitional Games for Cooperative Random Access in M2M Communications

• On computing distributions of products of non-negative random variables

• Domain Recursion for Lifted Inference with Existential Quantifiers

• Second-order analysis in second-order cone programming

• Morphometric analysis of polygonal cracking patterns in desiccated starch slurries

• Exact Identification of a Quantum Change Point

• Desensitized RDCA Subspaces for Compressive Privacy in Machine Learning

• Covariations in ecological scaling laws fostered by community dynamics

• Heavy-tailed queues in the Halfin-Whitt regime

• Renewal sequences and record chains related to multiple zeta sums

• Capped Lp approximations for the composite L0 regularization problem

• Space Efficient Breadth-First and Level Traversals of Consistent Global States of Parallel Programs

• The application of representation theory in directed strongly regular graph

• Deep Feature Learning via Structured Graph Laplacian Embedding for Person Re-Identification

• Integrating Lexical and Temporal Signals in Neural Ranking Models for Searching Social Media Streams

• Relational Learning and Feature Extraction by Querying over Heterogeneous Information Networks

• $(an+b)$-color compositions

• Computing low-rank approximations of large-scale matrices with the Tensor Network randomized SVD

• Exploring the Effectiveness of Convolutional Neural Networks for Answer Selection in End-to-End Question Answering

• Macro Grammars and Holistic Triggering for Efficient Semantic Parsing

• Asymptotics of Pattern Avoidance in the Klazar Set Partition and Permutation-Tuple Settings

• Graph-Theoretic Spatiotemporal Context Modeling for Video Saliency Detection

• Small-Scale, Local Area, and Transitional Millimeter Wave Propagation for 5G Communications

• Detecting Semantic Parts on Partially Occluded Objects

• Concept Drift Detection and Adaptation with Hierarchical Hypothesis Testing

• Uniqueness for Measure-Valued Equations of Nonlinear Filtering for Stochastic Dynamical Systems with Lévy Noises

• Convergence of Nonlinear Filtering for Stochastic Dynamical Systems with Lévy Noises

• Multiple-Kernel Local-Patch Descriptor

• Stationary Solutions of Neutral Stochastic Partial Differential Equations with Delays in the Highest-Order Derivatives

• On the path-independence of the Girsanov transformation for stochastic evolution equations with jumps in Hilbert spaces

• Improving Robustness of Feature Representations to Image Deformations using Powered Convolution in CNNs

• ssEMnet: Serial-section Electron Microscopy Image Registration using a Spatial Transformer Network with Learned Features

• Coupling and a generalised Policy Iteration Algorithm in continuous time

• HyperQA: Hyperbolic Embeddings for Fast and Efficient Ranking of Question Answer Pairs

• Performance evaluation of energy detector over generalized non-linear and shadowed composite fading channels using a Mixture Gamma Distribution

• Motion-Appearance Interactive Encoding for Object Segmentation in Unconstrained Videos

• Analyzing First-Person Stories Based on Socializing, Eating and Sedentary Patterns

• Quenched mass transport of particles towards a target

• Approximation-diffusion in stochastically forced kinetic equations

• Martingale driven BSDEs, PDEs and other related deterministic problems

• Prices of anarchy of selfish 2D bin packing games

• Wind models and cross-site interpolation for the refugee reception islands in Greece

• Spatiotemporal Modeling for Crowd Counting in Videos

• Best exponential decay rate of energy for the vectorial damped wave equation

• Many-Objective Pareto Local Search

• Verification of operational solar flare forecast: Case of Regional Warning Center Japan

• The Quantum Theil Index: Characterizing Graph Centralization using von Neumann Entropy

• Combinatorial properties of triplet covers for binary trees

• Machine Translation at Booking.com: Journey and Lessons Learned

• Spanning universality in random graphs

• Malliavin and Dirichlet structures for independent random variables

• Mean Field Equilibria for Resource Competition in Spatial Settings

• Enhancing Convolutional Neural Networks for Face Recognition with Occlusion Maps and Batch Triplet Loss

• Structural Regularities in Text-based Entity Vector Spaces

• Functional connectivity patterns of autism spectrum disorder identified by deep feature learning

• Error Bounds for Piecewise Smooth and Switching Regression

• The Tu–Deng Conjecture holds almost surely

• Reducing the Need for New Lines in Germany’s Energy Transition: The Hybrid Transmission Grid Architecture

• A Kullback-Leibler divergence measure of intermittency: application to turbulence

• An alternative to the coupling of Berkes-Liu-Wu for strong approximations

• Ecological feedback in quorum-sensing microbial populations can induce heterogeneous production of autoinducers

• Using deterministic approximations to accelerate SMC for posterior sampling

• Scaled Nuclear Norm Minimization for Low-Rank Tensor Completion

• Deep Learning Based MIMO Communications

• Maximal closed subroot systems of affine root systems

• A law of large numbers for branching Markov processes by the ergodicity of ancestral lineages

• Hybrid control for low-regular nonlinear systems: application to an embedded control for an electric vehicle

• Bottom-Up and Top-Down Attention for Image Captioning and VQA

• Evidence combination for a large number of sources

• A model for the representation of temporal knowledge in historical documents (Un modèle pour la représentation des connaissances temporelles dans les documents historiques)

• Dynamic Policies for Cooperative Networked Systems

• Three-way symbolic tree-maps and ultrametrics

• Predicting Exploitation of Disclosed Software Vulnerabilities Using Open-source Data

• Dynamically induced cascading failures in supply networks

• Long paths and toughness of k-trees and chordal planar graphs

• A Second Order Method for Nonconvex Optimization

• Price and Profit Awareness in Recommender Systems

• Theoretical Properties of Quasistationary Monte Carlo Methods

• Automatic Liver Segmentation Using an Adversarial Image-to-Image Network

• Combination of direct methods and homotopy in numerical optimal control: application to the optimization of chemotherapy in cancer

• Scheduling to Minimize Total Weighted Completion Time via Time-Indexed Linear Programming Relaxations

• Synthesising Sign Language from semantics, approaching ‘from the target and back’

• Conway’s 99-Graph Problem

• Secure Video Streaming in Heterogeneous Small Cell Networks with Untrusted Cache Helpers

• Approximating predictive probabilities of Gibbs-type priors

• On the Exponential Rate of Convergence of Fictitious Play in Potential Games

• Relative Depth Order Estimation Using Multi-scale Densely Connected Convolutional Networks

• On Multivariate Records from Random Vectors with Independent Components

• Weak vorticity formulation of 2D Euler equations with white noise initial condition

• Monochromatic infinite sumsets

• Global Finite-Time Attitude Consensus of Leader-Following Spacecraft Systems Based on Distributed Observers

• A comparison of single-trial EEG classification and EEG-informed fMRI across three MR compatible EEG recording systems

• Resource-Efficient Common Randomness and Secret-Key Schemes

• Delay Performance of MISO Wireless Communications

• Some Computational Aspects to Find Accurate Estimates for the Parameters of the Generalized Gamma distribution

• Compressed Sparse Linear Regression

• Interval Orders with Two Interval Lengths

• Line-Circle: A Geometric Filter for Single Camera Edge-Based Object Detection

• Tolerance Orders of Open and Closed Intervals

• Learning to Singulate Objects using a Push Proposal Network

• Aspects of Chaitin’s Omega

**26**
*Wednesday*
Jul 2017

Posted in arXiv Papers

**Confidence estimation in Deep Neural networks via density modelling**

State-of-the-art Deep Neural Networks can be easily fooled into providing incorrect high-confidence predictions for images with small amounts of adversarial noise. Does this expose a flaw with deep neural networks, or do we simply need a better way to estimate confidence? In this paper we consider the problem of accurately estimating predictive confidence. We formulate this problem as that of density modelling, and show how traditional methods such as softmax produce poor estimates. To address this issue, we propose a novel confidence measure based on density modelling approaches. We test these measures on images distorted by blur, JPEG compression, random noise and adversarial noise. Experiments show that our confidence measure consistently shows reduced confidence scores in the presence of such distortions – a property which softmax often lacks.
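The density-modelling idea can be sketched in a few lines. Everything below is an illustrative assumption rather than the paper's actual method: a diagonal Gaussian per class stands in for the density model, and 2-D synthetic points stand in for network features.

```python
# Sketch of a density-based confidence measure. Assumed setup: class-
# conditional diagonal Gaussians stand in for the paper's density model;
# the 2-D "features" are synthetic, not real network activations.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic penultimate-layer features for two classes.
feats = {0: rng.normal(loc=-2.0, size=(200, 2)),
         1: rng.normal(loc=+2.0, size=(200, 2))}

# Fit a diagonal Gaussian per class.
params = {c: (x.mean(axis=0), x.var(axis=0) + 1e-6) for c, x in feats.items()}

def log_density(z, mean, var):
    """Diagonal-Gaussian log-density of feature vector z."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (z - mean) ** 2 / var)

def confidence(z):
    """Confidence = largest class-conditional log-density of z."""
    return max(log_density(z, m, v) for m, v in params.values())

in_dist = confidence(np.array([2.0, 2.0]))    # near the class-1 centre
far_out = confidence(np.array([20.0, 20.0]))  # far from both classes
```

Unlike a softmax score, this confidence drops as a point moves away from all class-conditional densities, which is the property the abstract highlights.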

**PatchShuffle Regularization**

This paper focuses on regularizing the training of the convolutional neural network (CNN). We propose a new regularization approach named “PatchShuffle” that can be adopted in any classification-oriented CNN model. It is easy to implement: in each mini-batch, images or feature maps are randomly chosen to undergo a transformation such that pixels within each local patch are shuffled. Through generating images and feature maps with interior orderless patches, PatchShuffle creates rich local variations, reduces the risk of network overfitting, and can be viewed as a beneficial supplement to various kinds of training regularization techniques, such as weight decay, model ensemble and dropout. Experiments on four representative classification datasets show that PatchShuffle improves the generalization ability of CNNs, especially when data is scarce. Moreover, we empirically illustrate that CNN models trained with PatchShuffle are more robust to noise and local changes in an image.
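The patch-wise shuffling operation itself is easy to sketch in NumPy. Patch size, image size and the seed below are arbitrary choices; the paper applies the transformation to images and feature maps during training:

```python
# Minimal NumPy sketch of the PatchShuffle idea: permute pixels *within*
# each non-overlapping local patch, leaving the patch layout intact.
import numpy as np

def patch_shuffle(img, patch=2, rng=None):
    """Return a copy of `img` (H, W) with pixels permuted inside each
    non-overlapping `patch` x `patch` block. H and W must be divisible
    by `patch`."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape
    out = img.reshape(h // patch, patch, w // patch, patch)
    out = out.transpose(0, 2, 1, 3).reshape(-1, patch * patch).copy()
    for row in out:                    # independent permutation per patch
        rng.shuffle(row)
    out = out.reshape(h // patch, w // patch, patch, patch)
    return out.transpose(0, 2, 1, 3).reshape(h, w)

img = np.arange(16, dtype=float).reshape(4, 4)
shuffled = patch_shuffle(img, patch=2, rng=np.random.default_rng(1))
```

Each 2x2 block of `shuffled` contains exactly the same pixel values as the corresponding block of `img`, only reordered, which is what creates "interior orderless patches".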

**Native Language Identification on Text and Speech**

This paper presents an ensemble system combining the output of multiple SVM classifiers for native language identification (NLI). The system was submitted to the NLI Shared Task 2017 fusion track, which featured students’ essays and spoken responses, in the form of audio transcriptions and iVectors, by non-native English speakers of eleven native languages. Our system competed in the challenge under the team name ZCD and was based on an ensemble of SVM classifiers trained on character n-grams, achieving 83.58% accuracy and ranking 3rd in the shared task.

**Introduction to Cluster Algebras. Chapters 4-5**

This is a preliminary draft of Chapters 4-5 of our forthcoming textbook ‘Introduction to Cluster Algebras.’ Chapters 1-3 have been posted as arXiv:1608.05735. This installment contains: Chapter 4, ‘New patterns from old’; Chapter 5, ‘Finite type classification’.

**Sketched Subspace Clustering**

The immense amount of daily generated and communicated data presents unique challenges in their processing. Clustering, the grouping of data without the presence of ground-truth labels, is an important tool for drawing inferences from data. Subspace clustering (SC) is a relatively recent method that is able to successfully classify nonlinearly separable data in a multitude of settings. In spite of their high clustering accuracy, SC methods incur prohibitively high computational complexity when processing large volumes of high-dimensional data. Inspired by random sketching approaches for dimensionality reduction, the present paper introduces a randomized scheme for SC, termed Sketch-SC, tailored for large volumes of high-dimensional data. Sketch-SC accelerates the computationally heavy parts of state-of-the-art SC approaches by compressing the data matrix across both dimensions using random projections, thus enabling fast and accurate large-scale SC. Performance analysis as well as extensive numerical tests on real data corroborate the potential of Sketch-SC and its competitive performance relative to state-of-the-art scalable SC approaches.
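The sketching step can be illustrated with a plain Gaussian random projection, which compresses the data matrix while roughly preserving pairwise distances (the Johnson-Lindenstrauss effect). The clustering stage itself is omitted, and all dimensions here are made up:

```python
# Toy illustration of random sketching: compress a 1000-D data matrix to
# 200-D with a Gaussian projection; pairwise distances survive roughly.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1000))          # 500 points in 1000-D

d = 200                                    # sketch dimension (arbitrary)
S = rng.normal(size=(1000, d)) / np.sqrt(d)
Xs = X @ S                                 # compressed 500 x 200 matrix

# Compare the distance between one pair of points before and after.
orig = np.linalg.norm(X[0] - X[1])
sk = np.linalg.norm(Xs[0] - Xs[1])
```

Downstream algorithms then operate on the much smaller `Xs`, which is what makes the overall scheme fast.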

**Language modeling with Neural trans-dimensional random fields**

Trans-dimensional random field language models (TRF LMs) have recently been introduced, where sentences are modeled as a collection of random fields. The TRF approach has been shown to have the advantages of being computationally more efficient in inference than LSTM LMs with close performance, and of being able to flexibly integrate rich features. In this paper we propose neural TRFs, going beyond the previous discrete TRFs, which only use linear potentials with discrete features. The idea is to use nonlinear potentials with continuous features, implemented by neural networks (NNs), in the TRF framework. Neural TRFs combine the advantages of both NNs and TRFs. The benefits of word embedding, nonlinear feature learning and larger context modeling are inherited from the use of NNs. At the same time, the strength of efficient inference by avoiding expensive softmax is preserved. A number of technical contributions, including employing deep convolutional neural networks (CNNs) to define the potentials and incorporating the joint stochastic approximation (JSA) strategy in the training algorithm, are developed in this work, which enable us to successfully train neural TRF LMs. Various LMs are evaluated in terms of speech recognition WERs by rescoring the 1000-best lists of WSJ’92 test data. The results show that neural TRF LMs not only improve over discrete TRF LMs, but also perform slightly better than LSTM LMs with only one fifth of the parameters and 16x faster inference.

**Tensor Fusion Network for Multimodal Sentiment Analysis**

Multimodal sentiment analysis is an increasingly popular research area, which extends the conventional language-based definition of sentiment analysis to a multimodal setup where other relevant modalities accompany language. In this paper, we pose the problem of multimodal sentiment analysis as modeling intra-modality and inter-modality dynamics. We introduce a novel model, termed Tensor Fusion Network, which learns both such dynamics end-to-end. The proposed approach is tailored for the volatile nature of spoken language in online videos as well as accompanying gestures and voice. In the experiments, our model outperforms state-of-the-art approaches for both multimodal and unimodal sentiment analysis.
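The core fusion operation can be sketched as an outer product of per-modality embeddings, each augmented with a constant 1, so that unimodal, bimodal and trimodal interaction terms all appear in one tensor. A minimal sketch with made-up embedding sizes (the real model computes these embeddings with sub-networks):

```python
# Outer-product fusion of three modality embeddings, each with an
# appended bias 1 so lower-order interactions are retained in the tensor.
import numpy as np

def tensor_fusion(h_language, h_audio, h_video):
    """Outer product of [h; 1] vectors across three modalities."""
    zl = np.append(h_language, 1.0)
    za = np.append(h_audio, 1.0)
    zv = np.append(h_video, 1.0)
    return np.einsum('i,j,k->ijk', zl, za, zv)

# Toy 3-D / 4-D / 2-D embeddings; the fused tensor is 4 x 5 x 3.
z = tensor_fusion(np.ones(3), np.ones(4), np.ones(2))
```

The trailing slice `z[-1, -1, :]` corresponds to the video-only terms, and `z[-1, -1, -1]` to the pure bias, which is how the single tensor covers all interaction orders.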

**MatchZoo: A Toolkit for Deep Text Matching**

In recent years, deep neural models have been widely adopted for text matching tasks, such as question answering and information retrieval, showing improved performance as compared with previous methods. In this paper, we introduce the MatchZoo toolkit that aims to facilitate the designing, comparing and sharing of deep text matching models. Specifically, the toolkit provides a unified data preparation module for different text matching problems, a flexible layer-based model construction process, and a variety of training objectives and evaluation metrics. In addition, the toolkit has implemented two schools of representative deep text matching models, namely representation-focused models and interaction-focused models. Finally, users can easily modify existing models, create and share their own models for text matching in MatchZoo.

**Sinkhorn Algorithm for Lifted Assignment Problems**

Recently, Sinkhorn’s algorithm was applied for solving regularized linear programs emerging from optimal transport very efficiently. Sinkhorn’s algorithm is an efficient method of projecting a positive matrix onto the polytope of doubly-stochastic matrices. It is based on alternating closed-form Bregman projections on the larger polytopes of row-stochastic and column-stochastic matrices. In this paper we generalize the Sinkhorn projection algorithm to higher dimensional polytopes originating from well-known lifted linear program relaxations of the Markov Random Field (MRF) energy minimization problem and the Quadratic Assignment Problem (QAP). We derive a closed-form projection on one-sided local polytopes which can be seen as a high-dimensional, generalized version of the row/column-stochastic polytopes. We then use these projections to devise a provably convergent algorithm to solve regularized linear program relaxations of MRF and QAP. Furthermore, as the regularization is decreased, both the solution and the optimal energy value converge to those of the respective linear program. The resulting algorithm is considerably more scalable than standard linear solvers and is able to solve significantly larger linear programs.
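The underlying Sinkhorn projection that the paper generalizes can be sketched in a few lines: alternately normalize the rows and columns of a positive matrix until it is approximately doubly stochastic. The iteration count and the random input below are arbitrary:

```python
# Classic Sinkhorn iteration: alternating row and column normalization
# drives a positive matrix towards the doubly-stochastic polytope.
import numpy as np

def sinkhorn(K, iters=200):
    """Scale positive matrix K to be (approximately) doubly stochastic."""
    K = K.copy()
    for _ in range(iters):
        K /= K.sum(axis=1, keepdims=True)   # make rows sum to 1
        K /= K.sum(axis=0, keepdims=True)   # make columns sum to 1
    return K

P = sinkhorn(np.random.default_rng(0).uniform(0.1, 1.0, size=(4, 4)))
```

Each normalization is a closed-form Bregman projection onto the row-stochastic (resp. column-stochastic) polytope; the paper's contribution is the analogous closed-form projection for the lifted, higher-dimensional polytopes of MRF and QAP relaxations.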

**Learning uncertainty in regression tasks by deep neural networks**

We suggest a general approach to quantification of different types of uncertainty in regression tasks performed by deep neural networks. It is based on the simultaneous training of two neural networks with a joint loss function. One of the networks performs regression and the other quantifies the uncertainty of predictions of the first one. Unlike in many standard uncertainty quantification methods, the targets are not assumed to be sampled from an a priori given probability distribution. We analyze how the hyperparameters affect the learning process and, additionally, show that our method even allows for better predictions compared to standard neural networks trained without an uncertainty counterpart. Finally, we show that a particular case of our approach is the mean-variance estimation given by a Gaussian network.
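The mean-variance special case mentioned at the end reduces to a Gaussian negative log-likelihood in which one output predicts the mean and another the (log-)variance, trained jointly. A minimal numerical sketch, with made-up targets and predictions and no actual networks:

```python
# Gaussian negative log-likelihood: the joint objective behind
# mean-variance estimation. All numbers here are illustrative.
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Per-sample Gaussian negative log-likelihood (up to a constant)."""
    return 0.5 * (log_var + (y - mu) ** 2 / np.exp(log_var))

y = np.array([1.0, 2.0, 3.0])      # targets
mu = np.array([1.1, 1.9, 3.2])     # predicted means (errors ~0.1-0.2)

# A variance matched to the actual error size scores better than an
# over-confident (too small) one.
honest = gaussian_nll(y, mu, log_var=np.log(0.05)).mean()
overconfident = gaussian_nll(y, mu, log_var=np.log(1e-4)).mean()
```

Minimizing this loss in the network's outputs pushes `mu` towards the targets while calibrating `log_var` to the residual error, which is exactly what lets a second head quantify uncertainty.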

**Prediction-Constrained Training for Semi-Supervised Mixture and Topic Models**

Supervisory signals have the potential to make low-dimensional data representations, like those learned by mixture and topic models, more interpretable and useful. We propose a framework for training latent variable models that explicitly balances two goals: recovery of faithful generative explanations of high-dimensional data, and accurate prediction of associated semantic labels. Existing approaches fail to achieve these goals due to an incomplete treatment of a fundamental asymmetry: the intended application is always predicting labels from data, not data from labels. Our prediction-constrained objective for training generative models coherently integrates loss-based supervisory signals while enabling effective semi-supervised learning from partially labeled data. We derive learning algorithms for semi-supervised mixture and topic models using stochastic gradient descent with automatic differentiation. We demonstrate improved prediction quality compared to several previous supervised topic models, achieving predictions competitive with high-dimensional logistic regression on text sentiment analysis and electronic health records tasks while simultaneously learning interpretable topics.

**Contrastive-center loss for deep neural networks**

The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers. Experiments on different datasets demonstrate the effectiveness of contrastive-center loss.
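The loss as described can be sketched directly: the squared distance to the own-class center over the summed distances to the other centers, plus a small constant to avoid division by zero. The centers and feature vectors below are toy values; in the paper the centers are learned jointly with the network:

```python
# Sketch of a contrastive-center-style loss: small when a feature is near
# its own class center and far from the others. Toy 2-D setup.
import numpy as np

def contrastive_center_loss(x, label, centers, delta=1.0):
    """Loss for one feature vector x with class index `label`."""
    d = np.sum((centers - x) ** 2, axis=1)   # squared distance to each center
    intra = d[label]                          # distance to own center
    inter = d.sum() - d[label]                # distances to all other centers
    return 0.5 * intra / (inter + delta)

centers = np.array([[0.0, 0.0], [5.0, 5.0]])  # two class centers
near = contrastive_center_loss(np.array([0.1, 0.1]), 0, centers)
far = contrastive_center_loss(np.array([4.0, 4.0]), 0, centers)
```

A feature close to its own center (and far from the other) yields a much smaller loss than one drifting towards the wrong center, capturing both compactness and separability in a single ratio.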

**Synthesizing Robust Adversarial Examples**

Neural networks are susceptible to adversarial examples: small, carefully-crafted perturbations can cause networks to misclassify inputs in arbitrarily chosen ways. However, some studies have shown that adversarial examples crafted following the usual methods are not tolerant to small transformations: for example, zooming in on an adversarial image can cause it to be classified correctly again. This raises the question of whether adversarial examples are a concern in practice, because many real-world systems capture images from multiple scales and perspectives. This paper shows that adversarial examples can be made robust to distributions of transformations. Our approach produces single images that are simultaneously adversarial under all transformations in a chosen distribution, showing that we cannot rely on transformations such as rescaling, translation, and rotation to protect against adversarial examples.

**Big Data Regression Using Tree Based Segmentation**

Scaling regression to large datasets is a common problem in many application areas. We propose a two step approach to scaling regression to large datasets. Using a regression tree (CART) to segment the large dataset constitutes the first step of this approach. The second step of this approach is to develop a suitable regression model for each segment. Since segment sizes are not very large, we have the ability to apply sophisticated regression techniques if required. A nice feature of this two step approach is that it can yield models that have good explanatory power as well as good predictive performance. Ensemble methods like Gradient Boosted Trees can offer excellent predictive performance but may not provide interpretable models. In the experiments reported in this study, we found that the predictive performance of the proposed approach matched the predictive performance of Gradient Boosted Trees.
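The two-step recipe can be illustrated in miniature: a single threshold split stands in for the CART segmentation, and an ordinary least-squares line is fitted per segment. The data, split point and noise level are all synthetic assumptions:

```python
# Two-step regression sketch: (1) segment the data (here, one fixed
# threshold in place of a learned CART split), (2) fit a separate
# least-squares model per segment.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
# Piecewise-linear ground truth with a kink at x = 5, plus small noise.
y = np.where(x < 5, 2 * x, -x + 20) + rng.normal(scale=0.1, size=200)

split = 5.0                                   # stand-in for the CART step
models = {}
for name, mask in [('left', x < split), ('right', x >= split)]:
    models[name] = np.polyfit(x[mask], y[mask], deg=1)  # slope, intercept

def predict(xq):
    """Route a query point to its segment's model."""
    m = models['left'] if xq < split else models['right']
    return np.polyval(m, xq)
```

Because each segment gets its own small model, the combined predictor tracks the kink that a single global line would miss, while each per-segment fit stays interpretable.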

**Deep Learning based Recommender System: A Survey and New Perspectives**

With the ever-growing volume, complexity and dynamicity of online information, recommender systems are a key tool for overcoming information overload. In recent years, deep learning’s revolutionary advances in speech recognition, image analysis and natural language processing have drawn significant attention. Meanwhile, recent studies also demonstrate its effectiveness in coping with information retrieval and recommendation tasks. Applying deep learning techniques to recommender systems has been gaining momentum due to its state-of-the-art performance and high-quality recommendations. In contrast to traditional recommendation models, deep learning provides a better understanding of users’ demands, items’ characteristics and the historical interactions between them. This article provides a comprehensive review of recent research efforts on deep learning based recommender systems, with the aim of fostering innovation in recommender system research. A taxonomy of deep learning based recommendation models is presented and used to categorise the surveyed articles. Open problems are identified based on analysis of the reviewed works, and potential solutions are discussed.

**Character-level Intra Attention Network for Natural Language Inference**

Natural language inference (NLI) is a central problem in language understanding. End-to-end artificial neural networks have reached state-of-the-art performance in the NLI field recently. In this paper, we propose the Character-level Intra Attention Network (CIAN) for the NLI task. In our model, we use a character-level convolutional network to replace the standard word embedding layer, and we use intra attention to capture intra-sentence semantics. The proposed CIAN model provides improved results on the newly published MNLI corpus.

**Likelihood Estimation for Generative Adversarial Networks**

We present a simple method for assessing the quality of generated images in Generative Adversarial Networks (GANs). The method can be applied in any kind of GAN without interfering with the learning procedure or affecting the learning objective. The central idea is to define a likelihood function that correlates with the quality of the generated images. In particular, we derive a Gaussian likelihood function from the distribution of the embeddings (hidden activations) of the real images in the discriminator, and based on this, define two simple measures of how likely it is that the embeddings of generated images are from the distribution of the embeddings of the real images. This yields a simple measure of fitness for generated images, for all varieties of GANs. Empirical results on CIFAR-10 demonstrate a strong correlation between the proposed measures and the perceived quality of the generated images.
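The central construction can be sketched as follows: fit a Gaussian to the embeddings of real images, then score other embeddings by their log-likelihood under that fit. The 4-D synthetic vectors below stand in for discriminator activations:

```python
# Sketch of a likelihood-based quality score for GAN outputs: Gaussian
# fit on real-image embeddings, log-likelihood scoring for the rest.
# The "embeddings" are synthetic stand-ins for hidden activations.
import numpy as np
from numpy.linalg import inv, slogdet

rng = np.random.default_rng(0)
real_emb = rng.normal(size=(1000, 4))        # embeddings of "real" images

mu = real_emb.mean(axis=0)
cov = np.cov(real_emb, rowvar=False) + 1e-6 * np.eye(4)  # regularized
cov_inv = inv(cov)

def log_likelihood(e):
    """Gaussian log-likelihood of embedding e under the real-image fit."""
    d = e - mu
    _, logdet = slogdet(cov)
    return -0.5 * (d @ cov_inv @ d + logdet + len(e) * np.log(2 * np.pi))

good = log_likelihood(rng.normal(size=4))    # matches the real distribution
bad = log_likelihood(np.full(4, 10.0))       # far from it
```

Embeddings of well-generated images should score like `good`; degenerate outputs drift away from the real-embedding distribution and score like `bad`, giving a training-free fitness measure.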

**Infinite Latent Feature Selection: A Probabilistic Latent Graph-Based Ranking Approach**

Feature selection is playing an increasingly significant role with respect to many computer vision applications spanning from object recognition to visual object tracking. However, most of the recent solutions in feature selection are not robust across different and heterogeneous sets of data. In this paper, we address this issue by proposing a robust probabilistic latent graph-based feature selection algorithm that performs the ranking step while considering all the possible subsets of features, as paths on a graph, bypassing the combinatorial problem analytically. An appealing characteristic of the approach is that it aims to discover an abstraction behind low-level sensory data, that is, relevancy. Relevancy is modelled as a latent variable in a PLSA-inspired generative process that allows the investigation of the importance of a feature when injected into an arbitrary set of cues. The proposed method has been tested on ten diverse benchmarks, and compared against eleven state-of-the-art feature selection methods. Results show that the proposed approach attains the highest performance levels across many different scenarios and difficulties, thereby confirming its strong robustness while setting a new state of the art in the feature selection domain.

**Interpreting Classifiers through Attribute Interactions in Datasets**

In this work we present the novel ASTRID method for investigating which attribute interactions classifiers exploit when making predictions. Attribute interactions in classification tasks mean that two or more attributes together provide stronger evidence for a particular class label. Knowledge of such interactions makes models more interpretable by revealing associations between attributes. This has applications, e.g., in pharmacovigilance to identify interactions between drugs or in bioinformatics to investigate associations between single nucleotide polymorphisms. We also show how the found attribute partitioning is related to a factorisation of the data generating distribution and empirically demonstrate the utility of the proposed method.

**Improve Lexicon-based Word Embeddings By Word Sense Disambiguation**

There have been some works that learn a lexicon together with the corpus to improve the word embeddings. However, they either model the lexicon separately but update the neural networks for both the corpus and the lexicon by the same likelihood, or minimize the distance between all of the synonym pairs in the lexicon. Such methods do not consider the relatedness and difference of the corpus and the lexicon, and may not be optimally trained. In this paper, we propose a novel method that considers the relatedness and difference of the corpus and the lexicon. It trains word embeddings by learning from the corpus to predict a word and its corresponding synonym under the same context. For polysemous words, we use a word sense disambiguation filter to eliminate the synonyms that have different meanings for the context. To evaluate the proposed method, we compare the performance of the word embeddings trained by our proposed model, the control groups without the filter or the lexicon, and prior works on word similarity tasks and a text classification task. The experimental results show that the proposed model provides better embeddings for polysemous words and improves the performance for text classification.

**Deep Architectures for Neural Machine Translation**

It has been shown that increasing model depth improves the quality of neural machine translation. However, different architectural variants to increase model depth have been proposed, and so far, there has been no thorough comparative study. In this work, we describe and evaluate several existing approaches to introduce depth in neural machine translation. Additionally, we explore novel architectural variants, including deep transition RNNs, and we vary how attention is used in the deep decoder. We introduce a novel ‘BiDeep’ RNN architecture that combines deep transition RNNs and stacked RNNs. Our evaluation is carried out on the English to German WMT news translation dataset, using a single-GPU machine for both training and inference. We find that several of our proposed architectures improve upon existing approaches in terms of speed and translation quality. We obtain the best improvements with a BiDeep RNN of combined depth 8, achieving an average improvement of 1.5 BLEU over a strong shallow baseline. We release our code for ease of adoption.

**Copy the dynamics using a learning machine**

Is it possible to construct a dynamical system that simulates a black-box system without recovering the equations of motion of the latter? Here we show that this goal can be approached by a learning machine. Trained by a set of input-output responses or a segment of the time series of a black-box system, a learning machine can serve as a copy system that mimics the dynamics of various black-box systems. It can not only behave as the black-box system at the parameter set where the training data were generated, but also reproduce the evolution history of the black-box system. As a result, the learning machine provides an effective way for prediction, and enables one to probe the global dynamics of a black-box system. These findings have significance for practical systems whose equations of motion cannot be obtained accurately. Examples of copying the dynamics of an artificial neural network, the Lorenz system, and a variable star are given. Our idea paves a possible way towards copying a living brain.

**Engineering multilevel support vector machines**

The computational complexity of solving nonlinear support vector machines (SVMs) is prohibitive on large-scale data. In particular, this issue becomes very sensitive when the data presents additional difficulties such as highly imbalanced class sizes. Typically, nonlinear kernels produce significantly higher classification quality than linear kernels but introduce extra kernel and model parameters. Thus parameter fitting is required to increase the quality, but it dramatically reduces the performance. We introduce a generalized fast multilevel framework for SVMs and discuss several versions of its algorithmic components that lead to a good trade-off between quality and time. Our framework is implemented using PETSc, which allows integration with scientific computing tasks. The experimental results demonstrate significant speed-up compared to the state-of-the-art SVM libraries.

• MBL-mobile: Many-body-localized engine

• Learning Transferable Architectures for Scalable Image Recognition

• Assessment of Optimal Flexibility in Ensemble of Frequency Responsive Loads

• Coverage in Downlink Heterogeneous mmWave Cellular Networks with User-Centric Small Cell Deployment

• Non-Linear Far Field RF Harvesting in Wireless Communications

• End-to-end Neural Coreference Resolution

• Optimal Secure Multi-Layer IoT Network Design

• Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition

• A Dynamic Game Analysis and Design of Infrastructure Network Protection and Recovery

• Bilinear Assignment Problem: Large Neighborhoods and Experimental Analysis of Algorithms

• A Pilot Study of Domain Adaptation Effect for Neural Abstractive Summarization

• Ultraslow diffusion in language: Dynamics of appearance of already popular adjectives on Japanese blogs

• Multipath Multiplexing for Capacity Enhancement in SIMO Wireless Systems

• Adaptive Channel Prediction, Beamforming and Scheduling Design for 5G V2I Network

• Complete convergence and records for dynamically generated stochastic processes

• What-and-Where to Match: Deep Spatially Multiplicative Integration Networks for Person Re-identification

• Automatic Curation of Golf Highlights using Multimodal Excitement Features

• The Interplay of Competition and Cooperation Among Service Providers

• Identifying civilians killed by police with distantly supervised entity-event extraction

• Joint Dynamic MRI Reconstruction and Aggregated Motion Estimation with Optical Flow Constraint

• Free arrangements with low exponents

• Hybrid Voltage Control in Distribution Networks Under Limited Communication Rates

• On the Performance of NOMA-Based Cooperative Relaying Systems over Rician Fading Channels

• OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts

• Fast and Robust Determination of Power System Emergency Control Actions

• Stability of the quantum Sherrington-Kirkpatrick spin glass model

• Switching and Data Injection Attacks on Stochastic Cyber-Physical Systems: Modeling, Resilient Estimation and Attack Mitigation

• Adversarial Variational Optimization of Non-Differentiable Simulators

• Deep Networks for Compressed Image Sensing

• The domination number and the least Q-eigenvalue II

• A signature-based machine learning model for bipolar disorder and borderline personality disorder

• Single Image Super-Resolution with Dilated Convolution based Multi-Scale Information Learning Inception Module

• Predicting the Gender of Indonesian Names

• How ants move: collective versus individual scaling properties

• AutOMP: An Automatic OpenMP Parallelization Generator for Variable-Oriented High-Performance Scientific Codes

• Clinical Patient Tracking in the Presence of Transient and Permanent Occlusions via Geodesic Feature

• Structural Properties of Uncoded Placement Optimization for Coded Delivery

• pre: An R Package for Fitting Prediction Rule Ensembles

• Multi-Oriented Text Detection and Verification in Video Frames and Scene Images

• Optimal Transmit Beamforming for Secure SWIPT in Heterogeneous Networks

• Single-Shot Clothing Category Recognition in Free-Configurations with Application to Autonomous Clothes Sorting

• On the restricted almost unbiased Liu estimator in the Logistic regression model

• Warped Riemannian metrics for location-scale models

• Coarse-to-Fine Lifted MAP Inference in Computer Vision

• Attention-Based End-to-End Speech Recognition in Mandarin

• Comparing Apples and Oranges: Off-Road Pedestrian Detection on the NREC Agricultural Person-Detection Dataset

• The edit distance function of some graphs

• Large deviation theorem for random covariance matrices

• Asymptotic Performance Evaluation of Battery Swapping and Charging Station for Electric Vehicles

• Solving Irregular Strip Packing Problems With Free Rotations Using Separation Lines

• Emotion Recognition by Body Movement Representation on the Manifold of Symmetric Positive Definite Matrices

• A survey of exemplar-based texture synthesis

• Hipsters on Networks: How a Small Group of Individuals Can Lead to an Anti-Establishment Majority

• An Event-based Fast Movement Detection Algorithm for a Positioning Robot Using POWERLINK Communication

• MoodSwipe: A Soft Keyboard that Suggests Messages Based on User-Specified Emotions

• Investigating Einstein-Podolsky-Rosen steering of continuous variable bipartite states by non-Gaussian pseudospin measurements

• Equidistributions of MAJ and STAT over pattern avoiding permutations

• PRIMES STEP Plays Games

• Optimal control of continuous-time Markov chains with noise-free observation

• Eyemotion: Classifying facial expressions in VR using eye-tracking cameras

• Inspiring Computer Vision System Solutions

• Nonintersecting Brownian bridges on the unit circle with drift

• ‘i have a feeling trump will win………………’: Forecasting Winners and Losers from User Predictions on Twitter

• Spatio-temporal human action localisation and instance segmentation in temporally untrimmed videos

• Multistage Adaptive Testing of Sparse Signals

• Embedding graphs having Ore-degree at most five

• Packing Topological Minors Half-Integrally

• SAR Image Colorization: Converting Single-Polarization to Fully Polarimetric Using Deep Neural Networks

• A Covert Queueing Channel in FCFS Schedulers

• Iterated function systems with place dependent probabilities and application to the Diaconis-Friedman’s chain on [0,1]

• Team Applied Robotics: A closer look at our robotic picking system

• Towards Good Practices for Deep 3D Hand Pose Estimation

• Existence of absolutely continuous solutions for continuity equations in Hilbert spaces

• Detecting and Grouping Identical Objects for Region Proposal and Classification

• Deeply-Learned Part-Aligned Representations for Person Re-Identification

• Hierarchical Plug-and-Play Voltage/Current Controller of DC microgrid with Grid-Forming/Feeding modules: Line-independent Primary Stabilization and Leader-based Distributed Secondary Regulation

• A global stability estimate for the photo-acoustic inverse problem in layered media

• A GPU Based Memory Optimized Parallel Method For FFT Implementation

• Composing Distributed Representations of Relational Patterns

• Asymptotic Normality of the Median Heuristic

• Hierarchical Embeddings for Hypernymy Detection and Directionality

• Likelihood test in permutations with bias. Premier League and La Liga: surprises during the last 25 seasons

• Optimal Universal Controllers for Rudder Roll Stabilization

• Fine Grained Citation Span for References in Wikipedia

• Using Argument-based Features to Predict and Analyse Review Helpfulness

• Technical report Existence of Kirkman signal sets on $v=1,3\pmod{6}$ points, $14\leq v \leq 3000$

• Optimal Trade Execution Under Endogenous Pressure to Liquidate: Theory and Numerical Solutions

• Minimum size of n-factor-critical graphs and k-extendable graphs

• M-alternating Hamilton paths and M-alternating Hamilton cycles

• A Re-weighted Joint Spatial-Radon Domain CT Image Reconstruction Model for Metal Artifact Reduction

• Preference Reasoning in Matching Procedures: Application to the Admission Post-Baccalaureat Platform

• Joint DOA Estimation and Array Calibration Using Multiple Parametric Dictionary Learning

• Deep Optical Flow Estimation Via Multi-Scale Correspondence Structure Learning

• Chern-Schwartz-MacPherson cycles of matroids

• On Certain Degenerate Whittaker Models for Cuspidal Representations of $\mathrm{GL}_{k\cdot n}\left(\mathbb{F}_q\right)$

• Robust Tracking and Behavioral Modeling of Movements of Biological Collectives from Ordinary Video Recordings

• A new take on measuring nutritional density: The feasibility of using a deep neural network to assess commercially-prepared puree concentrations

• Exploiting Deep Features for Remote Sensing Image Retrieval: A Systematic Investigation

• The symmetric representation of lines in $\text{PG}(\mathbb{F}^3 \otimes \mathbb{F}^3)$

• A nonuniform popularity-similarity optimization (nPSO) model to efficiently generate realistic complex networks with communities

• On the behavior of Lagrange multipliers in convex and non-convex infeasible interior point methods

• Adversarial Examples for Evaluating Reading Comprehension Systems

• Optimal estimation of a signal perturbed by a fractional Brownian noise

• Rule-Based Spanish Morphological Analyzer Built From Spell Checking Lexicon

• A Review of Statistical Methods in Imaging Genetics

• Testable Bounded Degree Graph Properties Are Random Order Streamable

• Person Re-identification Using Visual Attention

• An Online Learning Approach to Buying and Selling Demand Response

• A Sequential Model for Classifying Temporal Relations between Intra-Sentence Events

• Event Coreference Resolution by Iteratively Unfolding Inter-dependencies among Events

• Mathematical aspect of the combinatorial game ‘Mahjong’

• Stability and instability in saddle point dynamics – Part I

• Limit fluctuations for density of asymmetric simple exclusion processes with open boundaries

• Stability and instability in saddle point dynamics Part II: The subgradient method

• Compact Model Representation for 3D Reconstruction

• Lyapunov Stability Analysis for Invariant States of Quantum Systems

• A Discrete Choice Framework for Modeling and Forecasting The Adoption and Diffusion of New Transportation Services

• Group-wise Deep Co-saliency Detection

• A locking-free optimal control problem with $L^1$ cost for optimal placement of control devices in Timoshenko beam

• Semantic 3D Occupancy Mapping through Efficient High Order CRFs

• Wavelet Convolutional Neural Networks for Texture Classification

• Learning for Multi-robot Cooperation in Partially Observable Stochastic Environments with Macro-actions

• Spatial Diversity in Molecular Communications

• Grant-Free Massive NOMA: Outage Probability and Throughput

• Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback

• Self-concordant inclusions: A unified framework for path-following generalized Newton-type algorithms

• Toward Geometric Deep SLAM

• Traffic scene recognition based on deep cnn and vlad spatial pyramids

• Wireless Powered Cooperative Jamming for Secure OFDM System

• Exploring Neural Transducers for End-to-End Speech Recognition

• Eigenvariety of Nonnegative Symmetric Weakly Irreducible Tensors Associated with Spectral Radius

• Generative OpenMax for Multi-Class Open Set Classification

• Dynamical localization and the effects of aperiodicity in Floquet systems

• Health Analytics: a systematic review of approaches to detect phenotype cohorts using electronic health records

• Tail-Tolerant Distributed Search

• Cooperative Prediction-and-Sensing Based Spectrum Sharing in Cognitive Radio Networks

• LV-ROVER: Lexicon Verified Recognizer Output Voting Error Reduction

• From Directed Polymers in Spatial-correlated Environment to Stochastic Heat Equations Driven by Fractional Noise in 1 + 1 Dimensions

• Delineation of line patterns in images using B-COSFIRE filters

• Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms: A Case with Bounded Regret

• Exploiting Interference for Secrecy Wireless Information and Power Transfer

• Next Generation Cloud Computing: New Trends and Research Directions

• Building Graph Representations of Deep Vector Embeddings

• Pitman sampling formula and an empirical study of choice behavior

• An energy method for rough partial differential equations

• Invariance of Ideal Limit Points

• About Extensions of the Extremal Principle

• Dipolar self-bound droplets in weak disorder potential

• A Wait-free Multi-word Atomic (1,N) Register for Large-scale Data Sharing on Multi-core Machines

• How to Suppress Dark States in Quantum Networks and Bio-Engineered Structures

• Non-affine lattice dynamics of defective fcc crystals

• Modeling Label Ambiguity for Neural List-Wise Learning to Rank

• Analysing Errors of Open Information Extraction Systems

• The Mahler conjecture in two dimensions via the probabilistic method

• Efficiency of the principal component Liu-type estimator in logistic regression model

• Control Strategies for the Fokker-Planck Equation

• Towards Accurate Markerless Human Shape and Pose Estimation over Time

• Learning Rare Word Representations using Semantic Bridging

• Decision Theory with a Hilbert Space as Possibility Space

• Structure Learning of Linear Gaussian Structural Equation Models with Weak Edges

• Automatic breast cancer grading in lymph nodes using a deep neural network

• CAp 2017 challenge: Twitter Named Entity Recognition

• Asymptotic properties of the density of particles in $β$-ensembles

• A note on the van der Waerden complex

• Non-kissing complexes and tau-tilting for gentle algebras

• Measurement of spectral functions of ultracold atoms in disordered potentials

• No-Gap Second-Order Conditions via a Directional Curvature Functional

• On minimal triangle-free 6-chromatic graphs

• Joint Background Reconstruction and Foreground Segmentation via A Two-stage Convolutional Neural Network

• Transition-Based Generation from Abstract Meaning Representations

• Adversarial Sets for Regularising Neural Link Predictors

• Image Pivoting for Learning Multilingual Multimodal Representations

• Share your Model instead of your Data: Privacy Preserving Mimic Learning for Ranking

• Vision-Based Fallen Person Detection for the Elderly

• A Deep Learning Approach to Digitally Stain Optical Coherence Tomography Images of the Optic Nerve Head

• Quenched exit times for random walk on dynamical percolation

• Reliable Beamspace Channel Estimation for Millimeter-Wave Massive MIMO Systems with Lens Antenna Array

• About the slab percolation threshold for the Potts model in dimension $d\ge4$

• Mixing time for random walk on supercritical dynamical percolation

• Hamiltonian cycles in ‘fair’ k-partite graphs

• Minimax Game-Theoretic Approach to Multiscale $H_{\infty}$ Optimal Filtering

• A mathematical model of the metabolic process of atherosclerosis

• An Improved Approximate Consensus Algorithm in the Presence of Mobile Faults

• Thread Reconstruction in Conversational Data using Neural Coherence Models

• Towards Real-Time Search Planning in Subsea Environments

• Equality of the Jellium and Uniform Electron Gas next-order asymptotic terms for Riesz potentials

**26**
*Wednesday*
Jul 2017

Posted Documents

in

**Kernel clustering: Breiman’s bias and solutions**

Clustering is widely used in data analysis, where kernel methods are particularly popular due to their generality and discriminating power. However, kernel clustering has a practically significant bias toward small dense clusters, as empirically observed, e.g., in (Shi & Malik, TPAMI’00). Its causes have never been analyzed and understood theoretically, even though many attempts were made to improve the results. We provide conditions for and formally prove this bias in kernel clustering. Moreover, we show a general class of locally adaptive kernels that directly addresses these conditions. Previously, (Breiman, ML’96) proved a bias toward histogram mode isolation in the discrete Gini criterion for decision tree learning. We found that kernel clustering reduces to a continuous generalization of the Gini criterion for a common class of kernels, for which we prove a bias toward density mode isolation that we call Breiman’s bias. These theoretical findings suggest that a principled solution for the bias should directly address data density inhomogeneity. In particular, our density law shows how density equalization can be done implicitly using certain locally adaptive geodesic kernels. Interestingly, a popular heuristic kernel in (Zelnik-Manor and Perona, NIPS’04) approximates a special case of our Riemannian kernel framework. Our general ideas are relevant to any algorithm for kernel clustering. We show many synthetic and real data experiments illustrating Breiman’s bias and its solution. We anticipate that a theoretical understanding of kernel clustering limitations and their principled solutions will be important for a broad spectrum of data analysis applications in diverse disciplines.
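The locally adaptive kernel idea mentioned in the abstract can be illustrated with the self-tuning affinity of Zelnik-Manor and Perona, where each point gets its own bandwidth σ_i, taken as the distance to its k-th nearest neighbor. The following is a minimal NumPy sketch of that construction, not the authors’ code:

```python
import numpy as np

def self_tuning_affinity(X, k=7):
    """Self-tuning affinity (Zelnik-Manor & Perona, NIPS'04):
    A[i, j] = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j)),
    where sigma_i is the distance from x_i to its k-th nearest neighbor."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    sigma = np.sort(D, axis=1)[:, k]  # local scale: column 0 is the self-distance
    A = np.exp(-(D ** 2) / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(A, 0.0)
    return A

# two clusters of very different density still get comparable affinities,
# because each point's bandwidth adapts to its local neighborhood
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 1.0, (20, 2))])
A = self_tuning_affinity(X, k=7)
```

The per-point σ_i is what makes the kernel locally adaptive: a fixed global bandwidth would either merge the sparse cluster or shatter the dense one.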

**26**
*Wednesday*
Jul 2017

Posted R Packages

in* Optimal B-Robust Estimator Tools* (

An implementation for computing Optimal B-Robust Estimators (OBRE) of two parameters distributions. The procedure is composed by some equations that are evaluated alternatively until the solution is reached. Some tools for analyzing the estimates are included. The most relevant is OBRE covariance matrix computation using a closed formula.

Allows you to create an evidence factor (EX analysis) in an instrumental variables regression model. Additionally, performs sensitivity analysis for OLS, 2SLS and EX analyses, with interpretable plotting and printing features.

Performs sparse discriminant analysis on a combination of node and leaf predictors when the predictor variables are structured according to a tree.

Genotype plus genotype-by-environment (GGE) biplots rendered using ‘ggplot2’. Provides a command line interface to all of the functionality contained within ‘GGEBiplotGUI’.

Performs parametric and non-parametric estimation and simulation for multi-state discrete-time semi-Markov processes. For the parametric estimation, several discrete distributions are considered for the sojourn times: Uniform, Geometric, Poisson, Discrete Weibull and Negative Binomial. The non-parametric estimation concerns the sojourn time distributions, where no assumptions are made on the shape of the distributions. Moreover, the estimation can be done on the basis of one or several sample paths, with or without censoring at the beginning and/or at the end of the sample paths. The implemented methods are described in Barbu, V.S., Limnios, N. (2008) <doi:10.1007/978-0-387-73173-5>, Barbu, V.S., Limnios, N. (2008) <doi:10.1080/10485250701261913> and Trevezas, S., Limnios, N. (2011) <doi:10.1080/10485252.2011.555543>. Estimation and simulation of discrete-time k-th order Markov chains are also considered.
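To make the semi-Markov idea concrete, here is a hypothetical sketch (not the package’s own API) that simulates a two-state discrete-time semi-Markov process: jumps follow an embedded Markov chain, while the time spent in each state is drawn from a state-dependent geometric distribution:

```python
import numpy as np

def simulate_semi_markov(P, sojourn_p, n_jumps, seed=0):
    """Simulate a discrete-time semi-Markov chain.
    P: embedded-chain transition matrix (no self-loops).
    sojourn_p: per-state success probability of the geometric sojourn time."""
    rng = np.random.default_rng(seed)
    state, path = 0, []
    for _ in range(n_jumps):
        stay = rng.geometric(sojourn_p[state])   # sojourn time in current state
        path.extend([state] * stay)              # record the visit
        state = rng.choice(len(P), p=P[state])   # jump via the embedded chain
    return np.array(path)

# alternate between states 0 and 1; state 1 has longer expected sojourns (1/0.2 = 5)
P = np.array([[0.0, 1.0], [1.0, 0.0]])
path = simulate_semi_markov(P, sojourn_p=[0.5, 0.2], n_jumps=50)
```

A plain Markov chain is the special case where every sojourn distribution is geometric with a single common mechanism; the semi-Markov model lets each state use a different sojourn law (e.g. Poisson or Discrete Weibull, as in the package).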

**26**
*Wednesday*
Jul 2017

Posted Books

in
**26**
*Wednesday*
Jul 2017

Posted What is ...

in

**Intervention in Prediction Measure (IPM)**

Random forests are a popular method in many fields, since they can be successfully applied to complex data with a small sample size, complex interactions and correlations, mixed-type predictors, etc. Furthermore, they provide variable importance measures that aid qualitative interpretation and the selection of relevant predictors. However, most of these measures rely on the choice of a performance measure, and measures of prediction performance are not unique; in some cases, such as multivariate-response random forests, there is not even a clear definition. A new alternative importance measure, called the Intervention in Prediction Measure, is investigated. It depends only on the structure of the trees, without depending on performance measures. It is compared with other well-known variable importance measures in different contexts, such as a classification problem with variables of different types, another classification problem with correlated predictor variables, and problems with multivariate responses and predictors of different types. …

**Generalized Logistic Distribution**

The term “generalized logistic distribution” is used as the name for several different families of probability distributions; for example, Johnson et al. list four forms. One of these families has also been called the skew-logistic distribution. For other families of distributions that have also been called generalized logistic distributions, see the shifted log-logistic distribution, which is a generalization of the log-logistic distribution. …

**Gated Recurrent Neural Tensor Network**

Recurrent Neural Networks (RNNs), which are a powerful scheme for modeling temporal and sequential data, need to capture long-term dependencies in datasets and represent them in hidden layers with a model powerful enough to capture more information from the inputs. For modeling long-term dependencies in a dataset, the gating-mechanism concept helps RNNs remember and forget previous information. Representing the hidden layers of an RNN with more expressive operations (i.e., tensor products) helps it learn a more complex relationship between the current input and the previous hidden-layer information. These ideas can generally improve RNN performance. In this paper, we propose a novel RNN architecture that combines the concepts of the gating mechanism and the tensor product in a single model. By combining these two concepts into a single RNN, our proposed models learn long-term dependencies via gating units and obtain a more expressive and direct interaction between the input and hidden layers using a tensor product on 3-dimensional (tensor) weight parameters. We take the Long Short-Term Memory (LSTM) RNN and the Gated Recurrent Unit (GRU) RNN and combine them with a tensor product inside their formulations. Our proposed RNNs, called the Long Short-Term Memory Recurrent Neural Tensor Network (LSTMRNTN) and the Gated Recurrent Unit Recurrent Neural Tensor Network (GRURNTN), are made by combining the LSTM and GRU RNN models with the tensor product. We conducted experiments with our proposed models on word-level and character-level language modeling tasks and found that they significantly improved performance compared to our baseline models. …
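The bilinear tensor interaction described above can be sketched as follows. This is an illustrative reconstruction under assumed shapes, not the authors’ code: each output unit k gets its own matrix T[k], so a GRU-style candidate state depends on the bilinear term x^T T[k] (r ⊙ h) in addition to the usual linear terms:

```python
import numpy as np

def gru_tensor_candidate(x, h, r, T, W, U, b):
    """Candidate state with a bilinear tensor term (GRURNTN-style sketch):
    c_k = tanh( x^T T[k] (r*h) + (W x + U (r*h) + b)_k ),
    where T is a 3-D weight tensor of shape (d_h, d_in, d_h)."""
    rh = r * h                                     # reset-gated previous state
    bilinear = np.einsum('i,kij,j->k', x, T, rh)   # x^T T[k] (r*h) for each unit k
    return np.tanh(bilinear + W @ x + U @ rh + b)  # tensor term + standard GRU terms

d_in, d_h = 4, 3
rng = np.random.default_rng(0)
x, h = rng.normal(size=d_in), rng.normal(size=d_h)
r = 1.0 / (1.0 + np.exp(-rng.normal(size=d_h)))    # reset gate (sigmoid activation)
T = 0.1 * rng.normal(size=(d_h, d_in, d_h))        # 3-D tensor weight
W, U = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))
c = gru_tensor_candidate(x, h, r, T, W, U, np.zeros(d_h))
```

Setting T to all zeros recovers the ordinary GRU candidate, which makes explicit how the tensor product adds a direct multiplicative input–hidden interaction on top of the standard additive one.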