Model Selection for Anomaly Detection

Anomaly detection based on one-class classification algorithms is broadly used in many applied domains, such as image processing (e.g. detecting whether a patient is ‘cancerous’ or ‘healthy’ from a mammography image) and network intrusion detection. The performance of an anomaly detection algorithm crucially depends on the kernel used to measure similarity in a feature space. The standard approaches to kernel selection used in two-class classification problems (e.g. cross-validation) cannot be used directly due to the specific nature of the data (absence of data from a second, abnormal class). In this paper we generalize several kernel selection methods from the binary-class case to the case of one-class classification and perform an extensive comparison of these approaches using both synthetic and real-world data.

Do Convolutional Networks need to be Deep for Text Classification?

We study in this work the importance of depth in convolutional models for text classification, when either character or word inputs are considered. We show on 5 standard text classification and sentiment analysis tasks that deep models indeed give better performance than shallow networks when the text input is represented as a sequence of characters. However, a simple shallow-and-wide network outperforms deep models such as DenseNet with word inputs. Our shallow word model further establishes new state-of-the-art performance on two datasets: Yelp Binary (95.9%) and Yelp Full (64.9%).

Foolbox v0.8.0: A Python toolbox to benchmark the robustness of machine learning models

Even today's most advanced machine learning models are easily fooled by almost imperceptible perturbations of their inputs. Foolbox is a new Python package to generate such adversarial perturbations and to quantify and compare the robustness of machine learning models. It is built around the idea that the most comparable robustness measure is the minimum perturbation needed to craft an adversarial example. To this end, Foolbox provides reference implementations of most published adversarial attack methods alongside some new ones, all of which perform internal hyperparameter tuning to find the minimum adversarial perturbation. Additionally, Foolbox interfaces with most popular deep learning frameworks such as PyTorch, Keras, TensorFlow, Theano and MXNet, provides a straightforward way to add support for other frameworks, and allows different adversarial criteria, such as targeted misclassification and top-k misclassification, as well as different distance measures. The code is licensed under the MIT license and is openly available at https://…/foolbox
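To make the robustness measure above concrete, here is a minimal sketch of finding a minimum adversarial perturbation for a toy linear classifier. It does not use Foolbox or its API: the function names (`predict`, `min_perturbation`) and the toy weights are hypothetical, and the binary search along the weight vector is only a stand-in for the internal hyperparameter tuning the abstract mentions.

```python
import math

def predict(w, b, x):
    """Return the class (0 or 1) of a linear classifier based on the sign of w.x + b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0

def min_perturbation(w, b, x, max_eps=100.0, steps=60):
    """Binary-search the smallest L2 step along +/- w that flips the predicted label."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    label = predict(w, b, x)
    # Move against the score if the label is 1, with the score otherwise.
    sign = -1.0 if label == 1 else 1.0
    direction = [sign * wi / norm for wi in w]

    def flipped(eps):
        x_adv = [xi + eps * di for xi, di in zip(x, direction)]
        return predict(w, b, x_adv) != label

    if not flipped(max_eps):
        return None  # no adversarial example found within the search budget
    lo, hi = 0.0, max_eps
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if flipped(mid):
            hi = mid
        else:
            lo = mid
    return hi

# Toy model (hypothetical values): ||w|| = 5 and w.x + b = 6.
w, b, x = [3.0, 4.0], -1.0, [1.0, 1.0]
eps = min_perturbation(w, b, x)
# For a linear model the distance to the decision boundary is |w.x + b| / ||w||.
closed_form = abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / math.sqrt(sum(wi * wi for wi in w))
```

For a linear model the search result can be checked against the closed-form distance to the decision boundary; for deep networks no such closed form exists, which is why attacks must search for the minimum perturbation.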

Distral: Robust Multitask Reinforcement Learning

Most deep reinforcement learning algorithms are data inefficient in complex and rich environments, limiting their applicability to many scenarios. One direction for improving data efficiency is multitask learning with shared neural network parameters, where efficiency may be improved through transfer across related tasks. In practice, however, this is not usually observed, because gradients from different tasks can interfere negatively, making learning unstable and sometimes even less data efficient. Another issue is the different reward schemes between tasks, which can easily lead to one task dominating the learning of a shared model. We propose a new approach for joint training of multiple tasks, which we refer to as Distral (Distill & transfer learning). Instead of sharing parameters between the different workers, we propose to share a ‘distilled’ policy that captures common behaviour across tasks. Each worker is trained to solve its own task while constrained to stay close to the shared policy, and the shared policy is trained by distillation to be the centroid of all task policies. Both aspects of the learning process are derived by optimizing a joint objective function. We show that our approach supports efficient transfer on complex 3D environments, outperforming several related methods. Moreover, the proposed learning process is more robust and more stable, attributes that are critical in deep reinforcement learning.
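As a rough illustration of the ‘centroid’ idea only, and not the authors' algorithm: for fixed task policies over a small discrete action set, the distribution pi_0 that minimizes the summed divergence sum_i KL(pi_i || pi_0) is the arithmetic mean of the pi_i. The sketch below checks this numerically. All names and the toy policies are hypothetical, and the joint objective described in the abstract also involves the task rewards, which are not modelled here.

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) for discrete distributions with q > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def centroid(policies):
    """Arithmetic mean of the task policies (minimizer of sum_i KL(pi_i || pi_0))."""
    n = len(policies)
    return [sum(p[a] for p in policies) / n for a in range(len(policies[0]))]

def total_kl(policies, shared):
    """Summed divergence of every task policy from the shared policy."""
    return sum(kl(p, shared) for p in policies)

# Three hypothetical task policies over three actions.
task_policies = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
shared = centroid(task_policies)

# Compare against a few other candidate shared policies.
candidates = [[1 / 3, 1 / 3, 1 / 3], [0.5, 0.3, 0.2], task_policies[0]]
best = total_kl(task_policies, shared)
others = [total_kl(task_policies, c) for c in candidates]
```

In this toy setting the mean policy attains a summed divergence no larger than that of any other candidate, which is the sense in which it acts as a centroid.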

Learning Features from Co-occurrences: A Theoretical Analysis

Representing a word by its co-occurrences with other words in context is an effective way to capture the meaning of the word. However, the theory behind it remains a challenge. In this work, taking the example of a word classification task, we give a theoretical analysis of the approaches that represent a word X by a function f(P(C|X)), where C is a context feature, P(C|X) is the conditional probability estimated from a text corpus, and the function f maps the co-occurrence measure to a prediction score. We investigate the impact of the context feature C and the function f. We also explain why using co-occurrences with multiple context features may be better than using just a single one. In addition, some of the results shed light on the theory of feature learning and machine learning in general.
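The representation f(P(C|X)) described above can be sketched as follows. This is only an illustration under stated assumptions: the toy corpus, the window size, the relative-frequency estimator, and the choice of f as a smoothed logarithm are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_counts(sentences, window=2):
    """Count how often each context word C appears within `window` tokens of X."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, x in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[x][tokens[j]] += 1
    return counts

def conditional_prob(counts, x, c):
    """Estimate P(C=c | X=x) as a relative frequency of co-occurrence counts."""
    total = sum(counts[x].values())
    return counts[x][c] / total if total else 0.0

def score(counts, x, c, f=lambda p: math.log(p + 1e-6)):
    """Map the co-occurrence measure to a prediction score via f (assumed: smoothed log)."""
    return f(conditional_prob(counts, x, c))

# Toy corpus (hypothetical).
corpus = [
    ["the", "cat", "drinks", "milk"],
    ["the", "dog", "drinks", "water"],
]
counts = cooccurrence_counts(corpus)
p_milk_given_cat = conditional_prob(counts, "cat", "milk")
p_water_given_cat = conditional_prob(counts, "cat", "water")
```

Here `score(counts, "cat", "milk")` is higher than `score(counts, "cat", "water")`, because "water" never occurs near "cat" in the toy corpus.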

Neural Networks for Information Retrieval

Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many different approaches for many different IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. Additionally, it is interesting to see what key insights into IR problems the new technologies are able to give us. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR research. It covers key architectures, as well as the most promising future directions.

Discriminative Optimization: Theory and Applications to Computer Vision Problems

Many computer vision problems are formulated as the optimization of a cost function. This approach faces two main challenges: (i) designing a cost function with a local optimum at an acceptable solution, and (ii) developing an efficient numerical method to search for one (or multiple) of these local optima. While designing such functions is feasible in the noiseless case, the stability and location of local optima are mostly unknown under noise, occlusion, or missing data. In practice, this can result in undesirable local optima or not having a local optimum in the expected place. On the other hand, numerical optimization algorithms in high-dimensional spaces are typically local and often rely on expensive first or second order information to guide the search. To overcome these limitations, this paper proposes Discriminative Optimization (DO), a method that learns search directions from data without the need of a cost function. Specifically, DO explicitly learns a sequence of updates in the search space that leads to stationary points that correspond to desired solutions. We provide a formal analysis of DO and illustrate its benefits in the problem of 3D point cloud registration, camera pose estimation, and image denoising. We show that DO performed comparably or outperformed state-of-the-art algorithms in terms of accuracy, robustness to perturbations, and computational efficiency.

Tensor-Based Backpropagation in Neural Networks with Non-Sequential Input

Neural networks have been able to achieve groundbreaking accuracy at tasks conventionally considered only doable by humans. Using stochastic gradient descent, optimization in many dimensions is made possible, albeit at a relatively high computational cost. By splitting training data into batches, networks can be distributed and trained vastly more efficiently and with minimal accuracy loss. We have explored the mathematics behind efficiently implementing tensor-based batch backpropagation algorithms. A common approach to batch training is iterating over batch items individually. Explicitly using tensor operations to backpropagate allows training to be performed non-linearly, increasing computational efficiency.
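The contrast between iterating over batch items and using a single tensor operation can be illustrated on one linear layer. This is a minimal sketch, not the paper's derivation: it assumes a layer y = W x with squared-error loss, and it uses small pure-Python matrix helpers (hypothetical names) where a real implementation would use a tensor library.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a list-of-lists matrix."""
    return [list(row) for row in zip(*a)]

def grad_per_item(W, X, T):
    """Accumulate dL/dW one sample at a time; L = 0.5 * sum ||W x - t||^2."""
    rows, cols = len(W), len(W[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for x, t in zip(X, T):
        y = [sum(W[r][c] * x[c] for c in range(cols)) for r in range(rows)]
        for r in range(rows):
            for c in range(cols):
                grad[r][c] += (y[r] - t[r]) * x[c]
    return grad

def grad_batched(W, X, T):
    """Same gradient as one matrix product: dL/dW = (X W^T - T)^T X."""
    Y = matmul(X, transpose(W))          # shape (batch, out)
    E = [[y - t for y, t in zip(yr, tr)] for yr, tr in zip(Y, T)]
    return matmul(transpose(E), X)       # shape (out, in)

# Toy data (hypothetical): 3 samples, 2 inputs, 2 outputs.
W = [[1.0, 2.0], [0.0, -1.0]]
X = [[1.0, 0.0], [2.0, 1.0], [0.5, -1.0]]
T = [[1.0, 0.0], [3.0, 0.0], [0.0, 1.0]]
g_loop = grad_per_item(W, X, T)
g_batch = grad_batched(W, X, T)
```

Both functions return the same gradient. The batched form expresses it as two matrix products over the whole batch, which is the shape of computation that tensor libraries can execute efficiently.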

Advances in Artificial Intelligence Require Progress Across all of Computer Science

Advances in Artificial Intelligence require progress across all of computer science.

Deep Gaussian Embedding of Attributed Graphs: Unsupervised Inductive Learning via Ranking
Defensive Alliances in Graphs of Bounded Treewidth
Estimating the unseen from multiple populations
Capacity, Fidelity, and Noise Tolerance of Associative Spatial-Temporal Memories Based on Memristive Neuromorphic Network
Buffer Size for Routing Limited-Rate Adversarial Traffic
Gradient Coding from Cyclic MDS Codes and Expander Graphs
Mechanics Automatically Recognized via Interactive Observation: Jumping
Lyapunov Conditions for Differentiability of Markov Chain Expectations: the Absolutely Continuous Case
Independence, Conditionality and Structure of Dempster-Shafer Belief Functions
The Discrete-Time Geometric Maximum Principle
Heavy traffic analysis of a polling model with retrials and glue periods
Identification and Interpretation of Belief Structure in Dempster-Shafer Theory
A Formal Framework to Characterize Interpretability of Procedures
Additive non-approximability of chromatic number in proper minor-closed classes
Unsupervised body part regression using convolutional neural network with self-organization
Secure and Privacy-Preserving Consensus
A sharp Dirac-Erdős type bound for large graphs
The Generalized Nagell-Ljunggren Problem: Powers with Repetitive Representations
Character bounds for finite groups of Lie type
ClustGeo: an R package for hierarchical clustering with spatial constraints
Autoencoder-augmented Neuroevolution for Visual Doom Playing
Negative Sampling Improves Hypernymy Extraction Based on Projection Learning
Quasar: Datasets for Question Answering by Search and Reading
Influence of Resampling on Accuracy of Imbalanced Classification
Automatic Mapping of NES Games with Mappy
Maximizing and minimizing the number of generalized colorings of trees
Enumerating Vertices of $0/1$-Polyhedra associated with $0/1$-Totally Unimodular Matrices
Large Scale Variable Fidelity Surrogate Modeling
A thermally-driven differential mutation approach for the structural optimization of large atomic systems
A note on X-rays of permutations and a problem of Brualdi and Fritscher
Explainable Entity-based Recommendations with Knowledge Graphs
Principle of Least Rattling from Strong Time-scale Separation
The Waldspurger Transform of Permutations and Alternating Sign Matrices
Representation Learning for Grounded Spatial Reasoning
Upper Rate Functions of Brownian Motion Type for Symmetric Jump Processes
Cooperative HARQ Assisted NOMA Scheme in Large-scale D2D Networks
The Surfacing of Multiview 3D Drawings via Lofting and Occlusion Reasoning
Differential Stability Analysis via Multiplier Sets
Differential stability of a class of convex optimal control problems
Deciding the Confusability of Words under Tandem Repeats
Environmental engineering is an emergent feature of diverse ecosystems and drives community structure
Prediction and Power in Molecular Sensors: Uncertainty and Dissipation When Conditionally Markovian Channels Are Driven by Semi-Markov Environments
Predicting Causes of Reformulation in Intelligent Assistants
Quantifying and Estimating the Predictive Accuracy for Censored Time-to-Event Data with Competing Risks
A Brief Study of In-Domain Transfer and Learning from Fewer Samples using A Few Simple Priors
Learning Photography Aesthetics with Deep CNNs
Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks
Merge or Not? Learning to Group Faces via Imitation Learning
Correction to ‘The Generalized Stochastic Likelihood Decoder: Random Coding and Expurgated Bounds’
Approaching $\frac{3}{2}$ for the $s$-$t$-path TSP
Leveraging the Path Signature for Skeleton-based Human Action Recognition
A Web-Based Tool for Analysing Normative Documents in English
Testing High-dimensional Covariance Matrices under the Elliptical Distribution and Beyond
Dependency Injection for Programming by Optimization
Stochastic Packing Integer Programs with Few Queries
Query-Aware Sparse Coding for Multi-Video Summarization
On Measuring and Quantifying Performance: Error Rates, Surrogate Loss, and an Example in SSL
Constraints, Lazy Constraints, or Propagators in ASP Solving: An Empirical Analysis
Kafnets: kernel-based non-parametric activation functions for neural networks
Random Transverse Field Spin-Glass Model on the Cayley tree: phase transition between the two Many-Body-Localized Phases
Deep Learning with Topological Signatures
Large-scale Video Classification guided by Batch Normalized LSTM Translator
Stable Distribution Alignment Using the Dual of the Adversarial Distance
Discrete Multi-modal Hashing with Canonical Views for Robust Mobile Landmark Search
Clingo goes Linear Constraints over Reals and Integers
The Chromatic Symmetric Functions of Trivially Perfect Graphs and Cographs
Nonexistence of certain singly even self-dual codes with minimal shadow
Automatic Recognition of Deceptive Facial Expressions of Emotion
A Note on the Inheritance of the Isometry-Dual Property under Puncturing AG Codes
Automation of Feature Engineering for IoT Analytics
Robust Geometry-Based User Scheduling for Large MIMO Systems Under Realistic Channel Conditions
Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression
Disentangling Motion, Foreground and Background Features in Videos
Higher dimensional Steinhaus and Slater problems via homogeneous dynamics
Is writing style predictive of scientific fraud?
Armstrong’s Axioms and Navigation Strategies
Small Sample Inference for the Common Coefficient of Variation
Inferring the parameters of a Markov process from snapshots of the steady state
Randomization-based Inference for Bernoulli-Trial Experiments and Implications for Observational Studies
Material Optimization in Transverse Electromagnetic Scattering Applications
Inference under Missing Data Conditions in the Stochastic Block Model
UTS submission to Google YouTube-8M Challenge 2017
Variable selection in multivariate linear models with high-dimensional covariance matrix estimation
MAC Resolvability: First And Second Order Results
Modeling Hormesis Using a Non-Monotonic Copula Method
Cost-Effective Cache Deployment in Mobile Heterogeneous Networks
Constrained percolation, Ising model and XOR Ising model on planar lattices
Distributionally Robust Optimization Techniques in Batch Bayesian Optimization
On the theory of Lorentz gases with long range interactions
Be Careful What You Backpropagate: A Case For Linear Output Activations & Gradient Boosting
Multi-Antenna Assisted Full-Duplex Relaying with Reliability-Aware Iterative Decoding
Universal Sparse Superposition Codes with Spatial Coupling and GAMP Decoding
The (theta, wheel)-free graphs Part III: cliques, stable sets and coloring
Systems with disorder, interactions, and out of equilibrium: The exact independent-particle picture from density functional theory
Triangle packing in (sparse) tournaments: approximation and kernelization
Parsing with Traces: An $O(n^4)$ Algorithm and a Structural Representation
Automatic Speech Recognition with Very Large Conversational Finnish and Estonian Vocabularies
A survey of quantitative bounds for hypergraph Ramsey problems
Synchronization Strings: Channel Simulations and Interactive Coding for Insertions and Deletions
Hypoelliptic diffusions: discretization, filtering and inference from complete and partial observations
Improving Sparsity in Kernel Adaptive Filters Using a Unit-Norm Dictionary
Lithium NLP: A System for Rich Information Extraction from Noisy User Generated Text on Social Media
Iterative Updating of Model Error for Bayesian Inversion
On the maximum diameter of path-pairable graphs
Tight uniform continuity bound for a family of entropies
Linear complementarity problems on extended second order cones
Cultivating DNN Diversity for Large Scale Video Labelling
Fast Restricted Causal Inference
Comparative Study of Inference Methods for Bayesian Nonnegative Matrix Factorisation
On (Anti)Conditional Independence in Dempster-Shafer Theory
Polynomial Counting in Anonymous Dynamic Networks with Applications to Anonymous Dynamic Algebraic Computations
Note on group irregularity strength of disconnected graphs
Predicting Abandonment in Online Coding Tutorials
Approximation Schemes for Clustering with Outliers
The size-Ramsey number of powers of paths
Strategic Coalitions with Perfect Recall
Coalescent-based species tree estimation: a stochastic Farris transform
Mellin-Meijer-kernel density estimation on $\mathbb{R}^+$
A Scalable Algorithm for Gaussian Graphical Models with Change-Points
A Dichotomy on Constrained Topological Sorting
Lempel-Ziv: a ‘one-bit catastrophe’ but not a tragedy
Bayesian Optimization for Probabilistic Programs
How hard is it to satisfy (almost) all roommates?
Infinite rate symbiotic branching on the real line: The tired frogs model
Model compression as constrained optimization, with application to neural nets. Part II: quantization
Human-Level Intelligence or Animal-Like Abilities?
Generalized stealthy hyperuniform processes: maximal rigidity and the bounded holes conjecture
A Tight Approximation for Co-flow Scheduling for Minimizing Total Weighted Completion Time
Brittle to Quasi-Brittle Transition and Crack Initiation Precursors in Disordered Crystals
Privacy-preserving Decentralized Optimization Based on ADMM
Constructions of cyclic constant dimension codes
Stable processes, self-similarity and the unit ball
Gaussian Graphical Models: An Algebraic and Geometric Perspective
Weakly Submodular Maximization Beyond Cardinality Constraints: Does Randomization Help Greedy?
A Generating Function for the Distribution of Runs in Binary Words
Derivative Principal Component Analysis for Representing the Time Dynamics of Longitudinal and Functional Data
Kernel Method for Detecting Higher Order Interactions in multi-view Data: An Application to Imaging, Genetics, and Epigenetics
The spt-Function of Andrews
Identification of multi-object dynamical systems: consistency and Fisher information
A two-stage approach for estimating the parameters of an age-group epidemic model from incidence data