In the past five years we have observed the rise of incredibly well performing feed-forward neural networks trained supervisedly for vision related tasks. These models have achieved super-human performance on object recognition, localisation, and detection in still images. However, there is a need to identify the best strategy to employ these networks with temporal visual inputs and obtain a robust and stable representation of video data. Inspired by the human visual system, we propose a deep neural network family, CortexNet, which features not only bottom-up feed-forward connections, but also it models the abundant top-down feedback and lateral connections, which are present in our visual cortex. We introduce two training schemes – the unsupervised MatchNet and weakly supervised TempoNet modes – where a network learns how to correctly anticipate a subsequent frame in a video clip or the identity of its predominant subject, by learning egomotion clues and how to automatically track several objects in the current scene. Find the project website at https://…/.
Recent work on fairness in machine learning has focused on various statistical discrimination criteria and how they trade off. Most of these criteria are observational: They depend only on the joint distribution of predictor, protected attribute, features, and outcome. While convenient to work with, observational criteria have severe inherent limitations that prevent them from resolving matters of fairness conclusively. Going beyond observational criteria, we frame the problem of discrimination based on protected attributes in the language of causal reasoning. This viewpoint shifts attention from ‘What is the right fairness criterion?’ to ‘What do we want to assume about the causal data generating process?’ Through the lens of causality, we make several contributions. First, we crisply articulate why and when observational criteria fail, thus formalizing what was before a matter of opinion. Second, our approach exposes previously ignored subtleties and why they are fundamental to the problem. Finally, we put forward natural causal non-discrimination criteria and develop algorithms that satisfy them.
We present a novel recurrent neural network (RNN) architecture that combines the remembering ability of unitary RNNs with the ability of gated RNNs to effectively forget redundant information in the input sequence. We achieve this by extending Unitary RNNs with a gating mechanism. Our model is able to outperform LSTMs, GRUs and Unitary RNNs on different benchmark tasks, as the ability to simultaneously remember long term dependencies and forget irrelevant information in the input sequence helps with many natural long term sequential tasks such as language modeling and question answering. We provide competitive results along with an analysis of our model on the bAbI Question Answering task, PennTreeBank, as well as synthetic tasks that involve long-term dependencies such as parenthesis, denoising and copying tasks.
Random projections are random linear maps, sampled from appropriate distributions, that approx- imately preserve certain geometrical invariants so that the approximation improves as the dimension of the space grows. The well-known Johnson-Lindenstrauss lemma states that there are random ma- trices with surprisingly few rows that approximately preserve pairwise Euclidean distances among a set of points. This is commonly used to speed up algorithms based on Euclidean distances. We prove that these matrices also preserve other quantities, such as the distance to a cone. We exploit this result to devise a probabilistic algorithm to solve linear programs approximately. We show that this algorithm can approximately solve very large randomly generated LP instances. We also showcase its application to an error correction coding problem.
Developers spend a significant amount of time searching for code: e.g., to understand how to complete, correct, or adapt their own code for a new context. Unfortunately, the state of the art in code search has not evolved much beyond text search over tokenized source. Code has much richer structure and semantics than normal text, and this property can be exploited to specialize the code-search process for better querying, searching, and ranking of code-search results. We present a new code-search engine named Source Forager. Given a query in the form of a C/C++ function, Source Forager searches a pre-populated code database for similar C/C++ functions. Source Forager preprocesses the database to extract a variety of simple code features that capture different aspects of code. A search returns the $k$ functions in the database that are most similar to the query, based on the various extracted code features. We tested the usefulness of Source Forager using a variety of code-search queries from two domains. Our experiments show that the ranked results returned by Source Forager are accurate, and that query-relevant functions can be reliably retrieved even when searching through a large code database that contains very few query-relevant functions. We believe that Source Forager is a first step towards much-needed tools that provide a better code-search experience.
We present a new framework for learning Granger causality networks for multivariate categorical time series, based on the mixture transition distribution (MTD) model. Traditionally, MTD is plagued by a nonconvex objective, non-identifiability, and presence of many local optima. To circumvent these problems, we recast inference in the MTD as a convex problem. The new formulation facilitates the application of MTD to high-dimensional multivariate time series. As a baseline, we also formulate a multi-output logistic autoregressive model (mLTD), which while a straightforward extension of autoregressive Bernoulli generalized linear models, has not been previously applied to the analysis of multivariate categorial time series. We develop novel identifiability conditions of the MTD model and compare them to those for mLTD. We further devise novel and efficient optimization algorithm for the MTD based on the new convex formulation, and compare the MTD and mLTD in both simulated and real data experiments. Our approach simultaneously provides a comparison of methods for network inference in categorical time series and opens the door to modern, regularized inference with the MTD model.
We present a new preprocessing algorithm for embedding the nodes of a given edge-weighted undirected graph into a Euclidean space. In this space, the Euclidean distance between any two nodes approximates the length of the shortest path between them in the given graph. Later, at runtime, a shortest path between any two nodes can be computed using A* search with the Euclidean distances as heuristic estimates. Our preprocessing algorithm, dubbed FastMap, is inspired by the Data Mining algorithm of the same name and runs in near-linear time. Hence, FastMap is orders of magnitude faster than competing approaches that produce a Euclidean embedding using Semidefinite Programming. Our FastMap algorithm also produces admissible and consistent heuristics and therefore guarantees the generation of optimal paths. Moreover, FastMap works on general undirected graphs for which many traditional heuristics, such as the Manhattan Distance heuristic, are not always well defined. Empirically too, we demonstrate that the FastMap heuristic is competitive with other state-of-the-art heuristics like the Differential heuristic.
We present a solution to scale spectral algorithms for learning sequence functions. We are interested in the case where these functions are sparse (that is, for most sequences they return 0). Spectral algorithms reduce the learning problem to the task of computing an SVD decomposition over a special type of matrix called the Hankel matrix. This matrix is designed to capture the relevant statistics of the training sequences. What is crucial is that to capture long range dependencies we must consider very large Hankel matrices. Thus the computation of the SVD becomes a critical bottleneck. Our solution finds a subset of rows and columns of the Hankel that realizes a compact and informative Hankel submatrix. The novelty lies in the way that this subset is selected: we exploit a maximal bipartite matching combinatorial algorithm to look for a sub-block with full structural rank, and show how computation of this sub-block can be further improved by exploiting the specific structure of Hankel matrices.
In this paper we explore methods to exploit symmetries for ensuring sample efficiency in reinforcement learning (RL), this problem deserves ever increasing attention with the recent advances in the use of deep networks for complex RL tasks which require large amount of training data. We introduce a novel method to detect symmetries using reward trails observed during episodic experience and prove its completeness. We also provide a framework to incorporate the discovered symmetries for functional approximation. Finally we show that the use of potential based reward shaping is especially effective for our symmetry exploitation mechanism. Experiments on various classical problems show that our method improves the learning performance significantly by utilizing symmetry information.