“Thanks to a perfect storm of recent advances in the tech industry, AI has risen from the ashes and regained its aura of cool.” Mike Barlow (2017)

# Magister Dixit

**28**
*Sunday*
May 2017

Posted in Magister Dixit

**28**
*Sunday*
May 2017

Posted in What is ...

**Generalized Autoregressive Moving Average Models (GARMA)**

A class of generalized autoregressive moving average (GARMA) models is developed that extends the univariate Gaussian ARMA time series model to a flexible observation-driven model for non-Gaussian time series data. The dependent variable is assumed to have a conditional exponential family distribution given the past history of the process. The model estimation is carried out using an iteratively reweighted least squares algorithm. Properties of the model, including stationarity and marginal moments, are either derived explicitly or investigated using Monte Carlo simulation. The relationship of the GARMA model to other models is shown, including the autoregressive models of Zeger and Qaqish, the moving average models of Li, and the reparameterized generalized autoregressive conditional heteroscedastic (GARCH) model (providing the formula for its fourth marginal moment, not previously derived). The model is demonstrated by the application of the GARMA model with a negative binomial conditional distribution to a well-known time series dataset of poliomyelitis counts. …

**Machine Vision (MV)**

Machine vision (MV) is the technology and methods used to provide imaging-based automatic inspection and analysis for applications such as automatic inspection, process control, and robot guidance in industry. The scope of MV is broad. MV is related to, though distinct from, computer vision. …

**GraphH**

It is common for real-world applications to analyze big graphs using distributed graph processing systems. Popular in-memory systems require an enormous amount of resources to handle big graphs. While several out-of-core systems have been proposed recently for processing big graphs using secondary storage, the high disk I/O overhead could significantly reduce performance. In this paper, we propose GraphH to enable high-performance big graph analytics in small clusters. Specifically, we design a two-stage graph partition scheme to evenly divide the input graph into partitions, and propose a GAB (Gather-Apply-Broadcast) computation model to make each worker process a partition in memory at a time. We use an edge cache mechanism to reduce the disk I/O overhead, and design a hybrid strategy to improve the communication performance. GraphH can efficiently process big graphs in small clusters or even a single commodity server. Extensive evaluations have shown that GraphH could be up to 7.8x faster compared to popular in-memory systems, such as Pregel+ and PowerGraph, when processing generic graphs, and more than 100x faster than recently proposed out-of-core systems, such as GraphD and Chaos, when processing big graphs. …
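The Gather-Apply-Broadcast idea above can be illustrated with a minimal, single-machine sketch that processes one vertex partition at a time. Everything here (the `gab_pagerank` name, the modulo partition assignment, the PageRank-style update) is an illustrative assumption, not the paper's actual implementation.

```python
# Toy partition-at-a-time Gather-Apply-Broadcast (GAB) loop,
# using a PageRank-style update as the example computation.

def gab_pagerank(edges, num_vertices, num_partitions=2, iters=10, d=0.85):
    """Toy PageRank where each partition's vertices are updated in turn."""
    rank = [1.0 / num_vertices] * num_vertices
    out_deg = [0] * num_vertices
    for u, v in edges:
        out_deg[u] += 1
    # Stand-in for the paper's two-stage partition scheme:
    # assign vertices to partitions round-robin.
    part = lambda v: v % num_partitions
    for _ in range(iters):
        new_rank = [(1.0 - d) / num_vertices] * num_vertices
        for p in range(num_partitions):
            # Gather: accumulate incoming contributions for partition p.
            for u, v in edges:
                if part(v) == p and out_deg[u] > 0:
                    new_rank[v] += d * rank[u] / out_deg[u]
            # Apply is the accumulation itself; Broadcast would ship the
            # updated ranks to other workers in the distributed setting.
        rank = new_rank
    return rank
```

On a 3-cycle the ranks stay uniform, which is a quick sanity check that the partitioned loop touches every edge exactly once per iteration.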

**28**
*Sunday*
May 2017

Posted in R Packages

*Univariate Pseudo-Random Number Generation*

Pseudo-random number generation of 17 univariate distributions.

Provides data management functions common in real-time monitoring (also called ecological momentary assessment, experience sampling, or micro-longitudinal) data, including centering on participant means and merging event-level data into momentary data sets where events must correspond to the nearest data point in the momentary data. This is VERY early release software, and more features will be added over time.

Helps to visualize multivariate time series with numeric and factor variables. You can use the package for visual analysis of data by plotting the data for each variable in the desired order, and study the interaction between a factor and a numeric variable by creating overlapping plots.

Enhances the R Optimization Infrastructure (‘ROI’) package with the ‘optimx’ package.

Easily simulates regression models, including both simple regression and generalized linear mixed models with up to three levels of nesting. Flexible power simulations allowing the specification of missing data, unbalanced designs, and different random error distributions are built into the package.
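The package described above is an R package; the Python sketch below only illustrates the Monte Carlo power-simulation idea for the simplest case, a single-slope regression. The function name and all settings are illustrative assumptions.

```python
# Monte Carlo power estimate for detecting a slope in simple
# linear regression: simulate data, fit OLS, count rejections.
import numpy as np

def simulate_power(n=50, beta=0.5, sigma=1.0, reps=2000, seed=0):
    """Estimate power to detect slope `beta` in y = beta*x + noise."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        y = beta * x + rng.normal(scale=sigma, size=n)
        # OLS slope and its standard error.
        sxx = ((x - x.mean()) ** 2).sum()
        b = ((x - x.mean()) * (y - y.mean())).sum() / sxx
        resid = y - y.mean() - b * (x - x.mean())
        se = np.sqrt((resid ** 2).sum() / (n - 2) / sxx)
        # Normal approximation to the t test (adequate for n = 50).
        if abs(b / se) > 1.96:
            hits += 1
    return hits / reps
```

Under the null (`beta=0.0`) the rejection rate should hover near the 5% level, which doubles as a check on the simulation itself.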

**27**
*Saturday*
May 2017

Posted in Documents

**Interpreting Blackbox Models via Model Extraction**

Interpretability has become an important issue as machine learning is increasingly used to inform consequential decisions. We propose an approach for interpreting a blackbox model by extracting a decision tree that approximates the model. Our model extraction algorithm avoids overfitting by leveraging blackbox model access to actively sample new training points. We prove that as the number of samples goes to infinity, the decision tree learned using our algorithm converges to the exact greedy decision tree. In our evaluation, we use our algorithm to interpret random forests and neural nets trained on several datasets from the UCI Machine Learning Repository, as well as control policies learned for three classical reinforcement learning problems. We show that our algorithm improves over a baseline based on CART on every problem instance. Furthermore, we show how an interpretation generated by our approach can be used to understand and debug these models.
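The core move in this line of work, querying the blackbox on freshly sampled points and fitting an interpretable surrogate to its answers, can be shown in a heavily simplified 1-D sketch. This is not the authors' algorithm; the stump fit, the uniform sampling, and all names are illustrative assumptions.

```python
# Toy model extraction: actively sample inputs, label them with the
# blackbox, and fit the best axis-aligned decision stump to those labels.
import numpy as np

def extract_stump(blackbox, low, high, n_samples=500, seed=0):
    """Fit the best single-threshold rule to blackbox labels on
    uniformly drawn 1-D samples from [low, high]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(low, high, size=n_samples)
    y = np.array([blackbox(v) for v in x])
    best = (None, None, -1.0)  # (threshold, label_right, accuracy)
    for t in np.unique(x):
        for label_right in (0, 1):
            pred = np.where(x >= t, label_right, 1 - label_right)
            acc = (pred == y).mean()
            if acc > best[2]:
                best = (t, label_right, acc)
    return best
```

With enough samples the recovered threshold brackets the blackbox's true decision boundary, which is the intuition behind the paper's convergence claim for the full greedy tree.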

**27**
*Saturday*
May 2017

Posted in Books
**27**
*Saturday*
May 2017

Posted in Magister Dixit

“Today, you are much less likely to face a scenario in which you cannot query data and get a response back in a brief period of time. Analytical processes that used to require months, days, or hours have been reduced to minutes, seconds, and fractions of seconds. But shorter processing times have led to higher expectations. Two years ago, many data analysts thought that generating a result from a query in less than 40 minutes was nothing short of miraculous. Today, they expect to see results in under a minute. That’s practically the speed of thought – you think of a query, you get a result, and you begin your experiment. “It’s about moving with greater speed toward previously unknown questions, defining new insights, and reducing the time between when an event happens somewhere in the world and someone responds or reacts to that event,” says Erickson. A rapidly emerging universe of newer technologies has dramatically reduced data processing cycle time, making it possible to explore and experiment with data in ways that would not have been practical or even possible a few years ago. Despite the availability of new tools and systems for handling massive amounts of data at incredible speeds, however, the real promise of advanced data analytics lies beyond the realm of pure technology. “Real-time big data isn’t just a process for storing petabytes or exabytes of data in a data warehouse,” says Michael Minelli, co-author of Big Data, Big Analytics. “It’s about the ability to make better decisions and take meaningful actions at the right time. It’s about detecting fraud while someone is swiping a credit card, or triggering an offer while a shopper is standing on a checkout line, or placing an ad on a website while someone is reading a specific article.
It’s about combining and analyzing data so you can take the right action, at the right time, and at the right place.” For some, real-time big data analytics (RTBDA) is a ticket to improved sales, higher profits and lower marketing costs. To others, it signals the dawn of a new era in which machines begin to think and respond more like humans.” Mike Barlow (2013)

**27**
*Saturday*
May 2017

Posted in What is ...

**Deep Rotation Equivariant Network (DREN)**

Recently, learning equivariant representations has attracted considerable research attention. Dieleman et al. introduce four operations which can be inserted into a CNN to learn deep representations equivariant to rotation. However, feature maps must be copied and rotated four times in each layer in their approach, which incurs substantial running-time and memory overhead. To address this problem, we propose the Deep Rotation Equivariant Network (DREN), consisting of cycle layers, isotonic layers and decycle layers. Our proposed layers apply rotation transformations to filters rather than feature maps, achieving a speed-up of more than 2 times with even less memory overhead. We evaluate DRENs on the Rotated MNIST and CIFAR-10 datasets and demonstrate that they can improve the performance of state-of-the-art architectures. Our code is released on GitHub. …

**Semantic Matching**

Semantic matching is a technique used in computer science to identify information which is semantically related. Given any two graph-like structures, e.g. classifications, taxonomies, database or XML schemas, and ontologies, matching is an operator which identifies those nodes in the two structures which semantically correspond to one another. For example, applied to file systems it can identify that a folder labeled “car” is semantically equivalent to another folder “automobile” because they are synonyms in English. This information can be taken from a linguistic resource like WordNet. Many semantic matching operators have been proposed in recent years. S-Match is an example of a semantic matching operator. It works on lightweight ontologies, namely graph structures where each node is labeled by a natural language sentence, for example in English. These sentences are translated into a formal logical formula (according to an artificial unambiguous language) codifying the meaning of the node, taking into account its position in the graph. For example, if the folder “car” is under another folder “red”, the meaning of the folder “car” becomes “red car”; this is translated into the logical formula “red AND car”. The output of S-Match is a set of semantic correspondences called mappings, each attached with one of the following semantic relations: disjointness (⊥), equivalence (≡), more specific (⊑) and less specific (⊒). In our example the algorithm will return a mapping between “car” and “automobile” attached with an equivalence relation. Information semantically matched can also be used as a measure of relevance through a mapping of near-term relationships. Such use of S-Match technology is prevalent in the career space, where it is used to gauge depth of skills through relational mapping of information found in applicant resumes.
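The four S-Match relations can be illustrated with a tiny matcher over label paths, using a hand-written synonym table in place of WordNet and set comparison in place of logical reasoning. The `relation` function, the `SYNONYMS` table, and the set-based rules are all simplifying assumptions; the real operator reasons over logical formulas.

```python
# Minimal S-Match-style relation between two nodes, each given as the
# path of labels from the root (e.g. ("red", "car")). A node whose
# normalized label set is a superset carries extra modifiers, so it is
# more specific (⊑); incomparable sets are treated as disjoint (⊥),
# which is a simplification of the real logic-based test.
SYNONYMS = {("car", "automobile"), ("automobile", "car")}

def relation(path_a, path_b):
    def norm(label):
        for a, b in SYNONYMS:
            if label == a:
                return min(a, b)  # canonical member of the synonym pair
        return label
    a = {norm(l) for l in path_a}
    b = {norm(l) for l in path_b}
    if a == b:
        return "≡"
    if a >= b:
        return "⊑"
    if a <= b:
        return "⊒"
    return "⊥"
```

So “car” under “red” maps to {red, car}, which is more specific than plain “automobile”, mirroring the “red car” example in the text.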
Semantic matching is a fundamental technique in many application areas such as resource discovery, data integration, data migration, query translation, peer-to-peer networks, agent communication, and schema and ontology merging. Its use is also being investigated in other areas such as event processing. In fact, it has been proposed as a valid solution to the semantic heterogeneity problem, namely managing the diversity in knowledge. Interoperability among people of different cultures and languages, having different viewpoints and using different terminology, has always been a huge problem. Especially with the advent of the Web and the consequent information explosion, the problem has become more pronounced. People face the concrete problem of retrieving, disambiguating and integrating information coming from a wide variety of sources. …

**Waterfall Plot**

A waterfall plot is a three-dimensional plot in which multiple curves of data, typically spectra, are displayed simultaneously. Typically the curves are staggered both across the screen and vertically, with ‘nearer’ curves masking the ones behind. The result is a series of ‘mountain’ shapes that appear to be side by side. The waterfall plot is often used to show how two-dimensional information changes over time or some other variable such as rpm. The term ‘waterfall plot’ is sometimes used interchangeably with ‘spectrogram’ or ‘Cumulative Spectral Decay’ (CSD) plot. …
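The staggering described above is just a per-curve offset in both axes. The sketch below builds the offset curves with NumPy and leaves the actual drawing (e.g. with matplotlib) out; the function name and offset defaults are illustrative assumptions.

```python
# Build the staggered (x, y) traces of a waterfall plot: each successive
# curve is shifted right by dx and up by dy, so 'nearer' curves can be
# drawn in front to mask the ones behind.
import numpy as np

def waterfall_curves(spectra, dx=0.5, dy=1.0):
    """Given a (n_curves, n_points) array, return a list of (x, y)
    pairs with per-curve offsets applied, nearest curve first."""
    spectra = np.asarray(spectra, dtype=float)
    n_curves, n_points = spectra.shape
    base_x = np.arange(n_points, dtype=float)
    curves = []
    for i in range(n_curves):
        curves.append((base_x + i * dx, spectra[i] + i * dy))
    return curves
```

Feeding in successive spectra of a signal over time (or rpm) and drawing the returned traces back-to-front gives the familiar side-by-side “mountain” shapes.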

**27**
*Saturday*
May 2017

Posted in arXiv Papers

**Unsupervised Learning Layers for Video Analysis**

This paper presents two unsupervised learning layers (UL layers) for label-free video analysis: one for fully connected layers, and the other for convolutional ones. The proposed UL layers can play two roles: they can be the cost function layer for providing a global training signal; meanwhile they can be added to any regular neural network layers for providing local training signals and combined with the training signals backpropagated from upper layers for extracting both slow and fast changing features at layers of different depths. Therefore, the UL layers can be used in either pure unsupervised or semi-supervised settings. Both a closed-form solution and an online learning algorithm for the two UL layers are provided. Experiments with unlabeled synthetic and real-world videos demonstrated that neural networks equipped with UL layers and trained with the proposed online learning algorithm can extract shape and motion information from video sequences of moving objects. The experiments demonstrated the potential applications of the UL layers and online learning algorithm to head orientation estimation and moving object localization.

**Proximity Variational Inference**

Variational inference is a powerful approach for approximate posterior inference. However, it is sensitive to initialization and can be subject to poor local optima. In this paper, we develop proximity variational inference (PVI). PVI is a new method for optimizing the variational objective that constrains subsequent iterates of the variational parameters to robustify the optimization path. Consequently, PVI is less sensitive to initialization and optimization quirks and finds better local optima. We demonstrate our method on three proximity statistics. We study PVI on a Bernoulli factor model and sigmoid belief network with both real and synthetic data and compare to deterministic annealing (Katahira et al., 2008). We highlight the flexibility of PVI by designing a proximity statistic for Bayesian deep learning models such as the variational autoencoder (Kingma and Welling, 2014; Rezende et al., 2014). Empirically, we show that PVI consistently finds better local optima and gives better predictive performance.
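The core mechanism in PVI, constraining each iterate to stay close to the previous one so the optimization path is robustified, can be mimicked in a toy gradient-ascent loop with a quadratic proximity penalty. This sketch is not the paper's objective or any of its proximity statistics; the function name and all settings are illustrative assumptions.

```python
# Gradient ascent where each step is pulled toward the previous
# iterate by a quadratic proximity penalty prox * ||x - x_prev||^2,
# damping erratic jumps along the optimization path.
import numpy as np

def proximity_ascent(grad, x0, lr=0.1, prox=1.0, steps=200):
    """Ascend an objective with gradient `grad`, proximity-penalized."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(steps):
        # Gradient of the penalized step objective at x.
        g = grad(x) - 2.0 * prox * (x - x_prev)
        x_prev, x = x, x + lr * g
    return x
```

On a concave toy objective the penalized iterates still reach the optimum, just along a smoother path, which is the intended effect.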

**Approximation and Convergence Properties of Generative Adversarial Learning**

Generative adversarial networks (GAN) approximate a target data distribution by jointly optimizing an objective function through a ‘two-player game’ between a generator and a discriminator. Despite their empirical success, however, two very basic questions on how well they can approximate the target distribution remain unanswered. First, it is not known how restricting the discriminator family affects the approximation quality. Second, while a number of different objective functions have been proposed, we do not understand when convergence to the global minima of the objective function leads to convergence to the target distribution under various notions of distributional convergence. In this paper, we address these questions in a broad and unified setting by defining a notion of adversarial divergences that includes a number of recently proposed objective functions. We show that if the objective function is an adversarial divergence with some additional conditions, then using a restricted discriminator family has a moment-matching effect. Additionally, we show that for objective functions that are strict adversarial divergences, convergence in the objective function implies weak convergence, thus generalizing previous results.

**Neural Decomposition of Time-Series Data for Effective Generalization**

We present a neural network technique for the analysis and extrapolation of time-series data called Neural Decomposition (ND). Units with a sinusoidal activation function are used to perform a Fourier-like decomposition of training samples into a sum of sinusoids, augmented by units with nonperiodic activation functions to capture linear trends and other nonperiodic components. We show how careful weight initialization can be combined with regularization to form a simple model that generalizes well. Our method generalizes effectively on the Mackey-Glass series, a dataset of unemployment rates as reported by the U.S. Department of Labor Statistics, a time-series of monthly international airline passengers, the monthly ozone concentration in downtown Los Angeles, and an unevenly sampled time-series of oxygen isotope measurements from a cave in north India. We find that ND outperforms popular time-series forecasting techniques including LSTM, echo state networks, ARIMA, SARIMA, SVR with a radial basis function, and Gashler and Ashmore’s model.
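A linear least-squares cousin of this decomposition regresses the series on sinusoids at fixed Fourier frequencies plus a linear trend, then extrapolates. ND learns frequencies and weights with careful initialization and regularization; this fixed-basis sketch (function name and defaults are assumptions) only conveys the sinusoids-plus-trend idea.

```python
# Fit y(t) ≈ bias + slope*t + Σ_k [a_k sin(w_k t) + b_k cos(w_k t)]
# with fixed Fourier frequencies w_k = 2πk/N, via ordinary least squares.
import numpy as np

def fit_sinusoid_trend(y, n_freqs=3):
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    cols = [np.ones(len(y)), t.astype(float)]  # bias + linear trend
    for k in range(1, n_freqs + 1):
        w = 2.0 * np.pi * k / len(y)
        cols += [np.sin(w * t), np.cos(w * t)]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)

    def predict(t_new):
        t_new = np.asarray(t_new, dtype=float)
        c = [np.ones_like(t_new), t_new]
        for k in range(1, n_freqs + 1):
            w = 2.0 * np.pi * k / len(y)
            c += [np.sin(w * t_new), np.cos(w * t_new)]
        return np.column_stack(c) @ coef

    return predict
```

Because the basis extends past the training range, `predict` extrapolates: the trend continues linearly and the sinusoids repeat, the same qualitative behavior ND exploits for forecasting.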

**Towards Consistency of Adversarial Training for Generative Models**

This work presents a rigorous statistical analysis of adversarial training for generative models, advancing recent work by Arjovsky and Bottou [2]. A key element is the distinction between the objective function with respect to the (unknown) data distribution, and its empirical counterpart. This yields a straight-forward explanation for common pathologies in practical adversarial training such as vanishing gradients. To overcome such issues, we pursue the idea of smoothing the Jensen-Shannon Divergence (JSD) by incorporating noise in the formulation of the discriminator. As we show, this effectively leads to an empirical version of the JSD in which the true and the generator densities are replaced by kernel density estimates. We analyze statistical consistency of this objective, and demonstrate its practical effectiveness.

**Neural Attribute Machines for Program Generation**

Recurrent neural networks have achieved remarkable success at generating sequences with complex structures, thanks to advances that include richer embeddings of input and cures for vanishing gradients. Trained only on sequences from a known grammar, though, they can still struggle to learn rules and constraints of the grammar. Neural Attribute Machines (NAMs) are equipped with a logical machine that represents the underlying grammar, which is used to teach the constraints to the neural machine by (i) augmenting the input sequence, and (ii) optimizing a custom loss function. Unlike traditional RNNs, NAMs are exposed to the grammar, as well as samples from the language of the grammar. During generation, NAMs make significantly fewer violations of the constraints of the underlying grammar than RNNs trained only on samples from the language of the grammar.
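The constraint-enforcement idea, letting the grammar veto illegal tokens so the generator only chooses among legal continuations, can be shown with a toy balanced-parentheses grammar. NAMs integrate the grammar far more deeply (attribute logic, input augmentation, a custom loss); the masking loop, names, and the stand-in scoring model below are all illustrative assumptions.

```python
# Grammar-masked generation for balanced parentheses of even length:
# at each step, only tokens the grammar allows are offered to the
# (stand-in) model, so the output can never violate the constraints.

def allowed(depth, remaining):
    """Legal next tokens given current nesting depth and tokens left."""
    toks = []
    if remaining >= depth + 2:  # room to open and still close everything
        toks.append("(")
    if depth > 0:               # an open paren can always be closed
        toks.append(")")
    return toks

def generate(model_scores, length):
    """Pick the highest-scoring grammar-legal token at each step."""
    out, depth = [], 0
    for step in range(length):
        legal = allowed(depth, length - step)
        tok = max(legal, key=lambda t: model_scores(step, t))
        out.append(tok)
        depth += 1 if tok == "(" else -1
    return "".join(out)
```

Even a model that always prefers "(" is forced by the mask to close in time, so every generated string is well formed, the property the paper measures as constraint violations.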

**Geometric Methods for Robust Data Analysis in High Dimension**

Machine learning and data analysis now find both scientific and industrial application in biology, chemistry, geology, medicine, and physics. These applications rely on large quantities of data gathered from automated sensors and user input. Furthermore, the dimensionality of many datasets is extreme: more details are being gathered about single user interactions or sensor readings. All of these applications encounter problems with a common theme: use observed data to make inferences about the world. Our work obtains the first provably efficient algorithms for Independent Component Analysis (ICA) in the presence of heavy-tailed data. The main tool in this result is the centroid body (a well-known topic in convex geometry), along with optimization and random walks for sampling from a convex body. This is the first algorithmic use of the centroid body and it is of independent theoretical interest, since it effectively replaces the estimation of covariance from samples, and is more generally accessible. We also use ICA as an algorithmic primitive for learning an intersection of halfspaces. This reduction relies on a non-linear transformation of samples from such an intersection of halfspaces (i.e. a simplex) to samples which are approximately from a linearly transformed product distribution. Through this transformation of samples, which can be done efficiently, one can then use an ICA algorithm to recover the vertices of the intersection of halfspaces. Finally, we again use ICA as an algorithmic primitive to construct an efficient solution to the widely-studied problem of learning the parameters of a Gaussian mixture model. Our algorithm again transforms samples from a Gaussian mixture model into samples which fit into the ICA model and, when processed by an ICA algorithm, result in recovery of the mixture parameters. Our algorithm is effective even when the number of Gaussians in the mixture grows polynomially with the ambient dimension.

**Who Will Share My Image? Predicting the Content Diffusion Path in Online Social Networks**

Content popularity prediction has been extensively studied due to its importance and interest for both users and hosts of social media sites like Facebook, Instagram, Twitter, and Pinterest. However, existing work mainly focuses on modeling popularity using a single metric such as the total number of likes or shares. In this work, we propose Diffusion-LSTM, a memory-based deep recurrent network that learns to recursively predict the entire diffusion path of an image through a social network. By combining user social features and image features, and encoding the diffusion path taken thus far with an explicit memory cell, our model predicts the diffusion path of an image more accurately compared to alternate baselines that either encode only image or social features, or lack memory. By mapping individual users to user prototypes, our model can generalize to new users not seen during training. Finally, we demonstrate our model’s capability of generating diffusion trees, and show that the generated trees closely resemble ground-truth trees.

**Implicit Regularization in Matrix Factorization**

We study implicit regularization when optimizing an underdetermined quadratic objective over a matrix X with gradient descent on a factorization of X. We conjecture and provide empirical and theoretical evidence that with small enough step sizes and initialization close enough to the origin, gradient descent on a full dimensional factorization converges to the minimum nuclear norm solution.
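The phenomenon is easy to reproduce numerically: satisfy a single linear measurement of X = U Uᵀ by gradient descent on U from a near-zero initialization, and the solution concentrates its nuclear norm in the minimal way. The function name, the toy measurement, and all settings below are illustrative assumptions, not the paper's experiments.

```python
# Gradient descent on U for the objective (⟨A, U U^T⟩ - b)^2.
# Started near the origin with a small step size, the iterates
# converge toward the minimum nuclear norm X satisfying ⟨A, X⟩ = b.
import numpy as np

def factored_gd(A, b, n, steps=5000, lr=0.01, init_scale=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    U = init_scale * rng.standard_normal((n, n))
    for _ in range(steps):
        X = U @ U.T
        r = np.sum(A * X) - b      # residual of the linear measurement
        U -= lr * 2.0 * r * (A + A.T) @ U  # chain rule through U U^T
    return U @ U.T
```

With A = diag(1, 0) and b = 1, any X with X₀₀ = 1 fits the measurement, but the implicit bias picks out the rank-one solution diag(1, 0), whose nuclear norm (= trace, since X is PSD) is 1.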

• Consistent Kernel Density Estimation with Non-Vanishing Bandwidth

• Exploring the Regularity of Sparse Structure in Convolutional Neural Networks

• Attention-based Natural Language Person Retrieval

• Counterfactual Multi-Agent Policy Gradients

• Compiling Quantum Circuits to Realistic Hardware Architectures using Temporal Planners

• Adaptive Estimation of High Dimensional Partially Linear Model

• Doubly Stochastic Variational Inference for Deep Gaussian Processes

• Visual Servoing from Deep Neural Networks

• Dual Dynamic Programming with cut selection: convergence proof and numerical experiments

• Joint PoS Tagging and Stemming for Agglutinative Languages

• Novel Deep Convolution Neural Network Applied to MRI Cardiac Segmentation

• Deep Voice 2: Multi-Speaker Neural Text-to-Speech

• New Results for Provable Dynamic Robust PCA

• Efficient, Safe, and Probably Approximately Complete Learning of Action Models

• Sampling from a log-concave distribution with compact support with proximal Langevin Monte Carlo

• Communication vs Distributed Computation: an alternative trade-off curve

• Logic Tensor Networks for Semantic Image Interpretation

• Optimal Cooperative Inference

• Cultural Diffusion and Trends in Facebook Photographs

• The Onsager-Machlup functional associated with additive fractional noise

• Multicut decomposition methods with cut selection for multistage stochastic programs

• Automatic sequences and generalised polynomials

• Modeling The Intensity Function Of Point Process Via Recurrent Neural Networks

• Plug-and-Play Unplugged: Optimization Free Reconstruction using Consensus Equilibrium

• The Dual Graph Shift Operator: Identifying the Support of the Frequency Domain

• Matroids Hitting Sets and Unsupervised Dependency Grammar Induction

• State Space Decomposition and Subgoal Creation for Transfer in Deep Reinforcement Learning

• Large induced subgraphs with $k$ vertices of almost maximum degree

• Extraction and Classification of Diving Clips from Continuous Video Footage

• Principled Hybrids of Generative and Discriminative Domain Adaptation

• The tessellation problem of quantum walks

• Learning to Pour

• Spectrum Sharing and Cyclical Multiple Access in UAV-Aided Cellular Offloading

• Online Edge Grafting for Efficient MRF Structure Learning

• Fast Causal Inference with Non-Random Missingness by Test-Wise Deletion

• Lat-Net: Compressing Lattice Boltzmann Flow Simulations using Deep Neural Networks

• Deriving Neural Architectures from Sequence and Graph Kernels

• A Conic Integer Programming Approach to Constrained Assortment Optimization under the Mixed Multinomial Logit Model

• Energy-Efficient Multi-Pair Two-Way AF Full-Duplex Massive MIMO Relaying

• Cross-Domain Perceptual Reward Functions

• Expectation Propagation for t-Exponential Family Using Q-Algebra

• Convergence of Langevin MCMC in KL-divergence

• A Clustering-based Consistency Adaptation Strategy for Distributed SDN Controllers

• Weakly Supervised Semantic Segmentation Based on Co-segmentation

• Circular law for the sum of random permutation matrices

• Max-Cosine Matching Based Neural Models for Recognizing Textual Entailment

• The cost of fairness in classification

• Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent

• A Spatial Branch-and-Cut Method for Nonconvex QCQP with Bounded Complex Variables

• An Empirical Analysis of Approximation Algorithms for the Euclidean Traveling Salesman Problem

• Vector Transport-Free SVRG with General Retraction for Riemannian Optimization: Complexity Analysis and Practical Implementation

• Triangle Finding and Listing in CONGEST Networks

• MagNet: a Two-Pronged Defense against Adversarial Examples

• Gaps between avalanches in 1D Random Field Ising Models

• Load Balancing for Skewed Streams on Heterogeneous Cluster

• Wireless Powered Communications with Finite Battery and Finite Blocklength

• Port-Hamiltonian descriptor systems

• Dynamic degree-corrected blockmodels for social networks: a nonparametric approach

• Performance Optimization of Co-Existing Underlay Secondary Networks

• Recent progress in many-body localization

• SLAM based Quasi Dense Reconstruction For Minimally Invasive Surgery Scenes

• A matrix-based method of moments for fitting multivariate network meta-analysis models with multiple outcomes and random inconsistency effects

• The structure of delta-matroids with width one twists

• Topology Induced Oscillations in Majorana Fermions in a Quasiperiodic Superconducting Chain

• First-spike based visual categorization using reward-modulated STDP

• Deep image representations using caption generators

• Distributionally Robust Optimisation in Congestion Control

• Cut-norm and entropy minimization over weak* limits

• Boolean dimension and local dimension

• Shorter stabilizer circuits via Bruhat decomposition and quantum circuit transformations

• On the (parameterized) complexity of recognizing well-covered (r,l)-graphs

• Investigation of Using VAE for i-Vector Speaker Verification

• Jointly Learning Sentence Embeddings and Syntax with Unsupervised Tree-LSTMs

• Classification of Quantitative Light-Induced Fluorescence Images Using Convolutional Neural Network

• Firing rate equations require a spike synchrony mechanism to correctly describe fast oscillations in inhibitory networks

• Learning Structured Text Representations

• A simplicial decomposition framework for large scale convex quadratic programming

• Hypergeometric and basic hypergeometric series and integrals associated with root systems

• Geometry of time-reversible group-based models

• Asynchronous Parallel Bayesian Optimisation via Thompson Sampling

• GSplit LBI: Taming the Procedural Bias in Neuroimaging for Disease Prediction

• Arrangements of homothets of a convex body II

• On the Cauchy problem for integro-differential equations in the scale of spaces of generalized smoothness

• Quantum-secured blockchain

• Entanglement properties of quantum grid states

• Flux-dependent localisation in a disordered flat-band lattice

• Is Our Model for Contention Resolution Wrong?

• Filtering Variational Objectives

• Gated XNOR Networks: Deep Neural Networks with Ternary Weights and Activations under a Unified Discretization Framework

**27**
*Saturday*
May 2017

Posted in Documents

**Living Together: Mind and Machine Intelligence**

In this paper we consider the nature of the machine intelligences we have created in the context of our human intelligence. We suggest that the fundamental difference between human and machine intelligence comes down to *embodiment factors*. We define embodiment factors as the ratio between an entity’s ability to communicate information versus compute information. We speculate on the role of embodiment factors in driving our own intelligence and consciousness. We briefly review dual process models of cognition and cast machine intelligence within that framework, characterising it as a dominant System Zero, which can drive behaviour through interfacing with us subconsciously. Driven by concerns about the consequences of such a system, we suggest prophylactic courses of action that could be considered. Our main conclusion is that it is *not* sentient intelligence we should fear but *non-sentient* intelligence.

**27**
*Saturday*
May 2017

Posted in Books