Distilled News

Building an efficient neural language model over a billion words

Neural networks designed for sequence prediction have recently gained renewed interest by achieving state-of-the-art performance across areas such as speech recognition, machine translation, and language modeling. However, these models are quite computationally demanding, which in turn can limit their application. In the area of language modeling, recent advances have been made by leveraging massively large models that could only be trained on a large GPU cluster for weeks at a time. While impressive, these processing-intensive practices favor exploration on large computational infrastructures that are typically too expensive for academic environments and impractical in a production setting, limiting the speed of research, the reproducibility of results, and their usability.

Random forest interpretation – conditional feature contributions

In two of my previous blog posts, I explained how the black box of a random forest can be opened up by tracking decision paths along the trees and computing feature contributions. This way, any prediction can be decomposed into contributions from features, such that prediction = bias + feature 1 contribution + … + feature n contribution. However, this linear breakdown is inherently imperfect, since a linear combination of features cannot capture interactions between them.
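
The decomposition can be reproduced for a single scikit-learn decision tree in a few lines; this is a from-scratch sketch of the idea (the treeinterpreter package implements it properly for full random forests), with the dataset and tree depth chosen arbitrarily:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)                 # any regression dataset works
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

def feature_contributions(tree, x):
    """Decompose one tree's prediction into bias + per-feature contributions."""
    t = tree.tree_
    node = 0
    bias = t.value[0][0][0]                           # mean target at the root
    contrib = np.zeros(x.shape[0])
    while t.children_left[node] != -1:                # walk down until a leaf
        feat = t.feature[node]
        nxt = (t.children_left[node] if x[feat] <= t.threshold[node]
               else t.children_right[node])
        # attribute the change in node value to the feature that split here
        contrib[feat] += t.value[nxt][0][0] - t.value[node][0][0]
        node = nxt
    return bias, contrib

bias, contrib = feature_contributions(tree, X[0])
pred = tree.predict(X[:1])[0]                         # equals bias + contrib.sum()
```

The telescoping sum of value changes along the path guarantees the contributions add up exactly to the prediction, which is the identity the post relies on.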

Clustering: A Guide for the Perplexed

Finding clusters is a powerful tool for understanding and exploring data. While the task sounds easy, it can be surprisingly difficult to do well, and most standard clustering algorithms can, and do, produce very poor clustering results in many cases. Our intuitions for what a cluster is are not as clear as we would like, and can easily be led astray. We will attempt to find a definition of clustering that makes sense for most cases, and introduce an algorithm for finding such clusters, along with a high-performance Python implementation of the algorithm, building up more intuition for what clustering really means as we go.
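
As a concrete taste of density-based clustering, the kind of approach the article builds toward, here is scikit-learn's DBSCAN recovering two crescent-shaped clusters that centroid-based methods typically split incorrectly (the dataset and parameters below are illustrative, not recommendations):

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-moons: no centroid-based partition separates them cleanly.
X, _ = make_moons(n_samples=200, noise=0.02, random_state=0)

labels = DBSCAN(eps=0.25, min_samples=5).fit_predict(X)
# DBSCAN labels outliers -1; count only the real clusters.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

Density-based methods also get to say "this point belongs to no cluster", which is part of what a sensible definition of clustering should allow.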


Simplified implementation of “Convolutional Neural Networks for Sentence Classification” paper

Awesome TensorFlow

A curated list of awesome TensorFlow experiments, libraries, and projects. Inspired by awesome-machine-learning.

Accurately Measuring Model Prediction Error

When assessing the quality of a model, being able to accurately measure its prediction error is of key importance. Often, however, techniques of measuring error are used that give grossly misleading results. This can lead to the phenomenon of over-fitting, where a model may fit the training data very well but do a poor job of predicting results for new data not used in model training. Here is an overview of methods to accurately measure model prediction error.
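
A minimal illustration of the gap between training error and an honestly estimated prediction error, using cross-validation on synthetic data (shapes, noise levels, and the model choice are all arbitrary):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + rng.normal(scale=1.0, size=200)    # only feature 0 matters, plus noise

model = DecisionTreeRegressor(random_state=0).fit(X, y)
train_r2 = model.score(X, y)                     # scored on training data: optimistic
cv_r2 = cross_val_score(DecisionTreeRegressor(random_state=0), X, y, cv=5).mean()
```

An unpruned tree memorizes the training set (near-perfect training R²) while cross-validation reveals how poorly it generalizes; that gap is exactly the misleading-measurement trap the article warns about.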

Intro to Implicit Matrix Factorization: Classic ALS with Sketchfab Models

Last post I described how I collected implicit feedback data from the website Sketchfab. I then claimed I would write about how to actually build a recommendation system with this data. Well, here we are! Let’s build. I think the best place to start when looking into implicit feedback recommenders is with the model outlined in the classic paper ‘Collaborative Filtering for Implicit Feedback Datasets’ by Koren et al. (warning: pdf link). I have seen many names in the literature and machine learning libraries for this model. I’ll call it Weighted Regularized Matrix Factorization (WRMF), which tends to be the name used fairly often. WRMF is like the classic rock of implicit matrix factorization. It may not be the trendiest, but it will never go out of style. And every time I use it, I know that I’m guaranteed to like what I get out. Specifically, this model makes reasonable intuitive sense, it’s scalable, and, most importantly, I’ve found it easy to tune. There are far fewer hyperparameters than in, say, stochastic gradient descent models.
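
The WRMF objective is Σ c_ui (p_ui − x_uᵀ y_i)² + λ(Σ‖x_u‖² + Σ‖y_i‖²), with binary preference p_ui and confidence c_ui = 1 + α r_ui. A dense, deliberately naive alternating-least-squares sketch (the sizes and hyperparameters below are made up; a real implementation exploits sparsity):

```python
import numpy as np

def wrmf(R, factors=5, alpha=40.0, lam=0.1, iters=10, seed=0):
    """Naive dense ALS for weighted regularized matrix factorization.
    R: nonnegative interaction counts, shape (users, items)."""
    rng = np.random.default_rng(seed)
    n_u, n_i = R.shape
    P = (R > 0).astype(float)            # binary preference
    C = 1.0 + alpha * R                  # confidence weights
    X = rng.normal(scale=0.1, size=(n_u, factors))
    Y = rng.normal(scale=0.1, size=(n_i, factors))
    I = lam * np.eye(factors)
    for _ in range(iters):
        for u in range(n_u):             # solve each user's ridge problem exactly
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + I, Y.T @ Cu @ P[u])
        for i in range(n_i):             # then each item's
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + I, X.T @ Ci @ P[:, i])
    return X, Y

def loss(R, X, Y, alpha=40.0, lam=0.1):
    P = (R > 0).astype(float)
    C = 1.0 + alpha * R
    E = P - X @ Y.T
    return (C * E ** 2).sum() + lam * ((X ** 2).sum() + (Y ** 2).sum())

rng = np.random.default_rng(1)
R = rng.integers(0, 3, size=(6, 5)).astype(float)   # toy interaction counts
X0, Y0 = wrmf(R, iters=0)                           # initial factors
X, Y = wrmf(R, iters=10)                            # trained factors
```

Because each alternating solve minimizes its quadratic subproblem exactly, every sweep can only decrease the objective, which is part of why the model is so easy to tune.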

Workflow in R

This came up recently on StackOverflow. One of the answers was particularly helpful and I thought it might be worth mentioning here. The idea presented there is to break the code into four files, all stored in your project directory. These four files are to be processed in the following order. …

pandasql: Make python speak SQL

One of my favorite things about Python is that its users get the benefit of observing the R community and then emulating the best parts of it. I’m a big believer that a language is only as helpful as its libraries and tools. This post is about pandasql, a Python package we (Yhat) wrote that emulates the R package sqldf. It’s a small but mighty library comprising just 358 lines of code. The idea of pandasql is to make Python speak SQL. For those of you who come from a SQL-first background or still ‘think in SQL’, pandasql is a nice way to take advantage of the strengths of both languages.
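
pandasql exposes a `sqldf(query, env)` function that runs SQL against your DataFrames; roughly speaking, it routes the frame through an in-memory SQLite database. That round trip can be sketched with the standard library alone (table and column names here are made up):

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "value": [1, 2, 3]})

# Push the DataFrame into an in-memory SQLite database, then query it with SQL.
con = sqlite3.connect(":memory:")
df.to_sql("df", con, index=False)
result = pd.read_sql_query("SELECT name, value FROM df WHERE value > 1", con)
```

With pandasql itself, the same query is a one-liner over the frames in scope, with no explicit connection management.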

Will Google NL kill the market? Linguistic APIs review

Google has recently launched a new beta product: the Google Cloud Natural Language API. As a rule, any new product by Google sends the company’s rivals into a slight panic. What should the major players in the Natural Language area expect? Does it mean the end of free competition in the linguistic API market and a dictatorship by “The good corporation”?

Predictive Modeling Foundation (binary outcome)

The future is undoubtedly attached to uncertainty, and this uncertainty can be estimated.

Recurrent Neural Network Gradients, and Lessons Learned Therein

I’ve spent the last week hand-rolling recurrent neural networks. I’m currently taking Udacity’s Deep Learning course, and, arriving at the section on RNNs and LSTMs, I decided to build a few for myself.
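
A minimal taste of what hand-rolling those gradients looks like: backpropagating through a vanilla tanh RNN and watching the gradient norm decay step by step (the weights and inputs are random placeholders, not anything from the course):

```python
import numpy as np

# Tiny backprop-through-time sketch: push a gradient back through T tanh steps
# and watch its norm shrink when the recurrent weights are small (vanishing gradients).
rng = np.random.default_rng(0)
T, d = 50, 8
W = rng.normal(scale=0.1, size=(d, d))        # small weights -> contraction each step

h, hs = np.zeros(d), []
for t in range(T):                            # forward pass with random "inputs"
    h = np.tanh(W @ h + rng.normal(size=d))
    hs.append(h)

grad, norms = np.ones(d), []
for t in reversed(range(T)):                  # backward pass (BPTT)
    grad = W.T @ ((1 - hs[t] ** 2) * grad)    # chain rule through tanh, then W
    norms.append(np.linalg.norm(grad))
```

Each backward step multiplies the gradient by Wᵀ and the tanh derivative (both contractions here), so the norm decays geometrically; with large weights the same recursion explodes instead, which is the core lesson of hand-deriving RNN gradients.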

bayesAB 0.7.0 + a Primer on Priors: Choosing Priors for Bayesian AB Testing using bayesAB

Most questions I’ve gotten since I released bayesAB have been along the lines of:
• Why/how is Bayesian AB testing better than Frequentist hypothesis AB testing?
• Why do I need priors?
• Do I really really really need priors?
• How do I choose priors?
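
bayesAB itself is an R package; the role a prior plays can be sketched in Python with the conjugate Beta-Bernoulli update it relies on for conversion-rate tests (all counts below are invented):

```python
import numpy as np

# Beta-Bernoulli sketch of Bayesian A/B testing.
# The Beta(a, b) prior encodes a belief of roughly a/(a+b) conversion rate,
# worth a+b "pseudo-trials" of evidence.
rng = np.random.default_rng(0)
a, b = 10, 40                          # prior: ~20% conversion, 50 pseudo-trials
conv_A, n_A = 60, 250                  # observed conversions / trials, variant A
conv_B, n_B = 45, 250                  # variant B

# Conjugacy: posterior is Beta(prior + successes, prior + failures).
post_A = rng.beta(a + conv_A, b + n_A - conv_A, size=100_000)
post_B = rng.beta(a + conv_B, b + n_B - conv_B, size=100_000)
prob_A_beats_B = (post_A > post_B).mean()
```

In bayesAB the same comparison is run in R via `bayesTest` with `distribution = 'bernoulli'` and the Beta prior supplied through its `priors` argument; the choice of a and b is exactly the "how do I choose priors?" question.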

What’s new on arXiv

Learning to Reason With Adaptive Computation

Multi-hop inference is necessary for machine learning systems to successfully solve tasks such as Recognising Textual Entailment and Machine Reading. In this work, we demonstrate the effectiveness of adaptive computation for learning the number of inference steps required for examples of different complexity, and show that learning the correct number of inference steps is difficult. We introduce the first model involving Adaptive Computation Time which provides a small performance benefit on top of a similar model without an adaptive component, as well as enabling considerable insight into the reasoning process of the model.

A New Class of Private Chi-Square Tests

In this paper, we develop new test statistics for private hypothesis testing. These statistics are designed specifically so that their asymptotic distributions, after accounting for noise added for privacy concerns, match the asymptotics of the classical (non-private) chi-square tests for testing if the multinomial data parameters lie in lower dimensional manifolds (examples include goodness of fit and independence testing). Empirically, these new test statistics outperform prior work, which focused on noisy versions of existing statistics.

A Framework for Network AB Testing

A/B testing, also known as controlled experimentation, bucket testing, or split testing, has been widely used for evaluating a new feature, service, or product in the data-driven decision processes of online websites. The goal of A/B testing is to estimate or test the difference between the treatment effects of the old and new variations. It is a well-studied two-sample comparison problem if each user’s response is influenced by her treatment only. However, in many applications of A/B testing, especially those in Yahoo’s HIVE and other social networks such as those of Microsoft, Facebook, LinkedIn, Twitter, and Google, users influence their friends via underlying social interactions, and the conventional A/B testing methods fail to work. This paper considers the network A/B testing problem and provides a general framework consisting of five steps: data sampling, probabilistic modeling, parameter inference, computation of the average treatment effect, and hypothesis testing. The framework performs well for network A/B testing in simulation studies.

A Bayesian Ensemble for Unsupervised Anomaly Detection

Methods for unsupervised anomaly detection suffer from the fact that the data is unlabeled, making it difficult to assess the optimality of detection algorithms. Ensemble learning has shown exceptional results in classification and clustering problems, but has not seen as much research in the context of outlier detection. Existing methods focus on combining output scores of individual detectors, but this leads to outputs that are not easily interpretable. In this paper, we introduce a theoretical foundation for combining individual detectors with Bayesian classifier combination. Not only are posterior distributions easily interpreted as the probability distribution of anomalies, but bias, variance, and individual error rates of detectors are all easily obtained. Performance on real-world datasets shows high accuracy across varied types of time series data.

The Markov Memory for Generating Rare Events

We classify the rare events of structured, memoryful stochastic processes and use this to analyze sequential and parallel generators for these events. Given a stochastic process, we introduce a method to construct a new process whose typical realizations are a given process’ rare events. This leads to an expression for the minimum memory required to generate rare events. We then show that the recently discovered classical-quantum ambiguity of simplicity also occurs when comparing the structure of process fluctuations.

Scalable Dynamic Topic Modeling with Clustered Latent Dirichlet Allocation (CLDA)

Topic modeling is an increasingly important component of Big Data analytics, enabling the sense-making of highly dynamic and diverse streams of text data. Traditional methods such as Dynamic Topic Modeling (DTM), while mathematically elegant, do not lend themselves well to direct parallelization because of dependencies from one time step to another. Data decomposition approaches that partition data across time segments and then combine results in a global view of the dynamic change of topics enable execution of topic models on much larger datasets than is possible without data decomposition. However, these methods are difficult to analyze mathematically and are relatively untested for quality of topics and performance on parallel systems. In this paper, we introduce and empirically analyze Clustered Latent Dirichlet Allocation (CLDA), a method for extracting dynamic latent topics from a collection of documents. CLDA uses a data decomposition strategy to partition data. CLDA takes advantage of parallelism, enabling fast execution for even very large datasets and a large number of topics. A large corpus is split into local segments to extract textual information from different time steps. Latent Dirichlet Allocation (LDA) is applied to infer topics at local segments. The results are merged, and clustering is used to combine topics from different segments into global topics. Results show that the perplexity is comparable and that topics generated by this algorithm are similar to those generated by DTM. In addition, CLDA is two orders of magnitude faster than existing approaches and allows for more freedom of experiment design. In this paper CLDA is applied successfully to seventeen years of NIPS conference papers, seventeen years of computer science journal abstracts, and to forty years of the PubMed corpus.
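
A toy version of the CLDA pipeline (segment, run LDA locally, then cluster local topics into global ones) can be sketched with scikit-learn; the corpus, segment split, and topic counts below are all invented:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

# Two invented "time segments" of a toy corpus mixing pets and finance.
segments = [
    ["cats dogs pets animals", "dogs cats fur pets",
     "stocks market trading money", "market money stocks"],
    ["pets animals cats dogs", "trading stocks money market",
     "money market trading", "cats pets dogs"],
]
vec = CountVectorizer()
vec.fit([doc for seg in segments for doc in seg])     # shared vocabulary

local_topics = []
for seg in segments:                                  # LDA on each segment separately
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(vec.transform(seg))
    # normalize each row into a word distribution for one local topic
    local_topics.append(lda.components_ / lda.components_.sum(axis=1, keepdims=True))

topics = np.vstack(local_topics)                      # 4 local topics in total
global_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(topics)
```

The per-segment LDA fits are embarrassingly parallel, and only the cheap final clustering step touches all segments, which is the source of CLDA's speedup over sequentially coupled DTM.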

Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

Machine Learning has been a big success story during the AI resurgence. One particular standout success relates to unsupervised learning from a massive amount of data, albeit much of it relates to one modality/type of data at a time. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition of utilizing knowledge whenever it is available or can be created purposefully. In this paper, we focus on discussing the indispensable role of knowledge for deeper understanding of complex text and multimodal data in situations where (i) large amounts of training data (labeled/unlabeled) are not available or are labor intensive to create, (ii) the objects (particularly text) to be recognized are complex (i.e., beyond simple entity-person/location/organization names), such as implicit entities and highly subjective content, and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create knowledge, varying from comprehensive or cross domain to domain or application specific, and (b) carefully exploit the knowledge to further empower or extend the applications of ML/NLP techniques. Using the early results in several diverse situations – both in data types and applications – we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data.

PATH: Person Authentication using Trace Histories

In this paper, the problem of Active Authentication using trace histories is addressed. Specifically, the task is to perform user verification on mobile devices using historical location traces of the user as a function of time. Considering the movement of a human as a Markovian motion, a modified Hidden Markov Model (HMM)-based solution is proposed. The proposed method, namely the Marginally Smoothed HMM (MSHMM), utilizes the marginal probabilities of location and timing information of the observations to smooth out the emission probabilities while training. Hence, it can efficiently handle unforeseen observations during the test phase. The verification performance of this method is compared to a sequence matching (SM) method, a Markov Chain-based method (MC), and an HMM with basic Laplace smoothing (HMM-lap). Experimental results using the location information of the UMD Active Authentication Dataset-02 (UMDAA02) and the GeoLife dataset are presented. The proposed MSHMM method outperforms the compared methods in terms of equal error rate (EER). Additionally, the effects of different parameters on the proposed method are discussed.

Artificial Intelligence Safety and Cybersecurity: a Timeline of AI Failures

In this work, we present and analyze reported failures of artificially intelligent systems and extrapolate our analysis to future AIs. We suggest that both the frequency and the seriousness of future AI failures will steadily increase. AI Safety can be improved based on ideas developed by cybersecurity experts. For narrow AIs, safety failures are at the same, moderate level of criticality as in cybersecurity; for general AI, however, failures have a fundamentally different impact. A single failure of a superintelligent system may cause a catastrophic event without a chance for recovery. The goal of cybersecurity is to reduce the number of successful attacks on the system; the goal of AI Safety is to make sure zero attacks succeed in bypassing the safety mechanisms. Unfortunately, such a level of performance is unachievable. Every security system will eventually fail; there is no such thing as a 100% secure system.

Floquet symmetry-protected topological phases in cold atomic systems

A Learned Representation For Artistic Style

A Mathematical Model for Fingerprinting-based Localization Algorithms

Online and Random-order Load Balancing Simultaneously

Discrimination power of a quantum detector

A Theoretical Analysis of Noisy Sparse Subspace Clustering on Dimensionality-Reduced Data

UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation

A robust quantitative local central limit theorem with applications to enumerative combinatorics and random combinatorial structures

Predicting Counterfactuals from Large Historical Data and Small Randomized Trials

Wirelessly Powered Communication Networks with Short Packets

Surprisal-Driven Zoneout

Counting Zeros of Cosine Polynomials: On a Problem of Littlewood

Combined Hypothesis Testing on Graphs with Applications to Gene Set Enrichment Analysis

Exploratory Analysis of High Dimensional Time Series with Applications to Multichannel Electroencephalograms

Co-Occuring Directions Sketching for Approximate Matrix Multiply

Operational calculus on programming spaces and generalized tensor networks

Numerical simulations of Ising spin glasses with free boundary conditions: the role of droplet excitations and domain walls

Blind Detection for MIMO Systems With Low-Resolution ADCs Using Supervised Learning

Embracing the Blessing of Dimensionality in Factor Models

Singular SDEs with critical non-local and non-symmetric Lévy type generator

Proceedings of the First International Workshop on Formal Methods for and on the Cloud

A three-person chess-like game without Nash equilibria

EmojiNet: Building a Machine Readable Sense Inventory for Emoji

probitfe and logitfe: Bias corrections for probit and logit models with two-way fixed effects

Distributed and parallel time series feature extraction for industrial big data applications

Bias-Aware Sketches

New results for traitor tracing schemes

Sparse Hierarchical Tucker Factorization and its Application to Healthcare

Matroidal Structure of Skew Polynomial Rings with Application to Network Coding

Solving the Dual Problems of Dynamic Programs via Regression

Delocalization of a $(1+1)$-dimensional stochastic wave equation

Camera Fingerprint: A New Perspective for Identifying User’s Identity

Derandomization for k-submodular maximization

Approximate cross-validation formula for Bayesian linear regression

A parallel framework for reverse search using mts

Construction of MDS self-dual codes from orthogonal matrices

Categorical Complexity

MIMO Multiway Distributed-Relay Channel with Full Data Exchange: An Achievable Rate Perspective

Balancing, Regression, Difference-In-Differences and Synthetic Control Methods: A Synthesis

Big Models for Big Data using Multi objective averaged one dependence estimators

A Novel Boundary Matching Algorithm for Video Temporal Error Concealment

Global rigidity of generic frameworks on concentric cylinders

Image Clustering without Ground Truth

Invariance principle for `push’ tagged particles for a Toom Interface

Hardness of approximation for strip packing

Online Submodular Maximization with Free Disposal: Randomization Beats 0.25 for Partition Matroids

Omnidirectional Space-Time Block Coding for Common Information Broadcasting in Massive MIMO Systems

On a restricted linear congruence

Subexponential parameterized algorithms for graphs of polynomial growth

MIMO Systems With Low-Resolution ADCs: Linear Coding Approach

Still not there? Comparing Traditional Sequence-to-Sequence Models to Encoder-Decoder Neural Networks on Monotone String Translation Tasks

Frank-Wolfe Algorithms for Saddle Point Problems

mdBrief – A Fast Online Adaptable, Distorted Binary Descriptor for Real-Time Applications Using Calibrated Wide-Angle Or Fisheye Cameras

How Document Pre-processing affects Keyphrase Extraction Performance

3-dimensional organic Dirac-line material due to non-symmorphic symmetry: a data mining approach

On the convergence rate of the three operator splitting scheme

Classification of crescent configurations

Characteristic polynomials of Linial arrangements for exceptional root systems

Improving historical spelling normalization with bi-directional LSTMs and multi-task learning

Bounding Average-energy Games

A hybrid approach for planning and operating active distribution grids

Robustness of mixing under rough isometry, via bottleneck sequences

Maxmin convolutional neural networks for image classification

Generalization Bounds for Weighted Automata

Paracontrolled quasilinear SPDEs

Some upper bounds for the signless Laplacian spectral radius of digraphs

On the distance of stabilizer quantum codes given by $J$-affine variety codes

Towards Modelling Pedestrian-Vehicle Interactions: Empirical Study on Urban Unsignalized Intersection

Counterfactual: An R Package for Counterfactual Analysis

Avoid or Follow? Modelling Route Choice Based on Experimental Empirical Evidences

A work-efficient parallel sparse matrix-sparse vector multiplication algorithm

Two remarks on even and oddtown problems

A pumping lemma for non-cooperative self-assembly

A model free energy for glasses

Stabilization of the water-wave equations with surface tension

Sequence Segmentation Using Joint RNN and Structured Prediction Models

Gini Covariance Matrix and its Affine Equivariant Version

Active User Authentication for Smartphones: A Challenge Data Set and Benchmark Results

Anatomically Constrained Video-CT Registration via the V-IMLOP Algorithm

Commuting involution graphs of linear groups

End-to-end Learning of Deep Visual Representations for Image Retrieval

Statistical Inference Based on a New Weighted Likelihood Approach

Wasserstein Stability of the Entropy Power Inequality for Log-Concave Densities

FFLV-type monomial bases for type $B$

Leaders and followers: Quantifying consistency in spatio-temporal propagation patterns

Perfect matchings and Hamiltonian cycles in the preferential attachment model

A note on dimers and T-graphs

Rapid Mixing of Hypergraph Independent Set

Statistical Machine Translation for Indian Languages: Mission Hindi 2

On Fractional Linear Network Coding Solution of Multiple-Unicast Networks

Improved Upper Bounds on Systematic-Length for Linear Minimum Storage Regenerating Codes

Efficient Polar Code Construction for Higher-Order Modulation

Parallelizable sparse inverse formulation Gaussian processes (SpInGP)

What’s new on arXiv

The K Shortest Paths Problem with Application to Routing

We present a simple algorithm for explicitly computing all k shortest paths bounded by length L from a fixed source to a target in O(m + kL) and O(m log m + kL) time for unweighted and weighted directed graphs with m edges, respectively. For many graphs, this outperforms existing algorithms by exploiting the fact that real-world networks have short average path length. Consequently, we would like to adapt our almost shortest paths algorithm to find an efficient solution to the almost shortest simple paths problem, where we exclude paths that visit any node more than once. To this end, we consider realizations from the Chung-Lu random graph model, as the Chung-Lu random graph model is not only amenable to analysis but also emulates many of the properties frequently observed in real-world networks, including the small-world phenomenon and degree heterogeneity. We provide theoretical and numeric evidence regarding the efficiency of utilizing our almost shortest paths algorithm to find almost shortest simple paths for Chung-Lu random graphs for a wide range of parameters. Finally, we consider a special application of our almost shortest paths algorithm to study internet routing (withdrawals) in the Autonomous System graph.
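
The paper's O(m + kL) algorithm is not reproduced here, but the problem it solves, enumerating all source-to-target walks of at most L edges (node revisits allowed, matching the non-simple setting), can be stated as a short brute-force sketch:

```python
from collections import deque

def paths_up_to_length(adj, s, t, L):
    """Enumerate all s->t walks of at most L edges in an unweighted digraph.
    adj maps each node to its list of out-neighbors."""
    out = []
    q = deque([(s, [s])])
    while q:
        node, path = q.popleft()
        if node == t:
            out.append(path)
        if len(path) - 1 < L:            # may still extend by another edge
            for nb in adj.get(node, []):
                q.append((nb, path + [nb]))
    return out

# Tiny diamond graph: two length-2 routes from 0 to 3.
adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
paths = paths_up_to_length(adj, 0, 3, 2)
```

This enumeration is exponential in the worst case; the paper's contribution is doing the same bounded-length enumeration in time linear in m and in the total output length kL.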

Automatic Identification of Sarcasm Target: An Introductory Approach

Past work in computational sarcasm deals primarily with sarcasm detection. In this paper, we introduce a novel, related problem: sarcasm target identification (i.e., extracting the target of ridicule in a sarcastic sentence). We present an introductory approach for sarcasm target identification. Our approach employs two types of extractors: one based on rules, and another consisting of a statistical classifier. To compare our approach, we use two baselines: a naïve baseline and another baseline based on work in sentiment target identification. We perform our experiments on book snippets and tweets, and show that our hybrid approach performs better than both baselines, and also better than either extractor used individually. Our introductory approach establishes the viability of sarcasm target identification, and will serve as a baseline for future work.

Independent Component Analysis by Entropy Maximization with Kernels

Independent component analysis (ICA) is the most popular method for blind source separation (BSS) with a diverse set of applications, such as biomedical signal processing, video and image analysis, and communications. Maximum likelihood (ML), an optimal theoretical framework for ICA, requires knowledge of the true underlying probability density function (PDF) of the latent sources, which, in many applications, is unknown. ICA algorithms cast in the ML framework often deviate from its theoretical optimality properties due to poor estimation of the source PDF. Therefore, accurate estimation of source PDFs is critical in order to avoid model mismatch and poor ICA performance. In this paper, we propose a new and efficient ICA algorithm based on entropy maximization with kernels, (ICA-EMK), which uses both global and local measuring functions as constraints to dynamically estimate the PDF of the sources with reasonable complexity. In addition, the new algorithm performs optimization with respect to each of the cost function gradient directions separately, enabling parallel implementations on multi-core computers. We demonstrate the superior performance of ICA-EMK over competing ICA algorithms using simulated as well as real-world data.
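
ICA-EMK itself is the paper's contribution; as a baseline illustration of the blind source separation setting it targets, here is a standard FastICA separation of two synthetic non-Gaussian sources (scikit-learn, with an arbitrary mixing matrix):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sign(np.sin(3 * t))              # square wave (sub-Gaussian)
s2 = rng.laplace(size=t.size)            # Laplace noise (super-Gaussian)
S = np.c_[s1, s2]                        # true sources, one per column

A = np.array([[1.0, 0.5],                # arbitrary mixing matrix
              [0.5, 1.0]])
X = S @ A.T                              # observed mixtures

S_hat = FastICA(n_components=2, random_state=0).fit_transform(X)

# Each recovered component should match one true source up to sign and scale.
corr = np.abs(np.corrcoef(S.T, S_hat.T))[:2, 2:]
```

The separation works here because both sources are clearly non-Gaussian; the paper's point is that when the source PDFs are unknown or awkward, a flexible entropy-based estimate of them (as in ICA-EMK) avoids the model mismatch a fixed nonlinearity can suffer.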

Online Classification with Complex Metrics

We present a framework and analysis of consistent binary classification for complex and non-decomposable performance metrics such as the F-measure and the Jaccard measure. The proposed framework is general, as it applies to both batch and online learning, and to both linear and non-linear models. Our work follows recent results showing that the Bayes optimal classifier for many complex metrics is given by a thresholding of the conditional probability of the positive class. This manuscript extends that thresholding characterization, showing that the utility is strictly locally quasi-concave with respect to the threshold for a wide range of models and performance metrics. This, in turn, motivates simple normalized gradient ascent updates for threshold estimation. We present a finite-sample regret analysis for the resulting procedure. In particular, the risk for the batch case converges to the Bayes risk at the same rate as that of the underlying conditional probability estimation, and the risk of the proposed online algorithm converges at a rate that depends on the conditional probability estimation risk. For instance, in the special case where the conditional probability model is logistic regression, our procedure achieves O(1/√n) sample complexity, both for batch and online training. Empirical evaluation shows that the proposed algorithms outperform alternatives in practice, with comparable or better prediction performance and reduced run time for various metrics and datasets.
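
The thresholding characterization can be seen in miniature: given conditional-probability scores, sweep thresholds and keep the one maximizing F1 (the scores below are synthetic; the paper proposes normalized gradient ascent rather than a grid sweep):

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
p = rng.uniform(size=1000)                       # stand-in for a model's P(y=1|x)
y = (rng.uniform(size=1000) < p).astype(int)     # labels drawn from those probabilities

# Sweep a grid of thresholds on the conditional probability and score each.
thresholds = np.linspace(0.05, 0.95, 19)
scores = [f1_score(y, (p >= th).astype(int)) for th in thresholds]
best = thresholds[int(np.argmax(scores))]
```

The quasi-concavity result is what justifies replacing a sweep like this with a cheap gradient-style update on the threshold alone, which is what makes the online setting tractable.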

How to be Fair and Diverse?

Due to the recent cases of algorithmic bias in data-driven decision-making, machine learning methods are being put under the microscope in order to understand the root cause of these biases and how to correct them. Here, we consider a basic algorithmic task that is central in machine learning: subsampling from a large data set. Subsamples are used both as an end-goal in data summarization (where fairness could either be a legal, political or moral requirement) and to train algorithms (where biases in the samples are often a source of bias in the resulting model). Consequently, there is a growing effort to modify either the subsampling methods or the algorithms themselves in order to ensure fairness. However, in doing so, a question that seems to be overlooked is whether it is possible to produce fair subsamples that are also adequately representative of the feature space of the data set – an important and classic requirement in machine learning. Can diversity and fairness be simultaneously ensured? We start by noting that, in some applications, guaranteeing one does not necessarily guarantee the other, and a new approach is required. Subsequently, we present an algorithmic framework which allows us to produce both fair and diverse samples. Our experimental results on an image summarization task show marked improvements in fairness without compromising feature diversity by much, giving us the best of both worlds.

Inertial Regularization and Selection (IRS): Sequential Regression in High-Dimension and Sparsity

In this paper, we develop a new sequential regression modeling approach for data streams. Data streams are commonly found around us; e.g., in a retail enterprise, sales data are continuously collected every day. A demand forecasting model is an important outcome from such data that needs to be continuously updated with the new incoming data. The main challenge in such modeling arises when there is (a) high dimensionality and sparsity, (b) a need for adaptive use of prior knowledge, and/or (c) structural change in the system. The proposed approach addresses these challenges by incorporating an adaptive L1-penalty and inertia terms in the loss function, and is thus called Inertial Regularization and Selection (IRS). The former term performs model selection to handle the first challenge, while the latter is shown to address the last two challenges. A recursive estimation algorithm is developed and shown to outperform commonly used state-space models, such as Kalman filters, in experimental studies and on real data.

Representation Learning with Deconvolution for Multivariate Time Series Classification and Visualization

We propose a new model based on deconvolutional networks and SAX discretization to learn representations for multivariate time series. Deconvolutional networks fully exploit the powerful expressiveness of deep neural networks in the manner of unsupervised learning. We design a network structure specifically to capture cross-channel correlation with deconvolution, forcing the pooling operation to perform dimension reduction along each position in the individual channel. Discretization based on Symbolic Aggregate Approximation (SAX) is applied to the feature vectors to further extract a bag of features. We show how this representation and bag of features help with classification. A full comparison with the sequence-distance-based approach is provided to demonstrate the effectiveness of our approach on standard datasets. We further build the Markov matrix from the discretized representation from the deconvolution to visualize the time series as complex networks, which show more class-specific statistical properties and clearer structures with respect to different labels.
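
The SAX discretization step the paper applies to its feature vectors can be sketched in a few lines of numpy (a toy version: alphabet size fixed at 4, with the standard-normal breakpoints hard-coded):

```python
import numpy as np

def sax(series, n_segments, alphabet="abcd"):
    """Toy SAX: z-normalize, piecewise-aggregate (PAA), then quantize with
    breakpoints splitting N(0,1) into equiprobable regions (hard-coded for size 4)."""
    x = (series - series.mean()) / series.std()
    # PAA: mean of each equal-width segment (assumes len(series) % n_segments == 0)
    paa = x.reshape(n_segments, -1).mean(axis=1)
    breakpoints = np.array([-0.6745, 0.0, 0.6745])   # quartiles of the standard normal
    return "".join(alphabet[i] for i in np.searchsorted(breakpoints, paa))

word = sax(np.arange(16.0), 4)    # a rising ramp maps to the symbols in order
```

Turning each window into a short symbolic word is what makes the subsequent bag-of-features counting possible.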

Large Scale Parallel Computations in R through Elemental

Even though in recent years the scale of statistical analysis problems has increased tremendously, many statistical software tools are still limited to single-node computations. However, statistical analyses are largely based on dense linear algebra operations, which have been deeply studied, optimized, and parallelized in the high-performance computing community. To make high-performance distributed computations available for statistical analysis, and thus enable large-scale statistical computations, we introduce RElem, an open source package that integrates the distributed dense linear algebra library Elemental into R. While on the one hand RElem provides direct wrappers of Elemental’s routines, on the other hand it overloads various operators and functions to provide an entirely native R experience for distributed computations. We showcase how simple it is to port existing R programs to RElem and demonstrate that RElem indeed makes it possible to scale beyond the single-node limitation of R with the full performance of Elemental without any overhead.

SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series

Similarity search on time series is a frequent operation in large-scale data-driven applications. Sophisticated similarity measures are standard for time series matching, as the series are usually misaligned. Dynamic Time Warping (DTW) is the most widely used similarity measure for time series because it combines alignment and matching at the same time. However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch-and-bound pruning strategies are commonly adopted. However, branch-and-bound pruning is only useful for very short queries (low-dimensional time series), and the bounds are quite weak for longer queries. Due to the loose bounds, the branch-and-bound pruning strategy boils down to a brute-force search. To circumvent this issue, we design SSH (Sketch, Shingle, & Hash), an efficient and approximate hashing scheme which is much faster than the state-of-the-art branch-and-bound search technique, the UCR suite. SSH uses a novel combination of sketching, shingling and hashing techniques to produce (probabilistic) indexes which align (near perfectly) with the DTW similarity measure. The generated indexes are then used to create hash buckets for sub-linear search. Our results show that SSH is very effective for longer time sequences and prunes around 95% of candidates, leading to a massive speedup in search with DTW. Empirical results on two large-scale benchmark time series datasets show that our proposed method can be around 20 times faster than the state-of-the-art package (the UCR suite) without any significant loss in accuracy.
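The sketch-shingle-hash pipeline can be illustrated with a toy univariate version. The window size, stride, shingle length and hash count below are arbitrary illustrative choices, and the binarizing filter is a plain random projection; the paper's actual construction is more elaborate:

```python
import numpy as np

def ssh_signature(series, window=8, step=4, n_hashes=16, seed=0):
    """Toy Sketch-Shingle-Hash: binarize sliding windows by the sign of
    their dot product with a random filter (the sketch), collect n-grams
    of the resulting bit string (the shingles), and MinHash the shingle set."""
    rng = np.random.default_rng(seed)
    filt = rng.standard_normal(window)
    x = np.asarray(series, dtype=float)
    bits = "".join("1" if x[i:i + window] @ filt >= 0 else "0"
                   for i in range(0, len(x) - window + 1, step))
    shingles = {int(bits[i:i + 3], 2) for i in range(len(bits) - 2)}
    # MinHash signature: per hash index, the minimum salted hash over the set
    return [min(hash((h, s)) for s in shingles) for h in range(n_hashes)]

a = ssh_signature(np.sin(np.linspace(0, 6, 200)))
b = ssh_signature(np.sin(np.linspace(0, 6, 200)) + 0.01)   # slightly perturbed copy
sim = sum(u == v for u, v in zip(a, b)) / len(a)           # estimated Jaccard similarity
```

Signatures of similar series collide in most slots, so hash buckets built from them support sub-linear candidate retrieval, mirroring the indexing role SSH plays before the final DTW comparison.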

Introduction: Cognitive Issues in Natural Language Processing

This special issue is dedicated to getting a better picture of the relationships between computational linguistics and cognitive science. It specifically raises two questions: ‘what is the potential contribution of computational language modeling to cognitive science?’ and, conversely, ‘what is the influence of cognitive science on contemporary computational linguistics?’

Truncated Variance Reduction: A Unified Approach to Bayesian Optimization and Level-Set Estimation

We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian optimization (BO) and level-set estimation (LSE) with Gaussian processes in a unified fashion. The algorithm greedily shrinks a sum of truncated variances within a set of potential maximizers (BO) or unclassified points (LSE), which is updated based on confidence bounds. TruVaR is effective in several important settings that are typically non-trivial to incorporate into myopic algorithms, including pointwise costs and heteroscedastic noise. We provide a general theoretical guarantee for TruVaR covering these aspects, and use it to recover and strengthen existing results on BO and LSE. Moreover, we provide a new result for a setting where one can select from a number of noise levels having associated costs. We demonstrate the effectiveness of the algorithm on both synthetic and real-world data sets.

Transforming a matrix into a standard form

We show that every matrix all of whose entries are in a fixed subgroup of the group of units of a commutative ring with identity is equivalent to a standard form. As a consequence, we improve the proof of Theorem 5 in D. Best, H. Kharaghani, H. Ramp [Disc. Math. 313 (2013), 855–864].

Virtual Embodiment: A Scalable Long-Term Strategy for Artificial Intelligence Research

Meaning has been called the ‘holy grail’ of a variety of scientific disciplines, ranging from linguistics to philosophy, psychology and the neurosciences. The field of Artificial Intelligence (AI) is very much a part of that list: the development of sophisticated natural language semantics is a sine qua non for achieving a level of intelligence comparable to humans. Embodiment theories in cognitive science hold that human semantic representation depends on sensori-motor experience; the abundant evidence that human meaning representation is grounded in the perception of physical reality leads to the conclusion that meaning must depend on a fusion of multiple (perceptual) modalities. Despite this, AI research in general, and its subdisciplines such as computational linguistics and computer vision in particular, have focused primarily on tasks that involve a single modality. Here, we propose virtual embodiment as an alternative, long-term strategy for AI research that is multi-modal in nature and that allows for the kind of scalability required to develop the field coherently and incrementally, in an ethically responsible fashion.

Parallelizing Spectral Algorithms for Kernel Learning

We consider a distributed learning approach in supervised learning for a large class of spectral regularization methods in an RKHS framework. The data set of size n is partitioned into m = O(n^α), α ≤ 1/2, disjoint subsets. On each subset, some spectral regularization method (belonging to a large class, including in particular Kernel Ridge Regression, L2-boosting and spectral cut-off) is applied. The regression function f is then estimated via simple averaging, leading to a substantial reduction in computation time. We show that minimax optimal rates of convergence are preserved if m grows sufficiently slowly (corresponding to an upper bound for α) as n → ∞, depending on the smoothness assumptions on f and the intrinsic dimensionality. In spirit, our approach is classical.

High-Dimensional Adaptive Function-on-Scalar Regression

Applications of functional data with large numbers of predictors have grown precipitously in recent years, driven, in part, by rapid advances in genotyping technologies. Given the large numbers of genetic mutations encountered in genetic association studies, statistical methods which more fully exploit the underlying structure of the data are imperative for maximizing statistical power. However, there is currently very limited work on functional data with large numbers of predictors. Tools are presented for simultaneous variable selection and parameter estimation in a functional linear model with a functional outcome and a large number of scalar predictors; the technique is called AFSL, for Adaptive Function-on-Scalar Lasso. It is demonstrated how techniques from convex analysis over Hilbert spaces can be used to establish a functional version of the oracle property for AFSL over any real separable Hilbert space, even when the number of predictors, I, is exponentially large compared to the sample size, N. AFSL is illustrated via a simulation study and data from the Childhood Asthma Management Program, CAMP, selecting those genetic mutations which are important for lung growth.

On Multiplicative Multitask Feature Learning

We investigate a general framework of multiplicative multitask feature learning which decomposes each task’s model parameters into a multiplication of two components. One of the components is used across all tasks and the other component is task-specific. Several previous methods have been proposed as special cases of our framework. We study the theoretical properties of this framework when different regularization conditions are applied to the two decomposed components. We prove that this framework is mathematically equivalent to the widely used multitask feature learning methods that are based on a joint regularization of all model parameters, but with a more general form of regularizers. Further, an analytical formula is derived for the across-task component as related to the task-specific component for all these regularizers, leading to a better understanding of the shrinkage effect. Study of this framework motivates new multitask learning algorithms. We propose two new learning formulations by varying the parameters in the proposed framework. Empirical studies have revealed the relative advantages of the two new formulations by comparison with the state of the art, providing instructive insights into the feature learning problem with multiple tasks.
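A toy instance of the multiplicative decomposition can make the setup concrete: each task's weight vector is the elementwise product of a shared component c and a task-specific component v_t. The snippet below uses two least-squares tasks and plain gradient descent with simple L2 penalties; the regularizers and the general framework in the paper are considerably richer:

```python
import numpy as np

def multiplicative_mtl(Xs, ys, n_iter=500, lr=0.05, lam=0.01, seed=0):
    """Each task's weights are w_t = c * v_t (elementwise), with c shared
    across tasks. Alternate gradient steps on the task-specific v_t and
    the shared c under squared loss, with small L2 penalties on both."""
    rng = np.random.default_rng(seed)
    d = Xs[0].shape[1]
    c = np.ones(d)
    Vs = [0.1 * rng.standard_normal(d) for _ in Xs]
    for _ in range(n_iter):
        for t, (X, y) in enumerate(zip(Xs, ys)):
            resid = X @ (c * Vs[t]) - y
            Vs[t] -= lr * (c * (X.T @ resid) / len(y) + lam * Vs[t])
        g_c = sum(Vs[t] * (Xs[t].T @ (Xs[t] @ (c * Vs[t]) - ys[t])) / len(ys[t])
                  for t in range(len(Xs)))
        c -= lr * (g_c + lam * c)
    return c, Vs

rng = np.random.default_rng(1)
w_true = np.array([1.0, -1.0, 1.0, 0, 0, 0, 0, 0])      # shared sparse support
Xs = [rng.standard_normal((200, 8)) for _ in range(2)]
ys = [X @ w_true + 0.1 * rng.standard_normal(200) for X in Xs]
c, Vs = multiplicative_mtl(Xs, ys)
mse = np.mean((Xs[0] @ (c * Vs[0]) - ys[0]) ** 2)
```

The point of the decomposition is that shrinking an entry of c toward zero suppresses that feature in every task at once, which is the cross-task feature selection effect the abstract's analytical formula characterizes.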

A data augmentation methodology for training machine/deep learning gait recognition algorithms

There are several confounding factors that can reduce the accuracy of gait recognition systems. These factors can reduce the distinctiveness of, or alter, the features used to characterise gait; they include variations in clothing, lighting, pose and environment, such as the walking surface. Full invariance to all confounding factors is challenging in the absence of high-quality labelled training data. We introduce a simulation-based methodology and a subject-specific dataset which can be used for generating synthetic video frames and sequences for data augmentation. With this methodology, we generated a multi-modal dataset. In addition, we supply simulation files that provide the ability to simultaneously sample from several confounding variables. The basis of the data is real motion capture data of subjects walking and running on a treadmill at different speeds. Results from gait recognition experiments suggest that information about the identity of subjects is retained within synthetically generated examples. The dataset and methodology allow studies into fully-invariant identity recognition spanning a far greater number of observation conditions than would otherwise be possible.

Automatic Image De-fencing System

Safety Verification of Deep Neural Networks

Sensitivity analysis for an unobserved moderator in RCT-to-target-population generalization of treatment effects

Cut-off method for endogeny of recursive tree processes

Mean-Field Variational Inference for Gradient Matching with Gaussian Processes

A Noisy-Influence Regularity Lemma for Boolean Functions

Improved Method to extract Nucleon Helicity Distributions using Event Weighting

Permutation tests in the two-sample problem for functional data

Learning Cost-Effective Treatment Regimes using Markov Decision Processes

Tracy-Widom fluctuations for perturbations of the log-gamma polymer in intermediate disorder

Distance signless Laplacian spectral radius and Hamiltonian properties of graphs

Spectral Angle Based Unary Energy Functions for Spatial-Spectral Hyperspectral Classification using Markov Random Fields

Multitask Learning of Vegetation Biochemistry from Hyperspectral Data

Modeling and Analysis of Uplink Non-Orthogonal Multiple Access (NOMA) in Large-Scale Cellular Networks Using Poisson Cluster Processes

Ranking of classification algorithms in terms of mean-standard deviation using A-TOPSIS

Optimization on Submanifolds of Convolution Kernels in CNNs

P_3-Games on Chordal Bipartite Graphs

Understanding Sea Ice Melting via Functional Data Analysis

Ergodic maximum principle for stochastic systems

Windings of planar processes and applications to the pricing of Asian options

Exercise Motion Classification from Large-Scale Wearable Sensor Data Using Convolutional Neural Networks

Study of Tomlinson-Harashima Precoding Strategies for Physical-Layer Security in Wireless Networks

A class of Weiss-Weinstein bounds and its relationship with the Bobrovsky-Mayer-Wolf-Zakai bounds

Certified Roundoff Error Bounds using Bernstein Expansions and Sparse Krivine-Stengle Representations

p-Causality: Identifying Spatiotemporal Causal Pathways for Air Pollutants with Urban Big Data

Convergence of the Euler-Maruyama method for multidimensional SDEs with discontinuous drift and degenerate diffusion coefficient

The effect of delay on contact tracing

The limits of weak selection and large population size in evolutionary game theory

Fluctuations of Functions of Wigner Matrices

Deep image mining for diabetic retinopathy screening

Reinforcement Learning in Conflicting Environments for Autonomous Vehicles

A statistical approach to covering lemmas

Local Maxima and Improved Exact Algorithm for MAX-2-SAT

General Central Limit Theorems for Associated Sequences

Fast and Reliable Parameter Estimation from Nonlinear Observations

Cross Device Matching for Online Advertising with Neural Feature Ensembles : First Place Solution at CIKM Cup 2016

Analysis of Count Data by Transmuted Geometric Distribution

Multi-View Subspace Clustering via Relaxed $L_1$-Norm of Tensor Multi-Rank

The quadratic regulator problem and the Riccati equation for a process governed by a linear Volterra integrodifferential equations

Asymptotic of Non-Crossings probability of Additive Wiener Fields

The first Cheeger constant of a simplex

Another characterization of homogeneous Poisson processes

Two are Better than One: An Ensemble of Retrieval- and Generation-Based Dialog Systems

Real-time Halfway Domain Reconstruction of Motion and Geometry

On Zermelo’s theorem

Stochastic inference with spiking neurons in the high-conductance state

Colouring simplicial complexes: on the Lechuga-Murillo’s model

Not All Multi-Valued Partial CFL Functions Are Refined by Single-Valued Functions

Death and rebirth of neural activity in sparse inhibitory networks

Hybrid-DCA: A Double Asynchronous Approach for Stochastic Dual Coordinate Ascent

Learning Deep Architectures for Interaction Prediction in Structure-based Virtual Screening

The Security of Hardware-Based Omega(n^2) Cryptographic One-Way Functions: Beyond Satisfiability and P=NP

Simpler PAC-Bayesian Bounds for Hostile Data

On the general solution of the Heideman-Hogan family of recurrences

Distinguishing number and distinguishing index of Kronecker product of two graphs

On the dynamic consistency of hierarchical risk-averse decision problems

Output-sensitive Complexity of Multiobjective Combinatorial Optimization

On the maximum number of colorings of a graph

Stochastic Modeling and Statistical Inference of Intrinsic Noise in Gene Regulation System via Chemical Master Equation

3D Hand Pose Tracking and Estimation Using Stereo Matching

Sets of Priors Reflecting Prior-Data Conflict and Agreement

Eulerian polynomials and descent statistics

Maximizing the number of $x$-colorings of $4$-chromatic graphs

Partitioning Trillion-edge Graphs in Minutes

Robust Bayesian Reliability for Complex Systems under Prior-Data Conflict

A Polynomial Kernel for Distance-Hereditary Vertex Deletion

Template Matching Advances and Applications in Image Analysis

Hybrid Static/Dynamic Schedules for Tiled Polyhedral Programs

SPiKeS: Superpixel-Keypoints Structure for Robust Visual Tracking

Are mmWave Low-Complexity Beamforming Structures Energy-Efficient? Analysis of the Downlink MU-MIMO

Power of one non-clean qubit

Irregular Stochastic differential equations driven by a family of Markov processes

Random Multiple Access for M2M Communications with QoS Guarantees

Information-theoretic Physical Layer Security for Satellite Channels

Dual Ore’s theorem for distributive intervals of small index

Minimum triplet covers of binary phylogenetic $X$-trees

Differential Modulation for Asynchronous Two-Way-Relay Systems over Frequency-Selective Fading Channels

Bayesian Nonparametric Modeling of Heterogeneous Groups of Censored Data

Evolutionary State-Space Model and Its Application to Time-Frequency Analysis of Local Field Potentials

Bridging Neural Machine Translation and Bilingual Dictionaries

Encoding Temporal Markov Dynamics in Graph for Time Series Visualization

Cohort aggregation modelling for complex forest stands: Spruce-aspen mixtures in British Columbia

Decentralized Transmission Policies for Energy Harvesting Devices

Molecular solutions for the Maximum K-colourable Sub graph Problem in Adleman-Lipton model

Channel capacity of polar coding with a given polar mismatched successive cancellation decoder

Coincidences between characters to hooks and 2-part partitions on families arising from 2-regular classes

A Rate-Distortion Approach to Caching

Novel probabilistic models of spatial genetic ancestry with applications to stratification correction in genome-wide association studies

Cubic edge-transitive bi-$p$-metacirculant

Stability analysis of delay differential equations via Semidefinite programming

Optimal insider control of systems with delay

Discrete least-squares approximations over optimized downward closed polynomial spaces in arbitrary dimension

Limiting behavior of 3-color excitable media on arbitrary graphs

A coarse-to-fine algorithm for registration in 3D street-view cross-source point clouds

Interference Management and Power Allocation for NOMA Visible Light Communications Network

An Assmus-Mattson theorem for codes over commutative association schemes

MultiCol-SLAM – A Modular Real-Time Multi-Camera SLAM System

Optimizing egalitarian performance in the side-effects model of colocation for data center resource management

STDP allows close-to-optimal spatiotemporal spike pattern detection by single coincidence detector neurons

Large and moderate deviations for the left random walk on GL_d(R)

Learning Reporting Dynamics during Breaking News for Rumour Detection in Social Media

Challenges to be addressed for realising an Ephemeral Cloud Federation

Theoretical Analysis of Active Contours on Graphs

Cutoff phenomenon for the asymmetric simple exclusion process and the biased card shuffling

On Solving Non-preemptive Mixed-criticality Match-up Scheduling Problem with Two and Three Criticality Levels

QoE-aware Scalable Video Transmission in MIMO~Systems

Characterization of an inconsistency ranking for pairwise comparison matrices

Percolation results for the Continuum Random Cluster Model

Record Counting in Historical Handwritten Documents with Convolutional Neural Networks

Possibilities of Recursive GPU Mapping for Discrete Orthogonal Simplices

The Function-on-Scalar LASSO with Applications to Longitudinal GWAS

Tracking of Wideband Multipath Components in a Vehicular Communication Scenario

C-mix: a high dimensional mixture model for censored durations, with applications to genetic data

Simplices in a small set of points in $\mathbb{F}_p^2$

Statistical Machine Translation for Indian Languages: Mission Hindi

Using Machine Learning to Detect Noisy Neighbors in 5G Networks

Reordering rules for English-Hindi SMT

Fluctuations around mean walking behaviours in diluted pedestrian flows

Coalescence on the real line

Finite size scaling of random XORSAT

An Attempt to Design a Better Algorithm for the Uncapacitated Facility Location Problem

Greedy Gaussian Segmentation of Multivariate Time Series

Deep Multi-scale Location-aware 3D Convolutional Neural Networks for Automated Detection of Lacunes of Presumed Vascular Origin

A Framework for Parallel and Distributed Training of Neural Networks

Hybrid Quantile Regression Estimation for Time Series Models with Conditional Heteroscedasticity

Optimistic Aborts for Geo-distributed Transactions

Conditions on square geometric graphs

Distilling Information Reliability and Source Trustworthiness from Digital Traces

Feature Sensitive Label Fusion with Random Walker for Atlas-based Image Segmentation

Strongly robust toric ideals in codimension 2

One-dimensional reflected rough differential equations

Relating Diversity and Human Appropriation from Land Cover Data

Laplacian regularized low rank subspace clustering

Analyzing the structure of multidimensional compressed sensing problems through coherence

Dynamic Complexity of the Dyck Reachability

‘Weak yet strong’ restrictions of Hindman’s Finite Sums Theorem

Balancing Suspense and Surprise: Timely Decision Making with Endogenous Information Acquisition

A Variational Bayesian Approach for Restoring Data Corrupted with Non-Gaussian Noise

Nonlinear Adaptive Algorithms on Rank-One Tensor Models

Fair prediction with disparate impact: A study of bias in recidivism prediction instruments

Target Set Selection in Dense Graph Classes

PhaseMax: Convex Phase Retrieval via Basis Pursuit

Nonconvex penalized regression using depth-based penalty functions: multitask learning and support union recovery in high dimensions

Collapse transition of the interacting prudent walk

Statistical inference in partially observed stochastic compartmental models with application to cell lineage tracking of in vivo hematopoiesis

Robustness of critical bit rates for practical stabilization of networked control systems

Some Relationships and Properties of the Hypergeometric Distribution

On the smoothness of the value function for affine optimal control problems

Automatic and Manual Segmentation of Hippocampus in Epileptic Patients MRI

Automated OCT Segmentation for Images with DME

Asymptotics of the number of standard Young tableaux of skew shape

Quantized Precoding for Massive MU-MIMO

Geometry of Polysemy

Node Isolation of Secure Wireless Sensor Networks under a Heterogeneous Channel Model

On capacity of optical communications over a lossy bosonic channel with a receiver employing the most general coherent electro-optic feedback control

Adjusting for Unmeasured Spatial Confounding with Distance Adjusted Propensity Score Matching

Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

On the Network Reliability Problem of the Heterogeneous Key Predistribution Scheme

Book Memo: “Novel Applications of Intelligent Systems”

In this carefully edited book, selected results of theoretical and applied research in the field of broadly perceived intelligent systems are presented. The problems vary from industrial to web and problem-independent applications. All this is united under the slogan ‘Intelligent systems conquer the world’.
The book brings together innovation projects with analytical research, invention, retrieval and processing of knowledge and logical applications in technology.
This book is aimed at a wide circle of readers, and particularly at the young generation of IT/ICT experts who will build the next generations of intelligent systems.

R Packages worth a look

Performs Bayesian Variable Selection on the Covariates in a Semi-Competing Risks Model (SCRSELECT)
Contains four functions used in the DIC-tau_g procedure. SCRSELECT() and SCRSELECTRUN() use Stochastic Search Variable Selection to select important covariates in the three hazard functions of a semi-competing risks model. These functions perform the Gibbs sampler for variable selection and a Metropolis-Hastings-Green sampler for the number of split points and parameters for the three baseline hazard functions. The function SCRSELECT() writes the posterior sample of all quantities sampled in the Gibbs sampler after a burn-in period to a desired file location, while the function SCRSELECTRUN() returns posterior values of important quantities to the DIC-Tau_g procedure in a list. The function DICTAUG() returns a list containing the DIC values for the unique models visited by the DIC-Tau_g grid search. The function ReturnModel() uses SCRSELECTRUN() and DICTAUG() to return a summary of the posterior coefficient vectors for the optimal model, along with saving this posterior sample to a desired path location.

Fast Linear Models for Objects from the ‘bigmemory’ Package (bigFastlm)
A reimplementation of the fastLm() functionality of ‘RcppEigen’ for big.matrix objects for fast out-of-memory linear model fitting.

d3.js’ Utilities for R (d3r)
Helper functions for using ‘d3.js’ in R.

Document worth reading: “Predicting the future of predictive analytics”

The proliferation of data and the increasing awareness of the potential to gain valuable insight and a competitive advantage from that information are driving organizations to place data at the heart of their corporate strategy. Consumers regularly benefit from predictive analytics, in the form of anything from weather forecasts to insurance premiums. Organizations are now exploring the possibilities of using historical data to exploit growth opportunities and minimize business risks, a field known as predictive analytics. SAP commissioned Loudhouse to conduct primary research among business decision-makers in UK and US organizations to understand their attitudes to and experiences of predictive analytics, as well as a future view of usage, value and investment. The research reveals that businesses are struggling to take full advantage of the burgeoning and already overwhelming amount of data being collected. Challenges abound as firms seek to make effective use of data. While many businesses are investing in predictive analytics and already seeing benefits in a number of areas, even more see this as a future investment priority for their business. The research points to a data-driven future where advanced predictive analytics sits at the core of the business function rather than being siloed, is embraced by a greater proportion of the workforce and is used to drive decision-making across the whole business. To achieve this future vision, however, it is clear that businesses need to up-skill their workforce and invest in more intuitive technology. While firms in the UK and US recognize the potential of predictive analytics and the need for investment in skills, the US is further along the adoption curve than the UK. US organizations show greater promise for future investment in – and roll-out of – predictive analytics software across the workforce. 
Furthermore, US organizations perceive fewer challenges in using data to inform corporate strategy, and sense a greater need for training to embed the benefits of the technology into day-to-day business. Predicting the future of predictive analytics

What's new on arXiv

Iterative Refinement for Machine Translation

Existing machine translation decoding algorithms generate translations in a strictly monotonic fashion and never revisit previous decisions. As a result, earlier mistakes cannot be corrected at a later stage. In this paper, we present a translation scheme that starts from an initial guess and then makes iterative improvements that may revisit previous decisions. We parameterize our model as a convolutional neural network that predicts discrete substitutions to an existing translation based on an attention mechanism over both the source sentence as well as the current translation output. By making less than one modification per sentence, we improve the output of a phrase-based translation system by up to 0.4 BLEU on WMT15 German-English translation.

Functional Asynchronous Networks: Factorization of Dynamics and Function

In this note we describe the theory of functional asynchronous networks and one of the main results, the Modularization of Dynamics Theorem, which for a large class of functional asynchronous networks gives a factorization of dynamics in terms of constituent subnetworks. For these networks we can give a complete description of the network function in terms of the function of the events comprising the network and thereby answer a question originally raised by Alon in the context of biological networks.

Novelty Learning via Collaborative Proximity Filtering

The vast majority of recommender systems model preferences as static or slowly changing due to observable user experience. However, spontaneous changes in user preferences are ubiquitous in many domains like media consumption, and key factors that drive changes in preferences are not directly observable. These latent sources of preference change pose new challenges. When systems do not track and adapt to users’ tastes, users lose confidence and trust, increasing the risk of user churn. We meet these challenges by developing a model of novelty preferences that learns and tracks latent user tastes. We combine three innovations: a new measure of item similarity based on patterns of consumption co-occurrence; a model for spontaneous changes in preferences; and a learning agent that tracks each user’s dynamic preferences and learns individualized policies for variety. The resulting framework adaptively provides users with novelty tailored to their preferences for change per se.
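The co-occurrence-based similarity ingredient can be sketched generically: count how often two items appear in the same user's history, then compare items by the cosine between their co-occurrence profiles. The histories below are toy data and the plain cosine is a stand-in; the paper's actual measure is more refined:

```python
import numpy as np

def cooccurrence_similarity(histories, n_items):
    """Item-item similarity from consumption co-occurrence: C[i, j] counts
    users whose history contains both i and j; similarity is the cosine
    between rows of C (i.e., between items' co-occurrence profiles)."""
    C = np.zeros((n_items, n_items))
    for h in histories:
        for i in h:
            for j in h:
                if i != j:
                    C[i, j] += 1
    Cn = C / (np.linalg.norm(C, axis=1, keepdims=True) + 1e-12)
    return Cn @ Cn.T   # cosine similarity matrix

hists = [[0, 1, 2], [0, 1], [2, 3], [1, 2, 3]]   # toy user histories
S = cooccurrence_similarity(hists, 4)
```

Comparing whole co-occurrence profiles (rather than raw pair counts) lets two items look similar because they are consumed alongside the same third items, even if they are rarely consumed together themselves.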

Single Pass PCA of Matrix Products

In this paper we present a new algorithm for computing a low rank approximation of the product A^TB by taking only a single pass of the two matrices A and B. The straightforward way to do this is to (a) first sketch A and B individually, and then (b) find the top components using PCA on the sketch. Our algorithm in contrast retains additional summary information about A,B (e.g. row and column norms etc.) and uses this additional information to obtain an improved approximation from the sketches. Our main analytical result establishes a comparable spectral norm guarantee to existing two-pass methods; in addition we also provide results from an Apache Spark implementation that shows better computational and statistical performance on real-world and synthetic evaluation datasets.
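The "straightforward way" the abstract mentions, i.e. step (a) plus (b), is easy to sketch in Python. Note this is the simple two-sketch baseline the paper improves upon, not the authors' algorithm, which additionally retains row/column norms; the sketch dimension and data are illustrative:

```python
import numpy as np

def sketched_product_topk(A, B, sketch_dim=400, k=5, seed=0):
    """Approximate the top-k part of A^T B in a single pass: apply a shared
    random projection S to the rows of A and B (each row can be streamed),
    then take the rank-k SVD of (SA)^T (SB). Since E[S^T S] = I, the small
    product (SA)^T (SB) is an unbiased estimate of A^T B."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    S = rng.standard_normal((sketch_dim, n)) / np.sqrt(sketch_dim)
    SA, SB = S @ A, S @ B                 # computable in one pass over the rows
    U, s, Vt = np.linalg.svd(SA.T @ SB, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]    # rank-k approximation of A^T B

# Low-rank correlated test matrices so the product has genuine top components
rng = np.random.default_rng(1)
C = rng.standard_normal((2000, 5))
A = C @ rng.standard_normal((5, 20))
B = C @ rng.standard_normal((5, 30))
approx = sketched_product_topk(A, B, sketch_dim=400, k=5)
rel_err = np.linalg.norm(approx - A.T @ B) / np.linalg.norm(A.T @ B)
```

The variance of this baseline scales with the row norms of A and B, which is exactly the extra summary information the paper's algorithm keeps in order to sharpen the estimate.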

Maximally Divergent Intervals for Anomaly Detection

We present new methods for batch anomaly detection in multivariate time series. Our methods are based on maximizing the Kullback-Leibler divergence between the data distribution within and outside an interval of the time series. An empirical analysis shows the benefits of our algorithms compared to methods that treat each time step independently from each other without optimizing with respect to all possible intervals.
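The core idea, scoring each interval by the KL divergence between distributions fitted inside and outside it, can be shown in a toy univariate version with Gaussian fits and brute-force search; the paper handles multivariate series and avoids the exhaustive scan:

```python
import numpy as np

def kl_gauss(mu0, var0, mu1, var1):
    """KL divergence KL(N(mu0, var0) || N(mu1, var1)) for univariate Gaussians."""
    return 0.5 * (np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1)

def most_divergent_interval(x, min_len=5, max_len=50):
    """Scan all intervals [a, b); fit a Gaussian to the points inside and to
    the points outside, and return the interval maximizing their KL divergence."""
    best_score, best = -np.inf, (0, min_len)
    for a in range(len(x) - min_len + 1):
        for b in range(a + min_len, min(a + max_len, len(x)) + 1):
            inside, outside = x[a:b], np.concatenate([x[:a], x[b:]])
            score = kl_gauss(inside.mean(), inside.var() + 1e-9,
                             outside.mean(), outside.var() + 1e-9)
            if score > best_score:
                best_score, best = score, (a, b)
    return best

rng = np.random.default_rng(0)
x = rng.standard_normal(300)
x[100:120] += 5.0                      # planted anomalous interval
a, b = most_divergent_interval(x)      # should recover roughly [100, 120)
```

Maximizing over whole intervals, rather than scoring time steps independently, is what lets the method flag collective anomalies whose individual points look unremarkable.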

Convex Formulation for Kernel PCA and its Use in Semi-Supervised Learning

In this paper, Kernel PCA is reinterpreted as the solution to a convex optimization problem. Actually, there is a constrained convex problem for each principal component, so that the constraints guarantee that the principal component is indeed a solution, and not a mere saddle point. Although these insights do not imply any algorithmic improvement, they can be used to further understand the method, formulate possible extensions and properly address them. As an example, a new convex optimization problem for semi-supervised classification is proposed, which seems particularly well-suited whenever the number of known labels is small. Our formulation resembles a Least Squares SVM problem with a regularization parameter multiplied by a negative sign, combined with a variational principle for Kernel PCA. Our primal optimization principle for semi-supervised learning is solved in terms of the Lagrange multipliers. Numerical experiments in several classification tasks illustrate the performance of the proposed model in problems with only a few labeled data.
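For context, the classical eigendecomposition form of Kernel PCA that the paper reinterprets looks like this in a few lines. This is the standard textbook formulation with an RBF kernel, not the authors' convex program; the bandwidth and data are arbitrary:

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Classical Kernel PCA: build the RBF Gram matrix, double-center it
    (centering in feature space), and return the top component scores,
    i.e. eigenvectors scaled by the square root of their eigenvalues."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H                                   # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
scores = kernel_pca(X, n_components=2, gamma=0.5)
```

The eigendecomposition view makes each component a stationary point of a non-convex Rayleigh quotient; the paper's contribution is recasting each component as the solution of a constrained convex problem instead.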

Automated Big Text Security Classification

In recent years, traditional cybersecurity safeguards have proven ineffective against insider threats. Famous cases of sensitive information leaks caused by insiders, including the WikiLeaks release of diplomatic cables and the Edward Snowden incident, have greatly harmed the U.S. government’s relationship with other governments and with its own citizens. Data Leak Prevention (DLP) is a solution for detecting and preventing information leaks from within an organization’s network. However, state-of-the-art DLP detection models are only able to detect very limited types of sensitive information, and research in the field has been hindered due to the lack of available sensitive texts. Many researchers have focused on document-based detection with artificially labeled ‘confidential documents’ for which security labels are assigned to the entire document, when in reality only a portion of the document is sensitive. This type of whole-document based security labeling increases the chances of preventing authorized users from accessing non-sensitive information within sensitive documents. In this paper, we introduce Automated Classification Enabled by Security Similarity (ACESS), a new and innovative detection model that penetrates the complexity of big text security classification/detection. To analyze the ACESS system, we constructed a novel dataset, containing formerly classified paragraphs from diplomatic cables made public by the WikiLeaks organization. To our knowledge this paper is the first to analyze a dataset that contains actual formerly sensitive information annotated at paragraph granularity.

Review of Action Recognition and Detection Methods

In computer vision, action recognition refers to classifying an action present in a given video, while action detection involves locating actions of interest in space and/or time. Videos, which store photometric information (e.g. RGB intensity values) on a lattice structure, carry cues that can assist in identifying the imaged action. The process of action recognition and detection often begins with extracting useful features and encoding them so that the features are specific to the task at hand. Encoded features are then processed through a classifier to identify the action class and its spatial and/or temporal location. In this report, a thorough review of various action recognition and detection algorithms in computer vision is provided by analyzing the two-step process of a typical action recognition and detection algorithm: (i) extraction and encoding of features, and (ii) classifying features into action classes. In an effort to ensure that computer vision-based algorithms reach the human capability of identifying actions irrespective of the various nuisance variables that may be present within the field of view, the state-of-the-art methods are reviewed and some remaining problems are addressed in the final chapter.
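The two-step pipeline the review analyzes (feature extraction and encoding, then classification) can be sketched end-to-end on synthetic data. The intensity-histogram features and nearest-centroid classifier below are deliberately simple stand-ins, not methods from the review, and the "walk"/"run" classes are invented for illustration.

```python
import numpy as np

def extract_features(video):
    """Step (i): per-frame intensity histograms averaged over time,
    then L1-normalized — a crude stand-in for real descriptors."""
    hists = [np.histogram(frame, bins=16, range=(0, 1))[0] for frame in video]
    h = np.mean(hists, axis=0).astype(float)
    return h / (h.sum() + 1e-12)

def nearest_centroid(train_feats, train_labels, test_feat):
    """Step (ii): classify an encoded feature vector by its nearest
    class centroid in feature space."""
    classes = sorted(set(train_labels))
    cents = {c: np.mean([f for f, l in zip(train_feats, train_labels) if l == c],
                        axis=0)
             for c in classes}
    return min(classes, key=lambda c: np.linalg.norm(test_feat - cents[c]))

rng = np.random.default_rng(1)
# Two synthetic "action" classes: dark vs. bright videos, frames in [0, 1]
videos = [rng.uniform(0.0, 0.5, (8, 32, 32)) for _ in range(5)] + \
         [rng.uniform(0.5, 1.0, (8, 32, 32)) for _ in range(5)]
labels = ["walk"] * 5 + ["run"] * 5
feats = [extract_features(v) for v in videos]
pred = nearest_centroid(feats, labels,
                        extract_features(rng.uniform(0.6, 1.0, (8, 32, 32))))
print(pred)  # run
```

Real systems swap in learned or hand-crafted spatio-temporal features (e.g. optical-flow-based descriptors) and far stronger classifiers, but the two-stage structure is the same.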

Relational Crowdsourcing and its Application in Knowledge Graph Evaluation

Automatic construction of large knowledge graphs (KG) by mining web-scale text datasets has received considerable attention over the last few years, resulting in the construction of several KGs, such as NELL, Google Knowledge Vault, etc. These KGs consist of thousands of predicate-relations (e.g., isPerson, isMayorOf) and millions of their instances (e.g., (Bill de Blasio, isMayorOf, New York City)). Estimating the accuracy of such automatically constructed KGs is a challenging problem due to their size and diversity. Even though crowdsourcing is an obvious choice for such evaluation, standard single-task crowdsourcing, where each predicate in the KG is evaluated independently, is very expensive and especially problematic if the available budget is limited. We show that such approaches are sub-optimal as they ignore dependencies among various predicates and their instances. To overcome this challenge, we propose Relational Crowdsourcing (RelCrowd), where the tasks are created while taking dependencies among predicates and instances into account. We apply this framework in the context of evaluating large-scale KGs and demonstrate its effectiveness through extensive experiments on real-world datasets.

Euclidean distance matrix completion and point configurations from the minimal spanning tree

Combinatorial Multi-Armed Bandit with General Reward Functions

The fixation probability and time for a doubly beneficial mutant

Solving Multi-Objective Optimization via Adaptive Stochastic Search with Domination Measure

An Efficient Optimal Algorithm for Integer-Forcing Linear MIMO Receivers Design

Proposing Plausible Answers for Open-ended Visual Question Answering

W-Operators and Permutation Groups

Chimera states in a network-organized public goods game with mutations

On the Inverse Power Flow Problem

Sampling hyperparameters in hierarchical models: improving on Gibbs for high-dimensional latent fields and large data sets

Characteristic Polynomials of Symmetric Matrices over the Univariate Polynomial Ring

Three phase classification of an uninterrupted traffic flow: a $k$-means clustering study

A Generalized Correlation Index for Quantifying Signal Morphological Similarity

Minimal Skew energy of oriented bicyclic graphs with a given diameter

The Computational Complexity of Ball Permutations

Immune Therapeutic Strategies Using Optimal Controls with L1 and L2 Type Objectives

Analysis of One-Bit Quantized Precoding for the Multiuser Massive MIMO Downlink

Unveiling the Multi-fractal Structure of Complex Networks

Stochastic Gradient MCMC with Stale Gradients

On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators

Short-term prediction of localized cloud motion using ground-based sky imagers

Detecting Rainfall Onset Using Sky Images

Scalable Pooled Time Series of Big Video Data from the Deep Web

Stochastic analysis in a tubular neighborhood or Onsager-Machlup functions revisited

Multi-view metric learning for multi-instance image classification

The Broadcaster-Repacking Problem

A Convex Programming-based Algorithm for Mean Payoff Stochastic Games with Perfect Information

New Survey Questions and Estimators for Network Clustering with Respondent-Driven Sampling Data

Error estimates with explicit constants for the Sinc approximation over infinite intervals

Current Redistribution in Resistor Networks: Fat-Tail Statistics in Regular and Small-World Networks

Multispectral image denoising with optimized vector non-local mean filter

Tempered Fractional Multistable Motion and Tempered Multifractional Stable Motion

End-to-End Training Approaches for Discriminative Segmental Models

Two Stage Optimization with Recourse and Revocation

Random constraint sampling and duality for convex optimization

Interplay of inhibition and multiplexing: Largest eigenvalue statistics

Scaling Limits of Solutions of SPDE Driven by Lévy White Noises

A New Simulation Approach to Performance Evaluation of Binary Linear Codes in the Extremely Low Error Rate Region

Optimal Mechanisms for Selling Two Items to a Single Buyer Having Uniformly Distributed Valuations

Minimax Error of Interpolation and Optimal Design of Experiments for Variable Fidelity Data

Almost Budget Balanced Mechanisms with Scalar Bids For Allocation of a Divisible Good

The distance spectra of the derangement graphs

Model-based Outdoor Performance Capture

Tropical spectrahedra

Study of Domains in the Ground State of the Two Dimensional Coulomb Glass

Comments on ‘Bayesian Solution Uncertainty Quantification for Differential Equations’ by Chkrebtii, Campbell, Calderhead & Girolami

Switching in time-optimal problem with co-dimension 1 control

Fine-grained Recognition in the Noisy Wild: Sensitivity Analysis of Convolutional Neural Networks Approaches

Deterministic Distributed (Δ + o(Δ))-Edge-Coloring, and Vertex-Coloring of Graphs with Bounded Diversity

Quantum rational preferences and desirability

Performance Analysis of Multi-User Massive MIMO Downlink under Channel Non-Reciprocity and Imperfect CSI

Passage times, exit times and Dirichlet problems for open quantum walks

Asymptotic equivalence of paired Hotelling test and conditional logistic regression

Vision-Based Reaching Using Modular Deep Networks: from Simulation to the Real World

Disorder relevance without Harris Criterion: the case of pinning model with $γ$-stable environment

Robust Markowitz mean-variance portfolio selection under ambiguous volatility and correlation

Robust training on approximated minimal-entropy set

Deep Models for Engagement Assessment With Scarce Label Information

Generating functions and statistics on spaces of maximal tori in classical Lie groups

Multiscale Abstraction, Planning and Control using Diffusion Wavelets for Stochastic Optimal Control Problems

Asymptotic behaviors of bivariate Gaussian powered extremes

Vector quantile regression beyond correct specification

A Projected Gradient and Constraint Linearization Method for Nonlinear Model Predictive Control

A note on sequences of expected maxima and expected ranges

A Polynomial-Time Algorithm for Pliable Index Coding

Stochastic Geometric Analysis of Energy-Efficient Dense Cellular Networks

Spatio-temporal adaptive penalized splines with application to Neuroscience

Fast estimation of multidimensional adaptive P-spline models

Convergence Rate Estimates for Consensus over Random Graphs

New nonlinear programming techniques for multiobjective optimization with applications to portfolio selection

A model for vortex nucleation in the Ginzburg-Landau equations

Six-vertex models and the GUE-corners process

Floquet topological phases with symmetry in all dimensions

Dictionary Learning Strategies for Compressed Fiber Sensing Using a Probabilistic Sparse Model

Joint Deep Exploitation of Semantic Keywords and Visual Features for Malicious Crowd Image Classification

Enhanced Object Detection via Fusion With Prior Beliefs from Image Classification

Transportation inequalities for non-globally dissipative SDEs with jumps via Malliavin calculus and coupling

Learning to Protect Communications with Adversarial Neural Cryptography

Book Memo: “Statistics for Lawyers”

This classic text, first published in 1990, is designed to introduce law students, law teachers, practitioners, and judges to the basic ideas of mathematical probability and statistics as they have been applied in the law. The third edition includes over twenty new sections, including the addition of timely topics, like New York City police stops, exonerations in death-sentence cases, projecting airline costs, and new material on various statistical techniques such as the randomized response survey technique, rare-events meta-analysis, competing risks, and negative binomial regression. The book consists of sections of exposition followed by real-world cases and case studies in which statistical data have played a role. The reader is asked to apply the theory to the facts, to calculate results (a hand calculator is sufficient), and to explore legal issues raised by quantitative findings. The authors’ calculations and comments are given in the back of the book. As with previous editions, the cases and case studies reflect a broad variety of legal subjects, including antidiscrimination, mass torts, taxation, school finance, identification evidence, preventive detention, handwriting disputes, voting, environmental protection, antitrust, sampling for insurance audits, and the death penalty. A chapter on epidemiology was added in the second edition. In 1991, the first edition was selected by the University of Michigan Law Review as one of the important law books of the year.