# Whats new on arXiv

Time domain simulation is the basis of dynamic security assessment for power systems. Traditionally, numerical integration methods are adopted by simulation software to solve nonlinear power system differential-algebraic equations about any given contingency under a specific operating condition. An alternative approach promising for online simulation is to offline derive a semi-analytical solution (SAS) and then online evaluate the SAS over consecutive time windows regarding the operating condition and contingency until obtaining the simulation result over a desired period. This paper proposes a general semi-analytical approach that derives and evaluates an SAS in the form of power series in time to approximate the solutions of power system differential equations. An error-rate upper bound of the SAS is also proposed to guarantee the reliable use of adaptive time windows for evaluation of the SAS. A dynamic bus method is proposed to extend the semi-analytical approach for solving general power system DAEs by efficiently linking the SASs for dynamic components through the numerical solution of the network algebraic equations. Case studies performed on the New England 39-bus system and the Polish 2383-bus system test the performance of the proposed semi-analytical approach and compare to existing methods. The results show that the SAS based approach has potentials for online simulations.
A good representation for arbitrarily complicated data should have the capability of semantic generation, clustering and reconstruction. Previous research has already achieved impressive performance on either one. This paper aims at learning a disentangled representation effective for all of them in an unsupervised way. To achieve all the three tasks together, we learn the forward and inverse mapping between data and representation on the basis of a symmetric adversarial process. In theory, we minimize the upper bound of the two conditional entropy loss between the latent variables and the observations together to achieve the cycle consistency. The newly proposed RepGAN is tested on MNIST, fashionMNIST, CelebA, and SVHN datasets to perform unsupervised or semi-supervised classification, generation and reconstruction tasks. The result demonstrates that RepGAN is able to learn a useful and competitive representation. To the author’s knowledge, our work is the first one to achieve both a high unsupervised classification accuracy and low reconstruction error on MNIST.
Personalized search provides a potentially powerful tool, however, it is limited due to the large number of roles that a person has: parent, employee, consumer, etc. We present the role-relevance algorithm: a search technique that favors search results relevant to the user’s current role. The role-relevance algorithm uses three factors to score documents: (1) the number of keywords each document contains; (2) each document’s geographic relevance to the user’s role (if applicable); and (3) each document’s topical relevance to the user’s role (if applicable). Topical relevance is assessed using a novel extension to Latent Dirichlet Allocation (LDA) that allows standard LDA to score document relevance to user-defined topics. Overall results on a pre-labeled corpus show an average improvement in search precision of approximately 20% compared to keyword search alone.
Researchers are often interested in assessing the impact of an intervention on an outcome of interest in situations where the intervention is non-randomised, information is available at an aggregate level, the intervention is only applied to one or few units, the intervention is binary, and there are outcome measurements at multiple time points. In this paper, we review existing methods for causal inference in the setup just outlined. We detail the assumptions underlying each method, emphasise connections between the different approaches and provide guidelines regarding their practical implementation. Several open problems are identified thus highlighting the need for future research.
For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited training data. We further provide a hand-crafted diagnostic test suite that enables detailed linguistic analysis of NLU models. We evaluate baselines based on current methods for multi-task and transfer learning and find that they do not immediately give substantial improvements over the aggregate performance of training a separate model per task, indicating room for improvement in developing general and robust NLU systems.
While deep neural networks have proven to be a powerful tool for many recognition and classification tasks, their stability properties are still not well understood. In the past, image classifiers have been shown to be vulnerable to so-called adversarial attacks, which are created by additively perturbing the correctly classified image. In this paper, we propose the ADef algorithm to construct a different kind of adversarial attack created by iteratively applying small deformations to the image, found through a gradient descent step. We demonstrate our results on MNIST with a convolutional neural network and on ImageNet with Inception-v3 and ResNet-101.
A model of music needs to have the ability to recall past details and have a clear, coherent understanding of musical structure. Detailed in the paper is a neural network architecture that predicts and generates polyphonic music aligned with musical rules. The probabilistic model presented is a Bi-axial LSTM trained with a kernel reminiscent of a convolutional kernel. When analyzed quantitatively and qualitatively, this approach performs well in composing polyphonic music. Link to the code is provided.
Nonconvex mixed-integer nonlinear programs (MINLPs) represent a challenging class of optimization problems that often arise in engineering and scientific applications. Because of nonconvexities, these programs are typically solved with global optimization algorithms, which have limited scalability. However, nonlinear branch-and-bound has recently been shown to be an effective heuristic for quickly finding high-quality solutions to large-scale nonconvex MINLPs, such as those arising in infrastructure network optimization. This work proposes Juniper, a Julia-based open-source solver for nonlinear branch-and-bound. Leveraging the high-level Julia programming language makes it easy to modify Juniper’s algorithm and explore extensions, such as branching heuristics, feasibility pumps, and parallelization. Detailed numerical experiments demonstrate that the initial release of Juniper is comparable with other nonlinear branch-and-bound solvers, such as Bonmin, Minotaur, and Knitro, illustrating that Juniper provides a strong foundation for further exploration in utilizing nonlinear branch-and-bound algorithms as heuristics for nonconvex MINLPs.
Sentence simplification aims to simplify the content and structure of complex sentences, and thus make them easier to interpret for human readers, and easier to process for downstream NLP applications. Recent advances in neural machine translation have paved the way for novel approaches to the task. In this paper, we adapt an architecture with augmented memory capacities called Neural Semantic Encoders (Munkhdalai and Yu, 2017) for sentence simplification. Our experiments demonstrate the effectiveness of our approach on different simplification datasets, both in terms of automatic evaluation measures and human judgments.
Discriminative Correlation Filters (DCF)-based tracking algorithms exploiting conventional handcrafted features have achieved impressive results both in terms of accuracy and robustness. Template handcrafted features have shown excellent performance, but they perform poorly when the appearance of target changes rapidly such as fast motions and fast deformations. In contrast, statistical handcrafted features are insensitive to fast states changes, but they yield inferior performance in the scenarios of illumination variations and background clutters. In this work, to achieve an efficient tracking performance, we propose a novel visual tracking algorithm, named MFCMT, based on a complementary ensemble model with multiple features, including Histogram of Oriented Gradients (HOGs), Color Names (CNs) and Color Histograms (CHs). Additionally, to improve tracking results and prevent targets drift, we introduce an effective fusion method by exploiting relative entropy to coalesce all basic response maps and get an optimal response. Furthermore, we suggest a simple but efficient update strategy to boost tracking performance. Comprehensive evaluations are conducted on two tracking benchmarks demonstrate and the experimental results demonstrate that our method is competitive with numerous state-of-the-art trackers. Our tracker achieves impressive performance with faster speed on these benchmarks.
Credit card fraud detection is a very challenging problem because of the specific nature of transaction data and the labeling process. The transaction data is peculiar because they are obtained in a streaming fashion, they are strongly imbalanced and prone to non-stationarity. The labeling is the outcome of an active learning process, as every day human investigators contact only a small number of cardholders (associated to the riskiest transactions) and obtain the class (fraud or genuine) of the related transactions. An adequate selection of the set of cardholders is therefore crucial for an efficient fraud detection process. In this paper, we present a number of active learning strategies and we investigate their fraud detection accuracies. We compare different criteria (supervised, semi-supervised and unsupervised) to query unlabeled transactions. Finally, we highlight the existence of an exploitation/exploration trade-off for active learning in the context of fraud detection, which has so far been overlooked in the literature.
We present a novel natural language query interface, the FactChecker, aimed at text summaries of relational data sets. The tool focuses on natural language claims that translate into an SQL query and a claimed query result. Similar in spirit to a spell checker, the FactChecker marks up text passages that seem to be inconsistent with the actual data. At the heart of the system is a probabilistic model that reasons about the input document in a holistic fashion. Based on claim keywords and the document structure, it maps each text claim to a probability distribution over associated query translations. By efficiently executing tens to hundreds of thousands of candidate translations for a typical input document, the system maps text claims to correctness probabilities. This process becomes practical via a specialized processing backend, avoiding redundant work via query merging and result caching. Verification is an interactive process in which users are shown tentative results, enabling them to take corrective actions if necessary. Our system was tested on a set of 53 public articles containing 392 claims. Our test cases include articles from major newspapers, summaries of survey results, and Wikipedia articles. Our tool revealed erroneous claims in roughly a third of test cases. A detailed user study shows that users using our tool are in average six times faster at checking text summaries, compared to generic SQL interfaces. In fully automated verification, our tool achieves significantly higher recall and precision than baselines from the areas of natural language query interfaces and fact checking.
In this paper, we propose a regression model where the response variable is beta prime distributed using a new parameterization of this distribution that is indexed by mean and precision parameters. The proposed regression model is useful for situations where the variable of interest is continuous and restricted to the positive real line and is related to other variables through the mean and precision parameters. The variance function of the proposed model has a quadratic form. In addition, the beta prime model has properties that its competitor distributions of the exponential family do not have. Estimation is performed by maximum likelihood. Furthermore, we discuss residuals and influence diagnostic tools. Finally, we also carry out an application to real data that demonstrates the usefulness of the proposed model.

# Magister Dixit

“What was once just a figment of the imagination of some our most famous science fiction writers, artificial intelligence (AI) is taking root in our everyday lives. We’re still a few years away from having robots at our beck and call, but AI has already had a profound impact in more subtle ways. Weather forecasts, email spam filtering, Google’s search predictions, and voice recognition, such Apple’s Siri, are all examples. What these technologies have in common are machine-learning algorithms that enable them to react and respond in real time. There will be growing pains as AI technology evolves, but the positive effect it will have on society in terms of efficiency is immeasurable.” Or Shani ( January 27, 2015 )

# Document worth reading: “Learning from the machine: interpreting machine learning algorithms for point- and extended- source classification”

We investigate star-galaxy classification for astronomical surveys in the context of four methods enabling the interpretation of black-box machine learning systems. The first is outputting and exploring the decision boundaries as given by decision tree based methods, which enables the visualization of the classification categories. Secondly, we investigate how the Mutual Information based Transductive Feature Selection (MINT) algorithm can be used to perform feature pre-selection. If one would like to provide only a small number of input features to a machine learning classification algorithm, feature pre-selection provides a method to determine which of the many possible input properties should be selected. Third is the use of the tree-interpreter package to enable popular decision tree based ensemble methods to be opened, visualized, and understood. This is done by additional analysis of the tree based model, determining not only which features are important to the model, but how important a feature is for a particular classification given its value. Lastly, we use decision boundaries from the model to revise an already existing method of classification, essentially asking the tree based method where decision boundaries are best placed and defining a new classification method. We showcase these techniques by applying them to the problem of star-galaxy separation using data from the Sloan Digital Sky Survey (hereafter SDSS). We use the output of MINT and the ensemble methods to demonstrate how more complex decision boundaries improve star-galaxy classification accuracy over the standard SDSS frames approach (reducing misclassifications by up to $\approx33\%$). We then show how tree-interpreter can be used to explore how relevant each photometric feature is when making a classification on an object by object basis. Learning from the machine: interpreting machine learning algorithms for point- and extended- source classification

# R Packages worth a look

Spatial Association Between Regionalizations (sabre)
Calculates a degree of spatial association between regionalizations or categorical maps using the information-theoretical V-measure (Nowosad and Stepinski (2018) <doi:10.17605/OSF.IO/RCJH7>). It also offers an R implementation of the MapCurve method (Hargrove et al. (2006) <doi:10.1007/s10109-006-0025-x>).

Basic and Advanced Statistical Power Analysis (WebPower)
This is a collection of tools for conducting both basic and advanced statistical power analysis including correlation, proportion, t-test, one-way ANOVA, two-way ANOVA, linear regression, logistic regression, Poisson regression, mediation analysis, longitudinal data analysis, structural equation modeling and multilevel modeling. It also serves as the engine for conducting power analysis online at <https://webpower.psychstat.org>.

Interface to the ‘Libraries.io’ API (rbraries)
Interface to the ‘Libraries.io’ API (<https://…/api> ). ‘Libraries.io’ indexes data from 36 different package managers for programming languages.

AWS Comprehend’ Client Package (aws.comprehend)
Client for ‘AWS Comprehend’ <https://…/comprehend>, a cloud natural language processing service that can perform a number of quantitative text analyses, including language detection, sentiment analysis, and feature extraction.

Entropy-Based Segregation Indices (segregation)
Computes entropy-based segregation indices, as developed by Theil (1971) <isbn:978-0471858454>, with a focus on the Mutual Information Index (M). The M, further described by Mora and Ruiz-Castillo (2011) <doi:10.1111/j.1467-9531.2011.01237.x> and Frankel and Volij (2011) <doi:10.1016/j.jet.2010.10.008>, is a measure of segregation that is highly decomposable. The package provides tools to decompose the index by units and groups (local segregation), and by within and between terms. Includes standard error estimation by bootstrapping.

# Book Memo: “Visual Pattern Discovery and Recognition”

 This book presents a systematic study of visual pattern discovery, from unsupervised to semi-supervised manner approaches, and from dealing with a single feature to multiple types of features. Furthermore, it discusses the potential applications of discovering visual patterns for visual data analytics, including visual search, object and scene recognition. It is intended as a reference book for advanced undergraduates or postgraduate students who are interested in visual data analytics, enabling them to quickly access the research world and acquire a systematic methodology rather than a few isolated techniques to analyze visual data with large variations. It is also inspiring for researchers working in computer vision and pattern recognition fields. Basic knowledge of linear algebra, computer vision and pattern recognition would be helpful to readers.

# Book Memo: “Kernel Mean Embedding of Distributions”

 A Review and Beyond A Hilbert space embedding of a distribution – in short, a kernel mean embedding – has recently emerged as a powerful tool for machine learning and statistical inference. The basic idea behind this framework is to map distributions into a reproducing kernel Hilbert space (RKHS) in which the whole arsenal of kernel methods can be extended to probability measures. It can be viewed as a generalization of the original “feature map” common to support vector machines (SVMs) and other kernel methods. In addition to the classical applications of kernel methods, the kernel mean embedding has found novel applications in fields ranging from probabilistic modeling to statistical inference, causal discovery, and deep learning. This survey aims to give a comprehensive review of existing work and recent advances in this research area, and to discuss challenging issues and open problems that could potentially lead to new research directions. The survey begins with a brief introduction to the RKHS and positive definite kernels which forms the backbone of this survey, followed by a thorough discussion of the Hilbert space embedding of marginal distributions, theoretical guarantees, and a review of its applications. The embedding of distributions enables us to apply RKHS methods to probability measures which prompts a wide range of applications such as kernel two-sample testing, independent testing, and learning on distributional data. Next, we discuss the Hilbert space embedding for conditional distributions, give theoretical insights, and review some applications. The conditional mean embedding enables us to perform sum, product, and Bayes’ rules—which are ubiquitous in graphical model, probabilistic inference, and reinforcement learning— in a non-parametric way using this new representation of distributions. We then discuss relationships between this framework and other related areas. Lastly, we give some suggestions on future research directions.

# If you did not already know

Deep Policy Inference Q-Network (DPIQN)
We present DPIQN, a deep policy inference Q-network that targets multi-agent systems composed of controllable agents, collaborators, and opponents that interact with each other. We focus on one challenging issue in such systems—modeling agents with varying strategies—and propose to employ ‘policy features’ learned from raw observations (e.g., raw images) of collaborators and opponents by inferring their policies. DPIQN incorporates the learned policy features as a hidden vector into its own deep Q-network (DQN), such that it is able to predict better Q values for the controllable agents than the state-of-the-art deep reinforcement learning models. We further propose an enhanced version of DPIQN, called deep recurrent policy inference Q-network (DRPIQN), for handling partial observability. Both DPIQN and DRPIQN are trained by an adaptive training procedure, which adjusts the network’s attention to learn the policy features and its own Q-values at different phases of the training process. We present a comprehensive analysis of DPIQN and DRPIQN, and highlight their effectiveness and generalizability in various multi-agent settings. Our models are evaluated in a classic soccer game involving both competitive and collaborative scenarios. Experimental results performed on 1 vs. 1 and 2 vs. 2 games show that DPIQN and DRPIQN demonstrate superior performance to the baseline DQN and deep recurrent Q-network (DRQN) models. We also explore scenarios in which collaborators or opponents dynamically change their policies, and show that DPIQN and DRPIQN do lead to better overall performance in terms of stability and mean scores. …

Multiregression Dynamic Models (MDM)
Multiregression dynamic models are defined to preserve certain conditional independence structures over time across a multivariate time series. They are non-Gaussian and yet they can often be updated in closed form. The first two moments of their one-step-ahead forecast distribution can be easily calculated. Furthermore, they can be built to contain all the features of the univariate dynamic linear model and promise more efficient identification of causal structures in a time series than has been possible in the past …

Extremal Depth (ED)
We propose a new notion called extremal depth’ (ED) for functional data, discuss its properties, and compare its performance with existing concepts. The proposed notion is based on a measure of extreme outlyingness’. ED has several desirable properties that are not shared by other notions and is especially well suited for obtaining central regions of functional data and function spaces. In particular: a) the central region achieves the nominal (desired) simultaneous coverage probability; b) there is a correspondence between ED-based (simultaneous) central regions and appropriate point-wise central regions; and c) the method is resistant to certain classes of functional outliers. The paper examines the performance of ED and compares it with other depth notions. Its usefulness is demonstrated through applications to constructing central regions, functional boxplots, outlier detection, and simultaneous confidence bands in regression problems. …

# Book Memo: “Analysis of Repeated Measures Data”

 This book presents a broad range of statistical techniques to address emerging needs in the field of repeated measures. It also provides a comprehensive overview of extensions of generalized linear models for the bivariate exponential family of distributions, which represent a new development in analysing repeated measures data. The demand for statistical models for correlated outcomes has grown rapidly recently, mainly due to presence of two types of underlying associations: associations between outcomes, and associations between explanatory variables and outcomes. The book systematically addresses key problems arising in the modelling of repeated measures data, bearing in mind those factors that play a major role in estimating the underlying relationships between covariates and outcome variables for correlated outcome data. In addition, it presents new approaches to addressing current challenges in the field of repeated measures and models based on conditional and joint probabilities. Markov models of first and higher orders are used for conditional models in addition to conditional probabilities as a function of covariates. Similarly, joint models are developed using both marginal-conditional probabilities as well as joint probabilities as a function of covariates. In addition to generalized linear models for bivariate outcomes, it highlights extended semi-parametric models for continuous failure time data and their applications in order to include models for a broader range of outcome variables that researchers encounter in various fields. The book further discusses the problem of analysing repeated measures data for failure time in the competing risk framework, which is now taking on an increasingly important role in the field of survival analysis, reliability and actuarial science. Details on how to perform the analyses are included in each chapter and supplemented with newly developed R packages and functions along with SAS codes and macro/IML. It is a valuable resource for researchers, graduate students and other users of statistical techniques for analysing repeated measures data.

# Book Memo: “Iterative Learning Control for Multi-agent Systems Coordination”

 A timely guide using iterative learning control (ILC) as a solution for multi-agent systems (MAS) challenges, showcasing recent advances and industrially relevant applications • Explores the synergy between the important topics of iterative learning control (ILC) and multi-agent systems (MAS) • Concisely summarizes recent advances and significant applications in ILC methods for power grids, sensor networks and control processes • Covers basic theory, rigorous mathematics as well as engineering practice

# If you did not already know

Convolutional Neural Network – Support Vector Machine (CNN-SVM)
Convolutional neural networks (CNNs) are similar to ‘ordinary’ neural networks in the sense that they are made up of hidden layers consisting of neurons with ‘learnable’ parameters. These neurons receive inputs, performs a dot product, and then follows it with a non-linearity. The whole network expresses the mapping between raw image pixels and their class scores. Conventionally, the Softmax function is the classifier used at the last layer of this network. However, there have been studies (Alalshekmubarak and Smith, 2013; Agarap, 2017; Tang, 2013) conducted to challenge this norm. The cited studies introduce the usage of linear support vector machine (SVM) in an artificial neural network architecture. This project is yet another take on the subject, and is inspired by (Tang, 2013). Empirical data has shown that the CNN-SVM model was able to achieve a test accuracy of ~99.04% using the MNIST dataset (LeCun, Cortes, and Burges, 2010). On the other hand, the CNN-Softmax was able to achieve a test accuracy of ~99.23% using the same dataset. Both models were also tested on the recently-published Fashion-MNIST dataset (Xiao, Rasul, and Vollgraf, 2017), which is suppose to be a more difficult image classification dataset than MNIST (Zalandoresearch, 2017). This proved to be the case as CNN-SVM reached a test accuracy of ~90.72%, while the CNN-Softmax reached a test accuracy of ~91.86%. The said results may be improved if data preprocessing techniques were employed on the datasets, and if the base CNN model was a relatively more sophisticated than the one used in this study. …

Dynamically Routed Network (SkipNet)
Increasing depth and complexity in convolutional neural networks has enabled significant progress in visual perception tasks. However, incremental improvements in accuracy are often accompanied by exponentially deeper models that push the computational limits of modern hardware. These incremental improvements in accuracy imply that only a small fraction of the inputs require the additional model complexity. As a consequence, for any given image it is possible to bypass multiple stages of computation to reduce the cost of forward inference without affecting accuracy. We exploit this simple observation by learning to dynamically route computation through a convolutional network. We introduce dynamically routed networks (SkipNets) by adding gating layers that route images through existing convolutional networks and formulate the routing problem in the context of sequential decision making. We propose a hybrid learning algorithm which combines supervised learning and reinforcement learning to address the challenges of inherently non-differentiable routing decisions. We show SkipNet reduces computation by 30 – 90% while preserving the accuracy of the original model on four benchmark datasets. We compare SkipNet with SACT and ACT to show SkipNet achieves better accuracy with lower computation. …

Out-of-Distribution Detector for Neural Networks (ODIN)
We consider the problem of detecting out-of-distribution examples in neural networks. We propose ODIN, a simple and effective out-of-distribution detector for neural networks, that does not require any change to a pre-trained model. Our method is based on the observation that using temperature scaling and adding small perturbations to the input can separate the softmax score distributions of in- and out-of-distribution samples, allowing for more effective detection. We show in a series of experiments that our approach is compatible with diverse network architectures and datasets. It consistently outperforms the baseline approach[1] by a large margin, establishing a new state-of-the-art performance on this task. For example, ODIN reduces the false positive rate from the baseline 34.7% to 4.3% on the DenseNet (applied to CIFAR-10) when the true positive rate is 95%. We theoretically analyze the method and prove that performance improvement is guaranteed under mild conditions on the image distributions. …