We apply techniques in natural language processing, computational linguistics, and machine learning to investigate papers in hep-th and four related sections of the arXiv: hep-ph, hep-lat, gr-qc, and math-ph. All of the titles of papers in each of these sections, from the inception of the arXiv until the end of 2017, are extracted and treated as a corpus which we use to train the neural network Word2Vec. A comparative study of common n-grams, linear syntactical identities, word clouds and word similarities is carried out. We find notable scientific and sociological differences between the fields. In conjunction with support vector machines, we also show that the syntactic structure of the titles in different sub-fields of high energy and mathematical physics is sufficiently different that a neural network can perform a binary classification of formal versus phenomenological sections with 87.1% accuracy, and can perform a finer five-fold classification across all sections with 65.1% accuracy.

Adversarial Examples in Deep Learning: Characterization and Divergence

The burgeoning success of deep learning has raised security and privacy concerns as more and more tasks involve sensitive data. Adversarial attacks in deep learning have emerged as one of the dominant security threats to a range of mission-critical deep learning systems and applications. This paper takes a holistic and principled approach to perform statistical characterization of adversarial examples in deep learning. We provide a general formulation of adversarial examples and elaborate on the basic principle for adversarial attack algorithm design. We introduce easy and hard categorization of adversarial attacks to analyze the effectiveness of adversarial examples in terms of attack success rate, degree of change in adversarial perturbation, average entropy of prediction qualities, and fraction of adversarial examples that lead to successful attacks. We conduct an extensive experimental study on adversarial behavior in easy and hard attacks under deep learning models with different hyperparameters and different deep learning frameworks. We show that the same adversarial attack behaves differently under different hyperparameters and across different frameworks due to the different features learned under different deep learning model training processes. Our statistical characterization, backed by strong empirical evidence, provides insight into mitigation strategies and effective countermeasures against present and future adversarial attacks.

Fundamental Limits of Distributed Data Shuffling

Data shuffling of training data among different computing nodes (workers) has been identified as a core element to improve the statistical performance of modern large scale machine learning algorithms. Data shuffling is often considered one of the most significant bottlenecks in such systems due to the heavy communication load. Under a master-worker architecture (where a master has access to the entire dataset and only communication between the master and workers is allowed) coding has recently been proved to considerably reduce the communication load. In this work, we consider a different communication paradigm referred to as distributed data shuffling, where workers, connected by a shared link, are allowed to communicate with one another while no communication between the master and workers is allowed. Under the constraint of uncoded cache placement, we first propose a general coded distributed data shuffling scheme, which achieves the optimal communication load within a factor of two. Then, we propose an improved scheme achieving exact optimality for either large memory size or at most four workers in the system.

Quasi Markov Chain Monte Carlo Methods

Quasi-Monte Carlo (QMC) methods for estimating integrals are attractive since the resulting estimators converge at a faster rate than pseudo-random Monte Carlo. However, they can be difficult to set up on arbitrary posterior densities within the Bayesian framework, in particular for inverse problems. We introduce a general parallel Markov chain Monte Carlo (MCMC) framework, for which we prove a law of large numbers and a central limit theorem. We further extend this approach to the use of adaptive kernels and state conditions, under which ergodicity holds. As a further extension, an importance sampling estimator is derived, for which asymptotic unbiasedness is proven. We consider the use of completely uniformly distributed (CUD) numbers and non-reversible transitions within the above stated methods, which leads to a general parallel quasi-MCMC (QMCMC) methodology. We prove consistency of the resulting estimators and demonstrate numerically that this approach scales close to n^{-1} as we increase parallelisation, instead of the usual n^{-1/2} that is typical of standard MCMC algorithms. In practical statistical models we observe up to 2 orders of magnitude improvement compared with pseudo-random methods.
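
The rate gap mentioned above can be seen in the simplest possible setting. The sketch below is not the parallel QMCMC method of the paper; it only compares plain Monte Carlo with a base-2 van der Corput sequence (a standard low-discrepancy sequence) when integrating x^2 over [0, 1]. The function names and the test integrand are assumptions chosen for illustration.

```python
import random


def van_der_corput(i, base=2):
    """Return the i-th element (i >= 1) of the van der Corput sequence."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x


def integrand(x):
    """Illustrative integrand; its exact integral over [0, 1] is 1/3."""
    return x * x


def mc_estimate(f, n, rng):
    """Plain Monte Carlo average over n pseudo-random points."""
    return sum(f(rng.random()) for _ in range(n)) / n


def qmc_estimate(f, n):
    """Quasi-Monte Carlo average over the first n van der Corput points."""
    return sum(f(van_der_corput(i)) for i in range(1, n + 1)) / n


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    exact = 1.0 / 3.0
    for n in (2**6, 2**10, 2**14):
        mc_err = abs(mc_estimate(integrand, n, rng) - exact)
        qmc_err = abs(qmc_estimate(integrand, n) - exact)
        print(f"n={n:6d}  MC error={mc_err:.2e}  QMC error={qmc_err:.2e}")
```

For this smooth one-dimensional integrand the quasi-random error shrinks roughly like 1/n, while the pseudo-random error fluctuates around n^{-1/2}; how much of that gain carries over to a Markov chain setting is the subject of the abstract, not of this sketch.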

Amanuensis: The Programmer’s Apprentice

This document provides an overview of the material covered in a course taught at Stanford in the spring quarter of 2018. The course draws upon insight from cognitive and systems neuroscience to implement hybrid connectionist and symbolic reasoning systems that leverage and extend the state of the art in machine learning by integrating human and machine intelligence. As a concrete example we focus on digital assistants that learn from continuous dialog with an expert software engineer while providing initial value as powerful analytical, computational and mathematical savants. Over time these savants learn cognitive strategies (domain-relevant problem solving skills) and develop intuitions (heuristics and the experience necessary for applying them) by learning from their expert associates. By doing so these savants elevate their innate analytical skills allowing them to partner on an equal footing as versatile collaborators – effectively serving as cognitive extensions and digital prostheses, thereby amplifying and emulating their human partner’s conceptually-flexible thinking patterns and enabling improved access to and control over powerful computing resources.

A Learning Theory in Linear Systems under Compositional Models

We present a learning theory for the training of a linear system operator having an input compositional variable and propose a Bayesian inversion method for inferring the unknown variable from an output of a noisy linear system. We assume that we have partial or even no knowledge of the operator but have training data of input and output. A compositional variable satisfies the constraints that the elements of the variable are all non-negative and sum to unity. We quantify the uncertainty in the trained operator and present the convergence rates of training in explicit forms for several interesting cases under stochastic compositional models. The trained linear operator with the covariance matrix, estimated from the training set of pairs of ground-truth input and noisy output data, is further used in evaluation of posterior uncertainty of the solution. This posterior uncertainty clearly demonstrates uncertainty propagation from noisy training data and addresses possible mismatch between the true operator and the estimated one in the final solution.
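
The compositional constraint stated above (all elements non-negative, summing to unity) can be made concrete with a small check. This is a minimal sketch; the helper names `is_compositional` and `closure` and the tolerance are assumptions for illustration and do not come from the paper.

```python
def is_compositional(x, tol=1e-9):
    """True if every element is non-negative and the elements sum to one."""
    return all(v >= 0 for v in x) and abs(sum(x) - 1.0) <= tol


def closure(x):
    """Rescale a non-negative vector with positive total so it sums to one."""
    if any(v < 0 for v in x):
        raise ValueError("elements must be non-negative")
    total = sum(x)
    if total <= 0:
        raise ValueError("total must be positive")
    return [v / total for v in x]


if __name__ == "__main__":
    raw = [2, 3, 5]  # hypothetical raw abundances
    comp = closure(raw)
    print(comp, is_compositional(comp))
```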

Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities

New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.

Game-Theoretic Interpretability for Temporal Modeling

Interpretability has arisen as a key desideratum of machine learning models alongside performance. Approaches so far have been primarily concerned with fixed dimensional inputs emphasizing feature relevance or selection. In contrast, we focus on temporal modeling and the problem of tailoring the predictor, functionally, towards an interpretable family. To this end, we propose a co-operative game between the predictor and an explainer without any a priori restrictions on the functional class of the predictor. The goal of the explainer is to highlight, locally, how well the predictor conforms to the chosen interpretable family of temporal models. Our co-operative game is set up asymmetrically in terms of information sets for efficiency reasons. We develop and illustrate the framework in the context of temporal sequence models with examples.

Robust functional regression based on principal components

Functional data analysis is a fast evolving branch of modern statistics and the functional linear model has become popular in recent years. However, most estimation methods for this model rely on generalized least squares procedures and therefore are sensitive to atypical observations. To remedy this, we propose a two-step estimation procedure that combines robust functional principal components and robust linear regression. Moreover, we propose a transformation that reduces the curvature of the estimators and can be advantageous in many settings. For these estimators we prove Fisher-consistency at elliptical distributions and consistency under mild regularity conditions. The influence function of the estimators is investigated as well. Simulation experiments show that the proposed estimators have reasonable efficiency, protect against outlying observations, produce smooth estimates and perform well in comparison to existing approaches.

Algorithms for solving optimization problems arising from deep neural net models: smooth problems

Machine Learning models incorporating multiple layered learning networks have been seen to provide effective models for various classification problems. The resulting optimization problem to solve for the optimal vector minimizing the empirical risk is, however, highly nonlinear. This presents a challenge to application and development of appropriate optimization algorithms for solving the problem. In this paper, we summarize the primary challenges involved and present the case for a Newton-based method incorporating directions of negative curvature, including promising numerical results on data arising from security anomaly detection.

Algorithms for solving optimization problems arising from deep neural net models: nonsmooth problems

Machine Learning models incorporating multiple layered learning networks have been seen to provide effective models for various classification problems. The resulting optimization problem to solve for the optimal vector minimizing the empirical risk is, however, highly nonconvex. This alone presents a challenge to application and development of appropriate optimization algorithms for solving the problem. In addition, there are a number of interesting problems for which the objective function is nonsmooth and nonseparable. In this paper, we summarize the primary challenges involved and the state of the art, and present some numerical results on an interesting and representative class of problems.

The Historical Significance of Textual Distances

Measuring similarity is a basic task in information retrieval, and is now often a building block for more complex arguments about cultural change. But do measures of textual similarity and distance really correspond to evidence about cultural proximity and differentiation? To explore that question empirically, this paper compares textual and social measures of the similarities between genres of English-language fiction. Existing measures of textual similarity (cosine similarity on tf-idf vectors or topic vectors) are also compared to new strategies that use supervised learning to anchor textual measurement in a social context.
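
As a reminder of what the baseline measure computes, the sketch below implements cosine similarity on tf-idf vectors in plain Python. It is a minimal illustration rather than the paper's pipeline: the tokenized toy documents, the function names, and the particular idf variant (log(N/df) + 1) are all assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: weight} tf-idf vector."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # One common idf variant (an assumption here); the +1 keeps weights positive.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical toy "genres", already tokenized.
    docs = [
        "the detective found the clue".split(),
        "the detective lost the clue".split(),
        "the starship left the planet".split(),
    ]
    vecs = tfidf_vectors(docs)
    print(cosine_similarity(vecs[0], vecs[1]))
    print(cosine_similarity(vecs[0], vecs[2]))
```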

On null hypotheses in survival analysis

The conventional nonparametric tests in survival analysis, such as the log-rank test, assess the null hypothesis that the hazards are equal at all times. However, hazards are hard to interpret causally, and other null hypotheses are more relevant in many scenarios with survival outcomes. To allow for a wider range of null hypotheses, we present a generic approach to define test statistics. This approach utilizes the fact that a wide range of common parameters in survival analysis can be expressed as solutions of differential equations. Thereby, we can test hypotheses based on survival parameters that solve differential equations driven by hazards, and it is easy to implement the tests on a computer. We present simulations, suggesting that our generic approach performs well for several hypotheses in a range of scenarios. Finally, we extend the strategy to allow for testing conditional on covariates.
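
Two textbook instances of parameters that solve differential equations driven by a hazard h(t) are the survival function, with S'(t) = -h(t) S(t) and S(0) = 1, and the restricted mean survival time, with R'(t) = S(t) and R(0) = 0. The sketch below integrates both with a forward Euler scheme for a known hazard. It only illustrates the differential-equation viewpoint and is not the paper's test statistic; the function name, step count, and constant-hazard example are assumptions.

```python
import math


def survival_and_rmst(hazard, t_end, steps=10000):
    """Forward-Euler solution of S' = -h(t) S and R' = S on [0, t_end]."""
    dt = t_end / steps
    s, r, t = 1.0, 0.0, 0.0
    for _ in range(steps):
        r += s * dt              # restricted mean survival accumulates S
        s -= hazard(t) * s * dt  # survival decays at the hazard rate
        t += dt
    return s, r


if __name__ == "__main__":
    rate = 0.5  # hypothetical constant hazard
    s, r = survival_and_rmst(lambda t: rate, t_end=2.0)
    # Closed forms for a constant hazard, used only as a sanity check.
    print(s, math.exp(-rate * 2.0))
    print(r, (1.0 - math.exp(-rate * 2.0)) / rate)
```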

Beyond Winning and Losing: Modeling Human Motivations and Behaviors Using Inverse Reinforcement Learning

In recent years, reinforcement learning (RL) methods have been applied to model gameplay with great success, achieving super-human performance in various environments, such as Atari, Go, and Poker. However, those studies mostly focus on winning the game and have largely ignored the rich and complex human motivations, which are essential for understanding different players’ diverse behaviors. In this paper, we present a novel method called Multi-Motivation Behavior Modeling (MMBM) that takes multifaceted human motivations into consideration and models the underlying value structure of the players using inverse RL. Our approach does not require access to the dynamics of the system, making it feasible to model complex interactive environments such as massively multiplayer online games. MMBM is tested on the World of Warcraft Avatar History dataset, which recorded over 70,000 users’ gameplay spanning a three-year period. Our model reveals significant differences in value structures among different player groups. Using the results of motivation modeling, we also predict and explain their diverse gameplay behaviors and provide a quantitative assessment of how the redesign of the game environment impacts players’ behaviors.

Survey of Graph Analysis Applications

Recently, many systems for graph analysis have been developed to address the growing needs of both industry and academia to study complex graphs. Insight into the practical uses of graph analysis will allow future developments of such systems to optimize for real-world usage, instead of targeting single use cases or hypothetical workloads. This insight may be derived from surveys on the applications of graph analysis. However, existing surveys are limited in the variety of application domains, datasets, and/or graph analysis techniques they study. In this work we present and apply a systematic method for identifying practical use cases of graph analysis. We identify commonly used graph features and analysis methods and use our findings to construct a taxonomy of graph analysis applications. We conclude that practical use cases of graph analysis cover a diverse set of graph features and analysis methods. Furthermore, most applications combine multiple features and methods. Our findings motivate further development of graph analysis systems to support a broader set of applications and to facilitate the combination of multiple analysis methods in an (interactive) workflow.

Task-Driven Convolutional Recurrent Models of the Visual System
Using routinely collected patient data to support clinical trials research in accountable care organizations
Citizen Social Lab: A digital platform for human behaviour experimentation within a citizen science framework
An Efficient Data Warehouse for Crop Yield Prediction
A generalization of some random variables involving in certain compressive sensing problems
Introduction to the Special Issue on Approaches to Control Biological and Biologically Inspired Networks
Flexibility potentials of a combined use of heat storages and batteries in PV-CHP hybrid systems
Single Index Latent Variable Models for Network Topology Inference
Dynamic Power Allocation and User Scheduling for Power-Efficient and Low-Latency Communications
YH Technologies at ActivityNet Challenge 2018
Grapevine: A Wine Prediction Algorithm Using Multi-dimensional Clustering Methods
A Novel Geometric Framework on Gram Matrix Trajectories for Human Behavior Understanding
Global optimization of spin Hamiltonians with gain-dissipative systems
A simple proof of the discrete time geometric Pontryagin maximum principle
On Solving Ambiguity Resolution with Robust Chinese Remainder Theorem for Multiple Numbers
Null controllability of parabolic equations with interior degeneracy and one-sided control
Bounds on the Poincaré constant for convolution measures
Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints
Deciding the Closure of Inconsistent Rooted Triples is NP-Complete
A degree condition for diameter two orientability of graphs
Neural Networks Trained to Solve Differential Equations Learn General Representations
It All Matters: Reporting Accuracy, Inference Time and Power Consumption for Face Emotion Recognition on Embedded Systems
AI in Game Playing: Sokoban Solver
Determination of Friendship Intensity between Online Social Network Users Based on Their Interaction
Temporal Logic Verification of Stochastic Systems Using Barrier Certificates
Fully Nonparametric Bayesian Additive Regression Trees
Joint Learning of Domain Classification and Out-of-Domain Detection with Dynamic Class Weighting for Satisficing False Acceptance Rates
Over-the-Air Time Synchronization for URLLC: Requirements, Challenges and Possible Enablers
Ergodic-localized junctions in periodically-driven spin chains
Topology classification with deep learning to improve real-time event selection at the LHC
Whitehead products in moment-angle complexes
Classification of lung nodules in CT images based on Wasserstein distance in differential geometry
Probabilistic Bisection with Spatial Metamodels
Title Generation for Web Tables
$h^*$-Polynomials With Roots on the Unit Circle
Stress-testing memcomputing on hard combinatorial optimization problems
Finding a Path in Group-Labeled Graphs with Two Labels Forbidden
Linear and sublinear convergence rates for a subdifferentiable distributed deterministic asynchronous Dykstra’s algorithm
Approximate Nearest Neighbors in Limited Space
A High-Diversity Transceiver Design for K-User MISO Broadcast Channels
Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships
An optimal algorithm for 2-bounded delay buffer management with lookahead
A Constrained Coupled Matrix-Tensor Factorization for Learning Time-evolving and Emerging Topics
Modeling Mistrust in End-of-Life Care
Generation of Automatic and Realistic Artificial Profiles
A New Benchmark and Progress Toward Improved Weakly Supervised Learning
Fractional Wavelet Scattering Network and Applications
Sampling and Reconstruction of Signals on Product Graphs
Cost-effective Object Detection: Active Sample Mining with Switchable Selection Criteria
Ant routing algorithm for the Lightning Network
AI in Education needs interpretable machine learning: Lessons from Open Learner Modelling
On line covers of finite projective and polar spaces
On the Optimality of Affine Policies for Budgeted Uncertainty Sets
The asymptotic spectrum of graphs and the Shannon capacity
Self-consistency of voting implies majority vote
Benchmarking the Hill-Valley Evolutionary Algorithm for the GECCO 2018 Competition on Niching Methods Multimodal Optimization
Modeling Friends and Foes
Achieving Fairness through Adversarial Learning: an Application to Recidivism Prediction
Local Properties in Colored Graphs, Distinct Distances, and Difference Sets
Improved Techniques for Learning to Dehaze and Beyond: A Collective Study
Fast Characterization of Segmental Duplications in Genome Assemblies
Exploratory Analysis of Pairwise Interactions in Online Social Networks
Advanced Methods for the Optical Quality Assurance of Silicon Sensors
Generalized operator-scaling random ball model
The Challenge of Multi-Operand Adders in CNNs on FPGAs: How not to solve it!
Storage-Repair Bandwidth Trade-off for Wireless Caching with Partial Failure and Broadcast Repair
Embedding Models for Episodic Memory
Co-Training of Audio and Video Representations from Self-Supervised Temporal Synchronization
String-Averaging Algorithms for Convex Feasibility with Infinitely Many Sets
chemmodlab: A Cheminformatics Modeling Laboratory for Fitting and Assessing Machine Learning Models
Hill Climbing Optimized Twin Classification Using Resting-State Functional MRI
A Shared Attention Mechanism for Interpretation of Neural Automatic Post-Editing Systems
Stochastic model-based minimization under high-order growth
Information Retrieval in the Cloud
Models of Gradient Type with Sub-Quadratic Actions
Analysis and Optimization of Caching and Multicasting for Multi-Quality Videos in Large-Scale Wireless Networks
Accurate Uncertainties for Deep Learning Using Calibrated Regression
Augmented Lagrangian Optimization under Fixed Point Arithmetic
An Efficient Approach to Encoding Context for Spoken Language Understanding
Photorealistic Style Transfer for Videos
Explicit factorization of $x^{2^nd}-1$ over a finite field
Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera
A horse racing between the block maxima method and the peak-over-threshold approach
Autonomous Deep Learning: A Genetic DCNN Designer for Image Classification
Optimal Two-impulse Space Interception with Multi-constraints
Lost in Translation: Analysis of Information Loss During Machine Translation Between Polysynthetic and Fusional Languages
Exponential Convergence of the Deep Neural Network Approximation for Analytic Functions
Multi-Task Generative Adversarial Nets with Shared Memory for Cross-Domain Coordination Control
SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks
Modeling, comprehending and summarizing textual content by graphs
Local Equilibria
Bayesian Nonparametrics for Directional Statistics
Product-based Neural Networks for User Response Prediction over Multi-field Categorical Data
Ensuring domain consistency in an adaptive framework with distributed topology for fluid flow simulations
An Improved Envy-Free Cake Cutting Protocol for Four Agents
On the R$_0$-tensors and the solution map of tensor complementarity problems
Solution maps of polynomial variational inequalities
Joint Failure Recovery, Fault Prevention, and Energy-efficient Resource Management for Real-time SFC in Fog-supported SDN
Data-driven satisficing measure and ranking
Performance Analysis of Indoor THz Communications with One-Bit Precoding
Automatic Analysis of Expected Termination Time for Population Protocols
Records for Some Stationary Dependence Sequences
Towards Adversarial Training with Moderate Performance Improvement for Neural Network Classification
Xcel-RAM: Accelerating Binary Neural Networks in High-Throughput SRAM Compute Arrays
A complete characterization of plateaued Boolean functions in terms of their Cayley graphs
Robust Inference Under Heteroskedasticity via the Hadamard Estimator
Heuristic Framework for Multi-Scale Testing of the Multi-Manifold Hypothesis
Asymptotically optimal delay-aware scheduling in wireless networks
Long range random walks and associated geometries on groups of polynomial growth
On invariant probability measures of regime-switching diffusion processes with singular drifts
Calculation of sample size guaranteeing the required width of the empirical confidence interval with predefined probability
A Data-Driven Approach to Dynamically Adjust Resource Allocation for Compute Clusters
Representation of ordered trees with a given degree distribution
New Heuristics for Parallel and Scalable Bayesian Optimization
Augmented Cyclic Adversarial Learning for Domain Adaptation
Human Satisfaction as the Ultimate Goal in Ridesharing
Inner approximating the completely positive cone via the cone of scaled diagonally dominant matrices
Model-based Exception Mining for Object-Relational Data
More on the long time stability of Feynman-Kac semigroups
Gradient Reversal Against Discrimination