Generalized Structural Causal Models

Structural causal models are a popular tool to describe causal relations in systems in many fields such as economy, the social sciences, and biology. In this work, we show that these models are not flexible enough in general to give a complete causal representation of equilibrium states in dynamical systems that do not have a unique stable equilibrium independent of initial conditions. We prove that our proposed generalized structural causal models do capture the essential causal semantics that characterize these systems. We illustrate the power and flexibility of this extension on a dynamical system corresponding to a basic enzymatic reaction. We motivate our approach further by showing that it also efficiently describes the effects of interventions on functional laws such as the ideal gas law.

Efficient compilation of array probabilistic programs

Probabilistic programming languages are valuable because they allow us to build, run, and change concise probabilistic models that elide irrelevant details. However, current systems are either inexpressive, failing to support basic features needed to write realistic models, or inefficient, taking orders of magnitude more time to run than hand-coded inference. Without resolving this dilemma, model developers are still required to manually rewrite their high-level models into low-level code to get the needed performance. We tackle this dilemma by presenting an approach for efficient probabilistic programming with arrays. Arrays are a key element of almost any realistic model. Our system extends previous compilation techniques from scalars to arrays. These extensions allow the transformation of high-level programs into known efficient algorithms. We then optimize the resulting code by taking advantage of the domain-specificity of our language. We further JIT-compile the final product using LLVM on a per-execution basis. These steps combined lead to significant new opportunities for specialization. The resulting performance is competitive with manual implementations of the desired algorithms, even though the original program is as concise and expressive as the initial model.

Causal Inference from Strip-Plot Designs in a Potential Outcomes Framework

Strip-plot designs are very useful when the treatments have a factorial structure and the factor levels are hard to change. We develop a randomization-based theory of causal inference from such designs in a potential outcomes framework. For any treatment contrast, an unbiased estimator is proposed, an expression for its sampling variance is worked out, and a conservative estimator of the sampling variance is obtained. This conservative estimator has a nonnegative bias, and becomes unbiased under between-block additivity, a condition milder than Neymannian strict additivity. A minimaxity property of this variance estimator is also established. Simulation results on the coverage of resulting confidence intervals lend support to theoretical considerations.

GANomaly: Semi-Supervised Anomaly Detection via Adversarial Training

Anomaly detection is a classical problem in computer vision, namely the determination of the normal from the abnormal when datasets are highly biased towards one class (normal) due to the insufficient sample size of the other class (abnormal). While this can be addressed as a supervised learning problem, a significantly more challenging problem is that of detecting the unknown/unseen anomaly case that takes us instead into the space of a one-class, semi-supervised learning paradigm. We introduce such a novel anomaly detection model, by using a conditional generative adversarial network that jointly learns the generation of high-dimensional image space and the inference of latent space. Employing encoder-decoder-encoder sub-networks in the generator network enables the model to map the input image to a lower dimension vector, which is then used to reconstruct the generated output image. The use of the additional encoder network maps this generated image to its latent representation. Minimizing the distance between these images and the latent vectors during training aids in learning the data distribution for the normal samples. As a result, a larger distance metric from this learned data distribution at inference time is indicative of an outlier from that distribution – an anomaly. Experimentation over several benchmark datasets, from varying domains, shows the model efficacy and superiority over previous state-of-the-art approaches.

Matching Consecutive Subpatterns Over Streaming Time Series

Pattern matching over streaming time series with low latency under limited computing resources has become a critical problem, especially with the growth of Industry 4.0 and the Industrial Internet of Things. However, in contrast to the traditional single-pattern matching model, a pattern may contain multiple subpatterns representing different physical meanings in the real world. Hence, we formulate a new problem, called ‘consecutive subpatterns matching’, which allows users to specify a pattern containing several consecutive subpatterns with various specified thresholds. We propose a novel representation, Equal-Length Block (ELB), together with two efficient implementations, which work very well under all Lp-Norms without false dismissals. Extensive experiments are performed on synthetic and real-world datasets to illustrate that our approach outperforms the brute-force method and MSM, a multi-step filter mechanism over the multi-scaled representation, by orders of magnitude.

The Blessings of Multiple Causes

Causal inference from observational data often assumes ‘strong ignorability,’ that all confounders are observed. This assumption is standard yet untestable. However, many scientific studies involve multiple causes, different variables whose effects are simultaneously of interest. We propose the deconfounder, an algorithm that combines unsupervised machine learning and predictive model checking to perform causal inference in multiple-cause settings. The deconfounder infers a latent variable as a substitute for unobserved confounders and then uses that substitute to perform causal inference. We develop theory for when the deconfounder leads to unbiased causal estimates, and show that it requires weaker assumptions than classical causal inference. We analyze its performance in three types of studies: semi-simulated data around smoking and lung cancer, semi-simulated data around genomewide association studies, and a real dataset about actors and movie revenue. The deconfounder provides a checkable approach to estimating close-to-truth causal effects.

Text classification based on ensemble extreme learning machine

In this paper, we propose a novel approach based on a cost-sensitive ensemble weighted extreme learning machine; we call this approach AE1-WELM. We apply this approach to text classification. AE1-WELM is an algorithm that covers both balanced and imbalanced multiclass text classification. Weighted ELM, which assigns different weights to different samples, improves the classification accuracy to a certain extent, but it considers only the differences between samples in different categories and ignores the differences between samples within the same category. We measure the importance of the documents by the sample information entropy, generate a cost-sensitive matrix and factor based on the document importance, and then embed the cost-sensitive weighted ELM into the AdaBoost.M1 framework seamlessly. The vector space model (VSM) text representation produces high-dimensional, sparse features, which increase the burden on ELM. To overcome this problem, we develop a text classification framework combining word vectors and AE1-WELM. The experimental results show that our method provides an accurate, reliable and effective solution for text classification.

Hybrid Adaptive Fuzzy Extreme Learning Machine for text classification

Traditional ELM and its improved versions suffer from outliers or noise due to overfitting, and from imbalance due to the data distribution. We propose a novel hybrid adaptive fuzzy ELM (HA-FELM), which introduces a fuzzy membership function to the traditional ELM method to deal with the above problems. We define the fuzzy membership function based not only on the distance between each sample and the center of its class but also on the density among samples, which is based on the quantum harmonic oscillator model. The proposed fuzzy membership function overcomes the shortcoming of the traditional fuzzy membership function and adjusts itself adaptively to the specific distribution of different samples. Experiments show the proposed HA-FELM can produce better performance than SVM, ELM, and RELM in text classification.

First Experiments with Neural Translation of Informal to Formal Mathematics

We report on our first experiments to train deep neural networks that automatically translate informalized \LaTeX{}-written Mizar texts into the formal Mizar language. Using Luong et al.’s neural machine translation model (NMT), we tested our aligned informal-formal corpora against various hyperparameters and evaluated their results. Our experiments show that NMT is able to generate correct Mizar statements on more than 60 percent of the inference data, indicating that formalization through artificial neural networks is a promising approach for automated formalization of mathematics. We present several case studies to illustrate our results.

Weight Initialization in Neural Language Models

Semantic Similarity is an important task that finds use in many downstream NLP applications. Though the task is mathematically defined, semantic similarity’s essence is to capture the notions of similarity ingrained in humans. Machines use some heuristics to calculate the similarity between words, but these are typically corpus dependent or are useful for specific domains. The difference between Semantic Similarity and Semantic Relatedness motivates the development of new algorithms. For a human, the words car and road are probably as related as car and bus. But this may not be the case for computational methods. Ontological methods are good at encoding Semantic Similarity and Vector Space models are better at encoding Semantic Relatedness. There is a dearth of methods which leverage ontologies to create better vector representations. The aim of this proposal is to explore a hybrid method which combines statistical/vector space methods like Word2Vec and Ontological methods like WordNet to leverage the advantages provided by both.

End-to-end Learning of a Convolutional Neural Network via Deep Tensor Decomposition

In this paper we study the problem of learning the weights of a deep convolutional neural network. We consider a network where convolutions are carried out over non-overlapping patches with a single kernel in each layer. We develop an algorithm for simultaneously learning all the kernels from the training data. Our approach dubbed Deep Tensor Decomposition (DeepTD) is based on a rank-1 tensor decomposition. We theoretically investigate DeepTD under a realizable model for the training data where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to planted convolutional kernels. We show that DeepTD is data-efficient and provably works as soon as the sample size exceeds the total number of convolutional weights in the network. We carry out a variety of numerical experiments to investigate the effectiveness of DeepTD and verify our theoretical findings.

Career Transitions and Trajectories: A Case Study in Computing

From artificial intelligence to network security to hardware design, it is well-known that computing research drives many important technological and societal advancements. However, less is known about the long-term career paths of the people behind these innovations. What do their careers reveal about the evolution of computing research? Which institutions were and are the most important in this field, and for what reasons? Can insights into computing career trajectories help predict employer retention? In this paper we analyze several decades of post-PhD computing careers using a large new dataset rich with professional information, and propose a versatile career network model, R^3, that captures temporal career dynamics. With R^3 we track important organizations in computing research history, analyze career movement between industry, academia, and government, and build a powerful predictive model for individual career transitions. Our study, the first of its kind, is a starting point for understanding computing research careers, and may inform employer recruitment and retention mechanisms at a time when the demand for specialized computational expertise far exceeds supply.

A Spline Theory of Deep Networks (Extended Version)

We build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. Our key result is that a large class of DNs can be written as a composition of max-affine spline operators (MASOs), which provide a powerful portal through which to view and analyze their inner workings. For instance, conditioned on the input signal, the output of a MASO DN can be written as a simple affine transformation of the input. This implies that a DN constructs a set of signal-dependent, class-specific templates against which the signal is compared via a simple inner product; we explore the links to the classical theory of optimal classification via matched filters and the effects of data memorization. Going further, we propose a simple penalty term that can be added to the cost function of any DN learning algorithm to force the templates to be orthogonal with each other; this leads to significantly improved classification performance and reduced overfitting with no change to the DN architecture. The spline partition of the input signal space that is implicitly induced by a MASO directly links DNs to the theory of vector quantization (VQ) and K-means clustering, which opens up a new geometric avenue to study how DNs organize signals in a hierarchical fashion. To validate the utility of the VQ interpretation, we develop and validate a new distance metric for signals and images that quantifies the difference between their VQ encodings. (This paper is a significantly expanded version of a paper with the same title that will appear at ICML 2018.)
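The claim that, conditioned on the input, the network output is an affine function of that input can be checked numerically on a small example. The following is a minimal sketch, assuming a plain fully connected ReLU network with a linear last layer (one member of the class the abstract describes); the helper names `relu_mlp` and `local_affine_map` and the layer sizes are hypothetical and not taken from the paper.

```python
import numpy as np

def relu_mlp(x, weights, biases):
    """Forward pass of a ReLU network whose last layer is linear."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(W @ h + b, 0.0)
    return weights[-1] @ h + biases[-1]

def local_affine_map(x, weights, biases):
    """Return (A, c) with relu_mlp(x) == A @ x + c for the activation
    pattern that this particular input x induces."""
    A = np.eye(x.shape[0])
    c = np.zeros(x.shape[0])
    for W, b in zip(weights[:-1], biases[:-1]):
        pre = W @ (A @ x + c) + b          # pre-activation at this layer
        mask = (pre > 0).astype(float)     # which ReLU units are active
        A = mask[:, None] * (W @ A)        # zero out rows of inactive units
        c = mask * (W @ c + b)
    return weights[-1] @ A, weights[-1] @ c + biases[-1]

# Hypothetical network: 4 inputs, two hidden layers of 8 units, 3 outputs.
rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]

x = rng.standard_normal(4)
A, c = local_affine_map(x, weights, biases)
print(np.allclose(relu_mlp(x, weights, biases), A @ x + c))
```

Each row of `A` plays the role of a signal-dependent template for one output: the corresponding output is the inner product of that row with `x` plus an offset. A different input with a different activation pattern would generally yield a different `A` and `c`.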

Accelerating Nonnegative Matrix Factorization Algorithms using Extrapolation

In this paper, we propose a general framework to accelerate significantly the algorithms for nonnegative matrix factorization (NMF). This framework is inspired from the extrapolation scheme used to accelerate gradient methods in convex optimization and from the method of parallel tangents. However, the use of extrapolation in the context of the two-block coordinate descent algorithms tackling the non-convex NMF problems is novel. We illustrate the performance of this approach on two state-of-the-art NMF algorithms, namely, accelerated hierarchical alternating least squares (A-HALS) and alternating nonnegative least squares (ANLS), using synthetic, image and document data sets.
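The idea of extrapolating between two-block coordinate descent updates can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not the authors' algorithm: it swaps in single projected gradient steps for the block updates instead of A-HALS or ANLS, uses a fixed extrapolation weight `beta`, and falls back to the last accepted iterate whenever the error increases. The function name `nmf_extrapolated` and all parameter values are hypothetical.

```python
import numpy as np

def nmf_extrapolated(X, rank, iters=100, beta=0.5, seed=0):
    """Toy NMF: alternate projected gradient steps on H and W, each started
    from an extrapolated point (Wy, Hy) rather than from the last iterate."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    Wy, Hy = W.copy(), H.copy()            # extrapolated points
    errors = [np.linalg.norm(X - W @ H)]
    eps = 1e-12                            # guards against division by zero
    for _ in range(iters):
        # Block 1: step on H from Hy with step size 1/L, then project.
        L_h = np.linalg.norm(Wy.T @ Wy, 2) + eps
        H_new = np.maximum(0.0, Hy - Wy.T @ (Wy @ Hy - X) / L_h)
        # Block 2: step on W from Wy with the fresh H, then project.
        L_w = np.linalg.norm(H_new @ H_new.T, 2) + eps
        W_new = np.maximum(0.0, Wy - (Wy @ H_new - X) @ H_new.T / L_w)
        err = np.linalg.norm(X - W_new @ H_new)
        if err > errors[-1]:
            # Restart: reject the step and drop the extrapolation once.
            Wy, Hy = W.copy(), H.copy()
            continue
        # Extrapolate along the direction of the accepted update.
        Wy = np.maximum(0.0, W_new + beta * (W_new - W))
        Hy = np.maximum(0.0, H_new + beta * (H_new - H))
        W, H = W_new, H_new
        errors.append(err)
    return W, H, errors

X = np.random.default_rng(1).random((20, 15))
W, H, errors = nmf_extrapolated(X, rank=4)
print(errors[0], errors[-1])
```

Because rejected steps are discarded, the recorded errors never increase; how much the extrapolation helps over plain steps depends on `beta` and on the data.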

Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models

In recent years, deep neural network approaches have been widely adopted for machine learning tasks, including classification. However, they were shown to be vulnerable to adversarial perturbations: carefully crafted small perturbations can cause misclassification of legitimate images. We propose Defense-GAN, a new framework leveraging the expressive capability of generative models to defend deep neural networks against such attacks. Defense-GAN is trained to model the distribution of unperturbed images. At inference time, it finds a close output to a given image which does not contain the adversarial changes. This output is then fed to the classifier. Our proposed method can be used with any classification model and does not modify the classifier structure or training procedure. It can also be used as a defense against any attack as it does not assume knowledge of the process for generating the adversarial examples. We empirically show that Defense-GAN is consistently effective against different attack methods and improves on existing defense strategies. Our code has been made publicly available at https://…/defensegan.

Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures

Embedding methods which enforce a partial order or lattice structure over the concept space, such as Order Embeddings (OE) (Vendrov et al., 2016), are a natural way to model transitive relational data (e.g. entailment graphs). However, OE learns a deterministic knowledge base, limiting expressiveness of queries and the ability to use uncertainty for both prediction and learning (e.g. learning from expectations). Probabilistic extensions of OE (Lai and Hockenmaier, 2017) have provided the ability to somewhat calibrate these denotational probabilities while retaining the consistency and inductive bias of ordered models, but lack the ability to model the negative correlations found in real-world knowledge. In this work we show that a broad class of models that assign probability measures to OE can never capture negative correlation, which motivates our construction of a novel box lattice and accompanying probability measure to capture anticorrelation and even disjoint concepts, while still providing the benefits of probabilistic modeling, such as the ability to perform rich joint and conditional queries over arbitrary sets of concepts, and both learning from and predicting calibrated uncertainty. We show improvements over previous approaches in modeling the Flickr and WordNet entailment graphs, and investigate the power of the model.
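To make the box construction concrete, here is a minimal sketch, assuming each concept is an axis-aligned box inside the unit hypercube and that probability is the box volume under the uniform measure. The concept names, coordinates, and helper functions are hypothetical; the learning procedure from the paper is not shown.

```python
import numpy as np

def volume(lo, hi):
    """Volume of the box [lo, hi]; empty boxes get volume 0."""
    return float(np.prod(np.clip(hi - lo, 0.0, None)))

def intersect(box_a, box_b):
    """Intersection of two boxes, itself a (possibly empty) box."""
    lo = np.maximum(box_a[0], box_b[0])
    hi = np.minimum(box_a[1], box_b[1])
    return lo, hi

def conditional(box_a, box_b):
    """P(a | b) = vol(a ∩ b) / vol(b)."""
    return volume(*intersect(box_a, box_b)) / volume(*box_b)

# Hypothetical 2-D concepts: dog sits inside animal, plant is disjoint from both.
animal = (np.array([0.0, 0.0]), np.array([0.5, 1.0]))
dog = (np.array([0.0, 0.0]), np.array([0.25, 0.5]))
plant = (np.array([0.5, 0.0]), np.array([1.0, 1.0]))

p_animal_given_dog = conditional(animal, dog)    # containment: entailment
p_dog_given_animal = conditional(dog, animal)
p_dog_given_plant = conditional(dog, plant)      # disjoint boxes
print(p_animal_given_dog, p_dog_given_animal, p_dog_given_plant)
```

Here the marginal of `dog` is its volume, 0.125, while its probability given `plant` is 0. Conditioning on `plant` lowers the probability of `dog`, which is the kind of negative correlation the abstract says order-embedding measures cannot express.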

Birnbaum-Saunders Distribution: A Review of Models, Analysis and Applications

Birnbaum and Saunders introduced a two-parameter lifetime distribution to model fatigue life of a metal, subject to cyclic stress. Since then, extensive work has been done on this model providing different interpretations, constructions, generalizations, inferential methods, and extensions to bivariate, multivariate and matrix-variate cases. More than two hundred papers and one research monograph have already appeared describing all these aspects and developments. In this paper, we provide a detailed review of all these developments and at the same time indicate several open problems that could be considered for further research.

DNN or $k$-NN: That is the Generalize vs. Memorize Question

This paper studies the relationship between the classification performed by deep neural networks and the k-NN decision in the embedding space of these networks. This simple but important connection provides a better understanding of the relationship between the ability of neural networks to generalize and their tendency to memorize the training data, which are traditionally considered to contradict each other and are here shown to be compatible and complementary. Our results support the conjecture that deep neural networks approach Bayes optimal error rates.

Revisiting the tree edit distance and its backtracing: A tutorial

Almost 30 years ago, Zhang and Shasha published a seminal paper describing an efficient dynamic programming algorithm computing the tree edit distance, that is, the minimum number of node deletions, insertions, and replacements that are necessary to transform one tree into another. Since then, the tree edit distance has had widespread applications, for example in bioinformatics and intelligent tutoring systems. However, the original paper of Zhang and Shasha can be challenging to read for newcomers and it does not describe how to efficiently infer the optimal edit script. In this contribution, we provide a comprehensive tutorial to the tree edit distance algorithm of Zhang and Shasha. We further prove metric properties of the tree edit distance, and describe efficient algorithms to infer the cheapest edit script, as well as a summary of all cheapest edit scripts between two trees.
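The definition of the tree edit distance can be turned directly into a short recursive program. The sketch below is a plain memoized recursion over forests with unit costs, not the optimized dynamic program of Zhang and Shasha, and it computes only the distance, not an edit script. The tuple encoding `(label, children)` and the function name are assumptions made for illustration.

```python
from functools import lru_cache

def tree_edit_distance(t1, t2):
    """Unit-cost edit distance between two ordered, labeled trees.
    A tree is a tuple (label, children), where children is a tuple of trees."""

    @lru_cache(maxsize=None)
    def d(f, g):
        # f and g are forests: tuples of trees, processed from the right.
        if not f and not g:
            return 0
        if not g:
            _, kids = f[-1]
            return d(f[:-1] + kids, g) + 1            # delete rightmost root of f
        if not f:
            _, kids = g[-1]
            return d(f, g[:-1] + kids) + 1            # insert rightmost root of g
        (lv, kv), (lw, kw) = f[-1], g[-1]
        return min(
            d(f[:-1] + kv, g) + 1,                    # delete root v
            d(f, g[:-1] + kw) + 1,                    # insert root w
            d(kv, kw) + d(f[:-1], g[:-1]) + (lv != lw),  # map v to w
        )

    return d((t1,), (t2,))

# Hypothetical example: deleting node "b" turns x into y.
x = ("a", (("b", (("c", ()), ("d", ()))), ("e", ())))
y = ("a", (("c", ()), ("d", ()), ("e", ())))
print(tree_edit_distance(x, y))  # prints 1
```

Deleting a node promotes its children into its place, which is why the delete branch replaces the rightmost tree by its child forest. The recursion depth grows with tree size, so this sketch suits only small trees.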

Neural language representations predict outcomes of scientific research

Many research fields codify their findings in standard formats, often by reporting correlations between quantities of interest. But the space of all testable correlates is far larger than scientific resources can currently address, so the ability to accurately predict correlations would be useful to plan research and allocate resources. Using a dataset of approximately 170,000 correlational findings extracted from leading social science journals, we show that a trained neural network can accurately predict the reported correlations using only the text descriptions of the correlates. Accurate predictive models such as these can guide scientists towards promising untested correlates, better quantify the information gained from new findings, and have implications for moving artificial intelligence systems from predicting structures to predicting relationships in the real world.

Improving End-of-turn Detection in Spoken Dialogues by Detecting Speaker Intentions as a Secondary Task
Analogical Reasoning on Chinese Morphological and Semantic Relations
Convolutional Social Pooling for Vehicle Trajectory Prediction
R2-based hypervolume contribution approximation in multi-objective optimization
Emergence of Benford’s Law in Classical Music
Market Self-Learning of Signals, Impact and Optimal Trading: Invisible Hand Inference with Free Energy
The Crossing Number of Single-Pair-Seq-Shellable Drawings of Complete Graphs
Survival probability in Generalized Rosenzweig-Porter random matrix ensemble
Understanding Federation: An Analytical Framework for the Interoperability of Social Networking Sites
Semi-parametric Bayesian change-point model based on the Dirichlet process
A new convexity-based inequality, characterization of probability distributions and some free-of-distribution tests
QuaterNet: A Quaternion-based Recurrent Model for Human Motion
Valid and Approximately Valid Confidence Intervals for Current Status Data
Deconvolution of dust mixtures by latent Dirichlet allocation in forensic science
Utility maximization with proportional transaction costs under model uncertainty
QoE-Aware Beamforming Design for Massive MIMO Heterogeneous Networks
Remote Source Coding under Gaussian Noise : Dueling Roles of Power and Entropy Power
Composite Semantic Relation Classification
Semantic Relatedness for All (Languages): A Comparative Analysis of Multilingual Semantic Relatedness Using Machine Translation
Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising
Beyond 5G with UAVs: Foundations of a 3D Wireless Cellular Network
Modeling Naive Psychology of Characters in Simple Commonsense Stories
Are BLEU and Meaning Representation in Opposition?
Direct transcription methods based on fractional integral approximation formulas for solving nonlinear fractional optimal control problems
Dancing Pigs or Externalities? Measuring the Rationality of Security Decisions
Joint Classification and Prediction CNN Framework for Automatic Sleep Stage Classification
Defoiling Foiled Image Captions
Graph-Based Resource Allocation with Conflict Avoidance for V2V Broadcast Communications
A Deep Ensemble Model with Slot Alignment for Sequence-to-Sequence Natural Language Generation
Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples
Exponential Integrators with Parallel-in-Time Rational Approximations for Climate and Weather Simulations
Recurrent Neural Network for Learning Dense Depth and Ego-Motion from Video
Extensions of Ramanujan’s reciprocity theorem and the Andrews–Askey integral
DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images
NPE: Neural Personalized Embedding for Collaborative Filtering
Content-based Popularity Prediction of Online Petitions Using a Deep Regression Model
Identification of the source of an interferer by comparison with known carriers using a single satellite
Gauss summation and Ramanujan type series for $1/π$
Caching With Time-Varying Popularity Profiles: A Learning-Theoretic Perspective
Some inequalities for Garvan’s bicrank function of 2-colored partitions
On the edge Szeged index of unicyclic graphs with given diameter
ADMM and Accelerated ADMM as Continuous Dynamical Systems
Classification of Coxeter groups with finitely many elements of $\mathbf{a}$-value 2
Cooperative Limited Feedback Design for Massive Machine-Type Communications
$W^{2,p}$-solutions of parabolic SPDEs in general domains
Deep Reinforcement Learning for Network Slicing
Cross-Target Stance Classification with Self-Attention Networks
Leveraging Social Signal to Improve Item Recommendation for Matrix Factorization
Covariance-Insured Screening
ARUM: Polar Coded HARQ Scheme based on Incremental Channel Polarization
Convolutional Attention Networks for Multimodal Emotion Recognition from Speech and Text Data
A Formulation of Recursive Self-Improvement and Its Possible Efficiency
Antenna Switching Sequence Design for Channel Sounding in a Fast Time-varying Channel
Optimization of Transfer Learning for Sign Language Recognition Targeting Mobile Platform
Taxi demand forecasting: A HEDGE based tessellation strategy for improved accuracy
Generative networks as inverse problems with Scattering transforms
Implementation of True Random Number Generator based on Double-Scroll Attractor circuit with GST memristor emulator
Structure-preserving Guided Retinal Image Filtering and Its Application for Optic Disc Analysis
Analysis of Noise in Current Mirrors with memristive Device
UAV-Aided 5G Communications with Deep Reinforcement Learning Against Jamming
Fast Entropy Estimation for Natural Sequences
Widlar Current Mirror Design Using BJT-Memristor Circuits
How to Dimension 5G Network When Users Are Distributed on Roads Modeled by Poisson Line Process
Independent Component Analysis via Energy-based and Kernel-based Mutual Dependence Measures
Testing for Conditional Mean Independence with Covariates through Martingale Difference Divergence
Joint direct estimation of 3D geometry and 3D motion using spatio temporal gradients
Memristor-based Approximation of Gaussian Filter
Performance Analysis and Optimization of Cooperative Full-Duplex D2D Communication Underlaying Cellular Networks
Implementation of Memristor in Bessel filter with RLC components
Spontaneous synchronization and nonequilibrium statistical mechanics of coupled phase oscillators
Extrapolation in NLP
Day-ahead electricity price forecasting with high-dimensional structures: Univariate vs. multivariate modeling frameworks
Deep-learning Based Modeling of Fault Detachment Stability for Power Grid
Single Shot Active Learning using Pseudo Annotators
Evolutionary RL for Container Loading
Classifying medical relations in clinical text via convolutional neural networks
LQ-optimal Sample-data Control under Stochastic Delays: Gridding Approach for Stabilizability and Detectability
Realizing Wireless Communication through Software-defined HyperSurface Environments
Detecting cyber threats through social network analysis: short survey
Analyzing order flows in limit order books with ratios of Cox-type intensities
Happy family of stable marriages
A Note on Polynomial Identity Testing for Depth-3 Circuits
Hierarchical Beamforming: Resource Allocation, Fairness and Flow Level Performance
Disentangling $α$ and $β$ relaxation in orientationally disordered crystals with theory and experiments
Fuzzy Membership Function Implementation with Memristor
Dual parameterization of Weighted Coloring
Data-Driven Nonlinear Identification of Li-Ion Battery Based on a Frequency Domain Nonparametric Analysis
Super Ricci flows for weighted graphs
Systematic encoders for generalized Gabidulin codes and the $q$-analogue of Cauchy matrices
Brownian Motions on Metric Graphs with Non-Local Boundary Conditions I: Characterization
Bounds for the smallest $k$-chromatic graphs of given girth
High-dimensional doubly robust tests for regression parameters
Density for solutions to stochastic differential equations with unbounded drift
Fréchet differentiable drift dependence of Perron–Frobenius and Koopman operators for non-deterministic dynamics
Exploiting the Superposition Property of Wireless Communication for Max-Consensus Problems in Multi-Agent Systems
A Distributed Algorithm for Finding Hamiltonian Cycles in Random Graphs in O(log n) Time
Data-Driven Chance Constrained Optimization under Wasserstein Ambiguity Sets
On a probabilistic Nyman-Beurling criterion for the Riemann hypothesis
A Robust Background Initialization Algorithm with Superpixel Motion Detection
An experiment-oriented analysis of 2D spin-glass dynamics: a twelve time-decades scaling study
Minimum Margin Loss for Deep Face Recognition
Action Completion: A Temporal Model for Moment Detection
Interpolatron: Interpolation or Extrapolation Schemes to Accelerate Optimization for Deep Neural Networks
Circularly Pulse-Shaped Precoding for OFDM: A New Waveform and Its Optimization Design for 5G New Radio
Situation Assessment for Planning Lane Changes: Combining Recurrent Models and Prediction
Faster Rates for Convex-Concave Games
Adaptive Discrete Second Order Sliding Mode Control with Application to Nonlinear Automotive Systems
Dependability in a Multi-tenant Multi-framework Deep Learning as-a-Service Platform
Supplier Cooperation in Drone Delivery
Counting Gallai 3-colorings of complete graphs
Pattern Recognition on Oriented Matroids: Symmetric Cycles in the Hypercube Graphs. III
Recursive parameter estimation in a Riemannian manifold
Annotating Electronic Medical Records for Question Answering
External memory BWT and LCP computation for sequence collections with applications
Learning Time-Sensitive Strategies in Space Fortress
On two consequences of Berge-Fulkerson conjecture
Disparity Sliding Window: Object Proposals From Disparity Images
An extension of the Plancherel measure
Subspace Estimation from Incomplete Observations: A High-Dimensional Analysis
Deleting edges to restrict the size of an epidemic in temporal networks
A fast algorithm with minimax optimal guarantees for topic models with an unknown number of topics
Methods for the inclusion of real world evidence in network meta-analysis
RotDCF: Decomposition of Convolutional Filters for Rotation-Equivariant Deep Networks
Quantitative structure of stable sets in finite abelian groups
Edge-statistics on large graphs
Mixed integer linear programming: a new approach for instrumental variable quantile regressions and related problems
Answer Set Programming Modulo ‘Space-Time’
Design Identification of Curve Patterns on Cultural Heritage Objects: Combining Template Matching and CNN-based Re-Ranking
Resource allocation under uncertainty: an algebraic and qualitative treatment
Optimal Scheduling and Exact Response Time Analysis for Multistage Jobs
Coding for Interactive Communication with Small Memory and Applications to Robust Circuits
NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning
It’s all Relative: Monocular 3D Human Pose Estimation from Weakly Supervised Data
Changing Observations in Epistemic Temporal Logic