Modern machine learning techniques can be used to construct powerful models for difficult collider physics problems. In many applications, however, these models are trained on imperfect simulations due to a lack of truth-level information in the data, which risks the model learning artifacts of the simulation. In this paper, we introduce the paradigm of classification without labels (CWoLa) in which a classifier is trained to distinguish statistical mixtures of classes, which are common in collider physics. Crucially, neither individual labels nor class proportions are required, yet we prove that the optimal classifier in the CWoLa paradigm is also the optimal classifier in the traditional fully-supervised case where all label information is available. After demonstrating the power of this method in an analytical toy example, we consider a realistic benchmark for collider physics: distinguishing quark- versus gluon-initiated jets using mixed quark/gluon training samples. More generally, CWoLa can be applied to any classification problem where labels or class proportions are unknown or simulations are unreliable, but statistical mixtures of the classes are available.
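The central CWoLa claim can be illustrated with a minimal sketch, under assumptions that are not taken from the paper: a one-dimensional feature, signal drawn from N(+1, 1) and background from N(-1, 1), two mixed samples with signal fractions 0.8 and 0.2, and a hand-written logistic regression as the classifier. All function names are hypothetical. The classifier only ever sees which mixture an event came from, yet it is evaluated on how well it separates pure signal from pure background.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(n, signal_fraction):
    """Draw n events: signal ~ N(+1, 1) with the given probability, else background ~ N(-1, 1)."""
    is_signal = rng.random(n) < signal_fraction
    x = np.where(is_signal, rng.normal(1.0, 1.0, n), rng.normal(-1.0, 1.0, n))
    return x, is_signal

def fit_logistic(x, y, lr=0.1, steps=2000):
    """Fit a 1D logistic regression by plain gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        w -= lr * np.mean((p - y) * x)
        b -= lr * np.mean(p - y)
    return w, b

def auc(scores_pos, scores_neg):
    """Rank-based (Mann-Whitney) estimate of the area under the ROC curve."""
    scores = np.concatenate([scores_pos, scores_neg])
    ranks = scores.argsort().argsort() + 1
    n_pos, n_neg = len(scores_pos), len(scores_neg)
    return (ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Training uses only mixture membership (1 = mixture A, 0 = mixture B), never truth labels.
x_a, _ = sample_mixture(20000, 0.8)   # assumed signal fraction of mixture A
x_b, _ = sample_mixture(20000, 0.2)   # assumed signal fraction of mixture B
x_train = np.concatenate([x_a, x_b])
y_train = np.concatenate([np.ones_like(x_a), np.zeros_like(x_b)])
w, b = fit_logistic(x_train, y_train)

# Evaluation uses truth labels, which a simulation-free analysis would not have.
x_test, is_sig = sample_mixture(20000, 0.5)
scores = w * x_test + b
pure_auc = auc(scores[is_sig], scores[~is_sig])
print(f"weight = {w:.3f}, AUC on pure signal vs background = {pure_auc:.3f}")
```

In this toy setting the mixture-trained classifier should separate the pure classes about as well as a fully supervised one, because any monotonically increasing function of the feature yields the same ROC curve. The signal fractions are used only to generate the data and are never passed to the training step.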
In this paper, we use a variational recurrent neural network to investigate anomaly detection on graph time series. Temporal correlation is modeled by combining a recurrent neural network (RNN) with variational inference (VI), while spatial information is captured by a graph convolutional network. To incorporate external factors, we use a feature extractor to augment the transition of the latent variables, which lets the model learn the influence of those factors. Because the objective function is the accumulated ELBO, the model extends easily to an online method. An experimental study on traffic flow data demonstrates the detection capability of the proposed method.
It is well known that adding noise to the input data often improves network performance. Dropout can cause memory loss when it is applied to recurrent connections. Tikhonov regularization, which can be regarded as training with additive noise, avoids this issue naturally, but it requires deriving a regularizer for each architecture. For feedforward neural networks this is straightforward, while for networks with recurrent connections and complicated layers it leads to some difficulties. In this paper, a Tikhonov regularizer is derived for Long Short-Term Memory (LSTM) networks. Although it is independent of time for simplicity, it accounts for the interaction between the weights of the LSTM unit, which in theory makes it possible to regularize a unit with complicated dependencies using only one parameter that measures the input data perturbation. The proposed regularizer has three parameters: one to control the regularization process, and the other two to maintain computational stability while the network is being trained. The theory developed in this paper can be applied to obtain such regularizers for other recurrent neural networks with Hadamard products and Lipschitz continuous functions.
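The link between additive input noise and Tikhonov regularization can be checked numerically in the simplest case. The sketch below is not the LSTM regularizer derived in the paper. It assumes a linear model with squared loss, for which the expected loss under Gaussian input noise of standard deviation sigma equals the clean loss plus sigma^2 times the squared norm of the weights. The data, weights, and sigma are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 50, 3, 0.1             # sample count, input dimension, noise scale (all assumed)

X = rng.normal(size=(n, d))
w = np.array([1.5, -2.0, 0.5])       # weights of the model being evaluated
# Targets come from different weights plus noise, so the clean residual is non-zero.
y = X @ np.array([1.0, -1.0, 0.0]) + 0.3 * rng.normal(size=n)

# Clean squared loss and its Tikhonov-regularized counterpart.
clean_loss = np.mean((X @ w - y) ** 2)
tikhonov_loss = clean_loss + sigma**2 * np.sum(w**2)

# Monte Carlo estimate of the expected loss when the inputs are perturbed.
m = 2000                              # number of independent noise draws
noise = sigma * rng.normal(size=(m, n, d))
noisy_loss = np.mean(((X + noise) @ w - y) ** 2)

print(f"clean loss          = {clean_loss:.4f}")
print(f"clean + sigma^2|w|^2 = {tikhonov_loss:.4f}")
print(f"noisy-input loss     = {noisy_loss:.4f}")
```

The last two printed values should agree up to Monte Carlo error. For nonlinear or recurrent models the equivalence holds only approximately and the penalty takes a different form, which is the reason a separate derivation per architecture is needed.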
Statistical analysis (SA) is a complex process of deducing population properties from data. It usually takes a well-trained analyst to perform SA successfully, and applying SA to big data applications becomes extremely challenging. We propose to use deep neural networks to automate the SA process. In particular, we propose to construct convolutional neural networks (CNNs) to perform automatic model selection and parameter estimation, two of the most important SA tasks. We refer to the resulting CNNs as the neural model selector and the neural model estimator, respectively; both can be properly trained using labeled data systematically generated from candidate models. A simulation study shows that both the selector and the estimator demonstrate excellent performance. The idea and proposed framework can be further extended to automate the entire SA process and have the potential to revolutionize how SA is performed in big data analytics.
Can ideas and techniques from machine learning be leveraged to automatically generate ‘good’ routing configurations? We investigate the power of data-driven routing protocols. Our results suggest that applying ideas and techniques from deep reinforcement learning to this context yields high performance, motivating further research along these lines.
Simple random walks are a staple of probability theory and form the building blocks of many useful and complex stochastic processes. In this paper we study a natural generalization of the random walk to a process in which the allowed step sizes take values in the set $\{\pm1,\pm2,\ldots,\pm k\}$, a process we call a random leap. The need to analyze such models arises naturally in modern-day data science and so-called ‘big data’ applications. We provide closed-form expressions for quantities associated with first passage times and absorption events of random leaps. These expressions are formulated in terms of the roots of the characteristic polynomial of a certain recurrence relation associated with the transition probabilities. Our analysis shows that the expressions for absorption probabilities for the classical simple random walk are a special case of an elegant universal result. We also consider an important variant of a random leap: the reflecting random leap. We demonstrate that the reflecting random leap exhibits more interesting behavior with regard to the existence and properties of a stationary distribution. Questions relating to recurrence and transience are also addressed, as well as an application of the random leap.
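Absorption probabilities of a random leap can be explored numerically with a short sketch. It does not use the paper's closed-form expressions in terms of characteristic roots. Instead it solves the first-step linear system directly, under assumptions made here for illustration: the walk lives on the integers, it is absorbed once it reaches 0 or below, or N or above, and each step size is taken with equal probability. The function name and the step-probability dictionary are hypothetical.

```python
import numpy as np

def upper_absorption_probs(N, step_probs):
    """Return h where h[i-1] = P(exit at >= N before reaching <= 0 | start at i), for i = 1..N-1.

    step_probs maps each non-zero integer step size to its probability.
    First-step analysis gives h(i) = sum_s p_s * h(i + s), with h = 0 at or below 0
    and h = 1 at or above N.
    """
    A = np.eye(N - 1)
    rhs = np.zeros(N - 1)
    for i in range(1, N):
        for step, p in step_probs.items():
            j = i + step
            if j >= N:
                rhs[i - 1] += p          # lands in the upper absorbing region
            elif j >= 1:
                A[i - 1, j - 1] -= p     # lands on another interior state
            # j <= 0: absorbed at the lower boundary, contributes zero
    return np.linalg.solve(A, rhs)

# k = 1 recovers the classical symmetric simple random walk, where h(i) = i / N.
h_simple = upper_absorption_probs(5, {+1: 0.5, -1: 0.5})

# k = 2 with uniform step probabilities (an assumption, not the paper's general setting).
h_leap = upper_absorption_probs(10, {s: 0.25 for s in (-2, -1, 1, 2)})

print("simple walk, N = 5:", np.round(h_simple, 4))
print("random leap, N = 10:", np.round(h_leap, 4))
```

For the symmetric leap, reflecting the start position about N/2 swaps the two boundaries, so the probabilities at i and N - i sum to one. This gives a quick sanity check on the solver.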
Genetic Programming, a kind of evolutionary computation and machine learning algorithm, is shown to benefit significantly from the application of vectorized data and the TensorFlow numerical computation library on both CPU and GPU architectures. The open-source Python package Karoo GP is employed for a series of 190 tests across 6 platforms, with real-world datasets ranging from 18 to 5.5M data points. This body of tests demonstrates that datasets measured in tens and hundreds of data points see a 2-15x improvement when moving from the scalar/SymPy configuration to the vector/TensorFlow configuration, with a single core performing on par with or better than multiple CPU cores and GPUs. A dataset composed of 90,000 data points demonstrates a single vector/TensorFlow CPU core performing 875x better than 40 scalar/SymPy CPU cores. A dataset containing 5.5M data points sees GPU configurations outperforming CPU configurations by 1.3x on average.