The exponential growth in the number of complex datasets every year requires more enhancement in machine learning methods to provide robust and accurate data classification. Lately, deep learning approaches have achieved surpassing results in comparison to previous machine learning algorithms. However, finding the suitable structure for these models has been a challenge for researchers. This paper introduces Random Multimodel Deep Learning (RMDL): a new ensemble, deep learning approach for classification. RMDL solves the problem of finding the best deep learning structure and architecture while simultaneously improving robustness and accuracy through ensembles of deep learning architectures. In short, RMDL trains multiple randomly generated models of Deep Neural Network (DNN), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) in parallel and combines their results to produce better result of any of those models individually. In this paper, we describe RMDL model and compare the results for image and text classification as well as face recognition. We used MNIST and CIFAR-10 datasets as ground truth datasets for image classification and WOS, Reuters, IMDB, and 20newsgroup datasets for text classification. Lastly, we used ORL dataset to compare the model performance on face recognition task.
We introduce Universum learning for multiclass problems and propose a novel formulation for multiclass universum SVM (MU-SVM). We also propose an analytic span bound for model selection with almost 2-4x faster computation times than standard resampling techniques. We empirically demonstrate the efficacy of the proposed MUSVM formulation on several real world datasets achieving > 20% improvement in test accuracies compared to multi-class SVM.
Language style transferring rephrases text with specific stylistic attributes while preserving the original attribute-independent content. One main challenge in learning a style transfer system is a lack of parallel data where the source sentence is in one style and the target sentence in another style. With this constraint, in this paper, we adapt unsupervised machine translation methods for the task of automatic style transfer. We first take advantage of style-preference information and word embedding similarity to produce pseudo-parallel data with a statistical machine translation (SMT) framework. Then the iterative back-translation approach is employed to jointly train two neural machine translation (NMT) based transfer systems. To control the noise generated during joint training, a style classifier is introduced to guarantee the accuracy of style transfer and penalize bad candidates in the generated pseudo data. Experiments on benchmark datasets show that our proposed method outperforms previous state-of-the-art models in terms of both accuracy of style transfer and quality of input-output correspondence.
Reinforcement learning approaches have long appealed to the data management community due to their ability to learn to control dynamic behavior from raw system performance. Recent successes in combining deep neural networks with reinforcement learning have sparked significant new interest in this domain. However, practical solutions remain elusive due to large training data requirements, algorithmic instability, and lack of standard tools. In this work, we introduce LIFT, an end-to-end software stack for applying deep reinforcement learning to data management tasks. While prior work has frequently explored applications in simulations, LIFT centers on utilizing human expertise to learn from demonstrations, thus lowering online training times. We further introduce TensorForce, a TensorFlow library for applied deep reinforcement learning exposing a unified declarative interface to common RL algorithms, thus providing a backend to LIFT. We demonstrate the utility of LIFT in two case studies in database compound indexing and resource management in stream processing. Results show LIFT controllers initialized from demonstrations can outperform human baselines and heuristics across latency metrics and space usage by up to 70%.
This article proposes a new approach to modeling high-dimensional time series data by providing a simple and natural way to understand the mechanism of factor models. We treat a $p$-dimensional time series as a nonsingular linear transformation of certain common factors and structured idiosyncratic components. Unlike the approximate factor models, we allow the largest eigenvalues of the covariance matrix of the idiosyncratic components to diverge as the dimension $p$ increases, which is reasonable in the high-dimensional setting. A white noise testing procedure for high-dimensional random vectors is proposed to determine the number of common factors under the assumption that the idiosyncratic term is a vector white noise. We also introduce a projected Principal Component Analysis (PCA) to eliminate the diverging effect of the noises. Asymptotic properties of the proposed method are established for both fixed $p$ and diverging $p$ as the sample size $n$ tends to infinity. Both simulated and real examples are used to assess the performance of the proposed method. We also compare our method with two commonly used methods in the literature and find that the proposed approach not only provides interpretable results, but also performs well in out-of-sample forecasting.
The ability to conduct logical reasoning is a fundamental aspect of intelligent behavior, and thus an important problem along the way to human-level artificial intelligence. Traditionally, symbolic methods from the field of knowledge representation and reasoning have been used to equip agents with capabilities that resemble human reasoning qualities. More recently, however, there has been an increasing interest in applying alternative approaches based on machine learning rather than logic-based formalisms to tackle this kind of tasks. Here, we make use of state-of-the-art methods for training deep neural networks to devise a novel model that is closely coupled to symbolic reasoning methods, and thus able to learn how to effectively perform basic ontology reasoning. This term describes an important and at the same time very natural kind of problem settings where the rules for conducting reasoning are specified alongside with the actual information. Many problems in practice may be viewed as such reasoning tasks, which is why the presented approach is applicable to a plethora of important real-world problems. To demonstrate the effectiveness of the suggested method, we present the outcomes of several experiments that have been conducted on both toy datasets as well as real-world data, which show that our model learned to perform precise reasoning on a number of diverse inference tasks that require comprehensive deductive proficiencies. Furthermore, it turned out that the suggested model suffers much less from different obstacles that prohibit symbolic reasoning.
In this theoretical note we compare different types of computational models of word similarity and association in their ability to predict a set of about 900 rating data. Using regression and predictive modeling tools (neural net, decision tree) the performance of a total of 28 models using different combinations of both surface and semantic word features is evaluated. The results present evidence for the hypothesis that word similarity ratings are based on more than only semantic relatedness. The limited cross-validated performance of the models asks for the development of psychological process models of the word similarity rating task.
Existing relation classification methods that rely on distant supervision assume that a bag of sentences mentioning an entity pair are all describing a relation for the entity pair. Such methods, performing classification at the bag level, cannot identify the mapping between a relation and a sentence, and largely suffers from the noisy labeling problem. In this paper, we propose a novel model for relation classification at the sentence level from noisy data. The model has two modules: an instance selector and a relation classifier. The instance selector chooses high-quality sentences with reinforcement learning and feeds the selected sentences into the relation classifier, and the relation classifier makes sentence level prediction and provides rewards to the instance selector. The two modules are trained jointly to optimize the instance selection and relation classification processes. Experiment results show that our model can deal with the noise of data effectively and obtains better performance for relation classification at the sentence level.
Multi-task clustering (MTC) has attracted a lot of research attentions in machine learning due to its ability in utilizing the relationship among different tasks. Despite the success of traditional MTC models, they are either easy to stuck into local optima, or sensitive to outliers and noisy data. To alleviate these problems, we propose a novel self-paced multi-task clustering (SPMTC) paradigm. In detail, SPMTC progressively selects data examples to train a series of MTC models with increasing complexity, thus highly decreases the risk of trapping into poor local optima. Furthermore, to reduce the negative influence of outliers and noisy data, we design a soft version of SPMTC to further improve the clustering performance. The corresponding SPMTC framework can be easily solved by an alternating optimization method. The proposed model is guaranteed to converge and experiments on real data sets have demonstrated its promising results compared with state-of-the-art multi-task clustering methods.
Dropout is used to avoid overfitting by randomly dropping units from the neural networks during training. Inspired by dropout, this paper presents GI-Dropout, a novel dropout method integrating with global information to improve neural networks for text classification. Unlike the traditional dropout method in which the units are dropped randomly according to the same probability, we aim to use explicit instructions based on global information of the dataset to guide the training process. With GI-Dropout, the model is supposed to pay more attention to inapparent features or patterns. Experiments demonstrate the effectiveness of the dropout with global information on seven text classification tasks, including sentiment analysis and topic classification.
This work compares user collaboration with conversational personal assistants vs. teams of expert chatbots. Two studies were performed to investigate whether each approach affects accomplishment of tasks and collaboration costs. Participants interacted with two equivalent financial advice chatbot systems, one composed of a single conversational adviser and the other based on a team of four experts chatbots. Results indicated that users had different forms of experiences but were equally able to achieve their goals. Contrary to the expected, there were evidences that in the teamwork situation that users were more able to predict agent behavior better and did not have an overhead to maintain common ground, indicating similar collaboration costs. The results point towards the feasibility of either of the two approaches for user collaboration with conversational agents.
Kearns et al. [2018] recently proposed a notion of rich subgroup fairness intended to bridge the gap between statistical and individual notions of fairness. Rich subgroup fairness picks a statistical fairness constraint (say, equalizing false positive rates across protected groups), but then asks that this constraint hold over an exponentially or infinitely large collection of subgroups defined by a class of functions with bounded VC dimension. They give an algorithm guaranteed to learn subject to this constraint, under the condition that it has access to oracles for perfectly learning absent a fairness constraint. In this paper, we undertake an extensive empirical evaluation of the algorithm of Kearns et al. On four real datasets for which fairness is a concern, we investigate the basic convergence of the algorithm when instantiated with fast heuristics in place of learning oracles, measure the tradeoffs between fairness and accuracy, and compare this approach with the recent algorithm of Agarwal et al. [2018], which implements weaker and more traditional marginal fairness constraints defined by individual protected attributes. We find that in general, the Kearns et al. algorithm converges quickly, large gains in fairness can be obtained with mild costs to accuracy, and that optimizing accuracy subject only to marginal fairness leads to classifiers with substantial subgroup unfairness. We also provide a number of analyses and visualizations of the dynamics and behavior of the Kearns et al. algorithm. Overall we find this algorithm to be effective on real data, and rich subgroup fairness to be a viable notion in practice.
Spiking neural networks are motivated from principles of neural systems and may possess unexplored advantages in the context of machine learning. A class of \textit{convolutional spiking neural networks} is introduced, trained to detect image features with an unsupervised, competitive learning mechanism. Image features can be shared within subpopulations of neurons, or each may evolve independently to capture different features in different regions of input space. We analyze the time and memory requirements of learning with and operating such networks. The MNIST dataset is used as an experimental testbed, and comparisons are made between the performance and convergence speed of a baseline spiking neural network.
The digitalization of automation engineering generates large quantities of engineering data that is interlinked in knowledge graphs. Classifying and clustering subgraphs according to their functionality is useful to discover functionally equivalent engineering artifacts that exhibit different graph structures. This paper presents a new graph learning algorithm designed to classify engineering data artifacts — represented in the form of graphs — according to their structure and neighborhood features. Our Structural Graph Convolutional Neural Network (SGCNN) is capable of learning graphs and subgraphs with a novel graph invariant convolution kernel and downsampling/pooling algorithm. On a realistic engineering-related dataset, we show that SGCNN is capable of achieving ~91% classification accuracy.