A Siamese Deep Forest

A Siamese Deep Forest (SDF) is proposed in the paper. It is based on the Deep Forest or gcForest proposed by Zhou and Feng and can be viewed as a gcForest modification. It can also be regarded as an alternative to the well-known Siamese neural networks. The SDF uses a modified training set consisting of concatenated pairs of vectors. Moreover, it defines the class distributions in the deep forest as the weighted sum of the tree class probabilities, with the weights chosen to reduce distances between similar pairs and to increase them between dissimilar pairs. We show that the weights can be obtained by solving a quadratic optimization problem. The SDF aims to prevent the overfitting that takes place in neural networks when only limited training data are available. The numerical experiments illustrate the proposed distance metric method.

Artificial Intelligence Based Malware Analysis

Artificial intelligence methods have often been applied to perform specific functions or tasks in the cyber-defense realm. However, as adversary methods become more complex and difficult to divine, piecemeal efforts to understand cyber-attacks, and malware-based attacks in particular, are not providing sufficient means for malware analysts to understand the past, present and future characteristics of malware. In this paper, we present the Malware Analysis and Attribution using Genetic Information (MAAGI) system. The underlying idea behind the MAAGI system is that there are strong similarities between malware behavior and biological organism behavior, and applying biologically inspired methods to corpora of malware can help analysts better understand the ecosystem of malware attacks. Due to the sophistication of the malware and the analysis, the MAAGI system relies heavily on artificial intelligence techniques to provide this capability. It has already yielded promising results over its development life, and will hopefully inspire more integration between the artificial intelligence and cyber-defense communities.

Learning a Neural Semantic Parser from User Feedback

We present an approach to rapidly and easily build natural language interfaces to databases for new domains, whose performance improves over time based on user feedback, and requires minimal intervention. To achieve this, we adapt neural sequence models to map utterances directly to SQL with its full expressivity, bypassing any intermediate meaning representations. These models are immediately deployed online to solicit feedback from real users to flag incorrect queries. Finally, the popularity of SQL facilitates gathering annotations for incorrect predictions using the crowd, which is directly used to improve our models. This complete feedback loop, without intermediate representations or database specific engineering, opens up new ways of building high quality semantic parsers. Experiments suggest that this approach can be deployed quickly for any new target domain, as we show by learning a semantic parser for an online academic database from scratch.

DeepArchitect: Automatically Designing and Training Deep Architectures

In deep learning, performance is strongly affected by the choice of architecture and hyperparameters. While there has been extensive work on automatic hyperparameter optimization for simple spaces, complex spaces such as the space of deep architectures remain largely unexplored. As a result, the choice of architecture is done manually by the human expert through a slow trial and error process guided mainly by intuition. In this paper we describe a framework for automatically designing and training deep models. We propose an extensible and modular language that allows the human expert to compactly represent complex search spaces over architectures and their hyperparameters. The resulting search spaces are tree-structured and therefore easy to traverse. Models can be automatically compiled to computational graphs once values for all hyperparameters have been chosen. We can leverage the structure of the search space to introduce different model search algorithms, such as random search, Monte Carlo tree search (MCTS), and sequential model-based optimization (SMBO). We present experiments comparing the different algorithms on CIFAR-10 and show that MCTS and SMBO outperform random search. In addition, these experiments show that our framework can be used effectively for model discovery, as it is possible to describe expressive search spaces and discover competitive models without much effort from the human expert. Code for our framework and experiments has been made publicly available.
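
The idea of a tree-structured search space explored by random search can be sketched as follows. This is a minimal illustration, not the framework's actual language or API: the nested-dictionary encoding, the names `SPACE`, `sample`, `random_search`, and the toy scoring function are all hypothetical. Each hyperparameter maps its possible values to a subtree of further hyperparameters that only exist once that value is chosen.

```python
import random

# Hypothetical search space: {hyperparameter: {value: subtree}}.
# Choosing "conv" opens "filters" and "kernel"; choosing "dense" opens "units".
SPACE = {
    "block": {
        "conv": {"filters": {32: {}, 64: {}}, "kernel": {3: {}, 5: {}}},
        "dense": {"units": {128: {}, 256: {}}},
    },
    "dropout": {0.0: {}, 0.5: {}},
}


def sample(space, rng):
    """Walk the tree top-down, picking one value per hyperparameter."""
    config = {}
    for name, options in space.items():
        value = rng.choice(sorted(options, key=str))
        config[name] = value
        # Recurse into the hyperparameters unlocked by this choice.
        config.update(sample(options[value], rng))
    return config


def random_search(space, evaluate, trials, seed=0):
    """Return the (score, config) pair with the highest score."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        config = sample(space, rng)
        score = evaluate(config)
        if best is None or score > best[0]:
            best = (score, config)
    return best


def toy_score(config):
    """Stand-in for training and validating a model (assumption)."""
    return config.get("filters", 0) - 100 * config["dropout"]


print(random_search(SPACE, toy_score, trials=20))
```

In a real setting `toy_score` would be replaced by compiling the sampled configuration into a model and measuring validation performance; MCTS or SMBO would replace the uniform `rng.choice` with a guided choice at each node.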

Deep Feature Learning for Graphs

This paper presents a general graph representation learning framework called DeepGL for learning deep node and edge representations from large (attributed) graphs. In particular, DeepGL begins by deriving a set of base features (e.g., graphlet features) and automatically learns a multi-layered hierarchical graph representation where each successive layer leverages the output from the previous layer to learn features of a higher order. Contrary to previous work, DeepGL learns relational functions (each representing a feature) that generalize across networks and are therefore useful for graph-based transfer learning tasks. Moreover, DeepGL naturally supports attributed graphs, learns interpretable features, and is space-efficient (by learning sparse feature vectors). In addition, DeepGL is expressive, flexible with many interchangeable components, efficient with a time complexity of $\mathcal{O}(|E|)$, and scalable for large networks via an efficient parallel implementation. Compared with the state-of-the-art method, DeepGL is (1) effective for across-network transfer learning tasks and attributed graph representation learning, (2) space-efficient, requiring up to 6x less memory, (3) fast, with up to 182x speedup in runtime performance, and (4) accurate, with an average improvement of 20% or more on many learning tasks.

Parseval Networks: Improving Robustness to Adversarial Examples

We introduce Parseval networks, a form of deep neural networks in which the Lipschitz constant of linear, convolutional and aggregation layers is constrained to be smaller than 1. Parseval networks are empirically and theoretically motivated by an analysis of the robustness of the predictions made by deep neural networks when their input is subject to an adversarial perturbation. The most important feature of Parseval networks is that the weight matrices of linear and convolutional layers are maintained as (approximately) Parseval tight frames, which are extensions of orthogonal matrices to non-square matrices. We describe how these constraints can be maintained efficiently during SGD. We show that Parseval networks match the state-of-the-art in terms of accuracy on CIFAR-10/100 and Street View House Numbers (SVHN) while being more robust than their vanilla counterpart against adversarial examples. Incidentally, Parseval networks also tend to train faster and make better use of the full capacity of the networks.
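
As a rough illustration of how a weight matrix might be kept close to a Parseval tight frame, the sketch below applies an update of the form W ← (1 + β)W − βWWᵀW, which pulls the singular values of W toward 1 so that WᵀW approaches the identity. The step size `beta = 0.5`, the toy 3×2 matrix, and the helper names are assumptions chosen for illustration; the abstract does not specify them, and in training a much smaller step would typically be interleaved with the gradient updates.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def gram_error(w):
    """Largest absolute entry of W^T W - I (zero for an exact tight frame)."""
    g = matmul(transpose(w), w)
    n = len(g)
    return max(abs(g[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


def parseval_step(w, beta):
    """One retraction step: W <- (1 + beta) * W - beta * W W^T W."""
    wwtw = matmul(matmul(w, transpose(w)), w)
    return [[(1 + beta) * w[i][j] - beta * wwtw[i][j]
             for j in range(len(w[0]))] for i in range(len(w))]


# Toy non-square weight matrix whose columns are only roughly orthonormal.
w = [[0.9, 0.1], [0.0, 1.1], [0.2, -0.1]]
print("before:", gram_error(w))
for _ in range(8):
    w = parseval_step(w, beta=0.5)
print("after: ", gram_error(w))
```

Each singular value s of W is mapped to (1 + β)s − βs³ by one step, so values that start near 1 move closer to 1 while the singular vectors are left unchanged.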

On weight initialization in deep neural networks

A proper initialization of the weights in a neural network is critical to its convergence. Current insights into weight initialization come primarily from linear activation functions. In this paper, I develop a theory for weight initializations with non-linear activations. First, I derive a general weight initialization strategy for any neural network using activation functions differentiable at 0. Next, I derive the weight initialization strategy for the Rectified Linear Unit (RELU), and provide theoretical insights into why the Xavier initialization is a poor choice with RELU activations. My analysis provides a clear demonstration of the role of non-linearities in determining the proper weight initializations.
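
The effect of the initialization scale on a ReLU network can be checked numerically. The sketch below pushes a random input through a stack of fully connected ReLU layers and reports the mean squared activation at the output. It compares a Xavier-style weight variance of 1/n with a variance of 2/n (the well-known He-style scaling, used here as an assumption; the abstract does not state the constant derived in the paper). The function name, width, depth and seed are hypothetical.

```python
import random


def relu_forward_mean_square(width, depth, weight_var, seed=0):
    """Mean squared activation after `depth` random ReLU layers of size `width`."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    std = weight_var ** 0.5
    for _ in range(depth):
        # Each output unit is ReLU(sum_j w_j * x_j) with i.i.d. Gaussian weights.
        x = [max(0.0, sum(rng.gauss(0.0, std) * xi for xi in x))
             for _ in range(width)]
    return sum(v * v for v in x) / width


width, depth = 64, 20
xavier = relu_forward_mean_square(width, depth, weight_var=1.0 / width)
he_style = relu_forward_mean_square(width, depth, weight_var=2.0 / width)
print("variance 1/n:", xavier)
print("variance 2/n:", he_style)
```

Because ReLU zeroes roughly half of the pre-activations, a weight variance of 1/n halves the signal energy at every layer on average, so the activations shrink geometrically with depth; doubling the variance compensates for this. Both runs use the same seed, so they differ only in the weight scale.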

Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages

We present SuperPivot, an analysis method for low-resource languages that occur in a superparallel corpus, i.e., in a corpus that contains an order of magnitude more languages than parallel corpora currently in use. We show that SuperPivot performs well for the crosslingual analysis of the linguistic phenomenon of tense. We produce analysis results for more than 1000 languages, conducting – to the best of our knowledge – the largest crosslingual computational study performed to date. We extend existing methodology for leveraging parallel corpora for typological analysis by overcoming a limiting assumption of earlier work: We only require that a linguistic feature is overtly marked in a few of thousands of languages as opposed to requiring that it be marked in all languages under investigation.

Intelligent Personal Assistant with Knowledge Navigation

An Intelligent Personal Agent (IPA) is an agent whose purpose is to help the user gain information from reliable resources using knowledge navigation techniques, saving the time needed to search for the best content. The agent is also responsible for responding to chat-based queries with the help of a Conversation Corpus. We will be testing different methods for optimal query generation. To facilitate ease of use, the agent will be able to accept input through Text (Keyboard), Voice (Speech Recognition) and Server (Facebook) and output responses using the same method. Existing chat bots reply by making changes to the input, but we will give responses based on multiple SRT files. The model will learn from the human dialogs dataset and will be able to respond in a human-like manner. Responses to queries about famous things (places, people, and words) can be provided using web scraping, which will give the bot knowledge navigation features. The agent will even learn from its past experiences, supporting semi-supervised learning.

Substochastic Monte Carlo Algorithms

In this paper we introduce and formalize Substochastic Monte Carlo (SSMC) algorithms. These algorithms, originally intended to be a better classical foil to quantum annealing than simulated annealing, prove to be worthy optimization algorithms in their own right. In SSMC, a population of walkers is initialized according to a known distribution on an arbitrary search space and varied into the solution of some optimization problem of interest. The first argument of this paper shows how an existing classical algorithm, ‘Go-With-The-Winners’ (GWW), is a limiting case of SSMC when restricted to binary search and particular driving dynamics. Although limiting to GWW, SSMC is more general. We show that (1) GWW can be efficiently simulated within the SSMC framework, (2) SSMC can be exponentially faster than GWW, (3) by naturally incorporating structural information, SSMC can exponentially outperform the quantum algorithm that first inspired it, and (4) SSMC exhibits desirable search features in general spaces. Our approach combines ideas from genetic algorithms (GWW), theoretical probability (Fleming-Viot processes), and quantum computing. Not only do we demonstrate that SSMC is often more efficient than competing algorithms, but we also hope that our results connecting these disciplines will impact each independently. An implemented version of SSMC has previously enjoyed some success as a competitive optimization algorithm for Max-k-SAT.

Self-organized critical behavior in Ising spin glasses

A universal tree balancing theorem

Finding the Size of a Radio Network with Short Labels

The phase transition in bounded-size Achlioptas processes

Action Understanding with Multiple Classes of Actors

Structured Sparse Modelling with Hierarchical GP

Calibration of a two-state pitch-wise HMM method for note segmentation in Automatic Music Transcription systems

Splittability and 1-amalgamability of permutation classes

Conserved quantities of Q-systems from dimer integrable systems

Portfolio-driven Resource Management for Transient Cloud Servers

Signed graphs: from modulo flows to integer-valued flows

Improving Facial Attribute Prediction using Semantic Segmentation

Efficient Feature Screening for Lasso-Type Problems via Hybrid Safe-Strong Rules

Bifurcation Mechanism Design — From Optimal Flat Taxes to Improved Cancer Treatments

A Network Perspective on Stratification of Multi-Label Data

Obstacle Avoidance through Deep Networks based Intermediate Perception

Computational complexity of the initial value problem for the three body problem

GazeDirector: Fully Articulated Eye Gaze Redirection in Video

Data Based Identification and Prediction of Nonlinear and Complex Dynamical Systems

Prediction of Daytime Hypoglycemic Events Using Continuous Glucose Monitoring Data and Classification Technique

Strong Coordination over Noisy Channels: Is Separation Sufficient?

Deep Face Deblurring

Genealogical Distance as a Diversity Estimate in Evolutionary Algorithms

Partially Occluded Leaf Recognition via Beta-Spline Curve Matching and Energy Minimization

Learning Quadratic Variance Function (QVF) DAG models via OverDispersion Scoring (ODS)

One-Dimensional Packing: Maximality Implies Rationality

Mapping Instructions and Visual Observations to Actions with Reinforcement Learning

Generating Simple Near-Bipartite Bricks

Risk Stratification of Lung Nodules Using 3D CNN-Based Multi-task Learning

Word Affect Intensities

The spectral symmetry of weakly irreducible nonnegative tensors and connected hypergraphs

Neural Ranking Models with Weak Supervision

Performance Assessment of High-dimensional Variable Identification

Automatic Real-time Background Cut for Portrait Videos

Generator polynomials and generator matrix for quasi cyclic codes

A Tribe Competition-Based Genetic Algorithm for Feature Selection in Pattern Classification

Disorder-protected topological entropy after a quantum quench

Active Collaborative Ensemble Tracking

AKS method: a new image compression by gradient Haar wavelet

Spectral-Efficient Analog Precoding for Generalized Spatial Modulation Aided MmWave MIMO

Generalized Spatial Modulation Aided MmWave MIMO with Sub-Connected Hybrid Precoding Scheme

Classical Widely Linear Estimation of Real Valued Parameter Vectors in Complex Valued Environments

On partitioning the edges of an infinite digraph into directed cycles

Outline Colorization through Tandem Adversarial Networks

Relaxing the Irrevocability Requirement for Online Graph Algorithms

On consecutive pattern-avoiding permutations of length 4, 5 and beyond

Image reconstruction by domain transform manifold learning

The speed of biased random walk among random conductances

The right tool for the right question — beyond the encoding versus decoding dichotomy

On the 1-factorizations of Middle Level Graph: Inner structure, Algorithm, and Application

Learning Spatiotemporal-Aware Representation for POI Recommendation

Multi-antenna Wireless Legitimate Surveillance Systems: Design and Performance Analysis

Structural Parameters, Tight Bounds, and Approximation for (k,r)-Center

Deterministic Gathering with Crash Faults

Improving Small Object Proposals for Company Logo Detection

Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning

Finite-state Strategies in Delay Games

Stochastic Proximal Gradient Algorithms for Penalized Mixed Models

How consistent are our discourse annotations? Insights from mapping RST-DT and PDTB annotations

Quaternion Gaussian matrices satisfy the RIP

A Hida-Malliavin white noise calculus approach to optimal control

Unbiased Shape Compactness for Segmentation

Adaptation and learning over networks for nonlinear system modeling

Interference Exploitation for Radar and Cellular Coexistence: The Power-Efficient Approach

Necessary conditions for linear convergence of Picard iterations and application to alternating projections

A Framework for Rate Efficient Control of Distributed Discrete Systems

Dynamic disorder in simple enzymatic reactions induces stochastic amplification of substrate

A lower bound on CNF encodings of the at-most-one constraint

Object Discovery via Cohesion Measurement

Expressing Facial Structure and Appearance Information in Frequency Domain for Face Recognition

Neural Word Segmentation with Rich Pretraining

Dependent Microstructure Noise and Integrated Volatility Estimation from High-Frequency Data

Not All Dialogues are Created Equal: Instance Weighting for Neural Conversational Models

Phase retrieval with a multivariate Von Mises prior: from a Bayesian formulation to a lifting solution

Exact extremal statistics in the classical $1d$ Coulomb gas

When is the mode functional the Bayes classifier?

Brownian disks and the Brownian snake

The topological face of recommendation: models and application to bias detection

A Unified Approach of Multi-scale Deep and Hand-crafted Features for Defocus Estimation

Distribution System Voltage Control under Uncertainties

Entropy of Independent Experiments, Revisited

Exploiting the Natural Exploration In Contextual Bandits

A robust parallel algorithm for combinatorial compressed sensing

Unimodular hierarchical models and their Graver bases

Parameter Estimation in Computational Biology by Approximate Bayesian Computation coupled with Sensitivity Analysis

Time-Sensitive Bandit Learning and Satisficing Thompson Sampling