A Hierarchical Approach to Neural Context-Aware Modeling

We present a new recurrent neural network topology that enhances state-of-the-art machine learning systems by incorporating a broader context. Our approach addresses the limitations of recent models on extended narratives by building an abstract context representation through a multi-layered computation. The resulting system captures the narrative at the word, sentence, and context levels. Through this hierarchical set-up, the proposed model summarizes the most salient information at each level and creates an abstract representation of the extended context. We then use this representation to enhance neural language processing systems on the task of semantic error detection. To show the potential of the new topology, we compare it against context-agnostic set-ups, namely a standard neural language model and a supervised binary classification network. On the error detection task, the hierarchical context-aware topologies outperform these baselines, with relative improvements of 12.75% for unsupervised models and 20.37% for supervised models.
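The word-to-sentence-to-context summarization described above can be illustrated with a minimal sketch. It assumes fixed word vectors and replaces the paper's recurrent layers with dot-product attention pooling; the `query` vector stands in for learned parameters, and all names below are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def attention_pool(vectors: List[Vector], query: Vector) -> Vector:
    """Summarize a sequence of vectors as a softmax-weighted average.

    Weights come from dot products with a (hypothetical, normally learned)
    query vector, so the most salient vectors dominate the summary.
    """
    scores = [sum(v_d * q_d for v_d, q_d in zip(v, query)) for v in vectors]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) / total for d in range(dim)]


def encode_context(document: List[List[Vector]], query: Vector) -> Vector:
    """Word level -> sentence level -> one abstract context vector."""
    sentence_vectors = [attention_pool(words, query) for words in document]
    return attention_pool(sentence_vectors, query)


# Two sentences made of 2-dimensional toy word embeddings.
doc = [[[0.0, 0.0], [2.0, 2.0]], [[4.0, 4.0]]]
print(encode_context(doc, query=[0.0, 0.0]))  # zero query -> plain averages: [2.5, 2.5]
```

With a zero query every level reduces to a plain average; a trained query would instead weight the words and sentences it finds most informative.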


MaskConnect: Connectivity Learning by Gradient Descent

Although deep networks have recently emerged as the model of choice for many computer vision problems, they often require a time-consuming architecture search to yield good results. To combat the complexity of design choices, prior work has adopted the principle of modularized design, which consists of defining the network as a composition of topologically identical or similar building blocks (a.k.a. modules). This reduces architecture search to the problem of determining the number of modules to compose and how to connect them. Again, for reasons of design complexity and training cost, previous approaches have relied on simple rules of connectivity, e.g., connecting each module only to the immediately preceding module or perhaps to all of the previous ones. Such simple connectivity rules are unlikely to yield the optimal architecture for the given problem. In this work we remove these predefined choices and propose an algorithm to learn the connections between modules in the network. Instead of being chosen a priori by the human designer, the connectivity is learned simultaneously with the weights of the network by optimizing the loss function of the end task using a modified version of gradient descent. We demonstrate our connectivity learning method on the problem of multi-class image classification using two popular architectures: ResNet and ResNeXt. Experiments on four different datasets show that connectivity learning using our approach yields consistently higher accuracy than relying on traditional predefined rules of connectivity. Furthermore, in certain settings it leads to significant savings in the number of parameters.
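To make "learning connectivity with a modified gradient descent" concrete, the following hypothetical sketch keeps a real-valued mask over the outputs of earlier modules, samples a binary mask with exactly `k` active inputs for the forward pass, and applies the gradient computed for the binary mask to the real-valued mask (a straight-through style update), clipped to [0, 1]. The sampling scheme, `k`, the scalar toy inputs, and all function names are assumptions, not the paper's exact algorithm.

```python
import random
from typing import List


def sample_k_hot(real_mask: List[float], k: int, rng: random.Random) -> List[int]:
    """Pick k distinct inputs without replacement, with probability proportional
    to their real-valued mask entries, and return the resulting binary mask."""
    remaining = list(range(len(real_mask)))
    chosen = set()
    for _ in range(k):
        weights = [max(real_mask[i], 1e-8) for i in remaining]  # avoid all-zero weights
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.add(pick)
        remaining.remove(pick)
    return [1 if i in chosen else 0 for i in range(len(real_mask))]


def aggregate(inputs: List[float], binary_mask: List[int]) -> float:
    """Forward pass: the module sums only the earlier outputs the mask selects."""
    return sum(b * x for b, x in zip(binary_mask, inputs))


def update_real_mask(real_mask: List[float], inputs: List[float],
                     upstream_grad: float, lr: float) -> List[float]:
    """Straight-through step: dLoss/dmask_i = upstream_grad * inputs_i is computed
    as if for the binary mask, applied to the real mask, then clipped to [0, 1]."""
    return [min(1.0, max(0.0, m - lr * upstream_grad * x))
            for m, x in zip(real_mask, inputs)]


# Toy loop: three earlier modules produce scalar outputs; the target favours {0, 1}.
rng = random.Random(0)
real_mask = [0.5, 0.5, 0.5]
inputs = [1.0, 2.0, -3.0]
target = 3.0
for _ in range(20):
    mask = sample_k_hot(real_mask, k=2, rng=rng)
    out = aggregate(inputs, mask)
    grad = 2.0 * (out - target)  # derivative of (out - target) ** 2 w.r.t. out
    real_mask = update_real_mask(real_mask, inputs, grad, lr=0.05)
print(real_mask)
```

Sampling a fixed number of active inputs keeps the per-module cost constant, which is one plausible reading of how learned connectivity could also save parameters.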


News Article Teaser Tweets and How to Generate Them

We define the task of teaser generation and provide an evaluation benchmark and baseline systems for it. A teaser is a short reading suggestion for an article that is illustrative and includes curiosity-arousing elements to entice potential readers to read the news item. Teasers are one of the main vehicles for transmitting news to social media users. We compile a novel dataset of teasers by systematically accumulating tweets and selecting those that conform to the teaser definition. We compare a number of neural abstractive architectures on the task of teaser generation; the best-performing system overall is the sequence-to-sequence model with a pointer network of See et al. (2017).
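The pointer mechanism of See et al. (2017) lets the decoder copy words from the article, which matters for teasers that reuse rare names. A minimal sketch of how such a model mixes its two distributions at one decoding step follows; the generation probability, vocabulary distribution, and attention weights are toy values, not outputs of a trained model.

```python
from typing import Dict, List


def final_distribution(p_gen: float, vocab_dist: Dict[str, float],
                       attention: List[float], source_tokens: List[str]) -> Dict[str, float]:
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source positions holding w)."""
    final = {w: p_gen * p for w, p in vocab_dist.items()}
    for a, token in zip(attention, source_tokens):
        # Copying can produce source tokens that are missing from the vocabulary.
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * a
    return final


dist = final_distribution(
    p_gen=0.5,
    vocab_dist={"the": 0.6, "news": 0.4},
    attention=[0.25, 0.75],
    source_tokens=["news", "Brexit"],
)
print(dist)  # "Brexit" is reachable only through copying
```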


Call Detail Records Driven Anomaly Detection and Traffic Prediction in Mobile Cellular Networks

Mobile networks hold information about both their users and the network itself. Such information is useful for making the network end-to-end visible and intelligent. Big data analytics can analyze user and network information efficiently and, with the help of machine learning tools, unearth meaningful insights. Using big data analytics and machine learning, this work makes three contributions. First, we use call detail record (CDR) data to detect anomalies in the network. To authenticate and verify these anomalies, we use k-means clustering, an unsupervised machine learning algorithm. Effective anomaly detection in turn supports the design of suitable resource distribution as well as fault detection and avoidance. Second, we prepare anomaly-free data by removing anomalous activities and train a neural network model. By passing both the anomalous and the anomaly-free data through this model, we observe the effect of anomalous activities on training and compare the mean squared error obtained on each. Lastly, we use an autoregressive integrated moving average (ARIMA) model to predict a user's future traffic. Through simple visualizations, we show that models trained on anomaly-free data generalize better and perform better on the prediction task.
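The abstract does not say how k-means separates anomalies from normal activity, so the sketch below assumes one common rule: cluster the records and flag members of clusters that hold less than a small fraction of all points. The features (hypothetical hourly call and SMS counts per cell), the farthest-point seeding, and the `min_fraction` threshold are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_point_init(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding: start from the first point, then repeatedly add the
    point farthest from all centroids chosen so far (isolated outliers get picked)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[int]:
    """Plain Lloyd iterations; returns the cluster label of every point."""
    centroids = farthest_point_init(points, k)
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(xs) / len(members) for xs in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


def flag_anomalies(points: Sequence[Point], k: int, min_fraction: float = 0.2) -> List[bool]:
    """Flag points that fall in clusters holding less than min_fraction of the data."""
    labels = kmeans(points, k)
    sizes = {c: labels.count(c) for c in set(labels)}
    return [sizes[lab] / len(points) < min_fraction for lab in labels]


# Hypothetical (calls, SMS) counts per hour for one cell; the last record is unusual.
records = [(10, 5), (11, 6), (9, 5), (10, 4), (50, 20), (52, 21), (49, 19), (51, 20), (400, 300)]
print(flag_anomalies(records, k=3))
```

Removing the flagged records would then give the anomaly-free training set that the abstract compares against the raw data.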


Semantic DMN: Formalizing and Reasoning About Decisions in the Presence of Background Knowledge

The Decision Model and Notation (DMN) is a recent OMG standard for the elicitation and representation of decision models, and for managing their interconnection with business processes. DMN builds on the notion of decision tables and their combination into more complex decision requirements graphs (DRGs), which bridge business process models and decision logic models. DRGs may rely on additional, external business knowledge models whose functioning is not part of the standard. In this work, we consider one of the most important types of business knowledge, namely background knowledge that conceptually accounts for the structural aspects of the domain of interest, and propose decision requirement knowledge bases (DKBs), where DRGs are modeled in DMN and domain knowledge is captured by means of first-order logic with datatypes. We provide a logic-based semantics for such an integration and formalize different DMN reasoning tasks for DKBs. We then consider background knowledge formulated as a description logic (DL) ontology with datatypes, and show how the main verification tasks for DMN in this enriched setting can be formalized as standard DL reasoning services and actually carried out in ExpTime. We discuss the effectiveness of our framework on a case study in maritime security. This work is under consideration in Theory and Practice of Logic Programming (TPLP).
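One verification task typically checked on a DMN decision table is whether two rules can match the same input, which a "unique" hit policy forbids. The sketch below checks this for rules whose conditions are closed numeric intervals, without any background knowledge; it is a plain interval check, not the DL-based reasoning the paper proposes, and the maritime-style inputs (`length`, `draft`) and rules are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # closed interval [low, high]
Rule = Dict[str, Interval]      # input name -> condition; a missing input means "-" (any value)


def intervals_intersect(a: Interval, b: Interval) -> bool:
    return max(a[0], b[0]) <= min(a[1], b[1])


def rules_overlap(r1: Rule, r2: Rule) -> bool:
    """Two rules overlap iff every input constrained by both has a common value.
    Inputs constrained by only one rule never separate them."""
    return all(intervals_intersect(r1[name], r2[name]) for name in set(r1) & set(r2))


def overlapping_pairs(rules: List[Rule]) -> List[Tuple[int, int]]:
    """Index pairs of rules that violate a unique hit policy."""
    return [(i, j) for (i, r1), (j, r2) in combinations(enumerate(rules), 2)
            if rules_overlap(r1, r2)]


# Hypothetical vessel-inspection table: ship length (m) and draft (m).
table = [
    {"length": (0, 100)},
    {"length": (100, 300), "draft": (0, 10)},
    {"length": (150, 400), "draft": (8, 20)},
]
print(overlapping_pairs(table))  # rule 0/1 share length 100; rules 1/2 share length 150-300, draft 8-10
```

With background knowledge, an overlap may turn out to be impossible (for instance, if the ontology rules out certain combinations of values), which is precisely what a purely syntactic check like this one cannot detect.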


Security and Privacy Issues in Deep Learning

With the development of machine learning, expectations for artificial intelligence (AI) technology are increasing day by day. In particular, deep learning (DL) has shown strong performance in a variety of fields. Many applications closely tied to daily life rely on DL models to make significant decisions based on their predictions or classifications. Hence, if malicious external influences cause a DL model to mispredict or misclassify, the real-world consequences can be severe. Moreover, training DL models relies on enormous amounts of data, which often include sensitive information, so DL models should not expose the privacy of that data. In this paper, we review the threats to model security and data privacy, together with the defense methods developed against them, under the notion of SPAI: Secure and Private AI. We also discuss current challenges and open issues.
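As a concrete instance of a "malicious external influence" that causes misclassification, the sketch below applies the fast gradient sign method (FGSM), a standard adversarial attack, to a logistic-regression model. FGSM is chosen here only as a well-known example and is not taken from the paper; a linear model keeps the input gradient in closed form, and the weights and input are toy values.

```python
import math
from typing import List


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v: float) -> float:
    return float((v > 0) - (v < 0))


def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """Move each feature by eps in the direction that increases the cross-entropy
    loss. For logistic regression, the gradient of the loss w.r.t. x is (p - y) * w."""
    p = predict(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]


w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1
x_adv = fgsm(w, b, x, y, eps=0.6)
print(predict(w, b, x) > 0.5, predict(w, b, x_adv) > 0.5)  # True False
```

A bounded change of 0.6 per feature is enough to flip the prediction, which is why defenses such as adversarial training focus on robustness within such perturbation budgets.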


Rank and Rate: Multi-task Learning for Recommender Systems

The two main tasks in the recommender systems domain are ranking and rating prediction. The rating prediction task aims to predict how much a user would like any given item, which makes it possible to recommend the items with the highest predicted scores. The ranking task, on the other hand, aims directly at recommending the most valuable items to the user. Several previous approaches learn user and item representations that optimize both tasks simultaneously in a multi-task framework. In this work, we propose a novel multi-task framework that exploits the fact that a user makes a two-phase decision: first deciding to interact with an item (ranking task) and only afterwards rating it (rating prediction task). We evaluate our framework on two benchmark datasets in two different configurations and show its superiority over state-of-the-art methods.
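The two-phase view can be expressed as a joint objective over shared user and item embeddings. The sketch below is a hypothetical illustration, not the paper's framework: it uses a BPR-style pairwise loss for the interaction (ranking) phase and a squared error for the rating phase, computed only on the item the user interacted with. The weighting `alpha`, the global bias, and the toy embeddings are assumptions.

```python
import math
from typing import List


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def ranking_loss(user: List[float], pos_item: List[float], neg_item: List[float]) -> float:
    """Phase 1 (interact or not): BPR loss, -log sigmoid(s_pos - s_neg)."""
    return softplus(-(dot(user, pos_item) - dot(user, neg_item)))


def rating_loss(user: List[float], item: List[float], rating: float, global_bias: float) -> float:
    """Phase 2 (rate): squared error, only defined for items the user interacted with."""
    return (dot(user, item) + global_bias - rating) ** 2


def multitask_loss(user, pos_item, neg_item, rating, global_bias, alpha=0.5):
    """Shared embeddings are trained on a weighted sum of both phases."""
    return (alpha * ranking_loss(user, pos_item, neg_item)
            + (1 - alpha) * rating_loss(user, pos_item, rating, global_bias))


user, liked, ignored = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
print(multitask_loss(user, liked, ignored, rating=4.0, global_bias=3.0))
```

Tying the rating term to the interacted item is what encodes "interact first, rate afterwards": non-interacted items only ever contribute through the ranking term.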


Gender Bias in Neural Natural Language Processing

We examine whether neural natural language processing (NLP) systems reflect historical biases in training data. We define a general benchmark to quantify gender bias in a variety of neural NLP tasks. Our empirical evaluation with state-of-the-art neural coreference resolution and textbook RNN-based language models trained on benchmark datasets finds significant gender bias in how models view occupations. We then mitigate bias with counterfactual data augmentation (CDA): a generic methodology for corpus augmentation via causal interventions that breaks associations between gendered and gender-neutral words. We empirically show that CDA effectively decreases gender bias while preserving accuracy. We also explore the space of mitigation strategies with CDA, a prior approach to word embedding debiasing (WED), and their compositions. We show that CDA outperforms WED, drastically so when word embeddings are trained. For pre-trained embeddings, the two methods can be effectively composed. We also find that, as training proceeds on the original dataset with gradient descent, gender bias grows as the loss decreases, indicating that the optimization encourages bias; CDA mitigates this behavior.
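The corpus intervention behind CDA can be sketched as follows: for every sentence, add a counterfactual copy in which gendered words are swapped with their counterparts, so gender-neutral words such as occupations co-occur equally with both genders. The word-pair list below is a tiny hypothetical sample; a real intervention list would be much larger and must handle ambiguous words such as "her" (which maps to both "him" and "his"), which this sketch deliberately leaves out.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, deliberately unambiguous gendered word pairs.
GENDER_PAIRS: List[Tuple[str, str]] = [
    ("he", "she"), ("man", "woman"), ("men", "women"),
    ("father", "mother"), ("son", "daughter"), ("king", "queen"),
]


def build_swap_map(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    swap: Dict[str, str] = {}
    for a, b in pairs:
        swap[a], swap[b] = b, a
    return swap


def counterfactual(sentence: str, swap: Dict[str, str]) -> str:
    """Swap every gendered word, preserving an initial capital letter."""
    def replace(match: re.Match) -> str:
        word = match.group(0)
        target = swap.get(word.lower())
        if target is None:
            return word
        return target.capitalize() if word[0].isupper() else target
    return re.sub(r"[A-Za-z]+", replace, sentence)


def augment(corpus: List[str], swap: Dict[str, str]) -> List[str]:
    """Original sentences plus one counterfactual copy of each."""
    return corpus + [counterfactual(s, swap) for s in corpus]


swap = build_swap_map(GENDER_PAIRS)
print(augment(["He is a doctor and his father is a nurse."], swap))
```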


Practical Constrained Optimization of Auction Mechanisms in E-Commerce Sponsored Search Advertising

Sponsored search on E-commerce platforms such as Amazon, Taobao and Tmall gives sellers an effective way to reach potential buyers with the most relevant intent. In this paper, we study the problem of optimizing the auction mechanism for sponsored search on Alibaba's mobile E-commerce platform. Besides generating revenue, the platform must maintain an efficient marketplace with plenty of quality users, guarantee a reasonable return on investment (ROI) for advertisers, and, meanwhile, provide a pleasant shopping experience for users. Together, these requirements pose a constrained optimization problem. Directly optimizing over auction parameters yields a discontinuous, non-convex problem that resists effective solution. One of our major contributions is a practical convex optimization formulation of the original problem, obtained through a novel re-parametrization of the auction mechanism with discrete sets of representative instances. To construct the optimization problem, we build an auction simulation system that estimates the business indicators resulting from the selected parameters by replaying auctions recorded from real online requests. We report experiments on real search traffic that analyze the fidelity of the auction simulation, the efficacy under various constraint targets, and the influence of regularization. The results show that, with proper entropy regularization, we can maximize revenue while keeping other business indicators within given ranges.
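An auction replay system of the kind described above can be sketched in a few lines. The mechanism here is an assumption, since the abstract does not specify it: a generalized second-price (GSP) auction that ranks ads by `bid * ctr ** alpha`, where `alpha` is a hypothetical tunable parameter, and charges each winner the minimum bid that keeps its rank. Replaying the same logged auctions under a discrete set of candidate parameters yields a table of business indicators per candidate.

```python
from typing import List, Sequence, Tuple

Ad = Tuple[float, float]  # (bid per click, predicted click-through rate)


def replay_gsp(auctions: Sequence[Sequence[Ad]], alpha: float, slots: int = 1) -> Tuple[float, float]:
    """Replay logged auctions under rank score bid * ctr**alpha with GSP pricing.
    Returns (expected revenue, expected clicks) summed over all auctions."""
    revenue = clicks = 0.0
    for ads in auctions:
        ranked = sorted(ads, key=lambda ad: ad[0] * ad[1] ** alpha, reverse=True)
        for pos in range(min(slots, len(ranked))):
            bid, ctr = ranked[pos]
            if pos + 1 < len(ranked):
                next_bid, next_ctr = ranked[pos + 1]
                # Smallest bid whose rank score still matches the next ad's score.
                price = next_bid * next_ctr ** alpha / ctr ** alpha
            else:
                price = 0.0  # no competitor; a reserve price is ignored in this sketch
            revenue += ctr * price
            clicks += ctr
    return revenue, clicks


# Hypothetical logged auctions: each is a list of (bid, pCTR) pairs.
logged = [[(2.0, 0.1), (1.0, 0.3)], [(1.5, 0.2), (1.2, 0.25), (0.5, 0.4)]]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, replay_gsp(logged, alpha))
```

Because the candidate set is discrete, the indicators for each candidate can be computed once offline; how the paper turns such a table into a convex program is not reproduced here.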


t-SNE-CUDA: GPU-Accelerated t-SNE and its Applications to Modern Data

Modern datasets and models are notoriously difficult to explore and analyze due to their inherent high dimensionality and massive numbers of samples. Existing visualization methods that use dimensionality reduction to two or three dimensions are often inefficient and/or ineffective for these datasets. This paper introduces t-SNE-CUDA, a GPU-accelerated implementation of t-distributed Stochastic Neighbor Embedding (t-SNE) for visualizing datasets and models. t-SNE-CUDA significantly outperforms current implementations, with 50-700x speedups on the CIFAR-10 and MNIST datasets. These speedups enable, for the first time, visualization of neural network activations on the entire ImageNet dataset, a feat that was previously computationally intractable. We also demonstrate visualization performance in the NLP domain by visualizing the GloVe embedding vectors. From these visualizations, we draw interesting conclusions about using the L2 metric in these embedding spaces. t-SNE-CUDA is publicly available at https://…/tsne-cuda
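The cost that GPU acceleration targets comes from t-SNE's pairwise terms. The sketch below computes the low-dimensional affinities `Q` and the exact gradient of the KL divergence, dC/dy_i = 4 Σ_j (p_ij − q_ij)(y_i − y_j)(1 + ‖y_i − y_j‖²)⁻¹, in O(n²) time. The symmetric high-dimensional affinities `P` are assumed to be precomputed; scalable implementations approximate these pairwise sums rather than evaluating them exactly as done here.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def tsne_q_and_grad(P: Matrix, Y: Matrix) -> Tuple[Matrix, Matrix]:
    """Exact t-SNE affinities Q and gradient of KL(P || Q) w.r.t. the embedding Y."""
    n = len(Y)
    # Student-t kernel with one degree of freedom between embedded points.
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                kernel[i][j] = 1.0 / (1.0 + d2)
    z = sum(sum(row) for row in kernel)
    Q = [[kernel[i][j] / z for j in range(n)] for i in range(n)]
    grad = []
    for i in range(n):
        g = [0.0] * len(Y[i])
        for j in range(n):
            if i != j:
                coeff = 4.0 * (P[i][j] - Q[i][j]) * kernel[i][j]
                for d in range(len(g)):
                    g[d] += coeff * (Y[i][d] - Y[j][d])
        grad.append(g)
    return Q, grad


P = [[0.0, 0.2, 0.1], [0.2, 0.0, 0.2], [0.1, 0.2, 0.0]]  # symmetric, sums to 1
Y = [[0.0, 0.0], [1.0, 0.5], [-0.5, 2.0]]
Q, grad = tsne_q_and_grad(P, Y)
print(grad)  # one gradient descent step would be Y[i] -= learning_rate * grad[i]
```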


Predicting Solution Summaries to Integer Linear Programs under Imperfect Information with Machine Learning

This paper makes a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict solution summaries (i.e., solution descriptions at a given level of detail) for discrete stochastic optimization problems. We approximate the solutions with supervised learning, using a training dataset that consists of a large number of deterministic problems solved independently and offline. Uncertainty about a missing subset of the inputs is handled through sampling and aggregation methods. Our motivating application concerns booking decisions for intermodal containers on double-stack trains. Under perfect information, this is the so-called load planning problem, which can be formulated as an integer linear program. However, that formulation cannot be used for the application at hand because of the restricted computational budget and the unknown container weights. The results show that standard deep learning algorithms can predict descriptions of solutions with high accuracy in a very short time (milliseconds or less).
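Handling missing inputs "through sampling and aggregation" can be sketched as follows: complete the instance with sampled values for the unknown inputs, run the trained predictor on each completed instance, and aggregate the predicted summaries. In this hypothetical sketch `toy_model` stands in for the trained network, the uniform 10 to 30 tonne weight distribution and the 75-tonne threshold are invented, and component-wise averaging is only one possible aggregation.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Model = Callable[[List[float]], List[float]]


def predict_under_uncertainty(model: Model, known_inputs: Sequence[float],
                              sample_missing: Callable[[random.Random], List[float]],
                              n_samples: int, rng: random.Random) -> List[float]:
    """Complete the input with sampled values for the unknown part, predict a
    solution summary for each completed instance, and average component-wise."""
    predictions = []
    for _ in range(n_samples):
        full_input = list(known_inputs) + sample_missing(rng)
        predictions.append(model(full_input))
    return [mean(column) for column in zip(*predictions)]


def toy_model(weights: List[float]) -> List[float]:
    """Hypothetical predictor: [total weight, 1.0 if an extra railcar is needed]."""
    total = sum(weights)
    return [total, 1.0 if total > 75.0 else 0.0]


def two_unknown_weights(rng: random.Random) -> List[float]:
    """Hypothetical distribution of the two container weights that are not yet known."""
    return [rng.uniform(10.0, 30.0), rng.uniform(10.0, 30.0)]


summary = predict_under_uncertainty(toy_model, [12.0, 18.0], two_unknown_weights,
                                    n_samples=2000, rng=random.Random(0))
print(summary)  # [expected total weight, fraction of scenarios needing an extra railcar]
```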


FADE: Fast and Asymptotically efficient Distributed Estimator for dynamic networks

Consider a set of agents that wish to estimate a vector of parameters of mutual interest. To this end, agents can sense and communicate. When sensing, an agent measures linear combinations of the unknown parameter vector, corrupted by additive Gaussian noise. When communicating, an agent can broadcast information to a few other agents, using whatever channels happen to be randomly available to it at the time. To coordinate the agents towards their estimation goal, we propose a novel algorithm called FADE (Fast and Asymptotically efficient Distributed Estimator), in which agents collaborate at discrete time-steps; at each time-step, each agent senses and communicates just once while also updating its own estimate of the unknown parameter vector. FADE enjoys five attractive features: first, it is an intuitive estimator that is simple to derive; second, it withstands dynamic networks, that is, networks whose communication channels change randomly over time; third, it is strongly consistent, in that, as time-steps play out, each agent's local estimate converges almost surely to the true parameter vector; fourth, it is both asymptotically unbiased and asymptotically efficient, which means that, over time, each agent's estimate becomes unbiased and the mean-square error (MSE) of each agent's estimate vanishes at the same rate as the MSE of the optimal estimator at an almighty central node; fifth, and most importantly, compared with a state-of-the-art consensus+innovation (CI) algorithm, it yields estimates with substantially lower MSE for the same number of communications. For example, in numerical simulations of a sparsely connected network with 50 agents, the reduction is dramatic, reaching several orders of magnitude.
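The abstract does not give FADE's update rule, so the sketch below instead shows the consensus+innovation (CI) baseline family it is compared against, under simplifying assumptions: a scalar parameter, a ring of agents where only every other agent has an informative sensor, links that fail independently at every step (a dynamic network), and hand-picked decaying step sizes. Each agent moves towards its active neighbours (consensus) and towards its own new measurement (innovation).

```python
import random
from typing import List


def ci_estimate(theta: float, h: List[float], neighbors: List[List[int]],
                sigma: float, steps: int, link_prob: float, seed: int = 0) -> List[float]:
    """Scalar consensus+innovation estimation over a randomly changing network."""
    rng = random.Random(seed)
    n = len(h)
    x = [0.0] * n
    for t in range(steps):
        alpha = 2.0 / (t + 2)        # innovation weight, decays like 1/t
        beta = 0.3 / (t + 1) ** 0.3  # consensus weight, decays more slowly
        # Dynamic network: each undirected link is up independently at this step.
        active = {(i, j) for i in range(n) for j in neighbors[i]
                  if i < j and rng.random() < link_prob}
        y = [h[i] * theta + rng.gauss(0.0, sigma) for i in range(n)]  # noisy sensing
        new_x = []
        for i in range(n):
            consensus = sum(x[i] - x[j] for j in neighbors[i]
                            if (min(i, j), max(i, j)) in active)
            innovation = h[i] * (y[i] - h[i] * x[i])
            new_x.append(x[i] - beta * consensus + alpha * innovation)
        x = new_x
    return x


n = 10
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
sensing = [1.0 if i % 2 == 0 else 0.0 for i in range(n)]  # odd agents observe nothing
estimates = ci_estimate(2.0, sensing, ring, sigma=0.5, steps=3000, link_prob=0.5)
print([round(e, 3) for e in estimates])
```

Agents without an informative sensor still converge because consensus propagates their neighbours' information; how quickly that propagation happens per communication is exactly what FADE is reported to improve.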


Integrated Continuous-time Hidden Markov Models

Motivated by applications in movement ecology, in this paper I propose a new class of integrated continuous-time hidden Markov models in which each observation depends on the underlying state of the process over the whole interval since the previous observation, not only on its current state. I show that under appropriate conditioning, such a model can be regarded as a conventional hidden Markov model, enabling efficient evaluation of its likelihood without sampling its state sequence. This leads to an inference algorithm that is more efficient, and scales better with the amount of data, than existing methods. An application to animal movement data is given.
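Once a model can be treated as a conventional hidden Markov model, its likelihood follows from the forward algorithm, which sums over all state sequences in time linear in the number of observations instead of sampling them. The sketch below is the standard scaled forward algorithm with one transition matrix per observation gap, so irregularly spaced observations can use different matrices. The emission likelihoods are taken as given inputs; the paper's conditioning construction that produces them for integrated observations is not reproduced here.

```python
import math
from typing import List

Matrix = List[List[float]]


def hmm_log_likelihood(initial: List[float], transitions: List[Matrix],
                       emissions: List[List[float]]) -> float:
    """Scaled forward algorithm.

    initial[k]           P(state_0 = k)
    transitions[t][j][k] P(state_{t+1} = k | state_t = j), one matrix per gap
    emissions[t][k]      likelihood of observation t given state k
    """
    n = len(initial)
    alpha = [initial[k] * emissions[0][k] for k in range(n)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]  # rescale to avoid underflow on long series
    for t in range(1, len(emissions)):
        P = transitions[t - 1]
        alpha = [sum(alpha[j] * P[j][k] for j in range(n)) * emissions[t][k]
                 for k in range(n)]
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Two behavioural states (e.g. resting / travelling) and two observations.
print(hmm_log_likelihood([0.3, 0.7], [[[0.9, 0.1], [0.4, 0.6]]], [[0.5, 0.1], [0.3, 0.8]]))
```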


Egocentric Spatial Memory

Egocentric spatial memory (ESM) refers to a memory system that encodes, stores, recognizes, and recalls spatial information about the environment from an egocentric perspective. We introduce an integrated deep neural network architecture for modeling ESM. It learns to estimate the occupancy state of the world and progressively constructs top-down 2D global maps from egocentric views in a spatially extended environment. During exploration, our ESM model uses a recurrent neural network to update its belief about the global map based on local observations. It also augments local mapping with a novel external memory that encodes and stores latent representations of visited places over long-term exploration in large environments, which enables agents to perform place recognition and hence loop closure. Our ESM network (ESMN) contributes in the following aspects: (1) without feature engineering, our model efficiently predicts free space from egocentric views in an end-to-end manner; (2) unlike other deep learning-based mapping systems, ESMN handles continuous actions and states, which is vitally important for robotic control in real applications. In experiments, we demonstrate its accurate and robust global mapping capabilities in 3D virtual mazes and realistic indoor environments by comparing it with several competitive baselines.
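For contrast with the learned recurrent map update, the classical hand-designed way to "update belief of the global map based on local observations" is a per-cell Bayesian log-odds filter, sketched below. This is not the ESM architecture; the sensor-model probabilities `p_hit` and `p_miss` and the observation format are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def probability(log_odds: float) -> float:
    """Convert a cell's log-odds back to an occupancy probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def update_cell(log_odds: float, hit: bool, p_hit: float = 0.7, p_miss: float = 0.4) -> float:
    """Bayesian log-odds update of one cell from a binary occupancy observation."""
    return log_odds + (logit(p_hit) if hit else logit(p_miss))


def update_map(grid: List[List[float]], observations: Dict[Tuple[int, int], bool]) -> List[List[float]]:
    """Fuse one local view; cells outside the view keep their current belief."""
    for (r, c), hit in observations.items():
        grid[r][c] = update_cell(grid[r][c], hit)
    return grid


grid = [[0.0] * 3 for _ in range(3)]  # log-odds 0 means p = 0.5 (unknown)
update_map(grid, {(0, 0): True, (1, 1): False})
update_map(grid, {(0, 0): True, (1, 1): False})
print([[round(probability(v), 3) for v in row] for row in grid])
```

A learned update such as the one in ESMN can, in principle, also exploit context that this per-cell independence assumption ignores.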


SafeDrive: Enhancing Lane Appearance for Autonomous and Assisted Driving Under Limited Visibility
Trajectory Optimization for Cooperative Dual-band UAV Swarms
Enumerating Cryptarithms Using Deterministic Finite Automata
Efficient Gauss-Newton-Krylov momentum conservation constrained PDE-LDDMM using the band-limited vector field parameterization
Neural Sentence Embedding using Only In-domain Sentences for Out-of-domain Sentence Detection in Dialog Systems
Kinetic-controlled hydrodynamics for traffic models with driver-assist vehicles
Efficiency, Sequenceability and Deal-Optimality in Fair Division of Indivisible Goods
Refining the bijections among ascent sequences, (2+2)-free posets, integer matrices and pattern-avoiding permutations
Graphs admitting only constant splines
Parameterized Orientable Deletion
A Restricted-Domain Dual Formulation for Two-Phase Image Segmentation
Estimating Failure in Brittle Materials using Graph Theory
On Approximating (Sparse) Covering Integer Programs
Markerless Visual Robot Programming by Demonstration
Sub-Nyquist Radar Systems: Temporal, Spectral and Spatial Compression
The structure of claw-free binary matroids
Textual Explanations for Self-Driving Vehicles
Deep Recurrent Neural Networks for ECG Signal Denoising
Reach-Avoid Problems via Sum-of-Squares Optimization and Dynamic Programming
Time-frequency transforms of white noises and Gaussian analytic functions
Lattice Agreement in Message Passing Systems
Testing the Efficient Network TRaining (ENTR) Hypothesis: initially reducing training image size makes Convolutional Neural Network training for image recognition tasks more efficient
UH-PRHLT at SemEval-2016 Task 3: Combining Lexical and Semantic-based Features for Community Question Answering
Acquisition of Localization Confidence for Accurate Object Detection
The Efficiency of Geometric Samplers for Exoplanet Transit Timing Variation Models
On the localization of the roots for Kac polynomials
A Simple Near-Linear Pseudopolynomial Time Randomized Algorithm for Subset Sum
Pulse Sequence Resilient Fast Brain Segmentation
Fast and Robust Symmetric Image Registration Based on Intensity and Spatial Information
Non-crossing trees, quadrangular dissections, ternary trees, and duality preserving bijections
Doubly Attentive Transformer Machine Translation
Pareto-Optimization Framework for Automated Network-on-Chip Design
A Proof of Entropy Minimization for Outputs in Deletion Channels via Hidden Word Statistics
Tight Upper Bounds on the Crossing Number in a Minor-Closed Class
An Enhanced Latent Semantic Analysis Approach for Arabic Document Summarization
Shared Spectrum for Mobile-Cells Backhaul and Access Link
K-medoids Clustering of Data Sequences with Composite Distributions
Count-Based Exploration with the Successor Representation
Security against false data injection attack in cyber-physical systems
MnasNet: Platform-Aware Neural Architecture Search for Mobile
Optimized Transmission for Consensus in Wireless Sensor Networks
Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems
Interactive Summarization and Exploration of Top Aggregate Query Answers
Deep Graph Laplacian Regularization
Adaptive Non-Parametric Regression With the $K$-NN Fused Lasso
Brain MRI Image Super Resolution using Phase Stretch Transform and Transfer Learning
Composable Core-sets for Determinant Maximization Problems via Spectral Spanners
The Devil of Face Recognition is in the Noise
Truthful Peer Grading with Limited Effort from Teaching Staff
Unmanned Aerial Vehicle Path Planning for Traffic Estimation and Detection of Non-Recurrent Congestion
A Construction of Bent Functions on a finite group
Optimization by Pairwise Linkage Detection, Incremental Linkage Set, and Restricted / Back Mixing: DSMGA-II
Deep Learning-based CSI Feedback Approach for Time-varying Massive MIMO Channels
Improving the Annotation of DeepFashion Images for Fine-grained Attribute Recognition
Leveraging Unlabeled Whole-Slide-Images for Mitosis Detection
Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder
Deep Belief Networks Based Feature Generation and Regression for Predicting Wind Power
Extremes of Locally-stationary Chi-square processes on discrete grids
Deep Cross Modal Learning for Caricature Verification and Identification (CaVINet)
Neural Article Pair Modeling for Wikipedia Sub-article Matching
Regular self-dual and self-Petrie-dual maps of arbitrary valency
Spectrum concentration in deep residual learning: a free probability approach
Generalization of core percolation on complex networks
Input-to-State Stability of a Clamped-Free Damped String in the Presence of Distributed and Boundary Disturbances
Multimodal Deep Domain Adaptation
SegStereo: Exploiting Semantic Information for Disparity Estimation
Two curve Chebyshev approximation and its application to signal clustering
Efficient Computation of Sequence Mappability
Learning Collaborative Generation Correction Modules for Blind Image Deblurring and Beyond
Inserting an Edge into a Geometric Embedding
RiTUAL-UH at TRAC 2018 Shared Task: Aggression Identification
Deep Learning in Physical Layer Communications
Using Feature Grouping as a Stochastic Regularizer for High-Dimensional Noisy Data
A Robust Deep Attention Network to Noisy Labels in Semi-supervised Biomedical Segmentation
Regional Multi-scale Approach for Visually Pleasing Explanations of Deep Neural Networks
Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained with Noise Signals
A Zero-Shot Framework for Sketch-based Image Retrieval
Expectation of the Largest Betting Size in Labouchère System
Neighborhood Complexes of Kneser Graphs, $KG_{3,k}$
Size reconstructibility of graphs
Robust distributed calibration of radio interferometers with direction dependent distortions
Modeling joint probability distribution of yield curve parameters
Deep Visual Odometry Methods for Mobile Robots
Combinatorial proofs of some linear algebraic identities
Co-existence of Trend and Value in Financial Markets: Estimating an Extended Chiarella Model
Epidemic Spreading and Aging in Temporal Networks with Memory
A First Experiment on Including Text Literals in KGloVe
Remote sensing image regression for heterogeneous change detection
Interior gradient and Hessian estimates for the Dirichlet problem of semi-linear degenerate elliptic systems: a probabilistic approach
The Becker-Döring process: law of large numbers and non-equilibrium potential
On subgroups of minimal index
Maximal displacement and population growth for branching Brownian motions
Robustness of the pathways structure of fluctuations in stochastic homogenization
Scale equivariance in CNNs with vector fields
Nodal Lengths in Shrinking Domains for Random Eigenfunctions on $\mathbb{S}^2$
Attention is All We Need: Nailing Down Object-centric Attention for Egocentric Activity Recognition
Bayesian Uncertainty Estimation Under Complex Sampling
A note on full weight spectrum codes
Speed-sensorless state feedback control of induction machines with LC filter
Disaster Monitoring using Unmanned Aerial Vehicles and Deep Learning
Deep learning in agriculture: A survey
Single-shot holographic 3D particle-localization under multiple scattering
OpenCLIPER: an OpenCL-based C++ Framework for Overhead-Reduced Medical Image Processing and Reconstruction on Heterogeneous Devices
On the spectral gap of some Cayley graphs on the Weyl group $W(B_n)$
A critique of the econometrics of happiness: Are we underestimating the returns to education and income?
Inferring the ground truth through crowdsourcing
Extensible Grounding of Speech for Robot Instruction
Synchronization patterns in LIF Neural Networks: Merging Nonlocal and Diagonal Connectivity
Resource Allocation in Full-Duplex Mobile-Edge Computing Systems with NOMA and Energy Harvesting
Investigating the time dynamics of high frequency wind speed in complex terrains by using the Fisher-Shannon method: application to Switzerland
Joint Learning of Intrinsic Images and Semantic Segmentation
Antipodes of monoidal decomposition spaces
Data Center Interconnects at 400G and Beyond
On the Unbiased Asymptotic Normality of Quantile Regression with Fixed Effects
Weak ergodic theorem for Markov chains in the absence of invariant countably additive measures
On Exploring Temporal Graphs of Small Pathwidth
Parallel Optimal Control for Cooperative Automation of Large-scale Connected Vehicles via ADMM
Stochastic Gradient Descent with Biased but Consistent Gradient Estimators
Deep Dual Pyramid Network for Barcode Segmentation using Barcode-30k Database
Gaussian Process Landmarking for Three-Dimensional Geometric Morphometrics
Deep End-to-end Fingerprint Denoising and Inpainting
Minimal Ramsey graphs for cyclicity
The Formal Inverse of the Period-Doubling Sequence
Effective Parallel Corpus Mining using Bilingual Sentence Embeddings
Computing the Strategy to Commit to in Polymatrix Games (Extended Version)
End-to-End Physics Event Classification with the CMS Open Data: Applying Image-based Deep Learning on Detector Data to Directly Classify Collision Events at the LHC
Real-Time Millimeter-Wave MIMO Channel Sounder for Dynamic Directional Measurements
What am I searching for?
Gender Privacy: An Ensemble of Semi Adversarial Networks for Confounding Arbitrary Gender Classifiers
Entanglement cost and quantum channel simulation