Penalized Maximum Tangent Likelihood Estimation and Robust Variable Selection

We introduce a new class of mean regression estimators — penalized maximum tangent likelihood estimation — for high-dimensional regression estimation and variable selection. We first explain the motivations for the key ingredient, maximum tangent likelihood estimation (MTE), and establish its asymptotic properties. We further propose a penalized MTE for variable selection and show that it is \sqrt{n}-consistent, enjoys the oracle property. The proposed class of estimators consists penalized \ell_2 distance, penalized exponential squared loss, penalized least trimmed square and penalized least square as special cases and can be regarded as a mixture of minimum Kullback-Leibler distance estimation and minimum \ell_2 distance estimation. Furthermore, we consider the proposed class of estimators under the high-dimensional setting when the number of variables d can grow exponentially with the sample size n, and show that the entire class of estimators (including the aforementioned special cases) can achieve the optimal rate of convergence in the order of \sqrt{\ln(d)/n}. Finally, simulation studies and real data analysis demonstrate the advantages of the penalized MTE.

On Ensuring that Intelligent Machines Are Well-Behaved

Machine learning algorithms are everywhere, ranging from simple data analysis and pattern recognition tools used across the sciences to complex systems that achieve super-human performance on various tasks. Ensuring that they are well-behaved—that they do not, for example, cause harm to humans or act in a racist or sexist way—is therefore not a hypothetical problem to be dealt with in the future, but a pressing one that we address here. We propose a new framework for designing machine learning algorithms that simplifies the problem of specifying and regulating undesirable behaviors. To show the viability of this new framework, we use it to create new machine learning algorithms that preclude the sexist and harmful behaviors exhibited by standard machine learning algorithms in our experiments. Our framework for designing machine learning algorithms simplifies the safe and responsible application of machine learning.

Large-Scale Domain Adaptation via Teacher-Student Learning

High accuracy speech recognition requires a large amount of transcribed data for supervised training. In the absence of such data, domain adaptation of a well-trained acoustic model can be performed, but even here, high accuracy usually requires significant labeled data from the target domain. In this work, we propose an approach to domain adaptation that does not require transcriptions but instead uses a corpus of unlabeled parallel data, consisting of pairs of samples from the source domain of the well-trained model and the desired target domain. To perform adaptation, we employ teacher/student (T/S) learning, in which the posterior probabilities generated by the source-domain model can be used in lieu of labels to train the target-domain model. We evaluate the proposed approach in two scenarios, adapting a clean acoustic model to noisy speech and adapting an adults speech acoustic model to children speech. Significant improvements in accuracy are obtained, with reductions in word error rate of up to 44% over the original source model without the need for transcribed data in the target domain. Moreover, we show that increasing the amount of unlabeled data results in additional model robustness, which is particularly beneficial when using simulated training data in the target-domain.

Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples

Deep neural networks (DNNs) have demonstrated impressive performance on a wide array of tasks, but they are usually considered opaque since internal structure and learned parameters are not interpretable. In this paper, we re-examine the internal representations of DNNs using adversarial images, which are generated by an ensemble-optimization algorithm. We find that: (1) the neurons in DNNs do not truly detect semantic objects/parts, but respond to objects/parts only as recurrent discriminative patches; (2) deep visual representations are not robust distributed codes of visual concepts because the representations of adversarial images are largely not consistent with those of real images, although they have similar visual appearance, both of which are different from previous findings. To further improve the interpretability of DNNs, we propose an adversarial training scheme with a consistent loss such that the neurons are endowed with human-interpretable concepts. The induced interpretable representations enable us to trace eventual outcomes back to influential neurons. Therefore, human users can know how the models make predictions, as well as when and why they make errors.

Optimizing Revenue over Data-driven Assortments

We revisit the problem of assortment optimization under the multinomial logit choice model with general constraints and propose new efficient optimization algorithms. Our algorithms do not make any assumptions on the structure of the feasible sets and in turn do not require a compact representation of constraints describing them. For the case of cardinality constraints, we specialize our algorithms and achieve time complexity sub-quadratic in the number of products in the assortment (existing methods are quadratic or worse). Empirical validations using the billion prices dataset and several retail transaction datasets show that our algorithms are competitive even when the number of items is 10^5 and beyond (100x larger instances than previously studied), supporting their practicality in data driven revenue management applications.

Practical Network Blocks Design with Q-Learning

Convolutional neural network provides an end-to-end solution to train many computer vision tasks and has gained great successes. However, the design of network architectures usually relies heavily on expert knowledge and is hand-crafted. In this paper, we provide a solution to automatically and efficiently design high performance network architectures. To reduce the search space of network design, we focus on constructing network blocks, which can be stacked to generate the whole network. Blocks are generated through an agent, which is trained with Q-learning to maximize the expected accuracy of the searching blocks on the learning task. Distributed asynchronous framework and early stop strategy are used to accelerate the training process. Our experimental results demonstrate that the network architectures designed by our approach perform competitively compared with hand-crafted state-of-the-art networks. We trained the Q-learning on CIFAR-100, and evaluated on CIFAR10 and ImageNet, the designed block structure achieved 3.60% error on CIFAR-10 and competitive result on ImageNet. The Q-learning process can be efficiently trained only on 32 GPUs in 3 days.

Induction of Decision Trees based on Generalized Graph Queries

Usually, decision tree induction algorithms are limited to work with non relational data. Given a record, they do not take into account other objects attributes even though they can provide valuable information for the learning task. In this paper we present GGQ-ID3, a multi-relational decision tree learning algorithm that uses Generalized Graph Queries (GGQ) as predicates in the decision nodes. GGQs allow to express complex patterns (including cycles) and they can be refined step-by-step. Also, they can evaluate structures (not only single records) and perform Regular Pattern Matching. GGQ are built dynamically (pattern mining) during the GGQ-ID3 tree construction process. We will show how to use GGQ-ID3 to perform multi-relational machine learning keeping complexity under control. Finally, some real examples of automatically obtained classification trees and semantic patterns are shown. —– Normalmente, los algoritmos de inducci\’on de \’arboles de decisi\’on trabajan con datos no relacionales. Dado un registro, no tienen en cuenta los atributos de otros objetos a pesar de que \’estos pueden proporcionar informaci\’on \’util para la tarea de aprendizaje. En este art\’iculo presentamos GGQ-ID3, un algoritmo de aprendizaje de \’arboles de decisiones multi-relacional que utiliza Generalized Graph Queries (GGQ) como predicados en los nodos de decisi\’on. Los GGQs permiten expresar patrones complejos (incluyendo ciclos) y pueden ser refinados paso a paso. Adem\’as, pueden evaluar estructuras (no solo registros) y llevar a cabo Regular Pattern Matching. En GGQ-ID3, los GGQ son construidos din\’amicamente (pattern mining) durante el proceso de construcci\’on del \’arbol. Adem\’as, se muestran algunos ejemplos reales de \’arboles de clasificaci\’on multi-relacionales y patrones sem\’anticos obtenidos autom\’aticamente.

Community detection in networks via nonlinear modularity eigenvectors

Revealing a community structure in a network or dataset is a central problem arising in many scientific areas. The modularity function Q is an established measure quantifying the quality of a community, being identified as a set of nodes having high modularity. In our terminology, a set of nodes with positive modularity is called a \textit{module} and a set that maximizes Q is thus called \textit{leading module}. Finding a leading module in a network is an important task, however the dimension of real-world problems makes the maximization of Q unfeasible. This poses the need of approximation techniques which are typically based on a linear relaxation of Q, induced by the spectrum of the modularity matrix M. In this work we propose a nonlinear relaxation which is instead based on the spectrum of a nonlinear modularity operator \mathcal M. We show that extremal eigenvalues of \mathcal M provide an exact relaxation of the modularity measure Q, however at the price of being more challenging to be computed than those of M. Thus we extend the work made on nonlinear Laplacians, by proposing a computational scheme, named \textit{generalized RatioDCA}, to address such extremal eigenvalues. We show monotonic ascent and convergence of the method. We finally apply the new method to several synthetic and real-world data sets, showing both effectiveness of the model and performance of the method.

Agree to Disagree: Improving Disagreement Detection with Dual GRUs

This paper presents models for detecting agreement/disagreement in online discussions. In this work we show that by using a Siamese inspired architecture to encode the discussions, we no longer need to rely on hand-crafted features to exploit the meta thread structure. We evaluate our model on existing online discussion corpora – ABCD, IAC and AWTP. Experimental results on ABCD dataset show that by fusing lexical and word embedding features, our model achieves the state of the art performance of 0.804 average F1 score. We also show that the model trained on ABCD dataset performs competitively on relatively smaller annotated datasets (IAC and AWTP).

Online Service with Delay

In this paper, we introduce the online service with delay problem. In this problem, there are n points in a metric space that issue service requests over time, and a server that serves these requests. The goal is to minimize the sum of distance traveled by the server and the total delay in serving the requests. This problem models the fundamental tradeoff between batching requests to improve locality and reducing delay to improve response time, that has many applications in operations management, operating systems, logistics, supply chain management, and scheduling. Our main result is to show a poly-logarithmic competitive ratio for the online service with delay problem. This result is obtained by an algorithm that we call the preemptive service algorithm. The salient feature of this algorithm is a process called preemptive service, which uses a novel combination of (recursive) time forwarding and spatial exploration on a metric space. We hope this technique will be useful for related problems such as reordering buffer management, online TSP, vehicle routing, etc. We also generalize our results to k > 1 servers.

Sufficient conditions for graphs to be $k$-connected, maximally connected and super-connected
Deformable Modeling for Human Body Acquired from Depth Sensors
Secrecy Energy Efficiency of MIMOME Wiretap Channels with Full-Duplex Jamming
Pathway for Multivariate Dependence Modeling in Long-Term Horizon of Electrical Power System
Massive BLAST: An Architecture for Realizing Ultra-High Data Rates for Large-Scale MIMO
An Equal Space for Complex Data with Unknown Internal Order: Observability, Gauge Invariance and Manifold Learning
The $6\times 6$ grid is $4$-path-pairable
Escaping from a quadrant of the $6\times 6$ grid by edge disjoint paths
Escaping from the corner of a grid by edge disjoint paths
Nowhere Dense Graph Classes and Dimension
A Heuristic Approach to Protocol Tuning for High Performance Data Transfers
Phase Noise Compensation for OFDM Systems
Bayes and maximum likelihood for $L^1$-Wasserstein deconvolution of Laplace mixtures
Robust Contextual Bandit via the Capped-$\ell_{2}$ norm
An Annotated Corpus of Relational Strategies in Customer Service
Innovation rather than improvement: a solvable high-dimensional model highlights the limitations of scalar fitness
On subgraphs of $C_{2k}$-free graphs and a problem of Kühn and Osthus
Broadcasts in Graphs: Diametrical Trees
Connected k-Dominating Graphs
Simultaneous Detection and Quantification of Retinal Fluid with Deep Learning
Eigen Evolution Pooling for Human Action Recognition
Information-Theoretic Privacy with General Distortion Constraints
Consistency of Dirichlet Partitions
Dilated Residual Network for Image Denoising
The Storage vs Repair Bandwidth Trade-off for Multiple Failures in Clustered Storage Networks
Generalized Hamming weights of a class of linear codes
The Linear Complexity of a Class of Binary Sequences With Optimal Autocorrelation
A Question Answering Approach to Emotion Cause Extraction
A debiased distributed estimation for sparse partially linear models in diverging dimensions
A Characterization of $(4,2)$-Choosable Graphs
Standard Bases for Linear Codes over Prime Fields
Non-Asymptotic Converse Bounds and Refined Asymptotics for Two Lossy Source Coding Problems
Inference for high-dimensional nested regression
Modeling Between-Study Heterogeneity for Improved Reproducibility in Gene Signature Selection and Clinical Prediction
Towards the Automatic Anime Characters Creation with Generative Adversarial Networks
Large Margin Learning in Set to Set Similarity Comparison for Person Re-identification
Reflectance Intensity Assisted Automatic and Accurate Extrinsic Calibration of 3D LiDAR and Panoramic Camera Using a Printed Chessboard
Syllable-level Neural Language Model for Agglutinative Language
Numerical Algorithm for Optimal Control of Continuity Equations
EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets
Signed Countings of types B and D permutations and $t,q$-Euler Numbers
An Optimal Realization Algorithm for Bipartite Graphs with Degrees in Prescribed Intervals
EmoAtt at EmoInt-2017: Inner attention sentence embedding for Emotion Intensity
Exploring Directional Path-Consistency for Solving Constraint Networks
On the strong non-rigidity of certain tight Euclidean designs
Word Searching in Scene Image and Video Frame in Multi-Script Scenario using Dynamic Shape Coding
Assessing the Stylistic Properties of Neurally Generated Text in Authorship Attribution
Mesh-based 3D Textured Urban Mapping
Spotting Separator Points at Line Terminals in Compressed Document Images for Text-line Segmentation
Ultra-Sparse Non-Binary LDPC Codes for Probabilistic Amplitude Shaping
LADDER: A Human-Level Bidding Agent for Large-Scale Real-Time Online Auctions
Two provably consistent divide and conquer clustering algorithms for large networks
Massive MIMO Unlicensed for High-Performance Indoor Networks
The episode of Quasi-Biennial Oscillation in 2016 did not have its roots in its distant past; the role of the strong El Niño event
On Gaussian Limits and Large Deviations for Queues Fed by High Intensity Randomly Scattered Traffic
Weighted Exponential Random graph models: Scope and large network limits
Future Word Contexts in Neural Network Language Models
Statistical Latent Space Approach for Mixed Data Modelling and Applications
Self-explanatory Deep Salient Object Detection
On the nxnxn Rubik’s Cube
Nonnegative Restricted Boltzmann Machines for Parts-based Representations Discovery and Predictive Model Stabilization
Accelerating recurrent neural network training using sequence bucketing and multi-GPU data parallelization
Far-from-Equilibrium Time Evolution between two Gamma Distributions
Optimization-Based Decoding Algorithms for LDPC Convolutional Codes in Communication Systems
Improved Rectangular Matrix Multiplication using Powers of the Coppersmith-Winograd Tensor
Multi-Symbol Forbidden Configurations
Winqi: A System for 6D Localization and SLAM Augmentation Using Wideangle Optics and Coded Light Beacons
Block sizes in two families of random permutations
3D Pose Regression using Convolutional Neural Networks
Learning to Transfer
An accretive operator approach to ergodic zero-sum stochastic games
On the number of segregating sites
What does a convolutional neural network recognize in the moon?
The Gardner transition in physical dimensions
Topology of tropical moduli of weighted stable curves
Quasi-Linear (Stochastic) Partial Differential Equations with Time-Fractional Derivatives
Symmetric Rank Covariances: a Generalised Framework for Nonparametric Measures of Dependence
Multi-objective Contextual Multi-armed Bandit Problem with a Dominant Objective
Robust Near-Field 3D Localization of an Unaligned Single-Coil Agent Using Unobtrusive Anchors
In Search of Lost (Mixing) Time: Adaptive Markov chain Monte Carlo schemes for Bayesian variable selection with very large p
Asymptotic Analysis of Plausible Tree Hash Modes for SHA-3
On privacy amplification, lossy compression, and their duality to channel coding
Asymptotic chip-firing groups of iterated cones
Sparse Digital Cancellation of Receiver Nonlinear Distortion in Carrier Aggregation Systems