Pointer Networks
Pointer networks are a variation of the sequence-to-sequence model with attention. Instead of translating one sequence into another, they yield a succession of pointers to the elements of the input sequence. The most basic use of this is ordering the elements of a variable-length sequence. Basic seq2seq is an LSTM encoder coupled with an LSTM decoder. It is most often heard of in the context of machine translation: given a sentence in one language, the encoder turns it into a fixed-size representation, and the decoder transforms that representation back into a sentence, possibly of a different length than the source. For example, “¿cómo estás?” – two words – would be translated to “how are you?” – three words. The model gives better results when augmented with attention. In practice this means that instead of processing the input strictly from start to finish, the decoder can look back and forth over the input: it has access to the encoder states from every step, not just the last one. Consider how this may help with Spanish, in which adjectives go after nouns: “neural network” becomes “red neuronal”. In technical terms, attention (at least this particular kind, content-based attention) boils down to dot products and weighted averages. In short, a weighted average of the encoder states becomes the decoder state, and the attention itself is just the distribution of those weights. …
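To make the "dot products and weighted averages" concrete, here is a minimal NumPy sketch of content-based attention over encoder states; the function and variable names are illustrative and not taken from the post.

```python
import numpy as np

def content_based_attention(decoder_state, encoder_states):
    """Minimal dot-product attention sketch.

    decoder_state:  (d,)    current decoder hidden state (the query)
    encoder_states: (T, d)  one hidden state per input step
    Returns the attention weights (T,) and the context vector (d,).
    """
    # Similarity of the query to every encoder state: plain dot products.
    scores = encoder_states @ decoder_state          # (T,)
    # Softmax turns the scores into a distribution of weights.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The context is a weighted average of the encoder states.
    context = weights @ encoder_states               # (d,)
    return weights, context

# Toy usage: 5 input steps, hidden size 4.
rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 4))
dec = rng.normal(size=(4,))
w, ctx = content_based_attention(dec, enc)
print(w, ctx)
```

In a pointer network, the weight distribution itself is the output: at each decoding step the softmax over input positions is read off as a pointer, rather than being collapsed into a context vector.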

Structured Attention Networks
Attention networks have proven to be an effective approach for embedding categorical inference within a deep neural network. However, for many tasks we may want to model richer structural dependencies without abandoning end-to-end training. In this work, we experiment with incorporating richer structural distributions, encoded using graphical models, within deep networks. We show that these structured attention networks are simple extensions of the basic attention procedure, and that they allow for extending attention beyond the standard soft-selection approach, such as attending to partial segmentations or to subtrees. We experiment with two different classes of structured attention networks: a linear-chain conditional random field and a graph-based parsing model, and describe how these models can be practically implemented as neural network layers. Experiments show that this approach is effective for incorporating structural biases, and structured attention networks outperform baseline attention models on a variety of synthetic and real tasks: tree transduction, neural machine translation, question answering, and natural language inference. We further find that models trained in this way learn interesting unsupervised hidden representations that generalize simple attention. …
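As a rough sketch of the linear-chain case described above (not the authors' code; the shapes, potentials, and use of SciPy are assumptions), the usual softmax over positions can be replaced by the marginals p(z_t = 1) of a binary linear-chain CRF, computed with the forward-backward algorithm and used as attention weights:

```python
import numpy as np
from scipy.special import logsumexp

def crf_attention(unary, transition):
    """Structured attention sketch: marginals of a binary linear-chain CRF.

    unary:      (T, 2) log-potentials for z_t in {0, 1}
                (e.g. column 1 a query-key score, column 0 zero).
    transition: (2, 2) log-potentials for consecutive (z_{t-1}, z_t) pairs.
    Returns p(z_t = 1 | x) for every position t, to be used in place of
    softmax attention weights.
    """
    T = unary.shape[0]
    # Forward pass: alpha[t, j] = log-sum over prefixes ending in state j.
    alpha = np.zeros((T, 2))
    alpha[0] = unary[0]
    for t in range(1, T):
        alpha[t] = unary[t] + logsumexp(alpha[t - 1][:, None] + transition, axis=0)
    # Backward pass: beta[t, i] = log-sum over suffixes following state i at t.
    beta = np.zeros((T, 2))
    for t in range(T - 2, -1, -1):
        beta[t] = logsumexp(transition + unary[t + 1] + beta[t + 1], axis=1)
    log_z = logsumexp(alpha[-1])
    marginals = np.exp(alpha + beta - log_z)   # (T, 2), rows sum to 1
    return marginals[:, 1]

# Toy usage: 6 positions, transitions mildly encouraging runs of the same state.
rng = np.random.default_rng(0)
unary = np.stack([np.zeros(6), rng.normal(size=6)], axis=1)
transition = np.log(np.array([[0.7, 0.3], [0.3, 0.7]]))
print(crf_attention(unary, transition))
```

Because neighboring positions are coupled through the transition potentials, the resulting weights can favor contiguous runs of attended positions, which is what lets the model attend to partial segmentations rather than to single soft-selected elements; the whole computation is differentiable, so it can be trained end-to-end like ordinary attention.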

RelATive cEntrality (RATE)
The central aim in this paper is to address variable selection questions in nonlinear and nonparametric regression. Motivated within the context of statistical genetics, where nonlinear interactions are of particular interest, we introduce a novel and interpretable way to summarize the relative importance of predictor variables. Methodologically, we develop the ‘RelATive cEntrality’ (RATE) measure to prioritize candidate predictors that are not just marginally important, but whose associations also stem from significant covarying relationships with other variables in the data. We focus on illustrating RATE through Bayesian Gaussian process regression, although the methodological innovations apply to other and more general methods. It is known that nonlinear models often exhibit greater predictive accuracy than linear models, particularly for outcomes generated by complex architectures. With detailed simulations and a botanical QTL mapping study, we show that applying RATE enables an explanation for this improved performance. …
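The abstract does not spell out the formula, but the general idea can be sketched as follows. Assuming the fitted model yields an (approximately) multivariate normal posterior over effect-size analogues for the p predictors, a RATE-style score for predictor j measures how much conditioning its effect to zero perturbs the posterior of the remaining effects, normalized across predictors. Everything below (the names, the Gaussian assumption, and the exact KL construction) is an illustrative reconstruction, not the authors' implementation.

```python
import numpy as np

def rate_measure(mu, Sigma):
    """Rough sketch of a RATE-style relative centrality measure.

    Assumes the effect-size analogues for p predictors have an (approximately)
    multivariate normal posterior N(mu, Sigma). For each predictor j, measure
    how much fixing its effect to zero perturbs the joint posterior of the
    others (a KL divergence), then normalize so the scores sum to one.
    """
    p = len(mu)
    klds = np.zeros(p)
    for j in range(p):
        keep = np.delete(np.arange(p), j)
        mu_r, Sig_rr = mu[keep], Sigma[np.ix_(keep, keep)]
        sig_rj, sig_jj = Sigma[keep, j], Sigma[j, j]
        # Conditional p(beta_{-j} | beta_j = 0) of a Gaussian.
        mu_c = mu_r - sig_rj * (mu[j] / sig_jj)
        Sig_c = Sig_rr - np.outer(sig_rj, sig_rj) / sig_jj
        # Closed-form KL( conditional || marginal ) between two Gaussians.
        Sig_rr_inv = np.linalg.inv(Sig_rr)
        diff = mu_r - mu_c
        klds[j] = 0.5 * (np.trace(Sig_rr_inv @ Sig_c)
                         + diff @ Sig_rr_inv @ diff
                         - (p - 1)
                         + np.linalg.slogdet(Sig_rr)[1]
                         - np.linalg.slogdet(Sig_c)[1])
    return klds / klds.sum()

# Toy usage: 4 correlated predictors, one with a clearly nonzero effect.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + np.eye(4)
mu = np.array([2.0, 0.1, -0.1, 0.0])
print(rate_measure(mu, Sigma))  # scores sum to 1; compare against 1/4
```

Because the scores sum to one, predictors whose score sits well above the uniform value 1/p are the ones flagged as disproportionately central.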
