
**26**
*Monday*
Sep 2016

Posted in Distilled News

**An Introduction to Stock Market Data Analysis with Python (Part 1)**

This post is the first in a two-part series on stock data analysis using Python, based on a lecture I gave on the subject for MATH 3900 (Data Science) at the University of Utah. In these posts, I will discuss basics such as obtaining the data from Yahoo! Finance using pandas, visualizing stock data, moving averages, developing a moving-average crossover strategy, backtesting, and benchmarking. The final post will include practice problems. This first post discusses topics up to introducing moving averages.
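The moving-average step the post builds toward can be sketched with pandas alone. The synthetic price series below stands in for data fetched from Yahoo! Finance; the column name and window sizes are illustrative choices, not the post's exact code:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a closing-price series fetched from Yahoo! Finance.
dates = pd.date_range("2016-01-04", periods=60, freq="D")
close = pd.Series(100 + np.cumsum(np.random.default_rng(0).normal(size=60)),
                  index=dates, name="Close")

# Rolling means over 20- and 50-day windows; the first window-1 values are NaN.
ma20 = close.rolling(window=20).mean()
ma50 = close.rolling(window=50).mean()
```

A moving-average crossover strategy then compares the short window against the long one day by day, trading when they cross.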

**Interpreting and Visualizing Neural Networks for Text Processing**

Neural networks have become the go-to approach for text processing tasks, but they are also notorious for their obscurity. Recently, we applied a neural network to the task of predicting numerical ratings for text-based consumer reviews, training the model to learn the ratings directly from the words in each review. It turns out that our model had decent prediction accuracy, but this wasn’t our objective, especially since we already knew the ratings for these reviews. Instead, we wanted to understand why those particular ratings were assigned. Interpreting what a neural network has learned from data is a separate endeavor from applying it to make predictions. In this post, we’ll explore some strategies for bringing the inside of a neural network to light, using our ratings prediction model to demonstrate.

**LSTM for Human Activity Recognition**

I will be using an LSTM on the data (collected from a cellphone attached to the waist) to learn to recognise the type of activity that the user is doing.

**The Argument for Accelerated and Integrated Analytics**

The rise of modern business intelligence (BI) has seen the emergence of a number of component parts designed to support the different analytical functions necessary to deliver what enterprises require. Perhaps the most fundamental component of the BI movement is the traditional frontend or visualization application. Companies like Tableau, Qlik, Birst, Domo and Periscope provide these. There are dozens more – all with essentially equivalent capabilities: the ability to make spreadsheets look beautiful. Some of these companies have been tremendously successful, primarily differentiating themselves on the axis of usability. Another, equally critical component of the BI equation is the database. Here, too, there are the usual suspects: Redshift, Impala, Vertica, Netezza and others. Some of these databases are fully featured, system-of-record worthy solutions, while others focus on a particular performance axis, streaming, for instance, and do it well. Finally, there is the emergence, across BI and database players, of more advanced analytics tools, driven by the explosion of interest in and development of machine learning, deep learning and artificial intelligence. This market has its stars, starting with the big boys – Google, Facebook, Amazon, Microsoft, IBM, Baidu, Tesla – as well as a host of highly credentialed startups, like Sentient, Ayasdi, Bonsai and H2O.ai.

**26**
*Monday*
Sep 2016

Posted in arXiv Papers

**Input Convex Neural Networks**

This paper presents the input convex neural network architecture. These are scalar-valued (potentially deep) neural networks with constraints on the network parameters such that the output of the network is a convex function of (some of) the inputs. The networks allow for efficient inference via optimization over some inputs to the network given others, and can be applied to settings including structured prediction, data imputation, reinforcement learning, and others. In this paper we lay the basic groundwork for these models, proposing methods for inference, optimization and learning, and analyze their representational power. We show that many existing neural network architectures can be made input-convex with only minor modification, and develop specialized optimization algorithms tailored to this setting. Finally, we highlight the performance of the methods on multi-label prediction, image completion, and reinforcement learning problems, where we show improvement over the existing state of the art in many cases.
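A minimal numerical sketch of the convexity constraint described above: keeping the weights applied to previous-layer activations non-negative, and using convex non-decreasing activations (ReLU), makes the scalar output convex in the input. Layer sizes and names here are illustrative, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 3, 8
# "Passthrough" weights on the raw input x may be of any sign.
W_x = [rng.normal(size=(h, d)), rng.normal(size=(h, d)), rng.normal(size=(1, d))]
# Weights on previous activations must be non-negative to preserve convexity.
W_z = [np.abs(rng.normal(size=(h, h))), np.abs(rng.normal(size=(1, h)))]

def icnn(x):
    z = np.maximum(W_x[0] @ x, 0.0)               # first layer: no W_z term
    z = np.maximum(W_z[0] @ z + W_x[1] @ x, 0.0)  # non-neg combo of convex + linear
    return (W_z[1] @ z + W_x[2] @ x).item()       # scalar, convex in x

# Two random points for a numerical convexity check.
x1, x2 = rng.normal(size=d), rng.normal(size=d)
```

The midpoint inequality `icnn((x1+x2)/2) <= (icnn(x1)+icnn(x2))/2` holds for any pair of inputs, which is the defining property the architecture enforces.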

**Efficient Feature Selection With Large and High-dimensional Data**

Driven by the advances in technology, large and high-dimensional data have become the rule rather than the exception. Approaches that allow for feature selection with such data are thus highly sought after, in particular, since standard methods, like cross-validated Lasso, can be computationally intractable and, in any case, lack theoretical guarantees. In this paper, we propose a novel approach to feature selection in regression. Consisting of simple optimization steps and tests, it is computationally more efficient than existing methods and, therefore, suited even for very large data sets. Moreover, in contrast to standard methods, it is equipped with sharp statistical guarantees. We thus expect that our algorithm can help to leverage the increasing volume of data in Biology, Public Health, Astronomy, Economics, and other fields.

**Annotating Derivations: A New Evaluation Strategy and Dataset for Algebra Word Problems**

We propose a new evaluation for automatic solvers for algebra word problems, which can identify reasoning mistakes that existing evaluations overlook. Our proposal is to use derivations for evaluation, which reflect the reasoning process of the solver by explaining how the equation system was constructed. We accomplish this by developing an algorithm for checking the equivalence between two derivations, and showing how derivation annotations can be semi-automatically added to existing datasets. To make our experiments more comprehensive, we also annotated DRAW-1K, a new dataset of 1000 general algebra word problems. In total, our experiments span over 2300 algebra word problems. We found that the annotated derivations enable a better evaluation of automatic solvers than previously used metrics.

**A Topological Algorithm for Determining How Road Networks Evolve Over Time**

We provide an efficient algorithm for determining how a road network has evolved over time, given two snapshot instances from different dates. To allow for such determinations across different databases and even against hand-drawn maps, we take a strictly topological approach in this paper, so that we compare road networks based strictly on graph-theoretic properties. Given two road networks of the same region from two different dates, our approach allows one to match road network portions that remain intact and also to point out added or removed portions. We analyze our algorithm both theoretically, showing that it runs in polynomial time for non-degenerate road networks even though a related problem is NP-complete, and experimentally, using dated road networks from the TIGER/Line archive of the U.S. Census Bureau.

**Constraint-Based Clustering Selection**

Semi-supervised clustering methods incorporate a limited amount of supervision into the clustering process. Typically, this supervision is provided by the user in the form of pairwise constraints. Existing methods use such constraints in one of the following ways: they adapt their clustering procedure, their similarity metric, or both. All of these approaches operate within the scope of individual clustering algorithms. In contrast, we propose to use constraints to choose between clusterings generated by very different unsupervised clustering algorithms, run with different parameter settings. We empirically show that this simple approach often outperforms existing semi-supervised clustering methods.
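The selection idea can be sketched in a few lines: score each candidate clustering by how many pairwise constraints it satisfies and keep the best. The candidate labelings and constraints below are toy stand-ins for the outputs of real clustering runs:

```python
import numpy as np

def constraint_score(labels, must_link, cannot_link):
    # Count satisfied constraints: must-link pairs sharing a label,
    # cannot-link pairs with different labels.
    s = sum(labels[i] == labels[j] for i, j in must_link)
    s += sum(labels[i] != labels[j] for i, j in cannot_link)
    return s

# Hypothetical label vectors from different algorithms / parameter settings.
candidates = {
    "kmeans_k2": np.array([0, 0, 1, 1, 1]),
    "kmeans_k3": np.array([0, 1, 2, 2, 1]),
}
must_link, cannot_link = [(0, 1)], [(1, 4)]

best = max(candidates,
           key=lambda name: constraint_score(candidates[name], must_link, cannot_link))
```

The supervision is used only to *choose* among unsupervised results, never to alter any clustering procedure itself, which is what distinguishes this approach from metric- or procedure-adapting methods.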

**Language as a Latent Variable: Discrete Generative Models for Sentence Compression**

In this work we explore deep generative models of text in which the latent representation of a document is itself drawn from a discrete language model distribution. We formulate a variational auto-encoder for inference in this model and apply it to the task of compressing sentences. In this application the generative model first draws a latent summary sentence from a background language model, and then subsequently draws the observed sentence conditioned on this latent summary. In our empirical evaluation we show that generative formulations of both abstractive and extractive compression yield state-of-the-art results when trained on a large amount of supervised data. Further, we explore semi-supervised compression scenarios where we show that it is possible to achieve performance competitive with previously proposed supervised models while training on a fraction of the supervised data.

**Changepoint Detection in the Presence of Outliers**

Many traditional methods for identifying changepoints can struggle in the presence of outliers, or when the noise is heavy-tailed. Often they will infer additional changepoints in order to fit the outliers. To overcome this problem, data often needs to be pre-processed to remove outliers, though this is not feasible in applications where the data needs to be analysed online. We present an approach to changepoint detection that is robust to the presence of outliers. The idea is to adapt existing penalised cost approaches for detecting changes so that they use cost functions that are less sensitive to outliers. We argue that cost functions that are bounded, such as the classical biweight cost, are particularly suitable — as we show that only bounded cost functions are robust to arbitrarily extreme outliers. We present a novel and efficient dynamic programming algorithm that can then find the optimal segmentation under our penalised cost criteria. Importantly, this algorithm can be used in settings where the data needs to be analysed online. We present theoretical bounds on the worst-case complexity of this algorithm, and show empirically that its average computational cost is linear in the amount of the data. We show the usefulness of our approach for applications such as analysing well-log data, detecting copy number variation, and detecting tampering of wireless devices.
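A rough sketch of the penalised-cost idea with a bounded per-point cost: here squared error around the segment median, capped at K, which is a simplification of the biweight cost. The penalty beta, the cap K, and the quadratic-time dynamic program are illustrative, not the paper's optimised algorithm:

```python
import numpy as np

def seg_cost(x, K=9.0):
    # Bounded cost: each point contributes at most K, so one extreme
    # outlier cannot force an extra changepoint.
    r2 = (x - np.median(x)) ** 2
    return float(np.minimum(r2, K).sum())

def changepoints(x, beta=6.0):
    # Optimal partitioning: F[t] = min over s of F[s] + beta + cost(x[s:t]).
    n = len(x)
    F = np.zeros(n + 1); F[0] = -beta
    last = np.zeros(n + 1, dtype=int)
    for t in range(1, n + 1):
        costs = [F[s] + beta + seg_cost(x[s:t]) for s in range(t)]
        last[t] = int(np.argmin(costs)); F[t] = costs[last[t]]
    cps, t = [], n
    while t > 0:          # backtrack through the stored split points
        t = last[t]
        if t > 0: cps.append(t)
    return sorted(cps)

# One true change at index 20, plus one extreme outlier at index 5.
x = np.concatenate([np.zeros(20), 5 + np.zeros(20)])
x[5] = 100.0
```

With the bounded cost, the outlier at index 5 contributes at most K to its segment, so only the genuine changepoint at 20 is reported; an unbounded squared-error cost would be tempted to isolate the outlier in its own segment.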

**GIRAF: A Fast Algorithm for Structured Low-Rank Matrix Recovery**

Structured low-rank matrix priors are emerging as powerful alternatives to traditional image recovery methods such as total variation (TV) and wavelet regularization. The main challenge in applying these schemes to large-scale problems is the computational complexity and memory demand resulting from a lifting of the image to a large structured matrix. We introduce a fast and memory efficient algorithm that exploits the convolution structure of the structured matrix to work in the original non-lifted domain, thus considerably reducing the complexity. Our experiments on the recovery of images from undersampled Fourier measurements show that the resulting algorithm provides improved reconstructions over TV regularization with comparable computation time.

• Deep Learning in Multi-Layer Architectures of Dense Nuclei

• Robust Confidence Intervals in High-Dimensional Left-Censored Regression

• Chains conditions in algebraic lattices

• Computationally Efficient Markov Chain Monte Carlo Methods for Hierarchical Bayesian Inverse Problems

• Hydra: Leveraging Functional Slicing for Efficient Distributed SDN Controllers

• Review of multi-fidelity models

• Multilayer Spectral Graph Clustering via Convex Layer Aggregation

• Neighborhood growth dynamics on the Hamming plane

• Functional Laws for Trimmed Levy Processes

• Matching preclusion for $n$-grid graphs

• A Novel Progressive Multi-label Classifier for Classincremental Data

• Statistical Modeling for Spatio-Temporal Degradation Data

• Empty-car routing in ridesharing systems

• Deep Multi-Task Learning with Shared Memory

• Fully Bayesian Estimation and Variable Selection in Partially Linear Wavelet Models

• Hereditarily Non Uniformly Perfect Sets

• On the (im)possibility of fairness

• Novel stochastic properties of the short-time spectrum for unvoiced pronunciation modeling and synthesis

• Using Neural Network Formalism to Solve Multiple-Instance Problems

• Tight fluctuations of weight-distances in random graphs with infinite-variance degrees

• Percolation on networks with weak and heterogeneous dependency

• Random Popular Matchings with Incomplete Preference Lists

• A Computer Algebra Package for Polynomial Sequence Recognition

• Transport-entropy inequalities on locally acting groups of permutations

• Semiparametric clustered overdispersed multinomial goodness-of-fit of log-linear models

• Estimating Probability Distributions using ‘Dirac’ Kernels (via Rademacher-Walsh Polynomial Basis Functions)

• On rotated Schur-positive sets

• Implicit renewal theory in the arithmetic case

• Stabilizing on the distinguishing number of a graph

• Some results on the Signature and Cubature of the Fractional Brownian motion for $H>\frac{1}{2}$

• Renyi-Ulam Games and Forbidden Substrings

• Block-proximal methods with spatially adapted acceleration

• Multi-Output Artificial Neural Network for Storm Surge Prediction in North Carolina

• Discovering Sound Concepts and Acoustic Relations In Text

• A penalized likelihood method for classification with matrix-valued predictors

• A Few Photons Among Many: Unmixing Signal and Noise for Photon-Efficient Active Imaging

• One-vs-Each Approximation to Softmax for Scalable Estimation of Probabilities

• On the Non-Existence of Unbiased Estimators in Constrained Estimation Problems

• Watermark Options

• Delocalising the parabolic Anderson model through partial duplication of the potential

• An unbiased Monte Carlo estimator for derivatives. Application to CIR

• Regulating Reward Training by Means of Certainty Prediction in a Neural Network-Implemented Pong Game

• Gershgorin disks for multiple eigenvalues of non-negative matrices

• MPI Parallelization of the Resistive Wall Code STARWALL: Report of the EUROfusion High Level Support Team Project JORSTAR

• Quasi-Monte Carlo for an Integrand with a Singularity along a Diagonal in the Square

• Finding long simple paths in a weighted digraph using pseudo-topological orderings

• AMR-to-text generation as a Traveling Salesman Problem

• A Wald-type test statistic for testing linear hypothesis in logistic regression models based on minimum density power divergence estimator

• Predicting human-driving behavior to help driverless vehicles drive: random intercept Bayesian Additive Regression Trees

• Discreet Coin Weighings and the Sorting Strategy

• Conditionally Bi-Free Independence for Pairs of Algebras

• Asymptotic tensor rank of graph tensors: beyond matrix multiplication

• Screening Rules for Convex Problems

• Incorporating Relation Paths in Neural Relation Extraction

**26**
*Monday*
Sep 2016

Posted in Books

**26**
*Monday*
Sep 2016

Posted in R Packages

*Simple Memory Profiling for R*

A simple and light-weight API for memory profiling of R expressions. The profiling is built on top of R’s built-in memory profiler (‘utils::Rprofmem()’), which records every memory allocation done by R (also native code).

Takes nested function calls and converts them into a more readable form using pipes from package ‘magrittr’.

Multi-Objective optimization based on the Kriging metamodel. Important functions: mkm, VMPF, MEGO and HEGO.

An implementation of the network fingerprint framework. This method works by making systematic comparisons to a set of well-studied ‘basic networks’, measuring both functional and topological similarity. A biological network can be characterized as a spectrum-like vector consisting of similarities to basic networks. It shows great potential in biological network studies.

**26**
*Monday*
Sep 2016

Posted in Documents

**Fast unfolding of communities in large networks**

We propose a simple method to extract the community structure of large networks. Our method is a heuristic based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers, and then by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks.
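The quantity being optimised can be made concrete. The sketch below computes the modularity Q of a partition on a toy graph of two triangles joined by a single edge; the graph and the two partitions are illustrative:

```python
import numpy as np

def modularity(A, labels):
    # Q = (1/2m) * sum over same-community pairs of (A_ij - k_i*k_j / 2m).
    m = A.sum() / 2.0
    k = A.sum(axis=1)
    n = len(labels)
    Q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                Q += A[i, j] - k[i] * k[j] / (2 * m)
    return Q / (2 * m)

# Two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

good = modularity(A, [0, 0, 0, 1, 1, 1])  # the natural split
bad = modularity(A, [0, 1, 0, 1, 0, 1])   # an arbitrary interleaving
```

The natural two-triangle split scores higher, and the Louvain-style method greedily moves nodes between communities to increase exactly this quantity.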

**26**
*Monday*
Sep 2016

Posted in What is ...

**Feedforward Sequential Memory Networks (FSMN)**

We introduce a new structure for memory neural networks, called feedforward sequential memory networks (FSMN), which can learn long-term dependencies without using recurrent feedback. The proposed FSMN is a standard feedforward neural network equipped with learnable sequential memory blocks in the hidden layers. In this work, we have applied FSMN to several language modeling (LM) tasks. Experimental results show that the memory blocks in FSMN can learn effective representations of long history, and that FSMN-based language models significantly outperform not only feedforward neural network (FNN) based LMs but also the popular recurrent neural network (RNN) LMs.
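The memory block can be sketched as a tapped delay line over hidden activations: the representation at time t mixes the current and the previous N hidden vectors with learnable scalar coefficients, so no recurrent feedback is needed. Shapes and coefficients below are illustrative, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, N = 6, 4, 3
h = rng.normal(size=(T, H))    # hidden activations from a feedforward layer
a = rng.normal(size=(N + 1,))  # learnable memory coefficients a_0 .. a_N

def memory_block(h, a):
    # m[t] = sum_i a[i] * h[t - i], a learnable weighted sum over the
    # current and previous N hidden states (taps before t=0 are skipped).
    T, H = h.shape
    m = np.zeros_like(h)
    for t in range(T):
        for i, coef in enumerate(a):
            if t - i >= 0:
                m[t] += coef * h[t - i]
    return m

m = memory_block(h, a)
```

Because the block is just a fixed-order weighted sum, the whole model stays feedforward and can be trained without backpropagation through time.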

**26**
*Monday*
Sep 2016

Posted in Magister Dixit

“Now go, write it before them in a table, and note it in a book.” Isaiah, XXX 8

**25**
*Sunday*
Sep 2016

Posted in Books
