Using an R ‘template’ package to enhance reproducible research or the ‘R package syndrome’

Have you ever had the feeling that creating your data analysis reports amounts to looking up, copy-pasting and reusing code from previous analyses? This approach is time consuming and prone to errors. If you frequently analyze similar data or data types, e.g. from a standardized analysis workflow or from different experiments on the same platform, automating your report creation via an R ‘template’ package can be a very useful and time-saving step. It also lets you focus on the important part of the analysis, i.e. the experiment- or analysis-specific part. If you need to analyze tens or hundreds of runs of data in the same format, an R ‘template’ package can save you hours, days or even weeks. Along the way, reports can be adjusted, errors corrected and extensions added without much effort.
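
A minimal sketch of the idea, assuming a hypothetical template package called myreports that ships a parameterised R Markdown template under inst/templates/ and exports a single rendering function; none of these names come from the original post.

# Hypothetical wrapper exported by the 'myreports' template package.
# The template inst/templates/analysis.Rmd declares a 'data_file' parameter
# in its YAML header (params: data_file: NULL).
render_analysis_report <- function(data_file, output_file = "report.html") {
  template <- system.file("templates", "analysis.Rmd", package = "myreports")
  rmarkdown::render(
    template,
    params      = list(data_file = data_file),  # the experiment-specific part
    output_file = output_file,
    output_dir  = getwd()                       # keep the installed package untouched
  )
}

# For each new run, an analyst only supplies the run-specific input:
# render_analysis_report("run_042.csv", output_file = "run_042_report.html")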


CNN for Short-Term Stocks Prediction using Tensorflow

In machine learning, a convolutional neural network (CNN, or ConvNet) is a class of neural networks that has been applied successfully to image recognition and analysis. In this project I apply this class of models to stock market prediction, combining stock prices with sentiment analysis. The network is implemented in TensorFlow, starting from the online tutorial. In this article, I will describe the following steps: dataset creation, CNN training and evaluation of the model.


Using Data Analytics to Prevent, Not Just Report

I recently had another client conversation about optimizing their data warehouse and Business Intelligence (BI) environment. The client had lots of pride in their existing data warehouse and business intelligence accomplishments, and rightfully so. The heart of the conversation was about taking costs out of their reporting environments by consolidating runaway data marts and “spreadmarts,” and improving business analyst BI self-sufficiency. These types of conversations are good – saving money and improving effectiveness is always a good thing – but organizations need to be careful that they are not just “paving the cow path.” That is, are they just optimizing existing (old school) processes when new methodologies exist that can possibly eliminate those processes? Or as I challenged the customer: “Do you want to report, or do you want to prevent?”


State-of-the-art result for all Machine Learning Problems

This repository provides state-of-the-art (SoTA) results for all machine learning problems.


Estimating an Optimal Learning Rate For a Deep Neural Network

The learning rate is one of the most important hyper-parameters to tune when training deep neural networks. In this post, I describe a simple and powerful way to find a reasonable learning rate that I learned in the fast.ai Deep Learning course. I’m taking the new version of the course in person at the University of San Francisco. It’s not yet available to the general public, but it will be at the end of the year at course.fast.ai (which currently hosts last year’s version).


Timing in R

As time goes on, your R scripts are probably getting longer and more complicated, right? Timing parts of your script could save you precious time when re-running code over and over again. Today I’m going to go through the 4 main functions for doing so.
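
The four functions are not named in this excerpt, so the sketch below shows the usual base-R candidates (plus one add-on package) rather than the post’s exact list.

slow_task <- function() for (i in 1:1e6) sqrt(i)  # placeholder workload to time

# 1. Wall-clock timing by differencing Sys.time()
start <- Sys.time()
slow_task()
Sys.time() - start

# 2. system.time(): user/system/elapsed time of a single expression
system.time(slow_task())

# 3. proc.time(): cumulative CPU and elapsed time of the R session
p0 <- proc.time()
slow_task()
proc.time() - p0

# 4. Repeated, higher-precision measurements (add-on package, not base R)
# microbenchmark::microbenchmark(slow_task(), times = 10)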


Store Data About Rows

An introduction to the keyholder package: tools for keeping track of information about rows.
• It might be a good idea to extract some piece of a package’s functionality into a separate package, as this can lead to one more useful tool.
• The keyholder package offers functionality for keeping track of arbitrary data about rows after applying some user-defined function. This is done by creating a special attribute, “keys”, which is updated after every change in the rows (subsetting, ordering, etc.); see the sketch below.
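
A minimal sketch of that idea, assuming the keyholder functions key_by() and keys() together with dplyr verbs; consult the package vignette for the exact interface.

library(dplyr)
library(keyholder)

mtcars_keyed <- mtcars %>%
  key_by(mpg, cyl)        # attach these columns to the rows as "keys"

result <- mtcars_keyed %>%
  filter(gear == 4) %>%   # keys follow the rows through subsetting ...
  arrange(desc(hp))       # ... and through reordering

keys(result)              # tracked per-row data, still aligned with the rows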