Time Series Analysis in R Part 1: The Time Series Object

Many of the methods used in time series analysis and forecasting have been around for quite some time but have taken a back seat to machine learning techniques in recent years. Nevertheless, time series analysis and forecasting are useful tools in any data scientist’s toolkit. Time series competitions have recently appeared on Kaggle, such as one hosted by Wikipedia in which competitors are asked to forecast web traffic to various pages of the site. As an economist, I have been working with time series data for many years; however, I was largely unfamiliar with (and a bit overwhelmed by) R’s functions and packages for working with it. From the base ts object to a whole host of other packages such as xts, zoo, TTR, forecast, quantmod, and tidyquant, R has a large infrastructure supporting time series analysis. I decided to put together a guide for myself in R Markdown, and I plan on sharing it as I go in a series of blog posts. In part 1, I’ll discuss R’s fundamental time series object, ts.
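
To preview where part 1 is headed, here is a minimal sketch of creating a ts object in base R; the monthly values are invented for illustration, not data from the post.

# Base R's ts() needs only a data vector, a start point, and a frequency
# (frequency = 12 means monthly). The values here are made up.
sales <- ts(c(120, 135, 128, 142, 151, 160, 158, 149, 155, 162, 170, 175),
            start = c(2015, 1), frequency = 12)

print(sales)  # prints with month and year labels
plot(sales)   # base plot() understands the time index

Because the time index lives inside the object, functions such as window(), diff(), and stats::lag() can work in terms of dates rather than raw positions.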


Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks

Shot boundary detection (SBD) is an important component of video analysis, used in applications such as automatic highlight detection, action recognition, and assisting manual video editing. As such, it’s something our team at gifs.com cares deeply about.


The Ten Fallacies of Data Science

1. The data exists.
2. The data is accessible.
3. The data is consistent.
4. The data is relevant.
5. The data is intuitively understandable.
6. The data can be processed.
7. Analyses can be easily re-executed.
8. Where we’re going we don’t need encryption.
9. Analytics outputs are easily shared and understood.
10. The answer you’re looking for is there in the first place.


Introduction to Graphs and Networks

This week, we will start the first graduate course on graphs and networks. Slides are available online.


What You Need to Know Before Investing in Containers

In this special guest feature, Ash Wilson, Strategic Engineering Specialist at CloudPassage, has put together a list of tips and actions for those looking to implement container technology or who are already investing in it. Ash Wilson is originally from Apison, Tennessee, and has lived in San Francisco since 2012. He has been a paid tech worker since March 2000, and a hobbyist long before that. He came to security via network engineering and systems administration. Ash has spent the last five years in post-sales engineering and strategic engineering for security product companies and currently works for CloudPassage.


Big Data Architecture: A Complete and Detailed Overview

Data scientists may not be as well versed in computer science, programming concepts, DevOps, site reliability engineering, non-functional requirements, software solution infrastructure, or general software architecture as well-trained and experienced software architects and engineers.


Evaluating Data Science Projects: A Case Study Critique

I’ve written two blog posts on evaluation—the broccoli of machine learning. There are actually two closely related concerns under the rubric of evaluation:
• Model evaluation is typically taught to data scientists and concerns the technical quality of a model: How well does the model perform? Can we trust the numbers? Are they statistically significant?
• Project evaluation includes model evaluation, but also asks questions of the application context: Is the right problem being solved? Is the performance metric appropriate to the task? How are the data being provided and how is the model result used? Are the costs acceptable?
Both types are important not only to data scientists but also to managers and executives, who must evaluate project proposals and results. To managers I would say: It’s not necessary to understand the inner workings of a machine learning project, but you should understand whether the right things have been measured and whether the results are suited to the business problem. You need to know whether to believe what data scientists are telling you.
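
To make the first bullet concrete, here is a toy model-evaluation sketch in R; the labels are invented and are not from the case study.

# Compare invented predictions against invented ground truth.
actual    <- factor(c("yes", "no", "yes", "yes", "no", "no", "yes", "no"))
predicted <- factor(c("yes", "no", "no", "yes", "no", "yes", "yes", "no"))

table(predicted, actual)   # confusion matrix: where the model errs
mean(predicted == actual)  # accuracy: 0.75 on this toy data

Project evaluation then picks up where these numbers stop: whether 75% accuracy is acceptable for the business problem, and whether accuracy is even the right metric for it.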


How to make a line chart with ggplot2

Last week’s blog post about Amazon’s search for a location for a second headquarters left me thinking about the company’s growth. After looking at the long-term growth of the stock price, it occurred to me that the stock price data would make a great example for creating a line chart in R using ggplot2. So in this blog post, I’ll show you how to make a line chart with ggplot2, step by step.
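
As a preview of the recipe, here is a minimal ggplot2 line chart; simulated prices stand in for the Amazon stock data the post uses.

library(ggplot2)

# Simulated daily closing prices; the post works with real Amazon data.
set.seed(1)
stock <- data.frame(
  date  = seq(as.Date("2017-01-01"), by = "day", length.out = 250),
  price = 950 + cumsum(rnorm(250, mean = 0.5, sd = 5))
)

# The core of the recipe: map date to x, price to y, draw with geom_line().
ggplot(stock, aes(x = date, y = price)) +
  geom_line()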


Accessing patent data with the patentsview package

Patents play a critical role in incentivizing innovation, without which we wouldn’t have much of the technology we rely on every day.
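
As a flavor of what the package does, here is a minimal query sketch built on its search_pv() function; the date filter is an arbitrary illustration.

library(patentsview)

# Ask the PatentsView API for patents granted on or after 2007-01-03.
res <- search_pv(query = '{"_gte":{"patent_date":"2007-01-03"}}')

res$data$patents   # a data frame of matching patents
res$query_results  # total match counts reported by the API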


Automating roxygen2 package documentation

Thinking of creating a new package? Dread the task of function documentation? Afraid to run devtools::check(build_args = '--as-cran') and get bombarded by Errors, Warnings, and Notes (oh my!)? Wait… breathe!
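
For context, this is the kind of roxygen2 comment block whose creation the post automates; the add() function below is a hypothetical example.

# roxygen2 turns specially formatted #' comments into the .Rd help files
# that R CMD check demands; devtools::document() regenerates them.

#' Add two numbers
#'
#' @param x A numeric vector.
#' @param y A numeric vector.
#' @return The element-wise sum of x and y.
#' @export
add <- function(x, y) {
  x + y
}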


Sacred – including Tensorflow Integration

Sacred is a tool to configure, organize, log, and reproduce computational experiments. It is designed to introduce only minimal overhead while encouraging modularity and configurability of experiments. The ability to conveniently make experiments configurable is at the heart of Sacred. If the parameters of an experiment are exposed in this way, it will help you to:
• keep track of all the parameters of your experiment
• easily run your experiment for different settings
• save configurations for individual runs in files or a database
• reproduce your results

Sacred provides ways to interact with the TensorFlow library. The goal is an API for tracking certain information about how TensorFlow is being used within a Sacred experiment. The collected data are stored in experiment.info['tensorflow'], where they can be accessed by various observers.
