Getting started with Julia – a high-level, high-performance language for Business Analytics
As part of this continuous learning, I keep an eye out for new developments in tools and techniques. It was this desire to keep learning that led me to Julia about a year ago. It was in very early stages then – it still is! But there is something special about Julia that makes it a compelling tool for every future data scientist to learn. So I decided to write a few articles on it. This is the first of those articles; it covers the motivation to learn Julia, its installation, the packages currently available, and ways to become part of the Julia community.

Intention analysis using topic models
Topic models are great for categorizing WHAT a text is about, and it is pretty easy as well: get an off-the-shelf LDA, train it on your corpus, and you are set to go. But there are even more insights you can extract from your texts. Modifying your corpus in a certain way (mostly removing everything but verb phrases) lets you gain a deeper understanding of WHY a certain text was written.
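As a rough illustration of the off-the-shelf step, here is a minimal sketch using the tm and topicmodels packages; the mini-corpus is made up for the example, and the verb-phrase filtering would happen before the document-term matrix is built (e.g. with a POS tagger), so it is only indicated in a comment.

library(tm)          # corpus handling
library(topicmodels) # off-the-shelf LDA

# Made-up mini-corpus; for intention analysis you would strip
# everything but verb phrases (e.g. with a POS tagger) at this point.
docs <- c("book a cheap flight to Paris",
          "compare and reserve hotel rooms",
          "cancel my existing reservation",
          "request a refund for the booking")

dtm <- DocumentTermMatrix(VCorpus(VectorSource(docs)))

# Train a two-topic LDA model and inspect the top terms per topic
fit <- LDA(dtm, k = 2, control = list(seed = 1))
terms(fit, 3)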

Emacs for Data Science
Data science nowadays demands a polyglot developer, and choosing the right code editor is definitely a worthy investment. Here we cover the important features of Emacs and its advantages over other editors.

Finding the essential R packages using the pagerank algorithm
In this post I illustrate (see the sketch after this list):
• Using the miniCRAN package to build a graph of package dependencies (see previous blog post)
• Using page.rank() to compute the most relevant packages
• Incidentally, I also make use of the %>% pipes exposed by the magrittr package (previous blog post).
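A minimal sketch of those three steps, run on a handful of seed packages rather than all of CRAN (note that makeDepGraph() reads the CRAN package list over the network, and that current igraph spells the ranking function page_rank()):

library(miniCRAN) # makeDepGraph()
library(igraph)   # page_rank()
library(magrittr) # %>% pipes

# Dependency graph for a few packages, returned as an igraph object;
# the post builds this graph for all of CRAN.
g <- makeDepGraph(c("ggplot2", "data.table", "plyr"),
                  repos = c(CRAN = "https://cloud.r-project.org"),
                  suggests = FALSE)

# Rank the packages by PageRank and show the most central ones
g %>%
  page_rank() %>%
  { sort(.$vector, decreasing = TRUE) } %>%
  head()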

The network structure of CRAN
‘Finding the essential R packages using the pagerank algorithm’ and ‘Finding clusters of CRAN packages using igraph’.

Line plots of longitudinal summary data in R using ggplot2
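A minimal sketch of the technique, assuming a hypothetical long-format data frame of repeated measures: stat_summary() lets ggplot2 compute the per-group summaries on the fly.

library(ggplot2)

# Hypothetical longitudinal data: one row per subject per visit
set.seed(42)
d <- expand.grid(subject = 1:20, visit = 1:5)
d$group <- ifelse(d$subject <= 10, "Treatment", "Control")
d$score <- 50 + 2 * d$visit * (d$group == "Treatment") + rnorm(nrow(d), sd = 5)

# Line plot of group means with standard-error bars at each visit
ggplot(d, aes(visit, score, colour = group)) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.1)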

Faceted ‘World Population by Income’ Choropleths in ggplot
Poynter did a nice interactive piece on world population by income (i.e. ‘How Many Live on How Much, and Where’). I’m always on the lookout for optimized shapefiles and clean data (I’m teaching a data science certificate program starting this Fall), and the speed of the site load and the easy availability of the data set made this one a ‘must acquire’. Rather than just repeat Poynter’s D3-goodness, here’s a way to look at the income data in a series of small-multiple choropleths, using R & ggplot2.
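As a rough sketch of the small-multiple approach: the polygons below come from the maps package rather than Poynter’s optimized shapefiles, and the income shares are placeholder random numbers, not the real data.

library(ggplot2)
library(maps) # world polygons via map_data()

world <- subset(map_data("world"), region != "Antarctica")

# Placeholder income data: one share per country and income bracket,
# so that each bracket becomes one facet
set.seed(1)
brackets <- c("Poor", "Low income", "Middle income", "High income")
income <- expand.grid(region = unique(world$region), bracket = brackets)
income$share <- runif(nrow(income))

choro <- merge(world, income, by = "region")
choro <- choro[order(choro$order), ] # keep polygon vertex order intact

ggplot(choro, aes(long, lat, group = group, fill = share)) +
  geom_polygon(colour = NA) +
  coord_quickmap() +
  facet_wrap(~ bracket) +
  theme_void()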

Data Scientist
library(NewCo.knowledge)
function (X, FUN, ...) {
  FUN <- Read the business wires +
    Go to lunch with a wide range of people +
    Read the 10-K and maybe the 10-Q +
    Find a go-to source for ‘stupid questions’
  else Ignorant
}

library(credibility)
function (X, FUN, ...) {
  FUN <- Double-check all assumptions +
    Underpromise +
    Save counterintuitive findings for last +
    Find a potential advocate and find a project to help with
  else Ignored
}