Abusing the Power of Retail Analytics
Real-time analytics can even help brick-and-mortar stores optimize the number of workers on the sales floor via ‘on-call shifts.’ Workers call, text or e-mail just before their shifts to find out whether or not they’re actually going to work. The downside? When they don’t work, they don’t get paid.

Hash Table Performance in R: Part IV
In the last post I introduced the envestigate package, which exposes the hash table structure and interesting statistics associated with an R environment. Now I want to show you some performance characteristics of the R environment as a hash table. I’ll be using a synthetic list of distinct words that mimics real-world data scraped from Wikipedia, which you can scope out here.
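The post benchmarks environments via envestigate; without assuming anything about that package’s API, here is a minimal base-R sketch of using an environment as a hash table (the short word vector is just a stand-in for the scraped word list):

```r
# Base-R sketch: an environment used as a hash table (keys are strings).
words <- c("apple", "banana", "cherry")            # stand-in for the real word list

tbl <- new.env(hash = TRUE, size = 1024L)          # hashed environment

for (w in words) assign(w, nchar(w), envir = tbl)  # insert key/value pairs

get("banana", envir = tbl)                         # lookup: 6
exists("durian", envir = tbl, inherits = FALSE)    # membership test: FALSE
ls(tbl)                                            # all keys currently stored
```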

Randomized Experimentation
One good thing about doing machine learning at present is that people actually use it! The back-ends of many systems we interact with on a daily basis are driven by machine learning. In most such systems, as users interact with the system, it is natural for the system designer to wish to optimize the models under the hood over time, in a way that improves the user experience. To ground the discussion a bit, let us consider the example of an online portal that is trying to present interesting news stories to its users. A user comes to the portal and, based on whatever information the portal has on the user, it recommends one (or more) news stories. The user chooses to read the story or not and life goes on. Naturally, the portal wants to better tailor the stories it displays to the users’ tastes over time, an improvement it can observe if users start to click on the displayed stories more often.
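The post goes on to argue for randomized exploration; purely as an illustration of the explore/exploit idea (this is not the portal’s actual system, and the click-through rates are invented), here is a tiny epsilon-greedy simulation in R:

```r
# Toy epsilon-greedy exploration over a few news stories.
# The true click-through rates below are made up for illustration only.
set.seed(1)
true_ctr <- c(story_A = 0.05, story_B = 0.10, story_C = 0.02)
eps      <- 0.1                                   # fraction of traffic spent exploring
clicks   <- shows <- setNames(numeric(3), names(true_ctr))

for (user in seq_len(10000)) {
  if (runif(1) < eps) {
    pick <- sample(names(true_ctr), 1)            # explore: show a random story
  } else {
    est  <- ifelse(shows > 0, clicks / shows, 0)
    pick <- names(which.max(est))                 # exploit: best estimate so far
  }
  shows[pick]  <- shows[pick] + 1
  clicks[pick] <- clicks[pick] + rbinom(1, 1, true_ctr[pick])
}

shows                       # story_B should end up with most of the traffic
round(clicks / shows, 3)    # estimated click-through rates
```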

Microsoft Hiring Engineers for R Projects
Are you a talented software engineer who would like to build out the R ecosystem and help more companies access the power of R? Microsoft (Revolution Analytics’ parent) is hiring a new team to do just that: Our mission is to empower enterprises to easily and cost-effectively build high-scale analytics solutions leveraging R.

A Big Article About Wee Things
These are not always explicitly about conveying information, but they’re tiny moments where we can make an emotional connection with the user. Delight is something we may sometimes overlook or not bother spending time on, but it can really make a difference. If this tour of wee things has taught us anything, it’s that it’s worth trying to make those connections, worth spending a little more time and effort on the details – even if they are very small.

Remote Data Science
Remote working and Data Science. Together.

scikit-learn video #3: Machine learning first steps with the Iris dataset
Welcome back to my new video series on machine learning with scikit-learn. Last week, we discussed the pros and cons of scikit-learn, showed how to install scikit-learn independently or as part of the Anaconda distribution of Python, walked through the IPython Notebook interface, and covered a few resources for learning Python if you don’t already know the language. This week, we’re going to take our first steps in scikit-learn by loading and exploring the famous Iris dataset!

A simple explanation of rejection sampling in R
The central quantity in Bayesian inference, the posterior, usually cannot be calculated analytically, but needs to be estimated by numerical integration, which is typically done with a Monte Carlo algorithm. The three main algorithm classes for doing so are listed below; a short R sketch of the first follows the list.
• Rejection sampling
• Markov-Chain Monte Carlo (MCMC) sampling
• Sequential Monte Carlo (SMC) sampling
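As a minimal illustration of the first class (the target here is an invented, unnormalized Beta(3, 4) kernel with a uniform proposal, not anything from the original post), rejection sampling in R can look like this:

```r
# Minimal rejection sampler: draw from an unnormalized target f(x) on [0, 1]
# using a Uniform(0, 1) proposal and an envelope constant M >= max f(x).
set.seed(42)
f <- function(x) x^2 * (1 - x)^3                       # unnormalized Beta(3, 4) kernel
M <- optimize(f, c(0, 1), maximum = TRUE)$objective    # envelope height

n_draws  <- 10000
proposal <- runif(n_draws)                             # candidate points
keep     <- runif(n_draws) < f(proposal) / M           # accept with probability f(x) / M
samples  <- proposal[keep]

mean(keep)                                             # acceptance rate
hist(samples, freq = FALSE, breaks = 40)
curve(dbeta(x, 3, 4), add = TRUE, col = "red")         # true normalized density
```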

Time Series Graphs & Eleven Stunning Ways You Can Use Them
Many graphs use a time series, meaning they measure events over time. William Playfair (1759–1823) was a Scottish economist and pioneer of this approach. Playfair invented the line graph. The graph below, one of his most famous, depicts how in the 1750s the Brits started exporting more than they were importing.

Conjoint Analysis and the Strange World of All Possible Feature Combinations
The choice modeler looks over the adjacent display of cheeses and sees the joint marginal effects of the dimensions spanning the feature space: milk source, type, origin, moisture content, added mold or bacteria, aging, salting, packaging, price, and much more. Literally, if products are feature bundles, then one needs to specify all the sources of variation generating so many different cheeses. Here are the cheeses from goats, sheep and cows. Some are local, and some are imported from different countries. In addition, we will require columns separating the hard and soft cheeses. The feature list can become quite long. In the end, one accounts for all the different cheeses with a feature taxonomy consisting of a large multidimensional space of all possible feature combinations. Every cheese falls into a single cell in the joint distribution, and the empty cells represent new product possibilities (unless the feature configuration is impossible).
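As a toy illustration of that joint feature space (the feature list here is invented and far shorter than any real cheese taxonomy), expand.grid() in R enumerates every cell, and the unoccupied cells are the candidate new products:

```r
# Invented, much-reduced cheese feature taxonomy; expand.grid() enumerates
# every cell of the joint distribution of feature combinations.
features <- list(
  milk    = c("cow", "goat", "sheep"),
  texture = c("soft", "hard"),
  origin  = c("local", "imported"),
  aging   = c("fresh", "aged")
)
cells <- expand.grid(features, stringsAsFactors = FALSE)
nrow(cells)                     # 3 * 2 * 2 * 2 = 24 possible feature bundles

# Mark the cells occupied by observed products; the empty cells are potential
# new cheeses (unless the feature configuration is impossible).
row_key  <- function(d) do.call(paste, c(d, sep = "|"))
observed <- data.frame(milk = "goat", texture = "soft",
                       origin = "local", aging = "fresh")
cells$occupied <- row_key(cells) %in% row_key(observed)
sum(!cells$occupied)            # 23 empty cells in this toy example
```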

R Diagram: A plot of co-authorships in my little corner of science
Here’s a mostly useless visualization of the collection of journal articles that sits in my reference database in Endnote. I deal mostly in marine biology, physiology, biomechanics, and climate change papers, with a few molecular/genetics papers thrown in here and there. The database has 3325 entries, 2 of which have ambiguous publication years and aren’t represented above. This is by no means an exhaustive survey of the literature in my field; it’s just an exhaustive survey of the literature on my computer.
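The excerpt doesn’t show the code; as a rough sketch of one way such a plot could be built (the data frame and author names below are hypothetical, not the author’s), co-author pairs can be turned into an igraph network:

```r
# Rough sketch: build a co-authorship graph from a hypothetical data frame
# with one semicolon-separated author field per paper (e.g. an Endnote export).
library(igraph)

refs <- data.frame(
  authors = c("Smith, J.; Jones, A.",
              "Smith, J.; Lee, K.; Jones, A.",
              "Lee, K.; Chen, M."),
  stringsAsFactors = FALSE
)

# every unordered pair of co-authors on a paper becomes one edge
pairs <- do.call(rbind, lapply(strsplit(refs$authors, ";\\s*"), function(a) {
  if (length(a) < 2) return(NULL)
  t(combn(a, 2))
}))

g <- graph_from_edgelist(pairs, directed = FALSE)
g <- simplify(g)                                 # collapse repeated collaborations
plot(g, vertex.size = 5, vertex.label.cex = 0.7)
```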

Parallel Simulation of Heckman Selection Model
The Heckman Model (aka Heckman correction/Heckit) treats this correlation between the errors as a form of mis-specification and omitted variable bias. If we could observe the behavioral factors that feed into the error terms, this would not be a problem. Alas, that is generally not possible.
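As a hedged sketch of why that omitted variable matters (the parameters and equations below are made up, not the post’s code), simulating correlated selection and outcome errors shows how a naive OLS fit on the observed cases drifts away from the true coefficient; parallel::mclapply stands in for the ‘parallel simulation’ part:

```r
# Made-up selection-model simulation: correlated errors bias naive OLS on the
# selected subsample. A Heckman correction (e.g. sampleSelection::selection)
# is the usual remedy; only the naive fit is shown here.
library(MASS)       # mvrnorm for correlated errors
library(parallel)   # mclapply (use mc.cores = 1 on Windows)

one_sim <- function(n = 5000, rho = 0.7, beta = 1) {
  err <- mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2))
  x <- rnorm(n); z <- rnorm(n)
  selected <- (0.5 * x + 0.5 * z + err[, 1]) > 0   # selection equation
  y <- beta * x + err[, 2]                         # outcome equation
  coef(lm(y ~ x, subset = selected))["x"]          # naive OLS on observed cases
}

set.seed(123)
est <- unlist(mclapply(1:200, function(i) one_sim(), mc.cores = 2))
mean(est)   # noticeably below the true beta = 1 whenever rho > 0
```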

Supplementing your R package with a Shiny app
The R community is generally very fond of open-source-ness and the idea of releasing all code to the public. Writing packages has become an easy experience now that Hadley’s devtools is so powerful, and as a result new packages are being released by useRs every single day. A good package needs two things: useful functionality and clear usage instructions. The former is a no-brainer, while the latter is what developers usually dread the most – the D-word (Documentation. Yikes.). Proper documentation is essential so that others will know what your package can do and how to do it. And with the use of Shiny, we now have another great tool we can use to showcase a package’s capabilities.
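One common pattern for doing this (the package and file names below are hypothetical, and this is not necessarily the post’s exact approach) is to ship a small demo app inside the package and export a launcher function:

```r
# Sketch of a demo Shiny app bundled with a hypothetical package "mypkg".
# The app lives at inst/shiny/demo/app.R and is launched by an exported helper.

# R/run_demo.R in the package:
run_demo <- function() {
  app_dir <- system.file("shiny", "demo", package = "mypkg")
  if (app_dir == "") stop("Demo app not found; try reinstalling mypkg.")
  shiny::runApp(app_dir)
}

# inst/shiny/demo/app.R -- showcases one package function (a stand-in here):
library(shiny)

ui <- fluidPage(
  titlePanel("mypkg demo"),
  sliderInput("n", "Sample size", min = 10, max = 1000, value = 100),
  plotOutput("hist")
)

server <- function(input, output, session) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Replace with a call to your package")
  })
}

shinyApp(ui, server)
```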