Text Normalization with Spark – Part 2

This is the second post in a two-part series on Text Normalization using Spark. In this blog post, we explain the jargon of Apache Spark (jobs, stages, and executors) by walking through the Text Normalization application in the Spark history server UI. To get a better understanding of the use case, please refer to our Text Normalization with Spark – Part 1 blog post.


Displayr is a data science, visualization, and reporting platform for everyone.

New R course: Beginning Bayes in R

There are two schools of thought in the world of statistics: the frequentist perspective and the Bayesian perspective. At the core of the Bayesian perspective is the idea of representing your beliefs about something using the language of probability, collecting some data, then updating your beliefs based on the evidence contained in the data. This provides a convenient way of implementing the scientific method for learning about the world we live in. Bayesian statistics is increasingly popular due to recent improvements in computation, the ability to fit a wide range of models, and the intuitive interpretations of the results it produces.
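The belief-updating idea described above can be sketched in a few lines of Python. This is only an illustration, not material from the course: the coin-flip scenario, the uniform Beta(1, 1) prior, the observed counts, and the helper names `update_beta` and `beta_mean` are all assumptions chosen to keep the arithmetic easy to follow.

```python
from fractions import Fraction


def update_beta(alpha, beta, successes, failures):
    """Conjugate Beta-Binomial update: return the posterior (alpha, beta)."""
    # With a Beta prior and binomial data, updating just adds the counts.
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)


# Hypothetical prior belief about a coin's probability of heads:
# Beta(1, 1), i.e. every value between 0 and 1 is equally plausible.
prior = (1, 1)

# Hypothetical data: 7 heads and 3 tails in 10 flips.
posterior = update_beta(*prior, successes=7, failures=3)

print(beta_mean(*prior))      # 1/2
print(beta_mean(*posterior))  # 2/3
```

The expected probability of heads moves from 1/2 under the prior to 2/3 after seeing the data, which is the "update your beliefs based on the evidence" step in miniature.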

The Periodic Table of Data Science

This periodic table can serve as a guide to navigate the key players in the data science space. The resources in the table were chosen by looking at surveys taken from data science users, such as the 2016 Data Science Salary Survey by O’Reilly, the 2017 Magic Quadrant for Data Science Platforms by Gartner, and the KDnuggets 2016 Software Poll results, among other sources. The categories in the table are not all mutually exclusive.

Open sourcing Sonnet – a new library for constructing neural networks

It’s now nearly a year since DeepMind made the decision to switch the entire research organisation to using TensorFlow (TF). It’s proven to be a good choice – many of our models learn significantly faster, and the built-in features for distributed training have hugely simplified our code. Along the way, we found that the flexibility and adaptiveness of TF lend themselves to building higher level frameworks for specific purposes, and we’ve written one for quickly building neural network modules with TF. We are actively developing this codebase, but what we have so far fits our research needs well, and we’re excited to announce that today we are open sourcing it. We call this framework Sonnet.

Shrinking and accelerating deep neural networks

Deep neural networks have proven powerful for a variety of applications, but their sheer size places sobering constraints on speed, memory, and power consumption. These limitations become particularly important given the rise of mobile devices and their limited hardware resources. In this talk, Song Han shows how compression techniques can alleviate these challenges by greatly reducing the size of deep neural nets. He also demonstrates an energy-efficient engine that performs inference to greatly accelerate computation, making deep learning more practical as it spills from university campus to production.