Sharding Your Database

I’m increasingly seeing users on Heroku who need to shard their data. For most users, sharding is something to delay as long as possible, and you can generally go for some time before you have to worry about it. Scaling up your database is often a reasonable approach early on, and I encourage it as a starting point because scaling up a database is easy to do. However, when the time comes, many of the 1% of users who do need to shard are left wondering where to start, hence the following guide.


Five sharding data models and which is right

When it comes to scaling your database, there are challenges, but the good news is that you have options. The easiest option, of course, is to scale up your hardware. When you hit the ceiling on scaling up, you have a few more choices: sharding, deleting swaths of data that you think you might not need in the future, or trying to shrink the problem with microservices. Deleting portions of your data is simple, if you can afford to do it. For sharding, there are a number of approaches, and which one is right depends on several factors. Here we’ll survey five sharding approaches and dig into the factors that guide you toward each one.


Visualizing convolutional neural networks

In this tutorial, I’ll walk you through how to build a convolutional neural network from scratch using only low-level TensorFlow, and how to visualize the graph and the network’s performance with TensorBoard. If you aren’t familiar with the basics of a fully connected neural network, I highly recommend first checking out Not another MNIST tutorial with TensorFlow. Throughout this article, I will also break down each step of the convolutional neural network to its absolute basics so you can fully understand what is happening in each step of the graph. Building the model from scratch makes it easy to visualize different aspects of the graph, so you can inspect each convolutional layer and draw your own inferences. I will only highlight the major parts of the code, so if you would like to follow along step by step, you can check out the corresponding Jupyter Notebook on GitHub.
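To make "breaking a convolution down to its absolute basics" concrete, here is a minimal pure-Python sketch of one single-channel convolution step (stride 1, no padding) followed by a ReLU activation. It is an illustration only, not the tutorial's TensorFlow code, and the function names `conv2d_valid` and `relu` are hypothetical. As in TensorFlow's `tf.nn.conv2d`, the kernel is not flipped, so strictly speaking this computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding ("valid").

    Hypothetical helper for illustration. Like most deep learning
    libraries, it does not flip the kernel (i.e. it is a cross-correlation).
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # Element-wise product of the kernel and the image patch at (i, j), summed
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        out.append(row)
    return out


def relu(feature_map):
    """Hypothetical helper: zero out negative activations in a 2-D feature map."""
    return [[max(0, v) for v in row] for row in feature_map]


# A dark-to-bright vertical edge and a 2x2 vertical-edge kernel
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1],
          [-1, 1]]
print(relu(conv2d_valid(image, kernel)))  # [[0, 2, 0], [0, 2, 0]]
```

The feature map responds only in the column where the image changes from dark to bright, which is the kind of per-layer behavior the TensorBoard visualizations let you inspect.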


Scikit-Learn for Text Analysis of Amazon Fine Food Reviews

We know that Amazon Product Reviews Matter to Merchants because those reviews have a tremendous impact on how we make purchase decisions. So, I downloaded an Amazon fine food reviews data set from Kaggle, which originally came from SNAP, to see what I could learn from this large data set. Our aim here isn’t to achieve Scikit-Learn mastery, but to explore some of the main Scikit-Learn tools on a single CSV file by analyzing a collection of text documents: 568,454 food reviews up to and including October 2012. Let’s get started.


Create Chart Templates Using R Functions

R functions can be used to create chart templates, which keep the look and feel of reports consistent. This post gives a step-by-step guide to creating chart templates with R functions. R users should already be familiar with calling functions to run calculations and draw plots. However, if you are using a function to create a plot, you need a way to remember which fonts and colors to use each time. R functions can serve as a shortcut to creating standard-looking charts by storing your preferences for options like font and color. If you have never created a function in R, or have never used functions to automate the creation of charts, then this post is for you! It is written for beginners, so if you are already comfortable with these things, feel free to skip it.


Working with data frames in SQL Server R Services

Most R users are quite familiar with data frames: the data.frame is the fundamental object type for working with columnar data in R. For SQL Server users, though, the data frame is an important concept to understand, since it is the main object type R uses to store data from SQL tables. This guide to working with data frames in SQL Server R Services covers the basic concepts of R data frames and how to generate and manipulate them within a SQL Server procedure.


Probability functions advanced

In this set of exercises, we are going to explore some applications of probability functions and how to plot some density functions. We will use the MASS package in this set. Note: we are going to use random number functions and random process functions in R, such as runif. A problem with these functions is that every time you run them, you will obtain a different value. To make your results reproducible, you can specify the value of the seed using set.seed(‘any number’) before calling a random function. (If you are not familiar with seeds, think of them as the tracking number of your random number process.) For this set of exercises, we will use set.seed(1). Don’t forget to specify it before every exercise that includes random numbers.
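The exercises themselves are in R, but seeding works the same way in other languages. As an illustration only, here is a Python analog: `random.seed` stands in for `set.seed(1)` and `random.random()` for a single `runif` draw on [0, 1). The helper `draw_uniforms` is hypothetical. Python and R use different generators, so the numbers will not match R's output; the point is only that re-seeding with the same value reproduces the same sequence.

```python
import random


def draw_uniforms(seed, n=3):
    """Hypothetical helper: seed the global generator, then return n uniform draws.

    random.seed plays the role of R's set.seed; random.random() plays the
    role of runif(1). The underlying generators differ from R's.
    """
    random.seed(seed)
    return [random.random() for _ in range(n)]


first = draw_uniforms(1)
second = draw_uniforms(1)
# Same seed before each run, so the "random" results are reproducible
print(first == second)  # True
```

Forgetting to re-seed before the second call would give a different sequence, which is exactly why the exercises ask for set.seed(1) before each one that uses random numbers.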