Effective Management of High Volume Numeric Data with Histograms

How do you capture and organize billions of measurements per second such that you can answer a rich set of queries effectively (percentiles, counts below X, aggregations across streams) without blowing through your AWS budget in minutes? To manage billions of data points effectively, your system has to be both performant and scalable. How do you accomplish that? Not only do your algorithms have to be on point, but your implementation of them has to be efficient. You want to avoid allocating memory where possible, avoid copying data (pass pointers around instead), avoid locks, and avoid waits: lots of little optimizations that add up to running your code as close to the metal as possible. Your data structures also need to scale. They need to be as size-efficient as possible, which means using strongly typed languages with the optimum choice of data types. We've found that histograms are the most efficient data structure for storing the kinds of data we care about at scale.
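To make the idea concrete, here is a minimal sketch of a fixed-bucket histogram that supports the queries mentioned above: counts below a value, approximate percentiles, and merging across streams. It is written in Python purely for illustration (the article argues for strongly typed, low-allocation implementations), and the class name and bucket scheme are hypothetical, not taken from the article.

```python
import bisect

class BucketHistogram:
    """Hypothetical fixed-bucket histogram: one counter per bucket instead of
    raw samples, so memory use stays constant no matter how many measurements
    are recorded. All answers are bucket-resolution approximations."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)                # upper edge of each bucket
        self.counts = [0] * (len(self.bounds) + 1)  # final slot is overflow
        self.total = 0

    def record(self, value):
        # Index of the first bucket whose upper edge is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += 1

    def count_below(self, x):
        # Samples in buckets that lie entirely below x.
        return sum(self.counts[:bisect.bisect_left(self.bounds, x)])

    def percentile(self, p):
        # Upper edge of the bucket containing the p-th percentile.
        target = p / 100.0 * self.total
        running = 0
        for i, c in enumerate(self.counts):
            running += c
            if running >= target:
                return self.bounds[min(i, len(self.bounds) - 1)]

    def merge(self, other):
        # Aggregating two streams (with identical bucket bounds) is just
        # element-wise addition of counters.
        for i, c in enumerate(other.counts):
            self.counts[i] += c
        self.total += other.total
```

A production version would typically use log-linear bucket boundaries and a compact, preallocated counter array, in line with the memory and allocation concerns described above.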


Top 7 Data Science Use Cases in Finance

1. Automating risk management
2. Managing customer data
3. Predictive analytics
4. Real-time analytics – Fraud detection
5. Real-time analytics – Consumer analytics
6. Real-time analytics – Algorithmic trading
7. Deep personalization and customization


Machine learning: A quick and simple definition

Get a basic overview of machine learning and then go deeper with recommended resources.


Exploratory Factor Analysis in R

As apparent from the bfi survey example, factor analysis is helpful in classifying our current features into factors which represent hidden features not measured directly. It also has the additional advantage of reducing our data to a smaller set of features without losing much information. There are a few things to keep in mind before putting factor analysis into action. The first is about the values of factor loadings. We may have datasets where the factor loadings for all factors are low – lower than 0.5 or 0.3. A factor loading lower than 0.3 means that you are using too many factors and need to re-run the analysis with fewer factors, while loadings around 0.5 are satisfactory but indicate poor predictive ability. You should also set a threshold and discard factors whose loadings are below it for all features. Factor analysis on dynamic data can also be helpful in tracking changes in the nature of the data. If the data changes significantly, the number of factors in exploratory factor analysis will also change, prompting you to look into the data and check what has changed. The final point of importance is the interpretability of factors. If you are unable to understand or explain the factor loadings, you are using either a very granular or a very generalized set of factors. In this case, you need to find the right number of factors and obtain feature loadings that are both interpretable and useful for analysis. A variety of other situations can occur with factor analysis, and they are all subject to interpretation.
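As a rough illustration of the loading-threshold idea, here is a small sketch in Python (the article itself works in R). It assumes the factor_analyzer package and a hypothetical bfi_items.csv file of survey items, and simply drops features that fail to reach a chosen loading on every factor.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Hypothetical CSV of survey items (one column per item), e.g. bfi-style data.
items = pd.read_csv("bfi_items.csv").dropna()

# Fit an exploratory factor analysis with a chosen number of factors.
fa = FactorAnalyzer(n_factors=5, rotation="varimax")
fa.fit(items)

# Loadings matrix: rows are features, columns are factors.
loadings = pd.DataFrame(fa.loadings_, index=items.columns)

# Apply a threshold: keep only features that load at least 0.5 on some factor.
threshold = 0.5
kept = loadings[(loadings.abs() >= threshold).any(axis=1)]
print(kept.round(2))
```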


Format and Interpret Linear Mixed Models

Do you find it time-consuming to manually format, copy, and paste output values into your report or manuscript? That time is over: the psycho package is here for you!


PyTorch Tensor Basics

This is an introduction to PyTorch’s Tensor class, which is reasonably analogous to Numpy’s ndarray, and which forms the basis for building neural networks in PyTorch.
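For a flavor of that analogy, here is a short, self-contained example (values chosen arbitrarily, not taken from the article) showing how tensors mirror ndarray creation, attributes, and interoperability:

```python
import numpy as np
import torch

# Tensors can be built from Python lists or directly from NumPy arrays.
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.from_numpy(np.arange(6, dtype=np.float32).reshape(2, 3))

# Familiar ndarray-style attributes and operations.
print(a.shape, a.dtype)   # torch.Size([2, 2]) torch.float32
print(a @ a)              # matrix multiplication
print(b.t().shape)        # transpose -> torch.Size([3, 2])

# CPU tensors can be viewed back as NumPy arrays without copying.
print(a.numpy())
```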