Cookiecutter Data Science – A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

When we think about data analysis, we often think just about the resulting reports, insights, or visualizations. While these end products are generally the main event, it’s easy to focus on making the products look nice and ignore the quality of the code that generates them. Because these end products are created programmatically, code quality is still important! And we’re not talking about bikeshedding the indentation aesthetics or pedantic formatting standards – ultimately, data science code quality is about correctness and reproducibility. It’s no secret that good analyses are often the result of very scattershot and serendipitous explorations. Tentative experiments and rapidly testing approaches that might not work out are all part of the process for getting to the good stuff, and there is no magic bullet to turn data exploration into a simple, linear progression. That being said, once exploration is underway, the process does not lend itself to thinking carefully about the structure of your code or project layout, so it’s best to start with a clean, logical structure and stick to it throughout. We think it’s a pretty big win all around to use a fairly standardized setup like this one.


Conversations Gone Awry: Detecting Early Signs of Conversational Failure

One of the main challenges online social systems face is the prevalence of antisocial behavior, such as harassment and personal attacks. In this work, we introduce the task of predicting from the very start of a conversation whether it will get out of hand. As opposed to detecting undesirable behavior after the fact, this task aims to enable early, actionable prediction at a time when the conversation might still be salvaged. To this end, we develop a framework for capturing pragmatic devices – such as politeness strategies and rhetorical prompts – used to start a conversation, and analyze their relation to its future trajectory. Applying this framework in a controlled setting, we demonstrate the feasibility of detecting early warning signs of antisocial behavior in online discussions.
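To make the task framing concrete, here is a deliberately tiny, hypothetical sketch – not the paper's actual framework or feature set – of how politeness-strategy counts from a conversation's first comment could feed an interpretable early-warning classifier:

```python
# Illustrative sketch only: a toy version of the task framing, not the
# paper's framework. Feature names and data are made up for this example.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-conversation features extracted from the FIRST comment
# only: counts of politeness strategies and rhetorical prompts.
# Columns: [gratitude, greeting, hedges, direct_question, second_person_start]
X = np.array([
    [1, 1, 2, 0, 0],   # polite, hedged opener
    [0, 0, 0, 2, 1],   # blunt, direct questions aimed at "you"
    [2, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
])
# Labels: 1 = conversation later derailed into a personal attack, 0 = stayed civil.
y = np.array([0, 1, 0, 1])

# A linear classifier keeps the early-warning signals interpretable:
# each coefficient says whether a pragmatic device raises or lowers risk.
clf = LogisticRegression().fit(X, y)
print(dict(zip(
    ["gratitude", "greeting", "hedges", "direct_question", "second_person_start"],
    clf.coef_[0].round(2),
)))
```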


Temporal Difference Learning in Python

Temporal-Difference Learning (or TD Learning) is one of the most important and novel ideas around. It's the first point where you can really see patterns emerging, with everything building on the previous chapters: DP + MC = TD. The pseudo-equation above is not scientific at all, but it gives an idea of what Temporal-Difference learning is: a combination of ideas from Monte Carlo methods and Dynamic Programming. Monte Carlo, because TD learns directly from experience, without a model of any kind; Dynamic Programming, because TD doesn't wait for episode completion – it bootstraps from its current value estimates. Those three approaches are the foundation of Reinforcement Learning, and they will appear in different combinations over and over from now on.
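To make the DP + MC = TD idea concrete, here is a minimal tabular TD(0) sketch on a toy random walk; the environment and hyperparameters are assumptions for illustration, not code from the post:

```python
# Minimal tabular TD(0) value estimation on a 1-D random walk
# (states 0..6, start at 3; right terminal pays 1, left pays 0).
# Environment and hyperparameters are illustrative assumptions.
import random

N = 7                     # states 0..6; 0 and 6 are terminal
alpha, gamma = 0.1, 1.0   # step size and discount
V = [0.5] * N
V[0] = V[6] = 0.0         # terminal state values stay fixed at 0

for _ in range(10_000):
    s = 3
    while s not in (0, 6):
        s_next = s + random.choice((-1, 1))
        r = 1.0 if s_next == 6 else 0.0
        # The TD(0) update: like Monte Carlo it uses sampled experience,
        # like Dynamic Programming it bootstraps from V[s_next] instead
        # of waiting for the episode's final return.
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

print([round(v, 2) for v in V[1:6]])  # converges toward 1/6 .. 5/6
```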


How to easily automate Drone-based monitoring using Deep Learning

This article is a comprehensive overview of using deep learning-based object detection methods on aerial imagery captured by drones.


Realtime tSNE Visualizations with TensorFlow.js

In recent years, the t-distributed Stochastic Neighbor Embedding (tSNE) algorithm has become one of the most used and insightful techniques for exploratory data analysis of high-dimensional data. Used to interpret deep neural network outputs in tools such as the TensorFlow Embedding Projector and TensorBoard, a powerful feature of tSNE is that it reveals clusters of high-dimensional data points at different scales while requiring only minimal tuning of its parameters. Despite these advantages, the computational complexity of the tSNE algorithm limits its application to relatively small datasets. While several evolutions of tSNE have been developed to address this issue (mainly focusing on the scalability of the similarity computations between data points), they have so far not been enough to provide a truly interactive experience when visualizing the evolution of the tSNE embedding for large datasets.
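For reference, a minimal t-SNE run looks like this in Python with scikit-learn; the article's own implementation runs in the browser via TensorFlow.js, and the data here is random and purely illustrative:

```python
# A minimal t-SNE run with scikit-learn as a point of reference; the
# article itself uses a TensorFlow.js implementation in the browser.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# 300 hypothetical high-dimensional points drawn from three loose clusters.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 50)) for c in (0, 3, 6)])

# Perplexity is the main parameter to tune: roughly the effective number
# of neighbors each point considers (typical values 5..50).
emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(X)
print(emb.shape)  # (300, 2) coordinates ready for a scatter plot
```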


Learn about AI with these books, videos, and tutorials

This collection of AI resources will get you up to speed on the basics, best practices, and latest techniques.


Advanced Motif Analysis on Text Induced Graphs

Motif analysis counts the number of recurring patterns (or motifs) in a graph and connects these counts to the intrinsic semantics of the graph. In this thesis, we will demonstrate the potential of motif analysis on textual data, and introduce new concepts that extend conventional motifs. In particular, we will focus on three main research questions (a minimal motif-counting sketch follows the list below):
1. Can we use graph motifs to assess text quality?
Based on the open encyclopedia Wikipedia, we transform articles of various quality levels into graph structures. There, we find motifs that indicate high or low article quality, and we connect these motifs to linguistic patterns. We also show that a qualitative analysis of the most relevant patterns can yield fruitful insights into our understanding of quality. We then look at quality from a very different angle and analyze motifs in the user interactions of collaborative writing communities. These interaction motifs allow us to assess overall community success, measured by a combination of growth and user traffic. Certain combinations of user groups show consistently beneficial or detrimental effects on community performance.
2. How do motifs change over time?
Having established that motif analysis can detect quality on different levels, we now focus on the progression of motifs in dynamic graphs. We take another look at Wikipedia articles, in particular at local text changes in article revisions. To capture patterns in these text revisions, we introduce metamotifs, or motifs of motifs. We also define the novel concept of motif stability: motifs of high stability tend to persist in dynamic graphs, while motifs of low stability almost always get changed into other motifs. We present strong correlations between motif stability, established motif characteristics, and the quality of the source text.
3. Are metamotifs (motifs of motifs) an improvement over simple motifs and methods?
Finally, we confirm the capabilities of metamotifs and quantify their predictive power in a classification experiment on political speeches. To generalize from the surface text level, we use semantic frames, which are more abstract than words. With a combination of semantic frames and metamotif analysis on US presidency and German Bundestag data, we confirm that metamotifs outperform traditional motifs and simpler approaches when used as machine learning features.
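As a concrete illustration of basic motif counting (not the thesis's own code), here is a directed triadic census with networkx on a made-up word-adjacency graph:

```python
# A minimal motif-counting sketch with networkx: the triadic census
# counts all 16 possible directed three-node patterns. The tiny graph
# here is a made-up stand-in for a text-induced graph.
import networkx as nx

G = nx.DiGraph()
# Hypothetical word-adjacency edges from a short text.
G.add_edges_from([
    ("the", "quick"), ("quick", "fox"), ("fox", "jumps"),
    ("the", "fox"), ("jumps", "over"), ("over", "the"),
])

census = nx.triadic_census(G)
# Keep only the motif types that actually occur in the graph.
print({triad: n for triad, n in census.items() if n > 0})
```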


DIY Deep Learning Projects

Inspired by the great work of Akshay Bahadur, in this article you will see some projects applying Computer Vision and Deep Learning, with implementations and details so you can reproduce them on your computer.


The True Way to Define Business Metrics for Startups

Data has become a cargo cult. Collect more data, calculate more metrics, hire more analysts, let them figure out what this is all for – and you're considered to be data driven. I've seen more than enough of this while consulting for startups over the past three years and helping them define business metrics. In this article, I'll try to summarize my experience and address both technical and process aspects, useful for data specialists and business users alike. We'll talk through each step of the analytical process of defining metrics.