Introduction to Conditional Probability and Bayes' theorem for data science professionals

An understanding of probability is a must for a data science professional. Solutions to many data science problems are probabilistic in nature, so a better understanding of probability will help you understand and implement these algorithms more efficiently. In this article, I will focus on conditional probability. For beginners in probability, I would strongly recommend that you go through this article before proceeding further. A predictive model can easily be understood as a statement of conditional probability. For example, suppose the probability of a customer from Segment A buying a product of Category Z in the next 10 days is 0.80. In other words, the probability of a customer buying a product from Category Z, given that the customer is from Segment A, is 0.80. I will walk you through conditional probability in detail, using examples and real-life scenarios to help you improve your understanding.
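To make the "given that" reading concrete, here is a minimal sketch in Python. It estimates P(buys Z | Segment A) as the number of Segment A customers who bought divided by the number of Segment A customers. The customer records and the helper name `conditional_probability` are hypothetical, invented only for illustration; they are not taken from the article.

```python
from fractions import Fraction

# Hypothetical purchase log (illustrative data only):
# each record is (segment, bought_category_Z_within_10_days)
customers = [
    ("A", True), ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False), ("B", False),
]


def conditional_probability(records, segment):
    """Estimate P(buys Z | segment) = count(segment and buys) / count(segment)."""
    in_segment = [bought for seg, bought in records if seg == segment]
    if not in_segment:
        raise ValueError(f"no customers in segment {segment!r}")
    # True counts as 1, so sum() gives the number of buyers in the segment
    return Fraction(sum(in_segment), len(in_segment))


# Conditional probability: restrict attention to Segment A only
p_buy_given_a = conditional_probability(customers, "A")

# Unconditional probability: consider every customer
p_buy = Fraction(sum(bought for _, bought in customers), len(customers))

print(float(p_buy_given_a))  # 0.8
print(float(p_buy))          # 0.5
```

In this made-up data, knowing that a customer belongs to Segment A raises the estimated purchase probability from 0.5 to 0.8, which is the kind of statement a predictive model makes.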


Being an Ontologist

I am sometimes asked whether I am working on the stats, whether I am making progress on the stats, and what I do with all of the stats. People are also prone to hyperbole. I am told that I sure work on a lot of stats, I am always keeping myself busy doing stats, and I am the person to go to for stats. I suppose my real job is more mysterious than the one others imagine that I do. I first want to explain that for everyday people, the term “stats” or “statistics” often means historical data rather than statistics in a substantive sense. So people actually mean that I surround myself with data, which is certainly true. However, I would say that I don’t do much true statistics. (I do some statistics, perhaps in conjunction with charts and reports.) I spend most of my time on ontology, which enables me to transform resources into useful metrics. I didn’t know the meaning of ontology prior to my graduate studies. By sheer chance, I was asked to explain ontology to my class. I said that it certainly isn’t oncology, which relates to the study and treatment of cancerous tumours. Ontology is the study of existence or being. Worded differently, it is the study of how things come into existence or being. I was once listening to an academic provide his perspective. I apologize for forgetting his name. He said that ontology can be interpreted as how things gain relevance. My perspective has always been that ontology gives rise to data. Data is a symbolic representation of something that we have chosen to recognize as relevant; there are layers of this from the perceived reality to the data that appears on my charts.


How to Start an R Project

R is one of the most widely used programming languages in data analysis and data mining. When you first get started with R, it can be a little bit intimidating if you are a newbie, and sometimes even for statistics pros, as the syntax can be a little unfamiliar. There are several ways you can access R. You can install it on your Mac, PC or Linux machine and run it from the terminal. There are also various clients you can install to improve the user experience. Datazar, on the other hand, offers a cloud-based client for R, meaning you can use R right in your browser to analyze data, create charts, use packages and share your results.


Technical preview: Native GPU programming with CUDAnative.jl

After two years of slow but steady development, we would like to announce the first preview release of native GPU programming capabilities for Julia. You can now write your CUDA kernels in Julia, albeit with some restrictions, making it possible to use Julia’s high-level language features to write high-performance GPU code. The programming support we’re demonstrating here today consists of the low-level building blocks, sitting at the same abstraction level as CUDA C. You should be interested if you know (or want to learn) how to program a parallel accelerator like a GPU, while dealing with tricky performance characteristics and communication semantics. You can easily add GPU support to your Julia installation (see below for detailed instructions) by installing CUDAnative.jl. This package is built on top of experimental interfaces to the Julia compiler, and the purpose-built LLVM.jl and CUDAdrv.jl packages to compile and execute code. All this functionality is brand-new and thoroughly untested, so we need your help and feedback in order to improve and finalize the interfaces before Julia 1.0.


Text Analytics: A Primer

Marketing scientist Kevin Gray asks Professor Bing Liu to give us a quick snapshot of text analytics in this informative interview.