One of the assumptions of the Classical Linear Regression Model is that there is no exact collinearity between the explanatory variables. If the explanatory variables are perfectly correlated, you will face these problems:
• Parameters of the model become indeterminate
• Standard errors of the estimates become infinitely large
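A minimal sketch of why the parameters become indeterminate (using made-up numbers, not data from the post): when one regressor is an exact multiple of another, the cross-product matrix X'X is singular, so its inverse — and hence the OLS estimate — does not exist.

```python
# Hypothetical illustration: x2 is an exact multiple of x1,
# so the two regressors are perfectly collinear.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0 * v for v in x1]

# Build the 2x2 cross-product matrix X'X by hand.
s11 = sum(a * a for a in x1)
s12 = sum(a * b for a, b in zip(x1, x2))
s22 = sum(b * b for b in x2)

# Determinant of X'X: zero means (X'X)^-1 does not exist,
# so the OLS coefficients cannot be uniquely determined.
det = s11 * s22 - s12 * s12
print(det)  # 0.0
```

With any near-but-not-exact collinearity the determinant is close to zero instead, which is what drives the inflated standard errors mentioned above.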
R 3.4.2 (codename “Short Summer”) was released yesterday.
One relatively common question in statistics and data science is: how “big” is the difference, or the effect? At this point we can state with some statistical confidence that tire brand matters in predicting tire mileage life; given our data, it isn’t likely that we would see results like these by chance. But is this really a big difference between the brands? Often this is the most important question in our research. After all, if it’s a big difference we might change our shopping habits and/or pay more. Is there a way of knowing how big this difference is?
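One common way to answer that question is an effect-size measure such as Cohen's d, which expresses the difference between two group means in pooled standard-deviation units. A hedged sketch (the mileage figures below are invented for illustration, not taken from the post's data):

```python
import math

def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled SD."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    # Sample variances (denominator n - 1).
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    pooled = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / pooled

# Made-up tire mileage (thousands of miles) for two brands.
brand_a = [42.0, 44.0, 43.0, 41.0, 45.0]
brand_b = [38.0, 39.0, 40.0, 37.0, 41.0]

d = cohens_d(brand_a, brand_b)
print(round(d, 2))  # 2.53
```

By the usual rule of thumb, values around 0.2 are "small", 0.5 "medium", and 0.8 or more "large" — so a d this size would indeed be a big difference, whatever the p-value says.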
Where did we leave off? Oh, right, we learned how to use variables in Python. Here is the second essential topic you have to learn if you are going to use Python as a Data Scientist: Python Data Structures!
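As a taste of what that topic covers, here is a quick tour of the four core built-in structures — the exact examples are mine, not the post's:

```python
# list: ordered and mutable
scores = [88, 92, 75]
scores.append(100)        # grows in place

# tuple: ordered and immutable -- good for fixed records
point = (3.5, 7.2)

# dict: key -> value mapping
ages = {"ada": 36, "alan": 41}
ages["grace"] = 45        # new keys are added on assignment

# set: unordered, duplicates collapse automatically
tags = {"python", "data", "python"}
print(len(tags))  # 2
```

Which one to reach for depends on whether you need ordering, mutability, lookup by key, or uniqueness.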
This entry is a non-exhaustive introduction to creating interactive content directly from your Jupyter notebook. Content mostly refers to data visualization artifacts, but we’ll see that we can easily expand beyond the usual plots and graphs, providing worthy interactive bits for all kinds of scenarios, from data exploration to animations.
Next week, I will start a short course on probability and statistics. The slides of the course are now online. There will be more information soon about the exam and the projects.
In the video presentation below (courtesy of Yandex) – “Deep Learning: Theory, Algorithms, and Applications” – Naftali Tishby, a computer scientist and neuroscientist from the Hebrew University of Jerusalem, provides evidence in support of a new theory explaining how deep learning works. Tishby argues that deep neural networks learn according to a procedure called the “information bottleneck,” which he and two collaborators first described in purely theoretical terms in 1999. The idea is that a network rids noisy input data of extraneous details as if by squeezing the information through a bottleneck, retaining only the features most relevant to general concepts. Striking new computer experiments by Tishby and his student Ravid Shwartz-Ziv reveal how this squeezing procedure happens during deep learning, at least in the cases they studied.
In this multi-part series, we will explore how to get started with TensorFlow. This TensorFlow tutorial will lay a solid foundation for this popular tool that everyone seems to be talking about. The second part is a getting-started tutorial covering installation and a small use case. This series consists of excerpts from a webinar tutorial series I conducted as part of the United Network of Professionals. From time to time I will refer to some of the slides I used in that talk to make things clearer.
Ben Lorica discusses the state of machine learning.
Ziya Ma explains how Intel is driving a holistic approach to powering advanced analytics and artificial intelligence workloads.
Organizations nowadays store huge amounts of data related to various business processes. Process mining provides methods and techniques to analyze and improve these processes, allowing companies to gain a competitive advantage. Process mining originated with the discovery of workflow models from event data, but over the past 20 years the field has evolved into a broad and diverse research discipline. bupaR is an open-source suite for the handling and analysis of business process data in R, developed by the Business Informatics research group at Hasselt University, Belgium. The central package includes basic functionality for creating event log objects in R; it contains several functions for getting information about an event log and also provides event-log-specific versions of generic R functions. Together with the related packages, each of which has its own specific purpose, bupaR aims to support every step in the analysis of event data with R, from data import to online process monitoring.
To rapidly master data science, you need to do several things:
1. Break it down
2. Figure out what to do, and what not to do
3. Design a plan
4. Learn
5. Practice
Let’s dive into each of these.