Bayesian Inference via Simulated Annealing
I recently finished a course on discrete optimization and am currently working through Richard McElreath’s excellent textbook Statistical Rethinking. Combining the two, and duly jazzed by this video on the Traveling Salesman Problem, I thought I’d build a toy Bayesian model and try to optimize it via simulated annealing. This work was brief, amusing and experimental. The result is a simple Shiny app that contrasts MCMC search via simulated annealing with the (more standard) Metropolis algorithm. While far from groundbreaking, I did pick up a few bits of intuition along the way.
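The post's code lives in the Shiny app, but the core contrast is easy to sketch: both samplers use the Metropolis acceptance rule, and annealing simply divides the log-acceptance ratio by a temperature that cools over time. Below is a minimal, hypothetical R sketch; the one-parameter log-posterior `log_post`, the step size and the cooling schedule are all illustrative, not the app's actual code.

```r
# Metropolis vs. simulated annealing on a toy one-parameter posterior.
log_post <- function(theta) dnorm(theta, mean = 2, sd = 1, log = TRUE)

sample_chain <- function(n_iter, init, step_sd, temp = function(i) 1) {
  chain <- numeric(n_iter)
  chain[1] <- init
  for (i in 2:n_iter) {
    proposal <- rnorm(1, chain[i - 1], step_sd)
    # Metropolis rule; a temperature > 1 flattens the posterior (exploration),
    # a temperature < 1 sharpens it (exploitation).
    log_accept <- (log_post(proposal) - log_post(chain[i - 1])) / temp(i)
    chain[i] <- if (log(runif(1)) < log_accept) proposal else chain[i - 1]
  }
  chain
}

metropolis <- sample_chain(5000, init = 0, step_sd = 0.5)  # temperature fixed at 1
annealed   <- sample_chain(5000, init = 0, step_sd = 0.5,
                           temp = function(i) max(0.01, 1 - i / 5000))  # linear cooling
```

As the temperature cools, the annealed chain collapses toward the posterior mode (a MAP search) rather than sampling the full posterior, which is the essential difference between the two searches.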
How is Deep Learning Changing Data Science Paradigms?
Deep learning is changing everything – and it’s here to stay. Just as electronics and computers transformed all economic activities, artificial intelligence will reshape retailing, transport, manufacturing, medicine, telecommunications, heavy industry…even data science itself. And that list of applications is still growing, as is the list of complex tasks where AI does better than humans. Here at Schibsted we see the opportunities deep learning offers, and we’re excited to contribute.
What Exactly The Heck Are Prescriptive Analytics?
Prescriptive analytics is about using data and analytics to improve decisions and, therefore, the effectiveness of actions. Isn’t that what all analytics should be about? A hearty “yes” to that, because if analytics does not lead to more informed decisions and more effective actions, then why do it at all? Many wrongly and incompletely define prescriptive analytics as simply what comes after predictive analytics. Our research indicates that prescriptive analytics is not a specific type of analytics, but rather an umbrella term for many types of analytics that can improve decisions. Think of “prescriptive” as the goal of all these analytics, namely to make more effective decisions, rather than as a specific analytical technique.
Creativity is Crucial in Data Science
Data science might not be seen as the most creative of pursuits. You add a load of data into a repository, and you crunch it at the other end to draw your conclusions. Data in, data out; where is the scope for creativity there? It is not as if you are working with a blank canvas. For me, creativity means being able to make something out of nothing, and that requires an incredible amount of imagination. Seeing past the obvious headline statistics to reach a deeper conclusion is the hallmark of a great Big Data professional.
Bots: What you need to know
Bots are a new, AI-driven way to interact with users in a variety of environments. They use artificial intelligence to converse in human terms, usually through a lightweight messaging interface like Slack or Facebook Messenger, or a voice interface like Amazon Echo or Google Assistant. As AI improves and users turn away from single-purpose apps and toward messaging interfaces, bots could revolutionize customer service, productivity, and communication. Getting started is as simple as using any of a handful of new platforms that aim to make bot creation easy; sophisticated bots require an understanding of natural language processing (NLP) and other areas of artificial intelligence. Since late 2015, bots have been the subject of immense excitement, in the belief that they might replace mobile apps for many tasks and provide a flexible, natural interface to sophisticated AI technology.
Sentiment Analysis in R
Current research in finance and the social sciences utilizes sentiment analysis to understand human decisions in response to textual materials. While sentiment analysis has gained great traction lately, the available tools do not yet live up to the needs of researchers; R in particular lacks the capabilities that much research requires. Our package “SentimentAnalysis” performs sentiment analysis of textual content in R. The implementation utilizes various existing dictionaries, such as General Inquirer, Harvard IV or Loughran-McDonald, and it can also create customized dictionaries. The latter uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable.
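A minimal usage sketch, assuming the package's documented entry points `analyzeSentiment()` and `convertToDirection()`; the example documents are made up:

```r
library(SentimentAnalysis)

documents <- c("The outlook is excellent and profits are rising.",
               "Results were disappointing; losses keep mounting.")

# Score each document against the built-in dictionaries
# (General Inquirer, Harvard IV, Loughran-McDonald, QDAP).
sentiment <- analyzeSentiment(documents)

sentiment$SentimentLM                      # Loughran-McDonald polarity scores
convertToDirection(sentiment$SentimentLM)  # negative / neutral / positive
```

For a customized dictionary, the package's `generateDictionary()` takes the documents plus a numeric response variable (e.g. stock returns) and selects relevant terms via LASSO; that step needs a reasonably sized corpus, so it is omitted from this toy example.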
Yes, you can run R in the cloud securely
Once thought of as the ‘little programming language that could’, R has fundamentally transformed the way data scientists and organisations use their data. It gives businesses the power to leverage big data and develop predictive models that enable action, not just reaction. But R isn’t just another programming language: it is a rich ecosystem of more than 10,000 packages, test data and model evaluations that make powerful predictive analytics possible. This is good for data scientists at companies innovating on the cutting edge of their industries, but it can be bad news for enterprise security. Why? Because R packages contain executable code, and as with all software you download over the internet, you need to be aware of the security risks. That doesn’t mean you can’t run R in the cloud securely. You can, and you should.
PCA – hierarchical tree – partition: Why do we need to choose one for visualizing data?
Principal component methods such as PCA (principal component analysis) or MCA (multiple correspondence analysis) can be used as a pre-processing step before clustering. But principal component methods also provide a framework for visualizing data, so clustering results can be represented on the map produced by the principal component method. In the accompanying figure, the hierarchical tree is drawn in 3D on the principal component map (using the first two components obtained with PCA); a partition is then derived, and individuals are coloured according to the cluster they belong to.
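The workflow described matches the `HCPC()` function in the FactoMineR package; here is a minimal sketch under that assumption, using the built-in `iris` data in place of the post's dataset:

```r
library(FactoMineR)

# PCA as a pre-processing step (keep 5 components, suppress the default plot).
res.pca <- PCA(iris[, 1:4], ncp = 5, graph = FALSE)

# Hierarchical clustering on the principal components, cut into a partition.
res.hcpc <- HCPC(res.pca, graph = FALSE)

plot(res.hcpc, choice = "3D.map")  # tree drawn in 3D on the PCA map
plot(res.hcpc, choice = "map")     # individuals coloured by their cluster
```

Because the tree, the partition and the factor map are drawn together, there is no need to choose a single visualization, which is the point of the title's question.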
Beyond Deep Learning – 3rd Generation Neural Nets
If Deep Learning is powered by 2nd-generation neural nets, what will the 3rd generation look like? What new capabilities does that imply, and when will it get here?
Data Manipulation and Visualization with Pandas and Seaborn – A Practical Introduction
In this notebook, I’m going to demonstrate, with practical examples, various concepts and methods related to Pandas and Seaborn. I will rely on the data format I used for my Facebook Conversation Analyzer project. For obvious reasons I didn’t use a personal conversation but automatically generated a fake, nonsensical one. Projecting or imagining a conversation relevant to you will most likely help you understand and memorize the content of this notebook, and it's even better if you can play around with your actual data. The main topic is data manipulation with Pandas, for example function application, groupby, aggregation and multi-indexes. Along the way I’ll mention handy tricks you can use for various tasks and demonstrate how to plot the results in different ways using Seaborn (which is based on matplotlib). Given the data format, special focus is put on time-series manipulation.
The Downside of Converting Full-Text PDFs to XML for Text Mining
To get the best results from text mining projects, researchers need access to full-text articles. Abstracts often omit essential facts and relationships, secondary study findings, and adverse-event data. However, when researchers obtain full-text articles through company subscriptions or document delivery, the documents are often provided as PDFs, a suboptimal format for use with text mining software. The burden is then on researchers to convert the PDFs, potentially thousands in a bulk delivery, to XML (Extensible Markup Language), the preferred input format for text mining tools. But tasking highly skilled researchers with converting document formats creates a number of problems in the transformed content, and it is inefficient and costly.