Using the Power of Deep Learning for Cyber Security

The majority of deep learning applications we see in the community are geared towards fields like marketing, sales, and finance. We hardly ever read articles or find resources about deep learning being used to protect these products, and the business, from malware and hacker attacks. While big technology companies like Google, Facebook, Microsoft, and Salesforce have already embedded deep learning into their products, the cybersecurity industry is still playing catch-up. It's a challenging field, but one that needs our full attention. In this article, we briefly introduce Deep Learning (DL) along with a few existing Information Security (hereafter referred to as InfoSec) applications it enables. We then take a deep dive into the interesting problem of detecting anonymous Tor traffic and present a DL-based solution for it. The target audience for this article is data science professionals who are already working on machine learning projects. The content assumes that you have foundational knowledge of machine learning and are either a beginner in, or currently exploring, deep learning and its use cases.


Articulate – An open source platform for building conversational interfaces with intelligent agents

Articulate is an open source project that lets you take control of your conversational interfaces without worrying about where and how your data is stored. It is also built around a user-centered design, with the main goal of making both experts and beginners feel comfortable when building their intelligent agents.


ff and Too-Big-for-Memory Data in R — Part III

ff addresses R’s memory limit by providing ‘data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory – the effective virtual memory consumption per ff object… Beyond basic access functions, the ff package also provides compatibility functions that facilitate writing code for ff and ram objects and support for batch processing on ff objects (e.g. as.ram, as.ff, ffapply).’
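ff itself is R-specific, but the underlying idea of keeping data on disk and mapping only small sections into RAM at a time can be sketched in Python with numpy.memmap. This is only a rough analogue of the concept, not the ff API; the file name, array shape, and chunk size below are made up for illustration.

```python
import numpy as np

# Hypothetical file and shape, chosen only for illustration.
# The array lives on disk; only the pages being touched are mapped
# into RAM, similar in spirit to ff's pagesize-based mapping.
shape = (10_000_000, 4)
arr = np.memmap("big_matrix.dat", dtype="float64", mode="w+", shape=shape)

# Fill and process the data in chunks (batch processing, loosely
# analogous to ffapply on an ff object).
chunk = 1_000_000
for start in range(0, shape[0], chunk):
    stop = min(start + chunk, shape[0])
    arr[start:stop] = np.random.rand(stop - start, shape[1])

# Column means computed chunk-wise, without loading everything at once.
col_sums = np.zeros(shape[1])
for start in range(0, shape[0], chunk):
    stop = min(start + chunk, shape[0])
    col_sums += arr[start:stop].sum(axis=0)
print(col_sums / shape[0])
```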


A Comprehensive Causality Test Based on the Singular Spectrum Analysis

In this paper, we consider the concept of a causal relationship between two time series based on singular spectrum analysis. We introduce several criteria which characterize this causality. The criteria are based on the forecasting accuracy and the predictability of the direction of change. The performance of the proposed tests is examined using different real time series.


Measures of Causality in Complex Datasets with Application to Financial Data

This article investigates the causality structure of financial time series. We concentrate on three main approaches to measuring causality: linear Granger causality, kernel generalisations of Granger causality (based on ridge regression and the Hilbert-Schmidt norm of the cross-covariance operator) and transfer entropy, examining each method and comparing their theoretical properties, with special attention given to the ability to capture nonlinear causality. We also present the theoretical benefits of applying non-symmetrical measures rather than symmetrical measures of dependence. We apply the measures to a range of simulated and real data. The simulated data sets were generated with linear and several types of nonlinear dependence, using bivariate, as well as multivariate settings. An application to real-world financial data highlights the practical difficulties, as well as the potential of the methods. We use two real data sets: (1) U.S. inflation and one-month Libor; (2) S&P data and exchange rates for the following currencies: AUDJPY, CADJPY, NZDJPY, AUDCHF, CADCHF, NZDCHF. Overall, we reach the conclusion that no single method can be recognised as the best in all circumstances, and each of the methods has its domain of best applicability. We also highlight areas for improvement and future research.
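As a concrete illustration of the simplest of these measures, here is a minimal sketch of a linear Granger-causality test on simulated bivariate data using statsmodels. The simulated series, coefficients, and lag order are arbitrary placeholders, not the paper's setup.

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 500

# Simulate a bivariate system in which x linearly drives y with a one-step lag.
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(scale=0.5)

# grangercausalitytests checks whether the SECOND column helps predict the FIRST;
# it returns (and prints) test statistics for each lag up to maxlag.
data = np.column_stack([y, x])
results = grangercausalitytests(data, maxlag=2)
```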


Artificial Intelligence vs. Hybrid AI

There's a lot of conversation around artificial intelligence (AI) happening across a variety of disciplines right now, and for good reason: more companies are relying on AI in their business. From chatbots to self-driving cars, AI won't be optional for much longer. However, there are always drawbacks to a completely autonomous system, which is why Hybrid AI offers the best way for brands to utilize AI today.


5 of Our Favorite Open Source Visualization Tools

• R Shiny
• Tableau Public
• Datawrapper
• Pivot
• D3


The unreasonable effectiveness of Deep Learning Representations – Building an image search service from scratch

Many products fundamentally appeal to our perception. When browsing through outfits on clothing sites, looking for a vacation rental on Airbnb, or choosing a pet to adopt, the way something looks is often an important factor in our decision. The way we perceive things is a strong predictor of what kind of items we will like, and therefore a valuable quality to measure. However, making computers understand images the way humans do has been a computer science challenge for quite some time. Since 2012, Deep Learning has slowly started overtaking classical methods such as Histograms of Oriented Gradients (HOG) in perception tasks like image classification or object detection. One of the main reasons often credited for this shift is deep learning's ability to automatically extract meaningful representations when trained on a large enough dataset.
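A minimal sketch of the idea, assuming Keras with a pretrained ResNet50: strip the classification head, use the pooled convolutional activations as an image embedding, and index the embeddings for nearest-neighbour search. The file names are placeholders, and this is not necessarily the exact pipeline used in the article.

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.neighbors import NearestNeighbors

# Pretrained network without the classification head; global average pooling
# turns the final convolutional feature maps into a 2048-d embedding.
model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

# Placeholder file names; build an index over the catalogue embeddings.
paths = ["img_001.jpg", "img_002.jpg", "img_003.jpg"]
embeddings = np.stack([embed(p) for p in paths])
index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(embeddings)

# Query: find the catalogue items that look most like a new image.
dist, idx = index.kneighbors(embed("query.jpg").reshape(1, -1))
print([paths[i] for i in idx[0]])
```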


Basic Generalized Additive Models In Ecology: Exercises

Generalized Additive Models (GAMs) are non-parametric models that add smoothers to the data. In this exercise, we will look at GAMs with cubic splines using the mgcv package. The datasets used can be downloaded here. The dataset contains experimental results on grassland richness over time in Yellowstone National Park (Skkink et al. 2007). Answers to these exercises are available here. If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page. Load the dataset and the required package before running the exercises.
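The exercises themselves use R's mgcv, but the same kind of model (a smooth of richness over time) can be sketched in Python with the pyGAM package as a rough analogue. The simulated data below are placeholders standing in for the Yellowstone dataset, and the number of splines is arbitrary.

```python
import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(1)

# Simulated stand-in for the richness-over-time data used in the exercises.
year = np.sort(rng.uniform(0, 20, size=120))
richness = 10 + 3 * np.sin(year / 3) + rng.normal(scale=1.0, size=120)

# One smooth term over time, loosely analogous to mgcv's gam(richness ~ s(year)).
gam = LinearGAM(s(0, n_splines=10)).fit(year.reshape(-1, 1), richness)
gam.summary()
```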


Text Classification & Embeddings Visualization Using LSTMs, CNNs, and Pre-trained Word Vectors

In this tutorial, I classify the Yelp round-10 review dataset. The reviews contain a lot of metadata that can be mined and used to infer meaning, business attributes, and sentiment. For simplicity, I classify the review comments into two classes: positive or negative. Reviews with more than three stars are regarded as positive, while reviews with three stars or fewer are negative; the problem is therefore one of supervised learning. To build and train the model, I first clean the text and convert the reviews to sequences. Each review comment is limited to 50 words: short texts with fewer than 50 words are padded with zeros, and longer ones are truncated. After processing the review comments, I train three models in three different ways and obtain three word embeddings.
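A minimal sketch of the preprocessing and one of the models, assuming Keras: tokenize the reviews, pad or truncate each one to 50 tokens, and train a small LSTM classifier with a learned embedding. The vocabulary size, embedding dimension, and the tiny example corpus are placeholders, not the values used in the tutorial.

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Placeholder corpus and labels (1 = positive, 0 = negative).
texts = ["great food and friendly staff", "terrible service, never again"]
labels = np.array([1, 0])

# Convert text to integer sequences and pad/truncate to 50 tokens.
max_words, max_len = 20000, 50
tok = Tokenizer(num_words=max_words)
tok.fit_on_texts(texts)
seqs = pad_sequences(tok.texts_to_sequences(texts), maxlen=max_len)

# Small LSTM classifier with a trainable embedding layer.
model = Sequential([
    Embedding(max_words, 128, input_length=max_len),
    LSTM(64),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(seqs, labels, epochs=2, batch_size=32)
```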


Manage your Machine Learning Lifecycle with Mlflow – Part 1

Reproducibility, good management, and tracking of experiments are necessary for making it easy to test other people's work and analyses. In this first part, we will start learning, through simple examples, how to record and query experiments, and how to package machine learning models so they are reproducible and can be run on any platform using MLflow.
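A minimal sketch of experiment tracking with the MLflow Python API, assuming a scikit-learn model; the hyperparameter value and metric below are arbitrary placeholders, not the article's example.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each run records its parameters, metrics, and model so it can be
# queried and reproduced later from the tracking UI or the mlflow CLI.
with mlflow.start_run():
    C = 1.0  # placeholder hyperparameter
    model = LogisticRegression(C=C, max_iter=200).fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))

    mlflow.log_param("C", C)
    mlflow.log_metric("accuracy", acc)
    mlflow.sklearn.log_model(model, "model")
```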


State of the Art Natural Language Processing at Scale

The two-part presentation below, from the Spark+AI Summit 2018, is a deep dive into key design choices made in the NLP library for Apache Spark. The library natively extends the Spark ML pipeline APIs, which enables zero-copy, distributed, combined NLP, ML & DL pipelines, leveraging all of Spark's built-in optimizations. The library implements core NLP algorithms including lemmatization, part-of-speech tagging, dependency parsing, named entity recognition, spell checking, and sentiment detection.
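A minimal sketch of using the library from Python via one of its published pretrained pipelines; the input sentence is a placeholder, and the exact annotation stages depend on the pipeline version downloaded.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start a Spark session with Spark NLP on the classpath.
spark = sparknlp.start()

# 'explain_document_dl' bundles tokenization, spell checking, lemmatization,
# POS tagging, and NER into a single pretrained pipeline.
pipeline = PretrainedPipeline("explain_document_dl", lang="en")

# Placeholder input text; annotate() returns a dict of annotations per stage.
result = pipeline.annotate("John Smith visited the Spark+AI Summit in San Francisco.")
print(list(result.keys()))
print(result["lemma"])
```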


K-Means Clustering in Python with scikit-learn

In Machine Learning, the types of learning can broadly be classified into three categories: 1. Supervised Learning, 2. Unsupervised Learning, and 3. Semi-supervised Learning. Algorithms in the Unsupervised Learning family have no target variable to predict tied to the data. Instead of having an output, the data only has an input, consisting of multiple variables that describe the data. This is where clustering comes in.
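A minimal sketch of the scikit-learn workflow the tutorial walks through, run on synthetic blob data rather than the tutorial's dataset; the number of clusters and sample points are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three clusters; only input variables, no target.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit K-Means and inspect the learned cluster assignments and centroids.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.labels_[:10])
print(kmeans.cluster_centers_)

# Assign new, unseen points to the nearest learned centroid.
print(kmeans.predict(np.array([[0.0, 0.0], [5.0, 5.0]])))
```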