Document worth reading: “Introduction to Tensor Decompositions and their Applications in Machine Learning”

Tensors are multidimensional arrays of numerical values and therefore generalize matrices to multiple dimensions. While tensors first emerged in the psychometrics community in the $20^{\text{th}}$ century, they have since spread to numerous other disciplines, including machine learning. Tensors and their decompositions are especially beneficial in unsupervised learning settings, but are gaining popularity in other sub-disciplines like temporal and multi-relational data analysis, too. The scope of this paper is to give a broad overview of tensors, their decompositions, and how they are used in machine learning. As part of this, we are going to introduce basic tensor concepts, discuss why tensors can be considered more rigid than matrices with respect to the uniqueness of their decomposition, explain the most important factorization algorithms and their properties, provide concrete examples of tensor decomposition applications in machine learning, conduct a case study on tensor-based estimation of mixture models, talk about the current state of research, and provide references to available software libraries.
Introduction to Tensor Decompositions and their Applications in Machine Learning
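To make the idea of a factorization algorithm concrete, here is a minimal base-R sketch of the CP (CANDECOMP/PARAFAC) decomposition fitted by alternating least squares. All function and variable names are ours, and packages such as rTensor provide production implementations.

```r
# Minimal CP-ALS sketch for a 3-way tensor (illustrative only; names are ours)
khatri_rao <- function(U, V) {                     # columnwise Kronecker product
  matrix(sapply(seq_len(ncol(U)), function(r) kronecker(U[, r], V[, r])),
         ncol = ncol(U))
}

cp_als <- function(X, R, iters = 50) {
  I <- dim(X)[1]; J <- dim(X)[2]; K <- dim(X)[3]
  X1 <- matrix(X, nrow = I)                        # mode-1 unfolding
  X2 <- matrix(aperm(X, c(2, 1, 3)), nrow = J)     # mode-2 unfolding
  X3 <- matrix(aperm(X, c(3, 1, 2)), nrow = K)     # mode-3 unfolding
  A <- matrix(rnorm(I * R), I); B <- matrix(rnorm(J * R), J)
  C <- matrix(rnorm(K * R), K)
  for (it in seq_len(iters)) {                     # alternating least squares
    A <- X1 %*% khatri_rao(C, B) %*% solve(crossprod(C) * crossprod(B))
    B <- X2 %*% khatri_rao(C, A) %*% solve(crossprod(C) * crossprod(A))
    C <- X3 %*% khatri_rao(B, A) %*% solve(crossprod(B) * crossprod(A))
  }
  list(A = A, B = B, C = C)                        # rank-R factor matrices
}

# Recover the factors of a noisy rank-2 tensor
set.seed(1)
X <- outer(outer(rnorm(10), rnorm(8)), rnorm(6)) +
  outer(outer(rnorm(10), rnorm(8)), rnorm(6)) +
  array(rnorm(10 * 8 * 6, sd = 0.01), c(10, 8, 6))
fit <- cp_als(X, R = 2)
```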

If you did not already know

Dionysius google
We address the following problem: How do we incorporate user-item interaction signals as part of the relevance model in a large-scale personalized recommendation system such that (1) the ability to interpret the model and explain recommendations is retained, and (2) the existing infrastructure designed for the (user profile) content-based model can be leveraged? We propose Dionysius, a hierarchical graphical model based framework and system for incorporating user interactions into recommender systems, with minimal change to the underlying infrastructure. We learn a hidden fields vector for each user by considering the hierarchy of interaction signals, and replace the user profile-based vector with this learned vector, thereby not expanding the feature space at all. Thus, our framework allows the use of existing recommendation infrastructure that supports content-based features. We implemented and deployed this system as part of the recommendation platform at LinkedIn for more than one year. We validated the efficacy of our approach through extensive offline experiments with different model choices, as well as online A/B testing experiments. Our deployment of this system as part of the job recommendation engine resulted in significant improvement in the quality of retrieved results, thereby generating improved user experience and positive impact for millions of users. …
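The central trick is that the hand-built profile vector is swapped for a learned vector of the same dimensionality, so the scoring infrastructure is untouched. A hypothetical toy sketch in R (not LinkedIn's code):

```r
# Hypothetical sketch: the linear relevance scorer stays the same, only the
# user representation changes from a content-based profile to a hidden-fields
# vector learned from interaction signals (stubbed here).
score <- function(user_vec, item_vec) sum(user_vec * item_vec)

d <- 5
profile_vec <- runif(d)                          # content-based user profile
hidden_vec  <- profile_vec + rnorm(d, sd = 0.1)  # learned vector, same length d
item_vec    <- runif(d)

score(profile_vec, item_vec)   # old content-based model
score(hidden_vec, item_vec)    # same scorer, interaction-aware representation
```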

Predicted Relevance Model (PRM) google
Evaluation of search engines relies on assessments of search results for selected test queries, from which we would ideally like to draw conclusions in terms of relevance of the results for general (e.g., future, unknown) users. In practice, however, most evaluation scenarios only allow us to conclusively determine the relevance towards the particular assessor that provided the judgments. A factor that cannot be ignored when extending conclusions made from assessors towards users is the possible disagreement on relevance, assuming that a single gold truth label does not exist. This paper presents and analyzes the Predicted Relevance Model (PRM), which allows predicting a particular result’s relevance for a random user, based on an observed assessment and knowledge of the average disagreement between assessors. With the PRM, existing evaluation metrics designed to measure binary assessor relevance can be transformed into more robust and effectively graded measures that evaluate relevance towards a random user. It also leads to a principled way of quantifying multiple graded or categorical relevance levels for use as gains in established graded relevance measures, such as normalized discounted cumulative gain (nDCG), which nowadays often use heuristic and data-independent gain values. Given a set of test topics with graded relevance judgments, the PRM allows evaluating systems on different scenarios, such as their capability of retrieving top results, or how well they are able to filter out non-relevant ones. Its use in actual evaluation scenarios is illustrated on several information retrieval test collections. …
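A stylized version of the idea in R (our notation and numbers, not the paper's exact estimator): an observed binary judgment is mapped to the probability that a random user would find the result relevant, and that probability is used as the gain in nDCG.

```r
# Assumed agreement rates between a judge and a random user (illustrative)
p_rel_given_rel    <- 0.80  # P(user finds relevant | assessor said relevant)
p_rel_given_nonrel <- 0.15  # P(user finds relevant | assessor said non-relevant)

prm_gain <- function(judged_relevant) {
  ifelse(judged_relevant, p_rel_given_rel, p_rel_given_nonrel)
}

# Use the predicted gains in place of raw 0/1 labels when computing nDCG
judgments <- c(TRUE, FALSE, TRUE, TRUE, FALSE)   # assessor labels by rank
gains <- prm_gain(judgments)
dcg   <- sum(gains / log2(seq_along(gains) + 1))
idcg  <- sum(sort(gains, decreasing = TRUE) / log2(seq_along(gains) + 1))
dcg / idcg                                       # nDCG towards a random user
```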

Symbolic Data Analysis (SDA) google
Symbolic data analysis (SDA) is an extension of standard data analysis where symbolic data tables are used as input and symbolic objects are outputted as a result. The data units are called symbolic since they are more complex than standard ones, as they not only contain values or categories, but also include internal variation and structure. SDA is based on four spaces: the space of individuals, the space of concepts, the space of descriptions, and the space of symbolic objects. The space of descriptions models individuals, while the space of symbolic objects models concepts.
An Introduction to Symbolic Data Analysis and the Sodas Software
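A tiny base-R illustration of the distinction (an assumed example): individual records are aggregated into interval-valued descriptions, so each cell of the symbolic table carries internal variation rather than a single value.

```r
# Individuals (records) described by single values
raw <- data.frame(model = rep(c("A", "B"), each = 3),
                  price = c(10, 12, 11, 20, 22, 25),
                  speed = c(150, 160, 155, 180, 200, 190))

# Concepts (car models) described by interval-valued symbolic variables
symbolic <- aggregate(cbind(price, speed) ~ model, data = raw,
                      FUN = function(x) c(min = min(x), max = max(x)))
symbolic  # each cell is now an interval [min, max], not a single number
```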

R Packages worth a look

Region-Level Connectivity Network Construction via Kernel Canonical Correlation Analysis (brainKCCA)
It is designed to calculate connectivity between (or among) brain regions and plot connection lines. A summary function is also included to summarize group-level connectivity networks. Kang, Jian (2016) <doi:10.1016/j.neuroimage.2016.06.042>.
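The kernel CCA machinery lives inside the package; the underlying idea can be previewed with base R's cancor() on toy voxel time series (our made-up data, not the package's interface):

```r
# Canonical correlation between the voxel time series of two brain regions
set.seed(42)
n_time  <- 100
region1 <- matrix(rnorm(n_time * 4), n_time)            # 4 voxels, region 1
region2 <- matrix(rnorm(n_time * 5), n_time)            # 5 voxels, region 2
region2[, 1] <- region1[, 1] + rnorm(n_time, sd = 0.3)  # inject a shared signal

cc <- cancor(region1, region2)
cc$cor[1]  # leading canonical correlation, read as region-level connectivity
```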

Composition of Probabilistic Preferences (CPP) (CPP)
CPP is a multiple criteria decision method for evaluating alternatives in complex decision-making problems using a probabilistic approach. The CPP was created and expanded by Sant’Anna, Annibal P. (2015) <doi:10.1007/978-3-319-11277-0>.
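A Monte Carlo sketch of the composition idea (our simplification, not the package's interface): treat each alternative's score on each criterion as uncertain, estimate the probability of being the best per criterion, then compose the probabilities across criteria.

```r
set.seed(7)
scores <- matrix(c(7, 6, 8,        # 3 alternatives (rows) x 2 criteria (cols)
                   5, 9, 6), nrow = 3)
n_sim  <- 10000
p_best <- sapply(1:2, function(j) {
  draws <- matrix(rnorm(3 * n_sim, mean = scores[, j], sd = 1), nrow = 3)
  tabulate(apply(draws, 2, which.max), nbins = 3) / n_sim
})
# Compose across criteria: probability of being best on all criteria at once
joint <- apply(p_best, 1, prod)
joint / sum(joint)                 # normalized global preference per alternative
```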

Datasets from ‘KEEL’ for its Use in ‘RKEEL’ (RKEELdata)
‘KEEL’ is a popular Java software suite for a large number of different knowledge data discovery tasks. ‘RKEEL’ is a package providing an R code layer between R and ‘KEEL’, for using ‘KEEL’ from R. This package includes the datasets from ‘KEEL’ in .dat format for use in the ‘RKEEL’ package. For more information about ‘KEEL’, see <http://…/>.
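KEEL's .dat files are plain text: '@'-prefixed header lines (relation, attributes, inputs, outputs) followed by comma-separated rows. A minimal hand-rolled reader sketch, assuming that layout (in practice the 'RKEEL' package handles this for you):

```r
read_keel <- function(path) {
  lines  <- readLines(path)
  header <- grep("^@", lines)                      # @relation, @attribute, @data, ...
  attrs  <- sub("^@attribute\\s+(\\S+).*", "\\1",  # attribute names become columns
                grep("^@attribute", lines, value = TRUE, ignore.case = TRUE),
                ignore.case = TRUE)
  read.csv(text = lines[-header], header = FALSE,
           col.names = attrs, strip.white = TRUE)
}
# df <- read_keel("iris.dat")   # hypothetical path to a KEEL dataset
```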

Book Memo: “Building a Recommendation System with R”

A recommendation system performs extensive data analysis in order to generate suggestions to its users about what might interest them. R has recently become one of the most popular programming languages for data analysis. Its structure allows you to interactively explore data, and its packages contain the most cutting-edge techniques thanks to its wide international community. This distinctive feature of the R language makes it a preferred choice for developers who are looking to build recommendation systems. The book will help you understand how to build recommender systems using R. It starts off by explaining the basics of data mining and machine learning. Next, you will learn how to build and optimize recommender models using R. Following that, you will be given an overview of the most popular recommendation techniques. Finally, you will learn to implement all the concepts you have learned throughout the book to build a recommender system.
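As a taste of the workflow the book walks through, a short user-based collaborative filtering example with the recommenderlab package (our choice of package and data; install it first):

```r
library(recommenderlab)
data(MovieLense)                            # ratings stored as a realRatingMatrix

rec  <- Recommender(MovieLense[1:500], method = "UBCF")  # user-based CF model
top5 <- predict(rec, MovieLense[501:502], n = 5)         # top-5 items per user
as(top5, "list")                            # readable lists of recommendations
```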

Book Memo: “Guide to Modeling and Simulation of Systems of Systems”

This easy-to-follow textbook provides an exercise-driven guide to the use of the Discrete Event Systems Specification (DEVS) simulation modeling formalism and the System Entity Structure (SES) simulation model ontology, supported by the latest advances in software architecture and design principles, methods, and tools for building and testing virtual Systems of Systems (SoS). The book examines a wide variety of SoS problems, ranging from cloud computing systems to biological systems in agricultural food crops. This enhanced and expanded second edition also features a new chapter on DEVS support for Markov modeling and simulation. Topics and features:
• Provides an extensive set of exercises throughout the text to reinforce the concepts and encourage use of the tools, supported by introduction and summary sections
• Discusses how the SoS concept and supporting virtual build and test environments can overcome the limitations of current approaches
• Offers a step-by-step introduction to the DEVS concepts and modeling environment features required to build sophisticated SoS models
• Describes the capabilities and use of the tools CoSMoS/DEVS-Suite, Virtual Laboratory Environment, and MS4 Me™
• Reviews a range of diverse applications, from the development of new satellite design and launch technologies to surveillance and control in animal epidemiology
• Examines software/hardware co-design for SoS, and activity concepts that bridge information-level requirements and energy consumption in the implementation
• Demonstrates how the DEVS formalism supports Markov modeling within an advanced modeling and simulation environment (NEW)
This accessible and hands-on textbook/reference provides invaluable practical guidance for graduate students interested in simulation software development and cyber-systems engineering design, as well as for practitioners in these and related areas.

Book Memo: “Data Literacy”

How to Make Your Experiments Robust and Reproducible
Data Literacy: How to Make Your Experiments Robust and Reproducible provides an overview of basic concepts and skills in handling data, which are common to diverse areas of science. Readers will get a good grasp of the steps involved in carrying out a scientific study and will understand some of the factors that make a study robust and reproducible. The book covers several major modules such as experimental design, data cleansing and preparation, statistical analysis, data management, and reporting. No specialized knowledge of statistics or computer programming is needed to fully understand the concepts presented.
This book is a valuable source for biomedical and health sciences graduate students and researchers, in general, who are interested in handling data to make their research reproducible and more efficient.
• Presents the content in an informal tone and with many examples taken from the daily routine at laboratories
• Can be used for self-studying or as an optional book for more technical courses
• Brings an interdisciplinary approach which may be applied across different areas of science

Book Memo: “Data Science”

Innovative Developments in Data Analysis and Clustering
This edited volume on the latest advances in data science covers a wide range of topics in the context of data analysis and classification. In particular, it includes contributions on classification methods for high-dimensional data, clustering methods, multivariate statistical methods, and various applications. The book gathers a selection of peer-reviewed contributions presented at the Fifteenth Conference of the International Federation of Classification Societies (IFCS2015), which was hosted by the Alma Mater Studiorum, University of Bologna, from July 5 to 8, 2015.

If you did not already know

Vector Field Based Neural Network google
A novel Neural Network architecture is proposed using the mathematically and physically rich idea of vector fields as hidden layers to perform nonlinear transformations of the data. The data points are interpreted as particles moving along a flow defined by the vector field, which intuitively represents the desired movement needed to enable classification. The architecture moves the data points from their original configuration to a new one following the streamlines of the vector field, with the objective of achieving a final configuration where the classes are separable. An optimization problem is solved through gradient descent to learn this vector field. …
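A forward-pass toy in base R (our construction): points are integrated along a parameterized vector field with Euler steps; in the paper, the field's parameters are then learned by gradient descent so that the final configuration is linearly separable.

```r
V <- function(X, W, b) X %*% W + matrix(b, nrow(X), ncol(W), byrow = TRUE)

flow <- function(X, W, b, steps = 10, h = 0.1) {
  for (s in seq_len(steps)) X <- X + h * V(X, W, b)  # Euler integration of the flow
  X
}

set.seed(3)
X <- matrix(rnorm(200), ncol = 2)       # 100 toy 2-D data points
W <- matrix(c(0, -1, 1, 0), 2)          # a fixed rotational field for illustration
b <- c(0, 0)
Xt <- flow(X, W, b)                     # points after moving along the streamlines
```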

Generative Adversarial Capsule Network (CapsuleGAN) google
We present Generative Adversarial Capsule Network (CapsuleGAN), a framework that uses capsule networks (CapsNets) instead of the standard convolutional neural networks (CNNs) as discriminators within the generative adversarial network (GAN) setting, while modeling image data. We provide guidelines for designing CapsNet discriminators and the updated GAN objective function, which incorporates the CapsNet margin loss, for training CapsuleGAN models. We show that CapsuleGAN outperforms convolutional-GAN at modeling the image data distribution on the MNIST dataset of handwritten digits, evaluated on the generative adversarial metric and on semi-supervised image classification. …
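The CapsNet margin loss that the updated objective incorporates is easy to state; a small base-R version with the usual CapsNet defaults (m+ = 0.9, m- = 0.1, lambda = 0.5):

```r
# Margin loss over class capsules: v_norm are capsule lengths, target is one-hot
margin_loss <- function(v_norm, target, m_pos = 0.9, m_neg = 0.1, lambda = 0.5) {
  sum(target * pmax(0, m_pos - v_norm)^2 +
      lambda * (1 - target) * pmax(0, v_norm - m_neg)^2)
}

margin_loss(v_norm = c(0.8, 0.2, 0.1), target = c(1, 0, 0))  # low loss, correct class
margin_loss(v_norm = c(0.1, 0.9, 0.1), target = c(1, 0, 0))  # high loss, wrong class
```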

MFCMT google
Discriminative Correlation Filters (DCF)-based tracking algorithms exploiting conventional handcrafted features have achieved impressive results both in terms of accuracy and robustness. Template handcrafted features have shown excellent performance, but they perform poorly when the appearance of the target changes rapidly, as with fast motions and fast deformations. In contrast, statistical handcrafted features are insensitive to fast state changes, but they yield inferior performance in scenarios with illumination variations and background clutter. In this work, to achieve an efficient tracking performance, we propose a novel visual tracking algorithm, named MFCMT, based on a complementary ensemble model with multiple features, including Histogram of Oriented Gradients (HOG), Color Names (CNs) and Color Histograms (CHs). Additionally, to improve tracking results and prevent target drift, we introduce an effective fusion method that exploits relative entropy to coalesce all basic response maps and obtain an optimal response. Furthermore, we suggest a simple but efficient update strategy to boost tracking performance. Comprehensive evaluations conducted on two tracking benchmarks demonstrate that our method is competitive with numerous state-of-the-art trackers, and that it achieves impressive performance at faster speeds on these benchmarks. …
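The paper's exact fusion rule is more involved; one plausible reading in base R (our sketch): normalize each base response map, weight it by the inverse of its relative entropy to the average map, and blend.

```r
kl <- function(p, q) sum(p * log(p / q))          # relative entropy (KL divergence)

fuse_maps <- function(maps) {                     # list of same-size response maps
  maps <- lapply(maps, function(m) m / sum(m))    # normalize to distributions
  avg  <- Reduce(`+`, maps) / length(maps)
  w    <- sapply(maps, function(m) 1 / (kl(m, avg) + 1e-12))
  w    <- w / sum(w)
  Reduce(`+`, Map(`*`, w, maps))                  # weighted "optimal" response
}

set.seed(5)
maps <- replicate(3, matrix(runif(25), 5), simplify = FALSE)
response <- fuse_maps(maps)
which(response == max(response), arr.ind = TRUE)  # predicted target location
```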

R Packages worth a look

Group Sequential Design for a Clinical Trial with Censored Survival Data (SurvGSD)
Sample size calculation utilizing the information fraction and the alpha spending function in a group sequential clinical trial with censored survival data from underlying generalized gamma survival distributions or log-logistic survival distributions. Hsu, C.-H., Chen, C.-H., Hsu, K.-N. and Lu, Y.-H. (2018) A useful design utilizing the information fraction in a group sequential clinical trial with censored survival data. To appear in Biometrics.
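The alpha spending function referenced here controls how much of the overall type I error may be spent at each interim look. A sketch of the standard Lan-DeMets O'Brien-Fleming-type spending function (our choice for illustration; the package's own interface may differ):

```r
obf_spend <- function(t, alpha = 0.05) {   # t = information fraction in (0, 1]
  2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t)))
}

info_frac <- c(0.25, 0.50, 0.75, 1.00)     # e.g. fraction of total events observed
cum_alpha <- obf_spend(info_frac)          # cumulative alpha spent at each look
diff(c(0, cum_alpha))                      # incremental alpha per interim analysis
```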

JAR Dependencies for the ‘DatabaseConnector’ Package (DatabaseConnectorJars)
Provides external JAR dependencies for the ‘DatabaseConnector’ package.

Estimate ED50 Based on Modified Turning Point Method (modTurPoint)
The turning point method was proposed by Choi (1990) <doi:10.2307/2531453> to estimate the 50 percent effective dose (ED50) in studies of drug sensitivity. Its advantage is that it provides a robust ED50 estimate. This package implements a modified version of Choi’s turning point method.
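The flavor of the turning point idea in base R (our simplification, not the package's modified estimator): in an up-and-down trial the dose sequence reverses direction around the ED50, and the doses at those reversal points are averaged.

```r
# Assumed up-and-down dose sequence: raise dose after no response, lower after one
doses <- c(1.0, 1.5, 2.0, 1.5, 2.0, 2.5, 2.0, 1.5, 2.0)

d     <- diff(doses)
turns <- which(d[-1] * d[-length(d)] < 0) + 1  # indices where the direction flips
mean(doses[turns])                             # crude ED50 estimate from turning points
```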