# Book Memo: “The Basics of Item Response Theory Using R”

This graduate-level textbook is a tutorial on item response theory that covers both the basics of the theory and the use of R for preparing graphical presentations in writing about it. Item response theory has become one of the most powerful tools used in test construction, yet one of the barriers to learning and applying it is the considerable amount of sophisticated computational effort required to illustrate even the simplest concepts. This text provides the reader access to the basic concepts of item response theory freed of the tedious underlying calculations. It is intended for those who possess limited knowledge of educational measurement and psychometrics. Rather than presenting the full scope of item response theory, this textbook is concise and practical and presents basic concepts without becoming enmeshed in underlying mathematical and computational complexities. Clearly written text and succinct R code allow anyone familiar with statistical concepts to explore and apply item response theory in a practical way. In addition to students of educational measurement, this text will be valuable to measurement specialists who work in testing programs at any level and need an understanding of item response theory in order to evaluate its potential in their settings.

# Document worth reading: “Transferrable Plausibility Model – A Probabilistic Interpretation of Mathematical Theory of Evidence”

This paper suggests a new interpretation of the Dempster-Shafer theory in terms of probabilistic interpretation of plausibility. A new rule of combination of independent evidence is shown and its preservation of interpretation is demonstrated.

# R Packages worth a look

Network Meta-Analysis using Integrated Nested Laplace Approximations (nmaINLA)
Performs network meta-analysis using integrated nested Laplace approximations (‘INLA’). Includes methods to assess the heterogeneity and inconsistency in the network. Contains more than ten different network meta-analysis data sets. The R package ‘INLA’ must be installed for this package to work. The ‘INLA’ package can be obtained from <http://www.r-inla.org>. We recommend the testing version.

Detection of Univariate Outliers (univOutl)
Well known outlier detection techniques in the univariate case. Methods to deal with skewed distribution are included too. The Hidiroglou-Berthelot (1986) method to search for outliers in ratios of historical data is implemented as well.

‘ggplot2’ Faceting Utilities for Geographical Data (geofacet)
Provides geofaceting functionality for ‘ggplot2’. Geofaceting arranges a sequence of plots of data for different geographical entities into a grid that preserves some of the geographical orientation.

Formula Interface to the Grammar of Graphics (ggformula)
Provides a formula interface to ‘ggplot2’ graphics.

Builds Trees by Sampling Variables from Groups (StratifiedRF)
Random Forest that works with groups of predictor variables. When building a tree, a number of variables is taken randomly from each group separately, thus ensuring that it contains variables from each group. Useful when rows contain information about different things (e.g. user information and product information) and it’s not sensible to make a prediction with information from only one group of variables, or when there are far more variables from one group than the other and it’s desired to have groups appear evenly on trees. Trees are grown using the C5.0 algorithm. Currently works for classification only.
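The per-group sampling step described above can be illustrated with a minimal sketch. This is not the StratifiedRF implementation or its API: the function name `sample_per_group`, the group names, and the variable names are all hypothetical, and only the idea of drawing candidate variables from each group separately is taken from the description.

```python
import random


def sample_per_group(groups, n_per_group, rng):
    """Draw n_per_group variables at random from every group separately.

    `groups` maps a (hypothetical) group name to its list of variable names.
    Sampling group by group guarantees that each group is represented among
    the candidate variables for a tree.
    """
    chosen = []
    for variables in groups.values():
        k = min(n_per_group, len(variables))  # small groups contribute what they have
        chosen.extend(rng.sample(variables, k))
    return chosen


# Hypothetical example: few user variables, many product variables.
groups = {
    "user": ["age", "country", "tenure"],
    "product": ["price", "category", "weight", "brand", "rating", "stock"],
}
rng = random.Random(0)
selected = sample_per_group(groups, 2, rng)
print(selected)
```

With plain random sampling over all nine variables, the larger product group would tend to dominate; sampling two per group keeps both groups equally present.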

# If you did not already know

Hierarchical Spectral Merger (HSM)
We present a new method for time series clustering which we call the Hierarchical Spectral Merger (HSM) method. This procedure is based on the spectral theory of time series and identifies series that share similar oscillations or waveforms. The extent of similarity between a pair of time series is measured using the total variation distance between their estimated spectral densities. At each step of the algorithm, every time two clusters merge, a new spectral density is estimated using all the information present in both clusters, which is representative of all the series in the new cluster. The method is implemented in an R package HSMClust. We present two applications of the HSM method, one to data coming from wave-height measurements in oceanography and the other to electroencephalogram (EEG) data. …
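The similarity measure mentioned above can be sketched in a few lines. The sketch makes several assumptions that are not from the abstract: it uses a raw periodogram as a crude stand-in for the spectral density estimator, normalizes each spectrum to sum to one, and uses hypothetical function names. It is not the HSMClust code.

```python
import cmath
import math


def normalized_periodogram(x):
    """Raw periodogram at the positive Fourier frequencies, scaled to sum to 1."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    power = []
    for k in range(1, n // 2 + 1):
        coef = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                   for t, c in enumerate(centered))
        power.append(abs(coef) ** 2)
    total = sum(power)
    return [p / total for p in power]


def total_variation(f, g):
    """Total variation distance between two discrete normalized spectra."""
    return 0.5 * sum(abs(a - b) for a, b in zip(f, g))


n = 128
slow = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
slow_shifted = [math.sin(2 * math.pi * 4 * t / n + 1.0) for t in range(n)]
fast = [math.sin(2 * math.pi * 20 * t / n) for t in range(n)]

# Same oscillation, different phase: distance close to 0.
d_same = total_variation(normalized_periodogram(slow),
                         normalized_periodogram(slow_shifted))
# Different oscillation frequency: distance close to 1.
d_diff = total_variation(normalized_periodogram(slow),
                         normalized_periodogram(fast))
print(d_same, d_diff)
```

Because the distance depends only on where the spectral mass sits, two series with the same waveform but different phases are treated as similar, which matches the stated goal of grouping series that share oscillations.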

FALKON
Kernel methods provide a principled way to perform nonlinear, nonparametric learning. They rely on solid functional analytic foundations and enjoy optimal statistical properties. However, at least in their basic form, they have limited applicability in large scale scenarios because of stringent computational requirements in terms of time and especially memory. In this paper, we take a substantial step in scaling up kernel methods, proposing FALKON, a novel algorithm that can efficiently process millions of points. FALKON is derived by combining several algorithmic principles, namely stochastic projections, iterative solvers and preconditioning. Our theoretical analysis shows that optimal statistical accuracy is achieved requiring essentially $O(n)$ memory and $O(n\sqrt{n})$ time. Extensive experiments show that state-of-the-art results on available large scale datasets can be achieved even on a single machine. …

Stochastic Computing based Deep Convolutional Neural Networks (SC-DCNN)
With recent advances in the Internet of Things (IoT), it has become very attractive to implement deep convolutional neural networks (DCNNs) on embedded/portable systems. Presently, executing software-based DCNNs requires high-performance server clusters in practice, restricting their widespread deployment on mobile devices. To overcome this issue, considerable research effort has gone into developing highly parallel and specific DCNN hardware, utilizing GPGPUs, FPGAs, and ASICs. Stochastic Computing (SC), which uses a bit-stream to represent a number within [-1, 1] by counting the number of ones in the bit-stream, has a high potential for implementing DCNNs with high scalability and ultra-low hardware footprint. Since multiplications and additions can be calculated using AND gates and multiplexers in SC, significant reductions in power/energy and hardware footprint can be achieved compared to the conventional binary arithmetic implementations. The tremendous savings in power (energy) and hardware resources bring about immense design space for enhancing scalability and robustness for hardware DCNNs. This paper presents the first comprehensive design and optimization framework of SC-based DCNNs (SC-DCNNs). We first present the optimal designs of function blocks that perform the basic operations, i.e., inner product, pooling, and activation function. Then we propose the optimal design of four types of combinations of basic function blocks, named feature extraction blocks, which are in charge of extracting features from input feature maps. Besides, weight storage methods are investigated to reduce the area and power/energy consumption for storing weights. Finally, the whole SC-DCNN implementation is optimized, with feature extraction blocks carefully selected, to minimize area and power/energy consumption while maintaining a high network accuracy level. …
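A small software simulation may help show why AND gates and multiplexers are enough for arithmetic on bit-streams. It is only a sketch under an explicit assumption: it uses the simpler unipolar encoding, where a value in [0, 1] is the fraction of ones, rather than the [-1, 1] range mentioned in the abstract (in that bipolar format an XNOR gate plays the multiplier role). All function names are hypothetical and none of this comes from the SC-DCNN design itself.

```python
import random


def to_bitstream(p, length, rng):
    """Encode p in [0, 1] as random bits whose fraction of ones is about p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_bitstream(bits):
    """Decode a bit-stream by counting its ones."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Bitwise AND: for independent streams, P(a AND b) = P(a) * P(b)."""
    return [a & b for a, b in zip(a_bits, b_bits)]


def sc_scaled_add(a_bits, b_bits, select_bits):
    """Multiplexer: pass a when select is 1, else b.

    With a select stream of probability 0.5 the output encodes (a + b) / 2.
    """
    return [a if s else b for a, b, s in zip(a_bits, b_bits, select_bits)]


rng = random.Random(42)
n = 100_000
a = to_bitstream(0.6, n, rng)
b = to_bitstream(0.5, n, rng)
sel = to_bitstream(0.5, n, rng)

product = from_bitstream(sc_multiply(a, b))          # approximates 0.6 * 0.5
half_sum = from_bitstream(sc_scaled_add(a, b, sel))  # approximates (0.6 + 0.5) / 2
print(product, half_sum)
```

The results are approximate and improve with longer streams, which is the usual trade-off in stochastic computing: very cheap logic in exchange for precision that depends on stream length.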

# Book Memo: “Data Analytics in Digital Humanities”

 This book covers computationally innovative methods and technologies including data collection and elicitation, data processing, data analysis, data visualizations, and data presentation. It explores how digital humanists have harnessed the hypersociality and social technologies, benefited from the open-source sharing not only of data but of code, and made technological capabilities a critical part of humanities work. Chapters are written by researchers from around the world, bringing perspectives from diverse fields and subject areas. The respective authors describe their work, their research, and their learning. Topics include semantic web for cultural heritage valorization, machine learning for parody detection by classification, psychological text analysis, crowdsourcing imagery coding in natural disasters, and creating inheritable digital codebooks. Designed for researchers and academics, this book is suitable for those interested in methodologies and analytics that can be applied in literature, history, philosophy, linguistics, and related disciplines. Professionals such as librarians, archivists, and historians will also find the content informative and instructive.

# What’s new on arXiv

Many science and engineering applications involve solving a linear least-squares system formed from field measurements. In distributed cyber-physical systems (CPS), each sensor node used for measurement often knows only some independent rows of the least-squares system. To compute the least-squares solution, the nodes need to gather all these measurements at a centralized location and then compute the solution there. This data collection and computation is inefficient because of bandwidth and time constraints, and sometimes infeasible because of data privacy concerns. Distributed computation is therefore strongly preferred or demanded in many real-world applications, e.g. smart grids and target tracking. Iterative methods are natural candidates for computing least squares for large sparse systems of linear equations, and there are many studies of them; however, most concern the efficiency of centralized/parallel computation, and only a few are explicitly about distributed computation or have the potential to be applied in distributed networks. This paper surveys the representative iterative methods from several research communities. Some of them were not originally designed for this need, so we slightly modified them to suit our requirement and maintain consistency. In this survey, we sketch the skeleton of each algorithm first and then analyze its time-to-completion and communication cost. To the best of our knowledge, this is the first survey of distributed least-squares in distributed networks.
We explore the energy landscape of a simple neural network. In particular, we expand upon previous work demonstrating that the empirical complexity of fitted neural networks is vastly less than a naive parameter count would suggest and that this implicit regularization is actually beneficial for generalization from fitted models.
Explaining the behavior of a black box machine learning model at the instance level is useful for building trust. However, what is also important is understanding how the model behaves globally. Such an understanding provides insight into both the data on which the model was trained and the generalization power of the rules it learned. We present here an approach that learns rules to explain globally the behavior of black box machine learning models. Collectively these rules represent the logic learned by the model and are hence useful for gaining insight into its behavior. We demonstrate the power of the approach on three publicly available data sets.
Recently, a technique called Layer-wise Relevance Propagation (LRP) was shown to deliver insightful explanations in the form of input space relevances for understanding feed-forward neural network classification decisions. In the present work, we extend the usage of LRP to recurrent neural networks. We propose a specific propagation rule applicable to multiplicative connections as they arise in recurrent network architectures such as LSTMs and GRUs. We apply our technique to a word-based bi-directional LSTM model on a five-class sentiment prediction task, and evaluate the resulting LRP relevances both qualitatively and quantitatively, obtaining better results than a gradient-based related method which was used in previous work.
There has been a recent resurgence in the area of explainable artificial intelligence as researchers and practitioners seek to provide more transparency to their algorithms. Much of this research is focused on explicitly explaining decisions or actions to a human observer, and it should not be controversial to say that, if these techniques are to succeed, the explanations they generate should have a structure that humans accept. However, it is fair to say that most work in explainable artificial intelligence uses only the researchers’ intuition of what constitutes a ‘good’ explanation. There exist vast and valuable bodies of research in philosophy, psychology, and cognitive science on how people define, generate, select, evaluate, and present explanations. This paper argues that the field of explainable artificial intelligence should build on this existing research, and reviews relevant papers from philosophy, cognitive psychology/science, and social psychology, which study these topics. It draws out some important findings, and discusses ways that these can be infused with work on explainable artificial intelligence.
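For the distributed least-squares setting in the first abstract of this section, a toy sketch can make the "each node only knows some rows" idea concrete. It uses plain gradient descent with summed local gradients, which is a generic illustration and not necessarily one of the surveyed methods. The node data, function names, and step-size rule are assumptions, and the loop over nodes stands in for whatever aggregation a real network would use.

```python
def local_gradient(rows, targets, x):
    """Gradient of 0.5 * ||A_i x - b_i||^2 computed from one node's rows only."""
    grad = [0.0] * len(x)
    for row, target in zip(rows, targets):
        residual = sum(a * v for a, v in zip(row, x)) - target
        for j, a in enumerate(row):
            grad[j] += a * residual
    return grad


def distributed_least_squares(nodes, dim, iterations=2000):
    """Gradient descent in which only gradients, never raw rows, are shared."""
    # The sum of squared entries bounds the largest eigenvalue of A^T A,
    # so its inverse is a safe (if conservative) step size.
    step = 1.0 / sum(a * a for rows, _ in nodes for row in rows for a in row)
    x = [0.0] * dim
    for _ in range(iterations):
        total = [0.0] * dim
        for rows, targets in nodes:  # stands in for network-wide aggregation
            g = local_gradient(rows, targets, x)
            total = [t + gi for t, gi in zip(total, g)]
        x = [xi - step * ti for xi, ti in zip(x, total)]
    return x


# Hypothetical data: two nodes, each holding two rows of the system y = 1 + 2 t.
nodes = [
    ([[1.0, 0.0], [1.0, 1.0]], [1.0, 3.0]),
    ([[1.0, 2.0], [1.0, 3.0]], [5.0, 7.0]),
]
solution = distributed_least_squares(nodes, dim=2)
print(solution)
```

Each node exchanges a vector of length `dim` per iteration instead of its measurements, which is the kind of communication cost the survey sets out to analyze.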

# Distilled News

1. Analytics is not a vaccine, but a routine workout
3. Scalability
4. Descriptive analytics is a post-mortem, does it really help
5. Human intervention in analytics is a friend and a foe too
6. Opportunity cost is huge; stale answers make dents
7. Manually intensive
8. Numerical data is analyzed, but what about categorical values
9. Users without expertise
10. Increased lead time to value
So you’re working on a text classification problem. You’re refining your training set, and maybe you’ve even tried stuff out using Naive Bayes. But now you’re feeling confident in your dataset, and want to take it one step further. Enter Support Vector Machines (SVM): a fast and dependable classification algorithm that performs very well with a limited amount of data. Perhaps you have dug a bit deeper, and ran into terms like linearly separable, kernel trick and kernel functions. But fear not! The idea behind the SVM algorithm is simple, and applying it to natural language classification doesn’t require most of the complicated stuff. Before continuing, we recommend reading our guide to Naive Bayes classifiers first, since a lot of the things regarding text processing that are said there are relevant here as well. Done? Great! Let’s move on.
Let’s talk about Meta-Learning because this is one confusing topic. I wrote a previous post about Deconstructing Meta-Learning which explored “Learning to Learn”. I realized, though, that there is another kind of Meta-Learning that practitioners are more familiar with. This kind of Meta-Learning can be understood as algorithms that search for and select different DL architectures. Hyper-parameter optimization is an instance of this; however, there are other, more elaborate algorithms that follow the same prescription of searching for architectures.
Part 3 of 3 in the series Set Theory
• Introduction to Set Theory and Sets with R
• Set Operations Unions and Intersections in R
• Set Theory Arbitrary Union and Intersection Operations with R
Power BI has long had the capability to include custom R charts in dashboards and reports. But in sharp contrast to standard Power BI visuals, these R charts were static. While R charts would update when the report data was refreshed or filtered, it wasn’t possible to interact with an R chart on the screen (to display tool-tips, for example).
OpenCV is an incredibly powerful tool to have in your toolbox. I have had a lot of success using it in Python but very little success in R. I haven’t done too much other than searching Google but it seems as if “imager” and “videoplayR” provide a lot of the functionality but not all of it. I have never actually called Python functions from R before. Initially, I tried the “rPython” library – that has a lot of advantages, but was completely unnecessary for me so system() worked absolutely fine. While this example is extremely simple, it should help to illustrate how easy it is to utilize the power of Python from within R. I need to give credit to Harrison Kinsley for all of his efforts and work at PythonProgramming.net – I used a lot of his code and ideas for this post (especially the Python portion). Using videoplayR I created a function which would take a picture with my webcam and save it as “originalWebcamShot.png”
Data wrangling is a task of great importance in data analysis. It is the process of importing, cleaning and transforming raw data into actionable information for analysis. It is a time-consuming process, estimated to take about 60-80% of an analyst’s time. In this series we will go through this process. It will be a brief series with the goal of building the reader’s data wrangling skills. This is the second part of the series, and it covers reshaping data into a tidy form. By tidy form, we mean that each feature forms a column and each observation forms a row.
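The tidy form described in the last item can be shown with a tiny reshaping sketch. The table, column names, and the helper `to_long` are made up for illustration and are not taken from the linked series, which works in R.

```python
# A "wide" table: one row per country, one column per year (hypothetical data).
wide = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def to_long(rows, id_key, var_name, value_name):
    """Reshape wide rows so that each output row is a single observation."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is repeated on every output row
            long_rows.append({id_key: row[id_key], var_name: key, value_name: value})
    return long_rows


tidy = to_long(wide, "country", "year", "cases")
for record in tidy:
    print(record)
```

In the wide table the year is hidden in the column names. After reshaping, `year` and `cases` are ordinary columns and each row is one country-year observation.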

# R Packages worth a look

R Session Information (sessioninfo)
Query and print information about the current R session. It is similar to ‘utils::sessionInfo()’, but includes more information about packages, and where they were installed from.

Convert Tibbles or Data Frames to Xts Easily (tbl2xts)
Facilitate the movement from data frames to ‘xts’. Particularly useful when moving from ‘tidyverse’ to the widely used ‘xts’ package, which is the input format of choice to various other packages. It also allows the user to use a ‘spread_by’ argument for a character column ‘xts’ conversion.

2-Stage Clinical Trial Design and Analysis (preference)
Design and analyze two-stage randomized trials with a continuous outcome measure. The package contains functions to compute the sample size needed to detect a given preference, treatment, and selection effect; alternatively, the package contains functions that can report the study power given a fixed sample size. Finally, analysis functions are provided to test each effect using either summary data (i.e. means, variances) or raw study data.

Construct Process Maps Using Event Data (processmapR)
Visualize process maps based on event logs, in the form of directed graphs. Part of the ‘bupaR’ framework.

Estimates, Plots and Evaluates Leaf Angle Distribution Functions, Calculates Extinction Coefficients (RLeafAngle)
Leaf angle distribution is described by a number of functions (e.g. ellipsoidal, Beta and rotated ellipsoidal). The parameters of leaf angle distribution functions are estimated through different empirical relationships. This package estimates the parameters of different leaf angle distribution functions, plots and evaluates these functions, and calculates extinction coefficients for a given leaf angle distribution. Reference: Wang (2007) <doi:10.1016/j.agrformet.2006.12.003>.