F# F# (pronounced eff sharp) is a strongly typed, multi-paradigm programming language that encompasses functional, imperative, and object-oriented programming techniques. F# is most often used as a cross-platform CLI language, but can also be used to generate JavaScript and GPU code. F# is developed by the F# Software Foundation, Microsoft and open contributors. An open source, cross-platform compiler for F# is available from the F# Software Foundation. F# is also a fully supported language in Visual Studio and Xamarin Studio. Other tools supporting F# development include Mono, MonoDevelop, SharpDevelop and WebSharper. F# originated from ML and has been influenced by OCaml, C#, Python, Haskell, Scala and Erlang. F1 Score Facebook 20 Tasks(FB20) Factor Analysis Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in four observed variables mainly reflect the variations in two unobserved variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modelled as linear combinations of the potential factors, plus “error” terms. The information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset. Factor Graph A factor graph is a bipartite graph representing the factorization of a function. In probability theory and its applications, factor graphs are used to represent factorization of a probability distribution function, enabling efficient computations, such as the computation of marginal distributions through the sum-product algorithm. One of the important success stories of factor graphs and the sum-product algorithm is the decoding of capacity-approaching error-correcting codes, such as LDPC and turbo codes. Factor graphs generalize constraint graphs. A factor whose value is either 0 or 1 is called a constraint. A constraint graph is a factor graph where all factors are constraints. The max-product algorithm for factor graphs can be viewed as a generalization of the arc-consistency algorithm for constraint processing. FactorBase We describe FactorBase, a new SQL-based framework that leverages a relational database management system to support multi-relational model discovery. A multi-relational statistical model provides an integrated analysis of the heterogeneous and interdependent data resources in the database. We adopt the BayesStore design philosophy: statistical models are stored and managed as first-class citizens inside a database. Whereas previous systems like BayesStore support multi-relational inference, FactorBase supports multi-relational learning. A case study on six benchmark databases evaluates how our system supports a challenging machine learning application, namely learning a first-order Bayesian network model for an entire database. Model learning in this setting has to examine a large number of potential statistical associations across data tables. Our implementation shows how the SQL constructs in FactorBase facilitate the fast, modular, and reliable development of highly scalable model learning systems. Factorial Hidden Markov Models(FHMM) We present a framework for learning in hidden Markov models with distributed state representations. Within this framework , we derive a learning algorithm based on the Expectation-Maximization (EM) procedure for maximum likelihood estimation. Analogous to the standard Baum-Welch update rules, the M-step of our algorithm is exact and can be solved analytically. However, due to the combinatorial nature of the hidden state representation, the exact E-step is intractable. A simple and tractable mean field approximation is derived. Empirical results on a set of problems suggest that both the mean field approximation and Gibbs sampling are viable alternatives to the computationally expensive exact algorithm. Factorisation Autoencoder(FAE) Factorization Machines(FM) In this paper, we introduce Factorization Machines (FM) which are a new model class that combines the advantages of Support Vector Machines (SVM) with factorization models. Like SVMs, FMs are a general predictor working with any real valued feature vector. In contrast to SVMs, FMs model all interactions between variables using factorized parameters. Thus they are able to estimate interactions even in problems with huge sparsity (like recommender systems) where SVMs fail. We show that the model equation of FMs can be calculated in linear time and thus FMs can be optimized directly. So unlike nonlinear SVMs, a transformation in the dual form is not necessary and the model parameters can be estimated directly without the need of any support vector in the solution. We show the relationship to SVMs and the advantages of FMs for parameter estimation in sparse settings. On the other hand there are many different factorization models like matrix factorization, parallel factor analysis or specialized models like SVD++, PITF or FPMC. The drawback of these models is that they are not applicable for general prediction tasks but work only with special input data. Furthermore their model equations and optimization algorithms are derived individually for each task. We show that FMs can mimic these models just by specifying the input data (i.e. the feature vectors). This makes FMs easily applicable even for users without expert knowledge in factorization models. libFM: Factorization Machine Library Fader Network This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space. As a result, after training, our model can generate different realistic versions of an input image by varying the attribute values. By using continuous attribute values, we can choose how much a specific attribute is perceivable in the generated image. This property could allow for applications where users can modify an image using sliding knobs, like faders on a mixing console, to change the facial expression of a portrait, or to update the color of some objects. Compared to the state-of-the-art which mostly relies on training adversarial networks in pixel space by altering attribute values at train time, our approach results in much simpler training schemes and nicely scales to multiple attributes. We present evidence that our model can significantly change the perceived value of the attributes while preserving the naturalness of images. Failure Rate Failure rate is the frequency with which an engineered system or component fails, expressed, for example, in failures per hour. It is often denoted by the Greek letter lambda and is important in reliability engineering. The failure rate of a system usually depends on time, with the rate varying over the life cycle of the system. For example, an automobile’s failure rate in its fifth year of service may be many times greater than its failure rate during its first year of service. One does not expect to replace an exhaust pipe, overhaul the brakes, or have major transmission problems in a new vehicle. In practice, the mean time between failures (MTBF, 1/lambda) is often reported instead of the failure rate. This is valid and useful if the failure rate may be assumed constant – often used for complex units / systems, electronics – and is a general agreement in some reliability standards (Military and Aerospace). It does in this case only relate to the flat region of the bathtub curve, also called the ‘useful life period’. Because of this, it is incorrect to extrapolate MTBF to give an estimate of the service life time of a component, which will typically be much less than suggested by the MTBF due to the much higher failure rates in the ‘end-of-life wearout’ part of the ‘bathtub curve’. The reason for the preferred use for MTBF numbers is that the use of large positive numbers (such as 2000 hours) is more intuitive and easier to remember than very small numbers (such as 0.0005 per hour). The MTBF is an important system parameter in systems where failure rate needs to be managed, in particular for safety systems. The MTBF appears frequently in the engineering design requirements, and governs frequency of required system maintenance and inspections. In special processes called renewal processes, where the time to recover from failure can be neglected and the likelihood of failure remains constant with respect to time, the failure rate is simply the multiplicative inverse of the MTBF (1/lambda). A similar ratio used in the transport industries, especially in railways and trucking is ‘mean distance between failures’, a variation which attempts to correlate actual loaded distances to similar reliability needs and practices. Failure rates are important factors in the insurance, finance, commerce and regulatory industries and fundamental to the design of safe systems in a wide variety of applications. Failure Time Analysis Fair Top-k Ranking(FA*IR) We present a formal problem definition and an algorithm to solve the Fair Top-k Ranking problem. The problem consists of creating a ranking of k elements out of a pool of n >> k candidates. The objective is to maximize utility, and maximization is subject to a ranked group fairness constraint. Our definition of ranked group fairness uses the standard notion of protected group to extend the concept of group fairness. It ensures that every prefix of the rank contains a number of protected candidates that is statistically indistinguishable from a given target proportion, or exceeds it. The utility objective favors rankings in which every candidate included in the ranking is more qualified than any candidate not included, and rankings in which candidates are sorted by decreasing qualifications. We describe an efficient algorithm for this problem, which is tested on a series of existing datasets, as well as new datasets. Experimentally, this approach yields a ranking that is similar to the so-called ‘color-blind’ ranking, while respecting the fairness criteria. To the best of our knowledge, FA*IR is the first algorithm grounded in statistical tests that can be used to mitigate biases in ranking against an under-represented group. FALKON Kernel methods provide a principled way to perform non linear, nonparametric learning. They rely on solid functional analytic foundations and enjoy optimal statistical properties. However, at least in their basic form, they have limited applicability in large scale scenarios because of stringent computational requirements in terms of time and especially memory. In this paper, we take a substantial step in scaling up kernel methods, proposing FALKON, a novel algorithm that allows to efficiently process millions of points. FALKON is derived combining several algorithmic principles, namely stochastic projections, iterative solvers and preconditioning. Our theoretical analysis shows that optimal statistical accuracy is achieved requiring essentially $O(n)$ memory and $O(n\sqrt{n})$ time. Extensive experiments show that state of the art results on available large scale datasets can be achieved even on a single machine. False Discovery Rate(FDR) False discovery rate (FDR) control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. In a list of findings (i.e. studies where the null-hypotheses are rejected), FDR procedures are designed to control the expected proportion of incorrectly rejected null hypotheses (“false discoveries”). FDR controlling procedures exert a less stringent control over false discovery compared to familywise error rate (FWER) procedures (such as the Bonferroni correction), which seek to reduce the probability of even one false discovery, as opposed to the expected proportion of false discoveries. Thus FDR procedures have greater power at the cost of increased rates of type I errors, i.e., rejecting the null hypothesis of no effect when it should fail to be rejected. LocFDRPois False Nearest Neighbor(FNN) The false nearest neighbor algorithm is an algorithm for estimating the embedding dimension. The concept was proposed by Kennel et al. The main idea is to examine how the number of neighbors of a point along a signal trajectory change with increasing embedding dimension. In too low an embedding dimension, many of the neighbors will be false, but in an appropriate embedding dimension or higher, the neighbors are real. With increasing dimension, the false neighbors will no longer be neighbors. Therefore, by examining how the number of neighbors change as a function of dimension, an appropriate embedding can be determined. False Positive Rate In statistics, when performing multiple comparisons, the term false positive ratio, also known as the false alarm ratio, usually refers to the probability of falsely rejecting the null hypothesis for a particular test. The false positive rate (or “false alarm rate”) usually refers to the expectancy of the false positive ratio. Familywise Error Rate(FWER) In statistics, familywise error rate (FWER) is the probability of making one or more false discoveries, or type I errors, among all the hypotheses when performing multiple hypotheses tests. Coarse-to-fine Multiple Testing Strategies Fan Chart In time series analysis, a fan chart is a chart that joins a simple line chart for observed past data, by showing ranges for possible values of future data together with a line showing a central estimate or most likely value for the future outcomes. As predictions become increasingly uncertain the further into the future one goes, these forecast ranges spread out, creating distinctive wedge or ‘fan’ shapes, hence the term. Alternative forms of the chart can also include uncertainty for past data, such as preliminary data that is subject to revision. The term ‘fan chart’ was coined by the Bank of England, which has been using these charts and this term since 1997 in its ‘Inflation Report’ to describe its best prevision of future inflation to the general public. Fan charts have been used extensively in finance and monetary policy, for instance to represent forecasts of inflation. fanplot Farewells Linear Increments Model(FLIM) FLIM fits linear models for the observed increments in a longitudinal dataset, and imputes missing values according to the models. FLIM Fast Alternating Minimization(FAM) Fast and Frugal Trees(FFT) Fast and Frugal Trees (FFTs) are very simply decision trees for classifying cases (i.e.; breast cancer patients) into one of two classes (e.g.; no cancer vs. true cancer). FFTs can be preferable to more complex algorithms (such as logistic regression) because they are easy to communicate and implement, and are robust against noisy data. FFTrees Fast Boosted Decision Trees(FastBDT) Stochastic gradient-boosted decision trees are widely employed for multivariate classification and regression tasks. This paper presents a speed-optimized and cache-friendly implementation for multivariate classification called FastBDT. FastBDT is one order of magnitude faster during the fitting-phase and application-phase, in comparison with popular implementations in software frameworks like TMVA, scikit-learn and XGBoost. The concepts used to optimize the execution time and performance studies are discussed in detail in this paper. The key ideas include: An equal-frequency binning on the input data, which allows replacing expensive floating-point with integer operations, while at the same time increasing the quality of the classification; a cache-friendly linear access pattern to the input data, in contrast to usual implementations, which exhibit a random access pattern. FastBDT provides interfaces to C/C++, Python and TMVA. It is extensively used in the field of high energy physics by the Belle II experiment. Fast Compressed Neural Networks(FCNN) FCNN (Fast Compressed Neural Networks) is a free open source C++ library for Artificial Neural Network computations. It is easy to use and extend, written in modern C++ and is very fast (to author’s best knowledge it is the fastest freely available neural network library). All FCNN classes are templated to support both single and double precision computations. Main features are listed under Features tab. Internal representation of network in FCNN differs from all other libraries allowing true code modularisation with simultaneous speed improvements. FCNN4R Fast Data Fast Data is ‘data in motion’, data in the process of being collected or moved between applications as part of a transaction or business process flow. Fast Data is real-time data not yet stored as big data. It offers an opportunity for immediate response based on insights derived from deep analytics of incoming data streams. Fast Data processing sits in front of the big data fire hose, sifting through the massive amounts of incoming information to identify actionable business opportunities or threats. Fast Library for Approximate Nearest Neighbors(FLANN) FLANN is a library for performing fast approximate nearest neighbor searches in high dimensional spaces. It contains a collection of algorithms we found to work best for nearest neighbor search and a system for automatically choosing the best algorithm and optimum parameters depending on the dataset. FLANN is written in C++ and contains bindings for the following languages: C, MATLAB and Python. Fast Linear Iterative Clustering(FLIC) Benefiting from its high efficiency and simplicity, Simple Linear Iterative Clustering (SLIC) remains one of the most popular over-segmentation tools. However, due to explicit enforcement of spatial similarity for region continuity, the boundary adaptation of SLIC is sub-optimal. It also has drawbacks on convergence rate as a result of both the fixed search region and separately doing the assignment step and the update step. In this paper, we propose an alternative approach to fix the inherent limitations of SLIC. In our approach, each pixel actively searches its corresponding segment under the help of its neighboring pixels, which naturally enables region coherence without being harmful to boundary adaptation. We also jointly perform the assignment and update steps, allowing high convergence rate. Extensive evaluations on Berkeley segmentation benchmark verify that our method outperforms competitive methods under various evaluation metrics. It also has the lowest time cost among existing methods (approximately 30fps for a 481×321 image on a single CPU core). Fast Rotation Forest Ensemble approaches in classification are a very popular research area in recent years. An ensemble consists of a set of individual classifiers such as neural networks or decision trees whose predictions are combined for classifying new instances. A method is used here for generating classifier ensembles based on feature extraction. In the base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. It is a technique that is useful for the extraction and classification of data. The purpose is to reduce the dimensionality of a data set. Then the Decision tree is used to classify the data set. Rotation Forest and Extended Space Forest algorithms are used to calculate the accuracy. A novel approach Fast Rotation Forest is introduced to enrich the accuracy rate. The idea of the fast rotation approach is to encourage simultaneously individual accuracy and specificity within the ensemble. By comparing Random forest and Extended Space Forest, Fast Rotation Forest yields high accuracy. Fast Similarity Search(FastSS) Fast Similarity Search (FastSS) performs an exhaustive similarity search in a dictionary, based on the edit distance model of string similarity. The algorithm uses deletions to model the edit distance. For a dictionary containing n words, and given a maximum number of spelling errors k, FastSS creates an index of all n words containing up to k deletions. At search time each query is mutated to generate a deletion neighborhood, which is compared to the indexed deletion dictionary. FastICA FastICA is an efficient and popular algorithm for independent component analysis invented by Aapo Hyvaerinen at Helsinki University of Technology. The algorithm is based on a fixed-point iteration scheme maximizing non-Gaussianity as a measure of statistical independence. It can also be derived as an approximative Newton iteration. Fast-Slow Recurrent Neural Networks(FS-RNN) Processing sequential data of variable length is a major challenge in a wide range of applications, such as speech recognition, language modeling, generative image modeling and machine translation. Here, we address this challenge by proposing a novel recurrent neural network (RNN) architecture, the Fast-Slow RNN (FS-RNN). The FS-RNN incorporates the strengths of both multiscale RNNs and deep transition RNNs as it processes sequential data on different timescales and learns complex transition functions from one time step to the next. We evaluate the FS-RNN on two character level language modeling data sets, Penn Treebank and Hutter Prize Wikipedia, where we improve state of the art results to $1.19$ and $1.25$ bits-per-character (BPC), respectively. In addition, an ensemble of two FS-RNNs achieves $1.20$ BPC on Hutter Prize Wikipedia outperforming the best known compression algorithm with respect to the BPC measure. We also present an empirical investigation of the learning and network dynamics of the FS-RNN, which explains the improved performance compared to other RNN architectures. Our approach is general as any kind of RNN cell is a possible building block for the FS-RNN architecture, and thus can be flexibly applied to different tasks. fastText fastText is a library for efficient learning of word representations and sentence classification. Analysis and Optimization of fastText Linear Text Classifier Fault Tree Analysis(FTA) Fault tree analysis (FTA) is a top down, deductive failure analysis in which an undesired state of a system is analyzed using Boolean logic to combine a series of lower-level events. This analysis method is mainly used in the fields of safety engineering and reliability engineering to understand how systems can fail, to identify the best ways to reduce risk or to determine (or get a feeling for) event rates of a safety accident or a particular system level (functional) failure. FTA is used in the aerospace, nuclear power, chemical and process, pharmaceutical, petrochemical and other high-hazard industries; but is also used in fields as diverse as risk factor identification relating to social service system failure. FTA is also used in software engineering for debugging purposes and is closely related to cause-elimination technique used to detect bugs. In aerospace, the more general term ‘system Failure Condition’ is used for the ‘undesired state’ / Top event of the fault tree. These conditions are classified by the severity of their effects. The most severe conditions require the most extensive fault tree analysis. These ‘system Failure Conditions’ and their classification are often previously determined in the functional Hazard analysis. Fault Tree Analysis (FTA): Concepts and Applications Fay Herriot Model smallarea Feature Bagging-based Outlier Detection(FBOD) In this paper, a novel feature bagging approach for detecting outliers in very large, high dimensional and noisy databases is proposed. It combines results from multiple outlier detection algorithms that are applied using different set of features. Every outlier detection algorithm uses a small subset of features that are randomly selected from the original feature set. As a result, each outlier detector identifies different outliers, and thus assigns to all data records outlier scores that correspond to their probability of being outliers. The outlier scores computed by the individual outlier detection algorithms are then combined in order to find the better quality outliers. Experiments performed on several synthetic and real life data sets show that the proposed methods for combining outputs from multiple outlier detection algorithms provide non-trivial improvements over the base algorithm. HighDimOut Feature Engineering Feature engineering is the process of determining which predictor variables will contribute the most to the predictive power of a machine learning algorithm. There are two commonly used methods for making this selection – the Forward Selection Procedure starts with no variables in the model. You then iteratively add variables and test the predictive accuracy of the model until adding more variables no longer makes a positive effect. Next, the Backward Elimination Procedure begins with all the variables in the model. You proceed by removing variables and testing the predictive accuracy of the model. Feature Engineering Wrapper(FEW) We propose a general wrapper for feature learning that interfaces with other machine learning methods to compose effective data representations. The proposed feature engineering wrapper (FEW) uses genetic programming to represent and evolve individual features tailored to the machine learning method with which it is paired. In order to maintain feature diversity,lexicase survival is introduced, a method based on lexicase selection. This survival method preserves semantically unique individuals in the population based on their ability to solve difficult subsets of training cases, thereby yielding a population of uncorrelated features. We demonstrate FEW with five different off-the-shelf machine learning methods and test it on a set of real-world and synthetic regression problems with dimensions varying across three orders of magnitude. The results show that FEW is able to improve model test predictions across problems for several ML methods. We discuss and test the scalability of FEW in comparison to other feature composition strategies, most notably polynomial feature expansion. Feature Evolvable Streaming Learning Learning with streaming data has attracted much attention during the past few years. Though most studies consider data stream with fixed features, in real practice the features may be evolvable. For example, features of data gathered by limited-lifespan sensors will change when these sensors are substituted by new ones. In this paper, we propose a novel learning paradigm: Feature Evolvable Streaming Learning where old features would vanish and new features will occur. Rather than relying on only the current features, we attempt to recover the vanished features and exploit it to improve performance. Specifically, we learn two models from the recovered features and the current features, respectively. To benefit from the recovered features, we develop two ensemble methods. In the first method, we combine the predictions from two models and theoretically show that with assistance of old features, the performance on new features can be improved. In the second approach, we dynamically select the best single prediction and establish a better performance guarantee when the best model switches. Experiments on both synthetic and real data validate the effectiveness of our proposal. Feature Learning Feature learning or representation learning is a set of techniques in machine learning that learn a transformation of “raw” inputs to a representation that can be effectively exploited in a supervised learning task such as classification. Feature learning algorithms themselves may be either unsupervised or supervised, and include autoencoders, dictionary learning, matrix factorization, restricted Boltzmann machines and various form of clustering. Feature Scaling Feature scaling is a method used to standardize the range of independent variables or features of data. In data processing, it is also known as data normalization and is generally performed during the data preprocessing step. http://…/2014_about_feature_scaling.html Feature Selection In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features for use in model construction. The central assumption when using a feature selection technique is that the data contains many redundant or irrelevant features. Redundant features are those which provide no more information than the currently selected features, and irrelevant features provide no useful information in any context. Feature selection techniques are a subset of the more general field of feature extraction. FEAture Selection for compilation Tasks(FEAST) The success of the application of machine-learning techniques to compilation tasks can be largely attributed to the recent development and advancement of program characterization, a process that numerically or structurally quantifies a target program. While great achievements have been made in identifying key features to characterize programs, choosing a correct set of features for a specific compiler task remains an ad hoc procedure. In order to guarantee a comprehensive coverage of features, compiler engineers usually need to select excessive number of features. This, unfortunately, would potentially lead to a selection of multiple similar features, which in turn could create a new problem of bias that emphasizes certain aspects of a program’s characteristics, hence reducing the accuracy and performance of the target compiler task. In this paper, we propose FEAture Selection for compilation Tasks (FEAST), an efficient and automated framework for determining the most relevant and representative features from a feature pool. Specifically, FEAST utilizes widely used statistics and machine-learning tools, including LASSO, sequential forward and backward selection, for automatic feature selection, and can in general be applied to any numerical feature set. This paper further proposes an automated approach to compiler parameter assignment for assessing the performance of FEAST. Intensive experimental results demonstrate that, under the compiler parameter assignment task, FEAST can achieve comparable results with about 18% of features that are automatically selected from the entire feature pool. We also inspect these selected features and discuss their roles in program execution. Feature Squeezing Although deep neural networks (DNNs) have achieved great success in many computer vision tasks, recent studies have shown they are vulnerable to adversarial examples. Such examples, typically generated by adding small but purposeful distortions, can frequently fool DNN models. Previous studies to defend against adversarial examples mostly focused on refining the DNN models. They have either shown limited success or suffer from the expensive computation. We propose a new strategy, \emph{feature squeezing}, that can be used to harden DNN models by detecting adversarial examples. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing a DNN model’s prediction on the original input with that on the squeezed input, feature squeezing detects adversarial examples with high accuracy and few false positives. This paper explores two instances of feature squeezing: reducing the color bit depth of each pixel and smoothing using a spatial filter. These strategies are straightforward, inexpensive, and complementary to defensive methods that operate on the underlying model, such as adversarial training. FeatureFu FeatureFu contains a collection of library/tools for advanced feature engineering, such as using extended s-expression based feature transformation, to derive features on top of other features, or convert a light weighted model (logistical regression or decision tree) into a feature, in an intuitive way without touching any code. Feature-Level Domain Adaptation(FLDA) Domain adaptation is the supervised learning setting in which the training and test data originate from different domains: the so-called source and target domains. In this paper, we propose and study a domain adaption approach, called feature-level domain adaptation (flda), that models the dependence between two domains by means of a feature-level transfer distribution. The domain adapted classifier is trained by minimizing the expected loss under this transfer distribution. Our empirical evaluation of flda focuses on problems with binary and count features in which the domain adaptation can be naturally modeled via a dropout distribution, which allows the final classifier to adapt to the importance of specific features in the target data. Our experimental evaluation suggests that under certain conditions, flda converges to the classifier trained on the target distribution. Experiments with our domain adaptation approach on several real-world problems show that flda performs on par with state-of-the-art techniques in domain adaptation. Feedback Networks Currently, the most successful learning models in computer vision are based on learning successive representations followed by a decision layer. This is usually actualized through feedforward multilayer neural networks, e.g. ConvNets, where each layer forms one of such successive representations. However, an alternative that can achieve the same goal is a feedback based approach in which the representation is formed in an iterative manner based on a feedback received from previous iteration’s output. We establish that a feedback based approach has several fundamental advantages over feedforward: it enables making early predictions at the query time, its output naturally conforms to a hierarchical structure in the label space (e.g. a taxonomy), and it provides a new basis for Curriculum Learning. We observe that feedback networks develop a considerably different representation compared to feedforward counterparts, in line with the aforementioned advantages. We put forth a general feedback based learning architecture with the endpoint results on par or better than existing feedforward networks with the addition of the above advantages. We also investigate several mechanisms in feedback architectures (e.g. skip connections in time) and design choices (e.g. feedback length). We hope this study offers new perspectives in quest for more natural and practical learning models. Feedforward Neural Network Language Model(NNLM) The probabilistic feedforward neural network language model has been proposed. It consists of input, projection, hidden and output layers. At the input layer, N previous words are encoded using 1-of-V coding, where V is size of the vocabulary. The input layer is then projected to a projection layer P that has dimensionality ND, using a shared projection matrix. As only N inputs are active at any given time, composition of the projection layer is a relatively cheap operation. The NNLM architecture becomes complex for computation between the projection and the hidden layer, as values in the projection layer are dense. For a common choice of N = 10, the size of the projection layer (P) might be 500 to 2000, while the hidden layer size H is typically 500 to 1000 units. Moreover, the hidden layer is used to compute probability distribution over all the words in the vocabulary, resulting in an output layer with dimensionality V. Thus, the computational complexity per each training example is Q = find + NDH + HV; where the dominating term is HV. However, several practical solutions were proposed for avoiding it; either using hierarchical versions of the softmax, or avoiding normalized models completely by using models that are not normalized during training. With binary tree representations of the vocabulary, the number of output units that need to be evaluated can go down to around log2(V). Thus, most of the complexity is caused by the term NDH. Feedforward Sequential Memory Networks(FSMN) We introduce a new structure for memory neural networks, called feedforward sequential memory networks (FSMN), which can learn long-term dependency without using recurrent feedback. The proposed FSMN is a standard feedforward neural networks equipped with learnable sequential memory blocks in the hidden layers. In this work, we have applied FSMN to several language modeling (LM) tasks. Experimental results have shown that the memory blocks in FSMN can learn effective representations of long history. Experiments have shown that FSMN based language models can significantly outperform not only feedforward neural network (FNN) based LMs but also the popular recurrent neural network (RNN) LMs. Fence Methods This method is a new class of model selection strategies, for mixed model selection, which includes linear and generalized linear mixed models. The idea involves a procedure to isolate a subgroup of what are known as correct models (of which the optimal model is a member). This is accomplished by constructing a statistical fence, or barrier, to carefully eliminate incorrect models. Once the fence is constructed, the optimal model is selected from among those within the fence according to a criterion which can be made flexible. References: 1. Jiang J., Rao J.S., Gu Z., Nguyen T. (2008), Fence Methods for Mixed Model Selection. The Annals of Statistics, 36(4): 1669-1692. . 2. Jiang J., Nguyen T., Rao J.S. (2009), A Simplified Adaptive Fence Procedure. Statistics and Probability Letters, 79, 625-629. 3. Jiang J., Nguyen T., Rao J.S. (2010), Fence Method for Nonparametric Small Area Estimation. Survey Methodology, 36(1), 3-11. . 4. Jiming Jiang, Thuan Nguyen and J. Sunil Rao (2011), Invisible fence methods and the identification of differentially expressed gene sets. Statistics and Its Interface, Volume 4, 403-415. . 5. Thuan Nguyen & Jiming Jiang (2012), Restricted fence method for covariate selection in longitudinal data analysis. Biostatistics, 13(2), 303-314. . 6. Thuan Nguyen, Jie Peng, Jiming Jiang (2014), Fence Methods for Backcross Experiments. Statistical Computation and Simulation, 84(3), 644-662. . 7. Jiang, J. (2014), The fence methods, in Advances in Statistics, Hindawi Publishing Corp., Cairo. . 8. Jiming Jiang and Thuan Nguyen (2015), The Fence Methods, World Scientific, Singapore. . fence FeUdal Networks(FuNs) We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical reinforcement learning. Our approach is inspired by the feudal reinforcement learning proposal of Dayan and Hinton, and gains power and efficacy by decoupling end-to-end learning across multiple levels — allowing it to utilise different resolutions of time. Our framework employs a Manager module and a Worker module. The Manager operates at a lower temporal resolution and sets abstract goals which are conveyed to and enacted by the Worker. The Worker generates primitive actions at every tick of the environment. The decoupled structure of FuN conveys several benefits — in addition to facilitating very long timescale credit assignment it also encourages the emergence of sub-policies associated with different goals set by the Manager. These properties allow FuN to dramatically outperform a strong baseline agent on tasks that involve long-term credit assignment or memorisation. We demonstrate the performance of our proposed system on a range of tasks from the ATARI suite and also from a 3D DeepMind Lab environment. Few-Shot Classification In few-shot classification a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each new class. ➘ “One-Shot Learning” Field-aware Factorization Machines(FFM) Field-aware factorization machines (FFM) have been used to win two click-through rate prediction competitions hosted by Criteo and Avazu. In these slides we introduce the formulation of FFM together with well known linear model, degree-2 polynomial model, and factorization machines. FiloDB FiloDB is a new open-source distributed, versioned, and columnar analytical database designed for modern streaming workloads. • Distributed – FiloDB is designed from the beginning to run on best-of-breed distributed, scale-out storage platforms such as Apache Cassandra. Queries run in parallel in Apache Spark for scale-out ad-hoc analysis. • Columnar – FiloDB brings breakthrough performance levels for analytical queries by using a columnar storage layout with different space-saving techniques like dictionary compression. True columnar querying techniques are on the roadmap. The current performance is comparable to Parquet, and one to two orders of magnitude faster than Spark on Cassandra 2.x for analytical queries. For the POC performance comparison, please see cassandra-gdelt repo. • Versioned – At the same time, row-level, column-level operations and built in versioning gives FiloDB far more flexibility than can be achieved using file-based technologies like Parquet alone. • Designed for streaming – Enable easy exactly-once ingestion from Kafka for streaming events, time series, and IoT applications – yet enable extremely fast ad-hoc analysis using the ease of use of SQL. Each row is keyed by a partition and sort key, and writes using the same key are idempotent. FiloDB does the hard work of keeping data stored in an efficient and sorted format. FiloDB is easy to use! You can use Spark SQL for both ingestion (including from Streaming!) and querying. Connect Tableau or any other JDBC analysis tool to Spark SQL, and easily ingest data from any source with Spark support(JSON, CSV, traditional database, Kafka, etc.) FiloDB is a great fit for bulk analytical workloads, or streaming / event data. It is not optimized for heavily transactional, update-oriented workflows. Introducing FiloDB Filter Bubble A filter bubble is a result of a personalized search in which a website algorithm selectively guesses what information a user would like to see based on information about the user (such as location, past click behavior and search history) and, as a result, users become separated from information that disagrees with their viewpoints, effectively isolating them in their own cultural or ideological bubbles. Prime examples are Google Personalized Search results and Facebook’s personalized news stream. The term was coined by internet activist Eli Pariser in his book by the same name; according to Pariser, users get less exposure to conflicting viewpoints and are isolated intellectually in their own informational bubble. Pariser related an example in which one user searched Google for “BP” and got investment news about British Petroleum while another searcher got information about the Deepwater Horizon oil spill and that the two search results pages were “strikingly different”. The bubble effect may have negative implications for civic discourse, according to Pariser, but there are contrasting views suggesting the effect is minimal and addressable. Filtering Variational Objectives(FIVOs) The evidence lower bound (ELBO) appears in many algorithms for maximum likelihood estimation (MLE) with latent variables because it is a sharp lower bound of the marginal log-likelihood. For neural latent variable models, optimizing the ELBO jointly in the variational posterior and model parameters produces state-of-the-art results. Inspired by the success of the ELBO as a surrogate MLE objective, we consider the extension of the ELBO to a family of lower bounds defined by a Monte Carlo estimator of the marginal likelihood. We show that the tightness of such bounds is asymptotically related to the variance of the underlying estimator. We introduce a special case, the filtering variational objectives (FIVOs), which takes the same arguments as the ELBO and passes them through a particle filter to form a tighter bound. FIVOs can be optimized tractably with stochastic gradients, and are particularly suited to MLE in sequential latent variable models. In standard sequential generative modeling tasks we present uniform improvements over models trained with ELBO, including some whole nat-per-timestep improvements. Firefly Algorithm(FA) The firefly algorithm (FA) is a metaheuristic algorithm, inspired by the flashing behaviour of fireflies. The primary purpose for a firefly’s flash is to act as a signal system to attract other fireflies. Xin-She Yang formulated this firefly algorithm by assuming: 1.All fireflies are unisexual, so that one firefly will be attracted to all other fireflies; 2.Attractiveness is proportional to their brightness, and for any two fireflies, the less bright one will be attracted by (and thus move to) the brighter one; however, the brightness can decrease as their distance increases; 3.If there are no fireflies brighter than a given firefly, it will move randomly. The brightness should be associated with the objective function. Firefly algorithm is a nature-inspired metaheuristic optimization algorithm. First Story Detection(FSD) Given a series of documents, first story is defined as the first document to discuss a specific event, which occurred at a particular time and place. First story detection (FSD) was firstly defined byAllan in 2002 in terms of topic detection and tracking. Fisher Vector encoding with Variational Auto-Encoder(FV-VAE) Deep convolutional neural networks (CNNs) have proven highly effective for visual recognition, where learning a universal representation from activations of convolutional layer plays a fundamental problem. In this paper, we present Fisher Vector encoding with Variational Auto-Encoder (FV-VAE), a novel deep architecture that quantizes the local activations of convolutional layer in a deep generative model, by training them in an end-to-end manner. To incorporate FV encoding strategy into deep generative models, we introduce Variational Auto-Encoder model, which steers a variational inference and learning in a neural network which can be straightforwardly optimized using standard stochastic gradient method. Different from the FV characterized by conventional generative models (e.g., Gaussian Mixture Model) which parsimoniously fit a discrete mixture model to data distribution, the proposed FV-VAE is more flexible to represent the natural property of data for better generalization. Extensive experiments are conducted on three public datasets, i.e., UCF101, ActivityNet, and CUB-200-2011 in the context of video action recognition and fine-grained image classification, respectively. Superior results are reported when compared to state-of-the-art representations. Most remarkably, our proposed FV-VAE achieves to-date the best published accuracy of 94.2% on UCF101. Fixed Effects Model In econometrics and statistics, a fixed effects model is a statistical model that represents the observed quantities in terms of explanatory variables that are treated as if the quantities were non-random. This is in contrast to random effects models and mixed models in which either all or some of the explanatory variables are treated as if they arise from random causes. Contrast this to the biostatistics definitions, as biostatisticians use “fixed” and “random” effects to respectively refer to the population-average and subject-specific effects (and where the latter are generally assumed to be unknown, latent variables). Often the same structure of model, which is usually a linear regression model, can be treated as any of the three types depending on the analyst’s viewpoint, although there may be a natural choice in any given situation. Fixed-Point Factorized Networks(FFN) In recent years, Deep Neural Networks (DNNs) based methods have achieved remarkable performance in a wide range of tasks and have been among the most powerful and widely used techniques in computer vision, speech recognition and Natural Language Processing. However, DNN-based methods are both computational-intensive and resource-consuming, which hinders the application of these methods on embedded systems like smart phones. To alleviate this problem, we introduce a novel Fixed-point Factorized Networks (FFN) on pre-trained models to reduce the computational complexity as well as the storage requirement of networks. Extensive experiments on large-scale ImageNet classification task show the effectiveness of our proposed method. Flat Clustering and Topic Modeling based on Fast Rank-2 NMF(FlatNMF2) The importance of unsupervised clustering and topic modeling is well recognized with ever-increasing volumes of text data. In this paper, we propose a fast method for hierarchical clustering and topic modeling called HierNMF2. Our method is based on fast Rank-2 nonnegative matrix factorization (NMF) that performs binary clustering and an efficient node splitting rule. Further utilizing the final leaf nodes generated in HierNMF2 and the idea of nonnegative least squares fitting, we propose a new clustering/topic modeling method called FlatNMF2 that recovers a flat clustering/topic modeling result in a very simple yet significantly more effective way than any other existing methods. We describe highly optimized open source software in C++ for both HierNMF2 and FlatNMF2 for hierarchical and partitional clustering/topic modeling of document data sets. Substantial experimental tests are presented that illustrate significant improvements both in computational time as well as quality of solutions. We compare our methods to other clustering methods including K-means, standard NMF, and CLUTO, and also topic modeling methods including latent Dirichlet allocation (LDA) and recently proposed algorithms for NMF with separability constraints. Overall, we present efficient tools for analyzing large-scale data sets, and techniques that can be generalized to many other data analytics problem domains. Flatland Paradox https://…/the-flatland-paradox Flexible Parametric Model flexPM Flood Algorithm With the U-matrix Ultsch (Information and Classification: Concepts, Methods and Applications, pp. 307-313, Springer, 1993) introduced a powerful visual representation of the Self Organizing Maps results. We propose an approach that utilizes the U-matrix to identify outlying data points. Then the revised subsample (i.e. the initial sample minus the outlying points) is used to give a robust estimation of location and scatter. restlos Flow Classification Algorithm(FCA) Flow Map Flow maps in cartography are a mix of maps and flow charts, that ‘show the movement of objects from one location to another, such as the number of people in a migration, the amount of goods being traded, or the number of packets in a network’. Flowr Flowr is a robust and scalable framework for designing and deploying computing pipelines in an easy-to-use fashion. It implements a scatter-gather approach using computing clusters, simplifying the concept to the use of five simple terms (in submission and dependency types). Most importantly, it is flexible, such that customizing existing pipelines is easy, and since it works across several computing environments (LSF, SGE, Torque, and SLURM), it is portable. GitXiv F-Measure In statistical analysis of binary classification, the F1 score (also F-score or F-measure) is a measure of a test’s accuracy. It considers both the precision p and the recall r of the test to compute the score: p is the number of correct positive results divided by the number of all positive results, and r is the number of correct positive results divided by the number of positive results that should have been returned. The F1 score can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and worst score at 0. The traditional F-measure or balanced F-score (F1 score) is the harmonic mean of precision and recall. FOCA Modeling an ontology is a hard and time-consuming task. Although methodologies are useful for ontologists to create good ontologies, they do not help with the task of evaluating the quality of the ontology to be reused. For these reasons, it is imperative to evaluate the quality of the ontology after constructing it or before reusing it. Few studies usually present only a set of criteria and questions, but no guidelines to evaluate the ontology. The effort to evaluate an ontology is very high as there is a huge dependence on the evaluator’s expertise to understand the criteria and questions in depth. Moreover, the evaluation is still very subjective. This study presents a novel methodology for ontology evaluation, taking into account three fundamental principles: i) it is based on the Goal, Question, Metric approach for empirical evaluation; ii) the goals of the methodologies are based on the roles of knowledge representations combined with specific evaluation criteria; iii) each ontology is evaluated according to the type of ontology. The methodology was empirically evaluated using different ontologists and ontologies of the same domain. The main contributions of this study are: i) defining a step-by-step approach to evaluate the quality of an ontology; ii) proposing an evaluation based on the roles of knowledge representations; iii) the explicit difference of the evaluation according to the type of the ontology iii) a questionnaire to evaluate the ontologies; iv) a statistical model that automatically calculates the quality of the ontologies. Folium Python Data. Leaflet.js Maps. Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. Manipulate your data in Python, then visualize it in on a Leaflet map via Folium. Concept: Folium makes it easy to visualize data that’s been manipulated in Python on an interactive Leaflet map. It enables both the binding of data to a map for choropleth visualizations as well as passing Vincent/Vega visualizations as markers on the map. The library has a number of built-in tilesets from OpenStreetMap, MapQuest Open, MapQuest Open Aerial, Mapbox, and Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys. Folium supports both GeoJSON and TopoJSON overlays, as well as the binding of data to those overlays to create choropleth maps with color-brewer color schemes. Creating interactive crime maps with Folium Folksodriven The Folksodriven framework makes it possible for data scientists to define an ontology environment where searching for buried patterns that have some kind of predictive power to build predictive models more effectively. It accomplishes this through an abstractions that isolate parameters of the predictive modeling process searching for patterns and designing the feature set, too. To reflect the evolving knowledge, this paper considers ontologies based on folksonomies according to a new concept structure called ‘Folksodriven’ to represent folksonomies. So, the studies on the transformational regulation of the Folksodriven tags are regarded to be important for adaptive folksonomies classifications in an evolving environment used by Intelligent Systems to represent the knowledge sharing. Folksodriven tags are used to categorize salient data points so they can be fed to a machine-learning system and ‘featurizing’ the data. FolksoDrivenCloud(FDC) In this paper we present the FolksoDriven Cloud (FDC) built on Cloud and on Semantic technologies. Cloud computing has emerged in these recent years as the new paradigm for the provision of on-demand distributed computing resources. Semantic Web can be used for relationship between different data and descriptions of services to annotate provenance of repositories on ontologies. The FDC service is composed of a back-end which submits and monitors the documents, and a user front-end which allows users to schedule on-demand operations and to watch the progress of running processes. The impact of the proposed method is illustrated on a user since its inception. Folksonomy A folksonomy is a system in which users apply public tags to online items, typically to aid them in re-finding those items. This can give rise to a classification system based on those tags and their frequencies, in contrast to a taxonomic classification specified by the owners of the content when it is published. This practice is also known as collaborative tagging, social classification, social indexing, and social tagging. However, these terms have slightly different meanings than folksonomy. Folksonomy was originally the result of personal free tagging of information for ones own retrieval. Social tagging is the application of tags in an open online environment where the tags of other users are available to others. Collaborative tagging (also known as group tagging) is tagging performed by a group of users. This type of folksonomy is commonly used in cooperative and collaborative projects such as research, content repositories, and social bookmarking. The term was coined by Thomas Vander Wal in 2004 as a portmanteau of folk and taxonomy. Folksonomies became popular as part of social software applications such as social bookmarking and photograph annotation that enable users to collectively classify and find information via shared tags. Some websites include tag clouds as a way to visualize tags in a folksonomy. Folksonomies can be used for K-12 education, business, and higher education. More specifically, folksonomies may be implemented for social bookmarking, teacher resource repositories, e-learning systems, collaborative learning, collaborative research, and professional development. Follow The (Proximally) Regularized Leader(FTRL) Predicting ad click-through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. We present a selection of case studies and topics drawn from recent experiments in the setting of a deployed CTR prediction system. These include improvements in the context of traditional supervised learning based on an FTRL-Proximal online learning algorithm (which has excellent sparsity and convergence properties) and the use of per-coordinate learning rates. We also explore some of the challenges that arise in a real-world system that may appear at first to be outside the domain of traditional machine learning research. These include useful tricks for memory savings, methods for assessing and visualizing performance, practical methods for providing confidence estimates for predicted probabilities, calibration methods, and methods for automated management of features. Finally, we also detail several directions that did not turn out to be beneficial for us, despite promising results elsewhere in the literature. The goal of this paper is to highlight the close relationship between theoretical advances and practical engineering in this industrial setting, and to show the depth of challenges that appear when applying traditional machine learning methods in a complex dynamic system. Follow the Leader(FTL) A natural algorithm to use in the OCO framework is Follow the Leader, which tries to minimize the regret over all of the previous time steps. https://…/notes.pdf Follow the Regularized Leader(FTRL,FoReL) To avoid the failure of FTL we can try to “regularize” the weight vectors by adding a penalty function R(w) to the objective. This yields the FoReL algorithm. https://…/notes.pdf http://…/lecture3.pdf Force Directed Graph Force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically pleasing way. Their purpose is to position the nodes of a graph in two-dimensional or three-dimensional space so that all the edges are of more or less equal length and there are as few crossing edges as possible, by assigning forces among the set of edges and the set of nodes, based on their relative positions, and then using these forces either to simulate the motion of the edges and nodes or to minimize their energy. While graph drawing can be a difficult problem, force-directed algorithms, being physical simulations, usually require no special knowledge about graph theory such as planarity. Force Layout Force-Directed Graph qrage Formal Concept Analysis(FCA) In information science, formal concept analysis is a principled way of deriving a concept hierarchy or formal ontology from a collection of objects and their properties. Each concept in the hierarchy represents the set of objects sharing the same values for a certain set of properties; and each sub-concept in the hierarchy contains a subset of the objects in the concepts above it. The term was introduced by Rudolf Wille in 1984, and builds on applied lattice and order theory that was developed by Garrett Birkhoff and others in the 1930s. Formal concept analysis finds practical application in fields including data mining, text mining, machine learning, knowledge management, semantic web, software development, chemistry and biology. Forward Search The Forward Search is a powerful general method, incorporating flexible data-driven trimming, for the detection of outliers and unsuspected structure in data and so for building robust models. Starting from small subsets of data, observations that are close to the fitted model are added to the observations used in parameter estimation. As this subset grows we monitor parameter estimates, test statistics and measures of fit such as residuals. ForwardSearch,forward Forward Slice We propose a method for stochastic optimization: ‘Forward Slice’. We evaluate its performance and apply to design problems in Section 3. At its core, our method is based on the procedure that Neal (2003) called the `slice sampling’ procedure , which was originally developed as a Markov chain Monte Carlo sampling procedure to draw samples from a target distribution. The slice sampling method relies on an auxiliary variable which de nes a level at which we slice the target density to obtain regions from which we draw samples of the target distribution. Similar to Neal’s method, our procedure uses an auxiliary variable for stochastic optimization that also de nes the slices, but of an objective function to be maximized (or minimized). Moreover, unlike with Neal’s method, the auxiliary variable in our approach is not sampled and takes on non-decreasing values in the sequential iterations of the procedure so that, for a given pre{speci ed tolerance, at the end of the procedure we attain the maxima and the argument of the maxima (or close values given the selected tolerance level). Forward Thinking We present a general framework for training deep neural networks without backpropagation. This substantially decreases training time and also allows for construction of deep networks with many sorts of learners, including networks whose layers are defined by functions that are not easily differentiated, like decision trees. The main idea is that layers can be trained one at a time, and once they are trained, the input data are mapped forward through the layer to create a new learning problem. The process is repeated, transforming the data through multiple layers, one at a time, rendering a new data set, which is expected to be better behaved, and on which a final output layer can achieve good performance. We call this forward thinking and demonstrate a proof of concept by achieving state-of-the-art accuracy on the MNIST dataset for convolutional neural networks. We also provide a general mathematical formulation of forward thinking that allows for other types of deep learning problems to be considered. Forward Thinking Deep Random Forest The success of deep neural networks has inspired many to wonder whether other learners could benefit from deep, layered architectures. We present a general framework called forward thinking for deep learning that generalizes the architectural flexibility and sophistication of deep neural networks while also allowing for (i) different types of learning functions in the network, other than neurons, and (ii) the ability to adaptively deepen the network as needed to improve results. This is done by training one layer at a time, and once a layer is trained, the input data are mapped forward through the layer to create a new learning problem. The process is then repeated, transforming the data through multiple layers, one at a time, rendering a new dataset, which is expected to be better behaved, and on which a final output layer can achieve good performance. In the case where the neurons of deep neural nets are replaced with decision trees, we call the result a Forward Thinking Deep Random Forest (FTDRF). We demonstrate a proof of concept by applying FTDRF on the MNIST dataset. We also provide a general mathematical formulation that allows for other types of deep learning problems to be considered. FP-Growth Algorithm In Data Mining the task of finding frequent pattern in large databases is very important and has been studied in large scale in the past few years. Unfortunately, this task is computationally expensive, especially when a large number of patterns exist. The FP-Growth Algorithm, proposed by Han in , is an efficient and scalable method for mining the complete set of frequent patterns by pattern fragment growth, using an extended prefix-tree structure for storing compressed and crucial information about frequent patterns named frequent-pattern tree (FP-tree). In his study, Han proved that his method outperforms other popular methods for mining frequent patterns, e.g. the Apriori Algorithm and the TreeProjection. In some later works it was proved that FP-Growth has better performance than other methods, including Eclat and Relim. The popularity and efficiency of FP-Growth Algorithm contributes with many studies that propose variations to improve his performance. Fractional Imputation Fractional Langevin Monte Carlo(FLMC) Along with the recent advances in scalable Markov Chain Monte Carlo methods, sampling techniques that are based on Langevin diffusions have started receiving increasing attention. These so called Langevin Monte Carlo (LMC) methods are based on diffusions driven by a Brownian motion, which gives rise to Gaussian proposal distributions in the resulting algorithms. Even though these approaches have proven successful in many applications, their performance can be limited by the light-tailed nature of the Gaussian proposals. In this study, we extend classical LMC and develop a novel Fractional LMC (FLMC) framework that is based on a family of heavy-tailed distributions, called $\alpha$-stable L\'{e}vy distributions. As opposed to classical approaches, the proposed approach can possess large jumps while targeting the correct distribution, which would be beneficial for efficient exploration of the state space. We develop novel computational methods that can scale up to large-scale problems and we provide formal convergence analysis of the proposed scheme. Our experiments support our theory: FLMC can provide superior performance in multi-modal settings, improved convergence rates, and robustness to algorithm parameters. Frailty Model Frailty models are extensions of the proportional hazards model which is best known as the Cox model (Cox, 1972), the most popular model in survival analysis. Normally, in most clinical applications, survival analysis implicitly assumes a homogenous population to be studied. This means that all individuals sampled into that study are subject in principle under the same risk (e.g., risk of death, risk of disease recurrence). In many applications, the study population can not be assumed to be homogeneous but must be considered as a heterogeneous sample, i.e. a mixture of individuals with different hazards. For example, in many cases it is impossible to measure all relevant covariates related to the disease of interest, sometimes because of economical reasons, sometimes the importance of some covariates is still unknown. The frailty approach is a statistical modelling concept which aims to account for heterogeneity, caused by unmeasured covariates. In statistical terms, a frailty model is a random effect model for time-to-event data, where the random effect (the frailty) has a multiplicative effect on the baseline hazard function. parfm,frailtySurv Frank-Wolfe Type Boosting Algorithm(FWBoost) Boosting is a generic learning method for classification and regression. Yet, as the number of base hypotheses becomes larger, boosting can lead to a deterioration of test performance. Overfitting is an important and ubiquitous phenomenon, especially in regression settings. To avoid overfitting, we consider using $l_1$ regularization. We propose a novel Frank-Wolfe type boosting algorithm (FWBoost) applied to general loss functions. By using exponential loss, the FWBoost algorithm can be rewritten as a variant of AdaBoost for binary classification. FWBoost algorithms have exactly the same form as existing boosting methods, in terms of making calls to a base learning algorithm with different weights update. This direct connection between boosting and Frank-Wolfe yields a new algorithm that is as practical as existing boosting methods but with new guarantees and rates of convergence. Experimental results show that the test performance of FWBoost is not degraded with larger rounds in boosting, which is consistent with the theoretical analysis. Freedman’s Paradox In statistical analysis, Freedman’s paradox, named after David Freedman, describes a problem in model selection whereby predictor variables with no explanatory power can appear artificially important. Freedman demonstrated (through simulation and asymptotic calculation) that this is a common occurrence when the number of variables is similar to the number of data points. Recently, new information-theoretic estimators have been developed in an attempt to reduce this problem, in addition to the accompanying issue of model selection bias, whereby estimators of predictor variables that have a weak relationship with the response variable are biased. Freedman’s Paradox Freemium Freemium is a pricing strategy by which a product or service (typically a digital offering such as software, media, games or web services) is provided free of charge, but money (premium) is charged for proprietary features, functionality, or virtual goods. The word “freemium” is a portmanteau neologism combining the two aspects of the business model: “free” and “premium”. Freestyle Multilingual Image Question Answering(FM-IQA) Freestyle Multilingual Image Question Answering (FM-IQA) dataset to train and evaluate our mQA model. It contains over 120,000 images and 250,000 freestyle Chinese question-answer pairs and their English translations. The quality of the generated answers of our mQA model on this dataset are evaluated by human judges through a Turing Test. Frequent Pattern Mining The problem of frequent pattern mining is that of finding relationships among the items in a database. The problem can be stated as follows. Given a database D with transactions T1 … TN, determine all patterns P that are present in at least a fraction s of the transactions. The fraction s is referred to as the minimum support. The parameter s can be expressed either as an absolute number, or as a fraction of the total number of transactions in the database. Each transaction Ti can be considered a sparse binary vector, or as a set of discrete values representing the identifiers of the binary attributes that are instantiated to the value of 1. The problem was originally proposed in the context of market basket data in order to find frequent groups of items that are bought together. Thus, in this scenario, each attribute corresponds to an item in a superstore, and the binary value represents whether or not it is present in the transaction. Because the problem was originally proposed, it has been applied to numerous other applications in the context of data mining,Web log mining, sequential pattern mining, and software bug analysis. Frequent Sequence Mining Frequentist Information Criterion(FIC) The failure of the information-based Akaike Information Criterion (AIC) in the context of singular models can be rectified by the definition of a Frequentist Information Criterion (FIC). FIC applies a frequentist approximation to the computation of the model complexity, which can be estimated analytically in many contexts. Like AIC, FIC can be understood as an unbiased estimator of the model predictive performance and is therefore identical to AIC for regular models in the large-observation-number limit . In the presence of unidentifiable parameters, the complexity exhibits a more general, non-AIC-like scaling. For instance, both BIC-like (logN ) and Hannan-Quinn-like (loglogN ) scaling with observation number N are observed. Unlike the Bayesian model selection approach, FIC is free from {\it ad hoc} prior probability distributions and appears to be widely applicable to model selection problems. Finally we demonstrate that FIC (information-based inference) is equivalent to frequentist inference for an important class of models. Frequently Updated Timestamped Structured Data(FUTS) The Internet, and hence IoT, contains potentially billions of Frequently Updated Timestamped Structured (FUTS) data sources, such as real-time traffic reports, air pollution detection, temperature monitoring, crops monitoring, etc. FUTS data sources contain states and updates of physical world things. Frozen Analytics Frozen analytics to create and prototype rule and scoring system, using cross-validation, training sets, sampling and algorithms like traditional machine learning algorithms. Fully Conditional Specification(FCS) In this method, an imputation model for each variable with missing values is specified. This method is an iterative MCMC procedure. In each iteration, it sequentially imputes missing values starting from the first variable with missing values. smcfcs Fully Convolution Networks(FCN) Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation. Our key insight is to build ‘fully convolutional’ networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves improved segmentation of PASCAL VOC (30% relative improvement to 67.2% mean IU on 2012), NYUDv2, SIFT Flow, and PASCAL-Context, while inference takes one tenth of a second for a typical image. Improving Fully Convolution Network for Semantic Segmentation Functional Additive Regression(FAR) We suggest a new method, called Functional Additive Regression, or FAR, for efficiently performing high-dimensional functional regression. FAR extends the usual linear regression model involving a functional predictor, $X(t)$, and a scalar response, $Y$, in two key respects. First, FAR uses a penalized least squares optimization approach to efficiently deal with high-dimensional problems involving a large number of functional predictors. Second, FAR extends beyond the standard linear regression setting to fit general nonlinear additive models. We demonstrate that FAR can be implemented with a wide range of penalty functions using a highly efficient coordinate descent algorithm. Theoretical results are developed which provide motivation for the FAR optimization criterion. Finally, we show through simulations and two real data sets that FAR can significantly outperform competing methods. Functional Causal Model Functional Data Analysis(FDA) Functional data analysis is a branch of statistics that analyzes data providing information about curves, surfaces or anything else varying over a continuum. The continuum is often time, but may also be spatial location, wavelength, probability, etc. http://…/annurev-statistics-010814-020413 Functional Linear Array Model(FLAM) The functional linear array model (FLAM) is a unified model class for functional regression models including function-on-scalar, scalar-on-function and function-on-function regression. Mean, median, quantile as well as generalized additive regression models for functional or scalar responses are contained as special cases in this general framework. Our implementation features a broad variety of covariate effects, such as, linear, smooth and interaction effects of grouping variables, scalar and functional covariates. Computational efficiency is achieved by representing the model as a generalized linear array model. While the array structure requires a common grid for functional responses, missing values are allowed. Estimation is conducted using a boosting algorithm, which allows for numerous covariates and automatic, data-driven model selection. To illustrate the flexibility of the model class we use three applications on curing of resin for car production, heat values of fossil fuels and Canadian climate data (the last one in the electronic supplement). These require function-on-scalar, scalar-on-function and function-on-function regression models, respectively, as well as additional capabilities such as robust regression, spatial functional regression, model selection and accommodation of missings. An implementation of our methods is provided in the R add-on package FDboost. FDboost Functional Principal Component Analysis(FPCA) Functional principal component analysis (FPCA) is a statistical method for investigating the dominant modes of variation of functional data. Using this method, a random function is represented in the eigenbasis, which is an orthonormal basis of the Hilbert space L2 that consists of the eigenfunctions of the autocovariance operator. FPCA represents functional data in the most parsimonious way, in the sense that when using a fixed number of basis functions, the eigenfunction basis explains more variation than any other basis expansion. FPCA can be applied for representing random functions, or functional regression and classification. Functional Regression This paper deals with functional regression, in which the input attributes as well as the response are functions. To deal with this problem, we develop a functional reproducing kernel Hilbert space approach; here, a kernel is an operator acting on a function and yielding a function. We demonstrate basic properties of these functional RKHS, as well as a representer theorem for this setting; we investigate the construction of kernels; we provide some experimental insight. Fundamental Theorem of Linear Algebra In mathematics, the fundamental theorem of linear algebra makes several statements regarding vector spaces. These may be stated concretely in terms of the rank r of an m x n matrix A and its singular value decomposition. funFEM A novel model-based clustering method for time series (and more generally functional data), called FunFEM. It is based on the discriminative functional mixture (DFM) model which models the data into a single discriminative functional subspace. This subspace allows afterward an insightful visualizations of the clustered data. funFEM funHDDC General procedure for clustering functional data which adapts the efficient clustering method HDDC, originally proposed in the multivariate context. The resulting clustering method, called funHDDC, is based on a functional latent mixture model which fits the functional data in group-specific functional subspaces. By constraining model parameters within and between groups, a family of parsimonious models is exhibited which allow to fit onto various situations. An estimation procedure based on the EM algorithm is proposed for estimating both the model parameters and the group-specific functional subspaces. Experiments on real-world datasets show that the proposed approach performs better or similarly than classical clustering methods while providing useful interpretations of the groups. funHDDC Fused Lasso fuser Future In computer science, future, promise, and delay refer to constructs used for synchronization in some concurrent programming languages. They describe an object that acts as a proxy for a result that is initially unknown, usually because the computation of its value is yet incomplete. The term promise was proposed in 1976 by Daniel P. Friedman and David Wise, and Peter Hibbard called it eventual. A somewhat similar concept future was introduced in 1977 in a paper by Henry Baker and Carl Hewitt. The terms future, promise, and delay are often used interchangeably, although some differences in usage between future and promise are treated below. Specifically, when usage is distinguished, a future is a read-only placeholder view of a variable, while a promise is a writable, single assignment container which sets the value of the future. Notably, a future may be defined without specifying which specific promise will set its value, and different possible promises may set the value of a given future, though this can be done only once for a given future. In other cases a future and a promise are created together and associated with each other: the future is the value, the promise is the function that sets the value – essentially the return value (future) of an asynchronous function (promise). Setting the value of a future is also called resolving, fulfilling, or binding it. future Fuzzy Bayesian Learning In this paper we propose a novel approach for learning from data using rule based fuzzy inference systems where the model parameters are estimated using Bayesian inference and Markov Chain Monte Carlo (MCMC) techniques. We show the applicability of the method for regression and classification tasks using synthetic data-sets and also a real world example in the financial services industry. Then we demonstrate how the method can be extended for knowledge extraction to select the individual rules in a Bayesian way which best explains the given data. Finally we discuss the advantages and pitfalls of using this method over state-of-the-art techniques and highlight the specific class of problems where this would be useful. Fuzzy Clustering Fuzzy clustering is a class of algorithms for cluster analysis in which the allocation of data points to clusters is not “hard” (all-or-nothing) but “fuzzy” in the same sense as fuzzy logic. Fuzzy clustering by Local Approximation of MEmberships Clustering(FLAME) Fuzzy clustering by Local Approximation of MEmberships (FLAME) is a data clustering algorithm that defines clusters in the dense parts of a dataset and performs cluster assignment solely based on the neighborhood relationships among objects. The key feature of this algorithm is that the neighborhood relationships among neighboring objects in the feature space are used to constrain the memberships of neighboring objects in the fuzzy membership space. Fuzzy c-Means Clustering(FCM) In fuzzy clustering, every point has a degree of belonging to clusters, as in fuzzy logic, rather than belonging completely to just one cluster. Thus, points on the edge of a cluster, may be in the cluster to a lesser degree than points in the center of cluster. Any point x has a set of coefficients giving the degree of being in the kth cluster wk(x). With fuzzy c-means, the centroid of a cluster is the mean of all points, weighted by their degree of belonging to the cluster. Fuzzy Cognitive Map A Fuzzy cognitive map is a cognitive map within which the relations between the elements (e.g. concepts, events, project resources) of a “mental landscape” can be used to compute the “strength of impact” of these elements. The theory behind that computation is fuzzy logic. FCMapper,fcm Fuzzy Constraint Linear Discriminant Analysis(FC-LDA) In this paper we introduce a fuzzy constraint linear discriminant analysis (FC-LDA). The FC-LDA tries to minimize misclassification error based on modified perceptron criterion that benefits handling the uncertainty near the decision boundary by means of a fuzzy linear programming approach with fuzzy resources. The method proposed has low computational complexity because of its linear characteristics and the ability to deal with noisy data with different degrees of tolerance. Obtained results verify the success of the algorithm when dealing with different problems. Comparing FC-LDA and LDA shows superiority in classification task.