We propose statistical inferential procedures for panel data models with interactive fixed effects in a kernel ridge regression framework. Compared with traditional sieve methods, our method is automatic in the sense that it does not require the choice of basis functions and truncation parameters. Model complexity is controlled by a continuous regularization parameter which can be automatically selected by generalized cross validation. Based on empirical process theory and functional analysis tools, we derive joint asymptotic distributions for the estimators in the heterogeneous setting. These joint asymptotic results are then used to construct confidence intervals for the regression means and prediction intervals for future observations, both being the first provably valid intervals in the literature. Marginal asymptotic normality of the functional estimators in the homogeneous setting is also obtained. Simulation and real data analysis demonstrate the advantages of our method.
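As a rough illustration of the tuning step mentioned above, the following sketch fits plain kernel ridge regression and picks the regularization parameter by generalized cross validation. It is not the panel-data estimator of the abstract: the Gaussian kernel, the bandwidth, the grid `lams`, the scaling `n * lam`, and all function names are assumptions made for this example.

```python
import numpy as np

def rbf_kernel(X, Z, bandwidth=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and the rows of Z.
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def gcv_score(K, y, lam):
    # Smoother matrix of kernel ridge regression: A = K (K + n*lam*I)^{-1}.
    n = len(y)
    A = K @ np.linalg.inv(K + n * lam * np.eye(n))
    resid = y - A @ y
    # GCV(lam) = (RSS / n) / (1 - trace(A) / n)^2
    return (resid @ resid / n) / (1.0 - np.trace(A) / n) ** 2

def fit_krr_gcv(X, y, lams, bandwidth=1.0):
    # Evaluate GCV on a grid and refit with the minimizing value.
    K = rbf_kernel(X, X, bandwidth)
    scores = [gcv_score(K, y, lam) for lam in lams]
    best = lams[int(np.argmin(scores))]
    alpha = np.linalg.solve(K + len(y) * best * np.eye(len(y)), y)
    return best, alpha

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(80)
lams = [10.0**p for p in range(-6, 1)]

best_lam, alpha = fit_krr_gcv(X, y, lams)
y_hat = rbf_kernel(X, X) @ alpha  # fitted values at the training points
```

The only tuning quantity here is the continuous parameter `lam`; no basis functions or truncation level have to be chosen, which is the sense of "automatic" used above.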
Convolutional Neural Networks (CNNs) are typically associated with Computer Vision. CNNs are responsible for major breakthroughs in Image Classification and are the core of most Computer Vision systems today. More recently, CNNs have been applied to problems in Natural Language Processing and have produced some interesting results. In this paper, we explain the basics of CNNs, their different variations, and how they have been applied to NLP.
Recommender systems have been actively and extensively studied over the past decades. Meanwhile, the boom of Big Data is driving fundamental changes in the development of recommender systems. In this paper, we propose a dynamic intention-aware recommender system to better help users find desirable products and services. Compared to prior work, our proposal possesses the following advantages: (1) it takes user intentions and demands into account through intention mining techniques. By unearthing user intentions from the historical user-item interactions, and various user digital traces harvested from social media and the Internet of Things, it is capable of delivering more satisfactory recommendations by leveraging rich online and offline user data; (2) it embraces the benefits of embedding heterogeneous source information and shared representations of multiple domains to provide accurate and effective recommendations comprehensively; (3) it recommends products or services proactively and in a timely manner by capturing the dynamic influences, which can significantly reduce user involvement and effort.
The Adaptive LASSO (ALASSO) was proposed by Zou [J. Amer. Statist. Assoc. 101 (2006) 1418-1429] as a modification of the LASSO for the purpose of simultaneous variable selection and estimation of the parameters in a linear regression model. Zou (2006) established that the ALASSO estimator is variable-selection consistent as well as asymptotically Normal in the indices corresponding to the nonzero regression coefficients in certain fixed-dimensional settings. In an influential paper, Minnier, Tian and Cai [J. Amer. Statist. Assoc. 106 (2011) 1371-1382] proposed a perturbation bootstrap method and established its distributional consistency for the ALASSO estimator in the fixed-dimensional setting. In this paper, however, we show that this (naive) perturbation bootstrap fails to achieve second-order correctness in approximating the distribution of the ALASSO estimator. We propose a modification to the perturbation bootstrap objective function and show that a suitably studentized version of our modified perturbation bootstrap ALASSO estimator achieves second-order correctness even when the dimension of the model is allowed to grow to infinity with the sample size. As a consequence, inferences based on the modified perturbation bootstrap will be more accurate than the inferences based on the oracle Normal approximation. We give simulation studies demonstrating good finite-sample properties of our modified perturbation bootstrap method as well as an illustration of our method on a real data set.
This text is a survey on cross-validation. We define all classical cross-validation procedures, and we study their properties for two different goals: estimating the risk of a given estimator, and selecting the best estimator among a given family. For the risk estimation problem, we compute the bias (which can also be corrected) and the variance of cross-validation methods. For estimator selection, we first provide a first-order analysis (based on expectations). Then, we explain how to take into account second-order terms (from variance computations, and by taking into account the usefulness of overpenalization). This allows us, in the end, to provide some guidelines for choosing the best cross-validation method for a given learning problem.
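The two goals named above (risk estimation and estimator selection) can be illustrated with a minimal K-fold sketch. The squared loss, the polynomial family, the synthetic data, and the helper names `k_fold_risk` and `make_poly` are assumptions chosen for this example, not procedures taken from the survey.

```python
import numpy as np

def k_fold_risk(X, y, fit, predict, k=5, seed=0):
    # Estimate the squared-error risk of an estimator by K-fold cross-validation.
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    losses = []
    for j in range(k):
        val = folds[j]
        train = np.concatenate([folds[i] for i in range(k) if i != j])
        model = fit(X[train], y[train])  # train on the other k-1 folds
        losses.append(np.mean((predict(model, X[val]) - y[val]) ** 2))
    return float(np.mean(losses))

def make_poly(degree):
    # A family of estimators indexed by polynomial degree.
    fit = lambda X, y: np.polyfit(X, y, degree)
    predict = lambda coef, X: np.polyval(coef, X)
    return fit, predict

# Synthetic quadratic data (assumed for illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, 200)
y = 1.0 + 2.0 * X - 3.0 * X**2 + 0.1 * rng.standard_normal(200)

# Goal 1: risk estimate per estimator. Goal 2: pick the estimator with the smallest estimate.
risks = {d: k_fold_risk(X, y, *make_poly(d)) for d in (0, 1, 2, 3)}
best_degree = min(risks, key=risks.get)
```

Selecting by the raw minimum is only the first-order recipe; the second-order effects discussed above (variance, overpenalization) are not modeled in this sketch.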
We investigate a special case of hereditary property that we refer to as {\em robustness}. A property is {\em robust} in a given graph if it is inherited by all connected spanning subgraphs of this graph. We motivate this definition in different contexts, showing that it plays a central role in highly dynamic networks, although the problem is defined in terms of classical (static) graph theory. In this paper, we focus on the robustness of {\em maximal independent sets} (MIS). Following the above definition, an MIS is said to be {\em robust} (RMIS) if it remains a valid MIS in all connected spanning subgraphs of the original graph. We characterize the class of graphs in which {\em all} possible MISs are robust. We show that, in these particular graphs, the problem of finding a robust MIS is {\em local}; that is, we present an RMIS algorithm using only a sublogarithmic number of rounds (in the number of nodes $n$) in the ${\cal LOCAL}$ model. On the negative side, we show that, in general graphs, the problem is not local. Precisely, we prove an $\Omega(n)$ lower bound on the number of rounds required for the nodes to decide consistently in some graphs. This result implies a separation between the RMIS problem and the MIS problem in general graphs. It also implies that any strategy in this case is asymptotically (in order) as bad as collecting all the network information at one node and solving the problem in a centralized manner. Motivated by this observation, we present a centralized algorithm that computes a robust MIS in a given graph, if one exists, and rejects otherwise. Significantly, this algorithm requires only a polynomial amount of local computation time, despite the fact that exponentially many MISs and exponentially many connected spanning subgraphs may exist.
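The definition of a robust MIS can be checked directly on very small graphs. The sketch below is a brute-force test that enumerates every connected spanning subgraph, so it takes exponential time and is not the polynomial-time algorithm mentioned in the abstract. The function names are hypothetical, and the input graph is assumed to be connected.

```python
from itertools import combinations

def _adjacency(nodes, edges):
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def is_mis(nodes, edges, s):
    # s is an MIS if no two members are adjacent and every other node has a neighbor in s.
    adj = _adjacency(nodes, edges)
    independent = all(v not in adj[u] for u in s for v in s)
    maximal = all(v in s or adj[v] & s for v in nodes)
    return independent and maximal

def is_connected(nodes, edges):
    # Depth-first search from the first node.
    adj = _adjacency(nodes, edges)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)

def is_robust_mis(nodes, edges, s):
    # Robust: s is a valid MIS in every connected spanning subgraph (brute force).
    s = set(s)
    for r in range(len(nodes) - 1, len(edges) + 1):
        for sub in combinations(edges, r):
            if is_connected(nodes, sub) and not is_mis(nodes, sub, s):
                return False
    return True

cycle4 = ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)])
triangle = ([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
robust_in_cycle = is_robust_mis(*cycle4, {0, 2})
robust_in_triangle = is_robust_mis(*triangle, {0})
```

In the 4-cycle, `{0, 2}` stays maximal when any single edge is removed, so it is robust. In the triangle, removing the edge `(0, 2)` leaves node 2 without a neighbor in `{0}`, so that set is not robust.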
The goal of compressed sensing is to estimate a vector from an underdetermined system of noisy linear measurements, by making use of prior knowledge on the structure of vectors in the relevant domain. For almost all results in this literature, the structure is represented by sparsity in a well-chosen basis. We show how to achieve guarantees similar to standard compressed sensing but without employing sparsity at all. Instead, we suppose that vectors lie near the range of a generative model $G: \mathbb{R}^k \to \mathbb{R}^n$. Our main theorem is that, if $G$ is $L$-Lipschitz, then roughly $O(k \log L)$ random Gaussian measurements suffice for an $\ell_2/\ell_2$ recovery guarantee. We demonstrate our results using generative models from published variational autoencoder and generative adversarial networks. Our method can use $5$-$10$x fewer measurements than Lasso for the same accuracy.
In the era of big data and the Internet of Things, massive amounts of sensor data are gathered. The large quantities of data captured by sensor networks are considered to contain highly useful and valuable information. However, for a variety of reasons, received sensor data often appear abnormal. Therefore, effective anomaly detection methods are required to guarantee the quality of data collected by those sensor nodes. Since sensor data are usually correlated in time and space, not all the gathered data are valuable for further data processing and analysis. Preprocessing is necessary for eliminating the redundancy in the gathered massive sensor data. In this paper, we define a sensor data preprocessing framework. It is mainly composed of two parts, i.e., sensor data anomaly detection and sensor data redundancy elimination. In the first part, methods based on principal statistic analysis and Bayesian networks are proposed for sensor data anomaly detection. Then, approaches based on a static Bayesian network (SBN) and dynamic Bayesian networks (DBNs) are proposed for sensor data redundancy elimination. A static sensor data redundancy detection algorithm (SSDRDA) for eliminating redundant data in static datasets and a real-time sensor data redundancy detection algorithm (RSDRDA) for eliminating redundant sensor data in real time are proposed. The efficiency and effectiveness of the proposed methods are validated using real-world gathered sensor datasets.
There exist two issues among popular lattice reduction (LR) algorithms that should cause concern. The first is that the Korkine-Zolotarev (KZ) and Lenstra-Lenstra-Lovász (LLL) algorithms may increase the lengths of basis vectors. The other is that KZ reduction performs much worse than Minkowski reduction in terms of providing short basis vectors, despite its superior theoretical upper bounds. To address these limitations, we improve the size reduction steps in KZ and LLL to set up two new efficient algorithms, referred to as boosted KZ and LLL, for solving the shortest basis problem (SBP) with exponential and polynomial complexity, respectively. Both of them offer better actual performance than their classic counterparts, and the performance bounds for KZ are also improved. We apply them to designing integer-forcing (IF) linear receivers for multi-input multi-output (MIMO) communications. Our simulations confirm their rate and complexity advantages.
For genetic algorithms using a bit-string representation of length~$n$, the general recommendation is to take $1/n$ as the mutation rate. In this work, we discuss whether this is really justified for multimodal functions. Taking jump functions and the $(1+1)$ evolutionary algorithm as the simplest example, we observe that larger mutation rates give significantly better runtimes. For the $\jump_{m,n}$ function, any mutation rate between $2/n$ and $m \ln(m/2) / n$ leads to a speed-up at least exponential in $m$ compared to the standard choice. The asymptotically best runtime, obtained from using the mutation rate $m/n$ and leading to a speed-up super-exponential in $m$, is very sensitive to small changes of the mutation rate. Any deviation by a small $(1 \pm \eps)$ factor leads to a slow-down exponential in $m$. Consequently, any fixed mutation rate gives strongly sub-optimal results for most jump functions. Building on this observation, we propose to use a random mutation rate $\alpha/n$, where $\alpha$ is chosen from a power-law distribution. We prove that the $(1+1)$ EA with this heavy-tailed mutation rate optimizes any $\jump_{m,n}$ function in a time that is only a small polynomial (in~$m$) factor above the one stemming from the optimal rate for this $m$. Our heavy-tailed mutation operator yields similar speed-ups (over the best known performance guarantees) for the vertex cover problem in bipartite graphs and the matching problem in general graphs. Following the example of fast simulated annealing, fast evolution strategies, and fast evolutionary programming, we propose to call genetic algorithms using a heavy-tailed mutation operator \emph{fast genetic algorithms}.
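A heavy-tailed mutation operator of the kind described above might be sketched as follows. The abstract only states that $\alpha$ follows a power law; the support $\{1, \dots, \lfloor n/2 \rfloor\}$, the exponent `beta = 1.5`, and the function names are assumptions made for this example.

```python
import random

def sample_alpha(n, beta=1.5, rng=None):
    # Draw alpha from a power law on {1, ..., n // 2}: P(alpha = i) proportional to i^(-beta).
    # Support and exponent are assumptions of this sketch; requires n >= 2.
    rng = random.Random() if rng is None else rng
    support = list(range(1, n // 2 + 1))
    weights = [i ** (-beta) for i in support]
    return rng.choices(support, weights=weights, k=1)[0]

def heavy_tailed_mutation(bits, beta=1.5, rng=None):
    # Flip each bit independently with probability alpha / n, with alpha redrawn per call.
    rng = random.Random() if rng is None else rng
    n = len(bits)
    rate = sample_alpha(n, beta, rng) / n
    return [b ^ 1 if rng.random() < rate else b for b in bits]

rng = random.Random(0)
parent = [0] * 20
child = heavy_tailed_mutation(parent, rng=rng)
alphas = [sample_alpha(100, 1.5, rng) for _ in range(2000)]
```

Most draws give a small $\alpha$, close to the standard rate $1/n$, while occasional large draws make it likely that many bits flip at once, which is what a jump across a fitness valley requires.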
We consider several examples of probabilistic existence proofs using compressibility arguments, including some results that involve the Lov\'asz local lemma.
We consider the problem of choosing between parametric models for a discrete observable, taking a Bayesian approach in which the within-model prior distributions are allowed to be improper. In order to avoid the ambiguity in the marginal likelihood function in such a case, we apply a homogeneous scoring rule. For the particular case of distinguishing between Poisson and Negative Binomial models, we conduct simulations that indicate that, applied prequentially, the method will consistently select the true model.
In this paper, we suggest a novel data-driven approach to active learning: Learning Active Learning (LAL). The key idea behind LAL is to train a regressor that predicts the expected error reduction for a potential sample in a particular learning state. By treating the query selection procedure as a regression problem we are not restricted to dealing with existing AL heuristics; instead, we learn strategies based on experience from previous active learning experiments. We show that LAL can be learnt from a simple artificial 2D dataset and yields strategies that work well on real data from a wide range of domains. Moreover, if some domain-specific samples are available to bootstrap active learning, the LAL strategy can be tailored for a particular problem.
Loyalty is an essential component of multi-community engagement. When users have the choice to engage with a variety of different communities, they often become loyal to just one, focusing on that community at the expense of others. However, it is unclear how loyalty is manifested in user behavior, or whether loyalty is encouraged by certain community characteristics. In this paper we operationalize loyalty as a user-community relation: users loyal to a community consistently prefer it over all others; loyal communities retain their loyal users over time. By exploring this relation using a large dataset of discussion communities from Reddit, we reveal that loyalty is manifested in remarkably consistent behaviors across a wide spectrum of communities. Loyal users employ language that signals collective identity and engage with more esoteric, less popular content, indicating they may play a curational role in surfacing new material. Loyal communities have denser user-user interaction networks and lower rates of triadic closure, suggesting that community-level loyalty is associated with more cohesive interactions and less fragmentation into subgroups. We exploit these general patterns to predict future rates of loyalty. Our results show that a user’s propensity to become loyal is apparent from their first interactions with a community, suggesting that some users are intrinsically loyal from the very beginning.
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on a few-shot image classification benchmark, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.
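The two levels of training described above can be written compactly. The notation here is assumed for illustration and is not given in the abstract: $f_\theta$ is the model, $\mathcal{L}_{\mathcal{T}_i}$ is the loss on task $\mathcal{T}_i$, $p(\mathcal{T})$ is the task distribution, $\alpha$ and $\beta$ are the inner and outer step sizes, and a single inner gradient step is shown.

```latex
\theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta),
\qquad
\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}\bigl(f_{\theta_i'}\bigr)
```

The outer gradient is taken with respect to the initial parameters $\theta$ but evaluated on the loss after adaptation, which is what makes the learned initialization easy to fine-tune.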