W2VLDA  With the increase of online customer opinions on specialised websites and social networks, automatic systems that help organise and classify customer reviews by domain-specific aspect/category and sentiment polarity are more important than ever. Supervised approaches to Aspect Based Sentiment Analysis obtain good results for the domain/language they are trained on, but obtaining manually labelled data to train supervised systems for all domains and languages is usually very costly and time consuming. In this work we describe W2VLDA, an unsupervised system based on topic modelling that, combined with some other unsupervised methods and a minimal configuration, performs aspect/category classification, aspect-term/opinion-word separation and sentiment polarity classification for any given domain and language. We also evaluate the performance of the aspect and sentiment classification on the multilingual SemEval 2016 task 5 (ABSA) dataset. We show competitive results for several languages (English, Spanish, French and Dutch) and domains (hotels, restaurants, electronic devices).
Waffle Chart / Square Pie Chart  A little-known alternative to the round pie chart is the square pie or waffle chart. It consists of a square that is divided into 10×10 cells, making it possible to read values precisely down to a single percent. Depending on how the areas are laid out (as square as possible seems to be the best idea), it is very easy to compare parts to the whole. http://…echartsinrwiththenewwafflepackage waffle
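As a rough illustration of the 10×10 layout described above, the following Python sketch allocates the 100 cells to categories and fills the grid row by row. The function names and the largest-remainder rounding rule are assumptions made for this example; they are not taken from the waffle package.

```python
def waffle_cells(values, total_cells=100):
    """Allocate grid cells to categories using largest-remainder rounding."""
    total = sum(values.values())
    exact = {k: v / total * total_cells for k, v in values.items()}
    cells = {k: int(x) for k, x in exact.items()}  # floor of each share
    leftover = total_cells - sum(cells.values())
    # Give the remaining cells to the categories with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda k: exact[k] - cells[k], reverse=True)
    for k in by_fraction[:leftover]:
        cells[k] += 1
    return cells


def waffle_grid(cells, width=10):
    """Fill a grid row by row with one category label per cell."""
    flat = [label for label, n in cells.items() for _ in range(n)]
    return [flat[i:i + width] for i in range(0, len(flat), width)]


cells = waffle_cells({"a": 70, "b": 30})
grid = waffle_grid(cells)
```

Because every cell stands for one percent, rounding has to preserve the total of 100 cells, which is why the sketch distributes leftover cells instead of rounding each share independently.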
Waikato Environment for Knowledge Analysis (WEKA) 
Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. Weka is free software available under the GNU General Public License. 
Wait-if-Diff and Wait-if-Worse Agent  (Cho and Esipova, 2016) Incremental Decoding and Training Methods for Simultaneous Translation in Neural Machine Translation
wakefield  wakefield is a GitHub-based R package designed to quickly generate random data sets. The user passes n (number of rows) and predefined vectors to the r_data_frame function to produce a dplyr::tbl_df object.
Wake-Sleep Algorithm  The wake-sleep algorithm is an unsupervised learning algorithm for a multilayer neural network (e.g. sigmoid belief net). Training is divided into two phases, ‘wake’ and ‘sleep’. In the ‘wake’ phase, neurons are driven by recognition connections (connections from what would normally be considered an input to what is normally considered an output), while generative connections (those from outputs to inputs) are modified to increase the probability that they would reconstruct the correct activity in the layer below (closer to the sensory input). In the ‘sleep’ phase the process is reversed: neurons are driven by generative connections, while recognition connections are modified to increase the probability that they would produce the correct activity in the layer above (further from sensory input). GitXiv
Walk-Steered Convolution (WSC)
Graph classification is a fundamental but challenging problem due to the non-Euclidean property of graphs. In this work, we jointly leverage the powerful representation ability of random walks and the success of the standard convolutional neural network (CNN) to propose a random-walk-based convolutional network, called walk-steered convolution (WSC). Unlike existing graph CNNs with deterministic neighbor searching, we randomly sample multi-scale walk fields by using random walks, which adapts more flexibly to the scale of the graph. To encode each-scale walk field consisting of several walk paths, we characterize the directions of the walk field by multiple Gaussian models so as to better mirror standard CNNs on images. Each Gaussian implicitly defines a direction, and together they encode the spatial layout of walks after the gradient is projected to the space of Gaussian parameters. Further, a graph coarsening layer using dynamical clustering is stacked upon the Gaussian encoding to capture high-level semantics of the graph. Comprehensive evaluations on several public datasets demonstrate the superiority of our proposed graph learning method over other state-of-the-art methods for graph classification.
Walktrap Community Algorithm  Tries to find densely connected subgraphs, also called communities, in a graph via random walks. The idea is that short random walks tend to stay in the same community. igraph
Walsh Figure of Merit  LowWAFOMNX 
Ward Hierarchical Clustering  ➘ “Ward’s Method” Ward’s Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm 
Ward’s Method  In statistics, Ward’s method is a criterion applied in hierarchical cluster analysis. Ward’s minimum variance method is a special case of the objective function approach originally presented by Joe H. Ward, Jr. Ward suggested a general agglomerative hierarchical clustering procedure, where the criterion for choosing the pair of clusters to merge at each step is based on the optimal value of an objective function. This objective function could be ‘any function that reflects the investigator’s purpose.’ Many of the standard clustering procedures are contained in this very general class. To illustrate the procedure, Ward used the example where the objective function is the error sum of squares, and this example is known as Ward’s method or more precisely Ward’s minimum variance method. Ward’s Method 
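The error-sum-of-squares criterion can be sketched in a few lines of Python. This is a minimal illustration, assuming points are given as tuples of floats; the helper names are made up for this example, and the brute-force search over pairs is not how production implementations work.

```python
def centroid(points):
    """Component-wise mean of a list of equally sized tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def error_sum_of_squares(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def ward_merge_cost(a, b):
    """Increase in the total error sum of squares if clusters a and b are merged."""
    return error_sum_of_squares(a + b) - error_sum_of_squares(a) - error_sum_of_squares(b)


def best_merge(clusters):
    """Indices of the pair of clusters whose merge increases the objective least."""
    pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    return min(pairs, key=lambda ij: ward_merge_cost(clusters[ij[0]], clusters[ij[1]]))


clusters = [[(0.0, 0.0)], [(1.0, 0.0)], [(10.0, 0.0)]]
pair = best_merge(clusters)
```

Repeating `best_merge` and joining the chosen pair until one cluster remains gives the agglomerative procedure; any other objective function could be substituted for `error_sum_of_squares`, in line with Ward's general formulation.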
WarpLDA  Developing efficient and scalable algorithms for Latent Dirichlet Allocation (LDA) is of wide interest for many applications. Previous work has developed an $O(1)$ Metropolis-Hastings sampling method for each token. However, the performance is far from being optimal due to random accesses to the parameter matrices and frequent cache misses. In this paper, we propose WarpLDA, a novel $O(1)$ sampling algorithm for LDA. WarpLDA is a Metropolis-Hastings based algorithm which is designed to optimize the cache hit rate. Advantages of WarpLDA include 1) Efficiency and scalability: WarpLDA has good locality and a carefully designed partition method, and can be scaled to hundreds of machines; 2) Simplicity: WarpLDA does not have any complicated modules such as alias tables, hybrid data structures, or parameter servers, making it easy to understand and implement; 3) Robustness: WarpLDA is consistently faster than other algorithms, under various settings from small-scale to massive-scale datasets and models. WarpLDA is 5-15x faster than state-of-the-art LDA samplers, implying less cost of time and money. With WarpLDA users can learn up to one million topics from hundreds of millions of documents in a few hours, at the speed of 2G tokens per second, or learn topics from small-scale datasets in seconds.
Wasserstein Auto-Encoder (WAE)
We propose the Wasserstein Auto-Encoder (WAE), a new algorithm for building a generative model of the data distribution. WAE minimizes a penalized form of the Wasserstein distance between the model distribution and the target distribution, which leads to a different regularizer than the one used by the Variational Auto-Encoder (VAE). This regularizer encourages the encoded training distribution to match the prior. We compare our algorithm with several other techniques and show that it is a generalization of adversarial auto-encoders (AAE). Our experiments show that WAE shares many of the properties of VAEs (stable training, encoder-decoder architecture, nice latent manifold structure) while generating samples of better quality, as measured by the FID score.
Wasserstein CNN (WCNN)
Heterogeneous face recognition (HFR) aims to match facial images acquired from different sensing modalities, with mission-critical applications in forensics, security and commercial sectors. However, HFR is a much more challenging problem than traditional face recognition because of large intra-class variations of heterogeneous face images and limited training samples of cross-modality face image pairs. This paper proposes a novel approach, namely Wasserstein CNN (convolutional neural networks, or WCNN for short), to learn invariant features between near-infrared and visual face images (i.e. NIR-VIS face recognition). The low-level layers of WCNN are trained with widely available face images in the visual spectrum. The high-level layer is divided into three parts, i.e., the NIR layer, the VIS layer and the NIR-VIS shared layer. The first two layers aim to learn modality-specific features, and the NIR-VIS shared layer is designed to learn a modality-invariant feature subspace. The Wasserstein distance is introduced into the NIR-VIS shared layer to measure the dissimilarity between heterogeneous feature distributions. WCNN learning thus aims to minimize the Wasserstein distance between the NIR distribution and the VIS distribution for invariant deep feature representation of heterogeneous face images. To avoid overfitting on small-scale heterogeneous face data, a correlation prior is introduced on the fully-connected layers of the WCNN network to reduce the parameter space. This prior is implemented by a low-rank constraint in an end-to-end network. The joint formulation leads to an alternating minimization for deep feature representation at the training stage and an efficient computation for heterogeneous data at the testing stage. Extensive experiments on three challenging NIR-VIS face recognition databases demonstrate the significant superiority of Wasserstein CNN over state-of-the-art methods.
Wasserstein Discriminant Analysis (WDA)
Wasserstein Discriminant Analysis (WDA) is a new supervised method that can improve classification of high-dimensional data by computing a suitable linear map onto a lower dimensional subspace. Following the blueprint of classical Linear Discriminant Analysis (LDA), WDA selects the projection matrix that maximizes the ratio of two quantities: the dispersion of projected points coming from different classes, divided by the dispersion of projected points coming from the same class. To quantify dispersion, WDA uses regularized Wasserstein distances, rather than the cross-variance measures which have usually been considered, notably in LDA. Thanks to the underlying principles of optimal transport, WDA is able to capture both global (at distribution scale) and local (at samples scale) interactions between classes. Regularized Wasserstein distances can be computed using the Sinkhorn matrix scaling algorithm; we show that the optimization of WDA can be tackled using automatic differentiation of Sinkhorn iterations. Numerical experiments show promising results both in terms of prediction and visualization on toy examples and real life datasets such as MNIST and on deep features obtained from a subset of the Caltech dataset.
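The Sinkhorn matrix scaling step mentioned above can be sketched in plain Python. This is only an illustration of computing a regularized transport plan between two small histograms; it is not the WDA optimization itself, and the function names, the `reg` default and the fixed iteration count are assumptions for this example.

```python
import math


def sinkhorn(a, b, cost, reg=0.1, n_iter=500):
    """Entropy-regularized transport plan between histograms a and b."""
    n, m = len(a), len(b)
    # Gibbs kernel; a very small reg can underflow to zero in this naive form.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns so the marginals match a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(plan, cost):
    """Total cost of moving mass according to the plan."""
    return sum(p * c for prow, crow in zip(plan, cost) for p, c in zip(prow, crow))


a = [0.5, 0.5]
b = [0.5, 0.5]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(a, b, cost)
```

Each iteration consists only of divisions and matrix-vector products, which is what makes it practical to differentiate through the iterations with automatic differentiation.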
Wasserstein Distance  ➘ “Wasserstein Metric” 
Wasserstein GAN (WGAN) 
Despite being impactful on a variety of problems and applications, generative adversarial nets (GANs) are remarkably difficult to train. This issue is formally analyzed by Arjovsky and Bottou (2017), who also propose an alternative direction to avoid the caveats in the min-max two-player training of GANs. The corresponding algorithm, called Wasserstein GAN (WGAN), hinges on the 1-Lipschitz continuity of the discriminator. In this paper, we propose a novel approach to enforcing the Lipschitz continuity in the training procedure of WGANs. Our approach seamlessly connects WGAN with one of the recent semi-supervised learning methods. As a result, it gives rise to not only better photo-realistic samples than the previous methods but also state-of-the-art semi-supervised learning results. In particular, our approach gives rise to an inception score of more than 5.0 with only 1,000 CIFAR-10 images and is the first that exceeds an accuracy of 90% on the CIFAR-10 dataset using only 4,000 labeled images, to the best of our knowledge.
Wasserstein Identity Testing Problem  Uniformity testing and the more general identity testing are well studied problems in distributional property testing. Most previous work focuses on testing under $L_1$-distance. However, when the support is very large or even continuous, testing under $L_1$-distance may require a huge (even infinite) number of samples. Motivated by such issues, we consider identity testing in Wasserstein distance (a.k.a. transportation distance and earth mover distance) on a metric space (discrete or continuous). In this paper, we propose the Wasserstein identity testing problem (Identity Testing in Wasserstein distance). We obtain nearly optimal worst-case sample complexity for the problem. Moreover, for a large class of probability distributions satisfying the so-called ‘Doubling Condition’, we provide nearly instance-optimal sample complexity.
Wasserstein Introspective Neural Network (WINN) 
We present Wasserstein introspective neural networks (WINN) that are both a generator and a discriminator within a single model. WINN provides a significant improvement over the recent introspective neural networks (INN) method by enhancing INN’s generative modeling capability. WINN has three interesting properties: (1) A mathematical connection between the formulation of Wasserstein generative adversarial networks (WGAN) and the INN algorithm is made; (2) The explicit adoption of the WGAN term into INN results in a large enhancement to INN, achieving compelling results even with a single classifier, e.g., providing a 20 times reduction in model size over INN within texture modeling; (3) When applied to supervised classification, WINN also gives rise to greater robustness with an $88\%$ reduction of errors against adversarial examples, improved over the result of $39\%$ by an INN-family algorithm. In the experiments, we report encouraging results on unsupervised learning problems including texture, face, and object modeling, as well as a supervised classification task against adversarial attack.
Wasserstein Metric  In mathematics, the Wasserstein (or Vasershtein) metric is a distance function defined between probability distributions on a given metric space M. Intuitively, if each distribution is viewed as a unit amount of ‘dirt’ piled on M, the metric is the minimum ‘cost’ of turning one pile into the other, which is assumed to be the amount of dirt that needs to be moved times the distance it has to be moved. Because of this analogy, the metric is known in computer science as the earth mover’s distance. The name ‘Wasserstein distance’ was coined by R. L. Dobrushin in 1970, after the Russian mathematician Leonid Vaseršteĭn who introduced the concept in 1969. Most English-language publications use the German spelling ‘Wasserstein’ (attributed to the name ‘Vasershtein’ being of German origin). ➚ “Earth Mover’s Distance” Wasserstein Distance D3M
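For the special case of two equally sized samples on the real line, with cost equal to the distance moved, the cheapest way to move the ‘dirt’ is to match the sorted values in order. The Python sketch below covers only that case; the function name is an assumption for this example, and general metric spaces require a full optimal transport solver.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equally sized samples on the real line."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized samples")
    # Each sample point carries mass 1/n; matching sorted values is optimal in 1-D.
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


shifted = wasserstein_1d([0.0, 1.0, 2.0], [3.0, 4.0, 5.0])
spread = wasserstein_1d([0.0, 2.0], [1.0, 1.0])
```

In the first call every unit of mass moves a distance of 3, so the pile is simply shifted; in the second, each half of the mass moves a distance of 1.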
Wasserstein Variational Inference  This paper introduces Wasserstein variational inference, a new form of approximate Bayesian inference based on optimal transport theory. Wasserstein variational inference uses a new family of divergences that includes both f-divergences and the Wasserstein distance as special cases. The gradients of the Wasserstein variational loss are obtained by backpropagating through the Sinkhorn iterations. This technique results in a very stable likelihood-free training method that can be used with implicit distributions and probabilistic programs. Using the Wasserstein variational inference framework, we introduce several new forms of autoencoders and test their robustness and performance against existing variational autoencoding techniques.
Watanabe-Akaike Information Criteria (WAIC)
WAIC (the Watanabe-Akaike or widely applicable information criterion; Watanabe, 2010) can be viewed as an improvement on the deviance information criterion (DIC) for Bayesian models. DIC has gained popularity in recent years in part through its implementation in the graphical modeling package BUGS (Spiegelhalter, Best, et al., 2002; Spiegelhalter, Thomas, et al., 1994, 2003), but is known to have some problems, arising in part from it not being fully Bayesian in that it is based on a point estimate (van der Linde, 2005; Plummer, 2008). For example, DIC can produce negative estimates of the effective number of parameters in a model and it is not defined for singular models. WAIC is fully Bayesian and closely approximates Bayesian cross-validation. Unlike DIC, WAIC is invariant to parametrization and also works for singular models. A Widely Applicable Bayesian Information Criterion
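To make the contrast with a point-estimate criterion concrete, the Python sketch below computes WAIC from pointwise log-likelihoods evaluated at posterior draws. It assumes one common formulation (log pointwise predictive density minus a variance-based effective number of parameters, reported on the deviance scale); the function name and the input layout are assumptions for this example.

```python
import math


def waic(log_lik):
    """WAIC from log_lik[s][i] = log p(y_i | theta_s), draw s, observation i."""
    n_draws, n_obs = len(log_lik), len(log_lik[0])
    lppd, p_waic = 0.0, 0.0
    for i in range(n_obs):
        col = [log_lik[s][i] for s in range(n_draws)]
        # Log of the likelihood averaged over draws, via a stable log-sum-exp.
        top = max(col)
        lppd += top + math.log(sum(math.exp(v - top) for v in col) / n_draws)
        # Sample variance of the log-likelihood across draws.
        mean = sum(col) / n_draws
        p_waic += sum((v - mean) ** 2 for v in col) / (n_draws - 1)
    return -2.0 * (lppd - p_waic), p_waic


value, effective_params = waic([[-1.0, -2.0], [-1.0, -2.0]])
```

Every quantity is an average or variance over posterior draws rather than an evaluation at a single parameter estimate, and the variance-based penalty cannot be negative.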
Waterfall Chart  A waterfall chart is a form of data visualization that helps in understanding the cumulative effect of sequentially introduced positive or negative values. The waterfall chart is also known as a flying bricks chart or Mario chart due to the apparent suspension of columns (bricks) in mid-air. Often in finance, it will be referred to as a bridge. Waterfall charts were popularized by the strategic consulting firm McKinsey & Company in its presentations to clients. The waterfall chart is normally used for understanding how an initial value is affected by a series of intermediate positive or negative values. Usually the initial and the final values are represented by whole columns, while the intermediate values are denoted by floating columns. The columns are color-coded for distinguishing between positive and negative values. ➘ “Waterfall Chart” Understanding Waterfall Plots Waterfall plots – what and how?
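The geometry of the columns follows directly from a running total. The Python sketch below computes where each column starts and ends, leaving the drawing to whatever plotting tool is used; the function name, dictionary keys and labels are assumptions for this example.

```python
def waterfall_bars(initial, changes):
    """Column extents for a waterfall chart: whole columns for totals, floating otherwise."""
    bars = [{"label": "start", "bottom": 0.0, "top": float(initial), "kind": "total"}]
    running = float(initial)
    for label, delta in changes:
        new = running + delta
        bars.append({
            "label": label,
            # A floating column spans the running total before and after the change.
            "bottom": min(running, new),
            "top": max(running, new),
            # The kind drives the color coding of positive and negative values.
            "kind": "increase" if delta >= 0 else "decrease",
        })
        running = new
    bars.append({"label": "end", "bottom": 0.0, "top": running, "kind": "total"})
    return bars


bars = waterfall_bars(100, [("sales", 30), ("costs", -50)])
```

Here the "sales" column floats from 100 to 130, the "costs" column from 130 down to 80, and the final whole column ends at 80.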
Waterfall Plot  A waterfall plot is a three-dimensional plot in which multiple curves of data, typically spectra, are displayed simultaneously. Typically the curves are staggered both across the screen and vertically, with ‘nearer’ curves masking the ones behind. The result is a series of ‘mountain’ shapes that appear to be side by side. The waterfall plot is often used to show how two-dimensional information changes over time or some other variable such as rpm. The term ‘waterfall plot’ is sometimes used interchangeably with ‘spectrogram’ or ‘Cumulative Spectral Decay’ (CSD) plot.
Wave Oriented Swarm Programming Paradigm (WOSPP) 
In this work, we present a programming paradigm allowing the control of swarms with a minimum communication bandwidth in a simple manner, yet allowing the emergence of diverse complex behaviors and autonomy of the swarm. Communication in the proposed paradigm is based on single-bit ‘ping’ signals propagating as information waves throughout the swarm. We show that even this minimum-bandwidth communication between agents suffices for the design of a substantial set of behaviors in the domain of essential behaviors of a collective, including locomotion and self-awareness of the swarm.
Wavelet Convolutional Neural Network  Spatial and spectral approaches are two major approaches for image processing tasks such as image classification and object recognition. Among many such algorithms, convolutional neural networks (CNNs) have recently achieved significant performance improvement in many challenging tasks. Since CNNs process images directly in the spatial domain, they are essentially spatial approaches. Given that spatial and spectral approaches are known to have different characteristics, it will be interesting to incorporate a spectral approach into CNNs. We propose a novel CNN architecture, wavelet CNNs, which combines a multiresolution analysis and CNNs into one model. Our insight is that a CNN can be viewed as a limited form of a multiresolution analysis. Based on this insight, we supplement missing parts of the multiresolution analysis via wavelet transform and integrate them as additional components in the entire architecture. Wavelet CNNs allow us to utilize spectral information which is mostly lost in conventional CNNs but useful in most image processing tasks. We evaluate the practical performance of wavelet CNNs on texture classification and image annotation. The experiments show that wavelet CNNs can achieve better accuracy in both tasks than existing models while having significantly fewer parameters than conventional CNNs. 
Wavelet-like Auto-Encoder (WAE)
Accelerating deep neural networks (DNNs) has been attracting increasing attention as it can benefit a wide range of applications, e.g., enabling mobile systems with limited computing resources to gain powerful visual recognition ability. A practical strategy to this goal usually relies on a two-stage process: operating on the trained DNNs (e.g., approximating the convolutional filters with tensor decomposition) and fine-tuning the amended network, leading to difficulty in balancing the trade-off between acceleration and maintaining recognition performance. In this work, aiming at a general and comprehensive way for neural network acceleration, we develop a Wavelet-like Auto-Encoder (WAE) that decomposes the original input image into two low-resolution channels (sub-images) and incorporates the WAE into the classification neural networks for joint training. The two decomposed channels, in particular, are encoded to carry the low-frequency information (e.g., image profiles) and high-frequency information (e.g., image details or noise), respectively, and enable reconstructing the original input image through the decoding process. Then, we feed the low-frequency channel into a standard classification network such as VGG or ResNet and employ a very lightweight network to fuse with the high-frequency channel to obtain the classification result. Compared to existing DNN acceleration solutions, our framework has the following advantages: i) it is compatible with any existing convolutional neural networks for classification without amending their structures; ii) the WAE provides an interpretable way to preserve the main components of the input image for classification.
WaveNet  Various sources have reported that the WaveNet deep learning architecture is able to generate high-quality speech, but to our knowledge there have been no studies on the interpretation or visualization of trained WaveNets. This study investigates the possibility that WaveNet understands speech by learning, in an unsupervised manner, an acoustically meaningful latent representation of the speech signals in its receptive field; we also attempt to interpret the mechanism by which the feature extraction is performed. As suggested by singular value decomposition and linear regression analysis on the activations and known acoustic features (e.g. F0), the key findings are: (1) activations in the higher layers are highly correlated with spectral features; (2) WaveNet explicitly performs pitch extraction despite being trained to directly predict the next audio sample; and (3) for the said feature analysis to take place, the latent signal representation is converted back and forth between baseband and wideband components.
W-Decorrelation  Estimators computed from adaptively collected data do not behave like their non-adaptive brethren. Rather, the sequential dependence of the collection policy can lead to severe distributional biases that persist even in the infinite data limit. We develop a general decorrelation procedure, W-decorrelation, for transforming the bias of adaptive linear regression estimators into variance. The method uses only coarse-grained information about the data collection policy and does not need access to propensity scores or exact knowledge of the policy. We bound the finite-sample bias and variance of the W-estimator and develop asymptotically correct confidence intervals based on a novel martingale central limit theorem. We then demonstrate the empirical benefits of the generic W-decorrelation procedure in two different adaptive data settings: multi-armed bandits and autoregressive time series models.
Weakly Structured Information Processing and Exploration (WIPE) 
WIPE is used to manage graph traversal and manipulation with BI-like data aggregation. WIPE stands for “Weakly-structured Information Processing and Exploration”. It is a data manipulation and query language built on top of the graph functionality in the SAP HANA Database. Like other domain-specific languages provided by the SAP HANA Database, WIPE is embedded in a transactional context, which means that multiple WIPE statements can be executed concurrently while guaranteeing atomicity, consistency, isolation and durability. With the help of this language, multiple graph operations such as inserting, updating or deleting a node and other query operations can be declared in one complex statement. It is the graph abstraction layer in the SAP HANA Database that provides interaction with the graph data stored in the database by exposing graph concepts directly to the application developer. The application developer can create or delete graphs, access the existing graphs, modify the vertices and edges of the graphs, or retrieve a set of vertices and edges based on their attributes. Besides retrieval and manipulation functions, a set of built-in graph operators is also provided by the SAP HANA Database. These operators, such as breadth-first or depth-first traversal algorithms, interact with the column store of the relational engine to execute efficiently and in a highly optimized manner.
Weaver  We introduce a new distributed graph store, called Weaver, which enables efficient, transactional graph analyses as well as strictly serializable read-write transactions on dynamic graphs. The key insight that enables Weaver to combine strict serializability with horizontal scalability and high performance is a novel request ordering mechanism called refinable timestamps. This technique couples coarse-grained vector timestamps with a fine-grained timeline oracle to pay the overhead of strong consistency only when needed.
Web Analytics  Web analytics is the measurement, collection, analysis and reporting of web data for purposes of understanding and optimizing web usage. Web analytics is not just a tool for measuring web traffic but can be used as a tool for business and market research, and to assess and improve the effectiveness of a website. Web analytics applications can also help companies measure the results of traditional print or broadcast advertising campaigns. It helps one to estimate how traffic to a website changes after the launch of a new advertising campaign. Web analytics provides information about the number of visitors to a website and the number of page views. It helps gauge traffic and popularity trends, which is useful for market research. There are two categories of web analytics: off-site and on-site web analytics. Off-site web analytics refers to web measurement and analysis regardless of whether you own or maintain a website. It includes the measurement of a website’s potential audience (opportunity), share of voice (visibility), and buzz (comments) that is happening on the Internet as a whole. On-site web analytics measures a visitor’s behavior once on your website. This includes its drivers and conversions; for example, the degree to which different landing pages are associated with online purchases. On-site web analytics measures the performance of your website in a commercial context. This data is typically compared against key performance indicators for performance, and used to improve the audience response of a website or marketing campaign. Google Analytics is the most widely used on-site web analytics service, although new tools are emerging that provide additional layers of information, including heat maps and session replay. Historically, web analytics has been used to refer to on-site visitor measurement. However, in recent years this meaning has become blurred, mainly because vendors are producing tools that span both categories.
Web Data Commons  The Web Data Commons project extracts structured data from the Common Crawl, the largest web corpus available to the public, and provides the extracted data for public download in order to support researchers and companies in exploiting the wealth of information that is available on the Web. http://…webdatacommonsdatawebscalemining.html 
Web Mining  Web mining is the application of data mining techniques to discover patterns from the Web. According to the analysis target, web mining can be divided into three types: Web usage mining, Web content mining and Web structure mining.
Web Ontology Language (OWL) 
The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. Ontologies are a formal way to describe taxonomies and classification networks, essentially defining the structure of knowledge for various domains: the nouns representing classes of objects and the verbs representing relations between the objects. Ontologies resemble class hierarchies in object-oriented programming but there are several critical differences. Class hierarchies are meant to represent structures used in source code that evolve fairly slowly (typically monthly revisions), whereas ontologies are meant to represent information on the Internet and are expected to be evolving almost constantly. Similarly, ontologies are typically far more flexible as they are meant to represent information on the Internet coming from all sorts of heterogeneous data sources. Class hierarchies, on the other hand, are meant to be fairly static and rely on far less diverse and more structured sources of data such as corporate databases. The OWL languages are characterized by formal semantics. They are built upon a W3C XML standard for objects called the Resource Description Framework (RDF). OWL and RDF have attracted significant academic, medical and commercial interest. In October 2007, a new W3C working group was started to extend OWL with several new features as proposed in the OWL 1.1 member submission. W3C announced the new version of OWL on 27 October 2009. This new version, called OWL 2, soon found its way into semantic editors such as Protégé and semantic reasoners such as Pellet, RacerPro, FaCT++ and HermiT. The OWL family contains many species, serializations, syntaxes and specifications with similar names. OWL and OWL2 are used to refer to the 2004 and 2009 specifications, respectively. Full species names will be used, including specification version (for example, OWL2 EL). When referring more generally, OWL Family will be used.
Web Scraping  Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites. Usually, such software programs simulate human exploration of the World Wide Web by either implementing low-level Hypertext Transfer Protocol (HTTP), or embedding a fully-fledged web browser, such as Internet Explorer or Mozilla Firefox. scrapeR
WebSeg  In this paper, we improve semantic segmentation by automatically learning from Flickr images associated with a particular keyword, without relying on any explicit user annotations, thus substantially alleviating the dependence on accurate annotations when compared to previous weakly supervised methods. To solve such a challenging problem, we leverage several low-level cues (such as saliency, edges, etc.) to help generate a proxy ground truth. Due to the diversity of web-crawled images, we anticipate a large amount of ‘label noise’ in which other objects might be present. We design an online noise filtering scheme which is able to deal with this label noise, especially in cluttered images. We use this filtering strategy as an auxiliary module to assist the segmentation network in learning cleaner proxy annotations. Extensive experiments on the popular PASCAL VOC 2012 semantic segmentation benchmark show surprisingly good results in both our WebSeg (mIoU = 57.0%) and weakly supervised (mIoU = 63.3%) settings.
Weibull Distribution  In probability theory and statistics, the Weibull distribution /ˈveɪbʊl/ is a continuous probability distribution. It is named after Waloddi Weibull, who described it in detail in 1951, although it was first identified by Fréchet (1927) and first applied by Rosin & Rammler (1933) to describe a particle size distribution. 
Weibull Hybrid Autoencoding Inference (WHAI) 
To train an inference network jointly with a deep generative topic model, making it both scalable to big corpora and fast in out-of-sample prediction, we develop Weibull hybrid autoencoding inference (WHAI) for deep latent Dirichlet allocation, which infers posterior samples via a hybrid of stochastic-gradient MCMC and autoencoding variational Bayes. The generative network of WHAI has a hierarchy of gamma distributions, while the inference network of WHAI is a Weibull upward-downward variational autoencoder, which integrates a deterministic-upward deep neural network and a stochastic-downward deep generative model based on a hierarchy of Weibull distributions. The Weibull distribution can closely approximate a gamma distribution with an analytic Kullback-Leibler divergence, and has a simple reparameterization via uniform noise, which helps to compute efficiently the gradients of the evidence lower bound with respect to the parameters of the inference network. The effectiveness and efficiency of WHAI are illustrated with experiments on big corpora. 
Weibull Time To Event Recurrent Neural Network (WTTE-RNN) 
In this thesis we propose a new model for predicting time to events: the Weibull Time To Event RNN. This is a simple framework for time-series prediction of the time to the next event, applicable when we have any or all of the problems of continuous or discrete time, right censoring, recurrent events, temporal patterns, time-varying covariates or time series of varying lengths. All these problems are frequently encountered in customer churn, remaining useful life, failure, spike-train and event prediction. The proposed model estimates the distribution of time to the next event as having a discrete or continuous Weibull distribution with parameters being the output of a recurrent neural network. The model is trained using a special objective function (log-likelihood loss for censored data) commonly used in survival analysis. The Weibull distribution is simple enough to avoid sparsity and can easily be regularized to avoid overfitting, but is still expressive enough to encode concepts like increasing, stationary or decreasing risk, and can converge to a point estimate if allowed. The predicted Weibull parameters can be used to predict the expected value and quantiles of the time to the next event. It also leads to a natural 2D embedding of future risk which can be used for monitoring and exploratory analysis. We describe the WTTE-RNN using a general framework for censored data which can easily be extended with other distributions and adapted for multivariate prediction. We show that the common Proportional Hazards model and the Weibull Accelerated Failure Time model are special cases of the WTTE-RNN. The proposed model is evaluated on simulated data with varying degrees of censoring and temporal resolution. We compared it to binary fixed-window forecast models and naive ways of handling censored data. 
The model outperforms naive methods and is found to have many advantages and comparable performance to binary fixed-window RNNs, without the need to specify a window size and with the ability to train on more data. Application to the C-MAPSS dataset for PHM run-to-failure of simulated jet engines gives promising results. 
Weight of Evidence (WoE) 
The Weight of Evidence or WoE value is a widely used measure of the “strength” of a grouping for separating good and bad risk (default). It is computed from the basic odds ratio (Distribution of Good Credit Outcomes) / (Distribution of Bad Credit Outcomes), or Distr Goods / Distr Bads for short, where Distr refers to the proportion of Goods or Bads in the respective group relative to the column totals, i.e., expressed as relative proportions of the total number of Goods and Bads. woe 
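A minimal sketch of the calculation follows. It assumes the common convention of taking the natural logarithm of the Distr Goods / Distr Bads ratio (the entry above only says WoE is computed from that ratio), and the function name and input layout are hypothetical. Groups with a zero count are not handled.

```python
import math


def weight_of_evidence(goods, bads):
    """Return WoE per group from dicts mapping group name -> count.

    Assumes every group has at least one Good and one Bad outcome.
    """
    total_goods = sum(goods.values())
    total_bads = sum(bads.values())
    woe = {}
    for group in goods:
        distr_goods = goods[group] / total_goods  # share of all Goods in this group
        distr_bads = bads[group] / total_bads     # share of all Bads in this group
        woe[group] = math.log(distr_goods / distr_bads)
    return woe


# Illustrative counts only, not real credit data.
example = weight_of_evidence(
    goods={"low_risk": 80, "high_risk": 20},
    bads={"low_risk": 20, "high_risk": 80},
)
```

Under this convention a positive value means the group holds a larger share of Goods than of Bads, and a negative value means the opposite.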
Weighted Balanced Distribution Adaptation (WBDA) 
➚ “Balanced Distribution Adaptation” 
Weighted Effect Coding  Weighted effect coding refers to a specific coding matrix used to include factor variables in generalised linear regression models. With weighted effect coding, the effect for each category represents the deviation of that category from the weighted mean (which corresponds to the sample mean). This technique has particularly attractive properties when analysing observational data, which are commonly unbalanced. The wec package provides functions to apply weighted effect coding to factor variables, and to interactions between (a) a factor variable and a continuous variable and (b) two factor variables. wec 
Weighted Entropy  The concept of weighted entropy takes into account values of different outcomes, i.e., makes entropy context-dependent, through the weight function. 
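One common form of weighted entropy, due to Belis and Guiaşu, multiplies each Shannon term by a non-negative weight: H_w(p) = -Σ w_i p_i log p_i. The entry above gives no formula, so this specific form is an assumption, and the function name is hypothetical.

```python
import math


def weighted_entropy(probs, weights):
    """Weighted entropy -sum(w_i * p_i * ln p_i); terms with p_i = 0 contribute 0."""
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    return -sum(w * p * math.log(p) for p, w in zip(probs, weights) if p > 0)


# With all weights equal to 1 this reduces to ordinary Shannon entropy (in nats).
shannon = weighted_entropy([0.5, 0.5], [1.0, 1.0])
# Weighting only the first outcome changes the value even though probs are unchanged.
skewed = weighted_entropy([0.5, 0.5], [3.0, 0.0])
```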
Weighted Hausdorff Distance  Recent advances in Convolutional Neural Networks (CNN) have achieved remarkable results in localizing objects in images. In these networks, the training procedure usually requires providing bounding boxes or the maximum number of expected objects. In this paper, we address the task of estimating object locations without annotated bounding boxes, which are typically hand-drawn and time-consuming to label. We propose a loss function that can be used in any Fully Convolutional Network (FCN) to estimate object locations. This loss function is a modification of the Average Hausdorff Distance between two unordered sets of points. The proposed method does not require one to ‘guess’ the maximum number of objects in the image, and has no notion of bounding boxes, region proposals, or sliding windows. We evaluate our method with three datasets designed to locate people’s heads, pupil centers and plant centers. We report an average precision and recall of 94% for the three datasets, and an average location error of 6 pixels in 256×256 images. 
Weighted Majority Algorithm (WMA) 
In machine learning, the Weighted Majority Algorithm (WMA) is a meta-learning algorithm used to construct a compound algorithm from a pool of prediction algorithms, which could be any type of learning algorithms, classifiers, or even real human experts. The algorithm assumes that we have no prior knowledge about the accuracy of the algorithms in the pool, but that there are sufficient reasons to believe that one or more will perform well. There are many variations of the Weighted Majority Algorithm to handle different situations, like shifting targets, infinite pools, or randomized predictions. The core mechanism remains similar, with the final performance of the compound algorithm bounded by a function of the performance of the specialist (best performing algorithm) in the pool. 
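The core mechanism can be sketched as follows for binary predictions. This is one simple deterministic variant, assuming a multiplicative penalty `beta` applied to every expert that errs and ties broken in favour of predicting 1; the function name and input layout are hypothetical.

```python
def weighted_majority(expert_predictions, outcomes, beta=0.5):
    """Run a deterministic weighted majority vote over rounds of 0/1 predictions.

    expert_predictions: list of rounds, each a list with one 0/1 prediction per expert.
    outcomes: list with the true 0/1 label of each round.
    Returns the final expert weights and the number of mistakes of the combined vote.
    """
    weights = [1.0] * len(expert_predictions[0])  # no prior knowledge: equal weights
    mistakes = 0
    for predictions, truth in zip(expert_predictions, outcomes):
        vote_one = sum(w for w, p in zip(weights, predictions) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, predictions) if p == 0)
        guess = 1 if vote_one >= vote_zero else 0
        if guess != truth:
            mistakes += 1
        # Shrink the weight of every expert that was wrong in this round.
        weights = [w * beta if p != truth else w for w, p in zip(weights, predictions)]
    return weights, mistakes


# Expert 0 is always right; experts 1 and 2 are always wrong.
truths = [1, 0, 1, 1, 0, 1]
rounds = [[t, 1 - t, 1 - t] for t in truths]
final_weights, n_mistakes = weighted_majority(rounds, truths)
```

In this toy run the combined vote is outvoted only while the two poor experts still carry enough weight; after that it follows the reliable expert.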
Weighted Nonlinear Regression  Nonlinear Least Squares 
Weighted Ontology Approximation Heuristic (WOAH) 
This paper presents the Weighted Ontology Approximation Heuristic (WOAH), a novel zero-shot approach to ontology estimation for conversational agent development environments. This methodology extracts verbs and nouns separately from data by distilling the dependencies obtained and applying similarity and sparsity metrics to generate an ontology estimation that is configurable in terms of the level of generalization. 
Weighted Ordered Weighted Aggregation (WOWA) 
From a formal point of view, the WOWA operator is a particular case of the Choquet integral (using a particular type of measure: a distorted probability). 
Weighted Orthogonal Components Regression Analysis (WOCR) 
In the multiple linear regression setting, we propose a general framework, termed weighted orthogonal components regression (WOCR), which encompasses many known methods as special cases, including ridge regression and principal components regression. WOCR makes use of the monotonicity inherent in orthogonal components to parameterize the weight function. The formulation allows for efficient determination of tuning parameters and hence is computationally advantageous. Moreover, WOCR offers insights for deriving new better variants. Specifically, we advocate weighting components based on their correlations with the response, which leads to enhanced predictive performance. Both simulated studies and real data examples are provided to assess and illustrate the advantages of the proposed methods. 
Weighted Parallel SGD (WPSGD) 
Stochastic gradient descent (SGD) is a popular stochastic optimization method in machine learning. Traditional parallel SGD algorithms, e.g., SimuParallel SGD, often require all nodes to have the same performance or to consume equal quantities of data. However, these requirements are difficult to satisfy when the parallel SGD algorithms run in a heterogeneous computing environment; low-performance nodes will exert a negative influence on the final result. In this paper, we propose an algorithm called weighted parallel SGD (WPSGD). WPSGD combines weighted model parameters from different nodes in the system to produce the final output. WPSGD makes use of the reduction in standard deviation to compensate for the loss from the inconsistency in performance of nodes in the cluster, which means that WPSGD does not require that all nodes consume equal quantities of data. We also analyze the theoretical feasibility of running two other parallel SGD algorithms combined with WPSGD in a heterogeneous environment. The experimental results show that WPSGD significantly outperforms the traditional parallel SGD algorithms on distributed training systems with an unbalanced workload. 
Weighted Quantile Sum (WQS) 
wqs 
Weighted Score Table  
Weighted Topological Overlaps (wTO) 
wTO 
Weighted-SVD  Matrix Factorization models, sometimes called latent factor models, are a family of methods in recommender system research used to (1) generate the latent factors for the users and the items and (2) predict users’ ratings on items based on their latent factors. However, current Matrix Factorization models presume that all the latent factors are equally weighted, which may not always be a reasonable assumption in practice. In this paper, we propose a new model, called Weighted-SVD, which integrates the linear regression model with the SVD model such that each latent factor is accompanied by a corresponding weight parameter. This mechanism allows the latent factors to have different weights in influencing the final ratings. The complexity of the Weighted-SVD model is slightly larger than that of the SVD model but much smaller than that of the SVD++ model. We compared the Weighted-SVD model with several latent factor models on five public datasets based on the root-mean-squared errors (RMSEs). The results show that the Weighted-SVD model outperforms the baseline methods on all the experimental datasets under almost all settings. 
Weight-Median Sketch  We introduce a new sublinear-space data structure, the Weight-Median Sketch, that captures the most heavily weighted features in linear classifiers trained over data streams. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual information. In contrast with related sketches that capture the most commonly occurring features (or items) in a data stream, the Weight-Median Sketch captures the features that are most discriminative of one stream (or class) compared to another. The Weight-Median Sketch adopts the core data structure used in the Count-Sketch, but, instead of sketching counts, it captures sketched gradient updates to the model parameters. We provide a theoretical analysis of this approach that establishes recovery guarantees in the online learning setting, and demonstrate substantial empirical improvements in accuracy-memory trade-offs over alternatives, including count-based sketches and feature hashing. 
Weka  Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data preprocessing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine learning schemes. 
WHInter  Learning sparse linear models with two-way interactions is desirable in many application domains such as genomics. l1-regularised linear models are popular for estimating sparse models, yet standard implementations fail to address specifically the quadratic explosion of candidate two-way interactions in high dimensions, and typically do not scale to genetic data with hundreds of thousands of features. Here we present WHInter, a working set algorithm to solve large l1-regularised problems with two-way interactions for binary design matrices. The novelty of WHInter stems from a new bound that efficiently identifies working sets without scanning all features, and from fast computations inspired by solutions to the maximum inner product search problem. We apply WHInter to simulated and real genetic data and show that it is more scalable and two orders of magnitude faster than the state of the art. 
White Noise  In signal processing, white noise is a random signal with a constant power spectral density. The term is used, with this or similar meanings, in many scientific and technical disciplines, including physics, acoustic engineering, telecommunications, statistical forecasting, and many more. White noise refers to a statistical model for signals and signal sources, rather than to any specific signal. In discrete time, white noise is a discrete signal whose samples are regarded as a sequence of serially uncorrelated random variables with zero mean and finite variance; a single realization of white noise is a random shock. Depending on the context, one may also require that the samples be independent and have the same probability distribution (in other words, an i.i.d. sequence is the simplest representative of white noise). In particular, if each sample has a normal distribution with zero mean, the signal is said to be Gaussian white noise. The samples of a white noise signal may be sequential in time, or arranged along one or more spatial dimensions. In digital image processing, the pixels of a white noise image are typically arranged in a rectangular grid, and are assumed to be independent random variables with uniform probability distribution over some interval. The concept can also be defined for signals spread over more complicated domains, such as a sphere or a torus. An infinite-bandwidth white noise signal is a purely theoretical construction. The bandwidth of white noise is limited in practice by the mechanism of noise generation, by the transmission medium and by finite observation capabilities. Thus, a random signal is considered ‘white noise’ if it is observed to have a flat spectrum over the range of frequencies that is relevant to the context. For an audio signal, for example, the relevant range is the band of audible sound frequencies, from 20 to 20,000 Hz. 
Such a signal is heard as a hissing sound, resembling the /sh/ sound in ‘ash’. In music and acoustics, the term ‘white noise’ may be used for any signal that has a similar hissing sound. White noise draws its name from white light, although light that appears white generally does not have a flat spectral power density over the visible band. The term white noise is sometimes used in the context of phylogenetically based statistical methods to refer to a lack of phylogenetic pattern in comparative data. It is sometimes used in non-technical contexts, in the metaphoric sense of ‘random talk without meaningful contents’. 
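The discrete-time definition can be illustrated with a short sketch that generates Gaussian white noise and checks that the samples are close to zero-mean and serially uncorrelated. The function names, the sample size and the seed are illustrative assumptions.

```python
import random


def gaussian_white_noise(n, sigma=1.0, seed=0):
    """Return n independent zero-mean normal samples with standard deviation sigma."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def sample_autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at a non-negative integer lag."""
    n = len(x)
    mean = sum(x) / n
    variance_sum = sum((v - mean) ** 2 for v in x)
    covariance_sum = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return covariance_sum / variance_sum


noise = gaussian_white_noise(5000, seed=1)
noise_mean = sum(noise) / len(noise)
# For white noise, autocorrelations at non-zero lags should be close to zero.
lag_correlations = [sample_autocorrelation(noise, lag) for lag in (1, 2, 5, 10)]
```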
White Noise Test  
Whitening Transformation  A whitening transformation is a decorrelation transformation that transforms a set of random variables having a known covariance matrix into a set of new random variables whose covariance is the identity matrix (meaning that they are uncorrelated and all have variance 1). The transformation is called “whitening” because it changes the input vector into a white noise vector. It differs from a general decorrelation transformation in that the latter only makes the covariances equal to zero, so that the correlation matrix may be any diagonal matrix. The inverse coloring transformation transforms a vector of uncorrelated variables (a white random vector) into a vector with a specified covariance matrix. 
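A minimal two-dimensional sketch of one possible whitening transformation is given below. It assumes Cholesky whitening (PCA and ZCA whitening are other common choices) and estimates the covariance matrix from the sample rather than taking it as known; the function names are hypothetical.

```python
import math
import random


def mean_and_covariance_2d(points):
    """Return the mean (mx, my) and covariance entries (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def whiten_2d(points):
    """Map points to z = L^{-1} (x - mean), where L L^T is the sample covariance."""
    (mx, my), (sxx, sxy, syy) = mean_and_covariance_2d(points)
    # Cholesky factor L = [[a, 0], [b, c]] of the 2x2 covariance matrix.
    a = math.sqrt(sxx)
    b = sxy / a
    c = math.sqrt(syy - b * b)
    whitened = []
    for x, y in points:
        z1 = (x - mx) / a                 # forward substitution, first row
        z2 = ((y - my) - b * z1) / c      # forward substitution, second row
        whitened.append((z1, z2))
    return whitened


# Correlated toy data: the second coordinate depends on the first.
rng = random.Random(3)
data = []
for _ in range(500):
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    data.append((2.0 * g1 + 1.0, 0.8 * g1 + 0.5 * g2 + 3.0))
white = whiten_2d(data)
white_mean, white_cov = mean_and_covariance_2d(white)
```

After the transformation, the sample covariance of `white` is the identity matrix up to floating-point error.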
Widely Applicable Bayesian Information Criterion (WBIC) 
A statistical model or a learning machine is called regular if the map taking a parameter to a probability distribution is one-to-one and if its Fisher information matrix is always positive definite. Otherwise, it is called singular. In regular statistical models, the Bayes free energy, which is defined as the negative logarithm of the Bayes marginal likelihood, can be asymptotically approximated by the Schwarz Bayes information criterion (BIC), whereas in singular models such an approximation does not hold. Recently, it was proved that the Bayes free energy of a singular model is asymptotically given by a generalized formula using a birational invariant, the real log canonical threshold (RLCT), instead of half the number of parameters in BIC. Theoretical values of RLCTs in several statistical models are now being discovered based on algebraic geometrical methodology. However, it has been difficult to estimate the Bayes free energy using only training samples, because an RLCT depends on an unknown true distribution. In the present paper, we define a widely applicable Bayesian information criterion (WBIC) by the average log likelihood function over the posterior distribution with the inverse temperature 1/log n, where n is the number of training samples. We mathematically prove that WBIC has the same asymptotic expansion as the Bayes free energy, even if the true distribution is singular for or unrealizable by the statistical model. Since WBIC can be numerically calculated without any information about a true distribution, it is a generalized version of BIC for singular statistical models. ➚ “Watanabe-Akaike Information Criteria” 
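The verbal definition of WBIC above can be written out as a formula. The symbols are assumptions introduced here for illustration: $X_1, \dots, X_n$ are the training samples, $p(x \mid w)$ is the statistical model, $\varphi(w)$ is the prior, and the log likelihood is taken with a minus sign so that WBIC is comparable to the Bayes free energy.

```latex
\mathrm{WBIC} = \mathbb{E}_w^{\beta}\bigl[\, n L_n(w) \,\bigr],
\qquad \beta = \frac{1}{\log n},
\qquad L_n(w) = -\frac{1}{n} \sum_{i=1}^{n} \log p(X_i \mid w),

\mathbb{E}_w^{\beta}\bigl[ f(w) \bigr]
  = \frac{\int f(w) \prod_{i=1}^{n} p(X_i \mid w)^{\beta} \, \varphi(w) \, dw}
         {\int \prod_{i=1}^{n} p(X_i \mid w)^{\beta} \, \varphi(w) \, dw}.
```

In practice the expectation is usually approximated by sampling from the posterior tempered at inverse temperature $\beta$.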
Widely Applicable Information Criterion (WAIC) 
➚ “Watanabe-Akaike Information Criteria” loo 
Wiener Polarity Index  The Wiener polarity index Wp(G) of a graph G is the number of unordered pairs of vertices {u,v} in G such that the distance between u and v is equal to 3. 
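The definition translates directly into a breadth-first search from every vertex. The sketch below assumes an undirected, unweighted graph given as an adjacency dictionary; the function name is hypothetical.

```python
from collections import deque


def wiener_polarity_index(adjacency):
    """Count unordered vertex pairs {u, v} whose shortest-path distance is exactly 3."""
    ordered_pairs = 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if distance[u] == 3:
                continue  # vertices beyond distance 3 are irrelevant
            for v in adjacency[u]:
                if v not in distance:
                    distance[v] = distance[u] + 1
                    queue.append(v)
        ordered_pairs += sum(1 for d in distance.values() if d == 3)
    return ordered_pairs // 2  # each unordered pair was counted from both ends


# Path on five vertices 0-1-2-3-4: the pairs at distance 3 are {0, 3} and {1, 4}.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
# Cycle on six vertices: each vertex has exactly one opposite vertex at distance 3.
cycle_graph = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
```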
Wiener Process  In mathematics, the Wiener process is a continuous-time stochastic process named in honor of Norbert Wiener. It is often called standard Brownian motion, after Robert Brown. It is one of the best known Lévy processes (càdlàg stochastic processes with stationary independent increments) and occurs frequently in pure and applied mathematics, economics, quantitative finance, and physics. In pure mathematics, the Wiener process gave rise to the study of continuous-time martingales. It is a key process in terms of which more complicated stochastic processes can be described. As such, it plays a vital role in stochastic calculus, diffusion processes and even potential theory. It is the driving process of Schramm-Loewner evolution. In applied mathematics, the Wiener process is used to represent the integral of a Gaussian white noise process, and so is useful as a model of noise in electronics engineering, instrument errors in filtering theory and unknown forces in control theory. The Wiener process has applications throughout the mathematical sciences. In physics it is used to study Brownian motion, the diffusion of minute particles suspended in fluid, and other types of diffusion via the Fokker-Planck and Langevin equations. It also forms the basis for the rigorous path integral formulation of quantum mechanics (by the Feynman-Kac formula, a solution to the Schrödinger equation can be represented in terms of the Wiener process) and the study of eternal inflation in physical cosmology. It is also prominent in the mathematical theory of finance, in particular the Black-Scholes option pricing model. 
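A discretized path of the Wiener process can be sketched by summing independent Gaussian increments, each with mean zero and variance equal to the time step. This is only an approximation on a finite grid; the function name, step count and seeds are illustrative assumptions.

```python
import math
import random


def simulate_wiener_path(n_steps, t_end=1.0, seed=0):
    """Return W(0), W(dt), ..., W(t_end) on a grid with n_steps equal time steps."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    path = [0.0]  # the process starts at zero
    for _ in range(n_steps):
        # Independent increment with mean 0 and variance dt.
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


# Endpoints W(1) from many independent paths should have mean near 0 and variance near 1.
endpoints = [simulate_wiener_path(50, seed=s)[-1] for s in range(2000)]
endpoint_mean = sum(endpoints) / len(endpoints)
endpoint_variance = sum((e - endpoint_mean) ** 2 for e in endpoints) / len(endpoints)
```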
Wiener Filter  In signal processing, the Wiener filter (Wiener-Kolmogorov filter) is a filter used to produce an estimate of a desired or target random process by linear time-invariant filtering of an observed noisy process, assuming known stationary signal and noise spectra, and additive noise. The Wiener filter minimizes the mean square error between the estimated random process and the desired process. 
WikiRank  A keyphrase is an efficient representation of the main idea of a document. While background knowledge can provide valuable information about documents, it is rarely incorporated in keyphrase extraction methods. In this paper, we propose WikiRank, an unsupervised method for keyphrase extraction based on background knowledge from Wikipedia. First, we construct a semantic graph for the document. Then we transform the keyphrase extraction problem into an optimization problem on the graph. Finally, we output the optimal keyphrase set. Our method obtains improvements over other state-of-the-art models of more than 2% in F1-score. 
Wild Scale-Enhanced Bootstrap (WiSE) 
WiSEBoot 
Wisdom of Crowds (WOC) 
The wisdom of the crowd is the collective opinion of a group of individuals rather than that of a single expert. A large group’s aggregated answers to questions involving quantity estimation, general world knowledge, and spatial reasoning have generally been found to be as good as, and often better than, the answer given by any of the individuals within the group. An explanation for this phenomenon is that there is idiosyncratic noise associated with each individual judgment, and taking the average over a large number of responses will go some way toward canceling the effect of this noise. This process, while not new to the Information Age, has been pushed into the mainstream spotlight by social information sites such as Wikipedia, Yahoo! Answers, Quora, and other web resources that rely on human opinion. Trial by jury can be understood as wisdom of the crowd, especially when compared to the alternative, trial by a judge, the single expert. In politics, sortition is sometimes held up as an example of what wisdom of the crowd would look like: decision-making would happen by a diverse group instead of by a fairly homogeneous political group or party. Research within cognitive science has sought to model the relationship between wisdom of the crowd effects and individual cognition. WoCE: a framework for clustering ensemble by exploiting the wisdom of Crowds theory 
Wishart Distribution  In statistics, the Wishart distribution is a generalization to multiple dimensions of the chi-squared distribution, or, in the case of non-integer degrees of freedom, of the gamma distribution. It is named in honor of John Wishart, who first formulated the distribution in 1928. It is a family of probability distributions defined over symmetric, nonnegative-definite matrix-valued random variables (‘random matrices’). These distributions are of great importance in the estimation of covariance matrices in multivariate statistics. In Bayesian statistics, the Wishart distribution is the conjugate prior of the inverse covariance matrix of a multivariate normal random vector. 
Wishart Matrix  ➘ “Wishart Distribution” rWishart 
Wolfson Polarization Index  affluenceIndex 
Word Embedding Association Test (WEAT) 
Universal Sentence Encoder 
Word Embedding Attention Network (WEAN) 
Most recent approaches use the sequence-to-sequence model for paraphrase generation. The existing sequence-to-sequence model tends to memorize the words and the patterns in the training dataset instead of learning the meaning of the words. Therefore, the generated sentences are often grammatically correct but semantically improper. In this work, we introduce a novel model based on the encoder-decoder framework, called Word Embedding Attention Network (WEAN). Our proposed model generates the words by querying distributed word representations (i.e. neural word embeddings), with the aim of capturing the meaning of the corresponding words. Following previous work, we evaluate our model on two paraphrase-oriented tasks, namely text simplification and short text abstractive summarization. Experimental results show that our model outperforms the sequence-to-sequence baseline by BLEU scores of 6.3 and 5.5 on two English text simplification datasets, and by a ROUGE-2 F1 score of 5.7 on a Chinese summarization dataset. Moreover, our model achieves state-of-the-art performance on these three benchmark datasets. 
Word ExtrAction for time SEries cLassification (WEASEL) 
Time series (TS) occur in many scientific and commercial applications, ranging from earth surveillance to industry automation to smart grids. An important type of TS analysis is classification, which can, for instance, improve energy load forecasting in smart grids by detecting the types of electronic devices based on their energy consumption profiles recorded by automatic sensors. Such sensor-driven applications are very often characterized by (a) very long TS and (b) very large TS datasets needing classification. However, current methods for time series classification (TSC) cannot cope with such data volumes at acceptable accuracy; they are either scalable but offer only inferior classification quality, or they achieve state-of-the-art classification quality but cannot scale to large data volumes. In this paper, we present WEASEL (Word ExtrAction for time SEries cLassification), a novel TSC method which is both scalable and accurate. Like other state-of-the-art TSC methods, WEASEL transforms time series into feature vectors, using a sliding-window approach, which are then analyzed through a machine learning classifier. The novelty of WEASEL lies in its specific method for deriving features, resulting in a much smaller yet much more discriminative feature set. On the popular UCR benchmark of 85 TS datasets, WEASEL is more accurate than the best current non-ensemble algorithms at orders-of-magnitude lower classification and training times, and it is almost as accurate as ensemble classifiers, whose computational complexity makes them inapplicable even for mid-size datasets. The outstanding robustness of WEASEL is also confirmed by experiments on two real smart grid datasets, where it achieves out of the box almost the same accuracy as highly tuned, domain-specific methods. 
Word Vectors  Word vectors (also referred to as distributed representations) are an amazing alternative that sweep away most of the issues of dealing with NLP. They let us ignore the difficult-to-understand grammar and syntax of language while retaining the ability to ask and answer simple questions about a text. https://…/word2vec 
Word2Bits  Word vectors require significant amounts of memory and storage, posing issues for resource-limited devices like mobile phones and GPUs. We show that high-quality quantized word vectors using 1-2 bits per parameter can be learned by introducing a quantization function into Word2Vec. We furthermore show that training with the quantization function acts as a regularizer. We train word vectors on English Wikipedia (2017) and evaluate them on standard word similarity and analogy tasks and on question answering (SQuAD). Our quantized word vectors not only take 8-16x less space than full-precision (32-bit) word vectors but also outperform them on word similarity tasks and question answering. 
word2vec  This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research. http://…/w2vexp.pdf DL4J: Word2Vec 
Wordcloud  A tag cloud (word cloud, or weighted list in visual design) is a visual representation for text data, typically used to depict keyword metadata (tags) on websites, or to visualize free form text. Tags are usually single words, and the importance of each tag is shown with font size or color. This format is useful for quickly perceiving the most prominent terms and for locating a term alphabetically to determine its relative prominence. When used as website navigation aids, the terms are hyperlinked to items associated with the tag. tagcloud 
WordNet  WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short definitions and usage examples, and records a number of relations among these synonym sets or their members. WordNet can thus be seen as a combination of dictionary and thesaurus. While it is accessible to human users via a web browser, its primary use is in automatic text analysis and artificial intelligence applications. The database and software tools have been released under a BSD style license and are freely available for download from the WordNet website. Both the lexicographic data (lexicographer files) and the compiler (called grind) for producing the distributed database are available. 
Wordswarm  WordSwarm generates dynamic word clouds in which the word size changes as the animation moves forward through the corpus. The top words from the preprocessing are colored randomly or from an assigned palette, sized according to their magnitude at the first date, and then displayed in a pseudo-random location on the screen. The animation progresses into the future by growing or shrinking each word according to its frequency in the corpus at the next date. Clash detection is achieved using a 2D physics engine, which also applies ‘gravitational force’ to each word, bringing the larger words closer to the center of the screen. 
Work Stealing Load Balancing Algorithm  Work stealing is a methodology for efficient load balancing of computational problems that can be easily decomposed into multiple tasks, but where it is hard to predict the computation cost of each task, and where new tasks are created dynamically during runtime. We present this methodology and its exploitation and feasibility in the context of graphics processors. Work stealing allows an idle core to acquire tasks from a core that is overloaded, causing the total work to be distributed evenly among cores while minimizing the communication costs, as tasks are only redistributed when required. This will often lead to higher throughput than using static partitioning. Work Stealing with latency 
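The idea can be illustrated with a small sequential simulation. It is a simplification under stated assumptions: every task takes one round, tasks do not spawn new tasks, owners take work from one end of their own queue, and idle workers steal from the opposite end of a randomly chosen busy queue. The function name is hypothetical and no real threads or GPU cores are involved.

```python
import random
from collections import deque


def simulate_work_stealing(initial_tasks, seed=0):
    """Simulate unit-cost tasks; return tasks executed per worker and rounds needed."""
    rng = random.Random(seed)
    queues = [deque(tasks) for tasks in initial_tasks]
    executed = [0] * len(queues)
    rounds = 0
    while any(queues):
        rounds += 1
        for worker, own_queue in enumerate(queues):
            if own_queue:
                own_queue.pop()  # owner takes the newest task from its own end
            else:
                victims = [q for q in queues if q]
                if not victims:
                    continue  # nothing left to steal in this round
                rng.choice(victims).popleft()  # thief takes the oldest task
            executed[worker] += 1
    return executed, rounds


# All twelve tasks start on worker 0; the three idle workers steal from it.
executed_per_worker, rounds_needed = simulate_work_stealing(
    [list(range(12)), [], [], []]
)
```

With static partitioning, this unbalanced start would keep worker 0 busy for twelve rounds while the others stay idle; with stealing, the work finishes in three.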
Workforce Analytics  Workforce analytics is a combination of software and methodology that applies statistical models to worker-related data, allowing enterprise leaders to optimize human resource management (HRM). 
Working Memory Network  In recent years, there has been a lot of interest in achieving some kind of complex reasoning using deep neural networks. To do that, models like Memory Networks (MemNNs) have combined external memory storage and attention mechanisms. These architectures, however, lack more complex reasoning mechanisms that could allow, for instance, relational reasoning. Relation Networks (RNs), on the other hand, have shown outstanding results in relational reasoning tasks. Unfortunately, their computational cost grows quadratically with the number of memories, which is prohibitive for larger problems. To solve these issues, we introduce the Working Memory Network, a MemNN architecture with a novel working memory storage and reasoning module. Our model retains the relational reasoning abilities of the RN while reducing its computational complexity from quadratic to linear. We tested our model on the text QA dataset bAbI and the visual QA dataset NLVR. On the jointly trained bAbI-10k, we set a new state of the art, achieving a mean error of less than 0.5%. Moreover, a simple ensemble of two of our models solves all 20 tasks in the joint version of the benchmark. 
Write Once, Deploy Anywhere (WODA) 

Write Once, Run Anywhere (WORA) 
‘Write once, run anywhere’ (WORA), or sometimes ‘write once, run everywhere’ (WORE), is a slogan created by Sun Microsystems to illustrate the cross-platform benefits of the Java language. Ideally, this means Java can be developed on any device, compiled into a standard bytecode and be expected to run on any device equipped with a Java virtual machine (JVM). The installation of a JVM or Java interpreter on chips, devices or software packages has become an industry standard practice. This means a programmer can develop code on a PC and can expect it to run on Java-enabled cell phones, as well as on routers and mainframes equipped with Java, without any adjustments. This is intended to save software developers the effort of writing a different version of their software for each platform or operating system they intend to deploy on. This idea originated as early as the late 1970s, when the UCSD Pascal system was developed to produce and interpret p-code. UCSD Pascal (along with the Smalltalk virtual machine) was a key influence on the design of the Java virtual machine, as cited by James Gosling. The catch is that since there are multiple JVM implementations, on top of a wide variety of different operating systems such as Windows, Linux, Solaris, NetWare, HP-UX, and Mac OS, there can be subtle differences in how a program may execute on each JVM/OS combination, which may require an application to be tested on various target platforms. This has given rise to a joke among Java developers, ‘Write Once, Debug Everywhere’. This architecture has sometimes been criticized with the quip ‘Saying that Java is better because it works in all platforms is like saying that Anal Sex is better because it works with all genders.’ In comparison, the Squeak Smalltalk programming language and environment boasts of being ‘truly write once run anywhere’, because it ‘runs bit-identical images across its wide portability base’. 