d3.compose Compose complex, data-driven visualizations from reusable charts and components with d3.
• Get started quickly with standard charts and components
• Layout charts and components automatically
• Powerful foundation for creating custom charts and components
Daleel In this paper we present Daleel, a multi-criteria adaptive decision making framework that is developed to find the optimal IaaS deployment strategy.
Damped Least-Squares
In mathematics and computing, the Levenberg-Marquardt algorithm (LMA), also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems. These minimization problems arise especially in least squares curve fitting. The LMA interpolates between the Gauss-Newton algorithm (GNA) and the method of gradient descent. The LMA is more robust than the GNA, which means that in many cases it finds a solution even if it starts very far off the final minimum. For well-behaved functions and reasonable starting parameters, the LMA tends to be a bit slower than the GNA. LMA can also be viewed as Gauss-Newton using a trust region approach. The LMA is a very popular curve-fitting algorithm used in many software applications for solving generic curve-fitting problems. However, as for many fitting algorithms, the LMA finds only a local minimum, which is not necessarily the global minimum.
Dark Data The total amount of data in every organization is far, far greater than anyone including, most crucially, their Information Technology group knows about. Moreover this “missing” data that can’t be seen and currently can’t be made use of is also the very stuff that holds the organization together. This is what we call “Dark Data.”
Dark Knowledge A simple way to improve classification performance is to average the predictions of a large ensemble of different classifiers. This is great for winning competitions but requires too much computation at test time for practical applications such as speech recognition. In a widely ignored paper in 2006, Caruana and his collaborators showed that the knowledge in the ensemble could be transferred to a single, efficient model by training the single model to mimic the log probabilities of the ensemble average. This technique works because most of the knowledge in the learned ensemble is in the relative probabilities of extremely improbable wrong answers. For example, the ensemble may give an image of a BMW a probability of one in a billion of being a garbage truck but this is still far greater (in the log domain) than its probability of being a carrot. This ‘dark knowledge’, which is practically invisible in the class probabilities, defines a similarity metric over the classes that makes it much easier to learn a good classifier.
DARPA Open Catalog Welcome to the DARPA Open Catalog, which contains a curated list of DARPA-sponsored software and peer-reviewed publications. DARPA sponsors fundamental and applied research in a variety of areas that may lead to experimental results and reusable technology designed to benefit multiple government domains. The DARPA Open Catalog organizes publicly releasable material from DARPA programs. DARPA has an open strategy to help increase the impact of government investments. DARPA is interested in building communities around government-funded research. DARPA plans to continue to make available information generated by DARPA programs, including software, publications, data, and experimental results. The table on this page lists the programs currently participating in the catalog.
Dat Build data pipelines – Dat is an open source project that provides a streaming interface between every file format and data storage backend.
Data Acceleration Data technologies are evolving rapidly, but organizations have adopted most of these in piecemeal fashion. As a result, enterprise data – whether related to customer interactions, business performance, computer notifications, or external events in the business environment – is vastly underutilized. Moreover, companies’ data ecosystems have become complex and littered with data silos. This makes the data more difficult to access, which in turn limits the value that organizations can get out of it. Indeed, according to a recent Gartner, Inc. report, 85 percent of Fortune 500 organizations will be unable to exploit Big Data for competitive advantage through 2015. Furthermore, a recent Accenture study found that half of all companies have concerns about the accuracy of their data, and the majority of executives are unclear about the business outcomes they are getting from their data analytics programs. To unlock the value hidden in their data, companies must start treating data as a supply chain, enabling it to flow easily and usefully through the entire organization – and eventually throughout each company’s ecosystem of partners, including suppliers and customers. The time is right for this approach. For one thing, new external data sources are becoming available, providing fresh opportunities for data insights. In addition, the tools and technology required to build a better data platform are available and in use. These provide a foundation on which companies can construct an integrated, end-to-end data supply chain.
Data Acquisition Data acquisition is the process of sampling signals that measure real world physical conditions and converting the resulting samples into digital numeric values that can be manipulated by a computer. Data acquisition systems (abbreviated with the acronym DAS or DAQ) typically convert analog waveforms into digital values for processing.
Data Aggregation In statistics, aggregate data describes data combined from several measurements. When data are aggregated, groups of observations are replaced with summary statistics based on those observations. In economics, aggregate data or data aggregates describes high-level data that is composed from a multitude or combination of other more individual data.
Data Analysis Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.
Data Analytics Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.
Data analytics (DA) is the science of examining raw data with the purpose of drawing conclusions about that information. Data analytics is used in many industries to allow companies and [organizations] to make better business decisions and in the sciences to verify or disprove existing models or theories.
Data Archaeology Data archaeology refers to the art and science of recovering computer data encoded and/or encrypted in now obsolete media or formats. Data archaeology can also refer to recovering information from damaged electronic formats after natural or man made disasters.
Data as a Service
Data as a Service, or DaaS, is a cousin of software as a service. Like all members of the “as a Service” (aaS) family, DaaS is based on the concept that the product, data in this case, can be provided on demand to the user regardless of geographic or organizational separation of provider and consumer. Additionally, the emergence of service-oriented architecture (SOA) has rendered the actual platform on which the data resides also irrelevant. This development has enabled the recent emergence of the relatively new concept of DaaS. Data provided as a service was at first primarily used in Web mashups, but now is being increasingly employed both commercially and, less commonly, within organisations such as the UN. Traditionally, most enterprises have used data stored in a self-contained repository, for which software was specifically developed to access and present the data in a human-readable form. One result of this paradigm is the bundling of both the data and the software needed to interpret it into a single package, sold as a consumer product. As the number of bundled software/data packages proliferated and required interaction among one another, another layer of interface was required. These interfaces, collectively known as enterprise application integration (EAI), often tended to encourage vendor lock-in, as it is generally easy to integrate applications that are built upon the same foundation technology. The result of the combined software/data consumer package and required EAI middleware has been an increased amount of software for organizations to manage and maintain, simply for the use of particular data. In addition to routine maintenance costs, a cascading amount of software updates are required as the format of the data changes. The existence of this situation contributes to the attractiveness of DaaS to data consumers, because it allows for the separation of data cost and usage from that of a specific software or platform.
Data Assimilation Data assimilation is the process by which observations are incorporated into a computer model of a real system. Applications of data assimilation arise in many fields of geosciences, perhaps most importantly in weather forecasting and hydrology. Data assimilation proceeds by analysis cycles. In each analysis cycle, observations of the current (and possibly past) state of a system are combined with the results from a numerical model (the forecast) to produce an analysis, which is considered as ‘the best’ estimate of the current state of the system. This is called the analysis step. Essentially, the analysis step tries to balance the uncertainty in the data and in the forecast. The model is then advanced in time and its result becomes the forecast in the next analysis cycle.
Book: Data Assimilation
Book: Data Assimilation
Data Assimilation Lecture Notes
Data Augmentation Data augmentation adds value to base data by adding information derived from internal and external sources within an enterprise. Data is one of the core assets for an enterprise, making data management essential. Data augmentation can be applied to any form of data, but may be especially useful for customer data, sales patterns, product sales, where additional information can help provide more in-depth insight. Data augmentation can help reduce the manual interventation required to developed meaningful information and insight of business data, as well as significantly enhance data quality. Data augmentation is of the last steps done in enterprise data management after monitoring, profiling and integration Some of the common techniques used in data augmentation include:
• Extrapolation Technique: Based on heuristics. The relevant fields are updated or provided with values.
• Tagging Technique: Common records are tagged to a group, making it easier to understand and differentiate for the group.
• Aggregation Technique: Using mathematical values of averages and means, values are estimated for relevant fields if needed
• Probability Technique: Based on heuristics and analytical statistics, values are populated based on the probability of events.
Data Blending Data blending is the process of combining data from multiple sources to reveal deeper intelligence that drives better business decision-making. Data blending differs from data integration and data warehousing in that its primary use is not to create the single, unified version of the truth that is stored in systems of record. Rather, business and data analysts use data blending to build an analytic dataset to assist in answering a specific business questions and driving a particular business process.
Data Broker / Information Broker An information broker (independent information professional, information consultant, or data broker) collects information, often about individual people. The data are then sold to companies that use it to target advertising and marketing towards specific groups, to verify a person’s identity including for purposes of fraud detection, and to sell to individuals and organizations so they can research particular individuals. Critics, including consumer protection organizations, say the industry is secretive and unaccountable, and should be better regulated.
Data Business Model According to Wikipedia, a business model “describes the rationale of how an organization creates, delivers, and captures value.” A Data Business Model is a business model where data is an indispensable component. If you remove the data, the business fails (or at least suffers greatly). To take one example, Amazon’s data is core to their business. Their historical transaction data helps them figure out how much inventory to hold and how to price products. Additionally, data about product views and purchases powers the recommendation engine, which drives a large portion of sales. Furthermore, product reviews drive traffic and SEO. As icing on the cake, all of this is a virtuous cycle: recommendations drive purchases, which result in more reviews, which lead to better SEO and more traffic, which results in more visitors and better recommendations. If Amazon wasn’t so effective as using data, it would be a much smaller company. The best part of data business models is that they often have the same kind of positive feedback loop as Amazon. In each business model, the more you use data to make money, the more data you get as a result, which helps you make more money in the future.
Data Communications Data Communications concerns the transmission of digital messages to devices external to the message source. “External” devices are generally thought of as being independently powered circuitry that exists beyond the chassis of a computer or other digital message source. As a rule, the maximum permissible transmission rate of a message is directly proportional to signal power, and inversely proportional to channel noise. It is the aim of any communications system to provide the highest possible transmission rate at the lowest possible power and with the least possible noise.
Data Curation Data curation is a term used to indicate management activities required to maintain research data long-term such that it is available for reuse and preservation. In science, data curation may indicate the process of extraction of important information from scientific texts, such as research articles by experts, to be converted into an electronic format, such as an entry of a biological database. The term is also used in the humanities, where increasing cultural and scholarly data from digital humanities projects requires the expertise and analytical practices of data curation. In broad terms, curation means a range of activities and processes done to create, manage, maintain, and validate a component.
Data Decorations Alberto Cairo left a comment about ‘data decorations’. This is a name he’s using to describe something like the windshield-wiper chart I discussed the other day. It seems like the visual elements were purely ornamental and adds nothing to the experience – one might argue that the experience was worse than just staring at the data table.
Data Driven Business Model
This paper contributes by providing a definition of a data-driven business model as a business model that relies on data as a key resource.
Data Driven Documents
D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.
Visualizing Data with D3.js
D3 Tips and Tricks
Awesome D3
Data Envelopment Analysis
Data envelopment analysis (DEA) is a nonparametric method in operations research and economics for the estimation of production frontiers. It is used to empirically measure productive efficiency of decision making units (or DMUs). Although DEA has a strong link to production theory in economics, the tool is also used for benchmarking in operations management, where a set of measures is selected to benchmark the performance of manufacturing and service operations.
Data Envelopment Analysis
Data Federation In most cases, if the term federation is used, it refers to combining autonomously operating objects. For example, states can be federated to form one country. If we apply this common explanation to data federation, it means combining autonomous data stores to form one large data store. Therefore, we propose the following definition ‘Data federation is a form of data virtualization where the data stored in a heterogeneous set of autonomous data stores is made accessible to data consumers as one integrated data store by using on-demand data integration.’ This definition is based on the following concepts:
• Data virtualization: Data federation is a form of data virtualization. Note that not all forms of data virtualization imply data federation. For example, if an organization wants to virtualize the database of one application, no need exists for data federation. But data federation always results in data virtualization.
• Heterogeneous set of data stores: Data federation should make it possible to bring data together from data stores using different storage structures, different access languages, and different APIs. An application using data federation should be able to access different types of database servers and files with various formats; it should be able to integrate data from all those data sources; it should offer features for transforming the data; and it should allow the applications and tools to access the data through various APIs and languages.
• Autonomous data stores: Data stores accessed by data federation are able to operate independently; in other words, they can be used outside the scope of data federation.
• One integrated data store: Regardless of how and where data is stored, it should be presented as one integrated data set. This implies that data federation involves transformation, cleansing, and possibly even enrichment of data.
• On-demand integration: This refers to when the data from a heterogeneous set of data stores is integrated. With data federation, integration takes place on the fly, and not in batch. When the data consumers ask for data, only then data is accessed and integrated. So the data is not stored in an integrated way, but remains in its original location and format.
Spark Reaches for the Holy Grail: Federated Queries
Data Fusion Data fusion is the process of integration of multiple data and knowledge representing the same real-world object into a consistent, accurate, and useful representation. Data fusion processes are often categorized as low, intermediate or high, depending on the processing stage at which fusion takes place. Low level data fusion combines several sources of raw data to produce new raw data. The expectation is that fused data is more informative and synthetic than the original inputs. For example, sensor fusion is also known as (multi-sensor) data fusion and is a subset of information fusion.
Data Hoarding http://…/Wikipedia:Avoid_data-hoarding
Data Impartment
Data Journalism / Data Driven Journalism Data-driven journalism, often shortened to “ddj”, is a term in use since 2009/2010, to describe a journalistic process based on analyzing and filtering large data sets for the purpose of creating a news story. Main drivers for this process are newly available resources such as “open source” software and “open data”. This approach to journalism builds on older practices, most notably on CAR (acronym for “computer-assisted reporting”) a label used mainly in the US for decades. Other labels for partially similar approaches are “precision journalism”, based on a book by Philipp Meyer, published in 1972, where he advocated the use of techniques from social sciences in researching stories.
Data Justice As a handful of data platforms generate massive amounts of user data, the barriers to entry rise since potential competitors have little data themselves to entice advertisers compared to the incumbents who have both the concentrated processing power and supply of user data to dominate particular sectors. The upshot of this market power by big data platforms is that the marketplace is doing little to create options for consumers that might alleviate the misuse of consumer data or encourage big data platforms to better compensate users who are willing to share their data. Data Justice has been launched as a project to promote public education and new alliances to challenge the danger of big data to workers, consumers and the public.
Data Lake A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed.
While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data. Each data element in a lake is assigned a unique identifier and tagged with a set of extended metadata tags. When a business question arises, the data lake can be queried for relevant data, and that smaller set of data can then be analyzed to help answer the question.
The term data lake is often associated with Hadoop-oriented object storage. In such a scenario, an organization’s data is first loaded into the Hadoop platform, and then business analytics and data mining tools are applied to the data where it resides on Hadoop’s cluster nodes of commodity computers.
Like big data, the term data lake is sometimes disparaged as being simply a marketing label for a product that supports Hadoop. Increasingly, however, the term is being accepted as a way to describe any large data pool in which the schema and data requirements are not defined until the data is queried.
Data Leakage Data Leakage is the creation of unexpected additional information in the training data, allowing a model or machine learning algorithm to make unrealistically good predictions. Leakage is a pervasive challenge in applied machine learning, causing models to over-represent their generalization error and often rendering them useless in the real world. It can caused by human or mechanical error, and can be intentional or unintentional in both cases.
Data Lineage Data lineage is generally defined as a kind of data life cycle that includes the data’s origins and where it moves over time. This term can also describe what happens to data as it goes through diverse processes. Data lineage can help with efforts to analyze how information is used and to track key bits of information that serve a particular purpose.
How to track and visualize data lineage
Data Lineage Analysis “Data lineage is defined as a data life cycle that includes the data’s origins and where it moves over time.” It describes what happens to data as it goes through diverse processes. It helps provide visibility into the analytics pipeline and simplifies tracing errors back to their sources. It also enables replaying specific portions or inputs of the dataflow for step-wise debugging or regenerating lost output. In fact, database systems have used such information, called data provenance, to address similar validation and debugging challenges already.
Data provenance documents the inputs, entities, systems, and processes that influence data of interest, in effect providing a historical record of the data and its origins. The generated evidence supports essential forensic activities such as data-dependency analysis, error/compromise detection and recovery, and auditing and compliance analysis. “Lineage is a simple type of why provenance.”
Data Literacy A statistical understanding and experience of applying analysis techniques to real data through code and visualization. Data literacy will be the fundamental skill for the 21st century. It’s also extremely easy to learn. The best way to develop this skill is to simply work with datasets.
Data Mining
Data mining (the analysis step of the “Knowledge Discovery in Databases” process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
Data Mining Reality Check
Is a means for ensuring the validity of data mining results. Data Mining Reality Check is technology for giving you the probability distribution against which to compare the best performance from your data mining exercise – that is, the probability distribution of your best network relative to the benchmark, viewed as a random variable generated by a random process in which really there is nothing better than the benchmark.
Data Normalization Data normalization is the process of reducing data to its canonical form. For instance, Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. In the field of software security, a common vulnerability is unchecked malicious input. The mitigation for this problem is proper input validation. Before input validation may be performed, the input must be normalized, i.e., eliminating encoding (for instance HTML encoding) and reducing the input data to a single common character set.
Data Paring The problem that needs to be more discussed is data paring. The need for this is fairly obvious: data is growing exponentially, and growing your compute data exponentially will require budgets that aren’t realistic. One of the keys to winning at Big Data will be ignoring the noise. As the amount of data increases exponentially, the amount of interesting data doesn’t; I would bet that for most purposes the interesting data added is a tiny percentage of the new data that is added to the overall pool of data.
Data Partitioning Data partitioning in data mining is the division of the whole data available into two or three non overlapping sets: the training set , the validation set , and the test set. If the data set is very large, often only a portion of it is selected for the partitions. Partitioning is normally used when the model for the data at hand is being chosen from a broad set of models. The basic idea of data partitioning is to keep a subset of available data out of analysis, and to use it later for verification of the model.
Data Pattern Processing
Data Plumbing
Data Preprocessing Data pre-processing is an important step in the data mining process. The phrase “garbage in, garbage out” is particularly applicable to data mining and machine learning projects. Data-gathering methods are often loosely controlled, resulting in out-of-range values (e.g., Income: -100), impossible data combinations (e.g., Sex: Male, Pregnant: Yes), missing values, etc. Analyzing data that has not been carefully screened for such problems can produce misleading results. Thus, the representation and quality of data is first and foremost before running an analysis.
Data Profiling Data profiling is the process of examining the data available in an existing data source (e.g. a database or a file) and collecting statistics and information about that data. The purpose of these statistics may be to:
1. Find out whether existing data can easily be used for other purposes
2. Improve the ability to search the data by tagging it with keywords, descriptions, or assigning it to a category
3. Give metrics on data quality including whether the data conforms to particular standards or patterns
4. Assess the risk involved in integrating data for new applications, including the challenges of joins
5. Assess whether metadata accurately describes the actual values in the source database
6. Understanding data challenges early in any data intensive project, so that late project surprises are avoided. Finding data problems late in the project can lead to delays and cost overruns.
7. Have an enterprise view of all data, for uses such as master data management where key data is needed, or data governance for improving data quality.
Data Science Data science is a buzz word reflecting the application of statistics by advances in computer science.
Data science is the study of the generalizable extraction of knowledge from data, yet the key word is science. It incorporates varying elements and builds on techniques and theories from many fields, including signal processing, mathematics, probability models, machine learning, statistical learning, computer programming, data engineering, pattern recognition and learning, visualization, uncertainty modeling, data warehousing, and high performance computing with the goal of extracting meaning from data and creating data products. Data Science is not restricted to only big data, although the fact that data is scaling up makes big data an important aspect of data science.
Data Science Maturity Model
Many organizations have been underwhelmed by the return on their investment in data science. This is due to a narrow focus on tools, rather than a broader consideration of how data science teams work and how they fit within the larger organization. To help data science practitioners and leaders identify their existing gaps and direct future investment, Domino has developed a framework called the Data Science Maturity Model (DSMM). The DSMM assesses how reliably and sustainably a data science team can deliver value for their organization. The model consists of four levels of maturity and is split along five dimensions that apply to all analytical organizations. By design, the model is not specific to any given industry — it applies as much to an insurance company as it does to a manufacturer.
Data Science-as-a-Service
Data Science as a Service is Analyze’s unique approach to providing today’s business and government with the most advanced and efficient big data and data science analytics. With more than 25 years combined experience in data science and cybersecurity, Analyze helps organizations stay ahead of the competition, increase revenue, and improve operational efficiency.
Data Standardization When approaching data for modeling, some standard procedures should be used to prepare the data for modeling:
1.First the data should be filtered, and any outliers removed from the data (watch for a future post on how to scrub your raw data removing only legitimate outliers).
2.The data should be normalized or standardized to bring all of the variables into proportion with one another. For example, if one variable is 100 times larger than another (on average), then your model may be better behaved if you normalize/standardize the two variables to be approximately equivalent. Technically though, whether normalized/standardized, the coefficients associated with each variable will scale appropriately to adjust for the disparity in the variable sizes.
Data Stewardship In metadata, a data steward is a person that is responsible for maintaining a data element in a metadata registry. A data steward is a broad job role that incorporates processes, policies, guidelines and responsibilities for administering organizations’ entire data in compliance with business and/or regulatory obligations. A data steward’s responsibility stems from an understanding of the business domain and the interaction of business processes with data entities/elements. A data steward ensures that there are documented procedures and guidelines for data access and use.
A data steward may share some responsibilities with a data custodian, and work with database/warehouse administrators and other related staff to plan and execute an enterprise-wide data governance, control and compliance policy.
Data stewardship roles are common when organizations are attempting to exchange data precisely and consistently between computer systems and reuse data-related resources. Master data management often makes references to the need for data stewardship for its implementation to succeed.
Data Stream Mining Data Stream Mining is the process of extracting knowledge structures from continuous, rapid data records. A data stream is an ordered sequence of instances that in many applications of data stream mining can be read only once or a small number of times using limited computing and storage capabilities. Examples of data streams include computer network traffic, phone conversations, ATM transactions, web searches, and sensor data. Data stream mining can be considered a subfield of data mining, machine learning, and knowledge discovery. In many data stream mining applications, the goal is to predict the class or value of new instances in the data stream given some knowledge about the class membership or values of previous instances in the data stream. Machine learning techniques can be used to learn this prediction task from labeled examples in an automated fashion. Often, concepts from the field of incremental learning, a generalization of Incremental heuristic search are applied to cope with structural changes, on-line learning and real-time demands. In many applications, especially operating within non-stationary environments, the distribution underlying the instances or the rules underlying their labeling may change over time, i.e. the goal of the prediction, the class to be predicted or the target value to be predicted, may change over time. This problem is referred to as concept drift.
Data Structure Graph A Data Structure Graph is a group of atomic entities that are related to each other, stored in a repository, then moved from one persistence layer to another, rendered as a Graph.
Data Visualization Data visualization or data visualisation is viewed by many disciplines as a modern equivalent of visual communication. It is not owned by any one field, but rather finds interpretation across many (e.g. it is viewed as a modern branch of descriptive statistics by some, but also as a grounded theory development tool by others). It involves the creation and study of the visual representation of data, meaning ‘information that has been abstracted in some schematic form, including attributes or variables for the units of information’. A primary goal of data visualization is to communicate information clearly and efficiently to users via the information graphics selected, such as tables and charts. Effective visualization helps users in analyzing and reasoning about data and evidence. It makes complex data more accessible, understandable and usable. Users may have particular analytical tasks, such as making comparisons or understanding causality, and the design principle of the graphic (i.e., showing comparisons or showing causality) follows the task. Tables are generally used where users will look-up a specific measure of a variable, while charts of various types are used to show patterns or relationships in the data for one or more variables. Data visualization is both an art and a science. The rate at which data is generated has increased, driven by an increasingly information-based economy. Data created by internet activity and an expanding number of sensors in the environment, such as satellites and traffic cameras, are referred to as ‘Big Data’. Processing, analyzing and communicating this data present a variety of ethical and analytical challenges for data visualization. The field of data science and practitioners called data scientists have emerged to help address this challenge.
Data Warehouse
In computing, a data warehouse (DW, DWH), or an enterprise data warehouse (EDW), is a database used for reporting and data analysis. Integrating data from one or more disparate sources creates a central repository of data, a data warehouse (DW). Data warehouses store current and historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons.
Data as a Service, or DaaS, is a cousin of software as a service. Like all members of the “as a Service” (aaS) family, DaaS is based on the concept that the product, data in this case, can be provided on demand to the user regardless of geographic or organizational separation of provider and consumer. Additionally, the emergence of service-oriented architecture (SOA) has rendered the actual platform on which the data resides also irrelevant. This development has enabled the recent emergence of the relatively new concept of DaaS. Data provided as a service was at first primarily used in Web mashups, but now is being increasingly employed both commercially and, less commonly, within organisations such as the UN.
Data-Driven Threshold Machine
We present a novel distribution-free approach, the data-driven threshold machine (DTM), for a fundamental problem at the core of many learning tasks: choose a threshold for a given pre-specified level that bounds the tail probability of the maximum of a (possibly dependent but stationary) random sequence. We do not assume data distribution, but rather relying on the asymptotic distribution of extremal values, and reduce the problem to estimate three parameters of the extreme value distributions and the extremal index. We specially take care of data dependence via estimating extremal index since in many settings, such as scan statistics, change-point detection, and extreme bandits, where dependence in the sequence of statistics can be significant. Key features of our DTM also include robustness and the computational efficiency, and it only requires one sample path to form a reliable estimate of the threshold, in contrast to the Monte Carlo sampling approach which requires drawing a large number of sample paths. We demonstrate the good performance of DTM via numerical examples in various dependent settings.
Datification / Datafication A concept that tracks the conception, development, storage and marketing of all types of data, both for business and life. It has grown in popularity of late to capture how data measures things and organizations in order to compete and win. It is about making business visible.
Dato Dato (formerly known as GraphLab): Your app drives business. From inspiration to production, build intelligent apps fast with the power of Dato’s machine learning platform. Data science at scale has never been easier.
Dawid-Skene Algorithm
More and more online communities classify contributions based on collaborative ratings of these contributions. A popular method for such a rating-based classification is the Dawid-Skene algorithm (DSA). However, despite its popularity, DSA has two major shortcomings:
(1) It is vulnerable to raters with a low competence, i.e., a low probability of rating correctly.
(2) It is defenseless against collusion attacks.
In a collusion attack, raters coordinate to rate the same data objects with the same value to artificially increase their remuneration.
Error Rate Analysis of Labeling by Crowdsourcing
DC.js dc.js is a javascript charting library with native crossfilter support and allowing highly efficient exploration on large multi-dimensional dataset (inspired by crossfilter’s demo). It leverages d3 engine to render charts in css friendly svg format. Charts rendered using dc.js are naturally data driven and reactive therefore providing instant feedback on user’s interaction. The main objective of this project is to provide an easy yet powerful javascript library which can be utilized to perform data visualization and analysis in browser as well as on mobile device.
DCM Bandits Search engines recommend a list of web pages. The user examines this list, from the first page to the last, and may click on multiple attractive pages. This type of user behavior can be modeled by the \emph{dependent click model (DCM)}. In this work, we propose \emph{DCM bandits}, an online learning variant of the DCM model where the objective is to maximize the probability of recommending a satisfactory item. The main challenge of our problem is that the learning agent does not observe the reward. It only observes the clicks. This imbalance between the feedback and rewards makes our setting challenging. We propose a computationally-efficient learning algorithm for our problem, which we call dcmKL-UCB; derive gap-dependent upper bounds on its regret under reasonable assumptions; and prove a matching lower bound up to logarithmic factors. We experiment with dcmKL-UCB on both synthetic and real-world problems. Our algorithm outperforms a range of baselines and performs well even when our modeling assumptions are violated. To the best of our knowledge, this is the first regret-optimal online learning algorithm for learning to rank with multiple clicks in a cascade-like model.
De Bruijn Entropy De Bruijn entropy and string similarity
Debagging It is easy to convert a sentence into a bag of words, but it is much harder to convert a bag of words into a meaningful sentence. We name the latter the debagging problem.
Decision Analysis
Decision analysis (DA) is the discipline comprising the philosophy, theory, methodology, and professional practice necessary to address important decisions in a formal manner. Decision analysis includes many procedures, methods, and tools for identifying, clearly representing, and formally assessing important aspects of a decision, for prescribing a recommended course of action by applying the maximum expected utility action axiom to a well-formed representation of the decision, and for translating the formal representation of a decision and its corresponding recommendation into insight for the decision maker and other stakeholders.
Decision Model and Notation
The primary goal of DMN is to provide an industry standard modelling notation for decision management and business rules that is readily understandable by all business users: from the business analysts who need to create initial decision requirements and then more detailed decision models, to the technical developers responsible for automating the decisions in processes, and finally, to the business people who will manage and monitor those decisions. The submission has been designed to be complementary to and useable alongside the OMG Business Process Model & Notation (BPMN) standard and will ensure that decision models are interchangeable across organizations.
Decision Scientist Decision Scientists build decision support tools to enable decision makers to make decisions, or take action, under uncertainty with a data-centric bias. Traditional analytics falls under this domain. Often decision makers like linear solutions that provide simple, explainable, socializable decision making frameworks. That is, they are looking for a rationale. Data Scientists build machines to make decisions about large-scale complex dynamical processes that are typically too fast (velocity, veracity, volume, etc.) for a human operator/manager. They typically don’t concern themselves with whether the algorithm is explainable or socializable, but are more concerned with whether it is functional, reliable, accurate, and robust.
Decision Stump A decision stump is a machine learning model consisting of a one-level decision tree. That is, it is a decision tree with one internal node (the root) which is immediately connected to the terminal nodes (its leaves). A decision stump makes a prediction based on the value of just a single input feature. Sometimes they are also called 1-rules.
Decision Support
The term Decision Support (DS) is used often and in a variety of contexts related to decision making. Recently, for example, it is often mentioned in connection with Data Warehouses and On-Line Analytical Processing (OLAP). Another recent trend is to associate DS with Data Mining. This is the case in the project SolEuNet , which attempts to exploit these two approaches in a complementary way in order to support difficult real-life problem solving. Unfortunately, although the term ‘Decision Support’ seems rather intuitive and simple, it is in fact very loosely defined. It means different things to different people and in different contexts. Also, its meaning has shifted during the recent history. Nowadays, DS is probably most often associated with Data Warehouses and OLAP. A decade ago, it was coupled with Decision Support Systems (DSS). Still before that, there was a close link with Operations Research (OR) and Decision Analysis (DA). This causes a lot of confusion and misunderstanding, and provokes requests for clarification. The confusion is further exemplified by the multitude of related terms and acronyms that are either equal to, or start with ‘DS’: Decision Support, Decision Sciences, Decision Systems, Decision Support Systems, etc. This paper attempts to clarify these issues. We take the viewpoint that Decision Support is a broad, generic term that encompasses all aspects related to supporting people in making decisions. First, we present the results of a survey of WWW documents related to DS. On this basis, and on the basis of relevant literature and our previous experience in the field of DS, we provide a classification of DS and related disciplines. DS itself is given a role within Decision Making and Decision Sciences. Some most prominent DS disciplines are briefly overviewed: Operations Research, Decision Analysis, Decision Support Systems, Data Warehousing and OLAP, and Group Decision Support.
Decision Support System
A Decision Support System (DSS) is a computer-based information system that supports business or organizational decision-making activities. DSSs serve the management, operations, and planning levels of an organization (usually mid and higher management) and help to make decisions, which may be rapidly changing and not easily specified in advance (Unstructured and Semi-Structured decision problems). Decision support systems can be either fully computerized, human or a combination of both. While academics have perceived DSS as a tool to support decision making process, DSS users see DSS as a tool to facilitate organizational processes. Some authors have extended the definition of DSS to include any system that might support decision making. Sprague (1980) defines DSS by its characteristics:
1. DSS tends to be aimed at the less well structured, underspecified problem that upper level managers typically face;
2. DSS attempts to combine the use of models or analytic techniques with traditional data access and retrieval functions;
3. DSS specifically focuses on features which make them easy to use by noncomputer people in an interactive mode; and
4. DSS emphasizes flexibility and adaptability to accommodate changes in the environment and the decision making approach of the user.
DSSs include knowledge-based systems. A properly designed DSS is an interactive software-based system intended to help decision makers compile useful information from a combination of raw data, documents, and personal knowledge, or business models to identify and solve problems and make decisions. Typical information that a decision support application might gather and present includes:
• inventories of information assets (including legacy and relational data sources, cubes, data warehouses, and data marts),
• comparative sales figures between one period and the next,
• projected revenue figures based on product sales assumptions.
Decision Theory Decision theory or theory of choice in economics, psychology, philosophy, mathematics, computer science, and statistics is concerned with identifying the values, uncertainties and other issues relevant in a given decision, its rationality, and the resulting optimal decision. It is closely related to the field of game theory; decision theory is concerned with the choices of individual agents whereas game theory is concerned with interactions of agents whose decisions affect each other.
Decision Tree Based Missing Value Imputation Technique
Decision tree based Missing value Imputation technique’ (DMI) makes use of an EM algorithm and a decision tree (DT) algorithm.
Decision Tree Learning / Classification and Regression Trees
Decision tree learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the item’s target value. It is one of the predictive modelling approaches used in statistics, data mining and machine learning. More descriptive names for such tree models are classification trees or regression trees. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels.
Deducer An R Graphical User Interface (GUI) for Everyone: Deducer is designed to be a free easy to use alternative to proprietary data analysis software such as SPSS, JMP, and Minitab. It has a menu system to do common data manipulation and analysis tasks, and an excel-like spreadsheet in which to view and edit data frames. The goal of the project is two fold.
1. Provide an intuitive graphical user interface (GUI) for R, encouraging non-technical users to learn and perform analyses without programming getting in their way.
2. Increase the efficiency of expert R users when performing common tasks by replacing hundreds of keystrokes with a few mouse clicks. Also, as much as possible the GUI should not get in their way if they just want to do some programming. Deducer is designed to be used with the Java based R console JGR, though it supports a number of other R environments (e.g. Windows RGUI and RTerm).
Deduplication with Hadoop
Entity Matching for Big Data: Automatically matching entities (objects) and ontologies are key technologies to semantically integrate heterogeneous data. These match techniques are needed to identify equivalent data objects (duplicates) or semantically equivalent metadata elements (ontology concepts, schema attributes). The proposed techniques demand very high resources that limit their applicability to large-scale (Big Data) problems unless a powerful cloud infrastructure can be utilized. This is because the (fuzzy) match approaches basically have a quadratic complexity to compare the all elements to be matched with each other. For sufficient match quality, multiple match algorithms need to be applied and combined within so-called match workflows adding further resource requirements as well as a significant optimization problem to select matchers and configure their combination.
Deep Belief Networks
In machine learning, a deep belief network (DBN) is a generative graphical model, or alternatively a type of deep neural network, composed of multiple layers of latent variables (“hidden units”), with connections between the layers but not between units within each layer. When trained on a set of examples in an unsupervised way, a DBN can learn to probabilistically reconstruct its inputs. The layers then act as feature detectors on inputs. After this learning step, a DBN can be further trained in a supervised way to perform classification. DBNs can be viewed as a composition of simple, unsupervised networks such as restricted Boltzmann machines (RBMs) or autoencoders, where each sub-network’s hidden layer serves as the visible layer for the next. This also leads to a fast, layer-by-layer unsupervised training procedure, where contrastive divergence is applied to each sub-network in turn, starting from the “lowest” pair of layers (the lowest visible layer being a training set). The observation, due to Hinton’s student Teh, that DBNs can be trained greedily, one layer at a time, has been called a breakthrough in deep learning.
Deep Broad Learning
Deep learning has demonstrated the power of detailed modeling of complex high-order (multivariate) interactions in data. For some learning tasks there is power in learning models that are not only Deep but also Broad. By Broad, we mean models that incorporate evidence from large numbers of features. This is of especial value in applications where many different features and combinations of features all carry small amounts of information about the class. The most accurate models will integrate all that information. In this paper, we propose an algorithm for Deep Broad Learning called DBL. The proposed algorithm has a tunable parameter $n$, that specifies the depth of the model. It provides straightforward paths towards out-of-core learning for large data. We demonstrate that DBL learns models from large quantities of data with accuracy that is highly competitive with the state-of-the-art.
Deep Convolutional Generative Adversarial Networks
In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator.
Deep Convolutional Neural Network
A grand challenge in machine learning is the development of computational algorithms that match or outperform humans in perceptual inference tasks such as visual object and speech recognition. The key factor complicating such tasks is the presence of numerous nuisance variables, for instance, the unknown object position, orientation, and scale in object recognition or the unknown voice pronunciation, pitch, and speed in speech recognition. Recently, a new breed of deep learning algorithms have emerged for high-nuisance inference tasks; they are constructed from many layers of alternating linear and nonlinear processing units and are trained using large-scale algorithms and massive amounts of training data. The recent success of deep learning systems is impressive — they now routinely yield pattern recognition systems with nearor super-human capabilities — but a fundamental question remains: Why do they work? Intuitions abound, but a coherent framework for understanding, analyzing, and synthesizing deep learning architectures has remained elusive. We answer this question by developing a new probabilistic framework for deep learning based on a Bayesian generative probabilistic model that explicitly captures variation due to nuisance variables. The graphical structure of the model enables it to be learned from data using classical expectation-maximization techniques. Furthermore, by relaxing the generative model to a discriminative one, we can recover two of the current leading deep learning systems, deep convolutional neural networks (DCNs) and random decision forests (RDFs), providing insights into their successes and shortcomings as well as a principled route to their improvement.
Deep Data What we call ‘deep data’ is a combination of experts’ domain knowledge of the area … combined with data science.
Deep Generalized Canonical Correlation Analysis
We present Deep Generalized Canonical Correlation Analysis (DGCCA) — a method for learning nonlinear transformations of arbitrarily many views of data, such that the resulting transformations are maximally informative of each other. While methods for nonlinear two-view representation learning (Deep CCA, (Andrew et al., 2013)) and linear many-view representation learning (Generalized CCA (Horst, 1961)) exist, DGCCA is the first CCA-style multiview representation learning technique that combines the flexibility of nonlinear (deep) representation learning with the statistical power of incorporating information from many independent sources, or views. We present the DGCCA formulation as well as an efficient stochastic optimization algorithm for solving it. We learn DGCCA repre- sentations on two distinct datasets for three downstream tasks: phonetic transcrip- tion from acoustic and articulatory measurements, and recommending hashtags and friends on a dataset of Twitter users. We find that DGCCA representations soundly beat existing methods at phonetic transcription and hashtag recommendation, and in general perform no worse than standard linear many-view techniques.
Deep Hashing Neural Network
In this paper we propose a synergistic melting of neural networks and decision trees into a deep hashing neural network (HNN) having a modeling capability exponential with respect to its number of neurons. We first derive a soft decision tree named neural decision tree allowing the optimization of arbitrary decision function at each split node. We then rewrite this soft space partitioning as a new kind of neural network layer, namely the hashing layer (HL), which can be seen as a generalization of the known soft-max layer. This HL can easily replace the standard last layer of ANN in any known network topology and thus can be used after a convolutional or recurrent neural network for example. We present the modeling capacity of this deep hashing function on small datasets where one can reach at least equally good results as standard neural networks by diminishing the number of output neurons. Finally, we show that for the case where the number of output neurons is large, the neural network can mitigate the absence of linear decision boundaries by learning for each difficult class a collection of not necessarily connected sub-regions of the space leading to more flexible decision surfaces. Finally, the HNN can be seen as a deep locality sensitive hashing function which can be trained in a supervised or unsupervised setting as we will demonstrate for classification and regression problems.
Deep Kernelized Autoencoder In this paper we introduce the deep kernelized autoencoder, a neural network model that allows an explicit approximation of (i) the mapping from an input space to an arbitrary, user-specified kernel space and (ii) the back-projection from such a kernel space to input space. The proposed method is based on traditional autoencoders and is trained through a new unsupervised loss function. During training, we optimize both the reconstruction accuracy of input samples and the alignment between a kernel matrix given as prior and the inner products of the hidden representations computed by the autoencoder. Kernel alignment provides control over the hidden representation learned by the autoencoder. Experiments have been performed to evaluate both reconstruction and kernel alignment performance. Additionally, we applied our method to emulate kPCA on a denoising task obtaining promising results.
Deep Learning Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using architectures composed of multiple non-linear transformations. Deep learning is part of a broader family of machine learning methods based on learning representations. An observation (e.g., an image) can be represented in many ways (e.g., a vector of pixels), but some representations make it easier to learn tasks of interest (e.g., is this the image of a human face?) from examples, and research in this area attempts to define what makes better representations and how to create models to learn these representations. Various deep learning architectures such as deep neural networks, convolutional deep neural networks, and deep belief networks have been applied to fields like computer vision, automatic speech recognition, natural language processing, and music/audio signal recognition where they have been shown to produce state-of-the-art results on various tasks.
Deep Learning Accelerator Unit
As the emerging field of machine learning, deep learning shows excellent ability in solving complex learning problems. However, the size of the networks becomes increasingly large scale due to the demands of the practical applications, which poses significant challenge to construct a high performance implementations of deep learning neural networks. In order to improve the performance as well to maintain the low power cost, in this paper we design DLAU, which is a scalable accelerator architecture for large-scale deep learning networks using FPGA as the hardware prototype. The DLAU accelerator employs three pipelined processing units to improve the throughput and utilizes tile techniques to explore locality for deep learning applications. Experimental results on the state-of-the-art Xilinx FPGA board demonstrate that the DLAU accelerator is able to achieve up to 36.1x speedup comparing to the Intel Core2 processors, with the power consumption at 234mW.
Deep Linear Discriminant Analysis
We introduce Deep Linear Discriminant Analysis (DeepLDA) which learns linearly separable latent representations in an end-to-end fashion. Classic LDA extracts features which preserve class separability and is used for dimensionality reduction for many classification problems. The central idea of this paper is to put LDA on top of a deep neural network. This can be seen as a non-linear extension of classic LDA. Instead of maximizing the likelihood of target labels for individual samples, we propose an objective function that pushes the network to produce feature distributions which: (a) have low variance within the same class and (b) high variance between different classes. Our objective is derived from the general LDA eigenvalue problem and still allows to train with stochastic gradient descent and back-propagation.
Deep Mean Maps
The use of distributions and high-level features from deep architecture has become commonplace in modern computer vision. Both of these methodologies have separately achieved a great deal of success in many computer vision tasks. However, there has been little work attempting to leverage the power of these to methodologies jointly. To this end, this paper presents the Deep Mean Maps (DMMs) framework, a novel family of methods to non-parametrically represent distributions of features in convolutional neural network models. DMMs are able to both classify images using the distribution of top-level features, and to tune the top-level features for performing this task. We show how to implement DMMs using a special mean map layer composed of typical CNN operations, making both forward and backward propagation simple.
Deep Nearest Neighbor Descent
Most density-based clustering methods largely rely on how well the underlying density is estimated. However, density estimation itself is also a challenging problem, especially the determination of the kernel bandwidth. A large bandwidth could lead to the over-smoothed density estimation in which the number of density peaks could be less than the true clusters, while a small bandwidth could lead to the under-smoothed density estimation in which spurious density peaks, or called the ‘ripple noise’, would be generated in the estimated density. In this paper, we propose a density-based hierarchical clustering method, called the Deep Nearest Neighbor Descent (D-NND), which could learn the underlying density structure layer by layer and capture the cluster structure at the same time. The over-smoothed density estimation could be largely avoided and the negative effect of the under-estimated cases could be also largely reduced. Overall, D-NND presents not only the strong capability of discovering the underlying cluster structure but also the remarkable reliability due to its insensitivity to parameters.
Deep Neural Decision Forests We present Deep Neural Decision Forests – a novel approach that unifies classification trees with the representation learning functionality known from deep convolutional networks, by training them in an end-to-end manner. To combine these two worlds, we introduce a stochastic and differentiable decision tree model, which steers the representation learning usually conducted in the initial layers of a (deep) convolutional network. Our model differs from conventional deep networks because a decision forest provides the final predictions and it differs from conventional decision forests since we propose a principled, joint and global optimization of split and leaf node parameters. We show experimental results on benchmark machine learning datasets like MNIST and ImageNet and find onpar or superior results when compared to state-of-the-art deep models. Most remarkably, we obtain Top5-Errors of only 7:84%=6:38% on ImageNet validation data when integrating our forests in a single-crop, single/seven model GoogLeNet architecture, respectively. Thus, even without any form of training data set augmentation we are improving on the 6.67% error obtained by the best GoogLeNet architecture (7 models, 144 crops).
Deep Optimistic Linear Support Learning
We propose Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-objective decision problems where the relative importances of the objectives are not known a priori. Using features from the high-dimensional inputs, DOL computes the convex coverage set containing all potential optimal solutions of the convex combinations of the objectives. To our knowledge, this is the first time that deep reinforcement learning has succeeded in learning multi-objective policies. In addition, we provide a testbed with two experiments to be used as a benchmark for deep multi-objective reinforcement learning.
Deep Q-Network
The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Deep Rendering Mixture Model
A Probabilistic Framework for Deep Learning
Deep Rendering Model
In this paper, we develop a new theoretical framework that provides insights into both the successes and shortcomings of deep learning systems, as well as a principled route to their design and improvement. Our framework is based on a generative probabilistic model that explicitly captures variation due to latent nuisance variables. The Rendering Model (RM) explicitly models nuisance variation through a rendering function that combines the task-specific variables of interest (e.g., object class in an object recognition task) and the collection of nuisance variables. The Deep Rendering Model (DRM) extends the RM in a hierarchical fashion by rendering via a product of affine nuisance transformations across multiple levels of abstraction. The graphical structures of the RM and DRM enable inference via message passing, using, for example, the sum-product or max-sum algorithms, and training via the expectation-maximization (EM) algorithm. A key element of the framework is the relaxation of the RM/DRM generative model to a discriminative one in order to optimize the bias-variance tradeoff.
Deep Residual Hashing In this paper, we define an extension of the supersymmetric hyperbolic nonlinear sigma model introduced by Zirnbauer. We show that it arises as a weak joint limit of a time-changed version introduced by Sabot and Tarr\`es of the vertex-reinforced jump process. It describes the asymptotics of rescaled crossing numbers, rescaled fluctuations of local times, asymptotic local times on a logarithmic scale, endpoints of paths, and last exit trees.
Deep Roots We propose a new method for training computationally efficient and compact convolutional neural networks (CNNs) using a novel sparse connection structure that resembles a tree root. Our sparse connection structure facilitates a significant reduction in computational cost and number of parameters of state-of-the-art deep CNNs without compromising accuracy. We validate our approach by using it to train more efficient variants of state-of-the-art CNN architectures, evaluated on the CIFAR10 and ILSVRC datasets. Our results show similar or higher accuracy than the baseline architectures with much less compute, as measured by CPU and GPU timings. For example, for ResNet 50, our model has 40% fewer parameters, 45% fewer floating point operations, and is 31% (12%) faster on a CPU (GPU). For the deeper ResNet 200 our model has 25% fewer floating point operations and 44% fewer parameters, while maintaining state-of-the-art accuracy. For GoogLeNet, our model has 7% fewer parameters and is 21% (16%) faster on a CPU (GPU).
Deep Successor Reinforcement Learning
Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms. There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components — a reward predictor and a successor map. The successor map represents the expected future state occupancy from any given state and the reward predictor maps states to scalar rewards. The value function of a state can be computed as the inner product between the successor map and the reward weights. In this paper, we present DSR, which generalizes SR within an end-to-end deep reinforcement learning framework. DSR has several appealing properties including: increased sensitivity to distal reward changes due to factorization of reward and world dynamics, and the ability to extract bottleneck states (subgoals) given successor maps trained under a random policy. We show the efficacy of our approach on two diverse environments given raw pixel observations — simple grid-world domains (MazeBase) and the Doom game engine.
Deep Survival Previous research has shown that neural networks can model survival data in situations in which some patients’ death times are unknown, e.g. right-censored. However, neural networks have rarely been shown to outperform their linear counterparts such as the Cox proportional hazards model. In this paper, we run simulated experiments and use real survival data to build upon the risk-regression architecture proposed by Faraggi and Simon. We demonstrate that our model, DeepSurv, not only works as well as the standard linear Cox proportional hazards model but actually outperforms it in predictive ability on survival data with linear and nonlinear risk functions. We then show that the neural network can also serve as a recommender system by including a categorical variable representing a treatment group. This can be used to provide personalized treatment recommendations based on an individual’s calculated risk. We provide an open source Python module that implements these methods in order to advance research on deep learning and survival analysis.
Deep Texture Encoding Network
(Deep TEN)
We propose a Deep Texture Encoding Network (Deep-TEN) with a novel Encoding Layer integrated on top of convolutional layers, which ports the entire dictionary learning and encoding pipeline into a single model. Current methods build from distinct components, using standard encoders with separate off-the-shelf features such as SIFT descriptors or pre-trained CNN features for material recognition. Our new approach provides an end-to-end learning framework, where the inherent visual vocabularies are learned directly from the loss function. The features, dictionaries and the encoding representation for the classifier are all learned simultaneously. The representation is orderless and therefore is particularly useful for material and texture recognition. The Encoding Layer generalizes robust residual encoders such as VLAD and Fisher Vectors, and has the property of discarding domain specific information which makes the learned convolutional features easier to transfer. Additionally, joint training using multiple datasets of varied sizes and class labels is supported resulting in increased recognition performance. The experimental results show superior performance as compared to state-of-the-art methods using gold-standard databases such as MINC-2500, Flickr Material Database, KTH-TIPS-2b, and two recent databases 4D-Light-Field-Material and GTOS. The source code for the complete system are publicly available.
Deep Variational Canonical Correlation Analysis
We present deep variational canonical correlation analysis (VCCA), a deep multi-view learning model that extends the latent variable model interpretation of linear CCA~\citep{BachJordan05a} to nonlinear observation models parameterized by deep neural networks (DNNs). Marginal data likelihood as well as inference are intractable under this model. We derive a variational lower bound of the data likelihood by parameterizing the posterior density of the latent variables with another DNN, and approximate the lower bound via Monte Carlo sampling. Interestingly, the resulting model resembles that of multi-view autoencoders~\citep{Ngiam_11b}, with the key distinction of an additional sampling procedure at the bottleneck layer. We also propose a variant of VCCA called VCCA-private which can, in addition to the ‘common variables’ underlying both views, extract the ‘private variables’ within each view. We demonstrate that VCCA-private is able to disentangle the shared and private information for multi-view data without hard supervision.
Deep Visual-Semantic Embedding Model
Modern visual recognition systems are often limited in their ability to scale to large numbers of object categories. This limitation is in part due to the increasing difficulty of acquiring sufficient training data in the form of labeled images as the number of object categories grows. One remedy is to leverage data from other sources – such as text data – both to train visual models and to constrain their predictions. In this paper we present a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data as well as semantic information gleaned from unannotated text. We demonstrate that this model matches state-of-the-art performance on the 1000-class ImageNet object recognition challenge while making more semantically reasonable errors, and also show that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training. Semantic knowledge improves such zero-shot predictions by up to 65%, achieving hit rates of up to 10% across thousands of novel labels never seen by the visual model.
Deep Web The Deep Web, Deep Net, Invisible Web, or Hidden Web, refers to the content on the World Wide Web that is not indexed by standard search engines. Computer scientist Mike Bergman is credited with coining the term in 2000.
DeepBoost We present a new ensemble learning algorithm, DeepBoost, which can use as base classifiers a hypothesis set containing deep decision trees, or members of other rich or complex families, and succeed in achieving high accuracy without overfitting the data. The key to the success of the algorithm is a capacity-conscious criterion for the selection of the hypotheses. We give new datadependent learning bounds for convex ensembles expressed in terms of the Rademacher complexities of the sub-families composing the base classifier set, and the mixture weight assigned to each sub-family. Our algorithm directly benefits from these guarantees since it seeks to minimize the corresponding learning bound. We give a full description of our algorithm, including the details of its derivation, and report the results of several experiments showing that its performance compares favorably to that of AdaBoost and Logistic Regression and their L1-regularized variants.
DeepDetect DeepDetect is a deep learning API and server written in C++11. It makes state of the art deep learning easy to work with and integrate into existing applications.
DeepDive DeepDive is a new type of system that enables developers to analyze data on a deeper level than ever before. DeepDive is a trained system: it uses machine learning techniques to leverage on domain-specific knowledge and incorporates user feedback to improve the quality of its analysis.
DeepDSL In recent years, Deep Learning (DL) has found great success in domains such as multimedia understanding. However, the complex nature of multimedia data makes it difficult to develop DL-based software. The state-of-the art tools, such as Caffe, TensorFlow, Torch7, and CNTK, while are successful in their applicable domains, are programming libraries with fixed user interface, internal representation, and execution environment. This makes it difficult to implement portable and customized DL applications. In this paper, we present DeepDSL, a domain specific language (DSL) embedded in Scala, that compiles deep networks written in DeepDSL to Java source code. Deep DSL provides (1) intuitive constructs to support compact encoding of deep networks; (2) symbolic gradient derivation of the networks; (3) static analysis for memory consumption and error detection; and (4) DSL-level optimization to improve memory and runtime efficiency. DeepDSL programs are compiled into compact, efficient, customizable, and portable Java source code, which operates the CUDA and CUDNN interfaces running on Nvidia GPU via a Java Native Interface (JNI) library. We evaluated DeepDSL with a number of popular DL networks. Our experiments show that the compiled programs have very competitive runtime performance and memory efficiency compared to the existing libraries.
DeepGraph The topological (or graph) structures of real-world networks are known to be predictive of multiple dynamic properties of the networks. Conventionally, a graph structure is represented using an adjacency matrix or a set of hand-crafted structural features. These representations either fail to highlight local and global properties of the graph or suffer from a severe loss of structural information. There lacks an effective graph representation, which hinges the realization of the predictive power of network structures. In this study, we propose to learn the represention of a graph, or the topological structure of a network, through a deep learning model. This end-to-end prediction model, named DeepGraph, takes the input of the raw adjacency matrix of a real-world network and outputs a prediction of the growth of the network. The adjacency matrix is first represented using a graph descriptor based on the heat kernel signature, which is then passed through a multi-column, multi-resolution convolutional neural network. Extensive experiments on five large collections of real-world networks demonstrate that the proposed prediction model significantly improves the effectiveness of existing methods, including linear or nonlinear regressors that use hand-crafted features, graph kernels, and competing deep learning methods.
DeepLearningKit In this paper we present DeepLearningKit – an open source framework that supports using pretrained deep learning models (convolutional neural networks) for iOS, OS X and tvOS. DeepLearningKit is developed in Metal in order to utilize the GPU efficiently and Swift for integration with applications, e.g. iOS-based mobile apps on iPhone/iPad, tvOS-based apps for the big screen, or OS X desktop applications. The goal is to support using deep learning models trained with popular frameworks such as Caffe, Torch, TensorFlow, Theano, Pylearn, Deeplearning4J and Mocha. Given the massive GPU resources and time required to train Deep Learning models we suggest an App Store like model to distribute and download pretrained and reusable Deep Learning models.
Deeply-Supervised Nets
Our proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent. We make an attempt to boost the classification performance by studying a new formulation in deep networks. Three aspects in convolutional neural networks (CNN) style architectures are being looked at: (1) transparency of the intermediate layers to the overall classification; (2) discriminativeness and robustness of learned features, especially in the early layers; (3) effectiveness in training due to the presence of the exploding and vanishing gradients. We introduce ‘companion objective’ to the individual hidden layers, in addition to the overall objective at the output layer (a different strategy to layer-wise pre-training). We extend techniques from stochastic gradient methods to analyze our algorithm. The advantage of our method is evident and our experimental result on benchmark datasets shows significant performance gain over existing methods (e.g. all state-of-the-art results on MNIST, CIFAR-10, CIFAR-100, and SVHN).
DeepSense Mobile sensing applications usually require time-series inputs from sensors. Some applications, such as tracking, can use sensed acceleration and rate of rotation to calculate displacement based on physical system models. Other applications, such as activity recognition, extract manually designed features from sensor inputs for classification. Such applications face two challenges. On one hand, on-device sensor measurements are noisy. For many mobile applications, it is hard to find a distribution that exactly describes the noise in practice. Unfortunately, calculating target quantities based on physical system and noise models is only as accurate as the noise assumptions. Similarly, in classification applications, although manually designed features have proven to be effective, it is not always straightforward to find the most robust features to accommodate diverse sensor noise patterns and user behaviors. To this end, we propose DeepSense, a deep learning framework that directly addresses the aforementioned noise and feature customization challenges in a unified manner. DeepSense integrates convolutional and recurrent neural networks to exploit local interactions among similar mobile sensors, merge local interactions of different sensory modalities into global interactions, and extract temporal relationships to model signal dynamics. DeepSense thus provides a general signal estimation and classification framework that accommodates a wide range of applications. We demonstrate the effectiveness of DeepSense using three representative and challenging tasks: car tracking with motion sensors, heterogeneous human activity recognition, and user identification with biometric motion analysis. DeepSense significantly outperforms the state-of-the-art methods for all three tasks. In addition, DeepSense is feasible to implement on smartphones due to its moderate energy consumption and low latency
DeepWalk We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk’s latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk’s representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk’s representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real world applications such as network classification, and anomaly detection.
Deferred Acceptance Algorithm
The Deferred Acceptance Algorithm (DAA) goes back to Gale and Shapley (1962). They introduce a rather simple algorithm that finds a stable matching for example for college admissions or in a marriage market. In a marriage market where M men have preferences over W women, and men take the role of the proposing party, the DAA produces what is called the M-stable matching: each man strictly prefers the M-stable matching to any other potential matching. “Stable” means that no couple of a man and a woman could break the matching by choosing another mate. This is quite a strong result.
Variations of this algoritm are used in Hospital assignments in the USA, whereby recently graduated doctors submit preferences over hospitals, and hospitals submit preferences over graduates. Another application is the kidney exchange, where the algorithm is used to find the best match between a set of donors and a set of receivers.
Definition Extraction Tool
We present DefExt, an easy to use semi supervised Definition Extraction Tool. DefExt is designed to extract from a target corpus those textual fragments where a term is explicitly mentioned together with its core features, i.e. its definition. It works on the back of a Conditional Random Fields based sequential labeling algorithm and a bootstrapping approach. Bootstrapping enables the model to gradually become more aware of the idiosyncrasies of the target corpus. In this paper we describe the main components of the toolkit as well as experimental results stemming from both automatic and manual evaluation. We release DefExt as open source along with the necessary files to run it in any Unix machine. We also provide access to training and test data for immediate use.
Deformable Convolution “Deformable Convolutional Networks”
Deformable Convolutional Networks Convolutional neural networks (CNNs) are inherently limited to model geometric transformations due to the fixed geometric structures in its building modules. In this work, we introduce two new modules to enhance the transformation modeling capacity of CNNs, namely, deformable convolution and deformable RoI pooling. Both are based on the idea of augmenting the spatial sampling locations in the modules with additional offsets and learning the offsets from target tasks, without additional supervision. The new modules can readily replace their plain counterparts in existing CNNs and can be easily trained end-to-end by standard back-propagation, giving rise to deformable convolutional networks. Extensive experiments validate the effectiveness of our approach on sophisticated vision tasks of object detection and semantic segmentation. The code would be released.
Deformable RoI Pooling “Deformable Convolutional Networks”
Delaunay Diagram
Delta Epsilon Alpha Star Delta Epsilon Alpha Star is a minimal coverage, real-time robotic search algorithm that yields a moderately aggressive search path with minimal backtracking. Search performance is bounded by a placing a combinatorial bound, epsilon and delta, on the maximum deviation from the theoretical shortest path and the probability at which further deviations can occur. Additionally, we formally define the notion of PAC-admissibility — a relaxed admissibility criteria for algorithms, and show that PAC-admissible algorithms are better suited to robotic search situations than epsilon-admissible or strict algorithms.
DelugeNets Human brains are adept at dealing with the deluge of information they continuously receive, by suppressing the non-essential inputs and focusing on the important ones. Inspired by such capability, we propose Deluge Networks (DelugeNets), a novel class of neural networks facilitating massive cross-layer information inflows from preceding layers to succeeding layers. The connections between layers in DelugeNets are efficiently established through cross-layer depthwise convolutional layers with learnable filters, acting as a flexible selection mechanism. By virtue of the massive cross-layer information inflows, DelugeNets can propagate information across many layers with greater flexibility and utilize network parameters more effectively, compared to existing ResNet models. Experiments show the superior performances of DelugeNets in terms of both classification accuracies and parameter efficiencies. Remarkably, a DelugeNet model with just 20.2M parameters achieve state-of-the-art error of 19.02% on CIFAR-100 dataset, outperforming DenseNet model with 27.2M parameters. Moreover, DelugeNet performs comparably to ResNet-200 on ImageNet dataset with merely half of the computations needed by the latter.
Demand Sensing Demand Sensing is a next generation forecasting method that leverages new mathematical techniques and near real-time information to create an accurate forecast of demand, based on the current realities of the supply chain. The typical performance of demand sensing systems reduces near-term forecast error by 30% or more compared to traditional time-series forecasting techniques. The jump in forecast accuracy helps companies manage the effects of market volatility and gain the benefits of a demand-driven supply chain, including more efficient operations, increased service levels, and a range of financial benefits including higher revenue, better profit margins, less inventory, better perfect order performance and a shorter cash-to-cash cycle time. Gartner, Inc. insight on demand sensing can be found in its report, “Supply Chain Strategy for Manufacturing Leaders: The Handbook for Becoming Demand Driven.”
Deming Regression In statistics, Deming regression, named after W. Edwards Deming, is an errors-in-variables model which tries to find the line of best fit for a two-dimensional dataset. It differs from the simple linear regression in that it accounts for errors in observations on both the x- and the y- axis. It is a special case of total least squares, which allows for any number of predictors and a more complicated error structure.
Dempster’s Rule of Combination How to combine two independent sets of probability mass assignments in specific situations. In case different sources express their beliefs over the frame in terms of belief constraints such as in case of giving hints or in case of expressing preferences, then Dempster’s rule of combination is the appropriate fusion operator.
Dempster–Shafer Theory
The Dempster-Shafer theory (DST) is a mathematical theory of evidence. It allows one to combine evidence from different sources and arrive at a degree of belief (represented by a belief function) that takes into account all the available evidence. The theory was first developed by Arthur P. Dempster and Glenn Shafer. In a narrow sense, the term Dempster-Shafer theory refers to the original conception of the theory by Dempster and Shafer. However, it is more common to use the term in the wider sense of the same general approach, as adapted to specific kinds of situations. In particular, many authors have proposed different rules for combining evidence, often with a view to handling conflicts in evidence better.
Dendrogram A dendrogram (from Greek dendron “tree” and gramma “drawing”) is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering.
Denoising Autoencoder
The idea behind denoising autoencoders is simple. In order to force the hidden layer to discover more robust features and prevent it from simply learning the identity, we train the autoencoder to reconstruct the input from a corrupted version of it. The denoising auto-encoder is a stochastic version of the auto-encoder. Intuitively, a denoising auto-encoder does two things: try to encode the input (preserve the information about the input), and try to undo the effect of a corruption process stochastically applied to the input of the auto-encoder. The latter can only be done by capturing the statistical dependencies between the inputs. The denoising auto-encoder can be understood from different perspectives (the manifold learning perspective, stochastic operator perspective, bottom-up – information theoretic perspective, top-down – generative model perspective), all of which are explained in. See also section 7.2 of for an overview of auto-encoders. In , the stochastic corruption process randomly sets some of the inputs (as many as half of them) to zero. Hence the denoising auto-encoder is trying to predict the corrupted (i.e. missing) values from the uncorrupted (i.e., non-missing) values, for randomly selected subsets of missing patterns. Note how being able to predict any subset of variables from the rest is a sufficient condition for completely capturing the joint distribution between a set of variables (this is how Gibbs sampling works). To convert the autoencoder class into a denoising autoencoder class, all we need to do is to add a stochastic corruption step operating on the input. The input can be corrupted in many ways, but in this tutorial we will stick to the original corruption mechanism of randomly masking entries of the input by making them zero.
Dense Convolutional Network
Recent work has shown that convolutional networks can be substantially deeper, more accurate and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper we embrace this observation and introduce the Dense Convolutional Network (DenseNet), where each layer is directly connected to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections, one between each layer and its subsequent layer (treating the input as layer 0), our network has L(L+1)/2 direct connections. For each layer, the feature maps of all preceding layers are treated as separate inputs whereas its own feature maps are passed on as inputs to all subsequent layers. Our proposed connectivity pattern has several compelling advantages: it alleviates the vanishing gradient problem and strengthens feature propagation; despite the increase in connections, it encourages feature reuse and leads to a substantial reduction of parameters; its models tend to generalize surprisingly well. We evaluate our proposed architecture on five highly competitive object recognition benchmark tasks. The DenseNet obtains significant improvements over the state-of-the-art on all five of them (e.g., yielding 3.74% test error on CIFAR-10, 19.25% on CIFAR-100 and 1.59% on SVHN).
Density-based spatial clustering of applications with noise
Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996. It is a density-based clustering algorithm because it finds a number of clusters starting from the estimated density distribution of corresponding nodes. DBSCAN is one of the most common clustering algorithms and also most cited in scientific literature. OPTICS can be seen as a generalization of DBSCAN to multiple ranges, effectively replacing the e parameter with a maximum search radius.
Dependence Modeling “Copula”
Dependency Network The dependency network approach provides a new system level analysis of the activity and topology of directed networks. The approach extracts causal topological relations between the network’s nodes (when the network structure is analyzed), and provides an important step towards inference of causal activity relations between the network nodes (when analyzing the network activity). This methodology has originally been introduced for the study of financial data, it has been extended and applied to other systems, such as the immune system, and semantic networks. In the case of network activity, the analysis is based on partial correlations, which are becoming ever more widely used to investigate complex systems. In simple words, the partial (or residual) correlation is a measure of the effect (or contribution) of a given node, say j, on the correlations between another pair of nodes, say i and k. Using this concept, the dependency of one node on another node, is calculated for the entire network. This results in a directed weighted adjacency matrix, of a fully connected network. Once the adjacency matrix has been constructed, different algorithms can be used to construct the network, such as a threshold network, Minimal Spanning Tree (MST), Planar Maximally Filtered Graph (PMFG), and others.
DeployR DeployR is a server-based framework that provides simple, secure R integration for application developers. It’s available in two editions: DeployR Open, which is free and open-source; and Revolution R Enterprise DeployR, which adds a scalable grid framework and enterprise authentication features for production applications integrated with R. If you’re looking for an overview of what DeployR is and how you can use it to access R from other applications, we’ve just released a new white paper, Using DeployR to Solve the R Integration Problem.
DeployR Data I/O
Depth-first Search
Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. One starts at the root (selecting some arbitrary node as the root in the case of a graph) and explores as far as possible along each branch before backtracking. A version of depth-first search was investigated in the 19th century by French mathematician Charles Pierre Tremaux as a strategy for solving mazes.
Descriptive Statistics Descriptive statistics is the discipline of quantitatively describing the main features of a collection of information, or the quantitative description itself. Descriptive statistics are distinguished from inferential statistics (or inductive statistics), in that descriptive statistics aim to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent. This generally means that descriptive statistics, unlike inferential statistics, are not developed on the basis of probability theory. Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented. For example in a paper reporting on a study involving human subjects, there typically appears a table giving the overall sample size, sample sizes in important subgroups (e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average age, the proportion of subjects of each sex, and the proportion of subjects with related comorbidities.
Determinantal Point Processes
In mathematics, a determinantal point process is a stochastic point process, the probability distribution of which is characterized as a determinant of some function. Such processes arise as important tools in random matrix theory, combinatorics, and physics.
Deterministic Statistical Machine E.g. you input a data set and then specify the question you are asking (is variable Y related to variable X? can i predict Z from W?) then, depending on your question, it uses a deterministic set of methods to analyze the data. Say regression for inference, linear discriminant analysis for prediction, etc. But the method is fixed and deterministic for each question. It also performs a pre-specified set of checks for outliers, confounders, missing data, maybe even data fudging. It generates a report with a markdown tool and then immediately publishes the result.
Determinized Sparse Partially Observable Tree
The partially observable Markov decision process (POMDP) provides a principled general framework for planning under uncertainty, but solving POMDPs optimally is computationally intractable, due to the ‘curse of dimensionality’ and the ‘curse of history’. To overcome these challenges, we introduce the Determinized Sparse Partially Observable Tree (DESPOT), a sparse approximation of the standard belief tree, for online planning under uncertainty. A DESPOT focuses online planning on a set of randomly sampled scenarios and compactly captures the ‘execution’ of all policies under these scenarios. We show that the best policy obtained from a DESPOT is near-optimal, with a regret bound that depends on the representation size of the optimal policy. Leveraging this result, we give an anytime online planning algorithm, which searches a DESPOT for a policy that optimizes a regularized objective function. Regularization balances the estimated value of a policy under the sampled scenarios and the policy size, thus avoiding overfitting. The algorithm demonstrates strong experimental results, compared with some of the best online POMDP algorithms available. It has also been incorporated into an autonomous driving system for realtime vehicle control.
DetMCD Algorithm Most algorithms for highly robust estimators of multivariate location and scatter start by drawing a large number of random subsets. For instance, the FASTMCD algorithm of Rousseeuw and Van Driessen starts in this way, and then takes so-called concentration steps to obtain a more accurate approximation to the MCD. The FASTMCD algorithm is affine equivariant but not permutation invariant. In this article,we present a deterministic algorithm, denoted as DetMCD, which does not use random subsets and is even faster. It computes a small number of deterministic initial estimators, followed by concentration steps. DetMCD is permutation invariant and very close to affine equivariant.
Deviance In statistics, deviance is a quality of fit statistic for a model that is often used for statistical hypothesis testing. It is a generalization of the idea of using the sum of squares of residuals in ordinary least squares to cases where model-fitting is achieved by maximum likelihood.
Deviance Information Criterion
The deviance information criterion (DIC) is a hierarchical modeling generalization of the AIC (Akaike information criterion) and BIC (Bayesian information criterion, also known as the Schwarz criterion). It is particularly useful in Bayesian model selection problems where the posterior distributions of the models have been obtained by Markov chain Monte Carlo (MCMC) simulation. Like AIC and BIC it is an asymptotic approximation as the sample size becomes large. It is only valid when the posterior distribution is approximately multivariate normal. The idea is that models with smaller DIC should be preferred to models with larger DIC.
Diceware Diceware is a method for creating passphrases, passwords, and other cryptographic variables using an ordinary die from a pair of dice as a hardware random number generator. For each word in the passphrase, five rolls of the dice are required. The numbers from 1 to 6 that come up in the rolls are assembled as a five digit number, e.g. 43146. That number is then used to look up a word in a word list. In the English list 43146 corresponds to munch. Lists have been compiled for several languages, including English, Finnish, German, Italian, Polish, Romanian, Russian, Spanish and Swedish. A Diceware word list is any list of 6^5 = 7,776 unique words, preferably ones the user will find easy to spell and to remember. The contents of the word list do not have to be protected or concealed in any way, as the security of a Diceware passphrase is in the number of words selected, and the number of words each selected word could be taken from. The level of unpredictability of a Diceware passphrase can be easily calculated: each word adds 12.9 bits of entropy to the passphrase (that is, \log_2( 6^5 ) bits). Originally, in 1995, Diceware creator Arnold Reinhold considered five words (64 bits) the minimum length needed by average users. However, starting in 2014, Reinhold recommends that at least six words (77 bits) should be used. This level of unpredictability assumes that a potential attacker knows both that Diceware has been used to generate the passphrase, the particular word list used, and exactly how many words make up the passphrase. If the attacker has less information, the entropy can be greater than 12.9 bits per word. If words were simply concatenated rather than separated by spaces, concatenating could form words that are already in the word list. For example, ‘in’ and ‘put’ form ‘input’; all three words can be found in the above mentioned word list. This could slightly decrease the entropy, when compared with the recommended method of using spaces to separate each word in the list.
dIconomy “dIconomy” = “Digital Economy”
Dictionary Learning
Dictionary Learning Algorithms for Sparse Representation
Dictionary Learning
Dictionary Learning – Separating the Particularity and the Commonality
Empirically, we find that, despite the class-specific features owned by the objects appearing in the images, the objects from different categories usually share some common patterns, which do not contribute to the discrimination of them. Concentrating on this observation and under the general dictionary learning (DL) framework, we propose a novel method to explicitly learn a common pattern pool (the commonality) and class-specific dictionaries (the particularity) for classification. We call our method DL-COPAR, which can learn the most compact and most discriminative class-specific dictionaries used for classification. The proposed DL-COPAR is extensively evaluated both on synthetic data and on benchmark image databases in comparison with existing DL-based classification methods. The experimental results demonstrate that DL-COPAR achieves very promising performances in various applications, such as face recognition, handwritten digit recognition, scene classification and object recognition.
Difference in differences (sometimes ‘Difference-in-Differences’, ‘DID’, or ‘DD’) is a statistical technique used in econometrics and quantitative sociology, which attempts to mimic an experimental research design using observational study data. It calculates the effect of a treatment (i.e., an explanatory variable or an independent variable) on an outcome (i.e., a response variable or dependent variable) by comparing the average change over time in the outcome variable for the treatment group to the average change over time for the control group. This method may be subject to certain biases (mean reversion bias, etc.), although it is intended to eliminate some of the effect of selection bias. In contrast to a within-subjects estimate of the treatment effect (which measures differences over time) or a between-subjects estimate of the treatment effect (which measures the difference between the treatment and control groups), the DID measures the difference in the differences between the treatment and control group over time.
Differentiable Lasso
Differential Evolution
In evolutionary computation, differential evolution (DE) is a method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. Such methods are commonly known as metaheuristics as they make few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions. However, metaheuristics such as DE do not guarantee an optimal solution is ever found. DE is used for multidimensional real-valued functions but does not use the gradient of the problem being optimized, which means DE does not require for the optimization problem to be differentiable as is required by classic optimization methods such as gradient descent and quasi-newton methods. DE can therefore also be used on optimization problems that are not even continuous, are noisy, change over time, etc. DE optimizes a problem by maintaining a population of candidate solutions and creating new candidate solutions by combining existing ones according to its simple formulae, and then keeping whichever candidate solution has the best score or fitness on the optimization problem at hand. In this way the optimization problem is treated as a black box that merely provides a measure of quality given a candidate solution and the gradient is therefore not needed. DE is originally due to Storn and Price. Books have been published on theoretical and practical aspects of using DE in parallel computing, multiobjective optimization, constrained optimization, and the books also contain surveys of application areas.
Differential Item Functioning
Differential item functioning (DIF), also referred to as measurement bias, occurs when people from different groups (commonly gender or ethnicity) with the same latent trait (ability/skill) have a different probability of giving a certain response on a questionnaire or test. DIF analysis provides an indication of unexpected behavior of items on a test. An item does not display DIF if people from different groups have a different probability to give a certain response; it displays DIF if and only if people from different groups with the same underlying true ability have a different probability of giving a certain response. Common procedures for assessing DIF are Mantel-Haenszel, item response theory (IRT) based methods, and logistic regression.
Differential Privacy In cryptography, differential privacy aims to provide means to maximize the accuracy of queries from statistical databases while minimizing the chances of identifying its records.
DiffSharp DiffSharp is an algorithmic differentiation or automatic differentiation (AD) library for the .NET ecosystem, which is targeted by the C# and F# languages, among others. The library has been designed with machine learning applications in mind, allowing very succinct implementations of models and optimization routines. DiffSharp is implemented in F# and exposes forward and reverse AD operators as general nestable higher-order functions, usable by any .NET language. It provides high-performance linear algebra primitives—scalars, vectors, and matrices, with a generalization to tensors underway—that are fully supported by all the AD operators, and which use a BLAS/LAPACK backend via the highly optimized OpenBLAS library. DiffSharp currently uses operator overloading, but we are developing a transformation-based version of the library using F#’s ‘code quotation’ metaprogramming facility. Work on a CUDA-based GPU backend is also underway.
Diffusion Map Diffusion maps is a machine learning algorithm introduced by R. R. Coifman and S. Lafon. It computes a family of embeddings of a data set into Euclidean space (often low-dimensional) whose coordinates can be computed from the eigenvectors and eigenvalues of a diffusion operator on the data. The Euclidean distance between points in the embedded space is equal to the “diffusion distance” between probability distributions centered at those points. Different from other dimensionality reduction methods such as principal component analysis (PCA) and multi-dimensional scaling (MDS), diffusion maps is a non-linear method that focuses on discovering the underlying manifold that the data has been sampled from. By integrating local similarities at different scales, diffusion maps gives a global description of the data-set. Compared with other methods, the diffusion maps algorithm is robust to noise perturbation and is computationally inexpensive.
Digital Analytics Digital analytics is the analysis of qualitative and quantitative data from your business and the competition to drive a continual improvement of the online experience that your customers and potential customers have which translates to your desired outcomes (both online and offline). One of the most important steps of digital analytics is determining what your ultimate business objectives or outcomes are and how you expect to measure those outcomes. In the online world, there are five common business objectives:
• For ecommerce sites, an obvious objective is selling products or services.
• For lead generation sites, the goal is to collect user information for sales teams to connect with potential leads.
• For content publishers, the goal is to encourage engagement and frequent visitation.
• For online informational or support sites, helping users find the information they need at the right time is of primary importance.
• For branding, the main objective is to drive awareness, engagement and loyalty.
There are key actions on any website or mobile application that tie back to a business’ objectives. The actions can indicate an objective, like a purchase on an ecommerce site, has been fully met. These are ‘macro’ conversions. Some of the actions on a site might also be behavioral indicators that a customer hasn’t fully reached your main objectives but is coming closer, like, in the ecommerce example, signing up to receive an email coupon or a new product notification. These are ‘micro’ conversions. It’s important to measure both micro and macro conversions so that you are equipped with more behavioral data to understand what experiences help drive the right outcomes for your site.
Digital Analytics Association
The Digital Analytics Association makes analytics professionals more effective and valuable through professional development and community.
Digital Asset Management
Digital asset management (DAM) consists of management tasks and decisions surrounding the ingestion, annotation, cataloguing, storage, retrieval and distribution of digital assets. Digital photographs, animations, videos and music exemplify the target areas of media asset management (a sub-category of DAM). Digital asset management systems (DAMS) include computer software and hardware systems that aid in the process of digital asset management. The term “digital asset management” (DAM) also refers to the protocol for downloading, renaming, backing up, rating, grouping, archiving, optimizing, maintaining, thinning, and exporting files. The “media asset management” (MAM) sub-category of digital asset management mainly addresses audio, video and other media content. The more recent concept of enterprise content management (ECM) often deals with solutions which address similar features but in a wider range of industries or applications.
Digital Hoarding Digital hoarding (also known as e-hoarding) is excessive acquisition and reluctance to delete electronic material no longer valuable to the user. The behavior includes the mass storage of digital artifacts and the retainment of unnecessary or irrelevant electronic data. The term is increasingly common in pop culture, used to describe the habitual characteristics of compulsive hoarding, but in cyberspace. As with physical space in which excess items are described as ‘clutter’ or ‘junk,’ excess digital media is often referred to as ‘digital clutter.’
Digital Native
The term Digital Native was coined and popularized by education consultant, Marc Prensky in his 2001 article entitled Digital Natives, Digital Immigrants, in which he relates the contemporaneous decline in American education to educators’ failure to understand the needs of modern students. His article posited that ‘the arrival and rapid dissemination of digital technology in the last decade of the 20th century’ had fundamentally changed the way students think and process information, making it impossible for them to excel academically using the outdated teaching methods of the day. In other words, children raised in the post-digital, media saturated world, require a media-rich learning environment to hold their attention. Contextually, his ideas were introduced after a decade of worry over increased diagnosis of children with ADD and ADHD, which itself turned out to be largely overblown. Prensky did not strictly define the Digital Native in his 2001 article, but it was later, somewhat arbitrarily, applied to children born after 1980, due to the fact that computer bulletin board systems, and Usenet were already in use at the time. The idea became popular among educators and parents, whose children fell within Prensky’s definition of a Digital Native, and has since been embraced as an effective marketing tool.
Dijkstra Algorithm Dijkstra’s algorithm, conceived by computer scientist Edsger Dijkstra in 1956 and published in 1959, is a graph search algorithm that solves the single-source shortest path problem for a graph with non-negative edge path costs, producing a shortest path tree. This algorithm is often used in routing and as a subroutine in other graph algorithms.
Dimensionality Reduction In machine learning and statistics, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction.
DimmWitted A storage abstraction that captures the access patterns of popular statistical analytics tasks and a prototype called DimmWitted.
dimple dimple is a simple-to-use charting API powered by D3.js. The aim of dimple is to open up the power and flexibility of d3 to analysts. It aims to give a gentle learning curve and minimal code to achieve something productive. It also exposes the d3 objects so you can pick them up and run to create some really cool stuff.
DiNoDB As data sets grow in size, analytics applications struggle to get instant insight into large datasets. Modern applications involve heavy batch processing jobs over large volumes of data and at the same time require efficient ad-hoc interactive analytics on temporary data. Existing solutions, however, typically focus on one of these two aspects, largely ignoring the need for synergy between the two. Consequently, interactive queries need to re-iterate costly passes through the entire dataset (e.g., data loading) that may provide meaningful return on investment only when data is queried over a long period of time. In this paper, we propose DiNoDB, an interactive-speed query engine for ad-hoc queries on temporary data. DiNoDB avoids the expensive loading and transformation phase that characterizes both traditional RDBMSs and current interactive analytics solutions. It is tailored to modern workflows found in machine learning and data exploration use cases, which often involve iterations of cycles of batch and interactive analytics on data that is typically useful for a narrow processing window. The key innovation of DiNoDB is to piggyback on the batch processing phase the creation of metadata that DiNoDB exploits to expedite the interactive queries. Our experimental analysis demonstrates that DiNoDB achieves very good performance for a wide range of ad-hoc queries compared to alternatives %such as Hive, Stado, SparkSQL and Impala.
Directed Acyclic Graph
In mathematics and computer science, a directed acyclic graph (DAG), is a directed graph with no directed cycles. That is, it is formed by a collection of vertices and directed edges, each edge connecting one vertex to another, such that there is no way to start at some vertex v and follow a sequence of edges that eventually loops back to v again.
Directed Graph In mathematics, and more specifically in graph theory, a directed graph (or digraph) is a graph, or set of nodes connected by edges, where the edges have a direction associated with them. In formal terms, a digraph is a pair G=(V,A) (sometimes G=(V,E)) of:
• a set V, whose elements are called vertices or nodes,
• a set A of ordered pairs of vertices, called arcs, directed edges, or arrows (and sometimes simply edges with the corresponding set named E instead of A).
It differs from an ordinary or undirected graph, in that the latter is defined in terms of unordered pairs of vertices, which are usually called edges. A digraph is called ‘simple’ if it has no loops, and no multiple arcs (arcs with same starting and ending nodes). A directed multigraph, in which the arcs constitute a multiset, rather than a set, of ordered pairs of vertices may have loops (that is, ‘self-loops’ with same starting and ending node) and multiple arcs. Some, but not all, texts allow a digraph, without the qualification simple, to have self loops, multiple arcs, or both.
Directional Statistics Directional statistics is the subdiscipline of statistics that deals with directions (unit vectors in Rn), axes (lines through the origin in Rn) or rotations in Rn. More generally, directional statistics deals with observations on compact Riemannian manifolds. The fact that 0 degrees and 360 degrees are identical angles, so that for example 180 degrees is not a sensible mean of 2 degrees and 358 degrees, provides one illustration that special statistical methods are required for the analysis of some types of data (in this case, angular data). Other examples of data that may be regarded as directional include statistics involving temporal periods (e.g. time of day, week, month, year, etc.), compass directions, dihedral angles in molecules, orientations, rotations and so on.
Dirichlet Distribution When it comes to recommendation systems and natural language processing, data that can be modeled as a multinomial or as a vector of counts is ubiquitous. For example if there are 2 possible user-generated ratings (like and dislike), then each item is represented as a vector of 2 counts. In a higher dimensional case, each document may be expressed as a count of words, and the vector size is large enough to encompass all the important words in that corpus of documents. The Dirichlet distribution is one of the basic probability distributions for describing this type of data.
Dirichlet Lasso
Selection of the most important predictor variables in regression analysis is one of the key problems statistical research has been concerned with for long time. In this article, we propose the methodology, Dirichlet Lasso (abbreviated as DLASSO) to address this issue in a Bayesian framework. In many modern regression settings, large set of predictor variables are grouped and the coefficients belonging to any one of these groups are either all redundant or all important in predicting the response; we say in those cases that the predictors exhibit a group structure. We show that DLASSO is particularly useful where the group structure is not fully known. We exploit the clustering property of Dirichlet Process priors to infer the possibly missing group information. The Dirichlet Process has the advantage of simultaneously clustering the variable coefficients and selecting the best set of predictor variables. We compare the predictive performance of DLASSO to Group Lasso and ordinary Lasso with real data and simulation studies. Our results demonstrate that the predictive performance of DLASSO is almost as good as that of Group Lasso when group label information is given; and superior to the ordinary Lasso for missing group information. For high dimensional data (e.g., genetic data) with missing group information, DLASSO will be a powerful approach of variable selection since it provides a superior predictive performance and higher statistical accuracy.
Dirichlet Process
In probability theory, a Dirichlet process is a way of assigning a probability distribution over probability distributions. That is, a Dirichlet process is a probability distribution whose domain is itself a set of probability distributions. The probability distributions in the domain are almost surely discrete and may be infinite dimensional. Assigning an arbitrary probability distribution over a domain of infinite dimensional probability distributions would require an infinite amount of computational resources. The main function of the Dirichlet process is that it allows the specification of a distribution over infinite dimensional distributions in a way that uses only finite resources.
Dirichlet Process Mixture Model
The Dirichlet process is a family of non-parametric Bayesian models which are commonly used for density estimation, semi-parametric modelling and model selection/averaging. The Dirichlet processes are non-parametric in a sense that they have infinite number of parameters. Since they are treated in a Bayesian approach we are able to construct large models with infinite parameters which we integrate out to avoid overfitting.
Discounted Cumulative Gain
Discounted cumulative gain (DCG) is a measure of ranking quality. In information retrieval, it is often used to measure effectiveness of web search engine algorithms or related applications. Using a graded relevance scale of documents in a search engine result set, DCG measures the usefulness, or gain, of a document based on its position in the result list. The gain is accumulated from the top of the result list to the bottom with the gain of each result discounted at lower ranks.
Discourse Analysis
Discourse analysis (DA), or discourse studies, is a general term for a number of approaches to analyzing written, vocal, or sign language use or any significant semiotic event. The objects of discourse analysis – discourse, writing, conversation, communicative event – are variously defined in terms of coherent sequences of sentences, propositions, speech, or turns-at-talk. Contrary to much of traditional linguistics, discourse analysts not only study language use ‘beyond the sentence boundary’, but also prefer to analyze ‘naturally occurring’ language use, and not invented examples. Text linguistics is related. The essential difference between discourse analysis and text linguistics is that it aims at revealing socio-psychological characteristics of a person/persons rather than text structure. Discourse analysis has been taken up in a variety of social science disciplines, including linguistics, education, sociology, anthropology, social work, cognitive psychology, social psychology, area studies, cultural studies, international relations, human geography, communication studies, and translation studies, each of which is subject to its own assumptions, dimensions of analysis, and methodologies.
Discover, Access, Distill
DAD is comprised of:
• Discover: Find, identify the sources of good data, and the metrics. Sometimes request the data to be created (work with data engineers and business analysts)
• Access: Access the data. Sometimes via an API, a web crawler, an Internet download, a database access or sometimes in-memory within a database.
• Distill: Extract essence from data, the stuff that leads to decisions, increased ROI, and actions (such as determining optimum bid prices in an automated bidding system). It involves
• Exploring the data (creating a data dictionary and exploratory analysis)
• Cleaning (removing impurities)
• Refining (data summarization, sometimes multiple layers of summarization or hierarchical summarization)- Analyzing: statistical analyses (sometimes including stuff like experimental design that can take place even before the Access stage), both automated and manual. Might or might not require statistical modeling
• Presenting results or integrating results in some automated process
Discrete Dantzig Selector We propose a new high-dimensional linear regression estimator: the Discrete Dantzig Selector, which minimizes the number of nonzero regression coefficients, subject to a budget on the maximal absolute correlation between the features and the residuals. We show that the estimator can be expressed as a solution to a Mixed Integer Linear Optimization (MILO) problem—a computationally tractable framework that enables the computation of provably optimal global solutions. Our approach has the appealing characteristic that even if we terminate the optimization problem at an early stage, it exits with a certificate of sub-optimality on the quality of the solution. We develop new discrete first order methods, motivated by recent algorithmic developments in first order continuous convex optimization, to obtain high quality feasible solutions for the Discrete Dantzig Selector problem. Our proposal leads to advantages over the off-the-shelf state-of-the-art integer programming algorithms, which include superior upper bounds obtained for a given computational budget. When a solution obtained from the discrete first order methods is passed as a warm-start to a MILO solver, the performance of the latter improves significantly. Exploiting problem specific information, we propose enhanced MILO formulations that further improve the algorithmic performance of the MILO solvers. We demonstrate, both theoretically and empirically, that, in a wide range of regimes, the statistical properties of the Discrete Dantzig Selector are superior to those of popular $\ell_{1}$-based approaches. For problem instances with $p \approx 2500$ features and $n \approx 900$ observations, our computational framework delivers optimal solutions in a few minutes and certifies optimality within an hour.
Discrete Event Simulation
In the field of simulation, a discrete-event simulation (DES), models the operation of a system as a discrete sequence of events in time. Each event occurs at a particular instant in time and marks a change of state in the system. Between consecutive events, no change in the system is assumed to occur; thus the simulation can directly jump in time from one event to the next. This contrasts with continuous simulation in which the simulation continuously tracks the system dynamics over time. Instead of being event-based, this is called an activity-based simulation; time is broken up into small time slices and the system state is updated according to the set of activities happening in the time slice. Because discrete-event simulations do not have to simulate every time slice, they can typically run much faster than the corresponding continuous simulation. Another alternative to event-based simulation is process-based simulation. In this approach, each activity in a system corresponds to a separate process, where a process is typically simulated by a thread in the simulation program. In this case, the discrete events, which are generated by threads, would cause other threads to sleep, wake, and update the system state. A more recent method is the three-phased approach to discrete event simulation (Pidd, 1998). In this approach, the first phase is to jump to the next chronological event. The second phase is to execute all events that unconditionally occur at that time (these are called B-events). The third phase is to execute all events that conditionally occur at that time (these are called C-events). The three phase approach is a refinement of the event-based approach in which simultaneous events are ordered so as to make the most efficient use of computer resources. The three-phase approach is used by a number of commercial simulation software packages, but from the user’s point of view, the specifics of the underlying simulation method are generally hidden.
Discrete Morse Theory Discrete Morse theory is a tool for determining equivalences between topological spaces arising from discrete mathematical structures. This theory was developed by Robin Forman in the 1990s as a combinatorial analog to Morse theory, developed by Marston Morse in the 1920s. The original theory deals with analyzing such equivalences for general topological spaces, while discrete Morse theory provides similar methods of analysis for topological spaces endowed with additional, discrete structure. For these structures, applications of the discrete theory are often more natural, as well as simpler and more straightforward to apply. Discrete Morse theory has applications throughout many fields of pure and applied mathematics. Within pure mathematics, for example, the theory has been widely applied to problems in geometry, topology, and knot theory; and within computer science, the theory has been used to evaluate data compression algorithms and to bound the complexity of algorithms that determine whether graphs have certain properties – for example, whether all components of a graph are connected. If we wish to know whether a given property holds for a certain topological space, our question can often be reduced to the question of whether the space is equivalent to another space for which the property holds. For example, whether a simple algorithm exists for determining if a graph is connected depends on whether the structure that represents the space of not-connected graphs can be shrunken to a point. Alas, it cannot, so any algorithm for testing graph connectedness must, at least in some cases, conduct an exhaustive search. This result has real-world implications: for example, it means that if we want to test a communications system – say, immediately after a disaster – to determine whether it is still connected, there is no guaranteed way of finding the answer without testing every component individually.
Discrete Sparklines
Discriminant Analysis Discriminant analysis is used to distinguish distinct sets of observations and allocate new observations to previously defined groups. This method is commonly used in biological species classification, in medical classification of tumors, in facial recognition technologies, and in the credit card and insurance industries for determining risk.
Discriminant Function Analysis Discriminant function analysis is a statistical analysis to predict a categorical dependent variable (called a grouping variable) by one or more continuous or binary independent variables (called predictor variables). The original dichotomous discriminant analysis was developed by Sir Ronald Fisher in 1936. It is different from an ANOVA or MANOVA, which is used to predict one (ANOVA) or multiple (MANOVA) continuous dependent variables by one or more independent categorical variables. Discriminant function analysis is useful in determining whether a set of variables is effective in predicting category membership.
Discriminant analysis is used when groups are known a priori (unlike in cluster analysis). Each case must have a score on one or more quantitative predictor measures, and a score on a group measure. In simple terms, discriminant function analysis is classification – the act of distributing things into groups, classes or categories of the same type.
Moreover, it is a useful follow-up procedure to a MANOVA instead of doing a series of one-way ANOVAs, for ascertaining how the groups differ on the composite of dependent variables. In this case, a significant F test allows classification based on a linear combination of predictor variables. Terminology can get confusing here, as in MANOVA, the dependent variables are the predictor variables, and the independent variables are the grouping variables.
Discriminative Model Discriminative models, also called conditional models, are a class of models used in machine learning for modeling the dependence of an unobserved variable y on an observed variable x. Within a probabilistic framework, this is done by modeling the conditional probability distribution P(y|x), which can be used for predicting y from x. Discriminative models, as opposed to generative models, do not allow one to generate samples from the joint distribution of x and y. However, for tasks such as classification and regression that do not require the joint distribution, discriminative models can yield superior performance. On the other hand, generative models are typically more flexible than discriminative models in expressing dependencies in complex learning tasks. In addition, most discriminative models are inherently supervised and cannot easily be extended to unsupervised learning. Application specific details ultimately dictate the suitability of selecting a discriminative versus generative model.
DISsimilarity COefficient Networks
(DISCO Nets)
We present a new type of probabilistic model which we call DISsimilarity COefficient Networks (DISCO Nets). DISCO Nets allow us to efficiently sample from a posterior distribution parametrised by a neural network. During training, DISCO Nets are learned by minimising the dissimilarity coefficient between the true distribution and the estimated distribution. This allows us to tailor the training to the loss related to the task at hand. We empirically show that (i) by modeling uncertainty on the output value, DISCO Nets outperform equivalent non-probabilistic predictive networks and (ii) DISCO Nets accurately model the uncertainty of the output, outperforming existing probabilistic models based on deep neural networks.
Dissimilarity Measure If features are given or can be defined they can be used to define a distance measure between objects. This can be understood as the euclidean distance in a properly scaled feature space. For good features this will also result in good dissimilarities. However, as long a dissimilarities are based on features their performance will be determined by the quality of these. As features do not describe the full objects it is possible that two objects that are different have a zero distance based on the available features. This is an essential cause of class overlap in feature spaces. Dissimilarities offer the possibility to overcome this. If the dissimilarity measure is defined in such a way that objects have a zero distance to itself or to entirely identical copies of themselves (which thereby should belong to the same class) there is no class overlap.
Distance Based on Conditional Ordered List
Distance Correlation In statistics and in probability theory, distance correlation is a measure of statistical dependence between two random variables or two random vectors of arbitrary, not necessarily equal dimension. An important property is that this measure of dependence is zero if and only if the random variables are statistically independent. This measure is derived from a number of other quantities that are used in its specification, specifically: distance variance, distance standard deviation and distance covariance. These take the same roles as the ordinary moments with corresponding names in the specification of the Pearson product-moment correlation coefficient.
These distance-based measures can be put into an indirect relationship to the ordinary moments by an alternative formulation (described below) using ideas related to Brownian motion, and this has led to the use of names such as Brownian covariance and Brownian distance covariance.
Distance Metric Learning
Many machine learning algorithms, such as K Nearest Neighbor (KNN), heavily rely on the distance metric for the input data patterns. Distance Metric learning is to learn a distance metric for the input space of data from a given collection of pair of similar/dissimilar points that preserves the distance relation among the training data. In recent years, many studies have demonstrated, both empirically and theoretically, that a learned metric can significantly improve the performance in classification, clustering and retrieval tasks.
Distance Preservation to Local Mean
In this paper, we propose a nonlinear dimensionality reduction algorithm for the manifold of Symmetric Positive Definite (SPD) matrices that considers the geometry of SPD matrices and provides a low dimensional representation of the manifold with high class discrimination. The proposed algorithm, tries to preserve the local structure of the data by preserving distance to local mean (DPLM) and also provides an implicit projection matrix. DPLM is linear in terms of the number of training samples and may use the label information when they are available in order to performance improvement in classification tasks. We performed several experiments on the multi-class dataset IIa from BCI competition IV. The results show that our approach as dimensionality reduction technique – leads to superior results in comparison with other competitor in the related literature because of its robustness against outliers. The experiments confirm that the combination of DPLM with FGMDM as the classifier leads to the state of the art performance on this dataset.
Distance Weighted Discrimination
High Dimension Low Sample Size statistical analysis is becoming increasingly important in a wide range of applied contexts. In such situations, it is seen that the appealing discrimination method called the Support Vector Machine can be improved. The revealing concept is ‘data piling’ at the margin. This leads naturally to the development of ‘Distance Weighted Discrimination’, which also is based on modern computationally intensive optimization methods, and seems to give improved ‘generalizability’.
Another Look at DWD: Thrifty Algorithm and Bayes Risk Consistency in RKHS
Distributed Computing Distributed computing is a field of computer science that studies distributed systems. A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal. Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components. Examples of distributed systems vary from SOA-based systems to massively multiplayer online games to peer-to-peer applications. A computer program that runs in a distributed system is called a distributed program, and distributed programming is the process of writing such programs. There are many alternatives for the message passing mechanism, including RPC-like connectors and message queues. A goal and challenge pursued by some computer scientists and practitioners in distributed systems is location transparency; however, this goal has fallen out of favour in industry, as distributed systems are different from conventional non-distributed systems, and the differences, such as network partitions, partial system failures, and partial upgrades, cannot simply be ‘papered over’ by attempts at ‘transparency’ – see CAP theorem. Distributed computing also refers to the use of distributed systems to solve computational problems. In distributed computing, a problem is divided into many tasks, each of which is solved by one or more computers, which communicate with each other by message passing.
Distributed Dynamic Data-intensive Science
(D3 Science)
A common feature across many science and engineering applications is the amount and diversity of data and computation that must be integrated to yield insights. Data sets are growing larger and becoming distributed; and their location, availability and properties are often time-dependent. Collectively, these characteristics give rise to dynamic distributed data-intensive applications. While ‘static’ data applications have received significant attention, the characteristics, requirements, and software systems for the analysis of large volumes of dynamic, distributed data, and data-intensive applications have received relatively less attention. This paper surveys several representative dynamic distributed data-intensive application scenarios, provides a common conceptual framework to understand them, and examines the infrastructure used in support of applications.
Distributed Lag Model A distributed-lag model is a dynamic model in which the effect of a regressor x on y occurs over time rather than all at once. In the simple case of one explanatory variable and a linear relationship. This form is very similar to the infinite-moving-average representation of an ARMA process, except that the lag polynomial on the right-hand side is applied to the explanatory variable x rather than to a white-noise process e. The individual coefficients ßs are called lag weights and the collectively comprise the lag distribution. They define the pattern of how x affects y over time.
Distributed Machine Learning Toolkit
Distributed machine learning has become more important than ever in this big data era. Especially in recent years, practices have demonstrated the trend that bigger models tend to generate better accuracies in various applications. However, it remains a challenge for common machine learning researchers and practitioners to learn big models, because the task usually requires a large number of computation resources. In order to enable the training of big models using just a modest cluster and in an efficient manner, we release the Microsoft Distributed Machine Learning Toolkit (DMTK), which contains both algorithmic and system innovations. These innovations make machine learning tasks on big data highly scalable, efficient and flexible.
Distributed Matrix A distributed matrix has long-typed row and column indices and double-typed values, stored distributively in one or more RDDs. It is very important to choose the right format to store large and distributed matrices. Converting a distributed matrix to a different format may require a global shuffle, which is quite expensive. Three types of distributed matrices have been implemented so far. The basic type is called RowMatrix. A RowMatrix is a row-oriented distributed matrix without meaningful row indices, e.g., a collection of feature vectors. It is backed by an RDD of its rows, where each row is a local vector. We assume that the number of columns is not huge for a RowMatrix so that a single local vector can be reasonably communicated to the driver and can also be stored / operated on using a single node. An IndexedRowMatrix is similar to a RowMatrix but with row indices, which can be used for identifying rows and executing joins. A CoordinateMatrix is a distributed matrix stored in coordinate list (COO) format, backed by an RDD of its entries.
Distributed Robust Algorithm for Count-based Learning
Distributed Stream Processing Engines
Distributed stream processing engines (DSPEs) are a new emergent family of MapReduce inspired technologies that address this issue. These engines allow to express parallel computation on streams, and combine the scalability of distributed processing with the efficiency of streaming algorithms. Examples of these engines include Storm, S4, and Samza.
Distribution Separation Method
Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples
DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples) builds an ensemble of J48 trees by recursively adding artificial samples of the training data (‘Melville, P., & Mooney, R. J. (2005). Creating diversity in ensembles using artificial data. Information Fusion, 6(1), 99-111. doi:10.1016/j.inffus.2004.04.001’).
Diversity Index A diversity index is a quantitative measure that reflects how many different types (such as species) there are in a dataset, and simultaneously takes into account how evenly the basic entities (such as individuals) are distributed among those types. The value of a diversity index increases both when the number of types increases and when evenness increases. For a given number of types, the value of a diversity index is maximized when all types are equally abundant.
Dixon’s Q Test In statistics, Dixon’s Q test, or simply the Q test, is used for identification and rejection of outliers. This assumes normal distribution and per Dean and Dixon, and others, this test should be used sparingly and never more than once in a data set. To apply a Q test for bad data, arrange the data in order of increasing values and calculate Q as defined:
Q = gap/range, Where gap is the absolute difference between the outlier in question and the closest number to it. If Q > Qtable, where Qtable is a reference value corresponding to the sample size and confidence level, then reject the questionable point. Note that only one point may be rejected from a data set using a Q test.
Django Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of Web development, so you can focus on writing your app without needing to reinvent the wheel. It’s free and open source.
DKPro Similarity DKPro Similarity is an open source software package for developing text similarity algorithms. The framework is designed to complement DKPro Core, a collection of software components for natural language processing (NLP) based on the Apache UIMA framework. By leveraging the power of the tools available in DKPro Core, it allows for a rich set of similarity computation operations, including the design of full-fledged language processing pipelines and fully customizable processing steps.


Dlib Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real world problems. It is used in both industry and academia in a wide range of domains including robotics, embedded devices, mobile phones, and large high performance computing environments. Dlib’s open source licensing allows you to use it in any application, free of charge.
DLVHEX System The DLVHEX system implements the HEX-semantics, which integrates answer set programming (ASP) with arbitrary external sources. Since its first release ten years ago, significant advancements were achieved. Most importantly, the exploitation of properties of external sources led to efficiency improvements and flexibility enhancements of the language, and technical improvements on the system side increased user’s convenience. In this paper, we present the current status of the system and point out the most important recent enhancements over early versions. While existing literature focuses on theoretical aspects and specific components, a bird’s eye view of the overall system is missing. In order to promote the system for real-world applications, we further present applications which were already successfully realized on top of DLVHEX. This paper is under consideration for acceptance in Theory and Practice of Logic Programming.
Docker Build, Ship and RunAny App, Anywhere. Docker – An open platform for distributed applications for developers and sysadmins.
Docker is a relatively new open source application and service, which is seeing interest across a number of areas. It uses recent Linux kernel features (containers, namespaces) to shield processes. While its use (superficially) resembles that of virtual machines, it is much more lightweight as it operates at the level of a single process (rather than an emulation of an entire OS layer). This also allows it to start almost instantly, require very little resources and hence permits an order of magnitude more deployments per host than a virtual machine.
Docker offers a standard interface to creation, distribution and deployment. The shipping container analogy is apt: just how shipping containers (via their standard size and “interface”) allow global trade to prosper, Docker is aiming for nothing less for deployment. A Dockerfile provides a concise, extensible, and executable description of the computational environment. Docker software then builds a Docker image from the Dockerfile. Docker images are analogous to virtual machine images, but smaller and built in discrete, extensible and reuseable layers. Images can be distributed and run on any machine that has Docker software installed—including Windows, OS X and of course Linux. Running instances are called Docker containers. A single machine can run hundreds of such containers, including multiple containers running the same image.
Document Classification Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done ‘manually’ (or ‘intellectually’) or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is mainly in information science and computer science. The problems are overlapping, however, and there is therefore interdisciplinary research on document classification. The documents to be classified may be texts, images, music, etc. Each kind of document possesses its special classification problems. When not otherwise specified, text classification is implied. Documents may be classified according to their subjects or according to other attributes (such as document type, author, printing year etc.). In the rest of this article only subject classification is considered. There are two main philosophies of subject classification of documents: The content based approach and the request based approach.
Document Term Matrix “Term Document Matrix”
Document-Context Language Model
Text documents are structured on multiple levels of detail: individual words are related by syntax, and larger units of text are related by discourse structure. Existing language models generally fail to account for discourse structure, but it is crucial if we are to have language models that reward coherence and generate coherent texts. We present and empirically evaluate a set of multi-level recurrent neural network language models, called Document-Context Language Models (DCLMs), which incorporate contextual information both within and beyond the sentence. In comparison with word-level recurrent neural network language models, the DCLMs obtain slightly better predictive likelihoods, and considerably better assessments of document coherence.
Dodgson Score Dodgson’s method is a voting system proposed by the author, mathematician and logician Charles Dodgson, better known as Lewis Carroll. The method is to extend the Condorcet method by swapping candidates until a Condorcet winner is found. The winner is the candidate which requires the minimum number of swaps. Dodgson proposed this voting scheme in his 1876 work ‘A method of taking votes on more than two issues’. Given an integer k and an election, it is NP-complete to determine whether or not a candidate can become a Condorcet winner with fewer than k swaps. In Dodgson’s method, each voter submits an ordered list of all candidates according to their own preference (from best to worst). The winner is defined to be the candidate for whom we need to perform the minimum number of pairwise swaps (added over all candidates) before they become a Condorcet winner. In particular, if there is already a Condorcet winner, they win the election. In short, we must find the voting profile with minimum Kendall tau distance from the input, such that it has a Condorcet winner; they are declared the victor. Computing the winner or even the Dodgson score of a candidate (the number of swaps needed to make him a winner) is a PNP||-complete problem.
Efficient Dodgson-Score Calculation Using Heuristics and Parallel Computing
Domain Generating Algorithm
Domain generation algorithm (DGA) are algorithms seen in various families of malware that are used to periodically generate a large number of domain names that can be used as rendezvous points with their controllers. The large number of potential rendezvous points makes it difficult for law enforcement to effectively shut down botnets since infected computers will attempt to contact some of these domain names every day to receive updates or commands. By using public-key cryptography, it is unfeasible for law enforcement and other actors to mimic commands from the malware controllers as some worms will automatically reject any updates not signed by the malware controllers.
Dpush Herein this paper is presented a novel invention – called Dpush – that enables truly scalable spam resistant uncensorable automatically encrypted and inherently authenticated messaging; thus restoring our ability to exert our right to private communication, and thus a step forward in restoring an uncorrupted democracy. Using a novel combination of a distributed hash table (DHT) and a proof of work (POW), combined in a way that can only be called a synergy, the emergent property of a scalable and spam resistant unsolicited messaging protocol elegantly emerges. Notable is that the receiver does not need to be online at the time the message is sent. This invention is already implemented and operating within the package that is called MORPHiS – which is a Sybil resistant enhanced Kademlia DHT implementation combined with an already functioning implementation of Dpush, as well as a polished HTTP Dmail interface to send and receive such messages today.
Dropout Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different ‘thinned’ networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
DSDP The DSDP software is a free open source implementation of an interior-point method for semidefinite programming. It provides primal and dual solutions, exploits low-rank structure and sparsity in the data, and has relatively low memory requirements for an interior-point method. It allows feasible and infeasible starting points and provides approximate certificates of infeasibility when no feasible solution exists. The dual-scaling algorithm implemented in this package has a convergence proof and worst-case polynomial complexity under mild assumptions on the data. The software can be used as a set of subroutines, through Matlab, or by reading and writing to data files. Furthermore, the solver offers scalable parallel performance for large problems and a well documented interface. Some of the most popular applications of semidefinite programming and linear matrix inequalities (LMI) are model control, truss topology design, and semidefinite relaxations of combinatorial and global optimization problems.
Dual Lasso Selector We consider the problem of model selection and estimation in sparse high dimensional linear regression models with strongly correlated variables. First, we study the theoretical properties of the dual Lasso solution, and we show that joint consideration of the Lasso primal and its dual solutions are useful for selecting correlated active variables. Second, we argue that correlations among active predictors are not problematic, and we derive a new weaker condition on the design matrix, called Pseudo Irrepresentable Condition (PIC). Third, we present a new variable selection procedure, Dual Lasso Selector, and we prove that the PIC is a necessary and sufficient condition for consistent variable selection for the proposed method. Finally, by combining the dual Lasso selector further with the Ridge estimation even better prediction performance is achieved. We call the combination (DLSelect+Ridge), it can be viewed as a new combined approach for inference in high-dimensional regression models with correlated variables. We illustrate DLSelect+Ridge method and compare it with popular existing methods in terms of variable selection, prediction accuracy, estimation accuracy and computation speed by considering various simulated and real data examples.
Dual Learning for Machine Translation
While neural machine translation (NMT) is making good progress in the past two years, tens of millions of bilingual sentence pairs are needed for its training. However, human labeling is very costly. To tackle this training data bottleneck, we develop a dual-learning mechanism, which can enable an NMT system to automatically learn from unlabeled data through a dual-learning game. This mechanism is inspired by the following observation: any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual); the primal and dual tasks can form a closed loop, and generate informative feedback signals to train the translation models, even if without the involvement of a human labeler. In the dual-learning mechanism, we use one agent to represent the model for the primal task and the other agent to represent the model for the dual task, then ask them to teach each other through a reinforcement learning process. Based on the feedback signals generated during this process (e.g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using the policy gradient methods). We call the corresponding approach to neural machine translation \emph{dual-NMT}. Experiments show that dual-NMT works very well on English$\leftrightarrow$French translation; especially, by learning from monolingual data (with 10% bilingual data for warm start), it achieves a comparable accuracy to NMT trained from the full bilingual data for the French-to-English translation task.
Dual Principal Component Pursuit
We consider the problem of outlier rejection in single subspace learning. Classical approaches work directly with a low-dimensional representation of the subspace. Our approach works with a dual representation of the subspace and hence aims to find its orthogonal complement. We pose this problem as an $\ell_1$-minimization problem on the sphere and show that, under certain conditions on the distribution of the data, any global minimizer of this non-convex problem gives a vector orthogonal to the subspace. Moreover, we show that such a vector can still be found by relaxing the non-convex problem with a sequence of linear programs. Experiments on synthetic and real data show that the proposed approach, which we call Dual Principal Component Pursuit (DPCP), outperforms state-of-the art methods, especially in the case of high-dimensional subspaces.
Dunning-Kruger Effect The Dunning-Kruger effect is a cognitive bias wherein unskilled individuals suffer from illusory superiority, mistakenly rating their ability much higher than is accurate. This bias is attributed to a metacognitive inability of the unskilled to recognize their ineptitude. Conversely, highly skilled individuals tend to underestimate their relative competence, erroneously assuming that tasks which are easy for them are also easy for others.
Duration Analysis “Survival Analysis”
Duration Analysis and its Applications (Finance)
Duration and Interval Hidden Markov Model
Analysis of sequential event data has been recognized as one of the essential tools in data modeling and analysis field. In this paper, after the examination of its technical requirements and issues to model complex but practical situation, we propose a new sequential data model, dubbed Duration and Interval Hidden Markov Model (DI-HMM), that efficiently represents ‘state duration’ and ‘state interval’ of data events. This has significant implications to play an important role in representing practical time-series sequential data. This eventually provides an efficient and flexible sequential data retrieval. Numerical experiments on synthetic and real data demonstrate the efficiency and accuracy of the proposed DI-HMM.
Dyadic Data Dyadic data refers to a domain with two nite sets of objects in which observations are made for dyads, i.e., pairs with one element from either set. This type of data arises naturally in many application ranging from computational linguistics and information retrieval to preference analysis and computer vision. In this paper, we present a systematic, domain-independent framework of learning from dyadic data by statistical mixture models. Our approach covers different models with flat and hierarchical latent class structures. We propose an annealed version of the standard EM algorithm for model fitting which is empirically evaluated on a variety of data sets from different domains.
Dygraphs dygraphs is a fast, flexible open source JavaScript charting library. It allows users to explore and interpret dense data sets.
DYNAMIC In this paper we present DYNAMIC, an open-source C++ library implementing dynamic compressed data structures for string manipulation. Our framework includes useful tools such as searchable partial sums, succinct/gap-encoded bitvectors, and entropy/run-length compressed strings and FM-indexes. We prove close-to-optimal theoretical bounds for the resources used by our structures, and show that our theoretical predictions are empirically tightly verified in practice. To conclude, we turn our attention to applications. We compare the performance of four recently-published compression algorithms implemented using DYNAMIC with those of state-of-the-art tools performing the same task. Our experiments show that algorithms making use of dynamic compressed data structures can be up to three orders of magnitude more space-efficient (albeit slower) than classical ones performing the same tasks.
Dynamic Adaptive Network Intelligence
Accurate representational learning of both the explicit and implicit relationships within data is critical to the ability of machines to perform more complex and abstract reasoning tasks. We describe the efficient weakly supervised learning of such inferences by our Dynamic Adaptive Network Intelligence (DANI) model. We report state-of-the-art results for DANI over question answering tasks in the bAbI dataset that have proved difficult for contemporary approaches to learning representation (Weston et al., 2015).
Dynamic Bayesian Network
A Dynamic Bayesian Network (DBN) is a Bayesian Network which relates variables to each other over adjacent time steps. This is often called a Two-Timeslice BN (2TBN) because it says that at any point in time T, the value of a variable can be calculated from the internal regressors and the immediate prior value (time T-1). DBNs are common in robotics, and have shown potential for a wide range of data mining applications. For example, they have been used in speech recognition, digital forensics, protein sequencing, and bioinformatics. DBN is a generalization of hidden Markov models and Kalman filters.
Dynamic Capacity Network
We introduce the Dynamic Capacity Network (DCN), a neural network that can adaptively assign its capacity across different portions of the input data. This is achieved by combining modules of two types: low-capacity sub-networks and high-capacity sub-networks. The low-capacity sub-networks are applied across most of the input, but also provide a guide to select a few portions of the input on which to apply the high-capacity sub-networks. The selection is made using a novel gradient-based attention mechanism, that efficiently identifies the modules and input features that are most likely to impact the DCN’s output and to which we’d like to devote more capacity. We focus our empirical evaluation on the cluttered MNIST and SVHN image datasets. Our findings indicate that DCNs are able to drastically reduce the number of computations, compared to traditional convolutional neural networks, while maintaining similar performance.
Dynamic Clustering
Dynamic Continuous Indexing
Dynamic Decision Network
A fully observable dynamic decision network consists of:
• a set of state features, each with a domain;
• a set of possible actions forming a decision node A, with domain the set of actions;
• a two-stage belief network with an action node A, nodes F0 and F1 for each feature F (for the features at time 0 and time 1, respectively), and a conditional probability P(F1|parents(F1)) such that the parents of F1 can include A and features at times 0 and 1 as long as the resulting network is acyclic; and
• a reward function that can be a function of the action and any of the features at times 0 or 1.
Dynamic Deep Neural Networks
We introduce Dynamic Deep Neural Networks (D2NN), a new type of feed-forward deep neural network that allow selective execution. Given an input, only a subset of D2NN neurons are executed, and the particular subset is determined by the D2NN itself. By pruning unnecessary computation depending on input, D2NNs provide a way to improve computational efficiency. To achieve dynamic selective execution, a D2NN augments a regular feed-forward deep neural network (directed acyclic graph of differentiable modules) with one or more controller modules. Each controller module is a sub-network whose output is a decision that controls whether other modules can execute. A D2NN is trained end to end. Both regular modules and controller modules in a D2NN are learnable and are jointly trained to optimize both accuracy and efficiency. Such training is achieved by integrating backpropagation with reinforcement learning. With extensive experiments of various D2NN architectures on image classification tasks, we demonstrate that D2NNs are general and flexible, and can effectively optimize accuracy-efficiency trade-offs.
Dynamic Filter Network
In a traditional convolutional layer, the learned filters stay fixed after training. In contrast, we introduce a new framework, the Dynamic Filter Network, where filters are generated dynamically conditioned on an input. We show that this architecture is a powerful one, with increased flexibility thanks to its adaptive nature, yet without an excessive increase in the number of model parameters. A wide variety of filtering operations can be learned this way, including local spatial transformations, but also others like selective (de)blurring or adaptive feature extraction. Moreover, multiple such layers can be combined, e.g. in a recurrent architecture. We demonstrate the effectiveness of the dynamic filter network on the tasks of video and stereo prediction, and reach state-of-the-art performance on the moving MNIST dataset with a much smaller model. By visualizing the learned filters, we illustrate that the network has picked up flow information by only looking at unlabelled training data. This suggests that the network can be used to pretrain networks for various supervised tasks in an unsupervised way, like optical flow and depth estimation.
Dynamic Linear Model
Dynamic Linear Models (DLMs) or State Space Models define a very general class of non-stationary time series models. DLMs may include terms to model trends, seasonality, covariates and autoregressive components. Other time series models like ARMA models are particular DLMs. The main goals are short-term forecasting, intervention analysis and monitoring.
Dynamic Partition Models We present a new approach for learning compact and intuitive distributed representations with binary encoding. Rather than summing up expert votes as in products of experts, we employ for each variable the opinion of the most reliable expert. Data points are hence explained through a partitioning of the variables into expert supports. The partitions are dynamically adapted based on which experts are active. During the learning phase we adopt a smoothed version of this model that uses separate mixtures for each data dimension. In our experiments we achieve accurate reconstructions of high-dimensional data points with at most a dozen experts.
Dynamic Programming In mathematics, computer science, economics, and bioinformatics, dynamic programming is a method for solving a complex problem by breaking it down into a collection of simpler subproblems. It is applicable to problems exhibiting the properties of overlapping subproblems and optimal substructure (described below). When applicable, the method takes far less time than naive methods that don’t take advantage of the subproblem overlap (like depth-first search). In order to solve a given problem, using a dynamic programming approach, we need to solve different parts of the problem (subproblems), then combine the solutions of the subproblems to reach an overall solution. Often when using a more naive method, many of the subproblems are generated and solved many times. The dynamic programming approach seeks to solve each subproblem only once, thus reducing the number of computations: once the solution to a given subproblem has been computed, it is stored or “memo-ized”: the next time the same solution is needed, it is simply looked up. This approach is especially useful when the number of repeating subproblems grows exponentially as a function of the size of the input. Dynamic programming algorithms are used for optimization (for example, finding the shortest path between two points, or the fastest way to multiply many matrices). A dynamic programming algorithm will examine the previously solved subproblems and will combine their solutions to give the best solution for the given problem. The alternatives are many, such as using a greedy algorithm, which picks the locally optimal choice at each branch in the road. The locally optimal choice may be a poor choice for the overall solution. While a greedy algorithm does not guarantee an optimal solution, it is often faster to calculate. Fortunately, some greedy algorithms (such as minimum spanning trees) are proven to lead to the optimal solution.
Dynamic Regression in the Presence of Autocorrelated Residuals
Dynamic Time Warping
In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences which may vary in time or speed. For instance, similarities in walking patterns could be detected using DTW, even if one person was walking faster than the other, or if there were accelerations and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio, and graphics data – indeed, any data which can be turned into a linear sequence can be analyzed with DTW. A well known application has been automatic speech recognition, to cope with different speaking speeds. Other applications include speaker recognition and online signature recognition. Also it is seen that it can be used in partial shape matching application. In general, DTW is a method that calculates an optimal match between two given sequences (e.g. time series) with certain restrictions. The sequences are “warped” non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension. This sequence alignment method is often used in time series classification. Although DTW measures a distance-like quantity between two given sequences, it does’t guarantee the triangle inequality to hold.
Dynamic Treatment Regimens
In medical research, a dynamic treatment regime (DTR), adaptive intervention, or adaptive treatment strategy is a set of rules for choosing effective treatments for individual patients. Historically, medical research and the practice of medicine tended to rely on an acute care model for the treatment of all medical problems, including chronic illness. Treatment choices made for a particular patient under a dynamic regime are based on that individual’s characteristics and history, with the goal of optimizing his or her long-term clinical outcome. A dynamic treatment regime is analogous to a policy in the field of reinforcement learning, and analogous to a controller in control theory. While most work on dynamic treatment regimes has been done in the context of medicine, the same ideas apply to time-varying policies in other fields, such as education, marketing, and economics.
Dynamic treatment regimens (DTRs) are sequential decision rules tailored at each stage by potentially time-varying patient features and intermediate outcomes observed in previous stages. There are 3 main type methods, O-learning, Q-learning and P-learning to learn the optimal Dynamic Treatment Regimes with continuous variables.
Dynamic Variable Effort Deep Neural Networks
Deep Neural Networks (DNNs) have advanced the state-of-the-art in a variety of machine learning tasks and are deployed in increasing numbers of products and services. However, the computational requirements of training and evaluating large-scale DNNs are growing at a much faster pace than the capabilities of the underlying hardware platforms that they are executed upon. In this work, we propose Dynamic Variable Effort Deep Neural Networks (DyVEDeep) to reduce the computational requirements of DNNs during inference. Previous efforts propose specialized hardware implementations for DNNs, statically prune the network, or compress the weights. Complementary to these approaches, DyVEDeep is a dynamic approach that exploits the heterogeneity in the inputs to DNNs to improve their compute efficiency with comparable classification accuracy. DyVEDeep equips DNNs with dynamic effort mechanisms that, in the course of processing an input, identify how critical a group of computations are to classify the input. DyVEDeep dynamically focuses its compute effort only on the critical computa- tions, while skipping or approximating the rest. We propose 3 effort knobs that operate at different levels of granularity viz. neuron, feature and layer levels. We build DyVEDeep versions for 5 popular image recognition benchmarks – one for CIFAR-10 and four for ImageNet (AlexNet, OverFeat and VGG-16, weight-compressed AlexNet). Across all benchmarks, DyVEDeep achieves 2.1x-2.6x reduction in the number of scalar operations, which translates to 1.8x-2.3x performance improvement over a Caffe-based implementation, with < 0.5% loss in accuracy.
DyNet We describe DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives. In DyNet’s dynamic declaration strategy, computation graph construction is mostly transparent, being implicitly constructed by executing procedural code that computes the network outputs, and the user is free to use different network structures for each input. Dynamic declaration thus facilitates the implementation of more complicated network architectures, and DyNet is specifically designed to allow users to implement their models in a way that is idiomatic in their preferred programming language (C++ or Python). One challenge with dynamic declaration is that because the symbolic computation graph is defined anew for every training example, its construction must have low overhead. To achieve this, DyNet has an optimized C++ backend and lightweight graph representation. Experiments show that DyNet’s speeds are faster than or comparable with static declaration toolkits, and significantly faster than Chainer, another dynamic declaration toolkit. DyNet is released open-source under the Apache 2.0 license and available at http://…/dynet.