Deduplication with Hadoop (Dedoop) google
Entity Matching for Big Data: Automatically matching entities (objects) and ontologies are key technologies to semantically integrate heterogeneous data. These match techniques are needed to identify equivalent data objects (duplicates) or semantically equivalent metadata elements (ontology concepts, schema attributes). The proposed techniques demand very high resources that limit their applicability to large-scale (Big Data) problems unless a powerful cloud infrastructure can be utilized. This is because the (fuzzy) match approaches basically have a quadratic complexity to compare the all elements to be matched with each other. For sufficient match quality, multiple match algorithms need to be applied and combined within so-called match workflows adding further resource requirements as well as a significant optimization problem to select matchers and configure their combination. …

Cell Suppression Problem (CSP) google
Cell suppression is one of the most frequently used techniques to prevent the disclosure of sensitive data in statistical tables. Finding the minimum cost set of nonsensitive entries to suppress, along with the sensitive ones, in order to make a table safe for publication, is a NP-hard problem, denoted the cell suppression problem (CSP). …

Communication In Focus (CIF) google
A NLU technology that is based on a novel approach that does not require the pre-definition of these terms. Rather, the design uses Context Discriminants to digest these new subjects based on prior understanding of the based language. Context Discriminant reduces complex documents into snippets of words of semantic neighbors consisting of context and points of views on subjects. Higher order derivatives are achieved by applying CD to the result produced by the prior CD. This approach enables us to refine the contexts on related subjects over distant semantic neighbors and to discover higher order dependent subjects that depict entity relationships between subjects. …

Advertisements