An Efficient Method for Discovering Citation Relationships

A method for discovering citation (reference) relationships between datasets, applied in the technical field of discovery methods

Active Publication Date: 2020-08-14
北京数语科技有限公司

AI Technical Summary

Problems solved by technology

This type of database system offers large storage capacity, high availability, high scalability, and no fixed data model structure.



Examples


Embodiment 1

[0032] Example 1: First, we generated the employees data set in MySQL. The employees data set is MySQL's official sample data set; it contains 6 tables linked by primary and foreign key relationships. We then migrated the data to MongoDB and, following the primary and foreign key relationships, embedded the employees table into the salaries table.
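The embedding step described above can be sketched in plain Python, without a database connection. This is a minimal illustration under stated assumptions, not the procedure used in the experiment: the helper `embed_parent`, the target field name `employee`, and the sample rows are hypothetical, and the column names are assumed to follow the MySQL employees sample schema.

```python
def embed_parent(children, parents, key, field):
    """Return copies of `children` with the matching parent embedded under `field`.

    `key` is the foreign key shared by both collections (assumed unique in
    `parents`); it is dropped from the top level of each child because the
    embedded parent already carries it.
    """
    by_key = {parent[key]: parent for parent in parents}
    embedded = []
    for child in children:
        doc = {name: value for name, value in child.items() if name != key}
        doc[field] = by_key[child[key]]
        embedded.append(doc)
    return embedded


# Illustrative rows only; not taken from the experiment.
employees = [
    {"emp_no": 10001, "first_name": "Georgi", "last_name": "Facello"},
    {"emp_no": 10002, "first_name": "Bezalel", "last_name": "Simmel"},
]
salaries = [
    {"emp_no": 10001, "salary": 60117, "from_date": "1986-06-26"},
    {"emp_no": 10001, "salary": 62102, "from_date": "1987-06-26"},
    {"emp_no": 10002, "salary": 65828, "from_date": "1996-08-03"},
]

salary_docs = embed_parent(salaries, employees, key="emp_no", field="employee")
```

Each resulting salary document carries a full copy of its employee, which is the kind of redundancy a document data set typically contains and which functional dependency discovery can later expose.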

[0033] As shown in Figure 2, in the experiment we migrated the data to MongoDB in stages. The data volumes are 0.238 GB, 0.476 GB, 0.714 GB and 0.953 GB, that is, one quarter, one half, three quarters, and all of the total data, respectively. We tested these 4 sets of data in an in-memory environment.

[0034] In summary, the present invention uses data model information and data type distribution to improve the Tane algorithm, making it more efficient and better suited to document data sets. It can be used for tasks such as document data set normalization and data cleaning. ...


Abstract

The invention discloses an efficient reference relationship discovery algorithm. Its input is a set of document datasets, and its output is the relationships discovered among those datasets. The algorithm connects to a MongoDB document database, extracts the MongoDB data model, and then traverses all MongoDB datasets to remove duplicate data. It analyzes the dependencies between all attributes of each dataset to find a superkey, which is an attribute set that determines every attribute of a document, and then discovers relationships among the datasets according to the dependencies among their superkeys. By using data model information and data type distribution, the algorithm improves the Tane algorithm, making it more efficient and better suited to document datasets, and it can be applied to tasks such as normalization of document datasets and data cleaning.
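The two ideas in the abstract, finding a superkey and then relating datasets through it, can be illustrated with a brute-force sketch. This is not the patented algorithm and does not include its Tane-based improvements. The function names, the sample documents, and the use of a simple value-inclusion test as the criterion for a reference are all assumptions made for illustration; attribute values are assumed to be hashable.

```python
from itertools import combinations


def is_superkey(docs, attrs):
    """True if no two documents agree on every attribute in `attrs`."""
    seen = set()
    for doc in docs:
        values = tuple(doc.get(a) for a in attrs)
        if values in seen:
            return False
        seen.add(values)
    return True


def minimal_superkeys(docs, attributes):
    """Enumerate attribute sets smallest-first, skipping supersets of found keys."""
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(key) <= set(combo) for key in found):
                continue  # a subset is already a key, so combo is not minimal
            if is_superkey(docs, combo):
                found.append(combo)
    return found


def references(child_docs, child_attr, parent_docs, parent_key):
    """Assumed criterion: every child value appears among the parent key values."""
    parent_values = {doc[parent_key] for doc in parent_docs}
    return all(doc[child_attr] in parent_values for doc in child_docs)


# Illustrative documents only.
departments = [
    {"dept_no": "d001", "dept_name": "Marketing"},
    {"dept_no": "d002", "dept_name": "Finance"},
]
dept_emp = [
    {"emp_no": 10001, "dept_no": "d001"},
    {"emp_no": 10002, "dept_no": "d002"},
    {"emp_no": 10003, "dept_no": "d001"},
]

dept_keys = minimal_superkeys(departments, ["dept_no", "dept_name"])
emp_keys = minimal_superkeys(dept_emp, ["emp_no", "dept_no"])
refers = references(dept_emp, "dept_no", departments, "dept_no")
```

The exhaustive enumeration is exponential in the number of attributes, which is why level-wise algorithms with pruning, such as Tane, are used in practice.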

Description

Technical field

[0001] The invention relates to the technical field of discovery methods, in particular to an efficient method for discovering citation relationships.

Background technique

[0002] Traditional relational databases, such as MySQL, Oracle and DB2, have been widely used in various scenarios for more than 30 years. These relational databases are very easy to use and have a structured data model and standardized SQL statements [1] [2]. They often provide good performance when dealing with a limited amount of data, which has been proven in many scenarios. Since the start of the 21st century, owing to the widespread use of relational databases, more and more functional dependency mining algorithms have been proposed, for example Tane [3], Fun [4], FdMine [5], Dfd [6], Dep-Miner [7], FastFDs [8], Fdep [9] and others. However, these traditional functional dependency mining algorithms mainly target relational databases.

[0003] With the rapid development of the in...
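For readers unfamiliar with functional dependency mining, the partition test at the core of Tane-style algorithms can be sketched as follows: a dependency X -> A holds when grouping the rows by X yields as many groups as grouping them by X together with A. This is a minimal sketch of that standard test only, not of the improved algorithm described in this patent; the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices by their values on `attrs` (equivalence classes)."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(index)
    return list(groups.values())


def fd_holds(rows, lhs, rhs):
    """X -> A holds iff adding A to X does not split any equivalence class."""
    return len(partition(rows, lhs)) == len(partition(rows, list(lhs) + [rhs]))


# Illustrative rows only.
rows = [
    {"emp_no": 1, "dept_no": "d001", "dept_name": "Marketing"},
    {"emp_no": 2, "dept_no": "d001", "dept_name": "Marketing"},
    {"emp_no": 3, "dept_no": "d002", "dept_name": "Finance"},
]

name_determined = fd_holds(rows, ["dept_no"], "dept_name")
emp_determined = fd_holds(rows, ["dept_no"], "emp_no")
```

Here `dept_no` determines `dept_name` because both groupings produce two classes, while it does not determine `emp_no` because adding `emp_no` splits the first class.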


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F16/28, G06F16/20
CPC: G06F16/20, G06F16/28
Inventor: 王琤, 贾天宇
Owner: 北京数语科技有限公司