System and method for automatically spreading reference data

The technology concerns reference data and entity data in the field of data processing, and addresses the automatic expansion and updating of reference data sets.

Inactive Publication Date: 2008-03-05
INT BUSINESS MASCH CORP

AI Technical Summary

Problems solved by technology

However, the art currently offers no means of automatically extending and updating reference data sets.



Examples


Example 1

[0050] In the example of FIG. 4 , the input sent to the entity data parsing device 241 of the extension component 141 includes three parts:

[0051] 1) Reference data seed list, including the following seeds:

[0052] {China Unicom Guangdong Branch, Jitong Network Communication Co., Ltd., China Unicom Shanghai Branch,...};

[0053] 2) Reference data collection specification, which limits collection to Chinese organization named entity data;

[0054] 3) Data set (i.e., data resource), including the following data:

[0055] {China Unicom, China Unicom Guangdong Branch, China Unicom Beijing Branch, Shanghai Unicom, China Unicom Co., Ltd., China Unicom, Guangdong Unicom, Beijing Unicom, China Unicom, Jitong, Jitong Company, Jitong Network Communications Co., Ltd., China Resources Beijing Land, China Resources Land, China Resources Land (Beijing) Co., Ltd., ...}.

[0056] In the above input, for example, for the entity data "China Unicom Guangdong Branch", the entity data parsi...
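The three-part input described in this example can be sketched as a small data structure. This is a minimal illustration only: the class name `ExtensionInput`, its field names, and the `seeds_present` helper are hypothetical and do not come from the patent, and the lists are abbreviated English renderings of the names quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtensionInput:
    """Hypothetical container for the three inputs named in paragraph [0050]."""
    seeds: tuple[str, ...]            # reference data seed list
    collection_spec: str              # reference data collection specification
    data_resource: tuple[str, ...]    # data set (data resource)

    def seeds_present(self) -> list[str]:
        """Return the seeds that occur verbatim in the data resource."""
        return [s for s in self.seeds if s in self.data_resource]


example_1 = ExtensionInput(
    seeds=(
        "China Unicom Guangdong Branch",
        "Jitong Network Communication Co., Ltd.",
        "China Unicom Shanghai Branch",
    ),
    collection_spec="Chinese organization named entities",
    data_resource=(
        "China Unicom",
        "China Unicom Guangdong Branch",
        "China Unicom Beijing Branch",
        "Shanghai Unicom",
        "Jitong Network Communications Co., Ltd.",
    ),
)

print(example_1.seeds_present())
```

With the names as rendered here, only one seed matches the data resource exactly. Exact string comparison misses spelling variants of the same organization, which is one reason the entity data is parsed rather than compared verbatim.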

Example 2

[0073] In the example of FIG. 5, the input sent to the entity data parsing device 241 of the expansion component includes three parts:

[0074] 1) Data set (i.e., data resource), including the following data:

[0075] {"ATR Media Integration And Communications Research Laboratories", "Aviation Communication Surveillance Systems, LLC", "Communication And Control Engineering Company Limited", "Communication equipment and contracting company, Inc.", "Comsys Communication And Signal Processing Ltd.", "Fujitsu Networks Communications, Inc", ...}

[0076] 2) Reference data sample seed list, including the following seeds:

[0077] {Fujitsu Network Communications, Inc, ...};

[0078] 3) Reference data collection specification, which limits collection to English-language organization named entity data.

[0079] In the above input, for example, for the entity data "Fujitsu Network Communications, Inc", the entity data parsing device 241 parses it to obtain the internal semantic st...


Abstract

The system comprises an entity data analyzing unit and a data extracting unit. The entity data analyzing unit is coupled to the data resource and analyzes the entity data in it to obtain the internal semantic structure of each entity data item and to generate a feature set. The data extracting unit extracts reference entity data according to the feature set generated by the entity data analyzing unit.
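The two-stage flow in the abstract (analyze entity data into a feature set, then extract reference entity data using that feature set) can be illustrated with a toy sketch. Everything specific here is an assumption: this excerpt does not show the patent's actual semantic structure or features, so the sketch substitutes a deliberately simple stand-in in which the "structure" is a token list and the only feature is the trailing token of each seed. The function names are hypothetical.

```python
def parse_entity(name: str) -> list[str]:
    """Stand-in for the analyzing unit: split a name into lowercase tokens."""
    tokens = (t.strip(",.").lower() for t in name.split())
    return [t for t in tokens if t]


def build_feature_set(seeds: list[str]) -> set[str]:
    """Toy feature set: the trailing token of each seed (an assumption)."""
    return {tokens[-1] for s in seeds if (tokens := parse_entity(s))}


def extract_reference_data(data_resource: list[str], features: set[str]) -> list[str]:
    """Stand-in for the extracting unit: keep entities whose trailing token is a feature."""
    return [
        e for e in data_resource
        if (tokens := parse_entity(e)) and tokens[-1] in features
    ]


seeds = ["Fujitsu Network Communications, Inc"]
data_resource = [
    "ATR Media Integration And Communications Research Laboratories",
    "Aviation Communication Surveillance Systems, LLC",
    "Communication And Control Engineering Company Limited",
    "Communication equipment and contracting company, Inc.",
    "Comsys Communication And Signal Processing Ltd.",
    "Fujitsu Networks Communications, Inc",
]

features = build_feature_set(seeds)
extracted = extract_reference_data(data_resource, features)
print(features, extracted)
```

A trailing-token feature is far cruder than an internal semantic structure; the sketch only shows how a feature set derived from seeds can drive extraction from a larger data resource.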

Description

Technical field

[0001] The present invention relates to the field of data processing. More specifically, the present invention relates to systems and methods for extending reference data.

Background technique

[0002] Decision support analysis on data warehouses can affect major business decisions, so the accuracy of this analysis is very important. However, data that a data warehouse receives from external sources often contains errors such as spelling mistakes, errors caused by inconsistent conventions between data sources, and missing fields. As a result, data cleaning (i.e., detecting and correcting errors in the data) takes a great deal of time and expense.

[0003] A common technique in this regard is to compare incoming data tuples against a reference data dictionary (i.e., a relational table) of tuples that are known to be correct, so that the incoming tuples are standardized. Reference data dictionaries can be the source of a large amount of vocabulary and structure in attribute values. ...
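The background technique of comparing incoming values against a reference dictionary of known-correct values can be sketched as follows. This is an illustration under stated assumptions, not the method of the patent: it uses approximate string matching from Python's standard `difflib` module, and the dictionary contents, the `standardize` function name, and the `0.8` similarity cutoff are all chosen for the example.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical reference dictionary of values known to be correct.
REFERENCE_DICTIONARY = [
    "China Unicom Guangdong Branch",
    "China Unicom Beijing Branch",
    "China Resources Land",
]


def standardize(value: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest reference entry, or None if nothing is similar enough."""
    matches = get_close_matches(value, REFERENCE_DICTIONARY, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# A misspelled incoming value is mapped to its reference form.
print(standardize("China Unicm Guangdong Branch"))
# A value unlike any reference entry is left unresolved.
print(standardize("Zzzz"))
```

Cleaning of this kind is only as good as the reference dictionary: a correct value that is missing from the dictionary cannot be matched, which is why extending and updating the reference data matters.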


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F17/30592, G06Q10/06, G06F16/283
Inventor: 郭宏蕾, 郭志立, 苏中
Owner: INT BUSINESS MASCH CORP