Information pair matching method and system on small sample set

A matching method for small sample sets, applied in the field of data matching. It addresses the poor scalability and limited generalization ability of existing approaches, and improves matching accuracy and generalization ability.

Status: Inactive. Publication date: 2020-02-18
鼎复数据科技(北京)有限公司

AI Technical Summary

Problems solved by technology

[0004] However, the above methods have the following problems:
1) Template-based extraction: because language expressions are diverse, it is difficult to cover all situations with a small number of templates, so scalability is poor.
2) Judging whether a match exists from the relative positions of the pieces of information, a purpose-built word dictionary, and similar cues: again because of the diversity of language expressions, it is difficult to achieve good accuracy and recall at the same time.
3) Abstracting the problem as classification and training a machine learning model on features such as distance and word vectors: feature tokens based on the syntactic path (tokens in the lexical-analysis sense) are usually sparse, so feature selection faces a large number of invalid feature samples, or the amount of data must reach a certain order of magnitude before performance improves noticeably. Because of word-order issues, the generalization ability of this approach is also limited.
4) Building a classification model with a DNN: thanks to its many parameters and strong expressive power, a neural network performs well on large sample sets, but its generalization ability is hard to guarantee when the sample set is small.


Examples

Embodiment 1

[0090] Taking the input text "The output of steel and coal in 2017 was 50 tons and 70 tons respectively." as a sample, the construction process of the matching model is explained below.

[0091] Step 100: input the text and identify the attributes and their corresponding values in the text.

[0092] Step 110: treat the period as the marker for the end of a complete sentence, and segment the text into sentences by identifying periods. The first sentence reads "The output of steel and coal in 2017 was 50 tons and 70 tons respectively".
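A minimal sketch of the sentence segmentation in Step 110, assuming Python. The function name `split_sentences` is hypothetical, and two details are assumptions not stated in the source: the CJK full stop is treated like the ASCII period, and a period between two digits (a decimal point) is not treated as a sentence end.

```python
import re

def split_sentences(text):
    """Split text into sentences, using the period as the end-of-sentence marker."""
    # Assumption: a period sitting between two digits (e.g. "3.5") is a decimal
    # point, not a sentence end. The pattern matches a period that is either
    # not preceded by a digit or not followed by a digit.
    parts = re.split(r"(?<!\d)[.。]|[.。](?!\d)", text)
    # Drop empty fragments such as the one after the final period.
    return [part.strip() for part in parts if part.strip()]

sample = "The output of steel and coal in 2017 was 50 tons and 70 tons respectively."
sentences = split_sentences(sample)
```

For the sample text this yields a single sentence with the trailing period removed, which is the input to the attribute and value identification that follows.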

[0093] Step 120: identify attributes by constructing an index library and matching the words in the input sentence against the attributes in the index library, which yields the attributes "steel output" and "coal output" present in the input sentence.
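The index-library lookup in Step 120 could be sketched as follows. The source does not describe the structure of the index library, so `INDEX_LIBRARY` (a mapping from attribute name to keywords that must all occur in the sentence) and `find_attributes` are illustrative assumptions only, not the patent's actual data structure.

```python
import re

# Hypothetical index library: attribute name -> keywords that must all appear.
INDEX_LIBRARY = {
    "steel output": ("steel", "output"),
    "coal output": ("coal", "output"),
    "cement output": ("cement", "output"),
}

def find_attributes(sentence, index_library=INDEX_LIBRARY):
    """Return the attributes whose keywords all occur in the sentence,
    ordered by where their first keyword appears."""
    words = re.findall(r"[a-z]+", sentence.lower())
    hits = []
    for attribute, keywords in index_library.items():
        if all(keyword in words for keyword in keywords):
            # Use the position of the first keyword to keep sentence order.
            hits.append((words.index(keywords[0]), attribute))
    return [attribute for _, attribute in sorted(hits)]

sentence = "The output of steel and coal in 2017 was 50 tons and 70 tons respectively"
attributes = find_attributes(sentence)
```

Keeping the attributes in sentence order is a design choice made here so that later steps can compare the order of attributes with the order of values.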

[0094] Step 130: since the values corresponding to the attributes are numerical, determine whether the sentence contains a string matching a compound regular expression, and ...
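The description of Step 130 is truncated in this listing, so only the detection part can be illustrated. The sketch below assumes the "compound regular expression" means a number followed by a unit; the pattern `VALUE_PATTERN`, its small unit list, and the function `find_values` are hypothetical and not taken from the source.

```python
import re

# Assumed pattern: an integer or decimal number followed by a unit drawn
# from a small, hypothetical unit list.
VALUE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(tons?|kg|yuan)\b")

def find_values(sentence):
    """Return (matched_text, start_offset) for each number-plus-unit string."""
    return [(m.group(0), m.start()) for m in VALUE_PATTERN.finditer(sentence)]

sentence = "The output of steel and coal in 2017 was 50 tons and 70 tons respectively"
values = [text for text, _ in find_values(sentence)]
```

Under this assumed pattern the year "2017" is not picked up as a value because it is not followed by a unit, while "50 tons" and "70 tons" are.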

Abstract

The invention discloses an information pair matching method and system on a small sample set. The method comprises the steps of: inputting a text and recognizing the attributes and their corresponding values in the text; constructing, pair by pair, features for the attributes and corresponding values that appear in each sentence; constructing training samples; training a model on the training samples; and constructing an information pair matching model. The method builds features from the syntactic path on a small annotated sample set and performs data matching with a machine learning method, so that a model with high matching accuracy can be obtained efficiently and the structured information in a document can be extracted.
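The pairwise construction of training samples described in the abstract might look like the sketch below. It pairs every attribute with every value and labels each pair from an annotated set. The function `build_training_samples`, the `gold_pairs` annotation format, and the rank-based features are placeholders chosen for illustration; the source builds its features from the syntactic path, which is not reproduced here.

```python
from itertools import product

def build_training_samples(attributes, values, gold_pairs):
    """Pair every attribute with every value; label 1 if annotated as a match."""
    samples = []
    for (a_idx, attribute), (v_idx, value) in product(
        enumerate(attributes), enumerate(values)
    ):
        # Placeholder features (assumption): positions of the attribute and
        # the value in their respective lists, and whether they coincide.
        features = {
            "attr_rank": a_idx,
            "value_rank": v_idx,
            "same_rank": int(a_idx == v_idx),
        }
        label = int((attribute, value) in gold_pairs)
        samples.append((attribute, value, features, label))
    return samples

samples = build_training_samples(
    ["steel output", "coal output"],
    ["50 tons", "70 tons"],
    gold_pairs={("steel output", "50 tons"), ("coal output", "70 tons")},
)
labels = [label for _, _, _, label in samples]
```

Each tuple can then be handed to any classifier as one feature vector with a binary label; the choice of classifier is left open here because the listing does not name one.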

Description

Technical field

[0001] The invention belongs to the technical field of data matching and relates to an information pair matching method and system, in particular to an information pair matching method and system on a small sample set.

Background

[0002] A large amount of unstructured data exists in various industries. With massive data the reading workload is huge, and useful data can only be obtained by understanding and judging the content of each document. Because most documents are unstructured and their authors differ in how they think and write, people have to read and understand all of the content to obtain information, even though only a small part of it actually needs attention. Time and labor costs are seriously wasted, and efficiency is low. Therefore, it is necessary to extract structured data from unstructured data to form information pairs so that the data can be be...

Application Information

IPC(8): G06F16/35, G06F40/289, G06F40/295, G06K9/62
CPC: G06F18/24
Inventor: 张虎, 陈洪亮, 吴雪军
Owner: 鼎复数据科技(北京)有限公司