
Repeated material entity recognition method based on mutually different feature vectors

A feature vector and entity recognition technology, applied in the field of entity recognition. It addresses problems such as low calculation accuracy, the absence of specialized and professional vocabulary in general knowledge bases, and string similarity measures that ignore semantics. It thereby overcomes the limitations of semantic similarity measurement and avoids inaccurate and inefficient recognition.

Pending Publication Date: 2021-05-28
CHINA NAT HEAVY MACHINERY RES INSTCO

AI Technical Summary

Problems solved by technology

A disadvantage of one prior method is that it calculates the similarity between entity names using edit distance, which ignores the semantic similarity between entity names and results in low recognition accuracy.
A disadvantage of another prior method is that it calculates the similarity between strings using edit distance without considering the semantic similarity between strings, which results in low calculation accuracy.
[0010] (1) When identifying duplicate material records, string matching methods based on text similarity functions consider only the surface form of strings and ignore the semantic similarity between them;
[0011] (2) General-purpose external knowledge bases do not contain the specialized and professional vocabulary used to describe material data, so they cannot measure duplicate material records in the system well.
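
The limitation described above can be illustrated with a minimal sketch of edit distance. The function names and the sample material names are hypothetical and are not taken from the patent; the point is only that the score depends on characters, not on meaning.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Normalise the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Hypothetical names: one character apart, yet different items.
print(edit_similarity("hex bolt", "hex belt"))
# Hypothetical names a person might treat as the same item, yet far apart as strings.
print(edit_similarity("hex bolt", "hexagon head bolt"))
```

In this sketch the first pair scores higher than the second, even though a reader would likely judge the second pair to be semantically closer.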

Examples


Embodiment 1

[0047] As shown in Figure 1, a method for identifying duplicate material entities based on mutually different feature vectors includes the following steps:

[0048] S1. Input the material data set: input the two-dimensional table of material data;

[0049] S2. Preprocess the material data: divide all material records into mutually independent record blocks according to the category of the material;

[0050] S3. Construct the mutually different feature vectors and category vectors: for each independent material record block, based on the differences in feature descriptions between the material records in the block, construct, pair by pair, the mutually different feature vector between material records and the category vector indicating whether the two records are similar or dissimilar;

[0051] S4. Train and test the probabilistic neural network classifier: the mutually different feature vectors and category vectors in the same material record block are divided into training samples and test samples, and when...
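
Step S3 can be sketched as follows. The excerpt does not give the actual encoding, so this sketch assumes a simple one: each component is 1 when the two records differ on a field and 0 otherwise. The field names, the sample records, and the `same_pairs` labels are hypothetical; the label convention (1 for different, 0 for same) follows the classifier output described in the abstract.

```python
from itertools import combinations

FIELDS = ("name", "spec", "material")  # hypothetical attribute names


def difference_vector(rec_a, rec_b, fields=FIELDS):
    """Assumed encoding: 1 where the two field values differ, else 0."""
    return [0 if rec_a[f].strip().lower() == rec_b[f].strip().lower() else 1
            for f in fields]


def build_samples(block, same_pairs):
    """Build one difference vector and one category label per record pair.

    same_pairs holds frozensets of record ids that annotators marked as
    the same material; such pairs get label 0, all other pairs get label 1.
    """
    vectors, labels = [], []
    for rec_a, rec_b in combinations(block, 2):
        vectors.append(difference_vector(rec_a, rec_b))
        pair = frozenset((rec_a["id"], rec_b["id"]))
        labels.append(0 if pair in same_pairs else 1)
    return vectors, labels


# Hypothetical record block (all records share one material category).
block = [
    {"id": 1, "name": "Self-lubricating bearing", "spec": "30x40", "material": "bronze"},
    {"id": 2, "name": "self-lubricating bearing", "spec": "30x40", "material": "Bronze"},
    {"id": 3, "name": "ball bearing", "spec": "30x40", "material": "steel"},
]
same_pairs = {frozenset((1, 2))}
vectors, labels = build_samples(block, same_pairs)
print(vectors, labels)
```

Because pairs are only formed inside a block, the number of vectors grows with the square of the block size rather than the square of the whole data set.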

Embodiment 2

[0064] Building on Embodiment 1, as shown in Figure 1, before S2 a data inversion algorithm is applied to the material name field in the two-dimensional material data table: the material name of every material record is inverted, and the inverted material name fields are then sorted in ascending dictionary order.

[0065] Obtain the material name field of the material data, and use the data inversion algorithm to invert the material names of all material records. Taking the material record whose material name is "self-lubricating bearing" as an example, after the data inversion operation the name is read in reverse order and becomes "bearing lubricating self".
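
The inversion and sorting step can be sketched as below. This is an illustration under assumptions: the sketch reverses names character by character and uses Python's default string ordering as the "dictionary order"; the function names and sample names are hypothetical.

```python
def invert_name(name: str) -> str:
    """Reverse the name character by character (assumed granularity)."""
    return name[::-1]


def sort_by_inverted_name(names):
    """Return the original names ordered by their inverted form, ascending."""
    return sorted(names, key=invert_name)


# Hypothetical material names.
names = ["ball bearing", "hex bolt", "roller bearing"]
print(sort_by_inverted_name(names))
```

Sorting on the inverted form places names that share an ending next to each other, so records whose names end in the same head word become neighbours in the sorted table.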

[0066] In S2, an inverted index is used to divide all material records into mutually independent record blocks according to the category of the material.
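
One way to picture index-based blocking is sketched below. The excerpt does not say how the category key is derived, so this sketch assumes the last word of the material name serves as the key; `head_token`, `build_blocks`, and the sample records are hypothetical.

```python
from collections import defaultdict


def head_token(name: str) -> str:
    """Assumed category key: the last whitespace-separated word of the name."""
    tokens = name.lower().split()
    return tokens[-1] if tokens else ""


def build_blocks(records, key=head_token):
    """Build an index from category key to the ids of records with that key.

    Every record is posted under exactly one key, so the resulting blocks
    do not overlap.
    """
    index = defaultdict(list)
    for rec_id, name in records.items():
        index[key(name)].append(rec_id)
    return dict(index)


# Hypothetical records: id -> material name.
records = {
    1: "self-lubricating bearing",
    2: "ball bearing",
    3: "hex bolt",
    4: "Roller Bearing",
}
print(build_blocks(records))
```

Candidate pairs are then generated only within each block, which avoids comparing, for example, bearings against bolts.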

[0067] In said S3, the specific method of constructing the differ...

Embodiment 3

[0090] Building on Embodiment 2, the experimental data used to verify the present invention come from a professional manufacturer of automobile heat exchangers in Shaanxi Province. A total of 2,678 material records were collected. After preliminary preprocessing and the elimination of incomplete and duplicate data, 2,007 records (1,209 training samples and 798 test samples) were used in the experimental verification. The duplicate records in the training and test samples were labeled by the enterprise's material management personnel. In the experiment, the method of the present invention was compared with a classic character-based algorithm, edit distance, and a classic vector-space-based algorithm, VSM. The similarity threshold between material records was set to 0.7. The experimental results are shown in Table 1.
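
For orientation, a VSM baseline of the kind named above can be sketched as a cosine similarity over word-count vectors with the 0.7 threshold. The tokenization, the raw term-frequency weighting, and the use of `>=` at the threshold are assumptions of this sketch, not details from the patent; the sample strings are hypothetical.

```python
import math
from collections import Counter

THRESHOLD = 0.7  # similarity threshold stated in the experiment


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between word-count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def is_duplicate(a: str, b: str, threshold: float = THRESHOLD) -> bool:
    """Flag a pair as duplicate when similarity reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


# Hypothetical descriptions: no shared words, so the score is 0.0.
print(cosine_similarity("steel pipe", "copper tube"))
print(is_duplicate("steel pipe 20mm", "steel pipe 25mm"))
```

Like edit distance, this baseline scores only shared surface tokens, which is the behaviour the compared method is meant to improve on.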

[0091] Table 1 method of the present invention and prior art e...


Abstract

The invention discloses a repeated material entity recognition method based on mutually different feature vectors. The method comprises the following steps: S1, inputting a material data set; S2, preprocessing the material data; S3, constructing the mutually different feature vectors and category vectors; S4, training and testing a probabilistic neural network classifier; S5, obtaining the mutually different feature vectors of the material records to be tested; S6, inputting the mutually different feature vectors obtained in S5 into the trained probabilistic neural network classifier. If the output of the classifier is 1, the two material records differ in semantic expression; if the output is 0, the two material records are the same in semantic expression. The method considers the differences between entity feature descriptions, makes full use of the feature information of entities, and overcomes the limitation of general-purpose knowledge bases in measuring semantic similarity between entities in different fields.
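
The classification steps S4 to S6 can be sketched with a textbook probabilistic neural network: one Gaussian kernel per training sample, averaged per class, with the highest-scoring class returned. This is a generic sketch, not the patent's implementation; the class name, the smoothing parameter `sigma`, and the training vectors are assumptions. Labels follow the abstract: 1 means the pair differs, 0 means it is the same.

```python
import math


class ProbabilisticNeuralNetwork:
    """Minimal PNN: Parzen-window density estimate per class, then argmax."""

    def __init__(self, sigma: float = 0.5):
        self.sigma = sigma   # assumed kernel width
        self.patterns = {}   # class label -> stored training vectors

    def fit(self, vectors, labels):
        self.patterns = {}
        for x, y in zip(vectors, labels):
            self.patterns.setdefault(y, []).append(list(x))
        return self

    def _class_score(self, x, samples):
        # Summation layer: average Gaussian kernel response over the class.
        total = 0.0
        for s in samples:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, s))
            total += math.exp(-sq_dist / (2 * self.sigma ** 2))
        return total / len(samples)

    def predict(self, x):
        # Decision layer: pick the class with the largest estimated density.
        return max(self.patterns,
                   key=lambda c: self._class_score(x, self.patterns[c]))


# Hypothetical difference vectors with labels (0 = same, 1 = different).
train_x = [[0, 0, 0], [0, 0, 1], [1, 1, 1], [1, 1, 0]]
train_y = [0, 0, 1, 1]
pnn = ProbabilisticNeuralNetwork(sigma=0.5).fit(train_x, train_y)
print(pnn.predict([0, 0, 0]), pnn.predict([1, 1, 1]))
```

A PNN needs no iterative training: fitting only stores the samples, so the main tuning choice in a sketch like this is the kernel width.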

Description

Technical field

[0001] The invention belongs to the technical field of entity recognition, and in particular relates to a method for identifying duplicate material entities based on mutually different feature vectors.

Background

[0002] Duplicate entity recognition, also known as entity recognition, is the process of identifying which records in a database represent the same real-world entity. Identifying and detecting duplicate records is a common concern in academia and industry. It has attracted research interest from scholars in databases, information systems, and related fields, and research on the problem has produced rich results.

[0003] Most existing entity recognition methods use string matching based on text similarity functions, such as the edit distance method and the Vector Space Model (VSM) method, that is, according to the character matching degree of the corresponding attributes of two records. Deter...


Application Information

IPC(8): G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/047, G06F18/22, G06F18/2415, G06F18/214, Y02P90/30
Inventor: 王红涛, 冯连强, 王志超, 丁小梅, 崔冬
Owner CHINA NAT HEAVY MACHINERY RES INSTCO