Repeated material entity recognition method based on mutually different feature vectors
A duplicate material entity recognition technique based on feature vectors, applied in the field of entity recognition. It addresses problems of existing approaches such as low calculation accuracy, no handling of special and professional vocabulary, and no consideration of the semantic similarity of strings. The stated effects are to overcome the limitations of semantic similarity measures and to avoid inaccurate and inefficient recognition.
Examples
Embodiment 1
[0047] As shown in Figure 1, a method for identifying duplicate material entities based on different feature vectors includes the following steps:
[0048] S1. Input the material data set: input the two-dimensional table of material data.
[0049] S2. Preprocess the material data: divide all material records into mutually independent record blocks according to material category.
[0050] S3. Construct the different feature vectors and category vectors: for each independent record block, compare the material records in the block in pairs and, from the differences in their feature descriptions, construct a different feature vector for each pair together with a category vector indicating whether the two records are similar or dissimilar.
[0051] S4. Train and test the probabilistic neural network classifier: the different feature vectors and category vectors in the same material record block are divided into training samples and test samples, and when...
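Steps S3 and S4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented construction: the 0/1 per-field encoding of the different feature vector, the field names, the `labels` mapping, and the kernel width `sigma` are all hypothetical, and the classifier is a generic probabilistic neural network (a Gaussian kernel averaged per class).

```python
import math
from itertools import combinations

FIELDS = ["name", "spec", "material"]  # hypothetical feature fields


def different_feature_vector(rec_a, rec_b, fields):
    """Assumed encoding: 1.0 where two records disagree on a field, else 0.0."""
    return [0.0 if rec_a[f] == rec_b[f] else 1.0 for f in fields]


def build_pairs(block, fields, labels):
    """Build (vector, category) samples for every record pair in one block.

    labels maps a frozenset of two record ids to 1 (duplicate) or 0 (distinct).
    """
    samples = []
    for a, b in combinations(block, 2):
        vec = different_feature_vector(a, b, fields)
        samples.append((vec, labels[frozenset((a["id"], b["id"]))]))
    return samples


def pnn_classify(x, training, sigma=0.5):
    """Generic PNN decision: average a Gaussian kernel per class, pick the largest."""
    sums, counts = {}, {}
    for vec, cls in training:
        dist2 = sum((xi - vi) ** 2 for xi, vi in zip(x, vec))
        sums[cls] = sums.get(cls, 0.0) + math.exp(-dist2 / (2 * sigma ** 2))
        counts[cls] = counts.get(cls, 0) + 1
    return max(sums, key=lambda c: sums[c] / counts[c])


# One hypothetical record block (all records share the category "bearing").
block = [
    {"id": 1, "name": "bearing", "spec": "6204", "material": "steel"},
    {"id": 2, "name": "bearing", "spec": "6204", "material": "steel"},
    {"id": 3, "name": "bearing", "spec": "6305", "material": "bronze"},
]
labels = {
    frozenset((1, 2)): 1,
    frozenset((1, 3)): 0,
    frozenset((2, 3)): 0,
}
training = build_pairs(block, FIELDS, labels)
```

Because pairs are only formed inside a block, the number of comparisons grows with the square of the block size rather than of the whole data set, which is the usual motivation for blocking before pairwise classification.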
Embodiment 2
[0064] On the basis of Embodiment 1, as shown in Figure 1, before S2 a data inversion algorithm is applied to the material name field of the two-dimensional material data table: the material name of every material record is inverted, and the records are then sorted by the inverted material name field in ascending dictionary order.
[0065] Obtain the material name field of the material data, and use the data inversion algorithm to invert the material names of all material records. For example, for the material record whose material name is "self-lubricating bearing", the inverted name is "bearing lubricating self".
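The inversion and sorting step might look like the sketch below. It assumes word-level inversion of English names, mirroring the translated example above; for names written without word separators, character-level reversal (`name[::-1]`) would be the analogous operation. The record layout and function names are hypothetical, and Python's default string ordering stands in for the dictionary order.

```python
import re


def invert_name(name: str) -> str:
    """Reverse the word order of a material name (assumed word-level inversion)."""
    words = re.split(r"[\s\-]+", name.strip())
    return " ".join(reversed(words))


def sort_by_inverted_name(records):
    """Attach the inverted name to each record and sort ascending by it."""
    inverted = [dict(r, inverted_name=invert_name(r["name"])) for r in records]
    return sorted(inverted, key=lambda r: r["inverted_name"])


records = [
    {"id": 1, "name": "self-lubricating bearing"},
    {"id": 2, "name": "sealing gasket"},
    {"id": 3, "name": "rolling bearing"},
]
ordered = sort_by_inverted_name(records)
```

After inversion the head noun comes first, so sorting places the two bearing records next to each other ahead of the gasket record. That adjacency is presumably what makes the later division into category blocks straightforward.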
[0066] In S2, all material records are divided into mutually independent record blocks according to material category by means of an inverted index.
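As a rough illustration of blocking with an inverted index, the sketch below maps a category key to the ids of the records that carry it. The choice of key is an assumption made only for this example: it takes the last word of the material name, which is the first word after inversion. The source does not specify how the category is derived.

```python
from collections import defaultdict


def category_key(rec):
    """Hypothetical key: last word of the name (first word after inversion)."""
    return rec["name"].split()[-1].lower()


def build_inverted_index(records, key_of):
    """Map each category key to the ids of the records that carry it."""
    index = defaultdict(list)
    for rec in records:
        index[key_of(rec)].append(rec["id"])
    return dict(index)


records = [
    {"id": 1, "name": "self-lubricating bearing"},
    {"id": 2, "name": "sealing gasket"},
    {"id": 3, "name": "rolling bearing"},
]
blocks = build_inverted_index(records, category_key)
```

Each posting list is one record block. Records in different blocks are never compared, so the blocks are mutually independent in the sense used by S2.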
[0067] In said S3, the specific method of constructing the differ...
Embodiment 3
[0090] On the basis of Embodiment 2, the experimental data used to verify the present invention come from a professional manufacturer of automobile heat exchangers in Shaanxi Province. A total of 2,678 material records were collected in this experiment. After preliminary data preprocessing and the elimination of incomplete and duplicate data, 2,007 records (1,209 training samples and 798 test samples) were used in the experimental verification. The duplicate records in the training and test samples were labeled by the enterprise's material management personnel. In the experiment, the method of the present invention was compared with edit distance, the classic character-based algorithm, and with VSM (the vector space model), the classic vector-space-based algorithm. The similarity threshold between material records was set to 0.7. The experimental results are shown in Table 1.
[0091] Table 1. Method of the present invention and prior art e...
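The two baselines named in the experiment can be sketched as below. This is an illustrative reading, not the experimental code: normalizing edit distance by the longer string, using whitespace-separated term counts for the VSM, and treating a similarity of at least 0.7 as a duplicate are all assumptions.

```python
import math
from collections import Counter

THRESHOLD = 0.7  # similarity threshold stated in the experiment


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def vsm_similarity(a: str, b: str) -> float:
    """Cosine similarity of term-count vectors (assumed whitespace tokens)."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def is_duplicate(a: str, b: str, similarity) -> bool:
    """Flag a pair as duplicate when its similarity reaches the threshold."""
    return similarity(a, b) >= THRESHOLD
```

Both measures compare surface strings only. For instance, `is_duplicate("bearing 6204", "bearing 6205", edit_similarity)` is true because the two descriptions differ by a single character, even though they may denote different parts.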