Method and device for calculating character string similarity and material classification method and device

A technology for calculating character string similarity, applied in computing, electrical digital data processing, special data processing applications, and related fields; it addresses problems such as the lack of research on enterprise material classification.

Active Publication Date: 2011-12-28
CHINA ENERGY INVESTMENT CORP LTD
Cites: 2; Cited by: 12

AI Technical Summary

Problems solved by technology

[0004] Although research on text classification and string similarity has produced many results, no research has specifically addressed the classification of enterprise materials in a Chinese-language setting.

Embodiment Construction

[0026] Specific embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only used to illustrate and explain the present invention, and are not intended to limit the present invention.

[0027] Figure 1 is a flow chart of the method, provided by the present invention, for calculating character string similarity based on dynamic weights of string prefixes and suffixes. As shown in Figure 1, the method includes: calculating the initial similarity Sim between a character string X and a character string $d_i$, where $d_i$ is a string belonging to class $C_j$ of a set $\{C_1, C_2, \ldots, C_n\}$; the set contains multiple categories C, n is the number of categories, and each category contains multiple strings; and obtaining, for string X and string $d_i$, the longest common prefix b...
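The first steps of paragraph [0027] can be illustrated with a short Python sketch. The excerpt does not say how the initial similarity Sim is computed, so `initial_similarity` below uses character-set Jaccard overlap purely as a stand-in assumption; `longest_common_prefix` and `longest_common_suffix` compute the two quantities named in the text. All function names are hypothetical.

```python
def initial_similarity(x: str, d: str) -> float:
    """Stand-in for Sim (ASSUMPTION: the patent excerpt does not define it).

    Uses Jaccard overlap of the two strings' character sets, which works
    on Chinese text without word segmentation.
    """
    sx, sd = set(x), set(d)
    if not sx and not sd:
        return 1.0  # two empty strings are treated as identical
    return len(sx & sd) / len(sx | sd)


def longest_common_prefix(x: str, d: str) -> str:
    """Return the longest string that both x and d start with."""
    n = 0
    for a, b in zip(x, d):
        if a != b:
            break
        n += 1
    return x[:n]


def longest_common_suffix(x: str, d: str) -> str:
    """Return the longest string that both x and d end with."""
    n = 0
    for a, b in zip(reversed(x), reversed(d)):
        if a != b:
            break
        n += 1
    # Slice from len(x) - n so that n == 0 yields "" rather than all of x.
    return x[len(x) - n:]


if __name__ == "__main__":
    x, d = "六角螺栓M10", "六角螺母M10"  # hex bolt M10 vs hex nut M10
    print(longest_common_prefix(x, d))   # 六角螺
    print(longest_common_suffix(x, d))   # M10
    print(initial_similarity(x, d))      # 0.75
```

For these two material names the shared prefix is 六角螺 and the shared suffix is M10, so both ends of the string carry category-relevant overlap that a plain character-overlap score does not single out.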

Abstract

The invention discloses a character string similarity calculation method and device, and a material classification method and device. The similarity calculation method includes: calculating the initial similarity between a character string X and a character string $d_i$; obtaining the longest common prefix and the longest common suffix of X and $d_i$; determining the weight of the longest common prefix and the weight of the longest common suffix; and calculating the similarity between X and $d_i$. Targeting the characteristics of Chinese material names, this technical scheme provides a method for calculating the similarity of Chinese character strings for material classification, namely the dynamic weight method (DynamicWeight), which dynamically estimates the weights of the prefix and suffix of material-name strings. This gives material names in the same category a higher similarity and improves the accuracy of automatic material classification.
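The abstract does not give the formula for the dynamic weights or for the final similarity, so the following Python sketch is only one hypothetical reading of it. `estimate_weights` (hypothetical) derives a prefix weight and a suffix weight for a category from how much of their names its members share at the front versus the back; `dynamic_weight_similarity` (hypothetical) blends the initial similarity with the weighted prefix and suffix overlap using an assumed mixing factor `alpha`. Neither rule is taken from the patent.

```python
from itertools import combinations


def _lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def _lcs_len(a: str, b: str) -> int:
    """Length of the longest common suffix of a and b."""
    return _lcp_len(a[::-1], b[::-1])


def estimate_weights(category: list[str]) -> tuple[float, float]:
    """HYPOTHETICAL rule: return (w_prefix, w_suffix) for one category.

    Sums, over all pairs of names in the category, the share of the longer
    name covered by the common prefix and by the common suffix, then
    normalizes the two totals so the weights sum to 1.
    """
    p_total = s_total = 0.0
    for a, b in combinations(category, 2):
        m = max(len(a), len(b))
        if m == 0:
            continue  # skip pairs of empty names
        p_total += _lcp_len(a, b) / m
        s_total += _lcs_len(a, b) / m
    total = p_total + s_total
    if total == 0:
        return 0.5, 0.5  # no shared structure: fall back to equal weights
    return p_total / total, s_total / total


def dynamic_weight_similarity(
    x: str, d: str, init_sim: float, w_prefix: float, w_suffix: float,
    alpha: float = 0.5,
) -> float:
    """HYPOTHETICAL combination of init_sim with weighted prefix/suffix overlap.

    With init_sim and alpha in [0, 1] and w_prefix + w_suffix == 1,
    the result also stays in [0, 1].
    """
    m = max(len(x), len(d))
    if m == 0:
        return 1.0
    structural = (w_prefix * _lcp_len(x, d) + w_suffix * _lcs_len(x, d)) / m
    return (1 - alpha) * init_sim + alpha * structural


if __name__ == "__main__":
    valves = ["不锈钢球阀", "碳钢球阀", "铸铁球阀"]  # ball valves of different metals
    w_p, w_s = estimate_weights(valves)
    print(w_p, w_s)  # 0.0 1.0: these names share suffixes, not prefixes
    print(dynamic_weight_similarity("铜球阀", "碳钢球阀", 0.2, w_p, w_s))
```

In this toy category every pair shares only a suffix (球阀, "ball valve"), so the sketch shifts all weight to the suffix; a category whose names share a leading term would shift weight toward the prefix instead.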

Description

Technical Field

[0001] The present invention relates to the fields of similarity calculation and material classification, and in particular to a string similarity calculation method and device based on dynamic weights of string prefixes and suffixes, and to a material classification method and device.

Background

[0002] Relatively mature automatic text classification techniques currently include neural networks (Neural Net, NNet), support vector machines (Support Vector Machine, SVM), naive Bayes (Naive Bayes, NB), k-nearest neighbors (k nearest neighbor, k-NN), and linear least squares fit (Linear Least Squares Fit, LLSF). Applying these methods to material classification requires solving the problem of calculating the similarity between materials. Unlike the setting of traditional automatic text classification, material names and descriptions in enterprises are often short, and the text similarity calculation ...
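Of the classifiers listed in [0002], k-NN shows most directly why a material-to-material similarity is needed: a new material name is labeled by a vote among its most similar labeled names. The Python sketch below is an illustration under assumptions: it plugs in a character-bigram Dice coefficient as the similarity (a common choice for short strings, not the method of the present invention), and the names `knn_classify` and `dice_bigram`, the category labels, and the sample data are hypothetical.

```python
from collections import Counter
from typing import Callable


def dice_bigram(a: str, b: str) -> float:
    """Dice coefficient over sets of adjacent character pairs (bigrams)."""
    ba = {a[i:i + 2] for i in range(len(a) - 1)}
    bb = {b[i:i + 2] for i in range(len(b) - 1)}
    if not ba and not bb:
        # Both strings have fewer than two characters: compare directly.
        return 1.0 if a == b else 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))


def knn_classify(
    x: str,
    training: list[tuple[str, str]],
    sim: Callable[[str, str], float] = dice_bigram,
    k: int = 3,
) -> str:
    """Label x by majority vote among its k most similar training names."""
    # sorted() is stable, so equally similar neighbors keep training order.
    ranked = sorted(training, key=lambda item: sim(x, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    # On a vote tie, most_common() keeps first-insertion order, which here
    # favors the label of the more similar neighbor.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training = [
        ("不锈钢球阀", "valve"),      # stainless steel ball valve
        ("铸铁闸阀", "valve"),        # cast iron gate valve
        ("碳钢球阀", "valve"),        # carbon steel ball valve
        ("六角螺栓M10", "fastener"),  # hex bolt M10
        ("六角螺母M12", "fastener"),  # hex nut M12
        ("平垫圈M10", "fastener"),    # flat washer M10
    ]
    print(knn_classify("铜球阀", training))       # valve
    print(knn_classify("六角螺栓M12", training))  # fastener
```

Because `sim` is a parameter, a different similarity (for example one that weights shared prefixes and suffixes, as the present invention proposes) can be substituted without changing the voting logic.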

Application Information

Patent type & authority: Applications (China)
IPC (8th edition): G06F17/30
Inventors: 韩建国 (Han Jianguo), 巩军 (Gong Jun)
Owner: CHINA ENERGY INVESTMENT CORP LTD