
A string processing method and device

A string processing technology, applied in the computer field, that addresses the low accuracy of string similarity and the resulting poor accuracy of identification, classification, and other subsequent processing, with the effect of improving accuracy.

Active Publication Date: 2019-12-24
ADVANCED NEW TECH CO LTD
6 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] The embodiments of the present application provide a character string processing method. The method addresses a problem in the prior art. Character strings such as lists and addresses are recognized by running an edit distance algorithm over character substrings made up of single characters. This yields string similarities of low accuracy, which in turn leads to poor accuracy in subsequent processing such as identification and classification.
[0006] The embodiments of the present application also provide a character string processing device, which addresses the same problem. To recognize character strings such as lists and addresses, the prior art runs an edit distance algorithm over character substrings made up of single characters. It therefore obtains string similarities of low accuracy and achieves poor accuracy in subsequent processing such as identification and classification.
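The sketch below makes the prior-art problem concrete. It implements the standard character-level Levenshtein distance in Python, with unit-cost insertion, deletion, and substitution, and normalizes the result by the length of the longer string. The normalization and the names `levenshtein` and `char_similarity` are illustrative assumptions, not the prior art's exact formulation. The two company names from Example 1 differ in only three characters, so they score as highly similar even though they denote different companies.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic character-level edit distance with unit costs."""
    prev = list(range(len(b) + 1))  # distance from empty prefix of a to b[:j]
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    """Assumed normalization into [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


s = "ABC Information Technology Co., Ltd."
t = "XYZ Information Technology Co., Ltd."
print(levenshtein(s, t), round(char_similarity(s, t), 3))  # → 3 0.917
```

Every character carries the same weight here. The three characters that actually identify the company therefore count no more than the generic suffix.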



Examples


Embodiment 1

[0034] Figure 1 shows the flow of the string processing method provided in Embodiment 1 of the present application. The method decomposes a string into character substrings that carry semantic weights, calculates the semantic edit distance between strings from those weights, and then computes their similarity. Comparing strings by semantic units in this way effectively improves the accuracy of string similarity and facilitates subsequent processing such as classification and recognition. The method includes the following steps:

[0035] S101: Obtain a character string to be recognized.

[0036] The acquired character string S to be recognized may include one or more of a company name, an address, a product name, a blacklist, a problem name, or a description entered by the user.

[0037] For example, users enter delivery addresses on service websites, service providers enter commodity names, and some users may need to set up blacklists. All of these data may have a s...

Example 1

[0064] Example 1: The acquired character string S to be recognized is "ABC Information Technology Co., Ltd.". Segmenting S yields the character substrings to be recognized, S = {ABC, information, technology, limited, company}, with i = 5. A target character string T is then looked up in the target string database according to these substrings. Assume that one of the target strings found is "XYZ Information Technology Co., Ltd.". Segmenting T yields the target character substrings T = {XYZ, information, technology, limited, company}, with j = 5. The weights of these character substrings in the semantic weight table $W_n$ are shown in Table 1 below:

[0065]

| Substring | Weight |
|---|---|
| ABC | 0.98 |
| XYZ | 0.99 |
| information | 0.02 |
| technology | 0.02 |
| limited | 0.01 |
| company | 0.01 |

[0066] Table 1
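The following Python sketch gives a rough illustration of how the semantic weights in Table 1 change the result. It computes a token-level edit distance in which each operation costs the semantic weight of the substrings involved, then normalizes that distance into a similarity. The cost model and the normalization are assumptions for illustration only, because the patent's actual formulas are elided from this excerpt. Under the assumed model, deleting or inserting a substring costs its weight, and substituting two different substrings costs the larger of their weights. The names `semantic_edit_distance` and `semantic_similarity` are hypothetical.

```python
from typing import Dict, Sequence

# Semantic weights copied from Table 1 (Example 1).
WEIGHTS: Dict[str, float] = {
    "ABC": 0.98, "XYZ": 0.99,
    "information": 0.02, "technology": 0.02,
    "limited": 0.01, "company": 0.01,
}


def semantic_edit_distance(s: Sequence[str], t: Sequence[str],
                           w: Dict[str, float]) -> float:
    """Weighted token-level edit distance (hypothetical cost model):
    delete s_i costs w[s_i], insert t_j costs w[t_j],
    substitute costs 0 if equal, else max(w[s_i], w[t_j]).
    """
    # prev[j] = cost of turning the empty prefix of s into t[:j] (insert all).
    prev = [0.0]
    for tok in t:
        prev.append(prev[-1] + w[tok])
    for a in s:
        curr = [prev[0] + w[a]]  # delete every s token seen so far
        for j, b in enumerate(t, start=1):
            sub = 0.0 if a == b else max(w[a], w[b])
            curr.append(min(prev[j] + w[a],       # delete a
                            curr[j - 1] + w[b],   # insert b
                            prev[j - 1] + sub))   # keep or substitute
        prev = curr
    return prev[-1]


def semantic_similarity(s: Sequence[str], t: Sequence[str],
                        w: Dict[str, float]) -> float:
    """Assumed normalization: 1 - distance / larger total weight, floored at 0."""
    total = max(sum(w[x] for x in s), sum(w[x] for x in t))
    if total == 0:
        return 1.0
    return max(0.0, 1.0 - semantic_edit_distance(s, t, w) / total)


s = ["ABC", "information", "technology", "limited", "company"]
t = ["XYZ", "information", "technology", "limited", "company"]
print(round(semantic_edit_distance(s, t, WEIGHTS), 2),
      round(semantic_similarity(s, t, WEIGHTS), 3))  # → 0.99 0.057
```

Under these assumptions, the differing high-weight substrings ABC and XYZ dominate the distance, so the two company names come out as dissimilar. A plain token edit distance with unit costs would instead report only 1 edit across 5 tokens.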

[0067] Then obtain the character substri...

Example 2

[0073] Example 2: The acquired character string S to be recognized is "ABC company". Segmenting S yields the character substrings to be recognized, S = {ABC, company}, with i = 2. A target character string T is then looked up in the target string database according to these substrings. Assume that one of the target strings found is "ABC Information Technology Co., Ltd.". Segmenting T yields the target character substrings T = {ABC, information, technology, limited, company}, with j = 5. The weights of these character substrings in the semantic weight table $W_n$ are shown in Table 3 below:

[0074]

| Substring | Weight |
|---|---|
| ABC | 0.98 |
| information | 0.02 |
| technology | 0.02 |
| limited | 0.01 |
| company | 0.01 |

[0075] Table 3
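A hand calculation for Example 2 illustrates the effect of the weights in Table 3. It relies on a hypothetical cost model that this excerpt does not state:

- inserting or deleting a substring costs its weight;
- substituting two different substrings costs the larger of their weights;
- matching substrings cost nothing;
- similarity is 1 minus the distance divided by the larger total weight.

The cheapest alignment matches ABC with ABC and company with company, then inserts the three low-weight substrings:

$$
d_{\mathrm{sem}}(S, T) = W_n(\text{information}) + W_n(\text{technology}) + W_n(\text{limited}) = 0.02 + 0.02 + 0.01 = 0.05
$$

$$
\mathrm{sim}(S, T) = 1 - \frac{d_{\mathrm{sem}}(S, T)}{\max\left(\sum_{s \in S} W_n(s),\ \sum_{t \in T} W_n(t)\right)} = 1 - \frac{0.05}{\max(0.99,\ 1.04)} \approx 0.952
$$

For comparison, an unweighted token-level edit distance needs 3 insertions across 5 tokens. That gives a similarity of only 1 - 3/5 = 0.4, even though both strings name the same company.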

[0076] Then obtain the character substring to be recognized with semantic weigh...



Abstract

The invention provides a string processing method comprising the following steps:

- acquiring a to-be-identified string;
- performing word segmentation on it to obtain to-be-identified sub-strings;
- determining the semantic weights of the to-be-identified sub-strings;
- searching for a target string according to the to-be-identified sub-strings;
- performing word segmentation on the target string to obtain target sub-strings;
- determining the semantic weights of the target sub-strings;
- determining the semantic edit distance between the to-be-identified string and the target string according to the semantic weights of the to-be-identified sub-strings and the target sub-strings;
- determining the similarity between the two strings according to the semantic edit distance.

Because the semantic edit distance is computed over sub-strings that carry semantic weights, the method greatly improves the accuracy of string similarity identification and solves the poor accuracy of string identification in the prior art. The invention also provides a string processing device.
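The abstract's search step, "searching for a target string according to the to-be-identified sub-strings", is not specified further in this excerpt. One plausible way to implement it, offered here only as an assumption, is an inverted index from sub-strings to target strings. Candidates are then ranked by the summed semantic weight of the sub-strings they share with the query. The functions `build_index` and `find_candidates`, the target IDs, and the scoring rule are all hypothetical, and the weights reuse Table 1.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set

# Semantic weights copied from Table 1; unknown substrings default to 0.0.
WEIGHTS: Dict[str, float] = {
    "ABC": 0.98, "XYZ": 0.99,
    "information": 0.02, "technology": 0.02,
    "limited": 0.01, "company": 0.01,
}


def build_index(targets: Dict[str, Sequence[str]]) -> Dict[str, Set[str]]:
    """Inverted index: target sub-string -> IDs of target strings containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for target_id, subs in targets.items():
        for sub in subs:
            index[sub].add(target_id)
    return index


def find_candidates(query: Sequence[str], index: Dict[str, Set[str]],
                    weights: Dict[str, float], top_k: int = 3) -> List[str]:
    """Rank targets by the summed semantic weight of the query sub-strings they share.

    Sharing a high-weight sub-string such as a brand name counts far more
    than sharing a generic one such as "company".
    """
    scores: Dict[str, float] = defaultdict(float)
    for sub in set(query):
        for target_id in index.get(sub, ()):
            scores[target_id] += weights.get(sub, 0.0)
    # Highest score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda tid: (-scores[tid], tid))[:top_k]


targets = {
    "t1": ["ABC", "information", "technology", "limited", "company"],
    "t2": ["XYZ", "information", "technology", "limited", "company"],
}
index = build_index(targets)
print(find_candidates(["ABC", "company"], index, WEIGHTS))  # → ['t1', 't2']
```

Ranking by shared weight keeps a generic sub-string like "company" from promoting every company name in the database. Only the top candidates would then go on to the semantic edit distance step.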

Description

Technical Field

[0001] The present application relates to the field of computer technology, and in particular to a string encoding processing method and device.

Background

[0002] The Internet's influence on daily life keeps growing, which has caused an explosion of Internet data. As a result, storing and identifying this data has become an increasingly important issue. Some application scenarios require identifying and classifying addresses, blacklists, problem names, and the like. This involves calculating the similarity of character strings in huge databases.

[0003] In the Internet field, service provider databases store huge amounts of commodity, service, and user data, including user addresses, company names, and product names. If such strings representing addresses and company names come directly from information filled in by users, the ... For example, if the full name of a company is Shanghai XXX ...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/9032; G06F17/27
Inventor: Wei Aiyong (魏爱勇)
Owner: ADVANCED NEW TECH CO LTD