Chinese character string similarity calculation method and device based on phonetic and morphological codes

A technology for calculating the similarity of Chinese characters, applied in database retrieval and database query. It addresses the problems of reduced practicability, weakened differences and impaired similarity accuracy, and achieves high conversion efficiency, accurate calculation and a more comprehensive effect.

Pending Publication Date: 2020-05-29
SHANDONG UNIV

AI Technical Summary

Benefits of technology

The method represents each Chinese character as a phonetic shape code, a numeric string that captures both its pronunciation (initial and final) and its form (four-corner code, structure code and number of strokes). Comparing these codes rather than the raw characters improves Chinese character matching precision. Using the edit distance between individual characters as the weight in the string-level edit distance allows the similarity of two strings to be calculated more accurately.

Problems solved by technology

Existing approaches to comparing Chinese character strings suffer from reduced practicability, weakened differences between characters and impaired similarity accuracy. The technical solution addresses these problems by converting both strings into phonetic shape codes and measuring how closely they match with an edit distance algorithm.



Examples


Embodiment 1

[0042] This embodiment discloses a method for calculating the similarity of Chinese character strings based on phonetic shape codes. A phonetic shape code consists of a phonetic code and a shape code. The phonetic code is composed of the digital codes of the initial and the final, and the shape code is composed of the four-corner code, the structure code and the number of strokes of the Chinese character. The mapping rules of the phonetic shape code are pre-stored in the database.
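The code layout described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the class name `PhoneticShapeCode`, the field widths, the `MAPPING` table and every digit in it are hypothetical placeholders, not the mapping rules stored in the patent's database, and the keys `"X"` and `"Y"` stand in for Chinese characters.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PhoneticShapeCode:
    """One character's code: phonetic part followed by shape part (hypothetical layout)."""
    initial: str      # digital code of the initial
    final: str        # digital code of the final
    four_corner: str  # four-corner code digits
    structure: str    # structure code digit
    strokes: str      # number of strokes, stored as text

    def phonetic(self) -> str:
        return self.initial + self.final

    def shape(self) -> str:
        return self.four_corner + self.structure + self.strokes

    def as_string(self) -> str:
        # Full numeric string used for comparison
        return self.phonetic() + self.shape()


# Placeholder for the pre-stored mapping rules.
# Keys are stand-in symbols and all digits are invented for illustration.
MAPPING = {
    "X": PhoneticShapeCode("1", "05", "1234", "2", "08"),
    "Y": PhoneticShapeCode("1", "05", "1235", "2", "09"),
}


def encode(text: str) -> List[PhoneticShapeCode]:
    """Convert each character to its code; raises KeyError for unmapped characters."""
    return [MAPPING[ch] for ch in text]
```

Keeping the phonetic and shape parts as separate fields makes it possible to compare pronunciation and form independently before combining them.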

[0043] As shown in Figure 1, the method includes the following steps:

[0044] Step 1: Receive two strings A and B to be compared;

[0045] Step 2: Read the phonetic shape code mapping rule from the database, and convert each Chinese character in the two character strings into phonetic shape code representation according to the mapping rule;

[0046] Step 3: Calculate the edit distance between each pair of corresponding substrings of the two strings using the edit distance algorithm;

[0047] Step 4: Calculate the similar...
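Steps 3 and 4 can be sketched roughly as follows. Because the text of Step 4 is truncated in the source, the similarity formula used here (one minus the edit distance divided by the longer length) is an assumption, as are the function names and the choice to compare the two strings character by character. The substitution cost between two characters is taken to be the normalised edit distance between their code strings.

```python
from typing import Callable, Optional, Sequence


def levenshtein(a: Sequence, b: Sequence,
                sub_cost: Optional[Callable] = None) -> float:
    """Edit distance with unit insert/delete cost and a pluggable substitution cost."""
    if sub_cost is None:
        sub_cost = lambda x, y: 0.0 if x == y else 1.0
    prev = [float(j) for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        cur = [float(i)]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1.0,                 # delete x
                cur[j - 1] + 1.0,              # insert y
                prev[j - 1] + sub_cost(x, y),  # substitute x with y
            ))
        prev = cur
    return prev[-1]


def char_cost(code_a: str, code_b: str) -> float:
    """Substitution cost in [0, 1]: normalised edit distance between two code strings."""
    longest = max(len(code_a), len(code_b))
    return levenshtein(code_a, code_b) / longest if longest else 0.0


def similarity(codes_a: Sequence[str], codes_b: Sequence[str]) -> float:
    """Assumed formula: 1 - weighted edit distance / length of the longer string."""
    longest = max(len(codes_a), len(codes_b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(codes_a, codes_b, char_cost) / longest


# Each list element is the code string of one character (placeholder digits).
a = ["1051234208", "2071111103"]
b = ["1051235209", "2071111103"]
score = similarity(a, b)
```

With a plain 0/1 substitution cost, two characters that sound and look alike would count as a full mismatch; the weighted cost lets near-identical characters contribute only a small distance.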

Embodiment 2

[0105] The purpose of this embodiment is to provide a computing device.

[0106] A computing device includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor. The memory pre-stores the mapping rule of the phonetic shape code, and the processor implements the following steps when the program is executed:

[0107] Receive two strings to be compared;

[0108] Read the phonetic shape code mapping rules and, according to the mapping rules, convert each Chinese character in the two character strings into its phonetic shape code representation; the phonetic shape code includes a phonetic code and a shape code, wherein the phonetic code is composed of the digital codes of the initial and the final, and the shape code is composed of the four-corner code, structure code and number of strokes of the Chinese character;

[0109] Calculate the edit distance between each pair of corresponding substrings of the two strings using the edit distance algorithm;

[0110] Calcul...

Embodiment 3

[0112] The purpose of this embodiment is to provide a computer-readable storage medium.

[0113] A computer-readable storage medium, which stores the mapping rule of the phonetic shape code and a computer program for calculating text similarity in advance, and when the program is executed by a processor, the following steps are performed:

[0114] Receive two strings to be compared;

[0115] Read the phonetic shape code mapping rules and, according to the mapping rules, convert each Chinese character in the two character strings into its phonetic shape code representation; the phonetic shape code includes a phonetic code and a shape code, wherein the phonetic code is composed of the digital codes of the initial and the final, and the shape code is composed of the four-corner code, structure code and number of strokes of the Chinese character;

[0116] Calculate the edit distance between each pair of corresponding substrings of the two strings using the edit distance algorithm;

[0117] Calculate the similarity of t...


Abstract

The phonetic-morphological codes comprise phonetic codes and morphological codes. The phonetic codes are composed of the digital codes of initials and finals, and the morphological codes are composed of the four-corner codes, structure codes and stroke numbers of Chinese characters. The mapping rule of the phonetic-morphological codes and the pronunciation similarity of some initials and finals are pre-stored. The method comprises the steps of: receiving two character strings to be compared; reading the mapping rule of the phonetic-morphological codes, and converting each Chinese character in the two character strings into its phonetic-morphological code representation according to the mapping rule; calculating, based on the initial/final pronunciation similarity, the edit distance between each pair of corresponding substrings of the two character strings; and calculating the similarity of the two character strings according to the edit distance. On the one hand, converting the character strings into phonetic-morphological code numeric strings for comparison improves Chinese character matching precision; on the other hand, the edit distance between Chinese characters is used as the weight in the edit distance, so that the similarity of the character strings can be calculated more accurately.
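The role of the pre-stored initial/final pronunciation similarity can be illustrated with a short sketch. Everything concrete in it is an assumption: the table names, the use of pinyin spellings instead of digital codes as keys, the similarity values, and the choice to average the initial and final similarities are all hypothetical, since the abstract does not give these details.

```python
from typing import Dict, FrozenSet

# Hypothetical pronunciation-similarity tables (values invented for illustration).
# Pairs not listed are treated as completely dissimilar.
SIMILAR_INITIALS: Dict[FrozenSet[str], float] = {
    frozenset(("z", "zh")): 0.8,
    frozenset(("c", "ch")): 0.8,
    frozenset(("s", "sh")): 0.8,
}
SIMILAR_FINALS: Dict[FrozenSet[str], float] = {
    frozenset(("an", "ang")): 0.8,
    frozenset(("en", "eng")): 0.8,
    frozenset(("in", "ing")): 0.8,
}


def unit_similarity(a: str, b: str, table: Dict[FrozenSet[str], float]) -> float:
    """Similarity in [0, 1] between two initials or two finals."""
    if a == b:
        return 1.0
    return table.get(frozenset((a, b)), 0.0)


def phonetic_cost(initial_a: str, final_a: str,
                  initial_b: str, final_b: str) -> float:
    """Phonetic substitution cost in [0, 1]; averaging the two parts is an assumption."""
    s_initial = unit_similarity(initial_a, initial_b, SIMILAR_INITIALS)
    s_final = unit_similarity(final_a, final_b, SIMILAR_FINALS)
    return 1.0 - (s_initial + s_final) / 2.0
```

A cost of this kind could replace an exact-match comparison of the phonetic code when the edit distance is computed, so that easily confused pronunciations are penalised less than unrelated ones.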


Application Information

Owner SHANDONG UNIV