Chinese text recognition method and device

A Chinese text recognition technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the low recognition rate for special words and yields more accurate recognition results and improved recognition efficiency.

Active Publication Date: 2018-09-14
CHINA MOBILE GRP GUANGDONG CO LTD +1
4 Cites, 8 Cited by
AI Technical Summary

Problems solved by technology

[0005] The present invention provides a Chinese text recognition method and device to overcome a defect of existing new-word recognition methods: they apply one uniform approach to all candidate vocabulary, so the recognition rate for special vocabulary is low.

Examples

Example 1

[0147] Original search record: Is it better to use a red string or a black string for an obsidian pendant?

[0148] Jieba Chinese word segmentation results [8]: obsidian, for pendants, red rope, good, or, black rope

[0149] Jieba word segmentation results based on dynamic thesaurus update: obsidian, pendant, red rope, okay, or, black rope

Example 2

[0151] Original search record: Liu Tao Domineering Wall Dong Yang Zi

[0152] Jieba Chinese word segmentation results: Liu Tao, Domineering, Bidong, Yang, Zi

[0153] Jieba word segmentation results based on dynamic thesaurus update: Liu Tao, Domineering, Bi Dong, Yang Zi

Example 3

[0155] Original search record: Yuecheng Hospital, Lancheng District

[0156] Jieba Chinese word segmentation results: blue, city, month, city, hospital

[0157] Jieba word segmentation results based on dynamic thesaurus update: Lancheng District, Yuecheng, Hospital
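The three examples share one pattern: a word missing from the lexicon is broken into fragments, and it survives as a single token once the lexicon has been updated. The following is a minimal sketch of that effect, not Jieba's algorithm and not the patented method. It uses forward maximum matching as a stand-in segmenter, and the Chinese strings are an assumed back-translation of Example 2 chosen for illustration; the function name `segment` is hypothetical.

```python
def segment(text, lexicon):
    """Forward maximum matching: at each position, take the longest lexicon word.

    Characters that start no lexicon word are emitted as single-character tokens.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Assumed illustrative data: the lexicon initially lacks the name "杨紫".
lexicon = {"刘涛", "霸气", "壁咚"}
query = "刘涛霸气壁咚杨紫"

before = segment(query, lexicon)   # the unregistered name is split per character
lexicon.add("杨紫")                 # dynamic lexicon update
after = segment(query, lexicon)    # the name is now kept whole

print(before)
print(after)
```

With the real Jieba library, the comparable step would be registering the new word in its user dictionary before segmenting again.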

[0158] In a second aspect, an embodiment of the present invention provides a Chinese text recognition device, as shown in Figure 7, including:

[0159] The keyword acquisition unit 201 is used to obtain the keywords reported by each terminal application program in the application program search, and store the keywords into the search corpus of the corresponding category according to the category attribute of the keywords;

[0160] The character string segmentation unit 202 is used to apply the corresponding preset algorithm to each search corpus, segmenting the stored keywords repeatedly until single character strings that cannot be segmented further are obtained;

[0161] The preliminary recognition un...
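The first two units can be pictured with a small sketch. It is only one possible reading of the text, under stated assumptions: the category names, the class `SearchCorpora`, and the function `candidate_strings` are hypothetical, and "segmenting repeatedly until single characters" is interpreted here as enumerating contiguous substrings from the longest window down to single characters. The source does not specify the preset algorithms.

```python
from collections import defaultdict


class SearchCorpora:
    """Stores reported keywords in one corpus per category (cf. unit 201)."""

    def __init__(self):
        self._corpora = defaultdict(list)

    def store(self, keyword, category):
        self._corpora[category].append(keyword)

    def keywords(self, category):
        return list(self._corpora.get(category, []))


def candidate_strings(keyword):
    """Assumed reading of unit 202: split with ever smaller windows.

    Returns contiguous substrings ordered from the longest window down to
    single characters, which cannot be segmented further.
    """
    result = []
    for size in range(len(keyword), 0, -1):
        for start in range(len(keyword) - size + 1):
            result.append(keyword[start:start + size])
    return result


corpora = SearchCorpora()
corpora.store("abc", "shopping")   # hypothetical category and keyword
corpora.store("xy", "video")

shopping_candidates = [
    piece
    for kw in corpora.keywords("shopping")
    for piece in candidate_strings(kw)
]
print(shopping_candidates)
```

Keeping one corpus per category is what lets a different segmentation routine be plugged in for each application type.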

Abstract

The invention relates to a Chinese text recognition method and device. The method includes: acquiring keywords reported by the application programs of terminals in PS domain signaling; classifying the keywords according to the type of application program; performing segmentation, preliminary recognition and probability screening on the keywords stored in the different search corpora, using a different preset algorithm for each; and adding the screened results to a preset word library. Compared with existing recognition methods, the method processes vocabulary according to the type of application program that reported the keywords, so it is more targeted, yields more accurate recognition results, and improves recognition efficiency.
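The abstract does not define "probability screening", so the following is a sketch under assumptions rather than the claimed procedure. It treats every two-character substring of the keywords in one corpus as a candidate, keeps candidates that are absent from the lexicon and appear in at least a given share of the keywords, and adds the survivors to the lexicon. The names `ngrams`, `screen_candidates`, `min_share`, and the sample data are all hypothetical.

```python
from collections import Counter


def ngrams(text, n):
    """All contiguous substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def screen_candidates(keywords, lexicon, n=2, min_share=0.5):
    """Keep n-grams not in the lexicon that occur in >= min_share of keywords."""
    doc_freq = Counter()
    for kw in keywords:
        # Count each candidate once per keyword (document frequency).
        doc_freq.update(set(ngrams(kw, n)))
    total = len(keywords)
    if total == 0:
        return set()
    return {
        gram
        for gram, count in doc_freq.items()
        if gram not in lexicon and count / total >= min_share
    }


# Assumed illustrative corpus for one application category.
keywords = ["杨紫新剧", "杨紫采访", "刘涛杨紫"]
lexicon = {"刘涛", "新剧", "采访"}

new_words = screen_candidates(keywords, lexicon, n=2, min_share=0.6)
lexicon |= new_words   # add the screened result to the word library

print(new_words)
```

A per-category threshold (`min_share` here) is one simple way different corpora could be screened differently.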

Description

Technical field

[0001] The embodiment of the present invention relates to the field of software technology, in particular to a Chinese text recognition method and device.

Background technique

[0002] With the advent of the Internet age, people rely more and more on search engines for information retrieval. However, the traditional mechanical word segmentation method performs poorly on ever-changing network words and emerging phrases. Chinese word segmentation is the basis of search engines and Chinese natural language processing, and the recognition of unregistered words is a major bottleneck in Chinese word segmentation. Here, unregistered words are words that have not been included in the word segmentation system.

[0003] For the identification of unregistered new words, the more commonly used methods are to obtain web page content, search logs or query logs, and identify new words based on rules or statistics based on the content o...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
CPC: G06F40/247, G06F40/289
Inventor: 徐志焕, 陈文鸿, 陈利青, 郑丽燕, 吴锐彬, 徐睿, 张晓川
Owner: CHINA MOBILE GRP GUANGDONG CO LTD