Information processing method and device

An information processing method and device, applied in the computer field, that address error-prone segmentation of text containing ambiguous words and improve accuracy by eliminating the ambiguous words.

Active Publication Date: 2014-02-19
RUN TECH CO LTD BEIJING

AI Technical Summary

Problems solved by technology

The main defect of existing dictionary-based word segmentation is that, when the text to be segmented contains ambiguous words, matching against the dictionary entries easily produces segmentation errors.



Examples


Example 1

[0058] Figure 1 is a flowchart of the information processing method provided by the first embodiment of the present invention, which specifically includes the following steps:

[0059] Step 101: determine whether the text to be processed contains ambiguous words.

[0060] For example, whether the text to be processed contains ambiguous words is determined according to a database of ambiguous words. The database can include the ambiguous words and the word segmentation rule corresponding to each ambiguous word. Text in English letters.

[0061] Step 102: when the text to be processed contains an ambiguous word, split the ambiguous word out of the text to be processed.

[0062] For example, when the ambiguous word is located in the middle of the text to be processed, the text can be split into three parts: the ambiguous word, the part to the left of the ambiguous word, and the part to the right of the ambiguous word. When t...
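The steps of this embodiment can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the database contents, the function names `find_ambiguous`, `split_out`, and `process`, and the choice to handle the leftmost match first are all assumptions made for the example. The final step, splitting the ambiguous word by its own rule, follows the abstract.

```python
# Hypothetical ambiguous-word database: each ambiguous string maps to the
# segmentation rule (a list of segments) assumed to be stored for it.
AMBIGUOUS_DB = {
    "研究生命": ["研究", "生命"],
}


def find_ambiguous(text, db):
    """Step 101: return (start, word) for the leftmost ambiguous word, or None."""
    best = None
    for word in db:
        pos = text.find(word)
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, word)
    return best


def split_out(text, start, word):
    """Step 102: split the text into (left part, ambiguous word, right part)."""
    end = start + len(word)
    return text[:start], word, text[end:]


def process(text, db=AMBIGUOUS_DB):
    """Return (left, segments of the ambiguous word, right), or None if no hit."""
    hit = find_ambiguous(text, db)
    if hit is None:
        return None
    left, word, right = split_out(text, *hit)
    # Split the ambiguous word according to its own rule.
    return left, list(db[word]), right
```

With the assumed entry above, `process("他研究生命起源")` separates the left part "他", the ambiguous word split as "研究" and "生命", and the right part "起源". The left and right parts are left unsegmented here.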

Example 2

[0069] On the basis of the above embodiment, this embodiment adds the steps shown in Figure 2.

[0070] Step 201: split the received information according to the character encoding, punctuation marks, and a name database to obtain the text to be processed.

[0071] For example, the received information may be Chinese alone, or Chinese combined with at least one of English, numbers, and punctuation marks. The text to be processed is the text split from the received information.

[0072] After the information is received, it can be split into Chinese clauses and/or English words and/or number strings according to the character encoding and punctuation marks. For example, if the received information is "hello Zhang San, where did Li Si go?", this step can split it into "hello", "Zhang San", and "where did Li Si go". Then, according to the name database, the names of persons in the split Chinese clauses are recognized. The recognit...
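The first part of Step 201, splitting by character class and punctuation, might look like the sketch below. It assumes the example message was originally written as "hello张三，李四去哪了？", treats the CJK Unified Ideographs block U+4E00 to U+9FFF as "Chinese", and uses a hypothetical function name `split_by_char_class`. The name-database step is not shown because its description is cut off in the text.

```python
import re

# One token per run of CJK ideographs, ASCII letters, or ASCII digits.
# Everything else (punctuation, whitespace) only acts as a separator.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z]+|[0-9]+")


def split_by_char_class(message):
    """Split a message into Chinese clauses, English words, and number strings."""
    return TOKEN_RE.findall(message)
```

Under these assumptions, `split_by_char_class("hello张三，李四去哪了？")` yields the three pieces named in the example: "hello", "张三", and "李四去哪了".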

Example 3

[0092] On the basis of the above embodiments, this embodiment adds the steps shown in Figure 4.

[0093] Step 401: combine the split results with the results obtained by splitting the split-out ambiguous words to obtain a word segment set, in which the word segments are arranged according to their positions in the text to be processed.

[0094] The split result obtained by the forward or reverse maximum matching algorithm is merged with the split result obtained in the first embodiment. One way of merging is to split the ambiguous-word part of the maximum-matching result according to the method provided in the first embodiment and keep the other parts unchanged.
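One possible reading of this merge is sketched below: the ambiguous span is segmented by its rule, and everything around it is segmented by forward maximum matching, so the combined list stays in text order. The dictionary, the ambiguous-word entry, the `max_len` limit, and the function names are assumptions for illustration. The text does not specify how the merge is actually carried out.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary entry, falling back to a single character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def segment_with_rules(text, dictionary, ambiguous_db):
    """Segment ambiguous spans by their rules and the rest by forward
    maximum matching, keeping segments in their original order."""
    hits = [(text.find(w), w) for w in ambiguous_db if w in text]
    if not hits:
        return forward_max_match(text, dictionary)
    pos, word = min(hits)  # leftmost ambiguous word
    left, right = text[:pos], text[pos + len(word):]
    return (forward_max_match(left, dictionary)
            + list(ambiguous_db[word])
            + segment_with_rules(right, dictionary, ambiguous_db))
```

With a hypothetical dictionary `{"研究", "研究生", "生命", "起源"}`, plain forward maximum matching segments "他研究生命起源" as 他 / 研究生 / 命 / 起源. Adding the assumed rule `{"研究生命": ["研究", "生命"]}` gives 他 / 研究 / 生命 / 起源 instead.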

[0095] Step 402: when the word segment set contains continuous words, determine, according to the low-probability word database, whether the continuous words contain low-probability words, and if so, synthesize the continuous words to the ...


Abstract

The invention discloses an information processing method and device. The method comprises the following steps: determining whether an ambiguous term exists in a text to be processed; when an ambiguous term exists in the text to be processed, splitting the ambiguous term out of the text; and splitting the split-out ambiguous term according to the term splitting rule corresponding to it. By checking the text for ambiguous terms and splitting each one according to its own rule, the method and device effectively eliminate the ambiguous terms in the text to be processed and improve the accuracy of information processing.

Description

Technical field

[0001] The present invention relates to computer technology, in particular to an information processing method and device.

Background technique

[0002] In information processing technology, Chinese word segmentation has a wide range of applications, such as search engines, full-text retrieval of documents, automatic classification of documents, etc.

[0003] Chinese word segmentation is the process of dividing Chinese sentences into Chinese word sets. Chinese sentences are composed of Chinese characters, but a single Chinese character basically does not have the function of expressing complete semantics. Therefore, to understand the semantics of Chinese sentences, it is first necessary to split the Chinese sentences composed of Chinese characters into Chinese word sets.

[0004] At present, Chinese word segmentation methods are mainly based on dictionary matching for word segmentation. This method is to match the text to be segmented with the entries in a ...


Application Information

IPC(8): G06F17/27, G06F17/30
Inventor: 贾高峰, 龙江群, 闫慧丽
Owner: RUN TECH CO LTD BEIJING