Text analysis method and text analyzer

A text analysis technology, applied in the fields of instruments, special data processing applications, electronic digital data processing, etc., that addresses the failure of existing methods to distinguish between entity words and non-entity words and the resulting low accuracy of text analysis for entity words, thereby improving that accuracy.

Active Publication Date: 2013-05-01
新浪技术(中国)有限公司

AI Technical Summary

Problems solved by technology

[0006] As can be seen from the above, existing text analysis methods apply a single unified strategy to the text without distinguishing between entity words and non-entity words, which makes the text analysis accuracy for entity words low.



Embodiment Construction

[0055] The technical solutions of the various embodiments of the present invention will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0056] The existing text analysis method adopts a unified strategy to analyze the text without distinguishing entity words from non-entity words; that is, both entity words and non-entity words undergo word segmentation and part-of-speech tagging at a small granularity. In practical applications, for entity words, the results of word segmentation and part-of-speech tagging cannot meet the application requirements, making the results of word segmentation and part-of-speech tag...


Abstract

The invention discloses a text analysis method and a text analyzer. The method comprises the following steps: splitting an acquired text into individual characters and tagging each character with features according to preset character features, so as to form a tagged character string; performing word segmentation on the tagged character string according to a pre-constructed word segmentation model, so as to obtain a word segmentation result containing a word sequence; merging the word sequence contained in the word segmentation result and tagging the merged words with features according to the preset character features, so as to obtain a tagged word string; performing part-of-speech tagging on the tagged word string according to a pre-constructed part-of-speech tagging model, so as to obtain a part-of-speech tagging result; and, if the part-of-speech tagging result is confirmed to contain part-of-speech tags of entity words, merging the entity words carrying those tags according to a same-and-adjacent rule, so as to obtain the text analysis result. Applying the text analysis method and the text analyzer can improve the text analysis accuracy for entity words.
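The first and last steps of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the character features (`DIGIT`, `LETTER`, `PUNCT`, `OTHER`), the entity tag set `ENTITY_TAGS`, and the function names are assumptions made for illustration, and the pre-constructed segmentation and part-of-speech models are not reproduced, so the tagged tokens are written by hand.

```python
from typing import List, Set, Tuple

# Hypothetical entity tags (person, place, organization); the source does not list its tag set.
ENTITY_TAGS: Set[str] = {"nr", "ns", "nt"}


def tag_characters(text: str) -> List[Tuple[str, str]]:
    """Split text into characters and attach an assumed coarse character-type feature."""
    tagged = []
    for ch in text:
        if ch.isspace():
            continue  # whitespace carries no feature in this sketch
        if ch.isdigit():
            feature = "DIGIT"
        elif ch.isascii() and ch.isalpha():
            feature = "LETTER"
        elif not ch.isalnum():
            feature = "PUNCT"
        else:
            feature = "OTHER"  # e.g. Chinese characters
        tagged.append((ch, feature))
    return tagged


def merge_adjacent_entities(
    tokens: List[Tuple[str, str]],
    entity_tags: Set[str] = ENTITY_TAGS,
) -> List[Tuple[str, str]]:
    """Merge neighbouring tokens that carry the same entity tag into one token."""
    merged: List[Tuple[str, str]] = []
    for word, tag in tokens:
        if merged and tag in entity_tags and merged[-1][1] == tag:
            prev_word, _ = merged[-1]
            merged[-1] = (prev_word + word, tag)  # same tag and adjacent: join
        else:
            merged.append((word, tag))
    return merged


# Hand-written tagging result standing in for the output of the tagging model.
tagged_tokens = [("新浪", "nt"), ("技术", "nt"), ("发布", "v"), ("公告", "n"), ("公告", "n")]
analysis_result = merge_adjacent_entities(tagged_tokens)
```

In this sketch only tokens whose tag is in the entity set are merged, so two adjacent ordinary nouns stay separate while the two adjacent organization tokens become a single coarse-grained entity word.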

Description

Technical field

[0001] The invention relates to data mining technology, in particular to a text analysis method and a text analyzer.

Background art

[0002] At present, in natural language processing, lexical analysis of text is the basis of other Chinese information processing. For example, widely used applications such as search engines, machine translation, speech synthesis, automatic classification, automatic summarization, and automatic proofreading all rely on lexical analysis techniques. Lexical analysis of a sentence or text has two purposes: word segmentation and part-of-speech tagging. Word segmentation divides a text sequence, in which words run together without delimiters, into words, converting the text sequence into a word sequence; part-of-speech tagging then builds on the segmentation and, according to the context of the sentence, labels each segmented word with its part of speech, for example verb, noun, adverb, or adjective. Among them, words are the smalle...
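The two purposes of lexical analysis described above can be sketched with a toy example. This is only an illustration: it substitutes dictionary-based forward maximum matching and a plain lexicon lookup for the statistical models and context-sensitive tagging the description refers to, and the lexicon `TOY_LEXICON`, its tags, and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon mapping words to part-of-speech tags.
TOY_LEXICON: Dict[str, str] = {
    "我": "r",        # pronoun
    "喜欢": "v",      # verb
    "自然": "n",      # noun
    "语言": "n",      # noun
    "自然语言": "n",  # noun (longer entry preferred by maximum matching)
    "处理": "v",      # verb
}


def segment(text: str, lexicon: Dict[str, str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest lexicon word at each position."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when no lexicon word matches.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def tag_words(words: List[str], lexicon: Dict[str, str], unknown: str = "x") -> List[Tuple[str, str]]:
    """Label each word by lexicon lookup; a real tagger would also use sentence context."""
    return [(word, lexicon.get(word, unknown)) for word in words]


word_sequence = segment("我喜欢自然语言处理", TOY_LEXICON)
tagged_sequence = tag_words(word_sequence, TOY_LEXICON)
```

Here `segment` turns the undelimited text sequence into a word sequence and `tag_words` attaches one label per word, which mirrors the two-stage structure of lexical analysis without claiming to match the accuracy or method of a trained model.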


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 戴明洋
Owner: 新浪技术(中国)有限公司