Text analysis method and device

A text analysis method and device, applied in the field of information retrieval, that addresses the lack of a suitable method for analyzing post quality and achieves accurate analysis of text quality.

Active Publication Date: 2014-04-02
TENCENT TECH (SHENZHEN) CO LTD
3 Cites, 19 Cited by

AI Technical Summary

Problems solved by technology

[0005] In order to solve the problem in the prior art that there is no suitable method to analyze the qual...

Examples

Embodiment 1

[0026] Figure 1 shows a flow chart of the text analysis method provided in Embodiment 1 of the present invention. This embodiment is described using the example of applying the text analysis method to quality analysis of forum posts. The text analysis method includes:

[0027] Step 101: obtain one or more kinds of feature information of the target text.

[0028] The target text can be a post in a forum. The feature information of the target text includes the number of words in the title, the ratio of the number of keywords in the title to the number of words in the title, the number of category interest words in the title, the number of hot words in the title, whether the title contains advertising words, the number of words in the body text, the ratio of the number of punctuation marks in the body text to the number of words in the body text, the ratio of the number of conjunctions in the body text to the number of sentences in the body text, the information entropy...
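
As a rough illustration of Step 101, the sketch below computes a handful of the listed features for a single post. It is not the patent's implementation: the function name `extract_features`, the whitespace tokenization (forum text in other languages would need a proper word segmenter), and the keyword and advertising word lists are all assumptions made for the example.

```python
import string


def extract_features(title, body, keywords, ad_words):
    """Compute a few of the listed features for one post (illustrative only)."""
    # Assumption: words are separated by whitespace.
    title_words = title.lower().split()
    body_words = body.split()
    n_title = len(title_words)
    n_body = len(body_words)

    # Count title words that appear in the (hypothetical) keyword list.
    n_keywords = sum(1 for w in title_words if w in keywords)
    # Count ASCII punctuation marks in the body text.
    n_punct = sum(1 for ch in body if ch in string.punctuation)

    return {
        "title_word_count": n_title,
        "title_keyword_ratio": n_keywords / n_title if n_title else 0.0,
        "title_has_ad_word": any(w in ad_words for w in title_words),
        "body_word_count": n_body,
        "body_punct_ratio": n_punct / n_body if n_body else 0.0,
    }


features = extract_features(
    title="Cheap camera lens review",
    body="I tested this lens for a week. It is sharp, light, and quiet.",
    keywords={"camera", "lens"},      # hypothetical keyword list
    ad_words={"cheap", "discount"},   # hypothetical advertising word list
)
```

Returning a dictionary keyed by feature name keeps each feature independent, which fits the idea that feature items can be added or dropped per application scenario.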

Embodiment 2

[0033] Figure 2 shows a flow chart of the text analysis method provided in Embodiment 2 of the present invention. This embodiment is described using the example of applying the text analysis method to quality analysis of forum posts. The text analysis method includes:

[0034] Step 201: obtain one or more kinds of feature information of the target text.

[0035] When the target text is a post in a forum, the feature information of the target text can include the number of words in the title, the ratio of the number of keywords in the title to the number of words in the title, the number of category interest words in the title, the number of hot words in the title, whether the title contains advertising words, the number of words in the body text, the ratio of the number of punctuation marks in the body text to the number of words in the body text, the ratio of the number of conjunctions in the body text to the number of sentences in the body text, the vocabulary information entropy of the body text, the number of independent parts of speech in the body text, informati...
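
The excerpt does not define how the vocabulary information entropy is computed. One plausible reading, shown here purely as an assumption, is the Shannon entropy (in bits) of the word frequency distribution of the body text. The function name `vocabulary_entropy` is hypothetical.

```python
import math
from collections import Counter


def vocabulary_entropy(words):
    """Shannon entropy, in bits, of the word frequency distribution.

    Assumption: this is one common definition, not necessarily the patent's.
    """
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


varied = vocabulary_entropy(["fast", "quiet", "sharp", "light"])
repetitive = vocabulary_entropy(["buy", "buy", "buy", "buy"])
```

Under this reading, a body text that repeats one word has entropy 0, while a body text with evenly spread vocabulary scores higher, so the feature can help separate keyword-stuffed posts from varied ones.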

Embodiment 3

[0107] Figure 3 shows a structural block diagram of the text analysis device provided by Embodiment 3 of the present invention. The text analysis device can be implemented as a forum server or as a unit in the forum server. The text analysis device includes an information acquisition module 320, a score calculation module 340 and a weight accumulation module 360.

[0108] The information acquisition module 320 is configured to obtain one or more kinds of feature information of the target text.

[0109] The score calculation module 340 is configured to calculate a quantized score for each kind of feature information of the target text acquired by the information acquisition module 320.

[0110] The weight accumulation module 360 is configured to multiply the quantized score of each kind of feature information of the target text, as calculated by the score calculation module 340, by its corresponding weight and then accumulate the products to obtain the total score of the...
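
The roles of the score calculation module and the weight accumulation module can be sketched as two small functions. This is a minimal sketch under stated assumptions: the excerpt does not say how a quantized score is derived, so `quantize` uses clamped linear scaling onto [0, 1] as a stand-in, and the feature names, ranges, and weights are invented for the example.

```python
def quantize(value, low, high):
    """Clamp and linearly scale a raw feature value onto [0, 1] (assumed scheme)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def total_score(scores, weights):
    """Multiply each quantized score by its weight and accumulate the products."""
    missing = set(scores) - set(weights)
    if missing:
        raise KeyError(f"no weight for features: {sorted(missing)}")
    return sum(scores[name] * weights[name] for name in scores)


# Hypothetical raw feature values, scaling ranges, and weights.
raw = {"title_word_count": 12, "body_punct_ratio": 0.1}
ranges = {"title_word_count": (0, 20), "body_punct_ratio": (0.0, 0.5)}
weights = {"title_word_count": 0.6, "body_punct_ratio": 0.4}

scores = {name: quantize(raw[name], *ranges[name]) for name in raw}
result = total_score(scores, weights)
```

Keeping the weights in a separate mapping means they can be tuned per forum or scenario without touching the scoring functions.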

Abstract

The invention discloses a text analysis method and a text analysis device, and belongs to the field of information retrieval. The method comprises: obtaining one or more kinds of feature information of a target text; calculating a quantized score for each kind of feature information of the target text; and multiplying the quantized score of each kind of feature information by its corresponding weight and accumulating the products to obtain the total score of the target text. Because the quantized score of each kind of feature information is calculated separately and the scores are accumulated according to their corresponding weights to obtain the final score, the feature weights can be adaptively corrected and the feature items adaptively extended for different application scenarios. This solves the problem that the prior art has no suitable method for analyzing the quality of topics posted in forums, and achieves accurate analysis of the text quality of topic-type texts in forums.
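
The accumulation described in the abstract can be written compactly. The symbols below are notation introduced here for illustration, not taken from the patent: $q_i$ is the quantized score of the $i$-th kind of feature information, $w_i$ its weight, and $n$ the number of feature kinds used.

```latex
S = \sum_{i=1}^{n} w_i \, q_i
```

In this form, correcting a weight changes a single $w_i$, and extending the feature items adds a term to the sum, leaving the other terms untouched.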

Description

Technical field

[0001] The invention relates to the field of information retrieval, in particular to a text analysis method and device.

Background

[0002] Text analysis is widely used in fields such as information retrieval, data mining, machine learning and statistics, and computational linguistics.

[0003] Existing text analysis methods mainly include language probability model analysis methods, PageRank (page rank) analysis methods and classification analysis methods. The language probability model analysis method mainly uses a corpus-based language model to analyze whether the sentences in a text were generated naturally rather than artificially manipulated, for example by piling up keywords to maliciously obtain a higher ranking; the PageRank analysis method mainly uses a web page's in-link and out-link information to calculate the effectiveness of the page, so as to achieve the ranking of the web page as a search resu...

Application Information

IPC(8): G06F17/27, G06F17/30
Inventor 翟俊杰, 姚从磊, 王亮, 温泉, 李亚楠
Owner TENCENT TECH (SHENZHEN) CO LTD