Text analysis method and device

A text analysis technique, applied in the field of information retrieval, that addresses the lack of a suitable method for analyzing forum post quality and enables accurate text quality analysis.

Active Publication Date: 2017-11-14
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0005] To solve the prior-art problem that no suitable method exists for analyzing the quality of forum posts, embodiments of the present invention provide a text analysis method and device.

Examples

Embodiment 1

[0041] Refer to Figure 1, which shows a flow chart of the text analysis method provided in Embodiment 1 of the present invention. This embodiment is described mainly using the example of applying the text analysis method to the quality analysis of forum posts. The text analysis method includes:

[0042] Step 101: obtain one or more items of feature information of the target text.

[0043] The target text can be a post in a forum. The feature information of the target text includes the number of words in the title, the ratio of the number of keywords in the title to the number of words in the title, the number of category interest words in the title, the number of hot words in the title, whether the title contains advertising words, the number of words in the body text, the ratio of the number of punctuation marks in the body text to the number of words in the body text, the ratio of the number of connecting words in the body text to the number of sentences in the body text, the information entropy...
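As an illustration of Step 101, the following is a minimal sketch that computes a handful of the title and body features listed above. The word lists, the function name `extract_features`, the feature keys, and the whitespace tokenization are all assumptions made for this example; the source does not specify them, and forum text in other languages would likely need a proper word segmenter.

```python
import string

# Hypothetical word lists; the source does not specify their contents.
KEYWORDS = {"camera", "lens", "review"}
HOT_WORDS = {"review"}
AD_WORDS = {"discount", "buy"}


def extract_features(title: str, body: str) -> dict:
    """Compute a few of the title and body features named in the text."""
    # Assumption: words are separated by whitespace.
    title_words = title.lower().split()
    body_words = body.split()
    n_title = len(title_words)
    n_body = len(body_words)
    # Count ASCII punctuation characters in the body.
    n_punct = sum(1 for ch in body if ch in string.punctuation)
    return {
        "title_word_count": n_title,
        "title_keyword_ratio": (
            sum(w in KEYWORDS for w in title_words) / n_title if n_title else 0.0
        ),
        "title_hot_word_count": sum(w in HOT_WORDS for w in title_words),
        "title_has_ad_word": any(w in AD_WORDS for w in title_words),
        "body_word_count": n_body,
        "body_punct_ratio": n_punct / n_body if n_body else 0.0,
    }


features = extract_features(
    "Camera lens review", "Sharp images, fast focus. Good value."
)
```

Guarding the two ratios against an empty title or body avoids a division by zero on posts that have no text.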

Embodiment 2

[0048] Refer to Figure 2, which shows a flow chart of the text analysis method provided in Embodiment 2 of the present invention. This embodiment is described mainly using the example of applying the text analysis method to the quality analysis of forum posts. The text analysis method includes:

[0049] Step 201: obtain one or more items of feature information of the target text.

[0050] When the target text is a post in a forum, the feature information of the target text can include the number of words in the title, the ratio of the number of keywords in the title to the number of words in the title, the number of category interest words in the title, the number of hot words in the title, whether the title contains advertising words, the number of words in the body text, the ratio of the number of punctuation marks in the body text to the number of words in the body text, the ratio of the number of connecting words in the body text to the number of sentences in the body text, the vocabulary information entropy of the body text, the number of distinct parts of speech in the body text, informati...
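One of the features named above is the vocabulary information entropy of the body text. The source does not give its formula in this excerpt, so the sketch below assumes the standard Shannon entropy of the word-frequency distribution, measured in bits. The function name `vocabulary_entropy` is hypothetical.

```python
import math
from collections import Counter


def vocabulary_entropy(words: list[str]) -> float:
    """Shannon entropy (in bits) of the word-frequency distribution.

    Assumption: this is what "vocabulary information entropy" refers to;
    the source text does not define the formula.
    """
    total = len(words)
    if total == 0:
        return 0.0
    counts = Counter(words)
    # H = -sum(p * log2(p)) over the relative frequency p of each distinct word.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


entropy = vocabulary_entropy("good post good post".split())
```

Under this definition, a body that repeats a single word has entropy 0, while a more varied vocabulary yields a higher value.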

Embodiment 3

[0123] Refer to Figure 3, which shows a structural block diagram of the text analysis device provided in Embodiment 3 of the present invention. The text analysis device can be implemented as a forum server or as a unit within a forum server. The text analysis device includes an information acquisition module 320, a score calculation module 340 and a weight accumulation module 360.

[0124] The information acquisition module 320 is configured to obtain one or more items of feature information of the target text.

[0125] The score calculation module 340 is configured to calculate a quantized score for each item of feature information of the target text obtained by the information acquisition module 320.

[0126] The weight accumulation module 360 is configured to multiply the quantized score of each item of feature information of the target text, as calculated by the score calculation module 340, by its corresponding weight and then accumulate the products to obtain the total score of the...
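The weight accumulation step described for module 360 amounts to a weighted sum of quantized scores. The sketch below illustrates only that step; the function name `total_score`, the feature names, and the numeric scores and weights are hypothetical values chosen for the example, not figures from the source.

```python
def total_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Multiply each quantized score by its weight and accumulate the products.

    Raises KeyError if a scored feature has no corresponding weight.
    """
    return sum(score * weights[name] for name, score in scores.items())


# Hypothetical quantized scores (as produced by a score calculation step)
# and hypothetical per-feature weights.
quantized = {"title_word_count": 0.5, "vocabulary_entropy": 1.0}
feature_weights = {"title_word_count": 0.5, "vocabulary_entropy": 0.25}

post_score = total_score(quantized, feature_weights)
```

Keeping scores and weights in dictionaries keyed by feature name makes it straightforward to add a feature or adjust a single weight without touching the accumulation logic.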

Abstract

The invention discloses a text analysis method and device, belonging to the field of information retrieval. The method includes: obtaining one or more items of feature information of the target text; calculating a quantized score for each item of feature information of the target text; and multiplying each quantized score by its corresponding weight and accumulating the products to obtain the total score of the target text. Because the invention calculates a quantized score for each item of feature information separately and accumulates the scores according to their corresponding weights to obtain the final score, feature weights can be corrected adaptively and the set of feature items can be extended adaptively for a given application scenario. This solves the prior-art problem that no suitable method exists for analyzing the quality of forum posts, and allows the text quality of forum posts to be analyzed accurately.

Description

Technical field

[0001] The invention relates to the field of information retrieval, and in particular to a text analysis method and device.

Background

[0002] Text analysis is widely used in fields such as information retrieval, data mining, machine learning and statistics, and computational linguistics.

[0003] Existing text analysis methods mainly include language probability model analysis, PageRank analysis and classification analysis. Language probability model analysis mainly uses a corpus-based language model to determine whether the sentences in a text were generated naturally rather than manipulated artificially, for example by piling up keywords to maliciously obtain a higher ranking. PageRank analysis mainly uses a web page's in-link and out-link information to calculate the effectiveness of the page, so as to rank the web page as a search resu...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27, G06F17/30
Inventor: 翟俊杰, 姚从磊, 王亮, 温泉, 李亚楠
Owner: TENCENT TECH (SHENZHEN) CO LTD