Method and device for screening text content

A text content technology, applied in the field of text content screening, that addresses the time-consuming and inconvenient manual screening of candidate text content, and achieves simpler operation, shorter screening time, and higher screening efficiency.

Active Publication Date: 2015-08-19
TENCENT TECH (SHENZHEN) CO LTD +1
9 Cites, 13 Cited by

AI Technical Summary

Problems solved by technology

[0006] This method has the following disadvantages: because there are many third-party novel websites, and their network speed, update speed, and content quality are uneven, readers must spend a lot of time searching and screening manually to find high-quality online novels, so reading efficiency is low.
[0009] This method has the following defects: it reduces the reader's search time, but the results still have to be screened manually after the search, which still takes a lot of time. In addition, for a novel with many chapters, the quality of individual chapters may differ from website to website, so a reader may have to switch between websites frequently while reading one book, which disrupts the continuity of the reading experience.
[0010] To sum up, when there are multiple candidate text contents, human verification is required to filter out the high-quality text content, which is inconvenient and inefficient.



Examples


Detailed Description of Embodiments

[0025] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the embodiments and accompanying drawings.

[0026] In the present invention, candidate text content is obtained automatically and compared for similarity with standard text content, and candidate text content with high similarity is determined to be high-quality text content; text content is thus screened automatically. Figure 1 is a schematic flowchart of a method for screening text content in the present invention, which includes the following steps:

[0027] Step 101: obtain at least two candidate text contents from different data sources, perform word segmentation on each candidate text content, and select a set number of words with the highest weights to form a text feature vector, referred to as the first text feature vector.
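
Step 101 can be illustrated with a minimal Python sketch. The text shown here does not say how words are segmented or weighted, so the sketch makes two assumptions: segmentation is approximated by splitting on runs of ASCII letters and digits (Chinese text would need a real word segmenter), and a word's weight is its relative frequency in the text. The function name `text_feature_vector` and the parameter `num_words` are hypothetical.

```python
import re
from collections import Counter


def text_feature_vector(text, num_words=5):
    """Return the num_words highest-weight words as a {word: weight} dict.

    Assumptions (not from the patent text): segmentation is a simple
    regex split, and weight is relative term frequency.
    """
    # Stand-in for word segmentation: lowercase runs of letters and digits.
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return {}
    counts = Counter(words)
    total = len(words)
    # Rank by descending count; break ties alphabetically so the
    # result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word: count / total for word, count in ranked[:num_words]}


# One vector per candidate text content obtained from a data source.
first_vector = text_feature_vector("the cat sat on the mat the cat", num_words=2)
```

A sparse dictionary keeps only the selected words, which matches the idea of retaining a set number of the highest-weight words instead of the full vocabulary.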

[0028] ...


Abstract

The invention discloses a method and a device for screening text content. The method comprises the following steps: obtaining at least two pieces of candidate text content from different data sources, performing word segmentation on each piece of candidate text content, and selecting a preset number of words with the highest weights to form a text feature vector, referred to as a first text feature vector; obtaining standard text content associated with the candidate text content, performing word segmentation on the standard text content, and selecting a preset number of words with the highest weights to form a text feature vector, referred to as a second text feature vector; and calculating the distance between the first text feature vector and the second text feature vector and judging whether the calculated distance value is greater than a set distance value. If it is, the corresponding candidate text content is retained as a recommendation source; otherwise, the corresponding candidate text content is removed, and the removed candidate text content is used as a candidate source. The scheme makes it possible to select high-quality text content automatically.
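
The comparison and thresholding described in the abstract can be sketched as follows. The abstract does not name the distance measure. Because a candidate is retained when the computed value is greater than the set value, the sketch assumes a similarity-style measure and uses cosine similarity as a stand-in. The function names, the example source names, and the set value of 0.8 are all hypothetical.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(weight * vec_b.get(word, 0.0) for word, weight in vec_a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in vec_a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector shares nothing with any other vector
    return dot / (norm_a * norm_b)


def screen_candidates(first_vectors, second_vector, set_value=0.8):
    """Split data sources into recommendation sources and candidate sources.

    first_vectors maps a data source name to its first text feature vector;
    second_vector is the feature vector of the standard text content.
    """
    recommendation_sources, candidate_sources = [], []
    for source, vector in first_vectors.items():
        value = cosine_similarity(vector, second_vector)
        if value > set_value:
            recommendation_sources.append(source)  # retained
        else:
            candidate_sources.append(source)  # removed from recommendations
    return recommendation_sources, candidate_sources


# Hypothetical vectors: site_a matches the standard text, site_b is mostly ads.
standard = {"dragon": 0.5, "sword": 0.5}
candidates = {
    "site_a": {"dragon": 0.5, "sword": 0.5},
    "site_b": {"advert": 0.9, "dragon": 0.1},
}
recommended, fallback = screen_candidates(candidates, standard)
```

With a true distance measure such as Euclidean distance, where smaller values mean closer texts, the comparison would have to be inverted; the abstract's wording is followed here as written.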

Description

Technical field

[0001] The invention relates to text information processing technology, and in particular to a method and device for screening text content.

Background

[0002] When target text content is queried over a network, there are often multiple candidate text contents, and the high-quality text content needs to be screened out. The text content is, for example, a novel text or an online text on a specified subject; online reading of a novel text is used below as an example for specific description.

[0003] Online literature first emerged on the Internet and is a popular form of text on PCs. With the rapid development of Internet literature, more and more Internet novel websites have emerged. The same novel often appears on many websites, but the quality of the novel's text, and the amount of extraneous content mixed into it, vary from website to website. In practical applications, it is often necessary to screen novel texts to find high-quality ones. ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
Inventor: 张红林
Owner: TENCENT TECH (SHENZHEN) CO LTD