Corpus filtering method and device

A corpus filtering method and device, applied in the field of Internet corpus technology, that address the untimely information processing, delay, and high labor cost caused by manual review, achieving fast and accurate corpus filtering and shortening the time for a corpus to reach users.

Active Publication Date: 2015-10-14
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0005] However, the above-mentioned prior art has the following disadvantages: newly generated information must be manually reviewed before it reaches the user, which introduces a delay and makes information processing untimely; and because every polysemous keyword requires manual review, the labor cost is high.


Embodiment Construction

[0031] The embodiments of the present invention provide a corpus filtering method and device. The whole process requires no manual review, corpus filtering is fast and accurate, and the time for a corpus to reach the user is shortened.

[0032] To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention, without creative effort, shall fall within the protection scope of the present invention.

[0033] The terms "first", "second" and the like (if any) in the description and claims o...

Abstract

The embodiment of the invention discloses a corpus filtering method and device. The method comprises the following steps: separately extracting and storing a plurality of keywords for each corpus to be processed; generating a corresponding keyword vector for each stored keyword; according to the keyword vector corresponding to each keyword, separately calculating, for each corpus to be processed, a keyword correlation value between that corpus and each keyword extracted from it; obtaining a target keyword and determining a corresponding target text corpus; and, according to the keyword correlation value of the target keyword for each corpus in the target text corpus, filtering out the corpora in the target text corpus whose keyword correlation value for the target keyword does not meet the set value requirement. The whole process requires no manual review, corpus filtering is fast and accurate, and the time for a corpus to reach the user is shortened.
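The five steps in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not say how keywords are extracted, how a keyword vector is built, or how the correlation value is computed. The sketch assumes frequency-based extraction, co-occurrence counts as keyword vectors, and cosine similarity as the correlation value. All function names, the `threshold` parameter, and the sample texts are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

STOPWORDS = frozenset({"the", "a", "of", "and", "to", "in", "is"})

def extract_keywords(text, top_k=3):
    """Assumed extractor: the top_k most frequent non-stopword tokens."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(top_k)]

def build_keyword_vectors(doc_keywords):
    """Assumed vector: how often each keyword co-occurs with other keywords."""
    vectors = defaultdict(Counter)
    for keywords in doc_keywords.values():
        for kw in keywords:
            for other in keywords:
                if other != kw:
                    vectors[kw][other] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def correlation(keywords, kw, vectors):
    """Assumed correlation: similarity of kw's vector to the corpus's other keywords."""
    context = Counter(k for k in keywords if k != kw)
    return cosine(vectors[kw], context)

def filter_corpus(corpora, target, threshold=0.5):
    # Step 1: extract and store keywords for each corpus to be processed.
    doc_keywords = {doc_id: extract_keywords(text) for doc_id, text in corpora.items()}
    # Step 2: generate a keyword vector for each stored keyword.
    vectors = build_keyword_vectors(doc_keywords)
    # Step 3: correlation value between each corpus and each of its keywords.
    scores = {
        doc_id: {kw: correlation(kws, kw, vectors) for kw in kws}
        for doc_id, kws in doc_keywords.items()
    }
    # Step 4: the target text corpus is every corpus that carries the target keyword.
    target_ids = [doc_id for doc_id, kws in doc_keywords.items() if target in kws]
    # Step 5: drop corpora whose correlation value misses the (hypothetical) threshold.
    return [doc_id for doc_id in target_ids if scores[doc_id][target] >= threshold]

corpora = {
    "d1": "lenovo laptop pc",
    "d2": "lenovo pc notebook",
    "d3": "lenovo laptop notebook",
    "d4": "lenovo imagination poetry",
}
print(filter_corpus(corpora, "lenovo"))
```

With these sample texts, the three corpora that use "lenovo" in its majority sense share co-occurring keywords and score about 0.76, while `d4` scores about 0.38 and is dropped at the assumed threshold of 0.5. No manual review is involved; the dominant sense emerges from the co-occurrence statistics.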

Description

Technical field

[0001] The invention relates to the technical field of Internet corpora, in particular to a corpus filtering method and device.

Background technique

[0002] With the explosive growth of Internet information, all users face the problem of information overload. To help users absorb information better and eliminate interference, information applications (apps) based on keyword subscription have emerged.

[0003] Because many Chinese words have multiple meanings, subscribing to information by keyword causes some problems. For example, the word "article" (文章) can refer to "a kind of content carrier" or to Wen Zhang, a mainland Chinese male entertainer whose name is written the same way; the word "Lenovo" (联想) can refer to "a verb that expresses imagination" or to "Lenovo, a domestic PC giant". This kind of polysemy is very common, so it is important to pick out the articles that correspond to the mainstream meaning of a word. For example, users ...

Application Information

IPC(8): G06F17/30
Inventor: 蔡兵
Owner: TENCENT TECH (SHENZHEN) CO LTD