Method and device for tagging crawler seed

A tagging technology for crawler seeds, applied in the field of data processing, that addresses the lack of a uniform standard for labeling crawler seeds and the resulting inconsistent tags, and achieves accurate tags with high efficiency.

Active Publication Date: 2018-05-22
BEIJING GRIDSUM TECH CO LTD

AI Technical Summary

Problems solved by technology

Different people may understand the same crawler seed differently and therefore label it inconsistently; in other words, there is no uniform standard for labeling crawler seeds.



Embodiment Construction

[0060] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0061] The following is an introduction to the specific content of the embodiments.

[0062] An embodiment of the present invention provides a method for tagging crawler seeds. Figure 1 is a flowchart of the method for tagging crawler seeds provided by the present invention. The method specifically includes:

[0063] S101: Pre-establish an association relationship between tags and keyword arra...


Abstract

The invention discloses a method and device for tagging a crawler seed. The method includes: establishing an association between tags and keyword arrays in advance; when any crawler seed to be tagged is received, using the crawler seed to crawl webpage content; extracting keywords from the webpage content and clustering the keywords to obtain the word frequency of each keyword; sorting the keywords by word frequency and generating a position identifier for each keyword; matching the keywords carrying position identifiers against each keyword array in the association between tags and keyword arrays; and finally using the tag whose keyword array has the highest matching degree as the tag of the crawler seed. Compared with conventional manual tagging, the method and device tag crawler seeds automatically, with high efficiency and accurate tags.
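The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the tag table `TAG_KEYWORDS`, the stopword list, and the reciprocal-position scoring in `match_score` are all hypothetical, since the text here does not say how the matching degree is computed. Crawling is left out; the sketch takes already-fetched page text as input.

```python
import re
from collections import Counter

# Hypothetical association between tags and keyword arrays (built in advance).
TAG_KEYWORDS = {
    "literature": ["novel", "poetry", "author", "prose"],
    "sports": ["football", "match", "league", "score"],
}

# Hypothetical stopword list used only to keep the example readable.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "for"}


def extract_keywords(text):
    """Lowercase the text, split it into alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def rank_keywords(keywords):
    """Map each keyword to a position identifier; 1 is the most frequent.

    Ties in frequency are broken alphabetically so the result is deterministic.
    """
    counts = Counter(keywords)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: pos for pos, (word, _) in enumerate(ordered, start=1)}


def match_score(positions, keyword_array):
    """Assumed matching degree: sum of 1/position over array keywords found."""
    return sum(1.0 / positions[k] for k in keyword_array if k in positions)


def tag_seed(page_text, tag_keywords=TAG_KEYWORDS):
    """Return the tag whose keyword array matches best, or None if none match."""
    positions = rank_keywords(extract_keywords(page_text))
    scores = {tag: match_score(positions, arr) for tag, arr in tag_keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


page = ("The author wrote a novel. The novel mixes poetry and prose. "
        "A football match was mentioned.")
print(tag_seed(page))
```

Weighting each match by the reciprocal of its position lets high-frequency keywords count for more than rare ones, which is one plausible way to use the position identifier; a real system could just as well use a different weighting.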

Description

Technical field

[0001] The invention relates to the field of data processing, in particular to a method and device for tagging crawler seeds.

Background technique

[0002] A web crawler is a program or script that automatically fetches information from the World Wide Web according to certain rules. A crawler seed is the entry URL from which the web crawler starts fetching website content.

[0003] At present, there is a large amount of information on the Internet that we can acquire and learn from. Because website content captured automatically by web crawlers has not been manually reviewed and classified, it is difficult to determine which knowledge field the captured information belongs to, so its practical value is very low.

[0004] Although there are trillions of websites on the Internet, each website has its own knowledge characteristics. For example, if there are literature exchange websi...


Application Information

IPC(8): G06F17/30
CPC: G06F16/9558
Inventor: 贺达, 曹志明, 陈晓敏
Owner BEIJING GRIDSUM TECH CO LTD