Label extracting method and device, apparatus and medium

A tag extraction technology, applied in the Internet field, that addresses the inability to update professional dictionaries in time and, as a result, to extract tags for emerging hot topics and popular words.

Active Publication Date: 2018-03-30
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
7 Cites, 5 Cited by

AI Technical Summary

Problems solved by technology

Because existing technology cannot update professional dictionaries frequently and promptly, it cannot extract tags for emerging hot topics and popular words.



Examples


Embodiment 1

[0062] Figure 1 is a flow chart of a tag extraction method provided in Embodiment 1 of the present invention. This embodiment is applicable to extracting tags for newly emerging hot topics and hot words. The method can be executed by a tag extraction device, which can be implemented in software and/or hardware. Referring to Figure 1, the tag extraction method provided by this embodiment includes:

[0063] S110. Segment the text data to obtain a plurality of content words, and determine candidate tag words according to the content words.

[0064] Here, the text data is the text content from which tags are to be extracted; it may be web page text, operation log text, database text, and the like. Content words are a part-of-speech category in Chinese: words that carry practical meaning and can serve as sentence components on their own, that is, words with both lexical and grammatical meaning. Gene...
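
Step S110 can be illustrated with a minimal sketch. It assumes the text has already been segmented and part-of-speech tagged by some external tool; the tag names, the `CONTENT_POS` set and the `content_words` helper are hypothetical and are not taken from the patent, which does not specify which parts of speech it counts as content words in the excerpt shown here.

```python
from typing import Iterable, List, Tuple

# Assumption: which part-of-speech tags count as content words.
# The tag names are hypothetical placeholders, not the patent's list.
CONTENT_POS = {"noun", "verb", "adjective"}


def content_words(tagged_tokens: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep only the tokens whose part-of-speech tag is in CONTENT_POS."""
    return [word for word, pos in tagged_tokens if pos in CONTENT_POS]


# Toy input standing in for the output of a word segmenter and POS tagger.
tagged = [
    ("the", "article"),
    ("team", "noun"),
    ("wins", "verb"),
    ("of", "preposition"),
    ("final", "noun"),
]
words = content_words(tagged)
```

Function words such as articles and prepositions are dropped, so only words that can carry a topic remain as input for candidate selection.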

Embodiment 2

[0090] Figure 2 is a flow chart of a tag extraction method provided in Embodiment 2 of the present invention. This embodiment is an optional solution based on Embodiment 1 above. Referring to Figure 2, the tag extraction method provided in this embodiment includes:

[0091] S210. Segment the text data to obtain a plurality of content words, and determine candidate tag words according to the content words.

[0092] Specifically, determining candidate tag words according to the content words may include:

[0093] using a preset model to determine the semantic vector of each content word;

[0094] determining the semantic distance between the content words according to the semantic vectors;

[0095] for each content word, taking the current content word as the neighborhood center and, according to the semantic distance, determining the current neighborhood with the set radius value as its radius;

[0096] If the number of content words in the current neig...
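
The neighborhood steps in [0093] to [0095] can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the toy two-dimensional vectors stand in for the output of the preset model, Euclidean distance stands in for the semantic distance, and because the condition in [0096] is truncated in this excerpt, the `MIN_NEIGHBORS` rule used to pick candidates is purely hypothetical.

```python
import math
from typing import Dict, List, Sequence


def neighborhood(center: str,
                 vectors: Dict[str, Sequence[float]],
                 radius: float) -> List[str]:
    """Return all words whose vector lies within `radius` of the center word.

    The center word itself is included (its distance is 0).
    Assumption: Euclidean distance is used as the semantic distance.
    """
    c = vectors[center]
    return [w for w, v in vectors.items() if math.dist(c, v) <= radius]


# Toy semantic vectors; a real system would obtain these from a trained model.
vectors = {
    "football": (0.0, 0.0),
    "soccer": (0.1, 0.0),
    "goal": (0.0, 0.2),
    "tax": (5.0, 5.0),
}

RADIUS = 0.5        # the "set radius value"
MIN_NEIGHBORS = 3   # hypothetical threshold; the source text is truncated here

# Hypothetical selection rule: keep words whose neighborhood is dense enough.
candidates = [
    w for w in vectors
    if len(neighborhood(w, vectors, RADIUS)) >= MIN_NEIGHBORS
]
```

In this toy data the three sports words fall inside each other's neighborhoods, while "tax" is isolated and its neighborhood contains only itself.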

Embodiment 3

[0120] Figure 4 is a schematic structural diagram of a tag extraction device provided in Embodiment 3 of the present invention. Referring to Figure 4, the tag extraction device provided in this embodiment includes: a candidate tag word module 10, a popularity value determination module 20 and a tag extraction module 30.

[0121] The candidate tag word module 10 is used to segment the text data to obtain a plurality of content words, and to determine candidate tag words according to the content words;

[0122] The popularity value determination module 20 is used to take each candidate tag word in turn as the current candidate tag word, and to determine the popularity value of the current candidate tag word at the current moment according to the popularity trend of the current candidate tag word in the text data;

[0123] The tag extraction module 30 is used to judge whether the popularity value satisfies the set tag word condition, and if so, use the current candi...
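
The roles of modules 20 and 30 can be sketched as two small functions. The excerpt does not give the formula for the popularity value or the tag word condition, so both are assumptions here: `popularity_value` compares the count in the latest time window with the mean of earlier windows, and `extract_tags` applies a simple threshold. All names and numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def popularity_value(counts: Sequence[int]) -> float:
    """Assumed heuristic, not the patent's formula.

    `counts` holds occurrences of one candidate word per time window,
    oldest first. The value is the latest count divided by
    (mean of earlier counts + 1), so sudden growth scores high.
    """
    if not counts:
        raise ValueError("counts must contain at least one time window")
    *history, current = counts
    baseline = sum(history) / len(history) if history else 0.0
    return current / (baseline + 1.0)


def extract_tags(trends: Dict[str, Sequence[int]], threshold: float) -> List[str]:
    """Keep candidates whose popularity value meets the assumed threshold."""
    return [w for w, counts in trends.items()
            if popularity_value(counts) >= threshold]


# Toy per-window counts for two candidate tag words.
trends = {
    "new_game": [0, 1, 2, 30],    # rising sharply
    "weather": [10, 11, 9, 10],   # steady
}
tags = extract_tags(trends, threshold=3.0)
```

A steady word such as "weather" scores below 1 under this heuristic, while a word that jumps in the latest window scores well above the threshold and becomes a tag.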


Abstract

The invention discloses a label extraction method, device, apparatus and medium, and relates to the technical field of the Internet. The method includes: segmenting text data to obtain a plurality of content words, and determining candidate label words according to the content words; taking each candidate label word in turn as the current candidate label word, and determining the popularity value of the current candidate label word at the current moment according to the popularity trend of the current candidate label word in the text data; and determining whether the popularity value satisfies a set label word condition and, if so, using the current candidate label word as a label word. The method, device, apparatus and medium make it possible to extract labels for emerging hot topics and hot words.

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of the Internet, and in particular to a label extraction method, device, equipment and medium.

Background art

[0002] Tags, as a characteristic of content, play a vital role in content understanding and recommendation systems.

[0003] At present, the industry commonly extracts tags from professional documents by using professional dictionaries. For example, the text data of a travel-related web page is segmented into a plurality of words; if any of these words is a keyword pre-stored in the travel dictionary and the frequency of occurrence of that keyword is greater than a set threshold, the keyword is used as a label for the text content of the web page.

[0004] However, with the explosive growth of Internet data, new hot topics and hot words often appear. Because the existing technology cannot update prof...
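
The dictionary-based approach described in [0003] can be sketched in a few lines. The function name `dictionary_tags`, the toy travel dictionary and the threshold value are all hypothetical and only illustrate the general idea.

```python
from collections import Counter
from typing import Iterable, List, Set


def dictionary_tags(words: Iterable[str],
                    dictionary: Set[str],
                    min_count: int) -> List[str]:
    """Return dictionary keywords that occur more than `min_count` times."""
    counts = Counter(w for w in words if w in dictionary)
    return sorted(w for w, n in counts.items() if n > min_count)


# Toy segmented page text and a toy travel dictionary.
page_words = ["hotel", "beach", "hotel", "hotel",
              "viral_meme", "viral_meme", "viral_meme", "beach"]
travel_dictionary = {"hotel", "beach", "flight"}

baseline_tags = dictionary_tags(page_words, travel_dictionary, min_count=2)
```

The word "viral_meme" appears as often as "hotel" but can never become a tag because it is not in the dictionary, which is the limitation the background section points out.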


Application Information

IPC(8): G06F17/27, G06F17/30
CPC: G06F16/9535, G06F16/9577, G06F40/258, G06F40/284, G06F40/289
Inventor: 孙健
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD