Text label extraction method and device, equipment and storage medium

A text label extraction method in the field of computer technology. It addresses problems such as inefficient text labeling, insufficient personalization, and poor scalability. Its effects are more efficient label extraction, more personalized and comprehensive labels, and improved accuracy.

Pending Publication Date: 2021-04-23
BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD +1
Cites: 0; Cited by: 7

AI Technical Summary

Problems solved by technology

[0004] In the course of realizing the present invention, the inventors found at least the following problems in the prior art. (1) Manually filled-in information contains errors, omissions, and inaccurate expressions, and manually filled-in labels are fairly standardized and not personalized enough. (2) Automatically extracting labels from structured information can correct manual errors to some extent and makes filling in the specified attributes more efficient, but the labels still lack personalization. (3) Automatically extracting labels from unstructured information, such as item introduction detail images, can partly solve the lack of personalization in text labels. However, it requires manual data annotation, so text label extraction is inefficient and scales poorly.

Examples

Embodiment 1

[0039] The text label extraction method of this embodiment applies to extracting labels from multiple texts on the same topic and is especially suited to extracting text labels on an e-commerce platform. A text label extraction device can execute the method. The device can be implemented in software and/or hardware and integrated into equipment such as a personal computer or a server. Referring to Figure 1a, the method of this embodiment includes the following steps:

[0040] S110. Obtain each text from which labels are to be extracted, and vectorize each text to obtain its corresponding text vector.

[0041] Here, a label is a characteristic description of some aspect of the text. For example, it may be a keyword expressing the focus of the text. On an e-commerce platform, a label can describe specific attributes of an item, such as the item's specification attributes, the extended a...
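The excerpt does not say which vectorization model S110 uses. The following minimal sketch assumes each text has already been segmented into words and turns each text into a TF-IDF vector over a shared vocabulary. TF-IDF is a stand-in chosen for illustration, and `tfidf_vectors` is a hypothetical name.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_texts):
    """Map each tokenized text to a TF-IDF vector over a shared, sorted vocabulary.

    Hypothetical helper: the patent excerpt does not name its vectorization
    method, so plain TF-IDF is assumed here.
    """
    vocab = sorted({w for toks in tokenized_texts for w in toks})
    n_docs = len(tokenized_texts)
    # Document frequency: the number of texts in which each word occurs.
    df = Counter(w for toks in tokenized_texts for w in set(toks))
    # Smoothed IDF, so words present in every text still receive a nonzero weight.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1.0 for w in vocab}
    vectors = []
    for toks in tokenized_texts:
        tf = Counter(toks)
        total = len(toks) or 1  # guard against empty texts
        vectors.append([tf[w] / total * idf[w] for w in vocab])
    return vocab, vectors


texts = [
    ["small", "footprint", "quiet"],
    ["high", "suction", "power", "quiet"],
]
vocab, vecs = tfidf_vectors(texts)
print(vocab)
# → ['footprint', 'high', 'power', 'quiet', 'small', 'suction']
```

In this sketch, a word shared by every text (here "quiet") is weighted lower than a word specific to one text, such as "small".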

Embodiment 2

[0065] Building on Embodiment 1, this embodiment further refines the step "obtaining each text from which labels are to be extracted". It can also further refine the step "determining the text label of each text according to the label candidate words of each text clustering result". Terms that are the same as, or correspond to, those in the above embodiments are not explained again here.
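The excerpt does not show how a text's labels are chosen from its cluster's candidate words. One plausible rule is sketched below: a text receives those candidate words of its own cluster that it actually contains. This rule, the function `assign_labels`, and its parameters are assumptions for illustration, not the patent's disclosed criterion.

```python
def assign_labels(tokenized_texts, cluster_ids, candidates_by_cluster):
    """Give each text the candidate words of its own cluster that it contains.

    Assumed selection rule for illustration only; the patent's exact
    criterion is not shown in this excerpt.
    """
    labels = []
    for toks, cid in zip(tokenized_texts, cluster_ids):
        present = set(toks)
        # Keep the cluster's candidate order so higher-ranked candidates come first.
        labels.append([w for w in candidates_by_cluster.get(cid, []) if w in present])
    return labels


texts = [["small", "footprint", "quiet"], ["high", "suction", "power"]]
clusters = [0, 1]
candidates = {0: ["quiet", "small", "portable"], 1: ["suction", "power"]}
result = assign_labels(texts, clusters, candidates)
print(result)
# → [['quiet', 'small'], ['suction', 'power']]
```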

[0066] To simplify the description that follows, the application scenario of this embodiment is extracting text labels from item introduction detail images on an e-commerce platform. In addition to structured item specification attributes and item extension attributes, an item introduction detail image contains a large amount of unstructured item description information, which includes many personalized labels for the item. These personalized labels can be used as corrections and supplem...

Embodiment 3

[0091] Building on the above embodiments, this embodiment describes the steps for automatically labeling a text to be labeled. Terms that are the same as, or correspond to, those in the above embodiments are not explained again here.

[0092] S310. Obtain the text to be labeled, perform word segmentation and stop-word removal on it, and obtain its word segmentation result.
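Step S310 can be sketched as follows. This is a minimal sketch that assumes space-delimited text: regex splitting stands in for a real word segmenter, which Chinese text would require. The stop-word list and `segment_and_filter` are hypothetical.

```python
import re

# Hypothetical stop-word list; a real system would load a curated list.
STOP_WORDS = {"the", "a", "an", "and", "with", "for", "of", "is"}


def segment_and_filter(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words.

    Regex splitting is a stand-in for a proper word segmenter; Chinese text
    would need a dedicated segmentation tool instead.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stop_words]


words = segment_and_filter("A quiet vacuum with high suction power")
print(words)
# → ['quiet', 'vacuum', 'high', 'suction', 'power']
```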

[0093] The text to be labeled is text that needs automatic labeling. It can be ordinary text, or text obtained from an item introduction detail image by the method of S210-S220 in Embodiment 2. Automatic labeling relies on text labels already extracted from a large number of texts. The text to be labeled should therefore belong to the same topic as the texts from which those labels were extracted. After the text to be labeled is obtained,...
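The labeling step can be illustrated by matching the segmentation result against the set of previously extracted labels. The matching rule is assumed here because the excerpt is truncated before it is described: labels are stored as space-separated phrases and match only as exact contiguous token runs. `tag_text` is a hypothetical name.

```python
def tag_text(tokens, label_set):
    """Return the known labels that occur as contiguous token runs in `tokens`.

    Assumption: labels are space-separated phrases matched exactly; the
    patent excerpt does not state its matching rule.
    """
    found = []
    for label in label_set:
        parts = label.split()
        n = len(parts)
        # Slide a window of the label's length across the token list.
        if any(tokens[i:i + n] == parts for i in range(len(tokens) - n + 1)):
            found.append(label)
    return sorted(found)


labels = {"small footprint", "high suction power", "quiet"}
tokens = ["quiet", "vacuum", "high", "suction", "power"]
tags = tag_text(tokens, labels)
print(tags)
# → ['high suction power', 'quiet']
```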

Abstract

The embodiments of the invention disclose a text label extraction method and device, equipment, and a storage medium. The method comprises: obtaining each text from which labels are to be extracted and vectorizing each text to obtain its corresponding text vector; clustering the text vectors to obtain at least one text clustering result; performing keyword extraction on each text clustering result to obtain the label candidate words of that clustering result; and determining the text label of each text according to the label candidate words of each text clustering result. This technical solution extracts text labels automatically. It also improves the accuracy and comprehensiveness of text label extraction and the scalability of the label extraction method.
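The middle two steps of the abstract are clustering the text vectors and extracting label candidate words per cluster. They can be sketched as below under stated assumptions, and this is not the patent's disclosed algorithm. The sketch uses plain k-means with the first k vectors as initial centroids, and it scores keywords by each word's summed vector weight across a cluster's texts. `kmeans` and `cluster_keywords` are hypothetical names.

```python
import math


def kmeans(vectors, k, iters=20):
    """Plain k-means with Euclidean distance; initial centroids are the first k vectors.

    Deterministic initialisation is an assumption chosen to keep the sketch reproducible.
    """
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def cluster_keywords(vectors, vocab, assign, top_n=2):
    """Rank words per cluster by summed weight over the cluster's texts (ties by word)."""
    keywords = {}
    for c in sorted(set(assign)):
        totals = [0.0] * len(vocab)
        for v, a in zip(vectors, assign):
            if a == c:
                totals = [t + x for t, x in zip(totals, v)]
        ranked = sorted(range(len(vocab)), key=lambda i: (-totals[i], vocab[i]))
        keywords[c] = [vocab[i] for i in ranked[:top_n] if totals[i] > 0]
    return keywords


vocab = ["footprint", "power", "quiet", "suction"]
vectors = [
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.9, 0.0, 0.6, 0.0],
    [0.0, 0.8, 0.0, 0.9],
]
assign = kmeans(vectors, k=2)
keywords = cluster_keywords(vectors, vocab, assign)
print(assign, keywords)
# → [0, 1, 0, 1] {0: ['footprint', 'quiet'], 1: ['suction', 'power']}
```

The per-cluster keyword lists here play the role of the label candidate words, which the final step of the abstract then maps back to individual texts.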

Description

Technical field

[0001] Embodiments of the present invention relate to computer technology, and in particular to a text label extraction method and device, equipment, and storage medium.

Background

[0002] Application scenarios such as information search and information recommendation usually require data mining, and one data mining task is extracting text labels. On an e-commerce platform, for example, text labels are usually extracted from product-related information, such as product introduction details (referred to as item introduction details), product specifications, and comments. The item introduction detail image contains more detailed and comprehensive product descriptions, such as information on use occasions, target users, and marketing labels like "small footprint" and "high suction power". Product specification parameters are stored in structured forms such as tables, which include specification attributes ...

Application Information

Patent type and authority: Application (China)
IPC(8): G06F16/35; G06F40/284; G06F40/216; G06Q30/06
CPC: G06Q30/0601; G06F16/35
Inventor: 窦方正 (Dou Fangzheng)
Owner: BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD