Subject term extraction method and device based on TF-IDF, equipment and storage medium

A TF-IDF-based subject word extraction technology, applied to methods, devices, equipment and storage media in the field of subject word extraction, that addresses problems such as the unsatisfactory extraction of subject words from short texts.

Active Publication Date: 2021-09-14
QINGDAO UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0004] Existing methods usually use the LDA topic extraction model, the TextRank keyword extraction algorithm or the LSI model to extract subject terms from the texts in a product description document set. However, these methods do not perform well when extracting keywords from short texts.

Examples


Embodiment Construction

[0060] In order to make the objectives, technical solutions and advantages of the present application clearer, the embodiments of the present application will be further described in detail below with reference to the accompanying drawings.

[0061] A product description is a special kind of short text: a simple descriptive text. It contains important information about the product and, alongside the product title and product category, is one of the ways customers obtain product information. In addition, e-commerce has reached unprecedented popularity, and a large number of new products appear on the platform at every moment. The results of an efficient and accurate subject word extraction algorithm can, to a certain extent, be combined with algorithms such as string matching or similarity calculation to judge whether a product title corresponds to its product description, so as to prevent the confusion between...


Abstract

The invention discloses a TF-IDF-based subject term extraction method, device, equipment and storage medium, and belongs to the field of subject term extraction. The method comprises: obtaining a plurality of commodity description texts and performing word segmentation on them; obtaining a first weight for a target segmented word from its frequency of occurrence in the target text and its inverse document frequency; obtaining a second weight from the part of speech of the target segmented word; obtaining a third weight from the position of the target segmented word in the target text; obtaining a fourth weight from the frequency of occurrence of the target segmented word across the plurality of commodity description texts; and determining a target weight for the target segmented word from the first, second, third and fourth weights, then obtaining the subject word extraction result for the target text according to the target weight. The method improves the accuracy of subject term recognition and extraction and has practical application value.
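The four-weight scheme in the abstract can be illustrated with a minimal Python sketch. It is not the patented formula: the abstract does not say how the weights are computed or combined, so the part-of-speech table `POS_WEIGHT`, the position rule in `third_weight`, the corpus-frequency rule in `fourth_weight`, the smoothed IDF, and the choice to multiply the four weights are all assumptions made for illustration. Each text is assumed to be already segmented and tagged upstream as a list of `(word, pos)` pairs.

```python
import math
from collections import Counter

# Hypothetical part-of-speech weights; the source gives no concrete values.
POS_WEIGHT = {"n": 1.0, "v": 0.6, "a": 0.4}
DEFAULT_POS_WEIGHT = 0.2


def first_weight(word, doc_words, corpus_words):
    """TF-IDF: term frequency in the target text times a smoothed IDF."""
    tf = doc_words.count(word) / len(doc_words)
    df = sum(1 for words in corpus_words if word in words)
    idf = math.log((1 + len(corpus_words)) / (1 + df)) + 1.0
    return tf * idf


def second_weight(pos):
    """Part-of-speech weight (assumed table lookup)."""
    return POS_WEIGHT.get(pos, DEFAULT_POS_WEIGHT)


def third_weight(word, doc_words):
    """Position weight (assumption: an earlier first occurrence scores higher)."""
    return 1.0 - doc_words.index(word) / len(doc_words)


def fourth_weight(word, corpus_counts, total_tokens):
    """Corpus-frequency weight (assumption: a mild boost for frequent words)."""
    return 1.0 + corpus_counts[word] / total_tokens


def extract_subject_words(target, corpus, top_k=3):
    """Rank the words of `target` by the product of the four weights."""
    corpus_words = [[w for w, _ in doc] for doc in corpus]
    doc_words = [w for w, _ in target]
    corpus_counts = Counter(w for words in corpus_words for w in words)
    total_tokens = sum(corpus_counts.values())

    scores = {}
    for word, pos in target:
        if word in scores:
            continue  # score each distinct word once
        scores[word] = (
            first_weight(word, doc_words, corpus_words)
            * second_weight(pos)
            * third_weight(word, doc_words)
            * fourth_weight(word, corpus_counts, total_tokens)
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Multiplying the weights means a word must do reasonably well on every criterion to rank highly; a weighted sum would be an equally plausible reading of the abstract and only changes the last step of `extract_subject_words`.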

Description

Technical field

[0001] The invention relates to the field of subject word extraction, in particular to a subject word extraction method, device, equipment and storage medium based on TF-IDF.

Background technique

[0002] With the rapid development of B2O and e-commerce platforms, online shopping has become an indispensable part of daily life. According to the latest data released by Alibaba for fiscal year 2021 (April 1, 2020 - March 31, 2021), the number of new active merchants on Taobao hit the highest value in five fiscal years since 2017. At present, Taobao merchants with sales of more than 1 million yuan last year increased by 50% compared with 2017. The data shows that since 2017, the number of active merchants on Taobao has continued to grow overall. It is understood that in the months after March 2020, Taobao opened an average of 40,000 new stores every day. With the rapid rise of the e-commerce industry, the number of people engaged in e-commerce has increased, a...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F40/242, G06F40/258, G06F40/279, G06Q30/06
CPC: G06F40/242, G06F40/258, G06F40/279, G06Q30/0627
Inventor: 王华东, 张津烽, 王军
Owner: QINGDAO UNIV OF SCI & TECH