A method and system for short text classification of commodity names based on an attention mechanism

A commodity-name classification technology in the field of probability models. It addresses the high cost of manual classification, the difficulty of guaranteeing efficiency and accuracy, and the strong subjectivity of existing approaches, and improves classification accuracy and efficiency.

Active Publication Date: 2022-07-22
ZHEJIANG UNIV OF TECH
Cites: 5 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0005] (2) Most commodity names are short texts consisting of only a few words, so little context information can be extracted from them. This limits how well current mainstream natural language processing methods work on this problem.
[0006] (3) In China there are more than 4,000 five-level tax code categories. This makes tax code assignment a massively multi-class classification problem, and no effective method for it is currently available.
Although many scholars have studied and improved on this, the limitations of feature engineering remain unavoidable: 1. The characteristics of short texts are not considered comprehensively. Some methods consider word frequency but ignore part-of-speech and position information; others consider word co-occurrence but ignore text structure. This reduces the accuracy of keyword extraction.
2. The scoring mechanism for short texts is too subjective: it either relies on human prior knowledge to justify the scoring rules or gives no basis for the rules at all.
However, some companies record millions or even tens of millions of products a year. At this scale, relying solely on manual tax code classification is impractical and subjective, and efficiency and accuracy are hard to guarantee. Classification also requires professional tax personnel, which raises costs further, to a level that ordinary enterprises find hard to bear.
At the same time, tax code classification faces several difficulties. On the one hand, commodity names may pick up misleading noise when they are registered, and some keywords that look central carry no information useful for classification. On the other hand, most commodity names consist of only a few words and severely lack contextual semantics. General classification algorithms struggle with such short texts, which makes classification even harder.

Embodiment Construction

[0105] To make the above objects, features and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

[0106] To solve the tax code classification problem for commodity names, and in view of the shortcomings of current tax code classification algorithms, the present invention proposes an attention mechanism-based method for classifying ultra-short commodity-name texts. It takes into account how tax professionals classify commodities, namely by seizing on the core words to make a judgment, and combines this with the attention mechanism from deep learning so that training learns how important each word is for correct tax code classification. This avoids the subjectivity of manual tax code classification; combining this with the entity linking method, it introduces external kn...
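
The idea of letting attention decide how much each word matters can be illustrated with plain scaled dot-product self-attention, softmax(QK^T / sqrt(d)). The sketch below is a simplification, not the patented model: it assumes identity projections (Q = K = V = the word vectors) instead of learned matrices, and the `word_importance` summary (average attention a word receives) is a hypothetical way to read off per-word weights.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def self_attention(x: List[Vector]) -> Tuple[List[Vector], List[List[float]]]:
    """Scaled dot-product self-attention with identity projections.

    Assumption: Q = K = V = x. A trained Transformer would first multiply x
    by learned matrices W_Q, W_K, W_V; they are omitted to keep the sketch small.
    Returns (outputs, weights), where weights[i][j] is how much word i attends to word j.
    """
    d = len(x[0])
    weights = [softmax([dot(q, k) / math.sqrt(d) for k in x]) for q in x]
    # Each output vector is the attention-weighted sum of the value vectors.
    outputs = [
        [sum(w_ij * v[c] for w_ij, v in zip(row, x)) for c in range(d)]
        for row in weights
    ]
    return outputs, weights


def word_importance(x: List[Vector]) -> List[float]:
    """Hypothetical summary: average attention each word receives (sums to 1)."""
    _, weights = self_attention(x)
    n = len(x)
    return [sum(weights[i][j] for i in range(n)) / n for j in range(n)]


if __name__ == "__main__":
    # Made-up 2-d embeddings for a 3-word commodity name.
    emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print([round(w, 3) for w in word_importance(emb)])
```

In this toy input the third vector overlaps with both of the others, so it receives the largest share of attention; in the real system the weights come from trained projections over BERT vectors.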

Abstract

An attention mechanism-based short text classification method for commodity names, comprising: preprocessing commodity names to remove non-Chinese fields and certain special characters; segmenting each preprocessed short text into words with jieba word segmentation, removing stop words, and padding short word sequences and truncating long ones so that every text has a preset number of words; using the Global Entity Linking algorithm to disambiguate each word and link it to the external knowledge base Baidu Encyclopedia, using the linking results to expand and explain the words in the short text, and encoding the entity-linking results with BERT for word embedding to obtain the corresponding feature vectors; feeding these vectors into a Transformer network and using the self-attention mechanism to mine how much each word contributes to tax code classification, assigning different weights to different words; and finally classifying with Softmax, taking the tax code category with the highest probability as the category of the commodity name. The present invention also includes a system for implementing the above method.
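
The preprocessing steps in the abstract (remove non-Chinese characters, segment, drop stop words, pad or truncate to a preset length) can be sketched in pure Python. Assumptions: "non-Chinese" is taken to mean anything outside the CJK Unified Ideographs block U+4E00 to U+9FFF; the stop-word list, the `[PAD]` token and the default length of 5 are hypothetical; and the segmenter is passed in as a function, since the patent uses jieba (for example `jieba.lcut`) but jieba is not imported here. Entity linking and BERT encoding are not covered.

```python
import re
from typing import Callable, Iterable, List, Set

# Keep only CJK Unified Ideographs (U+4E00 to U+9FFF); Latin letters, digits,
# punctuation and other special characters are removed.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")

# Hypothetical stop-word list for illustration; a real system would load a curated list.
STOP_WORDS: Set[str] = {"的", "和", "及"}

PAD = "[PAD]"  # hypothetical padding token


def clean_name(name: str) -> str:
    """Drop every character that is not a Chinese ideograph."""
    return NON_CHINESE.sub("", name)


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    return [t for t in tokens if t not in stop_words]


def fix_length(tokens: List[str], length: int, pad: str = PAD) -> List[str]:
    """Pad short token lists and truncate long ones to exactly `length` tokens."""
    if len(tokens) >= length:
        return tokens[:length]
    return tokens + [pad] * (length - len(tokens))


def preprocess(name: str, segment: Callable[[str], List[str]], length: int = 5) -> List[str]:
    """Clean -> segment -> remove stop words -> pad/truncate."""
    return fix_length(remove_stop_words(segment(clean_name(name))), length)


def toy_segment(text: str) -> List[str]:
    """Stand-in segmenter that splits text into 2-character chunks (not jieba)."""
    return [text[i:i + 2] for i in range(0, len(text), 2)]


if __name__ == "__main__":
    print(preprocess("纯棉袜子(白色)XL", toy_segment))
```

With jieba installed, `preprocess(name, jieba.lcut)` would swap in real word segmentation without changing the other steps.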

Description

Technical field
[0001] The invention relates to a method and system for classifying short texts of commodity names based on an attention mechanism, in particular to classifying commodity names into their corresponding tax codes. A Chinese word segmentation tool segments each text; the words of each commodity name are padded or truncated to a uniform number; BERT performs word embedding on each word to obtain the corresponding word vector; the word vectors are fed into a Transformer, where the attention mechanism yields the weight of each word; and the result is finally classified with Softmax. The invention relates to the fields of probability models, language models, deep learning and the like, in particular to deep learning-based modeling.
Background art
[0002] With the continuous development of society, the tax code classification system is becoming more and more complex, and how...
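
The final Softmax step, which turns a sentence representation into class probabilities and takes the most probable tax code, can be sketched as a linear layer followed by softmax and argmax. Assumptions: the pooled vector `h`, the weight matrix, the bias and the labels `tax_code_A` to `tax_code_C` are hypothetical placeholders; the real model would have more than 4,000 output classes and trained parameters.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def linear(h: Sequence[float], weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """logits[c] = weights[c] . h + bias[c] for each class c."""
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, bias)]


def classify(h, weights, bias, labels: Sequence[str]) -> Tuple[str, float]:
    """Return the label with the highest softmax probability and that probability."""
    probs = softmax(linear(h, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


if __name__ == "__main__":
    # Hypothetical 2-d pooled representation and a 3-class output layer.
    h = [1.0, 0.0]
    weights = [[0.5, 0.0], [2.0, 0.0], [-1.0, 0.0]]
    bias = [0.0, 0.0, 0.0]
    labels = ["tax_code_A", "tax_code_B", "tax_code_C"]
    print(classify(h, weights, bias, labels))
```

Because softmax is monotonic, the argmax of the probabilities equals the argmax of the logits; the probabilities are still useful as a confidence score for each predicted tax code.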

Application Information

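Patent Type & Authority: Patents (China)
IPC (8th edition): G06F16/35, G06F40/289, G06F40/30, G06N20/00
CPC: G06F16/35, G06F40/289, G06F40/30, G06N20/00
Inventors: 高楠 (Gao Nan), 陈国鑫 (Chen Guoxin), 陈磊 (Chen Lei), 杨归一 (Yang Guiyi), 方添斌 (Fang Tianbin), 俞果 (Yu Guo)
Owner: ZHEJIANG UNIV OF TECH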