
Comment label extraction method based on the ALBERT pre-trained model and the k-means algorithm

A comment label extraction method using pre-training technology, applied in computing, computer parts, character and pattern recognition, etc. It addresses problems such as slow training, heavy consumption of computing resources, and logical deviation in text correlation, and it achieves more accurate predictions, fast training, and a small model.

Pending Publication Date: 2021-01-12
深圳市洪堡智慧餐饮科技有限公司

AI Technical Summary

Problems solved by technology

[0002] In the food delivery field, customer reviews serve as a communication bridge with merchants, and extracting useful information from them plays an important role in helping merchants improve. Applying natural language processing to label extraction from takeaway reviews can achieve fairly good results. The TF-IDF algorithm is generally used for this identification and analysis, but as the term frequency (TF) keeps increasing, the TF score grows without limit, which distorts the text relevance. Takeaway reviews are also generally short, so the usable context is limited; general training models have limited effect on such short sequences, train slowly, and consume many computing resources.
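The unbounded growth of the TF score described in [0002] can be illustrated with a minimal sketch. It assumes one common TF-IDF variant in which TF is the raw term count and IDF is log(N / df); the function name and the corpus numbers are hypothetical and do not come from the patent.

```python
import math


def tf_idf(term_count: int, docs_with_term: int, total_docs: int) -> float:
    """TF-IDF with raw-count TF (an assumed variant, not the patent's formula)."""
    idf = math.log(total_docs / docs_with_term)
    return term_count * idf


# Hypothetical corpus statistics: 1000 reviews, 10 of which contain the term.
# The score scales linearly with the term count, so it has no upper bound.
scores = [tf_idf(count, docs_with_term=10, total_docs=1000) for count in (1, 10, 100)]
for count, score in zip((1, 10, 100), scores):
    print(f"term count {count:>3}: score {score:.2f}")
```

Under this variant, a term repeated 100 times scores 100 times higher than a term that appears once, which is the kind of relevance distortion the background section refers to.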



Examples


Embodiment

[0033] The technical solution provided in this embodiment is a method for extracting comment labels based on the ALBERT pre-trained model and the k-means algorithm. The steps of the method are as follows:

[0034] Step 1: crawl the review data of the store and import the data into the database;

[0035] Step 2: perform data cleaning on the data in the database;

[0036] Step 3: use the ALBERT pre-trained model to obtain word vectors;

[0037] Step 4: evaluate the average accuracy of the model.
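The source does not say how the accuracy in Step 4 is computed. One possible reading, given that a small amount of data is manually labeled, is to map each cluster to its most frequent manual label and count how many labeled reviews agree with that mapping (often called cluster purity). The sketch below shows that stand-in measure only; the function name, cluster ids, and label strings are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def cluster_accuracy(cluster_ids: List[int], labels: List[str]) -> float:
    """Majority-vote accuracy: each cluster predicts its most common manual label.

    This is an assumed evaluation scheme, not one stated in the source.
    """
    label_counts: Dict[int, Counter] = defaultdict(Counter)
    for cluster_id, label in zip(cluster_ids, labels):
        label_counts[cluster_id][label] += 1
    # For every cluster, count the reviews that carry the majority label.
    correct = sum(counts.most_common(1)[0][1] for counts in label_counts.values())
    return correct / len(labels)


# Hypothetical example: five labeled reviews assigned to two clusters.
example_clusters = [0, 0, 0, 1, 1]
example_labels = ["taste", "taste", "delivery", "delivery", "delivery"]
accuracy = cluster_accuracy(example_clusters, example_labels)
print(accuracy)
```

Averaging this value over several runs or several stores would give an "average accuracy" in the loose sense used above, though that aggregation is likewise an assumption.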

[0038] Preferably in this embodiment, the cleaning in Step 2 includes removing stop words, removing HTML formatting, removing spaces, and manually labeling a small amount of data. The cleaned data is then imported into the database. An actual example is analyzed below, with the data shown in the following table:

[0039]

[0040]
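The cleaning operations listed in [0038] (removing HTML formatting, spaces, and stop words) might be sketched as follows. The stop-word set and function name are hypothetical, and whitespace tokenization is a simplifying assumption: the original reviews are Chinese and would need a word segmenter instead.

```python
import re
from html import unescape
from typing import List

# Hypothetical stop-word list; a real one would be language-specific.
STOP_WORDS = {"the", "a", "is", "was", "and"}


def clean_review(text: str) -> List[str]:
    """Strip HTML tags, collapse whitespace, and drop stop words."""
    text = re.sub(r"<[^>]+>", " ", text)  # replace HTML tags with a space
    text = unescape(text)                 # decode entities such as &amp;
    tokens = text.lower().split()         # split() also discards extra spaces
    return [token for token in tokens if token not in STOP_WORDS]


cleaned = clean_review("<p>The  soup was <b>cold</b></p>")
print(cleaned)
```

Manual labeling, the remaining operation in [0038], is a human task and is not represented here.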

[0041] Preferably in this embodiment, the specific operation of Step 3 is: based on a small amount of labeled data, take the last layer of...



Abstract

The invention relates to the technical field of natural language processing, and in particular to a comment label extraction method based on the ALBERT pre-trained model and the k-means algorithm. The method comprises the following steps: 1, crawling comment data of a store and importing the data into a database; 2, performing data cleaning on the data in the database; 3, obtaining word vectors using the ALBERT pre-trained model; and 4, evaluating the average accuracy of the model. ALBERT is used as the pre-trained model because it is small, trains quickly, and performs better on large-scale data. The k-means algorithm is used as an unsupervised clustering algorithm: the word vectors from the last layer of ALBERT are taken as input and clustered with k-means, which yields more accurate predictions.
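The clustering stage described in the abstract can be sketched with a small pure-Python k-means (Lloyd's algorithm). The 2-D toy vectors stand in for ALBERT last-layer vectors, which in practice have hundreds of dimensions. The evenly spaced initial centroids are a simplification chosen to keep the example deterministic, and all names here are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iterations: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means with evenly spaced initial centroids (assumed init scheme)."""
    n = len(points)
    centroids = [list(points[i * n // k]) for i in range(k)]
    assignments = [-1] * n
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Toy vectors standing in for review embeddings: two well-separated groups.
toy_vectors = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05), (5.0, 5.1), (5.1, 5.0), (4.9, 5.0)]
assignments, centroids = kmeans(toy_vectors, k=2)
print(assignments)
```

Each resulting cluster would then correspond to one candidate comment label. A production setup would more likely use a library implementation with k-means++ initialization.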

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, and in particular to a method for extracting comment labels based on the ALBERT pre-trained model and the k-means algorithm.

Background

[0002] In the food delivery field, customer reviews serve as a communication bridge with merchants, and extracting useful information from them plays an important role in helping merchants improve. Applying natural language processing to label extraction from takeaway reviews can achieve fairly good results. The TF-IDF algorithm is generally used for this identification and analysis, but as the term frequency (TF) keeps increasing, the TF score grows without limit, which distorts the text relevance. Takeaway reviews are also generally short, so the usable context is limited; the general tr...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289, G06F16/951, G06F16/215, G06K9/62
CPC: G06F40/289, G06F16/951, G06F16/215, G06F18/23213
Inventor: 廖杰, 邓方华, 张衍彬
Owner: 深圳市洪堡智慧餐饮科技有限公司