Text sentiment classification method and device, electronic equipment and storage medium

By converting text into vectors for clustering and generating samples through synonym replacement, and then using a deep learning model to train a text sentiment classification model, the problem of existing models being unable to learn the differences within the same sentiment category is solved, thus improving the accuracy of text sentiment classification.

CN115221274B · Active · Publication Date: 2026-06-19 · PING AN TECH (SHENZHEN) CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: PING AN TECH (SHENZHEN) CO LTD
Filing Date: 2022-06-22
Publication Date: 2026-06-19

Application Information

Patent Timeline

22 Jun 2022: Application
19 Jun 2026: Publication (CN115221274B)

IPC: G06F16/353; G06F16/334; G06F40/216; G06F40/289; G06F40/30; G06F18/241; G06F18/214; G06F18/213; G06F18/22; G06F18/23; G06N3/045; G06N3/048

AI Tagging

Application Domain

Digital data information retrieval; Semantic analysis


AI Technical Summary

Technical Problem

Existing text sentiment classification models are unable to effectively learn the differences between texts of the same sentiment category, resulting in low classification accuracy.

Method used

The text is converted into vectors and clustered. Text from the text clusters is randomly selected as training text. Synonym replacement is performed to generate positive samples, and similar texts with different sentiment labels are selected as negative samples. A deep learning model is used for training to select a high-accuracy text sentiment classification model.

Benefits of technology

It improves the accuracy of text sentiment classification by learning the differences between texts of the same category, thereby enhancing the training accuracy of the model.


Abstract

This invention relates to intelligent decision-making technology and discloses a text sentiment classification method, comprising: clustering a text set to obtain multiple text clusters; selecting texts from any one of the text clusters to construct positive and negative samples; training a pre-constructed first model and an identical second model to obtain a trained first model and a trained second model; performing model filtering on the trained first model and the trained second model to obtain a text sentiment classification model; and, when a text to be classified is obtained, classifying it using the text sentiment classification model to obtain a sentiment classification result. This invention also relates to blockchain technology, wherein the text clusters can be stored in blockchain nodes. This invention also proposes a text sentiment classification device, apparatus, and medium. This invention can improve the accuracy of text sentiment classification.


Description

Technical Field

[0001] This invention relates to intelligent decision-making technology, and more particularly to a text sentiment classification method, apparatus, electronic device, and storage medium.

Background Technology

[0002] With the development of natural language understanding technology, sentiment classification of text has received increasing attention.

[0003] However, existing text sentiment classification directly uses sentiment-tagged text (e.g., labeled as positive, negative, or neutral) to train the model, and then uses the trained model to classify text sentiment (e.g., classifying text as positive, negative, or neutral). A model trained this way cannot learn the differences between texts of the same sentiment category (e.g., among texts all labeled as neutral, some lean positive and some lean negative), which results in low accuracy of text sentiment classification.

Summary of the Invention

[0004] This invention provides a text sentiment classification method, apparatus, electronic device, and storage medium, the main purpose of which is to improve the accuracy of text sentiment classification.

[0005] Obtain a text set, wherein each text in the text set has a corresponding sentiment tag;

[0006] Each text is converted into a vector to obtain a text vector, and the text vectors are used to cluster all the texts in the text set to obtain a preset number of text clusters;

[0007] A predetermined number of texts are randomly selected from any of the text clusters to obtain a training text set;

[0008] The texts in the training text set are selected sequentially as training texts, and the training texts are replaced with synonyms to obtain the positive sample texts corresponding to the training texts;

[0009] By filtering out similar texts in the training text set that have different sentiment labels from the training text, negative sample texts corresponding to the training text are obtained.

[0010] Use each training text and the corresponding positive and negative sample texts to train a pre-constructed first model and an identical second model, obtaining a trained first model and a trained second model.

[0011] The first and second trained models are screened to obtain a text sentiment classification model.

[0012] When the text to be classified is obtained, the text sentiment classification model is used to classify the text to be classified, and the sentiment classification result is obtained.

[0013] Optionally, the step of clustering all texts in the text set using the text vectors to obtain a preset number of text clusters includes:

[0014] Step A: Randomly select a preset number of text vectors from all the text vectors, and use each selected text vector as a centroid;

[0015] Step B: Calculate the distance between each text vector and each centroid, and aggregate each text vector to the nearest centroid to obtain the corresponding initial vector cluster;

[0016] Step C: Calculate the centroid fluctuation based on the initial vector cluster and the centroid to obtain the centroid fluctuation value;

[0017] Step D: Determine whether the centroid fluctuation value is 0.

[0018] Step E: When the centroid fluctuation value is 0, the initial vector cluster is determined as the text vector cluster, and the text corresponding to all text vectors in each text vector cluster is summarized to obtain the corresponding text cluster;

[0019] Step F: When the centroid fluctuation value is not 0, take the cluster average value as the new centroid and return to step B.

[0020] Optionally, the step of performing synonym replacement on the training text to obtain the corresponding positive sample text includes:

[0021] Replace any one or more words in the training text with a synonym of the corresponding word to obtain the positive sample text corresponding to the training text.

[0022] Optionally, the step of filtering similar texts in the training text set that have different sentiment labels from the training text to obtain negative sample texts corresponding to the training text includes:

[0023] Select the texts in the training text set whose sentiment labels differ from that of the training text to obtain a filtered text set.

[0024] Calculate the similarity between the training text and each text in the filtered text set to obtain the corresponding text similarity.

[0025] The text with the highest text similarity in the filtered text set is identified as the negative sample text corresponding to the training text.

[0026] Optionally, the step of using each training text and the corresponding positive and negative sample texts to train a pre-constructed first model and an identical second model, to obtain the trained first model and the trained second model, includes:

[0027] The first model is used to extract features from the training text to obtain the training text feature vector;

[0028] The second model is used to extract features from the positive sample text to obtain the positive sample text feature vector;

[0029] The negative sample text is feature extracted using the second model to obtain a negative sample text feature vector.

[0030] The similarity between the training text feature vector and the positive sample text feature vector and the negative sample text feature vector is calculated respectively to obtain a first similarity score and a second similarity score;

[0031] Based on a preset loss function, the target loss value is obtained by calculating using the first similarity score and the second similarity score.

[0032] When the target loss value is greater than or equal to the preset loss threshold, the model parameters of the first model and the second model are updated, and the process returns to the step of randomly selecting a preset number of texts from any of the text clusters.

[0033] When the target loss value is less than the preset loss threshold, the first model and the second model that have been trained are output.

[0034] Optionally, the step of performing model filtering on the trained first model and the trained second model to obtain a text sentiment classification model includes:

[0035] Obtain a test text set, wherein each test text in the test text set has a corresponding sentiment tag;

[0036] The first model, after training, classifies each test text in the test text set to determine whether the classification result is consistent with the sentiment label of the corresponding test text, thereby obtaining the first test accuracy.

[0037] The trained second model classifies each test text in the test text set to determine whether the classification result is consistent with the sentiment label of the corresponding test text, thereby obtaining the second test accuracy.

[0038] Determine whether the accuracy of the first test is greater than the accuracy of the second test, and perform model filtering on the first and second trained models based on the determination result to obtain the text sentiment classification model.

[0039] Optionally, the step of filtering the trained first model and the trained second model based on the test results to obtain the text sentiment classification model includes:

[0040] When the judgment result is that the accuracy of the first test is greater than the accuracy of the second test, the first model that has been trained is determined as the text sentiment classification model;

[0041] If the judgment result is that the accuracy of the first test is not greater than the accuracy of the second test, the trained second model is determined as the text sentiment classification model.

[0042] To address the aforementioned problems, the present invention also provides a text sentiment classification device, the device comprising:

[0043] A positive and negative sample construction module is used to acquire a text set, wherein each text in the text set has a corresponding sentiment tag; convert each text into a vector to obtain a text vector, and use the text vector to cluster all texts in the text set to obtain a preset number of text clusters; randomly select a preset number of texts from any text cluster to obtain a training text set; sequentially select texts from the training text set as training texts, and perform synonym replacement on the training texts to obtain positive sample texts corresponding to the training texts; filter similar texts in the training text set that have different sentiment tags from the training texts to obtain negative sample texts corresponding to the training texts.

[0044] The model training and screening module is used to train a pre-constructed first model and an identical second model using each training text and the corresponding positive and negative sample texts to obtain a trained first model and a trained second model; and to screen the trained first model and the trained second model to obtain a text sentiment classification model.

[0045] The text sentiment classification module is used to classify the text to be classified using the text sentiment classification model when the text to be classified is obtained, and to obtain the sentiment classification result.

[0046] To address the above problems, the present invention also provides an electronic device, the electronic device comprising:

[0047] Memory, storing at least one computer program; and

[0048] The processor executes the computer program stored in the memory to implement the text sentiment classification method described above.

[0049] To address the aforementioned problems, the present invention also provides a computer-readable storage medium storing at least one computer program, which is executed by a processor in an electronic device to implement the text sentiment classification method described above.

[0050] In this embodiment of the invention, each text is converted into a vector to obtain a text vector, and all texts in the text set are clustered using the text vectors to obtain a preset number of text clusters. A preset number of texts are randomly selected from any text cluster to obtain a training text set. Texts in the training text set are selected sequentially as training texts, and synonym replacement is performed on the training texts to obtain positive sample texts corresponding to the training texts. Similar texts in the training text set that have different sentiment tags from the training texts are filtered to obtain negative sample texts corresponding to the training texts. Each training text and its corresponding positive and negative sample texts are used to train a pre-constructed first model and an identical second model to obtain a trained first model and a trained second model. By clustering the text set and selecting texts from each text cluster to train the model, the model can learn the differences between texts of the same category, resulting in higher accuracy of the trained model and thus improving the accuracy of text sentiment classification. Therefore, the text sentiment classification method, device, electronic device, and readable storage medium proposed in this embodiment of the invention improve the efficiency of text sentiment classification.

Attached Figure Description

[0051] Figure 1 is a flowchart illustrating a text sentiment classification method provided in an embodiment of the present invention;

[0052] Figure 2 is a schematic diagram of a text sentiment classification device provided in an embodiment of the present invention;

[0053] Figure 3 is a schematic diagram of the internal structure of an electronic device that implements a text sentiment classification method according to an embodiment of the present invention.

[0054] The realization of the objective, functional features and advantages of the present invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Implementation

[0055] It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

[0056] This invention provides a text sentiment classification method. The execution entity of the text sentiment classification method includes, but is not limited to, at least one of the following: a server, a terminal, or an electronic device that can be configured to execute the method provided in this application embodiment. In other words, the text sentiment classification method can be executed by software or hardware installed on a terminal device or a server device, and the software can be a blockchain platform. The server includes, but is not limited to: a single server, a server cluster, a cloud server, or a cloud server cluster, etc. The server can be an independent server or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDNs), and big data and artificial intelligence platforms.

[0057] Referring to Figure 1, a schematic flowchart of a text sentiment classification method provided in an embodiment of the present invention is shown. In this embodiment, the text sentiment classification method includes:

[0058] S1. Obtain a text set, wherein each text in the text set has a corresponding sentiment tag;

[0059] In this embodiment of the invention, the text set is a collection of texts labeled with sentiment tags, where the sentiment tags are labels that indicate the sentiment type of the text, such as "positive", "negative", or "neutral".

[0060] S2. Convert each text into a vector to obtain a text vector, and use the text vectors to cluster all texts in the text set to obtain a preset number of text clusters;

[0061] In order to filter texts with similar semantics, this embodiment of the invention converts the text into a vector to obtain a text vector.

[0062] Specifically, in this embodiment of the invention, the text is converted into a vector to obtain a text vector, including:

[0063] The text is segmented into words to obtain multiple segmented words;

[0064] Each of the segmented words is converted into a vector to obtain a segmentation vector;

[0065] All the word segmentation vectors are combined according to the order of the corresponding word segments in the text to obtain the text vector.
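The three conversion steps above can be sketched as follows. This is a minimal illustration only: whitespace splitting stands in for a real word segmenter, and `TOY_EMBEDDINGS`, `UNKNOWN`, `segment` and `text_to_vector` are hypothetical names with made-up values. The embodiment itself does not fix the segmentation or vectorization method.

```python
# Hypothetical toy embedding table; a real system would use a trained model.
TOY_EMBEDDINGS = {
    "service": [0.9, 0.1],
    "was": [0.0, 0.0],
    "great": [0.8, 0.7],
}
UNKNOWN = [0.0, 0.0]  # fallback for out-of-vocabulary words (assumption)


def segment(text):
    """Split text into words; whitespace splitting stands in for a real segmenter."""
    return text.lower().split()


def text_to_vector(text):
    """Concatenate per-word vectors in the order the words appear in the text."""
    vector = []
    for word in segment(text):
        vector.extend(TOY_EMBEDDINGS.get(word, UNKNOWN))
    return vector


text_vector = text_to_vector("Service was great")
```

Because the word vectors are joined in word order, reordering the words changes the resulting text vector.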

[0066] The method of converting to a vector is not limited in the embodiments of the present invention.

[0067] In another embodiment of the present invention, each of the texts is converted into a vector to obtain a text vector, including:

[0068] Furthermore, in this embodiment of the invention, the text vectors are used to cluster all texts in the text set to obtain a preset number of text clusters, including:

[0069] Step A: Randomly select a preset number of text vectors from all the text vectors, and use each selected text vector as a centroid;

[0070] Step B: Calculate the distance between each text vector and each centroid, and aggregate each text vector to the nearest centroid to obtain the corresponding initial vector cluster;

[0071] Step C: Calculate the centroid fluctuation based on the initial vector cluster and the centroid to obtain the centroid fluctuation value;

[0072] Step D: Determine whether the centroid fluctuation value is 0.

[0073] Step E: When the centroid fluctuation value is 0, the initial vector cluster is determined as the text vector cluster, and the text corresponding to all text vectors in each text vector cluster is summarized to obtain the corresponding text cluster;

[0074] Step F: When the centroid fluctuation value is not 0, take the cluster average value as the new centroid and return to step B.

[0075] Further, in this embodiment of the invention, centroid fluctuation calculation is performed based on the initial vector cluster and the centroid to obtain the centroid fluctuation value, including:

[0076] Calculate the average value of all text vectors in the initial vector cluster to obtain the cluster average vector;

[0077] For example: if the initial vector cluster contains two text vectors $v_1$ and $v_2$, then the corresponding cluster average vector is $(v_1 + v_2)/2$.

[0078] Calculate the vector distance between the cluster average vector and the centroid of the initial vector cluster to obtain the centroid fluctuation value.
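Steps A to F, together with the centroid fluctuation calculation, follow the familiar k-means pattern and can be sketched as below. This is a sketch under stated assumptions, not the patented implementation: Euclidean distance is assumed, the fluctuation value is taken as the sum of the per-cluster distances between the cluster average vector and the centroid, an empty cluster simply keeps its old centroid, and `mean_vector` and `cluster_vectors` are hypothetical names.

```python
import math
import random


def mean_vector(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cluster_vectors(vectors, k, seed=0):
    """Steps A-F: iterate until no centroid moves (fluctuation value is 0)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)                      # Step A
    while True:
        clusters = [[] for _ in range(k)]
        for index, vector in enumerate(vectors):            # Step B
            nearest = min(range(k), key=lambda c: math.dist(vector, centroids[c]))
            clusters[nearest].append(index)
        # Step C: cluster average vectors; an empty cluster keeps its centroid.
        averages = [
            mean_vector([vectors[i] for i in members]) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        fluctuation = sum(math.dist(a, c) for a, c in zip(averages, centroids))
        if fluctuation == 0:                                # Steps D and E
            return clusters
        centroids = averages                                # Step F


vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
clusters = cluster_vectors(vectors, k=2)
```

The function returns lists of indices into `vectors`, so the texts belonging to each text cluster can be looked up by index.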

[0079] In another embodiment of the present invention, the text cluster can be stored in a blockchain node, utilizing the high throughput of the blockchain node to improve the efficiency of data retrieval.

[0080] S3. Randomly select a preset number of texts from any of the text clusters to obtain a training text set;

[0081] In this embodiment of the invention, in order to ensure that the texts in each training text set are in the same text cluster, a preset number of texts are randomly selected from any one of the text clusters to obtain the training text set.

[0082] S4. Select the texts in the training text set as training texts in sequence, and replace the training texts with synonyms to obtain the positive sample texts corresponding to the training texts;

[0083] In detail, in this embodiment of the invention, the training text is replaced with synonyms to obtain the positive sample text corresponding to the training text, including: replacing any one or more words in the training text with the synonyms of the corresponding words to obtain the positive sample text corresponding to the training text.
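Synonym replacement can be illustrated with a small sketch. The synonym table `SYNONYMS` and the function `make_positive_sample` are hypothetical, and the sketch replaces every word that has a listed synonym, which is only one way of replacing "any one or more words".

```python
import random

# Hypothetical synonym table; a real system would use a thesaurus resource.
SYNONYMS = {
    "good": ["fine", "decent"],
    "quick": ["fast", "rapid"],
}


def make_positive_sample(text, rng):
    """Replace each word that has a known synonym with a randomly chosen one."""
    replaced = []
    for word in text.split():
        options = SYNONYMS.get(word)
        replaced.append(rng.choice(options) if options else word)
    return " ".join(replaced)


positive = make_positive_sample("the delivery was quick and good", random.Random(0))
```

The positive sample keeps the sentence structure and sentiment of the training text while changing its surface form.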

[0084] S5. Filter similar texts in the training text set that have different sentiment tags from the training text to obtain negative sample texts corresponding to the training text;

[0085] In detail, in this embodiment of the invention, similar texts in the training text set that have different sentiment tags from the training text are filtered to obtain negative sample texts corresponding to the training text, including:

[0086] Select the texts in the training text set whose sentiment labels differ from that of the training text to obtain a filtered text set.

[0087] Calculate the similarity between the training text and each text in the filtered text set to obtain the corresponding text similarity.

[0088] The text with the highest text similarity in the filtered text set is identified as the negative sample text corresponding to the training text.

[0089] Specifically, in this embodiment of the invention, calculating the similarity between the training text and each text in the filtered text set to obtain the corresponding text similarity includes:

[0090] The training text is converted into a vector to obtain the training text vector;

[0091] Each text in the filtered text set is converted into a vector to obtain the corresponding text vector;

[0092] Calculate the similarity between the training text vector and the text vector of each text in the filtered text set to obtain the corresponding text similarity.

[0093] In this embodiment of the invention, no limitation is placed on the method for calculating the similarity between the training text and each text in the filtered text set.
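The negative sample selection described above can be sketched as follows. Cosine similarity is used here purely as an assumption, since the embodiment does not restrict the similarity measure; the dictionary keys `"text"`, `"label"` and `"vector"`, the example vectors, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def pick_negative_sample(anchor, training_set):
    """Return the most similar item whose label differs from the anchor's.

    Returns None when every item shares the anchor's label.
    """
    candidates = [item for item in training_set if item["label"] != anchor["label"]]
    if not candidates:
        return None
    return max(candidates, key=lambda item: cosine_similarity(anchor["vector"], item["vector"]))


anchor = {"text": "it was fine", "label": "neutral", "vector": [1.0, 0.2]}
training_set = [
    anchor,
    {"text": "it was great", "label": "positive", "vector": [0.9, 0.4]},
    {"text": "it was awful", "label": "negative", "vector": [-0.8, 0.3]},
    {"text": "it was okay", "label": "neutral", "vector": [1.0, 0.25]},
]
negative = pick_negative_sample(anchor, training_set)
```

Choosing the closest differently labeled text yields a hard negative: a sample that is semantically near the training text yet carries another sentiment label.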

[0094] S6. Using each training text and the corresponding positive and negative sample texts, train a pre-constructed first model and an identical second model to obtain the trained first model and the trained second model.

[0095] In this embodiment of the invention, the first model and the second model are the same deep learning model. This embodiment of the invention does not limit the deep learning model. Preferably, the deep learning model in this embodiment of the invention is the BERT model.

[0096] In detail, in this embodiment of the invention, using each training text and the corresponding positive and negative sample texts to train a pre-constructed first model and an identical second model, to obtain the trained first model and the trained second model, includes:

[0097] The first model is used to extract features from the training text to obtain the training text feature vector;

[0098] Specifically, in this embodiment of the invention, when the first model is a BERT model, the training text feature vector is the CLS vector output by the first model after the training text is input into the first model.

[0099] The second model is used to extract features from the positive sample text to obtain the positive sample text feature vector;

[0100] The negative sample text is feature extracted using the second model to obtain a negative sample text feature vector.

[0101] The similarity between the training text feature vector and the positive sample text feature vector and the negative sample text feature vector is calculated respectively to obtain a first similarity score and a second similarity score;

[0102] Based on a preset loss function, the target loss value is obtained by calculating using the first similarity score and the second similarity score.

[0103] When the target loss value is greater than or equal to the preset loss threshold, the model parameters of the first model and the second model are updated, and the process returns to the step of randomly selecting a preset number of texts from any of the text clusters.

[0104] When the target loss value is less than the preset loss threshold, the first model and the second model that have been trained are output.

[0105] Further, in this embodiment of the invention, the similarity between the training text feature vector and the positive sample text feature vector and the negative sample text feature vector is calculated respectively to obtain a first similarity score and a second similarity score, including:

[0106] The training text feature vector is concatenated with the positive sample text feature vector to obtain the positive sample concatenated vector;

[0107] The positive sample concatenation vector is extracted using a multilayer perceptron to obtain a positive sample similarity feature vector;

[0108] The first similarity score is obtained by calculating the similarity feature vector of the positive samples using the softmax function;

[0109] The training text feature vector is concatenated with the negative sample text feature vector to obtain the negative sample concatenated vector;

[0110] The negative sample concatenation vector is processed by a multilayer perceptron to extract similarity features, obtaining a negative sample similarity feature vector;

[0111] The second similarity score is obtained by calculating the similarity feature vector of the negative sample using the softmax function.
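The concatenate, multilayer perceptron, softmax pipeline above can be sketched in plain Python. Several details are assumptions made for illustration: the perceptron has one hidden layer with ReLU, its output has two logits read as "dissimilar" and "similar", the similarity score is the softmax probability of "similar", and the weights are random and untrained. The names `dense`, `similarity_score` and `init_params` are hypothetical.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """One fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def similarity_score(anchor_vec, other_vec, params):
    """Concatenate, apply a two-layer perceptron, return softmax P(similar)."""
    joined = anchor_vec + other_vec
    hidden = [max(0.0, h) for h in dense(joined, params["w1"], params["b1"])]  # ReLU
    logits = dense(hidden, params["w2"], params["b2"])  # [dissimilar, similar]
    return softmax(logits)[1]


def init_params(input_size, hidden_size, rng):
    """Random, untrained weights, for demonstration only."""
    def matrix(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
    return {"w1": matrix(hidden_size, input_size), "b1": [0.0] * hidden_size,
            "w2": matrix(2, hidden_size), "b2": [0.0, 0.0]}


params = init_params(input_size=4, hidden_size=3, rng=random.Random(0))
first_score = similarity_score([0.2, 0.7], [0.3, 0.6], params)    # training vs positive
second_score = similarity_score([0.2, 0.7], [-0.5, 0.1], params)  # training vs negative
```

With untrained weights the two scores carry no meaning; training is what pushes the first score up and the second score down.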

[0112] Specifically, the loss function described in this embodiment of the invention is:

[0113]

[0114] Where $k^{+}$ is the first similarity score, $k^{-}$ is the second similarity score, $\tau$ is the preset loss parameter, and $L$ is the loss function.
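The loss formula itself is not reproduced in this text. Purely as an illustration, one widely used contrastive loss that takes the same three inputs is the InfoNCE form, $L = -\log\frac{\exp(k^{+}/\tau)}{\exp(k^{+}/\tau) + \exp(k^{-}/\tau)}$. Whether the patent uses exactly this form is an assumption. In the sketch, `k_pos` and `k_neg` stand for the first and second similarity scores and `tau` for the preset loss parameter; the function name is hypothetical.

```python
import math


def contrastive_loss(k_pos, k_neg, tau):
    """InfoNCE-style loss for one positive and one negative score (assumed form)."""
    pos = math.exp(k_pos / tau)
    neg = math.exp(k_neg / tau)
    return -math.log(pos / (pos + neg))


good_pair = contrastive_loss(k_pos=0.9, k_neg=0.1, tau=0.5)  # positive scored higher
bad_pair = contrastive_loss(k_pos=0.1, k_neg=0.9, tau=0.5)   # negative scored higher
```

Under this assumed form the loss shrinks as the first similarity score rises above the second, which matches the stopping rule of training until the loss falls below a preset threshold.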

[0115] S7. Perform model filtering on the first model and the second model that have been trained to obtain a text sentiment classification model;

[0116] In detail, in this embodiment of the invention, the trained first model and the trained second model are subjected to model screening to obtain a text sentiment classification model, including:

[0117] Obtain a test text set, wherein each test text in the test text set has a corresponding sentiment tag;

[0118] In this embodiment of the invention, the test text is of the same type as the text but has different content.

[0119] The first model, after training, classifies each test text in the test text set to determine whether the classification result is consistent with the sentiment label of the corresponding test text, thereby obtaining the first test accuracy.

[0120] The trained second model classifies each test text in the test text set to determine whether the classification result is consistent with the sentiment label of the corresponding test text, thereby obtaining the second test accuracy.

[0121] For example: if there are 10 test texts in the test text set, and the classification results of 9 of the test texts are consistent with the corresponding sentiment tags, then the corresponding second test accuracy is 9/10 × 100% = 90%.

[0122] Determine whether the accuracy of the first test is greater than the accuracy of the second test, and perform model filtering on the first and second trained models based on the determination result to obtain the text sentiment classification model.

[0123] In detail, in this embodiment of the invention, the first trained model and the second trained model are screened based on the test results to obtain the text sentiment classification model, including:

[0124] When the judgment result is that the accuracy of the first test is greater than the accuracy of the second test, the first model that has been trained is determined as the text sentiment classification model;

[0125] If the judgment result is that the accuracy of the first test is not greater than the accuracy of the second test, the trained second model is determined as the text sentiment classification model.
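The accuracy-based screening can be sketched as below. Models are represented as plain callables that map a text to a sentiment label, which is an assumption made for illustration; `accuracy`, `select_model` and the two stand-in classifiers are hypothetical names.

```python
def accuracy(model, test_set):
    """Fraction of test texts whose predicted label matches the tagged label."""
    correct = sum(1 for text, label in test_set if model(text) == label)
    return correct / len(test_set)


def select_model(first_model, second_model, test_set):
    """Keep the first model only if it is strictly more accurate than the second."""
    if accuracy(first_model, test_set) > accuracy(second_model, test_set):
        return first_model
    return second_model


def always_positive(text):
    """Stand-in classifier used only for this demonstration."""
    return "positive"


def always_negative(text):
    """Stand-in classifier used only for this demonstration."""
    return "negative"


# Ten tagged test texts, nine of them positive.
test_set = [("text %d" % i, "positive") for i in range(9)] + [("text 9", "negative")]
chosen = select_model(always_positive, always_negative, test_set)
```

When the two accuracies are equal, the second model is returned, mirroring the "not greater than" branch.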

[0126] In another embodiment of the present invention, model filtering is performed on the first trained model and the second trained model to obtain a text sentiment classification model, including:

[0127] Extract the maximum parameter value of each model parameter in the first model and the second model after training;

[0128] Replace the parameter values corresponding to the same model parameters in the first trained model with the maximum parameter value of each of the aforementioned model parameters to obtain the text sentiment classification model; or

[0129] The maximum parameter value of each of the aforementioned model parameters is used to replace the parameter values corresponding to the same model parameters in the trained second model to obtain the text sentiment classification model.
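One possible reading of this alternative embodiment is an element-wise maximum over the two trained parameter sets, sketched below. The interpretation, the representation of parameters as a dictionary of flat lists, and the name `merge_by_maximum` are all assumptions made for illustration.

```python
def merge_by_maximum(first_params, second_params):
    """For every named parameter, keep the element-wise larger of the two values."""
    return {
        name: [max(a, b) for a, b in zip(first_params[name], second_params[name])]
        for name in first_params
    }


first_params = {"w": [0.1, -0.4], "b": [0.0]}
second_params = {"w": [0.3, -0.9], "b": [-0.2]}
merged = merge_by_maximum(first_params, second_params)
```

Because the two models share one architecture, the parameter names and shapes line up, so the merged values can be loaded into either trained model.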

[0130] S8. When the text to be classified is obtained, the text sentiment classification model is used to classify the text to be classified to obtain the sentiment classification result.

[0131] In this embodiment of the invention, the text to be classified is text with the same format as the text but different content and without sentiment tags.

[0132] Furthermore, in this embodiment of the invention, the text to be classified is input into the sentiment classification model to obtain the sentiment classification result.

[0133] Specifically, in this embodiment of the invention, the text sentiment classification model is used to classify the text to be classified, and the sentiment classification result is obtained, including:

[0134] The text sentiment classification model is used to extract features from the text to be classified, obtaining a text vector to be classified.

[0135] The softmax function is used to calculate the recognition probability of different preset sentiment categories on the text vector to be classified;

[0136] Optionally, the emotion categories described in this embodiment of the invention include: positive, negative, and neutral.

[0137] The emotion category corresponding to the highest recognition probability is identified as the emotion classification result.
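The final classification step can be sketched as follows. The sketch assumes the model has already produced one raw score per category for the text vector to be classified; how those scores are computed is not shown, and `CATEGORIES` and `classify` are hypothetical names.

```python
import math

CATEGORIES = ["positive", "negative", "neutral"]


def softmax(values):
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the category with the highest recognition probability, and that probability."""
    probabilities = softmax(scores)
    best = max(range(len(CATEGORIES)), key=lambda i: probabilities[i])
    return CATEGORIES[best], probabilities[best]


label, probability = classify([0.2, 2.0, -1.0])
```

Softmax preserves the ordering of the scores, so the selected category is the one with the largest raw score; the probability additionally indicates how confident that choice is.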

[0138] Figure 2 is a functional block diagram of the text sentiment classification device of the present invention.

[0139] The text sentiment classification device 100 of the present invention can be installed in an electronic device. Depending on the functions implemented, the text sentiment classification device may include a positive and negative sample construction module 101, a model training and screening module 102, and a text sentiment classification module 103. The module described in the present invention may also be referred to as a unit, which refers to a series of computer program segments that can be executed by the processor of an electronic device and can perform a fixed function, and are stored in the memory of the electronic device.

[0140] In this embodiment, the functions of each module / unit are as follows:

[0141] The positive and negative sample construction module 101 is used to acquire a text set, wherein each text in the text set has a corresponding sentiment tag; convert each text into a vector to obtain a text vector, and use the text vector to cluster all texts in the text set to obtain a preset number of text clusters; randomly select a preset number of texts from any text cluster to obtain a training text set; sequentially select texts from the training text set as training texts, and perform synonym replacement on the training texts to obtain positive sample texts corresponding to the training texts; filter similar texts in the training text set that have different sentiment tags from the training texts to obtain negative sample texts corresponding to the training texts;

[0142] The model training and screening module 102 is used to train a pre-constructed first model and second model, which are identical to each other, using each training sample and the corresponding positive and negative sample texts to obtain a trained first model and a trained second model; and to screen the trained first model and the trained second model to obtain a text sentiment classification model.

[0143] The text sentiment classification module 103 is used to classify the text to be classified using the text sentiment classification model when the text to be classified is obtained, so as to obtain the sentiment classification result.

[0144] In detail, when in use, each module in the text sentiment classification device 100 described in this embodiment of the invention adopts the same technical means as the text sentiment classification method described above with reference to Figure 1 and can produce the same technical effect, so it will not be elaborated here.

[0145] Figure 3 is a schematic diagram of the electronic device that implements the text sentiment classification method of the present invention.

[0146] The electronic device may include a processor 10, a memory 11, a communication bus 12 and a communication interface 13, and may also include a computer program, such as a text sentiment classification program, stored in the memory 11 and capable of running on the processor 10.

[0147] The memory 11 includes at least one type of readable storage medium, such as flash memory, portable hard drive, multimedia card, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 11 can be an internal storage unit of an electronic device, such as a portable hard drive. In other embodiments, the memory 11 can be an external storage device of the electronic device, such as a plug-in portable hard drive, Smart Media Card (SMC), Secure Digital (SD) card, Flash Card, etc. Furthermore, the memory 11 can include both internal and external storage units of the electronic device. The memory 11 can be used not only to store application software and various types of data installed on the electronic device, such as the code of a text sentiment classification program, but also to temporarily store data that has been output or will be output.

[0148] In some embodiments, the processor 10 may be composed of integrated circuits, such as a single packaged integrated circuit or multiple integrated circuits with the same or different functions, including combinations of one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips. The processor 10 is the control unit of the electronic device, connecting various components of the entire electronic device through various interfaces and lines. It executes programs or modules (such as text sentiment classification programs) stored in the memory 11, and calls data stored in the memory 11 to perform various functions of the electronic device and process data.

[0149] The communication bus 12 can be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. This bus can be divided into an address bus, a data bus, a control bus, etc. The communication bus 12 is configured to enable communication between the memory 11 and at least one processor 10, etc. For ease of illustration, only one thick line is used in the figure, but this does not indicate that there is only one bus or one type of bus.

[0150] Figure 3 shows only an electronic device with certain components; it will be understood by those skilled in the art that the structure shown in Figure 3 does not constitute a limitation on the electronic device, which may include fewer or more components than shown, combine certain components, or have a different arrangement of components.

[0151] For example, although not shown, the electronic device may also include a power supply (such as a battery) to power the various components. Preferably, the power supply can be logically connected to the at least one processor 10 through a power management device, thereby enabling functions such as charging management, discharging management, and power consumption management. The power supply may also include one or more DC or AC power supplies, recharging devices, power fault classification circuits, power converters or inverters, power status indicators, and other arbitrary components. The electronic device may also include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be described in detail here.

[0152] Optionally, the communication interface 13 may include a wired interface and / or a wireless interface (such as a Wi-Fi interface, a Bluetooth interface, etc.), which is typically used to establish communication connections between the electronic device and other electronic devices.

[0153] Optionally, the communication interface 13 may further include a user interface, which may be a display, an input unit (such as a keyboard), or, optionally, a standard wired or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, or an OLED (Organic Light-Emitting Diode) touchscreen, etc. The display may also be appropriately referred to as a screen or display unit, used to display information processed in the electronic device and to display a visual user interface.

[0154] It should be understood that the embodiments described are for illustrative purposes only, and that the scope of the patent application is not limited by this structure.

[0155] The text sentiment classification program stored in the memory 11 of the electronic device is a combination of multiple computer programs, which, when run in the processor 10, can achieve the following:

[0156] Obtain a text set, wherein each text in the text set has a corresponding sentiment tag;

[0157] Each text is converted into a vector to obtain a text vector, and the text vectors are used to cluster all the texts in the text set to obtain a preset number of text clusters;

[0158] A predetermined number of texts are randomly selected from any of the text clusters to obtain a training text set;

[0159] The texts in the training text set are selected sequentially as training texts, and synonym replacement is performed on the training texts to obtain the positive sample texts corresponding to the training texts;

[0160] By filtering out similar texts in the training text set that have different sentiment labels from the training text, negative sample texts corresponding to the training text are obtained.

[0161] Using each training sample and the corresponding positive and negative sample texts, train a pre-constructed first model and a second model that are identical to each other to obtain a trained first model and a trained second model.

[0162] The first and second trained models are screened to obtain a text sentiment classification model.

[0163] When the text to be classified is obtained, the text sentiment classification model is used to classify the text to be classified, and the sentiment classification result is obtained.
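The positive and negative sample construction in the steps above can be sketched as follows. This is an illustration under stated assumptions: the synonym table `SYNONYMS` is hypothetical, similarity is computed with a simple bag-of-words cosine rather than the text vectors of the method, and the negative sample is taken to be the most similar text with a different sentiment tag, as claim 1 specifies. The helper names `make_positive`, `make_negative`, and `cosine` are invented for the example.

```python
import math
import random
from collections import Counter

# Hypothetical synonym table; a real system would use a thesaurus resource.
SYNONYMS = {"good": ["great", "fine"], "bad": ["poor", "awful"]}

def make_positive(text, rng):
    """Replace each word that has a known synonym with one of its synonyms."""
    words = text.split()
    return " ".join(rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in words)

def cosine(a, b):
    """Cosine similarity between two bag-of-words count vectors (an assumption)."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def make_negative(text, tag, training_set):
    """Pick the most similar text whose sentiment tag differs from `tag`."""
    candidates = [t for t, other_tag in training_set if other_tag != tag]
    return max(candidates, key=lambda t: cosine(text, t)) if candidates else None

training_set = [
    ("the service was good", "positive"),
    ("the service was bad", "negative"),
    ("delivery arrived on monday", "neutral"),
]
rng = random.Random(0)
anchor, tag = training_set[0]
positive = make_positive(anchor, rng)
negative = make_negative(anchor, tag, training_set)
print(positive, "|", negative)
```

Choosing the most similar differently tagged text as the negative sample gives the model a hard example: the two texts share most of their wording but differ in sentiment.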

[0164] Specifically, for the way in which the processor 10 implements the above-mentioned computer program, refer to the description of the relevant steps in the embodiment corresponding to Figure 1, which is not repeated here.

[0165] Furthermore, if the modules / units integrated into the electronic device are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. The computer-readable medium can be non-volatile or volatile. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a portable hard drive, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).

[0166] Embodiments of the present invention may also provide a computer-readable storage medium storing a computer program, which, when executed by a processor of an electronic device, can perform the following:

[0167] Obtain a text set, wherein each text in the text set has a corresponding sentiment tag;

[0168] Each text is converted into a vector to obtain a text vector, and the text vectors are used to cluster all the texts in the text set to obtain a preset number of text clusters;

[0169] A predetermined number of texts are randomly selected from any of the text clusters to obtain a training text set;

[0170] The texts in the training text set are selected sequentially as training texts, and synonym replacement is performed on the training texts to obtain the positive sample texts corresponding to the training texts;

[0171] By filtering out similar texts in the training text set that have different sentiment labels from the training text, negative sample texts corresponding to the training text are obtained.

[0172] Using each training sample and the corresponding positive and negative sample texts, train a pre-constructed first model and a second model that are identical to each other to obtain a trained first model and a trained second model.

[0173] The first and second trained models are screened to obtain a text sentiment classification model.

[0174] When the text to be classified is obtained, the text sentiment classification model is used to classify the text to be classified, and the sentiment classification result is obtained.

[0175] Furthermore, the computer-usable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store the operating system, applications required for at least one function, etc.; and the data storage area may store data created based on the use of blockchain nodes, etc.

[0176] In the several embodiments provided by this invention, it should be understood that the disclosed devices, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of modules is only a logical functional division, and other division methods may be used in actual implementation.

[0177] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0178] The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.

[0179] Furthermore, the functional modules in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or in the form of hardware plus software functional modules.

[0180] It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that the present invention can be implemented in other specific forms without departing from the spirit or essential characteristics of the present invention.

[0181] Therefore, the embodiments should be considered exemplary and non-limiting in all respects, and the scope of the invention is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be embraced within the invention. No reference sign in the claims should be construed as limiting the claim concerned.

[0182] The blockchain referred to in this invention is a novel application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. Essentially, a blockchain is a decentralized database, a chain of data blocks linked together using cryptographic methods. Each data block contains information about a batch of network transactions, used to verify the validity of the information (anti-counterfeiting) and generate the next block. A blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.

[0183] Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a system claim may also be implemented by a single unit or device through software or hardware. Terms such as "first" and "second" are used to indicate names and do not indicate any specific order.

[0184] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. A text sentiment classification method, characterized in that the method includes: obtaining a text set, wherein each text in the text set has a corresponding sentiment tag; converting each text into a vector to obtain a text vector, and using the text vectors to cluster all the texts in the text set to obtain a preset number of text clusters; randomly selecting a predetermined number of texts from any of the text clusters to obtain a training text set; sequentially selecting the texts in the training text set as training texts, and performing synonym replacement on the training texts to obtain the positive sample texts corresponding to the training texts; selecting the texts in the training text set whose sentiment tags differ from that of the training text to obtain a filtered text set; calculating the similarity between the training text and each text in the filtered text set to obtain the corresponding text similarity; identifying the text with the highest text similarity in the filtered text set as the negative sample text corresponding to the training text; training a pre-constructed first model and second model, which are identical to each other, using each training sample and the corresponding positive and negative sample texts; calculating the first similarity score between the concatenation of the text feature vector of the training sample and the text feature vector of the positive sample text, and calculating the second similarity score between the text feature vector of the training sample and the text feature vector of the negative sample text; calculating, based on a preset loss function, the target loss value using the first and second similarity scores; when the target loss value is less than a preset loss threshold, stopping training and outputting the trained first model and the trained second model; screening the trained first model and the trained second model to obtain a text sentiment classification model; and, when the text to be classified is obtained, using the text sentiment classification model to classify the text to be classified to obtain the sentiment classification result.
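The loss computation and stopping rule recited in claim 1 can be illustrated with the sketch below. The claim only refers to a "preset loss function", so the margin-based form used here is an assumption, as are the cosine similarity, the `margin` value, the `LOSS_THRESHOLD` value, and the function names. The first similarity score is read here as the similarity between the training-sample vector and the positive-sample vector.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def target_loss(anchor, positive, negative, margin=0.5):
    """Margin loss (assumed form): zero once the positive sample is at least
    `margin` more similar to the training sample than the negative sample is."""
    first_score = cosine(anchor, positive)   # first similarity score
    second_score = cosine(anchor, negative)  # second similarity score
    return max(0.0, margin - first_score + second_score)

LOSS_THRESHOLD = 0.05  # hypothetical preset loss threshold

def should_stop(loss, threshold=LOSS_THRESHOLD):
    """Training stops when the target loss value falls below the threshold."""
    return loss < threshold

# Hypothetical feature vectors: the negative sample is far from the anchor.
easy = target_loss([1.0, 0.0], [0.9, 0.1], [0.0, 1.0])
# Here the negative sample is almost as close as the positive sample.
hard = target_loss([1.0, 0.0], [0.9, 0.1], [1.0, 0.1])
print(easy, should_stop(easy), should_stop(hard))
```

A loss of this shape rewards the model for pulling the synonym-replaced text toward the training text while pushing the similar but differently tagged text away.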

2. The text sentiment classification method of claim 1, wherein the step of clustering all texts in the text set using the text vectors to obtain a preset number of text clusters includes: Step A: Randomly select a preset number of text vectors from all the text vectors, and use each selected text vector as a centroid; Step B: Calculate the distance between each text vector and each centroid, and assign each text vector to its nearest centroid to obtain the corresponding initial vector clusters; Step C: Calculate the centroid fluctuation based on the initial vector clusters and the centroids to obtain the centroid fluctuation value; Step D: Determine whether the centroid fluctuation value is 0; Step E: When the centroid fluctuation value is 0, determine the initial vector clusters as text vector clusters, and gather the texts corresponding to all text vectors in each text vector cluster to obtain the corresponding text cluster; Step F: When the centroid fluctuation value is not 0, take the average value of each cluster as the new centroid and return to Step B.
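Steps A to F describe a k-means style procedure, which can be sketched as follows. The claim does not define how the centroid fluctuation value is computed, so this sketch assumes it is the total distance the centroids move in one iteration; the `max_iter` safety cap and the function name `kmeans` are also assumptions.

```python
import math
import random

def kmeans(vectors, k, rng, max_iter=100):
    """Cluster `vectors` into k groups following Steps A to F."""
    # Step A: pick k distinct vectors as the initial centroids.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    clusters = []
    for _ in range(max_iter):
        # Step B: assign each vector to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: math.dist(v, centroids[i]))
            clusters[nearest].append(v)
        # The mean of each cluster is the candidate new centroid (used in Step F).
        new_centroids = [
            [sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        # Step C: centroid fluctuation, assumed to be the total centroid movement.
        fluctuation = sum(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        # Steps D and E: stop when the centroids no longer move.
        if fluctuation == 0:
            return clusters
        # Step F: adopt the cluster means and return to Step B.
        centroids = new_centroids
    return clusters

# Hypothetical two-dimensional text vectors forming two obvious groups.
vectors = [[0, 0], [0, 1], [10, 10], [10, 11]]
clusters = kmeans(vectors, 2, random.Random(0))
print(clusters)
```

In practice a small tolerance is often used instead of an exact zero test, since floating-point centroids may keep changing by negligible amounts; the exact test is kept here to mirror Step D.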

3. The text sentiment classification method of claim 1, wherein, The step of performing synonym replacement on the training text to obtain the corresponding positive sample text includes: Replace any one or more words in the training text with a synonym of the corresponding word to obtain the positive sample text corresponding to the training text.

4. The text sentiment classification method of claim 1, wherein the method further includes: using the first model to extract features from the training text to obtain the training text feature vector; using the second model to extract features from the positive sample text to obtain the positive sample text feature vector; and using the second model to extract features from the negative sample text to obtain the negative sample text feature vector.

5. The text sentiment classification method as described in claim 1, characterized in that, The process of filtering the trained first model and the trained second model to obtain a text sentiment classification model includes: Obtain a test text set, wherein each test text in the test text set has a corresponding sentiment tag; The first model, after training, classifies each test text in the test text set to determine whether the classification result is consistent with the sentiment label of the corresponding test text, thereby obtaining the first test accuracy. The trained second model classifies each test text in the test text set to determine whether the classification result is consistent with the sentiment label of the corresponding test text, thereby obtaining the second test accuracy. Determine whether the accuracy of the first test is greater than the accuracy of the second test, and perform model filtering on the first and second trained models based on the determination result to obtain the text sentiment classification model.

6. The method of text sentiment classification as claimed in claim 5, wherein, The step of filtering the trained first model and the trained second model based on the judgment result to obtain the text sentiment classification model includes: When the judgment result is that the accuracy of the first test is greater than the accuracy of the second test, the first model that has been trained is determined as the text sentiment classification model; If the judgment result is that the accuracy of the first test is not greater than the accuracy of the second test, the trained second model is determined as the text sentiment classification model.
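The screening rule of claims 5 and 6 can be sketched as follows. The models are represented by plain callables that map a text to a sentiment tag, which is an assumption made to keep the example self-contained; `accuracy`, `screen_models`, and the toy test set are hypothetical.

```python
def accuracy(model, test_set):
    """Fraction of test texts whose predicted tag matches the sentiment tag."""
    correct = sum(1 for text, tag in test_set if model(text) == tag)
    return correct / len(test_set)

def screen_models(first_model, second_model, test_set):
    """Keep the first model only if it is strictly more accurate than the second."""
    first_accuracy = accuracy(first_model, test_set)
    second_accuracy = accuracy(second_model, test_set)
    return first_model if first_accuracy > second_accuracy else second_model

# Stand-in "trained models": simple rules instead of neural networks.
def first(text):
    return "positive"

def second(text):
    return "positive" if "great" in text else "negative"

test_set = [("great product", "positive"), ("awful product", "negative")]
chosen = screen_models(first, second, test_set)
print(chosen.__name__)
```

Note that under claim 6 a tie goes to the second model, because the first model is selected only when its test accuracy is strictly greater.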

7. A text sentiment classification apparatus, characterized in that the apparatus includes: a positive and negative sample construction module, used to acquire a text set, wherein each text in the text set has a corresponding sentiment tag; convert each text into a vector to obtain a text vector, and use the text vectors to cluster all texts in the text set to obtain a preset number of text clusters; randomly select a preset number of texts from any text cluster to obtain a training text set; and sequentially select texts from the training text set as training texts, and perform synonym replacement on the training texts to obtain the positive sample texts corresponding to the training texts; the positive and negative sample construction module is also used to select the texts in the training text set whose sentiment tags differ from that of the training text to obtain a filtered text set, calculate the similarity between the training text and each text in the filtered text set to obtain the corresponding text similarity, and identify the text with the highest text similarity in the filtered text set as the negative sample text corresponding to the training text; a model training and screening module, used to train a pre-constructed first model and second model, which are identical to each other, using each training sample and the corresponding positive and negative sample texts; calculate a first similarity score between the concatenation of the text feature vector of the training sample and the text feature vector of the positive sample text, and calculate a second similarity score between the text feature vector of the training sample and the text feature vector of the negative sample text; calculate, based on a preset loss function, a target loss value using the first and second similarity scores; and, when the target loss value is less than a preset loss threshold, stop training and output the trained first model and the trained second model; the model training and screening module is also used to screen the trained first model and the trained second model to obtain a text sentiment classification model; and a text sentiment classification module, used to classify the text to be classified using the text sentiment classification model when the text to be classified is obtained, so as to obtain the sentiment classification result.

8. An electronic device, characterized in that the electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, enables the at least one processor to perform the text sentiment classification method as described in any one of claims 1 to 6.

9. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the text sentiment classification method as described in any one of claims 1 to 6.