Computer-implemented method and apparatus for processing data
By constructing a total word vector as a weighted combination of word vectors, weighted by an attention function in a model that includes a recurrent neural network, the high dimensionality of concatenated word representations is avoided, and the efficiency and accuracy of text classification are improved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- ROBERT BOSCH GMBH
- Filing Date
- 2020-10-28
- Publication Date
- 2026-06-12
AI Technical Summary
The concatenated word representations used in existing technologies result in high-dimensional vectors, which increases the number of parameters that must be learned, and the word representations are not sufficiently adapted to the individual word or its context.
Instead of concatenating word vectors, a total word vector is constructed for classification as a weighted combination of word vectors, using a recurrent neural network and an attention function to weight the word vectors according to the characteristics of the text modules.
This reduces the number of parameters, makes better use of the word vectors, and improves the efficiency and accuracy of text classification.
Figure CN112749276B_ABST
Description
Technical Field
[0001] This disclosure relates to computer-implemented methods and apparatus for processing text data, particularly using artificial neural networks, wherein the text data comprises multiple text modules.
Background Technology
[0002] For example, recurrent neural networks are used in conjunction with Conditional Random Field (CRF) classifiers to process text data. Here, each word of the text is represented by a distributed vector. For this purpose, concatenated word representations are used, for example, that have been trained on large amounts of unlabeled text data. An example of this is disclosed in Akbik et al.'s 2018 paper, "Contextual String Embeddings for Sequence Labeling".
[0003] "A novel neural sequence model with multiple attentions for word sense disambiguations" by Ahmed Mahtab et al., 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), discusses the application of attention mechanisms in models for word sense disambiguation.
[0004] The paper "Dynamic Meta-Embeddings for Improved Sentence Representations" by Douwe Kiela et al., Cornell University Library, 2018, describes a method for supervised learning in NLP systems.
[0005] The concatenated word representations used in existing technologies have the disadvantage of involving high-dimensional vectors. This in turn increases the number of parameters that must be learned in order to perform classification based on the word representations. Furthermore, word representations that are adapted to the respective word or its context are desirable.
Summary of the Invention
[0006] This is achieved by the subject matter of this disclosure.
[0007] This disclosure relates to a computer-implemented method for processing text data comprising multiple text modules, wherein a representation of the text is provided, and wherein a model is used to predict the classification of each text module of the text based on the text representation, wherein providing the text representation includes providing a total word vector for each text module of the text, wherein the total word vector consists of at least two, preferably multiple, word vectors, and the respective word vectors are weighted according to the characteristics of each text module.
[0008] Preferably, text modules are identified according to a model and assigned to a category in a set of categories. A text module is, for example, a word in the text. The model classifies each word in the current text individually as belonging to one of a predefined set of categories, such as people, places, materials, etc.
[0009] Therefore, the total word vector is not a concatenation of individual word vectors, but is advantageously constructed as a weighted combination of word vectors according to the characteristics of each text module. Advantageously, this makes it possible to weight the word vectors in relation to words and / or domains, and thus provides the possibility of preferentially selecting or ignoring specific word vectors in relation to words and / or domains.
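To make the dimensionality argument concrete, the following Python sketch counts the input-side parameters of a one-layer bidirectional LSTM tagger fed either with concatenated word vectors or with a weighted combination of word vectors projected to a common dimension. The LSTM, the four embedding sizes, the common dimension of 256, and the hidden size of 256 are all hypothetical choices, not values from this disclosure, and the small attention-scoring parameters are ignored.

```python
# Hypothetical sizes, chosen only for illustration.
EMBEDDING_DIMS = [300, 1024, 2048, 100]  # four word-vector types
COMMON_DIM = 256                         # shared dimension after linear projection
HIDDEN = 256                             # LSTM hidden size per direction


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a linear map from n_in to n_out."""
    return n_in * n_out + n_out


def bilstm_params(n_in: int, hidden: int) -> int:
    """Parameters of a one-layer bidirectional LSTM.

    Per direction: four gates, each with input weights, recurrent weights
    and one bias vector. (PyTorch's nn.LSTM keeps two bias vectors per
    gate, which adds 4 * hidden parameters per direction.)
    """
    per_direction = 4 * (hidden * n_in + hidden * hidden + hidden)
    return 2 * per_direction


def concatenation_cost(dims, hidden):
    """Concatenation: the LSTM input width is the sum of all dimensions."""
    width = sum(dims)
    return width, bilstm_params(width, hidden)


def weighted_combination_cost(dims, common, hidden):
    """Weighted combination: every word vector is projected to `common`
    dimensions, so the LSTM input width stays `common`. The projection
    parameters are counted as well."""
    projections = sum(linear_params(d, common) for d in dims)
    return common, projections + bilstm_params(common, hidden)


if __name__ == "__main__":
    print("concatenation:       ", concatenation_cost(EMBEDDING_DIMS, HIDDEN))
    print("weighted combination:", weighted_combination_cost(EMBEDDING_DIMS, COMMON_DIM, HIDDEN))
```

With these assumed sizes the concatenated input is 3472 values wide and the count is 7,636,992 parameters, against 256 values and 1,940,480 parameters (projections included) for the weighted combination. The size of the saving depends entirely on the assumed dimensions.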
[0010] The model preferably includes a recurrent neural network. This model is particularly well-suited for classification.
[0011] According to one implementation, the method further includes calculating weights for each word vector. The model, for example, includes an attention function configured to weight the individual word vectors of a text module according to the weights.
[0012] According to one implementation, weights for each word vector are additionally calculated based on the respective word vectors.
[0013] According to one implementation, a first characteristic of each text module represents the relative frequency of the text module in the text, and / or a second characteristic of each text module represents the length of the text module, and / or a third characteristic of each text module represents the form of the text module, and / or a fourth characteristic of each text module represents the syntactic category of the text module. These characteristics are advantageously used to calculate the weights of the respective word vectors.
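One possible way to turn the four characteristics into numbers that an attention function can consume is sketched below. The encoding choices (length in characters, five binary form flags, a one-hot vector over a small hypothetical tag set) are assumptions for illustration and are not specified in the disclosure; the part-of-speech tags are assumed to come from an external tagger.

```python
import string
from collections import Counter

# Hypothetical, reduced part-of-speech tag set used for one-hot encoding.
POS_TAGS = ["NOUN", "VERB", "ADJ", "PROPN", "NUM", "OTHER"]


def word_characteristics(tokens, pos_tags):
    """Return one feature vector per token covering the four characteristics:
    relative frequency, length, form, and syntactic category."""
    if len(tokens) != len(pos_tags):
        raise ValueError("need exactly one POS tag per token")
    counts = Counter(tokens)
    n = len(tokens)
    features = []
    for token, tag in zip(tokens, pos_tags):
        rel_freq = counts[token] / n                    # 1st: relative frequency in the text
        length = float(len(token))                      # 2nd: length in characters
        form = [                                        # 3rd: form flags
            float(token.isupper()),                     # all uppercase
            float(token[:1].isupper()),                 # capitalized
            float(any(c.isdigit() for c in token)),     # contains digits
            float(any(c in string.punctuation for c in token)),
            float(any(ord(c) > 127 for c in token)),    # non-ASCII Unicode symbols
        ]
        pos = [float(tag == t) for t in POS_TAGS]       # 4th: syntactic category (one-hot)
        if tag not in POS_TAGS:
            pos[-1] = 1.0                               # map unknown tags to OTHER
        features.append([rel_freq, length] + form + pos)
    return features
```

Each feature vector has 13 entries: two numeric characteristics, five form flags, and six part-of-speech indicators. In practice the length would likely be normalized before use.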
[0014] According to one implementation, the weights of the word vectors in the total word vector are transformed into a range of values between 0 and 1. For example, the weights are transformed into values between 0 and 1 using the Softmax function, so that they sum to 1.
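For raw scores s_1, ..., s_K of K word vectors, the Softmax weight of vector k is exp(s_k) / (exp(s_1) + ... + exp(s_K)), so every weight lies between 0 and 1 and the weights sum to 1. A minimal Python sketch follows; subtracting the largest score before exponentiating is a standard numerical safeguard that leaves the result unchanged.

```python
import math


def softmax(scores):
    """Map raw attention scores to weights between 0 and 1 that sum to 1.

    Subtracting the maximum score first prevents overflow in exp() for
    large scores without changing the resulting weights.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Higher scores always receive larger weights, and equal scores receive equal weights, which is what allows the scores to express a preference among word vectors.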
[0015] According to one implementation, a total word vector for each text module is constructed by summing at least two, preferably multiple, weighted word vectors. The word vectors are multiplied by their weights and summed to obtain the total word vector. Thus, for each text module, the total word vector is used in the representation of the text.
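The weighted summation can be sketched as follows, assuming all word vectors of a text module already have the same dimension and one weight is given per vector (for example, Softmax outputs). The function name `total_word_vector` is hypothetical. Unlike concatenation, the result has the dimension of a single word vector.

```python
def total_word_vector(word_vectors, weights):
    """Combine equal-length word vectors into one total word vector by
    multiplying each vector by its weight and summing element-wise."""
    if len(word_vectors) != len(weights):
        raise ValueError("need exactly one weight per word vector")
    dim = len(word_vectors[0])
    if any(len(v) != dim for v in word_vectors):
        raise ValueError("all word vectors must have the same dimension")
    total = [0.0] * dim
    for vec, w in zip(word_vectors, weights):
        for i, x in enumerate(vec):
            total[i] += w * x
    return total
```

For example, `total_word_vector([[1.0, 0.0], [0.0, 1.0]], [0.25, 0.75])` returns `[0.25, 0.75]`.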
[0016] This disclosure also relates to a method for machine learning, wherein a model is trained to carry out a method according to an embodiment for automatically classifying text modules of a text based on the representation of the text, wherein the representation includes a total word vector for each text module of the text, wherein the total word vector consists of at least two, preferably multiple, word vectors, and the respective word vectors are weighted according to the characteristics of each text module.
[0017] According to one implementation, the model includes an attention function, and the method includes training the attention function. The attention function is, for example, constructed to weight the individual word vectors of a text module according to weights. Advantageously, the model is trained on text data to compute the weights used for optimizing the individual word vectors.
[0018] According to one implementation, the model is trained to weight specific word vectors, especially domain-specific word vectors, more strongly than other word vectors, especially domain-unspecific word vectors, for specific text modules.
[0019] This disclosure relates to an apparatus for processing text data, wherein the apparatus includes a memory and a processor for an artificial neural network, the memory and processor being configured to implement a method according to an embodiment.
[0020] This disclosure relates to an apparatus for machine learning, wherein the apparatus includes a memory and a processor for an artificial neural network, the memory and processor being configured to implement a method according to an embodiment.
[0021] This disclosure further relates to a computer program that includes computer-readable instructions that, when executed on a computer, carry out a method according to an embodiment.
[0022] Other embodiments relate to computer-implemented methods for processing text data according to embodiments for applications such as automatically extracting information from text data about entities, especially people, places, organizations, etc., and / or about concepts, especially proteins, chemicals, materials, etc.
[0023] Other implementations relate to computer-implemented methods for creating databases in a model, particularly structured knowledge databases, and especially knowledge graphs. According to the implementation, the method is used to extract information from text data and to use said information to create such databases, particularly structured knowledge databases, and especially knowledge graphs.
[0024] The method described in the implementation can be applied to texts in different languages and from different domains.
[0025] Furthermore, the method according to the implementation can also be applied to the fields of computational linguistics, natural language processing, and especially syntactic analysis, relation extraction, and text summarization.
Attached Figure Description
[0026] Further advantageous embodiments emerge from the following description and the drawings. In the drawings:
[0027] Figure 1 shows a schematic diagram of a device for processing text data;
[0028] Figure 2 shows a schematic diagram of a device for machine learning;
[0029] Figure 3 shows the steps of a method for processing text data;
[0030] Figure 4 shows the steps of a method for machine learning; and
[0031] Figure 5 shows a schematic diagram of the representation of a text module.
Detailed Implementation
[0032] Figure 1 shows a device 100 for processing text data 102.
[0033] The device 100 includes a memory 106 and a processor 104 for a model, particularly a recurrent neural network. In this example, the device 100 includes an interface 108 for input and output data. The processor 104, memory 106, and interface 108 are connected via at least one data line 110, particularly a data bus. The processor 104 and memory 106 can be integrated into a microcontroller. The device 100 can also be configured as a distributed system within a server infrastructure. These are configured to carry out the method for processing text data 102 described below with reference to Figure 3. In Figure 1, the text 102 is input at interface 108, and data 102' resulting from processing the text 102 are output at interface 108.
[0034] Figure 2 schematically illustrates a device 200 for machine learning. The device 200 includes a memory 204 and a processor 202 for a neural network. In this example, the device 200 includes an interface 206 for input and output data. The processor 202, memory 204, and interface 206 are connected via at least one data line 208. The device 200 can also be configured as a distributed system within a server infrastructure. These are configured to carry out the method for machine learning described below with reference to Figure 4.
[0035] Figure 3 shows the steps of method 300 for processing text data.
[0036] Method 300 for processing text data comprising multiple text modules includes a step 310 for providing a representation of the text and a step 320 for using a model that predicts a classification for each text module of the text based on the text representation. Step 320 may be performed, for example, using a Conditional Random Field Classifier (CRF).
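The disclosure names a CRF for step 320 but does not detail how it decodes. In a standard linear-chain CRF, prediction selects the label sequence with the highest sum of per-module emission scores and label-transition scores, which the Viterbi algorithm finds exactly. The sketch below assumes the emission scores are produced by the recurrent network from the total word vectors; the function name and any labels or scores used with it are hypothetical.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence for a linear-chain CRF.

    emissions[t][j]:   score of label j for text module t (e.g. from an RNN).
    transitions[i][j]: score of label i being followed by label j.
    """
    n_labels = len(labels)
    # best[j]: best score of any label path ending in label j at position t
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(n_labels):
            # choose the best previous label i for current label j
            candidates = [best[i] + transitions[i][j] for i in range(n_labels)]
            i_best = max(range(n_labels), key=lambda i: candidates[i])
            new_best.append(candidates[i_best] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # follow the back-pointers from the best final label
    j = max(range(n_labels), key=lambda k: best[k])
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [labels[j] for j in path]
```

The transition scores are what distinguish this from picking the best label independently for each text module: a strongly negative score from `O` to `I-PER`, for instance, penalizes label sequences in which an entity starts with an inside tag.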
[0037] Providing 310 a representation of the text includes providing 310' a total word vector for each text module of the text, wherein the total word vector consists of at least two, preferably multiple, word vectors, and the word vectors are weighted according to the characteristics of each text module.
[0038] Preferably, text modules are identified according to a model and assigned to a category in a set of categories. A text module is, for example, a word in a text.
[0039] Providing 310' total word vectors for each text module advantageously includes the following steps:
[0040] Provide at least two, or more, word vectors for each text module.
[0041] Providing 310' advantageously further comprises transforming the word vectors to a common dimension, in particular by means of linear transformations.
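Bringing word vectors of different sizes to a common dimension amounts to one linear map y = W x + b per word-vector type. The sketch below uses random initial values in place of learned parameters; the helper names, the initialization scale, and the vector sizes in the usage example are assumptions.

```python
import random


def make_projection(in_dim, out_dim, seed=0):
    """Return a weight matrix (out_dim x in_dim) and a bias vector for a
    linear map from in_dim to out_dim. The random values stand in for
    parameters that would be learned during training."""
    rng = random.Random(seed)
    scale = 1.0 / in_dim ** 0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(vector, weight, bias):
    """Apply y = W x + b to a single vector."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weight, bias)]


if __name__ == "__main__":
    # Three hypothetical word vectors of different sizes for one text module.
    word_vectors = [[0.1] * 5, [0.2] * 3, [0.3] * 7]
    common_dim = 4
    projected = []
    for k, vec in enumerate(word_vectors):
        # one projection per word-vector type, reused for all text modules
        weight, bias = make_projection(len(vec), common_dim, seed=k)
        projected.append(project(vec, weight, bias))
    print([len(p) for p in projected])  # every projected vector has length 4
```

After this step the vectors can be weighted and summed element-wise, which is not possible while their dimensions differ.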
[0042] In another step 312, weights for the word vectors are calculated. According to this embodiment, weights for the respective word vectors are calculated based on the characteristics of each text module. The model includes, for example, an attention function constructed to weight the individual word vectors of the text module according to the weights.
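A simple attention function of this kind can score each word-vector type with a linear function of the text module's characteristics and normalize the scores with a Softmax. The linear scoring form, the function names, and the hand-set parameters in the usage example are assumptions; they mirror the idea that character-based vectors help for rare or long text modules, whereas in the disclosure such weights are learned.

```python
import math


def softmax(scores):
    """Normalize scores to weights between 0 and 1 that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(characteristics, params):
    """Return one weight per word-vector type for a single text module.

    characteristics: numeric features of the text module.
    params: one (coefficients, bias) pair per word-vector type; the score
            of type k is dot(coefficients_k, characteristics) + bias_k.
    """
    scores = [sum(a * f for a, f in zip(coeffs, characteristics)) + bias
              for coeffs, bias in params]
    return softmax(scores)


if __name__ == "__main__":
    # Hypothetical features: [relative frequency, length in characters].
    # Hypothetical parameters: the word-based vector gains score with
    # frequency, the character-based vector gains score with length.
    params = [([10.0, 0.0], 0.0),   # word-based word vector
              ([0.0, 0.2], 0.0)]    # character-based word vector
    print(attention_weights([0.20, 3], params))   # frequent, short word
    print(attention_weights([0.01, 12], params))  # rare, long word
```

With these parameters the word-based vector receives the larger weight for the frequent short word and the character-based vector receives the larger weight for the rare long word.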
[0043] According to one implementation, a first characteristic of each text module represents the relative frequency of the text module in the text, and / or a second characteristic of each text module represents the length of the text module, and / or a third characteristic of each text module represents the form of the text module, and / or a fourth characteristic of each text module represents the syntactic category of the text module. These characteristics are advantageously used to calculate the weights of the respective word vectors.
[0044] Advantageously, different word vectors are weighted differently according to the characteristics of their respective text modules.
[0045] For example, it may be advantageous to weight character-based word vectors more strongly for text modules with a low relative frequency. This can compensate for the fact that text modules with a low relative frequency may be poorly represented by word-based word vectors, or only partially or not at all covered by them.
[0046] For example, it may be advantageous to weight character-based or subword-based word vectors more strongly for longer text modules. This can compensate for the fact that longer text modules may be poorly represented, or only partially or not at all covered, especially by word-based word vectors.
[0047] The form of a text module is understood as its written form, for example the use of uppercase and lowercase letters, numbers and / or punctuation marks and / or unknown symbols, in particular Unicode symbols. Depending on the form, it may also be advantageous to weight different word vectors with different strengths.
[0048] The syntactic category (part of speech) of a text module is understood as the word class of a language to which it is assigned on the basis of shared grammatical features. It may also be advantageous to weight different word vectors with different strengths depending on the syntactic category of the text module.
[0049] According to one implementation, weights for each word vector are additionally calculated based on the respective word vectors.
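If the weights also depend on the word vectors themselves, the score of each word vector can, for instance, gain an additional term that compares the word vector with a learned vector u. This combined scoring form is an assumption, not the disclosure's stated formula, and it requires the word vectors to share a common dimension.

```python
import math


def softmax(scores):
    """Normalize scores to weights between 0 and 1 that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(characteristics, word_vectors, type_params, u):
    """Weights that depend on the text module's characteristics and on the
    word vectors themselves (hypothetical scoring form).

    score_k = dot(a_k, characteristics) + dot(u, v_k) + c_k
    type_params: one (a_k, c_k) pair per word-vector type.
    u: learned vector shared by all types; it must have the same
       dimension as every word vector v_k.
    """
    scores = []
    for (coeffs, bias), vec in zip(type_params, word_vectors):
        if len(vec) != len(u):
            raise ValueError("word vectors must share the dimension of u")
        feature_term = sum(a * f for a, f in zip(coeffs, characteristics))
        vector_term = sum(ui * vi for ui, vi in zip(u, vec))
        scores.append(feature_term + vector_term + bias)
    return softmax(scores)
```

With `u` set to zero the vector term vanishes and the weighting depends on the characteristics alone.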
[0050] In another step 313, the weights of the word vectors in the total word vector are transformed into a range of values between 0 and 1. For example, the weights are transformed into values between 0 and 1 using the Softmax function, so that they sum to 1.
[0051] In step 314, the sum of at least two, preferably multiple, weighted word vectors constitutes the total word vector for each text module. The word vectors are multiplied by their weights and summed to obtain the total word vector. Thus, for each text module, the total word vector is used in the representation of the text.
[0052] Therefore, the total word vector is not a concatenation of individual word vectors, but is advantageously constructed as a weighted combination of word vectors according to the characteristics of each text module. Advantageously, this makes it possible to weight the word vectors in relation to words and / or domains, and thus provides the possibility of preferentially selecting or ignoring specific word vectors in relation to words and / or domains.
[0053] Based on the text representation provided as described above, the model predicts the classification of each text module. To this end, the model classifies each word in the current text as belonging to one of a predefined set of categories, such as people, places, materials, etc.
[0054] Figure 4 shows the steps of method 400 for machine learning.
[0055] The method 400 for machine learning includes training 410 a model to carry out the method according to the embodiments for automatically classifying text modules of a text based on a representation of the text, wherein the representation includes a total word vector for each text module of the text, wherein the total word vector consists of at least two, preferably multiple, word vectors, and the respective word vectors are weighted according to the characteristics of each text module.
[0056] According to one embodiment, the model includes an attention function, and method 400 includes training 411 the attention function. The attention function is, for example, constructed to weight the individual word vectors of the text module according to their weights. Advantageously, the model is trained on the text data to compute the weights used for optimizing the individual word vectors.
[0057] According to one implementation, the model is trained to weight specific word vectors, especially domain-specific word vectors, more strongly than other word vectors, especially domain-unspecific word vectors, for specific text modules.
[0058] Finally, Figure 5 shows a schematic diagram of the total word vector 500. According to the embodiment shown, the total word vector 500 includes four word vectors 510, each of which is multiplied by a weight 520.
[0059] Other embodiments relate to a computer-implemented method 300 for processing text data according to the embodiments, for applications such as automatically extracting information from text data 102 about entities, especially people, places, organizations, etc. and / or about concepts, especially proteins, chemicals, materials, etc.
[0060] Other embodiments relate to a computer-implemented method 300 for creating a database in a model, particularly a structured knowledge database, particularly a knowledge graph, wherein the method according to the embodiments is used to extract information from text data 102 and to use said information to create the database, particularly a structured knowledge database, particularly a knowledge graph.
Claims
1. A computer-implemented method (300) for processing text data (102) comprising multiple words, for automatically extracting information from the text data (102) and / or for creating a database in a model, the method comprising: providing (310) a representation of the text data (102), wherein for each of the plurality of words, at least two word vectors (510) are generated, a weight is calculated for each of the at least two word vectors, and a total word vector (500) is generated as a weighted combination of the at least two word vectors (510) based on the calculated weights, wherein each calculated weight depends on at least one characteristic of the respective word in the text data (102) and different weights are applied to different word vectors; and using the model to predict the classification of each of the plurality of words based on the representation of the text data (102).
2. The method (300) according to claim 1, wherein the database is a structured knowledge database or a knowledge graph.
3. The method (300) according to any one of the preceding claims, wherein a weight (520) for the respective word vector (510) is additionally calculated based on the respective word vector (510).
4. The method (300) according to any one of claims 1 to 2, wherein a first characteristic of each word represents the relative frequency of the word in the text, and / or a second characteristic of each word represents the length of the word and / or a third characteristic of each word represents the form of the word and / or a fourth characteristic of each word represents the syntactic category of the word.
5. The method (300) according to any one of claims 1 to 2, wherein the weights (520) of the word vectors (510) of the total word vectors (500) are transformed (313) into a range of values between 0 and 1.
6. The method (300) according to any one of claims 1 to 2, wherein the sum of more than two weighted word vectors (510) constitutes (314) the total word vector (500) for each word.
7. The method (300) according to any one of claims 1 to 2, wherein the method is used to extract information from text data (102), and the information is used to create a database.
8. A computer-implemented method (400) for machine learning, wherein a model is trained (410) to implement the method (300) according to any one of claims 1 to 7 to automatically classify words of the text (102) based on a representation of the text (102), wherein the representation includes a total word vector (500) for each word of the text (102), wherein at least two word vectors (510) are generated for each word, a weight is calculated for each of the at least two word vectors, and a total word vector (500) is generated based on the calculated weights as a weighted combination of the at least two word vectors (510), wherein each calculated weight depends on at least one attribute of the respective word in the text (102).
9. The method (400) of claim 8, wherein the model includes an attention function, and the method (400) includes training (411) the attention function.
10. The method (400) of claim 8, wherein the model is trained to weight domain-specific word vectors (510) more strongly than domain-nonspecific word vectors (510) for domain-specific words.
11. An apparatus (100) for processing text data (102), wherein the apparatus (100) includes a memory (106) and a processor (104) for an artificial neural network, the memory and processor being configured to implement the method (300) according to any one of claims 1 to 7.
12. An apparatus (200) for machine learning, wherein the apparatus (200) includes a memory (204) and a processor (202) for an artificial neural network, the memory and processor being configured to implement the method (400) according to any one of claims 8 to 10.
13. A computer program product comprising a computer program, wherein the computer program includes computer-readable instructions that, when implemented on a computer, execute the method (300) according to any one of claims 1 to 7.