Information extraction method and device, processing core, electronic equipment and readable medium
By combining short-term memory information extraction and long-term memory information extraction methods, the problems of gradient vanishing and gradient explosion in recurrent neural networks are solved, improving information extraction efficiency and computation speed.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- LYNXI TECH CO LTD
- Filing Date
- 2021-10-27
- Publication Date
- 2026-06-23
Figure CN116052028B_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of processor technology, and in particular to an information extraction method and apparatus, a processing core, an electronic device, and a readable medium.
Background Technology
[0002] Recurrent Neural Networks (RNNs) are neural networks used to process sequential data. Currently, RNNs are widely used in fields such as natural language processing, video data processing, and audio data processing.
[0003] When processing data, recurrent neural networks (RNNs) need to extract long-term and short-term sequence information from the input sequence data to obtain the contextual semantics of the input sequence data. However, when using RNNs, the stacking of multiple hidden layers can lead to exploding or vanishing gradients, affecting the network's learning performance.
Summary of the Invention
[0004] This disclosure provides an information extraction method and apparatus, a processing core, an electronic device, and a readable medium.
[0005] In a first aspect, this disclosure provides an information extraction method, which includes: acquiring sequence data from a sequence dataset; extracting short-term memory information from the acquired current sequence data to obtain currently forgettable short-term memory information and currently easily remembered short-term memory information; extracting long-term memory information based on the currently forgettable short-term memory information, the currently easily remembered short-term memory information, and the current sequence data to obtain current long-term memory information; acquiring selective forgetting information corresponding to the currently easily remembered short-term memory information, and combining the current long-term memory information and selective forgetting information to obtain the information extraction result of the current sequence data; acquiring the next sequence data from the sequence dataset, until the number of acquisitions equals the total number of sequence data in the sequence dataset, and using the information extraction result of the last acquired sequence data as the information extraction result of the sequence dataset.
[0006] In a second aspect, this disclosure provides an information extraction device, comprising: a short-term information extraction module, used to acquire sequence data from a sequence dataset, and extract short-term memory information from the acquired current sequence data to obtain currently forgettable short-term memory information and currently easily remembered short-term memory information; a long-term information extraction module, used to extract long-term memory information based on the currently forgettable short-term memory information, the currently easily remembered short-term memory information, and the current sequence data to obtain current long-term memory information; an information combining module, used to acquire selective forgetting information corresponding to the currently easily remembered short-term memory information, and combine the current long-term memory information and the selective forgetting information to obtain the information extraction result of the current sequence data; and a result determination module, used to acquire the next sequence data from the sequence dataset until the number of acquisitions equals the total number of sequence data in the sequence dataset, and use the information extraction result of the last acquired sequence data as the information extraction result of the sequence dataset.
[0007] In a third aspect, this disclosure provides a processing core that includes the aforementioned information extraction device.
[0008] In a fourth aspect, this disclosure provides an electronic device comprising: a plurality of processing cores; and an on-chip network configured to exchange data among the plurality of processing cores and to exchange external data; wherein one or more processing cores store one or more instructions, and the one or more instructions are executed by the one or more processing cores to enable the one or more processing cores to perform the aforementioned information extraction method.
[0009] In a fifth aspect, this disclosure provides a computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processing core, implements the above-described information extraction method.
[0010] The information extraction method, apparatus, processing core, electronic device, and readable medium provided in this disclosure can perform short-term memory information extraction once for each current sequence data. That is, at each time step, a short-term memory stimulus is added, and then long-term memory information is extracted based on the short-term memory information extraction result and the current sequence data. The current short-term memory information and long-term memory information are then fused to obtain the information extraction result of the current sequence data, thereby increasing the overall memory retention ratio and improving the ability to extract both short-term and long-term memory information. Furthermore, the extraction of short-term memory information in the embodiments of this disclosure is relatively independent, that is, the extraction of short-term memory information does not depend on the memory information of the previous sequence data. Therefore, parallel computing can be performed, thereby solving the problem of low computational efficiency of recurrent neural networks with gating units and improving computational efficiency.
[0011] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of this disclosure, nor is it intended to limit the scope of this disclosure. Other features of this disclosure will become readily apparent from the following description.
Attached Figure Description
[0012] The accompanying drawings are provided to further illustrate the present disclosure and form part of the specification. They are used together with the embodiments of the present disclosure to explain the disclosure and do not constitute a limitation thereof. The above and other features and advantages will become more apparent to those skilled in the art from the detailed description of exemplary embodiments with reference to the accompanying drawings, in which:
[0013] Figure 1 is a flowchart of an information extraction method provided in an embodiment of this disclosure;
[0014] Figure 2 is a specific flowchart of the short-term memory information extraction provided in an embodiment of this disclosure;
[0015] Figure 3 is a specific flowchart of the long-term memory information extraction provided in an embodiment of this disclosure;
[0016] Figure 4 is a specific flowchart of the memory state update provided in an embodiment of this disclosure;
[0017] Figure 5 is a schematic diagram of the network structure of the spiking memory recurrent unit provided in an embodiment of this disclosure;
[0018] Figure 6 is a schematic diagram of a multilayer spiking memory recurrent network provided in an embodiment of this disclosure;
[0019] Figure 7 is a schematic diagram of a bidirectional spiking memory recurrent network provided in an embodiment of this disclosure;
[0020] Figure 8 shows the long-short-term memory fusion curve of the spiking memory recurrent unit in an embodiment of this disclosure;
[0021] Figure 9 is a block diagram of an information extraction device provided in an embodiment of this disclosure;
[0022] Figure 10 is a block diagram of an electronic device provided in an embodiment of this disclosure.
Detailed Implementation
[0023] To enable those skilled in the art to better understand the technical solutions of this disclosure, exemplary embodiments of this disclosure are described below with reference to the accompanying drawings, including various details of the embodiments of this disclosure to aid understanding. These should be considered merely exemplary. Therefore, those skilled in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of this disclosure. Similarly, for clarity and conciseness, descriptions of well-known functions and structures are omitted in the following description.
[0024] Where there is no conflict, the various embodiments of this disclosure and the features thereof in the embodiments may be combined with each other.
[0025] As used herein, the term “and / or” includes any and all combinations of one or more related enumerated entries.
[0026] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit this disclosure. As used herein, the singular forms “a” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that when the terms “comprising” and/or “made of” are used in this specification, they specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Words such as “connected” or “linked” are not limited to physical or mechanical connections but can include electrical connections, whether direct or indirect.
[0027] Unless otherwise specified, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art and this disclosure, and will not be interpreted as having an idealized or overly formal meaning, unless expressly so defined herein.
[0028] In this embodiment of the disclosure, a recurrent neural network can be used to process time-series data. Time-series data can be understood as data with time-series characteristics, such as data recorded in chronological order, having a logical order, or having contextual relevance.
[0029] In the fields of Natural Language Processing (NLP), audio processing, and video processing, recurrent neural networks can be used to process time-series data such as text data, video data, and / or audio data to solve specific problems such as text classification, syntactic analysis, machine translation, speech recognition, language model building, and behavior recognition.
[0030] In some embodiments, when using an RNN to learn sequential data, the vanishing gradient problem or the exploding gradient problem may be encountered. Vanishing and exploding gradients share the same origin: the gradient that reaches an earlier hidden layer is a product of the factors contributed by all later layers, so earlier hidden layers learn at a very different rate from later hidden layers. In the vanishing case, the learning rate of earlier hidden layers is lower than that of later hidden layers, so as the number of hidden layers increases, the learning accuracy decreases. For example, gradients are used to update the weights of a neural network; if the gradient value becomes very small as the gradient propagates over time, the weights barely change and the network effectively stops learning.
[0031] The vanishing or exploding gradient problem in RNNs arises because RNNs ignore crucial information at long-term, fine-grained intervals during learning and training, resulting in poor retention of long-term memory information. In other words, RNNs are susceptible to short-term memory issues. When the input sequence data is long, the RNN will miss important data from earlier time steps as time progresses. For example, if a sentence is very long, the RNN may forget important information from the beginning of the sentence by the end.
[0032] As the number of hidden layers in a network increases, the vanishing and exploding gradient problems become increasingly pronounced. To address this, the RNN network structure can be improved. The improved model can selectively store information through gating units, allowing for selective memorization and forgetting of information at each time step. This better captures the dependencies between data points with large time step distances in the time series, thereby mitigating the vanishing and exploding gradient problems.
[0033] For example, the improved model could be a Long Short Term Memory (LSTM) network or a Gated Recurrent Unit Network (GRU).
[0034] In Long Short Term Memory (LSTM) networks, three gating units (forget gate, input gate, and output gate) and one memory state unit can be used to extract short- and long-term time series information, thereby achieving good processing performance. However, the addition of these four units makes the computational cost of LSTM four times that of RNN, and these four units rely on the extraction of long-term information while extracting short-term information, resulting in a very slow network processing speed.
[0035] Gated Recurrent Unit Networks (GRUs) consist of two gated units (update gate and reset gate) and one hidden state unit. The GRU network structure represents a further simplification of the LSTM network structure. These three units in the GRU make its computational cost three times that of an RNN. Compared to LSTM, the GRU network structure reduces computational cost and accelerates training efficiency. However, when extracting short-term information, the three units in the GRU still rely on the extraction of long-term information, resulting in relatively low training efficiency.
[0036] This disclosure proposes an information extraction method, apparatus, processing core, electronic device, and readable medium, which can extract both long-term and short-term memory information and improve information extraction efficiency.
[0037] Figure 1 is a flowchart of an information extraction method provided in an embodiment of this disclosure. Referring to Figure 1, the information extraction method includes the following steps.
[0038] S110: Obtain sequence data from the sequence dataset, extract short-term memory information from the obtained current sequence data, and obtain short-term memory information that is currently easily forgotten and short-term memory information that is currently easily remembered.
[0039] S120: Extract long-term memory information based on currently easily forgotten short-term memory information, currently easily remembered short-term memory information, and current sequence data to obtain current long-term memory information.
[0040] S130: Obtain selective forgetting information corresponding to the currently easily remembered short-term memory information, and combine the current long-term memory information with the selective forgetting information to obtain the information extraction result of the current sequence data.
[0041] S140: Obtain the next sequence data from the sequence dataset until the number of acquisitions equals the total number of sequence data in the sequence dataset, and use the information extraction result of the last obtained sequence data as the information extraction result of the sequence dataset.
[0042] According to the information extraction method of this disclosure, short-term memory information extraction is performed once a current sequence data is acquired. That is, a short-term memory stimulus is added at each time step. Then, long-term memory information extraction is performed based on the short-term memory information extraction result and the current sequence data. The current short-term memory information and long-term memory information are fused to obtain the information extraction result of the current sequence data. This allows for a higher overall memory retention rate and improves the ability to extract both short-term and long-term memory information. Furthermore, the extraction of short-term memory information in this disclosure is relatively independent, meaning that the extraction of short-term memory information does not depend on the memory information of the previous sequence data. Therefore, parallel computing can be performed, thereby solving the problem of low computational efficiency of recurrent neural networks with gating units and improving computational efficiency.
[0043] In this embodiment of the disclosure, short-term memory information can be understood as information obtained and retained from the current sequence data; long-term memory information can be understood as information obtained and retained from the current sequence data and multiple sequence data preceding the current sequence data.
[0044] In some embodiments, the sequence data in step S110 above is information in vector form; the sequence dataset includes any one of text data, video data, and audio data; the sequence data includes any one of word vectors, video frame vectors, and audio frame vectors.
[0045] In the following description of the embodiments, the current sequence data is denoted as $x_t$, where $t$ represents the current time; the current sequence data can therefore also be understood as the sequence data at the current time. The previous sequence data is denoted as $x_{t-1}$, that is, the sequence data at the previous time step; the next sequence data is denoted as $x_{t+1}$, that is, the sequence data at the next time step.
[0046] In some embodiments, text data is composed of words arranged according to logic or rules, exhibiting sequence characteristics. Word segmentation of the text data yields segmentation results; mapping the words in the segmentation results to a fixed-dimensional vector space yields word vectors; word vectors can be used to represent the syntactic and semantic information of a word, i.e., word features. In this example, text-type sequence data may include word vectors of multiple words from the segmentation results of the text content. For example, $x_t$ is the word vector of the current word, $x_{t-1}$ is the word vector of the previous word, and $x_{t+1}$ is the word vector of the next word.
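The mapping from segmented words to word vectors can be illustrated with a minimal sketch. The vocabulary, the 3-dimensional vectors, the whitespace segmentation, and the names `EMBEDDINGS`, `segment`, and `to_sequence` are all hypothetical and chosen only for illustration; the disclosure does not specify a segmentation method or an embedding table.

```python
# Hypothetical toy embedding table: each word maps to a fixed 3-dimensional vector.
# The values are illustrative only and lie in [-1, 1].
EMBEDDINGS = {
    "the": [0.1, -0.2, 0.3],
    "cat": [0.7, 0.1, -0.5],
    "sleeps": [-0.4, 0.6, 0.2],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback vector for out-of-vocabulary words (assumption)


def segment(text):
    """Whitespace word segmentation; a stand-in for a real segmenter."""
    return text.lower().split()


def to_sequence(text):
    """Map each segmented word to its word vector, giving x_1, ..., x_T."""
    return [EMBEDDINGS.get(word, UNKNOWN) for word in segment(text)]


sequence = to_sequence("The cat sleeps")
t = 1  # position of the current word
x_prev, x_cur, x_next = sequence[t - 1], sequence[t], sequence[t + 1]
```

Here `x_cur` plays the role of $x_t$, while `x_prev` and `x_next` play the roles of $x_{t-1}$ and $x_{t+1}$.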
[0047] As an example, video data can be composed of consecutive video frames; audio data can be composed of consecutive audio segments; both have sequence characteristics.
[0048] Taking video data as an example, when the video data contains 25 frames per second, each time step (Δt) equals 1/25 of a second. Each frame in a continuous video sequence corresponds to a moment, and the interval between two adjacent moments is one time step. In this example, mapping the video frame data at each moment to a fixed-dimensional vector space yields the vector information of the video frame data at each moment. For example, the current sequence data $x_t$ is the vector information of the video data at the current moment, the previous sequence data $x_{t-1}$ is the vector information of the video data at the previous moment, and the next sequence data $x_{t+1}$ is the vector information of the video data at the next moment.
[0049] Taking audio data as an example, segmenting the audio data into frames yields continuous audio segments, each corresponding to a specific moment in time, with an interval of one time step between adjacent moments. In this example, mapping each audio segment to a fixed-dimensional vector space yields a vector for each audio segment. The current sequence data $x_t$ is the vector information of the audio data at the current moment, the previous sequence data $x_{t-1}$ is the vector information of the audio data at the previous moment, and the next sequence data $x_{t+1}$ is the vector information of the audio data at the next moment.
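As a small worked example of the time-step arithmetic for video, the sketch below computes Δt and the moment of each frame for a 25 frames-per-second video. The function name `frame_times` is hypothetical, and exact fractions are used only to avoid rounding in the illustration.

```python
from fractions import Fraction


def frame_times(frame_rate, num_frames):
    """Return the time step and the moment (in seconds) of each frame."""
    dt = Fraction(1, frame_rate)  # one time step between adjacent frames
    return dt, [i * dt for i in range(num_frames)]


dt, times = frame_times(frame_rate=25, num_frames=4)
# dt is 1/25 s (0.04 s); the frames fall at 0, 1/25, 2/25 and 3/25 seconds.
```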
[0050] It should be understood that the information extraction method of this disclosure can extract information from sequence data in various fields. In other words, this disclosure does not specifically limit the data type to be processed, as long as the processed data meets the characteristics of time series. Therefore, it can be widely used in multiple fields.
[0051] Figure 2 is a specific flowchart of the short-term memory information extraction provided in an embodiment of this disclosure. Referring to Figure 2, in some embodiments, the step in S110 above of extracting short-term memory information from the acquired current sequence data to obtain the currently easily forgotten short-term memory information and the currently easily remembered short-term memory information may specifically include the following sub-steps.
[0052] S21, process the current sequence data according to the preset first short-term memory activation component to obtain the currently easily forgotten short-term memory information; wherein, the parameters of the first short-term memory activation component include the training weights for easily forgotten information and the corresponding bias terms.
[0053] In this step, the first short-term memory activation component can be represented by the following expression (1).
[0054] $$f_t = g_1(w_f * x_t + b_f) \tag{1}$$
[0055] In the above expression (1), $x_t$ is the current sequence data, $w_f$ is the training weight for easily forgotten information, $b_f$ is the bias term corresponding to easily forgotten information, $f_t$ is the currently easily forgotten short-term memory information, and $g_1$ is the first short-term memory activation component, which can be represented by the following expression (2).
[0056]
[0057] In the above expression (2), thresh is the short-term memory forgetting threshold. For example, thresh can be 0.5. It should be understood that the value of thresh can be customized according to actual needs in actual application scenarios, and this disclosure does not impose specific limitations.
[0058] S22, process the current sequence data according to the preset second short-term memory activation component to obtain the currently easily remembered short-term memory information; wherein, the parameters of the second short-term memory activation component include the training weights for easily remembered information and the corresponding bias terms.
[0059] In this step, the second short-term memory activation component can be represented by the following expression (3).
[0060] $$r_t = g_2(w_r * x_t + b_r) \tag{3}$$
[0061] In the above expression (3), $x_t$ is the current sequence data, $w_r$ is the training weight for easily remembered information, $b_r$ is the bias term corresponding to easily remembered information, $r_t$ is the currently easily remembered short-term memory information, and $g_2$ is the second short-term memory activation component, which can be represented by the following expression (4).
[0062]
[0063] In the above expression (4), the meaning of thresh is the same as the meaning of thresh in the above expression (2).
[0064] In the above expressions (1)-(4), the current sequence data is processed by the first short-term memory activation component and the second short-term memory activation component to obtain the currently easily forgotten short-term memory information and the currently easily remembered short-term memory information. Of these, the currently easily remembered short-term memory information has a higher probability of being retained over time during the information extraction process, while the currently easily forgotten short-term memory information has a lower probability of being retained over time.
[0065] In practical applications, short-term memory information can exhibit the following characteristics: information that is of interest to the user is remembered vividly, while information that is not of interest is remembered vaguely. Therefore, in some embodiments, the training weights $w_f$ for easily forgotten information and the training weights $w_r$ for easily remembered information in the above expressions (1) and (3) can be customized by users according to the content they are interested in.
[0066] In some embodiments, the training weights $w_f$ for easily forgotten information, the training weights $w_r$ for easily remembered information, and the current sequence data $x_t$ in the above expressions each take values in the range [-1, 1].
[0067] Through the above steps S21 and S22, short-term memory information can be extracted from the sequence data at each time step. From the above expressions (1) and (3), it can be seen that the current sequence data $x_t$ is passed through the short-term memory activation components $g_1$ and $g_2$ to obtain the easily forgotten information $f_t$ and the easily remembered information $r_t$ directly. The short-term memory information extraction therefore does not require the information of the previous memory unit, so it can be computed in parallel across time steps to improve training efficiency.
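The independence of the short-term extraction from earlier time steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: expressions (2) and (4) are not reproduced in this text, so `g1` and `g2` are ASSUMED here to be the same threshold step function (1 where the input reaches `thresh`, otherwise 0), and `w * x` is ASSUMED to be an element-wise product. All parameter values are hypothetical.

```python
THRESH = 0.5  # short-term memory forgetting threshold (example value from the text)


def step(z, thresh=THRESH):
    """ASSUMED form of g1 and g2: 1.0 where z >= thresh, otherwise 0.0."""
    return [1.0 if v >= thresh else 0.0 for v in z]


def affine(w, x, b):
    """Element-wise w * x + b (ASSUMPTION: per-element weights, not a matrix)."""
    return [wi * xi + bi for wi, xi, bi in zip(w, x, b)]


def short_term(x_t, w_f, b_f, w_r, b_r):
    """Expressions (1) and (3): f_t and r_t are computed from x_t alone."""
    f_t = step(affine(w_f, x_t, b_f))
    r_t = step(affine(w_r, x_t, b_r))
    return f_t, r_t


# Hypothetical parameters and a three-step input sequence, all within [-1, 1].
w_f, b_f = [1.0, 1.0], [0.0, 0.0]
w_r, b_r = [-1.0, 1.0], [0.5, 0.0]
xs = [[0.8, 0.2], [0.1, 0.9], [-0.5, 0.6]]

# No iteration reads the result of another, so the time steps could be
# evaluated in any order or in parallel.
gates = [short_term(x, w_f, b_f, w_r, b_r) for x in xs]
```

Because `short_term` never reads a previous state, the list comprehension could be replaced by any parallel map without changing the result.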
[0068] Figure 3 is a specific flowchart of the long-term memory information extraction provided in an embodiment of this disclosure. Referring to Figure 3, in some embodiments, step S120 may specifically include the following sub-steps.
[0069] S31, based on the currently easily forgotten short-term memory information and the current sequence data, update the long-term memory state corresponding to the previous sequence data to obtain the current long-term memory state.
[0070] S32, extract long-term memory information from the current long-term memory state according to the preset long-term memory stimulus component.
[0071] S33, the extracted long-term memory information is fused with the currently easily remembered short-term memory information to obtain the current long-term memory information.
[0072] Through the above steps S31-S33, the current memory state can be updated based on the stimulation of short-term memory information, and recall can be enhanced by combining the retrieved short-term memory information with long-term memory information, thereby improving the representation ability of the retrieved temporal information.
[0073] Figure 4 is a specific flowchart of the memory state update provided in an embodiment of this disclosure. Referring to Figure 4, in some embodiments, step S31 may specifically include the following sub-steps.
[0074] S41, based on the long-term memory forgetting coefficient, the previous long-term memory state information is attenuated to obtain the corresponding long-term memory attenuation information.
[0075] S42, integrate the currently easily forgotten short-term memory information with the long-term memory decay information to obtain the recall information.
[0076] S43, obtain the selective memory information in the current sequence data that corresponds to the currently easily forgotten short-term memory information.
[0077] S44, combining recall information and selective memory information, to obtain the current long-term memory state.
[0078] In this embodiment of the disclosure, based on the characteristics of biological memory, long-term memory becomes increasingly blurred over time. That is, with the changes and accumulation over time, more and more memories are forgotten. Therefore, the information extraction method of this embodiment of the disclosure can supplement a short-term memory stimulus based on the currently acquired sequence data at every time step (i.e., after acquiring a sequence data from the sequence dataset), and fuse the easily forgotten short-term memory information obtained from the short-term memory stimulus with the decay information of the memory state at the previous moment to obtain the information that can be recalled. The information that can be recalled is then fused with the selectively remembered information in the current sequence data to obtain the current memory state; so that long-term information can be extracted based on the current memory state, thereby combining its own long-term memory decay characteristics to make the overall memory retention rate higher.
[0079] In some embodiments, the memory state update process described by the above steps S41-S44 can be represented by the following expressions (5) and (6).
[0080] $$c_t = f_t \odot c_{t-1} * \alpha + (1 - f_t) \odot x_t \tag{5}$$
[0081] $$\alpha = \exp(-n) * dt \tag{6}$$
[0082] In expressions (5) and (6) above, $c_{t-1}$ represents the previous long-term memory state information and $\alpha$ represents the long-term memory forgetting coefficient. The term $c_{t-1} * \alpha$ indicates that the previous long-term memory state information $c_{t-1}$ is attenuated by the long-term memory forgetting coefficient $\alpha$ to obtain the long-term memory decay information. The term $f_t \odot c_{t-1} * \alpha$ indicates that the currently easily forgotten short-term memory information $f_t$ is fused with the long-term memory decay information to obtain the recall information. The recall information can be understood as the information that can still be recalled from among the currently easily forgotten short-term memory information $f_t$.
[0083] Continuing to refer to expressions (5) and (6), $(1 - f_t) \odot x_t$ represents the selective memory information in the current sequence data that corresponds to the currently easily forgotten short-term memory information, and $c_t$ represents the current long-term memory state obtained by combining the recall information and the selective memory information. The symbol $\odot$ represents the element-wise product of vectors, that is, multiplication of corresponding vector elements, and the symbol $+$ represents vector addition.
[0084] In the above expression (6), since one sequence data is obtained from the sequence dataset for processing at every time step, $n$ can represent the number of sequence data contained in the sequence dataset, and $dt$ is the smallest time unit of the impulse response of a biological neuron, for example, 0.1 ms.
[0085] In some embodiments, the information extraction method of this disclosure can be implemented by a recurrent neural network, which includes at least one spiking memory recurrent unit (SMRU). In the above expression (6), n can be understood as the number of SMRUs.
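The memory state update of expressions (5) and (6) can be worked through numerically with a minimal sketch. The function names, the choice of `n = 2`, and the input vectors are hypothetical; `dt` is taken as the bare number 0.1 (the text's 0.1 ms example with the unit dropped), which is an assumption about how the coefficient is scaled.

```python
import math


def forgetting_coefficient(n, dt):
    """Expression (6): alpha = exp(-n) * dt."""
    return math.exp(-n) * dt


def update_state(c_prev, f_t, x_t, alpha):
    """Expression (5): c_t = f_t (.) c_{t-1} * alpha + (1 - f_t) (.) x_t, element-wise."""
    return [f * c * alpha + (1.0 - f) * x for f, c, x in zip(f_t, c_prev, x_t)]


alpha = forgetting_coefficient(n=2, dt=0.1)  # hypothetical n; dt = 0.1 (assumed unit-free)
c_t = update_state(
    c_prev=[0.5, -0.4],  # previous long-term memory state (hypothetical)
    f_t=[1.0, 0.0],      # easily forgotten gate for the current input (hypothetical)
    x_t=[0.8, 0.2],      # current sequence data (hypothetical)
    alpha=alpha,
)
# Where f_t is 1, the state keeps the decayed previous value (0.5 * alpha);
# where f_t is 0, the state takes the current input (0.2).
```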
[0086] In some embodiments, step S130, which involves obtaining the selective forgetting information corresponding to the currently easily remembered short-term memory information and combining the current long-term memory information with the selective forgetting information to obtain the information extraction result of the current sequence data, can be expressed as the following expressions (7) and (8).
[0087] $$h_t = r_t \odot s(c_t) + (1 - r_t) \odot x_t \tag{7}$$
[0088] $$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{8}$$
[0089] In the above expression (7), $x_t$ represents the current time-series data (input sequence); $s(c_t)$ represents the long-term memory information extracted from the current long-term memory state, that is, the long-term stimulus information of the current memory state; $r_t \odot s(c_t)$ represents the fusion of the easily remembered information with the long-term memory information of the current memory state, which strengthens the memory of information that can be recalled ($x \ge 0$, $s = 1$) and completely forgets information that cannot be recalled ($x < 0$, $s = 0$); $(1 - r_t) \odot x_t$ represents the selective forgetting information in the current sequence data that corresponds to the currently easily remembered short-term memory information; and $h_t$ represents the information extraction result of the current sequence data. The symbols $\odot$ and $+$ have the same meaning as in expressions (5) and (6) above, and will not be repeated here.
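Putting expressions (1), (3), and (5)-(7) together, steps S110-S140 can be sketched as a single loop over the sequence dataset. This is a minimal sketch under stated assumptions rather than the disclosed implementation: `g1` and `g2` are ASSUMED to be a threshold step at `thresh`, `s` is taken as the step at zero described in the text ($s = 1$ for $x \ge 0$, $s = 0$ for $x < 0$), weights are ASSUMED element-wise, the initial long-term state is ASSUMED to be zero, and `dt` is used as the bare number 0.1. The function name `smru_extract` and all values are hypothetical.

```python
import math


def step(z, thresh):
    """Step function: 1.0 where z >= thresh, otherwise 0.0."""
    return [1.0 if v >= thresh else 0.0 for v in z]


def smru_extract(xs, w_f, b_f, w_r, b_r, thresh=0.5, dt=0.1):
    """Run S110-S140 over xs and return the last h_t as the dataset's result."""
    n = len(xs)                    # number of sequence data in the dataset
    alpha = math.exp(-n) * dt      # expression (6)
    c = [0.0] * len(xs[0])         # ASSUMPTION: zero initial long-term state
    h = None
    for x in xs:                   # S140: one sequence data per time step
        # S110: short-term extraction, expressions (1) and (3)
        f = step([w * xi + b for w, xi, b in zip(w_f, x, b_f)], thresh)
        r = step([w * xi + b for w, xi, b in zip(w_r, x, b_r)], thresh)
        # S120: long-term state update, expression (5)
        c = [fi * ci * alpha + (1.0 - fi) * xi for fi, ci, xi in zip(f, c, x)]
        s = step(c, 0.0)           # long-term activation: 1 where c >= 0
        # S130: fuse long-term memory with selective forgetting, expression (7)
        h = [ri * si + (1.0 - ri) * xi for ri, si, xi in zip(r, s, x)]
    return h


# Hypothetical parameters and inputs, all within [-1, 1].
result = smru_extract(
    xs=[[0.8, 0.2], [0.1, 0.9], [-0.5, 0.6]],
    w_f=[1.0, 1.0], b_f=[0.0, 0.0],
    w_r=[-1.0, 1.0], b_r=[0.5, 0.0],
)
```

Only the update of `c` carries information from one time step to the next; the gates `f` and `r` could be precomputed for the whole sequence before the loop.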
[0090] In this embodiment, long-term memory information exhibits memory decay: the older a memory is, the more of it is forgotten over time. Therefore, in this embodiment, short-term memory is activated by extracting short-term memory information, and the extracted, currently easily forgotten short-term memory information is fused with the long-term memory decay information to obtain recall information. This enhances the memory of information that can be recalled and forgets information that cannot be recalled, resulting in a higher overall memory retention rate and improved extraction of both long-term and short-term memory.
[0091] In some embodiments, the information extraction method of this disclosure is implemented by a recurrent neural network. The recurrent neural network includes at least one spiking memory recurrent unit (SMRU). The SMRU includes a first short-term memory activation component, a second short-term memory activation component, and a long-term memory activation component. The SMRU is used to process the sequence data in the sequence dataset to obtain the information extraction result of the sequence data.
[0092] Figure 5 shows a schematic diagram of the network structure of the SMRU provided in an embodiment of this disclosure. In Figure 5, g1 represents the first short-term memory activation component, g2 represents the second short-term memory activation component, and s represents the long-term memory activation component.
[0093] In Figure 5, $x_t$ is the current time-series data (input sequence) and $h_t$ is the information extraction result (output sequence) of the current sequence data. The process of performing information extraction on the current time-series data $x_t$ to obtain the information extraction result $h_t$ of the current sequence data may include the following steps.
[0094] First, the current time-series data $x_t$ is processed by the first short-term memory activation component g1 to obtain the currently easily forgotten short-term memory information $f_t$, and the current time-series data $x_t$ is processed by the second short-term memory activation component g2 to obtain the currently easily remembered short-term memory information $r_t$.
[0095] Second, based on the long-term memory forgetting coefficient $\alpha$, the previous long-term memory state $c_{t-1}$ is decayed to obtain the corresponding long-term memory decay information $c_{t-1} \cdot \alpha$; the currently easily forgotten short-term memory information $f_t$ is then fused with the long-term memory decay information $c_{t-1} \cdot \alpha$ to obtain the recall information $f_t \odot (c_{t-1} \cdot \alpha)$.
[0096] Next, the selective memory weight $(1 - f_t)$ is determined, and the selective memory information $(1 - f_t) \odot x_t$ in the current sequence data that corresponds to the currently easily forgotten short-term memory information is calculated. This selective memory information $(1 - f_t) \odot x_t$ is fused with the recall information $f_t \odot (c_{t-1} \cdot \alpha)$ to obtain the updated current long-term memory state $c_t$.
[0097] Next, the current long-term memory state $c_t$ is processed by the long-term memory activation component s to obtain the long-term memory information $s(c_t)$ extracted from the current long-term memory state; the currently easily remembered information $r_t$ is fused with the long-term memory information $s(c_t)$ of the current memory state to enhance the memory that can currently be recalled, yielding the current long-term memory information $r_t \odot s(c_t)$.
[0098] Then, the selective forgetting weight $(1 - r_t)$ is determined, and the selective forgetting information $(1 - r_t) \odot x_t$ in the current sequence data that corresponds to the currently easily remembered short-term memory information is calculated.
[0099] Finally, the selective forgetting information $(1 - r_t) \odot x_t$ corresponding to the currently easily remembered short-term memory information is combined with the current long-term memory information $r_t \odot s(c_t)$ to obtain the information extraction result $h_t$ of the current sequence data.
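The steps above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the gate vectors f_t and r_t are taken as given inputs (the internals of g1 and g2 are not modeled), s is assumed to be the step function described for expression (7), and the names `smru_step` and `step_activation` are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def step_activation(c: Vector) -> Vector:
    """Assumed long-term activation s: 1 where the state is >= 0, else 0."""
    return [1.0 if v >= 0 else 0.0 for v in c]


def smru_step(x_t: Vector, c_prev: Vector, f_t: Vector, r_t: Vector,
              alpha: float) -> Tuple[Vector, Vector]:
    """One unit update: returns (h_t, c_t) for the current sequence data."""
    # Decay the previous long-term state with the forgetting coefficient alpha.
    decayed = [alpha * c for c in c_prev]
    # Recall information: f_t (element-wise) times the decayed state.
    recall = [f * d for f, d in zip(f_t, decayed)]
    # Selective memory information: (1 - f_t) (element-wise) times x_t.
    selective_memory = [(1.0 - f) * x for f, x in zip(f_t, x_t)]
    # Current long-term memory state c_t.
    c_t = [a + b for a, b in zip(recall, selective_memory)]
    # Current long-term memory information: r_t (element-wise) times s(c_t).
    long_term = [r * s for r, s in zip(r_t, step_activation(c_t))]
    # Selective forgetting information: (1 - r_t) (element-wise) times x_t.
    selective_forgetting = [(1.0 - r) * x for r, x in zip(r_t, x_t)]
    # Information extraction result h_t, as in expression (7).
    h_t = [a + b for a, b in zip(long_term, selective_forgetting)]
    return h_t, c_t


if __name__ == "__main__":
    h, c = smru_step(x_t=[1.0, -2.0], c_prev=[4.0, 2.0],
                     f_t=[1.0, 0.0], r_t=[1.0, 0.0], alpha=0.5)
    print(h, c)
```

Passing the gates in as arguments keeps the sketch independent of how g1 and g2 are parameterized.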
[0100] The SMRU of this embodiment can exploit the parallelization characteristics of artificial neural networks to balance short-term and long-term memory. The extraction of short-term memory information no longer depends heavily on long-term memory information, which makes the computation easier to parallelize and improves training efficiency. In addition, the forgetting mechanism added for the long-term memory state $c_t$ is closer to biological characteristics, and short-term and long-term memory information are ultimately fused to produce the hidden state output, that is, the information extraction result $h_t$ of the current sequence data. Therefore, the SMRU of this embodiment can combine the neuroscientific characteristics of short-term and long-term memory (short-term self-attention memory and long-term recall and forgetting) and take advantage of the low power consumption and low computational cost of spike excitation to extract short-term and long-term memory, thereby improving information extraction efficiency.
[0101] The network structure of the recurrent neural network according to embodiments of this disclosure is described below with reference to Figure 6 and Figure 7. Figure 6 shows a schematic diagram of a multilayer spiking memory recurrent network provided in an embodiment of this disclosure. Figure 7 shows a schematic diagram of a bidirectional spiking memory recurrent network provided in an embodiment of this disclosure.
[0102] In Figure 6 and Figure 7, $x_t$ is the current time-series data (input sequence); $h_t$ is the information extraction result (output sequence) of the current sequence data; $U_t$ represents the SMRU at the current moment; t-1, t, and t+1 represent the previous moment, the current moment, and the next moment, respectively; $x_{t-1}$ and $x_{t+1}$ represent the previous time-series data and the next time-series data (that is, the time-series data at the previous moment t-1 and at the next moment t+1), respectively; and $h_{t-1}$ and $h_{t+1}$ represent the information extraction results corresponding to $x_{t-1}$ and $x_{t+1}$, respectively.
[0103] The recurrent neural networks in Figure 6 and Figure 7 can both be unrolled over time; at each time iteration, the SMRU at the current time step simultaneously receives the current time-series data and the long-term memory state $c_{t-1}$ corresponding to the previous time-series data.
[0104] Figure 6 and Figure 7 differ in that, when there are multiple SMRUs (more than one), the SMRUs in the recurrent neural network shown in Figure 6 form a unidirectionally connected network structure, whereas the SMRUs in the recurrent neural network shown in Figure 7 form a bidirectional network structure.
[0105] In the unidirectional network structure shown in Figure 6, the self-loop of the SMRU collects feature information at different times and extracts complete time-series information, forming a multilayer spiking memory recurrent network. In the bidirectional network structure shown in Figure 7, in addition to the unidirectional (t-1, t, t+1) extraction of time-series information, multiple SMRUs can extract time-series information in the reverse order (t+1, t, t-1), thereby better extracting contextual information.
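The unrolling described for Figure 6 and Figure 7 can be sketched as follows. This is an illustrative sketch only: `toy_step` is a hypothetical stand-in for an SMRU, feeding one layer's outputs into the next layer and merging the two directions by concatenation are assumptions (the text does not state how results are combined), and all function names are invented for this example.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# A recurrent step maps (x_t, c_prev) to (h_t, c_t).
Step = Callable[[Vector, Vector], Tuple[Vector, Vector]]


def run_forward(step: Step, xs: List[Vector], c0: Vector) -> List[Vector]:
    """Unroll one unit over time in the order t-1, t, t+1."""
    outputs, c = [], c0
    for x in xs:
        h, c = step(x, c)  # each step receives x_t and the previous state
        outputs.append(h)
    return outputs


def run_stacked(layers: List[Step], xs: List[Vector], c0: Vector) -> List[Vector]:
    """Multilayer variant: each layer consumes the previous layer's outputs (assumed)."""
    seq = xs
    for step in layers:
        seq = run_forward(step, seq, c0)
    return seq


def run_bidirectional(fwd: Step, bwd: Step, xs: List[Vector],
                      c0: Vector) -> List[Vector]:
    """Bidirectional variant: a second pass reads the data in reverse order."""
    forward = run_forward(fwd, xs, c0)
    backward = run_forward(bwd, xs[::-1], c0)[::-1]  # realign to original order
    return [f + b for f, b in zip(forward, backward)]  # concatenate (assumed)


def toy_step(x: Vector, c_prev: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical placeholder unit: halves the old state and adds the input."""
    c = [0.5 * cp + xi for cp, xi in zip(c_prev, x)]
    return c, c


if __name__ == "__main__":
    data = [[1.0], [2.0], [4.0]]
    print(run_forward(toy_step, data, [0.0])[-1])  # result of the last sequence data
    print(run_bidirectional(toy_step, toy_step, data, [0.0]))
```

Taking the last element of the forward outputs mirrors the rule that the result for the last obtained sequence data serves as the result for the whole sequence dataset.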
[0106] In practical applications, some recurrent neural networks, such as RNNs, LSTMs, and GRUs, can also have unidirectional or bidirectional network structures. The recurrent neural network of this embodiment differs from these RNNs, LSTMs, and GRUs in the processing logic of its recurrent unit, which is the SMRU. When the SMRU extracts information from the current time-series data, the extraction and computation of short-term memory are relatively independent and do not rely on the memory information of the previous time step, resulting in higher parallel computing efficiency.
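The independence described above can be illustrated with a small sketch in which the short-term gates for every time step are computed before any recurrence runs, leaving only the long-term state as a sequential scan. The gate form here is an assumption made for illustration: `spike_gate` uses a single scalar weight, bias, and threshold per gate, which is a simplification and not the parameterization given in this disclosure; all names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
GateParams = Tuple[float, float, float]  # (weight, bias, thresh), all hypothetical


def spike_gate(x: Vector, params: GateParams) -> Vector:
    """Assumed gate: emits 1.0 where weight * x + bias reaches thresh, else 0.0."""
    weight, bias, thresh = params
    return [1.0 if weight * v + bias >= thresh else 0.0 for v in x]


def precompute_gates(xs: List[Vector], forget_params: GateParams,
                     remember_params: GateParams) -> Tuple[List[Vector], List[Vector]]:
    """Phase 1: gates depend only on x_t, so every time step is independent
    and could be evaluated in parallel."""
    f_all = [spike_gate(x, forget_params) for x in xs]
    r_all = [spike_gate(x, remember_params) for x in xs]
    return f_all, r_all


def scan_long_term(xs: List[Vector], f_all: List[Vector], alpha: float,
                   c0: Vector) -> List[Vector]:
    """Phase 2: only the long-term state c_t is computed sequentially."""
    states, c = [], c0
    for x, f in zip(xs, f_all):
        c = [fi * alpha * ci + (1.0 - fi) * xi for fi, ci, xi in zip(f, c, x)]
        states.append(c)
    return states


if __name__ == "__main__":
    xs = [[1.0, -1.0], [0.2, 0.9]]
    f_all, r_all = precompute_gates(xs, (1.0, 0.0, 0.5), (1.0, 0.0, 0.0))
    print(f_all, r_all)
    print(scan_long_term(xs, f_all, alpha=0.5, c0=[0.0, 0.0]))
```

Because the assumed gates are binary, the gate vectors are sparse, so many element-wise products in the scan reduce to zero or to a plain copy.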
[0107] Furthermore, the long- and short-term spike excitation mechanism of the SMRU in this embodiment works as follows: a short-term strong-memory spike threshold (thresh) is applied to extract the currently easily remembered short-term memory information and the currently easily forgotten short-term memory information, which are then fused with the decayed long-term memory information. Feature information is sparsified throughout the vector computation, which reduces the amount of computation compared with RNN, LSTM, and GRU while improving processing efficiency, thereby improving the utilization of the information extracted from the sequence data. Compared with an RNN, the vector computation can also effectively address the gradient explosion and gradient vanishing problems, and the fusion of long- and short-term memory information extracts temporal information better.
[0108] Figure 8 shows the long- and short-term memory fusion curve of the SMRU in an embodiment of this disclosure. In Figure 8, the horizontal axis represents time, where dt is the smallest time unit of the spike response of a biological neuron; the vertical axis represents the memory information that is stored and retained.
[0109] In Figure 8, curve C1 is the memory decay curve of a biological neuron. Curve C1 reflects the memory characteristic of biological neurons: as time passes and accumulates, more and more memory is forgotten. In the figure, g represents the short-term memory activation component, and $\alpha$ represents the long-term memory decay coefficient.
[0110] As can be seen from Figure 8, in the SMRU of this embodiment, short-term memory activation can be performed once every dt by the short-term memory activation component g and then combined with the unit's own long-term memory decay characteristic $\alpha$ to obtain curve C2, that is, the long- and short-term memory fusion curve. Curve C2 shows that the SMRU and the information extraction process of this embodiment can increase the overall memory retention percentage: combining short-term memory activation with long-term memory activation enhances recall, which updates the current memory state and improves the representation of temporal information.
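The qualitative difference between curves C1 and C2 can be reproduced with a toy scalar simulation. This is only a sketch under strong assumptions and does not regenerate Figure 8: pure decay is modeled as repeated multiplication by $\alpha$, the fused curve reuses the long-term state update with a constant scalar gate value and a constant stimulus, and every parameter value is illustrative.

```python
from typing import List


def decay_curve(alpha: float, steps: int) -> List[float]:
    """C1-like curve: retained memory after k intervals of pure decay."""
    return [alpha ** k for k in range(steps + 1)]


def fused_curve(alpha: float, gate: float, stimulus: float,
                steps: int) -> List[float]:
    """C2-like curve: decay combined with a short-term activation every dt."""
    memory = 1.0
    curve = [memory]
    for _ in range(steps):
        # Same form as the long-term state update, applied to a scalar.
        memory = gate * alpha * memory + (1.0 - gate) * stimulus
        curve.append(memory)
    return curve


if __name__ == "__main__":
    c1 = decay_curve(alpha=0.5, steps=3)
    c2 = fused_curve(alpha=0.5, gate=0.5, stimulus=1.0, steps=3)
    for k, (a, b) in enumerate(zip(c1, c2)):
        print(f"t = {k} dt: decay only = {a:.4f}, fused = {b:.4f}")
```

With these assumed values the fused curve stays above the pure decay curve at every step after the first, which is the retention effect the figure is described as showing.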
[0111] Figure 9 is a block diagram of an information extraction device provided in an embodiment of the present disclosure.
[0112] Referring to Figure 9, this disclosure provides an information extraction device 900, which includes the following modules.
[0113] The short-term information extraction module 910 is used to obtain sequence data from the sequence dataset, extract short-term memory information from the obtained current sequence data, and obtain short-term memory information that is currently easily forgotten and short-term memory information that is currently easily remembered.
[0114] The long-term information extraction module 920 is used to extract long-term memory information based on the currently easily forgotten short-term memory information, the currently easily remembered short-term memory information, and the current sequence data, so as to obtain the current long-term memory information.
[0115] The information combination module 930 is used to acquire selective forgetting information corresponding to the currently easily remembered short-term memory information, and combine the current long-term memory information and the selective forgetting information to obtain the information extraction result of the current sequence data.
[0116] The result determination module 940 is used to obtain the next sequence data from the sequence dataset until the number of times it is obtained is equal to the total number of sequence data in the sequence dataset, and to use the information extraction result of the last obtained sequence data as the information extraction result of the sequence dataset.
[0117] In some embodiments, the short-term information extraction module 910 may include: a first short-term activation unit, configured to process the current sequence data according to a preset first short-term memory activation component to obtain the currently easily forgotten short-term memory information, wherein the parameters of the first short-term memory activation component include training weights for easily forgotten information and corresponding bias terms; and a second short-term activation unit, configured to process the current sequence data according to a preset second short-term memory activation component to obtain the currently easily remembered short-term memory information, wherein the parameters of the second short-term memory activation component include training weights for easily remembered information and corresponding bias terms.
[0118] In some embodiments, the long-term information extraction module 920 includes: a memory state update unit, configured to update the long-term memory state corresponding to the previous sequence data according to the currently easily forgotten short-term memory information and the current sequence data, to obtain the current long-term memory state; a long-term memory activation unit, configured to extract long-term memory information from the current long-term memory state according to a preset long-term memory activation component; and a first fusion unit, configured to fuse the extracted long-term memory information with the currently easily remembered short-term memory information to obtain the current long-term memory information.
[0119] In some embodiments, the memory state update unit includes: a decay processing subunit, used to decay the previous long-term memory state information according to the long-term memory forgetting coefficient to obtain the corresponding long-term memory decay information; a recall evocation subunit, used to fuse the currently easily forgotten short-term memory information with the long-term memory decay information to obtain recall information; a selective information acquisition subunit, used to acquire the selective memory information in the current sequence data that corresponds to the currently easily forgotten short-term memory information; and an information combining subunit, used to combine the recall information and the selective memory information to obtain the current long-term memory state.
[0120] In some embodiments, the information extraction device is implemented by a recurrent neural network. The recurrent neural network includes at least one spiking memory recurrent unit (SMRU). The SMRU includes a first short-term memory activation component, a second short-term memory activation component, and a long-term memory activation component. The SMRU is used to process the sequence data in the sequence dataset to obtain the information extraction result of the sequence data.
[0121] In some embodiments, the sequence data is information in vector form; the sequence dataset includes any one of text data, video data, and audio data; the sequence data includes any one of word vectors, video frame vectors, and audio frame vectors.
[0122] According to the information extraction device of the present disclosure, the extraction of short-term memory information does not depend on the memory information of the previous sequence data, so it can be computed in parallel, thereby solving the problem of low computational efficiency of recurrent neural networks with gating units and improving computational efficiency.
[0123] It should be clarified that this disclosure is not limited to the specific configurations and processes described in the foregoing embodiments and shown in the figures. For convenience and brevity, detailed descriptions of known methods are omitted here; for the specific working processes of the systems, modules, and units described above, refer to the corresponding processes in the foregoing method embodiments, which are not repeated here.
[0124] This disclosure also provides a processing core that includes the information extraction device described above.
[0125] This disclosure also provides a processing core for loading the recurrent neural network in this disclosure to complete information extraction.
[0126] Figure 10 is a block diagram of an electronic device provided in an embodiment of the present disclosure.
[0127] Referring to Figure 10, this disclosure provides an electronic device that includes multiple processing cores 1001 and an on-chip network 1002. The multiple processing cores 1001 are all connected to the on-chip network 1002, and the on-chip network 1002 is used to exchange data among the multiple processing cores as well as external data.
[0128] One or more processing cores 1001 store one or more instructions, and the one or more instructions are executed by one or more processing cores 1001 to enable one or more processing cores 1001 to perform the above-mentioned information extraction method.
[0129] Furthermore, this disclosure also provides a computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processing core, implements the above-described information extraction method.
[0130] It will be understood by those skilled in the art that all or some of the steps, systems, or apparatuses disclosed above, and their functional modules / units, can be implemented as software, firmware, hardware, or suitable combinations thereof. In hardware implementations, the division between functional modules / units mentioned in the above description does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be performed collaboratively by several physical components. Some or all physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit (ASIC). Such software may be distributed on a computer-readable medium, which may include computer storage media (or non-transitory media) and communication media (or transient media). As is known to those skilled in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cartridges, magnetic tape, disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and can be accessed by a computer. 
Furthermore, it is well known to those skilled in the art that communication media typically contain computer-readable instructions, data structures, program modules, or other data in modulated data signals such as carrier waves or other transmission mechanisms, and may include any information delivery medium.
[0131] Example embodiments have been disclosed herein, and while specific terminology has been used, it is for illustrative purposes only and should be construed as such, and is not intended to be limiting. In some instances, it will be apparent to those skilled in the art that features, characteristics, and / or elements described in connection with particular embodiments may be used alone, or in combination with features, characteristics, and / or elements described in connection with other embodiments, unless otherwise expressly indicated. Therefore, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of this disclosure as set forth by the appended claims.
Claims
1. An information extraction method, comprising: obtaining sequence data from a sequence dataset; extracting short-term memory information from the obtained current sequence data to obtain currently easily forgotten short-term memory information and currently easily remembered short-term memory information; extracting long-term memory information based on the currently easily forgotten short-term memory information, the currently easily remembered short-term memory information, and the current sequence data to obtain current long-term memory information; obtaining selective forgetting information corresponding to the currently easily remembered short-term memory information, and combining the current long-term memory information and the selective forgetting information to obtain an information extraction result of the current sequence data; and obtaining the next sequence data from the sequence dataset until the number of acquisitions equals the total number of sequence data in the sequence dataset, and using the information extraction result of the last obtained sequence data as the information extraction result of the sequence dataset; wherein the sequence data is information in vector form; the sequence dataset includes any one of text data, video data, and audio data; and the sequence data includes any one of word vectors, video frame vectors, and audio frame vectors.
2. The method according to claim 1, wherein the step of extracting short-term memory information from the obtained current sequence data to obtain the currently easily forgotten short-term memory information and the currently easily remembered short-term memory information includes: processing the current sequence data according to a preset first short-term memory activation component to obtain the currently easily forgotten short-term memory information, wherein the parameters of the first short-term memory activation component include training weights for easily forgotten information and corresponding bias terms; and processing the current sequence data according to a preset second short-term memory activation component to obtain the currently easily remembered short-term memory information, wherein the parameters of the second short-term memory activation component include training weights for easily remembered information and corresponding bias terms.
3. The method according to claim 1, wherein the step of extracting long-term memory information based on the currently easily forgotten short-term memory information, the currently easily remembered short-term memory information, and the current sequence data to obtain the current long-term memory information includes: updating the long-term memory state corresponding to the previous sequence data based on the currently easily forgotten short-term memory information and the current sequence data to obtain the current long-term memory state; extracting long-term memory information from the current long-term memory state based on a preset long-term memory activation component; and fusing the extracted long-term memory information with the currently easily remembered short-term memory information to obtain the current long-term memory information.
4. The method according to claim 3, wherein the step of updating the long-term memory state corresponding to the previous sequence data based on the currently easily forgotten short-term memory information and the current sequence data to obtain the current long-term memory state includes: decaying the previous long-term memory state information based on the long-term memory forgetting coefficient to obtain the corresponding long-term memory decay information; fusing the currently easily forgotten short-term memory information with the long-term memory decay information to obtain recall information; obtaining the selective memory information in the current sequence data that corresponds to the currently easily forgotten short-term memory information; and combining the recall information and the selective memory information to obtain the current long-term memory state.
5. The method according to claim 1, wherein the method is implemented by a recurrent neural network, the recurrent neural network includes at least one spiking memory recurrent unit, and the spiking memory recurrent unit includes a first short-term memory activation component, a second short-term memory activation component, and a long-term memory activation component; the spiking memory recurrent unit is used to process the sequence data in the sequence dataset to obtain the information extraction result of the sequence data.
6. An information extraction device, comprising: a short-term information extraction module, used to obtain sequence data from a sequence dataset and extract short-term memory information from the obtained current sequence data to obtain currently easily forgotten short-term memory information and currently easily remembered short-term memory information; a long-term information extraction module, used to extract long-term memory information based on the currently easily forgotten short-term memory information, the currently easily remembered short-term memory information, and the current sequence data to obtain current long-term memory information; an information combination module, used to obtain selective forgetting information corresponding to the currently easily remembered short-term memory information, and combine the current long-term memory information and the selective forgetting information to obtain an information extraction result of the current sequence data; and a result determination module, used to obtain the next sequence data from the sequence dataset until the number of acquisitions equals the total number of sequence data in the sequence dataset, and to use the information extraction result of the last obtained sequence data as the information extraction result of the sequence dataset; wherein the sequence data is information in vector form; the sequence dataset includes any one of text data, video data, and audio data; and the sequence data includes any one of word vectors, video frame vectors, and audio frame vectors.
7. A processing core comprising the information extraction device of claim 6.
8. An electronic device, comprising: multiple processing cores; and an on-chip network configured to exchange data among the multiple processing cores as well as external data; wherein one or more of the processing cores store one or more instructions, and the one or more instructions are executed by the one or more processing cores to enable the one or more processing cores to perform the information extraction method of any one of claims 1-5.
9. A computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processing core, implements the information extraction method of any one of claims 1-5.