Language decoding method and device based on electroencephalogram signals and electronic equipment

By employing a parallel decoding architecture and multi-dimensional decomposition, the problems of model learning difficulty and insufficient generalization ability in Chinese tone language decoding are solved, achieving high-precision and high-stability decoding under limited data conditions.

CN121905152B · Active · Publication Date: 2026-06-19 · AFFILIATED HUSN HOSPITAL OF FUDAN UNIV +1


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: AFFILIATED HUSN HOSPITAL OF FUDAN UNIV
Filing Date: 2026-03-24
Publication Date: 2026-06-19

Application Information

Patent Timeline

24 Mar 2026: Application
19 Jun 2026: Publication (CN121905152B)

IPC: G10L15/00; G10L15/22; G10L15/32; G10L15/02; G10L19/16; G10L21/01; G06F18/10; G06F18/25; G06N3/04

AI Tagging

Application Domain

Neural architectures; Speech recognition

Technology Topics

Syllable; Decoding methods




AI Technical Summary

Technical Problem

Existing speech brain-computer interface technologies face challenges in decoding tonal languages such as Mandarin, including difficulties in model learning and insufficient generalization due to the massive number of syllable categories. In particular, they struggle to effectively handle unfamiliar syllables under limited training data conditions.

Method used

A parallel decoding architecture is adopted to decompose the language decoding of EEG signals into multiple independent neural network models, which respectively process sub-units of orthogonal dimensions such as acoustic segment dimension and suprasegmental pitch dimension, and generate legal syllable sequences through temporal probability accumulation, phonological rule verification and semantic context disambiguation.

Benefits of technology

Under limited data conditions, the model's zero-shot generalization ability is significantly improved, ensuring the stability and accuracy of decoding results and addressing the decoding challenges of tonal languages such as Chinese.

✦ Generated by Eureka AI based on patent content.


Patent Text Reader

Abstract

This invention relates to a language decoding method, apparatus, and electronic device based on electroencephalogram (EEG) signals. The method includes: acquiring EEG signals from a subject and preprocessing them to obtain a neural feature time series; inputting the neural feature time series into a parallel decoding architecture, which includes at least two decoding branches for decoding sub-units of language with different orthogonal dimensions from the neural feature time series; obtaining the sub-unit probability sequence output by each decoding branch through the parallel decoding architecture; performing temporal probability accumulation and fusion on the sub-unit probability sequences of each decoding branch to obtain a stable sub-unit sequence; performing legality verification and combination of the stable sub-unit sequences according to phonological rules to generate a legal syllable sequence; and performing disambiguation processing on the syllable sequence based on semantic context to output continuous natural language text. This invention achieves effective modeling and generalized decoding of massive syllable categories under limited training data conditions.


Description

Technical Field

[0001] This invention relates to the field of brain-computer interface technology, and in particular to a language decoding method, device, and electronic device based on electroencephalogram (EEG) signals.

Background Technology

[0002] Brain-computer interface (BCI) technology aims to establish a direct communication pathway between the brain and external devices. Decoding the user's linguistic intent from recorded neural signals is extremely challenging. Existing speech BCI decoding schemes typically borrow from the technical approach of automatic speech recognition, employing an end-to-end deep learning model architecture. These methods first acquire and preprocess electrical signals from the cerebral cortex, extracting neural activity features in specific frequency bands. Then, a single decoding model (e.g., a recurrent neural network or a convolutional neural network) is constructed, trained to directly map continuous neural feature time series into discrete sequences of language units, such as phonemes or characters. Finally, an external language model is used to post-process and correct the initial decoding results.

[0003] However, when this general framework, designed for alphabetic languages such as English, is applied to decoding tonal languages like Chinese, it faces an inherent and significant technical bottleneck. Chinese has a vast syllable inventory: the number of toned syllables, its basic phonetic unit, exceeds one thousand. If the currently prevalent end-to-end direct mapping strategy is adopted, each toned syllable must be modeled and recognized as an independent category label, which results in an extremely large classification space for the model to learn. In practical applications, because the duration and conditions of subject experiments are strictly limited, the amount of EEG data that can be collected for training is extremely limited, making it difficult to cover such a large number of syllable categories, especially low-frequency syllables. Consequently, under sparse-data conditions, a decoding model built in this way struggles to fully learn the neural representation patterns of all target syllables, generalizes poorly, and cannot effectively handle syllables that did not appear in the training data, which ultimately restricts the overall performance and practicality of the decoding system.

Summary of the Invention

[0004] Therefore, it is necessary to provide a language decoding method, device, and electronic device based on EEG signals that can effectively model and generalize the decoding of massive syllable categories under limited training data conditions to address the above-mentioned technical problems.

[0005] This invention provides a language decoding method based on electroencephalogram (EEG) signals, the method comprising:

[0006] The subject's electroencephalogram (EEG) signals are collected and preprocessed to obtain a neural feature time series.

[0007] The neural feature time series is input into a parallel decoding architecture, which includes at least two decoding branches, each of which includes an independent neural network model for decoding sub-units of language with different orthogonal dimensions from the neural feature time series.

[0008] The parallel decoding architecture simultaneously obtains the sub-unit probability sequence output by each decoding branch;

[0009] The probability sequences of subunits in each decoding branch are accumulated and fused in a temporal manner to obtain a stable subunit sequence.

[0010] The stabilized sub-unit sequences are validated and combined according to phonological rules to generate valid syllable sequences.

[0011] The syllable sequence is disambiguated based on semantic context to output continuous natural language text.

[0012] In one embodiment, the different orthogonal dimensions of the language include an acoustic segment dimension and a suprasegmental pitch dimension; the at least two decoding branches include a first decoding branch for decoding segment dimension subunits and a second decoding branch for decoding pitch dimension subunits.

[0013] In one embodiment, the different orthogonal dimensions of the language include initials, finals, and tones, and the parallel decoding architecture includes an initial decoding branch, a final decoding branch, and a tone decoding branch.

[0014] In one embodiment, the different orthogonal dimensions of the language include a syllable primitive dimension and a tone dimension, and the at least two decoding branches include a first decoding branch for decoding toneless syllable subunits and a second decoding branch for decoding tone subunits.

[0015] In one embodiment, the different orthogonal dimensions of the language include a phoneme sequence dimension and a tone dimension; the at least two decoding branches include a first decoding branch for decoding the phoneme sequence and a second decoding branch for decoding the tone subunit.

[0016] In one embodiment, each of the decoding branches employs a connectionist temporal classification mechanism, which addresses the alignment issue between neural feature sequences and variable-length sub-unit sequences by introducing blank labels and outputs the sub-unit probability sequence.

[0017] In one embodiment, during the model training phase, all decoding branches are trained using a joint optimization loss function, which is the sum of the connectionist temporal classification losses of each decoding branch.

[0018] In one embodiment, the temporal probability accumulation includes:

[0019] determining an elastic time window corresponding to the pronunciation event of a single language unit;

[0020] within the elastic time window, computing a weighted average or a sum of the multi-frame probability distributions output by the same decoding branch.

[0021] Specifically, hysteresis decision logic is used when determining the elastic time window, including:

[0022] opening the window when a speech start event is detected;

[0023] closing the window only when end-of-speech features are detected at multiple consecutive time steps.

[0024] In one embodiment, the legality verification and combination according to the phonological rules includes:

[0025] The stabilized sub-unit sequence is verified based on a predefined table of legal phonological combinations.

[0026] When the subunits to be combined form an illegal syllable, a suboptimal but legal candidate unit is selected from the probability sequence of each subunit to replace it, so as to generate a legal syllable sequence.
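The fallback to a suboptimal but legal candidate can be illustrated with a minimal sketch. It assumes a three-branch decomposition (initial, final, tone), assumes the legality table is keyed by (initial, final) pairs, and ranks candidates by the product of branch probabilities; the function name `best_legal_syllable`, the toy table, and the probabilities are hypothetical and not taken from the patent.

```python
from itertools import product


def best_legal_syllable(init_p, final_p, tone_p, legal_pairs):
    """Return the most probable (initial, final, tone) whose (initial, final) is legal.

    Each *_p argument maps a sub-unit label to its accumulated probability.
    Candidates are ranked by the product of the three branch probabilities
    (an assumed scoring rule), so an illegal top choice falls back to the
    next-best legal combination.
    """
    candidates = sorted(
        product(init_p.items(), final_p.items(), tone_p.items()),
        key=lambda c: c[0][1] * c[1][1] * c[2][1],
        reverse=True,
    )
    for (ini, _), (fin, _), (tone, _) in candidates:
        if (ini, fin) in legal_pairs:
            return ini, fin, tone
    return None  # no legal combination among the candidates


# Toy excerpt of a legality table: "zhang" and "jiang" exist, "jang" does not.
LEGAL_PAIRS = {("zh", "ang"), ("j", "iang")}

result = best_legal_syllable(
    {"j": 0.6, "zh": 0.4},       # initial branch
    {"ang": 0.7, "iang": 0.3},   # final branch
    {1: 0.8, 4: 0.2},            # tone branch
    LEGAL_PAIRS,
)
print(result)  # ('zh', 'ang', 1)
```

Here the per-branch top choices would combine into the illegal syllable "jang", so the search moves to the next-ranked combination, "zhang" with tone 1. Exhaustive ranking is affordable because the full candidate space is only a few thousand combinations.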

[0027] In one embodiment, the disambiguation processing of the syllable sequence based on semantic context includes:

[0028] The syllable sequence is input into the language model, and the optimal word or sentence sequence is selected based on the context probability, and the output is the continuous natural language text.

[0029] The present invention also provides a language decoding device based on electroencephalogram (EEG) signals, the device comprising:

[0030] The EEG signal preprocessing module is used to collect the subject's EEG signals and preprocess the EEG signals to obtain a neural feature time series.

[0031] A neural feature time series input module is used to input neural feature time series into a parallel decoding architecture. The parallel decoding architecture includes at least two decoding branches, each of which includes an independent neural network model for decoding sub-units of language with different orthogonal dimensions from the neural feature time series.

[0032] The subunit probability sequence output module is used to simultaneously obtain the subunit probability sequence output by each decoding branch through the parallel decoding architecture.

[0033] The temporal probability accumulation and fusion module is used to accumulate and fuse the sub-unit probability sequences of each decoding branch to obtain a stable sub-unit sequence.

[0034] The verification and combination module is used to verify and combine the stable sub-unit sequence according to the phonological rules to generate a valid syllable sequence.

[0035] The disambiguation processing module is used to disambiguate the syllable sequence based on semantic context and output continuous natural language text.

[0036] The present invention also provides an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the language decoding method based on EEG signals as described above.

[0037] The aforementioned language decoding method, device, and electronic equipment based on EEG signals, through a parallel decoding architecture and sub-units of different orthogonal dimensions corresponding to each decoding branch, decompose the existing complex classification task of directly mapping massive numbers of syllables into multiple parallel sub-tasks targeting a limited number of basic language units. This reduces the number of categories to be modeled for each decoding branch from thousands of syllables to dozens of basic acoustic or linguistic units, thus greatly alleviating the contradiction between data sparsity and model capacity requirements under limited EEG training data conditions. The model only needs to learn the neural representation patterns of these basic units to cover and generate a large number of legal syllables that have not appeared in the training set through subsequent combination rules, realizing a paradigm shift from memorizing syllables to understanding and combining them, and improving the system's zero-shot generalization ability. At the same time, the processes of temporal probability accumulation and combination according to language rules, through temporal smoothing and legality constraints on the intermediate outputs of each parallel branch, jointly ensure the stability and accuracy of the final decoding result, constituting a complete solution capable of addressing the decoding challenges of tonal languages such as Chinese.

Attached Figure Description

[0038] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0039] Figure 1 is a flowchart of a language decoding method based on electroencephalogram (EEG) signals according to one embodiment;

[0040] Figure 2 is a schematic diagram of a language decoding device based on electroencephalogram (EEG) signals according to one embodiment;

[0041] Figure 3 is an internal structural diagram of an electronic device according to one embodiment.

Detailed Implementation

[0042] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0043] The language decoding method, apparatus, and electronic device based on electroencephalogram (EEG) signals of the present invention are described below with reference to Figures 1-3.

[0044] As shown in Figure 1, in one embodiment, a language decoding method based on electroencephalogram (EEG) signals includes the following steps:

[0045] Step S110: Collect the subject's electroencephalogram (EEG) signals and preprocess the EEG signals to obtain a neural feature time series.

[0046] Here, EEG signals cover neural signals acquired with different electrode types, including electrocorticography (ECoG), scalp electroencephalography (EEG), and signals recorded by microelectrode arrays, such as action potentials (spikes), local field potentials (LFP), and single-unit / multi-unit activity (SUA / MUA).

[0047] Step S120: Input the neural feature time series into the parallel decoding architecture. The parallel decoding architecture includes at least two decoding branches, each of which includes an independent neural network model for decoding sub-units of different orthogonal dimensions of language from the neural feature time series.

[0048] First, the subject's electroencephalogram (EEG) signals are collected and preprocessed to obtain a neural feature time series. Then, the neural feature time series is input into a parallel neural decoding architecture based on orthogonal feature decomposition. This architecture includes at least two decoding branches, each containing an independent neural network model, which is used to decode different functionally orthogonal sub-units of language from the same neural feature time series in parallel. This decomposes the complex overall language unit recognition task into multiple simpler sub-tasks, thereby reducing the learning difficulty of the model and improving generalization ability.
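The data flow of the parallel architecture can be sketched as follows: every branch receives the same neural feature time series and returns its own per-frame probability sequence over a small sub-unit inventory. This is only an illustration under stated assumptions. `LinearBranch` is a hypothetical stand-in (a random linear layer plus softmax) for whatever independent neural network a real branch would use, and the label inventories and feature values are made up.

```python
import math
import random
from dataclasses import dataclass


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class LinearBranch:
    """Hypothetical stand-in for one independent branch model."""
    labels: list   # sub-unit inventory of this branch
    weights: list  # weights[k][d]: one row of feature weights per label

    @classmethod
    def random_init(cls, labels, n_features, seed):
        rng = random.Random(seed)
        rows = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in labels]
        return cls(labels, rows)

    def forward(self, features):
        """Map a T x D feature sequence to a T x K probability sequence."""
        return [
            softmax([sum(w * x for w, x in zip(row, frame)) for row in self.weights])
            for frame in features
        ]


def decode_parallel(features, branches):
    """Feed the same feature sequence to every branch; collect each output."""
    return {name: branch.forward(features) for name, branch in branches.items()}


features = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.4]]  # T=2 frames, D=3 features
branches = {
    "segment": LinearBranch.random_init(["<blank>", "zh", "ang"], 3, seed=0),
    "pitch": LinearBranch.random_init(["<blank>", "1", "4"], 3, seed=1),
}
outputs = decode_parallel(features, branches)
print({name: len(seq) for name, seq in outputs.items()})  # {'segment': 2, 'pitch': 2}
```

Because the branches share only their input, each can be sized and trained for its own small label set, which is the source of the reduced learning difficulty described above.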

[0049] Orthogonal dimension decomposition rests on the basic idea of decomposing the neural representation of Chinese speech into mutually independent acoustic segmental and suprasegmental pitch dimensions, and it is implemented in several embodiments based on principles of linguistics and neuroscience. Specifically, the different orthogonal dimensions of language include an acoustic segmental dimension and a suprasegmental pitch dimension; the at least two decoding branches include a first decoding branch for decoding segmental dimension subunits and a second decoding branch for decoding pitch dimension subunits. Based on this principle, several specific implementations exist.

Implementation A is a fine-grained three-stream decomposition into initials, finals, and tones. In this case, the different orthogonal dimensions of language include initials, finals, and tones, and the parallel decoding architecture includes an initial decoding branch, a final decoding branch, and a tone decoding branch, each processing its own dimension. This preferred embodiment reduces the classification task over thousands of syllables to the recognition of approximately 21 initials, 35 finals, and 5 tones. The model only needs to learn a limited number of basic units and can recognize syllables absent from the training set through combination, thereby achieving significant zero-shot generalization ability under data sparsity conditions.

Implementation B is a coarse-grained two-stream decomposition into a syllable primitive dimension and a tone dimension. In this case, the at least two decoding branches include a first decoding branch for decoding toneless syllable subunits and a second decoding branch for decoding tone subunits. This embodiment has a simple structure and is suitable for scenarios with relatively abundant data.

Implementation C is an ultra-fine-grained decomposition into a phoneme sequence dimension and a tone dimension. In this case, the at least two decoding branches include a first decoding branch for decoding the phoneme sequence and a second decoding branch for decoding tone subunits. This embodiment further decomposes syllables into phoneme streams (for example, decomposing "zhang" into the sequence zh-a-ng), which describes coarticulation phenomena in a more refined manner and adapts well to changes in speech rate.

Implementation D is a physiologically motivated decomposition into an articulator movement trajectory dimension and a tone dimension. The first decoding branch decodes articulatory movement feature subunits representing muscle movement commands such as lip opening and closing, tongue movement, and jaw opening and closing, and the second decoding branch decodes tone subunits. This embodiment corresponds directly to the encoding of articulator movements in the brain's motor cortex (vSMC) rather than to abstract sound symbols, and therefore has the potential for cross-language or cross-dialect transfer.

Together, these decomposition methods constitute the parallel neural decoding architecture based on orthogonal feature decomposition, which addresses the data sparsity and generalization problems caused by the huge classification space of existing single end-to-end models. In particular, by separately decoding segmental information (or articulation actions) and suprasegmental information (tones), it ensures the effective extraction and use of the information that distinguishes Chinese tones.
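To make the three-stream decomposition concrete, the sketch below splits a toned pinyin syllable into initial, final, and tone, and compares the number of classes the three branches must model with the size of the joint combination space. It is a simplification under stated assumptions: the 21-initial inventory is the standard pinyin one, syllables written with y or w are treated as having no initial, and the tone is assumed to be given as a trailing digit. The names `split_syllable` and `class_counts` are hypothetical.

```python
# The 21 Mandarin pinyin initials. Two-letter initials come first so that
# "zh" is matched before "z".
INITIALS = [
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s",
]

# Approximate inventory sizes quoted in the text above.
BRANCH_SIZES = {"initial": 21, "final": 35, "tone": 5}


def split_syllable(toned):
    """Split e.g. 'zhang1' into ('zh', 'ang', 1).

    Simplification: syllables without a listed initial (including those
    spelled with y or w) get an empty initial and keep their spelling
    as the final.
    """
    body, tone = toned[:-1], int(toned[-1])
    for initial in INITIALS:
        if body.startswith(initial):
            return initial, body[len(initial):], tone
    return "", body, tone


def class_counts(sizes):
    """Return (classes summed over branches, size of the joint product space)."""
    joint = 1
    for n in sizes.values():
        joint *= n
    return sum(sizes.values()), joint


print(split_syllable("zhang1"))    # ('zh', 'ang', 1)
print(class_counts(BRANCH_SIZES))  # (61, 3675)
```

The product 3675 is only an upper bound on the combination space, since many initial-final pairs are not legal syllables; the point is that the branches together model about 61 classes instead of one class per toned syllable.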

[0050] Step S130: Using a parallel decoding architecture, the sub-unit probability sequence output by each decoding branch is obtained simultaneously.

[0051] Each decoding branch employs a Connectionist Temporal Classification (CTC) mechanism, which addresses the alignment problem between the neural feature sequence and the variable-length sub-unit sequence by introducing a blank label, and outputs a sub-unit probability sequence. The CTC mechanism adds a special "blank" label to the sub-unit label set of each branch's output layer, enabling the model to learn the non-linear alignment between neural feature sequences and variable-length sub-unit sequences automatically, without prior frame-level alignment annotations. The model can output "blank" at time steps where the features are uncertain and emit a non-blank label only at time steps where it is confident, directly producing the sub-unit probability sequence. This significantly reduces data annotation costs and adapts naturally to changes in speech rate.
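How a branch's per-frame probability sequence turns into a variable-length sub-unit sequence can be shown with greedy (best-path) CTC decoding: take the most probable label per frame, merge consecutive repeats, then drop blanks. This is a minimal sketch of standard CTC decoding, not necessarily the decoding rule used in the patent, which goes on to accumulate probabilities over time windows. The label set and probabilities are made up.

```python
def ctc_greedy_decode(prob_seq, labels, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in prob_seq]
    decoded = []
    previous = None
    for k in best:
        # A label is emitted only when it differs from the previous frame's
        # label, so a blank between two identical labels keeps them distinct.
        if k != previous and k != blank:
            decoded.append(labels[k])
        previous = k
    return decoded


labels = ["<blank>", "zh", "sh"]
prob_seq = [
    [0.1, 0.8, 0.1],  # zh
    [0.2, 0.7, 0.1],  # zh (repeat, merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.6, 0.3],  # zh (new unit, separated by the blank)
    [0.1, 0.2, 0.7],  # sh
]
print(ctc_greedy_decode(prob_seq, labels))  # ['zh', 'zh', 'sh']
```

Five frames collapse to three sub-units without any frame-level alignment labels, which is why a slower or faster speech rate only changes the number of repeated or blank frames.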

[0052] During the model training phase, multi-task training is performed on all decoding branches by jointly optimizing the loss function. This loss function is the weighted sum of the connectionist temporal classification losses of each decoding branch:

$$\mathcal{L}_{\text{total}} = \lambda_{I}\,\mathcal{L}_{\text{CTC}}^{I} + \lambda_{F}\,\mathcal{L}_{\text{CTC}}^{F} + \lambda_{T}\,\mathcal{L}_{\text{CTC}}^{T}$$

[0053] where

[0054] $\lambda_{I}$, $\lambda_{F}$, and $\lambda_{T}$ are the loss weighting coefficients for initials, finals, and tones, respectively, and $\mathcal{L}_{\text{CTC}}^{I}$, $\mathcal{L}_{\text{CTC}}^{F}$, and $\mathcal{L}_{\text{CTC}}^{T}$ are the CTC losses for the initial consonant stream, the final vowel stream, and the tone stream, respectively. This joint training implicitly constrains different decoding branches to learn temporally consistent feature representations while sharing the extraction of underlying neural features, thereby improving the overall decoding efficiency and the temporal synergy of the outputs of each branch.
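The joint objective, a weighted sum of per-branch CTC losses, can be sketched in plain Python. The function `ctc_nll` computes the CTC negative log-likelihood with the standard forward algorithm in probability space, which is adequate for short toy sequences (a practical implementation would work in log space and would normally rely on a deep learning framework's CTC loss). The function names, the branch names, and the toy inputs are assumptions for illustration.

```python
import math


def ctc_nll(prob_seq, target, blank=0):
    """CTC negative log-likelihood of `target` (label indices) given per-frame probs.

    Assumes the target is reachable within len(prob_seq) frames; otherwise
    the likelihood is zero and math.log raises ValueError.
    """
    # Extended target: blanks before, between, and after the labels.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)

    alpha = [0.0] * size
    alpha[0] = prob_seq[0][blank]
    if size > 1:
        alpha[1] = prob_seq[0][ext[1]]

    for frame in prob_seq[1:]:
        new_alpha = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping the blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * frame[ext[s]]
        alpha = new_alpha

    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return -math.log(likelihood)


def joint_ctc_loss(branch_probs, branch_targets, branch_weights):
    """Weighted sum of the per-branch CTC losses."""
    return sum(
        branch_weights[name] * ctc_nll(branch_probs[name], branch_targets[name])
        for name in branch_probs
    )


uniform = [[0.5, 0.5], [0.5, 0.5]]  # T=2 frames, labels: 0 = blank, 1 = unit
loss = joint_ctc_loss(
    {"initial": uniform, "final": uniform, "tone": uniform},
    {"initial": [1], "final": [1], "tone": [1]},
    {"initial": 1.0, "final": 1.0, "tone": 1.0},
)
print(round(loss, 4))  # 0.863
```

With two uniform frames, three alignments ("unit unit", "unit blank", "blank unit") each have probability 0.25, so each branch contributes -ln(0.75) and the unweighted sum over three branches is about 0.863.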

[0055] Step S140: Accumulate and fuse the sub-unit probability sequences of each decoding branch in a temporal sequence to obtain a stable sub-unit sequence.

[0056] To improve the stability of real-time decoding and overcome output jumps caused by the non-stationarity of EEG signals, an event-driven dynamic probability accumulation and temporal fusion decision mechanism is adopted. This mechanism accumulates and fuses the sub-unit probability sequences of each decoding branch over time to obtain a stable sub-unit sequence. Temporal probability accumulation includes: determining an elastic time window corresponding to the pronunciation event of a single language unit; and, within this elastic time window, computing a weighted average or a sum of the multi-frame probability distributions output by the same decoding branch. Hysteresis decision logic is used when determining the elastic time window: the window opens when a speech start event is detected and closes only when end-of-speech features are detected at multiple consecutive time steps. This multi-frame accumulation rests on the principle that the brain's neural activity patterns are temporally consistent while the same language unit is being uttered. Fusing multi-frame information along the time dimension averages out random transient noise while reinforcing the true neural representation pattern, which yields stable and reliable sub-unit decisions. This significantly improves the interference resistance and smoothness of the decoding output and effectively avoids boundary jitter.
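The hysteresis window and the accumulation step can be sketched together. The example assumes that some upstream detector already provides a per-frame speech-activity flag (how speech start and end features are detected is not specified here), closes a window only after `close_after` consecutive inactive frames, and then averages one branch's frame-level distributions inside the window. The function names, the threshold value, and the toy data are hypothetical.

```python
def find_windows(active, close_after=3):
    """Return (start, end) frame ranges, end exclusive, using hysteresis.

    A window opens at the first active frame and closes only after
    `close_after` consecutive inactive frames, so brief dropouts do not
    split one pronunciation event into two.
    """
    windows, start, silent = [], None, 0
    for t, is_active in enumerate(active):
        if start is None:
            if is_active:
                start, silent = t, 0
        elif is_active:
            silent = 0
        else:
            silent += 1
            if silent >= close_after:
                windows.append((start, t - silent + 1))
                start, silent = None, 0
    if start is not None:  # window still open at the end of the recording
        windows.append((start, len(active) - silent))
    return windows


def accumulate(prob_seq, window, weights=None):
    """Weighted average of one branch's frame distributions inside a window."""
    start, end = window
    frames = prob_seq[start:end]
    weights = weights or [1.0] * len(frames)
    total = sum(weights)
    return [
        sum(w * frame[k] for w, frame in zip(weights, frames)) / total
        for k in range(len(frames[0]))
    ]


active = [0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1]
windows = find_windows(active)
print(windows)  # [(1, 6), (9, 11)]

# One branch with two labels; frame 3 is a transient outlier.
prob_seq = [[0.9, 0.1]] * 11
prob_seq[3] = [0.2, 0.8]
stable = accumulate(prob_seq, windows[0])
print([round(p, 2) for p in stable])  # [0.76, 0.24]
```

The single inactive frame at index 3 does not close the first window, and the outlier distribution in that frame is outvoted by its neighbours, so the window-level decision stays on the first label.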

[0057] Step S150: The stable sub-unit sequence is validated and combined according to the phonological rules to generate a valid syllable sequence.

[0058] Step S160: Disambiguate the syllable sequence based on semantic context to output continuous natural language text.

[0059] After the stable sub-unit sequence is obtained, a hierarchical error correction mechanism integrating hard phonological constraints and soft semantic constraints is employed. First, the stable sub-unit sequence is validated and combined according to phonological rules to generate a valid syllable sequence. This includes: verifying the stable sub-unit sequence against a predefined table of legal phonological combinations; and, when the sub-units to be combined form an illegal syllable, selecting a suboptimal but legal candidate unit from the probability sequence of each sub-unit to replace it, thereby generating a valid syllable sequence. Acting as a hard phonological constraint within the decoding process, this step eliminates illegal syllables at the source, reduces the search space, and lowers both the dependence on the back-end language model and the complexity of error correction. Then, the syllable sequence is disambiguated based on semantic context to output continuous natural language text. This includes: inputting the syllable sequence into the language model, selecting the optimal word or sentence sequence based on contextual probability, and outputting the continuous natural language text. Acting as a soft semantic constraint, this step resolves the ambiguity of homophones.
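The homophone disambiguation step can be illustrated with a toy example. The language model is not specified here, so this sketch swaps in a hand-written bigram table and a Viterbi search over candidate characters; the lexicon, the log-probabilities, and the function name `disambiguate` are all hypothetical values chosen for illustration.

```python
def disambiguate(syllables, lexicon, bigram_logp, unseen=-10.0):
    """Pick the most probable character sequence for a toned-syllable sequence.

    lexicon maps each syllable to its homophone candidates; bigram_logp maps
    (previous_char, char) to a log-probability, with "<s>" as sentence start.
    Pairs missing from the table get the `unseen` penalty.
    """
    # paths[char] = (score, text) of the best path ending in `char`.
    paths = {
        c: (bigram_logp.get(("<s>", c), unseen), c)
        for c in lexicon[syllables[0]]
    }
    for syllable in syllables[1:]:
        new_paths = {}
        for c in lexicon[syllable]:
            score, text = max(
                (s + bigram_logp.get((prev, c), unseen), t)
                for prev, (s, t) in paths.items()
            )
            new_paths[c] = (score, text + c)
        paths = new_paths
    return max(paths.values())[1]


# Toy data: three homophones of "shi4", two of "jie4".
LEXICON = {"shi4": ["是", "世", "事"], "jie4": ["界", "借"]}
BIGRAM_LOGP = {
    ("<s>", "是"): -1.0,
    ("<s>", "世"): -3.0,
    ("世", "界"): -0.5,
    ("是", "界"): -8.0,
}

print(disambiguate(["shi4", "jie4"], LEXICON, BIGRAM_LOGP))  # 世界
```

Taken alone, "是" is the most likely reading of "shi4" in this toy table, but the context of the following syllable makes "世界" the best overall sequence, which is the kind of soft semantic correction described above.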

[0060] The aforementioned language decoding method based on EEG signals reduces learning difficulty and achieves zero-shot generalization through orthogonal decomposition, ensures stable real-time output through dynamic temporal fusion, and ensures legal and semantically accurate output through hierarchical constraints. Together, these achieve high-precision and high-stability decoding of tonal languages such as Chinese from EEG signals to continuous text. The method systematically addresses the main defects of existing approaches: the contradiction between a huge classification space and sparse data, the easy loss or confusion of tonal information, poor real-time decoding stability, and the difficulty of constraining illegal syllable combinations.

[0061] The language decoding device based on electroencephalogram (EEG) signals provided by the present invention is described below. The language decoding device based on EEG signals described below and the language decoding method based on EEG signals described above can be referred to and correspond to each other.

[0062] As shown in Figure 2, in one embodiment, a language decoding device based on EEG signals includes an EEG signal preprocessing module 210, a neural feature time series input module 220, a sub-unit probability sequence output module 230, a temporal probability accumulation and fusion module 240, a verification and combination module 250, and a disambiguation processing module 260.

[0063] The EEG signal preprocessing module 210 is used to collect the EEG signals of the subject and preprocess the EEG signals to obtain a neural feature time series.

[0064] The neural feature time series input module 220 is used to input the neural feature time series into the parallel decoding architecture. The parallel decoding architecture includes at least two decoding branches, each of which includes an independent neural network model for decoding sub-units of different orthogonal dimensions of language from the neural feature time series.

[0065] The subunit probability sequence output module 230 is used to simultaneously acquire the subunit probability sequence output by each decoding branch through a parallel decoding architecture.

[0066] The temporal probability accumulation and fusion module 240 is used to accumulate and fuse the sub-unit probability sequences of each decoding branch to obtain a stable sub-unit sequence.

[0067] The verification and combination module 250 is used to verify and combine the stable sub-unit sequences according to the phonological rules to generate a valid syllable sequence.

[0068] The disambiguation processing module 260 is used to disambiguate syllable sequences based on semantic context and output continuous natural language text.

[0069] Figure 3 illustrates a schematic diagram of the physical structure of an electronic device, which can be a smart terminal; its internal structure can be as shown in Figure 3. The electronic device includes a processor, memory, and a network interface connected via a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and internal memory. The non-volatile storage medium stores an operating system and computer programs. The internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage medium. The network interface is used to communicate with external terminals via a network connection. When the computer program is executed by the processor, it implements a language decoding method based on electroencephalogram (EEG) signals, which includes:

[0070] The subject's electroencephalogram (EEG) signals are collected and preprocessed to obtain a neural feature time series.

[0071] The neural feature time series is input into the parallel decoding architecture, which includes at least two decoding branches. Each decoding branch includes an independent neural network model for decoding sub-units of language with different orthogonal dimensions from the neural feature time series.

[0072] By using a parallel decoding architecture, the probability sequence of the sub-units output by each decoding branch can be obtained simultaneously.

[0073] The probability sequences of subunits in each decoding branch are accumulated and fused in a temporal manner to obtain a stable subunit sequence.

[0074] The stable sub-unit sequences are validated and combined according to phonological rules to generate a valid syllable sequence.

[0075] Disambiguation of syllable sequences is performed based on semantic context to output continuous natural language text.

[0076] Those skilled in the art will understand that the structure shown in Figure 3 is merely a block diagram of the part of the structure related to the present invention and does not constitute a limitation on the electronic device to which the present invention is applied. A specific electronic device may include more or fewer components than those shown in the figure, combine certain components, or have a different arrangement of components.

[0077] In another aspect, the present invention also provides a computer storage medium storing a computer program, which, when executed by a processor, implements a language decoding method based on electroencephalogram (EEG) signals, the method comprising:

[0078] A subject's electroencephalogram (EEG) signals are collected and preprocessed to obtain a neural feature time series.

[0079] The neural feature time series is input into a parallel decoding architecture that includes at least two decoding branches. Each decoding branch includes an independent neural network model that decodes, from the neural feature time series, the sub-units of a different orthogonal dimension of the language.

[0080] By using a parallel decoding architecture, the probability sequence of the sub-units output by each decoding branch can be obtained simultaneously.

[0081] The probability sequences of subunits in each decoding branch are accumulated and fused in a temporal manner to obtain a stable subunit sequence.

[0082] The stable sub-unit sequences are validated and combined according to phonological rules to generate a valid syllable sequence.

[0083] Disambiguation of syllable sequences is performed based on semantic context to output continuous natural language text.
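
The validation step in [0082] can be sketched as follows, using the fallback described in claim 7: when the top-scoring sub-units form an illegal syllable, the next-best legal combination is taken from the branch probabilities. This is a minimal sketch under stated assumptions. The three-entry `LEGAL_PAIRS` table, the function name, the product-of-probabilities ranking, and the choice to validate only the (initial, final) pair while taking the tone by argmax are all hypothetical simplifications.

```python
from itertools import product
from typing import Dict, Optional, Set, Tuple

# Toy table of legal (initial, final) pairs. A real table would cover the
# full syllable inventory of the target language.
LEGAL_PAIRS: Set[Tuple[str, str]] = {("j", "ia"), ("zh", "a"), ("m", "a")}


def best_legal_syllable(
    initial_p: Dict[str, float],
    final_p: Dict[str, float],
    tone_p: Dict[int, float],
    legal: Set[Tuple[str, str]] = LEGAL_PAIRS,
) -> Optional[Tuple[str, str, int]]:
    """Return the most probable legal (initial, final, tone), or None."""
    tone = max(tone_p, key=tone_p.get)
    # Rank every (initial, final) pair by the product of branch probabilities.
    ranked = sorted(
        product(initial_p.items(), final_p.items()),
        key=lambda pair: pair[0][1] * pair[1][1],
        reverse=True,
    )
    for (ini, _), (fin, _) in ranked:
        if (ini, fin) in legal:
            return ini, fin, tone
    return None


# Hypothetical branch outputs: the top pair ("j", "a") is not a legal syllable.
syllable = best_legal_syllable(
    {"j": 0.6, "zh": 0.4}, {"a": 0.7, "ia": 0.3}, {1: 0.2, 4: 0.8}
)
```

Here the highest-scoring pair is rejected by the table, and the search falls back to the second-ranked pair, which is legal.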

[0084] In another aspect, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. A processor of an electronic device reads the computer instructions from the computer-readable storage medium, and when the processor executes the computer instructions, it implements a language decoding method based on electroencephalogram (EEG) signals, the method comprising:

[0085] A subject's electroencephalogram (EEG) signals are collected and preprocessed to obtain a neural feature time series.

[0086] The neural feature time series is input into a parallel decoding architecture that includes at least two decoding branches. Each decoding branch includes an independent neural network model that decodes, from the neural feature time series, the sub-units of a different orthogonal dimension of the language.

[0087] By using a parallel decoding architecture, the probability sequence of the sub-units output by each decoding branch can be obtained simultaneously.

[0088] The probability sequences of subunits in each decoding branch are accumulated and fused in a temporal manner to obtain a stable subunit sequence.

[0089] The stable sub-unit sequences are validated and combined according to phonological rules to generate a valid syllable sequence.

[0090] Disambiguation of syllable sequences is performed based on semantic context to output continuous natural language text.
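
Claim 5 states that each decoding branch may use connectionist temporal classification (CTC) with a blank label to align frame-level outputs with a variable-length sub-unit sequence. As a hedged illustration of how such an output can be read, the sketch below applies standard greedy CTC decoding (take the best label per frame, merge consecutive repeats, drop blanks). The label set, the blank index, and the function name are assumptions; the source does not say which CTC decoding strategy is used.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    frame_probs: Sequence[Sequence[float]], labels: Sequence[str], blank: int = 0
) -> List[str]:
    """Collapse a frame-level probability sequence into a sub-unit sequence."""
    # Index of the most probable label at every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_probs]
    decoded: List[str] = []
    prev = None
    for idx in best:
        # Emit a label only when it differs from the previous frame's label
        # and is not the blank; a blank between repeats keeps both copies.
        if idx != prev and idx != blank:
            decoded.append(labels[idx])
        prev = idx
    return decoded


# Hypothetical branch output over 7 frames with labels [blank, "m", "a"].
labels = ["<blank>", "m", "a"]
frames = [
    [0.1, 0.8, 0.1],  # m
    [0.2, 0.7, 0.1],  # m (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],  # a
    [0.1, 0.2, 0.7],  # a (repeat, merged)
    [0.8, 0.1, 0.1],  # blank
    [0.2, 0.1, 0.7],  # a (new unit, separated by a blank)
]
units = ctc_greedy_decode(frames, labels)
```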

[0091] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. This computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided by this invention can include non-volatile and / or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory.

[0092] By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

[0093] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0094] The above-described embodiments are merely illustrative of several implementations of the present invention, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these modifications and improvements all fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the appended claims.

Claims

1. A language decoding method based on electroencephalogram (EEG) signals, characterized in that the method includes:
collecting a subject's EEG signals and preprocessing them to obtain a neural feature time series;
inputting the neural feature time series into a parallel decoding architecture, which includes at least two decoding branches, each of which includes an independent neural network model for decoding, from the neural feature time series, the sub-units of a different orthogonal dimension of the language;
obtaining simultaneously, through the parallel decoding architecture, the sub-unit probability sequence output by each decoding branch;
performing temporal probability accumulation and fusion on the sub-unit probability sequence of each decoding branch to obtain a stable sub-unit sequence;
validating and combining the stable sub-unit sequences according to phonological rules to generate a valid syllable sequence; and
disambiguating the syllable sequence based on semantic context to output continuous natural language text;
wherein the different orthogonal dimensions of the language include an acoustic segment dimension and a suprasegmental pitch dimension, and the at least two decoding branches include a first decoding branch for decoding segment-dimension sub-units and a second decoding branch for decoding pitch-dimension sub-units;
wherein the temporal probability accumulation includes: determining an elastic time window corresponding to the pronunciation event of a single language unit; and, within the elastic time window, computing a weighted average or sum of the multi-frame probability distributions output by the same decoding branch; and
wherein hysteresis determination logic is used when determining the elastic time window, including: opening the window when a speech onset event is detected; and considering the window closed only when an end-of-speech feature is detected at multiple consecutive time steps.

2. The EEG-based language decoding method of claim 1, wherein the different orthogonal dimensions of the language include initials, finals, and tones, and the parallel decoding architecture includes an initial decoding branch, a final decoding branch, and a tone decoding branch.

3. The EEG-based language decoding method of claim 1, wherein the different orthogonal dimensions of the language include a syllable primitive dimension and a tone dimension, and the at least two decoding branches include a first decoding branch for decoding toneless syllable sub-units and a second decoding branch for decoding tone sub-units.

4. The EEG-based language decoding method of claim 1, wherein the different orthogonal dimensions of the language include a phoneme sequence dimension and a tone dimension, and the at least two decoding branches include a first decoding branch for decoding phoneme sequences and a second decoding branch for decoding tone sub-units.

5. The EEG-based language decoding method of any one of claims 1 to 4, wherein each of the decoding branches adopts a connectionist temporal classification (CTC) mechanism, which uses blank labels to handle the alignment between the neural feature sequence and variable-length sub-unit sequences and outputs the sub-unit probability sequence.

6. The EEG-based language decoding method of claim 1, wherein, during the model training phase, multi-task training is performed on all decoding branches by jointly optimizing a loss function that is the sum of the connectionist temporal classification losses of the decoding branches.

7. The EEG-based language decoding method of claim 1, 2, 3, or 4, wherein the validation and combination according to phonological rules includes: verifying the stable sub-unit sequence against a predefined table of legal phonological combinations; and, when the sub-units to be combined form an illegal syllable, selecting a suboptimal but legal candidate unit from the probability sequence of each sub-unit as a replacement, so as to generate a legal syllable sequence.

8. The EEG-based language decoding method of claim 1, wherein the disambiguation of the syllable sequence based on semantic context includes: inputting the syllable sequence into a language model, selecting the optimal word or sentence sequence based on context probability, and outputting it as the continuous natural language text.

Citation Information

Patent Citations

CN120560496A
US20220301563A1
