Intranet voice hierarchical authority authentication method and device based on multi-level authority mixing

By constructing a multi-level permission algebraic model and a dynamic mixing weight calculation method, the problem of disconnect between permissions and mixing in the intranet voice control system was solved, achieving high security and high adaptability of voice mixing and authentication control in multi-user interaction scenarios, and improving the system's operational efficiency.

CN122268664A · Pending · Publication Date: 2026-06-23 · WU HAN LUO JI ZHI HUI KE JI YOU XIAN GONG SI

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
WU HAN LUO JI ZHI HUI KE JI YOU XIAN GONG SI
Filing Date
2026-05-08
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

The existing intranet voice control system lacks a dynamic control mechanism that combines permissions and audio mixing in multi-user interaction scenarios. This makes high-privilege commands susceptible to interference from low-privilege noise, and the authentication process is lagging behind, with arbitration results lacking scientific validity.

Method used

A multi-level permission algebraic model is constructed, and a dynamic mixing weight calculation method is designed. Through a three-level authentication mechanism of terminal-side pre-authentication, server-side mid-authentication, and execution-side post-authentication, combined with a permission grid model, multi-dimensional conflict arbitration is carried out to achieve deep binding between user permission level and mixing weight.

Benefits of technology

It effectively reduces mixing conflicts caused by high-privilege critical commands being drowned out by low-privilege voices, improves the security and adaptability of the intranet voice control system, alleviates authentication lag issues, and enhances the scientific validity and reliability of arbitration results.

✦ Generated by Eureka AI based on patent content.

Patent Text Reader

Abstract

This application discloses a method and apparatus for hierarchical voice authorization authentication in intranets based on multi-level permission mixing, belonging to the field of voice processing technology. The method includes: sending the voice packet corresponding to each voice stream to a pre-authentication module for digital signature verification to obtain an authentication flag for each voice stream; sending the voice activity detection energy value and permission level of each voice stream to a mixing weight calculation module to obtain a dynamic mixing weight for each voice stream; sending the dynamic mixing weight and authentication flag of each voice stream to a mixing synthesis module to obtain a mixed output; sending the mixed output to a conflict resolution module to obtain a target instruction; and sending the target instruction to a post-execution module to authenticate multiple voice streams using a permission grid model. By balancing voice energy and permission level through a dynamic weight function, the overwhelming of high-permission instructions by low-permission noise is reduced, improving the real-time performance and security of hierarchical voice authorization authentication in intranets.

Description

Technical Field

[0001] This application belongs to the field of speech processing technology, and in particular relates to a method and device for intranet voice hierarchical permission authentication based on multi-level permission mixing.

Background Technology

[0002] Voice authentication is a core technology for intranet voice control systems in industrial control, command and dispatch, and intelligent buildings. Although voice recognition and permission verification technologies have gradually developed, in intranet scenarios with multi-party voice interaction, existing technologies still face many application challenges due to the lack of a dynamic management mechanism combining permissions and audio mixing. Current intranet voice authentication methods mainly fall into two categories: one is the method of mixing audio first and then authenticating, which, while achieving multi-channel voice fusion, suffers from authentication lag and is prone to causing sensitive commands to be executed incorrectly; the other is the method of authenticating first and then mixing audio, which, while avoiding permission risks, has a fixed mixing strategy and is prone to causing high-privilege critical commands to be drowned out by low-privilege audio.

[0003] Existing technologies suffer from the following major drawbacks in intranet voice control scenarios involving multiple users speaking simultaneously: First, access control is disconnected from audio mixing. There is a lack of a quantitative model that correlates user access levels with audio mixing weights, making it impossible to dynamically adjust the proportion of voice in the mixing based on access levels. This results in high-access commands being easily interfered with by low-access noise. Second, the authentication process is delayed. Traditional solutions often perform access verification after voice recognition, meaning sensitive commands are parsed before authentication, posing security risks to intranet operations. Third, conflict command arbitration methods are simplistic. Handling mutually exclusive voice commands often relies on a first-come, first-served basis or a single priority determination, failing to comprehensively consider multiple factors such as signal quality and user history, leading to unscientific arbitration results. Fourth, the access control model lacks mathematical support, failing to establish standardized access level relationships and authentication rules, resulting in insufficient rigor and auditability of access verification.

[0004] Therefore, how to build a dynamic control system that deeply integrates permissions and mixing, realize hierarchical authentication throughout the entire mixing process, and establish a multi-dimensional arbitration mechanism for conflicting commands, so as to solve the problems of mixing conflicts, authentication delays and unreasonable arbitration in intranet multi-voice interactions, is an urgent technical challenge that needs to be addressed.

Summary of the Invention

[0005] This application aims to address at least one of the technical problems existing in the prior art. To this end, this application proposes an intranet voice hierarchical permission authentication method and device based on multi-level permission mixing. By constructing a multi-level permission algebraic model and designing a dynamic mixing weight calculation method, it achieves deep binding between user permission levels and mixing weights, endowing the intranet voice control system with permission-aware dynamic mixing capabilities. This effectively reduces mixing conflicts where high-permission critical commands are submerged by low-permission voice, and achieves higher security and more adaptable voice mixing and authentication control in multi-user interaction scenarios.

[0006] To address the aforementioned problems, according to a first aspect of the present invention, an intranet voice hierarchical permission authentication method based on multi-level permission mixing is provided, the method comprising:
acquiring multiple voice streams from the intranet through a voice acquisition module, and extracting the voice packet corresponding to each voice stream, the voice packet including a user identity identifier, a voice activity detection energy value, and a permission level;
sending the voice packet corresponding to each voice stream to a pre-authentication module for digital signature verification to obtain the authentication flag of each voice stream;
sending the voice activity detection energy value and permission level of each voice stream to a mixing weight calculation module to obtain the dynamic mixing weight of each voice stream;
sending the dynamic mixing weight and authentication flag of each voice stream to a mixing and synthesis module to obtain the mixed output;
sending the mixed output to a conflict arbitration module to obtain the target instruction; and
sending the target instruction to a post-execution module, which authenticates the multiple voice streams using a permission grid model, thereby achieving hierarchical authentication of the multiple voice streams.

[0007] According to one embodiment of this application, the step of sending the voice packet corresponding to each voice stream to the pre-authentication module for digital signature verification to obtain the authentication flag bit of each voice stream includes: The voice packet corresponding to each voice stream is sent to the pre-authentication module. The module then checks whether the permission level of each voice stream is lower than the minimum access threshold of the scene. If there is a voice stream with a permission level lower than the minimum access threshold of the scene, the authentication flag of that voice stream is set to 0; otherwise, the authentication flag of that voice stream is set to 1.

[0008] According to one embodiment of this application, the step of sending the voice activity detection energy value and permission level of each voice stream to the mixing weight calculation module to obtain the dynamic mixing weight of each voice stream includes: inputting the voice activity detection energy value of each voice stream into the energy normalization function to obtain the energy weight component of each voice stream; inputting the permission level of each voice stream into the permission normalization function to obtain the permission weight component of each voice stream; multiplying the balance factor by the energy weight component to obtain the first component, and multiplying the complement of the balance factor by the permission weight component to obtain the second component; and adding the first component to the second component to obtain the dynamic mixing weight of each voice stream.

[0009] According to one embodiment of this application, the step of sending the dynamic mixing weights and authentication flags of each speech stream to the mixing and synthesis module to obtain the mixed output includes: The dynamic mixing weights, authentication flags, and audio data of each audio stream are multiplied together to obtain the weighted audio components of each audio stream. The weighted audio components of all speech streams are summed to obtain the mixed output.

[0010] According to one embodiment of this application, sending the mixed output to the conflict arbitration module to obtain the target instruction includes: sending the mixed output to the conflict arbitration module, where the speech recognition module performs semantic parsing, extracts the instruction of each voice stream, and detects whether mutually exclusive instructions are present; if mutually exclusive instructions are detected, calculating the arbitration index of each mutually exclusive instruction and adding the instruction with the largest arbitration index to the target instruction; and if no mutually exclusive instructions are detected, using the extracted instructions as the target instruction directly.

[0011] According to one embodiment of this application, sending the target instruction to the post-execution module and authenticating the multi-channel voice streams through a permission grid model includes: The target instruction is sent to the post-execution module, which verifies whether the user's permissions meet the requirements of the target instruction through the permission grid model, and then obtains the execution result. The historical credibility factor is updated based on the execution results to authenticate multiple voice streams.

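[0012] According to one embodiment of this application, the calculation formula for the permission grid model is as follows:

$$\mathrm{Auth}(u, a) = \begin{cases} 1, & P_u \succeq R(a) \\ 0, & \text{otherwise} \end{cases}$$

[0013] where $P_u$ is the user's permission level, $a$ is the action, $R$ is the permission requirement mapping, and $P_u \succeq R(a)$ means that the user's level is at least as high as the level required for $a$.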

[0014] According to a second aspect of the present invention, an intranet voice hierarchical permission authentication device based on multi-level permission mixing is provided, the device comprising:
an acquisition module, used to acquire multiple voice streams in the intranet and extract the voice packet corresponding to each voice stream, the voice packet including a user identity identifier, a voice activity detection energy value, and a permission level;
a pre-authentication module, used to perform digital signature verification on the voice packet corresponding to each voice stream and obtain the authentication flag of each voice stream;
a mixing weight calculation module, used to obtain the dynamic mixing weight of each voice stream from its voice activity detection energy value and permission level;
a mixing and synthesis module, used to obtain the mixed output from the dynamic mixing weight and authentication flag of each voice stream;
a conflict arbitration module, used to obtain the target instruction from the mixed output; and
a post-execution module, used to authenticate the multiple voice streams for the target instruction through the permission grid model, thereby achieving hierarchical authentication of the multiple voice streams.

[0015] According to a third aspect of the present invention, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, it implements the intranet voice hierarchical permission authentication method based on multi-level permission mixing as described in the first aspect above.

[0016] According to a fourth aspect of the present invention, a non-transitory computer-readable storage medium is provided, on which a computer program is stored, wherein when the computer program is executed by a processor, it implements the intranet voice hierarchical permission authentication method based on multi-level permission mixing as described in the first aspect above.

[0017] According to a fifth aspect of the present invention, a chip is provided, the chip including a processor and a communication interface, the communication interface being coupled to the processor, the processor being used to run programs or instructions to implement the intranet voice hierarchical permission authentication method based on multi-level permission mixing as described in the first aspect.

[0018] According to a sixth aspect of the present invention, a computer program product is provided, comprising a computer program that, when executed by a processor, implements the intranet voice hierarchical permission authentication method based on multi-level permission mixing as described in the first aspect above.

[0019] Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.

[0020] The present invention provides an intranet voice hierarchical permission authentication method based on multi-level permission mixing, which has the following advantages over the prior art: (1) By constructing a multi-level permission algebraic model and designing a dynamic mixing weight calculation method, this invention binds the user permission level to the mixing weight, which gives the intranet voice control system a permission-aware dynamic mixing capability, reduces mixing conflicts in which high-permission critical instructions are drowned out by low-permission voice, and achieves more secure and more adaptable voice mixing and authentication control in multi-user interaction scenarios.

[0021] (2) This invention employs a three-tiered authentication mechanism—terminal-side pre-authentication, server-side mid-authentication, and execution-side post-authentication—to permeate the authentication process throughout the entire process of pre-mixing, mixing, and post-mixing. This effectively improves the problem of authentication lag and reduces the processing cost of invalid voice packets. Furthermore, the mathematical support of the permission grid model enables more stringent judgment rules and higher auditability for permission verification. In addition, by designing a multi-dimensional conflict arbitration mechanism that integrates permission level, signal-to-noise ratio, and historical reliability, the decision-making of mutually exclusive commands becomes more scientific and reasonable, significantly improving the accuracy and reliability of intranet voice control command execution.

[0022] (3) By deeply integrating historical credibility factors with authentication and arbitration processes, this invention achieves dynamic linkage between user behavior data and permission control, making the intranet voice authentication system more adaptive and intelligent. At the same time, all control strategies rely on quantitative models to achieve parameterized adjustment, making permission mixing and command arbitration more accurate and flexible, effectively improving the overall operating efficiency of the intranet voice control system.

Attached Figure Description

[0023] The above and/or additional aspects and advantages of this application will become apparent and readily understood from the description of the embodiments taken in conjunction with the following drawings, in which:
Figure 1 is the first flowchart of the intranet voice hierarchical permission authentication method based on multi-level permission mixing provided in the embodiments of this application;
Figure 2 is a diagram of the hierarchical permission authentication system architecture provided in the embodiments of this application;
Figure 3 is a permission grid structure diagram provided in the embodiments of this application;
Figure 4 is a graph of the dynamic mixing weight function provided in the embodiments of this application;
Figure 5 is the second flowchart of the intranet voice hierarchical permission authentication method based on multi-level permission mixing provided in the embodiments of this application;
Figure 6 is a schematic diagram of the conflict arbitration mechanism provided in the embodiments of this application;
Figure 7 is a schematic diagram of the intranet voice hierarchical permission authentication device based on multi-level permission mixing provided in the embodiments of this application;
Figure 8 is a schematic diagram of the structure of the electronic device provided in the embodiments of this application.

Detailed Implementation

[0024] The technical solutions of the embodiments of this application will be clearly described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those skilled in the art based on the embodiments of this application are within the scope of protection of this application.

[0025] The terms "first," "second," etc., used in the specification and claims of this application are used to distinguish similar objects and not to describe a specific order or sequence. It should be understood that such use of data can be interchanged where appropriate to allow embodiments of this application to be implemented in orders other than those illustrated or described herein, and the objects distinguished by "first," "second," etc., are generally of the same class and the number of objects is not limited; for example, a first object can be one or more. Furthermore, in the specification and claims, "and / or" indicates at least one of the connected objects, and the character " / " generally indicates that the preceding and following objects are in an "or" relationship.

[0026] The following description, in conjunction with the accompanying drawings, details the intranet voice hierarchical permission authentication method based on multi-level permission mixing, the intranet voice hierarchical permission authentication device based on multi-level permission mixing, the electronic device, and the readable storage medium provided in this application, through specific embodiments and application scenarios.

[0027] Among them, the intranet voice hierarchical permission authentication method based on multi-level permission mixing can be applied to the terminal, specifically executed by the hardware or software in the terminal.

[0028] The terminal includes, but is not limited to, portable communication devices such as mobile phones or tablets with touch-sensitive surfaces (e.g., touchscreen displays and / or touchpads). It should also be understood that, in some embodiments, the terminal may not be a portable communication device, but rather a desktop computer with touch-sensitive surfaces (e.g., touchscreen displays and / or touchpads).

[0029] The following embodiments describe a terminal including a display and a touch-sensitive surface. However, it should be understood that the terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and joystick.

[0030] The intranet voice hierarchical permission authentication method based on multi-level permission mixing provided in this application embodiment can be executed by an electronic device or a functional module or entity in an electronic device that can implement the intranet voice hierarchical permission authentication method based on multi-level permission mixing. The electronic devices mentioned in this application embodiment include, but are not limited to, mobile phones, tablets, computers, cameras, and wearable devices. The following uses an electronic device as the execution subject to illustrate the intranet voice hierarchical permission authentication method based on multi-level permission mixing provided in this application embodiment.

[0031] Figure 1 is the first flowchart of the intranet voice hierarchical permission authentication method based on multi-level permission mixing provided in the embodiments of this application. As shown in Figure 1, the method includes steps 110, 120, 130, 140, 150 and 160.

[0032] Step 110: Acquire multiple voice streams in the intranet through the voice acquisition module, and extract the voice packet corresponding to each voice stream. The voice packet includes a user identity identifier, a voice activity detection energy value, and a permission level. It is easy to understand that multiple voice streams are collected from the intranet, and that each voice stream carries a unique user identifier, a voice activity detection energy value, a permission level, and so on, which together form its voice packet.

[0033] Step 120: Send the voice packet corresponding to each voice stream to the pre-authentication module for digital signature verification to obtain the authentication flag bit of each voice stream. In some embodiments, sending the voice packet corresponding to each voice stream to the pre-authentication module for digital signature verification to obtain the authentication flag bit of each voice stream includes: The voice packet corresponding to each voice stream is sent to the pre-authentication module. The module then checks whether the permission level of each voice stream is lower than the minimum access threshold of the scene. If there is a voice stream with a permission level lower than the minimum access threshold of the scene, the authentication flag of that voice stream is set to 0; otherwise, the authentication flag of that voice stream is set to 1.

[0034] For example, the voice packet is digitally signed and verified on the terminal side. If the permission level is lower than the minimum access threshold of the scenario, the voice packet is discarded directly.

[0035] In this embodiment, by performing pre-authentication on the terminal side, the network transmission and server processing overhead of invalid voice packets are reduced, thereby improving real-time performance.
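The terminal-side pre-authentication of step 120 can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the packet layout (`VoicePacket`), the shared key, and the helper names are hypothetical, and an HMAC-SHA256 tag is used only as a stand-in for the digital signature, because the application does not name a signature scheme. Permission levels follow the convention used later in the description, where a smaller value means a higher privilege.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class VoicePacket:
    """Hypothetical packet layout; field names are illustrative only."""
    user_id: str
    vad_energy: float      # voice activity detection energy value
    permission_level: int  # 1 = highest privilege, larger = lower privilege
    payload: bytes         # encoded audio frame
    tag: bytes             # HMAC tag standing in for the digital signature


def sign(key: bytes, user_id: str, permission_level: int, payload: bytes) -> bytes:
    """Compute the stand-in signature over identity, level, and audio."""
    message = f"{user_id}|{permission_level}|".encode() + payload
    return hmac.new(key, message, hashlib.sha256).digest()


def pre_authenticate(packet: VoicePacket, key: bytes, min_access_level: int) -> int:
    """Return the authentication flag: 1 if the packet may enter mixing, else 0."""
    expected = sign(key, packet.user_id, packet.permission_level, packet.payload)
    if not hmac.compare_digest(expected, packet.tag):
        return 0  # signature check failed
    # A numerically larger level is a lower privilege than the scene allows.
    if packet.permission_level > min_access_level:
        return 0
    return 1
```

A packet whose flag is 0 can be dropped at the terminal, which is what saves the network and server overhead described above.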

[0036] Step 130: Send the voice activity detection energy value and permission level of each voice stream to the mixing weight calculation module to obtain the dynamic mixing weight of each voice stream. In some embodiments, this step includes: inputting the voice activity detection energy value of each voice stream into the energy normalization function to obtain the energy weight component of each voice stream; inputting the permission level of each voice stream into the permission normalization function to obtain the permission weight component of each voice stream; multiplying the balance factor by the energy weight component to obtain the first component, and multiplying the complement of the balance factor (one minus the balance factor) by the permission weight component to obtain the second component; and adding the first component to the second component to obtain the dynamic mixing weight of each voice stream.

[0037] For example, on the mixing server side, given $N$ voice stream inputs, the mixing weight of the $i$-th voice stream at time $t$ is calculated according to the following formula:

$$w_i(t) = \alpha \cdot \sigma\big(E_i(t)\big) + (1 - \alpha) \cdot \rho(L_i)$$

[0038] where $E_i(t)$ is the voice activity detection energy value, $L_i$ is the permission level, $w_i(t)$ is the mixing weight of the $i$-th voice stream at time $t$, and $\sigma(\cdot)$ is the Sigmoid function, calculated using the following formula:

$$\sigma(E) = \frac{1}{1 + e^{-k (E - E_0)}}$$

[0039] which maps the speech energy to the interval $(0, 1)$. $\rho(\cdot)$ is the permission normalization function, calculated using the following formula:

[…]

[0040] The higher the privilege (the smaller the value of $L_i$), the larger the value of $\rho(L_i)$. $\alpha$ is the balance factor between energy and permission, with a default value of 0.4; $k$ is the energy sensitivity coefficient, with a default value of 5; $E_0$ is the energy threshold, with a default value of 0.3.
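The weight calculation of step 130 can be sketched as below. The defaults 0.4, 5 and 0.3 are the ones stated in the text. The logistic form of the Sigmoid, with a sensitivity coefficient and a threshold, is inferred from the parameter descriptions. The linear permission normalization in `permission_weight` is purely an assumption chosen for illustration, since its formula is not available here; it only satisfies the stated property that a higher privilege (smaller level value) gives a larger component. All function names are hypothetical.

```python
import math

ALPHA = 0.4  # balance factor between energy and permission (default from the text)
K = 5.0      # energy sensitivity coefficient (default from the text)
E0 = 0.3     # energy threshold (default from the text)


def energy_weight(energy: float, k: float = K, e0: float = E0) -> float:
    """Sigmoid that maps a VAD energy value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-k * (energy - e0)))


def permission_weight(level: int, num_levels: int = 4) -> float:
    """ASSUMED linear normalization: level 1 -> 1.0, level n -> 1/n."""
    if not 1 <= level <= num_levels:
        raise ValueError("permission level out of range")
    return (num_levels - level + 1) / num_levels


def mixing_weight(energy: float, level: int,
                  alpha: float = ALPHA, num_levels: int = 4) -> float:
    """Blend the energy component and the permission component."""
    first = alpha * energy_weight(energy)
    second = (1.0 - alpha) * permission_weight(level, num_levels)
    return first + second


# A quiet level-1 speaker versus a loud level-4 speaker.
quiet_admin = mixing_weight(energy=0.2, level=1)
loud_visitor = mixing_weight(energy=0.9, level=4)
```

Under these assumptions the quiet high-privilege stream still receives the larger weight, which is the behaviour the balance factor is meant to produce; a balance factor closer to 1 would shift the outcome toward the louder stream.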

[0041] Step 140: Send the dynamic mixing weights and authentication flags of each speech stream to the mixing and synthesis module to obtain the mixed output; In some embodiments, sending the dynamic mixing weights and authentication flags of each speech stream to the mixing and synthesis module to obtain the mixed output includes: The dynamic mixing weights, authentication flags, and audio data of each audio stream are multiplied together to obtain the weighted audio components of each audio stream. The weighted audio components of all speech streams are summed to obtain the mixed output.

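[0042] The formula for calculating the mixing output is as follows:

$$y(t) = \sum_{i=1}^{N} w_i(t) \cdot \delta_i \cdot x_i(t)$$

[0043] where $N$ is the number of voice streams, $w_i(t)$ is the dynamic mixing weight of the $i$-th voice stream, $\delta_i$ is the authentication flag (set to 0 when the user's permissions are insufficient), $x_i(t)$ is the audio data, and $y(t)$ is the mixed output.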
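The weighted summation of step 140 can be illustrated with a short sketch. It assumes that each stream contributes one frame of equally many floating-point samples; the function name `mix` and the list-based frame representation are hypothetical, and no clipping or gain normalization is applied because the text does not describe any.

```python
from typing import Sequence


def mix(frames: Sequence[Sequence[float]],
        weights: Sequence[float],
        flags: Sequence[int]) -> list[float]:
    """Weighted sum of equally long audio frames; a flag of 0 silences a stream."""
    if not (len(frames) == len(weights) == len(flags)):
        raise ValueError("frames, weights, and flags must have the same length")
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must contain the same number of samples")
    mixed = [0.0] * length
    for frame, weight, flag in zip(frames, weights, flags):
        gain = weight * flag  # an unauthenticated stream contributes nothing
        for n, sample in enumerate(frame):
            mixed[n] += gain * sample
    return mixed


frames = [[1.0, -1.0], [0.5, 0.5], [2.0, 2.0]]
output = mix(frames, weights=[0.5, 1.0, 0.9], flags=[1, 1, 0])
```

The third stream has a large weight but an authentication flag of 0, so it does not appear in `output` at all.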

[0044] Step 150: Send the mixed output to the conflict arbitration module to obtain the target instruction. In some embodiments, this step includes: sending the mixed output to the conflict arbitration module, where the speech recognition module performs semantic parsing, extracts the instruction of each voice stream, and detects whether mutually exclusive instructions are present; if mutually exclusive instructions are detected, calculating the arbitration index of each mutually exclusive instruction and adding the instruction with the largest arbitration index to the target instruction; and if no mutually exclusive instructions are detected, using the extracted instructions as the target instruction directly.

[0045] It is easy to understand that when multiple voice streams contain mutually exclusive instructions, the arbitration index of each instruction is calculated, and the instruction with the highest arbitration index is selected for execution. The formula for calculating the arbitration index is:

$$A_i = \lambda_1 \cdot \frac{1}{L_i} + \lambda_2 \cdot \mathrm{SNR}_i + \lambda_3 \cdot H_i$$

[0046] where $\mathrm{SNR}_i$ is the signal-to-noise ratio, with value range […]; $H_i$ is the historical credibility factor, dynamically updated based on the user's historical behavior; $\lambda_1$, $\lambda_2$, $\lambda_3$ are the weighting coefficients, with default values […]; and $1/L_i$ is the reciprocal of the permission level, so the higher the permission, the larger the value.

[0047] When several speakers issue mutually exclusive instructions, the final instruction is determined by summing the arbitration indices of all speakers who support the same instruction and selecting the instruction with the largest sum for execution. The calculation formula is as follows:

$$c^{*} = \arg\max_{c} \sum_{i \in S_c} A_i$$

[0048] where $S_c$ is the set of all speakers who support instruction $c$.

[0049] In this embodiment, by introducing a multi-dimensional arbitration index that integrates authority, signal quality, and historical credibility, the decision-making becomes more scientific and the rationality of arbitration is improved.
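The arbitration described above can be sketched as follows. The weighted-sum form and the three coefficients (0.5, 0.3, 0.2) are assumptions made for illustration, since the default coefficients are not available in this text; the signal-to-noise ratio and the historical credibility factor are likewise assumed to be pre-normalized to the interval [0, 1]. The `Vote` type and all function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    user_id: str
    instruction: str
    permission_level: int  # 1 = highest privilege
    snr: float             # ASSUMED normalized to [0, 1]
    credibility: float     # historical credibility factor, ASSUMED in [0, 1]


# Hypothetical weighting coefficients for permission, SNR, and history.
W_PERMISSION, W_SNR, W_HISTORY = 0.5, 0.3, 0.2


def arbitration_index(vote: Vote) -> float:
    """Weighted sum of reciprocal permission level, SNR, and credibility."""
    return (W_PERMISSION * (1.0 / vote.permission_level)
            + W_SNR * vote.snr
            + W_HISTORY * vote.credibility)


def arbitrate(votes: Iterable[Vote]) -> str:
    """Sum the indices per instruction and return the instruction with the largest sum."""
    totals: dict[str, float] = defaultdict(float)
    for vote in votes:
        totals[vote.instruction] += arbitration_index(vote)
    if not totals:
        raise ValueError("no instructions to arbitrate")
    return max(totals, key=lambda instruction: totals[instruction])


admin = Vote("admin", "stop", permission_level=1, snr=0.6, credibility=0.9)
visitor_a = Vote("v1", "start", permission_level=4, snr=0.9, credibility=0.5)
visitor_b = Vote("v2", "start", permission_level=4, snr=0.9, credibility=0.5)
```

With these illustrative coefficients, `arbitrate([admin, visitor_a])` selects "stop", while adding `visitor_b` tips the sum toward "start". Because indices are summed per instruction, the choice of coefficients decides how many low-privilege supporters can outweigh one high-privilege speaker.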

[0050] Step 160: Send the target instruction to the post-execution module, and authenticate the multi-channel voice streams through the permission grid model to achieve hierarchical authentication of the multi-channel voice streams.

[0051] In some embodiments, sending the target instruction to the post-execution module and authenticating the multiple voice streams using a permission grid model includes: The target instruction is sent to the post-execution module, which verifies whether the user's permissions meet the requirements of the target instruction through the permission grid model, and then obtains the execution result. The historical credibility factor is updated based on the execution results to authenticate multiple voice streams.
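A minimal sketch of the post-execution check and the credibility update is given below. The text states only that the historical credibility factor is updated based on the execution result; the exponential moving average used here, its rate of 0.1, and the function names are all assumptions for illustration. The permission comparison assumes numeric levels in which a smaller value means a higher privilege.

```python
def update_credibility(current: float, success: bool, rate: float = 0.1) -> float:
    """ASSUMED update rule: move the factor toward 1 on success and toward 0 on failure."""
    if not 0.0 <= current <= 1.0:
        raise ValueError("credibility must lie in [0, 1]")
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must lie in (0, 1]")
    target = 1.0 if success else 0.0
    return (1.0 - rate) * current + rate * target


def post_execute(user_level: int, required_level: int,
                 credibility: float) -> tuple[int, float]:
    """Return (authentication result, updated credibility factor)."""
    # Smaller level value = higher privilege, so the user passes when
    # the user's level value does not exceed the required level value.
    result = 1 if user_level <= required_level else 0
    return result, update_credibility(credibility, success=(result == 1))


granted = post_execute(user_level=2, required_level=2, credibility=0.5)
denied = post_execute(user_level=4, required_level=2, credibility=0.5)
```

Any bounded update rule would serve the same purpose; the moving average is chosen here only because it keeps the factor inside [0, 1] without extra clamping.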

[0052] It is easy to understand that a permission lattice structure is formed by constructing a multi-level permission algebraic model, and the effective permission output is defined as:

[0053] Define the authentication threshold function:

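[0054] Define the permission set $P = \{P_1, P_2, \ldots, P_n\}$, where $P_1$ is the highest permission level and $P_n$ is the lowest. The permission levels satisfy the total order relation:

$$P_1 \succ P_2 \succ \cdots \succ P_n$$

[0055] The smaller the index value, the higher the privilege level.

[0056] Permission grid structure: all permission levels form a chain-structured grid (a totally ordered lattice), and the intersection operation on any two permissions is defined as:

$$P_i \wedge P_j = P_{\min(i, j)}$$

[0057] The union operation on any two permissions is defined as:

$$P_i \vee P_j = P_{\max(i, j)}$$

[0058] Authentication function: define the permission requirement mapping $R: A \to P$, where $A$ is the set of actions. The authentication function is:

$$\mathrm{Auth}(u, a) = \begin{cases} 1, & P_u \succeq R(a) \\ 0, & \text{otherwise} \end{cases}$$

[0059] where $P_u$ is the user's permission level, $a$ is the action, and $R$ is the permission requirement mapping.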

[0060] Figure 2 is a diagram of the hierarchical permission authentication system architecture provided in the embodiments of this application. As shown in Figure 2, the lines connecting the modules represent the data flow, and the arrows indicate the direction of data transmission. The system is divided into three layers from top to bottom: the terminal layer, the service layer, and the execution layer. The terminal layer performs voice acquisition and deploys a pre-authentication module for digital signature verification; the service layer deploys a mixing server, which includes a mixing weight calculation module, a mixing and synthesis module, and a conflict arbitration module; the execution layer deploys a post-execution module to execute the finally arbitrated instructions.

[0061] The terminal layer comprises multiple voice terminal devices, each with a built-in voice acquisition module and a pre-authentication module. The voice acquisition module collects the user's voice signal, while the pre-authentication module verifies the user's identity using a digital signature and generates an authentication flag. The authenticated voice packet carries the user's permission level identifier and is transmitted to the service layer over the network.

[0062] The service layer includes a mixing server, which integrates the following modules: a mixing weight calculation module, which calculates the real-time weight of each audio stream according to the dynamic mixing weight formula; a mixing and synthesis module, which weights and superimposes the audio signals of the streams according to their respective weights to generate the mixed output; a conflict arbitration module, which, when mutually exclusive instructions are detected, calculates the overall score of each instruction according to the arbitration index to determine the final instruction to be executed; and a speech recognition module, which performs semantic analysis on the mixed speech and extracts the instruction content.

[0063] The execution layer contains multiple execution devices, each with a built-in post-execution module. The post-execution module receives the final instruction from the service layer, performs final authentication according to the permission grid model, and executes the corresponding operation after successful verification.

[0064] Figure 3 is a permission grid structure diagram provided in the embodiments of this application. As shown in Figure 3, the permission grid is a chain structure containing four permission levels: system administrator (highest privilege level, value 1); security officer (value 2); regular operator (value 3); and visitor (lowest privilege level, value 4).

[0065] The diagram is arranged from top to bottom, with the highest permission level at the top and the lowest at the bottom. The intersection operation of any two permission levels is defined as taking the level with the higher authority (lower numerical value) of the two; that is, for two levels with values $i$ and $j$, the result is the level with value $\min(i, j)$. The union operation of any two permission levels is defined as taking the level with the lower authority (larger numerical value) of the two; that is, the level with value $\max(i, j)$.

[0066] The authentication function is also marked in the figure. For example, when an action (such as "system shutdown") requires a given permission level, only users whose permission level is at or above the required level can perform the action; the authentication result for users at lower levels is 0.
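The chain operations and the authentication function described above can be sketched in a few lines of Python. This is an illustrative sketch only: the names `Level`, `intersection`, `union`, `authenticate`, and the `REQUIRED` table are hypothetical, and the requirement assigned to "system shutdown" is an assumed value chosen for the example, not one stated in the application.

```python
from enum import IntEnum


class Level(IntEnum):
    """Permission levels from Figure 3; a smaller value means higher authority."""
    ADMIN = 1       # system administrator
    SECURITY = 2    # security officer
    OPERATOR = 3    # regular operator
    VISITOR = 4     # visitor


def intersection(a: Level, b: Level) -> Level:
    """Take the level with the higher authority (smaller value)."""
    return Level(min(a, b))


def union(a: Level, b: Level) -> Level:
    """Take the level with the lower authority (larger value)."""
    return Level(max(a, b))


# Hypothetical requirement mapping: action -> minimum level required (assumed value).
REQUIRED = {"system shutdown": Level.SECURITY}


def authenticate(user: Level, action: str) -> int:
    """Return 1 if the user's level is at least as high as the action requires, else 0."""
    return 1 if user <= REQUIRED[action] else 0
```

Because the levels form a chain, both operations reduce to `min` and `max` on the level values, and authentication reduces to a single comparison.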

[0067] Figure 4 is a graph of the dynamic mixing weight function provided in the embodiments of this application. As shown in Figure 4, the curve shapes of the two component functions of the dynamic mixing weight function are illustrated. The left panel of Figure 4 shows the curve of the speech energy normalization function. The horizontal axis represents the speech activity detection energy value, and the vertical axis represents the energy weight component. The curve is S-shaped (sigmoid): when the energy is below the threshold, the weight is close to 0, and when the energy exceeds the threshold, the weight rises rapidly toward 1. The figure shows a typical curve for one choice of the function parameters.

[0068] The right panel of Figure 4 shows the curve of the permission normalization function. The horizontal axis represents the permission level value (1 to 4), and the vertical axis represents the permission weight component. The curve is monotonically decreasing, so higher permissions (smaller values) receive greater weights. The graph marks the weight values corresponding to the four permission levels, among which the highest permission level (value 1) has the highest weight and the lowest permission level (value 4) has the lowest weight.

[0069] The final mixing weight is the weighted sum of the two components described above; the balance factor controls the relative importance of the energy component and the permission component.
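As a rough illustration of how the two components might be combined, the following Python sketch follows the structure stated in the claims (balance factor times the energy component, plus the complement of the balance factor times the permission component). The exact normalization functions are not recoverable from this text, so the sigmoid parameters `threshold` and `steepness`, the linear form of `permission_weight`, and the default `balance` of 0.5 are all assumptions.

```python
import math


def energy_weight(energy: float, threshold: float = 0.5, steepness: float = 10.0) -> float:
    """S-shaped (sigmoid) normalization of the voice activity detection energy."""
    return 1.0 / (1.0 + math.exp(-steepness * (energy - threshold)))


def permission_weight(level: int, num_levels: int = 4) -> float:
    """Assumed monotonically decreasing form: level 1 -> 1.0, level 4 -> 0.25."""
    return (num_levels - level + 1) / num_levels


def mixing_weight(energy: float, level: int, balance: float = 0.5) -> float:
    """Weighted sum: balance * energy component + (1 - balance) * permission component."""
    return balance * energy_weight(energy) + (1.0 - balance) * permission_weight(level)
```

With equal energy, a stream from a higher permission level always receives the larger weight, which is the behavior the right panel of Figure 4 is meant to convey.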

[0070] Figure 5 is the second flowchart of the intranet voice hierarchical permission authentication method based on multi-level permission mixing provided in this application embodiment. As shown in Figure 5, each decision node is represented by a diamond and each processing node by a rectangle, and the arrows indicate the flow of the process. The first level (pre-authentication) verifies the digital signature on the terminal side and filters out invalid identities. The second level (mid-authentication) performs real-time authentication on the mixing server side based on dynamic weights, with low-privilege voice streams having their weights reduced. The third level (post-authentication) verifies the legality of the final instruction on the execution side based on the permission grid model.

[0071] Phase 1: Pre-authentication (terminal side)
1. The voice acquisition module collects the user's voice data;
2. Extract the user identity and digital signature;
3. Verify the validity of the digital signature: if invalid, discard the voice packet; if valid, continue;
4. Verify whether the user's permission level meets the scenario's access threshold: if not, discard the voice packet; if so, generate an authentication flag, include the permission identifier in the voice packet, and send the packet to the service layer.
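The pre-authentication steps above can be sketched as follows. The application does not name a signature scheme, so this sketch uses HMAC-SHA256 from the Python standard library purely as a stand-in for digital signature verification (a real deployment would more likely use an asymmetric scheme). The `VoicePacket` fields, the `sign` helper, and the `min_level` parameter are hypothetical names introduced for illustration.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoicePacket:
    user_id: str
    level: int          # 1 = highest authority, 4 = lowest
    audio: bytes
    signature: bytes
    auth_flag: int = 0


def sign(key: bytes, user_id: str, audio: bytes) -> bytes:
    """Stand-in signature: HMAC-SHA256 over the user id and the audio payload."""
    return hmac.new(key, user_id.encode() + audio, hashlib.sha256).digest()


def pre_authenticate(packet: VoicePacket, key: bytes, min_level: int) -> Optional[VoicePacket]:
    """Return the flagged packet, or None if the packet is discarded."""
    expected = sign(key, packet.user_id, packet.audio)
    if not hmac.compare_digest(expected, packet.signature):
        return None                 # step 3: invalid signature, discard
    if packet.level > min_level:
        return None                 # step 4: level below the scenario's access threshold
    packet.auth_flag = 1            # step 4: passed, set the authentication flag
    return packet
```

Discarding packets at the terminal keeps unsigned or under-privileged audio from ever reaching the mixing server, which is the cost saving the description attributes to this phase.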

[0072] Phase 2: Mid-authentication (mixing server side)
1. The mixing server receives multiple voice streams;
2. The mixing weight calculation module calculates the dynamic weight of each voice stream according to the dynamic mixing weight formula;
3. The mixing and synthesis module generates the weighted mixed output based on the weights;
4. The speech recognition module performs semantic analysis on the mixed output to extract the instruction content;
5. If mutually exclusive instructions are detected, the conflict arbitration module is triggered; it calculates the score of each instruction according to the arbitration index and selects the instruction with the highest score.
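Step 3 above, as spelled out in the claims, multiplies each stream's audio data by its dynamic mixing weight and its authentication flag and then sums over all streams. A minimal Python sketch of that weighted superposition is shown below; the function name `mix` and the representation of audio as lists of float samples are assumptions, and no normalization or clipping is applied.

```python
from typing import Sequence


def mix(streams: Sequence[Sequence[float]],
        weights: Sequence[float],
        flags: Sequence[int]) -> list[float]:
    """Sum of weight * flag * sample over all streams, sample by sample."""
    if not (len(streams) == len(weights) == len(flags)):
        raise ValueError("streams, weights, and flags must have the same length")
    # Mix only as many samples as the shortest stream provides.
    length = min((len(s) for s in streams), default=0)
    return [
        sum(w * f * s[t] for s, w, f in zip(streams, weights, flags))
        for t in range(length)
    ]
```

A stream whose authentication flag is 0 contributes nothing to the output regardless of its weight, so the flag acts as a hard gate while the weight acts as a soft priority.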

[0073] Phase 3: Post-authentication (execution side)
1. The post-execution module receives the final instruction;
2. Verify whether the user's permissions meet the instruction requirements based on the permission grid model: if they do, execute the instruction; if not, refuse to execute and log the result;
3. The execution result is fed back to the system log and is used to update the historical reliability factor.
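The post-authentication check and the reliability feedback can be sketched as below. The update rule for the historical reliability factor is not given in this text; the exponential moving average used here, the `rate` parameter, and the function names are assumptions made only to show where the feedback would plug in.

```python
import logging

logger = logging.getLogger("post_auth")


def update_reliability(current: float, success: bool, rate: float = 0.1) -> float:
    """Hypothetical update: moving average toward 1 on success and toward 0 on refusal."""
    target = 1.0 if success else 0.0
    return (1.0 - rate) * current + rate * target


def post_authenticate(user_level: int, required_level: int,
                      reliability: float) -> tuple[bool, float]:
    """Allow execution only if the user's level value does not exceed the required value."""
    allowed = user_level <= required_level
    if not allowed:
        logger.warning("refused: level %d does not meet required level %d",
                       user_level, required_level)
    return allowed, update_reliability(reliability, allowed)
```

Returning the updated factor alongside the decision keeps the function free of shared state; a caller would persist the new value in whatever store backs the system log.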

[0074] Figure 6 is a schematic diagram of the conflict arbitration mechanism provided in the embodiments of this application. As shown in Figure 6, the arbitration index of each user and the summed score of each command are displayed as a bar chart, and the final arbitration result is marked. The figure illustrates the working principle of the conflict arbitration module when multiple voice streams contain mutually exclusive commands.

[0075] The example scenario in the figure involves three users speaking simultaneously:
User A: permission level administrator, signal-to-noise ratio 0.9, historical reliability 0.95, command "shut down";
User B: permission level security officer, signal-to-noise ratio 0.7, historical reliability 0.85, command "restart";
User C: permission level operator, signal-to-noise ratio 0.5, historical reliability 0.75, command "shut down".

[0076] Arbitration Index Calculation:

[0077]

[0078]

[0079] Sum of instruction scores: "Shutdown" command score =

[0080] "Restart" command score =

[0081] Final execution instruction: Select the "shutdown" instruction with the higher score and execute it.
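The arbitration formula itself is missing from this text, so the following Python sketch only illustrates the general shape of the procedure: compute a per-user arbitration index from permission level, signal-to-noise ratio, and historical reliability, sum the indices per command, and pick the command with the highest total. The linear form of `arbitration_index` and its coefficients `(0.5, 0.3, 0.2)` are assumptions, not the values used in the application; the three requests are the ones from the example scenario.

```python
from collections import defaultdict

# (user, level value, signal-to-noise ratio, historical reliability, command)
REQUESTS = [
    ("A", 1, 0.9, 0.95, "shut down"),
    ("B", 2, 0.7, 0.85, "restart"),
    ("C", 3, 0.5, 0.75, "shut down"),
]


def arbitration_index(level, snr, reliability, num_levels=4, coeffs=(0.5, 0.3, 0.2)):
    """Hypothetical index: weighted sum of permission, SNR, and reliability terms."""
    permission_term = (num_levels - level + 1) / num_levels
    a, b, c = coeffs
    return a * permission_term + b * snr + c * reliability


def arbitrate(requests):
    """Sum the index per command and return (winning command, per-command scores)."""
    scores = defaultdict(float)
    for _user, level, snr, reliability, command in requests:
        scores[command] += arbitration_index(level, snr, reliability)
    return max(scores, key=scores.get), dict(scores)
```

Under these assumed coefficients, calling `arbitrate(REQUESTS)` selects "shut down", which agrees with the outcome stated in the example; different coefficients would change the scores but not the procedure.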

[0082] In this embodiment, by constructing a multi-level permission algebraic model and designing a dynamic mixing weight calculation method, a deep binding between user permission levels and mixing weights is achieved, giving the intranet voice control system permission awareness and dynamic mixing capabilities. This effectively reduces mixing conflicts where high-permission key instructions are drowned out by low-permission voice, and achieves higher security and more adaptable voice mixing and authentication control in multi-user interaction scenarios.

[0083] The intranet voice hierarchical permission authentication method based on multi-level permission mixing provided in this application embodiment can be executed by an intranet voice hierarchical permission authentication device based on multi-level permission mixing. This application embodiment takes the device executing the method as an example to describe the device provided in this application embodiment.

[0084] This application also provides an intranet voice hierarchical permission authentication device based on multi-level permission mixing. As shown in Figure 7, the device includes: an acquisition module 710, a pre-authentication module 720, a mixing weight calculation module 730, a mixing synthesis module 740, a conflict arbitration module 750, and a post-execution module 760.

[0085] The acquisition module 710 is used to acquire multiple voice streams in the intranet and extract the voice packet corresponding to each voice stream, where the voice packet includes a user identity identifier, a voice activity detection energy value, and a permission level;
the pre-authentication module 720 is used to perform digital signature verification on the voice packet corresponding to each voice stream and obtain the authentication flag bit of each voice stream;
the mixing weight calculation module 730 is used to obtain the dynamic mixing weight of each voice stream from its voice activity detection energy value and permission level;
the mixing synthesis module 740 is used to obtain the mixed output from the dynamic mixing weights and authentication flag bits of the voice streams;
the conflict arbitration module 750 is used to obtain the target instruction from the mixed output;
the post-execution module 760 is used to receive the target instruction and authenticate the multiple voice streams through the permission grid model, thereby realizing hierarchical authentication of the multiple voice streams.

[0086] The intranet voice hierarchical permission authentication method based on multi-level permission mixing provided in this application's embodiments deploys a three-level authentication mechanism (terminal-side pre-authentication, server-side mid-authentication, and execution-side post-authentication), so that authentication runs through the entire process before, during, and after mixing. This alleviates authentication lag and reduces the processing cost of invalid voice packets. Furthermore, the mathematical support of the permission grid model allows for stricter judgment rules and higher auditability in permission verification. In addition, the multi-dimensional conflict arbitration mechanism, which integrates permission level, signal-to-noise ratio, and historical reliability, makes the decision between mutually exclusive commands better founded and improves the accuracy and reliability of intranet voice control command execution.

[0087] The intranet voice hierarchical permission authentication device based on multi-level permission mixing provided in this application embodiment can implement the various processes of the method embodiments shown in Figures 1 to 6; to avoid repetition, they are not described in detail here.

[0088] In some embodiments, as shown in Figure 8, this application embodiment also provides an electronic device 800, including a processor 801, a memory 802, and a computer program stored on the memory 802 and executable on the processor 801. When the program is executed by the processor 801, it implements the various processes of the above-described intranet voice hierarchical permission authentication method embodiment based on multi-level permission mixing, and can achieve the same technical effect. To avoid repetition, it will not be described again here.

[0089] It should be noted that the electronic devices in the embodiments of this application include the mobile electronic devices and non-mobile electronic devices described above.

[0090] This application also provides a non-transitory computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it implements the various processes of the above-described intranet voice hierarchical permission authentication method based on multi-level permission mixing, and achieves the same technical effect. To avoid repetition, it will not be described again here.

[0091] The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes computer-readable storage media, such as computer read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk.

[0092] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the above-described intranet voice hierarchical permission authentication method based on multi-level permission mixing.

[0093] The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes computer-readable storage media, such as computer read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk.

[0094] This application embodiment also provides a chip, which includes a processor and a communication interface. The communication interface and the processor are coupled. The processor is used to run programs or instructions to implement the various processes of the above-described embodiment of the intranet voice hierarchical permission authentication method based on multi-level permission mixing, and can achieve the same technical effect. To avoid repetition, it will not be described again here.

[0095] It should be understood that the chip mentioned in the embodiments of this application may also be referred to as a device-level chip, device chip, chip device, or on-chip device chip, etc.

[0096] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element. Furthermore, it should be noted that the scope of the methods and apparatuses in the embodiments of this application is not limited to performing functions in the order shown or discussed, but may also include performing functions substantially simultaneously or in the reverse order, depending on the functions involved. For example, the described methods may be performed in a different order than described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.

[0097] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a computer software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk), and includes several instructions to cause a terminal (which may be a mobile phone, computer, server, or network device, etc.) to execute the intranet voice hierarchical permission authentication method based on multi-level permission mixing of the various embodiments of this application.

[0098] In the description of this application, "first feature" and "second feature" may include one or more of the features.

[0099] In the description of this application, "multiple" means two or more.

[0100] The embodiments of this application have been described above with reference to the accompanying drawings. However, this application is not limited to the specific embodiments described above. The specific embodiments described above are merely illustrative and not restrictive. Those skilled in the art can make many other forms under the guidance of this application without departing from the spirit and scope of the claims, and all of these forms are within the protection scope of this application.

[0101] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "illustrative embodiment," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.

[0102] Although embodiments of this application have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of this application, the scope of which is defined by the claims and their equivalents.

Claims

1. An intranet voice hierarchical permission authentication method based on multi-level permission mixing, characterized in that the method includes: acquiring multiple voice streams from the intranet through a voice acquisition module and extracting the voice packet corresponding to each voice stream, the voice packet including a user identity identifier, a voice activity detection energy value, and a permission level; sending the voice packet corresponding to each voice stream to a pre-authentication module for digital signature verification to obtain the authentication flag bit of each voice stream; sending the voice activity detection energy value and permission level of each voice stream to a mixing weight calculation module to obtain the dynamic mixing weight of each voice stream; sending the dynamic mixing weights and authentication flag bits of the voice streams to a mixing and synthesis module to obtain the mixed output; sending the mixed output to a conflict arbitration module to obtain the target instruction; and sending the target instruction to a post-execution module, which authenticates the multiple voice streams through a permission grid model, thereby achieving hierarchical authentication of the multiple voice streams.

2. The intranet voice hierarchical permission authentication method based on multi-level permission mixing according to claim 1, characterized in that the step of sending the voice packet corresponding to each voice stream to the pre-authentication module for digital signature verification to obtain the authentication flag bit of each voice stream includes: sending the voice packet corresponding to each voice stream to the pre-authentication module, which checks whether the permission level of each voice stream is lower than the minimum access threshold of the scene; if the permission level of a voice stream is lower than the minimum access threshold of the scene, the authentication flag bit of that voice stream is set to 0; otherwise, the authentication flag bit of that voice stream is set to 1.

3. The intranet voice hierarchical permission authentication method based on multi-level permission mixing according to claim 2, characterized in that the step of sending the voice activity detection energy value and permission level of each voice stream to the mixing weight calculation module to obtain the dynamic mixing weight of each voice stream includes: inputting the voice activity detection energy value of each voice stream into the energy normalization function to obtain the energy weight component of each voice stream; inputting the permission level of each voice stream into the permission normalization function to obtain the permission weight component of each voice stream; multiplying the balance factor by the energy weight component to obtain the first component, and multiplying the complement of the balance factor by the permission weight component to obtain the second component; and adding the first component to the second component to obtain the dynamic mixing weight of each voice stream.

4. The intranet voice hierarchical permission authentication method based on multi-level permission mixing according to claim 3, characterized in that the step of sending the dynamic mixing weights and authentication flag bits of the voice streams to the mixing and synthesis module to obtain the mixed output includes: multiplying together the dynamic mixing weight, the authentication flag bit, and the audio data of each voice stream to obtain the weighted audio component of each voice stream; and summing the weighted audio components of all voice streams to obtain the mixed output.

5. The intranet voice hierarchical permission authentication method based on multi-level permission mixing according to claim 4, characterized in that the step of sending the mixed output to the conflict arbitration module to obtain the target instruction includes: sending the mixed output to the conflict arbitration module, where the speech recognition module performs semantic parsing, extracts the instruction of each voice stream, and detects whether there are mutually exclusive instructions; if mutually exclusive instructions are detected, calculating the arbitration index of each mutually exclusive instruction and taking the instruction with the largest arbitration index as the target instruction; and if no mutually exclusive instruction is detected, taking the extracted instruction as the target instruction.

6. The intranet voice hierarchical permission authentication method based on multi-level permission mixing according to claim 5, characterized in that the step of sending the target instruction to the post-execution module and authenticating the multiple voice streams through the permission grid model includes: sending the target instruction to the post-execution module, which verifies through the permission grid model whether the user's permissions meet the requirements of the target instruction and obtains the execution result; and updating the historical reliability factor based on the execution result, so as to authenticate the multiple voice streams.

7. The intranet voice hierarchical permission authentication method based on multi-level permission mixing according to claim 6, characterized in that the calculation formula for the permission grid model is as follows:

$$\mathrm{Auth}(u, a) = \begin{cases} 1, & P_u \succeq f(a) \\ 0, & \text{otherwise} \end{cases}$$

where $P_u$ is the user's permission level, $a$ is the action, and $f$ is the permission requirement mapping.

8. An intranet voice hierarchical permission authentication device based on multi-level permission mixing, implemented using the intranet voice hierarchical permission authentication method based on multi-level permission mixing as described in any one of claims 1 to 7, characterized in that the device includes: an acquisition module, used to acquire multiple voice streams in the intranet and extract the voice packet corresponding to each voice stream, the voice packet including a user identity identifier, a voice activity detection energy value, and a permission level; a pre-authentication module, used to perform digital signature verification on the voice packet corresponding to each voice stream and obtain the authentication flag bit of each voice stream; a mixing weight calculation module, used to obtain the dynamic mixing weight of each voice stream from its voice activity detection energy value and permission level; a mixing and synthesis module, used to obtain the mixed output from the dynamic mixing weights and authentication flag bits of the voice streams; a conflict arbitration module, used to obtain the target instruction from the mixed output; and a post-execution module, used to receive the target instruction and authenticate the multiple voice streams through the permission grid model, thereby realizing hierarchical authentication of the multiple voice streams.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the intranet voice hierarchical permission authentication method based on multi-level permission mixing as described in any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the intranet voice hierarchical permission authentication method based on multi-level permission mixing as described in any one of claims 1 to 7.