An intelligent collaborative interaction method for audio devices based on big data processing
By constructing collaborative hypergraph data through big data processing and combining it with variational Bayesian hierarchical context inference, the problem of inaccurate device selection and control in collaborative interaction of multiple audio devices is solved, realizing efficient and intelligent multi-device collaborative control and improving the system's adaptive capability.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: YUNNAN CHIYU TECH CO LTD
- Filing Date: 2026-03-10
- Publication Date: 2026-06-12
Abstract
Description
Technical Field
[0001] This invention relates to the field of intelligent speaker interaction technology, and in particular to an intelligent collaborative interaction method for speakers based on big data processing.
Background Technology
[0002] With the rapid development of smart home and Internet of Things (IoT) technologies, smart speakers are gradually becoming important human-computer interaction terminals in the home environment. Smart speakers not only provide basic functions such as voice interaction, music playback, and information retrieval, but also can interact with various smart terminal devices to create a more convenient and intelligent home living environment. In practical applications, multiple speaker devices are often deployed in a user's home environment, such as living room speakers, bedroom speakers, and kitchen speakers. Through the collaborative work of these speaker devices, functions such as cross-space voice interaction, synchronized playback, multi-device linkage control, and cross-device relay playback can be achieved. Therefore, how to achieve efficient and accurate collaborative interactive control in a multi-speaker environment has gradually become an important research direction in the field of smart speaker technology.
[0003] In existing technologies, collaborative interaction between multiple audio devices typically relies on preset rules or simple device linkage strategies. For example, after a user issues a voice command, the system usually selects a specific audio device as the responding device based on preset device priority or device distance, and that device then completes the voice response or audio playback. Furthermore, in cases involving multi-device playback or device linkage control, the system typically achieves simple synchronous control through a fixed master-slave device control relationship. However, these methods mostly rely on fixed rules or static configurations for device selection and control, and lack the ability to comprehensively analyze complex environmental factors and multi-source data.
Summary of the Invention
[0004] One objective of this invention is to propose a smart collaborative interaction method for audio equipment based on big data processing. This invention fully utilizes multi-source data analysis, collaborative hypergraph modeling, incremental hypergraph frequent pattern mining, and variational Bayesian hierarchical context inference technology to uniformly model and analyze device status data, environmental element data, and interaction event data in a multi-audio equipment network environment, enabling continuous learning and optimization of the system. This invention can effectively improve the intelligence level of collaborative interaction among multi-audio equipment and has the advantages of high accuracy in collaborative decision-making, high efficiency in device collaboration, and strong system adaptability.
[0005] According to an embodiment of the present invention, a method for intelligent collaborative interaction of audio based on big data processing includes the following steps:
[0006] Collect and preprocess multi-source raw data in a multi-audio device network environment;
[0007] Construct collaborative hypergraph data based on preprocessed multi-source raw data;
[0008] Perform dynamic topology updates on the collaborative hypergraph data to obtain a dynamic graph sequence;
[0009] Perform collaborative pattern mining processing on the dynamic graph sequence. The collaborative pattern mining processing includes incremental hypergraph frequent pattern mining on the dynamic graph sequence to obtain a collaborative pattern set, and variational Bayesian hierarchical context inference on the collaborative pattern set to obtain scene state results and user intent results.
[0010] A set of candidate response actions is generated based on the scene state results and user intent results, and a collaborative decision-making process based on constraint optimization is performed on the set of candidate response actions to obtain device selection results and collaborative interaction control strategies.
[0011] Based on the device selection results, the target of the control command is determined, and the control command is generated according to the collaborative interaction control strategy. The control command is then sent to the target response audio device and the set of collaborative audio devices, and an execution log is generated.
[0012] Collect user feedback data, voice transmission data, and execution logs during the interaction process to generate feedback samples;
[0013] Online incremental updates are performed on the collaborative pattern set based on feedback samples, and online updates are also performed on the parameters of variational Bayesian hierarchical context inference.
[0014] Optionally, the multi-source raw data includes user voice data, environmental acoustic data, user historical interaction data, user location or near-field presence data, audio equipment operating status data, and network link status data between devices. The data preprocessing includes noise reduction filtering, missing data completion, anomaly removal, deduplication and merging, feature scale unification, and data quality labeling.
[0015] Optionally, the construction of the collaborative hypergraph data specifically includes:
[0016] Based on the audio device information in the multi-audio device networking environment, determine the set of audio device nodes and map each audio device to an audio device node;
[0017] Based on the preprocessed multi-source raw data, the node features of each audio device node are extracted, and node feature data is generated based on the node features.
[0018] The spatial adjacency relationship between the audio device nodes is determined based on the spatial location information of each audio device node, and the communication relationship between the audio device nodes is determined based on the communication status information between the audio devices. Node relationship data is then generated based on the spatial adjacency relationship and the communication relationship.
[0019] For each interaction, obtain the set of audio device nodes involved in the interaction, the environmental elements involved in the interaction, and the triggered interaction events, and map the audio device nodes involved in the interaction, environmental elements, and interaction events together into the same hyperedge to generate a hyperedge set;
[0020] Generate hyperedge representation data based on the membership relationship between audio device nodes and hyperedges;
[0021] The hyperedge weight is calculated based on the number of audio device nodes, environmental elements and interactive events corresponding to each hyperedge, and hyperedge weight data is generated.
[0022] The audio device node set, node feature data, node relationship data, hyperedge set, hyperedge representation data, and hyperedge weight data are integrated to generate collaborative hypergraph data.
[0023] Optionally, obtaining the dynamic graph sequence specifically includes:
[0024] The collaborative hypergraph data is divided into time windows, and a collaborative hypergraph data structure is generated for each time window. The collaborative hypergraph data in each time window includes a set of audio device nodes, a set of hyperedges, hyperedge representation data, and hyperedge weight data.
[0025] Within each time window, perform node weight update processing on the audio device node to generate the updated node weight;
[0026] Within each time window, perform edge weight update processing on the hyperedge to generate the updated hyperedge weight;
[0027] Node availability tags are generated based on the real-time operating status information of each audio device node;
[0028] The updated node weights, updated hyperedge weights, and node availability markers are integrated with the collaborative hypergraph data within the corresponding time window to generate dynamic graph data corresponding to the time window. The dynamic graph data is then arranged in chronological order to form a dynamic graph sequence showing the changing collaborative relationships of audio equipment over time.
[0029] Optionally, obtaining the scene state result and the user intent result specifically includes:
[0030] Traverse each time window in the dynamic graph sequence in chronological order, and extract the correlation data between the corresponding audio equipment, environmental elements and interactive events in each time window;
[0031] Construct candidate collaborative patterns based on the correlation data;
[0032] Within each time window, determine whether each candidate collaborative pattern appears. When the audio equipment, environmental elements and interactive events in the candidate collaborative pattern appear simultaneously within the corresponding time window, mark the candidate collaborative pattern as appearing in that time window; otherwise, mark it as not appearing.
[0033] After traversing each time window in the dynamic graph sequence, count the occurrence status of each candidate collaborative pattern across the time windows, record the number of time windows in which the pattern appears, and obtain the total number of time windows in the dynamic graph sequence. Dividing the number of time windows in which the pattern appears by the total number of time windows gives the support of the candidate collaborative pattern in the dynamic graph sequence.
[0034] Candidate collaborative patterns with support greater than or equal to a preset support threshold are identified as frequent collaborative patterns, and all frequent collaborative patterns are aggregated to generate the collaborative pattern set.
[0035] When a new time window is added to the dynamic graph sequence, the collaborative pattern set is incrementally updated according to the occurrence status of each candidate collaborative pattern in the new time window, so as to update the support of each collaborative pattern.
[0036] Convert the collaborative pattern set into pattern observation data showing the occurrence of each collaborative pattern in the current time window;
[0037] Variational Bayesian hierarchical context inference is performed based on pattern observation data to obtain scene state results and user intent results.
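By way of non-limiting illustration, the support calculation and its incremental update can be sketched as follows. The sketch assumes each time window has already been flattened into a set of string items (devices, environmental elements, events); the class name `SupportTracker`, the item labels, and the threshold value are hypothetical and do not appear in the specification.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Set

Pattern = FrozenSet[str]  # e.g. {"dev:living", "env:quiet", "evt:play"} (hypothetical labels)


class SupportTracker:
    """Keeps per-pattern window counts so support can be updated one window at a time."""

    def __init__(self, candidates: Iterable[Pattern], min_support: float) -> None:
        self.candidates = list(candidates)
        self.min_support = min_support
        self.hits: Counter = Counter()  # number of windows in which each pattern appeared
        self.windows = 0                # total number of windows seen so far

    def add_window(self, items: Set[str]) -> None:
        """A pattern 'appears' when all of its items co-occur in the window."""
        self.windows += 1
        for pattern in self.candidates:
            if pattern <= items:
                self.hits[pattern] += 1

    def support(self, pattern: Pattern) -> float:
        """Support = windows containing the pattern / total windows."""
        return self.hits[pattern] / self.windows if self.windows else 0.0

    def frequent(self) -> Set[Pattern]:
        """Patterns whose support meets the preset threshold."""
        if not self.windows:
            return set()
        return {p for p in self.candidates if self.support(p) >= self.min_support}
```

Because only two counters change per new window, adding a window does not require rescanning earlier windows, which is one simple way to realize the incremental update described above.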
[0038] Optionally, the determination of the device selection result and the collaborative interaction control strategy specifically includes:
[0039] Generate a set of candidate response actions based on the scene category in the scene status results and the intent category in the user intent results;
[0040] Based on the audio equipment operation status data, audio equipment communication status data and audio equipment spatial location information in the preprocessed multi-source raw data, the range of audio equipment that can participate in collaborative decision-making is determined, forming a collaborative audio equipment set, and audio equipment that is online and responsive is identified as candidate target response audio equipment.
[0041] Construct collaborative decision-making constraints based on constraint optimization, wherein the collaborative decision-making constraints include the uniqueness constraint of the target response audio device, the membership constraint of the collaborative audio device set, and the mutual exclusion constraint between the target response audio device and the collaborative audio device set;
[0042] Based on the scene confidence in the scene state results and the intent confidence in the user intent results, action evaluation processing is performed on each candidate response action in the candidate response action set to obtain the action evaluation result of each candidate response action.
[0043] Based on the action evaluation results and collaborative decision constraints, a constraint optimization-based collaborative decision-making solution is executed, outputting the device selection results and collaborative interactive control strategy.
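The constraint-based decision step can be illustrated with a small exhaustive search. This is only a sketch under stated assumptions: the scoring rule (confidence-weighted suitability, with helpers counted at half weight), the `Action` fields, and the per-device readiness scores are all hypothetical, since the specification does not fix an objective function or a solver. The three constraints from the text (one target, helpers drawn from the eligible set, target excluded from the helper set) are enforced by construction.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Action:
    name: str
    suitability: float  # hypothetical fit of the action to the (scene, intent) pair
    helpers: int        # hypothetical number of collaborating devices the action wants


def evaluate(action: Action, scene_conf: float, intent_conf: float) -> float:
    """Assumed action evaluation: confidence-weighted suitability."""
    return scene_conf * intent_conf * action.suitability


def decide(
    actions: List[Action],
    eligible: Dict[str, float],  # device id -> hypothetical readiness score
    scene_conf: float,
    intent_conf: float,
) -> Optional[Tuple[str, str, FrozenSet[str]]]:
    """Return (action, target device, helper set) with the best score, or None."""
    best, best_score = None, float("-inf")
    for action in actions:
        base = evaluate(action, scene_conf, intent_conf)
        for target in eligible:                             # uniqueness: one target
            others = sorted(d for d in eligible if d != target)  # mutual exclusion
            k = min(action.helpers, len(others))
            for helpers in combinations(others, k):         # membership: eligible only
                score = base * (eligible[target] + 0.5 * sum(eligible[h] for h in helpers))
                if score > best_score:
                    best, best_score = (action.name, target, frozenset(helpers)), score
    return best
```

Exhaustive enumeration is practical only for the handful of devices found in a home; a larger deployment would presumably need an integer-programming or heuristic solver instead.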
[0044] Optionally, the generation of the execution log specifically includes:
[0045] Based on the equipment selection results, the target of the control command is determined, the target response audio device is determined as the master device, and each audio device in the set of cooperative audio devices is determined as the slave device;
[0046] Generate a set of control instructions based on a collaborative interactive control strategy;
[0047] The control instruction set is sorted according to the execution sequence in the collaborative interaction control strategy to generate a control instruction execution sequence;
[0048] Based on the cross-device synchronization parameters in the collaborative interaction control strategy, a unified playback progress reference time is configured for the control commands involving audio playback in the control command execution sequence, and a corresponding playback progress offset is configured for each audio device based on the unified playback progress reference time.
[0049] The control command execution sequence is sent to each audio device in the target response audio device and the set of cooperative audio devices respectively;
[0050] During the execution of control commands, the execution status information of each audio device is collected, and an execution log is generated.
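As a hedged illustration of the synchronization step, the sketch below gives every device the same reference time and a per-device playback progress offset. It assumes the offset compensates each device's audio output delay, which is one plausible reading and is not stated in the specification; the function name, the `lead_ms` default, and the field names are hypothetical.

```python
from typing import Dict, List


def plan_playback(
    master: str,
    slaves: List[str],
    dispatch_ms: int,
    track_position_ms: int,
    output_delay_ms: Dict[str, int],
    lead_ms: int = 200,
) -> Dict[str, dict]:
    """Assign one shared reference time and a per-device progress offset."""
    reference_ms = dispatch_ms + lead_ms  # instant at which all devices start playing
    plan = {}
    for device in [master, *slaves]:
        # Assumption: a device that outputs sound late starts further into the track.
        offset = output_delay_ms.get(device, 0)
        plan[device] = {
            "role": "master" if device == master else "slave",
            "reference_ms": reference_ms,
            "offset_ms": offset,
            "start_position_ms": track_position_ms + offset,
        }
    return plan
```

For example, with a 20 ms delay on the living-room device and 80 ms on the bedroom device, both receive the same `reference_ms`, while the bedroom device starts 60 ms further into the track.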
[0051] Optionally, the generation of the feedback sample specifically includes:
[0052] After an interaction is completed, user feedback data, voice transmission data and execution logs corresponding to the interaction are collected, and an interaction identifier is generated for the interaction.
[0053] User feedback data is organized based on interactive identifiers and user feedback scores are generated.
[0054] The voice transmission data is organized and voice transmission features are extracted based on the interaction identifier. The voice transmission features include voice transmission validity markers, voice transmission completeness, and voice transmission quality. A voice transmission score is generated based on the voice transmission validity markers, voice transmission completeness, and voice transmission quality.
[0055] The execution logs are organized and their features are extracted based on the interaction identifiers. The execution log features include the control instruction execution success rate, control instruction execution latency, and control instruction conflict count. An execution log score is generated based on the control instruction execution success rate, control instruction execution latency, and control instruction conflict count.
[0056] The user feedback score, voice transmission score, and execution log score are merged to generate a comprehensive feedback score.
[0057] Based on the interaction identifier, the comprehensive feedback score is associated and encapsulated with the user feedback data, voice transmission data, and execution logs to generate feedback samples.
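The scoring and merging steps above could look like the following sketch. The specification names the input features but not the formulas, so every formula, weight, and the 500 ms latency budget here is an assumption chosen only to show the data flow; all scores are taken to lie in [0, 1].

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FeedbackSample:
    interaction_id: str
    user_score: float
    voice_score: float
    log_score: float
    overall: float


def voice_score(valid: bool, completeness: float, quality: float) -> float:
    """Assumed rule: an invalid transmission scores 0, otherwise average the two features."""
    return 0.5 * completeness + 0.5 * quality if valid else 0.0


def log_score(success_rate: float, latency_ms: float, conflicts: int,
              latency_budget_ms: float = 500.0) -> float:
    """Assumed rule: average of success rate, a latency term and a conflict term."""
    latency_term = max(0.0, 1.0 - latency_ms / latency_budget_ms)
    conflict_term = 1.0 / (1.0 + conflicts)
    return (success_rate + latency_term + conflict_term) / 3.0


def build_sample(interaction_id: str, user: float, voice: float, log: float,
                 weights: Tuple[float, float, float] = (0.5, 0.2, 0.3)) -> FeedbackSample:
    """Merge the three scores with preset (hypothetical) weights."""
    wu, wv, wl = weights
    overall = wu * user + wv * voice + wl * log
    return FeedbackSample(interaction_id, user, voice, log, overall)
```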
[0058] Optionally, the feedback samples are matched with the collaborative patterns in the collaborative pattern set. Based on the user feedback data, voice transmission data, and execution logs recorded in the feedback samples, incremental statistics are computed for the collaborative patterns corresponding to the interactions, and the newly added interaction data is incorporated into the pattern statistics of the collaborative pattern set, completing the online incremental update of the collaborative pattern set. The feedback samples are also input into the variational Bayesian hierarchical context inference process to update its parameters online: using the user feedback data, voice transmission data, and execution logs in the newly added feedback samples, the original parameters are iteratively adjusted so that the updated parameters reflect the changing trend of the scene state results and user intent results corresponding to the latest interaction data.
[0059] The beneficial effects of this invention are:
[0060] This invention proposes a smart collaborative interaction method for audio devices based on big data processing. By uniformly collecting and preprocessing multi-source raw data during user interaction in a multi-audio device network environment, and constructing a collaborative hypergraph based on this raw data, it can uniformly model the spatial adjacency relationships, communication relationships, and correlations between interactive events among audio devices. By mapping multiple audio devices, environmental elements, and interactive events to nodes and hyperedges in the collaborative hypergraph structure, this invention can more comprehensively characterize the complex collaborative relationships between multiple audio devices. This allows the system to perform structured analysis of the multi-device interaction process from a holistic perspective. Compared to traditional methods that only process based on a single device state or simple rules, this invention can more accurately describe the collaborative relationships between multiple devices, thereby improving the accuracy and completeness of multi-audio device collaborative interaction analysis.
[0061] This invention constructs a dynamic graph sequence that reflects changes in device state and interaction relationships by performing dynamic topology updates on collaborative hypergraph data. Incremental hypergraph frequent pattern mining is then performed based on this dynamic graph sequence to obtain a set of collaborative patterns formed by multiple audio devices during actual interaction. By performing variational Bayesian hierarchical context inference on the collaborative pattern set, scene state and user intent can be more accurately identified based on multi-device collaborative relationships. This effectively solves the problem of low recognition accuracy caused by relying solely on single voice commands or single device information for scene recognition in existing technologies, enabling the system to more accurately understand user interaction needs in complex multi-device environments.
[0062] After completing the scene state and user intent recognition, this invention generates a set of candidate response actions and executes constraint optimization-based collaborative decision processing to achieve a reasonable selection of the target response audio device and the set of collaborative audio devices, and generates a collaborative interaction control strategy based on this. In this way, in a multi-audio device environment, the optimal device combination can be determined comprehensively based on various factors such as device operating status, communication status, and spatial location, and the execution tasks between master and slave devices can be reasonably allocated. This significantly improves the efficiency of multi-device collaborative task execution and reduces response delays or control conflicts caused by unreasonable device selection.
[0063] This invention generates control commands based on device selection results and collaborative interaction control strategies, and issues these commands to the target response audio device and the set of collaborative audio devices. This enables collaborative control of various interactive functions, including voice response, audio playback, volume parameter adjustment, equalization parameter adjustment, playback progress synchronization, and cross-device relay playback. An execution log is generated during the execution of the control commands, providing a complete record of the system's execution process. By uniformly collecting user feedback data, voice transmission data, and execution logs to generate feedback samples, this invention can use data generated during actual interaction to evaluate the system's performance. These feedback samples are used to update the collaborative pattern set and the variational Bayesian hierarchical context inference parameters online, allowing the system to continuously adapt and optimize as the user interacts with it.
[0064] This invention achieves in-depth analysis of the collaborative relationships among multiple audio devices by constructing a collaborative hypergraph model and combining collaborative pattern mining and hierarchical context inference methods. Furthermore, it enhances the system's adaptability to complex interactive environments through collaborative decision-making and online update mechanisms. Compared to existing technologies, this invention significantly improves the accuracy and intelligence of collaborative interaction among multiple audio devices, possessing advantages such as strong collaborative relationship modeling capabilities, high scene recognition accuracy, high device collaboration efficiency, and strong system adaptability.
Attached Figure Description
[0065] The accompanying drawings are provided to further illustrate the invention and form part of the specification. They are used in conjunction with embodiments of the invention to explain the invention and do not constitute a limitation thereof. In the drawings:
[0066] Figure 1 is an overall flowchart of the smart collaborative interaction method for audio based on big data processing proposed in this invention;
[0067] Figure 2 is a schematic diagram illustrating the construction of collaborative hypergraph data in the smart collaborative interaction method for audio based on big data processing proposed in this invention;
[0068] Figure 3 is a schematic diagram illustrating the construction of scene state results and user intent results in the smart collaborative interaction method for audio based on big data processing proposed in this invention.
Detailed Implementation
[0069] The present invention will now be described in further detail with reference to the accompanying drawings. These drawings are simplified schematic diagrams, illustrating only the basic structure of the invention, and therefore only show the components relevant to the invention.
[0070] Referring to Figures 1-3, a method for intelligent collaborative interaction of audio systems based on big data processing includes the following steps:
[0071] Collect and preprocess multi-source raw data in a multi-audio device network environment;
[0072] Construct collaborative hypergraph data based on preprocessed multi-source raw data;
[0073] Perform dynamic topology updates on the collaborative hypergraph data to obtain a dynamic graph sequence;
[0074] Perform collaborative pattern mining processing on the dynamic graph sequence. This includes incremental hypergraph frequent pattern mining on the dynamic graph sequence to obtain a collaborative pattern set, and variational Bayesian hierarchical context inference on the collaborative pattern set to obtain scene state results and user intent results.
[0075] A set of candidate response actions is generated based on the scene state results and user intent results. Constraint-based collaborative decision-making processing is then performed on the set of candidate response actions to obtain device selection results and collaborative interaction control strategies.
[0076] Based on the device selection results, the target of the control command is determined, and the control command is generated according to the collaborative interaction control strategy. The control command is sent to the target response audio device and the set of collaborative audio devices, and an execution log is generated.
[0077] Collect user feedback data, voice transmission data, and execution logs during the interaction process to generate feedback samples;
[0078] Online incremental updates are performed on the collaborative pattern set based on feedback samples, and online updates are also performed on the parameters of variational Bayesian hierarchical context inference.
[0079] In this embodiment, the multi-source raw data includes user voice data, environmental acoustic data, user historical interaction data, user location or near-field presence data, audio equipment operating status data, and network link status data between devices. Data preprocessing includes noise reduction filtering, missing data completion, anomaly removal, deduplication and merging, feature scale unification, and data quality labeling.
[0080] In this embodiment, the construction of collaborative hypergraph data specifically includes:
[0081] Based on the audio device information in the multi-audio device networking environment, determine the set of audio device nodes and map each audio device to an audio device node;
[0082] The specific process is as follows: automatically discover devices in the network to obtain audio device information, identify and filter devices in the network based on audio device information, extract device identification information belonging to audio devices, and uniquely identify devices with audio functions. All identified audio devices are aggregated according to device identification to form an audio device node set.
[0083] Based on the preprocessed multi-source raw data, node features of each audio device node are extracted. The node features include device operating status information, device communication status information, device spatial location information, and environmental element information of the device's environment. Node feature data is then generated based on the node features.
[0084] The extraction process is as follows: Based on the preprocessed multi-source raw data, the data corresponding to each audio device node is classified, analyzed, and feature extracted. Device operating status information, including current working status, volume status, and playback status, is extracted from the device operating data in the multi-source raw data. Device communication status information, including network connection status, signal strength, and communication delay information, is extracted from the device communication data. Device spatial location information, used to characterize the spatial relationship of audio device nodes in the network environment, is extracted from the environmental data. Environmental element information, including environmental noise level, environmental acoustic characteristics, and environmental space type information, is extracted from the environmental data. After completing the above information extraction, the device operating status information, device communication status information, device spatial location information, and environmental element information are associated and integrated according to the audio device nodes to generate node feature data for the corresponding audio device nodes.
[0085] The spatial adjacency relationship between the audio device nodes is determined based on the spatial location information of each audio device node, and the communication relationship between the audio device nodes is determined based on the communication status information between the audio devices. Node relationship data is generated based on the spatial adjacency relationship and the communication relationship.
[0086] The spatial adjacency and communication relationships are determined as follows: The spatial distance between audio device nodes is calculated from their spatial location information, and audio device nodes whose spatial distance is less than a preset spatial adjacency threshold are identified as having a spatial adjacency relationship. The communication relationship between audio device nodes is determined from their communication status information: when two audio devices have a stable communication connection and the communication signal strength is higher than a preset communication threshold, the corresponding audio device nodes are identified as having a communication relationship. The spatial adjacency and communication relationships between the audio device nodes are then associated and integrated to generate node relationship data representing the connection relationships between audio device nodes.
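The two threshold tests can be sketched as below. The sketch assumes 2D coordinates in meters and a signal strength in dBm; the function name, the data layout, and the threshold values in the example are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Tuple


def node_relations(
    positions: Dict[str, Tuple[float, float]],
    links: Dict[FrozenSet[str], Tuple[bool, float]],  # pair -> (stable?, signal strength)
    dist_threshold: float,
    signal_threshold: float,
) -> Dict[Tuple[str, str], Dict[str, bool]]:
    """Return adjacency and communication flags for every related device pair."""
    relations = {}
    for a, b in combinations(sorted(positions), 2):
        # Spatial adjacency: distance below the preset adjacency threshold.
        adjacent = math.dist(positions[a], positions[b]) < dist_threshold
        # Communication: stable link AND signal above the preset threshold.
        stable, signal = links.get(frozenset((a, b)), (False, float("-inf")))
        connected = stable and signal > signal_threshold
        if adjacent or connected:
            relations[(a, b)] = {"adjacent": adjacent, "connected": connected}
    return relations
```

For instance, two devices 3 m apart with a weak link would be recorded as adjacent but not connected under a 5 m distance threshold and a -70 dBm signal threshold.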
[0087] For each interaction, obtain the set of audio device nodes involved in the interaction, the environmental elements involved in the interaction, and the triggered interaction events, and map the audio device nodes involved in the interaction, environmental elements, and interaction events together into the same hyperedge to generate a hyperedge set;
[0088] The mapping process involves using the start and end times of an interaction as boundaries. From the pre-processed multi-source raw data, the set of associated audio device nodes, the environmental elements involved in the interaction, and the triggered interaction events are obtained within the start and end times. A unique interaction identifier is assigned to each interaction. A hyperedge is generated using this unique interaction identifier as the hyperedge identifier. Each audio device node in the set of audio device nodes is added to the hyperedge's member node list. The environmental elements involved in the interaction and the triggered interaction events are bound and stored as attribute information of the hyperedge. This completes the mapping of the participating audio device nodes, environmental elements, and interaction events to the same hyperedge, and generates a hyperedge set based on different interaction identifiers.
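A minimal sketch of this mapping is given below, assuming the preprocessed data arrives as timestamped records with hypothetical keys `t`, `device`, `env`, and `event`; the `Hyperedge` class and its field names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Hyperedge:
    edge_id: str                                   # interaction identifier reused as hyperedge id
    members: Set[str] = field(default_factory=set)       # participating device nodes
    env_elements: Set[str] = field(default_factory=set)  # stored as hyperedge attributes
    events: List[str] = field(default_factory=list)      # stored as hyperedge attributes


def build_hyperedges(
    records: List[dict],
    interactions: Dict[str, Tuple[float, float]],  # interaction id -> (start, end)
) -> Dict[str, Hyperedge]:
    """Create one hyperedge per interaction from the records inside its time span."""
    edges = {}
    for interaction_id, (start, end) in interactions.items():
        edge = Hyperedge(interaction_id)
        for record in records:
            if start <= record["t"] <= end:
                edge.members.add(record["device"])
                if record.get("env"):
                    edge.env_elements.add(record["env"])
                if record.get("event"):
                    edge.events.append(record["event"])
        edges[interaction_id] = edge
    return edges
```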
[0089] Generate hyperedge representation data based on the membership relationship between audio device nodes and hyperedges;
[0090] The generation process is as follows: traverse the member information of each audio device node corresponding to each hyperedge one by one, and establish a hyperedge representation structure with the audio device node set as rows and the hyperedge set as columns; determine whether each audio device node belongs to the member node set of the corresponding hyperedge. When an audio device node participates in the interaction represented by the corresponding hyperedge, mark the audio device node with a participation mark value at the corresponding hyperedge position; when an audio device node does not participate in the interaction represented by the corresponding hyperedge, mark the audio device node with a non-participation mark value at the corresponding hyperedge position; after completing the marking of the membership relationship between all audio device nodes and hyperedges, obtain the hyperedge representation data used to characterize the association relationship between audio device nodes and hyperedges.
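The structure described above is a node-by-hyperedge incidence matrix. A short sketch follows, assuming 1 as the participation mark value and 0 as the non-participation mark value (the specification does not fix these values); the function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def incidence_matrix(
    nodes: List[str],
    edge_members: Dict[str, Set[str]],  # hyperedge id -> member device nodes
) -> Tuple[List[str], List[List[int]]]:
    """Rows follow `nodes`; columns follow the sorted hyperedge ids."""
    edge_ids = sorted(edge_members)
    matrix = [
        [1 if node in edge_members[edge_id] else 0 for edge_id in edge_ids]
        for node in nodes
    ]
    return edge_ids, matrix
```

With nodes `["living", "kitchen", "bedroom"]` and hyperedges `i1 = {living, kitchen}`, `i2 = {bedroom}`, the rows are `[1, 0]`, `[1, 0]`, and `[0, 1]`.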
[0091] The hyperedge weight is calculated based on the number of audio device nodes, environmental elements and interactive events corresponding to each hyperedge, and hyperedge weight data is generated.
[0092] The calculation process is as follows: the number of audio device nodes, the number of environmental elements, and the number of interactive events corresponding to each hyperedge are converted into proportions to obtain the device participation ratio, the environmental element influence ratio, and the interactive event complexity ratio. Based on preset weight coefficients, the three ratios are weighted and fused to obtain the weight value of the corresponding hyperedge. The weight values of all hyperedges are then aggregated to generate hyperedge weight data characterizing the degree of collaborative influence of each hyperedge.
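One possible form of this calculation is sketched below. The embodiment does not state the denominators of the three ratios or the coefficient values, so the totals (all devices, all environmental elements, all events in the data set) and the coefficients (0.5, 0.3, 0.2) are assumptions made only for illustration.

```python
def hyperedge_weight(n_devices, n_env, n_events, totals, coeffs=(0.5, 0.3, 0.2)):
    """Weighted fusion of three ratios; totals and coeffs are assumed values."""
    ratios = (
        n_devices / totals["devices"],  # device participation ratio
        n_env / totals["env"],          # environmental element influence ratio
        n_events / totals["events"],    # interactive event complexity ratio
    )
    return sum(c * r for c, r in zip(coeffs, ratios))


totals = {"devices": 4, "env": 5, "events": 10}
weight = hyperedge_weight(2, 1, 5, totals)
print(round(weight, 2))  # 0.5*0.5 + 0.3*0.2 + 0.2*0.5 = 0.41
```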
[0093] The audio device node set, node feature data, node relationship data, hyperedge set, hyperedge representation data, and hyperedge weight data are integrated to generate collaborative hypergraph data.
[0094] In this embodiment, obtaining the dynamic graph sequence specifically includes:
[0095] The collaborative hypergraph data is divided into time windows, and a collaborative hypergraph data structure is generated for each time window. The collaborative hypergraph data in each time window includes a set of audio device nodes, a set of hyperedges, hyperedge representation data, and hyperedge weight data.
[0096] Within each time window, perform node weight update processing on the audio device node to generate the updated node weight;
[0097] The node weight update process is as follows: Within each time window, the node weight corresponding to the previous time window is obtained for each audio device node in the collaborative hypergraph data. The number of interactive events participated in by each audio device node within the time window and the corresponding communication status information are counted. The node participation degree of the current time window is calculated based on the ratio of the number of interactive events participated in by the audio device node to the total number of interactive events within the time window. The communication stability is determined based on the communication status information of the audio device node. The node weight of the previous time window is weighted and fused with the node participation degree and communication stability calculated in the current time window according to the preset update coefficient to obtain the updated node weight of the audio device node in the current time window. The updated node weight is then written into the weight attribute of the corresponding audio device node in the collaborative hypergraph data.
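The node weight update can be illustrated with the following sketch. The fusion coefficients and the definition of communication stability as the fraction of successful heartbeats are assumptions; the embodiment only specifies that the previous weight, the participation degree, and the communication stability are weighted and fused according to a preset update coefficient.

```python
def update_node_weight(prev_weight, events_joined, events_total,
                       heartbeats_ok, heartbeats_sent, coeffs=(0.5, 0.3, 0.2)):
    """Fuse previous weight, participation degree, and communication stability."""
    participation = events_joined / events_total if events_total else 0.0
    # Assumed stability measure: share of heartbeats answered in the window.
    stability = heartbeats_ok / heartbeats_sent if heartbeats_sent else 0.0
    a, b, c = coeffs
    return a * prev_weight + b * participation + c * stability


new_weight = update_node_weight(0.6, events_joined=3, events_total=4,
                                heartbeats_ok=9, heartbeats_sent=10)
print(round(new_weight, 3))  # 0.5*0.6 + 0.3*0.75 + 0.2*0.9 = 0.705
```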
[0098] Within each time window, perform edge weight update processing on the hyperedge to generate the updated hyperedge weight;
[0099] The edge weight update process is as follows: For each hyperedge in the collaborative hypergraph data, obtain the hyperedge weight corresponding to the previous time window, and count the number of audio device nodes, environmental elements, and interaction events corresponding to the hyperedge in the current time window. Calculate the current weight value of the hyperedge in the current time window based on the number of audio device nodes, environmental elements, and interaction events. Then, weight and merge the hyperedge weight from the previous time window with the current weight value according to a preset update coefficient to obtain the updated hyperedge weight in the current time window. When the hyperedge is not triggered in the current time window, the hyperedge weight from the previous time window is attenuated according to a preset attenuation coefficient to obtain the updated hyperedge weight.
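A minimal sketch of the two branches of the hyperedge weight update follows. The update coefficient 0.7 and the attenuation coefficient 0.9 are assumed values, and `None` is used here as a hypothetical signal that the hyperedge was not triggered in the current time window.

```python
def update_hyperedge_weight(prev_weight, current_weight=None,
                            update_coeff=0.7, decay=0.9):
    """Blend with the current value, or attenuate when the hyperedge was idle."""
    if current_weight is None:  # hyperedge not triggered in this window
        return prev_weight * decay
    return update_coeff * prev_weight + (1.0 - update_coeff) * current_weight


print(round(update_hyperedge_weight(0.4, 0.8), 2))  # 0.7*0.4 + 0.3*0.8 = 0.52
print(round(update_hyperedge_weight(0.4), 2))       # 0.4*0.9 = 0.36
```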
[0100] Node availability tags are generated based on the real-time operating status information of each audio device node;
[0101] The generation process is as follows: Within each time window, based on the real-time operating status information of the audio device nodes collected from the multi-source raw data, the connection status and response status of each audio device node are detected, and the network connection status, device operating status, and command response status of the corresponding audio device node are obtained; the network connection status is used to determine whether the audio device node is online, and the device operating status and command response status are used to determine whether the audio device node has responsiveness; when the audio device node is online and responds to control commands normally, the availability flag of the audio device node is set to available; when the audio device node is offline or online but unable to respond to control commands, the availability flag of the audio device node is set to unavailable, and the availability flag is written into the status attribute of the corresponding audio device node in the collaborative hypergraph data;
[0102] The updated node weights, updated hyperedge weights, and node availability markers are integrated with the collaborative hypergraph data within the corresponding time window to generate dynamic graph data corresponding to the time window. The dynamic graph data is then arranged in chronological order to form a dynamic graph sequence showing the changing collaborative relationships of audio equipment over time.
[0103] In this embodiment, obtaining the scene state result and the user intent result specifically includes:
[0104] Traverse each time window in the dynamic graph sequence in chronological order, and extract the corresponding data on the relationship between audio equipment, environmental elements and interactive events in each time window;
[0105] The extraction process is as follows: while traversing each time window in the dynamic graph sequence, the interaction process data recorded in the corresponding time window is read, and the information of the audio equipment involved in the interaction in that time window, the information of the environmental elements corresponding to the interaction, and the information of the triggered interaction events are extracted based on the interaction process data; then, the audio equipment, environmental elements, and interaction events are matched and associated according to the timestamp of the interaction event to determine the association between the audio equipment, environmental elements, and interaction events that appear together in the same interaction process, and the association is summarized and organized to generate association relationship data that describes the relationship between audio equipment, environmental elements, and interaction events within the time window;
[0106] Candidate collaboration patterns are constructed based on relational data. The candidate collaboration patterns consist of a combination of audio devices, environmental elements, and interaction events that appear together in the same interaction process.
[0107] Within each time window, determine whether each candidate collaboration pattern appears. When the audio devices, environmental elements, and interaction events in the candidate collaboration pattern all appear within the corresponding time window, the candidate collaboration pattern is marked as appearing in that time window; otherwise, it is marked as not appearing.
[0108] After traversing each time window in the dynamic graph sequence, the occurrence status of each candidate collaboration pattern in each time window is counted, the number of time windows in which the candidate collaboration pattern appears is recorded, and the total number of time windows in the dynamic graph sequence is obtained. The number of time windows in which the candidate collaboration pattern appears is divided by the total number of time windows to obtain the support of the candidate collaboration pattern in the dynamic graph sequence.
[0109] Candidate collaboration patterns with support greater than or equal to a preset support threshold are identified as frequent collaboration patterns, and all frequent collaboration patterns are aggregated to generate a collaboration pattern set.
[0110] When a new time window is added to the dynamic graph sequence, the set of collaborative patterns is incrementally updated according to the occurrence status of each candidate collaborative pattern in the new time window, so as to update the support of each collaborative pattern.
[0111] The update process is as follows: When a new time window is added to the dynamic graph sequence, the correlation data between the audio equipment, environmental elements, and interactive events within the new time window is extracted, and the occurrence status of each collaborative mode in the collaborative mode set in the new time window is determined based on the correlation data; support update processing is performed on each collaborative mode, specifically: based on the original occurrence count, when the collaborative mode is in the occurrence state in the new time window, the occurrence count of the collaborative mode is increased by one, otherwise the original occurrence count remains unchanged; the total number of time windows is updated to the total number of time windows after the addition of the new time window; the support of each collaborative mode is recalculated based on the updated occurrence count and the updated total number of time windows, and the updated support is written to the corresponding collaborative mode, completing the incremental update of the collaborative mode set;
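The support calculation and its incremental update can be sketched as follows. Representing each pattern and each time window as a set of tagged items (`dev:`, `env:`, `evt:`) is an assumption of this example, as are the class name `PatternTracker` and the threshold of 0.5.

```python
class PatternTracker:
    """Keeps occurrence counts so support can be updated one window at a time."""

    def __init__(self, candidates, min_support):
        self.counts = {pattern: 0 for pattern in candidates}
        self.windows = 0
        self.min_support = min_support

    def add_window(self, items):
        """items: devices, environmental elements, and events seen in the window."""
        self.windows += 1
        for pattern in self.counts:
            if pattern <= items:  # every element of the pattern appears
                self.counts[pattern] += 1

    def support(self, pattern):
        return self.counts[pattern] / self.windows if self.windows else 0.0

    def frequent(self):
        return {p for p in self.counts if self.support(p) >= self.min_support}


relay = frozenset({"dev:living_room", "dev:kitchen", "env:motion", "evt:play"})
solo = frozenset({"dev:bedroom", "evt:play"})
tracker = PatternTracker([relay, solo], min_support=0.5)
tracker.add_window({"dev:living_room", "dev:kitchen", "env:motion", "evt:play"})
tracker.add_window({"dev:bedroom", "evt:play"})
# A new window arrives: only counts and the window total change.
tracker.add_window({"dev:living_room", "dev:kitchen", "env:motion",
                    "evt:play", "evt:volume"})
print(tracker.support(relay), tracker.support(solo))
```

Because only the occurrence counts and the window total are stored, adding a window does not require revisiting earlier windows.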
[0112] Convert the set of cooperative patterns into pattern observation data showing the occurrence of each cooperative pattern in the current time window;
[0113] The conversion process is as follows: Within the current time window, the occurrence status records of each cooperative mode in the cooperative mode set corresponding to the time window are read, and the occurrence status is mapped and encoded according to the order of each cooperative mode in the cooperative mode set. The occurrence status corresponding to each cooperative mode is converted into a mode observation value. Specifically, when the cooperative mode is in the occurrence state in the current time window, the corresponding mode observation value is marked as the occurrence value; when the cooperative mode is in the non-occurrence state in the current time window, the corresponding mode observation value is marked as the non-occurrence value. The mode observation values are combined according to the arrangement order of each cooperative mode in the cooperative mode set to generate mode observation data that represents the occurrence status of each cooperative mode in the current time window.
[0114] Variational Bayesian hierarchical context inference is performed based on pattern observation data to obtain scene state results and user intent results. Scene state results include scene category and scene confidence, while user intent results include intent category and intent confidence.
[0115] The variational Bayesian hierarchical context inference process is as follows: a variational Bayesian hierarchical context inference structure is constructed based on the pattern observation data obtained within the current time window; the scene state is set as the upper-level latent variable, the user intent as the lower-level latent variable, and the pattern observation data as the observed variable; the scene state imposes a prior constraint on the user intent, and the user intent imposes a conditional constraint on the generation of the pattern observation data; a parameterizable variational posterior distribution is set for each of the scene state and the user intent, and the parameters of these variational posterior distributions are taken as the parameters to be optimized; the variational posterior distribution parameters of the scene state and the user intent are then updated based on the pattern observation data, where the parameter update includes a category probability update for the scene state and a category probability update for the user intent. The parameter update is repeated for a preset number of iterations. After the iterations are completed, the category probability distribution of the scene state and the category probability distribution of the user intent are output. The category with the largest probability value in the category probability distribution of the scene state is determined as the scene category, and that maximum probability value is determined as the scene confidence. The category with the largest probability value in the category probability distribution of the user intent is determined as the intent category, and that maximum probability value is determined as the intent confidence. The scene state result and the user intent result are thereby obtained.
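The embodiment does not fix the distribution families, so the following is only a sketch under stated assumptions: categorical scene and intent variables, independent Bernoulli pattern observations, a mean-field factorization q(scene)q(intent), and fixed, hypothetical probability tables. All table values and category names are illustrative, and every probability must lie strictly between 0 and 1 so that the logarithms are defined.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def infer(x, prior_scene, intent_given_scene, pattern_given_intent, n_iter=20):
    """Mean-field coordinate updates for q(scene) and q(intent)."""
    S, K = len(prior_scene), len(pattern_given_intent)
    q_s = list(prior_scene)
    # Log-likelihood of the observation vector x under each intent.
    loglik = [sum(math.log(p if xi else 1.0 - p)
                  for xi, p in zip(x, pattern_given_intent[k])) for k in range(K)]
    for _ in range(n_iter):  # preset number of iterations
        q_z = softmax([loglik[k] + sum(q_s[i] * math.log(intent_given_scene[i][k])
                                       for i in range(S)) for k in range(K)])
        q_s = softmax([math.log(prior_scene[i])
                       + sum(q_z[k] * math.log(intent_given_scene[i][k])
                             for k in range(K)) for i in range(S)])
    return q_s, q_z


def top(labels, probs):
    i = max(range(len(probs)), key=probs.__getitem__)
    return labels[i], probs[i]


scenes = ["cross_room_playback", "single_room"]
intents = ["continue_music", "query_weather"]
prior = [0.5, 0.5]
A = [[0.9, 0.1], [0.3, 0.7]]  # assumed P(intent | scene)
B = [[0.9, 0.2], [0.1, 0.6]]  # assumed P(pattern j appears | intent)
q_s, q_z = infer([1, 0], prior, A, B)
scene_category, scene_conf = top(scenes, q_s)
intent_category, intent_conf = top(intents, q_z)
print(scene_category, intent_category)
```

The category with the largest probability is reported together with that probability as its confidence, mirroring the output step described above.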
[0116] In this embodiment, the determination of the device selection result and the collaborative interaction control strategy specifically includes:
[0117] A set of candidate response actions is generated based on the scene category in the scene state result and the intent category in the user intent result. Each candidate response action in the set of candidate response actions includes an action type and action parameters. The action type includes at least one of the following: voice response, audio playback, volume parameter adjustment, equalization parameter adjustment, playback progress synchronization, cross-device relay playback, and multi-device linkage control.
[0118] The generation process is as follows: After obtaining the scene category in the scene state result and the intent category in the user intent result, an action mapping relationship table between the scene category and the user intent category is established. The action mapping relationship table is used to record the candidate response action types corresponding to different combinations of scene categories and different user intent categories. Based on the current scene category and user intent category, the corresponding response action type is retrieved from the action mapping relationship table, and the corresponding action parameters are generated in combination with the function types supported by the audio devices in the multi-audio device networking environment, forming multiple candidate response actions. The candidate response actions are summarized according to the preset data structure to obtain a set of candidate response actions.
[0119] Based on the audio equipment operation status data, audio equipment communication status data and audio equipment spatial location information in the preprocessed multi-source raw data, the range of audio equipment that can participate in collaborative decision-making is determined, forming a collaborative audio equipment set, and audio equipment that is online and responsive is identified as the target audio equipment.
[0120] Construct collaborative decision-making constraints based on constraint optimization. The collaborative decision-making constraints include the uniqueness constraint of the target response audio device, the membership constraint of the collaborative audio device set, and the mutual exclusion constraint between the target response audio device and the collaborative audio device set. The uniqueness constraint of the target response audio device limits the selection to exactly one audio device from the optional audio devices as the target response audio device. The membership constraint of the collaborative audio device set limits the members of the collaborative audio device set to audio devices selected from the optional audio devices. The mutual exclusion constraint between the target response audio device and the collaborative audio device set prevents the target response audio device from being added to the collaborative audio device set.
[0121] Based on the scene confidence in the scene state results and the intent confidence in the user intent results, action evaluation processing is performed on each candidate response action in the candidate response action set to obtain the action evaluation result of each candidate response action.
[0122] The action evaluation process is as follows: First, the scene confidence from the scene state result and the intent confidence from the user intent result are read. Then, each candidate response action in the candidate response action set is traversed. The matching degree of each candidate response action is determined based on its matching relationship with the current scene category and user intent category. The matching degree is then weighted using the scene confidence and the intent confidence to obtain a basic evaluation value for the candidate response action. After the basic evaluation value is obtained, the feasibility of executing the candidate response action in the current multi-audio device network environment is further evaluated using the audio device operating status data, audio device communication status data, and audio device spatial location information. Feasibility includes whether the audio device has the functional capability to execute the corresponding action type, whether the communication status between the audio devices meets the requirements for multi-device collaborative execution, and whether the spatial distribution of the audio devices meets the conditions for collaborative playback or cross-device relay playback. The basic evaluation value is adjusted based on the feasibility evaluation results to obtain the final evaluation value for the candidate response action, which is then used as the action evaluation result for that candidate response action.
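A hedged sketch of this two-stage evaluation is given below. The confidence weights, the multiplicative penalty of 0.5 for an unmet communication or spatial condition, and the rule that a missing functional capability yields a score of zero are all assumptions; the embodiment states only that the basic evaluation value is adjusted according to feasibility.

```python
def evaluate_action(match, scene_conf, intent_conf, capable, link_ok, layout_ok,
                    conf_weights=(0.5, 0.5), penalty=0.5):
    """Basic value from matching degree and confidences, then feasibility adjustment."""
    w_scene, w_intent = conf_weights
    base = match * (w_scene * scene_conf + w_intent * intent_conf)
    if not capable:  # device cannot perform this action type at all
        return 0.0
    score = base
    for condition_met in (link_ok, layout_ok):
        if not condition_met:
            score *= penalty  # assumed soft penalty per unmet condition
    return score


score = evaluate_action(match=0.9, scene_conf=0.8, intent_conf=0.6,
                        capable=True, link_ok=True, layout_ok=False)
print(round(score, 3))  # 0.9 * 0.7 * 0.5 = 0.315
```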
[0123] Based on the action evaluation results and collaborative decision constraints, a constraint optimization-based collaborative decision-making solution is executed, outputting device selection results and collaborative interaction control strategies. The device selection results include the target response audio device and the collaborative audio device set. The collaborative interaction control strategies include master-slave role allocation rules, task splitting rules, execution timing, cross-device synchronization parameters, and conflict resolution rules.
[0124] The output process is as follows: First, the constructed collaborative decision constraints are read, and the action evaluation results and the collaborative decision constraints are used together as input data for collaborative decision-making. Second, with the action evaluation results as the optimization objective and subject to the collaborative decision constraints, combination selection is performed on the audio devices in the multi-audio device network environment to determine the combination of audio devices best suited to execute the current candidate response action. The collaborative decision constraints include allowing selection only from online and responsive audio devices, allowing only one audio device as the target response audio device, and excluding the target response audio device from the collaborative audio device set. After the device combination selection is completed, the audio device selected to directly execute the response action is determined as the target response audio device, and the audio devices participating in the collaborative execution of related tasks are determined as the collaborative audio device set, generating the device selection result. Based on the device selection result and the selected candidate response action, the execution tasks corresponding to the candidate response action are split: according to the master-slave role allocation rule, the target response audio device is determined as the master device and the collaborative audio device set is determined as the slave device set; according to the task splitting rule, the candidate response action is split into master device execution tasks and slave device set execution tasks. Furthermore, based on the execution timing, cross-device synchronization parameters, and conflict resolution rules, the order of the master device execution tasks and the slave device set execution tasks, the synchronization parameters for playback progress synchronization and cross-device relay playback, and the conflict resolution method used when control conflicts occur in multi-device linkage control are determined, generating a collaborative interaction control strategy for controlling multiple audio devices to collaboratively execute interactive tasks.
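For a small home network, the constrained selection can be illustrated by exhaustive enumeration, as in the sketch below. The additive objective (a per-device target score plus per-device collaboration gains) and all numeric values are assumptions; the embodiment does not prescribe a particular solver or objective form.

```python
from itertools import combinations


def select_devices(devices, available, target_score, collab_gain):
    """Enumerate feasible (target, collaborator set) pairs and keep the best."""
    optional = [d for d in devices if available[d]]  # online and responsive only
    best = None
    for target in optional:                          # exactly one target device
        others = [d for d in optional if d != target]  # target excluded from the set
        for size in range(len(others) + 1):
            for group in combinations(others, size):
                value = target_score[target] + sum(collab_gain[d] for d in group)
                if best is None or value > best[0]:
                    best = (value, target, set(group))
    return best  # None when no device is available


devices = ["living_room", "bedroom", "kitchen"]
available = {"living_room": True, "bedroom": False, "kitchen": True}
target_score = {"living_room": 0.6, "bedroom": 0.95, "kitchen": 0.9}
collab_gain = {"living_room": 0.2, "bedroom": 0.3, "kitchen": 0.1}
value, target, collaborators = select_devices(devices, available,
                                              target_score, collab_gain)
print(target, collaborators)
```

The bedroom device has the highest target score in this example but is never chosen because it is marked unavailable, which shows the availability restriction taking effect before the objective is compared.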
[0125] In this embodiment, the generation of the execution log specifically includes:
[0126] Based on the equipment selection results, the target of the control command is determined, the target response audio device is determined as the master device, and each audio device in the set of cooperative audio devices is determined as the slave device;
[0127] Generate a set of control instructions based on a collaborative interactive control strategy;
[0128] The generation process is as follows: Read the master-slave role allocation rules, task splitting rules, execution sequence, and cross-device synchronization parameters from the collaborative interaction control strategy; extract the master device execution tasks and slave device set execution tasks according to the task splitting rules; generate corresponding control instruction types based on the task type of each execution task, and determine the control instruction parameters based on the task parameters; associate each control instruction with a target device identifier and instruction trigger time; generate voice response control instructions when the execution task is a voice response task, audio playback control instructions when the execution task is an audio playback task, volume parameter adjustment control instructions when the execution task is a volume parameter adjustment task, equalizer parameter adjustment control instructions when the execution task is an equalizer parameter adjustment task, playback progress synchronization control instructions when the execution task is a playback progress synchronization task, cross-device relay playback control instructions when the execution task is a cross-device relay playback task, and multi-device linkage control instructions when the execution task is a multi-device linkage control task; finally, organize the control instructions according to the execution sequence, configure the corresponding instruction parameters, target device identifier, and instruction trigger time for each control instruction, and generate a control instruction set;
[0129] The control instruction set is sorted according to the execution sequence in the collaborative interaction control strategy to generate a control instruction execution sequence;
[0130] Based on the cross-device synchronization parameters in the collaborative interaction control strategy, a unified playback progress reference time is configured for the control commands involving audio playback in the control command execution sequence, and a corresponding playback progress offset is configured for each audio device based on the unified playback progress reference time. The playback progress offset is the time difference between the current playback progress time of the corresponding audio device and the unified playback progress reference time.
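The offset definition above amounts to a per-device subtraction, sketched here with hypothetical device names and millisecond progress values.

```python
def playback_offsets(current_progress_ms, reference_ms):
    """Offset = device's current playback progress minus the unified reference."""
    return {device: progress - reference_ms
            for device, progress in current_progress_ms.items()}


# Hypothetical relay: the living room is 93.5 s into the track, the kitchen has not started.
offsets = playback_offsets({"living_room": 93_500, "kitchen": 0}, reference_ms=93_500)
print(offsets)
```

A negative offset indicates a device that is behind the reference and would need to seek forward by that amount.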
[0131] The control command execution sequence is sent to each audio device in the target response audio device and the set of cooperative audio devices respectively;
[0132] During the execution of control commands, the execution status information of each audio device is collected, and an execution log is generated;
[0133] The generation process is as follows: After the control command is issued to the target response audio device and the set of collaborative audio devices, the status of each audio device executing the control command is collected, and the execution status information fed back by each audio device is obtained in real time. The execution status information includes the control command reception status, the control command execution start time, the control command execution end time, and the control command execution result. According to the control command identifier in the control command set and the corresponding target device identifier, the execution status information is associated and organized, and the control command issuance time, control command execution time, and control command execution result are recorded. The execution status information corresponding to each control command is summarized in chronological order, and the control command identifier, target device identifier, control command issuance time, control command execution time, and control command execution result are recorded in a structured manner to generate an execution log.
[0134] In this embodiment, the generation of feedback samples specifically includes:
[0135] After an interaction is completed, user feedback data, voice transmission data and execution logs corresponding to the interaction are collected, and an interaction identifier is generated for the interaction.
[0136] The generation process is as follows: After an interaction is completed, the timestamp information corresponding to the interaction and the target response audio device identifier are obtained. Combined with the execution log identifier generated during the interaction, the timestamp information, the target response audio device identifier and the execution log identifier are combined to generate a unique identifier string. The unique identifier string is then subjected to preset encoding processing to obtain the interaction identifier that uniquely corresponds to the interaction.
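One way to realize the identifier generation is shown below. The embodiment refers only to a preset encoding; the use of SHA-256, the `|` separator, and the field formats are assumptions made for this sketch.

```python
import hashlib


def interaction_id(timestamp_ms, target_device_id, log_id):
    """Combine the three fields into one string, then apply an assumed encoding."""
    raw = f"{timestamp_ms}|{target_device_id}|{log_id}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


first = interaction_id(1773129600000, "kitchen-01", "log-0042")
second = interaction_id(1773129600000, "kitchen-01", "log-0043")
print(first[:16], second[:16])
```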
[0137] User feedback data is organized based on interactive identifiers and user feedback scores are generated.
[0138] The generation process is as follows: After generating the interaction identifier, user feedback data records corresponding to the interaction identifier are extracted from the user feedback data. These user feedback data records are then processed for deduplication, missing field completion, and data format standardization. Based on the processed user feedback data, the user feedback result is determined, and the result is converted into a user feedback score according to preset scoring rules. These rules include: when the user feedback data is clearly positive, the user feedback score is set to a preset high score value; when the user feedback data is clearly negative, the user feedback score is set to a preset low score value; when the user feedback data is neutral, the user feedback score is set to a preset intermediate score value; and when the user feedback data is a continuous score, the continuous score is linearly mapped according to a preset scoring range to obtain the corresponding user feedback score.
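The scoring rules can be sketched as a lookup for categorical feedback plus a linear map for continuous ratings. The score values (1.0, 0.5, 0.0), the label strings, and the 1 to 5 rating range are assumed presets.

```python
def user_feedback_score(feedback, high=1.0, middle=0.5, low=0.0,
                        rating_range=(1.0, 5.0)):
    """Map categorical or continuous feedback to a score in [low, high]."""
    labels = {"positive": high, "neutral": middle, "negative": low}
    if feedback in labels:
        return labels[feedback]
    lo, hi = rating_range
    clipped = min(max(float(feedback), lo), hi)  # keep ratings inside the range
    return low + (clipped - lo) / (hi - lo) * (high - low)


print(user_feedback_score("positive"))  # 1.0
print(user_feedback_score(4))           # (4 - 1) / (5 - 1) = 0.75
```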
[0139] The voice transmission data is organized and voice transmission features are extracted based on the interactive identifier. The voice transmission features include voice transmission validity markers, voice transmission completeness, and voice transmission quality. A voice transmission score is generated based on the voice transmission validity markers, voice transmission completeness, and voice transmission quality.
[0140] The generation process is as follows: Based on the interaction identifier, the voice transmission data records corresponding to the interaction identifier are extracted from the voice transmission data, and deduplication, chronological sorting, and data format unification are performed on these records. Feature extraction is then performed on the sorted voice transmission data to obtain the voice transmission validity marker, voice transmission completeness, and voice transmission quality. When valid voice content exists in the voice transmission data, the voice transmission validity marker is set to a valid state; when no valid voice content is detected, the voice transmission validity marker is set to an invalid state. The voice transmission completeness is determined based on the matching degree between the actual duration of the voice transmission data and the preset voice duration, and the voice transmission quality is determined based on the voice clarity, background noise level, and voice signal stability in the voice transmission data. After the voice transmission validity marker, voice transmission completeness, and voice transmission quality are obtained, a weighted calculation is performed on the voice transmission completeness and voice transmission quality according to preset scoring rules, and the calculation result is corrected based on the voice transmission validity marker to generate the voice transmission score.
[0141] The execution logs are organized and their features are extracted based on the interaction identifiers. The execution log features include the control command execution success rate, control command execution latency, and the number of control command conflicts. An execution log score is generated based on the control command execution success rate, control command execution latency, and the number of control command conflicts. The control command execution success rate is determined by the ratio between the number of successfully executed control commands and the total number of control commands. The control command execution latency is determined by the time difference between the control command execution time and the control command issuance time. The number of control command conflicts is determined by the number of conflict resolution processes recorded in the execution log.
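The three execution log features can be derived from structured log entries as sketched below. The entry field names and the choice to average latency across commands are assumptions; the embodiment defines latency per command as execution time minus issuance time.

```python
def log_features(entries):
    """Compute success rate, mean latency, and conflict count from log entries."""
    total = len(entries)
    succeeded = sum(1 for e in entries if e["result"] == "success")
    latencies = [e["executed_at"] - e["issued_at"] for e in entries]
    return {
        "success_rate": succeeded / total,
        "mean_latency_ms": sum(latencies) / total,
        "conflicts": sum(e.get("conflict_resolutions", 0) for e in entries),
    }


# Hypothetical log with timestamps in milliseconds.
entries = [
    {"issued_at": 1000, "executed_at": 1040, "result": "success"},
    {"issued_at": 1000, "executed_at": 1100, "result": "failure",
     "conflict_resolutions": 1},
]
features = log_features(entries)
print(features)
```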
[0142] The user feedback score, voice transmission score, and execution log score are merged to generate a comprehensive feedback score.
[0143] The fusion processing includes weighting and summing the user feedback score, voice transmission score, and execution log score according to preset weights to obtain a comprehensive feedback score;
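The fusion step is a weighted sum, sketched here with assumed weights (0.5, 0.2, 0.3) and assumed component scores on a 0 to 1 scale.

```python
def comprehensive_score(user, voice, log, weights=(0.5, 0.2, 0.3)):
    """Weighted sum of the three component scores; weights are assumed presets."""
    w_user, w_voice, w_log = weights
    return w_user * user + w_voice * voice + w_log * log


total = comprehensive_score(user=0.75, voice=0.9, log=0.5)
print(round(total, 3))  # 0.375 + 0.18 + 0.15 = 0.705
```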
[0144] Based on the interaction identifier, the comprehensive feedback score is associated and encapsulated with the user feedback data, voice transmission data, and execution logs to generate feedback samples.
[0145] In this embodiment, feedback samples are associated and matched with each collaborative pattern in the collaborative pattern set. Based on the user feedback data, voice transmission data, and execution logs recorded in the feedback samples, incremental statistics are performed on the collaborative patterns corresponding to the interactions. The newly added interaction data is incorporated into the pattern statistics in the collaborative pattern set, completing the online incremental update of the collaborative pattern set. The feedback samples are input into the variational Bayesian hierarchical context inference process, and the parameters of the variational Bayesian hierarchical context inference are updated online. Using the user feedback data, voice transmission data, and execution logs in the newly added feedback samples, the original parameters are iteratively adjusted so that the updated parameters reflect the trend of the scene state results and user intent results corresponding to the latest interaction data, thus completing the online update of the variational Bayesian hierarchical context inference parameters.
[0146] Example 1: Testing was conducted in a home smart speaker network environment. In this test environment, three smart speaker devices were deployed: one in the living room, one in the bedroom, and one in the kitchen. All speaker devices were connected via the home wireless network and were able to collect real-time data on device operation status, communication status, spatial location, and environmental factors. In this environment, users could issue voice commands to any speaker device, such as playing music, adjusting volume, checking the weather, or controlling other smart devices.
[0147] In practical applications, when a user issues a voice command, each audio device first collects the user's voice data and device operating status data. The collected multi-source raw data undergoes unified preprocessing, including timestamp alignment, abnormal data filtering, and data format standardization. After data preprocessing, the system constructs a collaborative hypergraph based on the spatial adjacency and communication relationships between the audio devices. The system maps the participating audio devices, environmental elements, and interaction events to nodes and hyperedges in the hypergraph, thus forming a collaborative hypergraph structure that reflects the collaborative relationships between multiple devices.
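The mapping of devices, environmental elements, and interaction events to nodes and hyperedges can be sketched as follows. The data layout, the function name `build_hypergraph`, and the choice of weight (the plain member count of each hyperedge) are assumptions; the source states only that the weight is calculated from the numbers of devices, environmental elements, and events in the hyperedge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    """One interaction: the devices, environment elements and events it joins."""
    devices: frozenset
    environment: frozenset
    events: frozenset

    @property
    def weight(self) -> int:
        # Assumption: weight is the total number of members of the hyperedge.
        return len(self.devices) + len(self.environment) + len(self.events)


def build_hypergraph(devices, interactions):
    """Build nodes, hyperedges, a node-by-hyperedge incidence matrix and weights."""
    nodes = sorted(devices)
    known = set(nodes)
    edges = []
    for item in interactions:
        members = frozenset(item["devices"])
        if not members <= known:
            raise ValueError(f"unknown device(s): {sorted(members - known)}")
        edges.append(Hyperedge(members,
                               frozenset(item["environment"]),
                               frozenset(item["events"])))
    # incidence[i][j] is 1 when node i belongs to hyperedge j.
    incidence = [[1 if n in e.devices else 0 for e in edges] for n in nodes]
    return {"nodes": nodes, "hyperedges": edges,
            "incidence": incidence, "weights": [e.weight for e in edges]}
```

For the three-speaker test environment, a single interaction involving the living room and kitchen speakers produces one hyperedge whose incidence column marks those two nodes and leaves the bedroom node at zero.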
[0148] Subsequently, the system performs dynamic topology updates on the constructed collaborative hypergraph data and generates a dynamic graph sequence that reflects changes in device status. Within this dynamic graph sequence, the system uses an incremental hypergraph frequent pattern mining method to identify the collaborative patterns formed between multiple devices in different interaction scenarios, such as "living room device priority response mode" and "cross-room playback relay mode." Based on these collaborative patterns, the system then applies variational Bayesian hierarchical context inference to analyze the current interaction scenario, obtaining scene state results and user intent results. For example, if a user moves from the living room to the kitchen while music is playing, the system can identify the current scene as a "cross-room playback scenario" and infer that the user's intent is "to continue playing the current music."
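A minimal sketch of incremental support counting over the dynamic graph sequence is given below, with support defined as the fraction of time windows in which a candidate pattern occurs. The class name, the representation of a pattern as a set of element names, and the rule that a pattern "occurs" when all of its elements fall inside one hyperedge of the window are assumptions.

```python
class IncrementalPatternMiner:
    """Tracks per-pattern window hits so support can be updated per new window."""

    def __init__(self, candidates, min_support):
        self.candidates = [frozenset(c) for c in candidates]
        self.min_support = min_support
        self.hits = {c: 0 for c in self.candidates}
        self.windows = 0

    def add_window(self, hyperedges):
        """Consume one time window, given as an iterable of element sets."""
        edges = [frozenset(e) for e in hyperedges]
        self.windows += 1
        for pattern in self.candidates:
            # Assumption: the pattern occurs if one hyperedge contains all of it.
            if any(pattern <= edge for edge in edges):
                self.hits[pattern] += 1

    def support(self, pattern):
        """Windows containing the pattern divided by total windows seen."""
        if self.windows == 0:
            return 0.0
        return self.hits[frozenset(pattern)] / self.windows

    def frequent_patterns(self):
        """Patterns whose support meets the preset threshold."""
        if self.windows == 0:
            return {}
        return {p: self.support(p) for p in self.candidates
                if self.support(p) >= self.min_support}
```

Because only the hit counters and the window total are stored, adding a window costs one pass over the candidates and never requires rescanning earlier windows.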
[0149] After identifying the user's intent and the scene state, the system generates a set of candidate response actions and determines the optimal device selection scheme through a constraint-optimization-based collaborative decision-making method. In the aforementioned cross-room playback scenario, when the user enters the kitchen from the living room, the system automatically selects the kitchen audio device as the target response audio device and the living room audio device as the collaborative audio device, achieving a smooth handover of music playback. The system then generates control commands according to the collaborative interaction control strategy and issues them to the target response audio device and the set of collaborative audio devices, thereby switching the playback device automatically and synchronizing the playback progress.
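The device selection step can be illustrated by a small exhaustive search that enforces three constraints: exactly one target response device, collaborators drawn only from the eligible set, and the target excluded from the collaborators. The score tables, the additive objective, and the `max_collaborators` limit are hypothetical; the source does not specify the objective function or the solver.

```python
from itertools import combinations


def select_devices(eligible, target_score, collab_score, max_collaborators=1):
    """Return the best (target, collaborators) pair under the three constraints."""
    best, best_value = None, float("-inf")
    for target in eligible:                            # uniqueness: one target
        others = [d for d in eligible if d != target]  # mutual exclusion
        for k in range(max_collaborators + 1):
            for collab in combinations(others, k):     # membership: eligible only
                # Assumed objective: target score plus collaborator scores.
                value = target_score[target] + sum(collab_score[d] for d in collab)
                if value > best_value:
                    best, best_value = (target, frozenset(collab)), value
    if best is None:
        raise ValueError("no eligible audio device")
    return best


# Hypothetical scores for the cross-room scenario; the bedroom device is
# assumed offline and therefore not in the eligible set.
eligible = ["living_room", "kitchen"]
target_score = {"living_room": 0.3, "kitchen": 0.9}
collab_score = {"living_room": 0.4, "kitchen": 0.1}
target, collaborators = select_devices(eligible, target_score, collab_score)
```

With these assumed scores the search picks the kitchen device as the target and the living room device as its collaborator, matching the handover described above. Exhaustive enumeration is adequate for a handful of household devices; a larger deployment would presumably need a dedicated solver.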
[0150] During system operation, user feedback data, voice transmission data, and execution logs are continuously collected. For example, after the system successfully completes cross-device relay playback, the user may provide feedback via voice or the user interface, and the system will record information such as the execution time of control commands, execution results, and playback synchronization status. All of this data is uniformly organized and used to generate feedback samples, which are then used to update the collaborative pattern set and variational Bayesian hierarchical context inference parameters online. This allows the system to continuously optimize its collaborative decision-making strategy as usage increases.
[0151] To verify the effectiveness of the method of this invention in collaborative interaction among multiple audio devices, a comparative test was conducted between the method of this invention and the traditional single-device response method in the same test environment. The test lasted seven days and recorded more than 1,200 user interactions, from which indicators such as device response time, playback synchronization error, and user satisfaction were statistically analyzed. The test results are shown in Table 1.
[0152] Table 1 Comparison of Experimental Data on the Collaborative Interaction Performance of Multiple Audio Devices
[0153]
| Indicator | Traditional control method | Method of the present invention |
| --- | --- | --- |
| Average response time (milliseconds) | 820 | 410 |
| Cross-device playback synchronization error (milliseconds) | 210 | 65 |
| Control conflict occurrence rate | 7.8% | 1.6% |
| Scene recognition accuracy | 82.4% | 94.7% |
| User satisfaction (out of 100) | 78 | 92 |
[0154] As shown in Table 1, under the same test conditions, the method of this invention outperforms the traditional control method on every measured indicator. First, regarding average response time, because this invention identifies the most suitable response device in advance through collaborative pattern mining and collaborative decision-making, the average system response time falls from 820 milliseconds with the traditional method to 410 milliseconds, a reduction of 50%. Second, regarding cross-device playback synchronization, the playback progress synchronization mechanism in the collaborative interaction control strategy reduces the playback timing error between different audio devices from 210 milliseconds to 65 milliseconds, giving users a more continuous audio experience when moving between rooms.
[0155] Regarding conflict control, the invention incorporates a conflict resolution mechanism in the collaborative decision-making process, effectively preventing multiple devices from simultaneously executing conflicting control commands, thus reducing the conflict rate from 7.8% to 1.6%. Furthermore, in terms of scene recognition accuracy, by utilizing collaborative hypergraph modeling and variational Bayesian hierarchical context inference, the system can more accurately identify user interaction scenarios, increasing the scene recognition accuracy from 82.4% to 94.7%. In addition, in user satisfaction surveys, most users found the system more intelligent and smoother in cross-device playback and voice response, raising the overall satisfaction score from 78 to 92.
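The relative improvements implied by Table 1 can be checked with a short calculation. Only the figures reported in the table are used; the dictionary keys and the helper name are illustrative.

```python
# (traditional, proposed) values copied from Table 1; lower is better for all three.
TABLE_1 = {
    "average response time (ms)": (820, 410),
    "playback synchronization error (ms)": (210, 65),
    "control conflict rate (%)": (7.8, 1.6),
}


def relative_reduction(baseline, proposed):
    """Percentage by which the proposed value is lower than the baseline."""
    return (baseline - proposed) / baseline * 100.0


for name, (baseline, proposed) in TABLE_1.items():
    print(f"{name}: {relative_reduction(baseline, proposed):.1f}% lower")
```

This gives reductions of 50.0% in response time, 69.0% in synchronization error, and 79.5% in conflict rate. Scene recognition accuracy and user satisfaction are reported on absolute scales, so their gains are better read as differences: 12.3 percentage points and 14 points respectively.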
[0156] When the intelligent collaborative interaction method for audio devices based on big data processing proposed in this invention is applied in a multi-audio-device network environment, the system identifies user interaction scenarios and user intentions more accurately and realizes intelligent collaborative control between multiple devices through collaborative decision-making. This improves the system's response efficiency, device collaboration capability, and user experience, and the test results above support the effectiveness and practical value of the method.
[0157] The above are merely preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.
Claims
1. An audio intelligent collaborative interaction method based on big data processing, characterized in that it includes the following steps:
collecting and preprocessing multi-source raw data in a multi-audio-device network environment;
constructing collaborative hypergraph data based on the preprocessed multi-source raw data;
performing dynamic topology updates on the collaborative hypergraph data to obtain a dynamic graph sequence;
performing collaborative pattern mining processing on the dynamic graph sequence, wherein the collaborative pattern mining processing includes performing incremental hypergraph frequent pattern mining on the dynamic graph sequence to obtain a collaborative pattern set, and performing variational Bayesian hierarchical context inference on the collaborative pattern set to obtain scene state results and user intent results;
generating a set of candidate response actions based on the scene state results and user intent results, and performing constraint-optimization-based collaborative decision-making on the set of candidate response actions to obtain device selection results and a collaborative interaction control strategy;
determining the target of the control command based on the device selection results, generating the control command according to the collaborative interaction control strategy, sending the control command to the target response audio device and the set of collaborative audio devices, and generating an execution log;
collecting user feedback data, voice transmission data, and execution logs during the interaction process to generate feedback samples;
performing online incremental updates on the collaborative pattern set based on the feedback samples, and performing online updates on the parameters of the variational Bayesian hierarchical context inference.
2. The audio intelligent collaborative interaction method based on big data processing according to claim 1, characterized in that the multi-source raw data includes user voice data, environmental acoustic data, user historical interaction data, user location or near-field presence data, audio device operating status data, and network link status data between devices; and the preprocessing includes noise reduction filtering, missing data completion, anomaly removal, deduplication and merging, feature scale unification, and data quality labeling.
3. The audio intelligent collaborative interaction method based on big data processing according to claim 1, characterized in that constructing the collaborative hypergraph data specifically includes:
determining a set of audio device nodes based on the audio device information in the multi-audio-device network environment, and mapping each audio device to an audio device node;
extracting the node features of each audio device node from the preprocessed multi-source raw data, and generating node feature data based on the node features;
determining the spatial adjacency relationships between the audio device nodes based on the spatial location information of each audio device node, determining the communication relationships between the audio device nodes based on the communication status information between the audio devices, and generating node relationship data based on the spatial adjacency relationships and the communication relationships;
for each interaction, obtaining the set of audio device nodes participating in the interaction, the environmental elements involved in the interaction, and the triggered interaction events, and mapping these audio device nodes, environmental elements, and interaction events together into the same hyperedge to generate a hyperedge set;
generating hyperedge representation data based on the membership relationships between audio device nodes and hyperedges;
calculating the hyperedge weight based on the number of audio device nodes, environmental elements, and interaction events corresponding to each hyperedge, and generating hyperedge weight data;
integrating the audio device node set, node feature data, node relationship data, hyperedge set, hyperedge representation data, and hyperedge weight data to generate the collaborative hypergraph data.
4. The audio intelligent collaborative interaction method based on big data processing according to claim 1, characterized in that obtaining the dynamic graph sequence specifically includes:
dividing the collaborative hypergraph data into time windows and generating a collaborative hypergraph data structure for each time window, wherein the collaborative hypergraph data in each time window includes a set of audio device nodes, a set of hyperedges, hyperedge representation data, and hyperedge weight data;
within each time window, performing node weight update processing on the audio device nodes to generate updated node weights;
within each time window, performing edge weight update processing on the hyperedges to generate updated hyperedge weights;
generating node availability markers based on the real-time operating status information of each audio device node;
integrating the updated node weights, updated hyperedge weights, and node availability markers with the collaborative hypergraph data within the corresponding time window to generate dynamic graph data for that time window, and arranging the dynamic graph data in chronological order to form a dynamic graph sequence that reflects how the collaborative relationships among the audio devices change over time.
5. The audio intelligent collaborative interaction method based on big data processing according to claim 1, characterized in that obtaining the scene state results and user intent results specifically includes:
traversing each time window in the dynamic graph sequence in chronological order, and extracting the association data between the audio devices, environmental elements, and interaction events in each time window;
constructing candidate collaborative patterns based on the association data;
within each time window, determining whether each candidate collaborative pattern occurs: when the audio devices, environmental elements, and interaction events in the candidate collaborative pattern appear simultaneously within the time window, marking the candidate collaborative pattern as occurring in that time window, and otherwise marking it as not occurring;
after traversing every time window in the dynamic graph sequence, counting the number of time windows in which each candidate collaborative pattern occurs, obtaining the total number of time windows in the dynamic graph sequence, and dividing the former by the latter to obtain the support of the candidate collaborative pattern in the dynamic graph sequence;
identifying candidate collaborative patterns whose support is greater than or equal to a preset support threshold as frequent collaborative patterns, and aggregating all frequent collaborative patterns to generate the collaborative pattern set;
when a new time window is added to the dynamic graph sequence, incrementally updating the collaborative pattern set according to the occurrence status of each candidate collaborative pattern in the new time window, so as to update the support of each collaborative pattern;
converting the collaborative pattern set into pattern observation data indicating the occurrence of each collaborative pattern in the current time window;
performing variational Bayesian hierarchical context inference based on the pattern observation data to obtain the scene state results and user intent results.
6. The audio intelligent collaborative interaction method based on big data processing according to claim 1, characterized in that obtaining the device selection results and the collaborative interaction control strategy specifically includes:
generating a set of candidate response actions based on the scene category in the scene state results and the intent category in the user intent results;
determining, based on the audio device operating status data, audio device communication status data, and audio device spatial location information in the preprocessed multi-source raw data, the range of audio devices that can participate in collaborative decision-making, thereby forming a set of collaborative audio devices, and identifying audio devices that are online and responsive as target audio devices;
constructing constraint-optimization-based collaborative decision-making constraints, wherein the collaborative decision-making constraints include a uniqueness constraint on the target response audio device, a membership constraint on the set of collaborative audio devices, and a mutual exclusion constraint between the target response audio device and the set of collaborative audio devices;
performing action evaluation processing on each candidate response action in the set of candidate response actions, based on the scene confidence in the scene state results and the intent confidence in the user intent results, to obtain an action evaluation result for each candidate response action;
executing a constraint-optimization-based collaborative decision-making solution based on the action evaluation results and the collaborative decision-making constraints, and outputting the device selection results and the collaborative interaction control strategy.
7. The audio intelligent collaborative interaction method based on big data processing according to claim 1, characterized in that generating the execution log specifically includes:
determining the target of the control command based on the device selection results, designating the target response audio device as the master device and each audio device in the set of collaborative audio devices as a slave device;
generating a set of control commands based on the collaborative interaction control strategy;
sorting the set of control commands according to the execution order in the collaborative interaction control strategy to generate a control command execution sequence;
configuring, based on the cross-device synchronization parameters in the collaborative interaction control strategy, a unified playback progress reference time for the control commands involving audio playback in the control command execution sequence, and configuring a corresponding playback progress offset for each audio device based on the unified playback progress reference time;
sending the control command execution sequence to the target response audio device and to each audio device in the set of collaborative audio devices;
collecting the execution status information of each audio device during the execution of the control commands, and generating the execution log.
8. The audio intelligent collaborative interaction method based on big data processing according to claim 1, characterized in that generating the feedback samples specifically includes:
after an interaction is completed, collecting the user feedback data, voice transmission data, and execution logs corresponding to the interaction, and generating an interaction identifier for the interaction;
organizing the user feedback data based on the interaction identifier and generating a user feedback score;
organizing the voice transmission data based on the interaction identifier and extracting voice transmission features, wherein the voice transmission features include a voice transmission validity marker, voice transmission completeness, and voice transmission quality, and generating a voice transmission score based on the voice transmission validity marker, voice transmission completeness, and voice transmission quality;
organizing the execution logs based on the interaction identifier and extracting execution log features, wherein the execution log features include the control command execution success rate, control command execution latency, and control command conflict count, and generating an execution log score based on the control command execution success rate, control command execution latency, and control command conflict count;
fusing the user feedback score, voice transmission score, and execution log score to generate a comprehensive feedback score;
associating and encapsulating, based on the interaction identifier, the comprehensive feedback score with the user feedback data, voice transmission data, and execution logs to generate the feedback samples.
9. The audio intelligent collaborative interaction method based on big data processing according to claim 1, characterized in that the feedback samples are matched against each collaborative pattern in the collaborative pattern set; based on the user feedback data, voice transmission data, and execution logs recorded in the feedback samples, incremental statistics are computed for the collaborative patterns corresponding to the interactions, and the newly added interaction data is merged into the pattern statistics of the collaborative pattern set, completing the online incremental update of the collaborative pattern set; the feedback samples are input into the variational Bayesian hierarchical context inference process, and the parameters of the variational Bayesian hierarchical context inference are updated online, wherein the original parameters are iteratively adjusted using the user feedback data, voice transmission data, and execution logs in the newly added feedback samples, so that the updated parameters reflect the trend of the scene state results and user intent results corresponding to the latest interaction data, thus completing the online update of the variational Bayesian hierarchical context inference parameters.