A multimodal human-computer interaction device and method for traditional colored light sources
By automatically identifying the load type of the lanterns and driving dynamic light effects through a modular human-computer interaction device, the problem of traditional lanterns lacking interactive capabilities is solved, realizing convenient and low-cost multimodal interaction and enhancing the fun and artistic expression of the lanterns.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SICHUAN UNIVERSITY OF SCIENCE AND ENGINEERING
- Filing Date
- 2026-04-30
- Publication Date
- 2026-06-23
AI Technical Summary
Traditional colored lanterns lack human-computer interaction capabilities. Existing improvement solutions are difficult to retrofit, costly, damaging to the lantern's structure and appearance, and poorly versatile, making it difficult to achieve flexible and convenient interactive functions.
Through a modular human-computer interaction device, the system automatically identifies the load type of colored light sources, collects interactive input signals by combining a multimodal sensing module, analyzes user intentions and generates light control commands to drive the colored light sources to present dynamic light effects, and is compatible with ordinary and smart light strings.
It achieves multimodal human-computer interaction for traditional colored lanterns, with strong compatibility, convenient installation, low cost, and the ability to respond to external environment and human actions, enhancing fun and artistic expression.
Figure CN122269540A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of multimodal interaction device technology, and in particular to a multimodal human-computer interaction device and method for traditional colored light sources.
Background Technology
[0002] Traditional colored lights, such as festival string lights, lanterns, and decorative lights, mostly have fixed light emission patterns or simple cyclical changes, lacking human-computer interaction capabilities. They cannot change their light emission status in real time according to external triggers or user operations, limiting their application in interactive displays, scene creation, and artistic creation. This device aims to add flexible and convenient interactive functions to traditional colored lights, enabling them to respond to human movements, sounds, touch, or other signals, and achieve dynamic interaction of light modes, brightness, color, and rhythm, thereby enhancing their fun, artistic expression, and scene adaptability. The light sources used in traditional colored lights are generally of two types: ordinary light strings that are not addressable and intelligent light strings that can be independently addressed.
[0003] In existing technologies, traditional colored lights typically use pre-programmed controllers to achieve simple flashing and gradient effects, with limited interaction methods, such as only switching on and off or fixed modes, lacking the ability to perceive and respond to external environments or human actions. Some improved solutions achieve interaction by connecting complex external sensors or custom controllers, but these have problems such as high modification difficulty, high cost, damage to the original structure and appearance of the colored lights, and poor versatility. For example, configuring a separate sensor and control unit for each string of colored lights results in cumbersome wiring and difficulty in compatibility with diverse colored light styles.
[0004] To address the above shortcomings, the inventors propose a modular, easy-to-install, and highly compatible multimodal human-computer interaction device and method that can conveniently add various interactive functions to traditional colored lanterns without significantly altering their original structure. The device should be able to sense common interactive signals and output corresponding control signals to drive changes in the state of the colored lantern light source, while maintaining low cost and ease of integration.
Summary of the Invention
[0005] The purpose of this invention is to provide a multimodal human-computer interaction device and method for traditional colored light sources, so as to solve the problems mentioned in the background art; the specific technical solution is as follows:
[0006] The first objective of this invention is to provide a multimodal human-computer interaction method for traditional colored light sources, comprising the following steps:
[0007] S1. Device identification steps: By applying a detection signal to the connected colored light source and detecting its electrical response characteristics, the load type of the colored light source is automatically identified. The load type includes two mutually exclusive categories: non-addressable ordinary light strings and independently addressable smart light strings.
[0008] S2. Calibration Steps: When the colored light source is identified as an independently addressable smart light string, each independent light-emitting unit in the smart light string is controlled to emit an identification signal in sequence. A video stream containing the light-emitting units is acquired through the camera of the mobile terminal. The identification of each light-emitting unit is decoded from the video stream. At the same time, the three-dimensional position coordinates of the light-emitting unit in physical space are obtained using a visual inertial odometry system. A spatial mapping relationship between the identification of the light-emitting unit and its three-dimensional position coordinates is established. When the colored light source is identified as an unaddressable ordinary light string, the calibration steps are skipped.
[0009] S3. Interactive sensing step: Collect interactive input signals from the user through a multimodal sensing module. The interactive input signals include at least one of audio signals, tactile signals, and posture signals.
[0010] S4. Intent parsing step: Perform feature extraction and classification on the interactive input signal to determine the type of user interaction intent;
[0011] S5. Interaction semantic mapping step: Map the user interaction intent type to a unified interaction semantic description, wherein the interaction semantic description is independent of the load type;
[0012] S6. Feedback Generation Step: Based on the interaction semantic description, combined with the load type and the spatial mapping relationship, a lighting control instruction sequence is generated; wherein, when the load is an independently addressable smart light string, a first lighting control instruction sequence with spatial directionality is generated; when the load is an unaddressable ordinary light string, a second lighting control instruction sequence that drives the entire string of lights is generated; the first lighting control instruction sequence and the second lighting control instruction sequence carry the same interaction semantics, but the control granularity is different;
[0013] S7. Feedback execution step: Apply the lighting control command sequence to the connected colored light source through the driver interface, and drive the colored light source to present dynamic lighting effects corresponding to the interactive semantics.
[0014] Preferably, the device identification step is as follows:
[0015] By applying pairs of detection pulses containing both positive and negative polarities to the connected colored light sources and detecting the symmetry characteristics of the positive and negative current responses, as well as the capacitive and resistive load characteristics, the electrical characteristics of the colored light sources can be determined, thereby accurately distinguishing between non-addressable ordinary light strings and independently addressable intelligent light strings; wherein:
[0016] When the forward and reverse current responses are asymmetrical and exhibit capacitive charging characteristics, it is determined to be a low-voltage LED string.
[0017] When the forward and reverse current responses are symmetrical and exhibit resistive characteristics, it is determined to be a tungsten filament bulb string or a high-voltage LED string.
[0018] When the load characteristics of the power supply are unclear, a query command is sent through the digital protocol communication port and the response signal is detected to confirm whether it is an independently addressable smart light string.
[0019] Preferably, the drive interface includes three parallel drive channels: a relay switch circuit, a PWM voltage regulation circuit, and a digital protocol communication circuit. When the load is a non-addressable ordinary light string, the relay switch circuit or the PWM voltage regulation circuit is automatically selected according to the electrical characteristics of the traditional colored light. When the load is an independently addressable intelligent light string, the digital protocol communication circuit is selected, and the working power is provided through the relay switch circuit or the PWM voltage regulation circuit.
[0020] Preferably, before the calibration step begins, the mobile terminal sends its camera frame rate parameters to the control device, and the control device dynamically adjusts the Gray code bit period time to an integer multiple of the frame interval accordingly to achieve frame rate synchronization.
[0021] In the calibration step, the identification signal is a binary Gray code sequence; the mobile terminal decodes the Gray code sequence to determine the identification of the light-emitting unit by analyzing the temporal brightness change corresponding to the imaging area of the light-emitting unit in the video stream; the mobile terminal also performs visual inertial odometry calculation by using the built-in inertial measurement unit data and the changes in image feature points in the video stream to obtain the three-dimensional position coordinates of the light-emitting unit in the physical space.
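The Gray code identification described above can be illustrated with a minimal sketch. The function names and the fixed-threshold decoding are assumptions made for illustration; they are not taken from the source, which does not give a decoder at this point.

```python
from typing import List


def to_gray(n: int) -> int:
    """Convert a binary index to its reflected binary Gray code."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Invert to_gray by XOR-ing together all right shifts of the code."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def gray_bits(led_index: int, width: int = 8) -> List[int]:
    """On/off pattern (most significant bit first) an LED would emit as its ID."""
    g = to_gray(led_index)
    return [(g >> (width - 1 - i)) & 1 for i in range(width)]


def decode_brightness(samples: List[float], threshold: float) -> int:
    """Hypothetical decoder: one brightness sample per bit period, thresholded."""
    g = 0
    for s in samples:
        g = (g << 1) | (1 if s > threshold else 0)
    return from_gray(g)
```

With an 8-bit width, this scheme distinguishes 256 light-emitting units. A real decoder would also need to locate the start of the sequence in the video stream, which this sketch leaves out.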
[0022] Preferably, in the interactive semantic mapping step, at least the following interactive semantic types are defined: welcoming semantics, wishing semantics, relaxation semantics, beat synchronization semantics, and environmental resonance semantics;
[0023] Each type of interactive semantics corresponds to a set of spatial effect templates and a set of whole-string effect templates;
[0024] When the load is an independently addressable smart light string, the spatial effect template is invoked in conjunction with the spatial mapping relationship to generate a spatially directional lighting control instruction sequence;
[0025] When the load is a non-addressable ordinary light string, the whole-string effect template is invoked to generate a lighting control instruction sequence that drives the entire string.
[0026] Preferably, in the intent parsing step, an audio event classification model is run to classify the ambient sound signals in order to identify the type of ambient atmosphere in which the user is located;
[0027] In the interactive semantic mapping step, when the classification results of multiple consecutive time windows are all of the same environmental atmosphere type and the confidence level is higher than a preset threshold, the corresponding environmental resonance semantics are triggered.
[0028] In the feedback generation step, a dynamic light effect instruction sequence matching the environmental atmosphere type is generated based on the environmental resonance semantics and the spatial mapping relationship.
[0029] Preferably, the mobile terminal is further configured to acquire color information of real-world objects through its camera; in the feedback generation step, when the load is an independently addressable smart light string, a sequence of light effect instructions is generated to drive the light-emitting unit to present a hue, saturation, or brightness distribution that has a spatial mapping relationship with the color information, based on the color information and the spatial mapping relationship, to achieve a color gradient effect from bottom to top or from near to far; when the load is an unaddressable ordinary light string, the entire string of colored lights is driven to present a main color tone corresponding to the color information.
[0030] Preferably, the multimodal sensing module includes an inertial measurement unit; in the intent parsing step, the action events of tapping or shaking the device are identified by analyzing the acceleration and angular velocity changes of the inertial measurement unit, and the action events are mapped to the corresponding interaction intent types.
[0031] A second objective of this invention is to provide a modular colored light human-computer interaction device, comprising:
[0032] The housing has a mounting structure for fixing.
[0033] The power input terminal and the power output terminal adopt a male-female plug-in structure. The power input terminal is used to connect to an external power source, and the power output terminal is used to connect to the power plug of the colored light source.
[0034] A multimodal sensing module, integrated inside or on the surface of the housing, includes at least one of a microphone, a touch sensor, an infrared motion sensor, and an inertial measurement unit;
[0035] The drive interface module, located inside the housing, includes three parallel drive channels: a relay switch circuit, a PWM voltage regulation circuit, and a digital protocol communication circuit.
[0036] The load type identification module is configured to automatically identify the load type of the connected colored light source by applying a detection signal to the power output terminal and detecting the electrical response characteristics, and automatically select the corresponding drive channel according to the identification result;
[0037] Wireless communication module, used to establish a wireless connection with a mobile terminal;
[0038] A processor, disposed within the housing, is electrically connected to the multimodal sensing module, the driving interface module, the load type identification module, and the wireless communication module, respectively, and the processor is configured to perform the steps of the method as described in any one of claims 1 to 8.
[0039] Preferably, the mounting structure includes at least one of a buckle, a cable tie slot, and a magnetic clasp; the power input terminal and the power output terminal of the housing adopt a male-female plug-in structure to achieve tool-free quick-connect installation.
[0040] Beneficial effects:
[0041] This invention automatically identifies the load type and selects the corresponding drive channel through electrical detection. A single device can be compatible with all types of colored lights, from tungsten filament bulb strings to addressable LED light strips. Users do not need to manually configure them, truly achieving plug and play. By introducing an interactive semantic mapping layer, the parsing of the user's natural interactive intent is decoupled from the generation of specific lighting control commands, so that the same interactive semantic can be presented at different granularities on different types of lighting hardware, greatly improving the system's scalability and compatibility.
[0042] In addressable smart light string scenarios, the three-dimensional spatial mapping relationship obtained by visual inertial SLAM enables refined light effect choreography with spatial directionality and narrative.
[0043] The modular device is connected in series in the power circuit through a male-female plug-in structure, without altering the original structure of the colored lights, and no tools are required for installation and removal.
Attached Figure Description
[0044] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0045] Figure 1 is a flowchart of the method described in this invention;
[0046] Figure 2 is an electrical detection timing diagram for load type identification in the device identification step;
[0047] Figure 3 is a timing diagram of frame rate synchronization and Gray code decoding;
[0048] Figure 4 is a diagram comparing how the same interactive semantics are presented on the two load types in some application scenarios.
Detailed Implementation
[0049] The following detailed description, in conjunction with the accompanying drawings and specific embodiments, provides a more detailed explanation of the multimodal human-computer interaction method for traditional colored light sources proposed in this invention. The advantages and features of this invention will become clearer from the following description. It should be noted that the accompanying drawings are all in a very simplified form and use non-precise proportions, and are only used to facilitate and clearly illustrate the embodiments of this invention.
[0050] In the description of this invention, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features; thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature; in the description of this invention, "a plurality of" means at least two, such as two, three, etc., unless otherwise explicitly specified.
[0051] It should be noted that the illustrations provided in the following embodiments are only schematic representations of the basic concept of the present invention. Therefore, the illustrations only show the components related to the present invention and are not drawn according to the actual number, shape and size of the components in the actual implementation. In the actual implementation, the state, quantity and proportion of each component can be arbitrarily changed, and the layout of the components may also be more complex.
[0052] In some embodiments, a multimodal human-computer interaction device for a traditional colored light source includes:
[0053] The outer shell is made of flame-retardant ABS plastic, with overall dimensions of approximately 80 mm in length, 50 mm in width, and 30 mm in height. The front of the shell has a touch-sensitive area, the side has a microphone pickup hole, and the back has a buckle, cable tie slot, and four embedded neodymium magnets. It provides three tool-free installation methods and can be installed on a pole or on the wiring of the decorative lights.
[0054] The power input is a standard two-prong plug, and the power output is a standard two-prong socket. The male and female plug-in structure allows the installation process to be completed within ten seconds.
[0055] The core of the device is a custom six-layer PCB board, which integrates the following main components: processor, MEMS microphone, capacitive touch controller, infrared motion sensor, six-axis IMU, magnetic latching relay, MOSFET, and level conversion chip; the processor is responsible for running multimodal interaction algorithms and processing sound, touch, and motion signals in real time;
[0056] The driver interface module contains three parallel drive channels: the first channel is a relay switch channel, consisting of a magnetic latching relay, which is energized only during switching and is suitable for on/off control of tungsten filament bulb strings; the second channel is a PWM voltage regulation channel, consisting of an N-channel MOSFET, a freewheeling diode, and a low-pass LC filter, operating at a 20 kHz PWM frequency and suitable for stepless dimming of ordinary LED light strings; the third channel is a digital protocol communication channel, consisting of a 74HCT245 level conversion chip, with its output pin connected to an independent XH2.54 terminal block, suitable for data communication with addressable light strings. Based on the load type identification result, the processor automatically selects the corresponding channel through an analog switch chip.
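The channel selection described in this paragraph can be sketched as a small lookup. The enum names are hypothetical. The source states that an addressable string is powered through the relay or the PWM channel without saying which is preferred, so the choice of the relay here is an assumption.

```python
from enum import Enum
from typing import Set


class LoadType(Enum):
    TUNGSTEN_STRING = "tungsten"
    ORDINARY_LED_STRING = "ordinary_led"
    ADDRESSABLE_STRING = "addressable"


class Channel(Enum):
    RELAY = "relay"
    PWM = "pwm"
    DIGITAL = "digital"


def select_channels(load: LoadType) -> Set[Channel]:
    """Return the drive channels to enable for an identified load type."""
    if load is LoadType.TUNGSTEN_STRING:
        # On/off control only.
        return {Channel.RELAY}
    if load is LoadType.ORDINARY_LED_STRING:
        # Stepless dimming through PWM voltage regulation.
        return {Channel.PWM}
    # Addressable string: data on the digital channel; power is assumed
    # (not stated in the source) to go through the relay.
    return {Channel.DIGITAL, Channel.RELAY}
```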
[0057] In operation, with reference to Figures 1 to 4, when the device is powered on for the first time or a new load is detected, the processor executes the load type detection process:
[0058] First, the processor applies a pair of probe pulses, +5V for 1 millisecond and -5V for 1 millisecond, to the output terminal through the H-bridge circuit, and measures the current response waveform through a 0.1-ohm high-precision sampling resistor.
[0059] For low-voltage LED strings containing rectifier diodes and smoothing capacitors, the current spikes first and then decays exponentially. Under a forward pulse, it exhibits a capacitive charging curve, while under a reverse pulse, the current decreases significantly, and the ratio of the integral area of the forward and reverse currents exceeds 5:1.
[0060] For a tungsten filament bulb string with a purely resistive load, the current response under forward and reverse pulses is symmetrical, exhibits a linear proportional relationship, has no capacitive spikes, and the ratio of the integral area of the forward and reverse currents is close to 1:1.
[0061] For addressable smart light strings: the power supply load characteristics may be similar to ordinary LED light strings. The processor then sends a DMX512 query command or a single-line zero code reset signal through the digital protocol communication port. If a response is received, it is confirmed to be an addressable smart light string.
[0062] This dual detection mechanism, which combines power supply electrical characteristic analysis and data protocol detection, ensures highly accurate load type identification.
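The dual detection logic above can be sketched as follows. The 5:1 asymmetry ratio comes from the description; the symmetry tolerance, the return labels, the order of the checks, and the `protocol_responds` callback are assumptions for illustration only.

```python
from typing import Callable, Sequence


def integral(samples: Sequence[float], dt: float) -> float:
    """Rectangle-rule integral of the absolute current over one pulse."""
    return sum(abs(s) for s in samples) * dt


def classify_load(
    forward: Sequence[float],
    reverse: Sequence[float],
    dt: float,
    protocol_responds: Callable[[], bool],
    symmetric_tol: float = 0.2,   # assumed tolerance around a 1:1 ratio
    asym_ratio: float = 5.0,      # 5:1 threshold from the description
) -> str:
    """Classify a load from its forward/reverse current responses."""
    fwd = integral(forward, dt)
    rev = integral(reverse, dt)
    if fwd == 0 and rev == 0:
        return "no_load"
    ratio = fwd / rev if rev > 0 else float("inf")
    if abs(ratio - 1.0) <= symmetric_tol:
        return "resistive_string"
    # An addressable string may look like an ordinary LED string electrically,
    # so the digital protocol probe is consulted before the final decision.
    if protocol_responds():
        return "addressable_string"
    if ratio > asym_ratio:
        return "low_voltage_led_string"
    return "unknown"
```

Here `forward` and `reverse` stand for the current samples measured across the sampling resistor during each 1 ms pulse, and `dt` is the sampling interval.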
[0063] When the processor detects an addressable smart light string, it notifies the paired mobile terminal via BLE. After the user selects to start calibration:
[0064] 1) Frame rate synchronization: The mobile terminal App sends the camera frame rate parameters through BLE, and the processor adjusts the Gray code bit period to an integer multiple of the frame interval accordingly to ensure that each frame image completely covers one bit period;
[0065] 2) Gray code transmission and decoding: The processor assigns a unique 8-bit Gray code to each LED, driving the LEDs to transmit sequentially in a loop. The mobile terminal converts each frame of the image to grayscale, identifies overexposed areas, extracts the brightness change curve within 1600 milliseconds, performs cross-correlation matching with the pre-stored Gray code book, and accurately identifies the LED ID.
[0066] 3) 3D coordinate acquisition: The mobile terminal runs a visual inertial odometry method based on a lightweight variant of ORB-SLAM3, maintaining a tracking thread and a mapping thread simultaneously. Using the camera pose output by SLAM and the pixel coordinates of the light spots, it solves for the 3D coordinates of the LED beads through the PnP algorithm. The multi-view observations are jointly refined by bundle adjustment to improve accuracy.
[0067] 4) Spatial mapping table generation: Pair all LED IDs with three-dimensional coordinates to generate a spatial topology data mapping table, and transmit it back to the device storage via BLE;
[0068] The entire calibration process takes approximately 30 to 60 seconds for 256 LEDs. Once completed, the app displays a 3D point cloud preview for user confirmation.
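The frame rate synchronization in step 1) amounts to rounding the bit period up to a whole number of camera frames. The sketch below shows that arithmetic; the function name and the `min_period_ms` parameter are hypothetical, since the source does not state how the multiple is chosen.

```python
import math
from typing import Tuple


def bit_period(frame_rate_fps: float, min_period_ms: float) -> Tuple[int, float]:
    """Return (frames_per_bit, period_ms), where period_ms is an integer
    multiple of the camera frame interval and at least min_period_ms."""
    if frame_rate_fps <= 0:
        raise ValueError("frame rate must be positive")
    # The small epsilon guards against floating-point noise at exact multiples.
    frames_per_bit = max(1, math.ceil(min_period_ms * frame_rate_fps / 1000.0 - 1e-9))
    period_ms = frames_per_bit * 1000.0 / frame_rate_fps
    return frames_per_bit, period_ms
```

As an illustration, at 30 frames per second a 200 ms minimum gives 6 frames per bit, so an 8-bit code lasts 1600 ms. That this matches the 1600 ms extraction window mentioned in step 2) is an inference, not something the source states.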
[0069] The interaction semantic mapping layer divides the mapping relationship between user interaction intent and light control commands into two layers:
[0070] The first layer maps interactive intents to interactive semantics. For example, the interactive intent of touching for 3 seconds and blowing air is mapped to the interactive semantics of making a wish request.
[0071] The second layer maps interactive semantics to lighting control commands. For example, when the interactive semantics of a wish request and an independently addressable smart light string are obtained, they are mapped to a random color space lighting effect that diffuses outward from the center; when the interactive semantics of a wish request and an unaddressable ordinary light string are obtained, they are mapped to an overall breathing gradient effect for the ordinary light string.
[0072] When adding new interactive semantics, there is no need to modify the underlying driver code. When adding a new lighting hardware type, you only need to add the corresponding lighting effect template. Interactive semantics can be reused across hardware types.
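The two-layer mapping can be sketched with two lookup tables. The keys and template names below are hypothetical labels for the wish example given above, not identifiers from the source.

```python
from typing import Optional, Tuple

# Layer 1: interaction intent -> hardware-independent interaction semantics.
INTENT_TO_SEMANTIC = {
    ("touch_hold_3s", "blow"): "wish",
}

# Layer 2: (semantics, load type) -> lighting effect template.
EFFECT_TEMPLATES = {
    ("wish", "addressable"): "random_color_diffuse_from_center",
    ("wish", "ordinary"): "whole_string_breathing_gradient",
}


def map_intent(events: Tuple[str, ...]) -> Optional[str]:
    """Resolve a sequence of detected events to an interaction semantic."""
    return INTENT_TO_SEMANTIC.get(events)


def select_effect(semantic: str, load_type: str) -> str:
    """Pick the effect template for a semantic on the given hardware."""
    return EFFECT_TEMPLATES[(semantic, load_type)]
```

In this layout, supporting a new hardware type only adds entries to the second table, which is the decoupling the paragraph describes.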
[0073] In some specific application scenarios, when a user covers the heart-shaped touch area with a finger for more than 3 seconds, the light string enters a wishing standby state: all the light beads turn a low-brightness amber and breathe with 1/f fluctuations. The user then blows; the microphone captures the sound, and the audio classification model identifies it as a blowing event. The processor first drives the light beads to turn off sequentially from the periphery toward the geometric center over 1.5 seconds, visually resembling dandelion seeds being blown away. A 2-second period of complete darkness follows. The processor then calls a random color generation algorithm: it randomly selects a base hue from 120 pre-screened emotional colors on the HSV color wheel, lets the hue drift within ±15 degrees, and sets the saturation to 60%-90% and the brightness to 70%-100%, so that the generated color is both unique and pleasing. Finally, the light string is driven to light up layer by layer from the geometric center outward over 3 seconds, presenting the unique random color. The whole process forms a narrative emotional interaction loop of touch, waiting, blowing, lights-off, darkness, and rebirth.
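The random color generation step can be sketched with the standard library. The hue drift and the saturation and brightness ranges follow the paragraph above. The palette passed in is a placeholder, because the source does not list the 120 pre-screened hues.

```python
import colorsys
import random
from typing import Sequence, Tuple


def wish_color(base_hues_deg: Sequence[float], rng: random.Random) -> Tuple[int, int, int]:
    """Pick a base hue, drift it within +/-15 degrees, and return 8-bit RGB."""
    base = rng.choice(base_hues_deg)
    hue = (base + rng.uniform(-15.0, 15.0)) % 360.0
    sat = rng.uniform(0.60, 0.90)   # saturation 60%-90%
    val = rng.uniform(0.70, 1.00)   # brightness 70%-100%
    r, g, b = colorsys.hsv_to_rgb(hue / 360.0, sat, val)
    return round(r * 255), round(g * 255), round(b * 255)


# Placeholder palette: 120 evenly spaced hues (an assumption, not the real list).
PLACEHOLDER_HUES = [i * 3.0 for i in range(120)]
```

Calling `wish_color(PLACEHOLDER_HUES, random.Random())` yields one RGB triple per wish.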
[0074] When the load is a traditional string of lights, the same wish semantics are simplified to: after a blowing event is detected, the entire string of lights first turns off, and then slowly lights up in a random color after 2 seconds, presenting a breathing gradient effect.
[0075] In other application scenarios, the audio event classification model is trained with 500 hours of environmental audio and quantized into 8-bit integer TFLite format. When the sound is identified as rain in three consecutive time windows and the confidence level exceeds 80%, the rainy environment resonance semantics is triggered. For smart light strings: 50 high-level light beads are randomly selected as raindrop sources. 8-12 light beads are selected below each source to form a falling path. After the source lights up in cyan, the light beads on the path light up and turn off in sequence to simulate the trailing effect of raindrops. Multiple sources are activated in a staggered manner to form a multi-point raindrop effect in space.
[0076] For traditional string lights: the whole string is mainly blue-green, and random brightness fluctuations are achieved through PWM voltage regulation to simulate the atmosphere of a rainy day.
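The trigger rule in this scenario (the same label in three consecutive windows, each with confidence above 80%) can be sketched as a small stateful check. The class and method names are hypothetical.

```python
from collections import deque
from typing import Optional


class ResonanceTrigger:
    """Fires when the last `windows` classifications agree and are confident."""

    def __init__(self, windows: int = 3, threshold: float = 0.80) -> None:
        self.history = deque(maxlen=windows)
        self.threshold = threshold

    def update(self, label: str, confidence: float) -> Optional[str]:
        """Record one time window; return the label if the rule is met."""
        self.history.append((label, confidence))
        if len(self.history) < self.history.maxlen:
            return None
        same_label = len({lab for lab, _ in self.history}) == 1
        confident = all(conf > self.threshold for _, conf in self.history)
        return label if same_label and confident else None
```

Requiring agreement across several windows trades a short delay for fewer false triggers from a single misclassified window.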
[0077] When the long press release intention is detected, the processor runs the Voss-McCartney algorithm to generate a pink noise sequence: 16 independent pseudo-random number generators are initialized, each running at a different sampling rate, and weighted summation is performed to generate a 1 / f fluctuation sequence with long-range correlation. This sequence is mapped to the PWM duty cycle, producing irregular and unpredictable fluctuations in brightness, like the flickering of a real candle flame.
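A minimal sketch of a Voss-McCartney style generator follows. It uses 16 sources, with source k refreshed every 2^k samples, and sums them with equal weights. The equal weighting, the uniform random sources, and the duty cycle range are assumptions, since the source only mentions a weighted summation.

```python
import random
from typing import List, Optional


def voss_mccartney(n_samples: int, n_generators: int = 16,
                   seed: Optional[int] = None) -> List[float]:
    """Approximate 1/f noise in [0, 1] by summing sources updated at
    octave-spaced rates (source k refreshes every 2**k samples)."""
    rng = random.Random(seed)
    values = [rng.random() for _ in range(n_generators)]
    out = []
    for i in range(n_samples):
        for k in range(n_generators):
            if i % (1 << k) == 0:
                values[k] = rng.random()
        # Equal-weight average (assumed; the source says "weighted").
        out.append(sum(values) / n_generators)
    return out


def to_duty_cycle(x: float, lo: float = 0.2, hi: float = 1.0) -> float:
    """Map a noise sample in [0, 1] to a PWM duty cycle in [lo, hi]."""
    return lo + (hi - lo) * min(1.0, max(0.0, x))
```

Slowly refreshed sources contribute the long-range drift, while the fast ones add the small flicker on top.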
[0078] Color temperature control is coupled to the brightness sequence: the color temperature is cooler at the brightness peak and warmer at the brightness trough, enhancing the sense of naturalness and immersion.
[0079] On smart light strings, this effect can be further combined with the spatial mapping to achieve a spatial breathing effect that ripples from one end to the other; on traditional light strings, it manifests as uniform brightening and dimming of the whole string. This breathing effect makes the light strings seem alive, constantly changing with ambient sound and user interaction and creating an immersive emotional experience.
[0080] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims
1. A multimodal human-computer interaction method for traditional colored light sources, characterized in that it includes the following steps:
S1. Device identification step: By applying a detection signal to the connected colored light source and detecting its electrical response characteristics, the load type of the colored light source is automatically identified. The load type includes two mutually exclusive categories: non-addressable ordinary light strings and independently addressable smart light strings.
S2. Calibration step: When the colored light source is identified as an independently addressable smart light string, each independent light-emitting unit in the smart light string is controlled to emit an identification signal in sequence. A video stream containing the light-emitting units is acquired through the camera of the mobile terminal. The identification of each light-emitting unit is decoded from the video stream. At the same time, the three-dimensional position coordinates of the light-emitting unit in physical space are obtained using a visual inertial odometry system. A spatial mapping relationship between the identification of the light-emitting unit and its three-dimensional position coordinates is established. When the colored light source is identified as an unaddressable ordinary light string, the calibration step is skipped.
S3. Interactive sensing step: Collect interactive input signals from the user through a multimodal sensing module. The interactive input signals include at least one of audio signals, tactile signals, and posture signals.
S4. Intent parsing step: Perform feature extraction and classification on the interactive input signal to determine the type of user interaction intent;
S5. Interaction semantic mapping step: Map the user interaction intent type to a unified interaction semantic description, wherein the interaction semantic description is independent of the load type;
S6. Feedback generation step: Based on the interaction semantic description, combined with the load type and the spatial mapping relationship, a lighting control instruction sequence is generated; wherein, when the load is an independently addressable smart light string, a first lighting control instruction sequence with spatial directionality is generated; when the load is an unaddressable ordinary light string, a second lighting control instruction sequence that drives the entire string of lights is generated; the first lighting control instruction sequence and the second lighting control instruction sequence carry the same interaction semantics, but the control granularity is different;
S7. Feedback execution step: Apply the lighting control instruction sequence to the connected colored light source through the driver interface, and drive the colored light source to present dynamic lighting effects corresponding to the interactive semantics.
2. The method according to claim 1, characterized in that the device identification step is as follows: pairs of detection pulses of positive and negative polarity are applied to the connected colored light source, and the symmetry of the forward and reverse current responses, together with the capacitive or resistive load characteristics, is detected to determine the electrical characteristics of the colored light source, thereby distinguishing non-addressable ordinary light strings from independently addressable smart light strings; wherein: when the forward and reverse current responses are asymmetrical and exhibit capacitive charging characteristics, the load is determined to be a low-voltage LED string; when the forward and reverse current responses are symmetrical and exhibit resistive characteristics, the load is determined to be a tungsten filament bulb string or a high-voltage LED string; when the load characteristics remain unclear, a query command is sent through the digital protocol communication port and the response signal is detected to confirm whether the load is an independently addressable smart light string.
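As a hedged illustration of the decision rule described in this claim, the sketch below classifies a load from a pair of measured currents. The function name, the 10% symmetry tolerance, the boolean `capacitive` flag, and the `query_protocol` callback are assumptions introduced for the example; the claim does not specify thresholds or how the measurements are represented.

```python
from typing import Callable, Optional


def classify_load(i_forward: float, i_reverse: float, capacitive: bool,
                  query_protocol: Optional[Callable[[], bool]] = None,
                  sym_tol: float = 0.1) -> str:
    """Classify a light string from its response to a +/- pulse pair.

    i_forward / i_reverse: current magnitudes for the two polarities.
    capacitive: True if a charging transient was observed (else resistive).
    query_protocol: hypothetical callback that sends a query on the digital
        protocol port and returns True if a valid reply is received.
    """
    largest = max(abs(i_forward), abs(i_reverse), 1e-9)  # avoid divide-by-zero
    symmetric = abs(abs(i_forward) - abs(i_reverse)) / largest <= sym_tol

    if not symmetric and capacitive:
        return "low_voltage_led_string"
    if symmetric and not capacitive:
        return "tungsten_or_high_voltage_led_string"
    # Ambiguous electrical signature: fall back to the protocol query.
    if query_protocol is not None and query_protocol():
        return "smart_string"
    return "unknown"
```

Only the two signatures named in the claim are resolved electrically; every other combination is treated as unclear and deferred to the protocol query.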
3. The method according to claim 1, characterized in that the drive interface includes three parallel drive channels: a relay switch circuit, a PWM voltage regulation circuit, and a digital protocol communication circuit. When the load is a non-addressable ordinary light string, the relay switch circuit or the PWM voltage regulation circuit is selected automatically according to the electrical characteristics of the colored light source. When the load is an independently addressable smart light string, the digital protocol communication circuit is selected, and working power is supplied through the relay switch circuit or the PWM voltage regulation circuit.
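The channel selection in this claim can be pictured with the minimal sketch below. The `dimmable` flag is a hypothetical stand-in for whatever electrical characteristic decides between the relay and PWM circuits; the claim does not state that criterion, so it is an assumption of the example.

```python
from typing import Dict, Optional


def select_channels(addressable: bool, dimmable: bool) -> Dict[str, Optional[str]]:
    """Pick the power channel and, for smart strings, the data channel."""
    # Power always flows through the relay or the PWM circuit.
    power = "pwm" if dimmable else "relay"
    # Only addressable strings also use the digital protocol circuit.
    data = "digital_protocol" if addressable else None
    return {"power": power, "data": data}
```

An ordinary string therefore gets a power channel only, while a smart string gets a power channel plus the digital protocol channel.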
4. The method according to claim 1, characterized in that, before the calibration step begins, the mobile terminal sends its camera frame rate parameters to the control device, and the control device dynamically adjusts the Gray code bit period to an integer multiple of the frame interval so as to achieve frame rate synchronization. In the calibration step, the identification signal is a binary Gray code sequence; the mobile terminal decodes the Gray code sequence, and thereby determines the identification of each light-emitting unit, by analyzing the temporal brightness changes in the imaging area of that light-emitting unit in the video stream; the mobile terminal also performs visual inertial odometry calculation using data from its built-in inertial measurement unit and the changes of image feature points in the video stream, to obtain the three-dimensional position coordinates of the light-emitting unit in physical space.
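As a non-limiting sketch of the frame-synchronized Gray code identification described in this claim, the example below derives the bit period from the camera frame rate and round-trips a unit identifier through a blink pattern. It assumes each unit blinks its own Gray-coded index most significant bit first, that brightness is normalized to 0..1, and that decoding averages the frames inside each bit period; none of these details are specified in the claim, and all function names are illustrative.

```python
from typing import List


def bit_period_ms(fps: float, frames_per_bit: int = 2) -> float:
    """Bit period as an integer multiple of the camera frame interval."""
    return frames_per_bit * 1000.0 / fps


def to_gray(n: int) -> int:
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def blink_pattern(unit_id: int, n_bits: int) -> List[int]:
    """On/off states (MSB first) that a unit emits during calibration."""
    g = to_gray(unit_id)
    return [(g >> i) & 1 for i in reversed(range(n_bits))]


def decode_frames(brightness: List[float], frames_per_bit: int,
                  threshold: float = 0.5) -> int:
    """Recover the unit id from per-frame brightness of its image region."""
    g = 0
    for start in range(0, len(brightness), frames_per_bit):
        chunk = brightness[start:start + frames_per_bit]
        bit = 1 if sum(chunk) / len(chunk) > threshold else 0
        g = (g << 1) | bit
    return from_gray(g)
```

Because the bit period spans a whole number of frames, every bit is sampled by the same number of frames, which is what makes the simple per-chunk averaging workable in this sketch.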
5. The method according to claim 1, characterized in that, in the interaction semantic mapping step, at least the following interaction semantic types are defined: welcoming semantics, wishing semantics, relaxation semantics, beat synchronization semantics, and environmental resonance semantics; each type of interaction semantics corresponds to a set of spatial effect templates and a set of whole-string effect templates; when the load is an independently addressable smart light string, the spatial effect template is invoked in conjunction with the spatial mapping relationship to generate a spatially directional lighting control instruction sequence; when the load is a non-addressable ordinary light string, the whole-string effect template is invoked to generate a lighting control instruction sequence for the entire string.
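One possible way to organize the paired templates of this claim is a lookup table keyed by semantic type, sketched below. The five keys mirror the semantic types listed in the claim; the template names themselves are hypothetical placeholders invented for illustration.

```python
from typing import Dict, NamedTuple


class TemplatePair(NamedTuple):
    spatial: str        # used with the spatial mapping (smart strings)
    whole_string: str   # used for non-addressable ordinary strings


# Keys follow the claim; template names are hypothetical placeholders.
TEMPLATES: Dict[str, TemplatePair] = {
    "welcome": TemplatePair("sweep_towards_user", "fade_in"),
    "wish": TemplatePair("rising_sparkle", "slow_twinkle"),
    "relax": TemplatePair("drifting_wave", "breathing"),
    "beat_sync": TemplatePair("beat_ripple", "beat_flash"),
    "ambient_resonance": TemplatePair("ambient_flow", "ambient_dim"),
}


def select_template(semantic: str, addressable: bool) -> str:
    """Return the template matching the semantic and the load type."""
    pair = TEMPLATES[semantic]
    return pair.spatial if addressable else pair.whole_string
```

Keeping both templates under one key is what lets the semantic description stay independent of the load type: the load type is consulted only at lookup time.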
6. The method according to claim 1, characterized in that, in the intent parsing step, an audio event classification model is run to classify ambient sound signals in order to identify the type of ambient atmosphere in which the user is located; in the interaction semantic mapping step, when the classification results of multiple consecutive time windows are all of the same ambient atmosphere type and each confidence level is higher than a preset threshold, the corresponding environmental resonance semantics are triggered; in the feedback generation step, a dynamic light effect instruction sequence matching the ambient atmosphere type is generated based on the environmental resonance semantics and the spatial mapping relationship.
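The consecutive-window condition of this claim could be implemented along the lines of the sketch below. The class name, the default of three windows, and the 0.8 threshold are assumptions for the example; the claim only requires "multiple" windows and a "preset" threshold. The audio classifier itself is outside the sketch, which consumes its `(label, confidence)` output.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class ResonanceTrigger:
    """Fire only when N consecutive windows agree with high confidence."""

    def __init__(self, n_windows: int = 3, threshold: float = 0.8) -> None:
        self.history: Deque[Tuple[str, float]] = deque(maxlen=n_windows)
        self.threshold = threshold

    def update(self, label: str, confidence: float) -> Optional[str]:
        """Feed one window's result; return the atmosphere type or None."""
        self.history.append((label, confidence))
        if len(self.history) < self.history.maxlen:
            return None  # not enough windows observed yet
        same_label = len({lab for lab, _ in self.history}) == 1
        confident = all(conf > self.threshold for _, conf in self.history)
        return label if same_label and confident else None
```

Requiring agreement across windows trades a short delay for robustness against a single misclassified window.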
7. The method according to claim 1, characterized in that the mobile terminal is also configured to acquire color information of real-world objects through its camera; in the feedback generation step, when the load is an independently addressable smart light string, a light effect instruction sequence is generated from the color information and the spatial mapping relationship, driving the light-emitting units to present a distribution of hue, saturation, or brightness that is spatially mapped to the color information, so as to achieve a color gradient from bottom to top or from near to far; when the load is a non-addressable ordinary light string, the entire light string is driven to present a main color tone corresponding to the color information.
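As one hedged example of a bottom-to-top gradient of the kind this claim describes, the sketch below keeps the hue and saturation of a sampled color and scales brightness linearly with unit height. The choice of brightness as the varied component, the linear ramp, the `v_min` floor, and the use of the third coordinate as height are assumptions of the example. It uses the standard library `colorsys` module for the RGB/HSV conversion.

```python
import colorsys
from typing import Dict, Tuple

RGB = Tuple[int, int, int]
Position = Tuple[float, float, float]  # (x, y, z); z treated as height (assumed)


def gradient_colors(base_rgb: RGB, spatial_map: Dict[int, Position],
                    v_min: float = 0.2) -> Dict[int, RGB]:
    """Per-unit colors: dimmest at the lowest unit, base color at the highest."""
    h, s, v = colorsys.rgb_to_hsv(*(c / 255 for c in base_rgb))
    heights = [pos[2] for pos in spatial_map.values()]
    z_lo, z_hi = min(heights), max(heights)
    span = (z_hi - z_lo) or 1.0  # all units at one height: avoid divide-by-zero
    colors = {}
    for uid, (_, _, z) in spatial_map.items():
        t = (z - z_lo) / span                 # 0 at the bottom, 1 at the top
        value = v_min + (v - v_min) * t       # linear brightness ramp
        r, g, b = colorsys.hsv_to_rgb(h, s, value)
        colors[uid] = (round(r * 255), round(g * 255), round(b * 255))
    return colors
```

For a non-addressable string the same sampled color would simply be applied to the whole string as its main tone, so no per-unit computation is needed.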
8. The method according to claim 1, characterized in that the multimodal sensing module includes an inertial measurement unit; in the intent parsing step, action events of tapping or shaking the device are identified by analyzing the changes in acceleration and angular velocity reported by the inertial measurement unit, and the action events are mapped to the corresponding interaction intent types.
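A simple threshold-based reading of this claim is sketched below: a brief acceleration spike is treated as a tap, and sustained motion in acceleration or angular velocity as a shake. The thresholds, the units (g and degrees per second), and the event-to-intent table are hypothetical values chosen for the example; the claim does not prescribe a detection algorithm, and a trained classifier could equally be used.

```python
import math
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float, float, float, float, float]  # ax, ay, az (g), gx, gy, gz (deg/s)

# Hypothetical mapping from action event to interaction intent type.
INTENT_BY_EVENT = {"tap": "wish", "shake": "beat_sync"}


def classify_motion(samples: Sequence[Sample], tap_dev: float = 1.0,
                    shake_dev: float = 0.5, shake_dps: float = 200.0,
                    shake_min_samples: int = 4) -> Optional[str]:
    """Return 'tap', 'shake', or None for one window of IMU samples."""
    active = 0   # samples showing strong motion
    peak = 0.0   # largest deviation of |acceleration| from 1 g (gravity)
    for ax, ay, az, gx, gy, gz in samples:
        dev = abs(math.sqrt(ax * ax + ay * ay + az * az) - 1.0)
        spin = math.sqrt(gx * gx + gy * gy + gz * gz)
        peak = max(peak, dev)
        if dev > shake_dev or spin > shake_dps:
            active += 1
    if active >= shake_min_samples:
        return "shake"   # motion sustained over several samples
    if peak > tap_dev:
        return "tap"     # short, sharp spike
    return None
```

The detected event can then be looked up in `INTENT_BY_EVENT` to obtain the interaction intent type passed on to the semantic mapping step.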
9. A modular colored light human-computer interaction device, characterized in that it comprises:
a housing having a mounting structure for fixing;
a power input terminal and a power output terminal adopting a male-female plug-in structure, the power input terminal being used to connect to an external power source and the power output terminal being used to connect to the power plug of the colored light source;
a multimodal sensing module, integrated inside or on the surface of the housing, including at least one of a microphone, a touch sensor, an infrared motion sensor, and an inertial measurement unit;
a drive interface module, located inside the housing, including three parallel drive channels: a relay switch circuit, a PWM voltage regulation circuit, and a digital protocol communication circuit;
a load type identification module, configured to automatically identify the load type of the connected colored light source by applying a detection signal to the power output terminal and detecting the electrical response characteristics, and to automatically select the corresponding drive channel according to the identification result;
a wireless communication module, used to establish a wireless connection with a mobile terminal; and
a processor, disposed within the housing and electrically connected to the multimodal sensing module, the drive interface module, the load type identification module, and the wireless communication module, respectively, the processor being configured to perform the steps of the method according to any one of claims 1 to 8.
10. The device according to claim 9, characterized in that the mounting structure includes at least one of a buckle, a cable tie slot, and a magnetic fastener; the power input terminal and the power output terminal of the housing adopt a male-female plug-in structure to achieve tool-free quick-connect installation.