Wearable camera device fusing AI voice interaction and cellular calling
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Utility model (China)
- Current Assignee / Owner
- SHANGHAI FORTUNE TECHGROUP CO LTD
- Filing Date
- 2025-08-14
- Publication Date
- 2026-06-23
Figure CN224401601U_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of smart devices, and in particular to a personal camera device that integrates AI voice interaction and cellular calling.
Background Technology
[0002] With the rapid development of portable electronic devices, especially the continuous breakthroughs in image acquisition, voice recognition, and mobile communication, more and more miniature camera devices are being applied to scenarios such as home monitoring, smart companionship, childcare, and emotional support. These devices are becoming increasingly small and their designs more optimized, greatly improving their portability. However, existing miniature camera devices still face many technical challenges in terms of multi-functional integration and interactivity.
[0003] For example, existing portable camera devices generally employ single-function or simple multi-functional integration, failing to achieve seamless integration of multiple functions such as video surveillance, voice interaction, and telephone calls. Although some devices possess voice recognition capabilities, most are basic command responses, lacking context awareness and emotional feedback mechanisms, making it difficult to support prolonged multi-turn dialogues or emotional interactions. This may result in a lack of interactive experience in scenarios requiring extended interaction, such as child companionship or elderly care.
[0004] Furthermore, some devices rely primarily on Bluetooth or smartphone connectivity for voice functionality, lacking an independent communication module. This results in response delays in emergency calls and instant messaging scenarios, impacting the actual user experience. Alternatively, some devices can only connect via Wi-Fi, limiting their use in environments without a stable network. This is especially problematic when out and about or traveling, severely restricting device functionality and failing to meet users' needs anytime, anywhere, thus affecting the device's convenience and flexibility.
[0005] Therefore, there is a need for a new type of small, portable camera device that can integrate multiple functions to meet users' needs in various scenarios such as home care, emotional companionship, and intelligent assistance, while providing a smoother and more efficient user experience.
Utility Model Content
[0006] In order to overcome or mitigate at least one of the shortcomings of the prior art, one object of this application is to provide a personal camera device that integrates AI voice interaction and cellular calling.
[0007] To achieve the above-mentioned objectives, the present application may adopt the following technical solutions.
[0008] This application provides a portable camera device integrating AI voice interaction and cellular calling, comprising: a housing; a cellular platform, which communicates with a mobile base station and the Internet; a computing platform, which is communicatively connected to the cellular platform and is used for voice recognition, natural language processing, and control command generation; an audio acquisition and processing module, including a microphone and a speaker, wherein the microphone is connected to the cellular platform and the computing platform respectively through an audio input interface, and is used to acquire audio data and synchronously transmit it to the cellular platform or the computing platform; the speaker is connected to the cellular platform through an audio output interface and is connected to the audio input interface of the computing platform, the cellular platform drives the speaker to play audio, and the computing platform collects back the sound signal from the speaker through the audio input interface; and a video acquisition and processing module, comprising a camera component and an ISP platform, wherein the camera component is electrically connected to the ISP platform for transmitting raw video data; the ISP platform is communicatively connected to the cellular platform, and the video data processed by the ISP platform is shared with the cellular platform to support video encoding and transmission; the cellular platform and the computing platform work together to timestamp and synchronize the audio data and the video data, thereby enabling video calls, telephone calls, and AI voice interaction functions.
[0009] In at least one embodiment, the camera assembly is a wide-angle camera, wherein the horizontal viewing angle is not less than 120 degrees and the vertical viewing angle is not less than 80 degrees.
[0010] In at least one embodiment, the device includes a UART interface and an SPI interface; the computing platform is communicatively connected to the cellular platform via the UART interface, and the cellular platform is communicatively connected to the ISP platform via the SPI interface.
[0011] In at least one embodiment, the device includes a plurality of heat sinks, which are spaced apart between the cellular platform and the computing platform, and the spacing between adjacent heat sinks is 2-5 mm.
[0012] In at least one embodiment, the device includes a cable tray, a power line, and a signal line. The cable tray is S-shaped and has a width of 2-3 mm. The power line and the signal line are respectively arranged on both sides of the cable tray, and an insulating partition is provided between the power line and the signal line.
[0013] In at least one embodiment, the device includes a switch button, a function button, and an indicator light; the switch button and the function button are provided with anti-accidental touch grooves, and the pressing stroke of the switch button and the function button is 0.5mm.
[0014] In at least one embodiment, the front side of the housing is provided with an indicator light mounting hole, a light guide post is embedded in the indicator light mounting hole, and multiple LED beads are installed in the light guide post.
[0015] In at least one embodiment, the rear side of the housing is provided with a trapezoidal groove that extends through the length of the housing and has a depth of 2-4 mm.
[0016] In at least one embodiment, the device includes a collar clip, comprising a first arm and a second arm connected to each other; the first arm is detachably embedded in the trapezoidal groove, and its cross-sectional shape is adapted to the trapezoidal groove; the second arm extends outward from the opening of the trapezoidal groove to form a free end for clamping clothing.
[0017] In at least one embodiment, the lower side of the housing is provided with a charging interface and a sealing groove, the sealing groove being disposed radially outside the charging interface, and a sealing ring being embedded in the sealing groove.
[0018] By adopting the above technical solution, this application provides a portable camera device that integrates AI voice interaction and cellular calling, thereby realizing video calls, telephone calls, and AI voice interaction functions. Through the collaborative work of the cellular platform and the computing platform, the device achieves timestamp synchronization of audio and video data, ensuring efficient and stable voice recognition, voice command feedback, and video calls. Simultaneously, the video acquisition and processing module can provide high-quality video data, meeting users' needs for remote monitoring and real-time interaction. This device is particularly suitable for scenarios such as childcare and emotional support, enhancing interactivity and security among family members and meeting the diverse needs of modern families for smart devices.
Attached Figure Description
[0019] Figure 1 is a top view of a portable camera device that integrates AI voice interaction and cellular calling according to an embodiment of this application;
[0020] Figure 2 is a schematic diagram of the right-side structure of a portable camera device integrating AI voice interaction and cellular calling according to an embodiment of this application;
[0021] Figure 3 is a bottom-view structural diagram of a portable camera device integrating AI voice interaction and cellular calling according to an embodiment of this application;
[0022] Figure 4 is a structural diagram of Figure 3 from another perspective;
[0023] Figure 5 is a schematic diagram showing the connection relationships between modules such as the computing platform, cellular platform, and ISP platform of the device in Figure 1.
[0024] Explanation of reference numerals in the attached figures
[0025] 10. Housing;
[0026] 11. Trapezoidal groove;
[0027] 20. Camera component;
[0028] 30. Microphone;
[0029] 31. Speaker;
[0030] 40. Switch button;
[0031] 41. Function button;
[0032] 42. Charging interface;
[0033] 50. Indicator light.
Detailed Implementation
[0034] Exemplary embodiments of this application are described below with reference to the accompanying drawings. It should be understood that these specific descriptions are for teaching those skilled in the art how to implement this application only, and are not intended to exhaust all possible methods of this application, nor to limit the scope of this application.
[0035] Embodiments of this application provide a personal camera device (hereinafter, sometimes simply referred to as the "device") that integrates AI voice interaction and cellular calling.
[0036] The present application will be further described in detail below with reference to the accompanying drawings and specific embodiments.
[0037] As shown in Figure 1, embodiments of this application provide a portable camera device that integrates AI voice interaction and cellular calling, which may include a housing 10, a cellular platform, a computing platform, an audio acquisition and processing module, a video acquisition and processing module, etc.
[0038] Referring to Figure 1, the housing 10 serves as the external load-bearing structure of the device, housing and protecting the various functional modules. The housing 10 can be made of high-strength plastic (such as ABS+PP) and may have chamfered edges for easy gripping. The surface of the housing 10 has reserved module interfaces and control areas for easy operation and maintenance.
[0039] The dimensions of the housing 10 can be designed according to specific requirements. Preferably, referring to Figure 1 and Figure 2, the length of the housing 10 (the vertical direction in Figure 1) can be set to 60 mm, the width (the left-right direction in Figure 1) can be set to 50 mm, and the thickness (the left-right direction in Figure 2) can be set to 15 mm.
[0040] In this embodiment, the cellular platform can serve as a communication module to enable communication connections with mobile base stations and the Internet; the computing platform is used for speech recognition, natural language processing, and control command generation.
[0041] Furthermore, the device may include multiple heat sinks, which can be spaced apart between the cellular platform and the computing platform to dissipate heat. The spacing between adjacent heat sinks is preferably 2-5mm to improve heat dissipation efficiency.
[0042] The audio acquisition and processing module may include a microphone 30 and a speaker 31. The microphone 30 is used for audio acquisition, and the speaker 31 is used for audio output, supporting voice feedback and telephone calls. The video acquisition and processing module may include a camera component 20 and an ISP (Image Signal Processor) platform. The camera component 20 is responsible for image acquisition, and the ISP platform is responsible for image data processing and encoding.
[0043] Preferably, the camera assembly 20 can be a wide-angle camera with a horizontal viewing angle of not less than 120 degrees and a vertical viewing angle of not less than 80 degrees, so as to expand the field of view for video calls.
[0044] The device can be configured with a UART (Universal Asynchronous Receiver-Transmitter) interface and an SPI (Serial Peripheral Interface) interface for data communication between modules. Specifically, the UART interface is used for communication between the computing platform and the cellular platform, while the SPI interface is used for video and audio data transmission between the cellular platform and the ISP platform, ensuring the synchronous transmission of audio, video, and control signals.
[0045] In this embodiment, as shown in Figure 3, the device may include a battery and a charging interface 42 for providing power to the device and supporting long-term use and charging functions. The charging interface 42 is preferably a Type-C interface, and the radially outer side of the charging interface 42 (i.e., the side away from the center of the charging interface 42) may be provided with an annular sealing groove, and a sealing ring may be embedded in the sealing groove to improve the protection performance.
[0046] Referring to Figure 1 and Figure 2, the device may include a switch button 40, a function button 41, and an indicator light 50. Users can use these buttons to perform operations such as powering on/off, answering and hanging up calls, and starting or pausing the device. The function button 41 and the switch button 40 are preferably made of silicone, with a pressing travel of 0.5 mm, and are provided with anti-accidental touch grooves.
[0047] Furthermore, the indicator light 50 can be a tri-color LED, which can indicate the device's working status (such as recording, taking pictures, charging, network connection, etc.) through different colors and flashing patterns. For example, a 300-millisecond flashing cycle indicates video recording, while a continuous 1-second light indicates that a picture has been taken. The front side of the housing 10 can have an indicator light mounting hole, in which a light guide post (with an outer diameter of 2mm) can be embedded. The light guide post can be annular, and multiple LED beads can be installed inside, which are then filled and sealed with epoxy resin.
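As a purely illustrative sketch of the indicator timing described above, the following Python fragment maps a device state to an on/off decision over time. Only the 300-millisecond flashing cycle and the 1-second continuous light come from the description; the state names, the 50% duty cycle, and the `led_is_on` helper are assumptions introduced for illustration.

```python
from enum import Enum


class DeviceState(Enum):
    RECORDING = "recording"      # video recording in progress
    PHOTO_TAKEN = "photo_taken"  # a picture has just been taken


FLASH_CYCLE_MS = 300  # flashing cycle while recording (from the description)
PHOTO_HOLD_MS = 1000  # continuous light after a photo (from the description)


def led_is_on(state: DeviceState, elapsed_ms: int) -> bool:
    """Return True if the LED should be lit elapsed_ms after entering state.

    Assumption: the flash is lit during the first half of each cycle;
    the application does not state a duty cycle.
    """
    if state is DeviceState.RECORDING:
        return (elapsed_ms % FLASH_CYCLE_MS) < FLASH_CYCLE_MS // 2
    if state is DeviceState.PHOTO_TAKEN:
        return elapsed_ms < PHOTO_HOLD_MS
    return False
```

In such a sketch, the platform that drives the indicator light would call `led_is_on` on a periodic timer and switch the LED accordingly.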
[0048] Furthermore, the device may include cable trays, power cables, and signal cables. In this embodiment, the cable trays may be S-shaped, with a width of 2-3 mm. The power cables and signal cables may be arranged on opposite sides of the cable tray and separated by an insulating partition to avoid interference and exposure, thereby improving the safety and aesthetics of the device.
[0049] As shown in Figure 3 and Figure 4, the rear side of the housing 10 may be provided with a trapezoidal groove 11 for mounting a collar clip, allowing the user to easily fix the device to clothing or other locations for carrying or fixed use. This trapezoidal groove 11 may extend through the length of the housing 10 (the vertical direction in Figure 3). The arrow in Figure 3 indicates the area where the collar clip is installed.
[0050] Specifically, the collar clip may include a first arm and a second arm connected to each other. The first arm is detachably embedded in the trapezoidal groove 11, and its cross-sectional shape is adapted to the trapezoidal groove 11; the second arm extends outward from the opening of the trapezoidal groove 11 to form a free end for clamping clothing.
[0051] Preferably, in this embodiment, the width of the trapezoidal groove 11 (the left-right direction in Figure 3) can be set to 6 mm, and the depth (the vertical direction in Figure 4) can be set to 2-4 mm.
[0052] Furthermore, the device in this embodiment can also be fixed by magnetic attraction. For example, the device may include a magnetic attraction component, and the wall of the trapezoidal groove 11 may also be provided with a plurality of symmetrically distributed magnetic positioning holes, in which the magnetic attraction component can be embedded.
[0053] Specifically, the magnetic assembly may include multiple magnets and an anti-slip silicone layer, which can cover the outside of the magnets to ensure the stability and anti-slip effect of the device.
[0054] As shown in Figure 5, this device integrates a cellular platform and a computing platform to achieve multi-functional collaborative operation. The cellular platform can communicate with both mobile base stations and the Internet, supporting audio and video interaction between the device and remote platforms. Specifically, the Internet can support the operation of an aPaaS (Application Platform as a Service) platform and a video server, providing related services.
[0055] The aPaaS platform is responsible for running large AI models to compute and process user voice commands, voice recognition, and other tasks, providing intelligent services. The video server supports video transmission, video calls, and cloud video surveillance, meeting the audio and video interaction needs between devices and remote users or other devices.
[0056] Further, referring to Figure 5, the microphone 30 can be connected to the cellular platform and the computing platform respectively through the audio input interface, so as to synchronously transmit the collected audio data to the cellular platform or the computing platform, ensuring that the audio signal is shared between the two platforms.
[0057] Referring to Figure 5, the speaker 31 can be connected to the cellular platform via an audio output interface and simultaneously connected to the computing platform via an audio input interface. The cellular platform can drive the speaker 31 to play audio, and the computing platform can sample the sound signal from the speaker 31 via the audio input interface for subsequent algorithm processing.
[0058] Referring to Figure 5, the switch button 40 is connected to the device control system via the cellular platform's input interface, allowing users to power the device on or off by pressing and holding it for 5 seconds. The function button 41 is also connected via the cellular platform's input interface, enabling users to answer calls, hang up calls, or force an interruption via short presses, providing a convenient interactive experience.
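The button behavior described above could be sketched as follows. This is a minimal illustration only: the 5-second hold comes from the description, while the function names, the state labels (`"ringing"`, `"in_call"`, `"ai_responding"`, `"listening"`, `"idle"`), and the transition table are hypothetical and are not defined in this application.

```python
LONG_PRESS_S = 5.0  # hold time for power on/off (from the description)


def classify_press(held_seconds: float) -> str:
    """Classify a button press by how long it was held."""
    return "long" if held_seconds >= LONG_PRESS_S else "short"


def handle_switch_button(held_seconds: float, powered_on: bool) -> bool:
    """Return the new power state; only a long press toggles power."""
    if classify_press(held_seconds) == "long":
        return not powered_on
    return powered_on


def handle_function_button(state: str) -> str:
    """Return the state after a short press of the function button.

    The state names are assumptions used only for this sketch.
    """
    transitions = {
        "ringing": "in_call",          # answer an incoming call
        "in_call": "idle",             # hang up
        "ai_responding": "listening",  # force interruption of AI playback
    }
    return transitions.get(state, state)
```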
[0059] Referring to Figure 5, the indicator light 50 can be connected to the cellular platform. The cellular platform controls the color and flashing mode of the indicator light 50 according to the different working statuses of the device, providing device status feedback.
[0060] Referring to Figure 5, the camera component 20 can be electrically connected to the ISP platform to transmit raw video data. Simultaneously, the ISP platform can communicate with the cellular platform; the video data processed by the ISP platform can be shared with the cellular platform for video encoding and transmission, ensuring synchronous transmission of audio and video data.
[0061] In this way, the cellular platform and the computing platform work together to ensure the synchronization of audio and video data timestamps, thereby enabling functions such as video calls, telephone calls, and AI voice interaction. Through this collaborative work, the device can provide efficient audio and video transmission and an intelligent voice interaction experience.
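This application does not specify how the timestamps are aligned. One possible approach, shown here only as a sketch under stated assumptions, is to pair each video frame with the audio chunk whose timestamp is nearest, within a tolerance. The function names, the millisecond units, and the 20 ms default tolerance are all hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def nearest_audio(video_ts: int, audio_ts: List[int],
                  tolerance_ms: int = 20) -> Optional[int]:
    """Return the index of the audio chunk closest in time to video_ts.

    audio_ts must be sorted ascending. Returns None when no chunk lies
    within tolerance_ms (an assumed threshold, not from the application).
    """
    if not audio_ts:
        return None
    pos = bisect_left(audio_ts, video_ts)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(audio_ts)]
    best = min(candidates, key=lambda i: abs(audio_ts[i] - video_ts))
    if abs(audio_ts[best] - video_ts) > tolerance_ms:
        return None
    return best


def pair_streams(video_ts: List[int], audio_ts: List[int],
                 tolerance_ms: int = 20) -> List[Tuple[int, Optional[int]]]:
    """Pair every video timestamp with the index of its nearest audio chunk."""
    return [(v, nearest_audio(v, audio_ts, tolerance_ms)) for v in video_ts]
```

For example, with audio chunks every 20 ms and video frames at 0, 33, and 66 ms, each frame is paired with the chunk at 0, 40, and 60 ms respectively.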
[0062] For example, in the AI chat function, the computing platform first detects the wake word by collecting data from microphone 30. Upon detecting the wake word, it notifies the cellular platform via the UART interface to begin operation. After receiving the wake-up command, the cellular platform begins collecting microphone 30 data and stores the most recent 200 milliseconds of audio locally. The computing platform continues collecting microphone 30 data to detect whether a human voice is present. When a human voice is detected, it notifies the cellular platform via the UART interface to enter speaking mode. Upon receiving this command, the cellular platform uploads the locally stored 200 milliseconds of audio to the aPaaS platform and continues collecting subsequent microphone 30 data to transmit to the aPaaS platform.
[0063] If the computing platform determines there is no voice, it will notify the cellular platform to enter a waiting-for-response state. Upon receiving this instruction, the cellular platform continues to collect data from microphone 30 and waits for an audio response from the AI model of the aPaaS platform. When the cellular platform receives an audio response from the aPaaS platform, it broadcasts it through speaker 31. After the broadcast, it continues to collect data through microphone 30 and retains the audio from the most recent 200 milliseconds. If the cellular platform receives a function call response from the aPaaS platform, it will trigger local actions, such as making a phone call. If no voice is detected within 2 minutes, the computing platform can notify the cellular platform to stop collecting data from microphone 30 and re-enter wake-up mode.
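The cellular-platform side of the AI chat flow described above can be sketched as a small state machine with a 200-millisecond pre-roll buffer. This is an illustration under stated assumptions rather than the implementation: the 20 ms frame size, the message names (`"WAKE"`, `"VOICE"`, `"NO_VOICE"`, `"TIMEOUT"`), the state names, and the `uploaded` list standing in for the uplink to the aPaaS platform are all hypothetical.

```python
from collections import deque
from typing import Deque, List

FRAME_MS = 20     # assumed audio frame duration
PREROLL_MS = 200  # most recent audio retained locally (from the description)
PREROLL_FRAMES = PREROLL_MS // FRAME_MS


class CellularChatSketch:
    """Hypothetical model of the cellular platform during AI chat."""

    def __init__(self) -> None:
        self.state = "wake_wait"
        # A bounded deque keeps only the newest PREROLL_FRAMES frames.
        self.preroll: Deque[bytes] = deque(maxlen=PREROLL_FRAMES)
        self.uploaded: List[bytes] = []  # stand-in for the aPaaS uplink

    def on_uart(self, message: str) -> None:
        """Handle a notification from the computing platform."""
        if message == "WAKE" and self.state == "wake_wait":
            self.state = "buffering"
        elif message == "VOICE" and self.state in ("buffering", "waiting_response"):
            # Upload the retained audio first so the start of speech is not clipped.
            self.uploaded.extend(self.preroll)
            self.preroll.clear()
            self.state = "speaking"
        elif message == "NO_VOICE" and self.state == "speaking":
            self.state = "waiting_response"
        elif message == "TIMEOUT":
            # No voice for 2 minutes: stop collecting and wait for the wake word.
            self.preroll.clear()
            self.state = "wake_wait"

    def on_mic_frame(self, frame: bytes) -> None:
        """Handle one microphone frame according to the current state."""
        if self.state == "speaking":
            self.uploaded.append(frame)
        elif self.state in ("buffering", "waiting_response"):
            self.preroll.append(frame)
        # In wake_wait the cellular platform does not collect microphone data.
```

The bounded buffer means the platform never holds more than the retained window, and flushing it on the speaking notification covers the delay between the start of speech and its detection.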
[0064] In "Thousand-Mile Vision" mode, after receiving a two-way audio/video communication request from the video server, the cellular platform can notify the ISP platform via the SPI interface to enable the video function, and start microphone 30 data acquisition. Simultaneously, the computing platform can stop microphone 30 data acquisition. Upon receiving the enable command, the ISP platform starts camera component 20 for data acquisition and video encoding. The encoded data is then transmitted to the cellular platform via the SPI interface. After receiving the video data from the SPI interface, the cellular platform packages it together with the microphone 30 data and transmits it to the video server. Upon receiving audio data from the video server, the cellular platform plays it through speaker 31. The two-way communication can be stopped either by the user via button control or actively by the video server. Subsequently, the computing platform can resume microphone 30 data acquisition and wait for the wake word.
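The application does not define how the video data and microphone data are packaged for the video server. The sketch below shows one hypothetical framing, in which each audio or video payload is prefixed with a stream type, a millisecond timestamp, and a length. The header layout, the constants, and the function names are assumptions for illustration only.

```python
import struct
from typing import Tuple

# Assumed header: 1-byte stream type, 8-byte timestamp (ms), 4-byte payload
# length, all in network byte order.
HEADER = struct.Struct("!BQI")
AUDIO, VIDEO = 0, 1


def pack_frame(kind: int, timestamp_ms: int, payload: bytes) -> bytes:
    """Prefix a payload with its stream type, timestamp, and length."""
    return HEADER.pack(kind, timestamp_ms, len(payload)) + payload


def unpack_frame(packet: bytes) -> Tuple[int, int, bytes]:
    """Recover (kind, timestamp_ms, payload) from a packed frame."""
    kind, timestamp_ms, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    return kind, timestamp_ms, payload
```

Carrying the timestamp in every frame would let the receiving side realign audio and video even if the two streams arrive with different delays.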
[0065] In the call dialing and answering functions, the cellular platform first receives a call request from the mobile base station and plays a caller ID tone through speaker 31. The user can answer the call using function button 41 and, during the call, hang up using function button 41. When the AI chat function triggers a call, the cellular platform receives a call-type function call reply from the aPaaS platform and, according to the reply, dials the phone number or the number of a pre-stored contact name. The computing platform can notify microphone 30 to collect data to ensure audio capture during the call. The user can end the call using function button 41, either before or after the call is connected, or the other party can actively terminate the call. After the call ends, the computing platform resumes microphone 30 data collection and waits for the wake word.
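How a function call reply might be resolved into a number to dial can be sketched as follows. The JSON shape of the reply, the `"make_call"` function name, and the `contacts` mapping are hypothetical; the application states only that the reply leads to dialing a phone number or a pre-stored name.

```python
import json
from typing import Dict, Optional


def resolve_dial_target(reply_json: str, contacts: Dict[str, str]) -> Optional[str]:
    """Return the number to dial for a function call reply, or None.

    Assumed reply shape (not from the application):
      {"function": "make_call", "arguments": {"number": "..."}}
      {"function": "make_call", "arguments": {"name": "..."}}
    """
    try:
        reply = json.loads(reply_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(reply, dict) or reply.get("function") != "make_call":
        return None
    args = reply.get("arguments")
    if not isinstance(args, dict):
        return None
    number = args.get("number")
    if isinstance(number, str) and number:
        return number  # an explicit number takes priority
    name = args.get("name")
    if isinstance(name, str):
        return contacts.get(name)  # look up a pre-stored contact
    return None
```

Returning None for anything unrecognized keeps a malformed or unexpected reply from triggering a call.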
[0066] This application provides a device that integrates AI voice and telephone calling. The cellular platform communicates with a computing platform and an ISP platform, and simultaneously connects to a mobile base station and the Internet. A microphone 30 is connected to both the cellular platform and the computing platform, supporting simultaneous sound acquisition. A speaker 31 is connected to the audio output port of the cellular platform and to the audio input port of the computing platform. The cellular platform plays sound through the speaker 31, and the computing platform collects the sound signal of the speaker 31 back through its audio input port. This device achieves efficient voice recognition, video calling, and remote monitoring, meeting users' needs in scenarios such as childcare and emotional support, and enhancing family interactivity and security.
[0067] It should be understood that the above-described embodiments or examples are merely exemplary and are not intended to limit this application. Those skilled in the art can make various modifications and changes to the above-described embodiments or examples under the teachings of this application without departing from the scope of this application.
Claims
1. A portable camera device integrating AI voice interaction and cellular calling, characterized in that it includes: a housing; a cellular platform that communicates with a mobile base station and the Internet; a computing platform, which is communicatively connected to the cellular platform and is used for speech recognition, natural language processing, and control command generation; an audio acquisition and processing module including a microphone and a speaker, wherein the microphone is connected to the cellular platform and the computing platform respectively through an audio input interface, and is used to acquire audio data and transmit it synchronously to the cellular platform or the computing platform; the speaker is connected to the cellular platform through an audio output interface and is connected to the audio input interface of the computing platform, the cellular platform drives the speaker to play audio, and the computing platform collects back the sound signal from the speaker through the audio input interface; and a video acquisition and processing module including a camera component and an ISP platform, wherein the camera component is electrically connected to the ISP platform for transmitting raw video data; the ISP platform is communicatively connected to the cellular platform, and the video data processed by the ISP platform is shared with the cellular platform to support video encoding and transmission; the cellular platform and the computing platform work together to timestamp and synchronize the audio data and the video data, thereby enabling video calls, telephone calls, and AI voice interaction functions.
2. The portable camera device integrating AI voice interaction and cellular calling according to claim 1, characterized in that the camera component is a wide-angle camera, with a horizontal viewing angle of not less than 120 degrees and a vertical viewing angle of not less than 80 degrees.
3. The portable camera device integrating AI voice interaction and cellular calling according to claim 1, characterized in that the device includes a UART interface and an SPI interface; the computing platform communicates with the cellular platform via the UART interface, and the cellular platform communicates with the ISP platform via the SPI interface.
4. The portable camera device integrating AI voice interaction and cellular calling according to claim 1, characterized in that the device includes multiple heat sinks, which are spaced apart between the cellular platform and the computing platform, with a spacing of 2-5 mm between adjacent heat sinks.
5. The portable camera device integrating AI voice interaction and cellular calling according to claim 1, characterized in that the device includes a cable tray, a power cable, and a signal cable; the cable tray is S-shaped and has a width of 2-3 mm; the power cable and the signal cable are respectively arranged on both sides of the cable tray, and an insulating partition is provided between the power cable and the signal cable.
6. The portable camera device integrating AI voice interaction and cellular calling according to any one of claims 1 to 5, characterized in that the device includes a switch button, a function button, and an indicator light; the switch button and the function button are provided with anti-accidental touch grooves, and the pressing stroke of the switch button and the function button is 0.5 mm.
7. The portable camera device integrating AI voice interaction and cellular calling according to claim 6, characterized in that the front side of the housing is provided with an indicator light mounting hole, a light guide post is embedded in the indicator light mounting hole, and multiple LED beads are installed in the light guide post.
8. The portable camera device integrating AI voice interaction and cellular calling according to claim 1, characterized in that the rear side of the housing is provided with a trapezoidal groove that extends through the length of the housing and has a depth of 2-4 mm.
9. The portable camera device integrating AI voice interaction and cellular calling according to claim 8, characterized in that the device includes a collar clip, comprising a first arm and a second arm connected to each other; the first arm is detachably embedded in the trapezoidal groove, and its cross-sectional shape is adapted to the trapezoidal groove; the second arm extends outward from the opening of the trapezoidal groove to form a free end for clamping clothing.
10. The portable camera device integrating AI voice interaction and cellular calling according to claim 1, characterized in that the lower side of the housing is provided with a charging interface and a sealing groove; the sealing groove is located on the radially outer side of the charging interface, and a sealing ring is embedded in the sealing groove.