Live Streaming Mode Switching Method and Related Devices Based on Audience Emotions

Detects whether a virtual-object live stream is in AI mode and, based on analysis of bullet comments, automatically switches to the "person behind the character" mode. This addresses the insufficient interactivity of virtual-object live streams when audience emotions change, improving the interactivity and efficiency of the live stream.

CN116684688B · Active · Publication Date: 2026-06-30 · MOFA (SHANGHAI) INFORMATION TECH CO LTD +1

Cites: 2 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: MOFA (SHANGHAI) INFORMATION TECH CO LTD
Filing Date: 2023-05-29
Publication Date: 2026-06-30

Application Information

Patent Timeline

29 May 2023: Application
30 Jun 2026: Publication (CN116684688B)

IPC: H04N21/442; H04N21/466; H04N21/4788; H04N21/488

AI Tagging

Technology Topics

Human–computer interaction; Computer engineering

AI Technical Summary

Technical Problem

Existing live streaming mode switching methods do not meet the needs of customers who live-stream with virtual objects. In particular, they cannot adjust promptly when audience emotions change, which results in insufficient interactivity and weak user stickiness.

Method used

The method detects whether the live stream is in AI mode, analyzes audience emotions from bullet comments, and automatically switches to the "person behind the character" mode, in which the person behind the character takes over the live stream to provide personalized interaction.

Benefits of technology

Improves the interactivity and user engagement of live streams, reduces the workload of operations staff, and improves the efficiency and quality of live streams.

✦ Generated by Eureka AI based on patent content.

Abstract figure: CN116684688B_ABST


Abstract

This application provides a method and related apparatus for switching live streaming modes based on audience emotions. The live streaming modes include an AI mode and a "person behind the character" mode. The method includes: detecting whether the current live streaming mode is the AI mode; if so, acquiring live bullet comment information based on a bullet comment acquisition strategy; and, based on the bullet comment information, switching the current live streaming mode from the AI mode to the "person behind the character" mode so that the person behind the character conducts the live stream. This meets the needs of customers who live-stream with virtual objects. By understanding audience emotions and needs in real time, the system can adjust promptly and switch to the "person behind the character" mode. The person behind the character can then make the live stream more interactive and the audience more engaged and involved.


Description

Technical Field

[0001] This application relates to the technical fields of virtual humans and artificial intelligence, and in particular to a method and related apparatus for switching live streaming modes based on audience emotions.

Background Art

[0002] Virtual objects include virtual humans, virtual animals, and virtual cartoon characters. Virtual humans, in particular, are anthropomorphic figures constructed using CG technology and operating in code form, possessing various interactive capabilities such as language communication, facial expressions, and action demonstrations. Virtual human technology has rapidly developed in the field of artificial intelligence and has been applied in many technological areas, including film, media, gaming, finance, cultural tourism, education, and healthcare.

[0003] Existing live streaming mode switching methods are mostly designed for traditional fields and cannot adequately meet the needs of customers who live-stream with virtual objects. Therefore, this application provides a live streaming mode switching method, a live streaming mode switching device, and related devices to improve on the existing technology.

Summary of the Invention

[0004] The purpose of this application is to provide a method and related device for switching live streaming modes based on audience emotions that better meets the needs of customers who live-stream with virtual objects.

[0005] The objective of this application is achieved through the following technical solution:

[0006] In a first aspect, this application provides a method for switching live streaming modes based on audience emotions, wherein the live streaming modes include an AI mode and a "person behind the character" mode, and the live streaming mode switching method includes:

[0007] Detect whether the current live streaming mode in the live streaming room is AI mode;

[0008] If so, obtain live bullet comment information based on a bullet comment acquisition strategy;

[0009] Based on the live bullet comment information, switch the current live streaming mode from the AI mode to the "person behind the character" mode, so that the person behind the character conducts the live stream.

[0010] The beneficial effects of this technical solution are as follows. During the live stream, the method detects whether the current live streaming mode is the AI mode. If it is, the method uses a bullet comment acquisition strategy to obtain live bullet comment information and, from that information, judges the audience's current emotions and reactions. If the audience shows little interest in the AI mode or has other needs, the method automatically switches to the "person behind the character" mode. The person behind the character can then take over the live stream at any time and provide more interaction and a personalized experience. First, understanding the audience's emotions and needs in real time and adjusting promptly lets the person behind the character make the live stream more interactive and the audience more engaged and involved. Second, because audience needs and interests vary, the method better serves different audiences. Third, interacting with the person behind the character gives viewers a deeper understanding of the live stream content. This increases their interest in and trust of that content and improves user stickiness.

[0011] In some possible implementations, the live bullet comment information includes at least one bullet comment and the content of each bullet comment;

[0012] The step of switching the current live streaming mode to the "person behind the character" mode based on the live bullet comment information includes:

[0013] Determine an audience emotion score based on the content of each bullet comment;

[0014] When the audience emotion score is greater than a preset emotion score, send a prompt message reminding the person behind the character to arrive;

[0015] Upon detecting that the person behind the character has arrived, switch the current live streaming mode from the AI mode to the "person behind the character" mode.

[0016] The beneficial effects of this technical solution are as follows. Emotion analysis of each bullet comment yields an audience emotion score, which is compared with a preset emotion score. When the audience emotion score is higher than the preset score, a prompt message reminds the person behind the character to arrive. Once the arrival is detected, the current live streaming mode is automatically switched to the "person behind the character" mode. First, recognizing audience emotions allows the audience's needs to be better met, making the live stream more interactive and attractive. Second, notifying the person behind the character before the switch avoids the risk of the stream entering the "person behind the character" mode while nobody has arrived to take over. At the same time, the "person behind the character" mode better meets the audience's needs, improving the quality and credibility of the live stream. Third, automatically analyzing audience emotions and detecting the arrival of the person behind the character reduces the workload of operations staff and improves the efficiency of the live stream.
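The prompt-then-switch flow in [0013] to [0016] can be sketched as a small state machine. This is a minimal Python sketch under stated assumptions, not the application's implementation. The names `ModeSwitcher`, `on_score`, `on_arrival_detected`, and the `notify` callback are hypothetical. How arrival is actually detected (for example, from motion-capture input) is not specified in the text and is left to the caller.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class LiveMode(Enum):
    AI = "ai"
    PERSON_BEHIND_CHARACTER = "person_behind_character"


@dataclass
class ModeSwitcher:
    """Hypothetical controller for the score -> prompt -> arrival -> switch flow."""

    preset_score: float                 # preset emotion score (threshold)
    notify: Callable[[str], None]       # hypothetical channel for the prompt message
    mode: LiveMode = LiveMode.AI
    prompt_sent: bool = False

    def on_score(self, audience_score: float) -> None:
        """Send one prompt when the audience emotion score exceeds the threshold."""
        if (
            self.mode is LiveMode.AI
            and not self.prompt_sent
            and audience_score > self.preset_score
        ):
            self.notify("Audience emotion score above threshold: please take over.")
            self.prompt_sent = True

    def on_arrival_detected(self) -> None:
        """Switch modes only after a prompt went out and arrival is confirmed."""
        if self.mode is LiveMode.AI and self.prompt_sent:
            self.mode = LiveMode.PERSON_BEHIND_CHARACTER


if __name__ == "__main__":
    messages = []
    switcher = ModeSwitcher(preset_score=0.6, notify=messages.append)
    switcher.on_score(0.4)            # below threshold: nothing happens
    switcher.on_score(0.8)            # above threshold: prompt is sent
    switcher.on_arrival_detected()    # arrival confirmed: mode switches
    print(switcher.mode, messages)
```

Requiring a prompt before accepting an arrival signal is one way to meet the stated goal of never switching modes with nobody present. A real system might also reset `prompt_sent` after a timeout.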

[0017] In some possible implementations, determining the audience emotion score based on the content of each bullet comment includes:

[0018] Based on one or more preset keywords and the content of each bullet comment, determine an individual emotion score for each bullet comment;

[0019] Determine the audience emotion score based on all the individual emotion scores.

[0020] The beneficial effects of this technical solution are as follows. The individual emotion score of each bullet comment is determined by checking whether it contains one or more preset keywords, and an audience emotion score is then calculated from all the individual emotion scores. First, presetting relevant keywords makes individual emotion scores easier to identify and calculate accurately. Second, understanding the audience's emotional state allows timely, appropriate actions such as adjusting the live stream content and increasing interaction, which improves audience participation and experience. Third, scoring each bullet comment automatically reduces the workload of operations staff and improves live stream efficiency.
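The following is a minimal Python sketch of [0018] and [0019]. It assumes a weighted keyword table and uses the mean as the aggregate. The keywords, weights, and function names are hypothetical, since the text only says "one or more preset keywords" and does not specify how individual scores are combined. Keywords are matched as substrings, so the same code would also work for Chinese comments, which have no spaces between words.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical keyword table: the text only says "one or more preset keywords".
PRESET_KEYWORDS: Dict[str, float] = {
    "love": 1.0,
    "great": 0.8,
    "boring": -0.8,
    "leave": -1.0,
}


def individual_score(comment: str, keywords: Dict[str, float] = PRESET_KEYWORDS) -> float:
    """Sum the weights of all preset keywords found in one bullet comment."""
    text = comment.lower()
    return sum((weight for word, weight in keywords.items() if word in text), 0.0)


def audience_score(comments: List[str], keywords: Dict[str, float] = PRESET_KEYWORDS) -> float:
    """Combine individual scores into one audience score (the mean is an assumed choice)."""
    if not comments:
        return 0.0
    return mean(individual_score(c, keywords) for c in comments)
```

A weighted mean, or a count of comments above a cutoff, would fit [0019] equally well. The text leaves the aggregation open.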

[0021] In some possible implementations, determining the audience emotion score based on the content of each bullet comment includes:

[0022] Input the content of each bullet comment into an emotion recognition model to obtain an emotion label for each bullet comment;

[0023] Determine emotion proportions based on the emotion label of each bullet comment, where the emotion proportions give the share of each emotion among all emotion labels;

[0024] Determine the audience emotion score based on one or more of the emotion proportions.

[0025] The beneficial effects of this technical solution are as follows. An emotion recognition model analyzes the content of each bullet comment to produce its emotion label. The emotion proportions (the share of each emotion among all labels) are determined from these labels, and an audience emotion score is calculated from them. First, this gives timely insight into how the audience reacts to and feels about the live stream content, so that the live streaming mode and content can be adjusted. For example, if most viewers are happy with the content in AI mode, the stream can switch from AI mode to the "person behind the character" mode to further guide viewers toward purchasing products. Second, an emotion recognition model can analyze large volumes of bullet comments quickly and accurately, saving labor and time and improving the efficiency and accuracy of data processing. Third, scoring based on emotion proportions is more objective and comprehensive. It reflects the audience's overall emotion and also reveals conflicting emotions, giving operators more complete feedback.
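Steps [0023] and [0024] can be sketched in Python as follows. The sketch assumes the emotion recognition model has already produced one string label per bullet comment; the model itself is not specified and is not shown. The label set, the `EMOTION_WEIGHTS` table, and the weighted-sum scoring rule are hypothetical. Using a single proportion, such as the share of "happy", would also fit [0024].

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical weights per emotion label; the real label set depends on the
# (unspecified) emotion recognition model.
EMOTION_WEIGHTS: Dict[str, float] = {
    "happy": 1.0,
    "neutral": 0.0,
    "sad": -0.5,
    "angry": -1.0,
}


def emotion_proportions(labels: Iterable[str]) -> Dict[str, float]:
    """Share of each emotion label among all labels (the 'emotion proportions')."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def score_from_proportions(
    proportions: Dict[str, float],
    weights: Dict[str, float] = EMOTION_WEIGHTS,
) -> float:
    """Weighted sum of proportions; labels without a weight contribute nothing."""
    return sum(
        (weights.get(label, 0.0) * share for label, share in proportions.items()), 0.0
    )
```

For example, three "happy" labels and one "angry" label give proportions of 0.75 and 0.25. With the weights above, this produces a score of 0.5.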

[0026] In some possible implementations, the steps for obtaining the preset emotion score include:

[0027] Obtain real-time transaction data for the live streaming room;

[0028] Determine the preset emotion score based on the real-time transaction data and a preset score adjustment strategy.

[0029] The beneficial effects of this technical solution are as follows. The preset emotion score is determined from the live stream's real-time transaction data combined with a predefined score adjustment strategy. For example, if the goal is to increase product sales, the preset emotion score can be linked to sales metrics such as conversion rate and order volume. First, deriving the preset emotion score from real-time transaction data allows a more accurate judgment of whether to switch live streaming modes. Second, the score adjustment strategy allows the preset emotion score to be tuned effectively and, to some extent, to anticipate future audience emotions and behavior. This better guides the live stream's content and interaction patterns.
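One possible score adjustment strategy for [0028] is sketched below in Python. The text only says the preset emotion score may be linked to metrics such as conversion rate and order volume. The conversion-rate rule, target value, step size, and bounds are all hypothetical. The assumed policy lowers the threshold (so the switch happens sooner) when conversion lags the target, and raises it when conversion exceeds the target.

```python
from dataclasses import dataclass


@dataclass
class TransactionSnapshot:
    """Hypothetical real-time transaction metrics for the live streaming room."""
    viewers: int
    orders: int


def preset_emotion_score(
    snapshot: TransactionSnapshot,
    base: float = 0.5,
    target_conversion: float = 0.02,
    step: float = 0.1,
    lower: float = 0.2,
    upper: float = 0.8,
) -> float:
    """Assumed policy: shift the threshold by `step` depending on whether the
    conversion rate is below or above the target, then clamp to [lower, upper]."""
    conversion = snapshot.orders / snapshot.viewers if snapshot.viewers else 0.0
    if conversion < target_conversion:
        score = base - step      # sales lag: hand over to the person behind the character sooner
    elif conversion > target_conversion:
        score = base + step      # AI mode converts well: require a stronger signal
    else:
        score = base
    return max(lower, min(upper, score))
```

Order volume, average order value, or a smoothed trend over several minutes could replace the single conversion ratio without changing the structure.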

[0030] In some possible implementations, the bullet comment acquisition strategy includes a number of bullet comments to acquire;

[0031] Obtaining the live bullet comment information based on the bullet comment acquisition strategy includes:

[0032] Obtain the live bullet comment information according to the number of bullet comments to acquire.

[0033] The beneficial effects of this technical solution are as follows. Live bullet comment information is acquired according to a bullet comment acquisition strategy that specifies how many bullet comments to acquire. For example, the strategy can set how many bullet comments to acquire per minute, or adjust the number and rate of acquisition according to metrics such as viewer count and activity level. First, bullet comments are acquired in real time, which improves the ability to perceive audience emotions and reactions during the live stream. Second, the number and rate of acquired bullet comments can be controlled flexibly to suit different live streaming scenarios and audience groups. Third, analyzing the live bullet comment information gives a better understanding of audience interests and needs, so that the live stream content and interaction methods can be adjusted. Finally, emotion analysis of the acquired bullet comments enables automated judgment of, and feedback on, the live stream's emotional state, making live interaction more timely and effective.
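A bounded buffer is one simple way to cap the number of bullet comments taken per analysis window, as in [0032]. This Python sketch keeps only the most recent `limit` comments. The class name, method names, and the keep-newest policy are assumptions, since the text does not say which comments to keep when more arrive than the configured number.

```python
from collections import deque
from typing import Deque, List


class BulletCommentSampler:
    """Hypothetical sampler holding at most `limit` comments per analysis window."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        # deque with maxlen silently discards the oldest item when full.
        self._buffer: Deque[str] = deque(maxlen=limit)

    def push(self, comment: str) -> None:
        """Record an incoming bullet comment."""
        self._buffer.append(comment)

    def take(self) -> List[str]:
        """Return the buffered comments for analysis and start a new window."""
        batch = list(self._buffer)
        self._buffer.clear()
        return batch
```

Calling `take()` once per minute from a timer would give a fixed number of comments per minute. Random sampling (reservoir sampling) is an alternative if recency should not dominate.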

[0034] In some possible implementations, the bullet comment acquisition strategy includes a correspondence between the number of real-time viewers and the number of bullet comments to acquire;

[0035] Obtaining the live bullet comment information according to the number of bullet comments to acquire includes:

[0036] Obtain the real-time number of viewers in the live stream;

[0037] Adjust the number of bullet comments to acquire based on the real-time number of viewers and the correspondence.

[0038] The beneficial effects of this technical solution are as follows. The bullet comment acquisition strategy includes a correspondence between the number of real-time viewers and the number of bullet comments to acquire. For example, the number of bullet comments acquired per minute can be set according to the number of real-time viewers. When there are many viewers, the number can be increased to ensure enough information is collected; when there are few, it can be reduced. First, adjusting the acquisition quantity according to this correspondence makes acquisition of live bullet comment information more comprehensive and effective. Second, analyzing the number of real-time viewers gives a more accurate picture of the audience's characteristics, which benefits precision marketing and audience management.
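The viewer-count correspondence in [0034] to [0038] can be represented as a sorted tier table. In this Python sketch the tier boundaries and per-minute quantities are hypothetical placeholders. `bisect_right` from the standard library finds the highest tier whose threshold does not exceed the current viewer count.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical correspondence: (minimum real-time viewers, comments to acquire per minute).
VIEWER_TIERS: List[Tuple[int, int]] = [
    (0, 20),
    (100, 50),
    (1_000, 200),
    (10_000, 500),
]


def comments_to_acquire(viewers: int, tiers: List[Tuple[int, int]] = VIEWER_TIERS) -> int:
    """Look up the acquisition quantity for the current viewer count.

    `tiers` must be sorted by viewer threshold, with the first threshold at 0.
    """
    thresholds = [threshold for threshold, _ in tiers]
    index = bisect_right(thresholds, viewers) - 1
    return tiers[max(index, 0)][1]   # negative counts fall back to the lowest tier
```

A step table keeps the correspondence easy for operations staff to edit. A proportional rule (for example, a fixed fraction of viewers with a cap) is a possible alternative.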

[0039] In some possible implementations, the bullet comment acquisition strategy includes a bullet comment acquisition time range and an audience acquisition range;

[0040] Obtaining the live bullet comment information based on the bullet comment acquisition strategy includes:

[0041] Obtain the live bullet comment information based on the bullet comment acquisition time range and the audience acquisition range.

[0042] The beneficial effects of this technical solution are as follows. The bullet comment acquisition time range selects bullet comments sent within the corresponding period. The audience acquisition range restricts which viewers' bullet comments are collected, so that duplicate bullet comments sent by the same viewer are filtered out. Combining the two yields live bullet comment information that meets the criteria. First, filtering by time range and audience range improves the accuracy of bullet comment acquisition. Second, limiting the audience acquisition range removes duplicate messages from the same viewer. This prevents large numbers of identical messages from interfering, makes the bullet comment information clearer, and significantly reduces the number of bullet comments to process, improving data processing efficiency.
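The following Python sketch illustrates the filtering in [0041] and [0042]. The `BulletComment` record and `filter_comments` function are hypothetical. The text does not define the audience acquisition range precisely, so two assumptions are made. First, the range is modeled as an optional set of allowed sender IDs. Second, a "duplicate" is a comment whose sender and content both match an earlier comment.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class BulletComment:
    """Hypothetical bullet comment record."""
    sender_id: str
    content: str
    sent_at: datetime


def filter_comments(
    comments: Iterable[BulletComment],
    start: datetime,
    end: datetime,
    allowed_senders: Optional[Set[str]] = None,
) -> List[BulletComment]:
    """Apply the time range [start, end) and the optional audience range, and
    drop repeats of the same content from the same sender (earliest kept)."""
    seen: Set[Tuple[str, str]] = set()
    kept: List[BulletComment] = []
    for comment in sorted(comments, key=lambda c: c.sent_at):
        if not (start <= comment.sent_at < end):
            continue  # outside the acquisition time range
        if allowed_senders is not None and comment.sender_id not in allowed_senders:
            continue  # outside the audience acquisition range
        key = (comment.sender_id, comment.content)
        if key in seen:
            continue  # duplicate message from the same viewer
        seen.add(key)
        kept.append(comment)
    return kept
```

If "duplicate" should instead mean "any further comment from the same viewer", keying on `sender_id` alone would keep at most one comment per viewer.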

[0043] In some possible implementations, the live bullet comment information includes at least one bullet comment and the sender of each bullet comment;

[0044] The live streaming mode switching method further includes:

[0045] Obtain historical transaction data for each bullet comment sender;

[0046] Establish a product recommendation model for each bullet comment sender based on the live bullet comment information and that sender's historical transaction data;

[0047] When the current live streaming mode switches from the AI mode to the "person behind the character" mode, generate recommended products and recommended scripts for each bullet comment sender based on that sender's product recommendation model;

[0048] In response to a selection operation on one of the bullet comment senders, display the recommended products and scripts for the selected sender; or,

[0049] Generate a presentation order for the recommended products based on all the recommended products.

[0050] The beneficial effects of this technical solution are as follows. Historical transaction data, such as purchase records and followed products, is acquired for each bullet comment sender. From the live bullet comment information and this data, a product recommendation model can be built for each sender. The model can capture characteristics such as the sender's interests, purchasing habits, and purchasing power, and can be used to predict which products the sender will be interested in. When the live stream switches from the AI mode to the "person behind the character" mode, personalized recommended products and scripts can be generated from each sender's model to improve purchase conversion and user satisfaction. When the person behind the character or an operations staff member selects a sender, that sender's recommended products and scripts are displayed. The person behind the character can then use the scripts to introduce the products, increasing the sender's desire to buy and improving satisfaction. A presentation order is also generated across all recommended products. For example, a product's expected level can be determined by how often it appears among the recommendations: the more often it appears, the higher its expected level. Products are then presented in order of expected level. Presenting first the products with the highest expected level across all viewers improves the live stream's effectiveness and increases viewing time and purchasing.
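The frequency-based presentation order in [0049] and [0050] can be sketched with `collections.Counter`. The input shape (a mapping from sender ID to that sender's recommended product IDs) and the function name are assumptions. The per-sender recommendation model itself is not specified in the text and is not shown. `Counter.most_common()` lists elements with equal counts in the order they were first encountered, so ties keep first-seen order.

```python
from collections import Counter
from typing import Dict, List


def presentation_order(recommendations: Dict[str, List[str]]) -> List[str]:
    """Order products by how often they were recommended across all senders.

    `recommendations` maps a bullet comment sender to that sender's recommended
    products. A product's "expected level" is its number of appearances.
    """
    counts = Counter(
        product
        for products in recommendations.values()
        for product in products
    )
    # most_common() sorts by count, highest first; ties keep first-seen order.
    return [product for product, _ in counts.most_common()]
```

For example, if "serum" is recommended to three senders, "lipstick" to two, and "mask" to one, the presentation order is serum, lipstick, mask.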

[0051] In a second aspect, embodiments of this application provide a live streaming mode switching device based on audience emotions, the live streaming mode switching device comprising:

[0052] a mode detection module, configured to detect whether the current live streaming mode is the AI mode;

[0053] a bullet comment acquisition module, configured to acquire live bullet comment information based on the bullet comment acquisition strategy when the current live streaming mode is the AI mode; and

[0054] a mode switching module, configured to switch the current live streaming mode to the "person behind the character" mode based on the live bullet comment information, so that the person behind the character conducts the live stream.
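The following minimal Python sketch shows how the three modules in [0052] to [0054] might be composed into one detection cycle. Every callable is a hypothetical placeholder for platform-specific logic: reading the room's mode, fetching bullet comments, scoring them, and performing the switch. `LiveModeSwitchingDevice` and `run_once` are illustrative names, not part of the application.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LiveModeSwitchingDevice:
    """Hypothetical composition of the three modules; each is injected as a callable."""

    detect_ai_mode: Callable[[], bool]            # mode detection module
    acquire_comments: Callable[[], List[str]]     # bullet comment acquisition module
    should_switch: Callable[[List[str]], bool]    # decision used by the mode switching module
    switch_mode: Callable[[], None]               # action performed by the mode switching module

    def run_once(self) -> bool:
        """Run one detection cycle; return True if a switch was triggered."""
        if not self.detect_ai_mode():
            return False                          # not in AI mode: nothing to do
        comments = self.acquire_comments()
        if self.should_switch(comments):
            self.switch_mode()
            return True
        return False
```

Injecting callables keeps the sketch independent of any particular live streaming platform. The scoring and prompt logic from [0013] to [0015] would sit behind `should_switch` and `switch_mode`.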

[0055] In a third aspect, this application provides an electronic device including a memory and at least one processor, the memory storing a computer program, and the at least one processor being configured to execute the computer program to perform the following steps:

[0056] Detect whether the current live streaming mode in the live streaming room is the AI mode;

[0057] If so, obtain live bullet comment information based on a bullet comment acquisition strategy;

[0058] Based on the live bullet comment information, switch the current live streaming mode from the AI mode to the "person behind the character" mode, so that the person behind the character conducts the live stream.

[0059] In a fourth aspect, this application provides a computer-readable storage medium storing a computer program that, when executed by at least one processor, implements the steps of any of the above methods or the functions of any of the above electronic devices.

Brief Description of the Drawings

[0060] The present application will be further described below with reference to the accompanying drawings and embodiments.

[0061] Figure 1 is a flowchart of a live streaming mode switching method based on audience emotions provided in an embodiment of this application.

[0062] Figure 2 is a flowchart of switching to the "person behind the character" mode provided in an embodiment of this application.

[0063] Figure 3 is a schematic diagram of a process for determining an audience emotion score provided in an embodiment of this application.

[0064] Figure 4 is a schematic diagram of another process for determining an audience emotion score provided in an embodiment of this application.

[0065] Figure 5 is a schematic diagram of a process for determining a preset emotion score provided in an embodiment of this application.

[0066] Figure 6 is a schematic diagram of a process for determining a preset emotion score provided in an embodiment of this application.

[0067] Figure 7 is a schematic structural diagram of a live streaming mode switching device provided in an embodiment of this application.

[0068] Figure 8 is a structural block diagram of an electronic device provided in an embodiment of this application.

[0069] Figure 9 is a schematic structural diagram of a program product provided in an embodiment of this application.

Detailed Description

[0070] The technical solutions in this application will be described below with reference to the accompanying drawings and specific embodiments. It should be noted that, without conflict, the various embodiments or technical features described below can be arbitrarily combined to form new embodiments.

[0071] In this application, the terms "exemplary" or "for example" are used to indicate that something is an example, illustration, or description. Any implementation or design described as "exemplary" or "for example" in this application should not be construed as being better or more advantageous than other implementations or designs. Specifically, the use of terms such as "exemplary" or "for example" is intended to present the relevant concepts in a specific manner.

[0072] The technical field and related terms of the embodiments of this application are briefly described below.

[0073] Virtual objects include virtual humans, virtual animals, and virtual cartoon characters. Virtual humans, in particular, are anthropomorphic figures constructed using CG technology and operating in code form, possessing various interactive capabilities such as language communication, facial expressions, and action demonstrations. Virtual human technology has developed rapidly in the field of artificial intelligence and has been applied in many areas, including film, media, games, finance, culture and tourism, education, and healthcare. It enables customized virtual hosts, virtual anchors, virtual idols, virtual customer service representatives, virtual lawyers, virtual financial advisors, virtual teachers, virtual doctors, virtual tour guides, and virtual assistants, and can also generate videos with a single click from text or audio. Among virtual humans, service-oriented virtual humans mainly replace real-world services and provide everyday companionship; they are virtualized versions of real-world service roles. Their industrial value lies mainly in cutting costs and improving efficiency in existing service industries.

[0074] Artificial Intelligence (AI) is the theory, methods, technology, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to achieve optimal results. In other words, AI is a comprehensive technology within computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, enabling them to possess perception, reasoning, and decision-making capabilities. AI technology is a comprehensive discipline involving a wide range of fields, encompassing both hardware and software technologies. Fundamental AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, as well as machine learning / deep learning, autonomous driving, and intelligent transportation.

[0075] Machine learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other subjects. A computer program is said to learn from experience E with respect to a class of tasks T and a performance measure P if its performance on tasks in T, as measured by P, improves with experience E. Machine learning specifically studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications span all areas of artificial intelligence.

[0076] Deep learning is a particular kind of machine learning that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts. Each concept is defined in relation to simpler concepts, and more abstract representations are computed from less abstract ones. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

[0077] An Application Programming Interface (API) provides a set of standardized interfaces and functions for communication between different applications. An API can be understood as a bridge for data exchange between two applications; through the API, two different applications can call functions or interfaces provided by each other, thereby achieving interactive operations.

[0078] A Software Development Kit (SDK) is a collection of tools for rapidly developing applications. SDKs typically include reusable code libraries, APIs, documentation, and samples, helping developers quickly build applications that meet requirements, improving development efficiency and quality.

[0079] Web crawling is a technology that automates the acquisition of information from the internet. It simulates human browser behavior to search, crawl, and extract data from the web. By writing web crawler programs, large amounts of data can be automatically obtained from the internet, and this data can be processed and analyzed.

[0080] In virtual human technology, the "person behind the character" is the real person who performs and refines a virtual human's persona through motion capture and facial capture technology. This lets the virtual human act in the real world and interact freely with real people.

[0081] (Method for switching live streaming modes based on audience emotions)

[0082] Refer to Figure 1, which is a schematic flowchart of a live streaming mode switching method based on audience emotions provided by an embodiment of this application.

[0083] An embodiment of this application provides a live streaming mode switching method based on audience emotions. The live streaming modes include the AI mode and the "person behind the character" mode. The live streaming mode switching method includes:

[0084] Step S101: Detect whether the current live streaming mode in the live streaming room is the AI mode;

[0085] Step S102: If so, obtain live bullet comment information based on the bullet comment acquisition strategy;

[0086] Step S103: Based on the live bullet comment information, switch the current live streaming mode from the AI mode to the "person behind the character" mode so that the person behind the character conducts the live stream.

[0087] In an embodiment of this application, the AI mode means the virtual object is driven by AI. Text information drives the virtual object's facial expressions and lip movements, action information drives its body movements, and the result is rendered into the live stream picture. AI driving enables fast, automated interaction.

[0088] In an embodiment of this application, the "person behind the character" mode means the virtual object is driven by the person behind the character. The person's voice is collected and converted into text information, which drives the virtual object's facial expressions and lip movements. Real-time images of the person are captured, and the person's action information is extracted from them and used to drive the virtual object's body movements. The result is then rendered into the live stream picture.
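The two driving paths in [0087] and [0088] differ only in where the text and action information come from; both then feed the same expression, lip-sync, motion, and rendering stage. This Python sketch shows that selection under stated assumptions. All callables (`ai_text`, `speech_to_text`, `extract_actions`, and so on) are hypothetical stand-ins for the AI driver, speech recognition, and motion-capture components, and rendering is omitted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class LiveMode(Enum):
    AI = "ai"
    PERSON_BEHIND_CHARACTER = "person_behind_character"


@dataclass
class DriveInputs:
    """Signals that drive the virtual object for one frame (hypothetical)."""
    text: str             # drives facial expressions and lip movements
    actions: List[str]    # drives body movements


@dataclass
class DriveSources:
    """Hypothetical stand-ins for the components each mode relies on."""
    ai_text: Callable[[], str]
    ai_actions: Callable[[], List[str]]
    capture_voice: Callable[[], bytes]
    speech_to_text: Callable[[bytes], str]
    capture_image: Callable[[], bytes]
    extract_actions: Callable[[bytes], List[str]]


def build_drive_inputs(mode: LiveMode, src: DriveSources) -> DriveInputs:
    """Choose where text and action information come from for the given mode."""
    if mode is LiveMode.AI:
        # AI mode: text and actions are produced by the AI driver.
        return DriveInputs(text=src.ai_text(), actions=src.ai_actions())
    # "Person behind the character" mode: voice -> text, camera image -> actions.
    return DriveInputs(
        text=src.speech_to_text(src.capture_voice()),
        actions=src.extract_actions(src.capture_image()),
    )
```

Because both modes produce the same `DriveInputs`, switching modes mid-stream only changes the input source. The downstream rendering stage stays untouched, which is what makes a seamless handover plausible.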

[0089] In an embodiment of the present application, the virtual object includes one or more of a virtual human, a virtual animal, and a virtual cartoon character. As an example, the virtual object is the virtual human "JING" (whose Chinese name means "mirror").

[0090] In an embodiment of the present application, the bullet comment acquisition strategy is the strategy for obtaining the bullet comment information that viewers send during the live stream. Live stream bullet comment information is the information generated by bullet comments. Bullet comments are a way for viewers and the streamer to interact during a live stream: viewers can express their views, emotions, and suggestions in real time and receive the streamer's responses. Obtaining bullet comment information therefore gives the streamer timely feedback, helping them control the pace and content of the stream, and gives viewers a more personalized and engaging experience.

[0091] In this embodiment, live stream bullet comment information can be obtained through APIs or SDKs provided by the live streaming platform, through web crawlers, and so on. APIs and SDKs are standardized interfaces provided by the platform and obtain bullet comment data reliably, but may not meet some customized needs. Web crawlers obtain bullet comment information more flexibly, but may trigger the platform's anti-crawling mechanisms.

[0092] During the live stream, the system checks whether the current live streaming mode is the AI mode. If it is, the bullet comment acquisition strategy is used to obtain live stream bullet comment information, from which the system judges the audience's current emotions and reactions. If it finds that the audience is not very interested in the AI mode or has other needs, it automatically switches to the person-behind-the-character mode, so that the person behind the character can take over the live stream at any time and provide more interaction and a more personalized experience. First, by understanding the audience's emotions and needs in real time, the stream can be adjusted promptly by switching to the person-behind-the-character mode, which makes the live stream more interactive and the audience more engaged. Second, because audience needs and interests vary, the method better serves different audiences. Furthermore, by interacting with the person behind the character, viewers gain a deeper understanding of the live stream content, which increases their interest in and trust of that content and improves user stickiness.

[0093] See Figure 2, which is a schematic flowchart of switching to the person-behind-the-character mode provided by an embodiment of this application.

[0094] In some embodiments, the live stream bullet comment information includes at least one bullet comment and the content of each bullet comment;

[0095] Switching the current live streaming mode to the person-behind-the-character mode based on the live stream bullet comment information (step S103) includes:

[0096] Step S201: Determine an audience emotion score based on the content of each bullet comment;

[0097] Step S202: When the audience emotion score is greater than a preset emotion score, send a prompt message asking the person behind the character to get into position;

[0098] Step S203: If the person behind the character is detected to be in position, switch the current live streaming mode from the AI mode to the person-behind-the-character mode.

[0099] In this embodiment of the application, bullet comment information includes the bullet comment content, sender, sending time, color, font size, transparency, position, and so on; it is not limited here.

[0100] In this embodiment of the application, "the person behind the character is in position" means that the person whose real-time images are being captured has arrived at the designated location.

[0101] Therefore, emotion analysis is performed on each bullet comment to determine the audience emotion score, which is compared with the preset emotion score. When the audience emotion score is higher than the preset emotion score, a prompt message reminds the person behind the character to get into position, and once the person is detected in position, the live stream automatically switches to the person-behind-the-character mode. First, recognizing audience emotions helps meet audience needs and makes the live stream more interactive and attractive. Second, prompting the person behind the character and switching from the AI mode only after the person is detected in position avoids the risk of the stream being switched to the person-behind-the-character mode while no one is there to drive it. The person-behind-the-character mode also better meets audience needs and improves the quality and credibility of the live stream. Furthermore, automatically analyzing audience emotions and the person's arrival reduces the workload of operations staff and improves live streaming efficiency.
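The threshold-prompt-confirm sequence of steps S202 and S203 might look like the sketch below. It assumes the audience emotion score and the preset emotion score are already available; `send_prompt` and `is_in_position` are hypothetical hooks (for example, a message to the performer and a check that the performer is at the capture position), not interfaces named in the application.

```python
from typing import Callable


def try_hand_over(
    audience_score: float,
    preset_score: float,
    send_prompt: Callable[[str], None],
    is_in_position: Callable[[], bool],
) -> bool:
    """Steps S202-S203 (sketch). True means: switch to the
    person-behind-the-character mode now."""
    # S202: prompt only when the audience emotion score exceeds the preset score.
    if audience_score <= preset_score:
        return False
    send_prompt("Audience emotion score above threshold: please get into position.")
    # S203: switch only once the person is detected in position, so the
    # stream is never handed to an empty capture area.
    return is_in_position()
```

A caller that gets `False` after a prompt would simply retry on the next cycle until the person is detected in position.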

[0102] See Figure 3, which is a schematic flowchart of determining the audience emotion score provided by an embodiment of this application.

[0103] In some embodiments, determining the audience emotion score based on the content of each bullet comment (step S201) includes:

[0104] Step S301: Determine an individual emotion score for each bullet comment based on one or more preset keywords and the content of that bullet comment;

[0105] Step S302: Determine the audience emotion score based on all the individual emotion scores.

[0106] In the embodiments of this application, the keywords include happy, joyful, liking, comfortable, pleasant, angry, furious, annoyed, frustrated, sad, grief-stricken, grieving, bursting into tears, disappointed, surprised, delighted, shocked, unexpected, astonished, afraid, fearful, uneasy, tense, timid, disgusted, averse, dislike, contemptuous, disdainful, and so on; the keywords are not limited here.

[0107] As an example, a bullet comment containing keywords such as "like", "happy", or "great" receives a higher individual emotion score, as does one containing "moved", "excited", or "inspiring". A bullet comment containing "annoyed", "unhappy", or "disgusted", or "nervous", "fearful", or "panicked", receives a lower individual emotion score.

[0108] As another example, the comment "This live stream is so much fun, I'm so happy!" is assigned an individual emotion score of 8 based on the keyword "happy". The comment "This live stream made me incredibly sad" is assigned an individual emotion score of 1 based on the keyword "sad". The comment "This streamer is so infuriating" is assigned an individual emotion score of 2 based on the keyword "angry". The comment "This live stream is amazing, I can't believe my eyes" is assigned an individual emotion score of 9 based on the keyword "surprised".

[0109] In this embodiment, the audience emotion score is obtained by the averaging method: all individual emotion scores are summed, the sum is divided by the number of individual emotion scores, and the resulting mean is used as the audience emotion score.
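Steps S301 and S302 can be sketched with a small keyword lexicon and an arithmetic mean. The lexicon is hypothetical: only the four scores from the [0108] examples are used ("happy" 8, "sad" 1, "angry" 2, "surprised" 9) on an assumed 1 to 10 scale. Matching whole lowercase words, averaging several matches, and ignoring comments without any keyword are also assumptions, since the text does not specify them.

```python
import re
from statistics import mean
from typing import Optional

# Hypothetical lexicon; values follow the examples in [0108].
KEYWORD_SCORES = {"happy": 8, "sad": 1, "angry": 2, "surprised": 9}


def individual_score(content: str) -> Optional[float]:
    """Step S301 sketch: score one bullet comment by the keywords it contains.

    Whole-word matching means "unhappy" does not match "happy"; several
    matches are averaged; a comment with no keyword returns None.
    """
    words = set(re.findall(r"[a-z]+", content.lower()))
    hits = [score for keyword, score in KEYWORD_SCORES.items() if keyword in words]
    return mean(hits) if hits else None


def audience_score(contents: list[str]) -> Optional[float]:
    """Step S302 sketch: arithmetic mean of all individual emotion scores."""
    scores = [s for s in (individual_score(c) for c in contents) if s is not None]
    return mean(scores) if scores else None
```

For example, `audience_score(["so happy", "very sad", "nothing"])` averages 8 and 1 and ignores the third comment, giving 4.5.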

[0110] Therefore, by analyzing whether each bullet comment contains one or more preset keywords, the individual sentiment score of each bullet comment is determined. The audience sentiment score is then calculated based on all individual sentiment scores. On one hand, by pre-setting relevant keywords, individual audience sentiment scores can be identified and calculated more accurately. On the other hand, understanding the audience's emotional state allows for timely adjustments, such as adjusting live stream content and enhancing interaction, thereby improving audience participation and experience. Furthermore, automatically analyzing the individual sentiment scores of each bullet comment reduces the workload of operations staff and improves live stream efficiency.

[0111] In some embodiments, determining the audience's emotional score based on all of the individual emotional scores includes:

[0112] The audience's emotional score is determined based on all the individual emotional scores and the weight information corresponding to each individual emotional score.

[0113] The weight information can be determined by the sender of the bullet comment to which each individual emotion score belongs, or by the score interval into which each individual emotion score falls; it is not limited here.

[0114] As an example, a bullet comment sent by commenter A has an individual emotion score of 5, and one sent by commenter B has an individual emotion score of 2. Commenter A is a very active viewer and is given a relatively high weight of 2; commenter B is a regular viewer and is given a relatively low weight of 1.

[0115] As another example, individual emotion scores range from a minimum of 1 to a maximum of 10 and are divided into five score intervals, each with its own weight: 1.2 for the first interval, 1.1 for the second, 1 for the third, 1.1 for the fourth, and 1.2 for the fifth. Scores in the extreme intervals, that is, strongly negative or strongly positive comments, therefore count slightly more.
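Both weighting schemes reduce to a weighted average, sum(w_i * s_i) / sum(w_i). The sketch below assumes that formula, which the text implies but does not state, and assumes the five score intervals of [0115] are equal-width over 1 to 10, since their boundaries are not given. The names `interval_weight` and `weighted_audience_score` are illustrative.

```python
from typing import Sequence

# Interval weights from [0115]; equal-width interval boundaries are assumed.
INTERVAL_WEIGHTS = (1.2, 1.1, 1.0, 1.1, 1.2)


def interval_weight(score: float, low: float = 1.0, high: float = 10.0) -> float:
    """Weight of an individual emotion score by the interval it falls into."""
    width = (high - low) / len(INTERVAL_WEIGHTS)
    index = int((score - low) // width)
    index = max(0, min(index, len(INTERVAL_WEIGHTS) - 1))  # clamp to a valid interval
    return INTERVAL_WEIGHTS[index]


def weighted_audience_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average: sum(w_i * s_i) / sum(w_i)."""
    if not scores or len(scores) != len(weights):
        raise ValueError("need at least one score and exactly one weight per score")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)
```

With the sender-based weights of [0114], `weighted_audience_score([5, 2], [2, 1])` gives (5 × 2 + 2 × 1) / 3 = 4.0, compared with a plain mean of 3.5. The interval-based variant passes `[interval_weight(s) for s in scores]` as the weights.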

[0116] Therefore, computing the weighted average of the individual emotion scores with their corresponding weights yields an audience emotion score that helps assess the audience's emotional state and thus better understand its needs and reactions.

[0117] See Figure 4, which is a schematic flowchart of another way of determining the audience emotion score provided by an embodiment of this application.

[0118] In some embodiments, determining the audience emotion score based on the content of each bullet comment (step S201) includes:

[0119] Step S401: Input the content of each bullet comment into an emotion recognition model to obtain the emotion tag of each bullet comment;

[0120] Step S402: Determine emotion proportions based on the emotion tags of the bullet comments, where the emotion proportions give the share of each emotion among all emotions;

[0121] Step S403: Determine the audience emotion score based on one or more of the emotion proportions.

[0122] In this embodiment of the application, the emotion tags include happy, joyful, liking, comfortable, pleasant, angry, furious, annoyed, frustrated, sad, grief-stricken, heartbroken, tearful, disappointed, surprised, delighted, shocked, unexpected, astonished, afraid, fearful, uneasy, tense, timid, disgusted, averse, dislike, contemptuous, disdainful, and so on; the emotion tags are not limited here.

[0123] As an example, the content of each bullet comment is input into the emotion recognition model to obtain its emotion tag. The number of occurrences of each emotion across all bullet comments is counted, and the share of each emotion among all emotions is calculated to obtain the emotion proportions. The audience emotion score is then determined from one or more of these proportions. For example, the proportion of happy emotions can be used as the scoring criterion, so that a higher proportion of happy emotions gives a higher audience emotion score. Alternatively, the proportion of dislike can be used, so that a higher proportion of dislike gives a lower audience emotion score. Several proportions, such as those of dislike and aversion, can also be used together and combined by a weighted sum or a plain sum to give the overall audience emotion score.
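Steps S402 and S403 can be sketched as counting tags and taking a weighted sum of selected proportions. The emotion tags are assumed to come from the emotion recognition model of step S401, which is not implemented here. The `criteria` weights (positive for emotions that should raise the score, negative for those that should lower it) and the linear rescaling onto 0 to 10 are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Mapping, Sequence


def emotion_proportions(tags: Sequence[str]) -> dict[str, float]:
    """Step S402 sketch: share of each emotion tag among all tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


def proportion_score(
    proportions: Mapping[str, float],
    criteria: Mapping[str, float],
    scale: float = 10.0,
) -> float:
    """Step S403 sketch: weighted sum of selected emotion proportions, rescaled.

    Each criteria weight is assumed to lie in [-1, 1], so the raw sum lies
    in [-1, 1]; it is mapped linearly onto [0, scale].
    """
    raw = sum(weight * proportions.get(tag, 0.0) for tag, weight in criteria.items())
    return scale * (raw + 1.0) / 2.0
```

For tags `["happy", "happy", "dislike", "surprised"]` and criteria `{"happy": 1.0, "dislike": -1.0}`, the raw value is 0.5 - 0.25 = 0.25, which maps to 6.25.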

[0124] In this embodiment, the emotion recognition model can be trained based on a convolutional neural network model or a recurrent neural network model. The implementation method of the emotion recognition model is not limited here.

[0125] Therefore, the emotion recognition model analyzes the content of each bullet comment to obtain its emotion tag, the emotion proportions (the share of each emotion among all emotions) are determined from the tags, and the audience emotion score is calculated from these proportions. First, this reveals the audience's reactions and attitudes toward the live stream content in time, so the live streaming mode and content can be adjusted. For example, if most viewers are happy with the content in the AI mode, the stream can switch from the AI mode to the person-behind-the-character mode so that the person behind the character can further guide viewers toward purchasing products. Second, the emotion recognition model analyzes large volumes of bullet comments quickly and accurately, saving labor and time and improving the efficiency and accuracy of data processing. Furthermore, scoring by emotion proportions is more objective and comprehensive: it reflects not only the audience's overall emotion but also mixed or conflicting emotions, giving operators more complete feedback.

[0126] In some embodiments, the training process of the emotion recognition model includes:

[0127] Obtain a training set containing multiple training samples, each of which includes sample bullet comment content and annotated data of the emotion tag corresponding to that content;

[0128] For each training sample in the training set, perform the following processing:

[0129] Input the sample bullet comment content into a preset deep learning model to obtain predicted data of the corresponding emotion tag;

[0130] Update the model parameters of the deep learning model based on the predicted data and the annotated data of the emotion tag;

[0131] Check whether a preset training termination condition is met; if so, use the trained deep learning model as the emotion recognition model; if not, continue training the deep learning model with the next training sample.
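The predict, update, and check-termination loop above can be illustrated without a deep learning framework. As a plainly named substitution, the sketch swaps the CNN or RNN for a bag-of-words softmax regression trained by stochastic gradient descent on cross-entropy loss. It checks the termination condition after each full pass over the training set rather than after each sample, a simplification. The learning rate, epoch limit, and loss threshold are assumed values, and all class and function names are illustrative.

```python
import math
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    return text.lower().split()


class LinearEmotionClassifier:
    """Bag-of-words softmax regression, a small stand-in for a CNN/RNN."""

    def __init__(self, labels: list[str]) -> None:
        self.labels = labels
        # One weight per (emotion tag, token); unseen tokens count as 0.0.
        self.weights = {label: defaultdict(float) for label in labels}

    def probabilities(self, text: str) -> list[float]:
        tokens = tokenize(text)
        logits = [sum(self.weights[lab].get(t, 0.0) for t in tokens) for lab in self.labels]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, text: str) -> str:
        probs = self.probabilities(text)
        return self.labels[probs.index(max(probs))]

    def update(self, text: str, label: str, lr: float) -> float:
        """One SGD step on the cross-entropy loss; returns the loss before the step."""
        probs = self.probabilities(text)
        target = self.labels.index(label)
        for k, lab in enumerate(self.labels):
            grad = probs[k] - (1.0 if k == target else 0.0)
            for t in tokenize(text):
                self.weights[lab][t] -= lr * grad
        return -math.log(probs[target])


def train(samples, labels, lr=0.5, max_epochs=200, loss_threshold=0.05):
    """Predict, update, and check termination (values assumed): stop after
    max_epochs passes, or once an epoch's total loss <= loss_threshold."""
    model = LinearEmotionClassifier(labels)
    for _ in range(max_epochs):
        total_loss = sum(model.update(text, label, lr) for text, label in samples)
        if total_loss <= loss_threshold:
            break
    return model
```

For instance, training on a few annotated comments such as `("so happy today", "happy")` and `("this is sad", "sad")` produces a model whose `predict` method returns an emotion tag for new bullet comment content.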

[0132] Therefore, a preset deep learning model is obtained by designing an appropriate number of neurons arranged in multiple layers and choosing suitable input and output layers. Training and optimizing this model establishes a functional mapping from input to output. Although an exactly correct input-output function cannot be found, the model can approximate the real-world relationship as closely as possible. An emotion recognition model trained in this way can derive emotion tags from bullet comment content, is widely applicable, and produces accurate and reliable results.

[0133] In some embodiments of this application, an emotion recognition model can be trained.

[0134] In some other embodiments of this application, a pre-trained emotion recognition model may be employed.

[0135] In this embodiment, the preset deep learning model can be a convolutional neural network model or a recurrent neural network model. The implementation method of the preset deep learning model is not limited here.

[0136] This application does not limit the training process of the emotion recognition model. For example, it can adopt the supervised learning training method described above, or the semi-supervised learning training method, or the unsupervised learning training method.

[0137] This application does not limit the preset training termination condition. For example, it may be that the number of training iterations reaches a preset number (such as 1, 3, 10, 100, 1000, or 10000), that every sample in the training set has been used for training once or several times, or that the total loss value of the current training round is not greater than a preset loss value.

[0138] Therefore, predicting emotion tags from bullet comment content with an emotion recognition model yields more accurate tags at higher speed, greatly improving the efficiency and experience of configuration staff.

[0139] See Figure 5, which is a schematic flowchart of obtaining the preset emotion score provided by an embodiment of this application.

[0140] In some embodiments, obtaining the preset emotion score includes:

[0141] Step S501: Obtain real-time transaction data of the live streaming room;

[0142] Step S502: Determine the preset emotion score based on the real-time transaction data and a preset scoring adjustment strategy.

[0143] In this embodiment of the application, the real-time transaction data includes product name, specifications, price, sales quantity, buyer, payment method, order amount, delivery method, delivery time, delivery address, and contact information.

[0144] In this embodiment of the application, the scoring adjustment strategy includes the correspondence between real-time transaction data and preset sentiment scores.

[0145] As an example, when real-time transaction figures such as sales quantity or order amount are high, the preset emotion score is lowered accordingly. The stream then switches to the person-behind-the-character mode as soon as possible, and the person behind the character can take over the live streaming room, better guiding viewers to purchase and improving audience satisfaction.

[0146] Therefore, the preset emotion score is determined from the live stream's real-time transaction data and the preset scoring adjustment strategy. For example, if the goal is to increase product sales, the preset emotion score can be linked to sales metrics such as conversion rate and order volume. First, deriving the preset emotion score from real-time transaction data allows a more accurate judgment of whether to switch live streaming modes. Second, the preset scoring adjustment strategy allows the preset emotion score to be adjusted effectively and, to some extent, anticipates audience emotions and behavior, helping guide live stream content and interaction.
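A scoring adjustment strategy can be as simple as a tier table that maps transaction activity to a preset emotion score. In the sketch below, the single input (orders per minute), the tier boundaries, and the scores are hypothetical values chosen only to show the direction described in [0145]: more orders lead to a lower preset score and therefore an earlier hand-over.

```python
# Hypothetical scoring adjustment strategy: (minimum orders per minute,
# preset emotion score). Tiers are checked from the busiest down.
ADJUSTMENT_TIERS = [(50, 5.0), (20, 6.0), (5, 7.0), (0, 8.0)]


def preset_emotion_score(orders_per_minute: int) -> float:
    """Step S502 sketch: the more orders, the lower the preset emotion score,
    so the hand-over to the person behind the character happens sooner."""
    for min_orders, preset in ADJUSTMENT_TIERS:
        if orders_per_minute >= min_orders:
            return preset
    return ADJUSTMENT_TIERS[-1][1]  # fallback for malformed (negative) input
```

A production strategy might instead combine several signals, such as conversion rate and order amount, but a lookup table keeps the correspondence in [0144] easy to audit and tune.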

[0147] In some embodiments, the bullet comment acquisition strategy includes a bullet comment acquisition count;

[0148] Obtaining live stream bullet comment information based on the bullet comment acquisition strategy includes:

[0149] Obtaining the live stream bullet comment information based on the bullet comment acquisition count.

[0150] Here, the bullet comment acquisition count indicates how many pieces of bullet comment information are obtained.

[0151] In some embodiments of this application, the number of bullet comments obtained can be 1, 2, 5, 10, 20, 40, 80, 200, 300, 500, 1000, 2000, 5000, and 10000, etc., and the number of bullet comments obtained is not limited here.

[0152] In other embodiments of this application, the number of bullet comments acquired can be 1, 2, 5, 10, 20, 40, 80, 200, 300, 500, 1000, 2000, 5000, and 10000 per minute. There is no limitation on the number of bullet comments acquired here.

[0153] Therefore, live stream bullet comment information is obtained according to a bullet comment acquisition strategy that includes the acquisition count. For example, the number of bullet comments acquired per minute can be set, or the acquisition count and rate can be adjusted based on metrics such as the number of viewers and their activity. First, acquiring bullet comment information in real time improves the ability to perceive audience emotions and reactions during the live stream. Second, the strategy allows the number and rate of acquired bullet comments to be controlled flexibly, adapting to different live streaming scenarios and audience groups. Furthermore, analyzing the bullet comment information reveals audience interests and needs, so live stream content and interaction methods can be adjusted. Finally, emotion analysis of the acquired bullet comments enables automated judgment of and feedback on audience emotions, making live stream interaction more timely and effective.
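Acquiring a fixed number of bullet comments can be sketched with a bounded buffer. The text only fixes how many comments are acquired; keeping the most recent ones, as done here with `collections.deque` and its `maxlen` argument, is an assumption, and `acquire_latest` is an illustrative name.

```python
from collections import deque
from typing import Iterable


def acquire_latest(stream: Iterable[str], count: int) -> list[str]:
    """Keep only the most recent `count` bullet comments (sketch).

    `count` plays the role of the bullet comment acquisition count; older
    comments are dropped automatically once the deque is full.
    """
    if count <= 0:
        return []
    return list(deque(stream, maxlen=count))
```

For a per-minute count, the same function could be applied to each minute's batch of comments.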

[0154] In some embodiments, the bullet comment acquisition strategy includes a correspondence between the real-time number of viewers and the bullet comment acquisition count.

[0155] Obtaining the live stream bullet comment information based on the bullet comment acquisition count includes:

[0156] Obtaining the real-time number of viewers of the live stream;

[0157] Adjusting the bullet comment acquisition count based on the real-time number of viewers and the correspondence.

[0158] The real-time audience count in a live stream refers to the number of viewers currently watching the live stream content. This number will constantly change during the live stream, as new viewers continuously enter the live stream while old viewers leave at any time.

[0159] In this embodiment, the ratio of the real-time number of viewers to the bullet comment acquisition count can be 1:1, 2:1, 3:1, 5:1, 10:1, 20:1, 50:1, 100:1, 500:1, 1000:1, and so on; the specific ratio is not limited here. A ratio of 1:1 means the real-time number of viewers equals the acquisition count, 2:1 means the real-time number of viewers is twice the acquisition count, and other ratios follow the same pattern.

[0160] Therefore, the bullet comment acquisition count is adjusted according to the correspondence between the real-time number of viewers and the acquisition count, so that live stream bullet comment information is acquired more effectively. For example, the number of bullet comments acquired per minute can be set according to the real-time number of viewers: when there are many viewers, the acquisition count can be increased to ensure that enough information is collected; when there are few, it can be decreased. First, tying the acquisition count precisely to the audience size makes the acquisition of live stream bullet comment information more comprehensive and effective. Second, analyzing the real-time number of viewers gives a more accurate picture of the audience's characteristics, which benefits precision marketing and audience management.
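The viewer-to-comment ratio of [0159] translates into an acquisition count by division. In this sketch, rounding up and the clamping bounds are assumptions added so that very small or very large audiences still yield a workable count; `acquisition_count` is an illustrative name.

```python
import math


def acquisition_count(
    viewers: int, viewers_per_comment: int, minimum: int = 1, maximum: int = 10_000
) -> int:
    """Acquisition count for a viewers:comments ratio such as 10:1 (sketch).

    Rounding up and the [minimum, maximum] clamp are assumptions.
    """
    if viewers_per_comment <= 0:
        raise ValueError("ratio must be positive")
    raw = math.ceil(viewers / viewers_per_comment)
    return max(minimum, min(maximum, raw))
```

With a 10:1 ratio, 2,500 real-time viewers give an acquisition count of 250.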

[0161] In some embodiments, the bullet comment acquisition strategy includes a bullet comment acquisition time range and an audience acquisition range;

[0162] Obtaining live stream bullet comment information based on the bullet comment acquisition strategy includes:

[0163] Obtaining the live stream bullet comment information based on the bullet comment acquisition time range and the audience acquisition range.

[0164] Here, the bullet comment acquisition time range indicates the time range within which bullet comment information is acquired.

[0165] As an example, if the operator sets the bullet comment acquisition time range to "the past 5 minutes", only bullet comments sent within the past 5 minutes are obtained; if it is set to "the entire live stream", bullet comments from the whole live stream are obtained.

[0166] In this embodiment of the application, the audience acquisition range indicates the range of viewers whose bullet comments are acquired.

[0167] As an example, if the audience consists of the streamer's fans, visitors, leaderboard viewers, and VIP users of the live streaming platform, the audience acquisition range can be restricted to fans, leaderboard viewers, and VIP users. In other words, the audience acquisition range limits the scope of the live stream bullet comment information that is acquired.

[0168] As another example, the audience acquisition range can cap the number of bullet comments acquired from a single viewer, for example at 1, 2, 3, 5, or 10. If the cap is 1 and a viewer sends 10 bullet comments, only 1 of those 10 is acquired.

[0169] Therefore, the bullet comment acquisition time range selects the bullet comments sent within the corresponding period, and the audience acquisition range limits which viewers' bullet comments are acquired and how many are taken per viewer, filtering out repeated bullet comments from the same viewer. Combining the two yields live stream bullet comment information that meets the criteria. First, this filtering improves the accuracy of bullet comment acquisition. Second, limiting the audience acquisition range removes repeated messages from the same viewer, avoiding interference from large numbers of identical messages and making the bullet comment information clearer; it also significantly reduces the number of bullet comments to be processed, improving data processing efficiency.
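Combining the acquisition time range with the two forms of audience acquisition range (allowed viewer groups and a per-viewer cap) can be sketched as a single filter. The `BulletComment` record, its group labels, and the choice to keep a viewer's earliest comments when the cap applies are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class BulletComment:
    viewer_id: str
    group: str  # hypothetical labels, e.g. "fan", "visitor", "leaderboard", "vip"
    sent_at: datetime
    content: str


def filter_comments(
    comments: Iterable[BulletComment],
    now: datetime,
    window: Optional[timedelta],  # None means the entire live stream
    allowed_groups: set[str],
    per_viewer_cap: int,
) -> list[BulletComment]:
    """Apply the acquisition time range and audience acquisition range (sketch)."""
    kept: list[BulletComment] = []
    taken: dict[str, int] = {}
    for c in sorted(comments, key=lambda c: c.sent_at):  # earliest first
        if window is not None and c.sent_at < now - window:
            continue  # outside the acquisition time range
        if c.group not in allowed_groups:
            continue  # outside the audience acquisition range
        if taken.get(c.viewer_id, 0) >= per_viewer_cap:
            continue  # this viewer already reached the per-viewer cap
        taken[c.viewer_id] = taken.get(c.viewer_id, 0) + 1
        kept.append(c)
    return kept
```

With a 5-minute window, groups `{"fan", "leaderboard", "vip"}`, and a cap of 1, a fan who sends 10 recent comments contributes exactly one, and visitors contribute none.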

[0170] In some embodiments, the audience acquisition range includes a unique viewer identifier;

[0171] Obtaining live stream bullet comment information based on the bullet comment acquisition strategy further includes:

[0172] Deduplicating the live stream bullet comment information based on the unique viewer identifier to obtain deduplicated live stream bullet comment information.

[0173] As an example, deduplication using unique viewer identifiers ensures that comments sent by the same viewer are treated as a single comment. When retrieving live stream comment information, the unique viewer identifier for each comment is recorded, and before returning the comment information results, deduplication is performed on comments sent by the same viewer identifier, thus ensuring that the same viewer is counted only once.

[0174] Therefore, by using unique audience identifiers for deduplication, on the one hand, the bullet comments sent by the same audience will be deduplicated, reducing the number of duplicate bullet comments and making the data more accurate. On the other hand, after deduplication using unique audience identifiers, it is possible to more accurately count the number of bullet comments sent by different audiences, the trend of bullet comment quantity changes, etc., thereby better understanding the audience's interactive behavior and feedback.
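Deduplication by unique viewer identifier needs only a set of identifiers already seen. The text does not say which of a viewer's comments is kept; this sketch assumes the first one seen, and the `Comment` record is a hypothetical minimal type.

```python
from typing import Iterable, NamedTuple


class Comment(NamedTuple):
    viewer_id: str  # unique viewer identifier
    content: str


def deduplicate_by_viewer(comments: Iterable[Comment]) -> list[Comment]:
    """Keep one bullet comment per unique viewer identifier (the first seen)."""
    seen: set[str] = set()
    result: list[Comment] = []
    for c in comments:
        if c.viewer_id not in seen:
            seen.add(c.viewer_id)
            result.append(c)
    return result
```

Because each viewer is counted once, the length of the result is also the number of distinct commenters, which supports the statistics mentioned in [0174].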

[0175] See Figure 6, which is a schematic flowchart of a product recommendation process provided by an embodiment of this application.

[0176] In some embodiments, the live stream bullet comment information includes at least one bullet comment and the sender of each bullet comment;

[0177] The live streaming mode switching method further includes:

[0178] Step S601: Obtain historical transaction data of each bullet comment sender;

[0179] Step S602: Based on the live stream bullet comment information and the historical transaction data of each bullet comment sender, establish a product recommendation model for each bullet comment sender;

[0180] Step S603: When the current live streaming mode is switched from the AI mode to the person-behind-the-character mode, generate recommended products and recommended scripts for each bullet comment sender based on that sender's product recommendation model;

[0181] Step S604: In response to a selection operation on one of the bullet comment senders, display the recommended products and scripts of the selected sender; or,

[0182] Generate a presentation order for the recommended products based on all the recommended products.

[0183] In this embodiment of the application, the historical transaction data may be the bullet comment sender's historical purchase records in the live streaming room or on the live streaming platform; it is not limited here.

[0184] In this embodiment, the product recommendation model can be trained based on a convolutional neural network model or a recurrent neural network model; the implementation method of the product recommendation model is not limited here. The training method of the product recommendation model is similar to that of the emotion recognition model, and will not be described in detail here.

[0185] As an example, in a beauty live stream, viewers post bullet comments asking about cosmetics. Collecting the viewers' historical transaction data gives each viewer's purchase and behavior records on the live streaming platform or in the live streaming room. For instance, viewer A has previously bought skincare products on the platform and interacts frequently in the live stream, while viewer B is a new user with no purchase history. A product recommendation model is built for viewer A and for viewer B from their historical transaction data. When the live streaming mode switches from the AI mode to the person-behind-the-character mode, personalized recommended products and corresponding scripts are generated from each viewer's product recommendation model. The operator or the person behind the character can select viewer A or viewer B to display the corresponding recommended products and scripts on the screen of their interactive device. For viewer A, skincare products related to their purchase history, from brands with high customer ratings, can be recommended. For viewer B, the most popular beauty products can be recommended, together with a reminder to take part in the live stream and learn more about the products. Because different viewers may be recommended different products, a presentation order is generated from all the recommended products. For example, if shampoo is recommended for 20 viewers and facial cleanser for 10, expectations for shampoo are higher, so shampoo is presented first, followed by facial cleanser.

[0186] In this way, historical transaction data such as purchase records and followed products can be acquired for each commenter. Based on the live stream comment information and the historical transaction data, a product recommendation model can be built for each commenter. The model can capture characteristics such as the commenter's interests, purchasing habits, and purchasing power, and can be used to predict which products to recommend. When the live stream mode switches from AI mode to person-behind-the-scenes mode, personalized recommended products and scripts can be generated from each commenter's product recommendation model to improve purchase conversion rates and user satisfaction. When the person behind the scenes or the operations staff selects a commenter, the recommended products and scripts corresponding to that commenter are displayed, and the person behind the scenes can use the scripts to introduce the recommended products, increasing the commenter's desire to purchase and improving satisfaction. A presentation order for the recommended products is also generated from all recommended products. For example, the expected level of a recommended product can be determined by how often it appears: the more times a product appears, the higher its expected level. The presentation order is then generated according to the expected level. Presenting first the recommended products with the highest expected level across all viewers can improve the live stream effect and increase user viewing time and purchasing behavior.
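One way to compute the expected level described above is to count, for each product, how many commenters it was recommended to, and sort by that count. The sketch assumes each commenter's recommendations are already available as a list of product names; counting a product at most once per commenter and breaking ties alphabetically are illustrative choices, not details from the source.

```python
from collections import Counter


def presentation_order(recommendations: dict[str, list[str]]) -> list[str]:
    """Order products by expected level, i.e. the number of commenters they were recommended to."""
    # set() ensures a product counts at most once per commenter.
    counts = Counter(p for products in recommendations.values() for p in set(products))
    # Higher count -> higher expected level -> presented earlier; ties broken by name.
    return sorted(counts, key=lambda p: (-counts[p], p))


# Shampoo recommended to 20 commenters, facial cleanser to 10 (the example in [0185]).
recs = {f"viewer{i}": ["shampoo"] for i in range(20)}
recs.update({f"viewer{i}": ["facial cleanser"] for i in range(20, 30)})
print(presentation_order(recs))  # ['shampoo', 'facial cleanser']
```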

[0187] In some embodiments, generating the explanation order for each of the recommended products based on all the recommended products includes:

[0188] Clustering is performed on all the recommended products to group the same or similar recommended products into the same category;

[0189] Based on the number of times each recommended product is recommended, the order in which the recommended products in each category are explained is determined.

[0190] The order of explanation for each category is determined based on the number of recommended products in each category;

[0191] Based on the explanation order of each category and the explanation order of the recommended products in each category, the explanation order of all the recommended products is determined.

[0192] The number of times a product is recommended reflects its popularity and the effectiveness of the recommendation. This number can be the number of times the product has been recommended in the current live stream or the historical number of times it has been recommended; how the count is obtained is not limited here.

[0193] As an example, 30 people send comments in a live stream. Following steps S601 and S602, historical transaction data is obtained for each commenter and a product recommendation model is established for each commenter. The live stream then switches from AI mode to person-behind-the-scenes mode and proceeds to step S603, in which each commenter's product recommendation model is used to analyze that commenter's interests and purchasing tendencies and recommend products accordingly. If the products recommended to 10 commenters are categorized as sports equipment, those for 5 commenters as beauty and skincare, and those for 15 commenters as electronics, then, based on the number of recommended products in each category, electronics are explained first, followed by sports equipment, and finally beauty and skincare. Within each category, the explanation order is determined by the number of times each product is recommended. If, in the electronics category, digital cameras are recommended to 10 commenters, iPhones to 3, and smartwatches to 2, the digital camera is explained first, followed by the iPhone, and finally the smartwatch. Repeating this for every category yields the final explanation order of all recommended products.
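The two-level ordering of [0188] to [0191] can be sketched as follows. The clustering step is replaced by a hypothetical `category_of` lookup table that maps each product to its category; a real system might cluster product features instead. Category size is taken as the total number of recommendations falling in that category, which matches the counts in the example above. The split of the sports and beauty categories into individual products is invented for illustration.

```python
from collections import Counter, defaultdict


def explanation_order(recommended: list[str], category_of: dict[str, str]) -> list[str]:
    """Order categories by total recommendations, then products within each category by count."""
    counts = Counter(recommended)  # one entry in `recommended` per recommendation made
    groups: dict[str, list[str]] = defaultdict(list)
    for product in counts:
        groups[category_of[product]].append(product)  # stand-in for clustering
    category_size = {c: sum(counts[p] for p in ps) for c, ps in groups.items()}
    order: list[str] = []
    for category in sorted(groups, key=lambda c: (-category_size[c], c)):
        order.extend(sorted(groups[category], key=lambda p: (-counts[p], p)))
    return order


CATEGORY_OF = {  # hypothetical clustering result
    "digital camera": "electronics", "iPhone": "electronics", "smartwatch": "electronics",
    "running shoes": "sports", "yoga mat": "sports", "lipstick": "beauty",
}
recommended = (["digital camera"] * 10 + ["iPhone"] * 3 + ["smartwatch"] * 2
               + ["running shoes"] * 6 + ["yoga mat"] * 4 + ["lipstick"] * 5)
print(explanation_order(recommended, CATEGORY_OF))
# ['digital camera', 'iPhone', 'smartwatch', 'running shoes', 'yoga mat', 'lipstick']
```

Grouping first and ranking second keeps each category's products contiguous, which is the property [0194] argues for.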

[0194] Explaining the recommended products one category at a time has several advantages. First, related recommended products are presented together within a clear logical structure, which avoids a fragmented presentation and helps viewers understand and remember the recommended product information. Second, viewers gain a clearer sense of their own purchasing needs and direction, which increases the purchase rate; when many product categories are shown at once, viewers may experience decision paralysis, lose purchase intent, and possibly leave the live stream. Third, when the presenter concentrates on a single product category, more viewers are likely to follow along, express opinions, and ask questions. This increases viewer interaction and allows for more detailed and emotionally engaging product presentations, which in turn stimulates audience engagement and purchase desire.

[0195] In a specific application scenario, an embodiment of this application further provides a method for switching live streaming modes based on audience emotions. The live streaming modes include AI mode and person-behind-the-scenes mode. The live streaming mode switching method includes:

[0196] Detect whether the current live streaming mode in the live streaming room is AI mode;

[0197] If so, then obtain the real-time number of viewers in the live stream;

[0198] Based on the real-time audience count and the correspondence between the real-time audience count and the number of bullet comments acquired, the number of bullet comments acquired is adjusted.
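The correspondence between the real-time audience count and the number of bullet comments acquired could be stored as a tiered lookup table. The tier boundaries and sample sizes below are hypothetical values chosen for illustration, and taking the most recent comments is an assumed sampling rule.

```python
import bisect

# Hypothetical correspondence: viewers <= 100 -> 20 comments, <= 1,000 -> 50,
# <= 10,000 -> 200, otherwise 500.
VIEWER_BOUNDS = [100, 1_000, 10_000]
COMMENT_COUNTS = [20, 50, 200, 500]


def comments_to_acquire(realtime_viewers: int) -> int:
    """Look up how many bullet comments to acquire for the current audience size."""
    return COMMENT_COUNTS[bisect.bisect_left(VIEWER_BOUNDS, realtime_viewers)]


def acquire(comments: list[str], realtime_viewers: int) -> list[str]:
    """Take the most recent N comments, where N comes from the correspondence table."""
    return comments[-comments_to_acquire(realtime_viewers):]


print(comments_to_acquire(80), comments_to_acquire(5_000), comments_to_acquire(50_000))
# 20 200 500
```

Scaling the sample with the audience keeps the emotion estimate cheap in small rooms while still covering a meaningful share of comments in large ones.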

[0199] Based on the adjusted number of bullet comments to acquire, the live stream bullet comment information is obtained; the live stream bullet comment information includes at least one bullet comment and the bullet comment content corresponding to each bullet comment.

[0200] Based on one or more preset keywords and the content of each bullet screen message, determine the individual emotion score corresponding to each bullet screen message;

[0201] The audience's emotional score is determined based on all the individual emotional scores.
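A minimal sketch of keyword-based scoring, assuming a hypothetical table of preset keywords with weights, where a higher weight marks a stronger signal that the audience wants a human to take over. Each comment's individual score is the sum of the weights of the keywords it contains (matched as whole words, so "bot" does not match "bottle"), and the audience score is the mean of the individual scores. Averaging is one possible aggregation; the source does not fix it.

```python
import re

# Hypothetical preset keywords and weights.
KEYWORD_WEIGHTS = {"boring": 2.0, "bot": 3.0, "real person": 3.0, "leaving": 1.0}


def individual_score(content: str) -> float:
    """Sum the weights of preset keywords that appear as whole words in one comment."""
    text = content.lower()
    return sum((w for kw, w in KEYWORD_WEIGHTS.items()
                if re.search(r"\b" + re.escape(kw) + r"\b", text)), 0.0)


def audience_score(contents: list[str]) -> float:
    """Aggregate individual scores into one audience score (mean; 0.0 if no comments)."""
    if not contents:
        return 0.0
    return sum(individual_score(c) for c in contents) / len(contents)


print(audience_score(["So boring, is this a bot?", "Is this bottle waterproof?"]))  # 2.5
```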

[0202] When the audience's emotion score is greater than the preset emotion score, a prompt message is sent so that the person behind the scenes gets into position;

[0203] Upon detecting the arrival of the person behind the scenes, the current live streaming mode is switched from AI mode to person behind the scenes mode, allowing the person behind the scenes to conduct the live stream.
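The threshold check, prompt, and switch in [0202] and [0203] behave like a small state machine. In the sketch below, the class and method names are hypothetical, presence detection is reduced to a callback, and sending the prompt only once per episode is an assumed design choice that avoids flooding the person behind the scenes with repeated prompts.

```python
from enum import Enum


class Mode(Enum):
    AI = "ai"
    PERSON_BEHIND_SCENES = "person_behind_the_scenes"


class ModeSwitcher:
    """Hypothetical controller for switching from AI mode to person-behind-the-scenes mode."""

    def __init__(self, preset_score: float) -> None:
        self.mode = Mode.AI
        self.preset_score = preset_score
        self.prompt_sent = False

    def on_audience_score(self, score: float) -> bool:
        """Send a prompt (return True) the first time the score exceeds the preset score."""
        if self.mode is Mode.AI and not self.prompt_sent and score > self.preset_score:
            self.prompt_sent = True  # e.g. notify the person behind the scenes here
            return True
        return False

    def on_presence_detected(self) -> None:
        """Switch modes once the prompted person behind the scenes is in position."""
        if self.mode is Mode.AI and self.prompt_sent:
            self.mode = Mode.PERSON_BEHIND_SCENES


switcher = ModeSwitcher(preset_score=3.0)
print(switcher.on_audience_score(2.5), switcher.on_audience_score(3.5))  # False True
switcher.on_presence_detected()
print(switcher.mode)  # Mode.PERSON_BEHIND_SCENES
```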

[0204] (Live streaming mode switching device based on audience emotions)

[0205] See Figure 7, which is a schematic diagram of the structure of a live streaming mode switching device provided in an embodiment of this application.

[0206] This embodiment can be applied to scenarios in which the live streaming mode is switched. The application scenarios of the live streaming mode switching method are not limited; it can be applied to any scenario in which sentiment analysis is performed on live stream bullet screen information and the live streaming mode is then switched accordingly.

[0207] This application provides a live streaming mode switching device based on viewer emotions, the live streaming mode switching device comprising:

[0208] The mode detection module is used to detect whether the current live streaming mode is AI mode;

[0209] The bullet screen acquisition module is used to acquire live bullet screen information based on the bullet screen acquisition strategy when the current live streaming mode is AI mode.

[0210] The mode switching module is used to switch the current live streaming mode from AI mode to person-behind-the-scenes mode based on the live stream bullet screen information, so that the person behind the scenes can conduct the live stream.

[0211] In some embodiments, the live stream bullet screen information includes at least one bullet screen message and bullet screen content corresponding to each bullet screen message; the mode switching module includes:

[0212] An emotion scoring unit is used to determine the audience's emotion score based on the content of each bullet screen message.

[0213] The information prompting unit is used to send a prompt message when the audience's emotion score is greater than a preset emotion score, so that the person behind the scenes gets into position;

[0214] The person-behind-the-scenes switching unit is used to switch the current live streaming mode from AI mode to person-behind-the-scenes mode when the person behind the scenes is detected to be in position.

[0215] In some embodiments, the emotion scoring unit includes:

[0216] The first scoring unit is used to determine the individual emotion score corresponding to each bullet screen message based on one or more preset keywords and the bullet screen content corresponding to each bullet screen message;

[0217] The first calculation unit is used to determine the audience's emotion score based on all the individual emotion scores.

[0218] In some embodiments, the emotion scoring unit further includes:

[0219] An emotion detection unit is used to input the content of each bullet screen message into the emotion recognition model to obtain the emotion tag corresponding to each bullet screen message.

[0220] The second scoring unit is used to determine the emotion ratio based on the emotion tag corresponding to each of the bullet screen messages; the emotion ratio includes the proportion of each emotion to all emotions;

[0221] The second calculation unit is used to determine the audience's emotion score based on one or more of the emotion proportions.
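When an emotion recognition model supplies a tag per comment, the emotion proportions and a score derived from them could be computed as below. The model itself is not shown; the tags are taken as given. The emotion names, the weights in `SCORE_WEIGHTS`, and the 0 to 10 scale are hypothetical, chosen so that a larger share of negative or disengaged emotions yields a higher audience score.

```python
from collections import Counter

# Hypothetical weights for the emotion proportions that feed the audience score.
SCORE_WEIGHTS = {"angry": 1.0, "bored": 0.8, "confused": 0.5}


def emotion_proportions(tags: list[str]) -> dict[str, float]:
    """Proportion of each emotion tag among all tags (empty input -> empty dict)."""
    if not tags:
        return {}
    counts = Counter(tags)
    return {tag: n / len(tags) for tag, n in counts.items()}


def score_from_proportions(props: dict[str, float], scale: float = 10.0) -> float:
    """Weighted sum of selected emotion proportions, scaled to 0..scale."""
    return scale * sum(SCORE_WEIGHTS.get(tag, 0.0) * p for tag, p in props.items())


tags = ["happy"] * 5 + ["bored"] * 3 + ["angry"] * 2  # output of the (unshown) model
props = emotion_proportions(tags)
print(props)  # {'happy': 0.5, 'bored': 0.3, 'angry': 0.2}
print(round(score_from_proportions(props), 2))  # 4.4
```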

[0222] In some embodiments, the information prompting unit includes:

[0223] A real-time transaction data acquisition unit is used to acquire real-time transaction data of the live broadcast room;

[0224] A preset emotion score determination unit is used to determine the preset emotion score based on the real-time transaction data and a preset scoring adjustment strategy.
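The scoring adjustment strategy is not specified in the source. One hypothetical strategy, sketched below, scales the preset emotion score with the room's real-time order rate: when sales are strong the AI mode is evidently working, so a stronger emotional signal is required before calling in the person behind the scenes, and when sales are weak the threshold drops. The base score, target order rate, step, and clamping bounds are all illustrative.

```python
def preset_emotion_score(orders_per_minute: float, base: float = 5.0,
                         target_rate: float = 10.0, step: float = 2.0,
                         lower: float = 2.0, upper: float = 8.0) -> float:
    """Hypothetical adjustment: shift the threshold linearly with the order rate relative to a target."""
    score = base + step * (orders_per_minute / target_rate - 1.0)
    return max(lower, min(upper, score))  # keep the threshold within fixed bounds


print(preset_emotion_score(0), preset_emotion_score(10), preset_emotion_score(20))
# 3.0 5.0 7.0
```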

[0225] In some embodiments, the bullet screen acquisition strategy includes the number of bullet screens acquired; the bullet screen acquisition module includes:

[0226] The first acquisition unit is used to acquire the live stream bullet screen information based on the number of bullet screens acquired.

[0227] In some embodiments, the bullet screen acquisition strategy includes a correspondence between the number of real-time viewers and the number of bullet screens acquired, and the first acquisition unit includes:

[0228] The real-time audience acquisition subunit is used to acquire the real-time number of viewers in the live broadcast room;

[0229] The quantity acquisition adjustment subunit is used to adjust the quantity of bullet comments acquired based on the real-time number of viewers and the corresponding relationship.

[0230] In some embodiments, the bullet screen acquisition strategy includes a bullet screen acquisition time range and an audience acquisition range; the bullet screen acquisition module further includes:

[0231] The second acquisition unit is used to acquire the live stream bullet screen information based on the bullet screen acquisition time range and the audience acquisition range.
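Acquisition by time range and audience range amounts to a filter over the comment stream. The sketch assumes a hypothetical `BulletComment` record with a sender ID and a timestamp in seconds since the stream started, and models the audience acquisition range as an optional set of sender IDs (for example, the room's followers); `None` means no restriction on senders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BulletComment:
    sender: str
    content: str
    timestamp: float  # hypothetical: seconds since the stream started


def acquire_in_range(comments: list[BulletComment], start: float, end: float,
                     audience: Optional[set[str]] = None) -> list[BulletComment]:
    """Keep comments sent within [start, end] by senders inside the audience range."""
    return [c for c in comments
            if start <= c.timestamp <= end
            and (audience is None or c.sender in audience)]


stream = [
    BulletComment("u1", "hi", 10.0),
    BulletComment("u2", "boring", 65.0),
    BulletComment("u3", "is this a bot?", 70.0),
    BulletComment("u1", "bye", 130.0),
]
print([c.content for c in acquire_in_range(stream, 60.0, 120.0, {"u2", "u3"})])
# ['boring', 'is this a bot?']
```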

[0232] In some embodiments, the live stream bullet screen information includes at least one bullet screen message and the sender of each bullet screen message; the live stream mode switching device further includes:

[0233] The history acquisition module is used to acquire the historical transaction data corresponding to each of the bullet screen senders;

[0234] The model building module is used to build a product recommendation model for each bullet screen sender based on the live stream bullet screen information and the historical transaction data corresponding to that sender;

[0235] The recommendation generation module is used to generate recommended products and recommendation scripts for each bullet screen sender, based on the product recommendation model corresponding to that sender, when the current live streaming mode is switched from AI mode to person-behind-the-scenes mode;

[0236] The recommendation function module is used to display, in response to a selection operation on one of the bullet screen senders, the recommended products and recommendation scripts corresponding to the selected sender; or to generate an explanation order for the recommended products based on all the recommended products.

[0237] (Electronic devices)

[0238] This application also provides an electronic device, the specific embodiments of which are consistent with the embodiments and technical effects achieved in the above method embodiments, and some contents will not be repeated.

[0239] The electronic device includes a memory and at least one processor, the memory storing a computer program, and the at least one processor being configured to execute the computer program to perform the following steps:

[0240] Detect whether the current live streaming mode in the live streaming room is AI mode;

[0241] If so, obtain the live stream bullet screen information based on the bullet screen acquisition strategy;

[0242] Based on the live stream bullet screen information, switch the current live streaming mode from AI mode to person-behind-the-scenes mode, so that the person behind the scenes can conduct the live stream.

[0243] In some embodiments, the live stream bullet screen information includes at least one bullet screen message and the bullet screen content corresponding to each bullet screen message;

[0244] When executing the computer program, the at least one processor is further configured to switch the current live streaming mode from AI mode to person-behind-the-scenes mode based on the live stream bullet screen information in the following manner:

[0245] Based on the content of each bullet comment, a viewer emotion score is determined.

[0246] When the audience's emotion score is greater than the preset emotion score, send a prompt message so that the person behind the scenes gets into position;

[0247] Upon detecting that the person behind the scenes is in position, switch the current live streaming mode from AI mode to person-behind-the-scenes mode.

[0248] In some embodiments, the at least one processor is further configured to determine the audience sentiment score based on the content of the bullet screen corresponding to each bullet screen message when executing the computer program:

[0249] Based on one or more preset keywords and the content of each bullet screen message, determine the individual emotion score corresponding to each bullet screen message;

[0250] The audience's emotional score is determined based on all the individual emotional scores.

[0251] In some embodiments, the at least one processor is further configured to determine the audience sentiment score based on the content of the bullet screen corresponding to each bullet screen message when executing the computer program:

[0252] The content of each bullet comment is input into the emotion recognition model to obtain the emotion tag corresponding to each bullet comment.

[0253] Based on the emotion tag corresponding to each bullet screen message, the emotion ratio is determined; the emotion ratio includes the proportion of each emotion to all emotions.

[0254] The audience's emotional score is determined based on one or more of the emotional proportions.

[0255] In some embodiments, the at least one processor is further configured to obtain the preset emotion score when executing the computer program in the following manner:

[0256] Obtain the real-time transaction data of the live broadcast room;

[0257] Based on the real-time transaction data and the preset rating adjustment strategy, the preset sentiment score is determined.

[0258] In some embodiments, the bullet screen acquisition strategy includes the number of bullet screens acquired;

[0259] The at least one processor is further configured to, when executing the computer program, acquire live stream bullet comment information based on a bullet comment acquisition strategy in the following manner:

[0260] Based on the number of bullet comments to be acquired, the live stream bullet comment information is obtained.

[0261] In some embodiments, the bullet screen acquisition strategy includes a correspondence between the number of real-time viewers and the number of bullet screens acquired, and the at least one processor is further configured to acquire the live stream bullet screen information based on the number of bullet screens acquired when executing the computer program:

[0262] Obtain the real-time number of viewers in the live stream;

[0263] The number of bullet comments acquired is adjusted based on the real-time number of viewers and the corresponding relationship.

[0264] In some embodiments, the bullet screen acquisition strategy includes a bullet screen acquisition time range and an audience acquisition range; the at least one processor is further configured to acquire live bullet screen information based on the bullet screen acquisition strategy when executing the computer program in the following manner:

[0265] The live stream bullet screen information is obtained based on the bullet screen acquisition time range and the audience acquisition range.

[0266] In some embodiments, the live stream bullet screen information includes at least one bullet screen message and the sender of each bullet screen message; the at least one processor is configured to execute the computer program to perform the following steps:

[0267] Obtain the historical transaction data corresponding to each of the aforementioned bullet screen senders;

[0268] Based on the live stream bullet screen information and the historical transaction data corresponding to each bullet screen sender, a product recommendation model corresponding to each bullet screen sender is established;

[0269] When the current live streaming mode switches from AI mode to person-behind-the-scenes mode, recommended products and recommended scripts are generated for each bullet screen sender based on the product recommendation model corresponding to that sender;

[0270] In response to a selection operation for one of the bullet screen senders, display recommended products and scripts corresponding to the selected bullet screen sender; or,

[0271] Based on all the recommended products, an explanation order for each recommended product is generated.

[0272] See Figure 8, which is a structural block diagram of an electronic device 10 provided in an embodiment of this application.

[0273] Electronic device 10 may include, for example, at least one memory 11, at least one processor 12, and a bus 13 connecting different platform systems.

[0274] The memory 11 may include a readable medium in the form of volatile memory, such as random access memory (RAM) 111 and/or cache memory 112, and may further include read-only memory (ROM) 113.

[0275] The memory 11 also stores a computer program, which can be executed by the processor 12 to enable the processor 12 to implement the steps of any of the above methods.

[0276] The memory 11 may also include a utility 114 having at least one program module 115, including but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of these examples may include an implementation of a network environment.

[0277] Accordingly, processor 12 can execute the aforementioned computer program, and can also execute utility 114.

[0278] The processor 12 may employ one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.

[0279] Bus 13 can represent one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus structures.

[0280] Electronic device 10 can also communicate with one or more external devices, such as keyboards, pointing devices, Bluetooth devices, etc., with one or more devices capable of interacting with it, and/or with any device that enables it to communicate with one or more other computing devices (e.g., routers, modems, etc.). This communication can be performed through input/output interface 14. Furthermore, electronic device 10 can communicate with one or more networks (e.g., local area networks (LANs), wide area networks (WANs), and/or public networks, such as the Internet) via network adapter 15. Network adapter 15 can communicate with other modules of electronic device 10 via bus 13. It should be understood that, although not shown in the figures, in practical applications other hardware and/or software modules can be used in conjunction with electronic device 10, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage platforms.

[0281] (Computer-readable storage medium)

[0282] This application also provides a computer-readable storage medium, the specific embodiments of which are consistent with the embodiments and technical effects achieved in the above method embodiments, and some contents will not be repeated.

[0283] The computer-readable storage medium stores a computer program that, when executed by at least one processor, implements the steps of any of the above methods or the functions of any of the above electronic devices.

[0284] See Figure 9, which is a schematic diagram of the structure of a program product provided in an embodiment of this application.

[0285] The program product is used to implement the steps of any of the above methods or to implement the functions of any of the above electronic devices. The program product may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited thereto. In the embodiments of this application, the readable storage medium may be any tangible medium containing or storing a program that may be used by or in conjunction with an instruction execution system, apparatus, or device. The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of readable storage media (a non-exhaustive list) include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof.

[0286] A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. A readable signal medium may also be any readable medium that can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device. The program code contained on the readable medium may be transmitted using any suitable medium, including but not limited to wireless, wired, optical fiber, RF, or any suitable combination thereof. Program code for performing the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as C or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a standalone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it can be connected to the user's computing device via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (e.g., via the Internet using an Internet service provider).

[0287] This application describes the invention from the perspectives of purpose, performance, progress, and novelty, and it meets the functional enhancement and use requirements emphasized by the Patent Law. The above description and drawings are merely preferred embodiments of this application and are not intended to limit this application. Therefore, all structures, devices, features, etc., that are similar to or identical to those of this application, i.e., all equivalent substitutions or modifications made in accordance with the scope of this patent application, shall fall within the scope of protection of this patent application.

Claims

1. A method for switching live streaming modes based on audience emotions, wherein the live streaming modes include AI mode and person-behind-the-scenes mode, characterized in that the live streaming mode switching method includes: detecting whether the current live streaming mode in the live streaming room is AI mode; if so, obtaining live stream bullet screen information based on a bullet screen acquisition strategy; and, based on the live stream bullet screen information, switching the current live streaming mode from AI mode to person-behind-the-scenes mode, so that the person behind the scenes can conduct the live stream; wherein the live stream bullet screen information includes at least one bullet screen message and the sender of each bullet screen message; the live streaming mode switching method further includes: obtaining the historical transaction data corresponding to each bullet screen sender; establishing, based on the live stream bullet screen information and the historical transaction data corresponding to each bullet screen sender, a product recommendation model corresponding to each bullet screen sender; and, when the current live streaming mode switches from AI mode to person-behind-the-scenes mode, generating recommended products and recommended scripts for each bullet screen sender based on the product recommendation model corresponding to that sender;
in response to a selection operation on one of the bullet screen senders, displaying the recommended products and scripts corresponding to the selected sender; or, based on all the recommended products, generating an explanation order for each of the recommended products, including: clustering all the recommended products so that the same or similar recommended products are grouped into the same category; determining, based on the number of times each recommended product is recommended, the explanation order of the recommended products within each category; determining the explanation order of the categories based on the number of recommended products in each category; and determining the explanation order of all the recommended products based on the explanation order of the categories and the explanation order of the recommended products within each category.

2. The live streaming mode switching method according to claim 1, characterized in that the live stream bullet screen information includes at least one bullet screen message and the bullet screen content corresponding to each bullet screen message; the step of switching the current live streaming mode from AI mode to person-behind-the-scenes mode based on the live stream bullet screen information includes: determining an audience emotion score based on the content of each bullet comment; when the audience emotion score is greater than the preset emotion score, sending a prompt message so that the person behind the scenes gets into position; and, upon detecting that the person behind the scenes is in position, switching the current live streaming mode from AI mode to person-behind-the-scenes mode.

3. The live streaming mode switching method according to claim 2, characterized in that, The process of determining the audience's emotional score based on the content of each bullet comment includes: Based on one or more preset keywords and the content of each bullet screen message, determine the individual emotion score corresponding to each bullet screen message; The audience's emotional score is determined based on all the individual emotional scores.

4. The live streaming mode switching method according to claim 2, characterized in that, The process of determining the audience's emotional score based on the content of each bullet comment includes: The content of each bullet comment is input into the emotion recognition model to obtain the emotion tag corresponding to each bullet comment. Based on the emotion tag corresponding to each bullet screen message, the emotion ratio is determined; the emotion ratio includes the proportion of each emotion to all emotions. The audience's emotional score is determined based on one or more of the emotional proportions.

5. The live streaming mode switching method according to claim 2, characterized in that, The steps for obtaining the preset emotion score include: Obtain the real-time transaction data of the live broadcast room; Based on the real-time transaction data and the preset rating adjustment strategy, the preset sentiment score is determined.

6. The live streaming mode switching method according to any one of claims 1 to 5, characterized in that the bullet screen acquisition strategy includes the number of bullet screens acquired; obtaining the live stream bullet screen information based on the bullet screen acquisition strategy includes: obtaining the live stream bullet comment information based on the number of bullet comments to be acquired.

7. The live streaming mode switching method according to claim 6, characterized in that, The bullet screen acquisition strategy includes the correspondence between the number of real-time viewers and the number of bullet screens acquired. The step of obtaining the live stream bullet comment information based on the number of bullet comments includes: Obtain the real-time number of viewers in the live stream; The number of bullet comments acquired is adjusted based on the real-time number of viewers and the corresponding relationship.

8. The live streaming mode switching method according to claim 1, characterized in that, The bullet screen acquisition strategy includes the bullet screen acquisition time range and the audience acquisition range; The method for obtaining live stream bullet screen information based on the bullet screen acquisition strategy includes: The live stream bullet screen information is obtained based on the bullet screen acquisition time range and the audience acquisition range.

9. A live streaming mode switching device, characterized in that the live streaming mode switching device is used to implement the live streaming mode switching method according to claim 1 and includes: a mode detection module, used to detect whether the current live streaming mode is AI mode; a bullet screen acquisition module, used to acquire live stream bullet screen information based on the bullet screen acquisition strategy when the current live streaming mode is AI mode; and a mode switching module, used to switch the current live streaming mode from AI mode to person-behind-the-scenes mode based on the live stream bullet screen information, so that the person behind the scenes can conduct the live stream.

10. An electronic device, characterized in that the electronic device is used to implement the live streaming mode switching method according to claim 1; the electronic device includes a memory and at least one processor, the memory stores a computer program, and the at least one processor executes the computer program to perform the following steps: detecting whether the current live streaming mode in the live streaming room is AI mode; if so, obtaining live stream bullet screen information based on the bullet screen acquisition strategy; and, based on the live stream bullet screen information, switching the current live streaming mode from AI mode to person-behind-the-scenes mode, so that the person behind the scenes can conduct the live stream.

11. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the method according to any one of claims 1-8.