A multimedia content intelligent push system and method
By assigning content candidate source tag combinations and multi-dimensional interference evaluation values to the in-vehicle multimedia system, and dynamically adjusting the pushed content based on the occupant's identity and driving status, the problem of content being disconnected from safety and scenario in the in-vehicle multimedia push system is solved, and personalized and safe content push is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGZHOU HONGCHUANG NETWORK TECHNOLOGY CO LTD
- Filing Date
- 2026-03-31
- Publication Date
- 2026-06-30
AI Technical Summary
Existing in-vehicle multimedia push systems fail to effectively link the composition of occupants with real-time driving scenarios, resulting in a disconnect between the pushed content and safety and scenario requirements. In particular, they lack suitability when multiple occupants are present and ignore the impact of audio interference on driver attention in complex driving scenarios.
The pre-analysis unit assigns tag combinations and multi-dimensional interference evaluation values to content candidate sources. The real-time analysis unit identifies the occupant's identity and generates a set of interest tags. It combines vehicle status data to calculate complex driving parameters, performs multiple screenings and recommendation sorting of the content library, and dynamically adjusts the pushed content to adapt to the occupant's needs and driving environment.
It achieves dynamic correlation assessment between in-vehicle multimedia push content and driver attention interference, automatically removes highly interfering content, adapts to personalized push in multi-occupant scenarios, and improves driving safety and user experience.
Abstract
Description
Technical Field
[0001] This invention relates to the technical field of data processing systems, and specifically to an intelligent push system and method for multimedia content.
Background Technology
[0002] Currently, in-vehicle multimedia systems have evolved from simple radios and CD players into intelligent entertainment terminals integrating online music, podcasts, videos, and news, serving as an important platform for drivers and passengers to obtain information and enjoy entertainment. To enhance user experience, personalized recommendation technology has been gradually introduced into in-vehicle scenarios, aiming to push content that matches a user's interests based on their viewing and listening history. Existing in-vehicle multimedia recommendations are mainly based on explicit user selections (such as favorites and on-demand playback) or simple collaborative filtering recommendations.
[0003] Existing in-vehicle multimedia push solutions typically focus only on the degree of matching between content and user interests. On the one hand, they ignore the potential interference of the acoustic characteristics of the audio content itself with the driver's attention; inappropriate audio may divert the driver's attention and increase driving risk. On the other hand, the push decision is not dynamically linked to the composition of the occupants in the vehicle or to the real-time driving scenario. It cannot automatically adapt the push target (especially for child passengers) and content suitability when multiple occupants are present, nor can it suppress interfering content in complex driving scenarios. As a result, the pushed content is disconnected from safety and scenario requirements.
Summary of the Invention
[0004] In view of the above-mentioned shortcomings of the existing technology, the present invention provides an intelligent multimedia content push system and method, which can effectively solve the problem that the decision-making of in-vehicle multimedia push is difficult to dynamically associate with the composition of the occupants and the real-time driving scenario in the existing technology.
[0005] To achieve the above objectives, the present invention provides the following technical solution: an intelligent multimedia content push system, comprising at least:
A pre-analysis unit, which pre-analyzes content candidate sources and assigns each content candidate source a tag combination and a multi-dimensional interference evaluation value;
A real-time analysis unit, which identifies the driver and passengers in the vehicle and determines the initial profile of each passenger based on the image data captured by the camera, the initial profiles being divided into children, adult males and adult females; determines the pending users (the users to be pushed to) based on the initial profiles; records video viewing and audio listening data during the use of in-vehicle multimedia as viewing and listening data; generates an interest tag set based on the initial profile of each pending user combined with that user's historical viewing and listening data; and acquires current vehicle driving status data and road condition information to analyze and calculate the driving complexity parameter;
A push filtering unit, which constructs an initial content library composed of multiple content candidate sources based on the interest tag sets of the pending users, performs a first screening of the initial content library based on the initial profiles of the pending users, and performs a second screening of the initial content library based on the driving complexity parameter combined with the multi-dimensional interference evaluation values, to obtain the candidate content library;
A push combination unit, which calculates the matching recommendation value between each content candidate source in the library and the interest vector, and sets the content sorting distribution on the content recommendation homepage based on the matching recommendation value.
[0006] Furthermore, the tag combination construction process is as follows: A resource database is preset, containing multiple pre-stored candidate sources. The multiple classification tags corresponding to each pre-stored candidate source are obtained, and all classification tags in the resource database are extracted to obtain a total tag library. The classification tags in the total tag library are identified based on semantic algorithms and summarized to obtain multiple seed tags, and the seed tags are combined into a seed tag library. Based on the classification tags corresponding to a content candidate source, multiple seed tags are determined to form the tag combination corresponding to that content candidate source.
[0007] Furthermore, the process for obtaining the multidimensional interference evaluation value is as follows: Obtain the equivalent sound level change line graph of the sound source corresponding to the content candidate source, record the average value of the equivalent sound level as the loudness influence value, and calculate the maximum peak height in the equivalent sound level change line graph as the dynamic influence value. A loudness threshold is preset. The duration influence value is determined based on the maximum length of the portion of the equivalent sound level change line graph that lies above the loudness threshold. The loudness influence value, dynamic influence value, and duration influence value are normalized and multiplied by the corresponding preset weight coefficients to obtain the multidimensional interference evaluation value of the corresponding sound source.
[0008] Furthermore, the process for determining the pending users is as follows: When any passenger seat is detected to be occupied, image data of the corresponding seat area is obtained. Based on the face detection model, the visible face corresponding to each seat area is identified, and the facial features of the visible face are extracted and input into the age and gender estimation model to determine the gender and age corresponding to the face data, thus forming the initial profile of the passenger corresponding to each seat area. If none of the occupants' initial profiles is a child, all occupants in the vehicle are marked as pending users. If the occupants' initial profiles include a child, the child occupants are marked as pending users, and the other occupants are marked as accompanying users.
[0009] Furthermore, the process for determining the interest tag set is as follows: Multiple original tag sets are preset, each containing multiple type tags and an interest index for each type tag. Each initial profile corresponds to one original tag set. Based on the initial profile of the pending user, the corresponding original tag set is obtained and recorded as the tag base set. The viewing and listening data of the pending user from past rides are obtained, and the tag base set and the historical viewing and listening data are fused to generate the corresponding interest tag set.
[0010] Furthermore, the fusion process is as follows: Determine the interest index corresponding to each type tag in the original tag set and construct the original dataset, which contains multiple interest index values with serial numbers. Obtain the content tag, occurrence time, listening duration, and listening completeness corresponding to each listening record in the historical listening data. Calculate the historical interest frequency of each type tag by weighted statistics over the content tag, occurrence time, listening duration, and listening completeness. The interest index values in the original dataset are fused and corrected based on the historical interest frequency of each type tag to obtain a corrected dataset, and the interest tag set is obtained from the corrected dataset.
[0011] Furthermore, the calculation process for the driving complexity parameter is as follows: The system acquires vehicle speed, lateral acceleration, and distance to the vehicle ahead. It also acquires current road conditions and weather. Based on the road conditions and weather, it assigns a road condition influence value and a weather influence value, respectively; the sum of these values is the scene influence value. The system captures the driver's image and performs attention detection based on the driver's facial feature data in the image, calculating the focus influence value. Finally, it combines the normalized vehicle speed, lateral acceleration, distance to the vehicle ahead, scene influence value, and focus influence value to calculate the driving complexity parameter.
[0012] Furthermore, the process for obtaining the focus influence value is as follows: The system detects the driver's eyelid opening, pupil offset, and head coordinates in the image. It records the changes in each value within a preset fixed detection time, calculates the corresponding first, second, and third changes, and then multiplies them by the corresponding preset weights to calculate the focus influence value.
[0013] Furthermore, the initial content library construction process is as follows: Multiple interest tag sets are obtained and fused to obtain a target tag set. An interest vector is constructed based on the interest index corresponding to each type of tag in the target tag set. A candidate vector is constructed based on the tag combination corresponding to each content candidate source. Given a preset library capacity, multiple content candidate sources are selected to form an initial content library. The initial content library satisfies the following conditions: Condition 1: The number of candidate content sources in the initial content library is equal to the library capacity; Condition 2: The sum of the candidate vectors of multiple content candidate sources matches the interest vector with a preset matching threshold.
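The two conditions on the initial content library can be illustrated with a minimal sketch. The text does not say how candidates are selected or how "matching" is measured, so the greedy selection and the cosine-similarity test below are assumptions introduced only for illustration, as are the names `cosine` and `build_initial_library`.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_initial_library(
    candidates: Dict[str, List[float]],  # candidate id -> candidate vector
    interest: List[float],               # interest vector
    capacity: int,                       # preset library capacity
    threshold: float,                    # preset matching threshold (assumed cosine)
) -> Tuple[List[str], bool]:
    """Greedy sketch (an assumption, not the patent's stated procedure):
    repeatedly add the candidate that brings the summed candidate vector
    closest to the interest vector, then check both conditions."""
    remaining = dict(candidates)
    chosen: List[str] = []
    total = [0.0] * len(interest)
    while remaining and len(chosen) < capacity:
        best = max(
            remaining,
            key=lambda cid: cosine(
                [t + c for t, c in zip(total, remaining[cid])], interest
            ),
        )
        total = [t + c for t, c in zip(total, remaining.pop(best))]
        chosen.append(best)
    # Condition 1: library is full. Condition 2: summed vector matches interest.
    ok = len(chosen) == capacity and cosine(total, interest) >= threshold
    return chosen, ok
```

A greedy pass is cheap but not guaranteed to find the best-matching combination; an exhaustive or optimization-based search would be an equally valid reading of the two conditions.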
[0014] A method for intelligently pushing multimedia content includes the following steps:
Step 1: Analyze the content candidate sources in the cloud and assign a corresponding tag combination to each content candidate source; obtain the equivalent sound level change line graph of the audio of each content candidate source, and analyze and calculate the multi-dimensional interference evaluation value; calculate the driving complexity parameter by combining the normalized vehicle speed, lateral acceleration, distance to the vehicle in front, scene influence value and focus influence value.
Step 2: Perform facial recognition using the in-vehicle camera to distinguish between the driver and passengers, and record the passengers as potential users. Determine the pending users based on the initial passenger profiles, including: when a child is in the vehicle, marking the child as the only pending user. Combine the original tag set corresponding to each pending user with the historical viewing and listening data, and obtain the interest tag set through weighted statistics and fusion correction.
[0015] Step 3: Based on the interest vector corresponding to the target tag set and the candidate vectors of the content candidate sources, select multiple content candidate sources to form an initial content library. When the pending user is a child, content candidate sources carrying a restriction tag from the child restriction set are removed from the initial content library. When the driving complexity parameter is greater than or equal to the driving impact threshold, content candidate sources whose multidimensional interference evaluation value is greater than or equal to the multidimensional interference threshold are removed from the initial content library. The library is then replenished with further content candidate sources to obtain the candidate content library.
Step 4: Calculate the matching recommendation value between each content candidate source and the interest vector, and set the content sorting distribution on the recommendation homepage based on the matching recommendation value.
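The two removal rules in Step 3 can be sketched as follows. The data layout (a list of dictionaries with `tags` and `interference` keys) and the function name `screen_library` are hypothetical; the replenishment step is not shown because the text does not describe how replacement sources are chosen.

```python
from typing import Dict, List, Set


def screen_library(
    library: List[Dict],
    pending_is_child: bool,
    child_restriction_set: Set[str],
    complexity: float,
    complexity_threshold: float,
    interference_threshold: float,
) -> List[Dict]:
    """Apply the first (child restriction) and second (interference) screening."""
    kept = []
    for item in library:
        # First screening: drop sources carrying any restricted tag
        # when the pending user is a child.
        if pending_is_child and child_restriction_set & set(item["tags"]):
            continue
        # Second screening: only active when driving is complex enough;
        # then drop sources at or above the interference threshold.
        if (
            complexity >= complexity_threshold
            and item["interference"] >= interference_threshold
        ):
            continue
        kept.append(item)
    return kept
```

Both comparisons use "greater than or equal to", matching the wording of Step 3.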
[0016] The technical solution provided by this invention has the following advantages compared with the known prior art: 1. This invention calculates multidimensional interference assessment values and driving complexity parameters, and then combines the two to filter the pushed content, realizing a dynamic correlation assessment between the acoustic characteristics of the content and driving risks. Compared with the existing technology that only focuses on the matching degree between content and user interests and completely ignores audio interference, this invention can automatically remove audio content with high interference assessment values when driving complexity parameters exceed the standard, avoiding driver startle reflex, cognitive overload or distraction caused by sudden loud noises or continuous high-decibel content, and solving the technical problem of the disconnect between in-vehicle multimedia push and driving safety needs.
[0017] 2. This invention intelligently identifies passenger identity and age attributes, automatically designating child passengers as the primary target for push notifications, and removing inappropriate content based on a child restriction set. Simultaneously, it combines initial passenger profiles with historical viewing data to generate personalized interest tags. Compared to existing technologies that focus on a single driver or logged-in user and fail to consider content adaptation in multi-passenger scenarios, this solution achieves dynamic adaptive adjustment of the push subject and content type, avoiding the negative impact of pushing adult-oriented content to children, and solving the technical problem of lacking scenario-specific and age-appropriate content when multiple passengers coexist.
Attached Figure Description
[0018] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the accompanying drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are merely some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without any creative effort.
[0019] Figure 1 is an overall module block diagram of the present invention.
Detailed Implementation
[0020] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort are within the scope of protection of the present invention.
[0021] The present invention will be further described below with reference to embodiments.
[0022] Referring to Figure 1, a multimedia content intelligent push system, suitable for in-vehicle multimedia content push, includes at least: A pre-analysis unit, which performs pre-analysis on content candidate sources. Content candidate sources are the video sources, audio sources, etc. stored in the cloud as potential push content. Each content candidate source corresponds to one separate source, and each content candidate source is assigned a tag combination and a multi-dimensional interference evaluation value.
[0023] It should be noted that the pre-analysis unit is mainly used for cloud analysis before the push content is assembled. That is, it parses and classifies the content candidate sources during the upload stage, thereby summarizing the content type through a series of tag combinations, and systematically converting the content characteristics of each content candidate source into quantifiable multi-dimensional evaluation values. This gives each content candidate source feature data that can be used for data-driven comparison, which helps with the subsequent screening and combination process and lays the foundation for subsequent steps.
[0024] Specifically, the tag combination construction process is as follows: A resource database is preset, which contains multiple pre-stored candidate sources (content candidate sources stored in advance). The multiple classification tags corresponding to each pre-stored candidate source are obtained (these are the classification tags assigned to the resource file by each resource-providing platform). All classification tags in the resource database are extracted to obtain a tag library. The classification tags in the tag library are identified based on semantic algorithms and summarized to obtain multiple seed tags. Each seed tag corresponds to multiple classification tags, and a seed tag has the same semantics as its corresponding classification tags. The seed tags are combined to form a seed tag library. Based on the classification tags corresponding to a content candidate source, multiple seed tags are determined to form the tag combination corresponding to that content candidate source.
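As a minimal sketch of the last step above, the mapping from platform classification tags to seed tags can be a reverse lookup over the seed tag library. The library contents below are invented examples, and the grouping is hand-written here rather than produced by a semantic algorithm; `build_reverse_index` and `tag_combination` are hypothetical names.

```python
from typing import Dict, Iterable, List

# Hypothetical seed tag library: seed tag -> classification tags that share
# its meaning. In the described system this grouping would come from a
# semantic similarity or clustering step, which is not reproduced here.
SEED_TAG_LIBRARY: Dict[str, List[str]] = {
    "children's stories": ["kids stories", "bedtime story", "children's stories"],
    "news": ["news", "current affairs", "headlines"],
    "rock music": ["rock", "rock music", "rock & roll"],
}


def build_reverse_index(library: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every (lower-cased) classification tag to its seed tag."""
    index: Dict[str, str] = {}
    for seed, tags in library.items():
        for tag in tags:
            index[tag.strip().lower()] = seed
    return index


def tag_combination(
    classification_tags: Iterable[str], reverse_index: Dict[str, str]
) -> List[str]:
    """Return the sorted, de-duplicated seed tags for one content candidate source.
    Classification tags with no known seed tag are skipped."""
    seeds = set()
    for tag in classification_tags:
        key = tag.strip().lower()
        if key in reverse_index:
            seeds.add(reverse_index[key])
    return sorted(seeds)
```

For example, a source tagged `["Rock", "Headlines"]` by one platform and `["rock music", "news"]` by another would receive the same tag combination.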
[0025] It is worth noting that different platforms use different classification standards, so classification tags with the same semantics may be worded differently. If the tags are not consolidated, similar content candidate sources may be pushed differently merely because their classification tags differ. Furthermore, having multiple classification tags for the same meaning leads to redundancy when classifying resources, resulting in too many tags and higher storage and computing costs.
[0026] It should be noted that tag classification based on semantic recognition algorithms essentially utilizes existing mature semantic similarity calculation or clustering algorithms to deduplicate and merge tag sets. This technology is existing and widely used in natural language processing and information retrieval, and will not be elaborated upon further here. By constructing a seed tag library, semantic normalization is achieved, thereby resolving tag redundancy and ambiguity issues and providing a unified, robust, and scalable semantic foundation for recommendation systems. This is a key step in constructing a content candidate source classification definition system.
[0027] Specifically, the process for obtaining the multidimensional interference evaluation value is as follows: Obtain the equivalent sound level (equivalent continuous sound pressure level, using A-weighted sound level) change line graph of the corresponding sound source of the content candidate source. Record the average value of the equivalent sound level as the loudness influence value. Calculate the maximum peak height in the equivalent sound level change line graph and record it as the dynamic influence value. A loudness threshold is preset (usually set to 70dB). Determine the duration influence value based on the length of the portion in the equivalent sound level change line graph that is above the loudness threshold. Normalize the loudness influence value, dynamic influence value, and duration influence value (convert them to dimensionless, the same below) and multiply them by the corresponding preset weight coefficients to obtain the multidimensional interference evaluation value of the corresponding sound source.
[0028] In the line graph, the peak height of any peak is equal to the peak value minus the lower of the two troughs adjacent to it (the greater the peak height, the greater the loudness change of the audio, and the greater the impact on the human ear). The equivalent sound level change line is divided into multiple partial lines by the straight line corresponding to the loudness threshold. The partial lines above the loudness threshold are recorded as target lines. The projection length of a target line on the time axis is recorded as its length. The maximum length is selected as the duration of the effect (the longer the sound source stays above the threshold, the longer its effect on the human ear lasts).
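The three influence values and their weighted combination can be sketched as below, assuming the equivalent sound level is available as evenly spaced samples in dB. The normalization ranges, the weights, and the function names are hypothetical; the text only states that the values are normalized and weighted. The time above the threshold is approximated by counting consecutive samples, which is a simplification of the projection length on the time axis.

```python
from statistics import mean
from typing import Sequence, Tuple


def max_peak_height(levels: Sequence[float]) -> float:
    """Largest (peak value - lower adjacent trough) over all local peaks."""
    best = 0.0
    n = len(levels)
    for p in range(1, n - 1):
        if levels[p] > levels[p - 1] and levels[p] >= levels[p + 1]:
            left = p
            while left > 0 and levels[left - 1] <= levels[left]:
                left -= 1  # walk downhill to the left trough
            right = p
            while right < n - 1 and levels[right + 1] <= levels[right]:
                right += 1  # walk downhill to the right trough
            best = max(best, levels[p] - min(levels[left], levels[right]))
    return best


def longest_time_above(levels: Sequence[float], threshold: float, dt: float) -> float:
    """Longest continuous stretch above the threshold, as sample count * dt."""
    longest = run = 0
    for x in levels:
        run = run + 1 if x > threshold else 0
        longest = max(longest, run)
    return longest * dt


def scale(value: float, low: float, high: float) -> float:
    """Min-max normalization clipped to [0, 1]; the ranges are assumptions."""
    return min(1.0, max(0.0, (value - low) / (high - low)))


def interference_value(
    levels: Sequence[float],
    dt: float = 1.0,
    threshold: float = 70.0,
    weights: Tuple[float, float, float] = (0.4, 0.3, 0.3),  # hypothetical weights
) -> float:
    loudness = scale(mean(levels), 40.0, 100.0)          # assumed 40-100 dB range
    dynamic = scale(max_peak_height(levels), 0.0, 40.0)  # assumed 0-40 dB range
    duration = scale(longest_time_above(levels, threshold, dt), 0.0, 60.0)
    w1, w2, w3 = weights
    return w1 * loudness + w2 * dynamic + w3 * duration
```

With weights that sum to 1 and inputs clipped to [0, 1], the result also stays in [0, 1], which makes it straightforward to compare against a single interference threshold.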
[0029] It should be noted that by linearly weighting and fusing the average loudness, the maximum dynamic impact, and the duration above the loudness threshold, the time-varying physical characteristics of the audio are compressed into a single value, the multidimensional interference evaluation value. The weighting coefficients reflect the contribution of each dimension to driving interference. The resulting value quantifies the potential interference capability of the audio content, providing an intuitive basis for safety decisions in in-vehicle intelligent push that can be compared against a threshold.
[0030] The real-time analysis unit identifies the driver and passengers using an in-vehicle camera (using facial recognition technology) and assigns them a user ID. The user ID is bound to their facial data. A database is set up for each user to record their usage data. When both the driver and passengers are present, the passengers are the primary push recipients. When only the driver is present, the driver is the primary push recipient. This invention mainly targets the former (the latter's content push can use existing technology). It should be noted that vehicles in the current technology are usually equipped with in-vehicle cameras (some vehicles have cameras in both the front and rear rows). The cameras are used for face detection and other related data collection. The collected image data is processed only locally, and the processing method mainly adopts edge computing to avoid the leakage of privacy information.
[0031] Initial profiles of multiple occupants in the vehicle are determined based on image data captured by cameras. The initial profiles are divided into children, adult males, and adult females. The presence or absence of children in the initial profiles determines potential users. Video viewing and audio listening data during the use of in-vehicle multimedia are recorded as viewing and listening data. A set of interest tags is generated based on the initial profiles of potential users and their historical viewing and listening data.
[0032] Specifically, the process for determining pending users is as follows: When a pressure sensor detects that any passenger seat is occupied, the camera captures image data of the corresponding seat area. An existing face detection model identifies the visible face corresponding to each seat area, and a deep learning model extracts the facial features of the visible face and inputs them into an existing age and gender estimation model to determine the gender and age corresponding to the face data, thus forming the initial profile of the passenger corresponding to each seat area. If none of the occupants' initial profiles is a child, all occupants in the vehicle are marked as pending users. If the occupants' initial profiles include a child, the child occupants are marked as pending users, and the other occupants are marked as accompanying users.
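The marking rule at the end of this paragraph can be sketched as follows, assuming the initial profiles have already been produced by the face and age/gender models. The `Occupant` type and the profile strings are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Occupant:
    seat: str
    profile: str  # one of "child", "adult_male", "adult_female"


def split_users(occupants: List[Occupant]) -> Tuple[List[Occupant], List[Occupant]]:
    """Return (pending_users, accompanying_users).

    If any occupant is a child, only the children are pending users and
    everyone else is an accompanying user; otherwise all occupants are
    pending users and nobody is accompanying.
    """
    children = [o for o in occupants if o.profile == "child"]
    if children:
        accompanying = [o for o in occupants if o.profile != "child"]
        return children, accompanying
    return list(occupants), []
```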
[0033] It should be noted that when the in-vehicle multimedia system is in use, all occupants in the vehicle can hear and see it. Therefore, under normal circumstances, all occupants are considered as potential users. However, if there are children in the vehicle, the push notifications should be primarily directed at the children. This controls the push notification mode to prioritize children, making the push results more consistent with the current in-vehicle push scenario, avoiding erroneous pushes that could cause adverse effects, and improving the user experience.
[0034] It is worth noting that existing technologies can support the recognition and analysis of facial attributes. Typically, a camera device is used to acquire a portrait image containing a face, and a face detection algorithm is used to locate the face region in the image. Then, a deep learning model is used to extract features from the face region, thereby realizing facial attribute analysis, including gender recognition and age estimation. This will not be elaborated further here.
[0035] Specifically, the process for determining the interest tag set is as follows: Multiple original tag sets are preset, each containing multiple type tags (using seed tags) and an interest index for each type tag (represented by a value in the range 0-1; the interest indices of all type tags in an original tag set sum to 1; every original tag set has the same number of items). Each initial profile corresponds to one original tag set. Based on the initial profile of the pending user, the corresponding original tag set is obtained and recorded as the tag base set. The viewing and listening data of the pending user from past rides are obtained. The tag base set is then fused with the tag data in the historical viewing and listening data to generate the corresponding interest tag set.
[0036] It should be noted that the original tag sets are obtained through big data statistics and correspond to the degree of interest of adult men, adult women, or children in different push content tags. The tag base set is used to determine the approximate interest range of the corresponding pending user. Within this range, the tags are then refined according to the user's historical viewing data, thereby improving the matching degree of the interest tag set. As a result, the interest tag set is constructed differently for each person, which improves the personalization of the pushed content.
[0037] The fusion proceeds as follows: Determine the interest index corresponding to each type tag in the original tag set and construct the original dataset, which contains multiple interest index values with serial numbers. Obtain the content tag, occurrence time, viewing duration, and viewing completeness (i.e., the proportion of the content that was actually viewed or listened to, with a value of 0-1) corresponding to each viewing record in the historical viewing data. Based on these data, perform weighted statistics on each type tag in the historical viewing records to obtain the historical interest frequency of each type tag. The original dataset is written here as S = {s_1, s_2, ..., s_j}, where i is the index of the type tag, i = 1, 2, 3, ..., j, and j is the total number of type tags (constant in this invention); s_i is the interest index value corresponding to type tag i, and f_i is the historical interest frequency of type tag i. The calculation formula for f_i (not reproduced in this text) is defined over the historical audiovisual records, where n is the sequence number of a record, Δt_n is the interval between the time the record occurred and the current time, T_n is the duration of listening or viewing, T_0 is the preset baseline duration coefficient (set to 5 minutes in one specific embodiment), c_n is the completeness of listening or viewing, and g(Δt_n) is the function representing the effect of the time interval, in which λ is a preset constant (the attenuation coefficient).
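Because the formula itself does not survive in this text, the sketch below shows only one plausible reading of the weighted statistics: each record contributes the product of an exponential time decay with attenuation coefficient λ, the ratio of viewing duration to the baseline duration, and the completeness. The exponential form, the product, and the ratio are all assumptions, as are the function name and the choice of days and minutes as units.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One record: (content tag, days since the record occurred,
#              viewing duration in minutes, completeness in 0-1)
Record = Tuple[str, float, float, float]


def historical_interest_frequency(
    records: Iterable[Record],
    baseline_minutes: float = 5.0,  # baseline duration from the embodiment
    decay: float = 0.1,             # hypothetical attenuation coefficient
) -> Dict[str, float]:
    """Assumed form: f_i = sum over records tagged i of
    exp(-decay * interval) * (duration / baseline) * completeness."""
    freq: Dict[str, float] = defaultdict(float)
    for tag, days_ago, minutes, completeness in records:
        time_effect = math.exp(-decay * days_ago)  # older records count less
        freq[tag] += time_effect * (minutes / baseline_minutes) * completeness
    return dict(freq)
```

Under this reading, a recent, long, fully watched item raises the frequency of its tag the most, while old or barely started items contribute little.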
[0038] Furthermore, the interest index values in the original dataset are fused and corrected based on the historical interest frequencies corresponding to each type of label to obtain a corrected dataset, and the set of interest labels is obtained from the corrected dataset.
[0039] The interest index value of each type tag is then adjusted: the corrected interest index value corresponding to the type tag with index i, written here as s'_i, is computed from s_i and f_i using a preset fusion coefficient k (the formula is not reproduced in this text). After normalization, the corrected values form the corrected dataset S' = {s'_1, s'_2, ..., s'_j}.
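Since the adjustment formula is not legible here, the following is only an illustrative sketch under stated assumptions: the correction is taken to be additive (base interest index plus k times the historical interest frequency), and normalization is taken to mean dividing by the sum so that the corrected indices add up to 1. Both choices, and the name `fuse_and_normalize`, are assumptions.

```python
from typing import Dict


def fuse_and_normalize(
    base: Dict[str, float],  # tag base set: type tag -> interest index
    freq: Dict[str, float],  # type tag -> historical interest frequency
    k: float = 0.5,          # preset fusion coefficient (value is hypothetical)
) -> Dict[str, float]:
    """Assumed additive correction followed by sum-to-one normalization."""
    corrected = {tag: s + k * freq.get(tag, 0.0) for tag, s in base.items()}
    total = sum(corrected.values())
    if total == 0:
        return corrected  # nothing to normalize
    return {tag: value / total for tag, value in corrected.items()}
```

The set of type tags is taken from the tag base set, which keeps the number of type tags constant as the description requires; a tag with no history simply keeps its base value before normalization.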
[0040] The system acquires current vehicle driving status data and road condition information, analyzes and calculates current driving complexity parameters, which represent the driver's attention requirements under current driving conditions. The higher the value of the driving complexity parameter, the higher the driver's attention needs to ensure driving safety under these conditions.
[0041] Specifically, the calculation process for the driving complexity parameter is as follows: The system acquires vehicle speed, lateral acceleration, and distance to the vehicle ahead using sensors. It also obtains current road conditions and weather information through navigation software. Based on the road conditions and weather, a road condition impact value and a weather impact value are assigned respectively (the larger the impact value, the more unfavorable it is for driving). These are summed to obtain the scene impact value. The system detects the driver's attention using an in-vehicle camera and calculates the focus impact value (the larger the focus impact value, the more focused the driver is). The system combines the normalized vehicle speed, lateral acceleration, distance to the vehicle ahead, scene impact value, and focus impact value to calculate the driving complexity parameter.
[0042] The driving complexity parameter, written here as C, is calculated from v, a, and d, which are the normalized vehicle speed, lateral acceleration, and distance to the vehicle in front, respectively, together with the normalized scene impact value and focus impact value. Each term carries a preset weighting coefficient (the formula is not reproduced in this text).
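One simple way to combine the five normalized inputs with preset weighting coefficients is a weighted sum, sketched below. The text does not state the sign or orientation of each term (for example, whether a short following distance or a low focus value should raise the result), so this sketch leaves that to the caller: each input is assumed to be already oriented so that a higher value means a more demanding situation. The feature names and weights are hypothetical.

```python
from typing import Dict


def driving_complexity(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of normalized inputs (an assumed form, not the patent's formula).

    Every key in `weights` must be present in `features`.
    """
    missing = set(weights) - set(features)
    if missing:
        raise KeyError(f"missing normalized inputs: {sorted(missing)}")
    return sum(weights[name] * features[name] for name in weights)


# Hypothetical example values, all already normalized to 0-1.
EXAMPLE_FEATURES = {
    "speed": 0.5,
    "lateral_acceleration": 0.2,
    "distance": 0.4,
    "scene": 0.5,
    "focus": 1.0,
}
EXAMPLE_WEIGHTS = {name: 0.2 for name in EXAMPLE_FEATURES}
```

The result can then be compared with the driving impact threshold to decide whether the interference-based screening should be applied.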
[0043] It should be noted that the driving complexity parameter integrates vehicle motion status, external environmental risks, and the driver's own concentration level, accurately depicting the real-time difficulty of the driving task and representing it with a quantitative value. The higher the parameter value, the more complex and risky the current driving task is, and the less cognitive margin the driver has available for handling non-driving tasks (such as receiving multimedia content). By calculating the driving complexity parameter, a decision-making basis is provided for the intelligent recommendation of in-vehicle multimedia content, avoiding distraction or information overload caused by inappropriate recommendations.
[0044] More specifically, the road condition impact value is assigned based on the speed limit value corresponding to the road. There are five preset speed limit gradients. The road condition impact value is assigned 1, 2, 3, 4 or 5 according to the speed limit gradient corresponding to the speed limit value of the road (the higher the speed limit, the faster the average speed of the vehicle, and the higher the degree of danger in the event of an accident). Weather scenarios are divided into sunny, rainy / snowy, and foggy days, with corresponding weather impact values of 1, 2, and 3, respectively.
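The assignment above can be sketched as a pair of lookups. The speed-limit boundaries between the five gradients are not given in the text, so the values in `SPEED_LIMIT_BOUNDS` are hypothetical, as are the weather keys and the min-max normalization at the end.

```python
import bisect

# Hypothetical upper bounds (km/h) of the first four of the five gradients.
SPEED_LIMIT_BOUNDS = [30, 50, 70, 100]

# Weather impact values as stated in the text; the key names are illustrative.
WEATHER_IMPACT = {"sunny": 1, "rain_snow": 2, "fog": 3}


def road_impact(speed_limit_kmh: float) -> int:
    """Map a road's speed limit to an impact value from 1 (slowest) to 5 (fastest)."""
    return bisect.bisect_left(SPEED_LIMIT_BOUNDS, speed_limit_kmh) + 1


def scene_impact(speed_limit_kmh: float, weather: str) -> int:
    """Scene impact value: road condition impact plus weather impact (2 to 8)."""
    return road_impact(speed_limit_kmh) + WEATHER_IMPACT[weather]


def normalized_scene_impact(speed_limit_kmh: float, weather: str) -> float:
    """Min-max normalization to [0, 1]; an assumed choice of normalization."""
    return (scene_impact(speed_limit_kmh, weather) - 2) / 6
```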
[0045] More specifically, the in-vehicle camera captures an image of the driver's face and detects the driver's eyelid opening (the distance between the upper and lower eyelids), pupil offset (the distance between the midpoint of the line connecting the centers of the two pupils and the head center line through the bridge of the nose), and head coordinates (the planar coordinates of the lowest point of the chin). Based on the change in each value within a preset fixed detection time, the first change (eyelid shrinkage), the second change (gaze offset), and the third change (head offset, that is, the distance the coordinates move) are calculated. After normalization, these changes are multiplied by the corresponding preset weights to calculate the focus impact value.
[0046] It should be noted that by calculating the focus impact value, a quantitative assessment of the driver's attention shift can be achieved, thereby determining whether the driver is currently in a focused driving state. The lower the value, the higher the degree of driver attention shift, and the more it affects safe driving.
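The text states that a larger focus impact value means a more focused driver, while the three measured changes all grow as attention drifts. The sketch below therefore assumes the value is one minus the weighted sum of the normalized changes. That inversion, the weights, and the normalization scales are all assumptions for illustration.

```python
def normalize(value: float, max_value: float) -> float:
    """Clamp value / max_value into [0, 1]."""
    return min(max(value / max_value, 0.0), 1.0)


def focus_impact(
    eyelid_shrinkage: float,
    gaze_offset: float,
    head_offset: float,
    weights=(0.4, 0.4, 0.2),      # hypothetical preset weights, summing to 1
    scales=(8.0, 10.0, 120.0),    # hypothetical maxima used for normalization
) -> float:
    """Return a value in [0, 1]; 1 means no measured drift in attention."""
    changes = (eyelid_shrinkage, gaze_offset, head_offset)
    drift = sum(w * normalize(c, s) for w, c, s in zip(weights, changes, scales))
    return 1.0 - drift
```

A driver whose eyelids, gaze, and head stay still over the detection window scores 1.0, and the score falls as any of the three changes grows.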
[0047] The push filtering unit constructs an initial content library composed of multiple content candidate sources based on the interest tag sets of multiple potential users. It performs a first screening of the initial content library based on the initial profiles of the potential users, and a second screening of the initial content library based on the driving complexity parameter combined with the multi-dimensional interference evaluation values, to obtain the candidate content library.
[0048] It should be noted that the logic for push notifications is usually based primarily on user preferences and secondarily on the actual scenario. This allows the push content to be adaptively adjusted according to changes in the actual scenario while taking into account user needs, thereby improving the user experience.
[0049] Specifically, the initial content library is constructed as follows: Multiple interest tag sets are obtained and fused to obtain a target tag set. An interest vector is constructed based on the interest index corresponding to each type of tag in the target tag set. A candidate vector is constructed based on the tag combination corresponding to each content candidate source. A library capacity is preset (40 in a specific embodiment). Multiple content candidate sources are selected to form an initial content library that satisfies the following conditions: Condition 1: The number of content candidate sources in the initial content library is equal to the library capacity; Condition 2: The matching degree between the sum of the candidate vectors of the selected content candidate sources and the interest vector is greater than a preset matching threshold.
[0050] The matching degree between vectors is equal to their cosine similarity. A higher matching degree means that the two vectors tend to point in the same direction in the multidimensional space, thus reflecting that their numerical distribution characteristics (i.e., the relative proportions of values in each dimension) are more similar. Suppose the set of interest tags is {comedy, action, science fiction, romance, horror}, and the interest indices corresponding to comedy, action, science fiction, romance, and horror are 0.9, 0.2, 0.8, 0.1, and 0, respectively. Then the interest vector is represented as [0.9, 0.2, 0.8, 0.1, 0]. When the tag combination corresponding to the content candidate source is {comedy, romance}, then its corresponding candidate vector is [1, 0, 0, 1, 0]. When the tag combination corresponding to the content candidate source is {action, science fiction}, then its corresponding candidate vector is [0, 1, 1, 0, 0].
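The example vectors above can be checked numerically. The short sketch below computes the cosine similarity for the two candidate vectors and for their sum; the function and variable names are illustrative only.

```python
import math


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Tag order: comedy, action, science fiction, romance, horror
interest = [0.9, 0.2, 0.8, 0.1, 0.0]
comedy_romance = [1, 0, 0, 1, 0]
action_scifi = [0, 1, 1, 0, 0]
combined = [a + b for a, b in zip(comedy_romance, action_scifi)]

match_comedy_romance = cosine_similarity(interest, comedy_romance)
match_action_scifi = cosine_similarity(interest, action_scifi)
match_combined = cosine_similarity(interest, combined)
```

Each candidate alone has a matching degree of about 0.577 with the interest vector, while the sum of the two reaches about 0.816, which illustrates why the library is matched as a whole rather than source by source.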
[0051] It should be noted that by requiring the sum of the tag distributions of all candidate sources (i.e., the overall vector) to match the user's interest vector with a degree exceeding a threshold, it is possible to ensure that the tag ratio of the entire content library is consistent with the user's diverse interests and preferences under limited storage or display resources. At the same time, it provides a high-quality candidate pool with balanced coverage and interest relevance for secondary filtering in subsequent real-time scenarios. Compared to pursuing a high match of a single piece of content, it does not just satisfy a single interest in a scattered way, but covers the user's diverse interest fields as a whole, avoiding content homogenization caused by recommending only a single highly matched individual.
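The text does not say how the content candidate sources are chosen so that both conditions hold. One simple possibility, shown here purely as an assumed heuristic, is a greedy selection that repeatedly adds the source that brings the summed candidate vector closest to the interest vector, then checks the threshold once the library is full.

```python
import math


def cosine(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_initial_library(candidates: dict, interest: list, capacity: int, threshold: float):
    """Greedy sketch (an assumption, not the patented procedure).

    Returns the selected source ids, or None if the library cannot be filled
    to `capacity` or its summed vector does not exceed `threshold`.
    """
    remaining = dict(candidates)
    total = [0.0] * len(interest)
    chosen = []
    while len(chosen) < capacity and remaining:
        # Pick the source whose addition gives the best match for the running sum.
        best = max(
            remaining,
            key=lambda cid: cosine([t + c for t, c in zip(total, remaining[cid])], interest),
        )
        total = [t + c for t, c in zip(total, remaining.pop(best))]
        chosen.append(best)
    if len(chosen) < capacity or cosine(total, interest) <= threshold:
        return None
    return chosen


sources = {
    "a": [1, 0, 0, 1, 0],
    "b": [0, 1, 1, 0, 0],
    "c": [0, 0, 0, 0, 1],
    "d": [1, 0, 1, 0, 0],
}
library = build_initial_library(sources, [0.9, 0.2, 0.8, 0.1, 0.0], capacity=2, threshold=0.8)
```

A greedy pass is not guaranteed to find a qualifying library even when one exists; an exhaustive or optimization-based search could replace it where the candidate pool is small enough.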
[0052] More specifically, the value of any element in the target tag set is equal to the average of the corresponding elements in the multiple interest tag sets. For example, if the interest indices for the science fiction tag in three interest tag sets are 0.3, 0.5, and 0.6 respectively, then the interest index for the science fiction tag in the target tag set is their average, (0.3 + 0.5 + 0.6) / 3 ≈ 0.47.
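The element-wise averaging can be sketched as follows. Representing each interest tag set as a dictionary and treating a tag that is absent from a set as having an interest index of 0 are assumptions made for this illustration. For reference, the mean of 0.3, 0.5, and 0.6 is about 0.467.

```python
def fuse_interest_tag_sets(tag_sets: list) -> dict:
    """Average each tag's interest index across all sets.

    Assumption: a tag missing from a set contributes an index of 0.
    """
    tags = sorted({tag for tag_set in tag_sets for tag in tag_set})
    return {
        tag: sum(tag_set.get(tag, 0.0) for tag_set in tag_sets) / len(tag_sets)
        for tag in tags
    }


target = fuse_interest_tag_sets([
    {"science fiction": 0.3, "comedy": 0.9},
    {"science fiction": 0.5},
    {"science fiction": 0.6, "comedy": 0.3},
])
```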
[0053] Specifically, the first screening process is as follows: When the potential user is a child, a preset child restriction set is obtained. The child restriction set contains multiple restriction tags. Content candidate sources containing any restriction tag are removed from the initial content library. Removing the content candidate sources that match the child restriction set ensures that the initial content library only contains content suitable for children to watch and listen to.
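A minimal sketch of this first screening, assuming each content candidate source is represented by the set of tags in its tag combination. The restriction tags listed in `CHILD_RESTRICTION_SET` are hypothetical examples; the text only says the set is preset.

```python
# Hypothetical restriction tags for illustration.
CHILD_RESTRICTION_SET = {"horror", "violence"}


def first_screening(library: dict, user_is_child: bool,
                    restricted: set = CHILD_RESTRICTION_SET) -> dict:
    """Drop every source whose tag combination shares a tag with the restriction set."""
    if not user_is_child:
        return dict(library)
    return {cid: tags for cid, tags in library.items() if not (tags & restricted)}


initial_library = {
    "cartoon": {"comedy", "animation"},
    "thriller": {"action", "violence"},
    "ghost_story": {"horror"},
}
```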
[0054] Specifically, the second screening process is as follows: A driving impact threshold and a multi-dimensional interference threshold are preset. When the driving complexity parameter is greater than or equal to the driving impact threshold, content candidate sources whose multi-dimensional interference evaluation value is greater than or equal to the multi-dimensional interference threshold are removed from the initial content library.
[0055] It should be noted that when the driving complexity parameter is greater than or equal to the driving impact threshold, the current driving task is complex and risky, and the driver has little cognitive margin left for non-driving tasks. In this situation, the interference of in-vehicle multimedia playback with the driver's driving behavior should be reduced, especially the audio impact. Removing content candidate sources whose multi-dimensional interference evaluation value is greater than or equal to the multi-dimensional interference threshold keeps out the sources with disruptive sound, thereby helping the driver maintain attention.
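The threshold logic of the second screening can be sketched as below. Both threshold values are hypothetical, and the mapping from source id to multi-dimensional interference evaluation value is assumed to have been computed in advance.

```python
def second_screening(
    interference: dict,
    driving_complexity: float,
    driving_impact_threshold: float = 0.6,   # hypothetical preset value
    interference_threshold: float = 0.5,     # hypothetical preset value
) -> list:
    """Return the ids of sources that survive the second screening."""
    if driving_complexity < driving_impact_threshold:
        # Driving is simple enough: nothing is removed.
        return list(interference)
    return [cid for cid, value in interference.items() if value < interference_threshold]


scores = {"podcast": 0.2, "action_movie": 0.8, "pop_music": 0.5}
```

Note that both comparisons are inclusive on the removal side, so a source whose evaluation value equals the interference threshold is removed once the complexity reaches the driving impact threshold.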
[0056] The push recommendation unit calculates, based on the user's interest vector and the candidate vectors of the content candidate sources, the matching recommendation value (cosine similarity) between the interest vector and each content candidate source in the initial content library. The content ranking distribution on the homepage is then set according to the ranking of the matching recommendation values. The homepage recommendation content gives the user an initial selection set, and the content ranking can be adjusted according to the user's habits, which improves the user experience and allows the recommended content combination to be updated based on subsequent user actions.
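The ranking step amounts to sorting the sources by their cosine similarity to the interest vector, highest first. The sketch below assumes the candidate vectors are held in a dictionary keyed by source id; the names are illustrative.

```python
import math


def cosine(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_homepage(candidate_vectors: dict, interest: list) -> list:
    """Return source ids ordered by descending matching recommendation value."""
    return sorted(
        candidate_vectors,
        key=lambda cid: cosine(candidate_vectors[cid], interest),
        reverse=True,
    )


# Tag order: comedy, action, science fiction, romance, horror
ranking = rank_homepage(
    {
        "horror_only": [0, 0, 0, 0, 1],
        "comedy_romance": [1, 0, 0, 1, 0],
        "comedy_scifi": [1, 0, 1, 0, 0],
    },
    [0.9, 0.2, 0.8, 0.1, 0.0],
)
```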
[0057] A method for intelligently pushing multimedia content includes the following steps:
Step 1: Analyze the content candidate sources in the cloud. Extract all classification tags, summarize them into seed tags using a semantic algorithm, build a seed tag library, and assign a corresponding tag combination to each content candidate source. Obtain the equivalent sound level change line graph of each content candidate source's sound source, calculate the average loudness impact value, the maximum peak dynamic impact value, and the continuous impact value above the loudness threshold, and obtain the multi-dimensional interference evaluation value by normalized weighted summation.
Step 2: Use the in-vehicle camera to perform facial recognition to distinguish the driver from the passengers, and determine the potential users based on the passengers' initial profiles. By combining the original tag set corresponding to each potential user with historical viewing data (including tags, duration, completeness, etc.), a set of interest tags is obtained through weighted statistics and fusion correction. When a child is present in the vehicle, the child is automatically marked as the potential user, while the other passengers are designated as accompanying users. This proactively adapts the pushed content to the child's needs, avoids content that could have a negative impact, and improves scenario matching and user experience. Because the original tag set corresponding to each passenger's profile is combined with that passenger's historical viewing data through weighted statistics and fusion correction, the generated interest tag set varies from person to person, which improves the accuracy and relevance of the interest profiles.
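The multi-dimensional interference evaluation value from Step 1 can be sketched from a sampled equivalent sound level series. Several details here are assumptions: the "wave height" is taken as the difference between the highest and lowest level in the series, the continuous impact is the longest run of consecutive samples above the loudness threshold, and the threshold, weights, and normalization constants are hypothetical.

```python
def longest_run_above(levels: list, threshold: float) -> int:
    """Length of the longest run of consecutive samples strictly above threshold."""
    longest = current = 0
    for level in levels:
        current = current + 1 if level > threshold else 0
        longest = max(longest, current)
    return longest


def interference_value(
    levels_db: list,
    loudness_threshold_db: float = 70.0,  # hypothetical preset threshold
    weights=(0.4, 0.3, 0.3),              # hypothetical preset weights
    max_level_db: float = 100.0,          # hypothetical normalization constant
) -> float:
    loudness = sum(levels_db) / len(levels_db)   # average loudness
    dynamic = max(levels_db) - min(levels_db)    # assumed meaning of wave height
    sustained = longest_run_above(levels_db, loudness_threshold_db)
    normalized = (
        min(loudness / max_level_db, 1.0),
        min(dynamic / max_level_db, 1.0),
        sustained / len(levels_db),
    )
    return sum(w * n for w, n in zip(weights, normalized))


quiet = [40.0] * 10
loud = [60, 85, 90, 88, 50, 45, 80, 82, 60, 55]
```

Under these assumptions a steady, quiet track scores low on all three terms, while a track with loud sustained passages and large swings scores higher and is more likely to be removed in the second screening.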
[0058] Step 3: Obtain the vehicle speed, lateral acceleration, and distance to the vehicle ahead through sensors; obtain the driving scene and weather information through navigation software; assign the road condition impact value and the weather impact value and sum them to obtain the scene impact value; detect the driver's attention through the in-vehicle camera and calculate the focus impact value; and calculate the driving complexity parameter by normalizing and combining the vehicle speed, lateral acceleration, distance to the vehicle ahead, scene impact value, and focus impact value. The target tag set is obtained by fusing the interest tag sets of multiple potential users. An interest vector is constructed based on the target tag set, and a candidate vector is constructed based on the tag combination of each content candidate source. A library capacity is preset, and multiple content candidate sources are selected to form an initial content library such that the matching degree between the sum of their candidate vectors and the interest vector is greater than the preset matching threshold. When the potential user is a child, content candidate sources containing restriction tags from the child restriction set are removed from the initial content library. When the driving complexity parameter is greater than or equal to the driving impact threshold, content candidate sources whose multi-dimensional interference evaluation value is greater than or equal to the multi-dimensional interference threshold are removed from the initial content library. Content candidate sources are then replenished to obtain the candidate content library, and every replenished source must itself pass the above screenings. Calculating the driving complexity parameter quantifies the difficulty of the current driving task and the driver's cognitive margin, providing a real-time basis for safety decisions.
When the initial content library is constructed, a threshold is set on the matching degree between the summed tag distribution of all candidate sources and the user's interest vector, so that the overall tag ratio of the content library aligns with the user's diverse interests within the limited library capacity. In the first screening, content carrying restriction tags is removed for child users, ensuring content suitability. In the second screening, when the driving complexity parameter reaches the threshold, content with high interference evaluation values is removed, which reduces audio interference with the driver's attention and balances safety against user experience.
[0059] Step 4: Based on the interest vector of the potential users and the candidate vectors of the content candidate sources in the content library, calculate the matching recommendation value (cosine similarity) between each content candidate source and the interest vector, and set the content ranking distribution of the content recommendation homepage according to the ranking of the matching recommendation values.
[0060] Calculating the matching recommendation value (cosine similarity) between each content candidate source in the candidate content library and the interest vector of the potential users, and ordering the homepage content accordingly, provides the user with an initial personalized selection set.
[0061] The above embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions will not cause the essence of the corresponding technical solutions to deviate from the protection scope of the technical solutions of the embodiments of the present invention.
Claims
1. An intelligent push system for multimedia content, characterized in that the system comprises: a pre-analysis unit, which pre-analyzes content candidate sources and assigns each content candidate source a tag combination and a multi-dimensional interference evaluation value; a real-time analysis unit, which identifies the driver and the passengers in the vehicle, determines an initial profile of each passenger based on image data captured by a camera, the initial profile being one of child, adult male, and adult female, and determines the potential users to serve as push targets based on the initial profiles; records video-watching and audio-listening data generated during use of the in-vehicle multimedia as viewing and listening data, and generates an interest tag set based on the initial profile of each potential user and that user's historical viewing and listening data; and obtains the vehicle's current driving status data and road condition information and calculates the current driving complexity parameter; a push filtering unit, which constructs an initial content library composed of multiple content candidate sources based on the interest tag sets of multiple potential users, performs a first screening of the initial content library based on the initial profiles of the potential users, and performs a second screening of the initial content library based on the driving complexity parameter and the multi-dimensional interference evaluation values to obtain a candidate content library; and a push recommendation unit, which calculates a matching recommendation value between the interest vector and each content candidate source in the initial content library and sets the content ranking distribution of the content recommendation homepage according to the matching recommendation values.
2. The intelligent push system for multimedia content of claim 1, wherein the tag combination is constructed as follows: a resource database is preset, the resource database containing multiple pre-stored candidate sources; the multiple classification tags corresponding to each pre-stored candidate source are obtained; all classification tags in the resource database are extracted to obtain a total tag library; the classification tags in the total tag library are identified based on a semantic algorithm and summarized into multiple seed tags to form a seed tag library; and based on the classification tags corresponding to a content candidate source, the multiple seed tags corresponding to that content candidate source are determined to form its tag combination.
3. The intelligent push system for multimedia content of claim 1, wherein the multi-dimensional interference evaluation value is obtained as follows: an equivalent sound level change line graph of the sound source corresponding to the content candidate source is obtained; the average value of the equivalent sound level is recorded as the loudness impact value; the maximum wave height in the equivalent sound level change line graph is calculated as the dynamic impact value; a loudness threshold is preset, and the maximum length of the portion of the equivalent sound level change line graph that is higher than the loudness threshold is determined as the continuous impact value; and the loudness impact value, the dynamic impact value, and the continuous impact value are normalized, multiplied by the corresponding preset weight coefficients, and summed to obtain the multi-dimensional interference evaluation value of the sound source.
4. The intelligent push system for multimedia content of claim 1, wherein the potential users are determined as follows: when a person is detected in any passenger seat, image data of the corresponding seat area is obtained; the visible face corresponding to each seat area is identified based on a face detection model; the facial features of each visible face are extracted and input into an age and gender estimation model to determine the gender and age for the face data; and an initial profile of the passenger corresponding to each seat area is constructed; when the initial profiles of the passengers in the vehicle do not include a child, all passengers in the vehicle are marked as potential users; and when the initial profiles of the passengers in the vehicle include a child, the child passengers in the vehicle are marked as potential users and the other passengers are marked as accompanying users.
5. The intelligent push system for multimedia content of claim 1, wherein the interest tag set is determined as follows: a plurality of original tag sets are preset, each original tag set containing a plurality of type tags and an interest index for each type tag, and each initial profile corresponding to one original tag set; the original tag set corresponding to the initial profile of the potential user is obtained and denoted as the tag base set; and the potential user's viewing and listening data from past vehicle use is obtained, and the tag base set and the historical viewing and listening data are fused to generate the corresponding interest tag set.
6. The intelligent push system for multimedia content of claim 5, wherein the fusion process is as follows: the interest index corresponding to each type tag in the original tag set is determined and an original data set is constructed, the original data set containing a plurality of interest index values with serial numbers; the content tag, occurrence time, listening duration, and listening completeness corresponding to each listening record in the historical viewing and listening data are obtained, and the historical interest frequency of each type tag is obtained by weighted statistics based on the content tag, occurrence time, listening duration, and listening completeness; and based on the historical interest frequency of each type tag, the interest index values in the original data set are fused and corrected to obtain a corrected data set, and the interest tag set is obtained based on the corrected data set.
7. The intelligent push system for multimedia content of claim 1, wherein the driving complexity parameter is calculated as follows: the vehicle speed, lateral acceleration, and distance to the vehicle ahead are obtained; the current road conditions and weather are obtained; a road condition impact value and a weather impact value are assigned based on the road conditions and the weather respectively, and summed to obtain a scene impact value; an image of the driver is captured, attention detection is performed based on the driver's facial feature data in the image, and a focus impact value is calculated; and the driving complexity parameter is calculated by normalizing and combining the vehicle speed, lateral acceleration, distance to the vehicle ahead, scene impact value, and focus impact value.
8. The intelligent push system for multimedia content of claim 7, wherein the focus impact value is obtained as follows: the driver's eyelid opening, pupil offset, and head coordinates are detected in the image; the change in each value within a preset fixed detection time is recorded; the corresponding first change, second change, and third change are calculated; and the focus impact value is calculated after normalization.
9. The intelligent push system for multimedia content of claim 1, wherein the initial content library is constructed as follows: a target tag set is obtained by fusing a plurality of interest tag sets; an interest vector is constructed based on the interest index corresponding to each type tag in the target tag set; a candidate vector is constructed based on the tag combination corresponding to each content candidate source; a library capacity is preset; and a plurality of content candidate sources are selected to form an initial content library that satisfies the following conditions: condition one: the number of content candidate sources in the initial content library is equal to the library capacity; condition two: the matching degree between the sum of the candidate vectors of the plurality of content candidate sources and the interest vector is greater than a preset matching threshold.
10. A method for intelligently pushing multimedia content, applied to the intelligent push system for multimedia content of any of claims 1 to 9, characterized in that the method comprises the following steps: Step one: analyze the content candidate sources in the cloud, assign a corresponding tag combination to each content candidate source, obtain the equivalent sound level change line graph of the sound source of each content candidate source, calculate a multi-dimensional interference evaluation value, and calculate the driving complexity parameter by normalizing and combining the vehicle speed, lateral acceleration, distance to the vehicle ahead, scene impact value, and focus impact value; Step two: perform face recognition through the in-vehicle camera, distinguish the driver from the passengers, and mark passengers as potential users based on the passengers' initial profiles, wherein, when a child is present in the vehicle, the child is marked as the only potential user; and obtain a set of interest tags through weighted statistics and fusion correction by combining the original tag set corresponding to the potential user with historical viewing data; Step three: based on the interest vector corresponding to the target tag set and the candidate vectors of the content candidate sources, select multiple content candidate sources to form an initial content library; when the potential user is a child, remove from the initial content library the content candidate sources containing a restriction tag in the child restriction set; when the driving complexity parameter is greater than or equal to the driving impact threshold, remove from the initial content library the content candidate sources whose multi-dimensional interference evaluation value is greater than or equal to the multi-dimensional interference threshold; and replenish content candidate sources to obtain the candidate content library; Step four: calculate the matching recommendation value between each content candidate source and the interest vector, and set the content ranking distribution of the recommendation homepage based on the matching recommendation values.