Information pushing method and device, electronic equipment and storage medium

By acquiring and integrating market textual information and visual data features, and using multimodal feature fusion for market sentiment analysis, the problem that intelligent investment advisory systems cannot take into account multimodal data has been solved, resulting in more accurate market forecasts and investment advice.

CN119202390B · Active · Publication Date: 2026-06-23 · PING AN TECH (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
PING AN TECH (SHENZHEN) CO LTD
Filing Date
2024-09-23
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing robo-advisory systems can only take into account single-modality data and cannot accurately predict the market trends of financial products, resulting in inaccurate investment advice.

Method used

Text features are acquired from multiple pieces of market text information of candidate push products, and visual features from multiple pieces of their visual data. A pre-trained investment information push model then performs feature extraction and multimodal feature fusion to conduct market sentiment analysis and market trend prediction, and determines the push strategy.

Benefits of technology

The model pays closer attention to details in visual data across different task scenarios, which improves the accuracy of market predictions and of the investment information that is pushed.

✦ Generated by Eureka AI based on patent content.

Figure CN119202390B_ABST

Abstract

The application provides an information pushing method and device, electronic equipment and a storage medium, and relates to the field of financial technology. The method includes obtaining text features of multiple pieces of market text information of a candidate pushing product, and visual features obtained by fusing features of multiple pieces of visual data of the candidate pushing product. The visual features are obtained by a pre-trained investment information pushing model that extracts features from each piece of visual data using the visual instruction, among multiple pre-trained visual instructions, that corresponds to the task scene to which the visual data belongs among multiple preset task scenes; the visual instructions correspond one-to-one with the task scenes. The method further includes performing multimodal feature fusion on the text features and the visual features, performing market trend prediction for the candidate pushing product based on sentiment analysis of the fused features, determining a pushing strategy for pushing the candidate pushing product to a target object, and pushing investment information according to the pushing strategy. The embodiments of the application can push more accurate investment information to the target object.

Description

Technical Field

[0001] This application relates to the field of financial technology, and in particular to an information push method, device, electronic device and storage medium.

Background Technology

[0002] In related technologies, intelligent investment advisory systems can provide users with automated and personalized investment advice. Specifically, these systems can use artificial intelligence technology to automatically monitor the real-time market conditions of each financial product in the investment portfolio provided to the user and adjust the portfolio accordingly.

[0003] However, in the investment market, the future market trend of a financial product is reflected in multimodal information or data. Because intelligent investment advisory systems in related technologies can only take single-modal data into account, they cannot make use of data from other modalities when predicting the market trend of a product. As a result, these systems cannot accurately predict a product's market trend and therefore cannot give users accurate investment advice.

Summary of the Invention

[0004] The main objective of this application is to provide an information push method, apparatus, electronic device, and storage medium, which aims to push more accurate investment information to target objects.

[0005] To achieve the above objectives, a first aspect of this application proposes an information push method, comprising the following steps:

[0006] The method acquires text features of multiple pieces of market text information of a candidate push product, as well as visual features of multiple pieces of visual data of the candidate push product. The visual features are obtained by fusing the visual sub-features of each piece of visual data. Each visual sub-feature is obtained by a pre-trained investment information push model extracting features from the visual data based on a target visual instruction. The target visual instruction is the one, among multiple pre-trained visual instructions, that corresponds to a target task scenario, and the target task scenario is the one, among multiple preset task scenarios, to which the visual data belongs. There is a unique correspondence between each visual instruction and each task scenario;

[0007] Multimodal feature fusion is performed on the text features and the visual features to obtain fused features;

[0008] Market sentiment analysis is performed based on the fused features, and market prediction is made for the candidate push product based on the market sentiment analysis results, thus obtaining market prediction results for the candidate push product;

[0009] Based on the market prediction results, a push strategy for pushing the candidate push product to the target object is determined, and investment information is pushed according to the push strategy.

[0010] In some embodiments, the method is performed by the investment information push model;

[0011] The investment information push model is trained through the following steps:

[0012] Obtain a first sample set, wherein the first sample set includes multiple first sample pairs, each first sample pair includes a first training label and a visual instruction sample, each task scenario corresponds to several of the visual instruction samples, and the visual instruction samples of the first sample pairs under the same task scenario differ from one another;

[0013] Input a first sample pair into the initial model, and obtain the first sample market prediction result through the initial model;

[0014] Based on the first sample market prediction result and the first training label, a loss analysis is performed on the visual instruction sample to obtain a first loss value. If the first preset training termination condition is not met, the first loss value is used to adjust both the parameters of the initial model and the visual instruction sample.

[0015] When the first preset training termination condition is met, the initial model after parameter tuning is determined as the investment information push model, and the visual instructions for multiple task scenarios are determined based on multiple adjusted visual instruction samples.
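The joint adjustment in [0013] to [0015] resembles prompt tuning, where a learnable instruction and the model weights are updated by the same loss. The sketch below is a toy under stated assumptions, not the application's implementation: each scenario's visual instruction is modelled as a trainable vector added to the visual features, the initial model is reduced to a logistic classifier, the loss is binary cross-entropy, and the termination condition is an average-loss threshold or an epoch cap. For simplicity it keeps one evolving instruction vector per scenario. All names (`train_stage_one`, `predict`, `loss_tol`) are hypothetical.

```python
import math

def _sigmoid(z: float) -> float:
    # Clamp away from 0 and 1 so the log-loss below never evaluates log(0).
    p = 1.0 / (1.0 + math.exp(-z))
    return min(max(p, 1e-12), 1.0 - 1e-12)

def predict(weights, instructions, scenario, x):
    """Probability of the positive label for features x under a scenario's instruction."""
    h = [xi + pi for xi, pi in zip(x, instructions[scenario])]
    return _sigmoid(sum(w * hi for w, hi in zip(weights, h)))

def train_stage_one(samples, dim, lr=0.5, max_epochs=200, loss_tol=0.05):
    """Jointly tune model weights and one trainable instruction vector per scenario.

    samples: list of (scenario, features, label) tuples with label in {0, 1}.
    """
    weights = [0.0] * dim                                   # initial model parameters
    instructions = {s: [0.0] * dim for s, _, _ in samples}  # trainable instruction samples
    for _ in range(max_epochs):
        total = 0.0
        for scenario, x, y in samples:
            prompt = instructions[scenario]
            h = [xi + pi for xi, pi in zip(x, prompt)]      # instruction conditions the input
            p = _sigmoid(sum(w * hi for w, hi in zip(weights, h)))
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            g = p - y                                       # d(log-loss)/d(logit)
            for i in range(dim):
                grad_w, grad_p = g * h[i], g * weights[i]   # both use pre-update values
                weights[i] -= lr * grad_w                   # adjust model parameters
                prompt[i] -= lr * grad_p                    # adjust instruction sample
        if total / len(samples) < loss_tol:                 # termination condition met
            break
    return weights, instructions
```

Because the gradient reaches the instruction vector as well as the weights, each scenario ends up with its own tuned instruction while the classifier is shared across scenarios.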

[0016] In some embodiments, before obtaining the first sample set, the investment information push model is further trained through the following steps:

[0017] Obtain a second sample set, wherein the second sample set includes multiple second sample pairs, each of the multiple second sample pairs corresponds to at least one of the task scenarios, and the second sample pairs belonging to the same task scenario correspond to the same visual instruction sample, and the second sample pair includes a second training label;

[0018] Input a second sample pair into the initial model, and obtain the second sample market prediction result through the initial model;

[0019] Based on the second sample market prediction results and the second training label, a loss analysis for multimodal feature fusion is performed to obtain a second loss value. The second loss value is used to adjust the parameters of the initial model if the second preset training termination condition is not met.

[0020] When the second preset training termination condition is met, stop inputting further second sample pairs into the initial model, and obtain the first sample set.
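In this earlier stage, per [0017] and [0019], all pairs of one task scenario share the same instruction sample and only the model parameters are adjusted. A minimal sketch, assuming the same toy setup as a logistic classifier over features plus a fixed per-scenario instruction vector, with an average-loss threshold as the second termination condition; `train_stage_two` and its parameters are hypothetical names.

```python
import math

def train_stage_two(samples, shared_instructions, dim, lr=0.5, max_epochs=500, loss_tol=0.1):
    """Tune only the model weights; each scenario's shared instruction stays frozen.

    samples: list of (scenario, features, label) tuples with label in {0, 1}.
    Returns the weights and the epoch at which training stopped.
    """
    weights = [0.0] * dim
    for epoch in range(1, max_epochs + 1):
        total = 0.0
        for scenario, x, y in samples:
            prompt = shared_instructions[scenario]  # same instruction for every pair in the scenario
            h = [xi + pi for xi, pi in zip(x, prompt)]
            z = sum(w * hi for w, hi in zip(weights, h))
            p = min(max(1.0 / (1.0 + math.exp(-z)), 1e-12), 1.0 - 1e-12)
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            # Only the model parameters are updated; the instruction is read-only here.
            weights = [w - lr * (p - y) * hi for w, hi in zip(weights, h)]
        if total / len(samples) < loss_tol:
            # Second termination condition met: stop feeding second sample pairs;
            # training would then continue with the first sample set.
            return weights, epoch
    return weights, max_epochs
```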

[0021] In some embodiments, the first sample market prediction result is obtained by the initial model based on the first sample pair under the first model parameters;

[0022] The step of performing loss analysis on the visual instruction samples based on the first sample market prediction results and the first training labels to obtain a first loss value includes:

[0023] Based on the first sample market prediction result, calculate a first predicted probability with which the initial model determines a push strategy sample from the first sample pair;

[0024] The first loss value is calculated based on the first predicted probability and the first training label.
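The application does not name the loss function. Assuming the initial model outputs raw scores over a set of candidate push strategies and the training label is the index of the correct strategy, a standard choice is cross-entropy on the softmax probability of the labelled strategy. The function names below are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores over candidate push strategies into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def loss_from_prediction(logits, label_index):
    """Cross-entropy: negative log of the probability assigned to the labelled strategy."""
    probs = softmax(logits)
    return -math.log(probs[label_index]), probs
```

For example, scores of `[0.0, 0.0]` give each of two strategies probability 0.5, so the loss is ln 2 regardless of which one is labelled. The same computation would apply to the second loss value in [0028].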

[0025] In some embodiments, the second sample market prediction result is obtained by the initial model based on the second sample pair under the second model parameters;

[0026] The step of performing loss analysis based on the second sample market prediction results and the second training labels for multimodal feature fusion to obtain a second loss value includes:

[0027] Based on the second sample market prediction result, calculate a second predicted probability with which the initial model determines a push strategy sample from the second sample pair;

[0028] The second loss value is calculated based on the second predicted probability and the second training label.

[0029] In some embodiments, the text features are obtained by fusing the text sub-features of multiple pieces of market text information. Each text sub-feature is obtained by the investment information push model extracting features from the market text information based on a target text instruction. The target text instruction is the one, among multiple pre-trained text instructions, that corresponds to a target information type, and the target information type is the one, among multiple preset information types, to which the market text information belongs. There is a unique correspondence between each text instruction and each information type.
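A minimal sketch of pairing each piece of market text with the instruction for its information type before feature extraction. The information types (loosely based on the sources listed in [0068]) and the instruction wording are hypothetical placeholders; the application does not enumerate them.

```python
# Hypothetical information types and instruction wording, for illustration only.
TEXT_INSTRUCTIONS = {
    "news": "Extract the market-relevant events and their likely impact on the product.",
    "social_media": "Identify the overall investor sentiment expressed toward the product.",
    "forum": "Summarize the main arguments for and against the product.",
    "analyst_report": "Extract the analyst's rating, target price, and key risks.",
}

def pair_with_instructions(texts):
    """Attach to each (info_type, text) item the instruction for its information type."""
    paired = []
    for info_type, text in texts:
        if info_type not in TEXT_INSTRUCTIONS:
            raise ValueError(f"unknown information type: {info_type}")
        paired.append((TEXT_INSTRUCTIONS[info_type], text))
    return paired
```

Each resulting (instruction, text) pair would then be fed to the text feature extraction network to produce one text sub-feature.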

[0030] In some embodiments, the first sample pair further includes first visual sample features;

[0031] After obtaining the first loss value, the training steps of the investment information push model further include:

[0032] When the first preset training termination condition is not met, among multiple first sample pairs, determine the target first sample pair that belongs to the same task scenario as the current first sample pair and has not been input into the initial model;

[0033] Based on the adjusted visual instruction sample in the current first sample pair, update the visual instruction sample and the first visual sample features in the target first sample pair, and input the next first sample pair into the parameter-tuned initial model.
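The hand-off in [0032] and [0033] can be sketched as follows. The `FirstSamplePair` fields and the `extract` callback that recomputes the first visual sample features from the new instruction are hypothetical stand-ins; the application does not specify how the features are re-derived.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class FirstSamplePair:
    scenario: str
    raw_visual: List[float]       # stand-in for the raw visual sample
    instruction: List[float]      # trainable visual instruction sample
    visual_features: List[float]  # first visual sample features
    label: int
    consumed: bool = False        # already input into the model?

def propagate_instruction(pairs: List[FirstSamplePair], current: int,
                          extract: Callable[[List[float], List[float]], List[float]]
                          ) -> Optional[FirstSamplePair]:
    """Give the adjusted instruction of pairs[current] to the next unused pair of the same scenario."""
    src = pairs[current]
    src.consumed = True
    for pair in pairs:
        if not pair.consumed and pair.scenario == src.scenario:
            pair.instruction = list(src.instruction)                           # update instruction sample
            pair.visual_features = extract(pair.raw_visual, pair.instruction)  # refresh its features
            return pair
    return None  # no remaining pair in this scenario
```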

[0034] To achieve the above objectives, a second aspect of this application provides an information push device, the device comprising:

[0035] The feature acquisition module is used to acquire text features of multiple pieces of market text information of the candidate push product, and visual features of multiple pieces of visual data of the candidate push product. The visual features are obtained by fusing the visual sub-features of each piece of visual data. Each visual sub-feature is obtained by a pre-trained investment information push model extracting features from the visual data based on a target visual instruction. The target visual instruction is the one, among multiple pre-trained visual instructions, that corresponds to a target task scenario, and the target task scenario is the one, among multiple preset task scenarios, to which the visual data belongs. There is a unique correspondence between each visual instruction and each task scenario.

[0036] A multimodal feature fusion module is used to perform multimodal feature fusion on the text features and the visual features to obtain fused features;

[0037] The market forecasting module is used to perform market sentiment analysis based on the fused features, and to make market forecasts for the candidate push products based on the market sentiment analysis results, thereby obtaining the market forecast results for the candidate push products.

[0038] The push execution module is used to determine a push strategy for pushing the candidate push products to the target object based on the market forecast results, and to execute the investment information push according to the push strategy.

[0039] To achieve the above objectives, a third aspect of this application provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the method described in the first aspect.

[0040] To achieve the above objectives, a fourth aspect of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method described in the first aspect.

[0041] This application proposes an information push method, apparatus, electronic device, and storage medium. The method first acquires text features of multiple market text information of candidate push products, as well as visual features of multiple visual data of the candidate push products. The visual features are obtained by fusing visual sub-features of each visual data. These visual sub-features are obtained by extracting features from the visual data based on target visual instructions using a pre-trained investment information push model. The target visual instruction is one of multiple pre-trained visual instructions corresponding to a target task scenario, and the target task scenario is one of multiple preset task scenarios to which the visual data belongs. There is a unique correspondence between a visual instruction and a task scenario. Next, multimodal feature fusion is performed on the text features and visual features to obtain fused features. Then, market sentiment analysis is performed based on the fused features, and market prediction is performed on the candidate push products based on the market sentiment analysis results to obtain market prediction results for the candidate push products. Finally, based on the market prediction results, a push strategy for pushing the candidate push products to target objects is determined, and investment information push is executed according to the push strategy. Since each visual sub-feature is extracted based on the target visual instructions corresponding to the target task scenario to which the visual data belongs, the investment information push model can pay attention to the detailed information in the visual data of different task scenarios. This makes the extracted visual sub-features more accurate. 
Thus, when performing market sentiment analysis based on fused features, the investment information push model can make market sentiment-based predictions based on both text features and visual features. It can also pay attention to the impact of detailed information in the visual data of different task scenarios on market sentiment analysis, thereby making the market prediction results obtained by the investment information push model more accurate. This is beneficial to improving the accuracy of the push strategy and, in turn, to pushing more accurate investment information about candidate push products to the target audience.

Attached Figure Description

[0042] Figure 1 is a flowchart of an information push method provided in an embodiment of this application;

[0043] Figure 2 is a schematic diagram of an embodiment of a training step of the investment information push model disclosed in Figure 1;

[0044] Figure 3 is a schematic diagram of another embodiment of a training step of the investment information push model disclosed in Figure 1;

[0045] Figure 4 is a flowchart of an embodiment of the sub-steps of step 230 in Figure 2;

[0046] Figure 5 is a flowchart of an embodiment of the sub-steps of step 330 in Figure 3;

[0047] Figure 6 is a schematic diagram of another embodiment of a training step of the investment information push model disclosed in Figure 1;

[0048] Figure 7 is a schematic structural diagram of the information push device provided in the embodiments of this application;

[0049] Figure 8 is a schematic diagram of the hardware structure of the electronic device provided in the embodiments of this application.

Detailed Implementation

[0050] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0051] It should be noted that although functional modules are divided in the device schematic diagram and a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than the module division in the device or the order in the flowchart. The terms "first," "second," etc., in the specification, claims, and the aforementioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

[0052] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application.

[0053] First, some of the terms used in this application are explained:

[0054] Artificial intelligence (AI) is a new branch of computer science that studies, develops, and applies theories, methods, technologies, and systems to simulate, extend, and expand human intelligence. It aims to understand the essence of intelligence and produce intelligent machines that can react in a way similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. AI can simulate the information processes of human consciousness and thought. Furthermore, AI utilizes digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceiving the environment, acquiring knowledge, and using that knowledge to achieve optimal results.

[0055] In related technologies, intelligent investment advisory systems can provide users with automated and personalized investment advice. However, in the investment market, the future market trend of a financial product is reflected in multimodal information or data. Because intelligent investment advisory systems in related technologies can only take single-modal data into account, they cannot make use of data from other modalities when predicting the market trend of a product. As a result, they cannot accurately predict a product's market trend and therefore cannot give users accurate investment advice.

[0056] To push more accurate investment information to target objects, this application provides an information push method, device, electronic device, and computer-readable storage medium. The method first acquires text features of multiple pieces of market text information of candidate push products, as well as visual features of multiple pieces of visual data of the candidate push products. The visual features are obtained by fusing the visual sub-features of each piece of visual data. These visual sub-features are obtained by extracting features from the visual data based on target visual instructions using a pre-trained investment information push model. The target visual instruction is one of multiple pre-trained visual instructions corresponding to a target task scenario, and the target task scenario is one of multiple preset task scenarios to which the visual data belongs. A unique correspondence exists between each visual instruction and each task scenario. Next, multimodal feature fusion is performed on the text and visual features to obtain fused features. Then, market sentiment analysis is performed based on the fused features, and market prediction is conducted based on the market sentiment analysis results to obtain market prediction results for the candidate push products. Finally, based on the market prediction results, a push strategy for pushing the candidate push products to target objects is determined, and investment information is pushed according to the push strategy. Since each visual sub-feature is extracted based on the target visual instruction corresponding to the target task scenario to which the visual data belongs, the investment information push model can pay attention to the detailed information in the visual data of different task scenarios. This makes the extracted visual sub-features more accurate.
Thus, when performing market sentiment analysis based on fused features, the investment information push model can make market sentiment-based predictions based on both text features and visual features. It can also pay attention to the impact of detailed information in the visual data of different task scenarios on market sentiment analysis, thereby making the market prediction results obtained by the investment information push model more accurate. This is beneficial to improving the accuracy of the push strategy and, in turn, to pushing more accurate investment information about candidate push products to the target audience.

[0057] This application provides an information push method, device, electronic device, and storage medium, which are specifically described through the following embodiments. First, the information push method in this application is described.

[0058] The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.

[0059] Foundational technologies for artificial intelligence generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies mainly encompass computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning / deep learning.

[0060] The information push method provided in this application relates to the field of financial technology. The information push method provided in this application can be applied to a terminal, a server, or software running on either a terminal or a server. In some embodiments, the terminal can be a smartphone, tablet, laptop, desktop computer, etc.; the server can be configured as an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the software can be an application implementing the information push method, but is not limited to the above forms.

[0061] This application can be used in a wide variety of general-purpose or special-purpose computer system environments or configurations. Examples include: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices. This application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform specific tasks or implement specific abstract data types. This application can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.

[0062] It should be noted that in all specific embodiments of this application, when processing data related to user identity or characteristics, such as user information, user behavior data, user historical data, and user location information, user permission or consent is obtained first. Furthermore, the collection, use, and processing of this data comply with relevant laws, regulations, and standards. In addition, when embodiments of this application require access to sensitive personal information of users, separate permission or consent from the user is obtained through pop-ups or redirection to confirmation pages. Only after obtaining the user's separate permission or consent is the necessary user-related data required for the proper functioning of these embodiments acquired.

[0063] Referring to Figure 1, Figure 1 shows a flowchart of an information push method provided by an embodiment of this application. In this embodiment, the information push method may include steps 110 to 140.

[0064] Step 110: Obtain the text features of multiple pieces of market text information of a candidate push product, and the visual features of multiple pieces of visual data of the candidate push product. The visual features are obtained by fusing the visual sub-features of each piece of visual data. Each visual sub-feature is obtained by a pre-trained investment information push model extracting features from the visual data based on a target visual instruction. The target visual instruction is the one, among multiple pre-trained visual instructions, that corresponds to a target task scenario, and the target task scenario is the one, among multiple preset task scenarios, to which the visual data belongs. There is a unique correspondence between each visual instruction and each task scenario.

[0065] Step 120: Perform multimodal feature fusion on textual and visual features to obtain fused features;

[0066] Step 130: Perform market sentiment analysis based on the fused features, and predict the market trend of the candidate push product based on the market sentiment analysis results to obtain market trend prediction results for the candidate push product.

[0067] Step 140: Based on the market prediction results, determine the push strategy for pushing the candidate push product to the target object, and execute the investment information push according to the push strategy.
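Steps 110 to 140 can be traced end to end with toy stand-ins. The sketch below assumes mean pooling for fusing sub-features, concatenation for multimodal fusion, a linear score with hypothetical weights as the "sentiment analysis", and a simple threshold mapping from predicted trend to a "push" or "hold" strategy. None of these choices come from the application; they only illustrate how data flows between the steps.

```python
from statistics import fmean

def mean_pool(vectors):
    """Fuse several equal-length sub-feature vectors by averaging each dimension."""
    return [fmean(col) for col in zip(*vectors)]

def push_pipeline(text_subfeats, visual_subfeats, sentiment_weights, threshold=0.0):
    text_feat = mean_pool(text_subfeats)        # step 110: text features
    visual_feat = mean_pool(visual_subfeats)    # step 110: visual features
    fused = text_feat + visual_feat             # step 120: fusion by concatenation
    sentiment = sum(w * f for w, f in zip(sentiment_weights, fused))  # step 130
    trend = "up" if sentiment > threshold else "down"
    strategy = "push" if trend == "up" else "hold"                    # step 140
    return {"sentiment": sentiment, "trend": trend, "strategy": strategy}
```

For instance, text sub-features `[[1.0, 0.0], [0.0, 1.0]]` pool to `[0.5, 0.5]`, a single visual sub-feature `[1.0, 1.0]` is kept as is, and with all-ones weights the fused vector `[0.5, 0.5, 1.0, 1.0]` scores 3.0, giving an "up" trend and a "push" strategy.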

[0068] In one embodiment, market text information refers to text content commenting on the candidate push product after its launch. This market text information may include online news, social media comment posts, forum discussions, analyst reports, etc., without specific limitation. Furthermore, the multiple pieces of market text information may have different publication times.

[0069] In one embodiment, textual features refer to information that represents the overall characteristics of multiple market textual information after integrating them. Specifically, a textual feature can be a fusion feature, obtained by fusing the textual sub-features of each market textual information.

[0070] In one embodiment, to obtain the text features of multiple pieces of market text information of a candidate push product, the multiple pieces of market text information are first obtained. Then, the investment information push model extracts features from each piece of market text information to obtain its text sub-feature. Next, it is queried whether text sub-features previously obtained for the candidate push product have been stored. If so, the stored text sub-features are retrieved and fused with the currently extracted text sub-features to obtain the text features, and the currently extracted text sub-features are then stored.

[0071] In one embodiment, if no text sub-features of the candidate push product have been stored, the currently extracted text sub-features are fused to obtain the text features, and the currently extracted text sub-features are stored.
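The lookup-fuse-store logic of [0070] and [0071] can be sketched with an in-memory dictionary keyed by product. Two assumptions here are not fixed by the application: newly extracted sub-features are appended to the stored ones rather than replacing them, and fusion is a per-dimension mean. `SubFeatureStore` is a hypothetical name.

```python
from statistics import fmean

class SubFeatureStore:
    """Hypothetical in-memory store of previously extracted sub-features, keyed by product."""

    def __init__(self):
        self._store = {}

    def fuse_and_store(self, product_id, current):
        previous = self._store.get(product_id)            # None if nothing stored yet
        pool = (previous or []) + current                  # reuse stored sub-features if present
        fused = [fmean(col) for col in zip(*pool)]         # mean fusion, one of several options
        self._store[product_id] = pool                     # keep old and current sub-features
        return fused
```

The same pattern would serve the visual sub-features described in [0076] and [0077].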

[0072] In one embodiment, obtaining multiple market text information of candidate push products can be performed in response to a push task targeting a target object, or it can be performed periodically at preset time intervals, etc., without being specifically limited here.

[0073] In one embodiment, feature fusion of multiple text sub-features can be achieved by feature concatenation, feature averaging, feature cross-interaction, etc., and no specific limitation is made here.
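The three fusion options named above can be illustrated on plain lists. Concatenation and averaging are standard; for "feature cross-interaction" this sketch uses an element-wise product across vectors, which is only one simple form of interaction and an assumption, since the application does not define the operation.

```python
from statistics import fmean

def fuse_concat(subfeatures):
    """Concatenate all sub-feature vectors end to end."""
    return [v for vec in subfeatures for v in vec]

def fuse_average(subfeatures):
    """Average the sub-feature vectors dimension by dimension."""
    return [fmean(col) for col in zip(*subfeatures)]

def fuse_cross(subfeatures):
    """Element-wise product across vectors, one simple form of cross-interaction."""
    out = [1.0] * len(subfeatures[0])
    for vec in subfeatures:
        out = [o * v for o, v in zip(out, vec)]
    return out
```

Concatenation keeps every dimension but grows with the number of inputs, whereas averaging and the element-wise product keep a fixed output size.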

[0074] In one embodiment, visual data refers to the visual information in comments on the candidate push product after its launch. This visual data may include one or more of the following: news reports, comment videos, product market charts, financial statement images, and presentation videos, without specific limitation.

[0075] In one embodiment, visual features refer to information that represents the overall characteristics of the multiple pieces of visual data after integrating them.

[0076] In one embodiment, to acquire the visual features of multiple pieces of visual data of a candidate push product, the multiple pieces of visual data are first acquired. Then, the investment information push model extracts features from each piece of visual data according to its target visual instruction to obtain its visual sub-feature. Next, it is queried whether visual sub-features previously acquired for the candidate push product have been stored. If so, the stored visual sub-features are retrieved and fused with the currently extracted visual sub-features to obtain the visual features, and the currently extracted visual sub-features are then stored.

[0077] In one embodiment, if no visual sub-features of the candidate push product have been stored, the currently extracted visual sub-features are fused to obtain the visual features, and the currently extracted visual sub-features are stored.

[0078] In one embodiment, acquiring multiple visual data of candidate push products can be performed in response to a push task targeting a target object, or it can be performed periodically at preset time intervals, etc., without being specifically limited here.

[0079] In one embodiment, feature fusion of multiple visual sub-features can be achieved by feature concatenation, feature averaging, feature cross-interaction, etc., and no specific limitation is made here.

[0080] In one embodiment, the investment information push model refers to a visual language model (VILA) capable of processing visual modal data and text modal data, performing sentiment-based market prediction based on the processed data, and pushing investment information about candidate financial products to target audiences based on the market prediction results. The investment information push model may include a visual feature extraction network and a text feature extraction network. The text feature extraction network may be a Large Language Model (LLM) network, a Bidirectional Encoder Representations from Transformers (BERT) network, or similar networks. The visual feature extraction network may be a Vision Transformer (ViT) network, a Residual Block network, or similar networks; the specific implementation is not limited here.

[0081] In one embodiment, visual instructions refer to text statements used to guide the investment information push model to perform targeted feature extraction on visual data using specific feature extraction methods for different task scenarios. Here, "task scenario" refers to an application scenario with a specific task, such as Visual Question Answering (VQA), Text Visual Question Answering (TextVQA), Image Captioning, etc., without further limitation.

[0082] In one embodiment, a unique correspondence between a visual instruction and a task scenario means that each pre-trained visual instruction is uniquely associated with one of multiple task scenarios, and does not correspond to any other task scenarios.

[0083] In one embodiment, when the pre-trained investment information push model extracts features from visual data based on target visual instructions, scene analysis can first be performed on each visual data item to determine, among the multiple task scenarios, the target task scenario to which it belongs. Then, for each visual data item, the visual data and the target visual instruction corresponding to its target task scenario are input into the investment information push model, which extracts features from the visual data according to the target visual instruction to obtain the visual sub-feature corresponding to that visual data.

[0084] In one embodiment, the scene analysis that determines, among the multiple task scenarios, the target task scenario of each visual data item can be performed by a pre-trained scene classification model, or by applying preset scene classification rules to the metadata of the visual data, etc., without specific limitation here.
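The routing described in [0083] and [0084] can be sketched with a rule-based scene classifier over metadata. The scenario names, instruction texts, metadata keys, and the `extractor` callable (standing in for the investment information push model) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical task scenarios, each mapped to exactly one visual instruction ([0082])
VISUAL_INSTRUCTIONS: Dict[str, str] = {
    "vqa": "Answer questions about the chart in this image.",
    "text_vqa": "Read the text in this image and answer questions about it.",
    "captioning": "Describe the content of this image.",
}


@dataclass
class VisualData:
    data_id: str
    metadata: Dict[str, str] = field(default_factory=dict)


def classify_scene(item: VisualData) -> str:
    """Preset classification rules applied to metadata (one option in [0084])."""
    if item.metadata.get("has_text") == "yes":
        return "text_vqa"
    if item.metadata.get("has_question") == "yes":
        return "vqa"
    return "captioning"


def extract_visual_sub_features(
    items: List[VisualData],
    extractor: Callable[[str, VisualData], List[float]],
) -> List[List[float]]:
    """Pair each item with the instruction of its target task scenario, then extract."""
    sub_features = []
    for item in items:
        instruction = VISUAL_INSTRUCTIONS[classify_scene(item)]
        sub_features.append(extractor(instruction, item))
    return sub_features
```

Replacing `classify_scene` with a trained scene classification model would leave the rest of the routing unchanged.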

[0085] In one embodiment, multimodal feature fusion refers to the operation of integrating and processing features corresponding to different types of data. The process of multimodal feature fusion may include feature alignment (i.e., the operation of converting features from different modalities into a shared feature space), feature fusion, and other operations, which are not specifically limited here.

[0086] In one embodiment, in the process of multimodal feature fusion of text features and visual features to obtain fused features, the text features and visual features can be aligned first to obtain aligned text features and aligned visual features. Then, the aligned text features and aligned visual features are fused to obtain fused features.
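The align-then-fuse order in [0086] can be sketched with linear projections into a shared space followed by concatenation. The projection matrices are hypothetical placeholders for learned alignment layers, and concatenation is only one of the fusion options the text allows.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(matrix: Matrix, vec: Vector) -> Vector:
    """Linear projection: each row of `matrix` yields one shared-space coordinate."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def align_and_fuse(text_feat: Vector, visual_feat: Vector,
                   w_text: Matrix, w_visual: Matrix) -> Vector:
    # Step 1: align both modalities into a shared feature space of equal size
    text_aligned = project(w_text, text_feat)
    visual_aligned = project(w_visual, visual_feat)
    if len(text_aligned) != len(visual_aligned):
        raise ValueError("projections must map to the same shared dimension")
    # Step 2: fuse the aligned features (concatenation assumed here)
    return text_aligned + visual_aligned
```

Here a 3-dimensional text feature and a 2-dimensional visual feature can both be mapped into a 2-dimensional shared space before being fused.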

[0087] In one embodiment, market sentiment analysis based on fused features refers to the process of extracting features from the fused features to obtain market sentiment analysis results. The market sentiment analysis result can be a feature vector representing market sentiment information.

[0088] In one embodiment, performing market sentiment analysis based on the fused features and predicting the market trend of the candidate push product from the market sentiment analysis result can proceed as follows. First, market sentiment analysis is performed on the fused features to obtain sentiment features. Then, historical market trend information of the candidate push product is acquired, and features are extracted from it to obtain historical market trend features. Finally, market trend prediction is performed based on the sentiment features and the historical market trend features to obtain the market trend prediction result. Here, the market trend prediction result is a feature vector representing the future market trend of the candidate push product.
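The two inputs to the trend prediction in [0088] can be sketched as follows. The historical features (mean and most recent period return) and the linear scoring head are hypothetical stand-ins for the model's learned feature extractor and prediction layer; each output score would correspond to one trend class, such as up or down.

```python
from typing import List


def historical_trend_features(prices: List[float]) -> List[float]:
    """Hypothetical features: mean period return and most recent period return."""
    returns = [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]
    return [sum(returns) / len(returns), returns[-1]]


def predict_trend(sentiment: List[float], history: List[float],
                  weights: List[List[float]]) -> List[float]:
    """Linear head over [sentiment | history]; one output score per trend class."""
    joint = sentiment + history
    return [sum(w * v for w, v in zip(row, joint)) for row in weights]
```

With prices `[100.0, 110.0, 99.0]`, the period returns are approximately 0.1 and -0.1, so the sketch produces historical features of roughly `[0.0, -0.1]`.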

[0089] In one embodiment, the push strategy refers to the specific way in which the candidate push product is pushed to the target object as investment advice. The push strategy may be either to generate investment information for the candidate push product and push it to the target object, or to cancel pushing investment information about the candidate push product to the target object.

[0090] In one embodiment, during the process of determining a push strategy to push candidate products to target objects based on market forecast results and executing investment information push according to the push strategy, the financial attributes of the target objects can be obtained first, and feature extraction can be performed on the financial attributes to obtain the target objects' financial characteristics. Then, a first similarity between the target objects' financial characteristics and the market forecast results can be calculated. Next, the first similarity is compared with a preset similarity threshold. When the first similarity is greater than or equal to the preset similarity threshold, investment information of candidate products is formed based on the candidate products, and the investment information of candidate products is pushed to the target objects. When the first similarity is less than the preset similarity threshold, the investment information of candidate products is not pushed to the target objects.
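The threshold decision in [0090] can be sketched as below. Cosine similarity is assumed as the "first similarity", and the default threshold of 0.8 is a hypothetical value; the text specifies neither.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_push(financial_features: List[float], market_forecast: List[float],
                threshold: float = 0.8) -> bool:
    """Push when the first similarity reaches the preset threshold (0.8 is hypothetical)."""
    return cosine_similarity(financial_features, market_forecast) >= threshold
```

When `should_push` returns `True`, investment information for the candidate push product would be generated and pushed; otherwise the push is skipped.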

[0091] See Figure 2. Figure 2 shows an embodiment of the training steps of the investment information push model of Figure 1. The information push method provided by the present application is executed by the investment information push model, and the training steps may include steps 210 to 240.

[0092] Step 210: Obtain the first sample set, wherein the first sample set includes multiple first sample pairs, each first sample pair includes a first training label and a visual instruction sample, each task scenario corresponds to multiple visual instruction samples, and the visual instruction samples of the multiple first sample pairs under each task scenario are different from each other.

[0093] Step 220: Input a first sample pair into the initial model, and obtain the first sample market prediction result through the initial model.

[0094] Step 230: Perform loss analysis on the visual instruction samples based on the first sample market prediction results and the first training labels to obtain the first loss value. The first loss value is used to adjust the parameters of the initial model and the visual instruction samples if the first preset training termination condition is not met.

[0095] Step 240: When the first preset training termination condition is met, the parameter-tuned initial model is determined as the investment information push model, and the visual instructions for the multiple task scenarios are determined based on the multiple adjusted visual instruction samples.
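Steps 210 to 240 can be outlined as a single pass over the first sample pairs. This is a skeleton under stated assumptions: `step` and `adjust` are hypothetical callables standing in for the model forward pass with loss computation and for the joint parameter and instruction update, and "loss below a threshold" is assumed as the termination condition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FirstSamplePair:
    scenario: str     # task scenario of this pair
    instruction: str  # visual instruction sample (adjustable text)
    label: float      # first training label


def train_stage_one(
    pairs: List[FirstSamplePair],
    step: Callable[[FirstSamplePair], float],
    adjust: Callable[[FirstSamplePair, float], str],
    loss_threshold: float,
) -> Dict[str, List[str]]:
    for pair in pairs:
        loss = step(pair)  # steps 220 and 230: predict, then compute the first loss value
        if loss < loss_threshold:  # first preset training termination condition (assumed form)
            break
        # Adjust model parameters (inside `adjust`) and the visual instruction sample
        pair.instruction = adjust(pair, loss)
    # Step 240: group the adjusted instruction samples by task scenario
    by_scenario: Dict[str, List[str]] = {}
    for pair in pairs:
        by_scenario.setdefault(pair.scenario, []).append(pair.instruction)
    return by_scenario
```

The grouped instruction samples returned here are the candidates from which one visual instruction per task scenario is then selected.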

[0096] In one embodiment, each task scenario corresponding to multiple visual instruction samples means that the multiple first sample pairs in the first sample set are distributed over the multiple task scenarios, with each first sample pair corresponding to one of them. The first sample set refers to a training dataset used to train the information push capability of the investment information push model in each task scenario.

[0097] In one embodiment, the first sample pair refers to a set of training data that enables the initial model to complete one information push training pass. The initial model refers to the investment information push model before training is complete, and its architecture is the same as that of the investment information push model. In addition to the first training label and the visual instruction sample, the first sample pair may also include first text sample features representing the market text sample information of a pushed product sample, first visual sample features extracted from the visual sample data of the same pushed product sample based on the visual instruction sample, first historical market data sample features representing the historical market data of the same pushed product sample, etc., without specific limitation here.

[0098] In one embodiment, the first training label refers to the theoretical probability that the initial model outputs the correct push strategy based on the first sample market prediction result and the first object sample features. Here, the first sample market prediction result refers to the market prediction result of the push product sample obtained by the initial model based on the first text sample features and the first visual sample features extracted based on the visual instruction samples.

[0099] In one embodiment, in the process of inputting a first sample pair into an initial model and obtaining the first sample market prediction result through the initial model, the first text sample features and the first visual sample features of the pushed product sample can be fused into multimodal features to obtain the first fused sample features. Then, market sentiment analysis is performed based on the first fused sample features to obtain the first sentiment sample features. Finally, market prediction is performed based on the first fused sample features and the first sentiment sample features to obtain the first sample market prediction result.

[0100] In one embodiment, loss analysis for the visual instruction samples refers to determining the difference between the probability of the correct push strategy computed by the initial model using the visual instruction sample and the probability given by the first training label. Here, the correct push strategy is the push strategy that the initial model should output for the input sample pair. The push strategy actually determined by the initial model may be the same as or different from the correct push strategy, and no specific limitation is made here.

[0101] In one embodiment, adjusting the parameters of the initial model and the visual instruction samples refers to adjusting the parameters of the initial model and some or all of the text content in the visual instruction samples based on the first loss value. It should be noted that the multimodal feature fusion capability of the initial model can also be adjusted during the adjustment of its parameters.

[0102] In one embodiment, when the parameters of the initial model and the visual instruction samples are adjusted based on the first loss value, the first loss value and the visual instruction sample can be fed as input into a pre-trained model with a text reinforcement learning (RL) mechanism. This model performs text optimization on the visual instruction sample based on the first loss value, producing an optimized visual instruction sample. The model with the text RL mechanism can be built on one of the following: a Q-learning model, a Deep Q-Network model, a Policy Gradient (PG) network model, a ChatGLM model, etc., without specific limitation.

[0103] In one embodiment, the first preset training termination condition may be that the first loss value is less than a first preset loss threshold, or that the maximum first loss value of every task scenario (the largest of the first loss values of the first sample pairs in that task scenario) is less than the first preset loss threshold, etc.; the specific condition is not limited here.
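The two example termination conditions of [0103] can be written as simple predicates. The threshold values used with them would be hyperparameters; none are given in the text.

```python
from typing import Dict, List


def overall_loss_converged(first_loss: float, threshold: float) -> bool:
    """Option 1: the first loss value is below the first preset loss threshold."""
    return first_loss < threshold


def per_scenario_converged(losses_by_scenario: Dict[str, List[float]],
                           threshold: float) -> bool:
    """Option 2: in every task scenario, the largest first loss value is below the threshold."""
    return bool(losses_by_scenario) and all(
        max(losses) < threshold for losses in losses_by_scenario.values()
    )
```

The per-scenario variant is stricter: a single poorly trained task scenario keeps training going even when the average loss is already low.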

[0104] In one embodiment, to determine the visual instructions for the multiple task scenarios from the multiple adjusted visual instruction samples, the scene feature of each task scenario can be obtained first. Then, for each task scenario, the instruction features of the multiple adjusted visual instruction samples under that task scenario are obtained, and cluster analysis is performed on these instruction features with the scene feature of that task scenario as the center. The instruction feature closest to this center is identified, and the visual instruction sample corresponding to it is determined as the visual instruction of that task scenario.
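The selection step in [0104] reduces to a nearest-to-center lookup when the scene feature is used as the cluster center. The sketch below simplifies the cluster analysis to exactly that and assumes Euclidean distance. The feature vectors would come from an encoder that is not specified in the text.

```python
import math
from typing import Dict, List


def select_instruction(scene_feature: List[float],
                       instruction_features: Dict[str, List[float]]) -> str:
    """Pick the adjusted instruction sample whose feature is nearest the scene feature."""
    return min(instruction_features,
               key=lambda text: math.dist(scene_feature, instruction_features[text]))
```

For instance, with a scene feature at the origin, an instruction sample whose feature is `[0.1, 0.0]` is chosen over one at `[1.0, 1.0]`.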

[0105] In one embodiment, when the first preset training termination condition is not met, the parameters of the initial model and the visual instruction samples are adjusted according to the first loss value, and the next first sample pair is input into the adjusted initial model to continue training for the visual instruction samples (that is, to execute steps 220 to 230).

[0106] Because the first loss value is used to adjust the parameters of the initial model and the visual instruction samples while the first preset training termination condition is not met, each adjusted visual instruction sample becomes better matched to the initial model's ability to understand visual instructions. When the first preset training termination condition is met, the resulting investment information push model can therefore extract features from visual data in different task scenarios based on the visual instructions determined from the adjusted visual instruction samples. This enables more accurate visual feature extraction and improves the accuracy of the market forecast results.

[0107] See Figure 3. Figure 3 shows another embodiment of the training steps of the investment information push model of Figure 1. In one embodiment, before the first sample set is obtained, the training steps may further include steps 310 to 340.

[0108] Step 310: Obtain a second sample set, wherein the second sample set includes multiple second sample pairs, the multiple second sample pairs correspond to at least one task scenario, the second sample pairs belonging to the same task scenario correspond to the same visual instruction sample, and each second sample pair includes a second training label.

[0109] Step 320: Input a second sample pair into the initial model, and obtain the second sample market prediction result through the initial model.

[0110] Step 330: Perform loss analysis on multimodal feature fusion based on the second sample market prediction results and the second training label to obtain the second loss value. The second loss value is used to adjust the parameters of the initial model if the second preset training termination condition is not met.

[0111] Step 340: When the second preset training termination condition is met, stop inputting further second sample pairs into the initial model and proceed to obtain the first sample set.

[0112] In one embodiment, "multiple second sample pairs in the second sample set corresponding to at least one task scenario" refers to either of the following cases: the multiple second sample pairs correspond one-to-one with multiple task scenarios, or the multiple second sample pairs all correspond to only one of the multiple task scenarios. The second sample set refers to a training dataset used to train the multimodal feature fusion capability of the investment information push model.

[0113] In one embodiment, the second sample pair refers to a set of training data that enables the initial model to complete one information push training pass. In addition to the second training label, the second sample pair may include second text sample features representing the market text sample information of a pushed product sample, second visual sample features representing the visual sample data of the same pushed product sample, second historical market data sample features representing the historical market conditions of the same pushed product sample, etc., without further specific limitation.

[0114] In one embodiment, second sample pairs of the same task scenario corresponding to the same visual instruction sample means that the second visual sample features in all second sample pairs of that task scenario are extracted based on the same visual instruction sample.

[0115] In one embodiment, the second training label refers to the theoretical probability that the initial model outputs the correct push strategy based on the second sample market prediction result and the second object sample features. Here, the second sample market prediction result refers to the market prediction result of the push product sample obtained by the initial model based on the second text sample features and the second visual sample features.

[0116] In one embodiment, in the process of inputting a second sample pair into the initial model and obtaining the second sample market prediction result through the initial model, the second text sample features and the second visual sample features of the pushed product sample can be fused into multimodal features to obtain the second fused sample features. Then, market sentiment analysis is performed based on the second fused sample features to obtain the second sentiment sample features. Finally, market prediction is performed based on the second fused sample features and the second sentiment sample features to obtain the second sample market prediction result.

[0117] In one embodiment, loss analysis for multimodal feature fusion refers to determining the difference between the probability of the correct push strategy computed by the initial model and the probability given by the second training label. Here, the push strategy determined by the initial model may be the same as or different from the correct push strategy, and no specific limitation is made here.

[0118] In one embodiment, the second preset training termination condition may be that the second loss value is less than the second preset loss threshold, or that the number of training iterations of the initial model reaches the training iteration threshold, etc., and the specific condition is not limited here.

[0119] In one embodiment, when the second preset training termination condition is not met, the parameters of the initial model are adjusted according to the second loss value, and the next second sample pair is input into the adjusted initial model to continue training for multimodal feature fusion (i.e., steps 320 to 330 are executed).

[0120] In one embodiment, when multiple second sample pairs correspond to multiple task scenarios, during the multimodal feature fusion process, multiple second sample pairs from the same task scenario can be input one by one to train the initial model's preliminary feature alignment capability. Then, multiple second sample pairs from multiple task scenarios can be input to train the initial model's feature alignment capability in different task scenarios. Through progressive training for multimodal fusion, the model's ability to handle complex multimodal tasks is enhanced, thereby improving its versatility and robustness in practical applications while ensuring feature alignment accuracy.
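The progressive input order of [0120] can be sketched as a two-phase schedule. The tuple representation of a sample pair, the choice of a single starting scenario, and seeded shuffling for the mixed phase are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

SamplePair = Tuple[str, str]  # (task scenario, sample ID), a hypothetical representation


def progressive_order(pairs: List[SamplePair], start_scenario: str,
                      seed: int = 0) -> List[SamplePair]:
    # Phase 1: pairs from a single task scenario, for preliminary feature alignment
    phase_one = [pair for pair in pairs if pair[0] == start_scenario]
    # Phase 2: pairs from all task scenarios, mixed, for cross-scenario alignment
    phase_two = list(pairs)
    random.Random(seed).shuffle(phase_two)
    return phase_one + phase_two
```

Feeding the model in this order means it first learns alignment within one task scenario before encountering all scenarios together.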

[0121] Training multimodal fusion before training the visual instructions reduces the influence of varying visual instruction samples on multimodal feature fusion. Because the initial model is trained with the second sample set and then with the first sample set, it can concentrate on multimodal feature fusion and on instruction-based visual feature extraction in turn. This reduces interference between the two and makes both multimodal feature fusion and instruction-based visual feature extraction more accurate.

[0122] See Figure 4. Figure 4 shows the sub-steps of step 230 in Figure 2. In one embodiment, the first sample market prediction result may be obtained by the initial model from the first sample pair under the first model parameters, and step 230 may include steps 410 to 420.

[0123] Step 410: Based on the first sample market prediction result, calculate the first prediction probability of the push strategy sample determined by the initial model for the first sample pair.

[0124] Step 420: Calculate the first loss value based on the first predicted probability and the first training label.

[0125] In one embodiment, the push strategy sample refers to the push strategy that the initial model determines from the first sample market prediction result after a first sample pair is input and the first sample market prediction result is obtained. It is used to push the pushed product sample to the target sample.

[0126] In one embodiment, during the process of calculating the first prediction probability of the initial model for the determined push strategy sample based on the first sample market prediction result, the first sample market prediction result can be normalized to obtain multiple prediction probabilities. Then, the maximum value among the multiple prediction probabilities is determined as the first prediction probability. The normalization process can be performed using activation functions such as the sigmoid function or the softmax function; no specific limitation is made here.
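The normalize-then-take-the-maximum step of [0126] can be sketched with softmax, one of the two normalization functions named above. Subtracting the largest logit before exponentiating is a standard numerical-stability measure and does not change the result.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def first_prediction_probability(market_prediction: List[float]) -> float:
    """Normalize the market prediction output, then take the largest probability."""
    return max(softmax(market_prediction))
```

For two equal logits the result is 0.5. For logits `[0.0, ln 3]` the normalized probabilities are 0.25 and 0.75, so the first prediction probability is 0.75.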

[0127] In one embodiment, the first loss value calculated based on the first predicted probability and the first training label can be represented by the following formula (1).

[0128] $\mathcal{L}_1 = -\mathbb{E}_{(x_1, y_1) \sim \mathcal{D}_1}\left[\log P_{\theta_1}(y_1 \mid x_1)\right]$ (1)

[0129] where $\mathcal{L}_1$ represents the first loss value;

[0130] $x_1$ represents the first sample pair;

[0131] $y_1$ represents the first training label;

[0132] $\mathcal{D}_1$ represents the first sample set;

[0133] $\mathbb{E}_{(x_1, y_1) \sim \mathcal{D}_1}[\cdot]$ represents the expected value with respect to the first training label when a first sample pair in the first sample set is input into the initial model and the output push strategy sample is the correct push strategy;

[0134] $\theta_1$ represents the first model parameters;

[0135] $P_{\theta_1}(y_1 \mid x_1)$ represents the probability that, after a first sample pair in the first sample set is input into the initial model, the push strategy sample output by the initial model under the first model parameters is the correct push strategy.
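Assuming formula (1) takes the common negative log-likelihood form, the first loss value is the average over first sample pairs of the negative log of the probability that the output push strategy is correct. An empirical estimate can then be sketched as below. Formula (2) for the second loss value would have the same shape over the second sample set.

```python
import math
from typing import List


def first_loss_value(correct_probs: List[float]) -> float:
    """Empirical estimate of -E[log P(correct push strategy | first sample pair)]."""
    if not correct_probs:
        raise ValueError("need at least one sample pair")
    return -sum(math.log(p) for p in correct_probs) / len(correct_probs)
```

A model that assigns probability 1.0 to the correct push strategy incurs zero loss, and the loss grows without bound as that probability approaches 0.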

[0136] By calculating the first prediction probability and obtaining the first loss value based on the first prediction probability and the first training label, the ability of the initial model to extract visual features based on different visual instruction samples can be well expressed by the first loss value. This allows the first loss value to accurately express the visual feature extraction loss based on visual instructions, which is beneficial for more accurately adjusting the first model parameters of the initial model and improving the accuracy of feature extraction from visual data in different task scenarios.

[0137] See Figure 5. Figure 5 shows the sub-steps of step 330 in Figure 3. In one embodiment, the second sample market prediction result is obtained by the initial model from the second sample pair under the second model parameters, and step 330 may include steps 510 to 520.

[0138] Step 510: Based on the second sample market prediction result, calculate the second prediction probability of the push strategy sample determined by the initial model for the second sample pair.

[0139] Step 520: Calculate the second loss value based on the second prediction probability and the second training label.

[0140] In one embodiment, during the process of calculating the second prediction probability of the initial model for the determined push strategy sample based on the second sample market prediction result, the second sample market prediction result can be normalized to obtain multiple prediction probabilities. Then, the maximum value among the multiple prediction probabilities is determined as the second prediction probability. The normalization process can be performed using activation functions such as the sigmoid function or the softmax function; no specific limitation is made here.

[0141] In one embodiment, the second loss value calculated based on the second predicted probability and the second training label can be represented by the following formula (2).

[0142] $\mathcal{L}_2 = -\mathbb{E}_{(x_2, y_2) \sim \mathcal{D}_2}\left[\log P_{\theta_2}(y_2 \mid x_2)\right]$ (2)

[0143] where $\mathcal{L}_2$ represents the second loss value;

[0144] $x_2$ represents the second sample pair;

[0145] $y_2$ represents the second training label;

[0146] $\mathcal{D}_2$ represents the second sample set;

[0147] $\mathbb{E}_{(x_2, y_2) \sim \mathcal{D}_2}[\cdot]$ represents the expected value with respect to the second training label when a second sample pair in the second sample set is input into the initial model and the output push strategy sample is the correct push strategy;

[0148] $\theta_2$ represents the second model parameters;

[0149] $P_{\theta_2}(y_2 \mid x_2)$ represents the probability that, after a second sample pair in the second sample set is input into the initial model, the push strategy sample output by the initial model under the second model parameters is the correct push strategy.

[0150] By calculating the second prediction probability and the second loss value based on the second prediction probability and the second training label, the multimodal feature fusion capability of the initial model for different text features and visual features can be better expressed by the second loss value. Thus, the second loss value can more accurately express the feature alignment loss and feature fusion loss, which is conducive to more accurately adjusting the second model parameters of the initial model.

[0151] In one embodiment, the text features are obtained by fusing the text sub-features of the multiple market text information. Each text sub-feature is extracted from a market text information by the investment information push model based on a target text instruction. The target text instruction is the one, among multiple pre-trained text instructions, that corresponds to the target information type, and the target information type is the one, among multiple preset information types, to which the market text information belongs. Each text instruction corresponds uniquely to one information type. Because each text sub-feature is extracted based on the target text instruction corresponding to the information type of its market text information, the investment information push model can attend to the detailed information in market text information of different information types, which makes the extracted text sub-features more accurate. When performing market sentiment analysis based on the fused features, the model can therefore jointly account for how the detailed information in market text information of different information types and the details of visual data in different task scenarios combine to affect market sentiment. This makes the market prediction results of the investment information push model more accurate, improves the accuracy of the push strategy, and consequently enables candidate push products to be pushed to the target audience more accurately.

[0152] In one embodiment, a text instruction refers to a statement text used to guide the investment information push model to perform targeted feature extraction on a market text information using a specific feature extraction method for different information types. Here, information type refers to the text format of the market text information, such as question-and-answer, guidance text (e.g., notices, articles guiding financial management), etc., without specific limitations here.

[0153] In one embodiment, when the pre-trained investment information push model extracts features from market text information based on target text instructions, the information type of each market text information can first be analyzed to determine, among the multiple information types, the target information type to which it belongs. Then, for each market text information, the market text information and the target text instruction corresponding to its target information type are input into the investment information push model, which extracts features from the market text information according to the target text instruction to obtain the corresponding text sub-feature.

[0154] In one embodiment, the process of analyzing the information type of each market text information one by one and determining the target information type to which each market text information belongs among multiple information types can be performed by a pre-trained text classification model, or by the metadata of the market text information and a preset text classification rule, etc., without being specifically limited here.

[0155] In one embodiment, a unique correspondence between a text instruction and an information type means that each pre-trained text instruction is uniquely associated with one of multiple information types, and does not correspond to any other information type.

[0156] In one embodiment, the training method for text instructions can refer to the training method for visual instructions in steps 210 to 240, which will not be described in detail here.

[0157] See Figure 6. Figure 6 shows another embodiment of the training steps of the investment information push model of Figure 1. In one embodiment, after the first loss value is obtained, the training steps may further include steps 610 to 620.

[0158] Step 610: When the first preset training termination condition is not met, among multiple first sample pairs, determine the target first sample pair that belongs to the same task scenario as the current first sample pair and has not been input into the initial model;

[0159] Step 620: Based on the adjusted visual instruction samples in the current first sample pair, update the visual instruction samples and first visual sample features in the target first sample pair, and input the next first sample pair into the parameter-tuned initial model.

[0160] In one embodiment, updating the visual instruction sample and the first visual sample feature in the target first sample pair based on the adjusted visual instruction sample in the current first sample pair means two things. First, the visual instruction sample in the target first sample pair is replaced with the adjusted visual instruction sample of the current first sample pair. Second, the original first visual sample feature in the target first sample pair is replaced with a new first visual sample feature, obtained by re-extracting features from the corresponding visual sample data based on that adjusted visual instruction sample.

[0161] The following example illustrates steps 710 to 720. Assume there are four first sample pairs. The visual instruction sample in the first first sample pair is "Extract the content from this image." The visual instruction sample in the second first sample pair is "What is the content of this video?" The third first sample pair belongs to the same task scenario as the first first sample pair, and its visual instruction sample is "Summarize the content of this image." The fourth first sample pair belongs to the same task scenario as the second first sample pair, and its visual instruction sample is "What is this video about?" Suppose that after the first first sample pair is input into the initial model, the first preset training termination condition is not met. The initial model and the visual instruction sample "Extract the content from this image." are then adjusted, and the third first sample pair is identified as the target first sample pair. If the visual instruction sample in the first first sample pair is adjusted to "Extract all the content in this image.", the visual instruction sample in the third first sample pair is updated from "Summarize the content of this image." to "Extract all the content in this image." Next, the second first sample pair is input into the initial model. Suppose that the first preset training termination condition is again not met. The initial model and the visual instruction sample "What is the content of this video?" are then adjusted, and the fourth first sample pair is identified as the target first sample pair. If the visual instruction sample in the second first sample pair is adjusted to "What parts does the content in this video include?", the visual instruction sample in the fourth first sample pair is updated from "What is this video about?" to "What parts does the content in this video include?" Then, the third first sample pair is input into the initial model.
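The update rule in [0160] and the walkthrough in [0161] can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: `FirstSamplePair`, `extract_feature`, `train_with_propagation`, the `adjust` callback and the `terminated` check are hypothetical names, the model's forward pass and parameter update are omitted, and the feature extractor is a placeholder that only records which data and instruction produced a feature.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class FirstSamplePair:
    scenario: str                               # task scenario of the pair
    visual_data: str                            # placeholder handle for the visual sample
    instruction: str                            # visual instruction sample
    feature: Optional[Tuple[str, str]] = None   # first visual sample feature


def extract_feature(visual_data: str, instruction: str) -> Tuple[str, str]:
    # Hypothetical stand-in for the visual language model's feature extractor.
    return (visual_data, instruction)


def train_with_propagation(
    pairs: List[FirstSamplePair],
    adjust: Callable[[str], str],       # hypothetical instruction-adjustment step
    terminated: Callable[[int], bool],  # stand-in for the termination condition
) -> List[FirstSamplePair]:
    consumed = set()
    for i, pair in enumerate(pairs):
        consumed.add(i)
        # (Forward pass of `pair` through the initial model omitted.)
        if terminated(i):
            break
        # Adjust the current pair's instruction (model parameter update omitted).
        pair.instruction = adjust(pair.instruction)
        # Target pair: same task scenario, not yet input into the model.
        target = next(
            (j for j, p in enumerate(pairs)
             if j not in consumed and p.scenario == pair.scenario),
            None,
        )
        if target is not None:
            tgt = pairs[target]
            tgt.instruction = pair.instruction
            tgt.feature = extract_feature(tgt.visual_data, tgt.instruction)
    return pairs


# Reproduce the four-pair example, stopping once the third pair is input.
ADJUSTED = {
    "Extract the content from this image.": "Extract all the content in this image.",
    "What is the content of this video?": "What parts does the content in this video include?",
}
pairs = [
    FirstSamplePair("image", "img_a", "Extract the content from this image."),
    FirstSamplePair("video", "vid_a", "What is the content of this video?"),
    FirstSamplePair("image", "img_b", "Summarize the content of this image."),
    FirstSamplePair("video", "vid_b", "What is this video about?"),
]
train_with_propagation(pairs, lambda s: ADJUSTED.get(s, s), lambda i: i >= 2)
print(pairs[2].instruction)  # Extract all the content in this image.
print(pairs[3].instruction)  # What parts does the content in this video include?
```

Scanning for the first unconsumed pair in the same scenario mirrors the "has not been input into the initial model" condition; a real training loop would also carry the tuned model parameters forward between pairs.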

[0162] In one embodiment, where the visual instruction sample in each target first sample pair is updated to the adjusted visual instruction sample from the current first sample pair, the visual instructions for the multiple task scenarios can be determined from the multiple adjusted visual instruction samples as follows: the visual instruction sample held by the target first sample pair in each task scenario is taken as the visual instruction for that task scenario.
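Under this embodiment, the visual instruction for a task scenario is the instruction its target first sample pair holds when training stops. A minimal sketch, assuming a hypothetical chronological log of `(scenario, instruction)` updates; a dictionary keeps exactly one instruction per scenario, which matches the one-to-one correspondence between visual instructions and task scenarios.

```python
from typing import Dict, Iterable, Tuple


def final_visual_instructions(
    update_log: Iterable[Tuple[str, str]],
) -> Dict[str, str]:
    """Return one visual instruction per task scenario.

    update_log: chronological (scenario, instruction) records, one per time a
    target first sample pair received an adjusted visual instruction sample.
    Later records overwrite earlier ones, so each scenario keeps the
    instruction held by its most recently updated target pair.
    """
    instructions: Dict[str, str] = {}
    for scenario, instruction in update_log:
        instructions[scenario] = instruction
    return instructions


log = [
    ("image", "Extract all the content in this image."),
    ("video", "What parts does the content in this video include?"),
]
print(final_visual_instructions(log))
```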

[0163] When the first preset training termination condition is not met, the visual instruction sample and the first visual sample feature in the target first sample pair are updated based on the adjusted visual instruction sample in the current first sample pair. As a result, when the target first sample pair is later input into the initial model for training, the model is trained on first visual sample features extracted under the iterated visual instruction sample. Because those features are extracted under the iterated visual instruction sample, the initial model is trained iteratively on the visual instruction samples of a given task scenario, and its understanding of the visual instruction samples in that scenario improves with each iteration. Since the pre-trained visual instruction for a task scenario is formed by iterating on its visual instruction samples, a visual language model that extracts visual features under that instruction understands the instruction better and can therefore extract visual features from the visual data more accurately. This makes the market prediction results obtained by the visual language model more accurate, which improves the accuracy of the push strategy and allows candidate push products to be pushed to the target object more accurately.

[0164] Referring to Figure 7, this application also provides an information push device that can implement the above method. The information push device 700 may include:

[0165] The feature acquisition module 710 can be used to acquire the text features of multiple market text information of a candidate push product, as well as the visual features of multiple visual data of the candidate push product. The visual features are obtained by fusing the visual sub-features of each visual data. The visual sub-features are obtained by a pre-trained investment information push model extracting features from the visual data based on a target visual instruction. The target visual instruction is the visual instruction, among multiple pre-trained visual instructions, that corresponds to the target task scenario, and the target task scenario is the task scenario, among multiple preset task scenarios, to which the visual data belongs. Each visual instruction corresponds uniquely to one task scenario.
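The instruction lookup and sub-feature fusion described for module 710 can be sketched as follows. This is an illustrative sketch only: `fused_visual_feature`, the `"scenario"` key, the scenario names and the toy extractor are hypothetical, and the element-wise mean is an assumed fusion operator, since the application does not specify one.

```python
from typing import Any, Callable, Dict, List, Mapping, Sequence

Vector = List[float]


def fused_visual_feature(
    visual_data: Sequence[Mapping[str, Any]],
    instructions: Dict[str, str],
    extract: Callable[[Mapping[str, Any], str], Vector],
) -> Vector:
    """Extract one visual sub-feature per item and fuse them.

    Each item names its task scenario under the "scenario" key; the target
    visual instruction is looked up in the one-to-one scenario -> instruction
    table. Fusion is an element-wise mean (an assumption for illustration).
    """
    if not visual_data:
        raise ValueError("at least one visual data item is required")
    subs = [extract(item, instructions[item["scenario"]]) for item in visual_data]
    dim = len(subs[0])
    if any(len(s) != dim for s in subs):
        raise ValueError("all sub-features must have the same length")
    return [sum(s[k] for s in subs) / len(subs) for k in range(dim)]


# Toy usage: a placeholder extractor returning [value, instruction length].
table = {"k_line_chart": "ab", "news_video": "abcd"}
items = [
    {"scenario": "k_line_chart", "value": 1.0},
    {"scenario": "news_video", "value": 3.0},
]


def toy_extract(item: Mapping[str, Any], instr: str) -> Vector:
    return [float(item["value"]), float(len(instr))]


print(fused_visual_feature(items, table, toy_extract))  # [2.0, 3.0]
```

Keeping the scenario-to-instruction table as a plain dictionary enforces the "unique correspondence" by construction; attention pooling or concatenation could replace the mean without changing the lookup step.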

[0166] The multimodal feature fusion module 720 can be used to perform multimodal feature fusion on the text features and the visual features to obtain fused features;

[0167] The market forecasting module 730 can be used to perform market sentiment analysis based on the fused features, and to predict the market trend of the candidate push product based on the market sentiment analysis results, thus obtaining the market trend prediction results of the candidate push product.

[0168] The push execution module 740 can be used to determine the push strategy for pushing candidate push products to target objects based on market forecast results, and to execute the push of investment information according to the push strategy.
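A rough sketch of how modules 720 to 740 might be chained is given below. Everything concrete in it is an assumption for illustration rather than the application's design: concatenation as the fusion operator, a linear sentiment head, the up/down thresholds, and the three strategy labels are all hypothetical, and module 710 is represented by feature vectors passed in directly.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


@dataclass
class InformationPushDevice:
    """Toy wiring of modules 720-740; module 710 is represented by the
    pre-computed text and visual features passed to run()."""
    sentiment_weights: Vector        # hypothetical linear sentiment head
    up_threshold: float = 0.2        # hypothetical trend thresholds
    down_threshold: float = -0.2

    def fuse(self, text: Vector, visual: Vector) -> Vector:
        # Multimodal feature fusion module 720: concatenation (assumption).
        return text + visual

    def forecast(self, fused: Vector) -> Tuple[float, str]:
        # Market forecasting module 730: sentiment score, then trend label.
        if len(fused) != len(self.sentiment_weights):
            raise ValueError("weight and feature dimensions differ")
        score = sum(w * x for w, x in zip(self.sentiment_weights, fused))
        if score > self.up_threshold:
            return score, "up"
        if score < self.down_threshold:
            return score, "down"
        return score, "flat"

    def push_strategy(self, trend: str) -> str:
        # Push execution module 740: map the trend forecast to a push strategy.
        return {"up": "push_now", "flat": "push_with_risk_note", "down": "hold_back"}[trend]

    def run(self, text: Vector, visual: Vector) -> str:
        _, trend = self.forecast(self.fuse(text, visual))
        return self.push_strategy(trend)


device = InformationPushDevice(sentiment_weights=[1.0, 1.0, -1.0, 0.0])
print(device.run([0.5, 0.25], [0.25, 3.0]))  # push_now
```

Splitting fusion, forecasting and strategy selection into separate methods mirrors the module boundaries of the device, so each stage could be swapped for a learned component independently.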

[0169] The specific implementation of this information push device is basically the same as the specific implementation of the information push method described above, and will not be repeated here.

[0170] This application also provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the aforementioned information push method. This electronic device can be any smart terminal, including tablet computers, in-vehicle computers, etc.

[0171] Referring to Figure 8, which illustrates the hardware structure of an electronic device 800 according to another embodiment, the electronic device 800 includes:

[0172] The processor 801 can be implemented using a general-purpose CPU (Central Processing Unit), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this application.

[0173] The memory 802 can be implemented as a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 802 can store the operating system and other applications. When the technical solutions provided in the embodiments of this application are implemented through software or firmware, the relevant program code is stored in the memory 802 and is called by the processor 801 to execute the information push method of the embodiments of this application.

[0174] The input/output interface 803 is used to implement information input and output.

[0175] The communication interface 804 is used to enable communication and interaction between this device and other devices. Communication can be achieved through wired means (such as USB or Ethernet cable) or wireless means (such as a mobile network, Wi-Fi, or Bluetooth).

[0176] The bus 805 transfers information between the components of the device (e.g., the processor 801, memory 802, input/output interface 803, and communication interface 804).

[0177] The processor 801, memory 802, input/output interface 803, and communication interface 804 are connected to each other within the device via the bus 805.

[0178] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described information push method.

[0179] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory may include memory remotely located relative to the processor, and these remote memories can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

[0180] The embodiments described in this application are for the purpose of more clearly illustrating the technical solutions of this application, and do not constitute a limitation on the technical solutions provided in this application. As those skilled in the art will know, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided in this application are also applicable to similar technical problems.

[0181] Those skilled in the art will understand that the technical solutions shown in the figures do not constitute a limitation on the embodiments of this application, which may include more or fewer steps than shown, combine certain steps, or include different steps.

[0182] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.

[0183] Those skilled in the art will understand that all or some of the steps in the methods disclosed above, as well as the functional modules / units in the systems and devices, can be implemented as software, firmware, hardware, or suitable combinations thereof.

[0184] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0185] It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association between related objects and indicates that three relationships may exist. For example, "A and/or B" can represent three cases: only A exists, only B exists, and both A and B exist, where A and B can be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one (item) of the following" or similar expressions refers to any combination of these items, including any combination of single or plural items. For example, at least one (item) of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.

[0186] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units described above is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.

[0187] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0188] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0189] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes multiple instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing programs, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0190] The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but this does not limit the scope of the claims of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and substance of the embodiments of the present application shall be within the scope of the claims of the present application.

Claims

1. An information push method, characterized in that it includes the following steps: acquiring text features of multiple market text information of a candidate push product, as well as visual features of multiple visual data of the candidate push product, wherein the visual features are obtained by fusing visual sub-features of each visual data, the visual sub-features are obtained by a pre-trained investment information push model extracting features from the visual data based on a target visual instruction, the target visual instruction is the visual instruction, among multiple pre-trained visual instructions, that corresponds to a target task scenario, the target task scenario is the task scenario, among multiple preset task scenarios, to which the visual data belongs, and each visual instruction corresponds uniquely to one task scenario; performing multimodal feature fusion on the text features and the visual features to obtain fused features; performing market sentiment analysis based on the fused features, and performing market trend prediction for the candidate push product based on the market sentiment analysis results, thereby obtaining market trend prediction results for the candidate push product; and determining, based on the market trend prediction results, a push strategy for pushing the candidate push product to a target object, and pushing investment information according to the push strategy.
The investment information push model is trained through the following steps: obtaining a first sample set, wherein the first sample set includes multiple first sample pairs, each first sample pair includes a first training label and a visual instruction sample, each task scenario corresponds to several of the visual instruction samples, and the visual instruction samples of the first sample pairs under each task scenario differ from one another; inputting a first sample pair into an initial model, and obtaining a first sample market prediction result through the initial model; performing, based on the first sample market prediction result and the first training label, a loss analysis on the visual instruction sample to obtain a first loss value, wherein, if a first preset training termination condition is not met, the first loss value is used to adjust the parameters of the initial model and the visual instruction sample; and, when the first preset training termination condition is met, determining the parameter-tuned initial model as the investment information push model, and determining the visual instructions for the multiple task scenarios based on the multiple adjusted visual instruction samples.
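Claim 1 has the first loss value adjust both the model parameters and the visual instruction sample. One way to read this, swapped in here purely as an assumption, is soft prompt tuning: the instruction is represented as a trainable embedding updated by gradient descent alongside the model weights. The toy below uses a logistic model with binary cross-entropy; `train_jointly`, the parameters `w`, `a` and `e`, the learning rate, and the loss-threshold termination condition are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def train_jointly(
    samples: Sequence[Tuple[Vector, int]],
    dim: int,
    lr: float = 0.5,
    tol: float = 0.05,
    max_steps: int = 500,
) -> Tuple[Vector, Vector, Vector, int]:
    """Jointly fit model weights (w, a) and a soft instruction embedding e.

    Logit = w . x + a . e, trained with binary cross-entropy. Training stops
    when the mean epoch loss drops below `tol` (a stand-in for the first
    preset training termination condition) or after `max_steps` epochs.
    """
    w = [0.0] * dim   # model weights on the sample features
    a = [0.1] * dim   # model weights that read the instruction embedding
    e = [0.1] * dim   # soft visual-instruction embedding, also trained
    for step in range(1, max_steps + 1):
        total = 0.0
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + sum(ai * ei for ai, ei in zip(a, e))
            p = min(max(1.0 / (1.0 + math.exp(-z)), 1e-12), 1.0 - 1e-12)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            g = p - y  # d(loss)/d(logit) for sigmoid + cross-entropy
            # Gradient step on the model weights and on the instruction embedding;
            # the tuple assignment makes both updates use the old a and e.
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            a, e = (
                [ai - lr * g * ei for ai, ei in zip(a, e)],
                [ei - lr * g * ai for ei, ai in zip(e, a)],
            )
        if total / len(samples) < tol:
            return w, a, e, step
    return w, a, e, max_steps


data = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
w, a, e, steps = train_jointly(data, dim=2)
print(steps, e)
```

In the application's own example the adjusted instruction samples are text strings, so a discrete rewrite step (as in [0161]) is the closer match; the continuous embedding here only illustrates how one loss can drive both updates.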

2. The method according to claim 1, characterized in that, before the first sample set is obtained, the investment information push model is trained through the following steps: obtaining a second sample set, wherein the second sample set includes multiple second sample pairs, each second sample pair corresponds to at least one of the task scenarios, second sample pairs belonging to the same task scenario correspond to the same visual instruction sample, and each second sample pair includes a second training label; inputting a second sample pair into the initial model, and obtaining a second sample market prediction result through the initial model; performing, based on the second sample market prediction result and the second training label, a loss analysis for multimodal feature fusion to obtain a second loss value, wherein, if a second preset training termination condition is not met, the second loss value is used to adjust the parameters of the initial model; and, when the second preset training termination condition is met, stopping the input of further second sample pairs into the initial model and obtaining the first sample set.

3. The method according to claim 1, characterized in that the first sample market prediction result is obtained by the initial model, under first model parameters, based on the first sample pair; and the step of performing a loss analysis on the visual instruction sample based on the first sample market prediction result and the first training label to obtain a first loss value includes: calculating, based on the first sample market prediction result, a first predicted probability with which the initial model determines the push strategy sample according to the first sample pair; and calculating the first loss value based on the first predicted probability and the first training label.

4. The method according to claim 2, characterized in that the second sample market prediction result is obtained by the initial model, under second model parameters, based on the second sample pair; and the step of performing a loss analysis for multimodal feature fusion based on the second sample market prediction result and the second training label to obtain a second loss value includes: calculating, based on the second sample market prediction result, a second predicted probability with which the initial model determines the push strategy sample according to the second sample pair; and calculating the second loss value based on the second predicted probability and the second training label.
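Claims 3 and 4 compute each loss value from a predicted probability for the push strategy sample and a training label, without naming the loss function. A common choice, assumed here rather than taken from the application, is categorical cross-entropy over a set of candidate push strategies; the `strategy_loss` helper and its logits input are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [v / total for v in exps]


def strategy_loss(logits: Sequence[float], label: int) -> float:
    """Cross-entropy: negative log of the predicted probability assigned to
    the labelled push strategy (the first or second predicted probability)."""
    probs = softmax(logits)
    return -math.log(probs[label])


# Two equally likely strategies: the loss is ln 2.
print(strategy_loss([0.0, 0.0], label=0))  # 0.6931471805599453
```

The same function serves both claims; only the model parameters that produced the logits (first or second) differ.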

5. The method according to claim 1, characterized in that the text features are obtained by fusing text sub-features of the multiple market text information; the text sub-features are obtained by the investment information push model extracting features from the market text information based on a target text instruction; the target text instruction is the text instruction, among multiple pre-trained text instructions, that corresponds to a target information type; the target information type is the information type, among multiple preset information types, to which the market text information belongs; and each text instruction corresponds uniquely to one information type.

6. The method according to claim 1, characterized in that the first sample pair further includes a first visual sample feature, and, after the first loss value is obtained, the training steps of the investment information push model further include: when the first preset training termination condition is not met, determining, among the multiple first sample pairs, a target first sample pair that belongs to the same task scenario as the current first sample pair and has not yet been input into the initial model; and updating the visual instruction sample and the first visual sample feature in the target first sample pair based on the adjusted visual instruction sample in the current first sample pair, and inputting the next first sample pair into the parameter-tuned initial model.

7. An information push device, characterized in that the device includes: a feature acquisition module, used to acquire text features of multiple market text information of a candidate push product and visual features of multiple visual data of the candidate push product, wherein the visual features are obtained by fusing visual sub-features of each visual data, the visual sub-features are obtained by a pre-trained investment information push model extracting features from the visual data based on a target visual instruction, the target visual instruction is the visual instruction, among multiple pre-trained visual instructions, that corresponds to a target task scenario, the target task scenario is the task scenario, among multiple preset task scenarios, to which the visual data belongs, and each visual instruction corresponds uniquely to one task scenario; a multimodal feature fusion module, used to perform multimodal feature fusion on the text features and the visual features to obtain fused features; a market forecasting module, used to perform market sentiment analysis based on the fused features and to make a market forecast for the candidate push product based on the market sentiment analysis results, thereby obtaining market forecast results for the candidate push product; and a push execution module, used to determine, based on the market forecast results, a push strategy for pushing the candidate push product to a target object, and to push investment information according to the push strategy.
The investment information push model is trained through the following steps: obtaining a first sample set, wherein the first sample set includes multiple first sample pairs, each first sample pair includes a first training label and a visual instruction sample, each task scenario corresponds to several of the visual instruction samples, and the visual instruction samples of the first sample pairs under each task scenario differ from one another; inputting a first sample pair into an initial model, and obtaining a first sample market prediction result through the initial model; performing, based on the first sample market prediction result and the first training label, a loss analysis on the visual instruction sample to obtain a first loss value, wherein, if a first preset training termination condition is not met, the first loss value is used to adjust the parameters of the initial model and the visual instruction sample; and, when the first preset training termination condition is met, determining the parameter-tuned initial model as the investment information push model, and determining the visual instructions for the multiple task scenarios based on the multiple adjusted visual instruction samples.

8. An electronic device, characterized in that the electronic device includes a memory and a processor, the memory stores a computer program, and the processor executes the computer program to implement the information push method according to any one of claims 1 to 6.

9. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the information push method according to any one of claims 1 to 6.