Information processing method and apparatus, electronic device, and readable storage medium

By recognizing text entities in the response information and combining them with a predefined model and matching algorithm, the system outputs response information that includes both images and text, thus solving the problem of limited search results in large language models and improving the user experience.

WO2026137934A1 · PCT designated stage · Publication Date: 2026-07-02 · UCWEB

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
UCWEB
Filing Date
2025-08-27
Publication Date
2026-07-02

AI Technical Summary

Technical Problem

The search results of the large language model are mainly text, and it cannot output search results in the corresponding format according to user needs, resulting in a poor user experience.

Method used

The method identifies whether the text in the response information corresponds to a specific entity; if so, response information containing both image and text elements is output. The information element type is determined using a set recognition model, the text is produced by a set text processing model, and the image is either taken from the information source when it matches the text or obtained by searching, to achieve diversified output.

Benefits of technology

It enhances the user experience by combining images and text to provide richer information and meet diverse user needs.

✦ Generated by Eureka AI based on patent content.

Abstract

The present disclosure relates to an information processing method and apparatus, an electronic device, and a readable storage medium. The method comprises: in response to query information inputted by a user, sending the query information to a server; receiving response information returned by the server for the query information, wherein information elements comprised in the response information are determined on the basis of text in the response information, and in the case that a corresponding entity is mentioned in the text in the response information, the response information comprises a picture element and a text element; and outputting the response information corresponding to the query information.

Description

Information processing methods, apparatus, electronic devices and readable storage media

[0001] This disclosure claims priority to Chinese Patent Application No. 202411933541.X, filed with the China Patent Office on December 25, 2024, entitled "Information Processing Method, Apparatus, Electronic Device and Readable Storage Medium", the entire contents of which are incorporated herein by reference.

Technical Field

[0002] This disclosure relates to artificial intelligence technology, and more specifically, to an information processing method, apparatus, electronic device, and readable storage medium.

Background Technology

[0003] With the development of internet technology, search technology based on large language models has become increasingly mature. However, the search results from large language models are mostly text-based, and they cannot output search results in a format that corresponds to the user's needs.

Summary of the Invention

[0004] One objective of this disclosure is to provide a new technical solution for information processing.

[0005] According to a first aspect of this disclosure, an information processing method is provided, comprising:

[0006] In response to user input of query information, the query information is sent to the server;

[0007] Receive the response information returned by the server in response to the query information; wherein, the information elements contained in the response information are determined based on the text in the response information, and if the text in the response information has a corresponding entity, the response information contains image elements and text elements;

[0008] Output the response information corresponding to the query information.

[0009] Optionally, if the text in the response information does not have a corresponding entity, the response information includes text elements but does not include image elements.

[0010] Optionally, the information elements included in the response information are determined based on the text in the response information, including:

[0011] The information element types contained in the response information are determined by inputting the text in the response information into a set recognition model; wherein, when the set recognition model determines that the text in the response information has a corresponding entity, the response information contains image elements and text elements.

[0012] Optionally, if the set recognition model determines that the text in the response information does not have a corresponding entity, the response information includes text elements but does not include image elements.

[0013] Optionally, the text in the response information is determined by inputting the search results obtained based on the query information into a set text processing model.

[0014] Optionally, the response information includes image elements and text elements, and the image in the response information is obtained from the information source corresponding to the text in the response information.

[0015] Optionally, if the matching degree between the image in the information source and the text in the response information is greater than or equal to a set threshold, the image in the response information is obtained from the information source;

[0016] If the matching degree is less than the set threshold, the image in the response information is obtained by searching based on the text in the response information.

[0017] Optionally, the response information includes image elements and text elements. The information source corresponding to the text in the response information does not have an image. The image in the response information is obtained by searching based on the text in the response information.

[0018] Optionally, the response information includes multiple information entries, and the output order of the multiple information entries is determined based on the mention rate of each information entry, wherein the mention rate of the information entry is determined based on the number of times the information entry appears in the search results obtained by matching the query information.

[0019] Optionally, outputting the response information corresponding to the query information includes:

[0020] Display the first set of response information and interactive controls corresponding to the query information;

[0021] In response to the interactive control being triggered, a second set of response information corresponding to the query information is displayed after the first set of response information.

[0022] Optionally, outputting the response information corresponding to the query information includes:

[0023] The response information is output in a time-sharing manner until a termination identifier is detected, at which point the output of the response information is considered complete.

[0024] Optionally, the response information includes a first part of content and a second part of content, and outputting the content of the response information in a time-sharing manner includes:

[0025] At the first point in time, output the first part of the response information;

[0026] At the second time point, the second part of the response information is output;

[0027] The second time point is later than the first time point, and the output operation at the second time point is executed after the first part of the content has been output.

[0028] Optionally, before outputting the response information corresponding to the query information, the method further includes:

[0029] If the response information contains image elements and text elements, then for each information item, the image elements and text elements are displayed according to a preset display layout.

[0030] Optionally, the image in the response information is obtained by searching based on the text in the response information, including:

[0031] Extract one or more keywords from the text in the response information;

[0032] Using the keywords, a search is performed in the preset media resource library to obtain multiple candidate images;

[0033] The text feature vectors of the keywords and the image feature vectors of the multiple candidate images are determined respectively, and the similarity between the text feature vectors and the image feature vectors is calculated.

[0034] Based on the similarity between multiple text feature vectors and image feature vectors, the image element in the response information that matches the text element is determined.

[0035] According to a second aspect of this disclosure, an information processing method is provided, comprising:

[0036] Receive user-input query information sent by the client;

[0037] Information search is performed based on the query information, and response information corresponding to the query information is determined based on the search results; wherein, the information elements contained in the response information are determined based on the text in the response information, and if there is a corresponding entity for the text in the response information, the response information contains image elements and text elements;

[0038] The response information is returned to the client for display.

[0039] Optionally, the step of performing an information search based on the query information includes:

[0040] Calculate the matching degree between the query information and each search result, and filter out search results with a matching degree higher than a preset threshold;

[0041] The search results with a matching degree exceeding a preset threshold are input into a set text processing model to determine the text in the response information corresponding to the query information.

[0042] According to a third aspect of this disclosure, an information processing apparatus is provided, comprising:

[0043] The sending module is used to send the query information to the server in response to user input;

[0044] A receiving module is used to receive response information returned by the server in response to the query information; wherein, the information elements contained in the response information are determined based on the text in the response information, and if the text in the response information has a corresponding entity, the response information includes image elements and text elements;

[0045] The output module is used to output the response information corresponding to the query information.

[0046] According to a fourth aspect of this disclosure, an electronic device is provided, including a memory and a processor, the memory storing a computer program for controlling the processor to perform the method according to the first or second aspect of this disclosure.

[0047] According to a fifth aspect of this disclosure, a non-volatile computer-readable storage medium is provided that stores computer program instructions thereon, which, when executed by a processor, implement the method described in the first or second aspect.

[0048] According to a sixth aspect of this disclosure, a computer program product is provided that, when instructions in the computer program product are executed by a processor, implements the method described in the first or second aspect.

[0049] The information processing method disclosed herein determines the information elements contained in the response information based on the text in the response information. When the text in the response information has a corresponding entity, the response information contains both image elements and text elements. The method outputs response information containing both image elements and text elements, thereby achieving diversified output of response information and improving user experience.

[0050] The features and advantages of the embodiments of this disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Attached Figure Description

[0051] The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the present disclosure and, together with their description, serve to explain the principles of the embodiments of the present disclosure.

[0052] Figure 1 shows a schematic flowchart of an information processing method according to some embodiments of the present disclosure;

[0053] Figure 2 illustrates a schematic diagram of response information display according to some embodiments of the present disclosure;

[0054] Figure 3 illustrates a schematic diagram of response information display according to some embodiments of the present disclosure;

[0055] Figure 4 illustrates a schematic diagram of response information display according to some embodiments of the present disclosure;

[0056] Figure 5 illustrates a schematic diagram of response information display according to some embodiments of the present disclosure;

[0057] Figure 6 illustrates a schematic diagram of response information display according to some embodiments of the present disclosure;

[0058] Figure 7 illustrates a schematic diagram of response information display according to some embodiments of the present disclosure;

[0059] Figure 8 illustrates a schematic diagram of response information display according to some embodiments of the present disclosure;

[0060] Figure 9 shows a schematic flowchart of an information processing method according to some embodiments of the present disclosure;

[0061] Figure 10 shows a structural block diagram of an information processing apparatus according to some embodiments of the present disclosure;

[0062] Figure 11 shows a structural block diagram of an information processing apparatus according to some embodiments of the present disclosure;

[0063] Figure 12 shows a structural block diagram of an electronic device according to some embodiments of the present disclosure;

[0064] Figure 13 shows a structural block diagram of an electronic device according to other embodiments of the present disclosure.

Detailed Implementation

[0065] Various exemplary embodiments of this disclosure will now be described in detail with reference to the accompanying drawings.

[0066] The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the embodiments of this disclosure or their application or use.

[0067] It should be noted that similar labels and letters in the following figures indicate similar items; therefore, once an item is defined in one figure, it does not need to be discussed further in subsequent figures.

[0068] To address the aforementioned technical problems, this disclosure provides an information processing method that determines the information elements contained in the response information based on whether the text in the response information corresponds to an entity. When the text in the response information corresponds to an entity, the response information contains both image elements and text elements. The method outputs response information containing both image elements and text elements, thereby achieving diversified output of response information and improving the user experience.

[0069] Figure 1 shows a flowchart of an information processing method according to an embodiment of the present disclosure. This method can be applied to an APP (Application). The APP can be a client-side APP or a web-based APP. As shown in Figure 1, the method includes steps S110 to S130.

[0070] Step S110: In response to the user inputting query information, the query information is sent to the server.

[0071] Query information can be entered through a text input box on the APP interface or through a voice input control on the APP interface. When the voice input control is used, the received voice signal is converted into text, which serves as the query information.

[0072] Step S120: Receive response information returned by the server in response to the query information; wherein, the information elements contained in the response information are determined based on the text in the response information, and if there is a corresponding entity for the text in the response information, the response information contains image elements and text elements.

[0073] In some embodiments, if the text in the response information does not have a corresponding entity, the response information contains text elements but does not contain image elements.

[0074] In some embodiments, determining the information elements contained in the response information based on the text in the response information specifically includes: determining the type of information elements contained in the response information by inputting the text in the response information into a set recognition model; wherein, when the set recognition model determines that the text in the response information has a corresponding entity, the response information contains image elements and text elements.

[0075] The set recognition model is trained on a large number of training samples to identify whether input text has a corresponding entity. The training samples consist of text that contains entity words and text that does not. Entity words are words that correspond to real-world objects, where "real-world objects" are objectively existing objects with a physical form in the real world.

[0076] Entity words can be defined according to a specific classification of objects. For example, when classified by origin and production method, entity words cover natural objects and man-made objects. Natural objects include various animals, plants, and minerals. Man-made objects include industrial products (e.g., machines, vehicles, electronic devices), handicrafts (e.g., embroidery, weaving, carving), and works of art (e.g., paintings, sculptures, photographs). Alternatively, when classified by purpose, entity words cover food objects, daily necessities objects, learning and work objects, and transportation objects. Food objects include grains, vegetables, fruits, and meat. Daily necessities objects include household items (e.g., furniture, tableware, kitchenware) and clothing, shoes, and bags (e.g., clothes, shoes, backpacks). Learning and work objects include office equipment (e.g., computers, printers, scanners), stationery (e.g., pens, folders), books, and laboratory equipment (e.g., test tubes, microscopes). Transportation objects include cars, airplanes, and ships.

[0077] In some embodiments, the text input to the set recognition model can be all the text in the response information or keywords extracted from the text in the response information.

[0078] For example, if the query is "What are the different kinds of Guangzhou specialty foods?", the response information might include the text "white-cut chicken", "rice noodle roll", "claypot rice", "Cantonese roast goose", "Chaoshan beef", "char siu", "shrimp dumplings", "wonton noodles", "sweet soup", and "beef offal". This text is input into the set recognition model to determine whether it has corresponding entities. Because corresponding entities exist, the response information contains both image elements and text elements; that is, each specific food item corresponds to an image.

[0079] For example, if the query is "What are the iconic buildings in Beijing?", the response information might include the text "Forbidden City", "Great Wall", and "Temple of Heaven". This text is input into the set recognition model to determine whether it has corresponding entities. Because corresponding entities exist, the response information contains both image elements and text elements; that is, each iconic building corresponds to an image.

[0080] For example, if the query is "What is the process for medical insurance reimbursement in other places?", the text in the response information has no corresponding entity, so the response information contains text elements but not image elements.
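The decision described in [0073] to [0080] can be sketched as follows. This is only a minimal illustration, not the disclosed recognition model: the trained model is replaced by a hypothetical lexicon lookup (`ENTITY_WORDS`), and the names `has_corresponding_entity`, `ElementTypes`, and `determine_element_types` are assumptions introduced for the example.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the trained recognition model: a tiny lexicon of
# entity words (words that correspond to physical real-world objects).
ENTITY_WORDS = {"white-cut chicken", "claypot rice", "forbidden city", "great wall"}


def has_corresponding_entity(text: str) -> bool:
    """Return True if the text mentions any entity word from the lexicon."""
    lowered = text.lower()
    return any(word in lowered for word in ENTITY_WORDS)


@dataclass
class ElementTypes:
    text: bool
    image: bool


def determine_element_types(response_text: str) -> ElementTypes:
    """Text elements are always present; image elements only with an entity."""
    return ElementTypes(text=True, image=has_corresponding_entity(response_text))
```

With this stand-in, a response mentioning "Great Wall" would yield both element types, while a response describing a reimbursement procedure would yield text only.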

[0081] In some embodiments, the text in the response information is determined by inputting search results obtained based on the query information into a predefined text processing model. The text processing model is used to understand user intent based on the query information, obtaining intent understanding results. The text processing model is also used to analyze and organize the text in the search results obtained based on the query information, generating text content that matches the intent understanding results, which is then used as the text in the response information.

[0082] A text processing model is a model trained using a large amount of text data. By training the text processing model with a large amount of text data, the model learns the basic rules of language, such as syntax, semantics, and contextual relationships. This enables the model to analyze and organize text data based on user intent, producing text content that meets the user's needs.

[0083] In some embodiments, after receiving a query, the server performs an information search based on the query to obtain at least one search result. All search results are then input into a predefined text processing model to determine the text in the response information corresponding to the query.

[0084] In some embodiments, after receiving a query, the server performs an information search based on the query to obtain at least one search result. The server then filters the search results based on the matching degree between the query and each search result, selecting those with a matching degree exceeding a preset threshold. The search results with a matching degree exceeding the preset threshold are input into a predefined text processing model to determine the text in the response information corresponding to the query.
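The filtering step in [0084] might look like the sketch below. The disclosure does not specify how the matching degree is computed, so a simple word-overlap (Jaccard) score is assumed here purely for illustration; `matching_degree` and `filter_results` are hypothetical names.

```python
def matching_degree(query: str, result: str) -> float:
    """Assumed matching degree: Jaccard overlap of lowercase word sets."""
    q, r = set(query.lower().split()), set(result.lower().split())
    union = q | r
    return len(q & r) / len(union) if union else 0.0


def filter_results(query: str, results: list[str], threshold: float) -> list[str]:
    """Keep only search results whose matching degree exceeds the threshold."""
    return [r for r in results if matching_degree(query, r) > threshold]
```

Only the results returned by `filter_results` would then be passed to the text processing model.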

[0085] In some embodiments, when the response information contains image elements, the image in the response information is obtained from the information source corresponding to the text in the response information. For example, when all search results are input into the set text processing model to determine the text in the response information, the information source corresponding to that text is the information source of all search results obtained by the server based on the query information. As another example, when only search results whose matching degree exceeds a preset threshold are input into the set text processing model, the information source corresponding to that text is the information source of those search results.

[0086] In some embodiments, the response information includes image elements and text elements. The information source corresponding to the text in the response information does not have an image. The image in the response information is obtained by searching based on the text in the response information.

[0087] The image and the text in the response information match, meaning they are related. When an image is obtained from the information source corresponding to the text in the response information, it is determined whether the image in the information source matches that text. This ensures the accuracy of the images in the response information, thereby meeting user needs and improving the user experience.

[0088] Specifically, first, the matching degree between the image in the information source and the text in the response information is determined. If the matching degree is greater than or equal to a set threshold, the image in the response information is obtained from the information source. If the matching degree is less than the set threshold, the image in the response information is obtained by searching based on the text in the response information. The matching degree between the image in the information source and the text in the response information can be determined as follows: keywords are extracted based on the text in the response information, and corresponding text feature vectors are determined based on the keywords. Corresponding image feature vectors are also determined based on the images in the information source. Then, the similarity between the text feature vectors and the corresponding image feature vectors for each image is calculated as the matching degree between the image in the information source and the text in the response information.
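A minimal sketch of this threshold decision is given below. It assumes cosine similarity as the similarity measure (the disclosure only says "similarity") and treats the feature vectors as already computed; `choose_image` and the `search_by_text` callback are hypothetical names. The same fallback also covers the case in [0086] where the information source has no image.

```python
import math
from typing import Callable, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def choose_image(
    text_vec: Sequence[float],
    source_image: Optional[str],
    source_image_vec: Optional[Sequence[float]],
    threshold: float,
    search_by_text: Callable[[], str],
) -> str:
    """Use the source image when it matches the text well enough, else search."""
    if source_image is not None and source_image_vec is not None:
        if cosine_similarity(text_vec, source_image_vec) >= threshold:
            return source_image
    # No source image, or matching degree below the threshold: fall back to search.
    return search_by_text()
```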

[0089] In some embodiments, when the image is obtained by the server based on the text in the response information, the server extracts keywords from the text in the response information and uses the keywords to search in a preset media resource library (e.g., a preset image library or online search) to obtain multiple candidate images. After the server obtains multiple candidate images based on the keywords, it determines the corresponding text feature vector based on the keywords and the corresponding image feature vector based on each image obtained from the search. Then, it calculates the similarity between the text feature vector and the corresponding image feature vector of each image. Based on the similarity, the image with the highest similarity value is selected as the image that matches the text in the response information.

[0090] In some embodiments, the feature vector of the text corresponding to the keyword is obtained by inputting the keyword into a set multimodal model, and the image feature vector is obtained by inputting the corresponding image into the multimodal model. The multimodal model can extract features of different modal information (e.g., images, text) and has the ability to capture the correlation between different modal information.
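Selecting the best candidate image, as described in [0089], could be sketched as follows. The multimodal model is not reproduced here: the example assumes that its output vectors are unit-length, so that a plain dot product serves as the similarity, and `pick_best_image` is a hypothetical helper name.

```python
from typing import Mapping, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both vectors are unit-length."""
    return sum(x * y for x, y in zip(a, b))


def pick_best_image(
    keyword_vec: Sequence[float],
    candidate_vecs: Mapping[str, Sequence[float]],
) -> Tuple[str, float]:
    """Return (image id, similarity) of the candidate most similar to the keyword."""
    scored = {image: dot(keyword_vec, vec) for image, vec in candidate_vecs.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]
```

In the actor example that follows, a group photo would be expected to score lower against a single actor's name than an individual portrait, so the portrait would be selected.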

[0091] For example, if the query is "What are the different kinds of Guangzhou specialty foods?", the server analyzes the text in the search results and obtains "white-cut chicken", which is used as text in the response information. An image is then retrieved from the information source corresponding to this text. If the matching degree between the image in the information source and "white-cut chicken" is greater than or equal to the set threshold, the image in the information source is used as the image in the response information. If the matching degree is less than the set threshold, a search is performed based on "white-cut chicken" to retrieve the corresponding image.

[0092] For example, if the query is "Who are the main actors in the movie ××?", the server analyzes the text in the search results and obtains "Actor Zhang ××", "Actor Li ××", and "Actor Wang ××" as text in the response. Images are then retrieved from the information sources corresponding to these texts. These images are group photos of all the actors in the movie, not individual photos of each actor. Since the matching degree between the images from these information sources and "Actor Zhang ××", "Actor Li ××", and "Actor Wang ××" is less than a set threshold, these images cannot be used as individual photos for each actor. Instead, separate searches are performed for each actor based on their name.

[0093] Step S130: Output the response information corresponding to the query information.

[0094] In some embodiments, the response information may contain one or more information entries. When the response information contains multiple information entries, and includes both image and text elements, each information entry includes both an image and text.

[0095] For example, if the query is "What are the different kinds of Guangzhou specialty foods?", the preset display layout is shown in Figure 2. Specifically, the output response contains 10 information items, each including an image and text. "White-cut chicken" corresponds to one image and related text, "rice noodle roll" corresponds to one image and related text, and "claypot rice" corresponds to one image and related text. All other information items include one image and related text, as shown in Figure 2, which will not be elaborated further here.

[0096] For example, if the query is "What is the process for medical insurance reimbursement in other places?", as shown in Figure 3, the output response information contains one information item, which includes only text content.

[0097] In some embodiments, step S130 specifically includes: outputting the content of the response information in a time-sharing manner until a termination identifier is detected, thus determining that the output of the response information is complete. Outputting the content of the response information in a time-sharing manner allows users to see the processing results and progress in real time, avoiding sudden feedback of the response information content after a long wait, making the user experience smoother and more comfortable.

[0098] Specifically, at a first time, a first response to the query is output on the display interface; at a second time, a second response to the query is output on the display interface. This output process continues until the response to the query is completed. The first and second times are points in time or time periods, with the second time being later than the first time. The first and second response information are different components of the response to the query.

[0099] For example, based on the query "What are the different kinds of Guangzhou specialty foods?", Figure 4 shows a schematic diagram of response information output in a time-sharing manner. Figure 4 shows the display interface at four different times. The response information displayed on the interface increases from one time to the next, corresponding to the time-sharing output method.

[0100] It should be noted that, in addition to time-sharing output, the response information can also be output as a whole. Whole output displays the response information as a single unit, allowing it to be presented all at once.
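The time-sharing output of [0097] and [0098] resembles incremental streaming. The sketch below assumes the response arrives as a sequence of text chunks and that the termination identifier is a sentinel string; `END_MARKER` and `stream_response` are hypothetical names, not taken from the disclosure.

```python
from typing import Iterable, Iterator

END_MARKER = "[DONE]"  # hypothetical termination identifier


def stream_response(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the accumulated display content after each chunk arrives.

    Output stops as soon as the termination identifier is detected; anything
    after it is ignored.
    """
    shown = ""
    for chunk in chunks:
        if chunk == END_MARKER:
            return
        shown += chunk
        yield shown
```

Each yielded value corresponds to what the display interface would show at one point in time, so later values contain everything shown earlier plus the newly received part.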

[0101] In some embodiments, the response information includes multiple information items, and the output order of these information items is determined based on the mention rate of each information item. The mention rate of an information item is determined by the number of times it appears in the search results obtained by matching the query information. The higher the mention rate of an information item, the earlier it appears in the output order on the display interface. Prioritizing the display of information items with high mention rates allows users to find the information they need more quickly, improving the user experience.

[0102] In some embodiments, the mention rate of an information entry is determined based on the number of times the information entry appears in all search results obtained by matching the query information.

[0103] In some embodiments, the mention rate of an information item is determined based on the number of times the information item appears in a subset of the search results obtained from all search results matching the query. The subset of search results consists of results where the match between the query and the search results exceeds a preset threshold.

[0104] For example, based on the query "What are the different kinds of Guangzhou specialty foods?", the response information includes 10 information items. The higher the mention rate of an information item, the earlier the corresponding information item is output. The output order of the 10 information items is as follows: "white-cut chicken", "rice noodle roll", "claypot rice", "Cantonese roast goose", "Chaoshan beef", "char siu", "shrimp dumplings", "wonton noodles", "sweet soup", and "beef offal".
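Ordering by mention rate, as in [0101] to [0104], can be illustrated with the sketch below. It assumes the mention rate is the raw number of case-insensitive occurrences of an item across the search results; the disclosure does not fix the exact counting rule, and `mention_rates` and `order_by_mention_rate` are hypothetical names.

```python
def mention_rates(items: list[str], search_results: list[str]) -> dict[str, int]:
    """Count how many times each item appears across all search results."""
    return {
        item: sum(result.lower().count(item.lower()) for result in search_results)
        for item in items
    }


def order_by_mention_rate(items: list[str], search_results: list[str]) -> list[str]:
    """Sort items so that higher mention rates come first.

    Python's sort is stable, so items with equal rates keep their input order.
    """
    rates = mention_rates(items, search_results)
    return sorted(items, key=lambda item: rates[item], reverse=True)
```

To follow [0103] instead of [0102], the same functions could be called with only the subset of search results whose matching degree exceeds the preset threshold.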

[0105] In some embodiments, when the response information includes multiple information items, the mention rate of each information item is displayed on the interface. Displaying the mention rate of each information item lets users see how often each item is mentioned in the search results, which helps them better understand the source of the information and improves the user experience.

[0106] Referring to Figure 5, based on the query "What are the different types of Guangzhou specialty foods?", the response includes 10 information items. The output order of these 10 information items is as follows: "White Cut Chicken", "Rice Noodle Roll", "Claypot Rice", "Cantonese Roast Goose", "Chaoshan Beef", "Char Siu", "Shrimp Dumplings", "Wonton Noodles", "Sweet Soup", and "Beef Offal". The mention rate of each of these 10 information items is displayed on the interface.

[0107] In some embodiments, step S130 specifically includes: displaying a first set of response information and interactive controls corresponding to the query information; and, in response to the interactive controls being triggered, displaying a second set of response information corresponding to the query information following the first set of response information. This method of displaying response information step by step by triggering interactive controls allows users to obtain information according to their own needs, thereby improving the user experience. In addition, when the user does not need the second set of response information, the system does not need to load and display this information, thereby saving bandwidth and computing resources.

[0108] The number of information entries in the first group of response information can be set. The information entries in the second group of response information are the response information corresponding to the query information, excluding the response information from the first group of response information.

[0109] For example, based on the query "What are some of the special delicacies of Guangzhou?", the response information includes 10 items. The first set of response information includes "white-cut chicken", "rice noodle roll", "claypot rice", "Cantonese roast goose", and "Chaoshan beef". The second set of response information includes "char siu", "shrimp dumplings", "wonton noodles", "sweet soup", and "beef offal". Referring to Figure 6, the first set of response information corresponding to the query information, namely "white-cut chicken", "rice noodle roll", "claypot rice", "Cantonese roast goose", and "Chaoshan beef", is displayed on the display interface. At the same time, the interactive control labeled "expand more" is also displayed on the display interface. When the user triggers this interactive control, as shown in Figure 7, "char siu", "shrimp dumplings", "wonton noodles", "sweet soup", and "beef offal" are displayed after "white-cut chicken", "rice noodle roll", "claypot rice", "Cantonese roast goose", and "Chaoshan beef", and the text of the interactive control changes to "collapse". When the user triggers the interactive control again, "char siu", "shrimp dumplings", "wonton noodles", "sweet soup", and "beef offal" are no longer displayed after "white-cut chicken", "rice noodle roll", "claypot rice", "Cantonese roast goose", and "Chaoshan beef", and the text of the interactive control changes back to "expand more".
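The expand and collapse behavior in this example can be modeled as a small piece of display state. The sketch below is illustrative only; the class name `ExpandableList` and its methods are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class ExpandableList:
    items: list            # all information items, already in output order
    first_group_size: int  # configurable size of the first group
    expanded: bool = False

    def toggle(self):
        """Simulate the user triggering the interactive control."""
        self.expanded = not self.expanded

    def visible_items(self):
        """Items currently shown: the first group only, or both groups."""
        if self.expanded:
            return list(self.items)
        return self.items[: self.first_group_size]

    def control_label(self):
        """Text shown on the interactive control in the current state."""
        return "collapse" if self.expanded else "expand more"
```

In a real client the second group would typically be fetched only when the control is first triggered, which is where the bandwidth saving mentioned in paragraph [0107] would come from; this sketch keeps all items in memory for simplicity.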

[0110] In some embodiments, the display interface also shows selection controls for information element types of response information. These selection controls include selection controls for text and selection controls for images. The method further includes: in response to a user triggering the selection of an information element type for the response information, displaying response information corresponding to the selected information element type. This can meet the different needs of users based on the information element type of the response information, improving the user experience.

[0111] As shown in Figure 8, the display interface also shows selection controls for the information element types of the response information, namely "text" and "image". Users can click the selection control corresponding to "text" or "image" to select the information element type of the response information to be displayed.

[0112] FIG9 shows a flowchart of an information processing method according to an embodiment of the present disclosure. This method can be applied to a server. As shown in FIG9, the method includes steps S910 to S930.

[0113] Step S910: Receive user input query information sent by the client.

[0114] The query information can be entered through the text input box on the application interface, or through the voice input control on the application interface. When the query is entered through the voice input control, the received voice signal is converted into text, which is used as the query information.

[0115] Step S920: Perform information search based on the query information, and determine the response information corresponding to the query information based on the search results; wherein, the information elements contained in the response information are determined based on the text in the response information, and if there is a corresponding entity for the text in the response information, the response information contains image elements and text elements.

[0116] In some embodiments, if the text in the response information does not have a corresponding entity, the response information contains text elements but does not contain image elements.

[0117] In some embodiments, the server inputs text from the response information into a predefined recognition model to determine whether the text in the response information corresponds to a specific entity. The predefined recognition model is a model trained on a large number of training samples to identify whether the input text corresponds to a specific entity. The training samples include text containing entity words and text not containing entity words. Entity words are words with corresponding physical objects. Here, "physical objects" refers to objects that objectively exist in the real world and have a physical form.
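The decision between an image-plus-text item and a text-only item can be sketched as below. The trained recognition model is replaced here by an arbitrary callable `has_entity`, and image lookup by a callable `find_image`; both names, and the `ResponseItem` type, are hypothetical stand-ins rather than parts of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ResponseItem:
    text: str
    image_url: Optional[str] = None  # None means a text-only item


def build_item(
    text: str,
    has_entity: Callable[[str], bool],
    find_image: Callable[[str], Optional[str]],
) -> ResponseItem:
    """Attach an image only when the text is judged to mention an entity."""
    if has_entity(text):
        # Entity present: the item carries an image element and a text element.
        return ResponseItem(text, find_image(text))
    # No entity: the item carries a text element only.
    return ResponseItem(text)
```

Keeping the recognizer behind a callable means the same control flow applies whether the check is a trained classifier, as the paragraph above describes, or a simple lookup used for testing.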

[0118] In some embodiments, the server inputs the search results obtained based on the query information into a set text processing model to determine the text in the response information.

[0119] The text processing model is used to understand user intent based on query information, obtaining intent understanding results. The text processing model is also used to analyze and organize the text in the query results obtained based on query information, generating text content that matches the intent understanding results, which is then used as text in the response information.

[0120] A text processing model is a model trained using a large amount of text data. By training the text processing model with a large amount of text data, the model learns the basic rules of language, such as syntax, semantics, and contextual relationships. This enables the model to analyze and organize text data based on user intent, producing text content that meets the user's needs.

[0121] In some embodiments, after receiving a query, the server performs an information search based on the query to obtain at least one search result. All search results are then input into a predefined text processing model to determine the text in the response information corresponding to the query.

[0122] In some embodiments, after receiving a query, the server performs an information search based on the query to obtain at least one search result. The server then filters the search results based on the matching degree between the query and each search result, selecting those with a matching degree exceeding a preset threshold. The search results with a matching degree exceeding the preset threshold are input into a predefined text processing model to determine the text in the response information corresponding to the query.
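The threshold filtering in paragraph [0122] can be illustrated with a short sketch. The disclosure does not say how the matching degree is computed, so the word-overlap (Jaccard) score used here is purely an assumed stand-in, as are the function names.

```python
def matching_degree(query: str, result: str) -> float:
    """Hypothetical stand-in: Jaccard overlap of lowercase word sets."""
    q = set(query.lower().split())
    r = set(result.lower().split())
    if not q or not r:
        return 0.0
    return len(q & r) / len(q | r)


def filter_results(query, results, threshold):
    """Keep only results whose matching degree exceeds the threshold."""
    return [r for r in results if matching_degree(query, r) > threshold]
```

Only the list returned by `filter_results` would then be passed to the text processing model; the variant in paragraph [0121] corresponds to skipping the filter and passing every result.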

[0123] In some embodiments, when the response information contains an image, the server obtains the image from the information source corresponding to the text in the response information.

[0124] The images and text in the response message are matched, meaning they are related. When an image is obtained from the information source corresponding to the text in the response message, the server determines whether the image in the information source matches the text in the response message to ensure the accuracy of the image in the response message, thereby meeting user needs and improving user experience.

[0125] Specifically, the server first determines the matching degree between each image in the information source and the text in the response information. The matching degree can be determined as follows: keywords are extracted from the text in the response information, and a corresponding text feature vector is determined based on the keywords; a corresponding image feature vector is determined for each image in the information source; then, the similarity between the text feature vector and the image feature vector of each image is calculated as the matching degree between that image and the text in the response information. If the matching degree between the image in the information source and the text in the response information is greater than or equal to a set threshold, the server obtains the image in the response information from the information source. If the matching degree is less than the set threshold, the server obtains the image in the response information by searching based on the text in the response information.
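The threshold decision can be sketched as follows. The disclosure only speaks of "similarity" between feature vectors; cosine similarity is assumed here as one common choice, and `choose_image` and `search_fallback` are hypothetical names introduced for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def choose_image(text_vec, source_image, source_image_vec, threshold, search_fallback):
    """Use the source image if it matches well enough, else search by text.

    search_fallback is a zero-argument callable that performs the
    text-based image search and returns the image it found.
    """
    if source_image is not None:
        if cosine_similarity(text_vec, source_image_vec) >= threshold:
            return source_image
    return search_fallback()
```

Passing `source_image=None` also covers the case where the information source has no image at all, in which case the search path is taken directly.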

[0126] In some embodiments, if the response information contains an image and the information source corresponding to the text in the response information does not have an image, the server obtains the image in the response information by searching based on the text in the response information.

[0127] In some embodiments, when the image is obtained by the server based on the text in the response information, the server extracts keywords based on the text in the response information and obtains multiple images based on the keywords. After the server obtains multiple images based on the keywords, it determines the corresponding text feature vector based on the keywords and the corresponding image feature vector based on each image obtained from the search. Then, it calculates the similarity between the text feature vector and the corresponding image feature vector of each image. Based on the similarity, the image with the highest similarity value is selected as the image that matches the text in the response information.

[0128] In some embodiments, the feature vector of the text corresponding to the keyword is obtained by inputting the keyword into a set multimodal model, and the image feature vector is obtained by inputting the corresponding image into the multimodal model. The multimodal model can extract features of different modal information (e.g., images, text) and has the ability to capture the correlation between different modal information.

[0129] In some embodiments, where the response information includes multiple information entries, the server determines the mention rate for each information entry. The mention rate of an information entry is determined based on the number of times the information entry appears in the search results obtained by matching the query information.

[0130] Step S930: Return the response information to the client for display.

[0131] The information processing method provided in this disclosure enables diversified output of response information and improves user experience.

[0132] This disclosure also provides an information processing apparatus. FIG10 shows a structural block diagram of an information processing apparatus according to some embodiments. As shown in FIG10, the information processing apparatus 1000 may include a sending module 1010, a receiving module 1020, and an output module 1030.

[0133] The sending module 1010 is used to send query information to the server in response to the query information being input by the user.

[0134] The receiving module 1020 is used to receive the response information returned by the server in response to the query information; wherein, the information elements contained in the response information are determined based on the text in the response information, and when there is a corresponding entity for the text in the response information, the response information contains image elements and text elements.

[0135] The output module 1030 is used to output response information corresponding to the query information.

[0136] In some embodiments, if the text in the response information does not have a corresponding entity, the response information contains text elements but does not contain image elements.

[0137] In some embodiments, the information elements contained in the response information are determined based on the text in the response information, including: the type of information elements contained in the response information is determined by inputting the text in the response information into a set recognition model; wherein, when the set recognition model determines that the text in the response information has a corresponding entity, the response information contains image elements and text elements.

[0138] In some embodiments, if the established recognition model determines that the text in the response information does not correspond to a specific entity, the response information contains text elements but does not contain image elements.

[0139] In some embodiments, the text in the response information is determined by inputting the search results obtained based on the query information into a set text processing model.

[0140] In some embodiments, the response information includes image elements and text elements, and the image in the response information is obtained from the information source corresponding to the text in the response information.

[0141] In some embodiments, if the matching degree between the image in the information source and the text in the response information is greater than or equal to a set threshold, the image in the response information is obtained from the information source; if the matching degree is less than the set threshold, the image in the response information is obtained by searching based on the text in the response information.

[0142] In some embodiments, the response information includes image elements and text elements. The information source corresponding to the text in the response information does not have an image. The image in the response information is obtained by searching based on the text in the response information.

[0143] In some embodiments, the response information includes multiple information entries, and the output order of the multiple information entries is determined based on the mention rate of each information entry, wherein the mention rate of an information entry is determined based on the number of times the information entry appears in the search results obtained by matching the query information.

[0144] In some embodiments, the output module 1030 is used to display a first set of response information and interactive controls corresponding to the query information; in response to the interactive controls being triggered, a second set of response information corresponding to the query information is displayed after the first set of response information.

[0145] In some embodiments, the output module 1030 is used to output the content of the response information in a time-sharing manner until a termination identifier is detected, thus determining that the output of the response information is complete.

[0146] In some embodiments, the output module 1030 is used to output a first part of the response information at a first time point.

[0147] The output module 1030 is used to output the second part of the response information at a second time point.

[0148] The first time point is earlier than the second time point, and the output operation at the second time point is executed after the first part of the content has been output.
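Time-sharing output with a termination identifier can be sketched as a loop that appends each arriving part to what is already displayed. The `[DONE]` marker and the `render` callback are hypothetical; the disclosure does not specify what the termination identifier looks like or how parts are delivered.

```python
END_OF_RESPONSE = "[DONE]"  # hypothetical termination identifier


def stream_response(chunks, render):
    """Display parts in arrival order until the termination identifier.

    chunks is any iterable of text parts; render is called with the
    accumulated text each time a new part arrives. Returns the final
    text and whether the termination identifier was seen.
    """
    shown = ""
    for chunk in chunks:
        if chunk == END_OF_RESPONSE:
            return shown, True   # output of the response is complete
        shown += chunk           # later parts are appended after earlier ones
        render(shown)
    return shown, False          # stream ended without a terminator
```

Outputting the response as a whole, by contrast, would amount to calling `render` once with the complete text.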

[0149] In some embodiments, if the response information includes image elements and text elements, the output module 1030 is used to control the display of the image elements and text elements according to a preset display layout for each information item.

[0150] In some embodiments, one or more keywords are extracted from the text in the response information; the keywords are used to search a preset media resource library to obtain multiple candidate images; the text feature vector of the keywords and the image feature vector of each candidate image are determined, and the similarity between the text feature vector and each image feature vector is calculated; based on these similarities, the image element in the response information that matches the text element is determined.

[0151] This disclosure also provides an information processing apparatus. FIG11 shows a structural block diagram of an information processing apparatus according to some embodiments. As shown in FIG11, the information processing apparatus 1100 may include a receiving module 1110, a response information determining module 1120, and a response information sending module 1130.

[0152] The receiving module 1110 is used to receive user input query information sent by the client.

[0153] The response information determination module 1120 is used to perform information search based on the query information and determine the response information corresponding to the query information based on the search results; wherein, the information elements contained in the response information are determined based on the text in the response information, and when there is a corresponding entity for the text in the response information, the response information contains image elements and text elements.

[0154] The response information sending module 1130 is used to return the response information to the client for display.

[0155] In some embodiments, the matching degree between the query information and each search result is calculated, and search results with a matching degree higher than a preset threshold are filtered out; the search results with a matching degree higher than the preset threshold are input into a set text processing model to determine the text in the response information corresponding to the query information.

[0156] This disclosure also provides an electronic device for implementing any of the above method embodiments. FIG12 shows a structural block diagram of an electronic device 1200 according to some embodiments. The electronic device 1200 may be a PC, workstation, laptop computer, server, etc., and is not limited thereto.

[0157] As shown in FIG12, the electronic device 1200 includes a processor 1210 and a memory 1220 for storing executable instructions of the processor 1210. The processor 1210 is configured to implement an information processing method according to any embodiment of the present disclosure when executing the instructions stored in the memory 1220.

[0158] The processor 1210 is used to execute computer instructions, which can be written using instruction sets of architectures such as x86, Arm, RISC, MIPS, and SSE. The memory 1220 includes, for example, ROM (Read-Only Memory), RAM (Random Access Memory), and non-volatile memory such as hard disks, etc., and is not limited thereto.

[0159] Figure 13 shows a structural block diagram of an electronic device according to some other embodiments. As shown in Figure 13, in addition to the processor 1210 and the memory 1220, the electronic device 1200 may also include a display device 1230, an interface device 1240, a communication device 1250, an input device 1260, etc.

[0160] Interface device 1240 includes, for example, a USB interface, a bus interface, a network interface, etc. Communication device 1250 is capable of wired or wireless communication, and may include at least one short-range communication module, such as any module for short-range wireless communication based on short-range wireless communication protocols such as Hilink, WiFi (IEEE 802.11), Mesh, Bluetooth, ZigBee, Thread, Z-Wave, NFC, UWB, LiFi, etc. Communication device 1250 may also include a long-range communication module, such as any module for WLAN, GPRS, 2G / 3G / 4G / 5G long-range communication. Display device 1230 can display an application interface. Input device 1260 may include a touchscreen, keyboard, mouse, microphone, camera, etc., without limitation.

[0161] This disclosure also provides a non-volatile computer-readable storage medium having computer program instructions stored thereon. When executed by a processor, the computer program instructions implement the information processing method provided in any of the above embodiments.

[0162] This disclosure also provides a computer program product that, when executed by a processor, implements the information processing method provided in any of the above embodiments.

[0163] The various embodiments in this disclosure are described in a progressive manner. For similar or identical parts, the embodiments may be referred to one another. Each embodiment focuses on its differences from the other embodiments. For relevant parts of the apparatus embodiments, refer to the descriptions of the method embodiments.

[0164] The foregoing has described specific embodiments of this disclosure. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired results. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

[0165] Embodiments of this disclosure may be systems, methods, and / or computer program products. A computer program product may include a computer-readable storage medium having computer instructions stored thereon for causing a processor to implement various aspects of the embodiments of this disclosure.

[0166] A computer-readable storage medium can be a tangible device capable of holding and storing computer instructions for use by an instruction execution device. A computer-readable storage medium can be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having computer instructions stored thereon, and any suitable combination thereof. A computer-readable storage medium as used herein is not to be construed as a transitory signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber optic cables), or electrical signals transmitted through wires.

[0167] The computer instructions described herein can be downloaded from a computer-readable storage medium to respective computing / processing devices, or to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and / or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and / or edge servers. A network adapter card or network interface in each computing / processing device receives computer instructions from the network and forwards those instructions for storage in a computer-readable storage medium within the respective computing / processing device.

[0168] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of computer instructions, which includes one or more executable computer instructions for implementing a specified logical function. In some alternative implementations, the functions marked in the blocks may occur in a different order than those marked in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions. It will be known to those skilled in the art that implementation in hardware, implementation in software, and implementation in a combination of software and hardware are equivalent.

[0169] The various embodiments of this disclosure have been described above. These descriptions are exemplary and not exhaustive, and are not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope of the described embodiments. The terminology used herein is chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies found in the marketplace, or to enable others skilled in the art to understand the embodiments disclosed herein.

Claims

1. An information processing method, comprising: in response to query information input by a user, sending the query information to a server; receiving response information returned by the server for the query information, wherein the information elements contained in the response information are determined based on the text in the response information, and if the text in the response information has a corresponding entity, the response information contains image elements and text elements; and outputting the response information corresponding to the query information.

2. The method according to claim 1, wherein if the text in the response information does not have a corresponding entity, the response information contains text elements but does not contain image elements.

3. The method according to claim 1, wherein the information elements contained in the response information being determined based on the text in the response information comprises: the types of information elements contained in the response information are determined by inputting the text in the response information into a set recognition model; wherein, when the set recognition model determines that the text in the response information has a corresponding entity, the response information contains image elements and text elements.

4. The method according to claim 3, wherein if the set recognition model determines that the text in the response information does not have a corresponding entity, the response information contains text elements but does not contain image elements.

5. The method according to any one of claims 1-4, wherein the text in the response information is determined by inputting the search results obtained based on the query information into a set text processing model.

6. The method according to claim 1 or 3, wherein the response information includes image elements and text elements, and the image in the response information is obtained from the information source corresponding to the text in the response information.

7. The method according to claim 6, wherein if the matching degree between the image in the information source and the text in the response information is greater than or equal to a set threshold, the image in the response information is obtained from the information source; and if the matching degree is less than the set threshold, the image in the response information is obtained by searching based on the text in the response information.

8. The method according to claim 1 or 3, wherein the response information includes image elements and text elements, the information source corresponding to the text in the response information does not have an image, and the image in the response information is obtained by searching based on the text in the response information.

9. The method according to any one of claims 1-8, wherein the response information includes multiple information entries, and the output order of the multiple information entries is determined based on the mention rate of each information entry, wherein the mention rate of an information entry is determined based on the number of times the information entry appears in the search results obtained by matching the query information.

10. The method according to any one of claims 1-9, wherein outputting the response information corresponding to the query information comprises: displaying a first set of response information and an interactive control corresponding to the query information; and in response to the interactive control being triggered, displaying a second set of response information corresponding to the query information after the first set of response information.

11. The method according to any one of claims 1-10, wherein outputting the response information corresponding to the query information comprises: outputting the response information in a time-sharing manner until a termination identifier is detected, at which point the output of the response information is determined to be complete.

12. The method according to claim 11, wherein the response information includes a first part and a second part, and outputting the response information in a time-sharing manner comprises: outputting the first part of the response information at a first time point; and outputting the second part of the response information at a second time point; wherein the first time point is earlier than the second time point, and the output operation at the second time point is executed after the first part of the content has been output.

13. The method according to any one of claims 1-12, wherein before outputting the response information corresponding to the query information, the method further comprises: if the response information contains image elements and text elements, controlling, for each information item, the display of the image elements and text elements according to a preset display layout.

14. The method according to claim 7 or 8, wherein obtaining the image in the response information by searching based on the text in the response information comprises: extracting one or more keywords from the text in the response information; searching a preset media resource library using the keywords to obtain multiple candidate images; determining the text feature vector of the keywords and the image feature vector of each candidate image, and calculating the similarity between the text feature vector and each image feature vector; and determining, based on these similarities, the image element in the response information that matches the text element.

15. An information processing method, comprising: receiving user-input query information sent by a client; performing an information search based on the query information, and determining response information corresponding to the query information based on the search results, wherein the information elements contained in the response information are determined based on the text in the response information, and, if the text in the response information has a corresponding entity, the response information contains image elements and text elements; and returning the response information to the client for display.

16. The method according to claim 15, wherein performing the information search based on the query information comprises: calculating a matching degree between the query information and each search result, and selecting the search results whose matching degree is higher than a preset threshold; and inputting the selected search results into a preset text processing model to determine the text in the response information corresponding to the query information.
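As a non-limiting illustration of the filtering step in claim 16, search results can be scored against the query and kept only when the score exceeds a threshold. The claim does not define how the matching degree is calculated; the token-overlap (Jaccard) score, the threshold value of 0.2, and the function names below are assumptions of this sketch. The retained results would then be passed to the text processing model, which is not modelled here.

```python
def matching_degree(query: str, result: str) -> float:
    """Assumed matching degree: Jaccard overlap of lowercase word sets."""
    q = set(query.lower().split())
    r = set(result.lower().split())
    if not q or not r:
        return 0.0
    return len(q & r) / len(q | r)


def filter_results(query: str, results: list, threshold: float = 0.2) -> list:
    """Keep only results whose matching degree is higher than the threshold."""
    return [r for r in results if matching_degree(query, r) > threshold]


query = "eiffel tower opening hours"
results = ["Eiffel Tower opening hours and tickets", "Best pasta recipes"]
kept = filter_results(query, results)  # only the first result shares words with the query
```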

17. An information processing apparatus, comprising: a sending module configured to send query information to a server in response to the query information being input by a user; a receiving module configured to receive response information returned by the server for the query information, wherein the information elements contained in the response information are determined based on the text in the response information, and, if the text in the response information has a corresponding entity, the response information contains image elements and text elements; and an output module configured to output the response information corresponding to the query information.

18. An electronic device, comprising a memory and a processor, the memory storing a computer program for controlling the processor to operate so as to perform the method according to any one of claims 1 to 16.

19. A non-volatile computer-readable storage medium having computer program instructions stored thereon, wherein, when the computer program instructions are executed by a processor, the method according to any one of claims 1 to 16 is implemented.

20. A computer program product, wherein, when instructions in the computer program product are executed by a processor, the method according to any one of claims 1 to 16 is implemented.