Resource retrieval method and apparatus, electronic device, storage medium, and product
By obtaining the retrieval numeric identifiers of the resource search text and performing resource retrieval based on semantic information, the disclosure solves the problem that traditional keyword matching cannot recall semantically relevant resources. This achieves accurate resource retrieval and an efficient retrieval response and improves the user experience.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- HANGZHOU NETEASE CLOUD MUSIC TECH CO LTD
- Filing Date
- 2026-03-27
- Publication Date
- 2026-06-30
Figure CN122309798A_ABST
Abstract
Description
Technical Field
[0001] This disclosure relates to the field of information retrieval technology, and in particular to a resource retrieval method, apparatus, electronic device, storage medium, and product.
Background Technology
[0002] With the booming development of the digital music industry, the music libraries of online music platforms have grown exponentially. While providing users with a wealth of choices, this massive amount of music content also places higher demands on music retrieval technology. Users' search needs have gradually shifted from traditional precise keyword searches (such as song titles and artists) to complex scenarios such as general semantic searches and fuzzy memory searches, for example, retrieving target songs through descriptions of mood, emotional expressions, life scenarios, or fragmented memory of lyrics.
[0003] In the related art, resource retrieval typically relies on keyword matching: the resource search text entered by the user is matched against basic resource metadata (song titles, artists, etc.). However, this approach over-relies on superficial lexical overlap between the search text and the metadata. Resources that are semantically highly relevant but whose keywords do not match exactly cannot be recalled effectively, which wastes computational resources and limits the retrieval scope. Furthermore, when the search text describes resource content ambiguously, keyword matching has extremely low tolerance for memory errors such as synonym substitution and word-order reversal, resulting in poor retrieval recall.
Summary of the Invention
[0004] This invention provides a resource retrieval method, apparatus, electronic device, storage medium, and product, so that media resources meeting a resource search need can be retrieved accurately based on retrieval numeric identifiers that characterize the semantic information of the resources.
[0005] According to one aspect of the present invention, a resource retrieval method is provided, the method comprising: in response to a resource search request, obtaining the resource search text and determining at least one retrieval numeric identifier corresponding to the resource search text, wherein the retrieval numeric identifier is used to characterize the resource semantic information of the resource search text; retrieving, based on the at least one retrieval numeric identifier, the resource identifiers of at least one candidate media resource from resource index information, wherein the resource index information is determined based on the retrieval numeric identifier and the resource identifier of the media resource corresponding to that retrieval numeric identifier; determining the semantic relevance based on the resource identifiers of the candidate media resources; and determining, based on the semantic relevance, a target media resource corresponding to the resource search text from the at least one candidate media resource, and returning the target media resource as the search result.
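To make the order of the claimed steps concrete, the overall flow can be sketched in Python. This is illustrative only: `encode` and `relevance` are hypothetical callables standing in for the text encoding model and the relevance determination, whose internals are not fixed at this point, and the `top_k`/`min_score` selection rule is an assumption rather than the claimed method.

```python
from typing import Callable, Dict, List, Set


def retrieve(
    search_text: str,
    encode: Callable[[str], List[int]],       # hypothetical: text -> retrieval numeric identifiers
    index: Dict[int, List[str]],              # retrieval numeric identifier -> resource identifiers
    relevance: Callable[[str, str], float],   # hypothetical: (search text, resource id) -> score
    top_k: int = 10,
    min_score: float = 0.0,
) -> List[str]:
    # Step 1: map the search text to its retrieval numeric identifiers.
    identifiers = encode(search_text)
    # Step 2: recall candidates from the index as the union over all identifiers.
    candidates: Set[str] = set()
    for identifier in identifiers:
        candidates.update(index.get(identifier, []))
    # Step 3: score each candidate's semantic relevance to the search text.
    scored = [(relevance(search_text, res), res) for res in candidates]
    # Step 4: drop weakly related candidates and rank the rest, highest score first
    # (ties broken by resource id so the output is deterministic).
    kept = [(score, res) for score, res in scored if score >= min_score]
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [res for _, res in kept[:top_k]]
```

Injecting the encoder and scorer as callables keeps the skeleton independent of any particular model, which matches the text's statement that the encoding unit may be a model, an algorithm module, a language model, or an agent.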
[0006] According to another aspect of the present invention, a resource retrieval apparatus is provided, the apparatus comprising: a search text acquisition module, configured to acquire resource search text in response to a resource search request and determine at least one retrieval numeric identifier corresponding to the resource search text, wherein the retrieval numeric identifier is used to characterize the resource semantic information of the resource search text; a resource retrieval module, configured to retrieve the resource identifiers of at least one candidate media resource from resource index information based on the at least one retrieval numeric identifier, wherein the resource index information is constructed based on the retrieval numeric identifier and the resource identifier of the candidate media resource corresponding to that retrieval numeric identifier; a relevance determination module, configured to determine the semantic relevance based on the resource identifiers of the candidate media resources; and a target resource determination module, configured to determine, based on the semantic relevance, the target media resource corresponding to the resource search text from the at least one candidate media resource, and to return the target media resource as the search result.
[0007] According to another aspect of the present invention, an electronic device is provided, the electronic device comprising: one or more processors; and a storage device configured to store one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the resource retrieval method described in any embodiment of this disclosure.
[0008] According to another aspect of the present invention, a computer-readable storage medium is provided, which stores computer instructions that, when executed, cause a processor to implement any resource retrieval method of the present invention.
[0009] According to another aspect of the present disclosure, a computer program product is provided, which, when executed by a processor, implements any of the resource retrieval methods described in the embodiments of the present disclosure.
[0010] In the technical solution of this disclosure, the resource search text is acquired in response to a resource search request, and at least one retrieval numeric identifier corresponding to the resource search text is determined. Because the retrieval numeric identifier characterizes the resource semantic information of the search text, the unstructured resource search text is transformed into an indexable, computable identifier. This gives a structured representation of the deep semantic information of the search text, overcomes the limitations of literal keyword matching, and lays the data foundation for subsequent precise retrieval along semantic dimensions. Next, the resource identifiers of at least one candidate media resource are retrieved from the resource index information based on the at least one retrieval numeric identifier, so that resources are matched and recalled directly along semantic dimensions, avoiding the semantic blind spots of traditional keyword indexes; at the same time, the lightweight integer identifiers and the inverted index structure significantly improve retrieval response efficiency. The semantic relevance is then determined based on the resource identifiers of the candidate media resources, which screens the candidates at a finer semantic granularity, eliminates resources only weakly related to the search intent, and provides an accurate quantitative basis for ranking the results. Finally, the target media resource corresponding to the resource search text is determined from the candidate media resources based on the semantic relevance and returned as the search result, so that the results closely match the user's search intent, irrelevant resources are kept out of the displayed results, and both result quality and the user's search experience are improved.
The technical solution of this disclosure addresses the low accuracy and poor recall that result from relying on keyword matching for resource retrieval in the related art. Media resources that meet a resource search need are retrieved accurately based on retrieval numeric identifiers representing the semantic information of resources, overcoming the semantic blind spots of traditional keyword retrieval. Search results match the user's search intent accurately, and the lightweight numeric identifiers and inverted index structure improve retrieval response efficiency, improving both the accuracy of media resource retrieval and the user experience.
[0011] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of the present invention, nor is it intended to limit the scope of the invention. Other features of the invention will become readily apparent from the following description.
Attached Figure Description
[0012] To more clearly illustrate the technical solutions in the embodiments of this disclosure, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0013] Figure 1 is a flowchart of a resource retrieval method provided in an embodiment of this disclosure; Figure 2 is a flowchart of another resource retrieval method provided in an embodiment of this disclosure; Figure 3 is a schematic structural diagram of a resource retrieval apparatus provided in an embodiment of this disclosure; Figure 4 is a schematic structural diagram of an electronic device provided in an embodiment of this disclosure.
Detailed Implementation
[0014] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.
[0015] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0016] It is understood that before using the technical solutions disclosed in the various embodiments of this disclosure, users should be informed of the types, scope of use, and usage scenarios of the personal information involved in this disclosure in an appropriate manner in accordance with relevant laws and regulations, and user authorization should be obtained.
[0017] For example, upon receiving a user's active request, a prompt message is sent to the user to explicitly inform them that the requested operation will require the acquisition and use of the user's personal information. This allows the user to independently choose whether to provide personal information to the software or hardware, such as the electronic device, application, server, or storage medium performing the operations of this disclosed technical solution, based on the prompt message.
[0018] As an optional but non-limiting implementation, in response to a user's active request, sending a prompt message to the user can be done via a pop-up window, where the prompt message can be presented in text format. Furthermore, the pop-up window can also include a selection control allowing the user to choose "agree" or "disagree" to provide personal information to the electronic device.
[0019] It is understood that the above notification and user authorization process are merely illustrative and do not constitute a limitation on the implementation of this disclosure. Other methods that comply with relevant laws and regulations may also be applied to the implementation of this disclosure.
[0020] It is understood that the data involved in this technical solution (including but not limited to the data itself, the acquisition or use of the data) shall comply with the requirements of relevant laws, regulations and related provisions.
[0021] Figure 1 is a flowchart of a resource retrieval method provided in an embodiment of this disclosure. This embodiment is applicable to retrieving candidate media resources from a resource database based on received resource search text. The method can be executed by a resource retrieval apparatus, which can be implemented in hardware and/or software and configured in an electronic device such as a computer or server. As shown in Figure 1, the method of this embodiment includes:
S110. In response to a resource search request, obtain the resource search text and determine at least one retrieval numeric identifier corresponding to the resource search text.
[0022] Here, a resource search request can be understood as an instruction initiated by an end user to the resource retrieval system to obtain target media resources. In a resource retrieval scenario, the request can be a service call triggered when the user enters query text on the media resource platform, and it serves as the input signal that starts the entire retrieval process. Optionally, a resource search request can search across all media resources stored on the platform. Media resource types can include at least one of video, audio, audio-visual content, text, and images. Resource search text can be understood as the text entered by the user to obtain target media resources; it is the direct carrier of the user's retrieval intent. Optionally, the resource search text includes at least one of the following: resource keywords; semantic descriptions (such as mood, emotion, or scene); and fuzzy text fragments (such as lyric fragments or melody-related descriptions). For example, in a music retrieval scenario, the resource search text can be music search text used to find the desired target music, containing at least one of song information, a description of the song's mood, and lyric fragments, such as "a song with hope amidst despair" or "a song suitable for doing homework". A retrieval numeric identifier can be understood as a discrete integer identifier generated by semantically encoding and then discretizing (quantizing) the resource search text; in essence, it is a structured semantic fingerprint of the search text. In this embodiment, the unstructured resource search text is transformed into computable, indexable numeric information, yielding the retrieval numeric identifiers corresponding to the resource search text.
Retrieval numeric identifiers characterize the resource semantic information of the resource search text and can be expressed in decimal, binary, or another number system. Optionally, the resource search text can correspond to one or more retrieval numeric identifiers; when there are several, different identifiers can characterize the resource semantic information of the search text along different semantic dimensions. Resource semantic information can be understood as the deeper meaning beyond the literal keywords in the resource search text, including but not limited to emotional tendency (such as sadness or joy), application scenario (such as a sleep aid for insomnia or fat-burning exercise), and content theme (such as youthful memories or homesickness); it is the core basis for mood matching and fuzzy matching of resources. The at least one retrieval numeric identifier corresponding to the resource search text can thus characterize the user's latent search intent for the target media resource. For example, assuming decimal identifiers and three retrieval numeric identifiers for a given resource search text, representing sentiment, application scenario, and content theme respectively, the identifier for sentiment could be 23, the identifier for the application scenario 79, and the identifier for the content theme 512.
[0023] In this embodiment, the resource search text can be input in one or more ways, which are described below.
[0024] Optionally, obtaining resource search text includes: obtaining resource search text in response to text input operations to the search box.
[0025] The search box can be understood as a text input interaction control deployed on the resource retrieval system client (such as a mobile app or webpage), used to receive resource search text manually entered by the user. Users can input their queries (such as "light music suitable for commuting") into the search box using a keyboard, handwriting, etc., making it the core interactive entry point for resource retrieval. The text input operation can be understood as the interactive action performed by the user within the search box, used to enter the text information to be queried. The text input operation can trigger a system response, directly obtaining the original resource search text entered by the user, and can be a basic retrieval triggering method.
[0026] In one implementation, a search box is displayed on the interface. Furthermore, upon receiving a text input operation for the search box, the system responds to the text input operation and determines the entered text information based on the text input operation. Furthermore, the entered text information can be used as resource search text.
[0027] Optionally, obtaining resource search text includes: in response to a voice input operation on a voice input control, determining the resource search text based on the input voice information.
[0028] The voice input control can be understood as an entry point control for voice interaction deployed on the client side, used to receive user voice input commands. Typically, after clicking the voice input control, the user can input their voice query, and the system uses speech recognition technology to convert the speech into text. The voice input operation can be understood as the user's interactive operation on the voice input control; this operation can be used to verbally state the query intent and complete the voice recording and submission. The voice input operation can be a trigger condition for voice retrieval, distinct from manual text input, and suitable for scenarios where typing is inconvenient (such as searching for music while driving). Voice information can be understood as audio data generated by the user through the voice input operation, which can contain the user's search intent (such as verbally stating "Play ancient style songs by singer A"). Typically, speech recognition technology can be used to convert the received voice information into a processable text format, thereby obtaining the resource search text.
[0029] In another implementation, a voice input control is displayed on the interface. Furthermore, upon receiving a voice input operation on the voice input control, the system responds to the voice input operation and determines the input voice information based on the operation. Further, a speech recognition model can be invoked to convert the input voice information into processable text, and the converted text can be used as resource search text.
[0030] Optionally, obtaining the resource search text includes: in response to a control selection operation on a search option control, determining the resource search text based on the selected search option control.
[0031] The search option control can be understood as a pre-defined quick search interactive component deployed on the interface of the resource retrieval system client (such as a mobile app or webpage). It is typically presented visually as category tags, recommended terms, or scene buttons (e.g., buttons like "Rainy Day Healing," "Exercise to Burn Fat," and "Nostalgic Old Songs"). The search option control can serve as an interactive entry point carrying a pre-defined search intent. The search option control is associated with resource search text. This association can be understood as the operation of establishing a one-to-one mapping between the search option control and specific resource search text during the system's backend configuration phase. For example, associating the "Rainy Day Healing" control with the resource search text "Healing songs suitable for rainy days." When a user triggers the control selection operation, the system can automatically retrieve the associated text. The control selection operation can be understood as a triggered interactive operation performed by the user on the search option control on the client interface. Control selection operations can include at least one of the following: click operation, long press operation, selection toggle operation, etc. The control selection operation can be a trigger condition for initiating a quick search process, allowing the user to initiate a search request without manually entering text.
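The three input paths described above (search box, voice input control, search option control) all converge on a single resource search text. A minimal sketch, assuming a hypothetical `InputEvent` type and an injected `speech_to_text` callable in place of a real speech recognition model; the control-to-text mapping reuses the "Rainy Day Healing" configuration example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical backend configuration: one-to-one mapping from a search option
# control to its associated resource search text.
OPTION_CONTROL_TEXT: Dict[str, str] = {
    "Rainy Day Healing": "Healing songs suitable for rainy days",
}


@dataclass
class InputEvent:
    kind: str                      # assumed labels: "text", "voice", or "option"
    text: Optional[str] = None     # typed text for "text" events
    audio: Optional[bytes] = None  # recorded audio for "voice" events
    control: Optional[str] = None  # control label for "option" events


def resolve_search_text(event: InputEvent, speech_to_text: Callable[[bytes], str]) -> str:
    # Text input operation on the search box: use the entered text directly.
    if event.kind == "text" and event.text is not None:
        return event.text.strip()
    # Voice input operation: convert the recorded audio to text.
    if event.kind == "voice" and event.audio is not None:
        return speech_to_text(event.audio).strip()
    # Control selection operation: look up the text pre-associated with the control.
    if event.kind == "option" and event.control in OPTION_CONTROL_TEXT:
        return OPTION_CONTROL_TEXT[event.control]
    raise ValueError(f"unsupported input event: {event.kind}")
```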
[0032] In this embodiment, once the resource search text entered by the user is obtained, it can be processed by a text encoding unit to determine at least one retrieval numeric identifier corresponding to the resource search text. The text encoding unit may include at least one of a text encoding model, a module implementing a text encoding algorithm, a conversational language model, and an intelligent agent. In one specific implementation, the resource search text can be input into a pre-trained end-to-end deep text encoding model (e.g., a Transformer-based model) trained to map the input text directly to a set of discrete integers, i.e., the retrieval numeric identifiers. In an alternative implementation, a general word vector model (such as Word2Vec) can first convert the words in the resource search text into word vectors; cluster analysis (e.g., the K-means algorithm) is then performed on these word vectors to map the whole text to several cluster centers, and the cluster center numbers are used as the retrieval numeric identifiers of the text.
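The alternative word-vector-plus-clustering implementation can be sketched as follows. This is a toy under stated assumptions: the hand-written 2-D `word_vectors` stand in for Word2Vec output, K-means is fitted over the vocabulary vectors with a deterministic first-k initialization, and each word of the text is then assigned to its nearest cluster center, whose number becomes a retrieval numeric identifier.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest(p: Vector, centers: List[List[float]]) -> int:
    # Index of the cluster center closest to p (Euclidean distance).
    return min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[List[float]]:
    # Deterministic initialization: the first k points are the initial centers.
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its members (keep it if the group is empty).
        for c, members in enumerate(groups):
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers


def text_to_cluster_ids(text: str, word_vectors: Dict[str, Vector],
                        centers: List[List[float]]) -> List[int]:
    # Each known word maps to its nearest cluster; the distinct cluster numbers,
    # in first-seen order, serve as the text's retrieval numeric identifiers.
    ids: List[int] = []
    for word in text.lower().split():
        if word in word_vectors:
            c = nearest(word_vectors[word], centers)
            if c not in ids:
                ids.append(c)
    return ids


# Toy vocabulary standing in for Word2Vec vectors (hypothetical values).
word_vectors = {"sad": (0.9, 0.1), "lonely": (0.8, 0.2), "happy": (0.1, 0.9), "upbeat": (0.2, 0.8)}
centers = kmeans(list(word_vectors.values()), k=2)
print(text_to_cluster_ids("sad and lonely", word_vectors, centers))  # -> [0]
print(text_to_cluster_ids("happy but sad", word_vectors, centers))   # -> [1, 0]
```

A text whose words all fall in one semantic cluster yields a single identifier, while a text mixing moods yields several, which mirrors the multi-identifier case described in [0022].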
[0033] Optionally, determining at least one retrieval numeric identifier corresponding to the resource search text includes: inputting the resource search text into a pre-trained text encoding model to obtain an output at least one retrieval numeric identifier corresponding to the resource search text.
[0034] The text encoding model can be understood as a pre-trained encoder for processing text information, used to extract semantic features and discretize them. It can capture the deep semantics related to media resources in the resource search text and discretize the captured deep semantic vectors. The text encoding model can be a neural network of any structure, for example a BERT-based Transformer architecture. In this embodiment, the text encoding model can include a semantic encoding module and a quantization encoding module.
[0035] In this embodiment, upon obtaining the resource search text, the resource search text can be input into a pre-trained text encoding model. Furthermore, the resource search text can be processed sequentially by the semantic encoding module and the quantization encoding module within the text encoding model to obtain at least one retrieval numeric identifier corresponding to the resource search text.
[0036] Optionally, the resource search text is input into a pre-trained text encoding model to obtain at least one retrieval numeric identifier corresponding to the resource search text, including: inputting the resource search text into a semantic encoding module to obtain at least one search semantic vector corresponding to the resource search text; and inputting at least one search semantic vector corresponding to the resource search text into a quantization encoding module to obtain at least one retrieval numeric identifier corresponding to the resource search text.
[0037] The semantic encoding module can be understood as the core functional unit of the text encoding model that extracts semantic features from text: it captures deep semantic information related to media resources and outputs corresponding high-dimensional vector representations. A search semantic vector can be understood as a continuous floating-point vector output by the semantic encoding module after it encodes the resource search text; each search semantic vector corresponds to one semantic feature dimension of the text and is a numerical representation of the semantics of the resource search text. To handle the ambiguity of media resource content, the semantic encoding module can output multiple search semantic vectors, each corresponding to a different semantic aspect of the resource search text; that is, when the resource search text has multiple search semantic vectors, different vectors can correspond to different semantic feature dimensions. For example, with three search semantic vectors, the first can correspond to "genre/style", the second to "emotion/atmosphere", and the third to "lyric content". The quantization encoding module can be understood as the core functional unit of the text encoding model that converts continuous vectors into discrete identifiers by projecting high-dimensional, continuous semantic vectors onto a predefined finite set of integers. Quantization reduces vector storage and computation costs while preserving the core semantic features. Optionally, the quantization encoding module may include a linear projection layer, a vector quantization layer, and a numerical parsing layer.
[0038] In one implementation, when the resource search text is input into the text encoding model, the resource search text can be received by a semantic encoding module. Based on the semantic encoding module, deep semantic features of the resource search text are extracted, and at least one search semantic vector is generated according to the extracted deep semantic features. Further, the at least one search semantic vector can be input into a quantization encoding module. The at least one search semantic vector is then compressed to reduce its dimensionality using a linear projection layer in the quantization encoding module, resulting in at least one low-dimensional semantic vector. Further, the at least one low-dimensional semantic vector is binary quantized using a vector quantization layer in the quantization encoding module, resulting in binary strings corresponding to each low-dimensional semantic vector. Further, the at least one binary string is converted to decimal using a numerical parsing layer, and the resulting decimal value is used as the retrieval numeric identifier corresponding to the resource search text, thereby obtaining at least one retrieval numeric identifier corresponding to the resource search text.
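The projection, binarization, and decimal-conversion pipeline described above can be illustrated in a few lines of Python. This is a sketch, not the trained module: the projection `weights` are hypothetical (they would be learned in practice), and the vector quantization layer is approximated by sign-based binarization (positive → 1, otherwise 0), which is an assumption about how the binary strings are produced.

```python
from typing import List, Sequence


def project(vec: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    # Linear projection layer: each row of `weights` yields one low-dimensional output.
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def binarize(vec: Sequence[float]) -> str:
    # Vector quantization layer (sign-based here): positive -> "1", otherwise "0".
    return "".join("1" if x > 0 else "0" for x in vec)


def to_identifier(bits: str) -> int:
    # Numerical parsing layer: read the binary string as a base-2 integer.
    return int(bits, 2)


def quantize(semantic_vectors: Sequence[Sequence[float]],
             weights: Sequence[Sequence[float]]) -> List[int]:
    # One retrieval numeric identifier per search semantic vector.
    return [to_identifier(binarize(project(v, weights))) for v in semantic_vectors]


# Hypothetical 4-D semantic vectors projected to 3 dimensions.
W = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, -1]]
print(quantize([[0.5, -0.2, 0.3, 0.1], [-0.4, 0.7, 0.1, 0.6]], W))  # -> [5, 2]
```

With an n-dimensional projection the identifiers fall in the range 0 to 2^n - 1, so the projection width directly sets the size of the identifier space and hence the granularity of the index keys.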
[0039] S120. Based on at least one retrieval digital identifier, retrieve the resource identifier of at least one candidate media resource corresponding to the resource search text from the resource index information.
[0040] Resource index information can be understood as a pre-generated and stored semantic index structure used to retrieve media resources by retrieval numeric identifier. It can be determined from the retrieval numeric identifiers and the resource identifiers of the corresponding media resources; in other words, it can be constructed from the mapping between retrieval numeric identifiers and media resource identifiers. Unlike a traditional keyword index, resource index information associates retrieval numeric identifiers with resource identifiers directly along semantic dimensions, and it is the core data foundation for semantic recall. The index structure can be implemented in several ways. In one optional implementation, an inverted index keyed by retrieval numeric identifier can be used, with the identifiers as index keys and the resource identifiers as index values. In another optional implementation, a hash table structure can be used: each retrieval numeric identifier is hashed to obtain a hash bucket address, and the corresponding list of resource identifiers is stored at that address. This structure suits scenarios with very high requirements on retrieval speed and a relatively controllable data volume. In this context, candidate media resources can be understood as the set of media resources whose specific semantic features are associated with the corresponding retrieval numeric identifiers; alternatively, they can be understood as all existing media resources that have completed semantic encoding and quantization. A resource identifier can be understood as a numeric or string identifier that uniquely identifies each media resource. Optionally, the resource identifier may include at least one of the following: media resource name, media resource code, and media resource thumbnail.
The resource identifier is globally unique, so the system can directly locate the corresponding media resource file, metadata, and so on through it; it serves as the bridge between the index information and the actual media resources. Candidate media resources can also be understood as the media resources, found in the resource index information via the retrieval numeric identifiers, that have a potential semantic relationship with the resource search text; they are the objects processed in the subsequent fine-ranking stage.
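The hash table variant can be sketched as a fixed array of buckets. This is a minimal illustration, assuming integer retrieval numeric identifiers and string resource identifiers; the class name `BucketIndex` and the bucket count are hypothetical. Each bucket stores (identifier, resource list) pairs so that two identifiers hashing to the same bucket address do not mix their resource lists.

```python
from typing import List, Tuple


class BucketIndex:
    """Fixed-size hash table: identifier -> bucket address -> resource identifiers."""

    def __init__(self, num_buckets: int = 1024) -> None:
        self.buckets: List[List[Tuple[int, List[str]]]] = [[] for _ in range(num_buckets)]

    def _bucket(self, identifier: int) -> List[Tuple[int, List[str]]]:
        # Hash the retrieval numeric identifier to obtain the bucket address.
        return self.buckets[hash(identifier) % len(self.buckets)]

    def add(self, identifier: int, resource_id: str) -> None:
        bucket = self._bucket(identifier)
        for key, ids in bucket:
            # The key is stored with each entry to resolve hash collisions.
            if key == identifier:
                if resource_id not in ids:
                    ids.append(resource_id)
                return
        bucket.append((identifier, [resource_id]))

    def get(self, identifier: int) -> List[str]:
        for key, ids in self._bucket(identifier):
            if key == identifier:
                return list(ids)
        return []
```

With a small bucket count, identifiers such as 23 and 79 land in the same bucket yet keep separate resource lists, which is why each entry carries its key.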
[0041] It should be noted that, since the resource index information is an index structure built using the retrieval numeric identifier as the index key, during the resource index information construction phase, different media resources stored in the media resource retrieval system may contain the same retrieval numeric identifier. Therefore, in the resource index information, the resource identifier of the media resource corresponding to the retrieval numeric identifier may include one or more identifiers.
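Because several media resources can share a retrieval numeric identifier, each index key maps to a list of resource identifiers. A minimal sketch of building the inverted index, assuming each media resource has already been encoded into its retrieval numeric identifiers (the resource names are made up for illustration):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_inverted_index(
    resources: Iterable[Tuple[str, Iterable[int]]],
) -> Dict[int, List[str]]:
    # Key: retrieval numeric identifier; value: identifiers of all media resources carrying it.
    index: Dict[int, List[str]] = defaultdict(list)
    for resource_id, identifiers in resources:
        # dict.fromkeys drops duplicate identifiers for one resource while keeping order.
        for identifier in dict.fromkeys(identifiers):
            index[identifier].append(resource_id)
    return dict(index)


index = build_inverted_index([
    ("song_a", [23, 79]),
    ("song_b", [23, 512]),
    ("song_c", [79]),
])
print(index)  # -> {23: ['song_a', 'song_b'], 79: ['song_a', 'song_c'], 512: ['song_b']}
```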
[0042] In one optional implementation, once at least one retrieval numeric identifier corresponding to the resource search text is obtained, each retrieval numeric identifier can be looked up in the resource index information to determine the resource identifiers of the media resources associated with it, and those media resources can be taken as candidate media resources for the resource search text. Once every retrieval numeric identifier has been looked up, the resource identifiers of at least one candidate media resource corresponding to the resource search text are obtained.
[0043] In this embodiment, to ensure comprehensive recall, the resource identifiers retrieved separately for each retrieval numeric identifier can be merged into a single candidate resource identifier set covering all potential matches, which provides the data foundation for subsequent relevance screening and ranking.
[0044] Optionally, retrieving the resource identifier of at least one candidate media resource from the resource index information based on at least one retrieval numeric identifier includes: for each of the at least one retrieval numeric identifier corresponding to the resource search text, obtaining from the resource index information the resource identifiers of the media resources associated with that retrieval numeric identifier; and determining the union of the resource identifiers obtained for the at least one retrieval numeric identifier as the resource identifiers of the at least one candidate media resource corresponding to the resource search text.
[0045] In another optional implementation, for the at least one retrieval numeric identifier corresponding to the resource search text, each retrieval numeric identifier is looked up in the resource index information to determine the resource identifiers of its associated media resources. A set union is then performed over the resource identifiers obtained for all of the retrieval numeric identifiers, and the resulting union is taken as the resource identifiers of the at least one candidate media resource corresponding to the resource search text.
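The union-based recall in [0044] and [0045] can be illustrated with a short sketch. It assumes, hypothetically, that the resource index information is available as an in-memory mapping from retrieval numeric identifier to a list of resource identifiers; `recall_candidates` is an illustrative name, and a production system would query its own index store instead.

```python
from __future__ import annotations

from typing import Iterable, Mapping


def recall_candidates(retrieval_ids: Iterable[int],
                      resource_index: Mapping[int, Iterable[str]]) -> set[str]:
    """Union of the resource identifiers posted under each retrieval numeric identifier."""
    candidates: set[str] = set()
    for rid in retrieval_ids:
        # An identifier with no posting list contributes nothing to the union.
        candidates |= set(resource_index.get(rid, ()))
    return candidates
```

For example, with `{101: ["s1", "s2"], 202: ["s2", "s3"]}` and query identifiers `[101, 202, 999]`, the candidate set is `{"s1", "s2", "s3"}`: the shared resource `s2` appears once, and the unknown identifier 999 adds nothing.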
[0046] S130. Determine the semantic relevance based on the resource identifiers of the candidate media resources.
[0047] Semantic relevance can be understood as a quantitative indicator of how well the resource search text and a candidate media resource match semantically; that is, it measures how closely the semantic information of the candidate media resource matches the resource search text. A higher semantic relevance indicates a stronger semantic fit between the candidate media resource and the resource search text, and a lower semantic relevance indicates a weaker fit.
[0048] In practical applications, when candidate media resources corresponding to the search text are retrieved based on media resource index information, these candidate resources may be directly returned as search results. This method of media resource retrieval may suffer from a lack of precise semantic matching between candidate resources and search intent, resulting in a mixture of strongly and weakly related resources. This not only fails to guarantee the accuracy of search results but also reduces the reach efficiency of high-quality long-tail resources, thereby lowering the accuracy of resource searches and impacting the user's resource search experience.
[0049] To address the above issues, in this embodiment, after obtaining at least one candidate media resource corresponding to the resource search text, the semantic relevance between the candidate media resource and the resource search text is further determined. Furthermore, the search result corresponding to the resource search text can be determined based on the semantic relevance of at least one candidate media resource.
[0050] In this embodiment, the semantic relevance can be determined by at least one of the following methods: calculating the semantic relevance between a candidate media resource and the resource search text with a relevance matching model; or determining the vector relevance between the resource semantic vector of the candidate media resource and the search semantic vector of the resource search text, and deriving the semantic relevance from that vector relevance. One such method is described in detail below.
[0051] Optionally, determining the semantic relevance based on the resource identifier of the candidate media resource includes: retrieving, based on the resource identifier, at least one piece of retrieval-related information corresponding to the candidate media resource, and determining a retrieval text sequence for the candidate media resource from that information; determining at least one resource semantic vector for the candidate media resource from its retrieval text sequence; determining at least one search semantic vector for the resource search text; and determining the semantic relevance between the candidate media resource and the resource search text from the at least one resource semantic vector and the at least one search semantic vector.
[0052] The retrieval-related information can be understood as multi-dimensional structured data bound to the candidate media resource and used to characterize its core semantics. Optionally, the retrieval-related information may include at least one of the following: resource information, resource content information, feedback information, and resource attribute information. For example, assuming the candidate media resource is a song, its corresponding retrieval-related information may include at least one of the following: song information (including song title, artist name, album name, and release year, etc.), high-weight lyric fragments (such as chorus segments or frequently repeated paragraphs in the lyrics), song feedback information (such as the most liked and / or most interactive comments in the song comments, etc.), and audio attribute tags (such as song genre, song rhythm, and instrumental music, etc.). For video media resources, the retrieval-related information may include video title, director / starring information, descriptive text of keyframes, high-frequency words or sentiment analysis results in user comments, and video category tags (such as "documentary" or "suspense"). For image media resources, the retrieval-related information can include image filenames or titles, image content description text generated by an image recognition model (e.g., "Two people walking on a beach at sunset"), user-added tags, and the geographical location or shooting device model in the image's EXIF information. The retrieval text sequence can be understood as an ordered set of terms formed by preprocessing and concatenating the retrieved retrieval-related information. Preprocessing can include at least one of cleaning, word segmentation, and structured processing. 
The resource semantic vector can be understood as a high-dimensional, continuous floating-point vector obtained by extracting semantic features from the retrieval text sequence of a candidate media resource. It is a numerical representation of the core semantics of the candidate media resource and can cover multiple semantic dimensions such as emotion, scene, and theme. The search semantic vector can be understood as a high-dimensional, continuous floating-point vector obtained by extracting semantic features from the resource search text. It is a numerical representation of the user's search intent and has the same vector-space dimension as the resource semantic vector, so the two can be compared directly by similarity calculation. In addition, the number of search semantic vectors is the same as the number of resource semantic vectors, and the semantic dimensions corresponding to the search semantic vectors are the same as those corresponding to the resource semantic vectors.
[0053] In this embodiment, the determination of the search semantic vector may include at least one of the following: processing the resource search text using the semantic encoding module in the text encoding model; extracting semantic features from the resource search text using a semantic encoding algorithm; extracting semantic features from the resource search text based on a conversational language model and prompt words; or extracting semantic features from the resource search text using an intelligent agent.
[0054] Optionally, determining at least one search semantic vector corresponding to the resource search text includes: semantically encoding the resource search text through the semantic encoding module in the text encoding model to obtain at least one search semantic vector corresponding to the resource search text.
[0055] It should be noted that when determining the resource semantic vector corresponding to the candidate media resource, the same method as the search semantic vector can be used, or a different method can be used.
[0056] In this embodiment, for at least one candidate media resource, after obtaining at least one resource semantic vector corresponding to the candidate media resource, the at least one resource semantic vector and at least one search semantic vector can be processed according to a preset relevance determination method to obtain the semantic relevance between the candidate media resource and the resource search text. The relevance determination method includes at least one of the following: determining the vector relevance between the resource semantic vector and the search semantic vector respectively, and determining the semantic relevance based on the vector relevance; or directly determining the semantic relevance based on at least one resource semantic vector and at least one search semantic vector. One of the relevance determination methods is described in detail below.
[0057] Optionally, based on at least one resource semantic vector and at least one search semantic vector, the semantic relevance between candidate media resources and resource search text is determined, including: for at least one resource semantic vector, determining the relevance between the resource semantic vector and at least one search semantic vector through a relevance determination model to obtain at least one vector relevance corresponding to the resource semantic vector; and performing aggregation operation on the at least one vector relevance corresponding to the at least one resource semantic vector to obtain the semantic relevance between candidate media resources and resource search text.
[0058] The relevance determination model can be understood as an algorithmic model used to calculate the semantic matching degree between vectors. It performs fine-grained interactive calculations on resource semantic vectors and search semantic vectors to obtain the relevance of each resource semantic vector to at least one search semantic vector. Optionally, the relevance determination model can support various calculation methods such as cosine similarity, dot product operation, and maximum similarity matching. Vector relevance can be understood as the matching score between a single resource semantic vector and a single search semantic vector, calculated by the relevance determination model. Vector relevance directly reflects the degree of fit of the semantic dimensions corresponding to two semantic vectors. The aggregation operation can be understood as a data fusion calculation operation performed on multiple vector relevances. Optionally, the aggregation operation can include at least one of the following operations: maximum value, summation, weighted summation, and weighted average. The aggregation operation can be used to integrate multi-dimensional vector matching scores into a single quantitative indicator, thereby comprehensively representing the overall semantic matching degree between candidate media resources and resource search text.
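The aggregation operations listed in [0058] (maximum, summation, weighted summation, weighted average) can be sketched as one helper. The function name, the method strings, and the way weights are supplied are assumptions for illustration; the disclosure does not specify how the weights are chosen.

```python
from __future__ import annotations

from typing import Optional, Sequence


def aggregate(scores: Sequence[float], method: str = "sum",
              weights: Optional[Sequence[float]] = None) -> float:
    """Fuse the vector relevances of one candidate into a single semantic relevance."""
    if not scores:
        raise ValueError("at least one vector relevance is required")
    if method == "max":
        return max(scores)
    if method == "sum":
        return sum(scores)
    if method not in ("weighted_sum", "weighted_mean"):
        raise ValueError(f"unknown aggregation method: {method}")
    if weights is None or len(weights) != len(scores):
        raise ValueError("weighted methods need one weight per vector relevance")
    weighted = sum(w * s for w, s in zip(weights, scores))
    if method == "weighted_sum":
        return weighted
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return weighted / total  # weighted average
```

Summation rewards candidates whose many semantic dimensions each find a match, while the maximum only reflects the single best-matching dimension; which behaviour is preferable depends on the business scenario.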
[0059] In this embodiment, each resource semantic vector can have one or more vector relevance values. When there is only one, the maximum similarity determined for that resource semantic vector can be used as its vector relevance, so as to achieve high-precision semantic association between the resource search text and the media resources in the resource library. The single-value case is explained in detail below.
[0060] Optionally, for at least one resource semantic vector, the relevance between the resource semantic vector and at least one search semantic vector is determined by a relevance determination model to obtain at least one vector relevance corresponding to the resource semantic vector, including: calculating the similarity between the resource semantic vector and each search semantic vector to obtain a similarity set; and selecting the maximum similarity from the similarity set as the vector relevance corresponding to the resource semantic vector.
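The maximum-similarity rule in [0060], followed by summation as one of the aggregation options in [0058], can be sketched in plain Python. The sketch assumes cosine similarity as the similarity measure and summation as the aggregation, and all function names are hypothetical. It resembles the MaxSim scoring used by late-interaction retrievers such as ColBERT, except that here the maximum is taken over the search semantic vectors for each resource semantic vector, as the paragraph describes.

```python
from __future__ import annotations

import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must share the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def vector_relevance(resource_vec: Vector, search_vecs: Sequence[Vector]) -> float:
    """Build the similarity set against every search vector, then keep its maximum."""
    similarity_set = [cosine_similarity(resource_vec, q) for q in search_vecs]
    return max(similarity_set)


def semantic_relevance(resource_vecs: Sequence[Vector],
                       search_vecs: Sequence[Vector]) -> float:
    """Aggregate per-resource-vector relevances by summation (one option in [0058])."""
    return sum(vector_relevance(r, search_vecs) for r in resource_vecs)
```

For resource vectors `[1, 0]` and `[0, 1]` against search vectors `[1, 0]` and `[1, 1]`, the per-vector maxima are 1 and about 0.707, giving a semantic relevance of about 1.707.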
[0061] Similarity can be understood as a numerical metric calculated using relevance determination algorithms (such as cosine similarity, Euclidean distance, etc.) to measure the distance or directional consistency between resource semantic vectors and search semantic vectors in a multidimensional space. Generally, a higher similarity value indicates that the semantic content represented by the resource semantic vector and the search semantic vector is closer. A similarity set can be the set of all similarities obtained by calculating the similarity between a given resource semantic vector and all search semantic vectors.
[0062] In one implementation, for each candidate media resource, at least one piece of retrieval-related information corresponding to the candidate media resource can be retrieved based on its resource identifier and then preprocessed. The preprocessed retrieval-related information is concatenated with a specific delimiter, and the concatenated result is taken as the retrieval text sequence of the candidate media resource. The retrieval text sequence is input into the text encoding model, whose semantic encoding module encodes it into at least one resource semantic vector for the candidate media resource. Likewise, the resource search text is input into the text encoding model, whose semantic encoding module encodes it into at least one search semantic vector. Next, for each resource semantic vector, the relevance determination model calculates its similarity to every search semantic vector, and these similarities form a similarity set. The maximum similarity in the set is taken as the vector relevance of that resource semantic vector. Finally, the vector relevances of all resource semantic vectors are aggregated, and the aggregated value is taken as the semantic relevance between the candidate media resource and the resource search text.
[0063] It should be noted that, for each resource semantic vector, the similarities calculated between it and each search semantic vector can also be used directly as the at least one vector relevance corresponding to that resource semantic vector.
[0064] S140. Based on semantic relevance, determine the target media resource corresponding to the resource search text from at least one candidate media resource, and return the target media resource as the search result.
[0065] The target media resource can be the best matching resource selected from the candidate media resource set based on semantic relevance. The selection rules can include one of the following: sorting candidate media resources in descending order of semantic relevance and selecting a preset number of the top-ranked resources; or using candidate media resources with a semantic relevance greater than a preset relevance threshold as the target media resource. The search results can be understood as structured content returned by the system to the end user, containing the target media resource. Search results can be presented in an ordered list format, with list items containing resource information, resource playback links, and other information.
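Both selection rules in [0065] can be sketched as below, assuming the semantic relevance of each candidate has already been computed into a mapping from resource identifier to score. The function names are hypothetical, and breaking ties by resource identifier is an added assumption so that the output order is deterministic.

```python
from __future__ import annotations

from typing import Mapping


def _ranked(relevance: Mapping[str, float]) -> list[tuple[str, float]]:
    # Descending semantic relevance; ties broken by resource identifier (assumption).
    return sorted(relevance.items(), key=lambda kv: (-kv[1], kv[0]))


def select_top_n(relevance: Mapping[str, float], n: int = 10) -> list[str]:
    """Rule 1: keep the preset number n of highest-relevance candidates."""
    return [rid for rid, _ in _ranked(relevance)[:n]]


def select_above_threshold(relevance: Mapping[str, float], threshold: float) -> list[str]:
    """Rule 2: keep candidates whose relevance exceeds the preset threshold, best first."""
    return [rid for rid, score in _ranked(relevance) if score > threshold]
```

Both helpers return the target media resources already in descending order of semantic relevance, so the same list can be returned directly as the search result.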
[0066] Optionally, based on semantic relevance, the target media resource corresponding to the resource search text is determined from at least one candidate media resource, including: sorting at least one candidate media resource in descending order of semantic relevance, and determining a preset number of candidate media resources at the top as the target media resource corresponding to the resource search text.
[0067] The preset quantity can be a fixed value pre-configured by the system to limit the size of the target media resources. The preset quantity can be flexibly adjusted according to business needs (such as the size of the client display area, user browsing habits, etc.). Optionally, the preset quantity can be 10, 15, or 20, etc.
[0068] In one implementation, after obtaining the semantic relevance between the resource search text and at least one candidate media resource, the candidate media resources can be sorted in descending order of semantic relevance. Furthermore, a predetermined number of candidate media resources at the top of the list can be selected as the target media resources corresponding to the resource search text.
[0069] In this embodiment, after obtaining the target media resource, it can be directly returned as the search result; or, in order to improve the user's resource retrieval experience, the target media resources with higher semantic relevance can be ranked first, and the ranked target media resources can be returned as the search result.
[0070] Optionally, returning the target media resources as search results includes: sorting the target media resources in descending order of semantic relevance, and returning the sorted target media resources as search results.
[0071] In one implementation, after obtaining the target media resources, the target media resources can be sorted in descending order of semantic relevance, and the sorted target media resources can be returned as search results.
[0072] The technical solution of this disclosure, in response to a resource search request, acquires the resource search text and determines at least one retrieval numerical identifier corresponding to the resource search text. The retrieval numerical identifier is used to characterize the resource semantic information of the resource search text, transforming the unstructured resource search text into an indexable and computable retrieval numerical identifier. This achieves a structured representation of the deep semantic information of the search text, breaking through the limitations of traditional keyword literal matching and laying a core data foundation for subsequent precise resource retrieval based on semantic dimensions. Furthermore, by retrieving at least one candidate media resource's resource identifier from the resource index information based on at least one retrieval numerical identifier, resource matching and retrieval are directly achieved based on semantic dimensions, avoiding the semantic blind spots of traditional keyword indexing. Simultaneously, the lightweight integer identifier and inverted index structure significantly improve the retrieval response efficiency. Furthermore, by determining the semantic relevance based on the resource identifier of the candidate media resources, refined semantic screening of candidate resources is achieved, effectively eliminating resources weakly related to the search intent and providing accurate quantitative basis for subsequent result ranking. 
Furthermore, by determining the target media resource corresponding to the resource search text from at least one candidate media resource based on semantic relevance, and returning the target media resource as the search result, the search results accurately match the user's search intent, which significantly improves result quality and the user's search experience while keeping irrelevant resources out of the displayed results. The technical solution of this disclosure addresses the low retrieval accuracy and poor recall that arise in related technologies from relying on keyword matching. By retrieving resources through retrieval numeric identifiers that represent resource semantic information, it accurately retrieves resources that meet the search need and overcomes the semantic blind spot of traditional keyword retrieval. It not only matches search results accurately to the user's search intent but also improves retrieval response efficiency through lightweight numeric identifiers and an inverted index structure, significantly improving retrieval accuracy and user experience.
[0073] Figure 2 is a schematic flowchart of another resource retrieval method provided in this embodiment. The technical solution of this embodiment can be combined with the other embodiments; for identical or related parts, refer to the descriptions of those embodiments, which are not repeated here. As shown in Figure 2, the method in this embodiment may specifically include: S210. Determine the retrieval text sequences corresponding to multiple media resources.
[0074] The retrieval text sequence can be understood as an ordered set of terms formed by preprocessing and concatenating multi-dimensional retrieval-related information of a single media resource. Preprocessing may include at least one of the following: cleaning, word segmentation, and structured processing. The retrieval text sequence may consist of at least one piece of retrieval-related information. Retrieval-related information may include at least one of the following: resource information, resource content information, feedback information, and resource attribute information. Resource information can be understood as the basic identifying information of a media resource, such as the resource name, the name of the performer, and the release date. Resource content information can be understood as the core content-derived information of a media resource, such as the resource content text, style description, and album introduction. Feedback information can be understood as interactive feedback information generated by users regarding media resources, such as user comments, bullet screen content, collection tags, and recommendations. Resource attribute information can be understood as the classification and tagging information of media resources, such as genre, emotional attributes, and applicable scenarios.
[0075] In one implementation, for each of the multiple media resources, at least one piece of retrieval-related information corresponding to the media resource can be retrieved from the database based on its resource identifier. This information is preprocessed to obtain at least one piece of preprocessed retrieval-related information, which is then concatenated with a specific delimiter; the concatenated result is taken as the retrieval text sequence corresponding to the media resource.
[0076] S220. Input the multiple retrieval text sequences into a pre-trained text encoding model to obtain multiple retrieval numeric identifiers corresponding to each media resource.
[0077] The text encoding model can be understood as a pre-trained encoder for text information that extracts semantic features and discretizes them. Based on the retrieval text sequence, it captures the deep semantics of a media resource and discretizes the captured deep semantic vectors. The text encoding model may be a neural network of any structure, optionally a BERT-based Transformer architecture. In this embodiment, the text encoding model can include a semantic encoding module and a quantization encoding module. A retrieval numeric identifier can be a discrete integer obtained by quantizing a high-dimensional semantic vector output by the text encoding model; it is a structured numeric fingerprint of the semantic features of the media resource. A single media resource can correspond to multiple retrieval numeric identifiers, each retrieval numeric identifier can correspond to one semantic dimension, and different retrieval numeric identifiers correspond to different semantic dimensions.
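A minimal sketch of building a retrieval text sequence follows. It assumes hypothetical field names for the four kinds of retrieval-related information listed in [0074] and a hypothetical ` | ` delimiter, since the disclosure only says "a specific delimiter". Cleaning here just removes control characters and collapses whitespace, and the field-name prefixes stand in for structured processing; word segmentation is omitted because it depends on the tokenizer of the text encoding model.

```python
import re
from typing import Mapping

# Hypothetical field names for the four kinds of retrieval-related information.
FIELD_ORDER = ("resource_info", "content_info", "feedback_info", "attribute_info")
DELIMITER = " | "  # hypothetical; the disclosure only says "a specific delimiter"


def clean(text: str) -> str:
    """Cleaning: replace control characters with spaces and collapse whitespace."""
    text = re.sub(r"[\x00-\x1f\x7f]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_retrieval_text_sequence(info: Mapping[str, str]) -> str:
    """Concatenate cleaned, field-labelled retrieval-related information in a fixed order."""
    parts = []
    for field in FIELD_ORDER:
        value = clean(info.get(field, ""))
        if value:  # skip missing or empty fields
            parts.append(f"{field}: {value}")
    return DELIMITER.join(parts)
```

Keeping a fixed field order means that the same kind of information always occupies the same relative position in every resource's sequence, which keeps the encoder input consistent across the library.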
[0078] In this embodiment, when a retrieval text sequence corresponding to multiple media resources is obtained, the retrieval text sequence corresponding to the media resources can be input into a pre-trained text encoding model. Furthermore, the semantic encoding module and quantization encoding module in the text encoding model can be used to process the retrieval text sequence sequentially to obtain multiple retrieval numeric identifiers corresponding to the media resources.
[0079] Optionally, the text encoding model includes a semantic encoding module and a quantization encoding module; multiple retrieval text sequences are input into the pre-trained text encoding model to obtain multiple retrieval numeric identifiers corresponding to each media resource, including: for each media resource, inputting the retrieval text sequence corresponding to the media resource into the semantic encoding module to obtain multiple resource semantic vectors corresponding to the media resource; inputting the multiple resource semantic vectors corresponding to the media resource into the quantization encoding module to obtain multiple retrieval numeric identifiers corresponding to the media resource.
[0080] The semantic encoding module can be a front-end submodule of the text encoding model, used to extract deep semantic features from the input retrieval text sequence and output a high-dimensional vector that can represent the multi-dimensional semantics of media resources. The quantization encoding module can be a back-end submodule of the text encoding model, used to project and map the continuous high-dimensional vector output by the semantic encoding module into a discrete integer sequence within a finite set. While preserving the core semantic features, it achieves lightweight compression of the vector to meet the storage and retrieval performance requirements of the inverted index.
[0081] Optionally, the quantization encoding module includes a linear projection layer, a vector quantization layer, and a numerical parsing layer. Inputting the multiple resource semantic vectors corresponding to a media resource into the quantization encoding module to obtain the multiple retrieval numeric identifiers corresponding to the media resource includes: for each of the multiple resource semantic vectors, reducing and compressing its dimensionality through the linear projection layer to obtain a low-dimensional semantic vector; performing binary quantization on the low-dimensional semantic vector through the vector quantization layer to obtain the corresponding binary sequence; and performing decimal conversion on the binary sequence through the numerical parsing layer to obtain the retrieval numeric identifier corresponding to the resource semantic vector.
[0082] The linear projection layer can be a pre-processing sub-layer of the quantization encoding module, typically composed of a single-layer or multi-layer linear neural network. It applies a linear transformation to the high-dimensional resource semantic vectors, compressing their dimensionality while preserving core semantic features and reducing the complexity of the subsequent quantization. The vector quantization layer, the core processing sub-layer of the quantization encoding module, performs binary quantization on the low-dimensional semantic vectors output by the linear projection layer, mapping the continuous floating-point values in each vector to a binary sequence of "0"s and "1"s and thus converting continuous vectors into discrete symbols. The numerical parsing layer, a post-processing sub-layer of the quantization encoding module, performs decimal conversion on the binary sequence output by the vector quantization layer, converting it into a decimal integer and ultimately generating a retrieval numeric identifier that can be used for index construction. The low-dimensional semantic vector, obtained when the linear projection layer reduces and compresses a high-dimensional resource semantic vector, is the intermediate data linking the high-dimensional semantic representation to the quantization operation, balancing semantic preservation against computational efficiency. A binary sequence can be the discrete sequence of "0"s and "1"s output by the vector quantization layer; it is a binary representation of a high-dimensional resource semantic vector with a small storage footprint and fast matching. Decimal conversion can be a numerical encoding operation that maps binary sequences to decimal integers. 
This operation can be used to turn each binary sequence into a unique integer identifier that fits the "key-value" storage and retrieval logic of the inverted index.
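The three sub-layers of the quantization encoding module in [0081] and [0082] can be sketched as below. This is an illustration under stated assumptions, not the trained module: the projection weight matrix would normally be learned but is passed in here as a plain list of rows; binary quantization is assumed to be a sign threshold at zero; and the first projected dimension is assumed to be the most significant bit. With `k` projected dimensions, each retrieval numeric identifier falls in the range 0 to `2**k - 1`. All function names are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def linear_projection(vec: Sequence[float], weight: Sequence[Sequence[float]]) -> list[float]:
    """Project a d-dimensional resource semantic vector to k dimensions (weight is k x d)."""
    for row in weight:
        if len(row) != len(vec):
            raise ValueError("each weight row must match the input dimension")
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def binarize(low_dim: Sequence[float]) -> list[int]:
    """Assumed sign quantization: positive components become 1, all others 0."""
    return [1 if v > 0 else 0 for v in low_dim]


def to_decimal(bits: Sequence[int]) -> int:
    """Read the binary sequence as an unsigned integer, first bit most significant."""
    value = 0
    for b in bits:
        value = value * 2 + b
    return value


def retrieval_numeric_ids(vectors: Sequence[Sequence[float]],
                          weight: Sequence[Sequence[float]]) -> list[int]:
    """Map each resource semantic vector of one media resource to a retrieval numeric id."""
    return [to_decimal(binarize(linear_projection(v, weight))) for v in vectors]
```

For instance, projecting `[1.0, -2.0, 0.5]` with a 4 x 3 weight whose rows are the three unit vectors plus an all-ones row gives `[1.0, -2.0, 0.5, -0.5]`, the bits `1010`, and the identifier 10.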
[0083] In one implementation, for a retrieval text sequence of multiple media resources, the retrieval text sequence can be input into a text encoding model. A semantic encoding module receives the retrieval text sequence, extracts deep semantic features from the retrieval text sequence based on the semantic encoding module, and generates multiple resource semantic vectors corresponding to the media resources based on the extracted deep semantic features. Further, for the multiple resource semantic vectors, the resource semantic vectors can be input into a quantization encoding module. The linear projection layer in the quantization encoding module performs dimensionality reduction and compression on the resource semantic vectors to obtain low-dimensional semantic vectors. Further, the vector quantization layer in the quantization encoding module performs binary quantization on the low-dimensional semantic vectors to obtain a binary sequence corresponding to the low-dimensional semantic vectors. Further, a numerical parsing layer performs decimal conversion on the binary sequence, and the resulting decimal value is used as the retrieval numeric identifier corresponding to the resource semantic vector. Thus, multiple retrieval numeric identifiers corresponding to the media resources can be obtained.
[0084] S230. Determine resource index information based on the resource identifiers of multiple media resources and the multiple retrieval numeric identifiers corresponding to the media resources.
[0085] In this embodiment, after obtaining multiple retrieval numeric identifiers corresponding to each media resource, resource index information can be determined based on the resource identifiers of the multiple media resources and the multiple retrieval numeric identifiers corresponding to the media resources.
[0086] Optionally, determining the resource index information based on the resource identifiers of the multiple media resources and the multiple retrieval numeric identifiers corresponding to the media resources includes: using each retrieval numeric identifier as an index key and the resource identifier of at least one media resource corresponding to that retrieval numeric identifier as the index value, to obtain the resource index information.
[0087] In this context, the index key can be understood as the matching key in the index structure: during online retrieval, the system uses the retrieval numeric identifier of the search text as the index key to quickly look up the list of associated resource identifiers. The index value can be understood as the set of associated data corresponding to an index key in the index structure. One index key can correspond to multiple index values, achieving a "one-to-many" semantic resource mapping. The index structure of the resource index information can include an inverted index structure.
[0088] In one implementation, after the multiple retrieval numeric identifiers corresponding to each media resource are obtained, the retrieval numeric identifiers can be used as index keys, and the resource identifier of at least one media resource corresponding to each retrieval numeric identifier can be used as the index value, establishing an index structure between retrieval numeric identifiers and resource identifiers. The completed index structure is then used as the resource index information.
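The key-value construction described above amounts to building an inverted index. The following minimal Python sketch assumes each media resource arrives as a pair of a string resource identifier and its list of retrieval numeric identifiers; the function name and input shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_resource_index(resources: Iterable[Tuple[str, List[int]]]) -> Dict[int, List[str]]:
    """Inverted index: retrieval numeric identifier (key) -> resource identifiers (value)."""
    index: Dict[int, List[str]] = defaultdict(list)
    for resource_id, numeric_ids in resources:
        # A resource may emit the same identifier from several semantic vectors;
        # it is stored only once under each key.
        for nid in set(numeric_ids):
            index[nid].append(resource_id)
    return dict(index)


index = build_resource_index([("song_a", [5, 12]), ("song_b", [5, 7]), ("song_c", [12])])
print(index[5], index[12])  # ['song_a', 'song_b'] ['song_a', 'song_c']
```

Because one key collects every resource that produced it, the "one-to-many" mapping follows directly from the data structure.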
[0089] S240. In response to a resource search request, obtain the resource search text and determine at least one retrieval numeric identifier corresponding to the resource search text.
[0090] S250. Based on the at least one retrieval numeric identifier, retrieve the resource identifier of at least one candidate media resource from the resource index information.
[0091] S260. Determine semantic relevance based on the resource identifiers of candidate media resources.
[0092] S270. Based on semantic relevance, determine the target media resource corresponding to the resource search text from at least one candidate media resource, and return the target media resource as the search result.
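Steps S240 to S270 can be strung together in a short Python sketch. The query encoder and the relevance scorer are passed in as callables because this disclosure describes them only as a pre-trained text encoding model and a relevance computation; the names `encode_query` and `relevance`, and the lambda stand-ins in the example, are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set


def search(
    query: str,
    index: Dict[int, List[str]],
    encode_query: Callable[[str], Sequence[int]],  # hypothetical: text -> retrieval numeric IDs
    relevance: Callable[[str, str], float],        # hypothetical: (query, resource_id) -> score
    top_n: int = 10,
) -> List[str]:
    # S240: map the resource search text to one or more retrieval numeric identifiers.
    numeric_ids = encode_query(query)
    # S250: look up each identifier and merge the hits into one candidate set.
    candidates: Set[str] = set()
    for nid in numeric_ids:
        candidates.update(index.get(nid, []))
    # S260: score every candidate media resource against the search text.
    scored = [(relevance(query, rid), rid) for rid in candidates]
    # S270: highest score first (ties broken by identifier), truncated to top_n.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [rid for _, rid in scored[:top_n]]


index = {5: ["song_a", "song_b"], 12: ["song_c"]}
scores = {"song_a": 0.4, "song_b": 0.9, "song_c": 0.7}
print(search("quiet healing songs", index, lambda q: [5, 12], lambda q, rid: scores[rid], top_n=2))
# ['song_b', 'song_c']
```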
[0093] In the technical solution of this disclosure, a retrieval text sequence is determined for each of multiple media resources, where the retrieval text sequence consists of at least one piece of retrieval association information, including at least one of resource information, resource content information, feedback information, and resource attribute information. The multiple retrieval text sequences are then input into a pre-trained text encoding model to obtain multiple retrieval numeric identifiers for each media resource, and the resource index information is determined from the resource identifiers of the media resources and their retrieval numeric identifiers. This yields a structured, lightweight representation of the semantic features of the media resources and lays a solid data foundation for efficient recall and accurate matching in subsequent online semantic retrieval.
[0094] This disclosure provides an optional embodiment of a resource retrieval method, the specific implementation of which can be found in the following embodiments. Technical features that are the same as or similar to those in the above embodiments will not be repeated here. This resource retrieval method can retrieve target songs based on song search text. The following uses songs as an example of media resources to describe this resource retrieval method.
[0095] First, the system receives not only basic song metadata (such as song title, artist, and album description) but also user feedback data, especially high-quality user comments and posts, as well as lyrics. The preprocessing module cleans, concatenates, and structurally assembles these multi-source texts into a composite text sequence that carries both explicit information and implicit meaning, i.e., the song retrieval text sequence, which serves as the complete feature input for subsequent semantic extraction.
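The cleaning and concatenation step can be sketched in Python as follows. The field names, the `[field]` tags used as separators, the " | " joiner for comments, and the cap on the number of comments are all hypothetical choices; this disclosure does not specify the exact splicing format.

```python
import re
from typing import Dict, List

# Hypothetical field order for the composite song retrieval text sequence.
FIELD_ORDER = ["title", "artist", "album", "lyrics", "comments"]


def clean(text: str) -> str:
    """Collapse runs of whitespace (including line breaks in lyrics) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def build_retrieval_text(song: Dict[str, object], max_comments: int = 3) -> str:
    parts: List[str] = []
    for field in FIELD_ORDER:
        value = song.get(field)
        if not value:
            continue  # skip missing or empty sources
        if isinstance(value, list):  # e.g. several user comments
            text = " | ".join(clean(str(v)) for v in value[:max_comments])
        else:
            text = clean(str(value))
        parts.append(f"[{field}] {text}")
    return " ".join(parts)


song = {"title": "Night  Drive", "artist": "X", "lyrics": "city lights\nfade away",
        "comments": ["so calm", "healing"]}
print(build_retrieval_text(song))
# [title] Night Drive [artist] X [lyrics] city lights fade away [comments] so calm | healing
```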
[0096] Next, the concatenated composite text sequence is input into the semantic encoding module (based on the BERT Transformer architecture) of the text encoding model. To capture the polysemy of music (e.g., a song can simultaneously carry the attributes "sadness" and "healing"), the model uses multiple sets of learnable vectors to represent semantics along different dimensions. After multiple Transformer encoding layers, the module outputs multiple independent continuous floating-point vectors, i.e., multiple resource semantic vectors. Each song is thus represented not by a single vector but by a semantic vector group made up of these high-dimensional vectors, which provides a rich, decoupled semantic basis for the subsequent quantization and inverted index construction.
[0097] Furthermore, searching a library of hundreds of millions of songs with dense vectors incurs high storage costs, a large memory footprint, and high retrieval latency; to address this, a quantization encoding module can be introduced. This module, located at the output of the semantic encoding module, transforms the high-dimensional continuous floating-point vectors generated in the previous stage into compact, machine-indexable discrete integers, i.e., semantic identifiers. This process essentially extracts the "semantic fingerprints" of music entities in the latent space, retaining the generalization and understanding capabilities of deep learning while fitting an efficient inverted index structure. The original semantic vectors output by the semantic encoding module typically have high dimensionality, and quantizing them directly would produce an overly sparse encoding space and redundant computation. The system therefore first maps the original resource semantic vectors to a low-dimensional subspace through a linear projection layer, obtaining low-dimensional semantic vectors. A Finite Scalar Quantization (FSQ) layer then applies the FSQ algorithm to discretize each dimension of the low-dimensional semantic vector independently, turning each low-dimensional semantic vector into a binary string of 0s and 1s, which is then parsed into a unique integer identifier. For example, a song's vector in the "emotion" dimension might be quantized into an integer identifier that represents the "healing / quiet" semantic dimension.
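Finite scalar quantization can be illustrated with a simplified Python sketch. Each dimension is squashed with tanh into a bounded range, rounded to one of L levels, and the per-dimension codes are packed into a single integer in mixed radix. With every L equal to 2, the codes form exactly the binary string described above and the packing is ordinary binary-to-decimal conversion. This is an inference-time simplification under stated assumptions (no straight-through gradient, no even-L offset as in the published FSQ formulation), and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def fsq_codes(z: Sequence[float], levels: Sequence[int]) -> List[int]:
    """Quantize each dimension independently to an integer code in {0, ..., L - 1}."""
    codes = []
    for value, L in zip(z, levels):
        half = (L - 1) / 2
        bounded = math.tanh(value) * half          # squash into (-half, half)
        codes.append(int(round(bounded + half)))   # shift into (0, L - 1), round to a level
    return codes


def codes_to_id(codes: Sequence[int], levels: Sequence[int]) -> int:
    """Mixed-radix packing, most significant dimension first; all L == 2 gives plain binary."""
    value = 0
    for c, L in zip(codes, levels):
        value = value * L + c
    return value


low_dim = [1.2, -0.7, 0.3, -2.0]
codes = fsq_codes(low_dim, [2, 2, 2, 2])
print(codes, codes_to_id(codes, [2, 2, 2, 2]))  # [1, 0, 1, 0] 10
```

The number of possible identifiers equals the product of the per-dimension level counts (16 in the binary example), which is the size of the key space the inverted index must hold.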
[0098] Furthermore, the system abandons traditional indexing methods and constructs a semantic inverted index based on the generated retrieval identifiers, using the retrieval identifiers as index keys and song identifiers as index values, thus obtaining resource index information. Since the retrieval identifiers are highly compressed integer codes, this structure can directly reuse existing high-performance search engine kernels, significantly reducing storage costs and supporting high-concurrency reads while maintaining semantic generalization capabilities.
[0099] Furthermore, during the online retrieval phase, the system receives the song search text entered by the user and processes it with the text encoding model to obtain at least one retrieval numeric identifier. A preset matching strategy then scans and merges the corresponding entries of the resource index information to recall a set of semantically relevant candidate songs. Under this strategy, a song is recalled as long as it overlaps with the query in at least one semantic dimension (for example, only in the sentiment dimension), which significantly improves the recall rate for long-tail and fuzzy queries.
[0100] Furthermore, in the fine-ranking (re-ranking) stage, a fine-grained interaction score between the search semantic vector and each song semantic vector is computed for the recalled candidate songs; results with weak semantic relevance are eliminated, and the remaining list is sorted by relevance score and truncated, so that the music content that best matches the user's intent is sent to the client.
[0101] This technical solution significantly improves the semantic generalization ability and long-tail content distribution efficiency of music retrieval systems, resolving the technical bottlenecks of traditional keyword search in "general semantic song search" and "fuzzy lyric matching." By mapping music entities from the text space to a discrete semantic space, the system can bridge the literal gap between query and metadata and capture the deeper intent behind user queries. Even if a song's metadata does not contain the specific emotional words or scene descriptions the user entered, the song can still be recalled with fault tolerance as long as the two overlap in the implicit semantic dimensions built from lyrics and popular song comments (i.e., their semantic IDs match). This mechanism greatly improves the search success rate when users enter vague memories or abstract concepts, and it also activates the platform's large long-tail and niche music library, so that high-quality music lacking popular tags can be reached through its inherent mood and style, significantly improving the click-through rate and content distribution efficiency of long-tail queries.
[0102] Secondly, this technical solution significantly reduces engineering maintenance costs and computational resource consumption while achieving deep semantic understanding, enabling efficient industrial-grade deployment. Unlike dense retrieval schemes that rely on expensive vector databases, this solution reuses the mature and efficient inverted index data structure. It uses finite scalar quantization to compress high-dimensional neural network vectors into compact integer sequences, preserving the model's generalization ability while reducing storage and memory usage severalfold and maintaining millisecond-level response times. Combined with fine-grained re-ranking, the system does not require manual maintenance of large thesauri or complex rule engines and adapts automatically to evolving music culture and new vocabulary. This allows it to serve the real-time, high-concurrency search needs of hundreds of millions of users at lower computational cost, demonstrating significant commercial potential.
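One way to "scan and merge" the index entries is a k-way merge of posting lists, sketched below in Python with `heapq.merge`. The sketch assumes integer resource identifiers, posting lists kept sorted in ascending order with no duplicates inside a list, and an OR semantics in which any shared identifier is enough for recall. The per-resource count of matched identifiers is an extra signal added here for illustration; this disclosure does not say the preset matching strategy computes it.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def or_merge(index: Dict[int, List[int]], query_ids: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (resource_id, number_of_matched_query_ids) for every resource hit at least once.

    Posting lists are assumed sorted ascending, so heapq.merge streams them in one pass.
    """
    # Deduplicate the query identifiers while keeping their order.
    postings = [index.get(qid, []) for qid in dict.fromkeys(query_ids)]
    merged: List[Tuple[int, int]] = []
    for rid in heapq.merge(*postings):
        if merged and merged[-1][0] == rid:
            merged[-1] = (rid, merged[-1][1] + 1)  # same resource reached via another identifier
        else:
            merged.append((rid, 1))
    return merged


index = {5: [1, 4, 9], 12: [4, 7], 30: [2]}
print(or_merge(index, [5, 12, 99]))  # [(1, 1), (4, 2), (7, 1), (9, 1)]
```

Taking the union of the hits over all identifiers of the search text is what allows a match on a single semantic dimension to be sufficient for recall.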
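The eliminate, sort, and truncate steps of the fine-ranking stage reduce to a few lines once each candidate has an interaction score. In the Python sketch below, the scores are given as input because the interaction scoring itself is not specified in this disclosure; the minimum-score threshold and the tie-breaking by identifier are hypothetical choices.

```python
from typing import Dict, List, Tuple


def rerank(scores: Dict[str, float], min_score: float, top_k: int) -> List[Tuple[str, float]]:
    """Drop weakly relevant candidates, sort by score (descending), and truncate to top_k."""
    kept = [(rid, s) for rid, s in scores.items() if s >= min_score]
    kept.sort(key=lambda item: (-item[1], item[0]))  # highest score first, ties by identifier
    return kept[:top_k]


print(rerank({"a": 0.2, "b": 0.85, "c": 0.6, "d": 0.85}, min_score=0.5, top_k=2))
# [('b', 0.85), ('d', 0.85)]
```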
[0103] Figure 3 is a schematic structural diagram of a resource retrieval device provided in an embodiment of this disclosure. As shown in Figure 3, the resource retrieval device includes a search text acquisition module 310, a resource retrieval module 320, a relevance determination module 330, and a target resource determination module 340. The search text acquisition module 310 is used to acquire the resource search text in response to a resource search request and determine at least one retrieval numeric identifier corresponding to the resource search text, where the retrieval numeric identifier characterizes the resource semantic information of the resource search text. The resource retrieval module 320 is used to retrieve the resource identifier of at least one candidate media resource from the resource index information based on the at least one retrieval numeric identifier, where the resource index information is constructed from the retrieval numeric identifiers and the resource identifiers of the media resources corresponding to them. The relevance determination module 330 is used to determine semantic relevance based on the resource identifiers of the candidate media resources. The target resource determination module 340 is used to determine, according to the semantic relevance, the target media resource corresponding to the resource search text from the at least one candidate media resource, and to return the target media resource as the search result.
[0104] In the technical solution of this disclosure, in response to a resource search request, the resource search text is acquired and at least one retrieval numeric identifier corresponding to it is determined. The retrieval numeric identifier characterizes the resource semantic information of the resource search text, transforming unstructured search text into an indexable, computable identifier. This gives the deep semantic information of the search text a structured representation, overcomes the limitations of literal keyword matching, and lays the data foundation for subsequent precise retrieval along semantic dimensions. Retrieving the resource identifier of at least one candidate media resource from the resource index information based on the at least one retrieval numeric identifier performs matching and retrieval directly on semantic dimensions, avoiding the semantic blind spots of traditional keyword indexing, while the lightweight integer identifiers and inverted index structure significantly improve retrieval response efficiency. Determining the semantic relevance based on the resource identifiers of the candidate media resources provides refined semantic screening of the candidates, eliminating resources weakly related to the search intent and providing a precise quantitative basis for ranking the results. Determining the target media resource from the at least one candidate media resource according to the semantic relevance, and returning it as the search result, matches the search results to the user's search intent, improves result quality and the user's search experience, and keeps irrelevant resources out of the displayed results. The technical solution of this disclosure thus addresses the low retrieval accuracy and poor recall caused by relying on keyword matching in related technologies: retrieval numeric identifiers that represent resource semantic information allow resources meeting the search need to be retrieved accurately, beyond the semantic blind spot of keyword retrieval, while lightweight numeric identifiers and the inverted index structure improve retrieval response efficiency, significantly improving both retrieval accuracy and user experience.
[0105] In some embodiments of this disclosure, optionally, the search text acquisition module 310 includes a retrieval numeric identifier determination unit, which is used to input the resource search text into a pre-trained text encoding model to obtain at least one retrieval numeric identifier corresponding to the resource search text.
[0106] In some embodiments of this disclosure, optionally, the resource retrieval module 320 includes a resource identifier acquisition unit and a resource identifier determination unit. The resource identifier acquisition unit is configured to acquire, from the resource index information, the resource identifiers of the media resources associated with each of the at least one retrieval numeric identifier corresponding to the resource search text; the resource identifier determination unit is configured to take the union of the resource identifiers acquired for the at least one retrieval numeric identifier to determine the resource identifier of at least one candidate media resource corresponding to the resource search text.
[0107] In some embodiments of this disclosure, optionally, the relevance determination module 330 includes a retrieval text sequence determination unit, a vector determination unit, and a relevance determination unit. The retrieval text sequence determination unit is used to retrieve at least one piece of retrieval association information corresponding to the candidate media resource based on its resource identifier, and to determine the retrieval text sequence corresponding to the candidate media resource from the at least one piece of retrieval association information; the vector determination unit is used to determine at least one resource semantic vector corresponding to the candidate media resource based on its retrieval text sequence, and to determine at least one search semantic vector corresponding to the resource search text; the relevance determination unit is used to determine the semantic relevance between the candidate media resource and the resource search text based on the at least one resource semantic vector and the at least one search semantic vector.
[0108] In some embodiments of this disclosure, optionally, the vector determination unit includes: a search vector determination subunit, used to perform semantic encoding on the resource search text through a semantic encoding module in a text encoding model, so as to obtain at least one search semantic vector corresponding to the resource search text.
[0109] In some embodiments of this disclosure, optionally, the relevance determination unit is specifically configured to, for at least one of the resource semantic vectors, determine the relevance between the resource semantic vector and at least one of the search semantic vectors through a relevance determination model, to obtain at least one vector relevance corresponding to the resource semantic vector; and perform an aggregation operation on the at least one vector relevance corresponding to at least one of the resource semantic vectors to obtain the semantic relevance between the candidate media resource and the resource search text.
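The per-vector relevance and aggregation described above can be sketched in Python. Cosine similarity stands in for the relevance determination model, and max (with mean as an alternative) stands in for the aggregation operation; both substitutions are assumptions, since this disclosure names neither the model nor the aggregation function.

```python
import math
from statistics import mean
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Stand-in for the relevance determination model (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_relevance(
    resource_vecs: List[Vector],
    search_vecs: List[Vector],
    pair_score: Callable[[Vector, Vector], float] = cosine,
    aggregate: Callable[[List[float]], float] = max,
) -> float:
    """Score every (resource vector, search vector) pair, then aggregate into one relevance."""
    vector_relevances = [pair_score(r, q) for r in resource_vecs for q in search_vecs]
    return aggregate(vector_relevances)


song_vecs = [[1.0, 0.0], [0.0, 1.0]]  # e.g. an "emotion" vector and a "scene" vector
query_vecs = [[1.0, 0.0]]
print(semantic_relevance(song_vecs, query_vecs))                  # 1.0
print(semantic_relevance(song_vecs, query_vecs, aggregate=mean))  # 0.5
```

Max aggregation rewards a song that strongly matches the query on any single semantic dimension, consistent with the multi-vector representation; mean aggregation instead favors songs that match across all of their vectors. Both inputs must be non-empty, since `max` and `mean` raise on an empty list.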
[0110] In some embodiments of this disclosure, optionally, the target resource determination module 340 includes a target resource determination unit, configured to sort the at least one candidate media resource in descending order of semantic relevance and determine a preset number of top-ranked candidate media resources as the target media resources corresponding to the resource search text; the target resource determination module 340 further includes a search result return unit, configured to sort the target media resources in descending order of semantic relevance and return the sorted target media resources as the search result.
[0111] In some embodiments of this disclosure, optionally, the apparatus further includes a retrieval text sequence determination module, a retrieval numeric identifier determination module, and a resource index information determination module. The retrieval text sequence determination module is used to determine the retrieval text sequences corresponding to multiple media resources, where each retrieval text sequence consists of at least one piece of retrieval association information, and the retrieval association information includes at least one of resource information, resource content information, feedback information, and resource attribute information; the retrieval numeric identifier determination module is used to input the multiple retrieval text sequences into a pre-trained text encoding model to obtain multiple retrieval numeric identifiers corresponding to each media resource; the resource index information determination module is used to determine the resource index information based on the resource identifiers of the multiple media resources and the multiple retrieval numeric identifiers corresponding to the media resources.
[0112] In some embodiments of this disclosure, optionally, the text encoding model includes a semantic encoding module and a quantization encoding module; the retrieval numeric identifier determination module includes a resource semantic vector determination unit and a retrieval numeric identifier determination unit. The resource semantic vector determination unit is used to input the retrieval text sequence corresponding to each media resource into the semantic encoding module to obtain multiple resource semantic vectors corresponding to the media resource; the retrieval numeric identifier determination unit is used to input the multiple resource semantic vectors corresponding to the media resource into the quantization encoding module to obtain multiple retrieval numeric identifiers corresponding to the media resource.
[0113] In some embodiments of this disclosure, optionally, the quantization encoding module includes a linear projection layer, a vector quantization layer, and a numerical parsing layer; the retrieval numeric identifier determination unit is specifically used to reduce and compress the dimensionality of each resource semantic vector through the linear projection layer to obtain a low-dimensional semantic vector; perform binary quantization on the low-dimensional semantic vector through the vector quantization layer to obtain the corresponding binary sequence; and perform decimal conversion on the binary sequence through the numerical parsing layer to obtain the retrieval numeric identifier corresponding to the resource semantic vector.
[0114] In some embodiments of this disclosure, optionally, the resource index information determination module is specifically used to take each retrieval numeric identifier as an index key and the resource identifier of at least one media resource corresponding to that retrieval numeric identifier as the index value, to obtain the resource index information.
[0115] The resource retrieval device provided in this disclosure can execute the resource retrieval method provided in any embodiment of this disclosure, and has the functional modules and beneficial effects corresponding to the method.
[0116] It is worth noting that the various units and modules included in the above-mentioned resource retrieval device are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, the specific names of each functional unit are only for easy differentiation and are not used to limit the protection scope of the embodiments of this disclosure.
[0117] Figure 4 is a schematic structural diagram of an electronic device provided in an embodiment of this disclosure. The electronic device 10 is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smartphones, wearable devices (such as helmets, glasses, and watches), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementations of this disclosure described and / or claimed herein.
[0118] As shown in Figure 4, the electronic device 10 includes at least one processor 11 and a memory, such as a read-only memory (ROM) 12 or a random access memory (RAM) 13, communicatively connected to the at least one processor 11. The memory stores computer programs executable by the at least one processor. The processor 11 can perform various appropriate actions and processes based on a computer program stored in the ROM 12 or loaded from the storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data required for the operation of the electronic device 10. The processor 11, ROM 12, and RAM 13 are interconnected via a bus 14. An input / output (I / O) interface 15 is also connected to the bus 14.
[0119] Multiple components in electronic device 10 are connected to I / O interface 15, including: input unit 16, such as keyboard, mouse, etc.; output unit 17, such as various types of displays, speakers, etc.; storage unit 18, such as disk, optical disk, etc.; and communication unit 19, such as network card, modem, wireless transceiver, etc. Communication unit 19 allows electronic device 10 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.
[0120] Processor 11 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. Processor 11 performs the various methods and processes described above, such as resource retrieval methods.
[0121] In some embodiments, the resource retrieval method may be implemented as a computer program tangibly contained in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and / or installed on electronic device 10 via read-only memory (ROM) 12 and / or communication unit 19. When the computer program is loaded into random access memory (RAM) 13 and executed by processor 11, one or more steps of the resource retrieval method described above may be performed. Alternatively, in other embodiments, processor 11 may be configured to perform the resource retrieval method by any other suitable means (e.g., by means of firmware).
[0122] Various embodiments of the systems and techniques described herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementation in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.
[0123] Computer programs used to implement the resource retrieval methods of this disclosure may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor, the computer programs cause the functions / operations specified in the flowcharts and / or block diagrams to be implemented. The computer programs may be executed entirely on a machine, partly on a machine, as a standalone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
[0124] This disclosure provides a computer-readable storage medium storing computer instructions for causing a processor to execute a resource retrieval method, comprising: in response to a resource search request, acquiring resource search text and determining at least one retrieval numeric identifier corresponding to the resource search text; wherein the retrieval numeric identifier is used to characterize resource semantic information of the resource search text; retrieving resource identifiers of at least one candidate media resource from resource index information based on at least one retrieval numeric identifier; wherein the resource index information is determined based on the retrieval numeric identifier and the resource identifier of the media resource corresponding to the retrieval numeric identifier; determining semantic relevance based on the resource identifier of the candidate media resource; determining a target media resource corresponding to the resource search text from at least one candidate media resource according to the semantic relevance, and returning the target media resource as a search result.
[0125] In the context of this disclosure, a computer-readable storage medium can be a tangible medium that may contain or store a computer program for use by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable storage medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. Alternatively, a computer-readable storage medium can be a machine-readable signal medium. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
[0126] To provide interaction with a user, the systems and techniques described herein can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the electronic device. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).
[0127] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as data servers), or middleware components (e.g., application servers), or frontend components (e.g., user computers with graphical user interfaces or web browsers through which users can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., communication networks). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.
[0128] A computing system can include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a hosting product in the cloud computing service system that addresses the shortcomings of traditional physical hosts and VPS services, such as difficult management and weak business scalability.
[0129] In particular, according to embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication unit 19, or installed from storage unit 18, or installed from ROM 12. When the computer program is executed by processor 11, it performs the functions defined in the methods of embodiments of this disclosure.
[0130] This disclosure also provides a computer program product, including a computer program that, when executed by a processor, implements a resource retrieval method according to any embodiment of this disclosure.
[0131] Computer program code for implementing the computer program product and performing the operations of this disclosure can be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as C or similar languages. The program code can be executed entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (e.g., via the Internet using an Internet service provider).
[0132] It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this disclosure can be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution of this disclosure can be achieved; no limitation is imposed herein.
[0133] The specific embodiments described above do not constitute a limitation on the scope of protection of this disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this disclosure should be included within the scope of protection of this disclosure.
Claims
1. A resource retrieval method, characterized by comprising: in response to a resource search request, obtaining a resource search text, and determining at least one retrieval numeric identifier corresponding to the resource search text, wherein the retrieval numeric identifier represents resource semantic information of the resource search text; retrieving, based on the at least one retrieval numeric identifier, resource identifiers of at least one candidate media resource from resource index information, wherein the resource index information is determined based on retrieval numeric identifiers and the resource identifiers of the media resources corresponding to those retrieval numeric identifiers; determining, based on the resource identifiers of the candidate media resources, the semantic relevance between each candidate media resource and the resource search text; and determining, according to the semantic relevance, a target media resource corresponding to the resource search text from the at least one candidate media resource, and returning the target media resource as a search result.
2. The resource retrieval method according to claim 1, wherein determining the at least one retrieval numeric identifier corresponding to the resource search text comprises: inputting the resource search text into a pre-trained text encoding model to obtain the at least one retrieval numeric identifier corresponding to the resource search text.
3. The resource retrieval method according to claim 1, wherein determining the semantic relevance based on the resource identifiers of the candidate media resources comprises: retrieving, based on the resource identifier of a candidate media resource, at least one piece of search-related information corresponding to the candidate media resource, and determining a search text sequence corresponding to the candidate media resource based on the at least one piece of search-related information; determining at least one resource semantic vector corresponding to the candidate media resource according to the search text sequence corresponding to the candidate media resource; determining at least one search semantic vector corresponding to the resource search text; and determining the semantic relevance between the candidate media resource and the resource search text according to the at least one resource semantic vector and the at least one search semantic vector.
4. The resource retrieval method according to claim 3, wherein determining the at least one search semantic vector corresponding to the resource search text comprises: performing semantic encoding on the resource search text through a semantic encoding module of a text encoding model to obtain the at least one search semantic vector corresponding to the resource search text.
5. The resource retrieval method according to claim 3, wherein determining the semantic relevance between the candidate media resource and the resource search text comprises: for each of the at least one resource semantic vector, determining, through a relevance determination model, the relevance between the resource semantic vector and each of the at least one search semantic vector, to obtain at least one vector relevance corresponding to the resource semantic vector; and performing an aggregation operation on the vector relevances corresponding to the at least one resource semantic vector to obtain the semantic relevance between the candidate media resource and the resource search text.
6. The resource retrieval method according to claim 1, wherein determining the target media resource comprises: sorting the at least one candidate media resource in descending order of semantic relevance, and determining the top-ranked candidate media resources as the target media resources corresponding to the resource search text; and returning the target media resource as a search result comprises: sorting the target media resources in descending order of semantic relevance, and returning the sorted target media resources as the search result.
7. The resource retrieval method according to claim 1, further comprising: determining a search text sequence corresponding to each of multiple media resources, wherein the search text sequence consists of at least one piece of search-related information, the search-related information including at least one of resource information, resource content information, feedback information, and resource attribute information; inputting the multiple search text sequences into a pre-trained text encoding model to obtain multiple retrieval numeric identifiers corresponding to each media resource; and determining the resource index information based on the resource identifiers of the multiple media resources and the multiple retrieval numeric identifiers corresponding to the media resources.
8. The resource retrieval method according to claim 7, wherein the text encoding model includes a semantic encoding module and a quantization encoding module, and inputting the multiple search text sequences into the pre-trained text encoding model to obtain the multiple retrieval numeric identifiers corresponding to each media resource comprises: for each media resource, inputting the search text sequence corresponding to the media resource into the semantic encoding module to obtain multiple resource semantic vectors corresponding to the media resource; and inputting the multiple resource semantic vectors corresponding to the media resource into the quantization encoding module to obtain the multiple retrieval numeric identifiers corresponding to the media resource.
9. The resource retrieval method according to claim 8, wherein the quantization encoding module includes a linear projection layer, a vector quantization layer, and a numerical parsing layer, and inputting the multiple resource semantic vectors corresponding to the media resource into the quantization encoding module to obtain the multiple retrieval numeric identifiers corresponding to the media resource comprises: for each of the multiple resource semantic vectors, reducing the dimensionality of the resource semantic vector through the linear projection layer to obtain a low-dimensional semantic vector; performing binary quantization on the low-dimensional semantic vector through the vector quantization layer to obtain a binary sequence corresponding to the low-dimensional semantic vector; and converting the binary sequence into a decimal number through the numerical parsing layer to obtain the retrieval numeric identifier corresponding to the resource semantic vector.
10. The resource retrieval method according to claim 7, wherein determining the resource index information based on the resource identifiers of the multiple media resources and the multiple retrieval numeric identifiers corresponding to the media resources comprises: using each retrieval numeric identifier as an index key and the resource identifiers of the at least one media resource corresponding to that retrieval numeric identifier as the index value, to obtain the resource index information.
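The retrieval flow recited in claims 1 to 10 can be illustrated with a minimal Python sketch under toy assumptions. It is not the implementation of this disclosure: the hashed bag-of-words function `encode` is a hypothetical stand-in for the pre-trained text encoding model, a fixed random Gaussian matrix replaces the learned linear projection layer (so the binary code behaves like SimHash-style random-hyperplane hashing), cosine similarity replaces the relevance determination model, and taking the maximum is an assumed aggregation operation. The names `encode`, `to_numeric_id`, `build_index`, `relevance`, `search`, `DIM`, and `BITS`, and the sample catalog, are all hypothetical.

```python
import math
import random
import zlib

DIM = 32   # width of the toy semantic vectors (assumed)
BITS = 6   # length of each binary code, so identifiers lie in [0, 2**BITS) (assumed)


def encode(text: str) -> list[float]:
    """Hypothetical stand-in for the semantic encoding module: a signed,
    hashed bag-of-words vector normalised to unit length."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        h = zlib.crc32(token.encode("utf-8"))  # deterministic across runs
        vec[h % DIM] += 1.0 if (h >> 16) & 1 else -1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


# Linear projection layer: a fixed random Gaussian matrix (SimHash-style),
# standing in for a learned projection.
_rng = random.Random(42)
PROJECTION = [[_rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(BITS)]


def to_numeric_id(vec: list[float]) -> int:
    # Linear projection: DIM dimensions -> BITS dimensions.
    low_dim = [sum(w * x for w, x in zip(row, vec)) for row in PROJECTION]
    # Vector quantization: binarise each component by its sign.
    bits = "".join("1" if v > 0 else "0" for v in low_dim)
    # Numerical parsing: read the binary sequence as a decimal integer.
    return int(bits, 2)


def build_index(resources: dict[str, list[str]]) -> dict[int, set[str]]:
    """Index key: retrieval numeric identifier; index value: resource identifiers.
    `resources` maps each resource identifier to its search text sequence."""
    index: dict[int, set[str]] = {}
    for resource_id, sequence in resources.items():
        for field in sequence:  # one semantic vector per piece of information
            index.setdefault(to_numeric_id(encode(field)), set()).add(resource_id)
    return index


def relevance(resource_vecs: list[list[float]], query_vecs: list[list[float]]) -> float:
    # Cosine similarity of unit vectors stands in for the relevance model;
    # taking the maximum over all pairs is an assumed aggregation operation.
    return max(sum(a * b for a, b in zip(r, q)) for r in resource_vecs for q in query_vecs)


def search(query: str, resources: dict[str, list[str]],
           index: dict[int, set[str]], top_k: int = 3) -> list[tuple[str, float]]:
    query_vecs = [encode(query)]  # a single query vector in this sketch
    candidates: set[str] = set()
    for numeric_id in {to_numeric_id(v) for v in query_vecs}:
        candidates |= index.get(numeric_id, set())
    scored = [(rid, relevance([encode(f) for f in resources[rid]], query_vecs))
              for rid in candidates]
    # Descending relevance; ties broken by identifier for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    catalog = {
        "song-001": ["rainy night piano", "calm and melancholy"],
        "song-002": ["summer road trip anthem", "upbeat and energetic"],
    }
    index = build_index(catalog)
    print(search("calm and melancholy", catalog, index))
```

Each piece of search-related information is encoded and quantized separately, so one resource can appear under several index keys, mirroring the multiple retrieval numeric identifiers of claims 7 and 8. Because the toy encoder sees only surface tokens, it does not capture synonyms the way a trained encoder would; the sketch shows only the data flow (encode, quantize, look up index keys, re-score, sort).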