Writing material recommendation method and device, electronic equipment and storage medium

By using natural language understanding technology to calculate the relevance between writing materials and query statements, and combining multiple features to adjust the ranking, the problem of low relevance in writing material recommendations in existing technologies has been solved, resulting in more accurate recommendations and higher user satisfaction.

CN115455152B · Active · Publication Date: 2026-06-26 · BEIJING CENTURY TAL EDUCATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BEIJING CENTURY TAL EDUCATION TECH CO LTD
Filing Date
2022-09-29
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing writing material recommendation methods are based on keyword matching, without considering the meaning of words. This results in low relevance between the retrieved writing materials and the query, poor recommendation performance, and ineffective ranking, which affects user satisfaction.

Method used

Using natural language understanding technology, through topic word extraction, part-of-speech fusion, and semantic understanding, combined with part-of-speech dimension features and semantic features, the relevance between the query statement and candidate materials is calculated. When making recommendations, the number of keywords, topic words, recall scores, and material tags are integrated to adjust the recommendation ranking.

Benefits of technology

The method improves the accuracy of writing material recommendations and user satisfaction: the materials most relevant to the query are ranked higher, which enhances the recommendation effect.

✦ Generated by Eureka AI based on patent content.


Abstract

The present disclosure provides a writing material recommendation method and device, electronic equipment and storage medium, the method comprising: receiving a query sentence input by a user; determining a plurality of candidate materials related to the query sentence from a material library according to the query sentence; performing word segmentation and part-of-speech statistics on the query sentence and each candidate material to obtain the part-of-speech dimension feature corresponding to each candidate material; concatenating and encoding the query sentence and each candidate material to obtain the semantic feature corresponding to each candidate material; determining the relevance between the query sentence and each candidate material according to the part-of-speech dimension feature and the semantic feature; and determining a first predetermined number of target materials from the plurality of candidate materials according to the relevance for recommendation. The present scheme can improve the accuracy of relevance calculation and thus improve the accuracy of writing material recommendation.

Description

Technical Field

[0001] This disclosure relates to the field of machine learning technology, and in particular to a method, apparatus, electronic device, and storage medium for recommending writing materials.

Background Technology

[0002] The ranking of Chinese writing materials is crucial in assisting students in writing essays or speeches. Currently, keyword matching is commonly used for recommending writing materials. However, this keyword-based method does not consider the meaning of words and ignores polysemy, so the retrieved writing materials have low relevance to the query and the recommendation performance is poor.

Summary of the Invention

[0003] In order to solve the above-mentioned technical problems, or at least partially solve the above-mentioned technical problems, the present disclosure provides a method, apparatus, electronic device and storage medium for recommending writing materials.

[0004] According to one aspect of this disclosure, a method for recommending writing material is provided, including:

[0005] Receive query statements input by the user;

[0006] Based on the query statement, multiple candidate materials related to the query statement are determined from the material library;

[0007] The query statement and each candidate material are segmented and part-of-speech statistics are performed to obtain the part-of-speech dimension features corresponding to each candidate material;

[0008] The query statement and each candidate material are concatenated and encoded to obtain the semantic features corresponding to each candidate material;

[0009] Based on the part-of-speech dimension features and the semantic features, the relevance between the query statement and each candidate material is determined;

[0010] Based on the relevance, a first preset number of target materials are determined from the plurality of candidate materials for recommendation.

[0011] According to another aspect of this disclosure, a writing material recommendation device is provided, comprising:

[0012] The receiving module is used to receive the query statement input by the user;

[0013] The first determining module is used to determine multiple candidate materials related to the query statement from the material library based on the query statement;

[0014] The first acquisition module is used to perform word segmentation and part-of-speech statistics on the query statement and each candidate material to obtain the part-of-speech dimension features corresponding to each candidate material;

[0015] The second acquisition module is used to concatenate and encode the query statement and each candidate material to obtain the semantic features corresponding to each candidate material.

[0016] The second determining module is used to determine the relevance between the query statement and each candidate material based on the part-of-speech dimension features and the semantic features;

[0017] The recommendation module is used to determine a first preset number of target materials from the plurality of candidate materials based on the relevance and to recommend them.

[0018] According to another aspect of this disclosure, an electronic device is provided, comprising:

[0019] Processor; and

[0020] A memory storing a program,

[0021] The program includes instructions that, when executed by the processor, cause the processor to perform the method for recommending writing materials according to the foregoing aspect.

[0022] According to another aspect of this disclosure, a non-transitory computer-readable storage medium is provided storing computer instructions, wherein the computer instructions are used to cause the computer to perform the method for recommending writing materials according to the foregoing aspect.

[0023] According to another aspect of this disclosure, a computer program product is provided, comprising a computer program, wherein the computer program, when executed by a processor, implements the method for recommending writing materials as described in the foregoing aspect.

[0024] One or more technical solutions provided in this disclosure receive a query statement input by a user, determine multiple candidate materials related to the query statement from a material library based on the query statement, then perform word segmentation and part-of-speech statistics on the query statement and each candidate material to obtain the part-of-speech dimension features corresponding to each candidate material, and concatenate and encode the query statement and each candidate material to obtain the semantic features corresponding to each candidate material. Based on the part-of-speech dimension features and semantic features, the relevance between the query statement and each candidate material is determined, and based on the relevance, a first preset number of target materials are selected from the multiple candidate materials for recommendation. Using the solution of this disclosure can improve the accuracy of relevance calculation, thereby improving the accuracy of writing material recommendation.

Attached Figure Description

[0025] Further details, features, and advantages of this disclosure are disclosed in the following description of exemplary embodiments in conjunction with the accompanying drawings, in which:

[0026] Figure 1 shows a schematic diagram of the module architecture for implementing the writing material recommendation method of this disclosure;

[0027] Figure 2 shows a flowchart of a method for recommending writing materials according to an exemplary embodiment of this disclosure;

[0028] Figure 3 shows a schematic diagram of a business process for determining the semantic features of candidate materials according to an exemplary embodiment of the present disclosure;

[0029] Figure 4 shows a flowchart of a method for recommending writing materials according to another exemplary embodiment of this disclosure;

[0030] Figure 5 shows a flowchart of a method for recommending writing materials according to yet another exemplary embodiment of this disclosure;

[0031] Figure 6 shows a schematic block diagram of an apparatus for recommending writing materials according to exemplary embodiments of the present disclosure;

[0032] Figure 7 shows a structural block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.

Detailed Implementation

[0033] Embodiments of this disclosure will now be described in more detail with reference to the accompanying drawings. While some embodiments of this disclosure are shown in the drawings, it should be understood that this disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided to provide a more thorough and complete understanding of this disclosure. It should be understood that the accompanying drawings and embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of protection of this disclosure.

[0034] It should be understood that the steps described in the method embodiments of this disclosure may be performed in different orders and / or in parallel. Furthermore, the method embodiments may include additional steps and / or omit the steps shown. The scope of this disclosure is not limited in this respect.

[0035] The term "comprising" and its variations as used herein are open-ended, meaning "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms will be given in the description below. It should be noted that the concepts of "first", "second", etc., used in this disclosure are only used to distinguish different devices, modules, or units, and are not intended to limit the order of functions performed by these devices, modules, or units or their interdependencies.

[0036] It should be noted that the terms "a" and "a plurality of" used in this disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".

[0037] The names of messages or information exchanged between multiple devices in the embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

[0038] The following describes, with reference to the accompanying drawings, the writing material recommendation method, apparatus, electronic device, and storage medium provided in this disclosure.

[0039] The ranking of Chinese essay writing materials is crucial in assisting students with essay or speech writing. Currently, the most common essay writing material retrieval method is based on keyword matching. While this method is fast and interpretable, it ignores the issue of polysemy, resulting in low relevance between retrieved writing materials and the query, leading to poor recommendation performance. Furthermore, existing technologies do not consider the ranking of search results; the most relevant search result may be ranked lower, reducing user satisfaction with the recommended results.

[0040] To address the aforementioned issues, this disclosure provides a method for recommending writing materials. It primarily uses natural language understanding technology and Chinese grammatical features to retrieve fragments of Chinese essay writing materials, and employs topic word extraction, part-of-speech fusion, and semantic understanding to match search results accurately. During recommendation, it determines the relevance between the query and the writing materials by integrating the number of topic word hits, the part-of-speech statistics of the query and the material fragments, and the semantic encoding of the query and the material fragments, thereby improving the accuracy of material recommendations. In addition, it integrates the number of keywords, the number of topic words, the recall score, and the number of material tags to adjust the ranking of recommended materials, so that writing materials more relevant to the query are ranked higher, which improves user satisfaction with the recommendation results.

[0041] Figure 1 shows a schematic diagram of the module architecture for implementing the writing material recommendation method of this disclosure. In Figure 1, the Chinese essay writing material library contains millions of entries, with content sourced from K-12 (kindergarten through twelfth grade) educational materials and excellent essays publicly available online. The materials can be stored as fragments. The first and second encoding models can be pre-trained semantic encoding models, or the semantic encoding models at the two ends of a pre-trained two-tower model. The two-tower model is trained to learn the semantic relationship between queries and materials: the query and the material are semantically encoded separately, and the optimal model is obtained through backpropagation by minimizing the cross-entropy. The first encoding model is the material-side encoding model, and the second encoding model is the query-side encoding model. Faiss is an open-source library for similarity search and clustering of vectors. Each material in the library is semantically encoded in advance with the first encoding model, and the encoded results are normalized and stored in Faiss for later use. The online query module receives the user's query and encodes it with the second encoding model to obtain a representation vector; then, based on the representation vector, a certain number of the most similar candidate materials are retrieved from Faiss. The coarse ranking module determines the target materials based on the relevance between the query and the candidate materials; the fine ranking module adjusts the order of the target materials.

[0042] Figure 2 A flowchart illustrating a method for recommending writing materials according to an exemplary embodiment of the present disclosure is provided. This method can be executed by a writing material recommendation device, which can be implemented in software and / or hardware and is generally integrated into an electronic device, including devices such as mobile phones, tablets, and servers. When applied to a server, the server determines target materials according to the writing material recommendation method of the present disclosure and returns the target materials to the user's terminal for display, thus realizing the recommendation of writing materials. When applied to terminal devices such as mobile phones and tablets, the terminal device determines target materials according to the writing material recommendation method of the present disclosure and displays the target materials to the user. The following embodiments use an application to a server as an example to explain the writing material recommendation method of the present disclosure, but should not be construed as limiting the present disclosure.

[0043] As shown in Figure 2, the writing material recommendation method may include the following steps:

[0044] Step 101: Receive the query statement input by the user.

[0045] Users can input query statements through terminal devices such as mobile phones and computers. The server can interact with the terminal devices and receive the query statements input by users through the terminal devices.

[0046] For example, the query entered by the user can be a sentence, such as "materials describing snow scenes", or at least one keyword, such as "snow scene essay" or "youthful dreams". This disclosure does not limit the form of the query.

[0047] Step 102: Based on the query statement, determine multiple candidate materials related to the query statement from the material library.

[0048] In this embodiment of the disclosure, after receiving a query statement input by a user, multiple candidate materials related to the query statement can be determined from the material library.

[0049] For example, for each material in the material library, the number of words from the query statement that the material contains can be counted, and a certain number of materials containing the most query words can be determined as the multiple candidate materials.
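The word-count recall described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `recall_by_word_overlap`, the toy library, and the choice to count distinct words rather than occurrences are all assumptions, and the texts are assumed to be already segmented into words.

```python
from typing import List, Sequence


def recall_by_word_overlap(query_words: Sequence[str],
                           library: Sequence[Sequence[str]],
                           top_n: int) -> List[int]:
    """Return indices of the top_n materials sharing the most distinct words with the query."""
    query_set = set(query_words)
    # Count how many distinct query words each material contains.
    overlaps = [len(query_set & set(material)) for material in library]
    # Sort indices by overlap, largest first; ties keep library order (sorted is stable).
    ranked = sorted(range(len(library)), key=lambda i: -overlaps[i])
    return ranked[:top_n]


# Toy, pre-segmented material library (hypothetical data).
library = [
    ["snow", "covers", "the", "quiet", "village"],
    ["youth", "is", "a", "dream"],
    ["a", "snow", "scene", "in", "winter"],
]
candidates = recall_by_word_overlap(["snow", "scene"], library, top_n=2)
```

Here the third material contains both query words and the first contains one, so they are recalled in that order.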

[0050] For example, the query statement and the materials in the material library can be vectorized respectively. Based on the vectorized representation results, the similarity between the query statement and each material in the material library can be calculated, and a certain number of materials with the highest similarity can be identified as multiple candidate materials.
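The vector-based recall can be sketched in the same spirit. The disclosure stores normalized material vectors in Faiss; the sketch below substitutes a brute-force NumPy search so that it stays self-contained, and the vectors, the function names, and the use of cosine similarity as the similarity measure are assumptions made for illustration.

```python
import numpy as np


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so that the inner product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def recall_by_similarity(query_vec: np.ndarray,
                         material_vecs: np.ndarray,
                         top_n: int) -> np.ndarray:
    """Return indices of the top_n materials most similar to the query."""
    q = l2_normalize(query_vec.reshape(1, -1))
    m = l2_normalize(material_vecs)
    scores = (m @ q.T).ravel()           # one cosine similarity per material
    return np.argsort(-scores)[:top_n]   # highest similarity first


# Hypothetical 2-dimensional encodings; real encoders output far larger vectors.
material_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
query_vec = np.array([1.0, 0.05])
top = recall_by_similarity(query_vec, material_vecs, top_n=2)
```

For a library of millions of materials, an approximate or indexed search such as the Faiss index mentioned above would replace the exhaustive matrix product.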

[0051] Step 103: Perform word segmentation and part-of-speech statistics on the query statement and each candidate material to obtain the part-of-speech dimension features corresponding to each candidate material.

[0052] In this embodiment of the disclosure, the received query statement can be segmented into words, and the number of words of each part of speech in the segmentation results can be counted. Each candidate material can also be segmented into words, and for each candidate material, the number of words of each part of speech contained in the candidate material can be counted. Then, for each candidate material, the part-of-speech dimension feature corresponding to the candidate material can be determined based on the number of words of each part of speech contained in the query statement and the number of words of each part of speech contained in the candidate material.

[0053] For example, different parts of speech and their order can be pre-defined and represented as a matrix with dimensions [number of parts of speech, 1]. For instance, if the pre-defined parts of speech include nouns, verbs, adjectives, adverbs, prepositions, and conjunctions, the matrix dimension would be [6, 1]. The number of words of each part of speech in the query statement is substituted into the corresponding elements of the matrix; for example, the number of nouns is substituted into the element corresponding to nouns, and the number of verbs is substituted into the element corresponding to verbs. This yields the part-of-speech features corresponding to the query statement. Similarly, the number of words of each part of speech in each candidate material is substituted into the corresponding elements of the matrix, yielding the part-of-speech features for each candidate material. Furthermore, for each candidate material, the part-of-speech features corresponding to the query statement and the part-of-speech features corresponding to that candidate material are concatenated in the row direction. The concatenated result is used as the part-of-speech dimension feature corresponding to that candidate material.
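The construction of the part-of-speech dimension feature can be illustrated with a short sketch. It assumes the texts have already been segmented and tagged by some external tool, so the inputs are lists of (word, tag) pairs; the tag names, the helper names, and the sample data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Predefined parts of speech and their order (the six from the example above).
POS_ORDER = ["noun", "verb", "adjective", "adverb", "preposition", "conjunction"]


def pos_counts(tagged: Sequence[Tuple[str, str]]) -> List[int]:
    """Count words per part of speech, in POS_ORDER; tags outside the list are ignored."""
    counts = Counter(tag for _, tag in tagged)
    return [counts.get(tag, 0) for tag in POS_ORDER]


def pos_dimension_feature(query_tagged: Sequence[Tuple[str, str]],
                          material_tagged: Sequence[Tuple[str, str]]) -> List[int]:
    """Concatenate the query counts with the material counts (length 2 * len(POS_ORDER))."""
    return pos_counts(query_tagged) + pos_counts(material_tagged)


# Hypothetical tagged inputs.
query_tagged = [("materials", "noun"), ("describing", "verb"),
                ("snow", "noun"), ("scenes", "noun")]
material_tagged = [("the", "article"), ("world", "noun"),
                   ("is", "verb"), ("white", "adjective")]
feature = pos_dimension_feature(query_tagged, material_tagged)
```

With six predefined parts of speech the result has 12 entries, matching the [12,1] feature used in the later dimension examples.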

[0054] Step 104: Concatenate and encode the query statement and each candidate material to obtain the semantic features corresponding to each candidate material.

[0055] In this embodiment of the disclosure, for the received query statement and each determined candidate material, the query statement can be concatenated with each candidate material to obtain concatenated text. Then, each concatenated text is encoded, and the resulting vector representation is determined as the semantic feature corresponding to each candidate material.

[0056] For example, Figure 3 shows a schematic diagram of a business process for determining the semantic features of candidate materials according to an exemplary embodiment of the present disclosure. In Figure 3, the pre-trained model is an encoding model obtained by training an initial model with a large number of training samples. It encodes the input text into a matrix of preset dimensions (e.g., [768, 512]), where each row is the vector representation of one character in the text. As shown in Figure 3, the query "materials describing snow scenes" is concatenated with the candidate material "the world is white...", where the query and the candidate material are separated by the delimiter "<sep>", which is not encoded during the encoding process. It should be noted that the "…" in the candidate material above stands for content of the candidate material that is not displayed due to space limitations. The concatenated text is input into the pre-trained model for semantic encoding, which yields a vector for each character. These vectors are then assembled into a matrix of the preset dimensions (if the number of characters is less than the number of rows, zero vectors can be used for padding). To ensure that this matrix can hold the vectors of all characters in the concatenated text, the number of rows can be set to a large value, and the number of columns can be determined by the dimension of the vectors output by the pre-trained model. Finally, mean pooling is applied to the matrix to obtain a multi-dimensional vector (e.g., [768, 1]), which is the semantic feature corresponding to the candidate material.
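The padding and mean-pooling steps can be sketched as below. The pre-trained model is replaced by a hypothetical `toy_encode` function that assigns each character a fixed random vector independent of context, which a real encoder would not do; `MAX_LEN` and `DIM` are small illustrative values rather than those of the example above, and truncating overly long text is an added assumption.

```python
import numpy as np

MAX_LEN = 16  # rows of the preset matrix (hypothetical; the text suggests a large value)
DIM = 8       # vector size per character (hypothetical; 768 in the example above)


def toy_encode(text: str) -> np.ndarray:
    """Stand-in for the pre-trained model: one deterministic DIM-vector per character."""
    rows = [np.random.default_rng(ord(ch)).standard_normal(DIM) for ch in text]
    return np.stack(rows)


def semantic_feature(query: str, material: str) -> np.ndarray:
    """Encode the concatenated text, zero-pad to MAX_LEN rows, then mean-pool the rows."""
    # The "<sep>" delimiter only marks the boundary and is not encoded,
    # so the characters of the query and the material are encoded back to back.
    char_vectors = toy_encode(query + material)[:MAX_LEN]  # truncation is an assumption
    padded = np.zeros((MAX_LEN, DIM))
    padded[: len(char_vectors)] = char_vectors  # unused rows stay zero vectors
    return padded.mean(axis=0)                  # mean pooling over rows, shape (DIM,)


feature = semantic_feature("snow", "white world")
```

Because the mean is taken over the whole padded matrix, as the paragraph describes, the zero rows lower the magnitude of the result for short texts; averaging only over the real characters would be a possible variation.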

[0057] It should be noted that steps 103 and 104 need not be executed in any particular order; they can be executed simultaneously or sequentially. The embodiment shown in Figure 2 merely takes the case where step 104 is performed after step 103 as an example to explain this disclosure, and should not be construed as limiting this disclosure.

[0058] Step 105: Determine the relevance between the query statement and each candidate material based on the part-of-speech dimension features and the semantic features.

[0059] In this embodiment of the disclosure, after determining the part-of-speech dimension features and semantic features corresponding to each candidate material, for each candidate material, the relevance between the query statement and the candidate material can be determined based on the part-of-speech dimension features and semantic features of the candidate material, thereby obtaining the relevance between the query statement and each candidate material.

[0060] For example, the part-of-speech dimension features and semantic features of the same candidate material can be concatenated, and a preset transformation matrix can then be used to transform the concatenated features into a matrix of dimension [1,1]. The element value of this matrix is the relevance between the query statement and the candidate material.

[0061] The dimension of the transformation matrix can be determined based on the dimension of the part-of-speech feature and the dimension of the semantic feature. For example, if the dimension of the matrix represented by different parts of speech and their order is [6,1], then the dimension of the part-of-speech feature corresponding to each candidate material is [12,1], and the dimension of the semantic feature corresponding to each candidate material is [768,1]. Therefore, the dimension of the transformation matrix can be set to [780,1].
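The dimension bookkeeping in this example can be checked with a few lines of NumPy. The feature values and the weights below are random placeholders, since the disclosure does not state how the transformation matrix is obtained; multiplying by the transpose of the [780,1] matrix is one way to arrive at the [1,1] result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder features for one candidate material.
pos_feature = rng.integers(0, 5, size=(12, 1)).astype(float)  # [12, 1]
semantic_feature = rng.standard_normal((768, 1))              # [768, 1]

# Preset transformation matrix of dimension [780, 1]; the values are hypothetical.
W = rng.standard_normal((780, 1)) * 0.01

concatenated = np.vstack([pos_feature, semantic_feature])  # [780, 1]
relevance_matrix = W.T @ concatenated                      # [1, 780] @ [780, 1] -> [1, 1]
relevance = float(relevance_matrix[0, 0])
```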

[0062] Step 106: Based on the relevance, determine a first preset number of target materials from the multiple candidate materials for recommendation.

[0063] The first preset number can be set in advance according to actual needs, such as 8 or 10.

[0064] In this embodiment of the disclosure, after determining the relevance between the query statement and each candidate material, the first preset number of candidate materials with the highest relevance can be selected as the target material from multiple candidate materials based on the relevance, and the target material can be recommended to the user.

[0065] For example, when recommending target materials to users, the target materials can be sorted according to their relevance, and the sorted target materials can be displayed on the user's terminal device to achieve the recommendation of writing materials. This ensures that the materials ranked first are the ones with the highest relevance to the query statement, thus guaranteeing the recommendation effect.
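Selecting and ordering the target materials amounts to a top-k sort on the relevance values, as in the following sketch. The function name and the sample relevance values are hypothetical.

```python
from typing import List, Tuple


def select_targets(relevances: List[float],
                   first_preset_number: int) -> List[Tuple[int, float]]:
    """Return (candidate index, relevance) pairs for the most relevant candidates, best first."""
    indexed = list(enumerate(relevances))
    indexed.sort(key=lambda pair: pair[1], reverse=True)
    return indexed[:first_preset_number]


# Hypothetical relevance values for four candidate materials.
targets = select_targets([0.2, 0.9, 0.5, 0.7], first_preset_number=3)
```

The returned list is already in display order, so the candidate with the highest relevance appears first.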

[0066] The writing material recommendation method of this disclosure receives a query statement input by a user, determines multiple candidate materials related to the query statement from a material library, performs word segmentation and part-of-speech tagging on the query statement and each candidate material to obtain the part-of-speech dimension features corresponding to each candidate material, and concatenates and encodes the query statement and each candidate material to obtain the semantic features corresponding to each candidate material. Then, based on the part-of-speech dimension features and semantic features, the relevance between the query statement and each candidate material is determined. Based on the relevance, a first preset number of target materials are selected from the multiple candidate materials for recommendation. Using the solution of this disclosure can improve the accuracy of relevance calculation, thereby improving the accuracy of writing material recommendation.

[0067] It is generally believed that the more topic words a piece of material contains, the higher its quality. Therefore, in one optional embodiment of this disclosure, the number of topic words contained in a candidate material can also be used as a feature of the candidate material in the relevance calculation, thereby improving the quality of the target materials. In this embodiment, as shown in Figure 4, on the basis of the embodiment shown in Figure 2, the writing material recommendation method further includes:

[0068] Step 107: Determine the topic word features of each candidate material based on the number of topic words contained in each candidate material.

[0069] The topic words can be predetermined.

[0070] For example, for the materials contained in the Chinese essay material library shown in Figure 1, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm can be used to extract a preset number (e.g., 2000) of topic words from the library. Each material contains a different number of these topic words. In this embodiment of the disclosure, the number of these topic words appearing in each candidate material is counted.

[0071] Furthermore, the topic word features of each candidate material are determined based on the number of topic words contained in that candidate material.

[0072] For example, the topic word feature of a candidate material can be a matrix of dimension [1,1]. For instance, if a candidate material contains 3 topic words, the topic word feature corresponding to that candidate material is the matrix [3].
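Step 107 can be sketched as follows. The disclosure names TF-IDF but does not say how per-material scores are combined into one library-wide list, so summing each word's TF-IDF score over all materials is an assumption here, as are the function names, the toy library, and the choice to count distinct words.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def select_topic_words(library: Sequence[Sequence[str]], preset_number: int) -> Set[str]:
    """Pick the preset_number words with the largest TF-IDF score summed over the library."""
    n_docs = len(library)
    # Document frequency: in how many materials each word appears.
    doc_freq = Counter(word for doc in library for word in set(doc))
    scores: Counter = Counter()
    for doc in library:
        for word, count in Counter(doc).items():
            idf = math.log(n_docs / doc_freq[word])
            scores[word] += (count / len(doc)) * idf
    return {word for word, _ in scores.most_common(preset_number)}


def topic_word_feature(material: Sequence[str], topic_words: Set[str]) -> List[List[int]]:
    """Return a [1, 1] matrix holding the number of distinct topic words in the material."""
    return [[len(set(material) & topic_words)]]


# Toy, pre-segmented material library (hypothetical data).
library = [
    ["snow", "falls", "on", "the", "village"],
    ["the", "dream", "of", "youth"],
    ["snow", "covers", "the", "field"],
]
topic_words = select_topic_words(library, preset_number=4)
feature = topic_word_feature(["snow", "covers", "the", "roof"], topic_words)
```

A word that occurs in every material, such as "the" in the toy library, receives an inverse document frequency of zero and therefore is not selected ahead of rarer words.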

[0073] It should be noted that steps 107, 103, and 104 need not be executed in any particular order; they can be executed simultaneously or sequentially. The embodiment shown in Figure 4 merely takes the case where step 107 is performed after step 104 as an example to explain this disclosure, and should not be construed as limiting this disclosure.

[0074] Furthermore, in embodiments of this disclosure, as shown in Figure 4, step 105 may include the following sub-step:

[0075] Step 201: Determine the relevance between the query statement and each candidate material based on the topic word features, the part-of-speech dimension features, and the semantic features.

[0076] In one optional embodiment of this disclosure, the topic word features, part-of-speech dimension features, and semantic features of the same candidate material can be concatenated in the row direction to obtain concatenated features. A preset transformation matrix is then used to perform a dimension transformation on the concatenated features to obtain a matrix of dimension [1,1]. The element value of this matrix is the relevance between the query statement and the candidate material.

[0077] For example, suppose the topic word feature of a candidate material has dimension [1,1], its part-of-speech dimension feature has dimension [12,1], and its semantic feature has dimension [768,1]. Concatenating the three features in the row direction then yields a matrix of dimension [781,1]. A transformation matrix of dimension [1,781] can be preset and multiplied with the concatenated matrix to obtain a matrix of dimension [1,1].

[0078] In one optional embodiment of this disclosure, when determining the relevance between the query statement and each candidate material, for each candidate material, the topic word features and part-of-speech dimension features of each candidate material can be concatenated to obtain a first concatenated feature; the first concatenated feature is then dimensionally transformed according to a preset first transformation matrix to obtain a first transformed feature; next, the first transformed feature is concatenated with the semantic features of each candidate material to obtain a second concatenated feature, and the second concatenated feature is then dimensionally transformed according to a preset second transformation matrix to obtain a second transformed feature; next, the second transformed feature is dimensionally transformed according to a preset third transformation matrix to obtain a third transformed feature; the feature value of the third transformed feature is determined as the relevance between the query statement and each candidate material.

[0079] For example, the relevance of the query statement to each candidate material can be calculated using the following formula (1).

[0080] $$\mathrm{relation} = W_3^{T}\,\Big(W_2^{T}\,\mathrm{join}\big(W_1^{T}\,\mathrm{join}(V_z, V_c),\; V_y\big)\Big) \qquad (1)$$

[0081] Here, relation represents the third transformed feature, whose value is the relevance between the query statement and the candidate material; $V_z$ represents the topic word feature of the candidate material, with dimension [1,1]; $V_c$ represents the part-of-speech dimension feature of the candidate material, whose dimension depends on the preset part-of-speech types; $V_y$ represents the semantic feature of the candidate material, whose dimension depends on the output dimension defined in the pre-trained model; join denotes concatenating two vectors end to end to form a new vector, and in this embodiment the two features are concatenated in the row direction; $W_1^{T}$, $W_2^{T}$, and $W_3^{T}$ represent the transposes of the first, second, and third transformation matrices, respectively. The dimensions of these matrices can be preset according to the dimensions of $V_z$, $V_c$, and $V_y$, and the element values of each matrix can be set according to actual needs.

[0082] For example, assume the candidate material's topic word feature has dimension [1,1], the part-of-speech dimension feature has dimension [108,1], and the semantic feature has dimension [768,1]. Concatenating the topic word feature and the part-of-speech dimension feature yields the first concatenated feature, with dimension [109,1]. The dimension of the first transformation matrix can then be set to [109,768]; its number of rows is determined by the dimensions of the topic word feature and the part-of-speech dimension feature. Multiplying the transpose of the first transformation matrix by the first concatenated feature yields the first transformed feature, with dimension [768,1]. Next, according to formula (1), the first transformed feature is concatenated with the semantic feature of the candidate material to obtain the second concatenated feature, with dimension [1536,1]. The dimension of the second transformation matrix can be set to [1536,768]; multiplying its transpose by the second concatenated feature yields the second transformed feature, with dimension [768,1]. Finally, the transpose of the third transformation matrix (dimension [768,1]) is multiplied by the second transformed feature to obtain the third transformed feature, with dimension [1,1]. The feature value of the third transformed feature is determined as the relevance between the query statement and the candidate material.
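The dimension flow described above can be traced with a short NumPy sketch. It is illustrative only: the function name `relevance`, the argument names, and the randomly initialized matrices are assumptions, since the disclosure only states that the matrices are preset.

```python
import numpy as np

def relevance(v_z, v_c, v_y, w1, w2, w3):
    """Chain of concatenations and transposed-matrix products from [0082]."""
    first_concat = np.concatenate([v_z, v_c], axis=0)        # e.g. [109, 1]
    first_transformed = w1.T @ first_concat                   # e.g. [768, 1]
    second_concat = np.concatenate([first_transformed, v_y], axis=0)  # e.g. [1536, 1]
    second_transformed = w2.T @ second_concat                 # e.g. [768, 1]
    third_transformed = w3.T @ second_transformed             # [1, 1]
    return float(third_transformed[0, 0])

# Dimensions taken from the example; the values are random placeholders.
rng = np.random.default_rng(0)
v_z = rng.normal(size=(1, 1))      # topic word feature
v_c = rng.normal(size=(108, 1))    # part-of-speech dimension feature
v_y = rng.normal(size=(768, 1))    # semantic feature
w1 = rng.normal(size=(109, 768))   # first transformation matrix
w2 = rng.normal(size=(1536, 768))  # second transformation matrix
w3 = rng.normal(size=(768, 1))     # third transformation matrix
score = relevance(v_z, v_c, v_y, w1, w2, w3)
```

The sketch only checks that the stated shapes compose; it says nothing about how the matrix values are chosen in practice.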

[0083] The writing material recommendation method of this disclosure further improves the accuracy of relevance calculation by determining the topic word features of each candidate material based on the number of topic words contained in each candidate material, and then determining the relevance between the query statement and each candidate material based on the topic word features, part-of-speech dimension features, and semantic features. This improves the accuracy of material recommendation and the quality of the recommended target materials.

[0084] In one optional embodiment of this disclosure, when determining the part-of-speech dimension feature corresponding to each candidate material, the query statement and each candidate material can be segmented into words first to obtain the first word segment corresponding to the query statement and the second word segment corresponding to each candidate material. Then, according to a preset part-of-speech dimension table, the number of each part of speech contained in the first word segment is counted to obtain the first part-of-speech feature. According to the part-of-speech dimension table, the number of each part of speech contained in the second word segment corresponding to each candidate material is counted to obtain the second part-of-speech feature. According to the part-of-speech dimension table, the number of the same part of speech contained in the first word segment and the second word segment corresponding to each candidate material is counted to obtain the third part-of-speech feature. Furthermore, for each candidate material, the first part-of-speech feature, the second part-of-speech feature, and the third part-of-speech feature are concatenated to obtain the fourth part-of-speech feature corresponding to each candidate material. The fourth part-of-speech feature is then standardized to obtain the part-of-speech dimension feature corresponding to each candidate material.

[0085] The part-of-speech dimension table can be pre-set according to actual needs. For example, 36 different parts of speech, such as common nouns, time nouns, and locative nouns, can be pre-set. Each part of speech corresponds to a different part-of-speech code, and they are sorted in order to obtain the part-of-speech dimension table, as shown in Table 1.

[0086] Table 1

[0087]

[0088]

[0089] In this embodiment of the disclosure, after the query statement is segmented to obtain the first word segment and each candidate material is segmented to obtain its second word segment, the number of each part of speech contained in the first word segment can be counted according to the part-of-speech dimension table shown in Table 1. The counts are filled into the corresponding positions in the order of Table 1 to obtain a first part-of-speech feature with dimension [36,1]. For the second word segment of each candidate material (one candidate material corresponds to one group of second word segments), the number of each part of speech it contains is counted in the same way and filled into the corresponding positions to obtain a second part-of-speech feature with dimension [36,1], so that each candidate material has its own second part-of-speech feature. Based on the first word segment and each group of second word segments, the number of occurrences of each part of speech that appears in both can be counted, that is, the count of each part of speech co-occurring in the query statement and the candidate material. These counts are filled into the corresponding positions in the order of Table 1 to obtain a third part-of-speech feature with dimension [36,1].
For example, if the first word segment corresponding to the query statement includes 1 common noun, 1 adjective, and 1 conjunction, and the second word segment corresponding to a certain candidate material contains, among others, 20 common nouns, 12 adjectives, 5 adverbs, and 7 conjunctions, then the co-occurring parts of speech are common nouns, adjectives, and conjunctions, with counts of 21, 13, and 8 respectively (the query count plus the material count for each). Therefore, in the resulting third part-of-speech feature, the element at the common noun position is 21, the element at the adjective position is 13, the element at the conjunction position is 8, and the elements for all other parts of speech are 0. Next, the first, second, and third part-of-speech features corresponding to the same candidate material are concatenated to obtain a fourth part-of-speech feature with dimension [108,1]. The fourth part-of-speech feature is then standardized to zero mean and unit variance, yielding the part-of-speech dimension feature corresponding to each candidate material, also with dimension [108,1].
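The counting and standardization steps above can be sketched as follows. This is a minimal illustration under stated assumptions: `POS_TABLE` is a hypothetical four-entry stand-in for the 36-entry Table 1, the tag codes are invented, and the inputs are assumed to be part-of-speech tags already produced by some word segmenter.

```python
from collections import Counter
import numpy as np

# Hypothetical shortened part-of-speech dimension table (Table 1 has 36 entries).
POS_TABLE = ["n", "a", "d", "c"]  # common noun, adjective, adverb, conjunction

def pos_counts(tags):
    """Count each part of speech, ordered as in the dimension table."""
    counts = Counter(tags)
    return np.array([counts.get(p, 0) for p in POS_TABLE], dtype=float)

def pos_dimension_feature(query_tags, material_tags):
    first = pos_counts(query_tags)       # first part-of-speech feature
    second = pos_counts(material_tags)   # second part-of-speech feature
    both = (first > 0) & (second > 0)    # parts of speech present in both
    third = np.where(both, first + second, 0.0)  # co-occurrence counts
    fourth = np.concatenate([first, second, third])
    std = fourth.std()
    # Standardize to zero mean and unit variance (all zeros if constant).
    feature = (fourth - fourth.mean()) / std if std > 0 else np.zeros_like(fourth)
    return third, feature.reshape(-1, 1)

query_tags = ["n", "a", "c"]
material_tags = ["n"] * 20 + ["a"] * 12 + ["d"] * 5 + ["c"] * 7
third, feature = pos_dimension_feature(query_tags, material_tags)
```

With the counts from the example, `third` holds 21, 13, 0, and 8 in table order, and `feature` has three times as many rows as the table has entries.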

[0090] In this embodiment of the disclosure, the number of parts of speech contained in the query statement and each candidate material is counted, and the part-of-speech dimension features corresponding to the candidate material are determined for use in calculating the relevance between the candidate material and the query statement, thus providing data support for improving the accuracy of relevance calculation.

[0091] In one optional embodiment of this disclosure, when determining multiple candidate materials similar to the query statement from the material library, a pre-trained query statement-side encoding model of a dual-tower model can be used to determine the request representation vector corresponding to the query statement based on the query statement; the similarity between the request representation vector and each material representation vector in the material library is calculated, wherein the material representation vectors in the material library are obtained by pre-encoding multiple materials using the material-side encoding model of the dual-tower model; then, based on the similarity, a second preset number of target material representation vectors with the highest similarity to the request representation vector are determined from the material library; and the materials corresponding to the target material representation vectors are determined as multiple candidate materials.

[0092] The second preset number can be set in advance according to actual needs, such as 100, 130, etc.

[0093] In this embodiment, a dual-tower model can be pre-trained to learn the semantic relationship between query statements and writing materials. During training, both the query statements and the materials are semantically encoded, and the optimal model is obtained through backpropagation by minimizing the cross-entropy. After the dual-tower model is trained, its material-side encoding model can be used to encode multiple materials in advance, and the material representation vector of each material is stored in a material library (for example, Faiss, as shown in Figure 1). In actual use, the materials then do not need to be encoded again, which improves the speed and efficiency of material retrieval. After the query statement input by the user is received, the query-statement-side encoding model of the dual-tower model can be used to determine the vector representation of the query statement, denoted as $[x_1, x_2, x_3, \ldots, x_N]$ and called the request representation vector, where N is the dimension of the vector. Then, the similarity between the request representation vector and each material representation vector in the material library is calculated, and a second preset number of target material representation vectors with the highest similarity to the request representation vector are determined from the material library.

[0094] For example, when calculating similarity, the similarity between the request representation vector and each material representation vector can be calculated using the cosine similarity calculation formula shown in formula (2).

[0095] $$d = \frac{\sum_{i=1}^{N} x_i\, y_i}{\sqrt{\sum_{i=1}^{N} x_i^{2}}\ \sqrt{\sum_{i=1}^{N} y_i^{2}}} \tag{2}$$

[0096] Where $x_i$ represents the value of the i-th element in the request representation vector, $y_i$ represents the value of the i-th element in the material representation vector, and $d$ represents the cosine similarity between the request representation vector and the material representation vector.
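As a rough illustration of this retrieval step, the sketch below computes the cosine similarity of a request representation vector against every stored material representation vector and keeps the most similar ones. It uses a brute-force NumPy scan rather than a vector index; the function name `top_k_by_cosine` and the toy vectors are assumptions, and all vectors are assumed to be non-zero.

```python
import numpy as np

def top_k_by_cosine(request_vec, material_vecs, k):
    """Return indices and cosine similarities of the k most similar materials."""
    request_vec = np.asarray(request_vec, dtype=float)
    material_vecs = np.asarray(material_vecs, dtype=float)
    dots = material_vecs @ request_vec
    norms = np.linalg.norm(material_vecs, axis=1) * np.linalg.norm(request_vec)
    sims = dots / norms                       # cosine similarity per material
    order = np.argsort(-sims, kind="stable")[:k]
    return order.tolist(), sims[order].tolist()

# Toy 2-dimensional vectors; k plays the role of the second preset number.
materials = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
ids, sims = top_k_by_cosine([1.0, 0.0], materials, k=2)
```

The returned indices can stand in for the unique identifiers that link each material representation vector to its material.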

[0097] In this embodiment of the disclosure, after the target material representation vectors are determined, the materials corresponding to them can be determined as the multiple candidate materials.

[0098] For example, writing materials and their corresponding material representation vectors can be marked with the same unique identifier. After the target material representation vector is determined, materials marked with the same unique identifier can be found based on the unique identifier of the target material representation vector, and these materials can be used as candidate materials.

[0099] In this embodiment, multiple materials are pre-encoded using the material-end encoding model of the dual-tower model, and the resulting material representation vectors are stored in the material library for later use. This avoids the time-consuming process of determining the representation vector corresponding to each material during material retrieval, thus improving the speed and efficiency of material search. By calculating the similarity between the request representation vector of the query statement and each material representation vector, materials corresponding to a second preset number of target material representation vectors are selected as candidate materials based on the similarity. This achieves the initial screening of writing materials, reduces the computational load of subsequent target material determination, and improves the speed and efficiency of target material recommendation.

[0100] In the field of search result ranking, NDCG (Normalized Discounted Cumulative Gain) is commonly used as a metric to measure the performance of ranking models. This metric measures whether the ranking model places more relevant results in higher positions, that is, whether the most relevant results are prioritized. However, the inventors of this disclosure, during their research on ranking and recommending writing materials, discovered that the recommendations produced by the above embodiments sometimes fail to place the most relevant target material in the first position, which reduces the accuracy of the recommendations and user satisfaction. To address this issue, this disclosure further provides a scheme for fine-tuning the ranking of the target materials before recommendation, so that the target materials are ranked precisely. This fine-tuning scheme can be implemented by the fine-tuning module shown in Figure 1. The specific process of fine-tuning the ranking of the target materials is explained in detail below with reference to the accompanying drawings.

[0101] Figure 5 shows a flowchart of a method for recommending writing materials according to yet another exemplary embodiment of this disclosure. As shown in Figure 5, based on the foregoing embodiments, step 106 may include the following sub-steps:

[0102] Step 301: Based on the relevance, determine, from the plurality of candidate materials, a first preset number of target materials with the highest relevance.

[0103] In this embodiment of the disclosure, after calculating the relevance between the query statement and each candidate material, the candidate materials can be sorted in descending order of relevance, and the first preset number of candidate materials at the top of the sort can be selected as the target material.

[0104] Step 302: Determine the first probability sequence corresponding to the target material based on the relevance of the target material.

[0105] In this embodiment of the disclosure, after the target material is determined, the first probability sequence corresponding to the target material can be determined according to the relevance of the target material.

[0106] For example, the determined target materials can be arranged into a material sequence in descending order of relevance, denoted as $D = [doc_1, doc_2, doc_3, \ldots, doc_m]$, where m is the total number of target materials, $doc_1$ is the target material with the highest relevance, $doc_2$ has the next highest relevance after $doc_1$, and $doc_m$ is the target material with the lowest relevance. The first probability sequence corresponding to the target materials is denoted as $P1 = [P_{1.1}, P_{1.2}, P_{1.3}, \ldots, P_{1.m}]$. Each probability in the first probability sequence is calculated from the relevance, and the specific calculation formula is shown in formula (3) below.

[0107]

[0108] In formula (3), j takes values from 1 to m, $relation_j$ represents the relevance of the j-th target material in the material sequence, and $P_{1.j}$ represents the score probability of the j-th target material.

[0109] Step 303: Based on the first probability sequence, determine the target score sequence with the smallest relative entropy to the first probability sequence from among multiple score sequences.

[0110] The multiple score sequences can be selected from a preset probability sequence, and the number of score sequences can be at least two.

[0111] In one optional embodiment of this disclosure, for a sequence of target materials, multiple probability sequences corresponding to the material sequence can be calculated from different dimensions, and then at least two of these sequences can be selected as scoring sequences for fine-tuning the ranking of the target materials. These multiple dimensions may include, but are not limited to, keywords, relevance, and carried tags. The determination of the probability sequences corresponding to different dimensions is described in detail below.

[0112] Regarding the keyword dimension, when determining the probability sequence of the material sequence, for each target material, the second probability sequence corresponding to the target material can be determined based on the number of keywords contained in the target material and the number of words in the query statement contained in the target material.

[0113] For example, suppose the first target material doc1 in the material sequence contains 3 keywords and also contains one word of the query statement. Then the score of target material doc1 is 3 + 1 = 4. Calculating the score of each target material in the material sequence in this way yields the score sequence of the material sequence in the keyword dimension, denoted as $score = [score_1, score_2, score_3, \ldots, score_m]$, where $score_j$ (j = 1 to m) represents the score of the j-th target material in the material sequence D. Then, based on the score sequence, the second probability sequence corresponding to the material sequence D can be obtained, denoted as $P2 = [P_{2.1}, P_{2.2}, P_{2.3}, \ldots, P_{2.m}]$, where the score probability of the j-th target material in the second probability sequence is the ratio of $score_j$ to the total score of the m target materials.
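The keyword-dimension scoring can be sketched in a few lines. The function name `second_probability_sequence` and the input lists are hypothetical; the sketch assumes the keyword counts and query-word hits have already been computed for each target material, and the uniform fallback for an all-zero score sequence is an added assumption that the disclosure does not address.

```python
def second_probability_sequence(keyword_counts, query_word_hits):
    """Score each material as keywords + query-word hits, then normalize."""
    scores = [k + q for k, q in zip(keyword_counts, query_word_hits)]
    total = sum(scores)
    if total == 0:
        # Assumption: fall back to a uniform sequence when every score is 0.
        return scores, [1.0 / len(scores)] * len(scores)
    return scores, [s / total for s in scores]

# doc1 contains 3 keywords and 1 query word, as in the example above.
scores, p2 = second_probability_sequence([3, 2, 1], [1, 0, 1])
```

Here doc1 receives a score of 4 out of a total of 8, so its score probability is 0.5.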

[0114] Regarding the relevance dimension, when determining the probability sequence of the material sequence, a third probability sequence can be determined for each target material based on the similarity between the query and the target material, as well as the relevance between the query and the target material.

[0115] As an optional implementation, when determining the third probability sequence corresponding to the target material, for the j-th target material in the material sequence D, the similarity between the query statement and the j-th target material (already calculated when determining the candidate materials) and the relevance between the query statement and the j-th target material (already calculated in step 105) can be obtained. The mean of the similarity and the relevance is calculated and used as the score probability of the j-th target material. Thus, the third probability sequence is obtained, denoted as $P3 = [P_{3.1}, P_{3.2}, P_{3.3}, \ldots, P_{3.m}]$, where the score probability of the j-th target material is the mean of the similarity and the relevance of the j-th target material.

[0116] As an optional implementation, when determining the third probability sequence corresponding to the target material, a first preset weight corresponding to similarity and a second preset weight corresponding to relevance can be obtained first. The first and second preset weights can be set according to actual needs, and their sum is 1. For example, the first preset weight can be set to 0.32 and the second preset weight to 0.68. Next, for the j-th target material in the material sequence D, the similarity between the query statement and the j-th target material and the relevance between the query statement and the j-th target material can be weighted and summed according to the first and second preset weights to obtain the recall score of the j-th target material: $S_j = \text{first preset weight} \times d_j + \text{second preset weight} \times relation_j$, where $d_j$ represents the similarity between the query statement and the j-th target material, and $relation_j$ represents the relevance between the query statement and the j-th target material. Then, based on the recall score $S_j$ of the j-th target material, the score probability of the j-th target material can be determined, thus obtaining the third probability sequence of the material sequence, denoted as $P3 = [P_{3.1}, P_{3.2}, P_{3.3}, \ldots, P_{3.m}]$, where the score probability of the j-th target material is its recall score $S_j$ divided by the sum of the recall scores of the m target materials in the material sequence.
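The weighted variant can be illustrated as below. The function name `third_probability_sequence` and the sample similarity and relevance values are assumptions; the default weights are the 0.32 and 0.68 mentioned above, and the sketch assumes the recall scores have a positive sum.

```python
def third_probability_sequence(similarities, relevances, w_sim=0.32, w_rel=0.68):
    """Weighted recall score S_j per material, normalized by the total."""
    if abs(w_sim + w_rel - 1.0) > 1e-9:
        raise ValueError("the two preset weights must sum to 1")
    recall = [w_sim * d + w_rel * r for d, r in zip(similarities, relevances)]
    total = sum(recall)  # assumed positive
    return recall, [s / total for s in recall]

# Hypothetical similarity (d_j) and relevance (relation_j) values.
recall, p3 = third_probability_sequence([0.5, 0.5], [1.0, 0.0])
```

With these inputs the recall scores are about 0.84 and 0.16, which already sum to 1, so the probabilities match the recall scores.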

[0117] Regarding the dimension of carried tags, when determining the probability sequence of the material sequence, for each target material, the number of tags carried by the target material can be counted, and the number of tags carried by the target material can be used as the score of the corresponding target material. Based on the score, the fourth probability sequence corresponding to the target material can be determined.

[0118] In this embodiment, each writing material is tagged with at least one preset tag upon entry into the database. These preset tags can include, but are not limited to, hundreds of tags such as "metaphor," "parallelism," "allusion," "classical Chinese," and "famous figures." The more tags a writing material carries, the better it is considered. Therefore, this embodiment uses the number of tags carried by the target material as a dimension to score it; the number of tags represents the score of the corresponding target material. Then, based on the score of each target material, a fourth probability sequence corresponding to the target material is determined.

[0119] For example, for the j-th target material in the material sequence D, the number of tags it carries is counted and used as its score. For instance, if the first target material carries 3 tags, its score is 3 points. Next, the score of the j-th target material is divided by the total score of the m target materials in the material sequence D to obtain the score probability of the j-th target material. The score probabilities of the m target materials constitute the fourth probability sequence, denoted as $P4 = [P_{4.1}, P_{4.2}, P_{4.3}, \ldots, P_{4.m}]$.

[0120] Furthermore, after calculating the second, third, and fourth probability sequences, at least two probability sequences can be selected as multiple scoring sequences. For example, the second and third probability sequences can be selected as scoring sequences, or the second, third, and fourth probability sequences can be selected as scoring sequences. This disclosure does not impose any restrictions on the selection of scoring sequences.

[0121] In this embodiment of the disclosure, multiple probability sequences corresponding to the material sequence are calculated from multiple dimensions such as the number of keywords, recall score and the number of tags carried by the target material, and the score sequence is determined from them to fine-tune the ranking of the target material, thereby enabling accurate ranking of recommended writing materials.

[0122] In this embodiment of the disclosure, after determining multiple score sequences, the target score sequence with the minimum relative entropy to the first probability sequence can be determined from the multiple score sequences based on the first probability sequence.

[0123] Relative entropy, also known as Kullback-Leibler divergence (KLD), is an asymmetric measure of the difference between two probability distributions; it is often described informally as the "distance" between them. The smaller the relative entropy, the smaller the difference between the two probability distributions, that is, the smaller the "distance".

[0124] In other words, in this embodiment of the disclosure, the KL divergence between the first probability sequence and each of the multiple score sequences can be calculated, and the score sequence with the smallest KL divergence from the first probability distribution can be selected as the target score sequence. Alternatively, it can be understood as selecting the score sequence from the multiple score sequences that is closest to the first probability distribution as the target score sequence.
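The selection step can be sketched as follows. Because KL divergence is asymmetric and the disclosure does not state a direction, the sketch assumes the divergence is taken from the first probability sequence to each score sequence; the names `kl_divergence` and `select_target_sequence`, the smoothing constant `eps`, and the sample sequences are all hypothetical. Each sequence is assumed to be non-negative and to sum to 1.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete sequences of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def select_target_sequence(first_seq, score_seqs):
    """Return the name of the score sequence closest to first_seq."""
    return min(score_seqs, key=lambda name: kl_divergence(first_seq, score_seqs[name]))

p1 = [0.5, 0.3, 0.2]  # first probability sequence (from relevance)
candidates = {
    "P2": [0.2, 0.3, 0.5],    # hypothetical keyword-dimension sequence
    "P3": [0.45, 0.35, 0.2],  # hypothetical recall-score sequence
}
target_name = select_target_sequence(p1, candidates)
```

In this toy case the recall-score sequence is much closer to the first probability sequence, so it is chosen as the target score sequence.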

[0125] Step 304: Sort the target materials according to the score probability of each target material in the target score sequence and recommend them to the user.

[0126] In this embodiment of the disclosure, after the target score sequence is determined, the target materials can be sorted according to the score probability of each target material in the target score sequence, and the reordered target materials can be recommended to the user.

[0127] For example, suppose there are 10 target materials, and the material sequence $D = [doc_1, doc_2, doc_3, \ldots, doc_{10}]$ is obtained by sorting the 10 target materials in descending order of relevance. After the relative entropy is calculated, the target score sequence is determined to be $P3 = [P_{3.1}, P_{3.2}, P_{3.3}, \ldots, P_{3.10}]$. Assume that the score probabilities in the target score sequence, in descending order, are $P_{3.3} > P_{3.1} > P_{3.2} > P_{3.6} > P_{3.4} > P_{3.5} > P_{3.7} > P_{3.8} > P_{3.10} > P_{3.9}$. The ranking of the target materials is then fine-tuned from the original ranking to a new ranking: doc3, doc1, doc2, doc6, doc4, doc5, doc7, doc8, doc10, doc9. The target materials are then recommended to the user according to the new ranking. Comparing the rankings before and after fine-tuning shows that the target material doc3, originally ranked third, is ranked first after fine-tuning.
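The re-sorting in this example can be reproduced with a small sketch. The probability values are hypothetical, chosen only so that their order matches the one assumed above; the function name `rerank` is likewise illustrative.

```python
def rerank(materials, score_probs):
    """Sort materials by descending score probability (ties keep input order)."""
    order = sorted(range(len(materials)), key=lambda j: score_probs[j], reverse=True)
    return [materials[j] for j in order]

# Material sequence D, already sorted by descending relevance.
materials = [f"doc{j}" for j in range(1, 11)]
# Hypothetical target score sequence P3, listed in the order doc1 to doc10.
p3 = [0.15, 0.14, 0.16, 0.11, 0.10, 0.12, 0.08, 0.06, 0.03, 0.05]
new_ranking = rerank(materials, p3)
```

Because Python's sort is stable, materials with equal score probabilities keep their relevance-based order.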

[0128] The writing material recommendation method of this disclosure determines, based on relevance, a first preset number of target materials with the highest relevance from the multiple candidate materials, and determines the first probability sequence corresponding to the target materials based on their relevance. Then, based on the first probability sequence, it determines, from multiple score sequences, the target score sequence with the smallest relative entropy to the first probability sequence. Finally, it sorts the target materials according to the score probability of each target material in the target score sequence and recommends them to the user. The determined target materials are thus ranked accurately, with the most relevant materials placed at the top, which helps to improve user satisfaction with the ranking results.

[0129] This exemplary embodiment also provides a device for recommending writing materials. Figure 6 shows a schematic block diagram of a writing material recommendation device according to an exemplary embodiment of the present disclosure. As shown in Figure 6, the writing material recommendation device 60 includes: a receiving module 610, a first determining module 620, a first acquiring module 630, a second acquiring module 640, a second determining module 650, and a recommendation module 660.

[0130] The receiving module 610 is used to receive the query statement input by the user.

[0131] The first determining module 620 is used to determine multiple candidate materials related to the query statement from the material library based on the query statement;

[0132] The first acquisition module 630 is used to perform word segmentation and part-of-speech statistics on the query statement and each candidate material to obtain the part-of-speech dimension features corresponding to each candidate material;

[0133] The second acquisition module 640 is used to concatenate and encode the query statement and each candidate material to obtain the semantic features corresponding to each candidate material.

[0134] The second determining module 650 is used to determine the relevance between the query statement and each candidate material based on the part-of-speech dimension features and the semantic features;

[0135] The recommendation module 660 is used to determine a first preset number of target materials from the plurality of candidate materials for recommendation based on the relevance.

[0136] Optionally, the writing material recommendation device 60 further includes:

[0137] The third determining module is used to determine the topic word features of each candidate material based on the number of topic words contained in each candidate material.

[0138] The second determining module 650 is further configured to:

[0139] Based on the topic word features, the part-of-speech dimension features, and the semantic features, the relevance between the query statement and each candidate material is determined.

[0140] Optionally, the second determining module 650 is further configured to:

[0141] For each candidate material, the topic word features and part-of-speech dimension features of each candidate material are concatenated to obtain the first concatenated feature;

[0142] According to the preset first transformation matrix, the first concatenated feature is dimensionally transformed to obtain the first transformed feature;

[0143] The first transformed feature is concatenated with the semantic feature of each candidate material to obtain the second concatenated feature;

[0144] According to the preset second transformation matrix, the second concatenated feature is dimensionally transformed to obtain the second transformed feature;

[0145] According to the preset third transformation matrix, the second transformed feature is dimensionally transformed to obtain the third transformed feature;

[0146] The feature value of the third transformation feature is determined as the relevance between the query statement and each candidate material.

[0147] Optionally, the first acquisition module 630 is further configured to:

[0148] The query statement and each candidate material are segmented into words to obtain the first word segment corresponding to the query statement and the second word segment corresponding to each candidate material;

[0149] According to the preset part-of-speech dimension table, the number of each part of speech contained in the first word segment is counted to obtain the first part-of-speech feature;

[0150] According to the part-of-speech dimension table, the number of each part of speech contained in the second word segment corresponding to each candidate material is counted to obtain the second part-of-speech feature;

[0151] According to the part-of-speech dimension table, the number of times the first word segment and the second word segment corresponding to each candidate material contain the same part of speech is counted to obtain the third part-of-speech feature;

[0152] For each candidate material, the first part-of-speech feature, the second part-of-speech feature, and the third part-of-speech feature are concatenated to obtain the fourth part-of-speech feature corresponding to each candidate material;

[0153] The fourth part-of-speech feature is standardized to obtain the part-of-speech dimension feature corresponding to each candidate material.

[0154] Optionally, the first determining module 620 is further configured to:

[0155] Using the query-statement-side encoding model of a pre-trained dual-tower model, the request representation vector corresponding to the query statement is determined based on the query statement;

[0156] Calculate the similarity between the request representation vector and each material representation vector in the material library, wherein the material representation vectors in the material library are obtained by encoding multiple materials in advance using the material-side encoding model of the dual-tower model;

[0157] Based on the similarity, a second preset number of target material representation vectors with the highest similarity to the request representation vector are determined from the material library;

[0158] The materials corresponding to the target material representation vectors are determined as the multiple candidate materials.

[0159] Optionally, the recommendation module 660 includes:

[0160] The first determining unit is used to determine, based on the relevance, the first preset number of target materials with the highest relevance from the plurality of candidate materials;

[0161] The second determining unit is used to determine the first probability sequence corresponding to the target material based on the relevance of the target material;

[0162] The third determining unit is used to determine, from multiple score sequences, the target score sequence with the smallest relative entropy to the first probability sequence based on the first probability sequence;

[0163] The sorting unit is used to sort the target materials according to the score probability of each target material in the target score sequence and then recommend them to the user.
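
The selection by relative entropy and the final sorting can be sketched as follows. Several details are assumptions rather than statements of the disclosure: the first probability sequence is obtained from the relevance values with a softmax, the relative entropy is computed as KL(first sequence || score sequence), and a small `eps` guards against zero probabilities. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into a probability sequence (assumed normalization)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relative_entropy(p, q, eps=1e-12):
    """KL divergence KL(p || q); eps avoids log(0) and division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def rank_targets(material_ids, relevances, score_sequences):
    """Pick the score sequence closest to the first probability sequence and sort by it.

    score_sequences maps a sequence name to a probability sequence that is
    aligned with material_ids.
    """
    first_sequence = softmax(relevances)
    target_name = min(
        score_sequences,
        key=lambda name: relative_entropy(first_sequence, score_sequences[name]),
    )
    target_sequence = score_sequences[target_name]
    order = sorted(range(len(material_ids)), key=lambda i: target_sequence[i], reverse=True)
    return target_name, [material_ids[i] for i in order]


name, ranked = rank_targets(
    ["a", "b", "c"],
    [2.0, 1.0, 0.0],
    {"second": [0.2, 0.5, 0.3], "third": [0.3, 0.6, 0.1]},
)
print(name, ranked)
```

Choosing the score sequence with the smallest relative entropy keeps the final order close to the relevance-based order while letting the auxiliary signal decide the final positions.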

[0164] Optionally, the writing material recommendation device 60 further includes:

[0165] The fourth determining module is used to determine the second probability sequence corresponding to the target material based on the number of topic words contained in the target material and the number of words in the query statement contained in the target material;

[0166] The fifth determining module is used to determine the third probability sequence corresponding to the target material based on the similarity between the query statement and the target material, and the relevance between the query statement and the target material;

[0167] The sixth determining module is used to take the number of tags carried by the target material as the score of the corresponding target material, and determine the fourth probability sequence corresponding to the target material based on the score;

[0168] The selection module is used to select at least two from the second probability sequence, the third probability sequence, and the fourth probability sequence as the plurality of score sequences.
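
As an illustration of how the second and fourth probability sequences might be built, the sketch below scores each target material by simple counts and normalizes the scores so that they sum to one. The text does not state how the two counts are combined or how scores become probabilities, so the plain sum and the sum-normalization used here are assumptions, as are all names and the sample data.

```python
def to_probability_sequence(scores):
    """Normalize non-negative scores to sum to 1; fall back to uniform if all are 0."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def second_probability_sequence(materials, topic_words, query_words):
    """Score = topic words found in the material + query words found in the material."""
    scores = []
    for words in materials:
        word_set = set(words)
        topic_hits = sum(1 for w in topic_words if w in word_set)
        query_hits = sum(1 for w in query_words if w in word_set)
        scores.append(topic_hits + query_hits)
    return to_probability_sequence(scores)


def fourth_probability_sequence(tag_lists):
    """Score = number of tags carried by each target material."""
    return to_probability_sequence([len(tags) for tags in tag_lists])


# Illustrative data: each target material is given as its list of segmented words.
materials = [["autumn", "leaves", "wind"], ["spring", "rain"]]
tags = [["scenery", "season"], ["season"]]

# Select at least two sequences as the plurality of score sequences.
score_sequences = {
    "second": second_probability_sequence(materials, ["autumn"], ["autumn", "wind"]),
    "fourth": fourth_probability_sequence(tags),
}
print(score_sequences)
```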

[0169] Optionally, the fifth determining module is further configured to:

[0170] Obtain the first preset weight corresponding to similarity and the second preset weight corresponding to relevance;

[0171] Based on the first preset weight and the second preset weight, the similarity between the query statement and the target material and the relevance between the query statement and the target material are weighted and summed to obtain the recall score of the target material;

[0172] Based on the recall score of the target material, a third probability sequence corresponding to the target material is determined.
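
The weighted sum above is a short computation; the sketch below shows one way to write it. The weight values 0.4 and 0.6 are placeholders, since the text only says the weights are preset, and the softmax used to turn recall scores into the third probability sequence is an assumption.

```python
import math


def recall_scores(similarities, relevances, similarity_weight=0.4, relevance_weight=0.6):
    """Weighted sum of similarity and relevance for each target material."""
    return [
        similarity_weight * s + relevance_weight * r
        for s, r in zip(similarities, relevances)
    ]


def third_probability_sequence(similarities, relevances,
                               similarity_weight=0.4, relevance_weight=0.6):
    """Softmax over the recall scores (assumed normalization)."""
    scores = recall_scores(similarities, relevances, similarity_weight, relevance_weight)
    peak = max(scores)
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two target materials: the first has lower similarity but higher relevance.
print(recall_scores([0.5, 1.0], [1.0, 0.0]))
print(third_probability_sequence([0.5, 1.0], [1.0, 0.0]))
```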

[0173] The writing material recommendation device provided in this disclosure can execute any writing material recommendation method applicable to electronic devices provided in this disclosure, and has the corresponding functional modules and beneficial effects for executing the method. Content not described in detail in the device embodiments of this disclosure can be referred to the descriptions in any method embodiments of this disclosure.

[0174] Exemplary embodiments of this disclosure also provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores a computer program executable by the at least one processor, the computer program, when executed by the at least one processor, causing the electronic device to perform a method for recommending writing materials according to embodiments of this disclosure.

[0175] Exemplary embodiments of this disclosure also provide a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a computer's processor, is used to cause the computer to perform a method of recommending writing material according to embodiments of this disclosure.

[0176] Exemplary embodiments of this disclosure also provide a computer program product, including a computer program, wherein, when executed by a computer's processor, the computer program is used to cause the computer to perform a method for recommending writing materials according to embodiments of this disclosure.

[0177] Referring to Figure 7, a structural block diagram of an electronic device 1100 that can serve as a server or client of the present disclosure is now described; it is an example of a hardware device that can be applied to various aspects of the present disclosure. The electronic device is intended to represent various forms of digital electronic computer devices, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementation of the present disclosure described and/or claimed herein.

[0178] As shown in Figure 7, the electronic device 1100 includes a computing unit 1101, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a random access memory (RAM) 1103. The RAM 1103 may also store various programs and data required for the operation of the electronic device 1100. The computing unit 1101, ROM 1102, and RAM 1103 are interconnected via a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.

[0179] Multiple components in the electronic device 1100 are connected to the I/O interface 1105, including an input unit 1106, an output unit 1107, a storage unit 1108, and a communication unit 1109. The input unit 1106 can be any type of device capable of inputting information to the electronic device 1100. The input unit 1106 can receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit 1107 can be any type of device capable of presenting information and may include, but is not limited to, a display, speaker, video/audio output terminal, vibrator, and/or printer. The storage unit 1108 may include, but is not limited to, magnetic disks and optical disks. The communication unit 1109 allows the electronic device 1100 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers, and/or chipsets, such as Bluetooth™ devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.

[0180] The computing unit 1101 can be a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1101 performs the various methods and processes described above. For example, in some embodiments, the method of recommending writing materials can be implemented as a computer software program tangibly contained in a machine-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program can be loaded and/or installed on the electronic device 1100 via the ROM 1102 and/or the communication unit 1109. In some embodiments, the computing unit 1101 can be configured to perform the method of recommending writing materials by any other suitable means (e.g., by means of firmware).

[0181] The program code used to implement the methods of this disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that when executed by the processor or controller, the program code causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0182] In the context of this disclosure, a machine-readable medium can be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

[0183] As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, device, and/or apparatus (e.g., disk, optical disk, memory, programmable logic device (PLD)) for providing machine instructions and/or data to a programmable processor, including machine-readable media that receive machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal for providing machine instructions and/or data to a programmable processor.

[0184] To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device for displaying information to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the computer. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0185] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as a data server), or computing systems that include middleware components (e.g., an application server), or computing systems that include frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with embodiments of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

[0186] Computer systems can include clients and servers. Clients and servers are generally located far apart and typically interact through communication networks. Client-server relationships are created by computer programs running on the respective computers and having a client-server relationship with each other.

Claims

1. A method for recommending writing materials, wherein the method includes: receiving a query statement input by a user; determining, based on the query statement, multiple candidate materials related to the query statement from a material library; performing word segmentation and part-of-speech statistics on the query statement and each candidate material to obtain the part-of-speech dimension features corresponding to each candidate material; concatenating and encoding the query statement and each candidate material to obtain the semantic features corresponding to each candidate material; determining the relevance between the query statement and each candidate material based on the part-of-speech dimension features and the semantic features; and determining, based on the relevance, a first preset number of target materials from the plurality of candidate materials for recommendation, including: determining, based on the relevance, the first preset number of target materials with the highest relevance from the plurality of candidate materials; determining a first probability sequence corresponding to the target materials based on the relevance of the target materials; determining, based on the first probability sequence, the target score sequence with the minimum relative entropy to the first probability sequence from multiple score sequences; sorting the target materials according to the score probability of each target material in the target score sequence and then recommending them to the user; determining a second probability sequence corresponding to the target materials based on the number of topic words contained in each target material and the number of words of the query statement contained in each target material; determining a third probability sequence corresponding to the target materials based on the similarity between the query statement and each target material and the relevance between the query statement and each target material; taking the number of tags carried by each target material as the score of that target material, and determining a fourth probability sequence corresponding to the target materials based on the score; and selecting at least two of the second probability sequence, the third probability sequence, and the fourth probability sequence as the plurality of score sequences.

2. The method for recommending writing materials as described in claim 1, wherein the method further includes: determining the topic word features of each candidate material based on the number of topic words contained in each candidate material; and wherein determining the relevance between the query statement and each candidate material based on the part-of-speech dimension features and the semantic features includes: determining the relevance between the query statement and each candidate material based on the topic word features, the part-of-speech dimension features, and the semantic features.

3. The method for recommending writing materials as described in claim 2, wherein the step of determining the relevance between the query statement and each candidate material based on the topic word features, the part-of-speech dimension features, and the semantic features includes: for each candidate material, concatenating the topic word features and the part-of-speech dimension features of the candidate material to obtain a first concatenated feature; performing a dimensional transformation on the first concatenated feature according to a preset first transformation matrix to obtain a first transformed feature; concatenating the first transformed feature with the semantic features of the candidate material to obtain a second concatenated feature; performing a dimensional transformation on the second concatenated feature according to a preset second transformation matrix to obtain a second transformed feature; performing a dimensional transformation on the second transformed feature according to a preset third transformation matrix to obtain a third transformed feature; and determining the feature value of the third transformed feature as the relevance between the query statement and the candidate material.

4. The method for recommending writing materials as described in claim 1, wherein the step of performing word segmentation and part-of-speech statistics on the query statement and each candidate material to obtain the part-of-speech dimension features corresponding to each candidate material includes: segmenting the query statement and each candidate material into words to obtain the first word segment corresponding to the query statement and the second word segment corresponding to each candidate material; counting, according to a preset part-of-speech dimension table, the number of each part of speech contained in the first word segment to obtain the first part-of-speech feature; counting, according to the part-of-speech dimension table, the number of each part of speech contained in the second word segment corresponding to each candidate material to obtain the second part-of-speech feature; counting, according to the part-of-speech dimension table, the number of times the first word segment and the second word segment corresponding to each candidate material contain the same part of speech to obtain the third part-of-speech feature; for each candidate material, concatenating the first part-of-speech feature, the second part-of-speech feature, and the third part-of-speech feature to obtain the fourth part-of-speech feature corresponding to the candidate material; and standardizing the fourth part-of-speech feature to obtain the part-of-speech dimension feature corresponding to each candidate material.

5. The method for recommending writing materials as described in claim 1, wherein the step of determining, based on the query statement, multiple candidate materials related to the query statement from the material library includes: determining the request representation vector corresponding to the query statement using the query-statement encoding model of a pre-trained dual-tower model; calculating the similarity between the request representation vector and each material representation vector in the material library, wherein the material representation vectors in the material library are obtained by encoding multiple materials in advance using the material-end encoding model of the dual-tower model; determining, based on the similarity, a second preset number of target material representation vectors with the highest similarity to the request representation vector from the material library; and determining the material corresponding to each target material representation vector as one of the multiple candidate materials.

6. The method for recommending writing materials as described in claim 1, wherein the step of determining the third probability sequence corresponding to the target material based on the similarity between the query statement and the target material and the relevance between the query statement and the target material includes: obtaining the first preset weight corresponding to similarity and the second preset weight corresponding to relevance; performing, based on the first preset weight and the second preset weight, a weighted sum of the similarity between the query statement and the target material and the relevance between the query statement and the target material to obtain the recall score of the target material; and determining the third probability sequence corresponding to the target material based on the recall score of the target material.

7. A device for recommending writing materials, wherein the device includes: a receiving module, used to receive a query statement input by a user; a first determining module, used to determine, based on the query statement, multiple candidate materials related to the query statement from a material library; a first acquisition module, used to perform word segmentation and part-of-speech statistics on the query statement and each candidate material to obtain the part-of-speech dimension features corresponding to each candidate material; a second acquisition module, used to concatenate and encode the query statement and each candidate material to obtain the semantic features corresponding to each candidate material; a second determining module, used to determine the relevance between the query statement and each candidate material based on the part-of-speech dimension features and the semantic features; and a recommendation module, used to determine, based on the relevance, a first preset number of target materials from the plurality of candidate materials for recommendation, including: determining, based on the relevance, the first preset number of target materials with the highest relevance from the plurality of candidate materials; determining a first probability sequence corresponding to the target materials based on the relevance of the target materials; determining, based on the first probability sequence, the target score sequence with the minimum relative entropy to the first probability sequence from multiple score sequences; sorting the target materials according to the score probability of each target material in the target score sequence and then recommending them to the user; determining a second probability sequence corresponding to the target materials based on the number of topic words contained in each target material and the number of words of the query statement contained in each target material; determining a third probability sequence corresponding to the target materials based on the similarity between the query statement and each target material and the relevance between the query statement and each target material; taking the number of tags carried by each target material as the score of that target material, and determining a fourth probability sequence corresponding to the target materials based on the score; and selecting at least two of the second probability sequence, the third probability sequence, and the fourth probability sequence as the plurality of score sequences.

8. An electronic device, comprising: a processor; and a memory storing a program, wherein the program includes instructions that, when executed by the processor, cause the processor to perform the method of recommending writing materials according to any one of claims 1-6.

9. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method of recommending writing materials according to any one of claims 1-6.