Intangible cultural heritage news recommendation method and device, equipment and storage medium
By using the BERT-BiLSTM-CRM model and intangible cultural heritage knowledge graph, the title and keyword feature vectors of intangible cultural heritage news are obtained. Combined with user preferences, the problem of inaccurate intangible cultural heritage news recommendations in traditional recommendation algorithms is solved, and more accurate intangible cultural heritage news recommendations are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HUAZHONG NORMAL UNIV
- Filing Date
- 2024-08-15
- Publication Date
- 2026-06-12
AI Technical Summary
Traditional news recommendation algorithms struggle to accurately recommend intangible cultural heritage news based on user preferences, primarily because intangible cultural heritage news contains a large number of proper noun entities and special entity relationships, and the lack of a thematic corpus leads to poor recommendation results.
Using the BERT-BiLSTM-CRM model and the first intangible cultural heritage knowledge graph, we obtain the title feature vector and keyword feature vector of intangible cultural heritage news. We then fuse the title and keyword features through an attention mechanism and combine them with user click history to generate user preference feature vectors to push intangible cultural heritage news.
It enables accurate recommendations of intangible cultural heritage news based on user preferences, improving the accuracy and effectiveness of recommendations and making up for the problem of insufficient semantic information in short titles.
Figure CN119202219B_ABST
Description
Technical Field
[0001] This disclosure relates to the field of information processing technology, and in particular to a method, apparatus, equipment and storage medium for recommending intangible cultural heritage news.
Background Technology
[0002] Intangible cultural heritage (ICH) is an important part of China's outstanding traditional culture, carrying rich historical and cultural connotations and the genes of Chinese culture.
[0003] Due to the large number of proper noun entities and special entity relationships contained in intangible cultural heritage news, as well as the lack of a thematic corpus on intangible cultural heritage, traditional news recommendation algorithms struggle to accurately recommend intangible cultural heritage news based on user preferences.
Summary of the Invention
[0004] This disclosure provides a method, apparatus, device, and storage medium for recommending intangible cultural heritage news, capable of accurately recommending intangible cultural heritage news based on user preferences. The technical solution includes at least the following:
[0005] Firstly, a method for recommending intangible cultural heritage (ICH) news is provided, comprising: obtaining the title feature vector and keyword feature vector of each ICH news article in the ICH news collection based on the BERT-BiLSTM-CRM model and a first ICH knowledge graph, wherein the keyword feature vector is extracted from the body text of the ICH news article; processing the title feature vector and keyword feature vector of each ICH news article using an attention mechanism to obtain the feature vector of each ICH news article; determining the user preference feature vector corresponding to each candidate news article based on the feature vectors of multiple ICH news articles in the ICH news collection, wherein the candidate news articles are ICH news articles in the ICH news collection that the user has not clicked; and pushing ICH news to the user based on the user preference feature vector corresponding to each candidate news article.
[0006] Optionally, the step of obtaining the title feature vector and keyword feature vector of each intangible cultural heritage news article in the intangible cultural heritage news collection based on the BERT-BiLSTM-CRM model and the first intangible cultural heritage knowledge graph includes: converting the title of the first intangible cultural heritage news article into multiple first word vectors, wherein the first intangible cultural heritage news article is any one of the multiple intangible cultural heritage news articles; obtaining multiple first entities in the title of the first intangible cultural heritage news article and a first entity vector of each first entity based on the BERT-BiLSTM-CRM model; obtaining multiple first entity context vectors based on the first intangible cultural heritage knowledge graph and the multiple first entities, wherein each first entity context vector corresponds to a first entity; and inputting the multiple first word vectors, the multiple first entity vectors, and the multiple entity context vectors into a knowledge-aware convolutional neural network KCNN to obtain the title feature vector of the first intangible cultural heritage news article.
[0007] Optionally, the step of obtaining the title feature vector and keyword feature vector of each intangible cultural heritage news article in the intangible cultural heritage news collection based on the BERT-BiLSTM-CRM model and the first intangible cultural heritage knowledge graph further includes: obtaining the text keyword sequence of the first intangible cultural heritage news article, wherein the text keyword sequence includes multiple keywords in the text of the first intangible cultural heritage news article; converting the multiple keywords in the text keyword sequence into multiple second word vectors; obtaining multiple second entities and a second entity vector of each second entity in the text keyword sequence of the first intangible cultural heritage news article based on the BERT-BiLSTM-CRM model; obtaining multiple second entity context vectors based on the first intangible cultural heritage knowledge graph and the multiple second entities, wherein each second entity context vector corresponds to a second entity; and inputting the multiple second word vectors, the multiple second entity vectors, and the multiple entity context vectors into a knowledge-aware convolutional neural network KCNN to obtain the keyword feature vector of the first intangible cultural heritage news article.
[0008] Optionally, the step of processing the title feature vector and keyword feature vector of each intangible cultural heritage news article using an attention mechanism to obtain the feature vector of each intangible cultural heritage news article includes: determining the feature vector of each intangible cultural heritage news article using the following formula:
[0009] $r_n = \alpha_t r_t + \alpha_k r_k$
[0010] Where $r_n$ is the feature vector of the intangible cultural heritage news, $r_t$ is the title feature vector of the intangible cultural heritage news, $r_k$ is the keyword feature vector of the intangible cultural heritage news, $\alpha_t$ is the attention weight of the title feature vector, and $\alpha_k$ is the attention weight of the keyword feature vector.
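The fusion in the formula above can be sketched in a few lines of Python. The text does not say how the two attention weights are computed, so this sketch assumes they come from a softmax over two scalar relevance scores; the names `fuse_features`, `score_t` and `score_k` are hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def fuse_features(r_t, r_k, score_t, score_k):
    """Return r_n = alpha_t * r_t + alpha_k * r_k and the two weights.

    Assumption: alpha_t and alpha_k are a softmax over two scalar scores.
    """
    alpha_t, alpha_k = softmax([score_t, score_k])
    r_n = [alpha_t * a + alpha_k * b for a, b in zip(r_t, r_k)]
    return r_n, (alpha_t, alpha_k)


# Equal scores give equal weights, so r_n is the mean of the two vectors.
r_n, weights = fuse_features([1.0, 0.0], [0.0, 1.0], 0.0, 0.0)
```

With unequal scores the fused vector leans toward whichever of the title or keyword vectors receives the larger weight.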
[0011] Optionally, determining the user preference feature vector corresponding to each candidate news item based on the feature vectors of multiple intangible cultural heritage news items in the intangible cultural heritage news collection includes: determining the user preference vector corresponding to the j-th candidate news item using the following formula:
[0012]
[0013] Where e(i) is the user preference vector of user i for the j-th candidate news item, expressed in terms of the feature vector of the k-th intangible cultural heritage news item clicked by user i, the total number $N_i$ of intangible cultural heritage news items clicked by user i, and the weight matrix; $t_j$ refers to the j-th candidate news item;
[0014] The weight matrix is calculated using the following formula:
[0015]
[0016] Where the left-hand side is the weight matrix, softmax is the normalization function, and $e(t_j)$ is the feature vector of the j-th candidate news item.
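The formula bodies of paragraphs [0012] and [0015] are not present in this text. One reading consistent with the surviving definitions is the attention-weighted sum over clicked news used in DKN-style recommenders. The symbols $e(t_k^i)$ (feature vector of the k-th news item clicked by user i) and $s_{t_k^i,t_j}$ (the weight), and the scoring function $\mathcal{H}$, are assumptions and are not taken from the source:

```latex
e(i) = \sum_{k=1}^{N_i} s_{t_k^i,\,t_j}\, e(t_k^i)

s_{t_k^i,\,t_j}
  = \operatorname{softmax}\!\left(\mathcal{H}\big(e(t_k^i),\, e(t_j)\big)\right)
  = \frac{\exp\!\left(\mathcal{H}\big(e(t_k^i),\, e(t_j)\big)\right)}
         {\sum_{k'=1}^{N_i} \exp\!\left(\mathcal{H}\big(e(t_{k'}^i),\, e(t_j)\big)\right)}
```

Under this reading the weights for a given candidate $t_j$ sum to 1 over the user's $N_i$ clicked items, so $e(i)$ is a convex combination of the clicked-news feature vectors.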
[0017] Optionally, the step of pushing intangible cultural heritage news to the user based on the user preference feature vector corresponding to each candidate news item includes: determining the user's degree of liking for each candidate news item based on the user preference feature vector corresponding to each candidate news item; sorting the candidate news items in descending order of liking to obtain a first liking order; and pushing the first N candidate news items in the first liking order to the user.
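The ranking step can be sketched as follows. How the degree of liking is computed from the user preference feature vector is not specified in this passage, so the sketch takes the liking scores as given; `top_n_candidates` and the sample identifiers are hypothetical.

```python
def top_n_candidates(liking_by_candidate, n):
    """Sort candidates by liking in descending order and keep the first n."""
    ranked = sorted(
        liking_by_candidate.items(),
        key=lambda item: item[1],
        reverse=True,
    )
    return [candidate_id for candidate_id, _ in ranked[:n]]


# Illustrative liking scores for three candidate news items.
pushed = top_n_candidates({"news_a": 0.2, "news_b": 0.9, "news_c": 0.5}, 2)
```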
[0018] Secondly, an intangible cultural heritage (ICH) news recommendation device is also provided, comprising: an acquisition module, used to acquire the title feature vector and keyword feature vector of each ICH news article in the ICH news collection based on the BERT-BiLSTM-CRM model and a first ICH knowledge graph, wherein the keyword feature vector is extracted from the body text of the ICH news article; an attention mechanism module, used to process the title feature vector and keyword feature vector of each ICH news article using an attention mechanism to obtain the feature vector of each ICH news article; a user preference feature vector calculation module, used to determine the user preference feature vector corresponding to each candidate news article based on the feature vectors of multiple ICH news articles in the ICH news collection, wherein the candidate news articles are ICH news articles in the ICH news collection that the user has not clicked; and a push module, used to push ICH news articles to the user based on the user preference feature vector corresponding to each candidate news article.
[0019] Optionally, the acquisition module is further configured to convert the title of the first intangible cultural heritage news into multiple first word vectors, wherein the first intangible cultural heritage news is any one of the multiple intangible cultural heritage news; based on the BERT-BiLSTM-CRM model, acquire multiple first entities in the title of the first intangible cultural heritage news and the first entity vector of each first entity; based on the first intangible cultural heritage knowledge graph and the multiple first entities, acquire multiple first entity context vectors, wherein each first entity context vector corresponds to a first entity; input the multiple first word vectors, the multiple first entity vectors and the multiple entity context vectors into a knowledge-aware convolutional neural network KCNN to obtain the title feature vector of the first intangible cultural heritage news.
[0020] Optionally, the acquisition module is further configured to acquire the main text keyword sequence of the first intangible cultural heritage news, the main text keyword sequence including multiple keywords in the main text of the first intangible cultural heritage news; convert the multiple keywords in the main text keyword sequence into multiple second word vectors; based on the BERT-BiLSTM-CRM model, acquire multiple second entities in the main text keyword sequence of the first intangible cultural heritage news and the second entity vector of each second entity; based on the first intangible cultural heritage knowledge graph and the multiple second entities, acquire multiple second entity context vectors, each second entity context vector corresponding to a second entity; input the multiple second word vectors, the multiple second entity vectors and the multiple entity context vectors into a knowledge-aware convolutional neural network KCNN to obtain the keyword feature vector of the first intangible cultural heritage news.
[0021] Optionally, the attention mechanism module is further configured to determine the feature vector of each intangible cultural heritage news article using the following formula:
[0022] $r_n = \alpha_t r_t + \alpha_k r_k$
[0023] Where $r_n$ is the feature vector of the intangible cultural heritage news, $r_t$ is the title feature vector of the intangible cultural heritage news, $r_k$ is the keyword feature vector of the intangible cultural heritage news, $\alpha_t$ is the attention weight of the title feature vector, and $\alpha_k$ is the attention weight of the keyword feature vector.
[0024] Optionally, the user preference feature vector calculation module is further configured to determine the user preference vector corresponding to the j-th candidate news item using the following formula:
[0025]
[0026] Where e(i) is the user preference vector of user i for the j-th candidate news item, expressed in terms of the feature vector of the k-th intangible cultural heritage news item clicked by user i, the total number $N_i$ of intangible cultural heritage news items clicked by user i, and the weight matrix; $t_j$ refers to the j-th candidate news item;
[0027] The weight matrix is calculated using the following formula:
[0028]
[0029] Where the left-hand side is the weight matrix, softmax is the normalization function, and $e(t_j)$ is the feature vector of the j-th candidate news item.
[0030] Optionally, the push module is further configured to determine the user's degree of liking for each candidate intangible cultural heritage news item based on the user preference feature vector corresponding to each candidate news item; sort the candidate intangible cultural heritage news items in descending order of liking to obtain a first liking order; and push the first N candidate news items in the first liking order to the user.
[0031] Thirdly, a computer device is also provided, comprising: a memory and a processor, wherein the memory stores at least one computer program, and the at least one computer program is loaded and executed by the processor to perform the intangible cultural heritage news push method described in the above embodiments.
[0032] Fourthly, a computer-readable storage medium is also provided, wherein at least one computer program is stored in the computer-readable storage medium, and the at least one computer program is loaded and executed by a processor to perform the intangible cultural heritage news push method described in the above embodiments.
[0033] Fifthly, a computer program product is provided, including a computer program / instructions that, when executed by a processor, implement the method described in the first aspect.
[0034] The beneficial effects of the technical solutions provided in this disclosure include at least the following:
[0035] In this embodiment, although news headlines are a high-level summary of news content, some intangible cultural heritage news headlines are relatively short and contain less entity and semantic information. If only news headlines are used for recommendation, the effect may not be ideal. Therefore, in this embodiment, the title feature vector and keyword feature vector of each intangible cultural heritage news article in the intangible cultural heritage news set are obtained based on the BERT-BiLSTM-CRM model and the first intangible cultural heritage knowledge graph. This is equivalent to using both the title and the body text of the intangible cultural heritage news article for recommendation, thus enabling accurate recommendation of intangible cultural heritage news based on user preferences.
Attached Figure Description
[0036] To more clearly illustrate the technical solutions in the embodiments of this disclosure, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0037] Figure 1 is a flowchart of a method for recommending intangible cultural heritage news provided by an exemplary embodiment of the present disclosure;
[0038] Figure 2 is a schematic diagram of the first intangible cultural heritage knowledge graph;
[0039] Figure 3 is a flowchart of a method for recommending intangible cultural heritage news provided by another exemplary embodiment of the present disclosure;
[0040] Figure 4 is a schematic diagram of a sub-intangible cultural heritage knowledge graph;
[0041] Figure 5 is a schematic diagram of the KCNN structure;
[0042] Figure 6 is a schematic diagram of the structure of the intangible cultural heritage news recommendation model;
[0043] Figure 7 is a schematic diagram of the structure of an intangible cultural heritage news recommendation device provided in an exemplary embodiment of the present disclosure;
[0044] Figure 8 is a schematic diagram of the structure of a computer device provided in an embodiment of this disclosure.
Detailed Implementation
[0045] Unless otherwise defined, the technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure pertains. The terms “first,” “second,” “third,” and similar terms used in this patent application specification and claims do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Similarly, the terms “an” or “a” and similar terms do not indicate a quantity limitation, but rather indicate the presence of at least one. The terms “comprising” or “including” and similar terms mean that the elements or objects preceding “comprising” or “including” encompass the elements or objects listed following “comprising” or “including” and their equivalents, but do not exclude other elements or objects.
[0046] To make the objectives, technical solutions, and advantages of this disclosure clearer, the embodiments of this disclosure will be described in further detail below with reference to the accompanying drawings.
[0047] Figure 1 is a flowchart of a method for recommending intangible cultural heritage news provided by an exemplary embodiment of this disclosure. The method can be executed by a computer device. Referring to Figure 1, the method includes:
[0048] In step 101, based on the BERT-BiLSTM-CRM model and the first intangible cultural heritage knowledge graph, the title feature vector and keyword feature vector of each intangible cultural heritage news article in the intangible cultural heritage news collection are obtained.
[0049] Here, the keyword feature vector is extracted from the main text of the intangible cultural heritage news.
[0050] The BERT-BiLSTM-CRM model (Bidirectional Encoder Representations from Transformers, Bidirectional Long Short-Term Memory, Conditional Random Fields; the conditional random field layer is more commonly abbreviated CRF) can be divided into three parts: BERT, BiLSTM, and CRM. BERT, a bidirectional encoder representation built on Transformers, is capable of capturing complex relationships in text; BiLSTM has strong context-capturing capabilities; and CRM offers advantages in sequence labeling. Therefore, the BERT-BiLSTM-CRM model, which combines these three parts, has strong Chinese entity recognition capabilities, reduces labeling errors, and effectively improves entity recognition accuracy.
[0051] The first intangible cultural heritage knowledge graph is a knowledge graph in the field of intangible cultural heritage. In this embodiment of the disclosure, the first intangible cultural heritage knowledge graph can be an existing intangible cultural heritage knowledge graph, or it can be constructed using the following steps a to c.
[0052] Step a: Obtain a set of intangible cultural heritage text data from the intangible cultural heritage database, and then filter out multiple intangible cultural heritage entities from the set of intangible cultural heritage text data.
[0053] Optionally, the intangible cultural heritage database includes unstructured text information such as the "List of Representative Projects of National Intangible Cultural Heritage" and the list and introduction of representative inheritors of representative projects of national intangible cultural heritage published on the official website "China Intangible Cultural Heritage Network · China Intangible Cultural Heritage Digital Museum" (hereinafter referred to as China Intangible Cultural Heritage Network), which is hosted by the Chinese Academy of Arts and the China Intangible Cultural Heritage Protection Center.
[0054] When retrieving a collection of intangible cultural heritage (ICH) text data from an ICH database, a web crawling framework can be used to scrape all national-level project lists and inheritor lists, as well as a large amount of ICH-related news text data from the China Intangible Cultural Heritage website, thus obtaining the ICH text data collection. The web crawling framework can be the Python-based Scrapy framework.
[0055] Before selecting multiple intangible cultural heritage entities from the intangible cultural heritage text data set, it is first necessary to define the types of intangible cultural heritage entities. In this embodiment of the disclosure, intangible cultural heritage entities include intangible cultural heritage projects, inheritors, time, geographical location, project category, project level, project batch, dynasty of origin, and region of application. Among them, intangible cultural heritage projects, inheritors, time, and geographical location can be considered first-level entities, while project category, project level, project batch, dynasty of origin, and region of application can be considered second-level entities. First-level entities are of higher importance than second-level entities.
[0056] Optionally, the BERT-BiLSTM-CRM model can be used to filter out multiple intangible cultural heritage entities from the intangible cultural heritage text dataset.
[0057] Step b: Obtain the entity relationships of multiple intangible cultural heritage entities.
[0058] Optionally, the Language Technology Platform (LTP) developed by the Harbin Institute of Technology team can be used to obtain the entity relationships of multiple intangible cultural heritage entities. Ultimately, multiple entity relationship triples can be obtained, with the structure of <entity, relationship, entity>.
[0059] Optionally, for any entity, entity attributes can also be defined. Clicking on the entity in the knowledge graph will display its entity attributes. Entity attributes can include data attributes and object attributes. For example, entities of the type "Intangible Cultural Heritage Project" include data attributes such as project name, number, batch, category, level, application region, protection unit, origin dynasty, and announcement time; object attributes of entities of the type "Intangible Cultural Heritage Project" include other related entities such as inheritor, project category, project level, and origin dynasty, which are the inherent connections between entities, also known as relational attributes.
[0060] Step c: Construct the first intangible cultural heritage knowledge graph based on multiple intangible cultural heritage entities and their relationships.
[0061] After obtaining triples of multiple entity relationships, the first intangible cultural heritage knowledge graph can be constructed based on these triples.
[0062] For example, the first intangible cultural heritage knowledge graph is constructed using the open-source graph database Neo4j. Neo4j has advantages such as high performance, built-in visualization tools, efficient graph traversal, ability to handle billions of nodes and relationships, and support for the Cypher language.
[0063] When importing intangible cultural heritage (ICH) entities, the identified ICH entities are first categorized and stored as CSV files. Then, the LOAD CSV command in Cypher is used to batch import the entities from the CSV files into the Neo4j database, forming a series of nodes. Next, the MATCH command in Cypher is used to query entities that meet specific conditions and create relationships between them, thus obtaining the first ICH knowledge graph. For example, the statement "MATCH(from:project),(to:project_type)where from.type=to.type MERGE(from)-[r:project_type]->(to)return r" constructs a triplet of relationships between the ICH project and ICH category entities based on the common attribute "type" of the project entity "project" and the project category entity "project_type".
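The import flow described above can be sketched with the Python standard library: entities are serialized to CSV text, and the corresponding Cypher statements are assembled as strings. The helper names, the file name `project.csv`, and the sample `type` value are hypothetical; only the `project` and `project_type` labels and the MATCH statement come from the text. Running the statements against a Neo4j instance is left out of the sketch.

```python
import csv
import io


def entities_to_csv(rows, fieldnames):
    """Serialize recognized entities (one dict per entity) as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def load_csv_statement(file_name, label, fieldnames):
    """Build a Cypher LOAD CSV statement that creates one node per CSV row."""
    props = ", ".join(f"{name}: row.{name}" for name in fieldnames)
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{file_name}' AS row "
        f"MERGE (:{label} {{{props}}})"
    )


# Illustrative entity; the "type" value is an assumed example.
fields = ["name", "type"]
projects = [{"name": "Legend of Mulan", "type": "Folk literature"}]
csv_text = entities_to_csv(projects, fields)
node_statement = load_csv_statement("project.csv", "project", fields)

# Relationship statement quoted from the text, reformatted for readability.
relationship_statement = (
    "MATCH (from:project), (to:project_type) "
    "WHERE from.type = to.type "
    "MERGE (from)-[r:project_type]->(to) RETURN r"
)
```

Using MERGE rather than CREATE in the node statement is a design choice of this sketch: it keeps repeated imports of the same CSV file from duplicating nodes.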
[0064] The collected textual information on intangible cultural heritage projects and inheritors was processed using steps a to c described above, and then imported into the Neo4j graph database in batches to form an intangible cultural heritage knowledge graph, resulting in a total of 6731 nodes and 25523 relationships.
[0065] Figure 2 is a schematic diagram of the first intangible cultural heritage knowledge graph. Using the intangible cultural heritage item "Legend of Mulan" from the database as the query node, the knowledge graph generated in Neo4j Browser is as shown in Figure 2.
[0066] Optionally, the intangible cultural heritage news collection includes multiple published intangible cultural heritage news articles, some of which have been clicked by users and some of which have not. This embodiment of the disclosure does not limit the number or source of intangible cultural heritage news articles in the intangible cultural heritage news collection.
[0067] In step 102, an attention mechanism is used to process the title feature vector and keyword feature vector of each intangible cultural heritage news article to obtain the feature vector of each intangible cultural heritage news article.
[0068] In step 103, based on the feature vectors of multiple intangible cultural heritage news articles in the intangible cultural heritage news collection, the user preference feature vector corresponding to each candidate news article is determined.
[0069] Here, the candidate news is the intangible cultural heritage news in the intangible cultural heritage news collection that the user has not clicked on.
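Step 103 can be illustrated with a minimal sketch that builds one user preference vector per candidate from the user's clicked news. The scoring function is an assumption: a dot product between each clicked-news feature vector and the candidate feature vector stands in for whatever scoring the embodiment uses, followed by softmax normalization. The names `select_candidates` and `user_preference` are hypothetical.

```python
import math


def select_candidates(all_news_ids, clicked_ids):
    """Candidate news are the items in the collection the user has not clicked."""
    clicked = set(clicked_ids)
    return [news_id for news_id in all_news_ids if news_id not in clicked]


def user_preference(clicked_features, candidate_feature):
    """Softmax-weighted sum of clicked-news feature vectors for one candidate.

    Assumption: relevance of a clicked item to the candidate is a dot product.
    """
    scores = [
        sum(a * b for a, b in zip(feature, candidate_feature))
        for feature in clicked_features
    ]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    dim = len(candidate_feature)
    preference = [
        sum(w * feature[d] for w, feature in zip(weights, clicked_features))
        for d in range(dim)
    ]
    return preference, weights


candidates = select_candidates(["n1", "n2", "n3"], ["n2"])
preference, weights = user_preference([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

Because the weights depend on the candidate, the same click history yields a different preference vector for each candidate news item.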
[0070] In step 104, intangible cultural heritage news is pushed to users based on the user preference feature vector corresponding to each candidate news.
[0071] In this embodiment, although news headlines are a high-level summary of news content, some intangible cultural heritage news headlines are relatively short and contain less entity and semantic information. If only news headlines are used for recommendation, the effect may not be ideal. Therefore, in this embodiment, the title feature vector and keyword feature vector of each intangible cultural heritage news article in the intangible cultural heritage news set are obtained based on the BERT-BiLSTM-CRM model and the first intangible cultural heritage knowledge graph. This is equivalent to using both the title and the body text of the intangible cultural heritage news article for recommendation, thus enabling accurate recommendation of intangible cultural heritage news based on user preferences.
[0072] Figure 3 is a flowchart of a method for recommending intangible cultural heritage news provided by another exemplary embodiment of this disclosure. The method can be executed by a computer device. Referring to Figure 3, the method includes:
[0073] In step 301, based on the BERT-BiLSTM-CRM model and the first intangible cultural heritage knowledge graph, the title feature vector and keyword feature vector of each intangible cultural heritage news article in the intangible cultural heritage news collection are obtained.
[0074] For details regarding the BERT-BiLSTM-CRM model and the first intangible cultural heritage knowledge graph, please refer to step 101 above; they are not repeated here.
[0075] Optionally, the title feature vector of each intangible cultural heritage news article is obtained using the following steps d to g.
[0076] Step d: Convert the title of the first intangible cultural heritage news into multiple first word vectors.
[0077] The first intangible cultural heritage news is any one of several intangible cultural heritage news items.
[0078] Optionally, the title of the first intangible cultural heritage news can be converted into multiple first word vectors based on the word2vec word vector representation method.
[0079] The word2vec word vector representation method includes the following four steps:
[0080] The first step is to select the context window to convert the target text into multiple text blocks.
[0081] The word2vec word vector representation method here is used to convert the title of the first intangible cultural heritage news into multiple first word vectors. Therefore, the target text here is the title text of the first intangible cultural heritage news.
[0082] For example, the context window is a sliding window of length 2C+1, which facilitates capturing the contextual information of a word. Here, C is the size of the window on each side of the word (preceding or following), 2C is the combined size of the preceding and following windows, and 1 represents the word itself.
[0083] The second step is to convert multiple text blocks into word vectors, resulting in multiple first context word vectors.
[0084] Optionally, the second step can be implemented using one-hot encoding. The core idea of one-hot encoding is to map each word to a pre-trained word vector space.
[0085] The third step is to obtain the vector of the hidden layer.
[0086] To implement this third step, multiple context word vectors need to be multiplied by a shared weight matrix to obtain multiple intermediate vectors. These intermediate vectors are then weighted and averaged to obtain the hidden layer vectors.
[0087] The fourth step is to obtain the output vector based on the vector from the hidden layer and then activate the output vector.
[0088] Optionally, the fourth step includes: performing matrix multiplication on the hidden layer vector and the weight matrix to obtain the output vector.
[0089] Optionally, activation of the output vector can be achieved using the softmax function.
[0090] There are many implementation methods for weight matrices and shared weight matrices in related technologies, so they will not be detailed here.
[0091] Steps one through four above can exist in the form of a word vector model. For this word vector model, gradient descent can be used to minimize the loss function by iteratively adjusting the model parameters (the weight matrices). After the word vector model is trained, it can be used to generate word vectors.
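The four steps above resemble the forward pass of the continuous bag-of-words variant of word2vec, and can be sketched in plain Python. This is a minimal illustration under stated assumptions: the intermediate vectors are combined with a plain (equal-weight) average, training is omitted, and all function names and matrix values are hypothetical.

```python
import math


def context_windows(tokens, c):
    """Step 1: slide a window of length 2c+1 and pair each word with its context."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - c):i] + tokens[i + 1:i + 1 + c]
        pairs.append((target, context))
    return pairs


def one_hot(index, size):
    """Step 2: one-hot vector with a single 1.0 at position index."""
    return [1.0 if j == index else 0.0 for j in range(size)]


def vec_times_matrix(vec, matrix):
    """Multiply a row vector (length rows) by a matrix (rows x cols)."""
    cols = len(matrix[0])
    return [
        sum(vec[r] * matrix[r][c] for r in range(len(matrix)))
        for c in range(cols)
    ]


def cbow_forward(context_ids, w_in, w_out):
    """Steps 3 and 4: hidden vector, output vector, softmax activation.

    w_in is the shared weight matrix (vocab x dim); w_out is dim x vocab.
    """
    vocab = len(w_in)
    projected = [vec_times_matrix(one_hot(i, vocab), w_in) for i in context_ids]
    hidden = [sum(column) / len(projected) for column in zip(*projected)]
    logits = vec_times_matrix(hidden, w_out)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return hidden, [e / total for e in exps]


pairs = context_windows(["a", "b", "c", "d"], 1)
hidden, probs = cbow_forward(
    [0, 1],
    [[1, 0], [0, 1], [1, 1]],   # w_in: 3 words, 2 dimensions
    [[0, 0, 0], [0, 0, 0]],     # w_out: 2 dimensions, 3 words
)
```

After training, the rows of `w_in` would serve as the word vectors, which is how the trained model is used to generate word vectors.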
[0092] Step e: Based on the BERT-BiLSTM-CRM model, obtain multiple first entities in the title of the first intangible cultural heritage news and the first entity vector of each first entity.
[0093] The BERT-BiLSTM-CRM model can accurately identify Chinese entities, so it is possible to obtain multiple first entities in the first intangible cultural heritage news headline based on the BERT-BiLSTM-CRM model.
[0094] After obtaining multiple first entities, the TransD knowledge embedding technology can be used to transform each of these first entities into a first entity vector.
[0095] There are many implementation methods for TransD knowledge embedding technology, which will not be detailed here.
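For orientation, the scoring function of TransD as commonly formulated can be sketched as follows. Each entity and relation carries an embedding plus a projection vector, and an entity is mapped into the relation space by the matrix r_p e_p^T + I. The sketch assumes equal entity and relation dimensions and omits training; function names are hypothetical and nothing here is specific to the embodiment.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def transd_project(e, e_p, r_p):
    """Project entity e into the relation space: (r_p e_p^T + I) e.

    Assumes entity and relation embeddings share the same dimension.
    """
    scale = dot(e_p, e)
    return [rp * scale + x for rp, x in zip(r_p, e)]


def transd_score(h, h_p, r, r_p, t, t_p):
    """Negative squared distance between h_perp + r and t_perp."""
    h_perp = transd_project(h, h_p, r_p)
    t_perp = transd_project(t, t_p, r_p)
    return -sum((a + b - c) ** 2 for a, b, c in zip(h_perp, r, t_perp))


# A triple whose projected head plus relation lands exactly on the tail.
score = transd_score(
    [1.0, 0.0], [0.0, 0.0],   # head embedding and projection vector
    [0.0, 1.0], [0.0, 0.0],   # relation embedding and projection vector
    [1.0, 1.0], [0.0, 0.0],   # tail embedding and projection vector
)
```

A score closer to zero means the triple fits the learned embeddings better; after training, the entity embedding is what would be used as the entity vector.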
[0096] Step f: Based on the first intangible cultural heritage knowledge graph and multiple first entities, obtain multiple first entity context vectors.
[0097] Each first entity context vector corresponds to one first entity.
[0098] After correctly identifying the first entity in the first intangible cultural heritage news title using the BERT-BiLSTM-CRM model, these first entities can be linked to the corresponding nodes in the first intangible cultural heritage knowledge graph through entity linking technology, thereby constructing the connection between the first intangible cultural heritage news title and the first intangible cultural heritage knowledge graph.
[0099] However, considering that using the complete intangible cultural heritage knowledge graph in intangible cultural heritage news recommendations would increase model training costs and introduce a large amount of redundant information, step f is used to selectively utilize the knowledge graph. In this case, step f includes the following two steps:
[0100] The first step is to construct a sub-intangible cultural heritage knowledge graph corresponding to each first entity in the first intangible cultural heritage knowledge graph, thereby obtaining multiple sub-intangible cultural heritage knowledge graphs.
[0101] The sub-intangible cultural heritage knowledge graph corresponding to any first entity is constructed as follows: taking the first entity as the center, extend one layer outward, that is, include all other entities at distance 1 from the first entity together with the edges connecting them. In this way, the sub-intangible cultural heritage knowledge graph corresponding to the first entity is obtained.
[0102] Entities other than the first entity in the sub-intangible cultural heritage knowledge graph corresponding to the first entity e can be represented by formula (1).
[0103] $\mathrm{context}(e) = \{\, e_i \mid (e, r, e_i) \in SG \ \text{or}\ (e_i, r, e) \in SG \,\}$ (1)
[0104] In formula (1), $\mathrm{context}(e)$ refers to the entities other than the first entity in the sub-intangible cultural heritage knowledge graph corresponding to the first entity $e$, $SG$ refers to the sub-intangible cultural heritage knowledge graph corresponding to the first entity $e$, $e_i$ is an entity directly connected to the first entity $e$, and $r$ represents a relationship between a head entity and a tail entity. For example, $(e, r, e_i)$ means that the head entity is the first entity $e$, the tail entity is $e_i$, and the relationship between the head entity and the tail entity is $r$.
[0105] Figure 4 is a schematic diagram of an intangible cultural heritage knowledge graph. As shown in Figure 4, taking the first entity "Huangmei Opera" as the center and extending outward by one layer yields the sub-intangible cultural heritage knowledge graph corresponding to "Huangmei Opera", enclosed by the dotted line. This sub-intangible cultural heritage knowledge graph contains the other entities at distance 1 from "Huangmei Opera", such as "Traditional Drama", "Huang Xinde", "Han Zaifen", "Extended Project", "Hefei City", "2011", and "National Level".
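The one-layer expansion around a first entity can be sketched as follows. The entity names come from the Huangmei Opera example; the relation labels, the extra "Anhui Province" triple, and the function name `context` are hypothetical, since the disclosure does not list the edges of Figure 4.

```python
def context(entity, triples):
    """Return the entities directly connected to `entity`.

    `triples` is an iterable of (head, relation, tail) tuples. An entity
    counts as context if it is the tail of a triple headed by `entity`
    or the head of a triple whose tail is `entity`.
    """
    neighbours = set()
    for head, _relation, tail in triples:
        if head == entity and tail != entity:
            neighbours.add(tail)
        elif tail == entity and head != entity:
            neighbours.add(head)
    return neighbours

# Relation labels below are illustrative assumptions.
TRIPLES = [
    ("Huangmei Opera", "category", "Traditional Drama"),
    ("Huangmei Opera", "inheritor", "Huang Xinde"),
    ("Huangmei Opera", "inheritor", "Han Zaifen"),
    ("Huangmei Opera", "project_type", "Extended Project"),
    ("Huangmei Opera", "declared_by", "Hefei City"),
    ("Huangmei Opera", "year", "2011"),
    ("Huangmei Opera", "level", "National Level"),
    # Two hops away from "Huangmei Opera"; must not appear in its context.
    ("Hefei City", "located_in", "Anhui Province"),
]

huangmei_context = context("Huangmei Opera", TRIPLES)
```

Restricting the lookup to direct neighbours is what keeps the sub-graph small compared with the complete knowledge graph.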
[0106] The second step is to obtain multiple entity context vectors based on multiple sub-intangible cultural heritage knowledge graphs.
[0107] The first entity context vector obtained from any sub-intangible cultural heritage knowledge graph corresponds to the first entity at the center of that sub-intangible cultural heritage knowledge graph.
[0108] Optionally, the first entity context vector can be generated by averaging the vectors of the context entities. The first entity context vector corresponding to the first entity $e$ can be represented by formula (2).
[0109] $\bar{e} = \dfrac{1}{|\mathrm{context}(e)|} \sum_{e_i \in \mathrm{context}(e)} \mathbf{e}_i$ (2)
[0110] In formula (2), $\bar{e}$ is the context vector of the first entity $e$, and $\mathbf{e}_i$ is the vector form of entity $e_i$. The vector form of entity $e_i$ can be obtained through TransD knowledge embedding technology, which will not be detailed here. The meanings of the other parameters in formula (2) are the same as those in formula (1), and will not be detailed here.
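The averaging step can be illustrated with a short sketch. The neighbour vectors below are made-up 2-dimensional values standing in for TransD embeddings, and `context_vector` is a hypothetical helper name.

```python
def context_vector(neighbour_vectors):
    """Average the embedding vectors of the context entities."""
    if not neighbour_vectors:
        raise ValueError("the entity has no context entities")
    count = len(neighbour_vectors)
    dim = len(neighbour_vectors[0])
    return [sum(vec[d] for vec in neighbour_vectors) / count for d in range(dim)]

# Three hypothetical neighbour embeddings of one first entity.
neighbours = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
e_bar = context_vector(neighbours)
```

An entity with no neighbours has no defined average, so the sketch raises an error in that case; how the disclosed method treats isolated entities is not stated.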
[0111] Step g involves inputting multiple first word vectors, multiple first entity vectors, and multiple entity context vectors into a knowledge-aware convolutional neural network KCNN to obtain the title feature vector of the first intangible cultural heritage news.
[0112] After obtaining multiple first entity vectors from the title of the first intangible cultural heritage news article, as well as multiple first entity context vectors corresponding to these first entity vectors, it is necessary to organically combine the first entity vectors, first entity context vectors, and first word vectors through a concatenation operation.
[0113] Directly concatenating word vectors with entity vectors and entity context vectors not only disrupts their correlation but also fails to express their alignment relationship in the resulting new sequence of word entities. Therefore, word vectors, entity vectors, and entity context vectors must maintain dimensionality consistency during concatenation. While padding can achieve alignment, it is not the optimal choice in practice. To overcome the limitations of traditional vector concatenation, this embodiment employs a knowledge-aware convolutional neural network (KCNN) to perform the concatenation operation, utilizing KCNN's multi-channel approach to address the dimensionality inconsistency issue during concatenation.
[0114] Figure 5 is a schematic diagram of the KCNN structure. Step g is explained below with reference to Figure 5.
[0115] The multiple first word vectors can be represented as a first word sequence $News_t = [s_1, s_2, s_3, \ldots, s_n]$, where $s_1, s_2, s_3, \ldots, s_n$ represent the 1st, 2nd, 3rd, ..., nth first word vector, respectively. Converting the first word sequence into matrix form yields the first word vector matrix. Each word may be associated with an entity in the intangible cultural heritage knowledge graph.
[0116] The multiple first entity vectors can be represented as a first entity vector sequence $e_t = [e_1, e_2, e_3, \ldots, e_n]$, where $e_1, e_2, e_3, \ldots, e_n$ represent the 1st, 2nd, 3rd, ..., nth first entity vector, respectively.
[0117] Accordingly, the multiple first entity context vectors can be represented as a first entity context vector sequence $\bar{e}_t = [\bar{e}_1, \bar{e}_2, \bar{e}_3, \ldots, \bar{e}_n]$, where $\bar{e}_1, \bar{e}_2, \bar{e}_3, \ldots, \bar{e}_n$ represent the first entity context vectors corresponding to the 1st, 2nd, 3rd, ..., nth first entity vector, respectively.
[0118] The first entity vector sequence and the first entity context vector sequence can be scaled using formula (3). The scale of the transformed first entity vector sequence and the first entity context vector sequence is the same as the scale of the first word vector matrix.
[0119] g(x)=tanh(Mx+b) (3)
[0120] In formula (3), $g(x)$ is the transformation function, $M \in \mathbb{R}^{d \times k}$ is the transformation matrix, and $b \in \mathbb{R}^{d \times 1}$ is the bias parameter. Since the transformation function is continuous, it can be used to obtain projections of the entity vectors and entity context vectors that preserve the original spatial relationships.
[0121] When using the transformation function to transform the first entity vector sequence and the first entity context vector sequence, simply substitute each item in the first entity vector sequence and the first entity context vector sequence into x in the transformation function.
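A minimal sketch of the transformation g(x) = tanh(Mx + b) applied to each item of a sequence is shown below. The matrix, bias, and input values are illustrative assumptions, with k = 2 as the entity space dimension and d = 3 as the word space dimension.

```python
import math

def transform(M, b, x):
    """Apply g(x) = tanh(Mx + b), mapping a k-dim vector to a d-dim vector."""
    return [
        math.tanh(sum(m_ij * x_j for m_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(M, b)
    ]

def transform_sequence(M, b, vectors):
    """Substitute every vector of a sequence into the transformation function."""
    return [transform(M, b, x) for x in vectors]

# Hypothetical 3 x 2 transformation matrix and zero bias.
M = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
b = [0.0, 0.0, 0.0]

entity_vectors = [[0.0, 0.0], [0.5, -0.5]]
projected = transform_sequence(M, b, entity_vectors)
```

Every output vector has d components, which is what lets the entity and context channels be stacked with the word vectors.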
[0122] The final transformed first entity vector sequence can be represented by formula (4), and the transformed first entity context vector sequence can be represented by formula (5).
[0123] $g(e_t) = [g(e_1), g(e_2), \ldots, g(e_n)]$ (4)
[0124] In formula (4), $g(e_t)$ represents the transformed first entity vector sequence, and $g(e_1), g(e_2), \ldots, g(e_n)$ are the 1st transformed first entity vector, the 2nd transformed first entity vector, ..., the nth transformed first entity vector, respectively.
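[0125] $g(\bar{e}_t) = [g(\bar{e}_1), g(\bar{e}_2), \ldots, g(\bar{e}_n)]$ (5)
[0126] In formula (5), $g(\bar{e}_t)$ represents the transformed first entity context vector sequence, and $g(\bar{e}_1), g(\bar{e}_2), \ldots, g(\bar{e}_n)$ are the 1st transformed first entity context vector, the 2nd transformed first entity context vector, ..., the nth transformed first entity context vector, respectively.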
[0127] The transformed first entity vector sequence, the transformed first entity context vector sequence, and the first word vector sequence are aligned and stacked to obtain the title feature matrix, which is then used as the multi-channel input in KCNN. The title feature matrix can be represented by formula (6).
[0128] $W_t = \big[[s_1\ g(e_1)\ g(\bar{e}_1)],\ [s_2\ g(e_2)\ g(\bar{e}_2)],\ \ldots,\ [s_n\ g(e_n)\ g(\bar{e}_n)]\big]$ (6)
[0129] In formula (6), $W_t$ represents the title feature matrix. The other parameters in formula (6) have the same meanings as the parameters in formula (4), formula (5), and the first word vector sequence in step g. Detailed explanations are omitted here.
[0130] After obtaining multi-channel input, features are extracted from the title feature matrix using multiple convolutional kernels with different window sizes in the convolutional layer. The convolution process can be represented by formula (7).
[0131] $c_i^{h} = f(h * W_{i:i+l-1} + b)$ (7)
[0132] In formula (7), $h$ is a convolution kernel with window size $l$, $c_i^{h}$ is the convolution result of the title feature submatrix and the convolution kernel $h$, $W_{i:i+l-1}$ is a local submatrix of the title feature matrix, $f$ is the activation function, and $b$ is the bias vector.
[0133] After convolving the title feature matrix with the convolution kernel h, multiple convolution results are obtained. At this point, a pooling layer is needed to select important local features from the multiple convolution results. In this embodiment, the pooling layer is a max pooling layer, and the max pooling process can be represented by formula (8).
[0134] $\tilde{c}^{h} = \max_{i} c_i^{h}$ (8)
[0135] In formula (8), $\tilde{c}^{h}$ is the local title feature after max pooling, and $c_i^{h}$ is a convolution result obtained by convolving the title feature matrix with the convolution kernel $h$.
[0136] Furthermore, the convolutional layer contains multiple convolutional kernels, each of which needs to be convolved with the title feature matrix and subjected to max pooling. Therefore, multiple local title features can be obtained. By concatenating these multiple local title features, the title feature vector of the first intangible cultural heritage news can be obtained. The title feature vector can be represented by formula (9).
[0137] $e(t) = [\tilde{c}^{h_1}, \tilde{c}^{h_2}, \ldots, \tilde{c}^{h_n}]$ (9)
[0138] In formula (9), $e(t)$ is the title feature vector, and $n$ is the number of convolution kernels. The meanings of the other parameters in formula (9) are the same as in formula (8), and are omitted here.
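The convolution, max pooling, and concatenation described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the disclosed implementation: ReLU is assumed as the activation function, each kernel has one weight vector per window position and channel, the bias is a scalar, and all values are made up.

```python
def relu(value):
    return max(0.0, value)

def convolve(title_matrix, kernel, bias):
    """Slide one kernel over the title feature matrix.

    title_matrix[i] holds the three aligned channels of position i:
    (word vector, transformed entity vector, transformed context vector).
    kernel[offset][channel] is a weight vector of the same dimension.
    """
    window = len(kernel)
    results = []
    for start in range(len(title_matrix) - window + 1):
        total = bias
        for offset in range(window):
            for channel in range(3):
                vec = title_matrix[start + offset][channel]
                weights = kernel[offset][channel]
                total += sum(w * v for w, v in zip(weights, vec))
        results.append(relu(total))
    return results

def title_feature_vector(title_matrix, kernels, biases):
    """Max-pool each kernel's results and concatenate the pooled values."""
    return [max(convolve(title_matrix, k, b)) for k, b in zip(kernels, biases)]

# Three title positions, three channels each, dimension 2 (hypothetical).
title_matrix = [
    ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]),
    ([0.0, 1.0], [0.0, 0.0], [0.0, 0.0]),
    ([1.0, 1.0], [1.0, 0.0], [0.5, 0.5]),
]
ones = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]   # one window position, all channels
kernels = [[ones], [ones, ones]]               # window sizes 1 and 2
biases = [0.0, 0.0]

e_t = title_feature_vector(title_matrix, kernels, biases)
```

With all-ones kernels each convolution result is simply the sum of the entries inside the window, which makes the pooled values easy to verify by hand.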
[0139] For every intangible cultural heritage news item in the intangible cultural heritage news collection other than the first intangible cultural heritage news item, steps d to g above can likewise be used to extract the title feature vector.
[0140] Although news headlines are a high-level summary of news content, some intangible cultural heritage news headlines are quite brief and contain limited entity and semantic information. Therefore, relying solely on news headlines for recommendations may not be very effective. Thus, in this embodiment, a certain number of keywords representing the semantic information of the entire news article are extracted and integrated with an intangible cultural heritage knowledge graph, serving as an important factor in recommendations.
[0141] Optionally, the keyword feature vector for each intangible cultural heritage news article is obtained using the following steps h to l.
[0142] Step h: Obtain the keyword sequence of the main text of the first intangible cultural heritage news.
[0143] The main text keyword sequence includes multiple keywords from the main text of the first intangible cultural heritage news.
[0144] Optionally, step h includes: constructing a dictionary in the field of intangible cultural heritage; using the jieba word segmentation library, which supports custom dictionaries, to segment the main text of the first intangible cultural heritage news article and filter out stop words to obtain multiple main text words; selecting the first m words from the multiple main text words to form a main text keyword sequence.
[0145] The dictionary in the field of intangible cultural heritage includes the names of all projects and the names of the inheritors. Each line of the dictionary holds one word in the field of intangible cultural heritage, followed by its part-of-speech tag.
[0146] Optionally, a stop word list from Harbin Institute of Technology can be used to filter stop words.
[0147] Considering the high cost of fully vectorizing long texts, the value of m should not be too large; for example, it can be close to the number of words in the title of an intangible cultural heritage news article. Here m is a positive integer between 7 and 12, such as 7, 10, or 12.
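The filtering and truncation part of step h can be sketched as below. The sketch assumes the main text has already been segmented into tokens (the disclosure uses jieba with a custom dictionary for that); the token list, the stop word set, and the function name `body_keywords` are illustrative assumptions.

```python
def body_keywords(tokens, stop_words, m=10):
    """Drop stop words and keep the first m remaining words, in order."""
    keywords = []
    for token in tokens:
        token = token.strip()
        if token and token not in stop_words:
            keywords.append(token)
            if len(keywords) == m:
                break
    return keywords

# Hypothetical pre-segmented main text and stop word list.
tokens = ["Huangmei Opera", "is", "a", "traditional", "drama",
          "of", "Anhui", " ", "inheritor"]
stop_words = {"is", "a", "of"}

keys = body_keywords(tokens, stop_words, m=3)
```

Stopping as soon as m keywords have been collected avoids scanning the rest of a long article.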
[0148] The sequence of keywords in the body text not only helps to capture the core concepts of the news content, but also makes up for some cases where the headline is too simple or vague.
[0149] Step i: Convert multiple keywords in the keyword sequence of the main text into multiple second word vectors.
[0150] The keyword sequence can be represented as $Keys = [s_1, s_2, s_3, \ldots, s_m]$, where $s_1, s_2, s_3, \ldots, s_m$ represent the 1st keyword, the 2nd keyword, the 3rd keyword, ..., the m-th keyword, respectively.
[0151] The multiple second word vectors can be represented as a word vector matrix whose entries are the word vectors corresponding to the 1st keyword, the 2nd keyword, the 3rd keyword, ..., the m-th keyword, respectively.
[0152] Step j: Based on the BERT-BiLSTM-CRM model, obtain multiple second entities in the keyword sequence of the first intangible cultural heritage news article and the second entity vector of each second entity.
[0153] Here, each keyword in the main text is a second entity.
[0154] Step k: Based on the first intangible cultural heritage knowledge graph and multiple second entities, obtain multiple context vectors of the second entities.
[0155] Each second entity context vector corresponds to one second entity.
[0156] Step l: Input multiple second word vectors, multiple second entity vectors, and multiple entity context vectors into a knowledge-aware convolutional neural network KCNN to obtain the keyword feature vector of the first intangible cultural heritage news.
[0157] The implementation methods for steps i to l are similar to those for steps d to g; it is only necessary to replace the news title in steps d to g with the keyword sequence of the news main text. Detailed explanations are omitted here.
[0158] Similar to step g, step l also includes convolution, max pooling, and concatenation applied to the keyword feature matrix, which ultimately yields the keyword feature vector of the first intangible cultural heritage news. The keyword feature vector can be represented by formula (10).
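[0159] $e(k) = [\tilde{c}^{h_1}, \tilde{c}^{h_2}, \ldots, \tilde{c}^{h_n}]$ (10)
[0160] In formula (10), $e(k)$ is the keyword feature vector, $n$ is the number of convolution kernels, and $\tilde{c}^{h_1}, \ldots, \tilde{c}^{h_n}$ are the local keyword features obtained by max pooling for each convolution kernel.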
[0161] In step 302, an attention mechanism is used to process the title feature vector and keyword feature vector of each intangible cultural heritage news article to obtain the feature vector of each intangible cultural heritage news article.
[0162] Since different news articles contain varying numbers of headlines and keywords, their importance should be differentiated. To effectively integrate the headline and keyword representation vectors, this embodiment employs a news feature-level attention mechanism. This attention mechanism dynamically adjusts the weight allocation according to the importance of the input vectors, allowing the model to pay different amounts of attention to different parts and thereby improving model performance. Through this attention mechanism, the attention weights of the headline feature vector and the keyword feature vector can be determined separately.
[0163] The attention weight of the title feature vector can be calculated using formula (11), and the attention weight of the keyword feature vector can be calculated using formula (12).
[0164] $a_t = q_t^{\top} \tanh(V_t r_t + b_t)$ (11)
[0165] In formula (11), $a_t$ represents the attention weight of the title feature vector, $q_t$ is the query vector for the title feature vector, $V_t$ and $b_t$ are trainable parameters, and $r_t$ is the title feature vector of the intangible cultural heritage news, that is, $e(t)$ in formula (9).
[0166] $a_k = q_k^{\top} \tanh(V_k r_k + b_k)$ (12)
[0167] Similar to formula (11), in formula (12), $a_k$ represents the attention weight of the keyword feature vector, $q_k$ is the query vector for the keyword feature vector, $V_k$ and $b_k$ are trainable parameters, and $r_k$ is the keyword feature vector of the intangible cultural heritage news, that is, $e(k)$ in formula (10).
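[0168] The attention weights of the title feature vector and the keyword feature vector need to be normalized using the softmax function. The normalization process can be represented by formula (13).
[0169] $\alpha_t = \dfrac{\exp(a_t)}{\exp(a_t) + \exp(a_k)}, \quad \alpha_k = \dfrac{\exp(a_k)}{\exp(a_t) + \exp(a_k)}$ (13)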
[0170] Optionally, formula (14) can be used to determine the feature vector of each intangible cultural heritage news article, where $\alpha_t$ and $\alpha_k$ in formula (14) are the values of $a_t$ and $a_k$ after the normalization of formula (13).
[0171] $r_n = \alpha_t r_t + \alpha_k r_k$ (14)
[0172] In formula (14), $r_n$ is the feature vector of the intangible cultural heritage news. The meanings of the other parameters in formula (14) are the same as those in formulas (11) and (12), and are omitted here.
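The scoring, normalization, and fusion of formulas (11) to (14) can be sketched as follows. All parameter values are made up; in the disclosed method the query vectors, matrices, and biases are trainable, and the helper names are hypothetical.

```python
import math

def attention_score(q, V, b, r):
    """Compute q . tanh(V r + b) for one feature vector r."""
    hidden = [
        math.tanh(sum(v * x for v, x in zip(row, r)) + b_i)
        for row, b_i in zip(V, b)
    ]
    return sum(q_i * h_i for q_i, h_i in zip(q, hidden))

def fuse(r_t, r_k, a_t, a_k):
    """Softmax-normalize the two scores and blend the two feature vectors."""
    peak = max(a_t, a_k)
    exp_t, exp_k = math.exp(a_t - peak), math.exp(a_k - peak)
    alpha_t = exp_t / (exp_t + exp_k)
    alpha_k = exp_k / (exp_t + exp_k)
    r_n = [alpha_t * x + alpha_k * y for x, y in zip(r_t, r_k)]
    return r_n, alpha_t, alpha_k

# Hypothetical title and keyword feature vectors and parameters.
r_t, r_k = [1.0, 3.0], [3.0, 1.0]
q = [1.0, 1.0]
V = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]

a_t = attention_score(q, V, b, r_t)
a_k = attention_score(q, V, b, r_k)
r_n, alpha_t, alpha_k = fuse(r_t, r_k, a_t, a_k)
```

Because the two normalized weights sum to 1, the fused vector is a convex combination of the title and keyword feature vectors.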
[0173] Through steps 301 to 302 above, the feature vector of each intangible cultural heritage news article in the intangible cultural heritage news collection can be obtained.
[0174] In step 303, based on the feature vectors of multiple intangible cultural heritage news articles in the intangible cultural heritage news collection, the user preference feature vector corresponding to each candidate news article is determined.
[0175] The candidate news items are intangible cultural heritage news items that the user has not clicked on from the intangible cultural heritage news collection.
[0176] Suppose user i has clicked a total of $N_i$ intangible cultural heritage news items, denoted $t_1^i, t_2^i, \ldots, t_{N_i}^i$, and the feature vectors corresponding to these $N_i$ news items are denoted $e(t_1^i), e(t_2^i), \ldots, e(t_{N_i}^i)$. The feature vector of the j-th candidate news item is denoted $e(t_j)$.
[0177] The feature vectors corresponding to these $N_i$ intangible cultural heritage news items and the feature vector of the j-th candidate news item are each convolved, and the convolution results are concatenated into a single attention feature vector. This attention feature vector is then input into an attention network layer to obtain a weighted matrix, which is then normalized to obtain the weight matrix.
[0178] The weight matrix can be represented by formula (15).
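[0179] $s_{t_k^i, t_j} = \mathrm{softmax}\big(H(e(t_k^i), e(t_j))\big)$ (15)
[0180] In formula (15), $s_{t_k^i, t_j}$ represents the weight matrix, softmax is the normalization function, $H$ denotes the attention network layer, $e(t_k^i)$ is the feature vector of the k-th intangible cultural heritage news item clicked by user i, and $e(t_j)$ is the feature vector of the j-th candidate news item.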
[0181] Optionally, formula (16) can be used to determine the user preference vector corresponding to the j-th candidate news item:
[0182] $e(i) = \sum_{k=1}^{N_i} s_{t_k^i, t_j}\, e(t_k^i)$ (16)
[0183] In formula (16), $e(i)$ is the user preference vector of user i for the j-th candidate news item. The meanings of the other parameters in formula (16) are the same as those in formula (15), and are omitted here.
[0184] Thus, through step 303, the user preference vector for each candidate news item can be obtained.
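The weighting of a user's click history against one candidate can be sketched as below. Note the substitution: the disclosure scores each clicked item with an attention network layer, whereas this sketch uses a plain dot product as a stand-in scoring function so that it stays self-contained. The vectors and function names are illustrative assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def user_preference(clicked_vectors, candidate_vector, score=dot):
    """Weight each clicked news vector by its softmax-normalized score
    against the candidate, then sum the weighted vectors.

    `score` stands in for the attention network layer (an assumption).
    """
    scores = [score(c, candidate_vector) for c in clicked_vectors]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(candidate_vector)
    preference = [
        sum(w * c[d] for w, c in zip(weights, clicked_vectors))
        for d in range(dim)
    ]
    return preference, weights

# Two hypothetical clicked news vectors and one candidate vector.
clicked = [[1.0, 0.0], [0.0, 1.0]]
candidate = [1.0, 0.0]

pref, weights = user_preference(clicked, candidate)
```

The preference vector is recomputed for every candidate, so clicked items that resemble the candidate dominate its value.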
[0185] In step 304, intangible cultural heritage news is pushed to users based on the user preference feature vector corresponding to each candidate news.
[0186] Optionally, step 304 includes the following three steps.
[0187] The first step is to determine the degree of user liking for each candidate intangible cultural heritage news item based on the user preference feature vector corresponding to each candidate news item.
[0188] Here, the degree to which a user likes each candidate intangible cultural heritage news item is represented by the probability of the user clicking on each candidate intangible cultural heritage news item. It is generally believed that the more similar the intangible cultural heritage news item is to the user's preferences, the more likely the user is to choose to read it. Therefore, when calculating the similarity between the feature vector of a candidate news item and the user's preference vector for that candidate news item, since both are represented in vector form, the dot product operation of the vectors is used, and then the reduce sum is calculated to obtain the probability of the user clicking on the candidate news item. This process can be represented by formula (17).
[0189] $p_{i, t_j} = G(e(i), e(t_j))$ (17)
[0190] In formula (17), $p_{i, t_j}$ represents the degree of liking of user i for the j-th candidate intangible cultural heritage news item; a higher value indicates that user i is more likely to click on the j-th candidate intangible cultural heritage news item. $G(e(i), e(t_j))$ represents performing a dot product operation on the user preference vector of user i for the j-th candidate news item and the feature vector of the j-th candidate news item, and then summing the result to reduce its dimensionality.
[0191] Thus, the degree of liking of user i for each candidate news item can be calculated using the above formula (17).
[0192] The second step is to sort each candidate intangible cultural heritage news item from highest to lowest preference to obtain the first preference order.
[0193] The third step is to push the top N candidate news items in the first preference order to the user.
[0194] Considering the actual business scenarios of online news, after a user refreshes the page, the news website will push news that the user is interested in based on the user's browsing history or click behavior. In this embodiment, the first N candidate news items are the news pushed to the user by the news website after the user refreshes the page.
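The three steps of step 304 can be sketched as follows: score every candidate, sort from highest to lowest, and keep the first N. The sketch uses the raw dot-product sum as the degree of liking; the candidate identifiers, vectors, and function names are hypothetical.

```python
def preference_degree(user_vector, candidate_vector):
    """Multiply the two vectors element-wise and sum the products."""
    return sum(u * c for u, c in zip(user_vector, candidate_vector))

def top_n(candidates, n):
    """Rank candidates by preference degree and return the first n ids.

    Each candidate is (news_id, user_preference_vector, feature_vector),
    because the user preference vector is specific to the candidate.
    """
    scored = [
        (preference_degree(user_vec, news_vec), news_id)
        for news_id, user_vec, news_vec in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [news_id for _score, news_id in scored[:n]]

# Hypothetical candidates for one user.
candidates = [
    ("news_a", [1.0, 0.0], [0.2, 0.9]),
    ("news_b", [0.0, 1.0], [0.1, 0.8]),
    ("news_c", [1.0, 1.0], [0.3, 0.3]),
]

pushed = top_n(candidates, 2)
```

Python's sort is stable, so candidates with equal scores keep their original relative order.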
[0195] Steps 301 to 304 above can be used to establish an intangible cultural heritage news recommendation model. Figure 6 is a schematic diagram of the intangible cultural heritage news recommendation model. The structure of this model is consistent with steps 301 to 304 above, and detailed descriptions are omitted here.
[0196] The following are device embodiments of this application. For details not described in detail in the device embodiments, please refer to the above method embodiments.
[0197] Figure 7 is a schematic diagram of the structure of an intangible cultural heritage news recommendation device provided in an exemplary embodiment of this disclosure. Referring to Figure 7, the intangible cultural heritage news recommendation device 700 includes: an acquisition module 701, an attention mechanism module 702, a user preference feature vector calculation module 703, and a push module 704.
[0198] The acquisition module 701 is used to acquire the title feature vector and keyword feature vector of each intangible cultural heritage news article in the intangible cultural heritage news collection based on the BERT-BiLSTM-CRM model and the first intangible cultural heritage knowledge graph. The keyword feature vector is extracted from the main text of the intangible cultural heritage news article.
[0199] The attention mechanism module 702 is used to process the title feature vector and keyword feature vector of each intangible cultural heritage news article using the attention mechanism to obtain the feature vector of each intangible cultural heritage news article.
[0200] The user preference feature vector calculation module 703 is used to determine the user preference feature vector corresponding to each candidate news based on the feature vectors of multiple intangible cultural heritage news articles in the intangible cultural heritage news collection. The candidate news articles are intangible cultural heritage news articles that the user has not clicked on in the intangible cultural heritage news collection.
[0201] The push module 704 is used to push intangible cultural heritage news to users based on the user preference feature vector corresponding to each candidate news.
[0202] Optionally, the acquisition module 701 is further configured to convert the title of the first intangible cultural heritage news into multiple first word vectors, wherein the first intangible cultural heritage news is any one of the multiple intangible cultural heritage news; based on the BERT-BiLSTM-CRM model, acquire multiple first entities in the title of the first intangible cultural heritage news and the first entity vector of each first entity; based on the first intangible cultural heritage knowledge graph and the multiple first entities, acquire multiple first entity context vectors, wherein each first entity context vector corresponds to a first entity; input the multiple first word vectors, the multiple first entity vectors, and the multiple entity context vectors into a knowledge-aware convolutional neural network KCNN to obtain the title feature vector of the first intangible cultural heritage news.
[0203] Optionally, the acquisition module 701 is further configured to acquire the keyword sequence of the main text of the first intangible cultural heritage news, which includes multiple keywords in the main text of the first intangible cultural heritage news; convert the multiple keywords in the main text keyword sequence into multiple second word vectors; based on the BERT-BiLSTM-CRM model, acquire multiple second entities in the main text keyword sequence of the first intangible cultural heritage news and the second entity vector of each second entity; based on the first intangible cultural heritage knowledge graph and the multiple second entities, acquire multiple second entity context vectors, each second entity context vector corresponding to a second entity; input the multiple second word vectors, multiple second entity vectors and multiple entity context vectors into a knowledge-aware convolutional neural network KCNN to obtain the keyword feature vector of the first intangible cultural heritage news.
[0204] Optionally, the attention mechanism module 702 is also used to determine the feature vector of each intangible cultural heritage news article using the following formula:
[0205] $r_n = \alpha_t r_t + \alpha_k r_k$
[0206] where $r_n$ is the feature vector of the intangible cultural heritage news, $r_t$ is the title feature vector of the intangible cultural heritage news, $r_k$ is the keyword feature vector of the intangible cultural heritage news, $\alpha_t$ is the attention weight of the title feature vector, and $\alpha_k$ is the attention weight of the keyword feature vector.
[0207] Optionally, the user preference feature vector calculation module 703 is also used to determine the user preference vector corresponding to the j-th candidate news item using the following formula:
[0208] $e(i) = \sum_{k=1}^{N_i} s_{t_k^i, t_j}\, e(t_k^i)$
[0209] where $e(i)$ is the user preference vector of user i for the j-th candidate news item, $e(t_k^i)$ is the feature vector of the k-th intangible cultural heritage news item clicked by user i, user i has clicked a total of $N_i$ intangible cultural heritage news items, $s_{t_k^i, t_j}$ is the weight matrix, and $t_j$ is the j-th candidate news item;
[0210] The weight matrix is calculated using the following formula:
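[0211] $s_{t_k^i, t_j} = \mathrm{softmax}\big(H(e(t_k^i), e(t_j))\big)$
[0212] where $s_{t_k^i, t_j}$ is the weight matrix, softmax is the normalization function, $H$ denotes the attention network layer, and $e(t_j)$ is the feature vector of the j-th candidate news item.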
[0213] Optionally, the push module 704 is also used to determine the user's liking for each candidate intangible cultural heritage news based on the user preference feature vector corresponding to each candidate news; sort the candidate intangible cultural heritage news in descending order of liking to obtain the first liking order; and push the first N candidate news in the first liking order to the user.
[0214] It should be noted that the above embodiments of the intangible cultural heritage news push device are only illustrated by the division of the above functional modules. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the intangible cultural heritage news push device and the intangible cultural heritage news push method embodiments provided above belong to the same concept, and the specific implementation process can be found in the method embodiments, which will not be repeated here.
[0215] The module division in this embodiment is illustrative and represents only one logical functional division. In actual implementation, other division methods are possible. Furthermore, the functional modules in the various embodiments of this disclosure can be integrated into a single processor, exist as separate physical entities, or be integrated into a single module. The integrated modules described above can be implemented in hardware or as software functional modules.
[0216] If the integrated module is implemented as a software functional module and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this disclosure, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a terminal device (which may be a personal computer, mobile phone, or communication device, etc.) or processor to execute all or part of the steps of the methods of the various embodiments of this disclosure. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0217] Figure 8 is a schematic diagram of the structure of a computer device provided in an embodiment of this disclosure. As shown in Figure 8, the computer device 800 includes a processor 801 and a memory 802.
[0218] Processor 801 may include one or more processing cores, such as a quad-core processor or an octa-core processor. Processor 801 may be implemented using at least one hardware form selected from DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). Processor 801 may also include a main processor and a coprocessor. The main processor, also known as a CPU (Central Processing Unit), is used to process data in the wake-up state; the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, processor 801 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the screen. In some embodiments, processor 801 may also include an AI (Artificial Intelligence) processor, which is used to handle computational operations related to machine learning.
[0219] The memory 802 may include one or more computer-readable storage media, which may be non-transitory. The memory 802 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory devices. In some embodiments, the non-transitory computer-readable storage media in the memory 802 is used to store at least one instruction, which is executed by the processor 801 to implement the intangible cultural heritage news push method provided in this disclosure embodiment.
[0220] Those skilled in the art will understand that the structure shown in Figure 8 does not constitute a limitation on the computer device 800, which may include more or fewer components than shown, combine certain components, or use a different component arrangement.
[0221] This disclosure also provides a non-transitory computer-readable storage medium, which, when the instructions in the storage medium are executed by the processor of a computer device, enables the computer device to execute the intangible cultural heritage news push method provided in this disclosure.
[0222] This disclosure also provides a computer program product, including a computer program / instruction, which, when executed by a processor, implements the intangible cultural heritage news push method provided in this disclosure.
[0223] The above description is merely an optional embodiment of this disclosure and is not intended to limit this disclosure. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this disclosure should be included within the protection scope of this disclosure.
Claims
1. A method for recommending intangible cultural heritage news, characterized in that the method includes:
obtaining, based on the BERT-BiLSTM-CRM model and the first intangible cultural heritage knowledge graph, the title feature vector and keyword feature vector of each intangible cultural heritage news article in the intangible cultural heritage news collection, wherein the keyword feature vector is extracted from the main text of the intangible cultural heritage news article;
processing the title feature vector and the keyword feature vector of each intangible cultural heritage news article using an attention mechanism to obtain the feature vector of each intangible cultural heritage news article;
determining, based on the feature vectors of multiple intangible cultural heritage news articles in the intangible cultural heritage news collection, the user preference feature vector corresponding to each candidate news article, wherein the candidate news articles are intangible cultural heritage news articles in the intangible cultural heritage news collection that the user has not clicked on;
pushing intangible cultural heritage news to the user based on the user preference feature vector corresponding to each candidate news article;
wherein obtaining, based on the BERT-BiLSTM-CRM model and the first intangible cultural heritage knowledge graph, the title feature vector and keyword feature vector of each intangible cultural heritage news article in the intangible cultural heritage news collection includes:
converting the title of the first intangible cultural heritage news into multiple first word vectors, wherein the first intangible cultural heritage news is any one of the multiple intangible cultural heritage news articles;
obtaining, based on the BERT-BiLSTM-CRM model, multiple first entities and the first entity vector of each first entity from the title of the first intangible cultural heritage news;
obtaining, based on the first intangible cultural heritage knowledge graph and the multiple first entities, multiple first entity context vectors, each first entity context vector corresponding to one first entity;
inputting the multiple first word vectors, the multiple first entity vectors, and the multiple first entity context vectors into a knowledge-aware convolutional neural network (KCNN) to obtain the title feature vector of the first intangible cultural heritage news;
obtaining the keyword sequence of the main text of the first intangible cultural heritage news, wherein the keyword sequence includes multiple keywords from the main text of the first intangible cultural heritage news;
converting the multiple keywords in the keyword sequence of the main text into multiple second word vectors;
obtaining, based on the BERT-BiLSTM-CRM model, multiple second entities and the second entity vector of each second entity from the keyword sequence of the main text of the first intangible cultural heritage news;
obtaining, based on the first intangible cultural heritage knowledge graph and the multiple second entities, multiple second entity context vectors, each second entity context vector corresponding to one second entity;
inputting the multiple second word vectors, the multiple second entity vectors, and the multiple second entity context vectors into the knowledge-aware convolutional neural network (KCNN) to obtain the keyword feature vector of the first intangible cultural heritage news.
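The KCNN step of claim 1 takes three aligned inputs (word vectors, entity vectors, entity context vectors) and produces one feature vector. The following is a minimal sketch of how such a multi-channel convolution with max-over-time pooling might look. It is not the claimed implementation: the function name `kcnn_encode`, the assumption that all three inputs share the same dimension, the zero rows for positions without an entity, and the ReLU activation are all illustrative assumptions.

```python
import numpy as np

def kcnn_encode(word_vecs, entity_vecs, context_vecs, filters, biases):
    """Encode one text (a title or a keyword sequence) into a feature vector.

    Hypothetical helper, for illustration only.
    word_vecs, entity_vecs, context_vecs: arrays of shape (n, d), aligned
    position by position; rows without a linked entity are assumed to be zeros.
    filters: list of arrays of shape (window, d, 3), one per filter.
    biases: list of floats, one per filter.
    """
    # Stack the three views as channels: shape (n, d, 3).
    x = np.stack([word_vecs, entity_vecs, context_vecs], axis=-1)
    n = x.shape[0]
    pooled = []
    for h, b in zip(filters, biases):
        window = h.shape[0]
        if window > n:
            raise ValueError("filter window is longer than the input sequence")
        # Slide the filter over the sequence and apply ReLU at each position.
        responses = [
            max(0.0, float(np.sum(h * x[i:i + window]) + b))
            for i in range(n - window + 1)
        ]
        # Max-over-time pooling keeps the strongest response per filter.
        pooled.append(max(responses))
    return np.array(pooled)
```

Under these assumptions the same function would be called twice per article, once on the title inputs and once on the keyword-sequence inputs, giving the title feature vector and the keyword feature vector.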
2. The method according to claim 1, characterized in that processing the title feature vector and the keyword feature vector of each intangible cultural heritage news article using the attention mechanism to obtain the feature vector of each intangible cultural heritage news article includes:
determining the feature vector of each intangible cultural heritage news article using the following formula: [formula image not reproduced]
where the terms of the formula denote, in order: the feature vector of the intangible cultural heritage news article; the title feature vector of the intangible cultural heritage news article; the keyword feature vector of the intangible cultural heritage news article; the attention weight of the title feature vector; and the attention weight of the keyword feature vector.
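The formula of claim 2 is not reproduced in this text; only its terms survive (a fused feature vector, the title and keyword feature vectors, and one attention weight for each). A minimal sketch consistent with those terms is a weighted sum whose two weights come from a softmax. The scoring by dot product with a `query` vector and the function name `fuse_title_and_keywords` are assumptions made for illustration, not details taken from the claim.

```python
import numpy as np

def fuse_title_and_keywords(title_vec, keyword_vec, query):
    """Fuse two feature vectors with attention weights (illustrative only).

    query: hypothetical learned vector used to score each input; the claim
    does not state how the attention weights are computed.
    """
    title_vec = np.asarray(title_vec, dtype=float)
    keyword_vec = np.asarray(keyword_vec, dtype=float)
    query = np.asarray(query, dtype=float)
    # One scalar score per input vector.
    scores = np.array([title_vec @ query, keyword_vec @ query])
    # Numerically stable softmax so the two weights sum to 1.
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()
    # Weighted sum of the title and keyword feature vectors.
    fused = weights[0] * title_vec + weights[1] * keyword_vec
    return fused, weights
```

Because the weights sum to 1, the fused vector stays on the same scale as its inputs, and a short title with little semantic content can be outweighed by the keyword vector.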
3. The method according to claim 1, characterized in that determining the user preference feature vector corresponding to each candidate news item based on the feature vectors of multiple intangible cultural heritage news articles in the intangible cultural heritage news collection includes:
determining the user preference feature vector corresponding to each candidate news item using the following formula: [formula image not reproduced]
where the terms of the formula denote: the user; the user's preference feature vector for a given candidate news item; the feature vector of an intangible cultural heritage news article that the user has clicked; the total number of intangible cultural heritage news articles that the user has clicked; the weight matrix; and the candidate news item;
the weight matrix is calculated using the following formula: [formula image not reproduced]
where the terms of the formula denote: the weight matrix; the normalization function; and the feature vector of the candidate news item.
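The two formulas of claim 3 are not reproduced in this text. The surviving terms (clicked-news feature vectors, a weight matrix obtained through a normalization function, and candidate feature vectors) suggest a candidate-specific weighted sum over the user's click history. The sketch below assumes softmax as the normalization function and a plain dot product as the similarity score between a clicked article and a candidate; both choices, and the name `user_preference_vectors`, are assumptions rather than details from the claim.

```python
import numpy as np

def user_preference_vectors(clicked, candidates):
    """Compute one preference vector per candidate (illustrative only).

    clicked: array of shape (N, d), feature vectors of the N clicked articles.
    candidates: array of shape (M, d), feature vectors of the M candidates.
    Returns (preferences of shape (M, d), weights of shape (M, N)).
    """
    clicked = np.asarray(clicked, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    # Assumed score: dot product between each candidate and each clicked article.
    scores = candidates @ clicked.T
    # Row-wise stable softmax: each candidate's weights over the history sum to 1.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each preference vector is a weighted sum of the clicked feature vectors.
    return weights @ clicked, weights
```

The point of computing a separate vector per candidate is that clicked articles resembling the candidate contribute more, so the same click history yields different preference vectors for different candidates.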
4. The method according to claim 1, characterized in that pushing intangible cultural heritage news to the user based on the user preference feature vector corresponding to each candidate news item includes:
determining, based on the user preference feature vector corresponding to each candidate news item, the degree of the user's liking for each candidate news item;
sorting the candidate news items from most liked to least liked to obtain the first preference order;
pushing the top N candidate news items in the first preference order to the user.
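The ranking step of claim 4 can be sketched as follows. The claim does not say how the degree of liking is computed from the preference vector, so the sigmoid of a dot product between each preference vector and its candidate feature vector is used here purely as an assumed stand-in; the name `top_n_candidates` is likewise hypothetical.

```python
import numpy as np

def top_n_candidates(preference_vecs, candidate_vecs, n):
    """Rank candidates by an assumed liking score and keep the top n.

    preference_vecs, candidate_vecs: arrays of shape (M, d), where row j of
    preference_vecs is the user preference vector for candidate j.
    Returns (indices of the top-n candidates, liking scores of shape (M,)).
    """
    preference_vecs = np.asarray(preference_vecs, dtype=float)
    candidate_vecs = np.asarray(candidate_vecs, dtype=float)
    # Assumed liking score: sigmoid of the row-wise dot product.
    logits = np.sum(preference_vecs * candidate_vecs, axis=1)
    liking = 1.0 / (1.0 + np.exp(-logits))
    # Sort from most liked to least liked; a stable sort keeps ties in input order.
    order = np.argsort(-liking, kind="stable")
    return order[:n].tolist(), liking
```

The returned indices refer to positions in the candidate list, so the caller would map them back to the corresponding news articles before pushing them to the user.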
5. A device for recommending intangible cultural heritage news, characterized in that the device includes:
an acquisition module, used to obtain, based on the BERT-BiLSTM-CRM model and the first intangible cultural heritage knowledge graph, the title feature vector and keyword feature vector of each intangible cultural heritage news article in the intangible cultural heritage news collection, wherein the keyword feature vector is extracted from the main text of the intangible cultural heritage news article;
an attention mechanism module, used to process the title feature vector and the keyword feature vector of each intangible cultural heritage news article using an attention mechanism to obtain the feature vector of each intangible cultural heritage news article;
a user preference feature vector calculation module, used to determine the user preference feature vector corresponding to each candidate news item based on the feature vectors of multiple intangible cultural heritage news articles in the intangible cultural heritage news collection, wherein the candidate news items are intangible cultural heritage news articles in the intangible cultural heritage news collection that the user has not clicked on;
a push module, used to push intangible cultural heritage news to the user based on the user preference feature vector corresponding to each candidate news item;
wherein the acquisition module is further configured to:
convert the title of the first intangible cultural heritage news into multiple first word vectors, wherein the first intangible cultural heritage news is any one of the multiple intangible cultural heritage news articles;
obtain, based on the BERT-BiLSTM-CRM model, multiple first entities and the first entity vector of each first entity from the title of the first intangible cultural heritage news;
obtain, based on the first intangible cultural heritage knowledge graph and the multiple first entities, multiple first entity context vectors, each first entity context vector corresponding to one first entity;
input the multiple first word vectors, the multiple first entity vectors, and the multiple first entity context vectors into a knowledge-aware convolutional neural network (KCNN) to obtain the title feature vector of the first intangible cultural heritage news;
obtain the keyword sequence of the main text of the first intangible cultural heritage news, wherein the keyword sequence includes multiple keywords from the main text of the first intangible cultural heritage news;
convert the multiple keywords in the keyword sequence of the main text into multiple second word vectors;
obtain, based on the BERT-BiLSTM-CRM model, multiple second entities and the second entity vector of each second entity from the keyword sequence of the main text of the first intangible cultural heritage news;
obtain, based on the first intangible cultural heritage knowledge graph and the multiple second entities, multiple second entity context vectors, each second entity context vector corresponding to one second entity;
input the multiple second word vectors, the multiple second entity vectors, and the multiple second entity context vectors into the knowledge-aware convolutional neural network (KCNN) to obtain the keyword feature vector of the first intangible cultural heritage news.
6. A computer device, characterized in that the computer device includes a memory and a processor, wherein the memory stores at least one computer program, which is loaded and executed by the processor to implement the method according to any one of claims 1 to 4.
7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one computer program, which is loaded and executed by a processor to implement the method according to any one of claims 1 to 4.
8. A computer program product comprising a computer program / instructions, characterized in that, when the computer program / instructions are executed by a processor, they implement the method according to any one of claims 1 to 4.