Cross-document innovation mining method, apparatus, device, and medium based on reasoning-based creative chains

By building a local knowledge graph and applying a multi-step logical reasoning algorithm, the method determines a set of innovative connection points. This addresses the lack of innovation-discovery capability in large language models and enables automatic mining of potential innovative ideas and research directions.

CN122240810A · Pending · Publication Date: 2026-06-19 · SHANGHAI HAIYAN XINZHI TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
SHANGHAI HAIYAN XINZHI TECHNOLOGY CO LTD
Filing Date
2026-03-17
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing large language models lack structured, evaluable, and proactive innovation discovery mechanisms, and are unable to provide users with reasonable directions for innovation.

Method used

A local knowledge graph is built and used to identify candidate innovative concept pairs, and a multi-step logical reasoning algorithm constructs logical links between them. A large language model then performs flip processing to determine a set of innovative connection points, and target innovation points are selected according to their innovation potential scores.

Benefits of technology

It enables the automatic extraction of potential innovative ideas and research directions from unstructured documents, and provides structured document mining results.

✦ Generated by Eureka AI based on patent content.

Figure CN122240810A_ABST

Abstract

This invention discloses a method, apparatus, device, and medium for cross-document innovation mining based on reasoning-based creative chains. The method includes: acquiring a set of documents to be processed selected by a target user; classifying the set of documents according to a classification dimension; establishing a local knowledge graph based on the classified set of documents; determining candidate innovative concept pairs based on the local knowledge graph; processing the candidate innovative concept pairs using a spark-flipping framework to determine a set of innovative connection points; determining an innovation potential score corresponding to each innovative connection point in the set of innovative connection points; selecting target innovative points from the set of innovative connection points based on the innovation potential scores; generating document mining results corresponding to the set of documents to be processed based on the target innovative points; and feeding back the document mining results to the target user. By determining target innovative points through a local knowledge graph and feeding them back to the user, the method achieves automatic mining of potential innovative ideas and research directions from unstructured documents.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence, and in particular to a method, apparatus, device, and medium for cross-document innovation mining based on reasoning-based creative chains.

Background Technology

[0002] With the continuous development of computer technology, users can use large language models to query the information they need, and then summarize and generalize the information obtained to arrive at the final direction of innovation.

[0003] However, existing large-model information retrieval methods answer user questions by summarizing existing knowledge. Their answers rely on patterns already present in the training data and lack a structured, evaluable, and proactive innovation discovery mechanism, so they cannot provide users with reasonable innovation directions.

Summary of the Invention

[0004] This invention provides a method, apparatus, device, and medium for cross-document innovation mining based on reasoning-based creative chains. By establishing a local knowledge graph corresponding to the document to be processed, further selecting innovation connection points based on the local knowledge graph, determining the target innovation point based on the innovation connection point, and feeding the target innovation point back to the user, it realizes the automatic mining of potential innovative ideas and research directions from unstructured documents.

[0005] According to one aspect of the present invention, a cross-document innovation mining method based on reasoning-based creative chains is provided, comprising:

[0006] Obtain the set of documents to be processed selected by the target user, and classify the set of documents to be processed according to the classification dimension;

[0007] A local knowledge graph is built based on the classified document set to be processed, and candidate innovative concept pairs are determined according to the local knowledge graph, wherein each candidate innovative concept pair consists of two nodes that are not directly connected in the local knowledge graph;

[0008] A multi-step logical reasoning algorithm is used to determine the logical link corresponding to the candidate innovative concept pair, and the core hypothesis corresponding to the candidate innovative concept pair is determined based on the logical link;

[0009] Flip prompts are determined based on the flip question template and the core hypothesis, and the large language model is instructed with the flip prompts to perform flip processing, thereby determining the set of innovative connection points;

[0010] Determine the innovation potential score corresponding to each innovation connection point in the innovation connection point set, and select target innovation points from the innovation connection point set based on the innovation potential score;

[0011] Based on the target innovation points, document mining results corresponding to the document set to be processed are generated, and the document mining results are fed back to the target user.

[0012] According to another aspect of the present invention, a cross-document innovation mining device based on reasoning-based creative chains is provided, comprising:

[0013] The document classification module is used to obtain the set of documents to be processed selected by the target user, and classify the set of documents to be processed according to the classification dimensions.

[0014] The candidate innovation concept pair determination module is used to build a local knowledge graph based on the classified document set to be processed, and to determine candidate innovation concept pairs according to the local knowledge graph, wherein each candidate innovation concept pair consists of two nodes that are not directly connected in the local knowledge graph;

[0015] The Spark Inference module is used to apply a multi-step logical reasoning algorithm to determine the logical link corresponding to the candidate innovative concept pair, and to determine the core hypothesis corresponding to the candidate innovative concept pair based on the logical link;

[0016] The flipping module is used to determine flip prompts based on the flip question template and the core hypothesis, and to instruct the large language model with the flip prompts to perform flip processing, thereby determining the set of innovative connection points;

[0017] The target innovation point determination module is used to determine the innovation potential score corresponding to each innovation connection point in the innovation connection point set, and to filter target innovation points from the innovation connection point set based on the innovation potential score.

[0018] The document mining results feedback module is used to generate document mining results corresponding to the document set to be processed based on the target innovation points, and to feed back the document mining results to the target user.

[0019] According to another aspect of the present invention, an electronic device is provided, the electronic device comprising:

[0020] At least one processor; and

[0021] A memory communicatively connected to the at least one processor; wherein,

[0022] The memory stores a computer program that can be executed by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to execute the cross-document innovation mining method based on reasoning-based creative chains as described in any embodiment of the present invention.

[0023] According to another aspect of the present invention, a computer-readable storage medium is provided, the computer-readable storage medium storing computer instructions for causing a processor to execute and implement the cross-document innovation mining method based on reasoning-based creative chains as described in any embodiment of the present invention.

[0024] The technical solution of this invention involves obtaining a set of documents to be processed selected by a target user, classifying the set of documents according to a classification dimension, establishing a local knowledge graph based on the classified set of documents, determining candidate innovative concept pairs based on the local knowledge graph, using a multi-step logical reasoning algorithm to determine the logical links corresponding to the candidate innovative concept pairs, and determining the core hypotheses corresponding to the candidate innovative concept pairs based on the logical links; determining flipping prompts based on flipping question templates and the core hypotheses, and using the flipping prompts to instruct a large language model to perform flipping processing to determine the set of innovative connection points, determining the innovation potential score corresponding to each innovative connection point in the set of innovative connection points, filtering target innovative points from the set of innovative connection points based on the innovation potential scores, and finally generating document mining results corresponding to the set of documents to be processed based on the target innovative points, and feeding the document mining results back to the target user. Based on the above technical solution, by establishing a local knowledge graph corresponding to the document to be processed, further selecting innovative connection points based on the local knowledge graph, determining the target innovative point based on the innovative connection point, and feeding the target innovative point back to the user, it is possible to automatically mine potential innovative ideas and research directions from unstructured documents.

[0025] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of the present invention, nor is it intended to limit the scope of the invention. Other features of the invention will become readily apparent from the following description.

Attached Figure Description

[0026] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0027] Figure 1 is a flowchart of a cross-document innovation mining method based on reasoning-based creative chains provided in an embodiment of the present invention;

[0028] Figure 2 is a structural block diagram of a cross-document innovation mining device based on reasoning-based creative chains provided in an embodiment of the present invention;

[0029] Figure 3 is a schematic structural diagram of the electronic device provided in an embodiment of the present invention.

Detailed Implementation

[0030] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0031] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0032] It should be noted that the acquisition, storage, use, and processing of data in the technical solution of this application all comply with the relevant provisions of national laws and regulations.

[0033] Example 1

[0034] Figure 1 is a flowchart of a cross-document innovation mining method based on reasoning-based creative chains provided by an embodiment of the present invention. This embodiment is applicable where a user-selected set of documents to be processed is mined automatically to obtain the corresponding document mining results. The method can be executed by a cross-document innovation mining device based on reasoning-based creative chains, which can be implemented in hardware and/or software and configured in an electronic device such as a server or terminal device. As shown in Figure 1, the method includes:

[0035] S110. Obtain the set of documents to be processed selected by the target user, and classify the set of documents to be processed according to the classification dimension.

[0036] The target user is a user who requires creative idea mining. The document set to be processed is a collection of at least two documents uploaded by the target user through the front-end interactive interface. The classification dimensions can be preselected logical classification dimensions, such as basic theories/problems, core technologies/methods, and application scenarios/impacts. It should be noted that the technical solution of this embodiment can be deployed in an intelligent agent: when the user invokes the agent to process documents, the solution can automatically mine potential innovative ideas and research directions from the large volume of scattered, unstructured documents uploaded by the user, such as academic papers, research reports, and market analyses.

[0037] Specifically, the document set selected by the target user is acquired and classified according to the classification dimensions. For example, a document upload portal supporting file formats such as PDF, DOCX, and TXT can be provided through a front-end interface, and users can batch select and upload documents through this portal to construct the document set to be processed. Based on preset classification dimensions, such as document type, subject area, urgency, and language, a multimodal feature extraction algorithm analyzes the document content in depth and extracts textual, structural, and semantic features, and the document set is automatically classified into categories according to the extracted feature vectors. For instance, the tags of each article are converted into high-dimensional feature vectors using a word embedding model (such as Word2Vec or BERT). An implicit three-way classification is then applied: three or more logical classification prototypes are preset, such as "basic theory/problems," "core technologies/methods," and "application scenarios/impacts," and the semantic similarity between each article's tag vector and each classification prototype is calculated. For the multi-category classification decision, a similarity threshold (e.g., 0.7) is set; if an article's similarity to several class prototypes exceeds this threshold, the article is assigned to all of the corresponding categories.

[0038] Based on the above technical solution, classifying the document set to be processed according to the classification dimension includes: determining the document tag corresponding to each document to be processed in the document set to be processed, and using a word embedding model to determine the tag feature vector of the document tag; determining the semantic similarity between the tag feature vector and the classification dimension, and classifying the document to be processed into the corresponding classification dimension according to the semantic similarity.

[0039] Document tags can be feature labels corresponding to the documents to be processed. These feature labels can be obtained manually or automatically through a preset labeling algorithm. It should be noted that one document can correspond to one or more document tags. The word embedding model is used to transform document tags into feature vectors. Semantic similarity can be a value used to measure the similarity between document tags and the classification dimension.

[0040] Specifically, document tags corresponding to each document in the document set to be processed are determined, and a word embedding model is used to determine the tag feature vector of the document tag. Then, the semantic similarity between the tag feature vector and the classification dimension is determined, and the document to be processed is classified into the corresponding classification dimension according to the semantic similarity.

[0041] For example, natural language processing techniques can be used to extract keywords and perform semantic analysis on document titles, abstracts, or full-text content, automatically matching the most suitable tags for each document. Then, word embedding models, such as Word2Vec, GloVe, or BERT, can be used to transform document tags into high-dimensional feature vectors, i.e., tag feature vectors. The semantic similarity between the tag feature vectors and the classification dimensions can then be calculated, and the degree of matching can be evaluated using metrics such as cosine similarity. Preset classification dimensions can include basic theories / problems, core technologies / methods, application scenarios / impacts, etc. Finally, based on semantic similarity thresholds or ranking results, the documents to be processed are automatically assigned to the classification dimension with the highest similarity.
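The matching step in [0041] can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each document's tag embeddings are averaged into one tag feature vector and compared with one prototype vector per classification dimension by cosine similarity. The toy 3-dimensional vectors stand in for real Word2Vec, GloVe, or BERT embeddings, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

def mean_vector(vectors: List[Vector]) -> List[float]:
    """Average several tag embeddings into a single tag feature vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_dimensions(tag_vectors: List[Vector],
                    prototypes: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Return (dimension, similarity) pairs sorted from best to worst match."""
    doc_vec = mean_vector(tag_vectors)
    scores = [(name, cosine_similarity(doc_vec, proto)) for name, proto in prototypes.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical prototype vectors, one per classification dimension.
prototypes = {
    "basic theories/problems": [1.0, 0.0, 0.0],
    "core technologies/methods": [0.0, 1.0, 0.0],
    "application scenarios/impacts": [0.0, 0.0, 1.0],
}
tags = [[0.1, 0.9, 0.0], [0.0, 0.8, 0.2]]  # embeddings of one document's tags
ranking = rank_dimensions(tags, prototypes)
print(ranking[0][0])  # core technologies/methods
```

Assigning the document to `ranking[0]` corresponds to the "highest similarity" rule; a threshold-based multi-label rule can reuse the same similarity scores.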

[0042] Based on the above technical solution, the step of classifying the document to be processed into the corresponding classification dimension according to the semantic similarity includes: obtaining the similarity threshold corresponding to each classification dimension; classifying the current document to be processed into the current classification dimension when the semantic similarity is greater than the similarity threshold of the current classification dimension; and, when the semantic similarity is greater than the similarity thresholds of at least two classification dimensions, classifying the current document into those classification dimensions and marking it as a cross-domain knowledge point.

[0043] The similarity threshold can be a pre-set threshold for similarity corresponding to different classification dimensions, used to classify documents.

[0044] Specifically, if the semantic similarity is greater than the similarity threshold of the current classification dimension, the current document to be processed is classified into the current classification dimension; if the semantic similarity is greater than the similarity thresholds of at least two classification dimensions, the current document to be processed is classified into the at least two classification dimensions, and the current document to be processed is labeled as a cross-domain knowledge point. For example, a pre-trained semantic model (such as BERT, Sentence-BERT) is used to extract the vector representation of the document to be processed, and its semantic similarity with the representative documents of each classification dimension is calculated. For each classification dimension, if the semantic similarity of the current document exceeds the preset threshold of that dimension, it is classified into that dimension; if the similarity of the document with at least two classification dimensions exceeds the corresponding threshold, a multi-label classification strategy is adopted to classify the document into these dimensions simultaneously and label it as a "cross-domain knowledge point" to identify its cross-domain characteristics.
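The per-dimension threshold rule in [0042] to [0044] can be sketched as a small multi-label assignment function. This is a hypothetical illustration: it assumes the semantic similarities have already been computed, that dimensions without an explicit threshold fall back to the 0.7 example value, and that the caller decides what to do when no threshold is exceeded (the source does not specify this case).

```python
from typing import Dict, List, Tuple

def assign_dimensions(similarities: Dict[str, float],
                      thresholds: Dict[str, float],
                      default_threshold: float = 0.7) -> Tuple[List[str], bool]:
    """Multi-label assignment against per-dimension similarity thresholds.

    A document joins every dimension whose threshold its similarity exceeds.
    Returns the assigned dimensions and a flag that is True when the document
    is a cross-domain knowledge point (assigned to at least two dimensions).
    """
    assigned = [
        dim for dim, sim in similarities.items()
        if sim > thresholds.get(dim, default_threshold)
    ]
    return assigned, len(assigned) >= 2

# Hypothetical similarities for one document.
sims = {
    "basic theories/problems": 0.55,
    "core technologies/methods": 0.82,
    "application scenarios/impacts": 0.74,
}
dims, cross_domain = assign_dimensions(sims, {"core technologies/methods": 0.75})
print(dims, cross_domain)
# ['core technologies/methods', 'application scenarios/impacts'] True
```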

[0045] The technical solution in this embodiment improves the accuracy of classification and the completeness of knowledge coverage through dynamic threshold matching and multi-label annotation mechanism, and is especially suitable for document management scenarios under complex knowledge systems.

[0046] S120. Establish a local knowledge graph based on the classified document set to be processed, and determine candidate innovative concept pairs based on the local knowledge graph.

[0047] The local knowledge graph can be a knowledge representation structure representing the knowledge domain of the document set to be processed. A candidate innovative concept pair consists of two nodes in the local knowledge graph that are not directly connected. The set of innovative connection points can be a collection of at least two innovative connection points; an innovative connection point is a new connection obtained through screening.

[0048] Specifically, a local knowledge graph is built based on the classified document set to be processed, and candidate innovative concept pairs are determined according to the local knowledge graph. For example, for the documents under each classification dimension, entity recognition and relation extraction techniques can be used to extract key entities, such as technical terms and product names, together with the relationships between them, and a local knowledge graph is constructed for that classification. In the graph, nodes represent entities and edges represent relationships between entities. Then, based on the structural features and semantic information of the local knowledge graph, concept pairs with potential innovative associations are identified by analyzing graph features such as node degree distribution and path length in combination with entity semantic similarity. The candidate innovative concept pairs are evaluated, and a set of innovative connection points is selected based on indicators such as complementarity and technical feasibility between concepts. It should be noted that innovative connection points represent key positions where innovative fusion may occur between different entities or concepts. For example, entity recognition and relation extraction are performed on the classified articles to construct a local knowledge graph with core concepts as nodes and relationships as edges; algorithms such as PageRank are used to calculate node importance, and the graph is searched for concept pairs (Concept A, Concept B) that have no direct connection but have potential indirect paths.

[0049] Based on the above technical solution, the step of establishing a local knowledge graph based on the classified document set to be processed includes: performing entity recognition on the classified document set to be processed to determine at least two core concepts corresponding to the classified document set to be processed; determining the relationship between the at least two core concepts and determining the importance of the core concepts; and constructing the local knowledge graph based on the at least two core concepts, the relationship between the at least two core concepts, and the importance of the core concepts.

[0050] In this local knowledge graph, nodes are core concepts and edges represent the relationships between core concepts. Importance is a measure of a node's significance; for example, it can be calculated with the PageRank algorithm to determine the importance of each node within the current knowledge domain.

[0051] Core concepts are technical terms, key entities, or theoretical models with high information density and structural importance within a specific knowledge domain of the input document set, such as the disease "hypertension" in the medical field. Entity recognition can leverage the capabilities of large models to extract all "candidate concepts" from the documents, including but not limited to technical terms, product names, and theoretical terms, and the large model framework can be used to identify and extract relationships between entities.

[0052] Specifically, entity recognition is performed on the categorized document set to determine at least two core concepts corresponding to the categorized document set; the relationship between the at least two core concepts is determined, and the importance of the core concepts is determined; based on the at least two core concepts, the relationship between the at least two core concepts, and the importance of the core concepts, a local knowledge graph is constructed. For example, each document in the document set is parsed, and at least two core concepts closely related to the classification topic are extracted from the text, such as specific technical terms, key business elements, etc. A relation extraction algorithm is used to analyze the document content to determine the associations between the core concepts, including causal relationships, compositional relationships, and synergistic relationships, etc., and the importance of each core concept is comprehensively evaluated by combining the frequency of occurrence in the text, the representativeness in the classification, and the weight rules preset by experts. Based on the core concepts, the relationships between concepts, and the importance information obtained above, a local knowledge graph is constructed with core concepts as nodes, relationships as edges, and importance as node attributes. The technical solution of this embodiment of the invention not only clearly presents the knowledge associations in the categorized documents, but also highlights key information through importance weights, providing solid data support and a foundation for visualization analysis for subsequent applications such as knowledge reasoning and innovation discovery.
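Once relations have been extracted, the graph construction and PageRank-based importance described in [0050] and [0052] can be sketched as below. This is an assumption-laden illustration rather than the patent's method: extracted relations are taken as directed (head, relation, tail) triples, importance is plain PageRank computed by power iteration (frequency and expert weights from [0052] are not modelled), and the medical triples are invented examples.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head concept, relation, tail concept)

def build_graph(triples: List[Triple]) -> Dict[str, Dict[str, str]]:
    """Adjacency map: graph[head][tail] = relation label."""
    graph: Dict[str, Dict[str, str]] = defaultdict(dict)
    for head, relation, tail in triples:
        graph[head][tail] = relation
        graph.setdefault(tail, {})  # make sure every tail is also a node
    return dict(graph)

def pagerank(graph: Dict[str, Dict[str, str]],
             damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a directed adjacency map."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out = graph[v]
            for target in out:
                new_rank[target] += damping * rank[v] / len(out)
        rank = new_rank
    return rank

def build_local_kg(triples: List[Triple]) -> dict:
    """Core concepts as nodes (with importance), relations as edges."""
    graph = build_graph(triples)
    importance = pagerank(graph)
    return {
        "nodes": {concept: {"importance": importance[concept]} for concept in graph},
        "edges": list(triples),
    }

kg = build_local_kg([
    ("salt intake", "causes", "hypertension"),
    ("hypertension", "increases risk of", "stroke"),
    ("ACE inhibitor", "treats", "hypertension"),
])
```

Power iteration keeps the sketch dependency-free; a graph library's built-in PageRank would serve the same purpose.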

[0053] Based on the above technical solution, the step of determining candidate innovative concept pairs according to the local knowledge graph includes: determining initial innovative concept pairs from the local knowledge graph according to a maximum distance threshold, and determining the confidence level of the path connection relationship corresponding to the initial innovative concept pairs; if the confidence level of the path connection relationship corresponding to the initial innovative concept pairs is greater than a preset confidence threshold, and the initial innovative concepts do not match the exclusion list, the initial innovative concept pairs are taken as candidate innovative concept pairs.

[0054] The maximum distance threshold is a preset maximum connection distance, for example 4 steps. Initial innovative concept pairs are concept pairs whose connection distance does not exceed the maximum distance threshold. The confidence level reflects the reliability of the association between the concepts of a pair. The exclusion list is a preset list of words used for filtering; it can consist of words with no substantive meaning, such as "research," "problem," and "impact."

[0055] Specifically, based on a preset maximum distance threshold, a traversal analysis of the local knowledge graph is performed. It should be noted that the maximum distance threshold limits the maximum interval between innovative concept pairs in the graph, preventing weak concept associations due to excessive distance. Using graph algorithms, such as breadth-first search or shortest path algorithms, initial innovative concept pairs that meet the maximum distance threshold are selected from the graph. For each initial innovative concept pair, the confidence score of its path connection is calculated. The confidence score can be obtained through weighted calculation based on factors such as path length, number of intermediate nodes, edge weights (e.g., relationship strength), and the co-occurrence frequency of the concept in the document. Concept pairs with a path connection confidence score greater than the preset confidence threshold are retained to ensure robust associations. Furthermore, the initial innovative concept pairs are checked to see if they match a pre-defined exclusion list, excluding non-innovative or known associations. Initial innovative concept pairs that meet the above conditions are identified as candidate innovative concept pairs.
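The weighted path confidence in [0055] could take a form like the one below. The source names the factors (path length, number of intermediate nodes, edge weights, co-occurrence frequency) but not the formula, so everything here is hypothetical: the weights, the use of the weakest edge as the edge score, the linear length penalty, and the assumption that co-occurrence is already normalised to [0, 1].

```python
from typing import List

def path_confidence(edge_weights: List[float], cooccurrence: float,
                    max_hops: int = 4, w_edges: float = 0.5,
                    w_length: float = 0.3, w_cooc: float = 0.2) -> float:
    """Hypothetical weighted confidence score for one concept-pair path.

    edge_weights: relation-strength scores in [0, 1], one per edge on the path.
    cooccurrence: normalised co-occurrence of the two end concepts, in [0, 1].
    """
    hops = len(edge_weights)
    if not 1 <= hops <= max_hops:
        raise ValueError("path length must be between 1 and max_hops")
    intermediate = hops - 1                      # nodes strictly between the ends
    edge_score = min(edge_weights)               # the weakest link bounds the chain
    length_score = 1.0 - intermediate / max_hops  # fewer detours score higher
    return w_edges * edge_score + w_length * length_score + w_cooc * cooccurrence

score = path_confidence([0.9, 0.8, 0.95], cooccurrence=0.4)
keep = score > 0.6  # compare with a preset confidence threshold (0.6 is illustrative)
```

Taking the minimum edge weight reflects the "strong connection on every edge" requirement stated in the next paragraph; an average would let one weak edge hide behind strong ones.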

[0056] For example, pre-screening considers only short paths, focusing on concept pairs that can be connected in a few steps, such as at most four; connections that are too far apart are generally meaningless and are discarded. Every connection on the path must be a "strong connection," meaning that the relation was extracted with high confidence. The path cannot contain overly broad, meaningless "universal terms" such as "research," "problem," or "impact"; these terms are filtered out using a predefined exclusion list. Furthermore, the start and end points of the path must be identified core concepts, because only paths connecting two important concepts are potentially meaningful.
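The pre-screening rules in [0055] and [0056] can be combined into one breadth-first search, sketched below under stated assumptions: relations are treated as undirected edges carrying an extraction confidence, an edge counts as "strong" when its confidence is at least a hypothetical `min_edge_conf`, excluded terms may not appear on the path, both ends must be core concepts, and any direct edge (even a weak one) disqualifies the pair. Function and parameter names are illustrative only.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, float]  # (concept, concept, extraction confidence)

def find_candidate_pairs(edges: List[Edge], core_concepts: Set[str],
                         exclusion: Set[str], max_hops: int = 4,
                         min_edge_conf: float = 0.7) -> Dict[Tuple[str, str], List[str]]:
    """Map each candidate pair (sorted) to the shortest qualifying path found."""
    adj: Dict[str, Set[str]] = {}
    direct: Set[frozenset] = set()
    for a, b, conf in edges:
        direct.add(frozenset((a, b)))
        if conf < min_edge_conf:
            continue  # only strong connections may be traversed
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    candidates: Dict[Tuple[str, str], List[str]] = {}
    for source in sorted(core_concepts - exclusion):
        queue = deque([[source]])  # BFS over simple paths, shortest first
        while queue:
            path = queue.popleft()
            if len(path) - 1 == max_hops:
                continue  # path already uses the maximum number of hops
            for nxt in sorted(adj.get(path[-1], ())):
                if nxt in path or nxt in exclusion:
                    continue  # no cycles, no "universal terms"
                new_path = path + [nxt]
                key = tuple(sorted((source, nxt)))
                if (nxt in core_concepts and frozenset((source, nxt)) not in direct
                        and key not in candidates):
                    candidates[key] = new_path
                queue.append(new_path)
    return candidates

edges = [
    ("new energy battery", "battery energy storage", 0.9),
    ("battery energy storage", "grid peak shaving", 0.85),
    ("grid peak shaving", "bidirectional charging", 0.8),
    ("bidirectional charging", "smart grid", 0.9),
    ("new energy battery", "research", 0.95),
    ("research", "smart grid", 0.95),
]
core = {"new energy battery", "smart grid"}
pairs = find_candidate_pairs(edges, core, exclusion={"research"})
```

Here the short route through "research" is rejected by the exclusion list, leaving the four-hop path through battery storage and peak shaving. Enumerating simple paths grows quickly with `max_hops`, which is one practical reason for keeping the limit small.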

[0057] S130. A multi-step logical reasoning algorithm is used to determine the logical link corresponding to the candidate innovative concept pair, and the core hypothesis corresponding to the candidate innovative concept pair is determined based on the logical link.

[0058] S140. Determine flip prompts based on the flip question template and the core hypothesis, and instruct the large language model with the flip prompts to perform flip processing, thereby determining the set of innovative connection points.

[0059] The multi-step logical reasoning algorithm constructs a logical link through multi-step reasoning. For example, it can construct a logical link from A to B such as A→C→D→B. A logical link is a connecting path that describes the logical relationship between the two nodes of an innovative concept pair. The core hypothesis is the relationship that is most frequently mentioned and most generally accepted across all input documents: for example, if 90% of the documents state that "A leads to B," then "A leads to B" is a core hypothesis of this analysis. The flip question template is a preset prompt question used to flip a concept.
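The frequency criterion for the core hypothesis in [0059] can be sketched by counting, for each extracted relation, how many documents state it. This assumes relations have already been extracted per document as (head, relation, tail) triples; `min_support` is a hypothetical cut-off for "generally accepted" that the source does not specify.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple

Relation = Tuple[str, str, str]  # (head, relation, tail)

def core_hypothesis(doc_relations: Iterable[Set[Relation]],
                    min_support: float = 0.5) -> Optional[Tuple[Relation, float]]:
    """Return the relation stated by the most documents and its document share.

    Each document counts at most once per relation. Returns None when there
    are no relations or the top relation's share is below min_support.
    """
    docs = list(doc_relations)
    counts = Counter(rel for rels in docs for rel in set(rels))
    if not counts:
        return None
    relation, hits = counts.most_common(1)[0]
    support = hits / len(docs)
    return (relation, support) if support >= min_support else None

# Nine of ten documents say "A leads to B", as in the example above.
docs = [{("A", "leads to", "B")}] * 9 + [{("A", "inhibits", "B")}]
print(core_hypothesis(docs))  # (('A', 'leads to', 'B'), 0.9)
```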

[0060] Specifically, for candidate innovative concept pairs, multi-step reasoning techniques are used to construct logical links. Directly related paths between concept pairs are extracted from a local knowledge graph, and implicit reasoning steps are supplemented using a domain knowledge base to form a complete logical chain. For example, if concept A is "new energy battery" and concept B is "smart grid," the logical link can be deduced as "battery energy storage characteristics → grid peak-shaving demand → bidirectional charging and discharging technology → smart grid collaboration." Core hypotheses are extracted based on these logical links. Key turning points or unverified links in the chain are analyzed and transformed into hypothetical statements, such as "bidirectional charging and discharging technology can improve the stability of smart grids." A flipped question template is designed, embedding the core hypotheses into the template to generate prompts. These prompts are then input into a large language model (such as GPT-4), requiring it to deduce potential innovative directions based on the hypotheses.

[0061] The candidate innovative concept pairs are processed using the Spark Flipping Framework to determine the set of innovative connection points. The framework consists of two parts: spark reasoning and flip processing. Spark reasoning constructs logical links using a multi-step logical reasoning algorithm, while flip processing determines innovative connection points by flipping the core hypotheses with preset flip questions.

[0062] For example, in the Spark-Flip Framework: Spark: for a candidate concept pair, a logical link from A to B (A→C→D→B) is constructed by applying a multi-step logical reasoning algorithm. Flip: a core assumption or conventional belief in the link is reversed (for example, "What if C did not exist?"), or a cross-domain integration is attempted (for example, "Can the problem of C be solved using methods from domain X?"), thereby generating a brand-new, non-obvious innovative connection point.

[0063] The flipping process works as follows. After the core hypothesis is identified, three standardized "preliminary questions" are generated, and the large language model searches the existing documents for answers. The questions are: (1) Finding counterexamples: "In these documents, are there any texts or data that challenge or provide exceptions to the statement 'A causes B'?" (2) Finding alternatives: "Besides A, what other reasons do these documents mention that could lead to B?" (3) Finding external inspiration: "'A causes B' is a causal relationship. In other fields described in these documents, such as economics and physics, are there similar but completely different causal relationships that can be learned from?"
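The three preliminary questions in [0063] amount to filling fixed templates with the core hypothesis. The sketch below only builds the prompt strings; the call to the large language model is omitted because the text does not specify a model API, and `FLIP_TEMPLATES` and `build_flip_prompts` are hypothetical names.

```python
# Hypothetical template table; the question wording follows [0063].
FLIP_TEMPLATES = {
    "counterexample": (
        "In these documents, are there any texts or data that challenge or "
        "provide exceptions to the statement '{hypothesis}'?"
    ),
    "alternative": (
        "Besides {cause}, what other reasons do these documents mention "
        "that could lead to {effect}?"
    ),
    "external_inspiration": (
        "'{hypothesis}' is a causal relationship. In other fields described in "
        "these documents, such as economics and physics, are there similar but "
        "completely different causal relationships that can be learned from?"
    ),
}


def build_flip_prompts(cause, effect, relation="causes"):
    """Fill each flip question template with one core hypothesis "cause relation effect"."""
    hypothesis = f"{cause} {relation} {effect}"
    return {
        name: template.format(hypothesis=hypothesis, cause=cause, effect=effect)
        for name, template in FLIP_TEMPLATES.items()
    }
```

Each resulting prompt would then be sent to the large language model together with the documents to be processed; keeping the templates in one table makes it easy to add further flip questions, such as the "What if C did not exist?" reversal from [0062].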

[0064] S150. Determine the innovation potential score corresponding to each innovation connection point in the innovation connection point set, and select target innovation points from the innovation connection point set based on the innovation potential score.

[0065] The innovation potential score may be a rating used to evaluate the innovation value of an innovation connection point. A target innovation point may be an innovation point that meets preset screening criteria.

[0066] Specifically, an innovation potential score is determined for each innovation connection point in the innovation connection point set, and target innovation points are then selected from the set based on these scores. For example, the innovation potential of each innovation connection point is quantified across multiple dimensions, which may include: knowledge association strength, measured by the edge weights and path diversity in the local knowledge graph, where higher weights and richer paths indicate that the connection point plays a more pivotal role in the knowledge network; conceptual novelty, measured by comparing the concepts involved in the connection point with existing knowledge bases using a text similarity algorithm, where lower similarity indicates higher novelty; breadth of application scenarios, assessed by analyzing the domains to which the connection point can be extended based on domain ontology analysis, where the more domains covered, the greater the application potential; technological maturity, assessed by combining patent literature and industry reports to determine the development stage, where connection points at an early stage but with positive technology trends have greater breakthrough potential; and market demand, assessed through text mining of user feedback, industry reports, and market trend data, where connection points with high user attention and unmet market demand are more valuable. The system uses the analytic hierarchy process (AHP) to assign a weight to each dimension and computes a comprehensive score from these weights and the corresponding per-dimension scores; this comprehensive score serves as the innovation potential score. Target innovation points are then selected from the innovation connection point set according to a preset score threshold or ranking rule. For example, the connection points in the top 20% by score can be designated as target innovation points, or connection points whose scores exceed a specific threshold can be selected.
For instance, the innovation potential score (IP) of each generated innovation point can be quantified as IP = α⋅Novelty + β⋅Feasibility + γ⋅Impact, where α, β, and γ are the weights of the three dimensions. Novelty is calculated from the semantic distance between concepts, feasibility is assessed from the strength of association with existing technologies, and impact predicts potential academic or commercial value.
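The AHP weighting and the IP formula can be sketched as follows. This sketch substitutes the row geometric-mean approximation for AHP's principal-eigenvector computation (the two coincide for a perfectly consistent comparison matrix), omits AHP's consistency-ratio check, and assumes the three dimension scores are already on a common scale; since [0072] places feasibility on 0 to 1 and the other two on 0 to 10, they would need rescaling first. The pairwise judgments and the names `ahp_weights` and `innovation_potential` are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.

    pairwise[i][j] says how much more important criterion i is than criterion j,
    and the matrix is reciprocal: pairwise[j][i] == 1 / pairwise[i][j].
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def innovation_potential(scores, weights):
    """IP = alpha*Novelty + beta*Feasibility + gamma*Impact, as a weighted sum."""
    return sum(w * s for w, s in zip(weights, scores))


# Illustrative judgment: novelty is twice as important as feasibility and impact.
PAIRWISE = [
    [1.0, 2.0, 2.0],
    [0.5, 1.0, 1.0],
    [0.5, 1.0, 1.0],
]
```

With this matrix the weights come out as approximately 0.5, 0.25, and 0.25, so scores of (8, 6, 4) give an IP of about 6.5.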

[0067] Based on the above technical solution, the step of selecting target innovation points from the set of innovation connection points according to the innovation potential score includes: sorting each innovation connection point in the set of innovation connection points according to the innovation potential score to obtain a sorting result; determining a set of innovation points to be applied from the set of innovation connection points according to the sorting result, and determining target innovation points from the set of innovation points to be applied according to the innovation potential score.

[0068] The sorting result may be a sorted list obtained by arranging the innovation connection points in the innovation connection point set in descending order of innovation potential score.

[0069] Specifically, based on the calculated innovation potential score, all elements in the innovation connection point set are sorted in descending order. The sorting results are presented in list form, with each element containing a connection point identifier, an innovation potential score, and a corresponding core hypothesis description. According to a preset screening ratio, such as the top 30%, or a scoring threshold, such as a score ≥ 85, a set of innovation points to be applied is extracted from the sorting results. This set undergoes further secondary screening. For example, connection points with high relevance to the current R&D direction and excellent resource input-output ratio are prioritized, or the final target innovation points are determined through multiple rounds of expert review.
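The descending sort and first-stage screening of [0069] can be sketched with either a top-percentage cut or a score threshold. `ConnectionPoint` and `points_to_apply` are hypothetical names, and the secondary screening (relevance to R&D direction, expert review) is not modeled.

```python
from typing import NamedTuple


class ConnectionPoint(NamedTuple):
    point_id: str
    score: float  # innovation potential score
    hypothesis: str  # core hypothesis description


def points_to_apply(points, top_percent=30, min_score=None):
    """Sort by score (descending) and keep either a score threshold or the top N percent."""
    ranked = sorted(points, key=lambda p: p.score, reverse=True)
    if min_score is not None:
        return [p for p in ranked if p.score >= min_score]
    keep = -(-len(ranked) * top_percent // 100)  # integer ceiling, avoids float rounding
    return ranked[:keep]


SCORES = [90, 70, 88, 60, 95, 80, 85, 50, 75, 65]
EXAMPLE_POINTS = [ConnectionPoint(f"p{i}", s, f"hypothesis {i}") for i, s in enumerate(SCORES)]
```

The percentage is kept as an integer because a float ratio can misfire at the boundary: in Python, `10 * 0.3` evaluates to `3.0000000000000004`, which `math.ceil` would round up to 4.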

[0070] Based on the above technical solution, the step of determining the target innovation point from the set of innovation points to be applied according to the innovation potential score includes: determining the individual evaluation score corresponding to each innovation evaluation dimension of the set of innovation points to be applied; and determining the target innovation point from the set of innovation points to be applied according to the individual evaluation score and preset screening rules.

[0071] The innovation evaluation dimensions include novelty, feasibility, and impact.

[0072] Specifically, for each innovation point in the set of innovation points to be applied, individual evaluation scores are determined in three dimensions: novelty, feasibility, and impact. Novelty is calculated from the distance between the two concepts in the local knowledge graph: the number of steps on the shortest path connecting concept A and concept B in the constructed knowledge graph is determined. If A and B are directly connected, the association is already known and the novelty score is very low, such as 1 point. If connecting them requires 2-3 intermediate concepts, the association is weak and the novelty score is moderate, such as 4-6 points. Feasibility is evaluated using the "barrel effect" (weakest-link) principle: the reliability of a creative chain depends on its least reliable link. In the relation extraction stage, each connection on the creative path is assigned a confidence score between 0 and 1 representing its reliability. All connections on the path from concept A to concept B are examined, and the lowest confidence score is used directly as the feasibility score of the entire idea. For example, if a path consists of three connections with confidence scores of 0.9, 0.8, and 0.5, the feasibility score of the idea is 0.5. Impact is predicted by assessing the importance of the connected concepts themselves, measured by their total number of occurrences across all analyzed documents; a higher occurrence frequency indicates greater core importance within the current knowledge domain. The occurrence frequencies of concepts A and B connected by the idea are extracted and added together as the base impact score.
This score is then normalized, for example by dividing by the highest occurrence frequency among all concepts and rescaling, to obtain a final impact score between 0 and 10.
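The three individual evaluation scores of [0072] can be sketched as small functions. The novelty scale is only partly specified by the text, so the value for a single intermediate concept is an assumption; the impact normalization also assumes a factor of 5, because the sum of two frequencies divided by the maximum single frequency lies between 0 and 2. All names are hypothetical.

```python
# Hypothetical novelty scale keyed by the number of intermediate concepts.
# Only "direct = 1 point" and "2-3 intermediates = 4-6 points" come from the text;
# the value for a single intermediate is an illustrative assumption.
NOVELTY_BY_INTERMEDIATES = {0: 1.0, 1: 2.5, 2: 4.0, 3: 6.0}


def novelty_score(path):
    """path is the shortest node path from concept A to concept B, endpoints included."""
    intermediates = len(path) - 2
    return NOVELTY_BY_INTERMEDIATES[min(intermediates, 3)]


def feasibility_score(edge_confidences):
    """Barrel effect: the chain is only as reliable as its least reliable link."""
    return min(edge_confidences)


def impact_score(frequencies, a, b):
    """Sum of the two concepts' occurrence counts, normalized to the 0-10 range.

    Assumption: dividing by the largest single-concept frequency gives a value in
    [0, 2], so the result is multiplied by 5 to land in [0, 10].
    """
    return 5.0 * (frequencies[a] + frequencies[b]) / max(frequencies.values())
```

For the path example in the text, `feasibility_score([0.9, 0.8, 0.5])` returns 0.5; a path A→C→D→B has two intermediates and scores 4.0 on this assumed novelty scale.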

[0073] For example, taking a final set of 500 innovative connections, the screening follows these principles. Principle 1: eliminate options that are "completely surpassed". The 500 innovative connections are compared with one another; if innovative connection B scores lower in all three dimensions [8, 3, 7] than innovative connection A [9, 4, 8], B is considered "completely surpassed" and is eliminated. Principle 2: retain options that are "unique". Suppose there is also an innovative connection C with scores [7, 9, 8]. Although A has higher novelty (9 points) than C (7 points), C has much higher feasibility (9 points) than A (4 points). A and C each have their own strengths, and neither completely surpasses the other, so the screening retains both A and C.
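The two screening principles of [0073] amount to keeping the Pareto front of the three scores. This sketch uses the standard Pareto-dominance test (no worse on every dimension, strictly better on at least one), which also removes a point that ties on some dimensions and loses on the rest, slightly broader than the "lower in all three" wording. The names `dominates` and `pareto_front` are hypothetical.

```python
def dominates(x, y):
    """x completely surpasses y: no worse on every dimension and better on at least one."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def pareto_front(points):
    """Keep the points that no other point dominates.

    points maps a connection id to its (novelty, feasibility, impact) scores.
    Pairwise comparison is O(n^2), which is inexpensive for a few hundred points.
    """
    return {
        name: scores
        for name, scores in points.items()
        if not any(dominates(other, scores) for key, other in points.items() if key != name)
    }


EXAMPLE_POINTS = {"A": (9, 4, 8), "B": (8, 3, 7), "C": (7, 9, 8)}
```

On the example from the text, B is dominated by A and removed, while A and C, each better on some dimension, are both retained.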

[0074] S160. Generate document mining results corresponding to the document set to be processed based on the target innovation points, and feed back the document mining results to the target user.

[0075] Here, the document mining results may be structured documents generated based on the target innovation points.

[0076] Specifically, document mining results are generated based on the target innovation points. For example, this can be achieved by extracting related information such as related documents and scientific research data corresponding to the target innovation points, generating a structured document from the related information and the target innovation points, and displaying it on the target user's interface.

[0077] For example, after the above screening, the final "recommended list" consists of a batch of high-quality innovative connections that are "distinctive and unsurpassed". Finally, these innovative connections are automatically sorted into several preset "strategy folders" and presented to the user. Category 1, "Cutting-Edge Exploration": extremely high novelty, though current feasibility may be low; suitable for long-term R&D departments pursuing disruptive innovation. Source: the innovative connections with the highest novelty scores among the selected connections. Category 2, "Steady Implementation": extremely high feasibility and good impact, so they can be quickly turned into results; suitable for product departments seeking short-term returns. Source: the innovative connections with the highest feasibility scores among the selected connections. Category 3, "High-Value Potential": an extremely high impact score with balanced scores across all dimensions, representing directions that could solve major problems. Source: the innovative connections among the selected connections that have the highest impact scores while their other two scores are not low.
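The folder assignment of [0077] can be sketched with three rankings over the screened connections. The batch size `k`, the "not low" floors, and whether a connection may appear in several folders are not specified in the text and are assumptions here; `assign_folders` is a hypothetical name.

```python
def assign_folders(front, k=2, floors=(4.0, 0.5)):
    """Sort screened connections into the three strategy folders described in [0077].

    front maps a connection id to (novelty, feasibility, impact). k (batch size per
    folder) and floors (minimum novelty and feasibility treated as "not low") are
    hypothetical parameters. A connection may appear in more than one folder.
    """
    def top_by(index, items):
        ranked = sorted(items, key=lambda item: item[1][index], reverse=True)
        return [name for name, _ in ranked[:k]]

    items = list(front.items())
    balanced = [(n, s) for n, s in items if s[0] >= floors[0] and s[1] >= floors[1]]
    return {
        "Cutting-Edge Exploration": top_by(0, items),
        "Steady Implementation": top_by(1, items),
        "High-Value Potential": top_by(2, balanced),
    }


# Illustrative scores: novelty and impact on 0-10, feasibility on 0-1 (as in [0072]).
EXAMPLE_FRONT = {
    "P1": (9.0, 0.3, 5.0),
    "P2": (5.0, 0.9, 6.0),
    "P3": (6.0, 0.7, 9.0),
    "P4": (8.0, 0.6, 7.0),
    "P5": (3.0, 0.95, 9.5),
}
```

In the example, P5 has the highest impact but is left out of "High-Value Potential" because its novelty falls below the assumed floor, which reflects the requirement that the other two scores not be low.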

[0078] In the technical solution of this invention, a set of documents to be processed selected by a target user is acquired and classified according to classification dimensions; a local knowledge graph is established based on the classified document set; candidate innovative concept pairs are determined based on the local knowledge graph; the candidate innovative concept pairs are processed with the Spark Flipping Framework to determine a set of innovative connection points; an innovation potential score is determined for each innovative connection point in the set; target innovative points are filtered from the set based on the innovation potential scores; and finally, document mining results corresponding to the document set to be processed are generated based on the target innovative points and fed back to the target user. By establishing a local knowledge graph for the documents to be processed, selecting innovative connection points based on that graph, determining target innovative points from those connection points, and feeding the target innovative points back to the user, the solution automatically mines potential innovative ideas and research directions from unstructured documents.

[0079] Example 2

[0080] Figure 2 is a structural block diagram of a cross-document innovation mining device based on a reasoning-based creative chain according to an embodiment of the present invention. As shown in Figure 2, the device includes: a document classification module 210, a candidate innovative concept pair determination module 220, a spark reasoning module 230, a flipping module 240, a target innovative point determination module 250, and a mining result feedback module 260, wherein:

[0081] The document classification module 210 is used to obtain the set of documents to be processed selected by the target user, and classify the set of documents to be processed according to the classification dimensions;

[0082] The candidate innovation concept pair determination module 220 is used to build a local knowledge graph based on the classified document set to be processed and to determine candidate innovation concept pairs according to the local knowledge graph, wherein the two concepts of a candidate innovation concept pair are two nodes that are not directly connected in the local knowledge graph;

[0083] The spark reasoning module 230 is used to apply a multi-step logical reasoning algorithm to determine the logical link corresponding to the candidate innovative concept pair, and to determine the core hypothesis corresponding to the candidate innovative concept pair based on the logical link;

[0084] The flipping module 240 is used to determine flip prompts based on the flip question template and the core hypothesis, and to determine the innovative connection point set by having the large language model perform flip processing according to the flip prompts.

[0085] The target innovation point determination module 250 is used to determine the innovation potential score corresponding to each innovation connection point in the innovation connection point set, and to filter target innovation points from the innovation connection point set according to the innovation potential score.

[0086] The mining result feedback module 260 is used to generate document mining results corresponding to the document set to be processed based on the target innovation points, and to feed back the document mining results to the target user.

[0087] Based on the above technical solution, the document classification module is used to determine the document tag corresponding to each document in the document set to be processed, and to use a word embedding model to determine the tag feature vector of the document tag; to determine the semantic similarity between the tag feature vector and the classification dimension, and to classify the document to be processed into the corresponding classification dimension according to the semantic similarity.

[0088] Based on the above technical solution, the document classification module is used to obtain the similarity threshold corresponding to each classification dimension; when the semantic similarity is greater than the similarity threshold of the current classification dimension, the document to be processed is classified into the current classification dimension; when the semantic similarity is greater than the similarity threshold of at least two classification dimensions, the document to be processed is classified into the at least two classification dimensions, and the document to be processed is marked as a cross-knowledge point.
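The threshold-based classification of [0087] and [0088] can be sketched as follows. The text does not name the similarity measure or how a classification dimension is represented; this sketch assumes cosine similarity against one vector per dimension and uses toy vectors in place of a word embedding model. The names `cosine_similarity`, `classify_document`, and `EXAMPLE_DIMENSIONS` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_document(tag_vector, dimensions):
    """Assign a document to every classification dimension whose threshold it exceeds.

    dimensions maps a dimension name to (dimension_vector, similarity_threshold).
    Returns (matched dimension names, is_cross_knowledge_point).
    """
    matched = [
        name
        for name, (vector, threshold) in dimensions.items()
        if cosine_similarity(tag_vector, vector) > threshold
    ]
    return matched, len(matched) >= 2


# Toy 3-dimensional "embeddings"; a real system would obtain them from a word embedding model.
EXAMPLE_DIMENSIONS = {
    "energy storage": ([1.0, 0.0, 0.0], 0.5),
    "power grid": ([0.0, 1.0, 0.0], 0.7),
    "materials": ([0.0, 0.0, 1.0], 0.5),
}
```

A tag vector of (3, 4, 0) has similarities 0.6 and 0.8 to the first two dimensions, exceeds both thresholds, and is therefore classified into both and flagged as a cross-knowledge point.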

[0089] Based on the above technical solution, the candidate innovative concept pair determination module is used to perform entity recognition on the classified document set to be processed to determine at least two core concepts corresponding to it; determine the relationships between the at least two core concepts and the importance of the core concepts; and construct the local knowledge graph based on the at least two core concepts, the relationships between them, and the importance of the core concepts, wherein the nodes in the local knowledge graph are the core concepts and the edges are the relationships between the core concepts.

[0090] Based on the above technical solution, the candidate innovation concept pair determination module is used to determine initial innovation concept pairs from the local knowledge graph according to a maximum distance threshold and to determine the confidence of the path connection relationship corresponding to each initial innovation concept pair; if the confidence of the path connection relationship corresponding to an initial innovation concept pair is greater than a preset confidence threshold and the initial innovation concept pair does not match the exclusion list, the initial innovation concept pair is used as a candidate innovation concept pair.

[0091] Based on the above technical solution, the target innovation point determination module is used to sort each innovation connection point in the innovation connection point set according to the innovation potential score to obtain a sorting result; determine the set of innovation points to be applied from the innovation connection point set according to the sorting result, and determine the target innovation point from the set of innovation points to be applied according to the innovation potential score.

[0092] Based on the above technical solution, the target innovation point determination module is used to determine the individual evaluation score corresponding to each innovation evaluation dimension of the set of innovation points to be applied, wherein the innovation evaluation dimensions include novelty dimension, feasibility dimension and impact dimension; and to determine the target innovation point from the set of innovation points to be applied according to the individual evaluation score and preset screening rules.

[0093] In the technical solution of this invention, a set of documents to be processed selected by a target user is acquired and classified according to classification dimensions; a local knowledge graph is established based on the classified document set; candidate innovative concept pairs are determined based on the local knowledge graph; the candidate innovative concept pairs are processed with the Spark Flipping Framework to determine a set of innovative connection points; an innovation potential score is determined for each innovative connection point in the set; target innovative points are filtered from the set based on the innovation potential scores; and finally, document mining results corresponding to the document set to be processed are generated based on the target innovative points and fed back to the target user. By establishing a local knowledge graph for the documents to be processed, selecting innovative connection points based on that graph, determining target innovative points from those connection points, and feeding the target innovative points back to the user, the solution automatically mines potential innovative ideas and research directions from unstructured documents.

[0094] The cross-document innovation mining device based on inference-based creative chain provided in the embodiments of the present invention can execute the cross-document innovation mining method based on inference-based creative chain provided in any embodiment of the present invention, and has the corresponding functional modules and beneficial effects of the execution method.

[0095] Example 3

[0096] Figure 3 shows a schematic diagram of an electronic device 10 that can be used to implement embodiments of the present invention. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices (e.g., helmets, glasses, watches), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the invention described and / or claimed herein.

[0097] As shown in Figure 3, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12 or a random access memory (RAM) 13. The memory stores computer programs executable by the at least one processor. The processor 11 can perform various appropriate actions and processes based on a computer program stored in the ROM 12 or loaded from a storage unit 18 into the RAM 13. The RAM 13 can also store various programs and data required for the operation of the electronic device 10. The processor 11, ROM 12, and RAM 13 are interconnected via a bus 14. An input / output (I / O) interface 15 is also connected to the bus 14.

[0098] Multiple components in electronic device 10 are connected to I / O interface 15, including: input unit 16, such as keyboard, mouse, etc.; output unit 17, such as various types of displays, speakers, etc.; storage unit 18, such as disk, optical disk, etc.; and communication unit 19, such as network card, modem, wireless transceiver, etc. Communication unit 19 allows electronic device 10 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.

[0099] Processor 11 can be any of a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. Processor 11 performs the various methods and processes described above, such as the cross-document innovation mining method based on inference-based creative chains.

[0100] In some embodiments, the cross-document innovation mining method based on inference-based creative chains can be implemented as a computer program tangibly contained in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program can be loaded and / or installed on electronic device 10 via ROM 12 and / or communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the cross-document innovation mining method based on inference-based creative chains described above can be performed. Alternatively, in other embodiments, processor 11 can be configured to perform the cross-document innovation mining method based on inference-based creative chains by any other suitable means (e.g., by means of firmware).

[0101] The various embodiments of the techniques described above can be implemented in digital electronic circuits, integrated circuits, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementation in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor capable of receiving data and instructions from a memory, at least one input device, and at least one output device, and of transmitting data and instructions to the memory, the at least one input device, and the at least one output device.

[0102] Computer programs used to implement the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, such that, when executed by the processor, the computer programs cause the functions / operations specified in the flowcharts and / or block diagrams to be performed. The computer programs may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0103] In the context of this invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination thereof. Alternatively, the computer-readable medium may be a machine-readable signal medium. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.

[0104] To provide interaction with a user, the techniques described herein can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the electronic device. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0105] The technologies described herein can be implemented in a computing system that includes backend components (e.g., a data server), middleware components (e.g., an application server), or frontend components (e.g., a user computer with a graphical user interface or web browser through which a user can interact with embodiments of the technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.

[0106] A computing system can include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a hosting product in the cloud computing service system that addresses the shortcomings of traditional physical hosts and VPS services, such as difficult management and weak business scalability.

[0107] It should be understood that the various forms of processes shown above can be used, with steps reordered, added, or deleted. For example, the steps described in this invention can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution of this invention can be achieved, and this is not limited herein.

[0108] The specific embodiments described above do not constitute a limitation on the scope of protection of this invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this invention should be included within the scope of protection of this invention.

Claims

1. A cross-document innovation mining method based on reasoning-based creative chains, characterized in that it comprises: Obtain the set of documents to be processed selected by the target user, and classify the set of documents to be processed according to the classification dimension; A local knowledge graph is built based on the classified document set to be processed, and candidate innovative concept pairs are determined according to the local knowledge graph, wherein the two concepts of a candidate innovative concept pair are two nodes that are not directly connected in the local knowledge graph; A multi-step logical reasoning algorithm is used to determine the logical link corresponding to the candidate innovative concept pair, and the core hypothesis corresponding to the candidate innovative concept pair is determined based on the logical link; Based on the flip question template and the core hypothesis, flip prompts are determined, and the innovative connection point set is determined by having the large language model perform flip processing according to the flip prompts; Determine the innovation potential score corresponding to each innovation connection point in the innovation connection point set, and select target innovation points from the innovation connection point set based on the innovation potential score; Based on the target innovation points, document mining results corresponding to the document set to be processed are generated, and the document mining results are fed back to the target user.

2. The method according to claim 1, characterized in that classifying the set of documents to be processed according to the classification dimensions comprises: determining the document tags corresponding to each document in the set of documents to be processed, and using a word embedding model to determine the tag feature vector of the document tags; and determining the semantic similarity between the tag feature vector and each classification dimension, and classifying the document to be processed into the corresponding classification dimension based on the semantic similarity.
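The similarity computation of claim 2 can be illustrated with cosine similarity, a common choice for comparing embedding vectors; the claim itself does not name the similarity measure. In this sketch the word embedding model is replaced by a hypothetical lookup table (`TOY_EMBEDDINGS`), classification dimensions are assumed to be embedded with the same model as the tags, and a document with several tags is assumed to use the mean of its tag vectors.

```python
import math

# Hypothetical stand-in for a word embedding model: a fixed lookup table.
TOY_EMBEDDINGS = {
    "neural network": [0.9, 0.1, 0.0],
    "deep learning": [0.8, 0.2, 0.1],
    "protein folding": [0.1, 0.9, 0.2],
    "artificial intelligence": [0.85, 0.15, 0.05],
    "biology": [0.05, 0.95, 0.1],
}


def embed(text: str) -> list[float]:
    """Return the embedding of a tag or dimension name (toy lookup)."""
    return TOY_EMBEDDINGS[text]


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Average several tag vectors into one document-level tag feature vector."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def dimension_similarities(tags: list[str], dimensions: list[str]) -> dict[str, float]:
    """Semantic similarity between a document's tag feature vector and each dimension."""
    tag_vec = mean_vector([embed(t) for t in tags])
    return {d: cosine(tag_vec, embed(d)) for d in dimensions}
```

For a document tagged "neural network" and "deep learning", the similarity to "artificial intelligence" is far higher than to "biology"; how those scores turn into dimension assignments is the subject of claim 3.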

3. The method according to claim 2, characterized in that classifying the document to be processed into the corresponding classification dimension based on the semantic similarity comprises: obtaining a similarity threshold corresponding to each classification dimension; if the semantic similarity is greater than the similarity threshold of the current classification dimension, classifying the document to be processed into the current classification dimension; and if the semantic similarity is greater than the similarity thresholds of at least two classification dimensions, classifying the document currently being processed into each of the at least two classification dimensions and labeling it as a cross-knowledge point.
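The per-dimension threshold rule of claim 3 amounts to a filter followed by a count. This sketch assumes the similarities have already been computed (one score per dimension) and that thresholds are supplied as a dictionary; the function name `classify` is illustrative only.

```python
def classify(similarities: dict[str, float],
             thresholds: dict[str, float]) -> tuple[list[str], bool]:
    """Assign a document to every dimension whose threshold its similarity exceeds.

    Returns the matched dimensions and whether the document is a cross-knowledge
    point, i.e. whether it exceeded the thresholds of at least two dimensions.
    """
    matched = [dim for dim, sim in similarities.items() if sim > thresholds[dim]]
    return matched, len(matched) >= 2
```

For example, similarities `{"AI": 0.8, "Bio": 0.7, "Phys": 0.2}` against thresholds `{"AI": 0.6, "Bio": 0.65, "Phys": 0.5}` place the document in both "AI" and "Bio" and flag it as a cross-knowledge point. The claim does not say what happens when no threshold is exceeded; here the document simply receives an empty list.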

4. The method according to claim 1, characterized in that building the local knowledge graph based on the classified set of documents to be processed comprises: performing entity recognition on the classified set of documents to be processed to determine at least two core concepts corresponding to it; determining the relationships between the at least two core concepts and the importance of each core concept; and constructing the local knowledge graph based on the at least two core concepts, the relationships between them, and the importance of the core concepts, wherein the nodes of the local knowledge graph are the core concepts and the edges are the relationships between the core concepts.
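A small data structure makes the node/edge layout of claim 4 concrete. This is a sketch under assumptions: entity recognition and relation extraction are taken as already done (their outputs are the inputs here), edges are stored undirected, and "importance" is assumed to be the fraction of documents mentioning a concept, since the claim does not define how importance is measured.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LocalKnowledgeGraph:
    """Nodes are core concepts (with an importance score); edges are relations."""
    importance: dict[str, float] = field(default_factory=dict)
    edges: dict[tuple[str, str], str] = field(default_factory=dict)

    def neighbors(self, node: str) -> set[str]:
        """Concepts directly connected to `node` by some relation."""
        return {b if a == node else a for (a, b) in self.edges if node in (a, b)}


def build_graph(doc_concepts: list[list[str]],
                relations: list[tuple[str, str, str]]) -> LocalKnowledgeGraph:
    """Build the graph from per-document core concepts and (head, relation, tail) triples.

    Importance is assumed here to be the fraction of documents mentioning a concept.
    """
    counts = Counter(c for concepts in doc_concepts for c in set(concepts))
    graph = LocalKnowledgeGraph()
    graph.importance = {c: n / len(doc_concepts) for c, n in counts.items()}
    for head, rel, tail in relations:
        # Store edges undirected by sorting the endpoint pair.
        graph.edges[tuple(sorted((head, tail)))] = rel
    return graph
```

The `neighbors` helper is what later steps need in order to recognise pairs of nodes that are *not* directly connected.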

5. The method according to claim 1, characterized in that determining candidate innovative concept pairs according to the local knowledge graph comprises: determining initial innovative concept pairs from the local knowledge graph based on a maximum distance threshold, and determining the confidence of the path connection relationship corresponding to each initial innovative concept pair; and if the confidence of the path connection relationship corresponding to an initial innovative concept pair is greater than a preset confidence threshold and the initial innovative concept pair does not match the exclusion list, taking that initial innovative concept pair as a candidate innovative concept pair.
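The filtering in claims 1 and 5 (not directly connected, within the maximum distance, confident enough, not excluded) can be sketched as a bounded path search. The claim does not define path confidence; this sketch assumes each edge carries a confidence in [0, 1] and that a pair's confidence is the highest product of edge confidences over simple paths of at most `max_distance` hops. The exclusion list is likewise assumed to hold concept pairs.

```python
from itertools import combinations

Graph = dict[str, dict[str, float]]  # node -> {neighbor: edge confidence in [0, 1]}


def best_path_confidence(graph: Graph, src: str, dst: str, max_hops: int) -> float:
    """Highest product of edge confidences over simple paths of at most max_hops edges."""
    best = 0.0
    stack = [(src, 1.0, {src})]  # (current node, confidence so far, visited nodes)
    while stack:
        node, conf, seen = stack.pop()
        if node == dst:
            best = max(best, conf)
            continue
        if len(seen) - 1 >= max_hops:  # hops taken so far
            continue
        for nxt, w in graph[node].items():
            if nxt not in seen:
                stack.append((nxt, conf * w, seen | {nxt}))
    return best


def candidate_pairs(graph: Graph, max_distance: int, conf_threshold: float,
                    exclusion: set[frozenset[str]]) -> list[tuple[str, str, float]]:
    """Non-adjacent concept pairs reachable within max_distance with enough confidence."""
    result = []
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # directly connected: not a candidate by definition
        conf = best_path_confidence(graph, a, b, max_distance)
        if conf > conf_threshold and frozenset((a, b)) not in exclusion:
            result.append((a, b, conf))
    return result
```

On the chain A-B (0.9), B-C (0.8), C-D (0.5) with `max_distance=2` and `conf_threshold=0.5`, only (A, C) survives: (A, D) is three hops away and (B, D) reaches only 0.4. The exhaustive search is adequate for a small, document-local graph but would need pruning on larger ones.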

6. The method according to claim 1, wherein selecting target innovation points from the innovation connection point set based on the innovation potential scores comprises: sorting the innovation connection points in the innovation connection point set by innovation potential score to obtain a ranking result; and determining, based on the ranking result, a set of innovation points to be applied from the innovation connection point set, and determining target innovation points from the set of innovation points to be applied based on the innovation potential scores.
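The two-stage selection of claim 6 (rank, shortlist, then select) can be sketched as follows. The shortlist size `k` and the minimum score `min_score` used in the second stage are hypothetical parameters; the claim only says that both stages depend on the innovation potential score.

```python
def select_targets(points: dict[str, float], k: int, min_score: float) -> list[str]:
    """Rank connection points by innovation potential score and pick target points.

    Stage 1: sort descending and keep the top k as the set of points to be applied.
    Stage 2 (hypothetical rule): keep those whose score reaches min_score.
    """
    ranked = sorted(points.items(), key=lambda item: item[1], reverse=True)
    to_apply = ranked[:k]
    return [name for name, score in to_apply if score >= min_score]
```

With scores `{"p1": 0.4, "p2": 0.9, "p3": 0.7, "p4": 0.8}`, `k=3` and `min_score=0.75`, the shortlist is p2, p4, p3 and the targets are `["p2", "p4"]`. Python's sort is stable, so tied scores keep their input order.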

7. The method according to claim 6, characterized in that determining the target innovation points from the set of innovation points to be applied based on the innovation potential scores comprises: determining, for each innovation point to be applied, an individual evaluation score in each innovation evaluation dimension, wherein the innovation evaluation dimensions include a novelty dimension, a feasibility dimension, and an impact dimension; and determining the target innovation points from the set of innovation points to be applied based on the individual evaluation scores and preset screening rules.
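Claim 7 leaves the "preset screening rules" unspecified. One plausible rule is sketched below purely as an assumption: discard any point that falls below a floor on some dimension, then choose the survivor with the highest weighted sum. The floor of 0.5 and the weights (0.4, 0.3, 0.3) are hypothetical defaults, not values from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DimensionScores:
    novelty: float
    feasibility: float
    impact: float


def pick_target(candidates: dict[str, DimensionScores],
                floor: float = 0.5,
                weights: tuple[float, float, float] = (0.4, 0.3, 0.3)) -> Optional[str]:
    """Hypothetical screening rule: drop any point below `floor` on some dimension,
    then return the survivor with the highest weighted score (None if none survive)."""
    wn, wf, wi = weights
    survivors = {
        name: wn * s.novelty + wf * s.feasibility + wi * s.impact
        for name, s in candidates.items()
        if min(s.novelty, s.feasibility, s.impact) >= floor
    }
    if not survivors:
        return None
    return max(survivors, key=survivors.get)
```

A per-dimension floor prevents a highly novel but infeasible idea from winning on its weighted total alone, which is one reason to screen before aggregating.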

8. A cross-document innovation mining device based on reasoning-based creative chains, characterized in that it comprises: a document classification module, configured to obtain the set of documents to be processed selected by the target user and to classify the set of documents to be processed according to the classification dimensions; a candidate innovative concept pair determination module, configured to build a local knowledge graph based on the classified set of documents to be processed and to determine candidate innovative concept pairs according to the local knowledge graph, wherein each candidate innovative concept pair consists of two nodes that are not directly connected in the local knowledge graph; a Spark Inference module, configured to apply a multi-step logical reasoning algorithm to determine the logical link corresponding to each candidate innovative concept pair, and to determine the core hypothesis corresponding to the candidate innovative concept pair based on the logical link; a flipping module, configured to determine a flip prompt based on the flip question template and the core hypothesis, and to determine the innovation connection point set by having the large language model perform flip processing according to the flip prompt; a target innovation point determination module, configured to determine the innovation potential score corresponding to each innovation connection point in the innovation connection point set, and to select target innovation points from the innovation connection point set based on the innovation potential scores; and a document mining result feedback module, configured to generate document mining results corresponding to the set of documents to be processed based on the target innovation points, and to feed back the document mining results to the target user.
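The six modules of claim 8 form a linear pipeline, each consuming the previous module's output. The sketch below shows only that wiring: every module is an injected callable, and the field names and loose `Any`/`list` types are hypothetical placeholders rather than interfaces disclosed in the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class InnovationMiningDevice:
    """Chains the six modules of the device; each module is an injected callable."""
    classify_documents: Callable[[list[str]], Any]
    determine_candidate_pairs: Callable[[Any], list]
    spark_inference: Callable[[list], list]
    flip: Callable[[list], list]
    select_targets: Callable[[list], list]
    feed_back: Callable[[list], Any]

    def run(self, documents: list[str]) -> Any:
        classified = self.classify_documents(documents)
        pairs = self.determine_candidate_pairs(classified)  # from the local knowledge graph
        hypotheses = self.spark_inference(pairs)  # logical links and core hypotheses
        connection_points = self.flip(hypotheses)  # LLM flip processing
        targets = self.select_targets(connection_points)  # scored and screened
        return self.feed_back(targets)  # document mining results for the user
```

Injecting each module keeps them independently replaceable, for instance swapping the embedding model behind classification or the LLM behind flipping without touching the rest of the chain.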

9. An electronic device, characterized in that the electronic device comprises: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, and the computer program, when executed, enables the at least one processor to perform the cross-document innovation mining method based on reasoning-based creative chains according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions which, when executed, cause a processor to perform the cross-document innovation mining method based on reasoning-based creative chains according to any one of claims 1-7.