A method for dynamic tracking of technology associations in the scientific field
By constructing a benchmark technology association network and an association-strength time series, and combining semantic similarity with positive and negative samples, the method overcomes the limitations of existing technologies in dynamically tracking science-technology associations and achieves more efficient and accurate dynamic tracking.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: ZHEJIANG UNIV CITY COLLEGE
- Filing Date: 2026-03-25
- Publication Date: 2026-06-26
AI Technical Summary
When tracking science-technology associations, existing technologies struggle to capture, in real time and comprehensively, how different technological entities develop and change within their related and subordinate fields, which reduces the efficiency and accuracy of dynamic tracking.
By constructing a benchmark technology association network, using semantic matching and time factors to build a time series of association strength, and combining positive and negative samples and semantic similarity, the update range of technology entities is determined, and a dynamic technology association network is generated.
It improves the intuitiveness and accuracy of technology-related evolution, avoids the redundancy of full updates, and ensures the comprehensiveness and interpretability of data tracking.
Figure CN121920358B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of data processing technology, specifically a method for dynamic tracking of technology associations in the scientific field.
Background Technology
[0002] Technological association tracing in the scientific field explores the coupling relationship between scientific and technological knowledge and reveals how association analysis evolves dynamically across different technological entities. Conventionally, technological associations are processed through methods such as paper citations, entry mapping, role association, and keyword distribution. However, these approaches usually yield a unidirectional, static trace under retrieval analysis and do not consider how different technological entities are updated within their associated and subordinate domains. As a result, data association tracing is limited to a single evolutionary pattern, which reduces the efficiency and accuracy of dynamic tracing.
[0003] For example, Chinese Patent Publication No. CN117633253A discloses a method for detecting science-technology associations based on multidimensional coupling of knowledge networks. The steps are as follows: select the target scientific and technological innovation research field, collect scientific literature and patent technology literature for retrieval, perform data cleaning operations, extract core scientific and technological knowledge elements using word / phrase embedding method under multi-grammatical framework, construct a scientific and technological knowledge network representation based on the co-occurrence relationship of core knowledge elements, divide the scientific and technological knowledge network year by year, calculate the coupling strength, coupling delay, coupling depth and coupling synergy index between scientific and technological knowledge networks, and provide an index calculation scheme for the coupling measure of scientific and technological knowledge networks.
[0004] For example, Chinese Patent Publication No. CN119739848A discloses a method and system for searching related technical texts based on semantic recognition, which relates to the field of patent novelty search technology. The method includes: obtaining at least one keyword for the technology to be searched based on the technical content to be searched; obtaining the key index of each keyword in the text of the technical content to be searched; obtaining the set of existing related technical content corresponding to each keyword and finding its union to obtain the total set of related technical content for the technical content to be searched; calculating the association index between each existing related technology in the total set of related technical content and the technical content to be searched based on a technology association algorithm; and outputting the related technical text content based on the association index with the technical content to be searched.
[0005] In existing technologies, time-delay coupling techniques based on science-technology networks are commonly used to determine the knowledge network topology from co-occurrence relationships, or novelty searches are performed on known technologies using a set of cited technologies. However, these methods have significant limitations when dealing with dynamic technological evolution: their update mechanisms depend heavily on predefined citation scenarios and fixed data mapping rules, amounting to a coupling method based on a single semantic match. This makes it difficult to capture, in real time and comprehensively, how different technological entities develop and change in their related and subordinate domains. Dynamic tracking of technological knowledge therefore tends to emphasize semantic relevance rather than the intersection and extension of each technological entity's domains, so each entity evolves along a single strongly correlated path. The resulting data association tracking is limited and one-dimensional, which reduces the accuracy of technology situation analysis and the timeliness of decision support.
Summary of the Invention
[0006] To solve the above technical problems, the present invention adopts the following technical solution: a method for dynamic tracking of technology associations in the scientific field, comprising: extracting at least one technical entity from the text information of a target scientific field.
[0007] The acquired technical entities are semantically matched to determine the relationships between them. Based on these relationships, a baseline technical association network is constructed, with technical entities as nodes and the relationships between them as edges.
[0008] Using the update of any technical entity in the benchmark technical association network as the trigger condition, and combining the time factors corresponding to different technical entities, a time series of association strength containing positive and negative samples is constructed.
[0009] Based on the trend of association strength at any time, and combined with the semantic similarity of the current technical entity, the update range at different times is determined.
[0010] The update ranges at different times are synchronized to the benchmark technology association network, and the dynamic technology association network corresponding to the current technology entity is output according to the intersection of the update ranges.
[0011] The beneficial effects of this invention are as follows: First, this invention extracts technical entities from a keyword set by statistically analyzing keyword frequencies and combining mutual information between words; then, it uses semantic matching and prior association to determine the association relationship between technical entities, and calculates the error confidence level through association error and semantic error. The error confidence level is used to verify the prior association, and if the conditions are not met, semantic similarity is used as the association relationship; this ensures the construction of the association relationship between the current technical entity and its neighboring entities, providing a data foundation for subsequent dynamic tracking.
[0012] Second, this invention constructs a time series of association strength containing positive and negative samples by using the update of technical entities as the trigger condition and combining it with time factors; it generates anchor points with timestamps and determines the scope of the update's impact, obtains a representation vector by bidirectionally supplementing text and topological attributes, determines the association strength after the update by comparing positive and negative samples, and combines the time interval and the intensity change value to form a time series; it adds a time series factor to the update content between the current technical entity and adjacent technical entities, and uses positive and negative samples to make up for the one-sidedness of single attribute processing, laying the foundation for subsequent trend analysis and scope definition.
[0013] Third, this invention generates smoothed data segments by dividing the data into periods, calculates the differences between adjacent periods to obtain positive and negative difference results, divides and integrates upward and downward trends based on the number of consecutive positive and negative differences, and integrates non-trend data according to average strength, ultimately obtaining the update ranges corresponding to the trend and non-trend integration results. At the same time, semantic constraints are added to the trend integration: the data flow direction corresponding to the positive and negative sample sets is used to verify the relationship between data flow direction and technical entities. This ensures comprehensive coverage of trend data, improves the interpretability of each part of the data when defining the update range, and provides an update basis for the benchmark technology association network in multiple scenarios. Finally, using the intersection of the update ranges, the labeled data items in the strong-correlation and secondary-correlation intersections are concatenated in chronological order, so that the current technology update is tracked dynamically from stable changes to optional changes. This avoids the redundancy of full updates of technology knowledge and makes the evolution of technology associations more intuitive.
Attached Figure Description
[0014] The present invention will be further described below with reference to the accompanying drawings and embodiments.
[0015] Figure 1 is a flowchart of the method for dynamic tracking of technology associations in the scientific field.
[0016] Figure 2 is a flowchart of step S2 of the method.
[0017] Figure 3 is a flowchart of step S3 of the method.
[0018] Figure 4 is a flowchart of step S4 of the method.
Detailed Implementation
[0019] The embodiments of the present invention are described in detail below. The embodiments described below are exemplary and are only used to explain the present invention, and should not be construed as limiting the present invention. Where specific techniques or conditions are not specified in the embodiments, they shall be performed in accordance with the techniques or conditions described in the literature in the art or in accordance with the product manual.
[0020] Referring to Figure 1, a method for dynamic tracking of technology associations in the scientific field includes: S1, extracting at least one technical entity from the text information of the target scientific field.
[0021] S2. Semantic matching is performed on the acquired technical entities to determine the relationships between them. Based on the relationships between the technical entities, a benchmark technical association network is constructed with the technical entities as nodes and the relationships between them as edges.
[0022] S3. Using the update of any technical entity in the benchmark technical association network as the trigger condition, and combining the time factors corresponding to different technical entities, a time series of association strength containing positive and negative samples is constructed.
[0023] S4. Based on the trend of association strength at any time, combined with the semantic similarity of the current technical entity, determine the update range at different times.
[0024] S5. The update ranges at different times are synchronized to the benchmark technology association network, and the dynamic technology association network corresponding to the current technology entity is output according to the intersection of the update ranges.
[0025] The extracted technical entities are defined as keywords of the corresponding technical field, i.e., specific terms such as quantum computing, gene editing, and lithium battery materials. These keywords are divided into individual technical entities according to the title of the document or project in which each term appears, its research objectives, and its start and end content, so as to describe the relative development of different technologies.
[0026] The relationships between technical entities cover the semantic relationships between the corresponding keywords as well as the reference, subordination, and co-occurrence relationships between the corresponding technologies. Associating the technical entities in this way forms a relatively clear semantic relationship network, which can serve as the knowledge graph for the current scenario.
[0027] When extracting at least one technical entity from the text information in step S1, the implementation further includes: segmenting the text information of the target scientific field with word segmentation technology to obtain a keyword set of the target scientific field. Segmentation follows the professional vocabulary dictionary of the corresponding scientific field and splits the text into adjectives, verbs, and nouns.
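The text does not name a segmentation algorithm. As one hedged illustration of dictionary-driven segmentation, the sketch below applies greedy forward maximum matching over word tokens, so that multi-word terms from a domain dictionary (assumed here to be a plain set of space-joined strings) stay together. Part-of-speech filtering to adjectives, verbs, and nouns is not shown, and the function name `forward_max_match` and its `max_len` window are hypothetical.

```python
def forward_max_match(tokens, dictionary, max_len=4):
    """Greedy forward maximum matching of multi-token dictionary terms.

    tokens: list of lower-cased word tokens.
    dictionary: set of domain terms, each stored as a space-joined string.
    max_len: longest term length (in tokens) to try; hypothetical default.
    """
    result = []
    i = 0
    while i < len(tokens):
        # Try the longest window first and shrink it until a dictionary hit,
        # falling back to a single token when nothing longer matches.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate in dictionary:
                result.append(candidate)
                i += n
                break
    return result
```

For example, with the dictionary `{"convolutional neural network", "image recognition"}`, the tokens of "convolutional neural network for image recognition" segment into `["convolutional neural network", "for", "image recognition"]`.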
[0028] The frequency of each word in the keyword set is counted, and the corresponding technical entities are extracted from the keyword set based on the mutual information between words.
[0029] Mutual information measures the degree of association between two words. Here it measures how often two words appear together, and words that tend to co-occur are combined into a technical entity. For example, if the mutual information between "convolution" and "neural network" is high, the two can be combined and output as the technical entity "convolutional neural network", so that multi-word technical entities in the literature of the target scientific field are captured.
[0030] When calculating mutual information, the frequency of each word is used as its marginal probability and the frequency with which two words appear together is used as their joint probability, giving the mutual information between any two words in the current scenario. A mutual information value greater than 0 indicates that the words are related; a value less than or equal to 0 means they are treated as unrelated, i.e., independent of each other. The professional vocabulary dictionary of the current scientific field is then used, and the average mutual information between strongly related words serves as the benchmark for finding technical entities that can be combined.
[0031] When word frequency is used to extract technical entities, a word frequency threshold for the current technical field is applied first. For example, words with a frequency ≥ 0.001 are treated as keywords that meet the frequency threshold in the current technical domain. The mutual information of these keywords is then calculated, and word combinations whose mutual information exceeds the average value for strongly related words are selected as the main technical entities to be processed. The threshold can be adjusted according to the total amount of text data in the target domain, and other values can be chosen to suit the scenario.
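Because values ≤ 0 are treated as unrelated, the quantity described behaves like pointwise mutual information (PMI), log p(a, b) / (p(a) p(b)), which can be negative, unlike expected mutual information. The sketch below is a minimal reading of paragraphs [0028] to [0031] under stated assumptions: each document (or sentence window) is one co-occurrence unit, probabilities are document-level, the 0.001 threshold is applied to relative token frequency, and "strongly related words" are taken to be positive-PMI pairs whose words are both in the domain dictionary. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import mean


def pmi_table(documents, freq_threshold=0.001):
    """Pointwise mutual information between keyword pairs that co-occur.

    documents: list of token lists; each list is one co-occurrence window.
    Probabilities are document-level: p(w) = df(w) / N, p(a, b) = df(a, b) / N.
    """
    token_counts = Counter(t for doc in documents for t in doc)
    total_tokens = sum(token_counts.values())
    # Keep only words whose relative frequency meets the threshold.
    keywords = {w for w, c in token_counts.items() if c / total_tokens >= freq_threshold}
    n_docs = len(documents)
    doc_freq = Counter()
    pair_freq = Counter()
    for doc in documents:
        present = sorted(set(doc) & keywords)
        doc_freq.update(present)
        pair_freq.update(combinations(present, 2))
    pmi = {}
    for (a, b), c in pair_freq.items():
        p_ab = c / n_docs
        p_a, p_b = doc_freq[a] / n_docs, doc_freq[b] / n_docs
        pmi[(a, b)] = math.log(p_ab / (p_a * p_b))
    return pmi


def select_entities(pmi, domain_dictionary):
    """Keep pairs whose PMI exceeds the mean PMI of related (PMI > 0) pairs
    made up entirely of domain-dictionary words (the assumed benchmark)."""
    strong = [v for (a, b), v in pmi.items()
              if v > 0 and a in domain_dictionary and b in domain_dictionary]
    if not strong:
        return []
    benchmark = mean(strong)
    return sorted(pair for pair, v in pmi.items() if v > benchmark)
```

Pairs that never co-occur are simply absent from the table, since their PMI would be negative infinity.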
[0032] As shown in Figure 2, in step S2 the semantic associations between different technical entities are analyzed based on the content corresponding to each entity, the entities are associated with one another in the form of word vectors, and the resulting association relationships form the benchmark technical association network.
[0033] The method for determining the relationship between technical entities in step S2 includes: S21, extracting the prior association and semantic information of the technical entities, and calculating the association error and semantic error between the prior association and the actual association, respectively.
[0034] S22, using the association error and semantic error to obtain the error confidence, and using the error confidence to correct the prior association and determine the association relationships between technical entities.
[0035] Prior associations will include reference associations, subordinate associations, and co-occurrence associations. These are the initial associations of the current technical entities, which can be obtained directly without the need for word segmentation, mutual information, or other calculations. Based on these associations, the changes of technical entities under dynamic tracking can be verified.
[0036] Prior associations are based on the reference, subordinate, and co-occurrence associations between any two technical entities, which set initial feature values between the entities. The feature values of the actual association are set from the normalized mutual information values, and the deviation between the two gives the association error of the prior association.
[0037] The semantic error is determined from the semantic similarity between any two technical entities: the cosine similarity of their word vectors is calculated, and the semantic similarity is subtracted from 1 to obtain the semantic error of the technical entity.
[0038] The error confidence is derived from the association error and the semantic error in the same way as a rule confidence: it is the ratio of the number of data items that have both the current association error and the semantic error to the number of data items that have the current association error. It therefore indicates the relative deviation between the current semantics and the current association.
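A minimal sketch of the error confidence for one technical entity. The tolerances that decide when a pair "has" an association error or a semantic error (`assoc_tol`, `sem_tol`) are hypothetical parameters the text does not specify; the association error is taken here as the absolute difference between the prior feature value and the normalized mutual information, and the semantic error as 1 minus the cosine similarity of the two word vectors, so that it is non-negative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def error_confidence(pairs, assoc_tol=0.2, sem_tol=0.5):
    """pairs: list of (prior_value, normalized_mi, vec_a, vec_b) for one entity.

    Returns count(association error AND semantic error) / count(association error),
    i.e. a rule-style confidence, or 0.0 when no association error occurs.
    """
    assoc_hits = 0
    both_hits = 0
    for prior, nmi, vec_a, vec_b in pairs:
        assoc_err = abs(prior - nmi)
        sem_err = 1.0 - cosine(vec_a, vec_b)
        if assoc_err > assoc_tol:
            assoc_hits += 1
            if sem_err > sem_tol:
                both_hits += 1
    return both_hits / assoc_hits if assoc_hits else 0.0
```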
[0039] When correcting prior associations with the error confidence, the implementation further includes verifying the prior associations of the current technical entity against the value range of the error confidence: the error confidence is checked first, and the prior associations are then verified one by one. The larger the error confidence, the higher the priority given to retaining the associations of the prior relationship.
[0040] When the prior association is a reference association, the number of references and the error confidence level of the current technical entity are used as the judgment conditions to determine the prior association that meets the requirements. The reference association represents a one-way association from the referrer to the referenced party. It is necessary to determine whether the current reference relationship is valid according to the reference situation and error confidence level of different technical entities in the tracking process.
[0041] The error confidences of all technical entities are sorted in ascending order, and the 75th percentile is used as the confidence threshold. A technical entity whose error confidence exceeds this threshold has prior associations that fit the current scenario, and only its citation count then needs to be checked: if the citation count is greater than the minimum citation count, the judgment criterion is met. The minimum citation count can be set to the typical minimum number of citations between technical entities, for example 3, and adjusted according to the number of documents in the technical field.
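The threshold rule for reference associations can be sketched with the standard library: `statistics.quantiles(..., n=4, method="inclusive")` returns the three quartile cut points with linear interpolation, and its third element is the 75th percentile. The strict comparisons mirror the "greater than" wording, and the default of 3 citations is the example value from the text; the function names are hypothetical.

```python
import statistics


def confidence_threshold(confidences):
    """75th percentile of all entities' error confidences (needs >= 2 values)."""
    return statistics.quantiles(confidences, n=4, method="inclusive")[2]


def reference_association_holds(confidence, threshold, citations, min_citations=3):
    """A reference association is kept only if both checks pass."""
    return confidence > threshold and citations > min_citations
```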
[0042] When the prior association is a subordinate association, the semantic similarity and error confidence of the current technical entities are used as the judgment conditions. A subordinate association usually represents an inclusion relationship, such as that between convolutional neural networks and deep learning. It is necessary to verify whether the inclusion relationship between the technical entities is consistent with what the prior association states, and then to describe that inclusion relationship by its semantic similarity. The average semantic similarity of subordinate associations in historical data can serve as the similarity threshold, and associations that exceed both the semantic similarity threshold and the error confidence threshold are output as reliable subordinate associations.
[0043] When the prior association is a co-occurrence association, the co-occurrence frequency and error confidence of the current technical entity are used as the judgment conditions. A co-occurrence association represents how often two technical entities appear together. Here, the average co-occurrence frequency between technical entities in historical data is used as the benchmark, and associations that exceed both this benchmark and the error confidence threshold are output as relatively stable association relationships.
[0044] Associations that satisfy their prior-association judgment conditions are output as the association relationships. When a prior association does not meet its judgment conditions, the semantic similarity of the current technical entity is used as the output association instead.
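The three judgment rules and the semantic fallback can be combined into one dispatch. This is a sketch only: the `PriorAssociation` record, the field names, and the default thresholds (`sim_threshold`, `mean_cooccurrence`) are hypothetical stand-ins for the historical averages the text describes, and `conf_threshold` would be the 75th-percentile value from [0041].

```python
from dataclasses import dataclass


@dataclass
class PriorAssociation:
    kind: str                  # "reference", "subordinate" or "co-occurrence"
    confidence: float          # error confidence of the current entity
    citations: int = 0         # used by reference associations
    similarity: float = 0.0    # semantic similarity of the entity pair
    cooccurrence: float = 0.0  # co-occurrence frequency of the entity pair


def resolve_association(assoc, conf_threshold, min_citations=3,
                        sim_threshold=0.5, mean_cooccurrence=1.0):
    """Return ("prior", kind) if the prior association passes its check,
    otherwise ("semantic", similarity) as the fallback association."""
    if assoc.confidence > conf_threshold:
        if assoc.kind == "reference" and assoc.citations > min_citations:
            return ("prior", assoc.kind)
        if assoc.kind == "subordinate" and assoc.similarity > sim_threshold:
            return ("prior", assoc.kind)
        if assoc.kind == "co-occurrence" and assoc.cooccurrence > mean_cooccurrence:
            return ("prior", assoc.kind)
    return ("semantic", assoc.similarity)
```

A semantic fallback value would then still pass through the confidence-interval filter of [0046] before an edge is created.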
[0045] For relatively unstable connections between technical entities, the connections between different technical entities will be constructed directly based on the similarity of semantic information, thus presenting a relationship network under the iteration of technology tracking.
[0046] When semantic similarity is used as the association, a confidence interval is applied. The semantic similarities in historical data are retrieved and a confidence interval of the mean ± 3 standard deviations is set. Technical entities whose similarity exceeds the lower limit of this interval are connected; otherwise they are left unconnected, which prevents unrelated technical entities from being linked in the benchmark technical association network.
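A minimal sketch of the confidence-interval filter. Only the lower limit matters for the connect/skip decision; using the sample standard deviation (`statistics.stdev`) rather than the population one is an assumption, and the function name is hypothetical.

```python
from statistics import mean, stdev


def semantic_links(history, candidates, k=3.0):
    """Keep candidate pairs whose similarity exceeds mean - k * std of history.

    history: past semantic similarities (at least two values).
    candidates: {(entity_a, entity_b): similarity}.
    """
    lower = mean(history) - k * stdev(history)
    return {pair: s for pair, s in candidates.items() if s > lower}
```

With a history of `[0.5, 0.6, 0.7]` the lower limit is about 0.3, so a pair at similarity 0.35 is connected and one at 0.2 is not.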
[0047] In one embodiment of the present invention, a time factor is introduced in step S3 so that the change in association strength under each technological iteration is determined as a function of time.
[0048] As shown in Figure 3, the implementation of constructing the association strength time series containing positive and negative samples in step S3 includes: S31, using the set of text attribute features recorded when a technical entity is updated as the benchmark, generating timestamped anchor points, and determining the influence range of the update by comparing the text attribute similarity between the current anchor point and the historical anchor points of adjacent entities. Here, positive samples are associated entity pairs with high semantic similarity to the currently updated technical entity, and negative samples are associated entity pairs with low semantic similarity.
[0049] The text attribute feature set contains descriptions of the updated content, such as keywords, technical fields, and application scenarios, to explain the specific content when each technical entity is tracked and updated. Each anchor point is bound to the text attribute feature set according to the timestamp to serve as the basic coordinate on the time series.
[0050] An adjacent entity is an entity directly adjacent to the currently updated entity in the benchmark technology association network; its historical update anchor points are extracted to determine whether the current update propagates to it. A technology entity is considered updated when a new scientific document or patent text related to it is detected.
[0051] Suppose the current technical entity is deep learning, and its updates include keywords such as Transformer architecture, self-attention mechanism, large language model, and long text processing; the adjacent entities include image recognition and natural language processing; the historical updates of these two adjacent entities are medical imaging, feature extraction, convolutional kernels, and text classification, semantic understanding, and recurrent neural networks, respectively. Then, the text attribute similarity between the anchor point and the adjacent anchor points is calculated using cosine similarity. The adjacent entities with high similarity to the current anchor point are selected as the scope of influence, thereby avoiding indiscriminate updates of related content of multiple technical entities, which would lead to data waste.
[0052] When text attribute similarity is used to determine the influence range, 0.6 is used as the similarity threshold and neighboring entities whose similarity exceeds it are selected. Alternatively, the average similarity between the neighboring entities and the current technical entity can be used as the cut-off for identifying the highly similar neighbors.
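A sketch of the influence-range test, assuming each anchor's text attribute feature set has already been turned into a numeric vector (the embedding step is not specified in the text). Scoring a neighbor by its best-matching historical anchor is an assumption, and the 3-dimensional vectors in the usage note are hypothetical toy values standing in for the deep learning example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def influence_range(anchor_vec, neighbor_history, threshold=0.6, use_mean=False):
    """Return the neighbors whose historical anchors resemble the new anchor.

    neighbor_history maps each adjacent entity to a list of vectors for its
    historical anchors; a neighbor's score is its best match (an assumption).
    use_mean=True swaps the fixed threshold for the mean neighbor score.
    """
    scores = {n: max(cosine(anchor_vec, h) for h in hist)
              for n, hist in neighbor_history.items() if hist}
    if not scores:
        return set()
    cut = sum(scores.values()) / len(scores) if use_mean else threshold
    return {n for n, s in scores.items() if s > cut}
```

With an anchor of `[0.9, 0.1, 0.4]` for the Transformer-era deep learning update, a natural language processing history of `[[0.8, 0.2, 0.5]]` scores about 0.98 and an image recognition history of `[[0.1, 0.9, 0.2]]` about 0.28, so only natural language processing falls within the influence range.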
[0053] S32, identify the anchor point sequence corresponding to the same technical entity, and perform bidirectional data supplementation through text attributes and topological attributes to determine the representation vector of the supplemented technical entity.
[0054] After determining the scope of the data update, all anchor points corresponding to the same technical entity are identified to form a time-series anchor point sequence. Then, the missing related data after the update is supplemented in two directions: text attributes and topological attributes. Text attributes are used to supplement the text features of the technical entity, such as cited literature and technical white papers. Topological attributes are used to update the adjacent entities that are related to the current technical entity, as well as the attributes representing the topological structure, such as the position and hierarchy of the current technical entity and its adjacent entities in the baseline technical association network. This eliminates the problem of missing information about the association of technical entities after each data update.
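The text does not say how the supplemented text and topological attributes are combined into one representation vector. One simple possibility, shown here purely as a hypothetical sketch, averages the entity's own text vector with vectors of supplementary documents (the text direction) and appends degree and hierarchy depth scaled to [0, 1] (the topology direction).

```python
def supplement_representation(text_vecs, degree, depth, max_degree, max_depth):
    """Hypothetical bidirectional supplementation by concatenation.

    text_vecs: the entity's own text vector followed by vectors of supplementary
    texts (e.g. cited literature, technical white papers), all the same length.
    degree / depth: the entity's degree and hierarchy level in the network.
    """
    dim = len(text_vecs[0])
    # Text direction: element-wise mean of all available text vectors.
    text_part = [sum(v[i] for v in text_vecs) / len(text_vecs) for i in range(dim)]
    # Topology direction: position features normalized by network maxima.
    topo_part = [degree / max_degree if max_degree else 0.0,
                 depth / max_depth if max_depth else 0.0]
    return text_part + topo_part
```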
[0055] S33. Based on the representation vector of the current technical entity and the influence range during the update, the correlation strength between the current technical entity and its neighboring entities after the update is determined by comparing positive and negative samples.
[0056] Positive samples are anchor points with high similarity to the current technical entity, while negative samples are anchor points with low similarity. At this time, the adjacent entity data in the influence range of step S31 will be introduced as positive samples here, and other adjacent entity data related to the current technical entity will be used as negative samples to determine the changes in their updated association strength.
[0057] When comparing positive and negative samples in step S33, the implementation method also includes: combining the current technical entity with any adjacent entity to obtain multiple entity pairs.
[0058] When the current entity pair belongs to the scope of influence when updating a technical entity, the corresponding entity pair is considered a positive sample, and other entity pairs are considered negative samples.
[0059] Based on the proportion of shared adjacent edges of any entity pair to the total number of adjacent edges of that entity pair, the adjacency density of the entity pair is configured. The association strength of each entity pair is set by weighting the adjacency density and the text attribute similarity. At this point, the adjacency density is used to reflect the proportion of shared neighbors of any two technical entities in the network, characterizing whether the two are closely related, and thus clarifying the focus of technical dynamic updates.
[0060] Adjacency density and text attribute similarity are weighted 0.3 and 0.7, respectively, so that the association strength of each entity pair before and after the update is based mainly on text attribute similarity and supplemented by the pair's topological relationship. To compare positive and negative samples, the association strengths of both samples before and after the update are calculated and compared, and the strengths of both are synchronized to the corresponding technical entities.
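A sketch of the pair labeling and weighted association strength. The network is assumed to be an undirected adjacency map of sets, and "total adjacent edges of the pair" is read here as the union of both entities' neighbors (excluding each other), which makes the adjacency density a Jaccard ratio; that reading, the entity abbreviations, and the function names are assumptions.

```python
def adjacency_density(graph, a, b):
    """Shared neighbors of a and b over all their neighbors (Jaccard reading)."""
    na, nb = graph[a] - {b}, graph[b] - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def association_strengths(graph, entity, text_sim, influence,
                          w_topo=0.3, w_text=0.7):
    """Return {neighbor: (strength, label)} for every pair (entity, neighbor).

    text_sim: {neighbor: text attribute similarity to the updated entity}.
    influence: set of neighbors inside the update's influence range; pairs in
    it are positive samples and all other pairs are negative samples.
    """
    result = {}
    for nb in graph[entity]:
        strength = w_topo * adjacency_density(graph, entity, nb) + w_text * text_sim[nb]
        label = "positive" if nb in influence else "negative"
        result[nb] = (strength, label)
    return result
```

For instance, if deep learning (DL) and natural language processing (NLP) share one of two combined neighbors (density 0.5) and have text similarity 0.9, their strength is 0.3 × 0.5 + 0.7 × 0.9 = 0.78.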
[0061] S34. Based on the time interval of each update and the change value of the association strength, the association strength corresponding to each technical entity is combined into an association strength time series.
[0062] Preferably, when outputting the correlation strength time series, in addition to determining the updated content of the technical entity, it is also necessary to determine the update status of positive and negative samples, so as to complete the processing of the current technical entity and its neighboring entities.
[0063] Therefore, the implementation of the correlation strength time series also includes: comparing the fields updated by the technical entity at any time, performing incremental updates based on the data pointed to by each field, and synchronizing the incrementally updated data to the correlation strength time series.
[0064] After obtaining the corresponding updated technical entities, the corresponding data can be synchronized to the correlation strength time series through incremental update insertion, so as to facilitate subsequent splitting of the time series and outputting the differences of each update.
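A minimal sketch of an association-strength time series with incremental updates for one entity pair. Each point stores the update time, the interval since the previous update, the new strength, and the change in strength, and a point is appended only when some tracked field actually changed. The `StrengthSeries` structure, its field names, and the use of integer time units (e.g. days) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StrengthSeries:
    """Association-strength time series for one entity pair (hypothetical structure)."""
    points: list = field(default_factory=list)       # (time, interval, strength, delta)
    last_fields: dict = field(default_factory=dict)  # latest value of each tracked field

    def update(self, time, fields, strength):
        """Append a point only if some tracked field changed (incremental update)."""
        changed = {k: v for k, v in fields.items()
                   if k not in self.last_fields or self.last_fields[k] != v}
        if not changed:
            return False  # nothing new, so no rewrite of the series
        self.last_fields.update(changed)
        if self.points:
            prev_time, _, prev_strength, _ = self.points[-1]
            interval, delta = time - prev_time, strength - prev_strength
        else:
            interval, delta = 0, 0.0
        self.points.append((time, interval, strength, delta))
        return True
```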
[0065] In one embodiment of the present invention, as shown in Figure 4, the implementation for determining the update range at different times in step S4 includes: S41, setting the period length of each update, segmenting the data by that period length to obtain a smoothed data segment after each update, and using the smoothed data segments to analyze the trend of the association strength.
[0066] Smoothed data segments remove random fluctuations in the association strength at each update by means of moving averages, which highlights the trend after each update. The period length can be one month, with data from three consecutive monthly updates used as the basis for trend analysis of the iterative changes of the currently tracked technology.
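A trailing simple moving average over three periods matches the "three consecutive monthly updates" example. The sketch assumes the association strengths have already been aggregated to one value per period; the function name is hypothetical.

```python
def smooth(strengths, window=3):
    """Trailing simple moving average over per-period association strengths.

    Returns len(strengths) - window + 1 values (empty if the series is too short).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    return [sum(strengths[i:i + window]) / window
            for i in range(len(strengths) - window + 1)]
```

For example, `smooth([3, 6, 9, 12])` gives `[6.0, 9.0]`.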
[0067] S42, for the same technical entity, the correlation strengths of adjacent periodic smoothed data segments are subtracted one by one, and the positive and negative differences are counted. Positive differences represent increases in correlation strength, and negative differences represent decreases. During subtraction, the data of each update within consecutive smoothed data segments is differenced to identify where the correlation strength increases and where it decreases.
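A minimal sketch of step S42 for one entity, assuming the smoothed segments are already available as a list of strengths. The function name is hypothetical, and treating a zero difference as neither positive nor negative is an assumption the text does not address.

```python
from typing import List, Sequence, Tuple


def segment_differences(smoothed: Sequence[float]) -> Tuple[List[float], int, int]:
    """Subtract each smoothed value from the next one for a single technical entity.

    Returns the differences together with the number of positive (strength
    increased) and negative (strength decreased) differences. Zero
    differences are counted as neither.
    """
    diffs = [later - earlier for earlier, later in zip(smoothed, smoothed[1:])]
    positives = sum(1 for d in diffs if d > 0)
    negatives = sum(1 for d in diffs if d < 0)
    return diffs, positives, negatives
```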
[0068] S43. Based on the consecutive occurrences of positive and negative differences, the upward trend results and downward trend results are determined sequentially. The content of each trend result is verified through semantic similarity to obtain the integrated part of the trend results.
[0069] When the positive and negative differences are counted, the positive and negative sample portions of the correlation strength are also introduced to determine how the positive and negative samples change as the correlation strength rises or falls. The rate of change of the correlation strength before and after the update is taken as the degree of influence of the current association; it is used to verify the integration of the trend results and finally to complete the update configuration of the current data. This prevents data from being updated onto irrelevant technical entities, which would otherwise bloat the data updates and complicate data storage.
[0070] An upward trend result corresponds to a positive difference occurring three or more times consecutively; the corresponding time interval is integrated into one upward trend result. Downward trend results are defined in the same way. Technical entities whose correlation strength shows no consecutive increases or decreases are not included in any trend result and are processed separately.
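The "three or more consecutive differences" rule can be sketched as a run-length scan over the differences of one entity. The function name and the `(direction, first_index, last_index)` output shape are hypothetical; zero differences are assumed to break a run.

```python
from typing import List, Sequence, Tuple


def _sign(x: float) -> int:
    return (x > 0) - (x < 0)


def trend_runs(diffs: Sequence[float], min_run: int = 3) -> List[Tuple[str, int, int]]:
    """Find runs of at least `min_run` consecutive same-sign differences.

    Returns (direction, first_index, last_index) tuples with inclusive
    indices. Differences outside every run are left for non-trend handling.
    """
    runs: List[Tuple[str, int, int]] = []
    start = 0
    while start < len(diffs):
        sign = _sign(diffs[start])
        end = start
        # Extend the run while the next difference has the same sign.
        while end + 1 < len(diffs) and _sign(diffs[end + 1]) == sign:
            end += 1
        if sign != 0 and end - start + 1 >= min_run:
            runs.append(("up" if sign > 0 else "down", start, end))
        start = end + 1
    return runs
```

For the differences `[0.1, 0.2, 0.05, -0.1, 0.0, -0.1, -0.2, -0.3, 0.1]`, the scan yields one upward run over indices 0 to 2 and one downward run over indices 5 to 7; the remaining indices are non-trend data.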
[0071] Integrating the trend results in step S43 further includes: binding the data in the upward and downward trend results to the semantically segmented positive and negative sample sets, and determining the semantic constraints after each trend result is mapped. The semantic constraints cover the semantic similarity and correlation strength of the trend results at update time, as well as the data updated over multiple time intervals. These data are semantically labeled with terms such as significant increase, significant decrease, slow increase, and slow decrease, forming the semantic constraints for the positive and negative sample sets.
[0072] Meanwhile, because the current technical entity records the updated parts of the positive and negative samples in the corresponding association when data is updated, its semantic constraints are further extended to significant increases driven by positive samples and significant increases driven by negative samples, and constraints are set on this part of the data.
[0073] For example, when the negative samples of the current technical entity show a significant decrease in correlation strength, the subordinate technologies related to the current technical entity have taken other development directions, and their correlation with the current technical entity has dropped markedly. Conversely, when the positive samples show a significant increase, the highly correlated technologies of the current technical entity have added subordinate content related to it, indicating significant development of its technology.
[0074] Preferably, when semantic constraints are set, the data is sorted by semantic similarity and association strength. Because positive samples already have a similarity of ≥0.6, their similarity is rechecked. For negative samples, only the part weakly related to the corresponding technical entity is selected; for example, data with a similarity of ≥0.4 is selected as the topic to be checked, and the remaining data is not updated.
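The sorting and selection rule can be sketched as below, assuming each sample is a `(pair_id, similarity, strength)` tuple and that negative samples are those with similarity below 0.6. The 0.6 and 0.4 thresholds are the values given in the text; the function name and tuple layout are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[str, float, float]  # (entity_pair_id, semantic_similarity, association_strength)

POSITIVE_MIN_SIMILARITY = 0.6        # positive samples have similarity >= 0.6
NEGATIVE_CHECK_MIN_SIMILARITY = 0.4  # example threshold for weakly related negatives


def select_for_check(samples: Sequence[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into positives to recheck and weakly related negatives to check.

    Both lists are sorted by similarity, then association strength, descending.
    Negatives below NEGATIVE_CHECK_MIN_SIMILARITY are dropped (not updated).
    """
    def rank(s: Sample) -> Tuple[float, float]:
        return (s[1], s[2])

    positives = [s for s in samples if s[1] >= POSITIVE_MIN_SIMILARITY]
    weak_negatives = [
        s for s in samples
        if NEGATIVE_CHECK_MIN_SIMILARITY <= s[1] < POSITIVE_MIN_SIMILARITY
    ]
    return (sorted(positives, key=rank, reverse=True),
            sorted(weak_negatives, key=rank, reverse=True))
```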
[0075] Meanwhile, the correlation strength is divided into significant and slow-changing parts according to its rate of change, and parts with small changes are excluded from trend integration. For example, an absolute change rate above 30% is treated as a significant trend, and an absolute change rate between 5% and 30% as a slow trend. Stable parts with a change rate below 5% are omitted and are not included in the integration of adjacent periods.
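The example thresholds can be expressed as a small classifier. Assumptions: the change rate is the relative change `(after - before) / |before|`, exactly 30% counts as slow, and exactly 5% counts as slow; the text leaves these boundaries open. The function names are hypothetical.

```python
def change_rate(before: float, after: float) -> float:
    """Relative change of association strength between two updates."""
    if before == 0:
        raise ValueError("change rate is undefined when the previous strength is 0")
    return (after - before) / abs(before)


def classify_change(before: float, after: float,
                    significant: float = 0.30, slow: float = 0.05) -> str:
    """Label a change: >30% significant, 5% to 30% slow, <5% stable (omitted)."""
    rate = change_rate(before, after)
    magnitude = abs(rate)
    if magnitude > significant:
        level = "significant"
    elif magnitude >= slow:
        level = "slow"
    else:
        return "stable"
    return f"{level} {'increase' if rate > 0 else 'decrease'}"
```

A rise from 0.5 to 0.7 (+40%) is labeled a significant increase, while a drop from 0.5 to 0.45 (-10%) is a slow decrease.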
[0076] Positive samples can then correspond to significant-increase, slow-increase, and significant-decrease trends, forming semantic constraints on technology path adjustment and maturity. Negative samples correspond to significant-decrease and slow-increase trends, expressing semantic constraints on subordinate technology development and cross-domain extension, so that technology development can be tracked under different circumstances.
[0077] Based on the semantic constraints corresponding to each trend result, the data flow direction of the positive and negative sample sets during data updates is determined.
[0078] The data flow direction is described according to the semantic constraints above, for example "positive sample - significant increase" and "positive sample - slow increase".
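A lookup-table sketch of how a sample type and a trend label could be combined into a flow-direction label. The allowed combinations follow the trend lists given for positive and negative samples in [0076]; returning `None` for any other combination (no constraint formed) is an assumption, as are the names used.

```python
from typing import Optional

# Trend labels each sample type is associated with in [0076].
ALLOWED_TRENDS = {
    "positive": {"significant increase", "slow increase", "significant decrease"},
    "negative": {"significant decrease", "slow increase"},
}


def flow_direction(sample_type: str, trend: str) -> Optional[str]:
    """Return a 'sample type - trend' flow label, or None when no constraint is formed."""
    if trend not in ALLOWED_TRENDS.get(sample_type, set()):
        return None
    return f"{sample_type} sample - {trend}"
```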
[0079] The correlation between the data flow direction and the corresponding technical entity is verified. When the verification is successful, the data of the corresponding positive and negative sample sets are integrated to obtain the integrated part of the trend result.
[0080] When verifying the data flow, the system uses the adjacent entities associated with the corresponding technical entity to determine how the number of associated edges of that entity changes as the association strength rises or falls. It also retrieves the data newly added by those adjacent entities and by the current technical entity to check whether the technological breakthroughs, technology route shifts, and similar events recorded in the semantic constraints are consistent with what the association relationships show, thereby obtaining the integrated part of the trend results.
[0081] S44: For data that does not belong to the upward trend results or downward trend results, integrate the data based on the average correlation strength of the corresponding data to obtain the integrated part of the non-trend results.
[0082] At this stage, the technical entities are only slightly affected by technology updates. Their association patterns are mature, and only subtle changes occur depending on the scenario or application. Compared with scenarios of rapid technological updates in which associations rise and fall, this scenario favors storing the relevant data in independent partitions and recording the fluctuations related to the technical entities to determine subtle updates and changes in the corresponding domains. The difference at each update can be recorded against the average correlation strength, and these differences can be integrated directly to identify changes in related technologies when no significant trend is present.
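One way to read "recording the differences using the average correlation strength" is to store, for each non-trend update, its deviation from the entity's average strength. The sketch below follows that reading, which is an assumption; the function name and the `(timestamp, strength)` input layout are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def integrate_non_trend(updates: Sequence[Tuple[int, float]]) -> Tuple[float, List[Tuple[int, float]]]:
    """Summarize non-trend updates of one entity by their average association strength.

    `updates` holds (timestamp, strength) pairs. Returns the average and, for
    each update, its deviation from that average, so small fluctuations can be
    kept in a separate partition.
    """
    if not updates:
        raise ValueError("at least one update is required")
    average = mean(strength for _, strength in updates)
    return average, [(ts, strength - average) for ts, strength in updates]
```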
[0083] S45, based on the integrated parts of the trend results and non-trend results, combined with the corresponding time information, the update range for different time periods is obtained.
[0084] The integrated update range specifies which technical entities' associated data needs updating, over which time intervals, and how that data is processed after the update. This bounds the data update and avoids full updates that could disrupt the relationships in the baseline network.
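A hypothetical data structure for the update range, assuming trend results are given as `(direction, first_index, last_index)` runs over the period-to-period differences and non-trend data as the remaining difference indices. Since difference `i` compares periods `i` and `i + 1`, it spans `times[i]` to `times[i + 1]`. The record and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class UpdateRange:
    """Hypothetical record: which entity to update, over which interval, and how."""
    entity: str
    start: int  # timestamp where the interval begins
    end: int    # timestamp where the interval ends
    kind: str   # "up", "down" (trend results) or "non-trend"


def build_update_ranges(entity: str,
                        times: Sequence[int],
                        trend_runs: Iterable[Tuple[str, int, int]],
                        non_trend: Iterable[int]) -> List[UpdateRange]:
    """Turn difference indices back into time intervals.

    Difference i compares periods i and i + 1, so it spans times[i]..times[i + 1].
    `trend_runs` holds (direction, first_index, last_index) over the differences;
    `non_trend` holds the indices of differences that belong to no trend.
    """
    ranges = [UpdateRange(entity, times[first], times[last + 1], direction)
              for direction, first, last in trend_runs]
    ranges += [UpdateRange(entity, times[i], times[i + 1], "non-trend") for i in non_trend]
    return sorted(ranges, key=lambda r: (r.start, r.end))
```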
[0085] After the update ranges for adjacent cycles are obtained, they are progressively synchronized to the initial benchmark technology association network, and the intersection of the update ranges is determined to identify common points of progress of the corresponding technologies. This intersection then serves as the extension along which the dynamic process of the corresponding technical entities is tracked in real time.
[0086] The implementation of the dynamic technology association network in step S5 further includes: once the update ranges at different times are obtained, each labeled data item is pruned to obtain the updated labeled data items. Pruning operates on the update range itself, filtering out invalid and duplicate update records to reduce the workload of synchronized technology tracking.
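A minimal pruning sketch. The text does not define "invalid" or "duplicate", so this assumes a record is invalid when it has no entity or no payload, and a duplicate when an identical `(entity, timestamp, payload)` record has already been kept. The function name is hypothetical.

```python
from typing import Hashable, Iterable, List, Optional, Tuple

Record = Tuple[Optional[str], int, Optional[Hashable]]  # (entity, timestamp, payload)


def prune_update_records(records: Iterable[Record]) -> List[Record]:
    """Drop invalid and duplicate update records, keeping first-seen order.

    Assumption: a record is invalid when it has no entity or no payload, and a
    duplicate when the same (entity, timestamp, payload) was already kept.
    """
    seen = set()
    kept: List[Record] = []
    for record in records:
        entity, _, payload = record
        if not entity or payload is None:
            continue  # invalid record
        if record in seen:
            continue  # duplicate record
        seen.add(record)
        kept.append(record)
    return kept
```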
[0087] The intersection of the labeled data items is taken as the range intersection for each update, and the labeled data items are connected in sequence.
[0088] It should be noted that because the data input to step S5 covers both the positive and negative sample datasets, the intersection of their update ranges is further divided into strong associations, corresponding to the intersection of positive samples, and secondary associations, corresponding to the intersection of positive and negative samples and the intersection of negative samples.
[0089] When the range intersection includes the strong-association intersection corresponding to the positive-sample intersection, the secondary-association intersection corresponding to the positive- and negative-sample intersection, and the negative-sample intersection, the labeled data items of the strong-association and secondary-association intersections are concatenated in chronological order. The connected labeled data items are then used to step through the technical entities in process order, which yields the output dynamic technology association network.
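The intersection classification and chronological chaining can be sketched as follows, assuming each update range maps a labeled item to the sample type (`"positive"` or `"negative"`) it carried in that range. Paragraph [0088] groups the negative-sample intersection with the secondary associations, while [0089] lists it separately; the sketch follows [0089] and chains only the strong and mixed-sample items. All names are hypothetical.

```python
from typing import Dict, List

SampleType = str  # "positive" or "negative"


def classify_intersection(prev: Dict[str, SampleType],
                          curr: Dict[str, SampleType]) -> Dict[str, List[str]]:
    """Group the labeled items present in two update ranges by sample type."""
    groups: Dict[str, List[str]] = {"strong": [], "secondary": [], "negative": []}
    for item in sorted(prev.keys() & curr.keys()):
        types = {prev[item], curr[item]}
        if types == {"positive"}:
            groups["strong"].append(item)      # positive in both ranges
        elif types == {"negative"}:
            groups["negative"].append(item)    # negative in both ranges
        else:
            groups["secondary"].append(item)   # positive in one, negative in the other
    return groups


def chain_in_time(groups: Dict[str, List[str]], timestamps: Dict[str, int]) -> List[str]:
    """Concatenate strong and secondary items in chronological order."""
    selected = groups["strong"] + groups["secondary"]
    return sorted(selected, key=lambda item: (timestamps[item], item))
```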
[0090] The final dynamic technology association network expresses, in order, the strong association edges, the secondary association edges, and the update process of the corresponding technology according to the changes in each adjacent entity of the technical entity, thereby reflecting how the technical entity is updated along each association edge.
[0091] Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention, which are still covered within the protection scope of the present invention.
Claims
1. A method for dynamically tracking technological associations in the scientific field, characterized in that it comprises: extracting at least one technical entity from textual information in the target scientific field; semantically matching the acquired technical entities to determine the relationships between them and, based on these relationships, constructing a baseline technology association network with technical entities as nodes and the relationships between them as edges; and, taking the update of any technical entity in the baseline technology association network as the trigger condition and combining the time factors corresponding to different technical entities, constructing a time series of association strength containing positive and negative samples. Constructing the time series of association strength containing positive and negative samples comprises: generating timestamped anchor points from the set of text attribute features when a technical entity is updated, and determining the scope of influence of the update by comparing the text attribute similarity between the current anchor point and the historical anchor points of adjacent entities; identifying the anchor point sequence corresponding to the same technical entity and performing bidirectional data supplementation through text attributes and topological attributes to determine the representation vector of the supplemented technical entity; based on the representation vector of the current technical entity and the scope of influence of the update, determining the correlation strength between the current technical entity and its adjacent entities after the update by comparing positive and negative samples; and, based on the time interval of each update and the change in correlation strength, combining the correlation strengths corresponding to each technical entity into a correlation strength time series.
Positive samples are associated entity pairs with high semantic similarity to the currently updated technical entity, and negative samples are associated entity pairs with low semantic similarity. Based on the trend of association strength over time, combined with the semantic similarity of the current technical entities, the update range at different times is determined. Determining the update range at different times comprises: setting the period length for each update and segmenting the data by period length to obtain a smoothed data segment after each update, the smoothed data segments being used to analyze the trend of the correlation strength; for the same technical entity, subtracting the correlation strengths of adjacent periodic smoothed data segments one by one and counting the positive and negative differences; based on consecutive occurrences of positive and negative differences, determining the upward trend results and downward trend results in turn, and verifying the content of each trend result through semantic similarity to obtain the integrated part of the trend results; for data belonging to neither the upward nor the downward trend results, integrating the data by the average correlation strength of the corresponding data to obtain the integrated part of the non-trend results; and, based on the integrated parts of the trend and non-trend results combined with the corresponding time information, obtaining the update range for different time periods. The update ranges at different times are synchronized to the baseline technology association network, and the dynamic technology association network corresponding to the current technical entity is output according to the intersection of the ranges of each update.
2. The method for dynamic tracking of technological associations in the scientific field according to claim 1, characterized in that, when extracting at least one technical entity from the textual information, the method further comprises: segmenting the textual information of the target scientific field with word segmentation to obtain a keyword set for the target scientific field; and counting the frequency of each word in the keyword set and extracting the corresponding technical entities from the keyword set based on the mutual information between words.
3. The method for dynamic tracking of technological associations in the scientific field according to claim 1, characterized in that determining the relationships between technical entities comprises: extracting the prior associations and semantic information of the technical entities, and calculating the association error and the semantic error between the prior associations and the actual associations; obtaining an error confidence from the association error and the semantic error; and correcting the prior associations with the error confidence to determine the associations between technical entities.
4. The method for dynamic tracking of technology associations in the scientific field according to claim 3, characterized in that, when correcting the prior associations using the error confidence, the method further comprises: verifying the prior associations corresponding to the current technical entity according to the value range of the error confidence; when the prior association is a citation association, using the citation count of the current technical entity and the error confidence as the judgment conditions to determine the qualifying prior associations; when the prior association is a subordinate association, using the semantic similarity between the current technical entities and the error confidence as the judgment conditions to determine the qualifying prior associations; when the prior association is a co-occurrence association, using the co-occurrence frequency of the current technical entity and the error confidence as the judgment conditions to determine the qualifying prior associations; taking data that conforms to the prior associations as the output associations; and, when any prior association does not meet its judgment conditions, using the semantic similarity corresponding to the current technical entity as the output association.
5. The method for dynamic tracking of technology associations in the scientific field according to claim 1, characterized in that, when comparing positive and negative samples, the method further comprises: pairing the current technical entity with each adjacent entity to obtain multiple entity pairs; when an entity pair falls within the scope of influence of the technical entity update, treating that entity pair as a positive sample and treating the other entity pairs as negative samples; and configuring the adjacency density of each entity pair as the proportion of its shared adjacent edges to its total adjacent edges, and setting the association strength of each entity pair by weighting the adjacency density and the text attribute similarity.
6. The method for dynamic tracking of technology associations in the scientific field according to claim 1, characterized in that constructing the correlation strength time series further comprises: comparing the fields updated by the technical entity at each update, performing incremental updates using the data each field points to, and synchronizing the incrementally updated data to the correlation strength time series.
7. The method for dynamic tracking of technology associations in the scientific field according to claim 1, characterized in that obtaining the integrated part of the trend results further comprises: binding the data in the upward and downward trend results to the semantically segmented positive and negative sample sets to determine the semantic constraints after each trend result is mapped; determining, from the semantic constraints corresponding to each trend result, the data flow direction of the positive and negative sample sets during data updates; and verifying the correlation between the data flow direction and the corresponding technical entity and, when the verification succeeds, integrating the data of the corresponding positive and negative sample sets to obtain the integrated part of the trend results.
8. The method for dynamic tracking of technological associations in the scientific field according to claim 1, characterized in that implementing the dynamic technology association network further comprises: when the update ranges at different times are obtained, pruning each labeled data item to obtain the updated labeled data items; taking the intersection of the labeled data items as the range intersection for each update and connecting the labeled data items in sequence; and, when the range intersection includes the strong-association intersection corresponding to the positive-sample intersection, the secondary-association intersection corresponding to the positive- and negative-sample intersection, and the negative-sample intersection, concatenating the labeled data items of the strong-association and secondary-association intersections in chronological order, the connected labeled data items being used to step through the technical entities in process order to form the output dynamic technology association network.