An automatic extraction and correlation matching method and system for enterprise management elements
By performing sentence-by-sentence preprocessing and semantic-driven structured decomposition of enterprise management documents, enterprise management elements are identified and extracted. A multi-dimensional similarity matching model is constructed, which solves the problems of low efficiency and poor accuracy in establishing the relationship between management elements in existing technologies. This enables the automatic extraction and intelligent association matching of enterprise management elements, thereby improving management efficiency and decision-making accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NANJING JIRAN SOFTWARE CO LTD
- Filing Date
- 2026-03-14
- Publication Date
- 2026-06-19
Abstract
Description
Technical Field
[0001] This invention relates to the field of intelligent management technology for enterprise management elements, and in particular to a method and system for automatically extracting and associating enterprise management elements.
Background Technology
[0002] The effective operation of an enterprise management system relies on the synergy and consistency of various management elements, such as systems, SOPs (Standard Operating Procedures), processes, standards, procedures, and records. These management elements reference and constrain each other, forming the core framework of enterprise management. As enterprises expand and business complexity increases, the number of management elements grows exponentially, and the relationships between these elements become increasingly complex. A single system may reference multiple processes and standards, and a single process may be associated with multiple records and involve multiple job roles, forming a multi-dimensional, networked system of relationships.
[0003] Currently, most companies manage these management elements using document management systems, such as OA systems, shared folders, and professional document management tools, for storage and version control. Some companies establish the relationships between management elements manually, such as manually marking the process numbers and standard names referenced in policy documents.
[0004] However, existing management methods suffer from the following core problems, leading to a disconnect between the enterprise management system and actual implementation, and hindering continuous optimization. The existing management element relationships rely primarily on manual sorting and maintenance, requiring managers to read documents one by one, identify reference points, and manually enter related information. This process is time-consuming, labor-intensive, and susceptible to issues such as missing relationships, incorrect labeling, and untimely updates. When a management element changes, managers struggle to quickly and comprehensively identify other affected related elements. For example, if a key control point in a system is adjusted, the corresponding process nodes and operating standards may not be updated simultaneously, leading to personnel continuing to operate according to the old process. This asynchronous change gradually accumulates management blind spots, reducing the effectiveness of the management system. Existing technology cannot automatically analyze and visualize the impact of changes to management elements, making it difficult for managers to assess the chain reaction of changes. This results in a lack of data support for change decisions, potentially leading to decision-making errors or excessively redundant change costs. Furthermore, the extraction of management elements lacks a unified algorithm, relying mostly on subjective manual definition. This leads to inconsistent element definitions and non-standardized expressions, preventing the formation of a standardized management element library and affecting the universality and scalability of element relationships.
[0005] Therefore, it is necessary to provide a method and system for automatically extracting and matching enterprise management elements to solve the above-mentioned technical problems.
Summary of the Invention
[0006] To address the aforementioned technical problems, this invention provides an automatic extraction and association matching method and system for enterprise management elements, which solves the problems of low efficiency and poor accuracy in establishing association relationships of management elements, difficulty in synchronizing changes, invisible scope of change impact, fragmented extraction, and lack of standardization in the prior art.
[0007] This invention provides a method for automatically extracting and associating enterprise management elements, the method comprising: performing sentence-by-sentence preprocessing and semantic-driven structured decomposition on enterprise management documents, and outputting a set of structured semantic paragraphs with feature annotations; performing enterprise management element identification, extraction, and standardization on the structured semantic paragraph set, and outputting an enterprise management element library containing metadata information; constructing a multi-dimensional similarity matching model to intelligently associate and match the enterprise management elements in the enterprise management element library, and outputting a dynamically updated enterprise management element association network; and, based on the enterprise management element association network, capturing changes in the enterprise management element library and analyzing the scope of their impact, and outputting a change impact analysis report and change early-warning information.
[0008] Preferably, the sentence-segment preprocessing of the enterprise management document specifically includes: Obtain the enterprise management documents in different formats and match them with the corresponding document extraction toolkits. Use the document extraction toolkits to extract the plain text content of the enterprise management documents. The plain text content of the document is cleaned by removing headers, footers, page numbers, duplicate blank lines, special symbols, and correcting punctuation errors. The cleaned plain text content of the document is then output. The cleaned document's plain text content is atomically segmented according to punctuation marks to generate a list of sentences. Perform semantic coherence verification on the sentence list, remove meaningless sentences, and output the verified sentence list.
[0009] Preferably, the validated sentence list is subjected to semantic-driven structured decomposition, specifically including: converting the sentences in the validated sentence list into corresponding sentence semantic vectors and outputting the sentence semantic vector set; using the cosine similarity algorithm to calculate the semantic similarity $s_i$ between the semantic vectors $v_i$ and $v_{i+1}$ of adjacent sentences in the sentence semantic vector set, and outputting a list of semantic similarities, wherein the semantic similarity is calculated as follows:
$$s_i = \frac{v_i \cdot v_{i+1}}{\lVert v_i \rVert \, \lVert v_{i+1} \rVert}$$
using the K-means clustering algorithm to divide the semantic similarities in the semantic similarity list into high-relevance semantic similarities and low-relevance semantic similarities, calculating the boundary value between the two classes, and setting it as the semantic boundary judgment threshold; and, based on the semantic boundary judgment threshold and the sentence semantic vector set, dividing the sentences into paragraphs, then calling a lightweight LLM model to perform integrity verification on the divided paragraphs, and outputting the structured semantic paragraph set with feature annotations.
[0010] Preferably, the identification and extraction of enterprise management elements from the structured semantic paragraph set specifically includes: Construct a feature library of enterprise management elements and determine the characteristic keywords and syntactic structures of key control points, job roles, input and output items, and reference standards; The structured semantic paragraph set is matched with the enterprise management element feature library, including extracting key control points using a keyword dictionary, syntactic analysis model and BERT fine-tuning model for the structured semantic paragraph set, extracting job roles using a named entity recognition model, extracting input and output items using feature word matching, and extracting citation standards using citation identification technology and semantic analysis technology, and outputting the first enterprise management element set; The first set of enterprise management elements is linked and matched with the enterprise organizational structure library and the standardized terminology library. The terminology in the first set of enterprise management elements is normalized, and the second set of enterprise management elements is output.
[0011] Preferably, the second set of enterprise management elements is standardized, specifically including: extracting the second enterprise management elements from the second enterprise management element set, taking the texts of the two second enterprise management elements to be compared as text a and text b, and denoting the length of text a as m and the length of text b as n; constructing the edit distance d[i][j], where d[i][j] represents the edit distance between the first i characters of text a and the first j characters of text b, with the following recursive formula:
$$d[i][j] = \begin{cases} \max(i, j), & \min(i, j) = 0 \\ d[i-1][j-1], & a[i-1] = b[j-1] \\ \min\bigl(d[i-1][j],\ d[i][j-1],\ d[i-1][j-1]\bigr) + 1, & a[i-1] \neq b[j-1] \end{cases}$$
In the formula, a[i−1] represents the i-th character of text a and b[j−1] represents the j-th character of text b; obtaining the edit distance d[m][n] from the recursion and calculating the text similarity(a,b) between texts a and b as follows:
$$\mathrm{similarity}(a,b) = 1 - \frac{d[m][n]}{\max(m,n)}$$
In the formula, max(m,n) represents the maximum of length m and length n; setting a text similarity threshold, determining the two second enterprise management elements corresponding to texts a and b whose text similarity(a,b) is higher than the text similarity threshold to be duplicate enterprise management elements, performing deduplication, and outputting the enterprise management element library; and adding metadata information to each enterprise management element in the enterprise management element library to construct the enterprise management element library containing metadata information.
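The edit-distance deduplication described above can be sketched as follows. This is a minimal illustration using the standard Levenshtein recurrence; the function names and the threshold value 0.85 are assumptions chosen for the example and are not specified in the source.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: d[i][j] covers the first i chars of a and first j chars of b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = 1 + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[m][n]


def text_similarity(a: str, b: str) -> float:
    """1 - d[m][n] / max(m, n); two empty strings are treated as identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def deduplicate(elements: list[str], threshold: float = 0.85) -> list[str]:
    """Keep an element only if it is not above the threshold against any kept element.

    The threshold 0.85 is a hypothetical value, not taken from the source.
    """
    kept: list[str] = []
    for element in elements:
        if all(text_similarity(element, k) <= threshold for k in kept):
            kept.append(element)
    return kept
```

For example, "contract review" and "contract reviews" differ by one character out of sixteen, so with the assumed threshold the second is treated as a duplicate of the first.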
[0012] Preferably, based on the multi-dimensional similarity matching model, the name similarity of enterprise management elements in the enterprise management element library is calculated, specifically including: extracting the name keyword sets $K_A$ and $K_B$ of enterprise management elements A and B from the enterprise management element library, where p and q denote the numbers of elements in $K_A$ and $K_B$, respectively; calculating the number of intersection elements $|K_A \cap K_B|$ and the number of union elements $|K_A \cup K_B|$ of the name keyword sets $K_A$ and $K_B$; and, based on the number of intersection elements and the number of union elements, using the improved Jaccard similarity algorithm to calculate the name similarity namesim(A,B) between enterprise management elements A and B, with the following calculation formula:
$$\mathrm{namesim}(A,B) = \alpha \cdot \frac{|K_A \cap K_B|}{|K_A \cup K_B|}$$
In the formula, $\alpha$ represents the word order matching coefficient: $\alpha = 1$ when the word order of the keywords is completely identical; when the word order is only partially consistent, $\alpha$ is the percentage of word order overlap of the keywords.
[0013] Preferably, based on the multi-dimensional similarity matching model, the semantic similarity of enterprise management elements in the enterprise management element library is calculated, specifically including: extracting the core descriptive texts of enterprise management elements A and B in the enterprise management element library and inputting them into the BERT fine-tuning model to convert them into the corresponding element semantic vectors $v_A$ and $v_B$; and, based on the element semantic vectors $v_A$ and $v_B$, calculating the semantic similarity semsim(A,B) between enterprise management elements A and B using the cosine similarity algorithm, and then introducing a semantic knowledge base of the management domain to correct deviations in the semantic similarity semsim(A,B). The semantic similarity semsim(A,B) is calculated as follows:
$$\mathrm{semsim}(A,B) = \frac{v_A \cdot v_B}{\lVert v_A \rVert \, \lVert v_B \rVert}$$
[0014] Preferably, based on the multi-dimensional similarity matching model, intelligent association matching is performed on the enterprise management elements in the enterprise management element library to output a dynamically updated association network of the enterprise management elements, specifically including: based on the multi-dimensional similarity matching model, calculating the attribute similarity attrsim(A,B) and the source association srcsim(A,B) of enterprise management elements A and B in the enterprise management element library, and, combined with the name similarity namesim(A,B) and the semantic similarity semsim(A,B), calculating the total association similarity totalsim(A,B) of enterprise management elements A and B, with the following calculation formula:
$$\mathrm{totalsim}(A,B) = w_1\,\mathrm{namesim}(A,B) + w_2\,\mathrm{semsim}(A,B) + w_3\,\mathrm{attrsim}(A,B) + w_4\,\mathrm{srcsim}(A,B)$$
where $w_1$ to $w_4$ are the weight coefficients of the four association dimensions; setting a first association similarity threshold $\theta_1$ and a second association similarity threshold $\theta_2$, and comparing the total association similarity totalsim(A,B) with $\theta_1$ and $\theta_2$: if $\mathrm{totalsim}(A,B) \ge \theta_1$, the relationship between enterprise management elements A and B is established automatically and the relationship type and degree of matching are marked; if $\theta_2 \le \mathrm{totalsim}(A,B) < \theta_1$, a list of items to be reviewed is generated and sent to the manual review end, which determines whether there is a relationship between enterprise management elements A and B; if $\mathrm{totalsim}(A,B) < \theta_2$, it is determined that no relationship exists between enterprise management elements A and B; and outputting the dynamically updated association network of the enterprise management elements.
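The weighted combination and two-threshold routing described above might be sketched as follows. The weight values, the threshold values, and the treatment of values exactly on a threshold are all assumptions made for illustration; the source does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    # Hypothetical weights for the four association dimensions (sum to 1).
    name: float = 0.25
    semantic: float = 0.35
    attribute: float = 0.20
    source: float = 0.20


def total_similarity(namesim: float, semsim: float, attrsim: float,
                     srcsim: float, w: Weights = Weights()) -> float:
    """Weighted sum of the four per-dimension similarities."""
    return (w.name * namesim + w.semantic * semsim
            + w.attribute * attrsim + w.source * srcsim)


def route(total: float, auto_threshold: float = 0.8,
          review_threshold: float = 0.5) -> str:
    """Map a total similarity to one of the three outcomes.

    Both thresholds are assumed values; boundary values are assigned to the
    higher band here, which is also an assumption.
    """
    if total >= auto_threshold:
        return "auto_link"       # establish the association automatically
    if total >= review_threshold:
        return "manual_review"   # queue for a human reviewer
    return "no_link"             # treat the pair as unrelated
```

Keeping the routing separate from the scoring makes it straightforward to adjust the thresholds without touching how the individual similarities are combined.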
[0015] An automatic extraction and correlation matching system for enterprise management elements, the system comprising: The paragraph decomposition module is used to perform sentence preprocessing and semantic-driven structured decomposition of enterprise management documents, and output a set of structured semantic paragraphs with feature annotations. The element extraction module is used to identify, extract, and standardize the enterprise management elements of the structured semantic paragraph set, and output an enterprise management element library containing metadata information. The association matching module is used to construct a multi-dimensional similarity matching model, perform intelligent association matching on enterprise management elements in the enterprise management element library, and output a dynamically updated enterprise management element association network. The change analysis module is used to capture change content and analyze the scope of change impact on the enterprise management element database based on the management element association network, and output change impact analysis report and change early warning information.
[0016] Compared with related technologies, the automatic extraction and association matching method and system for enterprise management elements provided by this invention have the following beneficial effects: This invention preprocesses and semantically drives the structured decomposition of enterprise management documents through sentence segmentation, outputting a set of structured semantic paragraphs with feature annotations. It then identifies, extracts, and standardizes these semantic paragraph sets to output an enterprise management element library containing metadata. A multi-dimensional similarity matching model is constructed to intelligently match and associate enterprise management elements within the library, outputting a dynamically updated network of associations. Based on this network, the invention captures changes and analyzes the scope of impact of these changes, outputting a change impact analysis report and early warning information. This enables automatic extraction, intelligent association matching, change impact analysis, and visualization of enterprise management elements, achieving intelligent management and control throughout the entire lifecycle of enterprise management elements.
[0017] This invention automatically extracts enterprise management elements and establishes relationships between them, replacing the traditional manual sorting method. This improves the efficiency of enterprise management element extraction and association by over 80%, reducing repetitive work for managers, avoiding human error, and lowering management costs. This invention can automatically analyze the impact range of changes to enterprise management elements and provide early warnings of anomalies, ensuring that after any element changes, related elements are accurately identified, allowing managers to make timely and synchronized modifications, reducing compliance risks and management blind spots. This invention employs a multi-dimensional similarity matching algorithm, combined with a management-specific corpus and semantic knowledge base. Compared to general text matching algorithms, the accuracy of enterprise management element association is improved to over 90%. It is adaptable to the characteristics of management elements in different industries and enterprises of different sizes, exhibiting strong versatility and scalability. This invention uses multi-format visualization to display the impact range of changes to enterprise management elements, helping managers quickly grasp the chain reaction of changes, accurately judge the degree of impact, optimize change decisions, reduce change costs, and improve the consistency and dynamic optimization capabilities of the enterprise management system. This invention lays the foundation for the digital and intelligent upgrading of enterprise management systems through a standardized enterprise management element database and related network, realizes the interconnection of management data, and improves the overall management efficiency of enterprises.
Attached Figure Description
[0018] Figure 1 is a flowchart of an automatic extraction and association matching method for enterprise management elements provided in an embodiment of the present invention; Figure 2 is a system block diagram of an automatic extraction and association matching system for enterprise management elements provided in an embodiment of the present invention; Figure 3 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of the present invention.
Detailed Implementation
[0019] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0020] Figure 1 is a flowchart of an automatic extraction and association matching method for enterprise management elements provided by an embodiment of the present invention. The execution entity of the method shown in Figure 1 can be a software and/or hardware device. The execution entity of this application can include, but is not limited to, at least one of the following: user equipment, network equipment, etc. User equipment can include, but is not limited to, computers, smartphones, personal digital assistants (PDAs), and the aforementioned electronic devices. Network equipment can include, but is not limited to, a single network server, a server group consisting of multiple network servers, or a cloud based on cloud computing consisting of a large number of computers or network servers. Cloud computing is a type of distributed computing, in which a group of loosely coupled computers forms a super virtual computer. This embodiment does not limit this. Steps S1 to S4 are detailed as follows: S1, perform sentence-by-sentence preprocessing and semantic-driven structured decomposition on enterprise management documents, and output a set of structured semantic paragraphs with feature annotations; S2, identify, extract, and standardize the enterprise management elements of the structured semantic paragraph set, and output an enterprise management element library containing metadata information; S3, construct a multi-dimensional similarity matching model to intelligently associate and match enterprise management elements in the enterprise management element library, and output a dynamically updated enterprise management element association network; S4, based on the enterprise management element association network, capture the changes in the enterprise management element library and analyze the scope of the impact of the changes, and output a change impact analysis report and change early-warning information.
[0021] First, the document processing steps involve parsing and cleaning different formats of enterprise management documents, removing redundant information such as headers, footers, and special symbols, correcting sentence segmentation errors, and then performing atomized sentence segmentation based on punctuation marks to obtain the smallest unit for semantic analysis. Subsequently, based on the principle of semantic coherence, the segmented results are divided into paragraphs. Combining this with a paragraph feature library for the enterprise management domain, each paragraph is categorized, ultimately outputting a structured semantic paragraph set with feature annotations.
[0022] Based on a structured semantic paragraph set and combined with a constructed enterprise management element feature library, a hybrid extraction algorithm integrating a rule engine and a deep learning model is used to accurately identify and extract core management elements such as key control points, job roles, input / output items, and reference standards. The extracted elements are then standardized by using a string similarity algorithm to remove duplicates and by combining a standardized enterprise terminology library to normalize synonymous elements. Simultaneously, each element is assigned unique identifiers, source documents, attribute tags, and other metadata information, ultimately constructing a standardized management element library.
[0023] A multi-dimensional similarity matching model is constructed, establishing four core association dimensions: name, semantics, attributes, and source relevance, and configuring dynamic weight coefficients. Based on this model, intelligent association matching is performed on enterprise management elements within the enterprise management element database, automatically establishing association relationships between enterprise management elements, and dynamically maintaining and updating the enterprise management element association network.
[0024] Furthermore, it can monitor changes to the enterprise management element database in real time, capturing the identifier, type, content, time, and person responsible for the changes, and generating change records. Based on the change records and the enterprise management element association network, a depth-first search (DFS) algorithm is used to identify affected related elements. The traversal depth is set to three levels. First, the primary influencing elements directly related to the changed element are identified. Then, secondary influencing elements are identified starting from the primary influencing elements, and so on, outputting a multi-level list of influencing elements. Based on this multi-level list, and combining three indicators—change type, association matching degree, and business importance—an impact severity assessment model is constructed, classifying the impact severity into three levels: "must be modified," "recommended modification," and "no modification required," outputting a list of influencing elements with impact level labels. Based on this list, an impact analysis report is generated, clarifying the handling recommendations for each influencing element, and outputting a change impact analysis report.
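The depth-limited search for affected elements can be illustrated with a short sketch. The adjacency-dictionary representation of the association network and the function name are assumptions for this example. Because a plain depth-first search can reach an element first along a longer path, the sketch records the smallest level at which each element is reached and revisits an element when a shorter path is found.

```python
def impacted_elements(graph: dict[str, list[str]], changed: str,
                      max_depth: int = 3) -> dict[str, int]:
    """Return {element: level} for elements within max_depth hops of `changed`.

    `graph` maps each element identifier to the identifiers it is associated
    with (a hypothetical in-memory form of the association network).
    Level 1 marks primary influencing elements, level 2 secondary, and so on.
    """
    best = {changed: 0}

    def visit(node: str, depth: int) -> None:
        if depth == max_depth:
            return  # traversal depth limit reached
        for neighbor in graph.get(node, ()):
            # Record the neighbor if it is new or reached via a shorter path.
            if neighbor not in best or depth + 1 < best[neighbor]:
                best[neighbor] = depth + 1
                visit(neighbor, depth + 1)

    visit(changed, 0)
    del best[changed]  # the changed element is not its own influencing element
    return best


network = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": ["F"]}
levels = impacted_elements(network, "A")
```

In this small network, element F lies four hops from A and therefore falls outside the three-level limit. The severity grading by change type, matching degree, and business importance is not shown, since the source does not give its formula.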
[0025] Obtain the change impact analysis report and the enterprise management element relationship network, extract the change elements, influencing elements at each level, and relationship matching data, and output a visual data source. Based on the visual data source, use a relationship graph visualization method, with the change elements as core nodes, first-level influencing elements as first-level nodes, and second- and third-level influencing elements as lower-level nodes. Node size corresponds to the degree of influence, and line thickness corresponds to the relationship matching degree, generating a relationship graph visualization file and outputting the relationship graph file. Based on the visual data source, use a tree structure visualization method, displaying the hierarchical relationship of "change element - directly influencing element - indirectly influencing element", marking the influence level and relationship type of each element, generating a tree structure visualization file and outputting the tree structure file. Based on the visual data source, use a heatmap visualization method, statistically analyzing the distribution of influencing elements by business area and department dimension, using color intensity to indicate the degree of influence, with red indicating a high degree of influence and blue indicating a low degree of influence, generating a heatmap visualization file. Integrate the relationship graph, tree structure, and heatmap files into a visualization report, send early warning notifications to the responsible persons of affected elements, and output the visualization report and early warning information.
[0026] In the specific implementation process, the sentence-by-sentence preprocessing of enterprise management documents includes: Obtain the enterprise management documents in different formats and match them with the corresponding document extraction toolkits. Use the document extraction toolkits to extract the plain text content of the enterprise management documents. The plain text content of the document is cleaned by removing headers, footers, page numbers, duplicate blank lines, special symbols, and correcting punctuation errors. The cleaned plain text content of the document is then output. The cleaned document's plain text content is atomically segmented according to punctuation marks to generate a list of sentences. Perform semantic coherence verification on the sentence list, remove meaningless sentences, and output the verified sentence list.
[0027] For common enterprise management document formats such as Word, PDF, TXT, and Excel, we use differentiated toolkits for adaptation and parsing. For example, for PDF format, we use either precise noise reduction or lightweight and efficient toolkits for extraction to ensure that plain text content can be accurately extracted from documents of different formats, avoiding text loss or garbled characters caused by format compatibility issues.
[0028] The extracted plain text is cleaned in a targeted manner to remove irrelevant noise and correct text defects. Specifically, this includes removing headers, footers, page numbers, duplicate blank lines, and special symbols, and correcting punctuation errors. This preserves the core semantic content of the document, eliminates formatting redundancy and text errors that could interfere with subsequent analysis, and outputs clean and well-organized text.
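A minimal sketch of such a cleaning pass is given below, using only Python's standard `re` module. The page-number patterns, the set of decorative symbols, and the punctuation rule are assumptions for illustration; detecting repeated headers and footers would need per-page information and is left out.

```python
import re

# Assumed page-number line forms: "12", "- 12 -", "Page 12", "Page 12 of 30".
_PAGE_NUMBER = re.compile(r"(-\s*)?\d+(\s*-)?|Page\s+\d+(\s+of\s+\d+)?",
                          re.IGNORECASE)
# Assumed set of decorative symbols to strip.
_SPECIAL_SYMBOLS = re.compile(r"[■◆●★□▲※]")
# Repeated punctuation such as ",," or "!!" (periods are left alone so that
# ellipses are not altered).
_REPEATED_PUNCT = re.compile(r"([,;!?])\1+")


def clean_text(raw: str) -> str:
    """Drop page-number lines, strip symbols, and collapse blank lines."""
    lines = []
    for line in raw.splitlines():
        stripped = line.strip()
        if _PAGE_NUMBER.fullmatch(stripped):
            continue  # the line holds only a page number
        lines.append(stripped)
    text = "\n".join(lines)
    text = _SPECIAL_SYMBOLS.sub("", text)
    text = _REPEATED_PUNCT.sub(r"\1", text)
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line
    return text.strip()
```

Note that a line consisting solely of digits is treated as a page number here, which could discard a legitimate numeric line; a production rule would need to be stricter.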
[0029] Using punctuation marks as boundaries, the cleaned plain text is structurally split to generate an independent list of sentences, thus breaking the text down into the smallest unit of semantic analysis and ensuring the semantic independence of each sentence.
[0030] The generated sentence list undergoes semantic quality screening. By determining whether sentences possess complete semantics, meaningless and redundant sentences are removed, such as simple blank sentences or sentences consisting of combinations of symbols without actual meaning. The validated sentence list combines semantic validity and structural regularity, directly supporting the subsequent semantic-driven paragraph decomposition process and ensuring the efficiency and accuracy of the overall processing chain.
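The segmentation and screening steps can be sketched as follows. The terminator set and the "at least two alphanumeric characters" rule are assumptions standing in for the semantic coherence check, whose exact criteria the source does not specify.

```python
import re

# Assumed sentence terminators, covering ASCII and full-width punctuation.
_SENTENCE = re.compile(r"[^.!?;。!?;]+[.!?;。!?;]?")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences at terminator punctuation."""
    return [part.strip() for part in _SENTENCE.findall(text) if part.strip()]


def is_meaningful(sentence: str) -> bool:
    """Heuristic stand-in for semantic validation: reject symbol-only strings."""
    return sum(ch.isalnum() for ch in sentence) >= 2


def validated_sentences(text: str) -> list[str]:
    """Return the sentences that pass the meaningfulness heuristic."""
    return [s for s in split_sentences(text) if is_meaningful(s)]
```

A purely punctuation-based splitter also breaks at decimal points and abbreviations (for example "Clause 3.5"), so real documents would likely need additional rules.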
[0031] The validated list of sentences undergoes semantic-driven structured decomposition, specifically including: converting the sentences in the validated sentence list into corresponding sentence semantic vectors and outputting the sentence semantic vector set; using the cosine similarity algorithm to calculate the semantic similarity $s_i$ between the semantic vectors $v_i$ and $v_{i+1}$ of adjacent sentences in the sentence semantic vector set, and outputting a list of semantic similarities, wherein the semantic similarity is calculated as follows:
$$s_i = \frac{v_i \cdot v_{i+1}}{\lVert v_i \rVert \, \lVert v_{i+1} \rVert}$$
using the K-means clustering algorithm to divide the semantic similarities in the semantic similarity list into high-relevance semantic similarities and low-relevance semantic similarities, calculating the boundary value between the two classes, and setting it as the semantic boundary judgment threshold; and, based on the semantic boundary judgment threshold and the sentence semantic vector set, dividing the sentences into paragraphs, then calling a lightweight LLM model to perform integrity verification on the divided paragraphs, and outputting the structured semantic paragraph set with feature annotations.
[0032] For the validated list of normalized sentences, semantic feature extraction technology is used to convert each sentence into a 768-dimensional semantic vector. This vector can accurately capture the deep semantic features of the sentence rather than just the literal information, ensuring the accuracy of subsequent similarity calculations, and finally outputting a structured set of sentence semantic vectors.
[0033] Cosine similarity is used to measure the semantic relevance of adjacent sentence vectors in the vector set. By calculating the degree of association between vectors, it quantifies the semantic fit between adjacent sentences, with values in this setting typically falling between 0 and 1. Values closer to 1 indicate stronger semantic relevance, while values closer to 0 suggest a possible topic shift. After calculation, a list of semantic similarities is output, clearly showing the distribution of semantic associations between sentences.
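As an illustration, the adjacent-sentence similarity list can be computed as below. The vectors are plain lists of floats here; in practice they would come from the sentence embedding model, which is not shown. Returning 0.0 for a zero vector is an assumption made to avoid division by zero.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_u * norm_v)


def adjacent_similarities(vectors: list[list[float]]) -> list[float]:
    """Similarity between each sentence vector and the one that follows it."""
    return [cosine(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
```

A list of n sentence vectors yields n − 1 similarities, one per boundary between consecutive sentences.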
[0034] The K-means clustering algorithm is used to perform binary clustering on the semantic similarity list, automatically dividing it into high-relevance and low-relevance classes. A semantic boundary judgment threshold is determined by calculating the critical value between the two clusters. This threshold serves as the core basis for paragraph segmentation, accurately distinguishing semantically coherent sentence clusters from topic-switching nodes, ensuring the rationality of paragraph segmentation.
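A one-dimensional two-cluster K-means is small enough to sketch without a library. Taking the threshold as the midpoint between the two cluster centers is an assumption; the source says only that a critical value between the clusters is calculated. Initializing the centers at the minimum and maximum values is likewise a choice made for this example.

```python
def two_means_threshold(values: list[float], iterations: int = 100) -> float:
    """Cluster 1-D values into two groups and return the midpoint of the centers."""
    if len(values) < 2:
        raise ValueError("need at least two similarity values")
    low_center, high_center = min(values), max(values)
    if low_center == high_center:
        return low_center  # all values identical, no boundary to find
    for _ in range(iterations):
        # Assign each value to its nearest center (ties go to the low cluster).
        low = [v for v in values if abs(v - low_center) <= abs(v - high_center)]
        high = [v for v in values if abs(v - low_center) > abs(v - high_center)]
        new_low = sum(low) / len(low)
        new_high = sum(high) / len(high)
        if (new_low, new_high) == (low_center, high_center):
            break  # centers stopped moving, so the clustering has converged
        low_center, high_center = new_low, new_high
    return (low_center + high_center) / 2


threshold = two_means_threshold([0.9, 0.85, 0.8, 0.3, 0.2])
```

With these five values the clusters settle at centers of about 0.25 and 0.85, giving a threshold near 0.55.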
[0035] Based on semantic boundary thresholds and sentence semantic vector sets, highly semantically related consecutive sentences are aggregated into independent paragraphs. A lightweight LLM model is then used to verify the completeness of the segmented paragraphs; if a paragraph does not express complete semantics, it is returned for reprocessing. After successful verification, each paragraph is tagged with a type, such as outline, overview, or process description, using a feature library of enterprise management document paragraphs. The final output is a set of structured semantic paragraphs with feature annotations.
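The aggregation step can be sketched as follows, assuming the similarity list holds one value per pair of consecutive sentences. Treating a similarity equal to the threshold as "same paragraph" is an assumption; the LLM integrity check and the type tagging are not shown.

```python
def segment_paragraphs(sentences: list[str], similarities: list[float],
                       threshold: float) -> list[list[str]]:
    """Group consecutive sentences, starting a new paragraph at each low similarity."""
    if len(similarities) != max(len(sentences) - 1, 0):
        raise ValueError("expected one similarity per adjacent sentence pair")
    if not sentences:
        return []
    paragraphs = [[sentences[0]]]
    for sentence, similarity in zip(sentences[1:], similarities):
        if similarity >= threshold:
            paragraphs[-1].append(sentence)  # same topic, extend the paragraph
        else:
            paragraphs.append([sentence])    # topic shift, start a new paragraph
    return paragraphs
```

For four sentences with similarities 0.9, 0.2, and 0.8 and a threshold of 0.55, the result is two paragraphs of two sentences each.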
[0036] The identification and extraction of enterprise management elements from the structured semantic paragraph set specifically includes: Construct a feature library of enterprise management elements and determine the characteristic keywords and syntactic structures of key control points, job roles, input and output items, and reference standards; The structured semantic paragraph set is matched with the enterprise management element feature library, including extracting key control points using a keyword dictionary, syntactic analysis model and BERT fine-tuning model for the structured semantic paragraph set, extracting job roles using a named entity recognition model, extracting input and output items using feature word matching, and extracting citation standards using citation identification technology and semantic analysis technology, and outputting the first enterprise management element set; The first set of enterprise management elements is linked and matched with the enterprise organizational structure library and the standardized terminology library. The terminology in the first set of enterprise management elements is normalized, and the second set of enterprise management elements is output.
[0037] First, establish a feature database of enterprise management elements, clarifying the identification criteria for four categories of core elements. Among them, key control points focus on management keywords such as "approval, review, and monitoring" and the syntactic structure of "action + object"; job roles clarify the descriptive characteristics of job and department names; input and output items lock in feature words such as "based on, submit, and generate" and the description rules of corresponding objects; and citation standards define citation identifiers such as "refer to a certain process" and the semantic characteristics of implicit citations.
[0038] A hybrid extraction algorithm combining rule engines and deep learning models is employed to match the structured semantic paragraph set with the enterprise management element feature library segment by segment. For key control points, a keyword dictionary and syntactic analysis are used to locate core action phrases, followed by the identification of implicit control points using a fine-tuned BERT model, filtering out non-core content. For job roles, a Named Entity Recognition (NER) model is used to accurately extract job and department names from the text. For input and output items, feature word matching is used to identify associated objects and data format requirements. For citation standards, the names and numbers of explicit citations are first extracted using citation identifiers, followed by semantic analysis to uncover implicit citation relationships. Finally, the resulting set integrates and outputs a first enterprise management element set containing four categories of elements.
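The rule-engine half of this hybrid approach might look like the sketch below. It covers only keyword and pattern matching on English text; the fine-tuned BERT model, the NER model for job roles, and implicit-citation analysis are not modelled. The keyword lists, the citation pattern, and the function name are illustrative assumptions, not the feature library itself.

```python
import re

# Hypothetical feature library entries, taken from the examples in the text.
CONTROL_KEYWORDS = ("approve", "approval", "review", "monitor")
IO_FEATURE_WORDS = ("based on", "submit", "generate")
CITATION_PATTERN = re.compile(r"refer to (?:the )?([^.;,]+)", re.IGNORECASE)


def extract_elements(paragraph):
    """Rule-based pass: tag sentences by keyword and pull explicit citations."""
    sentences = [s.strip() for s in re.split(r"(?<=[.;])\s+", paragraph) if s.strip()]
    result = {"key_control_points": [], "input_output_items": [], "citations": []}
    for sentence in sentences:
        lowered = sentence.lower()
        if any(keyword in lowered for keyword in CONTROL_KEYWORDS):
            result["key_control_points"].append(sentence)
        if any(word in lowered for word in IO_FEATURE_WORDS):
            result["input_output_items"].append(sentence)
        result["citations"].extend(
            match.strip() for match in CITATION_PATTERN.findall(sentence))
    return result


found = extract_elements(
    "The department manager must approve the purchase request. "
    "Submit the signed form to finance; refer to the Procurement Process."
)
```

Substring matching of this kind is coarse; it is shown only to make the division of labour between rules and models concrete.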
[0039] The first set of enterprise management elements is linked and validated with the enterprise organizational structure database and the standardized terminology database. The organizational structure database standardizes the descriptions of job titles and departments, avoiding situations where "Personnel Specialist" and "Human Resources Specialist" are used interchangeably. The standardized terminology database unifies the descriptions of key control points and input / output items, eliminating subjective differences in expression. This ensures consistency in element descriptions and outputs a standardized second set of enterprise management elements.
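Terminology normalization of this kind can be sketched as a lookup against an alias table. The table contents and the choice of "Human Resources Specialist" as the canonical form are assumptions made for illustration; the text does not say which variant is canonical.

```python
# Hypothetical excerpt of an organizational structure library:
# lower-cased alias -> canonical job title.
ROLE_ALIASES = {
    "personnel specialist": "Human Resources Specialist",
    "hr specialist": "Human Resources Specialist",
}


def normalize_term(term, alias_table):
    """Map a term to its canonical form; unknown terms pass through unchanged."""
    cleaned = " ".join(term.split())  # collapse repeated whitespace
    return alias_table.get(cleaned.lower(), cleaned)


def normalize_elements(elements, alias_table):
    """Normalize every element text in a list."""
    return [normalize_term(element, alias_table) for element in elements]


normalized = normalize_elements(
    ["Personnel  Specialist", "Human Resources Specialist", "Quality Manager"],
    ROLE_ALIASES,
)
```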
[0040] The second set of enterprise management elements is standardized, specifically including: Extract the second enterprise management elements from the second enterprise management element set, determine the texts of the two second enterprise management elements to be compared as text a and text b, and calculate the length of text a as m and the length of text b as n; Construct the edit distance d[i][j], where d[i][j] represents the edit distance between the first i characters of text a and the first j characters of text b. The recursive formula for the edit distance d[i][j] is as follows:

$$d[i][j] = \begin{cases} j, & i = 0 \\ i, & j = 0 \\ d[i-1][j-1], & a[i-1] = b[j-1] \\ 1 + \min\left(d[i-1][j],\; d[i][j-1],\; d[i-1][j-1]\right), & a[i-1] \neq b[j-1] \end{cases}$$

In the formula, a[i−1] represents the i-th character of text a; b[j−1] represents the j-th character of text b; Obtain the edit distance d[m][n] by recursion, and calculate the text similarity similarity(a,b) between texts a and b. The corresponding calculation formula is as follows:

$$\mathrm{similarity}(a,b) = 1 - \frac{d[m][n]}{\max(m,n)}$$

In the formula, max(m,n) represents the larger of length m and length n; Set a text similarity threshold, determine the two second enterprise management elements corresponding to texts a and b whose text similarity similarity(a,b) is higher than the text similarity threshold as duplicate enterprise management elements and perform deduplication, and output the enterprise management element library; Metadata information is added to each enterprise management element in the enterprise management element library to construct the enterprise management element library containing metadata information.
[0041] From the second set of enterprise management elements, extract the element texts to be standardized, and arbitrarily select two sets of element texts as comparison objects, namely text a and text b. Using the text length statistics function, obtain the character length m of text a and the character length n of text b, respectively.
[0042] Based on the character sequences of texts a and b, an edit distance model is constructed to measure the degree of character difference between the two texts. The model quantifies the character-level difference by calculating the minimum number of character insertion, deletion, and replacement operations needed to turn the first i characters of text a into the first j characters of text b. Through recursive calculation over all prefixes, the overall edit distance between the two texts is obtained, reflecting their basic similarity.
[0043] The edit distance is converted into text similarity, with smaller edit distances indicating higher text similarity. A reasonable similarity threshold is set based on the characteristics of enterprise management elements. All elements to be processed are compared pairwise. If the similarity between two sets of elements exceeds the threshold, they are considered duplicate elements. Duplicate elements are then deduplicated, retaining only unique and valid elements to avoid redundancy in the element library, and the enterprise management element library is output.
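The deduplication step can be sketched as below, assuming the standard Levenshtein recurrence and a similarity of 1 minus the edit distance divided by the longer text length. The threshold of 0.8 and the keep-the-first-occurrence policy are assumptions; the text does not fix either.

```python
def edit_distance(a, b):
    """Levenshtein distance via the dynamic-programming table d[i][j]."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or replacement
    return d[m][n]


def text_similarity(a, b):
    """1 - distance / max length; two empty texts count as identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def deduplicate(elements, threshold=0.8):
    """Keep an element only if it is not too similar to one already kept."""
    unique = []
    for text in elements:
        if all(text_similarity(text, kept) <= threshold for kept in unique):
            unique.append(text)
    return unique


unique_elements = deduplicate(
    ["contract approval", "contract approvals", "budget review"])
```

Pairwise comparison is quadratic in the number of elements, which is acceptable for a sketch but may need blocking or indexing for large element libraries.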
[0044] Each enterprise management element in the enterprise management element database is supplemented with complete metadata information, including the element's unique identifier, name, definition, source document, attribute tags, applicable department, business domain, and other core information. Through metadata supplementation, a structured description of the enterprise management elements is achieved, ensuring that each element is traceable and identifiable, ultimately constructing a standardized enterprise management element database containing metadata information.
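A metadata record of this shape could be modelled as in the sketch below. The class name, field names, and the use of a random UUID as the unique identifier are assumptions chosen to mirror the metadata items listed above.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ManagementElement:
    """Hypothetical metadata record for one enterprise management element."""
    name: str
    definition: str
    source_document: str
    attribute_tags: list = field(default_factory=list)
    applicable_department: str = ""
    business_domain: str = ""
    # Assumption: a random UUID serves as the unique identifier.
    element_id: str = field(default_factory=lambda: uuid.uuid4().hex)


element = ManagementElement(
    name="Purchase contract approval",
    definition="Approval of purchase contracts before signing",
    source_document="Procurement Management System",
    attribute_tags=["key control point"],
)
```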
[0045] Based on the multi-dimensional similarity matching model, the name similarity of enterprise management elements in the enterprise management element database is calculated, specifically including: Extract the name keyword sets $K_A$ and $K_B$ of enterprise management elements A and B from the enterprise management element database, respectively, where p and q represent the number of elements in $K_A$ and $K_B$, respectively; Calculate the number of intersection elements $|K_A \cap K_B|$ and the number of union elements $|K_A \cup K_B|$ of the name keyword sets $K_A$ and $K_B$; Based on the number of intersection elements and the number of union elements, the improved Jaccard similarity algorithm is used to calculate the name similarity namesim(A,B) between enterprise management elements A and B. The corresponding calculation formula is as follows:

$$\mathrm{namesim}(A,B) = \alpha \cdot \frac{|K_A \cap K_B|}{|K_A \cup K_B|}$$

In the formula, $\alpha$ represents the word order matching coefficient: when the word order is completely identical, $\alpha = 1$; when the word order is partially consistent, $\alpha$ is the percentage of word order overlap of the keywords.
[0046] Select enterprise management elements A and B to be matched from the enterprise management element database. Based on a management domain keyword dictionary, extract the core keywords from their names to form independent name keyword sets. The keyword set for enterprise management element A contains p core words, and the keyword set for enterprise management element B contains q core words. The extraction process filters out redundant modifiers in the names and focuses on terms related to the core attributes of the enterprise management elements to ensure the relevance and effectiveness of the name keyword sets.
[0047] Perform set operations on the name keyword sets of enterprise management elements A and B, count the number of common name keywords (i.e., the number of intersection elements) to reflect the degree of overlap of the core vocabulary of the names; at the same time, calculate the total number of all unique name keywords in the two sets (i.e., the number of union elements) to reflect the overall coverage of the name keywords.
[0048] An improved Jaccard similarity algorithm is adopted, based on the ratio of the number of elements in the intersection to the number of elements in the union, and a word order matching coefficient is introduced for correction. The word order matching coefficient is set in conjunction with the expression norms of enterprise management element names. When the keyword word order of two enterprise management element names is completely identical, the coefficient is set to 1; when the word order partially overlaps, the coefficient is calculated according to the actual overlap ratio. This makes up for the deficiency of the traditional Jaccard algorithm, which only considers word overlap and ignores the influence of word order. Finally, the name similarity of enterprise management elements A and B is obtained. This result accurately reflects the degree of correlation between the core words and the expression order of enterprise management element names.
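A possible reading of this improved Jaccard measure is sketched below. Two points are assumptions: the coefficient multiplies the Jaccard ratio, and the "overlap ratio" is computed as the longest common subsequence of the shared keywords in their two orders divided by the number of shared keywords. The text does not specify how the overlap ratio is measured, so other definitions are equally plausible.

```python
def lcs_length(xs, ys):
    """Length of the longest common subsequence of two sequences."""
    table = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i, x in enumerate(xs, 1):
        for j, y in enumerate(ys, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(xs)][len(ys)]


def word_order_coefficient(keywords_a, keywords_b):
    """Assumed definition: share of common keywords appearing in the same order."""
    shared = set(keywords_a) & set(keywords_b)
    if not shared:
        return 0.0
    order_a = [k for k in keywords_a if k in shared]
    order_b = [k for k in keywords_b if k in shared]
    return lcs_length(order_a, order_b) / len(shared)


def name_similarity(keywords_a, keywords_b):
    """Jaccard ratio of the keyword sets, scaled by the word order coefficient."""
    a = list(dict.fromkeys(keywords_a))  # drop duplicates, keep order
    b = list(dict.fromkeys(keywords_b))
    union = set(a) | set(b)
    if not union:
        return 0.0
    jaccard = len(set(a) & set(b)) / len(union)
    return jaccard * word_order_coefficient(a, b)


same = name_similarity(["purchase", "contract", "approval"],
                       ["purchase", "contract", "approval"])
swapped = name_similarity(["purchase", "contract", "approval"],
                          ["contract", "purchase", "review"])
```

In the second call the two names share two of four distinct keywords (ratio 0.5) and only one of the two shared keywords keeps its relative order (coefficient 0.5), giving 0.25.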
[0049] Based on the multi-dimensional similarity matching model, the semantic similarity of enterprise management elements in the enterprise management element database is calculated, specifically including: The core descriptive texts of enterprise management elements A and B in the enterprise management element library are extracted and input into the BERT fine-tuning model to be converted into the corresponding element semantic vectors $v_A$ and $v_B$; Based on the element semantic vectors $v_A$ and $v_B$, the semantic similarity semsim(A,B) between enterprise management elements A and B is calculated using the cosine similarity algorithm. A semantic knowledge base in the management domain is then introduced to correct any deviations in the semantic similarity semsim(A,B). The formula for calculating the semantic similarity semsim(A,B) is as follows:

$$\mathrm{semsim}(A,B) = \frac{v_A \cdot v_B}{\lVert v_A \rVert \, \lVert v_B \rVert}$$
[0050] Select enterprise management elements A and B to be matched from the enterprise management element database, and extract their core descriptive text. The core descriptive text should focus on the essential attributes of the element, including the description of actions and objects of key control points, the scope of responsibilities of job roles, specific requirements of input and output items, and core clauses of referenced standards, etc., and remove redundant and modifying information to ensure that the text accurately reflects the core semantics of the element.
[0051] The extracted core descriptive text is input into a BERT model finely tuned using a corpus from the management domain. This model learns the specialized terminology, expression habits, and semantic logic of enterprise management scenarios, converting the text into a 768-dimensional semantic vector of elements. This vector not only captures literal information but also deeply represents the text's underlying semantic features.
[0052] Based on the semantic vectors of enterprise management elements A and B, a cosine similarity algorithm is used to calculate their semantic relevance. This algorithm quantifies the degree of semantic fit by measuring the angle between the vectors, with a value ranging from 0 to 1. The closer the value is to 1, the stronger the semantic relevance between the two elements, such as "purchase contract approval" and "purchase agreement review"; the closer the value is to 0, the greater the semantic difference, such as "production process monitoring" and "financial statement submission".
[0053] A management-specific semantic knowledge base is introduced to correct biases in the initial calculation results. This knowledge base includes synonym relationships, near-synonyms, and contextualized semantic mappings for management terms, which can correct the judgment biases of the general semantic model in professional scenarios. Through knowledge base calibration, the semantic similarity results are ensured to better reflect the actual management practices of enterprises, improving the accuracy and adaptability of the matching of enterprise management elements.
[0054] Based on the multi-dimensional similarity matching model, intelligent association matching is performed on the enterprise management elements in the enterprise management element database, and a dynamically updated association network of the enterprise management elements is output, specifically including: Based on the multi-dimensional similarity matching model, the attribute similarity attrsim(A,B) and source association srcsim(A,B) of enterprise management elements A and B in the enterprise management element database are calculated. Combined with the name similarity namesim(A,B) and the semantic similarity semsim(A,B), the total association similarity totalsim(A,B) of enterprise management elements A and B is calculated. The corresponding calculation formula is as follows:

$$\mathrm{totalsim}(A,B) = w_1\,\mathrm{namesim}(A,B) + w_2\,\mathrm{semsim}(A,B) + w_3\,\mathrm{attrsim}(A,B) + w_4\,\mathrm{srcsim}(A,B)$$

In the formula, $w_1$, $w_2$, $w_3$ and $w_4$ are the weight coefficients of the four dimensions, with $w_1 + w_2 + w_3 + w_4 = 1$; Set the first association similarity threshold $T_1$ and the second association similarity threshold $T_2$, and compare the total association similarity totalsim(A,B) with $T_1$ and $T_2$: if $\mathrm{totalsim}(A,B) \geq T_2$, the relationship between enterprise management elements A and B is automatically established and the relationship type and degree of matching are marked; if $T_1 \leq \mathrm{totalsim}(A,B) < T_2$, a list of items to be reviewed is generated and sent to the manual review end, which determines whether there is a relationship between enterprise management elements A and B; if $\mathrm{totalsim}(A,B) < T_1$, it is determined that no relationship exists between enterprise management elements A and B; The dynamically updated relationship network of the enterprise management elements is then output.
[0055] In addition to name similarity and semantic similarity, two key dimensions are added: attribute similarity and source relevance. Attribute similarity is determined by comparing the attribute labels of elements; the more overlapping attributes, the higher the similarity. Source relevance focuses on the source scenario of elements. If two elements come from the same management document or the same business process system, they are given additional relevance weights to improve the accuracy of cross-dimensional matching and to match the business relevance characteristics of enterprise management elements.
[0056] A dynamic weighting mechanism is adopted, configuring adjustable weight coefficients for four dimensions: name similarity, semantic similarity, attribute similarity, and source relevance. The sum of the weight coefficients is 1, which can be flexibly optimized according to enterprise management needs. The total relevance similarity between enterprise management elements A and B is calculated by weighted summation. This result comprehensively reflects the relevance and fit of elements at four levels: literal, semantic, attribute, and business scenario, avoiding the limitations of single-dimensional judgment.
[0057] The first association similarity threshold is set to 0.6, and the second association similarity threshold is set to 0.7. The total association similarity is compared with the two thresholds. If the total association similarity is greater than or equal to the second association similarity threshold, it is determined to be a strong association, and the association relationship between the enterprise's related elements is automatically established, while the association type and association matching degree are marked. If the total association similarity is between the first and second association similarity thresholds, it is determined to be a suspected association, and a list of pending review is generated and pushed to the manual review end. The administrator confirms whether to establish an association based on the actual business situation. If the total association similarity is less than the first association similarity threshold, it is determined to be unrelated, and no association relationship is established after marking.
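The weighted summation and the three-way decision can be sketched as follows. The thresholds 0.6 and 0.7 come from the text; the weight values and all names are hypothetical, since the text only requires that the four weights sum to 1.

```python
# Hypothetical weights; the text only requires that they sum to 1.
DEFAULT_WEIGHTS = {"name": 0.3, "semantic": 0.4, "attribute": 0.2, "source": 0.1}


def total_similarity(scores, weights=DEFAULT_WEIGHTS):
    """Weighted sum of the four per-dimension similarity scores."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[key] * scores[key] for key in weights)


def classify(total, first_threshold=0.6, second_threshold=0.7):
    """Map a total similarity onto the three outcomes described in the text."""
    if total >= second_threshold:
        return "strong"     # association established automatically
    if total >= first_threshold:
        return "suspected"  # pushed to manual review
    return "unrelated"      # no association established


total = total_similarity(
    {"name": 0.8, "semantic": 0.9, "attribute": 0.5, "source": 1.0})
decision = classify(total)
```

Keeping the weights in a dictionary makes it straightforward to tune them per enterprise, which matches the adjustable weighting described above.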
[0058] When a new enterprise management element is added to the enterprise management element database, a batch matching process is automatically triggered to complete the association calculation with existing enterprise management elements. When the attributes, descriptions, etc., of an existing enterprise management element change, the total association similarity with related enterprise management elements is recalculated, and the association relationship and association matching degree are updated. Through real-time dynamic maintenance, the timeliness and accuracy of the enterprise management element association network are ensured, and a dynamically updated enterprise management element association network is ultimately output.
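The dynamic maintenance described here can be sketched as a small in-memory structure that re-matches an element against all others whenever it is added or changed. The class, its method names, and the toy `tag_overlap` similarity function are hypothetical; a real system would plug in the total association similarity and persistent storage.

```python
class AssociationNetwork:
    """In-memory sketch of an association network with incremental updates."""

    def __init__(self, similarity_fn, first_threshold=0.6, second_threshold=0.7):
        self.similarity_fn = similarity_fn
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        self.elements = {}      # element id -> element payload
        self.edges = {}         # frozenset of two ids -> score (strong links)
        self.review_queue = {}  # frozenset of two ids -> score (suspected links)

    def upsert(self, element_id, element):
        """Add a new element or update an existing one, then re-match it."""
        self.elements[element_id] = element
        for other_id, other in self.elements.items():
            if other_id == element_id:
                continue
            pair = frozenset((element_id, other_id))
            # Drop any stale result for this pair before recalculating.
            self.edges.pop(pair, None)
            self.review_queue.pop(pair, None)
            score = self.similarity_fn(element, other)
            if score >= self.second_threshold:
                self.edges[pair] = score
            elif score >= self.first_threshold:
                self.review_queue[pair] = score


def tag_overlap(a, b):
    """Toy similarity: Jaccard overlap of two tag sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


net = AssociationNetwork(tag_overlap)
net.upsert("E1", {"purchase", "contract", "approval"})
net.upsert("E2", {"purchase", "contract", "approval"})
linked_before = frozenset(("E1", "E2")) in net.edges
net.upsert("E2", {"finance", "report"})  # E2 changes, the link is re-evaluated
linked_after = frozenset(("E1", "E2")) in net.edges
```

Only pairs involving the changed element are recalculated, so an update costs one pass over the existing elements rather than a full pairwise rebuild.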
[0059] Figure 2 is a system block diagram of an automatic extraction and association matching system for enterprise management elements provided by an embodiment of the present invention. The system includes: a paragraph decomposition module, used to perform sentence-by-sentence preprocessing and semantic-driven structured decomposition of enterprise management documents and output a set of structured semantic paragraphs with feature annotations; an element extraction module, used to identify, extract, and standardize the enterprise management elements of the structured semantic paragraph set and output an enterprise management element library containing metadata information; an association matching module, used to construct a multi-dimensional similarity matching model, perform intelligent association matching on enterprise management elements in the enterprise management element library, and output a dynamically updated enterprise management element association network; and a change analysis module, used to capture changes to the enterprise management element library and analyze the scope of their impact based on the management element association network, and output a change impact analysis report and change early warning information.
[0060] The apparatus of the embodiment shown in Figure 2 can be used to perform the steps of the method embodiment shown in Figure 1. Its implementation principle and technical effects are similar and will not be repeated here.
[0061] An electronic device includes a memory and a processor, wherein the memory stores a computer program, and when the processor runs the computer program stored in the memory, the processor performs the steps of an automatic extraction and association matching method for enterprise management elements as described in any of the preceding claims.
[0062] Figure 3 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present invention. The electronic device 30 includes: a processor 31, a memory 32, and a computer program; wherein... The memory 32 is used to store the computer program, and the memory may also be flash memory. The computer program is, for example, an application program or functional module that implements the above method.
[0063] Processor 31 is configured to execute the computer program stored in the memory to implement the various steps performed by the device in the above method. For details, please refer to the relevant descriptions in the preceding method embodiments.
[0064] Optionally, the memory 32 may be either standalone or integrated with the processor 31.
[0065] When the memory 32 is a device independent of the processor 31, the device may further include a bus 33 for connecting the memory 32 and the processor 31.
[0066] A readable storage medium storing a computer program, which, when executed by a processor, is used to implement the steps of an automatic extraction and association matching method for enterprise management elements as described in any of the preceding claims.
[0067] The readable storage medium can be a computer storage medium or a communication medium. A communication medium includes any medium that facilitates the transfer of computer programs from one location to another. A computer storage medium can be any available medium accessible to a general-purpose or special-purpose computer. For example, a readable storage medium is coupled to a processor, enabling the processor to read information from and write information to the readable storage medium. Of course, the readable storage medium can also be a component of the processor. The processor and the readable storage medium can reside in an Application-Specific Integrated Circuit (ASIC). Alternatively, the ASIC can be located in a user equipment. Of course, the processor and the readable storage medium can also exist as discrete components in a communication device. The readable storage medium can be a read-only memory (ROM), random access memory (RAM), CD-ROM, magnetic tape, floppy disk, and optical data storage device, etc.
[0068] The present invention also provides a program product including executable instructions stored in a readable storage medium. At least one processor of the device can read the executable instructions from the readable storage medium, and the at least one processor executes the executable instructions to cause the device to implement the methods provided in the various embodiments described above.
[0069] In the embodiments of the above-described device, it should be understood that the processor can be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), etc. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in this invention can be directly manifested as execution by a hardware processor, or execution by a combination of hardware and software modules within the processor.
[0070] Through the above embodiments, this invention, through its method and system for automatic extraction and association matching of enterprise management elements, performs sentence preprocessing and semantic-driven structured decomposition of enterprise management documents, outputting a set of structured semantic paragraphs with feature annotations; identifies, extracts, and standardizes enterprise management elements from the set of structured semantic paragraphs, outputting an enterprise management element library containing metadata information; constructs a multi-dimensional similarity matching model to intelligently associate and match enterprise management elements in the enterprise management element library, outputting a dynamically updated enterprise management element association network; and based on the management element association network, captures changes to the enterprise management element library and analyzes the scope of change impact, outputting a change impact analysis report and change warning information. This enables automatic extraction, intelligent association matching, change impact analysis, and visualization of enterprise management elements, achieving intelligent management and control of the entire lifecycle of enterprise management elements.
[0071] This invention automatically extracts enterprise management elements and establishes relationships between them, replacing the traditional manual sorting method. This improves the efficiency of enterprise management element extraction and association by over 80%, reducing repetitive work for managers, avoiding human error, and lowering management costs. This invention can automatically analyze the impact range of changes to enterprise management elements and provide early warnings of anomalies, ensuring that after any element changes, related elements are accurately identified, allowing managers to make timely and synchronized modifications, reducing compliance risks and management blind spots. This invention employs a multi-dimensional similarity matching algorithm, combined with a management-specific corpus and semantic knowledge base. Compared to general text matching algorithms, the accuracy of enterprise management element association is improved to over 90%. It is adaptable to the characteristics of management elements in different industries and enterprises of different sizes, exhibiting strong versatility and scalability. This invention uses multi-format visualization to display the impact range of changes to enterprise management elements, helping managers quickly grasp the chain reaction of changes, accurately judge the degree of impact, optimize change decisions, reduce change costs, and improve the consistency and dynamic optimization capabilities of the enterprise management system. This invention lays the foundation for the digital and intelligent upgrading of enterprise management systems through a standardized enterprise management element database and related network, realizes the interconnection of management data, and improves the overall management efficiency of enterprises.
[0072] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features therein. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention.
Claims
1. A method for automatically extracting and associating enterprise management elements, characterized in that the method includes: performing sentence-by-sentence preprocessing and semantic-driven structured decomposition on enterprise management documents, and outputting a set of structured semantic paragraphs with feature annotations; performing enterprise management element identification, extraction and standardization processing on the structured semantic paragraph set, and outputting an enterprise management element library containing metadata information; constructing a multi-dimensional similarity matching model, performing intelligent association matching on the enterprise management elements in the enterprise management element library, and outputting a dynamically updated enterprise management element association network; and, based on the management element association network, capturing changes to the enterprise management element library and analyzing the scope of their impact, and outputting a change impact analysis report and change warning information.
2. The method for automatic extraction and correlation matching of enterprise management elements according to claim 1, characterized in that, The sentence-segment preprocessing of enterprise management documents specifically includes: Obtain the enterprise management documents in different formats and match them with the corresponding document extraction toolkits. Use the document extraction toolkits to extract the plain text content of the enterprise management documents. The plain text content of the document is cleaned by removing headers, footers, page numbers, duplicate blank lines, special symbols, and correcting punctuation errors. The cleaned plain text content of the document is then output. The cleaned document's plain text content is atomically segmented according to punctuation marks to generate a list of sentences. Perform semantic coherence verification on the sentence list, remove meaningless sentences, and output the verified sentence list.
3. The method for automatic extraction and correlation matching of enterprise management elements according to claim 2, characterized in that, The validated list of sentences undergoes semantic-driven structured decomposition, specifically including: Convert the sentences in the validated sentence list into corresponding sentence semantic vectors and output the sentence semantic vector set. The cosine similarity algorithm is used to calculate the semantic similarity $s_i$ between the semantic vectors $v_i$ and $v_{i+1}$ of adjacent sentences in the sentence semantic vector set, and a list of semantic similarities is output, wherein the semantic similarity $s_i$ is calculated as follows:

$$s_i = \frac{v_i \cdot v_{i+1}}{\lVert v_i \rVert \, \lVert v_{i+1} \rVert}$$

The K-means clustering algorithm is used to divide the semantic similarity in the semantic similarity list into high-relevance semantic similarity and low-relevance semantic similarity. The boundary value between the high-relevance semantic similarity and the low-relevance semantic similarity is calculated and set as the semantic boundary judgment threshold. Based on the semantic boundary judgment threshold and the sentence semantic vector set, the sentences are divided into paragraphs, and then a lightweight LLM model is called to perform integrity verification on the divided paragraphs, outputting the structured semantic paragraph set with feature annotations.
4. The method for automatic extraction and correlation matching of enterprise management elements according to claim 1, characterized in that, The identification and extraction of enterprise management elements from the structured semantic paragraph set specifically includes: Construct a feature library of enterprise management elements and determine the characteristic keywords and syntactic structures of key control points, job roles, input and output items, and reference standards; The structured semantic paragraph set is matched with the enterprise management element feature library, including extracting key control points using a keyword dictionary, syntactic analysis model and BERT fine-tuning model for the structured semantic paragraph set, extracting job roles using a named entity recognition model, extracting input and output items using feature word matching, and extracting citation standards using citation identification technology and semantic analysis technology, and outputting the first enterprise management element set; The first set of enterprise management elements is linked and matched with the enterprise organizational structure library and the standardized terminology library. The terminology in the first set of enterprise management elements is normalized, and the second set of enterprise management elements is output.
5. The method for automatic extraction and correlation matching of enterprise management elements according to claim 4, characterized in that, The second set of enterprise management elements is standardized, specifically including: Extract the second enterprise management elements from the second enterprise management element set, determine the texts of the two second enterprise management elements to be compared as text a and text b, and calculate the length of text a as m and the length of text b as n; Construct the edit distance d[i][j], where d[i][j] represents the edit distance between the first i characters of text a and the first j characters of text b. The recursive formula for the edit distance d[i][j] is as follows:

$$d[i][j] = \begin{cases} j, & i = 0 \\ i, & j = 0 \\ d[i-1][j-1], & a[i-1] = b[j-1] \\ 1 + \min\left(d[i-1][j],\; d[i][j-1],\; d[i-1][j-1]\right), & a[i-1] \neq b[j-1] \end{cases}$$

In the formula, a[i−1] represents the i-th character of text a; b[j−1] represents the j-th character of text b; Obtain the edit distance d[m][n] by recursion, and calculate the text similarity similarity(a,b) between texts a and b. The corresponding calculation formula is as follows:

$$\mathrm{similarity}(a,b) = 1 - \frac{d[m][n]}{\max(m,n)}$$

In the formula, max(m,n) represents the larger of length m and length n; Set a text similarity threshold, determine the two second enterprise management elements corresponding to texts a and b whose text similarity similarity(a,b) is higher than the text similarity threshold as duplicate enterprise management elements and perform deduplication, and output the enterprise management element library; Metadata information is added to each enterprise management element in the enterprise management element library to construct the enterprise management element library containing metadata information.
6. The method for automatic extraction and correlation matching of enterprise management elements according to claim 1, characterized in that calculating, based on the multi-dimensional similarity matching model, the name similarity of enterprise management elements in the enterprise management element library specifically includes: extracting the name keyword sets of enterprise management elements A and B from the enterprise management element library, where p and q respectively represent the numbers of elements in the name keyword set of A and the name keyword set of B; calculating the number of intersection elements and the number of union elements of the two name keyword sets; and, based on the number of intersection elements and the number of union elements, calculating the name similarity namesim(A,B) between enterprise management elements A and B using an improved Jaccard similarity algorithm that incorporates a word order matching coefficient, the word order matching coefficient reflecting whether the word order of the keywords is completely identical and otherwise representing the percentage of word order overlap of the keywords.
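One possible reading of the improved Jaccard calculation in claim 6 is sketched below. The formula itself is not reproduced in the text, so two points are assumptions made only for illustration: that the word order matching coefficient multiplies the plain Jaccard ratio, and that the coefficient is the fraction of shared keywords occupying the same relative position in both names (equal to 1 when the order is identical). The function names are hypothetical.

```python
def word_order_coefficient(keys_a: list[str], keys_b: list[str]) -> float:
    """Fraction of shared keywords found at the same relative position.

    Assumed definition: restrict both keyword lists to the shared keywords,
    keep their original order, and count position-wise agreements.
    """
    keys_a = list(dict.fromkeys(keys_a))  # drop repeats, keep first occurrence
    keys_b = list(dict.fromkeys(keys_b))
    shared = set(keys_a) & set(keys_b)
    if not shared:
        return 0.0
    seq_a = [k for k in keys_a if k in shared]
    seq_b = [k for k in keys_b if k in shared]
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return matches / len(shared)


def name_similarity(keys_a: list[str], keys_b: list[str]) -> float:
    """Jaccard ratio of the keyword sets, scaled by the word order coefficient."""
    set_a, set_b = set(keys_a), set(keys_b)
    union = set_a | set_b
    if not union:
        return 0.0
    jaccard = len(set_a & set_b) / len(union)
    return jaccard * word_order_coefficient(keys_a, keys_b)


if __name__ == "__main__":
    a = ["supplier", "quality", "audit"]
    b = ["supplier", "quality", "review"]
    c = ["quality", "supplier", "audit"]
    print(name_similarity(a, b))  # same order for the shared keywords
    print(name_similarity(a, c))  # same keywords, different order
```

Under these assumptions, two names with identical keyword sets but a different keyword order score lower than 1, which is the effect the word order matching coefficient appears intended to produce.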
7. The method for automatic extraction and correlation matching of enterprise management elements according to claim 1, characterized in that calculating, based on the multi-dimensional similarity matching model, the semantic similarity of enterprise management elements in the enterprise management element library specifically includes: extracting the core descriptive texts of enterprise management elements A and B in the enterprise management element library, and inputting them into the fine-tuned BERT model to convert them into the corresponding element semantic vectors $\mathbf{v}_A$ and $\mathbf{v}_B$; and, based on the element semantic vectors $\mathbf{v}_A$ and $\mathbf{v}_B$, calculating the semantic similarity semsim(A,B) between enterprise management elements A and B using the cosine similarity algorithm, and introducing a management-domain semantic knowledge base to correct deviations in the semantic similarity semsim(A,B), the formula for calculating the semantic similarity semsim(A,B) being as follows:

$$\mathrm{semsim}(A,B) = \frac{\mathbf{v}_A \cdot \mathbf{v}_B}{\lVert \mathbf{v}_A \rVert \, \lVert \mathbf{v}_B \rVert}$$

8. The method for automatic extraction and correlation matching of enterprise management elements according to claim 6 or 7, characterized in that performing, based on the multi-dimensional similarity matching model, intelligent association matching on the enterprise management elements in the enterprise management element library and outputting a dynamically updated association network of the enterprise management elements specifically includes: calculating, based on the multi-dimensional similarity matching model, the attribute similarity attrsim(A,B) and the source association srcsim(A,B) of enterprise management elements A and B in the enterprise management element library, and, in combination with the name similarity namesim(A,B) and the semantic similarity semsim(A,B), calculating the total association similarity totalsim(A,B) of enterprise management elements A and B.
A first association similarity threshold and a second association similarity threshold are set, and the total association similarity totalsim(A,B) is compared with both thresholds: if totalsim(A,B) reaches the first association similarity threshold, the relationship between enterprise management elements A and B is automatically established, and the relationship type and degree of matching are marked; if totalsim(A,B) falls between the second association similarity threshold and the first association similarity threshold, a list of items to be reviewed is generated and sent to the manual review end, which determines whether a relationship exists between enterprise management elements A and B; if totalsim(A,B) is below the second association similarity threshold, it is determined that no relationship exists between enterprise management elements A and B; and the dynamically updated association network of the enterprise management elements is output.
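The semantic similarity of claim 7 and the threshold routing of claim 8 can be sketched together. Plain Python lists stand in for the semantic vectors that the fine-tuned BERT model would produce, and the knowledge-base correction is omitted. The combination of the four similarity values is shown as a weighted sum, which is an assumption: the source formula is not reproduced in the text. The weights in `WEIGHTS`, both threshold values, and all function names are hypothetical.

```python
import math

# Hypothetical weights for the four similarity dimensions (assumed to sum to 1).
WEIGHTS = {"name": 0.3, "semantic": 0.4, "attribute": 0.2, "source": 0.1}


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def total_similarity(name_sim: float, sem_sim: float,
                     attr_sim: float, src_sim: float) -> float:
    """Assumed weighted sum of the four similarity values."""
    return (WEIGHTS["name"] * name_sim
            + WEIGHTS["semantic"] * sem_sim
            + WEIGHTS["attribute"] * attr_sim
            + WEIGHTS["source"] * src_sim)


def route(total: float, first_threshold: float = 0.8,
          second_threshold: float = 0.5) -> str:
    """Three-way decision: link automatically, send to review, or do not link."""
    if total >= first_threshold:
        return "auto_link"
    if total >= second_threshold:
        return "manual_review"
    return "no_link"


if __name__ == "__main__":
    sem = cosine_similarity([0.2, 0.7, 0.1], [0.25, 0.6, 0.05])
    total = total_similarity(0.5, sem, 0.6, 1.0)
    print(round(total, 3), route(total))
```

Whether the boundary values themselves count as reaching a threshold is a choice made in this sketch (both comparisons are inclusive); the source does not settle it.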
9. An automatic extraction and association matching system for enterprise management elements, applied to the automatic extraction and association matching method for enterprise management elements according to any one of claims 1-8, characterized in that the system includes: a paragraph decomposition module, used to perform sentence-by-sentence preprocessing and semantic-driven structured decomposition of enterprise management documents, and to output a structured semantic paragraph set with feature annotations; an element extraction module, used to identify, extract, and standardize the enterprise management elements of the structured semantic paragraph set, and to output an enterprise management element library containing metadata information; an association matching module, used to construct a multi-dimensional similarity matching model, perform intelligent association matching on the enterprise management elements in the enterprise management element library, and output a dynamically updated enterprise management element association network; and a change analysis module, used to capture change content, analyze the scope of change impact on the enterprise management element library based on the enterprise management element association network, and output a change impact analysis report and change early warning information.
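The impact-scope analysis of the change analysis module can be illustrated by a breadth-first traversal of the association network. This is a minimal sketch under stated assumptions: the network is represented as an adjacency dictionary, the impact scope is limited by a hop count `max_depth`, and the element names are invented examples. None of these details is specified in the claims.

```python
from collections import deque


def impacted_elements(network: dict[str, list[str]], changed: str,
                      max_depth: int = 2) -> dict[str, int]:
    """Return each element reachable from `changed` with its hop distance.

    `max_depth` is a hypothetical limit on how far a change propagates.
    """
    distance = {changed: 0}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        if distance[current] == max_depth:
            continue  # do not expand beyond the configured scope
        for neighbor in network.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    del distance[changed]  # the changed element itself is not "impacted"
    return distance


if __name__ == "__main__":
    # Invented example network: element -> directly associated elements.
    network = {
        "Procurement Policy": ["Supplier Audit SOP"],
        "Supplier Audit SOP": ["Audit Record", "Procurement Policy"],
        "Audit Record": ["Archive Standard"],
    }
    print(impacted_elements(network, "Procurement Policy"))
```

The hop distance gives a simple way to rank early-warning items, with directly associated elements listed first; the source does not describe how the claimed system ranks them.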