A method and system for automatic internalization of external regulations into enterprise internal systems
By combining intelligent parsing of multi-format documents, six-dimensional intelligent extraction, and dynamic Prompt-based content generation, the system automates the internalization of external regulations, extracts compliance requirements accurately, and adapts the output to industry needs. This enables the efficient generation of internal corporate policies that conform to industry practices, significantly improving work efficiency and reducing compliance risks.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- BEIJING TIANWEI CHENGXIN ELECTRONIC COMMERCE CO LTD
- Filing Date
- 2026-03-23
- Publication Date
- 2026-06-19
Figure CN122242959A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of enterprise system management technology, and in particular to a method and system for automatically internalizing external regulations into enterprise internal systems.
Background Technology
[0002] Currently, businesses face an increasingly complex external regulatory environment, including national laws and regulations, industry regulatory provisions, and international standards. Taking the financial industry as an example, banks, insurance companies, and securities firms need to comply with numerous normative documents issued by regulatory agencies, such as the "Administrative Measures for Electronic Authentication Services," the "Guidelines for Information Technology Risk Management of Commercial Banks," and the "Measures for Compliance Management of Insurance Companies." Businesses need to internalize these external regulations into their internal management systems, operational rules, and work standards to ensure that daily operations comply with regulatory requirements. This process is called "external regulation internalization" or "regulatory internalization."
[0003] Current approaches to internalizing external regulations, both manual drafting and the related technical solutions, have many shortcomings:
Manual drafting: Professionals from the company's legal or compliance departments read the original text of the regulations, manually extract the compliance requirements, and then draft internal policy documents based on the company's actual situation. This is the most common practice at present, but it is inefficient and costly.
Legal and regulatory structuring technology: For example, Chinese patent CN115545671B "A method and system for structured processing of laws and regulations" mainly solves the problem of structured storage and retrieval of laws and regulations, but does not address transforming laws and regulations into internal corporate systems.
Automatic legal document recognition and generation technology: For example, Chinese patent CN110390000A "An automatic legal document recognition and generation system and method" targets legal document processing in judicial acceptance scenarios, which differs from the application scenario of generating internal corporate systems.
Legal document recognition technology: For example, Chinese patent CN110334710A focuses only on OCR recognition and does not involve content understanding or system generation.
Overseas compliance-related technologies: For example, US patent US20140222655A1 "Method, apparatus, computer equipment and storage medium for recognizing legal documents" only provides OCR recognition, and US11763320B2 "Automatic regulatory compliance method and system" stops at the level of regulatory recognition; neither automatically generates regulatory documents.
Official document formatting and OCR recognition technologies: Tools such as Wanxiang Official Document and Baidu AI Intelligent Document Parsing only provide formatting or content extraction and recognition functions, without legal semantic analysis or system content generation.
Financial compliance automation systems: Systems such as Ping An Technology's CN110930155B "Risk Control Methods and Devices" and ICBC's intelligent agents are mainly aimed at business compliance checks and do not cover the complete process of converting external regulations into internal policy documents.
[0004] Therefore, the core problems with existing technologies are as follows:
1. Lack of end-to-end automation solutions: Existing technologies either only provide compliance check functions or only provide document formatting functions, without an automation solution that covers the complete process of "regulatory interpretation → requirement extraction → system generation → document export". Enterprises still need to invest a lot of manpower in the internalization of regulations, which is inefficient and costly.
2. Insufficient accuracy and systematicness in extracting regulatory requirements: Existing document analysis tools mainly use general text classification methods, which are not optimized for the special semantic structure of regulatory texts. They cannot accurately identify and classify different types of compliance requirements. Manual reading and extraction is time-consuming and laborious, and it is easy to miss key compliance requirements, increasing compliance risks.
3. Lack of industry-adapted template systems: The internal regulations of different industries (banking, insurance, securities, etc.) vary significantly in terms of chapter structure, terminology, and expression style. Existing tools cannot automatically adapt to industry characteristics, and the generated policy documents do not conform to industry practices, requiring a lot of manual modification.
4. Separation of content generation and formatting: Existing official document formatting tools only provide formatting functions. Content must be manually written first, and then formatted. The two steps are completely separated, resulting in a cumbersome workflow, low efficiency, and a high risk of formatting errors.
5. Insufficient ability to parse multiple document formats: External regulations may exist in multiple formats such as PDF, Word, and scanned documents. Existing tools often only support a single format and are particularly weak in processing scanned PDFs, requiring manual preprocessing of documents, increasing workload and reducing efficiency.
[0005] To address the shortcomings of the existing technologies, this invention proposes a method and system for automatically internalizing external regulations into enterprise internal systems, solving the technical problems of automating the entire process of external regulation internalization, accurately extracting regulatory requirements, adapting industry templates, integrating content and layout, and efficiently parsing multi-format documents.
Summary of the Invention
[0006] The purpose of this invention is to solve the problems existing in the prior art by proposing a method and system for automatically internalizing external regulations into enterprise internal systems. It automates the entire process of external regulation internalization, accurately extracts regulatory compliance requirements, adapts to the system generation needs of different industries, realizes integrated processing of content generation and formatting, supports intelligent parsing of external regulatory documents in multiple formats, significantly improves the efficiency of external regulation internalization, and reduces compliance risks and enterprise human resource costs.
[0007] To achieve the above objectives, the present invention adopts the following technical solution: A method for automatically internalizing external regulations into corporate internal policies includes the following steps performed in sequence:
S1, Intelligent Parsing of Multi-Format Documents: Receive external regulatory documents in various formats, and use a strategy of prioritizing text extraction and supplementing with OCR to extract text content and structural information and output standardized text data, with support for an intelligent OCR activation mechanism and multi-threaded parallel processing.
S2, Intelligent Extraction of Compliance Requirements from Six Dimensions: Based on a six-dimensional classification system, perform structured extraction on the standardized text data. The six-dimensional classification system includes D1 mandatory requirements, D2 terminology definitions, D3 process steps, D4 time limits, D5 penalties for violations, and D6 industry-specific requirements. The extracted results are output as a structured JSON file.
S3, Multi-Industry Adaptive Template Selection and Adjustment: Design a unified template base class and four industry subclasses. Match the corresponding industry subclass template according to the industry to which the enterprise belongs. Adaptively adjust the template chapter structure using the dynamic chapter adjustment algorithm. Transform general terms into industry professional terms through a terminology mapping mechanism to form a customized template.
S4, Dynamic Prompt-Based Content Generation: Employ a parameterized dynamic Prompt construction method to dynamically generate content based on industry and chapter type parameters, with support for three extraction modes: comprehensive, mandatory_only, and quick. Based on the structured extraction results, fill content into the customized template to form the initial draft of the policy.
S5, Export in Enterprise-Level Official Document Format: Automatically format the generated draft of regulations and render it into a Word document conforming to the GB/T 9704-2012 standard before outputting it.
[0008] Preferably, in S2, the mandatory requirements of D1 include obligatory requirements "shall / must" and prohibitive requirements "shall not / prohibited"; The D2 terminology definitions include explanations of terms such as "XX refers to" and "as referred to in these Measures"; The D3 process steps include the application process, the approval process, and the processing process; The D4 time limit includes time limit requirements such as "within X days", "within X months", and "within X years"; The D5 violation penalties include fines / warnings, orders to rectify, revocation of qualifications, and similar penalty clauses; The D6 industry-specific categories include bank accounts, insurance underwriting, and securities disclosure.
[0009] Preferably, in S3, the industry sub-category templates include banking, insurance, securities, and general industry templates.
[0010] Preferably, in S5, the specifications for the enterprise-level official document format include: the page size is A4 (210x297mm), the page margins are 2.54cm at the top, 2.54cm at the bottom, 3.17cm at the left, and 3.17cm at the right; the title is in bold, size 2, centered; the chapter title is in bold, size 3, centered; the article number is in bold, size 4; the body text is in Song typeface, size 4; the body text has 1.5 line spacing and a first-line indent.
[0011] This invention also discloses a system for automatically internalizing external regulations into corporate internal rules. The system implements the above method and includes a user interaction layer, a tool interface layer, and a core processing layer. The core processing layer includes a processor module, an extractor module, a template module, a generator module, and a Prompt module. Each layer and module works together to achieve end-to-end automated conversion from external regulatory documents to corporate internal rules documents. The user interaction layer provides two interaction methods: Web UI and MCP Server. These methods are used to provide user operation interfaces and API call entry points, and support the uploading of regulatory documents, industry selection, extraction mode configuration, and previewing and downloading of policy documents. The tool interface layer encapsulates five standardized core interfaces, enabling decoupled communication between the upper-layer interaction and the lower-layer core processing module; The core processing layer is the core of the system, comprising five major functional modules: processor module, extractor module, template module, generator module, and Prompt module. It is used to execute the core business logic of internalizing external regulations, and realize the fully automated processing of multi-format regulatory documents into internal enterprise policy documents.
[0012] Preferably, the Web UI provides a visual operation interface that supports drag-and-drop document upload, progress viewing, previewing, and downloading, while the MCP Server provides an API service interface that supports programmatic calls from embedded systems.
[0013] Preferably, the tool interface layer encapsulates five standardized core interfaces, namely: The parse_docs interface receives multi-format external regulatory documents and outputs structured plain text and document metadata. The extract_req interface receives structured text and outputs a list of structured regulatory requirements categorized in six dimensions. The list_templates interface receives an industry type identifier and outputs a list of templates and the chapter structure for the corresponding industry. The generate_reg interface receives a list of regulatory requirements and a template ID, and outputs the initial draft of the internal policy. The export_doc interface receives the initial draft of the policy and outputs a Docx format file conforming to the GB/T 9704 standard.
[0014] Preferably, the processor module includes a PDF processor, a Word processor, and an image processor to achieve intelligent parsing of documents in multiple formats. The extractor module includes a requirement extractor and a six-dimensional classification extractor, enabling intelligent extraction of compliance requirements from six dimensions. The template module stores bank templates, insurance templates, securities templates, and general templates, enabling adaptive template selection and adjustment across multiple industries. The Prompt module includes an extraction Prompt and a generation Prompt, enabling the generation of policy content based on dynamic Prompts. The generator module includes a structure generator, a content generator, and a Docx renderer, enabling the generation of policy content and the export of official document formats.
[0015] Compared with the prior art, the beneficial effects of this invention are as follows: 1. This invention adopts a strategy of prioritizing text extraction and supplementing with OCR, combined with an intelligent OCR activation mechanism and multi-threaded parallel processing, which can efficiently process external regulatory documents in various formats such as PDF, Word, and scanned documents without manual preprocessing, thus reducing manpower input.
[0016] 2. The innovative six-dimensional classification system of this invention is designed with the special semantic structure of regulatory texts in mind. It can accurately identify and classify different types of compliance requirements, reducing the compliance requirement omission rate from 5-10% to less than 1%, effectively reducing corporate compliance risks.
[0017] 3. This invention designs a unified template base class and four industry subclasses, combined with a dynamic chapter adjustment algorithm and terminology mapping mechanism, so that the generated policy documents conform to the conventions and norms of various industries, greatly reducing the amount of manual modification work.
[0018] 4. This invention adopts a parameterized dynamic Prompt construction method, providing three extraction modes: comprehensive, mandatory_only, and quick. Enterprises can choose according to their actual needs to meet the external regulation internalization requirements of different scenarios, improving the practicality and flexibility of the system.
[0019] 5. This invention integrates the generation of policy content with the formatting of enterprise-level official documents into an automated process, avoiding the cumbersome process of separating content writing and formatting in existing technologies, reducing formatting error rates, and improving work efficiency.
[0020] 6. This invention covers the entire process of "regulatory interpretation → requirement extraction → system generation → document export", breaking the limitations of existing technologies that process only one step. The internalization time of a single regulation is shortened from 3-5 working days to 10-30 minutes, improving efficiency by 50-100 times.
Attached Figure Description
[0021] Figure 1 is a flowchart of the method of this invention for automatically internalizing external regulations into corporate internal rules.
Figure 2 is a schematic diagram of the six-dimensional classification system.
Figure 3 is a schematic diagram of the multi-industry template system.
Figure 4 is a diagram illustrating the format specifications of the generated Word document.
Figure 5 is a system architecture diagram of the system of this invention for automatically internalizing external regulations into enterprise internal rules.
Detailed Implementation
[0022] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments.
[0023] Referring to Figures 1-4, a method for automatically internalizing external regulations into corporate internal policies includes the following steps performed in sequence:
S1, Intelligent Parsing of Multi-Format Documents: Receive external regulatory documents in various formats, such as PDF, Word, and scanned images. Adopt a strategy of prioritizing text extraction and supplementing with OCR to extract text content and structural information and output standardized text data, with support for an intelligent OCR activation mechanism and multi-threaded parallel processing. Specifically, Word documents (doc / docx) are parsed natively using tools such as python-docx and Apache POI to extract text content, heading levels, paragraph structure, and table information, preserving the structural features of the regulations and outputting standardized structured text data. For PDF documents, tools such as PyPDF2 and PDFMiner are first used to extract text directly. If the PDF is editable, the text and structural information are extracted directly. If the PDF is scanned and direct extraction fails, the intelligent OCR mechanism is activated automatically. For scanned images and scanned PDFs for which OCR has been activated:
1. Image preprocessing: Noise reduction, skew correction, image enhancement, and edge detection are performed to improve OCR recognition accuracy;
2. Intelligent OCR recognition: A deep learning model such as PaddleOCR or YOLO V3 + CRNN is used for text recognition, supporting the recognition of Chinese and English text, numbers, and special symbols;
3. Multi-threaded parallel processing: Large documents are divided into blocks, and multiple threads recognize the blocks in parallel to improve parsing efficiency;
4. Output results: Standardized structured text data with redundant spaces, garbled characters, and watermarks removed, including text content, heading levels, and paragraph relationships, providing a foundation for subsequent compliance requirement extraction.
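The text-first, OCR-as-fallback strategy of S1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `extract_text` and `run_ocr` callables stand in for whatever PDF extractor and OCR engine are actually used, and the `MIN_CHARS_PER_PAGE` threshold that triggers OCR is a hypothetical value, since the source does not say how a failed extraction is detected.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical threshold: a page yielding fewer characters than this is
# treated as scanned and sent to OCR. The source does not specify a value.
MIN_CHARS_PER_PAGE = 20


def needs_ocr(extracted_text: str, min_chars: int = MIN_CHARS_PER_PAGE) -> bool:
    """Return True when direct extraction produced too little usable text."""
    return len(extracted_text.strip()) < min_chars


def parse_pages(
    pages: List[object],
    extract_text: Callable[[object], str],
    run_ocr: Callable[[object], str],
    max_workers: int = 4,
) -> List[str]:
    """Parse pages in parallel: try direct extraction first, fall back to OCR."""

    def parse_one(page: object) -> str:
        try:
            text = extract_text(page)
        except Exception:
            text = ""  # an extraction error also activates the OCR fallback
        if needs_ocr(text):
            text = run_ocr(page)
        # Collapse all whitespace runs (including line breaks) to single spaces;
        # a real parser would keep paragraph boundaries.
        return " ".join(text.split())

    # pool.map preserves page order even though pages finish out of order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(parse_one, pages))
```

Passing the extractor and the OCR engine in as callables keeps the activation logic independent of any particular library, so the same routine could wrap a PDF text extractor and an OCR model without change.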
[0024] S2, Intelligent Extraction of Compliance Requirements from Six Dimensions: Structured extraction is performed on the standardized text data based on a six-dimensional classification system. The six-dimensional classification system includes D1 Mandatory Requirements, D2 Terminology Definitions, D3 Process Steps, D4 Time Limits, D5 Penalties for Violations, and D6 Industry-Specific Requirements. The extracted data is output as a structured JSON file. Specifically:
D1 mandatory requirements include obligatory requirements ("shall / must") and prohibitive requirements ("shall not / prohibit"). Examples: "Electronic authentication service providers shall meet the following conditions" and "They shall not forge electronic signatures."
D2 terminology definitions include explanations of terms such as "XX refers to" and "as referred to in these Measures". Example: "Electronic authentication services refer to activities that provide authenticity verification for parties involved in electronic signatures."
D3 process steps include the application process, the approval process, and the processing process. Example: "Electronic Authentication Service License Application Process: Submit materials → Review → Public Announcement → Issuance of Certificate".
D4 time limits include time limits such as "within X days", "within X months", and "within X years". Examples: "Application material review should be completed within 20 days" and "Qualification changes must be filed within 30 days".
D5 penalties for violations include fines / warnings, orders to rectify, and revocation of qualifications. Example: "Those who provide services without obtaining a license will have their relevant qualifications revoked and be fined."
D6 industry-specific requirements include keywords specific to each industry, such as bank accounts, insurance underwriting, and securities disclosure. Example: "Commercial banks' electronic authentication services need to establish an anti-money laundering risk control system."
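To make the six-dimensional classification concrete, the sketch below tags sentences with D1-D6 using keyword patterns and emits JSON. It is a deliberate simplification: the source describes extraction guided by a large model, whereas this example swaps in plain regular expressions over the English cue phrases quoted above. The pattern set, the sentence splitter, and the JSON field names are all assumptions made for illustration.

```python
import json
import re
from typing import Dict, List

# Illustrative cue patterns per dimension, derived from the cue phrases quoted
# in the text. These are assumptions, not the patent's actual rules.
DIMENSION_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "D1": re.compile(r"\b(shall not|shall|must|prohibit(ed)?)\b", re.I),
    "D2": re.compile(r"\b(refers? to|as referred to in)\b", re.I),
    "D3": re.compile(r"\b(process|procedure)\b|→", re.I),
    "D4": re.compile(r"\bwithin \d+ (working )?(days?|months?|years?)\b", re.I),
    "D5": re.compile(r"\b(fine[sd]?|warning|rectif\w+|revoked?|revocation)\b", re.I),
    "D6": re.compile(r"\b(bank\w*|insurance|underwriting|securities)\b", re.I),
}


def classify(sentence: str) -> List[str]:
    """Return every dimension whose cue pattern occurs in the sentence."""
    return [dim for dim, pat in DIMENSION_PATTERNS.items() if pat.search(sentence)]


def extract_requirements(text: str) -> str:
    """Split text into sentences, keep those matching a dimension, return JSON."""
    # Naive splitter: break after '.' or ';' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.;])\s+", text) if s.strip()]
    items = [{"text": s, "dimensions": classify(s)} for s in sentences]
    items = [item for item in items if item["dimensions"]]
    return json.dumps(items, ensure_ascii=False, indent=2)
```

A sentence may carry several dimensions at once: "Qualification changes must be filed within 30 days." is both a mandatory requirement (D1) and a time limit (D4), which is why `classify` returns a list rather than a single label.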
[0025] S3, Multi-Industry Adaptive Template Selection and Adjustment: Design a unified template base class and four industry subclasses. Match the corresponding industry subclass template according to the industry to which the enterprise belongs. Adaptively adjust the template chapter structure using the dynamic chapter adjustment algorithm (such as adding, deleting, and modifying chapters related to risk, supervision, and safety). Through the terminology mapping mechanism, convert the general terms in the regulations into industry professional terms to form a customized template adapted to the enterprise. Industry subclass templates include banking, insurance, securities, and general industry templates. Specifically, the template base class is a unified general template for internal corporate systems, containing seven core chapters: General Provisions, Responsibilities, Processes, Risk Control, Security Management, Supervision and Assessment, and Supplementary Provisions, covering the general structure of systems across all industries. The industry subclasses are derived from the template base class and include templates for banking, insurance, securities, and general systems. Each industry subclass template adds industry-specific chapters (e.g., banking: anti-money laundering management; insurance: underwriting process management) to the base class, and removes industry-irrelevant content from the general chapters.
The dynamic chapter adjustment algorithm adaptively adds, deletes, and modifies chapters of the industry subclass template based on the company's industry and the six-dimensional structured extraction results: if the extraction results contain a large amount of "risk prevention and control" related content, the length of the "risk prevention and control" chapter is automatically increased; if the extraction results do not contain industry-specific content, the corresponding industry-specific chapter is automatically hidden; if the extraction results contain compliance requirements newly added in the regulations, the corresponding sub-chapter is automatically added to the template. The terminology mapping mechanism has a built-in industry-specific terminology database that automatically maps general terms in regulations to the professional terms of the company's industry, and maps "general subjects" in regulations to the company's specific name / department. An example is shown below:
General content: Business location; Banking industry mapping result: Branch outlets; Insurance industry mapping result: Policyholder / Insured.
General content: Applicant; Banking industry mapping result: Certificate applicant; Insurance industry mapping result: Policyholder; Securities industry mapping result: Investor; General enterprise mapping result: Applicant.
[0026] The execution process is as follows: The user selects the industry of the company (banking / insurance / securities / general) in the system. Based on the selected industry, the system retrieves the corresponding industry subclass template from the template library. The system inputs the six-dimensional structured extraction results into the dynamic chapter adjustment algorithm to adaptively adjust the chapter structure of the industry subclass template. The system uses the terminology mapping mechanism to replace general terms in the extracted results with industry-specific terms and company-specific information. The system then generates a customized template for the company, with the chapter structure, terminology, and writing style suited to its industry.
[0027] Output: A customized internal policy template for the enterprise, including a suitable chapter structure, industry-specific terminology, and reserved space for content filling.
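The base-class-plus-subclass template design, one of the chapter adjustment rules, and the terminology mapping can be sketched together as below. This is a minimal illustration under assumptions: the class names, the `TEMPLATES` registry, and the item format (`{"dimensions": [...]}`) are hypothetical, only the "hide industry chapters when no D6 content was extracted" rule is shown, and the term maps contain only the "Applicant" row and the banking "Business location" entry from the example above.

```python
import re
from typing import Dict, List

# The seven core chapters of the base template, as listed in the text.
BASE_CHAPTERS = [
    "General Provisions", "Responsibilities", "Processes", "Risk Control",
    "Security Management", "Supervision and Assessment", "Supplementary Provisions",
]


class PolicyTemplate:
    """Unified template base class; also serves as the general-industry template."""
    extra_chapters: List[str] = []
    term_map: Dict[str, str] = {"Applicant": "Applicant"}

    def chapters(self, extracted: List[dict]) -> List[str]:
        """Show industry-specific chapters only if D6 content was extracted."""
        result = list(BASE_CHAPTERS)
        if any("D6" in item.get("dimensions", []) for item in extracted):
            result[-1:-1] = self.extra_chapters  # insert before the last chapter
        return result

    def map_terms(self, text: str) -> str:
        """Replace general terms with industry terms in a single pass."""
        if not self.term_map:
            return text
        keys = sorted(self.term_map, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(k) for k in keys))
        return pattern.sub(lambda m: self.term_map[m.group(0)], text)


class BankingTemplate(PolicyTemplate):
    extra_chapters = ["Anti-Money Laundering Management"]
    term_map = {"Business location": "Branch outlets", "Applicant": "Certificate applicant"}


class InsuranceTemplate(PolicyTemplate):
    extra_chapters = ["Underwriting Process Management"]
    term_map = {"Applicant": "Policyholder"}


class SecuritiesTemplate(PolicyTemplate):
    term_map = {"Applicant": "Investor"}


# Hypothetical registry keyed by the industry the user selects.
TEMPLATES = {
    "banking": BankingTemplate, "insurance": InsuranceTemplate,
    "securities": SecuritiesTemplate, "general": PolicyTemplate,
}
```

The single-pass substitution in `map_terms` is a design choice worth noting: replacing terms one after another could re-map text that an earlier replacement had just produced, whereas one combined pattern touches each occurrence exactly once.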
[0028] S4, Dynamic Prompt-Based Policy Content Generation: A parameterized dynamic Prompt construction method dynamically generates an adapted Prompt based on parameters such as the company's industry, the template chapter type, and the compliance requirements, and then inputs it into the content generation model. An example is shown below:
Parameters: Industry = Banking, Chapter = General Provisions, Extraction Dimensions = D1 + D2, Expression Style = Formal and Standard.
Dynamic Prompt: "Based on the mandatory requirements (D1) and terminology definitions (D2) of the 'Administrative Measures for Electronic Authentication Services,' write the General Provisions chapter of XX Bank's 'Detailed Rules for Digital Certificate Management' in a formal and standardized banking industry expression style, including the basis for formulation, scope of application, and core definitions, with the word count controlled within 300 words."
The method supports three extraction modes: comprehensive, mandatory_only, and quick. Based on the six-dimensional structured extraction results, it fills the corresponding policy content into the customized template to form the initial draft of the company's internal policies. Specifically:
Comprehensive mode: Extracts all compliance requirements from the six dimensions to generate complete and detailed internal regulations, suitable for formulating core corporate regulations.
Mandatory_only mode: Extracts only the D1 mandatory requirements and generates policy content centered on compliance obligations, suitable for enterprises that need to implement compliance quickly.
Quick mode: Extracts mandatory requirements (D1) and process steps (D3) to generate concise and directly executable operational rules, suitable for policy implementation by front-line business departments.
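A parameterized Prompt builder for S4 might look like the following sketch. The function name, its parameters, and the rule that a chapter's requested dimensions are intersected with the dimensions allowed by the extraction mode are assumptions made for illustration; the wording of the output follows the example Prompt quoted above, and the mode-to-dimension table follows the three mode descriptions.

```python
from typing import Dict, List

# Dimensions used by each extraction mode, as described in the text.
MODE_DIMENSIONS: Dict[str, List[str]] = {
    "comprehensive": ["D1", "D2", "D3", "D4", "D5", "D6"],
    "mandatory_only": ["D1"],
    "quick": ["D1", "D3"],
}

DIMENSION_NAMES: Dict[str, str] = {
    "D1": "mandatory requirements",
    "D2": "terminology definitions",
    "D3": "process steps",
    "D4": "time limits",
    "D5": "penalties for violations",
    "D6": "industry-specific requirements",
}


def build_prompt(
    regulation: str,
    industry: str,
    company: str,
    policy_title: str,
    chapter: str,
    chapter_dims: List[str],
    mode: str = "comprehensive",
    style: str = "formal and standardized",
    max_words: int = 300,
) -> str:
    """Assemble a generation Prompt from industry, chapter, and mode parameters."""
    allowed = MODE_DIMENSIONS[mode]  # raises KeyError for an unknown mode
    # Assumed rule: keep only the chapter's dimensions that the mode permits.
    dims = [d for d in chapter_dims if d in allowed]
    if not dims:
        raise ValueError(f"mode {mode!r} leaves no dimensions for chapter {chapter!r}")
    dim_text = " and ".join(f"{DIMENSION_NAMES[d]} ({d})" for d in dims)
    return (
        f"Based on the {dim_text} of the '{regulation}', write the {chapter} "
        f"chapter of {company}'s '{policy_title}' in a {style} {industry} industry "
        f"expression style, with the word count controlled within {max_words} words."
    )
```

Calling `build_prompt` with `industry="banking"`, `chapter="General Provisions"`, and `chapter_dims=["D1", "D2"]` in comprehensive mode reproduces the shape of the example above, while the same call in mandatory_only mode drops the D2 clause.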
[0029] S5, Exporting Enterprise-Level Official Documents: The generated draft of regulations is automatically formatted and rendered into a Word document conforming to the GB/T 9704-2012 standard. The enterprise-level official document format specifications include: A4 page size (210x297mm); margins of 2.54cm top, 2.54cm bottom, 3.17cm left, and 3.17cm right; titles in bold, size 2, centered; chapter titles in bold, size 3, centered; article numbers in bold, size 4; body text in SimSun, size 4; body text with 1.5 line spacing and a first-line indent.
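The format specification of S5 can be captured as data that a Docx renderer consumes. The sketch below is an assumption-laden illustration: it reads "size 2 / 3 / 4" as the Chinese font-size numbers, whose conventional point equivalents are 22 pt, 16 pt, and 14 pt, and the `PageSpec` and `STYLES` names are hypothetical. It also derives the printable text area from the stated page size and margins.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption: "size 2/3/4" denotes Chinese font-size numbers, which
# conventionally correspond to these point sizes.
CN_SIZE_TO_PT = {2: 22.0, 3: 16.0, 4: 14.0}


@dataclass(frozen=True)
class PageSpec:
    """Page geometry taken from the format specification in the text."""
    width_mm: float = 210.0
    height_mm: float = 297.0
    top_cm: float = 2.54
    bottom_cm: float = 2.54
    left_cm: float = 3.17
    right_cm: float = 3.17
    line_spacing: float = 1.5

    def text_area_mm(self) -> Tuple[float, float]:
        """Width and height of the printable area inside the margins, in mm."""
        width = self.width_mm - 10 * (self.left_cm + self.right_cm)
        height = self.height_mm - 10 * (self.top_cm + self.bottom_cm)
        return round(width, 1), round(height, 1)


# Paragraph styles keyed by element type (hypothetical structure).
STYLES = {
    "title": {"size_pt": CN_SIZE_TO_PT[2], "bold": True, "align": "center"},
    "chapter_title": {"size_pt": CN_SIZE_TO_PT[3], "bold": True, "align": "center"},
    "article_number": {"size_pt": CN_SIZE_TO_PT[4], "bold": True},
    "body": {"font": "SimSun", "size_pt": CN_SIZE_TO_PT[4], "bold": False,
             "first_line_indent": True},
}
```

With the stated margins, the text area works out to 146.6 mm by 246.2 mm. Keeping the specification as plain data means the renderer only has to translate each entry into the corresponding call of whichever Word library it uses.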
[0030] Referring to Figure 5, the present invention also discloses a system for automatically internalizing external regulations into internal corporate policies. The system implements the above method and includes a user interaction layer, a tool interface layer, and a core processing layer. The core processing layer includes a processor module, an extractor module, a template module, a generator module, and a Prompt module. Each layer and module works together to achieve end-to-end automated conversion from external regulatory documents to internal corporate policy documents. The user interaction layer provides two interaction methods: Web UI and MCP Server. These provide the user interface and API call entry points, supporting the uploading of regulatory documents, industry selection, extraction mode configuration, and previewing and downloading of policy documents, adapting to different use cases. The Web UI offers a visual interface that supports drag-and-drop document uploading, progress viewing, industry template selection, extraction mode configuration, previewing, and downloading, serving as the direct entry point for users. The MCP Server provides API service interfaces, supporting programmatic calls from embedded systems to automate batch external regulation internalization tasks, adapting to enterprise-level system integration needs.
[0031] The tool interface layer encapsulates five standardized core interfaces to achieve decoupled communication between the upper-layer interaction and the lower-layer core processing modules. These five standardized core interfaces are:
The parse_docs interface connects to the multi-format document parsing function, receiving uploaded external regulatory documents in various formats such as PDF / Word / images, and outputting structured plain text and document metadata.
The extract_req interface connects to the six-dimensional compliance requirement extraction function, receiving structured text and outputting a JSON-formatted list of regulatory requirements categorized in six dimensions.
The list_templates interface connects to the multi-industry template adaptation function, receiving an industry type identifier and outputting a list of templates for the corresponding industry (banking / insurance / securities / general) together with their chapter structure, supporting template selection and preview.
The generate_reg interface connects to the dynamic Prompt content generation function, receiving a list of regulatory requirements and a template ID, and outputting the initial draft of the internal regulations.
The export_doc interface connects to the enterprise-level document formatting and export function, receiving the initial draft of the regulations and outputting a Docx format file conforming to the GB/T 9704 standard.
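How the five interfaces decouple callers from the core modules can be illustrated with a small name-based registry. Only the five interface names and their input/output roles come from the text; the `tool` decorator, the `call` dispatcher, the `internalize` chain, the argument names, and every function body are hypothetical stubs that return placeholder data.

```python
from typing import Any, Callable, Dict, List

# Registry mapping interface names to implementations (hypothetical design).
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a function under a standardized interface name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register


@tool("parse_docs")
def parse_docs(path: str) -> dict:
    return {"text": f"<text of {path}>", "metadata": {"source": path}}  # stub


@tool("extract_req")
def extract_req(text: str) -> List[dict]:
    return [{"dimension": "D1", "text": text}]  # stub


@tool("list_templates")
def list_templates(industry: str) -> List[dict]:
    return [{"template_id": f"{industry}-default",
             "chapters": ["General Provisions", "Supplementary Provisions"]}]  # stub


@tool("generate_reg")
def generate_reg(requirements: List[dict], template_id: str) -> str:
    return f"[draft from {len(requirements)} requirement(s) using {template_id}]"  # stub


@tool("export_doc")
def export_doc(draft: str) -> bytes:
    return draft.encode("utf-8")  # placeholder for real .docx bytes


def call(name: str, **kwargs: Any) -> Any:
    """Dispatch by interface name, so callers never import core modules."""
    return TOOLS[name](**kwargs)


def internalize(path: str, industry: str) -> bytes:
    """Chain the five interfaces end to end."""
    parsed = call("parse_docs", path=path)
    requirements = call("extract_req", text=parsed["text"])
    template = call("list_templates", industry=industry)[0]
    draft = call("generate_reg", requirements=requirements,
                 template_id=template["template_id"])
    return call("export_doc", draft=draft)
```

Because both a web front end and an API server would go through `call`, either could drive the same chain, and a core module could be swapped by re-registering its interface name.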
[0032] The core processing layer is the heart of the system, comprising five major functional modules: processor module, extractor module, template module, generator module, and Prompt module. It is used to execute the core business logic of internalizing external regulations, and to achieve fully automated processing of multi-format regulatory documents into internal enterprise policy documents.
[0033] The processor module includes a PDF processor, a Word processor, and an image processor, enabling intelligent parsing of documents in multiple formats. Specifically, the PDF processor extracts text and structural information from PDFs with a text layer; the Word processor parses chapters, paragraphs, and formatting from Word documents; and the image processor performs OCR recognition on scanned images and PDFs without a text layer, outputting standardized text.
[0034] The extractor module includes a requirement extractor and a six-dimensional classification extractor, enabling intelligent extraction of compliance requirements across six dimensions. Specifically, the requirement extractor identifies compliance clauses in regulatory texts; the six-dimensional classification extractor categorizes the extracted compliance requirements according to dimensions D1-D6, outputting structured JSON data.
[0035] The template module stores bank templates, insurance templates, securities templates, and general templates, enabling adaptive template selection and adjustment across multiple industries. All templates inherit from a unified base class and support dynamic addition and deletion of chapters and mapping of industry terminology.
[0036] The Prompt module includes an extraction Prompt and a generation Prompt, enabling the generation of regulatory content based on dynamic Prompts. Specifically, the extraction Prompt guides the large model to extract compliance requirements from regulatory texts, and the generation Prompt combines industry templates with structured compliance requirements to guide the large model in generating policy content that conforms to enterprise standards. The module supports switching between three extraction modes: comprehensive mode, mandatory_only mode, and quick mode.
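A parameterized generation Prompt might be assembled as in the sketch below. The function name, the Prompt wording, and above all the mapping from extraction mode to retained dimensions are assumptions: this passage names the three modes but does not define them, so the `quick` entry in particular is only a placeholder.

```python
# Assumed meaning of each mode; only the mode names come from the description.
MODE_DIMENSIONS = {
    "comprehensive": {"D1", "D2", "D3", "D4", "D5", "D6"},
    "mandatory_only": {"D1"},
    "quick": {"D1", "D4"},  # placeholder assumption, not defined in the text
}


def build_generation_prompt(industry: str, chapter: str,
                            requirements: list[dict],
                            mode: str = "comprehensive") -> str:
    """Combine industry, chapter type, and filtered requirements into one Prompt."""
    if mode not in MODE_DIMENSIONS:
        raise ValueError(f"unknown extraction mode: {mode}")
    kept = [r for r in requirements if r["dimension"] in MODE_DIMENSIONS[mode]]
    bullet_lines = [f"- [{r['dimension']}] {r['text']}" for r in kept]
    return (
        f"You are drafting internal regulations for the {industry} industry.\n"
        f"Write the chapter \"{chapter}\" using terminology standard in that industry.\n"
        "Cover every compliance requirement listed below:\n"
        + "\n".join(bullet_lines)
    )


requirements = [
    {"dimension": "D1", "text": "Electronic certification service providers shall "
                                "ensure the completeness and accuracy of certificate content."},
    {"dimension": "D4", "text": "Approval within 20 days."},
]
print(build_generation_prompt("banking", "Application Process",
                              requirements, mode="mandatory_only"))
```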
[0037] The generator module includes a structure generator, a content generator, and a Docx renderer, enabling the generation of policy content and the export of official document formats. Specifically, the structure generator maps structured compliance requirements to template chapter structures; the content generator fills in template content to generate policy text; and the Docx renderer formats the policy text according to the GB/T 9704-2012 standard and exports it as a Word document.
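The structure generator's mapping step can be illustrated with a routing table from dimension to chapter. The table itself is hypothetical, since the description does not state which chapter receives which dimension; the sketch only shows the shape of the operation.

```python
# Hypothetical routing table: which template chapter receives each dimension.
DIMENSION_TO_CHAPTER = {
    "D1": "Risk Prevention and Control",
    "D2": "General Provisions",
    "D3": "Application Process",
    "D4": "Application Process",
    "D5": "Supervision and Management",
    "D6": "Lifecycle Management",
}


def map_to_chapters(requirements: list[dict], chapters: list[str],
                    fallback: str = "Supplementary Provisions") -> dict[str, list[str]]:
    """Group requirement texts under template chapters, preserving chapter order."""
    outline: dict[str, list[str]] = {chapter: [] for chapter in chapters}
    for req in requirements:
        chapter = DIMENSION_TO_CHAPTER.get(req["dimension"], fallback)
        if chapter not in outline:
            chapter = fallback  # template lacks the target chapter
        outline.setdefault(chapter, []).append(req["text"])
    return outline


outline = map_to_chapters(
    [{"dimension": "D4", "text": "approval within 20 days"},
     {"dimension": "D5", "text": "license revocation"}],
    chapters=["General Provisions", "Application Process",
              "Supervision and Management", "Supplementary Provisions"],
)
print(outline)
```

The content generator would then fill each chapter from its grouped requirements before the Docx renderer typesets the result.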
[0038] The present invention will be further described below through specific embodiments: Background: As a financial institution, XX Bank must strictly comply with the compliance requirements for electronic authentication services and digital certificate management in the "Administrative Measures for Electronic Authentication Services" (Order No. 51 of the Ministry of Industry and Information Technology). It must transform external regulations into internal management rules, namely the "XX Bank Digital Certificate Management Rules", to ensure the compliant operation of e-banking, online banking and other businesses.
[0039] The system operation procedure is as follows: User interaction layer operation: Enterprise administrators log in to the system through the WebUI, upload the PDF file of the "Administrative Measures for Electronic Authentication Services", select the "Banking Industry" template in the interface, select "Comprehensive Mode" for extraction mode, and submit a processing request.
[0040] The tool interface layer calls the following interfaces sequentially to complete data interaction: call the `parse_docs` (document parsing) interface to transfer the uploaded PDF file to the core processing layer; call the `extract_req` (requirement extraction) interface to obtain the parsed text and initiate a compliance requirement extraction request; call the `list_templates` (template list) interface to query and return the list of templates specific to the banking industry; call the `generate_reg` (policy generation) interface to pass in the structured compliance requirements and the bank template and initiate a policy generation request; call the `export_doc` (document export) interface to retrieve the generated policy content and initiate a Word document export request.
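The call sequence above can be sketched as a single orchestration function. Everything besides the five interface names is assumed: the argument shapes, the `RecordingStub` class that stands in for the real layer, and the shortcut of taking the first listed template instead of the user's selection.

```python
def internalize(tools, source_path: str, industry: str, out_path: str) -> str:
    """Call the five interfaces in the order given in the description."""
    parsed = tools.parse_docs(source_path)
    requirements = tools.extract_req(parsed["text"])
    templates = tools.list_templates(industry)
    # Simplification: a real UI would let the user pick from the list.
    draft = tools.generate_reg(requirements, templates[0]["id"])
    return tools.export_doc(draft, out_path)


class RecordingStub:
    """Stand-in for the tool interface layer; returns canned values."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def parse_docs(self, path):
        self.calls.append("parse_docs")
        return {"text": "placeholder regulation text", "metadata": {"source": path}}

    def extract_req(self, text):
        self.calls.append("extract_req")
        return [{"dimension": "D1", "text": text}]

    def list_templates(self, industry):
        self.calls.append("list_templates")
        return [{"id": f"{industry}-default", "chapters": ["General Provisions"]}]

    def generate_reg(self, requirements, template_id):
        self.calls.append("generate_reg")
        return f"draft from {len(requirements)} requirement(s) using {template_id}"

    def export_doc(self, draft, out_path):
        self.calls.append("export_doc")
        return out_path


stub = RecordingStub()
result = internalize(stub, "measures.pdf", "banking", "rules.docx")
print(stub.calls)
```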
[0041] Core processing layer execution:
Processor module: the PDF processor extracts the text content and chapter structure of the "Administrative Measures for Electronic Authentication Services" and outputs standardized text data;
Extractor module: the requirement extractor identifies compliance clauses, and the six-dimensional classification extractor categorizes them into D1 mandatory requirements (such as "electronic certification service providers shall ensure the completeness and accuracy of certificate content"), D2 terminology definitions (such as "electronic certification services refer to activities that provide electronic signature verification"), D3 process steps (such as the qualification application and approval process), D4 time limits (such as approval within 20 days), D5 penalties for violations (such as license revocation), and D6 industry-specific requirements (such as banks needing to connect to their core systems). The extractor outputs structured data in JSON format;
Template module: matches the banking industry template, adds a "Digital Certificate Lifecycle Management" chapter through the dynamic chapter adjustment algorithm, replaces "Electronic Authentication Service Provider" with "XX Bank's E-Banking Department" through the terminology mapping, and generates a customized template;
Prompt module: builds the generation Prompt: "Based on the six-dimensional compliance requirements of the 'Administrative Measures for Electronic Authentication Services' and combined with the digital certificate management scenario of XX Bank, generate comprehensive internal regulations, including general provisions, application process, lifecycle management, risk prevention and control, supervision and management, and supplementary provisions, with terminology conforming to bank standards";
Generator module: the structure generator maps the structured data to template chapters, the content generator fills in the content to generate the complete policy text, and the Docx renderer typesets it according to the GB/T 9704-2012 standard (A4 paper, size-2 bold title, size-4 SimSun body text, 1.5 line spacing) and outputs a standard Word document.
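The typesetting parameters can be captured as plain configuration, from which the usable text area follows by subtraction. This sketch assumes that the type sizes named in the embodiment are Chinese type sizes (size 2 = 22 pt, size 3 = 16 pt, size 4 = 14 pt) and takes the margins from claim 4; the dictionary layout and function names are hypothetical, and no Word file is actually produced.

```python
# Assumption: "size N" denotes Chinese type sizes, converted here to points.
CN_SIZE_TO_PT = {2: 22.0, 3: 16.0, 4: 14.0}

PAGE_MM = {"width": 210.0, "height": 297.0}  # A4
MARGINS_MM = {"top": 25.4, "bottom": 25.4, "left": 31.7, "right": 31.7}  # per claim 4

STYLES = {
    "title": {"size": 2, "bold": True, "align": "center"},
    "body": {"font": "SimSun", "size": 4, "line_spacing": 1.5},
}


def text_area_mm(page=PAGE_MM, margins=MARGINS_MM) -> tuple[float, float]:
    """Width and height left for text after subtracting the margins."""
    width = page["width"] - margins["left"] - margins["right"]
    height = page["height"] - margins["top"] - margins["bottom"]
    return round(width, 1), round(height, 1)


def point_size(style_name: str) -> float:
    """Look up a style's type size in points."""
    return CN_SIZE_TO_PT[STYLES[style_name]["size"]]


print(text_area_mm())        # (146.6, 246.2)
print(point_size("title"))   # 22.0
print(point_size("body"))    # 14.0
```

A renderer built on a Word-generation library would read these values when setting section margins and paragraph styles.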
[0042] Implementation results: The generated "XX Bank Digital Certificate Management Rules" fully covers the core compliance requirements of the "Administrative Measures for Electronic Authentication Services". The chapter structure conforms to the bank's institutional practices and can be implemented directly without manual intervention. The compliance requirement omission rate is less than 1%, and the processing time for a single document is 25 minutes, roughly 50 times faster than traditional manual processing (3 working days).
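The stated speed-up can be checked with simple arithmetic. The sketch assumes an 8-hour working day, which the text does not specify; under that assumption the ratio comes out slightly above the "about 50 times" quoted.

```python
def speedup(manual_days: float, automated_minutes: float,
            hours_per_day: float = 8.0) -> float:
    """Ratio of manual to automated processing time (8-hour day is assumed)."""
    manual_minutes = manual_days * hours_per_day * 60
    return manual_minutes / automated_minutes


ratio = speedup(manual_days=3, automated_minutes=25)
print(f"{ratio:.1f}x")  # 57.6x
```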
[0043] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.
Claims
1. A method for automatically internalizing external regulations into corporate internal rules, characterized in that the following steps are performed sequentially:
S1, Intelligent parsing of multi-format documents: receive external regulatory documents in various formats, extract text content and structural information using a strategy that prioritizes direct text extraction and supplements it with OCR, and output standardized text data, with support for an intelligent OCR activation mechanism and multi-threaded parallel processing;
S2, Intelligent extraction of compliance requirements in six dimensions: perform structured extraction on the standardized text data based on a six-dimensional classification system comprising D1 mandatory requirements, D2 terminology definitions, D3 process steps, D4 time limits, D5 penalties for violations, and D6 industry-specific requirements, and output the extracted results as a structured JSON file;
S3, Multi-industry adaptive template selection and adjustment: design a unified template base class and four industry subclasses, match the industry subclass template corresponding to the industry to which the enterprise belongs, adaptively adjust the template chapter structure using a dynamic chapter adjustment algorithm, and convert general terms into industry-specific terms through a terminology mapping mechanism to form a customized template;
S4, Dynamic Prompt-based content generation: construct Prompts with a parameterized dynamic Prompt construction method driven by industry and chapter type parameters, support three extraction modes (comprehensive, mandatory_only, and quick), and fill content into the customized template based on the structured extraction results to form the initial draft of the policy;
S5, Enterprise-level official document format export: automatically format the generated initial draft and render it into a Word document conforming to the GB/T 9704-2012 standard for output.
2. The method for automatically internalizing external regulations into enterprise internal systems according to claim 1, characterized in that, In S2, the mandatory requirements of D1 include the obligatory requirement "shall / must" and the prohibitive requirement "shall not / prohibited"; The D2 terminology definitions include explanations of terms such as "XX refers to" and "as referred to in these Measures"; The D3 process steps include the application process, the approval process, and the processing process; The D4 time limit includes time limit requirements in the categories of "within X days", "within X months", and "within X years"; The D5 violation penalties include fines / warnings, orders to rectify, revocation of qualifications, and similar penalty clauses; The D6 industry-specific categories include bank accounts, insurance underwriting, and securities disclosure.
3. The method for automatically internalizing external regulations into enterprise internal systems according to claim 1, characterized in that, In S3, the industry sub-category templates include banking, insurance, securities, and general industry templates.
4. The method for automatically internalizing external regulations into enterprise internal systems according to claim 1, characterized in that, in S5, the specifications for the enterprise-level official document format include: page size A4 (210 × 297 mm); margins of 2.54 cm top and bottom and 3.17 cm left and right; titles in bold, size 2, centered; chapter titles in bold, size 3, centered; section numbers in bold, size 4; body text in SimSun, size 4, with 1.5 line spacing and first-line indentation.
5. A system for automatically internalizing external regulations into corporate internal rules, executing the method described in any one of claims 1-4, characterized in that, It includes a user interaction layer, a tool interface layer, and a core processing layer. The core processing layer includes a processor module, an extractor module, a template module, a generator module, and a Prompt module. Each layer and module works together to achieve end-to-end automated conversion from external regulatory documents to internal corporate policy documents. The user interaction layer provides two interaction methods: Web UI and MCP Server. These methods are used to provide user operation interfaces and API call entry points, and support the uploading of regulatory documents, industry selection, extraction mode configuration, and previewing and downloading of policy documents. The tool interface layer encapsulates five standardized core interfaces, enabling decoupled communication between the upper-layer interaction and the lower-layer core processing module; The core processing layer is the core of the system, comprising five major functional modules: processor module, extractor module, template module, generator module, and Prompt module. It is used to execute the core business logic of internalizing external regulations, and realize the fully automated processing of multi-format regulatory documents into internal enterprise policy documents.
6. A system for automatically internalizing external regulations into enterprise internal rules according to claim 5, characterized in that the Web UI provides a visual operation interface that supports drag-and-drop document upload, progress viewing, previewing, and downloading, and the MCP Server provides an API service interface that supports programmatic calls from embedded systems.
7. A system for automatically internalizing external regulations into enterprise internal rules according to claim 5, characterized in that the tool interface layer encapsulates five standardized core interfaces, namely: the parse_docs interface, which receives external regulatory documents in multiple formats and outputs structured plain text and document metadata; the extract_req interface, which receives structured text and outputs a list of structured regulatory requirements categorized in six dimensions; the list_templates interface, which receives an industry type identifier and outputs a list of templates and the chapter structure for the corresponding industry; the generate_reg interface, which receives a list of regulatory requirements and a template ID and outputs the initial draft of the internal policy; and the export_doc interface, which receives the initial draft of the policy and outputs a Docx file conforming to the GB/T 9704 standard.
8. A system for automatically internalizing external regulations into enterprise internal rules according to claim 5, characterized in that the processor module includes a PDF processor, a Word processor, and an image processor, enabling intelligent parsing of documents in multiple formats; the extractor module includes a requirement extractor and a six-dimensional classification extractor, enabling intelligent extraction of compliance requirements in six dimensions; the template module stores bank templates, insurance templates, securities templates, and general templates, enabling adaptive template selection and adjustment across multiple industries; the Prompt module includes an extraction Prompt and a generation Prompt, enabling the generation of policy content based on dynamic Prompts; and the generator module includes a structure generator, a content generator, and a Docx renderer, enabling the generation of policy content and the export of official document formats.