40 results about "Document structuring" patented technology

Document Structuring is a subtask of Natural language generation, which involves deciding the order and grouping (for example into paragraphs) of sentences in a generated text. It is closely related to the Content determination NLG task.

A content intelligent referencing method for document editing

Pending | CN122363575A | Document structuring | Engineering

This invention provides a method for intelligent content referencing in document editing, relating to the technical field of intelligent document referencing. The method includes: displaying a source document selection interface in response to a user's insertion operation in the main document; displaying at least one page of the source document in response to the user's selection of the source document; determining the selected target page in response to the user's selection of the page; parsing the document structure of the target page, identifying and visually displaying at least one content block within the page; recommending a corresponding insertion method based on the type of the content block; and inserting the content block into the main document according to the selected insertion method in response to the user's selection of the content block and confirmation of the insertion method. This invention addresses the low efficiency of content referencing and improves both its efficiency and accuracy.

Owner:HUNAN ZHUOZHI INFORMATION TECHNOLOGY CO LTD

Graph anchor-based multi-agent meta-analysis literature extraction method and system

Pending | CN122366452A | Linguistic model | Document analysis

This invention discloses a method and system for extracting meta-analysis literature based on graph-anchored multi-agent systems. The method includes: reconstructing unstructured PDFs into a hierarchical semantic document structure through visual document analysis and identifying functional area labels; applying functional area anchoring constraints to limit information search to specified functional area nodes; scheduling a multi-agent collaborative extraction network consisting of a reconnaissance agent, a logic assembly agent, and an indicator retrieval agent, which completes experimental variable identification, logical mapping between treatment and control groups, and cross-paragraph indicator retrieval through structured message dialogue and multi-round collaboration; employing a neuro-symbolic hybrid engine, in which semantics are parsed by a large language model while precise mathematical aggregation and dimensional conversion are performed by a deterministic symbolic computation engine; and generating a structured data matrix with inverted index pointers. According to the application, this achieves high-precision, zero-error, and traceable automated extraction of meta-analysis literature data.

Owner:INNER MONGOLIA UNIVERSITY +1

A large model-based automatic audit report generation full-process method

Pending | CN122347127A | Document structuring | Linguistic model

The application discloses a full-process method for automatically generating audit reports based on a large model. It decomposes a complex report preparation task into multiple independent, specialized subtasks, coordinates them dynamically through a task scheduling matrix, and introduces prompt engineering and a post-check mechanism that corrects the logic, smooths the text, and standardizes the professional terms of the preliminary audit text. This avoids the hallucination and content omission problems that a single large language model is prone to when it processes massive heterogeneous data and complex logic in one pass. The application filters out rough expressions and logical jumps in the generated content, ensures that the target audit report file is factually accurate and logically consistent, and guarantees the uniqueness, tamper resistance, and full-process version traceability of the issued file through automatic rendering of the document structure tree and generation of digital fingerprints, thereby meeting the requirements of audit archive compliance management.

Owner:SHENYUAN TECHNOLOGY (NANJING) CO LTD
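
The abstract above mentions generating a digital fingerprint of the document structure tree so that tampering can be detected. The following is only a minimal sketch of that general idea, not the patented method: it assumes the tree is a JSON-serializable dictionary and uses SHA-256 over a canonical serialization. The function name `fingerprint` and the sample trees are hypothetical.

```python
import hashlib
import json


def fingerprint(tree: dict) -> str:
    """Return a SHA-256 hex digest of a document structure tree (illustrative)."""
    # Canonical form: sorted keys and fixed separators, so that logically
    # equal trees always serialize, and therefore hash, identically.
    canonical = json.dumps(tree, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical report trees used only for illustration.
report = {
    "title": "Audit report",
    "children": [{"title": "Findings", "text": "No exceptions noted."}],
}
tampered = {
    "title": "Audit report",
    "children": [{"title": "Findings", "text": "Exceptions noted."}],
}

print(fingerprint(report) == fingerprint(tampered))  # any edit changes the digest
```

Storing the digest alongside each issued version is one simple way to support version traceability; the patent may use a different scheme.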

An electronic document format intelligent proofreading system and method based on multi-dimension checking rules

Pending | CN122133629A | Semantic analysis | Character and pattern recognition | Electronic document | Document structuring

This invention relates to the field of data processing technology, specifically to an intelligent proofreading system and method for electronic document formats based on multi-dimensional verification rules. It includes: a document structure parsing unit; a dynamic verification rule generation unit; an intelligent verification execution unit; and a verification result output unit. This invention eliminates differences in electronic document parsing. After extracting semantic blocks such as titles and body text by a paragraph-level semantic block recognition module, the intelligent verification execution unit calls a multi-dimensional rule set to verify core format elements. The verification result output unit accurately locates the page numbers, paragraphs, and character positions of deviations, generating correction guidelines and multi-format reports containing problem descriptions and compliance examples. This invention solves the problem of the lack of a formal, dedicated proofreading system for core document formats in existing technologies. It can comprehensively identify deviations and clarify the direction of correction, ensuring compliant document circulation while reducing labor costs and improving proofreading efficiency and convenience.

Owner:SHANGHAI HUIZHOU INFORMATION TECH CO LTD

Document creation and management system

Pending | US20260187263A1 | Document structuring | Graphical user interface

Described herein is a method, for managing creation and publication of a document in a documentation management system (DMS), that includes causing display of a first graphical user interface (GUI) on a first client device. The method includes authenticating a first user of the first client device, and causing display of an author view of a hierarchical document structure in a navigational pane of the first GUI. The method includes saving content received in an editor pane of the first GUI as a page in the DMS and associating the first user as an author of the page. The method includes generating a document entry displayed with a draft status indicator corresponding to the draft status of the page. The method includes causing display of a reader-view of the hierarchical document structure on a second GUI with the content of the page displayed in accordance with the draft status.

Owner:ATLASSIAN PTY LTD +1

Data processing method, page early warning method, device, medium and program product

Pending | CN122286030A | Linguistic model | Software engineering

This application provides a data processing method, a page alert method, a device, a medium, and a program product, relating to the computer field. The data processing method includes: acquiring document structure information of a predetermined page; extracting page data from the document structure information using a predetermined language model; and performing data diagnostic processing on the page data according to a preset first diagnostic rule using the language model to obtain a diagnostic result. The first diagnostic rule is used to define the judgment conditions for abnormal data on the predetermined page using natural language. In the technical solution of this application, using a language model for automated data extraction and diagnosis of page data helps improve the accuracy and efficiency of data processing, reduces the complexity of data processing, and saves labor costs and computing resources.

Owner:RAJAX NETWORK & TECHNOLOGY (SHANGHAI) CO LTD

Translation method and system based on multi-agent cooperation and multi-stage optimization

Pending | CN122433758A | Document structuring | Structure analysis

The application discloses a translation method and system based on multi-agent cooperation and multi-stage optimization, and belongs to the technical field of computers. The method obtains complete text and pre-extracts terms through OCR recognition and document structure analysis; generates a translation strategy through input preprocessing and translation task planning; constructs a semantic structure using semantic analysis, and enhances professional terms and domain knowledge in combination with knowledge graph retrieval; dynamically generates multiple translation agents to perform translation in parallel, and generates candidate translations; unifies term expression through term consistency processing, and selects the optimal translation by using multi-dimensional quality evaluation; and finally generates structured translations and updates the knowledge graph. The application significantly improves translation accuracy, term consistency and translation fluency through multi-agent cooperation, knowledge graph enhancement and multi-stage optimization mechanism, and is especially suitable for professional document and publishing level translation scenarios.

Owner:CITIC UNITED CLOUD TECH CO LTD

A method and system for interactive editing and dynamic reconstruction of a bid document based on a large language model

Pending | CN122242470A | Text processing | Biological models | Linguistic model | Interactive editing

This invention discloses a method and system for interactive editing and dynamic reconstruction of tender documents based on a large language model. Step 1: Initialize the tender document structure: The user uploads the tender document, and the system extracts the tender requirements through OCR and text parsing. Step 2: User interactively edits the table of contents: The user performs operations such as adding, deleting, sorting, and renaming nodes in the front-end directory tree area. Step 3: Generate chapter content and bind prompt words. Step 4: Update content and synchronize status: After the large language model returns the generated results, the system writes the new content into the corresponding chapter container. Step 5: Export the tender document: The system reads the structured tender document data. Through systematic innovation in structure awareness, controllable granularity, context fusion, and human-computer collaboration, an efficient, reliable, and flexible interactive reconstruction system for tender documents is constructed.

Owner:JIANGXI FASHION TECH

Document segmentation methods, apparatus, computer equipment and storage media

Active | CN121902816B | Accurately identify structural boundaries | Improve segmentation accuracy | Document structuring | Engineering

This application discloses a document segmentation method, apparatus, computer device, and storage medium. In response to a segmentation command, a document to be segmented is acquired; the document is parsed to obtain multiple text units; visual features related to the document layout are determined based on the text units; a segmentation score is calculated based on the visual features; and segmentation is performed based on the segmentation score. The application draws on the document's visual layout information instead of treating the document as a plain text stream, which avoids the associated text extraction quality defects, and it does not rely on OCR. Segmentation combines the visual and geometric layout characteristics a user sees when browsing the text, so it conforms to how users read documents, identifies document structural boundaries accurately, and improves the accuracy of text segmentation.

Owner:HANGZHOU YOUZAN TECH CO LTD
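
To make the idea of a layout-based segmentation score concrete, here is a minimal sketch. The abstract does not specify which visual features or weights are used, so everything below is an assumption for illustration: the `TextUnit` fields, the two features (vertical gap and font-size change), their equal weighting, and the threshold of 1.0 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextUnit:
    text: str
    top: float        # y-coordinate of the unit's top edge
    bottom: float     # y-coordinate of the unit's bottom edge
    font_size: float


def boundary_score(prev: TextUnit, cur: TextUnit) -> float:
    """Score how likely a segment boundary lies between two consecutive units."""
    line_height = max(prev.bottom - prev.top, 1e-6)
    # Vertical whitespace between the units, measured in line heights.
    gap = max(cur.top - prev.bottom, 0.0) / line_height
    # Relative change in font size (e.g. body text followed by a heading).
    size_jump = abs(cur.font_size - prev.font_size) / prev.font_size
    return gap + size_jump  # assumed equal weighting


def segment(units: list[TextUnit], threshold: float = 1.0) -> list[list[TextUnit]]:
    """Split units into segments wherever the boundary score reaches the threshold."""
    if not units:
        return []
    segments, current = [], [units[0]]
    for prev, cur in zip(units, units[1:]):
        if boundary_score(prev, cur) >= threshold:
            segments.append(current)
            current = []
        current.append(cur)
    segments.append(current)
    return segments


units = [
    TextUnit("Title", 0, 20, 20),
    TextUnit("Body line 1", 40, 52, 12),
    TextUnit("Body line 2", 54, 66, 12),
    TextUnit("Next heading", 90, 106, 16),
]
print([[u.text for u in seg] for seg in segment(units)])
```

With these sample coordinates the large gaps around the title and the second heading exceed the threshold, while the tightly spaced body lines stay together.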

Document correction evaluation methods, apparatus, computer equipment and storage media

Pending | CN122310145A | Document structuring | Evaluation result

This invention relates to the field of document corrector technology, and discloses a document corrector evaluation method, apparatus, computer device, and storage medium. First, an error-level injection operation is performed on the original annotated document to obtain an erroneously annotated document. Then, the document corrector uses the erroneously annotated document for correction processing to obtain an output document. Next, text line-by-line processing is performed on the original annotated document and the output document to obtain line-level matching index data, quantifying the effect of line-level content restoration. Document structure parsing is performed on the original annotated document and the output document to obtain tree structure similarity, reflecting the degree of restoration of the document's hierarchical structure. Finally, based on the line-level matching index data and the tree structure similarity, the correction quality evaluation result of the document corrector is determined, thereby achieving a comprehensive performance evaluation of the document corrector in both line-level content and hierarchical structure dimensions, improving the comprehensiveness and accuracy of document corrector performance evaluation.

Owner:BEIJING JIZHI DIGITAL TECH CO LTD
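
The two evaluation dimensions described above, line-level matching and tree structure similarity, can be sketched as follows. This is an illustrative approximation under stated assumptions, not the metric defined in the patent: line matching uses Python's `difflib.SequenceMatcher`, tree similarity is a Jaccard overlap of root-to-node heading paths, and the 50/50 weighting in `quality` is hypothetical.

```python
from difflib import SequenceMatcher


def line_match_ratio(original: str, output: str) -> float:
    """Similarity of two texts compared line by line (1.0 means identical)."""
    return SequenceMatcher(None, original.splitlines(), output.splitlines()).ratio()


def heading_paths(tree, prefix=()):
    """Yield the root-to-node path of every node in a (title, children) tree."""
    title, children = tree
    path = prefix + (title,)
    yield path
    for child in children:
        yield from heading_paths(child, path)


def tree_similarity(t1, t2) -> float:
    """Jaccard overlap of the two trees' heading paths (assumed metric)."""
    p1, p2 = set(heading_paths(t1)), set(heading_paths(t2))
    return len(p1 & p2) / len(p1 | p2)


def quality(orig_text, out_text, orig_tree, out_tree, w_line=0.5) -> float:
    """Combine both dimensions with a hypothetical weight."""
    return (w_line * line_match_ratio(orig_text, out_text)
            + (1 - w_line) * tree_similarity(orig_tree, out_tree))


original_tree = ("Doc", [("1 Scope", []), ("2 Terms", [])])
output_tree = ("Doc", [("1 Scope", [])])  # the corrector lost one section
print(quality("a\nb\nc", "a\nx\nc", original_tree, output_tree))
```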

Providing user-guided document structuring using block-based templates

Pending | US20260147989A1 | Natural language data processing | Document structuring | Engineering

The disclosed technology pertains to a block-based system for creating and organizing documents. The technology includes a method for receiving a user’s indication to create a document and providing guidance through multiple templates selected based on the user’s profile and recent activities. The system allows users to preview templates by creating temporary blocks and instantiating them into permanent blocks upon selection. The block model supports dynamic units of information that can be transformed, moved, and nested within workspaces, enabling flexible customization and organization. The system aims to simplify document creation by offering relevant templates and visualizations, enhancing user experience without extensive training.

Owner:NOTION LABS INC

Intelligent agent collaborative document structuring method and system using intermediate representation

Pending | CN122242477A | Text processing | Element analysis | Theoretical computer science

This invention discloses a method and system for generating structured documents through intelligent agents using intermediate representation, belonging to the interdisciplinary application technology of natural language processing, multimodal content generation, and human-computer interaction. The implementation method of this invention is as follows: 1. Perform structural and element analysis on reducible information sources to form a structured intermediate representation; 2. Utilize data contracts to perform rollback, repair, and update processing on the triggering conditions of the multi-agent set, and initialize the scheduler; 3. Allocate text and multimodal elements to generate the outline structure and page-level organization scheme of the presentation; 4. Generate the final draft of the specification; 5. Select the corresponding template and fill the template slots with fields to generate structured pages; 6. The visual optimization agent optimizes the text style and visual expression of the structured pages; 7. Form a presentation and extract the speech text. Compared with existing technologies, this invention improves the consistency of text and graphics and the visual interactivity of automatically generated presentations.

Owner:BEIJING INST OF TECH

Document verification method and device, electronic equipment and storage medium

Active | CN121257551B | Semantic analysis | Biological models | Document structuring | Theoretical computer science

The application relates to the technical field of text processing, and discloses a document verification method and device, electronic equipment and a storage medium, the method comprising the following steps: text extraction is performed on a plurality of to-be-verified documents to obtain a structured text set; the document structure of the structured text set is analyzed, and a corresponding regular expression set is generated based on the document structure; the to-be-verified documents and standard verification documents are matched based on the regular expression set, and a semantic similarity score and a fuzzy matching score are obtained respectively; a comprehensive score is calculated based on the semantic similarity score and the fuzzy matching score, and the similarity relationship between the to-be-verified documents and the standard verification documents is verified based on the comprehensive score. The application significantly improves the work efficiency and verification accuracy in a large-scale document verification scenario, and solves the problem that the document content cannot be efficiently and accurately verified to conform to a specific standard.

Owner:ZHEJIANG CHINT INSTR & METER
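
The comprehensive score described above combines a semantic similarity score with a fuzzy matching score. The sketch below shows one plausible way to combine two such scores; it is an assumption-laden illustration rather than the patented formula. A bag-of-words cosine stands in for a real semantic model, `difflib.SequenceMatcher` supplies the fuzzy score, and the weight `w_semantic=0.6` is hypothetical.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def cosine_bow(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a crude stand-in for a semantic model)."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def fuzzy(a: str, b: str) -> float:
    """Character-level fuzzy matching score in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def combined_score(candidate: str, standard: str, w_semantic: float = 0.6) -> float:
    """Weighted sum of the two scores; the weight is an assumed value."""
    return (w_semantic * cosine_bow(candidate, standard)
            + (1 - w_semantic) * fuzzy(candidate, standard))


standard = "Meter accuracy class 0.5S, rated voltage 220 V"
print(combined_score("Meter accuracy class 0.5S, rated voltage 230 V", standard))
print(combined_score("Warranty period is two years", standard))
```

A document would then be accepted or flagged by comparing the combined score against a threshold chosen for the application.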

A multi-modal based document analysis method and apparatus

Pending | CN122113902A | Semantic analysis | Knowledge based models | Document structuring | Document analysis

The application provides a multi-modal based document analysis method and device, relates to the field of unstructured data processing, and comprises the following steps: performing multi-modal analysis on a received unstructured document to obtain document content and corresponding content labels; performing tree structure construction on the document content and the corresponding content labels according to a preset financial system document structure to obtain a directory structure tree; converting the directory structure tree into a knowledge graph, and performing document content retrieval based on the knowledge graph. The application can conveniently and efficiently analyze system documents and provide a convenient and accurate system retrieval approach.

Owner:IND BANK CO

A method for structured processing of bidding documents based on analytical credibility feedback and rule self-evolution mechanism

Pending | CN122414158A | Informatization | Document structuring

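This invention discloses a method and system for structured processing of bidding documents based on parsing-credibility feedback and a rule self-evolution mechanism. The method applies standardized preprocessing to bidding documents and performs preliminary parsing against a preset semantic rule base to extract key tender elements. The preliminary structured result is then fed to a large language model for semantic completion and contextual verification, and a credibility index is computed from the parsing result. The system further corrects the parsing results through human feedback and dynamically adjusts the rule weights in the rule base, forming a self-evolving rule base that continuously optimizes the parsing rules. The method improves the accuracy and completeness of key-field extraction, strengthens the system's adaptability to bidding documents across industries and document types, and reduces manual maintenance costs. The system comprises data acquisition, rule parsing, semantic verification, credibility evaluation, human feedback, rule evolution, and result output modules, which work together to automate structured processing. The invention offers high parsing accuracy, strong adaptability, and continuous optimization, and suits business scenarios such as engineering construction, government procurement, IT construction, and service outsourcing.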

OFD cross-format conversion method, computer device and storage medium

Pending | CN122173463A | Digital data information retrieval | Natural language data processing | Parallel computing | Directory

The application discloses an OFD cross-format conversion method, computer equipment and a storage medium, relates to the technical field of electronic document format conversion, and comprises the following steps: scanning an OFD document file, detecting a compression header identifier, recursively decompressing a nested compression structure and performing directory verification to obtain an OFD document structure; analyzing the structure to obtain root document information, signature information and resource information containing fonts and pictures; constructing a font mapping relationship for replacing missing fonts during rendering based on the resource information; obtaining the page content of each document page according to the root document information and the signature information; concurrently rendering all the page content in a multithreading mode to generate rendering data corresponding to each page; and merging the rendering data of each page to generate a target format output file. The application ensures complete extraction of nested compressed resources through recursive decompression, guarantees consistency of formats by using font mapping, improves conversion efficiency by combining multithreading rendering, and realizes high-fidelity and high-efficiency conversion of OFD documents into formats such as PDF.

Owner:SHENZHEN LEAGSOFT TECH

A document parsing method and system based on large and small model cooperation and a readable storage medium

Pending | CN122313504A | Document structuring | Engineering

This invention discloses a document parsing method, system, and readable storage medium based on collaboration between large and small models, relating to the fields of intelligent document processing and artificial intelligence. It includes: acquiring a page image; inputting the page image into a first model to detect and locate the regions and categories of page elements, outputting location information and category labels; sorting the page elements according to the location information and category labels to generate a page element sequence; inputting image slices of the sorted page elements into a second model for category-specific content recognition, outputting structured recognition results; and generating structured data containing document structure and content based on the structured recognition results. This invention separates the document layout perception task from the content semantic understanding task and processes them collaboratively, leveraging the respective advantages of large and small models to achieve automated parsing, reading order recovery, content recognition, and directory structure reconstruction of complex PDF documents, thereby outputting high-quality structured document data.

Owner:BEIJING YIDAO BOSHI TECH
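
The sorting step described above, which turns detected page elements into a reading-order sequence, can be illustrated with a simple geometric heuristic. The abstract does not say how the sorting is done, so this sketch assumes a plain top-to-bottom, left-to-right order with a row tolerance; `Element`, `reading_order`, and `row_tolerance` are hypothetical names, and multi-column layouts would need a more elaborate rule.

```python
from dataclasses import dataclass


@dataclass
class Element:
    label: str   # category label from the layout model, e.g. "title" or "text"
    x: float     # left edge of the detected region
    y: float     # top edge of the detected region


def reading_order(elements: list[Element], row_tolerance: float = 10.0) -> list[Element]:
    """Order elements top-to-bottom, then left-to-right within each row."""
    rows: list[list[Element]] = []
    for el in sorted(elements, key=lambda e: e.y):
        # Elements whose top edges are close to the row's first element share a row.
        if rows and abs(el.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(el)
        else:
            rows.append([el])
    return [el for row in rows for el in sorted(row, key=lambda e: e.x)]


page = [
    Element("text-right", 300, 102),
    Element("title", 50, 10),
    Element("text-left", 50, 100),
]
print([el.label for el in reading_order(page)])
```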

Medical document retrieval-augmented generation method and system based on parent-child node structure

Pending | CN122087089A | Guaranteed strict correspondence | Avoid cross-semantic topic aliasing issues | Text database indexing | Special data processing applications | Semantic tree | Document structuring

The invention provides a medical document retrieval-augmented generation (RAG) method and system based on a parent-child node structure, relating to the technical field of medical document retrieval. The method comprises the following steps: parsing an input medical document to obtain a sequence of medical text units; constructing three layers of medical semantic tree nodes; performing quality inspection on each paragraph node to obtain a retrieval abstract; storing the retrieval abstract in a vector database, and storing the atomic proposition quadruples and paragraph keywords in a full-text retrieval library; routing the user query to the vector database to execute semantic retrieval while executing keyword retrieval in parallel in the full-text retrieval library, to obtain a primary retrieval result set; and calculating relevance scores between the primary retrieval result set and the user query with a reranking model, then sorting in descending order of score to obtain the final retrieval result list. The invention addresses three shortcomings of existing RAG systems in medical document processing: insufficient awareness of document structure, no reliable guarantee of the integrity of medical facts, and insufficient cooperation between retrieval at multiple granularities.

Owner:XIEHE HOSPITAL ATTACHED TO TONGJI MEDICAL COLLEGE HUAZHONG SCI & TECH UNIV
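
The last two steps of the abstract, merging the results of semantic and keyword retrieval and then sorting them by a relevance score, can be sketched as below. This is a toy illustration under stated assumptions: the two hit lists are given as plain ID lists, and `overlap_score` is a hypothetical word-overlap stand-in for the reranking model, which the abstract does not specify.

```python
def merge_results(vector_hits: list[str], keyword_hits: list[str]) -> list[str]:
    """Union two ranked ID lists, keeping first-seen order and dropping duplicates."""
    seen, merged = set(), []
    for doc_id in [*vector_hits, *keyword_hits]:
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged


def overlap_score(query: str, text: str) -> float:
    """Fraction of query words found in the text (stand-in for a reranking model)."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def rerank(query: str, candidates: list[str], docs: dict[str, str]) -> list[str]:
    """Sort candidate IDs by relevance score, highest first."""
    scored = [(overlap_score(query, docs[d]), d) for d in candidates]
    return [d for _, d in sorted(scored, key=lambda pair: pair[0], reverse=True)]


# Hypothetical paragraph store and retrieval results.
docs = {
    "p1": "aspirin dosage for adults",
    "p2": "aspirin contraindications",
    "p3": "ibuprofen dosage",
}
primary = merge_results(["p2", "p1"], ["p1", "p3"])
print(rerank("aspirin dosage", primary, docs))
```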

A method and related apparatus for experimental verification and evaluation based on AI intelligent agents

Active | CN120873522B | Artificial life | Knowledge representation | Linguistic model | Engineering

This application discloses an AI-based experimental verification and evaluation method and related apparatus, relating to the field of computer technology. The method includes: structurally segmenting experimental evaluation documents based on a parent-child segmentation pattern to obtain a two-layer segmented document structure; vectorizing the content of its sub-blocks to generate a vectorized knowledge base; invoking a data extraction agent based on data extraction rules, using an experimental data management software database as the data source, and extracting experimental plan, implementation, and result information from the document using a large language model; vectorizing user evaluation needs based on an experimental verification and evaluation task planning agent, calculating semantic similarity using the vectorized knowledge base, and deriving relevant evaluation algorithm specifications and experimental cost accounting standards; and evaluating the sufficiency, progress, and cost of the experimental verification process based on the specifications and standards using an experimental verification sufficiency, progress, and cost evaluation agent to obtain the experimental verification evaluation results. This application can quickly obtain evaluation results, supporting leadership decision-making.

Owner:BEIJING RAINFE TECH
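
The parent-child segmentation pattern mentioned above yields a two-layer structure in which small child blocks are indexed for retrieval and each points back to a larger parent block. The sketch below is a minimal illustration of that pattern under simplifying assumptions: parents are blank-line-separated paragraphs, children are groups of sentences split on ". ", and `child_size` is a hypothetical parameter. The patent's actual segmentation rules are not given in the abstract.

```python
def parent_child_chunks(text: str, child_size: int = 2):
    """Split text into parent blocks (paragraphs) and child blocks (sentence groups)."""
    parents = [p.strip() for p in text.split("\n\n") if p.strip()]
    children = []
    for parent_id, para in enumerate(parents):
        sentences = [s.strip() for s in para.split(". ") if s.strip()]
        for i in range(0, len(sentences), child_size):
            children.append({
                "parent_id": parent_id,  # link back to the enclosing parent block
                "text": ". ".join(sentences[i:i + child_size]),
            })
    return parents, children


sample = "A one. A two. A three.\n\nB one."
parents, children = parent_child_chunks(sample)
for child in children:
    print(child["parent_id"], "|", child["text"])
```

In a retrieval setting, only the child texts would be vectorized; a matching child is then expanded to its parent block to give the language model fuller context.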

A multi-modal fusion complex layout document structuring generation method and system

Pending | CN122414138A | Document structuring | Data stream

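This invention belongs to the technical field of document processing and specifically relates to a multi-modal fusion method and system for structured generation of complex-layout documents. The method includes: parsing a document to obtain a set of document object units and extracting a two-dimensional coordinate position vector and a visual region feature vector for each unit, with a content semantic vector and a layout attribute vector additionally extracted for text blocks. The set is input to a pre-trained model whose encoder uses an attention mechanism weighted by feature-space correlation to generate spatial relationship distribution parameters and inject a spatial bias term. The decoder takes a virtual root node as input and generates a probability distribution over candidate parent nodes for each document object unit; a graph search algorithm then yields the globally optimal tree-shaped logical structure, which is converted into a structured data stream. By fusing multi-modal features with a spatial relationship bias and solving for a globally optimal tree, the invention generates the logical structure of complex documents accurately and completely.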

Method, apparatus, device and storage medium for template creation

Active | CN119473084B | Digital data information retrieval | Special data processing applications | Document structuring | Theoretical computer science

Embodiments of the present disclosure provide a template creation method, apparatus, device and storage medium to create a target template for document generation. In the method, a template creation instruction for a target template is received. Structure information of the target template is presented, the structure information indicating a document structure generated based on the template creation instruction. The target template is created based on at least corresponding configuration information of one or more document parts included in the target template. According to the scheme of the present disclosure, the structure of the template is first generated according to the creation requirement. The creator of the template is allowed to configure each part to be included in the document, i.e. the creator is allowed to finely adjust the template. In this way, it can be ensured that the generated template is consistent with the expectation of the creator, thereby improving the efficiency of the document generation by the user of the template.

Owner:BEIJING ZITIAO NETWORK TECH CO LTD

Exploiting domain-specific language characteristics for language model pretraining

Active | US12682247B2 | Document structuring | Data pack

A method, apparatus, non-transitory computer readable medium, and system of training a domain-specific language model are described. One or more aspects of the method, apparatus, non-transitory computer readable medium, and system include obtaining domain-specific training data including a plurality of domain-specific documents having a document structure corresponding to a domain, and obtaining domain-agnostic training data including a plurality of documents outside of the domain. The domain-specific training data and the domain-agnostic training data are used to train a language model to perform a domain-specific task based on the domain-specific training data and to perform a domain agnostic task based on the domain-agnostic training data.

Owner:ADOBE INC

An intelligent detection method, device and equipment for document type advanced persistent threats and a medium

Pending | CN122333450A | Achieve depth perception | Improve accuracy | Engineering | Computers technology

This application discloses an intelligent detection method, apparatus, device, and medium for document-based advanced persistent threats (APTs), relating to the field of computer technology. It includes: static preprocessing of the target document to extract its document structure features and metadata, and establishing a document object relationship graph based on these features and metadata; constructing an isolation sandbox using virtualization technology, and performing dynamic behavior analysis on the target document based on the sandbox and the document object relationship graph to generate corresponding dynamic analysis results; using a neural network architecture configured with bidirectional gated recurrent units to perform deep modeling of the dynamic analysis results and analyze the semantic features of the resulting behavioral sequences to output threat assessment results; the behavioral sequences include user operation logs, device interaction records, and text behavior trajectories. This improves the accuracy and efficiency of document-based APT detection and reduces the false positive rate.

Owner:HANGZHOU DBAPPSECURITY CO LTD +1

Intelligent extraction method and system for CAD drawing multi-format BOM

Active | CN122135393A | Character and pattern recognition | Manufacturing computing systems | Document structuring | Software engineering

The application discloses a method and system for intelligent extraction of multi-format bills of materials (BOMs) from CAD drawings. The method includes: performing full, lossless cross-space parsing of the drawing's model space and all layout spaces, reading the drawing's underlying entity information directly, where the entity information includes text, line, block, attribute, and table entities; dynamically loading external recognition rules; executing a five-level progressive BOM priority recognition strategy that proceeds level by level from standard to non-standard formats until a level succeeds and BOM data is obtained; cleaning the BOM data; and, after cleaning, independently constructing a document structure that follows the Office Open XML standard to generate a standard Excel-format bill of materials in an environment with no CAD or Office software dependency. The application offers full recognition coverage, lightweight offline operation, configurable rules, no cross-space omissions, convenient deployment, and data security.

Owner:NANJING LETSTECH CO LTD
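
The final step of the CAD BOM abstract above (building an Office Open XML document without Office installed) can be illustrated with a minimal sketch. This is not the patented implementation: the function names `sheet_xml` and `build_xlsx`, the sheet name `BOM`, and the choice of inline-string cells are assumptions made for illustration. It writes the five package parts a bare SpreadsheetML workbook needs, using only the Python standard library.

```python
import io
import zipfile
from xml.sax.saxutils import escape

XML_DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
DOC_REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
CT_PREFIX = "application/vnd.openxmlformats-officedocument.spreadsheetml"

CONTENT_TYPES = (
    f'{XML_DECL}<Types xmlns="{CT_NS}">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    f'<Override PartName="/xl/workbook.xml" ContentType="{CT_PREFIX}.sheet.main+xml"/>'
    f'<Override PartName="/xl/worksheets/sheet1.xml" ContentType="{CT_PREFIX}.worksheet+xml"/>'
    '</Types>'
)
ROOT_RELS = (
    f'{XML_DECL}<Relationships xmlns="{PKG_REL_NS}">'
    f'<Relationship Id="rId1" Type="{DOC_REL_NS}/officeDocument" Target="xl/workbook.xml"/>'
    '</Relationships>'
)
WORKBOOK = (
    f'{XML_DECL}<workbook xmlns="{MAIN_NS}" xmlns:r="{DOC_REL_NS}">'
    '<sheets><sheet name="BOM" sheetId="1" r:id="rId1"/></sheets></workbook>'
)
WORKBOOK_RELS = (
    f'{XML_DECL}<Relationships xmlns="{PKG_REL_NS}">'
    f'<Relationship Id="rId1" Type="{DOC_REL_NS}/worksheet" Target="worksheets/sheet1.xml"/>'
    '</Relationships>'
)


def sheet_xml(rows):
    """Serialize rows of values as inline-string cells (hypothetical helper)."""
    row_xml = []
    for row in rows:
        cells = "".join(
            f'<c t="inlineStr"><is><t>{escape(str(value))}</t></is></c>'
            for value in row
        )
        row_xml.append(f"<row>{cells}</row>")
    sheet_data = "".join(row_xml)
    return f'{XML_DECL}<worksheet xmlns="{MAIN_NS}"><sheetData>{sheet_data}</sheetData></worksheet>'


def build_xlsx(rows):
    """Return the bytes of a minimal .xlsx package holding one sheet of text cells."""
    parts = {
        "[Content_Types].xml": CONTENT_TYPES,
        "_rels/.rels": ROOT_RELS,
        "xl/workbook.xml": WORKBOOK,
        "xl/_rels/workbook.xml.rels": WORKBOOK_RELS,
        "xl/worksheets/sheet1.xml": sheet_xml(rows),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, xml in parts.items():
            archive.writestr(name, xml)
    return buffer.getvalue()
```

Writing the result of `build_xlsx([["Item", "Qty"], ["Bolt", "4"]])` to a file ending in `.xlsx` yields the package. The sketch omits styles, numeric cell types, and shared strings, and how a particular spreadsheet application handles such a bare package has not been verified here.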

Information processing system, information processing device, information processing method, and program

Pending | JP2026109068A | Information processing | Document structuring

[Problem] To make the modifications to a document that the user requests. [Solution] The information processing system includes a document generation unit that generates a document having a predetermined document structure using predetermined document data, and a display control unit that displays the document. In response to a modification instruction for the document, the display control unit displays a group of information that is extracted from the document data and related to the modification instruction, and the document generation unit modifies the document based on the modification instruction and the information selected from that group.

Owner:RICOH CO LTD

Method and storage medium for enterprise-level unstructured knowledge governance

Active | CN121597848B | Text database indexing | Special data processing applications | Data ingestion | Document structuring

The application provides a method and a storage medium for enterprise-level unstructured knowledge governance. The method comprises the following steps: inputting an initial document into a visual language model for layout analysis to obtain document structure information of a target document; performing segmentation processing on the document structure information to generate a text slice to be vectorized; extracting metadata based on the text slice to generate a structured knowledge content vector; and configuring an index according to the knowledge content vector to form an enterprise-level knowledge governance standard. Through the visual language model for analyzing the document structure, the intelligent segmentation algorithm for generating the text slice, the metadata extraction for constructing the knowledge vector, and the hybrid index configuration, the problems of inaccurate layout analysis, logical structure damage, and low retrieval efficiency in the traditional technology are effectively solved, and the method has the advantages of improving the knowledge governance precision and usability.

Owner:DIGITAL CHINA CHINA CO LTD +1
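
The segmentation step in the knowledge governance abstract above (turning document structure information into text slices that carry metadata) can be sketched as follows. This is an illustration under stated assumptions, not the patented algorithm: the input format (a list of `(level, text)` pairs standing in for the layout analysis output), the `TextSlice` type, the `slice_by_structure` function, and the `max_chars` limit are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextSlice:
    heading_path: tuple  # headings enclosing the slice, outermost first
    text: str            # body text to be vectorized later


def slice_by_structure(blocks, max_chars=200):
    """Split blocks into slices that never cross a heading boundary.

    blocks: list of (level, text); level 0 is a body paragraph and
    levels 1..n are headings (assumed output of a layout analysis step).
    """
    slices = []
    path = []    # current stack of headings
    buffer = []  # paragraphs collected under the current heading

    def flush():
        if buffer:
            slices.append(TextSlice(tuple(path), " ".join(buffer)))
            buffer.clear()

    for level, text in blocks:
        if level > 0:
            flush()               # close the slice under the previous heading
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(text)
        else:
            size = sum(len(p) for p in buffer)
            if buffer and size + len(text) > max_chars:
                flush()           # keep each slice under the size limit
            buffer.append(text)
    flush()
    return slices


blocks = [
    (1, "Policy"),
    (0, "Intro text."),
    (2, "Scope"),
    (0, "Applies to all staff."),
    (0, "Also contractors."),
    (1, "Appendix"),
    (0, "Forms."),
]
for item in slice_by_structure(blocks):
    print(" > ".join(item.heading_path), "|", item.text)
```

Keeping the heading path on each slice is one simple way to preserve the logical structure that plain fixed-length chunking would lose; the metadata extraction and vectorization steps named in the abstract are outside this sketch.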

A method and system for processing long documents

Pending | CN122364436A | Document structuring | Theoretical computer science

This invention relates to the field of natural language processing technology and discloses a method and system for processing long documents. The method includes: constructing a unified document object, extracting global semantics, and generating a navigational summary as a global constraint; matching the summary against an expert database, reusing a matched expert or deriving a new one when no match is found; having experts slice the document according to its structure to generate evidence cards containing conclusions, excerpts from the original text, and source information, and establishing a source mapping; jointly verifying the evidence cards and question-answer pairs, and removing all evidence cards that depend on a question-answer pair if verification fails; extracting experience increments, then merging and compressing them under memory capacity constraints to update expert memory; and persisting the processing data and generating a status audit card. This invention addresses unreliable output by combining global semantic anchoring, dynamic expert routing, traceable evidence, and a closed loop of joint verification, achieving reliable, traceable, and auditable long document processing.

Owner:ANHUI AGRICULTURAL UNIVERSITY

An AI-based Word document intelligent analysis and structured storage method

Pending | CN122366447A | Engineering | Artificial intelligence

The application relates to the technical field of document semantic analysis, and in particular to an AI-based method for intelligent analysis and structured storage of Word documents. The method analyzes the consistency of format attributes across the run nodes of a Word document, recombines logical text segments non-destructively, constructs a run-merge mapping, and establishes a character offset mapping. It then detects minimum differing substrings between documents through a two-layer alignment on container path intersections and text segment sequence numbers, and generates field boundary records. A two-component semantic fingerprint is built from a hash of the fixed label text preceding a field combined with the field's content pattern, and a consistent bookmark name is generated through a fingerprint registry. Target nodes are located according to the operation range of the run nodes, format-preserving bookmark injection is performed, and the semantic fingerprints and injection results are written to a database to achieve structured field storage. This improves the accuracy and degree of automation of document field recognition and structured processing while maintaining the integrity of the original document structure and format.

Owner:星汇智云科技(江苏)有限公司
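
The first step of the Word document abstract above (grouping runs with consistent format attributes and recording a character offset mapping) can be sketched as below. This is a simplified illustration, not the patented method: runs are modeled as plain `(text, fmt)` tuples rather than real Word XML nodes, and the `merge_runs` function and its return shape are assumptions.

```python
def merge_runs(runs):
    """Merge adjacent runs that share identical formatting.

    runs: list of (text, fmt), where fmt is a hashable tuple of format
    attributes (a stand-in for a Word run's properties).
    Returns (segments, offsets):
      segments: list of (merged_text, fmt, source_run_indices)
      offsets:  {run_index: (segment_index, start_char_in_segment)}
    The original runs are left untouched, so the merge is non-destructive.
    """
    segments = []
    offsets = {}
    for index, (text, fmt) in enumerate(runs):
        if segments and segments[-1][1] == fmt:
            merged_text, _, sources = segments[-1]
            offsets[index] = (len(segments) - 1, len(merged_text))
            segments[-1] = (merged_text + text, fmt, sources + [index])
        else:
            offsets[index] = (len(segments), 0)
            segments.append((text, fmt, [index]))
    return segments, offsets


runs = [
    ("Con", ("bold",)),
    ("tract No.", ("bold",)),
    (" 2024-", ()),
    ("001", ()),
]
segments, offsets = merge_runs(runs)
for merged_text, fmt, sources in segments:
    print(repr(merged_text), fmt, sources)
```

The offset mapping lets a position found in the merged text (for example, a field boundary) be traced back to the original run and character position, which is what makes it possible to modify the document later without disturbing its formatting.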

A document intelligent navigation method, system, electronic device, and storage medium

Active | CN120144824B | Semantic analysis | Text processing | Document structuring | Engineering

This application provides a document intelligent navigation method, system, electronic device, and storage medium, relating to the field of document processing technology. The method includes: acquiring a target document, wherein the target document includes one of the following document types: XML document, HTML document; extracting document structure information of the target document using XPath rules, and performing semantic parsing of the content of the target document using Natural Language Processing (NLP) to obtain a parsing result, wherein the parsing result is used to represent the relationships between different structural parts in the document structure information; and generating a target navigation directory based on the document structure information and the parsing result, wherein the target navigation directory is used for intelligent navigation. Implementing the technical solution provided in this application achieves the effect of improving the user reading experience.

Owner:BEIJING LINGDING LANHAI TECHNOLOGY CO LTD
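
The structure extraction step in the document navigation abstract above (using XPath rules to pull document structure from an XML document and build a navigation directory) can be sketched with the Python standard library. The sample document, the `section` element with a `title` attribute, and the `build_nav` and `render_nav` helpers are hypothetical; the NLP-based semantic parsing that the abstract also describes is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: nested <section> elements, each with a title attribute.
SAMPLE = """<doc>
  <section title="Overview">
    <section title="Scope"/>
    <section title="Terms"/>
  </section>
  <section title="Method">
    <section title="Parsing"/>
  </section>
</doc>"""


def build_nav(node, depth=0):
    """Return (depth, title) entries in document order."""
    entries = []
    # "./section" is an XPath expression selecting direct child sections.
    for child in node.findall("./section"):
        entries.append((depth, child.get("title", "")))
        entries.extend(build_nav(child, depth + 1))
    return entries


def render_nav(entries):
    """Render the entries as an indented plain-text directory."""
    return "\n".join("  " * depth + title for depth, title in entries)


nav = build_nav(ET.fromstring(SAMPLE))
print(render_nav(nav))
```

`xml.etree.ElementTree` supports only a subset of XPath, which is enough for this child-selection rule; a full XPath engine would be needed for more complex rules.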

Report generation method, apparatus, device, storage medium, and program product

Pending | CN122154655A | Semantic analysis | Biological models | Generation process | Document structuring

The application provides a report generation method, device, equipment, storage medium, and program product, relating to the technical field of artificial intelligence. The method comprises the following steps: generating, from a report source file, a chapter tree representing the structure of the report content; performing semantic analysis and structure planning on each chapter based on the chapter tree to generate report design mode data; fusing the report design mode data with set TSD data to generate an executable design package; and generating a final report file based on the executable design package. Through semantic analysis and structure planning, the application captures the deeper intent of the report content. By fusing the design intent with the style specification, the generated result adapts flexibly to specific content requirements while complying with enterprise specifications, so that the output is deterministic in document structure, visual style, and content semantics while the generation process remains highly automated and intelligent.

Owner:CHINA MOBILE JIUTIAN ARTIFICIAL INTELLIGENCE TECHNOLOGY (BEIJING) CO LTD +1
