Document chunking method and system therefor

By chunking documents based on hierarchical structure and context, the method generates high-quality passages, enhancing the accuracy of generative language models in question-answering services.

WO2026141730A1 · PCT designated stage · Publication Date: 2026-07-02 · POSICUBE CO LTD

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: POSICUBE CO LTD
Filing Date: 2024-12-26
Publication Date: 2026-07-02

AI Technical Summary

Technical Problem

Existing document chunking methods based on fixed-length token limits disregard document context, leading to corrupted passages and reduced accuracy in generative language models.
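
As a rough illustration of this problem, the sketch below applies a naive fixed-length splitter to a short two-section text. The function name `fixed_length_chunks`, the use of whitespace-separated words as "tokens", and the sample text are all assumptions made for this example; none of them come from the patent.

```python
def fixed_length_chunks(text: str, max_tokens: int) -> list[str]:
    """Split text into consecutive windows of at most max_tokens words.

    Whitespace-separated words stand in for model tokens here (an assumption).
    """
    tokens = text.split()
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


# Hypothetical two-section document, flattened to a single string.
doc = (
    "1. Refunds Refunds are issued within 14 days. "
    "2. Shipping Orders ship in 2 days."
)

chunks = fixed_length_chunks(doc, 6)
for c in chunks:
    print(repr(c))
# '1. Refunds Refunds are issued within'
# '14 days. 2. Shipping Orders ship'
# 'in 2 days.'
```

The middle chunk mixes the end of the refunds section with the start of the shipping section, and no chunk carries a complete statement of either rule. This is the kind of context loss the problem statement describes.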

Method used

A method and system for chunking documents that takes the hierarchical structure of each document into account: the text is preprocessed so that context is maintained, and passages are then generated subject to a preset maximum passage length, which keeps each passage semantically consistent and complete.
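
One possible reading of this approach can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the document uses Markdown-style `#` headings separated from body text by blank lines, measures passage length in characters, and prefixes each passage with its heading trail. The names `Section`, `preprocess`, `render`, and `chunk` are hypothetical.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#+)\s+(.*)$")


@dataclass
class Section:
    path: tuple[str, ...]   # heading trail, e.g. ("Policy", "Refunds")
    paragraphs: list[str]


def preprocess(document: str) -> list[Section]:
    """Group paragraphs under the heading trail they belong to."""
    sections: list[Section] = []
    trail: list[str] = []
    current = Section(path=(), paragraphs=[])
    for block in re.split(r"\n\s*\n", document.strip()):
        block = block.strip()
        match = HEADING.match(block)
        if match:
            if current.paragraphs:
                sections.append(current)
            level = len(match.group(1))
            # Keep ancestors above this level, then add the new heading.
            trail = trail[:level - 1] + [match.group(2).strip()]
            current = Section(path=tuple(trail), paragraphs=[])
        elif block:
            current.paragraphs.append(" ".join(block.split()))
    if current.paragraphs:
        sections.append(current)
    return sections


def render(prefix: str, paragraphs: list[str]) -> str:
    body = " ".join(paragraphs)
    return f"{prefix}: {body}" if prefix else body


def chunk(sections: list[Section], max_len: int) -> list[str]:
    """Pack paragraphs into passages without crossing section boundaries.

    Simplification: a single paragraph longer than max_len is emitted whole.
    """
    passages: list[str] = []
    for sec in sections:
        prefix = " > ".join(sec.path)
        buf: list[str] = []
        for para in sec.paragraphs:
            candidate = buf + [para]
            if buf and len(render(prefix, candidate)) > max_len:
                passages.append(render(prefix, buf))
                buf = [para]
            else:
                buf = candidate
        if buf:
            passages.append(render(prefix, buf))
    return passages


sample = (
    "# Policy\n\n## Refunds\n\nRefunds are issued within 14 days.\n\n"
    "Contact support to start one.\n\n## Shipping\n\nOrders ship in 2 days."
)
passages = chunk(preprocess(sample), max_len=60)
```

Because every passage starts with its heading trail and never spans two sections, a passage such as `Policy > Shipping: Orders ship in 2 days.` remains interpretable on its own. A production version would also need a rule for splitting a single paragraph that exceeds the maximum length.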

Benefits of technology

Generates high-quality passages that enhance the usability of passage databases, significantly improving the accuracy of generative language models in question-answering services.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure KR2024021113_02072026_PF_FP_ABST

Abstract

Provided are a document chunking method and a system therefor. The document chunking method according to some embodiments may comprise the steps of: acquiring a target document; preprocessing the target document to generate processed text that reflects the hierarchical structure of the target document; chunking the processed text on the basis of the hierarchical structure and a preset maximum passage length to generate a plurality of passages; and storing the plurality of passages in a database. Because the context of the document is maintained within each passage, the method can generate high-quality passages with strong semantic unity and completeness, and as a result a highly usable passage DB can be constructed.
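
The final step, storing the passages in a database, might look like the sketch below. The abstract does not name a database or schema, so SQLite (via Python's standard `sqlite3` module), the `passages` table layout, and the helper names `open_passage_db`, `store_passages`, and `load_passages` are all assumptions chosen for illustration.

```python
import sqlite3


def open_passage_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a passage store with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS passages (
               doc_id   TEXT    NOT NULL,
               position INTEGER NOT NULL,
               text     TEXT    NOT NULL,
               PRIMARY KEY (doc_id, position)
           )"""
    )
    return conn


def store_passages(conn: sqlite3.Connection, doc_id: str,
                   passages: list[str]) -> int:
    """Replace all passages for doc_id in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM passages WHERE doc_id = ?", (doc_id,))
        conn.executemany(
            "INSERT INTO passages (doc_id, position, text) VALUES (?, ?, ?)",
            [(doc_id, i, p) for i, p in enumerate(passages)],
        )
    return len(passages)


def load_passages(conn: sqlite3.Connection, doc_id: str) -> list[str]:
    """Return the passages of doc_id in their original order."""
    rows = conn.execute(
        "SELECT text FROM passages WHERE doc_id = ? ORDER BY position",
        (doc_id,),
    )
    return [row[0] for row in rows]


conn = open_passage_db()
stored = store_passages(
    conn,
    "doc-1",
    ["Policy > Refunds: Refunds are issued within 14 days.",
     "Policy > Shipping: Orders ship in 2 days."],
)
```

Keeping a `position` column preserves passage order within a document, which lets a question-answering service retrieve neighbouring passages if it needs more context.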