A high-precision Q / A extraction system that adds virtual time information to non-standardized meeting minutes.

The method generates virtual time information and uses region-specific embedding models to improve the accuracy of Q / A extraction from non-standardized meeting minutes, addressing both the limits of relying on physical timestamps and the lack of standardized markup.

JP2026105775A · Pending · Publication Date: 2026-06-26 · 細谷 有策

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
細谷 有策
Filing Date
2024-12-16
Publication Date
2026-06-26

Abstract

[Problem] To provide an information processing method that enables highly accurate extraction of Q / A pairs from non-standardized local council meeting minutes. [Solution] The method includes the steps of: inputting non-standardized meeting minutes text and automatically identifying conceptual phrases; assigning to each conceptual phrase virtual time information that is independent of physical time; tagging each conceptual phrase with a speaker role and utterance type; detecting user-input keywords and related expressions in the conceptual phrases and associating their location information with the role tags and virtual time information; inputting into an LLM a prompt that integrates the role tags, keyword locations, virtual time information, and the output of a region-specific embedding model, to generate Q / A pairs related to specific keywords; and scoring the generated Q / A pairs on keyword relevance, role consistency, and contextual relevance based on virtual time order, then rescoring and updating the display order when the user inputs additional keywords.
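The keyword-association step summarized in the abstract can be illustrated with a minimal Python sketch. It assumes the conceptual phrases have already been tagged with a role, an utterance type, and a virtual time value; the names `TaggedPhrase` and `locate_keywords` and the record layout are hypothetical and do not appear in the publication, and plain substring matching stands in for whatever synonym detection the method actually uses.

```python
from dataclasses import dataclass


@dataclass
class TaggedPhrase:
    """A conceptual phrase with the tags assumed to come from earlier steps."""
    text: str
    role: str            # e.g. "questioner", "respondent"
    utterance_type: str  # e.g. "question", "response"
    virtual_time: float  # relative sequence index, not a physical timestamp


def locate_keywords(phrases, keyword, related=()):
    """Record where the keyword or a related expression occurs.

    Each hit ties the location (phrase index and character offset)
    to the role tag and virtual time of the phrase that contains it.
    """
    terms = [keyword, *related]
    hits = []
    for idx, phrase in enumerate(phrases):
        for term in terms:
            pos = phrase.text.find(term)  # simple substring match (assumption)
            if pos != -1:
                hits.append({
                    "term": term,
                    "phrase_index": idx,
                    "char_offset": pos,
                    "role": phrase.role,
                    "utterance_type": phrase.utterance_type,
                    "virtual_time": phrase.virtual_time,
                })
    return hits
```

Keeping the location, role, and virtual time in one record means later steps (prompt construction and scoring) can read everything they need from a single hit without re-scanning the minutes.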

Description

Technical Field

[0001] The present invention relates to a technique for precisely extracting question-and-answer (Q / A) pairs related to specific keywords from the minutes of local councils and other public institutions. In particular, it uses "virtual time information," which does not depend on physical time information, to clarify the structure of the discussion, and combines this with speaker role estimation, context reinforcement using a region-specific embedding model, and a proprietary prompt optimization method for a large language model (LLM), so that complex Q / A pairs can be extracted and used with high precision.

Background Art

[0002] Local councils and public institutions accumulate large volumes of minutes. However, because the minutes are not standardized and lack meta-information such as speaker tags and topic structure, it is difficult to extract the question-and-answer correspondences for a given keyword. Existing methods often rely on the time information of physical audio recordings and cannot fully reflect the logical and conceptual structure of the document.

[0003] Patent Document 1 deals with the analysis of construction site meetings, but does not address public policy contexts, region-specific expressions, or advanced Q / A extraction using an LLM. The non-patent documents likewise focus only on individual component technologies (such as minutes segmentation and argument analysis), and no solution has been proposed that comprehensively combines virtual time information, region-specific models, and LLM optimization. Furthermore, while the standardization of legal data is progressing at the national level, tagging lags behind in local councils, and the scattered, non-standardized state of the data is an obstacle to improving analysis accuracy.

Prior Art Documents

Patent Documents

[0004]

Patent Document 1

Non-Patent Documents

[0005]
[Non-Patent Document 1] "Dividing Meeting Minutes for Fact-Finding in Local Assemblies" (Proceedings of the 25th Annual Meeting of the Association for Natural Language Processing, March 2019)
[Non-Patent Document 2] "Issue Analysis of Parliamentary Proceedings Using Large-Scale Language Models" (FSS2022, etc.)
[Non-Patent Document 3] "Topic Analysis of Local Council Meeting Minutes Using Gaussian LDA" (Proceedings of the 25th Annual Meeting of the Association for Natural Language Processing, March 2019)
[Non-Patent Document 4] "The Current State of Legal Data and Prospects for the Application of Digital Technologies to the Legal Field" (Proceedings of the 30th Annual Meeting of the Association for Natural Language Processing, March 2024)

Summary of the Invention

Problems to Be Solved by the Invention

[0006] This invention generates, from non-standardized meeting minutes text, "virtual time information" that reflects the conceptual and logical structure of the discussion, independent of the physical time series. By combining this with region-specific embedding models, speaker role estimation, and dynamic, multi-layered prompt generation for an LLM, it achieves highly accurate Q / A pair extraction. The method aims to overcome the gap in data readiness between national and local governments, enabling a wide range of applications, including policy analysis that takes into account the unique context of each region and support for preparing responses in future parliamentary proceedings.

Means for Solving the Problem

[0007] The present invention provides a computer-executable information processing method that includes at least the following steps.
(a) A step of inputting non-standardized meeting minutes text and automatically identifying conceptual phrases, taking into account parliamentary fixed expressions, proper nouns, grammatical segments, and topic cohesion.
(b) A step of assigning virtual time information, which is independent of physical time, to each conceptual phrase, taking into account logical dependencies, the relative distance from the question phrase to the answer phrase, topic turning points, and keyword introduction points. Here, the virtual time information is calculated as a relative sequence index that clarifies the Q→A correspondence, based on the character offset from the beginning of the sentence and the phrase index. For example, by using an algorithm that takes the question phrase as a reference point and assigns an incremental index to the subsequent phrases, or a calculation procedure that weights the index according to the degree of topic development, the Q / A correspondence can be clearly captured even without physical timestamps.
(c) A step of tagging the conceptual phrases with speaker roles (questioner, respondent, chairperson, secretariat, etc.) and utterance types (question, response, opinion, meeting facilitation) using a trained role classification model and a region-specific embedding model.
(d) A step of detecting user-input keywords and their synonyms and related expressions, and recording the location of each keyword in association with a role tag and virtual time information.
(e) A step of inputting into the LLM a prompt that integrates the role tags, keyword locations, virtual time information, and the output of the region-specific embedding model, to generate Q / A pairs related to specific keywords. To generate the prompts, the method employs a dynamic prompt construction technique that inserts and re-injects conceptual phrases in multiple layers, taking into account the question-and-answer formats (all-questions-then-all-answers type, one-question-one-answer type, etc.) that vary from assembly to assembly. Furthermore, the LLM input prompts are optimized using proprietary scoring indicators such as keyword relevance, contextual relevance, and reflection of region-specific expressions.
(f) A step of scoring the generated Q / A pairs based on keyword relevance, role consistency, and contextual relevance based on virtual time order, and of rescoring them when the user enters additional keywords, thereby dynamically updating the display order of the Q / A pairs.
(g) A step of automatically generating templates for future parliamentary responses based on the extracted Q / A pairs and related information.
(h) The region-specific embedding model is trained on accumulated meeting minutes and policy documents from the target municipality and from neighboring municipalities and the prefecture, and its differences from a general-purpose model are quantitatively evaluated. For example, its effectiveness is supported by an improvement, relative to the general-purpose model, in the accuracy of Q / A extraction involving region-specific expressions.
This enables highly accurate Q / A extraction from non-standardized meeting minutes, which was previously difficult, and constitutes a method that integrates regional characteristics, a dedicated role system, and multi-layered prompt generation.

Effects of the Invention

[0008] According to this invention, by deriving virtual time information from conceptual and logical structure rather than relying on physical time information, the accuracy of Q / A identification from non-standardized meeting minutes can be significantly improved. In addition, by introducing a region-specific embedding model, the context and vocabulary of specific municipalities and neighboring regions can be accurately reflected, enabling precise analysis that cannot be obtained with general-purpose models. Moreover, by combining the prompt generation method for the LLM with dedicated metrics and a dynamic reconstruction mechanism, the accuracy and usefulness of responses can be further enhanced. Quantitative evaluation through comparative experiments is also expected to become possible (for example, showing a Q / A extraction accuracy improvement of several tens of percent over conventional methods). These results can be utilized in a wide range of fields, including policy analysis, preparation for parliamentary responses, research, and media activities, and provide an advanced analytical platform that takes into account the differences in the degree of standardization between national and local governments.

Brief Description of the Drawings

[0009]
[Figure 1] A flowchart illustrating the process from meeting minutes text to (1) Q / A segmentation, (2) virtual time information assignment, and (3) role tag assignment.
[Figure 2] A conceptual diagram comparing the conventional method (physical timestamp only) and the present invention (assignment of virtual time information).
[Figure 3] A diagram illustrating the processing flow from the region-specific embedding model to the prompt generation unit for the LLM.

Industrial Applicability

[0010] The present invention is useful in various fields, such as policy-making support for local councils, parliaments, local government think tanks, research institutions, and media organizations, as well as improving the efficiency of parliamentary operations, strengthening accountability, academic research, and the analysis of international cooperation.

Explanation of Signs

[0011]
1 Q / A Division Unit
2 Virtual Time Information Assignment Unit
3 Role Tag Assignment Unit
4 Region-Specific Embedding Model Processing Unit
5 Prompt Generation Unit
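As an illustration of what the virtual time information assignment unit (2) might compute, the following Python sketch follows the example given in step (b) of paragraph [0007]: the question phrase serves as the reference point, and each subsequent phrase receives an incremented index, weighted by the degree of topic development. This is a minimal sketch under stated assumptions, not the applicant's implementation. The names `Phrase`, `assign_virtual_time`, and `shift_weight`, the `is_question` flag, and the `topic_shift` score in the range 0 to 1 are all hypothetical and assumed to be supplied by upstream processing.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    text: str
    is_question: bool = False  # assumed output of an upstream classifier
    topic_shift: float = 0.0   # assumed topic-development score in [0, 1]


def assign_virtual_time(phrases, shift_weight=1.0):
    """Return one (anchor, offset) pair per phrase.

    anchor: index of the most recent question phrase (None before any question).
    offset: virtual time since that question. It is 0 at the question itself,
            and each later phrase adds 1 plus a topic-shift penalty, so phrases
            that drift away from the question end up "later" in virtual time.
    """
    result = []
    anchor = None
    offset = 0.0
    for index, phrase in enumerate(phrases):
        if phrase.is_question:
            anchor, offset = index, 0.0  # a new question resets the clock
        elif anchor is not None:
            offset += 1.0 + shift_weight * phrase.topic_shift
        result.append((anchor, offset))
    return result
```

Because the offset depends only on phrase order and topic-shift scores, the same minutes always yield the same virtual time values, whether or not any audio timestamps exist.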

Claims

1. An information processing method performed by a computer, characterized by including:
(a) a step of inputting non-standardized meeting minutes text and automatically identifying conceptual phrases, taking into account parliamentary fixed expressions, proper nouns, grammatical segmentation, and topic cohesion;
(b) a step of assigning virtual time information that is independent of physical time, taking into account the logical dependencies between the conceptual phrases, the relative textual distance from the point where a question is raised to the point where the answer appears, the points where the topic changes, and the points where keywords are introduced;
(c) a step of tagging each conceptual phrase with a speaker role and utterance type using a trained role classification model and a region-specific embedding model;
(d) a step of detecting user-input keywords and related expressions in the conceptual phrases and associating their location information with the role tags and virtual time information;
(e) a step of inputting into an LLM a prompt that integrates the role tags, keyword locations, virtual time information, and the output of the region-specific embedding model, to generate Q / A pairs related to a specific keyword; and
(f) a step of scoring the generated Q / A pairs based on keyword relevance, role consistency, and contextual relevance based on virtual time order, and of rescoring them and updating the display ranking when the user enters additional keywords.

2. The information processing method according to claim 1, characterized in that the virtual time information is assigned as a logical / conceptual sequence based on a phrase index, the degree of topic change, and the like, thereby improving the accuracy of Q / A correspondence without depending on physical time information.

3. The information processing method according to claim 1 or 2, characterized in that the conceptual phrases are defined by taking into consideration a group of fixed expressions and proper nouns specific to parliamentary proceedings, and a topic cohesion index.

4. An information processing method according to any one of claims 1 to 3, characterized in that the region-specific embedding model is trained on past meeting minutes and policy documents of the target municipality and neighboring municipalities / prefectures, and the Q / A extraction accuracy and vocabulary handling ability are improved compared to a general-purpose model.

5. An information processing method according to any one of claims 1 to 4, characterized in that, when a user enters additional keywords after extracting Q / A pairs, rescoring is performed using the virtual time information, role tags, and keyword-related information, and the Q / A pair list is dynamically updated.

6. An information processing method according to any one of claims 1 to 5, further comprising the steps of automatically generating a template for a draft response for a future parliamentary session based on extracted Q / A pairs and related information, and inputting prompts that reflect regionally specific terminology and agenda-specific issues into the LLM to create a draft response.

7. An information processing method according to any one of claims 1 to 6, characterized in that, in the process of constructing the region-specific embedding model, knowledge nodes at the prefectural level are generated and referenced, and the vocabulary, policies, and cultural background specific to the target local government assembly are reflected.
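As an illustration of the scoring and rescoring recited in step (f) of claim 1 and in claim 5, the following Python sketch ranks Q / A pairs on the three named criteria and re-ranks them when the keyword list grows. It is a sketch under stated assumptions only: the publication does not disclose a scoring formula, so the weighted sum, the default weights, the `1 / (1 + gap)` context term, and the names `QAPair`, `score`, and `rank` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QAPair:
    question: str
    answer: str
    role_consistent: bool  # e.g. questioner asks and respondent answers
    virtual_gap: float     # virtual time distance from question to answer


def score(pair, keywords, weights=(0.5, 0.2, 0.3)):
    """Weighted sum of keyword relevance, role consistency, and context.

    The weights and the form of each term are illustrative assumptions.
    """
    text = (pair.question + " " + pair.answer).lower()
    hits = sum(1 for keyword in keywords if keyword.lower() in text)
    keyword_relevance = hits / len(keywords) if keywords else 0.0
    role_consistency = 1.0 if pair.role_consistent else 0.0
    # A smaller virtual time gap counts as stronger contextual relevance.
    contextual_relevance = 1.0 / (1.0 + pair.virtual_gap)
    w_keyword, w_role, w_context = weights
    return (w_keyword * keyword_relevance
            + w_role * role_consistency
            + w_context * contextual_relevance)


def rank(pairs, keywords):
    """Return the pairs in display order, highest score first."""
    return sorted(pairs, key=lambda pair: score(pair, keywords), reverse=True)
```

Rescoring after the user adds a keyword then amounts to calling `rank` again with the extended keyword list; the stored role tags and virtual time gaps are reused, so only the keyword term changes.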