Text processing device and text processing method
The text processing apparatus and method address the input limits of generative AI models by dividing text in multiple rounds, each based on a different text characteristic, so that every segment has a length suitable for summary generation.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- NTT DOCOMO INC
- Filing Date
- 2024-12-12
- Publication Date
- 2026-06-18
AI Technical Summary
Conventional methods of dividing text for summary generation with generative AI models do not account for the models' input restrictions: the resulting segments may be unsuitable for summarization or may still exceed the token or character limit, making summary generation inefficient.
A text processing apparatus and method that divide a text group in multiple rounds, each using a different type of information (such as text features, whether a text is an interrogative sentence, and the person related to each text), and check that every resulting segment satisfies a predefined length condition matching the AI model's input limit.
Enables appropriate text division suitable for summary generation, allowing efficient use of generative AI models by adhering to input restrictions and maintaining consistent segment lengths, thereby improving summary quality.
Abstract figure: JP2024044081
Description
Text processing apparatus and text processing method
【0001】 The present invention relates to a text processing apparatus and a text processing method for processing text.
【0002】 Patent Document 1 describes an information processing apparatus that identifies locations where the trend of a conversation changes. The apparatus treats a location where the rate of change over time of the number of characters spoken per unit time exceeds a threshold as a location where the trend of the conversation has changed. During a meeting, the apparatus also generates a summary of the conversation for each interval delimited by such locations.
【0003】 Patent Document 1: Japanese Patent Application Laid-Open No. 2021-36292
【0004】 It is conceivable to use a generative AI (artificial intelligence) model such as an LLM (large language model) to generate a summary or minutes from the text of a conversation in a meeting. However, because the number of tokens or characters that can be input to an LLM is usually limited, it may not be possible to input the entire text to be summarized or turned into minutes.
【0005】 It is therefore conceivable to use the method of Patent Document 1 to divide that text into parts short enough to be input to the LLM. However, simply delimiting the text where the trend of the conversation changes, as in conventional methods such as that of Patent Document 1, does not necessarily produce divided text suited to generating a summary or minutes. The divided text may also be unsuited to input to the LLM; for example, a divided part may still exceed the LLM's input limit. In other cases as well, division by the conventional method is not necessarily appropriate.
【0006】 One embodiment of the present invention has been made in view of the above, and an object thereof is to provide a text processing device and a text processing method capable of dividing text appropriately.
【0007】 To achieve this object, a text processing device according to one embodiment of the present invention includes: an acquisition unit that acquires a text group to be divided, the text group including a plurality of consecutive texts; a division unit that, based on the texts included in the text group acquired by the acquisition unit, acquires information necessary for dividing the text group and uses that information to divide the text group into a plurality of parts; and a determination unit that determines whether the length of each text group divided by the division unit satisfies a set condition. In accordance with the determination by the determination unit, the division unit acquires, based on the texts included in a divided text group, information that is necessary for dividing that divided text group and is of a different type from the information used in the previous division, and uses that information to divide the divided text group into a plurality of parts.
【0008】 In the text processing device according to one embodiment of the present invention, a text group is divided into a plurality of parts, and it is determined whether the length of each divided part satisfies a set condition. In accordance with this determination, a divided text group is further divided into a plurality of parts. Each division uses a different type of information obtained from the text; that is, the text is divided several times from different perspectives. The text processing device according to one embodiment of the present invention can therefore divide text appropriately.
【0009】 One embodiment of the present invention can be described as an invention of a text processing device, as above, or as an invention of a text processing method, as described below.
These are substantially the same invention, differing only in category, and they provide similar functions and effects.
【0010】 That is, a text processing method according to one embodiment of the present invention includes: an acquisition step in which a text processing device acquires a text group to be divided, the text group including a plurality of consecutive texts; a division step in which the text processing device, based on the texts included in the text group acquired in the acquisition step, acquires information necessary for dividing the text group and uses that information to divide the text group into a plurality of parts; and a determination step in which the text processing device determines whether the length of each text group divided in the division step satisfies a set condition. In accordance with the determination in the determination step, in the division step, information that is necessary for dividing a divided text group and is of a different type from the information used in the previous division is acquired based on the texts included in that divided text group, and that information is used to divide the divided text group into a plurality of parts.
【0011】 According to one embodiment of the present invention, text can be divided appropriately.
【0012】 Figure 1 shows the configuration of a text processing device according to an embodiment of the present invention. Figure 2 shows an example of text group division. Figure 3 illustrates the calculation of text features. Figure 4 illustrates the moving average of text features. Figure 5 shows another example of text group division. Figure 6 is a flowchart of a text processing method performed by the text processing device according to an embodiment of the present invention. Figure 7 shows the hardware configuration of the text processing device according to an embodiment of the present invention.
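The staged division summarized in 【0007】 and 【0010】 can be illustrated with a short sketch. It is not the claimed implementation: it assumes that a text group is a list of strings, that length means total character count, and that `splitters` is a hypothetical ordered list of division functions, each based on a different type of information. A part that is still too long when no further splitter remains is returned unchanged, which plays the role of an upper limit on the number of divisions.

```python
from typing import Callable, List, Sequence

TextGroup = List[str]                               # one string per text
Splitter = Callable[[TextGroup], List[TextGroup]]   # one division stage (hypothetical)


def group_length(group: TextGroup) -> int:
    """Length of a text group, taken here as its total number of characters."""
    return sum(len(text) for text in group)


def divide_in_stages(
    group: TextGroup,
    splitters: Sequence[Splitter],
    max_length: int,
    stage: int = 0,
) -> List[TextGroup]:
    """Divide `group` with splitters[stage], then re-divide every part whose
    length exceeds `max_length` using the next splitter in the sequence."""
    result: List[TextGroup] = []
    for part in splitters[stage](group):
        last_stage = stage + 1 >= len(splitters)
        if group_length(part) <= max_length or last_stage:
            # Condition satisfied, or no further type of information to use.
            result.append(part)
        else:
            result.extend(divide_in_stages(part, splitters, max_length, stage + 1))
    return result
```

For example, with a hypothetical first splitter that halves a group and a second that separates every text, dividing `["aaaa", "bbbb", "cc", "dd", "eeeeee", "f"]` with `max_length=10` stops after the first stage, because the two halves have 10 and 9 characters.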
【0013】 Embodiments of the text processing device and text processing method according to the present invention are described in detail below with reference to the drawings. In the description of the drawings, identical elements are denoted by identical reference numerals, and redundant explanations are omitted.
【0014】 Figure 1 shows a text processing device 10 according to this embodiment. The text processing device 10 is a device (system) that divides a text group containing a plurality of consecutive texts into a plurality of parts. The text processing device 10 is used, for example, in the following case.
【0015】 The text group to be divided is the text of a conversation (dialogue) in a meeting, arranged in the order in which it was spoken. This text is used to generate a summary or minutes of the meeting with a generative AI model: the conversation text is input to the generative AI model, which generates the summary or minutes.
【0016】 A generative AI model is a model generated by machine learning. In response to an input prompt, it generates content in accordance with any one or a combination of the instructions, context, questions, and output format indicated by the prompt, and returns that content as response information. The prompt may also include input information, in which case the generative AI model generates the response information based on that input information. A generative AI model may consist of, for example, an LLM (large language model) and a user interface (UI) for interacting with the user, and may be an interactive AI model that enables text chat or voice chat with the user. Examples of such generative AI models include ChatGPT, GPT®-3.5, GPT-4V, and PaLM2.
【0017】 A prompt is a set of instructions or information input to a generative AI model.
A prompt can include the initial information, parameters, and questions that the generative AI model needs to perform a specific task. In an interactive system such as a command-line interface (CLI), a prompt represents the instructions or questions entered by the user. The prompt expresses in text the commands to be executed by the interactive AI model, the tasks to be performed, the background and context it should consider (e.g., roles, conditions), the questions it should answer, and the output format of its response information. The prompt may further include input information that is the target of the commands or tasks executed by the interactive AI model. Such input information includes data files with specific extensions, such as text data, image data, application-related data, audio data, video data, and still image data. Application-related data refers to data such as document data, tabular data, and graph data that can be processed by a default application program.
【0018】 A generative AI model typically has an upper limit (input limit) on the number of tokens or characters that can be input at one time; that is, the number of tokens or characters in a prompt input to the model must be within this limit. A token is the unit of character string handled by the generative AI model.
【0019】 If the conversation text exceeds the input limit of the generative AI model, it must be divided before being input to the model in order to generate a meeting summary or minutes. The text processing device 10 performs this division of text groups. For example, as shown in Figure 2, the original text group 100, which is the conversation text, is divided into a plurality of parts. The text processing device 10 divides the text group 100 in stages.
For example, as shown in Figure 2, the text processing device 10 divides the text group 100 into a plurality of text groups 101 and then further divides each text group 101 into a plurality of text groups 102. Although they are called text groups, the divided text groups 101 and 102 need not contain a plurality of texts.
【0020】 Dividing the text group with the text processing device 10 makes it possible to generate a meeting summary or minutes with a generative AI model that has input restrictions. Moreover, the text processing device 10 divides the text group in a way that suits summary or minutes generation with the generative AI model in respects other than the input restrictions as well.
【0021】 The text processing device 10 may also divide text groups for purposes other than generating a meeting summary or minutes with a generative AI model. In that case, the text group to be divided may be something other than the text of a conversation in a meeting.
【0022】 The text processing device 10 consists of a computer such as a PC (personal computer) or a server device, and may consist of a plurality of computers. The text processing device 10 may send and receive information to and from other devices over a network to acquire the information it needs to realize its functions.
【0023】 Next, the functions of the text processing device 10 according to this embodiment are described. As shown in Figure 1, the text processing device 10 includes an acquisition unit 11, a division unit 12, and a determination unit 13.
【0024】 The acquisition unit 11 is a functional unit that acquires a text group to be divided, the text group including a plurality of consecutive texts. The text group to be divided is, for example, the text of a conversation in a meeting arranged in the order in which it was spoken, as described above.
The texts in the text group are separated from one another by delimiters such as periods, question marks, and exclamation marks.
【0025】 Each text in the text group may be associated with information indicating the time related to the text (for example, the time at which the corresponding statement was made). This information may indicate, for example, a date and time or the time elapsed since the start of the meeting. Each text may also be associated with information indicating the person related to the text (for example, the person who made the corresponding statement), such as an identifier set in advance for that person. The acquisition unit 11 may acquire the text group together with the information associated with its texts.
【0026】 The text group acquired by the acquisition unit 11 is not limited to the above and may be any plurality of consecutive texts. It may be based on audio, like the meeting conversation text described above, or it may not.
【0027】 The acquisition unit 11 may, for example, acquire a text group by receiving it from another device. Alternatively, the acquisition unit 11 may acquire a text group through a text input operation performed by the user on the text processing device 10. Alternatively, the acquisition unit 11 may acquire recorded audio data from which the text group originates, perform speech recognition (transcription) on it, and acquire the resulting text group.
Speech recognition on the audio data can be performed by conventional methods. The information associated with the texts (the information indicating the time and the person related to each text, described above) may also be acquired through speech recognition.
【0028】 Information identifying a person may be obtained, for example, by performing speaker identification with a speaker identification model during speech recognition. Speaker identification may be performed by a conventional method such as the following. Speech segment detection is applied to the audio data to divide it into speech segments. For each speech segment, features of the audio data are extracted; these are common acoustic features such as MFCCs (mel-frequency cepstral coefficients). Speaker identification is then performed by classifying the features with a classification model serving as the speaker identification model (for example, a common classifier such as an SVM (support vector machine)). Speech recognition may be performed by the text processing device 10 itself or by another device.
【0029】 Information identifying a person may also be obtained without speech recognition. For example, in a web conference or similar setting, each participant may join from an individual device with their own audio input device (microphone), so that each utterance is associated with an audio input device. In that case, information indicating the audio input device associated with an utterance, or information indicating the participant associated with that device (e.g., the participant's name), may be used as information identifying a person.
【0030】 The acquisition unit 11 may also acquire the text group by methods other than those described above.
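The text group described in 【0024】 and 【0025】 can be represented, for example, as a list of records, each holding one text and its optional associated information. This sketch is an illustration under stated assumptions: the `TextItem` name and its fields are hypothetical, time is stored as seconds elapsed since the start of the meeting, and a simple regular expression stands in for splitting a transcript at periods, question marks, and exclamation marks (ASCII and full-width forms).

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# A run of non-delimiter characters followed by any delimiters
# (period, question mark, exclamation mark; ASCII and full-width).
_SENTENCE = re.compile(r"[^.?!。？！]+[.?!。？！]*")


@dataclass
class TextItem:
    """One text of the text group plus its optional associated information."""
    text: str
    time_sec: Optional[float] = None   # hypothetical: elapsed time from meeting start
    person_id: Optional[str] = None    # hypothetical: identifier set for the speaker


def split_into_texts(
    transcript: str,
    time_sec: Optional[float] = None,
    person_id: Optional[str] = None,
) -> List[TextItem]:
    """Split one transcribed utterance into texts at sentence delimiters."""
    pieces = (m.group().strip() for m in _SENTENCE.finditer(transcript))
    return [TextItem(p, time_sec, person_id) for p in pieces if p]
```

One simplification here is that every text from an utterance inherits the utterance's time and person; with speech recognition and speaker identification, those values could instead be attached per speech segment.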
The acquisition unit 11 outputs the acquired text group to the division unit 12.
【0031】 The division unit 12 is a functional unit that, based on the texts included in the text group acquired by the acquisition unit 11, acquires information necessary for dividing the text group and uses that information to divide the text group into a plurality of parts. In accordance with the determination by the determination unit 13, the division unit 12 also acquires, based on the texts included in a divided text group, information that is necessary for dividing that group and is of a different type from the information used in the previous division, and uses it to divide the divided text group into a plurality of parts.
【0032】 As the information necessary for dividing the text group or a divided text group, the division unit 12 may acquire any of the following: text features that allow similarity between texts to be calculated; a moving average of such features; information indicating whether a text is an interrogative sentence (question); and information indicating the person related to the text.
【0033】 The division unit 12 receives a text group from the acquisition unit 11 and divides it into a plurality of parts. In accordance with the determination by the determination unit 13, the division unit 12 further divides the divided text groups into a plurality of parts. That is, the division unit 12 divides the text group several times.
These divisions are performed by acquiring different types of information based on the texts in the text group and using the acquired information. Among the divisions, at least one pair of consecutive divisions uses different types of information. In other words, the divisions need only include some pair of consecutive divisions that use different types of information; they may also include several divisions that use the same type of information.
【0034】 This information includes, for example, text features that allow similarity between texts to be calculated, information indicating whether a text is an interrogative sentence, and information indicating the person related to the text. The text group division performed by the division unit 12 is explained below for each type of information used.
【0035】 First, division based on text features that allow similarity between texts to be calculated is described. The division unit 12 calculates these features from the texts of the text group to be divided, for each predefined unit. The predefined unit may be a predefined length of time associated with the texts, for example, a span of time in the meeting. For example, the unit may be one minute (0:00 to 1:00, 1:00 to 2:00, 2:00 to 3:00, ...), as shown in Figure 3. In this case, the unit to which each text belongs is identified from the information indicating the time related to the text. Alternatively, the predefined unit may be a number of texts, for example, a single text. Alternatively, the predefined unit may be a number of characters.
The predefined unit may also be something other than the above.
【0036】 The features are, for example, vectors with a predetermined number of dimensions (n-dimensional vectors) that represent the characteristics of the texts in each unit. The graph at the lower right of Figure 3 schematically shows an example of the feature vector (n-dimensional vector) for a one-minute unit.
【0037】 The division unit 12 calculates the feature of each unit from the text group to be divided, for example, as follows. The division unit 12 performs morphological analysis on the texts in the unit and extracts content words. Content words are words that carry a specific meaning, for example, words of predetermined parts of speech (such as nouns, verbs, adjectives, and adverbs). In the example of Figure 3, the content words "hello," "today," and "meeting" are extracted for the 0:00 to 1:00 unit; "today," "agenda," "progress," and "report" for the 1:00 to 2:00 unit; and "today," "weather," and "rain" for the 2:00 to 3:00 unit.
【0038】 The division unit 12 converts each extracted content word into a vector in the n-dimensional feature space. This vectorization of words can be performed with conventional language analysis techniques (e.g., Word2Vec). Figure 3 shows examples of the content word vectors for the 0:00 to 1:00, 1:00 to 2:00, and 2:00 to 3:00 units. The division unit 12 takes the average of the content word vectors in each unit as the feature of that unit. A feature obtained in this way indicates the tendency of the text content in the unit; similar features indicate similar texts.
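The per-unit feature calculation of 【0035】 to 【0038】 can be sketched as follows, using one-minute units. Two stand-ins are assumptions, not the document's method: a tiny hand-written `WORD_VECTORS` table replaces Word2Vec vectors, and keeping only words that appear in that table replaces morphological analysis for content-word extraction.

```python
import re
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical 3-dimensional word vectors standing in for Word2Vec output.
# Axes are loosely "business", "time", "weather" for readability.
WORD_VECTORS: Dict[str, List[float]] = {
    "hello": [1.0, 0.0, 0.0], "meeting": [1.0, 0.0, 0.0],
    "agenda": [1.0, 0.0, 0.0], "progress": [1.0, 0.0, 0.0],
    "report": [1.0, 0.0, 0.0], "today": [0.0, 1.0, 0.0],
    "weather": [0.0, 0.0, 1.0], "rain": [0.0, 0.0, 1.0],
}


def content_words(text: str) -> List[str]:
    """Stand-in for morphological analysis: keep words that have a vector."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w in WORD_VECTORS]


def unit_features(
    texts: Sequence[Tuple[float, str]], unit_sec: float = 60.0
) -> List[Tuple[int, List[float]]]:
    """Average the content-word vectors of the texts in each time unit.

    `texts` holds (elapsed seconds, text) pairs; unit k covers
    [k * unit_sec, (k + 1) * unit_sec). Units without content words are skipped.
    """
    vectors_by_unit: Dict[int, List[List[float]]] = defaultdict(list)
    for time_sec, text in texts:
        unit = int(time_sec // unit_sec)
        vectors_by_unit[unit].extend(WORD_VECTORS[w] for w in content_words(text))
    features = []
    for unit in sorted(vectors_by_unit):
        vectors = vectors_by_unit[unit]
        if vectors:
            features.append((unit, [sum(col) / len(vectors) for col in zip(*vectors)]))
    return features
```

With the Figure 3 words, the 0:00 to 1:00 unit ("hello", "today", "meeting") averages to [2/3, 1/3, 0] and the 2:00 to 3:00 unit ("today", "weather", "rain") to [0, 1/3, 2/3], so the two units point in clearly different directions.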
Computing features per unit interval in this way makes them robust to outliers among the features of individual words.
【0039】 Features other than those above may be used as long as they allow similarity between texts to be calculated, and features may be calculated by other methods. For example, the conversion from text to features may be performed by a generative AI model, in which case the features can reflect context that the features described above cannot fully capture.
【0040】 The division unit 12 compares the features of each pair of consecutive units by calculating their similarity, for example, the angle between or the cosine similarity of the two feature vectors. The division unit 12 compares the calculated similarity with a preset threshold. If the comparison shows that the features are not similar (for example, when cosine similarity is used, if the cosine similarity is below the threshold), the division unit 12 divides the text group between the two consecutive units. This is because a large difference in features between two consecutive units suggests that the topic of the text has shifted between them.
【0041】 If the features of every pair of consecutive units in the text group to be divided are judged similar, the division unit 12 does not divide the text group.
If, on the other hand, the features of several pairs of consecutive units are judged dissimilar, the division unit 12 may divide the text group at every one of those pairs. The features may be compared, and the text group divided based on that comparison, by methods other than those described above.
【0042】 Alternatively, the division unit 12 may use a moving average of the features, instead of the features of individual units, to decide where to divide the text group (i.e., to detect topic transitions). In this case, the division unit 12 calculates the feature of each unit as described above and then calculates a moving average from these features: for each target unit, the average of the features of that unit and the n units before and after it (for example, two before and two after). As shown in Figure 4, when the target unit is the x-th unit, the average of the features over the interval [x-n, x+n] (a sliding window) is calculated; Figure 4 shows the case n = 2. The division unit 12 then calculates the similarity between the moving averages of two consecutive units (i.e., windows shifted by one unit) in the same way as above and divides the text group using that similarity. Using a moving average allows the text group to be divided according to broader topic transitions than using the features of individual units. This concludes the description of division based on text features that allow similarity between texts to be calculated.
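The boundary decision of 【0040】 to 【0042】 can be sketched as below. It is one possible reading, not the document's implementation: cosine similarity is used, a boundary is placed wherever the similarity of consecutive units falls below `threshold`, and the moving-average window [x-n, x+n] is simply truncated at both ends of the sequence, a detail the document does not specify. With `n=0`, each unit's own feature is used.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def moving_average(features: Sequence[Sequence[float]], n: int) -> List[List[float]]:
    """Mean feature over the window [x - n, x + n], truncated at the ends."""
    averaged = []
    for x in range(len(features)):
        window = features[max(0, x - n): x + n + 1]
        averaged.append([sum(col) / len(window) for col in zip(*window)])
    return averaged


def split_points(features: Sequence[Sequence[float]], threshold: float, n: int = 0) -> List[int]:
    """Indices i such that the text group is divided between unit i-1 and unit i."""
    smoothed = moving_average(features, n)
    return [i for i in range(1, len(smoothed))
            if cosine(smoothed[i - 1], smoothed[i]) < threshold]


def split_at(units: Sequence[T], points: Sequence[int]) -> List[List[T]]:
    """Divide a sequence of units at every split point."""
    bounds = [0, *points, len(units)]
    return [list(units[a:b]) for a, b in zip(bounds, bounds[1:])]
```

For features `[[1, 0], [1, 0.1], [0, 1], [0.1, 1]]` and a threshold of 0.5, `split_points` returns `[2]` with `n=0` but `[]` with `n=1`, because smoothing blurs the abrupt change; raising the threshold to 0.85 with `n=1` recovers the boundary at index 2.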
【0043】 Next, division based on information indicating whether a text is an interrogative sentence is described. The division unit 12 acquires information indicating whether each text in the text group to be divided is an interrogative sentence; for example, it determines this for each text itself. The determination may use conventional techniques, for example, an interrogative sentence determination model generated by machine learning, such as a deep learning model trained on pairs of a text and a label indicating whether that text is an interrogative sentence. Alternatively, the determination may be performed by a generative AI model, for example with a prompt to the effect of "Determine whether the input sentence is an interrogative sentence and answer with Yes or No."
【0044】 The determination may instead be performed by a device other than the division unit 12 (text processing device 10). In that case, the division unit 12 outputs the text to be judged to that device, has it perform the determination, and acquires from it the result, that is, information indicating whether the text is an interrogative sentence.
【0045】 The division unit 12 divides the text group into a plurality of parts based on the acquired information. For example, the division unit 12 divides the text group into two at the first interrogative sentence among its texts: the texts after that sentence and the texts before it. For example, in a meeting, an explanation is often given first for each topic, followed by a question-and-answer session.
In this case, the divided text group after the first interrogative sentence can be regarded as belonging to the question-and-answer phase, and the divided text group before it as belonging to the explanation phase. When the text group contains no text judged to be an interrogative sentence, the text group is not divided. Division based on information indicating whether a text is an interrogative sentence may also be performed by methods other than the above. This concludes the description of division based on information indicating whether a text is an interrogative sentence.
【0046】 Next, division based on information indicating the person related to the text is described. The division unit 12 acquires information indicating the person related to each text in the text group to be divided. When this information is used for division, the acquisition unit 11 acquires, as described above, the information indicating the person related to each text, associated with the texts in the text group to be divided. The division unit 12 uses the information input from the acquisition unit 11 for the division.
【0047】 The division unit 12 divides the text group into a plurality of parts based on the acquired information. For example, when two consecutive texts in the text group relate to different persons, the division unit 12 divides the text group between them; that is, each run of consecutive texts related to the same person becomes one divided text group. Alternatively, the division unit 12 divides the text group using texts related to a preset person as delimiters. Division based on information indicating the person related to the text may also be performed by other methods.
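The two divisions of 【0043】 to 【0047】 can be sketched on (person, text) pairs. Assumptions: `looks_like_question` is a hypothetical placeholder that only checks for a trailing question mark, whereas the document describes a trained classifier or a generative AI model; `ask` stands for any caller-supplied function that sends a prompt to such a model and returns its reply; and the first interrogative sentence is placed at the start of the second (question-and-answer) part.

```python
from typing import Callable, List, Sequence, Tuple

Item = Tuple[str, str]  # (person identifier, text)

QUESTION_PROMPT = (
    "Determine whether the input sentence is an interrogative sentence "
    "and answer with Yes or No.\nSentence: {text}"
)


def looks_like_question(text: str) -> bool:
    """Hypothetical placeholder detector: a trailing '?' or full-width '？'."""
    return text.rstrip().endswith(("?", "？"))


def llm_is_question(text: str, ask: Callable[[str], str]) -> bool:
    """Ask a generative AI model (through the caller's `ask`) for Yes or No."""
    return ask(QUESTION_PROMPT.format(text=text)).strip().lower().startswith("yes")


def split_at_first_question(
    items: Sequence[Item], is_question: Callable[[str], bool] = looks_like_question
) -> List[List[Item]]:
    """Explanation phase (before the first question) and Q&A phase (from it on)."""
    for i, (_, text) in enumerate(items):
        if is_question(text):
            if i == 0:
                break  # nothing precedes the question, so no division is made
            return [list(items[:i]), list(items[i:])]
    return [list(items)]  # no interrogative sentence: the group is not divided


def split_by_person(items: Sequence[Item]) -> List[List[Item]]:
    """Start a new group whenever the person differs from the previous text."""
    groups: List[List[Item]] = []
    for item in items:
        if groups and groups[-1][-1][0] == item[0]:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups
```

For an exchange in which speaker B asks "When is the release?" after two statements by A and A then answers, the first function separates A's two statements from the rest, and the second produces the runs A, B, A.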
This concludes the description of division based on information indicating the person related to the text.
【0048】 The division unit 12 may also acquire, based on the texts in the text group to be divided, other information necessary for dividing the text group, and use it to divide the text group into a plurality of parts.
【0049】 The division unit 12 stores in advance which information to use for division and in what order, and divides the text group accordingly. The order is preset by the provider or user of the text processing device 10. The second and subsequent divisions are performed in accordance with the determination by the determination unit 13. The divisions include divisions that use different types of information and may also include several divisions that use the same type of information. When the same type of information is used more than once, later divisions relax the division criterion so that content can be divided further; for example, the threshold used to decide whether to divide is relaxed.
【0050】 For example, the first division may be based on text features and the second on information indicating whether a text is an interrogative sentence. The third and fourth divisions may then again be based on text features, and the fifth on information indicating the person related to the text. Division by the division unit 12 may end after a predetermined number of divisions, or it may be performed without a set number of divisions. In the latter case, division ends according to the determination by the determination unit 13, and the division unit 12 stores in advance which information to use, and in what order, for the repeated divisions.
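The preset order in 【0049】 and 【0050】 might be stored as a simple schedule. In this sketch only the order (features, question, features, features, person) comes from the document; the base threshold, step, and cap are hypothetical values. Because feature-based division splits where cosine similarity falls below the threshold, raising the threshold on each repeated use is one way to relax the criterion so that further division becomes possible. The helper `has_different_consecutive` checks the requirement of 【0033】 that at least one pair of consecutive divisions uses different types of information.

```python
from typing import List, Optional, Sequence, Tuple

# Division order from the example in the text.
ORDER = ["features", "question", "features", "features", "person"]


def build_schedule(
    order: Sequence[str],
    base_threshold: float = 0.5,   # hypothetical starting threshold
    step: float = 0.1,             # hypothetical relaxation per repeated use
    cap: float = 0.95,             # hypothetical upper bound
) -> List[Tuple[str, Optional[float]]]:
    """Attach a similarity threshold to each feature-based division.

    Each repeated use of "features" gets a higher threshold, so more unit
    boundaries qualify as split points. Other divisions carry None.
    """
    schedule: List[Tuple[str, Optional[float]]] = []
    feature_uses = 0
    for kind in order:
        if kind == "features":
            schedule.append((kind, min(base_threshold + step * feature_uses, cap)))
            feature_uses += 1
        else:
            schedule.append((kind, None))
    return schedule


def has_different_consecutive(order: Sequence[str]) -> bool:
    """True if at least one pair of consecutive divisions uses different information."""
    return any(a != b for a, b in zip(order, order[1:]))
```

With the defaults, the three feature-based divisions receive thresholds of about 0.5, 0.6, and 0.7; the length of the schedule can also serve as a predetermined number of divisions.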
【0051】 The division unit 12 outputs information indicating the result of dividing the text group (for example, the divided text groups) to the determination unit 13. 【0052】 The determination unit 13 is a functional unit that determines whether the length (quantity) of each text group after division by the division unit 12 satisfies a set condition. The determination unit 13 may compare the length of each divided text group with a threshold and, based on the comparison, determine whether the length of each divided text group satisfies the set condition. The determination unit 13 may also compare the lengths of the divided text groups with each other and, based on that comparison, determine whether the length of each divided text group satisfies the set condition. 【0053】 The determination unit 13 thereby decides whether the division unit 12 should divide a text group again. The determination unit 13 receives, from the division unit 12, the information indicating the result of dividing the text group, and determines whether the length of each divided text group indicated by that information satisfies the set condition. The length of each divided text group is, for example, the number of characters in it. Alternatively, the length may be the length of time corresponding to the divided text group (for example, the corresponding span of time in the meeting, as described above). 【0054】The condition is, for example, that the length of the divided text group is less than or equal to a threshold. The threshold is set in advance by the provider or user of the text processing device 10.
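The two length measures mentioned in paragraph 0053 (character count, or the corresponding span of meeting time) can be sketched as follows. The `Text` record and its `start`/`end` timestamps in seconds are assumptions made for illustration, for example as produced by a speech recognizer; the document does not define such a structure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Text:
    # Hypothetical record: one transcribed utterance with its start and end
    # times in seconds from the beginning of the meeting.
    body: str
    start: float
    end: float


def length_in_chars(group: List[Text]) -> int:
    """Length as the total number of characters in the group."""
    return sum(len(t.body) for t in group)


def length_in_seconds(group: List[Text]) -> float:
    """Length as the meeting time spanned by the group (first start to last end)."""
    if not group:
        return 0.0
    return group[-1].end - group[0].start
```

Measuring time from the first start to the last end counts pauses between utterances as part of the group; summing per-utterance durations instead would be an equally plausible reading.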
When a meeting summary or minutes are generated using a generative AI model with an input restriction as described above, the threshold is set so that a divided text group whose length is at or below the threshold fits within that input restriction. 【0055】 The determination unit 13 compares the length of each divided text group with the threshold. If the length is less than or equal to the threshold, the determination unit 13 determines that the length of the divided text group satisfies the condition; if it exceeds the threshold, the determination unit 13 determines that it does not. The determination unit 13 instructs the division unit 12 to divide, in the next round, each divided text group whose length does not satisfy the condition. Upon receiving this instruction, the division unit 12 performs the next round of division on that divided text group. 【0056】 The determination by the determination unit 13 may be repeated until all the divided text groups satisfy the condition; that is, the division by the division unit 12 may be repeated until all the divided text groups satisfy the condition. With this configuration, for example, the lengths of all the divided text groups can be kept at or below the threshold, so that each divided text group can be input in its entirety into a generative AI model with an input restriction. 【0057】 If an upper limit on the number of divisions by the division unit 12 is set in advance, the determination unit 13 may continue making determinations until all the divided text groups satisfy the condition or until the number of divisions reaches the upper limit. 【0058】Judging based only on a threshold for the length of each divided text group may leave large variations in length among the divided text groups that satisfy the condition.
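A minimal sketch of the threshold condition in paragraphs 0054 to 0056, assuming length is counted in characters and each divided text group is a list of strings. `max_chars` stands for the preset threshold, which would be chosen at or below the model's input restriction; both function names are hypothetical. The first function returns the positions of the groups the division unit would be asked to divide again.

```python
from typing import List


def groups_exceeding(groups: List[List[str]], max_chars: int) -> List[int]:
    """Indices of divided text groups whose total character count exceeds
    max_chars, i.e. the groups that fail the condition and need re-division."""
    return [
        i for i, group in enumerate(groups)
        if sum(len(text) for text in group) > max_chars
    ]


def all_satisfied(groups: List[List[str]], max_chars: int) -> bool:
    """True when every divided group is at or below the threshold."""
    return not groups_exceeding(groups, max_chars)
```

Because the comparison is "greater than", a group exactly at the threshold passes, matching the "less than or equal to" condition in paragraph 0055.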
For example, as shown in Figure 5, even when all of the first divided text groups 101 obtained from the original text group 100 satisfy the condition, some divided text groups 101 (the second and third first divided text groups 101 from the top in Figure 5) may be extremely long compared with others (the top first divided text group 101 in Figure 5). In this way, the lengths of the divided texts (i.e., the density of the divided texts) may become inconsistent. 【0059】 Large differences in length among the divided text groups can be inappropriate. The appropriate granularity for dividing a conversation by topic varies significantly from one conversation to another. For instance, a series of self-introductions and a series of progress reports from individual sales representatives both consist of people speaking in turn, but it is more appropriate to divide the progress reports more finely than the self-introductions, because the progress reports contain more information. Dividing text groups using only a preset threshold therefore increases the likelihood of dividing a particular conversation at an inappropriate granularity (too coarse or too fine). 【0060】 For example, if a text group is divided by topic in a 1:9 ratio, the larger group accounts for almost the entire content. A summary of that larger group may then be inappropriate because its details become unclear, even though those details are what the reader wants to know. For this reason, it is preferable to examine all the divided text groups together and, if a significant imbalance has occurred, apply post-processing such as dividing again, as described below.
【0061】The determination unit 13 may, instead of or in addition to using thresholds for the length of each of the divided text groups as described above, compare the lengths of the divided text groups and, based on the comparison, determine whether the length of each of the divided text groups satisfies the set conditions. In this case, the determination unit 13 compares the divided text group with the shortest length with each of the divided text groups. For example, as part of this comparison, the determination unit 13 calculates the difference between the length of each divided text group and the length of the divided text group with the shortest length. 【0062】 The determination unit 13 compares the calculated difference with a threshold. The threshold is set in advance by the provider or user of the text processing device 10. Alternatively, the threshold may be a value based on the length of the shortest divided text group. For example, in this case, the threshold may be a preset percentage (e.g., 30%) of the length of the shortest divided text group as a tolerance. 【0063】 If the difference calculated as a result of the comparison is less than or equal to the threshold, the determination unit 13 determines that the length of the divided text group satisfies the condition. If the difference calculated as a result of the comparison exceeds the threshold, the determination unit 13 determines that the length of the divided text group does not satisfy the condition. The processing after determining the condition is the same as the processing after determining using the threshold for the length of each divided text group as described above. As a result of this determination, for example, as shown in Figure 5, the multiple text groups 102 after the second division will have roughly the same length. 
In the above, the comparison value is the difference in length between the divided text groups, but a value other than the difference (for example, a ratio) may be used instead. 【0064】 Furthermore, when this comparison is used together with the threshold-based judgment on the length of each divided text group described above, any divided text group that fails to meet at least one of the conditions is divided further. 【0065】The determination made by the determination unit 13 may be other than the above, as long as it determines whether the length of each text group after division by the division unit 12 satisfies the set condition. 【0066】 The determination unit 13 outputs the information indicating the result of the text group division (for example, the divided text groups) when all the divided text groups satisfy the condition or when the number of divisions by the division unit 12 reaches the upper limit. For example, if the text group is used to generate a meeting summary or minutes, the determination unit 13 outputs (transmits) this information to the device (system) or module that performs the generation. The output information is then used to generate the meeting summary or minutes, which can be done in the same way as in conventional techniques. 【0067】 Depending on how the text group is used, the determination unit 13 may instead output (transmit) the information indicating the result of the text group division to a device (system) or module other than those described above. Alternatively, the determination unit 13 may output the information in a form the user can recognize; for example, it may display the information on a display device provided in the text processing device 10.
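The relative condition of paragraphs 0061 to 0064 can be sketched as below, using the 30% tolerance from paragraph 0062 as the default. Lengths are plain integers (for example, character counts), and the function names are hypothetical. For lengths of 1000, 1250, and 2600, the shortest is 1000, the tolerance is 300, and the differences are 0, 250, and 1600, so only the third group fails; combined with an absolute threshold of 1200, the second group also fails because it is too long.

```python
from typing import List


def fails_relative(lengths: List[int], tolerance_ratio: float = 0.3) -> List[int]:
    """Indices whose length exceeds the shortest group's length by more than
    tolerance_ratio times that shortest length."""
    if not lengths:
        return []
    shortest = min(lengths)
    allowed = tolerance_ratio * shortest
    return [i for i, n in enumerate(lengths) if n - shortest > allowed]


def fails_either(lengths: List[int], max_len: int, tolerance_ratio: float = 0.3) -> List[int]:
    """Combined check: a group is re-divided if it is longer than max_len
    OR fails the relative comparison with the shortest group."""
    too_long = {i for i, n in enumerate(lengths) if n > max_len}
    uneven = set(fails_relative(lengths, tolerance_ratio))
    return sorted(too_long | uneven)
```

Swapping the difference for a ratio, as paragraph 0063 allows, would only change the comparison inside `fails_relative`.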
The information indicating the result of the text group division may also be output from the determination unit 13 by other methods and to other output destinations. This concludes the description of the functions of the text processing device 10 according to this embodiment. 【0068】 Next, the text processing method according to this embodiment, that is, the processing executed by the text processing device 10 (the operating method of the text processing device 10), will be described with reference to the flowchart in Figure 6. In this processing, the acquisition unit 11 first acquires a text group that includes multiple consecutive texts and is to be divided (S01, acquisition step). Next, the division unit 12 acquires the information needed to divide the text group, based on the texts included in the text group acquired by the acquisition unit 11 (S02, division step). The division unit 12 then divides the text group into multiple parts using this information (S03, division step). 【0069】Next, the determination unit 13 determines whether the length of each text group after division by the division unit 12 satisfies the set condition (S04, determination step). If the length of at least one divided text group is determined not to satisfy the condition (NO in S04), each such divided text group is processed as follows. 【0070】 The division unit 12 acquires the information needed to divide the divided text group, based on the texts included in it (S02, division step). The information acquired at this time is of a different type from the information used in the previous division. The division unit 12 then divides the divided text group into multiple parts using this information (S03, division step).
The subsequent determination by the determination unit 13 (S04) and the steps that follow are carried out in the same manner as described above. 【0071】 If the length of every divided text group is determined to satisfy the condition (YES in S04), the division of the text group is complete, and the determination unit 13 outputs the information indicating the result of the text group division (S05). If the number of divisions performed by the division unit 12 is predetermined, the determination unit 13 outputs the information indicating the result of the text group division at the point when that number of divisions has been completed. This concludes the description of the text processing method according to this embodiment. 【0072】 In this embodiment, a text group is divided into multiple parts, and it is determined whether the length of each divided text group satisfies a set condition. Based on this determination, divided text groups are divided further, with the divisions using different types of text-based information. That is, the divisions are performed from different perspectives, so that, according to this embodiment, text can be divided appropriately. 【0073】As mentioned above, one type of text-based information used for division may be text features that allow the similarity between texts to be calculated. Using this information enables appropriate division, for example division at topic transitions, as described above. 【0074】 As mentioned above, another type of text-based information used for division may be information indicating whether a text is a question. Using this information also enables appropriate division; for example, as described above, a text group can be divided into a text group for the explanation phase and a text group for the question-and-answer phase.
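The flow of Figure 6 (S01 to S05) can be summarized in a compact recursive sketch. Each entry in `splitters` is a hypothetical division function standing for one type of text-based information, applied in the stored order; `satisfies` is the length condition; and `max_rounds` is the optional upper limit on the number of divisions. The toy splitters assume texts of the form "Speaker: utterance" and treat a text ending in "?" as a question; both conventions are assumptions made only for this example.

```python
from typing import Callable, List, Optional, Sequence

Group = List[str]
Splitter = Callable[[Group], List[Group]]


def divide(
    group: Group,
    splitters: Sequence[Splitter],
    satisfies: Callable[[Group], bool],
    max_rounds: int,
    round_no: int = 0,
) -> List[Group]:
    """Divide with splitters[round_no] (S02/S03), check each part (S04), and
    re-divide failing parts with the next splitter. A part is returned as-is
    once the round limit or the end of the schedule is reached."""
    if round_no >= min(max_rounds, len(splitters)):
        return [group]
    result: List[Group] = []
    for part in splitters[round_no](group):
        if satisfies(part):
            result.append(part)
        else:
            result.extend(divide(part, splitters, satisfies, max_rounds, round_no + 1))
    return result


def split_before_first_question(group: Group) -> List[Group]:
    """Explanation phase / Q&A phase split at the first text ending in '?'."""
    idx: Optional[int] = next(
        (i for i, t in enumerate(group) if t.endswith("?")), None
    )
    if idx is None or idx == 0:
        return [group]  # no question, or nothing precedes it: no split
    return [group[:idx], group[idx:]]


def split_at_speaker_change(group: Group) -> List[Group]:
    """One group per run of consecutive texts by the same speaker."""
    parts: List[Group] = []
    for text in group:
        speaker = text.split(":", 1)[0]
        if parts and parts[-1][-1].split(":", 1)[0] == speaker:
            parts[-1].append(text)
        else:
            parts.append([text])
    return parts


def max_chars(limit: int) -> Callable[[Group], bool]:
    """Length condition: total characters at or below limit."""
    return lambda g: sum(len(t) for t in g) <= limit
```

With the texts "A: Agenda is sales.", "A: Sales grew.", "B: Why?", "A: New clients.", "B: Thanks." and a 20-character limit, the first round splits off the explanation part (33 characters). The speaker splitter cannot divide it further, so it is output as-is once the schedule is exhausted, while the Q&A part is split into three single-speaker groups. Recursion keeps the divided groups in their original text order.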
【0075】 As mentioned above, another type of text-based information used for division may be information indicating the person related to the text. Using this information also enables appropriate division; for example, as mentioned above, a text group can be divided according to the person who made the utterance corresponding to each text. However, the text-based information used for division need not be any of the above types; any information that is based on the text and can be used for division is sufficient. 【0076】 As in this embodiment, the determination unit 13 may compare the length of each divided text group with a threshold and, based on the comparison, determine whether the length of each divided text group satisfies the set condition. With this configuration, text can be divided appropriately when, for example, a generative AI model has an input restriction, so that each appropriately divided text group can be input to the generative AI model in its entirety. 【0077】 As in this embodiment, the determination unit 13 may compare the lengths of the divided text groups with each other and, based on the comparison, determine whether the length of each divided text group satisfies the set condition. With this configuration, for example, the lengths of the divided text groups can be made roughly equal. However, the determination by the determination unit 13 need not be as described above; it is sufficient that it determines whether the length of each divided text group satisfies the set condition. 【0078】The block diagrams used in the description of the above embodiments show blocks in functional units. These functional blocks (components) are realized by any combination of at least one of hardware and software. The method of realizing each functional block is not particularly limited.
That is, each functional block may be realized by one physically or logically coupled device, or by two or more physically or logically separate devices connected directly or indirectly (for example, by wire or wirelessly). A functional block may also be realized by combining software with the one or more devices described above. 【0079】 Functions include, but are not limited to, judging, deciding, determining, calculating, computing, processing, deriving, investigating, searching, confirming, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, considering, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating (mapping), and assigning. For example, a functional block (component) that performs transmission is called a transmitting unit or a transmitter. In all cases, as described above, the method of implementation is not particularly limited. 【0080】 For example, the text processing device 10 in one embodiment of the present disclosure may function as a computer that performs the information processing of the present disclosure. Figure 7 is a diagram showing an example of the hardware configuration of the text processing device 10 according to one embodiment of the present disclosure. The text processing device 10 described above may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like. 【0081】In the following description, the term "apparatus" can be read as "circuit," "device," "unit," or the like. The hardware configuration of the text processing device 10 may include one or more of each of the devices shown in the figure, or may omit some of them.
【0082】 Each function of the text processing device 10 is realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, so that the processor 1001 performs computations, controls communication by the communication device 1004, and controls at least one of reading and writing of data in the memory 1002 and the storage 1003. 【0083】 The processor 1001 controls the entire computer, for example, by running an operating system. The processor 1001 may be a central processing unit (CPU) that includes interfaces with peripheral devices, a control device, an arithmetic unit, registers, and the like. For example, each function of the text processing device 10 described above may be implemented by the processor 1001. 【0084】 The processor 1001 also reads programs (program code), software modules, data, and the like from at least one of the storage 1003 and the communication device 1004 into the memory 1002 and executes various processes accordingly. The program used is one that causes the computer to execute at least part of the operations described in the above embodiment. For example, each function of the text processing device 10 may be implemented by a control program that is stored in the memory 1002 and runs on the processor 1001. Although the various processes described above have been described as being executed by a single processor 1001, they may be executed simultaneously or sequentially by two or more processors 1001. The processor 1001 may be implemented on one or more chips. The program may also be transmitted from a network via a telecommunications line. 【0085】The memory 1002 is a computer-readable recording medium and may consist of at least one of ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), RAM (Random Access Memory), and the like. The memory 1002 may also be called a register, a cache, a main memory, or the like.
The memory 1002 can store executable programs (program code), software modules, and the like for carrying out the information processing according to one embodiment of the present disclosure. 【0086】 The storage 1003 is a computer-readable recording medium and may consist of at least one of, for example, an optical disc such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (e.g., a compact disc, a digital versatile disc, or a Blu-ray® disc), a smart card, flash memory (e.g., a card, a stick, or a key drive), a floppy® disk, a magnetic strip, and the like. The storage 1003 may also be called an auxiliary storage device. The storage medium of the text processing device 10 may be, for example, a database, a server, or another suitable medium including at least one of the memory 1002 and the storage 1003. 【0087】 The communication device 1004 is hardware (a transmitting/receiving device) for communicating between computers via at least one of a wired network and a wireless network, and is also referred to as a network device, a network controller, a network card, a communication module, or the like. 【0088】The input device 1005 is an input device that accepts input from the outside (e.g., a keyboard, mouse, microphone, switch, button, or sensor). The output device 1006 is an output device that produces output to the outside (e.g., a display, speaker, or LED lamp). The input device 1005 and the output device 1006 may be integrated (e.g., as a touch panel). 【0089】 The devices, such as the processor 1001 and the memory 1002, are connected by a bus 1007 for communicating information. The bus 1007 may be a single bus, or different buses may be used between different devices.
【0090】 Furthermore, the text processing device 10 may be configured to include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array), and some or all of each functional block may be realized by such hardware. For example, the processor 1001 may be implemented using at least one of these hardware components. 【0091】 The processing procedures, sequences, flowcharts, etc., of each aspect / embodiment described in this disclosure may be reordered, provided they do not contradict each other. For example, the methods described in this disclosure present various step elements using exemplary order and are not limited to the specific order presented. 【0092】 Input and output information may be stored in a specific location (e.g., memory) or managed using a management table. Input and output information may be overwritten, updated, or appended to. Output information may be deleted. Input information may be transmitted to other devices. 【0093】The determination may be made by a value represented by one bit (0 or 1), by a boolean value (true or false), or by a numerical comparison (for example, a comparison with a predetermined value). 【0094】 Each aspect / embodiment described in this disclosure may be used individually, in combination, or switched between as needed during implementation. Furthermore, notification of specific information (e.g., notification that "X is") is not limited to explicit notification, but may also be implicit (e.g., by not providing such notification). 【0095】 Although the present disclosure has been described in detail above, it will be clear to those skilled in the art that the present disclosure is not limited to the embodiments described herein. 
The present disclosure can be implemented in modified and altered forms without departing from the spirit and scope of the present disclosure as defined by the claims. Therefore, the descriptions in the present disclosure are illustrative and not intended to be restrictive in any way. 【0096】 Software, whether referred to as software, firmware, middleware, microcode, hardware description language, or by any other name, should be broadly interpreted to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and so on. 【0097】 Furthermore, software, instructions, information, etc., may be transmitted and received via a transmission medium. For example, if software is transmitted from a website, server, or other remote source using at least one of wired technology (such as coaxial cable, fiber optic cable, twisted pair, or digital subscriber line (DSL)) and wireless technology (such as infrared or microwave), then at least one of these wired and wireless technologies is included in the definition of a transmission medium. 【0098】 The terms “system” and “network” as used in this disclosure are interchangeable. 【0099】 Furthermore, the information, parameters, etc., described in this disclosure may be expressed using absolute values, relative values from a predetermined value, or corresponding other information. 【0100】 As used in this disclosure, the terms “judging” and “determining” may encompass a wide variety of actions. “Determining” may include, for example, judging, calculating, computing, processing, deriving, investigating, looking up, searching, or inquiring (e.g., searching in a table, database, or other data structure), or ascertaining.
“Determining” may also include receiving (e.g., receiving information), transmitting (e.g., sending information), inputting, outputting, or accessing (e.g., accessing data in memory). Furthermore, "judgment" and "decision" can include considering something as having been "judged" or "decided" after resolving, selecting, choosing, establishing, comparing, etc. In other words, "judgment" and "decision" can include considering something as having been "judged" or "decided" after some action. Also, "judgment (decision)" can be reinterpreted as "assuming," "expecting," or "considering." 【0101】The terms “connected,” “coupled,” or any variation thereof, mean any direct or indirect connection or coupling between two or more elements, and may include the presence of one or more intermediate elements between two elements that are “connected” or “coupled” with each other. The coupling or connection between elements may be physical, logical, or a combination thereof. For example, “connection” may be reinterpreted as “access.” As used in this disclosure, two elements may be considered to be “connected” or “coupled” with each other using at least one of one or more wires, cables, and printed electrical connections, and, in some non-limiting and non-exclusive examples, electromagnetic energy having wavelengths in the radio frequency domain, microwave domain, and optical (both visible and invisible) domain. 【0102】 In this disclosure, the phrase "based on" does not mean "based solely on" unless otherwise specified. In other words, the phrase "based on" means both "based solely on" and "based at least on." 【0103】 Any reference to elements using the designations “first,” “second,” etc., as used in this disclosure does not generally limit the quantity or order of those elements. These designations may be used in this disclosure as a convenient way to distinguish between two or more elements. 
Accordingly, a reference to first and second elements does not mean that only two elements may be employed or that the first element must precede the second element in some way. 【0104】 Where the terms “include,” “including,” and variations thereof are used in this disclosure, these terms are intended to be inclusive, as is the term “comprising.” Furthermore, the term “or” as used in this disclosure is not intended to mean an exclusive OR. 【0105】In this disclosure, where articles such as a, an, and the in English are added by translation, the nouns following these articles may be plural. 【0106】 In this disclosure, the phrase "A and B are different" may mean "A and B are different from each other." It may also mean "A and B are each different from C." Terms such as "separate" and "coupled" may be interpreted in the same way as "different." 【0107】The text processing device and text processing method of the present disclosure have the following configurations: [1] A text processing device comprising: an acquisition unit that acquires a group of texts that includes a plurality of consecutive texts and is to be divided; a division unit that acquires information necessary for dividing the group of texts based on the texts included in the group of texts acquired by the acquisition unit, and divides the group of texts into a plurality of groups using the information; and a determination unit that determines whether the length of each of the text groups after division by the division unit satisfies a set condition, wherein, in accordance with the determination by the determination unit, the division unit acquires information necessary for dividing the divided text group and of a different type from the previous division based on the texts included in the divided text group, and divides the divided text group into a plurality of groups using the information.
[2] The text processing device according to [1], wherein the division unit acquires text feature quantities that enable the similarity between texts to be calculated, as information necessary for dividing the group of texts or the divided text group. [3] The text processing device according to [2], wherein the division unit acquires a moving average of text feature quantities that enable the similarity between texts to be calculated, as information necessary for dividing the group of texts or the divided text group. [4] The text processing device according to any one of [1] to [3], wherein the division unit acquires information indicating whether a text is a question, as information necessary for dividing the group of texts or the divided text group. [5] The text processing device according to any one of [1] to [4], wherein the division unit acquires information indicating the person related to the text, as information necessary for dividing the group of texts or the divided text group. [6] The text processing device according to any one of [1] to [5], wherein the determination unit compares the length of each text group after division by the division unit with a threshold and, based on the comparison, determines whether the length of each of the divided text groups satisfies the set condition.
[7] The text processing device according to any one of [1] to [6], wherein the determination unit compares the lengths of the text groups after division by the division unit with each other and, based on the comparison, determines whether the length of each of the divided text groups satisfies the set condition. [8] A text processing method comprising: an acquisition step in which a text processing device acquires a group of texts that include a plurality of consecutive texts and are to be divided; a division step in which the text processing device acquires information necessary for dividing the group of texts based on the texts included in the group of texts acquired in the acquisition step, and divides the group of texts into multiple parts using the information; and a determination step in which the text processing device determines whether the length of each of the divided text groups in the division step satisfies a set condition, wherein, in accordance with the determination in the determination step, in the division step, the device acquires information necessary for dividing the divided text group and of a different type from the previous division based on the texts included in the divided text group, and divides the divided text group into multiple parts using the information. 【0108】 10...Text processing device, 11...Acquisition unit, 12...Division unit, 13...Determination unit, 1001...Processor, 1002...Memory, 1003...Storage, 1004...Communication device, 1005...Input device, 1006...Output device, 1007...Bus.
Claims
1. A text processing device comprising: an acquisition unit that acquires a group of texts that include multiple consecutive texts and are to be divided; a division unit that acquires information necessary for dividing the group of texts based on the texts included in the group of texts acquired by the acquisition unit, and divides the group of texts into multiple parts using the information; and a determination unit that determines whether the length of each of the text groups after division by the division unit satisfies a set condition, wherein, in response to the determination by the determination unit, the division unit acquires information necessary for dividing the divided text group and of a different type from the previous division based on the texts included in the divided text group, and divides the divided text group into multiple parts using the information.
2. The text processing device according to claim 1, wherein the division unit acquires, as the information necessary for dividing the text group or a divided text group, feature quantities of the texts from which the similarity between texts can be calculated.
3. The text processing device according to claim 2, wherein the division unit acquires, as the information necessary for dividing the text group or a divided text group, a moving average of the feature quantities of the texts from which the similarity between texts can be calculated.
4. The text processing device according to claim 1, wherein the division unit acquires, as the information necessary for dividing the text group or a divided text group, information indicating whether or not a text is a question.
5. The text processing device according to claim 1, wherein the division unit acquires, as the information necessary for dividing the text group or a divided text group, information indicating a person related to a text.
6. The text processing device according to claim 1, wherein the determination unit compares the length of each text group after division by the division unit with a threshold and determines, based on the comparison, whether the length of each of the divided text groups satisfies the set condition.
7. The text processing device according to claim 1, wherein the determination unit compares the lengths of the text groups after division by the division unit with one another and determines, based on the comparison, whether the length of each of the divided text groups satisfies the set condition.
8. A text processing method comprising: an acquisition step in which a text processing device acquires a group of texts that includes a plurality of consecutive texts and is to be divided; a division step in which the text processing device acquires, based on the texts included in the text group acquired in the acquisition step, information necessary for dividing the text group, and uses that information to divide the text group into multiple parts; and a determination step in which the text processing device determines whether the length of each of the text groups divided in the division step satisfies a set condition, wherein, in accordance with the determination in the determination step, in the division step the text processing device acquires, based on the texts included in a divided text group, information that is necessary for dividing that divided text group and is of a different type from the information used in the previous division, and uses that information to divide the divided text group into multiple parts.
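Claims 2 to 5 name three kinds of information the division unit may use. The following minimal sketch gives one possible splitter per kind, under explicit assumptions that the claims do not specify: a bag-of-words count vector stands in for the text feature quantity, the moving average is taken over a fixed window of texts on each side of a candidate boundary (a TextTiling-style comparison), a question is recognised by a trailing question mark, and each text carries a speaker label. All names (`Utterance`, `split_by_similarity`, `split_by_question`, `split_by_speaker`) are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    speaker: str  # person related to the text (cf. claim 5)
    text: str


Group = List[Utterance]


def feature(text: str) -> Counter:
    # Hypothetical feature quantity: bag-of-words counts. A sentence-embedding
    # model would be a more realistic choice.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def window_feature(feats: List[Counter], start: int, end: int) -> Counter:
    # Sum of feats[start:end]; cosine ignores scale, so this yields the same
    # similarity as the window's average (moving average, cf. claim 3).
    total: Counter = Counter()
    for f in feats[start:end]:
        total.update(f)
    return total


def split_by_similarity(group: Group, window: int = 2) -> List[Group]:
    """Cut once, at the boundary where the averaged features of the `window`
    texts before and after it are least similar (cf. claims 2 and 3)."""
    if len(group) < 2:
        return [group]
    feats = [feature(u.text) for u in group]
    best_cut, best_sim = 1, float("inf")
    for cut in range(1, len(group)):
        sim = cosine(window_feature(feats, max(0, cut - window), cut),
                     window_feature(feats, cut, cut + window))
        if sim < best_sim:
            best_cut, best_sim = cut, sim
    return [group[:best_cut], group[best_cut:]]


def is_question(text: str) -> bool:
    # Assumption: a question is recognised by its final question mark.
    return text.rstrip().endswith(("?", "？"))


def split_by_question(group: Group) -> List[Group]:
    """Start a new group at each question (cf. claim 4)."""
    groups: List[Group] = []
    for u in group:
        if not groups or is_question(u.text):
            groups.append([u])
        else:
            groups[-1].append(u)
    return groups


def split_by_speaker(group: Group) -> List[Group]:
    """Start a new group whenever the speaker changes (cf. claim 5)."""
    groups: List[Group] = []
    for u in group:
        if groups and groups[-1][-1].speaker == u.speaker:
            groups[-1].append(u)
        else:
            groups.append([u])
    return groups


if __name__ == "__main__":
    conv = [Utterance("A", "Budget for next year"), Utterance("A", "Budget needs review"),
            Utterance("B", "What about schedule?"), Utterance("B", "Schedule slips one week")]
    for split in (split_by_similarity, split_by_question, split_by_speaker):
        print(split.__name__, [len(g) for g in split(conv)])  # each prints [2, 2]
```

In this toy conversation all three criteria cut between the second and third utterances. In general they disagree, which is why trying a different type of information on a divided group that is still too long can succeed where the previous type did not.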