Information processing systems, information processing methods
The information processing system addresses the issue of unspecified pronouns in transcripts by using a multimodal AI server to create detailed and reliable documents from video data, improving document creation efficiency and clarity.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- SEMICON ENERGY LAB CO LTD
- Filing Date
- 2025-12-04
- Publication Date
- 2026-07-02
Abstract
Description
Technical Field
[0001] One aspect of the present invention relates to an information processing system, an information processing method, or a semiconductor device.
[0002] Note that one aspect of the present invention is not limited to the above technical field. The technical field of one aspect of the invention disclosed in this specification and the like relates to a product, a method, or a manufacturing method. Or, one aspect of the present invention relates to a process, a machine, a manufacture, or a composition of matter. Therefore, more specifically, as the technical field of one aspect of the present invention disclosed in this specification, an information processing device, a semiconductor device, a storage device, a driving method thereof, or a manufacturing method thereof can be cited as an example.
Background Art
[0003] In recent years, language models using neural networks have been actively developed, and large language models (LLMs) in particular have attracted attention. A large language model is a natural language processing model trained on a large amount of data. With a large language model, for example, a dialogue model that responds to user instructions can be realized. Non-Patent Document 1 discloses GPT-4 (registered trademark) (Generative Pre-trained Transformer 4) as a large language model and ChatGPT as a dialogue model.
[0004] Large language models have greatly improved the capabilities of natural language processing. On the other hand, because language models have grown so large, it is difficult to build and operate one in-house in terms of facilities and costs. Therefore, using an external service that provides a language model has become one way of using a language model. Furthermore, language models are becoming multimodal, generating language while also taking the interpretation of image information into account. Such a model that also handles information other than language is also called a multimodal model or a foundation model.
Prior Art Documents
[0005] [Non-Patent Document 1] Summary of ChatGPT / GPT-4 Research and Perspective Towards the Future of Large Language Models, Yiheng Liu et al. (Submitted on 4 Apr 2023, [online], Internet) <URL: https://arxiv.org/abs/2304.01852>
Summary of the Invention
Problems to Be Solved by the Invention
[0006] Conventionally, it is known that speech recognition technology can be used to convert audio data into text data and perform transcription. For example, audio data can be obtained from video data of meetings, and transcripts can be generated from the obtained audio data. However, the transcribed information may contain many demonstrative pronouns that do not specify the subject, and may not be sufficient as a record of conversations.
[0007] One aspect of the present invention, in view of the above-mentioned problems, aims to provide an information processing system that assists in document creation by supplementing referential words in conversation transcripts, etc., whose objects are not specified. Alternatively, it aims to provide a novel information processing system that is excellent in convenience, usefulness, or reliability. Alternatively, it aims to provide a novel information processing method that is excellent in convenience, usefulness, or reliability. Alternatively, it aims to provide a novel information processing system, a novel information processing method, or a novel semiconductor device.
[0008] Furthermore, the description of these problems does not preclude the existence of other problems. Moreover, one aspect of the present invention does not need to solve all of these problems. Other problems will naturally become apparent from the description in the specification, drawings, claims, etc., and it is possible to extract other problems from the description in the specification, drawings, claims, etc.
Means for Solving the Problems
[0009] (1) One aspect of the present invention is an information processing system having a first component, a second component, and a third component.
[0010] The first component includes the function of receiving video information and transmitting it to the third component, and the function of receiving and providing annotated documents. The annotated document includes the first document created from the video information. The first document includes an unspecified demonstrative pronoun and annotations, the annotations including information identified as the target of the unspecified demonstrative pronoun.
[0011] The second component includes the function of receiving the first prompt and sending a list to the third component, the function of receiving the second prompt and sending the first document to the third component, and the function of performing processing using a multimodal AI server. The multimodal AI server includes the function of generating a list according to the first prompt and the function of generating the first document according to the second prompt.
[0012] The third component includes functions for receiving video information, lists, and the first document and sharing them within the third component, sending the first and second prompts to the second component, and creating an annotated document and sending it to the first component. The third component also includes a first subcomponent, a second subcomponent, a third subcomponent, and a fourth subcomponent.
[0013] The first subcomponent has the function of dividing video information to create a group of chunked data, which includes identification information, audio information, and still images, with the still images being representative images of the chunked data.
[0014] The second subcomponent has the functionality to transcribe audio information into a second document.
[0015] The third subcomponent comprises a database and a management system. The database has the function of storing a group of chunked data, and the management system has the function of integrating a first document into the chunked data and creating annotated documents from the database.
[0016] The fourth subcomponent includes a function to create a first prompt and a function to create a second prompt by sequentially selecting identification information from a list. The first prompt includes a first instruction and a first table, and the first instruction includes a procedure to generate a list from the first table. The list includes identification information that identifies a second document containing an unspecified demonstrative pronoun. The second prompt includes a second instruction, a second document, and a still image, and the second instruction includes a procedure to generate the first document by identifying the target of the unspecified demonstrative pronoun contained in the second document from the still image.
[0017] (2) Another aspect of the present invention is the above-described information processing system, wherein the third subcomponent has the function of sharing the first table and the second table within the third component.
[0018] The management system has the function of creating a first table and a second table from the database. The first table contains a first column and a second column, with the first column containing identification information and the second column containing the second document. The second table contains a third, fourth, and fifth column, with the third column containing identification information included in the list, the fourth column containing the second document, and the fifth column containing still images.
[0019] (3) Another aspect of the present invention is the above-described information processing system, wherein the first component has a function of receiving and providing a summary document.
[0020] The second component has a function of receiving a third prompt and sending the summary document to the third component, and the multimodal AI server has a function of generating the summary document according to the third prompt.
[0021] The third component has a function of sending the third prompt to the second component and a function of receiving the summary document and sending it to the first component.
[0022] The fourth subcomponent has a function of creating the third prompt. The third prompt includes a third instruction and the annotated document, and the third instruction includes a procedure for generating the summary document from the annotated document.
[0023] (4) Also, one aspect of the present invention is an information processing system in which the first component has a function of receiving and providing a task list.
[0024] The second component has a function of receiving a fourth prompt and sending the task list to the third component. The multimodal AI server has a function of generating a task list according to the fourth prompt.
[0025] The third component has a function of sending the fourth prompt to the second component and a function of receiving the task list and sending it to the first component.
[0026] The fourth sub-component has a function of creating a fourth prompt. The fourth prompt includes a fourth instruction and an annotated document, and the fourth instruction includes a procedure for generating a task list from the annotated document.
[0027] (5) One aspect of the present invention is an information processing method having a first phase. The first phase includes steps from the first step to the eighteenth step.
[0028] In the first step of the first phase, the first component receives video information and transmits it to the second component. The second component includes a first sub-component, a second sub-component, a third sub-component, and a fourth sub-component. The third sub-component includes a database and a management system.
[0029] In the second step of the first phase, the second component receives video information and shares it within the second component.
[0030] In the third step of the first phase, the first sub-component divides the video information to create a group of chunk data. The group of chunk data includes chunk data, and the chunk data includes identification information, audio information, and a still image. The still image is an image representing the chunk data.
[0031] In the fourth step of the first phase, the second sub-component performs speech recognition on the audio information to generate a first document.
[0032] In the fifth step of the first phase, the third sub-component uses the management system to integrate the first document into the chunk data.
[0033] In the sixth step of the first phase, the management system creates a first table from the database and shares the first table within the second component. The first table includes a first column and a second column. The first column includes identification information, and the second column includes the first document.
[0034] In the seventh step of the first phase, the fourth subcomponent creates a first prompt and sends it to the third component. The first prompt includes a first instruction and the first table. The first instruction includes a procedure to generate a list from the first table, and the list includes identification information that identifies a first document containing an unspecified demonstrative pronoun.
[0035] In the eighth step of the first phase, the third component receives the first prompt and generates a list using the multimodal AI server.
[0036] In the ninth step of the first phase, the third component sends the list to the second component.
[0037] In the tenth step of the first phase, the second component accepts a list and shares it within the second component.
[0038] In the 11th step of the first phase, the management system creates a second table from the database and shares the second table within the second component. The second table includes a third, fourth, and fifth column, the third column containing identification information included in the list, the fourth column containing the first document, and the fifth column containing still images.
[0039] In the twelfth step of the first phase, the fourth subcomponent sequentially selects records from the second table, creates a second prompt, and sends it to the third component. The second prompt includes a second instruction, a first document, and a still image, the second instruction including a procedure to generate the second document by identifying the object of an unspecified demonstrative pronoun contained in the first document from the still image. The second document includes an unspecified demonstrative pronoun and a note, the note containing information identified as the object of the unspecified demonstrative pronoun.
[0040] In the 13th step of the first phase, the third component receives the second prompt and generates the second document using the multimodal AI server.
[0041] In the 14th step of the first phase, the third component sends the second document to the second component.
[0042] In the 15th step of the first phase, the second component receives the second document and shares it within the second component.
[0043] In the 16th step of the first phase, the management system integrates the second document into the chunked data.
[0044] In the 17th step of the first phase, the management system creates an annotated document from the database and sends it to the first component. The annotated document includes a second document created from the video information.
[0045] In the 18th step of the first phase, the first component receives and provides annotated documents.
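The flow of the first phase above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `run_first_phase` and the three callables (`transcribe`, `generate_list`, `annotate`) are hypothetical stand-ins for the second sub-component and for the two requests made to the multimodal AI server, and an in-memory dictionary stands in for the database. Document names follow the method description, where the first document is the transcript and the second document is the annotated text.

```python
from typing import Callable, Dict, List, Tuple


def run_first_phase(
    chunks: List[Tuple[int, bytes, bytes]],  # (identification info, audio, still image)
    transcribe: Callable[[bytes], str],      # hypothetical stand-in for transcription
    generate_list: Callable[[List[Tuple[int, str]]], List[int]],  # stand-in for the first prompt
    annotate: Callable[[str, bytes], str],   # stand-in for the second prompt
) -> str:
    """Return the annotated document built from the given chunk data."""
    # Steps 3 to 5: store each chunk with its transcript (first document).
    db: Dict[int, Dict[str, object]] = {}
    for chunk_id, audio, still in chunks:
        db[chunk_id] = {
            "still": still,
            "first_doc": transcribe(audio),
            "second_doc": None,
        }

    # Step 6: first table with identification info and first document.
    first_table = [(cid, str(db[cid]["first_doc"])) for cid in sorted(db)]

    # Steps 7 to 10: obtain the list of chunks with unspecified pronouns.
    id_list = generate_list(first_table)

    # Step 11: second table restricted to the listed identification info.
    second_table = [
        (cid, str(db[cid]["first_doc"]), db[cid]["still"])
        for cid in sorted(id_list)
        if cid in db
    ]

    # Steps 12 to 16: generate and store the second (annotated) document.
    for cid, first_doc, still in second_table:
        db[cid]["second_doc"] = annotate(first_doc, still)

    # Step 17: join per-chunk documents in order of identification info,
    # using the first document where no second document exists.
    parts = [str(db[cid]["second_doc"] or db[cid]["first_doc"]) for cid in sorted(db)]
    return "\n".join(parts)
```

Passing the server interactions in as callables keeps the sketch independent of any particular speech recognition engine or AI service.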
[0046] (6) Another aspect of the present invention is the above-described information processing method having a second phase. The second phase follows the first phase, and the second phase has steps 1 through 6.
[0047] In the first step of the second phase, the fourth subcomponent creates a third prompt and sends it to the third component. The third prompt includes a third instruction and an annotated document, and the third instruction includes a procedure for generating a summary document from the annotated document.
[0048] In the second step of the second phase, the third component receives a third prompt and generates a summary document using a multimodal AI server.
[0049] In the third step of the second phase, the third component sends the summary document to the second component.
[0050] In the fourth step of the second phase, the second component receives the summary document and shares it within the second component.
[0051] In the fifth step of the second phase, the second component sends the summary document to the first component.
[0052] In the sixth step of the second phase, the first component receives and provides the summary document.
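As a rough sketch of the second phase, a prompt can be assembled from an instruction and the annotated document and then passed to the server. The same shape would apply to a fourth prompt that requests a task list. The instruction wordings, the function names, and the `generate` callable are all assumptions made for illustration; the source does not give the text of the third or fourth instruction.

```python
from typing import Callable

# Hypothetical instruction texts; the source does not specify their wording.
SUMMARY_INSTRUCTION = "Summarize the annotated document below."
TASK_LIST_INSTRUCTION = "List the tasks mentioned in the annotated document below."


def build_prompt(instruction: str, annotated_document: str) -> str:
    """Combine an instruction and the annotated document into one prompt."""
    return f"{instruction}\n\nDocument:\n{annotated_document}"


def run_follow_up_phase(
    instruction: str,
    annotated_document: str,
    generate: Callable[[str], str],  # hypothetical stand-in for the multimodal AI server
) -> str:
    """Create the prompt, send it to the server stand-in, and return its output."""
    prompt = build_prompt(instruction, annotated_document)
    return generate(prompt)
```

For example, `run_follow_up_phase(SUMMARY_INSTRUCTION, annotated_document, generate)` would correspond to the summary document, with `generate` wrapping whatever service is actually used.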
[0053] (7) Another aspect of the present invention is an information processing method having a third phase. The third phase follows the first phase, and the third phase has steps 1 through 6.
[0054] In the first step of the third phase, the fourth subcomponent creates a fourth prompt and sends it to the third component. The fourth prompt includes a fourth instruction and an annotated document, and the fourth instruction includes a procedure for generating a task list from the annotated document.
[0055] In the second step of the third phase, the third component receives the fourth prompt and generates a task list using the multimodal AI server.
[0056] In the third step of the third phase, the third component sends the task list to the second component.
[0057] In the fourth step of the third phase, the second component receives and shares the task list within the second component.
[0058] In the fifth step of the third phase, the second component sends the task list to the first component.
[0059] In the sixth step of the third phase, the first component receives and provides the task list.
Effects of the Invention
[0060] One aspect of the present invention, in view of the above problems, can provide an information processing system that supplements unspecified referential words included in conversation transcripts, etc., and assists in the creation of documents. Alternatively, it can provide a novel information processing system that is excellent in convenience, usefulness, or reliability. Alternatively, it can provide a novel information processing method that is excellent in convenience, usefulness, or reliability. Alternatively, it can provide a novel information processing system, a novel information processing method, or a novel semiconductor device.
[0061] Furthermore, the description of these effects does not preclude the existence of other effects. Moreover, one aspect of the present invention does not need to have all of these effects. Other effects will naturally become apparent from the description in the specification, drawings, claims, etc., and it is possible to extract other effects from the description in the specification, drawings, claims, etc.
Brief Description of the Drawings
[0062] [Figure 1] Figure 1 is a diagram illustrating an example of the configuration of an information processing system.
[Figure 2] Figures 2(A) and 2(B) show examples of video information related to the operation of an information processing system.
[Figure 3] Figure 3 shows an example of the configuration of components related to the operation of an information processing system.
[Figure 4] Figure 4 is a diagram illustrating video information related to the operation of an information processing system.
[Figure 5] Figures 5(A) and 5(B) illustrate chunk data related to the operation of an information processing system.
[Figure 6] Figure 6 is a diagram illustrating a document related to the operation of an information processing system.
[Figure 7] Figures 7(A) and 7(B) illustrate the generated tables related to the operation of an information processing system.
[Figure 8] Figure 8(A) shows an example of the configuration of a prompt related to the operation of an information processing system. Figure 8(B) is a diagram illustrating the generated list related to the operation of an information processing system.
[Figure 9] Figure 9(A) shows an example of the configuration of a prompt related to the operation of an information processing system. Figure 9(B) is a diagram illustrating a document related to the operation of an information processing system.
[Figure 10] Figures 10(A) and 10(B) show examples of prompt configurations related to the operation of an information processing system.
[Figure 11] Figure 11 is a block diagram illustrating an example of the configuration of an information processing device.
[Figure 12] Figure 12 is a flowchart illustrating an example of an information processing method.
[Figure 13] Figure 13 is a flowchart illustrating an example of an information processing method.
[Figure 14] Figure 14 is a flowchart illustrating an example of an information processing method.
Modes for Carrying Out the Invention
[0063] Embodiments will be described in detail with reference to the drawings. However, it will be readily apparent to those skilled in the art that the present invention is not limited to the following description, and that its form and details can be modified in various ways without departing from the spirit and scope of the present invention. Accordingly, the present invention is not to be interpreted as being limited to the contents of the embodiments shown below. In the configuration of the invention described below, the same reference numerals are used in common across different drawings for the same parts or parts having similar functions, and repeated descriptions are omitted.
[0064] In this specification, ordinal numbers such as "first," "second," etc., are used to avoid confusion of components and do not limit the number of components or the order of components (e.g., process order or stacking order). Furthermore, even if a term does not have an ordinal number in this specification, an ordinal number may be added in the claims to avoid confusion of components. Even if a term has an ordinal number in this specification, a different ordinal number may be added in the claims. Even if a term has an ordinal number in this specification, an ordinal number may be omitted in the claims.
[0065] In the drawings attached to this specification, components are classified by function and shown as independent blocks in block diagrams. However, in reality, it is difficult to completely separate components by function, and a single component may be involved in multiple functions.
[0066] (Embodiment 1) This embodiment describes an information processing system according to one aspect of the present invention. Figures 1 to 11 are used in the description.
[0067] <Example of an information processing system configuration 1> Figure 1 shows a diagram illustrating an example of the configuration of an information processing system according to one embodiment of the present invention.
[0068] The information processing system described in this embodiment includes component 110, component 130, and component 120.
[0069] For example, the information processing devices that perform the functions of component 110, component 130, and component 120 each include a computing device and a communication device. Furthermore, these communication devices are connected to each other via a network 51 to constitute an information processing system according to one embodiment of the present invention.
[0070] <Component 110 Configuration Example 1> Component 110 has the function of receiving video information MvI and transmitting it to component 120, and the function of receiving annotated documents AnDoc and providing them, for example, to a user 99 of the information processing system (see Figure 1). Specifically, it provides the information to the user 99 of the information processing system using output devices such as display devices, speakers, and printers.
[0071] 《Configuration Example of Video Information MvI》 An example of video information MvI is shown in Figure 2(A).
[0072] Video information (MvI) includes both audio and video. For example, materials and audio displayed on a display device can be recorded and used as video information (MvI). Furthermore, video information (MvI) may include scenes where a person points to materials using a pointing device (Dev), such as a mouse pointer, while uttering demonstrative pronouns.
[0073] For example, a panoramic camera can be used to record the progress of a meeting or other event, and this can be used as video information (MvI). An example of video information (MvI) using a panoramic camera is shown in Figure 2(B).
[0074] Note that video information MvI may include scenes where speaker 98 points to a document while uttering demonstrative pronouns.
[0075] The audio in video information MvI may contain demonstrative pronouns whose object cannot be identified. An example of a statement made during a meeting is shown in the following paragraph. The word "this" in the statement is a demonstrative pronoun, and it is difficult to identify its object from the statement alone.
[0076] "The diagram in the document shows this point."
[0077] In this specification, demonstrative pronouns whose object is unclear or which are difficult to interpret from the context are referred to as "demonstrative pronouns with unspecified object (Dem)."
[0078] 《Configuration Example of Annotated Document AnDoc》 The annotated document AnDoc includes document Doc1(X) created from video information MvI. Document Doc1(X) includes the above-mentioned demonstrative pronoun Dem, whose target is not specified, and annotation Ano(X). Note that annotation Ano(X) includes information that has been identified as the target of the demonstrative pronoun Dem, whose target is not specified.
[0079] <Component 130 Configuration Example 1> Component 130 includes the functions of receiving prompt Pt1 and sending list L1 to component 120, receiving prompt Pt2 and sending document Doc1(X) to component 120, and processing using the multimodal AI server 200 (see Figure 1).
[0080] 《Example Configuration 1 of Multimodal AI Server 200》 The multimodal AI server 200 has the function of generating list L1 according to prompt Pt1 and the function of generating document Doc1(X) according to prompt Pt2. Prompt Pt1 and prompt Pt2 are created by component 120. The prompts, list L1, and document Doc1(X) are described later together with component 120.
[0081] The multimodal AI server 200 can adapt to various tasks using a foundation model based on artificial intelligence (AI). For example, it can collect information of two or more different data types (text data, audio data, image data, video data, etc.), integrate it, and process it. The foundation model is an AI model that can interpret at least both language and images to generate language, and the server can convert different kinds of data into a format that the AI model can interpret.
[0082] <Component 120 Configuration Example 1> Component 120 has the function of receiving video information MvI, list L1, and document Doc1(X) and sharing them within component 120 (see Figure 1). It also has the function of sending prompt Pt1 and prompt Pt2 to component 130. It also has the function of sending annotated document AnDoc to component 110.
[0083] Component 120 comprises subcomponents 120A, 120B, 120C, and 120D. An example configuration of component 120 is shown in Figure 3.
[0084] 《Example Configuration of Subcomponent 120A》 Subcomponent 120A has the function of dividing the video information MvI to create a group of chunked data ChD (see Figure 3).
[0085] Figure 4 shows a diagram illustrating the video information MvI. The horizontal axis represents time, and the diagram schematically represents the audio information AdI and video information Vid at each time point.
[0086] The video information MvI includes audio information AdI and video information Vid. The video information MvI can be divided into chunk data ChD. Chunk data ChD includes the divided video information Vid. A still image Pic representing the chunk data ChD can be extracted from the divided video information Vid. Furthermore, the time (Time) representing the chunk data ChD can be recorded on the still image Pic.
[0087] Figure 5(A) shows an example of the structure of a group of chunk data ChD separated from video information MvI.
[0088] The group of chunk data ChD contains chunk data ChD(X). In other words, chunk data ChD(X) is one chunk selected from the group of chunk data ChD.
[0089] Each chunk data ChD(X) contains identification information ID(X), audio information AdI(X), and a still image Pic(X). The still image Pic(X) is a representative image of chunk data ChD(X).
[0090] Specifically, chunk data ChD(1) includes identification information ID(1), audio information AdI(1), and still image Pic(1). Still image Pic(1) is a still image representative of chunk data ChD(1). Chunk data ChD(2) includes identification information ID(2), audio information AdI(2), and still image Pic(2). Still image Pic(2) is a still image representative of chunk data ChD(2). Chunk data ChD(3) includes identification information ID(3), audio information AdI(3), and still image Pic(3). Still image Pic(3) is a still image representative of chunk data ChD(3).
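The structure of chunk data ChD(X) described above might be represented as in the following sketch. The class name, field names, and the `split_into_chunks` helper are hypothetical. Fixed-length splitting and the use of the chunk midpoint as the representative time are assumptions, since the source does not say how chunk boundaries or representative still images are chosen. Audio and image payloads are left as empty placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChunkData:
    """One chunk ChD(X): identification info, audio, and a representative still."""
    chunk_id: int                 # identification information ID(X)
    audio: bytes                  # audio information AdI(X), placeholder bytes
    still_image: bytes            # still image Pic(X), placeholder bytes
    time_sec: float               # time recorded for the representative still
    doc2: Optional[str] = None    # transcript Doc2(X), filled in after transcription
    doc1: Optional[str] = None    # annotated document Doc1(X), filled in if needed


def split_into_chunks(duration_sec: float, chunk_sec: float) -> List[ChunkData]:
    """Divide a recording of the given length into fixed-length chunks.

    Fixed-length splitting is an assumption made for this sketch.
    """
    if duration_sec <= 0 or chunk_sec <= 0:
        raise ValueError("duration_sec and chunk_sec must be positive")
    chunks: List[ChunkData] = []
    start = 0.0
    chunk_id = 1
    while start < duration_sec:
        end = min(start + chunk_sec, duration_sec)
        midpoint = (start + end) / 2  # assumed choice of representative time
        chunks.append(ChunkData(chunk_id, b"", b"", midpoint))
        start = end
        chunk_id += 1
    return chunks
```

In a real system the placeholders would be replaced by the audio segment and a frame extracted at the representative time.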
[0091] 《Example Configuration of Subcomponent 120B》 Subcomponent 120B has the function of transcribing audio information AdI(X) into document Doc2(X). In other words, subcomponent 120B has the function of creating document Doc2(X) from audio information AdI(X) (see Figure 3).
[0092] 《Example Configuration of Subcomponent 120C》 Subcomponent 120C includes a database (DB) and a management system (DBMS) (see Figure 3).
[0093] The database (DB) has the function of storing a group of chunked data (ChD).
[0094] The management system (DBMS) has the functionality to integrate document Doc1(X) into chunk data ChD(X) and the functionality to create annotated document AnDoc from the database DB (see Figure 5(B)).
[0095] Furthermore, the management system (DBMS) has the functionality to create annotated documents (AnDoc) from a group of chunked data (ChD). Specifically, it has the functionality to create annotated documents (AnDoc) by adding annotations to each chunk of data as needed, and then connecting them in the order of the identification information IDs.
[0096] For example, the management system (DBMS) creates an annotated document Doc1(1) if chunk data ChD(1) requires annotation. However, if chunk data ChD(2) does not require annotation, it does not create an annotated document Doc1(2). The management system (DBMS) then creates an annotated document AnDoc by connecting document Doc1(1) associated with chunk data ChD(1), document Doc2(2) associated with chunk data ChD(2), and document Doc1(3) associated with chunk data ChD(3) in the order of their identification information IDs. More generally, it creates an annotated document AnDoc by connecting document Doc1(X) associated with chunk data ChD(X) (or document Doc2(X) for chunk data ChD(X) for which document Doc1(X) is not created) in a predetermined order of their identification information IDs.
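The connection rule described above can be illustrated with a short sketch: for each chunk, Doc1(X) is used when it exists and Doc2(X) otherwise, and the results are joined in order of identification information. The function name and the dictionary layout are hypothetical, and joining with newlines is an assumption.

```python
from typing import Dict, Optional, Tuple

# Maps ID(X) to (Doc2(X), Doc1(X) or None when no annotation was needed).
ChunkDocs = Dict[int, Tuple[str, Optional[str]]]


def build_annotated_document(chunks: ChunkDocs) -> str:
    """Join per-chunk documents in ascending order of identification info."""
    parts = []
    for chunk_id in sorted(chunks):
        doc2, doc1 = chunks[chunk_id]
        # Prefer the annotated document; fall back to the plain transcript.
        parts.append(doc1 if doc1 is not None else doc2)
    return "\n".join(parts)
```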
[0097] Figure 6 shows an example of an annotated document, AnDoc, where an annotation Ano is added to a demonstrative pronoun Dem whose object is not specified. Note that annotation Ano is information that specifies the object of the demonstrative pronoun Dem, which was previously unspecified.
[0098] 《Example Configuration 2 of Subcomponent 120C》 Subcomponent 120C has the function of sharing tables Tbl1 and Tbl2 within component 120. The management system DBMS also has the function of creating tables Tbl1 and Tbl2 from the database DB.
[0099] Figure 7(A) shows an example of a created table Tbl1.
[0100] Table Tbl1 contains columns Col11 and Col12. Column Col11 contains the identification information ID(X). Column Col12 contains the document Doc2(X).
[0101] Figure 7(B) shows an example of the created table Tbl2.
[0102] Table Tbl2 contains columns Col21, Col22, and Col23. Column Col21 contains the identification information ID(X) included in list L1. Column Col22 contains the document Doc2(X). Column Col23 contains the still image Pic(X).
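One way to picture tables Tbl1 and Tbl2 is as two queries over a single chunk table. The sketch below uses Python's standard `sqlite3` module with an in-memory database. The table name `chunk`, its column names, and the function names are hypothetical, and the source does not state which database or management system is used.

```python
import sqlite3
from typing import List, Sequence, Tuple


def create_db() -> sqlite3.Connection:
    """Create an in-memory database holding one row per chunk."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chunk ("
        "id INTEGER PRIMARY KEY, doc2 TEXT NOT NULL, pic BLOB NOT NULL)"
    )
    return conn


def make_tbl1(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """Tbl1: identification info (Col11) and transcript Doc2(X) (Col12)."""
    return conn.execute("SELECT id, doc2 FROM chunk ORDER BY id").fetchall()


def make_tbl2(
    conn: sqlite3.Connection, list_l1: Sequence[int]
) -> List[Tuple[int, str, bytes]]:
    """Tbl2: ID (Col21), Doc2(X) (Col22), Pic(X) (Col23) for IDs in list L1."""
    if not list_l1:
        return []
    placeholders = ",".join("?" for _ in list_l1)
    query = (
        f"SELECT id, doc2, pic FROM chunk WHERE id IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(query, list(list_l1)).fetchall()
```

Only the placeholder string is interpolated into the query; the IDs themselves are passed as bound parameters.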
[0103] 《Example Configuration of Subcomponent 120D》 Subcomponent 120D has the function of creating prompt Pt1 and the function of selecting identification information ID(X) sequentially from list L1 to create prompt Pt2 (see Figure 3).
[0104] [Example configuration for Prompt Pt1] Figure 8(A) shows a diagram illustrating the configuration of prompt Pt1. Prompt Pt1 includes instruction g1 and table Tbl1.
[0105] Instruction g1 includes a procedure for generating list L1 from table Tbl1. List L1 includes identification information ID(X) that identifies document Doc2(X). Note that document Doc2(X) includes the demonstrative pronoun Dem, whose target is not specified. In other words, document Doc2(X), identified by the identification information ID(X) listed in list L1, includes the demonstrative pronoun Dem, whose target is not specified.
[0106] For example, the document in the following paragraph can be used as prompt Pt1.
[0107] Document: {Table Tbl1}
Please associate the demonstrative pronouns with their objects from the documents included in the table above. If the object is unknown, create a list of documents containing the demonstrative pronoun with an unspecified object, along with their identifying information.
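A prompt of this form could be assembled from table Tbl1 as in the following sketch. The instruction text is taken from the example above. The function name and the tab-separated rendering of the table with an "ID" and "Document" header are assumptions made for illustration.

```python
from typing import List, Tuple

# Instruction g1, using the wording of the example prompt in the text.
INSTRUCTION_G1 = (
    "Please associate the demonstrative pronouns with their objects from the "
    "documents included in the table above. If the object is unknown, create a "
    "list of documents containing the demonstrative pronoun with an unspecified "
    "object, along with their identifying information."
)


def build_prompt_pt1(tbl1: List[Tuple[int, str]]) -> str:
    """Render Tbl1 as tab-separated text and append instruction g1."""
    rows = [f"{chunk_id}\t{doc2}" for chunk_id, doc2 in tbl1]
    table_text = "\n".join(["ID\tDocument"] + rows)
    return f"Document:\n{table_text}\n\n{INSTRUCTION_G1}"
```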
[0108] Figure 8(B) shows the generated list L1. If an unspecified demonstrative pronoun Dem is found, the identifier ID of that chunk is included in list L1.
[0109] Specifically, when the chunk data ChD(1) identified by identification information ID(1) contains an unspecified demonstrative pronoun Dem, list L1 contains identification information ID(1). Also, when the chunk data ChD(3) identified by identification information ID(3) contains an unspecified demonstrative pronoun Dem, list L1 contains identification information ID(3). Furthermore, when the chunk data ChD(X) identified by identification information ID(X) contains an unspecified demonstrative pronoun Dem, list L1 contains identification information ID(X).
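One way to turn a free-text reply into list L1 is sketched below. It assumes, purely for illustration, that the reply mentions each flagged document as `ID(n)`; the specification does not fix the reply format, and `parse_list_l1` is a hypothetical helper.

```python
import re


def parse_list_l1(reply, known_ids):
    """Extract list L1 from a reply that names documents as 'ID(n)'.

    Unknown IDs are dropped and duplicates are removed, keeping first-seen order.
    """
    known = set(known_ids)
    list_l1 = []
    for match in re.findall(r"ID\((\d+)\)", reply):
        cid = int(match)
        if cid in known and cid not in list_l1:
            list_l1.append(cid)
    return list_l1
```

Filtering against the known identification information guards against a reply that names a chunk that does not exist in table Tbl1.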
[0110] [Example configuration for Prompt Pt2] Figure 9(A) shows a diagram illustrating the configuration of prompt Pt2. Prompt Pt2 includes instruction g2, document Doc2(X), and still image Pic(X).
[0111] Instruction g2 includes a procedure to identify the target of the unspecified demonstrative pronoun Dem contained in document Doc2(X) from the still image Pic(X), and generate document Doc1(X).
[0112] For example, the document in the following paragraph can be used for prompt Pt2.
[0113] Image: Still image Pic(X) Document: Document Doc2(X) Demonstrative pronouns: Demonstrative pronouns that do not specify the object. Identify the object of the demonstrative pronouns in the document from the image and provide your answer.
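The sketch below shows one possible way to package prompt Pt2 for a record of table Tbl2. The dictionary layout, the Base64 encoding of the still image, and the name `build_prompt_pt2` are assumptions for illustration; they do not correspond to any particular server interface.

```python
import base64

# Instruction g2, taken from the example prompt text above.
INSTRUCTION_G2 = (
    "Identify the object of the demonstrative pronouns in the document from "
    "the image and provide your answer."
)


def build_prompt_pt2(record):
    """Build prompt Pt2(X) from one Tbl2 record (ID(X), Doc2(X), Pic(X))."""
    cid, doc2, pic = record
    return {
        "id": cid,
        # Assumption: the still image is sent as Base64-encoded text.
        "image_base64": base64.b64encode(pic).decode("ascii"),
        "text": (
            f"Document: {doc2}\n"
            "Demonstrative pronouns: Demonstrative pronouns that do not "
            "specify the object.\n"
            f"{INSTRUCTION_G2}"
        ),
    }
```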
[0114] Figure 9(B) shows the generated document Doc1(X). Document Doc1(X) is a document that includes the annotation Ano(X) to the demonstrative pronoun Dem, which is included in document Doc2(X) and whose target is not specified.
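For illustration, the relationship between document Doc2(X), annotation Ano(X), and document Doc1(X) can be sketched as follows. The bracketed inline notation and the function `annotate` are assumptions; the specification does not define how an annotation is rendered. As a simplification, every occurrence of the same pronoun receives the same annotation.

```python
import re


def annotate(doc2, annotations):
    """Return Doc1(X): Doc2(X) with each listed pronoun followed by its object.

    `annotations` maps a lowercase demonstrative pronoun to the object
    identified from the still image.
    """
    if not annotations:
        return doc2

    def add_note(match):
        word = match.group(0)
        return f"{word} [{annotations[word.lower()]}]"

    # Match the listed pronouns as whole words, ignoring case.
    pattern = r"\b(" + "|".join(re.escape(w) for w in annotations) + r")\b"
    return re.sub(pattern, add_note, doc2, flags=re.IGNORECASE)


print(annotate("Move it over there.",
               {"it": "the blue crate", "there": "the loading dock"}))
# prints: Move it [the blue crate] over there [the loading dock].
```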
[0115] This makes it possible to identify the object pointed to by the unspecified demonstrative pronoun Dem contained in the audio information AdI(X) from the still image Pic(X) and generate the annotation Ano(X). Furthermore, it is possible to add the annotation Ano(X) to the unspecified demonstrative pronoun Dem contained in the audio information AdI(X). Additionally, it is possible to create an annotated document AnDoc with the annotation Ano(X) added to the audio contained in the video information MvI. Furthermore, the annotated document AnDoc can be provided, for example, to users of an information processing system. As a result, a novel display device with superior convenience, usefulness, and reliability can be provided.
[0116] <Component 110 Configuration Example 2> Component 110 has the function of receiving a summary document Sum and providing it, for example, to a user 99 of the information processing system (see Figure 1). Note that the summary document Sum is a document that summarizes an annotated document AnDoc.
[0117] <Component 130 Configuration Example 2> Component 130 has the function of performing processing using the multimodal AI server 200 and the function of receiving prompt Pt3 and sending the summary document Sum to component 120.
[0118] 《Example Configuration 2 of the Multimodal AI Server 200》 The multimodal AI server 200 has the function of generating a summary document (Sum) according to prompt Pt3.
[0119] <Component 120 Configuration Example 2> Component 120 has the function of sending prompt Pt3 to component 130 and the function of receiving the summary document Sum and sending it to component 110.
[0120] 《Example Configuration 3 for Subcomponent 120D》 Subcomponent 120D has the function of creating prompt Pt3.
[0121] [Example configuration for Prompt Pt3] Figure 10(A) shows a diagram illustrating the configuration of prompt Pt3. Prompt Pt3 includes instruction g3 and annotated document AnDoc.
[0122] Instruction g3 includes steps to generate a summary document, Sum, from an annotated document, AnDoc.
[0123] For example, the document in the following paragraph can be used for prompt Pt3.
[0124] "Document:{Annotated document AnDoc}" Please summarize the document.
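A prompt of this shape can be produced by a small helper such as the one below. The function name `build_document_prompt` is hypothetical; the sketch only reproduces the layout of the example text above, with the annotated document AnDoc embedded in braces.

```python
def build_document_prompt(annotated_doc, instruction):
    """Embed the annotated document AnDoc in braces, followed by the instruction."""
    return f'"Document:{{{annotated_doc}}}" {instruction}'


pt3 = build_document_prompt(
    "Move it [the blue crate] to the dock.",
    "Please summarize the document.",
)
print(pt3)
# prints: "Document:{Move it [the blue crate] to the dock.}" Please summarize the document.
```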
[0125] This allows for the identification of the object indicated by the unspecified demonstrative pronoun Dem contained in the audio information AdI(X) from the still image Pic(X), and the generation of annotation Ano(X). Furthermore, annotation Ano(X) can be added to the unspecified demonstrative pronoun Dem contained in the audio information AdI(X). Additionally, an annotated document AnDoc can be created by adding annotation Ano(X) to the audio contained in the video information MvI. A summary document Sum can also be generated from the annotated document AnDoc. Furthermore, the summary document Sum can be provided, for example, to users of an information processing system. As a result, a novel display device with superior convenience, usefulness, and reliability can be provided.
[0126] <Component 110 Configuration Example 3> Component 110 has the function of receiving a task list TaL and providing it to, for example, a user 99 of the information processing system (see Figure 1). The task list TaL is a list of tasks. Specifically, it is created by extracting passages with deadlines from the annotated document AnDoc and summarizing the content, priority, deadline, and the like of each task.
[0127] <Component 130 Configuration Example 3> Component 130 has the function of performing processing using the multimodal AI server 200 and the function of receiving prompt Pt4 and sending task list TaL to component 120.
[0128] 《Example Configuration 3 for Multimodal AI Server 200》 The multimodal AI server 200 has the function of generating a task list TaL according to prompt Pt4.
[0129] <Component 120 Configuration Example 3> Component 120 has the function of sending prompt Pt4 to component 130 and the function of receiving task list TaL and sending it to component 110.
[0130] 《Example Configuration 4 for Subcomponent 120D》 Subcomponent 120D has the function of creating prompt Pt4.
[0131] [Example configuration for Prompt Pt4] Figure 10(B) shows the configuration diagram of prompt Pt4. Prompt Pt4 includes instruction g4 and annotated document AnDoc.
[0132] Instruction g4 includes a procedure for generating a task list (TaL) from an annotated document (AnDoc).
[0133] For example, the document in the following paragraph can be used for prompt Pt4.
[0134] "Document:{Annotated document AnDoc}" Please create a task list from the document.
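The task list TaL is described as summarizing task content, priority, and deadline. A minimal sketch of such a record is given below; the class `Task`, the numeric priority scale, and the sort order are assumptions for illustration and are not specified in this document.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Task:
    content: str
    priority: int   # assumption: 1 is the most urgent
    deadline: date


def build_task_list(tasks):
    """Order tasks by priority, then by earliest deadline."""
    return sorted(tasks, key=lambda t: (t.priority, t.deadline))


tal = build_task_list([
    Task("Order replacement valve", 2, date(2026, 1, 20)),
    Task("Send inspection report", 1, date(2026, 1, 31)),
    Task("Book meeting room", 2, date(2026, 1, 15)),
])
print([t.content for t in tal])
# prints: ['Send inspection report', 'Book meeting room', 'Order replacement valve']
```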
[0135] This allows for the identification of the target indicated by the unspecified demonstrative pronoun Dem contained in the audio information AdI(X) from the still image Pic(X), and the generation of annotation Ano(X). Furthermore, annotation Ano(X) can be added to the unspecified demonstrative pronoun Dem contained in the audio information AdI(X). Additionally, an annotated document AnDoc can be created by adding annotation Ano(X) to the audio contained in the video information MvI. A task list TaL can also be generated from the annotated document AnDoc. Furthermore, the task list TaL can be provided, for example, to users of an information processing system. As a result, a novel display device with superior convenience, usefulness, and reliability can be provided.
[0136] <Example of Information Processing System Configuration 2> Figure 1 is a diagram illustrating an example of the configuration of an information processing system according to one embodiment of the present invention.
[0137] The information processing system described in this embodiment includes component 110, component 120, and component 130.
[0138] For example, an information processing system according to one aspect of the present invention can be configured with an information processing device that performs the functions of component 110, an information processing device that performs the functions of component 120, and an information processing device that performs the functions of component 130. The number of information processing devices constituting the information processing system according to one aspect of the present invention is one or more. Furthermore, for example, an information processing system according to one aspect of the present invention can be configured by connecting multiple information processing devices using a network 51.
[0139] By configuring an information processing system according to one aspect of the present invention using multiple information processing devices, the load related to information processing can be distributed.
[0140] <Example of Information Processing Device Configuration 1> Configuration Example 1 of the information processing device described in this embodiment can be used as component 110. Configuration Example 1 of the information processing device can also be called a client computer or the like. For example, a desktop computer can be used as component 110.
[0141] Configuration Example 1 of the information processing device can receive data input by a user of an information processing system according to one aspect of the present invention. Furthermore, Configuration Example 1 of the information processing device can provide the user with data output by an information processing system according to one aspect of the present invention.
[0142] In component 110, for example, dedicated application software, a web browser, etc., are operated. Users of an information processing system according to one embodiment of the present invention can access the information processing system through any of these. As a result, they can enjoy services using the information processing system according to one embodiment of the present invention.
[0143] <Example of Information Processing Device Configuration 2> The configuration example 2 of the information processing device described in this embodiment can be used for component 120. For example, a workstation, server computer, supercomputer, etc., can be used for component 120.
[0144] Furthermore, Configuration Example 2 of the information processing device preferably functions as a parallel computer. Used as a parallel computer, it can, for example, perform the large-scale calculations necessary for the learning and inference of artificial intelligence (AI).
[0145] Furthermore, Configuration Example 2 of the information processing device can perform processing using an AI-based large language model.
[0146] For example, it is preferable to be able to perform processing using natural language models such as GPT-3 (registered trademark), GPT-3.5, GPT-4 (registered trademark), LaMDA, Llama2, Llama3, Llama3.2, and Llama3.3.
[0147] <Example of Information Processing Device Configuration 3> The configuration example 3 of the information processing device described in this embodiment can be used for component 130. Note that component 130 is larger in scale and has higher computing power than component 120. For example, a workstation, server computer, supercomputer, etc., can be used for component 130.
[0148] Furthermore, Configuration Example 3 of the information processing device preferably functions as a parallel computer. Used as a parallel computer, it can, for example, perform the large-scale calculations necessary for AI training and inference.
[0149] Furthermore, Configuration Example 3 of the information processing device can perform processing using an AI-based foundation model.
[0150] For example, processing can be performed using foundation models such as GPT-3 (registered trademark), GPT-3.5, GPT-4 (registered trademark), LaMDA, Llama2, Llama3, Llama3.2, and Llama3.3. Processing using GPT-4 (registered trademark) is particularly preferable.
[0151] Furthermore, a person who provides services using an information processing system according to one embodiment of the present invention does not necessarily need to own Configuration Example 3 of the information processing device. For example, the service provider can use part of a service that another business operator provides using Configuration Example 3 of the information processing device.
[0152] <Example of Network 51 Configuration> A network 51 that can be used in an information processing system according to one embodiment of the present invention can connect multiple information processing devices. This allows the connected multiple information processing devices to transmit and receive data from each other. Furthermore, it allows for the distribution of the information processing load.
[0153] When performing wireless communication, communication protocols or technologies such as 4G, 5G, 6G, or specifications standardized by IEEE, such as Wi-Fi® and Bluetooth®, may be used.
[0154] For example, a local network can be used as network 51. An intranet or extranet can also be used as network 51. Furthermore, PAN (Personal Area Network), LAN (Local Area Network), CAN (Campus Area Network), MAN (Metropolitan Area Network), WAN (Wide Area Network), GAN (Global Area Network), etc., can be used as network 51.
[0155] Furthermore, for example, a global network can be used for network 51. Specifically, the internet, which is the foundation of the World Wide Web (WWW), can be used.
[0156] Furthermore, a person who provides services using an information processing system according to one aspect of the present invention can, for example, provide services using an information processing method according to one aspect of the present invention via a network 51.
[0157] Furthermore, if an information processing system according to one embodiment of the present invention is built within a local network, the possibility of confidential information leakage can be reduced compared to using the Internet.
[0158] <Example of Information Processing Device Configuration 4> Figure 11 is a block diagram illustrating an example configuration of an information processing device according to one embodiment of the present invention.
[0159] An information processing device 20 that can be used in an information processing system according to one embodiment of the present invention includes, for example, an input unit 21, a storage unit 22, a processing unit 23, an output unit 24, and a transmission line 25.
[0160] In the drawings attached to this specification, the components are classified by function and shown as independent blocks in the block diagram. However, in reality, it is difficult to completely separate the components by function, and one component may be involved in multiple functions. For example, a part of the processing unit 23 may function as the input unit 21. Also, one function may be involved in multiple components. For example, the processing performed by the processing unit 23 may be executed by different information processing devices depending on the processing.
[0161] 《Input unit 21》 The input unit 21 can receive data from outside the information processing device. For example, the input unit 21 can receive data via the network 51. Specifically, a device such as a personal computer equipped with a communication port or communication function can be used.
[0162] The input unit 21 supplies the received data to either or both of the storage unit 22 and the processing unit 23 via the transmission line 25.
[0163] 《Storage unit 22》 The storage unit 22 has the function of storing the program executed by the processing unit 23. The storage unit 22 may also have the function of storing data generated by the processing unit 23 (for example, calculation results, analysis results, inference results), data received by the input unit 21, etc.
[0164] The storage unit 22 may have a database. The information processing device may also have a database separate from the storage unit 22. The information processing device may have the function to retrieve data from a database located outside the storage unit 22, outside the information processing device itself, or outside the information processing system. Furthermore, the information processing device may have the function to retrieve data from both its own database and an external database.
[0165] A storage device, a file server, or both can be used in the storage unit 22. Alternatively, a database recording the paths of files stored on the file server can be used in the storage unit 22.
[0166] The storage unit 22 includes at least one of volatile memory and non-volatile memory. Examples of volatile memory include DRAM (Dynamic Random Access Memory) and SRAM (Static Random Access Memory). Examples of non-volatile memory include ReRAM (Resistive Random Access Memory), PRAM (Phase Change Random Access Memory), FeRAM (Ferroelectric Random Access Memory), MRAM (Magnetoresistive Random Access Memory), and flash memory. The storage unit 22 may also include at least one of NOSRAM (registered trademark) and DOSRAM (registered trademark). The storage unit 22 may also include a recording media drive. Examples of recording media drives include hard disk drives (HDD) and solid state drives (SSD).
[0167] NOSRAM is an abbreviation for "Nonvolatile Oxide Semiconductor Random Access Memory (RAM)". NOSRAM is a type of memory in which each memory cell is a 2-transistor (2T) or 3-transistor (3T) gain cell and the transistors use metal oxide in the channel formation region (such transistors are also called OS transistors). In an OS transistor, the current that flows between the source and drain in the off state, i.e., the leakage current, is extremely small. By utilizing this extremely low leakage current, NOSRAM can be used as a non-volatile memory by holding charge corresponding to the data within the memory cell. In particular, NOSRAM can read the stored data without destroying it (non-destructive read), making it suitable for computational processing that repeats data read operations many times. Because the data capacity of NOSRAM can be increased by stacking, it can be used as a large-scale cache memory, main memory, or storage memory to improve the performance of semiconductor devices.
[0168] DRAM refers to a type of RAM (Random Access Memory) with 1T (transistor) 1C (capacitor) memory cells. DOSRAM is an abbreviation for "Dynamic Oxide Semiconductor RAM." DOSRAM is a type of DRAM formed using OS transistors, and it is a memory that temporarily stores information sent from an external source. DOSRAM takes advantage of the low off-current of OS transistors, which suppresses data degradation due to off-current and allows data to be retained for a long period of time. In addition, because the frequency of data refresh operations can be reduced, using DOSRAM can reduce power consumption.
[0169] In this specification, "metal oxide" refers to an oxide of a metal in a broad sense. Metal oxides are classified into oxide insulators, oxide conductors (including transparent oxide conductors), oxide semiconductors (also simply called OS), etc. For example, when a metal oxide is used in the semiconductor layer of a transistor, that metal oxide may be referred to as an oxide semiconductor.
[0170] The metal oxide in the channel-forming region preferably contains indium (In). When the metal oxide in the channel-forming region contains indium, the carrier mobility (electron mobility) of the OS transistor is increased. For example, indium oxide (InOx) or indium gallium zinc oxide (In-Ga-Zn oxide, also written as "IGZO") can be used in the channel-forming region. Furthermore, the metal oxide in the channel-forming region is preferably an oxide semiconductor containing element M. Element M is preferably at least one of aluminum (Al), gallium (Ga), and tin (Sn). Other elements applicable to element M include boron (B), silicon (Si), titanium (Ti), iron (Fe), nickel (Ni), germanium (Ge), yttrium (Y), zirconium (Zr), molybdenum (Mo), lanthanum (La), cerium (Ce), neodymium (Nd), hafnium (Hf), tantalum (Ta), and tungsten (W). Note that element M may be a combination of two or more of the above elements. Element M is, for example, an element with a high bond energy with oxygen, such as an element whose bond energy with oxygen is higher than that of indium. Furthermore, the metal oxide in the channel-forming region preferably contains zinc (Zn). Metal oxides containing zinc may be more prone to crystallization.
[0171] The metal oxides present in the channel-forming regions are not limited to indium-containing metal oxides. For example, the metal oxides present in the channel-forming regions may be zinc-tin oxides, gallium-tin oxides, or other metal oxides that do not contain indium but contain zinc, gallium, or tin.
[0172] 《Processing unit 23》 The processing unit 23 has the function of performing calculations, analyses, inferences, and other processing using data supplied from either or both of the input unit 21 and the storage unit 22. The processing unit 23 can supply the generated data (e.g., calculation results, analysis results, inference results) to either or both of the storage unit 22 and the output unit 24.
[0173] The processing unit 23 has the function of acquiring data from the storage unit 22. The processing unit 23 may also have the function of recording or registering data in the storage unit 22.
[0174] The processing unit 23 may, for example, have an arithmetic circuit. The processing unit 23 may, for example, have a central processing unit (CPU). The processing unit 23 may also have a graphics processing unit (GPU). The processing unit 23 may also have a neural processing unit (NPU / neural network processing unit).
[0175] The processing unit 23 may have a microprocessor such as a DSP (Digital Signal Processor). The microprocessor can be implemented using a PLD (Programmable Logic Device) such as an FPGA (Field Programmable Gate Array) or FPAA (Field Programmable Analog Array). The processing unit 23 may also have a quantum processor. The processing unit 23 can perform various data processing and program control by interpreting and executing instructions from various programs via the processor. Programs that can be executed by the processor are stored in at least one of the processor's memory area and the storage unit 22.
[0176] The processing unit 23 may have main memory. The main memory may include at least one of volatile memory such as RAM and non-volatile memory such as ROM (Read Only Memory). Furthermore, the main memory may include at least one of the above-mentioned NOSRAM and DOSRAM.
[0177] For RAM, for example, DRAM or SRAM is used, and a virtual memory space is allocated and used as the workspace for the processing unit 23. The operating system, application programs, program modules, program data, and lookup tables stored in the storage unit 22 are loaded into RAM for execution. These data, programs, and program modules loaded into RAM are each directly accessed and manipulated by the processing unit 23.
[0178] ROM can store BIOS (Basic Input / Output System) and firmware, etc., which do not require rewriting. Examples of ROM include mask ROM, OTPROM (One Time Programmable Read Only Memory), and EPROM (Erasable Programmable Read Only Memory). Examples of EPROM include UV-EPROM (Ultra-Violet Erasable Programmable Read Only Memory), which allows data to be erased by ultraviolet irradiation, EEPROM (Electrically Erasable Programmable Read Only Memory), and flash memory.
[0179] The processing unit 23 may have either or both an OS transistor and a transistor having silicon in its channel formation region (Si transistor).
[0180] The processing unit 23 preferably has an OS transistor. Because the off-current of an OS transistor is extremely small, using an OS transistor as a switch that holds the charge (data) that has flowed into a capacitor functioning as a memory element allows the data to be retained for a long period of time. By using this characteristic in at least one of the registers and cache memory of the processing unit 23, the processing unit 23 can be operated only when necessary and otherwise turned off, with the information from the immediately preceding processing saved to the memory element. In other words, normally-off computing becomes possible, and the power consumption of the information processing system can be reduced.
[0181] It is preferable for information processing devices to use AI for at least some of their processing.
[0182] Information processing devices preferably utilize artificial neural networks (ANNs, also simply referred to as neural networks). Neural networks are implemented using circuits (hardware) or programs (software).
[0183] In this specification, the term "neural network" refers to any model that mimics the neural network of living organisms, determines the strength of connections between neurons through learning, and possesses problem-solving capabilities. A neural network has an input layer, an intermediate layer (hidden layer), and an output layer.
[0184] In this specification and the like, when discussing neural networks, the process of determining the connection strength (also called weight coefficient) between neurons from existing information is sometimes referred to as "learning."
[0185] In this specification and the like, the process of constructing a neural network using connection strengths obtained through learning and deriving new conclusions from it may be referred to as "inference."
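As a small illustration of inference with fixed connection strengths, the sketch below evaluates a network with an input layer, one hidden layer, and an output layer that computes exclusive OR. The weights are chosen by hand for this example rather than obtained through learning, and the names `step`, `neuron`, and `infer_xor` are hypothetical.

```python
def step(x):
    """Threshold activation: the neuron fires when its weighted input is positive."""
    return 1 if x > 0 else 0


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)


def infer_xor(x1, x2):
    # Hidden layer: one neuron behaves like OR, the other like AND.
    h_or = neuron([x1, x2], [1.0, 1.0], -0.5)
    h_and = neuron([x1, x2], [1.0, 1.0], -1.5)
    # Output layer: fires when OR is active and AND is not.
    return neuron([h_or, h_and], [1.0, -1.0], -0.5)


print([infer_xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# prints: [0, 1, 1, 0]
```

In a learned network, the weight and bias values would be determined from existing data instead of being written by hand.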
[0186] 《Output unit 24》 The output unit 24 can output at least one of the calculation results, analysis results, and inference results from the processing unit 23 to the outside of the information processing device. For example, the output unit 24 can transmit data via the network 51. Specifically, a device such as a personal computer equipped with a communication port or communication function can be used. Alternatively, a device equipped with a communication function may be used for both the input unit 21 and the output unit 24.
[0187] 《Transmission line 25》 The transmission line 25 has the function of transmitting data. Data can be transmitted and received between the input unit 21, the storage unit 22, the processing unit 23, and the output unit 24 via the transmission line 25. Specifically, a LAN or the Internet can be used.
[0188] This embodiment can be appropriately combined with other embodiments shown in this specification.
[0189] (Embodiment 2) This embodiment describes an information processing method according to one aspect of the present invention. The flowcharts in Figures 12 to 14 will be used for this explanation.
[0190] <Example of information processing method 1> Figure 12 is a flowchart illustrating an information processing method according to one embodiment of the present invention.
[0191] One aspect of the present invention is an information processing method having a phase Ph1.
[0192] <Example of Phase Ph1> Phase Ph1 comprises steps S1 to S18.
[0193] Step S1 In step S1 of phase Ph1, component 110 receives video information MvI and transmits it to component 120.
[0194] Component 120 comprises subcomponents 120A, 120B, 120C, and 120D.
[0195] Step S2 In step S2 of phase Ph1, component 120 receives video information MvI and shares it within component 120.
[0196] Step S3 In step S3 of Phase Ph1, subcomponent 120A divides the video information MvI to create a group of chunk data ChD. This group of chunk data ChD includes chunk data ChD(X).
[0197] The chunk data ChD(X) includes identification information ID(X), audio information AdI(X), and a still image Pic(X). The still image Pic(X) is a representative image of the chunk data ChD(X).
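The division in step S3 can be sketched as follows. This example only computes time ranges; it assumes fixed-length chunks and takes the midpoint of each chunk as the time of the representative still image, neither of which is specified in this document. The names `ChunkData`, `split_into_chunks`, and `still_time_s` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChunkData:
    chunk_id: int      # ID(X)
    start_s: float     # start of the audio information AdI(X), in seconds
    end_s: float       # end of the audio information AdI(X), in seconds
    doc2: str = ""     # Doc2(X), filled in after transcription
    doc1: str = ""     # Doc1(X), filled in after annotation

    @property
    def still_time_s(self):
        # Assumption: the representative still image Pic(X) is the middle frame.
        return (self.start_s + self.end_s) / 2


def split_into_chunks(duration_s, chunk_s):
    """Divide a video of `duration_s` seconds into chunks of at most `chunk_s` seconds."""
    if duration_s <= 0 or chunk_s <= 0:
        raise ValueError("duration_s and chunk_s must be positive")
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append(ChunkData(len(chunks) + 1, start, end))
        start = end
    return chunks
```

Extracting the audio and the frame at `still_time_s` from the actual video file would require a media library and is outside the scope of this sketch.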
[0198] Step S4 In step S4 of Phase Ph1, subcomponent 120B transcribes the audio information AdI(X) into document Doc2(X).
[0199] Step S5 In step S5 of Phase Ph1, subcomponent 120C integrates document Doc2(X) into chunk data ChD(X) using the management system DBMS.
[0200] Furthermore, subcomponent 120C includes a database (DB) and a management system (DBMS).
[0201] Step S6 In step S6 of phase Ph1, the management system DBMS creates table Tbl1 from the database DB and shares table Tbl1 within component 120.
[0202] Table Tbl1 contains columns Col11 and Col12. Column Col11 contains the identification information ID(X), and column Col12 contains the document Doc2(X).
[0203] Step S7 In step S7 of phase Ph1, subcomponent 120D creates prompt Pt1 and sends it to component 130.
[0204] Prompt Pt1 includes instruction g1 and table Tbl1. Instruction g1 includes a procedure for generating list L1 from table Tbl1. List L1 includes identification information ID(X) that identifies document Doc2(X). Note that document Doc2(X) includes the demonstrative pronoun Dem, whose target is not specified.
[0205] Step S8 In step S8 of phase Ph1, component 130 receives prompt Pt1 and generates list L1 using the multimodal AI server 200.
[0206] Step S9 In step S9 of phase Ph1, component 130 sends list L1 to component 120.
[0207] Step S10 In step S10 of phase Ph1, component 120 receives list L1 and shares it within component 120.
[0208] Step S11 In step S11 of Phase Ph1, the management system DBMS creates table Tbl2 from database DB and shares table Tbl2 within component 120.
[0209] Table Tbl2 contains columns Col21, Col22, and Col23. Column Col21 contains the identification information ID(X) included in list L1. Column Col22 contains the document Doc2(X). Column Col23 contains the still image Pic(X).
[0210] Step S12 In step S12 of phase Ph1, subcomponent 120D sequentially selects records from table Tbl2, creates prompt Pt2(X), and sends it to component 130.
[0211] The prompt Pt2(X) includes instruction g2, document Doc2(X), and still image Pic(X). Instruction g2 includes a procedure to generate document Doc1(X) by identifying the target of the unspecified demonstrative pronoun Dem contained in document Doc2(X) from the still image Pic(X). Document Doc1(X) includes the unspecified demonstrative pronoun Dem and annotation Ano(X).
[0212] Furthermore, annotation Ano(X) contains information identified as the object of the demonstrative pronoun Dem, whose object is not specified.
[0213] Step S13 In step S13 of phase Ph1, component 130 receives prompt Pt2(X) and generates document Doc1(X) using the multimodal AI server 200.
[0214] Step S14 In step S14 of Phase Ph1, component 130 sends document Doc1(X) to component 120.
[0215] Step S15 In step S15 of Phase Ph1, component 120 receives document Doc1(X) and shares it within component 120.
[0216] Step S16 In step S16 of Phase Ph1, the management system DBMS integrates document Doc1(X) into chunk data ChD(X).
[0217] Step S17 In step S17 of Phase Ph1, the management system DBMS creates an annotated document AnDoc from the database DB and sends it to component 110.
[0218] Note that the annotated document AnDoc includes document Doc1(X) created from video information MvI. In other words, the annotated document AnDoc is a document that has been transcribed using audio information AdI and video information Vid.
[0219] Step S18 In step S18 of Phase Ph1, component 110 receives an annotated document AnDoc and provides it, for example, to a user 99 of the information processing system.
[0220] This makes it possible to identify the object indicated by the unspecified demonstrative pronoun Dem contained in the audio information AdI(X) from the still image Pic(X) and generate an annotation Ano(X). Furthermore, it is possible to add the annotation Ano(X) to the unspecified demonstrative pronoun Dem contained in the audio information AdI(X). Additionally, it is possible to create an annotated document AnDoc with the annotation Ano(X) added to the audio contained in the video information MvI. Furthermore, the annotated document AnDoc can be provided, for example, to users of an information processing system. As a result, a novel display device with superior convenience, usefulness, and reliability can be provided. By using an information processing system according to one embodiment of the present invention, it is possible to create documents (such as conversation transcripts) with fewer unclear descriptions based on video information. Furthermore, it is possible to create documents where the objects of demonstrative pronouns are clearly identified.
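The flow of steps S6 to S17 can be summarized in a short sketch. The two callables stand in for the requests to the multimodal AI server 200 and are supplied by the caller, so no real server interface is assumed. The function `run_phase_ph1`, the dictionary layout of the chunks, and the choice to reuse Doc2(X) unchanged for chunks that are not in list L1 are assumptions for illustration.

```python
def run_phase_ph1(chunks, find_unspecified, resolve):
    """Build the annotated document AnDoc from transcribed chunks.

    chunks: dict mapping ID(X) to {"doc2": str, "pic": bytes}.
    find_unspecified: stand-in for prompt Pt1; takes Tbl1, returns list L1.
    resolve: stand-in for prompt Pt2(X); takes Doc2(X) and Pic(X), returns Doc1(X).
    """
    # Steps S6 to S10: Tbl1 -> prompt Pt1 -> list L1.
    tbl1 = [(cid, c["doc2"]) for cid, c in sorted(chunks.items())]
    list_l1 = find_unspecified(tbl1)

    # Steps S11 to S16: for each ID(X) in L1, prompt Pt2(X) -> Doc1(X).
    doc1 = {}
    for cid in list_l1:
        chunk = chunks[cid]
        doc1[cid] = resolve(chunk["doc2"], chunk["pic"])

    # Step S17: assemble AnDoc in chunk order.
    # Assumption: chunks without an unspecified pronoun keep Doc2(X) as is.
    return "\n".join(doc1.get(cid, text) for cid, text in tbl1)


chunks = {
    1: {"doc2": "Move it over there.", "pic": b"img1"},
    2: {"doc2": "The meeting starts at ten.", "pic": b"img2"},
}
andoc = run_phase_ph1(
    chunks,
    find_unspecified=lambda tbl1: [1],
    resolve=lambda doc2, pic: doc2.replace("it", "it [the blue crate]", 1),
)
print(andoc)
# prints:
# Move it [the blue crate] over there.
# The meeting starts at ten.
```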
[0221] <Example 2 of information processing method> Figure 13 is a flowchart illustrating an information processing method according to one embodiment of the present invention.
[0222] One aspect of the present invention is an information processing method having a phase Ph2.
[0223] <Example of phase Ph2> Phase Ph2 follows phase Ph1 and comprises steps S1 to S6.
[0224] Step S1 In step S1 of phase Ph2, subcomponent 120D creates prompt Pt3 and sends it to component 130.
[0225] Prompt Pt3 includes instruction g3 and the annotated document AnDoc. Instruction g3 includes a procedure for generating the summary document Sum from the annotated document AnDoc.
[0226] Step S2 In step S2 of phase Ph2, component 130 receives prompt Pt3 and generates a summary document Sum using the multimodal AI server 200.
[0227] Step S3 In step S3 of phase Ph2, component 130 sends the summary document Sum to component 120.
[0228] Step S4 In step S4 of phase Ph2, component 120 receives the summary document Sum and shares it within component 120.
[0229] Step S5 In step S5 of phase Ph2, component 120 sends the summary document Sum to component 110.
[0230] Step S6 In step S6 of phase Ph2, component 110 receives the summary document Sum and provides it, for example, to a user 99 of the information processing system.
[0231] This makes it possible to identify, from the still image Pic(X), the object indicated by the unspecified demonstrative pronoun Dem contained in the audio information AdI(X), and to generate an annotation Ano(X). The annotation Ano(X) can be added to the unspecified demonstrative pronoun Dem, and an annotated document AnDoc can be created in which the annotation Ano(X) is added to the transcription of the audio contained in the video information MvI. A summary document Sum can also be generated from the annotated document AnDoc and provided, for example, to a user of the information processing system. As a result, a novel information processing system with superior convenience, usefulness, and reliability can be provided.
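Steps S1 to S3 of phase Ph2 can be sketched in Python as follows. The sketch assumes that prompt Pt3 is simply instruction g3 followed by the text of the annotated document AnDoc. The `Prompt` class, the wording of the instruction, the `---` separator, and the `fake_generate` stub that replaces the multimodal AI server 200 are hypothetical illustrations, not part of this specification, and no real model API is called.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Prompt:
    """Hypothetical container for a prompt such as Pt3."""
    instruction: str   # e.g. instruction g3
    document: str      # annotated document AnDoc

    def render(self) -> str:
        # The separator is an arbitrary choice made for this sketch.
        return f"{self.instruction}\n\n---\n{self.document}"


def run_phase2(an_doc: str, generate: Callable[[str], str]) -> str:
    """Create prompt Pt3 (step S1) and obtain summary document Sum (steps S2 to S3)."""
    g3 = "Summarize the following annotated transcript in a few sentences."
    pt3 = Prompt(instruction=g3, document=an_doc)
    return generate(pt3.render())


def fake_generate(prompt_text: str) -> str:
    """Stub standing in for the multimodal AI server 200; returns a canned reply."""
    body = prompt_text.split("---\n", 1)[1]
    return f"Summary of {len(body.splitlines())} line(s)."


summary = run_phase2(
    "A: Look at this [annotation: the chart].\nB: Understood.",
    fake_generate,
)
print(summary)
```

Passing the generator as a callable keeps the sketch independent of any particular external service; a real deployment would substitute a client for whichever service hosts the model.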
[0232] <Example 3 of information processing method> Figure 14 is a flowchart illustrating an information processing method according to one embodiment of the present invention.
[0233] One aspect of the present invention is an information processing method having a phase Ph3.
[0234] <Example of phase Ph3> Phase Ph3 follows phase Ph1 and comprises steps S1 to S6. In one embodiment of the present invention, one or both of phase Ph2 and phase Ph3 can be performed after phase Ph1. The order in which phase Ph2 and phase Ph3 are performed is not limited. Furthermore, phase Ph2 and phase Ph3 may be processed in parallel.
[0235] Step S1 In step S1 of phase Ph3, subcomponent 120D creates prompt Pt4 and sends it to component 130.
[0236] Prompt Pt4 includes instruction g4 and the annotated document AnDoc. Instruction g4 includes a procedure for generating a task list TaL from the annotated document AnDoc.
[0237] Step S2 In step S2 of phase Ph3, component 130 receives prompt Pt4 and generates task list TaL using the multimodal AI server 200.
[0238] Step S3 In step S3 of phase Ph3, component 130 sends the task list TaL to component 120.
[0239] Step S4 In step S4 of phase Ph3, component 120 receives the task list TaL and shares it within component 120.
[0240] Step S5 In step S5 of phase Ph3, component 120 sends the task list TaL to component 110.
[0241] Step S6 In step S6 of phase Ph3, component 110 receives the task list TaL and provides it, for example, to a user 99 of the information processing system.
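Because phase Ph2 and phase Ph3 both start from the same annotated document AnDoc and may be processed in parallel, the two generation requests can be issued concurrently. The following Python sketch illustrates one way to do so with a thread pool. The instruction texts standing in for g3 and g4, the helper names `make_prompt` and `run_phases`, and the `fake_generate` stub that replaces the multimodal AI server 200 are hypothetical and are not taken from this specification.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def make_prompt(instruction: str, an_doc: str) -> str:
    """Combine an instruction with the annotated document AnDoc (sketch)."""
    return f"{instruction}\n\n{an_doc}"


def run_phases(an_doc: str, generate: Callable[[str], str]) -> Dict[str, str]:
    """Submit the phase Ph2 and phase Ph3 requests concurrently and collect both results."""
    instructions = {
        "Sum": "Summarize the annotated document.",                 # hypothetical g3
        "TaL": "List the action items in the annotated document.",  # hypothetical g4
    }
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {
            name: pool.submit(generate, make_prompt(text, an_doc))
            for name, text in instructions.items()
        }
        # result() blocks until the corresponding request has finished.
        return {name: future.result() for name, future in futures.items()}


def fake_generate(prompt_text: str) -> str:
    """Stub standing in for the multimodal AI server 200."""
    first_line = prompt_text.splitlines()[0]
    return f"[stub reply to: {first_line}]"


results = run_phases(
    "A: Fix this [annotation: the broken connector] by Friday.",
    fake_generate,
)
print(results["Sum"])
print(results["TaL"])
```

Threads suit this case because each request mostly waits on an external service; running the two phases sequentially in either order would give the same documents.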
[0242] This makes it possible to identify, from the still image Pic(X), the object indicated by the unspecified demonstrative pronoun Dem contained in the audio information AdI(X), and to generate an annotation Ano(X). The annotation Ano(X) can be added to the unspecified demonstrative pronoun Dem, and an annotated document AnDoc can be created in which the annotation Ano(X) is added to the transcription of the audio contained in the video information MvI. A task list TaL can also be generated from the annotated document AnDoc and provided, for example, to a user of the information processing system. As a result, a novel information processing system with superior convenience, usefulness, and reliability can be provided.
[Explanation of Symbols]
[0243] AdI: Audio information, AnDoc: Annotated document, Ano: Annotation, ChD: Chunk data, DB: Database, DBMS: Management system, Dem: Unspecified demonstrative pronoun, Dev: Pointing device, ID: Identification information, MvI: Video information, Pic: Still image, Sum: Summary document, TaL: Task list, Vid: Video information, 20: Information processing device, 21: Input unit, 22: Storage unit, 23: Processing unit, 24: Output unit, 25: Transmission path, 51: Network, 98: Speaker, 99: User, 110: Component, 120: Component, 120A: Subcomponent, 120B: Subcomponent, 120C: Subcomponent, 120D: Subcomponent, 130: Component, 200: Multimodal AI server
Claims
1. An information processing system comprising a first component, a second component, and a third component, wherein: the first component has a function of receiving video information and transmitting it to the third component, and a function of receiving and providing an annotated document; the annotated document includes a first document created from the video information; the first document includes an unspecified demonstrative pronoun and an annotation; the annotation includes information identified as the object indicated by the unspecified demonstrative pronoun; the second component has a function of receiving a first prompt and transmitting a list to the third component, a function of receiving a second prompt and transmitting the first document to the third component, and a function of performing processing using a multimodal AI server; the multimodal AI server has a function of generating the list in accordance with the first prompt and a function of generating the first document in accordance with the second prompt; the third component has a function of receiving the video information, the list, and the first document and sharing them within the third component, a function of transmitting the first prompt and the second prompt to the second component, and a function of transmitting the annotated document to the first component; the third component comprises a first subcomponent, a second subcomponent, a third subcomponent, and a fourth subcomponent; the first subcomponent has a function of dividing the video information to create a group of chunk data; the group of chunk data includes chunk data; the chunk data includes identification information, audio information, and a still image; the still image is a representative image of the chunk data; the second subcomponent has a function of transcribing the audio information into a second document; the third subcomponent comprises a database and a management system; the database has a function of storing the group of chunk data; the management system has a function of integrating the first document into the chunk data and a function of creating the annotated document from the database; the fourth subcomponent has a function of creating the first prompt and a function of selecting the identification information in order from the list to create the second prompt; the first prompt includes a first instruction and a first table; the first instruction includes a procedure for generating the list from the first table; the list includes the identification information that identifies the second document containing the unspecified demonstrative pronoun; the second prompt includes a second instruction, the second document, and the still image; and the second instruction includes a procedure for generating the first document by identifying, from the still image, the object indicated by the unspecified demonstrative pronoun contained in the second document.
2. The information processing system according to claim 1, wherein: the third subcomponent has a function of sharing the first table and a second table within the third component; the management system has a function of creating the first table and the second table from the database; the first table includes a first column and a second column; the first column includes the identification information; the second column includes the second document; the second table includes a third column, a fourth column, and a fifth column; the third column includes the identification information included in the list; the fourth column includes the second document; and the fifth column includes the still image.
3. The information processing system according to claim 2, wherein: the first component has a function of receiving and providing a summary document; the second component has a function of receiving a third prompt and transmitting the summary document to the third component; the multimodal AI server has a function of generating the summary document in accordance with the third prompt; the third component has a function of transmitting the third prompt to the second component and a function of receiving the summary document and transmitting it to the first component; the fourth subcomponent has a function of creating the third prompt; the third prompt includes a third instruction and the annotated document; and the third instruction includes a procedure for generating the summary document from the annotated document.
4. The information processing system according to claim 2, wherein: the first component has a function of receiving and providing a task list; the second component has a function of receiving a fourth prompt and transmitting the task list to the third component; the multimodal AI server has a function of generating the task list in accordance with the fourth prompt; the third component has a function of transmitting the fourth prompt to the second component and a function of receiving the task list and transmitting it to the first component; the fourth subcomponent has a function of creating the fourth prompt; the fourth prompt includes a fourth instruction and the annotated document; and the fourth instruction includes a procedure for generating the task list from the annotated document.
5. An information processing method having a first phase, wherein: the first phase comprises first to eighteenth steps; in the first step of the first phase, a first component receives video information and transmits it to a second component; the second component comprises a first subcomponent, a second subcomponent, a third subcomponent, and a fourth subcomponent; the third subcomponent comprises a database and a management system; in the second step of the first phase, the second component receives the video information and shares it within the second component; in the third step of the first phase, the first subcomponent divides the video information to create a group of chunk data; the group of chunk data includes chunk data; the chunk data includes identification information, audio information, and a still image; the still image is a representative image of the chunk data; in the fourth step of the first phase, the second subcomponent transcribes the audio information into a first document; in the fifth step of the first phase, the third subcomponent integrates the first document into the chunk data using the management system; in the sixth step of the first phase, the management system creates a first table from the database and shares the first table within the second component; the first table includes a first column and a second column; the first column includes the identification information; the second column includes the first document; in the seventh step of the first phase, the fourth subcomponent creates a first prompt and transmits it to a third component; the first prompt includes a first instruction and the first table; the first instruction includes a procedure for generating a list from the first table; the list includes the identification information that identifies the first document containing an unspecified demonstrative pronoun; in the eighth step of the first phase, the third component receives the first prompt and generates the list using a multimodal AI server; in the ninth step of the first phase, the third component transmits the list to the second component; in the tenth step of the first phase, the second component receives the list and shares it within the second component; in the eleventh step of the first phase, the management system creates a second table from the database and shares the second table within the second component; the second table includes a third column, a fourth column, and a fifth column; the third column includes the identification information included in the list; the fourth column includes the first document; the fifth column includes the still image; in the twelfth step of the first phase, the fourth subcomponent sequentially selects records from the second table, creates a second prompt, and transmits it to the third component; the second prompt includes a second instruction, the first document, and the still image; the second instruction includes a procedure for generating a second document by identifying, from the still image, the object indicated by the unspecified demonstrative pronoun contained in the first document; the second document includes the unspecified demonstrative pronoun and an annotation; the annotation includes information identified as the object indicated by the unspecified demonstrative pronoun; in the thirteenth step of the first phase, the third component receives the second prompt and generates the second document using the multimodal AI server; in the fourteenth step of the first phase, the third component transmits the second document to the second component; in the fifteenth step of the first phase, the second component receives the second document and shares it within the second component; in the sixteenth step of the first phase, the management system integrates the second document into the chunk data; in the seventeenth step of the first phase, the management system creates an annotated document from the database and transmits it to the first component; the annotated document includes the second document created from the video information; and in the eighteenth step of the first phase, the first component receives and provides the annotated document.
6. The information processing method according to claim 5, further having a second phase, wherein: the second phase follows the first phase; the second phase comprises first to sixth steps; in the first step of the second phase, the fourth subcomponent creates a third prompt and transmits it to the third component; the third prompt includes a third instruction and the annotated document; the third instruction includes a procedure for generating a summary document from the annotated document; in the second step of the second phase, the third component receives the third prompt and generates the summary document using the multimodal AI server; in the third step of the second phase, the third component transmits the summary document to the second component; in the fourth step of the second phase, the second component receives the summary document and shares it within the second component; in the fifth step of the second phase, the second component transmits the summary document to the first component; and in the sixth step of the second phase, the first component receives and provides the summary document.
7. The information processing method according to claim 5, further having a third phase, wherein: the third phase follows the first phase; the third phase comprises first to sixth steps; in the first step of the third phase, the fourth subcomponent creates a fourth prompt and transmits it to the third component; the fourth prompt includes a fourth instruction and the annotated document; the fourth instruction includes a procedure for generating a task list from the annotated document; in the second step of the third phase, the third component receives the fourth prompt and generates the task list using the multimodal AI server; in the third step of the third phase, the third component transmits the task list to the second component; in the fourth step of the third phase, the second component receives the task list and shares it within the second component; in the fifth step of the third phase, the second component transmits the task list to the first component; and in the sixth step of the third phase, the first component receives and provides the task list.