Information processing systems, information processing methods, and programs
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Applications
- Current Assignee / Owner
- 前田充宏
- Filing Date
- 2024-12-02
- Publication Date
- 2026-06-12
AI Technical Summary
Multiplexing a plurality of digital signals into one bit stream is time-consuming.
An information processing system receives composite video data synthesized from multiple sources and lets viewers select and display partial video data from a specific region of the composite video stream.
Enables efficient selection and viewing of one video from a group of videos, facilitating multi-angle viewing experiences.
Abstract
Description
【Technical Field】
【0001】 The present invention relates to an information processing system, an information processing method, and a program.
【Background Art】
【0002】 Patent Document 1 describes selecting a desired video from a multiplexed stream.
【Prior Art Documents】
【Patent Documents】
【0003】 【Patent Document 1】 Japanese Patent Application Laid-Open No. 2005-312022
【Summary of the Invention】
【Problems to be Solved by the Invention】
【0004】 Multiplexing a plurality of digital signals into one bit stream is time-consuming.
【0005】 The present invention has been made in view of this background, and an object thereof is to provide a technique that makes it possible to select and view one video from among a plurality of videos.
【Means for Solving the Problems】
【0006】 The main invention for solving the above problem is an information processing system comprising: a video receiving unit that receives third video data obtained by combining first and second video data in different regions; a selection unit that accepts a selection of the first or second video data; an extraction unit that extracts partial video data from the region of the third video data corresponding to the selected first or second video data; and a display unit that displays the partial video data.
【0007】 Other problems disclosed in the present application and their solutions will be made clear by the embodiments of the invention and the drawings.
【Effects of the Invention】
【0008】 According to the present invention, it is possible to select and view one video from among a plurality of videos.
【Brief Description of the Drawings】
【0009】
[FIG. 1] A diagram showing an example of the overall configuration of the information processing system.
[FIG. 2] A diagram showing an example of the hardware configuration of a computer that implements the user terminal 1, the management server 2, and the broadcaster terminal 3.
[FIG. 3] A diagram showing an example of the software configuration of the broadcaster terminal 3.
[FIG. 4] A diagram showing an example of composite video data.
[FIG. 5] A diagram showing an example of the software configuration of the management server 2.
[FIG. 6] A diagram showing an example of the software configuration of the user terminal 1.
[FIG. 7] A diagram illustrating the extraction of partial video data.
[FIG. 8] A diagram illustrating the operation of the information processing system.
【Description of Embodiments】
【0010】 <System Overview> The following describes an information processing system according to one embodiment of the present invention. The information processing system of this embodiment distributes and displays video data: it distributes video from multiple camera angles as a single video, then extracts and displays the video from the angle selected by the viewer.
【0011】 FIG. 1 shows an example of the overall configuration of the information processing system. The information processing system of this embodiment includes a user terminal 1 and a management server 2, which are connected to each other via a communication network. The management server 2 is also connected to a broadcaster terminal 3 via the communication network. The communication network is, for example, the Internet, and is constructed using public telephone networks, mobile phone networks, wireless communication channels, Ethernet (registered trademark), and the like.
【0012】 The user terminal 1 is a computer operated by the user. The user terminal 1 can be, for example, a smartphone, a tablet computer, or a personal computer.
【0013】 The management server 2 may be a general-purpose computer such as a workstation or personal computer, or it may be logically implemented through cloud computing.
【0014】 The broadcaster terminal 3 is a computer operated by the video broadcaster. The broadcaster terminal 3 can be, for example, a smartphone, a tablet computer, or a personal computer. Video (and audio) from multiple cameras 4 is input to the broadcaster terminal 3.
【0015】 <Hardware> FIG. 2 is a diagram showing an example of the hardware configuration of a computer that implements the user terminal 1, the management server 2, and the broadcaster terminal 3. Note that the illustrated configuration is an example, and other configurations may be adopted. The computer includes a CPU 201, a memory 202, a storage device 203, a communication interface 204, an input device 205, and an output device 206. The storage device 203 is, for example, a hard disk drive, a solid state drive, or a flash memory that stores various data and programs. The communication interface 204 is an interface for connecting to a communication network, for example an adapter for connecting to Ethernet (registered trademark), a modem for connecting to a public telephone network, a wireless communication device for performing wireless communication, or a USB (Universal Serial Bus) connector or RS232C connector for serial communication. The input device 205 is, for example, a keyboard, a mouse, a touch panel, a button, or a microphone for inputting data. The output device 206 is, for example, a display, a printer, or a speaker for outputting data. Each functional unit of the user terminal 1, the management server 2, and the broadcaster terminal 3, described later, is realized by the CPU 201 reading a program stored in the storage device 203 into the memory 202 and executing it, and each storage unit is realized as part of the storage areas provided by the memory 202 and the storage device 203.
【0016】 <Broadcaster Terminal 3> FIG. 3 is a diagram showing an example of the software configuration of the broadcaster terminal 3. The broadcaster terminal 3 includes a video data acquisition unit 311, a composite video creation unit 312, and a video data transmission unit 313.
【0017】 The video data acquisition unit 311 acquires video data captured by the cameras 4. In the present embodiment, it is assumed that there are a plurality of cameras 4 (two, or three or more), and the video data acquisition unit 311 can receive video data from the plurality of cameras 4 simultaneously and in parallel.
【0018】 The composite video creation unit 312 creates video data (hereinafter referred to as composite video data or third video data) that combines a plurality of video data (hereinafter referred to as captured video data, or first and second video data). In this embodiment, each captured video data is placed in a different region of the composite video data. Specifically, the composite video creation unit 312 can generate composite video data by arranging the frame images constituting each captured video data in predetermined regions within one frame image according to a predetermined arrangement pattern.
【0019】 Note that the number of captured video data included in the composite video data is not limited to two. Three or more captured video data can be included in one composite video data. For example, three captured video data can be stacked vertically, four captured video data can be arranged in a 2×2 grid, and six captured video data can be arranged in a 2×3 grid.
【0020】 When N captured video data are included in one composite video data, each frame of the composite video data is divided into N regions, and the frame images at the same timing of the corresponding captured video data are arranged in the respective regions. The arrangement pattern of the regions is not limited to those described above and can be designed as appropriate according to the subject being shot, the viewing purpose, and so on. For example, the subject can be placed in a large central region with images from other angles arranged around it at a smaller size.
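The equal-sized grid arrangements described in 【0019】 and 【0020】 can be sketched as follows. This is a minimal illustration, not the implementation disclosed in the specification: the `Region` type, the `grid_regions` function, and the 3840×2160 frame size are all assumptions introduced for the example, and non-uniform layouts (such as a large central region) are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Rectangle inside one composite frame, in pixels (hypothetical helper type)."""
    x: int
    y: int
    width: int
    height: int


def grid_regions(frame_width: int, frame_height: int, rows: int, cols: int) -> list[Region]:
    """Split one composite frame into rows x cols equal regions, in row-major order."""
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    cell_w = frame_width // cols   # width of each region
    cell_h = frame_height // rows  # height of each region
    return [
        Region(x=c * cell_w, y=r * cell_h, width=cell_w, height=cell_h)
        for r in range(rows)
        for c in range(cols)
    ]


# Four captured videos in a 2x2 grid on an assumed 3840x2160 composite frame.
regions = grid_regions(3840, 2160, rows=2, cols=2)

# Three captured videos stacked vertically on an assumed 1920x3240 composite frame.
stacked = grid_regions(1920, 3240, rows=3, cols=1)
```

With N captured videos, region `i` of the returned list would hold the frame image of the i-th captured video at each timing.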
【0021】 When N captured video data are combined, the resolution and aspect ratio of each captured video data may differ. The composite video creation unit 312 can perform the combining process after unifying all the captured video data to the same aspect ratio, by changing the resolution of each captured video data as needed and by performing trimming, letterboxing, or similar processing.
【0022】 FIG. 4 shows an example of composite video data. In this example, each frame of the composite video data 50 is divided into four regions in a 2×2 grid. The first captured video data 51 is placed in the upper left region, the second captured video data 52 in the upper right region, the third captured video data 53 in the lower left region, and the fourth captured video data 54 in the lower right region. The composite video creation unit 312 generates one composite frame image using the frame images of each captured video data at the same timing. It then generates the next composite frame image in the same way from the next frame image of each captured video data. By performing this process for all frames of each captured video data, one composite video data is obtained.
【0023】 Furthermore, the composite video creation unit 312 may set the audio data contained in the multiple captured video data as the audio data of the composite video data according to predetermined rules. For example, it may mix the audio from all captured video data into a single audio data, or it may select only the audio from some of the captured video data according to a predetermined priority. Alternatively, it may associate the audio of the corresponding captured video data with each region so that only the audio of the region selected by the user is played. In this way, an appropriate audio combining method can be adopted depending on the situation and application.
【0024】 The video data transmission unit 313 transmits the composite video data to the management server 2.
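The frame-by-frame composition of 【0022】 can be sketched as below. This is an illustrative sketch only: frames are modeled as nested lists of integers rather than real image data, all four inputs are assumed to already share the same size (the unification of 【0021】 is taken as done), and the names `Frame`, `paste`, `compose_2x2`, `compose_video`, and `solid` are hypothetical.

```python
Frame = list[list[int]]  # rows of pixel values; a stand-in for real image data


def paste(dest: Frame, src: Frame, left: int, top: int) -> None:
    """Copy src into dest with its upper-left corner at (left, top)."""
    for dy, row in enumerate(src):
        dest[top + dy][left:left + len(row)] = row


def compose_2x2(frames: list[Frame]) -> Frame:
    """Arrange four same-sized frames as upper left, upper right, lower left, lower right."""
    if len(frames) != 4:
        raise ValueError("exactly four frames are required")
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must have the same size")
    composite = [[0] * (2 * width) for _ in range(2 * height)]
    for index, frame in enumerate(frames):
        row, col = divmod(index, 2)  # 0 -> (0, 0), 1 -> (0, 1), 2 -> (1, 0), 3 -> (1, 1)
        paste(composite, frame, left=col * width, top=row * height)
    return composite


def compose_video(streams: list[list[Frame]]) -> list[Frame]:
    """Build one composite frame per time step from the frames at the same timing."""
    return [compose_2x2(list(same_time)) for same_time in zip(*streams)]


def solid(value: int, width: int = 2, height: int = 2) -> Frame:
    """Test frame filled with a single value."""
    return [[value] * width for _ in range(height)]


# Four captured videos, each two frames long.
streams = [
    [solid(1), solid(10)],
    [solid(2), solid(20)],
    [solid(3), solid(30)],
    [solid(4), solid(40)],
]
video = compose_video(streams)
```

Here `zip(*streams)` pairs up the frames that share a timing, which mirrors the statement that each composite frame is built from frame images at the same timing.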
【0025】 <Management Server 2> FIG. 5 shows an example of the software configuration of the management server 2. The management server 2 includes a video information storage unit 231, a video information transmission unit 211, a video receiving unit 212, and a distribution unit 213.
【0026】 ==Storage Unit== The video information storage unit 231 stores information related to video data (hereinafter referred to as video information). The video information may include information that identifies the distribution (for example, a live ID), the size of the composite video data (8K, 4K, Full HD, etc.), and, for each angle, information indicating the region in which the captured video data is placed (region identification information).
【0027】 The angle may be, for example, information that identifies a camera 4 or the captured video data.
【0028】 The region identification information can be in any format, as long as it uniquely identifies each region within the composite video data. For example, it can combine information indicating the number of divisions of the composite video data (e.g., 3×3, or 9 divisions) with information indicating the index of the division region to which each video data is assigned (e.g., a number where the upper left is 1 and the lower right is 9). In this case, the position and size of the region where each video data is placed can be determined from the number of divisions and the index.
【0029】 Another method identifies each region within the composite video data by the coordinates of its upper-left and lower-right corners. In this case, the region identification information for each captured video data includes a pair of coordinates, (X1, Y1) for the upper-left corner and (X2, Y2) for the lower-right corner of the corresponding region. The coordinates can be expressed, for example, with the upper-left corner of the composite video data as the origin (0, 0), the rightward direction as the positive X-axis, and the downward direction as the positive Y-axis.
【0030】 It is also possible to identify each region using its width and height. In this case, the region identification information for each captured video data includes the coordinates (X, Y) of the upper-left corner of the corresponding region, along with its width W and height H. From these values, the position and size of each region can be calculated.
【0031】 The region identification information can take various other formats as well. Needless to say, those skilled in the art can select and modify formats other than those exemplified here as appropriate, depending on the required accuracy and application.
【0032】 ==Functional Units== The video information transmission unit 211 transmits video information to the user terminal 1. The video information transmission unit 211 may, for example, set the video information in a script program embedded in a web page for video viewing that is sent to the user terminal 1. Alternatively, the script program embedded in the web page may be executed on the user terminal 1 to request the video information, and the video information transmission unit 211 may transmit the video information in response to the request.
【0033】 The video receiving unit 212 receives composite video data from the broadcaster terminal 3.
【0034】 The distribution unit 213 distributes the composite video data (third video data) to the user terminal 1. The distribution unit 213 can stream the composite video data received by the video receiving unit 212 to multiple user terminals 1 via the network 10.
【0035】 Specifically, the distribution unit 213 includes server software for video streaming. This server software streams composite video data in response to distribution requests from user terminals 1, based on communication protocols such as HTTP (Hypertext Transfer Protocol). The distribution unit 213 may also include a load balancing device to process requests from multiple user terminals 1.
【0036】 The distribution unit 213 may distribute the composite video data to the user terminal 1 in its original format, or it may convert it to an appropriate format depending on the capabilities and communication environment of the user terminal 1 before distribution. For example, the composite video data can be converted to a video format playable on the user terminal 1 (e.g., MP4), or resized to match the screen resolution of the user terminal 1. Furthermore, smooth video playback can be achieved by adjusting the video bitrate according to the network bandwidth.
【0037】 Furthermore, the distribution unit 213 may acquire information regarding the viewing status from the user terminal 1 and perform distribution control based on that information. For example, if video playback on a user terminal 1 is paused, the distribution unit 213 can receive information indicating this and temporarily stop video distribution to that user terminal 1. This makes it possible to suppress unnecessary communication and reduce network load.
【0038】 In addition to live distribution, which distributes the same composite video data to multiple user terminals 1 at the same time, the distribution unit 213 may also have a function to distribute past composite video data on demand in response to requests from user terminals 1.
【0039】 <User Terminal 1> FIG. 6 shows an example of the software configuration of the user terminal 1. The user terminal 1 includes a video information acquisition unit 111, a video receiving unit 112, a selection unit 113, an extraction unit 114, and a display unit 115.
【0040】 The video information acquisition unit 111 acquires video information. The video information acquisition unit 111 can acquire the video information transmitted from the management server 2.
【0041】 The video receiving unit 112 receives the composite video data.
【0042】 The selection unit 113 accepts the user's selection of the viewing angle (in other words, a selection of a camera 4 or of captured video data).
【0043】 The extraction unit 114 extracts, from the composite video data, the partial video data of the region corresponding to the selected angle. The extraction unit 114 can identify the region corresponding to the selected angle from the video information and extract partial video data from the identified region of the received composite video data. FIG. 7 is a diagram illustrating the extraction of partial video data. In the example in FIG. 7, the first angle (CAMERA1) is selected, and the captured video data 51 is extracted from the upper left region of the composite video data 50.
【0044】 The display unit 115 displays the partial video data.
【0045】 <Operation> FIG. 8 is a diagram illustrating the operation of the information processing system.
【0046】 The broadcaster terminal 3 acquires video data from multiple cameras 4 (S301), creates composite video data by arranging the multiple video data in different regions (S302), and sends the composite video data to the management server 2 (S303). The management server 2 sends video information to the user terminal 1 (S304) and also distributes the composite video data from the broadcaster terminal 3 to the user terminal 1 (S305). The user terminal 1 accepts the selection of an angle (S306), cuts out the region corresponding to that angle from the composite video data, and displays it (S307).
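The region identification formats of 【0028】 to 【0030】 and the extraction step of 【0043】 (S306 to S307) can be sketched as follows. This is a hedged illustration under stated assumptions: frames are nested lists of integers, the function names and the `video_info` mapping are hypothetical, and the lower-right corner (X2, Y2) is assumed to be exclusive, which the specification does not state.

```python
Frame = list[list[int]]              # rows of pixel values; a stand-in for real image data
Rect = tuple[int, int, int, int]     # (x, y, width, height), origin at the upper-left corner


def region_from_index(frame_w: int, frame_h: int, rows: int, cols: int, index: int) -> Rect:
    """Division count plus 1-based index (upper left = 1) to a rectangle."""
    if not 1 <= index <= rows * cols:
        raise ValueError("index out of range for this division pattern")
    row, col = divmod(index - 1, cols)
    cell_w, cell_h = frame_w // cols, frame_h // rows
    return (col * cell_w, row * cell_h, cell_w, cell_h)


def region_from_corners(x1: int, y1: int, x2: int, y2: int) -> Rect:
    """Upper-left and lower-right corners to a rectangle (assumes x2, y2 are exclusive)."""
    return (x1, y1, x2 - x1, y2 - y1)


def extract_region(frame: Frame, region: Rect) -> Frame:
    """Cut the partial frame for one region out of a composite frame."""
    x, y, w, h = region
    return [row[x:x + w] for row in frame[y:y + h]]


# A 4x4 composite frame holding four 2x2 captured frames (values 1 to 4).
composite = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]

# Hypothetical video information: each angle mapped to its region index in a 2x2 division.
video_info = {"CAMERA1": 1, "CAMERA2": 2, "CAMERA3": 3, "CAMERA4": 4}

selected = "CAMERA1"
rect = region_from_index(4, 4, rows=2, cols=2, index=video_info[selected])
partial = extract_region(composite, rect)
```

The width-and-height format of 【0030】 corresponds directly to the `Rect` tuple, so it needs no conversion before `extract_region` is applied to each received frame.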
【0047】 As described above, the information processing system of this embodiment distributes multi-angle video efficiently by combining the videos from multiple angles into a single video, each in its own region, and by having the viewer's client cut out and display the region corresponding to the desired angle.
【0048】 The embodiment described above is intended to facilitate understanding of the present invention and is not intended to limit its interpretation. The present invention can be modified and improved without departing from its spirit, and the present invention also includes equivalents thereof.
【0049】 For example, the processing performed by each functional unit of the management server 2 described above may be executed by any of the functional units. Furthermore, different functional units may be added to perform some of the processing performed by each of the functional units described above. Also, the functional units of the management server 2 may be distributed across multiple computers.
【0050】 Furthermore, the information stored in each of the storage units of the user terminal 1, the management server 2, and the broadcaster terminal 3 may be stored in any of the storage units. That is, the information stored in the multiple storage units described above may be stored in a single storage unit, or a portion of the information stored in one storage unit may be stored in another storage unit. In addition, the functional units of the user terminal 1, the management server 2, and the broadcaster terminal 3 may be provided in other devices.
【0051】 <Modification 1> In the above embodiment, the broadcaster terminal 3 creates the composite video data, but the management server 2 may create it instead. In this case, the composite video creation unit 312 is provided in the management server 2, the video data transmission unit 313 of the broadcaster terminal 3 transmits the multiple captured video data from the multiple cameras 4 to the management server 2, and the management server 2 can create composite video data by arranging the frames of each captured video data in a grid pattern to form one frame.
【0052】 <Modification 2> In the above embodiment, the number of captured video data and the number of region divisions were fixed in advance, but the user may be allowed to change them dynamically. In this modification, the user terminal 1 may include a region selection unit.
【0053】 The region selection unit can specify, in response to user instructions, the number of captured video data to display and the number of division regions of the composite video data. For example, it can present the user with a list of the captured video data included in the currently displayed composite video data and let the user select the desired captured video data from the list. Alternatively, it may present the user with candidate division patterns and let the user select the desired one.
【0054】 The region selection unit can send the management server 2 information indicating the number of captured video data selected by the user and the division pattern. The composite video creation unit 312 of the management server 2 can reconstruct the composite video data based on this information. Specifically, it arranges the specified captured video data in the specified region pattern to generate new composite frame images, and the distribution unit 213 can distribute the reconstructed composite video data to the user terminal 1.
【0055】 The extraction unit 114 of the user terminal 1 can receive the reconstructed composite video data, extract partial video data from a specified region, and display it on the display unit 115.
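One way the exchange in 【0053】 and 【0054】 might look is sketched below. The specification does not define a message format, so the shape of `request`, the `build_layout` function, and the row-major assignment rule are all assumptions made for illustration: the user terminal sends the chosen angles and division pattern, and the server derives which region index each selected angle occupies in the reconstructed composite.

```python
def build_layout(selected_angles: list[str], rows: int, cols: int) -> dict[str, int]:
    """Assign each selected angle a 1-based region index in row-major order."""
    if len(set(selected_angles)) != len(selected_angles):
        raise ValueError("an angle was selected more than once")
    if len(selected_angles) > rows * cols:
        raise ValueError("division pattern has fewer regions than selected angles")
    return {angle: index for index, angle in enumerate(selected_angles, start=1)}


# Hypothetical request sent by the region selection unit of the user terminal.
request = {
    "angles": ["CAMERA2", "CAMERA4"],
    "pattern": {"rows": 1, "cols": 2},
}

# Server side: derive the new region assignment used to rebuild the composite frames.
layout = build_layout(request["angles"], **request["pattern"])
```

The resulting mapping could then be returned to the user terminal as updated video information, so that the extraction unit cuts out the correct region of the reconstructed composite.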
【0056】 <Modification 3> The above embodiment described multi-angle viewing of ordinary video, but the present invention is also applicable to multi-angle viewing in a VR (Virtual Reality) environment. An example of application in a VR environment is described below.
【0057】 In a VR environment, multiple omnidirectional cameras 4 can be used for shooting. Each omnidirectional camera 4 is a camera that can capture images in all directions around itself by combining a fisheye lens and mirrors. The omnidirectional cameras 4 are placed around the subject being shot, and each can capture images in all directions (360 degrees).
【0058】 The captured omnidirectional video is transmitted to the broadcaster terminal 3. The composite video creation unit 312 of the broadcaster terminal 3 can extract video at a desired viewpoint from the multiple omnidirectional videos and combine the extracted video to generate three-dimensional composite video data for VR. For example, by extracting video in four directions as seen from the viewer (forward, right, backward, and left) from each omnidirectional video and combining these, a pseudo-3D space centered on the viewer can be represented.
【0059】 Viewers view the composite video data using an HMD (Head Mounted Display). The HMD has a built-in sensor that detects head movement and can switch the displayed content according to the direction of the viewer's head. For example, if the viewer turns their head to the right, the rightward direction of the composite video data is displayed. The extraction unit 114 of the user terminal 1 can extract and display the video for the corresponding direction from the composite video data according to the direction of the viewer's head.
【0060】 Furthermore, the selection unit 113 of the user terminal 1 can accept viewpoint switching operations independently of the viewer's head movements. For example, the viewer can switch to the omnidirectional video from a different angle by pressing a button on a controller. The extraction unit 114 can extract and display the video corresponding to the direction of the viewer's head from the selected omnidirectional video.
【0061】 <Modification 4> The above embodiment described multi-angle viewing of live-streamed video, but the present invention can also be applied to on-demand streaming of previously recorded video. An example of application to recorded video is described below.
【0062】 The management server 2 according to Modification 4 may include a recorded video storage unit that holds recorded video captured by multiple cameras 4. The recorded video storage unit may store video of previously recorded events or live broadcasts as video from multiple angles.
【0063】 In Modification 4, the management server 2 reads recorded video from multiple angles from the recorded video storage unit and converts it into composite video data using the composite video creation unit 312. This process is the same as in the live streaming case of the above embodiment. In Modification 4, the composite video creation unit 312 can be provided in the management server 2. The generated composite video data can be delivered, together with video information, to the requesting user terminal 1.
【0064】 The video information acquisition unit 111 of the user terminal 1 acquires the video information for recorded video from the management server 2. The video information includes a list of available recorded videos and detailed information about each recorded video (shooting date and time, location, summary, etc.). The selection unit 113 of the user terminal 1 allows the user to select the recorded video they want to watch from the list.
【0065】 When a user selects a recorded video, the video receiving unit 112 of the user terminal 1 receives the composite video data of the selected recorded video from the management server 2. The extraction unit 114 and the display unit 115 can extract and display the video of a desired angle from the composite video data in response to user operation, as in the embodiment described above.
【0066】 <Modification 5> The above embodiment described a configuration in which the user manually selects the angle to display. However, the present invention is also applicable to a configuration in which the optimal angle is automatically selected using AI. An example of angle selection using AI is described below.
【0067】 The user terminal 1 according to Modification 5 may include an AI angle selection unit. The AI angle selection unit may include an audio analysis unit, a video analysis unit, and an angle determination unit.
【0068】 The audio analysis unit analyzes the audio data contained in each captured video data and extracts audio features. Examples of these features include volume, sound pressure, frequency characteristics, and speech recognition results. The video analysis unit analyzes the video frames of each captured video data and extracts video features. Examples of these features include brightness, color, edges, motion, and object detection results.
【0069】 The angle determination unit selects captured video data of high importance based on the audio and video features extracted from each captured video data. For example, it can preferentially select captured video data in which an important event is estimated to have occurred, based on criteria such as high volume, recognition of a specific sound, high brightness, significant movement, or detection of a specific person.
【0070】 The angle determination unit can instruct that the selected captured video data be displayed at a larger screen size than the other video data.
【0071】 Information on the captured video data selected by the AI angle selection unit is sent to the extraction unit 114 of the user terminal 1. The extraction unit 114 extracts the region corresponding to the selected captured video data from the composite video data and can display it on the display unit 115 at the specified size.
【0072】 <Modification 6> The above embodiment described a configuration for viewing one event from multiple angles, but the present invention is also applicable to a configuration for viewing multiple events simultaneously. An example of simultaneous viewing of multiple events is described below.
【0073】 In the system according to Modification 6, multiple cameras 4 are used to shoot at each of multiple event venues. The video data from each event venue is received by the corresponding broadcaster terminal 3, and composite video data is generated for each event. The composite video data for all events is aggregated on the management server 2.
【0074】 The distribution unit 213 of the management server 2 can simultaneously distribute the composite video data of multiple events to the user terminal 1 via different distribution channels.
【0075】 On the display screen of the user terminal 1 in Modification 6, the videos of multiple events may be displayed side by side. Within the video display area of each event, video from the composite video data of that event may be played back at an angle determined by user selection or AI selection. The user can prioritize viewing the video of the event they want to see by selecting the video display area of that event.
【0076】 The selection unit 113 of the user terminal 1 accepts an operation to select one event from among the multiple displayed events. When the user selects a specific event, the display unit 115 can enlarge the video of the selected event to fill the entire screen.
【0077】 <Disclosure Items> This disclosure also includes the following configurations.
[Item 1] An information processing system comprising: a video receiving unit that receives third video data obtained by combining first and second video data in different regions; a selection unit that accepts a selection of the first or second video data; an extraction unit that extracts partial video data from the region of the third video data corresponding to the selected first or second video data; and a display unit that displays the partial video data.
[Item 2] The information processing system according to item 1, further comprising a region acquisition unit that acquires region identification information for identifying the region, wherein the extraction unit extracts the partial video data from the third video data based on the region identification information.
[Item 3] The information processing system according to item 1, further comprising a distribution unit that receives the third video data from a distributor and distributes the received third video data.
[Item 4] The information processing system according to item 1, wherein the video receiving unit acquires the first and second video data, the information processing system further comprising a composite video creation unit that creates the third video data based on the first and second video data.
[Item 5] An information processing method in which a computer executes: a step of receiving third video data obtained by combining first and second video data in different regions; a step of accepting a selection of the first or second video data; a step of extracting partial video data from the region of the third video data corresponding to the selected first or second video data; and a step of displaying the partial video data.
[Item 6] A program that causes a computer to execute: a step of receiving third video data obtained by combining first and second video data in different regions; a step of accepting a selection of the first or second video data; a step of extracting partial video data from the region of the third video data corresponding to the selected first or second video data; and a step of displaying the partial video data.
【Explanation of Symbols】
【0078】
1 User terminal
2 Management server
3 Broadcaster terminal
Claims
[Claim 1] An information processing system comprising: a video receiving unit that receives third video data obtained by combining first and second video data in different regions; a selection unit that accepts a selection of the first or second video data; an extraction unit that extracts partial video data from the region of the third video data corresponding to the selected first or second video data; and a display unit that displays the partial video data.
[Claim 2] The information processing system according to claim 1, further comprising a region acquisition unit that acquires region identification information for identifying the region, wherein the extraction unit extracts the partial video data from the third video data based on the region identification information.
[Claim 3] The information processing system according to claim 1, further comprising a distribution unit that receives the third video data from a distributor and distributes the received third video data.
[Claim 4] The information processing system according to claim 1, wherein the video receiving unit acquires the first and second video data, the information processing system further comprising a composite video creation unit that creates the third video data based on the first and second video data.
[Claim 5] An information processing method in which a computer executes: a step of receiving third video data obtained by combining first and second video data in different regions; a step of accepting a selection of the first or second video data; a step of extracting partial video data from the region of the third video data corresponding to the selected first or second video data; and a step of displaying the partial video data.
[Claim 6] A program that causes a computer to execute: a step of receiving third video data obtained by combining first and second video data in different regions; a step of accepting a selection of the first or second video data; a step of extracting partial video data from the region of the third video data corresponding to the selected first or second video data; and a step of displaying the partial video data.