Content scene search method and system

The content scene search method and system efficiently process and search scene-based content like webcomics by using an embedding model to extract vector data and construct databases, addressing inefficiencies in manual searching and improving user work efficiency.

JP2026096962A · Pending · Publication Date: 2026-06-15 · NAVER WEBTOON LTD

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
NAVER WEBTOON LTD
Filing Date
2025-12-03
Publication Date
2026-06-15

AI Technical Summary

Technical Problem

Existing content search methods are inefficient and labor-intensive, particularly in the context of scene-based content like webcomics, requiring manual searching through numerous images to find specific scenes, which is time-consuming.

Method used

A content scene search method and system that processes content on a scene-by-scene basis, utilizing an embedding model to extract vector data, and constructs databases for semantic-based searches, enabling efficient retrieval of scene images matching user queries.

Benefits of technology

Improves search performance by allowing semantic-based content scene searches, efficiently providing scene images that match user needs, enhancing work efficiency in content creation, design, and marketing.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure 2026096962000001_ABST

Abstract

This invention provides a content scene search method that allows content to be searched on a scene-by-scene basis, and a system to support this method. [Solution] The method includes the steps of: receiving an image file relating to content using at least one processor of a computer device (S310); dividing the image file into scene units to generate scene images and storing the scene images in a first database (S320); extracting vector data of the scene images using an embedding model and storing the vector data in a second database (S330); receiving a query relating to image search using a search tool, converting the query into a vector, and detecting data with high similarity from the vector data in the second database (S340); and extracting, from the first database, the scene images that correspond to the data with high similarity and providing them as search results of the search tool (S350).

Description

【Technical Field】 【0001】 The present invention relates to a method for searching for scenes of content and a system for supporting the same. 【Background Art】 【0002】 With the development of technology, the use of digital devices has been increasing. In particular, electronic devices (e.g., smartphones, tablet PCs, etc.) have various functions such as web search through the Internet, music appreciation, video viewing, etc. in addition to communication functions such as calls and text messages. 【0003】 With the spread of electronic devices, different from conventional content consumption media, the consumption of content provided via electronic devices such as personal computers (PCs) and mobile devices has been rapidly increasing. As an example, web comics can be cited. Web comics are comics serialized and distributed via the Internet communication network and are also called webtoons in Korea. 【0004】 As the consumption of content has been steadily increasing, research has been conducted on methods for efficiently producing and managing content. Korean Patent Publication No. 10-2024-0148072 discloses a system for providing a web comic production management service, which discloses an environment for producing web comic images by inserting and arranging characters, backgrounds, and texts. 【0005】 On the other hand, since content is composed of a number of scenes and is a web comic serialized online, it is composed not of simple images but of scene images reflecting a story, the emotions of characters, and the intention of production. Such scene images are frequently utilized in the process of content work and are also used in marketing design work. 【0006】 However, searching for content during the workflow can be inefficient. Often, people have to manually search for necessary scene images while reviewing the content, which can be excessively time-consuming and labor-intensive. 
【0007】 Therefore, in recent years, there has been a growing need for services specializing in scene-based content search, in order to enable more efficient content creation. [Prior Art Documents] [Patent Documents] 【0008】 [Patent Document 1] Korean Published Patent No. 10-2024-0148072 (Publication Date: October 11, 2024) [Summary of the Invention] [Problems to be Solved by the Invention] 【0009】 This invention relates to a content scene search method that allows content to be searched on a scene-by-scene basis, and a system that supports this method. 【0010】 More specifically, the present invention relates to a method and system for providing a content scene search service, which involves processing content on a scene-by-scene basis, constructing a database, and using the database to provide a search service based on images or text. 【0011】 Furthermore, the present invention relates to a method and system for providing a content scene search service based on the meaning contained in a user query. 【0012】 Furthermore, the present invention relates to a content scene search service provision method and system that can search for and provide scene images that match the user's desired conditions. 
[Means for Solving the Problem] 【0013】 To solve the above-mentioned problems, the content scene search method according to the present invention may include the steps of: receiving an image file relating to content using at least one processor of a computer device; dividing the image file into scene units to generate scene images and storing the scene images in a first database; extracting vector data of the scene images using an embedding model and storing the vector data in a second database; receiving a query relating to image search using a search tool, converting the query into a vector, and detecting data with a high degree of similarity from the vector data in the second database; and extracting, from the first database, the scene images that correspond to the data with a high degree of similarity and providing them as search results for the search tool. 【0014】 Furthermore, the content scene search service provision system according to the present invention may include: a first database that stores scene images obtained by dividing image files related to content into scene units; a data processing unit that extracts vector data of the scene images using an embedding model; a second database that stores the vector data; and a data search unit that receives queries related to image search from a search tool on a user terminal, converts the queries into vectors, detects data with high similarity from the vector data in the second database, extracts from the first database the scene images that correspond to the data with high similarity, and provides them as search results for the search tool. 
【0015】 Furthermore, the program according to the present invention is executed by one or more processors in an electronic device such as a computer and may include instructions that cause the following steps to be performed: receiving an image file relating to content; dividing the image file into scene units to generate scene images and storing the scene images in a first database; extracting vector data of the scene images using an embedding model and storing the vector data in a second database; receiving a query relating to image search using a search tool, converting the query into a vector, and detecting data with high similarity from the vector data in the second database; and extracting scene images from the first database that correspond to the data with high similarity and providing them as search results for the search tool. The program according to the present invention may be stored on a computer-readable recording medium. [Effects of the Invention] 【0016】 As described above, the content scene search method and system according to the present invention can improve search performance by dividing the content image file into scene units to generate scene images and storing them in a first database, thereby constructing a database using the data necessary for providing the search service. 【0017】 Furthermore, the content scene search method and system according to the present invention extracts vector data of scene images using an embedding model and stores the vector data in a second database, thereby enabling semantic-based content scene searches even for abstract user queries and efficiently providing the scene images that users need. 
【0018】 Furthermore, the content scene search method and system according to the present invention can receive queries related to image searches using a search tool, convert the queries into vectors, and detect highly similar data from vector data in a second database, thereby searching for and providing scene images based on queries corresponding to images or text. 【0019】 In addition, the content scene search method and system according to the present invention can extract a scene image corresponding to highly similar data from the already stored scene images and provide it as a search result in a search tool. In the process of content creation, design, and marketing work, users can easily find the necessary scene images. According to the present invention, the work efficiency of users can be improved. 【Brief Description of Drawings】 【0020】 [Figure 1] It is a conceptual diagram for explaining a content scene search service providing system according to the present invention. [Figure 2] It is a conceptual diagram for explaining a content scene search service providing system according to the present invention. [Figure 3] It is a flowchart for explaining a content scene search method according to the present invention. [Figure 4] It is a conceptual diagram for explaining a method of searching for and providing a content scene in the present invention. [Figure 5] It is a conceptual diagram for explaining a method of searching for and providing a content scene in the present invention. [Figure 6A] It is a conceptual diagram for explaining a method of searching for and providing a content scene in the present invention. [Figure 6B] It is a conceptual diagram for explaining a method of searching for and providing a content scene in the present invention. [Figure 7] It is a conceptual diagram for explaining a method of searching for and providing a content scene in the present invention. 
[Figure 8] It is a conceptual diagram for explaining a method of searching for and providing a content scene in the present invention. [Figure 9] It is a conceptual diagram for explaining a method of searching for and providing a content scene in the present invention. 【0021】 The embodiments disclosed herein will be described in detail below with reference to the accompanying drawings, but regardless of the reference numerals used in the drawings, identical or similar components will be given the same reference numerals, and redundant descriptions thereof will be omitted. The suffixes “module” and “part” used below for components are added or used interchangeably for the sake of ease of writing the specification and do not have a distinct meaning or role in themselves. Furthermore, when describing the embodiments disclosed herein, if it is determined that a specific description of the relevant prior art may obscure the gist of the embodiments disclosed herein, such detailed description will be omitted. In addition, the accompanying drawings are merely for the purpose of facilitating the understanding of the embodiments disclosed herein and should be understood not as limiting the technical ideas disclosed herein, but as including all modifications, equivalents, and substitutions that fall within the concept and technical scope of the present invention. 【0022】 Terms including ordinal numbers such as "1st," "2nd," etc., may be used to describe various components, but the components are not limited by these terms. These terms are used solely for the purpose of distinguishing one component from another. 【0023】 When it is stated that one component is “linked” or “connected” to another component, it should be understood that it may be directly linked or connected to the other component, but there may also be other components between them. 
On the other hand, when it is stated that one component is “directly linked” or “directly connected” to another component, it should be understood that there are no other components between them. 【0024】 A singular expression includes the plural unless the context clearly indicates otherwise. 【0025】 In this application, terms such as “includes” or “having” are intended to specify the presence of features, numbers, steps, actions, components, parts, or combinations thereof as described in the specification, and should be understood not to exclude in advance the presence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof. 【0026】 This invention relates to a method for searching for content scenes and a service provision system using the same. The types of content to which this invention can be applied can be very diverse. For example, the content to which this invention applies may be at least one of the following: webcomics, webnovels, music, ebooks, videos, images, etc. 【0027】 For the sake of explanation, the following will use webcomics as an example. Here, "webcomic" is a portmanteau of "web" and "comics," and refers to cartoons or comics delivered via the internet communication network. 【0028】 Such content may consist of multiple sub-contents. These multiple sub-contents can constitute a series of content. 【0029】 Here, "series" can mean a continuous series of projects or content. 【0030】 In this invention, to avoid confusion between the terms "content" and "sub-content," "sub-content" is referred to as "episode." 【0031】 Furthermore, a single episode may contain multiple scenes separated by image boundaries or other means. For example, an episode may consist of multiple layers such as speech bubbles, outlines, tones, and panel borders (for webcomic panels), and a scene can be defined by the outer border included in the panel border layer. 
【0032】 The content scene search service will be described in detail below, along with the attached drawings. Figures 1 and 2 are conceptual diagrams illustrating the content scene search service provision system according to the present invention. Figure 3 is a flowchart illustrating the content scene search method according to the present invention, and Figures 4, 5, 6A, 6B, 7, 8, and 9 are conceptual diagrams illustrating the method of searching for and providing content scenes according to the present invention. 【0033】 As shown in Figure 1, the content scene search service provision system 100 may include at least one of the following: a cut division unit 110, a first database 120, a data processing unit 130, a second database 140, a data storage unit 150, and a search tool 160. 【0034】 The cut division unit 110 can divide the content in a manner suitable for scene searching in order to improve the scene search performance of the content. 【0035】 In the present invention, the content includes multiple scenes, and the image files of the content (original or source data) may include objects related to the multiple scenes (e.g., background, floor, surrounding objects, characters, speech bubbles, text, outlines, etc.). 【0036】 The cut division unit 110 can divide the content image file into scene units and generate multiple scene images by using at least one of the following: an outer frame line detector 111 that detects the outer frame lines of each scene in the content image file, and a speech bubble detector 112 that detects speech bubbles. 【0037】 A scene image may contain at least one object from the content image file. In this invention, a scene search service for content can be provided using the scene image as the basic unit. 【0038】 In other words, in the present invention, the cut division unit 110 can divide the content image file into basic units for providing a search service. 
【0039】 The first database 120 may store multiple scene images generated by the cut division unit 110 (see S210, Figure 2). In this invention, the first database may also be referred to as the "source data storage". 【0040】 In the present invention, each scene image may be assigned an image identifier (e.g., an image ID), and the first database 120 may store the scene images in association with the image identifiers. 【0041】 Using the first database 120, the data retrieval unit can provide scene images corresponding to content scene searches by users based on image identifiers. 【0042】 The data processing unit 130 can generate information necessary for providing a search service from multiple scene images stored in the first database (see S220, Figure 2). 【0043】 The data processing unit 130 may include at least one of the embedding model 131, pose detector 132, and face detector 133. 【0044】 The data processing unit 130 can extract vector data from the scene image using the embedding model 131. 【0045】 In the present invention, the embedding model 131 may be a CLIP (Contrastive Language-Image Pre-Training) model capable of processing both images and text. 【0046】 The data processing unit 130 can extract vector data from scene images using the image encoder of the CLIP embedding model 131, enabling semantic-based searching. 【0047】 The data processing unit 130 can extract pose information from a scene image using a pose detector 132 and extract face information from a scene image using a face detector 133. 【0048】 Here, "pose information" can be understood as information related to the pose (position, arrangement, orientation, composition, etc.) of objects included in the scene image. 
For example, pose information may include information about the extent to which a person's body is included (e.g., whole body, upper body, lower body, etc.), whether the person is facing forward, and the person's posture (e.g., "a person in a pose with their arms raised," "a person in a pose sitting"). Furthermore, pose information may include pose information for various objects included in the scene image, such as "a structure in which a desk is placed in front of a red wall (arrangement of objects)," "a view spreading from the top of a mountain (composition of the background)," and "text spreading from left to right (arrangement of text)." 【0049】 "Facial information" refers to information about a person's face, and may include face size (e.g., the size, ratio, and ratio to the horizontal axis of the face area in the scene image), face angle (e.g., front view, profile view, 45-degree angle, downward-facing view, upward-facing view), facial expression (e.g., smiling (corners of the mouth turned up, eyes sparkling), angry face (wrinkled forehead, lips tightly closed), tired face (eyes half-closed, face looking lifeless)), and the gender and age of the person corresponding to the face. 【0050】 The second database is used by the data retrieval unit to perform similarity searches based on vectors, and is sometimes referred to as the "vector search DB" or "vector DB". 【0051】 The second database 140 may store at least one of the vector data and metadata (such as pose information and face information) generated by the data processing unit 130. 【0052】 Furthermore, the second database 140 may store vector data (which may include metadata) and image identifiers of scene images corresponding to the vector data, in an associated manner. 【0053】 For the second database 140, the data retrieval unit can convert user queries related to image retrieval into vectors, detect vector data with a high similarity to the converted query vectors, and provide image identifiers. 
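The similarity lookup described in paragraph 【0053】 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes cosine similarity as the similarity measure and a plain in-memory dictionary standing in for the second database 140, and the names `cosine_similarity` and `most_similar_ids` are hypothetical. A production vector database would typically replace the exhaustive scan with an approximate nearest-neighbor index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_ids(
    query_vector: Sequence[float],
    stored_vectors: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to k (image identifier, similarity) pairs, best match first.

    `stored_vectors` is a stand-in (assumption) for the second database:
    it maps an image identifier such as "a001" to that scene's vector data.
    """
    scored = [
        (image_id, cosine_similarity(query_vector, vector))
        for image_id, vector in stored_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    vectors = {"a001": [1.0, 0.0], "a002": [0.0, 1.0], "a003": [1.0, 1.0]}
    for image_id, score in most_similar_ids([1.0, 0.1], vectors, k=2):
        print(image_id, round(score, 3))
```

Only image identifiers are returned here; resolving them to actual scene images is left to the first database, which matches the separation of roles described above.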
【0054】 For the second database 140, the data retrieval unit uses an optimization algorithm for vector-based similarity searches, enabling rapid and efficient searching of a vast number of scene images in a short amount of time. 【0055】 On the other hand, in the present invention, vector data and metadata may be stored separately in the data storage unit 150. 【0056】 The data storage unit 150 can be understood as a backup database to prepare for the loss of the second database 140. 【0057】 In this invention, vector data generated in the data processing unit 130 can be stored in the data storage unit 150 along with metadata (see S230, Figure 2). Furthermore, the vector data and metadata can be transmitted to the second database 140 so that they are stored in the second database (see S240, Figure 2). 【0058】 The search tool 160 is a user interface for providing content scene search services and can receive image search queries from users (see S250, Figure 2). The data retrieval unit can provide the user with scene images corresponding to the user query as search results, based on the first database 120 and the second database, through the search tool 160. 【0059】 The search tool 160 can accept at least one of images and text as user queries. Furthermore, the search tool 160 can accept important elements for searching (selecting) scene images as filter conditions. 【0060】 The filter conditions can vary. The filter conditions may relate to at least one of the following: the pose and / or face of a person in the scene image. Furthermore, the filter conditions may relate to at least one of the following: genre, author, work (specific content), or blind processing of sensitive photographs. 【0061】 The search tool 160 can vectorize user queries. 
Based on the query vector and filter values corresponding to the filter conditions, the search tool 160 requests the data search unit to search for scene images similar to the query vector, and can receive image identifiers of similar scene images from the data search unit's second database 140 (see S260, Figure 2). 【0062】 The search tool 160 can extract scene images corresponding to user queries from the first database 120 based on image identifiers received from the second database 140 (see S270, Figure 2). The data retrieval unit can also provide the scene images extracted from the first database 120 as search results for user queries through the search tool 160 (see S280, Figure 2). 【0063】 Furthermore, the present invention may be configured to send and receive various information related to the provision of content search services via wired or wireless communication. Such information may be sent and received by a communication unit (or communication module) included in the above components 110 to 160. In addition, the present invention can communicate with an external server or user terminal 1 via a separate communication unit. 【0064】 The present invention allows for the construction of a database for searching vectors, using scene images as the basic unit. Furthermore, the database can be used to search for and provide scene images that users are looking for. In particular, the present invention not only classifies scene images (for example, classifying them based on predefined tags (e.g., long hair, face visible)), but also provides a search service for abstract user requests (e.g., "woman with long red hair," "gloomy street") by vectorizing scene images using the embedding model 131. 【0065】 The following describes how to effectively search for and provide scene images even for abstract user queries, based on the configuration described above. 
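The query flow of paragraphs 【0059】 to 【0062】 (filter conditions, similarity search in the second database, then lookup in the first database by image identifier) can be sketched roughly as below. Everything here is an assumption made for illustration: the dictionaries standing in for the two databases, the metadata key `body`, the exact-match filter semantics, and the function names `dot` and `search_scenes`. A plain dot product is used as the similarity score only to keep the sketch short.

```python
from typing import Any, Callable, Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))


def search_scenes(
    query_vector: Sequence[float],
    filters: Dict[str, Any],
    second_db: Dict[str, Dict[str, Any]],
    first_db: Dict[str, str],
    similarity: Callable[[Sequence[float], Sequence[float]], float] = dot,
    limit: int = 5,
) -> List[str]:
    """Return scene images for a query, best match first.

    second_db (assumed layout): image_id -> {"vector": [...], "metadata": {...}}
    first_db (assumed layout):  image_id -> scene image (a file name here)
    """
    # Step 1: keep only entries whose metadata satisfies every filter value.
    candidates = [
        (image_id, similarity(query_vector, entry["vector"]))
        for image_id, entry in second_db.items()
        if all(entry["metadata"].get(key) == value for key, value in filters.items())
    ]
    # Step 2: rank the remaining entries by similarity to the query vector.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    # Step 3: resolve the top image identifiers against the first database.
    return [first_db[image_id] for image_id, _ in candidates[:limit]]


if __name__ == "__main__":
    second = {
        "a001": {"vector": [1.0, 0.0], "metadata": {"body": "upper"}},
        "a002": {"vector": [0.9, 0.1], "metadata": {"body": "whole"}},
        "a003": {"vector": [0.0, 1.0], "metadata": {"body": "upper"}},
    }
    first = {"a001": "scene-a001.png", "a002": "scene-a002.png", "a003": "scene-a003.png"}
    print(search_scenes([1.0, 0.0], {"body": "upper"}, second, first))
```

Filtering before ranking is one possible ordering; ranking first and filtering the top results afterwards is equally plausible, and the source does not say which is used.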
【0066】 In this invention, the process of receiving image files related to content can be performed (see S310, Figure 3). 【0067】 In this invention, in order to provide a content scene search service, it is possible to receive (or collect) image files related to content. In this invention, it is possible to receive image files of content from an external server (for example, a content management server where the image files of the content are registered) or to receive image files from the user terminal 1 of the user (for example, the author) who generated the image files of the content. 【0068】 In this invention, the process of dividing an image file into scene units to generate scene images and storing the scene images in a first database can be performed (see S320, Figure 3). 【0069】 The cut division unit 110 can separate and extract multiple scenes from the content image file and generate a scene image corresponding to each of the multiple scenes. 【0070】 As described above, in the present invention, the content includes multiple scenes, and the image files of the content (original or source data) may include objects related to the multiple scenes (e.g., background, floor, surrounding objects, characters, speech bubbles, text, outlines, etc.). 【0071】 As shown in Figure 4, the cut division unit 110 can detect the outer frame lines 401, 402 and / or speech bubbles 403, 404 of the scene from the image file 400 using at least one of the outer frame line detector 111 and the speech bubble detector 112, and can divide the image file 400 into multiple scene images 410, 420. More specifically, the cut division unit 110 can generate multiple scene images 410, 420 by dividing the image file 400 based on the outer frame lines 401, 402 detected from the image file 400. 
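The division step of paragraph 【0071】 can be illustrated with a small sketch. It assumes that the outer frame line detector 111 has already produced one bounding box per scene; how that detection is done is not specified here and is outside the sketch. The image is modeled as a nested list of pixel values, and `Box`, `crop`, and `split_into_scenes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # rows of pixel values; a stand-in for real image data


@dataclass(frozen=True)
class Box:
    """Region enclosed by a detected outer frame line (right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int


def crop(image: Image, box: Box) -> Image:
    """Copy the pixels inside `box` out of `image`."""
    return [row[box.left:box.right] for row in image[box.top:box.bottom]]


def split_into_scenes(image: Image, frame_boxes: List[Box]) -> List[Image]:
    """Cut one content image into scene images, in reading order.

    Reading order is assumed to be top-to-bottom, then left-to-right,
    which suits vertically scrolling webcomics.
    """
    ordered = sorted(frame_boxes, key=lambda box: (box.top, box.left))
    return [crop(image, box) for box in ordered]


if __name__ == "__main__":
    page = [[row * 10 + col for col in range(4)] for row in range(6)]
    scenes = split_into_scenes(page, [Box(0, 3, 4, 6), Box(0, 0, 4, 3)])
    print(len(scenes), scenes[0][0], scenes[1][0])
```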
【0072】 The cut division unit 110 can generate a scene image by removing speech bubbles that extend beyond the outer frame of the scene, in order to improve the performance of the embedding model 131 that extracts vector data from the scene image. 【0073】 For example, in Figure 4, the first speech bubble 403 ("Would you like some pizza?") does not extend beyond the first outer frame line 401, and the cut division unit 110 can generate a first scene image 410 that includes the first speech bubble 403. In contrast, the second speech bubble 404 ("Excited") is outside the second outer frame line 402, and the cut division unit 110 can remove the second speech bubble 404 and generate a second scene image 420. 【0074】 In other words, the cut division unit 110 can, for example, use the outer frame lines 401 and 402 detected from the image file 400 to extract the region identified by the outer frame lines 401 and 402 in the image file 400 and generate scene images 410 and 420. 【0075】 As shown in Figure 4, the multiple scene images 410 and 420 generated by the cut division unit 110 may be stored in the first database 120. 【0076】 Scene images 410 and 420 stored in the first database 120 may be assigned an image identifier 410a (for example, "a001") to identify the scene image. The first database may store the scene image 410, the image identifier 410a assigned to the scene image 410, along with content information 410b to which the scene image 410 belongs and episode information 410c (for example, episode number information) of the content, all associated with each other. 【0077】 From the first database 120, based on the image identifier, the search tool 160 can detect (or provide) scene images corresponding to data with a high similarity to the user query as search results. 【0078】 In other words, the scene images stored in the first database 120 can be used by the search tool 160 to provide the actual source images (content image files or scene images) for the search results. 
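The speech bubble handling of paragraphs 【0072】 and 【0073】 and the record layout of paragraph 【0076】 can be sketched as follows. The containment rule (a bubble is kept only if its bounding box lies entirely within the outer frame) is one reading of "extends beyond the outer frame" and is an assumption, as are the names `is_inside`, `bubbles_to_keep`, `SceneRecord`, and `FirstDatabase`, and the use of an in-memory dictionary as storage.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def is_inside(inner: Box, outer: Box) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (
        inner[0] >= outer[0]
        and inner[1] >= outer[1]
        and inner[2] <= outer[2]
        and inner[3] <= outer[3]
    )


def bubbles_to_keep(frame: Box, bubbles: List[Box]) -> List[Box]:
    """Keep bubbles contained in the frame; drop those that extend beyond it."""
    return [bubble for bubble in bubbles if is_inside(bubble, frame)]


@dataclass
class SceneRecord:
    """One first-database entry: scene image plus its associated information."""
    image_id: str       # e.g. "a001"
    content: str        # content (work) the scene belongs to
    episode: int        # episode number within that content
    image_bytes: bytes  # the scene image itself


class FirstDatabase:
    """In-memory stand-in (assumption) for the source data storage."""

    def __init__(self) -> None:
        self._records: Dict[str, SceneRecord] = {}

    def put(self, record: SceneRecord) -> None:
        self._records[record.image_id] = record

    def get(self, image_id: str) -> SceneRecord:
        return self._records[image_id]


if __name__ == "__main__":
    frame = (0, 0, 100, 100)
    print(bubbles_to_keep(frame, [(10, 10, 40, 30), (80, 90, 120, 110)]))
```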
【0079】 In this invention, it is possible to extract vector data from a scene image using an embedding model and store the vector data in a second database (see S330, Figure 3). 【0080】 The data processing unit 130 can analyze the multiple scene images 410 and 420 stored in the first database 120 and generate the data necessary for providing the search service. 【0081】 As described above, the data processing unit 130 may include at least one of the embedding model 131, the pose detector 132, and the face detector 133. 【0082】 As shown in Figure 5, the data processing unit 130 can extract vector data 131a of the scene image 410 using the embedding model 131. 【0083】 In the present invention, the embedding model 131 may be a CLIP (Contrastive Language-Image Pre-Training) model capable of processing both images and text. In the present invention, the CLIP model will also be given the same reference numeral "131" in the drawings and described accordingly. 【0084】 The CLIP embedding model 131 can process images and text simultaneously. More specifically, the CLIP embedding model 131 includes an image embedding model and a text embedding model, which may be trained to share the same vector space. Such a CLIP embedding model 131 can measure (determine or calculate) the similarity between an image and text. 【0085】 The data processing unit 130 can use the CLIP embedding model 131 to generate vector data that includes the visual features of the scene image 410 (e.g., "blue sky," "man and woman") and the abstract meaning of the scene (e.g., "the female protagonist is smiling," "male and female employees are deciding on lunch menus at work"). 【0086】 In other words, the data processing unit 130 can generate vector data corresponding to the scene image 410 by comprehensively considering the objects, text, and semantic context contained in the scene image 410, so that meaning-based searching becomes possible. 
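Because the image embedding model and the text embedding model share one vector space, an image and a text can be compared directly. One common way to express this, assuming cosine similarity as the measure, is shown below. The symbols are introduced here for illustration only: $f_{\mathrm{img}}$ and $f_{\mathrm{txt}}$ denote the image and text encoders, $x$ a scene image, and $t$ a text query.

```latex
\mathrm{sim}(x, t) =
  \frac{f_{\mathrm{img}}(x) \cdot f_{\mathrm{txt}}(t)}
       {\lVert f_{\mathrm{img}}(x) \rVert \, \lVert f_{\mathrm{txt}}(t) \rVert}
```

Under this assumption, a text query such as "gloomy street" and a scene image that depicts one would yield a value close to 1, while unrelated pairs would score lower.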
【0087】 Furthermore, the data processing unit 130 can extract metadata from the scene image 410 using at least one of the pose detector 132 and the face detector 133. 【0088】 The data processing unit 130 can use the pose detector 132 to extract pose information 132a (e.g., key points, composition of people) from the scene image 410 as metadata, and use the face detector 133 to extract face information 132a (e.g., face size, ratio, etc.) from the scene image 410 as metadata. 【0089】 As described above, in the present invention, "pose information" can be understood as information related to the pose (position, arrangement, orientation, composition, etc.) of objects included in the scene image. For example, the pose information may include information on the extent to which a person's body is included (e.g., whole body, upper body, lower body, etc.), whether the person is facing forward, and the person's posture (e.g., "a person in a pose with their arms raised," "a person in a sitting pose"). The pose information may also include pose information for various objects included in the scene image, such as "a structure in which a desk is placed in front of a red wall (arrangement of objects)," "a view spreading from the top of a mountain (composition of the background)," and "text spreading from left to right (arrangement of text)." 【0090】 "Facial information" refers to information about a person's face, and may include face size (e.g., the size, ratio, and ratio to the horizontal axis of the face area in the scene image), face angle (e.g., front view, profile view, 45-degree angle, downward-facing view, upward-facing view), facial expression (e.g., smiling (corners of the mouth turned up, eyes sparkling), angry face (wrinkled forehead, lips tightly closed), tired face (eyes half-closed, face looking lifeless)), and the gender and age of the person corresponding to the face. 
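As an illustration of the face size metadata in paragraph 【0090】, the sketch below derives two ratios from a face bounding box: the share of the scene image area covered by the face, and the face width relative to the horizontal axis of the scene image. It assumes the face detector 133 returns a box in pixel coordinates; the function name `face_size_metadata` and the dictionary keys are hypothetical.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def face_size_metadata(face_box: Box, image_width: int, image_height: int) -> Dict[str, float]:
    """Compute face size ratios for one detected face.

    Returns the face area as a fraction of the scene image area and the
    face width as a fraction of the scene image width.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    left, top, right, bottom = face_box
    face_width = max(0, right - left)
    face_height = max(0, bottom - top)
    return {
        "face_area_ratio": (face_width * face_height) / (image_width * image_height),
        "face_width_ratio": face_width / image_width,
    }


if __name__ == "__main__":
    # A 40 x 50 pixel face in a 200 x 100 pixel scene image.
    print(face_size_metadata((10, 10, 50, 60), image_width=200, image_height=100))
```

Values like these could then be stored as metadata alongside the vector data and compared against a filter threshold at query time.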
【0091】 The data processing unit 130 can extract various metadata necessary for searching the scene image 410 from the scene image 410. For example, the data processing unit 130 can extract various metadata 134a such as the size information of the scene image, the location information of the scene image 410 in the image file 400, and the genre information of the content to which the scene image 410 belongs. 【0092】 The vector data generated by the data processing unit 130 may be stored in the second database 140. The pose information and face information generated by the data processing unit 130 may also be stored in the second database 140 as metadata along with the vector data. 【0093】 As shown in Figure 5, the second database 140 may store the image identifier 410a, vector data 510, and metadata 520 of each scene image in association with each other. The vector data 510 stored in the second database 140 includes the visual and semantic features 510a, 510b of the scene image (for example, "the female protagonist is smiling," "male and female employees are deciding on lunch menus at work"), and the second database 140 can use the vector data to search for and provide scene images even for abstract user queries. 【0094】 As shown in Figure 5, the second database 140 may store each scene image, associated with its image identifier 410a, vector data 510, and metadata 520. The vector data 510 stored in the second database 140 may include visual and semantic features of the scene image (e.g., "The female protagonist is smiling," 510a; "Male and female employees are deciding on lunch menus in the office," 510a, 510b). Even if such information is stored in the second database 140, a search for scene images corresponding to an abstract user query can be performed by a processor or search engine included in the data processing unit 130, which has access to the second database 140. 
The processor or search engine compares the features derived from the user query with the vector data 510 and retrieves the scene image with the highest similarity or relevance. Thus, the second database 140 functions as a storage repository, and the actual search and matching processes are performed by the data processing unit 130 using the stored vector data. 【0095】 The metadata 520 stored in the second database 140 includes at least one of the pose information 521 and face information 522 of the scene image. Based on the second database 140, the metadata 520 can be used to search for and provide scene images that conform to various filter conditions 521a, 522a (for example, "upper body visible", "face ratio 20%"). 【0096】 The second database 140 can calculate the similarity between the vector data 510 and the query vector corresponding to the user query, and based on that similarity, it can provide (return) an image identifier for the scene image. The second database 140 stores vector data 510 corresponding to each scene image. The similarity calculation between the vector data 510 and the query vector derived from the user query is not performed by the database itself, but by a processor or similarity calculation module included in the data processing unit 130. The processor reads the vector data 510 from the second database 140, calculates a similarity measure (such as cosine similarity or a distance-based measure) between it and the query vector, and can determine the scene image with the highest similarity. Based on the calculated similarity, the data processing unit 130 can provide (return) the image identifier of the corresponding scene image. 【0097】 In this invention, a search tool can receive queries related to image searches, convert the queries into vectors, and perform a process to detect highly similar data from the vector data of a second database (see S340, Figure 3). 
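The similarity detection of step S340 can be sketched as follows, assuming cosine similarity as the measure (one of the options the text names). This is a minimal illustration rather than the applicant's implementation: the function names, the in-memory dictionary standing in for the second database, and the two-dimensional toy vectors are all assumptions. Real embedding vectors would have far more dimensions, and a production vector store would typically use an index instead of a full scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float],
    stored: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return (image identifier, score) pairs in descending order of similarity."""
    scored = [
        (image_id, cosine_similarity(query_vec, vec))
        for image_id, vec in stored.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-in for the vector data held in the second database.
stored_vectors = {
    "scene-001": [1.0, 0.0],
    "scene-002": [0.0, 1.0],
    "scene-003": [1.0, 1.0],
}

for image_id, score in rank_by_similarity([1.0, 0.2], stored_vectors):
    print(image_id, round(score, 3))
```

Only image identifiers and scores leave this step; the scene images themselves are fetched separately from the first database.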
Furthermore, in this invention, images corresponding to highly similar data from the scene images can be extracted from the first database and provided as search results for the search tool (see S350, Figure 3). 【0098】 The search tool 160 may be executed from the user terminal 1. The search tool 160 may be installed on the user terminal 1 based on user selection. The user can download and install the search tool 160 on the user terminal 1 via the server provided in the present invention. The search tool 160 can also be accessed via a web browser installed on the user terminal 1. In this case, the present invention can provide the search tool 160 as a service page on the user terminal 1. 【0099】 As shown in Figures 6A and 6B, the search tool 160 may include a first area 610 for inputting search information, and a second area 620 for outputting (displaying) scene images corresponding to data with a high similarity to the search information. 【0100】 The first area 610 may include at least one of the following: first input windows 611 and 612 for entering queries from search information, and second input windows 613 and 614 for entering filter conditions. 【0101】 The first input windows 611 and 612 can be used to input an image (reference numeral "630" in Figure 6A) or text (reference numeral "650" in Figure 6B) corresponding to the query. The first input window 611 or 612 may accept an image 630 (see Figure 6A) or text 650 (see Figure 6B) corresponding to a user query. In Figure 6A, the image 630 is shown outside the boundary of the input window 611, but this image 630 is an example of an image selected or uploaded via the input window 611. The illustration is for illustrative purposes only to show the image 630 after it has been added to the system, and does not limit the positional or visual placement of uploaded images within the input window 611. 【0102】 Image queries 630 can be uploaded and input from user terminal 1. 
The search tool 160 can provide reference images and identify a reference image selected from among them as the image query 630. As shown in Figure 6A, the first area 610 includes an area for selecting the user query input method, and when the area corresponding to the image upload 160a input method is selected, the search tool 160 can provide a first input window 611 to enable image uploading. When the area corresponding to the reference image 160b input method is selected, the search tool 160 can provide an input window to select one of the at least one reference image registered with the search tool 160. 【0103】 The text query 650 may include natural language text that describes the scene the user is looking for. In this invention, the text query 650 may also include abstract content (for example, "a woman with long red hair," "a gloomy street"). As shown in Figure 6B, the search tool 160 can provide a first input window 612 so that the text query 650 is entered when an area corresponding to the text input method 160c is selected. 【0104】 Furthermore, the search tool 160 can accept both image queries and text queries. In this case, the search tool 160 can search for and provide scene images using both image queries and text queries. 【0105】 The second input windows 613 and 614 allow the user to enter various filter conditions for searching for scene images. 【0106】 The filter conditions may relate to at least one of the poses and faces of people in the scene image. 【0107】 As shown in Figure 7(a), the search tool 160 allows the user to input filter conditions related to face size. For example, the search tool 160 can accept size information (e.g., 710a, 720a) of face images 710 and 720 in the scene image as filter conditions. 【0108】 Furthermore, as shown in Figure 7(b), the search tool 160 can accept filter conditions related to the poses of people. 
For example, the search tool 160 can accept information 730a, 740a, including the bodies (e.g., upper body, full body) and poses (e.g., front, back) of people 730, 740 included in the scene image, as filter conditions. 【0109】 As shown in Figure 8, the search tool 160 of the data retrieval unit can convert image search queries 630 and 650 into vectors. To enable semantic-based searching, the search tool 160 can convert image queries 630 into vectors using the image encoder 134 of the CLIP embedding model, and convert text queries 650 into vectors using the text encoder 131b of the CLIP embedding model. If both image queries 630 and text queries 650 are input, the search tool 160 can convert both image queries 630 and text queries 650 into vectors. 【0110】 The search tool 160 can request a search by transmitting the query vector along with the filter information 700 to the second database 140. 【0111】 On the other hand, query vectorization can also be performed in the second database 140. In this case, the second database 140 of the data retrieval unit can receive queries related to image searches from the user terminal's search tool and convert the received queries into vectors. The method for converting queries into vectors is the same as that of the search tool 160, so a detailed explanation will be omitted. In the following, the vectorization of queries will not be distinguished as to whether it was performed in the second database 140 or the search tool 160. 【0112】 Query vectorization can be processed in conjunction with the second database 140. In practice, the process of converting received queries into vectors may not be performed by the second database 140 itself, but rather by a processor or vectorization module included in the data processing unit 130 or the search tool 160. 
The processor can convert received queries into vectors by accessing the second database 140 to obtain the necessary reference data and model parameters, and then executing program instructions that apply the same vectorization method used by the search tool 160. 【0113】 Therefore, the second database 140 functions as a storage repository for vector data and model parameters, and the actual conversion to vectors may be performed by a processor. For simplicity, in the following description, we will not distinguish whether the processor performing the vectorization is related to the second database 140 or to the search tool 160. 【0114】 The second database 140 of the data retrieval unit can use query vectors and filter information to return image identifiers of scene images to the search tool 160 in order of similarity. 【0115】 The data retrieval unit can calculate the similarity between the query vector and the vector data of the scene image based on the second database 140. For example, the data retrieval unit can calculate the similarity between the query vector and multiple vector data based on cosine similarity based on the second database 140. The data retrieval unit can provide the image identifiers of the scene images to the search tool 160 in descending order of similarity based on the second database 140. 【0116】 In this case, the data retrieval unit can provide the search tool 160 with an image identifier for a scene image having metadata corresponding to the filter conditions, based on the second database 140. 【0117】 The data retrieval unit can provisionally identify scene images corresponding to the filter conditions based on the second database 140. 
Furthermore, the data retrieval unit can calculate the similarity between the vector data of each identified scene image and the query vector based on the second database 140, and return the image identifiers of the scene images that satisfy the filter conditions to the search tool 160 in descending order of similarity to the query. 【0118】 Furthermore, if the query includes both images and text (when a user searches for a desired image using both images and text), the data retrieval unit can identify scene images with a high similarity to either the image or the text based on the second database 140, and, from among the identified scene images, return the image identifiers of the scene images with a high similarity to the other. 【0119】 For example, the data retrieval unit can first calculate the similarity between a text query and a scene image based on the second database 140. The data retrieval unit can then calculate the similarity between the image query and a predetermined number (or predetermined ratio) of scene images that have a high similarity to the text query. The data retrieval unit can also return image identifiers from the identified scene images in descending order of similarity to the image query. 【0120】 Furthermore, as shown in Figure 9, the data retrieval unit can, based on the second database 140, respond to a search request from the search tool 160 and provide response data 900 for scene images with a high degree of similarity to the query. 
The response data 900 may include at least some of the following: the image identifier of the scene image with a high degree of similarity to the query, the similarity score, the content information to which the scene image belongs (e.g., the title of the work), the episode information of the content to which the scene image belongs (e.g., the episode number information), the location information of the scene image in the content image file (e.g., the coordinates where the scene image begins), and metadata (e.g., face size ratio information, whether the character appears in full or upper body, etc.). 【0121】 The data retrieval unit can use the image identifier returned based on the second database 140 to detect scene images from the first database 120. 【0122】 The search tool 160 can transmit the image identifier returned based on the second database 140 to the first database 120 and request the data retrieval unit to extract the scene image corresponding to the image identifier. 【0123】 Based on the first database 120, the data retrieval unit can respond to a request from the search tool 160 by returning to the search tool 160 the scene image corresponding to the image identifier from among multiple scene images. 【0124】 In other words, the data retrieval unit can extract scene images from the first database 120 that correspond to data with a high similarity to the query entered by the user, and provide them as search results in the search tool 160. 【0125】 The second area 620 of the search tool 160 may display (output) scene images as search results for the query. In this case, the search tool 160 can display scene images 641, 642, 661, and 662 in the second area 620 in order of their similarity to queries 630 and 650. 【0126】 As shown in Figure 6A, the search tool 160 can display scene images in descending order of similarity along the first direction A to the second direction B. 
The search tool 160 can output scene image 641, which has the highest similarity to the image query 630, and display scene image 642, which has the second highest similarity, in the second direction B of scene image 641. As shown in Figure 6B, the search tool 160 can output scene image 661, which has the highest similarity to the text query 650, in the first direction A, and display scene image 662, which has the next highest similarity, along the second direction B from the first direction A. 【0127】 Furthermore, the search tool 160 can display in the second area 620 at least one of the following: a thumbnail image of a scene-based image and episode information of the content to which the scene-based image belongs. 【0128】 The search tool 160 can provide, as search results, at least a portion of the following: thumbnails of scene images with high similarity to queries 630 and 650; content information to which the scene image belongs; episode information of the content to which the scene image belongs; location information of the scene image within the episode; and the original image of the scene image. 【0129】 As shown in Figure 6A, the search tool 160 can display a specific scene image 641 (or a thumbnail of a scene image) with a high similarity to the query 630 in the second area 620. The search tool 160 can also display content information 641a (e.g., content title or a link to the content page), episode number information, or location information of the specific scene image in the episode (e.g., "scroll 7.7%") around the specific scene image 641 (or thumbnail). The search tool 160 can further display a graphic object 641b around the specific scene image 641 (or thumbnail) that is linked to providing (downloading) the original image of the specific scene image. Based on the selection of the graphic object 641b, the search tool 160 can provide the specific scene image stored in the first database 120. 
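The retrieval flow described in paragraphs 0114 to 0124 (metadata filtering, similarity ranking, a text-then-image pass for combined queries, and lookup of the scene image by identifier) can be sketched as below. This is a simplified sketch under stated assumptions: the function names, the dictionary layouts standing in for the first and second databases, the "body" metadata key, and the toy two-dimensional vectors are all hypothetical, and the text and image query vectors are assumed to lie in the same embedding space as the stored scene vectors.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search_combined(
    text_vec: Sequence[float],
    image_vec: Sequence[float],
    second_db: Dict[str, dict],
    keep: int,
    passes_filter: Callable[[dict], bool],
) -> List[str]:
    """Return image identifiers for a query that has both a text and an image part."""
    # Step 1: provisionally keep only scenes whose metadata meets the filter conditions.
    candidates = [i for i, rec in second_db.items() if passes_filter(rec["metadata"])]
    # Step 2: keep a predetermined number of scenes most similar to the text query.
    by_text = sorted(
        candidates,
        key=lambda i: cosine(text_vec, second_db[i]["vector"]),
        reverse=True,
    )[:keep]
    # Step 3: order the survivors by similarity to the image query, highest first.
    return sorted(
        by_text,
        key=lambda i: cosine(image_vec, second_db[i]["vector"]),
        reverse=True,
    )


def fetch_scene_images(image_ids: List[str], first_db: Dict[str, bytes]) -> List[bytes]:
    """Look up the stored scene images by identifier, preserving the ranking order."""
    return [first_db[i] for i in image_ids if i in first_db]


# Toy stand-ins for the second database (vectors + metadata) and first database (images).
second_db = {
    "s1": {"vector": [1.0, 0.0], "metadata": {"body": "upper_body"}},
    "s2": {"vector": [0.8, 0.6], "metadata": {"body": "upper_body"}},
    "s3": {"vector": [0.0, 1.0], "metadata": {"body": "upper_body"}},
    "s4": {"vector": [0.9, 0.1], "metadata": {"body": "whole_body"}},
}
first_db = {"s1": b"image-1", "s2": b"image-2", "s3": b"image-3", "s4": b"image-4"}

ids = search_combined(
    text_vec=[1.0, 0.0],
    image_vec=[0.0, 1.0],
    second_db=second_db,
    keep=2,
    passes_filter=lambda meta: meta["body"] == "upper_body",
)
print(ids, fetch_scene_images(ids, first_db))
```

Filtering before ranking keeps the similarity computation limited to scenes that can actually be returned. The text mentions a predetermined ratio as an alternative to a fixed count for the second step, which would only change how keep is derived.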
【0130】 As described above, the content scene search method and system according to the present invention can improve search performance by dividing the content image file into scene units to generate scene images and storing them in a first database, thereby constructing a database using the data necessary for providing the search service. 【0131】 Furthermore, the content scene search method and system according to the present invention extracts vector data of scene images using an embedding model and stores the vector data in a second database, thereby enabling semantic-based content scene searches even for abstract user queries and efficiently providing the scene images that users need. 【0132】 Furthermore, the content scene search method and system according to the present invention can receive queries related to image search using a search tool, convert the queries into vectors, detect highly similar data from vector data in a second database, and search for and provide scene images based on queries corresponding to images or text. 【0133】 Furthermore, the content scene search method and system according to the present invention can extract scene images corresponding to highly similar data from the already stored scene images in the first database and provide them as search results for the search tool. Users can easily find the scene images they need for content creation, design, and marketing work, and the present invention can improve the user's work efficiency. 【0134】 Furthermore, the present invention described above can be implemented as computer-readable code or instruction words on a medium on which a program is recorded. That is, the present invention can also be provided in the form of a program or in the form of a program stored on a recording medium. 【0135】 On the other hand, computer-readable media include all types of recording devices that store data readable by a computer system. 
Examples of computer-readable media include HDDs (Hard Disk Drives), SSDs (Solid State Disks), SDDs (Silicon Disk Drives), ROMs, RAMs, CD-ROMs, magnetic tapes, floppy disks, and optical data storage devices. 【0136】 Furthermore, the computer-readable medium may include storage and may be a server or cloud storage accessible by electronic devices via communication. In this case, the computer can download the program according to the present invention from the server or cloud storage via wired or wireless communication. 【0137】 Furthermore, in the present invention, the computer described above is an electronic device equipped with a processor, i.e., a CPU (Central Processing Unit), and its type is not particularly limited. In the present invention, the "computer" described above may be implemented by an electronic device including at least one hardware processor and at least one memory device. The processor (CPU, GPU, DSP, ASIC, etc.) may execute program instructions stored in memory. The memory may include non-transitory computer-readable media such as ROM, RAM, flash memory, and other storage devices that store instructions for performing functions relating to the data processing unit 130, the search tool 160, and other software modules described herein. 【0138】 Furthermore, the computer may include input / output interfaces and communication circuits that enable data exchange with user terminal 1, first database 120, and second database 140. Therefore, the functions described herein are realized by the execution of program instructions by such a processor, and the term "computer" is not limited to any particular architecture or configuration. 【0139】 On the other hand, the detailed description above should not be interpreted restrictively in any way, but should be considered illustrative. 
The scope of the invention should be determined by a reasonable interpretation of the appended claims, and all modifications within the scope of the equivalents of the invention are included within the scope of the invention. 【0140】 This application claims priority based on Patent Application No. 10-2024-0177684, filed with the Korean Intellectual Property Office on 3 December 2024, and the entire contents of said Patent Application No. 10-2024-0177684 are incorporated into this application by reference.

Claims

[Claim 1] A content scene search method performed by a computer device having at least one processor, the method comprising the steps of: receiving an image file related to content; dividing the image file into scene units to generate scene images, and storing the scene images in a first database; extracting vector data of the scene images using an embedding model, and storing the vector data in a second database; receiving a query related to image search using a search tool, converting the query into a vector, and detecting data with high similarity from the vector data of the second database; and extracting, from among the scene images in the first database, scene images that correspond to the data with high similarity, and providing them as search results of the search tool.
[Claim 2] The content scene search method according to claim 1, wherein a detector is used to detect the outline of a scene and speech bubbles from the image file, and speech bubbles that extend beyond the outline of the scene are removed to generate the scene image.
[Claim 3] The content scene search method according to claim 1, wherein each scene image stored in the first database is assigned an image identifier, and, based on the image identifier, a scene image corresponding to the data with high similarity is provided as a search result in the search tool.
[Claim 4] The content scene search method according to claim 1, wherein the step of storing in the second database comprises the steps of: extracting the vector data from the scene image using an image encoder of the embedding model so that semantic-based searching is possible; and extracting pose information using a pose detector and extracting face information using a face detector.
[Claim 5] The content scene search method according to claim 4, characterized in that the embedding model is a CLIP model capable of processing images and text, respectively.
[Claim 6] The content scene search method according to claim 4, wherein the pose information and the face information are stored as metadata together with the vector data in the second database.
[Claim 7] The content scene search method according to claim 1, further comprising the step of storing the vector data together with metadata in a data storage unit and transmitting it to the second database.
[Claim 8] The content scene search method according to claim 1, wherein the step of detecting the data with high similarity comprises the steps of: converting the query into a vector and requesting a search from a data retrieval unit of the computer device along with filter information; and detecting, using the data retrieval unit, image identifiers of the scene images in descending order of similarity based on the vector and the filter information.
[Claim 9] The content scene search method according to claim 8, wherein the step of providing the search results of the search tool comprises detecting the scene image from the first database using the image identifier and transmitting it to a user terminal on which the search tool is executed.
[Claim 10] The content scene search method according to claim 1, wherein the search tool includes a first area in which search information is entered and a second area in which scene images corresponding to the data with high similarity are displayed, and the second area displays the scene images in descending order of similarity.
[Claim 11] The content scene search method according to claim 10, wherein the first area includes: a first input window into which an image or text corresponding to the query is entered; and a second input window into which a filter condition relating to at least one of the pose and face of a person in the scene image is entered.
[Claim 12] The content scene search method according to claim 10, wherein at least one of a thumbnail image of the scene image and episode information of the content to which the scene image belongs is displayed in the second area.
[Claim 13] A content scene search service provision system comprising: a first database that stores scene images obtained by dividing an image file related to content into scene units; a data processing unit that extracts vector data of the scene images using an embedding model; a second database that stores the vector data; and a data retrieval unit that receives a query related to image search from a search tool on a user terminal, converts the query into a vector, detects data with high similarity from the vector data of the second database, extracts, from among the scene images of the first database, scene images that correspond to the data with high similarity, and provides them as search results of the search tool.
[Claim 14] The content scene search service provision system according to claim 13, wherein the vector data is extracted from the scene image using an image encoder of the embedding model, and the embedding model is a CLIP model capable of processing images and text, respectively.
[Claim 15] A program executed by one or more processors in an electronic device, the program including instructions for performing the steps of: receiving an image file related to content; dividing the image file into scene units to generate scene images, and storing the scene images in a first database; extracting vector data of the scene images using an embedding model, and storing the vector data in a second database; receiving a query related to image search using a search tool, converting the query into a vector, and detecting data with high similarity from the vector data of the second database; and extracting, from among the scene images in the first database, scene images that correspond to the data with high similarity, and providing them as search results of the search tool.