Information processing device, information processing method, and program
The information processing apparatus enhances image labeling efficiency and accuracy by using supplementary information to display similar images together, facilitating quick and precise label selection.
Patent Information
- Authority / Receiving Office: JP · JP
- Patent Type: Applications
- Current Assignee / Owner: CANON KK
- Filing Date: 2024-12-03
- Publication Date: 2026-06-15
Abstract
Description
[Technical Field] 【0001】 This invention relates to an information processing technology for classifying subjects and other objects depicted in an image. [Background Art] 【0002】 Labeled image data is used for various purposes, such as training machine learning models or classifying images to improve their usability, and tools exist for assigning labels. However, labeling a large number of images is labor-intensive, and efficiency improvements are needed. Furthermore, if the worker has a poor understanding of the concepts represented by the labels, it may be difficult to determine the correct label, and incorrect labels may be assigned. 【0003】 Patent Document 1 discloses an information processing device that classifies already-labeled images by label and displays them on the screen of a labeling tool. By using the information processing device described in Patent Document 1, workers can work while referring to the displayed labeled images, which is expected to improve work efficiency and reduce errors in label judgment. [Prior art documents] [Patent Documents] 【0004】 [Patent Document 1] Japanese Patent Publication No. 2019-114018 [Summary of the Invention] [Problems that the invention aims to solve] 【0005】 However, if there are many types of labels to assign, it can take time to find the correct label among the displayed labels, and it becomes difficult to assign labels by comparing images that look similar, which may reduce both the efficiency and accuracy of the work. 【0006】 Therefore, the present invention aims to improve the efficiency and accuracy of labeling work. 
[Means for solving the problem] 【0007】 The information processing apparatus of the present invention comprises: a storage means for storing an image to which a label has been assigned and supplementary information associated with the image; a similarity acquisition means for acquiring the similarity between supplementary information associated with a target image and supplementary information associated with the image stored in the storage means; and a candidate output means for selecting at least one label candidate for the target image based on the similarity and outputting the label candidate and an image corresponding to the label candidate. The similarity acquisition means acquires the vector distance between the supplementary information associated with the target image and the supplementary information associated with the image stored in the storage means as the similarity, and the storage means stores the label candidate selected by the user from among the label candidates output by the candidate output means as a label to be assigned to the target image. [Effects of the Invention] 【0008】 According to the present invention, the efficiency and accuracy of work can be improved. [Brief explanation of the drawing] 【0009】 [Figure 1] This figure shows an example of the functional configuration of an information processing system. [Figure 2] This is a flowchart of the labeling process. [Figure 3] This is a flowchart of the similarity acquisition process. [Figure 4] This figure shows an example of supplementary information for the target image. [Figure 5] This figure shows an example of a candidate image list. [Figure 6] This figure shows an example of what is displayed on the screen. [Figure 7] This is a flowchart for selecting or inputting labels according to user instructions. [Figure 8]It is a diagram showing an example of the screen display content according to the second modification. 
[Figure 9] It is a diagram showing a functional configuration example of an information processing system according to the third modification. [Figure 10] It is a flowchart of the labeling process according to the third modification. [Figure 11] It is a diagram showing an example of the screen display content according to the fourth modification. [Figure 12] It is a diagram showing a hardware configuration example of an information processing device. 【Embodiments for Carrying Out the Invention】 【0010】 Hereinafter, embodiments according to the present invention will be described with reference to the drawings. The following embodiments do not limit the present invention, and not all of the plurality of features described in these embodiments are essential for the solution means of the present invention, and those plurality of features may be arbitrarily combined. The configuration of the embodiment can be appropriately modified or changed according to the specifications of the device to which the present invention is applied and various conditions (usage conditions, usage environment, etc.). Also, in the following embodiments, the same or similar configurations and processing steps are denoted by the same reference numerals, and duplicate descriptions are omitted. 【0011】 FIG. 1 is a diagram showing a functional configuration example of an information processing system including an information processing device according to the present embodiment. As shown in FIG. 1, the information processing system includes a server device 100 which is an example of an information processing device according to the present embodiment, and at least one terminal device 200. Note that, in FIG. 
1, an information processing system in which the server device 100 and the terminal device 200 are separated is taken as an example, but the information processing device according to the present embodiment may be a single device including both functions of those server device and terminal device. 【0012】 The display unit 150 of the terminal device 200 includes a display panel such as a liquid crystal panel or an organic EL panel, and displays images, character information, etc. The reception unit 160 of the terminal device 200 includes an input device such as a mouse, a keyboard, a touch panel, etc., and receives various instructions from the user. 【0013】 The storage unit 110 of the server device 100 stores two types of data: labeled data and data to be labeled. The labeled data is a set of an image, the label associated with it, and supplementary information. Assume that an image processed by the information processing system of this embodiment shows some subject, and the label is a concept representing the name or state of the subject. However, assume that the labels are concepts within a consistent framework, and there is no state where labels representing the name of the subject and labels representing the state of the subject are mixed. Also, the supplementary information is information regarding the time and space at the time of image capture, and includes at least one or more of, for example, position information at the time of image capture, time information at the time of image capture, and weather information at the time of image capture. For example, as position information, position information such as latitude and longitude obtained by GPS, etc., as time information, the date and time of year, and as weather information, temperature, humidity, precipitation, etc. are examples. These are just examples, and the supplementary information may further include other information. 
The data to be labeled is data that is the target of label assignment in the information processing system of this embodiment, and is at least a set of an image and the supplementary information associated with it. Note that the image that is the target of label assignment may be either an image that already has a label or an image that does not have a label. 【0014】 Hereinafter, the label assignment process executed in the information processing system will be described with reference to the functional configuration in FIG. 1 and the flowchart in FIG. 2. FIG. 2 is a flowchart showing the flow of a label assignment process, which is an example of information processing executed by the server device 100 according to this embodiment. In the following description, as a specific example, assume that the subjects in the image are various types of birds, and the label is the type of the bird that is the subject. Also, assume that the storage unit 110 already stores a sufficient amount of data for the similarity acquisition unit 130, which will be described later, to acquire (calculate) the similarity. 【0015】 First, in step S101, the acquisition unit 120 of the server device 100 acquires one image to be labeled (hereinafter referred to as the target image) and the supplementary information associated with that target image from the data to be labeled stored in the storage unit 110. 【0016】 Next, in step S102, the similarity acquisition unit 130 acquires the similarity between the supplementary information of the target image and the supplementary information associated with each other image stored in the storage unit 110, and outputs a candidate image list containing the similarity scores, etc. The method by which the similarity acquisition unit 130 obtains the similarity will be explained below with reference to the flowchart in Figure 3 and the table in Figure 4, which shows the values for each item included in the supplementary information associated with the target image. 
As shown in the table in Figure 4(a), the items included in the supplementary information associated with the target image are, for example, the date and time as time information, the latitude and longitude as location information, and the temperature, humidity, and precipitation as weather information. 【0017】 First, in step S201, the similarity acquisition unit 130 converts the values of periodic items among the supplementary information associated with the target image into angles. In the example in Figure 4(a), the periodic items are "date" and "time". For the "date" value, the similarity acquisition unit 130 treats one year as one full cycle, for example, with January 1st at the start of the cycle and December 31st at its end, so that one day corresponds to 1/365 of the cycle, and expresses the angle as the fraction of the cycle that has elapsed. The date "7/21" (July 21st) exemplified in Figure 4(a) is the 202nd day counting from January 1st, so the similarity acquisition unit 130 calculates 202 [days] ÷ 365 [days] = 0.55 as the angle value representing July 21st. Similarly, for the "time" value, the similarity acquisition unit 130 treats one day as one full cycle, for example, with 0:00 at the start of the cycle and 23:59 at its end, so that one minute corresponds to 1/1440 of the cycle. Since the time "17:04" exemplified in Figure 4(a) is 1024 minutes from 0:00, the similarity acquisition unit 130 calculates 1024 [minutes] ÷ 1440 [minutes] = 0.71 as the angle value representing 17:04. Figure 4(b) shows the supplementary information for the target image after the date and time values of the periodic items have been converted into angles, as described above. 【0018】 Next, in step S202, the similarity acquisition unit 130 multiplies the value of each item in the supplementary information of the target image by a weight (hereinafter referred to as the scale adjustment weight) to adjust for the differences in scale between those items. One possible way to set the scale adjustment weight is to use the reciprocal of the maximum value for each item in the supplementary information stored in the storage unit 110. 
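Steps S201 and S202 can be sketched as follows. This is a minimal illustration, not the implementation described in the specification: the function names, the dictionary layout, and the non-leap reference year are assumptions made here. The periodic values are expressed as the fraction of the cycle that has elapsed, which is what the worked values 0.55 and 0.71 in the text correspond to.

```python
from datetime import date


def date_to_cycle_fraction(month, day, year=2023):
    """Fraction of the year elapsed, counting January 1st as day 1.

    The year is a hypothetical non-leap year, chosen so that the
    divisor 365 used in the text applies.
    """
    day_of_year = date(year, month, day).timetuple().tm_yday
    return day_of_year / 365


def time_to_cycle_fraction(hour, minute):
    """Fraction of the day elapsed, with 1 minute = 1/1440 of the cycle."""
    return (hour * 60 + minute) / 1440


def scale_weights(stored_records):
    """Scale adjustment weight per item: reciprocal of the item's maximum
    value over the stored records (falls back to 1.0 if the maximum is 0)."""
    weights = {}
    for item in stored_records[0]:
        max_value = max(record[item] for record in stored_records)
        weights[item] = 1.0 / max_value if max_value else 1.0
    return weights


def apply_weights(record, weights):
    """Multiply every item of one record by its scale adjustment weight."""
    return {item: value * weights[item] for item, value in record.items()}


print(round(date_to_cycle_fraction(7, 21), 2))   # July 21st
print(round(time_to_cycle_fraction(17, 4), 2))   # 17:04
```

Items that can take negative values, such as temperature or longitude, would need a different normalization than the plain reciprocal of the maximum; the text does not address this case, so the sketch does not either.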
For example, if the maximum value of each item is as shown in Figure 4(c), the value of each item in the supplementary information after multiplying by the reciprocal of that maximum value will be as shown in Figure 4(d). Furthermore, the supplementary information associated with each image stored in the storage unit 110 is assumed to contain the same items as the supplementary information for the target image, and to have undergone the same preprocessing. 【0019】 Next, in step S203, the similarity acquisition unit 130 calculates the vector distance between the supplementary information of the target image and each piece of supplementary information stored in the storage unit 110 as the similarity of the supplementary information. The Euclidean distance d shown in the following equation (1) can be used as the vector distance. 【0020】
\[ d = \sqrt{\sum_{i=1}^{n} (t_i - r_i)^2} \quad (1) \]
【0021】 However, in equation (1), \(t = (t_1, t_2, \ldots, t_n)\) is the supplementary information vector of the target image, and \(r = (r_1, r_2, \ldots, r_n)\) is the supplementary information vector of an image stored in the storage unit 110. 【0022】 Next, in step S204, the similarity acquisition unit 130 outputs a candidate image list as a similarity matrix that stores the calculated vector distance (similarity), etc. Figure 5 shows an example of a candidate image list. As illustrated in Figure 5, the candidate image list stores the file name of each image stored in the storage unit 110, the label associated with that image, the similarity (vector distance) calculated in step S203, and the supplementary information, all sorted in ascending order of vector distance, that is, with the most similar image first. Note that in Figure 5, descriptions of images from the 7th image onward are omitted. 【0023】 Returning to the flowchart in Figure 2, after the processing in step S102 described above, the process proceeds to step S103, where the candidate output unit 140 selects label candidates based on the similarity acquired by the similarity acquisition unit 130 in step S102. 
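Steps S203, S204, and S103 can be sketched as follows, under stated assumptions: the record layout (`file`, `label`, `vector`), the function names, and the sample data are hypothetical and are not taken from Figure 5. The sketch computes the Euclidean distance, sorts the stored images in ascending order of distance, and then walks the list from the top, keeping the first image found for each distinct label.

```python
import math


def euclidean_distance(t, r):
    """Straight-line distance between two equal-length vectors."""
    if len(t) != len(r):
        raise ValueError("vectors must have the same number of items")
    return math.sqrt(sum((ti - ri) ** 2 for ti, ri in zip(t, r)))


def build_candidate_list(target_vector, stored_records):
    """Candidate image list sorted so the smallest distance comes first."""
    rows = [
        {
            "file": rec["file"],
            "label": rec["label"],
            "distance": euclidean_distance(target_vector, rec["vector"]),
        }
        for rec in stored_records
    ]
    return sorted(rows, key=lambda row: row["distance"])


def select_label_candidates(candidate_list, count):
    """Take up to `count` rows from the top, skipping repeated labels."""
    if count <= 0:
        return []
    selected, seen = [], set()
    for row in candidate_list:
        if row["label"] in seen:
            continue
        seen.add(row["label"])
        selected.append(row)
        if len(selected) == count:
            break
    return selected


# Hypothetical stored data (not the contents of Figure 5).
stored = [
    {"file": "c.jpg", "label": "Japanese White-eye", "vector": [3.0, 4.0]},
    {"file": "a.jpg", "label": "Great Tit", "vector": [0.1, 0.0]},
    {"file": "b.jpg", "label": "Great Tit", "vector": [0.2, 0.0]},
]
ranked = build_candidate_list([0.0, 0.0], stored)
candidates = select_label_candidates(ranked, 2)
print([row["file"] for row in candidates])
```

Keeping the whole row for each selected label, rather than the label alone, also gives the image that can be shown next to the label candidate.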
As for the method of selecting label candidates, we will assume that a predetermined number of labels will be selected from the candidate image list output by the similarity acquisition unit 130, in order from the top, so as not to have any overlaps. For example, if the candidate output unit 140 selects three label candidates, the label candidates selected from the candidate image list in Figure 5 will be "Great Tit," "Japanese White-eye," and "Long-tailed Tit." Note that the number of selected label candidates is set to three for illustrative purposes in the diagram, and in reality, it is assumed that a larger number of label candidates will be selected. 【0024】 Next, in step S104, the candidate output unit 140 outputs the target image input via the similarity acquisition unit 130, each label candidate selected in step S103, and the images corresponding to those label candidates to the terminal device 200. As a result, the display panel of the display unit 150 of the terminal device 200 displays the label candidates and their corresponding images selected by the candidate output unit 140 of the server device 100, as well as the target image. 【0025】 Figure 6 shows an example of a display screen on the display panel of the terminal device 200. The display unit 150 of the terminal device 200 displays the screen shown in Figure 6, for example, by using a browser or by executing a dedicated application program for label assignment. In the display screen of Figure 6, the image display area 151 is the area where the target image is displayed. The candidate image display area 152 is the area where each label candidate selected by the candidate output unit 140 of the server device 100 and the corresponding images are displayed. A scroll bar is also provided in the candidate image display area 152. 
When the user of the terminal device 200 operates the scroll bar, the display unit 150 of the terminal device 200 moves each label candidate and image vertically within the candidate image display area 152 in accordance with the operation of the scroll bar. This allows the user to view label candidates and images that are not fully displayed in the candidate image display area 152. 【0026】 The images displayed in the candidate image display area 152 may, for example, be images to which label candidates selected from the candidate image list output by the similarity acquisition unit 130 have been attached. In the example in Figure 5, images with filenames "0610.jpg", "0366.jpg", and "0248.jpg" are displayed. Alternatively, the image displayed in the candidate image display area 152 may be, for example, a representative image for each label that has been pre-set for that label. 【0027】 Furthermore, in the display screen of Figure 6, the direct input button 153 is a virtual button that the user presses when directly entering the label string. The next button 154 is a virtual button that the user presses to indicate that they want to proceed to the next screen. The exit button 155 is a virtual button that the user presses when they want to exit the display of the screen in Figure 6. 【0028】 The reception unit 160 of the terminal device 200 receives instructions from the user regarding the display screen shown in Figure 6. For example, if the user selects any of the label candidates within the candidate image display area 152, the reception unit 160 transmits information about the selected label to the server device 100. Also, for example, if the direct input button 153 is pressed, the reception unit 160 informs the display unit 150 that the direct input button 153 has been pressed. In this case, the display unit 150 displays a label input window (not shown) for the user to directly input the label string. 
When the user inputs the label string into the label input window via the on-screen keyboard using a keyboard or touch panel, the reception unit 160 transmits information about the entered label to the server device 100. As a result, in step S105 of Figure 2, the server device 100 obtains information about the label specified by the user through operation on the display screen of the terminal device 200. 【0029】 Figure 7 is a flowchart of the process performed in the terminal device 200, which involves the user selecting a label candidate via the display screen in Figure 6 or the user directly inputting a label. In the terminal device 200, each time the target image is changed, the loop process from step S300 to step S306 in Figure 7 is performed. 【0030】 In step S300, the display unit 150 of the terminal device 200 moves or switches the label candidates and corresponding images displayed within the candidate image display area 152 in response to the user's operation on the scroll bar of the candidate image display area 152 in Figure 6. That is, the user can instruct the display of label candidates and their corresponding images within the candidate image display area 152 to move by operating the scroll bar, thereby enabling them to find an appropriate label candidate. Furthermore, examples of user scrolling operations include the following operations on the mouse or touch panel included in the reception unit 160. For example, these include dragging the scroll bar with the mouse, rotating the mouse wheel while the cursor is over the candidate image display area 152 with the mouse, and moving the scroll bar up or down by touching the touch panel. 【0031】 Next, as part of step S301, the reception unit 160 of the terminal device 200 determines whether the user has performed an operation to indicate a label candidate within the candidate image display area 152, or whether the direct input button 153 has been pressed. 
For example, if the user selects one of the multiple label candidates within the candidate image display area 152, the reception unit 160 proceeds to step S302. When the process proceeds to step S302, the reception unit 160 selects a label candidate indicated by the user within the candidate image display area 152. That is, the reception unit 160 selects a label candidate if a user operation, such as clicking with the mouse included in the reception unit 160 or pressing a button on the touch panel, is performed on any of the label candidates in the candidate image display area 152. In the example in Figure 6, the appropriate label is "Long-tailed Tit," which is the third label candidate displayed within the candidate image display area 152, and the user has indicated and selected this "Long-tailed Tit" label candidate. This corresponds to the supplementary information for the row with rank "6" in the candidate image list in Figure 5. 【0032】 On the other hand, in step S301, if the reception unit 160 determines that the direct input button 153 has been pressed by the user, it determines that the user has entered the label directly and proceeds to step S303 of the terminal device 200's processing. When the process proceeds to step S303, the display unit 150 displays a label input window (not shown) for the user to directly input the label string. When the user enters a string into the label input window, the reception unit 160 acquires the entered string as the label. 【0033】 After step S302 or step S303 described above, the reception unit 160 determines, as the process of step S304, whether or not the next button 154 has been pressed by the user. If it is determined in step S304 that the next button 154 has been pressed, the reception unit 160 determines that the user has completed specifying a label for the current target image. 
Then, as the process of step S305, the reception unit 160 sends the label information obtained in step S302 or step S303 to the server device 100 as the label specified by the user for the target image displayed in the image display area 151. The terminal device 200 then transitions to loop processing for the next target image sent from the server device 100. On the other hand, if the next button 154 has not been pressed by the user, the reception unit 160 returns to step S300. 【0034】 Furthermore, the reception unit 160 constantly monitors whether the exit button 155 has been pressed during the loop processing, and if the exit button 155 is pressed by the user, the processing shown in the flowchart of Figure 7 is terminated. The process of sending the label information acquired in step S302 or step S303 to the server device 100 may be performed not only when the next button 154 is pressed, but also when the exit button 155 is pressed. 【0035】 As described above, the storage unit 110 of the server device 100 receives the label information specified by the user and transmitted from the terminal device 200, associates the label with the target image, and stores it together with the supplementary information of the target image. In the information processing system according to this embodiment, the assignment of a label to a target image is considered complete when the label specified by the user, its target image, and supplementary information are stored in the storage unit 110. 【0036】 As described above, the information processing system according to this embodiment utilizes supplementary information attached to each image and displays only images corresponding to label candidates with a high degree of similarity in supplementary information on the terminal device screen. 
This allows the user to perform label assignment work by referring only to candidates that are more likely to be selected from among many labels, thereby reducing the time spent referring to label candidates and enabling efficient label assignment work. Furthermore, since label candidates with similar supplementary information are displayed close together on the terminal device screen, the user can assign labels while comparing label candidates that may have similar appearances to the subject. In other words, according to this embodiment, accurate label assignment work is possible regardless of the user's level of understanding of the concept of labels, thereby improving the accuracy of label assignment work. 【0037】 <First variation> Next, as a first modification of this embodiment, an example will be described in which a weight (hereinafter referred to as the contribution adjustment weight) is introduced to adjust the contribution of each item to the distance (similarity) when the similarity acquisition unit 130 calculates the similarity (distance between supplementary information). In the first modification, the similarity acquisition unit 130 updates the contribution adjustment weight for the corresponding item according to the difference for each item between the supplementary information attached to the target image and the supplementary information attached to the image corresponding to the label candidate selected by the user. In the first modification, by updating the contribution adjustment weight, the contribution of items that are not effective when selecting label candidates can be reduced. 【0038】 In the first modified example, the similarity acquisition unit 130 uses the weighted Euclidean distance d shown in the following equation (2) when calculating the similarity in step S203 described above. 
【0039】
\[ d = \sqrt{\sum_{i=1}^{n} w_i (t_i - r_i)^2} \quad (2) \]
【0040】 However, in equation (2), t and r are the same as in equation (1), while \(w = (w_1, w_2, \ldots, w_n)\) represents the contribution adjustment weight and is assumed to have the same number of dimensions as the supplementary information vector mentioned above. The contribution adjustment weight is initialized at the start of the labeling process. One possible method of initialization is to set all components of the weight to 1. 【0041】 Furthermore, the similarity acquisition unit 130 of the first modified example updates the contribution adjustment weights based on the label candidate selected by the terminal device 200 in response to user instructions in step S302 of Figure 7. Possible formulas for updating the contribution adjustment weights are shown in the following equations (3) to (5). 【0042】
\[ w \leftarrow w - \varepsilon \, m \, \Delta d \, \delta \quad (3) \]
\[ \varepsilon = a \exp(-b \cdot \mathrm{count}) \quad (4) \]
\[ \Delta d = |t - s| \quad (5) \]
【0043】 However, in the above equations, a and b are predetermined constants, count is the number of images that have already been labeled, and m is the number of label candidates displayed above the label selected by the user. Also, t is the supplementary information vector of the target image, s is the supplementary information vector of the image with the label selected by the user, and δ is the vector whose j-th component is the Kronecker delta δ(j, argmax(Δd)), j = 1, 2, ..., n, so that only the item with the largest difference is updated. 【0044】 The updating of contribution adjustment weights using the above update formula will be explained using the example in Figure 5. For example, if a = 0.1, b = 2, and the number of images that have already been labeled is 1, then equation (4) above gives ε = (1/10)exp(-2) = 0.0135. Furthermore, the item with the largest difference between the supplementary information for the row with rank "6" selected by the user and the supplementary information for the target image in Figure 4(d) is "humidity," and therefore Δdδ = (0,0,0,0,0,0,0.66,0). 
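The weighted distance and the weight update of equations (3) to (5) can be sketched as follows. This is an illustration under assumptions, not the specification's implementation: the function names and the three-item sample vectors are hypothetical, and the weighted distance uses one common form in which each weight multiplies the squared difference, which is an assumption about how the weights enter equation (2). The number m of candidates displayed above the selected label is passed in as a parameter.

```python
import math


def weighted_distance(t, r, w):
    """Weighted Euclidean distance (assumed form: weight times squared difference)."""
    return math.sqrt(sum(wi * (ti - ri) ** 2 for wi, ti, ri in zip(w, t, r)))


def learning_rate(a, b, count):
    """Equation (4): the step size shrinks as more images are labeled."""
    return a * math.exp(-b * count)


def update_weights(w, t, s, m, a, b, count):
    """Equations (3) and (5): lower only the weight of the item whose
    difference between target t and selected image s is the largest."""
    diffs = [abs(ti - si) for ti, si in zip(t, s)]
    j = max(range(len(diffs)), key=diffs.__getitem__)  # argmax of the differences
    updated = list(w)
    updated[j] -= learning_rate(a, b, count) * m * diffs[j]
    return updated


# Hypothetical three-item vectors whose largest difference is 0.66 (index 1).
t = [0.10, 0.90, 0.20]
s = [0.15, 0.24, 0.20]
new_w = update_weights([1.0, 1.0, 1.0], t, s, m=5, a=0.1, b=2, count=1)
print([round(x, 4) for x in new_w])
```

With a = 0.1, b = 2, count = 1, m = 5, and a largest difference of 0.66, the updated component comes out at roughly 0.955, in line with the worked example in the text. The sketch does not clamp weights at zero; whether a lower bound is applied is not stated in the source.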
【0045】 Furthermore, since there are 5 label candidates displayed above the label selected by the user, m = 5. Therefore, assuming that the contribution adjustment weight is initialized to 1, the contribution adjustment weight for the "humidity" item after the update is given by equation (3): w6-εmΔd6=1-0.0446=0.9554 This is the result. 【0046】 According to the update formula described above, the more label candidates that appear higher than the label selected by the user—that is, the more labels that were viewed by the user but not selected—the greater the update amount of the contribution adjustment weight. This means that the weight can be updated more significantly when the appropriate label is displayed at a lower rank. In other words, when calculating label similarity, the contribution adjustment weight for items that do not effectively contribute to the selection of any label is gradually reduced, thereby decreasing the contribution of those items to the similarity. Note that the method of updating the contribution adjustment weight is not limited to the example described above. Then, the storage unit 110 of the server device 100 in the first modified example stores the updated contribution adjustment weights. Subsequently, when a label is assigned to the next target image, the similarity acquisition unit 130 reads the contribution adjustment weights stored in the storage unit 110 and uses them in the calculation of equation (2). 【0047】 As explained above, in the first modified example, the server device 100 repeatedly updates the contribution adjustment weights. In the first modified example, when calculating label similarity, the contribution adjustment weights for items that do not effectively contribute to the selection of any label are updated to gradually decrease, thereby reducing the contribution of those items to the similarity. 
As a result, according to the first modified example, label candidates that are more likely to be selected by the user are more likely to be displayed higher up in the candidate image display area, the time it takes for the user to find the appropriate label from among the label candidates is reduced, and the label assignment process can be made more efficient. 【0048】 <Second variation> Next, as a second modification of this embodiment, an example will be described in which, in addition to the label candidates and corresponding images mentioned above, the content of the supplementary information items that contributed to the selection of the label candidates is also displayed on the display screen of the display unit 150 of the terminal device 200. In the case of the second modification, the candidate output unit 140 of the server device 100 outputs to the terminal device 200 the selected label candidates and corresponding images as described above, along with the content of the supplementary information items that contributed to the selection of the label candidates. 【0049】 Figure 8 shows an example of the display screen of the display unit 150 of the terminal device 200 in a second modified example. In the display screen shown in Figure 8, the display unit 150 displays the contents of the supplementary information items below each label candidate in the candidate image display area 152. 
In the second modified example, the candidate output unit 140 of the server device 100 determines the content of the supplementary information items that contributed to the selection of label candidates, to be displayed in the candidate image display area 152 on the display unit 150 of the terminal device 200, as follows: 【0050】 The candidate output unit 140 first calculates the difference for each corresponding item between the supplementary information associated with the target image and the supplementary information associated with the image corresponding to each label candidate, and determines the item with the smallest difference. For example, if we take the supplementary information of the target image in Figure 4(d) and the supplementary information of the image "0610.jpg" in the candidate image list in Figure 5 as examples, the item with the smallest difference will be "Date". 【0051】 Next, the candidate output unit 140 determines the wording corresponding to the item determined based on the difference (wording that represents the content of the supplementary information item). The specifications for the content of the wording shall be determined in advance for each item. For example, if the item determined based on the difference is "date", then wording such as "Found on [Year] [Month] [Day]" is determined, including that date. Also, for example, if the item determined based on the difference is "latitude", then wording such as "Found within [Number] km north-south" is determined based on the difference between the two "latitude" values of the supplementary information of the target image and the supplementary information of the label candidate. 
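The item selection and wording described above can be sketched as follows. The function names, the dictionary layout, the fallback wording, and the conversion factor of roughly 111 km per degree of latitude are assumptions made for illustration; the text only states that the wording is specified in advance for each item. Scaled values are used to find the item with the smallest difference, and raw values are used to build the wording.

```python
KM_PER_DEGREE_LATITUDE = 111.0  # approximate; an assumption of this sketch


def closest_item(target_scaled, candidate_scaled):
    """Item with the smallest absolute difference between the two records."""
    return min(
        target_scaled,
        key=lambda item: abs(target_scaled[item] - candidate_scaled[item]),
    )


def describe_item(item, target_raw, candidate_raw):
    """Wording for the chosen item, following the patterns given in the text."""
    if item == "date":
        return f"Found on {candidate_raw['date']}"
    if item == "latitude":
        km = abs(target_raw["latitude"] - candidate_raw["latitude"])
        km *= KM_PER_DEGREE_LATITUDE
        return f"Found within {km:.0f} km north-south"
    return f"Similar {item}"  # hypothetical fallback for items without a template


# Hypothetical values (not taken from Figure 4 or Figure 5).
target_scaled = {"date": 0.55, "latitude": 0.80}
candidate_scaled = {"date": 0.56, "latitude": 0.60}
item = closest_item(target_scaled, candidate_scaled)
print(describe_item(item, {"date": "2024-07-21", "latitude": 35.0},
                    {"date": "2024-07-25", "latitude": 34.0}))
```

Returning several items instead of one would only require sorting the items by difference and taking the first few.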
【0052】 In the second modified example, the text determined by the candidate output unit 140 as described above is output to the terminal device 200 along with the label candidate and the corresponding image, and is displayed as the content of the supplementary information item below each label candidate in the candidate image display area 152 of the display unit 150. In the example described above, the text to be displayed in the candidate image display area 152 (the content of the supplementary information item) was determined based on the single item with the smallest difference. However, the candidate output unit 140 may, for example, determine text that represents the content of multiple items with small differences. 【0053】 As explained above, the candidate output unit 140 of the second modified example selects one or more items for the image corresponding to the label candidate based on the difference between each item in the supplementary information associated with the image and the supplementary information associated with the target image. The candidate output unit 140 then outputs the combination of the selected items and the values of those items together with the label candidate and the image corresponding to the label candidate. As a result, in the second modified example, in addition to the label candidate and its corresponding image, the content of the items for which the difference in supplementary information from the target image was small is also displayed on the screen. This allows the user to see which supplementary information is similar for each label candidate. Furthermore, according to the second modified example, when the label to be assigned is, for example, a type of animal that is characteristic of its habitat, or a phenomenon that is observed under similar weather conditions, the displayed content can serve as a basis for determining an appropriate label. 
Therefore, the second modified example can be expected to enable more accurate label assignment. 【0054】 <Third variation> Next, as a third modification of this embodiment, we will describe an example in which the server device 100 further includes an image similarity acquisition unit 170 that acquires the similarity between the image features of the target image and those of each image stored in the storage unit 110 (hereinafter referred to as image similarity). Figure 9 is a diagram showing an example of the functional configuration of the information processing system according to the third modified example. Figure 10 is a flowchart showing the flow of label assignment processing, which is an example of information processing according to the third modified example. Below, the label assignment processing performed in the server device 100 of the information processing system according to the third modified example will be explained with reference to the configuration diagram in Figure 9 and the flowchart in Figure 10. In the server device 100 shown in Figure 9, the storage unit 110 through the candidate output unit 140 have the same configuration as in the server device 100 shown in Figure 1, so their explanation is omitted. Likewise, in the flowchart in Figure 10, the processing of steps S101 to S106 is the same as the corresponding steps in Figure 2 described above, so their explanation is omitted. In the server device 100 of the third modified example, after step S102, the process proceeds to step S107, and then after step S108, the process proceeds to step S103 and subsequent steps. 【0055】 When the process proceeds to step S107, the image similarity acquisition unit 170 extracts image features from the target image and each image stored in the storage unit 110, and obtains the image similarity between the image features of the target image and the image features of each image stored in the storage unit 110. 
An example of an image feature extracted by the image similarity acquisition unit 170 is a color histogram. The method for calculating a color histogram is well known, so its explanation is omitted. When a color histogram is used as an image feature, the correlation coefficient \(r\) shown in equations (6) and (7) below can be used to represent the similarity. 【0056】 [Equations (6) and (7): not reproduced in this text] 【0057】 Here, in the above equations, \(T=(T_1,T_2,\cdots,T_n)\) is the color histogram of the target image, \(R=(R_1,R_2,\cdots,R_n)\) is the color histogram of the image stored in the storage unit 110, and \(n\) is the number of bins of the color histogram. The above equations apply to the color histogram of one of the three RGB channels of the image, and similar correlation coefficients are calculated for the other two channels. 【0058】 Note that the image features extracted by the image similarity acquisition unit 170 are not limited to the color histogram, and the image similarity is not limited to the correlation coefficient. Examples of other image features extracted by the image similarity acquisition unit 170 include the HOG feature and intermediate features obtained by inputting each image into a CNN (Convolutional Neural Network). Examples of the similarity between image features include the inner product and the Euclidean distance between the image feature vectors. The image similarity acquisition unit 170 may also calculate a plurality of different image feature similarities. In the third modification example, the image features of the images stored in the storage unit 110 may also be calculated and stored in advance. 【0059】 Next, when the process proceeds to step S108, the image similarity acquisition unit 170 outputs a similarity \(d'\) obtained by adding the image similarity \(r\) acquired in step S107 to the similarity \(d\) acquired by the similarity acquisition unit 130 in step S203. 
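The processing of steps S107 and S108 can be illustrated by the following minimal sketch. Because equations (6) and (7) are not shown in this text, the sketch assumes the standard (Pearson) form of the correlation coefficient, computed separately for each color channel. The function names are hypothetical, and the sign and magnitude of the weights applied to the image similarities are left to the caller, since the sign convention is not stated here.

```python
import math


def correlation(t, r):
    """Pearson correlation coefficient between two histograms of equal length."""
    n = len(t)
    if n == 0 or n != len(r):
        raise ValueError("histograms must be non-empty and of equal length")
    mean_t = sum(t) / n
    mean_r = sum(r) / n
    cov = sum((a - mean_t) * (b - mean_r) for a, b in zip(t, r))
    var_t = sum((a - mean_t) ** 2 for a in t)
    var_r = sum((b - mean_r) ** 2 for b in r)
    denom = math.sqrt(var_t * var_r)
    # A flat histogram has zero variance; treat that case as "no correlation".
    return cov / denom if denom > 0 else 0.0


def channel_correlations(target_hists, stored_hists):
    """One correlation coefficient per channel (for example R, G and B)."""
    return [correlation(t, r) for t, r in zip(target_hists, stored_hists)]


def combined_similarity(d, image_sims, weights):
    """Add the weighted image similarities to the supplementary-information similarity d."""
    return d + sum(w * r for w, r in zip(weights, image_sims))


# Made-up 4-bin histograms for the three channels of two images.
target = [[4, 3, 2, 1], [1, 2, 3, 4], [2, 2, 2, 2]]
stored = [[8, 6, 4, 2], [4, 3, 2, 1], [1, 2, 3, 4]]
sims = channel_correlations(target, stored)
print(sims)
print(combined_similarity(0.5, sims, [0.1, 0.1, 0.1]))
```

Storing the histograms of the already labeled images in advance, as the text permits, would leave only the histogram of the target image to be computed at labeling time.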
The similarity \(d'\) can be calculated, for example, by the following formula (8). 【0060】 \(d' = d + w_r \cdot r\) Formula (8) 【0061】 Here, in the above formula (8), \(r=(r_1,r_2,\cdots,r_n)\) is an image similarity vector having an image similarity as each component, and \(w_r=(w_{r1},w_{r2},\cdots,w_{rn})\) is a contribution adjustment weight corresponding to the image similarity vector. Note that when updating the contribution adjustment weight as described in the first modification example, the calculation uses the vectors obtained by adding the image similarity vector \(r\) and the contribution adjustment weight \(w_r\) to the difference vector \(\Delta d\) of the supplementary information in formula (5) and the contribution adjustment weight \(w\), respectively. This allows the same method as the first modification to be used in the third modification. 【0062】 As explained above, the server device 100 of the third modified example calculates similarity by adding the similarity of image features to the similarity of supplementary information. In other words, in the candidate output unit 140 of the third modified example, label candidates are selected by also taking into account the visual information of the subject. As a result, more appropriate label candidates are displayed higher on the screen of the display unit 150 of the terminal device 200, making the labeling process even more efficient. 【0063】 <Fourth variation> Next, as a fourth modification of this embodiment, an example will be described in which the display screen of the display unit 150 of the terminal device 200 further displays the previous image display area 156 and the previous label input button 157, as shown in Figure 11. 
【0064】 In the fourth variation, the display screen shown in Figure 11 displays the previous image display area 156 and the previous label input button 157. The previous image display area 156 is an area where a reduced image of the image that was labeled immediately before the current target image is displayed. The previous label input button 157 is a virtual button that the user can press to specify, as the label for the current target image, the same label as that of the image labeled immediately before. In the fourth variation, the candidate output unit 140 of the server device 100 generates a reduced image of the image to which a label was assigned immediately before (the previous target image) and sends it to the terminal device 200. As a result, the reduced image of the previous target image is displayed in the previous image display area 156 of the terminal device 200. The reduced image may instead be generated and displayed by the display unit 150 of the terminal device 200. When the user presses the previous label input button 157, the label that was assigned to the previous target image is selected as the label for the current target image. The subsequent processing is the same as in the embodiment described above. 【0065】 According to the fourth modification, the user can assign to the target image the same label as the one assigned immediately before with a single press of the previous label input button 157. If the user can determine that the subject of the current target image is the same as the subject of the previous image, a label can thus be assigned without searching the label candidates for an appropriate one, making the label assignment process even more efficient. 
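The behavior of the previous label input button 157 can be illustrated by the following minimal sketch. The class and method names are hypothetical, and the in-memory dictionary merely stands in for the storage unit 110. Generation of the reduced image and the display itself are outside the scope of the sketch.

```python
class LabelingSession:
    """Tracks assigned labels and remembers the most recently labeled image."""

    def __init__(self):
        self.labels = {}            # image identifier -> assigned label
        self.previous_image = None  # image labeled immediately before

    def assign(self, image_id, label):
        """Store the label for the image and remember it as the previous image."""
        self.labels[image_id] = label
        self.previous_image = image_id

    def assign_previous_label(self, image_id):
        """Reuse the label of the previous image, as a single button press would."""
        if self.previous_image is None:
            raise ValueError("no image has been labeled yet")
        label = self.labels[self.previous_image]
        self.assign(image_id, label)
        return label


session = LabelingSession()
session.assign("0001.jpg", "sparrow")             # label chosen from the candidates
print(session.assign_previous_label("0002.jpg"))  # same subject, one press
```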
【0066】 <Hardware configuration of the information processing device> Figure 12 is a diagram showing an example of the hardware configuration of an information processing device that can realize the server device 100 according to this embodiment. The CPU (Central Processing Unit) 1201 controls various devices connected to the bus 1208 and performs information processing related to each functional unit of the server device 100 described above. The ROM (Read Only Memory) 1202 stores the BIOS program and boot program. The RAM (Random Access Memory) 1203 is used as the main memory of the CPU 1201. The large-capacity memory 1204 is included in the storage unit 110 and stores the aforementioned images, supplementary information, labels, and the information processing program according to this embodiment. The information processing program stored in the large-capacity memory 1204 is loaded into the RAM 1203 and executed by the CPU 1201. This realizes the various functional units of the server device 100 described above. 【0067】 The input unit 1205 is a keyboard, mouse, touch panel, etc., and processes information input from the user. The display unit 1206 displays images and various processing results. The I/O (Input/Output) 1207 is connected to, for example, an imaging device, an external recording device (not shown), a network, or an external display device, and communicates with them. The bus 1208 connects the CPU 1201, ROM 1202, RAM 1203, large-capacity memory 1204, input unit 1205, display unit 1206, and I/O 1207 in a manner that enables them to communicate with each other. 【0068】 The present invention can also be realized by supplying a program that implements one or more of the functions of the above-described embodiments to a system or device via a network or storage medium, and by having one or more processors in the computer of that system or device read and execute the program. It can also be realized by a circuit (e.g., an ASIC) that implements one or more of the functions. 
The embodiments described above are merely examples of how the present invention can be implemented, and the technical scope of the invention should not be interpreted as being limited by them. In other words, the present invention can be implemented in various ways without departing from its technical concept or its main features. 【0069】 This embodiment includes the following configurations, methods, and programs. (Configuration 1) An information processing apparatus comprising: a storage means for storing a labeled image and supplementary information associated with the image; a similarity acquisition means for acquiring the similarity between supplementary information associated with a target image and the supplementary information associated with the image stored in the storage means; and a candidate output means that selects at least one label candidate for the target image based on the similarity and outputs the label candidate and the image corresponding to the label candidate, characterized in that the similarity acquisition means acquires, as the similarity, the vector distance between the supplementary information associated with the target image and the supplementary information associated with the image stored in the storage means, and the storage means stores the label candidate selected by the user from among the label candidates output by the candidate output means as a label to be assigned to the target image. (Configuration 2) The information processing apparatus according to configuration 1, characterized in that the similarity acquisition means calculates the distance between vectors using the values of vectors obtained by weighting items included in the supplementary information associated with the target image and items included in the supplementary information associated with the image stored in the storage means. 
(Configuration 3) The information processing apparatus according to configuration 2, characterized in that the weights include scale adjustment weights for adjusting for differences in scale for each item. (Configuration 4) The information processing device according to configuration 3, characterized in that the similarity acquisition means sets the reciprocal of the maximum value for each item as the scale adjustment weight, and multiplies the value of the item by the reciprocal as the weight for the item. (Configuration 5) The information processing device according to any one of configurations 2 to 4, characterized in that the weights include contribution adjustment weights for adjusting the contribution of each item when obtaining the similarity of each item. (Configuration 6) The information processing device according to configuration 5, wherein the similarity acquisition means updates the contribution adjustment weight for the corresponding item according to the difference for each item between the supplementary information associated with the target image and the supplementary information associated with the image corresponding to the label candidate selected by the user. (Configuration 7) The information processing apparatus according to configuration 5, characterized in that the similarity acquisition means calculates the contribution adjustment weight according to the result of the user's selection from among the label candidates output by the candidate output means. (Configuration 8) The information processing apparatus according to configuration 7, characterized in that the similarity acquisition means calculates the contribution adjustment weight based on the number of label candidates output by the candidate output means that are ranked higher than the label candidate selected by the user. 
(Configuration 9) The information processing apparatus according to configuration 8, characterized in that the similarity acquisition means increases the amount of the contribution adjustment weight update as the number of label candidates output higher than the label candidate selected by the user increases. (Configuration 10) The information processing apparatus according to any one of configurations 1 to 9, characterized in that the candidate output means selects one or more items for an image corresponding to the label candidate based on the difference between each item contained in the supplementary information associated with the image and the supplementary information associated with the target image, and outputs the combination of the selected items and the values of those items together with the label candidate and the image corresponding to the label candidate. (Configuration 11) The information processing apparatus according to any one of configurations 1 to 10, further comprising an image similarity acquisition means for acquiring the image similarity between the target image and the image stored in the storage means, characterized in that the similarity acquisition means acquires the similarity obtained by adding the image similarity to the similarity of the supplementary information. (Configuration 12) The information processing apparatus according to configuration 11, characterized in that the similarity acquisition means further weights the image similarity and adds the weighted image similarity to the similarity of the supplementary information. (Configuration 13) The information processing apparatus according to any one of configurations 1 to 12, characterized in that the storage means stores, based on instructions from the user, the same label that was assigned to the target image immediately preceding the new target image as the label to be assigned to the new target image. 
(Configuration 14) The information processing apparatus according to configuration 13, characterized in that the candidate output means outputs the selected label candidates and the image corresponding to the label candidates for the new target image, and also outputs a reduced image of the previous target image to which a label has already been assigned. (Configuration 15) The information processing device according to any one of configurations 1 to 14, characterized in that the supplementary information includes at least one of the following: location, time, and weather information at the time the image was acquired. (Configuration 16) The information processing apparatus according to any one of configurations 1 to 15, characterized in that the candidate output means outputs the label candidates and the images corresponding to the label candidates as information to be displayed on the display unit of a terminal device having a display unit and a reception unit for receiving instructions from a user, and the storage means stores the label candidate selected by the user of the terminal device through the reception unit as a label to be assigned to the target image. (Method 1) An information processing method comprising: a storage step of storing a labeled image and supplementary information associated with the image; a similarity acquisition step of acquiring the similarity between supplementary information associated with a target image and the supplementary information associated with the image stored in the storage step; and a candidate output step of selecting at least one label candidate for the target image based on the similarity and outputting the label candidate and the image corresponding to the label candidate, characterized in that, in the similarity acquisition step, the vector distance between the supplementary information associated with the target image and the supplementary information associated with the image stored in the storage step is acquired as the similarity, and, 
in the storage step, the label candidate selected by the user from among the label candidates output in the candidate output step is stored as a label to be assigned to the target image. (Program 1) A program that causes a computer to function as the information processing device described in any one of configurations 1 through 16. [Explanation of symbols] 【0070】 110: Storage unit, 120: Acquisition unit, 130: Similarity acquisition unit, 140: Candidate output unit, 150: Display unit, 160: Reception unit
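The weighted vector distance of configurations 1 to 5 can be illustrated by the following minimal sketch. It assumes a Euclidean distance, applies the scale adjustment weight as the reciprocal of the largest absolute value of each item among the stored images, and multiplies each scaled difference by a contribution adjustment weight. These choices, the function names, the item keys, and the rule of keeping only the nearest image per label are all assumptions introduced for illustration, not the formulas of the embodiment.

```python
import math


def scale_weights(records):
    """Reciprocal of the largest absolute value of each item (assumed form)."""
    weights = {}
    for item in records[0]:
        peak = max(abs(rec[item]) for rec in records)
        weights[item] = 1.0 / peak if peak > 0 else 1.0
    return weights


def weighted_distance(target, stored, scale, contribution):
    """Euclidean distance between the weighted supplementary-information vectors."""
    return math.sqrt(sum(
        (contribution[i] * scale[i] * (target[i] - stored[i])) ** 2
        for i in target
    ))


def label_candidates(target, labeled, scale, contribution, k=3):
    """Return up to k stored records, nearest first, one per distinct label."""
    ranked = sorted(
        labeled,
        key=lambda rec: weighted_distance(target, rec["info"], scale, contribution),
    )
    seen, candidates = set(), []
    for rec in ranked:
        if rec["label"] not in seen:
            seen.add(rec["label"])
            candidates.append(rec)
        if len(candidates) == k:
            break
    return candidates


# Made-up labeled images with two supplementary-information items.
labeled = [
    {"image": "a.jpg", "label": "sparrow", "info": {"latitude": 35.0, "hour": 8.0}},
    {"image": "b.jpg", "label": "crow", "info": {"latitude": 43.0, "hour": 9.0}},
    {"image": "c.jpg", "label": "sparrow", "info": {"latitude": 35.5, "hour": 18.0}},
]
target = {"latitude": 35.2, "hour": 9.0}
scale = scale_weights([rec["info"] for rec in labeled])
contribution = {"latitude": 1.0, "hour": 1.0}

for rec in label_candidates(target, labeled, scale, contribution, k=2):
    print(rec["label"], rec["image"])
```

In this sketch, raising the contribution adjustment weight of an item makes differences in that item count more heavily, so candidates that match the target image on that item move up the list.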
Claims
[Claim 1] An information processing apparatus comprising: a storage means for storing a labeled image and supplementary information associated with the image; a similarity acquisition means for acquiring the similarity between supplementary information associated with a target image and the supplementary information associated with the image stored in the storage means; and a candidate output means that selects at least one label candidate for the target image based on the similarity and outputs the label candidate and the image corresponding to the label candidate, characterized in that the similarity acquisition means acquires, as the similarity, the vector distance between the supplementary information associated with the target image and the supplementary information associated with the image stored in the storage means, and the storage means stores the label candidate selected by the user from among the label candidates output by the candidate output means as a label to be assigned to the target image. [Claim 2] The information processing apparatus according to claim 1, characterized in that the similarity acquisition means calculates the distance between vectors using the values of vectors obtained by weighting the items included in the supplementary information associated with the target image and the items included in the supplementary information associated with the image stored in the storage means. [Claim 3] The information processing apparatus according to claim 2, characterized in that the weights include scale adjustment weights for adjusting for differences in scale for each item. [Claim 4] The information processing apparatus according to claim 3, wherein the similarity acquisition means sets the reciprocal of the maximum value for each item as the scale adjustment weight, and multiplies the value of the item by the reciprocal as the weight for the item. 
[Claim 5] The information processing apparatus according to claim 2, characterized in that the weights include contribution adjustment weights for adjusting the contribution of each item when obtaining the similarity of each item. [Claim 6] The information processing apparatus according to claim 5, wherein the similarity acquisition means updates the contribution adjustment weight for the corresponding item according to the difference for each item between the supplementary information associated with the target image and the supplementary information associated with the image corresponding to the label candidate selected by the user. [Claim 7] The information processing apparatus according to claim 5, characterized in that the similarity acquisition means calculates the contribution adjustment weight according to the result of the user's selection from among the label candidates output by the candidate output means. [Claim 8] The information processing apparatus according to claim 7, characterized in that the similarity acquisition means calculates the contribution adjustment weight based on the number of label candidates output by the candidate output means that are ranked higher than the label candidate selected by the user. [Claim 9] The information processing apparatus according to claim 8, characterized in that the similarity acquisition means increases the amount of the contribution adjustment weight update as the number of label candidates output higher than the label candidate selected by the user increases. 
[Claim 10] The information processing apparatus according to claim 1, characterized in that the candidate output means selects one or more items for an image corresponding to the label candidate based on the difference between each item contained in the supplementary information associated with the image and the supplementary information associated with the target image, and outputs the combination of the selected items and the values of those items together with the label candidate and the image corresponding to the label candidate. [Claim 11] The information processing apparatus according to any one of claims 1 to 10, further comprising an image similarity acquisition means for acquiring the image similarity between the target image and the image stored in the storage means, characterized in that the similarity acquisition means acquires the similarity obtained by adding the image similarity to the similarity of the supplementary information. [Claim 12] The information processing apparatus according to claim 11, characterized in that the similarity acquisition means further weights the image similarity and adds the weighted image similarity to the similarity of the supplementary information. [Claim 13] The information processing apparatus according to claim 1, characterized in that the storage means stores, based on instructions from the user, the same label that was assigned to the target image immediately preceding the new target image as a label to be assigned to the new target image. [Claim 14] The information processing apparatus according to claim 13, wherein the candidate output means outputs the selected label candidates and the image corresponding to the label candidates for the new target image, and also outputs a reduced image of the previous target image to which a label has already been assigned. 
[Claim 15] The information processing apparatus according to claim 1, characterized in that the supplementary information includes at least one of the following pieces of information: location, time, and weather when the image was acquired. [Claim 16] The information processing apparatus according to claim 1, characterized in that the candidate output means outputs the label candidates and the images corresponding to the label candidates as information to be displayed on the display unit of a terminal device having a display unit and a reception unit for receiving instructions from a user, and the storage means stores the label candidate selected by the user of the terminal device through the reception unit as a label to be assigned to the target image. [Claim 17] An information processing method comprising: a storage step of storing a labeled image and supplementary information associated with the image; a similarity acquisition step of acquiring the similarity between supplementary information associated with a target image and the supplementary information associated with the image stored in the storage step; and a candidate output step of selecting at least one label candidate for the target image based on the similarity and outputting the label candidate and the image corresponding to the label candidate, characterized in that, in the similarity acquisition step, the vector distance between the supplementary information associated with the target image and the supplementary information associated with the image stored in the storage step is acquired as the similarity, and, in the storage step, the label candidate selected by the user from among the label candidates output in the candidate output step is stored as a label to be assigned to the target image. 
[Claim 18] A program that causes a computer to function as an information processing apparatus comprising: a storage means for storing a labeled image and supplementary information associated with the image; a similarity acquisition means for acquiring the similarity between supplementary information associated with a target image and the supplementary information associated with the image stored in the storage means; and a candidate output means that selects at least one label candidate for the target image based on the similarity and outputs the label candidate and the image corresponding to the label candidate, characterized in that the similarity acquisition means acquires, as the similarity, the vector distance between the supplementary information associated with the target image and the supplementary information associated with the image stored in the storage means, and the storage means stores the label candidate selected by the user from among the label candidates output by the candidate output means as a label to be assigned to the target image.