Information processing device, information processing method, and recording medium
The information processing device employs a machine learning model to extract semantic features from images and natural language criteria, facilitating objective image evaluation and automatic image improvement, addressing the inefficiencies of subjective human judgment in existing systems.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- NEC CORP
- Filing Date
- 2024-12-17
- Publication Date
- 2026-06-25
AI Technical Summary
Existing image evaluation systems rely heavily on subjective human judgment and struggle to efficiently and objectively evaluate images against complex criteria expressed in natural language, requiring extensive manual effort to collect images that meet review criteria.
An information processing device that uses a machine learning model to extract semantic features from both images and natural language review criteria, enabling objective evaluation and comparison across different data formats, and optionally includes a correction instruction unit to modify images based on evaluation results.
Enables efficient, objective, and consistent image evaluation, allowing for the application of complex criteria without human subjectivity, and supports automatic image improvement based on evaluation feedback.

Figure JP2024044565_25062026_PF_FP_ABST
Abstract
Description
Information Processing Apparatus, Information Processing Method, and Recording Medium
[0001] This disclosure relates to an information processing apparatus, an information processing method, and a recording medium.
[0002] Various image processing technologies have been proposed. Patent Document 1 discloses a technology for determining whether an input image matches a user's theme. The technology refers to the information attached to each image and extracts, from a group of images, the images that match the user's theme. For example, when the theme is "photos rated by French women in their 20s", images that have received a favorable evaluation (e.g., a "like") from French women in their 20s are extracted. The technology uses the extracted images as training data for machine learning to generate an inference engine that evaluates whether an input image matches the theme. The technology then processes the input image with the inference engine to determine whether it matches the user's theme.
[0003] Patent Document 1: Japanese Unexamined Patent Application Publication No. 2019-114243
[0004] One object of this disclosure is to provide a new image processing technology.
[0005] According to one aspect of this disclosure, an information processing method is provided in which at least one computer: acquires at least one image to be analyzed; receives an input of review criteria written in natural language; extracts semantic features from the at least one image to be analyzed and from the review criteria using a machine learning model that handles a plurality of data formats; and generates, with the machine learning model that handles the plurality of data formats, an evaluation of the conformity of the at least one image to be analyzed to the review criteria using the semantic features of the image to be analyzed and of the review criteria.
[0006] Furthermore, according to one aspect of this disclosure, a recording medium is provided that records a program causing a computer to perform the following steps: acquiring at least one image to be analyzed; receiving an input of review criteria written in natural language; extracting semantic features from the at least one image to be analyzed and from the review criteria using a machine learning model that handles multiple data formats; and generating, with the machine learning model that handles multiple data formats, an evaluation of the conformity of the at least one image to be analyzed to the review criteria using the semantic features of the image to be analyzed and of the review criteria.
[0007] Furthermore, according to one aspect of this disclosure, an information processing device is provided, comprising: an image acquisition means for acquiring at least one image to be analyzed; a review criteria input means for receiving review criteria written in natural language; a feature extraction means for extracting semantic features from the at least one image to be analyzed and from the review criteria using a machine learning model that handles multiple data formats; and an evaluation generation means for generating, with the machine learning model that handles multiple data formats, an evaluation of the conformity of the at least one image to be analyzed to the review criteria using the semantic features of the image to be analyzed and of the review criteria.
[0008] Figure 1 is a functional block diagram showing an example of an information processing device.
Figure 2 is a flowchart showing an example of the processing flow of an information processing device.
Figure 3 is a diagram showing an example of the hardware configuration of an information processing device.
Figure 4 is a functional block diagram showing another example of an information processing device.
Figure 5 is a flowchart showing another example of the processing flow of an information processing device.
Figure 6 is a functional block diagram showing another example of an information processing device.
Figure 7 is a flowchart showing another example of the processing flow of an information processing device.
Figure 8 is a functional block diagram showing another example of an information processing device.
Figure 9 is a flowchart showing another example of the processing flow of an information processing device.
Figure 10 is a functional block diagram showing another example of an information processing device.
Figure 11 is a flowchart showing another example of the processing flow of an information processing device.
Figure 12 is a functional block diagram showing another example of an information processing device.
Figure 13 is a flowchart showing another example of the processing flow of an information processing device.
Figure 14 is a functional block diagram showing another example of an information processing device.
Figure 15 is a flowchart showing another example of the processing flow of an information processing device.
Figure 16 is a functional block diagram showing another example of an information processing device.
Figure 17 is a flowchart showing another example of the processing flow of an information processing device.
Figure 18 is a diagram showing an example of a screen provided by an information processing device.
Figure 19 is a diagram showing another example of a screen provided by an information processing device.
Figure 20 is a diagram showing another example of a screen provided by an information processing device.
[0009] The principles of this disclosure will be explained with reference to several exemplary embodiments. These embodiments are provided for illustrative purposes only and should be understood as aiding those skilled in the art to understand and implement this disclosure without implying any limitation on the scope of this disclosure. The disclosures described in this specification may be implemented in various ways other than those described below.
[0010] In the following description and claims, unless otherwise defined, all technical and scientific terms used in this specification have the same meaning as those generally understood by those skilled in the art to which this disclosure belongs.
[0011] Embodiments of this disclosure will be described below with reference to the drawings. Each drawing is merely illustrative for illustrating one or more embodiments. Each drawing may be associated not only with one specific embodiment but also with one or more other embodiments. As those skilled in the art will understand, various features or steps described with reference to any one drawing can be combined with features or steps shown in one or more other drawings, for example, to create embodiments not explicitly shown or described. Not all features or steps shown in any one drawing are necessarily required to illustrate an exemplary embodiment, and some features or steps may be omitted. The order of steps shown in any of the drawings may be changed as appropriate.
[0012] (Embodiment 1) Figure 16 is a functional block diagram showing an overview of the information processing device 10. Figure 17 is a flowchart showing an example of the processing flow executed by the information processing device 10.
[0013] As shown in Figure 16, the information processing device 10 includes an image acquisition unit 110, a review criteria input unit 120, a feature extraction unit 130, and an evaluation generation unit 140. These functional units execute the processes shown in the flowchart of Figure 17.
[0014] In S11, the image acquisition unit 110 acquires at least one image to be analyzed. In S12, the review criteria input unit 120 accepts review criteria written in natural language. In S13, the feature extraction unit 130 uses a machine learning model that handles multiple data formats to extract semantic features from the at least one image to be analyzed acquired in S11 and from the review criteria input in S12. In S14, the evaluation generation unit 140 uses the semantic features of the image to be analyzed and of the review criteria extracted in S13 to generate an evaluation of the conformity of the at least one image to be analyzed to the review criteria.
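The flow of S11 to S14 can be sketched in Python as follows. This is a minimal sketch, not the disclosed implementation: the class name InformationProcessingDevice and every callable passed to it (acquire_images, input_criteria, encode_image, encode_text, score) are hypothetical stand-ins for units 110 to 140. The only assumption taken from the text is that both encoders map into one shared feature space, so that an image vector and a criteria vector can be scored against each other.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class InformationProcessingDevice:
    """Hypothetical sketch of units 110-140; each callable is a stand-in."""

    acquire_images: Callable[[], Sequence[object]]   # unit 110 (S11)
    input_criteria: Callable[[], str]                 # unit 120 (S12)
    encode_image: Callable[[object], Vector]          # unit 130, image side
    encode_text: Callable[[str], Vector]              # unit 130, text side
    score: Callable[[Vector, Vector], float]          # unit 140 (S14)

    def run(self) -> List[float]:
        images = self.acquire_images()                # S11
        if not images:
            raise ValueError("at least one image to be analyzed is required")
        criteria = self.input_criteria()              # S12 (may also run before S11)
        # S13: both encoders must return vectors in the same feature space.
        criteria_vec = self.encode_text(criteria)
        image_vecs = [self.encode_image(img) for img in images]
        # S14: one conformity value per image to be analyzed.
        return [self.score(vec, criteria_vec) for vec in image_vecs]


# Usage with toy encoders: "images" are strings and features are 2-D one-hot vectors.
device = InformationProcessingDevice(
    acquire_images=lambda: ["red logo", "blue logo"],
    input_criteria=lambda: "the logo must be red",
    encode_image=lambda img: [1.0, 0.0] if "red" in img else [0.0, 1.0],
    encode_text=lambda text: [1.0, 0.0] if "red" in text else [0.0, 1.0],
    score=lambda a, b: sum(x * y for x, y in zip(a, b)),
)
print(device.run())  # [1.0, 0.0]
```

Injecting the units as callables keeps the sketch neutral about which multimodal model is used, which matches the text's statement that the model is not limited to a particular one.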
[0015] Note that the processing order of S11 and S12 is not limited to the example shown. S11 may be performed after S12, or S11 and S12 may be performed in parallel.
[0016] In this way, the information processing device 10 accepts an "image (image to be analyzed)" as the input to be evaluated. It also accepts "text indicating the criteria for evaluation (review criteria written in natural language)" as input. The information processing device 10 then evaluates whether the input "image" conforms to the "review criteria written in natural language (text)". The information processing device 10 can therefore evaluate whether "the object to be evaluated (an image) conforms to criteria defined by data (text) of a different type from the object to be evaluated".
[0017] Furthermore, the information processing device 10 extracts semantic features from both the image to be analyzed and the text defining the review criteria. In other words, the information processing device 10 generates comparable data (features) from the image to be analyzed and the review criteria, which are different types of data. The information processing device 10 then uses the generated comparable data to evaluate whether the image to be analyzed conforms to the review criteria. With such an information processing device 10, it is possible to directly compare and evaluate the image to be analyzed and the review criteria, which are different types of data.
[0018] As a comparative example of the information processing device 10, consider collecting multiple images that meet review criteria defined in text and generating, by machine learning on these images, an "evaluation model that evaluates whether an image meets the review criteria". In this example, the image to be analyzed is input into the evaluation model to determine whether it meets the review criteria. In such a comparative example, collecting multiple images that meet the review criteria requires considerable effort. If the images to be analyzed come in many variations, multiple images that meet the review criteria must be collected for each variation, and the effort becomes even greater. The information processing device 10 can mitigate such problems.
[0019] (Embodiment 2) Embodiment 2 will now be described.
[0020] [Explanation of the premise] Image review and evaluation often rely on subjective human judgment, which makes it difficult to evaluate images efficiently and consistently. Furthermore, when review criteria are written in natural language, it is not easy to apply those criteria directly to image evaluation. This embodiment therefore aims to evaluate images efficiently and objectively by analyzing images together with review criteria written in natural language.
[0021] [Description of Configuration] The outline of the information processing device 10 according to this embodiment will be described with reference to Figure 1. Figure 1 is a block diagram showing an example of the information processing device 10 according to this embodiment. The information processing device 10 comprises an image acquisition unit 110, a review criteria input unit 120, a feature extraction unit 130, an evaluation generation unit 140, and a display unit 150.
[0022] The image acquisition unit 110 acquires at least one image to be analyzed, and can do so in various ways. For example, the image acquisition unit 110 may acquire an image to be analyzed that the user has input to the information processing device 10 by any means, or it may acquire the image to be analyzed from an external device. The image acquisition unit 110 may also have an image generation function and generate a new image to be analyzed. The image generation function may automatically generate images (e.g., image generation AI (artificial intelligence)), may draw or generate images based on user input, or may generate an image by photographing a subject with a camera. Specific acquisition methods include taking pictures with a camera, reading from a database, downloading via a network, and acquiring design data created with a design tool.
[0023] The "image to be analyzed" may be a still image or a moving image. Furthermore, the image to be analyzed may be a color image (RGB image) or a black and white image. Also, the image to be analyzed may be a two-dimensional image or a three-dimensional image.
[0024] The review criteria input unit 120 accepts review criteria written in natural language. Input methods include direct input via a keyboard, voice input using speech recognition, and selection from a pre-prepared list of review criteria. The review criteria can also be read from files in various document formats, or read from paper using character recognition technology (e.g., OCR (optical character recognition)).
[0025] The following are examples of review criteria entered into the review criteria input unit 120.
[0026] [Example of a design application] One example of review criteria is the design examination standards used in examining design applications. For example, the design examination standards stipulate the following requirement of "novelty": "Designs that have been publicly known in Japan or abroad before the filing of the design registration application, designs described in distributed publications, or designs that have become available to the public through telecommunication lines, as well as designs similar to these designs, cannot be registered as designs."
[0027] Furthermore, the design examination standards stipulate, for example, the following criteria for determining similarity: "The similarity of designs is determined from the perspective of consumers (including traders). The method of determining similarity is as follows: First, it is determined whether the use and function of the articles, etc., to which the filed design and the cited design relate are identical or similar. Next, the shapes, etc., of both designs are identified, and commonalities and differences are extracted. In identifying the shapes, etc., both the basic configuration and specific configurations are considered. In the individual evaluation of commonalities and differences, the parts that are likely to attract the attention of consumers are identified as the essential parts of the design, and emphasis is placed on their similarity. In the case of partial designs, commonalities and differences in the use and function, location, size, scope, shape, etc., of the 'part for which design registration is sought' are identified. Finally, when determining the similarity of the designs as a whole, all commonalities and differences of both designs are observed comprehensively, and it is determined whether they evoke different aesthetic feelings in consumers."
[0028] The review criteria input unit 120 can accept input of text-based review criteria that include subjective elements such as those described above. The review criteria input unit 120 may accept the wording of the criteria as written, or a summary of the criteria. For example, the review criteria input unit 120 may accept input of the following text summarizing the design examination standards.
[0029] [Example of review criteria to be entered] "Please evaluate the image from the following perspectives:
1. Evaluate the similarity of the use and function of the articles from the consumer's perspective.
2. Analyze the following elements regarding the shape and other features:
- Basic configuration
- Specific details
- Identification of key parts that attract the consumer's attention
3. Extract and evaluate similarities and differences.
4. Evaluate the overall aesthetic impression."
[0030] The text (review criteria) accepted by the review criteria input unit 120 is not limited to the design examination standards described above. It also includes quality standards and brand guidelines for images and content, as well as the self-regulatory standards of various industry associations.
[0031] Specifically, the review criteria input unit 120 may accept, as review criteria, at least one of the following:
• Nutritional information labeling standards for food packaging
• Advertising regulations for pharmaceuticals
• Product labeling standards for apparel products
• Landscape regulations concerning the design of buildings
• Design guidelines set by rights holders for the commercialization of characters
• Safety standards for automobile exterior design
• Age appropriateness labeling standards for toys
• Ingredient labeling standards for cosmetic packaging
• Certification mark standards for environmentally friendly products
• Website accessibility standards
• Age restriction standards for video content
• UI (user interface) / UX (user experience) guidelines for smartphone applications
• Corporate logo usage regulations
• Ethical regulations for advertising materials
• Environmental consideration standards for package design
• Fire safety certification standards for building materials
• Safety labeling standards for home appliances
• Ethical standards for AI-generated content
• NFT (non-fungible token) certification standards for digital art
• Avatar design regulations for metaverse spaces
• Safety standards for augmented reality (AR) content
• Product labeling standards related to sustainability
[0032] Here, as another example of evaluation criteria, we will explain specific examples of evaluation criteria for certification marks for environmentally friendly products and functional claims for health foods.
[0033] [Examples of environmentally friendly product certification and health food functional claims] For example, the review criteria input unit 120 can accept specific review criteria in text format for environmentally friendly product certification marks, such as the following.
[0034] Please evaluate the environmental certification mark on the product packaging from the following perspectives:
1. Mark visibility evaluation
- Adherence to the minimum size (7 mm x 7 mm or larger)
- Use of the specified color (Process Green, C100% + Y100%)
- Luminance contrast with the background (contrast ratio of 4.5:1 or higher)
2. Environmental information display requirements
- Display of the CO₂ reduction rate (a reduction of 15% or more)
- Display of recyclable material content
- Specific numerical values for the reduction in environmental impact
3. Prohibited expressions
- Exaggerated expressions such as "fully environmentally friendly"
- Environmental claims without specific numerical evidence
4. Placement requirements
- Visible together with the product name
- Spacing from other certification marks (minimum 5 mm)
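Some items in this example are quantitative and can be checked directly against the rendered package image. The Python sketch below checks the size and contrast items. It assumes that the 4.5:1 figure uses the WCAG 2.x definition of contrast ratio (4.5:1 is also WCAG's minimum for normal text) and that the mark and background colors have already been measured from the image as 8-bit sRGB values. The function names and the rule-based approach are illustrative assumptions; the text itself leaves the evaluation to the machine learning model, and converting the CMYK process color to sRGB would require a color profile, which is not shown.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]


def _linearize(channel_8bit: int) -> float:
    """Convert one sRGB channel (0-255) to linear light, per the WCAG 2.x formula."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05); ranges from 1 to 21."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def check_mark_visibility(width_mm: float, height_mm: float,
                          mark_rgb: RGB, background_rgb: RGB) -> Dict[str, bool]:
    """Hypothetical rule check for the minimum-size and contrast items."""
    return {
        "min_size_ok": width_mm >= 7.0 and height_mm >= 7.0,
        "contrast_ok": contrast_ratio(mark_rgb, background_rgb) >= 4.5,
    }


# Usage: an 8 mm x 8 mm black mark on a white background.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))        # 21.0
print(check_mark_visibility(8.0, 8.0, (0, 0, 0), (255, 255, 255)))  # {'min_size_ok': True, 'contrast_ok': True}
```

Items such as "Exaggerated expressions" have no numeric threshold, which is where the natural-language matching described in the following paragraphs is needed.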
[0035] Furthermore, the review criteria input unit 120 can accept specific review criteria in text format for functional claims of health foods, such as the following:
[0036] Please evaluate the package design of the functional food from the following perspectives:
1. Essential elements of functional claims
- The phrase "functional food" (8 points or larger)
- Name and amount of the functional ingredient
- Recommended daily intake
- Functionality supported by scientific evidence
2. Warning label requirements
- Statement that "This product is not intended for the diagnosis, treatment, or prevention of disease"
- Statement that "Maintain a balanced diet based on staple foods, main dishes, and side dishes."
3. Layout regulations
- Size and position of the display on the main surface
- Relationship with the nutrition information
4. Prohibited expressions
- Claims of medicinal efficacy
- Exaggerated claims of expected effects
[0037] These review criteria are processed by the feature extraction unit 130 and the other units described later, and are used to evaluate the image to be analyzed. Because the review criteria are accepted in text format, the review can be performed even if the criteria contain ambiguous elements. In the case of health foods and alternative foods in particular, information should be conveyed accurately, yet review criteria are often not established or contain ambiguity. By allowing review criteria to be entered flexibly as text, this embodiment can support the evaluation of images to be analyzed in various fields.
[0038] The feature extraction unit 130 extracts semantic features from the image to be analyzed and the review criteria. The feature extraction unit 130 can extract semantic features that indicate visual features from the image to be analyzed. In addition, the feature extraction unit 130 can extract semantic features that indicate linguistic features from the review criteria.
[0039] The feature extraction unit 130 can perform the extraction using a machine learning model that handles multiple data formats. This machine learning model can accept both image data and natural language text as input and extract semantic features from each. For example, a multimodal machine learning model can be constructed by combining a convolutional neural network (CNN) for images with a large language model for text. Existing multimodal foundation models can also be used. An example of such a language model is BERT (Bidirectional Encoder Representations from Transformers), but the model is not limited to this.
[0040] The feature extraction unit 130 may use a multimodal model that handles data of different formats in a unified feature space, such as the CLIP (Contrastive Language-Image Pre-training) model. By using the CLIP model, for example, the feature extraction unit 130 can handle images and natural language in a single feature space. The CLIP model is pre-trained on a large number of image-caption pairs collected from the internet, and uses an image encoder and a language encoder to project both onto the same feature space. This makes it possible to directly compare the visual features of an image with the linguistic features of the review criteria.
[0041] Specifically, feature extraction using the CLIP model is performed in the following steps: First, an image encoder converts the image to be analyzed into a vector representation. Next, a language encoder converts the review criteria text into a vector representation. Then, the CLIP model projects both onto the same feature space, making it possible to calculate semantic similarity.
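The three steps above can be sketched as follows. In CLIP itself the encoders are deep networks and the projections are learned by contrastive pre-training; here the encoder outputs are given directly, and the projection matrices W_IMAGE and W_TEXT are hypothetical toy values chosen only to show that feature vectors of different lengths become comparable once they are projected into one space and L2-normalized. With the Hugging Face transformers library, CLIPModel.get_image_features and CLIPModel.get_text_features return the corresponding projected embeddings for a real checkpoint.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def project(features: Sequence[float], weights: Matrix) -> List[float]:
    """Linear projection; `weights` has one row per dimension of the shared space."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def clip_style_similarity(image_features: Sequence[float], text_features: Sequence[float],
                          w_image: Matrix, w_text: Matrix) -> float:
    """Project both modalities into one space and return their cosine similarity."""
    img = l2_normalize(project(image_features, w_image))   # image encoder output, projected
    txt = l2_normalize(project(text_features, w_text))     # language encoder output, projected
    return sum(a * b for a, b in zip(img, txt))           # semantic similarity


# Toy example: a 3-D image feature and a 2-D text feature share a 2-D space.
W_IMAGE = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_TEXT = [[1.0, 0.0], [0.0, 1.0]]
print(clip_style_similarity([3.0, 0.0, 5.0], [2.0, 0.0], W_IMAGE, W_TEXT))  # 1.0
print(clip_style_similarity([3.0, 0.0, 5.0], [0.0, 4.0], W_IMAGE, W_TEXT))  # 0.0
```

Normalizing before the dot product makes the score a cosine similarity, so it depends on the direction of the vectors rather than on their magnitudes.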
[0042] The generative model used by the feature extraction unit 130 to extract semantic features is not limited to the above example. Any generative model that can handle multiple data formats in a single feature space can be used.
[0043] The feature extraction unit 130 may use, for example, any of the following generative models. These models can convert data of multiple different formats into an appropriate feature space and capture the semantic relationships between the formats.
- A multimodal generative model that can uniformly handle multiple modalities such as text, images, audio, and video
- A conversion model capable of converting between different data formats
- A multimodal generative model based on a VAE (variational autoencoder) or a GAN (generative adversarial network)
- A latent representation learning model that maps multiple modalities to a common latent space
- A model that learns the relationships between modalities using an attention mechanism such as the Transformer
- A generative model based on a diffusion model
[0044] Thus, by using "a generative model that has a common feature space able to handle multiple modalities uniformly and that appropriately learns the semantic relationships between the modalities", the feature extraction unit 130 can perform appropriate feature extraction for various types of data.
[0045] As an example, the processing of the feature extraction unit 130 when checking whether the actions of a character in a moving image comply with review criteria will now be described.
[0046] Suppose the image acquisition unit 110 has acquired a moving image of a character. The feature extraction unit 130 extracts the posture and movement of the character in the moving image as time-series features and projects them onto a feature space shared with the text of the review criteria.
[0047] For example, suppose the review criteria input unit 120 has received the following text (review criteria) as criteria for the actions of a character in a virtual space.
[0048] Please evaluate the movement of the 3D character from the following perspectives:
1. Safety evaluation of movement
- Detection of sudden movements (acceleration threshold: 3 m/s² or more)
- Maintaining balance (limiting the range of movement of the center of gravity)
- Limiting the range of motion of joints (within the natural range of motion of the human body)
2. Animation quality
- Smoothness of movement (no abrupt changes between frames)
- Expression of natural weight
- Appropriateness of secondary movements (swaying of clothes and hair)
3. Ethical considerations
- Restriction of violent movements
- Restriction of movements that contain sexual suggestions
- Restriction of movements that insult specific cultures or religions
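The first safety item becomes a simple numeric test once time-series features are available. The sketch below flags frames whose acceleration reaches 3 m/s². It assumes, hypothetically, that the character's center-of-gravity position has already been extracted for every frame in metres at a fixed frame interval dt in seconds, and it estimates acceleration with a central second difference. It illustrates this one criterion with a rule; it is not the common-feature-space evaluation described in the text.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # hypothetical center-of-gravity position (m)


def accelerations(positions: Sequence[Point], dt: float) -> List[float]:
    """Acceleration magnitude (m/s^2) at each interior frame via central second differences."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    result = []
    for prev, cur, nxt in zip(positions, positions[1:], positions[2:]):
        # a ~ (x[t+1] - 2 x[t] + x[t-1]) / dt^2, computed per axis
        a = [(n - 2 * c + p) / (dt * dt) for p, c, n in zip(prev, cur, nxt)]
        result.append(sum(v * v for v in a) ** 0.5)
    return result


def sudden_movement_frames(positions: Sequence[Point], dt: float,
                           threshold: float = 3.0) -> List[int]:
    """Indices of frames whose acceleration is at or above `threshold` (m/s^2)."""
    # accelerations()[i] belongs to frame i + 1 (the first and last frames have none).
    return [i + 1 for i, a in enumerate(accelerations(positions, dt)) if a >= threshold]


# Usage: at 10 frames per second the character starts moving abruptly at frame 2.
frames = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (0.1, 0.0, 0.0)]
print(sudden_movement_frames(frames, dt=0.1))  # [2]
```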
[0049] For example, the feature extraction unit 130 can represent the character's movement data and the review criteria text in a common feature space so that the two can be compared and evaluated.
[0050] The evaluation generation unit 140 generates an evaluation of the conformity of the image to the review criteria using the semantic features extracted by the feature extraction unit 130. The "semantic features extracted by the feature extraction unit 130" include semantic features that represent the visual features of the image to be analyzed and semantic features that represent the linguistic features of the review criteria.
[0051] The "suitability evaluation" generated by the evaluation generation unit 140 is an index indicating the extent to which the analyzed image conforms to the review criteria, and may be expressed in various formats, such as numerical scores, category classifications, and evaluation comments in text format. Methods such as calculating the similarity between extracted semantic features, or similarity calculation and classification using machine learning models, may be used to generate the suitability evaluation. Furthermore, a configuration is also possible in which images and text are input simultaneously into a single integrated model to directly generate the suitability evaluation.
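One way to turn similarities into the score, category, and comment formats listed above is sketched below. It assumes, hypothetically, that the review criteria have been split into items (for example, the numbered perspectives in the examples above) and that a cosine similarity in [-1, 1] has already been computed for each item. The thresholds in CATEGORY_THRESHOLDS and the linear rescaling to a 0 to 100 score are illustrative placeholders that would need calibration for any real model.

```python
from typing import Dict, List, Tuple

# Hypothetical thresholds on cosine similarity, checked from highest to lowest.
CATEGORY_THRESHOLDS: List[Tuple[float, str]] = [
    (0.30, "conforms"),
    (0.20, "needs review"),
]


def categorize(similarity: float) -> str:
    for threshold, label in CATEGORY_THRESHOLDS:
        if similarity >= threshold:
            return label
    return "does not conform"


def conformity_evaluation(item_similarities: Dict[str, float]) -> Dict[str, object]:
    """Turn per-item similarities into a numeric score, categories, and a text comment."""
    if not item_similarities:
        raise ValueError("no review criteria items given")
    categories = {item: categorize(s) for item, s in item_similarities.items()}
    # Overall score: mean similarity rescaled linearly from [-1, 1] to [0, 100].
    mean = sum(item_similarities.values()) / len(item_similarities)
    score = round((mean + 1.0) / 2.0 * 100.0, 1)
    failing = [item for item, label in categories.items() if label != "conforms"]
    comment = "All items conform." if not failing else "Check: " + ", ".join(failing)
    return {"score": score, "categories": categories, "comment": comment}


# Usage with two hypothetical criteria items.
result = conformity_evaluation({"mark visibility": 0.4, "prohibited expressions": 0.0})
print(result["score"], result["comment"])  # 60.0 Check: prohibited expressions
```

A single overall score of this kind could feed a progress bar such as the one on the operation screen described later, while the per-item categories and comment supply the detailed explanation.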
[0052] The display unit 150 displays the conformity evaluation generated by the evaluation generation unit 140, for example as numerical scores, graphs, color coding, or text explanations. The display unit 150 may output the conformity evaluation via a display or projection device included in the information processing device 10. Alternatively, the display unit 150 may transmit the conformity evaluation to an external device and have the external device display it.
[0053] [Explanation of Operation] Figure 2 is a flowchart showing an example of the operation of the information processing device 10. A series of processes of the information processing device 10 will be explained with reference to Figure 2.
[0054] First, the image acquisition unit 110 acquires the image to be analyzed (S110). Next, the review criteria input unit 120 receives the review criteria written in natural language (S120). Note that the processing order of S110 and S120 is not limited to this.
[0055] Next, the feature extraction unit 130 extracts semantic features from the image to be analyzed acquired in S110 and the review criteria entered in S120 (S130). In this process, the feature extraction unit 130 extracts visual features from the image to be analyzed and linguistic features from the review criteria as semantic features.
[0056] The evaluation generation unit 140 uses the semantic features extracted in S130 to generate an evaluation of the suitability of the analyzed image to the review criteria (S140). Finally, the display unit 150 displays the evaluation results generated in S140 (S150).
[0057] [Example of Operation Screen] The operation screen of the information processing device 10 that executes the processing of each step in the flowchart of Figure 2 will be described below. Figure 18 is a diagram showing an example of the operation screen of the information processing device 10.
[0058] The user interface consists of a header at the top of the screen and two columns below it: an input section and an output section. The input section is located on the left side of the screen, and the output section is located on the right side.
[0059] The input section has an area for acquiring the image to be analyzed and an area for entering the review criteria, arranged vertically. In the area for acquiring the image to be analyzed, you can acquire the image to be analyzed by uploading, acquiring it from an external source, or generating the image. In the area for entering the review criteria, you can select the type of review criteria (design review criteria, quality criteria, brand guidelines, etc.) and enter the corresponding review criteria in natural language.
[0060] The output section has a feature extraction results area and a suitability evaluation results area arranged vertically. The feature extraction results area displays the visual features extracted from the analyzed image and the linguistic features extracted from the review criteria. The suitability evaluation results area displays the suitability evaluation of the analyzed image to the review criteria based on the feature extraction results. The suitability evaluation consists of a progress bar indicating the overall degree of suitability and a detailed explanation of the evaluation results.
[0061] An evaluation execution button is located at the bottom of the output section. After the image to be analyzed has been acquired and the review criteria have been entered, clicking the evaluation execution button executes the processes of S130 and S140 (feature extraction and suitability evaluation generation).
[0062] This user interface allows users to intuitively perform tasks ranging from acquiring images to be analyzed to inputting review criteria and confirming evaluation results. Each function is organized into sections, allowing users to proceed with operations according to the processing flow.
[Hardware Configuration] An example of the hardware configuration of the information processing device 10 is described below. Each functional unit of the information processing device 10 is realized by any combination of hardware and software. Those skilled in the art will understand that there are various modifications to the implementation method and the device. The software includes programs that are pre-installed on the device at the time of shipment, as well as programs installed from recording media such as CDs (Compact Discs) or downloaded from servers on the Internet.
[0063] Figure 3 shows an example of the hardware configuration of the information processing device 10. In the example in Figure 3, the information processing device 10 (computer 100) includes a processor 101, memory 102, and a communication interface 103. These parts may be connected by a bus or the like. The memory 102 stores at least a portion of the program 104. The communication interface 103 includes an interface necessary for communication with other network elements.
[0064] When the program 104 is executed in cooperation with the processor 101, the memory 102, and so on, the computer 100 performs at least part of the processing of the embodiments of this disclosure. The memory 102 may be of any type. As a non-limiting example, the memory 102 may be a non-transitory computer-readable storage medium. The memory 102 may also be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory, and removable memory. Although only one memory 102 is shown for the computer 100, the computer 100 may have several physically distinct memory modules. The processor 101 may be of any type. As non-limiting examples, the processor 101 may include one or more of general-purpose computers, special-purpose computers, microprocessors, digital signal processors (DSPs), and processors based on a multicore processor architecture. The computer 100 may have multiple processors, such as application-specific integrated circuit chips that are slaved in time to a clock that synchronizes the main processor.
[0065] Embodiments of this disclosure may be implemented in hardware or in dedicated circuitry, software, logic, or any combination thereof. Some embodiments may be implemented in hardware, while others may be implemented in firmware or software that can be executed by a controller, microprocessor, or other computing device.
[0066] This disclosure also provides at least one computer program product tangibly stored on a non-transitory computer-readable storage medium. The computer program product includes computer-executable instructions, such as instructions contained in program modules, and is executed in a device on a target real or virtual processor to perform the processes or methods of this disclosure. Program modules include routines, programs, libraries, objects, classes, components, data structures, etc., that perform specific tasks or implement specific abstract data types. The functionality of the program modules may be combined or divided among program modules as desired in various embodiments. The machine-executable instructions of the program modules can be executed in a local or distributed device. In a distributed device, the program modules can reside on both local and remote storage media.
[0067] Program code for performing the methods of this disclosure may be written in any combination of one or more programming languages. The program code is provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing device so that, when executed by the processor or controller, it implements the functions/operations specified in the flowcharts and/or block diagrams. The program code may run entirely on a machine, partly on a machine as a standalone software package, partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
[0068] Programs can be stored and supplied to a computer using various types of non-transitory computer-readable media. Non-transitory computer-readable media include various types of tangible recording media. Examples of non-transitory computer-readable media include magnetic recording media, magneto-optical recording media, optical disc media, and semiconductor memory. Magnetic recording media include, for example, flexible disks, magnetic tapes, and hard disk drives. Magneto-optical recording media include, for example, magneto-optical disks. Optical disc media include, for example, Blu-ray discs, CD (Compact Disc)-ROM (Read Only Memory), CD-R (Recordable), and CD-RW (ReWritable). Semiconductor memory includes, for example, solid-state drives, mask ROMs, PROMs (Programmable ROMs), EPROMs (Erasable PROMs), flash ROMs, and RAMs (Random Access Memory). Programs may also be supplied to a computer using various types of transitory computer-readable media. Examples of transitory computer-readable media include electrical signals, optical signals, and electromagnetic waves. Transitory computer-readable media can supply programs to a computer via wired communication channels, such as electric wires and optical fibers, or via wireless communication channels.
[0069] [Example of Use Case] As a specific example of this embodiment, the processing when design rules are input as review criteria will be described. For example, the review criteria input unit 120 accepts input of design rules (review criteria) such as "the logo should be placed in the upper left corner of the screen and its size should be within 10% of the screen" and "the background color should be the company's color and the contrast ratio should be 4.5:1 or higher." The feature extraction unit 130 analyzes these review criteria using natural language processing technology and extracts linguistic features (semantic features).
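The following toy sketch shows one concrete form that "linguistic features" could take for the two example rules. It pulls the numeric limits and the required position out of the sentences with regular expressions. This is only a stand-in under loose assumptions: the disclosure relies on natural language processing and machine learning models, and a regex cannot handle free-form criteria. The function name and dictionary keys are hypothetical.

```python
import re


def extract_numeric_constraints(criterion: str) -> dict:
    """Toy extraction of machine-checkable limits from one rule sentence."""
    constraints = {}
    # "within 10%" -> maximum share of the screen area
    pct = re.search(r"within\s+(\d+(?:\.\d+)?)%", criterion)
    if pct:
        constraints["max_area_pct"] = float(pct.group(1))
    # "4.5:1 or higher" -> minimum contrast ratio
    ratio = re.search(r"(\d+(?:\.\d+)?):1\s+or\s+higher", criterion)
    if ratio:
        constraints["min_contrast_ratio"] = float(ratio.group(1))
    if "upper left" in criterion:
        constraints["position"] = "upper_left"
    return constraints


rule = ("the logo should be placed in the upper left corner of the screen "
        "and its size should be within 10% of the screen")
print(extract_numeric_constraints(rule))
```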
[0070] Furthermore, the image acquisition unit 110 acquires the image to be analyzed. The feature extraction unit 130 then analyzes the image using image recognition technology and extracts visual features (semantic features) such as the position and size of the logo, background color, and contrast ratio. The evaluation generation unit 140 compares the extracted linguistic features (semantic features) of the evaluation criteria with the visual features (semantic features) of the image and quantifies the degree of conformance to each criterion.
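The text does not define "contrast ratio." The 4.5:1 threshold matches the WCAG 2.x minimum for normal text, so the sketch below assumes the WCAG 2.x definition: channels are linearized sRGB values, relative luminance L = 0.2126 R + 0.7152 G + 0.0722 B, and the ratio is (L_lighter + 0.05) / (L_darker + 0.05). The colors in the usage line are hypothetical.

```python
def _linearize(channel: int) -> float:
    """sRGB channel (0-255) to linear light, using the WCAG 2.x threshold 0.03928."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05); ranges from 1 to 21."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical foreground on a hypothetical background, checked against 4.5:1.
ratio = contrast_ratio((255, 255, 255), (0, 102, 204))
print(f"{ratio:.2f}:1", "meets" if ratio >= 4.5 else "fails", "the 4.5:1 criterion")
```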
[0071] For example, if a logo is placed in the upper left corner of the screen but occupies 12% of the screen area, the evaluation generation unit 140 will generate a suitability evaluation of "Position: Suitable, Size: Unsuitable". Similarly, the evaluation generation unit 140 can also calculate the similarity between the background color and the company's colors, as well as the actual contrast ratio, and generate a comprehensive suitability evaluation that includes these results.
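The per-criterion judgment in this example reduces to independent checks: one for position and one for size. A minimal sketch follows. It assumes image recognition has already produced the logo's corner and its area as a percentage of the screen. The `LogoObservation` type and its field encoding are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogoObservation:
    corner: str       # detected placement, e.g. "upper_left" (hypothetical encoding)
    area_pct: float   # logo area as a percentage of the screen area


def _label(ok: bool) -> str:
    return "Suitable" if ok else "Unsuitable"


def evaluate_logo(obs: LogoObservation, required_corner: str = "upper_left",
                  max_area_pct: float = 10.0) -> str:
    """Judge position and size separately, as in the example above."""
    position_ok = obs.corner == required_corner
    size_ok = obs.area_pct <= max_area_pct
    return f"Position: {_label(position_ok)}, Size: {_label(size_ok)}"


print(evaluate_logo(LogoObservation(corner="upper_left", area_pct=12.0)))
# prints "Position: Suitable, Size: Unsuitable"
```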
[0072] The display unit 150 presents the generated suitability evaluation to the user. For example, the display unit 150 may generate and output natural language describing the content of the generated suitability evaluation. The display unit 150 may also generate and output a specific feedback message from the generated suitability evaluation, for example by using generative AI. One such feedback message is: "The logo placement is appropriate, but the size is inappropriate. The size needs to be reduced by 2 percentage points. The background color conforms to the company's colors, but the contrast ratio is 4.2:1, which is below the standard."
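The disclosure suggests generative AI for this message. A deterministic template is a simple fallback that produces similar text from the same per-item results. The sketch below is such a template, not the disclosed generator. It reports the size excess in percentage points of screen area (12% minus 10%). The function name and parameters are hypothetical.

```python
def feedback_message(position_ok: bool, area_pct: float, max_area_pct: float,
                     color_ok: bool, contrast: float, min_contrast: float) -> str:
    """Template-based feedback; a stand-in for a generative-AI message."""
    sentences = [
        "The logo placement is appropriate." if position_ok
        else "The logo placement is inappropriate."
    ]
    if area_pct > max_area_pct:
        excess = area_pct - max_area_pct  # percentage points of screen area
        sentences.append(
            f"The logo size is inappropriate and needs to be reduced by "
            f"{excess:.1f} percentage points.")
    else:
        sentences.append("The logo size is appropriate.")
    sentences.append(
        "The background color conforms to the company's colors." if color_ok
        else "The background color does not conform to the company's colors.")
    if contrast < min_contrast:
        sentences.append(
            f"The contrast ratio is {contrast:.1f}:1, which is below the "
            f"required {min_contrast:.1f}:1.")
    return " ".join(sentences)


print(feedback_message(True, 12.0, 10.0, True, 4.2, 4.5))
```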
[0073] [Description of Effects] The information processing device 10 in this embodiment achieves the same effects as the information processing device 10 in Embodiment 1. Furthermore, the information processing device 10 can comprehensively analyze two different types of data: "images" and "evaluation criteria described in natural language." This enables objective image evaluation that does not rely on human subjectivity, and allows for the efficient and consistent evaluation of a large number of images. In addition, complex evaluation criteria described in natural language can be directly applied to image evaluation through machine learning models.
[0074] (Embodiment 3) Embodiment 3 will now be described. Embodiment 3 differs from Embodiments 1 and 2 in that it further includes a correction instruction receiving unit and an image generation unit in addition to the configuration of either Embodiment 1 or 2.
[0075] [Explanation of the premise] In image review and evaluation, there is a growing need to improve images based on objective evaluation results, rather than relying solely on subjective human judgment. However, it is not easy to reflect evaluation results in specific image modifications. Therefore, this embodiment aims to provide a function that accepts modification instructions in natural language and automatically modifies the image based on them. This will allow users to intuitively improve images based on evaluation results.
[0076] [Description of Configuration] Referring to Figure 4, an overview of the information processing device 20 according to this embodiment will be described. Figure 4 is a block diagram showing an example of the information processing device 20 according to this embodiment. The information processing device 20 comprises an image acquisition unit 110, a review criteria input unit 120, a feature extraction unit 130, an evaluation generation unit 140, a display unit 150, a correction instruction receiving unit 210, and an image generation unit 220. The basic configurations of the image acquisition unit 110, the review criteria input unit 120, the feature extraction unit 130, the evaluation generation unit 140, and the display unit 150 are the same as in Embodiments 1 and 2. Therefore, the correction instruction receiving unit 210 and the image generation unit 220 will be described in detail here.
[0077] The correction instruction receiving unit 210 receives, in natural language, correction instructions for the image to be analyzed based on the suitability evaluation generated by the evaluation generation unit 140. For example, the correction instruction receiving unit 210 may receive correction instructions from a user who has checked the suitability evaluation, such as "make the background brighter" or "make the logo larger." The correction instruction receiving unit 210 may also generate correction instructions itself. For example, for each item of the review criteria, the correction instruction receiving unit 210 may compare the semantic features of the image to be analyzed with the semantic features of the review criteria and determine, as items to be corrected, the items for which there is a difference (or for which the difference is greater than or equal to a threshold). The correction instruction receiving unit 210 may then generate correction instructions that change the semantic features of the items to be corrected, as extracted from the image to be analyzed, so as to eliminate the difference.
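The item-by-item comparison can be sketched under a simplifying assumption: each review-criteria item and its counterpart in the image have already been reduced to a single numeric feature, and the criterion's value is treated as the target. With the default threshold of 0, any nonzero difference triggers an instruction. The function name and feature keys are hypothetical.

```python
def generate_instructions(image_feats: dict[str, float],
                          target_feats: dict[str, float],
                          threshold: float = 0.0) -> list[str]:
    """Emit one instruction per item whose difference reaches the threshold."""
    instructions = []
    for item, target in target_feats.items():
        actual = image_feats.get(item)
        if actual is None:
            continue  # item not observed in the image; nothing to compare
        diff = actual - target
        if diff != 0 and abs(diff) >= threshold:
            direction = "decrease" if diff > 0 else "increase"
            instructions.append(f"{direction} {item} from {actual:g} to {target:g}")
    return instructions


print(generate_instructions({"logo_area_pct": 12.0, "brightness": 0.50},
                            {"logo_area_pct": 10.0, "brightness": 0.52},
                            threshold=0.5))
# prints ['decrease logo_area_pct from 12 to 10']
```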
[0078] Furthermore, the correction instruction receiving unit 210 may consider the review criteria entered in the review criteria input unit 120, determine whether the correction instruction conforms to the review criteria, and then issue the correction instruction to the image generation unit 220, which will be described later. This determination is made before the image generation unit 220 generates the corrected image. This ensures that the corrected image conforms to the review criteria. Below is an example of the process for determining whether a correction instruction conforms to the review criteria.
[0079] [Example of Review Criteria and Correction Instructions] For example, suppose the review criteria input unit 120 receives the following review criteria: "The logo should be placed in the upper left corner of the screen, and its size should be within 10% of the screen."
[0080] The correction instruction receiving unit 210 then receives the following correction instruction from the user: "Display the logo 25% larger."
[0081] In this case, the correction instruction receiving unit 210 determines whether the size of the logo will remain within 10% of the screen (that is, conform to the review criteria) if the current logo is displayed 25% larger. If the determination shows that the instruction conforms to the review criteria, the correction instruction receiving unit 210 accepts the correction instruction. The image generation unit 220, described later, then corrects the image to be analyzed based on the correction instruction received from the user and generates a corrected image. On the other hand, if the determination shows that the instruction does not conform to the review criteria, the correction instruction receiving unit 210 does not have to accept the correction instruction. If the correction instruction is not accepted, the image generation unit 220, described later, does not correct the image to be analyzed based on that correction instruction.
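"25% larger" is ambiguous: the 25% could apply to the logo's area or to its width and height. In the second case, the area grows by 1.25² ≈ 1.56. The sketch below checks the instruction against the 10% limit under either reading before any image is generated. The current logo size of 7% and the function names are hypothetical.

```python
def predicted_area_pct(current_area_pct: float, enlarge_pct: float,
                       linear: bool = False) -> float:
    """Logo area after enlargement, as a percentage of the screen.

    linear=False treats "25% larger" as 25% more area; linear=True treats it as
    25% more width and height, so the area grows by the square of the factor.
    """
    factor = 1 + enlarge_pct / 100
    return current_area_pct * (factor ** 2 if linear else factor)


def instruction_conforms(current_area_pct: float, enlarge_pct: float,
                         max_area_pct: float = 10.0, linear: bool = False) -> bool:
    """Check a correction instruction against the size criterion before generating an image."""
    return predicted_area_pct(current_area_pct, enlarge_pct, linear) <= max_area_pct


print(instruction_conforms(7.0, 25))               # area reading: 8.75% of the screen
print(instruction_conforms(7.0, 25, linear=True))  # linear reading: about 10.94%
```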
[0082] Furthermore, if the determination shows that the correction instruction does not conform to the review criteria, the correction instruction receiving unit 210 may notify the user to revise the correction instruction. For example, the correction instruction receiving unit 210 may also notify the user of a proposed change to the correction instruction, such as: "Enlarging the logo according to the correction instruction may exceed the limit set by the review criteria. Shall we adjust the size so that it fits within 10% of the screen?" If the user gives input agreeing to the proposed change, the correction instruction receiving unit 210 may revise the correction instruction received from the user as proposed (so that it conforms to the review criteria). If the user gives input disagreeing with the proposed change, the correction instruction receiving unit 210 does not have to revise the correction instruction received from the user. In this case, the correction instruction receiving unit 210 may accept the correction instruction received from the user as is, and the image generation unit 220, described later, corrects the image to be analyzed based on the unrevised correction instruction and generates a corrected image. Alternatively, if the user gives input disagreeing with the proposed change, the correction instruction receiving unit 210 does not have to accept the correction instruction entered by the user. In this case, the image generation unit 220, described later, does not correct the image to be analyzed (does not generate a corrected image) based on the unrevised correction instruction received from the user.
[0083] The image generation unit 220 generates a corrected image based on the correction instructions received by the correction instruction receiving unit 210 and the image to be analyzed acquired by the image acquisition unit 110. The image generation unit 220 can generate a corrected image by correcting the image to be analyzed according to the correction instructions. The image generation unit 220 can generate the corrected image using a machine learning model that handles multiple data formats. This machine learning model is based on a large language model specialized for image generation and image editing tasks, and it can interpret natural-language instructions and correct images accordingly.
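The paragraph describes three outcomes for a non-conforming instruction: revise it to the proposal, accept it unchanged, or reject it. The sketch below encodes those outcomes. It assumes the area reading of "25% larger" and that the proposal is the largest enlargement that still fits the 10% limit. The `accept_on_disagree` flag, which selects between the two alternatives the text allows when the user disagrees, and all names are hypothetical.

```python
from typing import Optional


def max_conforming_enlarge_pct(current_area_pct: float, max_area_pct: float = 10.0) -> float:
    """Largest enlargement (in % of the current area) that keeps the logo within the limit."""
    return max(0.0, (max_area_pct / current_area_pct - 1) * 100)


def resolve_instruction(requested_pct: float, current_area_pct: float,
                        user_agrees: bool, accept_on_disagree: bool = False,
                        max_area_pct: float = 10.0) -> Optional[float]:
    """Return the enlargement to apply, or None if no corrected image is generated."""
    if current_area_pct * (1 + requested_pct / 100) <= max_area_pct:
        return requested_pct  # already conforms: accept unchanged
    if user_agrees:
        # User accepted the proposal: revise to the largest conforming enlargement.
        return max_conforming_enlarge_pct(current_area_pct, max_area_pct)
    # User rejected the proposal: accept the original as is, or do not accept it.
    return requested_pct if accept_on_disagree else None


print(resolve_instruction(25, 9.0, user_agrees=True))  # about 11.1% instead of 25%
```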
[0084] [Explanation of Operation] Figure 5 is a flowchart showing an example of the operation of the information processing device 20. A series of processes of the information processing device 20 will be explained with reference to Figure 5.
[0085] First, the image acquisition unit 110 acquires the image to be analyzed (S210). Also, the review criteria input unit 120 receives review criteria written in natural language (S220). The processing order of S210 and S220 is not limited to this.
[0086] Next, the feature extraction unit 130 extracts semantic features from the image to be analyzed acquired in S210 and the review criteria input in S220 (S230). Then, the evaluation generation unit 140 uses the semantic features extracted in S230 to generate an evaluation of the suitability of the image to be analyzed to the review criteria (S240).
[0087] Next, the display unit 150 displays the evaluation results generated in S240 (S250). Then, the correction instruction receiving unit 210 receives correction instructions (user input) for the image to be analyzed in natural language (S260). Finally, the image generation unit 220 generates and outputs a corrected image based on the correction instructions received in S260 and the original image to be analyzed (S270).
[0088] The information processing device 20 may repeat the above process. That is, the information processing device 20 may acquire the corrected image generated in S270 as the image to be analyzed (S210). Then, the information processing device 20 may perform the processing from S230 onward using the newly acquired image to be analyzed. The information processing device 20 may also perform this loop processing in response to instructions from the user.
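The repeat-until-acceptable loop can be sketched with the three stages passed in as callables. This keeps the loop independent of any particular model. The target score, the round limit, and the convention that `get_instruction` returns `None` to stop (for example, when the user ends the loop) are hypothetical choices. The text only says the loop may run in response to user instructions.

```python
from typing import Any, Callable, Optional


def refine(image: Any, criteria: str,
           evaluate: Callable[[Any, str], float],
           get_instruction: Callable[[Any, float], Optional[str]],
           generate: Callable[[Any, str], Any],
           target_score: float = 90.0, max_rounds: int = 5):
    """Repeat S230-S270, feeding each corrected image back in as the new image (S210)."""
    history = []
    for _ in range(max_rounds):
        score = evaluate(image, criteria)            # S230-S240
        history.append(score)
        if score >= target_score:
            break                                    # good enough: stop refining
        instruction = get_instruction(image, score)  # S260 (user input or generated)
        if instruction is None:
            break                                    # no further correction requested
        image = generate(image, instruction)         # S270, then back to S210
    return image, history


# Toy stand-ins: the "image" is an int and each correction improves it by one.
final, scores = refine(7, "toy criteria",
                       evaluate=lambda img, _c: img * 10.0,
                       get_instruction=lambda _img, _s: "improve",
                       generate=lambda img, _i: img + 1)
print(final, scores)
```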
[0089] The other configurations of the information processing device 20 in Embodiment 3 can be the same as those of the information processing device 10 in Embodiments 1 and 2.
[0090] [Example of Operation Screen] The operation screen of the information processing device 20 that executes the processing of each step in the flowchart of Figure 5 will be described below. Figure 19 is a diagram showing an example of the operation screen of the information processing device 20.
[0091] The user interface includes sections for displaying evaluation results and inputting correction instructions, as well as a preview of the corrected image and a correction history.
[0092] The evaluation results area displays compliant and non-compliant items in different colors. For example, compliant items are displayed in green and non-compliant items in red, with detailed explanations for each item. In addition, the overall evaluation shows the degree of compliance with the review criteria in both numerical and text form. Below that, in the correction instruction input area, the user can enter correction instructions in natural language based on the evaluation results.
[0093] The corrected image preview area displays the corrected image generated based on the correction instructions. Below that, the correction history area displays the corrections made to date in chronological order. Each correction history entry records the type of correction and the specific changes made.
[0094] The correction instruction input area includes a "Generate Corrected Image" button. Clicking this button initiates the process of generating the corrected image. The generated corrected image is immediately displayed in the preview area in the right column.
[0095] This user interface allows users to intuitively perform a series of actions, from acquiring images for analysis to checking evaluation results, entering correction instructions, and confirming the corrected images. Each function is organized into sections, allowing users to proceed according to the processing flow. Furthermore, it is possible to track the image improvement process by referring to the correction history.
[0096] [Description of Effects] The information processing device 20 of Embodiment 3 can achieve the same effects as the information processing device 10 of Embodiments 1 and 2.
[0097] Furthermore, the information processing device 20 of Embodiment 3 can receive correction instructions for the image to be analyzed in natural language based on the suitability evaluation results, and can automatically generate a corrected image according to those instructions. This allows users to improve the image to be analyzed using intuitive language instructions, and to efficiently create images that meet the review criteria. Moreover, by repeating this process, it becomes possible to gradually improve the quality of the image to be analyzed.
[0098] (Embodiment 4) Embodiment 4 will now be described. Embodiment 4 differs in that, in addition to the configuration of any of Embodiments 1 to 3, it further includes a reason generation unit.
[0099] [Explanation of the premise] In image review and evaluation, it is important to understand not only the suitability judgment result but also the reasons that led to that judgment. However, having humans write detailed reasons for each evaluation result is time-consuming, and it is difficult to keep such reasons consistent. Therefore, this embodiment aims to provide a function that automatically generates reasons for both suitability and non-suitability based on the suitability evaluation results. This enables users to improve images based on deeper insights.
[0100] [Description of Configuration] Referring to Figure 6, an overview of the information processing device 30 according to this embodiment will be described. Figure 6 is a block diagram showing an example of the information processing device 30 according to this embodiment. In addition to the configuration of any of Embodiments 1 to 3, the information processing device 30 includes a reason generation unit 310. Here, the reason generation unit 310 will be described in detail. The configuration of the other functional units is the same as in Embodiments 1 to 3.
[0101] Based on the suitability evaluation generated by the evaluation generation unit 140, the reason generation unit 310 automatically generates reasons why the image to be analyzed conforms to the review criteria, and also automatically generates reasons why it does not conform to the review criteria. The reason generation unit 310 generates these reasons in natural language using a large language model that takes the suitability evaluation and the review criteria as input.
[0102] [Explanation of Operation] Figure 7 is a flowchart showing an example of the operation of the reason generation unit 310.
[0103] First, the reason generation unit 310 receives a suitability evaluation from the evaluation generation unit 140 (S310).
[0104] Next, the reason generation unit 310 generates reasons for suitability (S320). For example, the reason generation unit 310 uses a large language model to generate a reason such as: "This image under analysis may meet the review criteria. In particular, the use of color conforms to the criteria and effectively represents the product's features."
[0105] Next, the reason generation unit 310 also generates reasons for non-suitability (S330). For example, the reason generation unit 310 uses a large language model to generate the following reason: "This image under analysis may not meet the review criteria. The representation of the main product features is somewhat unclear, and some of the colors used may be outside the range specified by the criteria."
[0106] If the review criteria include multiple evaluation items, the reason generation unit 310 can generate reasons for compliance in evaluation items where the image to be analyzed conforms to the review criteria, and generate reasons for non-compliance in evaluation items where the image to be analyzed does not conform to the review criteria. Note that the processing order of S320 and S330 is not limited to the illustrated example. S320 may be performed after S330, or S320 and S330 may be performed in parallel.
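The item-wise split and the note that S320 and S330 may run in parallel can be sketched as two prompts submitted concurrently. The `llm` argument is a hypothetical callable that takes a prompt string and returns text; no particular model API is implied. The prompt wording is likewise an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def build_prompt(kind: str, items: list[str], criteria: str) -> str:
    return (f"Review criteria: {criteria}\n"
            f"Items where the image is {kind}: {', '.join(items)}\n"
            f"Explain in natural language why the image is {kind} on these items.")


def generate_reasons(evaluation: dict[str, bool], criteria: str,
                     llm: Callable[[str], str]) -> dict[str, str]:
    """Run S320 (compliance reasons) and S330 (non-compliance reasons) in parallel."""
    ok = [item for item, passed in evaluation.items() if passed]
    ng = [item for item, passed in evaluation.items() if not passed]
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_ok = pool.submit(llm, build_prompt("compliant", ok, criteria)) if ok else None
        fut_ng = pool.submit(llm, build_prompt("non-compliant", ng, criteria)) if ng else None
        return {
            "compliance": fut_ok.result() if fut_ok else "",
            "non_compliance": fut_ng.result() if fut_ng else "",
        }


# Toy "model" that echoes the item line of the prompt.
echo = lambda prompt: prompt.splitlines()[1]
print(generate_reasons({"color": True, "product_features": False}, "brand guide", echo))
```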
[0107] Finally, the display unit 150 displays both of the generated reasons (S340).
[0108] [Example of Operation Screen] The operation screen of the information processing device 30 that executes the processing of each step in the flowchart of Figure 7 will be described below. Figure 20 is a diagram showing an example of the operation screen of the information processing device 30.
[0109] The user interface includes an evaluation and reasons section. This section displays, from top to bottom, the overall evaluation result, reasons for suitability, reasons for non-suitability, and a detailed analysis. The overall evaluation result area visually displays the degree of conformity to the evaluation criteria as a percentage and a progress bar.
[0110] The "Reasons for Compliance" area displays the points in which the analyzed image meets the review criteria, item by item. Each item includes a heading and a detailed explanation of the compliance. Similarly, the "Reasons for Non-Compliance" area displays the points in which the analyzed image does not meet the review criteria, item by item.
[0111] The detailed analysis area has a collapsible structure, and clicking on it displays detailed information such as the evaluation methodology, rationale for the analysis, and recommended actions. This information is useful for gaining a deeper understanding of the evaluation results.
[0112] This user interface allows users to understand the evaluation results and their reasons from multiple perspectives. The clear distinction between suitable and unsuitable points, along with detailed explanations, makes it easy to identify areas for improvement and areas to maintain in the image.
[0113] [Description of Effects] In this embodiment, the information processing device 30 can automatically generate and present to the user the reasons for both suitability and non-suitability based on the suitability evaluation. This allows the user to obtain a multifaceted interpretation of the suitability evaluation and to judge the suitability of the image to be analyzed based on deeper insights. Furthermore, by understanding both the suitability and non-suitability points, the user can clearly grasp the points that should be improved and the points that should be maintained in the image to be analyzed, and make more effective corrections and improvements.
[0114] (Embodiment 5) Embodiment 5 will now be described. Embodiment 5 differs in that, in addition to the configuration of any of Embodiments 1 to 4, it further includes a similar image search unit.
[0115] [Explanation of the premise] In image review and evaluation, it is preferable to perform a more objective and fair evaluation by comparing a single image with other similar images, rather than evaluating it independently. However, manually searching for similar images is time-consuming and prone to oversight. Therefore, this embodiment aims to realize a function that automatically searches for similar images based on the image to be analyzed and provides them as comparison targets. This enables more efficient and comprehensive image evaluation.
[0116] [Description of Configuration] Referring to Figure 8, an overview of the information processing device 40 according to this embodiment will be described. Figure 8 is a block diagram showing an example of the information processing device 40 according to this embodiment. The information processing device 40 includes a similar image search unit 410 in addition to the configuration of any of embodiments 1 to 4. Here, the similar image search unit 410 will be described in detail. The configuration of the other functional units is the same as in embodiments 1 to 4.
[0117] The similar image search unit 410 automatically performs a similar image search based on the image to be analyzed acquired by the image acquisition unit 110, and acquires multiple images for comparison. For example, the similar image search unit 410 can automatically perform a similar image search as soon as the image acquisition unit 110 has acquired at least one image to be analyzed. The similar image search unit 410 searches for similar images from a large image database using a machine learning model that extracts image features and calculates similarity. Alternatively, the similar image search unit 410 may use a multimodal model that can handle both images and natural language (e.g., CLIP) to search for similar images while considering both the visual and semantic features of the images.
[0118] [Explanation of Operation] Figure 9 is a flowchart showing an example of the operation of the similar image search unit 410.
[0119] First, the similar image search unit 410 receives the image to be analyzed from the image acquisition unit 110 (S410). Next, the similar image search unit 410 extracts the features of the received image to be analyzed (S420). Depending on the selected search method, this feature extraction may include the extraction of visual features, the generation of hash codes, the generation of feature representations using a multimodal model, or the analysis of metadata.
[0120] Next, the similar image search unit 410 searches for similar images from the image database using the features extracted in S420 (S430). In S430, depending on the search method, similarity calculation of features, comparison of hash codes, similarity calculation using a multimodal model, metadata matching, etc., may be performed.
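The hash-code comparison mentioned for S420 and S430 can be illustrated with an average hash, one common perceptual hash, compared by Hamming distance. The text does not name a hashing scheme, so this choice is an assumption. The sketch also assumes the image has already been downscaled to an 8x8 grayscale grid; the resizing step is omitted.

```python
def average_hash(gray: list[list[int]]) -> int:
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean.

    Assumes the image was already downscaled to a small grayscale grid (commonly 8x8).
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(h1: int, h2: int) -> int:
    """Number of differing bits; a smaller distance means more similar images."""
    return bin(h1 ^ h2).count("1")


left_dark = [[0] * 4 + [255] * 4 for _ in range(8)]
right_dark = [[255] * 4 + [0] * 4 for _ in range(8)]
print(hamming_distance(average_hash(left_dark), average_hash(right_dark)))  # prints 64
```

A cutoff on the distance (for example, treating a handful of differing bits as "similar") would be a tuning choice, not something the text specifies.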
[0121] Next, the similar image search unit 410 identifies the similar images to be output from among the similar images found in S430 (S440), and transmits the identified similar images to the evaluation generation unit 140 and the display unit 150 (S450). In S440, the similar image search unit 410 may, for example, identify a predetermined number of similar images starting with those with the highest similarity, or it may identify the similar images to be output based on other conditions. The evaluation generation unit 140 may calculate and output the similarity between the image to be analyzed and the similar images. The display unit 150 may also present the similar images to the user. For example, the display unit 150 may display the image to be analyzed and the similar images side by side.
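Selecting "a predetermined number of similar images starting with those with the highest similarity" is a top-k query. The sketch below ranks feature vectors by cosine similarity and keeps the k best. An optional minimum similarity serves as one example of an "other condition." The in-memory dictionary stands in for the image database, and all names are hypothetical.

```python
import heapq
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def top_k_similar(query: list[float], database: dict[str, list[float]],
                  k: int = 3, min_similarity: Optional[float] = None) -> list[tuple[float, str]]:
    """Return up to k (similarity, image_id) pairs, highest similarity first (S440)."""
    scored = [(cosine(query, vec), image_id) for image_id, vec in database.items()]
    if min_similarity is not None:
        # Example of an "other condition": drop weak matches before ranking.
        scored = [pair for pair in scored if pair[0] >= min_similarity]
    return heapq.nlargest(k, scored)


db = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k_similar([1.0, 0.0], db, k=2))
```

For a large database, an approximate nearest-neighbor index would normally replace this linear scan; the ranking logic stays the same.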
[0122] Furthermore, when combined with the reason generation unit 310 of Embodiment 4, the reason generation unit 310 may generate reasons for similarity and reasons for dissimilarity. For example, the reason generation unit 310 can generate a reason such as, "This image to be analyzed is similar to the image in the search results. In particular, the composition and color scheme are very similar, suggesting the same product category." Alternatively, the reason generation unit 310 can generate a reason such as, "This image is not similar to the image in the search results. The overall atmosphere is different, and in particular, the background processing method and the arrangement of the main objects are significantly different." This makes it possible to analyze the similarity between images in more detail in light of the review criteria. The reason generation unit 310 may also have generative AI determine whether the image to be analyzed and the similar image are similar, and generate the reasons for that determination.
[0123] [Description of Effects] In this embodiment, the information processing device 40 can automatically search for images similar to the image to be analyzed and provide them as comparison images. This eliminates the need for the user to manually search for comparison images, allowing for more efficient evaluation and analysis of the image to be analyzed. Furthermore, by combining various search methods, it becomes possible to search for a wide range of similar images that consider not only visual similarity but also semantic similarity, enabling a more multifaceted understanding of the image's uniqueness and market positioning.
[0124] (Embodiment 6) Embodiment 6 will now be described. Embodiment 6 differs from Embodiments 1 to 5 in that, in addition to the configuration of any of Embodiments 1 to 5, it further includes a target layer identification unit and a modification policy generation unit.
[0125] [Explanation of the premise] When correcting images based on image review and evaluation, it is preferable to consider not only general review criteria but also the effect on a specific target group. However, devising an appropriate correction strategy for each target group is a difficult task that requires specialized knowledge and experience. Therefore, this embodiment aims to provide a function that automatically proposes a correction strategy that is effective for a specific target group. This makes it possible for users to efficiently create images that meet review criteria while simultaneously appealing to their target group.
[0126] [Description of Configuration] Referring to Figure 10, an overview of the information processing device 50 according to this embodiment will be described. Figure 10 is a block diagram showing an example of the information processing device 50 according to this embodiment. In addition to the configuration of any of Embodiments 1 to 5, the information processing device 50 includes a target group identification unit 510 and a correction policy generation unit 520. The following description focuses on these two units. The configuration of the other functional units is the same as in Embodiments 1 to 5.
[0127] The target group identification unit 510 identifies a specific target group for the image to be analyzed, for example by estimating it from the image. For example, the target group identification unit 510 may have the generative AI estimate the target group of the image to be analyzed. Specifically, the generative AI estimates the attributes of people who are likely to be interested in the image, and the people having those attributes are treated as the target group. Alternatively, the target group identification unit 510 may accept input of the attributes of the target group from the user and identify the target group based on that attribute information. Here, "target group" refers to the consumer group that is the main target of the product or service, and is defined by attributes such as age, gender, occupation, and hobbies.
[0128] The correction policy generation unit 520 automatically proposes an effective correction policy based on the identified target group and the review criteria. Here, "correction policy" refers to specific instructions or suggestions for changing the visual elements (color, composition, font, image size, etc.) of the image to be analyzed.
[0129] [Explanation of Operation] Figure 11 is a flowchart showing an example of the operation of the target group identification unit 510 and the correction policy generation unit 520.
[0130] First, the target group identification unit 510 acquires information for identifying the target group and, based on that information, estimates the target group of the image to be analyzed (S510). For example, the target group identification unit 510 can receive the image to be analyzed and analyze it using image recognition technology or a machine learning model. It then estimates the target group from the elements contained in the image. Alternatively, the target group identification unit 510 may accept input of the attributes of the target group from the user and identify the target group based on that attribute information.
[0131] Next, the target group identification unit 510 transmits (outputs) information indicating the estimated target group to the correction policy generation unit 520 (S520).
[0132] The correction policy generation unit 520 receives the information indicating the target group and the review criteria as input (S530), and generates an effective correction policy using a large language model (S540). For example, the correction policy generation unit 520 may have the generative AI generate a correction policy that turns the image to be analyzed into an image that "meets the review criteria and is appealing to the target group."
[0133] For example, if the target group is "women in their 20s" and the review criterion is "clearly expressing the product's features," the correction policy generation unit 520 can generate a correction policy such as: "Use pastel colors and make the key parts of the product larger. Also, use a handwritten-style font to give a more approachable impression."
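Steps S530 to S540 amount to combining the target group and the review criteria into an instruction for a generative model. The sketch below shows one way this could look. It is not the disclosed implementation. `TargetGroup`, `build_policy_prompt`, and `generate_correction_policy` are hypothetical names. The `generate` argument is a placeholder for whatever text-generation backend is used. Attaching the image itself to a multimodal model is omitted here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TargetGroup:
    """Hypothetical container for target-group attributes (age, gender, occupation, hobbies)."""
    age: Optional[str] = None
    gender: Optional[str] = None
    occupation: Optional[str] = None
    hobbies: List[str] = field(default_factory=list)

    def describe(self) -> str:
        parts = [value for value in (self.age, self.gender, self.occupation) if value]
        if self.hobbies:
            parts.append("hobbies: " + ", ".join(self.hobbies))
        return "; ".join(parts) if parts else "unspecified"


def build_policy_prompt(target: TargetGroup, review_criteria: str) -> str:
    """Compose the instruction sent to the generative AI (cf. S530-S540)."""
    return (
        "Propose a correction policy for the image so that it meets the "
        "review criteria and is appealing to the target group.\n"
        f"Target group: {target.describe()}\n"
        f"Review criteria: {review_criteria}\n"
        "Give concrete changes to visual elements such as color, "
        "composition, font, and image size."
    )


def generate_correction_policy(target: TargetGroup, review_criteria: str,
                               generate: Callable[[str], str]) -> str:
    """`generate` stands in for any text-generation backend (hypothetical)."""
    return generate(build_policy_prompt(target, review_criteria))
```

Passing the model call in as a plain function keeps the sketch backend-agnostic. The same returned policy text could be shown on the display unit 150 or forwarded to the correction instruction receiving unit 210.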
[0134] The correction policy generation unit 520 outputs the generated correction policy to the display unit 150. The display unit 150 presents the received correction policy to the user (S550). The correction policy generation unit 520 may also output the generated correction policy to the correction instruction receiving unit 210. The image generation unit 220 may correct the image to be analyzed based on the correction policy input to the correction instruction receiving unit 210 and generate a corrected image.
[0135] [Description of Effects] In this embodiment, the information processing device 50 can automatically propose effective correction policies for a specific target group. This allows users to efficiently create images that meet the review criteria while also appealing to the target group. Furthermore, automating the identification of the target group reduces the burden on users and enables more objective targeting.
[0136] (Embodiment 7) Embodiment 7 will now be described. Embodiment 7 differs from Embodiments 1 to 6 in that, in addition to the configuration of any of Embodiments 1 to 6, it includes a review step decomposition unit and multiple machine learning models.
[0137] [Explanation of the premise] In image review and evaluation, a single general-purpose model has limitations in handling complex review processes. Each review step has its own judgment criteria and evaluation methods and therefore requires a specialized approach. Therefore, this embodiment aims to provide a function that breaks the image review procedure down into multiple steps and automates it by combining multiple machine learning models, each specialized for one step. Using models specialized for each step enables more specialized and accurate evaluation at each review step.
[0138] [Description of Configuration] Referring to Figure 12, an overview of the information processing device 60 according to this embodiment will be described. Figure 12 is a block diagram showing an example of the information processing device 60 according to this embodiment. In addition to the configuration of any of embodiments 1 to 6, the information processing device 60 includes a review step decomposition unit 610 and a plurality of machine learning models 620. Here, the review step decomposition unit 610 and the plurality of machine learning models 620 will be described in detail. The configuration of the other functional units is the same as in embodiments 1 to 6.
[0139] The review step decomposition unit 610 decomposes the image review procedure into multiple review steps, each of which reviews an individual evaluation item. Through this decomposition, the multiple evaluation items included in the review criteria are assigned to individual review steps, so that each review step corresponds to an evaluation item derived from the review criteria. The review steps and their evaluation items can be set flexibly according to the review target and purpose.
[0140] As an example, the operation of the review step decomposition unit 610 will be explained for the case where the review criteria are design examination criteria.
[0141] Assume that a product label design is input as the image to be analyzed, and the text "Examine based on the design application guidelines" is input as the review criteria.
[0142] The review step decomposition unit 610 decomposes the input review criteria into, for example, the following review steps:
• Evaluation of novelty
• Search for similar designs
• Comparison with publicly known designs
[0143] The method by which the review step decomposition unit 610 decomposes the input text (review criteria) in this manner is not particularly limited, and any method can be used. For example, the review step decomposition unit 610 may input review criteria or examination guidelines containing multiple review steps into the generative AI and have it extract the review steps.
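When the generative AI is asked to extract review steps, its reply still has to be turned into a list that later stages can iterate over. Below is a minimal sketch under two assumptions: the model is prompted to answer with one step per line, and it uses ordinary list markers. `decompose_review_criteria` and `generate` are hypothetical names, and `generate` stands in for any text-generation call.

```python
import re
from typing import Callable, List

# Matches list items such as "- step", "* step", "\u2022 step", "1. step", or "2) step".
_LIST_ITEM = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s*(.+?)\s*$")


def decompose_review_criteria(criteria: str,
                              generate: Callable[[str], str]) -> List[str]:
    """Ask a (hypothetical) generative-AI callable for review steps and parse its list."""
    prompt = (
        "List, one per line, the individual review steps needed to review "
        f"an image against the following review criteria:\n{criteria}"
    )
    steps: List[str] = []
    for line in generate(prompt).splitlines():
        match = _LIST_ITEM.match(line)
        if match:  # lines without a list marker (headings, prose) are ignored
            steps.append(match.group(1))
    return steps
```

Ignoring unmarked lines keeps headings such as "Review steps:" out of the result. A stricter design would ask the model for structured output, such as JSON, and validate it.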
[0144] The review step decomposition unit 610 selects a machine learning model corresponding to each review step from, for example, a memory unit (not shown), and has the selected machine learning model execute a task corresponding to each review step.
[0145] Here, we will describe an example of how the review step decomposition unit 610 selects a machine learning model from the memory unit.
[0146] The memory unit stores machine learning models and metadata indicating the functions of those machine learning models. Multiple machine learning models with various functions are generated in advance and stored in the memory unit. Examples of machine learning model functions include similar image search, image feature extraction, similarity calculation, image comparison, shape analysis, and pattern matching.
[0147] The review step decomposition unit 610 identifies the characteristics of each decomposed review step. For example, the review step decomposition unit 610 may have the generative AI extract the characteristics of each review step from the content of the review criteria and guidelines for that step. Then, based on the characteristics of each review step and the functions of each machine learning model, the review step decomposition unit 610 selects a machine learning model for each review step. For example, it may have the generative AI select, from among the machine learning models having the functions listed above, the model best suited to each review step with the extracted characteristics. For a review step characterized by "searching for similar designs," the review step decomposition unit 610 can select a machine learning model with functions such as "similar image search," "image feature extraction," and "similarity calculation." For a review step characterized by "comparison with publicly known designs," it can select a machine learning model with functions such as "image comparison," "shape analysis," and "pattern matching." If multiple machine learning models are identified as candidates, the review step decomposition unit 610 may select the most suitable one based on performance indicators such as accuracy and processing speed, or on past usage records.
[0148] The machine learning models 620 assigned to the review steps each perform the evaluation specific to their review step. For example, when evaluating novelty, the review step decomposition unit 610 may perform the evaluation by combining a similar design search model and an image comparison model. The similar design search model searches a large design database for similar designs, and the image comparison model quantitatively evaluates the similarity between the retrieved designs and the image to be analyzed.
[0149] The method by which the review step decomposition unit 610 combines models is not limited to the above example. The review step decomposition unit 610 may also use a technique called Mixture of Experts (MoE).
[0150] The case where the review step decomposition unit 610 uses MoE is described below.
[0151] [When using MoE] MoE is a machine learning model structure composed of multiple machine learning models (experts) and a gate model. When MoE is used, the review step decomposition unit 610 acquires multiple available machine learning models.
[0152] As an example, assume that the review step decomposition unit 610 acquires the following three machine learning models in order to evaluate three aspects of a design application: (1) novelty, (2) ease of creation, and (3) industrial applicability:
• A novelty evaluation model specialized in image feature extraction and similarity calculation
• A creativity evaluation model specialized in evaluating the originality of a design
• An industrial applicability evaluation model specialized in evaluating manufacturability
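In the description, selection is delegated to the generative AI. A deterministic rule can also illustrate the same idea: match a step's required functions against each model's stored function metadata. The sketch below substitutes such a rule. It ranks candidates by how many required functions they cover and breaks ties by past accuracy, one of the performance indicators mentioned above. `ModelEntry` and `select_model` are hypothetical, and the accuracy values in any registry would be illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical memory-unit record: a model name plus metadata about its functions."""
    name: str
    functions: FrozenSet[str]
    accuracy: float  # e.g., past evaluation accuracy in [0, 1]


def select_model(required: FrozenSet[str],
                 registry: List[ModelEntry]) -> Optional[ModelEntry]:
    """Pick the model covering the most required functions; break ties by accuracy."""
    candidates = [m for m in registry if m.functions & required]
    if not candidates:
        return None  # no stored model offers any of the required functions
    return max(candidates, key=lambda m: (len(m.functions & required), m.accuracy))
```

Processing speed or past usage records could be added as further elements of the ranking key.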
[0153] Each machine learning model can calculate an evaluation score for the input image to be analyzed. The evaluation scores output by each machine learning model may be normalized to values that are comparable to each other, for example, between 0 and 1.
[0154] Next, the review step decomposition unit 610 uses a gate model to determine the weights of each of the aforementioned machine learning models. The gate model is a model that has been trained to dynamically determine the weights of each expert (machine learning model) by analyzing the features of the image to be analyzed and the review criteria (text).
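[0155] For an input x, the gate model calculates the weight p_i(x) of each expert i using the following formula (Equation 1).
[0156]
$$p_i(x) = \frac{\exp\big(h_i(x)\big)}{\sum_{j=1}^{N} \exp\big(h_j(x)\big)} \qquad \text{(Equation 1)}$$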
[0157] x is the input to the gate model. Specifically, x is a vector representation of information from both the image to be analyzed and the review criteria (text). For example, the image is vectorized using an image feature extraction model (such as a CNN), the text is vectorized using a language model (such as BERT), and the two vectors are integrated into x. Thus x represents the semantic features of both the image to be analyzed and the review criteria.
[0158] h(x) is the output of the gate model's internal computation, an intermediate value calculated by the gate model. It has no direct meaning on its own. Instead, the value of each element h_i(x) is used to calculate the weight p_i(x) of the corresponding machine learning model (expert). In this way, h(x) forms the basis for determining the importance and applicability of each expert.
[0159] The gate model internally calculates h(x) from the input x and then, using h(x), calculates and outputs the weight p_i(x) of each expert according to Equation 1. In other words, the final output of the gate model is the weight p_i(x) of each expert, and h(x) is an intermediate value used to calculate p_i(x). How h(x) is calculated within the gate model is determined by the gate model's training.
[0160] N is the total number of machine learning models acquired; in the example above, N = 3. i and j are indices identifying the experts.
[0161] The final output y is calculated by weighting and integrating the outputs of each expert. Specifically, the final output y is given by the following formula.
[0162]
$$y = \sum_{i=1}^{N} p_i(x)\, E_i(x)$$
[0163] Here, E_i(x) denotes the output of expert i. When evaluating (1) novelty, (2) ease of creation, and (3) industrial applicability, an example of the outputs is as follows.
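The gating and weighted integration can be sketched in a few lines. This assumes that Equation 1 takes the usual softmax form used in MoE gating, with the weights p_i(x) obtained from the gate outputs h_i(x) normalized over all N experts. It also assumes that each expert returns a scalar score already normalized to a comparable range, as [0153] allows. `gate_weights` and `moe_score` are hypothetical names. The gate and experts are passed in as plain functions rather than trained models.

```python
import math
from typing import Callable, List, Sequence


def gate_weights(logits: Sequence[float]) -> List[float]:
    """Softmax over gate outputs h_i(x); assumed form of Equation 1."""
    peak = max(logits)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def moe_score(x: object,
              experts: Sequence[Callable[[object], float]],
              gate: Callable[[object], Sequence[float]]) -> float:
    """y = sum_i p_i(x) * E_i(x), with each expert returning a normalized score."""
    logits = gate(x)  # h(x): one value per expert
    if len(logits) != len(experts):
        raise ValueError("the gate must produce one logit per expert")
    weights = gate_weights(logits)
    return sum(p * expert(x) for p, expert in zip(weights, experts))
```

Because the softmax weights always sum to 1, y stays in the same range as the normalized expert scores. Adding or removing an expert only changes the length of the lists, which matches the flexibility described in [0171].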
[0164] The novelty evaluation model E1(x) outputs search results for similar designs and a similarity score (e.g., "Similarity to the most similar design: 0.85"). The originality evaluation model E2(x) outputs an originality evaluation score for the design elements (e.g., a score based on criteria such as "whether or not it is a creation by combining existing design elements"). The industrial applicability evaluation model E3(x) outputs a manufacturability evaluation result score (e.g., a score based on criteria such as "whether or not it can be mass-produced using industrial production methods").
[0165] The weights calculated by the gate model (e.g., p1(x)=0.5, p2(x)=0.3, p3(x)=0.2) are applied to these individual output results to generate the final evaluation result. For example, the sum of the values obtained by multiplying the evaluation scores output by each machine learning model by the weights of each machine learning model may be used as the final evaluation result for the analyzed image.
[0166] For example, when evaluating (1) novelty, (2) ease of creation, and (3) industrial applicability, the output results are expressed in the following format: The novelty evaluation model E1(x) outputs search results for similar designs and similarity scores (e.g., "Similarity to the most similar design: 0.85"). The originality evaluation model E2(x) outputs the originality evaluation score of the design elements (e.g., "Creation by combining existing design elements"). The industrial applicability evaluation model E3(x) outputs the evaluation results of manufacturability (e.g., "Mass production possible by industrial production methods").
[0167] For these individual output results, the weights calculated by the gate model (e.g., p1(x)=0.5, p2(x)=0.3, p3(x)=0.2) are applied, and the final output y can be, for example, the following result: "This design shows a relatively high score (0.85) in novelty, and is particularly outstanding in industrial applicability (0.95). Although there are some issues with originality (0.70), it shows a high degree of suitability (0.825) in overall evaluation."
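The overall value in this example follows from the weighted sum described in [0165]. It applies the example weights to the per-model scores quoted above (novelty 0.85, originality 0.70, industrial applicability 0.95):

$$y = 0.5 \times 0.85 + 0.3 \times 0.70 + 0.2 \times 0.95 = 0.425 + 0.210 + 0.190 = 0.825$$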
[0168] Furthermore, the weighting of the gate model is dynamically adjusted according to the input review criteria and the characteristics of the item being evaluated. For example, in cases where novelty assessment is particularly important, the weight of the novelty assessment model may be automatically set higher.
[0169] Furthermore, when integrating the outputs of multiple machine learning models, the review step decomposition unit 610 may also take the reliability of each machine learning model into account. Various techniques can be used to calculate this reliability. For example, the reliability of each model may be calculated with a predetermined algorithm based on the model's past evaluation accuracy and the quality of the input data (the image to be analyzed). For instance, the lower the resolution of the image to be analyzed, the lower the reliability assigned to an evaluation model that performs shape analysis. The review step decomposition unit 610 may then, for example, use as the final evaluation result the sum, over all machine learning models, of each model's evaluation score multiplied by its weight and its reliability.
[0170] Furthermore, the review step decomposition unit 610 can determine, based on the integrated evaluation results, whether additional review steps are necessary. For example, for each review step, other review steps that become necessary when the integrated evaluation result meets a predetermined condition may be defined in advance. The review step decomposition unit 610 can then determine whether additional review steps are necessary from the integrated evaluation result of each review step and these definitions. For example, it may be defined that if the novelty evaluation result falls within a predetermined numerical range, an additional review step using another model is performed to conduct a more detailed similarity analysis.
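The reliability-weighted integration can be illustrated with a small sketch. The resolution rule below is a hypothetical example of a "predetermined calculation algorithm": reliability falls linearly once the pixel count drops below an assumed reference of 512 × 512. It ignores the past-accuracy component mentioned above. `shape_model_reliability` and `integrate_with_reliability` are hypothetical names.

```python
from typing import Sequence


def shape_model_reliability(width: int, height: int,
                            reference_pixels: int = 512 * 512) -> float:
    """Hypothetical rule: reliability drops linearly below a reference pixel count."""
    return min(1.0, (width * height) / reference_pixels)


def integrate_with_reliability(scores: Sequence[float],
                               weights: Sequence[float],
                               reliabilities: Sequence[float]) -> float:
    """Sum of score * weight * reliability over all machine learning models."""
    if not (len(scores) == len(weights) == len(reliabilities)):
        raise ValueError("scores, weights, and reliabilities must have equal length")
    return sum(s * w * r for s, w, r in zip(scores, weights, reliabilities))
```

With all reliabilities at 1.0, this reduces to the plain weighted sum. The description does not say whether the result should be renormalized when reliabilities are below 1, so the sketch leaves it unnormalized.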
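The "predefined" follow-up conditions can be represented as simple range rules keyed by review step. The sketch below is illustrative only. `FollowUpRule` and `additional_review_steps` are hypothetical names. Inclusive numeric ranges are an assumption, since the description only says "a predetermined numerical range".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FollowUpRule:
    """Hypothetical predefined condition: if low <= score <= high, run additional_step."""
    low: float
    high: float
    additional_step: str


def additional_review_steps(results: Dict[str, float],
                            rules: Dict[str, List[FollowUpRule]]) -> List[str]:
    """Return the additional review steps triggered by integrated evaluation results."""
    needed: List[str] = []
    for step, score in results.items():
        for rule in rules.get(step, []):
            if rule.low <= score <= rule.high and rule.additional_step not in needed:
                needed.append(rule.additional_step)
    return needed
```

For example, a borderline novelty score could trigger a "detailed similarity analysis" step, while clearly high or low scores do not.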
[0171] This MoE model allows for more accurate evaluation by efficiently changing the weighting of multiple models according to specific tasks, and can flexibly adapt to increases or decreases in the number of available models.
[0172] [Explanation of Operation] Figure 13 is a flowchart showing an example of the operation of the review step decomposition unit 610 and multiple machine learning models 620.
[0173] First, the review step decomposition unit 610 receives the review criteria as input (S610) and decomposes them into multiple review steps (S620).
[0174] Next, the review step decomposition unit 610 selects a machine learning model corresponding to each decomposed review step (S630), and evaluates the image to be analyzed using the selected machine learning model (S640).
[0175] For example, when examining a design, the following procedures may be carried out.
[0176] In the "novelty assessment" review step, a machine learning model is used to perform similar image searches and image comparisons. This machine learning model compares existing designs in the database with the image being analyzed and quantifies the degree of similarity. Based on this similarity, the machine learning model can calculate an evaluation score using a predetermined calculation.
[0177] In the "Assessment of Creative Difficulty" review step, a machine learning model is used to analyze the complexity and originality of the design. This machine learning model analyzes the complexity of the shape, the combination of colors, the arrangement of elements, and so on. Through a predetermined calculation based on the analysis results, it can then calculate an evaluation score indicating the difficulty of creation.
[0178] In the "Evaluation of Functional Design" review step, a machine learning model is used to determine whether elements within the image are functional or decorative. This machine learning model analyzes the shape and arrangement of each design element and, based on the analysis results, calculates an evaluation score indicating its functionality.
[0179] The review step decomposition unit 610 integrates the evaluation results (evaluation scores) of the machine learning models (S650) and outputs the final evaluation result (S660).
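The flow from S610 to S660 can be summarized as a small pipeline with pluggable parts. In the sketch below, decomposition, model selection, and the integration weights are passed in as arguments, and each selected model is reduced to a function returning a score. `run_review` and `Evaluator` are hypothetical names. The per-step scores are returned alongside the final result, which supports checking each review step individually, as noted in [0181].

```python
from typing import Callable, Dict, List

# An evaluator maps image data to a normalized evaluation score (hypothetical interface).
Evaluator = Callable[[bytes], float]


def run_review(image: bytes,
               criteria: str,
               decompose: Callable[[str], List[str]],
               select_model: Callable[[str], Evaluator],
               weights: Dict[str, float]) -> Dict[str, object]:
    """Sketch of S610-S660 with pluggable decomposition, selection, and weights."""
    steps = decompose(criteria)                        # S610-S620
    step_scores: Dict[str, float] = {}
    for step in steps:
        evaluator = select_model(step)                 # S630
        step_scores[step] = evaluator(image)           # S640
    final = sum(weights[step] * score for step, score in step_scores.items())  # S650
    return {"step_scores": step_scores, "final": final}  # S660
```

A missing weight raises `KeyError` rather than silently counting a step as zero. This is a deliberate choice in the sketch, not something the description specifies.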
[0180] Furthermore, this method is applicable not only to design examination but also to image evaluation in various fields such as trademarks, user interfaces, and advertising designs. By appropriately setting the examination steps and corresponding machine learning models, detailed evaluations tailored to the characteristics of each field can be performed.
[0181] [Description of Effects] In this embodiment, the information processing device 60 can break the image review procedure down into detailed review steps and automate each of them by combining multiple machine learning models specialized for each step. This enables more specialized and accurate evaluation at each review step and high-quality review results overall. Furthermore, the review process becomes more transparent, since users can check the evaluation results of each review step individually.
[0182] (Embodiment 8) Embodiment 8 will now be described. Embodiment 8 differs from Embodiments 1 to 7 in that, in addition to the configuration of any of Embodiments 1 to 7, it further includes a review history input unit and a review criteria generation unit.
[0183] [Explanation of the premise] In image review and evaluation, it is preferable to continuously improve the review criteria by leveraging past review experience, rather than relying on fixed review criteria. However, it is difficult for humans to analyze a large amount of review history and extract generalized criteria from it. Therefore, this embodiment aims to provide a function that automatically generates new review criteria from past review history. This improves the consistency and transparency of the review process and makes it possible to build a flexible review system that can respond to changing market and technological trends.
[0184] [Description of Configuration] Referring to Figure 14, an overview of the information processing device 70 according to this embodiment will be described. Figure 14 is a block diagram showing an example of the information processing device 70 according to this embodiment. In addition to the configuration of any of Embodiments 1 to 7, the information processing device 70 includes a review history input unit 710 and a review criteria generation unit 720. These two units are described in detail below. The configuration of the other functional units is the same as in Embodiments 1 to 7.
[0185] The review history input unit 710 accepts as input the review history associated with at least one of two images.
[0186] "Review history" refers to information showing the content of past reviews. Specifically, the review history includes information generated during the review process and the review results. For a design examination, the review history includes information such as the notices of reasons for refusal issued, the responses to them, and the final examination result.
[0187] One of the two images is the image to be analyzed. For example, the review history input unit 710 can acquire the past review history of the image to be analyzed. The other image is an image other than the image to be analyzed that has been reviewed in the past. For example, the review history input unit 710 can acquire the past review history of that image. The other image can be an image similar to the image to be analyzed, or an image from a case whose review situation is similar to that of the image to be analyzed. For example, a user may search for such an image by any method, obtain its review history, and input it into the information processing device 70.
[0188] The review history input unit 710 accepts, for example, review history entered by the user as the review history related to the image to be analyzed. The user can search for the review history by any means and input it into the information processing device 70. As described in the above embodiments, in one example the image to be analyzed is evaluated for conformity to design examination criteria. In this case, the review history input unit 710 can accept, as the review history related to the image to be analyzed, the history of past design examinations of the image to be analyzed. Alternatively, the review history input unit 710 can accept the history of past design examinations of images other than the image to be analyzed.
[0189] The review criteria generation unit 720 generates new review criteria based on the information contained in the input review history.
[0190] [Explanation of Operation] Figure 15 is a flowchart showing an example of the operation of the review history input unit 710 and the review criteria generation unit 720.
[0191] First, the review history input unit 710 receives the review history as input (S710). The review history includes the contents of notices of reasons for refusal, the applicant's responses, and the images before and after amendment.
[0192] Next, the review criteria generation unit 720 analyzes the input review history (S720). In this process, the review criteria generation unit 720 uses natural language processing technology to interpret the review history (the contents of the notices of reasons for refusal, the applicant's responses, etc.). It also uses image processing technology to analyze the changes between the images before and after amendment.
[0193] Next, based on the analysis results of S720, the review criteria generation unit 720 generates new review criteria (S730). For example, the review criteria generation unit 720 extracts common points from multiple notices of reasons for refusal and generates generalized review criteria. Alternatively, it estimates the elements that examiners consider important from the changes between the images before and after amendment and formalizes them as review criteria. For example, from the changes between the images before and after a reason for refusal was overcome, it may estimate which elements the examiner considered important in that reason for refusal. The review criteria generation unit 720 may also use generative AI to describe in natural language the changes between the images before and after a reason for refusal was overcome. It may then include that description in the review criteria as "an example of guidance for overcoming the reason for refusal."
[0194] The review criteria generation unit 720 updates the existing review criteria based on the review criteria generated in S730 (S740). For example, the review criteria generation unit 720 may integrate the review criteria generated in S730 with the existing review criteria. Alternatively, the review criteria generation unit 720 may add the review criteria generated in S730 as new review criteria.
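The two update options in S740, integrating with an existing criterion or adding a new one, can be illustrated with criteria stored as named text entries. This is a minimal sketch under that hypothetical representation, which the description does not specify. A generated criterion whose name already exists is appended to that entry, and a new name is added as a new criterion. `update_review_criteria` is a hypothetical name.

```python
from typing import Dict


def update_review_criteria(existing: Dict[str, str],
                           generated: Dict[str, str]) -> Dict[str, str]:
    """Merge generated criteria into a copy of the existing ones (cf. S740)."""
    updated = dict(existing)  # leave the caller's criteria unchanged
    for name, text in generated.items():
        if name in updated:
            if text not in updated[name]:
                updated[name] = updated[name] + "\n" + text  # integrate into existing criterion
        else:
            updated[name] = text  # add as a new criterion
    return updated
```

The text-containment check makes repeated updates with the same generated criteria idempotent. A real system would likely need semantic deduplication rather than exact string matching.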
[0195] [Description of Effects] In this embodiment, the information processing device 70 can generate new review criteria from past review history. Examples include the past review history of the image to be analyzed, of images similar to it, and of images from cases whose review situation is similar to its own. This improves the consistency and transparency of the review process and enables users to create and modify images based on more specific and practical review criteria. Furthermore, the review criteria can be continuously updated and improved, so that a flexible review system able to respond to changing market and technological trends can be built.
[0196] Generative AI can be implemented, for example, by a neural network. A neural network contains multiple artificial neurons connected to one another by synapses, each of which has a weight. When the neural network receives input, it performs calculations using the weights of the synapses and produces an output corresponding to the input. The model representing the connections between neurons and synapses is stored in memory, for example in software form, or may be implemented as a dedicated circuit. Similarly, the synapse weights are stored in memory in software form, or may be implemented in a dedicated circuit. When generative AI is built from multiple models, the models need not all be stored in the same memory. Many different models use such neural networks, and generative AI can be implemented with a wide variety of them, such as Transformers, Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs).
[0197] While this disclosure has been described above with reference to embodiments, it is not limited to them. Various modifications that those skilled in the art can understand may be made to the structure and details of this disclosure within its scope, and the embodiments may be combined with one another.
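As an illustration of "calculations using the weights associated with each synapse", the sketch below computes one fully connected layer of artificial neurons. Each neuron forms a weighted sum of its inputs over its incoming synapses, adds a bias, and applies a tanh activation. The weights and the choice of tanh are arbitrary examples. Real generative models such as Transformers stack many far larger layers with learned weights. `dense_layer` is a hypothetical name.

```python
import math
from typing import List, Sequence


def dense_layer(inputs: Sequence[float],
                weights: Sequence[Sequence[float]],
                biases: Sequence[float]) -> List[float]:
    """One layer of neurons: weighted sum over incoming synapses, then tanh activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        if len(neuron_weights) != len(inputs):
            raise ValueError("each neuron needs one synapse weight per input")
        total = bias + sum(w * x for w, x in zip(neuron_weights, inputs))
        outputs.append(math.tanh(total))
    return outputs
```

Chaining calls, for example `dense_layer(dense_layer(x, w1, b1), w2, b2)`, gives a small multi-layer network. The weight lists here play the role of the synapse weights that the description says may be stored in memory.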
[0198] Some or all of the above embodiments may also be described as in the following supplementary notes, but are not limited to them.
1. An information processing method characterized in that at least one computer acquires at least one image to be analyzed, accepts input of review criteria written in natural language, extracts semantic features from the at least one image to be analyzed and the review criteria using a machine learning model that handles multiple data formats, and generates a conformity evaluation of the at least one image to be analyzed to the review criteria using the semantic features of the image to be analyzed and the review criteria.
2. An information processing method according to 1, characterized in that at least one computer accepts, in natural language, instructions for modifying the image to be analyzed based on the conformity evaluation, and generates a modified image based on the modification instructions and the image to be analyzed using a machine learning model that handles multiple data formats.
3. An information processing method according to 2, characterized in that at least one computer revises the modification instructions based on the modification instructions and the review criteria.
4. An information processing method according to 1 or 2, characterized in that at least one computer automatically generates reasons for conformity and non-conformity based on the conformity evaluation.
5. An information processing method according to any one of 1 to 3, characterized in that, when at least one computer has acquired at least one image to be analyzed, it automatically performs a similar image search and acquires multiple images for comparison.
6. An information processing method according to any one of 1 to 4, characterized in that at least one computer automatically proposes a modification policy tailored to a specific target group.
7. An information processing method according to any one of 1 to 5, characterized in that the image review procedure is broken down into multiple review steps and automated by combining multiple machine learning models, each specialized for one review step.
8. An information processing method according to any one of 1 to 6, characterized in that at least one computer takes as input the review process associated with at least one of two images and generates review criteria based on the information shown in the review process.
9. An information processing method according to 2, wherein at least one computer presents the conformity evaluation to a user and receives input of the modification instructions from the user.
10. An information processing method according to 9, wherein at least one computer determines, before generating the modified image, whether the image obtained by modifying the image to be analyzed based on the modification instructions conforms to the review criteria.
11. An information processing method according to 10, wherein, if at least one computer determines that the image obtained by modifying the image to be analyzed based on the modification instructions conforms to the review criteria, it modifies the image to be analyzed based on the modification instructions and generates the modified image.
12.
13. An information processing method according to 10 or 11, wherein, if at least one computer determines that the image obtained by modifying the image to be analyzed based on the modification instructions does not conform to the review criteria, the modified image is not generated.
14. An information processing method according to 13, wherein, if at least one computer determines that the image obtained by modifying the image to be analyzed based on the modification instructions does not conform to the review criteria, the at least one computer proposes a change to the modification instructions to the user.
15. An information processing method according to 14, wherein, if at least one computer determines that the image obtained by modifying the image to be analyzed based on the modification instructions does not conform to the review criteria, the at least one computer presents the user with a proposed change that includes the changes to the modification instructions.
16. An information processing method according to 14 or 15, wherein, if at least one computer receives input from the user indicating that the user does not agree to the proposed change, it modifies the image to be analyzed based on the modification instructions and generates the modified image.
17. An information processing method according to 14 or 15, wherein, if at least one computer receives input from the user indicating that the user does not agree to the proposed change, it modifies the image to be analyzed based on the modification instructions and generates the modified image.
18. An information processing method according to 14 or 15, wherein, if at least one computer receives input from the user indicating that the user does not agree to the proposed change, it does not generate the modified image.
19. A recording medium that records a program causing a computer to execute the following steps: acquiring at least one image to be analyzed; receiving input of review criteria written in natural language; extracting semantic features from the at least one image to be analyzed and the review criteria using a machine learning model that handles multiple data formats; and generating a conformity evaluation of the at least one image to be analyzed to the review criteria using the semantic features of the image to be analyzed and the review criteria.
20. An information processing device having: an image acquisition means for acquiring at least one image to be analyzed; a review criteria input means for receiving input of review criteria written in natural language; a feature extraction means for extracting semantic features from the at least one image to be analyzed and the review criteria using a machine learning model that handles multiple data formats; and an evaluation generation means for generating a conformity evaluation of the at least one image to be analyzed to the review criteria using the semantic features of the image to be analyzed and the review criteria.
[0199] Some or all of appendices 2 to 18, which depend on the information processing method of appendix 1, may also depend on the recording medium of appendix 19 and on the information processing device of appendix 20, in the same dependency relationship as that between appendix 1 and appendices 2 to 18. Furthermore, some or all of the configurations described in the appendices can be realized in various forms of hardware, software, recording means for recording software, or systems, without departing from the embodiments described above.
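As a rough illustration of appendix 1 (and claim 1), the conformity evaluation can be pictured as comparing features of an image and of the natural-language review criteria within one shared feature space. The following is a minimal, hypothetical sketch, not the disclosed implementation: `ToyImage` represents an image by content tags, and `ToyMultimodalEncoder` maps both tags and criteria text into a word-count space as a stand-in for a machine learning model that handles multiple data formats (in practice, a learned vision-language model). Using cosine similarity with a `threshold` of 0.5 as the conformity score, and the `similar_images` helper illustrating the similar image search of appendix 5, are likewise assumptions.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyImage:
    """Hypothetical stand-in for an image: an identifier plus content tags."""
    image_id: str
    tags: list[str] = field(default_factory=list)


class ToyMultimodalEncoder:
    """Hypothetical stand-in for a model that handles multiple data formats.

    Images (via their tags) and review criteria (via their words) are mapped
    into the same sparse word-count space, so the two can be compared directly.
    A real system would use a learned vision-language model instead.
    """

    def encode_image(self, image: ToyImage) -> Counter:
        return Counter(tag.lower() for tag in image.tags)

    def encode_text(self, text: str) -> Counter:
        return Counter(word.strip(".,").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def evaluate_conformity(
    encoder: ToyMultimodalEncoder,
    images: list[ToyImage],
    criteria: str,
    threshold: float = 0.5,
) -> list[tuple[str, float, bool]]:
    """Score each image against natural-language criteria; flag scores >= threshold."""
    criteria_features = encoder.encode_text(criteria)
    results = []
    for image in images:
        score = cosine(encoder.encode_image(image), criteria_features)
        results.append((image.image_id, score, score >= threshold))
    return results


def similar_images(
    encoder: ToyMultimodalEncoder, query: ToyImage, gallery: list[ToyImage], k: int = 2
) -> list[str]:
    """Return the ids of the k gallery images whose features are closest to the query."""
    query_features = encoder.encode_image(query)
    ranked = sorted(
        gallery,
        key=lambda img: cosine(query_features, encoder.encode_image(img)),
        reverse=True,
    )
    return [img.image_id for img in ranked[:k]]


if __name__ == "__main__":
    encoder = ToyMultimodalEncoder()
    gallery = [
        ToyImage("A", ["outdoor", "daytime", "people", "park"]),
        ToyImage("B", ["indoor", "night", "kitchen"]),
        ToyImage("C", ["outdoor", "beach"]),
    ]
    for image_id, score, conforms in evaluate_conformity(
        encoder, gallery, "Outdoor daytime scene with people."
    ):
        print(image_id, round(score, 3), conforms)
    print(similar_images(encoder, ToyImage("Q", ["outdoor", "park", "people"]), gallery))
```

With these toy inputs, image A shares three words with the criteria and scores about 0.67, so it is judged conforming, while B shares none. Swapping in a learned encoder would change only the feature extraction step; the comparison step stays the same.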
[0200]
10 Information Processing Unit
20 Information Processing Unit
30 Information Processing Unit
40 Information Processing Unit
50 Information Processing Unit
60 Information Processing Unit
70 Information Processing Unit
110 Image Acquisition Unit
120 Review Criteria Input Unit
130 Feature Extraction Unit
140 Evaluation Generation Unit
150 Display Unit
210 Correction Instruction Reception Unit
220 Image Generation Unit
310 Reason Generation Unit
410 Similar Image Search Unit
510 Target Layer Identification Unit
520 Correction Policy Generation Unit
610 Review Step Decomposition Unit
620 Machine Learning Model
710 Review Progress Input Unit
720 Review Criteria Generation Unit
101 Processor
102 Memory
103 Communication Interface
104 Program
Claims
1. An information processing method characterized by: at least one computer acquiring at least one image to be analyzed; accepting input of review criteria described in natural language; using a machine learning model that handles multiple data formats to extract semantic features from the at least one image to be analyzed and the review criteria; and using the semantic features of the image to be analyzed and the review criteria to generate a conformity evaluation of the at least one image to be analyzed to the review criteria.
2. The information processing method according to claim 1, characterized in that at least one computer receives, in natural language, instructions for modifying the image to be analyzed based on the conformity evaluation, and generates a modified image based on the modification instructions and the image to be analyzed using the machine learning model that handles the plurality of data formats.
3. The information processing method according to claim 2, characterized in that at least one computer revises the modification instruction based on the modification instruction and the review criteria.
4. An information processing method according to claim 1 or 2, characterized in that at least one computer automatically generates reasons for conformity and non-conformity based on the conformity evaluation.
5. An information processing method according to any one of claims 1 to 3, characterized in that, when at least one computer acquires at least one image to be analyzed, it automatically performs a similar image search and acquires a plurality of images to be compared.
6. An information processing method according to any one of claims 1 to 4, characterized in that at least one computer automatically proposes a modification policy tailored to a specific target group.
7. An information processing method according to any one of claims 1 to 5, characterized in that the image review procedure is broken down into multiple review steps and automated by combining multiple machine learning models specialized for each review step.
8. An information processing method according to any one of claims 1 to 6, characterized in that at least one computer takes as input the review process associated with at least one of two images and generates review criteria based on the information shown in the review process.
9. An information processing method according to claim 2, wherein at least one computer presents the conformity evaluation to a user and receives input of the modification instructions from the user.
10. An information processing method according to claim 9, wherein at least one computer determines, before generating the modified image, whether the image obtained by modifying the image to be analyzed based on the modification instructions conforms to the review criteria.
11. An information processing method according to claim 10, wherein, if at least one computer determines that the image obtained by modifying the image to be analyzed based on the modification instruction conforms to the review criteria, it modifies the image to be analyzed based on the modification instruction to generate the modified image.
12. An information processing method according to claim 10 or 11, wherein, if at least one computer determines that the image obtained by modifying the image to be analyzed based on the modification instruction does not conform to the review criteria, the modified image is not generated.
13. An information processing method according to claim 12, wherein, if at least one computer determines that the image obtained by modifying the image to be analyzed based on the modification instruction does not conform to the review criteria, the at least one computer proposes a change to the modification instruction to the user.
14. An information processing method according to claim 13, wherein, if at least one computer determines that the image obtained by modifying the image to be analyzed based on the modification instruction does not conform to the review criteria, the at least one computer presents the user with a proposed change including the changes to the modification instruction.
15. An information processing method according to claim 14, wherein at least one computer revises the modification instruction in accordance with the proposed change when it receives input from the user indicating agreement to the proposed change.
16. An information processing method according to claim 14 or 15, wherein, if at least one computer receives input from the user indicating that the user does not agree to the proposed change, the at least one computer modifies the image to be analyzed based on the modification instruction and generates the modified image.
17. An information processing method according to claim 14 or 15, wherein, if at least one computer receives input from the user indicating that the user does not agree to the proposed change, the at least one computer modifies the image to be analyzed based on the modification instruction and generates the modified image.
18. An information processing method according to claim 14 or 15, wherein at least one computer does not generate the modified image when it receives input from the user indicating that the user does not agree to the proposed change.
19. A recording medium that stores a program causing a computer to perform the following steps: acquiring at least one image to be analyzed; accepting input of review criteria written in natural language; extracting semantic features from the at least one image to be analyzed and the review criteria using a machine learning model that handles multiple data formats; and generating a conformity evaluation of the at least one image to be analyzed to the review criteria using the semantic features of the image to be analyzed and the review criteria.
20. An information processing device comprising: an image acquisition means for acquiring at least one image to be analyzed; a review criteria input means for receiving input of review criteria written in natural language; a feature extraction means for extracting semantic features from the at least one image to be analyzed and the review criteria using a machine learning model that handles multiple data formats; and an evaluation generation means for generating a conformity evaluation of the at least one image to be analyzed to the review criteria using the semantic features of the image to be analyzed and the review criteria.
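The decision flow of claims 9 to 18 (a pre-check before generating anything, a proposed change to the instruction, and the user's agreement or refusal) can be sketched as a small control loop. This is a hypothetical sketch under stated assumptions: the callables `predict_conforms`, `propose_change`, `user_agrees`, and `generate` stand in for components the claims leave unspecified, and the `on_disagree` flag models the alternative behaviors of claims 16/17 (generate with the original instruction) and claim 18 (do not generate). Re-checking the revised instruction after the user agrees (claim 15), and the `max_rounds` limit that keeps the loop finite, are further assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Literal, Optional


@dataclass
class CorrectionOutcome:
    modified_image: Optional[str]  # None when no modified image is generated
    final_instruction: str
    history: list[str] = field(default_factory=list)


def run_correction_flow(
    image: str,
    instruction: str,
    criteria: str,
    predict_conforms: Callable[[str, str, str], bool],
    propose_change: Callable[[str, str], str],
    user_agrees: Callable[[str], bool],
    generate: Callable[[str, str], str],
    on_disagree: Literal["generate", "skip"] = "skip",
    max_rounds: int = 3,
) -> CorrectionOutcome:
    """Hypothetical control flow for a user-supplied modification instruction."""
    history: list[str] = []
    for _ in range(max_rounds):
        # Check, before generating anything, whether the modified image would conform (claim 10).
        if predict_conforms(image, instruction, criteria):
            history.append("predicted to conform: generating")
            return CorrectionOutcome(generate(image, instruction), instruction, history)  # claim 11
        # Predicted non-conformance: generate nothing yet and propose a change (claims 12 to 14).
        proposal = propose_change(instruction, criteria)
        history.append(f"proposed: {proposal}")
        if not user_agrees(proposal):
            if on_disagree == "generate":
                # Claims 16/17 variant: generate with the original instruction anyway.
                history.append("user declined: generating with original instruction")
                return CorrectionOutcome(generate(image, instruction), instruction, history)
            # Claim 18 variant: do not generate the modified image.
            history.append("user declined: not generating")
            return CorrectionOutcome(None, instruction, history)
        # Claim 15: adopt the agreed change; re-checking it on the next round is an assumption.
        instruction = proposal
        history.append("user agreed: instruction revised")
    history.append("round limit reached: not generating")
    return CorrectionOutcome(None, instruction, history)


if __name__ == "__main__":
    outcome = run_correction_flow(
        image="photo.jpg",
        instruction="add a large text banner",
        criteria="no text overlays",
        predict_conforms=lambda img, instr, crit: "text" not in instr,  # toy rule
        propose_change=lambda instr, crit: "brighten the background",  # toy proposal
        user_agrees=lambda proposal: True,
        generate=lambda img, instr: f"{img} edited: {instr}",
    )
    print(outcome.modified_image)
    print(outcome.history)
```

The claims do not say how the claim 10 pre-check is performed (for example, by predicting the outcome or by producing an internal draft), so the sketch keeps it behind a callable rather than committing to either approach.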