Classification device, classification method, and program
The classification device employs adversarial learning to generate indexes that can identify similar cases across different domains, addressing the limitations of conventional methods by using a classification model with an encoder, prototype, and output layer to facilitate effective retrieval of similar information.
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- HITACHI SOLUTIONS EAST JAPAN LTD
- Filing Date
- 2022-02-24
- Publication Date
- 2026-06-22
AI Technical Summary
Conventional classification methods using prototype-based deep learning struggle to generate indexes that can effectively search for similar cases across different domains of information.
A classification device and method that utilizes adversarial learning to generate indexes by maximizing the prediction error of domain codes, allowing for the classification of information into different domains and enabling the search for similar cases using a classification model comprising an encoder layer, prototype layer, and output layer, with an index creation and similar information search unit.
Enables the search for similar cases across different domains by preventing domain-specific information from being included in the index, thereby facilitating effective retrieval of similar information through a multi-dimensional vector representation and visualization in scatter plots.
Abstract
Description
Technical Field
[0001] The present invention relates to a classification apparatus, a classification method, and a program.
Background Art
[0002] In recent years, as a method for classifying information, a deep learning method using a neural network has been increasingly used.
[0003] Among them, the prototype-based method shown in Non-Patent Document 1 can perform not only classification but also retrieval of similar information by using the index that it outputs, and is attracting attention as a method that can show similar cases serving as the basis for AI judgment. In the prototype-based method, the classification model converts input data into an index in an encoder layer, calculates the distance between the index and each of a plurality of prototypes in a prototype layer, and performs classification in an output layer based on those distances. Here, a prototype is a typical index value in the training data and is learned as a parameter of the classification model during training. With this configuration, the model is trained so that items with similar input content and the same classification result receive indexes that are close to each other.
[0004] In addition, the technique described in Patent Document 1 improves the prototype-based method so that sequence data such as text data and time-series data can be used as input data, which allows it to be applied to the classification of information expressed in text.
[0005] These technologies can be used, for example, to classify the causes of malfunctions in on-site repair services. In this case, the input data could include, for example, product item names representing the general classification of products, product numbers to identify products, codes such as major and minor classifications representing the general classification of malfunctions, text data describing the details of the malfunction reported by the customer, and text data describing the results confirmed by the repair engineer on-site. By applying a prototype-based method to this input data, the causes of malfunctions can be classified, and similar cases can be searched using an index. By referring to the repair reports of those cases, appropriate action can be taken.
[0006] Searching for similar cases using indexes offers advantages not only in repair services but also in quality analysis of accumulated repair service data. Housing equipment manufacturers providing repair services have a need to cross-sectionally analyze whether similar cases exist in other domains (e.g., other product categories) for defects occurring in a particular domain (e.g., a specific product category).
Prior Art Documents
Patent Documents
[0007] [Patent Document 1] U.S. Patent Application Publication No. 2020/0364504
Non-Patent Documents
[0008] [Non-Patent Document 1] Oscar Li, et al., “Deep Learning for Case-Based Reasoning Through Prototypes: A Neural Network That Explains Its Predictions,” Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32, No. 1, 2018.
Summary of the Invention
Problems to Be Solved by the Invention
[0009] However, with conventional technologies it has been difficult to generate indexes that can be used to search for similar cases belonging to different domains of information.
[0010] For example, in the technologies described in Non-Patent Document 1 and Patent Document 1, all input data, including the product item, is converted into an index. As a result, the index includes product information, and even if the nature and cause of the defect are similar, the indexes will not be close to each other if the product items are different. Therefore, it is difficult to search for similar cases in different domains (product items) using indexes generated by conventional technologies.
[0011] The present invention was made to solve the above problems and aims to provide a classification device, a classification method, and a program that can generate an index so that similar cases belonging to different domains of information can also be searched.
Means for Solving the Problems
[0012] An example of a classification device according to the present invention is a classification device that classifies information to be classified into one of a plurality of classification results. The classification device comprises a classification model learning unit and a classification execution unit. The classification model learning unit learns a classification model by performing machine learning based on a domain code representing the domain to which the information to be classified belongs, attribute information relating to the information to be classified, and classification result information representing the correct classification result into which the information to be classified should be classified, and thereby generates a trained model. The classification model comprises: an encoder layer that accepts input relating to some or all of the attribute information and outputs an index, which is a multi-dimensional numerical vector; a prototype layer that outputs the similarity between the index and a prototype learned by the machine learning; an output layer that accepts input relating to the domain code and the similarity and outputs a classification result; and a domain code prediction unit that predicts the domain code based on the index. The classification model learning unit performs machine learning in which adversarial learning is applied so as to maximize the prediction error of the domain code by the domain code prediction unit. The classification execution unit classifies the information to be classified using the trained model, based on the domain code and the attribute information.
[0013] In one example, the classification device further comprises: an index creation unit that stores, in an index table, the relationship between the index output by the encoder layer and the classification target information corresponding to that index; and a similar information search unit that, based on the index table and a search target index (the index of the classification target information to be searched for), retrieves indexes whose values are similar to the search target index as similar indexes and outputs the classification target information associated with the similar indexes.
[0014] In one example, the similar information search unit determines whether index values are similar based on one of the following criteria: the Euclidean distance is less than or equal to a specified threshold; or the cosine similarity is greater than or equal to a specified threshold.
[0015] In one example, the index is a vector of three or more dimensions. The similar information search unit dimensionally compresses the similar index into a two-dimensional vector using either principal component analysis or t-SNE, and displays the two-dimensional vector in a scatter plot.
[0016] In one example, the similar information search unit clusters the similar index by cluster analysis. The scatter plot includes a bubble chart, and the bubble chart represents the center point of the cluster and the number of the similar indexes belonging to the cluster by the center point of the circle and the size of the circle, respectively.
[0017] The classification method according to the present invention is a classification method for classifying classification target information into any one of a plurality of classification results, and is executed by the above-described classification device.
[0018] The program according to the present invention causes a computer to function as the above-described classification device.
Advantages of the Invention
[0019] According to the classification device, classification method, and program of the present invention, an index can be generated that makes it possible to search for similar cases even when the information belongs to different domains.
Brief Description of the Drawings
[0020]
[Figure 1] Configuration of the classification device 10 according to Embodiment 1 of the present invention.
[Figure 2] Configuration example of the classification target information table 21.
[Figure 3] Configuration example of the classification result table 22.
[Figure 4] Configuration example of the training data table 23.
[Figure 5] Configuration example of the index table 25.
[Figure 6] Flowchart showing an example of the operation of the classification device 10.
[Figure 7] Detailed example of step S1.
[Figure 8] Configuration example of the classification model.
[Figure 9] Detailed example of step S2.
[Figure 10] Detailed example of step S3.
[Figure 11] Detailed example of step S4.
Modes for Carrying Out the Invention
[0021] Hereinafter, embodiments of the present invention will be described based on the attached drawings.
Embodiment 1
Figure 1 shows the configuration of a classification device 10 according to Embodiment 1 of the present invention. The classification device 10 is a device that classifies information to be classified into one of a plurality of classification results by performing the information classification method described herein. In this embodiment, the information to be classified is a product defect report, but this is only an example, and various types of information can be used. Specifically, the device can be applied to images, time-series data such as electrocardiograms, text data, etc.
[0022] The classification device 10 has a hardware configuration similar to that of a known computer, and includes, for example, a storage unit 20 and a processing unit 30. The storage unit 20 includes, for example, storage media such as semiconductor memory devices and magnetic disk devices. Some or all of the storage media may be non-transitory storage media. The processing unit 30 includes, for example, a processor.
[0023] Furthermore, the classification device 10 may include input / output means (not shown). The input / output means may include, for example, input devices such as a keyboard and mouse, output devices such as a display and printer, and communication devices such as a network interface.
[0024] The storage unit 20 stores the classification target information table 21, the classification result table 22, the training data table 23, the trained classification model 24, and the index table 25. The processing unit 30 functions as a classification model learning unit 31, a classification execution unit 32, an index creation unit 33, and a similar information search unit 34.
[0025] The storage unit 20 may store a program (not shown). The processor may execute this program, thereby enabling the classification device 10 to perform the functions described herein. That is, the program may cause the computer to function as the classification device 10 (more specifically, cause the processor to function as the classification model learning unit 31, the classification execution unit 32, the index creation unit 33, and the similar information search unit 34).
[0026] Figure 2 shows an example of the structure of the classification target information table 21. The classification target information table 21 stores information related to each defect report, which is the information to be classified, by associating it with information that identifies the defect report (information ID).
[0027] Information regarding a defect report includes a domain code and attribute information. The domain code represents the domain, which is a broad classification to which the defect report belongs. In this embodiment, the domain is the product item to which the defect report relates, but in modified examples, the domain can be defined as appropriate by a person skilled in the art.
[0028] The attribute information is not particularly limited as long as it is related to the defect report and is different from the domain code, but in this embodiment it includes the product number, the major category of the defect, the minor category of the defect, text data representing the customer's complaint, and text data representing the results of the defect check by the worker.
[0029] Figure 3 shows an example of the configuration of the classification result table 22. For each defect report, the classification result table 22 stores the classification result of the defect report in association with the defect report's information ID. In this embodiment, the classification result is a code representing the cause of the defect, but in modified examples, those skilled in the art can define it as appropriate. This classification result is, for example, a correct result specified by a human, and in machine learning it can be used as classification result information representing the correct classification result into which the information to be classified should be classified. Furthermore, as will be described later, a classification result may also be stored in this table after classification has been performed by the classification execution unit 32.
[0030] Figure 4 shows an example of the structure of the training data table 23. For each defect report, the training data table 23 stores the domain code, the attribute information, the classification result, etc., of the defect report in association with the information ID of the defect report. The domain code and attribute information can be the same as those in the classification target information table 21 in Figure 2, and the classification result can be the same as that in the classification result table 22 in Figure 3.
[0031] The classification model learning unit 31 of the classification device 10 learns a classification model by performing machine learning based on the training data table 23, and generates a trained classification model 24 as a trained model. The specific process in this case will be described later with reference to Figure 7, etc.
[0032] Figure 5 shows an example of the configuration of the index table 25. For each defect report, the index table 25 stores the index of the defect report in association with the information ID of that defect report. The index is a numerical vector (preferably multi-dimensional) that the encoder layer 42, described later, outputs when given some or all of the attribute information of the defect report as input, and it includes elements such as index element 1, index element 2, index element 3, etc.
[0033] Figure 6 is a flowchart illustrating an example of the operation of the classification device 10. This flowchart shows the method of information classification performed by the classification device 10.
[0034] First, the classification model learning unit 31 of the classification device 10 performs training of the classification model (step S1). At this point, the classification target information table 21 and the classification result table 22 are assumed to be stored in the storage unit 20.
[0035] Figure 7 shows a detailed example of step S1. In step S1, the classification model learning unit 31 first combines the classification target information table 21 and the classification result table 22 by an inner join with the information ID as the key, and stores the result in the training data table 23 (step S11). The training data table 23 is generated in this way.
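As an illustration of step S11, the following is a minimal sketch of an inner join keyed on the information ID. The table contents, field names (`domain_code`, `complaint`, `cause_code`), and the helper `inner_join` are hypothetical stand-ins introduced here; the specification does not prescribe any particular data representation.

```python
# Hypothetical stand-ins for the classification target information table 21
# and the classification result table 22, keyed by information ID.
target_info = {
    "R001": {"domain_code": "water_heater", "complaint": "no hot water"},
    "R002": {"domain_code": "bath_unit", "complaint": "water leaks"},
    "R003": {"domain_code": "kitchen", "complaint": "fan is noisy"},
}
classification_results = {
    "R001": {"cause_code": "C10"},
    "R002": {"cause_code": "C22"},
    # R003 has no classification result, so an inner join drops it.
}


def inner_join(left, right):
    """Keep only the information IDs present in both tables and merge their fields."""
    return {
        info_id: {**left[info_id], **right[info_id]}
        for info_id in left
        if info_id in right
    }


# Stand-in for the training data table 23.
training_data = inner_join(target_info, classification_results)
```

Only records that have both input information and a correct classification result end up in the training data, which is what an inner join implies.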
[0036] Next, the classification model learning unit 31 learns a classification model based on the data in the training data table 23 (step S12). This will be explained in detail below.
[0037] Figure 8 shows an example of the configuration of the classification model learned by the classification model learning unit 31. The classification model learning unit 31 trains a prototype-based neural network by machine learning. In particular, adversarial learning is used so that the index is learned in a way that maximizes the prediction error of the domain code, which prevents the index from containing domain-specific information. The information to be classified is then classified by concatenating a vector corresponding to the domain code with the similarity vector output by the prototype layer and inputting the result into the output layer.
[0038] The classification model learned by the classification model learning unit 31 comprises an embedding layer 41, an encoder layer 42, a prototype layer 43, an output layer 44, and a domain code prediction unit 45.
[0039] The embedding layer 41 accepts the domain code and attribute information included in the classification target information as input and outputs multiple numerical vectors. In this embodiment, it outputs vector v1 corresponding to the domain code (product item), vector v2 corresponding to the product number, vector v3 corresponding to the major category of the defect, vector v4 corresponding to the minor category of the defect, vectors v5 to v_{n+4} corresponding to words 1 to n included in the customer's complaint, and vectors v_{n+5} to v_{n+m+4} corresponding to words 1 to m included in the check results.
[0040] The specific process for converting domain codes and other codes into vectors can be appropriately designed by a person skilled in the art based on prior art. Similarly, the specific process for splitting text data into words and converting each word into a vector can also be appropriately designed by a person skilled in the art based on prior art.
[0041] The encoder layer 42 generates an index based on some or all of the vectors corresponding to attribute information among the vectors generated by the embedding layer 41. In this embodiment, the vectors corresponding to the customer's complaint and the check results are used, but those skilled in the art can decide which vectors to use as appropriate, and, for example, all vectors corresponding to attribute information may be used. However, the vector corresponding to the domain code is not used; only vectors corresponding to attribute information are used. The encoder layer 42 can be configured using, for example, a recurrent neural network (RNN) including multiple layers 42a, but is not limited to this.
[0042] In this way, the encoder layer 42 accepts input regarding some or all of the attribute information and outputs an index.
[0043] The prototype layer 43 holds prototypes p1 to p_k as parameters (where k is 1 or greater). Each prototype is a vector with the same dimensions as the index. Based on the data in the training data table 23, the prototypes are learned as part of the parameters of the classification model by machine learning performed by the classification model learning unit 31. As a result, each prototype is learned to be a typical index value for the defect reports included in the training data table 23. The specific processing of the machine learning in this case can be appropriately designed by a person skilled in the art based on known technologies, including Patent Document 1 and Non-Patent Document 1.
[0044] The prototype layer 43 calculates and outputs the similarity between each of the prototypes p1 to p_k and the index output from the encoder layer 42.
[0045] The definition of similarity can be appropriately designed by those skilled in the art, but for example, the Euclidean distance between vectors or the cosine similarity between vectors can be used as the similarity measure. Using such methods, the similarity between an index and a prototype is calculated as a single scalar value, and all (k) similarities can be concatenated to generate a k-dimensional similarity vector.
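The computation of the k-dimensional similarity vector can be sketched as follows. This is an illustrative sketch only: the function names and the toy index and prototype values are assumptions, and in an actual model the prototypes would be learned parameters rather than fixed lists.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two vectors (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors (larger means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_vector(index, prototypes, measure):
    """One scalar per prototype, concatenated into a k-dimensional vector."""
    return [measure(index, p) for p in prototypes]


# Toy values: a 3-dimensional index and k = 3 prototypes of the same dimension.
index = [1.0, 0.0, 0.0]
prototypes = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [4.0, 4.0, 0.0]]

distances = similarity_vector(index, prototypes, euclidean_distance)
cosines = similarity_vector(index, prototypes, cosine_similarity)
```

Note that the two measures point in opposite directions (a small distance and a large cosine both indicate closeness), so whichever is chosen has to be used consistently by the layers that consume the similarity vector.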
[0046] The output layer 44 accepts input including the vector v1 corresponding to the domain code (product item) output from the embedding layer 41, some of the vectors corresponding to the attribute information, and the similarity vector, and outputs the cause of the defect as a classification result. The output layer 44 can be constructed using, for example, an RNN containing multiple layers 44a, but is not limited to this. When an RNN is used, vectors v1 to v4 corresponding to the domain code, etc., can be concatenated with the similarity vector and input.
[0047] The cause of a defect is represented by a code, for example, as shown in the classification result table 22 in Figure 3. The specific process for converting the output of the output layer 44 (for example, a vector) into a code can be designed as appropriate by a person skilled in the art, but one example is described below. First, each of the multiple classification results is assigned a number. Then, the output layer 44 generates a vector whose elements each represent the probability that the information to be classified falls under the corresponding classification result. That is, if there are q possible values for the code representing the classification result, the output layer 44 generates a q-dimensional vector. The output layer 44 then outputs the classification result corresponding to the element with the largest value among the elements of this q-dimensional vector.
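The conversion from a q-dimensional vector to a code can be sketched as below. The cause codes, the raw scores, and the use of a softmax to obtain probabilities are assumptions made for illustration; the specification only states that the vector elements represent probabilities and that the largest element is selected.

```python
import math

# Hypothetical cause codes, numbered 0..q-1 by their list position (q = 3).
CAUSE_CODES = ["C10", "C22", "C35"]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (an assumed choice)."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def decode(probabilities, codes):
    """Return the code whose probability is the largest element of the vector."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return codes[best]


probabilities = softmax([0.2, 1.7, -0.5])  # toy raw scores from the output layer
predicted_code = decode(probabilities, CAUSE_CODES)
```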
[0048] Here, the input to the output layer 44 includes a vector v1 corresponding to the domain code and a similarity vector. Thus, the output layer 44 accepts input regarding the domain code and similarity and outputs the classification result.
[0049] The domain code prediction unit 45 predicts the domain code based on the index. The domain code prediction unit 45 can be configured using, for example, a neural network including multiple fully connected layers, but is not limited to this.
[0050] The classification model learning unit 31 learns a classification model by performing machine learning based on the classification results output by the output layer 44 and the domain codes output by the domain code prediction unit 45. Here, the classification model learning unit 31 updates the parameters of the classification model so that the error between the output classification results and the classification results associated with the information to be classified in the training data table 23 is minimized, and the error between the output domain codes and the domain codes associated with the information to be classified in the training data table 23 is maximized.
[0051] In other words, the classification model learning unit 31 performs machine learning by applying adversarial learning so as to maximize the prediction error of domain codes by the domain code prediction unit 45. This adversarial learning prevents domain-specific information from being included in the index, and makes it possible to generate an index with which similar cases can be found even when they belong to a domain different from that of the information to be classified.
[0052] Furthermore, the classification model learning unit 31 learns a classification model by performing machine learning based on domain codes, attribute information, and classification result information representing the correct classification result, in parallel with this adversarial learning. The classification model learning unit 31 generates a trained classification model 24 by performing these two types of machine learning in parallel.
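The two objectives pursued in parallel can be written compactly as follows. This is one possible formalization, not a formula taken from the specification: the symbols $\mathcal{L}_{\mathrm{cls}}$, $\mathcal{L}_{\mathrm{dom}}$, $\theta_{\mathrm{enc}}$, $\theta_{\mathrm{proto}}$, $\theta_{\mathrm{out}}$, and the weighting factor $\lambda$ are notation introduced here for illustration.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{cls}} &: \text{error between the classification result of the output layer and the correct classification result},\\
\mathcal{L}_{\mathrm{dom}} &: \text{error between the predicted domain code and the domain code in the training data},\\
(\theta_{\mathrm{enc}},\, \theta_{\mathrm{proto}},\, \theta_{\mathrm{out}}) &\leftarrow \arg\min \bigl( \mathcal{L}_{\mathrm{cls}} - \lambda\, \mathcal{L}_{\mathrm{dom}} \bigr), \qquad \lambda > 0 .
\end{aligned}
```

Minimizing $-\lambda\,\mathcal{L}_{\mathrm{dom}}$ is the same as maximizing the domain code prediction error, so the encoder is pushed toward indexes from which the domain cannot be predicted. How the parameters of the domain code prediction unit itself are updated is a separate design choice; in gradient-reversal setups they are typically trained to minimize $\mathcal{L}_{\mathrm{dom}}$.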
[0053] The specific error propagation calculations during learning can be appropriately designed by those skilled in the art. For example, for learning the encoder layer 42, a GRL (Gradient Reversal Layer) may be placed before the input layer of the domain code prediction unit 45, and the learning process may be designed to minimize the error propagated back from the output layer 44 and the error propagated back from the domain code prediction unit 45 (with the gradient reversed by the GRL).
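The behavior of a gradient reversal layer can be sketched without any deep learning framework. The class below and the toy gradient values are assumptions for illustration; in practice the same idea would be expressed with the automatic differentiation facilities of whichever framework is used, and the scaling factor `lam` is an assumed hyperparameter.

```python
class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, index):
        # The domain code prediction unit sees the index unchanged.
        return list(index)

    def backward(self, grad_from_domain_predictor):
        # The gradient flowing back toward the encoder has its sign flipped.
        return [-self.lam * g for g in grad_from_domain_predictor]


grl = GradientReversal(lam=0.5)

index = [0.3, -1.2, 0.8]             # toy encoder output
passed_forward = grl.forward(index)  # input to the domain code prediction unit

# Toy gradients with respect to the index, one from each branch.
grad_cls = [0.10, -0.20, 0.05]  # from the output layer (classification error)
grad_dom = [0.40, 0.10, -0.30]  # from the domain code prediction unit

# The encoder descends on the classification gradient plus the reversed
# domain gradient, which raises the domain prediction error.
reversed_dom = grl.backward(grad_dom)
grad_for_encoder = [c + r for c, r in zip(grad_cls, reversed_dom)]
```

Because the reversal happens only on the path back to the encoder, the domain code prediction unit can still be updated with the unreversed gradient, which is what makes the two parts adversaries.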
[0054] As described above, the classification model learning unit 31 generates a trained classification model 24. The trained classification model 24, once training is complete, may have the same layer structure as the classification model before training, but the domain code prediction unit 45 may be omitted.
[0055] Step S12 (Figure 7) is completed in this way. Next, the classification model learning unit 31 stores the trained classification model 24 in the storage unit 20 (step S13). This completes step S1 (Figure 6).
[0056] After step S1, the classification execution unit 32 performs classification (step S2). That is, the classification execution unit 32 classifies the information to be classified by predicting the classification result using the trained classification model 24 based on the domain code and attribute information included in the newly input information to be classified.
[0057] Figure 9 shows a detailed example of step S2. In step S2, the classification execution unit 32 first takes the classification target information specified by the user of the classification device 10 (for example, it may be newly entered classification target information or one of the records in the new classification target information table 21) as input and performs classification using the trained classification model 24 (step S21).
[0058] Next, the classification execution unit 32 stores the classification results output by the output layer of the trained classification model 24 in the classification result table 22, associating them with the information ID of the input classification target information (step S22). In this way, step S2 is completed.
[0059] As shown in Figure 6, after step S2, the index creation unit 33 generates the index table 25 (step S3).
[0060] Figure 10 shows a detailed example of step S3. In step S3, the index creation unit 33 first takes all the classification target information (for example, all records in the new classification target information table 21) as input and performs classification using the trained classification model 24 (step S31).
[0061] Next, the index creation unit 33 stores the relationship between the index output by the encoder layer 42 of the trained classification model 24 and the original classification target information corresponding to that index in the index table 25 (step S32). In this way, step S3 is completed.
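Steps S31 and S32 can be sketched as a loop that runs each record through the encoder and stores the resulting index under the record's information ID. The function `toy_encoder` below is a hypothetical keyword-counting stand-in for the trained encoder layer 42, and the field names and records are invented for illustration. It ignores the domain code by construction, whereas the actual model additionally relies on adversarial learning to keep domain information out of the index.

```python
def toy_encoder(attributes):
    """Hypothetical stand-in for encoder layer 42: text attributes to a 3-element index."""
    text = attributes["complaint"] + " " + attributes["check_result"]
    words = text.lower().split()
    return [
        float(sum(w in {"leak", "leaks", "drip"} for w in words)),
        float(sum(w in {"noise", "noisy", "rattle"} for w in words)),
        float(len(words)),
    ]


def build_index_table(target_info, encoder):
    """Map each information ID to the index produced for its record."""
    return {info_id: encoder(record) for info_id, record in target_info.items()}


# Hypothetical records from two different domains (product items).
target_info = {
    "R001": {
        "domain_code": "water_heater",
        "complaint": "burner makes rattle noise",
        "check_result": "loose panel screw",
    },
    "R002": {
        "domain_code": "kitchen_hood",
        "complaint": "fan makes rattle noise",
        "check_result": "loose fan blade",
    },
}

index_table = build_index_table(target_info, toy_encoder)  # stand-in for index table 25
```

In this toy data the two reports come from different domains but receive the same index, which is the kind of cross-domain closeness the index is meant to provide.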
[0062] As shown in Figure 6, after step S3, the similar information search unit 34 performs a search for similar information (step S4).
[0063] Figure 11 shows a detailed example of step S4. In step S4, the similar information search unit 34 first obtains a specific index (hereinafter referred to as the "search target index") (step S41). The search target index is obtained, for example, by referring to the index table 25 using the information ID of the classification target information specified by the user as the key, but it may be specified or obtained by other methods.
[0064] Next, based on the search target index and the index table 25, the similar information search unit 34 identifies indexes in the index table 25 whose values are similar to the search target index (step S42). An index identified here is referred to below as a "similar index". There may be multiple similar indexes.
[0065] Next, the similar information search unit 34 uses the information ID associated with the acquired similar index as a key to refer to the classification target information table 21, retrieves the corresponding classification target information, and outputs it as a search result (step S43).
[0066] Here, the adversarial learning described above makes it possible to prevent domain-specific information from being included in the index, and to search for similar cases belonging to different domains than the information being searched.
[0067] The similarity determination process performed by the similar information search unit 34 can be appropriately designed by a person skilled in the art. For example, the unit may determine whether index values are similar based on whether the Euclidean distance between the search target index and each index in the index table 25 is less than or equal to a specified threshold. Alternatively, the similar information search unit 34 may determine whether index values are similar based on whether the cosine similarity between the search target index and each index in the index table 25 is greater than or equal to a specified threshold.
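Both threshold criteria can be sketched in a single search function. The function name `find_similar`, its parameters, the toy index values, and the decision to exclude the search target itself from the results are all assumptions made for this illustration.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def find_similar(index_table, target_id, max_distance=None, min_cosine=None):
    """Return information IDs whose index is similar to the search target index.

    Exactly one criterion must be given: a Euclidean distance upper bound
    or a cosine similarity lower bound.
    """
    if (max_distance is None) == (min_cosine is None):
        raise ValueError("specify exactly one of max_distance or min_cosine")
    target = index_table[target_id]
    hits = []
    for info_id, index in index_table.items():
        if info_id == target_id:
            continue  # assumed choice: do not report the query itself
        if max_distance is not None:
            similar = euclidean_distance(target, index) <= max_distance
        else:
            similar = cosine_similarity(target, index) >= min_cosine
        if similar:
            hits.append(info_id)
    return hits


# Toy stand-in for the index table 25 with 2-dimensional indexes.
index_table = {
    "R001": [1.0, 0.0],
    "R002": [0.9, 0.1],
    "R003": [0.0, 1.0],
    "R004": [2.0, 0.0],
}

by_distance = find_similar(index_table, "R001", max_distance=0.5)
by_cosine = find_similar(index_table, "R001", min_cosine=0.99)
```

The two criteria can return different sets: `R004` points in the same direction as `R001` but is farther away, so it passes the cosine threshold and fails the distance threshold. The returned information IDs would then be used as keys into the classification target information table 21, as in step S43.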
[0068] Although not specifically shown in the drawings, the similar information search unit 34 may display the similar indexes using a scatter plot. The scatter plot is represented, for example, in two dimensions. In particular, if the index is a vector of three or more dimensions, the similar information search unit 34 may compress each similar index to a two-dimensional vector and display the two-dimensional vectors in a scatter plot. The specific processing for dimensionality reduction can be appropriately designed by those skilled in the art, but, for example, principal component analysis or t-SNE (t-distributed Stochastic Neighbor Embedding) may be used.
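As an illustration of the principal component analysis option only (t-SNE is not shown), the sketch below projects indexes onto their first two principal components using power iteration with deflation in plain Python. The function `pca_2d`, the iteration count, the starting vector, and the sample indexes are assumptions; a numerical library would normally be used instead, and power iteration assumes the leading eigenvalues are well separated.

```python
import math


def pca_2d(rows, iterations=200):
    """Project rows onto their first two principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    components = []
    for _component in range(2):
        # Deterministic, normalized starting vector (an arbitrary choice).
        v = [1.0 / (j + 1) for j in range(d)]
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _step in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # no variance left in this direction
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the component just found before finding the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]

    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]


# Toy 3-dimensional similar indexes.
similar_indexes = [
    [2.0, 0.0, 1.0],
    [4.0, 1.0, 1.0],
    [6.0, 0.0, 1.0],
    [8.0, 1.0, 1.0],
]
points_2d = pca_2d(similar_indexes)  # one (x, y) pair per similar index
```

The resulting pairs can be passed to any plotting tool to draw the scatter plot.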
[0069] By using such a two-dimensional scatter plot, users can more easily understand the results of similarity searches.
[0070] Furthermore, although not specifically shown in the drawings, the similar information search unit 34 may cluster the similar indexes using cluster analysis. The specific processing of the cluster analysis can be appropriately designed by those skilled in the art. The results of the clustering can be displayed as a scatter plot as described above. For example, the scatter plot may include a bubble chart. The bubble chart may represent the center point of a cluster by the center point of a circle (represented, for example, by two-dimensional coordinates) and the number of similar indexes belonging to the cluster by the size of the circle (represented, for example, by radius or area).
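The data behind such a bubble chart can be sketched as follows. The cluster labels are assumed to come from whatever cluster analysis is chosen; the function name `bubble_chart_data`, the sample points, and the labels are hypothetical.

```python
from collections import defaultdict


def bubble_chart_data(points_2d, cluster_labels):
    """Compute each cluster's center point and member count for a bubble chart."""
    groups = defaultdict(list)
    for point, label in zip(points_2d, cluster_labels):
        groups[label].append(point)

    bubbles = {}
    for label, members in groups.items():
        count = len(members)
        center = (
            sum(p[0] for p in members) / count,
            sum(p[1] for p in members) / count,
        )
        # "count" would drive the circle size (radius or area) when drawn.
        bubbles[label] = {"center": center, "count": count}
    return bubbles


# Toy two-dimensional points (for example, after dimensionality reduction)
# and assumed cluster labels from a prior cluster analysis.
points_2d = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 12.0), (14.0, 11.0)]
cluster_labels = [0, 0, 1, 1, 1]

bubbles = bubble_chart_data(points_2d, cluster_labels)
```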
[0071] By using such a bubble chart, users of the classification device 10 can more easily understand the results of similarity searches.
[0072] The specific structure of the classification model is not limited to that shown in Figure 8. Those skilled in the art can make appropriate modifications based on prior art, including Patent Document 1 and Non-Patent Document 1.
Explanation of Symbols
[0073]
10 … Classification device
20 … Storage unit
21 … Classification target information table
22 … Classification result table
23 … Training data table
24 … Trained classification model
25 … Index table
30 … Processing unit
31 … Classification model learning unit
32 … Classification execution unit
33 … Index creation unit
34 … Similar information search unit
41 … Embedding layer
42 … Encoder layer
43 … Prototype layer
44 … Output layer
45 … Domain code prediction unit
Claims
1. A classification device that classifies information to be classified into one of a plurality of classification results, the classification device comprising a classification model learning unit and a classification execution unit, wherein
the classification model learning unit trains a classification model by machine learning based on a domain code representing the domain to which the information to be classified belongs, attribute information relating to the information to be classified, and classification result information representing the correct classification result into which the information to be classified should be classified, and thereby generates a trained model,
the classification model includes:
an encoder layer that accepts some or all of the attribute information as input and outputs an index, which is a multi-dimensional numerical vector;
a prototype layer that outputs the similarity between the index and a prototype learned by the machine learning;
an output layer that accepts the domain code and the similarity as input and outputs a classification result; and
a domain code prediction unit that predicts the domain code based on the index,
the classification model learning unit performs machine learning to which adversarial learning is applied so as to maximize the prediction error of the domain code by the domain code prediction unit,
the classification execution unit classifies the information to be classified using the trained model, based on the domain code and the attribute information,
the encoder layer outputs the index without using the domain code,
the domain code prediction unit predicts the domain code based on the index, which does not contain information about the domain code, and
the classification model learning unit, in the machine learning, performs, in parallel with the adversarial learning and based on the domain code, the attribute information, and the correct classification result, learning that minimizes the error between the classification result output by the output layer and the correct classification result.
2. The classification device according to claim 1, further comprising: an index creation unit that stores, in an index table, the relationship between the index output by the encoder layer and the classification target information corresponding to that index; and a similar information retrieval unit that, based on the index table and a search target index, which is the index for the classification target information to be searched, obtains indexes whose values are similar to the search target index as similar indexes, and outputs the classification target information associated with the similar indexes.
3. The classification device according to claim 2, wherein the similar information retrieval unit determines whether index values are similar based on either of the following criteria: the Euclidean distance is less than or equal to a specified threshold; or the cosine similarity is greater than or equal to a specified threshold.
4. The classification device according to claim 2 or 3, wherein the index is a vector of three or more dimensions, and the similar information retrieval unit reduces the similar indexes to two-dimensional vectors using either principal component analysis or t-SNE and displays the two-dimensional vectors as a scatter plot.
5. The classification device according to claim 4, wherein the similar information retrieval unit clusters the similar indexes by cluster analysis, and the scatter plot includes a bubble chart in which the center point of a cluster and the number of similar indexes belonging to the cluster are represented by the center point and the size of a circle, respectively.
6. A classification method for classifying information to be classified into one of a plurality of classification results, the classification method being performed by the classification device described in claim 1.
7. A program that causes a computer to function as the classification device described in claim 1.