Information analysis support method and information analysis support system

The method estimates the distribution of possible values for missing health data using a risk model, addressing the limitations of single-value interpolation in life insurance assessments, thereby improving the accuracy of risk analysis.

JP7883448B2 · Active · Publication Date: 2026-07-01 · HITACHI LTD

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Patents
Current Assignee / Owner
HITACHI LTD
Filing Date
2023-01-12
Publication Date
2026-07-01

AI Technical Summary

Technical Problem

Existing methods for interpolating missing health data values in life insurance underwriting assessments are inadequate when multiple possible values exist, as they typically substitute a single value without considering distribution.

Method used

An information analysis support method that estimates the distribution of possible values for missing data using a risk model, based on reference health information from multiple individuals, to calculate and correct health risk values.

Benefits of technology

Supports accurate risk analysis by estimating and correcting missing or abnormal values, enhancing the reliability of health risk assessments.

✦ Generated by Eureka AI based on patent content.


Abstract

To provide an information analysis support method and an information analysis support system that support risk analysis when input information about health includes a missing value, an anomalous value, or the like.

SOLUTION: In an information analysis support system 101, a database 107 holds health information about a health condition of a person to be analyzed, a risk model for calculating a risk value related to health, and reference health information about health conditions of a plurality of persons. The information analysis support system 101 includes: a risk value calculation section 115 that calculates, on the basis of the reference health information, one or more correction values as candidates for correcting the value of an item to be corrected included in the health information of the person to be analyzed, and calculates one or more risk values about the health of the person to be analyzed on the basis of the risk model and the health information of the person to be analyzed corrected by the correction values; and a risk determination section 116 that determines the risk related to the health of the person to be analyzed on the basis of the one or more risk values.

SELECTED DRAWING: Figure 1

Description

Technical Field

[0001] The present invention relates to a technique for assisting in the analysis of information related to a health condition.

Background Art

[0002] In the underwriting assessment of life insurance, the future payment risk is assessed using the information on the health condition notified by the applicant, and a decision on whether to accept the application is made. The notification includes the results of medical examinations, the presence or absence of pre-existing diseases, medical histories such as hospitalizations and surgeries, etc., but the information is not necessarily comprehensive, and some items may be missing. Therefore, in the assessment, measures are taken for missing values, such as confirmation by a professional in charge, speculation, application of predetermined interpolation values, or risk assessment by ignoring the missing items.

[0003] As a technique for interpolating missing inspection values, for example, there is a technique described in JP 2020-52886 (Patent Document 1). Patent Document 1 states that "a series of learning data including missing values is acquired, and from the series of learning data, for each predetermined aggregation unit, a representative value of the data and an effective rate representing the ratio of the presence of valid data are calculated, and the representative value and the effective rate are input into an estimation model, and the estimation model is learned so as to minimize the error based on the difference between the output obtained and the representative value. Also, a series of estimation data including missing values is acquired, and from the series of estimation data, for each predetermined aggregation unit, a representative value of the data and an effective rate representing the ratio of the presence of valid data are calculated, and the representative value and the effective rate are input into the learned estimation model to obtain feature quantities or perform data estimation for the series of estimation data."

Prior Art Documents

Patent Documents

[0004]

Patent Document 1: JP 2020-52886

Summary of the Invention

[0005] The method described in Patent Document 1 above interpolates missing values using a missing value interpolation estimation formula prepared in advance. However, this method substitutes only a single value, and does not address cases where there are multiple possible values for the missing value, such as when the value of a missing item is estimated to lie within a certain range.

[0006] Therefore, the present invention aims to solve the above problem by providing a method that estimates the distribution of possible values for missing data in notification information, so that the information can be evaluated using a risk model.

Means for Solving the Problem

[0007] To solve at least one of the above problems, a representative example of the invention disclosed herein is an information analysis support method executed by a computer system. The computer system comprises a processor and a storage device connected to the processor, and the storage device holds health information relating to the health status of a subject of analysis, a risk model for calculating health risk values, and reference health information relating to the health status of multiple persons. The information analysis support method comprises: a first step in which the processor calculates one or more correction values based on the reference health information as candidate values for correcting the values of correction target items included in the health information of the subject of analysis; a second step in which the processor calculates one or more risk values relating to the health of the subject of analysis based on the risk model and the health information of the subject of analysis corrected by the one or more correction values; and a third step in which the processor determines the health risk of the subject of analysis based on the one or more risk values. The method is characterized in that: in the first step, the processor extracts, from the reference health information, information that is similar to the health information of the subject of analysis, and calculates a plurality of correction values based on the distribution of values of the same items as the correction target items included in the extracted information; in the second step, the processor calculates a plurality of risk values by inputting the plurality of correction values into the risk model; and in the third step, the processor outputs information indicating the frequency distribution of the occurrence of the risk values and the criteria for determining the level of risk based on the risk values.

Effects of the Invention

[0008] According to one aspect of the present invention, when the input health information contains missing data or abnormal values, risk analysis is supported by estimating values to correct them based on data from multiple individuals and using these values to calculate risk values. Problems, configurations, and effects other than those described above will be clarified by the following description of the embodiments.

Brief Description of the Drawings

[0009]
[Figure 1] A block diagram showing an example of the configuration of the information analysis support system of Embodiment 1 of the present invention.
[Figure 2] An explanatory diagram showing an example of basic information managed by the input information management unit of Embodiment 1 of the present invention.
[Figure 3] An explanatory diagram showing an example of health checkup information managed by the input information management unit of Embodiment 1 of the present invention.
[Figure 4] An explanatory diagram showing an example of medical history information managed by the input information management unit of Embodiment 1 of the present invention.
[Figure 5] An explanatory diagram showing an example of distribution information managed by the distribution information management unit of Embodiment 1 of the present invention.
[Figure 6] An explanatory diagram showing an example of relevance information managed by the relevance information management unit of Embodiment 1 of the present invention.
[Figure 7] An explanatory diagram showing an example of risk model parameter information managed by the risk model information management unit of Embodiment 1 of the present invention.
[Figure 8] An explanatory diagram showing an example of frequency determination definition information managed by the risk determination definition management unit of Embodiment 1 of the present invention.
[Figure 9] An explanatory diagram showing an example of risk determination definition information managed by the risk determination definition management unit of Embodiment 1 of the present invention.
[Figure 10] A flowchart showing an example of the processing performed by the information analysis support system of Embodiment 1 of the present invention.
[Figure 11] A flowchart showing an example of the interpolation process for missing values performed by the information analysis support system of Embodiment 1 of the present invention.
[Figure 12] A flowchart showing an example of the risk determination process performed by the information analysis support system of Embodiment 1 of the present invention.
[Figure 13] An explanatory diagram showing an example of the user interface displayed up to the point where the information analysis support system of Embodiment 1 of the present invention calculates the occurrence frequency.
[Figure 14] An explanatory diagram showing an example of the first user interface displayed when the information analysis support system of Embodiment 1 of the present invention performs the interpolation process.
[Figure 15] An explanatory diagram showing an example of the second user interface displayed when the information analysis support system of Embodiment 1 of the present invention performs the interpolation process.
[Figure 16] An explanatory diagram showing an example of the user interface that displays the result of risk determination by the information analysis support system of Embodiment 1 of the present invention.
[Figure 17] A block diagram showing an example of the configuration of the information analysis support system of Embodiment 2 of the present invention.
[Figure 18] A flowchart showing an example of the processing performed by the information analysis support system of Embodiment 2 of the present invention.
[Figure 19] A flowchart showing an example of the interpolation process performed by the information analysis support system of Embodiment 2 of the present invention.
[Figure 20] An explanatory diagram showing an example of the user interface displayed up to the interpolation process by the information analysis support system of Embodiment 2 of the present invention.
[Figure 21] An explanatory diagram showing an example of the user interface that displays the result of risk determination by the information analysis support system of Embodiment 2 of the present invention.

Mode for Carrying Out the Invention

[0010] Hereinafter, embodiments of the present invention will be described based on the drawings.

Embodiment

[0011] Figure 1 is a block diagram showing an example of the configuration of the information analysis support system 101 of Embodiment 1 of the present invention.

[0012] The information analysis support system 101 is a computer system that includes, for example, an input unit 102 such as a keyboard and mouse, an output unit 103 such as a display that outputs display data, a CPU (Central Processing Unit) 104, memory 105, a communication unit 108, and a storage medium 106.

[0013] The information analysis support system 101 includes an interpolation value distribution calculation unit 111, an occurrence frequency calculation unit 112, a relevance information extraction unit 113, a discretization processing unit 114, a risk value calculation unit 115, and a risk determination unit 116. The functions of each unit from the interpolation value distribution calculation unit 111 to the risk determination unit 116 are realized by the CPU 104 executing programs stored in the storage medium 106. When these programs are executed by the CPU 104, at least a portion of them may be copied to the memory 105 as needed.

[0014] The information analysis support system 101 is connected to a database 107. The database 107 includes an input information management unit 121, a distribution information management unit 122, a relevance information management unit 123, a risk model information management unit 124, and a risk determination definition management unit 125.

[0015] As described later, the input information management unit 121 manages basic information 200 (Figure 2), health checkup information 300 (Figure 3), and medical history information 400 (Figure 4). The distribution information management unit 122 manages distribution information 500 (Figure 5). The relevance information management unit 123 manages relevance information 600 (Figure 6). The risk model information management unit 124 manages risk model parameter information 700 (Figure 7). The risk determination definition management unit 125 manages frequency determination definition information 800 (Figure 8) and risk determination definition information 900 (Figure 9).

[0016] The database 107 may be stored, for example, in a storage system connected to the information analysis support system 101 via a network, or it may be built into the information analysis support system 101 (for example, by being stored in a storage medium 106). If the database 107 is stored in a system outside the information analysis support system 101, at least a portion of its contents may be copied to the storage medium 106 or memory 105 as needed. The entire system, including a computer having an input unit 102, an output unit 103, a CPU 104, memory 105, and a storage medium 106, and the database 107, may also be called the information analysis support system.

[0017] Furthermore, the information analysis support system 101 may be implemented by a single computer having the configuration shown in Figure 1, for example, but it may also be implemented by multiple computers. For example, the information held by the database 107 described above may be distributed and stored in multiple storage media 106 or memory 105, or the functions of the information analysis support system 101 described above may be distributed and executed by multiple CPUs 104 of multiple computers.

[0018] Figure 2 is an explanatory diagram showing an example of basic information 200 managed by the input information management unit 121 of Embodiment 1 of the present invention.

[0019] The basic information 200 is basic information about each individual. The following description uses an example in which the information analysis support system 101 assists an insurance provider in analyzing information to evaluate the risk of paying insurance claims for individuals who have applied for an insurance product. In this example, the basic information 200 is extracted from the information that each individual submitted to the insurance provider when applying for that product.

[0020] Specifically, the basic information 200 includes a personal ID 201 to identify each person, a gender 202 to identify each person's gender, a date of birth 203 to identify each person's date of birth, and an application acceptance date 204 to identify the date on which the application from each person was accepted. The basic information 200 described above is just an example, and the basic information 200 may include various other pieces of information about each person as needed.

[0021] Although this embodiment describes an example of risk analysis by an insurance provider as described above, the present invention is not limited to this and can also be applied, for example, to health guidance provided to residents by local governments.

[0022] Figure 3 is an explanatory diagram showing an example of health checkup information 300 managed by the input information management unit 121 of Embodiment 1 of the present invention.

[0023] Health checkup information 300 is information regarding the results of health checkups (health examinations) received by each individual, and may be information submitted by each individual as part of the information disclosed when applying for insurance. Health checkup information 300 includes personal ID 301, examination date 302, BMI 303, fasting blood glucose 304, HbA1c 305, interview results 306, and findings 307, etc.

[0024] Personal ID 301 is information that identifies each individual and corresponds to personal ID 201 in the basic information 200. Examination date 302 indicates the date the health examination was conducted. BMI 303, fasting blood glucose 304, and HbA1c 305 are examples of test values obtained as a result of the health examination. Interview results 306 are the results of the interview conducted during the health examination and may include information such as whether or not the person has a drinking habit or exercise habits. Findings 307 are information about matters that were pointed out as a result of the health examination.

[0025] The information described above is a typical example of information obtained as a result of a health checkup. In practice, the health checkup information 300 need not include all of these items (one or more of them may be absent), and it may include information on other items. Furthermore, in addition to health checkup results, the health checkup information 300 may include results of tests and interviews from past outpatient visits, hospitalizations, and the like.

[0026] Figure 4 is an explanatory diagram showing an example of medical history information 400 managed by the input information management unit 121 of Embodiment 1 of the present invention.

[0027] The medical history information 400 is information extracted from the declaration information that each person has submitted to the insurance provider when applying for insurance, and concerns illnesses and injuries (medical history) that each person has experienced in the past. For example, the content that each person entered in the item corresponding to medical history in the declaration information may be retained as the medical history information 400. The medical history information 400 includes the personal ID 401, the name of the illness or injury 402, the illness or injury code 403, the hospitalization period 404, the surgical information 405, and the medication 406.

[0028] Personal ID 401 is information that identifies each individual and corresponds to personal ID 201 in the basic information 200. The name of the illness or injury 402 and the illness or injury code 403 identify each individual's illness or injury. Hospitalization period 404 indicates whether each individual was hospitalized and, if so, the duration of hospitalization. Surgical information 405 indicates whether each individual underwent surgery and, if so, identifies the surgery. Medication 406 indicates whether each individual took medication and, if so, identifies the medication. For individuals with no medical history (in the examples in Figures 2 to 4, the individual with personal ID P004), the medical history information 400 contains no record.

[0029] In the examples shown in Figures 2 to 4, all items contain valid values. However, in reality, there may be items for which values cannot be obtained from the notification information, for reasons such as the measurement not being performed during the health checkup or omissions in the notification information. Furthermore, invalid values may be recorded. If a value for an input item of the risk assessment model cannot be obtained from the notification information, that item is treated as a missing value.
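As a minimal sketch of the last point, the following Python snippet lists the input items of a risk assessment model that have no usable value in a person's record. The function name, the item names, and the convention of representing an unavailable value as `None` or NaN are assumptions made for illustration; the specification does not prescribe a data representation.

```python
import math
from typing import Mapping, Optional, Sequence


def find_missing_items(record: Mapping[str, Optional[float]],
                       model_inputs: Sequence[str]) -> list[str]:
    """Return the model input items for which the record has no usable value."""
    missing = []
    for item in model_inputs:
        value = record.get(item)  # absent keys are treated like None
        if value is None or (isinstance(value, float) and math.isnan(value)):
            missing.append(item)
    return missing


# Hypothetical record: BMI was measured, fasting glucose was left blank,
# and HbA1c does not appear at all.
record = {"bmi": 22.4, "fasting_glucose": None}
print(find_missing_items(record, ["bmi", "fasting_glucose", "hba1c"]))
# → ['fasting_glucose', 'hba1c']
```

Detecting invalid (out-of-range) values would need item-specific validity rules, which are outside the scope of this sketch.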

[0030] Figure 5 is an explanatory diagram showing an example of distribution information 500 managed by the distribution information management unit 122 of Embodiment 1 of the present invention.

[0031] Distribution information 500 is information referenced to interpolate missing values in the notification information, and includes information on the frequency distribution of numerical values related to health status in a group of people. Distribution information 500 may be generated based on information on the health status of a group of people (hereinafter also referred to as reference information). This group should preferably include a sufficiently large number of people and need not include the person who is the subject of the risk assessment. The reference information may be, for example, information extracted from health checkup information and claims information provided from external sources, or it may be information on past policyholders held by the insurance provider.

[0032] The distribution information 500 shown in Figure 5 includes distribution ID 501, frequency 502, BMI 503, fasting blood glucose 504, and HbA1c 505, etc. Although omitted in Figure 5, the distribution information 500 may also include numerous other items such as systolic blood pressure, diastolic blood pressure, and triglycerides.

[0033] Distribution information 500 may include multiple distributions, each with a different discretization pattern (specifically, for example, the step size of the discretization). Distribution ID 501 is information that identifies each distribution. Frequency 502 indicates the frequency of occurrence of a numerical value, and may be, for example, the number of people corresponding to that value. BMI 503, fasting blood glucose 504, and HbA1c 505 indicate the categories of BMI, fasting blood glucose, and HbA1c values extracted from health checkup information, etc.

[0034] For example, the first row of distribution information 500 in Figure 5 shows that the frequency of occurrence (e.g., the number of people corresponding to these values) of the combination of values where BMI is between 18 and 18.9, fasting blood glucose is between 80 and 89, and HbA1c is between 5.0 and 5.1 is 12. The second row shows that the frequency of occurrence of the combination of values where BMI is between 18 and 18.9, fasting blood glucose is between 80 and 89, and HbA1c is between 5.2 and 5.3 is 22. These frequency distributions are identified by distribution ID "D001".

[0035] On the other hand, row 6 of distribution information 500 in Figure 5 shows that the frequency of occurrence for the combination of values where BMI is between 18 and 18.9, fasting blood glucose is between 80 and 100, and HbA1c is between 5.0 and 5.4 is 30. Row 7 shows that the frequency of occurrence for the combination of values where BMI is between 20 and 21.9, fasting blood glucose is between 80 and 100, and HbA1c is between 5.0 and 5.4 is 44. These frequency distributions are identified by distribution ID "D002".

[0036] In the example above, the distribution with distribution ID "D001" (hereinafter also referred to simply as distribution D001; the same applies to other distributions) shows the frequency of occurrence of combinations of values in each category when BMI values are divided into increments of "1", fasting blood glucose into increments of "10", HbA1c into increments of "0.1", and the values of other items (omitted in Figure 5) are also divided into specified increments. On the other hand, distribution D002 shows the frequency of occurrence of combinations of values in each category when BMI values are divided into increments of "2", fasting blood glucose into increments of "20", and HbA1c into increments of "0.5". These may be extracted by applying the respective increments to the same health checkup information, etc., from the same population.
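The idea of deriving several frequency distributions from the same reference data by changing the step size can be sketched in Python as follows. This is an illustrative sketch only: the function names, the dictionary layout, the sample records, and the choice to label each category by its lower bound are assumptions, not the structure of distribution information 500 itself.

```python
import math
from collections import Counter


def bin_lower_bound(value, step):
    """Lower bound of the category [k*step, (k+1)*step) that contains value."""
    # Rounding before floor guards against floating-point noise (e.g. 5.1 / 0.1).
    index = math.floor(round(value / step, 9))
    return round(index * step, 9)


def frequency_distribution(records, steps):
    """Count how many records fall into each combination of categories."""
    counts = Counter()
    for record in records:
        key = tuple(bin_lower_bound(record[item], step)
                    for item, step in steps.items())
        counts[key] += 1
    return dict(counts)


# Hypothetical reference records (not taken from Figure 5).
records = [
    {"bmi": 18.5, "fasting_glucose": 85, "hba1c": 5.1},
    {"bmi": 19.2, "fasting_glucose": 92, "hba1c": 5.3},
    {"bmi": 20.4, "fasting_glucose": 88, "hba1c": 5.2},
]

# Two discretization patterns applied to the same records.
fine_steps = {"bmi": 1, "fasting_glucose": 10, "hba1c": 0.1}
coarse_steps = {"bmi": 2, "fasting_glucose": 20, "hba1c": 0.5}

fine = frequency_distribution(records, fine_steps)
coarse = frequency_distribution(records, coarse_steps)
print(coarse)  # → {(18, 80, 5.0): 2, (20, 80, 5.0): 1}
```

With the finer steps each of the three sample records lands in its own category, while the coarser steps merge the first two. That trade-off between resolution and the number of people per category is one plausible reason to hold several discretization patterns.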

[0037] Figure 6 is an explanatory diagram showing an example of the relevance information 600 managed by the relevance information management unit 123 of Embodiment 1 of the present invention.

[0038] The relevance information 600 includes information indicating the degree of relevance between the values of the items in the health checkup information, etc., from which the frequency distributions of the distribution information 500 were extracted. For example, the correlation coefficient between the values of two items may be used as the degree of relevance. In the example in Figure 6, there is a strong correlation between systolic blood pressure and diastolic blood pressure, but almost no correlation between systolic blood pressure and fasting blood glucose. For example, the relevance information 600 may be generated by calculating the correlation between the values of the items in that same health checkup information, etc.

[0039] In addition to the simple correlation coefficient described above, nonlinear correlation or correlation using a model may also be used as a measure of relevance. A model-based method, for example, involves constructing a predictive model with systolic blood pressure as the dependent variable and other items as independent variables, and then evaluating the contribution of the independent variables.
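For the simple correlation coefficient case, a relevance table could be computed along the following lines. This is a sketch under stated assumptions: the function names, the column layout, and the sample values are hypothetical, and Pearson's coefficient is used only because the text mentions a simple correlation coefficient; the nonlinear and model-based measures are not shown.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def relevance_table(columns):
    """Correlation coefficient for every pair of items in the reference data."""
    return {(a, b): pearson(columns[a], columns[b])
            for a, b in combinations(columns, 2)}


# Hypothetical reference values, one list per item (same people, same order).
columns = {
    "systolic_bp": [110.0, 120.0, 130.0, 140.0, 150.0],
    "diastolic_bp": [70.0, 76.0, 84.0, 90.0, 98.0],
    "fasting_glucose": [95.0, 88.0, 102.0, 90.0, 96.0],
}

for pair, r in relevance_table(columns).items():
    print(pair, round(r, 2))
```

In this made-up data the two blood pressure items move together, so their coefficient is close to 1, which mirrors the kind of relationship described for Figure 6.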

[0040] Figure 7 is an explanatory diagram showing an example of risk model parameter information 700 managed by the risk model information management unit 124 of Embodiment 1 of the present invention.

[0041] The risk model parameter information 700 includes a risk assessment model ID 701 that identifies the risk assessment model, a model type 702 that indicates the type of model, and model parameters 703 that indicate the structure and parameters of the model.

[0042] In the example shown in Figure 7, parameters for a simple regression model with age as the explanatory variable and risk value as the dependent variable (Risk Assessment Model ID: 1), parameters for a multiple regression model with age, sex, and blood pressure as explanatory variables and risk value as the dependent variable (Risk Assessment Model ID: 2), and parameters for a logistic regression model with age, sex, and blood pressure as explanatory variables and risk value as the dependent variable (Risk Assessment Model ID: 3) are registered. These models are, for example, models maintained by healthcare providers for risk assessment, and their generation method is not limited. This information allows for trial assessments using multiple models and selection of the optimal model. Furthermore, an appropriate model can be selected according to the product characteristics.
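A registry of this kind, holding several model types that can be tried against the same input, might look like the sketch below. The dictionary layout, the feature names, and every coefficient are hypothetical values chosen for illustration; Figure 7 is not reproduced here and the actual parameter format is not specified in this passage.

```python
import math

# Hypothetical stand-in for risk model parameter information 700:
# model ID -> model type and parameters. All numbers are made up.
RISK_MODELS = {
    1: {"type": "simple_regression",
        "intercept": 0.2, "coef": {"age": 0.02}},
    2: {"type": "multiple_regression",
        "intercept": 0.1, "coef": {"age": 0.02, "sex": 0.1, "systolic_bp": 0.005}},
    3: {"type": "logistic_regression",
        "intercept": -6.0, "coef": {"age": 0.04, "sex": 0.3, "systolic_bp": 0.02}},
}


def risk_value(model_id, features):
    """Evaluate the registered model on one person's feature values."""
    model = RISK_MODELS[model_id]
    linear = model["intercept"] + sum(
        weight * features[name] for name, weight in model["coef"].items())
    if model["type"] == "logistic_regression":
        return 1.0 / (1.0 + math.exp(-linear))  # logistic link: value in (0, 1)
    return linear  # simple and multiple regression are both linear


person = {"age": 40, "sex": 1, "systolic_bp": 130}
for model_id in RISK_MODELS:
    print(model_id, round(risk_value(model_id, person), 3))
```

Keeping the model type and parameters as data, rather than hard-coding one formula, is what makes it straightforward to run trial assessments with several models and compare the results.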

[0043] Figure 8 is an explanatory diagram showing an example of frequency determination definition information 800 managed by the risk determination definition management unit 125 of Embodiment 1 of the present invention.

[0044] The frequency determination definition information 800 includes information that defines criteria for determining the frequency of occurrence (in other words, the rarity of the data) identified based on the distribution information 500. Specifically, the frequency determination definition information 800 includes a definition ID 801, a frequency determination target item 802, a frequency determination criterion 803, and a frequency determination result 804. The definition ID 801 is information that identifies each definition. The frequency determination target item 802 is information that identifies the value item to be used for determining the frequency of occurrence. The frequency determination criterion 803 indicates the criteria for determining the frequency of occurrence. The frequency determination result 804 indicates the determination result of the frequency of occurrence based on each criterion.

[0045] In this example, the first to third rows constitute one definition (definition RA001). Similarly, the fourth to sixth rows constitute another definition (definition RA002), and the seventh to ninth rows constitute yet another definition (definition RA003). The frequency determination target item 802 for definition RA001 is "all item combinations," that for definition RA002 is "all items," and that for definition RA003 is "BMI, fasting blood glucose, HbA1c." In definitions RA002 and RA003, the correspondence between the frequency determination criteria 803 and the frequency determination results 804 is the same as in definition RA001.

[0046] For example, definition RA003 indicates that, based on the reference information, the number of individuals whose BMI, fasting blood glucose, and HbA1c values are all smaller than those of the individual being risk-analyzed is tallied. If the proportion of these individuals to the total number of people (i.e., the cumulative relative frequency) is less than 5% or greater than 95%, the values of the individual being risk-analyzed are determined to be rare. If the proportion is 5% or greater and less than 10%, or 90% or greater and less than 95%, the values are determined to be somewhat rare. If the proportion is 10% or greater and less than 90%, the values are determined to be standard.

[0047] Similarly, definition RA001 indicates that the same assessment as in definition RA003 is performed on the combination of all items, not just BMI, fasting blood glucose, and HbA1c. On the other hand, definition RA002 indicates that each item is assessed individually: if the value of any one of the items is determined to be somewhat rare, the values of the person being analyzed are determined to be somewhat rare, and if the value of any one of the items is determined to be rare, the values of the person being analyzed are determined to be rare.
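The cumulative relative frequency rule described for definition RA003 can be sketched in Python as follows. The function names and sample data are hypothetical. One detail is an assumption: the text does not say how a proportion of exactly 95% is classified, so this sketch treats it as rare.

```python
def cumulative_relative_frequency(subject, reference, items):
    """Share of reference persons whose values are all smaller than the subject's."""
    below = sum(1 for person in reference
                if all(person[item] < subject[item] for item in items))
    return below / len(reference)


def judge_rarity(proportion):
    """Map a cumulative relative frequency to a frequency determination result."""
    # Assumption: exactly 0.95 is treated as rare (the text leaves it undefined).
    if proportion < 0.05 or proportion >= 0.95:
        return "rare"
    if proportion < 0.10 or proportion >= 0.90:
        return "somewhat rare"
    return "standard"


ITEMS = ("bmi", "fasting_glucose", "hba1c")

# Hypothetical reference information and subject.
reference = [
    {"bmi": 19.0, "fasting_glucose": 82, "hba1c": 5.0},
    {"bmi": 21.0, "fasting_glucose": 90, "hba1c": 5.2},
    {"bmi": 24.0, "fasting_glucose": 99, "hba1c": 5.6},
    {"bmi": 27.0, "fasting_glucose": 110, "hba1c": 6.0},
]
subject = {"bmi": 23.0, "fasting_glucose": 95, "hba1c": 5.4}

share = cumulative_relative_frequency(subject, reference, ITEMS)
print(share, judge_rarity(share))  # → 0.5 standard
```

A real reference population would be far larger than four people; the small list is only there to make the counting easy to follow by hand.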

[0048] Risk assessments for individuals with unusual values, such as those classified as rare or somewhat rare in frequency assessments, may not fit existing models or statistical data. Therefore, as described later, alerts are issued based solely on rarity (see Figure 16).

[0049] Figure 9 is an explanatory diagram showing an example of risk determination definition information 900 managed by the risk determination definition management unit 125 of Embodiment 1 of the present invention.

[0050] The risk determination definition information 900 includes information that defines criteria for determining the level of risk based on the distribution of risk values calculated using a risk assessment model. Specifically, the risk determination definition information 900 includes a definition ID 901, risk distribution criteria 902, and a risk determination result 903. The definition ID 901 is information that identifies each definition. The risk distribution criteria 902 indicate the criteria for determining the level of risk based on the distribution of risk values. The risk determination result 903 indicates the result of determining the level of risk based on each criterion.

[0051] For example, the first line of the risk determination definition information 900 shown in Figure 9 indicates that a high risk is determined if the maximum value of the calculated risk value distribution exceeds 2.0. The second line indicates that a high risk is determined if the mean value of the calculated risk value distribution exceeds 1.5. The third line indicates that a high risk is determined if the standard deviation of the calculated risk value exceeds 3.0. The fourth line indicates that a high risk is determined if the third quartile of the calculated risk value exceeds 2.0.
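The four criteria described for Figure 9 could be evaluated along the following lines. This is a sketch under stated assumptions: the dictionary keys are hypothetical labels (the actual definition IDs of Figure 9 are not given in the text), and the choice of the population standard deviation and of the "inclusive" quartile method are assumptions, since the text does not specify either.

```python
import statistics

# Hypothetical labels for the four rows described for Figure 9:
# (statistic of the risk value distribution, threshold).
CRITERIA = {
    "max": 2.0,
    "mean": 1.5,
    "stdev": 3.0,
    "q3": 2.0,
}


def distribution_statistic(risk_values, name):
    """Compute one summary statistic of the risk value distribution."""
    if name == "max":
        return max(risk_values)
    if name == "mean":
        return statistics.mean(risk_values)
    if name == "stdev":
        # Assumption: population standard deviation.
        return statistics.pstdev(risk_values)
    if name == "q3":
        # Assumption: quartiles computed with the "inclusive" method.
        return statistics.quantiles(risk_values, n=4, method="inclusive")[2]
    raise ValueError(f"unknown statistic: {name}")


def is_high_risk(risk_values, name):
    """True if the selected statistic exceeds its threshold."""
    return distribution_statistic(risk_values, name) > CRITERIA[name]
```

Because each criterion looks at a different aspect of the distribution, the same set of risk values may be judged high risk under one definition and not under another, which is why the definition is selectable.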

[0052] Because the distribution of calculated risk values may differ depending on the dependent variable being assessed and the nature of the model applied, multiple judgment criteria are provided in advance so that appropriate criteria can be set for each case.

[0053] Figure 10 is a flowchart showing an example of the process performed by the information analysis support system 101 of Embodiment 1 of the present invention.

[0054] Once processing begins (step 1001), the information analysis support system 101 reads the input information (step 1002). Here, the input information is, for example, notification information. For example, the information analysis support system 101 may read information about the person who is the target of risk analysis from among the basic information 200, health checkup information 300, and medical history information 400.

[0055] Next, the information analysis support system 101 performs processing settings (step 1003). For example, the information analysis support system 101 may set the risk assessment model to be used for risk analysis, the discretization step size when performing the discretization process described later (i.e., the step size of the divisions shown in Figure 5), and the definition for determining the rarity of the data.

[0056] Next, the information analysis support system 101 reads the reference information (step 1004). For example, the information analysis support system 101 may read the entirety of the basic information 200, health checkup information 300, and medical history information 400. This information includes at least information about several people other than the person being analyzed for risk, and may further include information about the person being analyzed for risk.

[0057] Next, the information analysis support system 101 performs discretization processing on the read reference information (step 1005). Specifically, the discretization processing unit 114 of the information analysis support system 101 discretizes the reference information read in step 1004 using the step size set in step 1003. As a result, the values contained in the reference information are divided into sections with the set step size (i.e., discretized).

[0058] Next, the information analysis support system 101 calculates the frequency of occurrence of the discretized reference information values (step 1006). Specifically, the frequency calculation unit 112 of the information analysis support system 101 counts the number of values corresponding to each category included in the reference information and calculates the frequency of occurrence of the values for each category based on that count. The distribution of the calculated frequency of occurrence is stored as distribution information 500.
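Steps 1005 and 1006 can be illustrated with a short sketch. The function names are hypothetical, and two points are assumptions not stated in the text: sections are aligned to multiples of the step size, and missing values (represented as `None`) are excluded from the frequency calculation.

```python
import math
from collections import Counter


def discretize(value, step):
    """Return the lower bound of the section of width `step` containing `value`.

    Assumption: sections start at multiples of the step size.
    """
    return math.floor(value / step) * step


def frequency_distribution(values, step):
    """Relative frequency of occurrence for each section (step 1006)."""
    present = [v for v in values if v is not None]  # skip missing values
    counts = Counter(discretize(v, step) for v in present)
    total = len(present)
    return {section: n / total for section, n in sorted(counts.items())}
```

With ages `[38, 41, 44, 47, 52, None]` and a step size of 5, the sections starting at 35, 40, 45, and 50 receive relative frequencies of 0.2, 0.4, 0.2, and 0.2.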

[0059] The information analysis support system 101 may pre-set various step sizes for each item and calculate the frequency distribution of values ​​for each item in the reference information, and store the results as distribution information 500. Figure 5 shows such an example. In that case, in step 1006, the distribution corresponding to the specified step size may be read from the distribution information 500. Alternatively, in the processing settings of step 1003, a distribution to be used for interpolation processing may be specified from the distributions included in the distribution information 500, and in step 1006, that specified distribution may be read.

[0060] Furthermore, in step 1006, the information analysis support system 101 may calculate the missing value rate for each item of the input information, and the cumulative relative frequency in the reference information. These will be described later with reference to Figure 13.

[0061] Next, the information analysis support system 101 performs interpolation processing (step 1007). Details of the interpolation processing will be described later (see Figure 11).

[0062] Next, the information analysis support system 101 performs a risk assessment (step 1008). Details of the risk assessment will be described later (see Figure 12).

[0063] This concludes the processing of the information analysis support system 101.

[0064] Figure 11 is a flowchart showing an example of the interpolation process for missing values ​​performed by the information analysis support system 101 of Embodiment 1 of the present invention.

[0065] The process shown in Figure 11 is performed in step 1007 of Figure 10. Once the process begins (step 1101), the information analysis support system 101 extracts missing items (step 1102). A missing item is an input item of the risk assessment model in use for which the input information contains no corresponding value; in other words, an item whose value could not be obtained from the notification information of the person being analyzed for risk.

[0066] Next, the information analysis support system 101 determines whether any missing items were extracted in step 1102 (step 1103). If missing items have been extracted (step 1103: Yes), the information analysis support system 101 performs correlation analysis of reference information (step 1104) and extracts items with a high correlation to the missing items (step 1105). For example, the information analysis support system 101 may refer to the relevance information 600 and obtain a predetermined number of items with a high degree of relevance to the missing items.

[0067] Next, the information analysis support system 101 obtains from the input information the values of the items extracted in step 1105, and extracts from the reference information the data of individuals whose values for those items fall in the same categories (step 1106). The data extracted here will also be referred to as the similar group. Details of the extraction of the similar group will be described later.

[0068] Next, the information analysis support system 101 determines whether the number of data points for individuals extracted in step 1106 (i.e., the number of relevant individuals) is greater than or equal to a predetermined number (step 1107). If the number of data points for individuals extracted is less than the predetermined number (step 1107: No), the information analysis support system 101 removes at least one item from the items referenced during extraction (step 1108), and then repeats step 1106 for the values of the remaining items.

[0069] If the number of data points for individuals extracted in step 1106 is greater than or equal to the predetermined number (step 1107: Yes), the information analysis support system 101 calculates interpolated data (step 1109) and stores the results (step 1110). At this time, the information analysis support system 101 calculates multiple interpolated data based on the distribution of values in the reference information, as will be described later.

[0070] This completes the interpolation process for missing values (step 1111). If it is determined that no missing items were extracted in step 1102 (step 1103: No), the information analysis support system 101 terminates the interpolation process for missing values without executing steps 1104 to 1110.
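The loop formed by steps 1106 to 1108 can be sketched as follows. This is an illustration only: the function and parameter names are hypothetical, "same category" is assumed to mean "same section under a per-item step size", and the item removed on each retry is assumed to be the least relevant one remaining.

```python
import math


def same_section(a, b, step):
    """True if both values fall in the same discretization section."""
    return math.floor(a / step) == math.floor(b / step)


def extract_similar_group(target, reference, items, steps, min_count):
    """Extract the similar group, relaxing the match until it is large enough.

    `items` is assumed to be ordered from most to least relevant to the
    missing item. Returns the group and the items actually used.
    """
    items = list(items)
    while True:
        # Step 1106: keep individuals matching the target on every item.
        group = [
            person for person in reference
            if all(
                person.get(item) is not None
                and same_section(person[item], target[item], steps[item])
                for item in items
            )
        ]
        # Step 1107: stop when the group is large enough (or nothing is left to drop).
        if len(group) >= min_count or not items:
            return group, items
        # Step 1108: drop the least relevant remaining item and retry.
        items.pop()
```

If every item has been dropped, the sketch falls back to the whole reference population; the source does not describe this case, so that behavior is an assumption of the sketch.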

[0071] Figure 12 is a flowchart showing an example of the risk determination process performed by the information analysis support system 101 of Embodiment 1 of the present invention.

[0072] The process shown in Figure 12 is performed in step 1008 of Figure 10. When the process starts (step 1201), the information analysis support system 101 reads the risk assessment model (step 1202). For example, the parameters of the specified risk assessment model are read from the risk model parameter information 700. Next, the information analysis support system 101 reads the data that has undergone the interpolation process in step 1007 of Figure 10 (step 1203).

[0073] Next, the information analysis support system 101 calculates risk values by applying the risk assessment model read in step 1202 to the data read in step 1203 (step 1204). If one or more missing items were interpolated by the interpolation process, multiple candidate values are available for each missing item, and a distribution of risk values is calculated from these candidates.
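Step 1204 amounts to evaluating the risk model once per interpolated record. The sketch below shows this with a purely hypothetical log-linear model; the model form, the coefficient and baseline names, and all numbers are assumptions for illustration and are not taken from the risk model parameter information 700.

```python
import math


def make_log_linear_model(coefficients, baselines):
    """Build a hypothetical risk model: exp(sum(coef * (value - baseline)))."""
    def model(record):
        score = sum(
            coef * (record[item] - baselines[item])
            for item, coef in coefficients.items()
        )
        return math.exp(score)
    return model


def risk_distribution(records, model):
    """Apply the model to every interpolated record (step 1204)."""
    return [model(record) for record in records]


# Usage with made-up parameters: one risk value per interpolated record.
model = make_log_linear_model({"hba1c": 0.5}, {"hba1c": 5.0})
values = risk_distribution([{"hba1c": 5.0}, {"hba1c": 7.0}], model)
```

The resulting list is the distribution of risk values that the risk determination in step 1205 then evaluates.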

[0074] Next, the information analysis support system 101 performs a risk assessment based on the risk value calculated in step 1204 and outputs the result (step 1205).

[0075] This completes the risk assessment process (step 1206).

[0076] Next, an example of the user interface provided by the information analysis support system 101 when executing the processes shown in Figures 10 to 12 will be explained with reference to Figures 13 to 16.

[0077] Figure 13 is an explanatory diagram showing an example of a user interface displayed during the process of calculating the frequency of occurrence in the information analysis support system 101 of Embodiment 1 of the present invention.

[0078] The risk analysis execution screen 1300 shown in Figure 13 is an example of the display data output by the information analysis support system 101, and includes, for example, an input information display unit 1301 and a statistical calculation result display unit 1302. The information displayed on the risk analysis execution screen 1300 and the operation of the screen will be explained below with reference to Figures 13 and 10.

[0079] The input information display unit 1301 includes multiple input data fields 1303, a setting input field 1304, and a processing execution button 1305. The input data fields 1303 display fields for data identifying the person subject to risk analysis (person ID) and fields for input data of the risk assessment model. For example, when a user enters the identification information of the person subject to risk analysis into the corresponding data field, the values ​​of the input data of the risk assessment model obtained from the person's notification information (for example, the information read in step 1002) are displayed in the corresponding fields. For example, data previously obtained from the person's notification information and held in the information analysis support system 101 may be displayed in each field, or the user may directly enter data obtained from the notification information into each field.

[0080] The setting input field 1304 is used to input information indicating the settings to be applied to the risk analysis (step 1003). These settings may include, for example, information specifying the risk assessment model to be used for the risk analysis, information specifying the distribution of reference information to be used for interpolation processing, and information specifying the definition when determining the rarity of the input data. An input data field 1303 corresponding to each item of the input information of the specified risk assessment model is displayed, and the values ​​of each item obtained from the disclosure information of the person subject to risk analysis are displayed in each input data field 1303. If there are any items for which a value could not be obtained from the disclosure information, these become missing items. In the example in Figure 13, HbA1c and HDL are missing items.

[0081] When the user operates the execution button 1305, the discretization process of reference information (step 1005) and frequency calculation (step 1006) are performed for the input information items other than missing items, and the results are displayed in the statistical calculation result display unit 1302.

[0082] In Figure 13, the statistical calculation result display unit 1302 displays Table 1306 and Table 1307 as the statistical calculation results.

[0083] Table 1306 includes data items 1306A, input values ​​1306B, missing data rate 1306C, and cumulative relative frequency 1306D. Data item 1306A displays all data items except the subject ID from the data items displayed in input data field 1303. Input values ​​1306B display the input values ​​for each data item, i.e., the values ​​of each data item displayed in input data field 1303. Input values ​​1306B for missing items may be left blank. Missing data rate 1306C and cumulative relative frequency 1306D display the missing data rate and cumulative relative frequency in the reference information for the value of each data item, respectively.

[0084] Here, we will describe some examples of the data shown in Table 1306 in Figure 13.

[0085] The input value 1306B, missing value rate 1306C, and cumulative relative frequency 1306D corresponding to the value "age" in data item 1306A are "41", "0.0", and "40.5", respectively. Of these, "41" indicates that the age obtained from the disclosure information of the person subject to risk analysis is 41 years old. "0.0" indicates that the missing value rate for items indicating age in the reference information is 0%, meaning that the age value is included in the data of all persons included in the reference information. "40.5" indicates that the cumulative relative frequency of the age "41 years old" in the reference information is 40.5%, meaning that the proportion of persons aged 41 or younger among the persons included in the reference information is 40.5%.

[0086] The input value 1306B, missing value rate 1306C, and cumulative relative frequency 1306D corresponding to the value "Gender" in data item 1306A are "Male," "0.0," and "64.3," respectively. Of these, "Male" indicates that the gender obtained from the notification information of the person subject to risk analysis is male. "0.0" indicates that the missing value rate for items indicating gender in the reference information is 0%. "64.3" indicates that the cumulative relative frequency of the gender "Male" in the reference information is 64.3%, that is, the proportion of men among the people included in the reference information is 64.3%.

[0087] The input values ​​1306B, the missing value rate 1306C, and the cumulative relative frequency 1306D corresponding to the value "systolic blood pressure" in data item 1306A are "140," "1.5," and "78.1," respectively. Of these, "140" indicates that the systolic blood pressure value obtained from the disclosure information of the person being analyzed for risk is 140. "1.5" indicates that the missing value rate for systolic blood pressure in the reference information is 1.5% (i.e., the data for systolic blood pressure is not included in the data of 1.5% of all people included in the reference information). "78.1" indicates that the cumulative relative frequency of systolic blood pressure of "140" in the reference information is 78.1%, meaning that 78.1% of the people included in the reference information have a systolic blood pressure of 140 or less.

[0088] The input value 1306B, missing value rate 1306C, and cumulative relative frequency 1306D corresponding to the value "HbA1c" in data item 1306A are "-", "45.4", and "-", respectively. The two "-" entries indicate that the HbA1c value for the person subject to risk analysis could not be obtained from the notification information, and therefore the cumulative relative frequency could not be calculated. "45.4" indicates that the missing value rate for HbA1c in the reference information is 45.4%.
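The two statistics shown in Table 1306 can be computed as in the following sketch. The function names are hypothetical, missing values are represented as `None`, and the denominator of the cumulative relative frequency is assumed to exclude individuals whose value is missing, which the text does not state.

```python
def missing_rate_pct(values):
    """Percentage of individuals in the reference information with no value."""
    return 100.0 * sum(v is None for v in values) / len(values)


def cumulative_relative_frequency_pct(values, x):
    """Percentage of individuals whose value is less than or equal to x.

    Assumption: individuals with a missing value are excluded from the count.
    """
    present = [v for v in values if v is not None]
    return 100.0 * sum(v <= x for v in present) / len(present)
```

For a toy reference list `[110, 120, 140, 150, None]`, the missing value rate is 20.0 and the cumulative relative frequency of the value 140 is 75.0.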

[0089] Table 1307 contains information showing the results of determining how rare (i.e., rarity) the input data of the person subject to risk analysis is compared to reference information, based on the information in Table 1306. Specifically, Table 1307 includes frequency determination items 1307A, cumulative relative frequencies 1307B, and rarity determination 1307C. Frequency determination items 1307A are information that specifies the data items used to determine rarity. This may correspond to the definition of frequency determination definition information 800 shown in Figure 8. Cumulative relative frequencies 1307B show the cumulative relative frequencies of the input data of the person subject to risk analysis, calculated based on the data items specified by frequency determination items 1307A. Rarity determination 1307C shows the rarity determined based on the calculation result of cumulative relative frequencies 1307B.

[0090] Figure 13 shows an example of determining the rarity of the values ​​of each data item shown in Table 1306 based on the definition with definition ID "RA001" from the frequency determination definition information 800 shown in Figure 8. In this example, the frequency determination target item 1307A is "all item combinations", the cumulative relative frequency 1307B is "8.2%", and the rarity determination 1307C is "somewhat rare".

[0091] Figure 14 is an explanatory diagram showing an example of a first user interface displayed when the information analysis support system 101 of Embodiment 1 of the present invention performs interpolation processing.

[0092] The risk analysis execution screen 1400 shown in Figure 14 is an example of the display data output by the information analysis support system 101, and includes, for example, a setting information display unit 1401 and an interpolation processing result display unit 1402. The information displayed on the risk analysis execution screen 1400 and the operation of the screen will be explained below with reference to Figures 14, 10, and 11.

[0093] The setting information display unit 1401 includes a target person ID field 1403, an interpolation target item setting field 1404, an interpolation processing setting field 1405, and an interpolation processing execution button 1406. The target person ID field 1403 displays information identifying the person being analyzed for risk. This corresponds to the target person ID field in the input data field 1303.

[0094] The interpolation target item setting field 1404 displays the data items to be interpolated, i.e., missing items. For example, if HbA1c and HDL are missing items as shown in Figure 13, they will be displayed as interpolation target items in the interpolation target item setting field 1404. Alternatively, the user may read the missing items from the information shown in Figure 13 and input them as interpolation target items in the interpolation target item setting field 1404.

[0095] The interpolation processing setting field 1405 displays the settings that will be applied to the interpolation process to be executed. For example, the user may use the interpolation processing setting field 1405 to set the similarity group extraction method, minimum number of items, interpolation index, etc.

[0096] The similarity group extraction method is a setting for extracting information from reference information to fill in missing items. For example, information similar to the information of the person being analyzed for risk is extracted from the reference information to fill in missing items. The group of information extracted in this way is also called a similarity group. A method for extracting this similarity group can be set. For example, if a method is set to extract information in order of relevance based on correlation coefficients, then data with one or more items that have a high relevance to the item to be filled in will be extracted from the reference information. This corresponds to the processing in steps 1104 to 1106 in Figure 11.

[0097] The minimum number of items specifies a lower limit on the number of data points (in this embodiment, the number of people) included in the similar group. If the similar group is small, bias in the data used for interpolation may prevent proper interpolation. For this reason, a lower limit (e.g., 100 items) can be set for the number of items included in the similar group in order to ensure accurate interpolation. This lower limit is referenced in the determination in step 1107 of Figure 11.

[0098] The interpolation index specifies the statistic used to interpolate missing items based on the data included in the similar group. For example, missing items can be interpolated using at least one of the following values computed from the data in the similar group: mean, median, quartiles, maximum, minimum, etc.

[0099] When a user enters their desired settings in the interpolation processing settings field 1405 and operates the interpolation processing execution button 1406, interpolation processing is performed on the items to be interpolated, as set in the interpolation target item settings field 1404, according to the settings entered in the interpolation processing settings field 1405 (step 1007), and the result is displayed in the interpolation processing result display unit 1402.

[0100] In the example in Figure 14, the interpolation results for HbA1c and HDL, which are the items to be interpolated, are displayed as Processing Result 1 and Processing Result 2, respectively. Each processing result displays the items to be processed, the items extracted from the similar group, the number of extracted similar groups, and the values ​​of each interpolation candidate.

[0101] In the example in Figure 14, HbA1c, the first interpolation target item, is displayed as the processing target item for processing result 1. Fasting blood glucose, age, and BMI are displayed as similar group extraction items. This indicates that in step 1105 of Figure 11, fasting blood glucose, age, and BMI, which have a high correlation with HbA1c, were selected as data items to be referenced for similar group extraction based on the correlation information 600. Also, 2022 is displayed as the number of extracted similar groups. This indicates that 2022 individuals similar to the person being analyzed for risk were extracted from the reference information in terms of fasting blood glucose, age, and BMI, and the data of these individuals was extracted as a similar group.

[0102] The extraction of similar groups in the above example is explained here. As described above, if the similar group extraction items corresponding to the HbA1c item being processed are fasting blood glucose, age, and BMI, then, based on the reference information, information on individuals whose fasting blood glucose is similar to the input data value "101", whose age is similar to the input data value "41", and whose BMI is similar to the input data value "20.2" will be extracted as a similar group. Here, whether the values of each item are similar may be determined based on whether the values belong to the same category in the distribution information used.

[0103] As described above, by extracting data from individuals whose data has a high degree of relevance (e.g., correlation) with the values ​​of the items to be interpolated, and then performing interpolation based on this similarity group, highly accurate interpolation can be achieved.

[0104] If the number of extracted individuals is less than the minimum number set in the interpolation processing setting field 1405 (100 in the example in Figure 14), the information analysis support system 101 will remove one of the similar group extraction items (for example, BMI, which has the lowest correlation among fasting blood glucose, age, and BMI) from the similar group extraction items and extract the similar group again (steps 1107: No, 1108, and 1106). Alternatively, instead of reducing the number of similar group extraction items (or in addition to reducing them), a distribution with a larger discretization step size may be applied by referring to the distribution information 500. This ensures that a sufficient number of data is available for interpolation and reduces the impact of bias due to a small amount of data.

[0105] Furthermore, the values of each interpolation candidate in Processing Result 1 are displayed, specifically the values of Candidate 1 to Candidate 5. In the example in Figure 14, Candidates 1 to 5 are displayed as the minimum, first quartile, median, third quartile, and maximum HbA1c values of the similar group, respectively. These are the values of the indices set as interpolation indices in the interpolation processing setting field 1405.
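The five candidates described above correspond to a five-number summary of the similar group. A minimal sketch follows; the function name is hypothetical and the quartile method ("inclusive") is an assumption, since the text does not say how quartiles are computed.

```python
import statistics


def interpolation_candidates(group_values):
    """Return [minimum, first quartile, median, third quartile, maximum].

    Missing values (None) in the similar group are ignored.
    """
    values = sorted(v for v in group_values if v is not None)
    # Assumption: quartiles use the "inclusive" method of statistics.quantiles.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [values[0], q1, median, q3, values[-1]]
```

Each returned value is one interpolation candidate, so a missing item is represented by five possible values rather than a single imputed value.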

[0106] In the example in Figure 14, HDL, the second interpolation target item, is displayed as the processing target item for Processing Result 2. Triglycerides, LDL, and gender, which have a high correlation with HDL, are displayed as similar group extraction items. In this case as well, similar groups are extracted in the same way as HbA1c described above. As a result, in the example in Figure 14, 1051 items are displayed as the number of extracted similar groups. The minimum value, first quartile, median, third quartile, and maximum value of HDL for the similar groups are displayed as Candidate 1 to Candidate 5 values ​​in Processing Result 2.

[0107] As described above, in the example in Figure 14, five interpolation candidates are calculated for each of the two items to be interpolated. Therefore, there are 5 × 5 = 25 possible combinations of interpolation candidate values. These 25 combinations may be displayed as shown in Figure 15.

[0108] The interpolation method is not limited to the above; for example, interpolation using a model generated by machine learning may be used. In that case, depending on the algorithm used, processes such as correlation analysis (step 1104) and extraction of identically categorized data (step 1106) may be incorporated into the model. Even when using such a model, multiple possible values ​​for the item to be interpolated are calculated as interpolation candidates based on the distribution of values ​​extracted from the reference information. In any method adopted, the item to be interpolated is interpolated not as a single value, but as a distribution of possible values.

[0109] Whichever of these methods is used, the same effect is obtained as long as the item is interpolated as a distribution of possible values rather than as a single value.

[0110] Figure 15 is an explanatory diagram showing an example of a second user interface displayed when the information analysis support system 101 of Embodiment 1 of the present invention performs interpolation processing.

[0111] The risk analysis execution screen 1500 shown in Figure 15 is an example of display data output by the information analysis support system 101, and includes, for example, an input information display unit 1501 and an interpolation result display unit 1502. The input information display unit 1501 includes multiple input data fields 1503 and an interpolation processing execution button 1504. The input data fields 1503 are the same as the input data fields 1303 of the input information display unit 1301 shown in Figure 13. When the user operates the interpolation processing execution button 1504, the interpolation processing is executed, and the result is displayed in the interpolation result display unit 1502.

[0112] The interpolation result display unit 1502 displays the results of interpolating missing items using interpolation candidates. As shown in Figure 13, two items, HbA1c and HDL, are missing, and five interpolation candidates are calculated for each, as shown in Figure 14. In this case, 25 interpolation results corresponding to the 25 combinations of interpolation candidates are displayed.

[0113] Specifically, in the interpolation results for interpolation data IDs "1" to "5" in the interpolation result display unit 1502, the HDL value is "20" for candidate 1, the HbA1c values are "4.0" to "7.6" for candidates 1 to 5 respectively, and the values of the other items are the same as those displayed in the input data field 1503. The interpolation results for interpolation data IDs "6" to "10" are the same as the interpolation results for interpolation data IDs "1" to "5", except that the HDL value is "31" for candidate 2. The interpolation results for interpolation data IDs "11" to "15" are the same as the interpolation results for interpolation data IDs "1" to "5", except that the HDL value is "55" for candidate 3. The interpolation results for interpolation data IDs "16" to "20" are the same as the interpolation results for interpolation data IDs "1" to "5", except that the HDL value is "70" for candidate 4. The interpolation results for interpolation data IDs "21" to "25" are the same as the interpolation results for interpolation data IDs "1" to "5", except that the HDL value is "100" for candidate 5.
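The 25 interpolation results can be generated as a Cartesian product of the candidates. In the sketch below, the HDL candidates and the first and last HbA1c candidates come from the description above; the three middle HbA1c values, the field names, and the function name are placeholders assumed for illustration only.

```python
from itertools import product

HDL_CANDIDATES = [20, 31, 55, 70, 100]
# Only 4.0 and 7.6 are stated in the text; the middle values are placeholders.
HBA1C_CANDIDATES = [4.0, 5.0, 5.6, 6.2, 7.6]


def build_interpolated_records(base, candidates):
    """Create one record per combination of candidate values.

    `candidates` maps each missing item to its candidate list. The last item
    varies fastest, so listing HDL first reproduces the ID order in the text.
    """
    names = list(candidates)
    records = []
    combos = product(*(candidates[name] for name in names))
    for data_id, combo in enumerate(combos, start=1):
        record = dict(base)                # copy the non-missing input values
        record.update(zip(names, combo))   # fill in the missing items
        record["interpolation_data_id"] = data_id
        records.append(record)
    return records


records = build_interpolated_records(
    {"age": 41, "systolic_bp": 140},
    {"hdl": HDL_CANDIDATES, "hba1c": HBA1C_CANDIDATES},
)
```

With two missing items and five candidates each, `records` holds 25 entries, and IDs 1 to 5 share the first HDL candidate while HbA1c runs through its five candidates.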

[0114] Figure 16 is an explanatory diagram showing an example of a user interface for displaying the risk assessment results of the information analysis support system 101 of Embodiment 1 of the present invention.

[0115] The risk analysis execution screen 1600 shown in Figure 16 is an example of display data output by the information analysis support system 101, and includes, for example, an input information display unit 1601 and a result display unit 1602.

[0116] The input information display unit 1601 includes a subject ID field 1603, a risk assessment definition setting field 1604, and a risk assessment execution button 1605. The subject ID field 1603 displays information identifying the person being analyzed for risk. This corresponds to the subject ID field in the input data field 1303. The risk assessment definition setting field 1604 sets criteria for determining the level of risk based on the distribution of risk values ​​calculated using the risk assessment model. For example, the user may select one of the multiple definitions shown in Figure 9 and set it in the risk assessment definition setting field 1604. When the user operates the risk assessment execution button 1605, a risk assessment is performed based on the data of the person being analyzed for risk, after missing items have been interpolated, and the result is displayed in the result display unit 1602.

[0117] The results display unit 1602 includes a risk judgment result display unit 1606, an overall judgment result display unit 1607, and a risk distribution display unit 1608.

[0118] The risk distribution display unit 1608 shows the distribution of risk values calculated by inputting the interpolated data into the risk assessment model. As shown in Figure 15, if 25 different data sets are generated by interpolation, 25 different risk values can be obtained by inputting these data into the risk assessment model. The frequency distribution of these risk values is displayed in the risk distribution display unit 1608. In the example in Figure 16, some of the 25 risk values are judged as high risk based on the risk judgment criterion value. This risk judgment criterion value is a standard based on the definition set in the risk assessment definition setting field 1604.

[0119] The risk assessment result display unit 1606 displays the results of the risk assessment based on the distribution of risk values, etc. In the example in Figure 16, the assessment results for two assessment items are displayed: whether it is a rare case and whether there is a possibility of high risk. For example, as shown in Table 1307 in Figure 13, if the input information of the person subject to risk analysis is determined to be somewhat rare, it may be determined to be a rare case. Also, as shown in the risk distribution display unit 1608, if the calculated risk value includes a value that is determined to be high risk, it may be determined to be a possibility of high risk.

[0120] The overall judgment result display unit 1607 displays the overall judgment result based on the risk judgment result shown in the risk judgment result display unit 1606. In the example in Figure 16, as described above, the judgment result falls under the category of a rare case and has the potential to be high-risk, so it is judged as "requires confirmation," which requires manual verification when determining whether the person subject to the risk judgment is eligible to join. Note that this judgment criterion is just one example, and various judgment criteria can actually be set.
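One possible form of the judgment criterion in the Figure 16 example may be sketched as follows. Only the combination "rare case and possibility of high risk leads to requires confirmation" is taken from the description above; the function names, the label returned in all other cases, and the comparison against the criterion are hypothetical.

```python
def may_be_high_risk(risk_values, criterion):
    """True if at least one calculated risk value reaches the criterion (assumed comparison)."""
    return any(v >= criterion for v in risk_values)


def overall_judgment(is_rare_case, risk_values, criterion):
    """Combine the two assessment items into an overall judgment."""
    if is_rare_case and may_be_high_risk(risk_values, criterion):
        return "requires confirmation"
    # Hypothetical label: the description does not name the other outcomes.
    return "no confirmation required"
```

As noted above, this is only one example, and other combinations of assessment items may be mapped to other overall judgments.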

[0121] The functions of the information analysis support system 101 in Embodiment 1 described above may also be provided via an API (Application Programming Interface). For example, when the information analysis support system 101 receives notification information via a network (not shown) connected to the communication unit 108, it may store it in the database 107, execute the processes shown in Figures 10 to 12, and output the data necessary to display the information shown in Figures 13 to 16 via the network. The same applies to the functions of Embodiment 2 described later.

[0122] According to Example 1 described above, when the values of items in the notification information are unavailable, the values of the missing items can be interpolated based on information obtained from a large number of people. The missing values can be interpolated with high accuracy by referring to the information of people whose values are similar for items that are highly related to the missing items. Inputting the values interpolated in this way into the risk assessment model allows the risk to be estimated with higher accuracy than when the missing items are not interpolated. Furthermore, by calculating multiple possible values for each missing item and performing a risk assessment for each of them, multiple risk values can be estimated, and the likelihood and the extent to which the risk value exceeds a predetermined standard can be evaluated.

[0123] In Example 1, the process of filling in the values of missing items is described as "interpolation." This is an example of filling in values when an item has no value, or when a value exists but is unreliable for some reason, and it can also be described as "correction." The process of correcting an item when a value exists but is abnormal is described in Example 2.

[Examples]

[0124] Next, Embodiment 2 of the present invention will be described. Except for the differences described below, each part of the system in Embodiment 2 has the same function as each part of Embodiment 1 shown in Figures 1 to 16, which are denoted by the same reference numerals, so their descriptions will be omitted.

[0125] In Example 1, input items with no values (i.e., missing data items), for example because they were not measured during a health checkup, are interpolated. In contrast, in Example 2, items with abnormal values are corrected. Here, an abnormal value does not mean a value that was correctly measured and recorded during a health checkup and happens to fall outside the normal range; it means a value that could not be obtained even if the subject had a health problem. Such abnormal values may be entered, for example, through errors in recording measurement results or errors in reading measurement results with an optical character recognition (OCR) device.

[0126] Figure 17 is a block diagram showing an example of the configuration of the information analysis support system 101 in Embodiment 2 of the present invention.

[0127] In the information analysis support system 101 of Embodiment 2, the storage medium 106 further includes an input value correction processing unit 117. The function of the input value correction processing unit 117 is realized by the CPU 104 executing a program stored in the storage medium 106, similar to the functions of the other parts.

[0128] Figure 18 is a flowchart showing an example of the processing performed by the information analysis support system 101 of Embodiment 2 of the present invention.

[0129] The processing performed by the information analysis support system 101 in Example 2 is the same as that of the information analysis support system 101 in Example 1 shown in Figure 10, except that the interpolation process (step 1801) and input value correction process (step 1802) are performed after the frequency calculation process (step 1006), followed by the risk determination process (step 1008).

[0130] Figure 19 is a flowchart showing an example of interpolation processing performed by the information analysis support system 101 of Embodiment 2 of the present invention.

[0131] The process shown in Figure 19 is performed in step 1801 of Figure 18. The interpolation process in Example 2 is basically the same as the interpolation process in Example 1 shown in Figure 11. However, whereas in Example 1 interpolation was performed only on missing items, in Example 2 it is performed regardless of whether an item is missing. For this reason, the interpolation process in Example 2 shown in Figure 19 does not include step 1103, and the processes from step 1101 to step 1111 are executed sequentially on all input items. Alternatively, the process shown in Figure 19 may be performed only on input items for which the cumulative relative frequency of the value satisfies a predetermined condition.

[0132] Figure 20 is an explanatory diagram showing an example of the user interface displayed in the information analysis support system 101 of Embodiment 2 of the present invention during the processing up to the interpolation process.

[0133] The risk analysis execution screen 2000 shown in Figure 20 is an example of display data output by the information analysis support system 101, and includes, for example, an input information display unit 2001 and an input information correction result display unit 2002.

[0134] The input information display unit 2001 includes multiple input data fields 2003 and a processing execution button 2004. The input data fields 2003 display fields for data identifying the person subject to risk analysis (subject ID) and fields for input data of the risk assessment model. These are the same as the multiple input data fields 1303 shown in Figure 13. However, there are no missing items in the example in Figure 20. Also, while an HbA1c value of around 4-6% is generally considered normal, and a value exceeding 6.5% strongly suggests diabetes, an extremely high value of 80% is entered in the example in Figure 20.

[0135] When the user operates the process execution button 2004, the following processes are performed for the input information items other than missing items: discretization of reference information (step 1005), frequency calculation (step 1006), interpolation (step 1801), and input value correction (step 1802). The results are then displayed in the input information correction result display unit 2002.

[0136] In Figure 20, the input information correction result display unit 2002 shows Table 2005 as the result of calculating statistical quantities.

[0137] Table 2005 includes data items 2005A, input values 2005B, missing data rate 2005C, cumulative relative frequency 2005D, correction value candidates 2005E, difference 2005F, and abnormality determination 2005G. Of these, data items 2005A, input values 2005B, missing data rate 2005C, and cumulative relative frequency 2005D are the same as data items 1306A, input values 1306B, missing data rate 1306C, and cumulative relative frequency 1306D in Table 1306 shown in Figure 13.

[0138] The correction value candidate 2005E is a candidate correction value calculated by interpolation (step 1801) and input value correction (step 1802). The difference 2005F is the difference between the input value 2005B and the correction value candidate 2005E calculated for the item that has been corrected. The abnormality determination 2005G is the result of determining whether the input value 2005B is abnormal.

[0139] For example, among the input data items shown in Figure 20, the HbA1c value is, as mentioned above, an extremely high value far removed from the range of normally expected values. Therefore, the cumulative relative frequency of this value is 100%. The information analysis support system 101 calculates the correction value candidate 2005E by performing the processing shown in Figure 19 on HbA1c. This processing is performed in the same way as in Example 1 when HbA1c was a missing item. For example, the correction value candidate 2005E may be calculated for items with extremely small or extremely large cumulative relative frequencies, such as those with a cumulative relative frequency of less than 1% or greater than 99%. Alternatively, the correction value candidate 2005E may be calculated for all items and all of them may be displayed along with their cumulative relative frequencies, or the correction value candidate 2005E may be displayed only for items with extremely large or small cumulative relative frequencies.
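The selection of items for which a correction value candidate is calculated may be sketched as follows. The 1% and 99% bounds are the examples given above; the function names, the "less than or equal to" definition of cumulative relative frequency, and the reference values are assumptions made for this sketch.

```python
from bisect import bisect_right


def cumulative_relative_frequency(reference_values, value):
    """Percentage of reference values less than or equal to `value` (assumed definition)."""
    ordered = sorted(reference_values)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def needs_correction_candidate(crf_percent, lower=1.0, upper=99.0):
    """True when the cumulative relative frequency is extremely small or extremely large."""
    return crf_percent < lower or crf_percent > upper


# Hypothetical HbA1c values (%) of the reference population.
reference_hba1c = [5.0, 5.2, 5.4, 5.6, 5.8, 6.0, 6.2, 6.4, 6.6, 6.8]
crf_for_80 = cumulative_relative_frequency(reference_hba1c, 80)
crf_for_5_6 = cumulative_relative_frequency(reference_hba1c, 5.6)
```

With these hypothetical reference values, an input of 80 has a cumulative relative frequency of 100% and is selected, whereas an input of 5.6 is not.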

[0140] In Example 1, multiple values representing the distribution of HbA1c values, such as the minimum, maximum, median, and quartiles, were calculated as candidate HbA1c values. In Example 2, by contrast, a single representative value of the distribution is calculated. The representative value may be, for example, the mean, median, or mode. In the example in Figure 20, "5.5" is calculated as the correction value candidate 2005E for HbA1c, and the difference 2005F between it and the input value 2005B of "80" is "74.5".

[0141] As described above, a large difference between the input value 2005B and the correction value candidate 2005E indicates that the input HbA1c value differs significantly from the HbA1c values of individuals whose values for items highly correlated with HbA1c are similar to those of the person subject to risk analysis. In this case, it is suspected that an incorrect value was entered through a mistake in filling out or reading the notification information, and the abnormality determination 2005G is therefore set to "Confirmation Required," which prompts verification by the user.
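The calculation of the representative value, the difference 2005F, and the abnormality determination 2005G may be sketched as follows. The values 80, 5.5, and 74.5 are those of the Figure 20 example; the function names, the similar-group values, the threshold `max_difference`, and the label used when no confirmation is needed are hypothetical, since the description does not state how large a difference triggers "Confirmation Required."

```python
from statistics import mean, median, mode

REPRESENTATIVES = {"mean": mean, "median": median, "mode": mode}


def candidate_correction_value(similar_values, kind="median"):
    """Representative value of the similar group's distribution."""
    return REPRESENTATIVES[kind](similar_values)


def abnormality_judgment(input_value, candidate, max_difference):
    """Return (difference, label); max_difference is an assumed threshold."""
    difference = abs(input_value - candidate)
    if difference > max_difference:
        return difference, "Confirmation Required"
    return difference, "No Confirmation Required"  # hypothetical label


# Hypothetical HbA1c values (%) of individuals similar to the subject.
similar_hba1c = [5.3, 5.4, 5.5, 5.6, 5.7]
candidate = candidate_correction_value(similar_hba1c)
result = abnormality_judgment(80, candidate, max_difference=10)
```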

[0142] Figure 21 is an explanatory diagram showing an example of a user interface for displaying the risk assessment results of the information analysis support system 101 of Embodiment 2 of the present invention.

[0143] The risk analysis execution screen 2100 shown in Figure 21 is an example of display data output by the information analysis support system 101, and includes, for example, an input information display unit 2101 and a result display unit 2102.

[0144] The input information display unit 2101 includes a subject ID field 2103, a risk assessment definition setting field 2104, and a risk assessment execution button 2105. These are the same as the subject ID field 1603, risk assessment definition setting field 1604, and risk assessment execution button 1605 shown in Figure 16. When a user operates the risk assessment execution button 2105, a risk assessment is performed based on the data of the person subject to risk analysis, both before and after the correction of the value of one of the items, and the result is displayed in the result display unit 2102.

[0145] The result display unit 2102 includes a risk distribution display unit 2106. The risk distribution display unit 2106 shows the risk value 2106A calculated by inputting the uncorrected input values into the risk assessment model, and the risk value 2106B calculated by inputting the input values corrected with the correction value candidate 2005E into the risk assessment model. In the example in Figure 21, the uncorrected risk value 2106A is very large, but the corrected risk value 2106B is small. If, for example, the determination of whether the person subject to risk analysis is high risk differs depending on whether the correction is applied, the user can recheck the input values suspected of entry or reading errors and, if an error is found, perform an accurate assessment using the correct input values.
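The comparison of the risk values before and after correction may be sketched as follows. The form of the risk assessment model is not specified in this passage, so a logistic model with made-up coefficients is used purely as a stand-in; the function names, the coefficients, and the criterion are all hypothetical.

```python
import math


def logistic_risk(features, coefficients, intercept):
    """Stand-in risk model: logistic function of a weighted sum (an assumption)."""
    z = intercept + sum(coefficients[k] * features[k] for k in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


def compare_before_after(features, item, candidate, coefficients, intercept, criterion):
    """Return (risk before, risk after, whether the high-risk determination differs)."""
    corrected = {**features, item: candidate}
    before = logistic_risk(features, coefficients, intercept)
    after = logistic_risk(corrected, coefficients, intercept)
    differs = (before >= criterion) != (after >= criterion)
    return before, after, differs


# Hypothetical coefficients; HbA1c input of 80 corrected to the candidate 5.5.
before, after, differs = compare_before_after(
    {"hba1c": 80.0}, "hba1c", 5.5,
    coefficients={"hba1c": 0.5}, intercept=-5.0, criterion=0.5,
)
```

When `differs` is true, the determination depends on the suspect input value, which is the situation in which the user would be prompted to recheck it.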

[0146] This embodiment shows an example in which one corrected risk value is calculated based on one correction value candidate. However, as in Embodiment 1, multiple correction value candidates may be calculated based on the distribution of reference information, multiple risk values may be calculated based on these, and the distribution of risk values may be evaluated.

[0147] Furthermore, the system of the embodiment of the present invention may be configured as follows.

[0148] (1) An information analysis support method executed by a computer system, wherein the computer system comprises a processor (e.g., CPU 104) and a storage device connected to the processor (e.g., the storage medium 106 and the database 107). The storage device holds health information relating to the health status of the person being analyzed (e.g., the information relating to the person being analyzed among the information managed by the input information management unit 121), a risk model for calculating health risk values (e.g., risk model parameter information 700), and reference health information relating to the health status of multiple persons (e.g., the information relating to at least multiple persons other than the person being analyzed among the information managed by the input information management unit 121). The information analysis support method includes: a first step (e.g., steps 1007, 1801) in which the processor calculates, based on the reference health information, one or more correction values (e.g., candidate interpolation values for missing items in Example 1, or candidate correction values for outliers in Example 2) as candidate values for correcting the value of an item to be corrected included in the health information of the person being analyzed (e.g., a missing item in Example 1 or an item with an outlier in Example 2); a second step (e.g., step 1204) in which the processor calculates one or more risk values regarding the health of the person being analyzed based on the risk model and the health information of the person being analyzed corrected by the one or more correction values; and a third step (e.g., step 1205) in which the processor determines the risk regarding the health of the person being analyzed based on the one or more risk values.

[0149] This allows risk analysis to be supported even if the information of the person being analyzed includes missing values due to data gaps or outliers due to input errors, by supplementing them with information from other individuals.

[0150] (2) In the first step of (1) above, the processor calculates multiple correction values (for example, candidates 1 to 5 in Figure 14) based on the distribution of values of the same item as the correction target item extracted from the reference health information. In the second step, the processor calculates multiple risk values by inputting the multiple correction values into the risk model. In the third step, the processor outputs information showing the frequency distribution of the risk values (for example, the risk distribution in Figure 16) and the criterion for determining the level of risk based on the risk values (for example, the criterion value in Figure 16).

[0151] This helps determine whether or not there is a possibility that the risk value will exceed a predetermined standard.

[0152] (3) In the first step of (2) above, the processor extracts information similar to the health information of the subject of analysis (e.g., similar groups) from the reference health information, and calculates multiple correction values based on the distribution of values of the same item as the correction target item included in the extracted information.

[0153] This allows for highly accurate correction.

[0154] (4) In the first step of (3) above, the processor extracts from the reference health information, as information similar to the health information of the person being analyzed, information whose values for one or more items highly correlated with the value of the item to be corrected are similar to the values of the person being analyzed (for example, steps 1105, 1106).

[0155] This allows for accurate correction based on data from individuals similar to the person being analyzed.

[0156] (5) In the first step of (4) above, the processor extracts, as information similar to the health information of the subject of analysis, the information of multiple individuals whose values for two or more items highly correlated with the value of the item to be corrected are similar to those of the subject of analysis (e.g., steps 1105, 1106). If the number of individuals whose information is extracted is less than a predetermined standard (e.g., step 1107: No), the processor removes the least correlated item from the two or more items and extracts, based on the values of the remaining one or more items, the information of multiple individuals whose values are similar to those of the subject of analysis as information similar to the health information of the subject of analysis.

[0157] This allows for accurate correction based on data from a sufficient number of individuals similar to the subject of analysis.
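The extraction with fallback described in (5) may be sketched as follows. The use of a per-item tolerance to decide whether two values are "similar," the record layout, and all names are assumptions made for this sketch; the embodiment may instead compare discretized values.

```python
def extract_similar_group(reference, subject, ranked_items, tolerance, min_people):
    """Extract similar individuals, dropping the least correlated item while too few match.

    ranked_items: item names ordered from most to least correlated with the target item.
    tolerance: {item: maximum absolute difference treated as similar} (an assumption).
    Returns (group, items actually used).
    """
    items = list(ranked_items)
    while items:
        group = [
            person for person in reference
            if all(abs(person[i] - subject[i]) <= tolerance[i] for i in items)
        ]
        if len(group) >= min_people or len(items) == 1:
            return group, items
        items.pop()  # remove the least correlated remaining item and retry
    return [], []


# Hypothetical reference records and subject.
reference = [
    {"glucose": 101, "bmi": 22.0, "hba1c": 5.5},
    {"glucose": 98, "bmi": 27.0, "hba1c": 5.7},
    {"glucose": 103, "bmi": 30.0, "hba1c": 5.9},
    {"glucose": 140, "bmi": 22.0, "hba1c": 7.1},
]
subject = {"glucose": 100, "bmi": 22.0}
group, used_items = extract_similar_group(
    reference, subject, ranked_items=["glucose", "bmi"],
    tolerance={"glucose": 5, "bmi": 1.0}, min_people=3,
)
```

In this example only one individual matches on both items, so the less correlated item is removed and three individuals are extracted using the remaining item alone.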

[0158] (6) In (4) above, the degree of relevance is an indicator that shows the degree of correlation between the values of the items.

[0159] This allows for accurate correction based on data from individuals similar to the person being analyzed.

[0160] (7) In (2) above, if the health information of the subject of analysis includes multiple correction target items, in the first step, the processor calculates multiple correction values for each of the multiple correction target items based on the distribution of values of the same item as that correction target item extracted from the reference health information, and in the second step, the processor calculates multiple risk values by inputting all combinations of the multiple correction values for the multiple correction target items (for example, the interpolation results in Figure 15) into the risk model.

[0161] This helps determine whether the risk value is likely to exceed a predetermined threshold, based on all possible combinations of interpolation results.
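The evaluation of all combinations described in (7) may be sketched as follows. With two correction target items and five correction values each, 25 risk values are obtained, as in the Figure 15 example. The function name, the item names, and the stand-in risk model are hypothetical.

```python
from itertools import product


def risk_values_for_all_combinations(subject, candidates_by_item, risk_model):
    """Apply every combination of correction values and collect the resulting risk values."""
    items = list(candidates_by_item)
    risks = []
    for combo in product(*(candidates_by_item[i] for i in items)):
        corrected = {**subject, **dict(zip(items, combo))}
        risks.append(risk_model(corrected))
    return risks


# Hypothetical candidates for two missing items (five correction values each).
candidates = {
    "hba1c": [4.8, 5.2, 5.5, 5.9, 6.4],
    "glucose": [85, 92, 98, 105, 120],
}


def toy_model(record):
    # Stand-in for the risk model; not the model of the embodiment.
    return record["hba1c"] * 10 + record["glucose"]


risks = risk_values_for_all_combinations({"age": 45}, candidates, toy_model)
```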

[0162] (8) In the first step of (2) above, the processor calculates, as the one or more correction values, at least one of the minimum, maximum, median, quartile, and mean values of the same item as the correction target item extracted from the reference health information.

[0163] This generates several reasonable correction values ​​within the estimated range of possible values ​​for the item to be corrected.
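The statistics listed in (8) may be computed, for example, as follows. The function name and the choice of the "inclusive" quartile method are assumptions made for this sketch; the description does not specify how the quartiles are defined.

```python
from statistics import mean, median, quantiles


def correction_value_candidates(values):
    """Summarize the distribution of an item's values as correction value candidates."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": max(values),
        "mean": mean(values),
    }


# Hypothetical values of the correction target item within the similar group.
summary = correction_value_candidates([4, 5, 6, 7, 8])
```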

[0164] (9) In (1) above, the items to be corrected are the items of the health information of the person being analyzed that do not have a value (for example, missing data items).

[0165] This allows for correction (interpolation) of missing data.

[0166] (10) In (1) above, the items to be corrected are the items with abnormal values among the health information items of the person being analyzed.

[0167] This allows for the correction of outliers.

[0168] (11) In (10) above, an item with an abnormal value is an item, among the health information items of the subject of analysis, whose value occurs with a frequency less than a predetermined standard in the distribution of values of the same item included in the reference health information.

[0169] This allows for the proper extraction of outliers.
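The determination in (11) may be sketched as follows, assuming that the frequency of occurrence is evaluated on values discretized into bins of equal width. The function names, the bin width, the minimum frequency, and the reference values are hypothetical.

```python
import math
from collections import Counter


def relative_frequency_of_value(reference_values, value, bin_width):
    """Fraction of reference values falling in the same bin as `value`."""
    bins = Counter(math.floor(v / bin_width) for v in reference_values)
    return bins[math.floor(value / bin_width)] / len(reference_values)


def is_abnormal(reference_values, value, bin_width, min_frequency):
    """True when the value's frequency of occurrence is below the standard."""
    return relative_frequency_of_value(reference_values, value, bin_width) < min_frequency


# Hypothetical HbA1c values (%) in the reference health information.
reference_hba1c = [5.1, 5.3, 5.6, 5.8, 6.2, 6.4, 5.4, 5.9, 6.1, 5.2]
freq_5_5 = relative_frequency_of_value(reference_hba1c, 5.5, bin_width=1.0)
abnormal_80 = is_abnormal(reference_hba1c, 80, bin_width=1.0, min_frequency=0.01)
```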

[0170] (12) In the third step of (10) above, the processor outputs information indicating the risk value calculated based on the health information of the subject of analysis before correction by the correction value, and the risk value calculated based on the health information of the subject of analysis after correction by the correction value.

[0171] This helps in risk analysis when there are outliers.

[0172] It should be noted that the present invention is not limited to the embodiments described above, and various modifications are included. For example, the embodiments described above are explained in detail for a better understanding of the present invention, and are not necessarily limited to those having all of the configurations described. Furthermore, it is possible to replace parts of the configuration of one embodiment with the configuration of another embodiment, and it is possible to add configurations from other embodiments to the configuration of one embodiment. In addition, it is possible to add, delete, or replace parts of the configuration of each embodiment with other configurations.

[0173] Furthermore, each of the above configurations, functions, processing units, and processing means may be implemented in hardware, either partially or entirely, by designing them as integrated circuits, for example. Alternatively, each of the above configurations and functions may be implemented in software by a processor interpreting and executing programs that implement each function. Information such as programs, tables, and files that implement each function can be stored in storage devices such as non-volatile semiconductor memory, hard disk drives, and SSDs (Solid State Drives), or in computer-readable non-transitory data storage media such as IC cards, SD cards, and DVDs.

[0174] Furthermore, the control lines and information lines shown are those deemed necessary for explanation, and do not necessarily represent all control lines and information lines in an actual product. In practice, almost all components may be considered to be interconnected.

[Explanation of Symbols]

[0175]
101 Information analysis support system
102 Input section
103 Output section
104 CPU
105 Memory
106 Storage medium
107 Database
108 Communication unit
111 Interpolation value distribution calculation unit
112 Frequency calculation unit
113 Relevance information extraction unit
114 Discretization processing unit
115 Risk value calculation unit
116 Risk assessment unit
117 Input value correction processing unit
121 Input information management unit
122 Distribution information management unit
123 Relevance information management unit
124 Risk model information management unit
125 Risk assessment definition management unit

Claims

1. An information analysis support method executed by a computer system, wherein the computer system comprises a processor and a storage device connected to the processor; the storage device holds health information regarding the health status of a subject of analysis, a risk model for calculating health-related risk values, and reference health information regarding the health status of multiple individuals; the information analysis support method comprises a first step in which the processor calculates, based on the reference health information, one or more correction values as candidate values for correcting the value of a correction target item included in the health information of the subject of analysis, a second step in which the processor calculates one or more risk values regarding the health of the subject of analysis based on the risk model and the health information of the subject of analysis corrected by the one or more correction values, and a third step in which the processor determines the health risk of the subject of analysis based on the one or more risk values; in the first step, the processor extracts information similar to the health information of the subject of analysis from the reference health information, and calculates a plurality of correction values based on the distribution of values of the same item as the correction target item included in the extracted information; in the second step, the processor calculates a plurality of risk values by inputting the plurality of correction values into the risk model; and in the third step, the processor outputs information indicating the frequency distribution of the plurality of risk values and a criterion for determining the level of risk based on the risk values.

2. The information analysis support method according to Claim 1, wherein, in the first step, the processor extracts from the reference health information, as information similar to the health information of the subject of analysis, information of multiple individuals whose values for one or more items having a high correlation with the value of the item to be corrected are similar to the values of the subject of analysis.

3. The information analysis support method according to Claim 2, wherein, in the first step, the processor extracts from the reference health information, as information similar to the health information of the subject of analysis, information of multiple individuals whose values for two or more items having a high correlation with the value of the item to be corrected are similar to the values of the subject of analysis, and, if the number of individuals in the extracted information is less than a predetermined standard, extracts from the reference health information, as information similar to the health information of the subject of analysis, information of multiple individuals whose values for the one or more items remaining after the least correlated item is removed from the two or more items are similar to the values of the subject of analysis.

4. The information analysis support method according to Claim 2, wherein the degree of relevance between items is an index indicating the degree of correlation between the values of the items.

5. The information analysis support method according to Claim 1, wherein, if the health information of the subject of analysis includes multiple items to be corrected, in the first step, the processor calculates multiple correction values for each of the multiple items to be corrected based on the distribution of values of the same item as that item to be corrected extracted from the reference health information, and in the second step, the processor calculates the plurality of risk values by inputting all combinations of the multiple correction values for the multiple items to be corrected into the risk model.

6. The information analysis support method according to Claim 1, wherein, in the first step, the processor calculates, as the one or more correction values, at least one of the minimum, maximum, median, quartile, and mean values of the same item as the item to be corrected extracted from the reference health information.

7. The information analysis support method according to Claim 1, wherein the items to be corrected are items that do not have a value among the health information items of the person being analyzed.

8. The information analysis support method according to Claim 1, wherein the items to be corrected are items with abnormal values among the health information items of the person being analyzed.

9. The information analysis support method according to Claim 8, wherein the items with abnormal values are items, among the health information items of the subject of analysis, whose value occurs with a frequency smaller than a predetermined standard in the distribution of values of the same item included in the reference health information.

10. The information analysis support method according to Claim 8, wherein, in the third step, the processor outputs information indicating the risk value calculated based on the health information of the person being analyzed before correction by the correction value, and the risk value calculated based on the health information of the person being analyzed after correction by the correction value.

11. An information analysis support system comprising a processor and a storage device connected to the processor, wherein the storage device holds health information regarding the health status of a subject of analysis, a risk model for calculating health-related risk values, and reference health information regarding the health status of multiple individuals; the processor executes a first step of calculating, based on the reference health information, one or more correction values as candidate values for correcting the value of a correction target item included in the health information of the subject of analysis, a second step of calculating one or more risk values regarding the health of the subject of analysis based on the risk model and the health information of the subject of analysis corrected by the one or more correction values, and a third step of determining the health risk of the subject of analysis based on the one or more risk values; in the first step, the processor extracts information similar to the health information of the subject of analysis from the reference health information, and calculates a plurality of correction values based on the distribution of values of the same item as the correction target item included in the extracted information; in the second step, the processor calculates a plurality of risk values by inputting the plurality of correction values into the risk model; and in the third step, the processor outputs information indicating the frequency distribution of the plurality of risk values and a criterion for determining the level of risk based on the risk values.