A device for identifying the duties of healthcare professionals and a method for generating a judgment model.

A motion detection sensor and a machine-learning model accurately identify the tasks of healthcare professionals without requiring them to remember keywords, addressing the inaccuracy and workload problems of existing methods.

JP7875546B2 · Active · Publication Date: 2026-06-18 · CARE COM +1

Cites: 4 · Cited by: 0

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Patents
Current Assignee / Owner: CARE COM
Filing Date: 2022-05-24
Publication Date: 2026-06-18


Abstract

PROBLEM TO BE SOLVED: To provide a work identification device for medical workers that identifies at least one of the work content and the action content with high accuracy without burdening the medical worker, and a determination model generation method. SOLUTION: A work identification device 10 comprises a work determination unit 12. The work determination unit 12 inputs movement data of a medical worker, detected by a movement detection sensor 101, into a determination model 13 and determines the work content of the medical worker. The determination model 13 is generated for each medical worker by machine learning processing. The training data are movement data detected by the movement detection sensor 101 and text data of speech content, converted from sound data detected by a sound detection sensor while the work is performed. When movement data is input, the determination model 13 outputs information indicating the work content associated in advance with the speech content. Because the determination model 13 is unique to each medical worker, it identifies the work content corresponding to the worker's movements even when the work is performed without uttering a specific keyword. SELECTED DRAWING: Figure 2


Description

[Technical Field]

[0001] The present invention relates to a device for identifying the tasks performed by healthcare professionals and a method for generating a judgment model, and more particularly to a device for identifying the content of tasks or actions performed by healthcare professionals and a method for generating a judgment model applicable thereto.

[Background Technology]

[0002] To prevent overtime, work-related accidents, and health problems among healthcare workers (including nurses and caregivers), it is common practice to understand the tasks and actions performed by healthcare workers and implement measures such as reviewing those tasks. A commonly known method for understanding the work of healthcare workers is for them to record their own work. However, having healthcare workers record their daily work increases their workload, which can increase the likelihood of overtime, work-related accidents, and health problems. Furthermore, there is a risk that records of work performed by healthcare workers themselves may be inaccurate due to omissions or misunderstandings, making it difficult to guarantee the authenticity of the work records.

[0003] To solve these problems, a technique is known that recognizes a caregiver's voice, acquired by a voice acquisition device (e.g., a microphone), as text information. The text information is compared with pre-stored keywords that indicate types of caregiving activity, and the caregiving activity performed is estimated from the match (see, for example, Patent Document 1). Estimating caregiving activities with a voice acquisition device alone may reduce the effort of entering them. However, it still leaves the problem of verifying that the activities were actually performed.

[0004] Another technique has been proposed to solve the above problems. A caregiver wears an assistive device equipped with a posture sensor (an acceleration sensor and a rotation detector) and carries a user terminal equipped with a position sensor. Care work is identified from the caregiver's posture detected by the posture sensor, and the care location is identified from the caregiver's position detected by the position sensor (see, for example, Patent Document 2).

Prior Art Documents

Patent Documents

[0005]

Patent Document 1

Patent Document 2

Summary of the Invention

Problems to be Solved by the Invention

[0006] In the technology described in Patent Document 1, the caregiver must always utter words that exactly match a specific, predetermined keyword. The caregiver therefore has to remember the keyword for each task and always use it, which places a continuous burden on the caregiver. The technology described in Patent Document 2 uses a posture sensor to identify the caregiver's care work. However, the states detected by the posture sensor vary because of individual differences and other factors. It is therefore difficult to associate the detected posture appropriately with the caregiver's work content.

[0007] The present invention has been made to solve these problems. An object thereof is to accurately identify at least one of the work and actions performed by medical staff without burdening them.

Means for Solving the Problems

[0008] To solve the above-mentioned problems, the present invention determines at least one of the task content and the action content of a medical professional. A motion detection sensor detects actions of the medical professional other than speech, and the resulting motion data is input into a judgment model trained for that target professional. A separate judgment model is generated for each medical professional by machine learning processing. The training data are motion data detected by the motion detection sensor and text data of speech content, converted from speech data detected by a speech detection sensor. When motion data is input, the model outputs information indicating at least one of the tasks and actions associated in advance with the speech content.

[Effects of the Invention]

[0009] With the present invention configured as described above, medical professionals do not need to keep uttering words that match predetermined keywords while they perform their duties. When they perform their duties normally, the motion detection sensor detects their actions, and at least one of the work content and action content corresponding to those actions is identified. The judgment model that derives this content from motion data is trained by machine learning for each individual medical professional. This increases the likelihood of determining the content correctly from each professional's actions. As a result, at least one of the work and actions performed by medical professionals can be identified with high accuracy without burdening them.

[Brief Description of the Drawings]

[0010]
[Figure 1] A diagram showing an example of the overall configuration of the work identification system according to this embodiment.
[Figure 2] A block diagram showing an example of the functional configuration of the work identification device according to this embodiment.
[Figure 3] A diagram illustrating the operation of the first determination model according to this embodiment.
[Figure 4] A diagram illustrating the operation of the second determination model according to this embodiment.
[Figure 5] A block diagram showing an example of the functional configuration of the model generation device according to this embodiment.
[Figure 6] A diagram showing an example of the overall configuration of the learning system according to this embodiment.
[Figure 7] A diagram showing an example of the learning period setting information used for machine learning of the first determination model according to this embodiment.
[Figure 8] A flowchart showing an example of the operation of the model generation device according to this embodiment (an example of the processing procedure of the model generation method).

[Modes for Carrying Out the Invention]

[0011] Hereinafter, one embodiment of the present invention will be described with reference to the drawings. Figure 1 is a diagram showing an example of the overall configuration of a work identification system according to this embodiment. The system is equipped with a work identification device for medical professionals (hereinafter simply referred to as the work identification device). As shown in Figure 1, the work identification system of this embodiment comprises the work identification device 10, a motion detection sensor 101, and a repeater 110. The motion detection sensor 101 and the repeater 110 are connected by wireless communication. The repeater 110 and the work identification device 10 are connected by a communication network 120 such as a LAN (Local Area Network) or WAN (Wide Area Network). The communication network 120 may be wired or wireless.

[0012] The motion detection sensor 101 detects actions of medical personnel other than speech and periodically outputs motion data representing those actions. The motion detection sensor 101 is, for example, a posture detection sensor that detects the posture of medical personnel. It is attached to the head, chest, arms (hands), waist, legs, or other parts of the body. The posture detection sensor is, for example, a 3D sensor that detects position and height from the floor, an acceleration sensor, or a gyroscope sensor.

[0013] The repeater 110 is installed in multiple locations within the facility where medical personnel work. It receives motion data from the motion detection sensor 101 via wireless communication means such as a wireless LAN, and transmits the received motion data to the task identification device 10 via the communication network 120. The repeater 110 is installed in necessary locations, such as each room and corridor within the facility.

[0014] The task identification device 10 identifies the content of the tasks performed by a medical professional. It does so from the professional's motion data, which is transmitted from the motion detection sensor 101 via the repeater 110. Identifying the content of a task here means identifying which of several types of tasks the medical professional performed.

[0015] Figure 2 is a block diagram showing an example of the functional configuration of the task identification device 10 according to this embodiment. As shown in Figure 2, the task identification device 10 of this embodiment includes a motion data acquisition unit 11 and a task determination unit 12 as its functional configuration. The task determination unit 12 implements a determination model 13. The determination model 13 is generated for each medical professional by machine learning processing using training data, as described later. In other words, the task determination unit 12 implements multiple determination models 13, one for each medical professional.

[0016] The above-mentioned functional blocks 11 and 12 can be configured using hardware, a DSP (Digital Signal Processor), or software. For example, when configured using software, the functional blocks 11 and 12 are actually configured using a computer's CPU, RAM, ROM, etc., and are realized by the operation of programs stored in the RAM and ROM. These programs may also be stored on other storage media such as hard disks or semiconductor memory.

[0017] The motion data acquisition unit 11 acquires motion data of medical personnel detected by the motion detection sensor 101. As described above, motion data is output periodically and sequentially from the motion detection sensor 101, and the motion data acquisition unit 11 acquires this motion data sequentially. The motion data acquisition unit 11 adds a timestamp indicating the date and time the motion data was acquired, and outputs the motion data and timestamp together to the work determination unit 12.

[0018] The task determination unit 12 inputs the motion data acquired by the motion data acquisition unit 11 into one of the per-professional determination models 13. It selects the model trained for the medical professional being evaluated and determines the actions and tasks that professional performed. At this time, the task determination unit 12 sets a determination period for the motion data sequentially supplied from the motion data acquisition unit 11. The period is based on the timestamps input together with the motion data. The determination period specifies which portion of the continuous series of motion data is used to judge whether the medical professional performed an action included in a task. Details of the determination period are described later with reference to Figure 3.
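The per-professional model selection described in [0018] can be pictured with the minimal sketch below. It assumes that each professional is identified by a string ID and that each trained model is a callable mapping a window of motion samples to an utterance string. The names `ModelRegistry`, `register`, `determine`, and `professional_id` are hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, Sequence

# Hypothetical types: a window of motion samples (flat feature vectors) and a
# trained per-professional model mapping such a window to an utterance string.
MotionWindow = Sequence[Sequence[float]]
Model = Callable[[MotionWindow], str]


class ModelRegistry:
    """Holds one determination model per medical professional (illustrative only)."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}

    def register(self, professional_id: str, model: Model) -> None:
        # Each professional gets a model trained only on their own data.
        self._models[professional_id] = model

    def determine(self, professional_id: str, window: MotionWindow) -> str:
        # Use only the model trained for the target professional; never fall
        # back to another professional's model.
        try:
            model = self._models[professional_id]
        except KeyError:
            raise KeyError(f"no determination model for {professional_id!r}") from None
        return model(window)
```

Raising an error for an unknown professional, rather than substituting a generic model, mirrors the point in [0009] that accuracy comes from training per individual.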

[0019] The judgment model 13 is generated by machine learning processing that uses two kinds of training data for each medical professional. The first is motion data detected by the motion detection sensor 101. The second is text data of speech content, converted from speech data detected by the speech detection sensor 102 (described later with reference to Figure 6). Through this machine learning, the judgment model 13 is configured to output, when motion data is input, information indicating the action content and work content associated in advance with the speech content of the speech data.

[0020] The determination model 13 includes a first determination model 13a and a second determination model 13b. The first determination model 13a receives motion data and outputs information representing the utterance content corresponding to that motion data. It is generated by machine learning processing using the learning data described above. The second determination model 13b receives the information representing the utterance content output from the first determination model 13a. It outputs information indicating the action content and work content associated in advance with that utterance content. Specifically, the second determination model 13b outputs this information based on association information in which utterance content, action content, and work content are associated in advance.
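The two-stage structure in [0020] can be sketched as follows. The patent does not name the learning algorithm for the first determination model 13a, so a nearest-centroid classifier over fixed-length motion feature vectors stands in for it here as an assumption. The second determination model 13b is modeled as a plain lookup in the association information. The class and function names and the blood-pressure example are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


class FirstModel:
    """Stage 1 (hypothetical): motion feature vector -> utterance text.

    A nearest-centroid classifier is an assumed stand-in for the real learner.
    """

    def __init__(self) -> None:
        self.centroids: Dict[str, List[float]] = {}

    def fit(self, samples: Sequence[Tuple[Vector, str]]) -> "FirstModel":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for vec, text in samples:
            acc = sums.setdefault(text, [0.0] * len(vec))
            for i, x in enumerate(vec):
                acc[i] += x
            counts[text] = counts.get(text, 0) + 1
        # Centroid = mean feature vector of all samples labeled with that utterance.
        self.centroids = {t: [x / counts[t] for x in acc] for t, acc in sums.items()}
        return self

    def predict(self, vec: Vector) -> str:
        # Return the utterance whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda t: math.dist(vec, self.centroids[t]))


class SecondModel:
    """Stage 2: utterance text -> (action content, work content) via association info."""

    def __init__(self, association: Dict[str, Tuple[str, str]]) -> None:
        self.association = association

    def predict(self, text: str) -> Optional[Tuple[str, str]]:
        return self.association.get(text)


def determine(first: FirstModel, second: SecondModel, vec: Vector) -> Optional[Tuple[str, str]]:
    """Chain the two stages: motion -> utterance -> (action, work)."""
    return second.predict(first.predict(vec))
```

A usage example would fit on a few motion vectors labeled "I will take your temperature" and map that utterance to ("taking body temperature", "vital sign acquisition"). An utterance missing from the association information yields `None`. Splitting the pipeline this way means the association information can be edited by hand without retraining stage 1.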

[0021] FIG. 3 is a diagram for explaining the operation of the first determination model 13a. FIG. 3(a) shows that a certain task A is carried out through one action A_ACT1 and that a medical worker makes a related utterance A_UTT1 when performing action A_ACT1. It also shows that the time required to perform action A_ACT1 is T_A1. For example, suppose task A is vital sign acquisition. The medical worker makes an utterance A_UTT1 such as "I will take your temperature" and then performs action A_ACT1, taking the temperature, over time T_A1. When taking the temperature, the medical worker makes movements specific to action A_ACT1. The motion detection sensor 101 sequentially detects the series of postures during the action, and the motion data acquisition unit 11 sequentially acquires the detected motion data.

[0022] The first determination model 13a receives the series of motion data sequentially supplied from the motion data acquisition unit 11 during time T_A1. From this motion data it estimates utterance A_UTT1 and outputs information representing its content (text data of the utterance). Time T_A1 is measured using the timestamps supplied together with the series of motion data. In the example shown in Figure 3(a), time T_A1 is the determination period. For example, when the first determination model 13a outputs information on utterance A_UTT1 based on the motion data, the time T_A1 stored in advance in association with that utterance is set as the determination period.

[0023] As described above, there is a strong correlation between two things: the motion data detected by the motion detection sensor 101 while a healthcare worker performs action A_ACT1, and the content of the utterance A_UTT1 that the worker makes in connection with that action. The first determination model 13a is trained by machine learning to reflect this kind of correlation. It links the motion data detected by the motion detection sensor 101 with the speech content obtained from the speech data detected by the speech detection sensor 102 (see Figure 6). When motion data corresponding to action A_ACT1 is input, the first determination model 13a outputs information representing the corresponding utterance content, namely utterance A_UTT1 related to action A_ACT1.

[0024] Figure 3(b) shows that a certain task B is carried out through two actions B_ACT1 and B_ACT2. When performing these actions, the healthcare worker makes the related utterances B_UTT1 and B_UTT2, respectively. The times required to perform actions B_ACT1 and B_ACT2 are T_B1 and T_B2, respectively. When the medical professional performs actions B_ACT1 and B_ACT2, each action involves movements specific to it. The motion detection sensor 101 sequentially detects the series of postures during these actions, and the motion data acquisition unit 11 sequentially acquires the detected motion data.

[0025] From the series of motion data sequentially supplied by the motion data acquisition unit 11 during time T_B1, the first determination model 13a estimates utterance B_UTT1 and outputs information representing its content. Likewise, from the series of motion data supplied during time T_B2, it estimates utterance B_UTT2 and outputs information representing its content. In this case, times T_B1 and T_B2 are the respective determination periods.

[0026] Figure 4 is a diagram illustrating the operation of the second judgment model 13b. It shows examples of association information formed by associating utterance content, action content, and work content in advance. Figure 4(a) shows an example of association information for the case where one task A is performed through one action A_ACT1, as in Figure 3(a). Figure 4(b) shows an example of association information for the case where one task B is performed through two actions B_ACT1 and B_ACT2, as in Figure 3(b).

[0027] In Figure 4(a), six strings representing different types of speech content are associated with one action, namely taking body temperature, and one task, namely obtaining vital signs. The six strings representing different types of speech content are listed as possible speech content that a healthcare professional might utter when performing the task of obtaining vital signs; in other words, they are possible speech content outputs from the first judgment model 13a. The second judgment model 13b determines whether the string output from the first judgment model 13a matches any of the strings recorded as association information, and if a match is found, it outputs information on the corresponding action and task.

[0028] In Figure 4(b), strings representing multiple types of utterance content B_UTT1-1, B_UTT1-2, ... are associated with one action B_ACT1. Strings representing multiple types of utterance content B_UTT2-1, B_UTT2-2, ... are associated with one action B_ACT2. One task B is associated with the combination of these two actions B_ACT1 and B_ACT2. The second judgment model 13b checks the strings output from the first judgment model 13a against both groups. It requires that one output matches any of the strings B_UTT1-1, B_UTT1-2, ... corresponding to action B_ACT1, AND that one output matches any of the strings B_UTT2-1, B_UTT2-2, ... corresponding to action B_ACT2. When both conditions hold, it outputs information on the actions B_ACT1 and B_ACT2 and on task B.

[0029] Tasks performed through multiple actions fall into two kinds. In some, the order of the actions is always fixed; in others, the order is not fixed and can be changed arbitrarily. For a task with a fixed order, information indicating the task content is output only when the first judgment model 13a outputs the utterance strings recorded in the association information (as shown in Figure 4(b)) in that order. For a task without a fixed order, the information is output when the first judgment model 13a outputs those recorded utterance strings in any order.
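The ordered and unordered matching rules of [0028] and [0029] can be sketched as shown below. The sketch assumes that the association information for one task is a list of accepted-string sets, one set per action, plus a flag saying whether the action order is fixed. `TaskRule`, `matches`, and the placeholder strings such as "b1-1" are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class TaskRule:
    """Hypothetical association-information entry for one multi-action task."""
    task: str
    actions: Sequence[str]                 # e.g. ("B_ACT1", "B_ACT2")
    utterances: Sequence[FrozenSet[str]]   # accepted utterance strings per action
    ordered: bool                          # True if the action order is fixed


def matches(rule: TaskRule, outputs: Sequence[str]) -> bool:
    """Decide whether the first model's outputs satisfy the task rule."""
    if rule.ordered:
        # Fixed order: each action's utterance must appear after the previous
        # action's utterance (greedy leftmost subsequence match).
        pos = 0
        for accepted in rule.utterances:
            while pos < len(outputs) and outputs[pos] not in accepted:
                pos += 1
            if pos == len(outputs):
                return False
            pos += 1
        return True
    # Any order: every action needs at least one matching utterance somewhere.
    return all(any(s in accepted for s in outputs) for accepted in rule.utterances)
```

For example, with utterance sets {"b1-1", "b1-2"} and {"b2-1"}, the outputs ["b2-1", "b1-1"] satisfy the unordered rule but not the ordered one. Unrelated strings between the two utterances are skipped in the ordered case.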

[0030] The association information shown in Figure 4 also records an acceptable number of characters for each utterance string. This is the number of characters by which the string output from the first judgment model 13a may differ from the string recorded as association information. For example, if the acceptable number of characters is "0", the two strings must match exactly. If it is "1", the output string is treated as a match when it differs from the recorded string by at most one character.
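The acceptable-number-of-characters check in [0030] can be sketched as a distance threshold. The patent does not say how differing characters are counted. This sketch assumes Levenshtein edit distance (insertions, deletions, and substitutions), which also tolerates speech-recognition output that is one character longer or shorter. A position-by-position comparison is another possible reading. The names `edit_distance` and `is_match` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (assumed counting rule)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (or keep)
        prev = cur
    return prev[-1]


def is_match(output: str, recorded: str, tolerance: int) -> bool:
    """True if the model output differs from the recorded string by <= tolerance chars."""
    return edit_distance(output, recorded) <= tolerance
```

With tolerance 0, only identical strings match. With tolerance 1, "I will take your temperatures" still matches the recorded "I will take your temperature".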

[0031] Figure 5 is a block diagram showing an example of the functional configuration of the model generation device 20 according to this embodiment, which generates a judgment model 13 by machine learning. Figure 6 is a diagram showing an example of the overall configuration of the learning system including the model generation device 20. In Figure 6, components with the same reference numerals as those shown in Figure 1 have the same function, so redundant explanations are omitted here.

[0032] As shown in Figure 6, when generating the judgment model 13 using machine learning, the medical professional carries a voice detection sensor 102 in addition to the motion detection sensor 101. The voice detection sensor 102 is, for example, a microphone. The microphone may be built into a mobile device such as a smartphone, be included in a headset, or be a lapel microphone that can be attached to clothing with a clip or the like.

[0033] The voice detection sensor 102 detects the speech of medical personnel and outputs the voice data. Note that the voice data detected by the voice detection sensor 102 also includes voices other than those of medical personnel. The repeater 110 transmits the voice data detected by the voice detection sensor 102, in addition to the motion data detected by the motion detection sensor 101, to the model generation device 20.

[0034] As shown in Figure 5, the model generation device 20 of this embodiment includes a learning data acquisition unit 21 and a judgment model generation unit 22 as its functional configuration, and a learning data storage unit 23 as a storage medium. The learning data acquisition unit 21 specifically includes a motion data acquisition unit 21a, an audio data acquisition unit 21b, and a text conversion unit 21c. The judgment model generation unit 22 specifically includes a first judgment model generation unit 22a and an association information generation unit 22b.

[0035] The above-mentioned functional blocks 21 and 22 can be configured using hardware, a DSP, or software. For example, when configured using software, the functional blocks 21 and 22 are actually configured using a computer's CPU, RAM, ROM, etc., and are realized by the operation of programs stored in the RAM and ROM. These programs may also be stored on other storage media such as hard disks or semiconductor memory.

[0036] The learning data acquisition unit 21 sequentially acquires motion data detected by the motion detection sensor 101 and audio data detected by the audio detection sensor 102, and stores the motion data and the character data converted from the audio data as learning data in the learning data storage unit 23. Here, the learning data acquisition unit 21 sequentially acquires motion data and audio data detected when a medical professional carrying the motion detection sensor 101 and the audio detection sensor 102 is actually performing their duties during an arbitrarily set learning data collection period.

[0037] The motion data acquisition unit 21a sequentially acquires motion data of medical personnel detected by the motion detection sensor 101. The motion data acquisition unit 21a adds a timestamp indicating the date and time the motion data was acquired, and stores the motion data and timestamp together in the learning data storage unit 23.

[0038] The voice data acquisition unit 21b acquires voice data of medical personnel detected by the voice detection sensor 102. It extracts the voiced periods (periods in which speech is present) from the acquired voice data and outputs the extracted voice data to the text conversion unit 21c. In the following description, "voice data" refers to the voice data of these voiced periods.
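The patent does not say how the voiced periods are extracted. A common simple approach is frame-energy thresholding, and the sketch below uses it purely as an assumed stand-in. Samples are floats in [-1, 1], and `frame_ms` and `threshold` are hypothetical tuning parameters.

```python
from typing import List, Optional, Sequence, Tuple


def voiced_periods(samples: Sequence[float], rate: int, frame_ms: int = 20,
                   threshold: float = 0.01) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for runs of frames whose mean energy exceeds threshold.

    Energy-threshold voice activity detection is an assumption, not the patent's method.
    """
    frame = max(1, rate * frame_ms // 1000)  # samples per frame
    periods: List[Tuple[float, float]] = []
    start: Optional[int] = None
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        energy = sum(x * x for x in chunk) / len(chunk)
        if energy > threshold and start is None:
            start = i                                    # speech begins
        elif energy <= threshold and start is not None:
            periods.append((start / rate, i / rate))     # speech ends
            start = None
    if start is not None:                                # speech runs to the end
        periods.append((start / rate, len(samples) / rate))
    return periods
```

In practice a hangover time or a trained voice-activity detector would be needed to avoid splitting speech at short pauses.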

[0039] The text conversion unit 21c applies speech recognition technology to the audio data supplied from the audio data acquisition unit 21b to recognize the speech of medical professionals contained in the audio data and generates character data consisting of a string representing the content of the speech. The text conversion unit 21c adds a timestamp indicating the date and time the character data was generated and stores the character data and timestamp together in the learning data storage unit 23.

[0040] The learning data acquisition unit 21 is shown here as including the audio data acquisition unit 21b and the text conversion unit 21c, but the configuration is not limited to this. For example, a component external to the model generation device 20 may convert the audio data to text data. In that case, the learning data acquisition unit 21 may include a text data acquisition unit that acquires the converted text data.

[0041] The association information generation unit 22b generates association information such as that illustrated in Figure 4, using the text data of spoken voice stored in the learning data storage unit 23. The person generating the judgment model 13 sets three items in the association information manually: the information indicating the action content, the information indicating the work content, and the acceptable number of differing characters. This is practical because, when medical professionals perform a certain action as part of a certain job, they often tell the patient or care recipient in advance what they are about to do to obtain consent. They may also tell them the result of the work. The kinds of utterances made for each job are largely fixed, so there is a strong correlation between utterance content, action, and job. Based on this known correlation, the person generating the judgment model 13 associates information indicating the action content and the work content with the text data of the spoken voice. The acceptable number of differing characters can be set arbitrarily. The association information generated in this way is stored in the second judgment model 13b shown in Figure 2.

[0042] The first judgment model generation unit 22a generates the first judgment model 13a by machine learning. The learning data are the motion data and the text data of spoken voice stored in the learning data storage unit 23. For this machine learning, the person generating the judgment model 13 creates learning period setting information as shown in Figure 7 and sets it in the first judgment model generation unit 22a. For each utterance string, the learning period setting information in Figure 7 defines the learning period of the medical professional's motion data for the action performed in relation to that utterance.

[0043] As shown in Figure 7, the learning period is set to either the time required for the action when the action is performed after an utterance (hereinafter referred to as the time required for the post-utterance action (in seconds)) or the time required for the action when the utterance is made after the action is performed (hereinafter referred to as the time required for the pre-utterance action; in Figure 7, this is indicated as the time back to the start (in seconds)). Here, it is not the case that the time it takes for a medical professional to perform the same action is always the same; there will be variations. The time required set as the learning period setting information in Figure 7 may be the longest time, the shortest time, or the average time based on actual results. The time required set as the learning period in the learning period setting information becomes the judgment period determined by the first judgment model 13a, as described above.
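[0043] allows the required time to be the longest, shortest, or average of the actual observed durations. That choice reduces to a small helper like the sketch below. The function name `required_time` and the policy labels are hypothetical.

```python
from statistics import mean
from typing import Literal, Sequence

Policy = Literal["max", "min", "mean"]


def required_time(observed_s: Sequence[float], policy: Policy = "max") -> float:
    """Pick the required time (seconds) for one utterance from observed action durations."""
    if not observed_s:
        raise ValueError("need at least one observed duration")
    if policy == "max":
        return max(observed_s)    # longest: window covers every observed execution
    if policy == "min":
        return min(observed_s)    # shortest: window contains the least unrelated motion
    if policy == "mean":
        return mean(observed_s)   # average of the actual results
    raise ValueError(f"unknown policy {policy!r}")
```

The trade-off is coverage against noise. The longest time keeps the whole action inside the learning period but admits more unrelated motion, while the shortest time does the opposite.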

[0044] The first judgment model generation unit 22a performs machine learning using datasets, each made of two parts. The first part is the text data of the utterance content corresponding to voice data detected by the voice detection sensor 102 at a certain timing. The second part is the series of motion data detected by the motion detection sensor 101 during the learning target period. This period corresponds to the required time of the action and is identified from that timing by referring to the learning period setting information in Figure 7.

[0045] In other words, suppose the text data stored in the learning data storage unit 23 is text data of utterance content for which a post-utterance action duration is set. The first determination model generation unit 22a then pairs that text data with a series of motion data stored in the learning data storage unit 23 and performs machine learning. The paired motion data have timestamps ranging from approximately the timestamp of the text data to the point at which that duration has elapsed.

[0046] Suppose instead that the text data stored in the learning data storage unit 23 is text data of utterance content for which a pre-utterance action duration (start retrospective time) is set. The first determination model generation unit 22a then pairs that text data with a series of motion data stored in the learning data storage unit 23 and performs machine learning. The paired motion data have timestamps ranging from that duration before the text data's timestamp up to approximately the timestamp of the text data.
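The window selection in [0045] and [0046] can be sketched as follows. Each stored text record is paired with the motion samples that fall after its timestamp (post-utterance action) or before it (start retrospective time). This sketch treats "approximately the same date and time" as exactly the text timestamp, and `PeriodSetting`, `training_window`, and `training_pair` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class PeriodSetting:
    """Hypothetical row of the learning period setting information (Figure 7)."""
    seconds: float
    before_utterance: bool  # True: "time back to the start" (action precedes speech)


def training_window(text_ts: datetime, setting: PeriodSetting) -> Tuple[datetime, datetime]:
    """Return the [begin, end] learning period around the utterance timestamp."""
    span = timedelta(seconds=setting.seconds)
    if setting.before_utterance:
        return text_ts - span, text_ts   # look back from the utterance
    return text_ts, text_ts + span       # look forward from the utterance


def training_pair(text: str, text_ts: datetime, setting: PeriodSetting,
                  motion: Sequence[Tuple[datetime, List[float]]]) -> Tuple[List[List[float]], str]:
    """Pair the utterance text with the motion samples inside its learning period."""
    begin, end = training_window(text_ts, setting)
    window = [values for ts, values in motion if begin <= ts <= end]
    return window, text
```

Each returned pair is one training example for the first judgment model: the motion window is the input and the utterance text is the label.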

[0047] This machine learning generates a first judgment model 13a that reflects the correlation between the series of motion data within the learning period and the text data of the utterance. The first judgment model 13a is therefore constructed so that, when it receives a series of motion data covering a learning period, it outputs the corresponding utterance text data. The generated first judgment model 13a is implemented in the work determination unit 12 shown in Figure 2. Consequently, if the series of motion data acquired by the motion data acquisition unit 11 of the work identification device 10 is similar to a series of motion data used in the machine learning, the corresponding utterance text data is output.
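
The text does not specify which machine learning method the first judgment model 13a uses. The sketch below gives one hedged illustration of the behavior described, namely outputting the utterance whose training series is most similar to the input. It substitutes a plain nearest-neighbour classifier over one-dimensional motion series resampled to a fixed length. FirstJudgmentModel and resample are hypothetical names, and a real model would work on multi-axis sensor data.

```python
import math


def resample(series, length=8):
    """Linearly resample a 1-D motion series to a fixed length so that
    windows of different durations can be compared point by point."""
    if len(series) == 1:
        return [float(series[0])] * length
    out = []
    for i in range(length):
        pos = i * (len(series) - 1) / (length - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(series) - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


class FirstJudgmentModel:
    """Nearest-neighbour stand-in: returns the utterance text whose training
    motion series is closest (Euclidean distance) to the input series."""

    def __init__(self, length=8):
        self.length = length
        self.examples = []  # list of (resampled series, utterance text)

    def fit(self, dataset):
        for series, text in dataset:
            self.examples.append((resample(series, self.length), text))
        return self

    def predict(self, series):
        query = resample(series, self.length)
        _, text = min(self.examples, key=lambda ex: math.dist(ex[0], query))
        return text


model = FirstJudgmentModel().fit([
    ([0.0, 0.1, 0.2, 0.1, 0.0], "starting suction"),
    ([1.0, 2.0, 3.0, 4.0], "changing position"),
])
prediction = model.predict([0.0, 0.2, 0.1, 0.0])
```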

[0048] In this embodiment, the processing of the model generation device 20 described above is performed separately for each healthcare professional, producing a customized judgment model 13 for each of them.
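
Generating a model for each professional amounts to partitioning the training data by professional before training. The sketch below assumes rows tagged with a hypothetical worker identifier and accepts any training function. train_nearest_mean is a deliberately trivial stand-in learner. Its only purpose is to show that the same motion can map to different utterances for different professionals.

```python
from collections import defaultdict


def build_models_per_worker(training_rows, train):
    """Group (worker_id, motion_series, utterance) rows by worker and train
    one model per worker with the supplied `train` function."""
    grouped = defaultdict(list)
    for worker_id, series, text in training_rows:
        grouped[worker_id].append((series, text))
    return {worker_id: train(pairs) for worker_id, pairs in grouped.items()}


def train_nearest_mean(pairs):
    """Toy learner: predicts the utterance whose training series has the
    mean value closest to the mean of the input series."""
    def predict(series):
        target = sum(series) / len(series)
        return min(pairs, key=lambda p: abs(sum(p[0]) / len(p[0]) - target))[1]
    return predict


rows = [
    ("nurse_A", [0.1, 0.2], "starting suction"),
    ("nurse_A", [0.5], "changing position"),
    ("nurse_B", [0.1, 0.1], "changing position"),
]
models = build_models_per_worker(rows, train_nearest_mean)
```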

[0049] Figure 8 is a flowchart showing an example of the operation of the model generation device 20 (an example of the processing procedure for the model generation method). Here, it is assumed that the training data collection period has already ended, and training data including the action data of medical personnel and the text data of spoken voice is stored in the training data storage unit 23. Figure 8 shows an example of the operation in which the first judgment model generation unit 22a generates the first judgment model 13a using the training data stored in the training data storage unit 23.

[0050] First, the first judgment model generation unit 22a acquires one item of spoken-content text data stored in the learning data storage unit 23 (step S1). Immediately after processing starts, it acquires, for example, the text data with the earliest timestamp. Next, the first judgment model generation unit 22a refers to the preset learning period setting information shown in Figure 7. It determines whether the text data acquired in step S1 represents spoken content for which a start retrospective time is set (step S2).

[0051] If the text data represents spoken content for which a start retrospective time is set, the first judgment model generation unit 22a retrieves a series of motion data from the learning data storage unit 23. The timestamps of that series run from the required time before the timestamp of the text data acquired in step S1 up to approximately that timestamp (step S3). The unit then trains the first judgment model 13a using the text data acquired in step S1 and the series of motion data acquired in step S3 as one set (step S5).

[0052] If, on the other hand, step S2 determines that no start retrospective time is set for the spoken content, the first judgment model generation unit 22a retrieves a series of motion data from the learning data storage unit 23. The timestamps of that series run from the timestamp of the text data acquired in step S1 up to the point at which the required time has elapsed (step S4). The unit then trains the first judgment model 13a using the text data acquired in step S1 and the series of motion data acquired in step S4 as one set (step S5).

[0053] Next, the first judgment model generation unit 22a determines whether all text data has been acquired from the training data storage unit 23 (step S6). If not, the process returns to step S1 and acquires the text data with the next timestamp. Steps S1 to S6 thus repeat, performing machine learning for each item of text data, until all text data has been acquired from the training data storage unit 23. When step S6 determines that all text data has been acquired, the process shown in Figure 8 ends.
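
The loop of Figure 8 (steps S1 to S6) can be sketched as a single pass over the stored text records in timestamp order. This minimal sketch rests on three assumptions. The settings dictionary is a hypothetical encoding of the Figure 7 information. Utterances without a setting are skipped. The function collects the text/motion pairs that step S5 would feed to the learner instead of training incrementally.

```python
from datetime import datetime, timedelta


def build_training_pairs(text_records, motion_samples, settings):
    """text_records: list of (timestamp, utterance text)
    motion_samples: list of (timestamp, value)
    settings: dict utterance -> (start_retrospective_is_set, required_seconds)
    Returns a list of (motion series, utterance) pairs."""
    pairs = []
    # S1 and S6: visit every text record, earliest timestamp first.
    for ts, text in sorted(text_records):
        if text not in settings:
            continue  # assumption: utterances without a setting are skipped
        retrospective, seconds = settings[text]
        span = timedelta(seconds=seconds)
        # S2: branch on whether a start retrospective time is set.
        if retrospective:
            start, end = ts - span, ts  # S3: look back from the utterance
        else:
            start, end = ts, ts + span  # S4: look forward from the utterance
        series = [v for t, v in motion_samples if start <= t <= end]
        if series:
            pairs.append((series, text))  # S5: one training example
    return pairs


t0 = datetime(2022, 5, 24, 9, 0, 0)
motion = [(t0 + timedelta(seconds=s), float(s)) for s in range(0, 121, 10)]
texts = [(t0 + timedelta(seconds=60), "finished suction"), (t0, "starting suction")]
settings = {"starting suction": (False, 30), "finished suction": (True, 20)}
pairs = build_training_pairs(texts, motion, settings)
```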

[0054] As explained in detail above, in this embodiment, motion data detected by the motion detection sensor 101 is input to the machine-learned judgment model 13 of the medical professional being evaluated, and the actions and tasks performed by that medical professional are determined. The judgment model 13 is generated by machine learning that uses two kinds of training data for the medical professional: motion data detected by the motion detection sensor 101, and text data converted from voice data detected by the voice detection sensor 102. When motion data is input, the judgment model 13 outputs information indicating the actions and tasks associated in advance with the corresponding spoken content.
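
The two-stage determination summarized in [0054] chains two models. The first judgment model 13a maps motion data to an utterance, and the second judgment model 13b maps the utterance to a task and an action through association information. In the sketch below, the association table, its task and action labels, and toy_first_model are hypothetical placeholders. Any trained first-stage model that returns utterance text could be passed in instead.

```python
# Hypothetical association information: utterance -> (task, action).
ASSOCIATION = {
    "starting suction": ("sputum suction", "insert suction catheter"),
    "changing position": ("position change", "turn the patient"),
}


def second_judgment(utterance, association=ASSOCIATION):
    """Second stage: look up the task and action pre-associated with an
    utterance. Returns None when no association exists."""
    return association.get(utterance)


def identify(motion_series, first_model, association=ASSOCIATION):
    """Two-stage identification: motion data -> utterance -> (task, action)."""
    utterance = first_model(motion_series)
    return second_judgment(utterance, association)


def toy_first_model(series):
    """Stand-in for a trained first judgment model (hypothetical threshold rule)."""
    return "starting suction" if max(series) < 1.0 else "changing position"


result = identify([0.1, 0.3, 0.2], toy_first_model)
```

Keeping the association lookup separate from the learned model means the task vocabulary can be revised without retraining the first stage.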

[0055] With this configuration, healthcare workers do not need to keep uttering predetermined keywords while performing their duties. When they carry out their duties as usual, their movements are detected by the motion detection sensor 101, and the corresponding actions and duties are identified. Furthermore, the judgment model 13 that determines duties from motion data is trained separately for each healthcare worker. Actions and duties are therefore more likely to be determined correctly even when different workers perform the same duty with different movements. As a result, this embodiment can identify the actions and duties performed by healthcare workers with high accuracy without burdening them.

[0056] Furthermore, in this embodiment, machine learning does not require attaching correct labels indicating the action content or work content to the motion data detected by the motion detection sensor 101. During an arbitrarily set training data collection period, a medical professional carries the motion detection sensor 101 and the voice detection sensor 102 while actually performing their duties. The motion data and voice data detected during that time are acquired sequentially and stored in the training data storage unit 23. Separately, the association information is generated from the speech normally uttered when performing those duties. Machine learning, including the collection of training data, can therefore be performed efficiently.

[0057] Here, the motion data and speech content text data used to generate the first judgment model 13a are a set of motion data detected by the motion detection sensor 101 when a medical professional performs a task, and text data converted from the speech content actually uttered by the medical professional when performing that task, so there is a strong correlation. Furthermore, the association information set in the second judgment model 13b is generated based on the speech content that is normally uttered when performing a task, so there is also a strong correlation between the speech content associated by the association information and the action content and work content. Therefore, by identifying the action content and work content of a medical professional using the first judgment model 13a and the second judgment model 13b generated in this way, the action content and work content of a medical professional can be identified with high accuracy.

[0058] In the above embodiment, an example was described in which a posture detection sensor is used as the motion detection sensor 101, but the present invention is not limited to this. For example, a position detection sensor that detects the location where a medical professional is performing an action may also be used as the motion detection sensor. Various tasks performed by medical professionals include those performed while remaining in one place, such as at the bedside of a patient or person requiring care, and those performed while moving between multiple locations, such as the bedside and other places. By detecting not only the posture of the medical professional but also their location and utilizing this information to identify the content of the task, it becomes possible to identify the content of the medical professional's actions and tasks with higher accuracy. Here, by miniaturizing the motion detection sensor or using only the necessary sensors, the content of the actions and tasks can be identified without the medical professional having to wear a large assist device like those in the conventional technology.

[0059] In addition, an operation detection sensor may also be used as the motion detection sensor. Such a sensor detects operations on equipment used by medical professionals in their duties, for example, operating a suction device valve or a medical device's power switch. Detecting the equipment they operate as well as their posture, and using this information to identify their actions and tasks, makes it possible to identify those actions and tasks with greater accuracy.
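
One way to use posture, location ([0058]) and equipment operations ([0059]) together is to concatenate them into a single feature vector per time step before passing it to the judgment model. The sketch assumes posture is given as three angles, one-hot encodes a hypothetical set of locations, and flags a hypothetical set of equipment operations. None of these encodings is specified in the text.

```python
# Hypothetical vocabularies; a real deployment would define its own.
LOCATIONS = ["bedside", "nurse station", "corridor"]
EQUIPMENT = ["suction valve", "device power switch"]


def feature_vector(posture, location, operated):
    """posture: three angles in degrees (assumed encoding);
    location: one of LOCATIONS;
    operated: set of equipment operations detected in the same time step."""
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location}")
    location_onehot = [1.0 if location == name else 0.0 for name in LOCATIONS]
    operation_flags = [1.0 if name in operated else 0.0 for name in EQUIPMENT]
    return [float(x) for x in posture] + location_onehot + operation_flags


vec = feature_vector((30.0, 0.0, 90.0), "bedside", {"suction valve"})
```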

[0060] Furthermore, in the above embodiment, when voice data is collected as training data for machine learning, it is not confirmed which medical professional is speaking, but the invention is not limited to this. For example, a device that plays back the collected voice may be provided so that the medical professional or a voice recognition device can confirm the output voice. This ensures that machine learning is based correctly on the professional's own voice data.

[0061] A device may also be provided that converts the utterance information output from the first judgment model 13a into speech and outputs it. The medical professional listens to this output and gives the first judgment model 13a spoken feedback indicating whether the judgment result is correct or incorrect. In this reinforcement learning approach, the feedback serves as a reward indicating whether the machine learning is proceeding correctly. It also makes it possible to gauge the degree of learning (precision and recall) for each medical professional. Spoken output may be unsuitable at times such as night, when patients are asleep. The utterance information output from the first judgment model 13a may therefore instead be output as a list (such as a text file or CSV file). Likewise, the reward information indicating correctness or incorrectness may be fed back to the first judgment model 13a as a list.
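
Given feedback collected as a list, the precision and recall mentioned in [0061] can be computed for each utterance. The sketch assumes a hypothetical CSV layout with two columns. The predicted column holds the first judgment model's output and is empty if nothing was output. The actual column holds the utterance the professional confirms was correct. The sketch computes only the metrics and does not implement the reinforcement learning update itself.

```python
import csv
import io


def precision_recall(feedback_csv, label):
    """Per-label precision and recall from CSV text with columns
    'predicted' and 'actual' (hypothetical layout)."""
    true_pos = predicted = actual = 0
    for row in csv.DictReader(io.StringIO(feedback_csv)):
        if row["predicted"] == label:
            predicted += 1
        if row["actual"] == label:
            actual += 1
        if row["predicted"] == label and row["actual"] == label:
            true_pos += 1
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


FEEDBACK = """predicted,actual
starting suction,starting suction
starting suction,changing position
changing position,starting suction
starting suction,starting suction
,starting suction
"""
p, r = precision_recall(FEEDBACK, "starting suction")
```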

[0062] Furthermore, while the above embodiment performs judgments for each healthcare professional individually, the invention is not limited to this. For example, a newly assigned healthcare professional may not yet have a complete judgment model. In that case, the judgment models of several other professionals may be copied and used to output information indicating actions and tasks from the new professional's motion data. When a copied model makes an incorrect judgment, it is trained to recognize the judgment as wrong. This makes it possible to build a judgment model for the newly assigned professional in a short period of time.
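
The text leaves open how a copied model is trained when it makes an incorrect judgment. The sketch below shows one hypothetical policy using a nearest-neighbour model seeded with examples copied from other professionals. When a judgment is marked incorrect, the borrowed example that produced it is dropped. If the correct utterance is known, the new professional's own example is added in its place. BootstrappedModel and its methods are illustrative names only.

```python
import math


class BootstrappedModel:
    """Nearest-neighbour model seeded with examples copied from other workers."""

    def __init__(self, borrowed_examples):
        # Each example: (feature vector, utterance text).
        self.examples = list(borrowed_examples)

    def predict(self, features):
        """Return (index, utterance) of the nearest stored example."""
        idx = min(range(len(self.examples)),
                  key=lambda i: math.dist(self.examples[i][0], features))
        return idx, self.examples[idx][1]

    def feedback(self, features, correct, true_utterance=None):
        """Apply correct/incorrect feedback for one judgment."""
        if correct:
            return
        idx, _ = self.predict(features)
        # Wrong judgment: drop the example that caused it and, if the true
        # utterance is known, store the new worker's own example instead.
        del self.examples[idx]
        if true_utterance is not None:
            self.examples.append((list(features), true_utterance))


model = BootstrappedModel([([0.0, 0.0], "starting suction"),
                           ([5.0, 5.0], "changing position")])
before = model.predict([1.0, 1.0])[1]
model.feedback([1.0, 1.0], correct=False, true_utterance="changing position")
after = model.predict([1.0, 1.0])[1]
```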

[0063] The embodiments above describe examples of identifying both the actions and the duties of medical professionals, but the present invention is not limited to this. For example, only the individual actions performed when carrying out duties may be identified, without identifying the duties themselves. Alternatively, the duties may be identified directly without identifying the individual actions.

[0064] Furthermore, the above embodiments are merely examples of how the present invention may be implemented, and the technical scope of the invention should not be interpreted as being limited by them. The present invention can be implemented in various ways without departing from its spirit or its main features.

[Explanation of symbols]

[0065]
10 Work identification device
11 Motion data acquisition unit
12 Work determination unit
13 Judgment model
13a First judgment model
13b Second judgment model
20 Model generation device
21 Training data acquisition unit
21a Motion data acquisition unit
21b Voice data acquisition unit
21c Text conversion unit
22 Judgment model generation unit
22a First judgment model generation unit
22b Association information generation unit
23 Learning data storage unit
101 Motion detection sensor
102 Voice detection sensor

Claims

1. A work identification device for medical professionals, comprising:
a motion data acquisition unit that acquires motion data detected by a motion detection sensor that detects actions of a medical professional other than speech; and
a task determination unit that inputs the motion data acquired by the motion data acquisition unit into the judgment model trained for the target medical professional, selected from judgment models generated for each medical professional by machine learning processing using training data, and determines at least one of the tasks performed by the medical professional and the content of the actions included in those tasks;
wherein the judgment model includes:
a first judgment model that takes as input motion data detected by the motion detection sensor for the medical professional and outputs information representing the speech content corresponding to that motion data; and
a second judgment model that takes as input the information representing the speech content output from the first judgment model and outputs information indicating at least one of the work content and the action content associated in advance with that speech content;
the first judgment model is generated by machine learning processing using, as training data, motion data detected by the motion detection sensor for the medical professional and text data of the speech content converted from voice data detected by a voice detection sensor for the medical professional; and
the second judgment model is configured to output information indicating at least one of the work content and the action content based on association information in which the speech content of the voice data is associated in advance with at least one of the work content and the action content.

2. The work identification device for medical professionals according to claim 1, wherein the motion detection sensor includes a posture detection sensor that detects the posture of the medical professional.

3. The work identification device for medical professionals according to claim 2, wherein the motion detection sensor further includes a position detection sensor that detects the location at which the medical professional is performing an action.

4. The work identification device for medical professionals according to claim 2, wherein the motion detection sensor further includes an operation detection sensor that detects operations on equipment used by the medical professional when performing their duties.

5. A method for generating a judgment model, comprising:
a first step in which a judgment model generation unit of a model generation device receives, as training data, motion data detected by a motion detection sensor that detects actions of medical personnel other than speech, and text data of speech content converted from voice data detected by a voice detection sensor for the medical personnel; and
a second step in which the judgment model generation unit performs machine learning using the training data received in the first step to generate a judgment model that, when motion data is input, outputs information representing the speech content corresponding to that motion data;
wherein the judgment model generation unit generates the judgment model for each medical professional using the training data prepared for that medical professional.

6. The method for generating a judgment model according to claim 5, characterized in that the judgment model generation unit performs machine learning using as a dataset the text data of the utterance content corresponding to the voice data detected at a certain timing and a series of action data detected during a period corresponding to the time required for the work identified based on the certain timing.