[0016] The following describes the embodiments of the present invention in detail. Examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals indicate the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary, and are intended to explain the present invention, but should not be construed as limiting the present invention.
[0017] The following describes the method and device for searching a target person in a video in an embodiment of the present invention with reference to the accompanying drawings.
[0018] FIG. 1 is a flowchart of a method for searching for a target person in a video according to an embodiment of the present invention.
[0019] As shown in FIG. 1, the method for searching for a target person in a video may include:
[0020] S1: Receive target person information and video to be checked.
[0021] Here, the target person information may be the name of the target person, a photo of the target person, or a combination of the two. The file format of the video to be checked may include mp4, avi, rm, rmvb, flv, and other video formats.
[0022] For example, if the reviewer wants to check whether a certain mp4 video contains the sensitive persons Zhang xx and Wang xx, the reviewer can directly input the names of Zhang xx and Wang xx as the search condition. If the reviewer does not know the names of the sensitive persons, the reviewer can input photos corresponding to the sensitive persons Zhang xx and Wang xx as the search condition.
[0023] S2: Determine a template corresponding to the target person according to the target person information.
[0024] Specifically, when the target person information is a name, the template corresponding to the name can be obtained from a pre-established template library, and the template can then be displayed. After viewing the displayed template, the reviewer can select the template corresponding to the target person. For example, after the reviewer enters the target person's name Zhang xx, there may be many people named Zhang xx. In this case, all templates named Zhang xx can be retrieved from the template library and provided to the reviewer. Each template contains sample pictures of the person from multiple angles, and the reviewer selects and confirms the template corresponding to the target person Zhang xx, so the problem of duplicate names can be avoided.
[0025] In addition, when the target person information is a photo, a first similarity between the photo and the templates in the template library can be calculated, and the template corresponding to the target person can then be obtained according to the first similarity. For example, the reviewer inputs a photo of the target person, and the photo is matched against the templates in the template library. Since each template contains sample pictures of a person from multiple angles, the similarity between the photo and each sample picture can be calculated in turn. The template corresponding to the sample picture with the highest similarity score, that is, the sample picture most similar to the photo, is taken as the template of the target person.
[0026] The template library is established in advance. Specifically, sample pictures of a person from multiple angles can be obtained, and the template library can be established based on these multi-angle sample pictures. For example, for a sensitive person Zhang xx, sample pictures from multiple angles, such as the front view, the left and right sides at 45 degrees, the left and right sides at 30 degrees, looking up, and looking down, can be obtained. Based on these sample pictures, multi-angle modeling of the sensitive person Zhang xx can be performed, and the template corresponding to Zhang xx is generated and saved in the template library.
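The multi-angle template library described above could be organized, for instance, as follows. This is a minimal Python sketch under stated assumptions: the angle names and the `embed()` stand-in are hypothetical, since the specification does not name a particular feature extractor; a real system would compute face embeddings with a trained face recognition model.

```python
# Illustrative sketch of the pre-established multi-angle template library.
# The embed() function below is a deterministic stand-in for a real
# face-embedding model; the angle labels are assumptions for illustration.

ANGLES = ["front", "left_45", "right_45", "left_30", "right_30", "up", "down"]

def embed(picture):
    """Stand-in for a face-embedding model: maps a sample picture name
    to a small feature vector (here faked from the string's bytes)."""
    h = sum(picture.encode())
    return [(h * k) % 97 / 97.0 for k in (3, 5, 7, 11)]

def build_template(name, sample_pictures):
    """A template holds one feature vector per sample angle of the person."""
    return {"name": name,
            "features": {angle: embed(pic) for angle, pic in sample_pictures.items()}}

def build_template_library(people):
    """people: {name: {angle: picture}} -> list of templates.
    A list (not a dict keyed by name) allows duplicate names, which the
    reviewer later disambiguates by inspecting the sample pictures."""
    return [build_template(name, pics) for name, pics in people.items()]

library = build_template_library({
    "Zhang xx": {a: f"zhang_{a}.jpg" for a in ANGLES},
    "Wang xx": {a: f"wang_{a}.jpg" for a in ANGLES},
})
```

Storing one feature vector per angle, rather than a single averaged vector, preserves the multi-angle matching behavior the embodiment relies on.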
[0027] S3: Divide the video to be checked into multiple key frame images.
[0028] Specifically, the video to be checked can be divided into multiple key frame images. In an embodiment of the present invention, the key frame images can be segmented according to the length of the video. For example, if the video is 60 seconds long, a key frame image can be captured every 1 or 2 seconds.
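One way the segmentation in S3 could be realized is to first compute the capture timestamps from the video length and sampling interval, then seek to each timestamp to grab a frame (e.g. with a video library). A minimal sketch of the timestamp computation, with the function name and interval values as illustrative assumptions:

```python
def keyframe_times(video_length_s, interval_s):
    """Return the capture timestamps (in seconds) for key frame images,
    taking one frame every interval_s seconds over the video length."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return list(range(interval_s, video_length_s + 1, interval_s))

# A 60-second video sampled every 2 seconds yields 30 key frame images;
# sampled every 1 second it yields 60.
times = keyframe_times(60, 2)
```

Knowing each key frame's timestamp up front is what later allows step S5 to report where in the video the target person appears.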
[0029] S4: Filter out multiple frame images containing human faces from multiple key frame images based on the face recognition algorithm.
[0030] After segmentation yields multiple key frame images, an algorithm based on the OpenCV face recognition library can be used to filter out the frame images containing human faces from the multiple key frame images, thereby discarding the frame images that do not contain human faces and improving recognition efficiency. Here, OpenCV is an open-source computer vision library that can be used for face recognition.
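The filtering in S4 amounts to keeping only the frames for which a face detector returns at least one bounding box. A minimal sketch, with the detector abstracted as a callback (a real implementation might plug in something like OpenCV's `CascadeClassifier.detectMultiScale`; the frame ids and detection results below are hypothetical):

```python
def filter_frames_with_faces(frames, detect_faces):
    """Keep only the frames in which the detector finds at least one face.
    detect_faces(frame) returns a list of bounding boxes (x, y, w, h)."""
    return [f for f in frames if len(detect_faces(f)) > 0]

# Hypothetical detector output keyed by frame id, standing in for a real
# face detector run on the decoded key frame images.
detections = {
    "f1": [(10, 10, 40, 40)],                     # one face
    "f2": [],                                     # no face -> discarded
    "f3": [(5, 8, 30, 30), (60, 12, 28, 28)],     # two faces
}
kept = filter_frames_with_faces(["f1", "f2", "f3"], lambda f: detections[f])
```

Discarding face-free frames here means the more expensive template matching in S5 only runs on a (typically much smaller) subset of frames, which is the efficiency gain the paragraph describes.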
[0031] S5: Determine, according to the template, which of the multiple frame images containing human faces contain the target person, and obtain time information of the frame images containing the target person in the video to be checked.
[0032] Specifically, a first image feature of a frame image containing a face and a second image feature of the template corresponding to the target person can be extracted, and a second similarity between the first image feature and the second image feature can then be calculated. When the second similarity is greater than a preset threshold, it can be determined that the frame image containing the human face is a frame image containing the target person. After that, the time information of the frame image containing the target person in the video to be checked can be obtained. For example, a certain video is 15 minutes long and is segmented into key frame images at one frame every 2 seconds, so the time corresponding to each key frame image is known: the first key frame image is at 2 seconds, the second at 4 seconds, and so on. Some of the key frame images contain human faces, and the frame images containing the target person Zhang xx are detected by the above method; it can then be determined that such a frame is, for instance, at 10 minutes and 20 seconds of the video.
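The matching in S5 can be sketched as a threshold test on a feature-similarity score, keyed by the per-frame timestamps from S3. The specification does not fix a similarity measure or threshold, so cosine similarity and the 0.8 cutoff below are illustrative assumptions, as are the feature vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def frames_with_target(face_frames, template_feature, threshold=0.8):
    """face_frames: [(time_in_seconds, feature_vector)].
    Return the timestamps whose frame feature exceeds the similarity
    threshold, i.e. the frames judged to contain the target person."""
    return [t for t, feat in face_frames
            if cosine_similarity(feat, template_feature) > threshold]

template = [1.0, 0.0, 0.0]          # hypothetical template feature
frames = [(2,   [0.99, 0.05, 0.0]),  # near-match at 2 s
          (4,   [0.0,  1.0,  0.0]),  # different person at 4 s
          (620, [0.97, 0.1,  0.05])] # near-match at 620 s = 10 min 20 s
hits = frames_with_target(frames, template)
```

Because each candidate frame carries its timestamp, the hit list directly yields the "time information" the step requires (620 s corresponding to 10 minutes 20 seconds in the worked example).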
[0033] According to the method for searching for a target person in a video of the embodiment of the present invention, the template corresponding to the target person is determined from the received target person information; the video to be checked is then divided into multiple key frame images; multiple frame images containing human faces are screened out from the key frame images based on a face recognition algorithm; the frame images containing the target person are determined from those frame images according to the template; and the time information of the frame images containing the target person in the video to be checked is obtained. A sensitive person in the video can thus be found quickly and easily, and recognition efficiency is improved.
[0034] FIG. 2 is a flowchart of a method for searching for a target person in a video according to another embodiment of the present invention.
[0035] As shown in FIG. 2, the method for searching for a target person in a video may include:
[0036] S1: Receive target person information and video to be checked.
[0037] S2: Determine a template corresponding to the target person according to the target person information.
[0038] S3: Divide the video to be checked into multiple key frame images.
[0039] S4: Filter out multiple frame images containing human faces from multiple key frame images based on the face recognition algorithm.
[0040] S5: Determine, according to the template, which of the multiple frame images containing human faces contain the target person, and obtain time information of the frame images containing the target person in the video to be checked.
[0041] It should be understood that steps S1 to S5 are consistent with the description of steps S1 to S5 in the previous embodiment, so they will not be repeated in this embodiment.
[0042] S6: Mask the face part in the frame image containing the target person.
[0043] After acquiring the frame image containing the target person and its time information in the video to be checked, the frame image can be processed; that is, the face part of the target person in the frame image is masked, for example by adding a mosaic to the face part. The face of the target person is thus blocked, preventing the target person from being shown when the video is played.
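The mosaic operation in S6 is typically implemented by replacing each small tile of the face region with its average value. A minimal self-contained sketch on a 2D grayscale image represented as a list of rows (the image data, region, and block size are illustrative; a production system would operate on real decoded frames):

```python
def mosaic(image, box, block=2):
    """Pixelate the region box = (x, y, w, h) of a 2D grayscale image
    (list of rows) by replacing each block-by-block tile with its mean,
    returning a new image and leaving the input untouched."""
    x, y, w, h = box
    out = [row[:] for row in image]
    for by in range(y, y + h, block):
        for bx in range(x, x + w, block):
            tile = [image[j][i]
                    for j in range(by, min(by + block, y + h))
                    for i in range(bx, min(bx + block, x + w))]
            mean = sum(tile) // len(tile)
            for j in range(by, min(by + block, y + h)):
                for i in range(bx, min(bx + block, x + w)):
                    out[j][i] = mean
    return out

# Tiny 4x4 "face region" with distinct pixel values, fully mosaicked.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
masked = mosaic(img, (0, 0, 4, 4), block=2)
```

After masking, each 2x2 tile holds a single averaged value, so the original detail inside the box is no longer recoverable at full resolution, which is exactly the shielding effect the step is after.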
[0044] According to the method for searching for a target person in a video of the embodiment of the present invention, after the frame image containing the target person and its time information in the video to be checked are acquired, the face part of the target person in the frame image is masked, thereby avoiding the situation in which a sensitive person is shown when the video is played.
[0045] FIG. 3 is a flowchart of a method for searching for a target person in a video according to another embodiment of the present invention.
[0046] As shown in FIG. 3, the method for searching for a target person in a video may include:
[0047] S1: Receive target person information and video to be checked.
[0048] S2: Determine a template corresponding to the target person according to the target person information.
[0049] S3: Divide the video to be checked into multiple key frame images.
[0050] S4: Filter out multiple frame images containing human faces from multiple key frame images based on the face recognition algorithm.
[0051] S5: Determine, according to the template, which of the multiple frame images containing human faces contain the target person, and obtain time information of the frame images containing the target person in the video to be checked.
[0052] S6: Mask the face part in the frame image containing the target person.
[0053] It should be understood that steps S1 to S6 are consistent with the description of steps S1 to S6 in the previous embodiment, so they will not be repeated in this embodiment.
[0054] S7: Recognize the text information in the frame image containing the target person based on OCR technology.
[0055] After masking the human face in the frame image containing the target person, the text information in the frame image can also be recognized based on OCR (Optical Character Recognition) technology. The purpose of this is that although the sensitive person's face has been masked, the name of the sensitive person may still appear; for example, the sensitive person may be attending a meeting with a name badge placed on the seat. Therefore, it is not enough to shield only the face of the sensitive person; the name needs further processing.
[0056] S8, using NLP natural language processing technology, and judging whether the text information is the name of the target person according to the preset name database.
[0057] Specifically, NLP (Natural Language Processing) technology can be used to process the recognized text information, and it can then be determined, according to a preset name library, whether the recognized text information is the name of the target person. For example, the preset name library may be a collection of the names of sensitive persons.
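The check in S8 reduces to normalizing the OCR output and testing membership in the preset name library. A minimal sketch, assuming a simple whitespace-and-punctuation normalization as the NLP step (the names, the normalization rules, and the library contents are illustrative; a real system might use a proper NLP toolkit for named-entity handling):

```python
# Hypothetical preset name library: a collection of sensitive persons' names.
SENSITIVE_NAMES = {"Zhang xx", "Wang xx"}

def normalize(text):
    """Minimal NLP stand-in: strip simple punctuation and collapse
    whitespace so that noisy OCR output like ' Zhang  xx. ' still
    matches an entry in the name library."""
    return " ".join(text.replace(".", " ").replace(",", " ").split())

def is_target_name(ocr_text, name_library=SENSITIVE_NAMES):
    """Judge whether a piece of recognized text is a target person's name."""
    return normalize(ocr_text) in name_library

ocr_results = ["Zhang  xx.", "Li xx", "Wang xx"]
hits = [t for t in ocr_results if is_target_name(t)]
```

Only the text spans that pass this check are then handed to the shielding step S9, so non-sensitive text in the frame (agenda items, venue names, and so on) is left untouched.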
[0058] S9: If the text information is the name of the target person, block the text information.
[0059] For example, when the face of the sensitive person Zhang xx in the frame image is blocked by a mosaic, the part showing the corresponding name badge should also be blocked, so as to prevent the sensitive information from being shown.
[0060] By shielding the name corresponding to the target person in the frame image containing the target person, the method for searching for a target person in a video of the embodiment of the present invention can further prevent sensitive information from being shown during playback.
[0061] In order to achieve the above objective, the present invention also provides a device for searching a target person in a video.
[0062] FIG. 4 is a first schematic structural diagram of a device for searching for a target person in a video according to an embodiment of the present invention.
[0063] As shown in FIG. 4, the device for searching for a target person in a video may include: a receiving module 110, a determining module 120, a segmentation module 130, a screening module 140, and an acquisition module 150.
[0064] The receiving module 110 is used for receiving target person information and video to be checked.
[0065] Here, the target person information may be the name of the target person, a photo of the target person, or a combination of the two. The file format of the video to be checked may include mp4, avi, rm, rmvb, flv, and other video formats.
[0066] For example, if the reviewer wants to check whether a certain mp4 video contains the sensitive persons Zhang xx and Wang xx, the reviewer can directly input the names of Zhang xx and Wang xx as the search condition. If the reviewer does not know the names of the sensitive persons, the reviewer can input photos corresponding to the sensitive persons Zhang xx and Wang xx as the search condition.
[0067] The determining module 120 is configured to determine a template corresponding to the target person according to the target person information. Specifically, when the target person information is a name, the template corresponding to the name can be obtained from a pre-established template library, and the template can then be displayed. After viewing the displayed template, the reviewer can select the template corresponding to the target person. For example, after the reviewer enters the target person's name Zhang xx, there may be many people named Zhang xx. In this case, all templates named Zhang xx can be retrieved from the template library and provided to the reviewer. Each template contains sample pictures of the person from multiple angles, and the reviewer selects and confirms the template corresponding to the target person Zhang xx, so the problem of duplicate names can be avoided.
[0068] In addition, when the target person information is a photo, a first similarity between the photo and the templates in the template library can be calculated, and the template corresponding to the target person can then be obtained according to the first similarity. For example, the reviewer inputs a photo of the target person, and the photo is matched against the templates in the template library. Since each template contains sample pictures of a person from multiple angles, the similarity between the photo and each sample picture can be calculated in turn. The template corresponding to the sample picture with the highest similarity score, that is, the sample picture most similar to the photo, is taken as the template of the target person.
[0069] The segmentation module 130 is configured to segment the video to be checked into multiple key frame images. In an embodiment of the present invention, the key frame images can be segmented according to the length of the video. For example, if the video is 60 seconds long, a key frame image can be captured every 1 or 2 seconds.
[0070] The screening module 140 is used for screening out multiple frame images containing human faces from the multiple key frame images based on a face recognition algorithm. After segmentation yields multiple key frame images, an algorithm based on the OpenCV face recognition library can be used to filter out the frame images containing human faces, thereby discarding the frame images that do not contain human faces and improving recognition efficiency. Here, OpenCV is an open-source computer vision library that can be used for face recognition.
[0071] The acquisition module 150 is configured to determine, according to the template, which of the multiple frame images containing human faces contain the target person, and to acquire time information of the frame images containing the target person in the video to be checked.
[0072] Specifically, a first image feature of a frame image containing a face and a second image feature of the template corresponding to the target person can be extracted, and a second similarity between the first image feature and the second image feature can then be calculated. When the second similarity is greater than a preset threshold, it can be determined that the frame image containing the human face is a frame image containing the target person. After that, the time information of the frame image containing the target person in the video to be checked can be obtained. For example, a certain video is 15 minutes long and is segmented into key frame images at one frame every 2 seconds, so the time corresponding to each key frame image is known: the first key frame image is at 2 seconds, the second at 4 seconds, and so on. Some of the key frame images contain human faces, and the frame images containing the target person Zhang xx are detected by the above method; it can then be determined that such a frame is, for instance, at 10 minutes and 20 seconds of the video.
[0073] In addition, as shown in FIG. 5, the device for searching for a target person in a video may further include an establishing module 160.
[0074] The establishing module 160 is used to obtain sample pictures of a person from multiple angles and build the template library based on these multi-angle sample pictures. For example, for a sensitive person Zhang xx, sample pictures from multiple angles, such as the front view, the left and right sides at 45 degrees, the left and right sides at 30 degrees, looking up, and looking down, can be obtained. Based on these sample pictures, multi-angle modeling of the sensitive person Zhang xx can be performed, and the template corresponding to Zhang xx is generated and saved in the template library.
[0075] The device for searching for a target person in a video of the embodiment of the present invention determines the template corresponding to the target person from the received target person information; divides the video to be checked into multiple key frame images; screens out multiple frame images containing human faces from the key frame images based on a face recognition algorithm; determines, according to the template, the frame images containing the target person; and obtains the time information of the frame images containing the target person in the video to be checked. A sensitive person in the video can thus be found quickly and easily, and recognition efficiency is improved.
[0076] FIG. 6 is a schematic structural diagram of a device for searching for a target person in a video according to another embodiment of the present invention.
[0077] As shown in FIG. 6, the device for searching for a target person in a video may include: a receiving module 110, a determining module 120, a segmentation module 130, a screening module 140, an acquisition module 150, an establishing module 160, and a shielding module 170.
[0078] Here, the receiving module 110, the determining module 120, the segmentation module 130, the screening module 140, the acquisition module 150, and the establishing module 160 are consistent with the description in the previous embodiment, and will not be repeated here.
[0079] The shielding module 170 is used for masking the human face in the frame image containing the target person. After the frame image containing the target person and its time information in the video to be checked are acquired, the frame image can be processed; that is, the face part of the target person in the frame image is masked, for example by adding a mosaic to the face part. The face of the target person is thus blocked, preventing the target person from being shown when the video is played.
[0080] After acquiring the frame image containing the target person and its time information in the video to be checked, the device for searching for a target person in a video of the embodiment of the present invention masks the face part of the target person in the frame image, thereby avoiding the situation in which a sensitive person is shown when the video is played.
[0081] FIG. 7 is a schematic structural diagram of a device for searching for a target person in a video according to another embodiment of the present invention.
[0082] As shown in FIG. 7, the device for searching for a target person in a video may include: a receiving module 110, a determining module 120, a segmentation module 130, a screening module 140, an acquisition module 150, an establishing module 160, a shielding module 170, a recognition module 180, and a judgment module 190.
[0083] Here, the receiving module 110, the determining module 120, the segmentation module 130, the screening module 140, the acquisition module 150, and the establishing module 160 are consistent with the description in the previous embodiment, and will not be repeated here.
[0084] The recognition module 180 is used to recognize the text information in the frame image containing the target person based on OCR technology. After the human face in the frame image containing the target person is masked, the text information in the frame image can be recognized based on OCR (Optical Character Recognition) technology. The purpose of this is that although the sensitive person's face has been masked, the name of the sensitive person may still appear; for example, the sensitive person may be attending a meeting with a name badge placed on the seat. Therefore, it is not enough to shield only the face of the sensitive person; the name needs further processing.
[0085] The judgment module 190 is used for processing the recognized text information using NLP (Natural Language Processing) technology and judging, according to a preset name library, whether the text information is the name of the target person. For example, the preset name library may be a collection of the names of sensitive persons.
[0086] The shielding module 170 is further used for shielding the text information when the text information is confirmed to be the name of the target person. For example, when the face of the sensitive person Zhang xx in the frame image is blocked by a mosaic, the part showing the corresponding name badge should also be blocked, so as to prevent the sensitive information from being shown.
[0087] By shielding the name text information corresponding to the target person in the frame image containing the target person, the device for searching for a target person in a video of the embodiment of the present invention can further prevent sensitive information from being shown during playback.
[0088] In the description of the present invention, it should be understood that the orientation or positional relationship indicated by terms such as "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", and "circumferential" is based on the orientation or positional relationship shown in the drawings, and is used only for convenience in describing the present invention and simplifying the description, rather than indicating or implying that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation; it therefore cannot be understood as limiting the present invention.
[0089] In addition, the terms "first" and "second" are only used for descriptive purposes, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Therefore, the features defined with "first" and "second" may explicitly or implicitly include at least one of the features. In the description of the present invention, "a plurality of" means at least two, such as two, three, etc., unless otherwise specifically defined.
[0090] In the present invention, unless otherwise clearly specified and limited, the terms "mounted", "connected", "coupled", "fixed", and the like should be understood in a broad sense; for example, a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection or an indirect connection through an intermediary; and it may be internal communication between two components or an interaction relationship between two components, unless otherwise expressly limited. For those of ordinary skill in the art, the specific meanings of the above terms in the present invention can be understood according to the specific circumstances.
[0091] In the present invention, unless otherwise clearly specified and limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediary. Moreover, the first feature being "on", "above", or "over" the second feature may mean that the first feature is directly above or obliquely above the second feature, or simply that the first feature is at a higher level than the second feature. The first feature being "under", "below", or "beneath" the second feature may mean that the first feature is directly below or obliquely below the second feature, or simply that the first feature is at a lower level than the second feature.
[0092] In the description of this specification, a description with reference to the terms "one embodiment", "some embodiments", "example", "specific example", "some examples", or the like means that a specific feature, structure, material, or characteristic described in conjunction with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Moreover, the described specific features, structures, materials, or characteristics can be combined in a suitable manner in any one or more embodiments or examples. In addition, without contradicting each other, those skilled in the art can combine different embodiments or examples described in this specification, as well as the features of those different embodiments or examples.
[0093] Although the embodiments of the present invention have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limiting the present invention. A person of ordinary skill in the art can make changes, modifications, replacements, and variations to the above embodiments within the scope of the present invention.