Movie searching entity identification method based on CRF

A technology in the fields of entity recognition and film and television retrieval, applicable to special data processing applications, instruments, electronic digital data processing, and related areas. It addresses entity nesting, the shortness of film and television retrieval texts, and their lack of syntactic structure, and it achieves strong real-time performance and good entity recognition results.

Status: Inactive; Publication Date: 2018-11-06
SICHUAN CHANGHONG ELECTRIC CO LTD
Cites: 5; Cited by: 3

AI Technical Summary

Problems solved by technology

[0003] At present, English entity recognition technology has reached a high level; however, Chinese entity recognition is relatively difficult. This is mainly due to several characteristics of Chinese itself: (1) Chinese text has no explicit word boundary markers, so the concept of a word is relatively vague; (2) Chinese words are flexible and variable, and the same entity can have different meanings in different contexts; (3) entities are nested, especially in organization names; (4) Chinese uses many abbreviated expressions, and Chinese transliterations of English names are hard to recognize.
[0004] Most existing entity recognition algorithms are designed for long texts, whereas video retrieval texts are very short, lack a complete syntactic structure, and often contain ambiguous phrases that provide too little context. It is therefore very difficult to determine entity types accurately.

Examples

Embodiment

[0033] A CRF-based video retrieval entity recognition method, applied to video retrieval in this embodiment. The specific implementation steps are as follows:

[0034] Step S1: data collection. The data in this step fall into two parts:

[0035] (1) Film and television database data. In this embodiment, the training corpus is labeled automatically by matching against the film and television database. The database data are mainly crawled from multiple film and television websites using web crawler technology.

[0036] (2) User video retrieval text data. In this embodiment, these data are obtained from TV users' online video retrieval queries.
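
The two data sources meet when the user queries in (2) are labeled by matching against the database in (1), after which the abstract says the rough labels are corrected by hand. The patent does not give the matching rule, so the following is only a minimal sketch, assuming character-level BIO tags, a lexicon mapping database entity strings to hypothetical type labels (`TITLE`, `ACTOR`), forward maximum matching, and manual fixes expressed as a position-to-tag mapping.

```python
from typing import Dict, List, Optional

def rough_label(query: str, lexicon: Dict[str, str], max_len: int = 20) -> List[str]:
    """Character-level BIO tags for `query` by forward maximum matching.

    `lexicon` maps entity strings from the film and television database to a
    type label; the labels used here ("TITLE", "ACTOR") are hypothetical.
    """
    tags = ["O"] * len(query)
    i = 0
    while i < len(query):
        # Try the longest candidate first, so "无间道" wins over a shorter entry "无间".
        for length in range(min(max_len, len(query) - i), 0, -1):
            entity_type = lexicon.get(query[i:i + length])
            if entity_type is not None:
                tags[i] = "B-" + entity_type
                for j in range(i + 1, i + length):
                    tags[j] = "I-" + entity_type
                i += length
                break
        else:  # no dictionary entry starts at position i
            i += 1
    return tags

def apply_corrections(tags: List[str], fixes: Optional[Dict[int, str]] = None) -> List[str]:
    """Overlay manual fixes (position -> corrected tag) on the rough tags."""
    corrected = list(tags)  # leave the automatic labels untouched
    for pos, tag in (fixes or {}).items():
        corrected[pos] = tag
    return corrected
```

For example, with the lexicon `{"刘德华": "ACTOR", "无间道": "TITLE", "无间": "TITLE"}`, the query 播放刘德华的无间道 is tagged without human effort, while 看叶问 comes back all `O` because 叶问 is missing from the lexicon; that is exactly the kind of gap the manual correction pass is meant to close.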

[0037] Specifically, in this embodiment, the film and television database data in step S1 are collected mainly by using web crawler technology to crawl film and television data from multiple film and television websites. This step therefore mainly includes the following sub-steps:

[0038] S1...
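
The crawling sub-steps are cut off here, so the following is only a hypothetical sketch of what could follow crawling: merging records from several sites into one entity lexicon for automatic labeling. The record fields (`title`, `actors`, `directors`), the type labels, and the choice to set aside strings seen under more than one type for manual review are all assumptions, not details from the patent.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical mapping from crawled record fields to entity types.
FIELD_TYPES = {"title": "TITLE", "actors": "ACTOR", "directors": "DIRECTOR"}

def build_lexicon(records: Iterable[dict]) -> Tuple[Dict[str, str], Set[str]]:
    """Merge records crawled from several sites into one entity lexicon.

    Returns (lexicon, ambiguous): `lexicon` maps each normalized entity string
    to its single type; strings seen with more than one type go to
    `ambiguous` so they can be reviewed by hand instead of auto-labeled.
    """
    types_seen: Dict[str, Set[str]] = {}
    for record in records:
        for field, entity_type in FIELD_TYPES.items():
            value = record.get(field)
            if value is None:
                continue
            # A field may hold one string or a list of strings (e.g. a cast list).
            values: List[str] = [value] if isinstance(value, str) else list(value)
            for raw in values:
                name = " ".join(raw.split())  # collapse stray whitespace left by HTML
                if name:
                    types_seen.setdefault(name, set()).add(entity_type)
    lexicon = {n: next(iter(ts)) for n, ts in types_seen.items() if len(ts) == 1}
    ambiguous = {n for n, ts in types_seen.items() if len(ts) > 1}
    return lexicon, ambiguous
```

Duplicates across sites collapse naturally because entries are keyed by the normalized string; a person who both acts and directs (for example Jackie Chan) lands in `ambiguous` rather than receiving an arbitrary type.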

Abstract

The invention discloses a CRF-based film and television retrieval entity recognition method. Film and television retrieval text data are first rough-labeled automatically, and labeling of the training corpus is completed with manual correction; features are then extracted with a designed feature template, and entity recognition is performed with a CRF. The technologies involved include natural language interaction understanding, film and television corpus labeling and entity recognition, and web crawling. Because entity recognition in this method does not depend on a knowledge base, entities not recorded in any knowledge base can also be recognized; in the film and television retrieval domain, good recognition results are obtained for different entity types, and the method has strong real-time performance.
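
The abstract mentions a designed feature template without disclosing it. A minimal sketch, assuming a character-level model and a hypothetical template of character unigrams and bigrams in a window of two characters on each side (similar in spirit to CRF++ `%x[row,col]` macros). Using only surface characters and no lexicon lookups is consistent with the claim that recognition does not depend on a knowledge base, but the patent's actual template may differ.

```python
from typing import Dict, List, Tuple

PAD = "<PAD>"  # placeholder for positions outside the query

# Hypothetical feature template: each entry names a feature and lists the
# character offsets (relative to the current position) it combines.
TEMPLATE: List[Tuple[str, Tuple[int, ...]]] = [
    ("c-2", (-2,)), ("c-1", (-1,)), ("c0", (0,)), ("c+1", (1,)), ("c+2", (2,)),
    ("c-1c0", (-1, 0)),  # bigram: previous + current character
    ("c0c+1", (0, 1)),   # bigram: current + next character
]

def char_features(text: str, i: int) -> Dict[str, str]:
    """Instantiate every template entry at character position i."""
    feats: Dict[str, str] = {}
    for name, offsets in TEMPLATE:
        parts = []
        for off in offsets:
            j = i + off
            parts.append(text[j] if 0 <= j < len(text) else PAD)
        feats[name] = "|".join(parts)
    return feats

def sentence_features(text: str) -> List[Dict[str, str]]:
    """One feature dict per character, i.e. one CRF observation per position."""
    return [char_features(text, i) for i in range(len(text))]
```

A list of per-token feature dicts per query is the input shape used by toolkits such as sklearn-crfsuite; that toolkit is only an illustration and is not named in the patent.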

Description

Technical Field

[0001] The invention relates to the technical field of natural language processing, in particular to a CRF-based video retrieval entity recognition method.

Background

[0002] Named entity recognition refers to identifying named referents in text, including names of people, places, and organizations, as well as specific entities in some specialized domains. It is an important research direction in natural language processing and is widely used in engineering practice, for example in event detection, information retrieval, machine translation, and question answering systems.

[0003] At present, English entity recognition technology has reached a relatively high level; however, Chinese entity recognition is relatively difficult. This is mainly due to several characteristics of Chinese itself: (1) Chinese text has no explicit word boundary markers, and the concept of a word is relatively vague; (2) Chinese words are f...
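
Once a CRF has been trained, recognizing named referents reduces to finding the best tag sequence and reading entities off it. A minimal sketch of linear-chain Viterbi decoding followed by BIO-to-span conversion; the label set and all scores below are hypothetical, and in a trained CRF each per-position score would be the sum of learned weights of the features active at that position.

```python
from typing import Dict, List, Tuple

def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            labels: List[str]) -> List[str]:
    """Return the highest-scoring label sequence of a linear-chain model.

    emissions[i][y] scores label y at position i; transitions[(p, y)] scores
    label p followed by label y (pairs not listed score 0).
    """
    if not emissions:
        return []
    best = [{y: emissions[0][y] for y in labels}]  # best[i][y]: best path score ending in y
    back: List[Dict[str, str]] = []                 # back[i-1][y]: best predecessor of y at i
    for i in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = best[-1][prev] + transitions.get((prev, y), 0.0) + emissions[i][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):  # follow back-pointers from the last position
        path.append(pointers[path[-1]])
    return path[::-1]

def bio_to_spans(text: str, tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity text, type) pairs from character-level BIO tags."""
    spans: List[Tuple[str, str]] = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue  # still inside the current entity
        if start is not None:
            spans.append((text[start:i], etype))
            start, etype = None, None
        if tag != "O":  # "B-X", or an orphan "I-X" treated leniently as a new entity
            start, etype = i, tag[2:]
    return spans
```

With emissions favoring `O, I-TITLE, I-TITLE` and a large negative transition score for `(O, I-TITLE)`, the decoder returns `B-TITLE, I-TITLE, I-TITLE`; without that penalty it returns the ill-formed `O, I-TITLE, I-TITLE`. Learned transition weights are what let a CRF avoid such outputs.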

Application Information

IPC (8): G06F17/27; G06F17/30
CPC: G06F40/295
Inventors: 杨兰, 孙锐, 展华益, 王欣, 赵亮, 谭斌, 许洛
Owner SICHUAN CHANGHONG ELECTRIC CO LTD