Method and system for extracting complex named entities from Web video pages
A named entity extraction technique for Web video pages, applied in the field of information extraction. It addresses three problems: existing algorithms cannot be applied directly, they are not suited to discovering complex named entities, and the named entities lack context information. The stated effect is improved extraction accuracy.
Embodiment Construction
[0049] The present invention will be described in further detail below in conjunction with the accompanying drawings.
[0050] The method of the invention is shown in Figure 1.
[0051] Step S100: for each Web video page in the Web video page set, extract the valid text information from the page. The valid text information extracted from a page forms its video text, and all video texts together form the training set.
[0052] The specific implementation of step S100 is as follows.
[0053] Step 110: set an information extraction template for each site.
[0054] For the vast majority of video websites, most webpages are generated by scripts or programs that read data through an interface provided by the database and then produce HTML pages in a fixed format. Therefore, within the same website, webpages with the same or similar semantic content usually also have the same or similar HTML syntactic structure.
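The per-site template idea can be illustrated with a minimal Python sketch. It is not the patent's implementation. The host name `video.example.com`, the field names, the class names, and the helper `extract_video_text` are all hypothetical, and matching on tag plus class attribute is only one assumed way to express a template. The sketch relies on the observation above that pages of one site share a fixed HTML structure, so a single template per site can locate the valid text.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical per-site templates: field name -> (tag, class attribute value).
# A real system would hold one entry per video site.
SITE_TEMPLATES = {
    "video.example.com": {
        "title": ("h1", "video-title"),
        "description": ("div", "video-desc"),
    },
}

# Elements without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TemplateExtractor(HTMLParser):
    """Collects text found inside the elements named by one site template."""

    def __init__(self, template):
        super().__init__()
        self.template = template
        self.fields = {name: [] for name in template}
        self.current = None  # field currently being captured, if any
        self.depth = 0       # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.current is not None:
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for name, (want_tag, want_class) in self.template.items():
            if tag == want_tag and want_class in classes:
                self.current, self.depth = name, 1
                return

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.current is None:
            return
        self.depth -= 1
        if self.depth == 0:
            self.current = None

    def handle_data(self, data):
        text = data.strip()
        if self.current is not None and text:
            self.fields[self.current].append(text)


def extract_video_text(url, html):
    """Return the video text for one page, or None if the site has no template."""
    template = SITE_TEMPLATES.get(urlparse(url).hostname)
    if template is None:
        return None
    parser = TemplateExtractor(template)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parts) for parts in parser.fields.values() if parts)
```

Applying `extract_video_text` to every page in the Web video page set and collecting the non-empty results would yield the set of video texts that step S100 calls the training set. Text outside the templated elements, such as advertising blocks, is ignored.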
[0055] Due to the particularity of the HTML webpage, the method of extracting the text of the webpage may ado...