Method for identifying Web named entities based on a statistical model
A named entity recognition technology, applied in computing, special data processing applications, instruments, etc., that addresses insufficient recognition accuracy and achieves the effects of optimizing computational complexity, improving recognition accuracy, and improving efficiency.
Examples
Embodiment 1
[0049] The present invention is a method for identifying named entities based on statistical models. It mainly preprocesses Web documents on web pages and provides a foundation for subsequent information extraction, machine translation, and question-answering systems.
[0050] Taking a recruitment website as an example, the present invention uses statistical models to identify named entities in recruitment information on the Web. The named entities in recruitment information are mainly of four types: location, time, organization, and position. The experimental identification process is shown in Figure 1. The experimental data in Table 1 of this example comes from Zhaolian recruitment webpages, covering six categories of recruitment webpages: computer, biomedicine, construction, environmental protection, machinery and chemical engineering, and secretarial. Entity extraction of position name, recruitment agency name, work location, and recruit...
Embodiment 2
[0077] The Web named entity recognition method based on the statistical model is the same as in Embodiment 1. The named entity feature extraction in step 2 of the present invention is explained further:
[0078] (1) The structural feature vector of a Web named entity is analyzed as follows:
[0079] Named entities in web pages are usually displayed with emphasis, and this can be exploited during recognition. For example, when a job name is displayed in a large red font, its display style clearly differs from that of the surrounding text. This emphasis is mainly used to highlight important information, and it also makes browsing more convenient for users.
[0080] First, the display style of the Web named entity on the web page is expressed as a feature vector.
[0081] Structural features refer to the display style of Web objects, and the Cascading Style Sheets (CSS) attributes of the Web are introduced to describe the structural cha...
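As a rough illustration of turning a display style into a feature vector, the following minimal sketch maps an inline CSS style string to three numbers. The choice of properties (font-size, font-weight, color), the 14px baseline, and the function names are assumptions made for this illustration; the text above does not specify which CSS attributes the invention uses or how they are encoded.

```python
from typing import Dict, List


def parse_inline_style(style: str) -> Dict[str, str]:
    """Split an inline style string such as 'color: red; font-size: 28px'."""
    props: Dict[str, str] = {}
    for decl in style.split(";"):
        if ":" in decl:
            name, value = decl.split(":", 1)
            props[name.strip().lower()] = value.strip().lower()
    return props


def font_size_px(value: str, default: float) -> float:
    """Read a pixel size like '28px'; fall back to the default otherwise."""
    if value.endswith("px"):
        try:
            return float(value[:-2])
        except ValueError:
            return default
    return default


def structural_features(style: str, base_font_px: float = 14.0) -> List[float]:
    """Hypothetical encoding: [relative font size, is bold, is non-black color]."""
    props = parse_inline_style(style)
    size = font_size_px(props.get("font-size", ""), base_font_px)
    weight = props.get("font-weight", "normal")
    is_bold = 1.0 if weight in {"bold", "bolder", "700", "800", "900"} else 0.0
    color = props.get("color", "")
    is_colored = 1.0 if color and color not in {"black", "#000", "#000000"} else 0.0
    return [size / base_font_px, is_bold, is_colored]


# A job title shown in a large red bold font versus plain body text.
title_vec = structural_features("font-size: 28px; color: red; font-weight: bold")
body_vec = structural_features("")
```

Under this encoding an emphasized job title and unstyled body text produce clearly different vectors, which is the property the paragraph above relies on.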
Embodiment 3
[0113] The Web named entity recognition method based on the statistical model is the same as in Embodiments 1 and 2. This test compares the effect of the MR-GHMM method when using multiple features versus a single feature:
[0114] The evaluation standard for recognition performance in the present invention is:
[0115] When comparing recognition performance on different entities, the present invention uses recall and precision as the evaluation standards, together with a measure that combines the two: F, the weighted harmonic mean of recall and precision.
[0116] (1) Precision equals the number of correct answers produced by the system divided by the number of all answers produced by the system.
[0117] (2) Recall equals the number of correct answers produced by the system divided by the number of all possible answers in the text (both those the system found and those it should have found but missed).
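The two definitions above can be sketched in a few lines. This is a minimal illustration, not the patent's evaluation code: the entity representation as (start, end, type) tuples, the function name, and the sample data are assumptions, and F is computed with the commonly used F-beta form (the weighted harmonic mean of precision and recall), which is assumed to be what the text intends by F.

```python
from typing import Set, Tuple

# Hypothetical representation of an entity: (start offset, end offset, type).
Entity = Tuple[int, int, str]


def evaluate(predicted: Set[Entity], gold: Set[Entity],
             beta: float = 1.0) -> Tuple[float, float, float]:
    """Return (precision, recall, F) for exact-match entity recognition."""
    correct = len(predicted & gold)  # answers that are both produced and right
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1.0 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Made-up example: four gold entities, three predictions, two of them correct.
gold = {(0, 2, "position"), (5, 7, "location"),
        (9, 12, "organization"), (14, 15, "time")}
predicted = {(0, 2, "position"), (5, 7, "location"), (9, 11, "organization")}

precision, recall, f1 = evaluate(predicted, gold)
```

With beta = 1 precision and recall are weighted equally; a beta above 1 gives more weight to recall, and a beta below 1 gives more weight to precision.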
[0118] ...