Character data recognition and processing method and device
A character data recognition and processing technology, applied in the field of computer data retrieval, that addresses the problem of large recognition errors in character data, overcoming inaccurate recognition and improving recognition accuracy.
Examples
Embodiment 1
[0033] Referring to Figure 1, the method for character data recognition and processing according to the embodiment of the present invention mainly includes the following steps:
[0034] S11: Identify the characteristic character data according to the reference corpus and the reference template, and obtain the different entity names corresponding to each named entity;
[0035] S12: Obtain the characteristic suffix frequency of each entity name;
[0036] S13: Identify the character data to be processed according to the characteristic suffix frequency, the reference template and the predefined corpus, and obtain the different entity names corresponding to each named entity;
[0037] S14: Perform subsequent analysis and processing, using the entity names identified from the character data to be processed as data parameters.
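Steps S11 to S14 can be illustrated with a minimal Python sketch. It is not the patented implementation: the source does not specify how the corpus, the template, or the suffix frequency are represented, so everything below is an assumption made for illustration. The corpus is modeled as a dictionary of known names per entity type, the template as a regular expression of two Chinese characters followed by a suffix, the suffix as the last character of a name, and `min_count` as a hypothetical frequency threshold. All function and variable names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical reference corpus: known entity names per named-entity type.
REFERENCE_CORPUS = {
    "place": {"北京市", "上海市", "广东省"},
    "organization": {"清华大学"},
}

# Hypothetical reference template: two Chinese characters plus a suffix.
TEMPLATE = "[\u4e00-\u9fff]{{2}}{suffix}"


def identify_seed(text, corpus):
    """S11 (sketch): collect corpus names that occur in the text."""
    return {
        etype: {name for name in names if name in text}
        for etype, names in corpus.items()
    }


def suffix_frequencies(entities, suffix_len=1):
    """S12 (sketch): count how often each trailing substring ends a name."""
    return {
        etype: Counter(name[-suffix_len:] for name in names)
        for etype, names in entities.items()
    }


def identify_target(text, freq, corpus, min_count=2):
    """S13 (sketch): corpus matches plus template matches for frequent suffixes."""
    found = identify_seed(text, corpus)
    for etype, counter in freq.items():
        for suffix, count in counter.items():
            if count >= min_count:  # assumed threshold, not from the source
                pattern = TEMPLATE.format(suffix=re.escape(suffix))
                found.setdefault(etype, set()).update(re.findall(pattern, text))
    return found


def analyze(entities):
    """S14 (sketch): stand-in analysis counting distinct names per type."""
    return {etype: len(names) for etype, names in entities.items()}


seed_text = "北京市和上海市都很大，广东省也是。清华大学在北京市。"
seed = identify_seed(seed_text, REFERENCE_CORPUS)
freq = suffix_frequencies(seed)
result = identify_target("我去过天津市和重庆市", freq, REFERENCE_CORPUS)
summary = analyze(result)
```

In this toy run the suffix 市 ends two seed place names, so it passes the assumed threshold and lets the template pick up place names in the target text that are absent from the corpus. A real system would need a far less crude template than a fixed two-character window.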
Embodiment 2
[0040] The second embodiment of the method of the present invention is described below. The present invention can be applied to various kinds of character data, such as language symbol data (Chinese or other languages), mathematical symbol data, and logic symbol data, which are identified and processed in units of words or characters. This embodiment uses Chinese news comments as an example. Given a news web page as input, once the news title, the news text, and the related set of comments have been correctly extracted, the news text and each comment can be fed back, and corresponding data processing can be performed on the recognition results for person names, place names, and institution names in the database.
[0041] Embodiment 2 is illustrated using webpage text data as an example: named entity recognition is performed on the news data in the webpage text. The most important part is named entity recognition in news comments, which rec...
Embodiment 3
[0110] Figure 4 shows a structural diagram of the device of the present invention. As shown in Figure 4, the device for character data recognition and processing according to an embodiment of the present invention includes:
[0111] 1) A recognition unit 40, configured to identify the characteristic character data according to the reference corpus and the reference template to obtain the different entity names corresponding to each named entity, and to identify the character data to be processed to obtain the different entity names corresponding to each named entity;
[0112] 2) A statistical unit 41, configured to obtain the characteristic suffix frequency of each entity name identified by the recognition unit 40 from the characteristic character data;
[0113] 3) A processing unit 42, configured to perform subsequent analysis and processing, using the entity names identified from the character data to be processed as data parameters.
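The division of labor among units 40, 41 and 42 can be pictured as three small classes wired together. This is only a structural sketch under stated assumptions: the class names, method names, the `Device.run` data flow, and the whitespace-based `toy_recognizer` are all hypothetical stand-ins, since the source describes the units' roles but not their interfaces.

```python
from collections import Counter


class RecognitionUnit:
    """Unit 40 (sketch): wraps a recognizer callable, a hypothetical interface."""

    def __init__(self, recognizer):
        self.recognizer = recognizer

    def recognize(self, text, suffix_freq=None):
        return self.recognizer(text, suffix_freq)


class StatisticalUnit:
    """Unit 41 (sketch): counts trailing characters of recognized names."""

    def __init__(self, suffix_len=1):
        self.suffix_len = suffix_len

    def count(self, entities):
        return {
            etype: Counter(name[-self.suffix_len:] for name in names)
            for etype, names in entities.items()
        }


class ProcessingUnit:
    """Unit 42 (sketch): stand-in analysis that sorts names per type."""

    def process(self, entities):
        return {etype: sorted(names) for etype, names in entities.items()}


class Device:
    """Assumed wiring: 40 -> 41 -> 40 again -> 42."""

    def __init__(self, recognition, statistics, processing):
        self.recognition = recognition
        self.statistics = statistics
        self.processing = processing

    def run(self, characteristic_text, target_text):
        seed = self.recognition.recognize(characteristic_text)
        freq = self.statistics.count(seed)
        found = self.recognition.recognize(target_text, freq)
        return self.processing.process(found)


def toy_recognizer(text, suffix_freq):
    """Toy stand-in: whitespace tokens, a fixed name list, last-char suffixes."""
    words = text.split()
    if suffix_freq is None:
        known = {"北京市", "上海市"}
        return {"place": {w for w in words if w in known}}
    suffixes = set(suffix_freq.get("place", ()))
    return {"place": {w for w in words if w[-1:] in suffixes}}


device = Device(RecognitionUnit(toy_recognizer), StatisticalUnit(), ProcessingUnit())
output = device.run("北京市 上海市", "天津市 你好 重庆市")
```

Passing the recognizer in as a callable keeps the sketch neutral about how recognition is actually performed, which the excerpt leaves unspecified.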
[0114] Preferably, the identifica...