Method and device for identifying information in non-structured text

A technology for unstructured text, applied in fields such as special data processing applications, instruments, and electrical digital data processing, that addresses problems such as the inability to map extracted terms to old (known) key terms and the inability to achieve comprehensive information recognition results.

Active Publication Date: 2012-09-26
数据堂(北京)科技股份有限公司
4 Cites, 20 Cited by

AI Technical Summary

Problems solved by technology

With this method, new key terms can be found, but they cannot be mapped to old (known) key terms. That is, a known key-term dictionary cannot be used to map terms in unstructured text onto known key terms when those terms have no synonym relationship with the known key terms.
[0010] Therefore, none of the existing methods can use a preset dictionary to extract the corresponding name from unstructured text that contains neither the name nor any of its synonyms, and so none achieves the purpose of information recognition.
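
The limitation described in [0010] can be illustrated with a minimal sketch of dictionary-only lookup. The dictionary contents, the function name `dictionary_lookup`, and the sample sentences are hypothetical and are not taken from the patent; they only show that literal matching of names and synonyms finds nothing when the text describes a service without naming it.

```python
from __future__ import annotations

# Hypothetical dictionary: canonical name -> known synonym forms (illustrative data only).
DICTIONARY = {
    "caller ring-back tone": ["ring-back tone", "color ring"],
    "mobile newspaper": ["phone newspaper"],
}


def dictionary_lookup(text: str) -> list[str]:
    """Return canonical names whose name or synonym literally occurs in the text."""
    lowered = text.lower()
    found = []
    for name, synonyms in DICTIONARY.items():
        if any(form in lowered for form in [name, *synonyms]):
            found.append(name)
    return found


# Text that contains a known synonym: the lookup succeeds.
with_name = "Please cancel my ring-back tone subscription."
# Text that describes the same service without any name or synonym: the lookup finds nothing.
without_name = "Please cancel the song that callers hear while waiting for me."

print(dictionary_lookup(with_name))
print(dictionary_lookup(without_name))
```

The second sentence is the case that the existing methods cannot handle with a preset dictionary alone.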

Examples

Example 1

[0060] Figure 1 is a schematic block diagram showing the apparatus 1000 for identifying business information in unstructured text according to the first embodiment of the present invention. Figure 1A is a flowchart showing the overall operation of the business information identification device 1000 according to the first embodiment of the present invention.

[0061] As shown in Figure 1, the business information identification device 1000 according to the first embodiment of the present invention includes: a storage unit 1100, a basic business term extraction unit 1200, a business term extraction rule generation unit 1300, a business term extraction unit 1400 and a business term mapping unit 1500.

[0062] The storage unit 1100 stores a business dictionary 1110, a basic business term base 1120 and a business term extraction rule base 1130. The business dictionary 1110 is used to store various business names and synonym forms of service providers (in ...
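
As a rough illustration of the three stores named in [0062], the sketch below models them as plain Python containers. The class name `StorageUnit`, the field layouts, and the helper methods are assumptions made for illustration; the excerpt does not say how the device actually organizes these stores.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StorageUnit:
    """Sketch of storage unit 1100; the field layouts are assumptions."""

    # Business dictionary 1110: business name -> its synonym forms.
    business_dictionary: dict[str, set[str]] = field(default_factory=dict)
    # Basic business term base 1120: terms found via the dictionary.
    basic_term_base: set[str] = field(default_factory=set)
    # Business term extraction rule base 1130: rules kept here as plain strings.
    extraction_rule_base: list[str] = field(default_factory=list)

    def add_business(self, name: str, synonyms=()) -> None:
        """Register a business name together with any synonym forms."""
        self.business_dictionary.setdefault(name, set()).update(synonyms)

    def surface_forms(self) -> dict[str, str]:
        """Map every name or synonym form back to its business name."""
        forms = {}
        for name, synonyms in self.business_dictionary.items():
            forms[name] = name
            for synonym in synonyms:
                forms[synonym] = name
        return forms


# Hypothetical entry, for illustration only.
store = StorageUnit()
store.add_business("caller ring-back tone", ["color ring"])
```

Keeping names and synonyms in one dictionary lets a single lookup table resolve either form to the same business name.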

Example 2

[0237] Figure 8 shows a schematic block diagram of an apparatus 8000 for identifying business information in unstructured text according to a second embodiment of the present invention. Figure 8A is a flowchart showing the overall operation of the business information identification apparatus 8000 according to the second embodiment of the present invention.

[0238] Elements in Figure 8 that are the same as in Figure 1 are indicated by the same reference numerals, and steps in Figure 8A that are the same as in Figure 1A are represented by the same reference numerals. For their detailed description, refer to the foregoing content, which is not repeated here for brevity. The main difference between the business information identification device 8000 of Figure 8 and the business information identification apparatus 1000 shown in Figure 1 is that a basic business term extension unit 8600 is introduced. The business operation flow of Figure 8A and Figure ...

Example 3

[0243] Figure 9 shows a schematic block diagram of an apparatus 9000 for identifying business information in unstructured text according to a third embodiment of the present invention. The third embodiment can be combined with the first embodiment or the second embodiment, and is mainly used to handle the case in which the business term extraction unit 1400 fails to extract a new business term from the input unstructured text 1 based on the basic business terms and the business term extraction rules. The following takes the first embodiment as an example for description. Units in Figure 9 that are the same as in Figure 1 are represented by the same reference numerals. For their detailed description, refer to the foregoing content, which is not repeated here for brevity. The difference between the business information identification device 9000 of Figure 9 and the Figure 1 business information recognition apparatus 10...

Abstract

The invention provides a method and device for identifying information in unstructured text, which can process unstructured text whether or not it contains a name or a synonym form of a name. The device for identifying information comprises a basic term extraction unit, a term extraction rule generation unit, a term extraction unit and a term mapping unit. The basic term extraction unit extracts, according to a dictionary, the names contained in a first unstructured text as basic terms. The term extraction rule generation unit generates term extraction rules from the extracted basic terms and the first unstructured text. The term extraction unit extracts new terms from a second unstructured text according to the extracted basic terms and the generated term extraction rules. The term mapping unit maps each extracted term to the most appropriate name in the dictionary as the identified information.
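
The four units in the abstract can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the patented method: the rule form (the word preceding a basic term becomes a left-context cue) and the mapping criterion (string similarity via `difflib.SequenceMatcher`) are stand-ins chosen for illustration, and all function names and sample texts are hypothetical.

```python
from __future__ import annotations

import re
from difflib import SequenceMatcher


def extract_basic_terms(text: str, names: list[str]) -> list[str]:
    """Basic term extraction: dictionary names that literally occur in the text."""
    return [name for name in names if name in text]


def generate_rules(basic_terms: list[str], text: str) -> list[tuple[str, int]]:
    """Rule generation (assumed form): the word before each basic term becomes a
    left-context cue, paired with the term's word count as a capture length."""
    rules = []
    for term in basic_terms:
        for match in re.finditer(r"(\w+)\s+" + re.escape(term), text):
            rules.append((match.group(1), len(term.split())))
    return rules


def extract_terms(text: str, basic_terms: list[str], rules: list[tuple[str, int]]) -> list[str]:
    """Term extraction: known basic terms plus the words that follow a learned cue."""
    terms = [term for term in basic_terms if term in text]
    for cue, n_words in rules:
        pattern = r"\b" + re.escape(cue) + r"\s+(\w+(?:\s+\w+){0,%d})" % (n_words - 1)
        for match in re.finditer(pattern, text):
            if match.group(1) not in terms:
                terms.append(match.group(1))
    return terms


def map_term(term: str, names: list[str]) -> str:
    """Term mapping (assumed criterion): pick the most similar dictionary name."""
    return max(names, key=lambda name: SequenceMatcher(None, term, name).ratio())


def identify(first_text: str, second_text: str, names: list[str]) -> dict[str, str]:
    """Run the four steps in order and return extracted term -> dictionary name."""
    basic_terms = extract_basic_terms(first_text, names)
    rules = generate_rules(basic_terms, first_text)
    terms = extract_terms(second_text, basic_terms, rules)
    return {term: map_term(term, names) for term in terms}


# Hypothetical dictionary names and texts, for illustration only.
names = ["data roaming", "voice mail"]
first_text = "I want to cancel data roaming on my account."
second_text = "Please cancel voicemail box today."
result = identify(first_text, second_text, names)
print(result)
```

In this toy run the second text contains no dictionary name, yet the cue learned from the first text still yields a candidate term that can be mapped to a dictionary name. A real system would need richer rules and a mapping criterion better than raw string similarity.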

Description

Technical field

[0001] The present invention relates to the field of natural language processing, and more specifically, to a method and device for identifying information in unstructured text, which can process not only unstructured text that contains names or synonyms, but also unstructured text that contains neither names nor synonym forms. In the present invention, the information to be recognized in the unstructured text may be services provided by service providers, products provided by product providers, patent terms, and/or keywords in related fields, and such information may be stored in a dictionary.

Background

[0002] Service providers such as telecom operators and banks usually need to process a large amount of unstructured text, such as customer complaints and consultations. These unstructured texts are all in the form of natural language, and often contain one or more services, which are customized by service providers for...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
Inventor: 丰强泽, 齐红威
Owner: 数据堂(北京)科技股份有限公司