Approximate collation device, approximate collation method, program, and recording medium

An approximate collation technology, applied in fields such as instruments, computing, and electrical digital data processing, that addresses two problems: keywords written with typos, omissions, or slightly different wording cannot be extracted, and quick extraction becomes difficult when there are many keywords. The effect is reduced processing time.

Inactive Publication Date: 2011-07-27
NIPPON TELEGRAPH & TELEPHONE CORP

AI Technical Summary

Problems solved by technology

[0006] 1. Because the input string is entered manually, it often contains typos and omissions, or uses expressions that differ slightly from the expected keywords;
[0007] 2. As the number of keywords increases, quick extraction becomes difficult, and it takes time to process a la



Examples


Example Embodiment

[0209] Example

[0210] (Example in Chinese)

[0211] The present invention is applicable not only to Japanese but also to other languages. As a Chinese example, consider extracting the keywords shown in Figure 25 from the input string shown in Figure 24.

[0212] With the conventional exact-match keyword extraction method, the following keyword can be extracted from sentence 1:

[0213] "Olympic Games"

[0214] ● However, in sentence 2,

[0215] "Olympic Games"

[0216] is expressed as

[0217] "Olympics"

[0218] In addition, "men's 100-meter breaststroke" and "men's 100-meter breaststroke" are written in slightly different forms.

[0219] ● In sentence 3, "gold medalist" is expressed as "golden boot winner".

[0220] Therefore, the keywords cannot be extracted from sentences 2 and 3.
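The failure of exact matching described above can be illustrated with a minimal Python sketch. The function name `extract_exact` and the English sentences are hypothetical stand-ins; the actual Chinese input of Figure 24 and the keywords of Figure 25 are not reproduced here.

```python
def extract_exact(text, keywords):
    """Return (position, keyword) for every exact occurrence of a keyword in text."""
    hits = []
    for kw in keywords:
        start = text.find(kw)
        while start != -1:
            hits.append((start, kw))
            # Continue searching one character further to catch later occurrences.
            start = text.find(kw, start + 1)
    return sorted(hits)


# Hypothetical keyword dictionary and sentences, assumed for illustration only.
keywords = ["Olympic Games", "gold medalist"]
sentence_1 = "The Olympic Games opened today."
sentence_2 = "The Olympics opened today."

print(extract_exact(sentence_1, keywords))  # the keyword is found
print(extract_exact(sentence_2, keywords))  # the shortened form is missed
```

Because the lookup requires the keyword to appear character for character, any abbreviated or misspelled form returns no hit, which is the limitation the invention targets.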

[0221] If the technology related to the present invention is used, even when there are a large number of keywords, the keywords can be extracted from the sentences 2 and 3 as exp...


Abstract

Keywords can be extracted at high speed even when the input character string contains misspelled words, words with omitted characters, or expressions that differ slightly from the expected keywords, and even when the number of keywords is large. A skip dictionary creating unit (10) creates a skip dictionary that contains the keywords to be extracted, as listed in the keyword dictionary, together with a set of deletion keywords formed by deleting at least one character at each character position of each keyword, and stores the skip dictionary in a skip dictionary storage unit (20). A keyword extracting unit (30) collates the input character string with the entries in the skip dictionary, extracts the given keywords and strings approximating them from the input character string, and outputs them along with their appearance positions.
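The skip dictionary idea in the abstract can be sketched roughly as follows. This is a simplified illustration, not the patented implementation: it assumes exactly one deleted character per deletion keyword, uses a plain hash lookup over substrings, and the names `build_skip_dictionary` and `extract_keywords` are hypothetical.

```python
from collections import defaultdict


def build_skip_dictionary(keywords):
    """Map each keyword and each one-character-deleted variant to its source keywords."""
    skip = defaultdict(set)
    for kw in keywords:
        skip[kw].add(kw)  # the keyword itself
        if len(kw) > 1:  # avoid registering the empty string
            for i in range(len(kw)):
                # Deletion keyword: kw with the character at position i removed.
                skip[kw[:i] + kw[i + 1:]].add(kw)
    return skip


def extract_keywords(text, skip):
    """Return (position, matched substring, original keyword) for every dictionary hit."""
    lengths = sorted({len(entry) for entry in skip}, reverse=True)
    hits = []
    for start in range(len(text)):
        for n in lengths:
            candidate = text[start:start + n]
            # Skip candidates cut short by the end of the text.
            if len(candidate) == n and candidate in skip:
                for kw in sorted(skip[candidate]):
                    hits.append((start, candidate, kw))
    return hits


# Hypothetical keywords and input, assumed for illustration only.
skip_dict = build_skip_dictionary(["Olympics", "gold medal"])
for hit in extract_keywords("He won a gold medl at the Olympcs.", skip_dict):
    print(hit)
```

Since each lookup is a hash probe whose cost does not grow with the number of keywords, the scan time depends mainly on the text length and the number of distinct entry lengths. Short deletion keywords can match unrelated text, so a practical system would presumably need some filtering that this sketch omits.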

Description

Technical field

[0001] The present invention relates to a technique for collating an input character string with predetermined keywords in order to extract those keywords from a text (input character string) written in natural language, and for outputting each matching keyword together with its appearance position.

Background art

[0002] <Keyword extraction>

[0003] Keyword extraction is the task of extracting keywords listed in advance, for example in a dictionary, from an input character string written in natural language.

[0004] For example, consider extracting keywords related to the Olympic Games from the input string shown in Figure 1. In this case, as shown in Figure 2, the keywords are extracted by checking whether each keyword in a manually compiled keyword set (hereinafter referred to as the keyword dictionary) is contained in the input character string.

[0005] However, there are the...


Application Information

IPC(8): G06F17/30, G06F17/21
CPC: G06F17/2735, G06F40/242
Inventor: 斋藤邦子, 今村贤治, 菊井玄一郎, 松尾义博
Owner NIPPON TELEGRAPH & TELEPHONE CORP