Approximate collation device, approximate collation method, program, and recording medium

An approximate collation (matching) technology, applied in fields such as instruments, computing, and electrical digital data processing. It addresses the inability to extract keywords that contain typos or omissions or that are written with slightly different expressions, as well as the difficulty of fast extraction, and achieves high-speed keyword extraction.

Inactive Publication Date: 2011-07-27
NIPPON TELEGRAPH & TELEPHONE CORP

AI Technical Summary

Problems solved by technology

[0006] 1. Because the input string is written (entered) by hand, it often contains typos and omissions, or uses expressions slightly different from the expected keywords.
[0007] 2. As the number of keywords increases, fast extraction becomes difficult, and processing a large number of input strings takes time.
[0034] However, because the AC method can extract only character strings that exactly match the keywords, it shares the problems of the trie structure: keywords containing typos or omissions cannot be extracted, nor can keywords written with slightly different expressions.
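
The exact-match limitation described in [0034] can be illustrated with a minimal Python sketch. It is not the AC method or a trie, only a naive substring scan that has the same exact-match behavior; the function name `exact_match_extract` and the sample strings are hypothetical and do not come from the patent.

```python
def exact_match_extract(text, keywords):
    """Return sorted (position, keyword) pairs for every exact occurrence.

    Illustrative baseline only: a naive scan standing in for any
    exact-match extractor (trie, AC method, and so on).
    """
    hits = []
    for kw in keywords:
        if not kw:
            continue  # ignore empty keywords
        start = text.find(kw)
        while start != -1:
            hits.append((start, kw))
            start = text.find(kw, start + 1)
    return sorted(hits)


# The correctly spelled keyword is found ...
print(exact_match_extract("olympic games", ["olympic"]))  # [(0, 'olympic')]
# ... but a string with one omitted character is missed entirely.
print(exact_match_extract("olmpic games", ["olympic"]))   # []
```

Any extractor that only reports exact occurrences will miss the second input in the same way, however fast its lookup structure is.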

Embodiment

[0210] (Example in Chinese)

[0211] The present invention is applicable not only to Japanese but also to other languages. As an example in Chinese, consider extracting the keywords shown in Figure 25 from the input string shown in Figure 24.

[0212] With the conventional exact-match keyword extraction method, the following keyword can be extracted from sentence 1:

[0213] "Olympic Games"

[0214] ● However, in sentence 2,

[0215] "Olympic Games"

[0216] is expressed as

[0217] "Olympics"

[0218] and "men's 100-meter breaststroke" is likewise written with a slightly different expression.

[0219] ● In sentence 3, "gold medalist" is expressed as "golden boot winner".

[0220] Therefore, keywords cannot be extracted from sentences 2 and 3.

[0221] By using the technique of the present invention, keywords can be extracted from sentences 2 and 3 as described below even when there are a large number of keywords....

Abstract

Keywords can be extracted at high speed even when the input character string contains misspelled words, words with missing characters, or expressions slightly different from the expected keywords, and even when the number of keywords is large. A skip dictionary creating unit (10) creates a skip dictionary containing the keywords to be extracted, as listed in the keyword dictionary, together with a set of deletion keywords made by deleting at least one character at each character position of each keyword, and stores the skip dictionary in a skip dictionary storage unit (20). A keyword extracting unit (30) collates an input character string with the words in the skip dictionary, extracts the predefined keywords and strings approximating them from the input character string, and outputs them along with their positions of appearance.
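
The skip dictionary idea in the abstract can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented implementation: it deletes exactly one character per variant, stores the dictionary in a plain `dict`, and uses a leftmost-longest scan. The names `build_skip_dictionary` and `extract_keywords` are hypothetical.

```python
def build_skip_dictionary(keywords):
    """Map each keyword and each one-character-deleted variant to its source keywords.

    Assumption: only single-character deletions are generated.
    """
    skip = {}
    for kw in keywords:
        if not kw:
            continue  # ignore empty keywords
        skip.setdefault(kw, set()).add(kw)
        if len(kw) < 2:
            continue  # deleting from a 1-character keyword would yield ""
        for i in range(len(kw)):
            variant = kw[:i] + kw[i + 1:]
            skip.setdefault(variant, set()).add(kw)
    return skip


def extract_keywords(text, skip):
    """Return (position, matched_text, keyword) tuples, leftmost-longest first."""
    lengths = sorted({len(entry) for entry in skip}, reverse=True)
    hits = []
    pos = 0
    while pos < len(text):
        for n in lengths:
            cand = text[pos:pos + n]
            if len(cand) == n and cand in skip:
                for kw in sorted(skip[cand]):
                    hits.append((pos, cand, kw))
                pos += n  # continue after the matched span
                break
        else:
            pos += 1  # nothing matched at this position
    return hits


skip = build_skip_dictionary(["olympic"])
print(extract_keywords("the olmpic games", skip))  # [(4, 'olmpic', 'olympic')]
```

Because the variants are precomputed, each lookup at extraction time is a plain dictionary probe, so the cost per input position does not grow with the number of keywords. This sketch matches only inputs that drop one character from a keyword; how substituted or inserted characters are handled is not shown here.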

Description

Technical field

[0001] The present invention relates to a technique for extracting predetermined keywords from text written in natural language (an input character string) by comparing the input character string with the predetermined keywords and outputting each matching keyword together with its position of appearance.

Background technique

[0002] <Keyword extraction>

[0003] Keyword extraction is the task of extracting keywords listed in advance, for example in a dictionary, from an input character string written in natural language.

[0004] For example, consider extracting keywords related to the Olympic Games from the input string shown in Figure 1. In this case, as shown in Figure 2, the keywords are extracted by checking whether each keyword in a manually compiled keyword set (hereinafter referred to as the keyword dictionary) is contained in the input character string.

[0005] However, there are the...

Application Information

IPC(8): G06F17/30, G06F17/21
CPC: G06F17/2735, G06F40/242
Inventor: 斋藤邦子, 今村贤治, 菊井玄一郎, 松尾义博
Owner NIPPON TELEGRAPH & TELEPHONE CORP