Category based, extensible and interactive system for document retrieval

A document retrieval and interactive search technology, applied in the field of search engines, that addresses problems such as classifiers ignoring the potential relationship between a document and multiple categories

Inactive
Publication Date: 2004-10-06
COGISUM INTERMEDIA
0 Cites, 74 Cited by

AI Technical Summary

Problems solved by technology

However, traditional classifiers are trained on a set of positive and negative instances and often produce binary outputs that ignore the potential relationship between a document and multiple categories.

Examples

Example 1

[0300] Example 1 - Search through multiple levels

[0301] If the requester enters the search term "headache", the system looks up the word in the dictionary 204 to ensure that the spelling is correct, and handles inflected endings and other issues. Next, the system checks the synonym table 206, and if any synonyms are found, the system expands the search to cover both terms. When all these preparatory steps are completed, the system looks up the word "headache" in the query word list 214 to see whether the term has been searched before. In this example, the term has been searched before, so "headache" is used as a query word, and the table 214 assigns query word number 2 to it.
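The preparatory steps in [0301] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the table contents, the synonym "cephalalgia", the trailing-"s" inflection rule, and the function name `prepare_query` are all assumptions. Only the term "headache" and its query word number 2 come from the example.

```python
# Hypothetical stand-ins for dictionary 204, synonym table 206 and query word list 214.
DICTIONARY = {"headache", "aspirin", "cephalalgia"}
SYNONYMS = {"headache": ["cephalalgia"]}      # the synonym itself is an assumption
QUERY_WORDS = {"headache": 2}                 # query word -> query word number


def prepare_query(term):
    """Normalize a search term, expand it with synonyms, and look up
    the query word number of each resulting term (None if never searched)."""
    word = term.strip().lower()
    # Crude inflection handling (an assumption): drop a trailing "s"
    # when only the shortened form is in the dictionary.
    if word not in DICTIONARY and word.endswith("s") and word[:-1] in DICTIONARY:
        word = word[:-1]
    if word not in DICTIONARY:
        raise ValueError(f"unknown or misspelled word: {term!r}")
    terms = [word] + SYNONYMS.get(word, [])
    return {t: QUERY_WORDS.get(t) for t in terms}


print(prepare_query("Headaches"))
# {'headache': 2, 'cephalalgia': None}
```

A term mapped to `None` has not been searched before and would need the handling described for Example 3.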

[0302] After recognizing the word and discovering that it has been searched before, the system consults the query link table 216 to find the URL numbers of all documents containing the word, and then looks those documents up in the URL table 218. Here, URL numbers 17 and 19 are found in the query link table 216.
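The two-table lookup in [0302] might look like the sketch below. Query word number 2 and URL numbers 17 and 19 come from the example; the record fields, the placeholder URLs and names, and the function name `documents_for` are hypothetical.

```python
# Query link table 216: query word number -> URL numbers of documents containing the word.
QUERY_LINKS = {2: [17, 19]}

# URL table 218: URL number -> document record (placeholder values, not from the patent).
URL_TABLE = {
    17: {"url": "http://example.com/doc17", "name": "Document 17"},
    19: {"url": "http://example.com/doc19", "name": "Document 19"},
}


def documents_for(query_word_number):
    """Return (URL number, record) pairs for every document linked to a query word."""
    url_numbers = QUERY_LINKS.get(query_word_number, [])
    return [(n, URL_TABLE[n]) for n in url_numbers if n in URL_TABLE]


for number, record in documents_for(2):
    print(number, record["url"], record["name"])
```

Keeping the link table separate from the URL table means a document's record is stored once, however many query words point to it.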

[0303] Correspo...

Example 2

[0305] Example 2 - Search only one level

[0306] Assume that the requester now enters the search term "aspirin". The system first looks the word up in the dictionary 204 and the synonym table 206, as described in Example 1, and deals with inflections and other issues. After completing all necessary checks, the system consults the query word list 214 and learns that "aspirin" has been searched before and has been assigned a query word number. The system then looks up that number in the query link table 216 and learns that only one document, assigned URL number 20, contains this word. According to the URL table 218, document 20 is assigned only to topic number 2, so no interaction with the requester is required. The URL address and document name of the single document are displayed so that the requester can decide whether to browse it.
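The decision in [0306], whether the requester has to be asked to choose topics, can be sketched as below. URL number 20 and topic number 2 come from the example. The query word number 5 for "aspirin", the topic numbers for documents 17 and 19, and the function name `resolve` are assumptions made only for illustration.

```python
def resolve(query_word_number, query_links, url_table):
    """Decide whether to show documents directly or ask the requester to pick topics."""
    url_numbers = query_links.get(query_word_number, [])
    # Collect the distinct topic numbers across all matching documents.
    topics = sorted({t for n in url_numbers for t in url_table[n]["topics"]})
    if len(topics) <= 1:
        # A single topic (or no match): no interaction is needed.
        return {"action": "show_documents", "documents": url_numbers}
    return {"action": "ask_topics", "topics": topics}


# Hypothetical table contents; only URL 20 -> topic 2 is taken from the example.
query_links = {5: [20], 2: [17, 19]}
url_table = {
    20: {"topics": [2]},
    17: {"topics": [1]},   # assumed topic numbers
    19: {"topics": [3]},   # assumed topic numbers
}

print(resolve(5, query_links, url_table))
# {'action': 'show_documents', 'documents': [20]}
print(resolve(2, query_links, url_table))
# {'action': 'ask_topics', 'topics': [1, 3]}
```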

[0307] Example 3 - The search term does not appear in the query word list

[0308] Suppose the reques...

Abstract

An integrated, automatic and open information retrieval system comprises a hybrid method based on linguistic and mathematical approaches for automatic text categorization. It solves the problems of conventional systems by combining an automatic content recognition technique with a self-learning hierarchical scheme of indexed categories. In response to a word submitted by a requestor, said system retrieves documents containing that word, analyzes the documents to determine their word-pair patterns, matches the document patterns to database patterns that are related to topics, and thereby assigns topics to each document. If the retrieved documents are assigned to more than one topic, a list of the document topics is presented to the requestor, and the requestor designates the relevant topics. The requestor is then granted access only to documents assigned to relevant topics. A knowledge database linking search terms to documents and documents to topics is established and maintained to speed future searches. Additionally, new strategies are presented to deal with different update frequencies of changed Web sites.
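The word-pair matching step can be illustrated with a small sketch. The abstract does not say how word pairs are formed or scored, so the choices here (adjacent-word pairs, a simple overlap count, a `min_overlap` threshold, and the sample topic patterns) are assumptions rather than the patented method.

```python
import re


def word_pairs(text):
    """Return the set of adjacent lowercase word pairs in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return set(zip(words, words[1:]))


def assign_topics(text, topic_patterns, min_overlap=1):
    """Assign every topic whose stored word-pair patterns overlap the document's pairs."""
    pairs = word_pairs(text)
    scores = {topic: len(pairs & patterns) for topic, patterns in topic_patterns.items()}
    return sorted(topic for topic, score in scores.items() if score >= min_overlap)


# Hypothetical database patterns related to topics.
topic_patterns = {
    "medicine": {("pain", "relief"), ("side", "effects")},
    "sports": {("football", "match")},
}

print(assign_topics("Aspirin gives pain relief.", topic_patterns))
# ['medicine']
```

Because a document can overlap the patterns of several topics, it can be assigned to more than one topic, which is the case that triggers the topic list presented to the requestor.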

Description

Technical field

[0001] The present invention generally relates to the field of information retrieval (IR) systems that can be accessed at high speed, and in particular to search engines for the Internet and/or corporate intranet domains, which use automatic text classification technology to retrieve accessible documents and to deliver search query results quickly in a network environment.

Background technique

[0002] As the amount of public information accessible through corporate networks, and especially through the Internet, continues to increase, so does the importance of helping people find, filter, and manage these resources. Since the network represents an early, dynamic and not yet well-standardized market, it contains a large amount of unorganized documents and text materials. In particular, because there are no grammatical rules that can be used to retrieve stored information, the Internet, as an open medium that anyone can freely...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, H04L12/56
CPC: G06F17/30873, G06F17/3071, H04W4/00, G06F16/355, G06F16/954
Inventor: 弗兰克·梅克, 迈克尔·维尔舍茨
Owner: COGISUM INTERMEDIA