A method, system and computer program for naming a cluster of words and phrases extracted from a set of documents using a lexical database

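A database and document-set technique, applied in data processing applications, natural language data processing, and special-purpose data processing applications, that addresses problems such as document labels that are not very relevant to the document.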

Inactive Publication Date: 2006-05-17
VERITY INC

AI Technical Summary

Problems solved by technology

However, that invention only extracts words from the input document data and uses them as document labels, so the assigned label may not be very relevant to the document's content.
[0016] The methods described above show that, although many attempts have been made to extract a document's content and label the document, none of them uses text that conveys the document's key content. In practice, these methods merely extract text from the document and use that text as the label.



Examples


Embodiment Construction

[0044] Figure 1 is a block diagram of the computer workstation environment of the present invention. It shows a typical single-user computer workstation 10, such as a personal computer with associated peripheral equipment. The computer workstation 10 includes a processor 12 and a bus 14, which connects the processor 12 to the other components of the computer workstation 10 and carries communication between them. The computer workstation 10 usually includes a user interface connector 16, which connects the processor 12 through the bus 14 to one or more interface devices, such as a keyboard 18, a mouse 20, and another interface device 22. The interface device 22 can be any user interface device, such as a touch-sensitive screen or a digital input pen. The bus 14 connects a display device 24, such as a liquid crystal display or a conventional monitor, to the processor 12 through a display card 26, and the bus 14 also connects the processor 12 ...


Abstract

A method, system and computer program for naming clusters of words and phrases extracted from a set of documents using a lexical database. The method takes these clusters as input and generates appropriate cluster labels using the lexical database. The naming procedure uses the lexical database to find all possible senses of every word in the cluster, then augments each sense with semantically similar senses to form a corresponding definition vector. A word sense disambiguation step then finds the most relevant sense for each word. The definition vectors are grouped into clusters, each of which represents a concept. These concepts are ranked by their support, and finally a predetermined number of words and phrases are selected as labels from the definition vectors of the dominant concept, according to attributes in the lexical database.
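The steps in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the tiny in-memory lexicon (`SENSES`, `WORD_SENSES`), the overlap score, the greedy grouping with a `threshold`, and the choice of label words by frequency are all assumptions made for the example. In particular, the frequency ranking stands in for the "attributes in the lexical database" that the abstract mentions but does not specify here.

```python
from collections import Counter

# Hypothetical toy lexical database (an assumption for illustration only).
# Each sense maps to (gloss words, ids of semantically similar senses).
SENSES = {
    "bank.finance": ({"bank", "money", "institution", "finance"}, ["loan.finance"]),
    "bank.river": ({"bank", "river", "slope", "land"}, []),
    "loan.finance": ({"loan", "money", "lend", "finance"}, ["bank.finance"]),
    "interest.finance": ({"interest", "money", "charge", "finance"}, ["loan.finance"]),
    "interest.hobby": ({"interest", "hobby", "curiosity"}, []),
}
WORD_SENSES = {
    "bank": ["bank.finance", "bank.river"],
    "loan": ["loan.finance"],
    "interest": ["interest.finance", "interest.hobby"],
}


def definition_vector(sense):
    """Bag of gloss words for a sense, augmented with its similar senses."""
    words, related = SENSES[sense]
    vec = Counter(words)
    for other in related:
        vec.update(SENSES[other][0])
    return vec


def overlap(a, b):
    """Shared weight between two definition vectors (sum of minimum counts)."""
    return sum((a & b).values())


def disambiguate(cluster):
    """For each word, keep the sense best supported by the other words' senses."""
    chosen = {}
    for word in cluster:
        def score(sense, word=word):
            vec = definition_vector(sense)
            return sum(
                overlap(vec, definition_vector(s))
                for w in cluster if w != word
                for s in WORD_SENSES[w]
            )
        chosen[word] = max(WORD_SENSES[word], key=score)
    return chosen


def group_concepts(chosen, threshold=2):
    """Greedily merge overlapping definition vectors into concepts."""
    concepts = []
    for word, sense in chosen.items():
        vec = definition_vector(sense)
        for concept in concepts:
            if overlap(vec, concept["vector"]) >= threshold:
                concept["words"].append(word)
                concept["vector"] += vec
                break
        else:
            concepts.append({"words": [word], "vector": vec})
    return concepts


def name_cluster(cluster, n_labels=2):
    """Label a word cluster with the top words of its best-supported concept."""
    concepts = group_concepts(disambiguate(cluster))
    # Support is taken here as the number of cluster words in the concept.
    dominant = max(concepts, key=lambda c: len(c["words"]))
    ranked = sorted(dominant["vector"].items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n_labels]]


print(name_cluster(["bank", "loan", "interest"]))
```

In this toy lexicon, the financial senses of "bank" and "interest" win the disambiguation step because their definition vectors share words with the vector for "loan", so the label is drawn from a single finance-related concept. A real system would replace the toy dictionaries with lookups in an actual lexical database.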

Description

Technical field

[0001] The present invention relates to a method, system and computer program for naming clusters of words and phrases extracted from a set of documents using a lexical database, so that documents can be organized by automatically naming a set of documents. In particular, it relates to using a lexical database to name word and phrase clusters in a way that properly represents the essential meaning of those clusters.

Background

[0002] In general, a document is usually a combination of words, such as a report, a news article or a web page, or a combination of characters produced with a keyboard or typewriter. With advances in modern technology and the increasing reliability of computers, the number of documents generated by various software has grown significantly. Today, in large companies, hundreds of thousands of documents, or even more, have been gener...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/21, G06F17/30, G06F40/00
CPC: G06F16/35, G06F40/216, G06F40/247, G06F40/289, G06F40/30, Y10S707/99933
Inventor: 江昌·茂, 舒密特·坦克, 克莉丝蒂娜·庄, 路克·艾尔发
Owner: VERITY INC