Method and system for determining contribution degree of word in text

A technology for determining the contribution degree of words in text, applied in the field of information recognition, that addresses the problem that existing calculation methods are too simple to truly reflect the contribution of words.

Inactive Publication Date: 2011-06-01
BEIJING KINGSOFT OFFICE SOFTWARE INC
7 Cites, 27 Cited by

AI Technical Summary

Problems solved by technology

Existing methods for calculating contribution degree consider too few factors and cannot truly reflect the contribution of a word to the text.



Examples


Detailed Description of the Embodiments

[0097] As shown in Figure 1, a method for determining the contribution of a word in a text, provided by an embodiment of the present invention, includes:

[0098] S101. Acquire a first text, and select at least one word from the first text;

[0099] The embodiment of the present invention is used to determine the contribution of words in a text. For convenience of description, the term "first text" denotes the text data in this embodiment. The first text can be a single article, a group of text materials composed of multiple articles, or even an entire text library. It may also take various forms, for example a webpage or a paper; the present invention places no limit on this.

[0100] The word is not restricted; it may be any word in the first text.

[0101] It should be noted that, for the sake of processing speed and efficiency, after the first text is acquired, the first text may be preprocessed, including operations such as word segmentati...
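Step S101 and a preprocessing pass could be sketched as follows. This is only an illustration under stated assumptions: the regex tokenizer stands in for a real word segmenter (the excerpt does not say which one is used), and `preprocess` and `select_words` are hypothetical helper names, not taken from the patent.

```python
import re


def preprocess(first_text: str) -> list[str]:
    """Lowercase the text and split it into word tokens.

    Assumption: a simple regex stands in for word segmentation.
    """
    return re.findall(r"[a-z0-9]+", first_text.lower())


def select_words(tokens: list[str], k: int = 1) -> list[str]:
    """Select at least one word: here, the first k distinct tokens."""
    if k < 1:
        raise ValueError("k must be at least 1")
    selected: list[str] = []
    for tok in tokens:
        if tok not in selected:
            selected.append(tok)
        if len(selected) == k:
            break
    return selected


# Hypothetical first text, used only for illustration.
tokens = preprocess(
    "Search engines return many results; search results overwhelm users."
)
candidates = select_words(tokens, k=3)
```

Because the patent allows any word in the first text to be selected, the "first k distinct tokens" rule is just one arbitrary choice; any other selection rule would fit the same step.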


Abstract

An embodiment of the invention discloses a method and a system for determining the contribution degree of a word in a text. The method comprises: obtaining a first text and selecting at least one word from it; dividing the first text into at least one text segment; determining the positions and number of occurrences of the word in the text segments of the first text; and calculating the contribution degree of the word to the first text from the calculated parameters, which comprise those positions and occurrence counts. In the method provided by the embodiment, the contribution degree is calculated from the positions and number of occurrences of the word in the text segments of the first text, together with the length of the word. Compared with the existing term frequency-inverse document frequency (TF-IDF) measure, the method provided by the embodiment can truly reflect the contribution degree of the word.
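The pipeline described in the abstract can be sketched roughly as below. The abstract names the inputs (occurrence positions, occurrence counts, word length) but this excerpt does not give the actual formula, so the scoring in `contribution` is a hypothetical stand-in, as are the segment rule (one segment per line), the weights, and all function names.

```python
import re
from dataclasses import dataclass


@dataclass
class SegmentStats:
    segment_index: int    # index of the text segment
    positions: list[int]  # token offsets of the word within the segment
    count: int            # number of occurrences in the segment


def split_segments(text: str) -> list[list[str]]:
    """Divide the first text into segments (assumption: one per non-empty line)."""
    segments = []
    for line in text.split("\n"):
        tokens = re.findall(r"[a-z0-9]+", line.lower())
        if tokens:
            segments.append(tokens)
    return segments


def occurrence_stats(word: str, segments: list[list[str]]) -> list[SegmentStats]:
    """Record where and how often the word occurs in each segment."""
    stats = []
    for i, seg in enumerate(segments):
        positions = [j for j, tok in enumerate(seg) if tok == word]
        stats.append(SegmentStats(i, positions, len(positions)))
    return stats


def contribution(word: str, segments: list[list[str]],
                 position_weight: float = 1.0,
                 length_weight: float = 0.1) -> float:
    """Hypothetical score, NOT the patent's formula.

    Each occurrence adds 1 plus a bonus that is larger near the start of
    its segment; the total is then scaled up with the word's length.
    """
    score = 0.0
    for s in occurrence_stats(word, segments):
        seg_len = len(segments[s.segment_index])
        for p in s.positions:
            score += 1.0 + position_weight * (1.0 - p / seg_len)
    return score * (1.0 + length_weight * len(word))


# Hypothetical first text with three segments.
first_text = (
    "Contribution of words matters.\n"
    "Words appear in segments.\n"
    "Segments hold words."
)
segments = split_segments(first_text)
score_words = contribution("words", segments)
score_of = contribution("of", segments)
```

In this toy input, "words" occurs once in every segment and is longer than "of", so under the assumed scoring it receives the higher score; a word that never occurs scores zero.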

Description

Technical field

[0001] The invention relates to the field of information identification, and in particular to a method and system for determining the contribution of words in text.

Background

[0002] The development of the Internet and advances in information technology have sharply increased the amount of information available, making it difficult for people to find what they really need. Although search engines have solved this problem to some extent, they generally return a very large number of results, which makes it hard for users to find the required information. One solution to this problem is automatic text classification. One of the main characteristics and difficulties of automatic text classification is the high dimensionality of the feature space and the sparsity of document representation vectors. Finding an effective calculation method of ...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 张宇峰, 于亮, 王海洲
Owner: BEIJING KINGSOFT OFFICE SOFTWARE INC