Method and system for searching target theme

A retrieval method and system for a target theme, applied in fields such as instruments, computing, and electrical digital data processing. It addresses the complexity, incompleteness, and poor accuracy of existing thematic clustering methods, and achieves high precision while expanding the content and scope of the retrieved results.

Active Publication Date: 2016-06-08
新方正控股发展有限责任公司 +1

AI Technical Summary

Problems solved by technology

[0006] For this reason, the technical problem to be solved by the present invention lies in the problems that the thematic clustering methods in the prio


Examples

Embodiment 1

[0031] A method for retrieving a target topic is provided in this embodiment, including the following steps:

[0032] S1: Determine the related words of the target topic by expanding the target topic word. Any prior-art method for expanding search keywords can be used here. This embodiment provides one way of calculating the related words of the target topic, as follows:

[0033] First, search the database for the target topic to obtain all hit sentences.

[0034] Then obtain the sentences related to each hit sentence from its surrounding context, for example the sentence immediately before it and the sentence immediately after it.

[0035] In other implementations, the two preceding sentences or the two following sentences may be obtained instead.

[0036] Next, segment the hit sentences and the related sentences into words.

[0037] Finally, count the word frequency after all w...
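The steps above (find hit sentences, take neighbouring sentences, segment, count word frequency) can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the function names are hypothetical, the sentence splitter is a simple punctuation rule, and `tokenize` is a lowercase regex stand-in for a real word segmenter.

```python
import re
from collections import Counter

def split_sentences(text):
    # Assumption: sentences end at ., !, ? or their CJK equivalents.
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [p.strip() for p in parts if p.strip()]

def tokenize(sentence):
    # Stand-in for a real word segmenter (hypothetical simplification).
    return re.findall(r"[a-z0-9]+", sentence.lower())

def window_sentences(sentences, topic, before=1, after=1):
    # For every sentence that mentions the topic, return it together
    # with `before` preceding and `after` following sentences.
    windows = []
    for i, sentence in enumerate(sentences):
        if topic.lower() in sentence.lower():
            lo = max(0, i - before)
            hi = min(len(sentences), i + after + 1)
            windows.append(sentences[lo:hi])
    return windows

def word_frequencies(sentences, topic, before=1, after=1):
    # Count word occurrences over all windows around hit sentences.
    counts = Counter()
    for window in window_sentences(sentences, topic, before, after):
        for sentence in window:
            counts.update(tokenize(sentence))
    return counts
```

Setting `before=2, after=0` or `before=0, after=2` corresponds to the alternative of taking the two preceding or the two following sentences. In this sketch a sentence shared by two overlapping windows is counted twice; whether that is intended is not stated in the text.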

Embodiment 2

[0049] This embodiment provides a retrieval method for a topic word: the topic word is used to obtain its related content, which can then be used in scenarios such as clustering and classification. The specific process, shown in Figure 2, is as follows:

[0050] 1. Establish a corpus containing full-text content.

[0051] 2. Use the topic word to perform a full-text search of the corpus.

[0052] 3. Extract each sentence containing a search hit together with the sentence before it and the sentence after it; these three sentences form the screening sentences.

[0053] 4. Use a tokenizer to segment all the screening sentences, sort the words by frequency in descending order, and take the first N words as related words.

[0054] 5. Search the text to be searched with each of these words in turn to obtain the search result set R1.

[0055] 6. Segment the subject words with a tokenizer to obtain several w...
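Steps 4 and 5 can be illustrated with the sketch below. It rests on several assumptions not stated in the text: R1 is taken to be the union of the per-word search results, documents are a plain dictionary of id to text, the optional `stopwords` filter is an addition for illustration, and `tokenize` is again a regex stand-in for a real tokenizer. All names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    # Stand-in for a real tokenizer (hypothetical simplification).
    return re.findall(r"[a-z0-9]+", text.lower())

def top_related_words(screening_sentences, n, stopwords=frozenset()):
    # Step 4: count word frequency over the screening sentences and
    # keep the n most frequent words. The stopword filter is an
    # assumption added here; the source does not mention one.
    counts = Counter()
    for sentence in screening_sentences:
        counts.update(w for w in tokenize(sentence) if w not in stopwords)
    return [word for word, _ in counts.most_common(n)]

def search(word, documents):
    # Return the ids of documents that contain `word` as a token.
    return {doc_id for doc_id, text in documents.items()
            if word in tokenize(text)}

def search_related(words, documents):
    # Step 5: search with each related word and collect the results.
    # Assumption: R1 is the union of the individual result sets.
    r1 = set()
    for word in words:
        r1 |= search(word, documents)
    return r1
```

A real system would replace `search` with a full-text index query; the linear scan here only keeps the example self-contained.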

Embodiment 3

[0060] This embodiment addresses only the problem of topic content aggregation. Starting from a topic word, some related words are expanded and used to search, giving the result R1; the topic word is also segmented, the resulting words are used to search, and the intersection is taken to give the result set R2. The two results R1 and R2 are then merged to generate a topic, which solves the problem of topic aggregation. The specific process is as follows:

[0061] 1. Perform a full-text search of the corpus using the topic word specified by the user.

[0062] 2. For each hit sentence, use a window to take the hit sentence together with one sentence before it and one sentence after it, three sentences in total.

[0063] 3. Segment the three sentences into words.

[0064] 4. Process all hit sentences according to steps 2 and 3, and count the word frequencies after segmentation. Then sort the words by frequency and select the first few words according to a certain ...
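The combination of R1 and R2 described in this embodiment might look like the sketch below. The reading used here is an assumption drawn from the abstract: R2 is the intersection of the searches for each word obtained by segmenting the topic word, and merging means set union. Function names and the dictionary-based document store are hypothetical, and `tokenize` is a regex stand-in for a real segmenter.

```python
import re

def tokenize(text):
    # Stand-in for a real word segmenter (hypothetical simplification).
    return re.findall(r"[a-z0-9]+", text.lower())

def search(word, documents):
    # Return the ids of documents that contain `word` as a token.
    return {doc_id for doc_id, text in documents.items()
            if word in tokenize(text)}

def search_segmented_topic(topic, documents):
    # Assumed reading of R2: segment the topic word, search with each
    # resulting word, and intersect the per-word result sets.
    words = tokenize(topic)
    if not words:
        return set()
    result = search(words[0], documents)
    for word in words[1:]:
        result &= search(word, documents)
    return result

def fuse(r1, r2):
    # Assumed reading of "merge": the union of both result sets.
    return r1 | r2
```

Intersecting for R2 keeps only documents that mention every part of the topic word, while the union with R1 brings in documents that are related without naming the topic directly.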

Abstract

The invention provides a method for retrieving a target topic. The method includes the following steps: calculating related words of the target topic and searching with the related words to obtain a first search result; segmenting the target topic and searching with the words in the segmentation result to obtain a second search result; and fusing the first and second search results to obtain the search result for the target topic. Because the search uses both the related words of the target topic and its segmentation result, the content of the target topic is expanded and the relevant content can be retrieved completely and accurately. The search is automatic, which saves time and effort, and its accuracy is high. Through subsequent processing the method can adapt to users' individual needs and meet the demands of topic clustering and aggregation, solving the problem that prior-art topic clustering methods are complicated, inaccurate, and incomplete.

Description

Technical field

[0001] The invention relates to the field of information processing, and in particular to a retrieval method and system for a target topic.

Background

[0002] With the development of network and information technology, a large amount of resources and information has been produced, and acquiring and reading a single piece of information can no longer meet users' needs. Because a topic represents a certain type of information well, topics have become a hotspot of user attention.

[0003] Generally speaking, a topic is an aggregate of several pieces of content that share something in common: each article has at least one aspect in common among subject matter, theme, genre, and techniques of expression. This commonality shows that the content of these articles belongs to the same category. Therefore, in the form of special topics, the cause, progress, trend and degree of influence of a certain event ...

Application Information

IPC(8): G06F17/30, G06F17/27
Inventor 王海涛, 耿蕾蕾, 许燕, 张显刚
Owner 新方正控股发展有限责任公司