Word association method and device

A technology of words and phrases, applied in the field of word association methods and devices, can solve problems such as inability to extract

Status: Inactive; Publication Date: 2016-06-15
IFLYTEK CO LTD
4 cites, 0 cited by

AI Technical Summary

Problems solved by technology

[0005] In view of the defects in the existing technology described above, the purpose of the present invention is to solve the problem that the existing technology cannot extract the most relevant words from a given document collection. To this end, an embodiment of the present invention provides a word association method with the following technical scheme:

Examples

Embodiment 1

[0055] Referring to Figure 1, an embodiment of the present invention provides a word association method, including:

[0056] Step 110: Obtain a document collection, where the document collection contains at least one document.

[0057] Step 120: Segment the sentences in each document into words to obtain at least one piece of word information.

[0058] Loop through all the documents in the document collection and perform word segmentation on each one. For example, if the collection contains the document "Thank you for calling", segmenting it yields three pieces of word information: "Thank you", "your", and "call". A tokenizer can be used to process the document; candidates include the Paodingjieniu, imdict, mmseg4j, and IK tokenizers. Preferably, the embodiment of the present invention uses the IK tokenizer.
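
Steps 110 and 120 can be sketched as a loop that applies a tokenizer to every document in the collection. The sketch below is a minimal illustration only: `simple_tokenizer` and `segment_collection` are hypothetical names, and the regex-based tokenizer is a stand-in for a dictionary-based tokenizer such as IK, not a reimplementation of it.

```python
import re
from typing import Callable, Dict, List


def simple_tokenizer(text: str) -> List[str]:
    """Stand-in tokenizer (assumption): lowercase, then split on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def segment_collection(
    documents: Dict[str, str],
    tokenizer: Callable[[str], List[str]] = simple_tokenizer,
) -> Dict[str, List[str]]:
    """Step 110/120: take a non-empty document collection and segment each document."""
    if not documents:
        raise ValueError("the document collection must contain at least one document")
    # Loop through all documents and segment each one into word information.
    return {doc_id: tokenizer(text) for doc_id, text in documents.items()}


docs = {"doc1": "Thank you for calling"}
segmented = segment_collection(docs)
print(segmented)
```

Because the stand-in splits on whitespace and punctuation, it produces four tokens for this sentence rather than the three pieces of word information described above; a dictionary-based tokenizer can be passed in through the `tokenizer` parameter instead.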

[0059] Step 130: Analyze each word information, obtain the analysis informati...

Embodiment 2

[0070] Referring to Figure 2, an embodiment of the present invention provides a word association method, including:

[0071] Step 210: Obtain a document collection, where the document collection contains at least one document.

[0072] Step 220: Segment the sentences in each document into words to obtain at least one piece of word information.

[0073] Loop through all the documents in the document collection and perform word segmentation on each one. For example, if the collection contains the document "Thank you for calling", segmenting it yields three pieces of word information: "Thank you", "your", and "call". A tokenizer can be used to process the document; candidates include the Paodingjieniu, imdict, mmseg4j, and IK tokenizers. Preferably, the embodiment of the present invention uses the IK tokenizer.

[0074] Step 230: Analyze each word information to obtain four analysis infor...

Embodiment 3

[0092] Referring to Figure 3, all of the above embodiments may also include the following steps:

[0093] Step 310: Obtain a list of stop words.

[0094] Obtain a list of stop words for the relevant industry. Stop words are words that have nothing to do with the business. For example, China Mobile's stop words might include all single-character words, "hello", "otherwise", "haha", and so on.

[0095] Step 320: Compare the obtained word information with the stop words in the stop word list one by one, and pick out the words that match a stop word in the list.

[0096] Step 330: Delete the words picked out in the previous step.
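
The stop-word steps above can be sketched as a single filtering pass. This is a minimal illustration under stated assumptions: `remove_stop_words` is a hypothetical helper, the sample stop-word set is taken from the examples in [0094], and the `drop_single_chars` flag is one assumed way to express the "all single-character words" rule.

```python
from typing import Iterable, List, Set


def remove_stop_words(
    words: Iterable[str],
    stop_words: Set[str],
    drop_single_chars: bool = False,
) -> List[str]:
    """Keep only the words that are not stop words.

    drop_single_chars (assumption): also discard every one-character word.
    """
    kept = []
    for word in words:
        if word in stop_words:
            continue  # matches an entry in the stop word list
        if drop_single_chars and len(word) == 1:
            continue  # single-character words treated as stop words
        kept.append(word)
    return kept


# Hypothetical sample data for illustration only.
stop_words = {"hello", "otherwise", "haha"}
words = ["hello", "data", "plan", "haha", "upgrade"]
filtered = remove_stop_words(words, stop_words)
print(filtered)
```

Using a set for the stop word list makes each membership check constant time on average, so the cost of the filter grows with the number of words rather than with the size of the list.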

[0097] The method provided by the embodiment of the present invention mainly removes some irrelevant words to reduce the scale of data processing. Different industries have different corresponding business target lists, which are closely related to the specific data to be analyzed. Each word in each document is processed. If some useless words are removed, the da...

Abstract

The invention relates to a word association method and device in the field of information processing. The method comprises the following steps: acquiring a document collection comprising at least one document; performing word segmentation on the sentences in the documents to obtain at least one piece of word information; analyzing each piece of word information to obtain its analysis information, and saving the word information and the analysis information; selecting a target word from the saved word information and calculating the TF-IDF of the target word; calculating the TF-IDF of the words other than the target word; looping over all words other than the target word and calculating the relevancy between each of them and the target word; and taking the top N words ranked by relevancy as the words associated with the target word. The beneficial effect is that, using TF-IDF-based word association and analysis, the words most associated with the target word are mined from a specified document collection.
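
The pipeline in the abstract can be sketched end to end as follows. The relevancy formula is not given in the text available here, so this sketch makes two assumptions of its own: a smoothed IDF, `log((1 + N) / (1 + df)) + 1`, and a relevancy score equal to the dot product of two words' per-document TF-IDF vectors. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_vectors(docs: List[List[str]]) -> Dict[str, List[float]]:
    """Map each word to its TF-IDF value in every document of the collection."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # Document frequency: in how many documents each word appears.
    df = Counter(word for c in counts for word in c)
    vectors = {}
    for word, doc_freq in df.items():
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0  # smoothed IDF (assumption)
        vectors[word] = [
            (c[word] / len(doc)) * idf if doc else 0.0
            for c, doc in zip(counts, docs)
        ]
    return vectors


def relevancy(u: List[float], v: List[float]) -> float:
    """Assumed relevancy measure: dot product of two TF-IDF vectors."""
    return sum(a * b for a, b in zip(u, v))


def associate(docs: List[List[str]], target: str, top_n: int = 3) -> List[Tuple[str, float]]:
    """Return the top N words most relevant to the target word."""
    vectors = tfidf_vectors(docs)
    if target not in vectors:
        raise KeyError(f"target word {target!r} does not occur in the collection")
    target_vec = vectors[target]
    # Loop over all words other than the target and score each one.
    scores = [(w, relevancy(target_vec, vec)) for w, vec in vectors.items() if w != target]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))  # highest score first, ties by word
    return scores[:top_n]


# Hypothetical, already-segmented documents for illustration only.
docs = [["milk", "bread", "eggs"], ["milk", "bread"], ["tea", "sugar"], ["milk", "tea"]]
top = associate(docs, "milk", top_n=2)
print([word for word, _ in top])
```

With this sample data, "bread" ranks first because it shares two documents with "milk", followed by "tea", which shares one. Other relevancy measures could be substituted by replacing the `relevancy` function.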

Description

Technical field

[0001] The invention relates to the field of information processing, in particular to a word association method and device.

Background

[0002] Word association makes it possible to mine and discover the relevance between different words in a text, which in turn enables various applications. In text analysis, therefore, given a collection of documents, it is valuable to mine the words most related to a target word.

[0003] For example, in China Mobile's business, associating the word "traffic" can provide a reference for offering new services. If the traffic package that users most often subscribe to is "30M", then when the word "traffic" is associated, the "traffic 30M" service can be recommended to the user. As another example, in e-commerce, many people who buy "milk" also buy "bread", so associating the word "milk" makes it possible to recommend other products to users, such as "bread".

[0004] However, existing t...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 易中华, 徐波, 汪磊
Owner: IFLYTEK CO LTD