Word association method and device
A word and phrase technology applied in the field of word association methods and devices; it can solve problems such as the inability to extract
Examples
Embodiment 1
[0055] Referring to Figure 1, an embodiment of the present invention provides a word association method, including:
[0056] Step 110: Obtain a document collection, where the document collection contains at least one document.
[0057] Step 120: Segment the sentences in the document to obtain at least one item of word information.
[0058] Loop through all documents in the document collection and perform word segmentation on each one. For example, if the document collection contains the document "Thank you for calling", word segmentation of that document yields three items of word information: "Thank you", "your", and "call". The documents can be processed with a tokenizer such as Paodingjieniu, imdict, mmseg4j, or IK. Preferably, this embodiment of the present invention uses the IK tokenizer.
[0059] Step 130: Analyze each item of word information to obtain the analysis informati...
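Steps 110 and 120 can be sketched as follows. This is a minimal illustration under stated assumptions, not the IK tokenizer the text prefers: it swaps in plain forward maximum matching against a dictionary, and the names `fmm_segment` and `segment_collection`, the toy dictionary, and the Chinese sample string (an assumed Chinese form of "Thank you for calling") are all hypothetical.

```python
from typing import Iterable, List, Set


def fmm_segment(text: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Forward maximum matching (a stand-in for IK, not IK itself): at each
    position take the longest dictionary word of up to max_len characters,
    falling back to a single character when nothing matches."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit
        # or a single character remains.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def segment_collection(documents: Iterable[str], dictionary: Set[str],
                       max_len: int = 4) -> List[List[str]]:
    """Loop over every document in the collection and segment each one."""
    return [fmm_segment(doc, dictionary, max_len) for doc in documents]


# Hypothetical toy dictionary and document, for illustration only.
toy_dictionary = {"感谢", "您的", "来电"}
print(segment_collection(["感谢您的来电"], toy_dictionary))
# → [['感谢', '您的', '来电']]
```

Forward maximum matching is chosen only because it is short and deterministic; production tokenizers such as IK combine larger dictionaries with additional disambiguation.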
Embodiment 2
[0070] Referring to Figure 2, an embodiment of the present invention provides a word association method, including:
[0071] Step 210: Obtain a document collection, where the document collection contains at least one document.
[0072] Step 220: Segment the sentences in the document to obtain at least one item of word information.
[0073] Loop through all documents in the document collection and perform word segmentation on each one. For example, if the document collection contains the document "Thank you for calling", word segmentation of that document yields three items of word information: "Thank you", "your", and "call". The documents can be processed with a tokenizer such as Paodingjieniu, imdict, mmseg4j, or IK. Preferably, this embodiment of the present invention uses the IK tokenizer.
[0074] Step 230: Analyze each item of word information to obtain four analysis infor...
Embodiment 3
[0092] Referring to Figure 3, all of the above embodiments may further include the following steps:
[0093] Step 310: Obtain a list of stop words.
[0094] Obtain the stop-word list for the relevant industry. Stop words are words that have nothing to do with the business; for example, China Mobile's stop words may include all single-character words, two forms of "hello", "otherwise", "haha", and so on.
[0095] Step 320: Compare each item of the obtained word information with the stop words in the stop-word list one by one, and identify the words that are identical to a stop word in the list.
[0096] Step 330: Delete the identified words.
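The stop-word steps above can be sketched as a simple filter. This is a minimal sketch, assuming the stop-word list is already available in memory; the function name `remove_stop_words`, the `drop_single_chars` flag (one assumed way to model "all single-character words" as a rule rather than an enumerated list), and the sample words are hypothetical.

```python
from typing import Iterable, List


def remove_stop_words(words: Iterable[str], stop_words: Iterable[str],
                      drop_single_chars: bool = True) -> List[str]:
    """Keep only words that do not match the industry stop-word list."""
    # Load the stop-word list into a set for constant-time comparisons.
    stop = set(stop_words)
    kept: List[str] = []
    for word in words:
        # Compare each word with the stop-word list; optionally treat every
        # single-character word as a stop word (hypothetical flag).
        if word in stop or (drop_single_chars and len(word) == 1):
            continue  # matched words are deleted, i.e. not kept
        kept.append(word)
    return kept


# Hypothetical industry stop-word list modeled on the examples in the text.
telecom_stop_words = ["hello", "otherwise", "haha"]
print(remove_stop_words(["hello", "data", "plan", "haha", "a"], telecom_stop_words))
# → ['data', 'plan']
```

Using a set keeps each comparison cheap, which matters because, as the text notes, every word in every document passes through this step.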
[0097] The method provided by this embodiment of the present invention mainly removes irrelevant words to reduce the scale of data processing. Different industries have different business target lists, which are closely related to the specific data to be analyzed. Every word in every document is processed. If some useless words are removed, the da...