Efficient and accurate knowledge-graph-based method for processing noise words in text
A knowledge-graph-based processing method, applied to the efficient and accurate handling of noise words in text. It addresses problems such as the inability to filter noise words when spoken language is converted to text, and the wrongful or missed deletion of homophones and synonyms, which changes the meaning of a sentence. The method filters noise words efficiently and accurately, overcomes interference, and allows the lexicon to be extended flexibly and modified in real time.
Examples
Embodiment 1
[0026] As shown in Figure 1, the present invention provides a technical solution: an efficient and accurate knowledge-graph-based method for processing noise words in text, comprising the following steps:
[0027] S1. Build a lexicon of words to be filtered; build a business knowledge graph and add the homophones of each business-related word to it;
[0028] S2. Assign a weight to each of the above words;
[0029] S3. Segment the text with a word segmentation tool;
[0030] S4. Using the business knowledge graph, first correct homophones in the text to the corresponding business words, and record every business word that appears in the text;
[0031] S5. Match the corrected text against the filter words, leaving the recorded business words unaffected by the filter;
[0032] S6. Output the filtered text.
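Steps S1 to S6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the knowledge graph is reduced to a plain homophone-to-business-word dictionary, segmentation is a whitespace split standing in for a real word segmentation tool (so the S2 weights are not used here), and every word in `FILTER_WORDS` and `HOMOPHONES`, as well as the function names, is hypothetical.

```python
from typing import Dict, List, Set

# All entries below are hypothetical illustration data, not taken from the patent.
FILTER_WORDS: Set[str] = {"um", "uh", "like"}        # S1: filter lexicon
HOMOPHONES: Dict[str, str] = {"lyke": "like"}        # S1: homophone -> business word
BUSINESS_WORDS: Set[str] = set(HOMOPHONES.values())  # business words known to the graph


def segment(text: str) -> List[str]:
    """S3 stand-in: split on whitespace instead of calling a real segmentation tool."""
    return text.split()


def process(text: str) -> str:
    protected: Set[str] = set()
    corrected: List[str] = []
    for token in segment(text):
        word = HOMOPHONES.get(token, token)  # S4: correct homophones to business words
        if word in BUSINESS_WORDS:
            protected.add(word)              # S4: record business words seen in the text
        corrected.append(word)
    # S5: drop filter words unless they were recorded as business words
    kept = [w for w in corrected if w in protected or w not in FILTER_WORDS]
    return " ".join(kept)                    # S6: output the filtered text


print(process("um I pressed the lyke button uh"))  # prints: I pressed the like button
```

In this sketch protection is per word, not per occurrence: once "like" is recorded as a business word, every "like" in the same text survives the filter. Whether the method protects individual occurrences instead is not stated in the text above.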
[0033] According to the above technical solution, the filter lexicon in S1 is connected to network data and is classified into categories, including political content, pornography, violence...
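One way to picture a classified filter lexicon is a mapping from category name to word set, from which the active filter set is assembled on demand. The sketch below is an assumption for illustration only: the three category names follow the text, while the placeholder entries and the helper names `active_filter_words` and `add_word` are hypothetical.

```python
from typing import Dict, Iterable, Set

# Category names follow the text; the entries are placeholders, not real lexicon data.
LEXICON: Dict[str, Set[str]] = {
    "politics": {"<political-term>"},
    "pornography": {"<explicit-term>"},
    "violence": {"<violent-term>"},
}


def active_filter_words(categories: Iterable[str]) -> Set[str]:
    """Return the union of the word sets of the selected categories.

    Unknown category names are ignored.
    """
    words: Set[str] = set()
    for name in categories:
        words |= LEXICON.get(name, set())
    return words


def add_word(category: str, word: str) -> None:
    """Add a word to a category at run time, creating the category if needed."""
    LEXICON.setdefault(category, set()).add(word)
```

Keeping categories separate lets a deployment enable only the ones it needs and extend any of them without rebuilding the whole lexicon.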
Embodiment 2
[0042] The invention provides a technical solution: an efficient and accurate knowledge-graph-based method for processing noise words in text:
[0043] S1. Build a lexicon of words to be filtered; build a business knowledge graph and add the homophones of each business-related word to it. The filter lexicon is best organized by category, which improves reusability;
[0044] S2. Assign a weight to each of the above words;
[0045] S3. Segment the text with the word segmentation tool and adjust the segmentation result:
[0046] Word weight: ["collect" 200, "concentrate" 100]
[0047] Text: "Like collecting Chinese independent innovation products"
[0048] Segmentation results: "Like-collection-China-independent-innovation-of-product";
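To illustrate how word weights can steer an ambiguous segmentation, the sketch below picks the split with the highest total word weight using dynamic programming. It is an assumed scoring rule, not necessarily the one used by the segmentation tool in the text. The unspaced English string and its weights are a hypothetical stand-in for the example above, and `default_weight` is an assumed fallback for characters missing from the dictionary.

```python
from typing import Dict, List, Tuple


def segment(text: str, weights: Dict[str, int], default_weight: int = 1) -> List[str]:
    """Return the segmentation of `text` with the highest total word weight.

    Single characters that are not in `weights` are allowed as fallback
    words scored with `default_weight`, so every input can be segmented.
    """
    n = len(text)
    max_len = max((len(w) for w in weights), default=1)
    # best[i] = (best score for text[:i], start index of the last word)
    best: List[Tuple[int, int]] = [(0, 0)] * (n + 1)
    for end in range(1, n + 1):
        candidates: List[Tuple[int, int]] = []
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in weights:
                candidates.append((best[start][0] + weights[piece], start))
            elif end - start == 1:
                candidates.append((best[start][0] + default_weight, start))
        best[end] = max(candidates)
    # Walk back from the end to recover the chosen words.
    words: List[str] = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


base = {"god": 50, "is": 50, "now": 100, "here": 100}
print(segment("godisnowhere", {**base, "nowhere": 300}))  # ['god', 'is', 'nowhere']
print(segment("godisnowhere", {**base, "nowhere": 150}))  # ['god', 'is', 'now', 'here']
```

Raising or lowering a single weight flips the result. The word weights in S2 give the same kind of control over which of two overlapping candidates the segmenter prefers.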
[0049] S4. Using the business knowledge graph, first correct homophones in the text to the corresponding business words, and record every business word that appears in the text:
[0050] Text: "Yeah, I like the warmth of freshman autumn in Harbin"
[0051] Result: ...
