Thesaurus generation method and device
A thesaurus generation and word segmentation technology, applied in natural language data processing, special data processing applications, instruments, etc., that addresses problems such as the increasing cost of artificial intelligence technology, high costs, and limited vocabulary coverage.
Embodiment Construction
[0033] The technical solutions of the present application will be described in further detail below with reference to the drawings and embodiments.
[0034] A flowchart of an embodiment of the thesaurus generation method of the present application is shown in Figure 1.
[0035] In step 101, a basic vocabulary database is determined from an open-source thesaurus. In one embodiment, the open-source thesaurus contains a certain amount of vocabulary but may lack popular words or have a small vocabulary.
[0036] In step 102, part of the corpus text is processed, based on the basic vocabulary database, using a word segmentation method based on word-frequency statistics to obtain an updated vocabulary. In one embodiment, the corpus text may be web page text obtained through web crawling. In one embodiment, the word segmentation method may be applied to a single article, or to web page texts of a certain length and number, and the maximum forward or maximum reverse matchi...
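As a rough illustration of steps 101 and 102, the following Python sketch segments text by forward maximum matching against a basic vocabulary and then promotes frequently co-occurring adjacent tokens to new vocabulary entries. The function names (`forward_max_match`, `update_vocab`), the `max_len` and `min_count` parameters, and the pair-merging update rule are all assumptions made for illustration; the source text does not specify how the word-frequency statistics are computed.

```python
from collections import Counter


def forward_max_match(text, vocab, max_len=4):
    """Greedy forward maximum matching: at each position, take the
    longest vocabulary entry, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def update_vocab(texts, vocab, min_count=2, max_len=4):
    """Hypothetical update rule (not from the source): merge adjacent
    tokens that co-occur at least min_count times into new words."""
    pair_counts = Counter()
    for text in texts:
        tokens = forward_max_match(text, vocab, max_len)
        for left, right in zip(tokens, tokens[1:]):
            pair_counts[left + right] += 1
    new_words = {
        word for word, count in pair_counts.items()
        if count >= min_count and len(word) <= max_len
    }
    return set(vocab) | new_words
```

For example, with a basic vocabulary of `{"ab", "cd"}`, the string `"abcdx"` is segmented into `ab`, `cd`, and `x`. If `"abcd"` appears in at least two corpus texts, `update_vocab` adds `abcd` as a new entry, so later passes match it as a single word. A reverse maximum matching variant would scan from the end of the text instead.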