Thesaurus generation method and device

A thesaurus generation method and device based on word segmentation, applied in natural language data processing, special data processing applications, instruments, etc. It addresses problems such as the high cost of artificial intelligence technology development and the limited vocabulary coverage of available thesauri.

Inactive Publication Date: 2019-02-26
CHINA TELECOM CORP LTD
7 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

Users either cannot use the latest thesaurus or must pay high fees to use it, which greatly increases the cost of artificial intelligence technology development. Meanwhile, the currently available open-source thesaurus versions are relatively old and their vocabulary coverage is limited. A solution is therefore needed for obtaining the latest thesaurus without relying on an input method.



Examples


Embodiment Construction

[0033] The technical solutions of the present application will be described in further detail below with reference to the drawings and embodiments.

[0034] A flowchart of an embodiment of the thesaurus generation method of the present application is shown in Figure 1.

[0035] In step 101, a basic vocabulary is determined from an open-source thesaurus. In one embodiment, the open-source thesaurus contains a certain amount of vocabulary, but it may lack popular words or be small.

[0036] In step 102, part of the corpus text is processed, using the basic vocabulary, by a word segmentation method based on word-frequency statistics to obtain updated vocabulary. In one embodiment, the corpus text may be web page text obtained through web crawling. In one embodiment, the word segmentation method can be applied to a single article, or to web page texts of a certain length and number, and the maximum forward or maximum reverse matchi...
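The maximum forward matching mentioned in step 102 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name `forward_max_match`, the `max_len` limit, and the sample vocabulary are assumptions made for illustration only.

```python
def forward_max_match(text, vocabulary, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest vocabulary entry that starts there;
    if nothing matches, emit a single character and move on.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical basic vocabulary standing in for an open-source thesaurus.
vocab = {"人工", "智能", "人工智能", "技术"}
print(forward_max_match("人工智能技术发展", vocab))
# ['人工智能', '技术', '发', '展']
```

Here "发展" is missing from the sample vocabulary, so it falls apart into single characters. Fragments like these are what a frequency-based update step could later promote to new vocabulary entries.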



Abstract

The application proposes a thesaurus generation method and device, relating to the technical field of artificial intelligence. The thesaurus generation method comprises the following steps: i) determining a basic vocabulary according to an open-source thesaurus; ii) processing part of the corpus text, according to the basic vocabulary, with a word segmentation method based on word-frequency statistics to obtain updated vocabulary; iii) updating the basic vocabulary with the updated vocabulary using a predetermined strategy; repeating steps ii) and iii) in a loop until all of the corpus text has been processed; and creating a thesaurus based on the updated basic vocabulary. In this way, starting from an existing open-source thesaurus, the thesaurus is updated by processing the corpus text, and the updated thesaurus is then used to process further corpus text. The thesaurus therefore follows the timeliness of the corpus text, can be continuously updated and enriched, and the latest thesaurus can be obtained without relying on an input method.
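The loop over steps i) to iii) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed method: the abstract does not specify the frequency statistic or the "predetermined strategy", so the adjacent-token merge in `candidate_words`, the `min_count` threshold, and the plain set union used as the update strategy are all hypothetical choices.

```python
from collections import Counter


def segment(text, vocabulary, max_len=6):
    """Forward maximum matching against the current vocabulary."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


def candidate_words(tokens, min_count):
    """Assumed statistic: adjacent tokens that co-occur at least min_count times."""
    pairs = Counter(a + b for a, b in zip(tokens, tokens[1:]))
    return {word for word, count in pairs.items() if count >= min_count}


def build_thesaurus(open_source_vocab, corpus_batches, min_count=2):
    vocabulary = set(open_source_vocab)              # step i
    for batch in corpus_batches:                     # loop until the corpus is exhausted
        tokens = segment(batch, vocabulary)          # step ii
        updated = candidate_words(tokens, min_count)
        vocabulary |= updated                        # step iii (assumed strategy: union)
    return vocabulary


base = {"人工", "智能", "技术"}
batches = ["人工智能发展。人工智能应用。", "人工智能技术,人工智能技术"]
thesaurus = build_thesaurus(base, batches)
print(sorted(thesaurus, key=len))
```

In this toy run the first batch promotes "人工智能" into the vocabulary, and the second batch, segmented with the updated vocabulary, promotes "人工智能技术". A practical strategy would also need to filter out noise such as merges that span punctuation.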

Description

Technical field

[0001] The present application relates to the technical field of artificial intelligence, in particular to a method and device for generating a thesaurus.

Background

[0002] In the development of artificial intelligence technology, the thesaurus is an important basis for determining the level of artificial intelligence, and it is also one of the core competitive strengths of major companies. The traditional method of building a thesaurus generally relies on an input method, and updates the data by counting the content input by users. However, enterprises with input methods generally do not open-source their latest thesaurus. Users either cannot use the latest thesaurus or must pay high fees to use it, which greatly increases the cost of artificial intelligence technology development. Meanwhile, the currently available open-source thesaurus versions are relatively old, and their vocabulary coverage is limited. Therefore, it is necessary to form a solution to ob...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F16/332, G06F16/9535, G06F17/27
CPC: G06F40/289
Inventor: 路绪海, 杨迪, 马怡安, 龚靖, 任华, 王铮, 黄挺
Owner: CHINA TELECOM CORP LTD