Generation method and apparatus for medium-and-long phrase in domain lexicon

A technique for generating medium-and-long phrases as domain words, applied in the field of natural language processing. It addresses the inability of traditional methods to generate medium-and-long phrases and thereby improves the quality of the resulting domain lexicon.

Inactive Publication Date: 2017-02-22
BEIJING GRIDSUM TECH CO LTD
Cites: 3 | Cited by: 15

AI Technical Summary

Problems solved by technology

[0004] In view of this, the present invention proposes a method and device for generating medium-and-long phrases in a domain dictionary. Its main purpose is to solve the problem that medium-and-long phrases cannot be generated when a domain dictionary is constructed in the traditional way.

Embodiment Construction

[0021] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0022] In the process of constructing a domain dictionary, domain words are often phrases composed of multiple words, that is, medium-and-long phrases, which carry a meaning of their own rather than the combined usual meanings of the individual words. In domain dictionaries, medium-and-long phrases are fixed in structure and integral in meaning, and the order of the words constituting them generally canno...
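As an illustration of how such fixed-order, multi-word phrases can be enumerated, the following minimal sketch combines adjacent segmented words into candidate strings and counts them. It assumes each sentence has already been segmented into a list of words; the segmenter itself is not shown. The function names `candidate_phrases` and `count_candidates` and the length limits `min_words` and `max_words` are hypothetical and do not come from the patent text.

```python
from collections import Counter


def candidate_phrases(tokens, min_words=2, max_words=4):
    """Yield every contiguous run of min_words..max_words tokens.

    Word order is preserved, reflecting the fixed structure of
    medium-and-long phrases. The length limits are illustrative.
    """
    for n in range(min_words, max_words + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def count_candidates(sentences, min_words=2, max_words=4):
    """Count candidate phrases over an iterable of segmented sentences."""
    counts = Counter()
    for tokens in sentences:
        counts.update(candidate_phrases(tokens, min_words, max_words))
    return counts


if __name__ == "__main__":
    corpus = [
        ["natural", "language", "processing"],
        ["natural", "language"],
    ]
    for phrase, freq in count_candidates(corpus).items():
        print(" ".join(phrase), freq)
```

Running the same counting over a general corpus and a domain corpus gives, for each candidate, the two frequencies that a later statistical comparison can use.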

Abstract

The present invention discloses a method and apparatus for generating medium-and-long phrases in a domain lexicon, relates to the field of natural language processing, and solves the problem that medium-and-long phrases cannot be generated when a domain lexicon is constructed in a conventional way. The method comprises: acquiring a general corpus and a domain corpus; performing Chinese word segmentation on the two corpora and combining the segmentation results to obtain medium-and-long phrase candidate character strings; collecting statistical data for each candidate character string in the general corpus and in the domain corpus; calculating, from that statistical data, a chi-square statistic for each candidate character string to obtain its score; and comparing the score with a set condition, retaining each candidate character string that meets the condition as a domain word of the domain lexicon. The method and apparatus are mainly used in the process of generating medium-and-long phrases in a domain lexicon.
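The scoring and filtering steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's actual formula: the text above does not specify how the contingency table is built or what the set condition is. Here a standard 2x2 table is assumed (candidate vs. all other candidates, domain corpus vs. general corpus), the condition is assumed to be a score threshold, and a direction check is added so that only candidates over-represented in the domain corpus are kept. The names `chi_square` and `select_domain_phrases` and the default threshold of 3.84 (the 5% critical value for one degree of freedom) are illustrative choices.

```python
def chi_square(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    a: occurrences of the candidate in the domain corpus
    b: occurrences of the candidate in the general corpus
    c: occurrences of all other candidates in the domain corpus
    d: occurrences of all other candidates in the general corpus
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def select_domain_phrases(domain_counts, general_counts, threshold=3.84):
    """Return {candidate: score} for candidates that pass the threshold.

    The threshold and the over-representation check are assumptions.
    """
    n_domain = sum(domain_counts.values())
    n_general = sum(general_counts.values())
    selected = {}
    for phrase, a in domain_counts.items():
        b = general_counts.get(phrase, 0)
        c = n_domain - a
        d = n_general - b
        # Chi-square is two-sided; a*d > b*c holds exactly when the
        # candidate is relatively more frequent in the domain corpus.
        if a * d <= b * c:
            continue
        score = chi_square(a, b, c, d)
        if score >= threshold:
            selected[phrase] = score
    return selected


if __name__ == "__main__":
    domain = {"natural language processing": 50, "in view of": 50}
    general = {"natural language processing": 1, "in view of": 99}
    print(select_domain_phrases(domain, general))
```

In this toy example only "natural language processing" is retained, because "in view of" is relatively more frequent in the general corpus than in the domain corpus.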

Description

Technical field

[0001] The invention relates to the field of natural language processing, and in particular to a method and device for generating medium-and-long phrases in a domain dictionary.

Background

[0002] In the field of natural language processing, the construction of a domain dictionary is one of the most basic tasks. A high-quality domain dictionary is of great help to higher-level natural language processing tasks such as information retrieval and text classification. In the process of building a domain dictionary, domain words are often phrases composed of multiple words, that is, medium-and-long phrases, rather than words in the usual sense. For example, the domain term "natural language processing" does not mean "nature", "language" and "processing" in the usual sense; "Chinese word segmentation" does not mean "Chinese" or "word segmentation" in the usual sense.

[0003] In the process of using traditional methods to construct domain dictionar...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
Inventor: 何鑫
Owner: BEIJING GRIDSUM TECH CO LTD