Term acquisition method and device

A term acquisition and word segmentation technique, applied in the field of computer technology, that addresses the low accuracy of whole-part relation extraction and improves that accuracy.

Status: Inactive
Publication Date: 2017-06-13
BEIJING GRIDSUM TECH CO LTD

AI Technical Summary

Problems solved by technology

Corpus data matched by the existing parallel-structure method only fits the structure of the template in form; its actual content does not necessarily express a relationship between a whole and its parts. As a result, the extraction accuracy of that method is relatively low.

Detailed Description of Embodiments

[0022] Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present invention and to fully convey the scope of the present invention to those skilled in the art.

[0023] An embodiment of the present invention provides a term acquisition method, shown in Figure 1. The method is used to obtain terms that stand in a whole-part relationship in a text corpus, and its specific steps include:

[0024] 101. Preprocess the acquired text data to obtain independent sentences with word segmentation information.

[0025] In the embodiment of the present invention, the acquired text data refers to the corpus data...
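Step 101 can be illustrated with a minimal sketch. The source does not say which sentence splitter or word segmenter is used, so everything below is an assumption: the function name `preprocess`, the punctuation sets, and the regex tokenizer, which stands in for a real Chinese word segmenter and only works on text whose words are already separated by spaces.

```python
import re

# Assumed sentence-ending punctuation (Chinese and Western); not taken from the patent.
_SENTENCE_END = re.compile(r"[。！？!?.]+")
# Assumed tokenizer: runs of word characters, plus list separators kept as tokens.
# A real system would call a proper word segmenter here.
_TOKEN = re.compile(r"\w+|[,，、;；]")


def preprocess(text):
    """Split text into independent sentences, each with its segmented words."""
    sentences = []
    for raw in _SENTENCE_END.split(text):
        raw = raw.strip()
        if not raw:
            continue  # skip empty fragments left over after splitting
        sentences.append({"text": raw, "tokens": _TOKEN.findall(raw)})
    return sentences


if __name__ == "__main__":
    for sentence in preprocess("A car has wheels, doors and an engine. It is red!"):
        print(sentence["tokens"])
```

Keeping the separators as tokens is a deliberate choice in this sketch: later steps that look for parallel structures need to see where commas and conjunctions occur.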

Abstract

The invention discloses a term acquisition method and device in the field of computer technology. Its main aim is to improve the accuracy of extracting whole-part relations between corpus terms by annotating terms with domain information. In the main technical scheme, acquired text data is preprocessed to obtain independent sentences with word segmentation information; a structure template is used to screen out candidate sentences that contain a coordinate (parallel) structure; a domain dictionary, which records segmented words belonging to the same domain, is used together with the segmentation information of the candidate sentences to determine the domain words that stand in a coordinate structure; and a set of domain words with a whole-part relation is output according to the positional characteristics of those domain words. The method and device are mainly used to acquire terms with a whole-part relation from text.
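The screening and extraction stages described in the abstract might look roughly like the sketch below. It is not the patented procedure: the marker set, the sample dictionary, and the function name `extract_whole_part` are hypothetical, and the positional rule used here (the nearest same-domain word before the list is the whole, the listed words are the parts) is an assumed stand-in for the "positional characteristics" that the source does not spell out.

```python
# Hypothetical coordination markers that make up the "structure template".
MARKERS = {",", "，", "、", "and", "和"}

# Hypothetical domain dictionary: segmented word -> domain label.
DOMAIN_DICT = {
    "car": "automotive", "wheels": "automotive",
    "doors": "automotive", "engine": "automotive",
    "tree": "botany", "leaves": "botany",
}


def extract_whole_part(tokens, domain_dict=DOMAIN_DICT):
    """Return (whole, parts) for one segmented sentence, or None."""
    # Template screen: the sentence must contain a coordination marker.
    marker_positions = [i for i, tok in enumerate(tokens) if tok in MARKERS]
    if not marker_positions or marker_positions[0] == 0:
        return None
    # Assumption: the coordinated list starts at the token before the first marker.
    list_start = marker_positions[0] - 1

    # Domain filter: keep only tokens recorded in the domain dictionary.
    hits = [(i, tok) for i, tok in enumerate(tokens) if tok in domain_dict]
    before = [tok for i, tok in hits if i < list_start]
    if not before:
        return None

    # Assumed positional rule: nearest domain word before the list is the whole.
    whole = before[-1]
    domain = domain_dict[whole]
    parts = [tok for i, tok in hits
             if i >= list_start and domain_dict[tok] == domain]
    # A coordinate structure needs at least two same-domain members.
    return (whole, parts) if len(parts) >= 2 else None


if __name__ == "__main__":
    print(extract_whole_part(
        ["A", "car", "has", "wheels", ",", "doors", "and", "an", "engine"]))
    print(extract_whole_part(
        ["The", "meeting", "covers", "budgets", ",", "hiring", "and", "travel"]))
```

The second call shows the point of the domain dictionary: the sentence fits the parallel-structure template in form, but none of its words belong to a recorded domain, so no whole-part set is produced.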

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a term acquisition method and device.

Background

[0002] With the development of network technology, the scale of data keeps growing, and more effective text classification techniques are needed to obtain useful information. However, existing mature text classification techniques work relatively well for English texts but not for Chinese texts, because the role of semantic factors in Chinese texts cannot be ignored. There are two most basic types of semantic relations: 1. the relationship between superordinate and subordinate concepts, in which a subordinate concept appears only to limit the extension of the superordinate concept; 2. a statement made by one basic lexical unit about another basic lexical unit. Most grammatical forms arise to express these relations.

[0003] Among the relation...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
CPC: G06F40/284
Inventor: 钦滨杰, 陈晓敏
Owner: BEIJING GRIDSUM TECH CO LTD