
Method and device for extracting phrases in corpus text, storage medium and electronic equipment

A corpus-text phrase extraction technology, applied in the field of big data, that addresses the small number of extracted phrases, the low reliability of phrase extraction, and the difficulty of extracting phrases with larger granularity.

Active Publication Date: 2020-10-16
CHINA PING AN LIFE INSURANCE CO LTD

AI Technical Summary

Problems solved by technology

The defect in the existing technology is that it depends heavily on the distribution of the current corpus: some phrases with larger granularity are difficult to extract, and the number of extracted phrases is small.
As a result, phrase extraction from corpus text in the prior art has low reliability.

Embodiment Construction

[0051] Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this application will be thorough and complete, and will fully convey the concepts of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided in order to give a thorough understanding of embodiments of the present application. However, those skilled in the art will appreciate that the technical solutions of the present application can be practiced without one or more of the specific details, or other methods, components, devices, steps, etc. can be used. In other instances, well-known technical solutions have not b...

Abstract

The invention relates to a method and device for extracting phrases in a corpus text, a storage medium and electronic equipment, which belong to the technical field of big data. The method comprises the steps of: performing word segmentation on the corpus text to obtain a plurality of words forming the corpus text; performing part-of-speech tagging on the words to obtain part-of-speech tags of the words; using the part-of-speech tags to determine, among the plurality of words, a word combination that meets a preset part-of-speech dependency rule; inputting the word combination into a pre-trained language model to obtain a word formation probability corresponding to the word combination; and determining a word combination whose word formation probability is greater than a predetermined threshold as the extracted first phrase. The preset part-of-speech dependency rule can be obtained from a rule-sharing blockchain. According to the invention, the reliability of phrase extraction in the corpus text is effectively improved.
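The steps in the abstract can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions, not the patented implementation: whitespace splitting stands in for a real word segmenter, `TOY_LEXICON` stands in for a part-of-speech tagger, `RULES` is a made-up set of part-of-speech dependency rules, and `toy_score` is a lookup table standing in for the pre-trained language model. All of these names and values are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_LEXICON: Dict[str, str] = {
    "the": "DET", "critical": "ADJ", "illness": "NOUN",
    "insurance": "NOUN", "policy": "NOUN", "covers": "VERB",
}

# Hypothetical dependency rules: tag sequences allowed to form a phrase.
RULES: Set[Tuple[str, ...]] = {
    ("ADJ", "NOUN"), ("NOUN", "NOUN"), ("ADJ", "NOUN", "NOUN"),
}

# Hypothetical word formation probabilities standing in for a language model.
TOY_SCORES: Dict[Tuple[str, ...], float] = {
    ("critical", "illness"): 0.9,
    ("illness", "insurance"): 0.8,
    ("insurance", "policy"): 0.3,
    ("critical", "illness", "insurance"): 0.85,
}


def segment(text: str) -> List[str]:
    """Word segmentation (assumption: whitespace-delimited text)."""
    return text.lower().split()


def tag(words: Sequence[str]) -> List[Tuple[str, str]]:
    """Part-of-speech tagging via the toy lexicon; unknown words get 'X'."""
    return [(w, TOY_LEXICON.get(w, "X")) for w in words]


def candidates(tagged: Sequence[Tuple[str, str]],
               rules: Set[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """Collect adjacent word combinations whose tag sequence matches a rule."""
    found = []
    for n in sorted({len(r) for r in rules}):
        for i in range(len(tagged) - n + 1):
            window = tagged[i:i + n]
            if tuple(t for _, t in window) in rules:
                found.append(tuple(w for w, _ in window))
    return found


def toy_score(combo: Tuple[str, ...]) -> float:
    """Return the stand-in word formation probability for a combination."""
    return TOY_SCORES.get(combo, 0.0)


def extract_phrases(text: str,
                    score: Callable[[Tuple[str, ...]], float],
                    threshold: float,
                    rules: Set[Tuple[str, ...]] = RULES) -> List[str]:
    """Keep combinations whose score is strictly greater than the threshold."""
    combos = candidates(tag(segment(text)), rules)
    return [" ".join(c) for c in combos if score(c) > threshold]


phrases = extract_phrases(
    "the critical illness insurance policy covers illness", toy_score, 0.5
)
```

In this sketch the rule filter proposes candidates regardless of how often they occur in the corpus, and the scoring function alone decides which of them are kept.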

Description

Technical field

[0001] The present application relates to the technical field of big data, and in particular to a method, device, storage medium and electronic equipment for extracting phrases in a corpus text.

Background

[0002] In many fields, it is necessary to extract a large number of phrases with larger granularity, that is, units longer than a single word. At present, most phrases are extracted by statistical means; for example, the phrases to be extracted are determined by counting the occurrence frequency of a certain word. The defect in the prior art is that it depends heavily on the distribution of the current corpus, some phrases with larger granularity are difficult to extract, and the number of extracted phrases is small. Therefore, the prior art suffers from low reliability of phrase extraction in the corpus text.

[0003] It should be noted that the information disclosed ...
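For contrast, the frequency-based approach described in the background can be sketched as follows. This is a minimal illustration under assumptions, not a method taken from the patent text: it counts adjacent word pairs in whitespace-delimited sentences and keeps those seen at least `min_count` times. The function name and the sample sentences are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def frequent_bigrams(sentences: Sequence[str], min_count: int) -> List[str]:
    """Return adjacent word pairs that occur at least min_count times."""
    counts: Counter = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # Count every pair of neighbouring words in the sentence.
        counts.update(zip(words, words[1:]))
    return [" ".join(pair) for pair, c in counts.items() if c >= min_count]


corpus = [
    "critical illness insurance",
    "critical illness cover",
    "group insurance",
]
baseline = frequent_bigrams(corpus, min_count=2)
```

Here a pair that appears only once in the corpus, such as "illness insurance", is dropped no matter how well formed it is, which illustrates the dependence on corpus distribution that the background describes.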

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289, G06F40/211, G06K9/62
CPC: G06F40/289, G06F40/211, G06F18/24, G06F18/214
Inventor 何斐斐刘志慧金培根陆林炳李炫林加新
Owner CHINA PING AN LIFE INSURANCE CO LTD