System and method for automatically splitting English generalized phrase

A phrase-processing technology, applied to automatically splitting English compound phrases

Inactive Publication Date: 2009-11-11
杜小勇

AI Technical Summary

Problems solved by technology

At present, there is no better s



Examples


example 1

[0034] Example 1: Assume that the input phrases we have include the following:

[0035] i. "Data Mining"

[0036] ii. "Data Warehousing"

[0037] iii. "Data Mining and Warehousing"

[0038] iv. "Data Mining and Data Warehousing"

[0039] v. "Data, Text and Web Mining"

[0040] vi. "Data and Web Management"

[0041] As shown in Figure 1, the system according to the present invention includes a phrase input module, a phrase classification module, a probability model module based on linguistic rules, a machine learning model module based on text classification, and an adaptive evolution module.

[0042] The phrase input module is used to input, in one batch, a large number of phrases extracted from the text of a single field. In this example, the phrases listed in Example 1 are input into the system.

[0043] The phrase classification module is used to classify each input phrase. Specifically, according to whether each phrase contains "and", a comma, a semicolon, a dash, or other separator words or separator characters,...
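The classification step described in [0043] can be illustrated with a minimal sketch. It assumes that a phrase containing any separator is a generalized phrase and that every other phrase goes into the simple phrase table. The source's list of separators is truncated, so the pattern below covers only the ones it names. The names `SEPARATOR` and `classify_phrases` are hypothetical and do not come from the patent.

```python
import re

# Separators named in the source: the word "and", comma, semicolon, dash.
# The source's list is truncated, so this set is an assumption.
# A dash counts only when surrounded by spaces, so hyphenated words stay intact.
SEPARATOR = re.compile(r"\band\b|[,;]|\s-+\s", re.IGNORECASE)

def classify_phrases(phrases):
    """Return (simple_table, generalized) for a list of input phrases."""
    simple_table, generalized = [], []
    for phrase in phrases:
        if SEPARATOR.search(phrase):
            generalized.append(phrase)   # contains a separator: needs splitting
        else:
            simple_table.append(phrase)  # no separator: already a simple phrase
    return simple_table, generalized

example_phrases = [
    "Data Mining",
    "Data Warehousing",
    "Data Mining and Warehousing",
    "Data Mining and Data Warehousing",
    "Data, Text and Web Mining",
    "Data and Web Management",
]
simple_table, generalized = classify_phrases(example_phrases)
```

With the six phrases of Example 1, phrases i and ii land in the simple phrase table and phrases iii to vi are treated as generalized phrases.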


Abstract

The invention relates to a system and a method for automatically splitting English generalized phrases. The system comprises a phrase input module, a phrase classification module, a probability model module based on linguistic rules, and a machine learning model module based on text classification. The phrase input module is used for inputting a large number of phrases extracted from the text of one field. The phrase classification module is used for classifying each input phrase and putting simple phrases into a simple phrase table. The probability model module based on linguistic rules is used for splitting the classified generalized phrases one by one. The machine learning model module based on text classification is used for further splitting the generalized phrases that the probability model module based on linguistic rules cannot split accurately.
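To make the splitting task concrete, here is a rough sketch of how a generalized phrase might be expanded into simple phrases. It is not the patent's probability model, whose details are not given in this excerpt. As a stand-in, it enumerates three candidate splits (parts as written, shared leading word, shared trailing word) and keeps the candidate with the most sub-phrases already in the simple phrase table. The function name `split_generalized` and this scoring rule are assumptions for illustration only.

```python
import re

def split_generalized(phrase, simple_table):
    """Pick the candidate split whose parts best match the simple phrase table.

    Stand-in heuristic (an assumption), not the patent's probability model.
    """
    # Cut the phrase at commas, semicolons, and the word "and".
    pieces = re.split(r",|;|\band\b", phrase, flags=re.IGNORECASE)
    parts = [p.strip() for p in pieces if p.strip()]
    if len(parts) < 2:
        return [phrase]
    first_word = parts[0].split()[0]
    last_word = parts[-1].split()[-1]
    candidates = [
        parts,                                                     # parts as written
        [parts[0]] + [f"{first_word} {p}" for p in parts[1:]],     # shared leading word
        [f"{p} {last_word}" for p in parts[:-1]] + [parts[-1]],    # shared trailing word
    ]
    # Score each candidate by how many of its parts are known simple phrases;
    # on a tie, max() keeps the earliest candidate.
    return max(candidates, key=lambda c: sum(p in simple_table for p in c))

known = {"Data Mining", "Data Warehousing"}
result_shared_modifier = split_generalized("Data Mining and Warehousing", known)
result_shared_head = split_generalized("Data, Text and Web Mining", known)
```

In a two-stage design like the one the abstract describes, phrases for which no candidate scores well would be the ones passed on to the machine learning model module. That fallback is not modeled here.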

Description

Technical field

[0001] The invention relates to the fields of natural language processing, ontology learning, and text mining, and in particular to a system and method for automatically splitting English compound phrases.

Background technique

[0002] The ultimate goal of natural language processing is to enable computers to use natural language as correctly and effectively as humans do. However, there is still a long way to go because of two major challenges: the ambiguity of natural language, and the need for a large amount of background knowledge. Human language is ambiguous at every level: the lexical level, the syntactic level, the semantic level, and the various language-specific structures and grammars. For computers to process natural language correctly, we may need to provide millions of thesaurus entries, syntactic knowledge, and other more complex information about language semantics, structure, and idiom. Even so, it is still diffic...


Application Information

IPC(8): G06F17/27, G06F17/30
Inventor: 杜小勇, 刘红岩, 何军, 李直旭
Owner: 杜小勇