Natural language-based topic and keyword extraction method and system
A natural-language-based extraction technology, applied in the field of topic and keyword extraction, that addresses the difficulty of guaranteeing and evaluating the quality of extraction results, with the aim of efficient, high-quality extraction.
Examples
Embodiment 1
[0045] As shown in Figure 1, a natural-language-based method for extracting topics and keywords includes:
[0046] Segmenting the continuous text into individual words and tagging each word's part of speech;
[0047] Extracting the subject and predicate from each segmented sentence;
[0048] Clustering all subject-predicate pairs to compute the main topic clusters and associated keyword clusters across the whole corpus.
[0049] With this scheme, the present invention obtains a topic-keyword set from the clustering of subject-predicate pairs. This set describes the dimensions of public opinion in a specific field and provides a sound basis for further quantitative analysis of public opinion.
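The clustering step in [0048] can be illustrated with a minimal sketch. The text here does not specify the clustering algorithm, so the code below substitutes the simplest possible stand-in: exact-match grouping of pairs by subject, ranked by frequency. The function name `cluster_pairs`, the `top_n` parameter, and the sample pairs are all hypothetical.

```python
from collections import Counter, defaultdict


def cluster_pairs(pairs, top_n=2):
    """Group (subject, predicate) pairs by subject and rank groups by size.

    Assumption: exact-match grouping stands in for the clustering step,
    whose actual algorithm is not given in the text.
    """
    subject_counts = Counter(subject for subject, _ in pairs)
    predicates_by_subject = defaultdict(Counter)
    for subject, predicate in pairs:
        predicates_by_subject[subject][predicate] += 1

    clusters = []
    for subject, count in subject_counts.most_common(top_n):
        # Keywords are the predicates seen with this subject, most frequent first.
        keywords = [p for p, _ in predicates_by_subject[subject].most_common()]
        clusters.append({"topic": subject, "size": count, "keywords": keywords})
    return clusters


# Hypothetical subject-predicate pairs, as produced by the extraction step.
pairs = [
    ("price", "rises"),
    ("price", "falls"),
    ("price", "rises"),
    ("service", "improves"),
    ("battery", "drains"),
    ("service", "improves"),
]
clusters = cluster_pairs(pairs)
for cluster in clusters:
    print(cluster)
```

Each resulting entry pairs one topic (the subject) with its keyword list (the predicates), which is the kind of topic-keyword set that [0049] refers to. A production system would likely merge near-synonymous subjects rather than rely on exact string matches.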
Embodiment 2
[0051] This embodiment describes Embodiment 1 in more detail. Preferably, segmenting the continuous text into individual words and tagging the part of speech includes:
[0052] Obtaining the input Chinese and English text and performing word segmentation and part-of-speech tagging on it, where the words in the output are separated by spaces and the part of speech of each word is marked with an agreed symbol.
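The output format in [0052] can be sketched as follows. This is only an illustration under several assumptions: the `word/tag` convention is one possible "agreed symbol", the tag letters and `TOY_LEXICON` are hypothetical, and whitespace splitting only works for English input. Chinese text would need a real word segmenter in place of `text.split()`.

```python
# Hypothetical toy lexicon; a real system would use a trained segmenter and tagger.
TOY_LEXICON = {
    "users": "n",     # noun
    "prices": "n",
    "complain": "v",  # verb
    "rise": "v",
    "often": "d",     # adverb
}


def tag_tokens(text, lexicon=TOY_LEXICON, unknown_tag="x"):
    """Split on whitespace and look up each token's tag in the lexicon."""
    return [(tok, lexicon.get(tok.lower(), unknown_tag)) for tok in text.split()]


def format_tagged(tagged, sep="/"):
    """Render tokens separated by spaces, each with its tag attached by `sep`."""
    return " ".join(f"{word}{sep}{tag}" for word, tag in tagged)


line = format_tagged(tag_tokens("Users often complain"))
print(line)  # Users/n often/d complain/v
```

Keeping the tagged tokens as `(word, tag)` tuples internally and rendering the space-separated string only at the output boundary makes the later extraction step easier to write.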
[0053] Preferably, extracting the subject and predicate from each segmented sentence includes:
[0054] Extracting the subject and predicate from the input sentence sequence, and outputting, for each sentence, the keyword of the subject phrase (the subject), the keyword of the predicate phrase (the predicate), and the subject-predicate pair they form.
[0055] Preferably, if a sentence is missing a pronoun or a subject, an appropriate subject is supplied according to the context.
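The extraction and subject fill-in described in [0054] and [0055] can be sketched with a simple heuristic. The text does not say how subjects and predicates are identified, so the rule used here (first verb as predicate, last noun before it as subject, previous subject reused when none is found) is an assumption for illustration, as are the function name `extract_pairs` and the tag letters.

```python
def extract_pairs(tagged_sentences, noun_tag="n", verb_tag="v"):
    """Return one (subject, predicate) pair per sentence that contains a verb.

    Assumed heuristic, not the patented procedure: the predicate is the first
    verb, and the subject is the last noun before it. If no noun precedes the
    verb, the most recent subject from earlier sentences is reused.
    """
    pairs = []
    last_subject = None
    for sentence in tagged_sentences:
        verb_index = next(
            (i for i, (_, tag) in enumerate(sentence) if tag == verb_tag), None
        )
        if verb_index is None:
            continue  # no predicate found, so no pair for this sentence
        predicate = sentence[verb_index][0]
        nouns_before = [w for w, tag in sentence[:verb_index] if tag == noun_tag]
        subject = nouns_before[-1] if nouns_before else last_subject
        if subject is None:
            continue  # nothing in the context to fill in
        pairs.append((subject, predicate))
        last_subject = subject
    return pairs


# Hypothetical tagged sentences; the second one omits its subject.
sentences = [
    [("prices", "n"), ("rise", "v")],
    [("then", "d"), ("fall", "v")],
    [("users", "n"), ("complain", "v")],
]
pairs = extract_pairs(sentences)
print(pairs)
```

In the sample input, the second sentence has no noun before its verb, so it inherits "prices" from the preceding sentence. A real system would more plausibly use dependency parsing and coreference resolution for this step.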
[0056] Preferably, all subject-predicate pairs are clustered...
Embodiment 3
[0063] As shown in Figure 2, corresponding to the above method embodiments, the present invention discloses a natural-language-based system for extracting topics and keywords, including a natural language preprocessing subsystem, a subject-predicate extraction subsystem, and a clustering subsystem, wherein:
[0064] The natural language preprocessing subsystem segments the continuous text into individual words and tags the part of speech;
[0065] The subject-predicate extraction subsystem extracts the subject and predicate from each segmented sentence;
[0066] The clustering subsystem clusters all subject-predicate pairs and computes the main topic clusters and related keyword clusters across the whole corpus.
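One way to picture the three-subsystem arrangement is as a pipeline of interchangeable components. The sketch below is an assumption about structure only: the class name `ExtractionSystem`, its field names, and the three `stub_*` functions are hypothetical placeholders and do not reflect how the subsystems are actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Tagged = List[Tuple[str, str]]  # one sentence as (word, tag) tuples
Pair = Tuple[str, str]          # (subject, predicate)


@dataclass
class ExtractionSystem:
    """Chains the three subsystems; each one is a pluggable callable."""
    preprocess: Callable[[str], List[Tagged]]
    extract: Callable[[List[Tagged]], List[Pair]]
    cluster: Callable[[List[Pair]], Dict[str, List[str]]]

    def run(self, text: str) -> Dict[str, List[str]]:
        tagged = self.preprocess(text)
        pairs = self.extract(tagged)
        return self.cluster(pairs)


def stub_preprocess(text):
    # Placeholder: split sentences on "." and tag the first word "n", the rest "v".
    sentences = [s.split() for s in text.split(".") if s.strip()]
    return [[(w, "n" if i == 0 else "v") for i, w in enumerate(s)] for s in sentences]


def stub_extract(tagged):
    # Placeholder: treat the first two words of each sentence as the pair.
    return [(s[0][0], s[1][0]) for s in tagged if len(s) >= 2]


def stub_cluster(pairs):
    # Placeholder: group predicates under identical subjects.
    clusters = {}
    for subject, predicate in pairs:
        clusters.setdefault(subject, []).append(predicate)
    return clusters


system = ExtractionSystem(stub_preprocess, stub_extract, stub_cluster)
result = system.run("prices rise. prices fall. users complain.")
print(result)
```

Because each subsystem is passed in as a callable, any one of them can be replaced (for example, with a real segmenter or a real clustering algorithm) without touching the other two.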
[0067] Preferably, the natural language preprocessing subsystem segments the continuous text into individual words and tags the part of speech as follows:
[0068] Obtain the input Chinese and English text, and perform w...

