Classified corpus establishing method and system, and server provided with the system

A classified corpus construction method, applied in the field of natural language processing, addresses problems such as the inability to classify text data, and achieves the effects of reducing subjective human influence, shortening construction time, and reducing the degree of manual participation.
CN106202380A · Active · Publication Date: 2016-12-07 · SHANGHAI ADVANCED RES INST CHINESE ACADEMY OF SCI

Patent Information

Authority / Receiving Office
CN · China
Current Assignee / Owner
SHANGHAI ADVANCED RES INST CHINESE ACADEMY OF SCI
Publication Date
2016-12-07

Abstract

The invention provides a classified corpus establishing method and system, and a server provided with the system. The establishing method comprises the steps of: acquiring target data to be classified and acquiring category description data according to actual needs; selecting the text similarity calculation method that yields the maximum accuracy; assigning each item of target data to the category with which it has the maximum similarity; adding to a preset primary corpus the target data whose first classification matching degree falls within a first similarity range; classifying the remaining target data with a selected, well-trained classifier; adding to the preset primary corpus the target data whose second classification matching degree falls within a second similarity range; and determining the preset primary corpus to be the final corpus when the filled preset primary corpus cannot be enlarged any further. In this way, the cost of corpus establishment is reduced, the degree of manual intervention is reduced, and corpus establishment time is shortened.
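
The steps in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the similarity measure is fixed to bag-of-words cosine similarity (the method itself selects among several measures by accuracy), the "trained classifier" is replaced by a simple nearest-centroid rule rebuilt from the corpus on each round, and the names `build_corpus`, `first_threshold`, and `second_threshold` as well as the threshold values are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector as a Counter."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_match(vec, prototypes):
    """Return (category, similarity) for the most similar prototype."""
    return max(
        ((cat, cosine(vec, proto)) for cat, proto in prototypes.items()),
        key=lambda pair: pair[1],
    )


def build_corpus(targets, descriptions, first_threshold=0.4, second_threshold=0.25):
    """Grow a classified corpus in two stages (hypothetical thresholds)."""
    corpus = {cat: [] for cat in descriptions}

    # Stage 1: compare each target with the category descriptions and keep
    # only matches whose similarity reaches the first threshold.
    prototypes = {cat: bow(desc) for cat, desc in descriptions.items()}
    remaining = []
    for text in targets:
        cat, sim = best_match(bow(text), prototypes)
        if sim >= first_threshold:
            corpus[cat].append(text)
        else:
            remaining.append(text)

    # Stage 2: rebuild a nearest-centroid classifier (an assumed stand-in for
    # the trained classifier) from the current corpus, classify the rest, and
    # repeat until a round adds nothing, i.e. the corpus stops growing.
    while remaining:
        centroids = {
            cat: sum((bow(t) for t in texts), bow(descriptions[cat]))
            for cat, texts in corpus.items()
        }
        added, still_unclassified = 0, []
        for text in remaining:
            cat, sim = best_match(bow(text), centroids)
            if sim >= second_threshold:
                corpus[cat].append(text)
                added += 1
            else:
                still_unclassified.append(text)
        remaining = still_unclassified
        if added == 0:
            break

    return corpus, remaining
```

Calling `build_corpus(targets, descriptions)` with a list of texts and a mapping from category name to description text returns the filled corpus together with the texts that never reached either threshold. Keeping the two thresholds separate mirrors the distinct first and second similarity ranges named in the abstract.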

Description

Technical Field

[0001] The invention belongs to the technical field of natural language processing, and relates to a construction method and system, in particular to a classification corpus construction method, system and a server with the system.

Background

[0002] In recent years, network technology has developed rapidly, and Internet data has become people's main source of information thanks to advantages such as rapid updates, wide coverage, and easy access. According to statistics, the vast majority of network data exists in the form of text. How to use natural language processing technology to classify this text information, so that users can find useful information more accurately and quickly, has become an important research question in the field of artificial intelligence. In response to this demand, a number of technologies with great practical value have emerged, such as information retrieval, data mining, and public opinion monit...

Claims
