Multi-granularity word segmentation method and system based on sequence labeling modeling
A multi-granularity word segmentation method based on sequence labeling, applied in biological neural network models, special data processing applications, instruments, etc. It addresses problems such as the difficulty of multi-granularity word segmentation, the lack of multi-granularity word segmentation data, and the complexity of the annotation process.
Examples
Embodiment 1
[0057] In this embodiment, the multi-granularity word segmentation method based on sequence labeling modeling includes:
[0058] Select three single-granularity annotated datasets with different specifications, namely the CTB, PPD, and MSR word segmentation specifications;
[0059] Convert the sentences in one of the single-granularity annotated datasets into word segmentation sequences that comply with the other two word segmentation specifications, so that each converted sentence corresponds to three different word segmentation sequences;
[0060] Convert the three word segmentation sequences corresponding to each sentence into a multi-granularity word segmentation hierarchy whose layers are, in turn, the sentence, words that cannot be merged with other words to form a coarser-grained word, words, and characters;
[0061] Determine the multi-granularity label of each word in the multi-granularity word segmentation hierarchy according to the pr...
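The merging step described in [0059] and [0060] can be illustrated with a minimal Python sketch. It assumes that each segmentation is given as a list of words, that words from different specifications nest rather than cross, and that the layer names (`top_words`, `words`, `characters`) and the function names are hypothetical choices made for this illustration. It is not the patent's implementation.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # half-open character offsets [start, end)


def to_spans(words: List[str]) -> List[Span]:
    """Map a segmented sentence to the character span of each word."""
    spans, start = [], 0
    for word in words:
        spans.append((start, start + len(word)))
        start += len(word)
    return spans


def build_hierarchy(segmentations: List[List[str]]) -> Dict[str, List[Span]]:
    """Merge several segmentations of one sentence into layered spans."""
    sentence = "".join(segmentations[0])
    if any("".join(seg) != sentence for seg in segmentations):
        raise ValueError("all segmentations must spell the same sentence")

    # Union of the word spans found under any specification.
    words = sorted({span for seg in segmentations for span in to_spans(seg)})

    # Assumption of this sketch: words nest and never cross each other.
    for a in words:
        for b in words:
            if a[0] < b[0] < a[1] < b[1]:
                raise ValueError(f"crossing words {a} and {b}")

    def inside_larger(s: Span) -> bool:
        return any(t != s and t[0] <= s[0] and s[1] <= t[1] for t in words)

    return {
        "sentence": [(0, len(sentence))],
        # Words that no other word contains (cannot be merged further).
        "top_words": [s for s in words if not inside_larger(s)],
        # Finer-grained words nested inside a coarser word.
        "words": [s for s in words if inside_larger(s)],
        "characters": [(i, i + 1) for i in range(len(sentence))],
    }


# Toy segmentations invented for illustration, not taken from CTB, PPD, or MSR.
toy = [
    ["全国", "各地", "医学界", "专家"],
    ["全国", "各地", "医学", "界", "专家"],
    ["全国各地", "医学界", "专家"],
]
print(build_hierarchy(toy)["top_words"])
```

Representing words as character spans makes it easy to compare segmentations produced under different specifications, because identical words collapse to the same span regardless of which specification they came from.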
Embodiment 2
[0085] In this embodiment, the multi-granularity word segmentation method based on sequence labeling modeling includes:
[0086] Select three single-granularity annotated datasets with different specifications, namely the CTB, PPD, and MSR word segmentation specifications;
[0087] Convert the sentences in two of the single-granularity annotated datasets into word segmentation sequences that comply with the other two word segmentation specifications, so that each converted sentence corresponds to three different word segmentation sequences;
[0088] Convert the three word segmentation sequences corresponding to each sentence into a multi-granularity word segmentation hierarchy whose layers are, in turn, the sentence, words that cannot be merged with other words to form a coarser-grained word, words, and characters;
[0089] Determine the multi-granularity label of each word in the multi-granularity word segmentation hierarchy accordi...
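Paragraph [0089] is truncated here, so the exact labeling rule is not visible. As an illustration only, the sketch below shows one plausible scheme: each character receives the concatenation of its B/I/E/S position tags in every word that contains it, ordered from the finest word to the coarsest. The scheme, the function names, and the toy data are all assumptions of this sketch, not the rule stated in the patent.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # half-open character offsets [start, end)


def bies(pos: int, start: int, end: int) -> str:
    """Position tag of character `pos` inside the word [start, end)."""
    if end - start == 1:
        return "S"  # single-character word
    if pos == start:
        return "B"  # begin
    if pos == end - 1:
        return "E"  # end
    return "I"      # inside


def multi_grained_labels(segmentations: List[List[str]]) -> List[str]:
    """Assumed scheme: concatenate BIES tags from finest to coarsest word."""
    sentence = "".join(segmentations[0])
    spans: Set[Span] = set()
    for seg in segmentations:
        if "".join(seg) != sentence:
            raise ValueError("all segmentations must spell the same sentence")
        start = 0
        for word in seg:
            spans.add((start, start + len(word)))
            start += len(word)

    labels = []
    for i in range(len(sentence)):
        # Words covering character i, shortest (finest) first. The sketch
        # assumes words nest, so no two covering words share a length.
        covering = sorted(
            (s for s in spans if s[0] <= i < s[1]),
            key=lambda s: s[1] - s[0],
        )
        labels.append("".join(bies(i, *s) for s in covering))
    return labels


# Toy segmentations invented for illustration, not taken from CTB, PPD, or MSR.
toy = [
    ["全国", "各地", "医学界", "专家"],
    ["全国", "各地", "医学", "界", "专家"],
    ["全国各地", "医学界", "专家"],
]
print(multi_grained_labels(toy))
```

Under this assumed scheme, the first character of the toy sentence is tagged `BB` (it begins both the finer and the coarser word), while a character covered by a single word at every granularity keeps an ordinary one-letter tag. Turning the hierarchy into one label per character is what lets a standard sequence labeling model predict all granularities in a single pass.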
Embodiment 3
[0117] The multi-granularity word segmentation method based on sequence labeling modeling in this embodiment differs from Embodiment 1 in how the multi-granularity word segmentation sequences are acquired. Specifically, the acquisition includes:
[0118] Select two single-granularity annotated datasets with different specifications, namely the PPD and CTB word segmentation specifications. This embodiment lists only the conversion of the PPD sentence "this diving team was established in the mid-1980s" into data under the CTB specification. Similarly, the sentence "re-employment population in the province has increased in recent years" in the single-granularity annotated dataset that complies with the CTB specification is converted into a word segmentation sequence that complies with the PPD specification, that is, the converted sentences in the single-grained...