Paragraph segmentation method and paragraph segmentation device
A paragraph and document segmentation technology, applied in the fields of special data processing applications, instruments, and electronic digital data processing, that addresses problems such as the difficulty of paragraph segmentation.
Examples
Embodiment 1
[0041] The first embodiment is an embodiment of a paragraph segmentation method, device, and program that use document vectors in similarity calculation and word vectors in similar document retrieval. In this embodiment, a document vector is a vector whose dimensions correspond to all of the documents included in the corpus unit of the segmentation device.
[0042] Before describing this embodiment in detail, an example of document vectors and word vectors will be described.
[0043] Figure 6 shows an example of a document vector. In the example of Figure 6, the total number of documents included in the corpus is set to ten. When the documents obtained as the retrieval result are documents 1, 3, 4, and 8, the document vector is represented as the document vector 601 shown in (a) of Figure 6. Similarly, when search scores are obtained as the search result, the document vector can be expressed as the document vector 602 shown in (b) of Figure 6.
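The construction described for Figure 6 can be sketched in Python as follows. This is a minimal illustration, not the implementation of the embodiment. It assumes that document numbers are 1-based, that a retrieved document is marked with 1 and all others with 0, and that a score-based vector stores the search score in the dimension of each retrieved document. The score values are hypothetical, and the use of cosine similarity to compare two document vectors is likewise an assumption, since the excerpt does not state which similarity measure is used.

```python
from math import sqrt


def binary_document_vector(retrieved_ids, corpus_size):
    """One dimension per corpus document; 1.0 if the document was retrieved.

    Document numbers are assumed to be 1-based, as in the Figure 6 example.
    """
    vec = [0.0] * corpus_size
    for doc_id in retrieved_ids:
        vec[doc_id - 1] = 1.0
    return vec


def scored_document_vector(scores_by_id, corpus_size):
    """One dimension per corpus document; holds the search score if retrieved."""
    vec = [0.0] * corpus_size
    for doc_id, score in scores_by_id.items():
        vec[doc_id - 1] = score
    return vec


def cosine_similarity(a, b):
    """Cosine similarity of two vectors (an assumed measure, see lead-in)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no retrieved documents on one side: treat as dissimilar
    return dot / (norm_a * norm_b)


# Corpus of ten documents; documents 1, 3, 4, and 8 were retrieved.
vec_a = binary_document_vector([1, 3, 4, 8], corpus_size=10)
print(vec_a)
# [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]

# Hypothetical search scores for the same four documents.
vec_b = scored_document_vector({1: 0.9, 3: 0.5, 4: 0.7, 8: 0.2}, corpus_size=10)
print(cosine_similarity(vec_a, vec_b))
```

Under these assumptions, two pieces of text whose retrieval results overlap heavily yield document vectors with a similarity close to 1, while disjoint retrieval results yield a similarity of 0.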
[0044] Figure 7 represents an example of a ...
Embodiment 2
[0088] Embodiment 2 is an embodiment of a paragraph segmentation method, device, and program that use word vectors in similarity calculations and also use word vectors in similar document retrieval.
[0089] Figure 4 is a functional block diagram of the paragraph segmentation device of the second embodiment. The hardware structure of this paragraph segmentation device is the same as that shown in Figure 1A for Embodiment 1, and the device can of course also be implemented by the computer illustrated in Figure 1B, so illustration of the hardware structure is omitted here.
[0090] The input unit 402, the sentence segmentation unit 403, the paragraph update unit 408, the output unit 409, the sentence storage unit 410, the feature storage unit 412, the paragraph storage unit 413, and the morpheme analysis unit 414 are the same as the corresponding modules of Embodiment 1, so the description covers only the corpus unit 411, the feature quantity calculation unit 404, the simi...
