Method for establishing and searching feature matrix of Web document based on semantics
A feature matrix and document technology applied in the field of information retrieval, addressing problems such as the loss of semantic information and the difficulty of improving the semantic level.
Examples
Embodiment 1
[0085] Embodiment 1. Establishment of a feature matrix of a semantically based Web document
[0086] Assume that there are five web documents from the Internet (first step), and their contents are:
[0087] Document 1: Public transportation
[0088] Train, plane, car, bus, subway
[0089] Document 2: Traffic Jam
[0090] Document 3: Transportation Industry
[0091] Document 4: Lifeline of public transportation
[0092] Document 5: Buses and subways are the main means of transportation
[0093] First, use a word segmentation tool to count the frequency of nouns, pronouns, place words, personal names, place names, institution names, and other proper names in each document (second step). These counts form a keyword-document term frequency matrix (Table 3 below, corresponding to the third, fourth, and fifth steps).
[0094] Table 3. Keyword-document term frequency matrix with n_i and idf_i
[0095] Keyword\document (word frequency)
[...
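The construction of the keyword-document term frequency matrix described in [0093] can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. The keyword lists are hand-segmented English stand-ins for the output of a word segmentation tool, "traffic" and "transportation" are merged into one keyword, n_i is assumed to be the number of documents containing keyword i, and idf_i is assumed to be log(N / n_i) with N the number of documents. None of these details is confirmed by the text shown here.

```python
import math
from collections import Counter

# Hypothetical keyword lists standing in for the segmentation tool's output.
DOCS = [
    ["public", "transportation", "train", "plane", "car", "bus", "subway"],
    ["transportation", "jam"],
    ["transportation", "industry"],
    ["public", "transportation", "lifeline"],
    ["bus", "subway", "transportation", "means"],
]

def build_matrix(docs):
    """Return keywords, per-keyword term frequencies, n_i, and idf_i."""
    keywords = sorted({word for doc in docs for word in doc})
    counts = [Counter(doc) for doc in docs]
    # One row per keyword, one column per document.
    tf = {k: [c[k] for c in counts] for k in keywords}
    # Assumed definition: n_i = number of documents containing keyword i.
    n = {k: sum(1 for f in tf[k] if f > 0) for k in keywords}
    # Assumed definition: idf_i = log(N / n_i).
    idf = {k: math.log(len(docs) / n[k]) for k in keywords}
    return keywords, tf, n, idf

keywords, tf, n, idf = build_matrix(DOCS)
for k in keywords:
    print(f"{k:15s} {tf[k]} n_i={n[k]} idf_i={idf[k]:.3f}")
```

Under the assumed idf definition, a keyword that occurs in every document gets a weight of zero, so it contributes nothing to distinguishing the documents.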
Embodiment 2
[0100] Embodiment 2. Semantics-based retrieval method for Web documents
[0101] Assume that the retrieval content is "public transportation" and that the data source to be searched is the five documents corresponding to the feature matrix established in Embodiment 1.
[0102] Establish the ontology: assume that the established traffic ontology is as shown in Figure 5 (corresponding to the first preparation step).
[0103] According to SN(N_1, N_2) = Depth(com_parent(N_1, N_2)) / Height(root ...
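The two quantities visible in the formula above, the depth of the common parent of two ontology nodes and the height of the tree, can be computed as in the following sketch. Because the formula is truncated in this text, the code stops at these ingredients and does not combine them into SN. The ontology is a hypothetical child-to-parent map (Figure 5 is not reproduced here), and counting the root as depth 1 is an assumed convention.

```python
# Hypothetical ontology as a child -> parent map; not the ontology of Figure 5.
PARENT = {
    "transportation": None,
    "public transportation": "transportation",
    "private transportation": "transportation",
    "bus": "public transportation",
    "subway": "public transportation",
    "car": "private transportation",
}

def ancestors(node, parent):
    """Return the path from node up to the root, starting with node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def depth(node, parent):
    """Depth of a node, counting the root as 1 (assumed convention)."""
    return len(ancestors(node, parent))

def com_parent(n1, n2, parent):
    """Deepest node that is an ancestor of, or equal to, both n1 and n2."""
    seen = set(ancestors(n1, parent))
    for node in ancestors(n2, parent):
        if node in seen:
            return node
    return None

def height(parent):
    """Height of the tree, measured as the maximum node depth."""
    return max(depth(node, parent) for node in parent)

shared = com_parent("bus", "subway", PARENT)
print(shared, depth(shared, PARENT), height(PARENT))
```

In this toy tree, sibling concepts such as "bus" and "subway" share a deeper common parent than "bus" and "car", which is the property a depth-based similarity relies on.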
Embodiment 3
[0129] Embodiment 3. Using the traditional LSA algorithm
[0130] Suppose there are five documents, and their contents are:
[0131] Document 1: Public Transportation
[0132] train, plane, car, bus, subway
[0133] Document 2: Traffic Jam
[0134] Document 3: Transportation Industry
[0135] Document 4: Lifeline of public transportation
[0136] Document 5: Buses and subways are the main means of transportation
[0137] Suppose the search content is: public transportation
[0138] First, use a word segmentation tool to count the frequency of nouns, pronouns, place words, personal names, place names, institution names, and other proper names in each document. These counts form a keyword-document term frequency matrix.
[0139] Table 5. Keyword-document term frequency matrix with n_i and idf_i
[0140] Keyword\document (word frequency) | Document 1 | Document 2 | Document 3 | Document 4 | Document 5 | n_i | idf...
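For reference, the traditional LSA procedure named in this embodiment is commonly implemented as a truncated singular value decomposition of the keyword-document matrix, followed by cosine ranking in the reduced space. The sketch below is written under several assumptions that the text shown here does not confirm: it uses NumPy, a hypothetical raw term frequency matrix (rows are keywords, columns are Documents 1 to 5), a latent dimension k = 2, and projection of both documents and query through U_k.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def lsa_scores(term_doc, query_vec, k=2):
    """Score each document against the query in a k-dimensional latent space."""
    A = np.asarray(term_doc, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]                      # top-k left singular vectors
    doc_vecs = A.T @ U_k                # one row per document
    q = np.asarray(query_vec, dtype=float) @ U_k
    return [cosine(q, d) for d in doc_vecs]

# Hypothetical keywords and counts; the patent's Table 5 is truncated here.
KEYWORDS = ["public", "transportation", "bus", "subway", "jam", "industry"]
A = [
    [1, 0, 0, 1, 0],  # public
    [1, 1, 1, 1, 1],  # transportation
    [1, 0, 0, 0, 1],  # bus
    [1, 0, 0, 0, 1],  # subway
    [0, 1, 0, 0, 0],  # jam
    [0, 0, 1, 0, 0],  # industry
]
QUERY = [1, 1, 0, 0, 0, 0]  # "public transportation"

scores = lsa_scores(A, QUERY, k=2)
for i, score in enumerate(scores, start=1):
    print(f"Document {i}: {score:.3f}")
```

Other LSA variants scale the query by the inverse singular values before comparison. The version here keeps documents and query in the same U_k projection so that cosine values are directly comparable.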