Method for establishing and searching feature matrix of Web document based on semantics
Feature-matrix and document technology in the field of information retrieval. It addresses the difficulty of improving retrieval at the semantic level and the resulting loss of semantic information.
Status: Inactive. Publication Date: 2008-08-27
EAST CHINA NORMAL UNIV
Cites: 0. Cited by: 79.
AI Technical Summary
Problems solved by technology
However, the traditional LSA model does not consider this relationship at the conceptual level. This makes it difficult to improve retrieval at the semantic level and causes a large loss of semantic information.
Examples
Embodiment 1
[0085] Embodiment 1. Establishing the feature matrix of a semantics-based Web document
[0086] Assume there are five Web documents from the Internet (first step) with the following contents:
[0091] Document 4: Lifeline of public transportation
[0092] Document 5: Buses and subways are the main means of transportation
[0093] First, a word segmentation tool counts the term frequencies of nouns, pronouns, locative words, personal names, place names, organization names, and other proper nouns in each document (second step). These counts form a keyword-document term frequency matrix (Table 3 below, corresponding to the third, fourth, and fifth steps).
[0094] Table 3. Keyword-document term frequency matrix, with n_i and idf_i
[0095] Keyword\document (word frequency)
[...
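The counting step can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's procedure: it assumes word segmentation and part-of-speech filtering have already produced a keyword list per document (the token lists below are hypothetical English stand-ins for Documents 4 and 5), takes n_i to be the number of documents containing keyword i, and uses idf_i = log(N / n_i), a common definition that the truncated Table 3 may or may not follow.

```python
import math
from collections import Counter


def build_tf_matrix(docs):
    """Build a keyword-document term frequency matrix.

    docs: list of keyword lists, one per document (assumed already segmented
    and filtered to nouns, pronouns, proper names, and so on).
    Returns (keywords, matrix) where matrix[i][j] is the count of
    keywords[i] in document j.
    """
    counts = [Counter(doc) for doc in docs]
    keywords = sorted(set().union(*counts))
    matrix = [[c[k] for c in counts] for k in keywords]
    return keywords, matrix


def document_frequency_and_idf(matrix):
    """Return (n, idf): n[i] is the number of documents containing keyword i,
    idf[i] = log(N / n[i]) (an assumed, commonly used IDF definition)."""
    num_docs = len(matrix[0]) if matrix else 0
    n = [sum(1 for tf in row if tf > 0) for row in matrix]
    idf = [math.log(num_docs / n_i) for n_i in n]
    return n, idf


# Hypothetical keyword lists for Documents 4 and 5 after segmentation.
docs = [
    ["lifeline", "public transportation"],
    ["bus", "subway", "transportation"],
]
keywords, tf = build_tf_matrix(docs)
n, idf = document_frequency_and_idf(tf)
for k, row, n_i, w in zip(keywords, tf, n, idf):
    print(f"{k:22s} {row} n_i={n_i} idf_i={w:.3f}")
```

Each row of the result corresponds to one keyword row of a table like Table 3; the real table also covers Documents 1 to 3, whose contents are not reproduced here.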
Embodiment 2
[0100] Embodiment 2. Semantics-based retrieval method for Web documents
[0101] Assume the query is "public transportation", and the data source to be searched consists of the five documents whose feature matrix was established in Embodiment 1.
[0102] Establish the ontology: assume the traffic ontology is as shown in Figure 5 (corresponding to the first preparation step).
[0103] according to $SN(N_1, N_2) = \frac{\mathrm{Depth}(\mathrm{com\_parent}(N_1, N_2))}{\mathrm{Height}(\mathrm{root}\,\ldots}$ ...
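The SN formula and the ontology-based query expansion described in the abstract can be sketched as follows. Everything here beyond the symbol names is an assumption: the formula is truncated in this copy, so the sketch reads it as the depth of the deepest common parent of two concepts divided by the height of the ontology tree (root at depth 0). The small traffic ontology stands in for Figure 5, which is not reproduced here, and `expand_query`, `query_vector`, and the fixed weight 1.0 for the original query concept are hypothetical choices made for illustration.

```python
# Hypothetical traffic ontology (the real one is the patent's Figure 5),
# stored as child -> parent; the root maps to None.
PARENT = {
    "traffic": None,
    "public transportation": "traffic",
    "private transportation": "traffic",
    "bus": "public transportation",
    "subway": "public transportation",
    "car": "private transportation",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth of a node, counting the root as depth 0 (assumed convention)."""
    return len(path_to_root(node)) - 1


def com_parent(n1, n2):
    """Deepest common ancestor of n1 and n2 (a node is its own ancestor)."""
    ancestors = set(path_to_root(n1))
    for node in path_to_root(n2):
        if node in ancestors:
            return node
    raise ValueError("nodes are not in the same tree")


def height():
    """Height of the tree below the root: depth of the deepest node."""
    return max(depth(n) for n in PARENT)


def sn(n1, n2):
    """Assumed reading of the truncated formula:
    SN(N1, N2) = Depth(com_parent(N1, N2)) / Height(root)."""
    return depth(com_parent(n1, n2)) / height()


def expand_query(concept, threshold=0.0):
    """Hypothetical expansion: weight each other concept by its SN to the
    query concept and keep those above the threshold. The query concept
    itself gets weight 1.0 by assumption (under this reading SN of an inner
    node with itself is below 1)."""
    weights = {concept: 1.0}
    for other in PARENT:
        if other != concept:
            s = sn(concept, other)
            if s > threshold:
                weights[other] = s
    return weights


def query_vector(weights, keywords):
    """Place expanded-concept weights at the matching keyword positions."""
    return [weights.get(k, 0.0) for k in keywords]


print(expand_query("public transportation"))
```

With this toy ontology, "bus" and "subway" share the parent "public transportation" and enter the query with weight 0.5, while "car" shares only the root and is dropped. This illustrates how the vector values can reflect concept similarity, as the abstract describes.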
Embodiment 3
[0129] Embodiment 3. Retrieval using the traditional LSA algorithm
[0130] Suppose there are five documents, and their contents are:
[0136] Document 5: Buses and subways are the main means of transportation
[0137] Suppose the query is: public transportation
[0138] First, a word segmentation tool counts the term frequencies of nouns, pronouns, locative words, personal names, place names, organization names, and other proper nouns in each document, forming a keyword-document term frequency matrix.
[0139] Table 5. Keyword-document term frequency matrix, with n_i and idf_i
[0140] Keyword\document (word frequency) | Document 1 | Document 2 | Document 3 | Document 4 | Document 5 | n_i | idf_i ...
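For comparison with the semantic method, traditional LSA retrieval can be sketched as follows: factor the keyword-document matrix with a truncated SVD, fold the query into the reduced space, and rank documents by cosine similarity. This is an illustrative sketch, not the computation behind Table 5. To stay dependency-free it finds the top-k singular values by power iteration with deflation on A^T A, where production code would call a linear-algebra library, and the small matrix in the usage lines is made up.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    """Matrix (list of rows) times vector."""
    return [dot(row, v) for row in m]


def normalize(v):
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else v


def truncated_svd(a, k, iters=500):
    """Top-k singular values and right singular vectors of the term-document
    matrix a, via power iteration with deflation on G = a^T a."""
    n_docs = len(a[0])
    g = [[sum(row[i] * row[j] for row in a) for j in range(n_docs)]
         for i in range(n_docs)]
    sigmas, vs = [], []
    for _ in range(k):
        # Deterministic, non-uniform start vector.
        v = normalize([1.0 + 0.1 * i for i in range(n_docs)])
        for _ in range(iters):
            v = normalize(matvec(g, v))
        lam = dot(v, matvec(g, v))  # Rayleigh quotient, equals sigma^2
        if lam <= 1e-12:
            break  # remaining directions carry no energy
        sigmas.append(math.sqrt(lam))
        vs.append(v)
        # Deflate so the next pass converges to the next singular direction.
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n_docs)]
             for i in range(n_docs)]
    return sigmas, vs


def lsa_rank(a, q, k):
    """Rank documents by cosine similarity to query q in a k-dim LSA space."""
    sigmas, vs = truncated_svd(a, k)
    # Left singular vectors: u_j = a v_j / sigma_j.
    us = [[x / s for x in matvec(a, v)] for s, v in zip(sigmas, vs)]
    # Fold the query in: q_hat = Sigma^-1 U^T q.
    q_hat = [dot(u, q) / s for u, s in zip(us, sigmas)]
    scores = []
    for d in range(len(a[0])):
        d_vec = [v[d] for v in vs]  # document d's coordinates: row d of V_k
        denom = math.sqrt(dot(q_hat, q_hat) * dot(d_vec, d_vec))
        scores.append(dot(q_hat, d_vec) / denom if denom > 0 else 0.0)
    order = sorted(range(len(scores)), key=lambda d: -scores[d])
    return order, scores


# Hypothetical 4-keyword x 3-document matrix (not Table 5).
A = [
    [2, 0, 1],  # "public transportation"
    [0, 1, 1],  # "bus"
    [0, 1, 0],  # "subway"
    [1, 0, 0],  # "lifeline"
]
query = [1, 0, 0, 0]  # the query "public transportation"
order, scores = lsa_rank(A, query, k=2)
print(order, [round(s, 3) for s in scores])
```

Here the query vector only marks the literal query keyword. This is the behavior the patent contrasts with its ontology-expanded query vector.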
Abstract
The invention relates to a method for establishing and searching the feature matrix of semantics-based Web documents, and belongs to the technical field of information retrieval. When the feature matrix of a Web document is established, the specific position information and special expression-form information in the document are added to the indexing process of the existing LSA model, which effectively improves on the prior LSA method. Retrieval proceeds as follows: first, the concepts in the query sentence are semantically expanded according to an ontology; second, a query vector is generated from the query concepts and their expanded concepts, and its values take into account the similarity between each query concept and its expanded concepts. This compensates, to a certain extent, for the semantic loss of the existing LSA model. The method indexes unstructured document information in a principled way and retrieves it effectively, allows unstructured information to be retrieved anywhere and at any time, and helps users obtain the information they need conveniently and promptly.
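The indexing idea in the abstract, adding position and expression-form information to LSA's term counts, can be sketched as a weighted term frequency. This is a hypothetical illustration: the abstract does not give the weights or how they combine, so the position classes, form classes, their values, and the multiplicative combination below are all assumptions. Extracting (keyword, position, form) triples from the HTML is also assumed to happen elsewhere.

```python
from collections import defaultdict

# Hypothetical weights: the abstract states that position and special
# expression-form information enter the LSA index but gives no values here.
POSITION_WEIGHT = {"title": 3.0, "heading": 2.0, "body": 1.0}
FORM_WEIGHT = {"bold": 1.5, "link": 1.2, "plain": 1.0}


def weighted_term_frequency(occurrences):
    """Sum per-occurrence weights for each keyword of one Web document.

    occurrences: iterable of (keyword, position, form) tuples, assumed to be
    extracted from the document's markup beforehand.
    Each occurrence contributes POSITION_WEIGHT * FORM_WEIGHT (an assumed
    multiplicative combination) instead of a plain count of 1.
    """
    tf = defaultdict(float)
    for keyword, position, form in occurrences:
        tf[keyword] += POSITION_WEIGHT[position] * FORM_WEIGHT[form]
    return dict(tf)


doc = [
    ("public transportation", "title", "plain"),
    ("lifeline", "body", "bold"),
    ("public transportation", "body", "plain"),
]
print(weighted_term_frequency(doc))
```

These weighted values would replace the raw counts in the keyword-document matrix before the LSA factorization, so a keyword in a title or in bold counts for more than one buried in body text.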
Description
Technical field
[0001] The invention relates to a method for establishing and retrieving the feature matrix of semantics-based Web documents, and belongs to the technical field of information retrieval.
Background art
[0002] With the development of database technology, the retrieval of formatted data has become relatively mature, and document retrieval based on string matching can already be implemented. However, there is no effective retrieval method for the large volume of unformatted documents (mainly data outside databases, such as Web documents). How to let users find the information they need as effectively and accurately as possible in vast collections of free text has become a hot topic in Chinese information retrieval.
[0003] The development of Web search engine technology makes it possible to retrieve massive amounts of Web page information on the Internet. However, this kind of retrieval also has its own disadvantages: the basic ...