ISSN:
1070-5325
Keywords:
information; latent semantic indexing; low-rank; orthogonal; matrices; retrieval; singular value decomposition; sparse; ULV and URV decompositions; updating; Engineering; Engineering General
Source:
Wiley InterScience Backfile Collection 1832-2000
Topics:
Mathematics
Notes:
Current methods to index and retrieve documents from databases usually depend on a lexical match between query terms and keywords extracted from documents in a database. These methods can produce incomplete or irrelevant results due to the use of synonyms and polysemous words. The association of terms with documents (or implicit semantic structure) can be derived using large sparse term-by-document matrices. In fact, both terms and documents can be matched with user queries using representations in k-space (where 100 ≤ k ≤ 200) derived from k of the largest approximate singular vectors of these term-by-document matrices. This completely automated approach, called latent semantic indexing (LSI), uses subspaces spanned by the approximate singular vectors to encode important associative relationships between terms and documents in k-space. Using LSI, two or more documents may be close to each other in k-space (and hence in meaning) yet share no common terms. The focus of this work is to demonstrate the computational advantages of exploiting low-rank orthogonal decompositions such as the ULV (or URV) as opposed to the truncated singular value decomposition (SVD) for the construction of initial and updated rank-k subspaces arising from LSI applications.
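The truncated-SVD variant of LSI described above can be sketched in a few lines of NumPy. This is an illustrative toy only: the term-by-document matrix, query vector, and k = 2 are invented for the example (real LSI applications use large sparse matrices with k between 100 and 200), and it uses a dense SVD rather than the ULV/URV decompositions the article advocates.

```python
import numpy as np

# Hypothetical toy term-by-document matrix (rows = terms, columns = documents).
A = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
])

k = 2  # rank of the approximating subspace (illustrative; LSI uses 100-200)

# Truncated SVD: keep only the k largest singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# Document representations in k-space: columns of diag(s_k) @ V_k^T.
doc_vectors = np.diag(sk) @ Vtk          # shape (k, n_docs)

# Fold a query into the same k-space: q_k = U_k^T q.
q = np.array([1.0, 1.0, 0.0, 0.0, 0.0])  # hypothetical query using terms 0 and 1
q_k = Uk.T @ q

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank documents by cosine similarity to the query in k-space;
# documents can score highly even without sharing query terms.
scores = [cosine(q_k, doc_vectors[:, j]) for j in range(A.shape[1])]
ranking = np.argsort(scores)[::-1]
```

By the Eckart-Young theorem, this rank-k reconstruction is optimal in the Frobenius norm; the article's point is that ULV/URV decompositions can reach a comparable rank-k subspace more cheaply, especially when the matrix must be updated as documents are added.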
Additional Material:
11 Ill.
Type of Medium:
Electronic Resource