Saturday, April 19, 2014
Semantic Analysis
These research papers cover a variety of semantic analysis approaches including LSA.


Jozef Stefan Institute - From the automated text processing point of view, natural language is very redundant in the sense that many different words share a common or similar meaning. For a computer this can be hard to understand without some background knowledge. Latent Semantic Indexing (LSI) is a technique that helps in extracting some of this background knowledge from a corpus of text documents. This can also be viewed as extraction of hidden semantic concepts from text documents. On the other hand, visualization can be very helpful in data analysis, for instance, for finding the main topics that appear in larger sets of documents. Extraction of main concepts from documents using techniques such as LSI can make the results of visualizations more useful.
Berkeley - Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.
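As a rough illustration of the latent class model described in this abstract, the sketch below fits P(z), P(d|z) and P(w|z) to a small count matrix. It uses plain EM, not the generalized variant the abstract mentions, and the function name `plsa`, the toy counts and the random initialization are all assumptions made for this example.

```python
import math
import random

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit P(z), P(d|z), P(w|z) to counts[d][w] with plain EM (illustrative only)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalized(values):
        total = sum(values)
        return [v / total for v in values]

    # Random positive starting point; each distribution sums to 1.
    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [normalized([rng.random() + 0.1 for _ in range(n_docs)])
             for _ in range(n_topics)]
    p_w_z = [normalized([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    history = []  # log-likelihood under the parameters at the start of each iteration

    for _ in range(n_iter):
        new_z = [0.0] * n_topics
        new_d = [[0.0] * n_docs for _ in range(n_topics)]
        new_w = [[0.0] * n_words for _ in range(n_topics)]
        loglik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z) P(d|z) P(w|z).
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                loglik += n * math.log(total)
                for z in range(n_topics):
                    expected = n * joint[z] / total
                    new_z[z] += expected
                    new_d[z][d] += expected
                    new_w[z][w] += expected
        history.append(loglik)
        # M-step: renormalize the expected counts.
        p_z = normalized(new_z)
        p_d_z = [normalized(row) for row in new_d]
        p_w_z = [normalized(row) for row in new_w]
    return p_z, p_d_z, p_w_z, history

# Made-up counts: documents 0-1 use words 0-1, documents 2-3 use words 2-3.
COUNTS = [
    [4, 3, 0, 0],
    [3, 5, 1, 0],
    [0, 0, 4, 4],
    [0, 1, 3, 5],
]
p_z, p_d_z, p_w_z, history = plsa(COUNTS, n_topics=2)
```

The log-likelihood recorded in `history` should never decrease from one iteration to the next, which is a convenient sanity check for any EM implementation.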
Università degli Studi di Milano - Minimal-interval semantics [5] associates with each query over a document a set of intervals, called witnesses, that are incomparable with respect to inclusion (i.e., they form an antichain): witnesses define the minimal regions of the document satisfying the query. Minimal-interval semantics makes it easy to define and compute several sophisticated proximity operators, provides snippets for user presentation, and can be used to rank documents. In this paper we provide algorithms for computing conjunction and disjunction that are linear in the number of intervals and logarithmic in the number of operands; for additional operators, such as ordered conjunction and Brouwerian difference, we provide linear algorithms.
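To make the notion of witnesses concrete, here is a brute-force sketch of conjunction and disjunction over interval sets. It is not the linear-time algorithm the paper provides: it enumerates every combination of operand intervals and then discards any interval that contains another. The function names and the reading of the two operators (conjunction as the minimal intervals covering one interval from every operand, disjunction as the minimal intervals among all operands) are assumptions made for this example.

```python
from itertools import product

def minimal_intervals(intervals):
    """Drop every interval that contains another one, leaving an antichain."""
    unique = set(intervals)
    return sorted(
        (left, right)
        for (left, right) in unique
        if not any(
            (l2, r2) != (left, right) and left <= l2 and r2 <= right
            for (l2, r2) in unique
        )
    )

def conjunction(*operands):
    """Minimal intervals that cover at least one interval from every operand."""
    spans = [
        (min(l for l, _ in choice), max(r for _, r in choice))
        for choice in product(*operands)
    ]
    return minimal_intervals(spans)

def disjunction(*operands):
    """Minimal intervals among the intervals of all operands."""
    return minimal_intervals([iv for operand in operands for iv in operand])

# Hypothetical document "a b x a y b": term a at positions 0 and 3, b at 1 and 5.
A_HITS = [(0, 0), (3, 3)]
B_HITS = [(1, 1), (5, 5)]

print(conjunction(A_HITS, B_HITS))  # [(0, 1), (1, 3), (3, 5)]
print(disjunction(A_HITS, B_HITS))  # [(0, 0), (1, 1), (3, 3), (5, 5)]
```

The candidate span (0, 5) is dropped from the conjunction because it contains (0, 1), so it is not a minimal region satisfying the query.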
Applied Semantics (later acquired by Google) - Human beings today are inundated with massive amounts of information. While the availability of such a wealth of information provides an unprecedented opportunity for access to knowledge and cross-fertilization of ideas, it also introduces the problem of how to organize the information in such a way that the information is easily digestible and accessible. Great efforts are being made to facilitate the process of helping users to find information that is relevant to their goals, by improving search technologies and automating document classification.
Various - A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.
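The retrieval scheme in this abstract can be sketched with NumPy on a toy matrix. The example keeps 2 factors instead of roughly 100, and the vocabulary, counts, threshold and function names are all invented for illustration. Documents and the query are compared in the space spanned by the top-k left singular vectors, which is one common scaling choice and may differ from the one the paper uses.

```python
import numpy as np

# Toy term-by-document counts (rows: terms, columns: documents); made-up data.
TERMS = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0, 0],  # car
    [0, 1, 1, 0, 0],  # automobile
    [1, 1, 0, 0, 0],  # engine
    [0, 0, 0, 2, 1],  # flower
    [0, 0, 0, 1, 2],  # petal
], dtype=float)

def lsi_fit(matrix, k):
    """Truncated SVD: keep the k largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def retrieve(query, U_k, s_k, Vt_k, threshold=0.5):
    """Return (document index, cosine) pairs whose cosine exceeds the threshold."""
    q_vec = query @ U_k                  # query as a weighted combination of term vectors
    doc_vecs = (s_k[:, None] * Vt_k).T   # one row of factor weights per document
    scores = [cosine(q_vec, d) for d in doc_vecs]
    return [(j, round(score, 3)) for j, score in enumerate(scores) if score > threshold]

# Query containing only the term "automobile".
q = np.zeros(len(TERMS))
q[TERMS.index("automobile")] = 1.0

hits = retrieve(q, *lsi_fit(A, k=2), threshold=0.5)
print(hits)
```

Documents 0 to 2 all score close to 1, including document 0, which contains "car" and "engine" but not the query term itself, while the two flower documents score close to 0 and are filtered out.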
University of Colorado - Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text (Landauer and Dumais, 1997). The underlying idea is that the aggregate of all the word contexts in which a given word does and does not appear provides a set of mutual constraints that largely determines the similarity of meaning of words and sets of words to each other. The adequacy of LSA’s reflection of human knowledge has been established in a variety of ways. For example, its scores overlap those of humans on standard vocabulary and subject matter tests; it mimics human word sorting and category judgments; it simulates word–word and passage–word lexical priming data; and, as reported in 3 following articles in this issue, it accurately estimates passage coherence, learnability of passages by individual students, and the quality and quantity of knowledge contained in an essay.