Using Topic Identification in Chinese Information Retrieval
Abstract
Information retrieval is the task of identifying documents in a text collection that are relevant to a query. In current information retrieval systems, users can query with an unordered set of keywords, a question, or a sentence, and the system returns a list of links to matching documents, ranked by their relevance to the query. In this article, we investigate the hypothesis that a discourse-level element, the topic, can contribute to the computations of information retrieval. Because zero anaphora occurs frequently in Chinese texts, topics are often omitted and do not appear in the surface text. We employ the key elements of the centering model of local discourse coherence to extract the structures of discourse segments, and we propose a topic identification method that uses this local discourse structure to recover omitted topics and identify the topics of the documents in the collection. The topic information is then inserted into the text to create better indices. Experimental results are reported on a test collection taken from the Chinese Information Retrieval Benchmark, version 3.0.
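The abstract outlines the method only at a high level. The following is a minimal sketch of the general idea, not the authors' algorithm, under several assumptions. Utterances are taken to be already shallow-parsed into a subject (`None` when it is omitted by zero anaphora) and a ranked tuple of other entities. An omitted subject is filled with the preferred center Cp of the previous utterance. The document topic is taken to be the most frequent backward-looking center Cb, and the topic is appended to the index terms once more. The names `Utterance`, `recover_zero_subjects`, `document_topic`, and `index_terms`, the entity ranking, and the frequency-based topic choice are all hypothetical and chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Utterance:
    """One utterance (roughly a clause) after shallow parsing (hypothetical model).

    subject: the subject entity, or None when it is omitted (zero anaphora).
    others:  remaining entities, already ranked (assumed: object > others).
    """
    subject: Optional[str]
    others: Tuple[str, ...] = ()

    def cf(self) -> List[str]:
        """Ranked forward-looking centers Cf(U); the first element is Cp(U)."""
        return ([self.subject] if self.subject else []) + list(self.others)


def recover_zero_subjects(utts: List[Utterance]) -> List[Utterance]:
    """Fill each omitted subject with Cp of the preceding (recovered) utterance.

    Simplified heuristic: centering predicts that Cp(U_{i-1}) is the most
    likely Cb(U_i). Returns new objects; the input list is not modified.
    """
    out: List[Utterance] = []
    for u in utts:
        if u.subject is None and out and out[-1].cf():
            u = replace(u, subject=out[-1].cf()[0])
        out.append(u)
    return out


def backward_center(prev: Utterance, cur: Utterance) -> Optional[str]:
    """Cb(U_i): the highest-ranked element of Cf(U_{i-1}) realized in U_i."""
    realized = set(cur.cf())
    return next((e for e in prev.cf() if e in realized), None)


def document_topic(utts: List[Utterance]) -> Optional[str]:
    """Assumption: the most frequent backward-looking center is the topic."""
    cbs = [backward_center(p, c) for p, c in zip(utts, utts[1:])]
    counts = Counter(cb for cb in cbs if cb is not None)
    if counts:
        return counts.most_common(1)[0][0]
    # Fallback (assumption): the subject of the first utterance, if any.
    return utts[0].subject if utts else None


def index_terms(utts: List[Utterance]) -> List[str]:
    """Index terms: all entities after recovery, plus the topic appended once."""
    recovered = recover_zero_subjects(utts)
    terms = [e for u in recovered for e in u.cf()]
    topic = document_topic(recovered)
    if topic is not None:
        terms.append(topic)  # raises the topic's term frequency in the index
    return terms


# Toy document with two zero-anaphoric subjects (Ø):
#   市政府宣布新政策。 (Ø)將補助學生。 (Ø)也調整預算。
doc = [
    Utterance("市政府", ("新政策",)),
    Utterance(None, ("學生",)),
    Utterance(None, ("預算",)),
]

print(document_topic(recover_zero_subjects(doc)))
print(Counter(index_terms(doc))["市政府"])
```

With recovery, 市政府 ("city government") is realized in all three utterances instead of only the first, so its term frequency in the index rises from 1 to 4 (three realizations plus the appended topic). A real system would also need to decide which omitted positions count as topics and how to weight the inserted terms. This sketch hard-codes both choices.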
Keywords
Natural Language Processing; Shallow Parsing; Topic Identification; Information Retrieval
Citation Format:
Ching-Long Yeh, Yi-Chun Chen, "Using Topic Identification in Chinese Information Retrieval," Journal of Internet Technology, vol. 10, no. 2, pp. 95-102, Apr. 2009.
Published by Executive Committee, Taiwan Academic Network, Ministry of Education, Taipei, Taiwan, R.O.C.
JIT Editorial Office, Office of Library and Information Services, National Dong Hwa University
No. 1, Sec. 2, Da Hsueh Rd., Shoufeng, Hualien 974301, Taiwan, R.O.C.
Tel: +886-3-931-7314 E-mail: jit.editorial@gmail.com