
A GA-Based Document Clustering Method for Search Engines

Chun-Wei Tsai, Ming-Chao Chiang, Chu-Sing Yang

Abstract


In this paper, we present a novel genetic algorithm, called the Multiple Search Genetic Algorithm (MSGA), for clustering the web pages returned by a search engine and providing the user with a taxonomy of those pages. MSGA uses two kinds of chromosomes, conservative and explorer, to improve its search capability and enhance the clustering result. The conservative chromosomes keep the better solutions found at each generation, while the explorer chromosomes increase the number of search directions to avoid falling into local minima. With this multiple search strategy, the proposed method can find the optimal solutions quickly. Our simulation results show that the proposed algorithm outperforms other algorithms. We also present a clustering search engine system, called the Document Clustering Search Engine (DCSE). DCSE is responsible for spawning agents that collect the web pages from the meta-search engine and for computing the similarity between the web pages. The user of the system receives information that has been computed and sorted, along with web links ranked according to their relevance. As a result, the time required to filter out irrelevant information is greatly reduced.
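
The abstract does not give the details of MSGA, so the following is only a minimal sketch of the general idea it describes: a genetic algorithm for clustering in which one part of the population is kept as-is (the "conservative" role) and the other part is regenerated with crossover and heavy mutation (the "explorer" role). Everything concrete here is an assumption made for illustration and is not taken from the paper: the label-vector encoding, the sum-of-squared-errors fitness, the 50/50 population split, the mutation rate, and the names `sse`, `mutate`, `crossover`, and `two_role_ga`. Documents are assumed to be already represented as numeric feature vectors.

```python
import random


def sse(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid (lower is better)."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def mutate(labels, k, rate, rng):
    """Reassign each point to a random cluster with probability `rate`."""
    return [rng.randrange(k) if rng.random() < rate else lab for lab in labels]


def crossover(a, b, rng):
    """One-point crossover of two label vectors."""
    if len(a) < 2:
        return a[:]
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def two_role_ga(points, k, pop_size=20, generations=100, seed=0):
    """Hypothetical two-role GA; returns (best_labels, best_fitness_per_generation)."""
    rng = random.Random(seed)
    n = len(points)
    half = pop_size // 2  # assumed split between the two roles

    def fitness(chromosome):
        return sse(points, chromosome, k)

    # A chromosome is a list of cluster labels, one per point.
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        conservative = pop[:half]  # best solutions, carried over unchanged
        explorers = []
        while len(explorers) < pop_size - half:
            a, b = rng.sample(conservative, 2)
            # Assumed explorer behaviour: recombine, then mutate aggressively.
            explorers.append(mutate(crossover(a, b, rng), k, 0.2, rng))
        pop = conservative + explorers
    return min(pop, key=fitness), history


if __name__ == "__main__":
    docs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels, history = two_role_ga(docs, k=2)
    print(labels, history[-1])
```

Because the conservative half is carried over unchanged, the best fitness in this sketch can never get worse from one generation to the next; the explorer half is the only source of new candidate solutions.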

Keywords


information retrieval; document clustering; search engine

Citation Format:
Chun-Wei Tsai, Ming-Chao Chiang, Chu-Sing Yang, "A GA-Based Document Clustering Method for Search Engines," Journal of Internet Technology, vol. 9, no. 4, pp. 375-383, Oct. 2008.

Published by Executive Committee, Taiwan Academic Network, Ministry of Education, Taipei, Taiwan, R.O.C.
JIT Editorial Office, Office of Library and Information Services, National Dong Hwa University
No. 1, Sec. 2, Da Hsueh Rd., Shoufeng, Hualien 974301, Taiwan, R.O.C.
Tel: +886-3-931-7314  E-mail: jit.editorial@gmail.com