
Most Possible Partition: Utilizing Semantic Links for Duplicate Detection

Hai Jin,
Li Huang,
Ping-Peng Yuan

Abstract


Duplicate detection is an active research topic in heterogeneous data integration and information retrieval, where the main goals are efficient and precise detection. In this paper, we introduce a duplicate detection method based on semantic links among data and propose a novel approach, named Most Possible Partition (MPP), to detect duplicates efficiently. The main principle of MPP is to partition the data into most-possible-duplicate parts, that is, parts in which duplicates are more likely to occur. Unlike the classical Sorted Neighborhood Method (SNM), MPP does not sort the data into a particular order. We give an effective partition method that uses semantic links among entities. Experiments on publication datasets show that the proposed method is efficient and that MPP outperforms SNM in both performance and accuracy.
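The partition idea can be illustrated with a minimal sketch. The abstract does not specify how MPP builds its partitions, so everything below is an assumption made for illustration: records are grouped into connected components whenever they share a linked entity (here, an author name), and pairwise comparison is restricted to records inside the same part. The function names `partition_by_links` and `find_duplicates`, the record layout, the title-similarity measure, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def partition_by_links(records):
    """Group record ids that share at least one linked entity.

    Assumption: a "semantic link" is modeled as a shared entity name.
    Connected components are computed with a small union-find.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Index records by each entity they link to.
    by_link = defaultdict(list)
    for rid, rec in records.items():
        for link in rec["links"]:
            by_link[link].append(rid)

    # Merge all records that point at the same entity.
    for rids in by_link.values():
        for other in rids[1:]:
            parent[find(other)] = find(rids[0])

    parts = defaultdict(list)
    for rid in records:
        parts[find(rid)].append(rid)
    return sorted(sorted(p) for p in parts.values())


def find_duplicates(records, threshold=0.85):
    """Compare titles only within each part (hypothetical threshold)."""
    pairs = []
    for part in partition_by_links(records):
        for a, b in combinations(part, 2):
            ratio = SequenceMatcher(
                None, records[a]["title"].lower(), records[b]["title"].lower()
            ).ratio()
            if ratio >= threshold:
                pairs.append((a, b))
    return pairs


# Hypothetical publication records; "links" holds linked author entities.
records = {
    1: {"title": "Most Possible Partition for Duplicate Detection",
        "links": {"Hai Jin", "Li Huang"}},
    2: {"title": "Most possible partition for duplicate detection.",
        "links": {"Li Huang"}},
    3: {"title": "A Survey of Graph Databases",
        "links": {"A. Author"}},
}
print(partition_by_links(records))
print(find_duplicates(records))
```

In this sketch, no global sort is required: candidate pairs come only from records that fall into the same part, whereas SNM draws them from a sliding window over a sorted list.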

Keywords


Semantic links; Partition; Duplicate detection

Citation Format:
Hai Jin, Li Huang, Ping-Peng Yuan, "Most Possible Partition: Utilizing Semantic Links for Duplicate Detection," Journal of Internet Technology, vol. 11, no. 3, pp. 333-342, May 2010.






Published by Executive Committee, Taiwan Academic Network, Ministry of Education, Taipei, Taiwan, R.O.C
JIT Editorial Office, Office of Library and Information Services, National Dong Hwa University
No. 1, Sec. 2, Da Hsueh Rd., Shoufeng, Hualien 974301, Taiwan, R.O.C.
Tel: +886-3-931-7314  E-mail: jit.editorial@gmail.com