
Efficiently Comparing Provenance for Knowledge Discovery

Jiuyang Tang, Xiang Zhao, Bin Ge, Weidong Xiao, Haichuan Shang


Provenance is a record that describes the entities and processes involved in producing, delivering, and influencing a resource. Provenance management and reuse can enable interesting applications for knowledge discovery and analytics. One crucial component of a provenance management system is the comparison of provenance records. In the era of big data, provenance management systems need a scalable algorithmic solution for efficient comparison. Existing solutions to the problem have a large memory footprint and incur long system response times. In this paper, we present a new solution to threshold-based provenance comparison. It models provenance directly as graphs and measures their similarity using provenance edit distance. We first provide analytic results on the expected search space of the existing and the proposed solutions. On top of the depth-first search paradigm, we design an algorithm, PEDSim, that uses an encoding technique specific to provenance graphs together with quantifiable heuristics. Extensive experiments on real data demonstrate the superiority of our method over the alternatives.
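
To make the idea of threshold-based comparison concrete, the following is a minimal sketch of a depth-first threshold test on small labeled directed graphs. It is not PEDSim: it omits the encoding technique and heuristics mentioned above and only illustrates the general paradigm of extending a partial vertex mapping and pruning once the accumulated edit cost exceeds the threshold. The function name `within_edit_distance`, the unit-cost edit model (vertex insertion, deletion, relabeling; edge insertion, deletion), the absence of self-loops, and the vertex labels used in the example are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Optional, Set, Tuple

Vertex = Hashable
# A graph is (vertex -> label, set of directed edges); no self-loops assumed.
Graph = Tuple[Dict[Vertex, str], Set[Tuple[Vertex, Vertex]]]


def within_edit_distance(g1: Graph, g2: Graph, tau: int) -> bool:
    """Return True if g1 can be turned into g2 with at most tau unit-cost edits."""
    labels1, edges1 = g1
    labels2, edges2 = g2
    order = list(labels1)  # fixed order in which vertices of g1 are mapped

    def extend_cost(u: Vertex, v: Optional[Vertex],
                    mapping: Dict[Vertex, Optional[Vertex]]) -> int:
        # Cost of mapping u to v (None means u is deleted).
        cost = 1 if v is None or labels1[u] != labels2[v] else 0
        # Edge mismatches between u and every already-mapped vertex.
        for u2, v2 in mapping.items():
            for a, b, c, d in ((u, u2, v, v2), (u2, u, v2, v)):
                in1 = (a, b) in edges1
                in2 = c is not None and d is not None and (c, d) in edges2
                cost += in1 != in2
        return cost

    def remainder_cost(mapping: Dict[Vertex, Optional[Vertex]]) -> int:
        # Unmatched vertices of g2 and their incident edges must be inserted.
        used = {v for v in mapping.values() if v is not None}
        free = set(labels2) - used
        dangling = sum(1 for a, b in edges2 if a in free or b in free)
        return len(free) + dangling

    def dfs(i: int, mapping: Dict[Vertex, Optional[Vertex]], cost: int) -> bool:
        if cost > tau:  # threshold pruning
            return False
        if i == len(order):
            return cost + remainder_cost(mapping) <= tau
        u = order[i]
        used = {v for v in mapping.values() if v is not None}
        candidates = [w for w in labels2 if w not in used] + [None]
        for v in candidates:
            step = extend_cost(u, v, mapping)
            mapping[u] = v
            found = dfs(i + 1, mapping, cost + step)
            del mapping[u]
            if found:
                return True
        return False

    return dfs(0, {}, 0)


# Illustrative graphs; labels are hypothetical.
g1 = ({"a": "entity", "b": "activity"}, {("a", "b")})
g2 = ({"x": "entity", "y": "activity", "z": "agent"}, {("x", "y"), ("y", "z")})
print(within_edit_distance(g1, g2, 2))  # one vertex and one edge must be inserted
```

The search is exponential in the worst case, which is why pruning by the threshold matters: a branch is abandoned as soon as its partial cost alone exceeds `tau`.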


Provenance; Similarity comparison; Edit distance; Depth-first search

Citation Format:
Jiuyang Tang, Xiang Zhao, Bin Ge, Weidong Xiao, Haichuan Shang, "Efficiently Comparing Provenance for Knowledge Discovery," Journal of Internet Technology, vol. 15, no. 6, pp. 963-974, Nov. 2014.


Published by Executive Committee, Taiwan Academic Network, Ministry of Education, Taipei, Taiwan, R.O.C
JIT Editorial Office, Office of Library and Information Services, National Dong Hwa University
No. 1, Sec. 2, Da Hsueh Rd., Shoufeng, Hualien 974301, Taiwan, R.O.C.
Tel: +886-3-931-7314