
A Loosely Coupled Interactive Web Data Extraction System

Jui-Yuan Su,
Lung-Pin Chen,
I-Chen Wu

Abstract


With the rapid growth of the Internet, Web data extraction (DE) systems have become a convenient tool for application programs to collect useful data. A DE system takes as input a wrapper, which is a script that describes how to navigate Web pages and extract data from them. Integrating application programs, wrappers, and DE systems is a nontrivial task because it requires full coordination and interaction between application programs and DE systems. Furthermore, the asynchronous update technologies used in many Web pages, such as AJAX, make the integration more complex.
This paper proposes a loosely coupled interactive DE system based on Browser-Oriented Data Extraction (BODE) systems and the Web Services Resource Framework (WSRF), a web service specification that standardizes access to and manipulation of state for web services. The system exposes the states of wrappers to users through web service states, called WS-Resources. It can also accept parameters from application programs during extraction through WSRF's notification design pattern. With these interactive capabilities, application programs and the wrappers of DE systems can be easily shared and controlled.
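
The interaction described above can be illustrated with a minimal in-process sketch. It is not the authors' implementation and does not use WSRF or any web service stack: the names WrapperResource, NotificationTopic, and run_wrapper, the "keyword" parameter, and the toy pages are all hypothetical, chosen only to show a wrapper publishing its state as readable properties while picking up parameters delivered by notification during extraction.

```python
import queue
import threading


class WrapperResource:
    """Hypothetical stand-in for a WS-Resource: holds a wrapper's state as properties."""

    def __init__(self):
        self._properties = {}
        self._lock = threading.Lock()

    def set_property(self, name, value):
        with self._lock:
            self._properties[name] = value

    def get_property(self, name, default=None):
        with self._lock:
            return self._properties.get(name, default)


class NotificationTopic:
    """Hypothetical publish/subscribe topic, loosely modeled on a notification pattern."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def notify(self, message):
        # Deliver the message to every subscriber in subscription order.
        for callback in self._subscribers:
            callback(message)


def run_wrapper(resource, inbox, pages):
    """Toy extraction loop: updates state per page and applies notified parameters."""
    results = []
    for index, page in enumerate(pages):
        resource.set_property("status", "extracting")
        resource.set_property("current_page", index)
        # Drain any parameters the application sent since the last page.
        while True:
            try:
                resource.set_property("params", inbox.get_nowait())
            except queue.Empty:
                break
        keyword = resource.get_property("params", {}).get("keyword", "")
        results.extend(line for line in page if keyword in line)
    resource.set_property("status", "done")
    return results


def demo():
    resource = WrapperResource()
    topic = NotificationTopic()
    inbox = queue.Queue()
    topic.subscribe(inbox.put)          # the wrapper listens on the topic
    topic.notify({"keyword": "price"})  # the application supplies a parameter
    pages = [["price: 10", "name: a"], ["price: 20"]]
    extracted = run_wrapper(resource, inbox, pages)
    return extracted, resource.get_property("status")
```

In this sketch the application and the wrapper share only the resource and the topic, which is one way to read "loosely coupled": neither side calls the other directly, and the application can inspect the wrapper's state at any time through the resource.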

Keywords


Data extraction; WS-Resource; Notification; Wrapper; Browser-Oriented Data Extraction and WSRF

Citation Format:
Jui-Yuan Su, Lung-Pin Chen, I-Chen Wu, "A Loosely Coupled Interactive Web Data Extraction System," Journal of Internet Technology, vol. 11, no. 2, pp. 237-249, Mar. 2010.







Published by Executive Committee, Taiwan Academic Network, Ministry of Education, Taipei, Taiwan, R.O.C
JIT Editorial Office, Office of Library and Information Services, National Dong Hwa University
No. 1, Sec. 2, Da Hsueh Rd., Shoufeng, Hualien 974301, Taiwan, R.O.C.
Tel: +886-3-931-7314  E-mail: jit.editorial@gmail.com