Extracting Information from the Web

J.L. Arjona,
R. Corchuelo,
D. Ruiz,
M. Toro

Abstract


Extracting structured, semantically meaningful information from the web is quite a difficult task from a programmatic point of view. The main reason is that most documents are available in human-readable forms, but they lack a description of the structure or the semantics associated with the data they contain; furthermore, their appearance may change unexpectedly, which complicates the problem. In this article, we present a framework that relieves web agent developers from the task of writing specific code to access the information they need from the web. This proposal achieves a complete separation between the logic an agent encapsulates and the way the information it needs is extracted, which enhances modularity, adaptability, and maintainability. It also allows developers to define the navigation path to the page that contains the information of interest, and it accommodates unexpected changes to the information sources. Our approach is novel in that it combines different technologies to extract information from the web and associates semantics with it, which facilitates semantic interoperability in a multi-agent society.

Keywords


Web agents; information extraction; wrappers; ontologies.

Citation Format:
J.L. Arjona, R. Corchuelo, D. Ruiz, M. Toro, "Extracting Information from the Web," Journal of Internet Technology, vol. 3, no. 4, pp. 267-274, Oct. 2002.

Published by Executive Committee, Taiwan Academic Network, Ministry of Education, Taipei, Taiwan, R.O.C
JIT Editorial Office, Office of Library and Information Services, National Dong Hwa University
No. 1, Sec. 2, Da Hsueh Rd., Shoufeng, Hualien 974301, Taiwan, R.O.C.
Tel: +886-3-931-7314  E-mail: jit.editorial@gmail.com