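Abstract
Web information integration is a research field that has emerged as the Internet has become increasingly widespread. The process gathers information from the Internet according to the content of a domain ontology and integrates that information into the ontology. In essence, web information integration provides a mechanism for reorganizing and understanding information on the web. Based mainly on Web Captor, a system I developed independently, this thesis describes how each stage of web information integration is implemented. The first stage is modeling: the thesis uses a frame-based representation to specify its object of study, a simplified business domain ontology, and identifies the correspondence between information sources and the ontology. The second stage, query planning and data integration, is the focus of the thesis. The query planning problem is expressed in the traditional description style of AI planning, and its key difference from traditional planning is identified: the core of plan generation is deciding whether an operator's preconditions can be satisfied, rather than resolving conflicts between operators. I propose a new method that constructs a compatibility check table from knowledge reasoning rules to decide precondition satisfiability. So that plan execution can adapt to real-time network conditions, I propose a heuristic graph search algorithm that alternates between generating and executing access operators. For data integration, I also propose methods for deciding whether records refer to the same information object and for repairing incomplete or conflicting data. The third stage covers source access and information extraction. For source access, the thesis discusses the conversion from information source descriptions to access interfaces, and explains how a vector space model of entity names is built to classify text under entities. For information extraction, I define an HTML-tree-based specification for extraction rules and propose a new method that uses heuristic information to obtain extraction rules for list objects automatically. Finally, the thesis describes the structure, interface, and runtime environment of the current Web Captor system, gives a running example, and proposes directions for future research.
Key Words: Ontology, Query Planning, Data Integration, Information Extraction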
ABSTRACT
Internet Information Integration is a research domain that has emerged with the popularization of the Internet. Its process is to gather information from the Internet according to the specification of a domain ontology and to integrate that information into the ontology. In essence, Internet Information Integration provides a new mechanism for reorganizing and understanding information on the Internet. Based on Web Captor, an Internet Information Integration system that I developed myself, this thesis describes how each step of Internet Information Integration is implemented. First, I model the problem of Internet Information Integration. I use a frame-based model to specify the research object, a simplified business domain ontology, and explain the relationship between information sources and the ontology. Second, I discuss query planning and data integration, the focus of this thesis. I use the traditional planning description to model the planning problem that arises when answering queries over Internet information. I then point out the essential difference between traditional planning and query planning: the core problem during plan creation is to determine whether an operator's premises can be satisfied, not to resolve conflicts among operators. I develop a new method that builds a compatibility check table from knowledge reasoning rules to determine whether an operator's premises can be satisfied. I also develop an informed graph search method that alternates between creating and executing operators, so that plan execution can adapt to the network environment. Furthermore, for data integration, I explain how to determine whether records refer to the same information object and how to repair incomplete or conflicting data. Third, I present methods for implementing source access and information extraction. For source access, I describe how to map a source description to an access interface, and develop a method that maps a paragraph to a predefined entity by building a vector space model of the entity names. For information extraction, I define an HTML-tree-based specification for extraction rules, and develop an automatic method for obtaining extraction rules for list objects. Finally, I describe the current Web Captor system and suggest areas for further research.
Key Words: Ontology, Query Planning, Data Integration, Information Extraction
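The abstract mentions mapping a paragraph to a predefined entity through a vector space model of entity names, without giving details at this point. The following is a minimal sketch of one way this could look, assuming plain term-frequency vectors compared by cosine similarity. The function names, the tokenizer, the threshold value, and the sample entity names are all hypothetical and are not taken from Web Captor.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(paragraph, entity_names, threshold=0.1):
    """Return the entity name most similar to the paragraph.

    Returns None when no entity reaches the (assumed) similarity threshold.
    """
    query = Counter(tokenize(paragraph))
    best_name, best_score = None, 0.0
    for name in entity_names:
        score = cosine(query, Counter(tokenize(name)))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Hypothetical entity names standing in for entries of a business ontology.
ENTITY_NAMES = ["Acme Steel Company", "Blue River Bank"]
print(classify("Shares of Blue River Bank rose on Monday", ENTITY_NAMES))
```

A practical system would likely weight terms (for example by inverse document frequency) so that common words such as "company" or "bank" count for less than distinctive ones. The raw counts used here only keep the sketch short.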