Web-based data mining

Automatically extract information with HTML, XML, and Java

Level: Advanced
Jussi Myllymaki (jussi@almaden.ibm.com), Researcher, IBM
Jared Jackson (jjared@almaden.ibm.com), Researcher, IBM
01 Jun 2001
The World Wide Web is now undeniably the richest and densest source of information the world has ever seen, yet its structure makes it difficult to use that information in a systematic way. The methods and tools described in this article will enable developers familiar with the most common technologies of the Web to quickly and easily extract the Web-delivered information they need.
The rapid growth of the World Wide Web has made a wide variety of public information broadly available. Unfortunately, while HTML, the major carrier of this information, provides a convenient way to present information to human readers, it is a challenging structure from which to automatically extract information relevant to a data-driven service or application.
A variety of approaches have been taken to solve this problem. Most take the form of a proprietary query language that maps sections of an HTML page into code that populates a database with information from the Web page. While these approaches may offer some advantages, most are impractical for two reasons. First, they require a developer to take the time to learn a query language that cannot be used in any other setting. Second, they are not robust enough to survive the simple, inevitable changes to the Web pages they target.
This article develops a method for Web-based data mining that uses the standard technologies of the Web -- HTML, XML, and Java. The method is at least as powerful as the proprietary solutions, and it requires little effort to produce robust results for those already familiar with these technologies. As an added bonus, much of the code needed to begin data extraction is included with this article.
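As a rough illustration of what extraction with standard Java XML APIs can look like, the sketch below pulls one value out of a page with an XPath expression. It is not the code that accompanies this article. The class name, the sample markup, and the XPath expression are all hypothetical, and the sketch assumes the page has already been converted to well-formed XHTML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XhtmlExtractSketch {
    // Hypothetical, already well-formed page fragment (an assumption, not from the article).
    static final String SAMPLE_XHTML =
        "<html><body><table>"
      + "<tr><td>Temperature</td><td id=\"temp\">72</td></tr>"
      + "</table></body></html>";

    // Parse the XHTML into a DOM tree and return the string value
    // of the first node matched by the XPath expression.
    static String extract(String xhtml, String xpathExpr) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(xpathExpr, doc).trim();
    }

    public static void main(String[] args) throws Exception {
        // Select the table cell by its id rather than by position,
        // so small layout changes are less likely to break the query.
        System.out.println(extract(SAMPLE_XHTML, "//td[@id='temp']"));
    }
}
```

The query anchors on an attribute instead of an absolute path through the document. That is one way to tolerate the small page changes mentioned above, though real pages would still need an HTML-to-XHTML cleanup step first.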
