Web Content Mining With Java: Techniques for Exploiting the World Wide Web - Softcover

Loton, Tony

 
9780470843116: Web Content Mining With Java: Techniques for Exploiting the World Wide Web

Synopsis

Unlock the potential of the world's biggest database. This practical book shows you how to build portals, search engines and other knowledge-based applications that mine the information you need from the Web.
* Written by a developer for developers
* A practical, hands-on approach
* Illustrates how Java-associated tools (XML, HTML) can be combined with database technology to display and manipulate Web-derived information more effectively
* Demonstrates how to build a structure browser, a portal and a meta-search engine, and how to make 'Talking Pages'

"synopsis" may belong to another edition of this title.

About the Author

Tony Loton, LOTONtech Ltd, Middlewich, UK

Tony Loton launched LOTONtech as a vehicle for researching and developing innovative software solutions. He developed the WebDataKit: a Java 2 solution comprising an API and a Structured Query Language designed specifically for the automatic extraction of HTML and XML from web sources. Tony's early Java web mining ideas were previously featured as a case study contribution to "Professional Java Data programming" (Wrox Press). This book takes those ideas much further, with brand new material.

From the Back Cover

What do you do with the information at the websites you visit? You read it, print it, and maybe do a screen grab. But you could do so much more with it if only you could get hold of the information in a more usable form: a form that you could manipulate, store and query automatically.

In this book you'll learn how to automate the:
* discovery of websites containing interesting data
* extraction of specific information from HTML and XML pages
* presentation of aggregate information via your own portal
* interpretation of data using text- and data-mining techniques

Java is the language of the web, so all practical examples are provided in the form of Java code that demonstrates HTTP communication, HTML and XML parsing, email retrieval and much more.

This is the book for you if you want some real, practical help to get your Java-based information applications off the ground.

"About this title" may belong to another edition of this title.