6. Use DOM methods to navigate a document


Problem

You have an HTML document that you want to extract data from, and you know its general structure.

Solution

Use the DOM-like methods available after parsing HTML into a Document.

File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
Element content = doc.getElementById("content");
Elements links = content.getElementsByTag("a");
for (Element link : links) {
  String linkHref = link.attr("href");
  String linkText = link.text();
}

Description

Element objects provide a range of DOM-like methods for finding elements and for extracting and manipulating their data. The DOM getters are contextual: called on a parent Document, they find matching elements anywhere in the document; called on a child element, they find matching elements under that child. This lets you narrow in on the data you want.
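
The contextual behaviour can be illustrated with a small sketch. The markup, the class name ContextualGetters and its helper methods are hypothetical and exist only for this example; the same getter is called once on the Document and once on a child element.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ContextualGetters {
    // Hypothetical sample markup, not taken from the recipe above.
    static final String HTML =
        "<div id='nav'><a href='/home'>Home</a></div>"
      + "<div id='content'><a href='/one'>One</a><a href='/two'>Two</a></div>";

    // Counts <a> elements anywhere in the document.
    static int linksInDocument() {
        Document doc = Jsoup.parse(HTML);
        return doc.getElementsByTag("a").size();
    }

    // Counts <a> elements under the element with id "content" only.
    static int linksInContent() {
        Document doc = Jsoup.parse(HTML);
        Element content = doc.getElementById("content");
        return content.getElementsByTag("a").size();
    }

    public static void main(String[] args) {
        System.out.println(linksInDocument()); // 3
        System.out.println(linksInContent());  // 2
    }
}
```

Note that getElementById returns null when no element has the given id, so real code that does not control its input would normally check for that before calling further methods.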

Finding elements

  • getElementById(String id)
  • getElementsByTag(String tag)
  • getElementsByClass(String className)
  • getElementsByAttribute(String key) (and related methods)
  • Element siblings: siblingElements(), firstElementSibling(), lastElementSibling(); nextElementSibling(), previousElementSibling()
  • Graph: parent(), children(), child(int index)
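
As a rough sketch of how the finders listed above combine, the following example looks up elements by class and attribute and then moves between siblings, parent and children. The markup and the FindingElements class are assumptions made for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class FindingElements {
    // Hypothetical sample markup used only for this sketch.
    static final String HTML =
        "<ul id='menu'>"
      + "<li class='item first'><a href='/a'>A</a></li>"
      + "<li class='item'><a href='/b' title='Bee'>B</a></li>"
      + "<li class='item'>C</li>"
      + "</ul>";

    // Parses the sample and returns the <ul id="menu"> element.
    static Element menu() {
        Document doc = Jsoup.parse(HTML);
        return doc.getElementById("menu");
    }

    public static void main(String[] args) {
        Element menu = menu();

        // Lookups by class and by attribute, scoped to the menu element.
        System.out.println(menu.getElementsByClass("item").size());      // 3
        System.out.println(menu.getElementsByAttribute("title").size()); // 1

        // Children are indexed from zero, so child(1) is the second <li>.
        Element second = menu.child(1);
        System.out.println(second.previousElementSibling().text()); // A
        System.out.println(second.nextElementSibling().text());     // C
        System.out.println(second.firstElementSibling().text());    // A
        System.out.println(second.lastElementSibling().text());     // C
        System.out.println(second.siblingElements().size());        // 2

        // Moving back up and across the tree.
        System.out.println(second.parent().id());    // menu
        System.out.println(menu.children().size());  // 3
    }
}
```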

Element data

  • attr(String key) to get and attr(String key, String value) to set attributes
  • attributes() to get all attributes
  • id(), className() and classNames()
  • text() to get and text(String value) to set the text content
  • html() to get and html(String value) to set the inner HTML content
  • outerHtml() to get the outer HTML value
  • data() to get data content (e.g. of script and style tags)
  • tag() and tagName()
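
The accessors above can be tried on a small fragment. This is a minimal sketch under assumed input: the markup, the ElementData class and its helper methods are hypothetical. The exact whitespace of html() and outerHtml() depends on the output settings, so the example does not rely on it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

class ElementData {
    // Hypothetical sample markup used only for this sketch.
    static final String HTML =
        "<div id='box' class='note warn' title='Hi'><p>Hello <b>world</b></p></div>"
      + "<script>var x = 1;</script>";

    // Returns the <div id="box"> element from a fresh parse.
    static Element box() {
        return Jsoup.parse(HTML).getElementById("box");
    }

    // Returns the first <script> element from a fresh parse.
    static Element script() {
        return Jsoup.parse(HTML).getElementsByTag("script").first();
    }

    public static void main(String[] args) {
        Element box = box();
        System.out.println(box.attr("title"));   // Hi
        System.out.println(box.id());            // box
        System.out.println(box.className());     // note warn
        System.out.println(box.classNames().contains("warn")); // true
        System.out.println(box.tagName());       // div
        System.out.println(box.text());          // Hello world
        System.out.println(box.html());          // inner HTML: the <p> and its contents
        System.out.println(box.outerHtml());     // the <div> itself plus its contents

        // attr(key, value) sets the attribute on the same element.
        box.attr("title", "Bye");
        System.out.println(box.attr("title"));   // Bye

        // Script content is data rather than text, so it is read with data().
        System.out.println(script().data());     // var x = 1;
    }
}
```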

Manipulating HTML and text

  • append(String html), prepend(String html)
  • appendText(String text), prependText(String text)
  • appendElement(String tagName), prependElement(String tagName)
  • html(String value)
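
A short sketch of these manipulation methods follows, again with hypothetical markup and a hypothetical ManipulatingHtml class. It shows the difference between the HTML-parsing methods (append, prepend, html) and the text methods, which escape their argument instead of parsing it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

class ManipulatingHtml {
    // Builds a <div> and modifies it with the append/prepend family.
    static Element build() {
        Element box = Jsoup.parse("<div id='box'><p>Middle</p></div>")
                           .getElementById("box");
        box.prepend("<h1>Title</h1>");         // parsed as HTML, inserted first
        box.append("<p>End</p>");              // parsed as HTML, inserted last
        box.appendText(" <not a tag>");        // added as text, so it is escaped on output
        box.appendElement("span").text("new"); // creates <span>, then sets its text
        return box;
    }

    // Replaces everything inside the <div> with new inner HTML.
    static Element replaced() {
        Element box = build();
        box.html("<em>replaced</em>");
        return box;
    }

    public static void main(String[] args) {
        Element box = build();
        System.out.println(box.children().size());   // 4 (h1, p, p, span)
        System.out.println(box.child(0).tagName());  // h1
        System.out.println(box.child(3).text());     // new
        System.out.println(box.html().contains("&lt;not a tag&gt;")); // true

        Element after = replaced();
        System.out.println(after.children().size());  // 1
        System.out.println(after.child(0).tagName()); // em
    }
}
```

Use the text methods whenever the value comes from outside the program, since they never introduce new elements into the tree.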