htmlcxx - html and css APIs for C++
Description
htmlcxx is a simple non-validating css1 and html parser for C++. Although there are several other html parsers available, htmlcxx has some characteristics that make it unique:
- STL-like navigation of the DOM tree, using Kasper Peeters' excellent tree.hh library
- It is possible to reproduce exactly, character by character, the original document from the parse tree
- Bundled css parser
- Optional parsing of attributes
- C++ code that looks like C++ (not so true anymore)
- Offsets of tags/elements in the original document are stored in the nodes of the DOM tree
The parsing policies of htmlcxx were designed to mimic the behavior of Mozilla Firefox (http://www.mozilla.org), so you should expect parse trees similar to those created by Firefox. Unlike Firefox, however, htmlcxx does not insert non-existent elements into your html. Therefore, serializing the DOM tree gives exactly the same bytes contained in the original HTML document.
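To make the round-trip property concrete, here is a minimal standard-library sketch of the idea. It is not htmlcxx code: the Token struct and the tokenize and serialize functions are hypothetical names invented for this illustration. It only assumes that each piece of the document records its offset and length in the source, so that joining the pieces in order yields the original bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type, NOT htmlcxx's HTML::Node: it only records where
// a piece of markup or text sits in the original document.
struct Token {
    std::size_t offset;  // position of the first byte in the source
    std::size_t length;  // number of bytes covered
    bool isTag;          // true for "<...>" spans, false for text
};

// Split the input into tag and text tokens without dropping or adding bytes.
std::vector<Token> tokenize(const std::string& html) {
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < html.size()) {
        std::size_t end = html.size();
        bool isTag = false;
        if (html[pos] == '<') {
            std::size_t close = html.find('>', pos);
            if (close != std::string::npos) {
                isTag = true;
                end = close + 1;
            }
            // An unterminated '<' keeps the rest of the input as text.
        } else {
            std::size_t open = html.find('<', pos);
            if (open != std::string::npos) end = open;
        }
        tokens.push_back({pos, end - pos, isTag});
        pos = end;
    }
    return tokens;
}

// Rebuild the document by copying each token's byte range in order.
std::string serialize(const std::string& html, const std::vector<Token>& tokens) {
    std::string out;
    for (const Token& t : tokens) out.append(html, t.offset, t.length);
    return out;
}
```

Because nothing is normalized or inserted, serialize(doc, tokenize(doc)) compares equal to doc even for malformed input such as an unclosed tag.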
News for version 0.7.3
- Added utility code to escape/decode urls as defined by RFC 2396.
- Added new SAX interface.
- The API was slightly broken to support the new SAX interface :-(.
- Added Visual Studio 2003 projects for the WIN32 port.
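For readers unfamiliar with RFC 2396 escaping, the following is a rough standard-library sketch of what escaping and decoding mean. The function names escapeUrl and decodeUrl are made up for this example and are not the htmlcxx utility API. The sketch assumes the input is a single URL component, so every character outside the RFC 2396 "unreserved" set is percent-encoded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// RFC 2396 "unreserved" characters: ASCII alphanumerics plus the mark set.
bool isUnreserved(unsigned char c) {
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
        return true;
    return std::string("-_.!~*'()").find(static_cast<char>(c)) != std::string::npos;
}

// Hypothetical helper: percent-encode every byte that is not unreserved.
std::string escapeUrl(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (isUnreserved(c)) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Value of a hexadecimal digit, or -1 if the character is not one.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: turn each valid %HH sequence back into one byte.
// Malformed sequences are copied through unchanged.
std::string decodeUrl(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

With these definitions, decodeUrl(escapeUrl(s)) returns s for any byte string. Escaping a complete URL this way would also encode its separators, which is why the sketch is limited to single components.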
Examples
Using htmlcxx is quite simple. Take a look at this example.
    #include <htmlcxx/html/ParserDom.h>
    ...

    //Parse some html code
    string html = "<html><body>hey</body></html>";
    HTML::ParserDom parser;
    tree<HTML::Node> dom = parser.parseTree(html);

    //Print whole DOM tree
    cout << dom << endl;

    //Dump all links in the tree
    tree<HTML::Node>::iterator it = dom.begin();
    tree<HTML::Node>::iterator end = dom.end();
    for (; it != end; ++it)
    {
        if (it->tagName() == "A")
        {
            //Attributes are parsed only on request
            it->parseAttributes();
            cout << it->attributes("href");
        }
    }

    //Dump all text of the document
    it = dom.begin();
    end = dom.end();
    for (; it != end; ++it)
    {
        //Skip tags and comments; what remains is text
        if ((!it->isTag()) && (!it->isComment()))
        {
            cout << it->text();
        }
    }
The htmlcxx application
htmlcxx is the name of both the library and the utility application that comes with this package. Although htmlcxx (the application) is mostly useless for programming, you can use it to see easily how htmlcxx (the library) would parse your html code. Just install it and try htmlcxx -h.
Downloads
Use the project page at SourceForge: http://sf.net/projects/htmlcxx
License Stuff
Code is now under the LGPL. This was our initial intention, and is now possible thanks to the author of tree.hh, who allowed us to use it under LGPL only for HTML::Node template instances. Check http://www.fsf.org or the COPYING file in the distribution for details about the LGPL license. The uri parsing code is a derivative work of the Apache web server uri parsing routines. Check www.apache.org/licenses/LICENSE-2.0 or the ASF-2.0 file in the distribution for details.
Enjoy!
Davi de Castro Reis
Robson Braga Araújo
Last Updated: Thu Mar 24 00:56:09 2005