babby's first C++ project. refactor in progress
- i needed parsing capability for another one of my projects
- Just Works
- supports CJK character sets
- given an HTML5-compliant input file, accurately produces a doubly linked general tree representation of the DOM
- accurately preserves ALL tag attributes
- shouldn't discriminate against even the most horrendously formatted markup
- any facility whatsoever to process the parsed tree
- parses at reasonable speed
- support of emmet-like input rules to the parser
- a 1-week-old cpp dev birthed this into existence. do point out any better approach to the spaghetti that is the parsing logic
- will break down at javascript embeds if the raw string </script> is involved. clueless as to how to deal with it at the moment
- built with no consideration for unsafe input. use recklessly at your own risk
- no "parse exceptions" of any kind implemented... yet?
- discards certain data such as script and stylesheet embeds
- excessive spaces within the content are not ignored, though it's, at worst, an annoyance that doesn't affect the accuracy of the structure
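
for anyone wondering what "doubly linked general tree" with "ALL tag attributes" preserved could look like in practice, here is a minimal sketch. it is not the project's actual node type: `Node`, `append_child` and `depth` are hypothetical names made up for illustration, and the layout (owning child pointers going down, a non-owning parent pointer going up, attributes kept as ordered name/value pairs) is an assumption about one reasonable way to do it.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type; names are illustrative, not the project's API.
struct Node {
    std::string tag;   // element name, e.g. "div"; left empty for text nodes
    std::string text;  // text content, if any
    // Attributes as name/value pairs, kept in the order they were added.
    std::vector<std::pair<std::string, std::string>> attributes;
    Node* parent = nullptr;                       // upward link (non-owning)
    std::vector<std::unique_ptr<Node>> children;  // downward links (owning)

    // Attach a child, set its parent link, and return a raw pointer to it.
    Node* append_child(std::unique_ptr<Node> child) {
        child->parent = this;
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Count a node's ancestors by walking the parent links up to the root.
std::size_t depth(const Node* node) {
    std::size_t d = 0;
    while (node->parent != nullptr) {
        node = node->parent;
        ++d;
    }
    return d;
}
```

building a tree is then a matter of creating a root with `std::make_unique<Node>()` and calling `append_child` for each parsed element. the parent link is what makes the tree "doubly linked": you can walk down through `children` or back up through `parent`. letting `unique_ptr` own the children means destroying the root frees the whole tree.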