Class HTMLPageParser

  • All Implemented Interfaces:
    PageParser
    Direct Known Subclasses:
    DivExtractingPageParser

    public class HTMLPageParser
    extends Object
    implements PageParser

    Builds an HTMLPage object from an HTML document. This behaves similarly to FastPageParser, but it is a complete rewrite designed to make custom features, such as extraction and transformation of elements, simpler to add.

    To customize the rules used, extend this class and override the userDefinedRules() methods.
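
The extension pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and does not use the real library classes: SketchRule, SketchPageParser, and DivAwareSketchParser are hypothetical stand-ins, and the signature given to userDefinedRules here is an assumption made for the sketch, not the documented one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tag rule; not part of the real API.
interface SketchRule {
    String tagName();
}

// Hypothetical stand-in for the parser; it models only the extension hook.
class SketchPageParser {
    // Collects the built-in rules first, then whatever a subclass contributes.
    final List<String> ruleNames() {
        List<SketchRule> rules = new ArrayList<>();
        rules.add(() -> "title");
        rules.add(() -> "body");
        userDefinedRules(rules);

        List<String> names = new ArrayList<>();
        for (SketchRule rule : rules) {
            names.add(rule.tagName());
        }
        return names;
    }

    // Extension hook (assumed signature): the default contributes no rules.
    protected void userDefinedRules(List<SketchRule> rules) {
    }
}

// A subclass customizes parsing by overriding the hook to add its own rule.
class DivAwareSketchParser extends SketchPageParser {
    @Override
    protected void userDefinedRules(List<SketchRule> rules) {
        rules.add(() -> "div");
    }
}
```

In this sketch the base class owns the built-in rules and calls the hook once, so a subclass only has to state what it adds. Whether the real class wires its rules the same way should be checked against its source.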

    Author:
    Joe Walnes
    See Also:
    HTMLProcessor
    • Constructor Detail

      • HTMLPageParser

        public HTMLPageParser()