class vB_WysiwygHtmlParser

Class to parse the HTML generated by the WYSIWYG editor to BB code.

Can be extended to parse additional tags or change the parsing behavior.

This class can be used for generic HTML to BB code conversions, but it is not always ideally suited to this.

Traits

Properties

protected boolean $allowHtml Whether HTML is allowed. If false, non parsed HTML will be stripped.
protected int $pLinebreaks The number of linebreaks a <p> tag generates. This is usually 1 when parsing from the WYSIWYG editors and 2 in other cases.

protected array $tags The rules for the "normal" HTML tags that should be parsed. Only tags that are matched (ie, .
protected array $state Arbitrary array that can be used for tracking limited tag state when parsing.

Methods

__sleep()

No description

__wakeup()

No description

__construct()

Constructor. Automatically loads the tag rules.

array
loadTagRules()

Returns the rule set for parsing matched tags. Array key is name of HTML tag to match. Value is either a simple callback or an array with keys 'callback' and 'param' (an optional extra value to pass in to the parsing callback function). Callbacks may refer to the string $this to refer to the current class instance.

setPLinebreaks(int $linebreaks)

Sets the number of line breaks a <p> tag inserts.

boolean
inState(string $state)

Determines whether the parser is in the named state.

pushState(string $state)

Pushes a new state into the list.

popState(string $state)

Pops a state off the list.

string
parseWysiwygHtmlToBbcode(string $unparsed, boolean $allowHtml = false)

Parses the specified HTML into BB code

string
filterBefore(string $text)

Template method for pre-filtering the HTML before it is parsed.

string
filterHtmlTags(string $text)

Filters the HTML tags to fix common issues (HTML intertwined with BB code).

escapeWithinUrlPregMatch($matches)

Callback for preg_replace_callback in filterHtmlTags

string
escapeWithinUrl(string $type, string $url, string $delimiter = '\\"')

PCRE callback for escaping special HTML characters within src/href attributes so they are not removed by strip_tags calls later.

string
filterLinebreaksSpaces(string $text)

Filters line breaks and spaces within the HTML. Also handles a browser-specific behavior with soft wrapping.

string
filterBbcode(string $text)

Filters BB code behaviors before the HTML is parsed. Includes removing HTML from BB codes that don't support it and removing linking HTML from a manually entered BB code.

stripHtmlFromBbcodePregMatch($matches)

Callback for preg_replace_callback in filterBbcode

stripHtmlFromBbcode($text)

PCRE callback function to remove HTML from BB codes that don't support it.

string
parseHtml(string $text)

Parses the HTML tags within a string.

string
parseUnmatchedTags(string $text)

Parses special unmatched HTML tags like <img> and <br>.

translateSmilieIdTextPregMatch($matches)

Callback for preg_replace_callback in filterBbcode

processEnhancedImageHtml($text, $charset = '')

No description

handleWysiwygAdvancedImageImg($img_url, $attributes)

No description

handleWysiwygImgPregMatch($matches)

Callback for preg_replace_callback in filterBbcode

string
handleWysiwygImg(string $img_url, string $fulltag)

Translates the specified img link into an img bbcode

string
translateSmilieIdText(int $smilieid)

Translates the specified smilie ID to the text that represents that smilie.

string
parseTagImg(string $img_url)

PCRE callback function to parse an <img> tag. Can only parse the src attribute.

parseMatchedTags($text)

Parses "normal" matched HTML tags. This function (and the individual tag functions) are the primary places where the tag parsing rules are used.

parseTagByName($tagName, $text, $forceParam = null)

Parses a matched HTML tag by the name of the tag. This is resolved to the tag parsing rules array and handled from there.

string
cleanupAfter(string $text)

Post parsing clean up. Removes unparsed HTML and sanitizes some BB codes.

string
cleanupHtml(string $text)

Cleans up HTML stragglers after the parsing.

string
cleanupDisallowedHtml(string $text)

Cleans up disallowed HTML. This generally removes all HTML. It is normally called if HTML is not allowed.

string
cleanupSmiliesFromImages(string $text)

Translates image BB codes that represent smilies into the actual smilie representation.

string
cleanupBbcode(string $text)

General BB code cleanup after HTML parsing.

parseStyleAttribute(string $tagoptions, string $prependtags, string $appendtags)

Parses the style attribute from a list of attributes and determines if tags need to be wrapped. This does not do the wrapping, but gives you the text to prepend/append.

parseTagA(string $aoptions, string $text, string $tag_name, mixed $args)

Parses an <a> tag. Matches URL and EMAIL BB code.

parseTagHeading(string $attributes, string $text, string $tag_name, mixed $args)

Parses <h1> through <h6> tags. Simply uses bold with line breaks.

parseTagP(string $poptions, string $text, string $tag_name, mixed $args)

Parses a <p> tag. Supports alignments and style attributes. Gives a line break.

parseTagSpan(string $spanoptions, string $text, string $tag_name, mixed $args)

Parses a <span> tag. Supports style attributes.

parseTagDiv(string $divoptions, string $text, string $tag_name, mixed $args)

Parses a <div> tag. Supports alignments and style attributes. Gives a line break.

parseTagLi(string $listoptions, string $text, string $tag_name, mixed $args)

Parses an <li> tag. Outputs the list element BB code if within a list state.

parseTagList(string $listoptions, string $text, string $tagname, mixed $args)

    Parses <ol> and <ul> tags.

    parseTagFont(string $fontoptions, string $text, string $tag_name, mixed $args)

    Parses a <font> tag. Supports font face, size, and color.

    parseTagBasic(string $options, string $text, string $tagname, mixed $parseto)

    Parses and does a basic HTML replacement for the named tag. The argument passed in is the BB code to parse to.

    string
    buildTableBbcodeParam(array $options)

    Builds the key-value parameter format for table (and tr/td) BB codes.

    string
    getEffectiveClassList(string $classes, string $parent_classes = '', string $suffix = '')

    Gets the effective class list for a BB code. A specific suffix is stripped off and a prefix of 'cms_table_' is removed. The class 'wysiwyg_dashes' is always ignored. Any remaining classes that aren't in the parent list are returned in a space-delimited string.

    parseTagTable(string $attributes, string $text, string $tag_name, mixed $args)

    Parses <table> tags. Supports various options. Automatically parses TRs within.

    parseTagTr(string $attributes, string $text, string $tag_name, mixed $args)

    Parses <tr> tags. Supports various options. Automatically parses TDs within.

    parseTagTd(string $attributes, string $text, string $tag_name, mixed $args)

    Parses <td> tags. Supports various options. Arguments passed in are usually the options applied to the parent table and tr tags.

    string
    convertColorRgbToHex(string $style)

    Converts RGB colors to HEX in a style attribute

    string
    parseTag(string $tagname, string $text, callback $functionhandle, mixed $extraargs = '')

    General matched tag HTML parser. Finds matched pairs of tags (outside pairs first) and calls the specified call back.

    string
    parseWysiwygTagAttribute(string $option, string $text)

    General attribute parser. Parses the named attribute out of a string of attributes.

    string
    parseWysiwygStyleAttribute(string $option, string $text)

    General attribute parser. Parses the named attribute out of a string of attributes.

    bool
    isBbcodeTagAllowed(string $tag)

    Determines if the specified BB code tag is globally enabled.

    Details

    in vB_Trait_NoSerialize at line 15
    __sleep()

    in vB_Trait_NoSerialize at line 20
    __wakeup()

    at line 65
    __construct()

    Constructor. Automatically loads the tag rules.

    at line 79
    array loadTagRules()

    Returns the rule set for parsing matched tags. Array key is name of HTML tag to match. Value is either a simple callback or an array with keys 'callback' and 'param' (an optional extra value to pass in to the parsing callback function). Callbacks may refer to the string $this to refer to the current class instance.

    Return Value

    array
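The rule-set shape described above can be sketched as follows. This is only an illustration: the two entries in $rules, the resolveRule() helper, and the DemoParser stand-in class are assumptions made for the example, since the actual rules returned by loadTagRules() are not listed in this reference.

```php
<?php
// Hypothetical rule set in the documented shape: the key is the HTML tag
// name, the value is a simple callback or an array with 'callback' and an
// optional 'param'. The entries themselves are illustrative only.
$rules = [
    'b'  => ['callback' => ['$this', 'parseTagBasic'], 'param' => 'B'],
    'h1' => ['$this', 'parseTagHeading'],
];

// Hypothetical helper: normalize a rule into [callback, param] and swap the
// placeholder string '$this' for a real object instance.
function resolveRule($rule, object $instance): array
{
    $callback = $rule;
    $param = null;
    if (is_array($rule) && isset($rule['callback'])) {
        $callback = $rule['callback'];
        $param = $rule['param'] ?? null;
    }
    if (is_array($callback) && $callback[0] === '$this') {
        $callback[0] = $instance;
    }
    return [$callback, $param];
}

// Stand-in for the parser class, using the four-argument callback shape
// (attributes, inner text, tag name, extra argument).
class DemoParser
{
    public function parseTagBasic($options, $text, $tagname, $parseto)
    {
        return '[' . $parseto . ']' . $text . '[/' . $parseto . ']';
    }
}

[$callback, $param] = resolveRule($rules['b'], new DemoParser());
echo call_user_func($callback, '', 'hi', 'b', $param), "\n";
```

A subclass that wants to parse an additional tag would, under these assumptions, add one more key to the array returned by loadTagRules().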

    at line 148
    setPLinebreaks(int $linebreaks)

    Sets the number of line breaks a <p> tag inserts.

    Parameters

    int $linebreaks

    at line 168
    boolean inState(string $state)

    Determines whether the parser is in the named state.

    Note that a parser can be in multiple states simultaneously. The state is not tracked with a stack.

    Parameters

    string $state State

    Return Value

    boolean

    at line 178
    protected pushState(string $state)

    Pushes a new state into the list.

    Parameters

    string $state State

    at line 195
    protected popState(string $state)

    Pops a state off the list.

    Parameters

    string $state State
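Because the parser can be in several states at once and the states are not kept on a stack, one plausible way to model inState(), pushState(), and popState() is a counter per state name. The StateTracker class below is a stand-in written for this sketch, not the parser's actual implementation; the state names 'list' and 'table' are also assumptions.

```php
<?php
// Stand-in class (not vB_WysiwygHtmlParser): one counter per state name,
// so several states can be active at once and the same state can nest.
class StateTracker
{
    private $state = [];

    public function inState(string $state): bool
    {
        return !empty($this->state[$state]);
    }

    public function pushState(string $state): void
    {
        $this->state[$state] = ($this->state[$state] ?? 0) + 1;
    }

    public function popState(string $state): void
    {
        if (!empty($this->state[$state])) {
            $this->state[$state]--;
        }
    }
}

$tracker = new StateTracker();
$tracker->pushState('list');
$tracker->pushState('table');
$tracker->popState('list'); // pops are by name, so order does not matter
var_dump($tracker->inState('list'));
var_dump($tracker->inState('table'));
```

Popping by name is what distinguishes this from a stack: leaving the 'list' state does not disturb 'table', even though 'table' was pushed later.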

    at line 216
    string parseWysiwygHtmlToBbcode(string $unparsed, boolean $allowHtml = false)

    Parses the specified HTML into BB code

    Parameters

    string $unparsed HTML to parse
    boolean $allowHtml Whether to allow unparsable HTML to remain

    Return Value

    string Parsed version (BB code)

    at line 242
    string filterBefore(string $text)

    Template method for pre-filtering the HTML before it is parsed.

    Filters things like BB code mixed into HTML, browser specific wrapping, and HTML within BB codes that don't support nested tags.

    Parameters

    string $text Text pre-filter

    Return Value

    string Text post-filter

    at line 258
    protected string filterHtmlTags(string $text)

    Filters the HTML tags to fix common issues (HTML intertwined with BB code).

    Parameters

    string $text Text pre-filter

    Return Value

    string Text post-filter

    at line 274
    protected escapeWithinUrlPregMatch($matches)

    Callback for preg_replace_callback in filterHtmlTags

    Parameters

    $matches

    at line 288
    protected string escapeWithinUrl(string $type, string $url, string $delimiter = '\\"')

    PCRE callback for escaping special HTML characters within src/href attributes so they are not removed by strip_tags calls later.

    Parameters

    string $type Type of call (tag name and src/href)
    string $url URL that will be escaped
    string $delimiter Delimiter for the attribute

    Return Value

    string Escaped output.
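The idea behind escaping URL attributes can be shown with a small standalone sketch. The function name escapeUrlAttributes() and its regular expression are assumptions for this example; it handles only double-quoted href/src values and does not reproduce the signature of escapeWithinUrl().

```php
<?php
// Hypothetical helper: escape HTML-special characters inside double-quoted
// href/src values so a later strip_tags() call does not treat "<...>" in a
// URL as a tag.
function escapeUrlAttributes(string $html): string
{
    $result = preg_replace_callback(
        '/\b(href|src)="([^"]*)"/i',
        function (array $m): string {
            // ENT_NOQUOTES: the value cannot contain a double quote here.
            // The last argument (false) avoids double-encoding entities.
            $url = htmlspecialchars($m[2], ENT_NOQUOTES, 'UTF-8', false);
            return $m[1] . '="' . $url . '"';
        },
        $html
    );
    return $result ?? $html; // fall back to the input on a PCRE error
}

echo escapeUrlAttributes('<a href="http://example.com/?q=<b>">x</a>'), "\n";
```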

    at line 310
    protected string filterLinebreaksSpaces(string $text)

    Filters line breaks and spaces within the HTML. Also handles a browser-specific behavior with soft wrapping.

    Parameters

    string $text Text pre-filter

    Return Value

    string Text post-filter

    at line 333
    protected string filterBbcode(string $text)

    Filters BB code behaviors before the HTML is parsed. Includes removing HTML from BB codes that don't support it and removing linking HTML from a manually entered BB code.

    Parameters

    string $text Text pre-filter

    Return Value

    string Text post-filter

    at line 347
    protected stripHtmlFromBbcodePregMatch($matches)

    Callback for preg_replace_callback in filterBbcode

    Parameters

    $matches

    at line 360
    protected stripHtmlFromBbcode($text)

    PCRE callback function to remove HTML from BB codes that don't support it.

    Standard line break HTML is maintained.

    Parameters

    $text

    at line 374
    string parseHtml(string $text)

    Parses the HTML tags within a string.

    Handles matched and special unmatched tags.

    Parameters

    string $text Text pre-parsed

    Return Value

    string Parsed text (BB code)

    at line 394
    protected string parseUnmatchedTags(string $text)

    Parses special unmatched HTML tags like <img> and <br>.

    Parameters

    string $text Text pre-parsed

    Return Value

    string Parsed text

    at line 428
    protected translateSmilieIdTextPregMatch($matches)

    Callback for preg_replace_callback in filterBbcode

    Parameters

    $matches

    at line 443
    protected processEnhancedImageHtml($text, $charset = '')

    Parameters

    $text
    $charset

    at line 627
    protected handleWysiwygAdvancedImageImg($img_url, $attributes)

    Parameters

    $img_url
    $attributes

    at line 706
    protected handleWysiwygImgPregMatch($matches)

    Callback for preg_replace_callback in filterBbcode

    Parameters

    $matches

    at line 719
    protected string handleWysiwygImg(string $img_url, string $fulltag)

    Translates the specified img link into an img bbcode

    Parameters

    string $img_url image url
    string $fulltag full image tag

    Return Value

    string img bbcode

    at line 790
    protected string translateSmilieIdText(int $smilieid)

    Translates the specified smilie ID to the text that represents that smilie.

    Parameters

    int $smilieid Smilie ID

    Return Value

    string Smilie text

    at line 824
    protected string parseTagImg(string $img_url)

    PCRE callback function to parse an <img> tag. Can only parse the src attribute.

    Parameters

    string $img_url The image's URL (src attribute)

    Return Value

    string An IMG BB code

    at line 845
    protected parseMatchedTags($text)

    Parses "normal" matched HTML tags. This function (and the individual tag functions) are the primary places where the tag parsing rules are used.

    Parameters

    $text

    at line 875
    parseTagByName($tagName, $text, $forceParam = null)

    Parses a matched HTML tag by the name of the tag. This is resolved to the tag parsing rules array and handled from there.

    Parameters

    $tagName
    $text
    $forceParam

    at line 928
    string cleanupAfter(string $text)

    Post parsing clean up. Removes unparsed HTML and sanitizes some BB codes.

    Parameters

    string $text Text pre-cleanup

    Return Value

    string Text post-cleanup

    at line 944
    protected string cleanupHtml(string $text)

    Cleans up HTML stragglers after the parsing.

    Parameters

    string $text Text pre-cleanup

    Return Value

    string Text post-cleanup

    at line 994
    protected string cleanupDisallowedHtml(string $text)

    Cleans up disallowed HTML. This generally removes all HTML. It is normally called if HTML is not allowed.

    Parameters

    string $text Text pre-cleanup

    Return Value

    string Text post-cleanup

    at line 1011
    protected string cleanupSmiliesFromImages(string $text)

    Translates image BB codes that represent smilies into the actual smilie representation.

    Parameters

    string $text Text pre-cleanup

    Return Value

    string Text post-cleanup

    at line 1043
    protected string cleanupBbcode(string $text)

    General BB code cleanup after HTML parsing.

    Parameters

    string $text Text pre-cleanup

    Return Value

    string Text post-cleanup

    at line 1365
    protected parseStyleAttribute(string $tagoptions, string $prependtags, string $appendtags)

    Parses the style attribute from a list of attributes and determines if tags need to be wrapped. This does not do the wrapping, but gives you the text to prepend/append.

    Parameters

    string $tagoptions Attribute string (multiple attributes within)
    string $prependtags (return) Text to prepend
    string $appendtags (return) Text to append
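The "(return)" parameters suggest that the prepend and append text is handed back through by-reference arguments. The sketch below shows that calling pattern under stated assumptions: styleToWrappers() is a hypothetical function, and the two style rules it recognizes (bold and italic) are examples rather than the parser's actual list.

```php
<?php
// Hypothetical sketch: the two out-parameters are passed by reference and
// receive the text to wrap around the tag content. No wrapping is done here.
function styleToWrappers(string $tagoptions, string &$prependtags, string &$appendtags): void
{
    $prependtags = '';
    $appendtags = '';
    if (!preg_match('/\bstyle="([^"]*)"/i', $tagoptions, $m)) {
        return; // no style attribute, nothing to wrap
    }
    $style = $m[1];
    if (preg_match('/font-weight\s*:\s*bold/i', $style)) {
        $prependtags .= '[B]';
        $appendtags = '[/B]' . $appendtags; // closers go in reverse order
    }
    if (preg_match('/font-style\s*:\s*italic/i', $style)) {
        $prependtags .= '[I]';
        $appendtags = '[/I]' . $appendtags;
    }
}

$pre = '';
$post = '';
styleToWrappers('style="font-weight: bold; font-style: italic"', $pre, $post);
echo $pre . 'text' . $post, "\n";
```

Building the append string in reverse keeps the generated BB codes properly nested when more than one style rule applies.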

    at line 1495
    protected parseTagHeading(string $attributes, string $text, string $tag_name, mixed $args)

    Parses <h1> through <h6> tags. Simply uses bold with line breaks.

    Parameters

    string $attributes String containing tag attributes
    string $text Text within tag
    string $tag_name Name of HTML tag. Used if one function parses multiple tags
    mixed $args Extra arguments passed in to parsing call or tag rules

    at line 1529
    protected parseTagP(string $poptions, string $text, string $tag_name, mixed $args)

    Parses a <p> tag. Supports alignments and style attributes. Gives a line break.

    Parameters

    string $poptions String containing tag attributes
    string $text Text within tag
    string $tag_name Name of HTML tag. Used if one function parses multiple tags
    mixed $args Extra arguments passed in to parsing call or tag rules

    at line 1590
    protected parseTagSpan(string $spanoptions, string $text, string $tag_name, mixed $args)

    Parses a <span> tag. Supports style attributes.

    Parameters

    string $spanoptions String containing tag attributes
    string $text Text within tag
    string $tag_name Name of HTML tag. Used if one function parses multiple tags
    mixed $args Extra arguments passed in to parsing call or tag rules

    at line 1607
    protected parseTagDiv(string $divoptions, string $text, string $tag_name, mixed $args)

    Parses a <div> tag. Supports alignments and style attributes. Gives a line break.

    Parameters

    string $divoptions String containing tag attributes
    string $text Text within tag
    string $tag_name Name of HTML tag. Used if one function parses multiple tags
    mixed $args Extra arguments passed in to parsing call or tag rules

    at line 1662
    protected parseTagLi(string $listoptions, string $text, string $tag_name, mixed $args)

    Parses an <li> tag. Outputs the list element BB code if within a list state.

    Parameters

    string $listoptions String containing tag attributes
    string $text Text within tag
    string $tag_name Name of HTML tag. Used if one function parses multiple tags
    mixed $args Extra arguments passed in to parsing call or tag rules

    at line 1689
    protected parseTagList(string $listoptions, string $text, string $tagname, mixed $args)

    Parses <ol> and <ul> tags.

    Parameters

    string $listoptions String containing tag attributes
    string $text Text within tag
    string $tagname Name of HTML tag. Used if one function parses multiple tags
    mixed $args Extra arguments passed in to parsing call or tag rules

    at line 1754
    protected parseTagFont(string $fontoptions, string $text, string $tag_name, mixed $args)

    Parses a <font> tag. Supports font face, size, and color.

    Parameters

    string $fontoptions String containing tag attributes
    string $text Text within tag
    string $tag_name Name of HTML tag. Used if one function parses multiple tags
    mixed $args Extra arguments passed in to parsing call or tag rules

    at line 1791
    protected parseTagBasic(string $options, string $text, string $tagname, mixed $parseto)

    Parses and does a basic HTML replacement for the named tag. The argument passed in is the BB code to parse to.

    Parameters

    string $options String containing tag attributes
    string $text Text within tag
    string $tagname Name of HTML tag. Used if one function parses multiple tags
    mixed $parseto Name of the BB code to parse to

    at line 1833
    protected string buildTableBbcodeParam(array $options)

    Builds the key-value parameter format for table (and tr/td) BB codes.

    Parameters

    array $options Key-value array of params to specify

    Return Value

    string If there are options, the full BB code param (including the leading "=").
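As a rough illustration of a key-value BB code parameter, the sketch below joins the options into a single quoted string with a leading "=". The exact separators ("key: value" pairs joined by commas) and the function name buildParam() are assumptions; this reference does not state the format the real method produces.

```php
<?php
// Hypothetical sketch: build a parameter such as ="width: 500, class: grid".
// The "key: value, key: value" layout is an assumed format.
function buildParam(array $options): string
{
    if (!$options) {
        return ''; // no options, no parameter at all
    }
    $pairs = [];
    foreach ($options as $key => $value) {
        $pairs[] = $key . ': ' . $value;
    }
    return '="' . implode(', ', $pairs) . '"';
}

echo '[TABLE' . buildParam(['width' => 500, 'class' => 'grid']) . ']', "\n";
```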

    at line 1862
    protected string getEffectiveClassList(string $classes, string $parent_classes = '', string $suffix = '')

    Gets the effective class list for a BB code. A specific suffix is stripped off and a prefix of 'cms_table_' is removed. The class 'wysiwyg_dashes' is always ignored. Any remaining classes that aren't in the parent list are returned in a space-delimited string.

    Parameters

    string $classes List of classes applied to this tag
    string $parent_classes List of classes applied to any parent tags
    string $suffix Optional suffix to strip off from each class applied to this tag

    Return Value

    string Space-delimited list of remaining classes
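The filtering rules in the description can be sketched as a standalone function. The order of the steps, the comparison against the parent list after normalization, and the example suffix '_td' are assumptions; effectiveClasses() is a hypothetical name, not the method itself.

```php
<?php
// Hypothetical sketch of the described rules: ignore 'wysiwyg_dashes',
// strip the optional suffix, remove the 'cms_table_' prefix, then drop
// anything already present in the parent class list.
function effectiveClasses(string $classes, string $parentClasses = '', string $suffix = ''): string
{
    $parents = preg_split('/\s+/', trim($parentClasses), -1, PREG_SPLIT_NO_EMPTY);
    $result = [];
    foreach (preg_split('/\s+/', trim($classes), -1, PREG_SPLIT_NO_EMPTY) as $class) {
        if ($class === 'wysiwyg_dashes') {
            continue;
        }
        if ($suffix !== '' && substr($class, -strlen($suffix)) === $suffix) {
            $class = (string) substr($class, 0, -strlen($suffix));
        }
        if (strpos($class, 'cms_table_') === 0) {
            $class = (string) substr($class, strlen('cms_table_'));
        }
        if ($class !== '' && !in_array($class, $parents, true)) {
            $result[] = $class;
        }
    }
    return implode(' ', $result);
}

echo effectiveClasses('cms_table_grid_td wysiwyg_dashes outer', 'outer', '_td'), "\n";
```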

    at line 1918
    protected parseTagTable(string $attributes, string $text, string $tag_name, mixed $args)

    Parses <table> tags. Supports various options. Automatically parses TRs within.

    Parameters

    string $attributes String containing tag attributes
    string $text Text within tag
    string $tag_name Name of HTML tag. Used if one function parses multiple tags
    mixed $args Extra arguments passed in to parsing call or tag rules

    at line 2037
    protected parseTagTr(string $attributes, string $text, string $tag_name, mixed $args)

    Parses <tr> tags. Supports various options. Automatically parses TDs within.

    Arguments passed in are usually the options applied to the parent table tag.

    Parameters

    string $attributes String containing tag attributes
    string $text Text within tag
    string $tag_name Name of HTML tag. Used if one function parses multiple tags
    mixed $args Extra arguments passed in to parsing call or tag rules

    at line 2094
    protected parseTagTd(string $attributes, string $text, string $tag_name, mixed $args)

    Parses <td> tags. Supports various options. Arguments passed in are usually the options applied to the parent table and tr tags.

    Parameters

    string $attributes String containing tag attributes
    string $text Text within tag
    string $tag_name Name of HTML tag. Used if one function parses multiple tags
    mixed $args Extra arguments passed in to parsing call or tag rules

    at line 2170
    protected string convertColorRgbToHex(string $style)

    Converts RGB colors to HEX in a style attribute

    Parameters

    string $style Contents of the style attribute

    Return Value

    string Contents of the style attribute, after applying changes
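A conversion of this kind can be sketched with preg_replace_callback(). The function name rgbToHexInStyle() and the regular expression are assumptions for the example; it handles only the plain rgb(r, g, b) form with integer components.

```php
<?php
// Hypothetical sketch: rewrite rgb(r, g, b) values in a style string as
// #rrggbb, leaving the rest of the style untouched.
function rgbToHexInStyle(string $style): string
{
    $result = preg_replace_callback(
        '/rgb\(\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\)/i',
        function (array $m): string {
            // Clamp each component to 255 before formatting as two hex digits.
            return sprintf(
                '#%02x%02x%02x',
                min(255, (int) $m[1]),
                min(255, (int) $m[2]),
                min(255, (int) $m[3])
            );
        },
        $style
    );
    return $result ?? $style; // fall back to the input on a PCRE error
}

echo rgbToHexInStyle('color: rgb(255, 0, 128); font-size: 12px'), "\n";
```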

    at line 2193
    protected string parseTag(string $tagname, string $text, callback $functionhandle, mixed $extraargs = '')

    General matched tag HTML parser. Finds matched pairs of tags (outside pairs first) and calls the specified call back.

    Parameters

    string $tagname Name of the HTML tag to search for
    string $text Text to search
    callback $functionhandle Callback to call when found
    mixed $extraargs Extra arguments to pass into the callback function

    Return Value

    string Text with named tag parsed
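Matching outside pairs first can be illustrated with a depth counter: find an opening tag, scan forward counting nested opens and closes of the same tag until the depth returns to zero, then pass the attributes and inner text to the callback. The function parseMatchedTag() below is a simplified sketch written for this reference, not the method's actual code; nested pairs are left for the callback to handle by recursing.

```php
<?php
// Hypothetical sketch of an outside-first matched tag parser.
function parseMatchedTag(string $tagname, string $text, callable $callback, $extraargs = ''): string
{
    $name = preg_quote($tagname, '/');
    $open = '/<' . $name . '(\s[^>]*)?>/i';         // opening tag only
    $any = '/<(\/?)' . $name . '(?:\s[^>]*)?>/i';   // opening or closing tag
    $offset = 0;
    while (preg_match($open, $text, $m, PREG_OFFSET_CAPTURE, $offset)) {
        $start = $m[0][1];
        $innerStart = $start + strlen($m[0][0]);
        $attributes = isset($m[1]) ? trim($m[1][0]) : '';
        // Scan forward for the close that balances this opening tag.
        $depth = 1;
        $pos = $innerStart;
        $innerEnd = null;
        $end = null;
        while (preg_match($any, $text, $t, PREG_OFFSET_CAPTURE, $pos)) {
            $depth += ($t[1][0] === '/') ? -1 : 1;
            $pos = $t[0][1] + strlen($t[0][0]);
            if ($depth === 0) {
                $innerEnd = $t[0][1];
                $end = $pos;
                break;
            }
        }
        if ($end === null) {
            $offset = $innerStart; // unmatched opening tag: leave it alone
            continue;
        }
        $inner = substr($text, $innerStart, $innerEnd - $innerStart);
        $replacement = $callback($attributes, $inner, $tagname, $extraargs);
        $text = substr($text, 0, $start) . $replacement . substr($text, $end);
        $offset = $start + strlen($replacement);
    }
    return $text;
}

// Example callback: handles nested pairs by recursing on the inner text.
$toBbcode = function ($attributes, $inner, $tagname, $extra) use (&$toBbcode) {
    $inner = parseMatchedTag($tagname, $inner, $toBbcode, $extra);
    return '[' . $extra . ']' . $inner . '[/' . $extra . ']';
};

echo parseMatchedTag('b', 'x <b>one <b>two</b></b> y', $toBbcode, 'B'), "\n";
```

Handling the outer pair first means the callback sees the complete inner text, including any nested tags of the same name, and can decide how to treat them.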

    at line 2301
    protected string parseWysiwygTagAttribute(string $option, string $text)

    General attribute parser. Parses the named attribute out of a string of attributes.

    Parameters

    string $option Name of attribute to parse. Should be in form "attr="
    string $text Text to search

    Return Value

    string Value of named attribute
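Extracting a named attribute from an attribute string can be sketched with one regular expression. The tagAttribute() function is a hypothetical stand-in; it accepts the attribute name in the documented "attr=" form and supports double-quoted, single-quoted, and unquoted values, which may differ from what the real method accepts.

```php
<?php
// Hypothetical sketch: return the value of the named attribute, or an
// empty string if it is not present. $option is in the form "attr=".
function tagAttribute(string $option, string $text): string
{
    // Branch reset group (?|...) keeps the value in group 1 for all forms.
    $value = '(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))';
    $pattern = '/(?:^|\s)' . preg_quote($option, '/') . '\s*' . $value . '/i';
    if (preg_match($pattern, $text, $m)) {
        return $m[1];
    }
    return '';
}

echo tagAttribute('href=', 'class="x" href="http://example.com/"'), "\n";
```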

    at line 2350
    protected string parseWysiwygStyleAttribute(string $option, string $text)

    General attribute parser. Parses the named attribute out of a string of attributes.

    Parameters

    string $option Name of attribute to parse. Should be in form "attr:"
    string $text Text to search

    Return Value

    string Value of named attribute
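The "attr:" form suggests this variant looks up a single CSS property inside a style value. The sketch below assumes that reading; styleProperty() is a hypothetical function, and the boundary check before the property name is there so that "color:" does not match inside "background-color:".

```php
<?php
// Hypothetical sketch: return the value of one CSS property from a string
// that contains a style attribute. $option is in the form "attr:".
function styleProperty(string $option, string $text): string
{
    $pattern = '/(?:^|[;\s"\'])' . preg_quote($option, '/') . '\s*([^;"\']+)/i';
    if (preg_match($pattern, $text, $m)) {
        return trim($m[1]);
    }
    return '';
}

echo styleProperty('color:', 'style="font-size: 12px; color: #ff0000"'), "\n";
```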

    at line 2369
    protected bool isBbcodeTagAllowed(string $tag)

    Determines if the specified BB code tag is globally enabled.

    Parameters

    string $tag Tag name

    Return Value

    bool