vBulletin 5.6.5 API

vB_Utility_String
Uses vB_Utility_Trait_NoSerialize

Table of Contents

$browserCharsetMap  : mixed
$defaultcharset  : mixed
$iconvenabled  : mixed
$iconvMap  : mixed
$mbstringenabled  : mixed
$mbstringMap  : mixed
$specialcharsCharsetMap  : mixed
__construct()  : mixed
Constructor
__serialize()  : mixed
__sleep()  : mixed
__unserialize()  : mixed
__wakeup()  : mixed
areCharsetsEqual()  : bool
Are the two charsets the same
getCensor()  : vB_Utility_Censor
Utility function to get a string censor
getCharset()  : mixed
Get the default charset for the class
htmlentities()  : mixed
htmlspecialchars()  : mixed
Encoding aware htmlspecialchars
isDefaultCharset()  : bool
Does the charset match the default charset for the class
normalizepath()  : mixed
parseUrl()  : mixed
UTF-8 safe parse_url(). See http://us3.php.net/manual/en/function.parse-url.php
strlen()  : int
Return the length of the given string in characters
strpos()  : int|false
Return the position of the given string taking into account charsets
strtolower()  : string
Returns a lower case version of the string
substr()  : string
Return the substring in the context of the character encoding
toCharset()  : string|array<string|int, mixed>
Converts a variable from one character encoding to another.
toDefault()  : string|array<string|int, mixed>
Converts to the default charset
toUtf8()  : string|array<string|int, mixed>
Converts from the internal charset to utf8
unparseUrl()  : mixed
decodeUtf8Url()  : mixed
encodeUtf8Url()  : string
Encode a UTF-8 encoded URL and urlencode it while leaving control characters intact.
getActualEncoding()  : mixed
Look up the passed encoding.
getCanonicalBrowserEncoding()  : mixed
Look up the canonical charset from the map based on
toCharsetInternal()  : string|array<string|int, mixed>
Converts a variable from one character encoding to another.

Properties

$browserCharsetMap

private static mixed $browserCharsetMap = array(
    //utf-8
    'unicode-1-1-utf-8' => 'utf-8',
    'utf-8' => 'utf-8',
    'utf8' => 'utf-8',
    //Legacy single-byte encodings
    '866' => 'ibm866',
    'cp866' => 'ibm866',
    'csibm866' => 'ibm866',
    'ibm866' => 'ibm866',
    'csisolatin2' => 'iso-8859-2',
    'iso-8859-2' => 'iso-8859-2',
    'iso-ir-101' => 'iso-8859-2',
    'iso8859-2' => 'iso-8859-2',
    'iso88592' => 'iso-8859-2',
    'iso_8859-2' => 'iso-8859-2',
    'iso_8859-2:1987' => 'iso-8859-2',
    'l2' => 'iso-8859-2',
    'latin2' => 'iso-8859-2',
    'csisolatin3' => 'iso-8859-3',
    'iso-8859-3' => 'iso-8859-3',
    'iso-ir-109' => 'iso-8859-3',
    'iso8859-3' => 'iso-8859-3',
    'iso88593' => 'iso-8859-3',
    'iso_8859-3' => 'iso-8859-3',
    'iso_8859-3:1988' => 'iso-8859-3',
    'l3' => 'iso-8859-3',
    'latin3' => 'iso-8859-3',
    'csisolatin4' => 'iso-8859-4',
    'iso-8859-4' => 'iso-8859-4',
    'iso-ir-110' => 'iso-8859-4',
    'iso8859-4' => 'iso-8859-4',
    'iso88594' => 'iso-8859-4',
    'iso_8859-4' => 'iso-8859-4',
    'iso_8859-4:1988' => 'iso-8859-4',
    'l4' => 'iso-8859-4',
    'latin4' => 'iso-8859-4',
    'csisolatincyrillic' => 'iso-8859-5',
    'cyrillic' => 'iso-8859-5',
    'iso-8859-5' => 'iso-8859-5',
    'iso-ir-144' => 'iso-8859-5',
    'iso8859-5' => 'iso-8859-5',
    'iso88595' => 'iso-8859-5',
    'iso_8859-5' => 'iso-8859-5',
    'iso_8859-5:1988' => 'iso-8859-5',
    'arabic' => 'iso-8859-6',
    'asmo-708' => 'iso-8859-6',
    'csiso88596e' => 'iso-8859-6',
    'csiso88596i' => 'iso-8859-6',
    'csisolatinarabic' => 'iso-8859-6',
    'ecma-114' => 'iso-8859-6',
    'iso-8859-6' => 'iso-8859-6',
    'iso-8859-6-e' => 'iso-8859-6',
    'iso-8859-6-i' => 'iso-8859-6',
    'iso-ir-127' => 'iso-8859-6',
    'iso8859-6' => 'iso-8859-6',
    'iso88596' => 'iso-8859-6',
    'iso_8859-6' => 'iso-8859-6',
    'iso_8859-6:1987' => 'iso-8859-6',
    'csisolatingreek' => 'iso-8859-7',
    'ecma-118' => 'iso-8859-7',
    'elot_928' => 'iso-8859-7',
    'greek' => 'iso-8859-7',
    'greek8' => 'iso-8859-7',
    'iso-8859-7' => 'iso-8859-7',
    'iso-ir-126' => 'iso-8859-7',
    'iso8859-7' => 'iso-8859-7',
    'iso88597' => 'iso-8859-7',
    'iso_8859-7' => 'iso-8859-7',
    'iso_8859-7:1987' => 'iso-8859-7',
    'sun_eu_greek' => 'iso-8859-7',
    'csiso88598e' => 'iso-8859-8',
    'csisolatinhebrew' => 'iso-8859-8',
    'hebrew' => 'iso-8859-8',
    'iso-8859-8' => 'iso-8859-8',
    'iso-8859-8-e' => 'iso-8859-8',
    'iso-ir-138' => 'iso-8859-8',
    'iso8859-8' => 'iso-8859-8',
    'iso88598' => 'iso-8859-8',
    'iso_8859-8' => 'iso-8859-8',
    'iso_8859-8:1988' => 'iso-8859-8',
    'visual' => 'iso-8859-8',
    'csiso88598i' => 'iso-8859-8-i',
    'iso-8859-8-i' => 'iso-8859-8-i',
    'logical' => 'iso-8859-8-i',
    'csisolatin6' => 'iso-8859-10',
    'iso-8859-10' => 'iso-8859-10',
    'iso-ir-157' => 'iso-8859-10',
    'iso8859-10' => 'iso-8859-10',
    'iso885910' => 'iso-8859-10',
    'l6' => 'iso-8859-10',
    'latin6' => 'iso-8859-10',
    'iso-8859-13' => 'iso-8859-13',
    'iso8859-13' => 'iso-8859-13',
    'iso885913' => 'iso-8859-13',
    'iso-8859-14' => 'iso-8859-14',
    'iso8859-14' => 'iso-8859-14',
    'iso885914' => 'iso-8859-14',
    'csisolatin9' => 'iso-8859-15',
    'iso-8859-15' => 'iso-8859-15',
    'iso8859-15' => 'iso-8859-15',
    'iso885915' => 'iso-8859-15',
    'iso_8859-15' => 'iso-8859-15',
    'l9' => 'iso-8859-15',
    'iso-8859-16' => 'iso-8859-16',
    'cskoi8r' => 'koi8-r',
    'koi' => 'koi8-r',
    'koi8' => 'koi8-r',
    'koi8-r' => 'koi8-r',
    'koi8_r' => 'koi8-r',
    'koi8-ru' => 'koi8-u',
    'koi8-u' => 'koi8-u',
    'csmacintosh' => 'macintosh',
    'mac' => 'macintosh',
    'macintosh' => 'macintosh',
    'x-mac-roman' => 'macintosh',
    'dos-874' => 'windows-874',
    'iso-8859-11' => 'windows-874',
    'iso8859-11' => 'windows-874',
    'iso885911' => 'windows-874',
    'tis-620' => 'windows-874',
    'windows-874' => 'windows-874',
    'cp1250' => 'windows-1250',
    'windows-1250' => 'windows-1250',
    'x-cp1250' => 'windows-1250',
    'cp1251' => 'windows-1251',
    'windows-1251' => 'windows-1251',
    'x-cp1251' => 'windows-1251',
    'ansi_x3.4-1968' => 'windows-1252',
    'ascii' => 'windows-1252',
    'cp1252' => 'windows-1252',
    'cp819' => 'windows-1252',
    'csisolatin1' => 'windows-1252',
    'ibm819' => 'windows-1252',
    'iso-8859-1' => 'windows-1252',
    'iso-ir-100' => 'windows-1252',
    'iso8859-1' => 'windows-1252',
    'iso88591' => 'windows-1252',
    'iso_8859-1' => 'windows-1252',
    'iso_8859-1:1987' => 'windows-1252',
    'l1' => 'windows-1252',
    'latin1' => 'windows-1252',
    'us-ascii' => 'windows-1252',
    'windows-1252' => 'windows-1252',
    'x-cp1252' => 'windows-1252',
    'cp1253' => 'windows-1253',
    'windows-1253' => 'windows-1253',
    'x-cp1253' => 'windows-1253',
    'cp1254' => 'windows-1254',
    'csisolatin5' => 'windows-1254',
    'iso-8859-9' => 'windows-1254',
    'iso-ir-148' => 'windows-1254',
    'iso8859-9' => 'windows-1254',
    'iso88599' => 'windows-1254',
    'iso_8859-9' => 'windows-1254',
    'iso_8859-9:1989' => 'windows-1254',
    'l5' => 'windows-1254',
    'latin5' => 'windows-1254',
    'windows-1254' => 'windows-1254',
    'x-cp1254' => 'windows-1254',
    'windows-1255' => 'windows-1255',
    'cp1255' => 'windows-1255',
    'x-cp1255' => 'windows-1255',
    'cp1256' => 'windows-1256',
    'windows-1256' => 'windows-1256',
    'x-cp1256' => 'windows-1256',
    'cp1257' => 'windows-1257',
    'windows-1257' => 'windows-1257',
    'x-cp1257' => 'windows-1257',
    'cp1258' => 'windows-1258',
    'windows-1258' => 'windows-1258',
    'x-cp1258' => 'windows-1258',
    'x-mac-cyrillic' => 'x-mac-cyrillic',
    'x-mac-ukrainian' => 'x-mac-cyrillic',
    // Legacy multi-byte Chinese (simplified) encodings
    'chinese' => 'gbk',
    'csgb2312' => 'gbk',
    'csiso58gb231280' => 'gbk',
    'gb2312' => 'gbk',
    'gb_2312' => 'gbk',
    'gb_2312-80' => 'gbk',
    'gbk' => 'gbk',
    'iso-ir-58' => 'gbk',
    'x-gbk' => 'gbk',
    'gb18030' => 'gb18030',
    // Legacy multi-byte Chinese (traditional) encodings
    'big5' => 'big5',
    'big5-hkscs' => 'big5',
    'cn-big5' => 'big5',
    'csbig5' => 'big5',
    'x-x-big5' => 'big5',
    // Legacy multi-byte Japanese encodings
    'cseucpkdfmtjapanese' => 'euc-jp',
    'euc-jp' => 'euc-jp',
    'x-euc-jp' => 'euc-jp',
    'csiso2022jp' => 'iso-2022-jp',
    'iso-2022-jp' => 'iso-2022-jp',
    'csshiftjis' => 'shift_jis',
    'ms932' => 'shift_jis',
    'ms_kanji' => 'shift_jis',
    'shift-jis' => 'shift_jis',
    'shift_jis' => 'shift_jis',
    'sjis' => 'shift_jis',
    'windows-31j' => 'shift_jis',
    'x-sjis' => 'shift_jis',
    // Legacy multi-byte Korean encodings
    'cseuckr' => 'euc-kr',
    'csksc56011987' => 'euc-kr',
    'euc-kr' => 'euc-kr',
    'iso-ir-149' => 'euc-kr',
    'korean' => 'euc-kr',
    'ks_c_5601-1987' => 'euc-kr',
    'ks_c_5601-1989' => 'euc-kr',
    'ksc5601' => 'euc-kr',
    'ksc_5601' => 'euc-kr',
    'windows-949' => 'euc-kr',
    // Legacy miscellaneous encodings
    // 'csiso2022kr' => 'replacement',
    // 'hz-gb-2312' => 'replacement',
    // 'iso-2022-cn' => 'replacement',
    // 'iso-2022-cn-ext' => 'replacement',
    // 'iso-2022-kr' => 'replacement',
    'utf-16be' => 'utf-16be',
    'utf-16' => 'utf-16le',
    'utf-16le' => 'utf-16le',
)

$defaultcharset

private mixed $defaultcharset

$iconvenabled

private mixed $iconvenabled

$iconvMap

private static mixed $iconvMap = array('utf-8' => 'utf-8', 'ibm866' => 'cp866', 'iso-8859-2' => 'iso-8859-2', 'iso-8859-3' => 'iso-8859-3', 'iso-8859-4' => 'iso-8859-4', 'iso-8859-5' => 'iso-8859-5', 'iso-8859-6' => 'iso-8859-6', 'iso-8859-7' => 'iso-8859-7', 'iso-8859-8' => 'iso-8859-8', 'iso-8859-8-i' => 'iso-8859-8', 'iso-8859-10' => 'iso-8859-10', 'iso-8859-13' => 'iso-8859-13', 'iso-8859-14' => 'iso-8859-14', 'iso-8859-15' => 'iso-8859-15', 'iso-8859-16' => 'iso-8859-16', 'koi8-r' => 'koi8-r', 'koi8-u' => 'koi8-u', 'macintosh' => 'macintosh', 'windows-874' => 'windows-874', 'windows-1250' => 'windows-1250', 'windows-1251' => 'windows-1251', 'windows-1252' => 'windows-1252', 'windows-1253' => 'windows-1253', 'windows-1254' => 'windows-1254', 'windows-1255' => 'windows-1255', 'windows-1256' => 'windows-1256', 'windows-1257' => 'windows-1257', 'windows-1258' => 'windows-1258', 'x-mac-cyrillic' => 'maccyrillic', 'gbk' => 'gbk', 'gb18030' => 'gb18030', 'big5' => 'big5', 'euc-jp' => 'euc-jp', 'iso-2022-jp' => 'iso-2022-jp', 'shift_jis' => 'shift_jis', 'euc-kr' => 'euc-kr', 'utf-16be' => 'utf-16be', 'utf-16le' => 'utf-16le')

$mbstringenabled

private mixed $mbstringenabled

$mbstringMap

private static mixed $mbstringMap = array('utf-8' => 'utf-8', 'ibm866' => 'cp866', 'iso-8859-2' => 'iso-8859-2', 'iso-8859-3' => 'iso-8859-3', 'iso-8859-4' => 'iso-8859-4', 'iso-8859-5' => 'iso-8859-5', 'iso-8859-6' => 'iso-8859-6', 'iso-8859-7' => 'iso-8859-7', 'iso-8859-8' => 'iso-8859-8', 'iso-8859-8-i' => 'iso-8859-8', 'iso-8859-10' => 'iso-8859-10', 'iso-8859-13' => 'iso-8859-13', 'iso-8859-14' => 'iso-8859-14', 'iso-8859-15' => 'iso-8859-15', 'iso-8859-16' => 'iso-8859-16', 'koi8-r' => 'koi8-r', 'koi8-u' => 'koi8-u', 'windows-1251' => 'windows-1251', 'windows-1252' => 'windows-1252', 'gbk' => 'gbk', 'gb18030' => 'gb18030', 'big5' => 'big5', 'euc-jp' => 'euc-jp', 'iso-2022-jp' => 'iso-2022-jp', 'shift_jis' => 'shift_jis', 'euc-kr' => 'euc-kr', 'utf-16be' => 'utf-16be', 'utf-16le' => 'utf-16le')

$specialcharsCharsetMap

private static mixed $specialcharsCharsetMap = array(
    'iso-8859-1' => 'iso-8859-1', //not actually used since we map iso-8859-1 to windows-1252
    'utf-8' => 'utf-8',
    'windows-1252' => 'cp1252',
    'iso-8859-5' => 'iso-8859-5',
    'iso-8859-15' => 'iso-8859-15',
    'ibm866' => 'cp866',
    'windows-1251' => 'cp1251',
    'koi8-r' => 'koi8-r',
    'big5' => 'big5',
    'big5-hkscs' => 'big5-hkscs', //not used, the standard below maps big5-hkscs to big5
    'gbk' => 'gb2312', //mapping relevant as gbk is not accepted by htmlspecialchars
    'shift_jis' => 'shift_jis',
    'euc-jp' => 'euc-jp',
    'macintosh' => 'macroman',
)

Methods

__construct()

Constructor

public __construct( $charset) : mixed
Parameters
$charset :

-- the default charset

Will throw an exception if the $charset is not an accepted value

Return values
mixed

__serialize()

public __serialize() : mixed
Return values
mixed

__sleep()

public __sleep() : mixed
Return values
mixed

__unserialize()

public __unserialize(mixed $serialized) : mixed
Parameters
$serialized : mixed
Return values
mixed

__wakeup()

public __wakeup() : mixed
Return values
mixed

areCharsetsEqual()

Are the two charsets the same

public areCharsetsEqual(string $charset1, string $charset2) : bool

This uses the charset matching rules to look up the charsets and then compares the canonical value for each charset to see if they match. If either charset is invalid according to the matching rules, the function returns false (even if both are the same invalid value).

Parameters
$charset1 : string
$charset2 : string
Return values
bool
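
The matching rule described above can be sketched roughly as follows. This is an illustration only, not vBulletin's implementation: the alias table is a small excerpt of $browserCharsetMap, the names canonicalCharset() and charsetsEqual() are hypothetical, and lowercasing and trimming the input before lookup is an assumption of this sketch.

```php
<?php
// Excerpt of the alias => canonical map documented under $browserCharsetMap.
const CHARSET_ALIASES = [
    'utf-8'        => 'utf-8',
    'utf8'         => 'utf-8',
    'latin1'       => 'windows-1252',
    'iso-8859-1'   => 'windows-1252',
    'windows-1252' => 'windows-1252',
    'sjis'         => 'shift_jis',
    'shift_jis'    => 'shift_jis',
];

// Hypothetical helper: canonical name, or null when the charset is unknown.
// Normalizing case and whitespace here is an assumption, not documented behavior.
function canonicalCharset(string $charset): ?string
{
    $key = strtolower(trim($charset));
    return CHARSET_ALIASES[$key] ?? null;
}

// Hypothetical helper: true only when both names resolve to the same canonical charset.
function charsetsEqual(string $a, string $b): bool
{
    $ca = canonicalCharset($a);
    $cb = canonicalCharset($b);
    if ($ca === null || $cb === null) {
        return false; // invalid names never match, even when they are identical
    }
    return $ca === $cb;
}

var_dump(charsetsEqual('latin1', 'iso-8859-1')); // both resolve to windows-1252
var_dump(charsetsEqual('bogus', 'bogus'));       // unknown on both sides
```

Comparing canonical names rather than raw strings is what lets two different aliases of the same encoding compare equal.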

getCensor()

Utility function to get a string censor

public getCensor(string $censortext) : vB_Utility_Censor

The censor class is conceptually related to the string class but needs to be separate for various reasons. This is a simple helper function that handles the plumbing of getting instances of that class, and ensures that we can easily generate one from any place we have the string class.

Parameters
$censortext : string
Return values
vB_Utility_Censor

getCharset()

Get the default charset for the class

public getCharset() : mixed
Return values
mixed

string -- the canonical charset for the class default

htmlentities()

public htmlentities(mixed $value[, mixed $flags = ENT_COMPAT | ENT_HTML401 ][, mixed $encoding = null ]) : mixed
Parameters
$value : mixed
$flags : mixed = ENT_COMPAT | ENT_HTML401
$encoding : mixed = null
Return values
mixed

htmlspecialchars()

Encoding aware htmlspecialchars

public htmlspecialchars(string $value[, int $flags = ENT_COMPAT | ENT_HTML401 ][, string $encoding = null ]) : mixed

This takes a string and produces an HTML-escaped version. It uses the specified charset.

Parameters
$value : string

-- string to be escaped

$flags : int = ENT_COMPAT | ENT_HTML401

-- flags per php function htmlspecialchars

$encoding : string = null

-- the browser encoding to use. Note that this is not the encoding value for the php function. Use the same values as you would use for the http/html value and would pass to this class. If null, the class default will be used.

Return values
mixed

string -- escaped string
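
The difference between the browser encoding and the value PHP's htmlspecialchars() expects can be illustrated with a short sketch. It assumes the browser charset has already been reduced to its canonical name, uses a small excerpt of $specialcharsCharsetMap, and introduces a hypothetical helper escapeFor(). Throwing on an unmapped charset is an assumption of this sketch; the class may handle that case differently.

```php
<?php
// Excerpt of $specialcharsCharsetMap: canonical browser charset => name PHP accepts.
const SPECIALCHARS_CHARSETS = [
    'utf-8'        => 'utf-8',
    'windows-1252' => 'cp1252',
    'windows-1251' => 'cp1251',
    'gbk'          => 'gb2312',
];

// Hypothetical helper; $canonical must already be a canonical browser charset.
function escapeFor(string $value, string $canonical, int $flags = ENT_COMPAT | ENT_HTML401): string
{
    $phpCharset = SPECIALCHARS_CHARSETS[$canonical] ?? null;
    if ($phpCharset === null) {
        // Assumption of this sketch: reject charsets the map does not cover.
        throw new InvalidArgumentException("No htmlspecialchars charset for $canonical");
    }
    return htmlspecialchars($value, $flags, $phpCharset);
}

echo escapeFor('<a href="x">Tom & Jerry</a>', 'windows-1252'), "\n";
```

Callers pass the same charset label they would put in an HTTP header or meta tag, and the translation to PHP's own naming stays in one place.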

isDefaultCharset()

Does the charset match the default charset for the class

public isDefaultCharset(string $charset) : bool
Parameters
$charset : string
Tags
see
areCharsetsEqual
Return values
bool

normalizepath()

public normalizepath(mixed $path[, mixed $dir_sep = '/' ]) : mixed
Parameters
$path : mixed
$dir_sep : mixed = '/'
Return values
mixed

parseUrl()

UTF-8 safe parse_url(). See http://us3.php.net/manual/en/function.parse-url.php

public parseUrl(string $url[, int $component = -1 ]) : mixed
Parameters
$url : string
$component : int = -1
Return values
mixed

strlen()

Return the length of the given string in characters

public strlen(string $string[, string $encoding = null ]) : int
Parameters
$string : string

-- string to measure

$encoding : string = null

-- the browser encoding to use. Use the same values as you would use for the http/html value and would pass to this class. If null, the class default will be used.

Return values
int

length of the string in characters
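
Why a character count needs the encoding can be seen with PHP's built-ins. This sketch does not call the class; it assumes the mbstring extension is loaded and that the data is UTF-8.

```php
<?php
// "n", e-acute and a euro sign, written with escapes so the file encoding does not matter.
$utf8 = "n\u{e9}\u{20ac}";

$bytes = strlen($utf8);             // byte count: 1 + 2 + 3
$chars = mb_strlen($utf8, 'UTF-8'); // character count

printf("bytes=%d chars=%d\n", $bytes, $chars); // prints "bytes=6 chars=3"
```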

strpos()

Return the position of the given string taking into account charsets

public strpos(string $haystack, string $needle, int $offset[, string $encoding = null ]) : int|false

Note that this returns the character position, not the byte offset. Attempting $string[$posvalue] will work just often enough to pass testing. The substr function on this class will return the correct results.

$haystack and $needle must be in the same encoding.

Parameters
$haystack : string

-- string to search

$needle : string

-- string to find

$offset : int

-- character index to start at

$encoding : string = null

-- the browser encoding to use. Use the same values as you would use for the http/html value and would pass to this class. If null, the class default will be used.

Return values
int|false

position of the string or false if not found
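
The character-versus-byte pitfall can be reproduced with PHP's built-ins. This is a sketch that assumes mbstring is available and UTF-8 input; it does not call the class itself.

```php
<?php
$haystack = "h\u{e9}llo w\u{f6}rld"; // UTF-8 text with two multi-byte characters

$charPos = mb_strpos($haystack, 'w', 0, 'UTF-8'); // character index
$bytePos = strpos($haystack, 'w');                 // byte offset

// Indexing the raw string with a character index reads the wrong byte...
$wrong = $haystack[$charPos];
// ...while a character-aware substring returns the expected character.
$right = mb_substr($haystack, $charPos, 1, 'UTF-8');

var_dump($charPos, $bytePos, $wrong, $right);
```

With pure ASCII input both positions coincide, which is why byte indexing can pass casual testing and still fail on real data.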

strtolower()

Returns a lower case version of the string

public strtolower(string $value[, string $encoding = null ]) : string

Will attempt to use the mbstring extension if available.

Parameters
$value : string

-- string to change

$encoding : string = null

-- the browser encoding to use. Use the same values as you would use for the http/html value and would pass to this class. If null, the class default will be used.

Return values
string

the lowercased string
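
The "use mbstring if available" behavior might look something like the sketch below. The helper name lowerCase() is hypothetical, and the fallback to the byte-oriented strtolower() is an assumption; the class may fall back differently.

```php
<?php
// Hypothetical helper: prefer mbstring, otherwise fall back to the built-in.
function lowerCase(string $value, string $encoding = 'UTF-8'): string
{
    if (function_exists('mb_strtolower')) {
        return mb_strtolower($value, $encoding);
    }
    // Assumed fallback: only ASCII letters are handled reliably here.
    return strtolower($value);
}

var_dump(lowerCase('ABC')); // string(3) "abc"
```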

substr()

Return the substring in the context of the character encoding

public substr(string $string, int $start[, int $length = null ][, string $encoding = null ]) : string
Parameters
$string : string
$start : int

-- character index to start at

$length : int = null

-- number of characters to return. If not provided or null this will return to the end of the string

$encoding : string = null

-- the browser encoding to use. Use the same values as you would use for the http/html value and would pass to this class. If null, the class default will be used.

Return values
string

the requested substring

toCharset()

Converts a variable from one character encoding to another.

public toCharset(string|array<string|int, mixed> $value, string $sourceEncoding, string $targetEncoding) : string|array<string|int, mixed>

If the variable is a string, it is converted. If it is an array, this will attempt to recurse over it and convert any string values found. Any other types will be returned unchanged.

Note that this does not attempt to deal with reference loops, so it is not suitable for complex objects.

Parameters
$value : string|array<string|int, mixed>

-- The variable to convert

$sourceEncoding : string

-- The source encoding

$targetEncoding : string

-- The target encoding

Return values
string|array<string|int, mixed>

The converted variable.
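
The recursion described above can be sketched as follows. This is not the class's implementation: convertCharset() is a hypothetical name, the sketch assumes mbstring is available and that the encoding names are ones mb_convert_encoding() accepts, and leaving array keys unconverted is an assumption.

```php
<?php
// Hypothetical helper: convert strings, recurse into arrays, pass other types through.
function convertCharset($value, string $from, string $to)
{
    if (is_string($value)) {
        return mb_convert_encoding($value, $to, $from);
    }
    if (is_array($value)) {
        $out = [];
        foreach ($value as $key => $item) {
            // Keys are left untouched in this sketch; only values are converted.
            $out[$key] = convertCharset($item, $from, $to);
        }
        return $out;
    }
    return $value; // ints, bools, null, objects: returned unchanged
}

$converted = convertCharset(
    ['name' => "caf\xE9", 'count' => 3, 'tags' => ["\xE9t\xE9"]],
    'Windows-1252',
    'UTF-8'
);
var_dump($converted);
```

Because the function only descends into arrays, objects are returned as-is and cannot trigger the reference-loop problem mentioned in the note.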

toDefault()

Converts to the default charset

public toDefault(string|array<string|int, mixed> $value, string $sourceEncoding) : string|array<string|int, mixed>
Parameters
$value : string|array<string|int, mixed>

-- The variable to convert

$sourceEncoding : string

-- The source encoding

Tags
see
toCharset
Return values
string|array<string|int, mixed>

The converted variable.

toUtf8()

Converts from the internal charset to utf8

public toUtf8(string|array<string|int, mixed> $value) : string|array<string|int, mixed>
Parameters
$value : string|array<string|int, mixed>

-- The variable to convert

Tags
see
toCharset
Return values
string|array<string|int, mixed>

The converted variable.

unparseUrl()

public unparseUrl(mixed $parsedUrl[, mixed $removeScheme = false ][, mixed $stopBefore = '' ]) : mixed
Parameters
$parsedUrl : mixed
$removeScheme : mixed = false
$stopBefore : mixed = ''
Return values
mixed

decodeUtf8Url()

private decodeUtf8Url(mixed $url) : mixed
Parameters
$url : mixed
Return values
mixed

encodeUtf8Url()

Encode a UTF-8 encoded URL and urlencode it while leaving control characters intact.

private encodeUtf8Url(mixed $url) : string

(It can also work with single byte encodings, but its purpose is to supply UTF-8 urls on non UTF-8 forums.)

Parameters
$url : mixed
Return values
string

getActualEncoding()

Look up the passed encoding.

private getActualEncoding(mixed $encoding) : mixed

Automatically handles the idiom that a blank (usually default) value means to use the default character set.

Parameters
$encoding : mixed
Return values
mixed

getCanonicalBrowserEncoding()

Look up the canonical charset from the map based on

private getCanonicalBrowserEncoding(mixed $charset) : mixed
Parameters
$charset : mixed
Return values
mixed

toCharsetInternal()

Converts a variable from one character encoding to another.

private toCharsetInternal(string|array<string|int, mixed> $in, string $in_encoding, string $target_encoding) : string|array<string|int, mixed>

If the variable is a string, it is converted. If it is an array, this will attempt to recurse over it and convert any string values found. Any other types will be returned unchanged.

Note that the caller is responsible for ensuring that the charsets match the canonical charset, including case.

Parameters
$in : string|array<string|int, mixed>

-- The variable to convert

$in_encoding : string

-- The source encoding (must be one of the mapped canonical browser values)

$target_encoding : string

-- The target encoding (must be one of the mapped canonical browser values)

Return values
string|array<string|int, mixed>

The converted variable.
