vB_Utility_String
in package
Uses
vB_Utility_Trait_NoSerialize
Table of Contents
- $browserCharsetMap : mixed
- $defaultcharset : mixed
- $iconvenabled : mixed
- $iconvMap : mixed
- $mbstringenabled : mixed
- $mbstringMap : mixed
- $specialcharsCharsetMap : mixed
- __construct() : mixed
- Constructor
- __serialize() : mixed
- __sleep() : mixed
- __unserialize() : mixed
- __wakeup() : mixed
- areCharsetsEqual() : bool
- Are the two charsets the same
- getCensor() : vB_Utility_Censor
- Utility function to get a string censor
- getCharset() : mixed
- Get the default charset for the class
- htmlentities() : mixed
- htmlspecialchars() : mixed
- Encoding aware htmlspecialchars
- isDefaultCharset() : bool
- Does the charset match the default charset for the class
- normalizepath() : mixed
- parseUrl() : mixed
- UTF-8 safe parse_url (see http://us3.php.net/manual/en/function.parse-url.php)
- strlen() : int
- Return the length of the given string in characters
- strpos() : int|false
- Return the position of the given string taking into account charsets
- strtolower() : string
- Returns a lower case version of the string
- substr() : string
- Return the substring in the context of the character encoding
- toCharset() : string|array<string|int, mixed>
- Converts a variable from one character encoding to another.
- toDefault() : string|array<string|int, mixed>
- Converts to the default charset
- toUtf8() : string|array<string|int, mixed>
- Converts from the internal charset to utf8
- unparseUrl() : mixed
- decodeUtf8Url() : mixed
- encodeUtf8Url() : string
- Encode a UTF-8 encoded URL and urlencode it while leaving control characters intact.
- getActualEncoding() : mixed
- Look up the passed encoding.
- getCanonicalBrowserEncoding() : mixed
- Look up the canonical charset from the map
- toCharsetInternal() : string|array<string|int, mixed>
- Converts a variable from one character encoding to another.
Properties
$browserCharsetMap
private static mixed $browserCharsetMap
= array(
//utf-8
'unicode-1-1-utf-8' => 'utf-8',
'utf-8' => 'utf-8',
'utf8' => 'utf-8',
//Legacy single-byte encodings
'866' => 'ibm866',
'cp866' => 'ibm866',
'csibm866' => 'ibm866',
'ibm866' => 'ibm866',
'csisolatin2' => 'iso-8859-2',
'iso-8859-2' => 'iso-8859-2',
'iso-ir-101' => 'iso-8859-2',
'iso8859-2' => 'iso-8859-2',
'iso88592' => 'iso-8859-2',
'iso_8859-2' => 'iso-8859-2',
'iso_8859-2:1987' => 'iso-8859-2',
'l2' => 'iso-8859-2',
'latin2' => 'iso-8859-2',
'csisolatin3' => 'iso-8859-3',
'iso-8859-3' => 'iso-8859-3',
'iso-ir-109' => 'iso-8859-3',
'iso8859-3' => 'iso-8859-3',
'iso88593' => 'iso-8859-3',
'iso_8859-3' => 'iso-8859-3',
'iso_8859-3:1988' => 'iso-8859-3',
'l3' => 'iso-8859-3',
'latin3' => 'iso-8859-3',
'csisolatin4' => 'iso-8859-4',
'iso-8859-4' => 'iso-8859-4',
'iso-ir-110' => 'iso-8859-4',
'iso8859-4' => 'iso-8859-4',
'iso88594' => 'iso-8859-4',
'iso_8859-4' => 'iso-8859-4',
'iso_8859-4:1988' => 'iso-8859-4',
'l4' => 'iso-8859-4',
'latin4' => 'iso-8859-4',
'csisolatincyrillic' => 'iso-8859-5',
'cyrillic' => 'iso-8859-5',
'iso-8859-5' => 'iso-8859-5',
'iso-ir-144' => 'iso-8859-5',
'iso8859-5' => 'iso-8859-5',
'iso88595' => 'iso-8859-5',
'iso_8859-5' => 'iso-8859-5',
'iso_8859-5:1988' => 'iso-8859-5',
'arabic' => 'iso-8859-6',
'asmo-708' => 'iso-8859-6',
'csiso88596e' => 'iso-8859-6',
'csiso88596i' => 'iso-8859-6',
'csisolatinarabic' => 'iso-8859-6',
'ecma-114' => 'iso-8859-6',
'iso-8859-6' => 'iso-8859-6',
'iso-8859-6-e' => 'iso-8859-6',
'iso-8859-6-i' => 'iso-8859-6',
'iso-ir-127' => 'iso-8859-6',
'iso8859-6' => 'iso-8859-6',
'iso88596' => 'iso-8859-6',
'iso_8859-6' => 'iso-8859-6',
'iso_8859-6:1987' => 'iso-8859-6',
'csisolatingreek' => 'iso-8859-7',
'ecma-118' => 'iso-8859-7',
'elot_928' => 'iso-8859-7',
'greek' => 'iso-8859-7',
'greek8' => 'iso-8859-7',
'iso-8859-7' => 'iso-8859-7',
'iso-ir-126' => 'iso-8859-7',
'iso8859-7' => 'iso-8859-7',
'iso88597' => 'iso-8859-7',
'iso_8859-7' => 'iso-8859-7',
'iso_8859-7:1987' => 'iso-8859-7',
'sun_eu_greek' => 'iso-8859-7',
'csiso88598e' => 'iso-8859-8',
'csisolatinhebrew' => 'iso-8859-8',
'hebrew' => 'iso-8859-8',
'iso-8859-8' => 'iso-8859-8',
'iso-8859-8-e' => 'iso-8859-8',
'iso-ir-138' => 'iso-8859-8',
'iso8859-8' => 'iso-8859-8',
'iso88598' => 'iso-8859-8',
'iso_8859-8' => 'iso-8859-8',
'iso_8859-8:1988' => 'iso-8859-8',
'visual' => 'iso-8859-8',
'csiso88598i' => 'iso-8859-8-i',
'iso-8859-8-i' => 'iso-8859-8-i',
'logical' => 'iso-8859-8-i',
'csisolatin6' => 'iso-8859-10',
'iso-8859-10' => 'iso-8859-10',
'iso-ir-157' => 'iso-8859-10',
'iso8859-10' => 'iso-8859-10',
'iso885910' => 'iso-8859-10',
'l6' => 'iso-8859-10',
'latin6' => 'iso-8859-10',
'iso-8859-13' => 'iso-8859-13',
'iso8859-13' => 'iso-8859-13',
'iso885913' => 'iso-8859-13',
'iso-8859-14' => 'iso-8859-14',
'iso8859-14' => 'iso-8859-14',
'iso885914' => 'iso-8859-14',
'csisolatin9' => 'iso-8859-15',
'iso-8859-15' => 'iso-8859-15',
'iso8859-15' => 'iso-8859-15',
'iso885915' => 'iso-8859-15',
'iso_8859-15' => 'iso-8859-15',
'l9' => 'iso-8859-15',
'iso-8859-16' => 'iso-8859-16',
'cskoi8r' => 'koi8-r',
'koi' => 'koi8-r',
'koi8' => 'koi8-r',
'koi8-r' => 'koi8-r',
'koi8_r' => 'koi8-r',
'koi8-ru' => 'koi8-u',
'koi8-u' => 'koi8-u',
'csmacintosh' => 'macintosh',
'mac' => 'macintosh',
'macintosh' => 'macintosh',
'x-mac-roman' => 'macintosh',
'dos-874' => 'windows-874',
'iso-8859-11' => 'windows-874',
'iso8859-11' => 'windows-874',
'iso885911' => 'windows-874',
'tis-620' => 'windows-874',
'windows-874' => 'windows-874',
'cp1250' => 'windows-1250',
'windows-1250' => 'windows-1250',
'x-cp1250' => 'windows-1250',
'cp1251' => 'windows-1251',
'windows-1251' => 'windows-1251',
'x-cp1251' => 'windows-1251',
'ansi_x3.4-1968' => 'windows-1252',
'ascii' => 'windows-1252',
'cp1252' => 'windows-1252',
'cp819' => 'windows-1252',
'csisolatin1' => 'windows-1252',
'ibm819' => 'windows-1252',
'iso-8859-1' => 'windows-1252',
'iso-ir-100' => 'windows-1252',
'iso8859-1' => 'windows-1252',
'iso88591' => 'windows-1252',
'iso_8859-1' => 'windows-1252',
'iso_8859-1:1987' => 'windows-1252',
'l1' => 'windows-1252',
'latin1' => 'windows-1252',
'us-ascii' => 'windows-1252',
'windows-1252' => 'windows-1252',
'x-cp1252' => 'windows-1252',
'cp1253' => 'windows-1253',
'windows-1253' => 'windows-1253',
'x-cp1253' => 'windows-1253',
'cp1254' => 'windows-1254',
'csisolatin5' => 'windows-1254',
'iso-8859-9' => 'windows-1254',
'iso-ir-148' => 'windows-1254',
'iso8859-9' => 'windows-1254',
'iso88599' => 'windows-1254',
'iso_8859-9' => 'windows-1254',
'iso_8859-9:1989' => 'windows-1254',
'l5' => 'windows-1254',
'latin5' => 'windows-1254',
'windows-1254' => 'windows-1254',
'x-cp1254' => 'windows-1254',
'windows-1255' => 'windows-1255',
'cp1255' => 'windows-1255',
'x-cp1255' => 'windows-1255',
'cp1256' => 'windows-1256',
'windows-1256' => 'windows-1256',
'x-cp1256' => 'windows-1256',
'cp1257' => 'windows-1257',
'windows-1257' => 'windows-1257',
'x-cp1257' => 'windows-1257',
'cp1258' => 'windows-1258',
'windows-1258' => 'windows-1258',
'x-cp1258' => 'windows-1258',
'x-mac-cyrillic' => 'x-mac-cyrillic',
'x-mac-ukrainian' => 'x-mac-cyrillic',
// Legacy multi-byte Chinese (simplified) encodings
'chinese' => 'gbk',
'csgb2312' => 'gbk',
'csiso58gb231280' => 'gbk',
'gb2312' => 'gbk',
'gb_2312' => 'gbk',
'gb_2312-80' => 'gbk',
'gbk' => 'gbk',
'iso-ir-58' => 'gbk',
'x-gbk' => 'gbk',
'gb18030' => 'gb18030',
// Legacy multi-byte Chinese (traditional) encodings
'big5' => 'big5',
'big5-hkscs' => 'big5',
'cn-big5' => 'big5',
'csbig5' => 'big5',
'x-x-big5' => 'big5',
// Legacy multi-byte Japanese encodings
'cseucpkdfmtjapanese' => 'euc-jp',
'euc-jp' => 'euc-jp',
'x-euc-jp' => 'euc-jp',
'csiso2022jp' => 'iso-2022-jp',
'iso-2022-jp' => 'iso-2022-jp',
'csshiftjis' => 'shift_jis',
'ms932' => 'shift_jis',
'ms_kanji' => 'shift_jis',
'shift-jis' => 'shift_jis',
'shift_jis' => 'shift_jis',
'sjis' => 'shift_jis',
'windows-31j' => 'shift_jis',
'x-sjis' => 'shift_jis',
// Legacy multi-byte Korean encodings
'cseuckr' => 'euc-kr',
'csksc56011987' => 'euc-kr',
'euc-kr' => 'euc-kr',
'iso-ir-149' => 'euc-kr',
'korean' => 'euc-kr',
'ks_c_5601-1987' => 'euc-kr',
'ks_c_5601-1989' => 'euc-kr',
'ksc5601' => 'euc-kr',
'ksc_5601' => 'euc-kr',
'windows-949' => 'euc-kr',
// Legacy miscellaneous encodings
// 'csiso2022kr' => 'replacement',
// 'hz-gb-2312' => 'replacement',
// 'iso-2022-cn' => 'replacement',
// 'iso-2022-cn-ext' => 'replacement',
// 'iso-2022-kr' => 'replacement',
'utf-16be' => 'utf-16be',
'utf-16' => 'utf-16le',
'utf-16le' => 'utf-16le',
)
$defaultcharset
private mixed $defaultcharset
$iconvenabled
private mixed $iconvenabled
$iconvMap
private static mixed $iconvMap
= array('utf-8' => 'utf-8', 'ibm866' => 'cp866', 'iso-8859-2' => 'iso-8859-2', 'iso-8859-3' => 'iso-8859-3', 'iso-8859-4' => 'iso-8859-4', 'iso-8859-5' => 'iso-8859-5', 'iso-8859-6' => 'iso-8859-6', 'iso-8859-7' => 'iso-8859-7', 'iso-8859-8' => 'iso-8859-8', 'iso-8859-8-i' => 'iso-8859-8', 'iso-8859-10' => 'iso-8859-10', 'iso-8859-13' => 'iso-8859-13', 'iso-8859-14' => 'iso-8859-14', 'iso-8859-15' => 'iso-8859-15', 'iso-8859-16' => 'iso-8859-16', 'koi8-r' => 'koi8-r', 'koi8-u' => 'koi8-u', 'macintosh' => 'macintosh', 'windows-874' => 'windows-874', 'windows-1250' => 'windows-1250', 'windows-1251' => 'windows-1251', 'windows-1252' => 'windows-1252', 'windows-1253' => 'windows-1253', 'windows-1254' => 'windows-1254', 'windows-1255' => 'windows-1255', 'windows-1256' => 'windows-1256', 'windows-1257' => 'windows-1257', 'windows-1258' => 'windows-1258', 'x-mac-cyrillic' => 'maccyrillic', 'gbk' => 'gbk', 'gb18030' => 'gb18030', 'big5' => 'big5', 'euc-jp' => 'euc-jp', 'iso-2022-jp' => 'iso-2022-jp', 'shift_jis' => 'shift_jis', 'euc-kr' => 'euc-kr', 'utf-16be' => 'utf-16be', 'utf-16le' => 'utf-16le')
$mbstringenabled
private mixed $mbstringenabled
$mbstringMap
private static mixed $mbstringMap
= array('utf-8' => 'utf-8', 'ibm866' => 'cp866', 'iso-8859-2' => 'iso-8859-2', 'iso-8859-3' => 'iso-8859-3', 'iso-8859-4' => 'iso-8859-4', 'iso-8859-5' => 'iso-8859-5', 'iso-8859-6' => 'iso-8859-6', 'iso-8859-7' => 'iso-8859-7', 'iso-8859-8' => 'iso-8859-8', 'iso-8859-8-i' => 'iso-8859-8', 'iso-8859-10' => 'iso-8859-10', 'iso-8859-13' => 'iso-8859-13', 'iso-8859-14' => 'iso-8859-14', 'iso-8859-15' => 'iso-8859-15', 'iso-8859-16' => 'iso-8859-16', 'koi8-r' => 'koi8-r', 'koi8-u' => 'koi8-u', 'windows-1251' => 'windows-1251', 'windows-1252' => 'windows-1252', 'gbk' => 'gbk', 'gb18030' => 'gb18030', 'big5' => 'big5', 'euc-jp' => 'euc-jp', 'iso-2022-jp' => 'iso-2022-jp', 'shift_jis' => 'shift_jis', 'euc-kr' => 'euc-kr', 'utf-16be' => 'utf-16be', 'utf-16le' => 'utf-16le')
$specialcharsCharsetMap
private static mixed $specialcharsCharsetMap
= array(
'iso-8859-1' => 'iso-8859-1', //not actually used since we map iso-8859-1 to windows-1252
'utf-8' => 'utf-8',
'windows-1252' => 'cp1252',
'iso-8859-5' => 'iso-8859-5',
'iso-8859-15' => 'iso-8859-15',
'ibm866' => 'cp866',
'windows-1251' => 'cp1251',
'koi8-r' => 'koi8-r',
'big5' => 'big5',
'big5-hkscs' => 'big5-hkscs', //not used, the standard below maps big5-hkscs to big5
'gbk' => 'gb2312', //mapping relevant as gbk is not accepted by htmlspecialchars
'shift_jis' => 'shift_jis',
'euc-jp' => 'euc-jp',
'macintosh' => 'macroman',
)
Methods
__construct()
Constructor
public
__construct($charset) : mixed
Parameters
Return values
mixed
__serialize()
public
__serialize() : mixed
Return values
mixed
__sleep()
public
__sleep() : mixed
Return values
mixed
__unserialize()
public
__unserialize(mixed $serialized) : mixed
Parameters
- $serialized : mixed
Return values
mixed
__wakeup()
public
__wakeup() : mixed
Return values
mixed
areCharsetsEqual()
Are the two charsets the same
public
areCharsetsEqual(string $charset1, string $charset2) : bool
This uses the charset matching rules to look up both charsets and then compares the canonical value of each to see whether they match. If either charset is invalid according to the matching rules, the function returns false (even if both are the same invalid value).
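The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the class's implementation: the map is a four-entry excerpt of $browserCharsetMap, the function names canonicalCharset and charsetsMatch are hypothetical, and case-insensitive label matching is an assumption.

```php
<?php
// Small excerpt of the browser charset map listed on this page.
$map = [
    'utf8'       => 'utf-8',
    'utf-8'      => 'utf-8',
    'latin1'     => 'windows-1252',
    'iso-8859-1' => 'windows-1252',
];

// Hypothetical stand-in for the lookup; returns null for unknown labels.
function canonicalCharset(string $label, array $map): ?string
{
    // Assumption: labels are trimmed and matched case-insensitively.
    $key = strtolower(trim($label));
    return $map[$key] ?? null;
}

// Two labels match only if both resolve and resolve to the same canonical name.
function charsetsMatch(string $a, string $b, array $map): bool
{
    $ca = canonicalCharset($a, $map);
    $cb = canonicalCharset($b, $map);
    return $ca !== null && $ca === $cb;
}

$sameAlias   = charsetsMatch('latin1', 'ISO-8859-1', $map); // both resolve to windows-1252
$different   = charsetsMatch('utf8', 'latin1', $map);       // different canonical names
$bothInvalid = charsetsMatch('bogus', 'bogus', $map);       // unknown labels never match
```

Note how the last call is false even though the two arguments are identical, which mirrors the rule for invalid charsets stated above.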
Parameters
- $charset1 : string
- $charset2 : string
Return values
bool
getCensor()
Utility function to get a string censor
public
getCensor(string $censortext) : vB_Utility_Censor
The censor class is conceptually related to the string class but needs to be separate for various reasons. This is simply a helper function to handle the plumbing of getting instances of that class, which also ensures that we can easily generate it from any place we have the string class.
Parameters
- $censortext : string
Return values
vB_Utility_Censor
getCharset()
Get the default charset for the class
public
getCharset() : mixed
Return values
mixed
htmlentities()
public
htmlentities(mixed $value[, mixed $flags = ENT_COMPAT | ENT_HTML401 ][, mixed $encoding = null ]) : mixed
Parameters
- $value : mixed
- $flags : mixed = ENT_COMPAT | ENT_HTML401
- $encoding : mixed = null
Return values
mixed
htmlspecialchars()
Encoding aware htmlspecialchars
public
htmlspecialchars(string $value[, int $flags = ENT_COMPAT | ENT_HTML401 ][, string $encoding = null ]) : mixed
This takes a string and produces an HTML-escaped version. It uses the specified charset.
Parameters
- $value : string -- string to be escaped
- $flags : int = ENT_COMPAT | ENT_HTML401 -- flags per php function htmlspecialchars
- $encoding : string = null -- the browser encoding to use. Note that this is not the encoding value for the php function. Use the same values as you would use for the http/html value and would pass to this class. If null, the class default will be used.
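The distinction between the browser encoding and the encoding name that PHP's htmlspecialchars accepts might look like this in practice. The sketch below assumes the charset has already been canonicalized, uses a three-entry excerpt of $specialcharsCharsetMap, and throws on unmapped charsets purely for illustration; the helper name escapeHtml is hypothetical and the class may behave differently.

```php
<?php
// Excerpt of the $specialcharsCharsetMap listed on this page.
$phpEncodings = [
    'utf-8'        => 'utf-8',
    'windows-1252' => 'cp1252',
    'gbk'          => 'gb2312',
];

// Hypothetical helper: $canonical is assumed to be an already-canonical browser charset.
function escapeHtml(string $value, string $canonical, array $phpEncodings, int $flags = ENT_COMPAT | ENT_HTML401): string
{
    if (!isset($phpEncodings[$canonical])) {
        // Assumption for this sketch only; the class may handle unmapped charsets differently.
        throw new InvalidArgumentException("No PHP encoding mapped for $canonical");
    }
    // Translate the browser name into the name PHP's htmlspecialchars understands.
    return htmlspecialchars($value, $flags, $phpEncodings[$canonical]);
}

$escaped = escapeHtml('<a href="x">', 'windows-1252', $phpEncodings);
```

The caller passes the HTTP/HTML name (windows-1252) and never needs to know that PHP expects cp1252.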
Return values
mixed
isDefaultCharset()
Does the charset match the default charset for the class
public
isDefaultCharset(string $charset) : bool
Parameters
- $charset : string
Return values
bool
normalizepath()
public
normalizepath(mixed $path[, mixed $dir_sep = '/' ]) : mixed
Parameters
- $path : mixed
- $dir_sep : mixed = '/'
Return values
mixed
parseUrl()
UTF-8 safe parse_url (see http://us3.php.net/manual/en/function.parse-url.php)
public
parseUrl(string $url[, int $component = -1 ]) : mixed
Parameters
- $url : string
- $component : int = -1
Return values
mixed
strlen()
Return the length of the given string in characters
public
strlen(string $string[, string $encoding = null ]) : int
Parameters
- $string : string -- string to search
- $encoding : string = null -- the browser encoding to use. Use the same values as you would use for the http/html value and would pass to this class. If null, the class default will be used.
Return values
int —length of the string in characters
strpos()
Return the position of the given string taking into account charsets
public
strpos(string $haystack, string $needle, int $offset[, string $encoding = null ]) : int|false
Note that this returns the character position, not the byte offset. Using the result as a byte index, as in $string[$posvalue], will work just often enough to pass testing; the substr function on this class will return the correct results.
$haystack and $needle must be in the same encoding.
Parameters
- $haystack : string -- string to search
- $needle : string -- string to find
- $offset : int -- character index to start at
- $encoding : string = null -- the browser encoding to use. Use the same values as you would use for the http/html value and would pass to this class. If null, the class default will be used.
Return values
int|false —position of the string or false if not found
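To make the character-versus-byte distinction concrete, here is a small sketch using PHP's own functions rather than this class. The helper charPos is hypothetical; it prefers mbstring and falls back to iconv, which is an assumption about a reasonable setup and not a statement about how the class chooses.

```php
<?php
// Hypothetical helper: character position of $needle in $haystack.
function charPos(string $haystack, string $needle, int $offset = 0, string $encoding = 'UTF-8')
{
    if (function_exists('mb_strpos')) {
        return mb_strpos($haystack, $needle, $offset, $encoding);
    }
    return iconv_strpos($haystack, $needle, $offset, $encoding);
}

// "naive cafe" with accented i and e, written with escapes so the file encoding does not matter.
$text   = "na\u{00EF}ve caf\u{00E9}";
$needle = "caf\u{00E9}";

$bytePos = strpos($text, $needle);  // 7: the accented i occupies two bytes in UTF-8
$charPos = charPos($text, $needle); // 6: counted in characters

// Indexing the raw string with the character position lands one byte early.
$wrongByte = $text[$charPos];       // the space before the needle, not its first letter
```

With pure ASCII input the two positions coincide, which is why byte indexing with a character position can pass casual testing.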
strtolower()
Returns a lower case version of the string
public
strtolower(string $value[, string $encoding = null ]) : string
Will attempt to use the mbstring extension if available.
Parameters
- $value : string -- string to change
- $encoding : string = null -- the browser encoding to use. Use the same values as you would use for the http/html value and would pass to this class. If null, the class default will be used.
Return values
string —lowercased string
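A rough sketch of the "use mbstring if available" idea follows. The helper lowerCase is hypothetical, and the fallback to the native strtolower is an assumption about what a sensible degradation could look like, not a description of the class's code.

```php
<?php
// Hypothetical helper, not the class's own code.
function lowerCase(string $value, string $encoding = 'UTF-8'): string
{
    if (function_exists('mb_strtolower')) {
        // Encoding-aware lowercasing, including non-ASCII letters.
        return mb_strtolower($value, $encoding);
    }
    // Fallback: the native function only reliably handles ASCII letters.
    return strtolower($value);
}

// "ECOLE" with an accented capital E, written with an escape so the file encoding does not matter.
$lower = lowerCase("\u{00C9}COLE");
$ascii = lowerCase('ABC');
```

When mbstring is loaded, the accented capital is lowercased as well; without it, only the ASCII letters change.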
substr()
Return the substring in the context of the character encoding
public
substr(string $string, int $start[, int $length = null ][, string $encoding = null ]) : string
Parameters
- $string : string
- $start : int -- character index to start at
- $length : int = null -- number of characters to return. If not provided or null this will return to the end of the string
- $encoding : string = null -- the browser encoding to use. Use the same values as you would use for the http/html value and would pass to this class. If null, the class default will be used.
Return values
string —the requested substring
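The effect of cutting by characters instead of bytes can be illustrated with PHP's built-in functions. The helper charSubstr below is hypothetical; preferring mbstring and falling back to iconv is an assumption made so the sketch runs on more installations.

```php
<?php
// Hypothetical helper: substring measured in characters.
function charSubstr(string $string, int $start, ?int $length = null, string $encoding = 'UTF-8'): string
{
    if (function_exists('mb_substr')) {
        return mb_substr($string, $start, $length, $encoding);
    }
    // iconv fallback: an over-long length is clamped to the end of the string.
    $length = $length ?? iconv_strlen($string, $encoding);
    return (string) iconv_substr($string, $start, $length, $encoding);
}

// "uber" with a u-umlaut: 4 characters, 5 bytes in UTF-8.
$word = "\u{00FC}ber";

$firstByte = substr($word, 0, 1);     // half of the two-byte umlaut, not valid UTF-8 on its own
$firstChar = charSubstr($word, 0, 1); // the complete umlaut character
$rest      = charSubstr($word, 1);    // null length returns everything to the end
```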
toCharset()
Converts a variable from one character encoding to another.
public
toCharset(string|array<string|int, mixed> $value, string $sourceEncoding, string $targetEncoding) : string|array<string|int, mixed>
If the variable is a string, it is converted. If it is an array, the method will attempt to recurse over it and convert any string values it finds. Any other types will be returned unchanged.
Note that this does not attempt to deal with reference loops so is not suitable for complex objects.
Parameters
- $value : string|array<string|int, mixed> -- The variable to convert
- $sourceEncoding : string -- The source encoding
- $targetEncoding : string -- The target encoding
Return values
string|array<string|int, mixed> —The converted variable.
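The recursion described for toCharset could be sketched along these lines. This is an illustration under assumptions, not the class's code: the function convertRecursive is hypothetical, it calls iconv directly with encoding names iconv understands, and it leaves array keys untouched because the text above only mentions string values.

```php
<?php
// Hypothetical sketch of a recursive charset conversion.
function convertRecursive($value, string $from, string $to)
{
    if (is_string($value)) {
        return iconv($from, $to, $value);
    }
    if (is_array($value)) {
        $out = [];
        foreach ($value as $key => $item) {
            // Keys are left as they are in this sketch; only values are converted.
            $out[$key] = convertRecursive($item, $from, $to);
        }
        return $out;
    }
    // Integers, booleans, null and other types are returned unchanged.
    return $value;
}

// ISO-8859-1 input: 0xE9 is e-acute, 0xEF is i-diaeresis.
$latin1 = ['title' => "caf\xE9", 'tags' => ["na\xEFve"], 'count' => 3];
$utf8   = convertRecursive($latin1, 'ISO-8859-1', 'UTF-8');
```

Like the method it illustrates, this sketch makes no attempt to detect reference loops.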
toDefault()
Converts to the default charset
public
toDefault(string|array<string|int, mixed> $value, string $sourceEncoding) : string|array<string|int, mixed>
Parameters
- $value : string|array<string|int, mixed> -- The variable to convert
- $sourceEncoding : string -- The source encoding
Return values
string|array<string|int, mixed> —The converted variable.
toUtf8()
Converts from the internal charset to utf8
public
toUtf8(string|array<string|int, mixed> $value) : string|array<string|int, mixed>
Parameters
- $value : string|array<string|int, mixed> -- The variable to convert
Return values
string|array<string|int, mixed> —The converted variable.
unparseUrl()
public
unparseUrl(mixed $parsedUrl[, mixed $removeScheme = false ][, mixed $stopBefore = '' ]) : mixed
Parameters
- $parsedUrl : mixed
- $removeScheme : mixed = false
- $stopBefore : mixed = ''
Return values
mixed
decodeUtf8Url()
private
decodeUtf8Url(mixed $url) : mixed
Parameters
- $url : mixed
Return values
mixed
encodeUtf8Url()
Encode a UTF-8 encoded URL and urlencode it while leaving control characters intact.
private
encodeUtf8Url(mixed $url) : string
(It can also work with single byte encodings, but its purpose is to supply UTF-8 urls on non UTF-8 forums.)
Parameters
- $url : mixed
Return values
string
getActualEncoding()
Look up the passed encoding.
private
getActualEncoding(mixed $encoding) : mixed
Automatically handles the idiom that a blank (usually default) value means to use the default character set.
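The "blank means default" idiom mentioned above is simple enough to show in a few lines. The function resolveEncoding is a hypothetical stand-in; treating both null and the empty string as blank is an assumption, and the real method presumably also performs the charset lookup.

```php
<?php
// Hypothetical stand-in for the blank-means-default idiom.
function resolveEncoding(?string $encoding, string $default): string
{
    // Assumption: null and the empty string both count as "blank".
    if ($encoding === null || $encoding === '') {
        return $default;
    }
    return $encoding;
}

$fromNull  = resolveEncoding(null, 'utf-8');
$fromEmpty = resolveEncoding('', 'utf-8');
$explicit  = resolveEncoding('windows-1252', 'utf-8');
```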
Parameters
- $encoding : mixed
Return values
mixed
getCanonicalBrowserEncoding()
Look up the canonical charset from the map
private
getCanonicalBrowserEncoding(mixed $charset) : mixed
Parameters
- $charset : mixed
Return values
mixed
toCharsetInternal()
Converts a variable from one character encoding to another.
private
toCharsetInternal(string|array<string|int, mixed> $in, string $in_encoding, string $target_encoding) : string|array<string|int, mixed>
If the variable is a string, it is converted. If it is an array, the method will attempt to recurse over it and convert any string values it finds. Any other types will be returned unchanged.
Note that the caller is responsible for ensuring that the charsets match the canonical charset, including case.
Parameters
- $in : string|array<string|int, mixed> -- The variable to convert
- $in_encoding : string -- The source encoding (must be one of the mapped canonical browser values)
- $target_encoding : string -- The target encoding (must be one of the mapped canonical browser values)
Return values
string|array<string|int, mixed> —The converted variable.