vBulletin v6.1.0

vB_RelatedText_TfIdf
in package
uses vB_Trait_NoSerialize

Calculates TfIdf values from a data array and a vocabulary object. This is structured to allow batch processing of data arrays without necessarily considering all of the data records at once, but all documents need to be processed into the vocabulary object before it is used for the transform. The vocabulary object keeps all of the "universe"-level statistics in addition to an ordinal-to-word mapping.
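
The batch-friendly design described above depends on a vocabulary that accumulates corpus-wide statistics one batch at a time. The sketch below illustrates that idea only. SketchVocabulary, addBatch() and the getter names are hypothetical and are not the vBulletin vocabulary class, whose API is not documented on this page. It assumes each document arrives as a list of word tokens.

```php
<?php
// Hypothetical stand-in for the vocabulary object; NOT the vBulletin class.
final class SketchVocabulary
{
    /** @var array<string, int> word => ordinal */
    private array $ordinals = [];
    /** @var array<int, string> ordinal => word */
    private array $words = [];
    /** @var array<int, int> ordinal => number of documents containing the word */
    private array $docFrequency = [];
    /** Total number of documents seen across all batches. */
    private int $docCount = 0;

    /** Add one batch of tokenized documents: [$recordid => [word, word, ...]]. */
    public function addBatch(array $batch): void
    {
        foreach ($batch as $tokens) {
            $this->docCount++;
            // Count each word once per document (document frequency).
            foreach (array_unique($tokens) as $word) {
                if (!isset($this->ordinals[$word])) {
                    $ordinal = count($this->words);
                    $this->ordinals[$word] = $ordinal;
                    $this->words[$ordinal] = $word;
                    $this->docFrequency[$ordinal] = 0;
                }
                $this->docFrequency[$this->ordinals[$word]]++;
            }
        }
    }

    public function getDocCount(): int
    {
        return $this->docCount;
    }

    public function getOrdinal(string $word): ?int
    {
        return $this->ordinals[$word] ?? null;
    }

    public function getWord(int $ordinal): ?string
    {
        return $this->words[$ordinal] ?? null;
    }

    public function getDocFrequency(int $ordinal): int
    {
        return $this->docFrequency[$ordinal] ?? 0;
    }
}

// Two batches are processed separately; the statistics cover both.
$vocab = new SketchVocabulary();
$vocab->addBatch([1 => ['red', 'fox', 'red'], 2 => ['blue', 'fox']]);
$vocab->addBatch([3 => ['red', 'sky']]);
```

Because the document count and document frequencies are only complete once every batch has been added, any weighting that uses them has to wait until the last batch is in, which is the ordering constraint the class description states.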

Table of Contents

Methods

__construct()  : mixed
__serialize()  : array<string|int, mixed>
__sleep()  : array<string|int, mixed>
__unserialize()  : void
__wakeup()  : void
transform()  : void
Transforms a data array in the form of [$recordid => [word vector array]] to a TfIdf array

Methods

__serialize()

public __serialize() : array<string|int, mixed>
Return values
array<string|int, mixed>

__sleep()

public __sleep() : array<string|int, mixed>
Return values
array<string|int, mixed>

__unserialize()

public __unserialize(array<string|int, mixed> $serialized) : void
Parameters
$serialized : array<string|int, mixed>

transform()

Transforms a data array in the form of [$recordid => [word vector array]] to a TfIdf array

public transform(array<string|int, mixed> &$data) : void
Parameters
$data : array<string|int, mixed>

Transformed in place to reduce the memory footprint. We are not likely to need the frequency vectors after we get the TfIdf values.
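
The in-place, by-reference behaviour described for $data can be sketched as follows. This is an illustration under stated assumptions, not the vBulletin implementation: sketchTfIdfInPlace() is a hypothetical function, each word vector is assumed to be [$ordinal => count], and the weighting assumed is the plain variant tf = count / total count in the record, idf = ln(N / df). The real class may use a different vector layout or weighting.

```php
<?php
/**
 * Hypothetical sketch: overwrite term counts with tf-idf weights in place.
 *
 * @param array $data         [$recordid => [$ordinal => count]] (assumed layout)
 * @param array $docFrequency [$ordinal => number of documents containing the term]
 * @param int   $docCount     total number of documents in the corpus
 */
function sketchTfIdfInPlace(array &$data, array $docFrequency, int $docCount): void
{
    foreach ($data as &$vector) {
        $total = array_sum($vector);
        foreach ($vector as $ordinal => &$value) {
            $tf = $total > 0 ? $value / $total : 0.0;
            $df = $docFrequency[$ordinal] ?? 0;
            // Terms missing from the corpus statistics get weight 0.
            $idf = $df > 0 ? log($docCount / $df) : 0.0;
            // Overwrite the count; no second result array is built.
            $value = $tf * $idf;
        }
        unset($value); // break the reference before the next record
    }
    unset($vector);
}

// Example corpus statistics (made up for illustration).
$docFrequency = [0 => 1, 1 => 2, 2 => 1];
$docCount = 2;

$data = [
    1 => [0 => 2, 1 => 1],
    2 => [1 => 1, 2 => 1],
];
sketchTfIdfInPlace($data, $docFrequency, $docCount);
// $data now holds weights; the original counts are gone.
```

Since the method returns void and takes $data by reference, callers that still need the raw frequency vectors would have to copy them before the call.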


        