vB_RelatedText_TfIdf
in package
uses
vB_Trait_NoSerialize
Calculates TfIdf values from a data array and a vocabulary object. The class is structured to allow batch processing of data arrays without necessarily considering all of the data records at once, but all documents need to be processed into the vocabulary object before it is used for the transform. The vocabulary object keeps all of the "universe"-level statistics in addition to an ordinal-to-word mapping.
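The first pass described above, folding every batch into the vocabulary before any transform runs, might look roughly like the sketch below. `SketchVocabulary` and all of its methods are hypothetical stand-ins written for illustration; they are not the API of `vB_RelatedText_Vocabulary`, which is not documented on this page.

```php
<?php
// Hypothetical stand-in for the vocabulary object; NOT the vBulletin class.
final class SketchVocabulary
{
    /** @var array<string, int> word => ordinal */
    private array $ordinals = [];

    /** @var array<int, int> ordinal => number of documents containing the word */
    private array $docFrequency = [];

    /** Total number of documents seen so far (a "universe"-level statistic). */
    private int $docCount = 0;

    /**
     * Fold one batch of tokenized documents into the universe statistics.
     * $batch has the form [$recordid => [word, word, ...]].
     */
    public function addBatch(array $batch): void
    {
        foreach ($batch as $words) {
            $this->docCount++;
            // Count each word once per document for the document frequency.
            foreach (array_unique($words) as $word) {
                if (!isset($this->ordinals[$word])) {
                    $this->ordinals[$word] = count($this->ordinals);
                }
                $ordinal = $this->ordinals[$word];
                $this->docFrequency[$ordinal] = ($this->docFrequency[$ordinal] ?? 0) + 1;
            }
        }
    }

    public function getOrdinal(string $word): ?int
    {
        return $this->ordinals[$word] ?? null;
    }

    public function getDocFrequency(int $ordinal): int
    {
        return $this->docFrequency[$ordinal] ?? 0;
    }

    public function getDocCount(): int
    {
        return $this->docCount;
    }
}

// Batches can arrive separately; the statistics accumulate across calls.
$vocabulary = new SketchVocabulary();
$vocabulary->addBatch([1 => ['red', 'apple'], 2 => ['red', 'car']]);
$vocabulary->addBatch([3 => ['green', 'apple', 'apple']]);
```

Only after every batch has been added are the document count and document frequencies final, which is why the transform has to wait until the whole universe has been processed.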
Table of Contents
Methods
- __construct() : mixed
- __serialize() : array<string|int, mixed>
- __sleep() : array<string|int, mixed>
- __unserialize() : void
- __wakeup() : void
- transform() : void
- Transforms a data array in the form of [$recordid => [word vector array]] to a TfIdf array
Methods
__construct()
public
__construct(vB_RelatedText_Vocabulary $vocabularly) : mixed
Parameters
- $vocabularly : vB_RelatedText_Vocabulary
  The vocabulary that represents the universe of all documents to be considered.
__serialize()
public
__serialize() : array<string|int, mixed>
Return values
array<string|int, mixed>
__sleep()
public
__sleep() : array<string|int, mixed>
Return values
array<string|int, mixed>
__unserialize()
public
__unserialize(array<string|int, mixed> $serialized) : void
Parameters
- $serialized : array<string|int, mixed>
__wakeup()
public
__wakeup() : void
transform()
Transforms a data array in the form of [$recordid => [word vector array]] to a TfIdf array
public
transform(array<string|int, mixed> &$data) : void
Parameters
- $data : array<string|int, mixed>
  Transformed in place to reduce the memory footprint. We are not likely to need the frequency vectors after we get the TfIdf values.
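As an illustration of the in-place, by-reference style that `transform()` uses, here is a minimal sketch. The function `sketchTransform()`, its extra parameters, and the weighting it applies (term count divided by record length, times `log(N / df)`) are assumptions made for this example; the weighting that `vB_RelatedText_TfIdf` actually applies is not stated on this page.

```php
<?php
/**
 * Hypothetical helper, not the vBulletin implementation.
 * Converts [$recordid => [ordinal => count]] into
 * [$recordid => [ordinal => TfIdf weight]] in place.
 *
 * Assumed weighting: tf = count / total words in the record,
 * idf = log(document count / document frequency).
 */
function sketchTransform(array &$data, array $docFrequency, int $docCount): void
{
    foreach ($data as &$vector) {
        $total = array_sum($vector);
        foreach ($vector as $ordinal => $count) {
            $df = $docFrequency[$ordinal] ?? 0;
            // Words without universe statistics get a weight of zero.
            $vector[$ordinal] = ($df > 0 && $total > 0)
                ? ($count / $total) * log($docCount / $df)
                : 0.0;
        }
    }
    unset($vector); // break the reference left behind by foreach
}

// Assumed universe statistics: ordinal => number of documents containing it.
$docFrequency = [0 => 2, 1 => 1];
$docCount = 2;

// Frequency vectors keyed by record id, then by word ordinal.
$data = [
    10 => [0 => 1, 1 => 1],
    11 => [0 => 2],
];

// No return value: $data itself now holds the weights.
sketchTransform($data, $docFrequency, $docCount);
```

Because the counts are overwritten rather than copied into a second array, peak memory stays close to the size of the input, at the cost of losing the original frequency vectors.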