alis.feature_extraction.MinhashLSH.transform#

MinhashLSH.transform(db_text)[source]#

Return a dask bag containing the minhash signatures of documents in the given dask bag of text

Parameters
db_textdask.bag object

Dask bag object containing texts and their document identifier. Each element in the dask bag will be a tuple with the first element being the identifier and the second element being the text.

This allows us to perform mapping function via:

` db_text.map(lambda x: (x[0], map_function(x[1], **kwargs))) `

Returns
db_minhashdask.bag object

Dask bag object containing the minhash signature of each document along with its identifier.