alis.feature_extraction.MinhashLSH.transform
- MinhashLSH.transform(db_text)
Return a dask bag containing the minhash signature of each document in the given dask bag of text.
- Parameters
- db_text : dask.bag object
Dask bag object containing the texts and their document identifiers. Each element of the bag is a tuple whose first element is the identifier and whose second element is the text.
This layout lets a function be mapped over the text while the identifier is carried along unchanged:
`db_text.map(lambda x: (x[0], map_function(x[1], **kwargs)))`
- Returns
- db_minhash : dask.bag object
Dask bag object containing the minhash signature of each document along with its identifier.
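The (identifier, text) tuple layout described under Parameters can be illustrated with a small sketch. It uses a plain Python list in place of a dask bag so that it runs without dask installed, and the helpers `shingles` and `toy_minhash` are hypothetical stand-ins written for this example; they are not the functions MinhashLSH uses internally, and the signature format is an assumption.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of character k-shingles of `text` (hypothetical helper)."""
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def toy_minhash(text, num_perm=4, k=3):
    """Toy MinHash: for each of `num_perm` salted hash functions,
    keep the minimum hash value over all shingles of the text."""
    signature = []
    for seed in range(num_perm):
        signature.append(min(
            int.from_bytes(
                hashlib.sha1(f"{seed}:{s}".encode()).digest()[:8], "big"
            )
            for s in shingles(text, k)
        ))
    return tuple(signature)


# Each element is (identifier, text), mirroring the expected db_text layout.
docs = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the quick brown fox"),
    ("doc3", "something else entirely"),
]

# Same pattern as the docstring: transform x[1], carry x[0] along unchanged.
# With dask this would be roughly:
#   import dask.bag as db
#   db.from_sequence(docs).map(lambda x: (x[0], toy_minhash(x[1], num_perm=4)))
signatures = [(x[0], toy_minhash(x[1], num_perm=4)) for x in docs]
```

Because the identifier travels with every element, identical texts (`doc1` and `doc2` here) can be matched back to their documents after their signatures are compared.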