Minhash LSH
This module applies the traditional LSH approach to a hypothetical many-to-many document similarity task, where the objective is to bucket similar documents together. The implementation is provided by the LSH class, which uses dask.bag functionality and methods to parallelize the banding technique, specifically the map (hash function) and reduce (bucketing) tasks.
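To make the banding technique concrete, here is a minimal pure-Python sketch of the map (hashing) and reduce (bucketing) steps. It is not the LSH class itself and does not use dask; the function names (`shingles`, `minhash_signature`, `band_keys`, `candidate_pairs`) and the parameters `k`, `num_perm`, and `bands` are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a document (hypothetical helper)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash(value, seed):
    """Deterministic 64-bit hash of a string; the seed selects the hash function."""
    digest = hashlib.blake2b(
        value.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(shingle_set, num_perm=32):
    """Map step: keep the minimum hash value under each seeded hash function."""
    return [min(_hash(s, seed) for s in shingle_set) for seed in range(num_perm)]


def band_keys(signature, bands=8):
    """Split a signature into bands; each (band index, rows) tuple is a bucket key."""
    rows = len(signature) // bands
    return [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)]


def candidate_pairs(docs, num_perm=32, bands=8):
    """Reduce step: group document ids by bucket key and emit co-bucketed pairs."""
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        signature = minhash_signature(shingles(text), num_perm)
        for key in band_keys(signature, bands):
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


if __name__ == "__main__":
    docs = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "the quick brown fox jumps over the lazy dog",
        "c": "dask bags parallelize map and reduce over partitions",
    }
    print(candidate_pairs(docs))
```

Two documents become a candidate pair as soon as they agree on every row of at least one band, so identical documents always land in the same buckets. In a dask.bag setting, the per-document hashing would presumably be the mapped function and the grouping by bucket key the reduction, but that mapping onto dask is an assumption, not a description of the LSH class internals.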
Note: importing the module automatically initializes a dask client.
BY: Mike Dorosan, 2022
The LSH class for a many-to-many document similarity task.