alis.feature_extraction.MinhashLSH.__init__
- MinhashLSH.__init__(shingle_size, num_shingle_bucket, num_hash, hash_size=None, stop_words=None, seed=1337)
Initialize the MinHash LSH signature extractor.
- Parameters
- shingle_size : int
Number of words per shingle used in hashed word shingle extraction.
- num_shingle_bucket : int
Number that defines the bucket size for hashed word shingles. The bucket size is equal to 2**n - 1.
- num_hash : int
Number of randomized hash functions used in MinHash signature extraction.
- hash_size : int, default=None
Range of the hash functions. If not specified, defaults to 2**32.
- stop_words : iterable of str, default=None
Stop words to use. If not specified, the English stop words defined by scikit-learn are used.
- seed : int, default=1337
Random seed used for random number generation.
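The parameters above correspond to two stages of MinHash signature extraction: hashing word shingles into buckets, then taking the minimum of each randomized hash function over those buckets. The sketch below is a standalone pure-Python illustration under stated assumptions, not the alis implementation. The helpers `hashed_word_shingles`, `minhash_signature` and `estimate_jaccard` are hypothetical. The small `DEFAULT_STOP_WORDS` set stands in for scikit-learn's English list. Shingle hashes are assumed to be reduced modulo `2**num_shingle_bucket - 1`, and the random hash functions are assumed to have the universal form `(a*x + b) mod p`.

```python
import random
import re
import zlib

# Hypothetical stand-in for scikit-learn's English stop words (assumption).
DEFAULT_STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "to", "in"})


def hashed_word_shingles(text, shingle_size, num_shingle_bucket, stop_words=None):
    """Return the set of bucket ids of the word shingles in `text`.

    Assumption: shingle hashes are reduced modulo 2**num_shingle_bucket - 1.
    """
    stop_words = DEFAULT_STOP_WORDS if stop_words is None else set(stop_words)
    # Lowercase, tokenize on word characters, and drop stop words.
    words = [w for w in re.findall(r"\w+", text.lower()) if w not in stop_words]
    n_buckets = 2 ** num_shingle_bucket - 1
    shingles = set()
    for i in range(len(words) - shingle_size + 1):
        shingle = " ".join(words[i:i + shingle_size])
        # crc32 is deterministic across runs, unlike Python's salted hash().
        shingles.add(zlib.crc32(shingle.encode("utf-8")) % n_buckets)
    return shingles


def minhash_signature(shingles, num_hash, hash_size=None, seed=1337):
    """Return a MinHash signature with one value per random hash function.

    Assumption: h(x) = ((a * x + b) mod p) mod hash_size with p = 2**61 - 1.
    """
    hash_size = 2 ** 32 if hash_size is None else hash_size
    prime = (1 << 61) - 1
    rng = random.Random(seed)  # seeded, so signatures are reproducible
    params = [(rng.randrange(1, prime), rng.randrange(0, prime))
              for _ in range(num_hash)]
    if not shingles:
        # No shingles: use hash_size as a sentinel above every hash value.
        return [hash_size] * num_hash
    return [min(((a * x + b) % prime) % hash_size for x in shingles)
            for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox leaps over the lazy dog"
sig_a = minhash_signature(hashed_word_shingles(doc_a, 2, 20), num_hash=128)
sig_b = minhash_signature(hashed_word_shingles(doc_b, 2, 20), num_hash=128)
print(estimate_jaccard(sig_a, sig_b))
```

Signatures that agree in a larger fraction of positions indicate a higher Jaccard similarity between the underlying shingle sets. Increasing `num_hash` reduces the variance of that estimate at the cost of longer signatures.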