alis.feature_extraction.hashed_word_shingles
- alis.feature_extraction.hashed_word_shingles(text, k, n, stop_words=None)
Return the list of word k-shingles from the given text, based on the given stop words, with each shingle hashed into a bucket in the range 0 to 2**n - 1.
We define a shingle to be a stop word followed by the next k-1 words regardless of whether the next words were stop words or not.
- Parameters
- text : str
String of text whose word shingles are to be extracted
- k : int
Shingle size
- n : int
Exponent defining the number of buckets: hashed values fall in the range 0 to 2**n - 1
- stop_words : iterable of str, default=None
List of stop words to be used. By default, uses the English stopwords defined by sklearn
- Returns
- shingles : iterable of int
A list containing the extracted word shingles in hashed representation.
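The behaviour described above can be sketched in plain Python. This is a minimal illustration, not the alis source: the helper names `stop_word_shingles` and `hashed_shingles_sketch` are hypothetical, the tokenization (lowercased runs of word characters) is an assumption, CRC32 stands in for whichever hash function the library actually uses, and shingles that would run past the end of the text are assumed to be dropped. A small hand-written stop word set is used in place of the sklearn default.

```python
import re
import zlib


def stop_word_shingles(text, k, stop_words):
    """Return word k-shingles that start at a stop word (illustrative only)."""
    stop_words = set(stop_words)
    # Assumption: tokens are lowercased runs of word characters.
    words = re.findall(r"\w+", text.lower())
    shingles = []
    # Assumption: only complete shingles of exactly k words are kept.
    for i in range(len(words) - k + 1):
        if words[i] in stop_words:
            # The following k-1 words are taken whether or not they are stop words.
            shingles.append(" ".join(words[i:i + k]))
    return shingles


def hashed_shingles_sketch(text, k, n, stop_words):
    """Hash each shingle into one of 2**n buckets (0 to 2**n - 1)."""
    # Assumption: CRC32 is a stand-in; the hash used by alis is not documented here.
    return [
        zlib.crc32(shingle.encode("utf-8")) % (2 ** n)
        for shingle in stop_word_shingles(text, k, stop_words)
    ]


sample = "The quick brown fox jumps over the lazy dog"
shingles = stop_word_shingles(sample, k=3, stop_words={"the", "over"})
buckets = hashed_shingles_sketch(sample, k=3, n=8, stop_words={"the", "over"})
print(shingles)
print(buckets)
```

With k=3 and the stop words "the" and "over", the sample sentence yields three shingles, and "over the lazy" shows that a stop word inside a shingle does not end it. The exact bucket values depend on the hash function, so they will differ from what the library returns.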