alis.feature_extraction.word_shingles
- alis.feature_extraction.word_shingles(text, k, stop_words=None)
Return the list of word k-shingles extracted from the given text, based on a given set of stop words.
A shingle is defined as a stop word followed by the next k-1 words, regardless of whether those words are themselves stop words.
- Parameters
- text : str
String of text whose word shingles are to be extracted
- k : int
Shingle size
- stop_words : iterable of str, default=None
List of stop words to be used. By default, uses the English stop words defined by sklearn.
- Returns
- shingles : iterable of str
A list containing the extracted word shingles in a document.
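The definition above can be illustrated with a minimal sketch. This is not the alis source code: the name `word_shingles_sketch` is hypothetical, and the lowercasing, whitespace tokenization, space-joined output, small stand-in default stop word set, and the choice to drop a stop word that has fewer than k-1 words after it are all assumptions made for the example.

```python
def word_shingles_sketch(text, k, stop_words=None):
    """Illustrative sketch of stop-word-based k-shingling (not the alis code)."""
    if stop_words is None:
        # Assumption: a tiny stand-in set. The documented default is the
        # English stop word list defined by sklearn.
        stop_words = {"a", "and", "for", "in", "is", "of", "the", "to"}
    stop_words = set(stop_words)

    # Assumption: lowercase the text and split on whitespace.
    words = text.lower().split()

    shingles = []
    for i, word in enumerate(words):
        # A shingle starts at a stop word and spans the next k-1 words,
        # whether or not those words are stop words themselves.
        # Assumption: skip stop words too close to the end of the text.
        if word in stop_words and i + k <= len(words):
            shingles.append(" ".join(words[i:i + k]))
    return shingles


example = word_shingles_sketch(
    "A spokesperson for the Sudzo Corporation revealed today",
    k=3,
    stop_words=["a", "for", "the"],
)
print(example)
# ['a spokesperson for', 'for the sudzo', 'the sudzo corporation']
```

Note that "for the sudzo" and "the sudzo corporation" overlap: "the" both continues the shingle started at "for" and starts a shingle of its own, as the definition allows.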