alis.feature_extraction.word_shingles

alis.feature_extraction.word_shingles(text, k, stop_words=None)

Return the list of word k-shingles extracted from the given text, using a given set of stop words.

A shingle is defined as a stop word followed by the next k-1 words, regardless of whether those words are themselves stop words.

Parameters
text : str

Text from which word shingles are to be extracted.

k : int

Shingle size, in words.

stop_words : iterable of str, default=None

Stop words to be used. If None, the English stop words defined by sklearn are used.

Returns
shingles : iterable of str

A list containing the word shingles extracted from the text.
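The shingle definition above can be illustrated with a minimal sketch. This is not the alis source code: the function name `sketch_word_shingles` is hypothetical, and the sketch assumes lowercased whitespace tokenization, shingles joined with single spaces, and that a stop word with fewer than k-1 words after it produces no shingle. The library may handle any of these differently. A tiny stand-in stop word set replaces the sklearn default so the example runs without dependencies.

```python
def sketch_word_shingles(text, k, stop_words=None):
    """Illustrative sketch of stop-word-anchored k-shingles (hypothetical helper)."""
    if stop_words is None:
        # Assumption: small stand-in set, not the sklearn English stop word list.
        stop_words = {"a", "the", "is", "of", "on"}
    stop_words = {w.lower() for w in stop_words}

    # Assumption: lowercase the text and split on whitespace.
    words = text.lower().split()

    shingles = []
    for i, word in enumerate(words):
        # A shingle starts at a stop word and spans k words in total.
        # Assumption: skip positions where fewer than k words remain.
        if word in stop_words and i + k <= len(words):
            shingles.append(" ".join(words[i:i + k]))
    return shingles


print(sketch_word_shingles("The cat sat on the mat", k=3))
# prints ['the cat sat', 'on the mat']
```

Under these assumptions, the second "the" yields no shingle for k=3 because only one word follows it, while "on the mat" shows that a shingle may contain further stop words after its first word.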