https://pypi.org/project/LPatternIdentification/
The formal mathematical definition of the l-Pattern Identification Problem is as follows:
Instance: A finite alphabet Σ, two disjoint sets Good, Bad ⊆ Σ^n of strings, and an integer l > 0.
Question: Is there a set P of patterns such that |P| ≤ l and P → (Good, Bad)?
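To make the definition concrete, here is a minimal sketch in plain Python. It does not use this package. It assumes the common reading of the problem, which this README does not spell out: a pattern is a string of length n over Σ plus a wildcard symbol `*`, a pattern matches a string when every non-wildcard position agrees, and P → (Good, Bad) means every Good string is matched by at least one pattern in P while no Bad string is matched by any. The names `WILDCARD`, `matches` and `separates` are hypothetical helpers for illustration only.

```python
from typing import Iterable

WILDCARD = "*"  # assumed "don't care" symbol; not defined by this README


def matches(pattern: str, s: str) -> bool:
    """True if pattern and s have equal length and agree at every non-wildcard position."""
    return len(pattern) == len(s) and all(
        p == WILDCARD or p == c for p, c in zip(pattern, s)
    )


def separates(patterns: Iterable[str], good: Iterable[str], bad: Iterable[str]) -> bool:
    """Assumed reading of P -> (Good, Bad): P covers all of Good and none of Bad."""
    patterns = list(patterns)
    covers_good = all(any(matches(p, g) for p in patterns) for g in good)
    avoids_bad = not any(matches(p, b) for p in patterns for b in bad)
    return covers_good and avoids_bad


# Toy instance over the alphabet {a, b} with n = 3.
good = ["aab", "bab"]
bad = ["aba", "bba"]

# A single pattern is enough here, so the answer for l = 1 is "yes".
print(separates(["*ab"], good, bad))
```

Under this reading, the all-wildcard pattern `***` is never a valid answer when Bad is non-empty, because it matches every string.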
See demo.ipynb
pip install LPatternIdentification
from LPatternIdentification import feature_set, split_data, get_patterns_from_feature_set, reduce_pattern_set
Sort dataset by class labels
Separate observations into numpy ndarray
Separate labels into list
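The three preparation steps above might look like the following sketch. The toy rows are made up for illustration, and the sort direction is an assumption: labels are sorted in ascending order here, so 'Bad' rows come before 'Good' rows. Check demo.ipynb for the ordering the package expects.

```python
import numpy as np

# Hypothetical toy dataset: each row is one observation, the last column is the class label.
rows = [
    ["a", "b", "a", "Bad"],
    ["a", "a", "b", "Good"],
    ["b", "b", "a", "Bad"],
    ["b", "a", "b", "Good"],
]
data = np.array(rows)

# Sort rows by class label; a stable sort keeps the original order within each class.
order = np.argsort(data[:, -1], kind="stable")
data = data[order]

# Separate observations (numpy ndarray) from labels (plain Python list).
observations = data[:, :-1]
labels = data[:, -1].tolist()
```

After this, `observations` is a 2-D array with one row per observation and `labels` is a list of the same length, in the same order.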
features = feature_set(observations, labels)
Here the classes are named 'Good' and 'Bad'; 'Good' is the class of interest.
split_point, Good, Bad = split_data(observations, labels)
Patterns = get_patterns_from_feature_set(Good, features)
Identify L patterns such that every pattern is unique to the Good class and is not similar to any Bad pattern
L = 7
Patterns_identified = reduce_pattern_set(Patterns, Bad, L)