[ENH] Added IDK² and s-IDK² Anomaly Detector To Aeon #2465
Open
Ramana-Raja wants to merge 43 commits into aeon-toolkit:main from Ramana-Raja:s-idk-and-idk
+241
−0
Changes from 18 commits
Commits
e9795f1 Added IDK² and s-IDK² anomaly detector to aeon (Ramana-Raja)
ff4b576 Added IDK to init (Ramana-Raja)
7709a7d Added IDK to docs (Ramana-Raja)
7f2916f Automatic `pre-commit` fixes (Ramana-Raja)
18516df Update _idk.py to update docs (Ramana-Raja)
dd36f8b Automatic `pre-commit` fixes (Ramana-Raja)
6734468 Update _idk.py to add get test param (Ramana-Raja)
b46a6fb Automatic `pre-commit` fixes (Ramana-Raja)
ee81313 Update _idk.py to update axis (Ramana-Raja)
4de22ff Update _idk.py to remove univariate (Ramana-Raja)
c7f057a Update _idk.py changed axis (Ramana-Raja)
c77f556 Update _idk.py to make test_param small (Ramana-Raja)
6d8467d Update _idk.py change width of test case to 1 (Ramana-Raja)
4faa551 Update _idk.py changes psi1 and psi2 test values (Ramana-Raja)
af6ea04 Update _idk.py to add extra random_state (Ramana-Raja)
172fd80 Automatic `pre-commit` fixes (Ramana-Raja)
8cef628 Update _idk.py to add random_state for test_param (Ramana-Raja)
08d5ae8 Automatic `pre-commit` fixes (Ramana-Raja)
f78174e test cases and changes have been added as requested by the moderators (Ramana-Raja)
6d28f5e Merge remote-tracking branch 'origin/s-idk-and-idk' into s-idk-and-idk (Ramana-Raja)
d6b1719 Automatic `pre-commit` fixes (Ramana-Raja)
29f3348 added test_case random state (Ramana-Raja)
0112c67 Automatic `pre-commit` fixes (Ramana-Raja)
4e1ceab fixed docs (Ramana-Raja)
39d9292 Automatic `pre-commit` fixes (Ramana-Raja)
4711ede Updated docs (Ramana-Raja)
2c11c68 Automatic `pre-commit` fixes (Ramana-Raja)
8383533 Updated docs for test case (Ramana-Raja)
fc28ef3 Automatic `pre-commit` fixes (Ramana-Raja)
270e0e1 Updated test_idk.py (Ramana-Raja)
0319b5e Automatic `pre-commit` fixes (Ramana-Raja)
4bb39a4 Updated test_idk.py to add docs (Ramana-Raja)
e4b51d2 Automatic `pre-commit` fixes (Ramana-Raja)
e5d9585 updated random_state (Ramana-Raja)
316a5d1 Automatic `pre-commit` fixes (Ramana-Raja)
cb7992e Updated test.py (Ramana-Raja)
b319624 Automatic `pre-commit` fixes (Ramana-Raja)
3cbc709 Updated test_idk.py (Ramana-Raja)
cac70c7 Automatic `pre-commit` fixes (Ramana-Raja)
a2f4bf0 Updated test_idk.py to make sliding and non sliding into 1 (Ramana-Raja)
1c4262b Automatic `pre-commit` fixes (Ramana-Raja)
f91793b Updated test_idk.py (Ramana-Raja)
81b0f5b Automatic `pre-commit` fixes (Ramana-Raja)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,205 @@ | ||
"""IDK² and s-IDK² anomaly detector.""" | ||
|
||
import random | ||
|
||
import numpy as np | ||
|
||
from aeon.anomaly_detection.base import BaseAnomalyDetector | ||
|
||
|
||
class IDK(BaseAnomalyDetector): | ||
"""IDK² and s-IDK² anomaly detector. | ||
The Isolation Distributional Kernel (IDK) is a data-dependent kernel for efficient | ||
anomaly detection, improving accuracy without explicit learning. Its extension, | ||
IDK², simplifies group anomaly detection, outperforming traditional methods in | ||
speed and effectiveness. | ||
.. list-table:: Capabilities | ||
:stub-columns: 1 | ||
* - Input data format | ||
- univariate | ||
* - Output data format | ||
- anomaly scores | ||
* - Learning Type | ||
- unsupervised | ||
Parameters | ||
---------- | ||
psi1 : int | ||
Number of samples randomly selected in each iteration for the feature | ||
map matrix. | ||
psi2 : int | ||
Number of samples used for the second-stage feature map embedding. | ||
width : int | ||
Size of the sliding or fixed-width window for anomaly detection. | ||
t : int, default=100 | ||
Number of iterations (time steps) for random sampling to construct | ||
feature maps. | ||
sliding : bool, default=False | ||
Whether to use a sliding window approach. If True, computes scores | ||
for sliding windows; | ||
otherwise, processes fixed-width segments. | ||
random_state : int, Random state or None, default=None | ||
Notes | ||
----- | ||
This implementation is inspired by the Isolation Distributional Kernel (IDK) | ||
approach as detailed in [1]_. | ||
The code is adapted from the open-source repository [2]_. | ||
References | ||
---------- | ||
[1]Isolation Distributional Kernel: A New Tool for Kernel-Based Anomaly Detection. | ||
DOI: https://dl.acm.org/doi/10.1145/3394486.3403062 | ||
[2] GitHub Repository: | ||
IsolationKernel/Codes: IDK Implementation for Time Series Data | ||
URL: https://github.com/IsolationKernel/Codes/tree/main/IDK/TS | ||
""" | ||
|
||
_tags = { | ||
"capability:univariate": True, | ||
"capability:multivariate": False, | ||
"capability:missing_values": False, | ||
} | ||
|
||
def __init__( | ||
self, | ||
psi1, | ||
psi2, | ||
width, | ||
t=100, | ||
sliding=False, | ||
random_state=None, | ||
|
||
): | ||
self.psi1 = psi1 | ||
self.psi2 = psi2 | ||
self.width = width | ||
self.t = t | ||
self.sliding = sliding | ||
self.random_state = random_state | ||
super().__init__(axis=0) | ||
|
||
def __IK_inne_fm(self, X, psi, t=100): | ||
np.random.seed(self.random_state) | ||
random.seed(self.random_state) | ||
onepoint_matrix = np.zeros((X.shape[0], (int)(t * psi)), dtype=int) | ||
for time in range(t): | ||
sample_num = psi # | ||
sample_list = [p for p in range(len(X))] | ||
sample_list = random.sample(sample_list, sample_num) | ||
sample = X[sample_list, :] | ||
|
||
tem1 = np.dot(np.square(X), np.ones(sample.T.shape)) # n*psi | ||
tem2 = np.dot(np.ones(X.shape), np.square(sample.T)) | ||
point2sample = tem1 + tem2 - 2 * np.dot(X, sample.T) # n*psi | ||
|
||
sample2sample = point2sample[sample_list, :] | ||
row, col = np.diag_indices_from(sample2sample) | ||
sample2sample[row, col] = 99999999 | ||
radius_list = np.min(sample2sample, axis=1) | ||
|
||
min_point2sample_index = np.argmin(point2sample, axis=1) | ||
min_dist_point2sample = min_point2sample_index + time * psi | ||
point2sample_value = point2sample[ | ||
range(len(onepoint_matrix)), min_point2sample_index | ||
] | ||
ind = point2sample_value < radius_list[min_point2sample_index] | ||
onepoint_matrix[ind, min_dist_point2sample[ind]] = 1 | ||
|
||
return onepoint_matrix | ||
|
||
def __IDK(self, X, psi, t=100): | ||
point_fm_list = self.__IK_inne_fm(X=X, psi=psi, t=t) | ||
feature_mean_map = np.mean(point_fm_list, axis=0) | ||
return np.dot(point_fm_list, feature_mean_map) / t | ||
|
||
def _IDK_T(self, X): | ||
np.random.seed(self.random_state) | ||
random.seed(self.random_state) | ||
|
||
window_num = int(np.ceil(X.shape[0] / self.width)) | ||
featuremap_count = np.zeros((window_num, self.t * self.psi1)) | ||
onepoint_matrix = np.full((X.shape[0], self.t), -1) | ||
|
||
for time in range(self.t): | ||
sample_num = self.psi1 | ||
sample_list = [p for p in range(X.shape[0])] | ||
sample_list = random.sample(sample_list, sample_num) | ||
sample = X[sample_list, :] | ||
tem1 = np.dot(np.square(X), np.ones(sample.T.shape)) # n*psi | ||
tem2 = np.dot(np.ones(X.shape), np.square(sample.T)) | ||
point2sample = tem1 + tem2 - 2 * np.dot(X, sample.T) # n*psi | ||
|
||
sample2sample = point2sample[sample_list, :] | ||
row, col = np.diag_indices_from(sample2sample) | ||
sample2sample[row, col] = 99999999 | ||
|
||
radius_list = np.min(sample2sample, axis=1) | ||
min_dist_point2sample = np.argmin(point2sample, axis=1) # index | ||
|
||
for i in range(X.shape[0]): | ||
if ( | ||
point2sample[i][min_dist_point2sample[i]] | ||
< radius_list[min_dist_point2sample[i]] | ||
): | ||
onepoint_matrix[i][time] = ( | ||
min_dist_point2sample[i] + time * self.psi1 | ||
) | ||
featuremap_count[(int)(i / self.width)][ | ||
onepoint_matrix[i][time] | ||
] += 1 | ||
|
||
for i in range((int)(X.shape[0] / self.width)): | ||
featuremap_count[i] /= self.width | ||
isextra = X.shape[0] - (int)(X.shape[0] / self.width) * self.width | ||
if isextra > 0: | ||
featuremap_count[-1] /= isextra | ||
|
||
if isextra > 0: | ||
featuremap_count = np.delete( | ||
featuremap_count, [featuremap_count.shape[0] - 1], axis=0 | ||
) | ||
|
||
return self.__IDK(featuremap_count, psi=self.psi2, t=self.t) | ||
|
||
def _IDK_square_sliding(self, X): | ||
point_fm_list = self.__IK_inne_fm(X=X, psi=self.psi1, t=self.t) | ||
point_fm_list = np.insert(point_fm_list, 0, 0, axis=0) | ||
cumsum = np.cumsum(point_fm_list, axis=0) | ||
|
||
subsequence_fm_list = (cumsum[self.width :] - cumsum[: -self.width]) / float( | ||
self.width | ||
) | ||
|
||
return self.__IDK(X=subsequence_fm_list, psi=self.psi2, t=self.t) | ||
|
||
def _predict(self, X): | ||
if self.sliding: | ||
return self._IDK_square_sliding(X) | ||
return self._IDK_T(X) | ||
|
||
@classmethod | ||
def _get_test_params(cls, parameter_set="default"): | ||
"""Return testing parameter settings for the estimator. | ||
Parameters | ||
---------- | ||
parameter_set : str, default="default" | ||
Name of the set of test parameters to return, for use in tests. If no | ||
special parameters are defined for a value, will return `"default"` set. | ||
Returns | ||
------- | ||
params : dict | ||
Parameters to create testing instances of the class. | ||
Each dict are parameters to construct an "interesting" test instance, i.e., | ||
`MyClass(**params)` or `MyClass(**params[i])` creates a valid test instance. | ||
""" | ||
return { | ||
"psi1": 4, | ||
"psi2": 2, | ||
"width": 1, | ||
"random_state": 1, | ||
} |
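Two array tricks carry most of the work in the `_idk.py` diff above: the squared distances are built from the expansion ||x - s||² = ||x||² + ||s||² - 2 x·s, and the sliding-window feature means are taken as differences of a zero-padded cumulative sum. The following is a minimal standalone sketch of both, checked against brute-force loops. The helper names `pairwise_sq_dists` and `sliding_window_mean` are hypothetical and are not part of the PR or of aeon; the sketch assumes 2D float arrays and `width >= 1`.

```python
import numpy as np


def pairwise_sq_dists(X, S):
    """Squared Euclidean distances between rows of X (n, d) and rows of S (m, d).

    Hypothetical helper: uses ||x - s||^2 = ||x||^2 + ||s||^2 - 2 x.s,
    the same expansion as tem1 + tem2 - 2 * np.dot(X, sample.T) in the diff.
    """
    x_sq = np.sum(np.square(X), axis=1)[:, None]  # shape (n, 1)
    s_sq = np.sum(np.square(S), axis=1)[None, :]  # shape (1, m)
    return x_sq + s_sq - 2.0 * (X @ S.T)  # broadcasts to (n, m)


def sliding_window_mean(F, width):
    """Mean over every run of `width` consecutive rows of F (n, k).

    Hypothetical helper: prepends a zero row, takes a cumulative sum, and
    subtracts entries `width` rows apart, giving n - width + 1 window means.
    """
    padded = np.vstack([np.zeros((1, F.shape[1])), F])
    csum = np.cumsum(padded, axis=0)
    return (csum[width:] - csum[:-width]) / float(width)


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
S = X[[1, 4]]  # two rows of X acting as the sampled points

# Brute-force reference for the distance expansion.
brute = np.array([[np.sum((x - s) ** 2) for s in S] for x in X])
assert np.allclose(pairwise_sq_dists(X, S), brute)

# Brute-force reference for the sliding-window mean.
F = rng.integers(0, 2, size=(7, 4)).astype(float)
direct = np.array([F[i : i + 3].mean(axis=0) for i in range(F.shape[0] - 3 + 1)])
assert np.allclose(sliding_window_mean(F, 3), direct)
```

Because of floating-point rounding, the expansion can return tiny negative values where the true distance is zero, which is worth keeping in mind when comparing such distances against a radius.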
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -34,6 +34,7 @@ Detectors | |
PyODAdapter | ||
STOMP | ||
STRAY | ||
IDK | ||
|
||
Base | ||
---- | ||
|
`t` is not very descriptive. I would call it `max_iter`, similar to how sklearn does it, e.g. for k-means.
I used `t` because that is how it was implemented in the original code.