This repository has been archived by the owner on Nov 13, 2021. It is now read-only.

Anom detection needs at least 2 periods worth of data #15

Open
odp opened this issue Jan 10, 2015 · 13 comments

Comments

@odp

odp commented Jan 10, 2015

str(bar)
'data.frame': 506 obs. of 2 variables:
$ timestamp: POSIXct, format: "2014-08-25 00:00:00" "2014-08-25 00:10:00" ...
$ count : num 40465895 54157589 34727655 38576160 36686470 ...

res = AnomalyDetectionTs(bar, direction='both', max_anoms=0.02, plot=TRUE)
Error in detect_anoms(all_data[[i]], k = max_anoms, alpha = alpha, num_obs_per_period = period, :
Anom detection needs at least 2 periods worth of data

What's the definition of period here? The data contains a time series for about 4 days with granularity of 10 minutes.

Posting the data frame "bar" here
https://www.dropbox.com/s/1j263k6srq18qpp/bar.Rda?dl=0

@odp
Author

odp commented Jan 11, 2015

After debugging:
When get_gran() decides the granularity is "min", we set period = 24*60 = 1440; that is, we assume one observation per minute. detect_anoms() then expects num_obs to be at least twice the period:

if(num_obs < num_obs_per_period * 2) {
    stop("Anom detection needs at least 2 periods worth of data")
}

So the period is effectively one day here, and the check requires at least 2*1440 = 2880 observations. It implicitly assumes one-minute granularity and at least two days' worth of data.

Is there anything that can be done here when the granularity is multiple minutes?
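To make the numbers in this comment concrete, here is a small base R sketch of the arithmetic. It only restates the check quoted above using the figures from the report (506 rows at 10-minute spacing); the variable names are illustrative and not taken from the package.

```r
# Figures from the report above: 506 rows at 10-minute spacing.
num_obs <- 506
minutes_per_obs <- 10

# Period assumed for "min" granularity, per the comment above:
# one observation per minute, one day per period.
assumed_period <- 24 * 60                    # 1440

# Observations per day that 10-minute data actually contains
# (illustrative value, not computed by the package).
actual_period <- 24 * 60 / minutes_per_obs   # 144

needs_assumed <- 2 * assumed_period          # 2880
needs_actual  <- 2 * actual_period           # 288

# Same comparison as the quoted check: fail when num_obs < 2 * period.
passes_assumed <- num_obs >= needs_assumed
passes_actual  <- num_obs >= needs_actual

cat("Required with assumed period:", needs_assumed, "-> passes:", passes_assumed, "\n")
cat("Required with actual period: ", needs_actual, "-> passes:", passes_actual, "\n")
```

With a period of 1440 the 506 rows fall short of 2880, which matches the error; with 144 observations per day they would cover more than two periods.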

@owenvallis

You're totally right. The seasonality we were looking at was either daily (if the data was minutely or hourly) or weekly (if the data was daily). We added AnomalyDetectionVec() to support time series data of any granularity or period length. You can pass in the data column and manually specify the period length. Additional info on the Vec function can be found using help(AnomalyDetectionVec).

However, it would be nice for AnomalyDetectionTs() to support additional data granularities, or non-consecutive timestamps. Would you like to submit a patch, and @jhochenbaum and I can review?
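A minimal sketch of the workaround described in this comment, assuming 10-minute data with a daily cycle (144 observations per period). The series below is synthetic and only stands in for the `bar` data frame from the report; the call to AnomalyDetectionVec() runs only if the AnomalyDetection package is installed.

```r
# Synthetic stand-in for `bar`: 506 observations at 10-minute spacing.
# The values are made up for illustration only.
set.seed(1)
n <- 506
obs_per_day <- 24 * 60 / 10          # 144 observations in one daily cycle
idx <- seq_len(n)
count <- 4e7 + 1e7 * sin(2 * pi * idx / obs_per_day) + rnorm(n, sd = 1e6)

# The series must cover at least two periods.
stopifnot(n >= 2 * obs_per_day)

# Pass the data column and specify the period manually.
if (requireNamespace("AnomalyDetection", quietly = TRUE)) {
  res <- AnomalyDetection::AnomalyDetectionVec(
    count,
    max_anoms = 0.02,
    direction = "both",
    period    = obs_per_day,
    plot      = FALSE
  )
  print(res$anoms)
}
```

Choosing 144 assumes the dominant seasonality is daily; a different cycle length would need a different period.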

@odp
Author

odp commented Jan 11, 2015

thanks. I'll try to come up with something.

@elbamos

elbamos commented Jan 12, 2015

I get this even with daily data, and I've confirmed using the internal AnomalyDetection::: functions that it is correctly recognizing the period. Minimal example:

quantmod::getSymbols("^GSPC")
minimal <- data.frame(timestamp = index(GSPC), count = GSPC$GSPC.Adjusted)
AnomalyDetectionTs(minimal, longterm = TRUE)

@owenvallis

Hi Elbamos,

I was able to reproduce your error, and I'll look into posting a patch soon. In the interim, you can run the data using the following:

AnomalyDetectionVec(minimal[[2]], period=7, longterm_period=30, plot=T)

That will give a weekly periodicity, and assumes a longterm stable state of 30 days. Both parameters can be changed, but the longterm_period must be at least (period*2)+1.

The other issue was that the timestamps are currently doubles, while the Ts function expects a POSIX type. We are checking for that, but I think we are going to rework this to return the timestamps in the same format as they were passed in.

Hope that helps. Cheers,
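The two points in this comment can be sketched in base R. The first part checks the stated constraint that longterm_period must be at least (period*2)+1; the second shows one way to turn Date values (which R stores as doubles) into POSIXct before building the data frame. The dates and counts are placeholders, and whether the conversion alone resolves the error is not confirmed in this thread.

```r
# Constraint from the comment above: longterm_period >= period * 2 + 1.
period <- 7
longterm_period <- 30
min_longterm <- period * 2 + 1       # 15
longterm_ok <- longterm_period >= min_longterm

# Placeholder daily dates; a Date vector is stored as a double.
dates <- as.Date(c("2015-01-05", "2015-01-06", "2015-01-07"))

# Convert to POSIXct via the character form, fixing the time zone explicitly.
timestamps <- as.POSIXct(format(dates), tz = "UTC")

# Placeholder counts, for illustration only.
minimal_fixed <- data.frame(timestamp = timestamps, count = c(1, 2, 3))
str(minimal_fixed)
```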

@elbamos

elbamos commented Feb 8, 2015

I'm just wondering if this ever got fixed...

@rtjohn

rtjohn commented Feb 9, 2016

From help(AnomalyDetectionVec):

period Defines the number of observations in a single period, and used during seasonal decomposition.

But what is the definition of a period? In the forecast package one uses a "frequency" argument, which is specified in terms of a year: quarterly data would be frequency = 4, monthly data frequency = 12, and daily data frequency = 365. What is the definition of "period" in this package? I have monthly data (1 row per month). What period do I use?

@owenvallis

Hi rtjohn,

We used period here to denote the number of observations in a single cycle of the dominant seasonal component. This way we can define the number of observations per cycle without having to relate the number of cycles to some window, e.g., annual, quarterly, etc.

Best,
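Under the definition given in this reply, and assuming the dominant seasonal pattern in monthly data repeats once a year, one cycle spans 12 observations. The base R sketch below uses a made-up monthly series to show how that count lines up with the `frequency` of a `ts` object; the values and start date are illustrative.

```r
# Made-up monthly series covering three years (36 observations).
set.seed(7)
values <- 50 + 5 * sin(2 * pi * seq_len(36) / 12) + rnorm(36)
monthly <- ts(values, start = c(2012, 1), frequency = 12)

# Observations in one annual cycle: the value "period" would take
# under the definition above, assuming yearly seasonality.
period <- frequency(monthly)         # 12

# The two-period requirement discussed earlier, in monthly terms.
min_obs <- 2 * period                # 24
enough_data <- length(monthly) >= min_obs

# Position of each observation within its cycle (1 = January here).
print(cycle(monthly)[1:13])
```

If the dominant cycle were something other than a year, the count would change accordingly.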

@rtjohn

rtjohn commented Feb 9, 2016

I think there are some terminology confusions here. Time series data generally can have trend, seasonal, and/or cyclic components, right? So you want users to "define the number of observations per cycle" (cyclic component)? But the definition of a cyclic component is that they are not of a fixed period...
Also, isn't a seasonal component by its nature defined by a fixed, known window: weekly, monthly, quarterly, etc.? I can tell you're trying to help me out, but your answer to my question about the definition of "period" makes me need clarity on your definitions of "seasonal" and "cycle". See what I mean?

So again for monthly data with let's say a strong true "season"-al pattern (changing drastically from winter, to spring to fall to summer) the period argument should be 3 right? I'd have 3 periods in a single "cycle" as you'd call it?

@elbamos

elbamos commented Feb 9, 2016

@rtjohn While I totally relate to the point you're making, and I've found the issue confusing too, I'm pretty sure the package uses the same conventions for cycle and period as base R does. That is definitely not friendly, but the package should conform to the convention of the platform.


@owenvallis

@rtjohn I see what you're saying. This Seasonal-Trend Decomposition paper was a big part of developing the package, and we based our naming conventions around their notion of "Seasonal, Trend, Residual" terminology. So in that case, Seasonal components would be the repeating cycles in the time series, the Trend would account for the variations from winter to summer, and the Residual should be the unimodal noise that we can use to detect the anoms. Also, Jordan and I have an audio background, so we tend to treat cycle as synonymous with period.

Let us know if we could improve the doc strings though.
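The "Seasonal, Trend, Residual" terminology can be illustrated with base R's stl(). This is only a sketch on a synthetic monthly series with one injected spike; flagging points by a median/MAD threshold on the remainder is a simple stand-in chosen for illustration, not the package's detection algorithm.

```r
# Synthetic monthly series: linear trend + annual seasonality + noise.
set.seed(42)
period <- 12
n <- 6 * period
month_idx <- seq_len(n)
values <- 100 + 0.2 * month_idx +
  10 * sin(2 * pi * month_idx / period) + rnorm(n, sd = 1)

# Inject one artificial anomaly at observation 30.
values[30] <- values[30] + 50

monthly <- ts(values, frequency = period)

# Seasonal-trend decomposition; robust fitting limits the spike's influence.
fit <- stl(monthly, s.window = "periodic", robust = TRUE)
remainder <- as.numeric(fit$time.series[, "remainder"])

# Illustrative rule (an assumption, not the package's method):
# flag points whose remainder is more than 3 MADs from the median.
flagged <- which(abs(remainder - median(remainder)) > 3 * mad(remainder))
print(flagged)
```

The decomposition separates the repeating annual pattern and the slow drift from what is left over, and the injected spike stands out in that leftover component.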

@aaishaosman

Hi all, I am quite new to this package and would like to use it for some analysis I am doing. I have data that is not regular, i.e. trading data. Would I be able to use AnomalyDetection to identify, say, irregular prices charged? If so, what would I set "period" to, given that on some days there might be a trade every second or every hour, and on some days none? I have data for roughly a year.

Any help will be greatly appreciated!
Thanks!

@asavla

asavla commented May 26, 2017

I still get this error:
Error in detect_anoms(all_data[[i]], k = max_anoms, alpha = alpha, num_obs_per_period = period, :
Anom detection needs at least 2 periods worth of data
Has this been resolved?
