Commit

Update README.md
Anil Bawa-Cavia authored and cavvia committed Oct 11, 2013
1 parent ae6917e commit a5c0105
Showing 2 changed files with 32 additions and 28 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
The MIT License (MIT)

Copyright (c) 2013 Anil Bawa-Cavia
Copyright (c) 2013 Artsy, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
58 changes: 31 additions & 27 deletions README.md
@@ -1,62 +1,66 @@
Hipster
Forgetsy
=======

__Note__: This is a draft readme. Implementation is on-going. I will remove this notice when the interface described below is fully operational.

Hipster is a trending library designed to track temporal trends in non-stationary categorical distributions. It uses [forget-table](https://github.com/bitly/forgettable/) style data structures which decay observations over time. Using two such sets decaying over different time periods, it picks up on temporal trends, forgetting historical data responsibly.
Forgetsy is a trending library designed to track temporal trends in non-stationary categorical distributions. It uses [forget-table](https://github.com/bitly/forgettable/) style data structures which decay observations over time. Using two such sets decaying over different lifetimes, it picks up on time differentials, whilst forgetting historical data responsibly.

Trends are encapsulated by a construct named _Delta_. A _Delta_ consists of two sets of counters, each of which implements exponential decay of the form:

![equation](http://latex.codecogs.com/gif.latex?X_t_1%3DX_t_0%5Ctimes%7Be%5E%7B-%5Clambda%5Ctimes%7Bt%7D%7D%7D)

Where the inverse of the _decay rate_ (lambda) is the mean lifetime of an observation in the set, expressed in time units. By normalising such a set by a set with a slower decay rate, we obtain a temporal trending score for each category in a distribution.
Where the inverse of the _decay rate_ (lambda) is the mean lifetime of an observation in the set. By normalising such a set by a set with a slower decay rate, we obtain a trending score for each category in a distribution.
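
As a rough illustration of the decay formula above (a sketch, not the library's code), the snippet below decays a count over an elapsed time and normalises a faster-decaying count by a slower-decaying one. The `decay` helper, the use of seconds as the time unit, and the starting count of 100 are all assumptions made for this example.

```ruby
# Illustrative sketch only; `decay` is a hypothetical helper, not part of the library.
# Implements X_t1 = X_t0 * e^(-lambda * t), with lambda = 1 / mean_lifetime.
def decay(count, elapsed, mean_lifetime)
  rate = 1.0 / mean_lifetime # lambda, the decay rate
  count * Math.exp(-rate * elapsed)
end

week = 7 * 24 * 60 * 60 # one week, in seconds (assumed time unit)

# The same 100 observations, read one week after they were recorded.
fast = decay(100.0, week, week)     # set with a mean lifetime of one week
slow = decay(100.0, week, 2 * week) # set with a slower decay rate

# Normalising the faster set by the slower set gives the score for this category.
score = fast / slow
```

After exactly one mean lifetime, the faster set has fallen to 1/e of its original count, while the slower set has lost less, so the ratio stays between 0 and 1.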

Hipster avoids the need for sliding time windows and explicit rolling counts, as observations naturally decay away over time. It's designed for heavy writes and sparse reads, as it implements decay at read time.
Forgetsy avoids the need for manually sliding time windows or explicitly maintaining rolling counts, as observations naturally decay away over time. It's designed for heavy writes and sparse reads, as it implements decay at read time.

Each set is implemented as a redis `sorted set`, and keys are scrubbed when a count is decayed to near zero, providing storage efficiency.
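
To make the idea of decaying at read time and scrubbing near-zero keys more concrete, here is a minimal in-memory sketch. It uses a plain Ruby `Hash` rather than a redis sorted set, and the `DecayingSet` class, its method names, the explicit `now` arguments, and the `EPSILON` threshold are all hypothetical, chosen only for illustration.

```ruby
# Hypothetical in-memory stand-in for one decaying set.
# The real library keeps counts in a redis sorted set; nothing here is its API.
class DecayingSet
  EPSILON = 0.001 # assumed threshold below which a count is "near zero"

  def initialize(mean_lifetime, now)
    @rate = 1.0 / mean_lifetime # lambda
    @counts = Hash.new(0.0)
    @last_decayed_at = now
  end

  # Writes are cheap: bump a counter and do no decay work.
  def incr(bin, by = 1.0)
    @counts[bin] += by
  end

  # Reads pay for the decay: scale every count by e^(-lambda * elapsed),
  # scrub bins that have decayed to near zero, and return what is left.
  # Simplification: writes made since the last read are decayed as if they
  # had happened at the time of that last read.
  def fetch(now)
    factor = Math.exp(-@rate * (now - @last_decayed_at))
    @counts.transform_values! { |count| count * factor }
    @counts.delete_if { |_bin, count| count < EPSILON }
    @last_decayed_at = now
    @counts.dup
  end
end

set = DecayingSet.new(60.0, 0.0) # mean lifetime of 60 seconds, created at t = 0
set.incr('UserFoo', 10.0)
set.incr('UserBar', 0.002)
result = set.fetch(60.0)         # read one mean lifetime later
```

In this run `UserBar` decays below the threshold and is scrubbed, so only `UserFoo` remains in `result`. Because all the arithmetic happens in `fetch`, a workload with many writes and few reads does very little work per write.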

Hipster handles distributions with up to around 10^5 active categories, receiving dozens of writes per second, without much fuss. Its scalability is highly dependent on your redis deployment.
Forgetsy handles distributions with up to around 10^6 active categories, receiving hundreds of writes per second, without much fuss. Its scalability is highly dependent on your redis deployment.

It requires redis to be running on localhost at the default port (6379).

Usage
-----

Take, for example, a social network in which users can follow each other. You want to track trending users. You construct a one week delta, to capture trends in your follows data over one week periods:

    follows_delta = Hipster::Delta.new('user_follows', t: 1.week)

```ruby
follows_delta = Forgetsy::Delta.new('user_follows', t: 1.week)
```
The delta consists of two sets of counters indexed by category identifiers. In this example, the identifiers will be user ids. One set decays over the mean lifetime specified by _t_, and another set decays over double the lifetime.
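
The two-set arrangement can be sketched as follows. This is an assumption-laden illustration rather than the library's implementation: instead of keeping running counters, it sums the decayed contribution of each observation from its age, and the helper names `decayed_count` and `trending_score` are invented for the example.

```ruby
# Hypothetical sketch of a delta as two decayed counts for one category.
# `ages` holds the age in seconds of each observation of that category.
def decayed_count(ages, mean_lifetime)
  ages.sum { |age| Math.exp(-age / mean_lifetime.to_f) }
end

# Score = count in the faster-decaying set / count in the slower-decaying set.
def trending_score(ages, t)
  primary   = decayed_count(ages, t)     # decays over the mean lifetime t
  secondary = decayed_count(ages, 2 * t) # decays over double the lifetime
  secondary.zero? ? 0.0 : primary / secondary
end

day  = 24 * 60 * 60
week = 7 * day

recent = trending_score([0, day], week)         # followed today and yesterday
stale  = trending_score([week, 2 * week], week) # followed one and two weeks ago
```

Every observation contributes no more to the faster-decaying set than to the slower one, so the ratio cannot exceed 1, and a category whose observations are recent scores higher than one whose observations are old.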

You can now add observations to the delta, in the form of follow events. Each time a user follows another, you increment the followed user id. You can also do this retrospectively:

    follows_delta = Hipster::Delta.fetch('user_follows')
    follows_delta.incr('UserFoo', date: 2.weeks.ago)
    follows_delta.incr('UserBar', date: 2.weeks.ago)
    follows_delta.incr('UserBar', date: 1.week.ago)
    follows_delta.incr('UserFoo', date: 1.day.ago)
    follows_delta.incr('UserFoo')

```ruby
follows_delta = Forgetsy::Delta.fetch('user_follows')
follows_delta.incr('UserFoo', date: 2.weeks.ago)
follows_delta.incr('UserBar', date: 10.days.ago)
follows_delta.incr('UserBar', date: 1.week.ago)
follows_delta.incr('UserFoo', date: 1.day.ago)
follows_delta.incr('UserFoo')
```
Providing an explicit date is useful if you are processing data asynchronously. You can also use `incr_by` to increment a counter in batches.

You can now consult your follows delta to find your top trending users:

    a = follows_delta.fetch()
    puts a

```ruby
puts follows_delta.fetch()
```
Will print:

    { 'UserFoo' => 0.789, 'UserBar' => 0.367 }

```ruby
{ 'UserFoo' => 0.789, 'UserBar' => 0.367 }
```
Each user is given a dimensionless score in the range [0..1] corresponding to the normalised follows delta over the time period.

Optionally fetch the top _n_ users, or an individual user's trending score:

    follows_delta.fetch(n: 20)
    follows_delta.fetch(bin: 'UserFoo')

```ruby
follows_delta.fetch(n: 20)
follows_delta.fetch(bin: 'UserFoo')
```
Contributing
------------

Just fork the repo and submit a pull request.

Copyright & License
-------------------
MIT license. See [LICENSE](LICENSE) for details.

(c) 2013 [Art.sy Inc.](http://artsy.github.com)
