How to create histogram with fixed bin width #46

Closed
netzwerg opened this issue Dec 31, 2016 · 2 comments

netzwerg commented Dec 31, 2016

I am trying to create a histogram with a specific number of bins, all of the same width (i.e. the domain should be divided uniformly).

I started by using x.ticks() (Option 1):

const binCount = 5;
const data = [1, 2, 3, 4, 4.7];
const [min, max] = d3.extent(data);
const x = d3.scaleLinear().domain([min, max]);

// Option 1: Tick-based thresholds

const histogram1 = d3.histogram()
  .domain(x.domain())
  .thresholds(x.ticks(binCount));
const bins1 = histogram1(data);

console.log("Option 1 bin widths: " + bins1.map(b => (b.x1 - b.x0)));
// 1,1,1,0.7000000000000002

The last bin is narrower than all other bins (to be expected based on how ticks works).

For histogram.thresholds([count]), the docs state that:

If a count is specified instead of an array of thresholds, then the domain will be uniformly divided into approximately count bins

This is not what I observe (Option 2):

// Option 2: Count-based thresholds

const histogram2 = d3.histogram()
  .domain(x.domain())
  .thresholds(binCount);

const bins2 = histogram2(data);

console.log("Option 2 bin widths: " + bins2.map(b => (b.x1 - b.x0)));
// 1,1,1,0.7000000000000002

Q: What's the proper invocation of histogram.thresholds([count])?

I currently use a manual array of thresholds (Option 3):

// Option 3: Range-based thresholds

const thresholds = d3.range(min, max, (max - min) / binCount);
const histogram3 = d3.histogram()
  .domain(x.domain())
  .thresholds(thresholds);

const bins3 = histogram3(data);

console.log("Option 3 bin widths: " + bins3.map(b => (b.x1 - b.x0)));
// 0.74,0.74,0.7399999999999998,0.7400000000000002,0.7400000000000002

This works, but seems overly complex for such a simple use case...
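
For reference, the idea behind Option 3 can also be written without d3. The following is only a sketch: uniformThresholds is a hypothetical helper, not a d3 function, and it assumes that only the interior edges are needed, with the domain endpoints supplying the outer bin boundaries.

```javascript
// Hypothetical helper (not part of d3): interior thresholds that split
// [min, max] into binCount equal-width bins.
function uniformThresholds(min, max, binCount) {
  const width = (max - min) / binCount;
  // Compute each edge from its index to avoid accumulating rounding error.
  return Array.from({ length: binCount - 1 }, (_, i) => min + (i + 1) * width);
}

const interior = uniformThresholds(1, 4.7, 5);
const allEdges = [1, ...interior, 4.7];
// Width of each bin is the difference between consecutive edges.
const widths = allEdges.slice(1).map((edge, i) => edge - allEdges[i]);
console.log(widths.length); // 5
```

The widths agree only up to floating-point rounding, which is the same effect visible in the Option 3 output.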

mbostock (Member) commented Dec 31, 2016

Regarding the first or last bin not being the same width as the others: that is because given n thresholds, n + 1 bins will be produced. The first bin (bins[0]) contains any value less than the first threshold (thresholds[0]); the last bin (bins[thresholds.length]) contains any value greater than or equal to the last threshold (thresholds[thresholds.length - 1]).

If you prefer, you can consider the effective width of the first and last bin as infinite, since they are bounded only by the input data (or more precisely the histogram’s domain) and not the thresholds. The bins[0].x0 is equal to domain[0] and bins[thresholds.length].x1 is equal to domain[1] given the histogram’s domain, which defaults to the extent of the input data.
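
To make the relationship between domain, thresholds, and bin edges concrete, here is a small d3-free sketch. binEdges is a hypothetical helper, not d3's implementation, and it assumes that thresholds lying on or outside the domain boundaries are ignored, which is consistent with the four bin widths reported for Option 1.

```javascript
// Hypothetical helper (not part of d3): derive bin boundaries from a
// domain and a list of thresholds.
function binEdges(domain, thresholds) {
  const [x0, x1] = domain;
  // Assumption: only thresholds strictly inside the domain split it.
  const inner = thresholds.filter(t => t > x0 && t < x1);
  const edges = [x0, ...inner, x1];
  // n inner thresholds yield n + 1 bins; the outer bins end at the domain.
  return edges.slice(0, -1).map((lo, i) => ({ x0: lo, x1: edges[i + 1] }));
}

const result = binEdges([1, 4.7], [1, 2, 3, 4]);
console.log(result.map(b => `[${b.x0}, ${b.x1}]`).join(" "));
// [1, 2] [2, 3] [3, 4] [4, 4.7]
```

The last bin ends at 4.7 because that is where the domain ends, not because a threshold was placed there.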

If you want to force the domain to coincide exactly with the tick interval, you can use d3.range to manually create the ticks as you discuss in option 3. If you also want the tick thresholds to be “human-readable” (per the design of d3.ticks) you can nice your domain before computing the ticks, and then use scale.ticks:

var data = [1, 2, 3, 4, 4.7];
var count = 5;
var x = d3.scaleLinear().domain(d3.extent(data)).nice(count);
var histogram = d3.histogram().domain(x.domain()).thresholds(x.ticks(count));
var bins = histogram(data);
console.log("bin widths: " + bins.map(b => b.x1 - b.x0));

You could also use d3.tickStep to nice your histogram’s domain manually, but that’s exactly what linear.nice does.
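
As a rough illustration of what nicing the domain by hand involves, the sketch below rounds the domain outward to multiples of a step. niceDomain is a hypothetical helper, and the step of 1 is an assumption chosen for this data; in practice d3.tickStep would supply it.

```javascript
// Hypothetical helper (not part of d3): extend a domain outward so both
// ends fall on multiples of step.
function niceDomain([min, max], step) {
  return [Math.floor(min / step) * step, Math.ceil(max / step) * step];
}

const step = 1; // assumed tick step for this example
const [lo, hi] = niceDomain([1, 4.7], step);

// Interior thresholds at every multiple of step strictly inside the domain.
const inner = [];
for (let i = 1; lo + i * step < hi; i++) inner.push(lo + i * step);

const niceEdges = [lo, ...inner, hi];
const niceWidths = niceEdges.slice(1).map((edge, i) => edge - niceEdges[i]);
console.log("domain: " + [lo, hi] + ", bin widths: " + niceWidths);
// domain: 1,5, bin widths: 1,1,1,1
```

Because the niced domain ends on a tick, the last bin is as wide as the others, at the cost of the domain extending past the largest data value.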

netzwerg (Author) commented Jan 1, 2017

Thank you for the detailed answer – I wish you a happy new year 🎉 🍾 🎆
