Merge pull request #10 from freakyzoidberg/doc-and-readme
Add package docs and readme
AlexanderSaydakov authored Jan 5, 2024
2 parents 2bba6e6 + d602cdd commit e80ce5f
Showing 5 changed files with 100 additions and 46 deletions.
80 changes: 80 additions & 0 deletions README.md
@@ -0,0 +1,80 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

[![Go](https://github.com/apache/datasketches-go/actions/workflows/go.yml/badge.svg)](https://github.com/apache/datasketches-go/actions/workflows/go.yml)
[![Go Report Card](https://goreportcard.com/badge/github.com/apache/datasketches-go)](https://goreportcard.com/report/github.com/apache/datasketches-go)
[![Release](https://img.shields.io/github/release/apache/datasketches-go.svg)](https://github.com/apache/datasketches-go/releases)
[![GoDoc](https://godoc.org/github.com/apache/datasketches-go?status.svg)](https://godoc.org/github.com/apache/datasketches-go)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/apache/datasketches-go/blob/master/LICENSE)

# Apache<sup>&reg;</sup> DataSketches&trade; Core Go Library Component
This is the core Go component of the DataSketches library. It contains some of the sketching algorithms and can be accessed directly from user applications.

Note that there are parallel core components that implement the same sketch algorithms in C++, Java, and Python; see
[datasketches-cpp](https://github.com/apache/datasketches-cpp) and [datasketches-java](https://github.com/apache/datasketches-java).

Please visit the main [DataSketches website](https://datasketches.apache.org) for more information.

If you are interested in contributing to this project, please see our [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us.



## Major Sketches
| Type | Implementation | Status |
|--------------|-------------------------|--|
| Cardinality | | |
| | CpcSketch ||
| | HllSketch | ⚠️ |
| | ThetaSketch ||
| | TupleSketch<S> ||
| Quantiles | | |
| | CormodeDoublesSketch ||
| | CormodeItemsSketch<T> ||
| | KllDoublesSketch ||
| | KllFloatsSketch ||
| | KllSketch<T> ||
| | ReqFloatsSketch ||
| Frequencies | ||
| | LongsSketch | ⚠️ |
| | ItemsSketch<T> | ⚠️ |
| Sampling | | |
| | ReservoirLongsSketch ||
| | ReservoirItemsSketch<T> ||
| | VarOptItemsSketch<T> ||

## Specialty Sketches
| Type | Interface Name | Status |
| --- | --- |---|
| Cardinality/FM85 | UniqueCountMap ||
| Cardinality/Tuple | FdtSketch ||
| | ArrayOfDoublesSketch ||
| | DoubleSketch ||
| | IntegerSketch ||
| | ArrayOfStringsSketch ||
| | EngagementTest3 ||


❌ = Not yet implemented

⚠️ = Implemented but not officially released

=================

This code requires Go 1.21 or later.
9 changes: 9 additions & 0 deletions frequencies/items_sketch.go
@@ -15,6 +15,15 @@
* limitations under the License.
*/


// Package frequencies is dedicated to streaming algorithms that enable estimation of the
// frequency of occurrence of items in a weighted multiset stream of items.
// If the frequency distribution of items is sufficiently skewed, these algorithms are very
// useful in identifying the "Heavy Hitters" that occurred most frequently in the stream.
// The estimated frequency of an item has well-understood error
// bounds that can be returned by the sketch.
//
// These algorithms are sometimes referred to as "TopN" algorithms.
package frequencies

import (
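
To make the "Heavy Hitters" idea in the package comment concrete, here is a minimal, self-contained sketch of the classic Misra-Gries frequent-items summary. It is an illustration only and is not the algorithm or API of the `frequencies` package: the names `misraGries`, `newMisraGries`, `update`, and `estimate` are hypothetical, and it handles unit weights only. With `k-1` counters, each estimate is a lower bound that undercounts the true frequency by at most `n/k` for a stream of `n` items.

```go
package main

import "fmt"

// misraGries is a hypothetical, simplified frequent-items summary.
// It is NOT the data structure used by the frequencies package.
type misraGries struct {
	k        int              // at most k-1 counters are kept
	counters map[string]int64 // item -> (under)estimated count
}

func newMisraGries(k int) *misraGries {
	return &misraGries{k: k, counters: make(map[string]int64)}
}

// update records one occurrence of item.
func (m *misraGries) update(item string) {
	if _, ok := m.counters[item]; ok {
		m.counters[item]++
		return
	}
	if len(m.counters) < m.k-1 {
		m.counters[item] = 1
		return
	}
	// No free counter: decrement every counter and drop those that hit zero.
	// Deleting from a map while ranging over it is safe in Go.
	for key := range m.counters {
		m.counters[key]--
		if m.counters[key] == 0 {
			delete(m.counters, key)
		}
	}
}

// estimate returns a lower bound on the number of times item was seen.
func (m *misraGries) estimate(item string) int64 {
	return m.counters[item]
}

func main() {
	m := newMisraGries(3) // keeps at most 2 counters
	stream := []string{"a", "a", "a", "a", "a", "a", "b", "b", "b", "c", "d"}
	for _, item := range stream {
		m.update(item)
	}
	// Prints "a=4 b=1 c=0": "a" truly occurs 6 times, and 6-4 is within n/k = 11/3.
	fmt.Printf("a=%d b=%d c=%d\n", m.estimate("a"), m.estimate("b"), m.estimate("c"))
}
```

The skewed item `a` survives the decrements while the rare items `c` and `d` do not, which is why this family of algorithms works best when the frequency distribution is sufficiently skewed.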
46 changes: 0 additions & 46 deletions hll/README.md

This file was deleted.

6 changes: 6 additions & 0 deletions hll/hll_sketch.go
@@ -15,6 +15,12 @@
* limitations under the License.
*/

// Package hll is dedicated to streaming algorithms that enable estimation of the
// cardinality of a stream of items.
//
// HllSketch and Union are the public-facing types of this high-performance implementation of Philippe Flajolet's
// HyperLogLog algorithm[1], with significantly improved error behavior and important features that can be
// essential for large production systems that must handle massive data.
package hll

import (
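
For readers unfamiliar with HyperLogLog, the following toy version shows the basic register-and-rank idea behind the cardinality estimate. It is a sketch under stated assumptions, not this package's implementation or API: `toyHLL`, `update`, `estimate`, the FNV-1a hash, and the fixed `lgM = 10` are all hypothetical choices made for illustration, and none of the error improvements mentioned in the package comment are included.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"math/bits"
)

// lgM is an assumed register-count exponent: 2^10 = 1024 registers.
const lgM = 10

// toyHLL is a hypothetical, minimal HyperLogLog; it is NOT the hll package's HllSketch.
type toyHLL struct {
	registers [1 << lgM]uint8
}

// update hashes item, picks a register from the top lgM bits, and stores the
// largest "rank" (position of the first 1-bit in the remaining bits) seen so far.
func (h *toyHLL) update(item string) {
	hasher := fnv.New64a() // assumption: any well-mixed 64-bit hash would do
	hasher.Write([]byte(item))
	x := hasher.Sum64()
	idx := x >> (64 - lgM)
	rank := uint8(bits.LeadingZeros64(x<<lgM)) + 1
	if rank > 64-lgM+1 {
		rank = 64 - lgM + 1 // remaining bits were all zero
	}
	if rank > h.registers[idx] {
		h.registers[idx] = rank
	}
}

// estimate returns the classic HyperLogLog estimate, falling back to
// linear counting while many registers are still empty.
func (h *toyHLL) estimate() float64 {
	m := float64(len(h.registers))
	sum, zeros := 0.0, 0
	for _, r := range h.registers {
		sum += math.Ldexp(1, -int(r)) // 2^(-r)
		if r == 0 {
			zeros++
		}
	}
	alpha := 0.7213 / (1 + 1.079/m)
	raw := alpha * m * m / sum
	if raw <= 2.5*m && zeros > 0 {
		return m * math.Log(m/float64(zeros))
	}
	return raw
}

func main() {
	var h toyHLL
	for i := 0; i < 5000; i++ {
		h.update(fmt.Sprintf("user-%d", i%1000)) // 1000 distinct items, each seen 5 times
	}
	fmt.Printf("estimated distinct items: %.0f\n", h.estimate())
}
```

Because each register only keeps a maximum, feeding the same item again never changes the state, so duplicates do not inflate the estimate.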
5 changes: 5 additions & 0 deletions main.go
@@ -15,4 +15,9 @@
* limitations under the License.
*/

// Package datasketches is the parent package for all sketch families and common code areas.
//
// The Sketching Core Library provides a range of stochastic streaming algorithms that are
// particularly useful when integrating this technology into systems that must deal with massive
// data. The library is designed to be easy to use, highly performant, and memory efficient.
package datasketches
