Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for custom aggregate functions #236

Closed
hashemi opened this issue Jun 30, 2017 · 3 comments
Closed

Support for custom aggregate functions #236

hashemi opened this issue Jun 30, 2017 · 3 comments

Comments

@hashemi
Copy link

hashemi commented Jun 30, 2017

GRDB currently supports defining custom SQL functions. However, it doesn't support custom aggregate functions. I looked into it and here are my notes.

In the C interface, a custom SQLite aggregate function is defined as a pair of functions: step and final. Step is called for each row and does not return a value but instead keeps track of the data somehow. Final is called at the end and returns the actual result of the aggregate.

For example, in an average aggregate function, when step is called and is passed a value from a row, it adds the value to a sum and increments a count variable. When final is called, it divides the sum by the count and returns an average as an SQL value.

In PHP, the user implements a couple of functions for step and finalize. The first argument the functions is passed by reference with each call and is used to keep track of state between calls to step and the last call to final:

function max_len_step(&$context, $string) {
    if (strlen($string) > $context) {
        $context = strlen($string);
    }
}

function max_len_finalize(&$context) {
    return $context;
}

In Python, it's done by writing a class that has step and finalize. step takes a single argument (database value) and does not return. finalize does not take any arguments and returns the final result. State is tracked in instance variables of the class:

import sqlite3

class MySum:
    def __init__(self):
        self.count = 0

    def step(self, value):
        self.count += value

    def finalize(self):
        return self.count

I think we could use a technique similar to either in Swift:

  1. We can have a struct or class that takes a pair of functions or closures. Each function accepts a context of a generic type as an inout. The step function is called with this context as well as a database value per row. The final function is called with the context only and is expected to return a database value. The GRDB API would expect an instance of this struct (or class) type.

Here's an example of what the interface looks like:

typealias DatabaseValue = Int

// Interface provided by GRDB
struct Aggregate1<Context> {
    let step: (inout Context?, DatabaseValue) -> ()
    let final: (inout Context?) -> (DatabaseValue)
}

// Simulates how the functions are called by GRDB/SQLite
extension Aggregate1 {
    func aggregate(_ rows: [DatabaseValue]) -> DatabaseValue {
        var context: Context? = nil
        for row in rows {
            step(&context, row)
        }
        return final(&context)
    }
}

// What the user implements
func step1(context: inout Int?, value: DatabaseValue) {
    if context == nil {
        context = value
    } else {
        context = context! + value
    }
}

func final1(context: inout Int?) -> DatabaseValue {
    return context! as! DatabaseValue
}

let agg1 = Aggregate1(step: step1, final: final1)

// 3
agg1.aggregate([1,1,1])
  1. We can create a protocol with a mutating step function that takes a singe database value and a final function that takes no arguments and returns a final value. The aggregate would be passed to the GRDB API as a type that is instantiated every time the aggregate function is called within SQLite.

This is an example of what the code would look like:

// Interface provided by GRDB
protocol AggregateProtocol {
	mutating func step(value: DatabaseValue)
	func final() -> DatabaseValue
}

// Simulates how the functions are called by GRDB/SQLite
extension AggregateProtocol {
	mutating func aggregate(_ rows: [DatabaseValue]) -> DatabaseValue {
		for row in rows {
			step(value: row)
		}
		return final()
	}
}

// What the user implements
struct Aggregate2: AggregateProtocol {
	var sum: DatabaseValue = 0
	mutating func step(value: DatabaseValue) {
		sum += value
	}
	
	func final() -> DatabaseValue {
		return sum
	}
}

var agg2 = Aggregate2()
print(agg2.aggregate([1,1,1])) // 3

The current implementation of custom functions uses the first method, except that there isn't the added complexity of this inout context variable to track state and there's only one function required which works well with Swift's trailing closure syntax.

I'd really appreciate your thoughts on this and whether this is something that would be considered for GRDB. The Swift/C interface is currently outside of my comfort level so I don't think I'd be able to implement it myself although I'm willing to give it a try if it's desirable. I do need the functionality for a project I'm working on now but may end up using a workaround.

@groue
Copy link
Owner

groue commented Jun 30, 2017

Hello @hashemi,

Thanks for your feature request, and your analysis!

For the record, the SQLite reference about custom aggregate functions is here: https://sqlite.org/c3ref/create_function.html

Do you wish somebody takes care of adding support for custom aggregate functions, or do you feel like submitting a pull request?

@groue
Copy link
Owner

groue commented Jun 30, 2017

I'e been fiddling a little bit, this requires a heavy load of Swift pointer/retain/release wizardry 😅 !

@groue
Copy link
Owner

groue commented Jul 1, 2017

Superseded by #237

@groue groue closed this as completed Jul 1, 2017
groue added a commit that referenced this issue Jul 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants