Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for custom Calculation functions #2900

Open
1 task done
looksystems opened this issue Jun 17, 2022 · 16 comments
Open
1 task done

Add support for custom Calculation functions #2900

looksystems opened this issue Jun 17, 2022 · 16 comments

Comments

@looksystems
Copy link

This is:

- [x] a feature request

What is the expected behavior?

Add ability to add own calculation functions / implementations.

What is the current behavior?

Currently, Calculation class (and it's usage within Spreadsheet) is private and cannot be overridden or extended without changes to the source code.

In particular, Calculation::$phpSpreadsheetFunctions is private and there are no methods available to add/override function.

Proposal: add the following method to the Calculation class...

    public static function registerFunction(string $name, array $definition)
    {
        self::$phpSpreadsheetFunctions[$name] = $definition;
    }

ALSO

Calculation::addDefaultArgumentValues only supports static methods (for Calculation functions):

** Proposal:** modify addDefaultArgumentValues...

    private function addDefaultArgumentValues(array $functionCall, array $args, array $emptyArguments): array
    {
        if (is_object($functionCall[0])) {
            $reflector = new ReflectionMethod($functionCall[0], $functionCall[1]);
        } else {
            $reflector = new ReflectionMethod(implode('::', $functionCall));
        }

        $methodArguments = $reflector->getParameters();

What are the steps to reproduce?

N/A

What features do you think are causing the issue

  • Formula Calulations

Does an issue affect all spreadsheet file formats? If not, which formats are affected?

Yes

Which versions of PhpSpreadsheet and PHP are affected?

Latest PhpSpreadsheet and all supported versions of php.

@looksystems
Copy link
Author

PS. Use case: I would like to be able to define a function which accesses external data.

@MarkBaker
Copy link
Member

PS. Use case: I would like to be able to define a function which accesses external data.

Useful for you perhaps, but it's a lot more significant as a feature to make it useful for others. Why do you need this to be something that can be defined as a function for the calculation engine? The purpose of teh Calculation Engine is for evaluating spreadsheet formulae, so why do you need to add new functions that don't exist in Excel itself? What functions do you have in spreadsheet formulae that don't exist in MS Excel already?

I can see value with functions between different spreadsheet applications, e.g. ODS supports a regexp() function that isn't available in Excel, so being able to extend the base function set dynamically to support rexexp() if a loaded spreadsheet was an ODS spreadsheet has some value, because then it would be possible to calculate formulae using that function without getting a "function not found" error.

But I can also see problems when saving that in a format that doesn't support those functions... if a formula uses a custom function, then it will fail when that file is opened in MS Excel because the function isn't available in Excel (or unless we fully expand to support any UDF, which means supporting VBA as well).

@looksystems
Copy link
Author

Also related to pull request #2734.

@MarkBaker
Copy link
Member

I was just about to point you to the "flyweight" experiments. 🙂

Those are very much experimental; and while their purpose was to reduce memory usage in the calculation engine, they would make it a lot easier to extend the formula function set with custom functions. There's still problems with those experiments, they're not specially good implementations, the expected memory savings aren't visible with the test suite (although I believe they would be there with real world applications) and the 500 files in one folder isn't healthy for PHPStorm.

But I haven't dismissed the concept (which is why I haven't deleted the branches); and the principles (particularly for experiment #2) could be extended to allow registering additional functions more easily.

The problem of load an Ods, register regexp() as a "custom" function, then save as Xlsx still applies; and I don't have any idea how that could be resolved other than by creating an Excel UDF that implements the regexp() functionality via VBA

@looksystems
Copy link
Author

looksystems commented Jun 18, 2022

The purpose of teh Calculation Engine is for evaluating spreadsheet formulae,
so why do you need to add new functions that don't exist in Excel itself?

A great question - thank you.

I'm writing a custom reporting system that pulls in data from a range of different systems/sources.

However, how the data will be queried and presented needs' to be within the users' control.

With a few small modifications, phpSpreadsheet makes an excellent tool in that regard(!) as the calculation engine offers a great deal of power and flexibility.

I've made the mods above to a local copy and, as you say, they work for me.

In addition, I have overridden Spreadsheet::getSheetByName() and used it to "lazy load" sheets dynamically from sheets from both stored in a database and dynamically generated data from elastic search, mysql, postgresql, etc. I also hope to dynamically query data from other sources such as Google Sheets.

The ability to define custom functions allows users to pull in calculated values rather than whole data sets.

As proof of concept, I created an AGGREGATE function which takes the name of a data source, field and aggregation method.

When "lazy loading" sheets, I tried to define local named ranges for the imported data (ie. column names) but they don't appear to work as expected ie. if I reference the sheet ImportedWorksheet!COLUMN_NAME it won't resolve. I'm not sure if it's me or a bug.

As a workaround - and proof of concept only - I created a DCOL function which takes the name of the worksheet and column and returns the appropriate cell range.

Finally, when lazy loading, it would be useful to support the external worksheet syntax eg. [External]Worksheet!A1:C3 as this would allow greater control over where the data is dynamically pulled in from.

From you side, this would mean small changes and no change in functionality - just so long as there's a method to override to return the imported spreadsheet.

@looksystems
Copy link
Author

But I can also see problems when saving that in a format that doesn't support those functions

Agreed.

That would be the responsibility of the application, not phpSpreadsheet, as - out of the box - it wouldn't ship with any "non-excel" functions (other than perhaps ODS regex, as you mentioned above).

In my use case, I'm leveraging phpSpreadsheet primarily as a calculation engine and to assist with presentation, rather than for loading/saving sheets in different formats... in fact, I'm storing and retrieving only the formulas and number formats from a database of user defined sheets.

@looksystems
Copy link
Author

I was just about to point you to the "flyweight" experiments. 🙂

Thank you.

But I haven't dismissed the concept (which is why I haven't deleted the branches);
they're not specially good implementations, the expected memory savings aren't visible

I took a quick look at the pull request and couldn't figure out how it would save memory or time - for me, the big function list is a "flyweight" already. I'm glad to see my instincts weren't wrong there! ;-)

The main benefit I see in the experimental implementation was in making the function definitions more self contained ie. not having to refer back to the big function list when reviewing the list. Whether it's worth the additional work and I/O of one class per function is up for debate, I guess.

@looksystems
Copy link
Author

The purpose of teh Calculation Engine is for evaluating spreadsheet formulae,
so why do you need to add new functions that don't exist in Excel itself?

PS. While working on this, the thought occurred to me that refactoring the calculation engine (using Interfaces) would allow it to be used independently.

I grep'ed through the Calculation folder and only saw a few references to some utility classes and a few phpSpreadsheet specific classes... doesn't look like it would be a huge task (any problematic functions could be omitted from the abstracted calculation engine and defined as phpSpreadsheet specific implementations).

./BinaryComparison.php:use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
./DateTimeExcel/Date.php:use PhpOffice\PhpSpreadsheet\Shared\Date as SharedDateHelper;
./DateTimeExcel/Date.php:use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
./DateTimeExcel/DateParts.php:use PhpOffice\PhpSpreadsheet\Shared\Date as SharedDateHelper;
./DateTimeExcel/DateValue.php:use PhpOffice\PhpSpreadsheet\Shared\Date as SharedDateHelper;
./DateTimeExcel/Days.php:use PhpOffice\PhpSpreadsheet\Shared\Date as SharedDateHelper;
./DateTimeExcel/Days360.php:use PhpOffice\PhpSpreadsheet\Shared\Date as SharedDateHelper;
./DateTimeExcel/Difference.php:use PhpOffice\PhpSpreadsheet\Shared\Date as SharedDateHelper;
./DateTimeExcel/Helpers.php:use PhpOffice\PhpSpreadsheet\Shared\Date as SharedDateHelper;
./DateTimeExcel/Time.php:use PhpOffice\PhpSpreadsheet\Shared\Date as SharedDateHelper;
./DateTimeExcel/TimeParts.php:use PhpOffice\PhpSpreadsheet\Shared\Date as SharedDateHelper;
./DateTimeExcel/TimeValue.php:use PhpOffice\PhpSpreadsheet\Shared\Date as SharedDateHelper;
./DateTimeExcel/Week.php:use PhpOffice\PhpSpreadsheet\Shared\Date as SharedDateHelper;
./DateTimeExcel/YearFrac.php:use PhpOffice\PhpSpreadsheet\Shared\Date as SharedDateHelper;
./Exception.php:use PhpOffice\PhpSpreadsheet\Exception as PhpSpreadsheetException;
./Financial/Coupons.php:use PhpOffice\PhpSpreadsheet\Shared\Date;
./Functions.php:use PhpOffice\PhpSpreadsheet\Cell\Cell;
./Functions.php:use PhpOffice\PhpSpreadsheet\Shared\Date;
./Information/Value.php:use PhpOffice\PhpSpreadsheet\Cell\Cell;
./Information/Value.php:use PhpOffice\PhpSpreadsheet\Cell\Coordinate;
./Information/Value.php:use PhpOffice\PhpSpreadsheet\NamedRange;
./Information/Value.php:use PhpOffice\PhpSpreadsheet\Worksheet\Worksheet;
./LookupRef/Address.php:use PhpOffice\PhpSpreadsheet\Cell\Coordinate;
./LookupRef/ExcelMatch.php:use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
./LookupRef/Formula.php:use PhpOffice\PhpSpreadsheet\Cell\Cell;
./LookupRef/Helpers.php:use PhpOffice\PhpSpreadsheet\Cell\AddressHelper;
./LookupRef/Helpers.php:use PhpOffice\PhpSpreadsheet\Cell\Cell;
./LookupRef/Helpers.php:use PhpOffice\PhpSpreadsheet\DefinedName;
./LookupRef/Helpers.php:use PhpOffice\PhpSpreadsheet\Worksheet\Worksheet;
./LookupRef/HLookup.php:use PhpOffice\PhpSpreadsheet\Cell\Coordinate;
./LookupRef/HLookup.php:use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
./LookupRef/Hyperlink.php:use PhpOffice\PhpSpreadsheet\Cell\Cell;
./LookupRef/Indirect.php:use PhpOffice\PhpSpreadsheet\Cell\Cell;
./LookupRef/Indirect.php:use PhpOffice\PhpSpreadsheet\Worksheet\Worksheet;
./LookupRef/Offset.php:use PhpOffice\PhpSpreadsheet\Cell\Cell;
./LookupRef/Offset.php:use PhpOffice\PhpSpreadsheet\Cell\Coordinate;
./LookupRef/Offset.php:use PhpOffice\PhpSpreadsheet\Worksheet\Worksheet;
./LookupRef/RowColumnInformation.php:use PhpOffice\PhpSpreadsheet\Cell\Cell;
./LookupRef/RowColumnInformation.php:use PhpOffice\PhpSpreadsheet\Cell\Coordinate;
./LookupRef/RowColumnInformation.php:use PhpOffice\PhpSpreadsheet\Worksheet\Worksheet;
./LookupRef/Sort.php:use PhpOffice\PhpSpreadsheet\Cell\Coordinate;
./LookupRef/Sort.php:use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
./LookupRef/Unique.php:use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
./LookupRef/VLookup.php:use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
./LookupRef.php:use PhpOffice\PhpSpreadsheet\Cell\Cell;
./LookupRef.php:use PhpOffice\PhpSpreadsheet\Worksheet\Worksheet;
./Statistical/Permutations.php:use PhpOffice\PhpSpreadsheet\Shared\IntOrFloat;
./Statistical/Trends.php:use PhpOffice\PhpSpreadsheet\Shared\Trend\Trend;
./TextData/CaseConvert.php:use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
./TextData/Format.php:use PhpOffice\PhpSpreadsheet\Shared\Date;
./TextData/Format.php:use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
./TextData/Format.php:use PhpOffice\PhpSpreadsheet\Style\NumberFormat;
./TextData/Search.php:use PhpOffice\PhpSpreadsheet\Shared\StringHelper;
./Web/Service.php:use PhpOffice\PhpSpreadsheet\Settings;

@MarkBaker
Copy link
Member

In my use case, I'm leveraging phpSpreadsheet primarily as a calculation engine and to assist with presentation, rather than for loading/saving sheets in different formats... in fact, I'm storing and retrieving only the formulas and number formats from a database of user defined sheets.

I guess that's a valid use case, especially as I actually provide a quadratic equation evaluator in the samples that does exactly that.

That would be the responsibility of the application, not phpSpreadsheet, as - out of the box - it wouldn't ship with any "non-excel" functions (other than perhaps ODS regex, as you mentioned above).

And I do have some concerns about that, and the number of developers that don't take that responsibility, because it does create extra work for me when developers raise issues for something that they should be responsible for

I'll certainly consider it, though there's two major updates to the calculation engine that I'm working on that I want to handle first

I took a quick look at the pull request and couldn't figure out how it would save memory or time - for me, the big function list is a "flyweight" already. I'm glad to see my instincts weren't wrong there! ;-)

In theory at least, it eliminates the large $phpSpreadsheetFunctions array completely, so that big memory usage is no longer required. The ExcelFunctions class maintains an array of functions that are actually used in calculations for a spreadsheet, but not the entire collection of 500 Excel functions.... as most spreadsheets only use a few different functions, that's potentially a big memory saving for the calculation engine

And the individual function signatures are maintained as structured objects implementing XlFunctionAbstract , so it's imposing a controlled structure and a self-contained definition... that should make it easier for us to add new functions whenever necessary, so maintenance benefits as well; and (with some reworking) it may make it easier to register custom functions, perhaps from separate github repos (PhpOffice/PhpSpreadsheet_Ods_Extended?) so the actual implementation code for custom functions could be composer'd in to individual developer requirements.

@MarkBaker
Copy link
Member

PS. While working on this, the thought occurred to me that refactoring the calculation engine (using Interfaces) would allow it to be used independently.

I have looked at that possibility a few times, likewise with separating out Readers and Writers... about 35% of user devs only use Readers, closer to 50% use only Writers, with the balance using both; but nearly 85% use the Calculation Engine (mostly indirectly, without even realising it) so extracting the Calc Engine itself doesn't seem particularly worthwhile.

Back in the PhpExcel days, it was much more tightly coupled to the core spreadsheet implementation (and had a cyclomatic complexity of over 21 million)... I've spent a lot of time over the years working on it to reduce that coupling and reduce its complexity, but I don't think it can ever be truly standalone.
It's come a long way singe the original version that simply called eval()

@looksystems
Copy link
Author

I'll certainly consider it

Thank you.

"Structured References", for working with formulae in tables

Great to hear that this feature is going to make the cut - as it will help with dynamically imported sheets.

It's come a long way singe the original version that simply called eval()

Certainly has! And is an invaluable project for the php community. Thank you.

@looksystems
Copy link
Author

but nearly 85% use the Calculation Engine (mostly indirectly, without even realising it) so extracting the Calc Engine itself doesn't seem particularly worthwhile.

Understood.

I guess I didn't mean that it would be come a separate project.

Rather that it could be refactored so that it can be used independently (and extended / adapted as required). An example of this is Laravel, where the framework is the "super" package, which provides the sub packages in it's composer.json and the sub packages (in the src folder) also contains thier own composer.json files. The whole project is managed with git submodules or subtrees (I can't recall which).

Most users pull in the main package - but - sub packages are also valuable to those that don't use the full framework (hence broadening the community / user base).

Anyway, not a high priority for me personally - I could just imagine (as I am sure you have) the engine being used in a wide variety of applications.

@looksystems
Copy link
Author

I'll certainly consider it

PS. The only change I "need" to be able to use the stock package is this one:

    private function addDefaultArgumentValues(array $functionCall, array $args, array $emptyArguments): array
    {
        if (is_object($functionCall[0])) {
            $reflector = new ReflectionMethod($functionCall[0], $functionCall[1]);
        } else {
            $reflector = new ReflectionMethod(implode('::', $functionCall));
        }

        \\ ...

As I use this evil hack to override phpSpreadsheetFunctions in my own Spreadsheet class:

    public function registerFunction(string $name, array $definition)
    {
        $engine = $this->getCalculationEngine();

        $name = strtoupper($name);
        $functions = $engine->getFunctions();
        $functions[$name] = $definition;

        $refObject   = new ReflectionObject($engine);
        $target = $refObject->getParentClass();
        $refProperty = $target->getProperty('phpSpreadsheetFunctions');
        $refProperty->setAccessible(true);
        $refProperty->setValue($engine, $functions);

        return $this;
    }

@robertotremonti
Copy link

+1 on this request.
I would like to add custom functions to the Calculation Engine without touching the codebase.
We use PhpSpreadsheet library to provide formula calculations using an Excel-like syntax.
We don't parse/import Excel files, the formula is typed in a HTML textarea.

@looksystems
Copy link
Author

We use PhpSpreadsheet library to provide formula calculations using an Excel-like syntax.

@robertotremonti that's our use case also - as mentioned, for the time being, the hack above works for us. On the output side, we currently use only number formatting and a custom html writer.

@robertotremonti
Copy link

Thanks for the suggestion @looksystems.
If anyway I need to modify the Calculation class in the library, it's easier to add my custom functions to the private $phpSpreadsheetFunctions array.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants