Question about AST evaluation for Java #494

GeniusYx · 2024-07-01T09:37:22Z

Hello, I am testing my own model on the Java test set. Here is an example:

The output of my model is {'invokemethod007_runIt': {'args': ['suspend', 'log'], 'out': 'debugLog'}}. When I run the evaluation code, it seems to force every parameter value to be a string: {'invokemethod007_runIt': {'args': "['suspend', 'log']", 'out': 'debugLog'}}, but the expected answer is {'invokemethod007_runIt': {'args': [['suspend','log']], 'out': ['debugLog']}}.

As a result, the evaluation reports a type mismatch error. Do you have a solution? Thank you very much!


HuanzhiMao · 2024-07-01T18:37:02Z

Hi @GeniusYx,

All parameters are forced to string type because the evaluation scripts are written in Python, and casting the values to strings prevents them from raising errors during processing. We then use tree-sitter (which takes in a string and outputs the converted value in Python syntax) to handle parsing and type checking for the Java/JS test categories. A detailed explanation can be found at #424.

From your description, it seems that the args parameter is supposed to be a list. Since this is in the Java category, the model output should use Java syntax for the list, as the possible answer expects (new String[]{xxxx}). Take a look at this entry from the BFCL possible answer and notice how it creates a list of class Point objects through new Point[]{xxxx}.
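To illustrate the idea, here is a minimal sketch of turning a Java array literal string into a Python list. It is not the BFCL implementation: it uses a regular expression as a simplified stand-in for the tree-sitter parsing step, and the function name java_string_array_to_list is hypothetical.

```python
import re


def java_string_array_to_list(literal: str) -> list[str]:
    """Hypothetical stand-in for the tree-sitter step (assumption, not BFCL code).

    Converts a Java String[] literal such as new String[]{"a", "b"}
    into a Python list of strings.
    """
    match = re.fullmatch(r'\s*new\s+String\[\]\s*\{(.*)\}\s*', literal, re.DOTALL)
    if match is None:
        raise ValueError(f"not a Java String[] literal: {literal!r}")
    # Pull out each double-quoted element; escaped quotes are not handled here.
    return re.findall(r'"([^"]*)"', match.group(1))


# The model should emit the argument in Java syntax, as a string.
model_args = 'new String[]{"suspend", "log"}'
print(java_string_array_to_list(model_args))  # ['suspend', 'log']
```

A Python-style list such as "['suspend', 'log']" is rejected by this sketch, which mirrors why a Python-syntax value in the Java category ends up as a type mismatch.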

Let me know if this solves your issue!

GeniusYx · 2024-07-02T03:10:16Z

Thank you very much for your reply!

I noticed that the answers in the possible_answer folder for the Java test set are still in JSON format. Could you please provide the possible answers in Java format?

Thank you very much!

HuanzhiMao · 2024-07-02T19:04:53Z

There are no possible answers in true Java format. The JSON-format possible answers are loaded as Python-type values. We use the Java tree-sitter to parse the model result and convert it into the corresponding Python-type values. The accuracy check is then performed between the two sets of Python-type values. This way, we can reuse the whole evaluation pipeline for non-Python languages as well.
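A rough sketch of that final comparison, using the example from this thread. The helper name matches_possible_answer is hypothetical and the real BFCL checker is more involved; the only assumption taken from the thread is that each parameter in the possible answer maps to a list of acceptable values.

```python
import json

# Possible answer as stored in JSON: each parameter lists its acceptable values.
possible_answer = json.loads(
    '{"invokemethod007_runIt": {"args": [["suspend", "log"]], "out": ["debugLog"]}}'
)


def matches_possible_answer(converted_call: dict, possible_answer: dict) -> bool:
    """Hypothetical check: every converted value must be one of the acceptable values."""
    for func_name, params in converted_call.items():
        allowed = possible_answer.get(func_name)
        if allowed is None:
            return False
        for name, value in params.items():
            if name not in allowed or value not in allowed[name]:
                return False
    return True


# Model output after its Java values were converted to Python-type values.
converted = {"invokemethod007_runIt": {"args": ["suspend", "log"], "out": "debugLog"}}
print(matches_possible_answer(converted, possible_answer))  # True

# If args stays a plain string, it no longer equals the expected list.
unconverted = {"invokemethod007_runIt": {"args": "['suspend', 'log']", "out": "debugLog"}}
print(matches_possible_answer(unconverted, possible_answer))  # False
```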

GeniusYx added the hosted-openfunctions-v2 Issues with OpenFunctions-v2 label Jul 1, 2024

HuanzhiMao added BFCL-General General BFCL Issue and removed hosted-openfunctions-v2 Issues with OpenFunctions-v2 labels Jul 8, 2024

HuanzhiMao closed this as completed Oct 16, 2024
