Nico Williams edited this page Aug 17, 2023 · 16 revisions

jv is jq's internal JSON library. All jv objects are immutable, which is a requirement if you want to implement jq's backtracking while remaining approximately sane.

The jv/jq APIs are public, mostly -- we've not made any stability commitments, but we're not likely to make breaking changes either.

Immutability means that functions operating on jv values tend to be referentially transparent: you can't pass an empty array to a function and expect it to be filled in when the function returns. If you want a function to return some information, it has to return a new object, since it can't modify its arguments.

As a result, some API usage looks a little odd. For instance, the functions jv_array_get() and jv_array_set() can be used to get and set elements of an array. The usage of jv_array_get() is fairly standard:

    jv elem = jv_array_get(array, 42);

But to use jv_array_set(), you have to know that it returns the new array. You can't ignore the return value.

    array = jv_array_set(array, 42, elem);

Kinds

The "kind" of a jv value is one of the following, defined in the enum jv_kind:

  • JV_KIND_INVALID
  • JV_KIND_NULL
  • JV_KIND_FALSE
  • JV_KIND_TRUE
  • JV_KIND_NUMBER
  • JV_KIND_STRING
  • JV_KIND_ARRAY
  • JV_KIND_OBJECT

All but the first represent normal JSON values. The Errors section below explains invalid values. You can check the kind of a value by calling jv_get_kind().

Functions

Memory management:

  • jv jv_copy(jv)

  • void jv_free(jv)

  • jv_mem_alloc()

  • jv_mem_alloc_unguarded()

  • jv_mem_calloc()

  • jv_mem_calloc_unguarded()

  • jv_mem_free()

  • jv_mem_realloc()

  • jv_mem_strdup()

  • jv_mem_strdup_unguarded()

  • jv_nomem_handler()

Scalar constructors:

  • jv jv_null()
  • jv jv_bool(int)
  • jv jv_true()
  • jv jv_false()
  • jv jv_invalid()

Array functions:

  • jv jv_array(void)
  • jv jv_array_append(jv, jv)
  • jv jv_array_concat(jv, jv)
  • jv jv_array_get(jv, int)
  • jv jv_array_indexes(jv, jv)
  • int jv_array_length(jv)
  • jv jv_array_set(jv, int, jv)
  • jv jv_array_sized(int)
  • jv jv_array_slice(jv, int, int)

Object functions:

  • jv jv_object(void)
  • jv jv_object_delete(jv, jv)
  • jv jv_object_get(jv, jv)
  • int jv_object_has(jv, jv)
  • int jv_object_iter(jv)
  • jv jv_object_iter_key(jv, int)
  • int jv_object_iter_next(jv, int)
  • int jv_object_iter_valid(jv, int)
  • jv jv_object_iter_value(jv, int)
  • int jv_object_length(jv)
  • jv jv_object_merge(jv, jv)
  • jv jv_object_merge_recursive(jv, jv)
  • jv jv_object_set(jv, jv, jv)

String functions:

  • jv jv_string(const char *)
  • jv jv_string_append_buf(jv, const char *, int)
  • jv jv_string_append_codepoint(jv, uint32_t)
  • jv jv_string_append_str(jv, const char *)
  • jv jv_string_concat(jv, jv)
  • jv jv_string_empty(int)
  • jv jv_string_explode(jv)
  • jv jv_string_fmt(const char *, ...)
  • unsigned long jv_string_hash(jv)
  • jv jv_string_implode(jv)
  • jv jv_string_indexes(jv, jv)
  • int jv_string_length_bytes(jv)
  • int jv_string_length_codepoints(jv)
  • jv jv_string_sized(const char *, int)
  • jv jv_string_slice(jv, int, int)
  • jv jv_string_split(jv, jv)
  • const char *jv_string_value(jv)
  • jv jv_string_vfmt(const char *, va_list)

Number functions:

  • jv jv_number(double)
  • const char *jv_number_get_literal(jv)
  • int jv_number_has_literal(jv)
  • double jv_number_value(jv)
  • jv jv_number_with_literal(const char *)

"Invalid" jv functions:

  • jv jv_invalid(void)
  • jv jv_invalid_get_msg(jv)
  • int jv_invalid_has_msg(jv)
  • jv jv_invalid_with_msg(jv)

JSON parsing functions:

  • jv jv_parse(const char *)
  • void jv_parser_free(struct jv_parser *)
  • struct jv_parser* jv_parser_new(int)
  • jv jv_parser_next(struct jv_parser *)
  • int jv_parser_remaining(struct jv_parser *)
  • void jv_parser_set_buf(struct jv_parser *, const char *, int, int)
  • jv jv_parse_sized(const char *, int)

JSON formatting functions:

  • void jv_dump(jv, int)
  • void jv_dumpf(jv, FILE *, int)
  • jv jv_dump_string(jv, int)
  • char *jv_dump_string_trunc(jv, char *, size_t)
  • void jv_show(jv, int) (mainly for use in debuggers)

Comparison functions:

  • int jv_cmp(jv, jv)
  • int jv_equal(jv, jv)
  • int jv_identical(jv, jv)

Misc. utility functions:

  • int jv_contains(jv, jv)
  • jv jv_delpaths(jv, jv)
  • jv jv_get(jv, jv)
  • jv_kind jv_get_kind(jv)
  • jv jv_getpath(jv, jv)
  • int jv_get_refcnt(jv)
  • jv jv_group(jv, jv)
  • jv jv_has(jv, jv)
  • int jv_is_integer(jv)
  • jv jv_keys(jv)
  • jv jv_keys_unsorted(jv)
  • const char *jv_kind_name(jv_kind)
  • jv jv_load_file(const char *, int)
  • jv jv_set(jv, jv, jv)
  • jv jv_setpath(jv, jv, jv)
  • jv jv_sort(jv, jv)

Errors

As well as the normal kinds of JSON values (array, bool, string, etc.), jv supports values of kind JV_KIND_INVALID. Such values are used to signal errors. Some of them carry an error message, which may be any JSON value (you can check with jv_invalid_has_msg() and retrieve it with jv_invalid_get_msg()).

Generally, the kind-specific functions like jv_array_get() require that their argument be of the correct kind, and trigger an assertion failure aborting the program if not. That is, the program will crash if you pass a string to jv_array_get(): you must check that the argument is an array before using this function.

The functions are forgiving as long as the kinds are right. If you call jv_array_get() with an out-of-bounds index, then you will get an object of kind JV_KIND_INVALID back. This definitely indicates an invalid index; it is impossible to store an object of kind JV_KIND_INVALID in an array (or in anything else, for that matter).

You may find it more convenient to use the higher-level functions from <jv_aux.h>, which do more runtime error-checking and are implemented in terms of the primitives in <jv.h>. For instance, jv_get from <jv_aux.h> takes a value and an index. If the value is an array and the index is in-bounds, it returns the corresponding element. If the value is an object and the index is a valid string key, it returns the corresponding entry. Otherwise, it returns a JV_KIND_INVALID with a suitable error message.

Memory management

jv refcounts all heap-allocated objects. The usual objection to refcounting is that it fails when objects contain cycles. This is true. Luckily, due to the immutability of jv objects, it's impossible to create a cycle.

This is a pleasant property; as well as getting rid of pointer aliasing (a fertile source of bugs), it also limits us to acyclic heap structures. Since JSON does not support cyclic structures, this means that any jv object can be rendered as JSON.

Most jv functions are said to "consume" their arguments. That is, once you have passed the arguments to the function you may no longer use them and their memory may be reused. For instance, in the jv_array_get() example above, it is invalid to use the variable array after that line has executed. If you need to reuse a jv value, you can call jv_copy() to get a second copy of it. jv_copy() does not consume its argument.

It may seem like jv_copy() does a deep copy of the object. It certainly behaves that way, and if you keep that model in mind when writing jv code you'll get the right answer. However, jv_copy() is in fact very cheap; the Implementation section below explains how it works.

You must consume every jv value, otherwise there may be memory leaks (the tests won't pass if so, as they're run under valgrind). If you have nothing else to do with a value, pass it to jv_free(), which consumes its argument and does nothing with it.

Functions that do not decrement their jv arguments' references

  • jv_copy() (naturally)
  • jv_string_value()
  • jv_number_value()
  • jv_show()

Implementation

The jv API can be used as though every operation copied the entire object and jv_copy did a deep-copy. That's a useful mental model to program with, but it would be horrendously slow. Instead, jv uses a copy-on-write scheme for all objects.

In the worst case, the jv functions will need to copy their input object. However, most of the time there's no reason to keep the old version around, as it will never be used again. In this case, the refcount of the input object will be 1 (only one reference) and the function would have to free it anyway. So, all of the functions that return a new version of an object (e.g. jv_array_set) first check whether the refcount is 1. If so, they know they can safely modify the object in place without allocating any new memory. Thus, most of the time, jv_array_set won't copy anything.

jv_copy is then implemented by increasing the reference count by 1. This means that the object won't be modified by future calls to jv_array_set and the like. Instead, jv_array_set will copy the object and modify that.
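The scheme can be sketched with a toy refcounted array. This is not jq's actual data structure or code: toy_array and the toy_*() functions are hypothetical names made up for illustration, and the array is fixed at eight slots to keep the example short.

```c
#include <stdlib.h>
#include <string.h>

/* Toy refcounted array of ints; a stand-in for a heap-allocated jv. */
typedef struct {
  int refcnt;
  int len;        /* caller keeps len <= 8 */
  int items[8];
} toy_array;

static toy_array *toy_new(int len) {
  toy_array *a = calloc(1, sizeof *a);
  if (!a) abort();
  a->refcnt = 1;
  a->len = len;
  return a;
}

/* Plays the role of jv_copy(): only the count changes. */
static toy_array *toy_copy(toy_array *a) {
  a->refcnt++;
  return a;
}

/* Plays the role of jv_free(): drop one reference, release at zero. */
static void toy_free(toy_array *a) {
  if (--a->refcnt == 0) free(a);
}

/* Plays the role of jv_array_set(): consumes a, returns the new version.
 * The caller guarantees 0 <= idx < a->len. */
static toy_array *toy_set(toy_array *a, int idx, int value) {
  if (a->refcnt > 1) {            /* shared: copy before writing */
    toy_array *fresh = toy_new(a->len);
    memcpy(fresh->items, a->items, sizeof a->items);
    toy_free(a);                  /* give up our reference to the original */
    a = fresh;
  }
  a->items[idx] = value;          /* sole owner here: write in place */
  return a;
}
```

With a single reference, toy_set() returns the same pointer it was given. After toy_copy(), the next toy_set() allocates a fresh array and leaves the original untouched.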
