benejson by codehero

benejson is a buffering SAX-style JSON parser library. The library package contains 3 major components:

Dependencies

libc type definitions, STL exceptions

The C/C++ code builds with SCons. Contributions of setup scripts (make, ebuilds, etc.) are welcome.

MIT/X License

David Bender (codehero@gmail.com)
Mailing List (benejson@librelist.com)

ZERO use of malloc and global variables in core and pull parser
Additional support for NaN and Infinity values
The user can provide an alphabetically sorted array of character strings against which the library will match map keys. The library tags each value with the index of its key in the user's array, which speeds up value identification within maps.
C core provides both callback and return (pulling) style parsing
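
The key-matching idea can be sketched as a binary search over the user's sorted array. This is only an illustration of the technique, not benejson's implementation: `match_key`, `user_keys`, and the `KEY_*` enum are hypothetical names, and the sketch assumes the array is sorted in `strcmp` order and that the key arrives as a pointer plus length (it need not be NUL-terminated).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key table; the enum values mirror the sorted array positions. */
enum { KEY_EMAIL, KEY_ID, KEY_NAME, KEY_COUNT };
static const char *const user_keys[KEY_COUNT] = { "email", "id", "name" };

/* Hypothetical helper, not part of benejson's API.
 * Returns the index of `key` in `sorted_keys`, or `count` if it is absent.
 * `sorted_keys` must be sorted in strcmp order. */
static size_t match_key(const char *const *sorted_keys, size_t count,
                        const char *key, size_t key_len)
{
    size_t lo = 0, hi = count;

    assert(sorted_keys != NULL && key != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const char *cand = sorted_keys[mid];
        size_t cand_len = strlen(cand);
        size_t n = cand_len < key_len ? cand_len : key_len;
        int cmp = memcmp(cand, key, n);

        /* Same prefix: the shorter string sorts first. */
        if (cmp == 0 && cand_len != key_len)
            cmp = cand_len < key_len ? -1 : 1;

        if (cmp == 0)
            return mid;
        if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return count; /* no match */
}
```

With a table like this, code handling a map can `switch` on the returned index (for example `KEY_ID`) rather than comparing strings for every value.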

Tree building libraries are convenient, but have many drawbacks:
Undesirable for large inputs, as the entirety of the input must be parsed before the user sees the first glimpse of data.
The input tree's structure could require conversion to a different tree structure.
They are typically built on some kind of lower level parsing library anyway.
Unless you are working directly in JavaScript, it is impossible to deliver a Tree class with general appeal.

Callback based libraries (cjson, yajl) typically require the user to define callback functions based on TYPE.
This is a poor approach because the parser is forced to eagerly interpret values before giving them to the user.
Thus, if the user wants to read a float, but the parser reads '1', then the parser calls the integer callback! yajl kludges around this by providing yet another callback.
Interpreting data is always based on SCOPE, so organizing callbacks by TYPE is incongruous.
These libraries waste cycles dynamically allocating memory to make large strings contiguous and in UTF-8 format.
Some users may want UTF-16 or UTF-32 strings. These libraries only deliver UTF-8 strings.
Some libraries refuse to support NaN or +/- Infinity values. Although against the spec, I personally have a Real Demand for these.

benejson only requires the user to define a single callback. The callback provides the user with two sets of information:

benejson classifies each datum as String, Numeric, or Special (true, false, null, and optionally NaN and +/- Infinity).
The user calls helper functions to convert the datum to the desired type (float, int, UTF-8 string, etc).
There are sensible restrictions on conversion (i.e., no conversion from Numeric to UTF-8 and vice versa).
benejson records the length of string fragments and reports how many bytes the fragment would require in UTF-8, UTF-16, and UTF-32.
benejson does not use a single piece of global state, especially not malloc(). There is no possibility of a memory leak.
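
The classify-first, convert-on-request approach described above can be sketched as follows. The types and functions here (`struct datum`, `datum_to_double`, `datum_to_long`) are hypothetical stand-ins invented for illustration; benejson's real structures and helper names differ. The sketch assumes the raw token text is available as a NUL-terminated string.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical types for illustration only; not benejson's API. */
enum datum_class { DATUM_STRING, DATUM_NUMERIC, DATUM_SPECIAL };

struct datum {
    enum datum_class cls; /* coarse class assigned by the parser */
    const char *text;     /* raw token text, NUL-terminated in this sketch */
};

/* Convert a Numeric datum to double. Returns 0 on success, -1 if the
 * datum is not Numeric or the text is not a complete number. */
static int datum_to_double(const struct datum *d, double *out)
{
    char *end;

    assert(d != NULL && out != NULL);
    if (d->cls != DATUM_NUMERIC)
        return -1;
    *out = strtod(d->text, &end);
    return (end != d->text && *end == '\0') ? 0 : -1;
}

/* Convert a Numeric datum to long. Returns 0 on success, -1 if the
 * datum is not Numeric, has a fractional part, or is out of range. */
static int datum_to_long(const struct datum *d, long *out)
{
    char *end;

    assert(d != NULL && out != NULL);
    if (d->cls != DATUM_NUMERIC)
        return -1;
    errno = 0;
    *out = strtol(d->text, &end, 10);
    return (errno == 0 && end != d->text && *end == '\0') ? 0 : -1;
}
```

Because the token `1` is only classified as Numeric, the caller who expects a float asks for a double and the caller who expects an integer asks for a long; the parser never has to guess.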

benejson forces the user to do their own string allocation. In keeping with the design goals, benejson may deliver strings in fragments. Depending on the circumstances, this is a good or a bad thing.
It is good because simple pieces of information like names, email addresses, etc are generally bounded to reasonably small values.
The PullParser interface lets the user read into stack-allocated strings, with no fear of overflow.
It is bad because the user may be expecting a large string (such as an email) which benejson would be forced to deliver in chunks.
However, much like one would read a file in chunks, the same strategy may be employed with benejson and large strings.
The user may also opt to use a raw data buffer size exceeding the largest allowable input, which eliminates string fragmentation.
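
One way to consume fragmented strings safely is to append each fragment to a fixed-size buffer and record whether anything was cut off. The sketch below is not benejson code: `struct bounded_str`, `bounded_init`, and `bounded_append` are hypothetical helpers, and the fragments are assumed to arrive as pointer and length pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator for string fragments; not part of benejson. */
struct bounded_str {
    char *buf;     /* caller-owned storage, e.g. a stack array */
    size_t cap;    /* capacity of buf, including the terminating NUL */
    size_t len;    /* bytes stored so far, excluding the NUL */
    int truncated; /* set to 1 once any fragment did not fit */
};

static void bounded_init(struct bounded_str *s, char *buf, size_t cap)
{
    assert(s != NULL && buf != NULL && cap > 0);
    s->buf = buf;
    s->cap = cap;
    s->len = 0;
    s->truncated = 0;
    buf[0] = '\0';
}

/* Append one fragment, copying only as many bytes as still fit. */
static void bounded_append(struct bounded_str *s,
                           const char *frag, size_t frag_len)
{
    size_t room = s->cap - 1 - s->len;
    size_t n = frag_len < room ? frag_len : room;

    memcpy(s->buf + s->len, frag, n);
    s->len += n;
    s->buf[s->len] = '\0';
    if (n < frag_len)
        s->truncated = 1;
}
```

For a large value, the same loop over fragments could write each one to a file or a hash function instead of a buffer, much as one would process a file read in chunks.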

Please go to the Downloads page for the latest tagged release.

You can also clone the project with Git by running:

$ git clone git://github.com/codehero/benejson