benejson is a buffering SAX-style JSON parser library. The library package contains 3 major components:
- benejson.js: SAX-style parser written in JavaScript
- PullParser: A C++ class for JSON pull parsing
- Core: The parsing core with minimal dependencies
Dependencies
libc type definitions, STL exceptions
Install
The C/C++ code builds with scons. Contributions of setup scripts (make, ebuilds, etc.) are welcome.
License
MIT/X License
Authors
David Bender (codehero@gmail.com)
Contact
David Bender (codehero@gmail.com)
Mailing List (benejson@librelist.com)
Goals
- Easy integration into existing projects
- By design, make memory leaks impossible
- By design, use zero mutexes in multithreaded environments
- By design, eliminate locale dependent parsing issues
- Provide an intuitive and modular parser for use in JavaScript
- Provide an easy to use C++ library
- Provide extensibility to arbitrarily sized numbers
Special Features
- ZERO use of malloc and global variables in core and pull parser
- Additional support for NaN and Infinity values
- The user can provide an alphabetically sorted array of character strings against which the library will match map keys. The library tags each value with the index of its key in the user's array, which speeds up value identification within maps.
- C core provides both callback and return (pulling) style parsing
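The sorted key array feature can be pictured with a small sketch. This is not benejson's code or API: `match_key`, `key_less`, `kKeys` and the `KEY_*` constants are hypothetical names, and the sketch assumes "alphabetically sorted" means byte-wise `strcmp` order. It only shows why a sorted array lets a map key be turned into an integer index with a binary search, so that values can then be identified by index instead of by repeated string comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Byte-wise string ordering; the key array must be sorted with this same order.
static bool key_less(const char* a, const char* b) {
    return std::strcmp(a, b) < 0;
}

// Hypothetical helper (not a benejson function): returns the index of `key`
// in `sorted_keys`, or `count` when the key is not in the array.
std::size_t match_key(const char* const* sorted_keys, std::size_t count,
                      const char* key) {
    const char* const* end = sorted_keys + count;
    assert(std::is_sorted(sorted_keys, end, key_less));  // precondition
    // Binary search for the first element that is not less than `key`.
    const char* const* it = std::lower_bound(sorted_keys, end, key, key_less);
    if (it != end && std::strcmp(*it, key) == 0)
        return static_cast<std::size_t>(it - sorted_keys);
    return count;
}

// Illustrative key table; each constant mirrors its key's position.
static const char* const kKeys[] = {"email", "id", "name"};
constexpr std::size_t KEY_EMAIL = 0, KEY_ID = 1, KEY_NAME = 2, KEY_COUNT = 3;
```

With a table like this, user code can `switch` on the returned index (`KEY_EMAIL`, `KEY_ID`, ...) when handling each value, and treat `KEY_COUNT` as "unknown key".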
Applications
- Parse CouchDB responses without downloading entire web request
- Basis for JSON parsing bindings in Python, Ruby, Node.js
- Use in embedded systems
- jsonoise could be used to test input validation
What's Needed
- Add support for more build systems (make, Visual Studio, etc)
- Install packages (ebuild, .deb, .rpm, etc)
- Test scripts
- Efficiency measurements, benchmarks
Motivations
- Why not use an existing tree building library:
- Tree building libraries are convenient, but have many drawbacks:
- Undesirable for large inputs, as the entirety of the input must be parsed before the user sees the first glimpse of data.
- The input tree's structure could require conversion to a different tree structure.
- They are typically built on some kind of lower level parsing library anyway.
- Unless you are working directly in JavaScript, it is impossible to deliver a Tree class with general appeal.
- Why not use an existing callback library:
- Callback based libraries (cjson, yajl) typically require the user to define callback functions based on TYPE.
- This is a poor approach because the parser is forced to eagerly interpret values before giving them to the user.
- Thus, if the user wants to read a float, but the parser reads '1', then the parser calls the integer callback! yajl kludges around this by providing yet another callback.
- Interpreting data is always based on SCOPE, so organizing callbacks by TYPE is incongruous.
- These libraries waste cycles dynamically allocating memory to make large strings contiguous and in UTF-8 format.
- Some users may want UTF-16 or UTF-32 strings. These libraries only deliver UTF-8 strings.
- Some libraries refuse to support NaN or +/- Infinity values. Although against the spec, I personally have a Real Demand for these.
- Why use benejson:
- benejson only requires the user to define a single callback. The callback provides the user with two sets of information:
- Change in JSON context (up/down the stack, array or map).
- How many data were read.
- benejson will classify each datum as a String, Numeric, or Special (true, false, null, and optionally NaN and +/- Infinity).
- The user calls helper functions to convert the datum to the desired type (float, int, UTF-8 string, etc).
- There are sensible restrictions on conversion (i.e., no conversion from Numeric to UTF-8 and vice versa).
- benejson records the length of string fragments and reports how many bytes the fragment would require in UTF-8, UTF-16, and UTF-32.
- benejson does not use a single piece of global state, especially not malloc(). There is no possibility of a memory leak.
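The difference between eager, type-based callbacks and classification followed by user-driven conversion can be sketched as follows. This is an illustration only and does not reproduce benejson's real types or helper functions: `DatumClass`, `Datum` and `to_double` are hypothetical names, and the conversion is deliberately simplified (no exponent support, and it assumes the tokenizer has already validated the number). It is written without locale-dependent library calls, in keeping with the stated goals.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical classification: the parser only says WHAT KIND of datum it saw.
enum class DatumClass { String, Numeric, Special };

// Hypothetical datum: the raw characters are kept uninterpreted.
struct Datum {
    DatumClass cls;
    std::string text;
};

// The caller decides the target type. Only Numeric data may become a double;
// String and Special data are refused. Simplified: digits with an optional
// leading '-' and an optional single '.', no exponent.
bool to_double(const Datum& d, double& out) {
    if (d.cls != DatumClass::Numeric) return false;
    const std::string& s = d.text;
    std::size_t i = 0;
    const bool negative = !s.empty() && s[0] == '-';
    if (negative) ++i;
    double mantissa = 0.0, divisor = 1.0;
    std::size_t digits = 0;
    bool in_fraction = false;
    for (; i < s.size(); ++i) {
        if (s[i] == '.' && !in_fraction) { in_fraction = true; continue; }
        if (s[i] < '0' || s[i] > '9') return false;
        mantissa = mantissa * 10.0 + (s[i] - '0');
        if (in_fraction) divisor *= 10.0;  // one power of ten per fraction digit
        ++digits;
    }
    if (digits == 0) return false;
    out = (negative ? -mantissa : mantissa) / divisor;
    return true;
}

// Usage: the text "1" is read as a double because the caller asked for one,
// while a String datum with the same characters is not converted.
void lazy_conversion_example() {
    Datum one{DatumClass::Numeric, "1"};
    double value = 0.0;
    assert(to_double(one, value) && value == 1.0);
    Datum name{DatumClass::String, "1"};
    assert(!to_double(name, value));
}
```

Because the decision to read a float is made by the code that knows the current scope, there is no need for separate integer and float callbacks.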
- benejson caveats:
- benejson forces the user to do their own string allocation. To keep with the design goals, benejson may deliver strings in fragments. Depending on the circumstances this is a good or bad thing.
- It is good because simple pieces of information like names, email addresses, etc are generally bounded to reasonably small values.
- The PullParser interface lets the user read into stack allocated strings, with no fear of overflow.
- It is bad because the user may be expecting a large string (such as the body of an email) which benejson would be forced to deliver in chunks.
- However, much like one would read a file in chunks, the same strategy may be employed with benejson and large strings.
- The user may also opt to use a raw data buffer size exceeding the largest allowable input, which eliminates string fragmentation.
- The C core seems too complicated for ordinary use.
- Use the C++ Pull Parser
- I will implement an easier C++ callback interface if necessary.
- Build a binding for your favorite scripting language.
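The string caveat above (strings arriving in fragments, read into stack allocated storage without overflow) can be sketched roughly as below. `FixedString` is a hypothetical helper, not the PullParser interface: it assumes the parser hands over each fragment as a pointer and a byte length, and it shows one way to accumulate fragments into a fixed-size buffer while recording truncation instead of overflowing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-capacity string: holds up to N bytes plus a terminator,
// lives wherever it is declared (for example on the stack), never allocates.
template <std::size_t N>
struct FixedString {
    char data[N + 1] = {0};
    std::size_t length = 0;
    bool truncated = false;

    // Append one fragment. Copies at most the remaining capacity and sets
    // `truncated` when part of the fragment had to be dropped.
    void append(const char* fragment, std::size_t fragment_len) {
        assert(length <= N);  // invariant: never past capacity
        const std::size_t room = N - length;
        const std::size_t n = fragment_len < room ? fragment_len : room;
        std::memcpy(data + length, fragment, n);
        length += n;
        data[length] = '\0';
        if (n < fragment_len) truncated = true;
    }
};
```

For a bounded field such as a name, a caller would declare something like `FixedString<64> name;`, call `append` once per fragment, and check `truncated` at the end. For a genuinely large string, the same loop can write each fragment to a file or stream instead, much like reading a file in chunks.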
Download
Please go to the Downloads page for the latest tagged release.
You can also clone the project with Git by running:
$ git clone git://github.com/codehero/benejson