Summary:
- Add http_parser_pause() API. A callback may invoke this at any time.
This will cause http_parser_parse() to return indicating that it
parsed less than the number of requested bytes and set an error to
HBE_PAUSED. A paused parser will fail with HBE_PAUSED until it is
un-paused with http_parser_pause().
- Stop using 'state', 'header_state', 'index', and 'nread' shadow
variables and then updating their http_parser fields when we're done.
Instead, update the live values as we go. This will make it possible
to return from anywhere in the parser (say, due to HBE_PAUSED) and have
valid/expected state.
- Update state before making callbacks so that if they want to pause,
we'll know the correct state already.
- Make sure that every callback has a state that uniquely identifies the
next step so that we can resume in the right place if we were supposed
to be paused.
- Clean up and refactor CALLBACK() macros.
- Use CALLBACK() macros for (almost) all callbacks; on_headers_complete
is still a special case. This includes on_body which we used to invoke
manually with a long run of bytes. We now use a 'body' mark and hit
its callback just like every other data callback.
- Clean up (most) gotos and replace with real states.
- Add some unit tests.
Fixes #70
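
The pause behaviour described above can be illustrated with a toy parser.
This is only a sketch under stated assumptions, not the http_parser code:
toy_parser, toy_execute(), toy_pause() and TOY_PAUSED are hypothetical
names invented for the example. It shows the live state being updated
before the callback runs, so an early return leaves the parser resumable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; none of these names come from http_parser. */
enum toy_state { S_WORD, S_SPACE };
enum toy_errno { TOY_OK = 0, TOY_PAUSED };

typedef struct toy_parser toy_parser;
typedef void (*toy_cb)(toy_parser *p);

struct toy_parser {
    enum toy_state state;  /* live state, updated as we go (no shadow copy) */
    enum toy_errno err;
    toy_cb on_word_end;    /* fired when a space ends a word */
    int words;
};

/* Callable from inside a callback (paused = 1) or by the caller (paused = 0). */
static void toy_pause(toy_parser *p, int paused) {
    assert(p != NULL);
    p->err = paused ? TOY_PAUSED : TOY_OK;
}

/* Returns the number of bytes consumed; fewer than len means we stopped early. */
static size_t toy_execute(toy_parser *p, const char *data, size_t len) {
    size_t i;
    if (p->err != TOY_OK) return 0;  /* a paused parser keeps failing */
    for (i = 0; i < len; i++) {
        char c = data[i];
        if (p->state == S_WORD && c == ' ') {
            p->state = S_SPACE;      /* update state BEFORE the callback */
            p->words++;
            if (p->on_word_end) p->on_word_end(p);
            if (p->err == TOY_PAUSED) return i + 1;  /* state is already valid */
        } else if (p->state == S_SPACE && c != ' ') {
            p->state = S_WORD;
        }
    }
    return len;
}

/* Example callback: pause after the first word. */
static void pause_on_first_word(toy_parser *p) {
    if (p->words == 1) toy_pause(p, 1);
}
```

A caller would feed the unconsumed tail back in after un-pausing; because
the state was stored before the callback, no extra bookkeeping is needed.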
- Add an http_parser_parse_url() method to parse a URL into its
constituent components. This uses the same underlying parser
as http_parser_parse() and doesn't do any data copies.
- Re-add the URL components in various test.c structures; validate
them when parsing.
- Get rid of support for these callbacks in http_parser_settings.
- Retain state transitions between different URL portions in
http_parser_execute() so that we're making the same correctness
guarantees as before.
- These are being removed because making multiple callbacks for the same
byte makes it more difficult to pause the parser.
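
To make "doesn't do any data copies" concrete, here is a minimal sketch of
how URL components might be reported as offset/length pairs into the
caller's buffer. The names toy_url, toy_field and toy_parse_url() are
hypothetical, and the splitter only handles the "/path?query#fragment"
form; it is not the parser this commit adds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical field ids; not the library's own. */
enum toy_field { TF_PATH, TF_QUERY, TF_FRAGMENT, TF_MAX };

struct toy_url {
    unsigned set;                        /* bitmask of fields that were seen */
    struct { size_t off, len; } f[TF_MAX];
};

/* Records where each component lives in buf; copies nothing.
 * Returns 0 on success, 1 if buf does not start with '/'. */
static int toy_parse_url(const char *buf, size_t len, struct toy_url *u) {
    enum toy_field cur = TF_PATH;
    size_t i;
    assert(buf != NULL && u != NULL);
    memset(u, 0, sizeof(*u));
    if (len == 0 || buf[0] != '/') return 1;
    u->set = 1u << TF_PATH;
    for (i = 0; i < len; i++) {
        enum toy_field next = cur;
        if (buf[i] == '?' && cur == TF_PATH) next = TF_QUERY;
        else if (buf[i] == '#' && cur != TF_FRAGMENT) next = TF_FRAGMENT;
        if (next != cur) {
            cur = next;
            u->set |= 1u << cur;
            u->f[cur].off = i + 1;       /* component starts after the delimiter */
            continue;
        }
        u->f[cur].len++;
    }
    return 0;
}
```

A caller that wants a component reads buf + off for len bytes, so the
original request buffer must stay alive while the result is in use.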
- Add http_errno enum w/ values for many parsing error conditions. Stash
this in http_parser.state if the 0x80 bit is set.
- Report line numbers on error generation if the (new) HTTP_PARSER_DEBUG
cpp symbol is set. Increases http_parser struct size by 8 bytes in
this case.
- Add http_errno_*() methods to help turn errno values into
human-readable messages.
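
One common way to keep an error enum and its messages in sync is an
X-macro table. The sketch below shows that pattern along with one possible
reading of "stash this in state if the 0x80 bit is set". All toy_* and
TOY_* names are hypothetical, and the assumption that the low 7 bits carry
the error code is mine, not something the commit states.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical error list; each entry defines the name and message once. */
#define TOY_ERRNO_MAP(XX)                      \
    XX(OK, "success")                          \
    XX(INVALID_METHOD, "invalid HTTP method")  \
    XX(INVALID_URL, "invalid URL")

enum toy_errno {
#define XX(name, desc) TOY_E_##name,
    TOY_ERRNO_MAP(XX)
#undef XX
    TOY_E_COUNT
};

static const struct { const char *name; const char *desc; } toy_errno_tab[] = {
#define XX(name, desc) { "TOY_E_" #name, desc },
    TOY_ERRNO_MAP(XX)
#undef XX
};

static const char *toy_errno_name(enum toy_errno e) {
    assert((int)e >= 0 && (int)e < TOY_E_COUNT);
    return toy_errno_tab[e].name;
}

static const char *toy_errno_description(enum toy_errno e) {
    assert((int)e >= 0 && (int)e < TOY_E_COUNT);
    return toy_errno_tab[e].desc;
}

/* Assumption: one state byte; high bit flags an error, low 7 bits hold the code. */
#define TOY_ERROR_BIT 0x80u

static unsigned char toy_stash_error(enum toy_errno e) {
    return (unsigned char)(TOY_ERROR_BIT | (unsigned)e);
}

static int toy_has_error(unsigned char state) {
    return (state & TOY_ERROR_BIT) != 0;
}

static enum toy_errno toy_get_error(unsigned char state) {
    return (enum toy_errno)(state & 0x7Fu);
}
```

Sharing the byte this way avoids growing the struct for a separate error
field, at the cost of limiting both states and error codes to 7 bits.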
This also fixes test failures from the previous commit.
It also adds support for the LOCK method, which was previously
missing.
This brings the size of http_parser from 44 bytes to 32 bytes. It
also makes the code substantially shorter, at a slight cost in
craziness.
This saves space in the structure (it is now 28 bytes on x86), and
makes the handling of content_length more consistent between chunked
encoding and non-chunked-encoding.
The *_mark members were actually being used as just boolean values to
the next call of the parser. However, you can calculate if the mark
members should be set or not purely based on the current state, so
they can just be gotten rid of entirely.
This is mostly done by using sized types instead of enums, and
reordering fields to allow better packing.
I also moved the 'upgrade' field out of the PRIVATE section and into
the READ-ONLY section, as I believe that it is supposed to be
non-private.
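
The effect of sized types and field reordering can be seen with two toy
structs. This is a sketch only: toy_loose and toy_tight are hypothetical
and do not mirror the real http_parser layout, and the exact sizes depend
on the platform's alignment rules.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type  { TOY_REQUEST, TOY_RESPONSE };
enum toy_state { TOY_S_START, TOY_S_DONE };

/* Enums are typically int-sized, and the pointer in the middle
 * forces padding around the smaller members. */
struct toy_loose {
    enum toy_type type;
    void *data;
    enum toy_state state;
    unsigned char flags;
};

/* Same information: small sized fields grouped together, pointer last. */
struct toy_tight {
    unsigned char type;
    unsigned char state;
    unsigned char flags;
    void *data;
};

/* Storing an enum in a byte only works while its values fit in 8 bits. */
static void toy_set_state(struct toy_tight *p, enum toy_state s) {
    assert((int)s >= 0 && (int)s <= 0xFF);
    p->state = (unsigned char)s;
}

/* Bytes saved by the tighter layout on the current platform. */
static size_t toy_saved_bytes(void) {
    return sizeof(struct toy_loose) - sizeof(struct toy_tight);
}
```

The trade-off is that the compiler no longer checks the enum type on
assignment, so range checks like the one in toy_set_state() become the
programmer's job.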
This does have some slight functional changes in cases where
MAX_FIELD_SIZE is hit. Specifically, if a URL is made up of many
components, each of which is smaller than MAX_FIELD_SIZE, but the
total together is greater than MAX_FIELD_SIZE, then we now might not
call callbacks for any of the components (even the ones that are
smaller than 80kb). With the old code, it was possible to get a
callback for query_string and never get a callback for the URL (or at
least the end of the URL that is past 80kb), if the callback for the
URL would have been larger than 80kb.
(to be honest, I'm surprised that the MAX_FIELD_SIZE is implemented in
http_parser at all, instead of requiring that callers pay attention to
it, as it feels like it should be the caller's responsibility)
That is, for a request parser do this:
http_parser_init(my_parser, HTTP_REQUEST)
for a response parser do this:
http_parser_init(my_parser, HTTP_RESPONSE)
Then http_parse_requests() and http_parse_responses() both turn
into http_parser_execute().
This costs
- a little space (10 bytes),
- a few extra calculations, and
- a new dependency on strncmp()
to dramatically simplify the code of parsing methods and support almost
arbitrary extension methods.
In the future I will do as NGINX does and not use strncmp but bit level
blob comparisons.
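
A table-driven lookup with strncmp() might look roughly like the sketch
below. The toy_method enum, the name table and toy_lookup_method() are
hypothetical and kept deliberately small; the point is that supporting a
new method only means adding a table entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical method ids; order must match toy_method_names. */
enum toy_method { TOY_GET, TOY_POST, TOY_LOCK, TOY_UNKNOWN };

static const char *const toy_method_names[] = { "GET", "POST", "LOCK" };

/* buf need not be NUL-terminated; len is the length of the method token. */
static enum toy_method toy_lookup_method(const char *buf, size_t len) {
    size_t i;
    assert(buf != NULL);
    for (i = 0; i < sizeof(toy_method_names) / sizeof(toy_method_names[0]); i++) {
        size_t n = strlen(toy_method_names[i]);
        /* Lengths must match so that "GE" or "GETX" is not accepted as "GET". */
        if (len == n && strncmp(buf, toy_method_names[i], n) == 0)
            return (enum toy_method)i;
    }
    return TOY_UNKNOWN;
}
```

The linear scan does a few more comparisons than a hand-written state
machine per method, which is the "few extra calculations" trade-off.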