- Add an http_parser_parse_url() method to parse a URL into its
constituent components. This uses the same underlying parser
as http_parser_parse() and doesn't do any data copies.
- Re-add the URL components in various test.c structures; validate
them when parsing.
- Treat ' ' specially, as apparently IIS6.0 can send this in headers.
Allow this character through if we're not in strict mode.
- Move some test code around so that test indices don't break when
HTTP_PARSER_STRICT changes.
Fixes #13.
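A minimal sketch of how the new http_parser_parse_url() entry point might be used, assuming the struct http_parser_url / UF_* names from the current header (member names are illustrative of that header, not guaranteed by this change):

    /* Extract the path component of a URL without copying. */
    #include <stdio.h>
    #include <string.h>
    #include "http_parser.h"

    int main(void) {
      const char *url = "http://example.com:8080/foo/bar?baz=1";
      struct http_parser_url u;

      if (http_parser_parse_url(url, strlen(url), 0 /* is_connect */, &u) != 0) {
        fprintf(stderr, "parse failed\n");
        return 1;
      }

      /* Each component is reported as an (offset, length) pair into the
       * original buffer, so the parser itself copies no data. */
      if (u.field_set & (1 << UF_PATH)) {
        printf("path = %.*s\n",
               (int)u.field_data[UF_PATH].len,
               url + u.field_data[UF_PATH].off);
      }
      return 0;
    }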
With HTTP/1.1, if neither Content-Length nor Transfer-Encoding is present,
section 4.4 of RFC 2616 says the response body extends until the connection
is closed, so http-parser needs to read it that way (unless it is a response
that must not include a body).
See also joyent/node#2457.
Fixes #72.
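A sketch of how a caller sees this, assuming the callback-style API from http_parser.h: body chunks keep arriving as data is read, and the caller reports EOF by executing the parser with a zero-length buffer, which triggers on_message_complete.

    #include <stdio.h>
    #include <string.h>
    #include "http_parser.h"

    static int on_body(http_parser *p, const char *at, size_t len) {
      (void)p;
      printf("body chunk: %.*s\n", (int)len, at);
      return 0;
    }

    static int on_complete(http_parser *p) {
      (void)p;
      printf("message complete\n");
      return 0;
    }

    int main(void) {
      /* No Content-Length, no Transfer-Encoding: body runs until close. */
      const char *resp = "HTTP/1.1 200 OK\r\n\r\nhello";
      http_parser parser;
      http_parser_settings settings;

      memset(&settings, 0, sizeof(settings));
      settings.on_body = on_body;
      settings.on_message_complete = on_complete;

      http_parser_init(&parser, HTTP_RESPONSE);
      http_parser_execute(&parser, &settings, resp, strlen(resp));

      /* Signal EOF with a zero-length execute so the parser can finish
       * the close-delimited body. */
      http_parser_execute(&parser, &settings, NULL, 0);
      return 0;
    }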
- Get rid of support for these callbacks in http_parser_settings.
- Retain state transitions between different URL portions in
http_parser_execute() so that we're making the same correctness
guarantees as before.
- These are being removed because making multiple callbacks for the same
byte makes it more difficult to pause the parser.
- These were only used by CALLBACK() (which then cleared the mark anyway)
and at the end of the http_parser_execute() body (after which they go out
of scope).
- Add http_errno enum w/ values for many parsing error conditions. The
error value is stashed in http_parser.state, flagged by setting the 0x80 bit.
- Report line numbers on error generation if the (new) HTTP_PARSER_DEBUG
cpp symbol is set. Increases http_parser struct size by 8 bytes in
this case.
- Add http_errno_*() methods to help turn errno values into
human-readable messages (see the sketch below).
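A sketch of the error-reporting path, assuming the HTTP_PARSER_ERRNO() accessor and the http_errno_name()/http_errno_description() helpers as they appear in the current http_parser.h:

    #include <stdio.h>
    #include <string.h>
    #include "http_parser.h"

    int main(void) {
      const char *junk = "GET / HTP/1.1\r\n\r\n";   /* malformed version */
      http_parser parser;
      http_parser_settings settings;
      size_t nparsed;

      memset(&settings, 0, sizeof(settings));
      http_parser_init(&parser, HTTP_REQUEST);

      nparsed = http_parser_execute(&parser, &settings, junk, strlen(junk));
      if (nparsed != strlen(junk)) {
        /* Parsing stopped early: ask the parser what went wrong. */
        enum http_errno err = HTTP_PARSER_ERRNO(&parser);
        fprintf(stderr, "error %s: %s\n",
                http_errno_name(err), http_errno_description(err));
      }
      return 0;
    }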
- When handling upgraded bodies, http_parser_execute() used to return
one fewer byte parsed than expected. This caused the final LF to be
interpreted by the caller as part of the body.
- Add a bunch of upgrade body unit tests.
The normal value callback is called for subsequent lines; leading LWS is
skipped (see the sketch below, after the RFC quote).
Note that the \t whitespace character is now accepted after the header field name.
RFC 2616, Section 2.2
"HTTP/1.1 header field values can be folded onto multiple lines if the
continuation line begins with a space or horizontal tab. All linear
white space, including folding, has the same semantics as SP. A
recipient MAY replace any linear white space with a single SP before
interpreting the field value or forwarding the message downstream."
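A sketch of how a folded value arrives at the application, assuming the callback-style API; the exact chunk boundaries are up to the parser, so the callback just reports whatever spans it is handed:

    #include <stdio.h>
    #include <string.h>
    #include "http_parser.h"

    static int on_header_value(http_parser *p, const char *at, size_t len) {
      (void)p;
      /* Per this change, the value callback fires once per line, with the
       * continuation line's leading LWS skipped; a caller wanting RFC 2616
       * semantics can join the chunks with a single SP. */
      printf("value chunk: \"%.*s\"\n", (int)len, at);
      return 0;
    }

    int main(void) {
      const char *req =
          "GET / HTTP/1.1\r\n"
          "X-Folded: Value1\r\n"
          "\tValue2\r\n"
          "\r\n";
      http_parser parser;
      http_parser_settings settings;

      memset(&settings, 0, sizeof(settings));
      settings.on_header_value = on_header_value;

      http_parser_init(&parser, HTTP_REQUEST);
      http_parser_execute(&parser, &settings, req, strlen(req));
      return 0;
    }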
- Add IS_ALPHA(), IS_NUM(), IS_HOST_CHAR(), etc. macros for determining
membership in a character class. HTTP_PARSER_STRICT causes some of
these definitions to change.
- Support '_' character in hostnames in non-strict mode.
- Support leading digits in hostnames when the method is HTTP_CONNECT.
- Don't re-define HTTP_PARSER_STRICT in http_parser.h if it's already
defined.
- Tweak Makefile to run non-strict-mode unit tests. Rearrange non-strict
mode unit tests in test.c.
- Add test_fast to .gitignore.
Fixes #44.
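A sketch of what such character-class macros can look like; the actual definitions in http_parser.c differ, and some of them change when HTTP_PARSER_STRICT is defined:

    #include <stdio.h>

    #define IS_NUM(c)      ((c) >= '0' && (c) <= '9')
    #define IS_ALPHA(c)    (((c) >= 'a' && (c) <= 'z') || ((c) >= 'A' && (c) <= 'Z'))
    #define IS_ALPHANUM(c) (IS_ALPHA(c) || IS_NUM(c))

    #if HTTP_PARSER_STRICT
    # define IS_HOST_CHAR(c) (IS_ALPHANUM(c) || (c) == '.' || (c) == '-')
    #else
      /* Non-strict mode additionally admits '_' in hostnames. */
    # define IS_HOST_CHAR(c) (IS_ALPHANUM(c) || (c) == '.' || (c) == '-' || (c) == '_')
    #endif

    int main(void) {
      printf("IS_HOST_CHAR('_') = %d\n", IS_HOST_CHAR('_'));
      return 0;
    }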
- This is non-spec behavior, but it appears that most HTTP servers
implicitly support non-ASCII characters when parsing path components.
Extend http-parser to allow this.
- Fill out slots [128, 256) in normal_url_char[] with 1 so that these
high octets are accepted in path components.
- Add unit test for paths that include such non-ASCII characters.
Fixes #37.
Check for overflow while parsing the chunk trailer by removing an unnecessary check from the PARSING_HEADER macro. This forces the parser to abort if the chunk trailer contains more than HTTP_MAX_HEADER_SIZE bytes of data.
Without this change, it is possible to trigger an assertion failure by
continuing to call http_parser_execute after it has returned an error.
Specifically, the parser could be called with parser->state ==
s_chunk_size_almost_done and parser->flags & F_CHUNKED set. Then,
F_CHUNKED could have been cleared, and an error could be hit. In this
case, the parser would have returned with F_CHUNKED clear, but
parser->state == s_chunk_size_almost_done, resulting in an assertion
failure on the next call.
There are alternate solutions possible, including just saving all of
the fields (state included) on error.
I didn't add a test case because this is a bit annoying to test, but I
can add one if necessary.
acceptable_header[x] is always assigned to a variable of type char, so
the 'unsigned' is unnecessary.
The other arrays can be of type int8_t/uint8_t to save space.
Yay valgrind testing
I don't believe that this actually mattered at all, because state was
initialized correctly, and flags would be set to 0 almost immediately
anyway.
This matters because char is signed by default on x86, so bytes with
values above 127 could have theoretically survived a pass through
lowcase (assuming that there was some non-zero data before the lowcase
array).
This also fixes test failures from the previous commit.
It also adds support for the LOCK method, which was previously
missing.
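A small illustration of the signed-char pitfall described above; the table here is a stand-in, not the parser's actual lowcase[] contents:

    #include <stdio.h>

    int main(void) {
      static char lowcase[256];
      int i;
      char ch = (char)0xE9;             /* a high octet, e.g. from a UTF-8 path */

      for (i = 'A'; i <= 'Z'; i++)      /* stand-in table: map A-Z to a-z */
        lowcase[i] = (char)(i - 'A' + 'a');

      /* BUG: if char is signed (as on x86), ch is negative here and this
       * indexes before the start of the array:
       *   char bad = lowcase[ch];
       * FIX: force the index into [0, 255] first. */
      char good = lowcase[(unsigned char)ch];

      printf("mapped to %d\n", good);   /* 0: high octets are rejected */
      return 0;
    }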
This brings the size of http_parser from 44 bytes to 32 bytes. It
also makes the code substantially shorter, at a slight cost in
craziness.
This saves space in the structure (it is now 28 bytes on x86), and
makes the handling of content_length more consistent between chunked
encoding and non-chunked-encoding.
This fixes a possible issue where a very large body (one that involves
> 80*1024 calls to http_parser_execute) will cause the next request
with that parser to return an error because it believes that this is
an overflow condition.
The *_mark members were really only being used as boolean values carried
over to the next call of the parser. However, whether the marks should be
set can be determined purely from the current state, so they can be removed
entirely.
This does have some slight functional changes in cases where
MAX_FIELD_SIZE is hit. Specifically, if a URL is made up of many
components, each of which is smaller than MAX_FIELD_SIZE but whose
total together is greater than MAX_FIELD_SIZE, then we now might not
call callbacks for any of the components (even the ones that are
smaller than 80kb). With the old code, it was possible to get a
callback for query_string and never get a callback for the URL (or at
least the end of the URL that is past 80kb) if the callback for the
URL would have been larger than 80kb.
(to be honest, I'm surprised that the MAX_FIELD_SIZE is implemented in
http_parser at all, instead of requiring that callers pay attention to
it, as it feels like it should be the caller's responsibility)
That is, for a request parser do this:
http_parser_init(my_parser, HTTP_REQUEST)
for a response parser do this:
http_parser_init(my_parser, HTTP_RESPONSE)
Then http_parse_requests() and http_parse_responses() both turn
into http_parser_execute().
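A sketch of the unified entry point, written against the current header (which also takes a separate http_parser_settings struct); the exact signature has shifted over time, so this is illustrative of the init/execute split only:

    #include <string.h>
    #include "http_parser.h"

    int main(void) {
      const char *req = "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n";
      http_parser parser;
      http_parser_settings settings;

      memset(&settings, 0, sizeof(settings));

      /* One init call selects the parser type... */
      http_parser_init(&parser, HTTP_REQUEST);    /* or HTTP_RESPONSE */
      /* ...and a single execute function handles either kind of message. */
      http_parser_execute(&parser, &settings, req, strlen(req));
      return 0;
    }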
This
- sacrifices a little space (10 bytes),
- adds a few extra calculations, and
- introduces a dependency on strncmp()
to dramatically simplify the code for parsing methods and to support almost
arbitrary extension methods.
In the future I will do as NGINX does and use bit-level blob comparisons
instead of strncmp().
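A sketch of the strncmp() approach described above (the method list and helper name are illustrative, not the parser's code): branch on the first byte, then compare the rest against the small set of known methods.

    #include <stdio.h>
    #include <string.h>

    static const char *match_method(const char *buf, size_t len) {
      switch (buf[0]) {
        case 'G':
          if (len >= 3 && strncmp(buf, "GET", 3) == 0) return "GET";
          break;
        case 'P':
          if (len >= 4 && strncmp(buf, "POST", 4) == 0) return "POST";
          if (len >= 3 && strncmp(buf, "PUT", 3) == 0) return "PUT";
          break;
        case 'D':
          if (len >= 6 && strncmp(buf, "DELETE", 6) == 0) return "DELETE";
          break;
        default:
          break;
      }
      return NULL;  /* unknown / extension method */
    }

    int main(void) {
      printf("%s\n", match_method("POST /x HTTP/1.1", 16));
      return 0;
    }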
Trashing the old Ragel parser (which was based on Mongrel) because it's
proving difficult to get the control I need in end-of-message cases.
Replacing this with a hand-written parser using a couple of tricks borrowed
from NGINX. The new parser will be much more work to write, but should prove
faster and allow for better hacking.