From 9b6b8c288cae972f5e219f30507314db4fe49f9f Mon Sep 17 00:00:00 2001
From: Sergey Shepelev
Date: Mon, 27 Jul 2009 17:27:25 +0400
Subject: [PATCH] more detailed description of callbacks workflow

---
 README.md | 58 ++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index e0c2b9c..22bb0bc 100644
--- a/README.md
+++ b/README.md
@@ -52,13 +52,6 @@ When data is received on the socket execute the parser and check for errors.
       // handle error. usually just close the connection
     }
 
-During the `http_parser_execute()` call, the callbacks set in `http_parser`
-will be executed. The parser maintains state and never looks behind, so
-buffering the data is not necessary. If you need to save certain data for
-later usage, you can do that from the callbacks. (You can also `read()` into
-a heap allocated buffer to avoid copying memory around if this fits your
-application.)
-
 Scalar valued message information such as `status_code`, `method`, and the
 HTTP version are stored in the parser structure. This data is only
 temporarlly stored in `http_parser` and gets reset on each new message. If
@@ -74,6 +67,57 @@ need to inspect the body. Decoding gzip is non-neglagable amount of
 processing (and requires making allocations). HTTP proxies using this parser,
 for example, would not want such a feature.
 
+Callbacks
+---------
+
+During the `http_parser_execute()` call, the callbacks set in `http_parser`
+will be executed. The parser maintains state and never looks behind, so
+buffering the data is not necessary. If you need to save certain data for
+later usage, you can do that from the callbacks.
+
+There are two types of callbacks:
+
+* notification `typedef int (*http_cb) (http_parser*);`
+    Callbacks: on_message_begin, on_headers_complete, on_message_complete.
+* data `typedef int (*http_data_cb) (http_parser*, const char *at, size_t length);`
+    Callbacks: (requests only) on_path, on_query_string, on_uri, on_fragment,
+    (common) on_header_field, on_header_value, on_body.
+
+If you parse an HTTP message in chunks (e.g. `read()` the request line
+from the socket, parse, read half the headers, parse, etc.), your data callbacks
+may be called more than once. http-parser only guarantees that the data pointer
+is valid for the lifetime of the callback. You can also `read()` into a
+heap-allocated buffer to avoid copying memory around if this fits your application.
+
+Reading headers can be tricky if you read/parse headers partially.
+Basically, you need to remember whether the last header callback was a field or
+a value and apply the following logic:
+
+    /* on_header_field and on_header_value shortened to on_h_*
+     ------------------------ ------------ --------------------------------------------
+    | State (prev. callback) | Callback   | Description/action                         |
+     ------------------------ ------------ --------------------------------------------
+    | nothing (first call)   | on_h_field | Allocate new buffer and copy callback data |
+    |                        |            | into it                                    |
+     ------------------------ ------------ --------------------------------------------
+    | value                  | on_h_field | New header started.                        |
+    |                        |            | Copy current name,value buffers to headers |
+    |                        |            | list and allocate new buffer for new name  |
+     ------------------------ ------------ --------------------------------------------
+    | field                  | on_h_field | Previous name continues. Reallocate name   |
+    |                        |            | buffer and append callback data to it      |
+     ------------------------ ------------ --------------------------------------------
+    | field                  | on_h_value | Value for current header started. Allocate |
+    |                        |            | new buffer and copy callback data to it    |
+     ------------------------ ------------ --------------------------------------------
+    | value                  | on_h_value | Value continues. Reallocate value buffer   |
+    |                        |            | and append callback data to it             |
+     ------------------------ ------------ --------------------------------------------
+     */
+
+See http://gist.github.com/155877 for a partial example of reading in
+headers.
+
 Releases
 --------
 