utf8 strings, Cowboy bindings

After a quick note about changes to the compiler, I’m going to walk through an example using the Cowboy web server and the jsone JSON parser.

UTF8 String representation

A week or so ago I merged the 0.10.5 changes from purescript master into the Erlang backend. In doing so, and following some internal changes and discussion, I decided to change the compiled representation of the String type (and hence of string constants in source) from Erlang strings to UTF8 binaries. I think this is in keeping with the direction Erlang is going and will give good interoperation with Erlang libraries; it’s probably the “modern” thing to do. On the other hand, it should be noted that this is not in keeping with the main PureScript compiler, which views strings as lists of UTF16 code units (in particular, not code points, and including the possibility of lone surrogates etc).
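To make the difference between the two representations concrete, here is a small Erlang sketch using only the standard unicode module. The module name strings_demo is hypothetical and is not part of the backend; it only illustrates how the same text looks as a list of code points, as a UTF8 binary, and as UTF16 code units.

```erlang
-module(strings_demo).
-export([demo/0]).

%% Hypothetical module, for illustration only.
demo() ->
    %% An Erlang string is a list of code points: "h", "é" (233), "l", "l", "o".
    AsList = [104, 233, 108, 108, 111],
    %% The same text as a UTF-8 encoded binary.
    AsBinary = unicode:characters_to_binary(AsList),
    %% "é" takes two bytes (195, 169) in UTF-8.
    AsBinary = <<104, 195, 169, 108, 108, 111>>,
    5 = length(AsList),
    6 = byte_size(AsBinary),
    %% Converting back recovers the list of code points.
    AsList = unicode:characters_to_list(AsBinary),
    %% A code point outside the BMP is one list element,
    %% but two 16-bit code units (a surrogate pair) in UTF-16.
    Emoji = [16#1F600],
    Utf16 = unicode:characters_to_binary(Emoji, unicode, utf16),
    2 = byte_size(Utf16) div 2,
    {AsList, AsBinary}.
```

Each pattern match doubles as an assertion, so calling strings_demo:demo() fails with a badmatch if any of the claims in the comments were wrong.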

Incidentally, doing this has prompted fixes for various issues with source containing Unicode characters, e.g. characters that end up in atoms and variables in the output Erlang code. (For clarity, output is now officially UTF8 Erlang source, as per recent Erlang releases.)

OTP projects

A PureScript project typically contains a src/ directory at the root, containing PureScript source files, and generates compiled output in output/. The hello world example I showed before called erlc directly, but most Erlang projects follow the OTP standard project structure; in particular they will have a src/ directory at the top level containing Erlang source.

I don’t know the best solution, but for this example I’m using the following structure:

src/ - Erlang source files *.erl
ps/ - PureScript “project root”
- ps/src/ - PureScript source files

The project builds with Rebar3, plus a Makefile which calls pserlc and copies all *.erl output files into src/. For this example cowboy and jsone are added as dependencies in rebar.config.
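As a rough idea of the dependency declaration, a minimal rebar.config might look like the sketch below. This is an assumption rather than the file from the example project: it pulls both packages from Hex without pinning versions, whereas targeting a specific Cowboy 2.0.0 pre-release would need an explicit version or git reference that is not given here.

```erlang
%% rebar.config (sketch; unpinned Hex dependencies are an assumption)
{erl_opts, [debug_info]}.

{deps, [
    cowboy,
    jsone
]}.
```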

Cowboy

I’ve made a start on some Cowboy bindings, targeting the 2.0.0 pre-release series of the Cowboy HTTP server.

For the time being there is one caveat before we get started. The main application, and in particular the handler modules for cowboy routes, are defined in Erlang code. The application itself:

start(_StartType, _StartArgs) ->
    {ok, Pid} = pscowboytest_sup:start_link(),
    Routes = [ {
      '_',
      [
        {"/json/[...]", json_handler, []},
        {"/[...]", root_handler, []}
      ]
      } ],
    Dispatch = cowboy_router:compile(Routes),

    NumAcceptors = 10,
    TransOpts = [ {ip, {0,0,0,0}}, {port, 8081} ],
    ProtoOpts = [{env, [{dispatch, Dispatch}]}],

    {ok, _} = cowboy:start_http(the_http_listener,
        NumAcceptors, TransOpts, ProtoOpts),
    {ok, Pid}.

And a handler:

-module(root_handler).
-export([init/2, terminate/3]).

init(Req, _Opts) -> ((cowboyTest@ps:handlerM())(Req))(no_state).

terminate(_Reason, _Req, _State) -> ok.
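The nested call in init/2 follows from how curried PureScript functions show up on the Erlang side: one argument is applied at a time. The sketch below imitates that shape in plain Erlang. The module curried_demo and its functions are hypothetical stand-ins, and it assumes the result is an ordinary {ok, Req, State} tuple, which is what Cowboy expects back from init/2.

```erlang
-module(curried_demo).
-export([handler/0, call/2]).

%% Hypothetical stand-in for a compiled two-argument PureScript function:
%% an arity-0 Erlang function returning nested single-argument funs.
handler() ->
    fun(Req) ->
        fun(State) ->
            %% Assumed result shape: a plain 3-tuple tagged with ok.
            {ok, Req, State}
        end
    end.

%% Apply the arguments one at a time, as root_handler:init/2 does.
call(Req, State) ->
    ((handler())(Req))(State).
```

Calling curried_demo:call(req, no_state) walks through the same three applications as the real handler: fetch the function value, apply it to the request, then apply the result to the state.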

So with that voodoo in hand, we can define a handler. The Cowboy bindings specify a type Handler a for handlers with request state of type a (which we won’t use here). Handler a is defined as Req -> a -> Tuple3 Ok Req a, and various functions are provided for inspecting and modifying a Req. The handler below uses path and qs to extract the path and query string, and reply to send the response, including headers and body.

handler :: forall a. Handler a
handler req state =
  let headers = tuple2 "content-type" "text/plain" : nil
      response = "Hello! Path is " <> path req <> " and query string is " <> qs req
      req' = reply (StatusCode 200) headers response req
  in tuple3 ok req' state

Sure enough, when I hit the URL http://localhost:8081/hello-world?foobar with this application running, I see

Hello! Path is /hello-world and query string is foobar

Something more must be said about the type of reply. The reply function returns an updated Req object, which must be chained through subsequent calls. This is a little tedious, particularly as there is no benefit to be gained from these “functional updates” (as far as I know requests can’t be “forked”), though with the v2 Cowboy API there is no longer any need to chain the request object around for even simple read-only operations.

As an antidote to this I was toying with a monadic interface (as well as a little sugar):

infixl 6 tuple2 as ~~>

handlerM :: forall a. Handler a
handlerM = ReqM.handler $ do
  let headers = "content-type" ~~> "text/plain" : nil
  path <- ReqM.path
  qs <- ReqM.qs
  let response = "Hello! Path is " <> path <> " and query string is " <> qs
  ReqM.reply (StatusCode 200) headers response
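One way to picture what a monadic interface like this does is as state threading: each step takes the current request and returns a value together with the (possibly updated) request, and a bind function passes the request along. The Erlang sketch below shows that idea only. It is not the ReqM implementation; the module reqm_demo, its functions, and the use of a plain map in place of Cowboy’s request object are all assumptions made for illustration.

```erlang
-module(reqm_demo).
-export([run/1]).

%% A "computation" is a fun taking a Req and returning {Value, NewReq}.
%% Req is a plain map here, standing in for Cowboy's request object.

%% Run M, then feed its value to F, threading the request through both.
bind(M, F) ->
    fun(Req0) ->
        {Value, Req1} = M(Req0),
        (F(Value))(Req1)
    end.

%% Read-only steps leave the request unchanged.
path() -> fun(Req) -> {maps:get(path, Req), Req} end.
qs() -> fun(Req) -> {maps:get(qs, Req), Req} end.

%% A step that "updates" the request by recording a response in the map.
reply(Status, Body) ->
    fun(Req) -> {ok, Req#{response => {Status, Body}}} end.

%% The handler logic, written without mentioning Req until the very end.
run(Req) ->
    M = bind(path(), fun(Path) ->
            bind(qs(), fun(Qs) ->
                Body = <<"Hello! Path is ", Path/binary,
                         " and query string is ", Qs/binary>>,
                reply(200, Body)
            end)
        end),
    M(Req).
```

For example, reqm_demo:run(#{path => <<"/a">>, qs => <<"b">>}) returns ok paired with the input map extended by a response entry. The do block in the PureScript version plays the role of the nested bind calls.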

JSON

As a slight hint of something more than a hello world route, how about a route returning JSON? The purescript-erl-json bindings use the Erlang jsone library for JSON parsing, taking much the same approach as purescript-argonaut, and presenting much of the same API, including combinators and EncodeJson/DecodeJson.

The following handler makes use of this JSON encoding, again to simply parrot back the supplied parameters:

handlerJson :: forall a. Handler a
handlerJson req state =
  let headers = tuple2 "content-type" "application/json" : nil
      resp = ( "path" := path req
            ~> "query" := qs req
            ~> jsonEmptyObject )
      req' = reply (StatusCode 200) headers (printJson resp) req
  in tuple3 ok req' state

Again hitting http://localhost:8081/json/hello-world?foobar gives back some JSON:

{"path":"\/json\/hello-world","query":"foobar"}
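The escaped slashes in this output are nothing to worry about: "\/" is a legal JSON escape for "/", so any conforming parser restores the original path. The sketch below round-trips the same data through jsone directly from Erlang. The module json_demo is hypothetical, and it assumes the jsone dependency is on the code path.

```erlang
-module(json_demo).
-export([roundtrip/0]).

%% Hypothetical module; requires the jsone library to be available.
roundtrip() ->
    %% Encode a map with binary keys and values to a JSON binary.
    Encoded = jsone:encode(#{<<"path">> => <<"/json/hello-world">>,
                             <<"query">> => <<"foobar">>}),
    %% Decoding turns any "\/" escapes back into plain "/" characters.
    Decoded = jsone:decode(Encoded),
    {Encoded, Decoded}.
```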

What now?

So we can now throw together an HTTP server which doesn’t do very much, and encode some JSON. I think the JSON side of things is mostly OK. Maybe the big hole is the lack of any ability to share code with a front-end using the main JS-backed PureScript compiler (i.e. to share EncodeJson/DecodeJson instances), as I haven’t constructed purescript-erl-jsone as a fork of purescript-argonaut using the same namespaces etc. Maybe that can be papered over.

An obvious TODO is building out the Cowboy API coverage, which may be fairly mechanical. More interesting is finding a way of constructing handlers directly in PureScript (or even the main application?). There are two problems in general: one is constructing an output module which has the correct top-level names and uses the correct function representation (i.e. uncurried functions), and the other is representing attributes, e.g. to indicate that a module implements a behaviour.

The code extracts in this post are taken from an example application which puts everything together. Its status I can only describe as “works on my machine”.