Large json-schema usage with Reason


#1

Newbie Reason developer here!

I have a JS project which uses, for its serialization format, a JSON schema that is ~1500 lines (58 kB); the ajv library is used at runtime for validation, and the command-line ajv for debugging and checking.

I am aware of bs-json and I also watched Sean Grove’s talk “Having your cake and eating it too – GraphQL in Reason | ReasonConf 2018”, where he demoed a JSON -> ATD app (atdgen).

Alas, GraphQL is not an option for this project, currently. However, atdgen is pretty close to what I’m looking for. I skimmed through the docs at atd.readthedocs.io/en and I did not see any mention of JSON Schema.

I also noticed bs-ajv, and in the examples there the schema is represented in Reason code, not as a schema.json file (my scenario).

What I am wondering is: how would a Reason user approach bridging between JS and Reason in a large-ish JSON-schema case like this? What would the different options be for parsing and serializing JSON data with JSON-schema validation, ranging from the quick and dirty/unsafe to laboriously creating nested type defs in Reason?

It probably would not take long to code all the nested type defs in Reason (a day?), but once that is done there is pretty much no point in maintaining the schema.json file itself anymore; otherwise there would be two canonical definitions of the schema.

Just trying to get an idea of the options and the big picture. Thanks so much!


#2

Leaving aside atdgen (I don’t know how well that works with Reason/BuckleScript), I guess your choices are to keep the JSON schema and use bs-ajv, or to rewrite the schema as a JSON decoder using, say, bs-json. Personally, I would go for the former option: why rewrite what you already have, which is an industry standard, after all?

bs-ajv is an option, but it should be rather simple to write bindings to exactly the Ajv calls that you need, e.g. to call this (JavaScript):

var Ajv = require("ajv");
var ajv = new Ajv();
var valid = ajv.validate(schema, data);
if (!valid) console.log(ajv.errors);

You would need to write two bindings: one for validate and one for errors. I think the bindings would need to conform to the following signature:

module type Ajv = {
  let validate: (~schema: Js.Json.t, ~data: Js.Json.t) => bool;
  let errors: Js.undefined(Js.Json.t);
};
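
For reference, here’s a rough sketch of how such bindings might look in BuckleScript/Reason, binding directly to an Ajv instance (so errors is a getter that reflects the most recent validate call, a slight departure from the signature above). The module name AjvBinding and the sample schema/data are made up for illustration, so double-check the attribute details against the Ajv and BuckleScript versions you’re on:

module AjvBinding = {
  type t; /* an Ajv instance */

  /* compiles to: new (require("ajv"))() */
  [@bs.new] [@bs.module] external make: unit => t = "ajv";

  /* ajv.validate(schema, data) */
  [@bs.send] external validate: (t, Js.Json.t, Js.Json.t) => bool = "validate";

  /* ajv.errors -- undefined/null until a validation fails */
  [@bs.get] [@bs.return nullable] external errors: t => option(Js.Json.t) = "errors";
};

let ajv = AjvBinding.make();
let schema = Js.Json.parseExn({|{"type": "object", "required": ["a"]}|});
let data = Js.Json.parseExn({|{"b": true}|});

if (!AjvBinding.validate(ajv, schema, data)) {
  switch (AjvBinding.errors(ajv)) {
  | Some(errors) => Js.log(errors)
  | None => ()
  };
};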

#3

Yawar, thanks, I had not thought of it like that. I need to learn more about the Js.Json module. I’ll probably do what you suggested and keep the validation separate from the bs-json usage. In the current app the JSON is loaded into a Redux store, but it’s pretty messy. So bs-json will be useful as an alternative there, I think.


#4

Perfectly.


#5

Thanks, enlightening! :wink:

Anyway, after seeing your message I searched for more, and found your post https://tech.ahrefs.com/getting-started-with-atdgen-and-bucklescript-1f3a14004081

It’s a great tutorial. So I have a rough understanding of atdgen now: it’s a schema description language and a set of code generators, currently targeting JSON and a compressed binary format. So it’s similar to Thrift/Protobuf/etc., the main benefit in this case being that it generates JSON encoders and decoders. And after some setup you can get it to work with BuckleScript.

The problem is that to make this a reproducible setup, especially in your build environment, it looks like you’ll need to introduce OCaml and opam into your build. Perhaps install them in your Docker image and deploy that. Now depending on your Docker expertise and available time to invest in setup, this may be easy or hard.

The alternative is to use Ajv (pure JavaScript) to validate a JSON object parsed with Js.Json.parseExn (shipped with BuckleScript) and, if it passes validation, cast it and use it directly as the shape you need it to be. Here’s a small proof of concept:

let string = {|{"a": 1, "b": true}|};
let abJson = Js.Json.parseExn(string);
let printAPlusOne = objAB => Js.log(objAB##a + 1);

abJson |> Obj.magic |> printAPlusOne;

Yeah, Obj.magic is essentially a dynamic cast and thus unsafe, but you can wrap this whole thing up in a nice safe API. Something like:

/* Store.rei */
type t;
type validationErrors = Js.Json.t;

/** [load filename] loads the contents of [filename]. */
let load: string => Js.Result.t(t, validationErrors);
let getA: t => int;
let getB: t => bool;

And you can implement the load function to do the read from disk, Ajv validation, and Obj.magic cast on the happy path, and return the validation errors (or file read error, or what have you) on the sad path. This is way less poking around in the build infrastructure, and you stay in a pure JavaScript setup.
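
To make that concrete, here’s a minimal sketch of what the implementation could look like, assuming an Ajv module matching the signature I sketched earlier, Node’s synchronous file API (Node.Fs.readFileAsUtf8Sync, which ships with BuckleScript), and made-up file paths and field names:

/* Store.re -- rough sketch only */
type t; /* abstract: really just the validated JSON object */
type validationErrors = Js.Json.t;

let schema = Node.Fs.readFileAsUtf8Sync("schema.json") |> Js.Json.parseExn;

let load = filename => {
  let data = Node.Fs.readFileAsUtf8Sync(filename) |> Js.Json.parseExn;
  if (Ajv.validate(~schema, ~data)) {
    /* happy path: the unsafe cast, hidden behind the .rei */
    Js.Result.Ok((Obj.magic(data): t));
  } else {
    switch (Js.Undefined.toOption(Ajv.errors)) {
    | Some(errors) => Js.Result.Error(errors)
    | None => Js.Result.Error(Js.Json.null) /* shouldn't happen after a failed validate */
    };
  };
};

let getA = (store: t): int => Obj.magic(store)##a;
let getB = (store: t): bool => Obj.magic(store)##b;

The Obj.magic is confined to this one module; everything outside only ever sees the abstract t from the .rei.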

IMHO it makes sense to use codegen if you’re not deploying on a JavaScript platform, but if you’re using BuckleScript that investment makes less sense.


#6

If one can’t manage to install atdgen, one doesn’t deserve to ship an app that will be used by more than one person. Also, atdgen can be compiled to JavaScript using jsoo (example here), so it can be compiled once and the resulting JS file committed to one’s repo. Or one can just commit the files generated by atdgen to the repo (the _t and _bs files in this repo). So many ways to have something working, and no reproducibility issue. And woohoo, still a “pure JavaScript setup” (lol).

More seriously, one thing you missed is that atdgen generates types in addition to encoders and decoders. Those types can be complex: they don’t have to be objects, and they can even be composed of types you define yourself. So you can be basically sure you’ve built a valid value just by creating one that is correctly typed. Even better, you can share the same definition of a type between frontend and backend but use different representations. It seems more powerful than what Ajv provides. The downside is that you can’t reuse your existing JSON schema.


#7

If one can’t manage to install atdgen, one doesn’t deserve to ship an app that will be used by more than one person. … And woohoo, still a “pure JavaScript setup” (lol).

You’re sending the signal that your technical achievements put you on some kind of pedestal to belittle and make fun of people who want to keep things simple. I personally don’t accept this kind of behaviour and I strongly suggest you stop.

one thing you missed is that atdgen generates types in addition to encoders and decoders.

That’s not really a huge win in the circumstances. I would’ve had to first write down the types myself in the atdgen schema language. In fact, it would have been surprising (and would have made the encoders and decoders impossible to compile) if it didn’t also codegen the OCaml types from the atdgen schema.


#8

I am sending the signal that what you say is FUD. Running 3 commands to get a binary isn’t hard. So stop putting words in my mouth and watch your tone. I don’t like your arrogant behaviour. I didn’t claim to have special competence or to ship amazing stuff. All I said was that atdgen works perfectly with reason and bucklescript. Then you feel threatened, want to start an argument, and use terribly weak points to justify yourself. That’s on you.

If you don’t understand the benefits of high-level types in this kind of tool, I can’t do much for you.


#9

But it’s not just three commands, is it? Not even according to your own tutorial. And that doesn’t even address how to make that a reproducible setup in your CI/CD pipeline. I’m pointing out that there are tradeoffs and potential hidden time and effort investments, and that’s somehow FUD?

All I said was that atdgen works perfectly with reason and bucklescript.

Come on. You know exactly what you said and why I called you out. It’s right here in this thread. Your selectively forgetting parts of your replies isn’t me putting words in your mouth; it’s you arguing in bad faith.

If you don’t understand the benefits of high-level types in this kind of tool, I can’t do much for you.

I understood the benefits just fine; I’ve used both Protobuf and Thrift with all the codegen bells and whistles. Pointing out that there are costs along with the benefits doesn’t mean that I don’t understand the benefits.