As I mentioned in my previous post (the first one), we use Apache Thrift for lots of stuff at Via and we like it lots. Still, sometimes little things get our hackles up. One such little thing is the way Thrift checks types and required fields in Python (and, I presume, other scripting languages), which is to say the ways in which Thrift doesn't really check types and required fields in Python.

Let’s start with a little Thrift code (the full file is on GitHub).

namespace py avi.thrift.validation.example

struct Point {
	1: required double x;
	2: required double y;
}

struct Place {
	1: required string name;
	2: required Point location;
	3: optional string review;
}

This sample defines a 2-D point structure and a structure for holding a “place” (a store or business, say) that has a name, a location and an optional review. I compiled these structures to Python classes and started playing with them in IPython.

In [1]: from avi.thrift.validation.example.ttypes import Place, Point

In [2]: p = Point(x=1.5, y=2.75)

In [3]: p.validate()

In [4]: p = Point(x=1.5)

In [5]: p.validate()

TProtocolException: Required field y is unset!

So validation of Point structures works as expected: when a required field is absent, validate() raises an exception. Let’s try something similar with the Place structure:

In [9]: place = Place(name="avis place")

In [10]: place.validate()

TProtocolException: Required field location is unset!

In [11]: place = Place(name="avis place", location=Point(x=1.5))

In [12]: place.validate()

In [13]: place = Place(name="avis place", location="asdf")

In [14]: place.validate()

In [15]:

Note that validate() complains when place.location is None, but it doesn’t mind if place.location is set to an invalid (incomplete) Point structure. Worse yet, it doesn’t care if I set place.location to a string instead of a Point. The validate() method is completely indifferent to type information - it simply checks that required fields are not None. On top of that, validate() does not descend recursively into the attributes of the Place structure (or any other structure) to check that they too are valid.
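Here is a way to see this behavior without installing Thrift. The hand-written stand-in below mimics the shape of the generated validate() methods: one `is None` check per required field and nothing more. The classes, the local TProtocolException and the passes_validate() helper are illustrative stand-ins, not the real generated code.

```python
# Hand-written stand-ins that mimic the shape of Thrift's generated
# validate() methods. These are NOT the real generated classes.


class TProtocolException(Exception):
    """Stand-in for thrift.protocol.TProtocol.TProtocolException."""


class Point:
    def __init__(self, x=None, y=None):
        self.x = x
        self.y = y

    def validate(self):
        # Only checks that each required field is not None.
        if self.x is None:
            raise TProtocolException("Required field x is unset!")
        if self.y is None:
            raise TProtocolException("Required field y is unset!")


class Place:
    def __init__(self, name=None, location=None, review=None):
        self.name = name
        self.location = location
        self.review = review

    def validate(self):
        # No type check on location, and no recursion into it either.
        if self.name is None:
            raise TProtocolException("Required field name is unset!")
        if self.location is None:
            raise TProtocolException("Required field location is unset!")


def passes_validate(obj):
    """Return True if obj.validate() does not raise."""
    try:
        obj.validate()
    except TProtocolException:
        return False
    return True


print(passes_validate(Place(name="avis place")))                         # False
print(passes_validate(Place(name="avis place", location=Point(x=1.5))))  # True
print(passes_validate(Place(name="avis place", location="asdf")))        # True
```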

Coming from the world of Protocol Buffers, which provide deep, typesafe validation even in scripting languages (see IsInitialized()), I don’t like the behavior of the Thrift validate() method. It seems almost pointless to me. At the very least it should be renamed “shallow_validate()” or something of the sort. Now you might say “why not just serialize and deserialize the structure to check these things?” and indeed this works (sort of, more on this later). But Thrift serialization is a tad cumbersome, behaves somewhat funnily in Python and takes some time, to boot, so this serialize/deserialize approach is only really appropriate for testing. Fortunately, it is fairly easy to write a little recursive validator for Thrift structures, and I’ll talk about how I did that in the next post.
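The next post covers the approach in detail. As a rough illustration of the idea, and not necessarily what that post does, here is a minimal sketch. It assumes a hypothetical FIELDS dictionary on each class that maps a field name to its expected Python type and whether the field is required. The generated Thrift classes keep their field type information in a thrift_spec attribute instead, and its exact layout differs between Thrift versions, so a real validator would walk that rather than FIELDS. ValidationError and deep_validate() are likewise made-up names.

```python
# A minimal sketch of a deep, type-checking validator. FIELDS is a
# hypothetical per-class schema (name -> (expected_type, required)); it is
# not part of Thrift's generated code.


class ValidationError(Exception):
    pass


class Point:
    FIELDS = {"x": (float, True), "y": (float, True)}

    def __init__(self, x=None, y=None):
        self.x = x
        self.y = y


class Place:
    FIELDS = {"name": (str, True), "location": (Point, True), "review": (str, False)}

    def __init__(self, name=None, location=None, review=None):
        self.name = name
        self.location = location
        self.review = review


def deep_validate(obj, path=None):
    """Check required fields, field types and nested structs recursively."""
    path = path or type(obj).__name__
    for name, (expected, required) in type(obj).FIELDS.items():
        value = getattr(obj, name)
        where = f"{path}.{name}"
        if value is None:
            if required:
                raise ValidationError(f"Required field {where} is unset!")
            continue  # an unset optional field is fine
        if not isinstance(value, expected):
            raise ValidationError(
                f"{where} should be {expected.__name__}, got {type(value).__name__}"
            )
        # Recurse into nested structures (any type that declares FIELDS).
        if hasattr(expected, "FIELDS"):
            deep_validate(value, where)


for place in (
    Place(name="avis place", location=Point(x=1.5, y=2.75)),
    Place(name="avis place", location=Point(x=1.5)),
    Place(name="avis place", location="asdf"),
):
    try:
        deep_validate(place)
        print("ok")
    except ValidationError as e:
        print(e)

# Prints:
# ok
# Required field Place.location.y is unset!
# Place.location should be Point, got str
```

The strict isinstance(value, float) check means an int such as 1 is rejected for a double field. A real validator might want to accept ints there too.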
