NAME
    Qstruct - Qstruct perl interface

SYNOPSIS
        use Qstruct;

        Qstruct::load_schema(q{
          ## This is my schema

          qstruct MyPkg::PhoneNumber {
            number @0 string;
            ext @1 uint8;
          }

          qstruct MyPkg::User {
            id @0 uint64;
            name @1 string;
            is_admin @3 bool;
            is_moderator @4 bool;
            emails @2 string[];
            account_ids @5 uint64[];
            phones @7 MyPkg::PhoneNumber[];
            sha256_hash @6 uint8[32];
          }
        });

        ## Build a new user message:
        my $message = MyPkg::User->encode({
                        name => "jimmy",
                        id => 100,
                        is_admin => 1,
                        emails => [ 'jimmy@example.com', 'jim@jimmy.com' ],
                        sha256_hash => "\xFF"x32,
                        phones => [
                          { number => '555-1212' },
                          { number => '1234567', ext => 2 },
                        ],
                      });

        ## Load a user message:
        my $user = MyPkg::User->decode($message);

        ## Scalar accessors:
        print "User id: " . $user->id . "\n";
        print "User name: " . $user->name . "\n";
        print "*** ADMIN ***\n" if $user->is_admin;
        print "1st phone #: " . $user->phones->[0]->number . "\n";

        ## Zero-copy access to strings/blobs:
        $user->name(my $name);

        ## Zero-copy array iteration:
        $user->emails->foreach(sub {
          print "EMAIL is ", $_[0], "\n";
        });

        ## Zero-copy nested qstructs:
        $user->phones->foreach(sub {
          $_[0]->number(my $number);
          print $number, "\n";
        });

DESCRIPTION
    Qstruct is a binary serialisation format that requires a schema. This
    documentation describes the Qstruct perl module which is the reference
    dynamic-language implementation for qstructs. The specification for the
    qstruct format is documented here: Qstruct::Spec.

    Because in qstructs the "wire" and "in-memory" formats are the same, the
    "encode" and "decode" functions are somewhat mis-named. As soon as the
    object is built in memory it is ready to be copied out to disk or the
    network. Also, as soon as it is read or mapped into memory it is ready
    for accessing. So the "encode" and "decode" operations are mostly
    no-ops.

    This module is designed to be particularly efficient for reading qstructs.
    Numerics, strings, blobs, nested qstructs, and arrays of these types can
    all be randomly-accessed or iterated over without reading or parsing any
    unrelated parts of the message (qstructs are lazy). Furthermore, all
    copies of message data can be avoided -- only pointers into the message
    memory are recorded (qstructs are zero-copy).

    The encoder in this module is not exactly slow; it just does more
    memory allocations and copying than an optimised implementation would.
    The compiled static interface will probably be optimised for encoding
    eventually.

ZERO-COPY
    As shown in the synopsis, fields can be accessed simply by calling their
    corresponding methods on the objects representing decoded messages:

        ## Field access (copying)
        my $name = $user->name;

    However, due to the semantics of return values in perl, the above line
    of code allocates new memory and copies the "name" field into it. This
    is inefficient for two reasons.

    Firstly, the process of copying takes time. This time is proportional to
    how large the data is. Often this copying is unnecessary and therefore
    an inefficient use of time.

    Secondly, copying is inefficient because it impacts your memory system.
    If you aren't copying the data, you aren't paging it in from disk,
    pulling it into your filesystem/CPU caches, pushing other things out of
    cache, or exercising your CPU's translation lookaside buffer (TLB).

    Qstruct is always lazy when it comes to memory access: it will only
    access the bare-minimum memory required to fulfill accessor requests. If
    you wish to avoid copying, however, you need to pass an "output scalar"
    into the accessor method:

        ## Field access (zero-copy)
        $user->name(my $name);

    Passing these output scalars into methods to avoid copying is a common
    theme throughout the Qstruct perl module interface.

    This module is designed to work with modules like File::Map which map
    files into perl strings without actually copying them into memory, and
    also with modules like LMDB_File which interact with transactional
    in-process databases that support zero-copy. When combining Qstructs
    with these modules you can have true zero-copy access to a filesystem or
    database from your high-level perl code just as conveniently as with
    copying interfaces.

    For more information on zero-copy, see the Test::ZeroCopy module and the
    "t/zerocopy.t" test in this distribution that uses it.

ARRAYS
    When you call the accessor method on an array it returns a special
    overloaded object of type "Qstruct::ArrayRef". This object can
    (obviously) be accessed as an array reference:

        ## Array random access (copying)
        my $first_email = $user->emails->[0];

    Because of the lazy-loading nature of Qstructs, in the above code none
    of the other emails are accessed at all. If the message is in a
    memory-mapped file, the other emails might never even get paged in to
    memory (although emails are generally small enough that many of them
    can be stored together on the same page).

    Of course references can also be de-referenced and iterated over:

        ## Array iteration (copying)
        foreach my $email (@{ $user->emails }) {
          print "Email: ", $email, "\n";
        }

    The problem with the above approach is that while the elements are
    lazy-loaded, they are not zero-copy. In other words, for each element
    iterated over, perl allocates new memory and copies the element into it.

    In addition to acting as array refs, "Qstruct::ArrayRef" objects are
    also special objects with additional methods.
    The "get" method is similar to the random-access de-reference operation
    above except that you can pass an output scalar to it to get zero-copy
    behaviour:

        ## Array random access (zero-copy)
        $user->emails->get(0, my $first_email);

    Because the "my $first_email" scalar is passed in, the "get" method will
    populate it with a pointer into the underlying message-memory owned by
    the $user object.

    There is also a "len" method which of course means you can iterate over
    arrays:

        ## Array iteration (zero-copy)
        my $emails = $user->emails;
        for(my $i=0; $i < $emails->len; $i++) {
          $emails->get($i, my $email);
          print "Email: ", $email, "\n";
        }

    There is a short-cut "foreach" method that simplifies the above pattern:

        ## Array iteration short-cut (zero-copy)
        $user->emails->foreach(sub {
          print "Email: ", $_[0], "\n";
        });

    Arrays of qstructs work essentially the same as arrays of primitive
    types except that the elements are decoded objects convenient for
    traversal, for example:

        ## Arrays of qstructs
        $department->staff->employees->foreach(sub {
          my $employee = shift;
          print "Employee id: ", $employee->id, "\n";
          print "Employee name: ", $employee->name, "\n";
        });

RAW ARRAY ACCESS
    For fixed arrays of numeric types there are also raw accessors. For
    example, hash values are known-length values so it can make sense for
    them to be fixed arrays which are inlined in the message body for
    efficiency (see Qstruct::Spec for details).
    Such arrays are most likely best accessed with raw accessors:

        ## Whole-array access (copying)
        my $hash_value = $user->sha256_hash->raw;

    Of course there is a corresponding zero-copy interface:

        ## Whole-array access (zero-copy)
        $user->sha256_hash->raw(my $hash_value);

    When encoding messages, you can simply pass in an appropriately sized
    string and it will be treated as raw:

        use Digest::SHA;

        my $msg = MyPkg::User->encode({
                    sha256_hash => Digest::SHA::sha256("whatever"),
                  });

    Numeric values are stored in little-endian format so if you use raw
    accessors on arrays with elements of 2 or more bytes then you will need
    to "pack" and "unpack" them in order for your code to be portable. Also,
    fixed arrays are more limited than dynamic arrays in that the schema
    can't be evolved by converting them into arrays of nested qstructs.
    Because of the portability and schema evolution restrictions, fixed
    arrays and raw array access are usually recommended against.

EXCEPTIONS
    This module will throw exceptions in the following conditions:

    *   Schema parse errors

    *   Decoding or accessing truncated/malformed qstructs

    *   Out of memory during encoding

    *   You are on a 32-bit system and you attempt to access a field that
        can't fit in your address space

    *   Trying to set an array from a raw buffer that is the incorrect size

    *   Attempting to modify a Qstruct::Array

    Note that if fields aren't set, accessing them will *not* throw
    exceptions. Instead, accessors will return the default values of their
    respective types (see Qstruct::Spec). This is so that you can still
    parse old messages that were created with old versions of a schema.

PORTABILITY
    This module uses the "slow" but portable accessors described in
    libqstruct, meaning it should work on any machine regardless of byte
    order or alignment requirements. Despite the name, these accessors are
    not actually slow relative to the overhead of making a perl function or
    method call so there is little point in optimising them for the perl
    module.

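The byte-order and alignment points above can be illustrated with a minimal pure-perl sketch. It is not part of the Qstruct API: the helper names are hypothetical, and a raw buffer of 32-bit unsigned elements is assumed purely for illustration. It relies only on perl's built-in "pack" and "unpack" with the "V" template (unsigned 32-bit, little-endian), which gives the same result on any host byte order and at any byte offset.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Qstruct function): decode a raw buffer of
# little-endian uint32 elements into a list of perl numbers.
sub unpack_le_uint32_array {
    my ($raw) = @_;
    die "raw buffer length is not a multiple of 4" if length($raw) % 4;
    return unpack("V*", $raw);
}

# Hypothetical helper: encode a list of numbers as little-endian uint32s,
# producing a string suitable for use as a raw buffer.
sub pack_le_uint32_array {
    return pack("V*", @_);
}

# Hypothetical helper: read one little-endian uint32 at an arbitrary
# (possibly unaligned) byte offset. substr + unpack never performs an
# aligned native load, so alignment does not matter here.
sub read_le_uint32_at {
    my ($buf, $offset) = @_;
    die "read past end of buffer" if $offset + 4 > length($buf);
    return unpack("V", substr($buf, $offset, 4));
}

my $raw    = pack_le_uint32_array(1, 256, 0xDEADBEEF);
my @values = unpack_le_uint32_array($raw);
print "Decoded: @values\n";
```

In real code the buffer would presumably come from a raw accessor on a fixed array of multi-byte elements rather than from "pack"; that usage is an assumption and is not demonstrated in this documentation.
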
    Because the perl module uses the slow and portable accessors, no matter
    what CPU you use you do not need to worry about loading messages from
    aligned offsets.

    When using the C API, if you choose to compile with the non-portable
    accessors you should be aware that depending on your CPU you may have
    reliability or performance issues if you load messages from non-aligned
    offsets. However, modern x86-64 CPUs are perfectly suited for the "fast"
    interface and this interface can be used without sacrificing reliability
    or performance even with non-aligned messages.

SEE ALSO
    Qstruct::Spec - The Qstruct design objectives and format specification

    Video: Doug Hoyte introduces Qstruct to Toronto Perl Mongers

    Qstruct::Compiler - The reference compiler implementation

    Test::ZeroCopy - More information on zero-copy and how it is tested for

    libqstruct - Shared C library

    Qstruct github repo

AUTHOR
    Doug Hoyte

COPYRIGHT & LICENSE
    Copyright 2014 Doug Hoyte.

    This module is licensed under the same terms as perl itself.

    The bundled "libqstruct" is (C) Doug Hoyte and licensed under the
    2-clause BSD license.