XC-Messages File Schema

XC-Messages File Schema#

FILE & WIRE SCHEMA and generated LANGUAGE BINDINGS

for SYSTEMS | GEOMETRY | PHYSICS | INTEGRATION

I. Introduction#

Messages (aka XC-Messages) is an object-oriented schema for engineers and scientists that efficiently enables numerical computing data to be seamlessly shared across computing platforms and programming languages.

Generated interface code can deliver high-performance, flexible messaging across science and engineering applications, providing encoding/decoding similar to XML and JSON but faster and denser. A machine-generated Messages library contains utilities to flatten and reconstruct object-oriented and vectorized data structures encountered in numerical simulation setup, expression, and results (e.g. systems engineering, CFD, FEA, EDA, and geometry processing).

Standard Protocol Buffer (Protobuf) definitions offer a compact layout for data found in numerical computing storage and transmission. The Messages schema is passed into the protoc compiler to generate high-performance binary-encoded accessors for various languages. Xplicit Computing’s applications are built atop these generated bindings; end-users (such as yourself!) can also apply those same standards to custom integration for an integrated workflow with high continuity and performance.

The overarching CAE concept is that each system manages its own parameters, members, and references (including potential child sub-systems), each encapsulated in a *.xcs or *.json file (while optionally referencing native and external files).

Users and developers can use all or part of the provided messages as they see fit. Tighter conformance to this standard enables high-performance integration across CAE applications.

Benefits:

binary file-and-wire formats for storage and transmission between computers
universal accessors and serialization utilities for most OS’s and languages
native compatibility across xcompute and user-defined applications
efficient parallel read/write with backward and forward compatibility
free and open standard built using agnostic infrastructure

Limitations:

protobuf 32-bit indexing, leading to 2^31 (~2 billion) element limit per numerical system/geometry ¹
reading large messages greater than 64 MB requires special treatment ²

--------
1. Offsets can extend to 2^64, but this feature has not yet matured in the working spec.
2. See [Section IV-7: Loading large messages with google::protobuf::io::CodedInputStream](Sec_IV-7).
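As a rough illustration of the two limits above, the following Python sketch estimates the packed payload of a float64 array and reports which limit, if any, it would run into. The constant and function names are hypothetical and not part of XC-Messages; the estimate assumes 8 bytes per value and ignores field headers and any other fields in the message.

```python
# Illustrative only: these names are hypothetical, not part of XC-Messages.
MAX_ELEMENTS = 2**31 - 1            # signed 32-bit maximum (~2 billion elements)
DEFAULT_BYTE_LIMIT = 64 * 2**20     # 64 MB default parsing limit
BYTES_PER_DOUBLE = 8                # one IEEE-754 float64 value


def payload_bytes(n_values: int) -> int:
    """Approximate size of n packed float64 values, excluding headers."""
    return n_values * BYTES_PER_DOUBLE


def check_limits(n_values: int) -> str:
    """Classify an array length against the two limits listed above."""
    if n_values > MAX_ELEMENTS:
        return "too many elements"
    if payload_bytes(n_values) > DEFAULT_BYTE_LIMIT:
        return "large message"
    return "ok"
```

Under these assumptions, roughly 8.4 million float64 values fit under the 64 MB default; anything larger would need the large-message loading path.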

Proto Definitions#

Four *.proto definition files are central to the Messages schema, organized by context:

concept.proto ➔ domain setup, models, parameters, associations (*.xcs/*.json system files)
vector.proto ➔ numeric arrays using packed arena allocation (*.xco data files)
spatial.proto ➔ topology of elements and regions discretization (*.xcg geometry files)
meta.proto ➔ meta-data and user-graphics for a specific system (*.xcm media files)
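The mapping between file extensions and proto definitions listed above can be captured in a small lookup table. This is only a sketch: the dictionary and the function name schema_for are hypothetical helpers for illustration, not part of the generated bindings.

```python
from pathlib import Path

# Extension-to-schema table taken from the list above (helper names are hypothetical).
SCHEMA_BY_EXTENSION = {
    ".xcs": "concept.proto",   # system files
    ".json": "concept.proto",  # system files in JSON form
    ".xco": "vector.proto",    # data files
    ".xcg": "spatial.proto",   # geometry files
    ".xcm": "meta.proto",      # media files
}


def schema_for(path: str) -> str:
    """Return the proto definition that describes the given file."""
    ext = Path(path).suffix.lower()
    if ext not in SCHEMA_BY_EXTENSION:
        raise ValueError(f"unrecognized file extension: {ext!r}")
    return SCHEMA_BY_EXTENSION[ext]
```

For example, schema_for("wing.xcg") returns "spatial.proto", which indicates which generated module or header to use when parsing that file.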

Message classes (and sub-classes) are available for each of the four files imported. In C++ they can be found under the namespace Messages:: after including one or more *.pb.h header bindings. Other languages will name and organize the interfaces slightly differently. Refer to the *.proto definitions for all schema definitions, as access patterns are built from these assignments directly. Standardized get and set accessors read and write members encoded in protocol buffer streams and files.

There are ~18 useful messages that each relate to different concepts. Users don’t need to change definitions, but the *.proto files can be handy references while constructing and assigning messages in a coding environment. For example, the messages in vector.proto are well-suited for storing and transmitting vectorized data (e.g. space and time domain, frequency domain, scalar and vector quantities) and provide a good entry point to using Messages.

II. Adding messages to your project#

First, download the latest XC-Messages definitions:

Download or Clone from Github

The Protobuf compiler protoc provides bindings to the following languages:

C++, C#, Java, JavaScript, Objective-C, PHP, Python, Ruby

Languages such as Dart and Go are available as official plugins for protoc. Several additional third-party bindings are available, some of which are listed in Section VI: Building custom bindings.

In your coding project, import the relevant files as headers and/or libraries. Statically-compiled languages such as C++, Obj-C, and C# require linking to the static library libxcmessages.a. Depending on usage, you may also need to install Google Protobuf 3 Libraries.

Check that a viable Protobuf-3 or Protobuf-4 compiler is installed on your OS:

> protoc --version
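The same check can be scripted. The sketch below locates protoc on the PATH and parses the "libprotoc X.Y.Z" line that protoc --version prints; the function names are hypothetical, and deciding which versions count as viable is left to the reader.

```python
import re
import shutil
import subprocess


def parse_protoc_version(output: str) -> tuple:
    """Extract the version numbers from text such as 'libprotoc 3.21.12'."""
    match = re.search(r"libprotoc\s+(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"unexpected protoc output: {output!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def installed_protoc_version():
    """Return the installed protoc version as a tuple, or None if protoc is absent."""
    exe = shutil.which("protoc")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True,
                            text=True, check=True)
    return parse_protoc_version(result.stdout)
```

Calling installed_protoc_version() returns None when no compiler is found, which a build script could treat as a prompt to install Protobuf first.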

III. Serializing and parsing messages#

In order to save or transmit, a program’s associative data structures must be serialized into a format that permits representation in contiguous memory and/or storage. For instance, several built-in serialization functions are available for C++ including those that leverage a std output file stream:

//assuming a Messages::Vector64 msg ...
std::ofstream outfile(path, std::ios::binary); // create a binary output file stream with given path
msg.SerializeToOstream(&outfile); // serialize to standard output stream
outfile.close(); // finish the file and release resource

Serialized data can then be transmitted over file-and-wire to another computer session, where it is de-serialized (parsed) using the matching load function, such as one that reads from a std input file stream:

std::ifstream infile(path, std::ios::binary); // create a binary input file stream
Messages::Vector64 msg; // create an empty message container
msg.ParseFromIstream(&infile); // de-serialize binary file to fill message
infile.close(); // finish the file and release resource

Please refer to the Protobuf3 Tutorials for a comprehensive reference on patterns across languages.
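To make the idea of flattening into contiguous bytes concrete, here is a minimal sketch of how the Protobuf wire format lays out a packed repeated double field: a tag byte, a varint length, then little-endian float64 values. It uses only the Python standard library rather than the generated bindings, and the field number 1 is an assumption for illustration, not the actual XC-Messages field assignment.

```python
import struct

FIELD_NUMBER = 1     # hypothetical field number, chosen for illustration only
WIRE_TYPE_LEN = 2    # Protobuf wire type for length-delimited data


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def pack_doubles(values) -> bytes:
    """Serialize values as one packed repeated double field."""
    payload = struct.pack(f"<{len(values)}d", *values)
    tag = encode_varint((FIELD_NUMBER << 3) | WIRE_TYPE_LEN)
    return tag + encode_varint(len(payload)) + payload


def unpack_doubles(buf: bytes) -> list:
    """Parse the bytes produced by pack_doubles back into a list of floats."""
    tag, pos = decode_varint(buf, 0)
    if tag != (FIELD_NUMBER << 3) | WIRE_TYPE_LEN:
        raise ValueError("unexpected field tag")
    length, pos = decode_varint(buf, pos)
    return list(struct.unpack_from(f"<{length // 8}d", buf, pos))
```

In practice the generated serialize and parse functions handle all of this; the sketch only shows why the encoding is dense (8 bytes per value plus a short header) compared with text formats.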

IV. C++ examples#

To get started, explore the mini app in examples/hello_vector, which demonstrates saving and loading floating-point data with the file system. It should compile and run out of the box with the build.sh, save.sh, and load.sh scripts and a C++ compiler such as gcc or clang. Be sure that the Google Protobuf 3 Libraries and our schema-specific library are properly linked in the local CMakeLists.txt with either absolute or relative paths (or else you’ll get a linking error after compilation):

target_link_libraries(some_app /path/to/libprotobuf.a) 
target_link_libraries(some_app /path/to/libxcmessages.a)

Or simply compile without CMake by passing the static libraries directly (alternatively, name the library directory with the -L flag and the libraries with -l):

c++ save_msg.cpp /path/to/libxcmessages.a /path/to/libprotobuf.a

1. Assigning messages#

Include the message header file containing the desired message, such as:

#include "concept.pb.h"
#include "spatial.pb.h"
#include "vector.pb.h"

a. Fill repeated fields (one-by-one)

Messages::Vector64 msg; // first, create an empty message container
msg.set_name("Position"); // set the name field with a string
msg.set_components(3); // or however many vector dimensions, ie. 3 for xyz
msg.add_values(pos.x); // push back x value, e.g. from some glm::dvec
msg.add_values(pos.y); // push back y value
msg.add_values(pos.z); // push back z value
//add additional entries, only in triplets in this case...

b. Fill repeated fields (serial loop)

//given a std::vector other
for (auto val : other) // iterate through each element of other
    msg.add_values(some_function(val)); //add element one-by-one

c. Fill repeated fields (parallel loop)

const int N = static_cast<int>(other.size()); // get the source vector size (int32 indexing)
auto& values = *msg.mutable_values(); // get ref to mutable values
values.Resize(N, 0.0); // allocate zero-filled memory in the destination
#pragma omp parallel for
for (int n = 0; n < N; n++) // iterate concurrently
    values[n] = some_function(other[n]); // assign msg entries

2. Saving messages to file#

std::ofstream outfile(path, std::ios::binary); // create a binary output file stream with given path
msg.SerializeToOstream(&outfile); // serialize a prepared Messages::Vector64
outfile.close(); // finish the file and release resource

3. Loading messages from file#

Again, using the message header file containing the desired messages:

#include "vector.pb.h"
...

a. Use standard input file stream:

Messages::Vector64 msg; // create an empty message container
std::ifstream infile(path, std::ios::binary); // create a binary input file stream
msg.ParseFromIstream(&infile); // de-serialize binary file to fill message
infile.close(); // finish the file and release resource

b. For large messages (greater than 64 MB), see the example in Section IV-7: Loading large messages with google::protobuf::io::CodedInputStream.

4. Accessing messages#

a. Access repeated fields (manually by entry):

auto& vec_name = msg.name(); // get string ref
auto C = msg.components(); // get int32 copy (as an alternative to ref)
auto& rev = msg.revision(); // get Messages::Revision ref
std::vector<double> output; // declare destination vector
if (msg.values_size() > 1 && rev.major_rev() < 42) // do something with size, rev
{
    output.push_back(msg.values(0)); // returns a double for first entry
    output.push_back(msg.values(1)); // returns a double for second entry
}

b. Access repeated fields (serial loop, better):

std::vector<double> output; // declare destination vector
for (auto value : msg.values())
    output.push_back(some_function(value)); //use the protobuf

c. Access repeated fields (parallel loop, best):

auto N = msg.values_size(); // get number of values
std::vector<double> output(N); // allocate destination
#pragma omp parallel for
for (auto n=0 ; n<N ; n++) // concurrent iteration
    output[n] = some_function(msg.values(n)); //access msg entries

5. Embedding messages#

More complex data structures require a special getter prefixed with the mutable_ keyword, which returns a pointer to an underlying mutable object.

a. For instance, revision is an embedded message inside Vector64, so it cannot be assigned as a primitive with the set_ keyword. However, once a mutable pointer has been retrieved, we can assign its member primitives:

auto rev = msg.mutable_revision(); // getting a mutable object returns a pointer
rev->set_major_rev(4); // access the pointer and set the member
rev->set_minor_rev(13); // access the pointer and set the member

b. A slight stylistic variation de-references the pointer and takes a ref to the underlying object to eliminate extra pointer access arrows for cleanliness:

auto& rev = *msg.mutable_revision(); // de-ref pointer and bind a C++ reference
rev.set_major_rev(4); // access and set the member
rev.set_minor_rev(13); // access and set the member

Note: Forgetting the & in auto& on the first line of the above example creates a copy of the object on the stack, so the assigned values never reach the intended underlying msg object. In most cases, a reference is what you want.

6. Copying messages#

Messages can be explicitly copied from other messages of the same type:

a. msg = other_msg;

b. msg.CopyFrom(other_msg);

c. As repeated values exist in contiguous memory (i.e. packed arena allocation) and each conform to IEEE-754 floating-point standards, it is also possible to perform a C-style low-level copy to/from contiguous standard library and other containers, as desired:

msg.mutable_values()->Resize(static_cast<int>(other.size()), 0.0); // allocate the destination
std::memcpy(msg.mutable_values()->mutable_data(), other.data(), other.size()*sizeof(double)); // requires <cstring>

7. Loading large messages with `google::protobuf::io::CodedInputStream`#

This process is necessary to bypass security restrictions in place within the Protobuf 3 libraries. A default limitation of 64 MB is used to discourage buffer overflow attacks in internet infrastructure. The good news is that Google made it possible to increase the limit up to the int32 maximum (2^31 - 1 bytes).

The process requires use of the coded and zero-copy stream machinery used in parsing operations:

#include <fcntl.h> // open, O_RDONLY
#include <unistd.h> // close
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl.h>
using namespace google::protobuf::io; // FileInputStream, CodedInputStream
...
Messages::Vector64 msg; // create an empty message container
const int IObyteLimit{ 0x7fffffff }; // byte limit (int32 max, 2^31 - 1)

A coded input stream requires a raw file input stream with an open path:

int fd = open(path.c_str(), O_RDONLY); // open path and get file descriptor
FileInputStream raw(fd); // create a raw input stream buffer
CodedInputStream inputStream(&raw); // create a coded input stream

Set the stream byte limit at safe max and warn at half:

auto limit = inputStream.PushLimit(-1); // limit work-around
inputStream.SetTotalBytesLimit(IObyteLimit, IObyteLimit >> 1); // (total limit, warning threshold); newer Protobuf releases accept only the total limit

Finally, do the actual parsing from input stream to message:

if (msg.ParseFromCodedStream(&inputStream)) // if okay
{
    inputStream.PopLimit(limit); // limit work-around
}
else
{
    // handle the error (e.g. message type mismatch, invalid assignments)
}
close(fd); // clean up in either case

V. Python examples#

Among the languages for which Messages bindings exist is Python, specifically, Python 3. Examples using Python 3 are provided in examples/python. These include an application to generate and save floating point numbers, and a series of unit tests.

The Python bindings for Messages are generated by protoc, the Protobuf compiler. When run, it generates files concept_pb2.py, spatial_pb2.py, vector_pb2.py, and meta_pb2.py. (To build the bindings for Python, see Section VI: Building custom bindings.)

If the Python source files have been compiled into bytecode, then a subdirectory __pycache__ will sit alongside the source files and contain compiled bytecode files (suffix .pyc).

The path to the bindings can be set by the PYTHONPATH environment variable, e.g.:

PYTHONPATH=/path/to/pythonbindings

or by the Python sys.path variable. For example:

import sys
sys.path.insert(1, "/path/to/pythonbindings")
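Before extending sys.path, it can help to confirm that the generated modules are actually present. The sketch below checks a directory for the four binding files named earlier in this section; the helper names missing_bindings and add_bindings_to_path are hypothetical.

```python
import sys
from pathlib import Path

# File names as generated by protoc for the four proto definitions.
EXPECTED_BINDINGS = ("concept_pb2.py", "spatial_pb2.py",
                     "vector_pb2.py", "meta_pb2.py")


def missing_bindings(directory) -> list:
    """List the expected binding files that are absent from directory."""
    root = Path(directory)
    return [name for name in EXPECTED_BINDINGS if not (root / name).is_file()]


def add_bindings_to_path(directory) -> str:
    """Insert directory into sys.path after verifying the bindings exist."""
    missing = missing_bindings(directory)
    if missing:
        raise FileNotFoundError(f"missing bindings: {', '.join(missing)}")
    path = str(Path(directory).resolve())
    if path not in sys.path:
        sys.path.insert(1, path)   # same position as in the example above
    return path
```

A failed check usually means protoc has not yet been run with --python_out for that directory.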

1. Assigning messages#

Import the vector_pb2 module, optionally with a more useful name, such as:

import vector_pb2 as vector

a. Fill repeated fields (one-by-one)

msg = vector.Vector64()             # create an empty message container
msg.name = "Position"
msg.components = 3
msg.values.append(pos.x)
msg.values.append(pos.y)
msg.values.append(pos.z)
# add additional entries, only as triplets in this case…

b. Fill repeated fields (serial loop)

# given a 3-element list "other"
for val in other:
    msg.values.append(val)

Or more concisely,

msg.values[:] = other

Note: The [:] slice is required for the list assignment; assigning a list directly to a repeated field (msg.values = other) raises an AttributeError.
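Since values are stored as one flat sequence with a separate components count, point-like data has to be interleaved before assignment. The following sketch shows one way to do that with plain Python; the function name flatten is hypothetical and no generated bindings are needed to run it.

```python
def flatten(points, components=3):
    """Interleave fixed-size tuples into one flat list of floats.

    Each entry of points must have exactly `components` values,
    e.g. (x, y, z) for components=3.
    """
    values = []
    for point in points:
        if len(point) != components:
            raise ValueError(
                f"expected {components} components, got {len(point)}")
        values.extend(float(c) for c in point)
    return values
```

The result could then be assigned in one step, for example msg.values[:] = flatten(points) with msg.components = 3, assuming msg is a Vector64 as in the examples above.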

2. Saving messages to file#

serial = msg.SerializeToString()
with open(path, "wb") as sout:      # open output stream and close on completion
    sout.write(serial)              # write the serialized stream

Alternatively, this might be written:

with open(path, "wb") as sout:
    sout.write(msg.SerializeToString())

3. Loading messages from file#

Again, first import the module:

import vector_pb2 as vector

Create an empty container, and fill from the input file:

msg = vector.Vector64()             # create an empty message container
with open(path, "rb") as sin:       # open input stream and close on completion
    serial = sin.read()             # read serialized stream
msg.ParseFromString(serial)         # de-serialize string

Or more concisely:

msg = vector.Vector64()
with open(path, "rb") as sin:
    msg.ParseFromString(sin.read())

4. Accessing messages#

a. Access repeated fields (manually by entry):

vec_name = msg.name
C = msg.components
rev = msg.revision                    # embedded revision message
output = []                           # output becomes empty list
if len(msg.values) > 1 and rev.major_rev < 42:
    output.append(msg.values[0])
    output.append(msg.values[1])

b. Access repeated fields (serial loop, better):

output = []
for i in range(len(msg.values)):
    output.append(msg.values[i])

Or more succinctly,

output = msg.values[:]
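Once the flat values have been read, they can be regrouped by the components count to recover per-entry vectors. This is a minimal sketch using plain lists; the function name unflatten is hypothetical.

```python
def unflatten(values, components):
    """Group a flat sequence into tuples of `components` values each."""
    if components <= 0:
        raise ValueError("components must be positive")
    if len(values) % components:
        raise ValueError("value count is not a multiple of components")
    return [tuple(values[i:i + components])
            for i in range(0, len(values), components)]
```

With a loaded message this might be called as unflatten(msg.values[:], msg.components), yielding one (x, y, z) tuple per entry when components is 3.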

5. Embedding messages#

XC Messages objects are often composed of other message objects. Python assignment binds a name to an object rather than copying it (sometimes described as “call-by-object-reference”). In effect, immutable scalars behave as if assigned by value, while message objects are shared by reference.

In the example below, msg is a message object that includes msg.revision, which is a message object composed of two scalars. The assignment to rev from msg.revision basically passes a reference (pointer). Thus, modifications to rev are visible to msg. Object values of each can be seen via Python’s print() function.

rev = msg.revision
rev.major_rev = 4 
rev.minor_rev = 13
print(msg)
print(rev)

6. Copying messages#

Since Python uses “call-by-object-reference”, object assignment results in references (pointers) to the same object:

msg = other_msg

To separate the objects from each other, use the Python deepcopy() function. The following example shows a possible use.

from copy import deepcopy
msg1 = msg                  # not a copy (just another reference to msg)
msg2 = deepcopy(msg1)
msg2.revision.minor_rev = msg1.revision.minor_rev + 1

To see the results:

print(msg.revision)         # unmodified original
print(msg1.revision)        # result same as msg
print(msg2.revision)        # incremented minor revision
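The difference between sharing a reference and making a deep copy can be tried without the generated bindings. The sketch below uses two small dataclasses as stand-ins for the generated message classes; they are hypothetical and only mimic the name and revision fields used in this section.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Revision:                 # stand-in for the generated revision message
    major_rev: int = 0
    minor_rev: int = 0


@dataclass
class Vector:                   # stand-in for a generated Vector64 message
    name: str = ""
    revision: Revision = field(default_factory=Revision)


msg = Vector(name="Position", revision=Revision(4, 13))
msg1 = msg                      # another name for the same object
msg2 = deepcopy(msg1)           # independent copy, including the nested revision
msg2.revision.minor_rev = msg1.revision.minor_rev + 1

print(msg1 is msg)              # True: both names refer to one object
print(msg2 is msg)              # False: msg2 is a separate object
print(msg.revision.minor_rev, msg2.revision.minor_rev)   # 13 14
```

Only msg2 carries the incremented minor revision, because the deep copy duplicated the nested revision object instead of sharing it.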

VI. Building custom bindings#

The Protobuf compiler protoc has built-in code generators for C++, C#, Java, JavaScript, Objective-C, PHP, Python, and Ruby. Google also has plug-ins and guides for Dart, Go, and Kotlin. Furthermore, third-party add-ons (such as plug-ins) are available for some 30 other languages, including C, Perl, R, Rust, Scala, Swift, and Julia.

Reuse in independent projects is encouraged as long as existing message definitions are not altered; definitions may only be added, and such additions should not be distributed. It is recommended to number custom extensions above 100 to minimize the chance of conflict with official assignments. To generate new bindings, install a Google Protobuf 3 compiler from a package manager or from source.

Check the bin path and version:

> which protoc
> protoc --version

After Protobuf3 and protoc are installed, make a directory for each desired language and run the compiler from shell or script:

> mkdir -p cpp python java javascript ruby objc csharp

Invoke the protoc compiler against the *.proto files for the desired output languages (the output folder name follows each equals sign) to generate the bindings. The generated C++ sources can then be compiled into a static library such as libxcmessages.a:

> protoc --cpp_out=cpp --csharp_out=csharp --objc_out=objc --ruby_out=ruby \
    --python_out=python --java_out=java --js_out=javascript \
    concept.proto vector.proto spatial.proto meta.proto

Some languages such as JavaScript require additional options for encoded serialization utilities. The options come first, followed by a colon and the output folder:

--js_out=import_style=commonjs,binary:javascript
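When bindings are regenerated often, the protoc invocation can be assembled by a short script. The sketch below builds the argument list for a chosen set of languages; the function name protoc_command is hypothetical, the output folders are assumed to share the language names used in the mkdir step, and the proto file names are the four listed in the Proto Definitions section.

```python
# Output flags for the built-in protoc generators used in this section.
OUT_FLAGS = {
    "cpp": "--cpp_out",
    "csharp": "--csharp_out",
    "objc": "--objc_out",
    "ruby": "--ruby_out",
    "python": "--python_out",
    "java": "--java_out",
}

PROTO_FILES = ("concept.proto", "vector.proto", "spatial.proto", "meta.proto")


def protoc_command(languages, protos=PROTO_FILES):
    """Build the protoc argument list; each language writes to a same-named folder."""
    args = ["protoc"]
    for lang in languages:
        if lang not in OUT_FLAGS:
            raise ValueError(f"no built-in generator listed for {lang!r}")
        args.append(f"{OUT_FLAGS[lang]}={lang}")
    args.extend(protos)
    return args
```

The resulting list can be passed to subprocess.run(..., check=True) once the output folders exist, which avoids shell quoting and line-continuation issues.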

VII. License and fair use#

SPDX-License-Identifier: BSD-3-Clause
OSI-License-URL: https://opensource.org/licenses/BSD-3-Clause

The above four proto files, the generated library/bindings, and the README file are provided under the BSD 3-Clause License. You are free to use these components for personal, academic, commercial, and/or research use. No warranty is implied nor provided unless otherwise stated in a separate engineering support agreement. As part of the license agreement, the README file must remain alongside any provided *.proto definitions.

A copy of the BSD 3-Clause License resides in the LICENSE file of this software. It can also be found online at OSI-License-URL. Custom extensions to the schema are permitted for private use but should not be distributed publicly in order to preserve compatibility. (See Section VI: Building custom bindings.) However, to foster schema consistency across the Messages user community, please share proposed public schema and documentation updates with the maintainers at info@xplicitcomputing.com.
