-
Notifications
You must be signed in to change notification settings - Fork 22
Protocol Encoding
Encoding does not align primitive types on word boundaries.
For connection-oriented communication (TCP/IP), the server MUST notify the client what byte order to use. Each message contains an endianness flag in order to allow all the intermediates to forward data without requiring it to be unmarshaled (so that the intermediates can forward requests by simply copying blocks of binary data) and in order not to require a specific byte order for connection-less protocols (UDP/IP).
Deviation: Current peers ignore the header endian bit for messages received over TCP.
The SET_ENDIANESS
control message is used instead. Similarly w/ protocol version.
Many of the types involved in the data encoding, as well as several protocol message components, have an associated size (or "count"). Size values MUST always be a non-negative integer and encoded as follows:
- The unsigned 8-bit integer 255 indicates "null", no data.
- If the number of elements is less than 254, the size MUST be encoded as a single byte containing an unsigned 8-bit integer indicating the number of elements
- If the number of elements is less than 2^31-1, then the size MUST be encoded as an unsigned 8-bit integer with value 254, followed by a positive signed 32-bit integer indicating the number of elements
- Values greater than or equal to 2^31-1 are currently not implemented. If they were, their size MUST be encoded as an unsigned 8-bit integer with value 254, followed by a positive signed 32-bit integer with value 2^31-1, followed by a positive signed 64-bit integer indicating the number of elements. This implies a maximum size of 2^63-1.
The basic types MUST be encoded as shown in the Table 1. Signed integer types (byte, short, int, long) MUST be represented as two’s complement numbers. Floating point types (float, double) MUST use the IEEE-754 standard formats bib:ieee754wiki.
Type | Encoding |
---|---|
boolean | A single byte with value non-zero value for true, zero for false. |
byte | Signed 8-bit integer. |
ubyte | Unsigned 8-bit integer. |
short | Signed 16-bit integer. |
ushort | Unsigned 16-bit integer. |
int | Signed 32-bit integer. |
uint | Unsigned 32-bit integer. |
long | Signed 64-bit integer. |
ulong | Unsigned 64-bit integer. |
float | 32-bit float (IEEE-754 single-precision float). |
double | 64-bit float (IEEE-754 double-precision float). |
Encoding for basic types.
Note on boolean encoding: a receiver MUST NOT assume that a boolean value of true is represented by any special non-zero number, nor that the same sender consistently uses the same number.
Variable-size arrays MUST be encoded as a size representing the number of elements in the array, followed by the elements encoded as specified for their type (as specified in these sections).
Bounded-size arrays MUST be encoded as a size representing the number of elements in the array, followed by the elements encoded as specified for their type (as specified in these sections). The size MUST be less then or equal to the array's declared bound.
Fixed-size arrays MUST be encoded as elements encoded as specified for their type (as specified in these sections). The number of elements encoded MUST equal to the array's fixed size.
Strings are encoded as arrays of bytes. The actual content (the bytes in the array) MUST be a valid UTF-8 encoded string.
Particularly, this means that strings MUST be encoded as a size, followed by the string contents in a UTF-8 format as bytes. Size gives the number of bytes that follow it and not the number of UTF-8 characters. UTF-8 multi-byte characters MUST NOT be broken. An empty string MUST be encoded with a size of zero.
Implementations that internally use a zero byte or a zero character to indicate end-of-string SHOULD NOT include a terminating zero byte in the pvAccess string encoding. 'null' strings are not supported.
Same as strings, just that size MUST be less than or equal to the string's bound.
Structures MUST be encoded by appending the data of all comprising fields in the order in which the fields have been defined. A structure can contain a structure and an union (see below) for its field.
An array of structures is encoded as a size representing the number of elements in the array. For each array element, the encoding then consists of a boolean that indicates if the array element is present or null. A byte 0x00 indicates that the array element is null. A byte 0x01 indicates that the array is present, followed by the structure data, i.e. the encoding for each field of the structure.
Example array of structures, three elements, middle element null, where each structure contains two short numbers:
[ { 0x1111, 0x2222 }, null, { 0x3333, 0x4444 } ]
would be encoded as
03 01 11 11 22 22 00 01 33 33 44 44
Unions MUST be encoded as a selector value (encoded as a size), followed by the selected union member data. The selector chooses one member of a union as specified in the union introspection data, so must be a value in the range 0..N-1 where N is the number of union members. A union can contain a structure and a union for its field.
Variant Unions are open ended union type, also known as any type. Variant Unions MUST be encoded as a introspection data (Field) description of the encoded value, followed by the encoded value itself.
Given the following structure:
structure
byte[] value [1,2,3]
byte<16> boundedSizeArray [4,5,6,7,8]
byte[4] fixedSizeArray [9,10,11,12]
structure timeStamp
long secondsPastEpoch 0x1122334455667788
int nanoSeconds 0xAABBCCDD
int userTag 0xEEEEEEEE
structure alarm
int severity 0x11111111
int status 0x22222222
string message Allo, Allo!
union valueUnion
int 0x33333333
any variantUnion
string String inside variant union.
The above would be serialized as illustrated below (when using big-endian byte order, valueUnion selector with value 1 is selected):
Hexdump [Serialized structure] size = 85
03 01 02 03 05 04 05 06 07 08 09 0A 0B 0C 11 22 .... .... .... ..."
33 44 55 66 77 88 AA BB CC DD EE EE EE EE 11 11 3DUf w... .... ....
11 11 22 22 22 22 0B 41 6C 6C 6F 2C 20 41 6C 6C .."" "".A llo, All
6F 21 01 33 33 33 33 60 1C 53 74 72 69 6E 67 20 o!.3 333` .Str ing
69 6E 73 69 64 65 20 76 61 72 69 61 6E 74 20 75 insi de v aria nt u
6E 69 6F 6E 2E nion .
BitSet is a data type that represents a finite sequence of bits.
BitSet is encoded as sequence of ulong and ubyte. Zero or more ulong, and between zero and seven trailing ubytes. Bits are serialized in groups of eight in ascending order (LSB to MSB). Serialization SHOULD be optimized to avoid sending trailing zeros.
TODO: needs LE encoding examples.
Examples of BitSet serialization:
Hexdump [{}] size = 1
00 .
Hexdump [{0}] size = 2
01 01 ..
Hexdump [{1}] size = 2
01 02 ..
Hexdump [{7}] size = 2
01 80 ..
Hexdump [{8}] size = 3
02 00 01 ...
Hexdump [{15}] size = 3
02 00 80 ...
Hexdump [{55}] size = 8
07 00 00 00 00 00 00 80 .... ....
Hexdump [{56}] size = 9
08 00 00 00 00 00 00 00 01 .... .... .
Hexdump [{63}] size = 9
08 00 00 00 00 00 00 00 80 .... .... .
Hexdump [{64}] size = 10
09 00 00 00 00 00 00 00 00 01 .... .... ..
Hexdump [{65}] size = 10
09 00 00 00 00 00 00 00 00 02 .... .... ..
Hexdump [{0, 1, 2, 4}] size = 2
01 17 ..
Hexdump [{0, 1, 2, 4, 8}] size = 3
02 17 01 ...
Hexdump [{8, 17, 24, 25, 34, 40, 42, 49, 50}] size = 8
07 00 01 02 03 04 05 06 .... ....
Hexdump [{8, 17, 24, 25, 34, 40, 42, 49, 50, 56, 57, 58}] size = 9
08 00 01 02 03 04 05 06 07 .... .... .
Hexdump [{8, 17, 24, 25, 34, 40, 42, 49, 50, 56, 57, 58, 67}] size = 10
09 00 01 02 03 04 05 06 07 08 .... .... ..
Hexdump [{8, 17, 24, 25, 34, 40, 42, 49, 50, 56, 57, 58, 67, 72, 75}] size = 11
0A 00 01 02 03 04 05 06 07 08 09 .... .... ...
Hexdump [{8, 17, 24, 25, 34, 40, 42, 49, 50, 56, 57, 58, 67, 72, 75, 81, 83}] size = 12
0B 00 01 02 03 04 05 06 07 08 09 0A .... .... ....
Each structure can (depending on message definition) have a BitSet instance defining what subset of that structure's fields have been serialized. This allows partial serialization of structures. That is, serializing only fields that have changed rather than the entire structure. Each node of a structure corresponds to one bit; if a bit is set then its corresponding field has been serialized, otherwise not. BitSet does not apply to array elements.
This example shows how bits of a BitSet are assigned to the fields of a structure:
bit# field
0 structure
1 structure timeStamp
2 long secondsPastEpoch
3 int nanoSeconds
4 int userTag
5 structure[] value
structure org.epics.ioc.test.testStructure
double value
structure location
double x
double y
structure org.epics.ioc.test.testStructure
double value
structure location
double x
double y
6 string factoryRPC
7 structure arguments
8 int size
The structure above requires a BitSet that contains 9 bits.
If the bit corresponding to a structure node is set, then all the fields
of that node MUST be serialized.
pvAccess defines a structure to inform endpoints about completion status. It is nominally defined as:
struct Status {
byte type; // enum { OK = 0, WARNING = 1, ERROR = 2, FATAL = 3 }
string message;
string callTree; // optional (provides more context data about the error), can be empty
};
In practice, since the majority of Status instances would be OK with no
message and no callTree, a special definition of Status SHOULD be used
in the common case that all three of these conditions are met; if Status
is OK and no message and no callTree would be sent, then the special
type value of -1
MAY be used, and in this case the string fields are
omitted:
struct StatusOK {
byte type = -1;
};
Examples of Status serialization:
Hexdump [Status OK] size = 1
FF .
Hexdump [WARNING, "Low memory", ""] size = 13
01 0A 4C 6F 77 20 6D 65 6D 6F 72 79 00 ..Lo w me mory .
Hexdump [ERROR, "Failed to get, due to unexpected exception", (call tree)] size = 264
02 2A 46 61 69 6C 65 64 20 74 6F 20 67 65 74 2C .*Fa iled to get,
20 64 75 65 20 74 6F 20 75 6E 65 78 70 65 63 74 due to unex pect
65 64 20 65 78 63 65 70 74 69 6F 6E DB 6A 61 76 ed e xcep tion .jav
61 2E 6C 61 6E 67 2E 52 75 6E 74 69 6D 65 45 78 a.la ng.R unti meEx
63 65 70 74 69 6F 6E 0A 09 61 74 20 6F 72 67 2E cept ion. .at org.
65 70 69 63 73 2E 63 61 2E 63 6C 69 65 6E 74 2E epic s.ca .cli ent.
65 78 61 6D 70 6C 65 2E 53 65 72 69 61 6C 69 7A exam ple. Seri aliz
61 74 69 6F 6E 45 78 61 6D 70 6C 65 73 2E 73 74 atio nExa mple s.st
61 74 75 73 45 78 61 6D 70 6C 65 73 28 53 65 72 atus Exam ples (Ser
69 61 6C 69 7A 61 74 69 6F 6E 45 78 61 6D 70 6C iali zati onEx ampl
65 73 2E 6A 61 76 61 3A 31 31 38 29 0A 09 61 74 es.j ava: 118) ..at
20 6F 72 67 2E 65 70 69 63 73 2E 63 61 2E 63 6C org .epi cs.c a.cl
69 65 6E 74 2E 65 78 61 6D 70 6C 65 2E 53 65 72 ient .exa mple .Ser
69 61 6C 69 7A 61 74 69 6F 6E 45 78 61 6D 70 6C iali zati onEx ampl
65 73 2E 6D 61 69 6E 28 53 65 72 69 61 6C 69 7A es.m ain( Seri aliz
61 74 69 6F 6E 45 78 61 6D 70 6C 65 73 2E 6A 61 atio nExa mple s.ja
76 61 3A 31 32 36 29 0A va:1 26).
Introspection data describes the type of a user data item. It is not itself user data, but rather meta data. Introspection data appears in one of four forms: no introspection data (NULL_TYPE_CODE), a full type description (FULL_TYPE_CODE), a type identifier (ONLY_ID_TYPE_CODE), both (FULL_WITH_ID_TYPE_CODE), according to the table "Encoding of Introspection Data".
The sender MUST send introspection data, but is free to chose one of the above methods. Sending FULL_WITH_ID_TYPE_CODE defines the type identifier for subsequent sends using ONLY_ID_TYPE_CODE. Therefore, before sending ONLY_ID_TYPE_CODE, the sender MUST have previously sent at least one FULL_WITH_ID_TYPE_CODE with the same type identifier to the same receiver.
Since user data types can be arbitrarily complex, introspection data SHOULD be sent only once per type and receiver combination. The mapping of dynamically assigned type identifier (ID) to introspection data MUST be cached on the receiver side, and SHOULD be cached and re-used on the sender side. ID MUST be encoded as short and MUST be valid only within one connection. Moreover, IDs MUST be assigned only by the sender. The receiver MUST keep track of the IDs and use them to identify deserializations. Since communication is full-duplex this implies there MUST be two introspection registries per connection. The sender MAY override a previously assigned ID by simply assigning the ID to a new introspection data instance. The introspection registry size MUST be negotiated when each connection is established.
Field Encoding | Name | Description |
---|---|---|
0xFF | NULL_TYPE_CODE | No introspection data (also implies no data). |
0xFE + ID | ONLY_ID_TYPE_CODE | Serialization contains only an ID (that was assigned by one of the previous FULL_WITH_ID_TYPE_CODE or FULL_TAGGED_ID_TYPE_CODE descriptions). |
0xFD + ID + FieldDesc | FULL_WITH_ID_TYPE_CODE | Serialization contains an ID (that can be used later, if cached) and full interface description. Any existing definition with the same ID is overriden. |
0xFC + ID + tag + FieldDesc | FULL_TAGGED_ID_TYPE_CODE | Serialization contains an ID (that can be used later, if cached), tag (of integer type) and full interface description. Any existing definition with the same ID is overriden. A tag must guarantee that the same (ID, FieldDesc) pair has the same tag and any previous definition with the same ID and different FieldDesc has a different tag. This identifies whether the definition with given ID overrides already existing one and allow receivers to skip deserialization of FieldDesc, if tags match. SHOULD be used in non-reliable transport systems only. |
0xFB - 0xE0 | RESERVED | Reserved for future usage, MUST NOT be used. |
FieldDesc (0xDF - 0x00) |
FULL_TYPE_CODE | Serialization contains only full interface description. |
Each instance of a Field introspection description (FieldDesc) MUST be encoded as a byte. The upper 3 bits (Most Significant Bits, MSBs) are used for the type selector, e.g. 'integer'. The middle 2 bits distinguish between arrays and scalars. The remaining 3 lower bits are type dependent and used for size encoding, e.g. to select a 'short' or 'long' integer.
Bit | Value | Description |
---|---|---|
7-5 | 111 | reserved (MUST never be used) |
110, 101 | reserved (MUST not be used) | |
100 | complex | |
011 | string | |
010 | floating-point | |
001 | integer | |
000 | boolean | |
4-3 | 11 | fixed-size array flag |
10 | bounded-size array flag | |
01 | variable-size array flag | |
00 | scalar flag | |
2-0 | type (bits 7-5) depended |
Bit | Value | Type Name |
---|---|---|
2 | 1 | unsigned flag |
0 | signed flag | |
1-0 | 11 | long |
10 | int | |
01 | short | |
00 | byte |
Bit | Value | Type Name | IEEE 754-2008 Name |
---|---|---|---|
2-0 | 111, 110, 101 | reserved | |
100 | reserved | binary128 (Quadruple) | |
011 | double | binary64 (Double) | |
010 | float | binary32 (Single) | |
001 | reserved | binary16 (Half) | |
000 | reserved |
Bit | Value | Type Name |
---|---|---|
2-0 | 111, 110, 101, 100 | reserved |
011 | bounded string | |
010 | variant union | |
001 | union | |
000 | structure |
For all other types, bits 2-0 MUST be '0b000'.
Structure, union, and bounded string (and all their arrays) REQUIRE more description. A structure REQUIRES its identification string and a named array of Fields - size followed by one or more (field name, FieldDesc) pairs. Arrays of structures/unions REQUIRE an introspection data of a structure/union defining an array element type.
FieldDesc Encoding | Description |
---|---|
0bxxx00xxx | Scalar |
0bxxx01xxx | Variable-size array of scalars |
0bxxx10xxx + bound (encoded as size) | Bounded-size array of scalars |
0bxxx11xxx + fixed size (encoded as size) | Fixed-size array of scalars |
0b10000000 + identification string + (field name, FieldDesc)[] | Structure |
0b10001000 + structure FieldDesc | Array of structures. |
0b10000001 + identification string + (field name, FieldDesc)[] | Union |
0b10001001 + union FieldDesc | Array of unions |
0b10000010 | Variant union |
0b10001010 | Array of variant unions |
0b10000110 + bound (encoded as size) | Bounded string |
Given the following structure, as may be expressed by a pvData Structure:
timeStamp_t
long secondsPastEpoch
int nanoSeconds
int userTag
The introspection description of the above structure is be encoded by pvAccess as the following:
Hexdump [Serialized structure IF] size = 57
FD 00 01 80 0B 74 69 6D 65 53 74 61 6D 70 5F 74 .... .tim eSta mp_t
03 10 73 65 63 6F 6E 64 73 50 61 73 74 45 70 6F ..se cond sPas tEpo
63 68 23 0B 6E 61 6E 6F 53 65 63 6F 6E 64 73 22 ch#. nano Seco nds"
07 75 73 65 72 54 61 67 22 .use rTag "
Given the following structure, as may be expressed by a pvData Structure:
exampleStructure
byte[] value
byte<16> boundedSizeArray
byte[4] fixedSizeArray
time_t timeStamp
long secondsPastEpoch
int nanoseconds
int userTag
alarm_t alarm
int severity
int status
string message
union valueUnion
string stringValue
int intValue
double doubleValue
any variantUnion
The introspection description of the above structure would be encoded by pvAccess as the following:
Hexdump [Serialized structure IF] size = 243
FD 00 01 80 10 65 78 61 6D 70 6C 65 53 74 72 75 .... .exa mple Stru
63 74 75 72 65 07 05 76 61 6C 75 65 28 10 62 6F ctur e..v alue (.bo
75 6E 64 65 64 53 69 7A 65 41 72 72 61 79 30 10 unde dSiz eArr ay0.
0E 66 69 78 65 64 53 69 7A 65 41 72 72 61 79 38 .fix edSi zeAr ray8
04 09 74 69 6D 65 53 74 61 6D 70 FD 00 02 80 06 ..ti meSt amp. ....
74 69 6D 65 5F 74 03 10 73 65 63 6F 6E 64 73 50 time _t.. seco ndsP
61 73 74 45 70 6F 63 68 23 0B 6E 61 6E 6F 73 65 astE poch #.na nose
63 6F 6E 64 73 22 07 75 73 65 72 54 61 67 22 05 cond s".u serT ag".
61 6C 61 72 6D FD 00 03 80 07 61 6C 61 72 6D 5F alar m... ..al arm_
74 03 08 73 65 76 65 72 69 74 79 22 06 73 74 61 t..s ever ity" .sta
74 75 73 22 07 6D 65 73 73 61 67 65 60 0A 76 61 tus" .mes sage `.va
6C 75 65 55 6E 69 6F 6E FD 00 04 81 00 03 0B 73 lueU nion .... ...s
74 72 69 6E 67 56 61 6C 75 65 60 08 69 6E 74 56 trin gVal ue`. intV
61 6C 75 65 22 0B 64 6F 75 62 6C 65 56 61 6C 75 alue ".do uble Valu
65 43 0C 76 61 72 69 61 6E 74 55 6E 69 6F 6E FD eC.v aria ntUn ion.
00 05 82 ...