This is a Draft Specification
This document is not finalised. Do not use its contents as a reference as they may change in the future as the specification process advances.

About SF3

SF3 (Simple File Format Family) is a family of file format specifications. These file formats all follow a similar scheme and the same principles.

Principles

SF3 formats follow these principles:

Nomenclature

The following clarifies how to interpret certain words with regard to this standard:

Specification Description

The format specifications in this document use a BNF-style abstract syntax language. The language is defined here:

Format       ::= Definition+
Definition   ::= Rule Identifier? Description? '\n'
Identifier   ::= '::=' Sequence
Description  ::= '---' text
Sequence     ::= Composition (' ' Composition)*
Composition  ::= (Rule | OctetArray | Octet | Type | Switch | BitRange | '(' Sequence ')') (OneOrMore | AnyNumber | ExactNumber)?
OctetArray   ::= '[' octet (' ' octet)* ']'
Type         ::= ('int' | 'uint') IntBittage | 'float' FloatBittage | 'string' ExactNumber
IntBittage   ::= '8' | '16' | '24' | '32' | '64'
FloatBittage ::= '16' | '32' | '64'
Rule         ::= name
OneOrMore    ::= '+'
AnyNumber    ::= '*'
ExactNumber  ::= '{' number '}'
Switch       ::= '<' Rule SwitchCase ('|' SwitchCase)* '>'
SwitchCase   ::= octet+ ':' Sequence
BitRange     ::= Rule ':' (mask ',')? mask

name         --- The name of a rule as a sequence of non-numeric ASCII characters.
octet        --- Eight bits expressed as two hexadecimal digits.
text         --- Human-readable textual description of the contents.
number       --- A textual description of the number of occurrences. Can make a reference to other rules, in which case the rule's content designates a runtime number.
mask         --- A textual description of the number of bits to use.

White space unless otherwise mandated may be inserted liberally to aid readability. Each rule ultimately defines a sequence of octets that should be parsed. The Types mentioned translate to signed integers, unsigned integers, and IEEE floating point numbers of the given number of bits. string designates a UTF-8 encoded character sequence with an octet length as indicated by the required following ExactNumber rule.

A Switch acts as a runtime switching based on the value of the referenced Rule. This means that when evaluated, the Switch should act as if it were the Sequence of the SwitchCase whose octet sequence matches the value of the referenced Rule. If the Rule evaluates to a value that does not match any of the SwitchCases, the file is invalid. For example, <x 00: int8 | 01: int16> with x evaluating to 00 would match an int8, and when evaluating to 01 would match an int16.
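
As an illustration of the Switch semantics, the following Python sketch resolves the example above. The names CASES and read_switch are hypothetical, and little-endian byte order is an assumption of this sketch rather than something stated in this section.

```python
import struct

# Hypothetical table for <x 00: int8 | 01: int16>: case octet -> struct format.
# Little-endian ("<") is an assumption of this sketch.
CASES = {0x00: "<b", 0x01: "<h"}

def read_switch(x: int, data: bytes, offset: int = 0):
    """Decode the value selected by x and return (value, next_offset)."""
    if x not in CASES:
        # A value that matches no SwitchCase makes the file invalid.
        raise ValueError(f"invalid file: no SwitchCase for {x:#04x}")
    fmt = CASES[x]
    (value,) = struct.unpack_from(fmt, data, offset)
    return value, offset + struct.calcsize(fmt)
```

With x evaluating to 00 the sketch consumes one octet, and with 01 it consumes two.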

A BitRange extracts the bits of the specified Rule by shifting the integer to the right by the first mask number of bits, if given. It then masks the remaining integer such that only the number of bits specified in the second mask remain. For example, x:2,4 with x being the binary sequence 10101101 would leave the bits 1011.
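
The shift-then-mask behaviour can be sketched in a few lines of Python. The function name bit_range and its parameter names are hypothetical and only serve this illustration.

```python
def bit_range(value: int, bits: int, shift: int = 0) -> int:
    """Evaluate Rule:shift,bits by shifting right, then keeping the low bits."""
    return (value >> shift) & ((1 << bits) - 1)

# The example from the text: x:2,4 with x = 10101101 leaves 1011.
assert bit_range(0b10101101, bits=4, shift=2) == 0b1011
```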

Formats

Each format is made up of the following structure, where a valid file must begin with the File rule.

File       ::= Identifier Header Payload
Identifier ::= [ 81 53 46 33 00 E0 D0 0D 0A 0A ] format-id
format-id  ::= uint8 --- A single octet identifying the format.

The rationale for the ten octets in the identifier is as follows:

The Header and Payload will be described by the individual formats.
The values for the format-id are interpreted as follows:

Any other value for the format-id is reserved for future formats in this spec. If an implementation only supports a subset of these formats, it must generate an error when it encounters a format-id that it does not support.
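
A reader could validate the identifier along the following lines. This is a minimal Python sketch: the function name read_format_id is hypothetical, and the set of supported format-ids is passed in by the caller because the value table is not reproduced here.

```python
# The ten fixed octets of the Identifier rule.
MAGIC = bytes([0x81, 0x53, 0x46, 0x33, 0x00, 0xE0, 0xD0, 0x0D, 0x0A, 0x0A])

def read_format_id(data: bytes, supported: frozenset) -> int:
    """Check the identifier and return the format-id octet that follows it."""
    if len(data) < len(MAGIC) + 1:
        raise ValueError("truncated identifier")
    if data[:len(MAGIC)] != MAGIC:
        raise ValueError("not an SF3 file")
    format_id = data[len(MAGIC)]
    if format_id not in supported:
        # Unsupported or reserved format-ids must produce an error.
        raise ValueError(f"unsupported format-id {format_id:#04x}")
    return format_id
```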

Archive

The Archive format allows storing multiple files in one binary package. The file also includes some metadata so that the files can be stored with a relative path and mime-type, allowing both file-system extraction, and content inspection without explicit extraction.

Header       ::= Count MetadataSize
Payload      ::= Metadata Files
Count        ::= uint64                --- The number of entries.
MetadataSize ::= uint64                --- The octet size of the Metadata payload.
Metadata     ::= EntryOffset{Count} MetaEntry{Count}
EntryOffset  ::= uint64                --- The octet offset of the corresponding MetaEntry from the beginning of Metadata.
MetaEntry    ::= ModTime Checksum Mime Path
                                       --- A descriptor of modification time, CRC checksum, mime type, and path.
ModTime      ::= uint64                --- A unix-time timestamp denoting when the file was last modified.
Mime         ::= uint8 string{uint8}   --- The mime-type of the corresponding file.
Checksum     ::= uint32                --- A CRC32 checksum.
Path         ::= uint16 string{uint16} --- The relative path of the corresponding file.
Files        ::= FileOffset{Count} File{Count}
FileOffset   ::= uint64                --- The octet offset of the corresponding File from the beginning of Files.
File         ::= uint64 uint8{uint64}  --- A binary file payload.

The included MetadataSize, EntryOffset, and FileOffset fields should allow constant-time access to any content within the archive.
If no mime-type is known for a file that should be stored, the corresponding Mime should be set to application/octet-stream.

Each EntryOffset in Metadata, and each FileOffset in Files must be larger than the preceding entry.
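
The constant-time access described above can be sketched as follows in Python. Several points are assumptions of this sketch: integers are read as little-endian, the identifier occupies the first 11 octets, and "the beginning of Files" is read as the start of the FileOffset table. The name read_file is hypothetical.

```python
import struct

IDENTIFIER_SIZE = 11           # ten fixed octets plus the format-id
HEADER = struct.Struct("<QQ")  # Count, MetadataSize (little-endian assumed)

def read_file(archive: bytes, index: int) -> bytes:
    """Return the payload of file number `index` without parsing Metadata."""
    count, metadata_size = HEADER.unpack_from(archive, IDENTIFIER_SIZE)
    if not 0 <= index < count:
        raise IndexError(index)
    # Files starts right after Metadata, so MetadataSize lets us skip it.
    files_start = IDENTIFIER_SIZE + HEADER.size + metadata_size
    (offset,) = struct.unpack_from("<Q", archive, files_start + 8 * index)
    # File ::= uint64 uint8{uint64}: a length followed by that many octets.
    (size,) = struct.unpack_from("<Q", archive, files_start + offset)
    start = files_start + offset + 8
    return archive[start:start + size]
```

Only three fixed-size reads are needed per lookup, independent of the number of entries.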

Use-Case

This format is useful if you need a way to bundle files together into a single payload, and require constant-time, typed access to individual files without having to extract, decompress, or decrypt.

Audio

This format is for storing plain audio sample data. It includes support for all types of sample formats and channel numbers out there. Unlike other formats, it also enforces the channel data to be interleaved rather than sequential, which forgoes the need for a length field, allowing the file to be written on the fly.

Header      ::= Samplerate Channels Format FrameCount
Payload     ::= sample{Channels}{FrameCount}
Samplerate  ::= uint32 --- The samplerate in Hz.
Channels    ::= uint8  --- The number of audio channels.
Format      ::= uint8  --- A single octet identifying the per-sample data type.
FrameCount  ::= uint64 --- The number of audio frames in the file.
sample      ::= <Format 01: int8
                      | 02: int16
                      | 04: int32
                      | 08: int64
                      | 11: uint8
                      | 12: uint16
                      | 14: uint32
                      | 18: uint64
                      | 22: float16
                      | 24: float32
                      | 28: float64>
                       ---  A single-channel value in the format indicated by format.

Any other value for format is invalid. If the format is 01, then the samples are stored according to the "A-law" non-linear scheme. Otherwise the samples are linear.
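
To illustrate the interleaved layout, the Python sketch below splits a payload into frames. The names SAMPLE_CODES and read_frames are hypothetical, little-endian byte order is an assumption, and samples are returned as stored, so no A-law expansion is performed.

```python
import struct

# Format octet -> struct type code, following the sample rule above.
SAMPLE_CODES = {
    0x01: "b", 0x02: "h", 0x04: "i", 0x08: "q",
    0x11: "B", 0x12: "H", 0x14: "I", 0x18: "Q",
    0x22: "e", 0x24: "f", 0x28: "d",
}

def read_frames(payload: bytes, channels: int, fmt: int, frame_count: int):
    """Return a list of frames, each a tuple holding one sample per channel."""
    if fmt not in SAMPLE_CODES:
        raise ValueError(f"invalid format {fmt:#04x}")
    # One frame is `channels` consecutive samples (little-endian assumed).
    frame = struct.Struct("<" + SAMPLE_CODES[fmt] * channels)
    if len(payload) < frame.size * frame_count:
        raise ValueError("truncated payload")
    return [frame.unpack_from(payload, i * frame.size) for i in range(frame_count)]
```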

The values for channels are interpreted as follows, declaring the order and purpose of the channels:

Where

Any other value for channels is invalid.

Use-Case

This format is useful for raw audio data storage, which means it should be trivial to feed into an audio playback system with minimal overhead. Unlike the traditional uncompressed audio format, Wave, this follows a much clearer and simpler specification with sensible metadata encoding.

Image

This format is for storing raw image data. Unlike plain data however, it includes a header that completely identifies the pixel data layout and format. The format supports 3D images as well.

Header        ::= Width Height Depth channels format
Payload       ::= Layer{Depth}
Layer         ::= Row{Height}
Row           ::= Color{Width}
Color         ::= channel{channel-count}
Width         ::= uint32
Height        ::= uint32
Depth         ::= uint32
format        ::= uint8      --- A single octet identifying the per-channel data type.
channels      ::= uint8      --- A single octet identifying the number and order of channels.
channel-count ::= channels:4 --- The number of channels indicated by the lower 4 bits of the channels.
channel       ::= <format 01: int8
                        | 02: int16
                        | 04: int32
                        | 08: int64
                        | 11: uint8
                        | 12: uint16
                        | 14: uint32
                        | 18: uint64
                        | 22: float16
                        | 24: float32
                        | 28: float64>
                             ---  A single-channel colour value in the format indicated by format.

Any other value for format is invalid.
The values for channels are interpreted as follows:

Where

Any other value for channels is invalid.
The payload must have exactly Width*Height*Depth*channel-count*format-bits number of bits.
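
The size rule above, together with the Layer, Row, and Color nesting, can be sketched as follows in Python. The function names are hypothetical. The sketch assumes that the low four bits of format give the per-channel size in octets, which holds for every format value listed above but is not stated explicitly.

```python
VALID_FORMATS = {0x01, 0x02, 0x04, 0x08, 0x11, 0x12, 0x14, 0x18, 0x22, 0x24, 0x28}

def payload_size(width: int, height: int, depth: int, channels: int, fmt: int) -> int:
    """Return the payload size in octets for the given header values."""
    if fmt not in VALID_FORMATS:
        raise ValueError(f"invalid format {fmt:#04x}")
    channel_count = channels & 0x0F   # channel-count ::= channels:4
    format_bits = 8 * (fmt & 0x0F)    # assumption: low nibble = octets per channel
    return width * height * depth * channel_count * format_bits // 8

def pixel_offset(x: int, y: int, z: int, width: int, height: int,
                 channels: int, fmt: int) -> int:
    """Return the octet offset of the Color at (x, y, z) within Payload."""
    color_size = (channels & 0x0F) * (fmt & 0x0F)
    # Layers are outermost, then rows, then the colours within a row.
    return ((z * height + y) * width + x) * color_size
```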

Use-Case

This format is useful for storing raw bitmap data that can be directly memory-mapped and read out. This is especially convenient for GPU texture uploads with DirectX, OpenGL, Vulkan, or similar.

Log

This format is for storing generic logging and event information.

Header      ::= StartTime ChunkCount
Payload     ::= Chunk*
Chunk       ::= ChunkSize EntryCount EntryOffset{EntryCount} Entry{EntryCount}
StartTime   ::= uint64                        --- The number of seconds since the Unix epoch (1st of January 1970 UTC) at which this log begins.
Entry       ::= Size Time Severity Source Category Message
Size        ::= uint32                        --- The size of the remaining log entry in octets.
Time        ::= uint64                        --- The number of milliseconds since StartTime at which this log entry was recorded.
Severity    ::= int8                          --- The severity or importance of the log entry.
Source      ::= uint8 string{uint8}           --- An identifier of the source of the log entry.
Category    ::= uint8 string{uint8}           --- An identifier of the category the entry belongs to.
Message     ::= uint16 string{uint16}         --- A human-readable message describing the event.
ChunkCount  ::= uint16                        --- The number of chunks in the file.
ChunkSize   ::= uint64                        --- The octet size of the chunk.
EntryCount  ::= uint32                        --- The number of entries in the chunk.
EntryOffset ::= uint64                        --- The octet offset of the corresponding Entry from the beginning of Payload.

The severity should be zero if the message is of neutral importance, positive for increasingly vital information, and negative for increasingly detailed information.

Both the Source and Category may consist of just a null byte each if the information is not relevant.
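
The Entry layout can be illustrated with a small Python encoder. The names encode_entry and _short_string are hypothetical and little-endian byte order is an assumption of this sketch. An empty Source or Category is written as a zero length octet, which is the single null byte mentioned above.

```python
import struct

def _short_string(text: str) -> bytes:
    """Encode `uint8 string{uint8}`: a one-octet length, then UTF-8 octets."""
    raw = text.encode("utf-8")
    if len(raw) > 0xFF:
        raise ValueError("string too long for a uint8 length")
    return struct.pack("<B", len(raw)) + raw

def encode_entry(time_ms: int, severity: int, source: str,
                 category: str, message: str) -> bytes:
    """Encode one Entry: Size Time Severity Source Category Message."""
    msg = message.encode("utf-8")
    if len(msg) > 0xFFFF:
        raise ValueError("message too long for a uint16 length")
    body = (struct.pack("<Qb", time_ms, severity)
            + _short_string(source) + _short_string(category)
            + struct.pack("<H", len(msg)) + msg)
    # Size counts the octets that follow the Size field itself.
    return struct.pack("<I", len(body)) + body
```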

The format is designed such that an application can continuously append new entries. To do so, it should behave as follows:

  1. Increase the ChunkCount

  2. Append a new Chunk and allocate a number of EntryOffsets within the chunk.

  3. When a new entry is generated and there are still unused EntryOffsets:

    1. Update the ChunkSize and EntryCount

    2. Fill in the current end offset into the corresponding EntryOffset.

    3. Append the new Entry.

  4. Otherwise start from 1.

The EntryOffsets allow a reading application to scan through the log much more quickly, and perform a binary search to identify date ranges.

Use-Case

This format is useful for basic logging purposes in applications that run for a longer amount of time. The binary format allows quickly skipping ahead in the file to reach interesting messages or to filter out important events.

Model

This format is for singular triangular meshes only. It does not include a scene graph or the capability for non-triangular or non-static meshes. If animation of the model is desired, animation information can be delivered separately.

Header         ::= format material-type MaterialSize
Payload        ::= Material Faces Vertices
MaterialSize   ::= uint32                  --- The octet size of the Material payload.
Material       ::= Texture{material-count} --- An array of texture maps for the model's material.
Texture        ::= uint16 string{uint16}   --- A relative file path to an image.
Faces          ::= uint32 uint32{uint32}   --- An array of 0-based indices into the Vertices array, every 3 of which describe a face.
Vertices       ::= uint32 float32{uint32}  --- An array of vertices, packed as floats. The count must be a multiple of the float count of an individual vertex.
Position       ::= float32 float32 float32 --- A vertex position in model-space.
UV             ::= float32 float32         --- A texture coordinate in texture-space.
Color          ::= float32 float32 float32 --- An RGB colour triplet, each channel in [0,1].
Normal         ::= float32 float32 float32 --- A surface normal, in tangent-space.
Tangent        ::= float32 float32 float32 --- A surface tangent, in tangent-space.
format         ::= uint8                   --- A single octet identifying the per-vertex format.
material-type  ::= uint8                   --- A single octet identifying the material used.
material-count ::= material-type:4         --- The number of material textures as indicated by material-type.
vertex         ::= <format 01: Position
                        |  02: Position UV
                        |  12: Position Color
                        |  22: Position Normal
                        |  03: Position UV Normal
                        |  13: Position Color Normal
                        |  04: Position UV Normal Tangent
                        |  14: Position Color Normal Tangent>
                                            --- A single-vertex value in the format indicated by format.

The values for material-type are interpreted as follows, and describe the usage and number of Textures:

Any other value for material-type is invalid.
The Metallic texture is a combination of Metalness, Roughness, and Occlusion in the R, G, and B channels respectively.

The included MaterialSize field should allow constant-time access to the vertex data without having to parse the Material structure, if that structure is not needed. If the Faces array is empty, then the faces are implicit and every three vertices in the Vertices array form a face.
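
The relationship between format, the packed Vertices array, and implicit faces can be sketched as follows in Python. The names FLOATS_PER_VERTEX and faces are hypothetical, and rejecting an index count that is not a multiple of three is an assumption of this sketch.

```python
# Floats per vertex for each format octet, summed from the vertex rule above.
FLOATS_PER_VERTEX = {
    0x01: 3,    # Position
    0x02: 5,    # Position UV
    0x12: 6,    # Position Color
    0x22: 6,    # Position Normal
    0x03: 8,    # Position UV Normal
    0x13: 9,    # Position Color Normal
    0x04: 11,   # Position UV Normal Tangent
    0x14: 12,   # Position Color Normal Tangent
}

def faces(fmt: int, face_indices, vertex_floats):
    """Return the faces as index triples, using implicit faces if Faces is empty."""
    if fmt not in FLOATS_PER_VERTEX:
        raise ValueError(f"invalid format {fmt:#04x}")
    stride = FLOATS_PER_VERTEX[fmt]
    if len(vertex_floats) % stride:
        raise ValueError("float count is not a multiple of the vertex size")
    vertex_count = len(vertex_floats) // stride
    # An empty Faces array means every three consecutive vertices form a face.
    indices = list(face_indices) if face_indices else list(range(vertex_count))
    if len(indices) % 3:
        raise ValueError("index count is not a multiple of three")
    return [tuple(indices[i:i + 3]) for i in range(0, len(indices), 3)]
```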

Use-Case

This format is useful for storing uncompressed, directly accessible 3D geometry data. It is packed in such a way that it should be trivial to upload into vertex-buffers for use with GPU rendering toolkits like DirectX, OpenGL, Vulkan, or similar. For instance, the format describes the vertex-array layout, the Faces array makes up the element-buffer, and the Vertices makes up the vertex-buffer.

Physics-Model

This format is for storing a series of convex meshes that make up the collision shapes of a more complex model. It is far more efficient at storing this data than the generic model format and allows multiple shapes in one.

Header       ::= format hull-count
Payload      ::= mass Tensor Hull{hull-count}
Tensor       ::= float32{9}              --- The inertia tensor for the entire model, in row-major order.
Hull         ::= Transform vertex-count Vertex{vertex-count}
                                         --- A single convex hull.
Transform    ::= float32{16}             --- The transform matrix describing the offset and orientation of this hull from the model's origin, in row-major order.
Vertex       ::= float32 float32 float32 --- A single vertex on the convex hull's surface.
hull-count   ::= uint16                  --- The number of convex hulls that make up the model.
vertex-count ::= uint16                  --- The number of vertices that make up the hull's boundary.
mass         ::= float32                 --- The initial mass of the unscaled model in kg.

Use-Case

This format is intended for use in games and other applications that require a convex decomposition of a model for use in collision testing. The packed storage format is ideal for direct use in-engine.

Text

This format allows for a very simple rich text markup. Primitive displays can also ignore all the markup directly and instead display the text plain without needing special processing to strip the markup out.

Header     ::= MarkupSize
Payload    ::= uint32 Markup{uint32} uint64 string{uint64}
MarkupSize ::= uint64                            --- The size of the markup block in octets
Markup     ::= Start End Option                  --- A singular markup option
Start      ::= uint64                            --- The starting (0-based) index of the region being styled.
End        ::= uint64                            --- The end (exclusive) index of the region being styled.
Option     ::= (Bold | Italic | Underline | Strike | Mono | Color | Size | Heading | Link | Target)
                                                 --- A description of the text style.
Bold       ::= [ 01 ]                            --- The font weight should be set to "bold"
Italic     ::= [ 02 ]                            --- The font slant should be set to "italic"
Underline  ::= [ 03 ]                            --- A line should be drawn under the text's baseline.
Strike     ::= [ 04 ]                            --- A line should be drawn between the text's baseline and ascent line.
Mono       ::= [ 05 ]                            --- The font should be set to proportional / monospaced mode.
Color      ::= [ 06 ] float32 float32 float32    --- The text colour should be set to this R G B triplet.
Size       ::= [ 07 ] float32                    --- The text size should be multiplied by this factor.
Heading    ::= [ 08 ] Level                      --- The text should be a heading of this level.
Link       ::= [ 09 ] Address                    --- The text should be an interactable link to its address.
Target     ::= [ 0A ] Address                    --- The text should be a link target for its address.
Level      ::= uint8                             --- The heading level. The higher, the more deeply nested the heading is.
Address    ::= uint16 string{uint16}             --- Some kind of target identifier. Often a URL string.

The font family, default font size, background color, foreground color, and line wrapping mode are all determined by the visualiser. The visualiser may also apply default alternate styling to sections marked up with the Link option. If the Address of a Link is the same as that of a Target option, the Link markup should, when interacted with, point the user to the text marked up by the corresponding Target option. Otherwise the behaviour of interaction with the Link text is up to the implementation.
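
Because the options differ in length, a reader has to walk the markup list entry by entry. The Python sketch below shows one way to do that. The names OPTION_PAYLOAD and read_markup are hypothetical, little-endian byte order is an assumption, and option payloads are returned as raw octets without further decoding.

```python
import struct

# Octets that follow each option tag; None marks a length-prefixed Address.
OPTION_PAYLOAD = {0x01: 0, 0x02: 0, 0x03: 0, 0x04: 0, 0x05: 0,
                  0x06: 12, 0x07: 4, 0x08: 1, 0x09: None, 0x0A: None}

def read_markup(data: bytes, offset: int = 0):
    """Parse `uint32 Markup{uint32}` and return (entries, next_offset).

    Each entry is a tuple (start, end, tag, payload).
    """
    (count,) = struct.unpack_from("<I", data, offset)
    offset += 4
    entries = []
    for _ in range(count):
        # Start and End are uint64, followed by the one-octet option tag.
        start, end, tag = struct.unpack_from("<QQB", data, offset)
        offset += 17
        if tag not in OPTION_PAYLOAD:
            raise ValueError(f"unknown markup option {tag:#04x}")
        size = OPTION_PAYLOAD[tag]
        if size is None:
            # Address ::= uint16 string{uint16}
            (size,) = struct.unpack_from("<H", data, offset)
            offset += 2
        entries.append((start, end, tag, data[offset:offset + size]))
        offset += size
    return entries, offset
```

The returned offset points at the uint64 length of the text string, so a plain display can read the text from there and ignore the entries.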

Use-Case

This format is useful for storing simple rich text documents that don't require complex layouting or processing.

Vector Graphic

This format offers a relatively simple but capable vector graphic format for scalable images.

Header       ::= Width Height Count
Payload      ::= Instruction{Count}
Width        ::= uint32                          --- The width of the visible canvas in units.
Height       ::= uint32                          --- The height of the visible canvas in units.
Count        ::= uint32                          --- The number of instructions to appear.
Instruction  ::= Element | Transform
Element      ::= Line | Rectangle | Circle | Polygon | Curve | Text
Line         ::= [ 01 ] Color Thickness ShapeOutline
                                                 --- A sequence of connected line segments.
Rectangle    ::= [ 02 ] ShapeFill ShapeBounds    --- An axis-aligned rectangle.
Circle       ::= [ 03 ] ShapeFill ShapeBounds    --- An axis-aligned ellipse.
Polygon      ::= [ 04 ] ShapeFill ShapeOutline   --- A many-edged convex polygon.
Curve        ::= [ 05 ] ShapeFill ShapeOutline   --- A cubic Bezier curve directed by its control points.
Text         ::= [ 06 ] Color Point Font FontSize String
                                                 --- A single line of text.
Font         ::= uint16 string{uint16}           --- The name of the font family to use to render the text.
String       ::= uint16 string{uint16}           --- The string of text to be displayed
Transform    ::= Identity | Matrix
Identity     ::= [ 11 ]                          --- A reset of the current transform matrix to the 1 0 0, 0 1 0 matrix.
Matrix       ::= [ 12 ] float32{6}               --- A coordinate transform matrix to be applied to subsequent instructions.
ShapeOutline ::= Edges Point{Edges}              --- A list of edge points of a composite shape.
ShapeBounds  ::= Point Size                      --- The bounding box of a shape.
ShapeFill    ::= Color Color Thickness           --- The fill color, outline color, and outline thickness.
Point        ::= float32 float32                 --- A position in x/y.
Size         ::= float32 float32                 --- The bounding size of a shape in width/height.
Color        ::= float32 float32 float32 float32 --- An RGBA colour quadruplet in the range of [0, 1].
Edges        ::= uint16                          --- The number of points to appear in the edge list.
Thickness    ::= float32                         --- The thickness of the outline in units.
FontSize     ::= float32                         --- The size of an em in units.

The coordinate system should be defined with X growing to the right, Y growing upwards, and the origin being in the lower left corner of the canvas. Whenever a Transform is applied, the given matrix must be applied to all following Elements until the next Transform. This transformation applies to the shape as a whole, not just the Points that define it in the file.

When an element with a ShapeFill should be drawn, the fill must be drawn first, with the outline (if visible) second on top. If the shape is bounded by ShapeBounds, the fill should meet the bounds, and the outline should be centred on the bounds when hitting them. If the shape is bounded by a ShapeOutline, the outline must be drawn centred on the lines defined by the Edges.

For Curve and Polygon, if the Points do not form a closed shape, the fill should be drawn as a closed shape by directly connecting the first and last points with a straight line.

In the case of the Curve, the points should be interpreted as follows: EdgePoint ControlPoint (ControlPoint EdgePoint ControlPoint)* ControlPoint EdgePoint. As such, the Edges number for a Curve must be at least 4 and must satisfy (Edges + 2) % 3 == 0. Any other value is an error. The ControlPoint coordinates are relative to their corresponding EdgePoint.
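
The point layout of a Curve can be illustrated with the Python sketch below, which splits the point list into cubic Bezier segments. The name curve_segments is hypothetical. The sketch assumes that the EdgePoint corresponding to a ControlPoint is the one directly adjacent to it in the list, which is one reading of the text above.

```python
def curve_segments(points):
    """Split Curve points into cubic segments of four absolute (x, y) points."""
    n = len(points)
    # Valid counts are 4, 7, 10, ...: at least 4 and (n + 2) divisible by 3.
    if n < 4 or (n + 2) % 3 != 0:
        raise ValueError(f"invalid Edges number {n} for a Curve")

    def add(a, b):
        return (a[0] + b[0], a[1] + b[1])

    segments = []
    for i in range(0, n - 1, 3):
        start, end = points[i], points[i + 3]
        # Control points are stored relative to their adjacent edge point.
        segments.append((start, add(start, points[i + 1]),
                         add(end, points[i + 2]), end))
    return segments
```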

In the case of Polygon and Line, the Edges number must be at least 2. Any other value is an error.

For the Size, Thickness, FontSize, and Color, all float components must be non-negative and real. Infinities, NaNs, and negative numbers are an error.

For the Point and Matrix, all float components must be real. Infinities and NaNs are an error. The Matrix is specified in row-major order, meaning in order the entries are m00 m01 m02 m10 m11 m12.
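
Applying a Matrix to a Point can be sketched as follows in Python. The names IDENTITY and apply_matrix are hypothetical, and treating the point as a column vector with an implicit third component of 1 is an assumption of this sketch.

```python
import math

# The matrix that an Identity instruction resets to: m00 m01 m02 m10 m11 m12.
IDENTITY = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)

def apply_matrix(matrix, point):
    """Transform an (x, y) point by a row-major 2x3 matrix."""
    if not all(math.isfinite(v) for v in (*matrix, *point)):
        raise ValueError("infinities and NaNs are an error")
    m00, m01, m02, m10, m11, m12 = matrix
    x, y = point
    # The third column (m02, m12) acts as the translation.
    return (m00 * x + m01 * y + m02, m10 * x + m11 * y + m12)
```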

When rendering Text, the Point specifies the location of the middle on the baseline of the first character that is rendered.

If a visualiser or editor of a vector graphic file does not have access to the font specified in a Font field, it should generate an error, but may exchange the font for a similar one. In either case, it must inform the user of the missing font.

Use-Case

This format is useful for representing vector graphics in a light-weight way that should be easy to write a visualiser for. It intentionally does not specify much about text processing and instead leaves most of this up to the implementation.

Metadata

These formats can be delivered as part of a binary stream or deposited in a file system. The following are recommendations for metadata identifiers to distinguish SF3 data without having to parse it.

Mime-Type

The mime-types for SF3 files should be as follows, according to the format used:

If a general SF3 file should be designated, the mime-type should be application/x.sf3. If/when the IANA registration for an official mime-type is approved, the x. prefix may be dropped.

File Extension

The file extension should always end with .sf3. Specifically, for the formats the following extended extensions may be used: