Binary SignWriting HTML Reference




Welcome to the core reference for the open standards of SignWriting.

This reference

You can


Binary SignWriting

Script Encoding Model

Revision 3


Sign language

Sign language is a visual method of communication. The meaning derived from sign language occurs in several ways:

SignWriting

SignWriting is the universal script for all of the world's sign languages. The fundamental writing unit (grapheme) of SignWriting is called a symbol. The symbols of SignWriting are iconic in nature, because there is a link between the form of the symbol and its meaning.

Symbols come in several flavors: writing, sequence, and punctuation. Writing symbols represent shape or movement. Writing symbols are combined spatially on two-dimensional canvases to form individual signs. This spatial writing for a sign is called the spelling or "Spatial SignSpelling".

Sequence symbols represent detailed locations and are only used in the sequence or "SignSpelling Sequence" of a sign. The SignSpelling Sequence is a list of writing symbols and sequence symbols. Most often, the symbols of the sequence are the same as used in the spelling, so it will not include sequence symbols. Sequence symbols are not used for everyday writing, but may be useful for sorting large dictionaries, refining animation, simplifying translation between scripts and notation systems, and for detailed analysis of location sometimes needed in linguistic research.

Punctuation symbols are used between signs, always alone in the middle lane.

Writing symbols can be divided into two groups, centering and non-centering. Head and torso symbols are centering symbols. The rest are non-centering. Centering symbols are important for determining the center of a sign. Each sign can be enclosed by a smallest possible rectangle called a bounding box. If a sign only has non-centering symbols, its center is the center of its bounding box.

If a sign contains centering symbols, then a centering bounding box is defined which only encloses the centering symbols of the sign. The center for such a sign is the center of its centering bounding box.
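
The centering rule above can be sketched in Python. This is a minimal illustration and not part of the standard: the Box type, its field names, and the sign_center function are hypothetical, and symbol rectangles are assumed to be given in canvas coordinates.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Hypothetical rectangle occupied by one symbol on the sign canvas."""
    left: float
    top: float
    right: float
    bottom: float
    centering: bool  # True for head and torso symbols


def sign_center(symbols: List[Box]) -> Tuple[float, float]:
    """Return the (x, y) center of a sign made of at least one symbol."""
    # Use only the centering symbols when the sign has any; otherwise use all.
    chosen = [s for s in symbols if s.centering] or symbols
    left = min(s.left for s in chosen)
    top = min(s.top for s in chosen)
    right = max(s.right for s in chosen)
    bottom = max(s.bottom for s in chosen)
    # The center of the smallest rectangle enclosing the chosen symbols.
    return ((left + right) / 2, (top + bottom) / 2)
```

With one centering symbol and one non-centering symbol, only the centering symbol's rectangle determines the result.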

The completed signs are listed sequentially either horizontally or vertically. The signs are aligned based on their centers.

To represent changes to the center of gravity of a signer, SignWriting uses lanes for vertical writing. By default, signs are written in the middle lane with the center of the sign lined up with the center of the middle lane. For body weight shifts to the left or the right, the center of a sign can be offset to line up with the center of a left or right lane. The left and right lanes use a fixed horizontal offset from the middle lane.


International SignWriting Alphabet 2010

The latest symbol set of SignWriting is called the International SignWriting Alphabet 2010 (ISWA 2010). The ISWA uses a 4-level hierarchy. The top level defines 7 Categories. Under the categories are the 30 SymbolGroups. Under the SymbolGroups are the 652 BaseSymbols. Under each BaseSymbol, there are a maximum of 96 valid symbols (6 fills by 16 rotations).

Abstract Character Repertoire

Binary SignWriting defines a repertoire of abstract characters. A character is a unit of information. A character can represent a printable symbol (grapheme) or a non-printable concept (control character). The printable characters of Binary SignWriting are the BaseSymbols of the ISWA 2010. The non-printable characters fall into 3 categories: modifiers, structural markers, and number characters.

Structural Markers

BaseSymbols

Modifiers

Number Characters

Coded Character Set

Character encoding pairs a unique code (or number) with each character. Binary SignWriting uses fixed-width 12-bit character codes: numbers between 0 and 4095 that can be represented with three hexadecimal digits.
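
As a small sketch of the three-digit hexadecimal notation, the helper names below (to_hex3 and from_hex3) are illustrative assumptions, not names defined by the standard.

```python
def to_hex3(code: int) -> str:
    """Format a 12-bit character code as three lowercase hexadecimal digits."""
    if not 0 <= code <= 0xFFF:
        raise ValueError("a 12-bit code must be between 0 and 4095")
    return format(code, "03x")


def from_hex3(text: str) -> int:
    """Parse three hexadecimal digits back into a character code."""
    if len(text) != 3:
        raise ValueError("expected exactly three hexadecimal digits")
    return int(text, 16)
```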

Structural Markers

BaseSymbols

Modifiers

Number Characters

Character Encoding Form

The Unicode standard is used as the character encoding form. The codes for the set are shifted twice: first to move the range, and second to move the plane.

Primary Shift

The primary shift moves the codes to a higher range. Each code is increased by 55,046 (d706 in hex). This shifts the 12-bit range of 0fa - 5f9 to the 16-bit range of d800 - dcff.
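
The arithmetic of the primary shift can be checked with a short sketch. The function and constant names are assumptions made for illustration.

```python
PRIMARY_SHIFT = 0xD706  # 55,046 in decimal

FIRST_CODE = 0x0FA  # lowest assigned 12-bit code
LAST_CODE = 0x5F9   # highest assigned 12-bit code


def primary_shift(code: int) -> int:
    """Move a 12-bit Binary SignWriting code into the range d800 - dcff."""
    if not FIRST_CODE <= code <= LAST_CODE:
        raise ValueError("code is outside the range 0fa - 5f9")
    return code + PRIMARY_SHIFT
```

Applying the function to both ends of the range gives d800 and dcff, matching the range stated above.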

Secondary Shift

There are 3 potential secondary shifts, which move the codes to plane 1, 15, or 16. Each shift simply prefixes the plane number in hex to the 16-bit code: character d800 becomes 1d800 for plane 1, fd800 for plane 15, and 10d800 for plane 16. Plane 15 is used by default.
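
The secondary shift amounts to adding the plane number times 10000 (hex). A minimal sketch, with a hypothetical function name:

```python
def secondary_shift(shifted_code: int, plane: int = 15) -> int:
    """Move a primary-shifted code (d800 - dcff) into plane 1, 15, or 16."""
    if plane not in (1, 15, 16):
        raise ValueError("plane must be 1, 15, or 16")
    if not 0xD800 <= shifted_code <= 0xDCFF:
        raise ValueError("expected a code in the range d800 - dcff")
    # Each Unicode plane spans 0x10000 code points.
    return plane * 0x10000 + shifted_code
```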

If you look at the roadmap for the Supplementary Multilingual Plane, you'll see that room has been set aside for SignWriting in rows 1d8 through 1db. The next four rows are blank but set aside for notation systems in general. If row 1dc can be reserved for SignWriting as well, this new encoding will fit properly in the space allotted.

Character Encoding Scheme

UTF-8 is well suited as the character encoding scheme. Converting back and forth is a little tricky, but if you know how to encode the first character of any plane, you can create a conversion for the entire plane.
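
In a language with a built-in UTF-8 codec, the conversion does not need to be written by hand. The sketch below relies on Python's standard str.encode and bytes.decode; the function names and the default plane argument are assumptions for illustration.

```python
PRIMARY_SHIFT = 0xD706


def bsw_to_utf8(code: int, plane: int = 15) -> bytes:
    """Encode one 12-bit Binary SignWriting code as UTF-8 bytes."""
    code_point = plane * 0x10000 + code + PRIMARY_SHIFT
    return chr(code_point).encode("utf-8")


def utf8_to_bsw(data: bytes) -> int:
    """Decode the UTF-8 bytes of one character back to its 12-bit code."""
    code_point = ord(data.decode("utf-8"))
    # Drop the plane, then undo the primary shift.
    return (code_point & 0xFFFF) - PRIMARY_SHIFT
```

For example, the first assigned code, 0fa, becomes the four bytes f3 bd a0 80 in plane 15.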

Symbol Encoding

The ISWA 2010 defines 37,811 valid symbols, each with a unique ID. The symbol ID is a six-part number that combines "category" - "group" - "base" - "variation" - "fill" - "rotation", for example "01-01-001-01-01-01".

A symbol is a specific fill and rotation of a BaseSymbol. A symbol shares the first 4 numbers of its symbol ID with the BaseSymbol. The fill value can range from 1 to 6, while the rotation can range from 1 to 16.

Each symbol has a unique key that is 5 hexadecimal digits long. The first 3 digits represent the BaseSymbol character. The 4th digit is equal to the fill minus one. The 5th digit is equal to the rotation minus one.
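
The key layout can be sketched as follows. The function names are hypothetical; only the digit layout comes from the description above.

```python
from typing import Tuple


def symbol_key(base_code: int, fill: int, rotation: int) -> str:
    """Build the 5-digit key: 3 digits of BaseSymbol, then fill - 1, then rotation - 1."""
    if not 1 <= fill <= 6 or not 1 <= rotation <= 16:
        raise ValueError("fill must be 1..6 and rotation must be 1..16")
    return format(base_code, "03x") + format(fill - 1, "x") + format(rotation - 1, "x")


def parse_symbol_key(key: str) -> Tuple[int, int, int]:
    """Split a 5-digit key into (BaseSymbol code, fill, rotation)."""
    if len(key) != 5:
        raise ValueError("a symbol key has exactly 5 hexadecimal digits")
    return int(key[:3], 16), int(key[3], 16) + 1, int(key[4], 16) + 1
```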

Each symbol has a unique code. The first symbol has a code of 1. Valid and invalid symbol positions are equally numbered, so 96 symbol codes are available per BaseSymbol. Using decimal values, a symbol code can be computed: ((BaseSymbol code - 256) * 96) + ((fill - 1) * 16) + rotation.
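
The formula translates directly into code. The inverse function is an addition for illustration that follows from the formula; both function names are assumptions.

```python
from typing import Tuple


def symbol_code(base_code: int, fill: int, rotation: int) -> int:
    """Compute the symbol code from decimal BaseSymbol code, fill, and rotation."""
    return (base_code - 256) * 96 + (fill - 1) * 16 + rotation


def symbol_parts(code: int) -> Tuple[int, int, int]:
    """Invert symbol_code: return (BaseSymbol code, fill, rotation)."""
    base_offset, rest = divmod(code - 1, 96)
    fill_index, rotation_index = divmod(rest, 16)
    return base_offset + 256, fill_index + 1, rotation_index + 1
```

The first BaseSymbol (code 256, or 100 in hex) therefore covers symbol codes 1 through 96, and the second BaseSymbol starts at 97.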

It requires three characters of Binary SignWriting to represent a specific symbol. The first character is the BaseSymbol character. The second character is the fill modifier, with a character code equal to the fill value plus decimal 907. The third character is the rotation modifier, with a character code equal to the rotation value plus decimal 913. For example, the first symbol is "100 38c 392" in hexadecimal and "󽠆 󽪒 󽪘" in UTF-8.
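
A minimal sketch of the three-character representation, using the two decimal offsets given above. The function and constant names are illustrative assumptions.

```python
from typing import List

FILL_OFFSET = 907      # fill 1..6 maps to codes 38c..391
ROTATION_OFFSET = 913  # rotation 1..16 maps to codes 392..3a1


def symbol_characters(base_code: int, fill: int, rotation: int) -> List[int]:
    """Return the three character codes that represent one symbol."""
    if not 1 <= fill <= 6:
        raise ValueError("fill must be between 1 and 6")
    if not 1 <= rotation <= 16:
        raise ValueError("rotation must be between 1 and 16")
    return [base_code, fill + FILL_OFFSET, rotation + ROTATION_OFFSET]


def as_hex(codes: List[int]) -> str:
    """Render character codes as space-separated three-digit hexadecimal."""
    return " ".join(format(code, "03x") for code in codes)
```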


Script Encoding Model

Valid Data Stream

The data stream for Binary SignWriting is a sequential list of characters. Not all data streams are valid. A valid stream must contain valid structures. There are 2 main structures: signs and punctuation. A valid punctuation structure is three characters long and represents a single punctuation symbol.

A sign is a much more complex structure. It consists of a SignBox Marker followed by a cluster of spatial symbols and an optional sequence. A spatial symbol is five characters long: a writing symbol (a BaseSymbol, a fill modifier, and a rotation modifier) followed by 2 number characters. A valid cluster of spatial symbols is a list of zero or more spatial symbols. A valid sequence is a Sequence Marker followed by one or more writing symbols and/or sequence symbols.

Tokens

A token is an alternate view of a character. Each Binary SignWriting character can be represented by a single ASCII character. For Binary SignWriting, there are 15 tokens: LBRQhmdftxsPion. Case does not matter. Upper and lower cases are used to aid human scanning. A data stream for Binary SignWriting can be converted into a token stream to validate and parse the data with regular expressions.

Name                Regular expression for token analysis          Description
SignBox Marker      [LBR]                                          Pick one: plain SignBox Marker, Left Lane SignBox Marker, Right Lane SignBox Marker
Writing BaseSymbol  [hmdftx]                                       Pick one: hand, movement, dynamic, head, trunk, or limb symbol
Spatial Symbol      [hmdftx]ionn                                   Writing BaseSymbol followed by a fill modifier, a rotation modifier, and two number characters
Cluster             ([hmdftx]ionn)*                                A list of zero or more spatial symbols
Sequence            Q([hmdftxs]io)+                                Sequence Marker followed by one or more writing symbols and/or sequence symbols
Sign                [LBR]([hmdftx]ionn)*(Q([hmdftxs]io)+)?         A SignBox Marker, followed by a cluster, followed by an optional sequence
Punctuation         Pio                                            A punctuation BaseSymbol followed by a fill modifier and a rotation modifier
SignText            ([LBR]([hmdftx]ionn)*(Q([hmdftxs]io)+)?|Pio)+  A list of signs and punctuation
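
The expressions in the table can be used as-is with a regular expression engine. The sketch below uses Python's re module; the function name is an assumption, and so is the choice to ignore whitespace between structures. Matching is case sensitive because the token letters in the table use a fixed case.

```python
import re

SIGN = r"[LBR](?:[hmdftx]ionn)*(?:Q(?:[hmdftxs]io)+)?"
PUNCTUATION = r"Pio"
SIGN_TEXT = re.compile(rf"(?:{SIGN}|{PUNCTUATION})+")


def is_valid_sign_text(tokens: str) -> bool:
    """Check a token stream against the SignText expression."""
    # Assumption: whitespace is only a visual separator, so it is removed first.
    compact = "".join(tokens.split())
    return SIGN_TEXT.fullmatch(compact) is not None
```

A lone SignBox Marker such as "B" is accepted because a cluster may be empty, while "BQ" is rejected because a sequence needs at least one symbol after the Sequence Marker.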

Character Reference Table

Name                        Token  BSW        UTF      Notes
Left Lane SignBox Marker    L      0fa        󽠀        A marker for a new sign in the left lane.
Middle Lane SignBox Marker  B      0fb        󽠁        A marker for a new sign in the middle lane.
Right Lane SignBox Marker   R      0fc        󽠂        A marker for a new sign in the right lane.
Sequence Marker             Q      0fd        󽠃        A marker for a sequence of writing and sequence symbols.
Hand BaseSymbols            h      100 - 204  󽠆 - 󽤊    A hand BaseSymbol from category 1.
Movement BaseSymbols        m      205 - 2f6  󽤋 - 󽧼    A movement BaseSymbol from category 2.
Dynamic BaseSymbols         d      2f7 - 2fe  󽧽 - 󽨄    A dynamic BaseSymbol from category 3.
Head BaseSymbols            f      2ff - 36c  󽨅 - 󽩲    A head BaseSymbol from category 4. Responsible for primary centering.
Trunk BaseSymbols           t      36d - 375  󽩳 - 󽩻    A trunk BaseSymbol from category 5, SymbolGroup 27. Responsible for secondary centering.
Limb BaseSymbols            x      376 - 37e  󽩼 - 󽪄    A limb BaseSymbol from category 5, SymbolGroup 28. Responsible for tertiary centering.
Sequence BaseSymbols        s      37f - 386  󽪅 - 󽪌    A non-spatial BaseSymbol that can only be used after the Sequence Marker.
Punctuation BaseSymbols     P      387 - 38b  󽪍 - 󽪑    A non-spatial symbol always used by itself in the middle lane.
Fill Modifiers              i      38c - 391  󽪒 - 󽪗    A fill modifier for a BaseSymbol.
Rotation Modifiers          o      392 - 3a1  󽪘 - 󽪧    A rotation modifier for a BaseSymbol.
Number Characters           n      3a2 - 5f9  󽪨 - 󽳿    Numbers -299 through 300, encoded as characters to avoid collisions when parsing.
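
The BSW column of the table is enough to map any character code to its token. The following sketch copies the ranges from the table; the names TOKEN_RANGES and token_for are assumptions for illustration.

```python
# (first code, last code, token), copied from the Character Reference Table.
TOKEN_RANGES = [
    (0x0FA, 0x0FA, "L"), (0x0FB, 0x0FB, "B"), (0x0FC, 0x0FC, "R"),
    (0x0FD, 0x0FD, "Q"), (0x100, 0x204, "h"), (0x205, 0x2F6, "m"),
    (0x2F7, 0x2FE, "d"), (0x2FF, 0x36C, "f"), (0x36D, 0x375, "t"),
    (0x376, 0x37E, "x"), (0x37F, 0x386, "s"), (0x387, 0x38B, "P"),
    (0x38C, 0x391, "i"), (0x392, 0x3A1, "o"), (0x3A2, 0x5F9, "n"),
]


def token_for(code: int) -> str:
    """Return the single-letter token for a 12-bit character code."""
    for first, last, token in TOKEN_RANGES:
        if first <= code <= last:
            return token
    raise ValueError(f"{code:03x} is not an assigned Binary SignWriting code")
```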

Hello world example

The following is ASL for "Hello World."
0fb14c38e3924a54bd27138c3984bb4cf 0fb18738c39c4c04c618738c3934b94b020538c3924d14b32ef38c3924c949c 38838c392

The Token stream for this is:
Bhionnmionn Bhionnhionnmionnmionn Pio

If we consider the first sign, we find "Bhionnmionn". The first token is "B", a SignBox Marker in the middle lane. Next, there are 2 spatial symbols "hionn" and "mionn".

The next sign starts with the second "B" and has 4 spatial symbols. Finally, at the end we find a punctuation symbol "Pio".
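
The walk-through above can be reproduced with a short sketch that splits the BSW text into three-digit codes and looks up each token. The range starts are taken from the Character Reference Table; the function name and the use of bisect are choices made for this illustration only.

```python
import re
from bisect import bisect_right

# (first code of a range, token); None marks unassigned codes.
STARTS = [
    (0x0FA, "L"), (0x0FB, "B"), (0x0FC, "R"), (0x0FD, "Q"), (0x0FE, None),
    (0x100, "h"), (0x205, "m"), (0x2F7, "d"), (0x2FF, "f"), (0x36D, "t"),
    (0x376, "x"), (0x37F, "s"), (0x387, "P"), (0x38C, "i"), (0x392, "o"),
    (0x3A2, "n"), (0x5FA, None),
]
_KEYS = [start for start, _ in STARTS]


def tokens_from_bsw(bsw_hex: str) -> str:
    """Convert a string of three-digit hexadecimal codes into a token stream."""
    if len(bsw_hex) % 3 != 0:
        raise ValueError("BSW text must be a multiple of three hex digits")
    tokens = []
    for i in range(0, len(bsw_hex), 3):
        code = int(bsw_hex[i:i + 3], 16)
        index = bisect_right(_KEYS, code) - 1
        token = STARTS[index][1] if index >= 0 else None
        if token is None:
            raise ValueError(f"{code:03x} is not an assigned code")
        tokens.append(token)
    return "".join(tokens)


hello_world = ("0fb14c38e3924a54bd27138c3984bb4cf "
               "0fb18738c39c4c04c618738c3934b94b020538c3924d14b32ef38c3924c949c "
               "38838c392")
streams = [tokens_from_bsw(part) for part in hello_world.split()]
spatial_counts = [len(re.findall(r"[hmdftx]ionn", s)) for s in streams]
```

Here streams holds the three token streams shown above, and spatial_counts holds the number of spatial symbols found in each structure.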


Reference Guide

This reference guide documents and analyzes valid Binary SignWriting.

View Screen

The view screen displays sign text using either BSW or UTF-8. The sign text is displayed vertically with lanes in columns.

Detail Screen

The detail screen gives a detailed breakdown of the sign text. First, the text is broken down into structures of signs and punctuation. Next, each character of a sign (or punctuation) is detailed with BSW, UTF-8, Token and value data.

Sort Screen

The sort screen shows the sort for a group of signs using the SignSpelling Sequence. The sequence data is either contained in the sign data or automatically created from the symbols used in the Spatial SignSpelling. To create the automated sequence, the writing symbols are stripped from the spelling and sorted into a list.

Index Screen

The Index screen shows an alternate sort for a group of signs. For each sign, a sorted list of BaseSymbols is created according to the symbols used in the Spatial SignSpelling. The signs are sorted based on these BaseSymbol lists.
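
One way to picture this sort is sketched below. It is not the reference implementation: it assumes signs are given as lists of character codes, that BaseSymbols are ordered by their numeric code, and that the lists are compared element by element. The function names are hypothetical.

```python
from typing import List

SEQUENCE_MARKER = 0x0FD
FIRST_WRITING_BASE = 0x100  # first hand BaseSymbol
LAST_WRITING_BASE = 0x37E   # last limb BaseSymbol


def index_key(sign: List[int]) -> List[int]:
    """Return the sorted BaseSymbol codes used in the spelling of one sign."""
    bases = []
    for code in sign:
        if code == SEQUENCE_MARKER:
            break  # everything after the marker belongs to the sequence
        if FIRST_WRITING_BASE <= code <= LAST_WRITING_BASE:
            bases.append(code)
    return sorted(bases)


def sort_for_index(signs: List[List[int]]) -> List[List[int]]:
    """Sort signs by their sorted BaseSymbol lists."""
    return sorted(signs, key=index_key)
```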

Frequency Screen

The Frequency screen has 2 sections. The BaseSymbol Frequency for All Signs section shows the counts for BaseSymbols used in all of the Spatial SignSpellings, the counts for the BaseSymbols used in all of the SignSpelling Sequences, and the counts for all of the Punctuation BaseSymbols.

The Symbol Frequency by BaseSymbol gives a usage count for each BaseSymbol and lists the symbols used from that BaseSymbol.

Format Screen

The Format screen shows 4 representations of the same data: BSW, Unicode PUA, SignWriting Cartesian Markup, and XML. BSW and Unicode PUA are actively supported. The SignWriting Cartesian Markup is waiting for an official proposal and a technical note. The XML is for informational purposes only.



Copyright 2007-2010 Stephen E Slevinski Jr.
Some Rights Reserved.

Except where otherwise noted, this work is licensed under
Creative Commons Attribution ShareAlike 3.0