The Surrogate Pair Calculator etc.


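The first calculator takes a hexadecimal Unicode scalar value (10000 - 10FFFF) and returns the corresponding surrogate pair.

The second calculator takes a hexadecimal Unicode surrogate pair (D800-DBFF and DC00-DFFF) and returns the corresponding scalar value.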
 

This page also converts Unicode text to HTML numeric character references, in two forms: as surrogate pairs and as single codepoint values.  You may copy and paste text from any source, whether UTF-8 or UTF-16 encoded, even if you do not have a font that displays it.
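The two output forms can be sketched as follows.  This is a minimal illustration, not the page's own script: the function names toPairReferences and toSingleReferences are hypothetical, and the sketch assumes that every character, including plain ASCII, is written as a hexadecimal reference.

```javascript
// Hypothetical helper names; not taken from the original page.

// "pairs" form: one reference per UTF-16 code unit, so a supplementary
// character becomes two references (its surrogate pair).
function toPairReferences(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += "&#x" + text.charCodeAt(i).toString(16).toUpperCase() + ";";
  }
  return out;
}

// "single" form: one reference per code point.  A high surrogate followed
// by a low surrogate is combined as
// N = (H - 0xD800) * 0x400 + (L - 0xDC00) + 0x10000.
function toSingleReferences(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    let code = text.charCodeAt(i);
    if (code >= 0xD800 && code <= 0xDBFF && i + 1 < text.length) {
      const low = text.charCodeAt(i + 1);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        code = (code - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
        i++; // skip the low surrogate that was just consumed
      }
    }
    out += "&#x" + code.toString(16).toUpperCase() + ";";
  }
  return out;
}

const sampleText = "A\uD800\uDF81"; // "A" followed by U+10381
console.log(toPairReferences(sampleText));   // &#x41;&#xD800;&#xDF81;
console.log(toSingleReferences(sampleText)); // &#x41;&#x10381;
```

An unpaired surrogate is left as it is in both forms, since there is no scalar value to combine it into.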
 


This page is a collection of information about using JavaScript to convert to and from Unicode surrogate pairs, and some tests of displaying supplementary plane characters.

For more explanation of surrogate pairs and supplementary plane characters, see:
• www.unicode.org
• Tex Texin's site (I18n Guy)
• David Perry's site
• James Kass's site, home of the Code2001 font

Another useful font is Alphabetum.  It and Code2001 are used to display supplementary plane characters on this page.

 

The algorithm for converting to and from surrogate pairs is not widely published on the internet.  The official source is The Unicode Standard 3.0 (not later versions), Section 3.7, Surrogates.

Conversion of a Unicode scalar value S to a surrogate pair <H, L>:

H = (S - 10000₁₆) / 400₁₆ + D800₁₆
L = (S - 10000₁₆) % 400₁₆ + DC00₁₆

where the operator "/" is defined in Section 0.2, Notational Conventions, as "integer division (rounded down)," and "%" as "modulo operation; equivalent to the integer remainder for positive numbers."

Sample JavaScript:
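A minimal sketch of the formula above.  The function name scalarToSurrogatePair and the range check are assumptions made for illustration, not taken from the original page; Math.floor stands in for the "integer division (rounded down)" operator.

```javascript
// Hypothetical function name; a sketch of the formula, not the page's script.
function scalarToSurrogatePair(s) {
  if (!Number.isInteger(s) || s < 0x10000 || s > 0x10FFFF) {
    throw new RangeError("Expected a scalar value from 0x10000 to 0x10FFFF");
  }
  const offset = s - 0x10000;
  const high = Math.floor(offset / 0x400) + 0xD800; // "/" rounded down
  const low = (offset % 0x400) + 0xDC00;            // "%" is the remainder
  return [high, low];
}

const [highUnit, lowUnit] = scalarToSurrogatePair(0x10381);
console.log(highUnit.toString(16).toUpperCase(),
            lowUnit.toString(16).toUpperCase()); // D800 DF81
```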


The conversion of a surrogate pair <H, L> to a scalar value:

N = (H - D800₁₆) * 400₁₆ + (L - DC00₁₆) + 10000₁₆

Sample JavaScript:
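A minimal sketch of the reverse formula.  The function name surrogatePairToScalar and the range checks are assumptions made for illustration, not taken from the original page.

```javascript
// Hypothetical function name; a sketch of the formula, not the page's script.
function surrogatePairToScalar(high, low) {
  if (!Number.isInteger(high) || high < 0xD800 || high > 0xDBFF) {
    throw new RangeError("High surrogate must be in D800-DBFF");
  }
  if (!Number.isInteger(low) || low < 0xDC00 || low > 0xDFFF) {
    throw new RangeError("Low surrogate must be in DC00-DFFF");
  }
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

console.log(surrogatePairToScalar(0xD800, 0xDF81).toString(16).toUpperCase()); // 10381
```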

 

Here is some miscellaneous test information.  Using Windows XP, I am only able to display plane 1 characters in Internet Explorer 6 when:
1.  they are written as numeric character references,
2.  the character encoding for the page is set to "User Defined," and
3.  the page is saved as an ANSI document, not any type of Unicode.

I am sometimes able to change the encoding of online documents displayed in the browser; but for documents that reside on my computer, if they are saved as anything other than ANSI, I am unable to view plane 1 characters in IE6 at all.

I am able to view UTF-8 and UTF-16 encoded characters on Mozilla Firefox 1.5, Netscape 8, Opera 9, and K-Meleon 1.02 after making the following registry change (see the MSDN library and Tex Texin's site for details):
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\LanguagePack]
"SURROGATE"=dword:00000002

The following change is also recommended (use the font of your choice), but it seems to have no effect on my system:
[HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\International\Scripts\42]
"IEFixedFontName"="Code2001"
"IEPropFontName"="Code2001"
The keys in this group store the settings made via Tools-Internet Options-Fonts.  Key number 40 corresponds to "User Defined"; adding key number 42 makes no change that I can identify.

Using Windows XP Professional, Internet Explorer 7 displays all the character encodings above without any registry changes.

When creating or saving documents with Notepad, four encoding choices are available, but the nomenclature is not specific; in Windows applications, "Unicode" means "UTF-16 little endian."
• ANSI (plain text)
• Unicode (= UTF-16 little endian, the native Windows XP encoding)
• Unicode big endian (= UTF-16 big endian)
• UTF-8
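To make the difference between the three Unicode choices concrete, here is a rough sketch that lists the bytes each one uses for the test character U+10381.  The helper names hexBytes and utf16Bytes are hypothetical, TextEncoder is assumed to be available (current browsers and Node.js), and any byte order mark an editor may write at the start of the file is not shown.

```javascript
// Hypothetical helpers for illustration only.
function hexBytes(bytes) {
  return Array.from(bytes, b => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

// Split each UTF-16 code unit into two bytes in the requested order.
function utf16Bytes(text, littleEndian) {
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const hi = unit >> 8;
    const lo = unit & 0xFF;
    if (littleEndian) {
      bytes.push(lo, hi);
    } else {
      bytes.push(hi, lo);
    }
  }
  return bytes;
}

const testChar = "\uD800\uDF81"; // U+10381 as a surrogate pair
console.log(hexBytes(utf16Bytes(testChar, true)));       // 00 D8 81 DF  (UTF-16 little endian)
console.log(hexBytes(utf16Bytes(testChar, false)));      // D8 00 DF 81  (UTF-16 big endian)
console.log(hexBytes(new TextEncoder().encode(testChar))); // F0 90 8E 81  (UTF-8)
```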

This page was saved as an ANSI document, and the charset declaration is
<meta http-equiv="Content-Type" content="text/html; charset=x-user-defined">
You should be able to view the following if you have ALPHABETUM Unicode or Code2001 on your computer.  You can try changing the character encoding to "User Defined" (under the "View" menu) if it is not already.  You can set a default "User Defined" font via Tools-Internet Options-Fonts, or Tools-Options-Content or General-Fonts & Colors-Advanced.

The following test character is the one at codepoint 10381, Ugaritic letter beta.  It displays properly when it is written as a numeric character reference, either as the original codepoint or as a surrogate pair, in either hexadecimal or decimal:
&#x10381;  𐎁
&#66433;  𐎁
&#xD800;&#xDF81;  𐎁
&#55296;&#57217;  𐎁

When using JavaScript, however, the character MUST be scripted as a surrogate pair:
String.fromCharCode(0xD800) + String.fromCharCode(0xDF81); 
String.fromCharCode(55296) + String.fromCharCode(57217); 

It does not display when scripted using the original codepoint:
String.fromCharCode(0x10381); 
String.fromCharCode(66433); 
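The reason is that String.fromCharCode reduces each argument to a 16-bit value, so 0x10381 is silently truncated to 0x0381.  A short sketch of the difference follows; String.fromCodePoint is an ECMAScript 2015 addition and is assumed to be available, which it is not in the browsers tested on this page.

```javascript
// fromCharCode keeps only the low 16 bits of each argument.
const truncated = String.fromCharCode(0x10381);
console.log(truncated.length);                     // 1
console.log(truncated.charCodeAt(0).toString(16)); // 381

// Passing the surrogate pair produces the intended two-unit string.
const fromPair = String.fromCharCode(0xD800, 0xDF81);
console.log(fromPair.length);                      // 2

// In engines that support it, fromCodePoint accepts the scalar value directly.
console.log(fromPair === String.fromCodePoint(0x10381)); // true
```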

When reading a character, JavaScript "sees" it as a surrogate pair, even when the HTML is written as a single character.
The following form field contains the single HTML reference &#x10381; 

document.theForm.theField.value.charCodeAt(0) + " + " +
document.theForm.theField.value.charCodeAt(1); 

document.theForm.theField.value.charCodeAt(0).toString(16).toUpperCase() + " + " +
document.theForm.theField.value.charCodeAt(1).toString(16).toUpperCase(); 
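The same reads can be sketched without a form.  Here the string literal fieldValue is a hypothetical stand-in for document.theForm.theField.value, and codePointAt is an ECMAScript 2015 method assumed to be available; it recombines the pair into the original scalar value.

```javascript
// Hypothetical stand-in for the form field's value (U+10381).
const fieldValue = "\uD800\uDF81";

console.log(fieldValue.length); // 2
console.log(fieldValue.charCodeAt(0) + " + " + fieldValue.charCodeAt(1));
// 55296 + 57217
console.log(fieldValue.charCodeAt(0).toString(16).toUpperCase() + " + " +
            fieldValue.charCodeAt(1).toString(16).toUpperCase());
// D800 + DF81

// Reading the whole code point in one step, where supported.
console.log(fieldValue.codePointAt(0)); // 66433
```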

 

www.russellcottrell.com