Chapter 6 - Character Set Translation

This chapter describes how to control the way the assembler translates between the EBCDIC, ASCII and Unicode character sets.

Code Points and Code Pages

There are numerous ways to define which code points (numeric values) are assigned to which characters. Each of these schemes can be called a code page. IBM has assigned numbers to many of the different code pages.

Since the Tachyon assemblers can read and write files containing characters encoded in EBCDIC, ASCII and/or Unicode, the assembler needs to know how to translate between these character sets. The CODEPAGE option tells the assembler which EBCDIC and ASCII code pages it should assume the characters are encoded in. Each assembly uses exactly one EBCDIC and one ASCII code page.

The assembler supports over 90 different Single Byte Character Set (SBCS) code pages. All of the EBCDIC and ASCII code pages are defined in terms of their translation to and from Unicode. When translating between EBCDIC and ASCII, characters are effectively converted first to Unicode and then to the target code page.
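
The two-step translation described above can be sketched in Python. This is only an illustration, not the assembler's implementation: it assumes that Python's built-in cp037 and latin_1 codecs match code pages 37 and 819, and the function name ebcdic_to_ascii is hypothetical.

```python
def ebcdic_to_ascii(data: bytes, ebcdic: str = "cp037", ascii_cp: str = "latin_1") -> bytes:
    """Translate bytes from an EBCDIC code page to an ASCII code page via Unicode.

    The codec names are assumptions standing in for code pages 37 and 819.
    """
    text = data.decode(ebcdic)    # step 1: EBCDIC code points -> Unicode
    return text.encode(ascii_cp)  # step 2: Unicode -> ASCII code points


print(ebcdic_to_ascii(bytes.fromhex("C8C5D3D3D6")))  # b'HELLO'
```

If a character exists in the EBCDIC code page but not in the ASCII code page, the second step raises UnicodeEncodeError in this sketch, which corresponds to a character that cannot be translated.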

Two code pages normally do not define the same set of 256 characters, so some characters usually cannot be translated between them. The assembler requires that every character in the IBM High Level Assembler’s Standard Character Set be translatable between the selected pair of EBCDIC and ASCII code pages. All of these characters except the three national characters must also translate to their usual code points. The characters with fixed code points are the uppercase letters A-Z, the lowercase letters a-z, the digits 0-9 and the following:
Character  blank  &   '   (   )   *   +   ,   -   .   /   :   =   _
ASCII      20     26  27  28  29  2A  2B  2C  2D  2E  2F  3A  3D  5F
EBCDIC     40     50  7D  4D  5D  5C  4E  6B  60  4B  61  7A  7E  6D
The national characters at EBCDIC code points X'5B', X'7B' and X'7C' must also be translatable to ASCII. In code page 37 (EBCDIC USA), these code points correspond to the $, # and @ characters.
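
As a rough way to experiment with these rules, the following Python sketch checks a pair of codecs against them. It is an assumption-laden illustration, not the assembler's own check: the function name pair_is_usable is hypothetical, Python codec names stand in for IBM code page numbers, and the letter and digit code points are the conventional EBCDIC ranges (they also appear in the CODEPAGE(1047,819,LIST) output).

```python
import string

# Special characters and their required EBCDIC code points, from the table above.
SPECIALS = " &'()*+,-./:=_"
SPECIAL_EBCDIC = bytes.fromhex("40507D4D5D5C4E6B604B617A7E6D")

# Conventional EBCDIC code points for A-Z, a-z and 0-9.
UPPER_EBCDIC = bytes([*range(0xC1, 0xCA), *range(0xD1, 0xDA), *range(0xE2, 0xEA)])
LOWER_EBCDIC = bytes([*range(0x81, 0x8A), *range(0x91, 0x9A), *range(0xA2, 0xAA)])
DIGIT_EBCDIC = bytes(range(0xF0, 0xFA))

REQUIRED_CHARS = string.ascii_uppercase + string.ascii_lowercase + string.digits + SPECIALS
REQUIRED_EBCDIC = UPPER_EBCDIC + LOWER_EBCDIC + DIGIT_EBCDIC + SPECIAL_EBCDIC

# The national characters only need to be translatable, not fixed.
NATIONAL_EBCDIC = bytes([0x5B, 0x7B, 0x7C])


def pair_is_usable(ebcdic: str, ascii_cp: str) -> bool:
    """Return True if the codec pair satisfies the rules described above."""
    try:
        # Required characters must sit at their usual EBCDIC code points...
        if REQUIRED_CHARS.encode(ebcdic) != REQUIRED_EBCDIC:
            return False
        # ...and at their usual ASCII code points.
        if REQUIRED_CHARS.encode(ascii_cp) != REQUIRED_CHARS.encode("ascii"):
            return False
        # National characters must survive EBCDIC -> Unicode -> ASCII.
        NATIONAL_EBCDIC.decode(ebcdic).encode(ascii_cp)
    except UnicodeError:
        return False
    return True


print(pair_is_usable("cp037", "latin_1"))  # True
```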

Note: Some EBCDIC code pages, such as 290 (EBCDIC Katakana), 803 (EBCDIC Hebrew) and 1030 (EBCDIC Katakana Extended), do not assign the lowercase letters a-z to their normal code points. The assembler cannot use these code pages.


CODEPAGE Option

IBM’s High Level Assembler uses the CODEPAGE option to define the EBCDIC code page of the source files. It uses this code page information only to translate EBCDIC characters to Unicode in CU constants and literals. The Tachyon assemblers support an extended CODEPAGE option to specify both an EBCDIC and an ASCII code page.

The CODEPAGE option is specified as CODEPAGE(ebcdic,ascii,list) where ebcdic is an EBCDIC code page number, ascii is an ASCII code page number, and list is either LIST or NOLIST. The code page numbers may be specified as either decimal numbers or their hexadecimal equivalents using the X'hex' notation. When setting the CODEPAGE option, the EBCDIC code page must be specified. If the ASCII code page number is omitted, the default is 819 (ISO-8859-1 Latin-1). If the list option is omitted, the default is NOLIST. If LIST is specified, the resulting translation between the EBCDIC, ASCII and Unicode code points will be displayed in the assembly listing.
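
To make the syntax and defaults concrete, here is a small Python sketch that parses an option string of this form. It is not the assembler's parser. The names CodePageOption, parse_number and parse_codepage are hypothetical, and two behaviours are assumptions that the text above does not state: keywords are matched without regard to case, and an empty field (as in CODEPAGE(1047,,LIST)) is treated as omitted.

```python
import re
from typing import NamedTuple


class CodePageOption(NamedTuple):
    ebcdic: int
    ascii_cp: int = 819      # documented default: ISO-8859-1 Latin-1
    show_list: bool = False  # documented default: NOLIST


def parse_number(token: str) -> int:
    """Accept a decimal code page number or its X'hex' equivalent."""
    hex_match = re.fullmatch(r"[Xx]'([0-9A-Fa-f]+)'", token)
    if hex_match:
        return int(hex_match.group(1), 16)
    if re.fullmatch(r"[0-9]+", token):
        return int(token)
    raise ValueError(f"invalid code page number: {token!r}")


def parse_codepage(option: str) -> CodePageOption:
    match = re.fullmatch(r"CODEPAGE\((.*)\)", option.strip(), flags=re.IGNORECASE)
    if not match:
        raise ValueError(f"not a CODEPAGE option: {option!r}")
    fields = [field.strip() for field in match.group(1).split(",")]
    if len(fields) > 3:
        raise ValueError("too many CODEPAGE operands")
    if not fields[0]:
        raise ValueError("the EBCDIC code page must be specified")

    ebcdic = parse_number(fields[0])
    ascii_cp = 819
    show_list = False
    if len(fields) > 1 and fields[1]:
        ascii_cp = parse_number(fields[1])
    if len(fields) > 2 and fields[2]:
        keyword = fields[2].upper()
        if keyword not in ("LIST", "NOLIST"):
            raise ValueError(f"expected LIST or NOLIST, got {fields[2]!r}")
        show_list = keyword == "LIST"
    return CodePageOption(ebcdic, ascii_cp, show_list)


print(parse_codepage("CODEPAGE(1047)"))
print(parse_codepage("CODEPAGE(X'417',X'333',LIST)"))
```

Both calls yield EBCDIC code page 1047 and ASCII code page 819, since X'417' is 1047 and X'333' is 819. Only the second call requests the listing.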

The default for the CODEPAGE option is CODEPAGE(1047,819,NOLIST). These code pages translate all 256 code points between EBCDIC and ASCII. These are also the default EBCDIC and ASCII code pages for z/OS UNIX System Services, Tachyon File Tools and the Tachyon Operating System. However, the default EBCDIC code page is different from High Level Assembler’s default of CODEPAGE(1148).

The following shows the output of the CODEPAGE(1047,819,LIST) option:

CodePage(1047,819)  EBCDIC/ASCII/Unicode Translation-
00/00/0000     01/01/0001     02/02/0002     03/03/0003     04/9C/009C     05/09/0009     06/86/0086     07/7F/007F
08/97/0097     09/8D/008D     0A/8E/008E     0B/0B/000B     0C/0C/000C     0D/0D/000D     0E/0E/000E     0F/0F/000F
10/10/0010     11/11/0011     12/12/0012     13/13/0013     14/9D/009D     15/85/0085     16/08/0008     17/87/0087
18/18/0018     19/19/0019     1A/92/0092     1B/8F/008F     1C/1C/001C     1D/1D/001D     1E/1E/001E     1F/1F/001F
20/80/0080     21/81/0081     22/82/0082     23/83/0083     24/84/0084     25/0A/000A     26/17/0017     27/1B/001B
28/88/0088     29/89/0089     2A/8A/008A     2B/8B/008B     2C/8C/008C     2D/05/0005     2E/06/0006     2F/07/0007
30/90/0090     31/91/0091     32/16/0016     33/93/0093     34/94/0094     35/95/0095     36/96/0096     37/04/0004
38/98/0098     39/99/0099     3A/9A/009A     3B/9B/009B     3C/14/0014     3D/15/0015     3E/9E/009E     3F/1A/001A
40/20/0020     41/A0/00A0     42/E2/00E2     43/E4/00E4     44/E0/00E0     45/E1/00E1     46/E3/00E3     47/E5/00E5
48/E7/00E7     49/F1/00F1     4A/A2/00A2     4B/2E/002E     4C/3C/003C     4D/28/0028     4E/2B/002B     4F/7C/007C
50/26/0026     51/E9/00E9     52/EA/00EA     53/EB/00EB     54/E8/00E8     55/ED/00ED     56/EE/00EE     57/EF/00EF
58/EC/00EC     59/DF/00DF     5A/21/0021     5B/24/0024     5C/2A/002A     5D/29/0029     5E/3B/003B     5F/5E/005E
60/2D/002D     61/2F/002F     62/C2/00C2     63/C4/00C4     64/C0/00C0     65/C1/00C1     66/C3/00C3     67/C5/00C5
68/C7/00C7     69/D1/00D1     6A/A6/00A6     6B/2C/002C     6C/25/0025     6D/5F/005F     6E/3E/003E     6F/3F/003F
70/F8/00F8     71/C9/00C9     72/CA/00CA     73/CB/00CB     74/C8/00C8     75/CD/00CD     76/CE/00CE     77/CF/00CF
78/CC/00CC     79/60/0060     7A/3A/003A     7B/23/0023     7C/40/0040     7D/27/0027     7E/3D/003D     7F/22/0022
80/D8/00D8     81/61/0061     82/62/0062     83/63/0063     84/64/0064     85/65/0065     86/66/0066     87/67/0067
88/68/0068     89/69/0069     8A/AB/00AB     8B/BB/00BB     8C/F0/00F0     8D/FD/00FD     8E/FE/00FE     8F/B1/00B1
90/B0/00B0     91/6A/006A     92/6B/006B     93/6C/006C     94/6D/006D     95/6E/006E     96/6F/006F     97/70/0070
98/71/0071     99/72/0072     9A/AA/00AA     9B/BA/00BA     9C/E6/00E6     9D/B8/00B8     9E/C6/00C6     9F/A4/00A4
A0/B5/00B5     A1/7E/007E     A2/73/0073     A3/74/0074     A4/75/0075     A5/76/0076     A6/77/0077     A7/78/0078
A8/79/0079     A9/7A/007A     AA/A1/00A1     AB/BF/00BF     AC/D0/00D0     AD/5B/005B     AE/DE/00DE     AF/AE/00AE
B0/AC/00AC     B1/A3/00A3     B2/A5/00A5     B3/B7/00B7     B4/A9/00A9     B5/A7/00A7     B6/B6/00B6     B7/BC/00BC
B8/BD/00BD     B9/BE/00BE     BA/DD/00DD     BB/A8/00A8     BC/AF/00AF     BD/5D/005D     BE/B4/00B4     BF/D7/00D7
C0/7B/007B     C1/41/0041     C2/42/0042     C3/43/0043     C4/44/0044     C5/45/0045     C6/46/0046     C7/47/0047
C8/48/0048     C9/49/0049     CA/AD/00AD     CB/F4/00F4     CC/F6/00F6     CD/F2/00F2     CE/F3/00F3     CF/F5/00F5
D0/7D/007D     D1/4A/004A     D2/4B/004B     D3/4C/004C     D4/4D/004D     D5/4E/004E     D6/4F/004F     D7/50/0050
D8/51/0051     D9/52/0052     DA/B9/00B9     DB/FB/00FB     DC/FC/00FC     DD/F9/00F9     DE/FA/00FA     DF/FF/00FF
E0/5C/005C     E1/F7/00F7     E2/53/0053     E3/54/0054     E4/55/0055     E5/56/0056     E6/57/0057     E7/58/0058
E8/59/0059     E9/5A/005A     EA/B2/00B2     EB/D4/00D4     EC/D6/00D6     ED/D2/00D2     EE/D3/00D3     EF/D5/00D5
F0/30/0030     F1/31/0031     F2/32/0032     F3/33/0033     F4/34/0034     F5/35/0035     F6/36/0036     F7/37/0037
F8/38/0038     F9/39/0039     FA/B3/00B3     FB/DB/00DB     FC/DC/00DC     FD/D9/00D9     FE/DA/00DA     FF/9F/009F
Each translatable EBCDIC character is displayed as a group of three code points. The first code point is for the selected EBCDIC code page, the second is for the selected ASCII code page and the third is the Unicode code point. If the EBCDIC character cannot be translated to ASCII, the ASCII code point will be listed as --.
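
If the listing needs to be processed by a script, the three-part groups can be picked out with a regular expression. The following Python sketch assumes only the layout described above. The function name parse_listing is hypothetical, and the second sample line is made up to show the -- case rather than taken from real assembler output.

```python
import re
from typing import Optional

# One group: EBCDIC/ASCII/Unicode, where ASCII may be "--" if untranslatable.
ENTRY = re.compile(r"\b([0-9A-F]{2})/([0-9A-F]{2}|--)/([0-9A-F]{4})\b")


def parse_listing(text: str) -> dict[int, tuple[Optional[int], int]]:
    """Map each EBCDIC code point to (ASCII code point or None, Unicode code point)."""
    table: dict[int, tuple[Optional[int], int]] = {}
    for ebcdic, ascii_cp, unicode_cp in ENTRY.findall(text):
        ascii_value = None if ascii_cp == "--" else int(ascii_cp, 16)
        table[int(ebcdic, 16)] = (ascii_value, int(unicode_cp, 16))
    return table


sample = (
    "C0/7B/007B     C1/41/0041     C2/42/0042\n"
    "4A/--/00A2\n"  # hypothetical line illustrating an untranslatable character
)
for ebcdic, (ascii_value, unicode_cp) in parse_listing(sample).items():
    print(f"{ebcdic:02X} -> ASCII {ascii_value}, U+{unicode_cp:04X}")
```

The header line of the listing contains no group of this shape, so it is skipped without special handling.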


EBCDIC Code Pages

Code Page  Description
00037      EBCDIC USA, Canada, Australia, New Zealand, Netherlands, Brazil, Portugal
00264      EBCDIC Print Train and Text Processing
00273      EBCDIC Austria, Germany
00274      EBCDIC Belgium
00275      EBCDIC Brazil
00277      EBCDIC Denmark, Norway
00278      EBCDIC Finland, Sweden
00280      EBCDIC Italy
00281      EBCDIC Japanese English
00284      EBCDIC Spanish
00285      EBCDIC United Kingdom
00293      EBCDIC APL
00297      EBCDIC France
00420      EBCDIC Arabic
00423      EBCDIC Greek
00424      EBCDIC Hebrew
00500      EBCDIC Latin-1
00838      EBCDIC Thai
00870      EBCDIC Latin-2
00871      EBCDIC Iceland
00875      EBCDIC Greek
00880      EBCDIC Cyrillic
00924      EBCDIC Latin-9
01005      EBCDIC Isomorphic Text Communication
01025      EBCDIC Russian
01026      EBCDIC Turkey
01027      EBCDIC Japanese (Latin) Extended
01031      EBCDIC Japanese (Latin) Extended
01047      EBCDIC Latin-1
01122      EBCDIC Estonia
01123      EBCDIC Ukraine
01130      EBCDIC Vietnamese
01140      EBCDIC USA, Canada, Australia, New Zealand, Netherlands
01141      EBCDIC Austria, Germany
01142      EBCDIC Denmark, Norway
01143      EBCDIC Finland, Sweden
01144      EBCDIC Italy
01145      EBCDIC Spanish
01146      EBCDIC United Kingdom
01147      EBCDIC France
01148      EBCDIC Latin-1
01149      EBCDIC Iceland
01153      EBCDIC Latin-2
01154      EBCDIC Cyrillic
01155      EBCDIC Turkey
01156      EBCDIC Baltic
01157      EBCDIC Estonia
01158      EBCDIC Ukraine
01160      EBCDIC Thai
01164      EBCDIC Vietnamese
01165      EBCDIC Latin-2


ASCII Code Pages

Code Page  Description
00367      US-ASCII-7
00437      DOS USA
00720      DOS Arabic
00737      DOS Greek
00775      DOS Baltic
00813      ISO-8859-7 Greek
00819      ISO-8859-1 Latin-1 Western European
00850      DOS Latin-1
00852      DOS Latin-2
00855      DOS Cyrillic
00856      DOS Hebrew
00857      DOS Turkish
00858      DOS Latin-1 + Euro
00860      DOS Portuguese
00861      DOS Icelandic
00862      DOS Israel
00863      DOS French Canadian
00864      DOS Arabic
00865      DOS Nordic
00866      DOS Russian
00869      DOS Greek
00874      ISO-8859-11 Thai
00878      KOI8-R Russian
00907      ASCII APL
00910      DOS APL
00912      ISO-8859-2 Latin-2 Eastern European
00913      ISO-8859-3 Latin-3 Southern European
00914      ISO-8859-4 Latin-4 Northern European
00915      ISO-8859-5 Cyrillic
00916      ISO-8859-8 Hebrew
00919      ISO-8859-10 Latin-6 Nordic
00920      ISO-8859-9 Latin-5 Turkish
00921      ISO-8859-13 Latin-7 Baltic
00923      ISO-8859-15 Latin-9
01006      DOS Urdu
01089      ISO-8859-6 Arabic
01139      ASCII Japanese Alphanumeric Katakana
01250      Windows Latin-2
01251      Windows Cyrillic
01252      Windows Latin-1
01253      Windows Greek
01254      Windows Latin-5 Turkish
01255      Windows Hebrew
01256      Windows Arabic
01257      Windows Baltic
01258      Windows Vietnamese


© Copyright 2004-2008, Tachyon Software® LLC.
Last modified on October 24, 2008