Tutorial Notes on Unicode - JDK - Encoding Conversion
Since the text file contains non-ASCII characters, we need to convert it into hexadecimal
digits to be able to inspect the code values of the stored characters. Remember that the
UTF-16BE encoding breaks each code value into two bytes, high byte first, without any other changes.
Here is a program to convert any data file into hexadecimal digits:
/**
 * HexWriter.java
 * Copyright (c) 2002 by Dr. Yang
 * This program allows you to convert any data file to a new data file
 * in Hex format with 16 bytes (32 Hex digits) per line.
 */
import java.io.*;
class HexWriter {
   static char[] hexDigit = {'0', '1', '2', '3', '4', '5', '6', '7',
                             '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
   public static void main(String[] a) {
      String inFile = a[0];
      String outFile = a[1];
      int bufSize = 16; // bytes per output line
      byte[] buffer = new byte[bufSize];
      String crlf = System.getProperty("line.separator");
      try {
         FileInputStream in = new FileInputStream(inFile);
         OutputStreamWriter out = new OutputStreamWriter(
            new FileOutputStream(outFile));
         int n = in.read(buffer,0,bufSize);
         String s = null;
         int count = 0;
         while (n!=-1) {
            count += n;
            s = bytesToHex(buffer,0,n);
            out.write(s);
            out.write(crlf);
            n = in.read(buffer,0,bufSize);
         }
         in.close();
         out.close();
         System.out.println("Number of input bytes: "+count);
      } catch (IOException e) {
         System.out.println(e.toString());
      }
   }
   // Converts len bytes of b, starting at off, into a string of Hex digits
   public static String bytesToHex(byte[] b, int off, int len) {
      StringBuffer buf = new StringBuffer();
      for (int j=0; j<len; j++)
         buf.append(byteToHex(b[off+j]));
      return buf.toString();
   }
   // Converts one byte into two Hex digits: high 4 bits first, then low 4 bits
   public static String byteToHex(byte b) {
      char[] a = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
      return new String(a);
   }
}
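The byte-to-Hex step can be tried on its own. The following is only a minimal sketch: the class name NibbleSketch and the constant HEX are made up for illustration and are not part of HexWriter.java. It shows why the mask with 0x0F matters, since Java bytes are signed and a byte such as 0xFF would otherwise produce a negative array index.

```java
class NibbleSketch {
    // Hypothetical lookup table, equivalent in purpose to hexDigit in HexWriter.
    static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Splits one byte into its high and low 4-bit halves and maps each to a Hex digit.
    static String byteToHex(byte b) {
        char high = HEX[(b >> 4) & 0x0F]; // mask removes sign-extension bits
        char low = HEX[b & 0x0F];
        return new String(new char[] { high, low });
    }

    public static void main(String[] args) {
        System.out.println(byteToHex((byte) 0x48)); // prints 48
        System.out.println(byteToHex((byte) 0x0A)); // prints 0A
        System.out.println(byteToHex((byte) 0xFF)); // prints FF
    }
}
```

The leading zero in "0A" is what keeps every byte at exactly two digits, so a 16-byte line is always 32 Hex digits long.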
Compile HexWriter.java and run it to convert hello.utf-16be:
javac HexWriter.java
java HexWriter hello.utf-16be hello.hex
Okay, here is the content of hello.hex:
00480065006C006C006F00200063006F
006D0070007500740065007200210020
002D00200045006E0067006C00690073
0068000D000A753581114F60597DFF01
0020002D002000530069006D0070006C
00690066006900650064002000430068
0069006E006500730065000D000A96FB
81664F60597DFE570020002D00200054
007200610064006900740069006F006E
0061006C0020004300680069006E0065
00730065000D000A
If you know how to read Hex numbers, you should be able to see:
- "00480065006C006C006F" represents "Hello".
- "753581114F60597DFF01" represents the Simplified Chinese message.
- "96FB81664F60597DFE57" represents the Traditional Chinese message.
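To check these readings without decoding by hand, the Hex digits can be turned back into text. This is a rough sketch, not part of the original programs: the class HexDecodeSketch and its methods are hypothetical names, and it uses java.nio.charset.StandardCharsets, which requires Java 7 or later rather than the J2SDK 1.4 these notes were written for.

```java
import java.nio.charset.StandardCharsets;

class HexDecodeSketch {
    // Converts a string of Hex digits (two per byte) back into raw bytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Interprets the Hex digits as UTF-16BE: every two bytes form one code value.
    static String decodeUtf16be(String hex) {
        return new String(hexToBytes(hex), StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(decodeUtf16be("00480065006C006C006F")); // prints Hello
        // The next two lines display correctly only if the console supports Chinese.
        System.out.println(decodeUtf16be("753581114F60597DFF01"));
        System.out.println(decodeUtf16be("96FB81664F60597DFE57"));
    }
}
```

Each group of four Hex digits is one 16-bit code value, so "0048" is U+0048 (H) and "7535" is U+7535.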
Now we have a text file with Unicode characters. Let's write an encoding
conversion program:
/**
 * EncodingConverter.java
 * Copyright (c) 2002 by Dr. Yang
 *
 * This program allows you to convert a text file in one encoding
 * to another file in a different encoding.
 */
import java.io.*;
class EncodingConverter {
   public static void main(String[] a) {
      String inFile = a[0];
      String inCharsetName = a[1];
      String outFile = a[2];
      String outCharsetName = a[3];
      try {
         // Decode input bytes into characters using the input encoding
         InputStreamReader in = new InputStreamReader(
            new FileInputStream(inFile), inCharsetName);
         // Encode characters into output bytes using the output encoding
         OutputStreamWriter out = new OutputStreamWriter(
            new FileOutputStream(outFile), outCharsetName);
         int c = in.read();
         int n = 0;
         while (c!=-1) {
            out.write(c);
            n++;
            c = in.read();
         }
         in.close();
         out.close();
         System.out.println("Number of characters: "+n);
         System.out.println("Number of input bytes: "
            +(new File(inFile)).length());
         System.out.println("Number of output bytes: "
            +(new File(outFile)).length());
      } catch (IOException e) {
         System.out.println(e.toString());
      }
   }
}
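The same read-and-write loop can be tried in memory, without any files. The sketch below is an illustration under assumptions: the class ConvertInMemorySketch and its convert method are hypothetical and are not part of EncodingConverter.java. It feeds six UTF-16BE bytes ("He" followed by U+7535) through the loop and re-encodes them as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

class ConvertInMemorySketch {
    // Decodes input with one charset and re-encodes it with another,
    // one character at a time, like the loop in EncodingConverter.
    static byte[] convert(byte[] input, String inCharsetName, String outCharsetName)
            throws IOException {
        InputStreamReader in = new InputStreamReader(
            new ByteArrayInputStream(input), inCharsetName);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStreamWriter out = new OutputStreamWriter(sink, outCharsetName);
        int c = in.read();
        while (c != -1) {
            out.write(c);
            c = in.read();
        }
        in.close();
        out.close(); // flushes the encoder into sink
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf16 = { 0x00, 0x48, 0x00, 0x65, 0x75, 0x35 }; // "He" + U+7535
        byte[] utf8 = convert(utf16, "UTF-16BE", "UTF-8");
        System.out.println("Number of input bytes: " + utf16.length); // prints 6
        System.out.println("Number of output bytes: " + utf8.length); // prints 5
    }
}
```

The byte counts differ because UTF-8 uses one byte for each ASCII character and three bytes for U+7535, while UTF-16BE uses two bytes for each of these characters.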