
Tutorial Notes on Unicode - JDK - Encoding Conversion



 31 December 18:00   

    



    



    

Since the text file contains non-ASCII characters, we need to convert it into
Hex digits to be able to inspect the code values of the saved characters. Remember
that UTF-16BE encoding breaks each code value into two bytes directly, without any changes.
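
As a small illustration of that point, here is a minimal sketch that encodes a string as UTF-16BE and prints the resulting bytes as Hex digits. The class name Utf16BeBytes and the method toHex are made up for this example and are not part of the tutorial's programs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not one of the tutorial's programs.
class Utf16BeBytes {

    // Encodes the text as UTF-16BE and returns the bytes as upper-case Hex digits.
    public static String toHex(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_16BE);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // Mask with 0xFF so that negative byte values print as two digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex("H"));       // code value U+0048, prints 0048
        System.out.println(toHex("\u7535"));  // code value U+7535, prints 7535
    }
}
```

For these characters the Hex digits of the two bytes are simply the four Hex digits of the code value, high byte first.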

    

Here is a program to convert any data file into Hex digits:

    

 

    

/**
 * HexWriter.java
 * Copyright (c) 2002 by Dr. Yang
 * This program allows you to convert any data file to a new data file
 * in Hex format with 16 bytes (32 Hex digits) per line.
 */
import java.io.*;
class HexWriter {
   static char[] hexDigit = {'0', '1', '2', '3', '4', '5', '6', '7',
                             '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
   public static void main(String[] a) {
      String inFile = a[0];   // file to read
      String outFile = a[1];  // file to receive the Hex digits
      int bufSize = 16;       // bytes per output line
      byte[] buffer = new byte[bufSize];
      String crlf = System.getProperty("line.separator");
      try {
         FileInputStream in = new FileInputStream(inFile);
         OutputStreamWriter out = new OutputStreamWriter(
            new FileOutputStream(outFile));
         int n = in.read(buffer,0,bufSize);
         String s = null;
         int count = 0;
         while (n!=-1) {
            count += n;
            s = bytesToHex(buffer,0,n);
            out.write(s);
            out.write(crlf);
            n = in.read(buffer,0,bufSize);
         }
         in.close();
         out.close();
         System.out.println("Number of input bytes: "+count);
      } catch (IOException e) {
         System.out.println(e.toString());
      }
   }
   // Converts len bytes of b, starting at off, into a string of Hex digits.
   public static String bytesToHex(byte[] b, int off, int len) {
      StringBuffer buf = new StringBuffer();
      for (int j=0; j<len; j++)
         buf.append(byteToHex(b[off+j]));
      return buf.toString();
   }
   // Converts one byte into two Hex digits: high 4 bits, then low 4 bits.
   public static String byteToHex(byte b) {
      char[] a = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
      return new String(a);
   }
}
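
One detail in byteToHex is easy to get wrong: Java bytes are signed, so any byte at or above 0x80 is a negative number. The sketch below shows why each half of the byte is masked with 0x0F before it is used as an index. The class name NibbleDemo is made up for this example.

```java
// Hypothetical demo class for illustration only.
class NibbleDemo {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Splits one byte into its high and low 4 bits and maps each to a Hex digit.
    public static String byteToHex(byte b) {
        int high = (b >> 4) & 0x0F; // the shift sign-extends, so mask off the upper bits
        int low = b & 0x0F;
        return new String(new char[] { HEX[high], HEX[low] });
    }

    public static void main(String[] args) {
        byte b = (byte) 0x96;             // first byte of the code value 96FB
        System.out.println(b);            // prints -106
        System.out.println(byteToHex(b)); // prints 96
    }
}
```

Without the mask, (b >> 4) would be -7 for this byte and the array lookup would throw an ArrayIndexOutOfBoundsException.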

    



    



    

Compile HexWriter.java and run it to convert hello.utf-16be:

javac HexWriter.java
java HexWriter hello.utf-16be hello.hex

    



    



    

Okay, here is the content of hello.hex:

00480065006C006C006F00200063006F
006D0070007500740065007200210020
002D00200045006E0067006C00690073
0068000D000A753581114F60597DFF01
0020002D002000530069006D0070006C
00690066006900650064002000430068
0069006E006500730065000D000A96FB
81664F60597DFE570020002D00200054
007200610064006900740069006F006E
0061006C0020004300680069006E0065
00730065000D000A

    



    



    

If you know how to read Hex numbers, you should be able to see:

  • "00480065006C006C006F" represents "Hello".
  • "753581114F60597DFF01" represents the Simplified Chinese message.
  • "96FB81664F60597DFE57" represents the Traditional Chinese message.
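
If reading Hex digits by eye is tedious, the three readings above can be checked with a short sketch that turns a string of Hex digits back into bytes and decodes them as UTF-16BE. The class name HexDecoder and the method decodeUtf16Be are made up for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not one of the tutorial's programs.
class HexDecoder {

    // Parses pairs of Hex digits into bytes, then decodes the bytes as UTF-16BE.
    // Assumes the input has an even number of valid Hex digits.
    public static String decodeUtf16Be(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(decodeUtf16Be("00480065006C006C006F")); // prints Hello
        // The two Chinese messages; how they display depends on the console encoding.
        System.out.println(decodeUtf16Be("753581114F60597DFF01"));
        System.out.println(decodeUtf16Be("96FB81664F60597DFE57"));
    }
}
```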



    



    



    

Unicode Encoding Conversion

    



    

Now we have a text file with Unicode characters. Let's write an encoding
conversion program:

    

 

    

/**
 * EncodingConverter.java
 * Copyright (c) 2002 by Dr. Yang
 *
 * This program allows you to convert a text file in one encoding
 * to another file in a different encoding.
 */
import java.io.*;
class EncodingConverter {
   public static void main(String[] a) {
      String inFile = a[0];
      String inCharsetName = a[1];
      String outFile = a[2];
      String outCharsetName = a[3];
      try {
         // The reader decodes input bytes into characters using the input encoding.
         InputStreamReader in = new InputStreamReader(
            new FileInputStream(inFile), inCharsetName);
         // The writer encodes characters into bytes using the output encoding.
         OutputStreamWriter out = new OutputStreamWriter(
            new FileOutputStream(outFile), outCharsetName);
         int c = in.read();
         int n = 0;
         while (c!=-1) {
            out.write(c);
            n++;
            c = in.read();
         }
         in.close();
         out.close();
         System.out.println("Number of characters: "+n);
         System.out.println("Number of input bytes: "
            +(new File(inFile)).length());
         System.out.println("Number of output bytes: "
            +(new File(outFile)).length());
      } catch (IOException e) {
         System.out.println(e.toString());
      }
   }
}
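
The same decode-then-encode idea can be tried in memory, without any files. The sketch below is an assumption-laden illustration rather than part of the tutorial: the class name InMemoryConverter and the method convert are made up, and it uses String.getBytes(Charset), which replaces characters the target encoding cannot represent with that encoding's default replacement bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not one of the tutorial's programs.
class InMemoryConverter {

    // Decodes the input bytes with one charset, then encodes the text with another.
    public static byte[] convert(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        // "H" (U+0048) followed by U+7535, encoded as UTF-16BE: 4 bytes.
        byte[] utf16 = { 0x00, 0x48, 0x75, 0x35 };

        byte[] utf8 = convert(utf16, StandardCharsets.UTF_16BE, StandardCharsets.UTF_8);
        byte[] ascii = convert(utf16, StandardCharsets.UTF_16BE, StandardCharsets.US_ASCII);

        System.out.println("UTF-8 bytes: " + utf8.length);  // 1 + 3 = 4
        System.out.println("ASCII bytes: " + ascii.length); // 2, second byte is '?'
    }
}
```

The byte counts differ by encoding even though the number of characters stays the same, which is why EncodingConverter reports characters, input bytes and output bytes separately.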

    



    



    



 

