Aaron Selonke
10,323 Points
The relationship between Unicode and UTF-16?
What I can make out so far is:
1) Unicode is a character set that uses a 'code point' to map a code made of numbers (and sometimes letters) to a character. This includes any character or symbol on the keyboard, as well as all the characters of foreign alphabets.
2) UTF-16 is the default encoding in C#. It translates machine-readable binary data into numbers, which are then translated into characters with the Unicode character set.
UTF-16 characters are made up of TWO bytes.
I think what I've declared above is accurate; after this, things get fuzzy.
A byte is made up of 8 binary bits (eight ones and zeros), correct?
And a UTF-16 character is made up of two of these bytes (16 bits), correct?
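As a quick sanity check (my own experiment, so it may not prove much), sizeof seems to agree with this:

Console.WriteLine(sizeof(byte));  // 1 (a byte is 1 byte = 8 bits)
Console.WriteLine(sizeof(char));  // 2 (a C# char is 2 bytes = 16 bits)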
In the video, Carling uses the UnicodeEncoding class to get the two bytes and put them into a two-item array:
using System.Text;

byte[] unicodeBytes = UnicodeEncoding.Unicode.GetBytes(new char[] { 'h' });
// returns byte[2] { 104, 0 }
What is { 104, 0 }?
I was expecting either two 8-digit strings of binary bits, or something like a Unicode code point, which for the lowercase letter 'h' is U+0068.
But { 104, 0 }? How should this two-number array be understood?
2 Answers
Steven Parker
231,268 Points
Those are the decimal byte values ("unsigned", not "unassigned").
Good job figuring that out. But the remaining issue is about storage vs. values. A byte is a unit of storage that is 8 bits in size. The values it can store can be expressed in many ways (see the sketch after this list):
- unsigned decimal: 0 to 255
- signed decimal: -128 to 127
- hexadecimal: 00 to FF
- octal: 000 to 377
- binary: 00000000 to 11111111
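Here's a quick C# sketch (my own illustration, not from the video) that prints the same stored byte in several of those notations:

using System;

byte b = 104;  // the first byte from the array in the question
Console.WriteLine(b);                        // 104      (unsigned decimal)
Console.WriteLine((sbyte)b);                 // 104      (signed decimal; identical here since the value is below 128)
Console.WriteLine(Convert.ToString(b, 16));  // 68       (hexadecimal)
Console.WriteLine(Convert.ToString(b, 8));   // 150      (octal)
Console.WriteLine(Convert.ToString(b, 2));   // 1101000  (binary)

The storage never changes; only the notation does.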
On the other hand, Unicode is a set of values whose range is too big to be stored in a single byte. It is commonly stored as two bytes used together to get 16 bits of storage. The value of a particular Unicode character can also be expressed in many ways, similar to the ranges listed above, plus more, since it can be shown as two individual bytes (as it was in the code you included) or as one 16-bit value (like 0x0068, the code point U+0068 for 'h').
Does that clear it up?
Aaron Selonke
10,323 Points
I think I got it from here. Appreciate the help as always, thanks!
HIDAYATULLAH ARGHANDABI
21,058 Points
It should be noted that most programming languages read UTF-16.
Aaron Selonke
10,323 Points
I reviewed the video again. Carling shows that in C# a byte is an 'integral type' and an 'unassigned 8-bit integer' with a value between 0 and 255. That kind of explains the two numbers returned in the array: they are unassigned 8-bit integers. Still, there is not a clear connection between a byte, C#'s version of a byte (an 'unassigned' 8-bit integer), and the Unicode code point...
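Playing with it a bit more (my own experiment, so take it with a grain of salt), the connection seems to be that a C# char is itself a 16-bit number equal to the code point, and the byte array just splits that number into its two bytes:

using System;

char h = 'h';
int codePoint = h;               // 104, i.e. 0x0068 = U+0068
byte lowByte = (byte)h;          // 104, the first byte in the array
byte highByte = (byte)(h >> 8);  // 0, the second byte

Console.WriteLine($"U+{codePoint:X4} -> {{ {lowByte}, {highByte} }}");  // U+0068 -> { 104, 0 }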