A little on charAt(), charCodeAt(), fromCharCode(), and at()

Spoiler alert: JavaScript has a Unicode problem.

charAt

The charAt() method of the String object takes an optional index parameter and returns the character located at that position.

The index parameter should be an integer between 0 and string.length - 1. You can also compute an index from the end (string.length - 2 for the penultimate character, and so on); that's OK. You can even pass the string '5' and charAt() will convert it to the number 5!

If no index is provided, the default is 0 (so the first character will be returned).

If you try to use a floating point number rather than an integer (7.45 rather than just 7), then only the integer part will be taken into account.

Alternatively, if your index parameter can't be converted to an integer, charAt() will also apply the default 0.

Lastly, if the integer provided is out of the range of the string (say, you try to get the character at index 50 but the string is only 20 characters long), an empty string will be returned.

To make visualization easier:

const string = 'Well, look!'

console.log(string.charAt())
// output: "W"

const index = 10
console.log(string.charAt(index))
// output: "!"

const indexString = '10'
console.log(string.charAt(indexString))
// output: "!"

const floatingIndex = 1.14
console.log(string.charAt(floatingIndex))
// output: "e"

const penultimateIndex = string.length - 2
console.log(string.charAt(penultimateIndex))
// output: "k"

const outOfRange = 99
console.log(string.charAt(outOfRange))
// output: ""

const invalidIndex = 'a'
console.log(string.charAt(invalidIndex))
// output: "W"

Ghosts have characters too

When I said charAt() returned a character, I wasn't lying... but it must be mentioned that, by definition, the 'character' it returns is exactly ONE UTF-16 code unit long. Some characters can't be represented in a single code unit.

Let's take the string 'G' (for Ghost).

let string = 'G'
console.log(string.length)
// output: 1

You see one character, and so does the computer.

Now let's spice things up a bit and make our string the ghost emoji 👻

let string = '👻'
console.log(string.length)
// output: 2

Well, would you look at that! A single ghost 👻 is actually two UTF-16 code units long! You see a single entity, but the computer sees two.

Lesson of the day: String length is NOT equal to character count.
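
(A quick aside that goes beyond our four methods: if you ever want the count you see rather than the count the computer sees, spreading the string into an array iterates by code points, so the ghost counts as one.)

let string = '👻'
console.log([...string].length)
// output: 1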

That means your charAt() output might look a little different...

let string = '👻'
console.log(string.charAt(), string.charAt(1), string.charAt(2))
// output: "�" "�" ""

So we can log the first code unit at index 0, the second at index 1, and from then on we get an empty string because 2 is out of the range of the string.

What a weird little �! What is it? Before we move on to charCodeAt(), let me show you something interesting. Look what happens if you remove that console.log and go straight to the source:

'👻'.charAt()
// output: '\uD83D'

'👻'.charAt(1)
// output: '\uDC7B'

or even

let string = '👻'
string.charAt()
//output: '\uD83D'

string.charAt(1)
//output: '\uDC7B'

We've now got the UTF-16 code units, in hex.
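
And if you glue those two halves back together, the ghost reappears, since the two code units only mean something as a pair:

let string = '👻'
console.log(string.charAt(0) + string.charAt(1))
// output: "👻"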


Well, let's move on to charCodeAt() to help it make more sense.

charCodeAt

A UTF-16 code unit is an integer ranging from 0 to 65535, and that integer is what charCodeAt() will return.

It works similarly to charAt() in that it will take an index parameter, but this time around an index that's out of range will return NaN.

To use the same examples from above to help see the difference better:

const string = 'Well, look!'

console.log(string.charCodeAt())
// output: 87

const index = 10
console.log(string.charCodeAt(index))
// output: 33

const indexString = '10'
console.log(string.charCodeAt(indexString))
// output: 33

const floatingIndex = 1.14
console.log(string.charCodeAt(floatingIndex))
// output: 101

const penultimateIndex = string.length - 2
console.log(string.charCodeAt(penultimateIndex))
// output: 107

const outOfRange = 99
console.log(string.charCodeAt(outOfRange))
// output: NaN

const invalidIndex = 'a'
console.log(string.charCodeAt(invalidIndex))
// output: 87

So now those are the UTF-16 values of our characters: W (uppercase) is 87, e (lowercase) is 101, and so on.
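
If you'd like to see every character next to its code unit in one go, a quick loop over the indexes does the trick (just a little sketch, nothing specific to these methods):

const string = 'Well, look!'
for (let i = 0; i < string.length; i++) {
  console.log(i, string.charAt(i), string.charCodeAt(i))
}
// output: 0 W 87
// output: 1 e 101
// ...and so on, down to 10 ! 33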

What happens to the ghost?

We know our friendly ghost is two code units long, so let's try checking the code units at index 0, 1, and 2 (which is out of range). Note that the output will be in decimal, not hex.

const string = '👻'

console.log(string.charCodeAt())
// output: 55357

const index = 1
console.log(string.charCodeAt(index))
// output: 56443

const indexString = '1'
console.log(string.charCodeAt(indexString))
// output: 56443

const outOfRange = 2
console.log(string.charCodeAt(outOfRange))
// output: NaN

const invalidIndex = 'a'
console.log(string.charCodeAt(invalidIndex))
// output: 55357

And you can go the other way around with our next stop: fromCharCode().

fromCharCode

fromCharCode() returns a string built from the sequence of UTF-16 code units you feed it, and you can feed it as many as you want.

Note that it's a static method, so you'll always call it as String.fromCharCode() rather than on a string you created, and it returns a string (a primitive), not a String object.
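
In other words, it lives on the String constructor itself, not on individual strings. A quick way to see that:

typeof String.fromCharCode
//output: 'function'

typeof 'Well, look!'.fromCharCode
//output: 'undefined'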

In practice:

String.fromCharCode(87, 101, 108, 108, 44, 32, 108, 111, 111, 107, 33)
//output: 'Well, look!'

It'll work with the string equivalents, too:

String.fromCharCode('87', '101', '108', '108', '44', '32', '108', '111', '111', '107', '33')
//output: 'Well, look!'

It won't do anything useful if the char code is invalid, though. A value that can't be converted to a number becomes code unit 0, so you just get back the NUL character, like so:

String.fromCharCode('a')
//output: '\x00'

And so that our Ghost isn't left behind, you can do that with it too!

Playing around with the ghost

Do note that if you input just one of the decimal code units we were getting above, you'll get a lone surrogate back, shown as its hex counterpart! This hex counterpart will be Unicode escaped, so it'll have \u before it.

String.fromCharCode(55357)
//output: '\uD83D'

String.fromCharCode(56443)
//output: '\uDC7B'

String.fromCharCode(55357, 56443)
//output: '👻'

// Slap a 0x in front of the non-escaped hex and see the magic! 
String.fromCharCode(0xD83D, 0xDC7B)
//output: '👻'

Boo~!
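
If you're curious how those two code units encode a single character, here's a quick sketch of the standard surrogate-pair arithmetic (not one of our four methods, just the math behind the pair):

const high = 0xD83D
const low = 0xDC7B
const codePoint = 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
console.log(codePoint.toString(16))
// output: "1f47b"

And U+1F47B is exactly the ghost's Unicode code point.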

What about at?

at() is pretty much another way of writing charAt(), sure, but there's a difference that means the world to those of us devs who like typing as little as possible.

It allows us... THIS:

const string = 'Well, look!'

console.log(string.at(-1))
// output: "!"

Compare it with charAt(), which returns an empty string because -1 is outside the string's range:

const string = 'Well, look!'

console.log(string.charAt(-1))
// output: ""

console.log(string.charAt(string.length - 1))
// output: "!"

Since at() takes negative values out of the box, it does make for a cleaner-looking option.
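
There's one more little difference worth knowing about: for an index that's out of range, at() gives you undefined instead of an empty string:

const string = 'Well, look!'

console.log(string.at(99))
// output: undefined

console.log(string.charAt(99))
// output: ""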

Addendum

We can access individual characters in a string directly by their index using bracket notation and treating the string as an array:

let string = 'Witness me!'
string[0]
//output: 'W'
string[0].toLowerCase()
//output: 'w'

string.charAt()
//output: 'W'
string.charAt().toLowerCase()
//output: 'w'
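
Unlike a real array, though, bracket notation is read-only here; strings are immutable, so writing to an index quietly does nothing (and throws a TypeError in strict mode):

string[0] = 'w'
string
//output: 'Witness me!'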

Put them into practice

If you do wish to practice them, here's one interesting option:

This 7 kyu CodeWars kata goes over the explanation and offers a fun challenge decoding a secret message!

And that's it for these four string methods! I hope I was of help and that we've learned something together.