Strings are primitive values in JavaScript and immutable. That is, string-related operations always produce new strings and never change existing strings.
Plain string literals are delimited by either single quotes or double quotes:
Single quotes are used more often because they make it easier to mention HTML, where double quotes are preferred.
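The following minimal sketch shows both kinds of delimiters. The variable names are illustrative, and console.log stands in for the chapter's assert.equal so that the snippet runs on its own.

```javascript
// Both delimiters produce the same string value
const single = 'abc';
const double = "abc";
console.log(single === double); // true

// Inside single quotes, the double quotes of HTML attributes need no escaping
const html = '<a href="index.html">Home</a>';
console.log(html); // <a href="index.html">Home</a>
```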
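The next chapter covers template literals, which give you string interpolation, multi-line strings, and raw string literals (where the backslash has no special meaning).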
The backslash lets you create special characters:
Unix line break: '\n'
Windows line break: '\r\n'
Tab: '\t'
Backslash: '\\'
The backslash also lets you use the delimiter of a string literal inside that literal:
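In the following minimal sketch, a backslash before the delimiter turns it into an ordinary character of the string. The strings are illustrative.

```javascript
const s1 = 'It\'s done';   // single quote escaped inside single quotes
const s2 = "Say \"hi\"";   // double quotes escaped inside double quotes
console.log(s1 === "It's done"); // true
console.log(s2 === 'Say "hi"');  // true
console.log(s1.length); // 9: the backslash itself is not part of the string
```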
JavaScript has no extra data type for characters – characters are always represented as strings.
const str = 'abc';
// Reading a character at a given index
assert.equal(str[1], 'b');
// Counting the characters in a string:
assert.equal(str.length, 3);
for-of and spreading
Iterating over strings via for-of or spreading (...) visits Unicode code point characters. Each code point character is encoded by 1–2 JavaScript characters. For more information, see §18.6 “Atoms of text: Unicode characters, JavaScript characters, grapheme clusters”.
This is how you iterate over the code point characters of a string via for-of:
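The following minimal sketch uses an illustrative string that mixes ASCII characters with an emoji, which is an astral code point.

```javascript
const str = 'a🙂b';
const visited = [];
for (const codePointChar of str) {
  // Each iteration yields one code point character, including the whole '🙂'
  visited.push(codePointChar);
}
console.log(visited.length); // 3
console.log(str.length);     // 4: '🙂' needs two JavaScript characters
```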
And this is how you convert a string into an Array of code point characters via spreading:
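The following minimal sketch spreads the same kind of string into an Array literal. The string is illustrative.

```javascript
// Spreading visits code points, so '🙂' becomes a single element
const chars = [...'a🙂b'];
console.log(chars.length);       // 3
console.log(chars[1] === '🙂');  // true
```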
If at least one operand is a string, the plus operator (+) converts any non-strings to strings and concatenates the result:
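Because + is evaluated from left to right, whether two numbers are added or concatenated depends on whether a string has already appeared. The following minimal sketch shows this. The operands are illustrative.

```javascript
const a = 'Total: ' + 3 + 4;   // 'Total: 34': the string comes first, so 3 and 4 are concatenated
const b = 3 + 4 + ' items';    // '7 items': 3 + 4 is computed before a string appears
const c = 'flag: ' + true;     // 'flag: true': the boolean is converted to a string
console.log(a); // Total: 34
console.log(b); // 7 items
console.log(c); // flag: true
```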
The assignment operator += is useful if you want to assemble a string, piece by piece:
let str = ''; // must be `let`!
str += 'Say it';
str += ' one more';
str += ' time';
assert.equal(str, 'Say it one more time');
Concatenating via + is efficient
Using + to assemble strings is quite efficient because most JavaScript engines internally optimize it.
Exercise: Concatenating strings
exercises/strings/concat_string_array_test.mjs
These are three ways of converting a value x to a string:

String(x)
''+x
x.toString() (does not work for undefined and null)

Recommendation: use the descriptive and safe String().
Examples:
assert.equal(String(undefined), 'undefined');
assert.equal(String(null), 'null');
assert.equal(String(false), 'false');
assert.equal(String(true), 'true');
assert.equal(String(123.45), '123.45');
Pitfall for booleans: If you convert a boolean to a string via String(), you generally can’t convert it back via Boolean(). The only string for which Boolean() returns false is the empty string.
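The following minimal sketch illustrates the round-trip problem. The variable name is illustrative.

```javascript
const str = String(false);   // 'false'
console.log(Boolean(str));   // true: 'false' is a non-empty string
console.log(Boolean(''));    // false: the only string that converts to false
console.log(Boolean(' '));   // true: a single space already makes the string non-empty
```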
Plain objects have a default string representation that is not very useful:
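The following minimal sketch uses an illustrative object.

```javascript
const obj = { first: 'Jane' };
// The properties do not show up in the default representation
console.log(String(obj)); // [object Object]
```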
Arrays have a better string representation, but it still hides much information:
> String(['a', 'b'])
'a,b'
> String(['a', ['b']])
'a,b'
> String([1, 2])
'1,2'
> String(['1', '2'])
'1,2'
> String([true])
'true'
> String(['true'])
'true'
> String(true)
'true'
Stringifying a function returns its source code:
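The following minimal sketch uses an illustrative function add(). Since ES2019, engines return the source text exactly as written, including its whitespace.

```javascript
function add(x, y) { return x + y; }
const source = String(add);
console.log(source); // function add(x, y) { return x + y; }
```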
You can override the built-in way of stringifying objects by implementing the method toString():
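The following minimal sketch gives an illustrative object, point, its own toString().

```javascript
const point = {
  x: 1,
  y: 2,
  // Used by String(); + uses it too, because the default valueOf() returns the object itself
  toString() {
    return '(' + this.x + ', ' + this.y + ')';
  },
};
console.log(String(point));     // (1, 2)
console.log('Point: ' + point); // Point: (1, 2)
```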
The JSON data format is a text representation of JavaScript values. Therefore, JSON.stringify() can also be used to convert values to strings:
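The following minimal sketch uses illustrative values.

```javascript
const fromObject = JSON.stringify({ a: 1, b: [true, null] });
console.log(fromObject); // {"a":1,"b":[true,null]}

// Strings are quoted: the double quotes are part of the result
const fromString = JSON.stringify('abc');
console.log(fromString); // "abc"
```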
The caveat is that JSON only supports null, booleans, numbers, strings, Arrays and objects (which it always treats as if they were created by object literals).
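The following minimal sketch shows what happens to values outside that set. The inputs are illustrative.

```javascript
// undefined and functions are omitted from objects...
const objJson = JSON.stringify({ a: undefined, f() {}, b: 1 });
console.log(objJson); // {"b":1}

// ...and replaced by null in Arrays
const arrJson = JSON.stringify([undefined, () => {}]);
console.log(arrJson); // [null,null]

// A Map has no own enumerable properties, so it looks like an empty object
const mapJson = JSON.stringify(new Map([['k', 'v']]));
console.log(mapJson); // {}

// Stringifying undefined itself produces undefined, not a string
const undefJson = JSON.stringify(undefined);
console.log(undefJson); // undefined
```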
Tip: The third parameter lets you switch on multi-line output and specify how much to indent. For example:
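The following call is consistent with the output shown next. The object is reconstructed from that output, and null as the second parameter means that no replacer is used.

```javascript
console.log(JSON.stringify({first: 'Jane', last: 'Doe'}, null, 2));
```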
This statement produces the following output.
{
  "first": "Jane",
  "last": "Doe"
}
Strings can be compared via the following operators:
< <= > >=
There is one important caveat to consider: These operators compare based on the numeric values of JavaScript characters. That means that the order that JavaScript uses for strings is different from the one used in dictionaries and phone books:
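The following minimal sketch shows comparisons that follow JavaScript character values rather than dictionary order.

```javascript
const upper = 'A' < 'B';   // true
const mixed = 'a' < 'B';   // false: 'a' is 97, 'B' is 66
const umlaut = 'ä' < 'b';  // false: 'ä' (U+00E4) is 228, 'b' is 98
console.log(upper, mixed, umlaut); // true false false
```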
Properly comparing text is beyond the scope of this book. It is supported via the ECMAScript Internationalization API (Intl).
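The following minimal sketch contrasts the default sort with locale-aware sorting via Intl.Collator. It assumes an engine with Intl support for the German locale 'de' (Node.js ships full ICU data by default). The words are illustrative.

```javascript
const words = ['b', 'ä', 'a'];

// The default sort() compares JavaScript characters (code units)
const byCodeUnits = [...words].sort();
console.log(byCodeUnits); // [ 'a', 'b', 'ä' ]

// A collator compares according to the rules of a locale
const collator = new Intl.Collator('de');
const byLocale = [...words].sort(collator.compare);
console.log(byLocale); // [ 'a', 'ä', 'b' ]
```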
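Quick recap of §17 “Unicode – a brief introduction”:

Unicode characters are represented by code points, numbers with a range of 21 bits.
In JavaScript strings, Unicode is implemented via UTF-16 code units, which are 16-bit numbers. One or two code units encode a single code point.
Therefore, each JavaScript character is represented by a code unit. In the standard library, code units are also called char codes.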
The following code demonstrates that a single Unicode character comprises one or two JavaScript characters. We count the latter via .length:
// 3 Unicode characters, 3 JavaScript characters:
assert.equal('abc'.length, 3);
// 1 Unicode character, 2 JavaScript characters:
assert.equal('🙂'.length, 2);
The following table summarizes the concepts we have just explored:
Entity | Numeric representation | Size | Encoded via |
---|---|---|---|
Grapheme cluster | 1+ code points | | |
Unicode character | Code point | 21 bits | 1–2 code units |
JavaScript character | UTF-16 code unit | 16 bits | – |
Let’s explore JavaScript’s tools for working with code points.
A code point escape lets you specify a code point hexadecimally. It produces one or two JavaScript characters.
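The following minimal sketch uses code point U+1F642, the emoji '🙂', as an illustrative astral code point.

```javascript
const smiley = '\u{1F642}';    // code point escape
console.log(smiley === '🙂');  // true
console.log(smiley.length);    // 2: an astral code point needs two JavaScript characters
console.log('\u{61}'.length);  // 1: 'a' needs only one
```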
String.fromCodePoint() converts a single code point to 1–2 JavaScript characters:
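The following minimal sketch uses illustrative code points.

```javascript
const smiley = String.fromCodePoint(0x1F642);
console.log(smiley === '🙂', smiley.length); // true 2

const letter = String.fromCodePoint(0x61);
console.log(letter); // a
```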
.codePointAt() converts 1–2 JavaScript characters to a single code point:
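In the following minimal sketch, the argument of .codePointAt() is an index measured in JavaScript characters.

```javascript
const cp = '🙂'.codePointAt(0);
console.log(cp.toString(16)); // 1f642

// Index 1 points at the second JavaScript character, so only that code unit is returned
const second = '🙂'.codePointAt(1);
console.log(second.toString(16)); // de42
```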
You can iterate over a string, which visits Unicode characters (not JavaScript characters). Iteration is described later in this book. One way of iterating is via a for-of loop:
const str = '🙂a';
assert.equal(str.length, 3);
for (const codePointChar of str) {
console.log(codePointChar);
}
// Output:
// '🙂'
// 'a'
Spreading (...) into Array literals is also based on iteration and visits Unicode characters. That makes it a good tool for counting Unicode characters:
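The following minimal sketch uses the same illustrative string as the for-of example.

```javascript
const str = '🙂a';
const chars = [...str];     // [ '🙂', 'a' ]
console.log(chars.length);  // 2 Unicode characters
console.log(str.length);    // 3 JavaScript characters
```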
Indices and lengths of strings are based on JavaScript characters (as represented by UTF-16 code units).
To specify a code unit hexadecimally, you can use a code unit escape:
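The following minimal sketch writes '🙂' via its two UTF-16 code units (0xD83D and 0xDE42).

```javascript
const smiley = '\uD83D\uDE42';  // two code unit escapes
console.log(smiley === '🙂');   // true
console.log('\u0061');          // a
```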
And you can use String.fromCharCode(). Char code is the standard library’s name for code unit:
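The following minimal sketch uses illustrative char codes.

```javascript
const letter = String.fromCharCode(0x61);
console.log(letter); // a

// An astral character needs both of its code units
const smiley = String.fromCharCode(0xD83D, 0xDE42);
console.log(smiley === '🙂'); // true
```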
To get the char code of a character, use .charCodeAt():
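The following minimal sketch uses illustrative strings.

```javascript
console.log('abc'.charCodeAt(0)); // 97

// Each half of '🙂' is a separate char code
const str = '🙂';
console.log(str.charCodeAt(0).toString(16)); // d83d
console.log(str.charCodeAt(1).toString(16)); // de42
```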
When working with text that may be written in any human language, it’s best to split at the boundaries of grapheme clusters, not at the boundaries of Unicode characters.
Intl.Segmenter, which started as a TC39 proposal and is now part of the ECMAScript Internationalization API, supports Unicode segmentation (along grapheme cluster boundaries, word boundaries, sentence boundaries, etc.).
In engines that don’t support it yet, you can use one of several libraries that are available (do a web search for “JavaScript grapheme”).
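The following minimal sketch splits a string into grapheme clusters via Intl.Segmenter. It assumes an engine that supports Intl.Segmenter (for example, Node.js 16 or later). The illustrative string is 'e' followed by the combining acute accent U+0301 (rendered as 'é'), then an emoji.

```javascript
const text = 'e\u0301🙂';
console.log(text.length);       // 4 JavaScript characters
console.log([...text].length);  // 3 code points

const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
// segment() returns an iterable of objects whose .segment property holds the substring
const graphemes = [...segmenter.segment(text)].map((s) => s.segment);
console.log(graphemes.length);  // 2 grapheme clusters
```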
Strings are immutable; none of the string methods ever modify their strings.
Tbl. 13 describes how various values are converted to strings.
x | String(x) |
---|---|
undefined | 'undefined' |
null | 'null' |
Boolean value | false → 'false', true → 'true' |
Number value | Example: 123 → '123' |
String value | x (input, unchanged) |
An object | Configurable via, e.g., toString() |
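String.fromCharCode() [ES1]: converts UTF-16 code units to a string
.charCodeAt() [ES1]: returns the UTF-16 code unit at a given index
String.fromCodePoint() [ES6]: converts code points to a string
.codePointAt() [ES6]: returns the code point at a given index

// Access characters via []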
const str = 'abc';
assert.equal(str[1], 'b');
// Concatenate strings via +
assert.equal('a' + 'b' + 'c', 'abc');
assert.equal('take ' + 3 + ' oranges', 'take 3 oranges');
String.prototype: finding and matching
(String.prototype is where the methods of strings are stored.)
.endsWith(searchString: string, endPos=this.length): boolean [ES6]
Returns true if the string would end with searchString if its length were endPos. Returns false otherwise.
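The following minimal sketch uses illustrative strings.

```javascript
const a = 'report.pdf'.endsWith('.pdf'); // true
const b = 'abcde'.endsWith('cd');        // false
// endPos = 4: checked as if the string were 'abcd'
const c = 'abcde'.endsWith('cd', 4);     // true
console.log(a, b, c); // true false true
```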
.includes(searchString: string, startPos=0): boolean [ES6]
Returns true if the string contains the searchString and false otherwise. The search starts at startPos.
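The following minimal sketch uses an illustrative string.

```javascript
const found = 'abcde'.includes('bc');        // true
// startPos = 2: only 'cde' is searched
const fromTwo = 'abcde'.includes('bc', 2);   // false
console.log(found, fromTwo); // true false
```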
.indexOf(searchString: string, minIndex=0): number [ES1]
Returns the lowest index at which searchString appears within the string, or -1 otherwise. Any returned index will be minIndex or higher.
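The following minimal sketch uses an illustrative string.

```javascript
const first = 'abab'.indexOf('b');       // 1
const fromTwo = 'abab'.indexOf('b', 2);  // 3: the search starts at index 2
const missing = 'abab'.indexOf('x');     // -1
console.log(first, fromTwo, missing); // 1 3 -1
```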
.lastIndexOf(searchString: string, maxIndex=Infinity): number [ES1]
Returns the highest index at which searchString appears within the string, or -1 otherwise. Any returned index will be maxIndex or lower.
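The following minimal sketch uses an illustrative string.

```javascript
const last = 'abab'.lastIndexOf('b');        // 3
const upToTwo = 'abab'.lastIndexOf('b', 2);  // 1: only indices 0 to 2 are considered
const missing = 'abab'.lastIndexOf('x');     // -1
console.log(last, upToTwo, missing); // 3 1 -1
```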
[1 of 2] .match(regExp: string | RegExp): RegExpMatchArray | null [ES3]
If regExp is a regular expression with flag /g not set, then .match() returns the first match for regExp within the string, or null if there is no match. If regExp is a string, it is used to create a regular expression (think parameter of new RegExp()) before performing the previously mentioned steps.
The result has the following type:
interface RegExpMatchArray extends Array<string> {
  index: number;
  input: string;
  groups: undefined | {
    [key: string]: string
  };
}
Numbered capture groups become Array indices (which is why this type extends Array). Named capture groups (ES2018) become properties of .groups. In this mode, .match() works like RegExp.prototype.exec().
Examples:
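The following minimal sketch uses .match() without flag /g. The input strings and group names are illustrative.

```javascript
const m = 'ab-12 cd-34'.match(/([a-z]+)-([0-9]+)/);
console.log(m[0], m[1], m[2]); // ab-12 ab 12
console.log(m.index);          // 0

// Named capture groups end up in .groups
const named = 'x cd-34'.match(/(?<word>[a-z]+)-(?<num>[0-9]+)/);
console.log(named.groups.word, named.groups.num, named.index); // cd 34 2

// No match: null
const none = 'abc'.match(/[0-9]/);
console.log(none); // null
```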
[2 of 2] .match(regExp: RegExp): string[] | null [ES3]
If flag /g of regExp is set, .match() returns either an Array with all matches or null if there was no match.
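The following minimal sketch uses illustrative inputs.

```javascript
const all = 'ab-12 cd-34'.match(/[0-9]+/g);
console.log(all); // [ '12', '34' ]

const none = 'abc'.match(/[0-9]+/g);
console.log(none); // null
```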
.search(regExp: string | RegExp): number [ES3]
Returns the index at which regExp first occurs within the string, or -1 if there is no match. If regExp is a string, it is used to create a regular expression (think parameter of new RegExp()).
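The following minimal sketch uses illustrative inputs. The second call shows that a string argument is treated as a regular expression.

```javascript
const digit = 'a2b3'.search(/[0-9]/);   // 1
// The string '.' becomes the regular expression /./, which matches any character
const dot = 'a.b'.search('.');          // 0
const none = 'abc'.search(/[0-9]/);     // -1
console.log(digit, dot, none); // 1 0 -1
```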
.startsWith(searchString: string, startPos=0): boolean [ES6]
Returns true if searchString occurs in the string at index startPos. Returns false otherwise.
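The following minimal sketch uses an illustrative string.

```javascript
const a = 'abcde'.startsWith('ab');     // true
const b = 'abcde'.startsWith('cd');     // false
const c = 'abcde'.startsWith('cd', 2);  // true: 'cd' occurs at index 2
console.log(a, b, c); // true false true
```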
String.prototype: extracting
.slice(start=0, end=this.length): string [ES3]
Returns the substring of the string that starts at (including) index start and ends at (excluding) index end. If an index is negative, it is added to .length before it is used (-1 means this.length-1, etc.).
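The following minimal sketch uses an illustrative string.

```javascript
const str = 'abcde';
const middle = str.slice(1, 3);   // 'bc'
const rest = str.slice(2);        // 'cde'
const tail = str.slice(-2);       // 'de': -2 becomes 5 - 2 = 3
const inner = str.slice(1, -1);   // 'bcd'
console.log(middle, rest, tail, inner); // bc cde de bcd
```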
.split(separator: string | RegExp, limit?: number): string[] [ES3]
Splits the string into an Array of substrings – the strings that occur between the separators. The separator can be a string:
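The following minimal sketch uses an illustrative string separator.

```javascript
const parts = 'a, b, c'.split(', ');
console.log(parts); // [ 'a', 'b', 'c' ]

// The optional limit caps the number of elements in the result
const limited = 'a, b, c'.split(', ', 2);
console.log(limited); // [ 'a', 'b' ]
```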
It can also be a regular expression:
> 'a : b : c'.split(/ *: */)
[ 'a', 'b', 'c' ]
> 'a : b : c'.split(/( *):( *)/)
[ 'a', ' ', ' ', 'b', ' ', ' ', 'c' ]
The last invocation demonstrates that captures made by groups in the regular expression become elements of the returned Array.
Warning: .split('') splits a string into JavaScript characters. That doesn’t work well when dealing with astral Unicode characters (which are encoded as two JavaScript characters), such as emojis. Instead, it is better to use spreading:
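The following minimal sketch contrasts both approaches on an illustrative string that contains an emoji.

```javascript
const bySplit = '🙂x'.split('');
console.log(bySplit.length); // 3: '🙂' is torn into two halves (lone surrogates)

const bySpreading = [...'🙂x'];
console.log(bySpreading); // [ '🙂', 'x' ]
```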
.substring(start: number, end=this.length): string [ES1]
Use .slice() instead of this method. .substring() wasn’t implemented consistently in older engines and doesn’t support negative indices.
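The following minimal sketch shows where the two methods differ. The string is illustrative.

```javascript
const str = 'abcde';
const sliceNeg = str.slice(-2);          // 'de'
const subNeg = str.substring(-2);        // 'abcde': negative indices are treated as 0
const sliceSwap = str.slice(3, 1);       // '': start after end yields an empty string
const subSwap = str.substring(3, 1);     // 'bc': .substring() swaps the indices
console.log(sliceNeg, subNeg, JSON.stringify(sliceSwap), subSwap); // de abcde "" bc
```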
String.prototype: combining
.concat(...strings: string[]): string [ES3]
Returns the concatenation of the string and strings. 'a'.concat('b') is equivalent to 'a'+'b'. The latter is much more popular.
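The following minimal sketch uses illustrative arguments.

```javascript
const two = 'a'.concat('b');             // 'ab'
// Several arguments are allowed; non-strings are converted to strings
const many = 'a'.concat('b', 'c', 1);    // 'abc1'
console.log(two, many); // ab abc1
```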
.padEnd(len: number, fillString=' '): string [ES2017]
Appends (fragments of) fillString to the string until it has the desired length len. If it already has or exceeds len, then it is returned without any changes.
.padStart(len: number, fillString=' '): string [ES2017]
Prepends (fragments of) fillString to the string until it has the desired length len. If it already has or exceeds len, then it is returned without any changes.
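The following minimal sketch covers both padding methods. The strings and lengths are illustrative.

```javascript
const zeroPadded = '7'.padStart(3, '0');   // '007'
// fillString is repeated and cut off as needed
const cut = 'abc'.padEnd(6, '12');         // 'abc121'
// Already longer than len: returned unchanged
const unchanged = 'abc'.padEnd(2);         // 'abc'
// Default fillString is a space (JSON.stringify makes the spaces visible)
const spaces = 'x'.padEnd(3);
console.log(zeroPadded, cut, unchanged, JSON.stringify(spaces)); // 007 abc121 abc "x  "
```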
.repeat(count=0): string [ES6]
Returns the string, concatenated count times.
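The following minimal sketch uses an illustrative string.

```javascript
const thrice = 'ab'.repeat(3);   // 'ababab'
const never = 'ab'.repeat(0);    // ''
console.log(thrice, JSON.stringify(never)); // ababab ""
```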
String.prototype: transforming
.normalize(form: 'NFC'|'NFD'|'NFKC'|'NFKD' = 'NFC'): string [ES6]
Normalizes the string according to the Unicode Normalization Forms.
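The following minimal sketch shows that two encodings of 'é' only compare as equal after normalization. The variable names are illustrative.

```javascript
const composed = '\u00E9';     // 'é' as a single code point
const decomposed = 'e\u0301';  // 'e' followed by the combining acute accent
console.log(composed === decomposed); // false, although both render as 'é'
console.log(composed.normalize() === decomposed.normalize()); // true (default form: NFC)
console.log(decomposed.normalize('NFC').length); // 1
console.log(composed.normalize('NFD').length);   // 2
```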
[1 of 2] .replace(searchValue: string | RegExp, replaceValue: string): string [ES3]
Replaces matches of searchValue with replaceValue. If searchValue is a string, only the first verbatim occurrence is replaced. If searchValue is a regular expression without flag /g, only the first match is replaced. If searchValue is a regular expression with /g, all matches are replaced.
> 'x.x.'.replace('.', '#')
'x#x.'
> 'x.x.'.replace(/./, '#')
'#.x.'
> 'x.x.'.replace(/./g, '#')
'####'
Special characters in replaceValue are:

$$: becomes $
$n: becomes the capture of numbered group n (alas, $0 stands for the string '$0'; it does not refer to the complete match)
$&: becomes the complete match
$`: becomes everything before the match
$': becomes everything after the match

Examples:
> 'a 2020-04 b'.replace(/([0-9]{4})-([0-9]{2})/, '|$2|')
'a |04| b'
> 'a 2020-04 b'.replace(/([0-9]{4})-([0-9]{2})/, '|$&|')
'a |2020-04| b'
> 'a 2020-04 b'.replace(/([0-9]{4})-([0-9]{2})/, '|$`|')
'a |a | b'
Named capture groups (ES2018) are supported, too:

$<name>: becomes the capture of named group name
Example:
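The following minimal sketch reuses the date pattern from the previous examples, with illustrative group names year and month.

```javascript
const result = 'a 2020-04 b'.replace(
  /(?<year>[0-9]{4})-(?<month>[0-9]{2})/,
  '|$<month>|');
console.log(result); // a |04| b
```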
[2 of 2] .replace(searchValue: string | RegExp, replacer: (...args: any[]) => string): string [ES3]
If the second parameter is a function, occurrences are replaced with the strings it returns. Its parameters args are:

matched: string. The complete match
g1: string|undefined. The capture of numbered group 1
g2: string|undefined. The capture of numbered group 2
offset: number. Where was the match found in the input string?
input: string. The whole input string

const regexp = /([0-9]{4})-([0-9]{2})/;
const replacer = (all, year, month) => '|' + all + '|';
assert.equal(
'a 2020-04 b'.replace(regexp, replacer),
'a |2020-04| b');
Named capture groups (ES2018) are supported, too. If there are any, an argument is added at the end, with an object whose properties contain the captures:
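The following minimal sketch uses the illustrative group names year and month. Because the groups object is the last argument, the replacer reads it from the end of args.

```javascript
const regexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})/;
const replacer = (...args) => {
  // With named groups, the last argument is the object with the captures
  const groups = args[args.length - 1];
  return '|' + groups.month + '/' + groups.year + '|';
};
const result = 'a 2020-04 b'.replace(regexp, replacer);
console.log(result); // a |04/2020| b
```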
.toUpperCase(): string [ES1]
Returns a copy of the string in which all lowercase alphabetic characters are converted to uppercase. How well that works for various alphabets depends on the JavaScript engine.
.toLowerCase(): string [ES1]
Returns a copy of the string in which all uppercase alphabetic characters are converted to lowercase. How well that works for various alphabets depends on the JavaScript engine.
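The following minimal sketch covers both case conversions. The strings are illustrative.

```javascript
const upper = 'jane doe'.toUpperCase();  // 'JANE DOE'
const lower = 'ÄBC'.toLowerCase();       // 'äbc'
// The result can be longer than the input: the default Unicode mapping turns 'ß' into two characters
const sharpS = 'ß'.toUpperCase();        // 'SS'
console.log(upper, lower, sharpS); // JANE DOE äbc SS
```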
.trim(): string [ES5]
Returns a copy of the string in which all leading and trailing whitespace (spaces, tabs, line terminators, etc.) is gone.
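The following minimal sketch uses an illustrative string. JSON.stringify makes any remaining whitespace visible.

```javascript
const trimmed = '\t  abc \n'.trim();
console.log(JSON.stringify(trimmed)); // "abc"
```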
.trimEnd(): string [ES2019]
Similar to .trim(), but only the end of the string is trimmed:
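The following minimal sketch uses an illustrative string.

```javascript
const endTrimmed = '  abc  '.trimEnd();
console.log(JSON.stringify(endTrimmed)); // "  abc"
```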
.trimStart(): string [ES2019]
Similar to .trim(), but only the beginning of the string is trimmed:
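The following minimal sketch uses an illustrative string.

```javascript
const startTrimmed = '  abc  '.trimStart();
console.log(JSON.stringify(startTrimmed)); // "abc  "
```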
Exercise: Using string methods
exercises/strings/remove_extension_test.mjs
Quiz
See quiz app.