Syntax of regular expressions in JavaScript and a core collection

A regular expression describes a set of matching rules in compact form. It lets you isolate a piece of text in a page and, if needed, replace it.
A regular expression is defined by an object or a literal.

The literal form has a special format, with the expression enclosed between two slashes:

var er = /xyz/

The object, by contrast, is created from an ordinary string in quotation marks:

var er = new RegExp("xyz")

When a regular expression is entered in a form, we get an ordinary string; the object form must then be used to assign the expression to a variable.
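
As a minimal sketch of the two forms, the snippet below builds the same pattern once as a literal and once from a string. The variable userInput is a hypothetical stand-in for a value read from a form field; it is not part of any API.

```javascript
// Literal form: the pattern is fixed in the source code.
const literal = /xyz/;

// Object form: the pattern comes from a string.
// userInput is a hypothetical stand-in for a value read from a form field.
const userInput = "xyz";
const fromString = new RegExp(userInput);

console.log(literal.test("wxyz"));     // true
console.log(fromString.test("wxyz"));  // true
console.log(fromString.source);        // xyz
```

Both objects behave the same way once created; only the way the pattern is written differs.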

Building a regular expression, syntax and operators

Building an expression requires only knowledge of the regular expression operators and special characters, plus the global modifiers.

Special Character

Special characters are introduced by the backslash "\". It is written once in a literal expression (or in a form), but in a string the backslash is doubled.

x = /a\r/
x = new RegExp("a\\r")

Coupled with a letter, the backslash represents a code that cannot be typed directly (such as a line feed). Placed before an operator symbol, it designates the character itself rather than the regular expression operator:

\n  Line feed (end of line), not the letter n.
\*  The star character, not the regular expression operator.
\t  Tabulation; \v is the vertical tab.
\r  Carriage return.
\f  Form feed.
\s  Any whitespace character, including space, tabulation, line feed and form feed.
\S  Any character other than whitespace; the opposite of \s.
\d  Any digit. Same as [0-9].
\D  Any non-digit character. Same as [^0-9].
\w  Any alphanumeric character or underscore. Same as [_A-Za-z0-9].
\W  Any character that is not alphanumeric or underscore. The opposite of \w, same as [^_A-Za-z0-9].
\nnnn  Where nnnn is a positive integer: a back-reference to the capturing group with that number.
\0  The NUL character (code 0), not the digit 0 in the text.
\xhh  Where hh is a pair of hexadecimal digits: the character with that code.
\uhhhh  Where hhhh is four hexadecimal digits: the Unicode character with that code.
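
The short sketch below tries a few of these codes and shows the doubled backslash needed in a string. The sample texts are arbitrary and chosen only for illustration.

```javascript
// The same pattern written as a literal and as a string:
// each backslash of the literal is doubled inside the string.
const asLiteral = /\d+\.\d+/;
const asString = new RegExp("\\d+\\.\\d+");
const samePattern = asLiteral.source === asString.source;
console.log(samePattern);  // true

// Escaped operator, whitespace class, word class, digit class, Unicode escape.
const checks = [
  /2\*3/.test("2*3"),          // \* is a literal star
  /a\sb/.test("a\tb"),         // \s accepts a tabulation
  /^\w+$/.test("snake_case"),  // \w includes the underscore
  /^\d+$/.test("2012"),        // \d is a digit
  /\u0041/.test("A")           // \u0041 is the letter A
];
console.log(checks.every(Boolean));  // true
```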

Operators

Combining elements in an expression lets us apply logical operators. Together with intervals, this makes it possible to express a set of rules in a few characters.

The dot

The dot matches any character in the text being compared, except the end-of-line code.

Groups

()

The parentheses denote a capturing group. When the element in parentheses is found, it is remembered: it is returned in the array of results and also stored in the variables of the RegExp object. The pattern (.) designates any single character. Coupled with the + operator, as in (.)+, it means one or more characters of any kind, that is, either a single character or a string.

For example, (ba) is found in "bar", "barrel" or "sidebar", but not in "brain". In each match, "ba" is the text recalled by the group.
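
A minimal sketch of how a group shows up in the array returned by exec(). The second pattern and its sample text are assumptions added for illustration.

```javascript
const re = /(ba)/;
const found = re.exec("sidebar");
console.log(found[0]);     // ba  (the whole match)
console.log(found[1]);     // ba  (the text recalled by the group)
console.log(found.index);  // 4

const missing = re.exec("brain");
console.log(missing);      // null, "brain" does not contain "ba"

// Two groups: each one gets its own slot in the array of results.
const range = /(\d+)-(\d+)/.exec("pages 10-20");
console.log(range[1], range[2]);  // 10 20
```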

(?:x)

Non-capturing parentheses. The x element is searched for, but it is not stored: it does not appear in the array returned by the methods, nor in the internal variables.
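
The difference can be seen by comparing the arrays of results, as in this small sketch (the patterns are illustrative only):

```javascript
// With two capturing groups, the array holds the match plus two groups.
const capturing = /(ba)(r)/.exec("bar");
console.log(capturing.length);     // 3

// With (?:ba), only the second group is stored.
const nonCapturing = /(?:ba)(r)/.exec("bar");
console.log(nonCapturing.length);  // 2
console.log(nonCapturing[1]);      // r
```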

[]

The square brackets designate a set of alternatives: any one of the listed characters can match. With [abc], the words "ara", "bridget" and "corel" all match (if we are testing the first letter).

Interval

-

Inside square brackets, the dash symbol between two letters or digits designates an interval.
Examples:
a-z any lowercase letter, as in [a-z]. Any letter in the interval can match.
A-Z any capital letter.
0-9 any digit.
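
Lists and intervals can be tried as follows. This is a sketch with arbitrary sample words; the ^ and $ symbols used here tie the pattern to the beginning and the end of the text.

```javascript
const firstLetter = /^[abc]/;
const results = {
  bridget: firstLetter.test("bridget"),     // starts with b
  diana: firstLetter.test("diana"),         // d is not in the list
  lowercase: /^[a-z]+$/.test("corel"),      // only lowercase letters
  capital: /^[a-z]+$/.test("Corel"),        // C is outside a-z
  mixed: /^[A-Za-z0-9]+$/.test("Room101")   // several intervals combined
};
console.log(results.bridget, results.diana);                    // true false
console.log(results.lowercase, results.capital, results.mixed); // true false true
```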

Operators of parts

These symbols are used to designate a specific part of the text to compare with the regular expression.

^

Specifies that the element that follows, character or group, must be at the beginning of the text. If the pattern is /^a/, the text "angela" matches but "christina" does not.
With the "m" modifier, in a text of several lines, this applies to the beginning of each line.

$

Specifies that the preceding element, character or group, must be at the end of the text. If the pattern is /a$/, the strings "angela" and "christina" both match, since both end with "a", but "angel" does not.
With the "m" modifier, in a text of several lines, this applies to the end of each line.
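
A short sketch of both symbols, with and without the "m" modifier. The multi-line sample text is an assumption made for the example.

```javascript
console.log(/^a/.test("angela"), /^a/.test("christina"));  // true false
console.log(/a$/.test("christina"), /a$/.test("angel"));   // true false

// Three lines; two of them start with "a".
const text = "angela\nchristina\nanna";
const withoutM = text.match(/^a\w*/g);   // ^ only matches at the very beginning
const withM = text.match(/^a\w*/gm);     // ^ matches at the beginning of each line
console.log(withoutM);  // [ 'angela' ]
console.log(withM);     // [ 'angela', 'anna' ]
```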

?

The preceding character or group is optional: it may be present once or not at all. This allows an optional character to be skipped so that the rest of the regular expression applies to the text that comes after.

Operators of quantity

+

There must be one or more occurrences of the character or group that precedes the symbol.

Examples:
a+ there must be one or several letters a.
[abc]+ there must be one or more characters, each being a, b or c, in any order.

*

There may be any number of occurrences of the preceding element, or none.

{n}

n is a positive integer, written without spaces inside the braces. This is the exact number of occurrences expected.
Example:
a{2} looks for a string that contains "aa".

{x,y}

x and y are two positive integers, written without spaces. There must be at least x occurrences and no more than y occurrences.
For example: {2,3} searches for two or three occurrences of the preceding element.
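
The quantity operators, together with the ? symbol, can be compared in one sketch. The patterns below are illustrative; they are anchored with ^ and $ so that the whole text must satisfy the count.

```javascript
const optional = /^colou?r$/;  // the "u" may be present or not
const quantity = {
  optionalAbsent: optional.test("color"),    // true
  optionalPresent: optional.test("colour"),  // true
  plusOnEmpty: /^a+$/.test(""),              // false, + needs at least one a
  starOnEmpty: /^a*$/.test(""),              // true, * accepts none
  exactlyTwo: /^a{2}$/.test("aa"),           // true
  fourIsTooMany: /^a{2,3}$/.test("aaaa")     // false, at most three
};
console.log(quantity);

// Without anchors, {2,3} takes as many occurrences as allowed.
const greedy = "aaaa".match(/a{2,3}/)[0];
console.log(greedy);  // aaa
```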

Logical operators

x | y

The bar is the OR operator: either alternative can match.
Example: (abc|def)
We are looking for a string that contains abc or def. A string containing both also matches, and the first one found is returned.

[^]

When the symbol "^" is the first character inside square brackets, it does not mean the beginning of a string but an exclusion.
Example:
[^xyz]
The expression represents any character except x, y and z.
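
Both logical operators can be tried as in the sketch below, where the sample strings are arbitrary:

```javascript
const either = /(abc|def)/;
const first = either.exec("xxdefxx")[1];
const both = either.exec("abcdef")[1];   // both are present, the first one found wins
console.log(first);  // def
console.log(both);   // abc

// [^xyz] matches any character that is not x, y or z.
const cleaned = "xyzzy!".replace(/[^xyz]/g, "");
const hasOther = /[^xyz]/.test("xyz");
console.log(cleaned);   // xyzzy
console.log(hasOther);  // false
```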

Conditional operators

x(?=y)

The text matches when x is followed by y. Only x is part of the result; y is checked but not included.
Example:
me(?=she)
When "me" is directly followed by "she" in the text, the expression matches. To add both strings to the array of results, one writes: (me)(?=(she))

Example:
[0-9]+(?=\.)
Represents the integer part of a decimal: a string of digits followed by a dot, which is not included in the match. A complete decimal (digits, dot and decimals) is written simply: \d+\.\d+

x(?!y)

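The text x matches if it is not followed by y.
To represent a whole number we could write:
[0-9]+(?!\.) However, on a text such as "12.5" this still matches "1", because the engine gives back a digit until the condition holds. An anchored pattern such as ^[0-9]+$ is easier and safer.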
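
The two conditional operators can be observed in a short sketch. The pattern (me)(?=(she)), with both parts captured, and the sample text "12.5" are assumptions made for the example.

```javascript
// x(?=y): "she" is required after "me" but is not part of the match.
const followed = /me(?=she)/.exec("meshe");
const notFollowed = /me(?=she)/.test("mesh");
console.log(followed[0]);   // me
console.log(notFollowed);   // false

// Capturing groups put both strings in the array of results.
const both = /(me)(?=(she))/.exec("meshe");
console.log(both[1], both[2]);  // me she

// Digits followed by a dot, the dot being left out of the match.
const integerPart = /\d+(?=\.)/.exec("12.5")[0];
console.log(integerPart);   // 12

// x(?!y): the engine gives back one digit so that no dot follows.
const backtracked = /\d+(?!\.)/.exec("12.5")[0];
console.log(backtracked);   // 1
```

The last result shows why a negative condition alone is a fragile test for whole numbers.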

Important Note

In a string, the backslash "\" must be doubled. For example, you write \\d to represent the symbol \d, a digit. This is not the case when the regular expression is entered in a form, or written in the literal form:

/\d+/

Modifiers

Modifiers are codes that apply a general rule to the whole regular expression.
For example, the letter i means that no difference is made between upper and lower case.
The modifiers are the letters i, g and m.

var er = /xyz/i
var er = new RegExp("xyz", "i")

You can use one or more modifiers at a time. For example:

var er = /xyz/igm

Upper and lower case

The "i" code states that no difference is made between upper and lower case in the text. For example, a regular expression written for the string "doe" gives the same result with "Doe" or "DOE".

Global

The "g" code indicates a global search: all the matches in the text are found, instead of stopping at the first one.

Multiple lines

The "m" code states that the regular expression is applied to each line of a text made of several strings separated by the end-of-line code. When this option is chosen, the symbols ^ and $ match at the beginning and the end of each line, so the comparison is attempted line by line.
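
The effect of each modifier can be checked with a small sketch; the sample strings are arbitrary.

```javascript
// i: no difference between upper and lower case.
const caseless = /doe/i.test("DOE");
const caseSensitive = /doe/.test("DOE");
console.log(caseless, caseSensitive);  // true false

// g: every occurrence is processed, not only the first one.
const firstOnly = "a-b-c".replace(/-/, "+");
const everywhere = "a-b-c".replace(/-/g, "+");
console.log(firstOnly);   // a+b-c
console.log(everywhere);  // a+b+c

// m: ^ and $ apply to each line of the text.
const perLine = "one\ntwo".match(/^\w+$/gm);
const wholeText = "one\ntwo".match(/^\w+$/g);
console.log(perLine);    // [ 'one', 'two' ]
console.log(wholeText);  // null
```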

Methods of RegExp and modifier

A method of the RegExp object may be called directly on a literal expression.

 /xyz/i.exec("xxx")

The method is not applied to the code "i" but to the whole expression /xyz/i.

This is similar to:

er = /xyz/i
er.exec("xxx");

List of commonly used regular expressions

Examples of regular expressions that are commonly used to recognize a string or modify it.
In a source, the expressions must be enclosed between two slashes or quotation marks.

Check if we have an integer

-?[0-9]+

A decimal number

-?\d+\.\d+
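
As listed, these two patterns find a number anywhere inside a text. To check that a whole string is a number, they can be anchored with ^ and $, as in this sketch. The function names isInteger and isDecimal are hypothetical helpers, not part of JavaScript.

```javascript
// Hypothetical helpers: the patterns above, anchored to the whole string.
function isInteger(text) {
  return /^-?[0-9]+$/.test(text);
}

function isDecimal(text) {
  return /^-?\d+\.\d+$/.test(text);
}

console.log(isInteger("-42"), isInteger("4.2"));    // true false
console.log(isDecimal("-3.14"), isDecimal("42"));   // true false
console.log(isDecimal("3."));                       // false, digits are required after the dot
```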

An alphanumeric string

Made only of alphabetical letters, lowercase or uppercase, or digits.

^[a-zA-Z0-9]+$

The full code:

function isAlphanumeric(str) {
  // No "g" modifier: with test(), it would make repeated calls depend on lastIndex.
  return new RegExp("^[a-zA-Z0-9]+$").test(str);
}

Removing quotation marks

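This may be useful when the content of an HTML file is parsed.

[\"\']([^\"\']*)[\"\']

var re = /[\"\']([^\"\']*)[\"\']/;
var test = "'some text'";        // a quoted string of 11 characters
document.write(test.length);     // length is a property, not a method
var arr = re.exec(test);         // arr[1] holds the text without the quotes
document.write(arr[1].length);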

How to validate an email address

^[\w.-]+@[\w.]+\.\w+$

var re = /^[\w.-]+@[\w.]+\.\w+$/;   // anchored so that the whole string must match
if (re.test(email)) document.write("valid");

How to validate a URL with a regular expression

(http://|ftp://)([\w-\.]+)(\.)([a-zA-Z]+)
In the literal form, each slash of the pattern must be written \/.
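
A minimal sketch of this check, assuming the character class of the pattern is read as [\w.-]+ and that the whole string must match. The name urlPattern and the sample addresses are illustrative. This is a loose check that accepts only a protocol and a host name, not a full URL validator.

```javascript
// Assumed reading of the pattern, anchored and written as a literal
// (each slash is escaped as \/).
const urlPattern = /^(http:\/\/|ftp:\/\/)([\w.-]+)(\.)([a-zA-Z]+)$/;

const parts = urlPattern.exec("http://www.example.com");
console.log(parts[1]);  // http://
console.log(parts[2]);  // www.example
console.log(parts[4]);  // com

const rejected = urlPattern.test("mailto:someone");
console.log(rejected);  // false
```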

Replacing the trim() function

str = str.replace(/^\s\s*/, '').replace(/\s\s*$/, '') 
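
The same two replacements can be wrapped in a function and compared with the built-in trim() method where it is available. The name trimSpaces is a hypothetical helper chosen for this sketch.

```javascript
// Hypothetical helper wrapping the two replacements shown above.
function trimSpaces(str) {
  return str.replace(/^\s\s*/, "").replace(/\s\s*$/, "");
}

const sample = "  hello world \n";
const trimmed = trimSpaces(sample);
console.log("[" + trimmed + "]");        // [hello world]
console.log(trimmed === sample.trim());  // true
```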

Online tool

Online tool to test regular expressions in JavaScript.
Buttons corresponding to the operators help to define an expression that applies to different types of texts, predefined and modified by the user.

© 2008-2012 Xul.fr