Quick Reference on Regular Expressions


Regular expressions provide an efficient way of searching for patterns in text or validating user-supplied text data. MarketDirect StoreFront uses regular expressions frequently in its own code, and it also asks administrators to supply them in the Accounting Code interface (when using patterns) to confirm that buyer-supplied values match an expected format (see To enable accounting codes).

This appendix will illustrate what regular expressions can do through several examples. Online resources and small handbooks may be the best sources of information about how to set up specific expressions.

MarketDirect StoreFront uses regular expressions based on .NET. Other implementations (e.g., Unix, JavaScript) provide similar functionality, but some details differ, so double-check the regular expressions you use.
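As a small illustration of how details differ between implementations, the sketch below uses Python's re module (not the .NET engine that MarketDirect StoreFront uses) to check whether two escapes are accepted. The helper name compiles is hypothetical. The \e escape is listed in this appendix as valid in .NET, whereas Python's re module rejects it.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True when Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

tab_ok = compiles(r"\t")     # tab escape, accepted by Python's re module
escape_ok = compiles(r"\e")  # escape character, rejected by Python's re module

print(tab_ok, escape_ok)
```

A pattern copied from a non-.NET source may need the same kind of check before it is entered in MarketDirect StoreFront.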

Regular Expression Syntax

Regular expressions describe a character pattern precisely enough that a computer can check text against it. You can use a number of escaped characters to denote certain keys; for instance, \t means a tab. The table below lists a number of escaped characters.

Escaped Character

Description

Ordinary characters

Characters other than . $ ^ { [ ( | ) * + ? \ match themselves.

\b

\b denotes a word boundary (between \w and \W characters) except within a [] character class, where \b refers to the backspace character. See the table below.

\t

Matches a tab.

\r

Matches a carriage return.

\v

Matches a vertical tab.

\f

Matches a form feed.

\n

Matches a new line.

\e

Matches an escape.

\

When followed by a character that is not recognized as an escaped character, matches that character. For example, \* is the same as \x2A.
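The escapes in the table above can be tried out with a short sketch. It uses Python's re module for illustration rather than .NET; the escapes shown here (\t, \b, \*, \x2A) should behave the same way in both, and the sample strings are made up.

```python
import re

# \t matches a literal tab character.
has_tab = re.search(r"\t", "name\tvalue") is not None

# \b is a word boundary, so "cat" is found only as a whole word.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

# Inside a [] character class, \b means the backspace character (\x08).
has_backspace = re.search(r"[\b]", "a\x08b") is not None

# A backslash before a metacharacter matches it literally: \* equals \x2A.
stars_escaped = re.findall(r"\*", "2*3*4")
stars_hex = re.findall(r"\x2A", "2*3*4")

print(has_tab, whole_words, has_backspace, stars_escaped == stars_hex)
```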

A character class is a set of characters that will find a match if any one of the characters included in the set matches. The following table summarizes character matching syntax.

Character Class

Description

.

Matches any single character except \n (newline).

[aeiou]

Matches any single character included in the specified set of characters, in this case the a, e, i, o or u.

[0-9a-fA-F]

Use of a hyphen (-) allows specification of contiguous character ranges, in this case 0 to 9, a to f and A to F.

\w

Matches any word character. \w is equivalent to [a-zA-Z_0-9].

\W

Matches any non-word character. \W is equivalent to [^a-zA-Z_0-9].

\s

Matches any white-space character. \s is equivalent to [ \f\n\r\t\v].

\S

Matches any non-white-space character. \S is equivalent to [^ \f\n\r\t\v].

\d

Matches any decimal digit. \d is equivalent to [0-9].

\D

Matches any nondigit. \D is equivalent to [^0-9].
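The following sketch exercises several of the character classes above. It uses Python's re module for illustration, not .NET. The re.ASCII flag is an assumption added here so that \w and \d stay limited to the ASCII sets given in the table, and the sample strings are made up.

```python
import re

# Any one character from an explicit set.
vowels = re.findall(r"[aeiou]", "regex")

# Contiguous ranges written with a hyphen.
hex_chars = re.findall(r"[0-9a-fA-F]", "0xBeeZ")

# The period matches any single character except a newline.
dot = re.findall(r"a.c", "abc a\nc a-c")

# \d and \D split a string into digits and non-digits.
digits = re.findall(r"\d", "Room 42B", flags=re.ASCII)
non_digits = re.findall(r"\D+", "Room 42B", flags=re.ASCII)

# \w+ collects runs of letters, digits, and underscores.
words = re.findall(r"\w+", "po_box 12, suite-7", flags=re.ASCII)

# \S+ collects runs of non-white-space characters.
chunks = re.findall(r"\S+", "a b\tc\nd")

print(vowels, hex_chars, dot, digits, non_digits, words, chunks)
```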

Quantifiers add optional quantity data to a regular expression. A quantifier expression applies to the character, group, or character class that immediately precedes it. The .NET Framework regular expressions also support minimal matching ("lazy") quantifiers, which match as few repeats as possible.

The following table describes the metacharacters that affect matching quantity.

Quantifier

Description

*

Specifies zero or more matches; for example, \w* or (abc)*. Equivalent to {0,}.

+

Specifies one or more matches; for example, \w+ or (abc)+. Equivalent to {1,}.

?

Specifies zero or one matches; for example, \w? or (abc)?. Equivalent to {0,1}.

{n}

Specifies exactly n matches; for example, (pizza){2}.

{n,}

Specifies at least n matches; for example, (abc){2,}.

{n,m}

Specifies at least n, but no more than m, matches.

*?

Specifies the first match that consumes as few repeats as possible (equivalent to lazy *).

+?

Specifies as few repeats as possible, but at least one (equivalent to lazy +).

??

Specifies zero repeats if possible, or one (lazy ?).

{n}?

Equivalent to {n} (lazy {n}).

{n,}?

Specifies as few repeats as possible, but at least n (lazy {n,}).

{n,m}?

Specifies as few repeats as possible between n and m (lazy {n,m}).

|

Matches any one of the terms separated by the | (vertical bar) character; for example, cat|dog|tiger. The leftmost successful match wins.
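The difference between ordinary (greedy) and lazy quantifiers is easiest to see side by side. The sketch below uses Python's re module for illustration rather than .NET; these quantifiers should behave the same way in both, and the sample strings are made up.

```python
import re

markup = "<b>bold</b> and <i>italic</i>"

# Greedy +: runs from the first < to the last > in the string.
greedy = re.findall(r"<.+>", markup)

# Lazy +?: stops at the first > that completes a match.
lazy = re.findall(r"<.+?>", markup)

# {n,m} takes as many repeats as allowed; {n,m}? takes as few as allowed.
most = re.match(r"\d{2,4}", "123456").group()
fewest = re.match(r"\d{2,4}?", "123456").group()

# ? makes the preceding character optional.
optional = re.findall(r"colou?r", "color colour")

# | matches any one of the alternatives.
animals = re.findall(r"cat|dog|tiger", "dog chases cat, tiger watches")

print(greedy, lazy, most, fewest, optional, animals)
```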

Regular Expression Examples

Description

Expression

Comment

Social Security Number  

\d{3}-\d{2}-\d{4}

The social security number expects three digits, followed by a -, followed by 2 digits, followed by a -, and followed again by 4 digits.

Zip Code

\d{5}

A US-based zip code consists of 5 digits.

Zip Code + 4  

\d{5}(-\d{4})

A US-based zip code followed by a - and 4 digits.

Alphanumeric Character  

[a-zA-Z0-9]

The character may be any one character from a to z, A to Z, or 0 to 9.

Email Address

\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*

A word (optionally followed by -, +, or . and another word, as many times as needed), followed by @, followed by a word (optionally followed by - or . and another word, as many times as needed), followed by a . and a word (optionally followed by - or . and another word, as many times as needed). Note: \w+ means a word containing 1 or more word characters.

US Phone Number  

((\(\d{3}\) ?)|(\d{3}-))?\d{3}-\d{4}

An optional area code, written either as ( followed by 3 digits, a ), and an optional space, or as 3 digits followed by a -; then 3 digits, a -, and 4 digits.

Limit the number of characters a user can enter in a text field

^.{1,10}$

This expression limits the text field to between 1 and 10 characters.

Limit the characters entered in a user entry field to text characters only

^[a-z]{1,10}$

or

^[A-Z]{1,10}$

or

^[A-Za-z]{1,10}$

These expressions limit the characters entered in a user entry field to lowercase letters, uppercase letters, or mixed lowercase and uppercase letters, respectively. Each allows 1 to 10 characters.

Limit the characters entered in a user entry field to digits only (i.e., no text characters)

^[0-9]{1,10}$

This expression limits the characters entered in a user entry field to digits, and allows 1 to 10 of them.

Password Policy

^[a-zA-Z]\w{3,14}$

The password must start with a letter, followed by 3 to 14 word characters (letters, digits, or underscores), for a total length of 4 to 15 characters.
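Several of the expressions in the table above can be checked against sample values with a short sketch. It uses Python's re module for illustration, not the .NET engine, and the names PATTERNS and is_valid are hypothetical. Because some of the table's expressions have no ^ and $ anchors, the sketch assumes that the whole value must match and uses fullmatch to enforce that. The re.ASCII flag is also an assumption, added so that \d and \w cover only ASCII characters.

```python
import re

# Patterns copied from the examples table above.
PATTERNS = {
    "ssn": r"\d{3}-\d{2}-\d{4}",
    "zip": r"\d{5}",
    "zip_plus_4": r"\d{5}(-\d{4})",
    "max_10_chars": r"^.{1,10}$",
    "password": r"^[a-zA-Z]\w{3,14}$",
}

def is_valid(kind: str, value: str) -> bool:
    """Return True when the whole value matches the named pattern."""
    # fullmatch requires the entire value to match, so patterns without
    # ^ and $ anchors still reject extra leading or trailing characters.
    return re.fullmatch(PATTERNS[kind], value, flags=re.ASCII) is not None

samples = [
    ("ssn", "123-45-6789"),
    ("ssn", "123-456-789"),
    ("zip_plus_4", "90210-1234"),
    ("zip_plus_4", "90210"),
    ("password", "abc1"),
    ("password", "1abc"),
]

for kind, value in samples:
    print(kind, repr(value), is_valid(kind, value))
```

Testing an expression against a few values that should pass and a few that should fail, as above, is a quick way to catch mistakes before the expression is put to use.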

Further Information

The MSDN library at Microsoft provides more information about regular expressions: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconregularexpressionslanguageelements.asp