Regular Expressions

Posted: June 4, 2010 in Regular Expressions

About Regular Expressions –

1. Used with methods to search, replace and extract information from strings. The methods that work with REs are regexp.exec, regexp.test, string.match, string.replace, string.search and string.split. The syntax follows the original Bell Labs formulations, with extensions adopted from Perl.
2. REs work best when they are short and simple, and they usually have a performance advantage over equivalent string operations.
3. They tend to have portability and performance problems when they are complex and nested.
4. They provide poor support for internationalization.
5. ^ – anchors the start of the text, $ – anchors the end of the text.
 An unescaped ^ matches the beginning of the text when lastIndex = 0; with the “m” flag it can also match just after a line-ending char.
 An unescaped $ matches the end of the text; with the “m” flag it can also match just before a line-ending char.
6. . – matches any one character except a line-ending char (\n, \r, \u2028 or \u2029)
7. \d – digit character ([0-9]), \D – non-digit ([^0-9]), \f – formfeed char, \n – newline char, \r – carriage return char, \t – tab, \uXXXX – Unicode char (16-bit hex constant), \b – word boundary (it means backspace only inside a character class, as in [\b]), \s – partial set of Unicode whitespace chars, \S – opposite of \s, \w – [0-9A-Z_a-z], \W – [^0-9A-Z_a-z]
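
The methods listed in point 1 can be tried out with a short sketch like the one below. The sample text and variable names are made up for illustration; the expected values in the comments follow from that sample text only.

```javascript
// Sample text; the string and the variable names are illustrative only.
const text = "Order 66 shipped on 2010-06-04";

// regexp.test returns true or false.
const hasDigit = /\d/.test(text);          // true

// regexp.exec returns a match array (or null) that carries an index property.
const firstNumber = /\d+/.exec(text);      // firstNumber[0] is "66", firstNumber.index is 6

// string.match with the g flag returns every match as an array of strings.
const allNumbers = text.match(/\d+/g);     // ["66", "2010", "06", "04"]

// string.replace with the g flag replaces every match.
const masked = text.replace(/\d/g, "#");   // "Order ## shipped on ####-##-##"

// string.split accepts a regexp as the separator.
const words = text.split(/\s+/);           // ["Order", "66", "shipped", "on", "2010-06-04"]

// Inside a pattern, \b is a word boundary, so this finds "on" only as a whole word.
const wholeWord = /\bon\b/.test(text);     // true
```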
RE Flags –
1.  i – ignore case Ex: /^abc$/i
2.  g – global (match multiple times) Ex: /^abc$/gi
3.  m – multiline (^ and $ can also match at line-ending chars, not only at the start and end of the text) Ex: /^abc$/mi, /^abc$/mg
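
A small sketch of how the three flags change matching. The sample string and variable names are assumptions chosen for the illustration.

```javascript
// Three lines of sample text (illustrative only).
const lines = "first line\nSecond line\nthird LINE";

// i: case is ignored.
const caseSensitive = /^FIRST/.test(lines);    // false
const caseInsensitive = /^FIRST/i.test(lines); // true

// m: ^ and $ also match at line breaks, not only at the ends of the text.
const withoutM = lines.match(/^\w+/g);   // ["first"]
const withM = lines.match(/^\w+/gm);     // ["first", "Second", "third"]
const lineEnds = lines.match(/\w+$/gm);  // ["line", "line", "LINE"]

// g: exec resumes from lastIndex, so a loop can visit every match.
const re = /line/gi;
const positions = [];
let m;
while ((m = re.exec(lines)) !== null) {
  positions.push(m.index);
}
// positions is [6, 18, 29]; lastIndex is reset to 0 once exec returns null.
```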
Create RE – 2 ways to create a RegExp object –
 1. RegExp literal (preferred way) – enclosed in slashes Ex: var re = /^\d+$/g; – matches a string made up only of digits
 2. RegExp constructor – useful when the RE needs to be generated at runtime based on conditions Ex: var re = new RegExp("\\d+", "g"); – the backslash must be doubled inside a string literal, otherwise "\d" is just "d"
 Properties of a RegExp object – global, ignoreCase, lastIndex, multiline, source.
 A function can also return a RegExp object made from an RE literal Ex: function mymatch() {return /\d/gi;}
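
The two ways of creating a RegExp, and the properties listed above, can be compared with a sketch like this. makeMatcher is a hypothetical helper written for this example, and it assumes the word passed in contains no special chars.

```javascript
// Literal form: the pattern is written between slashes.
const literal = /^\d+$/g;

// Constructor form: the pattern is a string, so each backslash is doubled.
const built = new RegExp("^\\d+$", "g");
const sameSource = literal.source === built.source; // true

// Hypothetical helper: builds a whole-word matcher at runtime.
// Assumes `word` contains no RE special chars.
function makeMatcher(word, ignoreCase) {
  return new RegExp("\\b" + word + "\\b", ignoreCase ? "gi" : "g");
}

const matcher = makeMatcher("regexp", true);

// The properties of the RegExp object.
const props = {
  global: matcher.global,         // true
  ignoreCase: matcher.ignoreCase, // true
  multiline: matcher.multiline,   // false
  source: matcher.source          // the pattern text, \bregexp\b
};

// lastIndex moves forward after each successful match when the g flag is set.
const firstTest = matcher.test("RegExp and regexp"); // true
const afterFirst = matcher.lastIndex;                // 6
```

Because lastIndex is kept on the object, reusing one global RegExp across different strings can give surprising results; resetting lastIndex to 0 or creating a fresh object avoids that.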

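Regexp choice – | separates alternatives, which are tried from left to right Ex: "into".match(/in|int/); matches "in" and never gets to try "int".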
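
A quick sketch of why the order of alternatives matters: the first alternative that matches wins, so the longer one has to come first if it is the one wanted.

```javascript
// "in" is tried first and succeeds, so "int" is never tried.
const shortFirst = "into".match(/in|int/);  // shortFirst[0] is "in"

// Putting the longer alternative first lets it match.
const longFirst = "into".match(/int|in/);   // longFirst[0] is "int"
```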
Regexp Sequence – contains one or more RE factors. Each factor can optionally be followed by a quantifier that determines how many times the factor is allowed to appear. With no quantifier, the factor is matched exactly once.
Regexp factor – can be a char, parenthesized group, char class or escape seq
Special chars – these must be escaped with a \ prefix to be matched literally: \ / [ ] ( ) { } ? + * | . ^ $
Regexp Escape – a \ prefix that turns a special char into a literal one, like \- or \/ when used as part of an RE literal
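
When an RE is built at runtime from arbitrary text, the special chars listed above need escaping first. Below is a minimal sketch; escapeRegExp is a hypothetical helper written for this example, not a built-in.

```javascript
// Hypothetical helper: prefixes every special char with a backslash.
// In the replacement string, $& stands for the matched char.
function escapeRegExp(text) {
  return text.replace(/[\\\/\[\](){}?+*|.^$-]/g, "\\$&");
}

const raw = "1+1=2?";
const escaped = escapeRegExp(raw);  // the 8 chars 1\+1=2\?

// Unescaped, + and ? act as quantifiers, so the literal text is not found.
const unescapedHit = new RegExp(raw).test("is 1+1=2? yes");   // false
const escapedHit = new RegExp(escaped).test("is 1+1=2? yes"); // true

// In an RE literal, the slash itself must be escaped.
const fraction = /\d+\/\d+/.test("3/4");  // true
```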
Regexp Groups – 4 types
 1. Capturing group – an RE choice wrapped in ( ). The chars that match the group are captured and numbered in the returned array, e.g. ret[1] holds the first group
 2. Non-capturing group – an RE choice prefixed with (?: that simply matches but is not captured as part of the return value. It is slightly faster and does not interfere with the numbering of capturing groups Ex: (?:[a-z]+)
 3. Positive lookahead – an RE choice prefixed with (?= that works like a non-capturing group, except that after the group matches, the text is rewound to where the group started, so nothing is consumed. Not a good part.
 4. Negative lookahead – an RE choice prefixed with (?! that works like a positive lookahead group, except that it matches only if the group fails to match. Not a good part.
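
The four group types can be compared side by side in a sketch like this. The sample strings (including the example.com address) are placeholders chosen for the illustration.

```javascript
// 1. Capturing groups: each ( ) fills the next numbered slot of the result.
const parts = /^(\d{4})-(\d{2})-(\d{2})$/.exec("2010-06-04");
// parts[0] is "2010-06-04", parts[1] is "2010", parts[2] is "06", parts[3] is "04"

// 2. Non-capturing group: (?:https?) matches but takes no slot,
//    so the host name lands in slot 1.
const url = /^(?:https?):\/\/([^\/]+)/.exec("http://example.com/page");
// url[1] is "example.com"

// 3. Positive lookahead: digits only when followed by "px"; "px" is not consumed.
const pxValues = "10px 20em 30px".match(/\d+(?=px)/g);     // ["10", "30"]

// 4. Negative lookahead: digits NOT followed by "px".
//    The extra |\d stops a partial match such as the "1" of "10px".
const nonPx = "10px 20em 30px".match(/\d+(?!px|\d)/g);     // ["20"]
```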
Regexp Class – A convenient way of specifying one of a set of chars.
1. To match a vowel, simply write [aeiou] instead of (?:a|e|i|o|u)
2. To match the 32 ASCII special chars using ranges – [!-\/:-@\[-`\{-~]  range1 (!-/) range2 (:-@) range3 ([-`) range4 ({-~), with \ used for escaping
3. To match all chars other than the 32 special chars, negate the class with a leading ^ – [^!-\/:-@\[-`\{-~]
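
A short sketch that exercises the three classes above. The count of 32 is checked against the printable ASCII range (codes 33 to 126); the sample strings are illustrative.

```javascript
// 1. Vowel class.
const vowels = "regular expression".match(/[aeiou]/g).join("");  // "euaeeio"

// 2. The four ranges that cover the ASCII special chars.
const special = /[!-\/:-@\[-`\{-~]/g;
const found = "Hello, World! (2010)".match(special);  // [",", "!", "(", ")"]

// Count how many printable ASCII chars (codes 33..126) the class matches.
let printable = "";
for (let code = 33; code <= 126; code++) {
  printable += String.fromCharCode(code);
}
const specialCount = printable.match(special).length;  // 32

// 3. The negated class removes everything that is not a special char.
const onlySpecial = "a+b=c".replace(/[^!-\/:-@\[-`\{-~]/g, "");  // "+="
```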
Regexp Quantifier – suffix that determines how many times the factor should match.
 1. {0,3} – the factor should match 0, 1, 2 or 3 times; {3} – it should match exactly 3 times Ex: /m{3}/ is equivalent to /mmm/
 2. ? – is same as {0,1} – 0 or 1 time
 3. * – is same as {0,} – 0 or more times
 4. + – is same as {1,} – 1 or more times
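
The quantifiers above can be checked with a small table of cases. The patterns are anchored with ^ and $ so that the whole input has to fit; the inputs are made up for the illustration.

```javascript
// Each entry: [pattern, input, expected result of pattern.test(input)].
const cases = [
  [/^m{3}$/, "mmm", true],        // exactly 3
  [/^m{3}$/, "mm", false],
  [/^\d{0,3}$/, "", true],        // 0 to 3 digits
  [/^\d{0,3}$/, "123", true],
  [/^\d{0,3}$/, "1234", false],
  [/^colou?r$/, "color", true],   // ? is 0 or 1
  [/^colou?r$/, "colour", true],
  [/^ab*$/, "a", true],           // * is 0 or more
  [/^ab*$/, "abbb", true],
  [/^ab+$/, "a", false],          // + is 1 or more
  [/^ab+$/, "abbb", true]
];

// Run every case and collect the ones whose result differs from the expectation.
const failures = cases.filter(([re, input, expected]) => re.test(input) !== expected);
console.log(failures.length === 0 ? "all cases behave as described" : failures);
```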

