Posts Tagged ‘Regular Expressions’

Through .NET CLR Support in SQL Server –  http://www.codeproject.com/KB/database/xp_pcre.aspx

Through .NET CLR Support in SQL Server + Deploy using Visual Studio’s SQL Server Project – http://www.codeproject.com/KB/database/SqlRegularExpressions.aspx

Through .NET CLR Support in SQL Server (msdn link) – http://msdn.microsoft.com/en-us/magazine/cc163473.aspx

Using OLE Fn (SQL Server 2000) – http://www.simple-talk.com/sql/t-sql-programming/tsql-regular-expression-workbench/

Default pattern matching support in SQL Server – http://msdn.microsoft.com/en-us/library/ms187489(SQL.90).aspx

string.match – matches a string against a RE; use the g flag to get all occurrences
var m = "this is my text is this".match("is"); // m.length is 1 – without the g flag only the first match is returned
m = "this is my text is this".match(/(is)/g); // m.length is 4 – every occurrence of the substring "is", including the ones inside "this"
m = "this is my text is this".match(/[is]/g); // m.length is 8 – every individual "i" or "s" char
m = '<html><head><title>hi</title></head><body>bye</body></html>'.match(/(<[a-zA-Z]+>)|(<\/[a-zA-Z]+>)/g); // m.length is 8 – matches all html tags
               
string.replace – use the g flag to replace all occurrences
var p = '(555)666-1212'.replace(/\((\d{3})\)/g,'$1-'); //555-666-1212
// $ has special meaning when it is part of the replaceValue Ex: $1, $$, $&, $`, $'
p = '(555)666-1212'.replace(/\((\d{3})\)/g,'$$'); //$666-1212 – $$ inserts a literal $
p = '(555)666-1212'.replace(/\((\d{3})\)/g,'$&'); //(555)666-1212 – replaces with the same matched text
p = '1(555)666-1212'.replace(/\((\d{3})\)/g,'$`'); // 11666-1212 – replaces with the text preceding the match
p = '(555)666-1212'.replace(/\((\d{3})\)/g,"$'"); //666-1212666-1212 – replaces with the text following the match

Function.prototype.method = function (name, func) {
    if(!this.prototype[name])
        this.prototype[name] = func;
    //return this;
};
String.method('entityfy', function () {
    var chars = {'<': '&lt;', '>': '&gt;', '&': '&amp;', '"': '&quot;'};
    // The outer function is called immediately, so the lookup table is built only once.
    return function () {
        return this.replace(/[<>&"]/g, function (c) { return chars[c]; });
    };
}());
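
The same escaping idea can be tried without extending String.prototype. This is only a sketch: escapeHtml is a made-up name for illustration, and the table covers just the four characters listed above.

```javascript
// Hypothetical helper (not part of the notes above): escape the four
// characters from the lookup table with a single replace() call.
function escapeHtml(text) {
    var chars = {"<": "&lt;", ">": "&gt;", "&": "&amp;", '"': "&quot;"};
    // The callback receives each matched character and returns its entity.
    return text.replace(/[<>&"]/g, function (c) {
        return chars[c];
    });
}

var escaped = escapeHtml('<a href="x">Tom & Jerry</a>');
console.log(escaped);
```

Passing a function as the second argument to replace is what lets one pass over the string handle every character in the table.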

string.search
var text = "search in this text";
var pos = text.search(/(h)/); // pos = 5 – zero-based index of the first match

string.split
var words = text.split(/( )/); // words.length is 7 – the captured spaces are included in the array
var words2 = text.split(" "); // words2.length is 4 – the spaces are not included in the array

//Other interesting string functions

//string.toLocaleUpperCase() // locale specific
//string.toLocaleLowerCase() // locale specific
//string.charCodeAt() // returns 67 for C
var a = String.fromCharCode(67, 97, 116); // a = Cat

regexp.exec – the most powerful and the slowest of the RE methods. On a successful match it returns an array: element 0 is the matched substring, element 1 is the text captured by group 1, element 2 is the text captured by group 2, and so on. It returns null when there is no match.

var re = /(<[a-zA-Z]+>)|(<\/[a-zA-Z]+>)/g;
var result = re.exec("<html><head><title>hi</title></head><body>bye</body></html>"); // result.length is 3 – only the first occurrence is returned
re.lastIndex = 0; // with the g flag the call above advanced lastIndex, so rewind before looping
var a;
while((a = re.exec("<html><head><title>hi</title></head><body>bye</body></html>"))) {
    // a.length is 3 on each of the 8 iterations, one iteration per matched tag
}
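
To make the role of lastIndex in that loop a bit more visible, here is a small sketch on a shorter string. The variable names (tagRe, html, tags, hit) are made up for this example.

```javascript
// A regexp with the g flag remembers where the previous match ended
// in its lastIndex property, so repeated exec() calls walk the string.
var tagRe = /<\/?[a-zA-Z]+>/g;
var html = "<p>hi</p>";
var tags = [];
var hit;
while ((hit = tagRe.exec(html)) !== null) {
    // hit[0] is the matched text, hit.index is where it starts
    tags.push(hit[0] + "@" + hit.index);
}
console.log(tags);            // the two tags with their start positions
console.log(tagRe.lastIndex); // reset to 0 by the final, failed exec()
```

Without the g flag exec always starts from index 0, so a loop like this would never terminate on a string that matches.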

regexp.test – the simplest and fastest of the RE methods
var b = re.test("<html><head><title>hi</title></head><body>bye</body></html>"); // b is true (re has the g flag, so this call also advances re.lastIndex)
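
One thing worth trying with test: on a regexp that has the g flag, the result depends on lastIndex, so two identical calls can give different answers. A small sketch, with made-up variable names:

```javascript
// With the g flag, test() starts searching at lastIndex and moves it forward.
var globalRe = /a/g;
var first = globalRe.test("a");  // matches at index 0, lastIndex becomes 1
var second = globalRe.test("a"); // search starts at index 1, nothing left to match

// Without the g flag, every call starts from the beginning of the string.
var plainRe = /a/;
var third = plainRe.test("a") && plainRe.test("a");

console.log(first, second, third);
```

For a simple yes/no check it is usually safer to leave the g flag off.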

About Regular Expressions –

1. Used with methods to search, replace and extract info from strings. The methods that work with REs are regexp.exec, regexp.test, string.match, string.replace, string.search and string.split. The syntax follows the original formulations from Bell Labs, with extensions adopted from Perl.
2. They work best when they are short and simple, where they usually have a performance advantage over equivalent string operations.
3. They can have portability and performance problems when they are complex and nested.
4. They provide poor support for internationalization.
5. ^ – anchors the start of the text, $ – anchors the end of the text.
 An unescaped ^ matches the beginning of the text when lastIndex = 0; when the "m" flag is used it also matches just after a line-ending char
 An unescaped $ matches the end of the text; when the "m" flag is used it also matches just before a line-ending char
6. . – matches any one character except a line-ending char (such as line feed or carriage return)
7. \d – digit character ([0-9]), \D – non digit ([^0-9]), \f – formfeed char, \n – newline char, \r – carriage return char, \t – tab, \u – unicode char (16 bit constant written as 4 hex digits), \b – word boundary (it means backspace only inside a char class, as [\b]), \s – partial set of unicode whitespace chars, \S – opposite of \s, \w – [0-9A-Z_a-z], \W – [^0-9A-Z_a-z]
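
A few of the points above (the anchors, the dot and \b) are easy to check in a console. This is just a sketch with made-up strings and variable names:

```javascript
var lines = "one\ntwo\nthree";

// Without the m flag, ^ and $ only anchor the whole text, so nothing matches.
var withoutM = lines.match(/^\w+$/g);

// With the m flag, ^ and $ also match at every line break.
var withM = lines.match(/^\w+$/gm);

// The dot stops at a line-ending char.
var dotStops = "ab\ncd".match(/.+/)[0];

// \b outside a char class is a word boundary: "cat" inside "concat" is skipped.
var boundary = "cat concat".match(/\bcat\b/g);

console.log(withoutM, withM, dotStops, boundary);
```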
RE Flags –
1.  i – ignore case Ex: /^ $/i
2.  g – global (match multiple times) Ex: /^ $/gi
3.  m – multiline (means ^ and $ can also match at line-ending chars) Ex: /^ $/mi, /^ $/mg
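
The effect of the i and g flags can be seen with a short test string. A sketch only; the string and variable names are made up:

```javascript
var flagsRe = /is/gi;
// Each flag shows up as a boolean property on the RegExp object.
var flagInfo = [flagsRe.global, flagsRe.ignoreCase, flagsRe.multiline];

var caseSensitive = "This IS".match(/is/g); // g only: the upper-case "IS" is skipped
var ignoreCase = "This IS".match(/is/gi);   // g and i: both occurrences
var firstOnly = "This IS".match(/is/i);     // no g: stops at the first match

console.log(flagInfo, caseSensitive, ignoreCase, firstOnly.index);
```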
Create RE – 2 ways to create a RegExp object –
 1. RegExp literal (preferred way) – enclosed in slashes Ex: var re = /^\d+$/g; – matches a string made up only of digits
 2. RegExp constructor – useful when the RE needs to be generated at runtime based on conditions Ex: var re = new RegExp("\\d+", 'g'); – backslashes must be doubled inside the string. Properties of a RegExp object: global, ignoreCase, lastIndex, multiline, source. A function can also return a RegExp object made from an RE literal Ex: function mymatch() {return /\d/gi;}
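
The backslash rule for the constructor is easy to get wrong, so here is a small sketch comparing the two spellings. Variable names are made up for the example.

```javascript
// Inside a string literal "\d" is just the letter "d",
// so this pattern matches one or more "d" chars, not digits.
var wrong = new RegExp("\d+", "g");

// Doubling the backslash passes a real \d to the regexp engine.
var right = new RegExp("\\d+", "g");

var sample = "a1b22c333";
var wrongResult = sample.match(wrong); // no "d" in the sample
var rightResult = sample.match(right); // the three runs of digits

console.log(wrong.source, right.source, wrongResult, rightResult);
```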

Regexp choice – | Ex: "into".match(/in|int/); matches "in", because the alternatives are tried in order and the first one that succeeds wins
Regexp Sequence – contains one or more RE factors. Each factor can optionally be followed by a quantifier that determines how many times the factor is allowed to appear. With no quantifier the factor is matched once.
Regexp factor – can be a char, parenthesized group, char class or escape seq
Special chars – these must be escaped with a \ prefix to be matched literally: ( \/[](){}?+*|.^$ )
Regexp Escape – used for escaping special chars like \-, \/ when they are used as part of an RE literal
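
Two of these points, the order of alternatives in a choice and the need to escape special chars, can be checked quickly. A sketch with made-up variable names:

```javascript
// Alternatives are tried left to right; the first one that succeeds wins.
var firstWins = "into".match(/in|int/)[0];
var longerFirst = "into".match(/int|in/)[0];

// An escaped dot matches only a literal ".", an unescaped dot matches any char.
var escapedDot = /^\d+\.\d+$/;
var bareDot = /^\d+.\d+$/;
var results = [
    escapedDot.test("3.14"),
    escapedDot.test("3x14"),
    bareDot.test("3x14")
];

console.log(firstWins, longerFirst, results);
```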
Regexp Groups – 4 types
 1. Capturing group – an RE choice wrapped in (). Chars that match the group are captured and numbered in the returned array, like ret[1] for the first group
 2. Non-capturing group – an RE choice prefixed with (?: that simply matches but does not capture anything as part of the return value. It is slightly faster and does not interfere with the numbering of capturing groups Ex: (?:[a-z]+)
 3. Positive lookahead – an RE choice prefixed with (?= that works like a non-capturing group, except that after the group matches, the text is rewound to where the group started, so nothing is consumed. Not a good part.
 4. Negative lookahead – an RE choice prefixed with (?! that works like a positive lookahead, except that it matches only if the group fails to match. Not a good part.
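
The four group types side by side, as a rough sketch. The sample strings and variable names are made up for illustration.

```javascript
// Capturing groups: each () adds a numbered element to the result.
var capture = "2024-05".match(/(\d{4})-(\d{2})/);

// Non-capturing group: (?: ) matches but adds nothing to the result.
var nonCapture = "2024-05".match(/(?:\d{4})-(\d{2})/);

// Positive lookahead: "USD" must follow, but it is not part of the match.
var lookahead = "price: 42USD".match(/\d+(?=USD)/)[0];

// Negative lookahead: a letter followed by a digit that is not 2.
var negative = "a1 b2 c3".match(/[a-z](?!2)\d/g);

console.log(capture.length, nonCapture.length, lookahead, negative);
```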
Regexp Class – A convenient way of specifying one of a set of chars.
1. To match a vowel simply write [aeiou] instead of (?:a|e|i|o|u)
2. To match the 32 ASCII special chars using ranges – [!-\/:-@\[-`\{-~]  range1 (!-/) range2 (:-@) range3 ([-`) range4 ({-~), with \ used for escaping
3. To match all chars other than the 32 special chars – [^!-\/:-@\[-`\{-~]
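
A quick sanity check that the four ranges really cover 32 ASCII chars. This is a sketch with made-up variable names; the { is left unescaped here, which is also valid inside a char class.

```javascript
var special = /[!-\/:-@\[-`{-~]/;

// Count how many of the 128 ASCII codes fall into the class.
var count = 0;
for (var code = 0; code < 128; code += 1) {
    if (special.test(String.fromCharCode(code))) {
        count += 1;
    }
}

// Pick the special chars out of a string, then everything else with [^...].
var found = "a+b=c; ok?".match(/[!-\/:-@\[-`{-~]/g);
var others = "a+b".match(/[^!-\/:-@\[-`{-~]/g);

console.log(count, found, others);
```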
Regexp Quantifier – suffix that determines how many times the factor should match.
 1. {0,3} – the factor should match 0, 1, 2 or 3 times; {3} – it should match exactly 3 times Ex: /mmm/ is equivalent to /m{3}/
 2. ? – is same as {0,1} – 0 or 1 time
 3. * – is same as {0,} – 0 or more times 
 4. + – is same as {1,} – 1 or more times
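
The quantifiers above can be compared on a few tiny inputs. A sketch only, with made-up variable names:

```javascript
var exact = /^m{3}$/.test("mmm");           // exactly three m chars
var ranged = "aaaa".match(/a{0,3}/)[0];     // greedy, but capped at three
var optional = /^colou?r$/.test("color") && /^colou?r$/.test("colour"); // u is optional
var star = "b".match(/a*/)[0];              // zero a chars still counts as a match
var plus = "b".match(/a+/);                 // at least one a is required

console.log(exact, ranged, optional, JSON.stringify(star), plus);
```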

http://www.evolt.org/article/Regular_Expressions_in_JavaScript/17/36435/