Tutorials, Tips and Technology: PHP - Regular Expressions: simple functions and common regular expressions

Regular Expressions

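We will start with a simple function that returns TRUE or FALSE depending on whether the text matches.
$regex will serve as our regular expression to match against (a PCRE pattern including its delimiters, e.g. '/abc/i') and $text will be our text.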


function do_reg_test($text, $regex)
{
    // preg_match() returns 1 on a match, 0 on no match and FALSE on error,
    // so only a real match yields TRUE
    return preg_match($regex, $text) === 1;
}
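A quick usage sketch, assuming PHP 7 or later. is_match() is a hypothetical stand-in for the function above; the point is that $regex must carry its delimiters (here /.../) and any modifiers, such as i for case-insensitive matching.

```php
<?php
// Hypothetical helper equivalent to the TRUE/FALSE function above
function is_match(string $text, string $regex): bool
{
    // preg_match() returns 1 for a match, 0 for no match, FALSE on error
    return preg_match($regex, $text) === 1;
}

var_dump(is_match('Colour', '/colou?r/'));   // bool(false): "C" is uppercase
var_dump(is_match('Colour', '/colou?r/i'));  // bool(true): i ignores case
var_dump(is_match('12345', '/^[0-9]+$/'));   // bool(true): digits only
var_dump(is_match('123a5', '/^[0-9]+$/'));   // bool(false)
```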

The next function returns the part of a given string ($text) matched
by the regex ($regex), storing the match and its groups in $regs. By changing
$regs[0] to $regs[1] we return capturing group 1 instead of the whole match.
A capturing group can also be named, e.g. (?P<groupname>...), and is then
available as $regs['groupname']:


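function do_reg_get($text, $regex, &$regs = null)
{
    // $regs is passed by reference so the caller can read the groups too
    if (preg_match($regex, $text, $regs)) {
        $result = $regs[0]; // whole match; use $regs[1] for group 1
    } else {
        $result = "";
    }

    return $result;
}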
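To illustrate the named-group variant, here is a minimal sketch assuming PHP 7 or later; the date pattern and the group names year, month and day are made up for the example. preg_match() stores a named group in $regs both under its name and under its number.

```php
<?php
$text  = 'Released on 2009-03-17.';
// (?P<name>...) creates a named capturing group
$regex = '/(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})/';

if (preg_match($regex, $text, $regs)) {
    echo $regs[0], "\n";       // whole match: 2009-03-17
    echo $regs['year'], "\n";  // 2009
    echo $regs[1], "\n";       // same group by number: 2009
    echo $regs['day'], "\n";   // 17
}
```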


The following function will return an array of all regex
 matches in a given string ($text):

function do_reg_all($text, $regex)
{
    // With PREG_PATTERN_ORDER, $result[0] is an array of all full matches
    preg_match_all($regex, $text, $result, PREG_PATTERN_ORDER);

    return $result[0];
}
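For example (a sketch assuming PHP 7 or later, with all_matches() as a hypothetical stand-in for the function above), collecting every run of digits in a phone-number string:

```php
<?php
// Hypothetical helper equivalent to the "all matches" function above
function all_matches(string $text, string $regex): array
{
    // With PREG_PATTERN_ORDER, $result[0] lists every full match in order
    preg_match_all($regex, $text, $result, PREG_PATTERN_ORDER);
    return $result[0];
}

echo implode(', ', all_matches('Call 555-1234 or 555-9876', '/[0-9]+/')), "\n";
// 555, 1234, 555, 9876
```

When nothing matches, $result[0] is simply an empty array.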

 Next we can iterate (loop) over all matches in a string ($text)
and output the results:


function do_reg_loop($text, $regex)
{
    preg_match_all($regex, $text, $result, PREG_PATTERN_ORDER);

    // $result[0] holds the full matches; print one per line
    for ($i = 0; $i < count($result[0]); $i++) {
        echo $result[0][$i], "\n";
    }
}

Extending the function above, we can iterate over all matches in a string ($text)
and over the capturing groups of each match:


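function do_reg_loop_groups($text, $regex)
{
    // PREG_SET_ORDER: $result[$matchi] is one match, with its groups as elements
    preg_match_all($regex, $text, $result, PREG_SET_ORDER);

    for ($matchi = 0; $matchi < count($result); $matchi++) {
        for ($backrefi = 0; $backrefi < count($result[$matchi]); $backrefi++) {
            // $backrefi 0 is the full match, 1 and up are the capturing groups
            echo $matchi, ".", $backrefi, ": ", $result[$matchi][$backrefi], "\n";
        }
    }
}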
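To see how the two flags lay out the result array, here is a small sketch assuming PHP 7 or later (the key=value input is invented). With PREG_PATTERN_ORDER the first index is the group number; with PREG_SET_ORDER it is the match number. foreach is used here instead of the index counters.

```php
<?php
$text  = 'a=1, b=22, c=333';
$regex = '/(\w)=(\d+)/';

preg_match_all($regex, $text, $byGroup, PREG_PATTERN_ORDER);
// $byGroup[1] holds group 1 of every match
echo implode(',', $byGroup[1]), "\n";   // a,b,c

preg_match_all($regex, $text, $byMatch, PREG_SET_ORDER);
// Each element of $byMatch is one match: [full match, group 1, group 2]
foreach ($byMatch as $i => $match) {
    echo "match $i: key={$match[1]} value={$match[2]}\n";
}
// match 0: key=a value=1
// match 1: key=b value=22
// match 2: key=c value=333
```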

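Now let's look at some useful regular expressions. Most of them are written without PHP delimiters, so wrap each one in delimiters that do not occur in the pattern (for example '#...#', or '{...}' for the URL patterns) before passing it to the preg_ functions. Patterns written with [A-Z] also need the i (case-insensitive) modifier to match lowercase letters.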
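As an illustration of using the patterns in the list below from PHP, here is a minimal sketch assuming PHP 7 or later: the US ZIP code pattern from the Addresses list wrapped in / delimiters, plus the state code pattern applied with preg_match_all(). The sample address is only for illustration.

```php
<?php
// ZIP code and state code patterns from the Addresses list, in / delimiters
$zip   = '/\b[0-9]{5}(?:-[0-9]{4})?\b/';
$state = '/\b(?:A[KLRZ]|C[AOT]|D[CE]|FL|GA|HI|I[ADLN]|K[SY]|LA|M[ADEINOST]|N[CDEHJMVY]|O[HKR]|PA|RI|S[CD]|T[NX]|UT|V[AT]|W[AIVY])\b/';

$text = '1600 Pennsylvania Ave NW, Washington, DC 20500-0003';

// First ZIP code (with optional +4 part) and every state code
preg_match($zip, $text, $zipMatch);
preg_match_all($state, $text, $stateMatches);

echo $zipMatch[0], "\n";                   // 20500-0003
echo implode(',', $stateMatches[0]), "\n"; // DC
```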

Addresses

//Address: State code (US)
'/\\b(?:A[KLRZ]|C[AOT]|D[CE]|FL|GA|HI|I[ADLN]|K[SY]|LA|M[ADEINOST]|N[CDEHJMVY]|O[HKR]|PA|RI|S[CD]|T[NX]|UT|V[AT]|W[AIVY])\\b/'

//Address: ZIP code (US)
'\b[0-9]{5}(?:-[0-9]{4})?\b'

Dates

//Date d/m/yy and dd/mm/yyyy
//1/1/00 through 31/12/99 and 01/01/1900 through 31/12/2099
//Matches invalid dates such as February 31st
'\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20)?[0-9]{2}\b'

//Date dd/mm/yyyy
//01/01/1900 through 31/12/2099
//Matches invalid dates such as February 31st
'(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)[0-9]{2}'

//Date m/d/y and mm/dd/yyyy
//1/1/99 through 12/31/99 and 01/01/1900 through 12/31/2099
//Matches invalid dates such as February 31st
//Accepts dashes, spaces, forward slashes and dots as date separators
'\b(0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])[- /.](19|20)?[0-9]{2}\b'

//Date mm/dd/yyyy
//01/01/1900 through 12/31/2099
//Matches invalid dates such as February 31st
'(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9]{2}'

//Date yy-m-d or yyyy-mm-dd
//00-1-1 through 99-12-31 and 1900-01-01 through 2099-12-31
//Matches invalid dates such as February 31st
'\b(19|20)?[0-9]{2}[- /.](0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])\b'

//Date yyyy-mm-dd
//1900-01-01 through 2099-12-31
//Matches invalid dates such as February 31st
'(19|20)[0-9]{2}[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])'
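Because every date pattern above accepts impossible dates such as February 31st, one option is to let the regex check the format and PHP's checkdate() check the calendar. A minimal sketch assuming PHP 7 or later: is_valid_ymd() is a hypothetical helper, and the yyyy-mm-dd pattern is adapted by anchoring it with ^ and $, wrapping it in # delimiters and turning (19|20)[0-9]{2} into a single year group.

```php
<?php
// Hypothetical helper: format check via regex, calendar check via checkdate()
function is_valid_ymd(string $text): bool
{
    // yyyy-mm-dd pattern from above, anchored and wrapped in # delimiters
    $regex = '#^((?:19|20)[0-9]{2})[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$#';
    if (!preg_match($regex, $text, $m)) {
        return false;
    }
    // checkdate(month, day, year) rejects e.g. February 31st
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]);
}

var_dump(is_valid_ymd('2009-03-17')); // bool(true)
var_dump(is_valid_ymd('2009-02-31')); // bool(false): no such day
var_dump(is_valid_ymd('2009-13-01')); // bool(false): regex rejects month 13
```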

Email address

//Email address
//Use this version to seek out email addresses in random documents and texts.
//Does not match email addresses using an IP address instead of a domain name.
//Does not match email addresses on new-fangled top-level domains with more than 4 letters such as .museum.
//Including these increases the risk of false positives when applying the regex to random documents.
'\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'

//Email address (anchored)
//Use this anchored version to check if a valid email address was entered.
//Does not match email addresses using an IP address instead of a domain name.
//Does not match email addresses on new-fangled top-level domains with more than 4 letters such as .museum.
//Requires the "case insensitive" option to be ON.
'^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$'

//Email address (anchored; no consecutive dots)
//Use this anchored version to check if a valid email address was entered.
//Improves on the original email address regex by excluding addresses with consecutive dots in the domain name such as john@aol...com
//Does not match email addresses using an IP address instead of a domain name.
//Does not match email addresses on new-fangled top-level domains with more than 4 letters such as .museum.
//Including these increases the risk of false positives when applying the regex to random documents.
'^[A-Z0-9._%-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}$'

//Email address (no consecutive dots)
//Use this version to seek out email addresses in random documents and texts.
//Improves on the original email address regex by excluding addresses with consecutive dots in the domain name such as john@aol...com
//Does not match email addresses using an IP address instead of a domain name.
//Does not match email addresses on new-fangled top-level domains with more than 4 letters such as .museum.
//Including these increases the risk of false positives when applying the regex to random documents.
'\b[A-Z0-9._%-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b'

//Email address (specific TLDs)
//Does not match email addresses using an IP address instead of a domain name.
//Matches any two-letter (country code) top level domain, and specific common top level domains.
'^[A-Z0-9._%-]+@[A-Z0-9.-]+\.(?:[A-Z]{2}|com|org|net|biz|info|name|aero|jobs|museum)$'

//Email address: Replace with HTML link
'\b(?:mailto:)?([A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b'
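The last email pattern only finds the addresses; the replacement text is up to you. A minimal sketch, assuming PHP 7 or later, that turns each address into a mailto: link by reusing capturing group 1 as $1 (so any "mailto:" prefix in the text is dropped from the link text). The pattern needs the i modifier because it is written with [A-Z]; the sample text is invented.

```php
<?php
// Email pattern from above, with / delimiters and the i (case-insensitive) modifier
$email = '/\b(?:mailto:)?([A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b/i';

$text = 'Write to john.doe@example.com or mailto:jane@example.org today.';
// $1 is capturing group 1: the address without any "mailto:" prefix
$html = preg_replace($email, '<a href="mailto:$1">$1</a>', $text);
echo $html, "\n";
```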

URLs

//URL: Different URL parts
//Protocol, domain name, page and CGI parameters are captured into backreferences 1 through 4
'\b((?#protocol)https?|ftp)://((?#domain)[-A-Z0-9.]+)((?#file)/[-A-Z0-9+&@#/%=~_|!:,.;]*)?((?#parameters)\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?'

//URL: Different URL parts
//Protocol, domain name, page and CGI parameters are captured into named capturing groups.
//Works as is with .NET, and with PHP/preg from PHP 5.2.2 on; older PHP versions need the (?P<name>...) syntax instead.
'\b(?<protocol>https?|ftp)://(?<domain>[-A-Z0-9.]+)(?<file>/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(?<parameters>\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?'
//URL: Find in full text
//The final character class makes sure that if a URL is part of some text, punctuation such as a
//comma or full stop after the URL is not interpreted as part of the URL.
'\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]'
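//URL: Replace URLs with HTML links (the pattern needs delimiters, here / with the inner slashes escaped, and the i modifier for lowercase URLs)
$text = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i', '<a href="\0">\0</a>', $text);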
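A minimal sketch of the named-group URL pattern in PHP, assuming PHP 7 or later. The pattern is wrapped in { } delimiters, since / and # already occur inside it, and given the i modifier; the sample URL is invented.

```php
<?php
// Named-group URL pattern from above, in { } delimiters with the i modifier
$regex = '{\b(?<protocol>https?|ftp)://(?<domain>[-A-Z0-9.]+)'
       . '(?<file>/[-A-Z0-9+&@#/%=~_|!:,.;]*)?'
       . '(?<parameters>\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?}i';

$text = 'See http://www.example.com/docs/index.php?page=2&lang=en for details.';

if (preg_match($regex, $text, $url)) {
    echo $url['protocol'], "\n";   // http
    echo $url['domain'], "\n";     // www.example.com
    echo $url['file'], "\n";       // /docs/index.php
    echo $url['parameters'], "\n"; // ?page=2&lang=en
}
```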

Tuesday, March 17, 2009
