Tuesday, March 31, 2009

Convert your PDFs to MS Word

There are several well-regarded, free ways to take advantage of the Print function to transform just about any file into a PDF. PrimoPDF and doPDF sit at the top of the list, but what about going the other way? Converting from a PDF to a Microsoft Word-compatible format such as DOC or RTF is trickier.

For one thing, there's a lot of crap out there. Many PDF-to-DOC converters have similar or even identical names, differentiated sometimes by nothing more than a cunning tap of the space bar. Many offer features that are hamstrung in various ways unless you pay for an upgrade, and just about all of them offer imperfect conversions. Even with these problems, though, you can get a reasonable conversion from the four programs and three Web-based services listed below.

Sometimes right: Some PDF to Word Converter 1.5

(Credit: Screenshot by Seth Rosenblatt/CNET)

Some PDF to Word Converter 1.5: A basic but uncluttered interface presents all of the program's conversion options in a sidebar on the right. The program handles batch conversions, converts outer fonts into text and embedded fonts into images, and can convert either all pages or a specific page range. It can remove graphics from the output document on demand, always saves the output as RTF, and supports encryption.

The program suffers from two big drawbacks: conversions aren't always clean, with occasional overlapping images and text, and there's noticeable image deterioration. There's no drag-and-drop for adding PDFs, although you can add an entire folder via the folder icon. Some of the option descriptions could also be clearer: "delete all graphics" with a choice of "false" or "true" would be better phrased as "remove all graphics" with "yes" or "no."

Free PDF to Word Doc Converter 1.1 makes you jump through hoops for a great end result.

(Credit: Screenshot by Seth Rosenblatt/CNET)

Free PDF to Word Doc Converter 1.1 also gets a few things wrong, but eventually lands you the DOC output you want. Impressively, it produced one of the cleanest and most accurate PDF-to-DOC conversions of the free programs tested. You can change the output path and name, convert an entire document or just specific pages, and drop the source PDF's images if need be. The final output was a pitch-perfect conversion.

From there, it goes a bit downhill. The option to open the output DOC in WordPad didn't function when we tested it, nor did the All Pages button. You can work around this by choosing Page Number instead of All Pages (Page Number defaults to the full page count), but it's still irritating. The other big frustration is that while the program is free, after five conversions you're asked to answer a math question a bit harder than the average Captcha. Batch conversion and encryption support are missing, too. If Free PDF to Word Doc Converter's bumpy ride didn't result in such a smooth landing, it wouldn't be worth touching.

Free PDF to Word Converter 1.3: common name, common problems.

(Credit: Screenshot by Seth Rosenblatt/CNET)

Free PDF to Word Converter 1.3 shares a lot with its competitors besides a similar name, including an imperfect balance between useful features and perfect output. What's wrong: there's no drag-and-drop support, and you should be prepared for some minor but annoying formatting errors, with occasional overlapping words and images.

What's right: The program can batch convert PDFs without being restricted to specific folders, lets you change your output destination, and outputs either RTF or DOC. There's support for high levels of security, with fields for both the PDF owner password and the PDF user password. You can remove graphics on conversion and micro-manage the spacing between words and text boxes. It's fine for image-free PDFs, but somewhat less than exact with others.

Advanced PDF to Word Converter Free 5.0: nearly perfect output.

(Credit: Screenshot by Seth Rosenblatt/CNET)

Advanced PDF to Word Converter Free 5.0 nearly crashes and burns on takeoff, but surprisingly leaves you with a nearly perfect document. It asks you to reboot after installation, which makes sense for programs with deep hooks into your operating system but is a clarion warning for such a simple converter. However, as with all these converters, it's the final output that counts most, and this program's DOC output looks pretty good.

There's no support for encrypted PDFs, and when you convert or close the program, a nag screen urges you to upgrade. Batch conversion, RTF and TXT output, and image deletion are restricted to the paid upgrade. The program does let you add files by drag-and-drop or an entire folder at once, choose the output folder, and rename the output file. OpenOffice.org users will appreciate that this converter produces a DOC whose images OpenOffice Writer can read cleanly, unlike many of the others. The biggest problem is that there is still some image quality degradation and some minor text alignment problems.

OpenOffice users should note that during these tests, I found that OpenOffice Writer doesn't always play nicely with Rich Text Format (RTF) files. There are several free online conversion options as well, and they tend to produce better output than the desktop programs.

You can send your PDF as an e-mail attachment to Adobe, and within a few minutes you'll get back either a plain text (TXT) or an HTML file. The service is basic but extremely fast. As long as you don't mind the lack of frills, e-mail pdf2txt@adobe.com for plain text conversion or pdf2html@adobe.com for HTML output, although the HTML service wasn't working when I tested it.

PDF to Word, from Nitro PDF.

(Credit: Screenshot by Seth Rosenblatt/CNET)

Nitro PDF, the makers of PrimoPDF, offer a glossy interface for their online PDF-to-Word converter. Clearly delineated instructions guide you through uploading your PDF, choosing an output format (DOC or RTF) and entering your e-mail address. The conversion took a bit longer than Adobe's, but it's worth the wait: the output is a perfect document, and the process is free of the hassles that most of the downloadable options impose. There's no support for encryption or batches, but Nitro's service gets high marks for its precise and fast conversion. Apparently, there are plans to incorporate the PDF-to-Word feature in future editions of Nitro PDF.

There are two other PDF-to-Word services worth mentioning: Koolwire and Zamzar. Koolwire converts PDFs of up to 10MB, and it can also handle DOC, XLS, PPT, VSD, MPP, RTF, TXT, JPEG, GIF, PNG, and MS Office 2007 formats like DOCX. Unlike the other services, Koolwire works by e-mail: clicking through on its Web site opens a new message, and you only need to attach your PDF and hit Send. The PDF comes back as RTF, with very minor image degradation and no formatting problems.

Zamzar's uploading interface.

(Credit: Screenshot by Seth Rosenblatt/CNET)

Zamzar will convert PDFs of up to 100MB at a time, and in addition to DOC or RTF, it can output ODT, TXT, PS, and PNG. It can handle batch conversion, as long as you don't mind uploading the files one at a time. When it finishes converting, you receive a link, active for 24 hours, from which you can download your converted files one at a time or all at once in a ZIP archive. There was minor image degradation, similar to what Koolwire spit out, and one formatting error.

Overall, no option can be declared 100 percent perfect. When one service, whether online or desktop, falls short on output, it tends to make up for it with a better interface or more conversion options than the others. The reverse also seems to hold true: the best conversions are not always backed by the best user experience. Nitro PDF's PDF-to-Word Web site wins for its output and usability, but if you must go with a desktop client, I'd choose Free PDF to Word Doc Converter 1.1 for the quality of its output alone. If you don't mind minor hiccups but need a better user experience or more conversion options, Zamzar and Free PDF to Word Converter 1.3 are probably what you need.

Wednesday, March 25, 2009

PHP - redirect page

In this tutorial I will show you how to redirect pages in PHP, covering the important points with code examples.

To redirect, use the header() function. It sends a raw HTTP header to the browser, and a Location header tells the browser to load the page it names instead. When you send a Location header, PHP also sets the response status to 302 (redirect) unless a 201 or 3xx status code has already been set. The one thing to take care of is that header() must be called before any actual output is sent. This means nothing may be printed before it: no HTML tags or blank lines outside the PHP tags, and no echo or print calls. Below is an example of a redirect in PHP:


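<?php

// Send the Location header; PHP answers with a 302 redirect status
header('Location: http://phpcodetutorials.blogspot.com');

// header() does not stop the script, so end it here
exit;

?>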

Just replace the URL in the header() call with the page you want to redirect to.

However, if you echo anything before the redirect, you will get a warning like this:

Warning: Cannot modify header information - headers already sent by

To avoid this problem, you can use PHP output buffering. ob_start() collects all output in a buffer instead of sending it, so headers can still be set afterwards:


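<?php

// Start buffering: output is stored in memory instead of being sent
ob_start();

echo "Test"; // goes into the buffer, so no headers have been sent yet

// Headers can still be changed because nothing has reached the browser
header("Location: http://www.php.net");

// Send the buffered output together with the headers, then stop
ob_end_flush();
exit;

?>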
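A variation on the buffering approach, as a minimal sketch: instead of flushing the buffered text along with the redirect, you can discard it, since browsers following a redirect usually do not display its body. The helper name redirect_after_buffering() and its callback parameter are hypothetical, chosen only for this example; ob_start(), ob_get_clean() and header() are standard PHP functions.

```php
<?php
// Hypothetical helper: run $body with output buffering on, throw away
// whatever it printed, then send the Location header.
function redirect_after_buffering(string $url, callable $body): string
{
    ob_start();                  // collect output in memory instead of sending it
    $body();                     // code that may echo something
    $discarded = ob_get_clean(); // end buffering and remove the collected output
    header('Location: ' . $url); // $body's output never reached the browser
    return $discarded;           // returned only so the caller can log or inspect it
}
```

In a page you would call it and then stop the script, for example redirect_after_buffering('http://www.php.net', function () { echo 'Test'; }); followed by exit;. Discarding rather than flushing keeps stray debug output out of the redirect response.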




Tuesday, March 17, 2009

PHP - Regular expressions: simple functions and common regular expressions

Regular Expressions

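We will start with some simple helper functions. In each of them, $regex is the regular expression to match against and $text is the text to search. Note that PHP's preg_* functions expect the pattern to be wrapped in delimiters, for example '/[0-9]+/'.

The first function returns TRUE if $regex matches anywhere in $text, and FALSE otherwise: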

function do_reg_match($text, $regex)
{
    // preg_match() returns 1 for a match, 0 for no match and FALSE on error
    return preg_match($regex, $text) === 1;
}

The next function returns the part of a given string ($text) matched by the regex ($regex), using $regs to store the matched groups. By changing $regs[0] to $regs[1] we can return a capturing group (in this case group 1) instead of the whole match. A capturing group can also be referenced by name ($regs['groupname']):

function do_reg_get($text, $regex)
{
    // $regs[0] holds the whole match, $regs[1] the first capturing group, etc.
    if (preg_match($regex, $text, $regs)) {
        $result = $regs[0];
    } else {
        $result = "";
    }

    return $result;
}
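Rather than editing $regs[0] by hand, the group can be made a parameter. This is a sketch under that assumption; do_reg_group() is a hypothetical name for this example. The sample pattern in the usage shows that a named group such as (?<year>...) can be read both by its name and by its number (here, 1).

```php
<?php
// Hypothetical helper: return one group of the first match, chosen by
// number (0 = whole match, 1 = first group, ...) or by name.
function do_reg_group(string $text, string $regex, $group = 0): string
{
    if (preg_match($regex, $text, $regs) !== 1) {
        return '';   // no match (or an invalid pattern)
    }
    // Named groups are stored under their name and their number;
    // a trailing group that did not take part in the match is missing.
    return $regs[$group] ?? '';
}
```

For example, with the pattern '/(?<year>\d{4})-(\d{2})-(\d{2})/' and the text 'Posted 2009-03-17', asking for 'year' and for group 1 gives the same result.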

The following function returns an array of all regex matches in a given string ($text):

function do_reg_all($text, $regex)
{
    // With PREG_PATTERN_ORDER, $result[0] lists every full match
    preg_match_all($regex, $text, $result, PREG_PATTERN_ORDER);

    return $result[0];
}
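The array filled by preg_match_all() with PREG_PATTERN_ORDER holds more than the full matches: $result[1] lists group 1 from every match, $result[2] group 2, and so on. A minimal sketch that returns any one of these lists (all_group_matches() is a hypothetical name):

```php
<?php
// Hypothetical helper: all captures of one group across every match.
function all_group_matches(string $text, string $regex, $group = 0): array
{
    // PREG_PATTERN_ORDER: $result[0] = all full matches,
    // $result[1] = all group-1 captures, and so on
    if (preg_match_all($regex, $text, $result, PREG_PATTERN_ORDER) === false) {
        return [];   // invalid pattern
    }
    return $result[$group] ?? [];
}
```

With '/(\d{4})-(\d{2})/' on '2008-12 and 2009-03', group 0 gives both full dates, group 1 both years and group 2 both months.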
Next, we can iterate (loop) over all matches in a string ($text) and output the results:

function do_reg_print($text, $regex)
{
    preg_match_all($regex, $text, $result, PREG_PATTERN_ORDER);

    // $result[0] holds every full match; print each on its own line
    foreach ($result[0] as $match) {
        echo $match, "\n";
    }
}
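If you also need to know where each match occurs, the PREG_OFFSET_CAPTURE flag turns every entry into a [match, byte offset] pair. A sketch (print_matches_with_offsets() is a hypothetical name) that prints both and returns the number of matches:

```php
<?php
// Hypothetical helper: print each match with its byte offset in $text.
function print_matches_with_offsets(string $text, string $regex): int
{
    $count = preg_match_all($regex, $text, $result, PREG_PATTERN_ORDER | PREG_OFFSET_CAPTURE);
    if ($count === false) {
        return 0;   // invalid pattern
    }
    // Each entry of $result[0] is [matched text, byte offset]
    foreach ($result[0] as [$match, $offset]) {
        echo $match, ' at offset ', $offset, "\n";
    }
    return $count;
}
```

For 'a1 b22' and '/\d+/' it prints "1 at offset 1" and "22 at offset 4". The offsets are in bytes, so they differ from character positions in multibyte (e.g. UTF-8) text.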
Extending the previous function, we can iterate over all matches in a string ($text) and over the capturing groups of each match:

function do_reg_print_groups($text, $regex)
{
    // PREG_SET_ORDER gives one array per match:
    // [0] is the whole match, [1], [2], ... are its capturing groups
    preg_match_all($regex, $text, $result, PREG_SET_ORDER);

    foreach ($result as $matchi => $match) {
        foreach ($match as $backrefi => $group) {
            echo "match $matchi, group $backrefi: $group\n";
        }
    }
}
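PREG_SET_ORDER is handy when the groups of one match belong together. As an illustration, this sketch turns text such as name=php year=2009 into an associative array; parse_pairs() and the key=value format are assumptions made only for this example.

```php
<?php
// Hypothetical helper: collect "key=value" pairs from a string.
function parse_pairs(string $text): array
{
    $pairs = [];
    preg_match_all('/(\w+)=(\w+)/', $text, $result, PREG_SET_ORDER);
    // Each $match is [full match, group 1 (key), group 2 (value)]
    foreach ($result as $match) {
        $pairs[$match[1]] = $match[2];
    }
    return $pairs;
}
```

With PREG_PATTERN_ORDER the keys and values would sit in two separate lists ($result[1] and $result[2]); PREG_SET_ORDER keeps each key next to its value.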
Now let's look at some useful regular expressions.

Addresses

// Address: State code (US)
'/\\b(?:A[KLRZ]|C[AOT]|D[CE]|FL|GA|HI|I[ADLN]|K[SY]|LA|M[ADEINOST]|N[CDEHJMVY]|O[HKR]|PA|RI|S[CD]|T[NX]|UT|V[AT]|W[AIVY])\\b/'

// Address: ZIP code (US)
'/\b[0-9]{5}(?:-[0-9]{4})?\b/'
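A short sketch of the two address patterns used together on one address line, each wrapped in / delimiters. extract_state_and_zip() is a hypothetical name. Because the state pattern is case-sensitive and many codes are also ordinary uppercase words (OK, ME, IN), expect false positives in free text.

```php
<?php
// Hypothetical helper: pull a US state code and ZIP code out of a line.
function extract_state_and_zip(string $line): array
{
    $state = '';
    $zip = '';
    // US state code pattern from the list
    $stateRe = '/\b(?:A[KLRZ]|C[AOT]|D[CE]|FL|GA|HI|I[ADLN]|K[SY]|LA|M[ADEINOST]|N[CDEHJMVY]|O[HKR]|PA|RI|S[CD]|T[NX]|UT|V[AT]|W[AIVY])\b/';
    // ZIP or ZIP+4 pattern from the list
    $zipRe = '/\b[0-9]{5}(?:-[0-9]{4})?\b/';

    if (preg_match($stateRe, $line, $m) === 1) {
        $state = $m[0];
    }
    if (preg_match($zipRe, $line, $m) === 1) {
        $zip = $m[0];
    }
    return ['state' => $state, 'zip' => $zip];
}
```

For 'Austin, TX 78701-1234' this returns TX as the state and 78701-1234 as the ZIP.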
Dates

// Date d/m/yy and dd/mm/yyyy
// 1/1/00 through 31/12/99 and 01/01/1900 through 31/12/2099
// Matches invalid dates such as February 31st
'#\b(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20)?[0-9]{2}\b#'

// Date dd/mm/yyyy
// 01/01/1900 through 31/12/2099
// Matches invalid dates such as February 31st
'#(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)[0-9]{2}#'

// Date m/d/y and mm/dd/yyyy
// 1/1/00 through 12/31/99 and 01/01/1900 through 12/31/2099
// Matches invalid dates such as February 31st
// Accepts dashes, spaces, forward slashes and dots as date separators
'#\b(0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])[- /.](19|20)?[0-9]{2}\b#'

// Date mm/dd/yyyy
// 01/01/1900 through 12/31/2099
// Matches invalid dates such as February 31st
'#(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9]{2}#'

// Date yy-m-d or yyyy-mm-dd
// 00-1-1 through 99-12-31 and 1900-01-01 through 2099-12-31
// Matches invalid dates such as February 31st
'#\b(19|20)?[0-9]{2}[- /.](0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])\b#'

// Date yyyy-mm-dd
// 1900-01-01 through 2099-12-31
// Matches invalid dates such as February 31st
'#(19|20)[0-9]{2}[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])#'
Email address

// Email address
// Use this version to seek out email addresses in random documents and texts.
// Does not match email addresses using an IP address instead of a domain name.
// Does not match email addresses on new-fangled top-level domains with more than 4 letters such as .museum.
// Including these increases the risk of false positives when applying the regex to random documents.
'/\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i'

// Email address (anchored)
// Use this anchored version to check if a valid email address was entered.
// Does not match email addresses using an IP address instead of a domain name.
// Does not match email addresses on new-fangled top-level domains with more than 4 letters such as .museum.
// Requires the "case insensitive" option (the i modifier) to be on.
'/^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i'

// Email address (anchored; no consecutive dots)
// Use this anchored version to check if a valid email address was entered.
// Improves on the original email address regex by excluding addresses with consecutive dots such as john@aol...com
// Does not match email addresses using an IP address instead of a domain name.
// Does not match email addresses on new-fangled top-level domains with more than 4 letters such as .museum.
// Including these increases the risk of false positives when applying the regex to random documents.
'/^[A-Z0-9._%-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}$/i'

// Email address (no consecutive dots)
// Use this version to seek out email addresses in random documents and texts.
// Improves on the original email address regex by excluding addresses with consecutive dots such as john@aol...com
// Does not match email addresses using an IP address instead of a domain name.
// Does not match email addresses on new-fangled top-level domains with more than 4 letters such as .museum.
// Including these increases the risk of false positives when applying the regex to random documents.
'/\b[A-Z0-9._%-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b/i'

// Email address (specific TLDs)
// Does not match email addresses using an IP address instead of a domain name.
// Matches all country code top-level domains, and specific common top-level domains.
'/^[A-Z0-9._%-]+@[A-Z0-9.-]+\.(?:[A-Z]{2}|com|org|net|biz|info|name|aero|jobs|museum)$/i'

// Email address: Replace with HTML link
'/\b(?:mailto:)?([A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b/i'
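A short sketch of two of the email patterns in use: the anchored, no-consecutive-dots version for validating input, and the "replace with HTML link" version with preg_replace(), where $1 in the replacement refers to the captured address. The function names are hypothetical; both patterns use the i modifier because their character ranges are written in uppercase.

```php
<?php
// Hypothetical helper: validate a single address (no consecutive dots).
function is_email(string $s): bool
{
    return preg_match('/^[A-Z0-9._%-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}$/i', $s) === 1;
}

// Hypothetical helper: turn every address in $text into a mailto link.
function link_emails(string $text): string
{
    // Group 1 holds the address without any "mailto:" prefix
    return preg_replace(
        '/\b(?:mailto:)?([A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b/i',
        '<a href="mailto:$1">$1</a>',
        $text
    );
}
```

The replacement is in single quotes so that PHP passes $1 through to preg_replace() instead of treating it as a variable.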

URLs

// URL: Different URL parts
// Protocol, domain name, page and CGI parameters are captured into backreferences 1 through 4
'{\b((?#protocol)https?|ftp)://((?#domain)[-A-Z0-9.]+)((?#file)/[-A-Z0-9+&@#/%=~_|!:,.;]*)?((?#parameters)\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?}i'

// URL: Different URL parts
// Protocol, domain name, page and CGI parameters are captured into named capturing groups.
// PHP's preg functions (PCRE) accept the (?<name>...) syntax used here.
'{\b(?<protocol>https?|ftp)://(?<domain>[-A-Z0-9.]+)(?<file>/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(?<parameters>\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?}i'

// URL: Find in full text
// The final character class makes sure that if a URL is part of some text, punctuation such as a
// comma or full stop after the URL is not interpreted as part of the URL.
'{\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]}i'

// URL: Replace URLs with HTML links
preg_replace('{\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]}i', '<a href="\0">\0</a>', $text);
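The named-group URL pattern can be used with preg_match() to pull a URL apart. In this sketch (url_parts() is a hypothetical name), the numbered copies of each group are filtered out so only the named parts remain. Parts missing from the URL may come back empty or be left out of the result.

```php
<?php
// Hypothetical helper: split the first URL found in $text into named parts.
function url_parts(string $text): array
{
    $regex = '{\b(?<protocol>https?|ftp)://(?<domain>[-A-Z0-9.]+)(?<file>/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(?<parameters>\?[-A-Z0-9+&@#/%=~_|!:,.;]*)?}i';
    if (preg_match($regex, $text, $m) !== 1) {
        return [];
    }
    // Keep only the named groups (string keys), dropping the numbered copies
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}
```

The braces serve as pattern delimiters because the URL character classes already contain /, #, ~ and %. For a single, known URL, PHP's built-in parse_url() is another option.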