perl

Parsing RSS with Perl

This is based on a script from the article 'Add RSS feeds to your Web site with Perl XML::RSS' at
http://articles.techrepublic.com.com/5100-6228_11-5487340.html

The original script assumed that the RSS news feed was located on your own server. To get
around this limitation, use LWP to fetch the contents of the remote feed, save it to a file on your
server, and then parse that file.


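#!/usr/bin/perl
use strict;
use warnings;

use XML::RSS;
use LWP::Simple;    # exports get()

my $url = 'http://onaje.com/rss.xml';

# Fetch the remote feed; get() returns undef on failure.
my $content = get($url);
die "Could not fetch $url\n" unless defined $content;

# Parse the feed content that was just downloaded.
my $r = XML::RSS->new;
$r->parse($content);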
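The script above parses the feed straight from memory. The save-to-file variant described earlier might look like the following minimal sketch. The helper names save_feed and feed_items and the local file name feed.xml are assumptions made for illustration; getstore and is_success come from LWP::Simple, and parsefile from XML::RSS.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::RSS;
use LWP::Simple qw(getstore is_success);

# Download the feed at $url into $file; true on an HTTP 2xx response.
# (save_feed is a hypothetical helper name.)
sub save_feed {
    my ($url, $file) = @_;
    return is_success(getstore($url, $file));
}

# Parse a saved feed file and return its items as a list of hash references.
# (feed_items is a hypothetical helper name.)
sub feed_items {
    my ($file) = @_;
    my $rss = XML::RSS->new;
    $rss->parsefile($file);
    return @{ $rss->{items} };
}

# Run only when a feed URL is passed on the command line.
if (@ARGV) {
    my $url  = $ARGV[0];
    my $file = 'feed.xml';    # assumed local file name
    save_feed($url, $file) or die "Could not download $url\n";
    for my $item (feed_items($file)) {
        print "$item->{title}\n  $item->{link}\n";
    }
}
```

Saved as, say, rssfetch.pl (a made-up name), it could be run as: perl rssfetch.pl http://onaje.com/rss.xml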


Using the HTML::Strip Perl extension


Stripping HTML/XML/SGML

This example demonstrates how to use the HTML::Strip Perl extension to strip HTML markup from text.

HTML::Strip strips HTML-like markup from text in a very quick and brutal manner, so depending on the
complexity of your markup, the results may not remove all HTML perfectly. You can also use the extension
to strip XML or SGML from text.

Code

#!/usr/bin/perl
use HTML::Strip;
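The listing above stops after loading the module. As a minimal sketch of how the stripping step is typically done with HTML::Strip (the helper name strip_html and the sample markup are assumptions for illustration, not the original script):

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::Strip;

# Return $html with its markup removed.
# (strip_html is a hypothetical helper name.)
sub strip_html {
    my ($html) = @_;
    my $hs   = HTML::Strip->new;
    my $text = $hs->parse($html);
    $hs->eof;    # reset the parser state before any further documents
    return $text;
}

# Made-up sample input
my $raw = '<p>Hello, <b>world</b>! <a href="http://example.com/">A link</a></p>';
print strip_html($raw), "\n";
```

Whitespace in the result may differ from the input, since the module can emit spaces where tags were removed.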


Regex Coach

The Regex Coach

Content formatting with Regexes

The Regex Coach is a graphical application for Linux and Windows that can be used to experiment interactively with (Perl-compatible) regular expressions.

weitz.de/regex-coach.


Using PDFToText to convert PDFs to text


PDF to Text Conversion

This is geared towards Windows users. Pdftotext is a program for converting PDF files to text. For Windows, you can get it as part of the Xpdf open source viewer:
http://www.foolabs.com/xpdf/download.html
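Once pdftotext is installed, it can be driven from a Perl script. The following is a minimal sketch that assumes the pdftotext executable is on your PATH; the helper names pdftotext_args and pdf_to_text are made up for illustration. It relies only on the documented command form (pdftotext, optional switches, the PDF file, then the text file) and the -layout switch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the argument list for pdftotext: input PDF, then output text file.
# -layout asks pdftotext to keep the original physical layout as far as it can.
sub pdftotext_args {
    my ($pdf, $txt, $keep_layout) = @_;
    my @args = ('pdftotext');
    push @args, '-layout' if $keep_layout;
    return (@args, $pdf, $txt);
}

# Run pdftotext and return true on success (exit status 0).
sub pdf_to_text {
    my ($pdf, $txt, $keep_layout) = @_;
    # The list form of system() bypasses the shell, so spaces in paths are safe.
    my $status = system(pdftotext_args($pdf, $txt, $keep_layout));
    return $status == 0;
}

# Convert only when both file names are given on the command line.
if (@ARGV == 2) {
    my ($pdf, $txt) = @ARGV;
    pdf_to_text($pdf, $txt, 1)
        or die "pdftotext failed for $pdf\n";
    print "Wrote $txt\n";
}
```

Saved as, say, pdf2txt.pl (a made-up name), it could be run as: perl pdf2txt.pl report.pdf report.txt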

From the README:

"What is Xpdf?
-------------

Xpdf is an open source viewer for Portable Document Format (PDF) files. (These are also sometimes also called 'Acrobat' files, from
the name of Adobe's PDF software.) The Xpdf project also includes a PDF text extractor, PDF-to-PostScript converter, and various other
