
 
Picture Downloader Online Help

Use Regular Expression in URL filter


URL filters allow you to easily control Project downloads by setting which pictures/pages should be loaded and which should be skipped.

 

URL Filters are divided into four parts:

 

Page URL Include Filters - determine which HTML pages should be accessed and analyzed to follow their links.
Page URL Exclude Filters - determine which HTML pages should be skipped.
Picture URL Include Filters - determine which pictures should be downloaded.
Picture URL Exclude Filters - determine which pictures should be skipped.

 

You may enter several keywords into each of these filter lists, using a semicolon (;) to separate keywords.
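The way the four lists interact can be sketched as follows. This is only an illustration: the helper names `split_keywords` and `passes` are hypothetical, Python's `re` module stands in for PicaLoader's own engine, and the rule "match any include keyword, match no exclude keyword, an empty include list accepts everything" is an assumption rather than documented behavior.

```python
import re

def split_keywords(filter_text):
    """Split a semicolon-separated filter string into individual keywords."""
    return [kw.strip() for kw in filter_text.split(";") if kw.strip()]

def passes(url, include_text, exclude_text):
    """Assumed rule: the URL must match at least one include keyword
    (or the include list is empty) and must match no exclude keyword."""
    includes = split_keywords(include_text)
    excludes = split_keywords(exclude_text)
    included = not includes or any(
        re.search(kw, url, re.IGNORECASE) for kw in includes
    )
    excluded = any(re.search(kw, url, re.IGNORECASE) for kw in excludes)
    return included and not excluded

print(passes("http://example.com/beatles/pic01.jpg", "beatles;u2", "thumb"))
print(passes("http://example.com/beatles/thumb01.jpg", "beatles;u2", "thumb"))
```

One limitation of this sketch: because the semicolon separates keywords, a keyword that itself contains a semicolon cannot be expressed this way.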

You can use Perl-like regular expressions as keywords. A regular expression is a string of characters that tells PicaLoader which URL (or URLs) you are looking for. The following explains the format of regular expressions in detail. If you are familiar with Perl, you already know the syntax.

 

1. Simple Regular Expressions: In its simplest form, a regular expression is just a word or phrase to search for. For example,

   beatles

would match any URL that contains the string "beatles". Thus, URLs like "xxx.beatles.xxx", "xxx.music.xxx/beatles.htm" or "xxx.animal.xxx/beatleswild.htm" would all be matched.
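A simple keyword behaves like a substring search. The sketch below uses Python's `re` module as a stand-in for PicaLoader's engine, which is an assumption; the URL list is made up for illustration.

```python
import re

# A plain word matches any URL that contains it (case-insensitively).
pattern = re.compile("beatles", re.IGNORECASE)

urls = [
    "xxx.beatles.xxx",
    "xxx.music.xxx/beatles.htm",
    "xxx.animal.xxx/beatleswild.htm",
    "xxx.music.xxx/u2.htm",
]
matched = [u for u in urls if pattern.search(u)]
print(matched)
```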

 

2. Metacharacters: Some characters have a special meaning to the filter. These characters are called metacharacters. Although they may seem confusing at first, they add a great deal of flexibility and convenience to the filter.

 

The period (.) is a commonly used metacharacter. It matches exactly one character, regardless of what the character is. For example, the regular expression:

  pic.01

will match "pic001", "pic101" and so on. Note that the period matches exactly one character; it will not match a string of characters, nor will it match the null string. Thus, "picture01" and "pic01" will not be matched by the above regular expression.
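To see the single-character rule in action, here is a small check written with Python's `re` module, assumed here to treat the period the same way PicaLoader does. The names are test strings, not real files.

```python
import re

names = ["pic001", "pic101", "picture01", "pic01"]

# The period stands for exactly one character, never zero and never several.
matched = [n for n in names if re.search("pic.01", n, re.IGNORECASE)]
print(matched)
```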

 

But what if you want to match a URL that contains a literal period? Consider the expression:

  pic001.jpg

This does match "pic001.jpg", but it also matches "pic001ajpg", "pic0011jpg" and so on. In short, any string of the form "pic001xjpg", where x is any character, would be matched by the regular expression above.

To get around this, we introduce a second metacharacter, the backslash (\). The backslash indicates that the character immediately to its right is to be taken literally. Thus, to match the string "pic001.jpg", we would use:

  pic001\.jpg

This is called "quoting". We would say that the period in the regular expression above has been quoted. In general, whenever the backslash is placed before a metacharacter, the filter treats the metacharacter literally rather than invoking its special meaning.
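The difference between a quoted and an unquoted period can be checked with a few lines of Python. This assumes Python's `re` module follows the same quoting rule; `re.escape` is a Python convenience for quoting a literal string and is not a PicaLoader feature.

```python
import re

names = ["pic001.jpg", "pic001ajpg", "pic0011jpg"]

# Unquoted: the period matches any single character.
unquoted = [n for n in names if re.search("pic001.jpg", n)]

# Quoted: the backslash makes the period literal.
quoted = [n for n in names if re.search(r"pic001\.jpg", n)]

# re.escape produces the quoted form of a literal string.
escaped = re.escape("pic001.jpg")

print(unquoted)
print(quoted)
print(escaped)
```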

 

The question mark (?): indicates that the character immediately preceding it may occur either zero times or one time. Thus

  pic0?1

will match "pic1" and "pic01".

 

The star (*): indicates that the character immediately to its left may be repeated any number of times, including zero. Thus

  pic0*1

will match "pic1", "pic01", "pic001", "pic0001", and any string that starts with "pic", is followed by a sequence of "0"s, and ends with a "1".

 

The plus (+): indicates that the character immediately preceding it may be repeated one or more times. It is just like the star metacharacter, except it doesn't match the null string. Thus

  pic0+1

would not match "pic1", but it would match "pic01", "pic001", "pic0001" and so on.
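The three repetition metacharacters can be compared side by side. The sketch below relies on Python's `re` module, assumed to agree with PicaLoader on these operators; `matching` is a hypothetical helper.

```python
import re

names = ["pic1", "pic01", "pic001", "pic0001"]

def matching(pattern):
    """Return the names that contain a match for the pattern."""
    return [n for n in names if re.search(pattern, n, re.IGNORECASE)]

optional = matching("pic0?1")      # zero or one "0"
any_count = matching("pic0*1")     # zero or more "0"s
at_least_one = matching("pic0+1")  # one or more "0"s

print(optional)
print(any_count)
print(at_least_one)
```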

 

Metacharacters may be combined. A common combination includes the period and star metacharacters, with the star immediately following the period. This is used to match an arbitrary string of any length, including the null string. For example:

  pic.*1

would match "pic1", "pic01" and even "picture_001". Any string that starts with "pic", is followed by an arbitrary string, and ends with "1" will be matched. Note that the null string will be matched by the period-star pair; thus, "pic1" would be matched by the above expression.
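A quick check of the period-star combination, again using Python's `re` module as an assumed stand-in for the filter engine:

```python
import re

names = ["pic1", "pic01", "picture_001", "picture_002"]

# ".*" matches any run of characters, including an empty one.
matched = [n for n in names if re.search("pic.*1", n, re.IGNORECASE)]
print(matched)
```

The last name is rejected only because it contains no "1" after "pic".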

 

3. The backslash can also turn ordinary characters into metacharacters, as well as the other way around.

 

The digit metacharacter: is written as a backslash followed by a lower-case "d", like this: "\d". The "d" must be lower case. The digit metacharacter matches exactly one digit; that is, exactly one occurrence of "0", "1", "2", "3", "4", "5", "6", "7", "8" or "9". For example, the regular expression:

  pic\d\.jpg

would match "pic0.jpg", "pic1.jpg" and so forth. Similarly,

  pic\d\d\.jpg

would match "pic00.jpg", "pic01.jpg" and so on through "pic99.jpg".

We could combine the digit metacharacter with other metacharacters; for instance,

  pic\d+\.jpg

matches any string starting with "pic", followed by one or more digits, followed by ".jpg". (Note that the plus is used, and thus "pic.jpg" is not matched.)
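The three digit patterns above can be tried against a handful of names. This is a sketch using Python's `re` module, assumed to treat "\d" like PicaLoader does; `matching` is a hypothetical helper.

```python
import re

names = ["pic.jpg", "pic0.jpg", "pic01.jpg", "pic2024.jpg"]

def matching(pattern):
    """Return the names that contain a match for the pattern."""
    return [n for n in names if re.search(pattern, n, re.IGNORECASE)]

one_digit = matching(r"pic\d\.jpg")      # exactly one digit
two_digits = matching(r"pic\d\d\.jpg")   # exactly two digits
some_digits = matching(r"pic\d+\.jpg")   # one or more digits

print(one_digit)
print(two_digits)
print(some_digits)
```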

 

The non-digit metacharacter: is written with an uppercase "D", like this: "\D". It matches any single character except a digit. Thus,

  pic\D\.jpg

would match "pica.jpg", "picZ.jpg" or "pic+.jpg", but would not match "pic1.jpg", "pic5.jpg" or "pic9.jpg". Similarly,

  \D+

matches any non-null string which contains no numeric characters.
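The non-digit metacharacter can be exercised the same way. The sketch assumes Python's `re` module matches PicaLoader's treatment of "\D"; `re.fullmatch` is used for "\D+" so that the whole string must consist of non-digits.

```python
import re

names = ["pica.jpg", "picZ.jpg", "pic+.jpg", "pic1.jpg", "pic5.jpg"]

# "\D" accepts any single character that is not a digit.
matched = [n for n in names if re.search(r"pic\D\.jpg", n, re.IGNORECASE)]
print(matched)

# "\D+" as a whole-string test: non-empty and free of digits.
no_digits = re.fullmatch(r"\D+", "wallpaper") is not None
has_digit = re.fullmatch(r"\D+", "wall2paper") is not None
empty = re.fullmatch(r"\D+", "") is not None
```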

 

The word metacharacter: matches exactly one letter, one number, or the underscore character (_). It is written as "\w". Its opposite, "\W", matches any one character except a letter, a number or the underscore. Thus,

  a\wz

would match "abz", "aTz", "a5z", "a_z", or any three-character string starting with "a", ending with "z", and whose second character was either a letter (upper- or lower-case), a number, or the underscore. Similarly,

  a\Wz

would not match "abz", "aTz", "a5z", or "a_z". It would match "a%z", "a{z", "a?z" or any three-character string starting with "a" and ending with "z" and whose second character was not a letter, number, or underscore. (This means the second character must either be a symbol or a whitespace character.)
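The word and non-word metacharacters split the same set of test strings into two complementary groups. The sketch uses Python's `re.fullmatch` so that only three-character strings are considered, and assumes Python's "\w" and "\W" agree with PicaLoader's for these ASCII examples.

```python
import re

samples = ["abz", "aTz", "a5z", "a_z", "a%z", "a{z", "a?z", "a z"]

# "\w": letter, digit or underscore in the middle position.
word = [s for s in samples if re.fullmatch(r"a\wz", s)]

# "\W": anything else, such as a symbol or a space.
non_word = [s for s in samples if re.fullmatch(r"a\Wz", s)]

print(word)
print(non_word)
```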

 

The braces metacharacter: This metacharacter follows a normal character and contains two numbers separated by a comma (,) and surrounded by braces ({}). It is like the star metacharacter, except the length of the string it matches must be within the minimum and maximum length specified by the two numbers in braces. Thus,

  pic0{3,5}\.jpg

will match "pic000.jpg", "pic0000.jpg" and "pic00000.jpg". No other filename of this form is matched. Likewise,

  pic.{3,5}\.jpg

will match "pic000.jpg", "pic99999.jpg" or "picabc.jpg", but not "pic00.jpg", since "00" is only two characters long.
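The minimum and maximum counts can be verified with a short test. Python's `re` module is used here on the assumption that its brace syntax matches PicaLoader's; `matching` is a hypothetical helper.

```python
import re

def matching(pattern, names):
    """Return the names that contain a match for the pattern."""
    return [n for n in names if re.search(pattern, n, re.IGNORECASE)]

zeros = ["pic00.jpg", "pic000.jpg", "pic0000.jpg",
         "pic00000.jpg", "pic000000.jpg"]
# Between three and five "0"s are allowed, no fewer and no more.
zero_matches = matching(r"pic0{3,5}\.jpg", zeros)

mixed = ["pic000.jpg", "pic99999.jpg", "picabc.jpg", "pic00.jpg"]
# Between three and five arbitrary characters.
any_matches = matching(r"pic.{3,5}\.jpg", mixed)

print(zero_matches)
print(any_matches)
```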

 

The alternative metacharacter: is represented by a vertical bar (|). It indicates an either/or behavior by separating two or more possible choices. For example:

  beatles|u2

will match any URL containing the strings "beatles" or "u2" or both.
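An either/or keyword can be tested like this. The URLs are invented for illustration, and Python's `re` module is assumed to handle the vertical bar the same way as the filter.

```python
import re

urls = [
    "xxx.beatles.xxx/a.jpg",
    "xxx.music.xxx/u2/live.jpg",
    "xxx.music.xxx/abba.jpg",
]

# Either alternative is enough for a match.
matched = [u for u in urls if re.search("beatles|u2", u, re.IGNORECASE)]
print(matched)
```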

 

The bracket metacharacter: matches one occurrence of any character inside the brackets ([]). For example,

  pic_[abf]\.jpg

will match "pic_a.jpg", "pic_b.jpg" and "pic_f.jpg", but not "pic_0.jpg", "pic_c.jpg" or "pic_e.jpg".

Ranges of characters can be specified with a dash (-) within the brackets. For example,

  pic[a-d]\.jpg

will match "pica.jpg", "picb.jpg", "picc.jpg" or "picd.jpg", and nothing else. Likewise,

  wallpaper[3-5]\d\.jpg

will match "wallpaper30.jpg" through "wallpaper59.jpg".

If you wish to include a dash within brackets as one of the characters to match, instead of to denote a range, put the dash immediately before the right bracket. Thus:

  a[1234-]z

and

  a[1-4-]z

both do the same thing. They both match "a1z", "a2z", "a3z", "a4z" or "a-z", and nothing else.
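Sets, ranges and the trailing dash can all be checked in a few lines. The sketch uses Python's `re.fullmatch` with a hypothetical helper `full`, and assumes Python's bracket syntax matches PicaLoader's for these cases.

```python
import re

def full(pattern, name):
    """True if the whole name matches the pattern, ignoring case."""
    return re.fullmatch(pattern, name, re.IGNORECASE) is not None

# A set of listed characters.
print(full(r"pic_[abf]\.jpg", "pic_f.jpg"))          # in the set
print(full(r"pic_[abf]\.jpg", "pic_c.jpg"))          # not in the set

# A range combined with the digit metacharacter.
print(full(r"wallpaper[3-5]\d\.jpg", "wallpaper59.jpg"))
print(full(r"wallpaper[3-5]\d\.jpg", "wallpaper60.jpg"))

# A dash placed last in the brackets is a literal dash.
print(full(r"a[1-4-]z", "a-z"))
print(full(r"a[1234-]z", "a-z"))
```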

 

The bracket metacharacter can also be inverted by placing a caret (^) immediately after the left bracket. Thus,

  wallpaper[^02468]\.jpg

matches any name consisting of "wallpaper", followed by exactly one character that is not an even digit, followed by ".jpg". Inversion and ranges can be combined, so that

  \W[^f-h]ood\W

matches any four-letter word ending in "ood" and surrounded by non-word characters, except for "food", "good" or "hood". (Thus "mood" and "wood" would both be matched.)

Note that within brackets, ordinary quoting rules do not apply and other metacharacters are not available. The only characters that can be quoted in brackets are "[", "]", and "\". Thus,

  [\[\\\]]abc

matches any four-character string ending with "abc" and starting with "[", "]", or "\".
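Inverted sets and quoting inside brackets can be tried out as follows. This is a sketch with Python's `re` module, assumed to follow the same bracket rules; the sample strings are illustrative only.

```python
import re

# One character after "wallpaper" that is not an even digit.
odd = re.fullmatch(r"wallpaper[^02468]\.jpg", "wallpaper1.jpg") is not None
even = re.fullmatch(r"wallpaper[^02468]\.jpg", "wallpaper2.jpg") is not None

# Four-letter "ood" words, delimited by non-word characters,
# whose first letter is not f, g or h.
words = [" mood ", " food ", " wood ", " good ", " hood "]
ood = [w.strip() for w in words if re.search(r"\W[^f-h]ood\W", w)]

# Inside brackets, "[", "]" and "\" are quoted with a backslash.
starts = ["[abc", "]abc", "\\abc", "xabc"]
quoted = [s for s in starts if re.fullmatch(r"[\[\\\]]abc", s)]

print(odd, even)
print(ood)
print(quoted)
```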

 

4. The table below lists some of the more useful special (meta) characters.

Reg-expr     Description
.            Matches any character (except newline)
x?           Matches 0 or 1 x's, where x is any regular expression
x*           Matches 0 or more x's
x+           Matches 1 or more x's
foo|bar      Matches one of foo or bar
[xyz]        Matches any single character in the set xyz; specify ranges with a -
[^xyz]       Matches any single character not in the set xyz
\w           Matches an alphanumeric character or underscore, i.e., [a-zA-Z0-9_]
(x)          Groups a regular expression
\metachar    Matches the metacharacter (takes away its special meaning)

 

5. The search is case insensitive; thus

  picture

and

  Picture

and

  PICTURE

all search for the same set of strings. Each will match "picture", "PICTURE", "Picture", "PicTure" and so forth. Thus you need not worry about capitalization. (Note, however, that metacharacters must still have the proper case. This is especially important for metacharacters whose case determines whether their meaning is reversed or not, such as "\d" and "\D".)
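Case-insensitive matching, together with the case-sensitive metacharacters, can be illustrated with Python's `re.IGNORECASE` flag. Treating this flag as equivalent to PicaLoader's behavior is an assumption.

```python
import re

keywords = ["picture", "Picture", "PICTURE"]
targets = ["picture", "PICTURE", "Picture", "PicTure"]

# Every spelling of the keyword matches every spelling of the target.
all_match = all(
    re.search(kw, t, re.IGNORECASE) for kw in keywords for t in targets
)

# The case of a metacharacter still matters: \d is a digit, \D is not.
digit = re.search(r"pic\d", "PIC1", re.IGNORECASE) is not None
non_digit = re.search(r"pic\D", "PIC1", re.IGNORECASE) is not None

print(all_match, digit, non_digit)
```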