Picture Downloader Online Help

Use Regular Expression in URL filter

URL filters allow you to easily control Project downloads by setting which pictures/pages should be loaded and which should be skipped.


URL Filters are divided into four parts:


Page URL Include Filters - determine which HTML pages should be accessed and analyzed so that their links can be followed.
Page URL Exclude Filters - determine which HTML pages should be skipped.
Picture URL Include Filters - determine which pictures should be downloaded.
Picture URL Exclude Filters - determine which pictures should be skipped.


You may enter several keywords into each of these filter lists, using a semicolon (;) to separate keywords.
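
The following is a minimal Python sketch of how a semicolon-separated filter list could be applied to a URL. It is not PicaLoader's own code: the function names `split_keywords` and `url_passes`, the example URLs, and the rule for combining the include and exclude lists are all assumptions made for illustration, and Python's `re` module stands in for PicaLoader's Perl-like engine.

```python
import re

def split_keywords(filter_list: str) -> list[str]:
    """Split a semicolon-separated filter list into individual keywords."""
    return [kw.strip() for kw in filter_list.split(";") if kw.strip()]

def url_passes(url: str, include: str, exclude: str) -> bool:
    """Assumed rule: a URL passes if it matches at least one include keyword
    (or the include list is empty) and matches no exclude keyword."""
    inc = split_keywords(include)
    exc = split_keywords(exclude)

    def hit(keywords: list[str]) -> bool:
        # Each keyword is treated as a case-insensitive regular expression.
        return any(re.search(kw, url, re.IGNORECASE) for kw in keywords)

    return (not inc or hit(inc)) and not hit(exc)

# Hypothetical URLs, for illustration only.
print(url_passes("http://example.com/gallery/pic001.jpg", "gallery;album", "thumb"))
print(url_passes("http://example.com/gallery/thumb_pic001.jpg", "gallery;album", "thumb"))
```

Because the semicolon is the separator, a keyword in this sketch cannot itself contain a semicolon.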

Each keyword can be a Perl-like regular expression. A regular expression is a string of characters that tells PicaLoader which URL (or URLs) you are looking for. The following explains the format of regular expressions in detail. If you are familiar with Perl, you already know the syntax.


1. Simple Regular Expressions: In its simplest form, a regular expression is just a word or phrase to search for. For example,

    beatles

would match any URL with the string "beatles" in it, no matter where in the URL the string appears.


2. Metacharacters: Some characters have a special meaning to the filter. These characters are called metacharacters. Although they may seem confusing at first, they add a great deal of flexibility and convenience to the filter.


The period (.) is a commonly used metacharacter. It matches exactly one character, regardless of what the character is. For example, the regular expression:

    pic.01

will match "pic001" and "pic101". Note that the period matches exactly one character: it will not match a string of characters, nor will it match the null string. Thus, "picture01" and "pic01" will not be matched by the above regular expression.


But what if you wanted to match a URL containing a period? For example,

    pic001.jpg

This would indeed match "pic001.jpg", but it would also match "pic001ajpg", "pic0011jpg" and so on. In short, any string of the form "pic001xjpg", where x is any character, would be matched by the regular expression above.

To get around this, we introduce a second metacharacter, the backslash (\). The backslash can be used to indicate that the character immediately to its right is to be taken literally. Thus, to match the string "pic001.jpg", we would use:

    pic001\.jpg

This is called "quoting". We would say that the period in the regular expression above has been quoted. In general, whenever the backslash is placed before a metacharacter, the filter treats the metacharacter literally rather than invoking its special meaning.
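
The difference between a bare period and a quoted period can be checked with a short Python sketch. Python's `re` module is used here as a stand-in for PicaLoader's Perl-like engine, and the helper name `found` is an assumption introduced for this example.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Return True if the pattern occurs anywhere in text, ignoring case."""
    return re.search(pattern, text, re.IGNORECASE) is not None

# A bare period stands for exactly one character of any kind.
print(found(r"pic.01", "pic001"))           # True
print(found(r"pic.01", "pic01"))            # False: nothing for the period to match
print(found(r"pic001.jpg", "pic001ajpg"))   # True: the period matched "a"

# A quoted period only matches a literal ".".
print(found(r"pic001\.jpg", "pic001ajpg"))  # False
print(found(r"pic001\.jpg", "pic001.jpg"))  # True
```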


The question mark (?): indicates that the character immediately preceding it may appear either zero times or one time. Thus

    pic0?1

will match "pic1" and "pic01".


The star (*): indicates that the character immediately to its left may be repeated any number of times, including zero. Thus

    pic0*1

will match "pic1", "pic01", "pic001", "pic0001", and any string that starts with "pic", is followed by a sequence of "0"s, and ends with a "1".


The plus (+): indicates that the character immediately preceding it may be repeated one or more times. It is just like the star metacharacter, except it doesn't match the null string. Thus

    pic0+1

would not match "pic1", but it would match "pic01", "pic001", "pic0001" and so on.


Metacharacters may be combined. A common combination includes the period and star metacharacters, with the star immediately following the period. This is used to match an arbitrary string of any length, including the null string. For example:

    pic.*1

would match "pic1", "pic01" and even "picture_001". Any string that starts with "pic", is followed by an arbitrary string, and ends with "1" will be matched. Note that the null string will be matched by the period-star pair; thus, "pic1" would be matched by the above expression.
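
The four repetition forms described above (question mark, star, plus, and the period-star pair) can be compared side by side. This is a sketch using Python's `re` module as a stand-in for PicaLoader's engine; the variable names are assumptions for this example.

```python
import re

patterns = ["pic0?1", "pic0*1", "pic0+1", "pic.*1"]
names = ["pic1", "pic01", "pic001", "picture_001"]

# For each pattern, collect the names in which the pattern is found.
results = {p: [n for n in names if re.search(p, n)] for p in patterns}

for pattern, matched in results.items():
    print(f"{pattern:8} -> {matched}")
```

Only the period-star form reaches "picture_001", because the other three allow nothing but zeros between "pic" and "1".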


3. Backslash Metacharacters: The backslash can also turn ordinary characters into metacharacters, as well as the other way around.


The digit metacharacter: invoked by following a backslash with a lower-case "d", like this: "\d". The "d" must be lower case. The digit metacharacter matches exactly one digit; that is, exactly one occurrence of "0", "1", "2", "3", "4", "5", "6", "7", "8" or "9". For example, the regular expression:

    pic\d\.jpg

would match "pic0.jpg", "pic1.jpg" and so forth. Similarly,

    pic\d\d\.jpg

would match "pic00.jpg", "pic01.jpg" and so on through "pic99.jpg".

We could combine the digit metacharacter with other metacharacters; for instance,

    pic\d+\.jpg

matches any string starting with "pic", followed by a string of digits, followed by ".jpg". (Note that the plus is used, and thus "pic.jpg" is not matched.)


The non-digit metacharacter: uses the uppercase "D". The non-digit metacharacter looks like "\D" and matches any character except a digit. Thus,

    pic\D\.jpg

would match "pica.jpg", "picZ.jpg" or "pic+.jpg", but would not match "pic1.jpg", "pic5.jpg" or "pic9.jpg". Similarly,

    \D+

matches any non-null string which contains no numeric characters.
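
The digit and non-digit metacharacters can be tried against a handful of filenames. This is a sketch in Python, whose `re` module supports the same "\d" and "\D" notation; the filenames and variable names are assumptions chosen for illustration.

```python
import re

patterns = [r"pic\d\.jpg", r"pic\d\d\.jpg", r"pic\d+\.jpg", r"pic\D\.jpg"]
names = ["pic0.jpg", "pic42.jpg", "pic2024.jpg", "pic.jpg", "pica.jpg"]

# For each pattern, collect the filenames in which the pattern is found.
results = {p: [n for n in names if re.search(p, n)] for p in patterns}

for pattern, matched in results.items():
    print(f"{pattern:14} -> {matched}")
```

Note that "pic.jpg" is matched by none of the four patterns: the first three require at least one digit, and the last requires a non-digit character followed by a literal period.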


The word metacharacter: matches exactly one letter, one number, or the underscore character (_). It is written as "\w". Its opposite, "\W", matches any one character except a letter, a number or the underscore. Thus,

    a\wz

would match "abz", "aTz", "a5z", "a_z", or any three-character string starting with "a", ending with "z", and whose second character is either a letter (upper- or lower-case), a number, or the underscore. Similarly,

    a\Wz

would not match "abz", "aTz", "a5z", or "a_z". It would match "a%z", "a{z", "a?z" or any three-character string starting with "a" and ending with "z" and whose second character is not a letter, number, or underscore. (This means the second character must be either a symbol or a whitespace character.)
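
The word and non-word metacharacters split the same set of strings into two complementary groups, which a short Python sketch makes visible. Python's `re` module is a stand-in for PicaLoader's engine, and the variable names are assumptions for this example.

```python
import re

samples = ["abz", "aTz", "a5z", "a_z", "a%z", "a{z", "a?z"]

# fullmatch requires the whole three-character string to fit the pattern.
word_hits = [s for s in samples if re.fullmatch(r"a\wz", s)]
non_word_hits = [s for s in samples if re.fullmatch(r"a\Wz", s)]

print(word_hits)      # middle character is a letter, digit, or underscore
print(non_word_hits)  # middle character is anything else
```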


The braces metacharacter: This metacharacter follows a normal character and contains two numbers separated by a comma (,) and surrounded by braces ({}). It is like the star metacharacter, except that the number of repetitions it matches must be between the minimum and maximum specified by the two numbers in braces. Thus,

    pic0{3,5}\.jpg

will match "pic000.jpg", "pic0000.jpg" and "pic00000.jpg", but no other number of zeros. Likewise,

    pic.{3,5}\.jpg

will match "pic000.jpg", "pic99999.jpg" or "picabc.jpg", but not "pic00.jpg", since "00" is only two characters long.
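
The minimum and maximum in braces can be verified with a small Python sketch. The expressions `pic0{3,5}\.jpg` and `pic.{3,5}\.jpg` are the ones this section describes; Python's `re` module is used as a stand-in for PicaLoader's engine, and the helper name `found` is an assumption.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Return True if the pattern occurs anywhere in text."""
    return re.search(pattern, text) is not None

# Between three and five zeros are accepted; two or six are not.
for zeros in range(2, 7):
    name = "pic" + "0" * zeros + ".jpg"
    print(name, found(r"pic0{3,5}\.jpg", name))

# With a period before the braces, any three to five characters qualify.
print(found(r"pic.{3,5}\.jpg", "picabc.jpg"))  # True
print(found(r"pic.{3,5}\.jpg", "pic00.jpg"))   # False
```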


The alternative metacharacter: is represented by a vertical bar (|). It indicates an either/or behavior by separating two or more possible choices. For example:

    beatles|u2

will match any URL containing the strings "beatles" or "u2" or both.
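
As a quick illustration of alternation, the sketch below applies `beatles|u2` to a few URLs. The URLs are hypothetical, the helper name `found` is an assumption, and Python's `re` module stands in for PicaLoader's engine.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Return True if the pattern occurs anywhere in text, ignoring case."""
    return re.search(pattern, text, re.IGNORECASE) is not None

# Hypothetical URLs, for illustration only.
urls = [
    "http://example.com/music/beatles/cover.jpg",
    "http://example.com/U2/live.jpg",
    "http://example.com/queen/live.jpg",
]
matched = [u for u in urls if found(r"beatles|u2", u)]
print(matched)
```

Only the first two URLs are kept; the third contains neither alternative.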


The bracket metacharacter: matches one occurrence of any character inside the brackets ([]). For example,

    pic_[abf]\.jpg

will match "pic_a.jpg", "pic_b.jpg" and "pic_f.jpg", but not "pic_0.jpg", "pic_c.jpg" or "pic_e.jpg".

Ranges of characters can be specified by using the dash (-) within the brackets. For example,

    pic[a-d]\.jpg

will match "pica.jpg", "picb.jpg", "picc.jpg" or "picd.jpg", and nothing else. Likewise,

    wallpaper[3-5][0-9]\.jpg

will match "wallpaper30.jpg" through "wallpaper59.jpg".

If you wish to include a dash within brackets as one of the characters to match, instead of to denote a range, put the dash immediately before the right bracket. Thus:

    a[1234-]z

    a[1-4-]z

both do the same thing. They both match "a1z", "a2z", "a3z", "a4z" or "a-z", and nothing else.


The bracket metacharacter can also be inverted by placing a caret (^) immediately after the left bracket. Thus,

    wallpaper[^02468]

matches any ten-character string starting with "wallpaper" and ending with anything except an even digit. Inversion and ranges can be combined, so that

    [^f-h]ood

matches any four-letter word ending in "ood" except for "food", "good" or "hood". (Thus "mood" and "wood" would both be matched.)

Note that within brackets, ordinary quoting rules do not apply and other metacharacters are not available. The only characters that can be quoted in brackets are "[", "]", and "\". Thus,

    [\[\]\\]abc

matches any four-character string ending with "abc" and starting with "[", "]", or "\".
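
The bracket forms covered above (plain sets, ranges, a literal dash, inversion, and quoted brackets) can be exercised together in one Python sketch. Python's `re` module is used as a stand-in for PicaLoader's Perl-like engine, and the helper name `found` and the test strings are assumptions for illustration.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Return True if the pattern occurs anywhere in text, ignoring case."""
    return re.search(pattern, text, re.IGNORECASE) is not None

checks = [
    (r"pic_[abf]\.jpg", "pic_b.jpg"),                  # plain set
    (r"pic_[abf]\.jpg", "pic_c.jpg"),
    (r"pic[a-d]\.jpg", "picd.jpg"),                    # range
    (r"wallpaper[3-5][0-9]\.jpg", "wallpaper59.jpg"),  # two ranges in a row
    (r"wallpaper[3-5][0-9]\.jpg", "wallpaper60.jpg"),
    (r"a[1-4-]z", "a-z"),                              # trailing dash is literal
    (r"wallpaper[^02468]", "wallpaper7"),              # inverted set
    (r"[^f-h]ood", "mood"),                            # inverted range
    (r"[^f-h]ood", "food"),
    (r"[\[\]\\]abc", "]abc"),                          # quoted brackets
]
for pattern, text in checks:
    print(f"{pattern!r:32} {text!r:20} {found(pattern, text)}")
```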


4. The table below lists some of the more useful special (meta) characters.

    .          Matches any character (except newline)
    x?         Matches 0 or 1 x's, where x is any regular expression
    x*         Matches 0 or more x's
    x+         Matches 1 or more x's
    foo|bar    Matches one of foo or bar
    [xyz]      Matches any character in the set xyz; specify ranges with a -
    [^xyz]     Matches any single character not in the set xyz
    \w         Matches an alphanumeric character or underscore, i.e., [a-zA-Z0-9_]
    (x)        Brackets (groups) the regular expression x
    \x         Matches the metacharacter x literally (takes away its special meaning)


5. The search is case insensitive; thus

    picture

    Picture

    PICTURE

all search for the same set of strings. Each will match "picture", "PICTURE", "Picture", "PicTure" and so forth. Thus you need not worry about capitalization. (Note, however, that metacharacters must still have the proper case. This is especially important for metacharacters whose case determines whether their meaning is reversed or not, such as "\d" and "\D".)
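
Case-insensitive matching, and the exception for metacharacter case, can be reproduced in Python with the `re.IGNORECASE` flag. This is a sketch only: PicaLoader applies case insensitivity itself, the flag is simply how Python's `re` module expresses the same behavior, and the helper name `found` is an assumption.

```python
import re

def found(pattern: str, text: str) -> bool:
    """Return True if the pattern occurs anywhere in text, ignoring case."""
    return re.search(pattern, text, re.IGNORECASE) is not None

# Letter case in the keyword does not matter.
for keyword in ("picture", "Picture", "PICTURE"):
    print(keyword, found(keyword, "http://example.com/PicTure01.jpg"))

# Metacharacter case still matters: \d is a digit, \D is a non-digit.
print(found(r"picture\d", "PICTURE7"))  # True
print(found(r"picture\D", "PICTURE7"))  # False
```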