Feature #1022

LinkGrabber filter: more advanced

Added by jiaz almost 10 years ago. Updated about 8 years ago.

Status: Closed
Start date: 12/04/2009
Priority: Normal
Due date:
Assignee: -
% Done: 100%
Category: Controlling
Target version: 040 - FarfarAway
Resolution: Fixed

Description

Currently "Settings -> Basic -> User Interface -> LinkGrabber -> Link Filter" is a list of regular expressions defining links to ignore.

I would like to extend this functionality, or have it added elsewhere, with prefixed rules:
RE           Filter out links matching RE, as it does now, and stop processing filters.
-|RE         Equivalent to RE with no prefix.
+|RE         Add this link and stop processing filters.
s|RE|STR|    Substitute STR for RE (all occurrences) and keep processing.

The rules would be evaluated from top to bottom. The separator is the pipe character, which is rarely used in URIs (except as a substitute for ':' in IE). Any other separator you select would be fine (I was originally thinking of ':').

Ideally, we would support the full vi-style substitution, including group references (%1, %2, ...), but if that is not already available in Java, the simpler form shown here would be fine.
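To make the group references concrete, here is a minimal sketch. It is written in Python for brevity rather than Java, and it assumes that `%N` means "the text captured by the N-th group of RE". The `substitute` helper and the example URL are hypothetical and not part of JDownloader.

```python
import re


def substitute(pattern: str, replacement: str, url: str) -> str:
    """Replace every match of `pattern` in `url`.

    Assumption: %1, %2, ... in `replacement` refer to capture groups.
    """
    # Keep any backslash in the replacement literal, then translate
    # %N into Python's own group-reference syntax, \g<N>.
    template = replacement.replace("\\", "\\\\")
    template = re.sub(r"%(\d+)", r"\\g<\1>", template)
    return re.sub(pattern, template, url)


# Turn a thumbnail URL into the URL of the full-size image.
full = substitute(r"(.*)_thumb\.jpg$", "%1.jpg",
                  "http://pic-host.com/a/cat_thumb.jpg")
print(full)
```

With group references, the part of the URL that should survive is captured once and reused, so the rule does not need to repeat it in the replacement.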

For example (real problem), I might want to download JPG files, but just from a specific photo site. I don't want to download viewer links and I want to get the real link from the thumbnail URL:

  1. I don't want thumbnails, but the rest of the URL is right
    s|_thumb\.jpg|\.jpg|
  2. Eliminate viewer URIs
    -|.*?\.jpg\?viewer.*?
  3. I want JPEG files from this site
    +|.*?pic-host\.com.*?\.jpg$
  4. When copying links, I usually don't want graphics.
    .+?\.jpg
    .+?\.jpeg
    .+?\.gif
    .+?\.png

This example corresponds to trying to get the images for the thumbnails on these pages.
Right now, JD is the only tool I have that will collect the URLs for the thumbnails. It also picks up the viewer URIs. The thumbnail URLs are the same as the image URLs, except for the addition of _thumb to the name. The actual REs for the viewers would be slightly different.
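The top-to-bottom processing described in this request can be sketched as follows. This is an illustration in Python, not JDownloader's implementation. It makes several assumptions: an RE must match the whole link, a link that matches no rule is kept, STR is taken as a literal string (so it is written `.jpg`, without a backslash), and an RE cannot itself contain the pipe separator. The `apply_rules` function and the sample URLs are hypothetical.

```python
import re
from typing import List, Optional


def apply_rules(rules: List[str], url: str) -> Optional[str]:
    """Return the (possibly rewritten) link if it is kept, else None."""
    for rule in rules:
        if rule.startswith("s|"):
            # s|RE|STR| : substitute all occurrences, keep processing.
            _, pattern, replacement, *_rest = rule.split("|")
            # A function replacement inserts STR literally.
            url = re.sub(pattern, lambda m: replacement, url)
        elif rule.startswith("+|"):
            # +|RE : add this link and stop processing.
            if re.fullmatch(rule[2:], url):
                return url
        else:
            # -|RE or bare RE : ignore this link and stop processing.
            pattern = rule[2:] if rule.startswith("-|") else rule
            if re.fullmatch(pattern, url):
                return None
    # Assumption: a link that no rule matched is kept.
    return url


RULES = [
    r"s|_thumb\.jpg|.jpg|",
    r"-|.*?\.jpg\?viewer.*?",
    r"+|.*?pic-host\.com.*?\.jpg$",
    r".+?\.jpg",
    r".+?\.jpeg",
    r".+?\.gif",
    r".+?\.png",
]

for link in [
    "http://pic-host.com/a/cat_thumb.jpg",
    "http://pic-host.com/view/cat.jpg?viewer=1",
    "http://other.example/logo.png",
]:
    print(link, "->", apply_rules(RULES, link))
```

Order matters here: the substitution runs first so that the `+|` rule sees the rewritten image URL, and the `+|` rule sits above the generic graphics filters so that images from the wanted site are accepted before those filters can drop them.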


Related issues

Related to Feature #1114: Bypass Linkgrabber Closed 12/22/2009
Related to Feature #2459: Additional LinkGrabber Filter Option: Refilter after lookup Closed 09/27/2010
