Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 3, 2018

How-To Defeat Facebook “Next or Next Page” Links

Filed under: Bash,Facebook — Patrick Durusau @ 10:09 pm

Not you, of course, but friends of yours are lured in by “click-bait” images to Facebook pages with “Next” or “Next Page” links. Like this one:

60 Groovy Photos You Probably Haven’t Seen Before

You can, depending on the speed of your connection and browser, follow each link. That’s tiresome and leaves you awash in ads for every page.

Here’s a start on a simple method to defeat such links.

First, if you follow the first link (they vary from site to site), you find:

http://groovyhistory.com/60-groovy-photos-you-probably-havent-seen-before/2

So we know from that URL that we need to increment the 2, up to and including 60, to access all the relevant pages.
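Before fetching anything, it's worth confirming the loop produces the expected addresses. A quick sketch, assuming the same base URL as above:

```shell
# Sanity check: print the page URLs we intend to fetch.
# Pages run from /2 up to and including /60; the first page has no number.
base="http://groovyhistory.com/60-groovy-photos-you-probably-havent-seen-before"
for i in $(seq 2 60); do
  echo "$base/$i"
done
```

That prints 59 URLs, from `$base/2` through `$base/60`, which matches the pattern observed in the link above.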

If we do view source (CTRL-U), we find:

<div class='gallery-image'>
<img src='http://cdn.groovyhistory.com/content/50466/90669810f510ad0de494a9b55c1f67d2.jpg'
class='img-responsive' alt='' /></div>

We need to extract each image whose parent div has class='gallery-image' and write it to a file suitable for display.

I hacked out this quick one liner to do the deed:

echo "<html><head></head><body>" > pics.html;for i in `seq 2 60`;do wget -U Mozilla -q "http://groovyhistory.com/60-groovy-photos-you-probably-havent-seen-before/$i" -O - | grep gallery >> pics.html;done;echo "</body></html>" >> pics.html

Breaking the one-liner into steps:

  1. echo "<html><head></head><body>" > pics.html

    Creates the HTML file pics.html and inserts markup down to the open body element.

  2. for i in `seq 2 60`

    Creates the loop and the variable i, which takes the values 2 through 60 and is used in the next step to build each page URL.

  3. do wget -U Mozilla -q "http://groovyhistory.com/60-groovy-photos-you-probably-havent-seen-before/$i" -O -

    Begins the do loop, invokes wget, identifies it as Mozilla (-U Mozilla), suppresses messages (-q), gives the URL with the $i variable, and writes each fetched page to standard output (-O -).

  4. | grep gallery >> pics.html

    The | pipe sends the output of each page to grep, which searches for gallery; each line containing gallery is appended (>>) to pics.html. That continues until 60 is reached and the loop exits.

  5. done

    Closes the loop; the commands between do and done run once for each value of i.

  6. echo "</body></html>" >> pics.html

    After the loop exits, the closing body and html elements are appended to the pics.html file.

Each step, in the one-liner, is separated from the others with a semi-colon “;”.

I converted the entities back to markup and it ran, except that it didn’t pick up the first image, which sits on a page without an appended number.

To avoid hand-editing the script each time:

  • Pass the URL on the command line
  • Pass the number of pages on the command line
  • The text to grep for varies by host, so create a switch (case) statement that keys on the host
  • Accept the output file name as a command line option
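The bullets above can be sketched as a small script. Everything here is an assumption on my part: the script name, argument order, and the fallback grep pattern for unrecognized hosts are illustrative guesses, not a definitive implementation:

```shell
#!/bin/bash
# grab-gallery.sh (hypothetical name) -- generalized version of the one-liner.
# Usage: ./grab-gallery.sh BASE_URL PAGE_COUNT [OUTFILE]
url="$1"              # base URL, without a trailing page number
count="$2"            # highest page number to fetch
out="${3:-pics.html}" # output file; defaults to pics.html

# Switch statement keyed on the host: pick the grep pattern per site.
case "$url" in
  *groovyhistory.com*) pattern="gallery" ;;
  *)                   pattern="img" ;;   # fallback guess for other hosts
esac

echo "<html><head></head><body>" > "$out"
# The first page has no appended number, so fetch the bare URL first.
wget -U Mozilla -q "$url" -O - | grep "$pattern" >> "$out"
# Numbered pages start at 2.
for i in $(seq 2 "$count"); do
  wget -U Mozilla -q "$url/$i" -O - | grep "$pattern" >> "$out"
done
echo "</body></html>" >> "$out"
```

Fetching the bare URL before the loop also fixes the missing first image mentioned above.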

The next time you encounter “50 Famous Photo-Bombs,” “30 Celebs Now,” or “45 Unseen Beatles Pics,” a minute or two of editing even the crude version of this script will save you the time and tedium of loading advertisements.

Enjoy!
