Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 3, 2018

How-To Defeat Facebook “Next or Next Page” Links

Filed under: Bash,Facebook — Patrick Durusau @ 10:09 pm

Not you but friends of yours are lured in by “click-bait” images to Facebook pages with “Next or Next Page” links. Like this one:

60 Groovy Photos You Probably Haven’t Seen Before

You can, depending on the speed of your connection and browser, follow each link. That’s tiresome and leaves you awash in ads for every page.

Here’s a start on a simple method to defeat such links.

First, if you follow the first link (they vary from site to site), you find:

http://groovyhistory.com/60-groovy-photos-you-probably-havent-seen-before/2

So we know from that URL that we need to increment the 2, up to and including 60, to access all the relevant pages.

If we do view source (CTRL-U), we find:

<div class='gallery-image'>
<img src='http://cdn.groovyhistory.com/content/50466/90669810f510ad0de494a9b55c1f67d2.jpg'
class='img-responsive' alt='' /></div>

We need to extract each image whose parent div has class='gallery-image' and write it to a file suitable for display.
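A rough sketch of that extraction step on its own, run against a sample line rather than the live site. The function name extract_gallery_imgs is made up for illustration, and the sketch assumes the div and its img arrive on the same line of the served HTML; if the site breaks them across lines, the first grep would need to change.

```shell
#!/bin/bash
# Hypothetical helper: read one page of HTML on stdin and print only the
# <img> tags found on lines that mention the gallery-image class.
# Assumption: the div and its img share a single line.
extract_gallery_imgs() {
    grep "gallery-image" | grep -o "<img[^>]*>"
}

# Sample line modeled on the markup shown above.
sample="<div class='gallery-image'><img src='http://cdn.groovyhistory.com/content/50466/90669810f510ad0de494a9b55c1f67d2.jpg' class='img-responsive' alt='' /></div>"

printf '%s\n' "$sample" | extract_gallery_imgs
```

Keeping only the img tag, rather than the whole line, leaves a smaller pics.html and drops any stray markup that happens to share the line.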

I hacked out this quick one liner to do the deed:

echo "<html><head></head><body>" > pics.html;for i in `seq -w 1 60`;do wget -U Mozilla -q "http://groovyhistory.com/60-groovy-photos-you-probably-havent-seen-before/$i" -O - | grep gallery >> pics.html;done;echo "</body></html>" >> pics.html

Breaking the one-liner into steps:

  1. echo "<html><head></head><body>" > pics.html.

    Creates the HTML file pics.html and inserts markup down to the open body element.

  2. for i in `seq -w 1 60`.

    Creates the loop and the variable i, which is used in the next step to create the following URLs.

  3. do wget -U Mozilla -q "http://groovyhistory.com/60-groovy-photos-you-probably-havent-seen-before/$i" -O - .

    Begins the loop body, invokes wget, identifies it as Mozilla (-U Mozilla), suppresses messages (-q), gives the URL with the $i variable, and writes each downloaded page to standard output (-O -).

  4. | grep gallery >> pics.html.

    The | pipe sends each downloaded page to grep, which searches for gallery. Each line containing gallery is appended (>>) to pics.html.

  5. done

    Ends the loop body. The loop repeats until 60 is reached and then exits.

  6. echo "</body></html>" >> pics.html.

    After the loop exits, the closing body and html elements are appended to the pics.html file. With no commands left, the script exits.

Each step, in the one-liner, is separated from the others with a semi-colon “;”.

I converted the entities back to markup and it ran, except that it didn’t pick up the first image, which sits on a page without an appended number.

To avoid hand editing the script:

  • Pass URL at command line
  • Pass number of images on command line
  • Text to grep changes with host, so create switch statement that keys on host
  • Output file name as command line option
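One way those four items might fit together, as a sketch rather than a finished tool. The names pattern_for_host and build_gallery are hypothetical, the host table holds only the one site from this post, and the fallback pattern is a guess. It fetches the un-numbered first page before the numbered ones. Unlike the one-liner, it drops -w from seq, on the assumption that the site expects unpadded page numbers.

```shell
#!/bin/bash
# Sketch only: function names and the host table are illustrative.

# Pick the text to grep for, keyed on the host part of the URL.
pattern_for_host() {
    local host
    host=$(printf '%s\n' "$1" | sed -E 's#^[a-z]+://([^/]+).*#\1#')
    case "$host" in
        groovyhistory.com|www.groovyhistory.com) echo "gallery-image" ;;
        *) echo "img" ;;   # assumed fallback: keep any line mentioning img
    esac
}

# build_gallery URL COUNT OUTFILE
build_gallery() {
    local url=$1 count=$2 out=$3 pattern i
    pattern=$(pattern_for_host "$url")
    echo "<html><head></head><body>" > "$out"
    # The first page has no number appended to its URL.
    wget -U Mozilla -q "$url" -O - | grep "$pattern" >> "$out"
    # Remaining pages are URL/2 through URL/COUNT.
    for i in $(seq 2 "$count"); do
        wget -U Mozilla -q "$url/$i" -O - | grep "$pattern" >> "$out"
    done
    echo "</body></html>" >> "$out"
}

# Run only when all three arguments are supplied on the command line.
if [ "$#" -eq 3 ]; then
    build_gallery "$1" "$2" "$3"
fi
```

Saved as, say, gallery.sh (a made-up name), it would be invoked as: bash gallery.sh "http://groovyhistory.com/60-groovy-photos-you-probably-havent-seen-before" 60 pics.html. Supporting a new site means adding one line to the case statement.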

The next time you encounter “50 Famous Photo-Bombs,” “30 Celebs Now,” or “45 Unseen Beatles Pics,” a minute or two of editing even the crude version of this script will save you the time and tedium of loading advertisements.

Enjoy!

August 27, 2014

Unofficial Bash Strict Mode

Filed under: Bash,Shell Scripting — Patrick Durusau @ 7:26 pm

Use the Unofficial Bash Strict Mode (Unless You Looove Debugging) by Aaron Maxwell.

Let’s start with the punchline. Your bash scripts will be more robust, reliable and maintainable if you start them like this:

#!/bin/bash
set -euo pipefail
IFS=$'\n\t' 

I call this the unofficial bash strict mode. This causes bash to behave in a way that makes many classes of subtle bugs impossible. You’ll spend much less time debugging, and also avoid having unexpected complications in production.

There is a short-term downside: these settings make certain common bash idioms harder to work with. They all have simple workarounds, detailed below: jump to Issues & Solutions. But first, let’s look at what these obscure lines actually do.

The sort of thing you only hear about in rumors or stumble across in a Twitter feed.
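For a rough feel of what those three lines change, here is a small demonstration of my own, not taken from Maxwell's article. The function name strict_mode_demo and the variable undefined_name are invented for the example. Each setting is exercised in a child bash so that the deliberate failure stays contained.

```shell
#!/bin/bash
# Illustrative only: each case runs in a child bash so its failure
# does not stop this script.
strict_mode_demo() {
    # -e: exit at the first failing command, so the echo never runs.
    bash -c 'set -e; false; echo "still running"' \
        || echo "set -e: stopped after false"

    # -u: expanding an unset variable is an error (undefined_name is never set).
    bash -c 'set -u; echo "value: $undefined_name"' 2>/dev/null \
        || echo "set -u: unset variable rejected"

    # pipefail: the pipeline fails if any stage fails, not only the last.
    bash -c 'set -o pipefail; false | cat' \
        || echo "pipefail: failure in first stage reported"

    # IFS=$'\n\t': a space no longer splits words in unquoted expansions.
    local files="one file.txt"
    local IFS=$'\n\t'
    local f
    for f in $files; do
        echo "item: $f"
    done
}

strict_mode_demo
```

Each of the first three cases prints its message only because the child shell exited with an error. The last prints "item: one file.txt" as a single item, where the default IFS would have split it in two.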

Bookmark and share!

I first saw this in a tweet by Neil Saunders.

November 28, 2012

Bash One-Liners Explained (series)

Filed under: Bash,Data Mining,String Matching,Text Mining — Patrick Durusau @ 10:26 am

Bash One-Liners Explained by Peteris Krumins.

The series page for posts by Peteris Krumins on Bash one-liners.

So far:

One real advantage to Bash scripts is the lack of a graphical interface to get in the way.

That is a real advantage with “data” files and, many times, with “text” files as well.
