Archive for the ‘Linux OS’ Category

Intro to Low-Level Graphics on Linux – Impressing Spouse’s Family

Sunday, November 12th, 2017

Intro to Low-Level Graphics on Linux

From the webpage:

This tutorial attempts to explain a few of the possible methods that exist on Linux to access the graphics hardware from a low level. I am not talking about using Xlib instead of GTK+ or QT5, nor am I talking about using DirectFB, I want to go even lower than that; I’m talking about drawing graphics to the screen without needing any external dependencies; I’m talking about communicating directly with the Linux kernel. I will also provide information about programming for newer graphical systems (Wayland/Mir) even though those do not involve direct communication with the kernel drivers. The reason I want to provide this information in this tutorial is that even though their APIs are higher level, the programming techniques used in low-level graphics programming can easily be adapted to work with Wayland and Mir. Also, similar to fbdev and KMS/DRM APIs, good programming resources are hard to come by.

Most Linux systems actually provide a few different methods for drawing graphics to the screen; there are options. However, the problem is that documentation is basically non-existent. So, I would like to explain here what you need to know to get started.

Please note that this tutorial assumes you have a basic knowledge of C, this is not a beginner tutorial, this is for people who are interested in something like learning more about how Linux works, or about programming for embedded systems, or just doing weird experimental stuff for fun.

You can impress your spouse’s family this holiday season by writing C code for low-level graphics on Linux. They won’t know you are frantically typing in the example code, comments and all, and will be suitably impressed when it compiles.

The other reason to mention this is the presence of Linux on embedded systems, such as industrial controllers, monitoring equipment, and the like. The more comfortable you are with such systems, the easier they will be to explore.
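If you want a quick taste of the framebuffer device before committing to the tutorial, try this no-code sketch. It assumes your machine exposes /dev/fb0 and that you run it from a text console (Ctrl+Alt+F2 or so), not from inside X or Wayland:

sudo sh -c 'cat /dev/urandom > /dev/fb0'   # paint the screen with random noise
sudo sh -c 'cat /dev/zero > /dev/fb0'      # write zeroes to clear it back to black

Both commands will complain “no space left on device” once the framebuffer is full; that is expected.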

Enjoy!

Samba Flaw In Linux PCs

Thursday, May 25th, 2017

Samba Flaw Allows Hackers Access Thousands of Linux PCs Remotely

From the post:

A remote code execution vulnerability in Samba has potentially exposed a large number of Linux and UNIX machines to remote attackers. The code vulnerability (CVE-2017-7494) affects all machines running Samba versions from 3.5.0 (released in March 2010) onward, making it a 7-year-old flaw in the system.

Samba is software that runs on most of the operating systems used today, including Windows, UNIX, Linux, OpenVMS, and IBM System 390. As an open source reimplementation of the SMB (Server Message Block) networking protocol, Samba enables non-Windows operating systems like Mac OS X or GNU/Linux to share folders, printers, and files with Windows systems.

All affected machines can be remotely controlled by uploading a shared library to a writable share; a further request then causes the server to load and execute the code. This gives attackers remote access to Linux PCs, according to the advisory Samba published last Wednesday, May 24.

Cited but not linked:

The Rapid7 Community post in particular has good details.
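If you administer a Samba server, here is a quick sketch of what to do while you wait for a patched package. The workaround is the one I understand the advisory to recommend, so verify it against the advisory itself before relying on it:

smbd --version       # anything from 3.5.0 onward is in scope
# Interim workaround (per my reading of the advisory): add
#     nt pipe support = no
# to the [global] section of smb.conf, then restart smbd.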

Not likely a repeat of WannaCry. It’s hard to imagine NHS trusts running Linux.

😉

Continuous Unix commit history from 1970 until today

Thursday, December 29th, 2016

Continuous Unix commit history from 1970 until today

From the webpage:

The history and evolution of the Unix operating system is made available as a revision management repository, covering the period from its inception in 1970 as a 2.5 thousand line kernel and 26 commands, to 2016 as a widely-used 27 million line system. The 1.1GB repository contains about half a million commits and more than two thousand merges. The repository employs the Git system for its storage and is hosted on GitHub. It has been created by synthesizing with custom software 24 snapshots of systems developed at Bell Labs, the University of California at Berkeley, and the 386BSD team, two legacy repositories, and the modern repository of the open source FreeBSD system. In total, about one thousand individual contributors are identified, the early ones through primary research. The data set can be used for empirical research in software engineering, information systems, and software archaeology.

You can read more details about the contents, creation, and uses of this repository through this link.

Two repositories are associated with the project:

  • unix-history-repo is a repository representing a reconstructed version of the Unix history, based on the currently available data. This repository will often be automatically regenerated from scratch, so this is not a place to make contributions. To ensure replicability its users are encouraged to fork it or archive it.
  • unix-history-make is a repository containing code and metadata used to build the above repository. Contributions to this repository are welcomed.
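To poke around locally, a sketch of grabbing a copy (the GitHub path below is the one I believe the project uses; check the project page if it has moved):

git clone https://github.com/dspinellis/unix-history-repo
cd unix-history-repo
git tag | head              # releases are tagged; pick one and check it out
git log --oneline | tail    # the earliest commits carry 1970s dates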

Not everyone will find this exciting, but it rocks as a resource for:

empirical research in software engineering, information systems, and software archaeology

Need to think seriously about putting this on a low-end laptop and sealing it up in a Faraday cage.

Just in case. 😉

Things to learn about Linux

Tuesday, November 22nd, 2016

Things to learn about Linux

From the post:

I asked on Twitter today what Linux things people would like to know more about. I thought the replies were really cool so here’s a list (many of them could be discussed on any Unixy OS, some of them are Linux-specific)

I count forty-seven (47) entries on Julia’s list, which should keep you busy through any holiday!

Enjoy!

Status of the Kernel Self Protection Project

Monday, August 29th, 2016

Status of the Kernel Self Protection Project by Kees (“Case”) Cook.

Slides from the Linux Security Summit 2016.

Kernel Self Protection Project links:

kernel-hardening mailing list archive.

Kernel Self Protection Project – wiki page.

Kees’ review of bug classes provides a guide to searching for new bugs and capturing data about existing ones.

Enjoy!

PS: Motivation to participate in this project:

Every bug fix makes users safer from cybercriminals and incrementally diminishes government spying.

Linux debugging tools you’ll love: the zine

Saturday, August 27th, 2016

Linux debugging tools you’ll love: the zine by Julia Evans.

From the website:

There are a ton of amazing debugging tools for Linux that I love. strace! perf! tcpdump! wireshark! opensnoop! I think a lot of them aren’t as well-known as they should be, so I’m making a friendly zine explaining them.

Donate, subscribe (PDF or paper)!

If you follow Julia’s blog (http://jvns.ca) or twitter (@b0rk), you know what a treat the zine will be!

If you don’t (correct that error now), consider the following sample:

[sample page from the zine]

It’s possible there are better explanations than Julia’s, so if and when you see one, sing out!

Until then, get the zine!

Debugging

Tuesday, August 23rd, 2016

Julia Evans tweeted:

[screenshot of Julia Evans’ debugging tweet]

It’s been two days without another suggestion.

Considering Brendan D. Gregg’s homepage, do you have another suggestion?

Too rich of a resource to not write down.

Besides, for some subjects and their relationships, you need specialized tooling to see them.

Not to mention that if you can spot patterns in subjects, detecting an unknown 0-day may be easier.

Of course, you can leave USB sticks at popular eateries near Fort Meade, MD 20755-6248, but some people prefer to work for their 0-day exploits.

😉

TLDR pages [Explanation and Example Practice]

Friday, August 19th, 2016

TLDR pages

From the webpage:

The TLDR pages are a community effort to simplify the beloved man pages with practical examples.

Try the live demo below, have a look at the pdf version, or follow the installing instructions.

Be sure to read the Contributing guidelines.
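A minimal sketch of the pages in use, assuming the Node.js client and an existing npm install:

npm install -g tldr    # one of several community clients
tldr tar               # a handful of practical examples instead of the full man page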

I checked and ngrep isn’t there. 🙁

Well, ngrep only has thirty (30) options and switches before you reach <match expression> and <bpf filter>, so how much demand could there be for examples?

😉

Great opportunity to practice your skills at explanation and creating examples.

NGREP – Julia Evans

Sunday, July 31st, 2016

Julia Evans demonstrates how to get around the limits of Twitter and introduces you to a “starter network spy tool.”

[screenshot of Julia Evans’ ngrep explanation]

A demonstration of her writing skills as well!

Ngrep at sourceforge.

Installing on Ubuntu 14.04:

sudo apt-get update
sudo apt-get install ngrep

I’m a follower of Julia’s but even so, I checked the man page for ngrep before running the example.

The command:

sudo ngrep -d any metafilter

is interpreted:

sudo – runs ngrep as superuser (hence my caution)

ngrep – network grep

-d any – ngrep listens to “any” interface *

metafilter – match expression, packets that match are dumped.

* The “any” value following -d was the hardest value to track down. The man page for ngrep describes the -d switch this way:

-d dev

By default ngrep will select a default interface to listen on. Use this option to force ngrep to listen on interface dev.

Well, that’s less than helpful. 😉

Until you discover on the tcpdump man page:

–interface=interface
Listen on interface. If unspecified, tcpdump searches the system interface list for the lowest numbered, configured up interface (excluding loopback), which may turn out to be, for example, “eth0”.
On Linux systems with 2.2 or later kernels, an interface argument of “any” can be used to capture packets from all interfaces. Note that captures on the “any” device will not be done in promiscuous mode. (bold highlight added)

If you are running a Linux system with a 2.2 or later kernel, you can use “any” as the argument to ngrep’s -d (interface) switch.

Understanding the entire command, I then felt safe running it as root. 😉 Not that I expected a bad outcome but I learned something in the process of researching the command.

Be aware that ngrep has a plethora of switches, options, BPF filters (Berkeley packet filters) and the like. The man page runs to eight pages of, well, man-page-type material.
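As a small hedged illustration of what hides behind those switches (adjust the interface and filter to taste):

sudo ngrep -d any -q -W byline "metafilter" tcp port 80
# -q drops the “#” hash marks, -W byline keeps HTTP headers readable,
# and the trailing BPF filter restricts the capture to TCP port 80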

Enjoy!

New Linux Journal Subscription Benefit!

Tuesday, July 12th, 2016

Benefits of a Linux Journal subscription you already know:

  1. “Linux Journal, currently celebrating its 20th year of publication, is the original magazine of the global Linux community, delivering readers the advice and inspiration they need to get the most out of their Linux systems.”
  2. $29.50 (US) buys 12 issues and access to the Linux Journal archive.
  3. Linux Journal has regular columns written by Mick Bauer, Reuven Lerner, Dave Taylor, Kyle Rankin, Bill Childers, John Knight, James Gray, Zack Brown, Shawn Powers and Doc Searls.
  4. For more see the Linux Journal FAQ.

Now there is a new Linux Journal subscription benefit:

You are flagged as an extremist by the NSA

NSA Labels Linux Journal Readers and TOR and TAILS Users as Extremists by Dave Palmer.

End the constant worry, nagging anxiety, endless arguments with friends about who is being tracked by the NSA! For the small sum of $29.50 (US) you can buy your way into the surveillance list at the NSA.

I can’t think of a cheaper way to get on a watch list, unless you send threatening letters to the U.S. President, which is a crime, so don’t do it.

Step up and assume the mantle of “extremist” in the eyes of the NSA.

You would be hard pressed to find better company.

PS: Being noticed may not seem like a good idea. But the bigger the NSA haystack, the safer all needles will be.

Sketch of strace and tcpdump

Friday, May 6th, 2016

A workshop on strace & tcpdump by Julia Evans.

From the post:

This week at work, I ran a workshop on tcpdump and strace. a couple of people on Twitter asked about it so here are some notes. This is mostly just so I can reuse them more easily next time, but maybe you will also find it interesting. The notes are a bit sparse.

I basically did a bunch of live demos of how to use tcpdump & strace, and then took questions & comments as people had them. I ran it in an hour, which I think was fine for people who already had some familiarity with the tools, but really aggressive if you’re learning from scratch. Will do that differently next time.

As Julia says, the notes are rather sparse but you could expand them to make the presentation your own.
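If you want seed material for a version of your own, here are a couple of starter demos in the same spirit (the URL is only a placeholder):

strace -f -e trace=network curl -s https://example.com > /dev/null
# -f follows child processes; -e trace=network shows only the socket-related system calls
sudo tcpdump -n -i any port 53
# watch the DNS lookups the command above triggers, without resolving addresses back to names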

Good reminder that reports from tools are just that, reports from tools.

If you aren’t close to the metal, you are taking a tool’s word for messages and system state.

Do you trust your tools that much?

UNIX, Bi-Grams, Tri-Grams, and Topic Modeling

Sunday, April 17th, 2016

UNIX, Bi-Grams, Tri-Grams, and Topic Modeling by Greg Brown.

From the post:

I’ve built up a list of UNIX commands over the years for doing basic text analysis on written language. I’ve built this list from a number of sources (Jim Martin‘s NLP class, StackOverflow, web searches), but haven’t seen it much in one place. With these commands I can analyze everything from log files to user poll responses.

Mostly this just comes down to how cool UNIX commands are (which you probably already know). But the magic is how you mix them together. Hopefully you find these recipes useful. I’m always looking for more so please drop into the comments to tell me what I’m missing.

For all of these examples I assume that you are analyzing a series of user responses with one response per line in a single file: data.txt. With a few cut and paste commands I often apply the same methods to CSV files and log files.
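To give the flavor of the recipes Greg collects, here is a sketch of a bi-gram counter (not his exact commands; the process substitution assumes bash):

tr -sc 'A-Za-z' '\n' < data.txt > words.txt
# one word per line: -c complements the letter set, -s squeezes runs of delimiters
paste -d' ' words.txt <(tail -n +2 words.txt) | sort | uniq -c | sort -rn | head
# pair each word with its successor, then count and rank the bi-grams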

My favorite comment on this post was a reader who extended the tri-gram generator to build a hexagram!

If that sounds unreasonable, you haven’t read very many government reports. 😉

While you are at Greg’s blog, notice a number of useful posts on Elasticsearch.

Linux System Calls – Linux/Mac/Windows

Tuesday, April 5th, 2016

Well, not quite yet but closer than it has been in the past!

The Definitive Guide to Linux System Calls.

From the post:

This blog post explains how Linux programs call functions in the Linux kernel.

It will outline several different methods of making system calls, how to handcraft your own assembly to make system calls (examples included), kernel entry points into system calls, kernel exit points from system calls, glibc wrappers, bugs, and much, much more.
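The guide works from the calling side; to watch the same boundary from the outside, strace is the quick sketch:

strace -e trace=write echo hello
# each line of output is a system call crossing into the kernel, e.g. write(1, "hello\n", 6) = 6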

The only downside of the movement towards Linux is that its kernel, etc., will get much heavier scrutiny than in the past.

In the past, why bother with stronger code in a smaller market share?

Move Linux into a much larger market share and we may get to see if “…to many eyes all bugs are shallow.”

As an empirical matter, not just cant.

You Can Master the Z Shell (Pointer to How-To)

Friday, March 4th, 2016

Cutting through the toxic atmosphere created by governments around the world requires the sharpest tools and developing skill at using them.

A Unix shell is like a switchblade knife: not for every job, but if you need immediate results, it’s hard to beat. While you are opening an application, loading files, finding appropriate settings, etc., a quick shell command can have you on your way.

Nacho Caballero writes in Master Your Z Shell with These Outrageously Useful Tips:

If you had previously installed Zsh but never got around to exploring all of its magic features, this post is for you.

If you never thought of using a different shell than the one that came by default when you got your computer, I recommend you go out and check the Z shell. Here are some Linux guides that explain how to install it and set it as your default shell. You probably have Zsh installed if you are on a Mac, but there’s nothing like the warm fuzzy feeling of running the latest version (here’s a way to upgrade using Homebrew).

The Zsh manual is a daunting beast. Just the chapter on expansions has 32 subsections. Forget about memorizing this madness in one sitting. Instead, we’ll focus on understanding a few useful concepts, and referencing the manual for additional help.

The three main sections of this post are file picking, variable transformations, and magic tabbing. If you’re pressed for time, read the beginning of each one, and come back later to soak up the details (make sure you stick around for the bonus tips at the end). (emphasis in original)
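A few of the tricks the post covers, as a hedged sample (zsh syntax; the names are placeholders):

ls **/*.log            # recursive globbing: every .log file anywhere below the current directory
ls *(.m-7)             # glob qualifiers: plain files (.) modified within the last 7 days (m-7)
print -l ${files:r}    # variable transformation: strip the extension from every element of the array $files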

Would-be authors/editors, want to try your hand at the chapter on expansions? Looking at the documentation for Zsh version 5.2, released December 2, 2015, there are 25 numbered subsections under chapter 14, Expansion.

You will be impressed by the number of modifiers/operators available. If you do write a manual for expansions in Zsh, do distribute it widely.

I hope it doesn’t get overlooked by being included here, but Nacho also wrote AWK GTF! How to Analyze a Transcriptome Like a Pro – Part 1 (2 and 3). Awk is another switchblade-like tool for your toolkit.

I first saw this in a tweet by Christophe Lalanne.

Do one thing…

Monday, November 2nd, 2015

Do one thing… I don’t want barely distinguishable tools that are mediocre at everything; I want tools that do one thing and do it well. by Mike Loukides.

From the post:

I’ve been lamenting the demise of the Unix philosophy: tools should do one thing, and do it well. The ability to connect many small tools is better than having a single tool that does everything poorly.

That philosophy was great, but hasn’t survived into the Web age. Unfortunately, nothing better has come along to replace it. Instead, we have “convergence”: a lot of tools converging on doing all the same things poorly.

The poster child for this blight is Evernote. I started using Evernote because it did an excellent job of solving one problem. I’d take notes at a conference or a meeting, or add someone to my phone list, and have to distribute those files by hand from my laptop to my desktop, to my tablets, to my phone, and to any and all other machines that I might use.

Mike takes a stick to Evernote, Gmail, Google Maps, Skype, Twitter, Flickr, Dropbox (insert your list of non-single purpose tools here), etc.

Then he offers a critical insight about web applications:

…There’s no good way to connect one Web application to another. Therefore, everything tends to be monolithic; and in a world of monolithic apps, everyone wants to build their own garden, inevitably with all the features that are in all the other gardens.

Mike mentions IFTTT, which connects web services but wants something a bit more generic.

I think of IFTTT as walkways between a designated set of walled gardens. Useful for traveling between walled gardens but not anything else.

Mike concludes:

I don’t want anyone’s walled garden. I’ve seen what’s inside the walls, and it isn’t a palace; it’s a tenement. I don’t want barely distinguishable tools that are mediocre at everything. I want tools that do one thing, and do it well. And that can be connected to each other to build powerful tools.

What single purpose tool are you developing?

How will it interact with other single purpose tools?

Linux on the Mainframe

Monday, August 24th, 2015

Linux Foundation Launches Open Mainframe Project to Advance Linux on the Mainframe

From the post:

The Linux Foundation, the nonprofit organization dedicated to accelerating the growth of Linux and collaborative development, announced the Open Mainframe Project. This initiative brings together industry experts to drive innovation and development of Linux on the mainframe.

Founding Platinum members of the Open Mainframe Project include ADP, CA Technologies, IBM and SUSE. Founding Silver members include BMC, Compuware, LC3, RSM Partners and Vicom Infinity. The first academic institutions participating in the effort include Marist College, University of Bedfordshire and The Center for Information Assurance and Cybersecurity at University of Washington. The announcement comes as the industry marks 15 years of Linux on the mainframe.

In just the last few years, demand for mainframe capabilities has drastically increased due to Big Data, mobile processing, cloud computing and virtualization. Linux excels in all these areas, often being recognized as the operating system of the cloud and for advancing the most complex technologies across data, mobile and virtualized environments. Linux on the mainframe today has reached a critical mass such that vendors, users and academia need a neutral forum to work together to advance Linux tools and technologies and increase enterprise innovation.

“Linux today is the fastest growing operating system in the world. As mobile and cloud computing become globally pervasive, new levels of speed and efficiency are required in the enterprise and Linux on the mainframe is poised to deliver,” said Jim Zemlin executive director at The Linux Foundation. “The Open Mainframe Project will bring the best technology leaders together to work on Linux and advanced technologies from across the IT industry and academia to advance the most complex enterprise operations of our time.”

For Linux Foundation Collaborative Projects, visit: http://collabprojects.linuxfoundation.org/

For the Open Mainframe Project, visit: https://www.openmainframeproject.org/

In terms of ancient topic map history, recall that both topic maps and DocBook arose out of what became the X-Windows series by O’Reilly. If you are familiar with the series, you can imagine the difficulty of adapting it to the nuances of different vendor releases and vocabularies.

Several of the volumes from the X-Windows series are available in the O’Reilly OpenBook Project.

I mention that item of topic map history because documenting mainframe Linux isn’t going to be a trivial task. A useful index across documentation from multiple authors is going to require topic maps or something very close to it.

One last bit of trivia: the X-Windows project can be found at www.x.org. How’s that for cool? A single-letter name.

Unix™ for Poets

Wednesday, July 29th, 2015

Unix™ for Poets by Kenneth Ward Church.

A very delightful take on using basic Unix tools for text processing.

Exercises cover:

1. Count words in a text

2. Sort a list of words in various ways

  • ascii order
  • dictionary order
  • “rhyming” order

3. Extract useful info from a dictionary

4. Compute ngram statistics

5. Make a Concordance

Fifty-three (53) pages of pure Unix joy!
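Exercise 2’s “rhyming” order gives the flavor; a sketch, with wordlist.txt standing in for whatever one-word-per-line file you built in exercise 1:

rev < wordlist.txt | sort | rev
# reverse each word, sort, reverse back: words now group by their endings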

Enjoy!

Linux still rules supercomputing

Saturday, July 18th, 2015

Linux still rules supercomputing by Steven J. Vaughan-Nichols.

Cutting to the chase, 486 of the top 500 supercomputers are running Linux.

You already knew that. You’re reading this post on a Linux box. 😉

Now if we can just get it routinely onto desktops!

Malware’s Most Wanted: Linux and Internet of Things Malware? (webinar)

Wednesday, May 6th, 2015

Malware’s Most Wanted: Linux and Internet of Things Malware?

From the description:

Speaker: Marion Marschalek, Security Researcher of Cyphort Labs
Date and Time: Thursday, May 28, 2015 9:00 AM PDT

Occasionally we see samples coming out of our pipe which do not fit with the stream of malware, such as clickjackers, banking Trojans and spybots. These exotic creatures are dedicated to target platforms other than the Windows operating system. While they make up for a significantly smaller portion than the load of Windows malware, Cyphort labs has registered a rise in Linux and Internet of Things Malware (IoT) malware. A number of different families has been seen. But what is their level of sophistication and the associated risk? This webinar provides an overview of Linux and IoT malware that Cyphort labs has spotted in the wild and gives an insight into the development of these threats and the direction they are taking. Attendees may opt in to receive a special edition t-shirt.

I haven’t seen a Cyphort webinar so I am taking a chance on this one.

Enjoy!

KDE and The Semantic Desktop

Saturday, March 14th, 2015

KDE and The Semantic Desktop by Vishesh Handa.

From the post:

During the KDE4 years the Semantic Desktop was one of the main pillars of KDE. Nepomuk was massive, all-encompassing, and integrated with many different parts of KDE. However, few people know what the Semantic Desktop was all about, and where KDE is heading.

History

The Semantic Desktop as it was originally envisioned comprised both the technology and the philosophy behind the Semantic Web.

The Semantic Web is built on top of RDF and Graphs. This is a special way of storing data which focuses more on understanding what the data represents. This was primarily done by carefully annotating what everything means, starting with the definition of a resource, a property, a class, a thing, etc.

This process of all data being stored as RDF, having a central store, with applications respecting the store and following the ontologies was central to the idea of the Semantic Desktop.

The Semantic Desktop cannot exist without RDF. It is, for all intents and purposes, what the term “semantic” implies.

A brief post-mortem on the KDE Semantic Desktop which relied upon NEPOMUK (Networked Environment for Personal, Ontology-based Management of Unified Knowledge) for RDF-based features. (NEPOMUK was an EU project.)

The post mentions complexity more than once. A friend recently observed that RDF was all about supporting AI and not capturing arbitrary statements by a user.

Such as providing alternative identifiers for subjects. With enough alternative identifications (including context, which “scope” partially captures in topic maps), I suspect a deep learning application could do pretty well at subject recognition, including appropriate relationships (associations).

But that would not be by trying to guess or formulate formal rules (a la RDF/OWL) but by capturing the activities of users as they provide alternative identifications of and relationships for subjects.

Hmmm, merging then would be a learned behavior by our applications. Will have to give that some serious thought!

I first saw this in a tweet by Stefano Bertolo.

Root Linux Via DRAM

Tuesday, March 10th, 2015

Ouch! Google crocks capacitors and deviates DRAM to root Linux by Iain Thomson.

From the post:


Last summer Google gathered a bunch of leet [elite] security researchers as its Project Zero team and instructed them to find unusual zero-day flaws. They’ve had plenty of success on the software front – but on Monday announced a hardware hack that’s a real doozy.

The technique, dubbed “rowhammer”, rapidly writes and rewrites memory to force capacitor errors in DRAM, which can be exploited to gain control of the system. By repeatedly recharging one line of RAM cells, bits in an adjacent line can be altered, thus corrupting the data stored.

This corruption can lead to the wrong instructions being executed, or control structures that govern how memory is assigned to programs being altered – the latter case can be used by a normal program to gain kernel-level privileges.

The “rowhammer” routines are something to consider adding to your keychain USB (Edward Snowden) or fake Lady Gaga CD (writeable media) (Private Manning), in case you become curious about the security of a networked environment.

Iain’s post is suitable for passing on to middle-level worriers, but if you need to read the details, consider:

Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors: the paper on rowhammer by Yoongu Kim et al.

Program for testing for the DRAM “rowhammer” problem: Google’s GitHub repository on rowhammer.

Rowhammer Discuss (mailing list): Google’s mailing list for discussion of rowhammer.

The Linux faithful turned out to comment that the problem is in hardware and all operating systems are vulnerable. That is obvious from “hardware hack” and “rapidly writes and rewrites memory to force capacitor errors in DRAM.” But you do have to read more than the title to get that information.

Windows-based spies are waiting for someone to write a rowhammer application with a Windows installer, so I don’t think the title is necessarily unfair to Linux. Personally I would just use a USB-based Linux OS to reboot a Windows machine. I don’t know if there is a “looks like MS Windows” interface for Linux or not. So long as you weren’t too productive, that could cover the fact you are not running Windows.

BTW, Iain, unlike many writers, included hyperlinks to non-local resources on rowhammer. That is how the Web is supposed to work. Favor the work of Iain and others like Iain if you want a better Web.

clf – Command line tool to search snippets on Commandlinefu.com

Sunday, March 8th, 2015

clf – Command line tool to search snippets on Commandlinefu.com by Nicolas Crocfer.

From the webpage:

Commandlinefu.com is the place to record awesome command-line snippets. This tool allows you to search and view the results in your terminal.
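A minimal sketch of it in use, assuming the client installs from PyPI under the same name (check the project page first):

pip install clf
clf "extract tar"      # pull matching Commandlinefu snippets straight into the terminal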

What a very clever idea!

Imagine if all the sed/awk scripts were collected from various archive sites, deduped and made searchable via such an interface!

Enjoy!

Spark: Parse CSV file and group by column value

Sunday, November 16th, 2014

Spark: Parse CSV file and group by column value by Mark Needham.

Mark parses a 1GB file that details 4 million crimes from the City of Chicago.

And he does it two ways: Using Unix and Spark.

Results? One way took more than 2 minutes, the other way, less than 10 seconds.

Place your bets with office staff and then visit Mark’s post for the results.
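For a feel of the Unix side of the bet, a sketch (not Mark’s pipeline) that assumes the crime’s primary type sits in the sixth comma-separated field and contains no embedded commas:

awk -F, '{count[$6]++} END {for (t in count) print count[t], t}' crimes.csv | sort -rn | head
# group by column six, count, then rank the crime types by frequency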

Spark for Data Science: A Case Study

Thursday, November 13th, 2014

Spark for Data Science: A Case Study by Casey Stella.

From the post:

I’m a pretty heavy Unix user and I tend to prefer doing things the Unix Way™, which is to say, composing many small command line oriented utilities. With composability comes power and with specialization comes simplicity. Although, sometimes if two utilities are used all the time, it makes sense for either:

  • A utility that specializes in a very common use-case
  • One utility to provide basic functionality from another utility

For example, one thing that I find myself doing a lot of is searching a directory recursively for files that contain an expression:

find /path/to/root -exec grep -l "search phrase" {} \;

Despite the fact that you can do this, specialized utilities, such as ack have come up to simplify this style of querying. Turns out, there’s also power in not having to consult the man pages all the time. Another example, is the interaction between uniq and sort. uniq presumes sorted data. Of course, you need not sort your data using the Unix utility sort, but often you find yourself with a flow such as this:

sort filename.dat | uniq > uniq.dat

This is so common that a -u flag was added to sort to support this flow, like so:

sort -u filename.dat > uniq.dat

Now, obviously, uniq has utilities beyond simply providing distinct output from a stream, such as providing counts for each distinct occurrence. Even so, it’s nice for the situation where you don’t need the full power of uniq for the minimal functionality of uniq to be a part of sort. These simple motivating examples got me thinking:

  • Are there opportunities for folding another command’s basic functionality into another command as a feature (or flag) as in sort and uniq?
  • Can we answer the above question in a principled, data-driven way?

This sounds like a great challenge and an even greater opportunity to try out a new (to me) analytics platform, Apache Spark. So, I’m going to take you through a little journey doing some simple analysis and illustrate the general steps. We’re going to cover

  1. Data Gathering
  2. Data Engineering
  3. Data Analysis
  4. Presentation of Results and Conclusions

We’ll close with my impressions of using Spark as an analytics platform. Hope you enjoy!

All of that is just the setup for a very cool walk through a data analysis example with Spark.
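The data-gathering step is the fun part. As a crude stand-in for it, you can put the same kind of question to your own shell history (assumes bash and the default history file):

grep -cE 'sort [^|]*[|][ ]*uniq' ~/.bash_history
# how often do I pipe sort straight into uniq?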

Enjoy!

You can be a kernel hacker!

Friday, September 19th, 2014

You can be a kernel hacker! by Julia Evans.

From the post:

When I started Hacker School, I wanted to learn how the Linux kernel works. I’d been using Linux for ten years, but I still didn’t understand very well what my kernel did. While there, I found out that:

  • the Linux kernel source code isn’t all totally impossible to understand
  • kernel programming is not just for wizards, it can also be for me!
  • systems programming is REALLY INTERESTING
  • I could write toy kernel modules, for fun!
  • and, most surprisingly of all, all of this stuff was useful.

I hadn’t been doing low level programming at all – I’d written a little bit of C in university, and otherwise had been doing web development and machine learning. But it turned out that my newfound operating systems knowledge helped me solve regular programming tasks more easily.
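The load/unload cycle for a toy module looks roughly like this; hello.ko is hypothetical, something you have already built against your running kernel’s headers:

sudo insmod hello.ko     # load the module
dmesg | tail -n 5        # its printk() output lands in the kernel log
lsmod | grep hello       # confirm it is resident
sudo rmmod hello         # unload it again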

Post by the same name as her presentation at Strange Loop 2014.

Another reason to study the Linux kernel: The closer to the metal your understanding, the more power you have over the results.

That’s true for the Linux kernel, machine learning algorithms, NLP, etc.

You can have a canned result prepared by someone else, which may be good enough, or you can bake something more to your liking.

I first saw this in a tweet by Felienne Hermans.

Update: Video of You can be a kernel hacker!

How is a binary executable organized? Let’s explore it!

Wednesday, September 10th, 2014

How is a binary executable organized? Let’s explore it! by Julia Evans.

From the post:

I used to think that executables were totally impenetrable. I’d compile a C program, and then that was it! I had a Magical Binary Executable that I could no longer read.

It is not so! Executable file formats are regular file formats that you can understand. I’ll explain some simple tools to start! We’ll be working on Linux, with ELF binaries. (binaries are kind of the definition of platform-specific, so this is all platform-specific.) We’ll be using C, but you could just as easily look at output from any compiled language.
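A starter toolkit in the same spirit, where hello.c is any small C file you have handy:

cc -o hello hello.c
file hello               # identifies the ELF class, architecture and linkage
readelf -h hello         # the ELF header: type, entry point, section and program header counts
readelf -S hello         # the section table: .text, .data, .rodata and friends
objdump -d hello | less  # disassembly of the machine code in .text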

I’ll be the first to admit that following Julia’s blog too closely carries the risk of changing you into a *nix kernel hacker.

I get a UTF-8 encoding error from her RSS feed so I have to follow her posts manually. Maybe the only thing that has saved me thus far. 😉

Seriously, Julia’s posts help you expand your knowledge of what is on other side of the screen.

Enjoy!

PS: Julia is demonstrating a world of subjects that are largely unknown to the casual user. Not looking for a subject does not protect you from a defect in that subject.

50 UNIX Commands

Thursday, August 28th, 2014

50 Most Frequently Used UNIX / Linux Commands (With Examples) by Ramesh Natarajan.

From the post:

This article provides practical examples for 50 most frequently used commands in Linux / UNIX.

This is not a comprehensive list by any means, but this should give you a jumpstart on some of the common Linux commands. Bookmark this article for your future reference.

Nothing new but handy if someone asks for guidance on basic Unix commands. Sending this list might save you some time.

Or, if you are a recruiter, edit out the examples and ask for an example of using each command. 😉

I first saw this in a tweet by Lincoln Mullen.

Cool Unix Tools (Is There Another Kind?)

Wednesday, August 13th, 2014

A little collection of cool unix terminal/console/curses tools by Kristof Kovacs.

From the webpage:

Just a list of 20 (now 28) tools for the command line. Some are little-known, some are just too useful to miss, some are pure obscure — I hope you find something useful that you weren’t aware of yet! Use your operating system’s package manager to install most of them. (Thanks for the tips, everybody!)

Great list, some familiar, some not.

I first saw the path to this in a tweet by Christophe Lalanne.

Bio-Linux 8 – Released July 2014

Thursday, July 31st, 2014

Bio-Linux 8 – Released July 2014

About Bio-Linux:

Bio-Linux 8 is a powerful, free bioinformatics workstation platform that can be installed on anything from a laptop to a large server, or run as a virtual machine. Bio-Linux 8 adds more than 250 bioinformatics packages to an Ubuntu Linux 14.04 LTS base, providing around 50 graphical applications and several hundred command line tools. The Galaxy environment for browser-based data analysis and workflow construction is also incorporated in Bio-Linux 8.

Bio-Linux 8 represents the continued commitment of NERC to maintain the platform, and comes with many updated and additional tools and libraries. With this release we support pre-prepared VM images for use with VirtualBox, VMWare or Parallels. Virtualised Bio-Linux will power the EOS Cloud, which is in development for launch in 2015.

You can install Bio-Linux on your machine, either as the only operating system, or as part of a dual-boot set-up which allows you to use your current system and Bio-Linux on the same hardware.

Bio-Linux can also run Live from a DVD or a USB stick. This runs in the memory of your machine and does not involve installing anything. This is a great, no-hassle way to try out Bio-Linux, demonstrate or teach with it, or to work with it when you are on the move.

Bio-Linux is built on open source systems and software, and so is free to install and use. See What’s new on Bio-Linux 8. Also, check out the 2006 paper on Bio-Linux and open source systems for biologists.

Great news if you are handling biological data!

Not to mention that it’s a good example of multiple delivery methods: you can use Bio-Linux 8 as your OS, or run it from a VM, DVD, or USB stick.

How is your software delivered?

Spy On Your CPU

Wednesday, May 14th, 2014

I can spy on my CPU cycles with perf! by Julia Evans.

From the post:

Yesterday I talked about using perf to profile assembly instructions. Today I learned how to make flame graphs with perf and it is THE BEST. I found this because Graydon Hoare pointed me to Brendan Gregg’s excellent page on how to use perf.
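A hedged sketch of the workflow, assuming you have cloned Brendan Gregg’s FlameGraph scripts into the current directory and have a workload to profile:

sudo perf record -F 99 -g -- ./my_workload
# sample on-CPU stacks roughly 99 times a second, with call graphs (-g)
sudo perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flame.svg
# fold the samples and render an interactive SVG flame graph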

Julia is up to her elbows in her CPU.

You can throw hardware at a problem or you can tune the program you are running on hardware.

Julia’s posts are about the latter.