Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

November 13, 2014

Spark for Data Science: A Case Study

Filed under: Linux OS,Spark — Patrick Durusau @ 7:28 pm

Spark for Data Science: A Case Study by Casey Stella.

From the post:

I’m a pretty heavy Unix user and I tend to prefer doing things the Unix Way™, which is to say, composing many small command line oriented utilities. With composability comes power and with specialization comes simplicity. Although, if two utilities are used together all the time, it sometimes makes sense for either:

  • A utility that specializes in a very common use-case
  • One utility to provide basic functionality from another utility

For example, one thing that I find myself doing a lot of is searching a directory recursively for files that contain an expression:

find /path/to/root -exec grep -l "search phrase" {} \;

Despite the fact that you can do this, specialized utilities, such as ack, have sprung up to simplify this style of querying. Turns out, there’s also power in not having to consult the man pages all the time. Another example is the interaction between uniq and sort. uniq presumes sorted data. Of course, you need not sort your data using the Unix utility sort, but often you find yourself with a flow such as this:

sort filename.dat | uniq > uniq.dat

This is so common that a -u flag was added to sort to support this flow, like so:

sort -u filename.dat > uniq.dat

Now, obviously, uniq has uses beyond simply providing distinct output from a stream, such as providing counts for each distinct occurrence. Even so, for the situations where you only need the minimal functionality of uniq, it’s nice to have it be part of sort. These simple motivating examples got me thinking:

  • Are there opportunities for folding one command’s basic functionality into another command as a feature (or flag), as with sort and uniq?
  • Can we answer the above question in a principled, data-driven way?

This sounds like a great challenge and an even greater opportunity to try out a new (to me) analytics platform, Apache Spark. So, I’m going to take you through a little journey doing some simple analysis and illustrate the general steps. We’re going to cover

  1. Data Gathering
  2. Data Engineering
  3. Data Analysis
  4. Presentation of Results and Conclusions

We’ll close with my impressions of using Spark as an analytics platform. Hope you enjoy!

All of that is just the setup for a very cool walk through a data analysis example with Spark.
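
If you want the concrete command behind the counting use of uniq that Casey alludes to, the classic frequency one-liner looks like this (a sketch; filename.dat is a placeholder):

sort filename.dat | uniq -c | sort -rn | head

The second sort puts the most frequent lines first, which is usually what you wanted to know in the first place.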

Enjoy!

September 19, 2014

You can be a kernel hacker!

Filed under: Linux OS — Patrick Durusau @ 6:58 pm

You can be a kernel hacker! by Julia Evans.

From the post:

When I started Hacker School, I wanted to learn how the Linux kernel works. I’d been using Linux for ten years, but I still didn’t understand very well what my kernel did. While there, I found out that:

  • the Linux kernel source code isn’t all totally impossible to understand
  • kernel programming is not just for wizards, it can also be for me!
  • systems programming is REALLY INTERESTING
  • I could write toy kernel modules, for fun!
  • and, most surprisingly of all, all of this stuff was useful.

I hadn’t been doing low level programming at all – I’d written a little bit of C in university, and otherwise had been doing web development and machine learning. But it turned out that my newfound operating systems knowledge helped me solve regular programming tasks more easily.

This post shares its name with her presentation at Strange Loop 2014.

Another reason to study the Linux kernel: The closer to the metal your understanding, the more power you have over the results.

That’s true for the Linux kernel, machine learning algorithms, NLP, etc.

You can have a canned result prepared by someone else, which may be good enough, or you can bake something more to your liking.

I first saw this in a tweet by Felienne Hermans.

Update: Video of You can be a kernel hacker!

September 10, 2014

How is a binary executable organized? Let’s explore it!

Filed under: Linux OS,Programming — Patrick Durusau @ 4:48 pm

How is a binary executable organized? Let’s explore it! by Julia Evans.

From the post:

I used to think that executables were totally impenetrable. I’d compile a C program, and then that was it! I had a Magical Binary Executable that I could no longer read.

It is not so! Executable file formats are regular file formats that you can understand. I’ll explain some simple tools to start! We’ll be working on Linux, with ELF binaries. (binaries are kind of the definition of platform-specific, so this is all platform-specific.) We’ll be using C, but you could just as easily look at output from any compiled language.

I’ll be the first to admit that following Julia’s blog too closely carries the risk of changing you into a *nix kernel hacker.

I get a UTF-8 encoding error from her RSS feed so I have to follow her posts manually. Maybe the only thing that has saved me thus far. 😉

Seriously, Julia’s posts help you expand your knowledge of what is on the other side of the screen.
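
If you want to poke at an ELF binary before reading the post, the standard binutils tools are enough to get started (a sketch; hello.c is any small C program you have lying around):

gcc -o hello hello.c
file hello              # reports "ELF 64-bit LSB executable ..."
readelf -h hello        # the ELF header: magic, entry point, section counts
readelf -S hello        # section headers: .text, .data, .bss and friends
objdump -d hello | less # disassembly of the executable sections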

Enjoy!

PS: Julia is demonstrating a world of subjects that are largely unknown to the casual user. Not looking for a subject does not protect you from a defect in that subject.

August 28, 2014

50 UNIX Commands

Filed under: Linux OS — Patrick Durusau @ 2:42 pm

50 Most Frequently Used UNIX / Linux Commands (With Examples) by Ramesh Natarajan.

From the post:

This article provides practical examples for the 50 most frequently used commands in Linux / UNIX.

This is not a comprehensive list by any means, but this should give you a jumpstart on some of the common Linux commands. Bookmark this article for your future reference.

Nothing new but handy if someone asks for guidance on basic Unix commands. Sending this list might save you some time.

Or, if you are a recruiter, edit out the examples and ask for an example of using each command. 😉

I first saw this in a tweet by Lincoln Mullen.

August 13, 2014

Cool Unix Tools (Is There Another Kind?)

Filed under: Linux OS — Patrick Durusau @ 11:00 am

A little collection of cool unix terminal/console/curses tools by Kristof Kovacs.

From the webpage:

Just a list of 20 (now 28) tools for the command line. Some are little-known, some are just too useful to miss, some are pure obscure — I hope you find something useful that you weren’t aware of yet! Use your operating system’s package manager to install most of them. (Thanks for the tips, everybody!)

Great list, some familiar, some not.

I first saw the path to this in a tweet by Christophe Lalanne.

July 31, 2014

Bio-Linux 8 – Released July 2014

Filed under: Bio-Linux,Bioinformatics,Linux OS — Patrick Durusau @ 7:29 am

Bio-Linux 8 – Released July 2014

About Bio-Linux:

Bio-Linux 8 is a powerful, free bioinformatics workstation platform that can be installed on anything from a laptop to a large server, or run as a virtual machine. Bio-Linux 8 adds more than 250 bioinformatics packages to an Ubuntu Linux 14.04 LTS base, providing around 50 graphical applications and several hundred command line tools. The Galaxy environment for browser-based data analysis and workflow construction is also incorporated in Bio-Linux 8.

Bio-Linux 8 represents the continued commitment of NERC to maintain the platform, and comes with many updated and additional tools and libraries. With this release we support pre-prepared VM images for use with VirtualBox, VMWare or Parallels. Virtualised Bio-Linux will power the EOS Cloud, which is in development for launch in 2015.

You can install Bio-Linux on your machine, either as the only operating system, or as part of a dual-boot set-up which allows you to use your current system and Bio-Linux on the same hardware.

Bio-Linux can also run Live from a DVD or a USB stick. This runs in the memory of your machine and does not involve installing anything. This is a great, no-hassle way to try out Bio-Linux, demonstrate or teach with it, or to work with it when you are on the move.

Bio-Linux is built on open source systems and software, and so is free to install and use. See What’s new on Bio-Linux 8. Also, check out the 2006 paper on Bio-Linux and open source systems for biologists.

Great news if you are handling biological data!

Not to mention that it is a good example of multiple delivery methods: you can use Bio-Linux 8 as your OS, or run it from a VM, DVD or USB stick.
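
For the USB route, the usual dd approach works; a sketch, with the ISO file name assumed and /dev/sdX standing in for your USB device (double-check the device name, dd will happily overwrite the wrong disk):

sudo dd if=bio-linux-8.iso of=/dev/sdX bs=4M && sync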

How is your software delivered?

May 14, 2014

Spy On Your CPU

Filed under: Linux OS,Performance,Programming — Patrick Durusau @ 3:45 pm

I can spy on my CPU cycles with perf! by Julia Evans.

From the post:

Yesterday I talked about using perf to profile assembly instructions. Today I learned how to make flame graphs with perf and it is THE BEST. I found this because Graydon Hoare pointed me to Brendan Gregg’s excellent page on how to use perf.

Julia is up to her elbows in her CPU.

You can throw hardware at a problem or you can tune the program you are running on hardware.

Julia’s posts are about the latter.
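
The basic recipe, for the curious (a sketch, assuming you have cloned Brendan Gregg’s FlameGraph scripts and with ./myprogram as a placeholder):

perf record -F 99 -g -- ./myprogram                                  # sample stacks at 99 Hz
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flame.svg  # fold stacks, render the graph

Open flame.svg in a browser and you can see at a glance where the cycles go.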

May 12, 2014

strace Wow Much Syscall

Filed under: Linux OS — Patrick Durusau @ 8:50 am

strace Wow Much Syscall by Brendan Gregg.

From the post:

I wouldn’t dare run strace(1) in production without seriously considering the consequences, and first trying the alternates. While it’s widely known (and continually rediscovered) that strace is an amazing tool, it’s much less known that it currently is – and always has been – dangerous.

strace is the system call tracer for Linux. It currently uses the arcane ptrace() (process trace) debugging interface, which operates in a violent manner: pausing the target process for each syscall so that the debugger can read state. And doing this twice: when the syscall begins, and when it ends.

With strace, this means pausing the target application for every syscall, twice, and context-switching between the application and strace. It’s like putting traffic metering lights on your application.

A great guide to strace, including a handy set of strace one-liners, references, “How To Learn strace,” and other goodies.
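
Not one of Brendan’s one-liners, just a harmless illustration on a short-lived command rather than a production process:

strace -c ls > /dev/null    # -c prints a per-syscall count and time summary on exit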

If you are interested in *nix internals and the potential of topic maps for the same, this is a great post.

I first saw this in a post by Julia Evans.

May 10, 2014

Back to the future of databases

Filed under: Database,Linux OS,NoSQL — Patrick Durusau @ 6:37 pm

Back to the future of databases by Yin Wang.

From the post:

Why do we need databases? What a stupid question, I already hear some people say. But it is a legitimate question, and here is an answer that not many people know.

First of all, why can’t we just write programs that operate on objects? The answer is, obviously, we don’t have enough memory to hold all the data. But why can’t we just swap out the objects to disk and load them back when needed? The answer is yes we can, but not in Unix, because Unix manages memory as pages, not as objects. There are systems that lived before Unix that manage memory as objects and perform object-granularity persistence. That is a feature ahead of its time, and it remains far more advanced than the current state of the art. Here are some pictures of such systems:

Certainly thought-provoking, but how much of an advantage would object-granularity persistence have to offer before it could make headway against the installed base of Unix?

The database field is certainly undergoing rapid research and development, with no clear path to a winner.

Will the same happen with OSes?

March 21, 2014

Linux Performance Analysis and Tools:…

Filed under: Linux OS — Patrick Durusau @ 9:48 am

Linux Performance Analysis and Tools: Brendan Gregg’s Talk at SCaLE 11x by Deirdré Straughan.

From the post:

The talk is about Linux Performance Analysis and Tools: specifically, observability tools and the methodologies to use them. Brendan gave a quick tour of over 20 Linux performance analysis tools, including advanced perf and DTrace for Linux, showing the reasons for using them. He also covered key methodologies, including a summary of the USE Method, to demonstrate best practices in using them effectively. There are many areas of the system that people don’t know to check, which are worthwhile to at least know about, especially for serious performance issues where you don’t want to leave any stone unturned. These methodologies – and exposure to the toolset – will help you understand why and how to do this. Brendan also introduced a few new key performance tuning tools during his presentation.
….

Be sure to watch the recorded presentation. You will also find this very cool graphic by Brendan Gregg.

[Graphic: Linux Performance Analysis and Tools, by Brendan Gregg]

It’s suitable for printing and hanging on the wall as a quick reference.

No doubt you recognize some of these commands, but how many switches can you name for each one, and how, if at all, does the information from one relate to another?
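
A few of the staples from that toolbox, with the switches that keep coming up (a sketch; the trailing 1 is the sampling interval in seconds):

uptime              # load averages
vmstat 1            # run queue, memory, swap, system-wide CPU
mpstat -P ALL 1     # per-CPU utilization
iostat -xz 1        # per-device I/O rates, utilization and latency
free -m             # memory breakdown in megabytes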

I first saw this in a tweet by nixCraft Linux Blog.

March 10, 2014

Open Source: Option of the Security Conscious

Filed under: Cybersecurity,Linux OS,Open Source,Security — Patrick Durusau @ 10:00 am

International Space Station attacked by ‘virus epidemics’ by Samuel Gibbs.

From the post:

Malware made its way aboard the International Space Station (ISS) causing “virus epidemics” in space, according to security expert Eugene Kaspersky.

Kaspersky, head of security firm Kaspersky labs, revealed at the Canberra Press Club 2013 in Australia that before the ISS switched from Windows XP to Linux computers, Russian cosmonauts managed to carry infected USB storage devices aboard the station spreading computer viruses to the connected computers.

…..

In May, the United Space Alliance, which oversees the running of the ISS in orbit, migrated all the computer systems related to the ISS over to Linux for security, stability and reliability reasons.

If you or your company are at all concerned with security issues, open source software is the only realistic option.

Not because open source software in fact has fewer bugs on release, but because there is the potential for a large community of users to seek those bugs out and fix them.

The recent Apple “goto fail” farce would not have happened in an open source product. Some tester, intentionally or accidentally, would have used invalid credentials and the problem would have surfaced.

If we are lucky, Apple had one tester who was also tasked with other duties and so we got what Apple chose to pay for.

This is not a knock against software companies that sell software for a profit. Rather it is a challenge to the current marketing of software for a profit.

Imagine that MS SQL Server were open source but still commercial software. That is, the source code would be freely available but the licensing would prohibit its use for commercial resale.

Do you really think that banks, insurance companies, enterprises are going to be grabbing source code and compiling it to avoid license fees?

I admit to having a low opinion of the morality of banks, insurance companies, etc., but they also have finely tuned senses of risk. They might save a few bucks in the short run, but the consequences of getting caught are quite severe.

So there would be lots of hobbyists hacking on, trying to improve, etc. MS SQL Server source code.

You know that hackers can no more keep a secret than a member of Congress, albeit hackers don’t usually blurt out secrets on the evening news. Every bug, improvement, etc. would become public knowledge fairly quickly.

MS could even make the contribution of bug reports and fixes a condition of the open source download.

MS could continue to sell MS SQL Server as commercial software, just as it did before making it open source.

The difference would be instead of N programmers working to find and fix bugs, there would be N + Internet community working to find and fix bugs.

The other difference being that the security conscious in military, national security, and government organizations would not have to be planning migrations away from closed source software.

Post-Snowden, open source software is the only viable security option.

PS: Yes, I have seen the “we are not betraying you now” and/or “we betray you only when required by law to do so,” statements from various vendors.

I much prefer to not be betrayed at all.

You?

PS: There is another advantage to vendors from an all open source policy on software. Vendors worry about others copying their code, etc. With open source that should be easy enough to monitor and prove.

February 18, 2014

Kernel From Scratch

Filed under: Linux OS,Programming — Patrick Durusau @ 2:07 pm

Kernel From Scratch by David A. Dalrymple.

From the post:

One of my three major goals for Hacker School was to create a bootable, 64-bit kernel image from scratch, using only nasm and my text editor. Well, folks, one down, two to go.

The NASM/x64 assembly code is listed below, with copious comments for your pleasure. It comprises 136 lines including comments; 75 lines with comments removed. You may wish to refer to the Intel® 64 Software Developers’ Manual (16.5MB PDF), especially if you’re interested in doing something similar yourself.
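
If you want to try the listing yourself, the build-and-boot loop is roughly this (a sketch; kernel.asm stands in for David’s listing and assumes you have nasm and QEMU installed):

nasm -f bin kernel.asm -o kernel.img
qemu-system-x86_64 -drive format=raw,file=kernel.img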

Just in case you are looking for something more challenging than dialogue mapping. 😉

Just like natural languages, computer languages can represent subjects that are not explicitly identified. You probably don’t want subject identity overhead that close to the metal, but for debugging purposes it might be worth investigating.

I first saw this in a tweet by Julia Evans.

February 17, 2014

Linux Kernel Map

Filed under: Interface Research/Design,Linux OS,Maps — Patrick Durusau @ 3:41 pm

Linux Kernel Map by Antony Peel.

A very good map of the Linux Kernel.

I haven’t tried to reproduce it here because the size reduction would make it useless.

At sufficient resolution, this would make a nice interface to Usenet Linux postings.

I may have to find a print shop that can convert this into a folding map version.

Enjoy!

February 8, 2014

GNU Screen

Filed under: Linux OS — Patrick Durusau @ 4:46 pm

GNU Screen by Stephen Turner.

Speaking of useful things like R and Swirl reminded me of this post by Stephen:

This is one of those things I picked up years ago while in graduate school that I just assumed everyone else already knew about. GNU screen is a great utility built into most Linux installations for remote session management. Typing ‘screen’ at the command line enters a new screen session. Once launched, you can start processes in the screen session, detach the session with Ctrl-a d, then reattach at a later point and resume where you left off. See this screencast I made below:

I’m not sure why, but ‘screen’ has never come up here that I can recall.

Take a look at Stephen’s screencast and/or man screen.
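
The whole workflow fits in a handful of commands (a sketch):

screen -S work      # start a named session and run something long-running inside it
# detach with Ctrl-a d, log out, come back later
screen -ls          # list sessions
screen -r work      # reattach and pick up where you left off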

January 6, 2014

LXC 1.0: Blog post series [0/10]

Filed under: Linux OS,Programming,Virtualization — Patrick Durusau @ 5:47 pm

LXC 1.0: Blog post series [0/10] by Stéphane Graber.

From the post:

So it’s almost the end of the year, I’ve got about 10 days of vacation for the holidays and a bit of time on my hands.

Since I’ve been doing quite a bit of work on LXC lately in preparation for the LXC 1.0 release early next year, I thought that it’d be a good use of some of that extra time to blog about the current state of LXC.

As a result, I’m preparing a series of 10 blog posts covering what I think are some of the most exciting features of LXC. The planned structure is:

Stéphane has promised to update the links on post 0/10 so keep that page bookmarked.

Whether you use LXC in practice or not, this is a good enough introduction for you to ask probing questions.

And you may gain some insight into the identity issues that virtualization can give rise to.
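
If you want to follow along with the series, the basic container lifecycle in LXC 1.0 looks like this (a sketch; “demo” is a placeholder container name):

sudo lxc-create -t download -n demo   # build a container from an image template
sudo lxc-start -n demo -d             # start it in the background
sudo lxc-attach -n demo               # get a shell inside it
sudo lxc-stop -n demo
sudo lxc-destroy -n demo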

December 10, 2013

Command Line One Liners

Filed under: Linux OS,Programming — Patrick Durusau @ 3:16 pm

Command Line One Liners by Arturo Herrero.

From the webpage:

After my blog post about command line one-liners, many people wanted to contribute their own commands.

What one-liner do you want to contribute for the holiday season?
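
Mine, for what it is worth (assumes GNU sort for the -h flag): the twenty largest things under the current directory.

du -ah . 2>/dev/null | sort -rh | head -n 20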

November 2, 2013

Use multiple CPU Cores with your Linux commands…

Filed under: Linux OS,Parallelism — Patrick Durusau @ 5:53 pm

Use multiple CPU Cores with your Linux commands — awk, sed, bzip2, grep, wc, etc.

From the post:

Here’s a common problem: Have you ever wanted to add up a very large list (hundreds of megabytes) or grep through it, or do some other kind of operation that is embarrassingly parallel? Data scientists, I am talking to you. You probably have about four cores or more, but our tried and true tools like grep, bzip2, wc, awk, sed and so forth are singly-threaded and will just use one CPU core. To paraphrase Cartman, “How do I reach these cores”? Let’s use all of our CPU cores on our Linux box by using GNU Parallel and doing a little in-machine map-reduce magic, using all of our cores and the little-known parameter --pipe (otherwise known as --spreadstdin). Your pleasure is proportional to the number of CPUs, I promise.

BZIP2

So, bzip2 is better compression than gzip, but it’s so slow! Put down the razor, we have the technology to solve this.

A very interesting post, particularly if you explore data with traditional Unix tools.

A comment to the post mentions that SSD is being presumed by the article.

Perhaps, but learning the technique will be useful for when SSDs are standard.
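
The flavor of the technique, for the impatient (a sketch; the file names are placeholders and the --block size is tunable):

cat huge.log | parallel --pipe --block 10M grep ERROR > hits.log       # grep 10MB chunks across all cores
cat huge.log | parallel --pipe wc -l | awk '{s += $1} END {print s}'   # count lines per chunk, sum the partial counts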

I first saw this in Pete Warden’s Five short links for Thursday, October 31, 2013.

October 29, 2013

Useful Unix/Linux One-Liners for Bioinformatics

Filed under: Bioinformatics,Linux OS,Text Mining,Uncategorized — Patrick Durusau @ 6:36 pm

Useful Unix/Linux One-Liners for Bioinformatics by Stephen Turner.

From the post:

Much of the work that bioinformaticians do is munging and wrangling around massive amounts of text. While there are some “standardized” file formats (FASTQ, SAM, VCF, etc.) and some tools for manipulating them (fastx toolkit, samtools, vcftools, etc.), there are still times where knowing a little bit of Unix/Linux is extremely helpful, namely awk, sed, cut, grep, GNU parallel, and others.

This is by no means an exhaustive catalog, but I’ve put together a short list of examples using various Unix/Linux utilities for text manipulation, from the very basic (e.g., sum a column) to the very advanced (munge a FASTQ file and print the total number of reads, total number of unique reads, percentage of unique reads, most abundant sequence, and its frequency). Most of these examples (with the exception of the SeqTK examples) use built-in utilities installed on nearly every Linux system. These examples are a combination of tactics I use every day and examples culled from other sources listed at the top of the page.
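
The “sum a column” item from the very basic end of that list looks like this (a sketch; the column number and file name are placeholders):

awk '{sum += $3} END {print sum}' data.txt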

What one-liners do you have lying about?

For what data sets?

September 30, 2013

Kali Linux: The Ultimate Penetration-Testing Tool?

Filed under: Cybersecurity,Linux OS,Security — Patrick Durusau @ 7:12 pm

Kali Linux: The Ultimate Penetration-Testing Tool? by David Strom.

From the post:

Kali.org’s version of Linux is an advanced penetration testing tool that should be a part of every security professional’s toolbox. Penetration testing involves using a variety of tools and techniques to test the limits of security policies and procedures. What Kali has done is collect just about everything you’ll need in a single CD. It includes more than 300 different tools, all of which are open source and available on GitHub. It’s incredibly well done, especially considering that it’s completely free of charge.

A new version, 1.0.5, was released earlier in September and contains more goodies than ever before, including the ability to install it on just about any Android phone, various improvements to its wireless radio support, near field communications, and tons more. Let’s take a closer look.

David gives a short summary of the latest release of Kali Linux.

A set of thumb drives should be on your short present list for the holiday season!

August 31, 2013

Working with PDFs…

Filed under: Linux OS,PDF — Patrick Durusau @ 3:43 pm

Working with PDFs Using Command Line Tools in Linux by William J. Turkel.

From the post:

We have already seen that the default assumption in Linux and UNIX is that everything is a file, ideally one that consists of human- and machine-readable text. As a result, we have a very wide variety of powerful tools for manipulating and analyzing text files. So it makes sense to try to convert our sources into text files whenever possible. In the previous post we used optical character recognition (OCR) to convert pictures of text into text files. Here we will use command line tools to extract text, images, page images and full pages from Adobe Acrobat PDF files.

A great post if you are working with PDF files.
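
The poppler utilities cover most of that ground; a sketch (paper.pdf is a placeholder):

pdftotext -layout paper.pdf paper.txt   # extract the text, preserving layout
pdfimages -j paper.pdf img              # pull out embedded images (img-000.jpg or .ppm, depending on encoding)
pdftoppm -png -r 150 paper.pdf page     # render each page to a PNG at 150 dpi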

August 11, 2013

The Linux Command Line (2nd Edition)

Filed under: Linux OS — Patrick Durusau @ 6:47 pm

The Linux Command Line (2nd Edition) by William Shotts.

From the webpage:

Designed for the new command line user, this 537-page volume covers the same material as LinuxCommand.org but in much greater detail. In addition to the basics of command line use and shell scripting, The Linux Command Line includes chapters on many common programs used on the command line, as well as more advanced topics.

Free PDF as well as print version from No Starch Press.

Download issues tonight, but from my memory of the first edition, this is a must-download volume.

I first saw this in Christophe Lalanne’s A bag of tweets / July 2013.

August 8, 2013

Using the Unix Chainsaw:…

Filed under: Bioinformatics,Linux OS,Programming — Patrick Durusau @ 2:50 pm

Using the Unix Chainsaw: Named Pipes and Process Substitution by Vince Buffalo.

From the post:

It’s hard not to fall in love with Unix as a bioinformatician. In a past post I mentioned how Unix pipes are an extremely elegant way to interface bioinformatics programs (and do inter-process communication in general). In exploring other ways of interfacing programs in Unix, I’ve discovered two great but overlooked ways of interfacing programs: the named pipe and process substitution.

Why We Love Unix and Pipes

A few weeks ago I stumbled across a great talk by Gary Bernhardt entitled The Unix Chainsaw. Bernhardt’s “chainsaw” analogy is great: people sometimes fear doing work in Unix because it’s a powerful tool, and it’s easy to screw up with powerful tools. I think in the process of grokking Unix it’s not uncommon to ask “is this clever and elegant? or completely fucking stupid?”. This is normal, especially if you come from a language like Lisp or Python (or even C really). Unix is a get-shit-done system. I’ve used a chainsaw, and you’re simultaneously amazed at (1) how easily it slices through a tree, and (2) that you’re dumb enough to use this thing three feet away from your vital organs. This is Unix.
(…)

“The Unix Chainsaw.” Definitely a title for a drama about a group of shell hackers that uncover fraud and waste in large government projects. 😉

If you are not already a power user on *nix, this could be a step in that direction.
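
Both tricks in a few lines (a sketch; the file names are placeholders):

diff <(sort a.txt) <(sort b.txt)   # process substitution: compare two unsorted files
mkfifo mypipe                      # a named pipe on disk
gzip -c big.dat > mypipe &         # writer runs in the background
wc -c < mypipe                     # reader consumes as the writer produces; no temp file needed
rm mypipe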

April 18, 2013

Parallella: The $99 Linux supercomputer

Filed under: Linux OS,Parallel Programming,Parallela,Parallelism — Patrick Durusau @ 1:23 pm

Parallella: The $99 Linux supercomputer by Steven J. Vaughan-Nichols.

From the post:

What Adapteva has done is create a credit-card sized parallel-processing board. This comes with a dual-core ARM A9 processor and a 64-core Epiphany Multicore Accelerator chip, along with 1GB of RAM, a microSD card, two USB 2.0 ports, 10/100/1000 Ethernet, and an HDMI connection. If all goes well, by itself, this board should deliver about 90 GFLOPS of performance, or — in terms PC users understand — about the same horse-power as a 45GHz CPU.

This board will use Ubuntu Linux 12.04 for its operating system. To put all this to work, the platform reference design and drivers are now available.

From Adapteva.

I wonder which will come first:

A really kick-ass 12 dimensional version of Asteroids?

or

New approaches to graph processing?

What do you think?

January 11, 2013

Getting Started with VM Depot

Filed under: Azure Marketplace,Cloud Computing,Linux OS,Microsoft,Virtual Machines — Patrick Durusau @ 7:35 pm

Getting Started with VM Depot by Doug Mahugh.

From the post:

Do you need to deploy a popular OSS package on a Windows Azure virtual machine, but don’t know where to start? Or do you have a favorite OSS configuration that you’d like to make available for others to deploy easily? If so, the new VM Depot community portal from Microsoft Open Technologies is just what you need. VM Depot is a community-driven catalog of preconfigured operating systems, applications, and development stacks that can easily be deployed on Windows Azure.

You can learn more about VM Depot in the announcement from Gianugo Rabellino over on Port 25 today. In this post, we’re going to cover the basics of how to use VM Depot, so that you can get started right away.

Doug outlines simple steps to get you rolling with the VM Depot.

Sounds a lot easier than trying to walk casual computer users through installation and configuration of software. I assume you could even load data onto the VMs.

Users just need to fire up the VM and they have the interface and data they want.

Sounds like a nice way to distribute topic map based information systems.

December 18, 2012

Bio-Linux 7 – Released November 2012

Filed under: Bio-Linux,Bioinformatics,Biomedical,Linux OS — Patrick Durusau @ 5:24 pm

Bio-Linux 7 – Released November 2012

From the webpage:

Bio-Linux 7 is a fully featured, powerful, configurable and easy to maintain bioinformatics workstation. Bio-Linux provides more than 500 bioinformatics programs on an Ubuntu Linux 12.04 LTS base. There is a graphical menu for bioinformatics programs, as well as easy access to the Bio-Linux bioinformatics documentation system and sample data useful for testing programs.

Bio-Linux 7 adds many improvements over previous versions, including the Galaxy analysis environment.  There are also various packages to handle new generation sequence data types.

You can install Bio-Linux on your machine, either as the only operating system, or as part of a dual-boot setup which allows you to use your current system and Bio-Linux on the same hardware.

Bio-Linux also runs Live from the DVD or a USB stick. This runs in the memory of your machine and does not involve installing anything. This is a great, no-hassle way to try out Bio-Linux, demonstrate or teach with it, or to work with when you are on the move.

Bio-Linux is built on open source systems and software, and so is free to install and use. See What’s new on Bio-Linux 7. Also, check out the 2006 paper on Bio-Linux and open source systems for biologists.

Useful for exploring bioinformatics tools for Ubuntu.

But useful as well for considering how those tools could be used in data/text mining for other domains.

Not to mention the packaging for installation to DVD or USB stick.

Are there any topic map engines that are set up for burning to DVD or USB stick?

Packaging them that way with more than a minimal set of maps and/or data sets might be a useful avenue to explore.

September 15, 2012

Linux cheat sheets [Unix Sets Anyone?]

Filed under: Linux OS,Set Intersection,Sets — Patrick Durusau @ 3:07 pm

Linux cheat sheets

John D. Cook points to three new Linux cheat sheets from Peteris Krumins.

While investigating, I ran across:

Set Operations in the Unix Shell Simplified

From that post:

Remember my article on Set Operations in the Unix Shell? I implemented 14 various set operations by using common Unix utilities such as diff, comm, head, tail, grep, wc and others. I decided to create a simpler version of that post that just lists the operations. I also created a .txt cheat-sheet version of it, and to make things more interesting I added an Awk implementation of each set op. If you want detailed explanations of each operation, go to the original article.
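
The core of it, for three of the most common operations (a sketch; a.txt and b.txt are placeholders):

sort -u a.txt b.txt                    # union
comm -12 <(sort a.txt) <(sort b.txt)   # intersection
comm -23 <(sort a.txt) <(sort b.txt)   # difference: lines only in a.txt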

June 13, 2012

Azure Changes Dress Code, Allows Tuxedos

Filed under: Cloud Computing,Linux OS,Marketing — Patrick Durusau @ 4:12 am

Azure Changes Dress Code, Allows Tuxedos by Robert Gelber.

Had it on my list to mention that Azure is now supporting Linux. Robert summarizes as follows:

Microsoft has released previews of upcoming services on their Azure cloud platform. The company seems focused on simplifying the transition of in-house resources to hybrid or external cloud deployments. Most notable is the ability for end users to create virtual machines with Linux images. The announcement will be live streamed later today at 1 p.m. PST.

Azure’s infrastructure will support CentOS 6.2, OpenSUSE 12.1, SUSE Linux Enterprise Server SP2 and Ubuntu 12.04 VM images. Microsoft has already updated their Azure site to reflect the compatibility. Other VM features include:

  • Virtual Hard Disks – Allowing end users to migrate data between on-site and cloud premises.
  • Workload Migration – Moving SQL Server, Sharepoint, Windows Server or Linux images to cloud services.
  • Common Virtualization Format – Microsoft has made the VHD file format freely available under an open specification promise.

Cloud offerings are changing, perhaps evolving would be a better word, at a rapid pace.

Although standardization may be premature, it is certainly a good time to start gathering information on services and vendors in a way that cuts across the verbal jungle that is cloud computing PR.

Topic maps anyone?

March 18, 2012

The Pyed Piper

Filed under: Linux OS,Pyed Piper,Python — Patrick Durusau @ 8:52 pm

The Pyed Piper: A Modern Python Alternative to awk, sed and Other Unix Text Manipulation Utilities

Toby Rosen presents on Pyed Piper. Text processing for Python programmers.

Interesting that many movie studios use Python and Linux.

If you work in a Python environment, you probably want to give this a close look.

The project homepage.

February 3, 2012

Java, Python, Ruby, Linux, Windows, are all doomed

Filed under: Java,Linux OS,Parallelism,Python,Ruby — Patrick Durusau @ 5:02 pm

Java, Python, Ruby, Linux, Windows, are all doomed by Russell Winder.

From the description:

The Multicore Revolution gathers pace. Moore’s Law remains in force — chips are getting more and more transistors on a quarterly basis. Intel are now out and about touting the “many core chip”. The 80-core chip continues its role as a research tool. The 48-core chip is now actively driving production engineering. Heterogeneity not homogeneity is the new “in” architecture.

Where Intel research, AMD and others cannot be far behind.

The virtual machine based architectures of the 1990s, Python, Ruby and Java, currently cannot cope with the new hardware architectures. Indeed Linux and Windows cannot cope with the new hardware architectures either. So either we will have lots of hardware which the software cannot cope with, or . . . . . . well you’ll just have to come to the session.

The slides are very hard to see so grab a copy at: http://www.russel.org.uk/Presentations/accu_london_2010-11-18.pdf

From the description: Heterogeneity not homogeneity is the new “in” architecture.

Is greater heterogeneity in programming languages coming?

January 22, 2012

Is it time to get rid of the Linux OS model in the cloud?

Filed under: Linux OS,Topic Map Software,Topic Map Systems — Patrick Durusau @ 7:32 pm

Is it time to get rid of the Linux OS model in the cloud?

From the post:

You program in a dynamic language, that runs on a JVM, that runs on an OS designed 40 years ago for a completely different purpose, that runs on virtualized hardware. Does this make sense? We’ve talked about this idea before in Machine VM + Cloud API – Rewriting The Cloud From Scratch, where the vision is to treat cloud virtual hardware as a compiler target, and converting high-level language source code directly into kernels that run on it.

As new technologies evolve the friction created by our old tool chains and architecture models becomes ever more obvious. Take, for example, what a team at UCSD is releasing: a phase-change memory prototype – a solid state storage device that provides performance thousands of times faster than a conventional hard drive and up to seven times faster than current state-of-the-art solid-state drives (SSDs). However, PCM has access latencies several times slower than DRAM.

This technology has obvious mind blowing implications, but an interesting not so obvious implication is what it says about our current standard datacenter stack. Gary Athens has written an excellent article, Revamping storage performance, spelling it all out in more detail:

Computer scientists at UCSD argue that new technologies such as PCM will hardly be worth developing for storage systems unless the hidden bottlenecks and faulty optimizations inherent in storage systems are eliminated.

Moneta bypasses a number of functions in the operating system (OS) that typically slow the flow of data to and from storage. These functions were developed years ago to organize data on disk and manage input and output (I/O). The overhead introduced by them was so overshadowed by the inherent latency in a rotating disk that they seemed not to matter much. But with new technologies such as PCM, which are expected to approach dynamic random-access memory (DRAM) in speed, the delays stand in the way of the technologies’ reaching their full potential. Linux, for example, takes 20,000 instructions to perform a simple I/O request.

By redesigning the Linux I/O stack and by optimizing the hardware/software interface, researchers were able to reduce storage latency by 60% and increase bandwidth as much as 18 times.

The I/O scheduler in Linux performs various functions, such as assuring fair access to resources. Moneta bypasses the scheduler entirely, reducing overhead. Further gains come from removing all locks from the low-level driver, which block parallelism, by substituting more efficient mechanisms that do not.

Moneta performs I/O benchmarks 9.5 times faster than a RAID array of conventional disks, 2.8 times faster than a RAID array of flash-based solid-state drives (SSDs), and 2.2 times faster than fusion-io’s high-end, flash-based SSD.

Read the rest of the post and then ask yourself: what architecture do you envision for a topic map application?

What if, rather than moving data from one data structure to another, the data structure addressed is identified by the data? If you wish to “see” the data as a table, it reports its location by table/column/row. If you wish to “see” the data as a matrix, it reports its matrix position. If you wish to “see” the data as a linked list, it can report its value, plus those ahead and behind.

It isn’t that difficult to imagine that data reports its location on a graph as the result of an operation. Perhaps storing its graph location for every graphing operation that is “run” using that data point.

True enough, we need to create topic maps that run on conventional hardware/software, but that isn’t an excuse to ignore possible futures.

Reminds me of a “grook” that I read years ago: “You will conquer the present suspiciously fast – if you smell of the future and stink of the past.” (Piet Hein but I don’t remember which book.)
