Archive for the ‘Software’ Category

Neo4j in Action – Software Metrics [Correction]

Wednesday, April 10th, 2013

Neo4j in Action – Software Metrics by Michael Hunger.

Michael walks through exploring a Java class as a graph.

Makes me curious about treating code as a graph to discover which classes call the same data.

BTW, the tweeted location: http://www.slideshare.net/mobile/jexp/class-graph-neo4j-and-software-metrics does not appear to work in a desktop browser.

I was able to locate: http://www.slideshare.net/jexp/class-graph-neo4j-and-software-metrics, which is the link I use above.

The Artful Business of Data Mining…

Friday, March 29th, 2013

David Coallier has two presentations under that general title:

Distributed Schema-less Document-Based Databases

and,

Computational Statistics with Open Source Tools

Neither one of which is a “…death by powerpoint…” type presentation where the speaker reads text you can read for yourself.

Which is good, except that with minimal slides, you get an occasional example, names of software/techniques, but you have to fill in a lot of context.

A pointer to videos of either of these presentations would be greatly appreciated!

Database Landscape Map – February 2013

Wednesday, March 27th, 2013

Database Landscape Map – February 2013 by 451 Research.

Database map

A truly awesome map of available databases.

Originated from Neither fish nor fowl: the rise of multi-model databases by Matthew Aslett.

Matthew writes:

One of the most complicated aspects of putting together our database landscape map was dealing with the growing number of (particularly NoSQL) databases that refuse to be pigeon-holed in any of the primary database categories.

I have begun to refer to these as “multi-model databases” in recognition of the fact that they are able to take on the characteristics of multiple databases. In truth though there are probably two different groups of products that could be considered “multi-model”:

I think I understand the grouping from the key to the map but the ordering within groups, if meaningful, escapes me.

I am sure you will recognize most of the names but equally sure there will be some you can’t quite describe.

Enjoy!

Databases & Dragons

Friday, March 8th, 2013

Databases & Dragons by Kristina Chodorow.

From the post:

Here are some exercises to battle-test your MongoDB instance before going into production. You’ll need a Database Master (aka DM) to make bad things happen to your MongoDB install and one or more players to try to figure out what’s going wrong and fix it.

Should be of interest if you are developing MongoDB to go into production.

The idea should also be of interest if you are developing other software to go into production.

Most software (not all) works fine with expected values, other components responding correctly, etc.

But those are the very conditions your software may not encounter in production.

Where’s your “databases & dragons” test for your software?

Liferay / Marketplace

Sunday, March 3rd, 2013

Liferay. Enterprise. Open Source. For Life.

Enterprise.

Liferay, Inc. was founded in 2004 in response to growing demand for Liferay Portal, the market’s leading independent portal product that was garnering industry acclaim and adoption across the world. Today, Liferay, Inc. houses a professional services group that provides training, consulting and enterprise support services to our clientele in the Americas, EMEA, and Asia Pacific. It also houses a core development team that steers product development.

Open Source.

Liferay Portal was, in fact, created in 2000 and boasts a rich open source heritage that offers organizations a level of innovation and flexibility unrivaled in the industry. Thanks to a decade of ongoing collaboration with its active and mature open source community, Liferay’s product development is the result of direct input from users with representation from all industries and organizational roles. It is for this reason, that organizations turn to Liferay technology for exceptional user experience, UI, and both technological and business flexibility.

For Life.

Liferay, Inc. was founded for a purpose greater than revenue and profit growth. Each quarter we donate to a number of worthy causes decided upon by our own employees. In the past we have made financial contributions toward AIDS relief and the Sudan refugee crisis through well-respected organizations such as Samaritan’s Purse and World Vision. This desire to impact the world community is the heart of our company, and ultimately the reason why we exist.

The Liferay Marketplace may be of interest for open source topic map projects.

There are only a few mentions of topic maps in the mailing list archives and none of those are recent.

Could be time to rekindle that conversation.

I first saw this at: Beyond Search.

usenet-legend

Sunday, February 24th, 2013

usenet-legend by Zach Beane

From the description:

This is Usenet Legend, an application for producing a searchable archive of an author’s comp.lang.lisp history from Ron Garrett’s large archive dump.

Zach mentions this in his post The Rob Warnock Lisp Usenet Archive but I thought it needed a separate post.

Making content more navigable is always a step in the right direction.

Finding tools vs. making tools:…

Sunday, February 17th, 2013

Finding tools vs. making tools: Discovering common ground between computer science and journalism by Nick Diakopoulos.

From the post:

The second Computation + Journalism Symposium convened recently at the Georgia Tech College of Computing to ask the broad question: What role does computation have in the practice of journalism today and in the near future? (I was one of its organizers.) The symposium attracted almost 150 participants, both technologists and journalists, to discuss and debate the issues and to forge a multi-disciplinary path forward around that question.

Topics for panels covered the gamut, from precision and data journalism, to verification of visual content, news dissemination on social media, sports and health beats, storytelling with data, longform interfaces, the new economic landscape of content, and the educational needs of aspiring journalists. But what made these sessions and topics really pop was that participants on both sides of the computation and journalism aisle met each other in a conversational format where intersections and differences in the ways they viewed these topics could be teased apart through dialogue. (Videos of the sessions are online.)

While the panelists were all too civilized for any brawls to break out, mixing two disciplines as different as computing and journalism nonetheless did lead to some interesting discussions, divergences, and opportunities that I’d like to explore further here. Keeping these issues top-of-mind should help as this field moves forward.

Tool foragers and tool forgers

The following metaphor is not meant to be incendiary, but rather to illuminate two different approaches to tool innovation that seemed apparent at the symposium.

Imagine you live about 10,000 years ago, on the cusp of the Neolithic Revolution. The invention of agriculture is just around the corner. It’s spring and you’re hungry after the long winter. You can start scrounging around for berries and other tasty roots to feed you and your family — or you can stop and try to invent some agricultural implements, tools adapted to your own local crops and soil that could lead to an era of prosperity. If you take the inventive approach, you might fail, and there’s a real chance you’ll starve trying — while foraging will likely guarantee you another year of subsistence life.

What role does computation have in your field of practice?

“Document Design and Purpose, Not Mechanics”

Friday, February 15th, 2013

“Document Design and Purpose, Not Mechanics” by Stephen Turner.

From the post:

If you ever write code for scientific computing (chances are you do if you’re here), stop what you’re doing and spend 8 minutes reading this open-access paper:

Wilson et al. Best Practices for Scientific Computing. arXiv:1210.0530 (2012). (Direct link to PDF).

The paper makes a number of good points regarding software as a tool just like any other lab equipment: it should be built, validated, and used as carefully as any other physical instrumentation. Yet most scientists who write software are self-taught, and haven’t been properly trained in fundamental software development skills.

The paper outlines ten practices every computational biologist should adopt when writing code for research computing. Most of these are the usual suspects that you’d probably guess – using version control, workflow management, writing good documentation, modularizing code into functions, unit testing, agile development, etc. One that particularly jumped out at me was the recommendation to document design and purpose, not mechanics.

We all know that good comments and documentation is critical for code reproducibility and maintenance, but inline documentation that recapitulates the code is hardly useful. Instead, we should aim to document the underlying ideas, interface, and reasons, not the implementation. (emphasis added)

There is no shortage of advice (largely unread) on good writing practices. ;-)

Stephen calling out the advice to “…document design and purpose, not mechanics” struck me as relevant to semantic integration solutions.

In both RDF and XTM topic maps, the same URI as an identifier is taken as identifying the same subject.

But that’s mechanics, isn’t it? Just string-to-string comparison.

Mechanics are important but they are just mechanics.

Documenting the conditions for using a URI will help guide you or your successor to using the same URI the same way.

But that takes more than mechanics.

That takes “…document[ing] the underlying ideas, interface, and reasons, not the implementation.”
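To make the contrast concrete, here is a minimal Python sketch. It is my own illustration, not anything taken from the RDF or XTM specifications. The names `same_subject`, `IDENTIFIER_NOTES`, and `usage_note`, along with the example.org URI and the wording of the note, are all hypothetical.

```python
# Hypothetical illustration: the "mechanics" of identifier matching
# versus a documented purpose for the identifier.

def same_subject(uri_a: str, uri_b: str) -> bool:
    """The mechanics: identifiers match only if the strings are equal."""
    return uri_a == uri_b

# The design and purpose: a note recording when an identifier applies.
# The URI and the note text are made up for this example.
IDENTIFIER_NOTES = {
    "http://example.org/id/paris": (
        "Use for the city of Paris, France. "
        "Do not use for Paris, Texas or for the figure from Greek myth."
    ),
}

def usage_note(uri: str) -> str:
    """Return the recorded conditions of use, if anyone wrote them down."""
    return IDENTIFIER_NOTES.get(uri, "No usage note recorded.")

if __name__ == "__main__":
    print(same_subject("http://example.org/id/paris",
                       "http://example.org/id/paris"))
    print(usage_note("http://example.org/id/paris"))
```

The comparison function can tell you that two strings are equal. Only the note can tell your successor whether they should have been.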

In Cyberwar, Software Flaws Are A Hot Commodity

Tuesday, February 12th, 2013

In Cyberwar, Software Flaws Are A Hot Commodity by Tom Gjelten.

Morning Edition ran a story today on firms that are finding software flaws and then selling them to the highest bidder.

A market that has exploded in the last two years.

If there is a market for the latest and greatest flaws, doesn’t the same exist for flaws in older software that hasn’t been upgraded?

Flaws that are “out there” and known, but scattered over email lists, web pages, blog posts, conference proceedings.

But not collated, verified and packaged together.

Just curious.

unicodex — High-performance Unicode Library (C++)

Monday, February 11th, 2013

unicodex — High-performance Unicode Library (C++) by Dustin Juliano.

From the post:

The following is a micro-optimized Unicode encoder/decoder for C++ that is capable of significant performance, sustaining 6 GiB/s for UTF-8 to UTF-16/32 on an AMD A8-3870 running in a single thread, and 8 GiB/s for UTF-16 to UTF-32. That would allow it to encode nearly the full English Wikipedia in approximately 6 seconds.

It maps between UTF-8, UTF-16, and UTF-32, and properly detects UTF-8 BOM and the UTF-16 BOMs. It has been unit tested with gigabytes of data and verified with binary analysis tools. Presently, only little-endian is supported, which should not pose any significant limitations on use. It is released under the BSD license, and can be used in both proprietary and free software projects.

The decoder is aware of malformed input and will raise an exception if the input sequence would cause a buffer overflow or is otherwise fatally incorrect. It does not, however, ensure that exact codepoints correspond to the specific Unicode planes; this is by design. The implementation has been designed to be robust against garbage input and specifically avoid encoding attacks.

One of those “practical” things that you may need for processing topic maps and/or other digital information. ;-)
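If you only want to see what BOM detection and strict decoding look like, here is a rough Python sketch using the standard library `codecs` module. It is not unicodex and makes no performance claims. The function names `decode_with_bom` and `utf8_to_utf16le` are my own, and checking for UTF-32 BOMs is an addition beyond what the post describes.

```python
import codecs

# UTF-32 BOMs are checked first because the UTF-32 LE BOM starts with
# the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode bytes, using a leading BOM (if any) to choose the encoding.

    Malformed input raises UnicodeDecodeError, since Python's decoders
    are strict by default.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    return data.decode(default)

def utf8_to_utf16le(data: bytes) -> bytes:
    """Re-encode UTF-8 input as little-endian UTF-16 without a BOM."""
    return decode_with_bom(data).encode("utf-16-le")
```

Good enough for experiments. For sustained gigabytes per second you will want something like the library above.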

The Data Science Toolkit is now on Vagrant!

Tuesday, January 29th, 2013

The Data Science Toolkit is now on Vagrant! by Pete Warden.

From the post:

I have fallen in love with Vagrant over the last year, it turns an entire logical computer into a single unit of software. In simple terms, you can easily set up, run, and maintain a virtual machine image with all the frameworks and data dependencies pre-installed. You can wipe it, copy it to a different system, branch it to run experimental changes, keep multiple versions around, easily share it with other people, and quickly deploy multiple copies when you need to scale up. It’s as revolutionary as the introduction of distributed source control systems, you’re suddenly free to innovate because mistakes can be painlessly rolled back, and you can collaborate with other people without worrying that anything will be overwritten.

Before I discovered Vagrant, I’d attempted to do something similar with my Data Science Toolkit package, distributing a VMware image of a full linux system with all the software and data it required pre-installed. It was a large download, and a lot of people used it, but the setup took more work than I liked. Vagrant solved a lot of the usability problems around downloading VMs, so I’ve been eager to create a compatible version of the DSTK image. I finally had a chance to get that working over the weekend, so you can create your own local geocoding server just by running:

vagrant box add dstk http://static.datasciencetoolkit.org/dstk_0.41.box

vagrant init

The box itself is almost 5GB with all the address data, so the download may take a while. Once it’s done go to http://localhost:8080 and you’ll see the web interface to the geocoding and unstructured data parsing functions.

Based on Oracle’s VirtualBox, this looks like a very cool way to distribute topic map applications with data.

Remember the Emulate Drug Dealers [Marketing Topic Maps] post?

I was very serious.

11 Interesting Releases From the First Weeks of January

Thursday, January 24th, 2013

11 Interesting Releases From the First Weeks of January by Alex Popescu.

Alex has collected links for eleven (11) interesting NoSQL releases in January 2013!

Visit Alex’s post. You won’t be disappointed.

Testling-CI

Wednesday, January 23rd, 2013

Announcing Testling-CI by Peteris Krumins.

From the post:

We at Browserling are proud to announce Testling-CI! Testling-CI lets you write continuous integration cross-browser tests that run on every git push!

testling-ci

There are a ton of modules on npm and github that aren’t just for node.js but for browsers, too. However, figuring out which browsers these modules work with can be tricky. It’s often the case that some module used to work in browsers but has accidentally stopped working because the developer hadn’t checked that their code still worked recently enough. If you use npm for frontend and backend modules, this can be particularly frustrating.

You will probably also be interested in: How to write Testling-CI tests.

A bit practical for me but with HTML5, browser-based interfaces are likely to become the default.

Useful to point out resources that will make it easier to cross-browser test browser-based topic map interfaces.

13 Things People Hate about Your Open Source Docs

Saturday, January 12th, 2013

13 Things People Hate about Your Open Source Docs by Andy Lester.

From the post:

1. Lacking a good README or introduction

2. Docs not available online

3. Docs only available online

4. Docs not installed with the package

5. Lack of screenshots

6. Lack of realistic examples

7. Inadequate links and references

8. Forgetting the new user

9. Not listening to the users

10. Not accepting user input

11. No way to see what the software does without installing it

12. Relying on technology to do your writing

13. Arrogance and hostility toward the user

See Andy’s post for the details and suggestions on ways to improve.

Definitely worth a close read!

NewGenLib Open Source…Update! [Library software]

Wednesday, January 9th, 2013

NewGenLib Open Source releases version 3.0.4 R1 Update 1

From the blog:

The NewGenLib Open Source has announced the release of a new version 3.0.4 R1 Update 1. NewGenLib is an integrated library management system developed by Verus Solutions in conjunction with Kesavan Institute of Information and Knowledge Management in India. The software has the modules acquisitions, technical processing, serials management, circulation, administration, and MIS reports and OPAC.

What’s new in the Update?

This new update comes with a basket of additional features and enhancements, these include:

  • Full text indexing and searching of digital attachments: NewGenLib now uses Apache Tika. With this new tool not only catalogue records but their digital attachments and URLs are indexed. Now you can also search based on the content of your digital attachments
  • Web statistics: The software facilitates the generation of statistics on OPAC usage by having an allowance for Google Analytics code.
  • User ratings of Catalogue Records: An enhancement for User reviews is provided in OPAC. Users can now rate a catalogue record on a scale of 5 (Most useful to not useful). Also, one level of approval is added for User reviews and ratings. 
  • Circulation history download: Users can now download their Circulation history as a PDF file in OPAC

NewGenLib supports MARC 21 bibliographic data, MARC authority files, Z39.50 Client for federated searching. Bibliographic records can be exported in MODS 3.0 and AGRIS AP. The software is OAI-PMH compliant. NewGenLib has a user community with an online discussion forum.

If you are looking for potential topic map markets, the country population rank graphic from Wikipedia may help:
World Population Graph

Population isn’t everything but it should not be ignored either.

When is “Hello World,” Not “Hello World?”

Sunday, December 30th, 2012

To answer that question, you need to see the post: Travel NoSQL Application – Polyglot NoSQL with SpringData on Neo4J and MongoDB.

Just a quick sample:

In this Fuse day, Tikal Java group decided to continue its previous Fuse research for NoSQL, but this time from a different point of view – SpringData and Polyglot persistence. We had two goals in this Fuse day: try working with more than one NoSQL in the same application, and also taking advantage of SpringData data access abstractions for NoSQL databases. We decided to take MongoDB as document DB, and Neo4J as graph database and put them behind an existing, classic and well known application – Spring Travel Sample application.

More than the usual “Hello World” example for languages and a bit more than for most applications.

It would be a nice trend to see more robust, perhaps “Hello World+” examples.

What is your enhanced “Hello World+” going to look like in 2013?

<ANGLES>

Friday, December 21st, 2012

<ANGLES>

From the homepage:

ANGLES is a research project aimed at developing a lightweight, online XML editor tuned to the needs of the scholarly text encoding community. By combining the model of intensive code development (the “code sprint”) with participatory design exercises, testing, and feedback from domain experts gathered at disciplinary conferences, ANGLES will contribute not only a working prototype of a new software tool but also another model for tool building in the digital humanities (the “community roadshow”).

Work on ANGLES began in November 2012.

We’ll have something to share very soon!

<ANGLES> is an extension of ACE:

ACE is an embeddable code editor written in JavaScript. It matches the features and performance of native editors such as Sublime, Vim and TextMate. It can be easily embedded in any web page and JavaScript application. ACE is maintained as the primary editor for Cloud9 IDE and is the successor of the Mozilla Skywriter (Bespin) project.

<ANGLES> code at Sourceforge.

I will be interested to see how ACE is extended. Just glancing at it this morning, it appears to be the traditional “display angle bang syntax” editor we all know so well.

What puzzles me is that we have been to the mountain of teaching users to be comfortable with raw XML markup and the results have not been promising.

As opposed to the experience with OpenOffice, MS Office, etc., which have proven that creating documents that are then expressed in XML is within the range of ordinary users.

<ANGLES> looks like an interesting project but whether it brings XML editing within the reach of ordinary users is an open question.

If the XML editing puzzle is solved, perhaps it will have lessons for topic map editors.

Tails: The Amnesic Incognito Live System [Data Mining Where You Shouldn't]

Thursday, December 6th, 2012

Tails: The Amnesic Incognito Live System

From the webpage:

Privacy for anyone anywhere

Tails is a live DVD or live USB that aims at preserving your privacy and anonymity.

It helps you to:

  • use the Internet anonymously almost anywhere you go and on any computer: all connections to the Internet are forced to go through the Tor network;
  • leave no trace on the computer you’re using unless you ask it explicitly;
  • use state-of-the-art cryptographic tools to encrypt your files, email and instant messaging.

If you go data mining where you are unwanted, don’t use your regular user name and real address.

In fact, something like Tails might be in order.

Being mindful that possession of a USB stick with Tails on it could be considered a breach of security, should someone choose to take it that way.

Probably best to use a DVD disguised as a Lady Gaga disk. ;-)

PS: Keep in mind there is always old-fashioned hostile data mining, stealing the drives: Swiss Spy Agency: Counter-Terrorism Secrets Stolen.

Consistency through semantics

Saturday, November 24th, 2012

Consistency through semantics by Oliver Kennedy.

From the post:

When designing a distributed system, one of the first questions anyone asks is what kind of consistency model to use. This is a fairly nuanced question, as there isn’t really one right answer. Do you enforce strong consistency and accept the resulting latency and communication overhead? Do you use locking, and accept the resulting throughput limitations? Or do you just give up and use eventual consistency and accept that sometimes you’ll end up with results that are just a little bit out of sync?

It’s this last bit that I’d like to chat about today, because it’s actually quite common in a large number of applications. This model is present in everything from user-facing applications like Dropbox to SVN/GIT, to back-end infrastructure systems like Amazon’s Dynamo and Yahoo’s PNUTs. Often, especially in non-critical applications latency and throughput are more important than dealing with the possibility that two simultaneous updates will conflict.

So what happens when this dreadful possibility does come to pass? Clearly the system can’t grind to a halt, and often just randomly discarding one of these updates is the wrong thing to do. So what happens? The answer is common across most of these systems: They punt to the user.

Intuitively, this is the right thing to do. The user sees the big picture. The user knows best how to combine these operations. The user knows what to do, so on those rare occurrences where the system can’t handle it, the user can.

But why is this the right thing to do? What does the user have that the infrastructure doesn’t?

Take the time to read the rest of Oliver’s post.

He distinguishes rather nicely between applications and users.
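The “punt to the user” behavior in the quoted passage can be sketched in a few lines of Python. This is a generic three-way comparison of my own, assuming both updates were made against the same known base value. It is not how Dropbox, Dynamo, or PNUTs actually reconcile updates, and the name `reconcile` and the return labels are hypothetical.

```python
def reconcile(base: str, a: str, b: str):
    """Compare two simultaneous updates against their common base value.

    Returns ("apply", value) when the system can decide on its own and
    ("ask_user", (a, b)) when the two updates genuinely conflict.
    """
    if a == b:
        return ("apply", a)      # both sides made the same change
    if a == base:
        return ("apply", b)      # only b changed anything
    if b == base:
        return ("apply", a)      # only a changed anything
    return ("ask_user", (a, b))  # both changed, differently: punt

if __name__ == "__main__":
    print(reconcile("draft", "draft", "final"))
    print(reconcile("draft", "final", "revised"))
```

The infrastructure can settle the first three cases mechanically. The last case is the one where it needs what only the user has.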

Up to Date on Open Source Analytics

Tuesday, October 23rd, 2012

Up to Date on Open Source Analytics by Steve Miller.

Steve updates his Wintel laptop with the latest releases of open source analytics tools.

Steve’s list:

What’s on your list?

I first saw this mentioned at KDNuggets.

People and Process > Prescription and Technology

Monday, October 15th, 2012

Factors that affect software systems development project outcomes: A survey of research by Laurie McLeod and Stephen G. MacDonell. ACM Computing Surveys (CSUR), Volume 43, Issue 4, October 2011, Article No. 24, DOI: 10.1145/1978802.1978803.

Abstract:

Determining the factors that have an influence on software systems development and deployment project outcomes has been the focus of extensive and ongoing research for more than 30 years. We provide here a survey of the research literature that has addressed this topic in the period 1996–2006, with a particular focus on empirical analyses. On the basis of this survey we present a new classification framework that represents an abstracted and synthesized view of the types of factors that have been asserted as influencing project outcomes.

As with most survey work, particularly ones that summarize 177 papers, this is a long article, some fifty-six pages.

Let me try to tempt you into reading it by quoting from Angelica de Antonio’s review of it (in Computing Reviews, Oct. 2012):

An interesting discussion about the very concept of project outcome precedes the survey of factors, and an even more interesting discussion follows it. The authors stress the importance of institutional context in which the development project takes place (an aspect almost neglected in early research) and the increasing evidence that people and process have a greater effect on project outcomes than technology. A final reflection on why projects still continue to fail—even if we seem to know the factors that lead to success—raises a question on the utility of prescriptive factor-based research and leads to considerations that could inspire future research. (emphasis added)

Before you run off to the library or download a copy of the survey, two thoughts to keep in mind:

First, if “people and process” are more important than technology, where should we place the emphasis in projects involving semantics?

Second, if “prescription” can’t cure project failure, what are its chances with semantic diversity?

Thoughts?

Best Open Source[?]

Sunday, September 9th, 2012

Best Open Source

Are you familiar with this open source project listing site?

I ask because I encountered it today and while it looks interesting, I have the following concerns:

  • Entries are not dated (at least that I can find). Undated entries are not quite useless but nearly so.
  • Entries are not credited (no authors cited). Another strike against the entries.
  • Rating (basis for) isn’t clear.

It looks suspicious but it could be poor design.

Comments/suggestions?

Software fences

Saturday, September 8th, 2012

Software fences by John D. Cook.

A great quote from G. K. Chesterton.

Do reformers of every generation think their forefathers were fools or do reformers have a mistaken belief in “progress?”

Rather than saying “progress,” what if we say we know things “differently” than our forefathers?

Not better or worse, just differently.

Chaos Monkey released into the wild

Monday, July 30th, 2012

Chaos Monkey released into the wild by Cory Bennett and Ariel Tseitlin

From the post:

We have found that the best defense against major unexpected failures is to fail often. By frequently causing failures, we force our services to be built in a way that is more resilient. We are excited to make a long-awaited announcement today that will help others who embrace this approach.

We have written about our Simian Army in the past and we are now proud to announce that the source code for the founding member of the Simian Army, Chaos Monkey, is available to the community.

Do you think your applications can handle a troop of mischievous monkeys loose in your infrastructure? Now you can find out.

What is Chaos Monkey?

Chaos Monkey is a service which runs in the Amazon Web Services (AWS) that seeks out Auto Scaling Groups (ASGs) and terminates instances (virtual machines) per group. The software design is flexible enough to work with other cloud providers or instance groupings and can be enhanced to add that support. The service has a configurable schedule that, by default, runs on non-holiday weekdays between 9am and 3pm. In most cases, we have designed our applications to continue working when an instance goes offline, but in those special cases that they don’t, we want to make sure there are people around to resolve and learn from any problems. With this in mind, Chaos Monkey only runs within a limited set of hours with the intent that engineers will be alert and able to respond.

At first I was unsure if Netflix is hopeful its competitors will run Chaos Monkey or if they really run it internally. ;-)

It certainly is a way to test your infrastructure. And quite possibly a selling point to clients who want more than projected or historical robustness.

Makes me curious, allowing for different infrastructures, how would you stress test a topic map installation?

And do so on a regular basis?
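As a starting point, the scheduling rule described in the post (non-holiday weekdays, 9am to 3pm) is easy to sketch in Python. This is not the Chaos Monkey source. The function names, the holiday set, and the choice of exactly one random instance per group are my assumptions, and nothing here talks to AWS or terminates anything.

```python
import random
from datetime import datetime

def in_run_window(now: datetime, holidays=frozenset()) -> bool:
    """True on non-holiday weekdays from 9am up to (not including) 3pm."""
    is_weekday = now.weekday() < 5          # Monday=0 ... Friday=4
    is_holiday = now.date() in holidays     # holidays: a set of date objects
    return is_weekday and not is_holiday and 9 <= now.hour < 15

def pick_victims(groups: dict, now: datetime, rng: random.Random,
                 holidays=frozenset()) -> dict:
    """Choose one instance per non-empty group, only inside the window.

    groups maps a (hypothetical) group name to a list of instance ids.
    The caller decides what "terminate" means for its infrastructure.
    """
    if not in_run_window(now, holidays):
        return {}
    return {name: rng.choice(instances)
            for name, instances in groups.items() if instances}

if __name__ == "__main__":
    groups = {"web": ["i-1", "i-2"], "index": ["i-3"]}
    print(pick_victims(groups, datetime.now(), random.Random()))
```

Swap “instance” for whatever your topic map installation depends on (a merge service, an index, a backing store) and the regular basis part is just a scheduler away.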

I first saw this at Alex Popescu’s myNoSQL.

OSCON 2012

Sunday, July 29th, 2012

OSCON 2012

Over 4,000 photographs were taken at the MS booth.

I wonder how many of them include Doug?

Drop by the OSCON website after you count photos of Doug.

Your efforts at topic mapping will improve from the experience.

From the OSCON site visit, that is.

What you get from counting photos of Doug is unknown. ;-)

2013 FOSE Call for Presentations

Wednesday, July 18th, 2012

2013 FOSE Call for Presentations

From the webpage:

The FOSE Team welcomes presentation proposals that provide meaningful, actionable insights about technology development for government IT decision makers. We are looking for presentations that detail use-case studies, lessons learned, or emerging trends that improve operational efficiency and ignite innovation within and across government agencies. We are also specifically seeking Local, Federal and State Government Employees with stories to tell about their IT experiences and successes.

It’s a vendor show so prepare accordingly.

Lots of swag, hire booth help at the local modeling agency, etc.

You can’t make a sale if you don’t get their attention.

Deadline for submissions: September 14, 2012.

Topic map based solutions should make a good showing against traditional ETL (Extra Tax and Labor) solutions.

No charge for use of the expansion of ETL (it probably isn’t even original but if not, I don’t remember the source).

Elements of Software Construction [MIT 6.005]

Saturday, June 23rd, 2012

Elements of Software Construction

Description:

This course introduces fundamental principles and techniques of software development. Students learn how to write software that is safe from bugs, easy to understand, and ready for change.

Topics include specifications and invariants; testing, test-case generation, and coverage; state machines; abstract data types and representation independence; design patterns for object-oriented programming; concurrent programming, including message passing and shared concurrency, and defending against races and deadlock; and functional programming with immutable data and higher-order functions.

From the MIT OpenCourseware site.

Of interest to anyone writing topic map software.

It should also be of interest to anyone evaluating how software shapes what subjects we can talk about and how we can talk about them. Data structures have the same implications.

Not necessary to undertake such investigations in all cases. There are many routine uses for common topic map software.

Being able to see when the edges of a domain don’t quite fit, or when there may be gaps in coverage for an information system, is a necessary skill for non-routine cases.

NGINX: The Faster Web Server Alternative

Sunday, January 22nd, 2012

NGINX: The Faster Web Server Alternative by Steven J. Vaughan-Nichols.

From the post:

Picking a Web server used to be easy. If you ran a Windows shop, you used Internet Information Server (IIS); if you didn’t, you used Apache. No fuss. No muss. Now, though, you have more Web server choices, and far more decisions to make. One of the leading alternatives, the open-source NGINX, is now the number two Web server in the world, according to Netcraft, the Web server analytics company.

NGINX (pronounced “engine X”) is an open-source HTTP Web server that also includes mail services with an Internet Message Access Protocol (IMAP) and Post Office Protocol (POP) server. NGINX is ready to be used as a reverse proxy, too. In this mode NGINX is used to load balance among back-end servers, or to provide caching for a slower back-end server.

Companies like the online TV video on demand company Hulu use NGINX for its stability and simple configuration. Other users, such as Facebook and WordPress.com, use it because the web server’s asynchronous architecture gives it a small memory footprint and low resource consumption, making it ideal for handling multiple, actively changing Web pages.

That’s a tall order. According to NGINX’s principal architect Igor Sysoev, here’s how NGINX can support hundreds of millions of Facebook users.

I have to admit, NGINX being web server #2 caught my attention. Not to mention that it powers Hulu, Facebook and WordPress.com.

It has been years since I have even looked at an Apache web server (I used to run them), but I do remember their stability and performance. And Apache would be my reflex recommendation for delivering web pages from a topic map application. Why re-write what already works?

Now NGINX comes along with impressive performance numbers and potentially new ways to organize on the server side.

Read the article, grab a copy of NGINX and let me know what you think.

Semantic Prediction?

Saturday, December 17th, 2011

Bug Prediction at Google

I first read this post because of the claim that 50% of the code base at Google changes each month. So it says, but perhaps more on that another day.

While reading the post I ran across the following:

In order to help identify these hot spots and warn developers, we looked at bug prediction. Bug prediction uses machine-learning and statistical analysis to try to guess whether a piece of code is potentially buggy or not, usually within some confidence range. Source-based metrics that could be used for prediction are how many lines of code, how many dependencies are required and whether those dependencies are cyclic. These can work well, but these metrics are going to flag our necessarily difficult, but otherwise innocuous code, as well as our hot spots. We’re only worried about our hot spots, so how do we only find them? Well, we actually have a great, authoritative record of where code has been requiring fixes: our bug tracker and our source control commit log! The research (for example, FixCache) indicates that predicting bugs from the source history works very well, so we decided to deploy it at Google.

How it works

In the literature, Rahman et al. found that a very cheap algorithm actually performs almost as well as some very expensive bug-prediction algorithms. They found that simply ranking files by the number of times they’ve been changed with a bug-fixing commit (i.e. a commit which fixes a bug) will find the hot spots in a code base. Simple! This matches our intuition: if a file keeps requiring bug-fixes, it must be a hot spot because developers are clearly struggling with it.
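The ranking described in the quoted passage is simple enough to sketch. This is only an illustration, not Google's implementation: the commit records, the `is_bug_fix` flag and the function name `rank_hot_spots` are all assumptions made for the example. In practice you would have to derive the flag from your own bug tracker and commit log.

```python
from collections import Counter


def rank_hot_spots(commits, top_n=10):
    """Rank files by the number of bug-fixing commits that touched them.

    `commits` is an iterable of dicts with two hypothetical keys:
    "files" (list of paths) and "is_bug_fix" (bool).
    """
    counts = Counter()
    for commit in commits:
        if commit["is_bug_fix"]:
            # A file counts once per commit, even if listed twice.
            counts.update(set(commit["files"]))
    # Highest count first; ties broken by file name for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Made-up history, purely for illustration.
example_commits = [
    {"files": ["a.py", "b.py"], "is_bug_fix": True},
    {"files": ["a.py"], "is_bug_fix": True},
    {"files": ["c.py"], "is_bug_fix": False},
    {"files": ["b.py", "a.py", "a.py"], "is_bug_fix": True},
]
hot_spots = rank_hot_spots(example_commits)
```

Files that only appear in non-bug-fix commits never enter the ranking, which is the point: ordinary churn is ignored and only repeated fixes count.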

So, if that is true for software bugs, doesn’t it stand to reason that the same is true for semantic impedance? That is, when a user selects one result and then, within some time window, selects a different one, isn’t the reason that the first failed to meet their criteria for a match? Same intuition. Users change because the match, in their view, failed.

Rather than trying to “reason” about the semantics of terms, we can simply observe user behavior with regard to those terms in the aggregate. And perhaps even salt the mine, as it were, with deliberate cases to test theories about the semantics of terms.

I haven’t done the experiment yet, but it is certainly something that I will be looking into this coming year. I think it has definite potential and would scale.
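To make the reselection idea concrete, here is a rough sketch of how the counting might go. Everything in it is an assumption for illustration: the click-log layout of (user, query, result, timestamp in seconds), the 30-second window and the name `count_reselections` are invented here, not taken from any real system.

```python
from collections import Counter


def count_reselections(events, window_seconds=30.0):
    """Count how often a selected result was abandoned for a different one.

    `events` is an iterable of hypothetical click-log tuples:
    (user, query, result, timestamp_seconds).
    Returns a Counter keyed by (query, abandoned_result).
    """
    last_pick = {}  # (user, query) -> (result, timestamp)
    abandoned = Counter()
    # Process clicks in time order so "previous pick" is well defined.
    for user, query, result, ts in sorted(events, key=lambda e: e[3]):
        key = (user, query)
        previous = last_pick.get(key)
        if previous is not None:
            prev_result, prev_ts = previous
            # A different pick inside the window suggests the first match failed.
            if result != prev_result and ts - prev_ts <= window_seconds:
                abandoned[(query, prev_result)] += 1
        last_pick[key] = (result, ts)
    return abandoned
```

Pairs with high counts would be the semantic “hot spots”: terms whose top matches users keep rejecting. A deliberately planted result (salting the mine) would show up in the same tally.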

Network Graph Visualizer

Wednesday, December 14th, 2011

Network Graph Visualizer

I ran across this at GitHub while tracking the progress of a project.

Although it is old hat (2008), I thought it worth pointing out as a graph that has one purpose, to keep developers informed of each other’s activities in a collaborative environment, and it does that very well.

I suspect there is a lesson there for topic map software (or even software in general).