Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

March 3, 2015

Principles of Model Checking

Filed under: Design,Modeling,Software,Software Engineering — Patrick Durusau @ 5:15 pm

Principles of Model Checking by Christel Baier and Joost-Pieter Katoen. Foreword by Kim Guldstrand Larsen.

From the webpage:

Our growing dependence on increasingly complex computer and software systems necessitates the development of formalisms, techniques, and tools for assessing functional properties of these systems. One such technique that has emerged in the last twenty years is model checking, which systematically (and automatically) checks whether a model of a given system satisfies a desired property such as deadlock freedom, invariants, or request-response properties. This automated technique for verification and debugging has developed into a mature and widely used approach with many applications. Principles of Model Checking offers a comprehensive introduction to model checking that is not only a text suitable for classroom use but also a valuable reference for researchers and practitioners in the field.

The book begins with the basic principles for modeling concurrent and communicating systems, introduces different classes of properties (including safety and liveness), presents the notion of fairness, and provides automata-based algorithms for these properties. It introduces the temporal logics LTL and CTL, compares them, and covers algorithms for verifying these logics, discussing real-time systems as well as systems subject to random phenomena. Separate chapters treat such efficiency-improving techniques as abstraction and symbolic manipulation. The book includes an extensive set of examples (most of which run through several chapters) and a complete set of basic results accompanied by detailed proofs. Each chapter concludes with a summary, bibliographic notes, and an extensive list of exercises of both practical and theoretical nature.
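
To make the core idea concrete, here is a minimal sketch (mine, not the book's) of what checking an invariant means in practice: exhaustively explore the reachable states of a transition system and flag any state that violates the property.

    from collections import deque

    def check_invariant(initial, successors, invariant):
        """Breadth-first search of reachable states. Returns a violating
        state (a counterexample) if one exists, else None."""
        seen = {initial}
        queue = deque([initial])
        while queue:
            state = queue.popleft()
            if not invariant(state):
                return state
            for nxt in successors(state):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return None

    # Toy model: a counter that wraps at 4; the invariant "state != 7" holds.
    print(check_invariant(0, lambda s: [(s + 1) % 4], lambda s: s != 7))

Real model checkers add temporal logics, fairness, and clever state-space compression on top of this skeleton, but the exhaustive-exploration core is the same.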

The present IT structure has shown itself to be as secure as a sieve. Do you expect the “Internet of Things” to be any more secure?

If you are interested in secure or at least less buggy software, more formal analysis is going to be a necessity. This title will give you an introduction to the field.

It dates from 2008, so some updating will be required.

I first saw this in a tweet by Reid Draper.

January 24, 2015

Tooling Up For JSON

Filed under: JSON,Software — Patrick Durusau @ 2:22 pm

I needed to explore a large (5.7MB) JSON file and my usual command line tools weren’t a good fit.

Casting about, I discovered Jshon (“twice as fast, 1/6th the memory”). From the home page for Jshon:

Jshon parses, reads and creates JSON. It is designed to be as usable as possible from within the shell and replaces fragile ad hoc parsers made from grep/sed/awk as well as heavyweight one-line parsers made from perl/python. Requires Jansson.

Jshon loads json text from stdin, performs actions, then displays the last action on stdout. Some of the options output json, others output plain text meta information. Because Bash has very poor nested datastructures, Jshon does not try to return a native bash datastructure as a typical library would. Instead, Jshon provides a history stack containing all the manipulations.

The big change in the latest release is switching everything from pass-by-value to pass-by-reference. In a typical use case (processing AUR search results for ‘python’) by-ref is twice as fast and uses one sixth the memory. If you are editing json, by-ref also makes your life a lot easier as modifications do not need to be manually inserted through the entire stack.

Jansson is described as: “…a C library for encoding, decoding and manipulating JSON data.” Jansson builds with the usual ./configure, make, make install. Jshon has no configure or install script, so just run make and drop the binary somewhere on your path.

Under Bugs you will read: “Documentation is brief.”

That’s for sure!

Still, it has enough examples that with some practice you will find this a handy way to explore JSON files.
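
If you would rather do the same kind of poking around from Python instead of the shell, a rough equivalent of jshon’s keys (-k), type (-t) and length (-l) actions is only a few lines. A sketch; the file name is a placeholder:

    import json

    with open("data.json") as f:  # placeholder for your large JSON file
        doc = json.load(f)

    def describe(node, depth=0):
        """Print a shallow outline of keys, types and lengths."""
        pad = "  " * depth
        if isinstance(node, dict):
            for key, value in node.items():
                print(f"{pad}{key}: {type(value).__name__}")
                if depth < 1 and isinstance(value, (dict, list)):
                    describe(value, depth + 1)
        elif isinstance(node, list):
            print(f"{pad}list of {len(node)} items")

    describe(doc)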

Enjoy!

January 23, 2015

DiRT Digital Research Tools

Filed under: Humanities,Software — Patrick Durusau @ 2:21 pm

DiRT Digital Research Tools

From the post:

The DiRT Directory is a registry of digital research tools for scholarly use. DiRT makes it easy for digital humanists and others conducting digital research to find and compare resources ranging from content management systems to music OCR, statistical analysis packages to mindmapping software.

Interesting concept, but the annotations are too brief to convey much information. Not to mention that within a category, say Conduct linguistic research or Transcribe handwritten or spoken texts, the entries have no apparent order; at least they are not arranged alphabetically by name. There may be some other order that escapes me.

Some entries appear in the wrong categories, such as Xalan being found under Transcribe handwritten or spoken texts:

Xalan
Xalan is an XSLT processor for transforming XML documents into HTML, text, or other XML document types. It implements XSL Transformations (XSLT) Version 1.0 and XML Path Language (XPath) Version 1.0.

Not what I think of when I think about transcribing handwritten or spoken texts. You?

I didn’t see a process for submitting corrections/comments on resources. I will check and post on this again. It could be a useful tool.

I first saw this in a tweet by Christophe Lalanne.

January 21, 2015

Emacs is My New Window Manager

Filed under: Editor,Software — Patrick Durusau @ 8:08 pm

Emacs is My New Window Manager by Howard Abrams.

From the post:

Most companies that employ me hand me a “work laptop” as I enter the building. Of course, I do not install personal software and keep a clear division between my “work life” and my “real life.”

However, I also don’t like to carry two computers just to jot down personal notes. My remedy is to install a virtualization system and create a “personal” virtual machine. (Building cloud software as my day job means I usually have a few VMs running all the time.)

Since I want this VM to have minimal impact on my work, I base it on a “Server” version of Ubuntu. However, I like some graphical features, so my most minimal after-market installation approach is:

Your mileage with Emacs is going to vary, but this was too impressive to pass unremarked.

I first saw this in a tweet by Christophe Lalanne.

October 29, 2014

Microsoft Garage

Filed under: Microsoft,Software — Patrick Durusau @ 1:43 pm

Microsoft Garage

From the webpage:

Hackers, makers, artists, tinkerers, musicians, inventors — on any given day you’ll find them in The Microsoft Garage.

We are a community of interns, employees, and teams from everywhere in the company who come together to turn our wild ideas into real projects. This site gives you early access to projects as they come to life.

Tell us what rocks, and what doesn’t.

Welcome to The Microsoft Garage.

Two projects (out of several) that I thought were interesting:

Collaborate

Host or join collaboration sessions on canvases that hold text cards and images. Ink on the canvas to organize your content, or manipulate the text and images using pinch, drag, and rotate gestures.

Floatz

Floatz, a Microsoft Garage project, lets you float an idea out to the people around you, and see what they think. Join in on any nearby Floatz conversation, or start a new one with a question, idea, or image that you share anonymously with people nearby.

Share your team spirit at a sporting event, or your awesome picture of the band at a rock concert. Ask the locals where to get a good meal when visiting an unfamiliar neighborhood. Speak your mind, express your feelings, and find out if there are others around you who feel the same way—all from the safety of an anonymous screen name in Floatz.

I understand the theory of asking for advice anonymously, but I assume that also means the person answering is anonymous as well. Yes? I don’t have a cellphone, so I can’t test that theory. Comments?

On the other hand, if you are sharing data with known and unknown others and know which “anonymous” screen names to trust (for example, don’t trust names with FBI, CIA or NSA preceded or followed by hyphens), then Floatz could be very useful.

I first saw this in Nat Torkington’s Four short links: 23 October 2014.

October 14, 2014

Cryptic genetic variation in software:…

Filed under: Bioinformatics,Bugs,Programming,Software — Patrick Durusau @ 6:18 pm

Cryptic genetic variation in software: hunting a buffered 41 year old bug by Sean Eddy.

From the post:

In genetics, cryptic genetic variation means that a genome can contain mutations whose phenotypic effects are invisible because they are suppressed or buffered, but under rare conditions they become visible and subject to selection pressure.

In software code, engineers sometimes also face the nightmare of a bug in one routine that has no visible effect because of a compensatory bug elsewhere. You fix the other routine, and suddenly the first routine starts failing for an apparently unrelated reason. Epistasis sucks.

I’ve just found an example in our code, and traced the origin of the problem back 41 years to the algorithm’s description in a 1973 applied mathematics paper. The algorithm — for sampling from a Gaussian distribution — is used worldwide, because it’s implemented in the venerable RANLIB software library still used in lots of numerical codebases, including GNU Octave. It looks to me that the only reason code has been working is that a compensatory “mutation” has been selected for in everyone else’s code except mine.

…
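
For readers who have never looked inside one of these samplers, here is a sketch of Marsaglia’s polar method, a standard rejection sampler for Gaussians. It is not the specific 1973 algorithm the post traces, but it shows the flavor of accept/reject logic where a subtle boundary condition can hide for decades:

    import math
    import random

    def gauss_polar():
        """One standard normal variate via Marsaglia's polar method."""
        while True:
            x = 2.0 * random.random() - 1.0
            y = 2.0 * random.random() - 1.0
            s = x * x + y * y
            # The acceptance test is exactly where edge-case bugs lurk:
            # s == 0.0 would divide by zero, and s >= 1.0 breaks the math.
            if 0.0 < s < 1.0:
                return x * math.sqrt(-2.0 * math.log(s) / s)

    print(gauss_polar())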

A bug hunting story to read and forward! Sean just bagged a forty-one (41) year old bug. What’s the oldest bug you have ever found?

When you reach the crux of the problem, you will understand why ambiguous, vague, incomplete and poorly organized standards annoy me to no end.

No guarantees of unambiguous results but if you need extra eyes on IT standards you know where to find me.

I first saw this in a tweet by Neil Saunders.

October 7, 2014

Software Security (MOOC, Starts October 13, 2014!)

Filed under: Cybersecurity,Programming,Security,Software — Patrick Durusau @ 7:21 pm

Software Security

From the post:

Weekly work done at your own pace and schedule by listening to lectures and podcasts, completing quizzes and exercises and peer evaluations. Estimated time commitment is 4 hours/week. Course runs for 9 weeks (ends December 5)


This MOOC introduces students to the discipline of designing, developing, and testing secure and dependable software-based systems. Students will be exposed to the techniques needed for the practice of effective software security techniques. By the end of the course, you should be able to do the following things:

  • Security risk management. Students will be able to assess the security risk of a system under development. Risk management will include the development of formal and informal misuse case and threat models. Risk management will also involve the utilization of security metrics.
  • Security testing. Students will be able to perform all types of security testing, including fuzz testing at each of these levels: white box, grey box, and black box/penetration testing.
  • Secure coding techniques. Students will understand secure coding practices to prevent common vulnerabilities from being injected into software.
  • Security requirements, validation and verification. Students will be able to write security requirements (which include privacy requirements). They will be able to validate these requirements and to perform additional verification practices of static analysis and security inspection.

This course is run by the Computer Science department at North Carolina State University.

Register

One course won’t make you a feared White/Black Hat but everyone has to start somewhere.

Looks like a great opportunity to learn about software security issues and to spot where subject identity techniques could help collate holes or fixes.
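
On the fuzz-testing bullet above, a black-box fuzzer can be startlingly small. A toy sketch (mine, not course material; the target program “./parser” is hypothetical):

    import random
    import subprocess

    def fuzz_once(target_cmd, max_len=1024):
        """Feed one random byte string to the target's stdin; return the
        payload if the process dies from a signal (a crash), else None."""
        payload = bytes(random.randrange(256)
                        for _ in range(random.randrange(1, max_len)))
        proc = subprocess.run(target_cmd, input=payload, capture_output=True)
        return payload if proc.returncode < 0 else None

    for _ in range(1000):
        crasher = fuzz_once(["./parser"])  # hypothetical stdin-reading target
        if crasher is not None:
            print("crashing input:", crasher[:40])
            break

Production fuzzers add coverage feedback and input mutation, but even this blind version finds real crashes in fragile parsers.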

October 6, 2014

Bossies 2014: The Best of Open Source Software Awards

Filed under: Open Source,Software — Patrick Durusau @ 4:30 pm

Bossies 2014: The Best of Open Source Software Awards by Doug Dineley.

From the post:

If you hadn’t noticed, we’re in the midst of an incredible boom in enterprise technology development — and open source is leading it. You’re unlikely to find better proof of that dynamism than this year’s Best of Open Source Software Awards, affectionately known as the Bossies.

Have a look for yourself. The result of months of exploration and evaluation, plus the recommendations of many expert contributors, the 2014 Bossies cover more than 130 award winners in six categories:

(emphasis added)

Hard to judge the count because winners are presented one page at a time in each category. Not to mention that at least one winner appears in two separate categories.

Put into lists and sorted for review, we find:

Open source applications (16)

Open source application development tools (42)

Open source big data tools (20)

Open source desktop and mobile software (14)

Open source data center and cloud software (19)

Open source networking and security software (9)

Creating the list presentation lets us discover that the actual count, allowing for entries that mention more than one software package, is 122 software packages.

BTW, Docker appears under application development tools and under data center and cloud software. Which should make the final count 121 different software packages. (You will have to check the entries at InfoWorld to verify that number.)

PS: The original presentation was in no discernible order. I put the lists into alphabetical order for ease of finding.

October 1, 2014

The Case for HTML Word Processors

Filed under: HTML,Software,Word Processing — Patrick Durusau @ 5:07 pm

The Case for HTML Word Processors by Adam Hyde.

From the post:

Making a case for HTML editors as stealth Desktop Word Processors…the strategy has been so stealthy that not even the developers realised what they were building.

We use all these over-complicated softwares to create Desktop documents. Microsoft Word, LibreOffice, whatever you like – we know them. They are one of the core apps in any users operating system. We also know that they are slow, unwieldy and have lots of quirky ways of doing things. However most of us just accept that this is the way it is and we try not to bother ourselves by noticing just how awful these softwares actually are.

So, I think it might be interesting to ask just this simple question – what if we used Desktop HTML Editors instead of Word Processors to do Word Processing? It might sound like an irrational proposition…Word Processors are, after all, for Word Processing. HTML editors are for creating…well, …HTML. But let’s just forget that. What if we could allow ourselves to imagine we used an HTML editor for all our word processing needs and HTML replaces .docx and .odt and all those other over-burdened word processing formats. What do we win and what do we lose?

I’m not convinced about HTML word processors but Adam certainly starts with the right question:

What do we win and what do we lose? (emphasis added)

Line your favorite word processing format up alongside HTML + CSS and calculate the wins and losses.

Not that HTML word processors can, should or will replace complex typography when appropriate, but how many documents need the full firepower of a modern word processor?

I would ask a similar question about authoring interfaces for topic maps. What is the least interface that can usefully produce a topic map?

The full bells-and-whistles versions are common now (I omit naming names) but should those be the only choices?

PS: As far as MS Word, I use “open,” “close,” “save,” “copy,” “paste,” “delete,” “hyperlink,” “bold,” and “italic.” What’s that? Nine operations? Your experience may vary. 😉

I use LaTeX and another word processing application for most of my writing off the Web.

I first saw this in a tweet by Ivan Herman.

September 28, 2014

Big Data – A curated list of big data frameworks, resources and tools

Filed under: BigData,Curation,Software — Patrick Durusau @ 4:28 pm

Big Data – A curated list of big data frameworks, resources and tools by Andrea Mostosi.

From the post:

“Big-data” is one of the most inflated buzzwords of recent years. Technologies born to handle huge datasets and overcome the limits of previous products are gaining popularity outside the research environment. The following list is meant as a reference for this world. It’s still incomplete and always will be.

Four hundred and eighty-four (484) resources by my count.

An impressive collection but HyperGraphDB is missing from this list.

Others that you can name offhand?

I don’t think the solution to the many partial “Big Data” lists of software, techniques and other resources is to create yet another list of the same. That would be a duplicated (and doomed) effort.

You?

Suggestions?

August 31, 2014

Web Data Commons Extraction Framework …

Filed under: Common Crawl,Software — Patrick Durusau @ 2:33 pm

Web Data Commons Extraction Framework for the Distributed Processing of CC Data by Robert Meusel.

Interested in a framework to process all the Common Crawl data?

From the post:

We used the extraction tool for example to extract a hyperlink graph covering over 3.5 billion pages and 126 billion hyperlinks from the 2012 CC corpus (over 100TB when uncompressed). Using our framework and 100 EC2 instances, the extraction took less than 12 hours and cost less than US$500. The extracted graph had a size of less than 100GB zipped.

NSA-level processing it’s not, but then you are most likely looking for useful results, not data for the sake of filling up drives.
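
Taking the post’s upper-bound figures at face value, the unit economics are striking:

    instances, hours, total_cost = 100, 12, 500  # upper bounds from the post
    instance_hours = instances * hours           # 1,200 instance-hours
    print(total_cost / instance_hours)           # ~$0.42 per instance-hour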

April 13, 2014

Cross-Scheme Management in VocBench 2.1

Filed under: Mapping,SKOS,Software,Vocabularies,VocBench — Patrick Durusau @ 1:54 pm

Cross-Scheme Management in VocBench 2.1 by Armando Stellato.

From the post:

One of the main features of the forthcoming VB2.1 will be SKOS Cross-Scheme Management

I started drafting some notes about cross-scheme management here: https://art-uniroma2.atlassian.net/wiki/display/OWLART/SKOS+Cross-Scheme+Management

I think it is important to have all the integrity checks related to this aspect clear for humans, and not only have them sealed deep in the code. These notes will help users get acquainted with this feature in advance. Once completed, these will be included also in the manual of VB.

For the moment I’ve only written the introduction, some notes about data integrity and then described the checks carried out on the most dangerous operation: removing a concept from a scheme. Together with the VB development group, we will add more information in the next days. However, if you have some questions about this feature, you may post them here, as usual (or you may use the vocbench user/developer user groups).

A consistent set of operations and integrity checks for cross-scheme are already in place for this 2.1, which will be released in the next days.

VB2.2 will focus on other aspects (multi-project management), while we foresee a second wave of facilities for cross-scheme management (such as mass-move/add/remove actions, fixing utilities, analysis of dangling concepts, corrective actions, etc.) for VB2.3.

I agree that:

I think it is important to have all the integrity checks related to this aspect clear for humans, and not only have them sealed deep in the code.

But I am less certain that following the integrity checks of SKOS is useful in all mappings between schemes.

If you are interested in such constraints, see Armando’s notes.
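
If you want to experiment with one of the checks Armando mentions, a dangling-concept scan (concepts with no skos:inScheme assertion) takes only a few lines with rdflib. A sketch, with a hypothetical file name:

    from rdflib import Graph
    from rdflib.namespace import RDF, SKOS

    g = Graph()
    g.parse("vocabulary.ttl", format="turtle")  # hypothetical SKOS file

    # Flag concepts that do not declare membership in any scheme.
    for concept in g.subjects(RDF.type, SKOS.Concept):
        if g.value(concept, SKOS.inScheme) is None:
            print("dangling:", concept)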

January 18, 2014

Licensing Your Code:…

Filed under: Licensing,Software — Patrick Durusau @ 9:10 pm

Licensing Your Code: GPL, BSD and Edvard Munch’s “The Scream” by Bruce Berriman.

From the post:

I have for some time considered changing to a more permissive license (with Caltech’s approval) for the Montage image mosaic engine, as the current license forbids modification and redistribution of the code. My first attempt at navigating the many licenses available led me to believe that the subject of Edvard Munch’s “The Scream” was not oppressed by society but simply trying to find the best license for his software.

The license, of course, specifies the terms and conditions under which the software may be used, distributed and modified, and distinctions between licenses are important. Trouble is, there are so many of them. The Wikipedia page on Comparison of Free and Open Source Licenses lists over 40 such licenses, each with varying degrees of approval from the free software community.

Not that I have any code to release but I assume the same issues apply to releasing data sets.

Do not leave licensing of code or data as “understood” or put it off until “later” in a project. Interests and levels of cooperation may vary over time.

Best to get data and code licensing details in writing when everyone is in a good humor.

December 27, 2013

Naming Software?

Filed under: Names,Software — Patrick Durusau @ 4:20 pm

When you are naming software, please do not use UPPERCASE letters to distinguish your software from another name.

Why?

Because the income-generating imitations of search engines normalize case, even if the terms are double-quoted.

Thus, if I search for TWITter*, the first hit (drum roll) will be: “twitter.com”

Which, if I have gone to the trouble of double-quoting the text, very likely isn’t what I am looking for.

Choose what you think is a good name for your software but if you want people to find it, don’t be clever with case as though it makes a difference.
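
The mechanics are easy to demonstrate:

    # Search engines typically case-fold query terms before matching,
    # so a case-only distinction like "TWITter" vs. "twitter" vanishes.
    print("TWITter".lower() == "twitter")  # True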

*TWITter: I don’t know if this is the name of a real project or not. If it is, my apologies.

November 21, 2013

How To Make Operating System by Yourself ?

Filed under: Programming,Software — Patrick Durusau @ 2:52 pm

How To Make Operating System by Yourself? by Jasmin Shah.

From the post:

Having an Operating System named after you, Sounds Amazing! Isn’t it?

Especially after watching the IronMan series, I am a die-hard fan of the J.A.R.V.I.S. Operating System.

So, let’s get started to make Operating System on our own. Once you are done with it, Don’t forget to share your operating system with me in the comment section below.

A bit oversold ;-), but Jasmin walks the reader through using SuseStudio.com to create a complete operating system.

Uses?

Well, an appliance that saves first-time topic map users from installation purgatory is one idea.

Another idea would be to bundle content and/or tutorials with your topic map software.

Or to bundle databases/stores, etc. for a side-by-side comparison by users on the same content.

What would you put in your “operating system?”

October 14, 2013

Astrophysics Source Code Library…

Filed under: Algorithms,Astroinformatics,Programming,Software — Patrick Durusau @ 4:25 pm

Astrophysics Source Code Library: Where do we go from here? by Ms. Alice Allen.

From the introduction:

This week I am featuring a guest post by Ms. Alice Allen, the Editor of the Astrophysics Source Code Library, an on-line index of codes used in astronomical research that have been referenced in peer-reviewed journal articles. The post is essentially a talk given by Ms. Allen at the recent ADASS XXIII meeting. The impact of the ASCL is growing – a poster by Associate Editor Kim DuPrie at ADASS XXIII showed that there are now 700+ codes indexed, and quarterly page views have quadrupled from Q1/2011 to 24,000. Researchers are explicitly citing the code in papers that use the software, the ADS is linking software to papers about the code, and the ASCL is sponsoring workshops and discussion forums to identify obstacles to code sharing and propose solutions. And now, over to you, Alice: (emphasis in original)

Alice describes “success” as:

Success for us is this: you read a paper, want to see the code, click a link or two, and can look at the code for its underlying assumptions, methods, and computations. Alternately, if you want to investigate an unfamiliar domain, you can peruse the ASCL to see what codes have been written in that area.

Imagine having that level of “success” for data sets or data extraction source code.

October 11, 2013

Security Patch Bounties!

Filed under: Cybersecurity,NSA,Programming,Security,Software — Patrick Durusau @ 6:17 pm

Google Offers New Bounty Program For Securing Open-Source Software by Kelly Jackson Higgins.

From the post:

First there was the bug bounty, and now there’s the patch bounty: Google has launched a new program that pays researchers for security fixes to open-source software.

The new experimental program offers rewards from $500 to $3,133.70 for coming up with security improvements to key open-source software projects. It is geared to complement Google’s bug bounty program for Google Web applications and Chrome.

Google’s program initially will encompass network services OpenSSH, BIND, ISC DHCP; image parsers libjpeg, libjpeg-turbo, libpng, giflib; Chromium and Blink in Chrome; the OpenSSL and zlib libraries; and Linux kernel components, including KVM. Google plans to next include Web servers Apache httpd, lighttpd, nginx; SMTP services Sendmail, Postfix, Exim; GCC, binutils, and llvm; and OpenVPN.

Industry concerns over security flaws in open-source code have escalated as more applications rely on these components. Michal Zalewski of the Google Security Team says the search engine giant initially considered a bug bounty program for open-source software, but decided to provide financial incentives for better locking down open-source code.

“We all benefit from the amazing volunteer work done by the open-source community. That’s why we keep asking ourselves how to take the model pioneered with our Vulnerability Reward Program — and employ it to improve the security of key third-party software critical to the health of the entire Internet,” Zalewski said in a blog post. “We thought about simply kicking off an OSS bug-hunting program, but this approach can easily backfire. In addition to valid reports, bug bounties invite a significant volume of spurious traffic — enough to completely overwhelm a small community of volunteers. On top of this, fixing a problem often requires more effort than finding it.”

So Google went with offering money for improving the security of open-source software “that goes beyond merely fixing a known security bug,” he blogged. “Whether you want to switch to a more secure allocator, to add privilege separation, to clean up a bunch of sketchy calls to strcat(), or even just to enable ASLR – we want to help.”

The official rules include this statement:

Reactive patches that merely address a single, previously discovered vulnerability will typically not be eligible for rewards.

I read that to mean that hardening the security of the covered projects may qualify for an award (must be accepted by the project first).

I wonder if Google will consider a bonus if the patch repairs an NSA-induced security weakness?

October 7, 2013

The IMS Open Corpus Workbench (CWB)

Filed under: Corpora,Corpus Linguistics,Software — Patrick Durusau @ 3:46 pm

The IMS Open Corpus Workbench (CWB)

From the webpage:

The IMS Open Corpus Workbench (CWB) is a collection of open-source tools for managing and querying large text corpora (ranging from 10 million to 2 billion words) with linguistic annotations. Its central component is the flexible and efficient query processor CQP.

The first official open-source release of the Corpus Workbench (Version 3.0) is now available from this website. While many pages are still under construction, you can download release versions of the CWB, associated software and sample corpora. You will also find some documentation and other information in the different sections of this site.

If you are investigating large amounts of text, this may be the tool for you.

BTW, don’t miss: Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium by Stefan Evert and Andrew Hardie.

Abstract:

Corpus Workbench (CWB) is a widely-used architecture for corpus analysis, originally designed at the IMS, University of Stuttgart (Christ 1994). It consists of a set of tools for indexing, managing and querying very large corpora with multiple layers of word-level annotation. CWB’s central component is the Corpus Query Processor (CQP), an extremely powerful and efficient concordance system implementing a flexible two-level search language that allows complex query patterns to be specified both at the level of an individual word or annotation, and at the level of a fully- or partially-specified pattern of tokens. CWB and CQP are commonly used as the back-end for web-based corpus interfaces, for example, in the popular BNCweb interface to the British National Corpus (Hoffmann et al. 2008). CWB has influenced other tools, such as the Manatee software used in SketchEngine, which implements the same query language (Kilgarriff et al. 2004).

This paper details recent work to update CWB for the new century. Perhaps the most significant development is that CWB version 3 is now an open source project, licensed under the GNU General Public Licence. This change has substantially enlarged the community of developers and users and has enabled us to leverage existing open-source libraries in extending CWB’s capabilities. As a result, several key improvements were made to the CWB core: (i) support for multiple character sets, most especially Unicode (in the form of UTF-8), allowing all the world’s writing systems to be utilised within a CWB-indexed corpus; (ii) support for powerful Perl-style regular expressions in CQP queries, based on the open-source PCRE library; (iii) support for a wider range of OS platforms including Mac OS X, Linux, and Windows; and (iv) support for larger corpus sizes of up to 2 billion words on 64-bit platforms.

Outside the CWB core, a key concern is the user-friendliness of the interface. CQP itself can be daunting for beginners. However, it is common for access to CQP queries to be provided via a web-interface, supported in CWB version 3 by several Perl modules that give easy access to different facets of CWB/CQP functionality. The CQPweb front-end (Hardie forthcoming) has now been adopted as an integral component of CWB. CQPweb provides analysis options beyond concordancing (such as collocations, frequency lists, and keywords) by using a MySQL database alongside CQP. Available in both the Perl interface and CQPweb is the Common Elementary Query Language (CEQL), a simple-syntax set of search patterns and wildcards which puts much of the power of CQP in a form accessible to beginning students and non-corpus-linguists.

The paper concludes with a roadmap for future development of the CWB (version 4 and above), with a focus on even larger corpora, full support for XML and dependency annotation, new types of query languages, and improved efficiency of complex CQP queries. All interested users are invited to help us shape the future of CWB by discussing requirements and contributing to the implementation of these features.

I have been using some commercial concordance software recently on standards drafts.

I need to give the IMS Open Corpus Workbench (CWB) a spin.

I would not worry about the 2 billion word corpus limitation.

That’s approximately 3,333.33 times the number of words in War and Peace by Leo Tolstoy. (I rounded the English translation word count up to 600,000 for an even number.)

September 11, 2013

Mikut Data Mining Tools Big List – Update

Filed under: Data Mining,Software — Patrick Durusau @ 5:14 pm

Mikut Data Mining Tools Big List – Update

From the post:

An update of the Excel table describing 325 recent and historical data mining tools is now online (Excel format); 31 tools were added since the last update in November 2012. These new updated tools include new published tools and some well-established tools with a statistical background.

Here is the full updated table of tools (XLS format), which contains additional material to the paper:

R. Mikut, M. Reischl: “Data Mining Tools“. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. DOI: 10.1002/widm.24., September/October 2011, Vol. 1

Please help the authors to improve this Excel table:
Contact: ralf.mikut@kit.edu

The post includes a table of the active tools with hyperlinks.

After looking at the spreadsheet, I was puzzled to find that “active and relevant” tools number only one hundred (100).

Does that seem low to you? Especially with the duplication of basic capabilities in different languages?

If you spot any obvious omissions, please send them to: ralf.mikut@kit.edu

August 22, 2013

Antepedia…

Filed under: Programming,Search Engines,Software — Patrick Durusau @ 6:18 pm

Antepedia Open Source Project Search Engine

From the “more” information link on the homepage:

Antepedia is the largest knowledge base of open source components with over 2 million current projects, and 1,000 more added daily. Antepedia continuously aggregates data from various directories that include Google Code, Apache, GitHub, Maven, and many more. These directories allow Antepedia to consistently grow as the world’s largest knowledge base of open source components.

Antepedia helps companies protect and secure their software assets, by providing a multi-source tracking solution that assists them in their management of open source governance. This implementation of Antepedia allows an organization to reduce licensing risks and security vulnerabilities in your open source component integration.

Antepedia is a public site that provides a way for anyone to search for an open source project. In cases where a project is not currently indexed in the knowledge base, you can manually submit that project, and help build upon the Antepedia knowledge base. These various benefits allow Antepedia to grow and offer the necessary functionalities, which provide the information you need, when you need it. With Antepedia you can assure that you have the newest & most relevant information for all your open source management and detection projects.

See also: Antepedia Reporter Free Edition for tracking open source projects.

If you like open source projects, take a look at: http://www.antelink.com/ (sponsor of Antepedia).

Do navigate on and off the Antelink homepage and watch the Antepedia counter increment to the same number. 😉 I’m sure the total changes day to day but it was funny to see it reach the same number more than twice.

August 13, 2013

Source Code Search Engines [DIY Drones]

Filed under: Open Source,Programming,Software — Patrick Durusau @ 4:04 pm

Open Source Matters: 6 Source Code Search Engines You Can Use For Programming Projects by Saikat Basu.

From the post:

The Open source movement is playing a remarkable role in pushing technology and making it available to all. The success of Linux is also an example of how open source can translate into a successful business model. Open source is pretty much mainstream now and in the coming years, it could have a major footprint across cutting edge educational technology and aerospace (think DIY drones).

Open source projects need all the help they can get. If not with funding, then with volunteers contributing to open source programming and free tools they can brandish. Search engines tuned with algorithms to find source code for programming projects are among the tools for the kit bag. While reusing code is a much debated topic in higher circles, they could be of help to beginner programmers and those trying to work their way through a coding logjam by cross-referencing their code. Here are six:

I don’t think any of these search engines will show up in comScore results. 😉

But they are search engines for a particular niche. And so free to optimize for their expected content, rather than trying to search everything. (Is there a lesson there?)

Which ones do you like best?

PS: On DIY drones, see: DIY DRONES – The Leading Community for Personal UAVs.

You may recall Abbie Hoffman saying in Steal this Book:

If you are around a military base, you will find it relatively easy to get your hands on an M-79 grenade launcher, which is like a giant shotgun and is probably the best self-defense weapon of all time. Just inquire discreetly among some long-haired soldiers.

Will DIY drones replace the M-79 grenade launcher as the “best” self-defense weapon?

April 10, 2013

Neo4j in Action – Software Metrics [Correction]

Filed under: Graphs,Neo4j,Software — Patrick Durusau @ 1:38 pm

Neo4j in Action – Software Metrics by Michael Hunger.

Michael walks through exploring a Java class as a graph.

Makes me curious about treating code as a graph to discover which classes touch the same data.
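
As a toy illustration of that idea (my sketch, not Michael’s, with hypothetical class and data names), treat classes and the data they touch as a bipartite graph and invert it; shared data items fall out immediately:

    from collections import defaultdict

    # Hypothetical extraction result: class name -> data items it reads/writes.
    accesses = {
        "OrderService":   {"orders", "customers"},
        "BillingService": {"orders", "invoices"},
        "ReportJob":      {"orders", "invoices", "customers"},
    }

    touched_by = defaultdict(set)  # invert: data item -> classes touching it
    for cls, items in accesses.items():
        for item in items:
            touched_by[item].add(cls)

    for item, classes in sorted(touched_by.items()):
        if len(classes) > 1:  # shared data is a coupling point
            print(f"{item}: {', '.join(sorted(classes))}")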

BTW, the tweeted location: http://www.slideshare.net/mobile/jexp/class-graph-neo4j-and-software-metrics does not appear to work in a desktop browser.

I was able to locate: http://www.slideshare.net/jexp/class-graph-neo4j-and-software-metrics, which is the link I use above.

March 29, 2013

The Artful Business of Data Mining…

Filed under: Data Mining,Software,Statistics — Patrick Durusau @ 8:25 am

David Coallier has two presentations under that general title:

Distributed Schema-less Document-Based Databases

and,

Computational Statistics with Open Source Tools

Neither one is a “…death by powerpoint…” type presentation where the speaker reads text you can read for yourself.

Which is good, except that with minimal slides, you get an occasional example, names of software/techniques, but you have to fill in a lot of context.

A pointer to videos of either of these presentations would be greatly appreciated!

March 27, 2013

Database Landscape Map – February 2013

Filed under: Database,Graph Databases,Key-Value Stores,NoSQL,Software,SQL — Patrick Durusau @ 11:55 am

Database Landscape Map – February 2013 by 451 Research.

Database map

A truly awesome map of available databases.

Originated from Neither fish nor fowl: the rise of multi-model databases by Matthew Aslett.

Matthew writes:

One of the most complicated aspects of putting together our database landscape map was dealing with the growing number of (particularly NoSQL) databases that refuse to be pigeon-holed in any of the primary databases categories.

I have begun to refer to these as “multi-model databases” in recognition of the fact that they are able to take on the characteristics of multiple databases. In truth though there are probably two different groups of products that could be considered “multi-model”:

I think I understand the grouping from the key to the map but the ordering within groups, if meaningful, escapes me.

I am sure you will recognize most of the names but equally sure there will be some you can’t quite describe.

Enjoy!

March 8, 2013

Databases & Dragons

Filed under: MongoDB,Software — Patrick Durusau @ 5:17 pm

Databases & Dragons by Kristina Chodorow.

From the post:

Here are some exercises to battle-test your MongoDB instance before going into production. You’ll need a Database Master (aka DM) to make bad things happen to your MongoDB install and one or more players to try to figure out what’s going wrong and fix it.

Should be of interest if you are taking MongoDB into production.

The idea should also be of interest if you are developing other software to go into production.

Most software (not all) works fine with expected values, other components responding correctly, etc.

But those are the very conditions your software may not encounter in production.
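
A cheap way to start acting on that observation (a toy sketch in the same spirit, not from Kristina’s post) is to throw deliberately unexpected values at a routine and see what escapes, and what parses when you would rather it didn’t:

    def parse_age(s):
        """Happy-path parser that production inputs will eventually break."""
        return int(s)

    # Battle-test with the values production will actually send you.
    for bad in ["", " 42", "4.2", "NaN", "-1", "9" * 40, None]:
        try:
            print(f"{bad!r} -> {parse_age(bad)}")
        except (TypeError, ValueError) as err:
            print(f"{bad!r} -> {type(err).__name__}")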

Where’s your “databases & dragons” test for your software?

March 3, 2013

Liferay / Marketplace

Filed under: Enterprise Integration,Open Source,Software — Patrick Durusau @ 2:14 pm

Liferay. Enterprise. Open Source. For Life.

Enterprise.

Liferay, Inc. was founded in 2004 in response to growing demand for Liferay Portal, the market’s leading independent portal product that was garnering industry acclaim and adoption across the world. Today, Liferay, Inc. houses a professional services group that provides training, consulting and enterprise support services to our clientele in the Americas, EMEA, and Asia Pacific. It also houses a core development team that steers product development.

Open Source.

Liferay Portal was, in fact, created in 2000 and boasts a rich open source heritage that offers organizations a level of innovation and flexibility unrivaled in the industry. Thanks to a decade of ongoing collaboration with its active and mature open source community, Liferay’s product development is the result of direct input from users with representation from all industries and organizational roles. It is for this reason that organizations turn to Liferay technology for exceptional user experience, UI, and both technological and business flexibility.

For Life.

Liferay, Inc. was founded for a purpose greater than revenue and profit growth. Each quarter we donate to a number of worthy causes decided upon by our own employees. In the past we have made financial contributions toward AIDS relief and the Sudan refugee crisis through well-respected organizations such as Samaritan’s Purse and World Vision. This desire to impact the world community is the heart of our company, and ultimately the reason why we exist.

The Liferay Marketplace may be of interest for open source topic map projects.

There are only a few mentions of topic maps in the mailing list archives and none of those are recent.

Could be time to rekindle that conversation.

I first saw this at: Beyond Search.

February 24, 2013

usenet-legend

Filed under: Archives,Software — Patrick Durusau @ 7:56 pm

usenet-legend by Zach Beane

From the description:

This is Usenet Legend, an application for producing a searchable archive of an author’s comp.lang.lisp history from Ron Garrett’s large archive dump.

Zach mentions this in his post The Rob Warnock Lisp Usenet Archive but I thought it needed a separate post.

Making content more navigable is always a step in the right direction.

February 17, 2013

Finding tools vs. making tools:…

Filed under: Journalism,News,Software — Patrick Durusau @ 8:17 pm

Finding tools vs. making tools: Discovering common ground between computer science and journalism by Nick Diakopoulos.

From the post:

The second Computation + Journalism Symposium convened recently at the Georgia Tech College of Computing to ask the broad question: What role does computation have in the practice of journalism today and in the near future? (I was one of its organizers.) The symposium attracted almost 150 participants, both technologists and journalists, to discuss and debate the issues and to forge a multi-disciplinary path forward around that question.

Topics for panels covered the gamut, from precision and data journalism, to verification of visual content, news dissemination on social media, sports and health beats, storytelling with data, longform interfaces, the new economic landscape of content, and the educational needs of aspiring journalists. But what made these sessions and topics really pop was that participants on both sides of the computation and journalism aisle met each other in a conversational format where intersections and differences in the ways they viewed these topics could be teased apart through dialogue. (Videos of the sessions are online.)

While the panelists were all too civilized for any brawls to break out, mixing two disciplines as different as computing and journalism nonetheless did lead to some interesting discussions, divergences, and opportunities that I’d like to explore further here. Keeping these issues top-of-mind should help as this field moves forward.

Tool foragers and tool forgers

The following metaphor is not meant to be incendiary, but rather to illuminate two different approaches to tool innovation that seemed apparent at the symposium.

Imagine you live about 10,000 years ago, on the cusp of the Neolithic Revolution. The invention of agriculture is just around the corner. It’s spring and you’re hungry after the long winter. You can start scrounging around for berries and other tasty roots to feed you and your family — or you can stop and try to invent some agricultural implements, tools adapted to your own local crops and soil that could lead to an era of prosperity. If you take the inventive approach, you might fail, and there’s a real chance you’ll starve trying — while foraging will likely guarantee you another year of subsistence life.

What role does computation have in your field of practice?

February 15, 2013

“Document Design and Purpose, Not Mechanics”

Filed under: Documentation,Software — Patrick Durusau @ 1:51 pm

“Document Design and Purpose, Not Mechanics” by Stephen Turner.

From the post:

If you ever write code for scientific computing (chances are you do if you’re here), stop what you’re doing and spend 8 minutes reading this open-access paper:

Wilson et al. Best Practices for Scientific Computing. arXiv:1210.0530 (2012). (Direct link to PDF).

The paper makes a number of good points regarding software as a tool just like any other lab equipment: it should be built, validated, and used as carefully as any other physical instrumentation. Yet most scientists who write software are self-taught, and haven’t been properly trained in fundamental software development skills.

The paper outlines ten practices every computational biologist should adopt when writing code for research computing. Most of these are the usual suspects that you’d probably guess – using version control, workflow management, writing good documentation, modularizing code into functions, unit testing, agile development, etc. One that particularly jumped out at me was the recommendation to document design and purpose, not mechanics.

We all know that good comments and documentation are critical for code reproducibility and maintenance, but inline documentation that recapitulates the code is hardly useful. Instead, we should aim to document the underlying ideas, interface, and reasons, not the implementation. (emphasis added)
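
A minimal illustration of the difference (my example, not the paper’s):

    def normalize(counts):
        # Mechanics (recapitulates the code, adds nothing):
        #     divide each count by the total.
        # Design and purpose (worth writing down):
        #     downstream steps assume probabilities summing to 1.0; raw
        #     counts from different samples are not comparable until scaled.
        total = sum(counts)
        return [c / total for c in counts]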

There is no shortage of advice (largely unread) on good writing practices. 😉

Stephen calling out the advice to “…document design and purpose, not mechanics” struck me as relevant to semantic integration solutions.

In both RDF and XTM topic maps, use of the same URI as an identifier is taken to identify the same subject.

But that’s mechanics, isn’t it? Just string-to-string comparison.

Mechanics are important but they are just mechanics.

Documenting the conditions for using a URI will help guide you or your successor to using the same URI the same way.

But that takes more than mechanics.

That takes “…document[ing] the underlying ideas, interface, and reasons, not the implementation.”

February 12, 2013

In Cyberwar, Software Flaws Are A Hot Commodity

Filed under: Security,Software — Patrick Durusau @ 6:20 pm

In Cyberwar, Software Flaws Are A Hot Commodity by Tom Gjelten.

Morning Edition ran a story today on firms that are finding software flaws and then selling them to the highest bidder.

A market that has exploded in the last two years.

If there is a market for the latest and greatest flaws, doesn’t the same exist for flaws in older software that hasn’t been upgraded?

Flaws that are “out there” and known, but scattered over email lists, web pages, blog posts, conference proceedings.

But not collated, verified and packaged together.

Just curious.
