Ack 2.0 enhances the “grep for source code”
From the post:
The developers of ack have released version 2.0 of their grep-like tool optimised for searching source code. Described as “designed for programmers”, ack has been available since 2005 and is based on Perl’s regular expressions engine. It minimises false positives by ignoring version control directories by default and has flexible highlighting for matches. The newly released ack 2.0 introduces a more flexible identification system, better support for ackrc configuration files and the ability to read the list of files to be searched from stdin.
Its developers say that ack is designed to perform in a similar fashion to GNU grep but to improve on it when searching source code repositories. The program’s website at beyondgrep.com lists a number of reasons why programmers might want to use ack instead of grep when searching through source code, the least of which is that the ack command is quicker to type than grep. But ack brings a lot more to the table than that, as it is specifically designed to deal with source code and to understand a large number of programming languages and tools such as build systems and version control software.
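To make two of the features above concrete (skipping version control directories by default, and restricting a search to files of one language type), here is a minimal Python sketch. It is not ack’s implementation, which is written in Perl. The IGNORED_DIRS set and the small TYPE_EXTENSIONS table are hypothetical stand-ins for ack’s much larger, configurable defaults.

```python
import os
import re
from typing import Iterable, Iterator

# Hypothetical subset of directories to skip; ack's real defaults are larger
# and can be changed through ackrc configuration files.
IGNORED_DIRS = {".git", ".svn", ".hg", ".bzr", "CVS"}

# Hypothetical, much-reduced mapping from a type name to file extensions.
TYPE_EXTENSIONS = {
    "perl": {".pl", ".pm", ".t"},
    "python": {".py"},
    "java": {".java"},
}


def candidate_files(root: str, file_type: str) -> Iterator[str]:
    """Yield files under root whose extension belongs to file_type,
    never descending into version control directories."""
    extensions = TYPE_EXTENSIONS[file_type]
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] in place prunes the walk, so ignored
        # directories are never visited at all.
        dirnames[:] = sorted(d for d in dirnames if d not in IGNORED_DIRS)
        for name in sorted(filenames):
            if os.path.splitext(name)[1] in extensions:
                yield os.path.join(dirpath, name)


def search(pattern: str, paths: Iterable[str]) -> Iterator[tuple[str, int, str]]:
    """Yield (path, line number, line text) for every line matching pattern."""
    regex = re.compile(pattern)
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path, lineno, line.rstrip("\n")
```

Because search accepts any iterable of paths, the list of files could just as well come from stdin, loosely mirroring the new ack 2.0 option, for example search(pattern, (line.strip() for line in sys.stdin)).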
Is there any ongoing discussion of semantic searching for source code?
At least from the searching I have done, there seems to be a gap in that area. Given the semi-structured nature of source code and the controlled semantics of programming languages, I had expected to find more, but maybe I’m using the wrong search terms. (Where is semantic search when you need it?)
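As one small illustration of what a language’s controlled semantics make possible, a parser can distinguish a write to a variable from a read of it, which a line-oriented regex cannot do. The sketch below uses Python’s standard ast module on Python source only. The function name find_name_uses is hypothetical, and Python was chosen purely for illustration.

```python
import ast


def find_name_uses(source: str, name: str) -> dict[str, list[int]]:
    """Return the line numbers where `name` is written versus read
    in a piece of Python source code."""
    uses: dict[str, list[int]] = {"write": [], "read": []}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id == name:
            # Load means the value is read; Store and Del both change the
            # binding, so they are counted as writes here.
            kind = "read" if isinstance(node.ctx, ast.Load) else "write"
            uses[kind].append(node.lineno)
    # ast.walk is breadth-first, not line-ordered, so sort the results.
    for kind in uses:
        uses[kind].sort()
    return uses


code = "total = 0\nfor x in items:\n    total += x\nprint(total)\n# total in a comment\n"
print(find_name_uses(code, "total"))
```

A plain grep for total would also match the comment on the last line; the parser never sees it because comments do not reach the syntax tree. An augmented assignment such as total += x is recorded only as a write, because the AST gives its target a Store context even though the old value is also read.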
Over the last year I found plenty of sophisticated source code searching/analysis tools. Here are a few.
http://www.rigi.csc.uvic.ca/Pages/download.html
https://www.scitools.com/
http://www.headwaysoftware.com/products/structure101/structural-analysis.php
I also found a few semantic search tools that may be applicable with some configuration or customization.
http://www.ontotext.com/kim/tailoring-kim
http://www.io-informatics.com/products/index.html
http://www.iqser.com/web/guest/iqser-gin-weaver
This one may combine semantics and source code analysis, but it does not look very active.
http://latino.sourceforge.net/
Comment by clemp — April 23, 2013 @ 10:53 pm
Great pointers! Thanks!
After sleeping on it, I would separate out the semantics of the operations from the semantics of the variables and data.
For example, if a data center has a collection of Pig scripts, how would you search them for write operations to a particular data set? That should be in the documentation, but it seems more reliable to search the code itself.
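A first approximation of that search can be purely lexical: Pig Latin writes results out with STORE alias INTO 'location', so matching those statements finds candidate writes. The Python sketch below is only that approximation. The names find_writes and strip_comments are hypothetical, and the sketch will miss locations supplied through parameters or assembled in other ways, which is exactly where a search that understands the script’s semantics would earn its keep.

```python
import re
from typing import Iterator

# Pattern for Pig Latin statements of the form
#   STORE alias INTO 'location' [USING ...];
# Pig keywords are case-insensitive, hence re.IGNORECASE.
STORE_RE = re.compile(
    r"\bSTORE\s+(?P<alias>\w+)\s+INTO\s+'(?P<location>[^']*)'",
    re.IGNORECASE,
)


def strip_comments(script: str) -> str:
    """Remove /* ... */ block comments and -- line comments. This is naive:
    comment markers inside string literals are not recognized."""
    without_blocks = re.sub(r"/\*.*?\*/", " ", script, flags=re.DOTALL)
    return re.sub(r"--[^\n]*", "", without_blocks)


def find_writes(script: str, dataset: str) -> Iterator[tuple[str, str]]:
    """Yield (alias, location) for each STORE whose location is `dataset`
    itself or a path underneath it."""
    prefix = dataset.rstrip("/")
    for match in STORE_RE.finditer(strip_comments(script)):
        location = match.group("location")
        if location == prefix or location.startswith(prefix + "/"):
            yield match.group("alias"), location
```

For example, find_writes(script_text, "/data/sales") would report STORE clean INTO '/data/sales/clean' but not a write to '/data/sales_archive', and it ignores STORE statements that have been commented out.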
Comment by Patrick Durusau — April 24, 2013 @ 5:29 am
I agree. There are probably several other dimensions that would be helpful to explore separately as well. One thing I’ve learned from years of working on large, configurable industrial systems is that the documentation is never up to date and only sometimes accurate at any point in time. In fact, nobody who supports a system trusts the documentation for more than a few months after the system is installed. It’s helpful as background information but only as a supplement to the source code.
Comment by clemp — April 24, 2013 @ 7:49 pm