Archive for the ‘Researchers’ Category

Every Congressional Research Service Report – 8,000+ and growing!

Wednesday, October 19th, 2016

From the homepage:

We’re publishing reports by Congress’s think tank, the Congressional Research Service, which provides valuable insight and non-partisan analysis of issues of public debate. These reports are already available to the well-connected — we’re making them available to everyone for free.

From the about page:

Congressional Research Service reports are the best way for anyone to quickly get up to speed on major political issues without having to worry about spin — from the same source Congress uses.

CRS is Congress’ think tank, and its reports are relied upon by academics, businesses, judges, policy advocates, students, librarians, journalists, and policymakers for accurate and timely analysis of important policy issues. The reports are not classified and do not contain individualized advice to any specific member of Congress. (More: What is a CRS report?)

Until today, CRS reports were generally available only to the well-connected.

Now, in partnership with a Republican and Democratic member of Congress, we are making these reports available to everyone for free online.

A coalition of public interest groups, journalists, academics, students, some Members of Congress, and former CRS employees have been advocating for greater access to CRS reports for over twenty years. Two bills in Congress to make these reports widely available already have 10 sponsors (S. 2639 and H.R. 4702, 114th Congress) and we urge Congress to finish the job.

This website shows Congress one vision of how it could be done.

What does the site include? It includes 8,255 CRS reports. The number changes regularly.

It’s every CRS report that’s available on Congress’s internal website.

We redact the phone number, email address, and names of virtually all the analysts from the reports. We add disclaimer language regarding copyright and the role CRS reports are intended to play. That’s it.

If you’re looking for older reports, our good friends may have them.

We also show how much a report has changed over time (whenever CRS publishes an update), provide RSS feeds, and we hope to add more features in the future. Help us make that possible.
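
The “how much a report has changed” measurement can be approximated with Python’s standard difflib; the two sentences below are invented stand-ins for report text, not the site’s actual method:

```python
import difflib

# Two hypothetical versions of a sentence from a CRS report.
old = "The debt limit was raised in February 2014."
new = "The debt limit was suspended in February 2014 and raised in March 2015."

# Word-level similarity ratio: 1.0 means identical, 0.0 means nothing in common.
similarity = difflib.SequenceMatcher(None, old.split(), new.split()).ratio()
print(f"{similarity:.0%} of the wording is unchanged")
```

Running the same comparison over each pair of successive report versions gives a rough change history.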

To receive an email alert for all new reports and new reports in a particular topic area, use the RSS icon next to the topic area titles and a third-party service, like IFTTT, to monitor the RSS feed for new additions.
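
If you would rather not rely on IFTTT, the monitoring step itself is easy to sketch with the standard library. The sample feed below is invented and the site’s actual feed layout may differ; the idea is just to remember which item GUIDs you have already seen:

```python
import xml.etree.ElementTree as ET

def new_entries(rss_xml: str, seen: set) -> list:
    """Return titles of feed items whose GUIDs we have not seen before."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid") or item.findtext("link")
        if guid and guid not in seen:
            seen.add(guid)
            fresh.append(item.findtext("title"))
    return fresh

# A made-up two-item feed standing in for a real topic-area RSS feed.
SAMPLE = """<rss version="2.0"><channel><title>CRS Reports</title>
<item><title>Report A</title><guid>r1</guid></item>
<item><title>Report B</title><guid>r2</guid></item>
</channel></rss>"""

seen = set()
print(new_entries(SAMPLE, seen))  # first poll: both items are new
print(new_entries(SAMPLE, seen))  # second poll: nothing new
```

Poll the real feed URL on a schedule and mail yourself whatever `new_entries` returns.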

This is major joyful news for policy wonks and researchers everywhere.

A site to bookmark, and one to support with contributions!

My joy was alloyed by the notice:

We redact the phone number, email address, and names of virtually all the analysts from the reports. We add disclaimer language regarding copyright and the role CRS reports are intended to play. That’s it.

The privileged, who get the CRS reports anyway, have that information?

What is the value in withholding it from the public?

Support the project, but let’s put the public on an even footing with the privileged, shall we?

Software Carpentry Bug BBQ (June 13th, 2016)

Sunday, June 5th, 2016

Software Carpentry Bug BBQ

From the post:

Software Carpentry is having a Bug BBQ on June 13th

Software Carpentry is aiming to ship a new version (5.4) of the Software Carpentry lessons by the end of June. To help get us over the finish line we are having a Bug BBQ on June 13th to squash as many bugs as we can before we publish the lessons. The June 13th Bug BBQ is also an opportunity for you to engage with our world-wide community. For more info about the event, read on and visit our Bug BBQ website.

How can you participate? We’re asking you, members of the Software Carpentry community, to spend a few hours on June 13th to wrap up outstanding tasks to improve the lessons. Ahead of the event, the lesson maintainers will be creating milestones to identify all the issues and pull requests that need to be resolved before we wrap up version 5.4. In addition to specific fixes laid out in the milestones, we also need help to proofread and bug-test the lessons.

Where will this be? Join in from where you are: No need to go anywhere – if you’d like to participate remotely, start by having a look at the milestones on the website to see what tasks are still open, and send a pull request with your ideas to the corresponding repo. If you’d like to get together with other people working on these lessons live, we have created this map for live sites that are being organized. And if there’s no site listed near you, organize one yourself and let us know you are doing that here so that we can add your site to the map!

The Bug BBQ is going to be a great chance to get the community together, get our latest lessons over the finish line, and wrap up a product that gives you and all our contributors credit for your hard work with a citable object – we will be minting a DOI for this on publication.

A community BBQ that is open to everyone, dietary restrictions or not!

And the organizers have removed distance as a consideration for “attending.”

For those of us on non-BBQ diets, a unique opportunity to participate with others in the community for a worthy cause.

Mark your calendars today!

Reproducible Research Resources for Research(ing) Parasites

Friday, June 3rd, 2016

Reproducible Research Resources for Research(ing) Parasites by Scott Edmunds.

From the post:

Two new research papers on scabies and tapeworms published today showcase a new collaboration. This demonstrates a new way to share scientific methods that allows scientists to better repeat and build upon these complicated studies on difficult-to-study parasites. It also highlights a new means of writing all research papers with citable methods that can be updated over time.

While there has been recent controversy (and hashtags in response) from some of the more conservative sections of the medical community calling those who use or build on previous data “research parasites”, as data publishers we strongly disagree with this. And also feel it is unfair to drag parasites into this when they can teach us a thing or two about good research practice. Parasitology remains a complex field given the often extreme differences between parasites, which all fall under the umbrella definition of an organism that lives in or on another organism (host) and derives nutrients at the host’s expense. Published today in GigaScience are articles on two parasitic organisms, scabies and the tapeworm Schistocephalus solidus. Not only are both papers in parasitology, but the way in which these studies are presented showcases a new collaboration that provides a unique means for reporting the Methods that serves to improve reproducibility. Here the authors take advantage of their open access repository of scientific methods and a collaborative protocol-centered platform, and we for the first time have integrated this into our submission, review and publication process. We now also have a groups page on the portal where our methods can be stored.

A great example of how sharing data advances research.

Of course, that assumes that one of your goals is to advance research and not solely yourself, your funding and/or your department.

Such self-centered, as opposed to research-centered, individuals do exist, but I would not malign true parasites by describing them as such, even colloquially.

The days of science data hoarders are numbered and one can only hope that the same is true for the “gatekeepers” of humanities data, manuscripts and artifacts.

The only known contribution of hoarders or “gatekeepers” has been to the retarding of their respective disciplines.

Given the choice of advancing your field along with yourself, or only yourself, which one will you choose?

Innovation Down Under!

Sunday, December 6th, 2015

Twenty-nine “Welcome to the Ideas Boom” one-pagers from

I saw this in a tweet by Leanne O’Donnell thanking @stilgherrian for putting these in one PDF file.

Hard to say what the results will be but certainly more successful than fattening the usual suspects. (NSF: BD Spokes (pronounced “hoax”) initiative)

Watch for the success factors so you can build upon the experience Australia has with its new approaches.

How to Read a Paper

Saturday, October 17th, 2015

How to Read a Paper by S. Keshav.

From the abstract:

Researchers spend a great deal of time reading research papers. However, this skill is rarely taught, leading to much wasted effort. This article outlines a practical and efficient three-pass method for reading research papers. I also describe how to use this method to do a literature survey.

Sean Cribbs mentions this paper in: The Refreshingly Rewarding Realm of Research Papers but it is important enough for a separate post.

You should keep a copy of it at hand until the three-pass method becomes habit.

Other resources that Keshav mentions:

T. Roscoe, Writing Reviews for Systems Conferences

H. Schulzrinne, Writing Technical Articles

G.M. Whitesides, Whitesides’ Group: Writing a Paper (updated URL)

All three are fairly short and well worth your time to read and re-read.

Experienced writers as well!

After more than thirty years of professional writing I still benefit from well-written writing/editing advice.

Data Journalism Tools

Tuesday, October 13th, 2015

Data Journalism Tools

From the webpage:

This Silk is a structured database listing tools and resources that (data) journalists might want to include in their toolkit. We tried to cover the main steps of the ddj process: from data collection and scraping to data cleaning and enhancement; from analysis to data visualization and publishing. We’re trying to showcase especially tools that are free/freemium and open source, but you will find a bit of everything.

This Silk is updated regularly: we have collected a list of hundreds of tools, which we manually tag (are they open source tools? Free? for interactive datavizs?). Make sure you follow this Silk, so you won’t miss an update!

As of 13 October 2015, there are 120 tools listed.

Graphics have a strong showing but not overly so. There are tools for collaboration, web scraping, writing, etc.

Pitched toward journalists but librarians, researchers, bloggers, etc., will all find tools of interest at this site.

The Economics of Reproducibility in Preclinical Research

Wednesday, June 10th, 2015

The Economics of Reproducibility in Preclinical Research by Leonard P. Freedman, Iain M. Cockburn, Timothy S. Simcoe. PLOS Biology, published June 9, 2015. DOI: 10.1371/journal.pbio.1002165.

From the abstract:

Low reproducibility rates within life science research undermine cumulative knowledge production and contribute to both delays and costs of therapeutic drug development. An analysis of past studies indicates that the cumulative (total) prevalence of irreproducible preclinical research exceeds 50%, resulting in approximately US$28,000,000,000 (US$28B)/year spent on preclinical research that is not reproducible—in the United States alone. We outline a framework for solutions and a plan for long-term improvements in reproducibility rates that will help to accelerate the discovery of life-saving therapies and cures.
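
As a quick sanity check of the abstract’s figure (the US$56.4B/year estimate of total US preclinical research spending is taken from the paper itself; treat it as an assumed input here):

```python
# Back-of-envelope restatement of the abstract's headline number.
us_preclinical_spend = 56.4e9    # assumed: the paper's estimate, dollars/year
irreproducible_share = 0.50      # "exceeds 50%" per the abstract

wasted = us_preclinical_spend * irreproducible_share
print(f"~${wasted / 1e9:.0f}B/year")  # → ~$28B/year
```

Which matches the ~US$28B/year quoted above.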

The authors find four categories of irreproducibility:

(1) study design, (2) biological reagents and reference materials, (3) laboratory protocols, and (4) data analysis and reporting.

But only address “(1) study design, (2) biological reagents and reference materials.”

Once again, documentation doesn’t make the cut. 🙁

I find that curious because, judging just from the flood of social media data, people in general spend a good part of every day capturing and transmitting information. Where is the pain point between that activity and formal documentation that makes the latter into anathema?

Documentation, among other things, could lead to higher reproducibility rates for medical and other research areas, to say nothing of saving data scientists time puzzling out data and/or programmers debugging old code.

“At least they don’t seal the fire exits”…

Friday, June 5th, 2015

“At least they don’t seal the fire exits” Or why unpaid internships are BS by Auriel M. V. Fournier.

From the post:

I’m flipping through a job board, scanning for post docs, dreamily reading field technician posts and there they are

Unpaid internship in Amazing Place A

Unpaid technician working with Cool Species B

Some are obvious, and put their unpaid status in the title; others you have to dig through the fine print, before you are hit over the head with what a ‘unique opportunity’ this internship is, how rare the animal or system, and how you should smile and love that you are not going to get paid, and might even have to pay them for the pleasure of working for them.

Every time I see one of these posts my skin crawls, my heart races, my eyes narrow. These jobs anger me, at my core, and I think we as a scientific community need to stop doing this to ourselves and our young scientists.

We get up and talk about how we need diversity in our field (whatever field it is, for me its wildlife ecology) how we need people from all backgrounds, cultures, creeds and races. Then we create positions that only those who come from means, and continue to have them can take. We are shooting ourselves in the foot by excluding people from getting into science. How is someone who has student loans (most students do), someone who has no financial support, someone with a child, or a sick parent, no family to buy a plane ticket for them, or any other kind of life situation supposed to take these positions? How?

Take the time to read Auriel’s post, whether you use unpaid internships or not. It’s not long and worth the read. I will wait for you to come back before continuing….back so soon?

Abstract just a little bit from Auriel’s post and think about her main point separate and apart from the specifics of unpaid internships. Is it that unpaid work can be undertaken only by those who can survive without getting paid for that work? Yes?

If you agree with that, how many unpaid editors, unpaid journal board members, unpaid peer reviewers, unpaid copy editors, unpaid program unit chairs, unpaid presenters, unpaid organizational officers, etc., do you think exist in academic circles?

Hmmm, do you think the people in all those unpaid positions still have to make ends meet at the end of the month? Take care of expenses out of their own pockets for travel and other expenses? Do you think the utility company cares whether you have done a good job as a volunteer peer reviewer this past month?

The same logic that Auriel uses in her post applies to all those unpaid positions as well. Not that academic groups can make all unpaid volunteer positions paid but any unpaid or underpaid position means you have made choices about who can hold those positions.

Replication in Psychology?

Friday, May 1st, 2015

First results from psychology’s largest reproducibility test by Monya Baker.

From the post:

An ambitious effort to replicate 100 research findings in psychology ended last week — and the data look worrying. Results posted online on 24 April, which have not yet been peer-reviewed, suggest that key findings from only 39 of the published studies could be reproduced.

But the situation is more nuanced than the top-line numbers suggest (See graphic, ‘Reliability test’). Of the 61 non-replicated studies, scientists classed 24 as producing findings at least “moderately similar” to those of the original experiments, even though they did not meet pre-established criteria, such as statistical significance, that would count as a successful replication.

The project, known as the “Reproducibility Project: Psychology”, is the largest of a wave of collaborative attempts to replicate previously published work, following reports of fraud and faulty statistical analysis as well as heated arguments about whether classic psychology studies were robust. One such effort, the ‘Many Labs’ project, successfully reproduced the findings of 10 of 13 well-known studies.
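
Restating the article’s numbers as rates makes the nuance concrete:

```python
# The article's counts: 39 of 100 findings replicated under the
# pre-established criteria; of the 61 that did not, 24 were still judged
# at least "moderately similar" to the original results.
total = 100
replicated = 39
moderately_similar = 24

strict_rate = replicated / total
lenient_rate = (replicated + moderately_similar) / total
print(strict_rate, lenient_rate)  # 0.39 0.63
```

A 39% success rate under the strict criteria, but 63% if the “moderately similar” results are counted as partial replications.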

Replication is a “hot” issue and likely to get hotter if peer review shifts to be “open.”

Do you really want to be listed as a peer reviewer for a study that cannot be replicated?

Perhaps open peer review will lead to more accountability of peer reviewers.


How to Make More Published Research True

Tuesday, October 21st, 2014

How to Make More Published Research True by John P. A. Ioannidis. (DOI: 10.1371/journal.pmed.1001747)

If you think the title is provocative, check out the first paragraph:

The achievements of scientific research are amazing. Science has grown from the occupation of a few dilettanti into a vibrant global industry with more than 15,000,000 people authoring more than 25,000,000 scientific papers in 1996–2011 alone [1]. However, true and readily applicable major discoveries are far fewer. Many new proposed associations and/or effects are false or grossly exaggerated [2],[3], and translation of knowledge into useful applications is often slow and potentially inefficient [4]. Given the abundance of data, research on research (i.e., meta-research) can derive empirical estimates of the prevalence of risk factors for high false-positive rates (underpowered studies; small effect sizes; low pre-study odds; flexibility in designs, definitions, outcomes, analyses; biases and conflicts of interest; bandwagon patterns; and lack of collaboration) [3]. Currently, an estimated 85% of research resources are wasted [5]. (footnote links omitted, emphasis added)

I doubt anyone can disagree with the need for reform in scientific research, but it is one thing to call for reform in general versus the specific.

The following story depends a great deal on cultural context, Southern religious cultural context, but I will tell the story and then attempt to explain if necessary.

One Sunday morning service the minister was delivering a powerful sermon on sins that his flock could avoid. He touched on drinking at length and, as he ended, an older woman in the front pew shouted “Amen!” very loudly. The same response was given to his condemnation of smoking. Finally, the sermon touched on dipping snuff and chewing tobacco. Dead silence from the older woman on the front row. The sermon ended some time later, hymns were sung and the congregation was dismissed.

As the congregation exited the church, the minister stood at the door, greeting one and all. Finally the older woman from the front pew appeared and the minister greeted her warmly. She had, after all, appeared to enjoy most of his sermon. After some small talk, the minister did say: “You liked most of my sermon but you became very quiet when I mentioned dipping snuff and chewing tobacco. If you don’t mind, can you tell me what was different about that part?” To which the old woman replied: “I was very happy while you were preaching but then you went to meddling.”

So long as the minister was talking about the “sins” that she did not practice, that was preaching. When the minister started talking about “sins” she committed, like dipping snuff or chewing tobacco, that was “meddling.”

I suspect that Ioannidis’ preaching will find widespread support but when you get down to actual projects and experiments, well, you have gone to “meddling.”

In order to root out waste, it will be necessary to map out who benefits from such projects, who supported them, who participated, and their relationships to others and other projects.

Considering that universities are rumored to get at least fifty (50) to sixty (60) percent of grants as administrative overhead, they are unlikely to be your allies in creating such mappings or reducing waste in any way. Appeals to funders may be effective, save that some funders, like the NIH, have an investment in the research structure as it exists.

Whatever the odds of change, naming names, charting relationships over time and interests in projects is at least a step down the road to useful rather than remunerative scientific research.

Topic maps excel at modeling relationships, whether known at the outset of your tracking or lately discovered, unexpectedly.

PS: With a topic map you can skip endless committee meetings with each project to agree on how to track that project and their methodologies for waste, should any waste exist. Yes, the first line of a tar baby (in its traditional West African sense) defense by universities and others: let’s have a pre-meeting to plan our first meeting, etc.

…lotteries to pick NIH research-grant recipients

Thursday, April 17th, 2014

Wall Street Journal op-ed advocates lotteries to pick NIH research-grant recipients by Steven T. Corneliussen

From the post:

The subhead for the Wall Street Journal op-ed “Taking the Powerball approach to funding medical research” summarizes its coauthors’ argument about research funding at the National Institutes of Health (NIH): “Winning a government grant is already a crapshoot. Making it official by running a lottery would be an improvement.”

The coauthors, Ferric C. Fang and Arturo Casadevall, serve respectively as a professor of laboratory medicine and microbiology at the University of Washington School of Medicine and as professor and chairman of microbiology and immunology at the Albert Einstein College of Medicine of Yeshiva University.

At a time when funding levels are historically low, they note, grant peer review remains expensive. The NIH Center for Scientific Review has a $110 million annual budget. Grant-submission and grant-review processes extract an additional high toll from participants. Within this context, the coauthors summarize criticisms of NIH peer review. They mention a 2012 Nature commentary that argued, they say, that the system’s structure “encourages conformity.” In particular, after mentioning a study in the journal Circulation Research, they propose that concerning projects judged good enough for funding, “NIH peer reviewers fare no better than random chance when it comes to predicting how well grant recipients will perform.”

Nature should use a “mock” lottery to judge the acceptance of papers alongside its normal peer review process. Publish the results after a year of peer review “competing” with a lottery.

Care to speculate on the results as evaluated by Nature readers?

Data-Intensive Librarians for Data-Intensive Research

Friday, August 10th, 2012

Data-Intensive Librarians for Data-Intensive Research by Chelcie Rowell.

From the post:

A packed house heard Tony Hey and Clifford Lynch present on The Fourth Paradigm: Data-Intensive Research, Digital Scholarship and Implications for Libraries at the 2012 ALA Annual Conference.

Jim Gray coined The Fourth Paradigm in 2007 to reflect a movement toward data-intensive science. Adapting to this change would, Gray noted, require an infrastructure to support the dissemination of both published work and underlying research data. But the return on investment for building the infrastructure would be to accelerate the transformation of raw data to recombined data to knowledge.

In outlining the current research landscape, Hey and Lynch underscored how right Gray was.

Hey led the audience on a whirlwind tour of how scientific research is practiced in the Fourth Paradigm. He showcased several projects that manage data from capture to curation to analysis and long-term preservation. One example he mentioned was the Dataverse Network Project that is working to preserve diverse scholarly outputs from published work to data, images and software.

Lynch reflected on the changing nature of the scientific record and the different collaborative structures that will be needed to define, generate and preserve that record. He noted that we tend to think of the scholarly record in terms of published works. In light of data-intensive science, Lynch said the definition must be expanded to include the datasets which underlie results and the software required to render data.

I wasn’t able to find a video of the presentations and/or slides but while you wait for those to appear, you can consult the homepages of Lynch and Hey for related materials.

Librarians already have searching and bibliographic skills, which are appropriate to the Fourth Paradigm.

What if they were to add big data design, if not processing, skills to their resumes?

What if articles in professional journals carried, in addition to the authors, a byline: Librarian(s): ?

Tilera’s TILE-Gx Processor Family and the Open Source Community [topic maps lab resource?]

Thursday, June 21st, 2012

Tilera’s TILE-Gx Processor Family and the Open Source Community Deliver the World’s Highest Performance per Watt to Networking, Multimedia, and the Cloud

It’s summer and on hot afternoons it’s easy to look at all the cool stuff at online trade zines. Like really high-end processors that we could stuff in our boxes, to run, well, really complicated stuff to be sure. 😉

On one hand we should be mindful that our toys have far more processing power than mainframes of not too long ago. So we need to step up our skill at using the excess capacity on our desktops.

On the other hand, it would be nice to have access to cutting edge processors that will be common place in another cycle or two, today!

From the post:

Tilera® Corporation, the leader in 64-bit manycore general purpose processors, announced the general availability of its Multicore Development Environment™ (MDE) 4.0 release on the TILE-Gx processor family. The release integrates a complete Linux distribution including the kernel 2.6.38, glibc 2.12, GNU tool chain, more than 3000 CentOS 6.2 packages, and the industry’s most advanced manycore tools developed by Tilera in collaboration with the open source community. This release brings standards, familiarity, ease of use, quality and all the development benefits of the Linux environment and open source tools onto the TILE-Gx processor family; both the world’s highest performance and highest performance per watt manycore processor in the market. Tilera’s MDE 4.0 is available now.

“High quality software and standard programming are essential elements for the application development process. Developers don’t have time to waste on buggy and hard to program software tools, they need an environment that works, is easy and feels natural to them,” said Devesh Garg, co-founder, president and chief executive officer, Tilera. “From 60 million packets per second to 40 channels of H.264 encoding on a Linux SMP system, this release further empowers developers with the benefits of manycore processors.”

Using the TILE-Gx processor family and the MDE 4.0 software release, customers have demonstrated high performance, low latency, and the highest performance per watt on many applications. These include Firewall, Intrusion Prevention, Routers, Application Delivery Controllers, Intrusion Detection, Network Monitoring, Network Packet Brokering, Application Switching for Software Defined Networking, Deep Packet Inspection, Web Caching, Storage, High Frequency Trading, Image Processing, and Video Transcoding.

The MDE provides a comprehensive runtime software stack, including Linux kernel 2.6.38, glibc 2.12, binutil, Boost, stdlib and other libraries. It also provides full support for Perl, Python, PHP, Erlang, and TBB; high-performance kernel and user space PCIe drivers; high performance low latency Ethernet drivers; and a hypervisor for hardware abstraction and virtualization. For development tools the MDE includes standard C/C++ GNU compiler v4.4 and 4.6; an Eclipse Integrated Development Environment (IDE); debugging tools such as gdb 7 and mudflap; profiling tools including gprof, oprofile, and perf_events; native and cross build environments; and graphical manycore application debugging and profiling tools.

Should a topic maps lab offer this sort of resource to a geographically distributed set of researchers? (Just curious. I don’t have funding but should the occasion arise.)

Even with the cloud, I think topic map researchers need access to high-end architectures for experiments with data structures and processing techniques.

Dominic Widdows

Tuesday, June 5th, 2012

While tracking references, I ran across the homepage of Dominic Widdows at Google.

Actually I found the Papers and Publications page for Dominic Widdows and then found his homepage. 😉

There is much to be read here.

DBLP page for Dominic Widdows.

Mihai Surdeanu

Sunday, May 27th, 2012

I ran across Mihai Surdeanu‘s publication page while hunting down an NLP article.

There are pages for software and other resources as well.


ORCID (Open Researcher & Contributor ID)

Saturday, September 24th, 2011

ORCID (Open Researcher & Contributor ID)

From the About page:

ORCID, Inc. is a non-profit organization dedicated to solving the name ambiguity problem in scholarly research and brings together the leaders of the most influential universities, funding organizations, societies, publishers and corporations from around the globe. The ideal solution is to establish a registry that is adopted and embraced as the de facto standard by the whole of the community. A resolution to the systemic name ambiguity problem, by means of assigning unique identifiers linkable to an individual’s research output, will enhance the scientific discovery process and improve the efficiency of funding and collaboration. The organization is managed by a fourteen member Board of Directors.

ORCID’s principles will guide the initiative as it grows and operates. The principles confirm our commitment to open access, global communication, and researcher privacy.

Accurate identification of researchers and their work is one of the pillars for the transition from science to e-Science, wherein scholarly publications can be mined to spot links and ideas hidden in the ever-growing volume of scholarly literature. A disambiguated set of authors will allow new services and benefits to be built for the research community by all stakeholders in scholarly communication: from commercial actors to non-profit organizations, from governments to universities.

Thomson Reuters and Nature Publishing Group convened the first Name Identifier Summit in Cambridge, MA in November 2009, where a cross-section of the research community explored approaches to address name ambiguity. The ORCID initiative officially launched as a non-profit organization in August 2010 and is moving ahead with broad stakeholder participation (view participant gallery). As ORCID develops, we plan to engage researchers and other community members directly via social media and other activity. Participation from all stakeholders at all levels is essential to fulfilling the Initiative’s mission.

I am not altogether certain that elimination of ambiguity in identification will enable “…min[ing] to spot links and ideas hidden in the ever-growing volume of scientific literature.” Or should I say there is no demonstrated connection between unambiguous identification of researchers and such gains?

True enough, the claim is made but I thought science was based on evidence, not simply making claims.

And, like most researchers, I have discovered unexpected riches when mistaking one researcher’s name for another’s. Reducing ambiguity in identification will reduce the incidence of, well, ambiguity in identification.
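
The mechanics of ORCID’s claim are easy to sketch, even if the evidence for downstream mining gains is not. Grouping a toy publication list by name string versus by unique identifier shows what disambiguation buys (all names, titles, and iDs below are invented for the example):

```python
from collections import defaultdict

# Hypothetical records: two different people publish as "J. Smith",
# and one of them also publishes as "Jane Smith".
papers = [
    {"author": "J. Smith", "orcid": "0000-0001-0000-0001",
     "title": "Genome assembly at scale"},
    {"author": "J. Smith", "orcid": "0000-0002-0000-0002",
     "title": "Medieval trade routes"},
    {"author": "Jane Smith", "orcid": "0000-0001-0000-0001",
     "title": "Long-read sequencing"},
]

by_name = defaultdict(list)
by_id = defaultdict(list)
for p in papers:
    by_name[p["author"]].append(p["title"])
    by_id[p["orcid"]].append(p["title"])

# The name string both merges two people under "J. Smith" and splits one
# person across "J. Smith" / "Jane Smith" ...
print(dict(by_name))
# ... while the identifier groups each person's output correctly.
print(dict(by_id))
```

Whether correct grouping actually yields the promised mining gains is, as noted above, a separate and untested claim.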

Jack Park forwarded this link to me.

Tamara Munzner – Graphics

Tuesday, October 12th, 2010

Tamara Munzner is a professor at the University of British Columbia and one of the leading researchers on visualization of data.

I ran across her site looking for information on 3D visualization of graphs.

Check out her publications or software pages for a preview of items you will see here sooner or later.

Computation, Information, Cognition: The Nexus and the Liminal – Book

Saturday, July 3rd, 2010

Computation, Information, Cognition: The Nexus and the Liminal by Gordana Dodig-Crnkovic and Susan Stuart, is a deeply delightful collection of essays from European Computing and Philosophy Conference (E-CAP), 2005.

I originally ordered it because of Graeme Hirst’s essay, “Views of Text Meaning in Computational Linguistics: Past, Present, and Future.” More on that in a future post but suffice it to say that he sees computational linguistics returning to a realization that “meaning” isn’t nearly as flat as some people would like to believe.

I could not help perusing some of the other essays and ran across Werner Ceusters and Barry Smith, in “Ontology as the Core Discipline of Biomedical Informatics – Legacies of the Past and Recommendations for the Future Directions of Research,” bashing the work of ISO/IEC TC 37, and its founder, Eugen Wüster, as International Standard Bad Philosophy. Not that I care for “realist ontologies” all that much but it is a very amusing essay.

Not to mention Patrick Allo’s “Formalizing Semantic Information: Lessons from Logical Pluralism.” If I say “informational pluralism” does anyone need more of a hint as to why I would like this essay?

I feel bad that I can't mention, in a reasonably sized post, all the other essays in this volume, or do more to convey the flavor of those I mention above. This isn't a scripting source book, but the ideas you will find in it are going to shape the future of computation, and our little corner of it, for some time to come.

MURAKAMI Harumi

Saturday, June 12th, 2010

MURAKAMI Harumi focuses on knowledge sharing and integration of library catalogs.

ReaD: an alternative listing to DBLP. DBLP lists four (4) publications; ReaD lists six (6), plus fifty (50) papers and notes.

Harumi’s (the given name; MURAKAMI is the family name) work on Subject World (Japanese only; my post on Subject World includes English-language references) caught my attention because of its visualization of heterogeneous terminology in a library OPAC setting.

Since I am innocent of any Japanese, I am interested in hearing reactions from those fluent in Japanese to the visualization interface. This could also be an opportunity to explore how visualization preferences do or don’t differ across cultural lines.

Citation Indexing

Sunday, June 6th, 2010

Eugene Garfield’s homepage may not be familiar to topic map fans but it should be.

Garfield invented citation indexing in the late 1950s/early 1960s.

Among the treasures you will find here:

Terrorism Resources

Wednesday, May 26th, 2010

Terrorism Informatics Resources is a resource listing for an area where topic maps can make a difference.

Peter McBrien

Saturday, May 22nd, 2010

Peter McBrien focuses on data modeling and integration.

He is part of the AutoMed project on database integration. His recent work includes temporal constraints and P2P exchange of heterogeneous data.

Publications (dblp).

Databases: Tools and Data for Teaching and Research: Useful collection of datasets and other materials on databases, data modeling and integration.

I first encountered Peter’s research in Comparing and Transforming Between Data Models via an Intermediate Hypergraph Data Model.

From a topic map perspective, the authors assumed the identities of the subjects to which their transformation rules were applied. Someone less familiar with the schema languages could have made other choices.

That’s the hard question, isn’t it? How do you achieve reliable integration without presuming a common perspective on, or interpretation of, the schema languages?
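To make the point concrete, here is a minimal sketch of the general idea of transforming between data models through a shared intermediate graph. The names and structures are mine, not McBrien and Poulovassilis’s hypergraph data model or the AutoMed API:

```python
# Hypothetical sketch: map source schemas into a common intermediate graph,
# then read the graph back out in a target model's terms.

intermediate = {"nodes": set(), "edges": set()}

def from_relational(tables):
    """Map a relational schema into the intermediate model: one node per
    table, one node per column, one edge per table-column containment."""
    for table, columns in tables.items():
        intermediate["nodes"].add(table)
        for col in columns:
            node = f"{table}.{col}"
            intermediate["nodes"].add(node)
            intermediate["edges"].add((table, node))

def to_er(model):
    """Read the intermediate model out as an ER-style schema: each table
    node becomes an entity, its column nodes become attributes."""
    entities = {}
    for parent, child in sorted(model["edges"]):
        entities.setdefault(parent, []).append(child.split(".", 1)[1])
    return entities

from_relational({"person": ["name", "email"]})
print(to_er(intermediate))  # {'person': ['email', 'name']}
```

Note that the rule “table node becomes an entity” is exactly the kind of identity assumption at issue above: someone with a different reading of the schema languages might map a table to a relationship instead, and the transformation would come out differently.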

PS: This is the first of many posts on researchers working in areas of interest to the topic maps community.

Context of Data?

Wednesday, May 19th, 2010

Cristiana Bolchini and others in And What Can Context Do For Data? have started down an interesting path for exploration.

That all data exists in some context is an unremarkable observation until one considers how seldom that context is actually stated or attached to the data, to say nothing of being used to filter or access it.

Bolchini introduces the notion of a context dimension tree (CDT) which “models context in terms of a set of context dimensions, each capturing a different characteristic of the context.” (CACM, Nov. 2009, page 137) Note that dimensions can be decomposed into sub-trees for further analysis. Further operations combine these dimensions into the “context” of the data that is used to produce a particular view of the data.
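As a rough illustration of the CDT idea (my own toy structure, not the authors’ formalism): dimensions decompose into sub-trees, chosen dimension values combine into a context, and the context tailors a view of the data.

```python
# Hypothetical sketch of a context dimension tree (CDT), not the paper's API.

class Dimension:
    """One characteristic of the context; may decompose into sub-dimensions."""
    def __init__(self, name, values=None, children=None):
        self.name = name
        self.values = values or []       # leaf values for this dimension
        self.children = children or []   # sub-dimensions (sub-trees)

# A toy CDT: "situation" decomposes into role and location dimensions.
cdt = Dimension("situation", children=[
    Dimension("role", values=["doctor", "patient"]),
    Dimension("location", values=["hospital", "home"]),
])

def tailor(records, context):
    """Produce a view: keep records whose tags are compatible with the
    context (an untagged dimension is treated as context-independent)."""
    return [r for r in records
            if all(r.get(dim) in (val, None) for dim, val in context.items())]

records = [
    {"data": "full chart", "role": "doctor"},
    {"data": "summary", "role": "patient"},
    {"data": "visiting hours"},          # context-independent
]

view = tailor(records, {"role": "patient", "location": "home"})
print([r["data"] for r in view])  # ['summary', 'visiting hours']
```

The interesting part for topic maps is the combination step: which dimensions you choose to represent, and how finely you decompose them, determines what the resulting “context” can say about a subject.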

Not quite what is meant by scope in topic maps but something a bit more nuanced and subtle. I would argue (no surprise) that the context of a subject is part and parcel of its identity. And how much of that context we choose to represent will vary from project to project.

Further reading:

Bolchini, C., Curino, C. A., Quintarelli, E., Tanca, L. and Schreiber, F. A. A data-oriented survey of context models. SIGMOD Record, 36(4), 2007.

Bolchini, C., Quintarelli, E. and Rossato, R. Relational data tailoring through view composition. In Proc. Intl. Conf. on Conceptual Modeling (ER 2007), LNCS, Nov. 2007.

Context-ADDICT (it's an acronym, I swear!): website for the project developing this line of research. Prototype software available.