Archive for the ‘Researchers’ Category

Data-Intensive Librarians for Data-Intensive Research

Friday, August 10th, 2012

Data-Intensive Librarians for Data-Intensive Research by Chelcie Rowell.

From the post:

A packed house heard Tony Hey and Clifford Lynch present on The Fourth Paradigm: Data-Intensive Research, Digital Scholarship and Implications for Libraries at the 2012 ALA Annual Conference.

Jim Gray coined The Fourth Paradigm in 2007 to reflect a movement toward data-intensive science. Adapting to this change would, Gray noted, require an infrastructure to support the dissemination of both published work and underlying research data. But the return on investment for building the infrastructure would be to accelerate the transformation of raw data to recombined data to knowledge.

In outlining the current research landscape, Hey and Lynch underscored how right Gray was.

Hey led the audience on a whirlwind tour of how scientific research is practiced in the Fourth Paradigm. He showcased several projects that manage data from capture to curation to analysis and long-term preservation. One example he mentioned was the Dataverse Network Project that is working to preserve diverse scholarly outputs from published work to data, images and software.

Lynch reflected on the changing nature of the scientific record and the different collaborative structures that will be needed to define, generate and preserve that record. He noted that we tend to think of the scholarly record in terms of published works. In light of data-intensive science, Lynch said the definition must be expanded to include the datasets which underlie results and the software required to render data.

I wasn’t able to find video or slides of the presentations, but while you wait for those to appear, you can consult the homepages of Lynch and Hey for related materials.

Librarians already have searching and bibliographic skills, which are appropriate to the Fourth Paradigm.

What if they were to add big data design, if not processing, skills to their resumes?

What if articles in professional journals carried a byline in addition to the authors: Librarian(s): ?

Tilera’s TILE-Gx Processor Family and the Open Source Community [topic maps lab resource?]

Thursday, June 21st, 2012

Tilera’s TILE-Gx Processor Family and the Open Source Community Deliver the World’s Highest Performance per Watt to Networking, Multimedia, and the Cloud

It’s summer and on hot afternoons it’s easy to look at all the cool stuff at online trade zines. Like really high-end processors that we could stuff in our boxes, to run, well, really complicated stuff to be sure. ;-)

On one hand we should be mindful that our toys have far more processing power than mainframes of not too long ago. So we need to step up our skill at using the excess capacity on our desktops.

On the other hand, it would be nice to have access today to cutting-edge processors that will be commonplace in another cycle or two!

From the post:

Tilera® Corporation, the leader in 64-bit manycore general purpose processors, announced the general availability of its Multicore Development Environment™ (MDE) 4.0 release on the TILE-Gx processor family. The release integrates a complete Linux distribution including the kernel 2.6.38, glibc 2.12, GNU tool chain, more than 3000 CentOS 6.2 packages, and the industry’s most advanced manycore tools developed by Tilera in collaboration with the open source community. This release brings standards, familiarity, ease of use, quality and all the development benefits of the Linux environment and open source tools onto the TILE-Gx processor family; both the world’s highest performance and highest performance per watt manycore processor in the market. Tilera’s MDE 4.0 is available now.

“High quality software and standard programming are essential elements for the application development process. Developers don’t have time to waste on buggy and hard to program software tools, they need an environment that works, is easy and feels natural to them,” said Devesh Garg, co-founder, president and chief executive officer, Tilera. “From 60 million packets per second to 40 channels of H.264 encoding on a Linux SMP system, this release further empowers developers with the benefits of manycore processors.”

Using the TILE-Gx processor family and the MDE 4.0 software release, customers have demonstrated high performance, low latency, and the highest performance per watt on many applications. These include Firewall, Intrusion Prevention, Routers, Application Delivery Controllers, Intrusion Detection, Network Monitoring, Network Packet Brokering, Application Switching for Software Defined Networking, Deep Packet Inspection, Web Caching, Storage, High Frequency Trading, Image Processing, and Video Transcoding.

The MDE provides a comprehensive runtime software stack, including Linux kernel 2.6.38, glibc 2.12, binutil, Boost, stdlib and other libraries. It also provides full support for Perl, Python, PHP, Erlang, and TBB; high-performance kernel and user space PCIe drivers; high performance low latency Ethernet drivers; and a hypervisor for hardware abstraction and virtualization. For development tools the MDE includes standard C/C++ GNU compiler v4.4 and 4.6; an Eclipse Integrated Development Environment (IDE); debugging tools such as gdb 7 and mudflap; profiling tools including gprof, oprofile, and perf_events; native and cross build environments; and graphical manycore application debugging and profiling tools.

Should a topic maps lab offer this sort of resource to a geographically distributed set of researchers? (Just curious. I don’t have the funding, but the occasion may arise.)

Even with the cloud, I think topic map researchers need access to high-end architectures for experiments with data structures and processing techniques.

Dominic Widdows

Tuesday, June 5th, 2012

While tracking references, I ran across the homepage of Dominic Widdows at Google.

Actually I found the Papers and Publications page for Dominic Widdows and then found his homepage. ;-)

There is much to be read here.

DBLP page for Dominic Widdows.

Mihai Surdeanu

Sunday, May 27th, 2012

I ran across Mihai Surdeanu’s publication page while hunting down an NLP article.

There are pages for software and other resources as well.

Enjoy!

ORCID (Open Researcher & Contributor ID)

Saturday, September 24th, 2011

ORCID (Open Researcher & Contributor ID)

From the About page:

ORCID, Inc. is a non-profit organization dedicated to solving the name ambiguity problem in scholarly research and brings together the leaders of the most influential universities, funding organizations, societies, publishers and corporations from around the globe. The ideal solution is to establish a registry that is adopted and embraced as the de facto standard by the whole of the community. A resolution to the systemic name ambiguity problem, by means of assigning unique identifiers linkable to an individual’s research output, will enhance the scientific discovery process and improve the efficiency of funding and collaboration. The organization is managed by a fourteen member Board of Directors.

ORCID’s principles will guide the initiative as it grows and operates. The principles confirm our commitment to open access, global communication, and researcher privacy.

Accurate identification of researchers and their work is one of the pillars for the transition from science to e-Science, wherein scholarly publications can be mined to spot links and ideas hidden in the ever-growing volume of scholarly literature. A disambiguated set of authors will allow new services and benefits to be built for the research community by all stakeholders in scholarly communication: from commercial actors to non-profit organizations, from governments to universities.

Thomson Reuters and Nature Publishing Group convened the first Name Identifier Summit in Cambridge, MA in November 2009, where a cross-section of the research community explored approaches to address name ambiguity. The ORCID initiative officially launched as a non-profit organization in August 2010 and is moving ahead with broad stakeholder participation (view participant gallery). As ORCID develops, we plan to engage researchers and other community members directly via social media and other activity. Participation from all stakeholders at all levels is essential to fulfilling the Initiative’s mission.

I am not altogether certain that elimination of ambiguity in identification will enable “…min[ing] to spot links and ideas hidden in the ever-growing volume of scholarly literature.” Or should I say there is no demonstrated connection between unambiguous identification of researchers and such gains?

True enough, the claim is made but I thought science was based on evidence, not simply making claims.

And, like most researchers, I have discovered unexpected riches when mistaking one researcher’s name for another’s. Reducing ambiguity in identification will reduce the incidence of, well, ambiguity in identification.
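To make the trade-off concrete, here is a minimal sketch of grouping publications by name string versus by a unique researcher identifier. The records, titles, and identifiers are invented for illustration; the identifiers are placeholders, not real ORCID iDs, and nothing here uses an ORCID API.

```python
from collections import defaultdict

# Hypothetical records: (author name as printed, made-up researcher id, title).
# The ids are illustrative placeholders, not real ORCID identifiers.
RECORDS = [
    ("J. Smith", "id-0001", "Graph Layout in 3D"),
    ("J. Smith", "id-0002", "Context Models for Data"),
    ("John Smith", "id-0001", "Visualizing Large Graphs"),
]


def group_by(records, key_index):
    """Group titles by the field at key_index (0 = name, 1 = identifier)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    return dict(groups)


by_name = group_by(RECORDS, 0)
by_id = group_by(RECORDS, 1)

# Grouping by name merges two different people under "J. Smith"
# and splits one person across "J. Smith" and "John Smith".
print(by_name)

# Grouping by identifier keeps each researcher's output together.
print(by_id)
```

Note that the "J. Smith" bucket in the name-based grouping is exactly where the unexpected neighbor turns up. The identifier-based grouping is tidier, and it is also where that accident stops happening.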

Jack Park forwarded this link to me.

Tamara Munzner – Graphics

Tuesday, October 12th, 2010

Tamara Munzner is a professor at the University of British Columbia and one of the leading researchers on visualization of data.

I ran across her site looking for information on 3D visualization of graphs.

Check out her publications or software pages for a preview of items you will see here sooner or later.

Computation, Information, Cognition: The Nexus and the Liminal – Book

Saturday, July 3rd, 2010

Computation, Information, Cognition: The Nexus and the Liminal by Gordana Dodig-Crnkovic and Susan Stuart is a deeply delightful collection of essays from the European Computing and Philosophy Conference (E-CAP), 2005.

I originally ordered it because of Graeme Hirst’s essay, “Views of Text Meaning in Computational Linguistics: Past, Present, and Future.” More on that in a future post but suffice it to say that he sees computational linguistics returning to a realization that “meaning” isn’t nearly as flat as some people would like to believe.

I could not help perusing some of the other essays and ran across Werner Ceusters and Barry Smith, in “Ontology as the Core Discipline of Biomedical Informatics – Legacies of the Past and Recommendations for the Future Directions of Research,” bashing the work of ISO/TC 37, and its founder, Eugen Wüster, as International Standard Bad Philosophy. Not that I care for “realist ontologies” all that much, but it is a very amusing essay.

Not to mention Patrick Allo’s “Formalizing Semantic Information: Lessons from Logical Pluralism.” If I say “informational pluralism” does anyone need more of a hint as to why I would like this essay?

I feel bad that I can’t mention in a reasonably sized post all the other essays in this volume, or do more to give the flavor of those I mention above. This isn’t a scripting source book, but the ideas you will find in it are going to shape the future of computation and our little corner of it for some time to come.

MURAKAMI Harumi

Saturday, June 12th, 2010

MURAKAMI Harumi focuses on knowledge sharing and integration of library catalogs.

ReaD: An alternative listing to dblp. dblp lists four (4) publications; ReaD lists six (6) plus fifty (50) papers and notes.

dblp

Homepage

Harumi’s work on Subject World caught my attention because of its visualization of heterogeneous terminology in a library OPAC setting. (Harumi is the given name; MURAKAMI is the family name. The Subject World site is Japanese only, but my post on Subject World includes English-language references.)

Since I am innocent of any Japanese, I am interested in hearing reactions from those fluent in Japanese to the visualization interface. This could also be an opportunity to explore how visualization preferences do or don’t differ across cultural lines.

Citation Indexing

Sunday, June 6th, 2010

Eugene Garfield’s homepage may not be familiar to topic map fans but it should be.

Garfield invented citation indexing in the late 1950s/early 1960s.

Among the treasures you will find here:

Terrorism Resources

Wednesday, May 26th, 2010

Terrorism Informatics Resources is a resource listing for an area where topic maps can make a difference.

Peter McBrien

Saturday, May 22nd, 2010

Peter McBrien focuses on data modeling and integration.

He is part of the AutoMed project on database integration. Recent work includes temporal constraints and P2P exchange of heterogeneous data.

Publications (dblp).

Homepage

Databases: Tools and Data for Teaching and Research: Useful collection of datasets and other materials on databases, data modeling and integration.

I first encountered Peter’s research in Comparing and Transforming Between Data Models via an Intermediate Hypergraph Data Model.

From a topic map perspective, the authors assumed the identities of the subjects to which their transformation rules were applied. Someone less familiar with the schema languages could have made other choices.

That’s the hard question, isn’t it? How to have reliable integration without presuming a common perspective/interpretation of the schema languages?

*****
PS: This is the first of many posts on researchers working in areas of interest to the topic maps community.

Context of Data?

Wednesday, May 19th, 2010

Cristiana Bolchini and others in And What Can Context Do For Data? have started down an interesting path for exploration.

That all data exists in some context is an unremarkable observation until one considers how rarely that context is stated or attributed to data, to say nothing of being used to filter or access that data.

Bolchini introduces the notion of a context dimension tree (CDT) which “models context in terms of a set of context dimensions, each capturing a different characteristic of the context.” (CACM, Nov. 2009, page 137) Note that dimensions can be decomposed into sub-trees for further analysis. Further operations combine these dimensions into the “context” of the data that is used to produce a particular view of the data.
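As a rough illustration of the idea, and not the authors’ formalism, a context dimension tree can be sketched as dimensions whose values may open further sub-dimensions, with a chosen context then used to select a view of the data. The class and function names, the dimensions, and the sample records below are all assumptions made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """A context dimension; each allowed value maps to its sub-dimensions."""
    name: str
    values: dict = field(default_factory=dict)  # value -> list of Dimension


def validate(context, dimensions):
    """Return True if every chosen value lies on a path through the tree."""
    remaining = dict(context)

    def walk(dims):
        for dim in dims:
            value = remaining.pop(dim.name, None)
            if value is None:
                continue  # dimension left unspecified
            if value not in dim.values:
                return False  # unknown value for this dimension
            if not walk(dim.values[value]):
                return False
        return True

    # Leftover keys name dimensions that are unreachable from the choices made.
    return walk(dimensions) and not remaining


def view(records, context):
    """Keep records whose context tags all agree with the chosen context."""
    return [
        record["data"]
        for record in records
        if all(context.get(dim) == val for dim, val in record["tags"].items())
    ]


# Hypothetical tree: "task" only makes sense once role == "librarian".
TREE = [
    Dimension("role", {
        "student": [],
        "librarian": [Dimension("task", {"cataloging": [], "reference": []})],
    }),
    Dimension("location", {"on_site": [], "remote": []}),
]

# Hypothetical data items, each tagged with the context elements it belongs to.
RECORDS = [
    {"data": "MARC editor", "tags": {"role": "librarian", "task": "cataloging"}},
    {"data": "Course reserves", "tags": {"role": "student"}},
    {"data": "VPN guide", "tags": {"location": "remote"}},
]

context = {"role": "librarian", "task": "cataloging", "location": "remote"}
print(validate(context, TREE))
print(view(RECORDS, context))
```

A context such as `{"role": "student", "task": "cataloging"}` fails validation in this sketch, because the `task` sub-tree hangs off the `librarian` value only. That is the sort of decomposition into sub-trees mentioned above, in a very reduced form.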

Not quite what is meant by scope in topic maps but something a bit more nuanced and subtle. I would argue (no surprise) that the context of a subject is part and parcel of its identity. And how much of that context we choose to represent will vary from project to project.

Further reading:

Bolchini, C., Curino, C. A., Quintarelli, E., Tanca, L. and Schreiber, F. A. A data-oriented survey of context models. SIGMOD Record, 2007.

Bolchini, C., Quintarelli, E. and Rossato, R. Relational data tailoring through view composition. In Proc. Intl. Conf. on Conceptual Modeling (ER’2007). LNCS. Nov. 2007.

Context-ADDICT (it’s an acronym, I swear!): Website for the project developing this line of research. Prototype software available.