Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

October 19, 2011

Ptolemy Project

Filed under: Polymorphism,Semantics,Types — Patrick Durusau @ 3:16 pm

Ptolemy Project: heterogeneous modeling and design

If you think you have heard me use the name Ptolemy in this blog, you would be correct. Not the same one: that was Ptolemy V, whose decree was recorded on the Rosetta Stone. See: An Early Example of Collocation. There are even earlier multilingual texts; I need to track down good images of them and do a post about them.

Back to the Ptolemy Project. Its webpage reads in part:

The Ptolemy project studies modeling, simulation, and design of concurrent, real-time, embedded systems. The focus is on assembly of concurrent components. The key underlying principle in the project is the use of well-defined models of computation that govern the interaction between components. A major problem area being addressed is the use of heterogeneous mixtures of models of computation. A software system called Ptolemy II is being constructed in Java….

One of their current research thrusts:

Abstract semantics: Domain polymorphism, behavioral type systems, meta-modeling of semantics, comparative models of computation.

BTW, this is the core engine for the Kepler project.

The Kepler Project

Filed under: Bioinformatics,Data Analysis,ELN Integration,Information Flow,Workflow — Patrick Durusau @ 3:16 pm

The Kepler Project

From the website:

The Kepler Project is dedicated to furthering and supporting the capabilities, use, and awareness of the free and open source, scientific workflow application, Kepler. Kepler is designed to help scientists, analysts, and computer programmers create, execute, and share models and analyses across a broad range of scientific and engineering disciplines. Kepler can operate on data stored in a variety of formats, locally and over the internet, and is an effective environment for integrating disparate software components, such as merging “R” scripts with compiled “C” code, or facilitating remote, distributed execution of models. Using Kepler’s graphical user interface, users simply select and then connect pertinent analytical components and data sources to create a “scientific workflow”—an executable representation of the steps required to generate results. The Kepler software helps users share and reuse data, workflows, and components developed by the scientific community to address common needs.

The Kepler software is developed and maintained by the cross-project Kepler collaboration, which is led by a team consisting of several of the key institutions that originated the project: UC Davis, UC Santa Barbara, and UC San Diego. Primary responsibility for achieving the goals of the Kepler Project reside with the Leadership Team, which works to assure the long-term technical and financial viability of Kepler by making strategic decisions on behalf of the Kepler user community, as well as providing an official and durable point-of-contact to articulate and represent the interests of the Kepler Project and the Kepler software application. Details about how to get more involved with the Kepler Project can be found in the developer section of this website.

Kepler is a java-based application that is maintained for the Windows, OSX, and Linux operating systems. The Kepler Project supports the official code-base for Kepler development, as well as provides materials and mechanisms for learning how to use Kepler, sharing experiences with other workflow developers, reporting bugs, suggesting enhancements, etc.
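Kepler itself is Java-based and graphical, but the core idea, an executable graph of connected components, is easy to sketch. The Python below is purely illustrative, with made-up component names; it is not Kepler's API:

    def read_data(path):
        """Data source: yield one record per line of a text file."""
        with open(path) as f:
            for line in f:
                yield line.strip()

    def filter_nonempty(records):
        """Analytical component: drop blank records."""
        return (r for r in records if r)

    def count(records):
        """Sink: reduce the stream to a single result."""
        return sum(1 for _ in records)

    def workflow(path):
        # "Connecting" components is just composition; the workflow is an
        # executable representation of the steps.
        return count(filter_nonempty(read_data(path)))

    # print(workflow("observations.txt"))  # hypothetical input file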

I found this from an announcement of an NSF grant for a bioKepler project.

Questions:

  1. Review the Kepler project and prepare a short summary of it. (3 – 5 pages)
  2. Workflow by its very nature involves subjects moving from one process or user to another. How is that handled by Kepler in general?
  3. Can you intersect Kepler workflows with other workflow management software? If not, why not? (research project)

MyBioSoftware

Filed under: Bioinformatics,Biomedical,Software — Patrick Durusau @ 3:16 pm

MyBioSoftware: Bioinformatics Software Blog

From the blog:

My Biosoftware Blog supplies free bioinformatics software for biology scientists, every day.

Impressive listing of bioinformatics software. Not my area (by training). It is one in which I am interested because of the rapid development of data analysis techniques, which may be applicable more broadly.

Question/Task: Select any two software packages in a category and document the output formats that they support. I think it would be useful to have a chart of formats supported for each category. It may uncover places where interchange isn’t easy, or perhaps isn’t possible at all.
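As a minimal sketch of the kind of chart I have in mind (the package names and formats below are placeholders, not survey results):

    import csv
    import sys

    # Hypothetical findings; fill in from the actual package documentation.
    formats = {
        "package-a": {"FASTA", "GenBank"},
        "package-b": {"FASTA", "PDB"},
    }

    all_formats = sorted(set().union(*formats.values()))
    writer = csv.writer(sys.stdout)
    writer.writerow(["package"] + all_formats)
    for pkg, supported in sorted(formats.items()):
        writer.writerow([pkg] + ["yes" if f in supported else "" for f in all_formats])

Gaps where a column stays empty are the interchange trouble spots.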

Playing with microsatellites (Simple Sequence Repeats), Java, and Neo4j

Filed under: Bioinformatics,Java,Neo4j — Patrick Durusau @ 3:16 pm

Playing with microsatellites (Simple Sequence Repeats), Java, and Neo4j

From the post:

I just finished this afternoon a small project I had to do about identification of microsatellites in DNA sequences. As with every new project I start, I think of something that:

  • I didn’t try before
  • is worth learning
  • is applicable in order to meet the needs of the specific project

These last few days it was the chance to get to know and try the visualization tool included in the last version of Neo4j Webadmin dashboard.

I had already heard of it a couple of times from different sources but had not had the chance to play a bit with it yet. So, after my first contact with it I have to say that although it’s something Neo4j introduced in the last versions, it already has a decent GUI and promising functionality.

Covers his domain model and the results it produced.

North American Fuzzy Information Processing Society (NAFIPS)

Filed under: Fuzzy Logic,Fuzzy Matching,Fuzzy Sets — Patrick Durusau @ 3:16 pm

North American Fuzzy Information Processing Society (NAFIPS)

From the website:

As the premier fuzzy society in North America established in 1981, our purpose is to help guide and encourage the development of fuzzy sets and related technologies for the benefit of mankind. In this role, we understand the importance of, and the need for, developing a strong intellectual basis and encouraging new and innovative applications. In addition, we acknowledge our leadership role to foster interaction and technology transfer to other national and international organizations to bring the benefits of this technology to North America and the world.

Links, pointers to software, journals, etc.

NAFIPS 2012 : North American Fuzzy Information Processing Society

Filed under: Conferences,Fuzzy Logic,Fuzzy Matching,Fuzzy Sets — Patrick Durusau @ 3:16 pm

NAFIPS 2012 : North American Fuzzy Information Processing Society

Dates:

When Aug 6, 2012 – Aug 8, 2012
Where Berkeley, CA
Submission Deadline Jan 29, 2012
Notification Due Mar 11, 2012
Final Version Due Apr 15, 2012

From the announcement:

Aims and Scope

NAFIPS 2012 aims to bring together researchers, engineers and practitioners to present the latest achievements and innovations in the area of fuzzy information processing, to discuss thought-provoking developments and challenges, to consider potential future directions.

Topics

The topics cover all aspects of fuzzy systems and their applications including, but not limited to:

  • fuzzy sets and fuzzy logic
  • mathematical foundations of fuzzy sets and fuzzy systems
  • approximate reasoning, fuzzy inference models, and soft computing
  • fuzzy decision analysis, decision making, optimization, and design
  • fuzzy system architectures and hardware
  • fuzzy methods in data analysis, statistics and imprecise probability
  • fuzzy databases and information retrieval
  • fuzzy pattern recognition and image processing
  • fuzzy sets in management science
  • fuzzy control and robotics
  • possibility theory
  • fuzzy sets and logic in ontology, web, and social networks
  • fuzzy preference modelling
  • fuzzy sets in operations research and manufacturing
  • fuzzy database mining and financial forecasting
  • fuzzy neural networks
  • evolutionary and hybrid systems
  • intelligent agents and ambient intelligence
  • learning, adaptive, and evolvable fuzzy systems

First experiences with GeoCouch

Filed under: Geographic Data,Geographic Information Retrieval,Humor — Patrick Durusau @ 3:15 pm

First experiences with GeoCouch by tbuchwaldt.

From the post:

To learn some new stuff about cool databases and geo-aware services we started fiddling with GeoCouch, a CouchDB extension. To have a real scenario we could work on, we designed a small project: A CouchDB database contains documents with descriptions of fastfood restaurants. We agreed on 3 types of restaurants: KFC, Mc Donalds & Burgerking. We gave them some additional information, namely opening and closing times and a boolean called “supersize”.

It sounds to me like this sort of service, coupled with a topic map of campus locations/services, could prove to be very amusing during “rush” week when directions and locations are not well known.
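Querying such a spatial view from a script is simple enough to sketch here. Everything below is an assumption for illustration: the database, design document, view name, and coordinates are made up, and GeoCouch's URL layout has varied across versions; the bbox query parameter is the part that matters.

    import json
    import urllib.request

    COUCH = "http://localhost:5984"
    VIEW = "/restaurants/_design/geo/_spatial/points"  # hypothetical view

    def restaurants_in_bbox(west, south, east, north):
        """Return restaurant docs whose location falls inside the box."""
        url = f"{COUCH}{VIEW}?bbox={west},{south},{east},{north}"
        with urllib.request.urlopen(url) as resp:
            return [row["value"] for row in json.load(resp)["rows"]]

    # Example: fast food around a (hypothetical) campus quad.
    for doc in restaurants_in_bbox(8.74, 53.05, 8.78, 53.08):
        print(doc.get("type"), doc.get("opens"), doc.get("closes"))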

Knime4Bio:…Next Generation Sequencing data with KNIME

Filed under: Bioinformatics,Biomedical,Data Mining — Patrick Durusau @ 3:15 pm

Knime4Bio:…Next Generation Sequencing data with KNIME by Pierre Lindenbaum, Solena Le Scouarnec, Vincent Portero and Richard Redon.

Abstract:

Analysing large amounts of data generated by next-generation sequencing (NGS) technologies is difficult for researchers or clinicians without computational skills. They are often compelled to delegate this task to computer biologists working with command line utilities. The availability of easy-to-use tools will become essential with the generalisation of NGS in research and diagnosis. It will enable investigators to handle much more of the analysis. Here, we describe Knime4Bio, a set of custom nodes for the KNIME (The Konstanz Information Miner) interactive graphical workbench, for the interpretation of large biological datasets. We demonstrate that this tool can be utilised to quickly retrieve previously published scientific findings.

Code: http://code.google.com/p/knime4bio/

While I applaud the trend towards “easy-to-use” software, I do worry about results that are returned by automated analysis, which of course “must be true.”

I am mindful of the four-year-old whose name was on a terrorist watch list and so delayed the departure of a plane. The ground personnel lacked the moral courage or judgement to act on what was clearly a case of mistaken identity.

As “big data” grows ever larger, I wonder if “easy” interfaces will really be facile interfaces whose results we lack the courage (or skill?) to question.

Not A Problem

Filed under: Humor — Patrick Durusau @ 3:15 pm

I was deeply amused to read “not a problem” being defined as follows:

It is not in any way “solvable”, at least not by means accessible to us (which in some sense defines it as “not a problem”).

From: Don’t test for exact equality of floating point numbers
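The underlying non-problem is that 0.1, 0.2, and 0.3 have no exact binary floating-point representations. A quick Python sketch (mine, not the linked post's code):

    # The comparison that trips people up:
    print(0.1 + 0.2 == 0.3)   # False
    print(0.1 + 0.2)          # 0.30000000000000004

    # Compare within a tolerance instead of testing exact equality.
    print(abs((0.1 + 0.2) - 0.3) < 1e-9)  # True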

Just in case you run into any non-solvable “problems” when creating topic maps.

The correct terminology for such cases is “not a problem.” 😉

Rapid-I: Report the Future

Filed under: Analytics,Data Mining,Document Classification,Prediction — Patrick Durusau @ 3:15 pm

Rapid-I: Report the Future

Source of:

RapidMiner: Professional open source data mining made easy.

Analytical ETL, Data Mining, and Predictive Reporting with a single solution

RapidAnalytics: Collaborative data analysis power.

No 1 in open source business analytics

The key product for business critical predictive analysis

RapidDoc: Webbased solution for document retrieval and analysis.

Classify text, identify trends as well as emerging topics

Easy to use and configure

From About Rapid-I:

Rapid-I provides software, solutions, and services in the fields of predictive analytics, data mining, and text mining. The company concentrates on automatic intelligent analyses on a large-scale base, i.e. for large amounts of structured data like database systems and unstructured data like texts. The open-source data mining specialist Rapid-I enables other companies to use leading-edge technologies for data mining and business intelligence. The discovery and leverage of unused business intelligence from existing data enables better informed decisions and allows for process optimization.

The main product of Rapid-I, the data analysis solution RapidMiner is the world-leading open-source system for knowledge discovery and data mining. It is available as a stand-alone application for data analysis and as a data mining engine which can be integrated into own products. By now, thousands of applications of RapidMiner in more than 30 countries give their users a competitive edge. Among the users are well-known companies as Ford, Honda, Nokia, Miele, Philips, IBM, HP, Cisco, Merrill Lynch, BNP Paribas, Bank of America, mobilkom austria, Akzo Nobel, Aureus Pharma, PharmaDM, Cyprotex, Celera, Revere, LexisNexis, Mitre and many medium-sized businesses benefitting from the open-source business model of Rapid-I.

Data mining/analysis is the first part of any topic map project, however large or small. These tools, which I have not (yet) tried, are likely to prove useful in such projects. Comments welcome.

October 18, 2011

Computational Omics and Systems Biology Group

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 2:41 pm

Computational Omics and Systems Biology Group

From the webpage:

Introduction

The Computational Omics and Systems Biology Group, headed by Prof. Dr. Lennart Martens, is part of the Department of Biochemistry of the Faculty of Medicine and Health Sciences of Ghent University, and the Department of Medical Protein Research of VIB, both in Ghent, Belgium.

The group has its roots in Ghent, but has active members all over Europe, and specializes in the management, analysis and integration of high-throughput data (as obtained from various Omics approaches) with an aim towards establishing solid data stores, processing methods and tools to enable downstream systems biology research.

A major source of open source software, standards and other work.

Geological Survey Austria launches thesaurus project

Filed under: Geographic Data,Geographic Information Retrieval,Maps,Thesaurus — Patrick Durusau @ 2:41 pm

Geological Survey Austria launches thesaurus project by Helmut Nagy.

From the post:

Throughout the last year the Semantic Web Company team has supported the Geological Survey of Austria (GBA) in setting up their thesaurus project. It started with a workshop in summer 2010 where we discussed use cases for using semantic web technologies as means to fulfill the INSPIRE directive. Now in fall 2011 GBA published their first thesauri as Linked Data using PoolParty’s new Linked Data front-end.

The Thesaurus Project of the GBA aims to create controlled vocabularies for the semantic harmonization of map-based geodata. The content-related realization of this project is governed by the Thesaurus Editorial Team, which consists of domain experts from the Geological Survey of Austria. With the development of semantically and technically interoperable geo-data the Geological Survey of Austria implements its legal obligation defined by the EU-Directive 2007/2/EC INSPIRE and the national “Geodateninfrastrukturgesetz” (GeoDIG), respectively.

I wonder if their “controlled vocabularies” are going to map to the terminology used over the history of Europe, in maps, art, accounts, histories, and other recorded materials?

If not, I wonder if there would be any support to tie that history into current efforts or do they plan on simply cutting off the historical record and starting with their new thesaurus?
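For what it is worth, SKOS (the model PoolParty publishes in) can carry that history alongside the controlled term. A sketch with rdflib, using invented identifiers and labels rather than GBA's actual vocabulary:

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, SKOS

    GBA = Namespace("http://resource.geolba.ac.at/example/")  # hypothetical base URI

    g = Graph()
    concept = GBA["limestone"]
    g.add((concept, RDF.type, SKOS.Concept))
    g.add((concept, SKOS.prefLabel, Literal("Kalkstein", lang="de")))
    g.add((concept, SKOS.prefLabel, Literal("limestone", lang="en")))
    # Historical or regional names ride along as alternative labels...
    g.add((concept, SKOS.altLabel, Literal("Kalch-Stein", lang="de")))
    # ...with a note recording where the older form appears.
    g.add((concept, SKOS.historyNote,
           Literal("Spelling found on 18th-century survey maps.")))

    print(g.serialize(format="turtle"))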

ack

Filed under: Perl,Regex — Patrick Durusau @ 2:41 pm

ack

From the webpage:

ack is a tool like grep, designed for programmers with large trees of heterogeneous source code.

ack is written purely in Perl, and takes advantage of the power of Perl’s regular expressions.

It is said to be “pure Perl” so Robert shouldn’t have any problems running it on Windows. 😉

Seriously, the more I think about something Lars Marius said to me years ago, about it all being about string matching, the more that rings true.

Granted, we attach semantics to the results of that string matching, but insofar as our machines are concerned, it’s just strings. We may have defined complex processing for strings, but so long as they are not viewed by us, they remain simply strings.

(What I remember of conversations, remarks is always subject to correction by others who were present. I am sure their memories are better than mine.)

New food web dataset

Filed under: Data Source,Graphs,R,Visualization — Patrick Durusau @ 2:41 pm

New food web dataset

From the post:

So, there is a new food web dataset out that was put in Ecological Archives here, and I thought I would play with it. The food web is from Otago Harbour, an intertidal mudflat ecosystem in New Zealand. The web contains 180 nodes, with 1,924 links.

Fun stuff…

Interesting visuals but do you find that they help you understand the data?

That is an important question for visualizing topic maps, because you can make the nodes and associations between them jump, jitter, blink (shades of the browser wars), or zoom in and out. OK, if I am playing “Idiotfield 3” or whatever, that might be interesting.

But the question for topic maps, or any information system, is whether the visualization helps me find or understand the underlying data.

What do you think? The data is available. What would you do differently?
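As a non-visual first pass at the same data, a sketch using networkx; the file name is a placeholder for wherever you save the Ecological Archives edge list:

    import networkx as nx

    # Placeholder file: one "consumer resource" pair per line.
    G = nx.read_edgelist("otago_foodweb.txt", create_using=nx.DiGraph())

    print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "links")

    # Which taxa sit at the center of the web? A ranked list can be more
    # informative than watching 1,924 links jitter on screen.
    hubs = sorted(G.degree, key=lambda pair: pair[1], reverse=True)[:5]
    for node, degree in hubs:
        print(node, degree)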

Ecological Society of America (esa) – Ecological Archives

Filed under: Data Source — Patrick Durusau @ 2:41 pm

Ecological Society of America (esa) – Ecological Archives

If you are interested in ecological data for use with topic maps, this looks like a good place to start.

The available digital files/supplements to published papers go back to 1982.

Published by the Ecological Society of America.

rOpenSci

Filed under: Data Mining,Data Source,R — Patrick Durusau @ 2:40 pm

rOpenSci

From the website:

Projects in rOpenSci fall into two categories: those for working with the scientific literature, and those for working directly with the databases. Visit the active development hub of each project on github, where you can see and download source-code, see updates, and follow or join the developer discussions of issues. Most of the packages work through an API provided by the resource (database, paper archive) to access data and bring it within reach of R’s powerful manipulation.

Project started this past summer but has already collected some tutorials and data.

Good opportunity to learn some R, as well as talk up the notion of re-using scientific data in new ways. Don’t jump right into the recursion of subject identity as it relates to data, data structures, and the subjects both represent. 😉 YOU MAY THINK THAT, but what you say is: How do you know when you are talking about the same subject across data sets? Would that be useful to you? (Note the strategy of asking the user, not explaining their problem first. Explaining their problem for them, in terms I understand, is mostly my strategy, so this is a reminder to me not to do that!)
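A toy illustration of that question, nothing more: two data sets that name the same subjects differently, matched through declared mappings that record the basis for each match:

    # Two hypothetical data sets about the same mudflat species.
    counts_by_common_name = {"sea lettuce": 40, "mud snail": 120}
    biomass_by_latin_name = {"Ulva lactuca": 3.1, "Amphibola crenata": 7.8}

    # Identity assertions, each with its basis made explicit.
    same_subject = {
        "Ulva lactuca": ("sea lettuce", "accepted name / common name pairing"),
        "Amphibola crenata": ("mud snail", "regional field-guide usage"),
    }

    for latin, (common, basis) in same_subject.items():
        print(f"{latin} == {common} (basis: {basis}): "
              f"count={counts_by_common_name[common]}, "
              f"biomass={biomass_by_latin_name[latin]}")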

The Second International Workshop on Diversity in Document Retrieval (DDR-2012)

Filed under: Conferences,Information Retrieval,Semantic Diversity — Patrick Durusau @ 2:40 pm

The Second International Workshop on Diversity in Document Retrieval (DDR-2012)

Dates:

When Feb 12, 2012 – Feb 12, 2012
Where Seattle WA, USA
Submission Deadline Dec 5, 2011
Notification Due Jan 10, 2012
Final Version Due Jan 17, 2012

From the webpage:

In conjunction with WSDM 2012 – the 5th ACM International Conference on Web Search and Data Mining

Overview
=======
When an ambiguous query is received, a sensible approach is for the information retrieval (IR) system to diversify the results retrieved for this query, in the hope that at least one of the interpretations of the query intent will satisfy the user. Diversity is an increasingly important topic, of interest to both academic researchers (such as participants in the TREC Web and Blog track diversity tasks), as well as to search engines professionals. In this workshop, we solicit submissions both on approaches and models for diversity, the evaluation of diverse search results, and on applications and presentation of diverse search results.
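For a concrete sense of what diversifying results can mean mechanically, here is a minimal sketch of one classic approach, Maximal Marginal Relevance (Carbonell and Goldstein, 1998). It is mine for illustration, not anything prescribed by the workshop; the relevance scores and similarity function are assumed inputs:

    def mmr(candidates, relevance, similarity, k=10, lam=0.7):
        """Greedy Maximal Marginal Relevance re-ranking.

        candidates: list of document ids
        relevance:  dict mapping doc id -> query relevance score
        similarity: function (doc_a, doc_b) -> similarity in [0, 1]
        lam:        trade-off; 1.0 is pure relevance, 0.0 pure diversity
        """
        selected, pool = [], list(candidates)
        while pool and len(selected) < k:
            def gain(d):
                max_sim = max((similarity(d, s) for s in selected), default=0.0)
                return lam * relevance[d] - (1 - lam) * max_sim
            best = max(pool, key=gain)
            selected.append(best)
            pool.remove(best)
        return selected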

Topics:

  • Modelling Diversity:
    • Implicit diversification approaches
    • Explicit diversification approaches
    • Query log mining for diversity
    • Learning-to-rank for diversification
    • Clustering of results for diversification
    • Query intent understanding
    • Query type classification
  • Modelling Risk:
    • Probability ranking principle
    • Risk Minimization frameworks and role diversity
  • Evaluation:
    • Test collections for diversity
    • Evaluating of diverse search results
    • Measuring the ambiguity of queries
    • Measuring query aspects importance
  • Applications:
    • Product & review diversification
    • Opinion and sentiment diversification
    • Diversifying Web crawling policy
    • Graph analysis for diversity
    • Summarisation
    • Legal precedents & patents
    • Diverse recommender systems
    • Diversifying in real-time & news search
    • Diversification in other verticals (image/video search etc.)
    • Presentation of diverse search results

While typing this up, I remembered the “little search engine that could” post (Going Head to Head with Google (and winning)). Are we really condemned to have to manage unforeseeable complexity or is that a poor design choice we made for search engines?

After all, I am not really interested in the entire WWW. At least for this blog I am interested in probably less than 1/10 of 1% of the web (or less). So if I had a search engine for all the CS/Library/Informatics publications, blogs, and subject domains relevant to data/information, I would pretty much be set. A big semantic field, and one that is changing, but nothing like searching everything that is connected (or not, for the Deep Web) to the WWW.

I don’t have an answer for that, but I think it is an issue that may enable management of semantic diversity. That is, we get to declare the edge of the map. Yes, there are other things beyond the edge, but we aren’t going to include them in this particular map.

Neo4j: Running Embedded Server with WebConsole

Filed under: Neo4j — Patrick Durusau @ 2:40 pm

Neo4j: Running Embedded Server with WebConsole by Devender Gollapally.

From the post:

Took me a couple of hours to figure this out so blogging it; hopefully it helps someone else.

If you are running Neo4j in embedded mode, you can still get the web console, data browser and other goodies, they do mention this in the manual but what they don’t mention is that you will need 2 extra jars to do this neo4j-server.jar and neo4j-server-static-web.jar and these are not available on neo’s repo, so you will have to clone their source from git and build it locally.

Sounds like the manual needs to beef up on this topic.

I am sure everyone who is running Neo4j embedded will appreciate the info. Thanks!

Basic interface to Apache Solr (Python recipe)

Filed under: Python,Solr,Teaching — Patrick Durusau @ 2:40 pm

Basic interface to Apache Solr (Python recipe) by Graham Poulter.

From the post:

A basic model class representing Apache Solr. Abstracts the select, delete, update, and commit operations.

Select operation returns Python object parsed from a JSON-formatted response.

(code omitted)

There are several full-fledged Python libraries for interfacing to Apache Solr.

But sometimes all you need is a little code to build an appropriate HTTP request and parse the response. In that case, using this class could save you some time.
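Since the recipe's code is omitted above, here is my own minimal sketch of the same idea, not Graham Poulter's original. It assumes a Solr 3.x-style JSON update handler at /update/json and the JSON response writer; endpoint paths vary across Solr versions.

    import json
    import urllib.parse
    import urllib.request

    class Solr:
        """Thin HTTP wrapper for a Solr core: select, add, delete, commit."""

        def __init__(self, base="http://localhost:8983/solr"):
            self.base = base

        def _post(self, path, payload):
            req = urllib.request.Request(
                self.base + path,
                data=json.dumps(payload).encode("utf-8"),
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)

        def select(self, query, **params):
            params.update(q=query, wt="json")
            url = self.base + "/select?" + urllib.parse.urlencode(params)
            with urllib.request.urlopen(url) as resp:
                return json.load(resp)

        def add(self, docs):
            return self._post("/update/json", docs)  # list of dicts

        def delete(self, query):
            return self._post("/update/json", {"delete": {"query": query}})

        def commit(self):
            return self._post("/update/json", {"commit": {}})

    # Usage sketch:
    # solr = Solr()
    # solr.add([{"id": "1", "title_t": "Hello Solr"}])
    # solr.commit()
    # print(solr.select("title_t:hello")["response"]["numFound"])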

I think recipes are a good thing. What I have found in cooking is that at first I follow them closely until I gain confidence with the techniques and the likely result. The longer I use them the more I am likely to depart from them. So I get, usually, an edible result and learn something in the bargain.

I think there is a lesson here for teaching people about semantic and data mining techniques in general and topic maps in particular.

Search Algorithms with Google Director of Research Peter Norvig

Filed under: Search Algorithms,Search Engines,Searching — Patrick Durusau @ 2:40 pm

Search Algorithms with Google Director of Research Peter Norvig

From the post:

As you will see in the transcript below, this discussion focused on the use of artificial intelligence algorithms in search. Peter outlines for us the approach used by Google on a number of interesting search problems, and how they view search problems in general. This is fascinating reading for those of you who want to get a deeper understanding of how search is evolving and the technological approaches that are driving it. The types of things that are detailed in this interview include:

  1. The basic approach used to build Google Translate
  2. The process Google uses to test and implement algorithm updates
  3. How voice driven search works
  4. The methodology being used for image recognition
  5. How Google views speed in search
  6. How Google views the goals of search overall

Some of the particularly interesting tidbits include:

  1. Teaching automated translation systems vocabulary and grammar rules is not a viable approach. There are too many exceptions, and language changes and evolves rapidly. Google Translate uses a data-driven approach of finding millions of real-world translations on the web and learning from them.
  2. Chrome will auto-translate foreign language websites for you on the fly (if you want it to).
  3. Google tests tens of thousands of algorithm changes per year, and makes one to two actual changes every day.
  4. Testing is layered, starting with a panel of users comparing current and proposed results, perhaps a spin through the usability lab at Google, and finally a live test with a small subset of actual Google users.
  5. Google Voice Search relies on 230 billion real-world search queries to learn all the different ways that people articulate given words. So people no longer need to train their speech recognition for their own voice, as Google has enough real-world examples to make that step unnecessary.
  6. Google Image search allows you to drag and drop images onto the search box, and it will try to figure out what it is for you. I show a screen shot of an example of this for you below. I LOVE that feature!
  7. Google is obsessed with speed. As Peter says, “you want the answer before you’re done thinking of the question.” Expressed from a productivity perspective, if you don’t have the answer that soon, your flow of thought will be interrupted.

Reading the interview, it occurred to me that perhaps, just perhaps, in authoring semantic applications, whether Semantic Web or Topic Maps, we have been overly concerned with “correctness.” More so on the logic side, where applications fall on their sides when they encounter outliers, but precision is also the enemy of large-scale production of topic maps.

What if we took a tack from Google’s use of a data-driven approach to find mappings between data structures and the terms in data structures? I know automated techniques have been used for preliminary mapping of schemas before. What I am suggesting is that we capture the basis for the mapping, so we can improve or change it.

Although there are more than 70 names for “insurance policy number” in information systems, I suspect that within the domain those names stand in relationships to other subjects that would assist in refining a mapping of those terms over time. Rather than making mining/mapping a “run it again, Sam” type event, capturing that information could improve our odds at other mappings.

Depending on the domain, how accurate does it need to be? Particularly since we can build feedback into the systems so that as users encounter errors, those are corrected and cascade back to other users. Places users don’t visit may be wrong, but if no one visits, what difference does it make?
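A sketch of what capturing the basis for a mapping, with user feedback folded in, might look like; the field names are mine, not any existing system's:

    from dataclasses import dataclass, field

    @dataclass
    class TermMapping:
        """One asserted equivalence between field names, evidence kept."""
        source_term: str
        target_term: str
        evidence: list = field(default_factory=list)  # why we believe it
        confirmations: int = 0
        rejections: int = 0

        def record_feedback(self, correct: bool, note: str = "") -> None:
            if correct:
                self.confirmations += 1
            else:
                self.rejections += 1
            if note:
                self.evidence.append(note)

        @property
        def confidence(self) -> float:
            seen = self.confirmations + self.rejections
            return self.confirmations / seen if seen else 0.5  # undecided

    m = TermMapping("POLICY_NO", "insurance_policy_number",
                    evidence=["co-occurs with insured_name, premium_amount"])
    m.record_feedback(True, "analyst confirmed against dataset 7")
    print(m.confidence)  # 1.0 after one confirmation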

Very compelling interview and I suggest you read it in full.

October 17, 2011

Adopt-A-Doc

Filed under: Bibliography — Patrick Durusau @ 6:44 pm

Adopt-A-Doc

Please pass this along to your friends! An innovative way to preserve technical literature that will otherwise be difficult to access.

From the website:

Help make important research available online by adopting a U.S. Department of Energy (DOE) technical report. There are more than 200,000 DOE technical reports in need of digitization. In fact, most DOE technical reports from the 1940s to 1991 are still only available in hard copy or microfiche. This means that important research is not easily accessible by researchers and the public.

Why would I want to Adopt-A-Doc?

You may find a technical report that you want to share with others or you think worthy of making broadly available on the Web to support the advancement of science. When you search for important science information in your area of interest, you can choose to sponsor the digitization of any adoptable technical report. The cost is $85 (approximately the same cost as ordering a hard copy). Discounts for larger scale projects may be available. For additional information contact Susan Tackett at 865-576-5699 or tacketts@osti.gov.

CENDI: Federal STI Managers Group

Filed under: Government Data,Information Retrieval,Librarian/Expert Searchers,Library — Patrick Durusau @ 6:44 pm

CENDI: Federal STI Managers Group

From the webpage:

Welcome to the CENDI web site

CENDI’s vision is to provide its member federal STI agencies a cooperative enterprise where capabilities are shared and challenges are faced together so that the sum of accomplishments is greater than each individual agency can achieve on its own.

CENDI’s mission is to help improve the productivity of federal science- and technology-based programs through effective scientific, technical, and related information-support systems. In fulfilling its mission, CENDI agencies play an important role in addressing science- and technology-based national priorities and strengthening U.S. competitiveness.

CENDI is an interagency working group of senior scientific and technical information (STI) managers from 14 U.S. federal agencies:

  • Defense Technical Information Center (Department of Defense)
  • Office of Research and Development & Office of Environmental Information (Environmental Protection Agency)
  • Government Printing Office
  • Library of Congress
  • NASA Scientific and Technical Information Program
  • National Agricultural Library (Department of Agriculture)
  • National Archives and Records Administration
  • National Library of Education (Department of Education)
  • National Library of Medicine (Department of Health and Human Services)
  • National Science Foundation
  • National Technical Information Service (Department of Commerce)
  • National Transportation Library (Department of Transportation)
  • Office of Scientific and Technical Information (Department of Energy)
  • USGS/Core Science Systems (Department of Interior)

These programs represent over 97% of the federal research and development budget.

The CENDI web site is hosted by the Defense Technical Information Center (DTIC), and is maintained by the CENDI secretariat. (emphasis added)

Yeah, I thought the 97% figure would catch your attention. 😉 Not sure how it compares with spending on IT and information systems in law enforcement and the spook agencies.

Topic Maps Class Project: Select one of the fourteen members and prepare a report for the class on their primary web interface. What did you like/dislike about the interface? How would you integrate the information you found there with your “home” library site (for students already employed elsewhere) or with the GSLIS site?

BTW, I think you will find that these agencies and their personnel have been thinking deeply about information integration for decades. It is an extremely difficult problem that has no fixed or easy solution.

“Value polymorphism”, simple explanation with examples

Filed under: Haskell — Patrick Durusau @ 6:42 pm

“Value polymorphism”, simple explanation with examples

At first I thought this would be a post that would interest only one topic map person that I know. But upon reading the entire post, I decided it might interest two.

What pushed me over the edge was:

Thus by merely specifying the return type we have effectively generated a parser. An invalid string will produce an error:

That is a very powerful mechanism for processing values returned from a topic map (or data about to be included in one).

Biological and Environmental Research (BER) Abstracts Database

Filed under: Bibliography,Bioinformatics,Environment — Patrick Durusau @ 6:41 pm

Biological and Environmental Research (BER) Abstracts Database

From the webpage:

Since 1995, OSTI has provided assistance and support to the Office of Biological and Environmental Research (BER) by developing and maintaining a database of BER research project information. Called the BER Abstracts Database (http://www.osti.gov/oberabstracts/index.jsp), it contains summaries of research projects supported by the program. Made up of two divisions, Biological Systems Science Division and Climate and Environmental Sciences Division, BER is responsible for world-class biological and environmental research programs and scientific user facilities. BER’s research program is closely aligned with DOE’s mission goals and focuses on two main areas: the Nation’s Energy Security (developing cost-effective cellulosic biofuels) and the Nation’s Environmental Future (improving the ability to understand, predict, and mitigate the impacts of energy production and use on climate change).

The BER Abstracts Database is publicly available to scientists, researchers, and interested citizens. Each BER research project is represented in the database, including both current/active projects and historical projects dating back to 1995. The information available on each research project includes: project title, abstract, principal investigator, research institution, research area, project term, and funding. Users may conduct basic or advanced searches, and various sorting and downloading options are available.

The BER Abstracts Database serves as a tool for BER program managers and a valuable resource for the public. The database also meets the Department’s strategic goals to disseminate research information and results. Over the past 16 years, over 6,000 project records have been created for the database, offering a fascinating look into the BER research program and how it has evolved. BER played a major role in the development of genomics-based systems biology and in the biotechnology revolution occurring over this period, while also supporting ground-breaking research on the impacts of energy production and use on the environment. The BER Abstracts Database, made available through the collaborative partnership between BER and OSTI, highlights these scientific advancements and maximizes the public value of BER’s research.

Particularly if this is an area of interest for you, take some time to become familiar with the interface.

  1. What do you think about the basic vs. advanced search?
  2. Does the advanced search offer any substantial advantages or do you have to start off with more complete information?
  3. What advantages (if any) does the use of abstracts offer over full text searching?

Science Conference Proceedings Portal

Filed under: Bibliography,Computer Science,Conferences — Patrick Durusau @ 6:41 pm

Science Conference Proceedings Portal

From the website:

Welcome to the DOE Office of Scientific and Technical Information’s (OSTI) Science Conference Proceedings Portal. This distributed portal provides access to science and technology conference proceedings and conference papers from a number of authoritative sites (professional societies and national labs, largely) whose areas of interest in the physical sciences and technology intersect those of the Department of Energy. Proceedings and papers from scientific meetings can be found in these fields, among others: particle physics, nuclear physics, chemistry, petroleum, aeronautics and astronautics, meteorology, engineering, computer science, electric power, fossil fuels. From here you can simultaneously query any or all of the listed organizations and collections for scientific and technical conference proceedings or papers. Simply enter your search term(s) in the “Search” box, check one or more of the listed sites (or check “Select All”), and click the “Search” button.

One of the conference organizations listed is the Association for Computing Machinery (ACM).

No doubt a very good site, but I wonder about conferences that appear only as Springer publications, for example. Or conferences concerned with computers that appear only as publications of other publishers or organizations?

Question: In a week, how many indexes that include computer science conferences can you find? How do they differ in terms of coverage?

Networked Knowledge Organization Systems/Services NKOS

Filed under: Knowledge Organization,Library,Ontology,Terminology,Thesaurus,Vocabularies — Patrick Durusau @ 6:40 pm

Networked Knowledge Organization Systems/Services NKOS

From the website:

NKOS is devoted to the discussion of the functional and data model for enabling knowledge organization systems/services (KOS), such as classification systems, thesauri, gazetteers, and ontologies, as networked interactive information services to support the description and retrieval of diverse information resources through the Internet.

Knowledge Organization Systems/Services (KOS) model the underlying semantic structure of a domain. Embodied as Web-based services, they can facilitate resource discovery and retrieval. They act as semantic road maps and make possible a common orientation by indexers and future users (whether human or machine). — Douglas Tudhope, Traugott Koch, New Applications of Knowledge Organization Systems

A wide variety of resources that will interest anyone working with knowledge systems. I would expect any number of these to appear in future posts with comments or observations.

CENDI Science Terminology Locator

Filed under: Government Data,Terminology,Vocabularies — Patrick Durusau @ 6:40 pm

CENDI Science Terminology Locator

Another CENDI resource that merits special mention.

From the webpage:

Browse the terminology resources across the U.S. Federal Science Agencies by selecting a topic and clicking the acronym resource link next to the category.

What you get when following one of the terminology links varies widely: “page not found” for NASA, RDF as an option at NALT, very complex term navigation (DOE), what appear to be search results in an agency database (USGS), a listing of terms with definitions and some navigation (DTIC), Descriptor Data (MeSH), “page not found” for NBII, and an outdated link for ERIC that redirects to a thesaurus navigation page.

If you have someone in government who doesn’t think varying terminologies is an issue, send them this link. The varying responses and what you see when you get there should be proof enough for anyone.

TaxoBank

Filed under: Ontology,Taxonomy,Thesaurus,Vocabularies — Patrick Durusau @ 6:39 pm

TaxoBank: Access, deposit, save, share, and discuss taxonomy resources

From the webpage:

Welcome to the TaxoBank Terminology Registry

The TaxoBank contains information about controlled vocabularies of all types and complexities. We invite you to both browse and contribute. Enjoy term lists for special purpose use, get ideas for building your own vocabulary, perhaps find one that can give you a quicker start.

The information collected about each vocabulary follows a study (TRSS) conducted by JISC, the Joint Information Systems Committee of the Higher and Further Education Funding Councils. All of the recommended fields included in the study’s final report are included; some of those the study identified as Optional are not. See more about the Terminology Registry Scoping Study (TRSS) at their site. In addition, input from other information experts was elicited in planning the site.

This is an interactive web site. To add information about a vocabulary, click on Create Content in the left navigation pane (you’ll need to register as a user first; we just need your name and email). There are only eight required fields, but your listing will be more useful if you complete all the applicable fields about your vocabulary.

Add a comment to almost any page – how you’ve used the vocabulary, what you’d add to it, how you’d use it if expanded to an ontology, etc. Comments are welcome on Event and Blog pages as well. Click on Add Comment, and enter your thoughts. Even anonymous visitors (not signed in) can add comments, but they’ll be reviewed by a site admin before they’re made visible.

You may also update the Events section of the site. Taxonomy, Knowledge Systems, Information Architecture or Management, Metadata are all appropriate event themes. Click on Create Content and then on Events to add a new one (you’ll need to be a registered user).

Contact us through the Contact page, with suggestions, corrections, or to discuss displaying your vocabulary on this site (particularly important if it was created on a college server and faces erasure at the end of the academic year), or if you have questions.

Thank you for visiting (and participating)!

The “Vocabulary spotlight” suggested “Thesaurus of BellyDancing” on my first visit.

To be honest, I had never thought about belly dancing having a thesaurus or even a standard vocabulary for its description.

For class: Browse the listing and pick out an entry for a subject area unfamiliar to you. Prepare a short (say, less than 5 minutes) oral review of the entry. What did you like/dislike, find useful, less than useful, etc.? Did anything about the entry interest you in finding out more about the subject matter or its treatment?

CENDI Agency Terminology Resources

Filed under: Government Data,Terminology,Thesaurus,Vocabularies — Patrick Durusau @ 6:39 pm

CENDI Agency Terminology Resources

From the webpage:

The following URLs provide access to the online thesauri and indexing resources of the various federal scientific & technical agencies including CENDI agencies. These resources are of interest to those wishing to know about the scientific and technical terminology used in various fields.

  • Agriculture & Food
  • Applied Science & Technologies
  • Astronomy & Space
  • Biology & Nature
  • Earth & Ocean Sciences
  • Energy & Energy Conservation
  • Environment & Environmental Quality
  • General Science
  • Health & Medicine
  • Physics, Chemistry, and Mathematics
  • Science Education

I will post on CENDI but I thought this was important enough to call out separately. Particularly since there are multiple thesauri in some of these categories.

For example:

NAL Agricultural Thesaurus http://agclass.nal.usda.gov/agt/agt.shtml

The NAL Agricultural Thesaurus (NALT) is annually updated and the 2007 edition contains over 65,800 terms organized into 17 subject categories. NALT is searchable online and is available in several formats (PDF, ASCII text, XML, SKOS) for download from the web site. NALT has standard hierarchical, equivalence and associative relationships and provides scope notes and over 2,400 definitions of terms for clarity. Proposals for new terminology can be sent to thes@nal.usda.gov. Published by the National Agricultural Library, United States Department of Agriculture.

Tesauro Agrícola http://agclass.nal.usda.gov/agt_es.shtml

Tesauro Agrícola is the Spanish language translation of the NAL Agricultural Thesaurus (NALT). The thesaurus accommodates the complexity of the Spanish language from a Western Hemisphere perspective. First published in May 2007, the thesaurus contains over 15,700 translated concepts and contains definitions for more than 2,400 terms. The thesaurus is searchable with a Spanish interface and is available in several formats (PDF, ASCII text, XML) for download from the web site. Proposals for new terminology can be sent to thes@nal.usda.gov. Published by the National Agricultural Library, United States Department of Agriculture.

WorldWideScience.org

Filed under: Bibliography — Patrick Durusau @ 5:28 am

WorldWideScience.org: The Global Science Gateway

From the webpage:

WorldWideScience.org is a global science gateway—accelerating scientific discovery and progress through a multilateral partnership to enable federated searching of national and international scientific databases and portals.

You have to pick “Advanced Search” to get an idea of the range of coverage offered by this gateway.

Note that the service offers multilingual searching powered by Microsoft Translator.

I did a search for “partially observable Markov processes” (thinking to avoid a real flood of “hits”) and was quickly shown six (6) “hits.” Then a popup appeared advising that a full search was complete, asking if it should add another four hundred forty-seven (447) results. The criteria for the “quick” results aren’t clear, but it is impressive. Now the interface advises: 453 results from at least 3266 found.

Odd to see SpringerLink listed first under the Author facet on the left-hand side of the screen.

The search “hits” re-ordered themselves and since I had used an exact match string, the first item was Technical rept. no. 4 from MIT, Corporate Author: MASSACHUSETTS INST OF TECH CAMBRIDGE OPERATIONS RESEARCH CENTER, Personal Author: Kramer,J. David R. ,Jr., Report Date: April 1964.

You get “alerts” of later results but only if you have a registered account. But you have to search before you see a link to the login page, where you can create an account. For your convenience, the login page.

It is a very interesting “federation” of search results but I am troubled by not knowing the limitations of the underlying search engines.
