Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 25, 2011

Humanizing Bioinformatics

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 7:33 pm

Humanizing Bioinformatics by Saaien Tist.

From the post:

I was invited last week to give a talk at this year’s meeting of the Graduate School Structure and Function of Biological Macromolecules, Bioinformatics and Modeling (SFMBBM). It ended up being a day with great talks by some bright PhD students and postdocs. There were 2 keynotes (one by Prof Bert Poolman from Groningen (NL) and one by myself), and a panel discussion on what the future holds for people nearing the end of their PhDs.

My talk was titled “Humanizing Bioinformatics” and was received quite well (at least some people still laughed at my jokes (if you can call them that); even at the end). I put the slides up on slideshare, but I thought I’d explain things here as well, because those slides will probably not convey the complete story.

Let’s ruin the plot by mentioning it here: we need data visualization to counteract the alienation that’s happening between bioinformaticians and bright data miners on the one hand, and the user/clinician/biologist on the other. We need to make bioinformatics human again. (emphasis in original)

I just wish there had been a video recording of this presentation!

Questions:

  1. Do you agree with the issues that Saaien raises? Are there more that you would raise? 2-3 pages (no citations)
  2. Have “semantics” become what can be evaluated by a computer? Pick yes, no, or undecided, and cite web examples for your position. 2-3 pages
  3. How much do you trust the answers to your searches? (Classroom discussion question.)

October 20, 2011

BioGene 1.1 – Information Tool for Biological Research for iPhone

Filed under: Bioinformatics,Search Algorithms,Searching — Patrick Durusau @ 6:37 pm

BioGene 1.1 – Information Tool for Biological Research for iPhone

From the website:

BioGene is an information tool for biological research. Use BioGene to learn about gene function. Enter a gene symbol or gene name, for example “CDK4” or “cyclin dependent kinase 4”, and BioGene will retrieve its gene function and references into its function (GeneRIF).

The search/match criteria of BioGene are instructive:

Where does BioGene get its data?
BioGene provides primary information from Entrez Gene, a searchable database hosted by the NCBI.
What is a GeneRIF?
A GeneRIF is a functional annotation of a gene described in Entrez Gene. The annotation includes a link to a citation within PubMed which describes the function. Please see GeneRIF for more information.
How does BioGene search Entrez Gene?
BioGene attempts to match a query against a gene name (symbol). If no matching records are found, BioGene applies mild increments in permissiveness until a match is found. For example, if we are searching for the following single-term query, “trk”, BioGene will attempt the following sequence of queries in succession, stopping whenever one or more matching records is returned:

  • search for a gene name (symbol) that matches the exact sequence of characters “trk”
  • search for a gene name (symbol) that starts with the sequence of characters “trk”
  • search for a gene name (symbol) that contains the sequence of characters “trk” within a word
  • perform a free text search that matches the exact sequence of characters “trk”
  • perform a free text search that starts with the sequence of characters “trk”
  • perform a free text search that contains the sequence of characters “trk” within a word

In Entrez Gene parlance, for the following single-term query “trk”, the following sequence of queries is attempted:

  • trk[pref]
  • trk*[pref] OR trk[sym]
  • trk*[sym]
  • *trk*[sym]
  • trk[All Fields]
  • trk*[All Fields]
  • *trk*[All Fields]

If, however, we are searching for the following multi-term query, “protein kinase 4”, BioGene will attempt the following sequence of queries in succession, stopping whenever one or more matching records is returned:

  • search for a full gene name that matches the exact sequence of characters “protein kinase 4”
  • perform a free text search that contains every term in the multi-term query “protein kinase 4”
  • perform a free text search that contains one of the terms in the multi-term query “protein kinase 4”

In Entrez Gene parlance, for the following multi-term query “protein kinase 4”, the following sequence of queries is attempted:

  • protein+kinase+4[gene full name]
  • protein[All Fields] AND kinase[All Fields] AND 4[All Fields]
  • protein[All Fields] OR kinase[All Fields] OR 4[All Fields]

If BioGene detects one or more of the following character sequences within the query:

[   ]   *   AND   OR   NOT

it treats this as an “advanced” query and passes the query directly to Entrez Gene. In this situation, BioGene ignores the organism filter specified in the application settings and expects the user to embed this filter within the query.
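The fallback ladder is easy to reproduce. Below is a minimal sketch (not BioGene’s actual code) of the same escalating-permissiveness strategy for a single-term query, using Biopython’s Entrez module against the live Entrez Gene database. The query templates simply transcribe the FAQ above; whether Entrez accepts the leading-wildcard forms may vary.

from Bio import Entrez

Entrez.email = "you@example.org"  # NCBI asks for a contact address

SINGLE_TERM_LADDER = [
    "{q}[pref]",               # exact preferred symbol
    "{q}*[pref] OR {q}[sym]",  # symbol prefix, or exact symbol
    "{q}*[sym]",
    "*{q}*[sym]",
    "{q}[All Fields]",         # fall back to free text
    "{q}*[All Fields]",
    "*{q}*[All Fields]",
]

def biogene_style_search(query):
    """Try each query in turn, stopping at the first non-empty result."""
    for template in SINGLE_TERM_LADDER:
        term = template.format(q=query)
        handle = Entrez.esearch(db="gene", term=term)
        record = Entrez.read(handle)
        handle.close()
        if record["IdList"]:
            return term, record["IdList"]
    return None, []

matched, ids = biogene_style_search("trk")
print(matched, ids[:5])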

BioGene gives you access to subjects by multiple identifiers and, at least for a closed set of data, attempts to find a “best” match.

Review Entrez Gene.

What other identifiers suggest themselves as bases of integrating known sources of additional information?

(Only “known” sources can be meaningfully displayed/integrated for the user.)

October 19, 2011

The Kepler Project

Filed under: Bioinformatics,Data Analysis,ELN Integration,Information Flow,Workflow — Patrick Durusau @ 3:16 pm

The Kepler Project

From the website:

The Kepler Project is dedicated to furthering and supporting the capabilities, use, and awareness of the free and open source, scientific workflow application, Kepler. Kepler is designed to help scientists, analysts, and computer programmers create, execute, and share models and analyses across a broad range of scientific and engineering disciplines. Kepler can operate on data stored in a variety of formats, locally and over the internet, and is an effective environment for integrating disparate software components, such as merging “R” scripts with compiled “C” code, or facilitating remote, distributed execution of models. Using Kepler’s graphical user interface, users simply select and then connect pertinent analytical components and data sources to create a “scientific workflow”—an executable representation of the steps required to generate results. The Kepler software helps users share and reuse data, workflows, and components developed by the scientific community to address common needs.

The Kepler software is developed and maintained by the cross-project Kepler collaboration, which is led by a team consisting of several of the key institutions that originated the project: UC Davis, UC Santa Barbara, and UC San Diego. Primary responsibility for achieving the goals of the Kepler Project reside with the Leadership Team, which works to assure the long-term technical and financial viability of Kepler by making strategic decisions on behalf of the Kepler user community, as well as providing an official and durable point-of-contact to articulate and represent the interests of the Kepler Project and the Kepler software application. Details about how to get more involved with the Kepler Project can be found in the developer section of this website.

Kepler is a Java-based application that is maintained for the Windows, OSX, and Linux operating systems. The Kepler Project supports the official code-base for Kepler development, as well as provides materials and mechanisms for learning how to use Kepler, sharing experiences with other workflow developers, reporting bugs, suggesting enhancements, etc.
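To make the quoted “scientific workflow” idea concrete, here is a toy sketch in Python of components wired together and executed in dependency order. It is illustrative only; Kepler itself is a Java GUI application built on actor-oriented modeling, not an API like this.

class Actor:
    """A toy 'analytical component': a named function with upstream inputs."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
        self.upstream = []

    def connect(self, *sources):
        self.upstream.extend(sources)
        return self

    def run(self, cache=None):
        cache = {} if cache is None else cache
        if self not in cache:                     # run dependencies first
            inputs = [a.run(cache) for a in self.upstream]
            cache[self] = self.fn(*inputs)
        return cache[self]

# workflow: read data -> normalize -> summarize
read = Actor("read", lambda: [4.0, 8.0, 15.0, 16.0])
norm = Actor("normalize", lambda xs: [x / max(xs) for x in xs]).connect(read)
mean = Actor("mean", lambda xs: sum(xs) / len(xs)).connect(norm)
print(mean.run())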

I found this from an announcement of an NSF grant for a bioKepler project.

Questions:

  1. Review the Kepler project and prepare a short summary of it. (3 – 5 pages)
  2. Workflow by its very nature involves subjects moving from one process or user to another. How is that handled by Kepler in general?
  3. Can you intersect Kepler workflows with those of other workflow management software? If not, why not? (research project)

MyBioSoftware

Filed under: Bioinformatics,Biomedical,Software — Patrick Durusau @ 3:16 pm

MyBioSoftware: Bioinformatics Software Blog

From the blog:

My Biosoftware Blog supplies free bioinformatics software for biology scientists, every day.

Impressive listing of bioinformatics software. Not my area (by training). It is one in which I am interested because of the rapid development of data analysis techniques, which may be applicable more broadly.

Question/Task: Select any two software packages in a category and document the output formats that they support. It would be useful to have a chart of the formats supported for each category. That may uncover places where interchange isn’t easy, or perhaps isn’t possible at all.

Playing with microsatellites (Simple Sequence Repeats), Java, and Neo4j

Filed under: Bioinformatics,Java,Neo4j — Patrick Durusau @ 3:16 pm

Playing with microsatellites (Simple Sequence Repeats), Java, and Neo4j

From the post:

I just finished this afternoon a small project I had to do about identification of microsatellites in DNA sequences. As with every new project I start, I think of something that:

  • I didn’t try before
  • is worth learning
  • is applicable in order to meet the needs of the specific project

These last few days it was the chance to get to know and try the visualization tool included in the last version of Neo4j Webadmin dashboard.

I had already heard of it a couple of times from different sources but had not had the chance to play a bit with it yet. So, after my first contact with it I have to say that although it’s something Neo4j introduced in the last versions, it already has a decent GUI and promising functionality.

The post covers his domain model and the results of applying it.
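For readers who want to try the same kind of modeling, here is a hedged sketch of one way to put sequences and microsatellites into Neo4j. It uses the modern neo4j Python driver, and the labels and properties are made up for illustration, not his actual Java domain model.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

with driver.session() as session:
    # a sequence node linked to a repeat it contains, with positional data
    session.run(
        """
        MERGE (s:Sequence {accession: $acc})
        MERGE (m:Microsatellite {motif: $motif, repeats: $repeats})
        MERGE (s)-[:CONTAINS {start: $start, stop: $stop}]->(m)
        """,
        acc="scaffold_42", motif="CA", repeats=12, start=1042, stop=1066,
    )
    # the neighbourhood query is the kind of thing the dashboard visualizes
    result = session.run(
        "MATCH (s:Sequence)-[c:CONTAINS]->(m:Microsatellite) "
        "RETURN s.accession AS seq, m.motif AS motif, c.start AS start"
    )
    for record in result:
        print(record["seq"], record["motif"], record["start"])

driver.close()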

Knime4Bio:…Next Generation Sequencing data with KNIME

Filed under: Bioinformatics,Biomedical,Data Mining — Patrick Durusau @ 3:15 pm

Knime4Bio:…Next Generation Sequencing data with KNIME by Pierre Lindenbaum, Solena Le Scouarnec, Vincent Portero and Richard Redon.

Abstract:

Analysing large amounts of data generated by next-generation sequencing (NGS) technologies is difficult for researchers or clinicians without computational skills. They are often compelled to delegate this task to computer biologists working with command line utilities. The availability of easy-to-use tools will become essential with the generalisation of NGS in research and diagnosis. It will enable investigators to handle much more of the analysis. Here, we describe Knime4Bio, a set of custom nodes for the KNIME (The Konstanz Information Miner) interactive graphical workbench, for the interpretation of large biological datasets. We demonstrate that this tool can be utilised to quickly retrieve previously published scientific findings.

Code: http://code.google.com/p/knime4bio/

While I applaud the trend towards “easy-to-use” software, I do worry about results that are returned by automated analysis, which of course “must be true.”

I am mindful of the four-year-old whose name was on a terrorist watch list and so delayed the departure of a plane. The ground personnel lacked the moral courage or judgement to act on what was clearly a case of mistaken identity.

As “bigdata” grows ever larger, I wonder whether “easy” interfaces will really be facile interfaces, whose results we lack the courage (or skill?) to question.

October 18, 2011

Computational Omics and Systems Biology Group

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 2:41 pm

Computational Omics and Systems Biology Group

From the webpage:

Introduction

The Computational Omics and Systems Biology Group, headed by Prof. Dr. Lennart Martens, is part of the Department of Biochemistry of the Faculty of Medicine and Health Sciences of Ghent University, and the Department of Medical Protein Research of VIB, both in Ghent, Belgium.

The group has its roots in Ghent, but has active members all over Europe, and specializes in the management, analysis and integration of high-throughput data (as obtained from various Omics approaches) with an aim towards establishing solid data stores, processing methods and tools to enable downstream systems biology research.

A major source of open source software, standards and other work.

October 17, 2011

Biological and Environmental Research (BER) Abstracts Database

Filed under: Bibliography,Bioinformatics,Environment — Patrick Durusau @ 6:41 pm

Biological and Environmental Research (BER) Abstracts Database

From the webpage:

Since 1995, OSTI has provided assistance and support to the Office of Biological and Environmental Research (BER) by developing and maintaining a database of BER research project information. Called the BER Abstracts Database (http://www.osti.gov/oberabstracts/index.jsp), it contains summaries of research projects supported by the program. Made up of two divisions, Biological Systems Science Division and Climate and Environmental Sciences Division, BER is responsible for world-class biological and environmental research programs and scientific user facilities. BER’s research program is closely aligned with DOE’s mission goals and focuses on two main areas: the Nation’s Energy Security (developing cost-effective cellulosic biofuels) and the Nation’s Environmental Future (improving the ability to understand, predict, and mitigate the impacts of energy production and use on climate change).

The BER Abstracts Database is publicly available to scientists, researchers, and interested citizens. Each BER research project is represented in the database, including both current/active projects and historical projects dating back to 1995. The information available on each research project includes: project title, abstract, principal investigator, research institution, research area, project term, and funding. Users may conduct basic or advanced searches, and various sorting and downloading options are available.

The BER Abstracts Database serves as a tool for BER program managers and a valuable resource for the public. The database also meets the Department’s strategic goals to disseminate research information and results. Over the past 16 years, over 6,000 project records have been created for the database, offering a fascinating look into the BER research program and how it has evolved. BER played a major role in the development of genomics-based systems biology and in the biotechnology revolution occurring over this period, while also supporting ground-breaking research on the impacts of energy production and use on the environment. The BER Abstracts Database, made available through the collaborative partnership between BER and OSTI, highlights these scientific advancements and maximizes the public value of BER’s research.

Particularly if this is an area of interest for you, take some time to become familiar with the interface.

  1. What do you think about the basic vs. advanced search?
  2. Does the advanced search offer any substantial advantages or do you have to start off with more complete information?
  3. What advantages (if any) does the use of abstracts offer over full text searching?

October 15, 2011

Making Sense of Unstructured Data in Medicine Using Ontologies – October 19th

Filed under: Bioinformatics,Biomedical,Ontology — Patrick Durusau @ 4:30 pm

From the email announcement:

The next NCBO Webinar will be presented by Dr. Nigam Shah from Stanford University on “Making Sense of Unstructured Data in Medicine Using Ontologies” at 10:00am PT, Wednesday, October 19. Below is information on how to join the online meeting via WebEx and accompanying teleconference. For the full schedule of the NCBO Webinar presentations see: http://www.bioontology.org/webinar-series.

ABSTRACT:

Changes in biomedical science, public policy, information technology, and electronic health record (EHR) adoption have converged recently to enable a transformation in the delivery, efficiency, and effectiveness of health care. While analyzing structured electronic records has proven useful in many different contexts, the true richness and complexity of health records—roughly 80 percent—lies within the clinical notes, which are free-text reports written by doctors and nurses in their daily practice. We have developed a scalable annotation and analysis workflow that uses public biomedical ontologies and is based on the term recognition tools developed by the National Center for Biomedical Ontology (NCBO). This talk will discuss the applications of this workflow to 9.5 million clinical documents—from the electronic health records of approximately one million adult patients from the STRIDE Clinical Data Warehouse—to identify statistically significant patterns of drug use and to conduct drug safety surveillance. For the patterns of drug use, we validate the usage patterns learned from the data against FDA-approved indications as well as external sources of known off-label use such as Medi-Span. For drug safety surveillance, we show that drug–disease co-occurrences and the temporal ordering of drugs and disease mentions in clinical notes can be examined for statistical enrichment and used to detect potential adverse events.

WEBEX DETAILS:
——————————————————-
To join the online meeting (Now from mobile devices!)
——————————————————-
1. Go to https://stanford.webex.com/stanford/j.php?ED=108527772&UID=0&PW=NZDdmNWNjOGMw&RT=MiM0
2. If requested, enter your name and email address.
3. If a password is required, enter the meeting password: ncbo
4. Click “Join”.

——————————————————-
To join the audio conference only
——————————————————-
To receive a call back, provide your phone number when you join the meeting, or call the number below and enter the access code.
Call-in toll number (US/Canada): 1-650-429-3300
Global call-in numbers: https://stanford.webex.com/stanford/globalcallin.php?serviceType=MC&ED=108527772&tollFree=0

Access code: 929 613 752

October 10, 2011

Bio4jExplorer

Filed under: Bio4j,Bioinformatics,Biomedical,Cloud Computing,Graphs — Patrick Durusau @ 6:17 pm

Bio4jExplorer: familiarize yourself with Bio4j nodes and relationships

From the post:

I just uploaded a new tool aimed to be used both as a reference manual and as an initial contact point for the Bio4j domain model: Bio4jExplorer

Bio4jExplorer allows you to:

  • Navigate through all nodes and relationships
  • Access the javadocs of any node or relationship
  • Graphically explore the neighbourhood of a node/relationship
  • Look up the different indexes that may serve as an entry point for a node
  • Check incoming/outgoing relationships of a specific node
  • Check start/end nodes of a specific relationship

And take note:

For those interested in how this was done: on the server side I created an AWS SimpleDB database holding all the information about the model of Bio4j, i.e. everything regarding nodes, relationships, indexes… (here you can check the program used for creating this database using the Java AWS SDK)

Meanwhile, on the client side I used the Flare prefuse AS3 library for the graph visualization.

When people are this productive, and a benefit to the community as well, I am deeply envious but glad for them (and the rest of us) at the same time. I simply must work harder. 😉

October 6, 2011

PostScript as a Programming Language for Bioinformatics

Filed under: Bioinformatics,PostScript,Visualization — Patrick Durusau @ 5:36 pm

PostScript as a Programming Language for Bioinformatics

From the post:

“PostScript (PS) is an interpreted, stack-based programming language. It is best known for its use as a page description language in the electronic and desktop publishing areas.”[wikipedia]. In this post, I’ll show how I’ve used it to create a simple and lightweight view of the genome.

Awesome in a number of respects! Have you used PostScript to visualize a topic map? Not that it would be likely to be a production device, but the discipline of doing it could be interesting.
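In the same spirit, here is a tiny sketch of the idea: a Python script that emits hand-written PostScript drawing gene features as boxes along a genome axis. The genes, coordinates, and scale are all invented for illustration.

# made-up gene coordinates (base pairs) and a bp -> points scale
genes = [("geneA", 120, 340), ("geneB", 410, 620), ("geneC", 700, 950)]
scale = 0.5

ps = ["%!PS-Adobe-3.0",
      "/Helvetica findfont 8 scalefont setfont",
      "72 400 moveto 500 0 rlineto stroke"]       # the genome axis
for name, start, end in genes:
    x, w = 72 + start * scale, (end - start) * scale
    ps.append(f"{x} 395 {w} 10 rectfill")          # feature box on the axis
    ps.append(f"{x} 385 moveto ({name}) show")     # label under the box
ps.append("showpage")

with open("genome.ps", "w") as f:
    f.write("\n".join(ps))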

KDD and MUCMD 2011

Filed under: Bioinformatics,Biomedical,Data Mining,Knowledge Discovery — Patrick Durusau @ 5:33 pm

KDD and MUCMD 2011

An interesting review of KDD and MUCMD (Meaningful Use of Complex Medical Data) 2011:

At KDD I enjoyed Stephen Boyd’s invited talk about optimization quite a bit. However, the most interesting talk for me was David Haussler’s. His talk started out with a formidable load of biological complexity. About half-way through you start wondering, “can this be used to help with cancer?” And at the end he connects it directly to use with a call to arms for the audience: cure cancer. The core thesis here is that cancer is a complex set of diseases which can be disentangled via genetic assays, allowing attacks on the specific signature of individual cancers. However, the data quantity and complex dependencies within the data require systematic and relatively automatic prediction and analysis algorithms of the kind that we are best familiar with.

The review cites a number of their favorite papers. Which ones are yours?

October 3, 2011

Automated extraction of domain-specific clinical ontologies – Weds Oct. 5th

Filed under: Bioinformatics,Biomedical,Ontology,SNOMED — Patrick Durusau @ 7:09 pm

Automated extraction of domain-specific clinical ontologies by Chimezie Ogbuji from Case Western Reserve University School of Medicine. 10 AM PT Weds Oct. 5, 2011.

Full NCBO Webinar schedule: http://www.bioontology.org/webinar-series

ABSTRACT:

A significant set of challenges in the use of large, source ontologies in the medical domain include: automated translation, customization of source ontologies, and performance issues associated with the use of logical reasoning systems to interpret the meaning of a domain captured in a formal knowledge representation.

SNOMED-CT and FMA are two reference ontologies that cover much of the domain of clinical medicine and motivate a better means for the re-use of such ontologies. In this presentation, the author will present a set of automated methods (and tools) for segmenting, merging, and surveying modules extracted from these ontologies for a specific domain.

I’m interested generally but in particular about the merging aspects, for obvious reasons. Another reason to be interested is some research I encountered recently on “outliers” in reasoning systems. Apparently there is a class of reasoning systems that simply “fall over” if they encounter a concept they recognize (or “think” they do) only to find it has some property (what makes it an “outlier”) that they don’t recognize. Seems rather fragile to me but I haven’t finished running it to ground. Curious how these methods and tools handle the “outlier” issue.

SPEAKER BIO:

Chimezie is a senior research associate in the Clinical Investigations Department of the Case Western Reserve University School of Medicine where he is responsible for managing, developing, and implementing Clinical and Translational Science Collaborative (CTSC) projects as well as clinical, biomedical, and administrative informatics projects for the Case Comprehensive Cancer Center.

His research interests are in applied ontology, knowledge representation, content repository infrastructure, and medical informatics. He has a BS in computer engineering from the University of Illinois and is a part-time PhD student in the Case Western School of Engineering. He most recently appeared as a guest editor in IEEE Internet Computing’s special issue on Personal Health Records in the August 2011 edition.

DETAILS:

——————————————————-
To join the online meeting (Now from mobile devices!)
——————————————————-
1. Go to https://stanford.webex.com/stanford/j.php?ED=107799137&UID=0&PW=NNjE3OWYzODk3&RT=MiM0
2. If requested, enter your name and email address.
3. If a password is required, enter the meeting password: ncbo
4. Click “Join”.

——————————————————-
To join the audio conference only
——————————————————-
To receive a call back, provide your phone number when you join the meeting, or call the number below and enter the access code.
Call-in toll number (US/Canada): 1-650-429-3300
Global call-in numbers: https://stanford.webex.com/stanford/globalcallin.php?serviceType=MC&ED=107799137&tollFree=0

Access code: 926 719 478

September 27, 2011

A Faster LZ77-Based Index

Filed under: Bioinformatics,Biomedical,Indexing — Patrick Durusau @ 7:19 am

A Faster LZ77-Based Index by Travis Gagie and Pawel Gawrychowski.

Abstract:

Suppose we are given an AVL-grammar with $r$ rules for a string $S[1..n]$ whose LZ77 parse consists of $z$ phrases. Then we can add $O(z \log \log z)$ words and obtain a compressed self-index for $S$ such that, given a pattern $P[1..m]$, we can list the occurrences of $P$ in $S$ in $O(m^2 + (m + \mathrm{occ}) \log \log n)$ time.

Not the best abstract I have ever read. At least in terms of attracting the most likely audience to be interested.

I would have started with: “Indexing of genomes, which are 99.9% the same, can be improved in terms of searching, response times and reporting of secondary occurrences.” Then follow with the technical description of the contribution. Don’t make people work for a reason to read the paper.

Any advancement in indexing, but particularly in an area like genomics, is important to topic maps.
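To ground the abstract’s terminology, here is a toy Python computation of an LZ77-style parse, where each phrase is the longest previously seen prefix of the remainder plus one fresh character. The phrase count is the $z$ in the abstract, and it stays small for repetitive strings like near-identical genomes. (A naive, non-overlapping variant for illustration only; the paper’s construction is far more sophisticated.)

def lz77_parse(s):
    """Greedy parse: each phrase = longest earlier-occurring prefix + 1 char."""
    phrases, i = [], 0
    while i < len(s):
        length = 0
        # extend while s[i:i+length+1] also occurs entirely before position i
        while i + length < len(s) and s.find(s[i:i + length + 1], 0, i) != -1:
            length += 1
        phrases.append(s[i:i + length + 1])   # slice clamps at end of string
        i += length + 1
    return phrases

genome_like = "ACGTACGTACGTTTACGTACGT"
parse = lz77_parse(genome_like)
print(len(parse), parse)    # z = number of phrases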


Update: See the updated version of this paper: A Faster Grammar-Based Self-Index.

September 23, 2011

Top Scoring Pairs for Feature Selection in Machine Learning and Applications to Cancer Outcome Prediction

Filed under: Bioinformatics,Biomedical,Classifier,Machine Learning,Prediction — Patrick Durusau @ 6:15 pm

Top Scoring Pairs for Feature Selection in Machine Learning and Applications to Cancer Outcome Prediction by Ping Shi, Surajit Ray, Qifu Zhu and Mark A Kon.

BMC Bioinformatics 2011, 12:375 doi:10.1186/1471-2105-12-375 Published: 23 September 2011

Abstract:

Background

The widely used k top scoring pair (k-TSP) algorithm is a simple yet powerful parameter-free classifier. It owes its success in many cancer microarray datasets to an effective feature selection algorithm that is based on relative expression ordering of gene pairs. However, its general robustness does not extend to some difficult datasets, such as those involving cancer outcome prediction, which may be due to the relatively simple voting scheme used by the classifier. We believe that the performance can be enhanced by separating its effective feature selection component and combining it with a powerful classifier such as the support vector machine (SVM). More generally the top scoring pairs generated by the k-TSP ranking algorithm can be used as a dimensionally reduced subspace for other machine learning classifiers.

Results

We developed an approach integrating the k-TSP ranking algorithm (TSP) with other machine learning methods, allowing combination of the computationally efficient, multivariate feature ranking of k-TSP with multivariate classifiers such as SVM. We evaluated this hybrid scheme (k-TSP+SVM) in a range of simulated datasets with known data structures. As compared with other feature selection methods, such as a univariate method similar to Fisher’s discriminant criterion (Fisher), or a recursive feature elimination embedded in SVM (RFE), TSP is increasingly more effective than the other two methods as the informative genes become progressively more correlated, which is demonstrated both in terms of the classification performance and the ability to recover true informative genes. We also applied this hybrid scheme to four cancer prognosis datasets, in which k-TSP+SVM outperforms k-TSP classifier in all datasets, and achieves either comparable or superior performance to that using SVM alone. In concurrence with what is observed in simulation, TSP appears to be a better feature selector than Fisher and RFE in some of the cancer datasets.

Conclusions

The k-TSP ranking algorithm can be used as a computationally efficient, multivariate filter method for feature selection in machine learning. SVM in combination with k-TSP ranking algorithm outperforms k-TSP and SVM alone in simulated datasets and in some cancer prognosis datasets. Simulation studies suggest that as a feature selector, it is better tuned to certain data characteristics, i.e. correlations among informative genes, which is potentially interesting as an alternative feature ranking method in pathway analysis.
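To illustrate the hybrid scheme on synthetic data, here is a compact sketch: score every gene pair by the TSP criterion (the difference between classes in the probability that gene i is expressed below gene j), keep the top k pairs, and hand the binary pair-order indicators to an SVM. This is my reading of the method, not the authors’ implementation; the data and parameter choices are invented.

import numpy as np
from itertools import combinations
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 20))          # 80 samples x 20 genes (synthetic)
y = rng.integers(0, 2, size=80)
X[y == 1, 0] += 1.5                    # make the (0, 1) pair informative
X[y == 1, 1] -= 1.5

def tsp_scores(X, y):
    """TSP criterion: |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|."""
    scores = {}
    for i, j in combinations(range(X.shape[1]), 2):
        p0 = np.mean(X[y == 0, i] < X[y == 0, j])
        p1 = np.mean(X[y == 1, i] < X[y == 1, j])
        scores[(i, j)] = abs(p0 - p1)
    return scores

scores = tsp_scores(X, y)
top_pairs = sorted(scores, key=scores.get, reverse=True)[:5]

# dimensionally reduced subspace: one order-indicator feature per top pair
Z = np.column_stack([(X[:, i] < X[:, j]).astype(float) for i, j in top_pairs])
clf = SVC(kernel="linear").fit(Z, y)
print(top_pairs, clf.score(Z, y))      # training accuracy, for illustration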

Knowing the tools that are already in use in bioinformatics will help you design topic map applications of interest to those in that field. And this is a very nice combination of methods to study on its own.

September 17, 2011

Got Hadoop?

Filed under: Bioinformatics,Biomedical,Hadoop — Patrick Durusau @ 8:12 pm

Got Hadoop?

This is going to require free registration at Genomeweb but I think it will be worth it. (Genomeweb also offers premium ($) content but I haven’t tried any of it, yet.)

Nice overview of Hadoop in genome research.

Annoying in that it lists the following projects, sans hyperlinks. I have supplied the project listing with hyperlinks, just in case you are interested in Hadoop and genome research.

Crossbow: Whole genome resequencing analysis; SNP genotyping from short reads
Contrail: De novo assembly from short sequencing reads
Myrna: Ultrafast short read alignment and differential gene expression from large RNA-seq datasets
PeakRanger: Cloud-enabled peak caller for ChIP-seq data
Quake: Quality-aware detection and sequencing error correction tool
BlastReduce: High-performance short read mapping (superseded by CloudBurst)
CloudBLAST*: Hadoop implementation of NCBI’s Blast
MrsRF: Algorithm for analyzing large evolutionary trees

*CloudBLAST was the only project without a webpage or similar source of information. This is a paper, perhaps the original paper on the technique. Searching for any of these techniques reveals a wealth of material on using Hadoop in bioinformatics.

Topic maps can capture your path through data (think of bread crumbs or string). So when today you think, “I should have gone left, rather than right”, you can retrace your steps and take another path. Try that with a Google search. If you are lucky, you may get the same ads. 😉

You can also share your bread crumbs or string with others, but that is a story for another time.

September 8, 2011

Bioportal 3.2

Filed under: Bioinformatics,Biomedical,Ontology — Patrick Durusau @ 5:50 pm

Bioportal 3.2

From the announcement:

The National Center for Biomedical Ontology is pleased to announce the release of BioPortal 3.2.

New features include updates to the Web interface and Web services:

Added Ontology Recommender feature, http://bioportal.bioontology.org/recommender
Added support for access control for viewing ontologies
Added link to subscribe to BioPortal Notes emails
Synchronized “Jump To” feature with ontology parsing and display
Added documentation on Ontology Groups
Annotator Web service – disabled use of “longest only” parameter when also selecting “ontologies to expand” parameter
Removed the metric “Number of classes without an author”
Handling of obsolete terms, part 1 – term name is grayed out and element is returned in Web service response for obsolete terms from OBO and RRF ontologies. This feature will be extended to cover OWL ontologies in a subsequent release.

Bug Fix

Fixed calculation of “Classes with no definition” metric
Added re-direct from old BioPortal URL format to new URL format to provide working links from archived search results

Firefox Extension for NCBO API Key:

To make it easier to test Web service calls from your browser, we have released the NCBO API Key Firefox Extension. This extension will automatically add your API Key to NCBO REST URLs any time you visit them in Firefox. The extension is available at Mozilla’s Add-On site. To use the extension, follow the installation directions, restart Firefox, and add your API Key into the “Options” dialog menu on the Add-Ons management screen. After that, the extension will automatically append your stored API Key any time you visit http://rest.bioontology.org.

Upcoming software license change:

The next release of NCBO software will be under the two-clause BSD license rather than under the currently used three-clause BSD license. This change should not affect anyone’s use of NCBO software and this change is to a less restrictive license. More information about these licenses is available at the site: http://www.opensource.org/licenses. Please contact support at bioontology.org with any questions concerning this change.

Even if you aren’t active in the bioontology area, you need to spend some time with this site.

September 6, 2011

Sage Bionetworks Synapse Project – Webinar – Weds. 7 Sept. 2011

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 7:02 pm

Sage Bionetworks Synapse Project – Webinar – Weds. 7 Sept. 2011

Call-in Details:

——————————————————-
To join the online meeting (Now from mobile devices!)
——————————————————-
1. Go to https://stanford.webex.com/stanford/j.php?ED=107799137&UID=0&PW=NNjE3OWYzODk3&RT=MiM0
2. If requested, enter your name and email address.
3. If a password is required, enter the meeting password: ncbo
4. Click “Join”.

——————————————————-
To join the audio conference only
——————————————————-
To receive a call back, provide your phone number when you join the meeting, or call the number below and enter the access code.
Call-in toll number (US/Canada): 1-650-429-3300
Global call-in numbers: https://stanford.webex.com/stanford/globalcallin.php?serviceType=MC&ED=107799137&tollFree=0

Access code: 926 719 478

Abstract:

The recent exponential growth of biological “omics” data has occurred concurrently with a decline in the number of New Molecular Entities approved by the FDA, proving that biological research productivity does not scale with biological data generation, and that the analysis and interpretation of genomic data is a bottleneck in the development of new treatments. Sage Bionetworks’ mission is to catalyze a cultural transition from the traditional single lab, single-company, and single-therapy R&D paradigm to a model with broad precompetitive collaboration on the analysis of large scale data in medical sciences. Part of Sage’s solution is Synapse, a platform for open, reproducible data-driven science, which will support the reusability of information facilitated by ontology-based services and applications directed at scientific researchers and data curators. Sage Bionetworks is actively pursuing the acquisition, curation, statistical quality control, and hosting of datasets that integrate both clinical phenotype and genomic data along with an intermediate molecular layer such as gene expression or proteomic data. We expect hosting these sorts of unique, integrative, high value datasets in the public domain on Synapse will seed a variety of analytical approaches to drive new treatments based on better understanding of disease states and the biological effects of existing drugs. In this webinar, Dr. Michael Kellen, Director of Technology at Sage Bionetworks, will provide a demonstration of an alpha version of the Synapse platform, and discuss its application to clinical science.

An interesting claim about the decline in the number of New Molecular Entities (NMEs) approved by the FDA; see: NMEs approved by CDER. Approvals are, on average, about the same. But then applications for NMEs have to be filed in order to be approved.

Just for background reading, you might want to look at: New Chemical Entity over at Wikipedia.

Or, The Scope of New Chemical Entity Exclusivity and FDA’s “Umbrella” Exclusivity Policy

I don’t disagree that better data analysis tools are needed but remain puzzled what the FDA approval rate for NMEs has to do with the problem.

August 22, 2011

Bio-recipes (Bioinformatics recipes) in Darwin

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 7:43 pm

Bio-recipes (Bioinformatics recipes) in Darwin

If you are working on topic maps and bioinformatics, you are likely to find this a useful resource.

From the webpage:

Bio-recipes are a collection of Darwin example programs. They show how to solve standard problems in Bioinformatics. Each bio-recipe consists of an introduction, explanations, graphs, figures, and most importantly, Darwin commands (the input commands and the output that they produce) that solve the given problem.

Darwin is an interactive language of the same lineage as Maple designed to solve problems in Bioinformatics. It relies on a simple language for the interactive user, plus the infrastructure necessary for writing object oriented libraries, plus very efficient primitive operations. The primitive operations of Darwin are the most common and time consuming operations typical of bioinformatics, including linear algebra operations.

The reasons behind this particular format are the following.

  1. It is much easier to understand an algorithm or a procedure or even a theorem, when it is illustrated with a running example.
  2. The procedures, as written, may be run on different data and hence serve a useful purpose.
  3. It is an order of magnitude easier to modify a correct, existing program, than to write a new one from scratch. This is particularly true for non-computer scientists.
  4. The full examples show some features of the language and of the system that may not be known to the casual user of Darwin, hence they serve a tutorial purpose.

BTW, see also:

DARWIN – A Genetic Algorithm Programming Language

The Darwin Manual

August 18, 2011

BMC Bioinformatics

Filed under: Bioinformatics,Biomedical,Clustering — Patrick Durusau @ 6:49 pm

BMC Bioinformatics

From the webpage:

BMC Bioinformatics is an open access journal publishing original peer-reviewed research articles in all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics (ISSN 1471-2105) is indexed/tracked/covered by PubMed, MEDLINE, BIOSIS, CAS, EMBASE, Scopus, ACM, CABI, Thomson Reuters (ISI) and Google Scholar.

Let me give you a sample of what you will find here:

MINE: Module Identification in Networks by Kahn Rhrissorrakrai and Kristin C Gunsalus. BMC Bioinformatics 2011, 12:192 doi:10.1186/1471-2105-12-192.

Abstract:

Graphical models of network associations are useful for both visualizing and integrating multiple types of association data. Identifying modules, or groups of functionally related gene products, is an important challenge in analyzing biological networks. However, existing tools to identify modules are insufficient when applied to dense networks of experimentally derived interaction data. To address this problem, we have developed an agglomerative clustering method that is able to identify highly modular sets of gene products within highly interconnected molecular interaction networks.
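Not MINE itself, but a minimal illustration of the agglomerative idea: treat 1 − adjacency as a pairwise distance, cluster hierarchically, and cut the tree into modules. The toy network and cut threshold are invented for the example.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# toy interaction network: two dense modules (0-2, 3-5) joined by a weak edge
A = np.array([
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
])

D = squareform(1.0 - A, checks=False)   # 1 - adjacency as pairwise distance
tree = linkage(D, method="average")     # agglomerative (average linkage)
modules = fcluster(tree, t=0.75, criterion="distance")
print(modules)    # two modules: nodes 0-2 together, nodes 3-5 together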

Medicine isn’t my field by profession (although I enjoy reading about it) but it doesn’t take much to see the applicability of an “agglomerative clustering method” to other highly interconnected networks.

Reading across domain-specific IR publications can help keep you from re-inventing the wheel, or perhaps spark an idea for a better wheel of your own making.

August 17, 2011

Virtual Cell Software Repository

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 6:49 pm

Virtual Cell Software Repository

From the webpage:

Developing large volume multi-scale systems dynamics interpretation technology is a very important source for making virtual cell application systems. Also, this technology is focused on the core research topics in a post-genome era in order to maintain national competitive power. It is a new analysis technology which can analyze multi-scale systems, from the nano level to the physiological level, at the system level. Therefore, by using excellent information technology and supercomputing power, we can hold a dominant position in large volume multi-scale systems dynamics interpretation technology. In order to develop independent technology, we need to research a field of study which has not been well known in bio-system informatics technology, like large volume multi-scale systems dynamics interpretation technology.

The purpose of virtual cell application systems is developing the analysis technology and service which can model bio application circuits based on super computing technology. For success of virtual cell application systems based on super computing power, we have researched large volume multi-scale systems dynamics technology as a core sub technology.

  • Developing analysis and modeling technology of multi-scale convergence information from nano level to physiological level
  • Developing protein structure modeling algorithm using multi-scale bio information
  • Developing quality and quantity character analysis technology of multi-scale networks
  • Developing protein modification search algorithm
  • Developing large volume multi-scale systems dynamics interpretation technology interpreting possible circumstances in complex parameter spaces

Amazing set of resources available here:

PSExplorer: Parameter Space Explorer

Mathematical models of biological systems often have a large number of parameters whose combinational variations can yield distinct qualitative behaviors. Since it is intractable to examine all possible combinations of parameters for nontrivial biological pathways, a systematic, computational way to explore the parameter space is required so that the distinct dynamic behaviors of a given pathway can be estimated.

We present PSExplorer, an efficient computational tool to explore the high dimensional parameter space of computational models for identifying qualitative behaviors and key parameters. The software supports input models in SBML format. It provides a friendly graphical user interface allowing users to vary model parameters and perform time-course simulations with ease. Various graphical plotting features help users analyze the model dynamics conveniently. Its output is a tree structure that encapsulates the parameter space partitioning results in a form that is easy to visualize and provides users with additional information about important parameters and sub-regions with robust behaviors.
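Not PSExplorer (a GUI tool for SBML models), but a sketch of the underlying idea: sweep a model’s parameters over a grid and label each point with the qualitative behavior it produces, yielding a crude partition of the parameter space. The toy discrete logistic model and the thresholds are invented for illustration.

def qualitative_behavior(r, k, x0=0.1, steps=200):
    """Iterate a discrete logistic model and classify the outcome."""
    x = x0
    for _ in range(steps):
        x = x + r * x * (1 - x / k)
        if x != x or abs(x) > 1e6:    # NaN or blow-up
            return "unstable"
    return "persists" if x > 1e-3 else "dies out"

# crude grid partition of (r, k) space
for r in (-0.5, 0.5, 2.9):
    for k in (0.5, 1.0):
        print(f"r={r:+.1f}, k={k}: {qualitative_behavior(r, k)}")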

MONET: MOdularized NETwork learning

Although gene expression data has been continuously accumulated and meta-analysis approaches have been developed to integrate independent expression profiles into larger datasets, the amount of information is still insufficient to infer large scale genetic networks. In addition, global optimization such as Bayesian network inference, one of the most representative techniques for genetic network inference, requires tremendous computational load far beyond the capacity of moderate workstations.

MONET is a Cytoscape plugin to infer genome-scale networks from gene expression profiles. It alleviates the shortage of information by incorporating pre-existing annotations. The current version of MONET utilizes thousands of parallel computational cores in the supercomputing center in KISTI, Korea, to cope with the computational requirement for large scale genetic network inference.

RBSDesigner

RBS Designer was developed to computationally design synthetic ribosome binding sites (RBS) to control gene expression levels. Generally, transcription processes are the major target for gene expression control; however, without considering translation processes, the control could lead to unexpected expression results, since translation efficiency is highly affected by nucleotide sequences near the RBS, such as coding sequences, leading to distortion of the RBS secondary structure. Such problems obscure the intuitive design of RBS nucleotides with a desired level of protein expression. We developed RBSDesigner based on a mathematical model of translation initiation to design synthetic ribosome binding sites that yield a desired level of expression of user-specified coding sequences.

SBN simulator: Switching Boolean Networks Simulator

Switching Boolean Networks Simulator (SBNsimulator) was developed to simulate large-scale signaling networks. Boolean Networks are widely used in modeling signaling networks because of their straightforwardness, robustness, and compatibility with qualitative data. Signaling networks are not yet completely known in biology. Because of this, there are gaps between biological reality and modeling, such as inhibitor-only or activator-only signaling networks. The synchronous update algorithm in threshold Boolean networks has a limitation in that it cannot sample differences in the speed of signal propagation. To overcome these limitations, the modeling anomaly and the limitation of the synchronous update algorithm, we developed SBNsimulator. It can simulate how each node affects a target node. Therefore, it can say which node is important for the signaling network.

MKEM: Multi-level Knowledge Emergence Model

Since Swanson proposed the Undiscovered Public Knowledge (UPK) model, there have been many approaches to uncover UPK by mining the biomedical literature. These earlier works, however, required substantial manual intervention to reduce the number of possible connections and are mainly applied to disease-effect relation. With the advancement in biomedical science, it has become imperative to extract and combine information from multiple disjoint researches, studies and articles to infer new hypotheses and expand knowledge. We propose MKEM, a Multi-level Knowledge Emergence Model, to discover implicit relationships using Natural Language Processing techniques such as Link Grammar and Ontologies such as Unified Medical Language System (UMLS) MetaMap. The contribution of MKEM is as follows: First, we propose a flexible knowledge emergence model to extract implicit relationships across different levels such as molecular level for gene and protein and Phenomic level for disease and treatment. Second, we employ MetaMap for tagging biological concepts. Third, we provide an empirical and systematic approach to discover novel relationships.

The system consists of two parts, the tagger and the extractor (which may require compilation).

A sentence of interest is given to the tagger, which then proceeds to create rule sets. The tagger stores these in a folder by the name of “ruleList”. These rule sets are then given to the extractor by copying this folder to the extractor directory.

I blogged about an article on this project at: MKEM: a Multi-level Knowledge Emergence Model for mining undiscovered public knowledge.

Biodata Mining

Filed under: Bioinformatics,Biomedical,Data Mining — Patrick Durusau @ 6:47 pm

Biodata Mining

From the webpage:

BioData Mining is an open access, peer reviewed, online journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data.

What you would have seen since 1 July 2011:

An R Package Implementation of Multifactor Dimensionality Reduction

Hill-Climbing Search and Diversification within an Evolutionary Approach to Protein Structure Prediction

Detection of putative new mutacins by bioinformatic analysis using available web tools

Evolving hard problems: Generating human genetics datasets with a complex etiology

Taxon ordering in phylogenetic trees by means of evolutionary algorithms

Enjoy!

August 4, 2011

NCBI Handbook

Filed under: Bioinformatics,Biomedical — Patrick Durusau @ 6:20 pm

NCBI Handbook

From the website:

Bioinformatics consists of a computational approach to biomedical information management and analysis. It is being used increasingly as a component of research within both academic and industrial settings and is becoming integrated into both undergraduate and postgraduate curricula. The new generation of biology graduates is emerging with experience in using bioinformatics resources and, in some cases, programming skills.

The National Center for Biotechnology Information (NCBI) is one of the world’s premier Web sites for biomedical and bioinformatics research. Based within the National Library of Medicine at the National Institutes of Health, USA, the NCBI hosts many databases used by biomedical and research professionals. The services include PubMed, the bibliographic database; GenBank, the nucleotide sequence database; and the BLAST algorithm for sequence comparison, among many others.

Although each NCBI resource has online help documentation associated with it, there is no cohesive approach to describing the databases and search engines, nor any significant information on how the databases work or how they can be leveraged, for bioinformatics research on a larger scale. The NCBI Handbook is designed to address this information gap.

An extraordinary resource for learning about bioinformatics information sources.

July 31, 2011

Journal of Biomedical Semantics

Filed under: Bioinformatics,Biomedical,Searching,Semantics — Patrick Durusau @ 7:49 pm

Journal of Biomedical Semantics

From the webpage:

Journal of Biomedical Semantics addresses issues of semantic enrichment and semantic processing in the biomedical domain. The scope of the journal covers two main areas:

Infrastructure for biomedical semantics: focusing on semantic resources and repositories, meta-data management and resource description, knowledge representation and semantic frameworks, the Biomedical Semantic Web, and semantic interoperability.

Semantic mining, annotation, and analysis: focusing on approaches and applications of semantic resources; and tools for investigation, reasoning, prediction, and discoveries in biomedicine.

As of 31 July 2011, here are the titles of the “latest” articles:

A shortest-path graph kernel for estimating gene product semantic similarity Alvarez MA, Qi X and Yan C Journal of Biomedical Semantics 2011, 2:3 (29 July 2011)

Semantic validation of the use of SNOMED CT in HL7 clinical documents Heymans S, McKennirey M and Phillips J Journal of Biomedical Semantics 2011, 2:2 (15 July 2011)

Protein interaction sentence detection using multiple semantic kernels Polajnar T, Damoulas T and Girolami M Journal of Biomedical Semantics 2011, 2:1 (14 May 2011)

Foundations for a realist ontology of mental disease Ceusters W and Smith B Journal of Biomedical Semantics 2010, 1:10 (9 December 2010)

Simple tricks for improving pattern-based information extraction from the biomedical literature Nguyen QL, Tikk D and Leser U Journal of Biomedical Semantics 2010, 1:9 (24 September 2010)

The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows Katayama T, Arakawa K, Nakao M, Ono K, Aoki-Kinoshita KF, Yamamoto Y, Yamaguchi A, Kawashima S et al. Journal of Biomedical Semantics 2010, 1:8 (21 August 2010)

Oh, did I mention this is an open access journal?

July 18, 2011

The Future of Hadoop in Bioinformatics

Filed under: BigData,Bioinformatics,Hadoop,Heterogeneous Data — Patrick Durusau @ 6:44 pm

The Future of Hadoop in Bioinformatics: Hadoop and its ecosystem including MapReduce are the dominant open source Big Data solution by Bob Gourley.

From the post:

Earlier, I wrote on the use of Hadoop in the exciting, evolving field of Bioinformatics. I have since had the pleasure of speaking with Dr. Ron Taylor of Pacific Northwest National Laboratory, the author of “An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics“, on what’s changed in the half-year since its publication and what’s to come.

As Dr. Taylor expected, Hadoop and its “ecosystem” including MapReduce are the dominant open source Big Data solution for next generation DNA sequencing analysis. This is currently the sub-field generating the most data and requiring the most computationally expensive analysis. For example, de novo assembly pieces together tens of millions of short reads (which may be 50 bases long on ABI SOLiD sequencers). To do so, every read needs to be compared to the others, which scales in proportion to n log n, meaning, even assuming reads that are 100 base pairs in length and a human genome of 3 billion pairs, analyzing an entire human genome will take 7.5 times longer than if it scaled linearly. By dividing the task up into a Hadoop cluster, the analysis will be faster and, unlike other high performance computing alternatives, it can run on regular commodity servers that are much cheaper than custom supercomputers. This, combined with the savings from using open source software, ease of use due to seamless scaling, and the strength of the Hadoop community make Hadoop and related software the parallelization solution of choice in next generation sequencing.

In other areas, however, traditional HPC is still more common and Hadoop has not yet caught on. Dr. Taylor believes that in the next year to 18 months, this will change due to the following trends:
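As a sanity check on the quoted 7.5× figure: n log n versus linear scaling differs by a factor of log n, and with a 3-gigabase genome cut into 100-base reads the factor works out if the log is taken base 10.

import math

n = 3e9 / 100                     # number of 100-base reads in a 3 Gb genome
print(round(math.log10(n), 2))    # ~7.48, i.e. the "7.5 times longer"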

So, over the next year to eighteen months, what do you see as the evolution of topic map software and services?

Or what problems do you see becoming apparent in bioinformatics or other areas (like the Department of Energy’s knowledgebase) that will require topic maps?

(More on the DOE project later this week.)

July 6, 2011

The Neo4j Rest API. My Notebook

Filed under: Bioinformatics,Biomedical,Java,Neo4j — Patrick Durusau @ 2:14 pm

The Neo4j Rest API. My Notebook

From the post:

Neo4j is an open-source graph engine implemented in Java. This post is my notebook for the Neo4J-server, a server combining a REST API and a webadmin application into a single stand-alone server.

Nothing new in this Neo4j summary but Pierre Lindenbaum profiles himself: “PhD in Virology, bioinformatics, genetics, science, geek, java.”

Someone worth watching in the Neo4j/topic map universe.

June 28, 2011

Big Data Genomics – How to efficiently store and retrieve mutation data

Filed under: Bioinformatics,Biomedical,Cassandra — Patrick Durusau @ 9:49 am

Big Data Genomics – How to efficiently store and retrieve mutation data by David Suvee.

About the post:

This blog post is the first one in a series of articles that describe the use of NoSQL databases to efficiently store and retrieve mutation data. Part one introduces the notion of mutation data and describes the conceptual use of the Cassandra NoSQL datastore.

From the post:

The only way to learn a new technology is by putting it into practice. Just try to find a suitable use case in your immediate working environment and give it a go. In my case, it was trying to efficiently store and retrieve mutation data through a variety of NoSQL data stores, including Cassandra, MongoDB and Neo4J.
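As a taste of what such a store might look like, here is a hedged sketch using the DataStax Python driver and modern CQL (the series itself predates CQL tables): mutations are keyed by gene, so all variants of a gene land in one partition and cluster by position. Table and column names are invented for illustration, not David Suvee’s schema.

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS genomics
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS genomics.mutations (
        gene text, position bigint, sample_id text, ref text, alt text,
        PRIMARY KEY (gene, position, sample_id)
    )
""")
session.execute(
    "INSERT INTO genomics.mutations (gene, position, sample_id, ref, alt) "
    "VALUES (%s, %s, %s, %s, %s)",
    ("TP53", 7577120, "sample-001", "C", "T"),
)
# all stored mutations for a gene, clustered by position within the partition
for row in session.execute(
        "SELECT position, ref, alt, sample_id FROM genomics.mutations "
        "WHERE gene = %s", ("TP53",)):
    print(row.position, row.ref, ">", row.alt, row.sample_id)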

Promises to be an interesting series of posts that focus on a common data set and problem!

June 23, 2011

Bio4j – as an AWS snapshot

Filed under: Bio4j,Bioinformatics,Biomedical — Patrick Durusau @ 1:54 pm

Bio4j current release now available as an AWS snapshot

From the post:

For those using AWS (or willing to…) I just created a public snapshot containing the latest version of the Bio4j DB.

The snapshot details are the following:

  • Snapshot id: snap-25192d4c
  • Snapshot region: EU West (Ireland)
  • Snapshot size: 90 GB

The whole DB is under the folder ‘bio4jdb’.
In order to use it, just create a Bio4jManager instance and start navigating the graph!

Very cool!

June 22, 2011

DocumentLens

Filed under: Bioinformatics,Biomedical,DocumentLens,Navigation — Patrick Durusau @ 6:37 pm

DocumentLens – A Revolution In How Researchers Access Information & Colleagues

From the post:

Keeping up with the flood of scientific information has been challenging… Spotting patterns and extracting useful information has been even harder. DocumentLens™ has just made it easier to gain insightful knowledge from information and to share ideas with collaborators.

Praxeon, Inc., the award-winning Boston-based leader in delivering knowledge solutions for the Healthcare and Life Science communities, today announced the launch of DocumentLens™. Their cloud-based web application helps scientific researchers deal with the ever increasing deluge of online and electronic data and information from peer-reviewed journals, regulatory sites, patents and proprietary sources. DocumentLens provides an easy-to-utilize environment to enrich discovery, enhance idea generation, shorten the investigation time, improve productivity and engage collaboration.

“One of the most challenging problems researchers face is collecting, integrating and understanding new information. Keeping up with peer-reviewed journals, regulatory sites, patents and proprietary sources, even in a single area of research, is time consuming. But failure to keep up with information from many different sources results in knowledge gaps and lost opportunities,” stated Dr. Dennis Underwood, Praxeon CEO.

“DocumentLens is a web-based tool that enables you to ask the research question you want to ask – just as you would ask a colleague,” Underwood went on to say. “You can also dive deeper into research articles, explore the content and ideas using DocumentLens and integrate them with sources that you trust and rely on. DocumentLens takes you not only to the relevant documents, but to the most relevant sections saving an immense amount of time and effort. Our DocumentLens Navigators open up your content, using images and figures, chemistry and important topics. Storylines provide a place to accumulate and share insights with colleagues.”

Praxeon has created www.documentlens.com, a website devoted to the new application that contains background on the use of the software, the Eye of the Lens blog (http://www.documentlens.com/blog), and a live version of DocumentLens™ for visitors to try out free-of-charge to see for themselves firsthand the value of the application.

OK, so I do one of the sandbox pre-composed queries: “What is the incidence and prevalence of dementia?”

and DocumentLens reports back that page 15 of a document has relevant information (note, not the entire document but a particular page), with the highlighted material including:

conducting a collaborative, multicentre trial in FTLD. Such a collaborative effort will certainly be necessary to recruit the cohort of over 200 FTLD patients per trial that may be needed to demonstrate treatment effects in FTLD.[194]

3. Ratnavalli E, Brayne C, Dawson K, et al. The prevalence of frontotemporal dementia. Neurology 2002;58:1615–21. [PubMed: 12058088]

4. Mercy L, Hodges JR, Dawson K, et al. Incidence of early-onset dementias in Cambridgeshire,

8. Gislason TB, Sjogren M, Larsson L, et al. The prevalence of frontal variant frontotemporal dementia and the frontal lobe syndrome in a population based sample of 85 year olds. J Neurol Neurosurg

The first text block has no obvious (or other) relevance to the question of incidence or prevalence of dementia.

The incomplete marking of citations 4 and 8 occurs for no apparent reason.

Like any indexing resource, its value depends on the skill of the indexers.

There are the usual issues: how do I reliably share information with other DocumentLens or even non-DocumentLens users? Can I and other users create interoperable files in parallel? Do we need, or are we required to have, a common vocabulary? How do we integrate materials that use other vocabularies?

(Do send a note to the topic map naysayers. Product first, then start selling it to customers.)

June 13, 2011

Linking Science and Semantics… (webinar)
15 June 2011 – 10 AM PT (17:00 GMT)

Filed under: Bioinformatics,Biomedical,OWL,RDF,Semantics — Patrick Durusau @ 7:03 pm

Linking science and semantics with the Annotation Ontology and the SWAN Annotation Tool

Abstract:

The Annotation Ontology (AO) is an open ontology in OWL for annotating scientific documents on the web. AO supports both human and algorithmic content annotation. It enables “stand-off” or independent metadata anchored to specific positions in a web document by any one of several methods. In AO, the document may be annotated but is not required to be under update control of the annotator. AO contains a provenance model to support versioning, and a set model for specifying groups and containers of annotation.

The SWAN Annotation Tool, recently renamed DOMEO (Document Metadata Exchange Organizer), is an extensible web application enabling users to visually and efficiently create and share ontology-based stand-off annotation metadata on HTML or XML document targets, using the Annotation Ontology RDF model. The tool supports manual, fully automated, and semi-automated annotation with complete provenance records, as well as personal or community annotation with access authorization and control.
[AO] http://code.google.com/p/annotation-ontology

I’m interested in how “stand-off” annotation is being handled, being an overlapping markup person myself. Also curious how close it comes to HyTime-like mechanisms.
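For readers new to the term, here is a minimal sketch of the stand-off idea, assuming nothing about AO’s actual RDF vocabulary: the annotation lives outside the document and anchors to a span by prefix/exact/suffix context, so the target document itself is never modified.

from dataclasses import dataclass

@dataclass
class StandoffAnnotation:
    doc_url: str
    prefix: str      # text just before the anchored span
    exact: str       # the anchored span itself
    suffix: str      # text just after the anchored span
    body: str        # annotation content, e.g. an ontology term URI

def locate(annotation, document):
    """Re-find the anchored span; anchoring survives edits elsewhere."""
    needle = annotation.prefix + annotation.exact + annotation.suffix
    start = document.find(needle)
    return -1 if start == -1 else start + len(annotation.prefix)

doc = "BRCA1 mutations raise breast cancer risk."
ann = StandoffAnnotation("http://example.org/paper", "raise ",
                         "breast cancer", " risk",
                         "http://purl.org/obo/DOID_1612")
print(locate(ann, doc))  # character offset of the annotated span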

More after the webinar.
