Community conversations and a new package for full text by Scott Chamberlain and Karthik Ram.
rOpenSci announces that it is reopening its public Google list.
We encourage you to sign up and post ideas for packages, solicit feedback on new ideas, and most importantly find other collaborators who share your domain interests. We also plan to use the list to solicit feedback on some of the bigger rOpenSci projects early on in the development phase allowing our community to shape future direction and also collaborate where appropriate.
Among the work that is underway:
Through time we have been attempting to unify our R packages that interact with individual data sources into single packages that handle one use case. For example, spocc aims to create a single entry point to many different sources (currently 6) of species occurrence data, including GBIF, AntWeb, and others.
Another area we hope to simplify is acquiring text data, specifically text from scholarly journal articles. We call this R package fulltext. The goal of fulltext is to allow a single user interface to searching for and retrieving full text data from scholarly journal articles. Rather than learning a different interface for each data source, you can learn one interface, making your work easier. fulltext will likely only get you data, and make it easy to browse that data, and use it downstream for manipulation, analysis, and visualization.
We currently have R packages for a number of sources of scholarly article text, including for Public Library of Science (PLOS), Biomed Central (BMC), and eLife – which could all be included in fulltext. We can add more sources as they become available.
Instead of us rOpenSci core members planning out the whole package, we'd love to get the community involved at the beginning.
The effort to unify “individual data sources into single packages” sounds particularly ripe for enhancement with topic map based ideas.
This is not a plea for topic map syntax or modeling, although either would make a nice output option. The critical idea is to identify central subjects with key/value pairs, enabling robust identification of those subjects by later users.
Surface tokens with unexpressed contexts set hard boundaries on the usefulness and accuracy of search results. If we capture what is known to identify surface tokens, we enrich our world and the world of others.
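To make the key/value idea concrete, here is a minimal sketch in Python of how records from several sources might be merged when they share an identifying key/value pair. Everything in it is an assumption made for illustration: the Subject class, the choice of "doi" and "pmid" as identifying keys, the source names, and the sample values are all hypothetical, and none of it reflects how fulltext or spocc actually work.

```python
from dataclasses import dataclass, field

# Assumption: these keys are treated as identifying a subject.
# A real system would choose them per domain.
IDENTIFYING_KEYS = {"doi", "pmid"}


@dataclass
class Subject:
    """A subject described by key/value pairs gathered from one or more sources."""
    properties: dict = field(default_factory=dict)
    sources: set = field(default_factory=set)

    def identifiers(self):
        """Return the key/value pairs that serve to identify this subject."""
        return {(k, v) for k, v in self.properties.items() if k in IDENTIFYING_KEYS}


def same_subject(a, b):
    """Two records represent the same subject if they share an identifying pair."""
    return bool(a.identifiers() & b.identifiers())


def merge(records):
    """Merge records that share at least one identifying key/value pair."""
    merged = []
    for rec in records:
        combined = Subject(dict(rec.properties), set(rec.sources))
        changed = True
        # Repeat until stable: absorbing one record can add identifiers
        # that make the combined record match another one.
        while changed:
            changed = False
            for m in list(merged):
                if same_subject(m, combined):
                    for k, v in m.properties.items():
                        combined.properties.setdefault(k, v)
                    combined.sources |= m.sources
                    merged = [x for x in merged if x is not m]
                    changed = True
        merged.append(combined)
    return merged


# Hypothetical records, as if returned by three different data sources.
records = [
    Subject({"doi": "10.1234/example.1", "title": "An example article"}, {"source_a"}),
    Subject({"doi": "10.1234/example.1", "pmid": "00000001"}, {"source_b"}),
    Subject({"pmid": "00000001", "journal": "Example Journal"}, {"source_c"}),
    Subject({"doi": "10.1234/example.2", "title": "Another article"}, {"source_a"}),
]

merged = merge(records)
for subject in merged:
    print(sorted(subject.sources), subject.properties)
```

The third record carries no DOI at all, yet it still lands with the first two because the second record recorded both identifiers. That is the payoff of capturing what is known about a subject: a later user holding only one of the identifiers can still find the whole.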