Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 15, 2014

(String/text processing)++:…

Filed under: String Matching,Text Feature Extraction,Text Mining,Unicode — Patrick Durusau @ 2:49 pm

(String/text processing)++: stringi 0.2-3 released by Marek Gągolewski.

From the post:

A new release of the stringi package is available on CRAN (please wait a few days for Windows and OS X binary builds).

stringi is a package providing (but definitely not limiting to) replacements for nearly all the character string processing functions known from base R. While developing the package we had high performance and portability of its facilities in our minds.

Here is a very general list of the most important features available in the current version of stringi:

  • string searching:
    • with ICU (Java-like) regular expressions,
    • ICU USearch-based locale-aware string searching (quite slow, but working properly e.g. for non-Unicode normalized strings),
    • very fast, locale-independent byte-wise pattern matching;
  • joining and duplicating strings;
  • extracting and replacing substrings;
  • string trimming, padding, and text wrapping (e.g. with Knuth's dynamic word wrap algorithm);
  • text transliteration;
  • text collation (comparing, sorting);
  • text boundary analysis (e.g. for extracting individual words);
  • random string generation;
  • Unicode normalization;
  • character encoding conversion and detection;

and many more.
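An aside on why the quoted list separates byte-wise matching from the slower locale-aware search: the same visible text can be stored as different code point sequences, and a byte-wise search will miss the match unless both sides are normalized first. The sketch below illustrates that general Unicode behavior in Python with the standard `unicodedata` module. It is not stringi's API, and the helper names `bytewise_contains` and `normalized_contains` are made up for illustration.

```python
import unicodedata

# Two spellings of the same visible word "café":
composed = "caf\u00e9"      # 'é' as a single code point (U+00E9)
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent (U+0301)


def bytewise_contains(text, pattern):
    """Hypothetical helper: locale-independent search on raw UTF-8 bytes."""
    return pattern.encode("utf-8") in text.encode("utf-8")


def normalized_contains(text, pattern):
    """Hypothetical helper: normalize both sides to NFC, then search."""
    def nfc(s):
        return unicodedata.normalize("NFC", s)
    return nfc(pattern) in nfc(text)


text = "un " + decomposed + " noir"

# The byte sequences differ, so the byte-wise search misses the word.
print(bytewise_contains(text, composed))    # False
# After NFC normalization both spellings compare equal.
print(normalized_contains(text, composed))  # True
```

Normalizing up front is one way to keep the fast byte-wise path usable. A collation-based search, such as the ICU USearch option mentioned in the list, addresses the same problem at the cost of speed.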

Interesting, isn't it, how CS keeps circling back to strings?

Enjoy!
