Archive for the ‘Information Science’ Category

Threatening the President: A Signal/Noise Problem

Tuesday, October 18th, 2016

Even if you can’t remember why the pointy end of a pencil is important, you too can create national news.

This bit of noise reminded me of an incident when I was in high school, when a similar type of person bragged in a local bar about assassinating then-President Nixon*. That person was arrested and sentenced to several years in prison.

At the time I puzzled briefly over the waste of time and effort in such a prosecution and then promptly forgot it.

Until this incident with the overly “clever” Trump supporter.

To get us off on the same foot:

18 U.S. Code § 871 – Threats against President and successors to the Presidency

(a) Whoever knowingly and willfully deposits for conveyance in the mail or for a delivery from any post office or by any letter carrier any letter, paper, writing, print, missive, or document containing any threat to take the life of, to kidnap, or to inflict bodily harm upon the President of the United States, the President-elect, the Vice President or other officer next in the order of succession to the office of President of the United States, or the Vice President-elect, or knowingly and willfully otherwise makes any such threat against the President, President-elect, Vice President or other officer next in the order of succession to the office of President, or Vice President-elect, shall be fined under this title or imprisoned not more than five years, or both.

(b) The terms “President-elect” and “Vice President-elect” as used in this section shall mean such persons as are the apparent successful candidates for the offices of President and Vice President, respectively, as ascertained from the results of the general elections held to determine the electors of President and Vice President in accordance with title 3, United States Code, sections 1 and 2. The phrase “other officer next in the order of succession to the office of President” as used in this section shall mean the person next in the order of succession to act as President in accordance with title 3, United States Code, sections 19 and 20.

Commonplace threatening letters, calls, etc., aren’t documented for the public but President Barack Obama has a Wikipedia page devoted to the more significant ones: Assassination threats against Barack Obama.

Just as no one knows you are a dog on the internet, no one can tell by looking at a threat online if you are still learning how to use a pencil or are a more serious opponent.

Leaving to one side that a truly serious opponent allows actions to announce their presence or goal.

The treatment of even idle bar threats as serious is an attempt to improve the signal-to-noise ratio:

In analog and digital communications, signal-to-noise ratio, often written S/N or SNR, is a measure of signal strength relative to background noise. The ratio is usually measured in decibels (dB) using a signal-to-noise ratio formula. If the incoming signal strength in microvolts is Vs, and the noise level, also in microvolts, is Vn, then the signal-to-noise ratio, S/N, in decibels is given by the formula: S/N = 20 log10(Vs/Vn)

If Vs = Vn, then S/N = 0. In this situation, the signal borders on unreadable, because the noise level severely competes with it. In digital communications, this will probably cause a reduction in data speed because of frequent errors that require the source (transmitting) computer or terminal to resend some packets of data.
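The quoted formula is easy to check by hand. What follows is a minimal sketch in Python, not anything from the quoted source: the function name snr_db and the sample voltages are my own assumptions, picked only to illustrate the arithmetic.

```python
import math


def snr_db(v_signal, v_noise):
    """Signal-to-noise ratio in decibels: S/N = 20 log10(Vs/Vn).

    Both arguments are voltages in the same unit (e.g. microvolts).
    The name and the input check are illustrative assumptions.
    """
    if v_signal <= 0 or v_noise <= 0:
        raise ValueError("both voltages must be positive")
    return 20 * math.log10(v_signal / v_noise)


# Equal signal and noise: the 0 dB case the quote describes.
print(snr_db(5.0, 5.0))            # 0.0

# Signal ten times the noise.
print(snr_db(10.0, 1.0))           # 20.0

# Same signal, half the noise: roughly a 6 dB improvement.
print(round(snr_db(5.0, 2.5), 2))  # 6.02
```

The last line is the relevant one for this post: leave the signal alone, cut the noise in half, and the ratio improves by about 6 dB.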

I’m guessing the reasoning is the more threats that go unspoken, the less chaff the Secret Service has to winnow in order to uncover viable threats.

One assumes they discard physical mail with return addresses of prisons, mental hospitals, etc., or at most request notice of the release of such people from state custody.

Beyond that, they don’t appear to be too picky about credible threats, noting that in one case an unspecified “death ray” was going to be used against President Obama.

The EuroNews description of that case must be shared:

Two American men have been arrested and charged with building a remote-controlled X-ray machine intended for killing Muslims and other perceived enemies of the U.S.

Following a 15-month investigation launched in April 2012, Glenn Scott Crawford and Eric J. Feight are accused of developing the device, which the FBI has described as “mobile, remotely operated, radiation emitting and capable of killing human targets silently and from a distance with lethal doses of radiation”.

Sure, right. I will post a copy of the 67-page complaint, which uses terminology rather loosely, to say the least, in a day or so. Suffice it to say that the defendants never acquired a source for the needed radioactivity production.

That is on the order of having a complete nuclear bomb except for the nuclear material needed to make it into a nuclear bomb. You would be in more danger from the conventional explosive degrading than from the bomb as a nuclear weapon.

Those charged with defending public officials want to deter the making of threats, so as to improve the signal/noise ratio.

The goal of those attacking public officials is a signal/noise ratio of exactly 0.0.

Viewing threats from an information science perspective suggests various strategies for either side. (Another dividend of studying information science.)

*They did find a good picture of Nixon for the White House page. Doesn’t look as much like a weasel as he did in real life. Gimp/Photoshop you think?

Astrostatistics: The Re-Emergence of a Statistical Discipline

Thursday, January 2nd, 2014

Astrostatistics: The Re-Emergence of a Statistical Discipline by Joseph M. Hilbe.

From the post:

If statistics can be generically understood as the science of collecting and analyzing data for the purpose of classification and prediction and of attempting to quantify and understand the uncertainty inherent in phenomena underlying data, surely astrostatistics must be considered as one of the oldest, if not the oldest, applications of statistical science to the study of nature. Astrostatistics is the discipline dealing with the statistical analysis of astronomical and astrophysical data. It also has been understood by most researchers in the area to incorporate astroinformatics, which is the science of gathering and digitalizing astronomical data for the purpose of analysis.

I mentioned that astrostatistics is a very old discipline—if we accept the broad criterion I gave for how statistics can be understood. Egyptian and Babylonian priests who assiduously studied the motions of the sun, moon, planets, and stars as long ago as 1500 BCE classified and attempted to predict future events for the purpose of knowing when to plant, determining when a new year began, and so forth. However, their predictions were infused by the attempt to understand the effects of the celestial motions on human affairs (astrology). Later, Thales (d 546 BCE), the Ionian Greek reputed to be both the first philosopher and mathematician, apparently began to divorce mythology from scientific investigation. He is credited with predicting an eclipse in 585 BCE, which he allegedly based on studies made of previous eclipses from records kept by Egyptian priests.

A short but interesting review of the history of astrostatistics and its increasing importance as the rate of astronomical data collection continues to increase.

And a call for more inter-disciplinary work between astronomers, astrophysicists, statisticians and information scientists.

The ability to cross over tribal (disciplinary) boundaries could be eased by cross-disciplinary mappings.

Advances in Neural Information Processing Systems 26

Sunday, December 8th, 2013

Advances in Neural Information Processing Systems 26

The NIPS 2013 conference ended today.

All of the NIPS 2013 papers were posted today.

I count three hundred and sixty (360) papers.

From the NIPS Foundation homepage:

The Foundation: The Neural Information Processing Systems (NIPS) Foundation is a non-profit corporation whose purpose is to foster the exchange of research on neural information processing systems in their biological, technological, mathematical, and theoretical aspects. Neural information processing is a field which benefits from a combined view of biological, physical, mathematical, and computational sciences.

The primary focus of the NIPS Foundation is the presentation of a continuing series of professional meetings known as the Neural Information Processing Systems Conference, held over the years at various locations in the United States, Canada and Spain.

Enjoy the proceedings collection!

I first saw this in a tweet by Benoit Maison.

Information Extraction from the Internet

Saturday, August 24th, 2013

Information Extraction from the Internet by Nan Tang.

From the description at Amazon ($116.22):

As the Internet continues to become part of our lives, there now exists an overabundance of reliable information sources on this medium. The temporal and cognitive resources of human beings, however, do not change. “Information Extraction from the Internet” provides methods and tools for Web information extraction and retrieval. Success in this area will greatly enhance business processes and provide information seekers new tools that allow them to reduce their searching time and cost involvement. This book focuses on the latest approaches for Web content extraction, and analyzes the limitations of existing technology and solutions. “Information Extraction from the Internet” includes several interesting and popular topics that are being widely discussed in the area of information extraction: data sparsity and field-associated knowledge (Chapters 1–2), Web agent design and mining components (Chapters 3–4), extraction skills on various documents (Chapters 5–7), duplicate detection for music documents (Chapter 8), name disambiguation in digital libraries using Web information (Chapter 9), Web personalization and user-behavior issues (Chapters 10–11), and information retrieval case studies (Chapters 12–14). “Information Extraction from the Internet” is suitable for advanced undergraduate students and postgraduate students. It takes a practical approach rather than a conceptual approach. Moreover, it offers a truly reader-friendly way to get to the subject related to information extraction, making it the ideal resource for any student new to this subject, and providing a definitive guide to anyone in this vibrant and evolving discipline. This book is an invaluable companion for students, from their first encounter with the subject to more advanced studies, while the full-color artworks are designed to present the key concepts with simplicity, clarity, and consistency.

I discovered this volume while searching for the publisher of: On-demand Synonym Extraction Using Suffix Arrays.

As you can see from the description, it offers wide-ranging coverage of information extraction interests.

All of the chapters are free for downloading at the publisher’s site.

iConcepts Press has a number of books and periodicals you may find interesting.

Strata 2013

Wednesday, January 23rd, 2013

Strata 2013

Feb. 26-28, 2013
Santa Clara, CA

From the website:

The breadth and depth of expertise at Strata is unsurpassed—with over 120 speakers and 100 presentations and events, you’ll find solutions to your most pressing data issues. The conference program covers strategy, technology, and policy:

  • Data-driven Business: Solve some of today’s thorniest business problems with big data, new interfaces, and the advent of ubiquitous computing.
  • Big Data for Enterprise IT: Create big data strategy, manage your first project, demystify vendor solutions, and understand how big data differs from BI.
  • Beyond Hadoop: Dive deep into Cassandra, Storm, Drill, and other emerging technologies.
  • Connected World: Explore the implications—and opportunities—as low-cost networks and sensors create an ever-connected world.
  • Data Science: Immerse yourself inside the world of data practitioners—from the hard science of new algorithms to cultural change and teambuilding.
  • Design: Make data matter with highly effective user experiences, using new interfaces, interactivity, and visualization.
  • Hadoop in Practice: Get practical lessons, integration tricks, and a glimpse of the road ahead.
  • Law, Ethics, and Open Data: Tackle the biggest issues in compliance, governance, and ethics in the era of open data and heightened privacy concerns.

OK, it’s not Balisage (Markup Olympics (Balisage) [No Drug Testing]) but it isn’t in August/Montreal either. 😉

Still, a great gathering of data/information folk, if more general than Balisage.

History of Information Organization (Infographic)

Thursday, March 8th, 2012

From Cartography to Card Catalogs [Infographic]: History of Information Organization

Mindjet has posted an infographic and blog post about the history of information organization. I have embedded the graphic below.

Let me preface my remarks by saying I have known people at Mindjet and it is a fairly remarkable organization. And to be fair, the history of information organization is of interest to me, although I am far from being a specialist in the field.

However, when a graphic jumps from “850 CE The First Byzantine Encyclopedia,” to “1276 CE Oldest Continuously Functioning Library” and informs the reader, on the edge in between, that this was “3,000 years ago,” it seems to be lacking in precision or proofing, perhaps both.

Although information has to be summarized for such a presentation, I thought the rise of writing in Egypt/Sumeria would have merited a note, as perhaps would the library of Ashurbanipal (first library of the ancient Middle East) or the Library of Alexandria, just to name two. Note that you would have to go back before Ashurbanipal to reach 3,000 years ago, and there were written texts and collections of such texts for anywhere from 2,000 to 3,000 years before that.

I do appreciate that Mindjet doesn’t think information issues arose with the digital computer. I am hopeful that they will encourage a re-examination of older methods and solutions in hopes of finding clues to new solutions.

ODLIS: Online Dictionary for Library and Information Science

Friday, February 10th, 2012

ODLIS: Online Dictionary for Library and Information Science by Joan M. Reitz.

ODLIS is known to all librarians and graduate school library students but perhaps not to those of us who abuse library terminology in CS and related pursuits. Can’t promise it will make our usage any better but certainly won’t make it any worse. 😉

This would make a very interesting “term for a day” type resource.

Certainly one you should bookmark and browse at your leisure.

History of the Dictionary

ODLIS began at the Haas Library in 1994 as a four-page printed handout titled Library Lingo, intended for undergraduates not fluent in English and for English-speaking students unfamiliar with basic library terminology. In 1996, the text was expanded and converted to HTML format for installation on the WCSU Libraries Homepage under the title Hypertext Library Lingo: A Glossary of Library Terminology. In 1997, many more hypertext links were added and the format improved in response to suggestions from users. During the summer of 1999, several hundred terms and definitions were added, and a generic version was created that omitted all reference to specific conditions and practices at the Haas Library.

In the fall of 1999, the glossary was expanded to 1,800 terms, renamed to reflect its extended scope, and copyrighted. In February, 2000, ODLIS was indexed in Yahoo! under “Reference – Dictionaries – Subject.” It was also indexed in the WorldCat database, available via OCLC FirstSearch. During the year 2000, the dictionary was expanded to 2,600 terms and by 2002 an additional 800 terms had been added. From 2002 to 2004, the dictionary was expanded to 4,200 terms and cross-references were added, in preparation for the print edition. Since 2004, an additional 600 terms and definitions have been added.

Purpose of the Dictionary

ODLIS is designed as a hypertext reference resource for library and information science professionals, university students and faculty, and users of all types of libraries. The primary criterion for including a term is whether a librarian or other information professional might reasonably be expected to know its meaning in the context of his or her work. A newly coined term is added when, in the author’s judgment, it is likely to become a permanent addition to the lexicon of library and information science. The dictionary reflects North American practice; however, because ODLIS was first developed as an online resource available worldwide, with an e-mail contact address for feedback, users from many countries have contributed to its growth, often suggesting additional terms and commenting on existing definitions. Expansion of the dictionary is an ongoing process.

Broad in scope, ODLIS includes not only the terminology of the various specializations within library science and information studies but also the vocabulary of publishing, printing, binding, the book trade, graphic arts, book history, literature, bibliography, telecommunications, and computer science when, in the author’s judgment, a definition might prove useful to librarians and information specialists in their work. Entries are descriptive, with examples provided when appropriate. The definitions of terms used in the Anglo-American Cataloging Rules follow AACR2 closely and are therefore intended to be prescriptive. The dictionary includes some slang terms and idioms and a few obsolete terms, often as See references to the term in current use. When the meaning of a term varies according to the field in which it is used, priority is given to the definition that applies within the field with which it is most closely associated. Definitions unrelated to library and information science are generally omitted. As a rule, definition is given under an acronym only when it is generally used in preference to the full term. Alphabetization is letter-by-letter. The authority for spelling and hyphenation is Webster’s New World Dictionary of the American Language (College Edition). URLs, current as of date of publication, are updated annually.