Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

October 14, 2011

Sexier, smarter, faster Information architecture with Topic Maps

Filed under: Marketing,Topic Maps — Patrick Durusau @ 6:22 pm

Sexier, smarter, faster Information architecture with Topic Maps

A presentation by Alexander Johannesen, a bit dated now (4 years old), but you can’t argue with the title. 😉

I wonder if that is like “faster, better, cheaper,” where you can have any one of those?

So do you have to pick just one for your topic map: sexier, smarter, or faster?

Or has Alexander found a way to get all three?

You can find out what makes Alexander excited (not that I was curious on that score), as well as the basic concepts of topic maps.

Ignore the comment about slides 107-120, all the slides display.

October 13, 2011

Numbrary

Filed under: Data Source,Dataset — Patrick Durusau @ 7:00 pm

Numbrary

From the website:

Numbrary is a free online service dedicated to finding, using and sharing numbers on the web.

With 26,475 data tables from the US Department of Labor, I get to Producer Price Indexes (3428 items) and then to Commodities (WPU 101) and there is very nice access to the underlying data:

http://numbrary.com/sources/10d891fc1320-produce-price-index-commodi.

Except that I don’t know how that should (could?) be reconciled with other data. Or what “other” data that would be, save for the “See Also” links on the webpage, but I don’t know why I should see that data as well.

Beyond just my lack of experience with economic data, this may illustrate something about “transparency” in government.

Can a government be said to be “transparent” if it provides data that is no more “transparent” to voters than the lack of data?

What burden does it have to make data more than simply accessible, but also meaningful? (I am mindful of the credit disclosure laws that provided foot faults for those wishing to pursue members of the credit industry but that did not make credit rate disclosures meaningful.)

Still, a useful source of data that I commend to your attention.

Peter Skomoroch – Delicious

Filed under: Data Source,Dataset — Patrick Durusau @ 6:59 pm

Peter Skomoroch – Delicious

As of today, 7845 links to data and data sources.

A prime candidate to illustrate that there is no shortage of data, but a serious shortage of meaningful navigation of data.

Processing

Filed under: Processing,Visualization — Patrick Durusau @ 6:59 pm

Processing

From the website:

Processing is an open source programming language and environment for people who want to create images, animations, and interactions. Initially developed to serve as a software sketchbook and to teach fundamentals of computer programming within a visual context, Processing also has evolved into a tool for generating finished professional work. Today, there are tens of thousands of students, artists, designers, researchers, and hobbyists who use Processing for learning, prototyping, and production.

  • Free to download and open source
  • Interactive programs using 2D, 3D or PDF output
  • OpenGL integration for accelerated 3D
  • For GNU/Linux, Mac OS X, and Windows
  • Projects run online or as double-clickable applications
  • Over 100 libraries extend the software into sound, video, computer vision, and more…

Neo4j – Delicious

Filed under: Graphs,Neo4j — Patrick Durusau @ 6:59 pm

Neo4j – Delicious

As of October 12, 2011, 388 links. Another resource primarily for another project. Stay tuned.

SymmetricDS

Filed under: Data Replication,Database — Patrick Durusau @ 6:58 pm

SymmetricDS

From the website:

SymmetricDS is an asynchronous data replication software package that supports multiple subscribers and bi-directional synchronization. It uses web and database technologies to replicate tables between relational databases, in near real time if desired. The software was designed to scale for a large number of databases, work across low-bandwidth connections, and withstand periods of network outage.

By using database triggers, SymmetricDS guarantees that data changes are captured and atomicity is preserved. Support for database vendors is provided through a Database Dialect layer, with implementations for MySQL, Oracle, SQL Server, PostgreSQL, DB2, Firebird, HSQLDB, H2, and Apache Derby included.

This is very cool!

(Spotted by Marko Rodriguez)

Using A Graph Database To Power The “Web of Things”

Filed under: GraphDB,Graphs,Neo4j — Patrick Durusau @ 6:58 pm

Using A Graph Database To Power The “Web of Things”

From the post:

Rick Bullotta and Emil Eifrem discuss how to use graph databases to model the real world, people, systems and things, taking advantage of the relationships between various data elements.

Taped at QCon on 11 October 2011. Short of attending, it doesn’t get much fresher than that!

In Depth with Campaign Finance Data

Filed under: Data Source,Dataset — Patrick Durusau @ 6:57 pm

In Depth with Campaign Finance Data by Ethan Phelps-Goodman.

Introduction

Influence Explorer and TransparencyData are the Sunlight Foundation’s two main sources for data on money and influence in politics. Both sites are warehouses for a variety of datasets, including campaign finance, lobbying, earmarks, federal spending and various other corporate accountability datasets. The underlying data is the same for both sites, but the presentation is very different. Influence Explorer takes the most important or prominent entities in the data–such as state and federal politicians, well-known individuals, and large companies and organizations–and gives each its own page with easy to understand charts and graphs. TransparencyData, on the other hand, gives searchable access to the raw records that make up each Influence Explorer page. Influence Explorer can answer questions like, “who was the top donor to Obama’s presidential campaign?” TransparencyData lets you dig down into the details of every single donation to that campaign.

If you are interested in campaign finance data this is a very good starting point. At least you can get a sense for the difficulty in simply tracking the money. I think you will find that money can buy access, but that isn’t the same thing as influence. That’s more complicated.

Topic maps can help in several ways. First, there is the ability to consolidate information from a variety of sources so no one person has to try to assemble all the pieces. Second, the use of associations can help you discover patterns in relationships that may uncover some hidden (or relatively so) avenues of influence or access. Not to mention that being able to trade up information with others may help you build a better portfolio of data for when you go calling to exercise some influence.

Tracking Unique Terms in News Feeds

Filed under: Duplicates,News,Visualization — Patrick Durusau @ 6:57 pm

Tracking Unique Terms in News Feeds by Matthew Hurst.

From the post:

I’ve put together a simple system which reads news feeds (the BBC, NPR, the Economist and Reuters) in approximately real time and maintains a record of the distribution of terms found in the articles. It then indicates in a stream visualization the articles and unique terms that are observed by the system for the first time within them. The result being that articles which contain no new terms at all are grayed out.

The larger idea here is to build a ‘linguistic dashboard’ for the web which captures real time evolution of language.

This is a very cool idea! It could certainly be a news “filter” that would cut down on clutter in news feeds. No new terms = No new news? Something to think about.
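
The core bookkeeping is simple enough to sketch. Here is a minimal version of the idea as I read it (feed fetching is left out and the sample articles are made up; this is my illustration, not Hurst’s code):

    import re

    seen_terms = set()  # every term observed so far, across all articles

    def fresh_terms(text):
        """Return the terms in this article not seen in any earlier article."""
        terms = set(re.findall(r"[a-z']+", text.lower()))
        new = terms - seen_terms
        seen_terms.update(terms)
        return new

    # Pretend these arrived from the feeds in this order.
    articles = [
        ("Eurozone leaders meet on debt crisis", "Leaders met to discuss the debt crisis."),
        ("Debt crisis talks continue", "Talks on the debt crisis continued today."),
    ]
    for title, body in articles:
        new = fresh_terms(title + " " + body)
        print(title, "->", ", ".join(sorted(new)) if new else "(no new terms, gray it out)")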

Open – Videos

Filed under: Conferences,HTML — Patrick Durusau @ 6:56 pm

Open – Videos

For those of you who don’t think HTML5 and developers are all that weird:

Full-length videos from the first two TimesOpen events, HTML5 and Beyond, and Innovating Developer Culture, are now available. Approximately five (5!) hours in total, there’s a lot of good information.

We have the lineup in place for the next TimesOpen on Personalization & Privacy, taking place Tuesday October 25, 6:30 p.m., at the Times Building. Details and registration information will be posted soon (like next week).

Predicting What People Want

Filed under: Interface Research/Design,Marketing — Patrick Durusau @ 6:56 pm

If you haven’t seen Steve Yegge’s rant about Google, which was meant to be internal to Google, you can read about it (with the full text) at: Google Engineer Accidentally Posts Rant About Google+.

Todd Wasserman reports:

Yegge went on to write, “Our Google+ team took a look at the aftermarket and said: ‘Gosh, it looks like we need some games. Let’s go contract someone to, um, write some games for us.’ Do you begin to see how incredibly wrong that thinking is now? The problem is that we are trying to predict what people want and deliver it for them.” (emphasis added)

That’s doable, the history of marketing in the 20th century has made that very clear. See Selling Blue Elephants.

What doesn’t work is for very talented programmers, academics, do-gooders, etc., to sit around in conference rooms to plan what other people ought to want.

What, or should I say who, is missing from the conference room?

Oh, yeah, the people we want to use or even pay for the service. Oops!

You may have guessed/predicted where this is going: The same is true for interfaces, computer or otherwise.

Successful interfaces happen by:

  1. Dumb luck
  2. Management/developers decide on presentation/features
  3. Testing with users and management/developers decide on presentation/features
  4. Testing with users and user feedback determines presentation/features

Care to guess which one I suspect Google used? If you picked door #3, you would be correct! (Sorry, no prize.)

True enough, management/developers are also users, so they won’t be wrong every time.

Question: Would you go see a doctor who wasn’t wrong every time?

I never thought I would recommend that anyone read marketing/advertising texts but I guess there is a time and place for everything. I would rather see you doing that than see more interfaces that hide your hard and valuable work from users.

OK, this is a bit overlong, so let me summarize the rule for developing both programs (in terms of capabilities) and interfaces (in terms of features):

Don’t predict what people want! Go ask them!

October 12, 2011

Querying ElasticSearch from VIM

Filed under: ElasticSearch,JSON — Patrick Durusau @ 4:40 pm

Querying ElasticSearch from VIM

From the post:

I’m using ElasticSearch quite a bit and finally decided to make it easy to debug. I now write JSON queries with a .es extension. And have this in my .vim/filetype.vim file:

Debugging ElasticSearch results with Perl.

I just know Robert (Barta) has a one liner for this and thought this might tempt him into commenting. 😉
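
The vim wiring itself is in the linked post. As a rough stand-alone version of the same debugging loop (the file name, index name and query below are made up, and a local ElasticSearch on the default port is assumed):

    import json
    import requests

    # articles.es might contain: {"query": {"query_string": {"query": "topic maps"}}}
    with open("articles.es") as f:          # hypothetical .es query file
        query = json.load(f)

    resp = requests.post("http://localhost:9200/articles/_search",
                         data=json.dumps(query))
    print(json.dumps(resp.json(), indent=2))  # pretty-print the hits for inspection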

Be careful with dictionary-based text analysis

Filed under: Dictionary,Semantics,Sentiment Analysis — Patrick Durusau @ 4:39 pm

Be careful with dictionary-based text analysis

Brendan O’Connor writes:

OK, everyone loves to run dictionary methods for sentiment and other text analysis — counting words from a predefined lexicon in a big corpus, in order to explore or test hypotheses about the corpus. In particular, this is often done for sentiment analysis: count positive and negative words (according to a sentiment polarity lexicon, which was derived from human raters or previous researchers’ intuitions), and then proclaim the output yields sentiment levels of the documents. More and more papers come out every day that do this. I’ve done this myself. It’s interesting and fun, but it’s easy to get a bunch of meaningless numbers if you don’t carefully validate what’s going on. There are certainly good studies in this area that do further validation and analysis, but it’s hard to trust a study that just presents a graph with a few overly strong speculative claims as to its meaning. This happens more than it ought to.
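
To make the dictionary method concrete, here is a minimal sketch (the toy three-word lexicons are exactly the sort of thing O’Connor is warning needs validation):

    POSITIVE = {"good", "excellent", "happy"}   # toy lexicon
    NEGATIVE = {"bad", "terrible", "sad"}       # toy lexicon

    def lexicon_sentiment(text):
        """Crude polarity score: (positive hits - negative hits) / tokens."""
        tokens = text.lower().split()
        pos = sum(t in POSITIVE for t in tokens)
        neg = sum(t in NEGATIVE for t in tokens)
        return (pos - neg) / len(tokens) if tokens else 0.0

    print(lexicon_sentiment("the service was excellent but the food was terrible"))  # 0.0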

How does “measurement” of sentiment in a document differ from “measurement” of the semantics of terms in that document?

Have we traded “access” to large numbers of documents (think about the usual Internet search engine) for validated collections? By validated collections I mean the discipline-based indexes where the user did not have to weed out completely irrelevant results.

SDSC’s New Storage Cloud: ‘Flickr for Scientific Data’

Filed under: Cloud Computing — Patrick Durusau @ 4:39 pm

SDSC’s New Storage Cloud: ‘Flickr for Scientific Data’ by Michael Feldman.

From the post:

Last month, the San Diego Supercomputer Center launched what it believes is “the largest academic-based cloud storage system in the U.S.” The infrastructure is designed to serve the country’s research community and will be available to scientists and engineers from essentially any government agency that needs to archive and share super-sized data sets.

Certainly the need for such a service exists. The modern practice of science is a community activity and the way researchers collaborate is by sharing their data. Before the emergence of cloud, the main way to accomplish that was via emails and sending manuscripts back and forth over the internet. But with the coalescence of some old and new technologies, there are now economically viable ways for sharing really large amounts of data with colleagues.

In the press release describing the storage cloud, SDSC director Michael Norman described it thusly: “We believe that the SDSC Cloud may well revolutionize how data is preserved and shared among researchers, especially massive datasets that are becoming more prevalent in this new era of data-intensive research and computing.” Or as he told us more succinctly, “I think of it as Flickr for scientific data.”

The article ends with:

Whether the center’s roll-your-own cloud will be able to compete against commercial clouds on a long-term basis remains to be seen. One of the reasons a relatively small organization like SDSC can even build such a beast today is thanks in large part to the availability of cheap commodity hardware and the native expertise at the center to build high-end storage systems from parts.

There is also OpenStack — an open-source cloud OS that the SDSC is using as the basis of their offering. Besides being essentially free for the taking, the non-proprietary nature of OpenStack also means the center will not be locked into any particular software or hardware vendors down the road.

“With OpenStack going open source, it’s now possible for anybody to set up a little cloud business,” explains Norman. “We’re just doing it in an academic environment.”

From a long-term need/employment perspective, having lots of “little cloud” businesses, each with its own semantics, isn’t a bad thing.

It does make me wonder to what degree having more semantics accessible increases semantic resistance (not the Newcomb word; dissonance, perhaps?) or whether it is simply more evident. That is, the overall level of semantic dissonance is the same, but the WWW, etc., has increased the rate at which we encounter it.

Different companies have always had different database semantics, for example, but the only way to encounter that was to either change jobs or merge the companies, both of which were one-on-one events. Now it is easy to encounter multiple database semantics in a single day or hour.

Active learning: far from solved

Filed under: Active Learning,Linguistics — Patrick Durusau @ 4:38 pm

Active learning: far from solved

From the post:

As Daniel Hsu and John Langford pointed out recently, there has been a lot of recent progress in active learning. This is to the point where I might actually be tempted to suggest some of these algorithms to people to use in practice, for instance the one John has that learns faster than supervised learning because it’s very careful about what work it performs. That is, in particular, I might suggest that people try it out instead of the usual query-by-uncertainty (QBU) or query-by-committee (QBC). This post is a brief overview of what I understand of the state of the art in active learning (paragraphs 2 and 3) and then a discussion of why I think (a) researchers don’t tend to make much use of active learning and (b) why the problem is far from solved. (a will lead to b.)

This is a deeply interesting article that could give rise to mini and major projects. I particularly like his point about not throwing away training data. No, you have to read the post for yourself. It’s not that long.

Braque: News for Researchers

Filed under: News — Patrick Durusau @ 4:38 pm

Braque: News for Researchers

I just registered today but this looks like a very interesting resource.

It is like arXiv.org > cs, except filtered by subject area with the capability to rate each article, so I suspect there is machine learning happening in the background.

Looks like the sort of thing that I will greatly enjoy. See what you think.

Statistical Computing

Filed under: R,Statistics — Patrick Durusau @ 4:37 pm

Statistical Computing

From the webpage:

Description

Computational data analysis is an essential part of modern statistics. Competent statisticians must not just be able to run existing programs, but to understand the principles on which they work. They must also be able to read, modify and write code, so that they can assemble the computational tools needed to solve their data-analysis problems, rather than distorting problems to fit tools provided by others. This class is an introduction to programming, targeted at statistics majors with minimal programming knowledge, which will give them the skills to grasp how statistical software works, tweak it to suit their needs, recombine existing pieces of code, and when needed create their own programs.

Students will learn the core ideas of programming — functions, objects, data structures, flow control, input and output, debugging, logical design and abstraction — through writing code to assist in numerical and graphical statistical analyses. Students will in particular learn how to write maintainable code, and to test code for correctness. They will then learn how to set up stochastic simulations, how to parallelize data analyses, how to employ numerical optimization algorithms and diagnose their limitations, and how to work with and filter large data sets. Since code is also an important form of communication among scientists, students will learn how to comment and organize code.

The class will be taught in the R language.
Carnegie Mellon

Statistics this Fall at Carnegie Mellon. Lectures, Homework, Labs.

Whether you simply missed statistics or it has been a while, this looks very good.

Where to find data to use with R

Filed under: Data Source,Dataset,R — Patrick Durusau @ 4:37 pm

Where to find data to use with R

From the post:

Hardly a day goes by without someone or something reminding me that we are drowning in a sea of data (a bummer day ):, or that the new hero is the data scientist (a Yes! let’s go make some money kind of day!!). This morning I read “…Google grew from processing 100 terabytes of data a day with MapReduce in 2004 to processing 20 petabytes a day with MapReduce in 2008. (Lin and Dyer, Data-Intensive Text Processing with MapReduce: Morgan&Claypool, 2010 p1) Assuming linear growth, that would mean about 400 terabytes during the 15 minutes it took me to check my email. Even if Google is getting more than its fair share, data should be everywhere, more data than I could ever need, more than I could process, more than I could ever imagine.

So, how come every time I go to write a blog post or try some new stats I can never find any data? A few hours ago I Googled “free data sets” and got over 74,000,000 hits, but it looks as if it’s going to be another evening of me with iris. What’s wrong here? At the root, it’s a deep problem that gets at the essence of data. What are data anyway? My answer: data are structured information. Part of the structure includes meta-information describing the intention and the integrity with which the data were collected. When looking for a data set, even for some purpose that is not that important, we all want some evidence that the data were either collected with intentions that are similar to our intentions to use the data or that the data can be re-purposed. Moreover, we need to establish some comfort level that the data were not collected to deceive, that they are reasonably representative, reasonably randomized, reasonably unbiased, etc. The more importance we place on our project the more we tighten up on these requirements. This is not all philosophy. I think that focusing on intentions and integrity provides some practical guidance of where to search for data on the internet.

If you are using R and need data, here is a first stop. Note the author is maintaining a list of such data sources.

Top 50 Statistics Blogs

Filed under: Mathematics,Statistics — Patrick Durusau @ 4:36 pm

Top 50 Statistics Blogs

From the post:

Statistics is a branch of mathematics that deals with the interpretation of data. Statisticians work in a wide variety of fields in both the private and the public sectors. They are teachers, consultants, watchdogs, journalists, designers, programmers, and by and large, ordinary people like you and me. And some of them blog.

In searching for the top statistics blogs on the web we only considered blogs that have been active in 2011. In deciding which ones to include in our (admittedly unscientific) list of the 50 best statistics blogs we considered a range of factors, including visual appeal/aesthetics, frequency of posts, and accessibility to non-specialists. Our goal is to highlight blogs that students and prospective students will find useful and interesting in their exploration of the field.

I’m not quite sure of the reason for the explanation of statistics at the head of a list of the top 50 statistics blogs but it isn’t a serious defect.

(I first saw this at www.r-bloggers.org.)

pumpkin patches and queuing theory

Filed under: Operations Research — Patrick Durusau @ 4:36 pm

pumpkin patches and queuing theory

From the post:

This weekend, my family and I went to a pumpkin patch. Everyone else had the same idea. The line stretched out of the pumpkin patch gates and through the parking lot. We waited in line for ten minutes and then balked. When we left, about 90% of those that were leaving did not have pumpkins. We arrived in the morning on Sunday. It was only going to get busier. I cannot imagine the amount of revenue that was lost. We found out later that it took nearly two hours to get through the line.

During our short wait and on our drive to another orchard, we discussed queuing and pumpkin patches.

With a lead-in like that, how could I resist pointing to it? (Besides, I am a Charlie Brown fan.)

A little operations research discussion won’t hurt you and might be useful in terms of dealing with organizations that like that sort of thing, the US DoD for example. It might even provide some insight into how they assign/create subject identity.
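
For a back-of-the-envelope feel for the numbers, here is a single-server (M/M/1) sketch; the arrival and service rates are invented, not measured at any actual pumpkin patch:

    # M/M/1 queue: arrivals at rate lam, a single server at rate mu (families/minute)
    lam, mu = 1.8, 2.0                  # assumed rates
    rho = lam / mu                      # utilization
    W = 1 / (mu - lam)                  # average time in the system (minutes)
    Lq = rho ** 2 / (1 - rho)           # average number of families waiting
    print(f"utilization {rho:.0%}, average wait {W:.0f} min, queue length {Lq:.1f}")
    # push lam toward mu and W blows up, which is roughly what a two-hour line is telling you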

Neo4j 1.5 “Borden Bord” Milestone 2 – Autumnal Fruits of our Labor

Filed under: Cypher,Graphs,Neo4j — Patrick Durusau @ 4:35 pm

Neo4j 1.5 “Borden Bord” Milestone 2 – Autumnal Fruits of our Labor

To tempt you into reading the announcement (and downloading the release):

As the last of the summer sunshine leaves us and the northern winter approaches, at Neo HQ we’ve been hunkered around our laptops for warmth and been busy packing in all manner of new functionality for the forthcoming Neo4j 1.5 release. In our last milestone release before our GA, we’re opening the floodgates and letting out a feature-complete stack. And there’s a lot in here too!

<snip>

The team behind the Cypher query language continues to innovate at a ferocious pace which has meant some powerful upgrades to the syntax. Some existing queries might have to be migrated. In this release Cypher’s been extended and refined so that:

  • Relationships can be made optional
  • Added new predicates for iterables: ALL/ANY/NONE/SINGLE to refine filtering on returned subgraphs
  • New path functions: NODES/RELATIONSHIPS/LENGTH return respectively the nodes, relationships or length of a path
  • Parameters for literals, index queries and node/relationship id
  • Shortest path support has been added
  • The Pattern matcher implementation will, if possible, eliminate subgraphs early, by using the predicates from the WHERE clause providing faster response times
  • Relationships can be bound
  • Added IS NULL for painless null checking
  • Added new aggregate function COLLECT which combines multiple result rows into a single list of values

Cypher’s capabilities and expressiveness continue to improve, and they’re fueled by your feedback so take these features for a test drive.

There are lots of other new features; I just have a weakness for Cypher features. Comment to list your favorites!
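
For the curious, a few illustrative queries in the 1.5-era syntax, wrapped in a small Python script so they can be thrown at a server. The node ids, relationship types and the REST endpoint path are all assumptions on my part (the endpoint in particular varies by Neo4j version), so treat this as a sketch rather than a recipe:

    import json
    import requests

    QUERIES = [
        # optional relationship: people come back even if they have no boss
        "START p=node(1) MATCH p-[?:REPORTS_TO]->boss RETURN p, boss",
        # COLLECT folds the matched friends into a single list per person
        "START p=node(1) MATCH p-[:KNOWS]->f RETURN p, COLLECT(f.name)",
        # shortest path between two nodes, at most 5 hops
        "START a=node(1), b=node(2) MATCH p=shortestPath(a-[*..5]-b) RETURN LENGTH(p)",
    ]

    ENDPOINT = "http://localhost:7474/db/data/cypher"   # assumption: check your version's docs
    for q in QUERIES:
        r = requests.post(ENDPOINT, data=json.dumps({"query": q}),
                          headers={"Content-Type": "application/json"})
        print(q, "->", r.status_code)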

October 11, 2011

Introducing Crunch: Easy MapReduce Pipelines for Hadoop

Filed under: Flow-Based Programming (FBP),Hadoop,MapReduce — Patrick Durusau @ 6:08 pm

Introducing Crunch: Easy MapReduce Pipelines for Hadoop

Josh Wills writes:

As a data scientist at Cloudera, I work with customers across a wide range of industries that use Hadoop to solve their business problems. Many of the solutions we create involve multi-stage pipelines of MapReduce jobs that join, clean, aggregate, and analyze enormous amounts of data. When working with log files or relational database tables, we use high-level tools like Pig and Hive for their convenient and powerful support for creating pipelines over structured and semi-structured records.

As Hadoop has spread from web companies to other industries, the variety of data that is stored in HDFS has expanded dramatically. Hadoop clusters are being used to process satellite images, time series data, audio files, and seismograms. These formats are not a natural fit for the data schemas imposed by Pig and Hive, in the same way that structured binary data in a relational database can be a bit awkward to work with. For these use cases, we either end up writing large, custom libraries of user-defined functions in Pig or Hive, or simply give up on our high-level tools and go back to writing MapReduces in Java. Either of these options is a serious drain on developer productivity.

Today, we’re pleased to introduce Crunch, a Java library that aims to make writing, testing, and running MapReduce pipelines easy, efficient, and even fun. Crunch’s design is modeled after Google’s FlumeJava, focusing on a small set of simple primitive operations and lightweight user-defined functions that can be combined to create complex, multi-stage pipelines. At runtime, Crunch compiles the pipeline into a sequence of MapReduce jobs and manages their execution.
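
Before getting to the philosophy, a toy sketch of the deferred-pipeline idea for anyone who hasn’t met the dataflow style. This is not Crunch’s Java API, just an illustration of recording a chain of primitives and only executing it when asked:

    class Pipeline:
        """Toy deferred pipeline: stages are recorded, nothing runs until run()."""
        def __init__(self, data):
            self.data, self.steps = data, []
        def flat_map(self, fn):
            self.steps.append(lambda items: [y for x in items for y in fn(x)])
            return self
        def group_by_key(self):
            def group(pairs):
                grouped = {}
                for k, v in pairs:
                    grouped.setdefault(k, []).append(v)
                return list(grouped.items())
            self.steps.append(group)
            return self
        def run(self):                      # "compile" and execute the whole chain
            out = self.data
            for step in self.steps:
                out = step(out)
            return out

    # the classic word count, written as a pipeline and executed in one go
    lines = ["to be", "or not to be"]
    counts = (Pipeline(lines)
              .flat_map(lambda line: [(word, 1) for word in line.split()])
              .group_by_key()
              .run())
    print([(word, sum(ones)) for word, ones in counts])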

Sounds like DataFlow Programming… or Flow-Based Programming (FBP) to me. In which case the claim that:

It’s just Java. Crunch shares a core philosophical belief with Google’s FlumeJava: novelty is the enemy of adoption.

must be true, as FBP is over forty years old now. I doubt programmers involved in Crunch would be aware of it. Programming history started with their first programming language, at least for them.

From a vendor perspective, I would turn the phrase a bit to read: novelty is the enemy of market/mind share.

Unless you are a startup, in which case, novelty is good until you reach critical mass and then novelty loses its luster.

Unnecessary novelty, like new web programming languages for their own sake, can also be a bid for market/mind share.

Interesting to see both within days of each other.

SciDB Community meeting Oct 18th [2011]

Filed under: SciDB — Patrick Durusau @ 6:07 pm

SciDB Community meeting Oct 18th [2011]

From Marilyn Matz:

Join us for updates about the SciDB project at a SciDB community meeting:

Oct 18, 2011, 5:15 PM – 6:00 PM,
SLAC, Research Office Building, bldg number 48
2575 Sand Hill Road, Menlo Park, CA

You are welcome to come even if you have not registered for XLDB.

At the community meeting we will review current performance, preview the contents of the upcoming release, hear about the EPICS project’s use of SciDB, and talk about community work-in-progress on HDF5 and FITS loaders and in-situ access.

This meeting is a great opportunity to share your experiences using the software, tell us what you would like to see in future releases, and find out how to participate.

EPICS, just in case you are unfamiliar with the project.

EPICS is a set of Open Source software tools, libraries and applications developed collaboratively and used worldwide to create distributed soft real-time control systems for scientific instruments such as particle accelerators, telescopes and other large scientific experiments.

Linked Literature, Linked TV – Everything Looks like a Graph

Filed under: Graphs,Librarian/Expert Searchers,Library,Maps — Patrick Durusau @ 5:55 pm

Linked Literature, Linked TV – Everything Looks like a Graph

From the post:

…When do graphs become maps?

I report here on some experiments that stem from two collaborations around Linked Data. All the visuals in the post are views of bibliographic data, based on similarity measures derived from book / subject keyword associations, with visualization and a little additional analysis using Gephi. Click-through to Flickr to see larger versions of any image. You can’t always see the inter-node links, but the presentation is based on graph layout tools.

Firstly, in my ongoing work in the NoTube project, we have been working with TV-related data, ranging from ‘social Web’ activity streams, user profiles, TV archive catalogues and classification systems like Lonclass. Secondly, over the summer I have been working with the Library Innovation Lab at Harvard, looking at ways of opening up bibliographic catalogues to the Web as Linked Data, and at ways of cross-linking Web materials (e.g. video materials) to a Webbified notion of ‘bookshelf‘.

I like the exploratory perspective of this post.

What other data could you link to materials in a library holding?

Since I live in the Deep South, what if entries in the library catalog on desegregation had links to local residents who participated in (or resisted) civil rights activities? The stories of the leadership are well known. What about all the thousands of others who played their own parts, without being sought after by PBS during Pledge week years later?

Or people who resisted the draft, were interned during WW II by the Axis or Allied Powers, or who were missile launch officers, sworn to “turn the keys” on receipt of a valid launch order.

Would that help make your library a more obvious resource of community and continuity?

Introduction to Artificial Intelligence – Stanford Class Update

Filed under: Artificial Intelligence,Teaching,Topic Maps — Patrick Durusau @ 5:55 pm

The “Introduction to Artificial Intelligence” class at Stanford has begun with over 145,000 students. I remember lecture classes being large but not quite this large. 😉

The first class lecture is up and I am impressed with the delivery mechanisms chosen for the class.

For example, I have seen graphic tablets used in math videos to draw equations, examples and lecture note type materials. I checked the pricing on such tablets.

Guess what they are using in the Stanford classes? Paper and different colored pens. Well, and printed materials, maps and such, that they can draw upon with the pens.

It doesn’t hurt that both of the presenters are world class lecturers but it also validates the notion that very simple tools can be used very effectively.

Not to mention that the longest segment so far is about 3 minutes.

You can say a lot in 3 minutes (or less) if: 1) you know what you want to say, and 2) you say it clearly.

Another nice aspect is that they are using what appear to be cgi-based graphics to embed quizzes (another low tech solution) at the end of videos.

Points for me to remember: Creating educational materials need not wait for equipment that I then have to master (though I will have to practice using a pen) in order to be productive. (It will be nice to have a pack of pens in different colors, cheaper than a graphics tablet too.)

persistent search urls; can be trickier than it seems

Filed under: Persistent Search URLs,Searching — Patrick Durusau @ 5:54 pm

persistent search urls; can be trickier than it seems

From the post:

One simple thing everyone wants (or ought to) in a ‘discovery layer’ these days is persistent urls for nearly all pages; including both individual record pages and search results pages, as well as other appendixes.

By persistent urls, we mean a url which you can bookmark, or include in a blog post, or tweet, or send in an email — and when you or someone else later accesses the URL, it will still work, and still point to the same page it did before.

The uses in that list beyond ‘bookmark’ are actually probably more important than actually bookmarking. It’s what lets our catalogs or discovery layers participate in the community, conversation, and information ecology of the web. For instance, librarians here at my place of work are already using persistent URLs to particular searches in our new blacklight-based catalog to point to what we have on a certain topic, linking in blog or facebook posts.

Many of our legacy OPAC’s failed at persistent URLs in the most basic way; with all or most of the URLs not only having a session ID in the URL, but having URLs that were not necessarily interpretable by software at all outside the context of that sessionID. That is, there was a sessionID in the URL and the URL could not be interpreted by software without the session ID.

Web software stopped being designed that way oh, about 10 years ago, when people started to realize that making the URLs good web citizens was important for usability and power.

So our ‘next generation’ interfaces start out (rightfully) by avoiding this, and having persistent URLs that do not include session IDs.

And that’s pretty much it for your individual record URLs, which then will probably look something like “/records/12345”. But there are still a couple of tricks for reliably persistent search urls.

Excellent post!
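
One small illustration of why search URLs need care (my example, not taken from the post): the same search can serialize into several different query strings unless you canonicalize it, and every variant is a different bookmark.

    from urllib.parse import urlencode

    def canonical_search_url(base, params):
        # sort parameters and drop empty ones so the same search always
        # serializes to the same URL; session IDs are deliberately never included
        cleaned = sorted((k, v) for k, v in params.items() if v)
        return base + "?" + urlencode(cleaned)

    # both of these user actions describe the same search...
    a = canonical_search_url("/catalog", {"q": "civil rights", "format": "book", "page": ""})
    b = canonical_search_url("/catalog", {"format": "book", "q": "civil rights"})
    print(a == b, a)   # True /catalog?format=book&q=civil+rights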

Question: Are “persistent search URLs” more of an issue for librarians than others?

Free Programming Books

Filed under: Language,Programming,Recognition,Subject Identity — Patrick Durusau @ 5:54 pm

Free Programming Books

Despite the title (the author’s update only went so far), there are more than 50 books listed here.

I won’t tweet this because, like Lucene turning ten, everyone in the world has already tweeted or retweeted the news of these books.

I seem to be on a run of mostly programming resources today and I thought you might find the list interesting, possibly useful.

Especially those of you interested in pattern matching.

It occurs to me that programming languages and books about programming languages are fit fodder for the same tools used on other texts.

I am sure there probably exists an index with all the “hello, world” examples from various computer languages but are there more/deeper similarities than the universal initial example?

There was a universe of programming languages prior to “hello, world” and there is certainly a very large one beyond those listed here but one has to start somewhere. So why not with this set?

I think the first question I would ask is the obvious one: Are there groupings of these works, other than the ones noted? What measures would you use and why? What results do you get?

I suppose before that you need to gather up the texts and do whatever cleanup/conversion is required; a short note on what you did there would be useful.

What parts were in anticipation of your methods for grouping the texts?

Patience topic map fans, we are getting to the questions of subject recognition.

So, what subjects should we recognize across programming languages? Not tokens or even signatures but subjects. Signatures may be a way of identifying subjects, but can the same subjects have different signatures in distinct languages?

Would recognition of subjects across programming languages assist in learning languages?, in developing new languages (what is commonly needed)?, in studying the evolution of languages (where did we go right/wrong)?, in usefully indexing CS literature?, etc.

And you thought this was a post about “free” programming books. 😉

Hoogle

Filed under: Haskell,Programming — Patrick Durusau @ 5:54 pm

Hoogle

From the webpage:

Hoogle is a Haskell API search engine, which allows you to search many standard Haskell libraries by either function name, or by approximate type signature.

[example searches omitted]

The Hoogle manual contains more details, including further details on search queries, how to install Hoogle as a command line application and how to integrate Hoogle with Firefox/Emacs/Vim etc.

Just in case you missed the reference in our post on Scalex (a similar interface for Scala).

Oh, Hoogle enables searching by “…function name, or by approximate type signature.”

What other searches would you like to perform?

How would you implement those searches (what kind of indexing, and with what software)? Care to do that as a project to contribute back to the community?

Do you think topic maps could improve this resource? In what way?

Scalex

Filed under: Scala — Patrick Durusau @ 5:54 pm

Scalex

From the webpage:

Scaladoc Index

Much like Hoogle for Haskell, Scalex lets you find Scala functions quickly.

  • map Search for the text “map”
  • list map Search for the text “list” and the text “map”
  • A => A Search for the type “A => A”
  • : A => A Search for the type “A => A”
  • a Search for the text “a”
  • map : List[A] => (A => B) => List[B]
    Search for the text “map” and the type “List[A] => (A => B) => List[B]”

Searches can be either textual (a list of words), or by type (a type signature) or both. A type search may optionally start with a : symbol. A search is considered a text search unless it contains a combination of text and symbols, or if it starts with :. To search for both a type and a name, place a : between them, for example size : List[A] => Int

It occurs to me that a topic map version of such a resource could have “occurrences” of functions drawn from a code base that exist in associations with known programs and programmers. As an added resource to see how things are done with a particular function by experts.

Without documentation of the surrounding code, that might be less useful than one would otherwise think, but all good code is documented, isn’t it? 😉
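
As a crude sketch of what I have in mind, in plain data terms (every name, file and project below is made up):

    # one "topic" for a function, with occurrences drawn from a code base and
    # associations tying those occurrences to programs and programmers
    function_topic = {
        "name": "map",
        "signature": "List[A] => (A => B) => List[B]",
        "occurrences": [
            {"project": "example-webapp", "file": "src/Render.scala", "line": 42},
        ],
        "associations": [
            {"type": "written-by", "programmer": "Jane Coder", "program": "example-webapp"},
        ],
    }

    # e.g. show where an expert has used the function
    for occ in function_topic["occurrences"]:
        print(occ["project"], occ["file"], occ["line"])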

4Clojure

Filed under: Clojure,Programming — Patrick Durusau @ 5:53 pm

4Clojure

From the webpage:

4Clojure is a resource to help fledgling clojurians learn the language through interactive problems. The first few problems are easy enough that even someone with no prior experience should find the learning curve forgiving. See ‘Help’ for more information.

Problem based approach to learning Clojure.

I have seen this sort of thing before for other programming languages.

Wonder if something like this would work for semantic representation? Such that common problems are generally solved by all new contenders? Might be a basis for conversation if nothing else.
