Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 1, 2013

Design Pattern Sources?

Filed under: Design,Design Patterns,Modeling — Patrick Durusau @ 2:23 pm

To continue the thread on the need for topic map design patterns, what sources would you suggest for design patterns?

Thinking that it would be more efficient to start from commonly known patterns and then, when necessary, branch out into new or unique ones.

Not to mention that starting with familiar patterns, as opposed to esoteric ones, will provide some comfort level for users.

Sources that I have found useful include:

Data Model Patterns: Conventions of Thought by David C. Hay.

Domain-Driven Design: Tackling Complexity in the Heart of Software by Eric Evans.

Developing High Quality Data Models by Matthew West. (Think Shell Oil. Serious enterprise context.)

Do you have any favorites you would suggest?

After a day or two of favorites, the next logical step would be to choose a design pattern and, with an eye on Kal’s Design Pattern Examples, attempt to fashion a design template.

Just one, without bothering to specify what comes next.

Working one bite at a time will make the task seem manageable.

Yes?

Topic Map Design Patterns For Information Architecture

Filed under: Design,Design Patterns,Modeling,TMCL — Patrick Durusau @ 1:21 pm

Topic Map Design Patterns For Information Architecture by Kal Ahmed.

Abstract:

Software design patterns give programmers a high level language for discussing the design of software applications. For topic maps to achieve widespread adoption and improved interoperability, a set of topic map design patterns are needed to codify existing practices and make them available to a wider audience. Combining structured descriptions of design patterns with Published Subject Identifiers would enable not only the reuse of design approaches but also encourage the use of common sets of PSIs. This paper presents the arguments for developing and publishing topic map design patterns and a proposed notation for diagramming design patterns based on UML. Finally, by way of examples, the paper presents some design patterns for representation of traditional classification schemes such as thesauri, hierarchical and faceted classification.

Kal used UML to model the design patterns and their constraints. (TMCL, the Topic Map Constraint Language, had yet to be written.)

For visual modeling purposes, are there any constraints in TMCL that cannot be modeled in UML?

I ask because I have not compared TMCL to UML.

Using UML to express the generic constraints in TMCL would be a first step towards answering the need for topic maps design patterns.
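
Pending that comparison, here is a rough sketch (plain Python, no TMCL or UML tooling, every name invented for illustration) of the kind of generic constraint either notation would need to express: "every topic of type 'term' must play the 'narrower' role in exactly one broader/narrower association."

```python
# Hypothetical, minimal structures -- not TMDM/TMCL conformant, just enough
# to state a cardinality constraint of the kind TMCL standardizes.
from collections import Counter

topics = {"t1": "term", "t2": "term", "t3": "taxonomy-root"}

# (association type, role played, topic) triples
role_players = [
    ("broader-narrower", "narrower", "t1"),
    ("broader-narrower", "broader",  "t3"),
    ("broader-narrower", "narrower", "t2"),
    ("broader-narrower", "broader",  "t1"),
]

def violators(topics, role_players):
    """Topics of type 'term' that do not play 'narrower' exactly once."""
    counts = Counter(t for (assoc, role, t) in role_players
                     if assoc == "broader-narrower" and role == "narrower")
    return [t for t, ttype in topics.items()
            if ttype == "term" and counts[t] != 1]

print(violators(topics, role_players))   # [] means the constraint is satisfied
```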

Topic Map Design Patterns

Filed under: Design,Design Patterns,Modeling — Patrick Durusau @ 12:47 pm

A recent comment on topic map design patterns reads in part:

The second problem, and the one I’m working through now, is that information modeling with topic maps is a new paradigm for me (and most people I’m sure) and the information on topic map models is widely dispersed. Techquila had some design patterns that were very useful and later those were put in a paper by A. Kal but, in general, it is a lot more difficult to figure out the information model with topic maps than it is with SQL or NoSQL or RDF because those other technologies have a lot more open discussions of designs to cover specific use cases. If those discussions existed for topic maps, it would make it easier for non-experts like me to connect the high-level this-is-how-topic-maps-work type information (that is plentiful) with the this-is-the-problem-and-this-is-the-model-that-solves-it type information (that is hard to find for topic maps).

Specifically, the problem I’m trying to solve and many other real world problems need a semi-structured information model, not just an amorphous blob of topics and associations. There are multiple dimensions of hierarchies and sequences that need to be modeled so that the end user can query the system with OLAP type queries where they drill up and down or pan forward and back through the information until they find what they need.

Do you know of any books of Topic Maps use cases and/or design patterns?

Unfortunately I had to say that I knew of no “Topic Maps use cases and/or design patterns” books.

There is XML Topic Maps: Creating and Using Topic Maps for the Web by Sam Hunting and Jack Park, but it isn’t what I would call a design pattern book.

While searching for the Hunting/Park book I did find: Topic Maps: Semantische Suche im Internet (Xpert.press) by Richard Widhalm and Thomas Mück, with a 2012 publication date. Don’t be deceived. This is a reprint of the 2002 edition.

Any books that I have missed on topic maps modeling in particular?

The comment identifies a serious lack of resources on use cases and design patterns for topic maps.

My suggestion is that we all refresh our memories of Kal’s work on topic map design patterns (which I will cover in a separate post) and start to correct this deficiency.

What say you all?

March 29, 2013

Learning Grounded Models of Meaning

Filed under: Linguistics,Meaning,Modeling,Semantics — Patrick Durusau @ 2:16 pm

Learning Grounded Models of Meaning

Schedule and readings for seminar by Katrin Erk and Jason Baldridge:

Natural language processing applications typically need large amounts of information at the lexical level: words that are similar in meaning, idioms and collocations, typical relations between entities, lexical patterns that can be used to draw inferences, and so on. Today such information is mostly collected automatically from large amounts of data, making use of regularities in the co-occurrence of words. But documents often contain more than just co-occurring words, for example illustrations, geographic tags, or a link to a date. Just like co-occurrences between words, these co-occurrences of words and extra-linguistic data can be used to automatically collect information about meaning. The resulting grounded models of meaning link words to visual, geographic, or temporal information. Such models can be used in many ways: to associate documents with geographic locations or points in time, or to automatically find an appropriate image for a given document, or to generate text to accompany a given image.

In this seminar, we discuss different types of extra-linguistic data, and their use for the induction of grounded models of meaning.

Very interesting reading that should keep you busy for a while! 😉
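
As a toy, hedged illustration of the basic move the abstract describes (counting co-occurrences of words with extra-linguistic data such as geographic tags), here is a sketch with invented documents:

```python
from collections import defaultdict

# Toy documents paired with extra-linguistic data (a geographic tag);
# the texts and tags are invented.
docs = [
    ("the cathedral and the old bridge", "Prague"),
    ("the bridge over the bay at sunset", "San Francisco"),
    ("fog rolls over the bay every morning", "San Francisco"),
]

# Count how often each word co-occurs with each geographic tag.
cooccur = defaultdict(lambda: defaultdict(int))
for text, place in docs:
    for word in set(text.split()):
        cooccur[word][place] += 1

# A crude "grounded" association: which place does a word evoke?
print(dict(cooccur["bay"]))      # {'San Francisco': 2}
print(dict(cooccur["bridge"]))   # {'Prague': 1, 'San Francisco': 1}
```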

March 16, 2013

MetaNetX.org…

Filed under: Bioinformatics,Biomedical,Genomics,Modeling,Semantic Diversity — Patrick Durusau @ 1:42 pm

MetaNetX.org: a website and repository for accessing, analysing and manipulating metabolic networks by Mathias Ganter, Thomas Bernard, Sébastien Moretti, Joerg Stelling and Marco Pagni. (Bioinformatics (2013) 29 (6): 815-816. doi: 10.1093/bioinformatics/btt036)

Abstract:

MetaNetX.org is a website for accessing, analysing and manipulating genome-scale metabolic networks (GSMs) as well as biochemical pathways. It consistently integrates data from various public resources and makes the data accessible in a standardized format using a common namespace. Currently, it provides access to hundreds of GSMs and pathways that can be interactively compared (two or more), analysed (e.g. detection of dead-end metabolites and reactions, flux balance analysis or simulation of reaction and gene knockouts), manipulated and exported. Users can also upload their own metabolic models, choose to automatically map them into the common namespace and subsequently make use of the website’s functionality.

http://metanetx.org.

The authors are addressing a familiar problem:

Genome-scale metabolic networks (GSMs) consist of compartmentalized reactions that consistently combine biochemical, genetic and genomic information. When also considering a biomass reaction and both uptake and secretion reactions, GSMs are often used to study genotype–phenotype relationships, to direct new discoveries and to identify targets in metabolic engineering (Karr et al., 2012). However, a major difficulty in GSM comparisons and reconstructions is to integrate data from different resources with different nomenclatures and conventions for both metabolites and reactions. Hence, GSM consolidation and comparison may be impossible without detailed biological knowledge and programming skills. (emphasis added)

For which they propose an uncommon solution:

MetaNetX.org is implemented as a user-friendly and self-explanatory website that handles all user requests dynamically (Fig. 1a). It allows a user to access a collection of hundreds of published models, browse and select subsets for comparison and analysis, upload or modify new models and export models in conjunction with their results. Its functionality is based on a common namespace defined by MNXref (Bernard et al., 2012). In particular, all repository or user uploaded models are automatically translated with or without compartments into the common namespace; small deviations from the original model are possible due to the automatic reconciliation steps implemented by Bernard et al. (2012). However, a user can choose not to translate his model but still make use of the website’s functionalities. Furthermore, it is possible to augment the given reaction set by user-defined reactions, for example, for model augmentation.
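
From a topic map point of view, the MNXref "common namespace" is a mapping from resource-specific identifiers onto shared ones, after which comparison is easy. A hedged toy sketch of that reconciliation step (the mapping values are invented, nothing like the real MNXref tables):

```python
# Invented identifier mappings from two source databases onto a shared
# namespace -- illustrative only, not actual MNXref identifiers.
to_common = {
    ("kegg", "C00031"): "MNX_glucose",
    ("chebi", "CHEBI:17234"): "MNX_glucose",
    ("kegg", "C00022"): "MNX_pyruvate",
}

model_a = [("kegg", "C00031"), ("kegg", "C00022")]
model_b = [("chebi", "CHEBI:17234")]

def translate(model):
    """Map native metabolite identifiers into the common namespace."""
    return {to_common.get(m, m) for m in model}

# Once both models share a namespace, comparison is plain set arithmetic.
print(translate(model_a) & translate(model_b))   # {'MNX_glucose'}
```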

The bioinformatics community recognizes the intellectual poverty of lock step models.

Wonder when the intelligence community is going to have that “a ha” moment?

March 8, 2013

Model Matters: Graphs, Neo4j and the Future

Filed under: Graphs,Modeling,Neo4j — Patrick Durusau @ 2:58 pm

Model Matters: Graphs, Neo4j and the Future by Tareq Abedrabbo.

From the post:

As part of our work, we often help our customers choose the right datastore for a project. There are usually a number of considerations involved in that process, such as performance, scalability, the expected size of the data set, and the suitability of the data model to the problem at hand.

This blog post is about my experience with graph database technologies, specifically Neo4j. I would like to share some thoughts on when Neo4j is a good fit but also what challenges Neo4j faces now and in the near future.

I would like to focus on the data model in this blog post, which for me is the crux of the matter. Why? Simply because if you don’t choose the appropriate data model, there are things you won’t be able to do efficiently and other things you won’t be able to do at all. Ultimately, all the considerations I mentioned earlier influence each other and it boils down to finding the most acceptable trade-off rather than picking a database technology for one specific feature one might fancy.

So when is a graph model suitable? In a nutshell when the domain consists of semi-structured, highly connected data. That being said, it is important to understand that semi-structured doesn’t imply an absence of structure; there needs to be some order in your data to make any domain model purposeful. What it actually means is that the database doesn’t enforce a schema explicitly at any given point in time. This makes it possible for entities of different types to cohabit – usually in different dimensions – in the same graph without the need to make them all fit into a single rigid structure. It also means that the domain can evolve and be enriched over time when new requirements are discovered, mostly with no fear of breaking the existing structure.

Effectively, you can start taking a more fluid view of your domain as a number of superimposed layers or dimensions, each one representing a slice of the domain, and each layer can potentially be connected to nodes in other layers.

More importantly, the graph becomes the single place where the full domain representation can be consolidated in a meaningful and coherent way. This is something I have experienced on several projects, because modeling for the graph gives developers the opportunity to think about the domain in a natural and holistic way. The alternative is often a data-centric approach, that usually results from integrating different data flows together into a rigidly structured form which is convenient for databases but not for the domain itself.
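
A hedged sketch of the "superimposed layers" idea in plain Python (no Neo4j driver calls, all names invented): nodes of different types cohabit in one graph and edges cross layers freely.

```python
# Toy property graph: nodes of different types live in one structure and
# edges freely connect across "layers" (people, projects, documents).
nodes = {
    "alice":   {"type": "person",   "name": "Alice"},
    "neo-app": {"type": "project",  "name": "Recommendation engine"},
    "spec-1":  {"type": "document", "title": "Data model spec"},
}

edges = [
    ("alice",  "WORKS_ON",  "neo-app"),
    ("spec-1", "DESCRIBES", "neo-app"),
    ("alice",  "WROTE",     "spec-1"),   # a new requirement: just add an edge type
]

def neighbours(node, rel=None):
    """Follow outgoing edges, optionally restricted to one relationship type."""
    return [dst for (src, r, dst) in edges if src == node and rel in (None, r)]

print(neighbours("alice"))             # ['neo-app', 'spec-1']
print(neighbours("alice", "WROTE"))    # ['spec-1']
```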

Interesting review of the current and some projected capabilities of Neo4j.

I am particularly sympathetic to starting with the data users have as opposed to starting with a model written in software and shoehorning the user’s data to fit the model.

Can be done, has been done (for decades), and works quite well in some cases.

But not all cases.

February 15, 2013

Using molecular networks to assess molecular similarity

Systems chemistry: Using molecular networks to assess molecular similarity by Bailey Fallon.

From the post:

In new research published in Journal of Systems Chemistry, Sijbren Otto and colleagues have provided the first experimental approach towards molecular networks that can predict bioactivity based on an assessment of molecular similarity.

Molecular similarity is an important concept in drug discovery. Molecules that share certain features such as shape, structure or hydrogen bond donor/acceptor groups may have similar properties that make them common to a particular target. Assessment of molecular similarity has so far relied almost exclusively on computational approaches, but Dr Otto reasoned that a measure of similarity might be obtained by interrogating the molecules in solution experimentally.

Important work for drug discovery but there are semantic lessons here as well:

Tests for similarity/sameness are domain specific.

Which means there are no universal tests for similarity/sameness.

Lacking universal tests for similarity/sameness, we should focus on developing documented and domain specific tests for similarity/sameness.

Domain specific tests provide quicker ROI than less useful and doomed universal solutions.

Documented domain specific tests may, no guarantees, enable us to find commonalities between domain measures of similarity/sameness.

But our conclusions will be based on domain experience and not projection from our domain onto others, less well known domains.
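
A hedged sketch of what "documented, domain-specific tests for similarity/sameness" might look like in code (the domains and measures are invented for illustration):

```python
# Each domain documents and registers its own similarity test; there is
# deliberately no universal measure. All names and measures are invented.
def chemical_similarity(a, b):
    """Toy stand-in: Jaccard overlap of functional groups."""
    union = a["groups"] | b["groups"]
    return len(a["groups"] & b["groups"]) / len(union) if union else 0.0

def name_similarity(a, b):
    """Toy stand-in: case-insensitive name equality."""
    return 1.0 if a["name"].lower() == b["name"].lower() else 0.0

SIMILARITY_TESTS = {
    "chemistry":      (chemical_similarity, "Jaccard overlap of functional groups"),
    "person-records": (name_similarity,     "Case-insensitive name equality"),
}

def similar(domain, a, b, threshold=0.8):
    """Apply the documented test for this domain; fail loudly outside it."""
    test, _documentation = SIMILARITY_TESTS[domain]
    return test(a, b) >= threshold

print(similar("chemistry",
              {"groups": {"hydroxyl", "carbonyl"}},
              {"groups": {"hydroxyl", "carbonyl", "amine"}}))   # False (2/3 < 0.8)
```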

February 6, 2013

The Evolution of Regression Modeling… [Webinar]

Filed under: Mathematics,Modeling,Regression — Patrick Durusau @ 12:46 pm

The Evolution of Regression Modeling: From Classical Linear Regression to Modern Ensembles by Mikhail Golovnya and Illia Polosukhin.

Dates/Times:

Part 1: Fri March 1, 10 am, PST

Part 2: Friday, March 15, 10 am, PST

Part 3: Friday, March 29, 10 am, PST

Part 4: Friday, April 12, 10 am, PST

From the webpage:

Class Description: Regression is one of the most popular modeling methods, but the classical approach has significant problems. This webinar series addresses these problems. Are you working with larger datasets? Is your data challenging? Does your data include missing values, nonlinear relationships, local patterns and interactions? This webinar series is for you! We will cover improvements to conventional and logistic regression, and will include a discussion of classical, regularized, and nonlinear regression, as well as modern ensemble and data mining approaches. This series will be of value to any classically trained statistician or modeler.

Details:

Part 1: March 1 – Regression methods discussed

  •     Classical Regression
  •     Logistic Regression
  •     Regularized Regression: GPS Generalized Path Seeker
  •     Nonlinear Regression: MARS Regression Splines

Part 2: March 15 – Hands-on demonstration of concepts discussed in Part 1

  •     Step-by-step demonstration
  •     Datasets and software available for download
  •     Instructions for reproducing demo at your leisure
  •     For the dedicated student: apply these methods to your own data (optional)

Part 3: March 29 – Regression methods discussed
*Part 1 is a recommended pre-requisite

  •     Nonlinear Ensemble Approaches: TreeNet Gradient Boosting; Random Forests; Gradient Boosting incorporating RF
  •     Ensemble Post-Processing: ISLE; RuleLearner

Part 4: April 12 – Hands-on demonstration of concepts discussed in part 3

  •     Step-by-step demonstration
  •     Datasets and software available for download
  •     Instructions for reproducing demo at your leisure
  •     For the dedicated student: apply these methods to your own data (optional)

Salford Systems offers other introductory videos, webinars, tutorials and case studies.

Regression modeling is a tool you will encounter in data analysis and is likely to be an important part of your exploration toolkit.
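
For a taste of the classical-versus-regularized contrast the series covers, here is a hedged sketch using scikit-learn rather than the Salford tools demonstrated in the webinars:

```python
# Classical OLS vs. a regularized (lasso) fit on synthetic data with many
# irrelevant predictors -- a sketch, not the Salford GPS/MARS tools.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.RandomState(0)
X = rng.randn(200, 20)                          # 20 predictors, only 3 matter
y = 2 * X[:, 0] - 3 * X[:, 1] + X[:, 2] + rng.randn(200)

ols = LinearRegression().fit(X, y)
lasso = LassoCV(cv=5).fit(X, y)

print("OLS non-zero coefficients:  ", int(np.sum(np.abs(ols.coef_) > 1e-6)))
print("Lasso non-zero coefficients:", int(np.sum(np.abs(lasso.coef_) > 1e-6)))
```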

I first saw this at KDNuggets.

January 27, 2013

…[D]emocratization of modeling, simulations, and predictions

Filed under: Modeling,Prediction,Simulations — Patrick Durusau @ 5:43 pm

Technical engine for democratization of modeling, simulations, and predictions by Justyna Zander and Pieter J. Mosterman. (Justyna Zander and Pieter J. Mosterman. 2012. Technical engine for democratization of modeling, simulations, and predictions. In Proceedings of the Winter Simulation Conference (WSC ’12). Winter Simulation Conference, Article 228, 14 pages.)

Abstract:

Computational science and engineering play a critical role in advancing both research and daily-life challenges across almost every discipline. As a society, we apply search engines, social media, and selected aspects of engineering to improve personal and professional growth. Recently, leveraging such aspects as behavioral model analysis, simulation, big data extraction, and human computation is gaining momentum. The nexus of the above facilitates mass-scale users in receiving awareness about the surrounding and themselves. In this paper, an online platform for modeling and simulation (M&S) on demand is proposed. It allows an average technologist to capitalize on any acquired information and its analysis based on scientifically-founded predictions and extrapolations. The overall objective is achieved by leveraging open innovation in the form of crowd-sourcing along with clearly defined technical methodologies and social-network-based processes. The platform aims at connecting users, developers, researchers, passionate citizens, and scientists in a professional network and opens the door to collaborative and multidisciplinary innovations. An example of a domain-specific model of a pick and place machine illustrates how to employ the platform for technical innovation and collaboration.

It is an interesting paper but when speaking of integration of models the authors say:

The integration is performed in multiple manners. Multi-domain tools that become accessible from one common environment using the cloud-computing paradigm serve as a starting point. The next step of integration happens when various M&S execution semantics and models of computation (cf. Lee and Sangiovanni-Vincentelli 1998; Lee 2010) are merged and model transformations are performed.

That went by too quickly for me. You?

The question of effective semantic integration is an important one.

The U.S. federal government publishes enough data to map where some of the dark data is waiting to be found.

The good, bad or irrelevant data churned out every week makes the amount of effort required an ever-increasing barrier to its use by the public.

Perhaps that is by design?

What do you think?

November 10, 2012

The Music Encoding Conference 2013

Filed under: Modeling,Music,Text Encoding Initiative (TEI) — Patrick Durusau @ 12:29 pm

The Music Encoding Conference 2013

22-24 May, 2013
Mainz Academy for Literature and Sciences, Mainz, Germany

Important dates:
31 December 2012: Deadline for abstract submissions
31 January 2013: Notification of acceptance/rejection of submissions
21-24 May 2013: Conference
31 July 2013: Deadline for submission of full papers for conference proceedings
December 2013: Publication of conference proceedings

From the email announcement of the conference:

You are cordially invited to participate in the Music Encoding Conference 2013 – Concepts, Methods, Editions, to be held 22-24 May, 2013, at the Mainz Academy for Literature and Sciences in Mainz, Germany.

Music encoding is now a prominent feature of various areas in musicology and music librarianship. The encoding of symbolic music data provides a foundation for a wide range of scholarship, and over the last several years, has garnered a great deal of attention in the digital humanities. This conference intends to provide an overview of the current state of data modeling, generation, and use, and aims to introduce new perspectives on topics in the fields of traditional and computational musicology, music librarianship, and scholarly editing, as well as in the broader area of digital humanities.

As the conference has a dual focus on music encoding and scholarly editing in the context of the digital humanities, the Program Committee is also happy to announce keynote lectures by Frans Wiering (Universiteit Utrecht) and Daniel Pitti (University of Virginia), both distinguished scholars in their respective fields of musicology and markup technologies in the digital humanities.

Proposals for papers, posters, panel discussions, and pre-conference workshops are encouraged. Prospective topics for submissions include:

  • theoretical and practical aspects of music, music notation models, and scholarly editing
  • rendering of symbolic music data in audio and graphical forms
  • relationships between symbolic music data, encoded text, and facsimile images
  • capture, interchange, and re-purposing of music data and metadata
  • ontologies, authority files, and linked data in music encoding
  • additional topics relevant to music encoding and music editing

I know Daniel Pitti from the TEI (Text Encoding Initiative). His presence assures me this will be a great conference for markup, modeling and music enthusiasts.

I can recognize music because it comes in those little plastic boxes. 😉 If you want to talk about the markup/encoding/mapping side, ping me.

November 7, 2012

Data modeling … with graphs

Filed under: Data Models,Graphs,Modeling,Normalization — Patrick Durusau @ 1:30 pm

Data modeling … with graphs by Peter Bell.

Nothing surprising for topic map users but a nice presentation on modeling for graphs.

For Neo4j, unlike topic maps, you have to normalize your data before entering it into the graph.

That is if you want one node per subject.

Depends on your circumstances if that is worthwhile.

Amazing things have been done with normalized data in relational databases.

Assuming you want to pay the cost of normalization, which can include a lack of interoperability with others, errors in conversion, brittleness in the face of changing models, etc.
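
A hedged sketch of what "normalize before you load" means in practice (plain Python, no Neo4j driver, the records are invented): collapse records that identify the same subject before creating nodes, so you end up with one node per subject.

```python
# Incoming records may describe the same subject under different spellings;
# the records and the identifying key are invented.
records = [
    {"email": "ada@example.org",  "name": "Ada Lovelace"},
    {"email": "ada@example.org",  "name": "A. Lovelace"},
    {"email": "alan@example.org", "name": "Alan Turing"},
]

# Normalize: pick one identifying key (here, email) and merge duplicates,
# so the graph ends up with one node per subject.
nodes = {}
for rec in records:
    node = nodes.setdefault(rec["email"], {"names": set()})
    node["names"].add(rec["name"])

for email, node in nodes.items():
    print(email, sorted(node["names"]))
# A topic map engine would instead accept both records and merge them later
# on shared identity, which is where the trade-offs above come in.
```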

October 31, 2012

Make your own buckyball

Filed under: Geometry,Graphs,Mathematics,Modeling,Visualization — Patrick Durusau @ 1:05 pm

Make your own buckyball by John D. Cook.

From the post:

This weekend a couple of my daughters and I put together a buckyball from a Zometool kit. The shape is named for Buckminster Fuller of geodesic dome fame. Two years after Fuller’s death, scientists discovered that the shape appears naturally in the form of a C60 molecule, named Buckminsterfullerene in his honor. In geometric lingo, the shape is a truncated icosahedron. It’s also the shape of many soccer balls.

Don’t be embarrassed to use these at the office.

According to the PR, Roger Penrose does.
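
For the record, the counts work out: a truncated icosahedron has 12 pentagonal and 20 hexagonal faces, 60 vertices and 90 edges, and Euler's formula (V - E + F = 2) checks:

```python
# Truncated icosahedron (buckyball, C60, soccer ball) face counts.
pentagons, hexagons = 12, 20
faces = pentagons + hexagons                       # 32
edges = (5 * pentagons + 6 * hexagons) // 2        # each edge shared by 2 faces -> 90
vertices = (5 * pentagons + 6 * hexagons) // 3     # each vertex shared by 3 faces -> 60

print(vertices, edges, faces)             # 60 90 32
print(vertices - edges + faces == 2)      # True: Euler's formula holds
```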

October 13, 2012

Modeling Question: What Happens When Dots Don’t Connect?

Filed under: Associations,Modeling — Patrick Durusau @ 6:35 pm

Working with a data set, I have run across a different question than the vagueness/possibility of relationships. (See Topic Map Modeling of Sequestration Data (Help Pls!) if you want to help with that one.)

What if when analyzing the data I determine there is no association between two subjects?

I am assuming that if there is no association, there are no roles at play.

How do I record the absence of the association?

I don’t want to trust the next user will “notice” the absence of the association.
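
One possibility, offered as a hedged sketch rather than a settled pattern, is to make the negative finding a first-class assertion with its own provenance, so the next user sees a recorded "no association found" instead of silence:

```python
# Sketch: record negative findings explicitly instead of leaving a gap.
# The assertion type, identifiers and provenance fields are all invented.
assertions = [
    {
        "type": "no-association-found",
        "between": ("subject:xyz-corp", "subject:contract-123"),
        "scope": "analysis-2012-10",
        "basis": "reviewed filings 2008-2012, no link identified",
    },
]

def known_negative(a, b):
    """True if an explicit 'no association' assertion covers this pair."""
    pair = {a, b}
    return any(x["type"] == "no-association-found" and set(x["between"]) == pair
               for x in assertions)

print(known_negative("subject:xyz-corp", "subject:contract-123"))   # True
print(known_negative("subject:xyz-corp", "subject:contract-999"))   # False: truly unknown
```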

A couple of use cases come to mind:

I suspect there is an association but have no proof. The cheating husband/wife scenario. (I suppose there I would know the “roles.”)

What about corporations or large organizations? Allegations are made but no connection to identifiable actors.

Corporations act only through agents. A charge that names the responsible agents is different from a general allegation.

How do I distinguish those? Or make it clear no agent has been named?

Wouldn’t that be interesting?

We read now: XYZ corporation pleaded guilty to government contract fraud.

We could read: A, B, and C of XYZ corporation, and L, M, and N, government contract officers, managed the XYZ government contract. XYZ pleaded guilty to contract fraud and was fined $.

We could keep better score on private and public employees who keep turning up in contract fraud cases.

One test for transparency is accountability.

No accountability, no transparency.

October 4, 2012

PostgreSQL Database Modeler

Filed under: Database,Modeling,PostgreSQL — Patrick Durusau @ 2:22 pm

PostgreSQL Database Modeler

From the readme file at github:

PostgreSQL Database Modeler, or simply pgModeler, is an open source tool for modeling databases that merges the classical concepts of entity-relationship diagrams with specific features that only PostgreSQL implements. pgModeler translates the models created by the user to SQL code and applies them to database clusters from version 8.0 to 9.1.

Other modeling tools you have used, or are likely to encounter, writing topic maps?

When the output of diverse modeling tools or diverse output from the same modeling tool needs semantic reconciliation, I would turn to topic maps.

I first saw this at DZone.

September 29, 2012

Topic Map Modeling of Sequestration Data (Help Pls!)

Filed under: Data Models,Modeling,Topic Maps — Patrick Durusau @ 10:40 am

With the political noise in the United States over presidential and other elections, it is easy to lose sight of a looming “sequestration” that on January 2, 2013 will result in:

10.0% reduction non-exempt defense mandatory funding
9.4% reduction non-exempt defense discretionary funding
8.2% reduction non-exempt nondefense discretionary funding
7.6% reduction non-exempt nondefense mandatory funding
2.0% reduction Medicare

The report is not a model of clarity/transparency. See: U.S. Sequestration Report – Out of the Shadows/Into the Light?.

Report caveats make it clear cited amounts are fanciful estimates that can change radically as more information becomes available.

Be that as it may, a topic map based on the reported accounts as topics can capture the present day conjectures. To say nothing of capturing future revelations of exact details.

Whether from sequestration or from efforts to avoid sequestration.

Tracking/transparency has to start somewhere and it may as well be here.

In evaluating the data for creation of a topic map, I have encountered an entry with a topic map modeling issue.

I could really use your help.

Here is the entry in question:

Department of Health and Human Services, Health Resources and Services Administration, 009-15-0350, Health Resources and Services, Nondefense Function, Mandatory (page 80 of Appendix A, page 92 of the pdf of the report):

BA Type                          BA Amount   Sequester Percentage   Sequester Amount
Sequestrable BA                        514                    7.6                 39
Sequestrable BA – special rule        1352                    2.0                 27
Exempt BA                               10
Total Gross BA                        1876
Offsets                                -16
Net BA                                1860

If it read as follows, no problem.

Example: Not Accurate

BA Type                          BA Amount   Sequester Percentage   Sequester Amount
Sequestrable BA                        514                    7.6                 39
Sequestrable BA – special rule        1352                    2.0                 27
Total Gross BA                        1876

In that version there is no relationship between “Exempt BA” or “Offsets” and either “Sequestrable BA” or “Sequestrable BA – special rule” to account for. I would simply report both sequestrable lines with their percentages and the amounts to be withheld.

True, the percentages don’t change, nor does the amount to be withheld change, because of the “Exempt BA” or the “Offsets.” (Trusting soul that I am, I did verify the calculations. 😉 )

Problem: How do I represent the relationship of “Exempt BA” and “Offsets” to either or both of “Sequestrable BA” and “Sequestrable BA – special rule”?

Of the 1,318 entries in Appendix A of this report, this is the only entry with this issue. (A number of accounts are split into discretionary/mandatory parts. I am counting each part as a separate “entry.”)

If I ignore “Exempt BA” and “Offsets” in this case, my topic map is an incomplete representation of Appendix A.

It is also the case that I want to represent the information “as written.” There may be some external explanation that clarifies this entry, but that would be an “addition” to the original topic map.
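
For discussion purposes, here is one hedged sketch (structure and field names invented) that keeps the entry "as written": each line becomes data on the account, and the unanswered question about "Exempt BA" and "Offsets" is recorded explicitly instead of being papered over with an invented association.

```python
# Sketch only: capture the entry as written and flag the modeling question.
# The structure and field names are invented for illustration.
account = {
    "id": "009-15-0350",
    "name": "Health Resources and Services",
    "lines": [
        {"ba_type": "Sequestrable BA",                "amount": 514,  "pct": 7.6, "sequester": 39},
        {"ba_type": "Sequestrable BA - special rule", "amount": 1352, "pct": 2.0, "sequester": 27},
        {"ba_type": "Exempt BA",                      "amount": 10},
        {"ba_type": "Total Gross BA",                 "amount": 1876},
        {"ba_type": "Offsets",                        "amount": -16},
        {"ba_type": "Net BA",                         "amount": 1860},
    ],
    "open_questions": [
        "Relationship of 'Exempt BA' and 'Offsets' to the sequestrable lines is not stated in the report.",
    ],
}

# Consistency check against the report's own arithmetic.
amounts = {l["ba_type"]: l["amount"] for l in account["lines"]}
assert (amounts["Sequestrable BA"]
        + amounts["Sequestrable BA - special rule"]
        + amounts["Exempt BA"]) == amounts["Total Gross BA"]
assert amounts["Total Gross BA"] + amounts["Offsets"] == amounts["Net BA"]
```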

Suggestions?

September 8, 2012

“how hard can this be?” (Data and Reality)

Filed under: Design,Modeling,Subject Identity — Patrick Durusau @ 2:07 pm

Books that Influenced my Thinking: Kent’s Data and Reality by Thomas Redman.

From the post:

It was the rumor that Steve Hoberman (Technics Publications) planned to reissue Data and Reality by William Kent that led me to use this space to review books that had influenced my thinking about data and data quality. My plan had been to do the review of Data and Reality as soon as it came out. I completely missed the boat – it has been out for some six months.

I first read Data and Reality as we struggled at Bell Labs to develop a definition of data that would prove useful for data quality. While I knew philosophers had debated the merits of various approaches for thousands of years, I still thought “how hard can this be?” About twenty minutes with Kent’s book convinced me. This is really tough.
….

Amazon reports Data and Reality (3rd edition) as 200 pages long.

Looking at a hard copy I see:

  • Prefaces 17-34
  • Chapter 1 Entities 35-54
  • Chapter 2 The Nature of an Information System 55-67
  • Chapter 3 Naming 69-86
  • Chapter 4 Relationships 87-98
  • Chapter 5 Attributes 99-107
  • Chapter 6 Types and Categories and Sets 109-117
  • Chapter 7 Models 119-123
  • Chapter 8 The Record Model 125-137
  • Chapter 9 Philosophy 139-150
  • Bibliography 151-159
  • Index 161-162

Way less than the 200 pages promised by Amazon.

To ask a slightly different question:

“How hard can it be” to teach building data models?

A hard problem with no fixed solution?

Suggestions?

August 2, 2012

Does category theory make you a better programmer?

Filed under: Category Theory,Modeling — Patrick Durusau @ 11:00 am

Does category theory make you a better programmer? by Debasish Ghosh.

From the post:

How much of category theory knowledge should a working programmer have? I guess this depends on what kind of language the programmer uses in his daily life. Given the proliferation of functional languages today, specifically typed functional languages (Haskell, Scala etc.) that embed the typed lambda calculus in some form or the other, the question looks relevant to me. And apparently to a few others as well. In one of his courses on Category Theory, Graham Hutton mentioned the following points when talking about the usefulness of the theory:

  • Building bridges—exploring relationships between various mathematical objects, e.g., Products and Function
  • Unifying ideas – abstracting from unnecessary details to give general definitions and results, e.g., Functors
  • High level language – focusing on how things behave rather than what their implementation details are e.g. specification vs implementation
  • Type safety – using types to ensure that things are combined only in sensible ways e.g. (f: A -> B g: B -> C) => (g o f: A -> C)
  • Equational proofs—performing proofs in a purely equational style of reasoning

Many of the above points can be related to the experience that we encounter while programming in a functional language today. We use Product and Sum types, we use Functors to abstract our computation, we marry types together to encode domain logic within the structures that we build and many of us use equational reasoning to optimize algorithms and data structures.
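
A small, hedged illustration of two of Hutton's points (type safety of composition, and functors as structure-preserving maps) in Python with type hints rather than Haskell or Scala:

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

def compose(g: Callable[[B], C], f: Callable[[A], B]) -> Callable[[A], C]:
    """(f: A -> B, g: B -> C) => (g o f: A -> C)."""
    return lambda a: g(f(a))

def fmap(f: Callable[[A], B], xs: List[A]) -> List[B]:
    """List as a functor: apply f inside the structure, keep the shape."""
    return [f(x) for x in xs]

length: Callable[[str], int] = len
is_even: Callable[[int], bool] = lambda n: n % 2 == 0

print(compose(is_even, length)("abcd"))               # True: str -> int -> bool
print(fmap(compose(is_even, length), ["a", "ab"]))    # [False, True]
```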

But how much do we need to care about how category theory models these structures and how that model maps to the ones that we use in our programming model ?

Read the post for Debasish’s answer for programmers.

For topic map authors, remember category theory began as an effort to find commonalities between abstract mathematical structures.

Commonalities? That sounds a lot like subject sameness doesn’t it?

With category theory you can describe, model, uncover commonalities in mathematical structures and commonalities in other areas as well.

A two for one as it were. Sounds worthwhile to me.

I first saw this at DZone.

July 31, 2012

Ignorance by Stuart Firestein; It’s Not Rocket Science by Ben Miller – review

Filed under: Knowledge,Modeling — Patrick Durusau @ 7:13 am

Ignorance by Stuart Firestein; It’s Not Rocket Science by Ben Miller – review by Adam Rutherford

From the review, speaking of “Ignorance” by Stuart Firestein, Adam writes:

Stuart Firestein, a teacher and neuroscientist, has written a splendid and admirably short book about the pleasure of finding things out using the scientific method. He smartly outlines how science works in reality rather than in stereotype. His MacGuffin – the plot device to explore what science is – is ignorance, on which he runs a course at Columbia University in New York. Although the word “science” is derived from the Latin scire (to know), this misrepresents why it is the foundation and deliverer of civilisation. Science is to not know but have a method to find out. It is a way of knowing.

Firestein is also quick to dispel the popular notion of the scientific method, more often than not portrayed as a singular thing enshrined in stone. The scientific method is more of a utility belt for ignorance. Certainly, falsification and inductive reasoning are cornerstones of converting unknowns to knowns. But much published research is not hypothesis-driven, or even experimental, and yet can generate robust knowledge. We also invent, build, take apart, think and simply observe. It is, Firestein says, akin to looking for a black cat in a darkened room, with no guarantee the moggy is even present. But the structure of ignorance is crucial, and not merely blind feline fumbling.

The size of your questions is important, and will be determined by how much you know. Therein lies a conundrum of teaching science. Questions based on pure ignorance can be answered with knowledge. Scientific research has to be born of informed ignorance, otherwise you are not finding new stuff out. Packed with real examples and deep practical knowledge, Ignorance is a thoughtful introduction to the nature of knowing, and the joy of curiosity.

Not to slight “It’s Not Rocket Science,” but I am much more sympathetic to discussions of the “…structure of ignorance…” and how we model those structures.

If you are interested in such arguments, consider the Oxford Handbook of Skepticism. I don’t have a copy (you can fix that if you like) but it is reported to have good coverage of the subject of ignorance.

July 24, 2012

Cambridge Advanced Modeller (CAM)

Filed under: Cambridge Advanced Modeler (CAM),Modeling — Patrick Durusau @ 6:46 pm

Cambridge Advanced Modeller (CAM)

From the webpage:

Cambridge Advanced Modeller is a software tool for modelling and analysing the dependencies and flows in complex systems – such as products, processes and organisations. It provides a diagrammer, a simulation tool, and a DSM tool.

CAM is free for research, teaching and evaluation. We only require that you cite our work if you use CAM in support of published work. Commercial evaluation is allowed. Commercial use is subject to non-onerous conditions.

Toolboxes provide several modelling notations and analysis methods. CAM can be configured to develop new modelling notations by specifying the types of element and connection allowed. A modular architecture allows new functionality, such as simulation codes, to be added.

One of the research toolboxes is topic maps! Cool!

Have you used CAM?

July 23, 2012

Wrinkling Time

Filed under: Modeling,Time,Timelines,Topic Maps — Patrick Durusau @ 6:25 pm

The post by Dan Brickley that I mentioned earlier today, Dilbert schematics, made me start thinking about more complex time scenarios than serial assignment of cubicles.

Like Hermione Granger and Harry Potter’s adventure in the Prisoner of Azkaban.

For those of you who are vague on the story, Hermione uses a “Time-Turner” to go back in time several hours. As a result, she and Harry must avoid being seen by themselves (and others). Works quite well in the story but what if I wanted to model that narrative in a topic map?

Some issues/questions that occurred to me:

Harry and Hermione are the same subjects they were during the prior time interval. Or are they?

Does a linear notion of time mean they are different subjects?

How would I model their interactions with others? Such as Buckbeak? Who interacted with both versions (for lack of a better term) of Harry?

Is there a time line running parallel to the “original” time line?

Just curious, what happens if the Time-Turner fails and Harry and Hermione don’t return to the present, ever? That is, their “current” present is forever 3 hours behind their “real” present.
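
One hedged way to start (names and structure invented, not a worked-out topic map pattern): keep Hermione as a single subject and scope each appearance by a timeline, so the two simultaneous Hermiones are two time-scoped views of one topic rather than two topics.

```python
# Sketch: one subject, several time-scoped appearances; all values invented.
hermione = {
    "subject": "Hermione Granger",
    "appearances": [
        {"timeline": "original",    "interval": ("20:00", "23:00"), "location": "hospital wing"},
        {"timeline": "time-turned", "interval": ("20:00", "23:00"), "location": "grounds, near Buckbeak"},
    ],
}

def present_at(topic, t):
    """All appearances of the subject that cover clock time t."""
    return [a for a in topic["appearances"]
            if a["interval"][0] <= t <= a["interval"][1]]

print(len(present_at(hermione, "21:00")))   # 2: the same subject, twice at once
```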

What other time issues, either in literature or elsewhere seem difficult to model to you?

June 27, 2012

neo4j: Handling optional relationships

Filed under: Modeling,Neo4j — Patrick Durusau @ 12:58 pm

neo4j: Handling optional relationships by Mark Needham.

From the post:

On my ThoughtWorks neo4j there are now two different types of relationships between people nodes – they can either be colleagues or one can be the sponsor of the other.

Getting the information/relationships “in” wasn’t a problem. Getting the required information back out, that was a different story.

A useful illustration of how establishing the desired result (output in this case) can clarify what needs to be asked.

Don’t jump to the solution. Read the post and write down how you would get the desired results.

I first saw this at DZone’s Neo4j page.
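
If you want to feel the shape of the problem before working through Mark's solution, here is a hedged plain-Python sketch (not the Cypher from his post): every person has colleagues, only some have a sponsor, and the query must not drop the unsponsored ones.

```python
# Toy data (invented): every person has colleagues, only some have a sponsor.
colleagues = {"mark": ["jim", "carl"], "jim": ["mark"], "carl": ["mark"]}
sponsor_of = {"jim": "mark"}   # mark sponsors jim; nobody sponsors mark or carl

def person_summary(name):
    """Colleagues plus the sponsor if present -- the 'optional' part."""
    return {
        "name": name,
        "colleagues": colleagues.get(name, []),
        "sponsor": sponsor_of.get(name),   # None when the relationship is absent
    }

for person in ("mark", "jim"):
    print(person_summary(person))
# {'name': 'mark', 'colleagues': ['jim', 'carl'], 'sponsor': None}
# {'name': 'jim', 'colleagues': ['mark'], 'sponsor': 'mark'}
```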

June 6, 2012

How Do You Define Failure?

Filed under: Modeling,Requirements — Patrick Durusau @ 7:48 pm

… business intelligence implementations are often called failures when they fail to meet the required objectives, lack user acceptance or are only implemented after numerous long delays.

Called failures? Sounds like failures to me. You?

News: The cause of such failures has been discovered:

…an improperly modeled repository not adhering to basic dimensional modeling principles

Really?

I would have said that not having a shared semantic, one shared by all the stakeholders in the project, would be the root cause for most project failures.

I’m not particular about how you achieve that shared semantic. You could use white boards, sticky notes or have people physically act out the system. The important thing being to avoid the assumption that other stakeholders “know what I mean by….” They probably don’t. And several months into building of data structures, interfaces, etc., is a bad time to find out you assumed incorrectly.

The lack of a shared semantic can result in an “…improperly modeled repository…” but that is much later in the process.

Quotes from: Oracle Expert Shares Implementation Key

May 25, 2012

Role Modeling

Filed under: Modeling,Roles — Patrick Durusau @ 4:12 am

Role Modeling

From the webpage:

Roles are about objects and how they interact to achieve some purpose. For thirty years I have tried to get them into the mainstream, but haven’t succeeded. I believe the reason is that our programming languages are class oriented rather than object oriented. So why model in terms of objects when you cannot program them?

Almost all my documents are about role modeling in one form or another. There are two very useful abstractions on objects. One abstraction classifies objects according to their properties. The other studies how objects work together to achieve one or more of the users’ goals. I have for the past 30 years tried to make our profession aware of this important dichotomy, but have met with very little success. The Object Management Group (OMG) has standardized the Unified Modeling Language, UML. We were members of the core team defining this language and our role modeling became part of the language under the name of Collaborations. Initially, very few people seemed to appreciate the importance of the notion of Collaborations. I thought that this would change when Ivar Jacobson came out with his Use Cases because a role model shows how a system of interacting objects realizes a use case, but it is still heavy going. There are encouraging signs in the concept of Components in the emerging UML version 2.0. Even more encouraging is the ongoing work with Web Services where people and components are in the center of interest while classes are left to the specialists. My current project, BabyUML, binds it all together: algorithms coded as classes + declaration of semantic model + coding of object interaction as collaborations/role models.

The best reference is my book Working With Objects. Out of print, but is still available from some bookshops including Amazon as of January 2010.

You can download the pdf of Working with Objects (version before publication). A substantial savings over the Amazon “new” price of $100+ US.

This webpage has links to a number of resources from Trygve M. H. Reenskaug on role modeling.
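
As a hedged toy rendering of the "objects collaborate by playing roles" idea (my sketch in Python, not Reenskaug's BabyUML or UML Collaborations): the roles exist only for the duration of the collaboration, not as classes of the participating objects.

```python
# Two plain account objects; the 'source' and 'sink' roles exist only for
# the duration of the money-transfer collaboration, not as classes.
class Account:
    def __init__(self, name, balance):
        self.name, self.balance = name, balance

def transfer(source, sink, amount):
    """Collaboration: 'source' and 'sink' are roles played by ordinary objects."""
    if source.balance < amount:
        raise ValueError("insufficient funds")
    source.balance -= amount
    sink.balance += amount

checking, savings = Account("checking", 100), Account("savings", 20)
transfer(source=checking, sink=savings, amount=30)
print(checking.balance, savings.balance)   # 70 50
```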

I saw this reference in a tweet by Inge Henriksen.

March 31, 2012

Automated science, deep data and the paradox of information – Data As Story

Filed under: BigData,Epistemology,Information Theory,Modeling,Statistics — Patrick Durusau @ 4:09 pm

Automated science, deep data and the paradox of information…

Bradley Voytek writes:

A lot of great pieces have been written about the relatively recent surge in interest in big data and data science, but in this piece I want to address the importance of deep data analysis: what we can learn from the statistical outliers by drilling down and asking, “What’s different here? What’s special about these outliers and what do they tell us about our models and assumptions?”

The reason that big data proponents are so excited about the burgeoning data revolution isn’t just because of the math. Don’t get me wrong, the math is fun, but we’re excited because we can begin to distill patterns that were previously invisible to us due to a lack of information.

That’s big data.

Of course, data are just a collection of facts; bits of information that are only given context — assigned meaning and importance — by human minds. It’s not until we do something with the data that any of it matters. You can have the best machine learning algorithms, the tightest statistics, and the smartest people working on them, but none of that means anything until someone makes a story out of the results.

And therein lies the rub.

Do all these data tell us a story about ourselves and the universe in which we live, or are we simply hallucinating patterns that we want to see?

I reformulate Bradley’s question into:

We use data to tell stories about ourselves and the universe in which we live.

Which means that his rules of statistical methods:

  1. The more advanced the statistical methods used, the fewer critics are available to be properly skeptical.
  2. The more advanced the statistical methods used, the more likely the data analyst will be to use math as a shield.
  3. Any sufficiently advanced statistics can trick people into believing the results reflect truth.

are sources of other stories “about ourselves and the universe in which we live.”

If you prefer Bradley’s original question:

Do all these data tell us a story about ourselves and the universe in which we live, or are we simply hallucinating patterns that we want to see?

I would answer: And the difference would be?

March 11, 2012

“All Models are Right, Most are Useless”

Filed under: Modeling,Regression,Statistics — Patrick Durusau @ 8:09 pm

“All Models are Right, Most are Useless”

A counter by Thad Tarpey to George Box’s saying: “all models are wrong, some are useful.” Pointer to slides for the presentation.

Covers the fallacy of “reification” (in the modeling sense) among other amusements.

Useful to remember that maps are approximations as well.

January 29, 2012

Munging, Modeling and Visualizing Data with R

Filed under: Data,Modeling,R,Visualization — Patrick Durusau @ 9:17 pm

Munging, Modeling and Visualizing Data with R by Xavier Léauté.

With a title like that, how could I resist?

From the post:

Yesterday evening Romy Misra from visual.ly invited us to teach an introductory workshop to R for the San Francisco Data Mining meetup. Todd Holloway was kind enough to host the event at Trulia headquarters.

R can be a little daunting for beginners, so I wanted to give everyone a quick overview of its capabilities and enough material to get people started. Most importantly, the objective of this interactive session was to give everyone some time to try out some simple examples that would be useful in the future.

I hope everyone enjoyed learning some fun and easy ways to slice, model and visualize data, and that I piqued their interest enough to start exploring datasets on their own.

Slides and sample scripts follow.

First seen at Christophe Lalanne’s Bag of Tweets for January 2012.

December 14, 2011

A Task-based Model of Search

Filed under: Modeling,Search Behavior,Search Interface,Searching — Patrick Durusau @ 7:46 pm

A Task-based Model of Search by Tony Russell-Rose.

From the post:

A little while ago I posted an article called Findability is just So Last Year, in which I argued that the current focus (dare I say fixation) of the search community on findability was somewhat limiting, and that in my experience (of enterprise search, at least), there are a great many other types of information-seeking behaviour that aren’t adequately accommodated by the ‘search as findability’ model. I’m talking here about things like analysis, sensemaking, and other problem-solving oriented behaviours.

Now, I’m not the first person to have made this observation (and I doubt I’ll be the last), but it occurs to me that one of the reasons the debate exists in the first place is that the community lacks a shared vocabulary for defining these concepts, and when we each talk about “search tasks” we may actually be referring to quite different things. So to clarify how I see the landscape, I’ve put together the short piece below. More importantly, I’ve tried to connect the conceptual (aka academic) material to current design practice, so that we can see what difference it might make if we had a shared perspective on these things. As always, comments & feedback welcome.

High marks for a start on what are complex and intertwined issues.

Not so much that we will reach a common vocabulary but so we can be clearer about where we get confused when moving from one paradigm to another.

November 30, 2011

Model Thinking

Filed under: CS Lectures,Modeling — Patrick Durusau @ 8:35 pm

Model Thinking by Scott E. Page.

Marijane sent this link in a comment to my post on Stanford classes.

From the class description:

We live in a complex world with diverse people, firms, and governments whose behaviors aggregate to produce novel, unexpected phenomena. We see political uprisings, market crashes, and a never ending array of social trends. How do we make sense of it?

Models. Evidence shows that people who think with models consistently outperform those who don’t. And, moreover people who think with lots of models outperform people who use only one.

Why do models make us better thinkers?

Models help us to better organize information – to make sense of that fire hose or hairball of data (choose your metaphor) available on the Internet. Models improve our abilities to make accurate forecasts. They help us make better decisions and adopt more effective strategies. They even can improve our ability to design institutions and procedures.

In this class, I present a starter kit of models: I start with models of tipping points. I move on to cover models that explain the wisdom of crowds, models that show why some countries are rich and some are poor, and models that help unpack the strategic decisions of firms and politicians.

The models covered in this class provide a foundation for future social science classes, whether they be in economics, political science, business, or sociology. Mastering this material will give you a huge leg up in advanced courses. They also help you in life.

Here’s how the course will work.

For each model, I present a short, easily digestible overview lecture. Then, I’ll dig deeper. I’ll go into the technical details of the model. Those technical lectures won’t require calculus but be prepared for some algebra. For all the lectures, I’ll offer some questions and we’ll have quizzes and even a final exam. If you decide to do the deep dive, and take all the quizzes and the exam, you’ll receive a certificate of completion. If you just decide to follow along for the introductory lectures to gain some exposure that’s fine too. It’s all free. And it’s all here to help make you a better thinker!

Hope you can join the course this January.

As Marijane says, “…awfully relevant to Topic Maps!”
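
To get a feel for what "thinking with models" looks like in code, here is a hedged toy threshold model of a tipping point (my sketch, not taken from the course materials): each person adopts once the share of adopters reaches their personal threshold, and one person's threshold can decide whether a cascade tips or stalls.

```python
# Toy threshold model: person i adopts once the share of adopters reaches
# their personal threshold (the threshold values are invented).
def run(thresholds):
    adopted = [t == 0.0 for t in thresholds]      # seed: zero-threshold adopters
    changed = True
    while changed:
        changed = False
        share = sum(adopted) / len(adopted)
        for i, t in enumerate(thresholds):
            if not adopted[i] and share >= t:
                adopted[i] = True
                changed = True
    return sum(adopted)

print(run([i / 10 for i in range(10)]))                          # 10: one seed tips everyone
print(run([0.0, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]))   # 1: the cascade stalls
```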

November 22, 2011

Modelling with Graphs

Filed under: Data Models,Graphs,Modeling — Patrick Durusau @ 6:59 pm

Modelling with Graphs by Alistair Jones at NoSQL Br 2011

From the description:

Neo4j is a powerful and expressive tool for storing, querying and manipulating data. However modelling data as graphs is quite different from modelling data with relational databases. In this talk we’ll cover modelling business domains using graphs and show how they can be persisted and queried in the popular open source graph database Neo4j. We’ll contrast this approach with the relational model, and discuss the impact on complexity, flexibility and performance. We’ll also discuss strategies for deciding how to proceed when a graph allows multiple ways to represent the same concept, and explain the trade-offs involved. As a side-effect, we’ll examine some of the new tools for how to query graph data in Neo4j, and discuss architectures for using Neo4j in enterprise applications.

Alistair is a Software Engineer with Neo Technology, the company behind the popular open source graph database Neo4j.

Alistair has extensive experience as a developer, technical lead and architect for teams building enterprise software across a range of industries. He has a particular focus on Domain-Driven Design, and is an expert on Agile methodologies. Alistair often writes and presents on applying Agile principles to the discipline of performance testing.

Excellent presentation!

Anyone care to suggest a book on modeling or modeling with graphs?

