Archive for the ‘MarkLogic’ Category

MarkLogic and Intel – “government-grade security” – Err, thanks but no thanks.

Wednesday, September 27th, 2017

Big Data Solutions for Government Agencies—MarkLogic and Intel

I thought you might appreciate the hyperbole in this marketing fluff from Intel:

This paper summarizes the issues government agencies face today with relational database management system (RDBMS) + storage area network (SAN) data environments and why the combination of MarkLogic, Apache Hadoop*, and Intel provides a government-grade solution for big data. Running on Intel® technology and the enhancements Intel has brought to Apache Hadoop, this integration gives public agencies a true enterprise-class big data solution with government-grade security for storage, real-time queries, and analysis of all their data. (emphasis added)

Really? “…government-grade security….”

Do they mean like the CIA (Aldrich Ames), NSA (Snowden), Office of Personnel Management (OPM), that sort of “…government-grade security….?”

You could have quantum level encryption and equally secure software, but when you add users:

Your new state of cybersecurity.

Discussing security without accounting for your users isn’t meaningful. Don’t lose money to consultants and then to hackers as well. The meaningful question is how secure is system X with my users? Ask that and judge vendors by their answers.

An XQuery Module For Simplifying Semantic Namespaces

Wednesday, December 23rd, 2015

An XQuery Module For Simplifying Semantic Namespaces by Kurt Cagle.

From the post:

While I enjoy working with the MarkLogic 8 server, there are a number of features about the semantics library there that I still find a bit problematic. Declaring namespaces for semantics in particular is a pain—I normally have trouble remembering the namespaces for RDF or RDFS or OWL, even after working with them for several years, and once you start talking about namespaces that are specific to your own application domain, managing this list can get onerous pretty quickly.

I should point out however, that namespaces within semantics can be very useful in helping to organize and design an ontology, even a non-semantic ontology, and as such, my applications tend to be namespace rich. However, when working with Turtle, Sparql, RDFa, and other formats of namespaces, the need to incorporate these namespaces can be a real showstopper for any developer. Thus, like any good developer, I decided to automate my pain points and create a library that would allow me to simplify this process.

The code given here is in turtle and xquery, but I hope to build out similar libraries for use in JavaScript shortly. When I do, I’ll update this article to reflect those changes.

If you are forced to use a MarkLogic 8 server, great post on managing semantic namespaces.

If you have a choice of tools, something to consider before you willingly choose to use a MarkLogic 8 server.
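The general idea of automating namespace bookkeeping is not tied to MarkLogic. Here is a minimal sketch in Python, not Kurt's XQuery module: the registry, the function names, and the choice of prefixes are my own assumptions for illustration. It expands a prefixed name into a full IRI and generates the PREFIX lines a SPARQL query needs.

```python
# Hypothetical prefix registry. The names below are illustrative and are
# not part of MarkLogic's semantics library or of the module in the post.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def expand(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as 'rdf:type' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def sparql_prologue(prefixes=PREFIXES):
    """Build the PREFIX declarations to prepend to a SPARQL query."""
    return "\n".join(
        f"PREFIX {name}: <{iri}>" for name, iri in sorted(prefixes.items())
    )


if __name__ == "__main__":
    print(expand("rdfs:label"))
    print(sparql_prologue())
```

Keeping the prefixes in one table means an application-specific namespace is declared once and reused everywhere, which is the pain point the post describes.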

I first saw this in a tweet by XQuery.

A Virtual Database between MongoDB, ElasticSearch, and MarkLogic

Monday, May 18th, 2015

A Virtual Database between MongoDB, ElasticSearch, and MarkLogic by William Candillon.

From the post:

Virtual Databases enable developers to write applications regardless of the underlying database technologies. We recently updated a database infrastructure from MongoDB and ElasticSearch to MarkLogic without touching the codebase.

We just flipped a switch. We updated the database infrastructure of an application (20k LOC) from MongoDB and Elasticsearch to MarkLogic without changing a single line of code.

Earlier this year, we published a tutorial that shows how the 28msec query technology can enable developers to write applications regardless of the underlying database technology. Recently, we had the opportunity to put it to the test on both a real world use case and a substantial codebase.

At 28msec, we have designed and implemented an open source modern data warehouse called CellStore. Whereas traditional data warehousing solutions can only support hundreds of fixed dimensions and thus need to ETL the data to analyze, cell stores support an unbounded number of dimensions. Our implementation of the cell store paradigm is around 20k lines of JSONiq queries. Originally the implementation was running on top of MongoDB and Elasticsearch.


Impressive work and it merits a separate post on the underlying technology, CellStore.

MarkLogic® 8…

Tuesday, November 18th, 2014

MarkLogic® 8 Evolves Database Technology to Solve Heterogeneous Data Integration Problems with the Power of Search, Semantics and Bitemporal Features All in One System

From the post:

MarkLogic Corporation, the leading Enterprise NoSQL database platform provider, today announced the availability of MarkLogic® Version 8 Early Access Edition. MarkLogic 8 brings together advanced search, semantics, bitemporal and native JavaScript support into one powerful, agile and trusted database platform. Companies can now:

  • Get better answers faster through integrated search and query of all of their data, metadata, and relationships, regardless of the data type or source;
  • Lower costs and increase agility by easily integrating heterogeneous data, including relational, unstructured, and richly structured data, across silos and at massive scale;
  • Rapidly build production-ready applications in weeks versus months or years to address the needs of the business or organization.

For enterprise customers who value agility but can’t compromise on resiliency, MarkLogic software is the only database platform that integrates Google-like search with rich query and semantics into an intelligent and extensible data layer that works equally well in a data center or in the cloud. Unlike other NoSQL solutions, MarkLogic provides ACID transactions, HA, DR, and other hardened features that enterprises require, along with the scalability and agility they need to accelerate their business.

“As more complex data, much of it semi-structured, becomes increasingly important to businesses’ daily operations, enterprises are realizing that they must look beyond relational databases to help them understand, integrate, and manage all of their data, deriving maximum value in a simple, yet sophisticated manner,” said Carl Olofson, research vice president at IDC. “MarkLogic has a history of bringing advanced data management technology to market and many of their customers and partners are accustomed to managing complex data in an agile manner. As a result, they have a more mature and creative view of how to manage and use data than do mainstream database users. MarkLogic 8 offers some very advanced tools and capabilities, which could expand the market’s definition of enterprise database technology.”

I’m not in the early release program but if you are, heads up!

By “semantics,” MarkLogic means RDF triples and the ability to query those triples with text, values, etc.

Since we can all see triples, text and values with different semantics, your semantic mileage with MarkLogic may vary greatly.

Free MarkLogic Classes

Saturday, May 24th, 2014

Free MarkLogic Classes

From the webpage:

MarkLogic University offers FREE publicly scheduled instructor led courses! Here’s how it works:

  • Sign up for any public class listed below by paying the Booking Fee
  • Once you have completed the course the Booking Fee will be fully refunded within 7 business days
  • If you register for the class but do not attend you will forfeit your Booking Fee
  • If you have any questions please contact

Vendor specific I know but you can’t argue with the pricing scheme. If anything, it should help encourage you to attend and complete the classes.

If you take one (or more) of these courses, please comment or send me a private message. Thanks!

Apache MarkMail

Friday, March 14th, 2014

Apache MarkMail

Just in case you don’t have your own index of the 10+ million messages in Apache mailing list archives, this is the site for you.


I ran across it today while debugging an error in a Solr config file.

If I could add one thing to MarkMail it would be software release date facets. Posts are not limited by release dates but I suspect a majority of posts between release dates are about the current release. Enough so that I would find it a useful facet.
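To make the suggestion concrete, here is a rough sketch of how a release date facet could be computed. Everything in it is an assumption on my part: the release table is made up (these are not real Solr release dates) and the function names are hypothetical. Each post is assigned to the most recent release on or before its date.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date

# Made-up release history for illustration only, sorted by date.
RELEASES = [
    ("1.0", date(2010, 1, 1)),
    ("2.0", date(2011, 6, 1)),
    ("3.0", date(2013, 3, 1)),
]


def release_facet(posted, releases=RELEASES):
    """Return the label of the latest release on or before `posted`."""
    dates = [released for _, released in releases]
    i = bisect_right(dates, posted)
    return "pre-release" if i == 0 else releases[i - 1][0]


def facet_counts(post_dates, releases=RELEASES):
    """Count posts per release window, as a facet sidebar might show."""
    return Counter(release_facet(d, releases) for d in post_dates)


if __name__ == "__main__":
    posts = [date(2010, 5, 1), date(2012, 1, 1), date(2012, 2, 1)]
    print(facet_counts(posts))
```

The facet is only a heuristic, for the reason already given: a post between two releases is probably, not certainly, about the current one.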


Free MarkLogic Courses?

Monday, February 10th, 2014

MarkLogic Announces Free NoSQL Database Training Courses

From the post:

MarkLogic Corporation, the leading Enterprise NoSQL database platform company, today announced the schedule for its MarkLogic University public courses with hands-on instruction to attending users and developers free of charge. The courses are led by an instructor in various live, online, and classroom locations, and provide MarkLogic customers and developers with the training to optimize their NoSQL database deployments and the education to develop applications on the MarkLogic database.

Since 2001, MarkLogic has focused on providing a powerful and trusted Enterprise NoSQL database platform that empowers organizations to turn all data into valuable and actionable information. The MarkLogic University program was created to give customers the access to best practices for managing vast amounts of diverse data. Now project managers, architects, developers, testers, and administrators can improve their MarkLogic skills with no cost training.

“The demand for MarkLogic development and administration skills is increasing in the market and with a sharp focus on customer success, we are dedicated to providing easy access to information and education that will assist developers and IT professionals to better manage and do more with their data,” said Jon Bakke, senior vice president, global technical services, MarkLogic. “By making MarkLogic training resources widely available, we are helping to build up much-needed technical skills that enterprises need to derive value from the vast amounts of enterprise data that is being created and stored today.”

But, when I visit:

I see refundable booking fees. (As of 10 February 2014 at 15:00 EST.)

Nor could I find a statement by MarkLogic on its blog or pressroom confirming free classes.

I have seen this at several sources and suggest further inquiry before anyone gets too excited.

Generating an XML test corpus

Sunday, February 9th, 2014

Generating an XML test corpus by Anthony Coates.

From the post:

My current role requires me to work with the MarkLogic NoSQL database. I’ve had some experience with it in the past, if not as much as I would have liked to have had.

Compared to relational databases, “document databases” like MarkLogic have the advantage that content is stored in a denormalised “document” format. If you have your data denormalised appropriately into documents, such that each query requires only a single document, then the database gives its optimum performance. With relational databases, there’s generally no way to avoid having some joins in queries, even if some of the data is denormalised into tables.

Anthony is an old hand with XML and has started a new blog.

I am particularly interested in Anthony’s questions about linking documents, denormalizing data, to say nothing of generating the test corpus.
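For a feel of what generating such a corpus involves, here is a small sketch using only the Python standard library. It is not Anthony's generator: the element names, the vocabulary, and the function names are invented for illustration. Each generated order is denormalised, with the customer and line items embedded so that a query needs only the one document.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical vocabulary; names and values are invented for illustration.
CUSTOMERS = ["Acme", "Globex", "Initech"]
PRODUCTS = ["widget", "gadget", "gizmo"]


def make_order(order_id, rng):
    """Build one denormalised order with customer and items embedded."""
    order = ET.Element("order", id=str(order_id))
    ET.SubElement(order, "customer").text = rng.choice(CUSTOMERS)
    items = ET.SubElement(order, "items")
    for _ in range(rng.randint(1, 4)):
        item = ET.SubElement(items, "item", qty=str(rng.randint(1, 9)))
        item.text = rng.choice(PRODUCTS)
    return order


def make_corpus(n, seed=0):
    """Return n serialised XML documents; a fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    return [
        ET.tostring(make_order(i, rng), encoding="unicode") for i in range(n)
    ]


if __name__ == "__main__":
    for doc in make_corpus(3):
        print(doc)
```

A seeded generator matters for testing: the same corpus can be rebuilt on demand, so query results are comparable from one run to the next.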

I signed up for the RSS feed but don’t depend on me to mention every post. 😉

MarkLogic Rolls Out the Red Carpet for…

Thursday, October 10th, 2013

MarkLogic Rolls Out the Red Carpet for Semantic Triples by Alex Woodie.

From the post:

You write a query with great care, and excitedly hit the “enter” button, only to see a bunch of gobbledygook spit out on the screen. MarkLogic says the chances of this happening will decrease thanks to the new RDF Triple Store feature that it formally introduced today with the launch of version 7 of its eponymous NoSQL database.

The capability to store and search semantic triples in MarkLogic 7 is one of the most compelling new features of the new NoSQL database. The concept of semantic triples is central to the Resource Description Framework (RDF) way of storing and searching for information. Instead of relating information in a database using an “entity-relationship” or “class diagram” model, the RDF framework enables links between pieces of data to be searched using the “subject-predicate-object” concept, which more closely corresponds to the way humans think and communicate.

The real power of this approach becomes evident when one considers the hugely disparate nature of information on the Internet. An RDF powered application can build links between different pieces of data, and effectively “learn” from the connections created by the semantic triples. This is the big (and as yet unrealized) pipe dream of the semantic Web.

RDF has been around for a while, and while you probably wouldn’t call it mainstream, there are a handful of applications using this approach. What makes MarkLogic’s approach unique is that it’s storing the semantic triples–the linked data–right inside the main NoSQL database, where it can make use of all the rich data and metadata stored in documents and other semi-structured files that NoSQL databases like MarkLogic are so good at storing.

This approach puts semantic triples right where it can do the most good. “Until now there has been a disconnect between the incredible potential of semantics and the value organizations have been able to realize,” states MarkLogic’s senior vice president of product strategy, Joe Pasqua.

“Managing triples in dedicated triple stores allowed people to see connections, but the original source of that data was disconnected, ironically losing context,” he continues. “By combining triples with a document store that also has built-in querying and APIs for delivery, organizations gain the insights of triples while connecting the data to end users who can search documents with the context of all the facts at their fingertips.”

A couple of things caught my eye in this post.

First, the comment that:

RDF has been around for a while, and while you probably wouldn’t call it mainstream, there are a handful of applications using this approach.

I can’t disagree, so why would MarkLogic make RDF support a major feature of this release?

Second, the next sentence reads:

What makes MarkLogic’s approach unique is that it’s storing the semantic triples–the linked data–right inside the main NoSQL database, where it can make use of all the rich data and metadata stored in documents and other semi-structured files that NoSQL databases like MarkLogic are so good at storing.

I am reading that to mean that if you store all the documents in which triples appear, along with the triples, you have more context. Yes?

Trivially true but I am not sure how significant an advantage that would be. Shouldn’t all that “contextual” metadata be included with the triples?
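To illustrate the question, here is a toy sketch of triples that carry a pointer back to their source document. It makes no claim about how MarkLogic 7 stores anything; the Triple type, the sample data, and the match function are all hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # hypothetical id of the document the triple came from


# Invented sample data for illustration only.
STORE = [
    Triple("ex:alice", "ex:knows", "ex:bob", "doc-1.xml"),
    Triple("ex:bob", "ex:knows", "ex:carol", "doc-2.xml"),
    Triple("ex:alice", "ex:worksFor", "ex:acme", "doc-3.xml"),
]


def match(store, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t
        for t in store
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


if __name__ == "__main__":
    for t in match(STORE, s="ex:alice"):
        print(t.predicate, t.obj, "from", t.source)
```

In this sketch the "context" is just one more field on each triple, which is why I wonder how much is gained by co-locating the documents themselves.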

But I haven’t gotten a copy of version 7 so that’s all speculation on my part.

If you have a copy of MarkLogic 7, care to comment?


Semantic Queries. Who Knew?

Wednesday, June 26th, 2013

The New Generation of Database Technology Includes Semantics and Search: David Gorbet, VP of Engineering for MarkLogic, chatted with Bloor Group Principal Robin Bloor in a recent Briefing Room.

From near the end of the interview:

There’s still a lot of opportunity to light up new scenarios for our customers. That’s why we’re excited about our semantics capabilities in MarkLogic 7. We believe that semantics technology is the next generation of search and discovery, allowing queries based on the concepts you’re looking for and not just the words and phrases. MarkLogic 7 will be the only database to allow semantics queries combined with document search and element/value queries all in one place. Our customers are excited about this.

Need to watch the marketing literature from MarkLogic for riffs and themes to repeat for topic map-based solutions.

Not to mention that topic maps can point into and add semantics to existing data stores and their contents.

Re-using current data stores sounds more attractive than ripping out all your data to migrate to another platform.


Beyond Enterprise Search…

Tuesday, May 21st, 2013

Beyond Enterprise Search… by adamfowleruk.

From the post:

Searching through all your content is fine – until you get a mountain of it with similar content, differentiated only by context. Then you’ll need to understand the meaning within the content. In this post I discuss how to do this using semantic techniques…

Organisations today have realised that for certain applications it is useful to have a consolidated search approach over several catalogues. This is most often the case when customers can interact with several parts of the company – sales, billing, service, delivery, fraud checks.

This approach is commonly called Enterprise Search, or Search and Discovery, which is where your content across several repositories is indexed in a separate search engine. Typically this indexing occurs some time after the content is added. In addition, it is not possible for a search engine to understand the full capabilities of every content system. This means complex mappings are needed between content, meta data and security. In some cases, this may be retrofitted with custom code as the systems do not support a common vocabulary around these aspects of information management.

Content Search

We are all used to content search, so much so that for today’s teenagers a search bar with a common (‘Google like’) grammar is expected. This simple yet powerful interface allows us to search for content (typically web pages and documents) that contain all the words or phrases that we need. Often this is broadened by the use of a thesaurus and word stemming (plays and played stems to the verb play), and combined with some form of weighting based on relative frequency within each unit of content.

Other techniques are also applied. Metadata is extracted or implied – author, date created, modified, security classification, Dublin Core descriptive data. Classification tools can be used (either at the content store or search indexing stages) to perform entity extraction (Cheese is a food stuff) and enrichment (Sheffield is a place with these geospatial co-ordinates). This provides a greater level of description of the term being searched for over and above simple word terms.

Using these techniques, additional search functionality can be provided. Search for all shops visible on a map using a bounding box, radius or polygon geospatial search. Return only documents where these words are within 6 words of each other. Perhaps weight some terms as more important than others, or optional.

These techniques are provided by many of the Enterprise class search engines out there today. Even Open Source tools like Lucene and Solr are catching up with this. They have provided access to information where before we had to rely on Information and Library Services staff to correctly classify incoming documents manually, as they did back in the paper bound days of yore.

Content search only gets you so far though.
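One of the techniques Adam mentions, returning only documents where words fall within 6 words of each other, is easy to sketch. This is a naive illustration, not how Lucene, Solr, or MarkLogic implement it; the function names are mine, and I am assuming "within N words" means the difference between token positions is at most N.

```python
import re


def positions(tokens, word):
    """Return the indexes at which `word` occurs in the token list."""
    return [i for i, token in enumerate(tokens) if token == word]


def within(text, a, b, distance=6):
    """True if words a and b occur within `distance` words of each other.

    Assumption: distance is the absolute difference in token positions.
    """
    tokens = re.findall(r"\w+", text.lower())
    return any(
        abs(i - j) <= distance
        for i in positions(tokens, a.lower())
        for j in positions(tokens, b.lower())
    )


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog"
    print(within(sentence, "quick", "lazy"))
    print(within(sentence, "quick", "dog"))
```

A real engine would answer this from positional postings in its index rather than rescanning the text, but the test applied to each candidate document is the same.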

I was amening with the best of them until Adam reached the part about MarkLogic 7 going to add Semantic Web capabilities. 😉

I didn’t see any mention of linked data replicating the semantic diversity that currently exists in data stores.

Making data more accessible isn’t going to make it less diverse.

Although making data more accessible may drive the development of ways to manage semantic diversity.

So perhaps there is a useful side to linked data after all.

USPTO – New Big Data App [Value-Add Opportunity]

Monday, April 1st, 2013

U.S. Patent and Trademark Office Launches New Big Data Application on MarkLogic®

From the post:

Real-Time, Granular, Online Access to Complex Manuals Improves Efficiency and Transparency While Reducing Costs

MarkLogic Corporation, the provider of the MarkLogic® Enterprise NoSQL database, today announced that the U.S. Patent and Trademark Office (USPTO) has launched the Reference Document Management Service (RDMS), which uses MarkLogic for real-time searching of detailed, specific, up-to-date content within patent and trademark manuals. RDMS enables real-time search of the Manual of Patent Examining Procedure (MPEP) and the Trademark Manual of Examination Procedures (TMEP). These manuals provide a vital window into the complexities of U.S. patent and trademark laws for inventors, examiners, businesses, and patent and government attorneys.

The thousands of examiners working for USPTO need to be able to quickly locate relevant instructions and procedures to assist in their examinations. The RDMS is enabling faster, easier searches for these internal users.

Having the most current materials online also means that the government can reduce reliance on printed manuals that quickly go out of date. USPTO can also now create and publish revisions to its manuals more quickly, allowing them to be far more responsive to changes in legislation.

Additionally, for the first time ever, the tool has also been made available to the public increasing the MPEP and TMEP accessibility globally, furthering the federal government’s efforts to promote transparency and accountability to U.S. citizens. Patent creators and their trusted advisors can now search and reference the same content as the USPTO examiners, in real time — instead of having to thumb through a printed reference guide.

The date on this report was March 26, 2013.

I don’t know if the USPTO is just playing games but searching their site for “Reference Document Management Service” produces zero “hits.”

Searching for “RDMS” produces four (4) “hits,” none of which were pointers to an interface.

Maybe it was too transparent?

The value-add proposition I was going to suggest was mapping the results of searching into some coherent presentation, like TaxMap.

And/or linking the results of searches into current literature in rapidly developing fields of technology.

Guess both of those opportunities will have to wait for basic searching to be available.

If you have a status update on this announced but missing project please ping me.

MarkLogic Announces Free Developer License for Enterprise [With Odd Condition]

Wednesday, February 13th, 2013

MarkLogic Announces Free Developer License for Enterprise

From the post:

MarkLogic Corporation today announced the availability of a free Developer License for MarkLogic Enterprise Edition.

The Developer License provides access to the features available in MarkLogic Enterprise Edition, including integrated search, government-grade security, clustering, replication, failover, alerting, geospatial indexing, conversion, and a suite of application development tools. MarkLogic also announced the Mongo2MarkLogic converter, a Java-based tool for importing data from MongoDB into MarkLogic providing developers immediate access to features needed to build out enterprise-ready big data solutions.

“By providing a free Developer License we enable developers to quickly deliver reliable, scalable and secure information and analytic applications that are production-ready,” said Gary Bloom, CEO and President of MarkLogic. “Many of our customers first experimented with other free NoSQL products, but turned to MarkLogic when they recognized the need for search, security, support for ACID transactions and other features necessary for enterprise environments. Our goal is to eliminate the cost barrier for developers and give them access to the best enterprise NoSQL platform from the start.”

The Developer License for MarkLogic Enterprise Edition includes tools for faster application development, business intelligence (BI) tool integration, analytic functions and visualization tools, and the ability to create user-defined functions for fast and flexible analysis of huge volumes of data.

You would think that story would merit at least one link to the free developer program.

For your convenience: Developer License for Enterprise Edition. BTW, MarkLogic homepage.

That wasn’t hard. Two links and you have direct access to the topic of the story and the company.

One odd licensing condition:

Q. Can I publish my work done with MarkLogic Server?

A. We encourage you to share your work publicly, but note that you can not disclose, without MarkLogic prior written consent, any performance or capacity statistics or the results of any benchmark test performed on MarkLogic Server.

That sounds just a tad defensive, doesn’t it?

I haven’t looked at MarkLogic for a couple of iterations but earlier versions had no need to fear statistics or benchmark tests.

Results vary depending on how testing is done but anyone authorized to recommend or sign acquisition orders should know that.

If they don’t, your organization has more serious problems than needing a MarkLogic server.

MarkLogic 5 is Big Data for the Enterprise

Wednesday, November 2nd, 2011

MarkLogic 5 is Big Data for the Enterprise

From the announcement:

SAN CARLOS, Calif. — November 1, 2011 — MarkLogic® Corporation, the company empowering organizations to make high stakes decisions on Big Data in real time, today announced MarkLogic 5, the latest version of its award-winning product designed for Big Data applications across the enterprise. MarkLogic 5 defines Big Data by empowering organizations to build Big Data applications that make information actionable. With MarkLogic 5, organizations get smarter answers faster by analyzing structured, unstructured, and semi-structured data in the same application. This allows a complete view of the health of the enterprise. Key features include the MarkLogic Connector for Hadoop, which marries large-scale batch processing with the real time Big Data applications MarkLogic has been delivering for a decade. MarkLogic 5 is a visionary step forward for organizations who want to manage complex Big Data on an operational database with confidence at scale. MarkLogic 5 is available today.

“Most of the hype around Big Data has focused only on the big or on the analytics,” said Ken Bado, president and CEO, MarkLogic. “For nearly a decade, MarkLogic has been helping its customers build cost effective Big Data applications that create competitive advantage. That means going beyond big and analytics to make information actionable so organizations can create real value for their business. With MarkLogic, multi-billion dollar companies like JP Morgan Chase and LexisNexis have redefined their business models, while organizations like the U.S. Army and the FAA have the real time, mission-critical information they need to get the job done. These aren’t science projects – they’re real organizations using Big Data applications right now.”

“We believe that MarkLogic 5 is well positioned to help solve many of the Big Data challenges that are emerging in the healthcare industry today,” said Jeff Cunningham, CTO at Informatics Corporation of America. “By incorporating MarkLogic 5 into our CareAlign™ Health Information Exchange platform, we have the ability to securely aggregate, manage, share, and analyze large amounts of patient information derived from a wide variety of sources and formats. These capabilities will help doctors, hospitals, and healthcare systems across the country solve many of the care coordination and population health management challenges that exist in healthcare today.”

There is a lot of noise concerning this release and it will take some time to obtain a favorable signal/noise ratio.

You can help contribute to the signal side of that equation:

Available with MarkLogic 5, the new Express license is free for developers looking to check out MarkLogic. It is limited to use on one computer with at most 2 CPUs and can hold up to 40GB of content. It includes options that make sense on a single computer (geospatial, alerting, conversion) and does not include options intended for clusters or enterprise usage (e.g., replication).

MarkLogic: Beyond NoSQL

Wednesday, August 10th, 2011

MarkLogic: Beyond NoSQL

From the post:

Even though the term, NoSQL, has issues, it’s become important.

Recently, leaders from several NoSQL projects (Riak, HBase, CouchDB, Neo4j) came together for a session at Gluecon. And while they came from divergent perspectives, they all basically agreed that the term had been very helpful to developers and architects in identifying their systems as new database and/or database-alternative technologies.

There have been numerous NoSQL taxonomies, discussions about them, and calls to move beyond them. And while it’s clear to us, as well as our friends and customers, that MarkLogic Server sits among these technologies, we haven’t yet fully described why NoSQL folks should pay attention. To that end, this post is a first step at explaining why and how we’re more than “yet another NoSQL system”. And I’ll start with some context for NoSQL folks.

You should read the post for yourself, but suffice it to say that MarkLogic is an XML database that sports a universal index of the elements, attributes, and hierarchy of documents, as well as their content.

If that doesn’t sound interesting, see: MarkMail, which is powered by a MarkLogic server.

Interested now?
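If "universal index" sounds abstract, here is a toy sketch of the idea: every element path, attribute value, and word is indexed as a document is loaded, with no schema declared up front. It is in no way MarkLogic's implementation; the key layout, the function name, and the sample mail documents are my own assumptions.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_document(index, doc_id, xml_text):
    """Add one document's element paths, attributes, and words to the index.

    Simplification: only element text is indexed, not tail text.
    """

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        index[("path", here)].add(doc_id)
        for name, value in elem.attrib.items():
            index[("attr", f"{here}/@{name}", value)].add(doc_id)
        for word in re.findall(r"\w+", (elem.text or "").lower()):
            index[("word", here, word)].add(doc_id)
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")


# Invented sample documents for illustration only.
index = defaultdict(set)
index_document(
    index,
    "a.xml",
    '<mail list="solr-user"><subject>Config error</subject></mail>',
)
index_document(
    index,
    "b.xml",
    '<mail list="dev"><subject>Release vote</subject></mail>',
)

if __name__ == "__main__":
    print(index[("word", "/mail/subject", "error")])
```

Because structure and content land in the same index, a query such as "subject contains error, on the solr-user list" becomes a set intersection of two lookups.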