Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 17, 2011

Building blocks of a scalable web crawler

Filed under: Indexing,NoSQL,Search Engines,Searching,SQL — Patrick Durusau @ 7:29 pm

Building blocks of a scalable web crawler Thesis by Marc Seeger. (2010)

Abstract:

The purpose of this thesis was the investigation and implementation of a good architecture for collecting, analysing and managing website data on a scale of millions of domains. The final project is able to automatically collect data about websites and analyse the content management system they are using.

To be able to do this efficiently, different possible storage back-ends were examined and a system was implemented that is able to gather and store data at a fast pace while still keeping it searchable.

This thesis is a collection of the lessons learned while working on the project combined with the necessary knowledge that went into architectural decisions. It presents an overview of the different infrastructure possibilities and general approaches and as well as explaining the choices that have been made for the implemented system.

From the conclusion:

The implemented architecture has been recorded processing up to 100 domains per second on a single server. At the end of the project the system gathered information about approximately 100 million domains. The collected data can be searched instantly and the automated generation of statistics is visualized in the internal web interface.

Most of your clients have lesser information demands but the lessons here will stand you in good stead with their systems too.
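
For a sense of scale, the two figures quoted from the conclusion can be combined in a quick calculation. The sketch below assumes the peak rate of 100 domains per second was sustained on a single server for the entire crawl. The thesis excerpt does not claim that, so the result is a lower bound on elapsed time, not a reported figure.

```python
# Back-of-the-envelope: how long would 100 million domains take at the peak rate?
# Assumption (not from the thesis): the peak rate is sustained the whole time.
DOMAINS = 100_000_000        # "approximately 100 million domains"
PEAK_RATE = 100              # "up to 100 domains per second"

seconds = DOMAINS / PEAK_RATE
days = seconds / 86_400      # 86,400 seconds in a day

print(f"{seconds:,.0f} seconds, or about {days:.1f} days")
```

Even at peak throughput the crawl is measured in days, which is why the choice of storage back-end matters so much in the thesis.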

July 9, 2011

Neo4j 1.4 M06 “Kiruna Stol”

Filed under: Graphs,Neo4j,NoSQL — Patrick Durusau @ 7:00 pm

Neo4j 1.4 M06 “Kiruna Stol”

From the Neo4j blog:

It’s been just a week since the Neo4j 1.4 M05 release, and though we’re pleased with the way the feature set has evolved, during testing we found a potential corruption bug in that specific milestone.

To address that issue, this week we’re releasing our sixth milestone towards the 1.4 GA release. This milestone is likely to be the last of the series for the 1.4 release, and if the community feedback is positive we will transition into our GA release shortly.

Sounds like Neo4j 1.4 is arriving soon!

July 7, 2011

Use Cases Solved in Redis (TM Use Cases?)

Filed under: NoSQL,Redis — Patrick Durusau @ 4:17 pm

11 Common Web Use Cases Solved in Redis

From the webpage:

In How to take advantage of Redis just adding it to your stack Salvatore ‘antirez’ Sanfilippo shows how to solve some common problems in Redis by taking advantage of its unique data structure handling capabilities. Common Redis primitives like LPUSH, and LTRIM, and LREM are used to accomplish tasks programmers need to get done, but that can be hard or slow in more traditional stores. A very useful and practical article. How would you accomplish these tasks in your framework?

Good post about Redis and common web use cases.
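
To make the list commands named in the quote concrete, here is a plain-Python model of LPUSH, LTRIM and LREM applied to a "keep only the latest N items" task. It is a sketch of the command semantics only, not a Redis client. The function names mirror the commands, and the comment ids are invented.

```python
# Plain-Python model of three Redis list commands, for illustration only.
# This is NOT a Redis client; the function names simply mirror the commands.

def lpush(lst, value):
    """LPUSH: insert value at the head of the list."""
    lst.insert(0, value)

def ltrim(lst, start, stop):
    """LTRIM: keep only elements start..stop (inclusive; non-negative indexes here)."""
    lst[:] = lst[start:stop + 1]

def lrem(lst, count, value):
    """LREM with count > 0: remove up to count occurrences, scanning from the head."""
    removed = 0
    i = 0
    while i < len(lst) and removed < count:
        if lst[i] == value:
            del lst[i]
            removed += 1
        else:
            i += 1
    return removed

# "Latest N items" pattern: push each new id, then trim to the newest three.
latest = []
for comment_id in ["c1", "c2", "c3", "c4", "c5"]:
    lpush(latest, comment_id)
    ltrim(latest, 0, 2)

print(latest)             # ['c5', 'c4', 'c3']
lrem(latest, 1, "c4")     # e.g. a deleted comment
print(latest)             # ['c5', 'c3']
```

Against a real Redis server the same three commands do this work on the server side, so the application never has to fetch and rewrite the whole list.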

Occurs to me that I don’t have a similar list for topic maps (whatever software you use) as a technology.

Sure, topic maps apply when you need to have a common locus for information about a subject or need better modeling of relationships, but that’s all rather vague and hand-wavy.

Here are two examples that are more concrete:

The small office supply store on the town square (this is a true story) had its own internal inventory system with numbers, etc. The small store ordered from several larger suppliers, who all had their own names and internal numbers for the same items. A stable mapping wasn’t an option because the numbers used both by the large suppliers (as well as the descriptions) and the manufacturers were subject to change and reuse.

The small office supply store could see the value in a topic map but the cost in employee time to match up the inventory numbers was less than construction and maintenance of a topic map on top of their internal system. I would say that dynamic inventory control is a topic maps use case.
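
The matching problem described above can be sketched as a lookup from (source, identifier) pairs to a single item, where any pair may be reassigned later. This is a minimal illustration, not a topic map engine. The class, the supplier names and the numbers are all hypothetical.

```python
# Hypothetical sketch: one subject per stocked item, many changeable identifiers.
# Store names, supplier names and numbers below are invented for illustration.

class Inventory:
    def __init__(self):
        self._index = {}   # (source, identifier) -> internal item name

    def assign(self, source, identifier, item):
        """Point a source's identifier at an item, replacing any earlier use."""
        self._index[(source, identifier)] = item

    def lookup(self, source, identifier):
        return self._index.get((source, identifier))

    def identifiers_for(self, item):
        """All current identifiers, from any source, for one item."""
        return sorted(key for key, value in self._index.items() if value == item)

inv = Inventory()
inv.assign("store", "0042", "legal pad, yellow")
inv.assign("supplier-a", "LP-77", "legal pad, yellow")
inv.assign("supplier-b", "90311", "legal pad, yellow")

# Supplier B reuses its number for a different product and renumbers the pad.
inv.assign("supplier-b", "90311", "stapler, black")
inv.assign("supplier-b", "90555", "legal pad, yellow")

print(inv.lookup("supplier-b", "90311"))       # stapler, black
print(inv.identifiers_for("legal pad, yellow"))
```

The code is the easy part. The cost the store weighed was the ongoing human effort of keeping those assignments current.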

The other use case involves medical terminology. A doctor I know covers the hospital for an entire local medical practice. He isn’t a specialist in any of the fields covered by the practice so he has to look up the latest medical advances in several fields. Like all of us, he has terms that he learned for particular conditions, which aren’t the ones used in the medical databases. So he has trouble searching from time to time.

He recognized the value of a topic map being able to create a mapping between his terminology and the terminology used by the medical database. It would enable him to search more quickly and effectively. Unfortunately the problem, in these economic times, isn’t pinching enough to result in a project. Personalized search interfaces are another topic map use case.
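
A personalized search interface of this kind could start as nothing more than a table from the searcher's own terms to the database's vocabulary. The sketch below assumes such a table exists. The term pairs are illustrative and are not taken from any particular medical database.

```python
# Hypothetical sketch: expand a searcher's own terms into a database's vocabulary.
# The term pairs are illustrative examples, not taken from any medical database.

PERSONAL_TO_DATABASE = {
    "heart attack": ["myocardial infarction"],
    "high blood pressure": ["hypertension"],
}

def expand_query(query):
    """Return the original query plus any database terms mapped from it."""
    key = query.strip().lower()
    return [query] + PERSONAL_TO_DATABASE.get(key, [])

print(expand_query("Heart attack"))   # ['Heart attack', 'myocardial infarction']
print(expand_query("asthma"))         # ['asthma']
```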

What’s yours?

MongoSV

Filed under: MongoDB,NoSQL — Patrick Durusau @ 4:16 pm

MongoSV

From the homepage:

MongoSV was a four-track, one-day conference on December 3, 2010 at Microsoft Research Silicon Valley in Mountain View, CA. The main conference track featured 10gen founders Dwight Merriman and Eliot Horowitz, as well as Roger Bodamer, the head of 10gen’s west coast operations, and several of the key engineers developing the MongoDB project. These sessions were geared towards developers and administrators interested in learning how to use the database, with sessions on schema design, indexing, administration, deployment strategies, scaling, and other features. A second track showcased several high-profile deployments of the database at Shutterfly, Craigslist, IGN, Intuit, Wordnik, and more. For more experienced users of the database, there were several advanced sessions, covering the storage engine, replication, sharding, and consistency models.

Excellent collection of videos and slides on MongoDB and various aspects of its use.

July 4, 2011

RavenDB

Filed under: Database,NoSQL,RavenDB — Patrick Durusau @ 6:03 pm

RavenDB

Raven is an Open Source (with a commercial option) document database for the .NET/Windows platform. Raven offers a flexible data model design to fit the needs of real world systems. Raven stores schema-less JSON documents, allow you to define indexes using Linq queries and focus on low latency and high performance.

  • Scalable infrastructure: Raven builds on top of existing, proven and scalable infrastructure
  • Simple Windows configuration: Raven is simple to setup and run on windows as either a service or IIS7 website
  • Transactional: Raven support System.Transaction with ACID transactions. If you put data in it, that data is going to stay there
  • Map/Reduce: Easily define map/reduce indexes with Linq queries
  • .NET Client API: Raven comes with a fully functional .NET client API which implements Unit of Work and much more
  • RESTful: Raven is built around a RESTful API

Haven’t meant to neglect the .Net world, just don’t visit there very often. 😉 Will try to do better in the future.

July 3, 2011

SwiftRiver/Ushahidi

Filed under: Filters,Linguistics,Natural Language Processing,NoSQL,Python — Patrick Durusau @ 7:34 pm

SwiftRiver

From the Get Started page:

The mission of the SwiftRiver initiative is to democratize access to the tools used to make sense of data.

To achieve this goal we’ve taken two approaches, apps and APIs. Apps are user facing and should be tools that are easy to understand, deploy and use. APIs are machine facing and extract meta-context that other machines (apps) use to convey information to the end user.

SwiftRiver is an opensource platform that aims to allow users to do three things well: 1) structure unstructured data feeds, 2) filter and prioritize information conditionally and 3) add context to content. Doing these things well allows users to pull in real-time content from Twitter, SMS, Email or the Web and to make sense of data on the fly.

The Ushahidi logo at the top will take you to a common wiki for Ushahidi and SwiftRiver.

And the Ushahidi link in text takes you to: Ushahidi:

We are a non-profit tech company that develops free and open source software for information collection, visualization and interactive mapping.

Home of:

  • Ushahidi Platform: We built the Ushahidi platform as a tool to easily crowdsource information using multiple channels, including SMS, email, Twitter and the web.
  • SwiftRiver: SwiftRiver is an open source platform that aims to democratize access to tools for filtering & making sense of real-time information.
  • Crowdmap: When you need to get the Ushahidi platform up in 2 minutes to crowdsource information, Crowdmap will do it for you. It’s our hosted version of the Ushahidi platform.

It occurs to me that mapping email feeds would fit right into my example in Marketing What Users Want…And An Example.

    July 2, 2011

    NoSQL and the Windows Azure Platform

    Filed under: NoSQL — Patrick Durusau @ 3:18 pm

    The Windows Club reports on a new MS whitepaper: NoSQL and the Windows Azure Platform.

    To give you an idea of the “flavor” of the whitepaper, consider the following paragraph:

    But tooling has its value, and that value tends to increase over time, when the imperative of raw implementation has passed and need for smooth maintenance and troubleshooting becomes more pronounced (and economically impactful). The design, diagnostic and operational monitoring capabilities of SQL Server’s tools are significant, and have evolved over the roughly 20-year existence of the product. These tools, including SQL Server Management Studio and its execution plan window, aid greatly in preventing problems, and in solving them quickly when they do arise. NoSQL databases’ more minimalist tooling approach leads to more manual and time-consuming management and troubleshooting than is the case with SQL Azure (which is compatible with SQL Server’s tools), and may also make the process more error prone. The cost impact of this can be significant.

    MS should empower customers to choose between NoSQL and MS SQL Server solutions in using Windows Azure. The SQL Server group will continue to flog its products but it isn’t (or shouldn’t be) seen as synonymous with MS.

    Being the road is a much stronger position than being a building alongside the road. Roads get repaired, repaved, widened, extended, while buildings alongside the road…, well, you know that part.

    June 29, 2011

    NoSQL should be in your business…

    Filed under: MongoDB,NoSQL — Patrick Durusau @ 9:03 am

    NoSQL should be in your business, and MongoDB could lead the way by Savio Rodrigues.

    From the post:

    NoSQL is still not well understood, as a term or a database market category, by IT decision makers. However, one NoSQL vendor — 10gen, creators of the open source MongoDB — appears to be growing into enterprise accounts and distancing itself from competitors. If you’re considering, or curious about, NoSQL databases, I recommend you spend some time looking at MongoDB.

    One important fact is that the demand for Mongo and MongoDB skills is getting larger.

    If you are looking for more information on MongoDB, check out www.mongodb.org and www.10gen.com but in particular see: www.10gen.com/presentations.

    I saw the notice about the videos in an email alert so had to delete all the tracking URL crap, then delete the path to the specific video with all its tracking crap, then I was able to give you a link to the page with the videos so you could make your own choice. Less tracking, more choice. That sounds like a better plan.

    June 28, 2011

    Neo4j 1.4 M05 “Kiruna Stol”

    Filed under: Graphs,Neo4j,NoSQL — Patrick Durusau @ 9:49 am

    Neo4j 1.4 M05 “Kiruna Stol” – MidSummer Celebration

    From the post:

    Extending the festive atmosphere of Midsummer here in Sweden (though sadly not the copious amounts of beer and strawberries), we’re releasing the final milestone build of Neo4j 1.4. The celebration includes: Auto Indexing, neat new features to the REST API, even cooler Cypher query language features, and a bunch of performance improvements. We’ve also rid ourselves of the 3rd-party service wrapper code (yay!) that caused us and our fellow (mostly Mac) users in the community so much anguish!

    Hooray!

    GoldenOrb – Released

    Filed under: GoldenOrb,Graphs,NoSQL — Patrick Durusau @ 9:48 am

    GoldenOrb

    From the webpage:

    GoldenOrb is a cloud-based open source project for massive-scale graph analysis, built upon best-of-breed software from the Apache Hadoop project modeled after Google’s Pregel architecture. Our goal is to foster solutions to complex data problems, remove limits to innovation and contribute to the emerging ecosystem that spans all aspects of big data analysis.

    Anticipated for some time, see: Beyond MapReduce – Large Scale Graph Processing With GoldenOrb.

    June 21, 2011

    Brisk 1.0 Beta 2 Released

    Filed under: Brisk,NoSQL — Patrick Durusau @ 7:08 pm

    Brisk 1.0 Beta 2 Released

    New Features:

    BRISK-12

    Apache Pig Integration. See the DataStax Documentation for more information about using Pig in Brisk.

    BRISK-89

    Job Tracker Failover. See the DataStax Documentation for more information about using the new brisktool movejt command.

    BRISK-207

    New Snappy Compression Codec built on Google Snappy is now used internally for automatic CassandraFS block compression.

    BRISK-180

    Automap Cassandra Column Families to Hive Tables in the Brisk Hive Metastore.

    BRISK-152

    Add a second HDFS layer in CassandraFS for long-term data storage. This is needed because the blocks column family in CFS requires frequent compactions – Hadoop uses it during MapReduce processing to store small files and temporary data. Compaction cleans this temporary data up after it is not needed anymore. Now there is the cfs:/// and cfs-archive:/// endpoints within CFS. The blocks column family in cfs-archive:/// has compaction disabled to improve performance for static data stored in CFS.

    June 20, 2011

    Massively Parallel Database Startup to Reap Multicore Dividends

    Filed under: GPU,NoSQL — Patrick Durusau @ 3:36 pm

    Massively Parallel Database Startup to Reap Multicore Dividends

    From the post:

    The age of multicore couldn’t have come at a better time. Plagued with mounting datasets and a need for suitable architectures to contend with them, organizations are feeling the resource burden in every database corner they look.

    German startup Parstream claims its found an answer to those problems. The company has come up with a solution to harness the power of GPU computing at the dawn of the manycore plus big data day–and it just might be onto something.

    The unique element in ParStream’s offering is that it is able to exploit the coming era of multicore architectures, meaning that it will be able to deliver results faster with lower resource usage. This is in addition to its claim that it can eliminate the need for data decompression entirely, which if it is proven to be the case when their system is available later this summer, could change the way we think about system utilization when performing analytics on large data sets.

    ParStream is appearing as an exhibitor at: ISC’11 Supercomputing Conference 2011, June 20 – 22, 2011. I can’t make the conference but am interested in your reactions to the promised demos.

    Vol. 15: Understanding Dynamo — with Andy Gross

    Filed under: NoSQL,Riak — Patrick Durusau @ 3:31 pm

    Vol. 15: Understanding Dynamo — with Andy Gross

    From the webpage:

    Basho’s VP of Engineering runs us through the tenets of Dynamo systems. From Consistent Hashing to Vector Clocks, Gossip, Hinted Handoffs and Read Repairs. (Recorded on October 13, 2010 in San Francisco, CA.)

    You may want to compare the presentation of Andy Gross at Riak Core: Dynamo Building Blocks. Basically the same material but worded differently.
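
Of the tenets listed, vector clocks are the easiest to show in a few lines. The sketch below is a minimal illustration of the idea, not Riak's implementation. The node names and helper functions are hypothetical.

```python
# Minimal vector clock sketch. Node names are hypothetical; this is not Riak's code.

def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def descends(a, b):
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "concurrent"   # neither update saw the other

v1 = increment({}, "node-a")          # first write, handled by node-a
v2 = increment(v1, "node-a")          # written after seeing v1
v3 = increment(v1, "node-b")          # written elsewhere, also after v1

print(compare(v2, v1))   # a is newer
print(compare(v2, v3))   # concurrent
```

The "concurrent" case is the interesting one: the store cannot pick a winner on its own, so both versions have to be kept until something reconciles them.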

    June 19, 2011

    A Short Tutorial on Doctor Who (and Neo4j)

    Filed under: Neo4j,NoSQL — Patrick Durusau @ 7:31 pm

    A Short Tutorial on Doctor Who (and Neo4j)

    Where: The Skills Matter eXchange, London
    When: 29 Jun 2011 Starts at 18:30

    From the website:

    With June’s Neo4j meeting we’re moving to our new slot of last Wednesday of the month (29 June). But more importantly, we’re going to be getting our hands on the code. We’ve packed a wealth of Doctor Who knowledge into a graph, ready for you to start querying. At the end of 90 minutes and a couple of Koans, you’ll be answering questions about the Doctor Who universe like a die-hard fan. You’ll need a laptop, your Java IDE of choice, and a copy of the Koans, which you can grab from http://bit.ly/neo4j-koan

    Not a bad way to spend a late June evening in London.

    Someone needs to post this to a Dr. Who fan site. Might attract some folks to Java/Neo4j!

    June 9, 2011

    CouchDB 1.1 Feature Guide

    Filed under: CouchDB,NoSQL — Patrick Durusau @ 6:34 pm

    CouchDB 1.1 Feature Guide

    From Alex Popescu’s myNoSQL, news of a feature guide for CouchDB 1.1 and related links.

    June 8, 2011

    Microsoft Research Watch: AI, NoSQL and Microsoft’s Big Data Future

    Filed under: Artificial Intelligence,BigData,NoSQL — Patrick Durusau @ 10:23 am

    Microsoft Research Watch: AI, NoSQL and Microsoft’s Big Data Future

    From the post:

    Probase is a Microsoft Research project described as an “ongoing project that focuses on knowledge acquisition and knowledge serving.” Its primary goal is to “enable machines to understand human behavior and human communication.” It can be compared to Cyc, DBpedia or Freebase in that it is attempting to compile a massive collection of structured data that can be used to power artificial intelligence applications.

    It’s powered by a new graph database called Trinity, which is also a Microsoft Research project. Trinity was spotted today by MyNoSQL blogger Alex Popescu, and that led us to Probase. Neither project seems to be available to the public yet.

    These and other projects shed some light on Microsoft’s search and big data ambitions.

    Doesn’t hurt to keep track of what people with a proven track record of making money, if not always producing useful software, are up to.

    BTW, when you read the article quoting MS on Probase where it says:

    …as evidences that can add to or modify the claims and beliefs in Probase. This means Probase is able to integrate information of varied quality from heterogeneous data sources.

    Whoa! There is a long step from treating statements about a commonly identified subject, statement X is about the birth certificate of Barack Obama, to integrating information from heterogeneous data sources, which identify a subject differently, hospital record of B.H. Obama.

    Looking forward to learning more about Trinity.

    May 27, 2011

    Riak Core: Dynamo Building Blocks

    Filed under: NoSQL,Riak — Patrick Durusau @ 12:36 pm

    Riak Core: Dynamo Building Blocks

    Highly recommended!

    Summary:

    Andy Gross discusses the design philosophy behind Riak based on Amazon Dynamo – Gossip Protocol, Consistent Hashing, Vector clocks, Read Repair, etc. -, overviewing its main features and architecture.
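
Consistent hashing, one of the building blocks named in the summary, can be sketched briefly. This is a bare ring with one point per node. Node and key names are hypothetical, and Dynamo-style systems add virtual nodes and replication that are left out here.

```python
import bisect
import hashlib

# Minimal consistent-hashing ring. Node and key names are hypothetical, and
# real Dynamo-style systems add virtual nodes and replication on top of this.

def ring_position(name):
    """Map a string to a point on the ring using a stable hash."""
    return int(hashlib.sha256(name.encode("utf-8")).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self._points = sorted((ring_position(n), n) for n in nodes)

    def node_for(self, key):
        """Walk clockwise from the key's position to the first node."""
        positions = [p for p, _ in self._points]
        i = bisect.bisect_right(positions, ring_position(key))
        return self._points[i % len(self._points)][1]

keys = [f"key-{i}" for i in range(1000)]
before = Ring(["node-a", "node-b", "node-c"])
after = Ring(["node-a", "node-b", "node-c", "node-d"])

moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
# Every key that moved now lives on the new node; the rest stayed put.
print(len(moved), all(after.node_for(k) == "node-d" for k in moved))
```

The point of the design is in the last two lines: adding a node relocates only the keys that the new node takes over, instead of reshuffling everything as a simple `hash(key) % n` scheme would.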

    Amazon’s Dynamo paper:

    Dynamo: Amazon’s Highly Available Key-value Store (HTML)

    Dynamo: Amazon’s Highly Available Key-value Store (PDF)

    One of the more intriguing slides represented http/apps/dbs as a stack to show that scaling the http layer is well understood, scaling apps is more difficult but still doable, and scaling storage is the most expensive and difficult.

    I mention that because I suspect scaling databases has a lot in common with scaling topic maps.

    On the issue of consistency, the point was made that “expires” can be included in HTTP headers, which indicate a fact is good until some time. I wonder, could a topic have a “last merged” property? So that a user can choose the timeliness they need? So that “last merged” 7 days ago is public information, “last merged” 3 days ago is subscriber information and the most recent “last merged” is premium information.

    For example, instead of trying to regulate insider trading, the SEC could create a topic map of stocks and sell insider trading information, suitably priced to keep its “insider” character, except that for enough money, anyone could play. The SEC portion of the subscription + selling price could be used to finance other enforcement activities.
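
The “last merged” idea can be sketched as a filter over versions of a topic, where each access tier only sees merges older than its delay. The tier names and delays follow the 7 day / 3 day / most recent example above. The data structure, function name and sample versions are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical sketch of a "last merged" property with access tiers.
# Tier names and delays follow the example above; the data is invented.

MIN_AGE = {
    "public": timedelta(days=7),
    "subscriber": timedelta(days=3),
    "premium": timedelta(0),
}

def visible_version(versions, tier, now):
    """Return the newest version whose merge is old enough for the tier."""
    cutoff = now - MIN_AGE[tier]
    eligible = [v for v in versions if v["last_merged"] <= cutoff]
    return max(eligible, key=lambda v: v["last_merged"], default=None)

now = datetime(2011, 5, 27, 12, 0)
versions = [
    {"last_merged": now - timedelta(days=10), "value": "v1"},
    {"last_merged": now - timedelta(days=4), "value": "v2"},
    {"last_merged": now - timedelta(hours=1), "value": "v3"},
]

for tier in ("public", "subscriber", "premium"):
    print(tier, visible_version(versions, tier, now)["value"])
```

With the sample data, the public tier sees v1, subscribers see v2 and premium users see v3.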

    This presentation plus the Amazon paper make nice weekend reading/viewing.

    May 25, 2011

    Near Bare Metal – Acunu

    Filed under: Acunu,Cassandra,NoSQL — Patrick Durusau @ 1:27 pm

    Acunu Storage Platform

    From the webpage:

    The Acunu Storage Platform is a powerful storage solution that brings simpler, faster and more predictable performance to NOSQL stores like Apache Cassandra.

    Our view is that the new data intensive workloads that are increasingly common are a poor match for the legacy storage systems they tend to run on. These systems are built on a set of assumptions about the capacity and performance of hardware that are simply no longer true. The Acunu Storage Platform is the result of a radical re-think of those assumptions; the result is high performance from low cost commodity hardware.

    It includes the Acunu Storage Core which runs in the Linux kernel. On top of this core, we provide a modified version of Apache Cassandra. This is essentially the same as “vanilla” Cassandra but uses the Acunu Storage Core to store data instead of the Linux file system and is therefore able to take advantage of the performance benefits of our platform. In addition to Cassandra, there is also an object store similar to Amazon’s S3; we have a number of other more experimental projects in the pipeline which we’ll talk about in future posts.

    Perhaps the start of something very interesting.

    It took NoSQL a couple of years to flower into the range of current offerings.

    I wonder if working in the kernel will have a similar path?

    Will we see a graph engine as part of the kernel?

    Hadoop Dont’s: What not to do to harvest Hadoop’s full potential

    Filed under: Hadoop,Humor,NoSQL — Patrick Durusau @ 1:26 pm

    Hadoop Dont’s: What not to do to harvest Hadoop’s full potential by Iwona Bialynicka-Birula.

    From the post:

    We’ve all heard this story. All was fine until one day your boss heard somewhere that Hadoop and No-SQL are the new black and mandated that the whole company switch over whatever it was doing to the Hadoop et al. technology stack, because that’s the only way to get your solution to scale to web proportions while maintaining reliability and efficiency.

    So you threw away your old relational database back end and maybe all or part of your middle tier code, bought a couple of books, and after a few days of swearing got your first MapReduce jobs running. But as you finished re-implementing your entire solution, you found that not only is the system way less efficient than the old one, but it’s not even scalable or reliable and your meetings are starting more and more to resemble the Hadoop Downfall parody.

    An excellent post on problems to avoid with Hadoop!

    May 17, 2011

    sones GraphDB 2.0

    Filed under: GraphDB,NoSQL — Patrick Durusau @ 2:51 pm

    sones GraphDB 2.0

    From the press release on GraphDB 2.0:

    – High-performance graph database based on a property hypergraph – Optimized for multiprocessor/multicore systems – Platform-independent (Linux, Windows, OSX) – Modular architecture – OpenSource (AGPLv3) and proprietary enterprise license – Intuitive, easy-to-learn query language: GQL (Graph Query Language) – Powerful API and traverse API – Integrated REST interfaces and administration tools – Optional persistence plug-ins – Client libraries in many popular programming languages (Java, C#, Javascript, PHP, …) – Integrated Javascript UI

    Another OpenSource high performance graph database.

    Graph databases seem to be on the rise in popularity.

    Can model the relational model and more with graph databases.

    But is that like markup trees being subsets of graphs?

    That we find it easier to use subsets of the capabilities of graphs?

    May 12, 2011

    Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison

    Filed under: Cassandra,CouchDB,HBase,MongoDB,NoSQL,Redis,Riak — Patrick Durusau @ 7:56 am

    Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison

    Good thumbnail comparison of the major features of all six (6) NoSQL databases by Kristóf Kovács.

    Sorry to see that Neo4J didn’t make the comparison.

    May 10, 2011

    Hypertable 0.9.5.0 pre-release

    Filed under: Hypertable,NoSQL — Patrick Durusau @ 3:30 pm

    Stability Improvements in the Hypertable 0.9.5.0 pre-release

    From the Hypertable blog:

    We recently announced the Hypertable 0.9.5.0 pre-release. Even though we’ve labelled it as a “pre” release, it is one of the biggest and most important Hypertable releases to date. Among other things, it includes a complete re-write of the Master, to fix some known stability problems. It represents a significant amount of work as can be seen by the following code change statistics:

    • 512 files changed
    • 30,633 line insertions
    • 14,354 line deletions

    The following describes problems that existed in prior releases and how they were solved, and highlights other stability improvements included in the 0.9.5.0 pre-release.

    Details on the recent “pre-release” of Hypertable.

    May 9, 2011

    leveldb

    Filed under: leveldb,NoSQL — Patrick Durusau @ 10:33 am

    leveldb

    A NoSQL library.

    From the website:

    LevelDB is a library that implements a fast persistent key-value store.

    Features

    • Keys and values are arbitrary byte arrays.
    • Data is stored sorted by key.
    • Callers can provide a custom comparison function to override the sort order.
    • The basic operations are Put(key,value), Get(key), Delete(key).
    • Multiple changes can be made in one atomic batch.
    • Users can create a transient snapshot to get a consistent view of data.
    • Forward and backward iteration is supported over the data.
    • Data is automatically compressed using the Snappy compression library.
    • External activity (file system operations etc.) is relayed through a virtual interface so users can customize the operating system interactions.
    • Detailed documentation about how to use the library is included with the source code.
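
To make the feature list concrete, here is a toy in-memory model of the interface it describes: put/get/delete, keys kept in sorted order, an all-or-nothing batch, and forward or backward iteration. It is not LevelDB and does not use LevelDB's API. There is no persistence, compression or snapshot support, and the class and method names are hypothetical.

```python
import bisect

# Toy in-memory model of the interface described above. It is NOT LevelDB or
# its API: there is no persistence, compression or snapshot support, and the
# class and method names are hypothetical.

class ToyStore:
    def __init__(self):
        self._keys = []    # kept sorted, mirroring "stored sorted by key"
        self._data = {}

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        if key in self._data:
            del self._data[key]
            self._keys.remove(key)

    def write_batch(self, ops):
        """Apply all operations or none, by working on a copy first."""
        trial = ToyStore()
        trial._keys, trial._data = list(self._keys), dict(self._data)
        for op, key, *value in ops:
            if op == "put":
                trial.put(key, value[0])
            elif op == "delete":
                trial.delete(key)
            else:
                raise ValueError(f"unknown operation: {op}")
        self._keys, self._data = trial._keys, trial._data

    def items(self, reverse=False):
        keys = reversed(self._keys) if reverse else self._keys
        return [(k, self._data[k]) for k in keys]

db = ToyStore()
db.put(b"b", b"2")
db.put(b"a", b"1")
db.write_batch([("put", b"c", b"3"), ("delete", b"b")])
print(db.items())               # [(b'a', b'1'), (b'c', b'3')]
print(db.items(reverse=True))   # [(b'c', b'3'), (b'a', b'1')]
```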

    Limitations

    • This is not a SQL database. It does not have a relational data model, it does not support SQL queries, and it has no support for indexes.
    • Only a single process (possibly multi-threaded) can access a particular database at a time.
    • There is no client-server support builtin to the library. An application that needs such support will have to wrap their own server around the library.

    May 7, 2011

    Cassandra – New Beta

    Filed under: Cassandra,NoSQL — Patrick Durusau @ 5:50 pm

    Cassandra – New Beta

    Version 0.8.0 beta2 has been posted!

    Changes.

    NoSQL Databases

    Filed under: NoSQL — Patrick Durusau @ 5:49 pm

    NoSQL Databases

    I saw this on the High Scalability blog. It’s a 120+ page overview of NoSQL databases by Christof Strauch, from Stuttgart Media University.

    Christof is quoted as saying the goals of the paper were:

    The paper aims at giving a systematic and thorough introduction and overview of the NoSQL field by assembling information dispersed among blogs, wikis and scientific papers. It firstly discusses reasons, rationales and motives for the development and usage of nonrelational database systems. These can be summarized by the need for high scalability, the processing of large amounts of data, the ability to distribute data among many (often commodity) servers, consequently a distribution-aware design of DBMSs.

    The paper then introduces fundamental concepts, techniques and patterns that are commonly used by NoSQL databases to address consistency, partitioning, storage layout, querying, and distributed data processing. Important concepts like eventual consistency and ACID vs. BASE transaction characteristics are discussed along with a number of notable techniques such as multi-version storage, vector clocks, state vs. operational transfer models, consistent hashing, MapReduce, and row-based vs. columnar vs. log-structured merge tree persistence.

    As a first class of NoSQL databases, key-value-stores are examined by looking at the proprietary, fully distributed, eventual consistent Amazon Dynamo store as well as popular opensource key-value-stores like Project Voldemort, Tokyo Cabinet/Tyrant and Redis.

    In the following, document stores are being observed by reviewing CouchDB and MongoDB as the two major representatives of this class of NoSQL databases. Lastly, the paper takes a look at column-stores by discussing Google’s Bigtable, Hypertable and HBase, as well as Apache Cassandra which integrates the full-distribution and eventual consistency of Amazon’s Dynamo with the data model of Google’s Bigtable.

    May 5, 2011

    Lily 1.0: Smart Data, at Scale, made Easy

    Filed under: Lily,NoSQL — Patrick Durusau @ 1:44 pm

    Lily 1.0: Smart Data, at Scale, made Easy

    From the blog entry:

    We’re really proud to release the first official major release of Lily – our flagship repository for scalable data and content management, after 18 months of intense engineering work. Along this event, we are also launching our commercial Lily services, and announcing some early-stage customers and partners. We’re thrilled being first to launch the first open source, general-purpose, highly-scalable yet flexible data repository based on NOSQL/BigData technology: read all about it below.

    What

    Lily is Smart Data, at Scale, made Easy. Lily is a data and content repository made for the Age of Data: it allows you to store and manage vast amounts of data, and in the future will allow you to monetize user interactions by tracking and analyzing audience data.

    Lily makes Big Data easy with a high-level, developer-friendly data model with rich types, versioning and schema management. Lily offers simple Java and REST APIs for creating, reading and managing data. Its flexible indexing mechanism supports interactive and batch-oriented index maintenance.

    Lily is the foundation for any large-scale data-centric application: social media, e-commerce, large content management applications, product catalogs, archiving, media asset management: any data-centric application with an ambition to scale beyond a single-server setup. Don’t focus on scale and infrastructure: we’ll do that for you while you can focus on real differentiators.

    Lily is dead serious about Scale. The Lily repository has been tested to scale beyond any common content repository technology out there, due to its inherently distributed architecture, providing economically affordable, robust, and high-performing data management services for any kind of enterprise application.

    NoSQL databases for the .NET developer: What’s the fuss all about?

    Filed under: Marketing,NoSQL — Patrick Durusau @ 1:43 pm

    NoSQL databases for the .NET developer: What’s the fuss all about?

    Date: May 24 2011 – 2:00pm – 3:00pm EST

    http://www.regonline.com/970013

    From the post:

    NOSQL (Not Only SQL) databases are one of the hottest technology trends in the software industry. Ranging from web companies like Facebook, Foursquare, Twitter to IT power houses such as the US Federal Government, Banks or NASA; the number of companies that invest in the NOSQL paradigm as part of their infrastructure is growing exponentially. What is this NOSQL movement? What are the different types of NOSQL databases? What are the real advantages, challenges and ROIs? Can we leverage NOSQL databases from my .NET applications? This webinar will present an overview of the NOSQL movement from the perspectives of a .NET developer. We will explore the different types of NOSQL databases as well as their .NET interfaces. Finally, we will present a series of real world examples that illustrate how other companies have taken advantage of NOSQL databases as part of their infrastructure.

    Are you ready to leverage a NoSQL database from inside a topic map .Net application?

    May 2, 2011

    Putting and Getting Data from a Database

    Filed under: Graphs,Key-Value Stores,NoSQL — Patrick Durusau @ 10:33 am

    Putting and Getting Data from a Database

    Overview of database structures and data operations on those structures by Marko A. Rodriguez.

    Marko covers primitive, key-value, and document stores, plus graph databases.

    There are a number of database structures in addition to those four, although those are certainly popular ones.
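
As a rough illustration of how three of those structures differ, the sketch below holds the same small set of facts as key-value pairs, as documents, and as a graph. It is my own minimal example, not taken from Marko's overview. The names and fields are invented.

```python
# Hypothetical sketch: one small fact set held in three kinds of structure.
# Names and fields are invented for illustration.

# Key-value: opaque values looked up by key.
key_value = {"person:1": "Alice", "person:2": "Bob"}

# Document: each value has internal structure that can be queried.
documents = {
    "person:1": {"name": "Alice", "knows": ["person:2"]},
    "person:2": {"name": "Bob", "knows": []},
}

# Graph: relationships are first-class edges that can be traversed.
edges = [("person:1", "knows", "person:2")]

def neighbors(node, label):
    """Follow outgoing edges with the given label from node."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]

print(key_value["person:1"])            # Alice
print(documents["person:1"]["knows"])   # ['person:2']
print(neighbors("person:1", "knows"))   # ['person:2']
```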

    I would not put too much stock into claims about one form of technology or another until I saw it in operation with my data.

    That software works great with someone else’s data isn’t all that interesting, either for you or your manager.

    Neo4J – @emileifrem

    Filed under: Neo4j,NoSQL — Patrick Durusau @ 10:31 am

    Need a graph database like Twitter is built on? @neo4j delivers, @emileifrem tells why

    Emil Eifrem is the CEO of Neo Technology and co-founder of the Neo4J project.

    Nothing technical but an engaging overview of why Neo4J matters.

    It was amusing when the interviewer asked about scaling.

    Emil said the then-current release goes up to 12 billion nodes.

    He calls transparent (to the user) partitioning of a graph, for further scaling, a “non-trivial CS problem.”

    But also says that will be solved in Neo4J 2.0.

    May 1, 2011

    Installing and using Apache Cassandra With Java Part 1 (Installation)

    Filed under: Cassandra,NoSQL — Patrick Durusau @ 5:25 pm

    Installing and using Apache Cassandra With Java Part 1 (Installation)

    This series starts here and goes for five (5) parts for Cassandra 0.6.4.

    From the introduction:

    I’m going to write a few postings on how to use the Cassandra database with Java, although i am in no way an expert on how to use Cassandra i am very intrigued about the database because of it’s small installation, high performance and scalability. During the writing of these posts i am also learning the Cassandra database and i’m sharing my experiences with it through my posts on this blog.

    Like i said before, Cassandra is a very high performing and scalable database, it doesn’t follow the normal SQL database principles like schema’s, tables / columns, datatypes and a query language like SQL. Instead it’s a non-relational database similar to Google’s BigTable. Cassandra was initially developed by Facebook which has contributed it to the open source community. Currently it is used by websites like Facebook, Twitter, Digg, Rackspace and many others. So even though it is still only version 0.6 at the time of writing this it has already proven itself in production environments.

    It isn’t possible to say which (if any) of the NoSQL databases will prove to be the best fits for topic maps in particular or general situations.

    What is clear is that a lot of experimentation and development is underway and hopefully the results will be interesting.
