Archive for the ‘Talend’ Category

Talend 5.5 (DIY Data Integration)

Tuesday, June 3rd, 2014

Talend Increases Big Data Integration Performance and Scalability by 45 Percent

From the post:

Only Talend 5.5 allows developers to generate high performance Hadoop code without needing to be an expert in MapReduce or Pig

(BUSINESS WIRE)–Hadoop Summit — Talend, the global big data integration software leader, today announced the availability of Talend version 5.5, the latest release of the only integration platform optimized to deliver the highest performance on all leading Hadoop distributions.

Talend 5.5 enhances Talend’s performance and scalability on Hadoop by an average of 45 percent. Adoption of Hadoop is skyrocketing and companies large and small are struggling to find enough knowledgeable Hadoop developers to meet this growing demand. Only Talend 5.5 allows any data integration developer to use a visual development environment to generate native, high performance and highly scalable Hadoop code. This unlocks a large pool of development resources that can now contribute to big data projects. In addition, Talend is staying on the cutting edge of new developments in Hadoop that allow big data analytics projects to power real-time customer interactions.


Version 5.5 of all Talend open source products is available for immediate download from Talend’s website. Experimental support for Spark code generation is also available immediately and can be downloaded from the Talend Exchange. Version 5.5 of the commercial subscription products will be available within 3 weeks and will be provided to all existing Talend customers as part of their subscription agreement. Products can also be procured through the usual Talend representatives and partners.

To learn more about Talend 5.5 with 45 percent faster Big Data integration Performance register here for our June 10 webinar.

When you think of the centuries it took to go from the movable type press to modern word processing and near-professional printing/binding capabilities, enabling users to perform their own data processing/integration is nothing short of amazing.

Data scientists need not fear DIY data processing/integration any more than your local bar association fears “How to Avoid Probate” books on the newsstand.

I don’t doubt people will be able to get some answer out of data crunching software, but did they get a useful answer? Or an answer sufficient to set company policy? Or an answer that will increase their bottom line?

Encourage the use of open source software. Non-clients who use it poorly will fail. Make sure they can’t say the same about your clients.

BTW, the webinar appears to be scheduled for thirty (30) minutes. Thirty minutes on Talend 5.5? You will be better off spending that thirty minutes with Talend 5.5.

Fun with Music, Neo4j and Talend

Saturday, July 13th, 2013

Fun with Music, Neo4j and Talend by Rik Van Bruggen.

From the post:

Many of you know that I am a big fan of Belgian beers. But of course I have a number of other hobbies and passions. One of those being: Music. I have played music, created music (although that seems like a very long time ago) and still listen to new music almost every single day. So when sometime in 2006 I heard about this really cool music site called, I was one of the early adopters to try use it. So: a good 7 years later and 50k+ scrobbles later, I have quite a bit of data about my musical habits.

On top of that, I have a couple of friends that have been using as well. So this got me thinking. What if I was somehow able to get that data into neo4j, and start “walking the graph”? I am sure that must give me some interesting new musical insights… It almost feels like a “recommendation graph for music” … Let’s see where this brings us.

Usual graph story but made more interesting by the use of Talend ETL tools.

Good opportunity to become familiar with Talend if you don’t know the tools already.
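
As a rough illustration of what “walking the graph” for music recommendations might look like, here is a minimal sketch in plain Python. It is not Rik’s Neo4j model or his Talend job; the `listens` data, the user and artist names, and the `recommend` function are all hypothetical. The walk goes from a user to their artists, on to other users who share those artists, and then to artists the first user has not heard.

```python
from collections import Counter

# Hypothetical listening data (user -> artists). A real graph would be
# loaded into Neo4j; a plain dict stands in for it here.
listens = {
    "rik": {"Artist A", "Artist B"},
    "friend1": {"Artist A", "Artist C"},
    "friend2": {"Artist B", "Artist C", "Artist D"},
}


def recommend(listens, user):
    """Rank artists the user has not heard by how many
    user -> artist -> other user -> artist paths lead to them."""
    mine = listens[user]
    scores = Counter()
    for other, theirs in listens.items():
        if other == user:
            continue
        shared = mine & theirs  # artists both users listen to
        if not shared:
            continue  # no path from user to this listener
        for artist in theirs - mine:
            scores[artist] += len(shared)
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [artist for artist, _ in ranked]


if __name__ == "__main__":
    print(recommend(listens, "rik"))
```

In a graph database the same walk would be a single path query; the point of the sketch is only the shape of the traversal.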

Trying to get the coding Pig, er – monkey off your back?

Thursday, June 27th, 2013

Trying to get the coding Pig, er – monkey off your back?

From the webpage:

Are you struggling with the basic ‘WordCount’ demo, or which Mahout algorithm you should be using? Forget hand-coding and see what you can do with Talend Studio.

In this on-demand webinar we demonstrate how you could become MUCH more productive with Hadoop and NoSQL. Talend Big Data allows you to develop in Eclipse and run your data jobs 100% natively on Hadoop… and become a big data guru over night. Rémy Dubois, big data specialist and Talend Lead developer, shows you in real-time:

  • How to visually create the ‘WordCount’ example in under 5 minutes
  • How to graphically build a big data job to perform sentiment analysis
  • How to archive NoSQL and optimize data warehouse usage

A content-filled webinar! Who knew?

Be forewarned that the demos presume familiarity with the Talend interface and the demo presenter is difficult to understand.

From what I got out of the earlier parts of the webinar, very much a step in the right direction to empower users with big data.

Think of the distance between stacks of punch cards (Hadoop/MapReduce a few years ago) and the personal computer (Talend and others).

That was a big shift. This one is likely to be as well.

Looks like I need to spend some serious time with the latest Talend release!
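
For a sense of what the hand-coded “WordCount” demo involves, here is a minimal sketch of its map, shuffle and reduce steps in plain Python. It does not use the Hadoop API and is not the code Talend generates; the function names and the tokenizing rule are my own assumptions, chosen only to show the logic a visual tool would hide.

```python
import re
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in re.findall(r"[a-z0-9']+", line.lower()):
            yield word, 1


def shuffle(pairs):
    """Group values by key, the step a MapReduce framework
    performs between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = ["Hadoop runs MapReduce", "Talend generates MapReduce code"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

On a real cluster the framework handles the shuffle and distributes the map and reduce calls across machines, which is where most of the hand-coding effort goes.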

Talend Improves Usability of Big Data…

Wednesday, May 8th, 2013

Talend Improves Usability of Big Data with Release of Integration Platform

Talend today announced the availability of version 5.3 of its next-generation integration platform, a unified environment that scales the integration of data, application and business processes. With version 5.3, Talend allows any integration developer to develop on big data platforms without requiring specific expertise in these areas.

“Hadoop and NoSQL are changing the way people manage and analyze data, but up until now, it has been difficult to work with these technologies. The general lack of skillsets required to manage these new technologies continues to be a significant barrier to mainstream adoption,” said Fabrice Bonan, co-founder and chief technical officer, Talend. “Talend v5.3 delivers on our vision of providing innovative tools that hide the underlying complexity of big data, turning anyone with integration skills into expert big data developers.”

User-Friendly Tools for 100 Percent MapReduce Code

Talend v5.3 generates native Hadoop code and runs data transformations directly inside Hadoop for scalability. By leveraging MapReduce’s architecture for highly distributed data processing, data integration developers can build their jobs on Hadoop without the need for specialist programming skills.

Graphical Mapper for Complex Processes

The new graphical mapping functionality targeting big data, and especially the Pig language, allows developers to graphically build data flows to take source data and transform it using a visual mapper. For Hadoop developers familiar with Pig Latin, this mapper enables them to develop, test and preview their data jobs within a GUI environment.

Additional NoSQL Support

Talend 5.3 adds support for NoSQL databases in its integration solutions, Talend Platform for Big Data and Talend Open Studio for Big Data, with a new set of connectors for Couchbase, CouchDB and Neo4j. Built on Talend’s open source integration technology, Talend Open Studio for Big Data is a powerful and versatile open source solution for big data integration that natively supports Apache Hadoop, including connectors for Hadoop Distributed File System (HDFS), HCatalog, Hive, Oozie, Pig, Sqoop, Cassandra, Hbase and MongoDB – in addition to the more than 450 connectors included natively in the product. The integration of these platforms into Talend’s big data solution enables customers to use these new connectors to migrate and synchronize data between NoSQL databases and all other data stores and systems.

Of particular interest is their data integration package, which reportedly sports 450+ connectors to various data sources.

Unless you are interested in coding all new connectors for the same 450+ data sources.

[P]urchase open source software products…based on its true technical merits [Alert: New Government Concept]

Monday, December 10th, 2012

Talend, an open source based company, took the lead in obtaining a favorable ruling on software conformance with the Trade Agreements Act (TAA).

Trade Agreements Act: Quick summary – Goods manufactured in non-designated countries cannot be purchased by federal agencies. Open source software can have significant contact with non-designated countries. Non-conformance with the TAA means open source software loses an important market.

Talend obtained a very favorable ruling for open source software. The impact of that ruling:

“The Talend Ruling is significant because government users now have useful guidance specifically addressing open source software that is developed and substantially transformed in a designated country, but also includes, or is based upon, source code from a non-designated country,” said Fern Lavallee, DLA Piper LLP (US), counsel to Talend. “Federal agencies can now purchase open source software products like Talend software based on its true technical merits, including ease of use, flexibility, robust documentation and data components and its substantial life-cycle cost advantages, while also having complete confidence in the product’s full compliance with threshold requirements like the TAA. The timing of this Ruling is right given the Department of Defense’s well publicized attention and commitment to Better Buying Power and DoD’s recent Open Systems Architecture initiative.” (Quote from Government Agency Gives Talend Green Light on Open Source)

An important ruling for all open source software projects, including topic maps.

I started to post about it when it first appeared but reports of rulings aren’t the same as the rulings themselves.

Talend graciously forwarded a copy of the ruling and gave permission for it to be posted for your review. Talend-Inc-US-Customs-and-Border-Protection-Response-Letter.pdf

Looking forward to news of your efforts to make it possible for governments to buy open source software “…based on its true technical merits.”

Data Integration Services & Hortonworks Data Platform

Thursday, June 28th, 2012

Data Integration Services & Hortonworks Data Platform by Jim Walker

From the post:

What’s possible with all this data?

Data Integration is a key component of the Hadoop solution architecture. It is the first obstacle encountered once your cluster is up and running. Ok, I have a cluster… now what? Do I write a script to move the data? What is the language? Isn’t this just ETL with HDFS as another target? Well, yes…

Sure you can write custom scripts to perform a load, but that is hardly repeatable and not viable in the long term. You could also use Apache Sqoop (available in HDP today), which is a tool to push bulk data from relational stores into HDFS. While effective and great for basic loads, there is work to be done on the connections and transforms necessary in these types of flows. While custom scripts and Sqoop are both viable alternatives, they won’t cover everything and you still need to be a bit technical to be successful.

For wide scale adoption of Apache Hadoop, tools that abstract integration complexity are necessary for the rest of us. Enter Talend Open Studio for Big Data. We have worked with Talend in order to deeply integrate their graphical data integration tools with HDP as well as extend their offering beyond HDFS, Hive, Pig and HBase into HCatalog (metadata service) and Oozie (workflow and job scheduler).

Jim covers four advantages of using Talend:

  • Bridge the skills gap
  • HCatalog Integration
  • Connect to the entire enterprise
  • Graphic Pig Script Creation

Definitely something to keep in mind.

Talend Updates

Sunday, May 20th, 2012

Talend updates data tools to 5.1.0

From the post:

Talend has updated all the applications that run on its Open Studio unified platform to version 5.1.0. Talend’s Open Studio is an Eclipse-based environment that hosts the company’s Data Integration, Big Data, Data Quality, MDM (Master Data Management) and ESB (Enterprise Service Bus) products. The system allows a user to, using the Data Integration as an example, use a GUI to define processes that can extract data from the web, databases, files or other resources, process that data, and feed it on to other systems. The resulting definition can then be compiled into a production application.

In the 5.1.0 update, Open Studio for Data Integration has, according to the release notes, been given enhanced XML mapping and support for XML documents in its SOAP, JMS, File and Mom components. A new component has also been added to help manage Kerberos security. Open Studio for Data Quality has been enhanced with new ways to apply an analysis on multiple files, and the ability to drill down through business rules to see the invalid, as well as valid, records selected by the rules.

I am upgrading following a motherboard failure, so I will be putting the latest version of the software on the new box.

Comments or suggestions on the Talend updates?

7 top tools for taming big data

Thursday, April 19th, 2012

7 top tools for taming big data by Peter Wayner.

Peter covers:

  • Jaspersoft BI Suite
  • Pentaho Business Analytics
  • Karmasphere Studio and Analyst
  • Talend Open Studio
  • Skytree Server
  • Tableau Desktop and Server
  • Splunk

Not as close to the metal as Lucene/Solr, Hadoop, HBase, Neo4j, and many other packages but not bad starting places.

Do be mindful of Peter’s closing paragraph:

At a recent O’Reilly Strata conference on big data, one of the best panels debated whether it was better to hire an expert on the subject being measured or an expert on using algorithms to find outliers. I’m not sure I can choose, but I think it’s important to hire a person with a mandate to think deeply about the data. It’s not enough to just buy some software and push a button.

Talend Open Studio for Big Data w/ Hadoop

Sunday, March 11th, 2012

Talend Empowers Apache Hadoop Community with Talend Open Studio for Big Data

From the post:

Talend, a global leader in open source integration software, today announced the availability of Talend Open Studio for Big Data, to be released under the Apache Software License. Talend Open Studio for Big Data is based on the world’s most popular open source integration product, Talend Open Studio, augmented with native support for Apache Hadoop. In addition, Talend Open Studio for Big Data will be bundled in Hortonworks’ leading Apache Hadoop distribution, Hortonworks Data Platform, constituting a key integration component of Hortonworks Data Platform, a massively scalable, 100 percent open source platform for storing, processing and analyzing large volumes of data.

Talend Open Studio for Big Data is a powerful and versatile open source solution for data integration that dramatically improves the efficiency of integration job design through an easy-to-use graphical development environment. Talend Open Studio for Big Data provides native support for Hadoop Distributed File System (HDFS), Pig, HBase, Sqoop and Hive. By leveraging Hadoop’s MapReduce architecture for highly-distributed data processing, Talend generates native Hadoop code and runs data transformations directly inside Hadoop for maximum scalability. This feature enables organizations to easily combine Hadoop-based processing, with traditional data integration processes, either ETL or ELT-based, for superior overall performance.

“By making Talend Open Studio for Big Data a key integration component of the Hortonworks Data Platform, we are providing Hadoop users with the ability to move data in and out of Hadoop without having to write complex code,” said Eric Baldeschwieler, CTO & co-founder of Hortonworks. “Talend provides the most powerful open source integration solution for enterprise data, and we are thrilled to be working with Talend to provide to the Apache Hadoop community such advanced integration capabilities.”


Talend Open Studio for Big Data will be available in May 2012. A preview version of the product is available immediately at

Good news but we also know that the Hadoop paradigm is evolving: Tuple MapReduce: beyond the classic MapReduce.

Will early adopters of Hadoop be just as willing to migrate as the paradigm develops?

First Look — Talend

Saturday, January 7th, 2012

First Look — Talend

From the post:

Talend has been around for about 6 years and the original focus was on “democratizing” data integration – making it cheaper, easier, quicker and less maintenance-heavy. They originally wanted to build an open source alternative for data integration. In particular they wanted to make sure that there was a product that worked for smaller companies and smaller projects, not just for large data warehouse efforts.

Talend has 400 employees in 8 countries and 2,500 paying customers for their Enterprise product. Talend uses an “open core” philosophy where the core product is open source and the enterprise version wraps around this as a paid product. They have expanded from pure data integration into a broader platform with data quality and MDM and a year ago they acquired an open source ESB vendor and earlier this year released a Talend branded version of this ESB.

I have the Talend software but need to spend some time working through the tutorials, etc.

The review will be from the perspective of subject identity and re-use of subject identification.

It may help me to simply start posting as I work through the software rather than waiting to create an edited review of the whole, which I could always fashion from the pieces if it looked useful.

Watch for the start of my review of Talend this next week.

Hadoop User Group UK: Data Integration

Sunday, October 16th, 2011

Hadoop User Group UK: Data Integration

Three presentations captured as podcasts from the Hadoop User Group UK:




Fresh as of 13 October 2011.

Thanks to Skills Matter for making the podcasts available!