Archive for the ‘Finance Services’ Category

If You Don’t Think “Working For The Man” Is All That Weird

Saturday, June 17th, 2017

J.P. Morgan’s massive guide to machine learning and big data jobs in finance by Sarah Butcher.

From the post:

Financial services jobs go in and out of fashion. In 2001 equity research for internet companies was all the rage. In 2006, structuring collateralised debt obligations (CDOs) was the thing. In 2010, credit traders were popular. In 2014, compliance professionals were it. In 2017, it’s all about machine learning and big data. If you can get in here, your future in finance will be assured.

J.P. Morgan’s quantitative investing and derivatives strategy team, led by Marko Kolanovic and Rajesh T. Krishnamachari, has just issued the most comprehensive report ever on big data and machine learning in financial services.

Titled, ‘Big Data and AI Strategies’ and subheaded, ‘Machine Learning and Alternative Data Approach to Investing’, the report says that machine learning will become crucial to the future functioning of markets. Analysts, portfolio managers, traders and chief investment officers all need to become familiar with machine learning techniques. If they don’t they’ll be left behind: traditional data sources like quarterly earnings and GDP figures will become increasingly irrelevant as managers using newer datasets and methods will be able to predict them in advance and to trade ahead of their release.

At 280 pages, the report is too long to cover in detail, but we’ve pulled out the most salient points for you below.

How important is Sarah’s post and the report by J.P. Morgan?

Let me put it this way: Sarah’s post is the first business-type post I have saved as a complete webpage so I can clean it up and print it without all the clutter. The first this year, and perhaps last year as well. It’s that important.

Sarah’s post is a quick guide to the languages, talents and tools you will need to start “working for the man.”

If that catches your interest, then Sarah’s post is pure gold.


PS: I’m still working on a link to the full 280-page report. The switchboard is down for the weekend, so I will follow up with J.P. Morgan next Monday.

What Counts: Harnessing Data for America’s Communities

Friday, January 16th, 2015

What Counts: Harnessing Data for America’s Communities. Senior Editors: Naomi Cytron, Kathryn L.S. Pettit, & G. Thomas Kingsley. (New book, free PDF.)

From: A Roadmap: How To Use This Book

This book is a response to the explosive interest in and availability of data, especially for improving America’s communities. It is designed to be useful to practitioners, policymakers, funders, and the data intermediaries and other technical experts who help transform all types of data into useful information. Some of the essays—which draw on experts from community development, population health, education, finance, law, and information systems—address high-level systems-change work. Others are immensely practical, and come close to explaining “how to.” All discuss the incredibly exciting opportunities and challenges that our ever-increasing ability to access and analyze data provide.

As the book’s editors, we of course believe everyone interested in improving outcomes for low-income communities would benefit from reading every essay. But we’re also realists, and know the demands of the day-to-day work of advancing opportunity and promoting well-being for disadvantaged populations. With that in mind, we are providing this roadmap to enable readers with different needs to start with the essays most likely to be of interest to them.

For everyone, but especially those who are relatively new to understanding the promise of today’s data for communities, the opening essay is a useful summary and primer. Similarly, the final essay provides both a synthesis of the book’s primary themes and a focus on the systems challenges ahead.

Section 2, Transforming Data into Policy-Relevant Information (Data for Policy), offers a glimpse into the array of data tools and approaches that advocates, planners, investors, developers and others are currently using to inform and shape local and regional processes.

Section 3, Enhancing Data Access and Transparency (Access and Transparency), should catch the eye of those whose interests are in expanding the range of data that is commonly within reach and finding ways to link data across multiple policy and program domains, all while ensuring that privacy and security are respected.

Section 4, Strengthening the Validity and Use of Data (Strengthening Validity), will be particularly provocative for those concerned about building the capacity of practitioners and policymakers to employ appropriate data for understanding and shaping community change.

The essays in section 5, Adopting More Strategic Practices (Strategic Practices), examine the roles that practitioners, funders, and policymakers all have in improving the ways we capture the multi-faceted nature of community change, communicate about the outcomes and value of our work, and influence policy at the national level.

There are of course interconnections among the essays in each section. We hope that wherever you start reading, you’ll be inspired to dig deeper into the book’s enormous richness, and will join us in an ongoing conversation about how to employ the ideas in this volume to advance policy and practice.

Thirty-one (31) essays by dozens of authors on data and its role in public policy making.

From the acknowledgements:

This book is a joint project of the Federal Reserve Bank of San Francisco and the Urban Institute. The Robert Wood Johnson Foundation provided the Urban Institute with a grant to cover the costs of staff and research that were essential to this project. We also benefited from the field-building work on data from Robert Wood Johnson grantees, many of whom are authors in this volume.

If you are pitching data and/or data projects in settings where the Federal Reserve Bank of San Francisco and the Urban Institute set the tone of policymaking conversations, this is a must-read. It is likely to influence other policy discussions as well, adjusted for local concerns and conventions. You could also use it to shape your local policy discussions.

I first saw this in There is no seamless link between data and transparency by Jennifer Tankard.

While We Were Distracted….

Tuesday, January 13th, 2015

I have long suspected that mainstream news, with its terrorist attacks, high profile political disputes, etc., is a dangerous distraction. Here is one more brick to shore up that opinion.

Congress attempts giant leap backward on data transparency by Pam Baker.

From the post:

The new Republican Congress was incredibly busy on its first full day at work. 241 bills were introduced on that day and more than a few were highly controversial. While polarizing bills on abortion, Obamacare and immigration got all the media headlines, one very important Congressional action dipped beneath the radar: an attempt to eliminate data transparency in financial reporting.

The provision to the “Promoting Job Creation and Reducing Small Business Burdens Act” would exempt nearly 60 percent of public companies from filing data-based reports with the Securities and Exchange Commission (SEC), according to the Data Transparency Coalition.

“This action will set the U.S. on a path backwards and put our financial regulators, public companies and investors at a significant disadvantage to global competitors. It is tremendously disappointing to see that one of the first actions of the new Congress is to put forward legislation that would harm American competitiveness and deal a major setback to data transparency in financial regulation,” said Hudson Hollister, the executive director of the Data Transparency Coalition, a trade association pursuing the publication of government information as standardized, machine-readable data.

See Pam’s post for some positive steps you can take with regard to this bill and how to remain informed about similar attempts in the future.

To be honest, the SEC apparently is having all sorts of data management difficulties, but given the success rate of government data projects, that’s not hard to believe. Still, the solution to such a problem isn’t to simply stop collecting information.

No doubt the SEC is locked into various custom/proprietary systems, but what if they opened up all the information about those systems for an open source project, say under the Apache Foundation, to integrate some specified data set into their systems?

It surely could not fare any worse than projects for which the government hires contractors.

Tracking Government/Terrorist Financing

Wednesday, December 17th, 2014

Deep Learning Intelligence Platform – Addressing the KYC AML Terrorism Financing Challenge by Dr. Jerry A. Smith.

From the post:

Terrorism impacts our lives each and every day; whether directly through acts of violence by terrorists, reduced liberties from new anti-terrorism laws, or increased taxes to support counter terrorism activities. A vital component of terrorism is the means through which these activities are financed, through legal and illicit financial activities. Recognizing the necessity to limit these financial activities in order to reduce terrorism, many nation states have agreed to a framework of global regulations, some of which have been realized through regulatory programs such as the Bank Secrecy Act (BSA).

As part of the BSA (and other similar regulations), governed financial services institutions are required to determine if the financial transactions of a person or entity are related to financing terrorism. This is a specific report requirement found in Response 30, of Section 2, in the FinCEN Suspicious Activity Report (SAR). For every financial transaction moving through a given banking system, the institution needs to determine if it is suspicious and, if so, whether it is part of a larger terrorist activity. In the event that it is, the financial services institution is required to immediately file a SAR and call FinCEN.

The process of determining if a financial transaction is terrorism related is not merely a compliance issue, but a national security imperative. No solution exists today that adequately addresses this requirement. As such, I was asked to speak on the issue as a data scientist practicing in the private intelligence community. These are some of the relevant points from that discussion.

Jerry has a great outline of the capabilities you will need for tracking government/terrorist financing. Depending upon your client’s interest, you may be required to monitor data flows in order to trigger the filing of a SAR and calling FinCEN or to avoid triggering the filing of a SAR and calling FinCEN. For either goal the tools and techniques are largely the same.

Or for monitoring government funding for torture or groups to carry out atrocities on its behalf. Same data mining techniques apply.

Have you ever noticed that government data leaks rarely involve financial records? I am thinking of the consequences of leaking the accounts payable ledger listing all the organizations and people paid by the Bush administration, minus all the Social Security and retirement recipients.

That would be near the top of my most wanted data leaks list.



Monday, March 17th, 2014

ACTUS (Algorithmic Contract Types Unified Standards)

From the webpage:

The Alfred P. Sloan Foundation awarded Stevens Institute of Technology a grant to work on the proposal entitled “Creating a standard language for financial contracts and a contract-centric analytical framework”. The standard follows the theoretical groundwork laid down in the book “Unified Financial Analysis” (1) – UFA. The goal of this project is to build a financial instrument reference database that represents virtually all financial contracts as algorithms that link changes in risk factors (market risk, credit risk, and behavior, etc.) to cash flow obligations of financial contracts. This reference database will be the technological core of a future open source community that will maintain and evolve standardized financial contract representations for the use of regulators, risk managers, and researchers.

The objective of the project is to develop a set of about 30 unique contract types (CTs) that represent virtually all existing financial contracts and which generate state contingent cash flows at a high level of precision. The term of art that describes the impact of changes in the risk factors on the cash flow obligations of a financial contract is called “state contingent cash flows,” which are the key input to virtually all financial analysis including models that assess financial risk.

1- Willi Brammertz, Ioannis Akkizidis, Wolfgang Breymann, Rami Entin, Marco Rustmann; Unified Financial Analysis – The Missing Links of Finance, Wiley 2009.
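To make “state contingent cash flows” concrete, here is a minimal Python sketch, assuming a simple floating-rate loan that pays annual interest (a reference rate plus a fixed spread) and repays its principal at maturity. The names `FloatingRateLoan` and `cash_flows` are hypothetical; this is not the ACTUS specification, one of its contract type definitions, or any ACTUS library API. It only illustrates the core idea: the same contract terms, run against different risk-factor scenarios, produce different cash flow obligations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatingRateLoan:
    """Hypothetical contract terms, loosely modeled on a principal-at-maturity loan."""
    notional: float  # principal, repaid in full at maturity
    spread: float    # fixed margin added to the reference rate each year
    years: int       # number of annual periods to maturity


def cash_flows(contract: FloatingRateLoan, reference_rates: list[float]) -> list[tuple[int, float]]:
    """Map one risk-factor scenario (a path of annual reference rates) to the
    holder's cash flows: interest every year, plus principal in the final year."""
    if len(reference_rates) < contract.years:
        raise ValueError("scenario must supply a rate for every period")
    flows = []
    for year in range(1, contract.years + 1):
        rate = reference_rates[year - 1] + contract.spread
        amount = contract.notional * rate
        if year == contract.years:
            amount += contract.notional  # principal repaid at maturity
        flows.append((year, round(amount, 2)))
    return flows


loan = FloatingRateLoan(notional=1000.0, spread=0.01, years=3)
print(cash_flows(loan, [0.02, 0.02, 0.02]))  # flat rate scenario
print(cash_flows(loan, [0.02, 0.04, 0.06]))  # rising rate scenario
```

The flat scenario yields 30, 30 and 1030, while the rising scenario yields 30, 50 and 1070: identical terms, different obligations, which is exactly the mapping a standardized contract type has to pin down precisely.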

This will help people who are not cheating in the financial markets.

After the revelations of the past couple of years, any guesses on the statistics of non-cheating members of the financial community?


Even if these are used by non-cheaters, we know that the semantics are going to vary from user to user.

The real questions are: 1) How will we detect semantic divergence? and 2) How much semantic divergence can be tolerated?

I first saw this in a tweet by Stefano Bertolo.

Islamic Finance: A Quest for Publically Available Bank-level Data

Wednesday, February 12th, 2014

Islamic Finance: A Quest for Publically Available Bank-level Data by Amin Mohseni-Cheraghlou.

From the post:

Attend a seminar or read a report on Islamic finance and chances are you will come across a figure between $1 trillion and $1.6 trillion, referring to the estimated size of the global Islamic assets. While these aggregate global figures are frequently mentioned, publically available bank-level data have been much harder to come by.

Considering the rapid growth of Islamic finance, its growing popularity in both Muslim and non-Muslim countries, and its emerging role in global financial industry, especially after the recent global financial crisis, it is imperative to have up-to-date and reliable bank-level data on Islamic financial institutions from around the globe.

To date, there is a surprising lack of publically available, consistent and up-to-date data on the size of Islamic assets on a bank-by-bank basis. In fairness, some subscription-based datasets, such as Bureau Van Dijk’s Bankscope, do include annual financial data on some of the world’s leading Islamic financial institutions. Bank-level data are also compiled by The Banker’s Top Islamic Financial Institutions Report and Ernst & Young’s World Islamic Banking Competitiveness Report, but these are not publically available and require subscription premiums, making it difficult for many researchers and experts to access. As a result, data on Islamic financial institutions are associated with some level of opaqueness, creating obstacles and challenges for empirical research on Islamic finance.

The recent opening of the Global Center for Islamic Finance by World Bank Group President Jim Young Kim may lead to exciting venues and opportunities for standardization, data collection, and empirical research on Islamic finance. In the meantime, the Global Financial Development Report (GFDR) team at the World Bank has also started to take some initial steps towards this end.

I can think of two immediate benefits from publicly available data on Islamic financial institutions:

First, hopefully it will increase demands for meaningful transparency in Western financial institutions.

Second, it will blunt government hand-waving and propaganda about the purposes of Islamic financial institutions, which, like financial institutions everywhere, want to remain solvent, serve the needs of their customers and play active roles in their communities. Nothing more sinister than that.

Perhaps the best way to vanquish suspicion is with transparency. Except for the fringe cases who treat a lack of evidence as proof of secret evildoing.

Transparency and Bank Failures

Sunday, January 12th, 2014

The Relation Between Bank Resolutions and Information Environment: Evidence from the Auctions for Failed Banks by João Granja.


This study examines the impact of disclosure requirements on the resolution costs of failed banks. Consistent with the hypothesis that disclosure requirements mitigate information asymmetries in the auctions for failed banks, I find that, when failed banks are subject to more comprehensive disclosure requirements, regulators incur lower costs of closing a bank and retain a lower portion of the failed bank’s assets, while bidders that are geographically more distant are more likely to participate in the bidding for the failed bank. The paper provides new insights into the relation between disclosure and the reorganization of a banking system when the regulators’ preferred plan of action is to promote the acquisition of undercapitalized banks by healthy ones. The results suggest that disclosure regulation policy influences the cost of resolution of a bank and, as a result, could be an important factor in the definition of the optimal resolution strategy during a banking crisis event.

A reminder that transparency needs to be broader than open data in science and government.

In the case of bank failures, transparency lowers the cost of such failures for the public.

Some interests profit from less transparency in bank failures and other interests (like the public) profit from greater transparency.

If bank failure doesn’t sound like a current problem, consider Map of Banks Failed since 2008. (Select Failed Banks Map under Quick Links to display the maps.) U.S. only. Do you know of a similar map for other countries?

Speaking of transparency, it would be interesting to track the formal, financial and social relationships of those acquiring failed bank assets.

You know, the ones that are selling for less than fair market value due to a lack of transparency.


Tuesday, January 7th, 2014

Unaccountable: The high cost of the Pentagon’s bad bookkeeping.

Part 1: Number Crunch by Scot J. Paltrow and Kelly Carr (July 2, 2013)

Part 2: Faking It by Scot J. Paltrow (November 18, 2013)

Part 3: Broken Fixes by Scot J. Paltrow (December 23, 2013)

If you imagine NSA fraud as being out of control, you haven’t seen anything yet.

Stated bluntly, bad bookkeeping by the Pentagon has a negative impact on its troops, its ability to carry out its primary missions and is a sinkhole for taxpayer dollars.

If you make it to the end of Part 3, you will find:

  • The Pentagon was required to be auditable by 1996 (with all other federal agencies). The current, largely fictional deadline is 2017.
  • Since 1996, the Pentagon has spent an unaudited $8.5 trillion.
  • The Pentagon may have as many as 5,000 separate accounting systems.
  • More than one attempt to replace Pentagon accounting systems has been canceled as a failure after $1 billion was spent on it.
  • There are no legal consequences for the Pentagon, the military services, their members or civilian contractors if the Pentagon fails to meet audit deadlines.

If external forces were degrading the effectiveness of the U.S. military to this degree, Congress would be hot to fix the problem.

Topic maps aren’t a complete answer to this problem but they could help with the lack of accountability for the problem. Every order originates with someone approving it. Topic maps could bind that order to a specific individual and track its course through whatever systems exist today.

A running total of unaudited funds would be kept for every individual who approved an order. If those funds cannot be audited within, say, 90 days of the end of the fiscal year, a lien would be placed against any and all benefits that individual has accrued to that point, and against those of everyone above them in the chain of command. That gives commanders “skin in the game.”

Tracking responsibility rather than funds, with automatic consequences for failure, would give the Pentagon incentives to improve the morale of its troops, improve its combat readiness and be credible when asking Congress and the American public for additional funds for specific purposes.
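The proposal is at heart a bookkeeping rule, so it can be sketched directly. The following minimal Python sketch assumes a hypothetical chain of command stored as a person-to-superior mapping and a list of orders, each bound to the one individual who approved it; `Order`, `unaudited_exposure` and `liens_due` are illustrative names, not any existing Pentagon or topic map system. Each unaudited order is charged to its approver and to everyone above them, and once a 90-day grace period after fiscal year-end has passed, the remaining totals are what a lien would attach to.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Order:
    approver: str  # the individual who signed off on the order
    amount: float
    audited: bool  # True once the spending has been traced and verified


def unaudited_exposure(orders, superior):
    """Charge every unaudited order to its approver and to everyone above them.

    `superior` maps each person to their immediate superior (None at the top).
    """
    totals = defaultdict(float)
    for order in orders:
        if order.audited:
            continue
        person = order.approver
        while person is not None:  # walk up the chain of command
            totals[person] += order.amount
            person = superior[person]
    return dict(totals)


def liens_due(orders, superior, fiscal_year_end, today, grace_days=90):
    """After the grace period, list everyone still carrying unaudited funds."""
    if today <= fiscal_year_end + timedelta(days=grace_days):
        return {}
    return {p: t for p, t in unaudited_exposure(orders, superior).items() if t > 0}


# Hypothetical chain of command and orders, for illustration only.
chain = {"clerk": "captain", "captain": "colonel", "colonel": None}
orders = [
    Order("clerk", 500.0, audited=False),
    Order("captain", 200.0, audited=True),
    Order("captain", 300.0, audited=False),
]
print(unaudited_exposure(orders, chain))
print(liens_due(orders, chain, date(2013, 9, 30), date(2014, 1, 15)))
```

Note that the colonel carries the full 800 even though he approved nothing directly; that is the “skin in the game” the proposal is after.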

Do you have similar problems at your enterprise?

Financial Data Accessible from R – Part IV

Saturday, December 14th, 2013

The R Trader blog is collecting sources of financial data accessible from R.

Financial Data Accessible from R IV

From the post:

DataMarket is the latest data source of financial data accessible from R I came across. A good tutorial can be found here. I updated the table and the descriptions below.

R Trader is a fairly new blog but I like the emphasis on data sources.

Not the largest list of data sources for financial markets I have ever seen but then it isn’t the quantity of data that makes a difference. (Ask the NSA about 9/11.)

What makes a difference is your skill at collecting the right data and at analyzing it.

What’s Not There: The Odd-Lot Bias in TAQ Data

Tuesday, December 10th, 2013

What’s Not There: The Odd-Lot Bias in TAQ Data by Maureen O’Hara, Chen Yao, and Mao Ye.


We investigate the systematic bias that arises from the exclusion of trades for less than 100 shares from TAQ data. In our sample, we find that the median number of missing trades per stock is 19%, but for some stocks missing trades are as high as 66% of total transactions. Missing trades are more pervasive for stocks with higher prices, lower liquidity, higher levels of information asymmetry and when volatility is low. We show that odd lot trades contribute 30% of price discovery and trades of 100 shares contribute another 50%, consistent with informed traders splitting orders into odd-lots and smaller trade sizes. The truncation of odd-lot trades leads to a significant bias for empirical measures such as order imbalance, challenges the literature using trade size to proxy individual trades, and biases measures of individual sentiment. Because odd-lot trades are more likely to arise from high frequency traders, we argue their exclusion from TAQ and the consolidated tape raises important regulatory issues.

TAQ = Trade and Quote Detail.

Amazing what you can find if you go looking for it. O’Hara and friends find that missing trades can be as much as 66% of the total transactions for some stocks.
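A toy example shows how dropping odd lots can distort a measure like order imbalance. This is a minimal Python sketch, not the paper’s methodology: the trades are made up, order imbalance is taken here as (buy volume minus sell volume) divided by total volume, and each trade is assumed to already carry an inferred buy or sell side.

```python
# Made-up trades: (shares, side), where side is "B" (buyer-initiated) or "S" (seller-initiated).
TRADES = [(50, "B"), (30, "B"), (75, "B"), (100, "S"), (200, "B"), (100, "S"), (40, "B")]


def drop_odd_lots(trades, round_lot=100):
    """Mimic the truncation: keep only trades of at least one round lot."""
    return [t for t in trades if t[0] >= round_lot]


def missing_share(trades, round_lot=100):
    """Fraction of transactions that disappear when odd lots are excluded."""
    return 1 - len(drop_odd_lots(trades, round_lot)) / len(trades)


def order_imbalance(trades):
    """(buy volume - sell volume) / total volume; 0.0 for an empty sample."""
    buys = sum(size for size, side in trades if side == "B")
    sells = sum(size for size, side in trades if side == "S")
    total = buys + sells
    return (buys - sells) / total if total else 0.0


print(round(missing_share(TRADES), 3))                   # share of trades lost
print(round(order_imbalance(TRADES), 3))                 # imbalance with odd lots
print(round(order_imbalance(drop_odd_lots(TRADES)), 3))  # imbalance without them
```

In this made-up sample four of seven trades vanish, and because the buying pressure lives entirely in odd lots, the truncated data reports a perfectly balanced market (0.0) instead of clear net buying (about 0.328).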

The really big news is that, following this academic paper, US regulators required disclosure of this hidden data starting on December 9, 2013.

For access, see the Daily TAQ, where you will find the raw data for $1,500 per year for one user.

Despite its importance to the public, I don’t know of any time-delayed public archive of trade data.

Format specifications and sample data are available for:

  • Daily Trades File: Every trade reported to the consolidated tape, from all CTA participants. Each trade identifies the time, exchange, security, volume, price, sale condition, and more.
  • Daily TAQ Master File (Beta): (specification only)
  • Daily TAQ Master File: All master securities information in NYSE-listed and non-listed stocks, including Primary Exchange Indicator
  • Daily Quote and Trade Admin Message File: All Limit-up/Limit-down Price Band messages published on the CTA and UTP trade and quote feeds. The LULD trial is scheduled to go live with phase 1 on April 8, 2013.
  • Daily NBBO File: An addendum to the Daily Quotes file, containing continuous National Best Bid and Offer updates and consolidated trades and quotes for all listed and non-listed issues.
  • Daily Quotes File: Every quote reported to the consolidated tape, from all CTA participants. Each quote identifies the time, exchange, security, bid/ask volumes, bid/ask prices, NBBO indicator, and more.

Merging financial data with other data, such as property transactions/ownership, marriage/divorce and other activities, is a topic map activity.

WBG Topical Taxonomy

Tuesday, November 26th, 2013

WBG Topical Taxonomy

From the description:

The WBG Taxonomy is a classification schema which represents the concepts used to describe the World Bank Group’s topical knowledge domains and areas of expertise – the ‘what we do’ and ‘what we know’ aspect of the Bank’s work. The WBG Taxonomy provides an enterprise-wide, application-independent framework for describing all of the Bank’s areas of expertise and knowledge domains, current as well as historical, representing the language used by domain experts and domain novices, and Bank staff and Bank clients.

Available in TriG, N-Triples, RDF/XML, Turtle.

A total of 1560 concepts.

You did hear about the JP Morgan Twitter debacle, JPMorgan humiliates itself in front of all of Twitter?

My favorite tweet (from memory) was: “Does the sleaze come off with a regular shower or does it require something special, like babies’ tears?”

In light of JP Morgan’s experience, why not ask citizens of countries with World Bank debt:

What needs to be added to the “World Bank Global Topical Taxonomy”?

For example:

Budget Transparency – No content other than broader concepts.

Two others at random:

ICT and Social Accountability – No content other than broader concepts. (ICT = Information and Communication Technologies)

Rural Poverty and Livelihoods – No content other than one broader concept.
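Checking a taxonomy for empty categories like these is easy to automate. Below is a minimal Python sketch, assuming the taxonomy is downloaded in the N-Triples format and uses standard SKOS predicates; the three concepts, their example.org URIs and the two labels other than “Budget Transparency” are invented for illustration. It flags concepts whose only relationship is `skos:broader`, that is, concepts with no content other than broader concepts. The line-matching regular expression is a simplification, not a full N-Triples parser (it does not handle escaped quotes in labels, for example).

```python
import re
from collections import defaultdict

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# <subject> <predicate> object .   (object may be an IRI or a literal)
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+(.+?)\s*\.\s*$")

# Invented sample data, not taken from the WBG Taxonomy download.
SAMPLE = """
<http://example.org/c/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://example.org/c/1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Example Parent"@en .
<http://example.org/c/1> <http://www.w3.org/2004/02/skos/core#narrower> <http://example.org/c/2> .
<http://example.org/c/1> <http://www.w3.org/2004/02/skos/core#narrower> <http://example.org/c/3> .
<http://example.org/c/2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://example.org/c/2> <http://www.w3.org/2004/02/skos/core#prefLabel> "Budget Transparency"@en .
<http://example.org/c/2> <http://www.w3.org/2004/02/skos/core#broader> <http://example.org/c/1> .
<http://example.org/c/3> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://example.org/c/3> <http://www.w3.org/2004/02/skos/core#prefLabel> "Example Documented Topic"@en .
<http://example.org/c/3> <http://www.w3.org/2004/02/skos/core#broader> <http://example.org/c/1> .
<http://example.org/c/3> <http://www.w3.org/2004/02/skos/core#definition> "A concept with real content."@en .
"""


def thin_concepts(ntriples):
    """Return labels of concepts whose only relationship is skos:broader."""
    predicates = defaultdict(set)  # subject -> predicates used with it
    labels = {}                    # subject -> preferred label text
    for line in ntriples.splitlines():
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # blank line or comment
        subject, predicate, obj = match.groups()
        predicates[subject].add(predicate)
        if predicate == SKOS + "prefLabel":
            labels[subject] = obj.split('"')[1]  # '"Label"@en' -> 'Label'
    bookkeeping = {RDF_TYPE, SKOS + "prefLabel"}
    return sorted(labels.get(s, s) for s, preds in predicates.items()
                  if preds - bookkeeping == {SKOS + "broader"})


print(thin_concepts(SAMPLE))
```

Run against a full download, the same check would list every concept that, like the three examples above, has no content of its own.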

Do you think vague categories result in avoidance of accountability and corporate responsibility?

So do I.

I first saw this in a tweet by Pool Party.

Getting $erious about $emantics

Thursday, June 27th, 2013

State Street’s Chief Scientist on How to Tame Big Data Using Semantics by Bryan Yurcan.

From the post in Bank Systems & Technology:

Financial institutions are accumulating data at a rapid pace. Between massive amounts of internal information and an ever-growing pool of unstructured data to deal with, banks’ data management and storage capabilities are being stretched thin. But relief may come in the form of semantic databases, which could be the next evolution in how banks manage big data, says David Saul, Chief Scientist for Boston-based State Street Corp.

The semantic data model associates a meaning to each piece of data to allow for better evaluation and analysis, Saul notes, adding that given their ability to analyze relationships, semantic databases are particularly well-suited for the financial services industry.

“Our most important asset is the data we own and the data we act as a custodian for,” he says. “A lot of what we do for our customers, and what they do with the information we deliver to them, is aggregate data from different sources and correlate it to make better business decisions.”

Semantic technology, notes Saul, is based on the same technology “that all of us use on the World Wide Web, and that’s the concept of being able to hyperlink from one location to another location. Semantic technology does the same thing for linking data.”

Using a semantic database, each piece of data has a meaning associated with it, says Saul. For example, a typical data field might be a customer name. Semantic technology knows where that piece of information is in both the database and unstructured data, he says. Semantic data would then allow a financial institution to create a report or dashboard that shows all of its interactions with that customer.

“The way it’s done now, you write data extract programs and create a repository,” he says. “There’s a lot of translation that’s required.”

Semantic data can also be greatly beneficial for banks in conducting risk calculations for regulatory requirements, Saul adds.

“That is something regulators are constantly looking for us to do, they want to know what our total exposure is to a particular customer or geographic area,” he says. “That requires quite a bit of development effort, which equals time and money. With semantic technology, once you describe the data sources, you can do that very, very quickly. You don’t have to write new extract programs.”
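The idea Saul describes (describe each data source once, then link records instead of writing new extract programs) can be illustrated with a toy triple store. This minimal Python sketch is not State Street’s system or any semantic database product; the identifiers, the `sameAs` predicate name and the figures are all hypothetical. Two source systems use different identifiers for the same customer, one link records that they are the same, and exposure can then be totaled per customer or per geographic area.

```python
from collections import defaultdict

# Hypothetical facts (subject, predicate, object) pulled from two source systems.
TRIPLES = [
    ("loans:C-17", "exposure", 250.0),
    ("loans:C-17", "region", "EMEA"),
    ("trading:ACME_CORP", "exposure", 100.0),
    ("trading:ACME_CORP", "region", "EMEA"),
    ("loans:C-42", "exposure", 75.0),
    ("loans:C-42", "region", "APAC"),
    # The one link saying both systems are talking about the same customer.
    ("trading:ACME_CORP", "sameAs", "loans:C-17"),
]


def resolver(triples):
    """Build a function mapping any identifier to its canonical one.

    Simplification: assumes each sameAs link points directly at the
    canonical identifier (no chains of links).
    """
    canonical = {s: o for s, p, o in triples if p == "sameAs"}
    return lambda identifier: canonical.get(identifier, identifier)


def exposure_by(triples, dimension=None):
    """Total exposure per canonical customer, or per value of `dimension` (e.g. "region")."""
    resolve = resolver(triples)
    exposure = defaultdict(float)  # canonical customer -> summed exposure
    group = {}                     # canonical customer -> grouping attribute value
    for s, p, o in triples:
        if p == "exposure":
            exposure[resolve(s)] += o
        elif dimension is not None and p == dimension:
            group[resolve(s)] = o
    if dimension is None:
        return dict(exposure)
    totals = defaultdict(float)
    for customer, amount in exposure.items():
        totals[group.get(customer, "unknown")] += amount
    return dict(totals)


print(exposure_by(TRIPLES))
print(exposure_by(TRIPLES, "region"))
```

Asking a new question (exposure by region instead of by customer) changes only the query argument, not the data plumbing, which is the point Saul makes about avoiding new extract programs.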


When banks and their technology people start talking about semantics, you know serious opportunities abound.

A growing awareness of the value of the semantics of data and data structures can’t help but create market opportunities for topic maps.

Big data needs big semantics!

Intrade Archive: Data for Posterity

Wednesday, April 3rd, 2013

Intrade Archive: Data for Posterity by Panos Ipeirotis.

From the post:

A few years back, I have done some work on prediction markets. For this line of research, we have been collecting data from Intrade, to perform our experimental analysis. Some of the data is available through the Intrade Archive, a web app that I wrote in order to familiarize myself with the Google App Engine.

In the last few weeks, though, after the effective shutdown of Intrade, I started receiving requests on getting access to the data stored in the Intrade Archive. So, after popular demand, I gathered all the data from the Intrade Archive, and also all the past data that I had about all the Intrade contracts going back to 2003, and I put them all on GitHub for everyone to access and download.

If you don’t know about Intrade, see: How Intrade Works.

Not sure why you would need the data but it is unusual enough to merit notice.

Principles for effective risk data aggregation and risk reporting

Sunday, January 13th, 2013

Basel Committee issues “Principles for effective risk data aggregation and risk reporting – final document”

Not a very inviting title is it? 😉

Still, the report is important for banks, enterprises in general (if you take out the “r” word) and illustrates the need for topic maps.

From the post:

The Basel Committee on Banking Supervision today issued Principles for effective risk data aggregation and risk reporting.

The financial crisis that began in 2007 revealed that many banks, including global systemically important banks (G-SIBs), were unable to aggregate risk exposures and identify concentrations fully, quickly and accurately. This meant that banks’ ability to take risk decisions in a timely fashion was seriously impaired with wide-ranging consequences for the banks themselves and for the stability of the financial system as a whole.

The report goes into detail but the crux of the problem is contained in: “…were unable to aggregate risk exposures and identify concentrations fully, quickly and accurately.”

Easier said than fixed, but the critical failure was the inability to reliably aggregate data. (Where have you heard that before?)
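Here is a minimal Python sketch of what “aggregate risk exposures and identify concentrations” involves once the data lives in more than one system. Everything in it is hypothetical: two books with their own counterparty codes, a reconciliation table mapping those codes to a single group-wide identifier, and a 50% concentration threshold chosen only for the example. Records that fail to map are reported as breaks rather than silently dropped, because a quietly missing exposure is exactly how an aggregate becomes inaccurate.

```python
from collections import defaultdict

# Hypothetical exposures held in two systems, each with its own counterparty codes.
LOAN_BOOK = [("L-001", 40.0), ("L-002", 10.0), ("L-003", 5.0)]
DERIVATIVES = [("DRV-ACME", 30.0), ("DRV-BETA", 15.0)]

# Hypothetical reconciliation table: local code -> single group-wide identifier.
MASTER_ID = {"L-001": "ACME", "DRV-ACME": "ACME", "L-002": "BETA", "DRV-BETA": "BETA"}


def aggregate(*books):
    """Sum exposures per master identifier and collect unmapped local codes."""
    totals = defaultdict(float)
    unmapped = []
    for book in books:
        for local_id, amount in book:
            master = MASTER_ID.get(local_id)
            if master is None:
                unmapped.append(local_id)  # a reconciliation break, surfaced for follow-up
            else:
                totals[master] += amount
    return dict(totals), unmapped


def concentrations(totals, threshold=0.5):
    """Counterparties whose share of reconciled exposure meets the threshold.

    Shares are computed over reconciled exposure only, so unresolved
    breaks should be fixed before these numbers are trusted.
    """
    grand = sum(totals.values())
    return {cp: amt / grand for cp, amt in totals.items() if amt / grand >= threshold}


totals, unmapped = aggregate(LOAN_BOOK, DERIVATIVES)
print(totals, unmapped)
print(concentrations(totals))
```

Without the reconciliation table, the two books would show ACME as two unrelated counterparties of 40 and 30; with it, ACME emerges as a single concentration of 70 out of 95.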

Principles for effective risk data aggregation and risk reporting (full text) is only twenty-eight (28) pages and worth reading in full.

Of the fourteen (14) principles, seven (7) of them could be directly advanced by the use of topic maps:

Principle 2 Data architecture and IT infrastructure – A bank should design, build and maintain data architecture and IT infrastructure which fully supports its risk data aggregation capabilities and risk reporting practices not only in normal times but also during times of stress or crisis, while still meeting the other Principles….

33. A bank should establish integrated (16) data taxonomies and architecture across the banking group, which includes information on the characteristics of the data (metadata), as well as use of single identifiers and/or unified naming conventions for data including legal entities, counterparties, customers and accounts.

16- Banks do not necessarily need to have one data model; rather, there should be robust automated reconciliation procedures where multiple models are in use.

Principle 3 Accuracy and Integrity – A bank should be able to generate accurate and reliable risk data to meet normal and stress/crisis reporting accuracy requirements. Data should be aggregated on a largely automated basis so as to minimise the probability of errors….

As a precondition, a bank should have a “dictionary” of the concepts used, such that data is defined consistently across an organisation. [What about across banks/sources?]

Principle 4 Completeness – A bank should be able to capture and aggregate all material risk data across the banking group. Data should be available by business line, legal entity, asset type, industry, region and other groupings, as relevant for the risk in question, that permit identifying and reporting risk exposures, concentrations and emerging risks….

A banking organisation is not required to express all forms of risk in a common metric or basis, but risk data aggregation capabilities should be the same regardless of the choice of risk aggregation systems implemented. However, each system should make clear the specific approach used to aggregate exposures for any given risk measure, in order to allow the board and senior management to assess the results properly.
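A rough sketch of what "available by business line, legal entity, asset type, industry, region and other groupings" can mean in practice, assuming hypothetical position records tagged with those groupings. The function reports the aggregation method alongside the result, a loose nod to the requirement that each system "make clear the specific approach used." A plain sum of exposure is an illustrative choice here, not the Basel Committee's prescription.

```python
from collections import defaultdict

# Hypothetical position records tagged with the groupings Principle 4 lists.
positions = [
    {"business_line": "Markets", "legal_entity": "Bank UK", "region": "EMEA", "exposure": 100.0},
    {"business_line": "Markets", "legal_entity": "Bank US", "region": "Americas", "exposure": 40.0},
    {"business_line": "Lending", "legal_entity": "Bank UK", "region": "EMEA", "exposure": 250.0},
]


def aggregate(records, *dimensions, measure="exposure"):
    """Sum `measure` over every combination of the requested dimensions.

    The method is returned with the totals so a reader of the report can see
    how the figures were produced (here: a plain sum, an assumption).
    """
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        totals[key] += rec[measure]
    return {"method": f"sum of {measure}", "by": dimensions, "totals": dict(totals)}


print(aggregate(positions, "legal_entity")["totals"])
# {('Bank UK',): 350.0, ('Bank US',): 40.0}
```

Because the grouping dimensions are parameters rather than fixed in the code, the same function answers "by legal entity" today and "by business line and region" tomorrow.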

Principle 5 Timeliness – A bank should be able to generate aggregate and up-to-date risk data in a timely manner while also meeting the principles relating to accuracy and integrity, completeness and adaptability. The precise timing will depend upon the nature and potential volatility of the risk being measured as well as its criticality to the overall risk profile of the bank. The precise timing will also depend on the bank-specific frequency requirements for risk management reporting, under both normal and stress/crisis situations, set based on the characteristics and overall risk profile of the bank….

The Basel Committee acknowledges that different types of data will be required at different speeds, depending on the type of risk, and that certain risk data may be needed faster in a stress/crisis situation. Banks need to build their risk systems to be capable of producing aggregated risk data rapidly during times of stress/crisis for all critical risks.

Principle 6 Adaptability – A bank should be able to generate aggregate risk data to meet a broad range of on-demand, ad hoc risk management reporting requests, including requests during stress/crisis situations, requests due to changing internal needs and requests to meet supervisory queries….

(a) Data aggregation processes that are flexible and enable risk data to be aggregated for assessment and quick decision-making;

(b) Capabilities for data customisation to users’ needs (eg dashboards, key takeaways, anomalies), to drill down as needed, and to produce quick summary reports;

[Flexible merging and tracking sources through merging.]

Principle 7 Accuracy – Risk management reports should accurately and precisely convey aggregated risk data and reflect risk in an exact manner. Reports should be reconciled and validated….

(b) Automated and manual edit and reasonableness checks, including an inventory of the validation rules that are applied to quantitative information. The inventory should include explanations of the conventions used to describe any mathematical or logical relationships that should be verified through these validations or checks; and

(c) Integrated procedures for identifying, reporting and explaining data errors or weaknesses in data integrity via exceptions reports.
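Items (b) and (c) describe an inventory of validation rules, each with an explanation of the convention it checks, feeding an exceptions report. A minimal sketch of that shape, assuming hypothetical rule ids, fields and thresholds (none of them come from the Basel text):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ValidationRule:
    rule_id: str
    explanation: str               # documents the convention the rule verifies
    check: Callable[[dict], bool]  # True when a record passes


# Hypothetical rule inventory; rule ids and thresholds are illustrative only.
RULES = [
    ValidationRule("R1", "Exposure must be non-negative",
                   lambda r: r["exposure"] >= 0),
    ValidationRule("R2", "Every record needs a counterparty identifier",
                   lambda r: bool(r.get("cpty_id"))),
    ValidationRule("R3", "Collateral should not exceed twice the exposure (reasonableness check)",
                   lambda r: r["collateral"] <= 2 * r["exposure"]),
]


def exceptions_report(records, rules=RULES):
    """One (record index, rule id, explanation) row per failed check."""
    return [
        (i, rule.rule_id, rule.explanation)
        for i, rec in enumerate(records)
        for rule in rules
        if not rule.check(rec)
    ]


records = [
    {"cpty_id": "CPTY-ACME", "exposure": 100.0, "collateral": 50.0},   # passes all rules
    {"cpty_id": "", "exposure": 5.0, "collateral": 0.0},               # missing identifier
    {"cpty_id": "CPTY-GLOBEX", "exposure": 10.0, "collateral": 90.0},  # implausible collateral
    {"cpty_id": "CPTY-X", "exposure": -1.0, "collateral": -5.0},       # negative exposure
]
for row in exceptions_report(records):
    print(row)
```

Keeping the explanation next to the check means the inventory and the exceptions report cannot drift apart: every exception row carries the documented reason it was raised.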

Principle 8 Comprehensiveness – Risk management reports should cover all material risk areas within the organisation. The depth and scope of these reports should be consistent with the size and complexity of the bank’s operations and risk profile, as well as the requirements of the recipients….

Risk management reports should include exposure and position information for all significant risk areas (eg credit risk, market risk, liquidity risk, operational risk) and all significant components of those risk areas (eg single name, country and industry sector for credit risk). Risk management reports should also cover risk-related measures (eg regulatory and economic capital).

You have heard Willie Sutton’s answer to “Why do you rob banks, Mr. Sutton?”: “Because that’s where the money is.”

Same answer for: “Why write topic maps for banks?”

I first saw this at Basel Committee issues “Principles for effective risk data aggregation and risk reporting – final document” by Ken O’Connor.

OpenGamma updates its open source financial analytics platform [TM Opportunity in 2013]

Sunday, December 23rd, 2012

OpenGamma updates its open source financial analytics platform

From the post:

OpenGamma has released version 1.2 of its open source financial analytic and risk management platform. Released as Apache 2.0 licensed open source in April, the Java-based platform offers an architecture for delivering real-time available trading and risk analytics for front-office-traders, quants, and risk managers.
Version 1.2 includes a newly rebuilt beta of a new web GUI offering multi-pane analytics views with drag and drop panels, independent pop-out panels, multi-curve and surface viewers, and intelligent tab handling. Copy and paste is now more extensive and is capable of handling complex structures.
Underneath, the Analytics Library has been expanded to include support for Credit Default Swaps, Extended Futures, Commodity Futures and Options databases, and equity volatility surfaces. Data Management has improved robustness with schema checking on production systems and an auto-upgrade tool being added to handle restructuring of the futures/forwards database. The market and reference data’s live system now uses OpenGamma’s own component system. The Excel Integration module has also been enhanced and thanks to a backport now works with Excel 2003.

Integration with OpenGamma is billed by OpenGamma as:

While true green-field development does exist in financial services, it’s exceptionally rare. Firms already have a variety of trade processing, analytics, and risk systems in place. They may not support your current requirements, or may be lacking in capabilities/flexibility; but no firm can or should simply throw them all away and start from scratch.

We think risk technology architecture should be designed to use and complement systems already supporting traders and risk managers. Whether proprietary or vendor solutions, considerable investments have been made in terms of time and money. Discarding them and starting from scratch risks losing valuable data and insight, and adds to the cost of rebuilding.

That being said, a primary goal of any project rethinking analytics or risk computation needs to be the elimination of all the problems siloed, legacy systems have: duplication of technology, lack of transparency, reconciliation difficulties, inefficient IT resourcing, etc.

The OpenGamma Platform was built from scratch specifically to integrate with any legacy data source, analytics library, trading system, or market data feed. Once that integration is done against our rich set of APIs and network endpoints, you can make use of it across any project based on the OpenGamma Platform.

A very valuable approach to integration, being able to access legacy or even current data sources.

But that leaves the undocumented semantics of data from those feeds on the cutting room floor.

The unspoken semantics of data from integrated feeds is like dry rot just waiting to make its presence known.

Suddenly and at the worst possible moment.

Compare that to documented data identity and semantics, which enables reliable re-use/merging of data from multiple sources.
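A toy illustration of what documented identity buys you, in the spirit of topic map merging: each record lists the identifiers it is documented to carry, and records sharing any identifier are merged (transitively), with the contributing feeds tracked. This is an assumption-laden sketch, not OpenGamma's API or any topic map engine; the feed names and the ISIN-like identifiers are hypothetical.

```python
def merge_by_identity(*feeds):
    """Merge records whose documented identifier sets overlap (transitively).

    Each feed is a (feed_name, records) pair; each record has an "ids" set and a
    "props" dict. When sources disagree on a property, the most recently
    processed record wins (a simplifying assumption).
    """
    subjects = []  # invariant: identifier sets of distinct subjects never overlap
    for feed_name, records in feeds:
        for rec in records:
            matches = [s for s in subjects if s["ids"] & rec["ids"]]
            subjects = [s for s in subjects if not (s["ids"] & rec["ids"])]
            merged = {"ids": set(rec["ids"]), "props": {}, "sources": []}
            for s in matches:
                merged["ids"] |= s["ids"]
                merged["props"].update(s["props"])
                merged["sources"].extend(s["sources"])
            merged["props"].update(rec["props"])
            merged["sources"].append(feed_name)
            subjects.append(merged)
    return subjects


# Hypothetical feeds: one keyed by an ISIN-like code, one adding a vendor id.
feed_a = [
    {"ids": {"isin:XS0000000001"}, "props": {"name": "Example Corp 5% 2030"}},
    {"ids": {"isin:XS0000000002"}, "props": {"name": "Other Corp 3% 2027"}},
]
feed_b = [
    {"ids": {"vendor:12345", "isin:XS0000000001"}, "props": {"price": 101.5}},
]

subjects = merge_by_identity(("feed_a", feed_a), ("feed_b", feed_b))
for s in subjects:
    print(sorted(s["ids"]), s["props"], s["sources"])
```

The merge only works because feed_b documents that its vendor id and the ISIN identify the same bond; an undocumented feed would leave the two records side by side, which is the dry rot described above.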

So we are clear, I am not suggesting a topic maps platform with financial analytics capabilities.

I am suggesting incorporation of topic map capabilities into existing applications, such as OpenGamma.

That would take data integration to a whole new level.

Computational Finance with Map-Reduce in Scala [Since Quants Have Funding]

Wednesday, November 28th, 2012

Computational Finance with Map-Reduce in Scala by Ron Coleman, Udaya Ghattamaneni, Mark Logan, and Alan Labouseur. (PDF)

Assuming the computations performed by quants are semantically homogeneous (a big assumption), the sources of their data and the applications of their outcomes are not.

The clients of quants aren’t interested in you humming “…it’s a big world after all…,” etc. They are interested in furthering their financial operations.

Using topic maps to make an already effective tool more effective is the most likely way to capture their interest. (Short of taking hostages.)

I first saw this in a tweet by Data Science London.

Rethinking the Basics of Financial Reporting

Sunday, October 28th, 2012

Rethinking the Basics of Financial Reporting by Timo Elliott.

From the post:

The chart of accounts is one of the fundamental building blocks of finance – but it’s time to rethink it from scratch.

To organize corporate finances and track financial health, traditional financial systems typically use complex, rigid general ledger structures. The result is painful, unwieldy systems that are not agile enough to support the requirements of modern finance.

In the financial engines of the future, rigid “code block” architectures are eliminated, replaced by flexible in-memory structures. The result is a dramatic increase in the flexibility and speed general ledger entries can be stored and retrieved. Organizations can vastly simplify their chart of accounts and minimize or eliminate time-consuming and complex reconciliation, while retaining virtually unlimited flexibility to report on any business dimension they choose.

Timo’s point is quite sound.

Except that we all face the same “…complex, rigid general ledger structures.” None of us has an advantage over another in that regard.

Once we have the information in hand, we can and do create more useful representations of the same data, but current practice gets everyone off to an even start.

Or evenly disadvantaged if you prefer.

As regulators start to demand reporting that takes advantage of modern information techniques, how will “equality” of access be defined?

The Cost of Strict Global Consistency [Or Rules for Eventual Consistency]

Sunday, September 23rd, 2012

What if all transactions required strict global consistency? by Matthew Aslett.

Matthew quotes Basho CTO Justin Sheehy on eventual consistency and traditional accounting:

“Traditional accounting is done in an eventually-consistent way and if you send me a payment from your bank to mine then that transaction will be resolved in an eventually consistent way. That is, your bank account and mine will not have a jointly-atomic change in value, but instead yours will have a debit and mine will have a credit, each of which will be applied to our respective accounts.”
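Sheehy's description can be sketched as a toy model: each bank only changes its own accounts, the sender's debit is applied at once, and the receiver's credit sits in a queue until that bank posts it. For a while the two ledgers disagree, then they converge. This is a deliberately simplified, hypothetical model (no clearing, netting or settlement rules), not how any real payment system or the UCC defines the process.

```python
from collections import deque


class Bank:
    """Toy ledger: a bank only ever changes balances of its own accounts."""

    def __init__(self, name):
        self.name = name
        self.balances = {}
        self.inbox = deque()  # credits received from other banks, not yet posted

    def post_pending(self):
        """Apply queued credits; until this runs, the receiver's balance lags behind."""
        while self.inbox:
            account, amount = self.inbox.popleft()
            self.balances[account] = self.balances.get(account, 0) + amount


def send_payment(src, src_account, dst, dst_account, amount):
    """Debit locally at once, then hand the credit to the other bank to post later.

    There is no jointly-atomic update of both accounts.
    """
    if src.balances.get(src_account, 0) < amount:
        raise ValueError("insufficient funds")
    src.balances[src_account] -= amount       # local debit
    dst.inbox.append((dst_account, amount))   # credit in flight


yours, mine = Bank("your bank"), Bank("my bank")
yours.balances["you"] = 100
mine.balances["me"] = 0

send_payment(yours, "you", mine, "me", 30)
print(yours.balances["you"], mine.balances["me"])  # 70 0  (debit applied, credit pending)
mine.post_pending()
print(yours.balances["you"], mine.balances["me"])  # 70 30 (the two ledgers now agree)
```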

And Matthew comments:

The suggestion that bank transactions are not immediately consistent appears counter-intuitive. Comparing what happens in a transaction with a jointly atomic change in value, like buying a house, with what happens in normal transactions, like buying your groceries, we can see that for normal transactions this statement is true.

We don’t need to wait for the funds to be transferred from our accounts to a retailer before we can walk out the store. If we did we’d all waste a lot of time waiting around.

This highlights a couple of things that are true for both database transactions and financial transactions:

  • that eventual consistency doesn’t mean a lack of consistency
  • that different transactions have different consistency requirements
  • that if all transactions required strict global consistency we’d spend a lot of time waiting for those transactions to complete.

All of which is very true, but it misses an important point about financial transactions.

Financial transactions (involving banks, etc.) are eventually consistent according to the same rules.

That’s no accident. It didn’t just happen that banks adopted ad hoc rules that resulted in a uniform eventual consistency.

It didn’t happen overnight, but the current set of rules for “uniform eventual consistency” of banking transactions is spelled out in the Uniform Commercial Code. (Other laws and regulations play a part too, but the UCC is a major part of it.)

Dare we say a uniform semantic for financial transactions was hammered out without the use of formal ontologies or web addresses? And that it supports billions of transactions on a daily basis? To become eventually consistent?

Think about the transparency (to you) of your next credit card transaction. Standards and eventual consistency make that possible.

R for Quants

Monday, February 13th, 2012

R for Quants, Part I.A by Brian Lee Yung Rowe.

From the post:

I’m teaching an R workshop for the Baruch MFE program. This is the first installment of the workshop and focuses on some basics, although we assume you already know how to program.

A good way to pick up R or, if you already know R, to gain some insight into its use in financial settings.

(MFE = Master of Financial Engineering)

Visualization of Prosper’s Loan Data Part I of II….

Sunday, December 11th, 2011

Visualization of Prosper’s Loan Data Part I of II – Compare and Contrast with Lending Club

From the post:

Due to the positive feedback received on this post I thought I would re-create the analysis on another peer-to-peer lending dataset, courtesy of Prosper. You can access the Prosper Marketplace data via an API or by simply downloading XML files that are updated nightly.

Interesting work both for data analysis as well as visualization.

Finance data and financial markets are all the rage these days, mostly because the rationally self-interested managed to trash them so badly. I thought this might be a good starting point for any topic mapping activities in the area.

Financial Data Analysis and Modeling with R (AMATH 542)

Tuesday, November 29th, 2011

Financial Data Analysis and Modeling with R (AMATH 542)

From the webpage:

This course is an in-depth hands-on introduction to the R statistical programming language for computational finance. The course will focus on R code and code writing, R packages, and R software development for statistical analysis of financial data including topics on factor models, time series analysis, and portfolio analytics.

Topics include:

  • The R Language. Syntax, data types, resources, packages and history
  • Graphics in R. Plotting and visualization
  • Statistical analysis of returns. Fat-tailed skewed distributions, outliers, serial correlation
  • Financial time series modeling. Covariance matrices, AR, VecAR
  • Factor models. Linear regression, LS and robust fits, test statistics, model selection
  • Multidimensional models. Principal components, clustering, classification
  • Optimization methods. QP, LP, general nonlinear
  • Portfolio optimization. Mean-variance optimization, out-of-sample back testing
  • Bootstrap methods. Non-parametric, parametric, confidence intervals, tests
  • Portfolio analytics. Performance and risk measures, style analysis

A quick summary:

Status: Open
Start Date: 1/4/2012
End Date: 3/19/2012
Credits: 4 Credits
Learning Format: Online
Location: Web
Cost: $3,300

Particularly if your employer is paying for it, this might be a good way to pick up some R skills for financial data work. And R will be useful if you want to mine financial data for topic map purposes. That said, transparency and finance aren’t two concepts that often occur together. In my experience, setting disclosure requirements means people walk as close to the disclosure line as they dare.

In other words, disclosure requirements function as disclosure limits, with the really interesting stuff just on the other side of the line.

Lab 49 Blog

Tuesday, November 1st, 2011

Lab 49 Blog

From the main site:

Lab49 is a technology consulting firm that builds advanced solutions for the financial services industry. Our clients include many of the world’s largest investment banks, hedge funds and exchanges. Lab49 designs and delivers some of the most sophisticated and forward-thinking financial applications in the industry today, and has an impeccable delivery record on mission critical systems.

Lab49 helps clients effect positive change in their markets through technological innovation and a rich fabric of industry best practices and first-hand experience. From next-generation trading platforms to innovative risk aggregation and reporting systems to entirely new investment ventures, we enable our clients to realize new business opportunities and gain competitive advantage.

Lab49 cultivates a collaborative culture that is both innovative and delivery-focused. We value intelligent, experienced, and personable engineering professionals that work with clients as partners. With a proven ability to attract and retain industry-leading engineering talent and to forge and leverage valued partnerships, Lab49 continues to innovate at the vanguard of software and technology.

A very interesting blog sponsored by what appears to be a very interesting company, Lab 49.