Saturday 24 September 2011

Predicting the Future Using Analytics


Prediction is very difficult, especially about the future.
~ Niels Bohr (1885-1962), Danish physicist

The purpose of any business intelligence tool or data analysis exercise is to understand the past well enough to give some level of insight into the future.  The success rate of predictions based on such data is typically pretty low because, as the mutual fund industry likes to say, "Past performance is no guarantee of future results."  Circumstances change, and past performance was dependent on the circumstances in effect at that time.

However, if your analysis is on very recent data, it may actually start showing how circumstances are beginning to change and you may have time to react before they change again.  Timeliness of data and of analysis becomes key.

The internet has created the possibility of much faster delivery of data on consumer behaviour than traditional company data warehouses permit.  Google makes its search engine statistics available for free under the Google Insights For Search banner. A recent study by the Bank of England looks at using Google search volumes to monitor unemployment rates and housing prices.  The authors found that both could be inferred quite well by monitoring certain search phrases, providing a much more timely measurement of the economic situation than current indicators do.  The difficult part is discovering which phrases to monitor.  For instance, searches on the word "unemployed" or on the UK unemployment benefits program ("JSA") tracked the official unemployment rate remarkably well, but searches on "jobs" alone did not correlate.  The only way to discover those search terms that are predictive is by trial and error.  Further, once you find a phrase that works, there is no way to know how long it will remain predictive, as new terms could emerge in the future that supersede it.  While this method of near-real-time economic tracking looks promising, it is not without its challenges.
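The trial-and-error screening the study describes can be sketched in a few lines: compute the correlation between a candidate phrase's search volume and the official indicator, and keep only phrases that track it closely. All of the numbers below are invented for illustration; they are not real Google or unemployment statistics.

```python
import numpy as np

# Hypothetical monthly data: official unemployment rate (%) and normalized
# search volume for two candidate phrases. Illustrative numbers only.
unemployment = np.array([7.8, 7.9, 8.1, 8.3, 8.2, 8.4, 8.6, 8.5])
searches_jsa = np.array([52, 55, 61, 66, 64, 69, 74, 72])   # "JSA"
searches_jobs = np.array([80, 41, 77, 35, 90, 38, 82, 44])  # "jobs"

def screen_phrase(search_volume, indicator, threshold=0.8):
    """Trial-and-error screen: keep a phrase only if its search volume
    correlates strongly with the official indicator."""
    r = np.corrcoef(search_volume, indicator)[0, 1]
    return r, r >= threshold

r_jsa, keep_jsa = screen_phrase(searches_jsa, unemployment)
r_jobs, keep_jobs = screen_phrase(searches_jobs, unemployment)
print(f'"JSA":  r = {r_jsa:.2f}, predictive: {keep_jsa}')
print(f'"jobs": r = {r_jobs:.2f}, predictive: {keep_jobs}')
```

In this toy setup "JSA" passes the screen and "jobs" fails, mirroring the study's finding, but as the text notes, a phrase that passes today may quietly stop being predictive later; the screen would have to be re-run continually.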

Another fascinating approach to using the internet to predict the future in near-real-time is www.recordedfuture.com. Like Biff in Back to the Future Part II, this company seems to admit the true motivation for predicting the future: getting rich!   In the movie, Biff finds a record of horse race results from the future and cashes in by betting big.  Recorded Future monitors news feeds, databases, publications, and websites and summarizes events on the topic you want (such as a particular company).  Based on historic behaviour, its analytic engine tries to predict where the trends are going.  The main customers seem to be stock investors who want to know the direction a company's stock will be heading in the coming days and weeks.

For instance, if the analytic engine sees an increase in discussions about a possible dividend increase, it will interpret it as positive sentiment and predict an upward trend.  If it sees more talk about missing revenue targets, it will interpret it as negative sentiment and trend things downward.  Sometimes lack of activity is also predictive.  If two companies in the same industry stop issuing press releases for a couple weeks, could it mean they are in merger talks?  It's a fascinating use of the internet and analytics.
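A drastically simplified version of that idea can be sketched as phrase counting: score each event summary against lists of positive and negative phrases and turn the net score into a trend direction. This is a toy illustration, not Recorded Future's actual engine, and the phrase lists and headlines are invented.

```python
# Illustrative phrase lists; a real engine would use far richer models.
POSITIVE = {"dividend increase", "record profit", "new contract"}
NEGATIVE = {"missing revenue targets", "layoffs", "profit warning"}

def predict_trend(summaries):
    """Return 'up', 'down', or 'flat' from net phrase sentiment."""
    score = 0
    for text in summaries:
        text = text.lower()
        score += sum(phrase in text for phrase in POSITIVE)
        score -= sum(phrase in text for phrase in NEGATIVE)
    if score > 0:
        return "up"
    if score < 0:
        return "down"
    return "flat"

news = [
    "Board discusses possible dividend increase next quarter",
    "Analysts warn firm risks missing revenue targets",
    "Supplier announces new contract with the company",
]
print(predict_trend(news))  # net score +1, so "up"
```

Note that the "lack of activity" signal mentioned above would need something beyond this sketch: the engine would have to track the expected volume of press releases per company and flag unusual silence.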

The whole thing leads me to ask a couple questions:

1)  If Recorded Future's analytic engine is so good, why didn't they keep it to themselves and get rich, like Biff did?  If they felt they could make more money selling the IDEA of predicting the future, it suggests the engine is not as precise as one might hope.

2)  If many people have access to this predictive power, doesn't it defeat the purpose?  Biff got rich because he was the only person with the future horse race results.  If everyone had those future results, no one would get rich because they'd all bet on the same horse and reduce the payout odds to nothing.  If everyone in the stock market knows at the same time that a stock is going up, it's too late to make money on it.  The ability to get rich depends on knowing a stock will go up when everyone else in the market thinks it's going down.

Despite all our progress in the internet age, I think Niels Bohr's quote at the beginning of this article remains true: predicting the future is still very difficult.

And if I invent an amazing algorithm to predict the future, you can be sure I'm not telling anybody about it!  And if I stop posting on this blog, you can assume I've succeeded with my invention and I'm busy getting rich!

You can just call me Biff.

Saturday 3 September 2011

Zachman Framework, New and Improved


Last week John Zachman released v3.0 of his Zachman Framework (ZF).  More on that shortly, but it certainly closes out an eventful year for John.
Last summer, John Zachman created some waves by suing his long-time Canadian colleague Stan Locke for $10 million.  Stan initially said he would close up shop and retire, not wanting to challenge the lawsuit.  At the time, there were some musings in the industry about the future of the framework, and especially about the value of the certifications Zachman had been issuing.
Now that a year has passed, I can’t find any statements from either party as to what has transpired.  From my reading of the California court records, it appears Stan did in fact fight the lawsuit and had it dismissed on jurisdictional grounds.  John and Stan’s names appear side by side again on the Zachman International website, and over the past few weeks John and Stan have been teaching ZF courses again in Toronto, although they appear to be teaching separately.  It would seem that they kissed and made up to some extent, and life goes on in the ZF world like nothing ever happened.  And if that’s the case, good for them.
Almost a decade ago, I took a course on the Zachman Framework taught by both of these gentlemen.  The ZF has developed a bit of a cult following in the enterprise architecture world, but in my travels I have yet to work in an organization that is deriving any real benefit from it.  That doesn’t mean such organizations don’t exist – but it probably means there can’t be too many of them!
What Is The Zachman Framework?
The Zachman Framework is a grid that organizes different types of modeling artefacts, and it is helpful for clarifying how those different models relate to one another within an organization.  John Zachman created the grid while working for IBM in the 1980s.  Across the top, he put the 5 W's (and 1 H):  What, How, Where, Who, When, and Why.  Down the left side, he placed five levels or perspectives within the organization:  Planner, Owner, Designer, Builder, and Sub-contractor.  The premise is that a sub-contractor working on the minutiae of a system will require different models than a high-level planner would.  For instance, a low-level detailed model of “who” might be a user list for a particular computer system, while a high-level view of “who” might be a management organization chart or a list of competitors.  Both are valid models of “who” but they fit into different slots on the grid.
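Since the framework is essentially a fixed 6-by-5 grid of pigeon holes, it can be sketched as a simple data structure. The artefact names filed below are the two "who" examples from the paragraph above; everything else is just the grid's coordinates.

```python
# A minimal sketch of the Zachman grid as a dict keyed by (perspective, question).
QUESTIONS = ["What", "How", "Where", "Who", "When", "Why"]
PERSPECTIVES = ["Planner", "Owner", "Designer", "Builder", "Sub-contractor"]

grid = {(p, q): [] for p in PERSPECTIVES for q in QUESTIONS}

def slot(perspective, question, artifact):
    """File a model into its cell, rejecting coordinates outside the grid."""
    if (perspective, question) not in grid:
        raise KeyError(f"No cell ({perspective}, {question}) in the framework")
    grid[(perspective, question)].append(artifact)

# The two "who" examples land in different rows of the same column:
slot("Planner", "Who", "list of competitors")
slot("Sub-contractor", "Who", "user list for a particular computer system")
print(len(grid))                  # 30 cells in total
print(grid[("Planner", "Who")])
```

The exercise also makes the framework's limits obvious: the structure tells you where a model belongs, but nothing about how to build it or what to do with it once it is filed.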
There have been a few versions of the framework, including last week’s brand new version 3.0.  All of the original cells of the framework are still there, but additional descriptions have been added around the edges and some names have been revised.  For instance, the What column originally contained Data artefacts, which limited the perspective to what data architects typically worked with.  Now it contains Inventory artefacts, which is a different perspective on an organization’s answer to the What question, but it’s equally limited.  The original Builder row is now called the Engineer Perspective (which I like), but the models the Engineer makes are now called Technology Physics.  I have no idea what a Technology Physics model is!  It sounds like what my university physics professors built in the machine shop when they wanted to measure cosmic rays or microgravity.
Given the structure of the model can never really change (unless you create a new question word to make it 6 W’s), all that John can do is describe more clearly what’s already there.  I’d say version 3.0 only brings partial progress towards that goal.  There is still a need for a version 4.0 someday, and Technology Physics should definitely not be in it!
Strengths and Weaknesses
So what is the Zachman Framework good for in an organization?  I’ve found it helpful for seeing how the various types of models in an organization fit together, and which models haven’t been built yet.  It is a simple structure, so it’s easy to remember, unlike Scott Ambler’s modified version of ZF, which is far more thorough but also far more difficult to understand.
After taking Zachman’s course, I immediately realized it wasn’t going to help my organization with the challenges we were facing.  Developing more models, or slotting the models we already had into the grid would not change how our organization functioned.  While it gives the data architects and modellers a false sense of progress in filling up the pigeon holes with models, it doesn’t affect the troops on the front lines.
Ultimately, the Zachman Framework is just another Dewey Decimal System.  As Melvil Dewey helped organize information in libraries, Zachman helped organize models in an organization.  Dewey’s system organized all the knowledge contained in books, but it did not specify any methods for transmitting that knowledge to people.  That required education methodologies, and Dewey’s system wasn’t a methodology.  Neither is Zachman’s.  As he now puts at the top of his v3.0 diagram, it’s “The Enterprise Ontology™”.  An ontology is a structure for organizing concepts, not a methodology for improving an organization.
In order for models to change an organization, they need to be operationalized.  The easiest way to do that (but not the only way) is to embed the model in a computerized process.  For instance, if you have a policy to control spending by getting everyone’s immediate superior to approve each office equipment purchase, you can implement an organization chart (a “Who” model in ZF) into a purchasing system so that whenever an employee enters a request for new furniture, it looks up their name in the org chart and automatically sends the request to their immediate boss for approval.  An implementation of such a model might eliminate some unnecessary spending, but having the organization chart alone will not save the company anything.
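The purchase-approval example above can be made concrete in a few lines: the org chart (the "Who" model) is embedded in the purchasing workflow so each request routes itself to the requester's immediate boss. The names and the chart are invented for illustration.

```python
# A minimal sketch of operationalizing a "Who" model: the org chart
# drives approval routing instead of sitting in a binder.
ORG_CHART = {            # employee -> immediate superior
    "alice": "carol",
    "bob": "carol",
    "carol": "dave",     # dave heads this branch
}

def route_purchase_request(employee, item, amount):
    """Look up the requester's boss in the org chart and return the
    approval task that the purchasing system would create."""
    boss = ORG_CHART.get(employee)
    if boss is None:
        raise ValueError(f"{employee} has no approver on record")
    return {"approver": boss, "requester": employee,
            "item": item, "amount": amount}

task = route_purchase_request("alice", "office chair", 250)
print(task["approver"])  # carol
```

The point of the sketch is the contrast: the dictionary alone saves nothing; it is wiring the lookup into the request workflow that changes behaviour.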
Zachman often repeats this platitude in his courses and articles:
“Someday, you are going to wish you had all those models made explicit, Enterprise-wide, horizontally and vertically integrated at excruciating level of detail.”1 
The only problem with that assertion is that the “someday” never comes.  I’ve been in departments and organizations that have gone through some major crises, and no one has ever said, “If only we had all those models!”  A lack of explicit models can sometimes cause challenges, but the cost of building and maintaining all these models “at excruciating level of detail” across the organization is simply not justified.
The Zachman Framework is a helpful structure for organizing models, but not for transforming organizations.




1. John A. Zachman, “Enterprise Architecture Artifacts Vs Application Development Artifacts”, © 2000 Zachman International, downloaded from http://www.hl7.org/documentcenter/public/wg/secure/Zachman%20Enterprise%20Artifacts.rtf