Messaging, Middleware, and EDI

By Matthew Rapaport

Nov. 2002

 

What is EDI?

 

Electronic Data Interchange (EDI) means different things to different people. Like Chinese boxes, EDI’s meanings nest one within the other. At its most basic, EDI is any exchange of data, electronically, between two companies or large divisions of a single company. We don’t include interactive applications that take data directly from a user as EDI. The exchange has to be automated, happen behind the scenes without human intervention, and typically occurs according to some schedule, whether by clock, or as the result of some other event having taken place. At this level, it can be as plain as a file transfer, for example a daily list of purchases and totals transferred to a bank.

 

Next, the data has to “mean something” to applications at both ends of the exchange. Transferring a file isn’t enough. Processing must be automatic, and do something: fill a database, trigger another program. Some application must do something with the contents of that file, and again “in the background” without human intervention at least through a part of the processing phase. The same applies to the side that produces the data being exchanged. It’s an application that produces data, which is then automatically transferred to some receiving application. For this to happen, both applications must understand the data. This means either the originating application produces output in a form the receiving application can understand, or somewhere in the exchange, the format of the source is transformed into the format of the receiver. Data transformation is one of the core requirements of any serious EDI system.
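
The transformation step described above can be sketched in a few lines. This is only an illustration under invented assumptions: the sender's fixed-width layout (`SENDER_LAYOUT`), the receiver's CSV columns, and every function name are hypothetical and not taken from any real EDI package.

```python
import csv
import io

# Hypothetical sender layout: one fixed-width line per purchase.
# The column positions are assumptions made up for this sketch.
SENDER_LAYOUT = {
    "account": (0, 8),    # zero-padded account number
    "date": (8, 16),      # YYYYMMDD
    "amount": (16, 26),   # amount in cents, zero-padded
}

def parse_sender_line(line):
    """Slice one fixed-width sender line into named fields."""
    return {name: line[start:end].strip()
            for name, (start, end) in SENDER_LAYOUT.items()}

def to_receiver_row(record):
    """Map sender fields onto the receiver's (equally hypothetical) columns."""
    d = record["date"]
    return [
        record["account"].lstrip("0"),          # receiver wants no padding
        f"{d[0:4]}-{d[4:6]}-{d[6:8]}",          # receiver wants ISO dates
        f"{int(record['amount']) / 100:.2f}",   # receiver wants decimal units
    ]

def transform(sender_text):
    """Convert the sender's whole file into the receiver's CSV format."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["account", "date", "amount"])
    for line in sender_text.splitlines():
        if line.strip():
            writer.writerow(to_receiver_row(parse_sender_line(line)))
    return out.getvalue()
```

Neither application needs to change in this arrangement. All knowledge of both formats is concentrated in the converter, which is why conversion engines sit at the heart of EDI systems.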

 

The last level has to do with the formats through which the transformation takes place. So far it only matters that two companies agree on a specific data format for any given exchange between them. This proved to be a cause of much wasted effort. “Proprietary formats” multiplied endlessly, as did the need for programs to do the transforming. The solution was to standardize the transfer format for any given document. International standards bodies did just that, producing a large body of documents specified in a standard and easily machine-parseable way. EDIFACT, a UN standard used internationally, and X12 in the U.S. were the dominant document exchange formats for many years. Documents as diverse as railroad waybills and residential loan applications received a consistent standard representation. Flexibility for individual requirements is built into the formats, so trading partners can add data not directly accounted for in the document specification. Today, XML is replacing X12 and EDIFACT as a data formatting standard. XML has the advantage of built-in flexibility, because changes to the format can be specified in a DTD or schema that travels with the data, in exchange for some computational complexity.
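
To give a feel for what “easily machine-parseable” means, here is a minimal sketch that splits an X12-style document into segments and elements. It assumes the commonly seen delimiters (`*` between elements, `~` ending a segment). In a real interchange the delimiters are declared in the interchange header, and the sample text below is invented for illustration and is not a complete, valid interchange.

```python
def parse_segments(text, element_sep="*", segment_term="~"):
    """Split X12-style text into (segment_tag, [elements]) pairs.

    The default delimiters are an assumption; real interchanges
    declare their own in the interchange header.
    """
    segments = []
    for raw in text.split(segment_term):
        raw = raw.strip()
        if raw:
            tag, *elements = raw.split(element_sep)
            segments.append((tag, elements))
    return segments

# Made-up fragment loosely modeled on a purchase order transaction set.
SAMPLE = "ST*850*0001~BEG*00*SA*PO12345**20021115~SE*3*0001~"

for tag, elements in parse_segments(SAMPLE):
    print(tag, elements)
```

Because every segment starts with a tag and every element has a fixed position, a receiving program can find a given field without guessing. An empty element (the two adjacent `*` characters in the `BEG` segment) simply holds its place.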

 

There is still more to EDI than this, because added to everything described above is the need to audit and account for each transaction. Driving this are not merely reporting issues, but demands for authentication and non-repudiation. These, in turn, usually require the ability to process intercompany acknowledgments, which are also specified in the X12/EDIFACT standards. Acknowledgments are something that EDI-interested bodies must add to specific XML transactions. Present-day EDI software packages like Sterling’s Gentran or St. Paul Software’s SPEDI are complex, and much of their complexity is the result of these needs in large-scale EDI applications.

 

What is Middleware?

 

Remember from above that what is important about EDI is that the whole exchange has meaning to the applications at either end of it. What lies between those applications is “in the middle”, the domain of middleware!

 

There are a few different ingredients in middleware, and their functionality is not limited to supporting EDI. Many intracorporate business processes can benefit from one or more middleware components. EDI, however, makes use of all the various pieces associated with modern middleware.

 

Intracorporate, inter-server communications

 

The modern IT enterprise consists of multiple servers: databases over here on Windows, communications over there on Unix, web services in yet another place on Linux. Data going to a trading partner might come from the database or the web (having its own supporting database). Data from a partner appears on a communications server somewhere on your network for import to a database, or display on the web. Whether the data is coming from or going to an external company, or moving between applications inside a company, there has to be a way to send data extracted on one server to any other.

 

Companies employ various techniques to bridge servers at this “data exchange” level. Sockets, ftp, and networked file systems have been used (sometimes all at the same time) to connect servers by “virtual wires” strung permanently from each machine to every other. Subject-based messaging is the big advance contributed by modern middleware in this area. Think of subject-based messaging as a data-exchange network layered on top of your existing physical network. On each machine, there is a small program that broadcasts messages (a message can consist of almost any data). Each message contains a subject, and travels throughout the network. Also on each machine, there is a program that listens to these broadcasts but acts only on messages having subjects of interest to it. A data-producing application need not be aware of which machines on the network are home to applications having some interest in that data. It only has to know what subject to place in the message’s subject header field (analogous to the subject of an email), and the receiver on the right machine will automatically pick up and act on the message!
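
The idea can be shown with a toy, in-process model. This is a sketch only: real messaging products broadcast across the network, while here a single `MessageBus` object stands in for it. The class names, host names, and subject strings are all hypothetical.

```python
from collections import defaultdict

class MessageBus:
    """Toy stand-in for the network-wide broadcast layer."""

    def __init__(self):
        self.listeners = []

    def attach(self, listener):
        self.listeners.append(listener)

    def broadcast(self, subject, data):
        # Every listener on the "network" sees every message.
        for listener in self.listeners:
            listener.receive(subject, data)

class Listener:
    """One per machine; acts only on subjects it has subscribed to."""

    def __init__(self, host):
        self.host = host
        self.handlers = defaultdict(list)

    def subscribe(self, subject, handler):
        self.handlers[subject].append(handler)

    def receive(self, subject, data):
        # Messages with subjects of no interest are silently ignored.
        for handler in self.handlers.get(subject, []):
            handler(data)

# Usage: the producer knows only the subject, not the receiving host.
bus = MessageBus()
db_host = Listener("db-server")
web_host = Listener("web-server")
bus.attach(db_host)
bus.attach(web_host)

received = []
db_host.subscribe("orders.inbound",
                  lambda data: received.append(("db-server", data)))

bus.broadcast("orders.inbound", {"po": "12345"})     # picked up by db-server
bus.broadcast("invoices.outbound", {"inv": "77"})    # nobody is listening
```

Note that the broadcasting code never names `db-server`. Moving the order handler to another machine would mean changing one subscription, not the producer.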


Workflow management

 

The messaging layer ensures there is a simple and reliable way to move data to the proper handlers no matter where they may be on the network. But which handlers will get any given application’s data, and having gotten it, what will they do next? This is the province of workflow management. EDI workflows often have many steps and parallel branches (for example archiving data, or sending acknowledgments). Scripts can often handle workflow chores satisfactorily; scripting languages were created for this purpose. Modern middleware provides GUI-based applications that allow corporations to specify data connections and processing by dragging and dropping connections between processing steps (other applications). These applications then produce code (usually Java) that carries out the workflow rules. Because workflow is isolated, data-producing applications need not know even the subject needed by the correct message receiver. The workflow program takes care of adding the right subject to the data.
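
A scripted workflow of this kind might look like the sketch below. Everything in it is an assumption made for illustration: the step names, the `ROUTES` table, and the `publish` callback that stands in for the messaging layer. The branches also run one after another here for simplicity, where a production workflow might run them in parallel.

```python
# Hypothetical routing table: document type -> message subject.
ROUTES = {
    "purchase_order": "orders.inbound",
    "invoice": "invoices.inbound",
}

def archive(doc, audit):
    """Side branch: record that the raw document was archived."""
    audit.append(("archived", doc["id"]))

def acknowledge(doc, audit):
    """Side branch: record that an acknowledgment went to the partner."""
    audit.append(("ack_sent", doc["id"]))

def run_workflow(doc, publish):
    """Run the side branches, then hand the document to the right subject.

    `publish(subject, doc)` stands in for the messaging layer. The
    producing application never sees the subject; the workflow adds it.
    """
    audit = []
    try:
        for branch in (archive, acknowledge):
            branch(doc, audit)
        subject = ROUTES[doc["type"]]
        publish(subject, doc)
        audit.append(("published", subject))
    except Exception as exc:
        # A real workflow would also notify users or IT staff here.
        audit.append(("error", repr(exc)))
    return audit
```

The returned audit list is the point of interest: every step leaves a trace, so a failed or unroutable document is recorded instead of silently lost.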

 

Data conversion

 

This is one of the original and most basic EDI roles. Whether a standard or proprietary data exchange format is used, some data transformation must usually take place. Traditional EDI packages are mainly data conversion engines coupled with workflow support (because handling data before and after the change-over may require many steps to satisfy audit, and other needs). Modern middleware packages have their own conversion applications that may be satisfactory for any given exchange. Middleware application suites are written in such a way as to allow substitution of other applications where necessary features are not supported. Compared with existing EDI applications, this is the weakest arena for modern middleware.

 

Data communication

 

Converted data must at some point be sent to a trading partner. Some program must watch for data from the other side of a trading partner arrangement. This is the domain of communications software of one kind or another, whether proprietary sockets over some external IP network, ftp, https, email, or some other communications software running on one or both sides of the connection. Modern middleware products contain their own communication software modules. Most understand the most common exchange protocols, and can interoperate with trading partners who are not running identical software at their end. As with data conversion, a corporation may want to employ its own data communications applications, but middleware competes a little better here because data communications protocols are more standardized than data exchange formats are.

 

Fig. 1 illustrates middleware components as they might operate from different servers over a single network-wide message bus.

 

Messaging and EDI

 

In the mid-1990s I ran an EDI department for a financial services company. There were a dozen different kinds of electronic documents we exchanged with trading partners, but one in particular was unusual. This document, together with its response documents, accounted for three quarters of the 40,000 documents exchanged every week at this particular company. The remaining quarter of these exchanges were traditional EDI that did not need near-real-time transmission and response, but the special document did demand a near-real-time response (typically within 30 seconds), and in parallel across the entire company. If users requested 100 of these documents in the same minute, the application specification required that 100 printed documents and 100 sets of database inserts/updates (each involving dozens of fields in multiple tables) all be completed within 30 seconds.

 

Data communications used proprietary sockets over IP networks. They were the only mechanism fast enough to handle the volume. We wrote a parallel workflow in which multiple socket connections to the same trading partner could be established simultaneously. Modern communications packages, whether stand-alone or included in middleware collections, cannot do this today.
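
The shape of such a parallel exchange can be suggested with a small sketch. It is not the original implementation, which is not described in detail here. The function names are hypothetical, and `exchange_with_partner` merely sleeps to imitate one socket round trip where a real version would open its own connection to the trading partner.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def exchange_with_partner(doc_id):
    """Placeholder for one request/response round trip over its own socket."""
    time.sleep(0.05)  # simulated network and partner processing time
    return doc_id, f"response-for-{doc_id}"

def exchange_all(doc_ids, max_connections=20):
    """Run many exchanges with the same partner at once.

    `max_connections` caps how many simulated connections are open
    simultaneously; the value 20 is an arbitrary assumption.
    """
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return dict(pool.map(exchange_with_partner, doc_ids))
```

With the assumed figures, 100 requests at 0.05 seconds each would take about 5 seconds over a single connection, but only about a quarter of a second over 20 simultaneous ones. That arithmetic is what makes a 30-second response window achievable under load.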

 

The same proved to be the case with data conversion. Middleware data converters were on the drawing boards, but neither they nor the dedicated large-scale EDI applications could transform the same document, bound for (or received from) the same trading partner, in parallel. None of them would meet the performance demands of that application. This also remains true today!

 

GUI-based workflow generators are problematic in middleware. They didn’t exist in the mid-1990s of course, so we wrote our workflow the traditional way with scripts, complete with auditing, error handling, and real-time notice to users or IT staff in the event of a problem. Today, workflow generators are expensive for many applications, and their product (usually Java programs) is more difficult to maintain and extend than shell or Perl scripts! Other than rapid development of relatively simple workflows, there is not much value added by these modules.

 

That leaves messaging…

 

What about the internal need for moving data from one server to another? We used internal sockets identical with those used for the communications hop between ourselves and our trading partners. This was cumbersome to support, given that we needed a separate socket connection between every machine and every other machine where we might use some service. A new connection meant a small project: creating and testing the connection (getting both socket ends functioning) across a complex network, adding the new socket to the pool of jobs to auto-start (on two servers), adding it to the monitor/keep-running list, arranging to notify someone if there is a problem, and so on.
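
A little arithmetic shows why this grows cumbersome. The sketch below assumes the worst case, in which every server needs a dedicated connection to every other server, and compares it with one listener per server under a messaging layer. The function names are illustrative only.

```python
def point_to_point_links(servers):
    """Dedicated connections needed if every server talks to every other."""
    return servers * (servers - 1) // 2

def messaging_listeners(servers):
    """With a messaging layer, each server runs a single listener."""
    return servers

for n in (5, 10, 20):
    print(n, point_to_point_links(n), messaging_listeners(n))
```

For 10 servers that is 45 connections to build, start, and monitor, against 10 listeners. Adding an eleventh server means up to 10 new connection projects in the first case and one new listener in the second.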

 

If I had the whole project to do again, the only new technology I would want to add to my previous mix would be a subject messaging layer. With a universal messaging layer in place across all servers in the enterprise, connecting applications and workflow scripts across the network becomes easy. A new application connection needs nothing more than a new subject and, in the right listener, a pointer to an application (or workflow script) to run on receiving a message with that subject. Instead of having to build a connection across different machine environments, one counts on the connection already being present, and merely adds a subject to a list of subjects to invoke the new response.

 

The messaging layer is the most important innovation of modern middleware. It performs the most basic work of middleware, moving data easily from one server to any other, and is its fundamental building block. A wide variety of applications could benefit from a subject-based messaging layer. GUI-based workflow is an innovation, but the results are counterproductive because the software is expensive and either limits you to what the GUI can do, or forces you to write Java enhancements where much simpler shell or Perl scripts would serve. Perl can interact with the message layer. This lets simple workflow scripts control large inter-server data flows using messages. All the other middleware components are weak sisters of their more powerful dedicated brethren. If they do the job, fine, but they are often expensive solutions for the simple jobs they can do.

 

There is potential for a messaging layer in any environment that has many servers running different operating systems that must use one another’s services. A messaging layer supports simplification of even proprietary workflow components because it supports passing both data and commands. With messaging software, it is not necessary to have one inter-server pathway for data and another (like Java’s RMI) for commands. Messaging helps achieve the objective of looser component coupling while providing a consistent connection for transaction acknowledgments between applications at the same time.  It lends itself to universal, centralized auditing of all data exchange transactions, a difficult or impossible achievement otherwise. Since the messaging layer is a unified application running across the entire network, it also oversees and reports on itself, on the health of its individual parts running on separate servers.

 

Messaging makes so many interapplication data exchanges so much more convenient that there is no excuse for not deploying it in complex environments unless there is a performance issue. Messaging does, after all, add its share of overhead, but this should not be a problem on modern hundred-megabit or gigabit networks.