Redundancy Detection in Service-Oriented Systems
A paper by my colleague Marlon Dumas (University of Tartu) and me was recently accepted to WWW 2010, one of the most competitive top conferences on Web technologies. The paper addresses the problem of identifying redundant data in large-scale service-oriented information systems. More specifically, it puts forward an automated method for pinpointing potentially redundant data attributes in a given collection of semantically annotated Web service interfaces. The key idea is to construct a service network that represents all input and output dependencies between the data attributes and operations captured in the service interfaces, and then to apply centrality measures from network theory to quantify the degree to which an attribute belongs to a given subsystem. The proposed method was tested on X-Road, the Estonian federated governmental information system, which consists of 58 independently maintained information systems providing altogether about 1000 service operations described in WSDL. The accuracy of the method was evaluated in terms of precision and recall.
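
To make the service-network idea a bit more concrete, here is a minimal Python sketch. It is not the method from the paper: the toy operations, the attribute names, the degree-style score (the share of a subsystem's operations that touch an attribute) and the rule "an attribute is a candidate wherever it scores below its best subsystem" are all assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (subsystem, operation, input attributes, output attributes).
OPERATIONS = [
    ("population", "getPerson", {"personId"}, {"name", "address"}),
    ("population", "updateAddress", {"personId", "address"}, set()),
    ("population", "findByName", {"name"}, {"personId"}),
    ("vehicles", "getOwner", {"plateNo"}, {"personId", "address"}),
    ("vehicles", "getVehicle", {"plateNo"}, {"model"}),
]


def build_service_network(operations):
    """Directed edges: attribute -> operation for inputs, operation -> attribute for outputs."""
    edges = set()
    for subsystem, op, inputs, outputs in operations:
        op_node = (subsystem, op)  # operation nodes are tuples, attribute nodes are strings
        for attr in inputs:
            edges.add((attr, op_node))
        for attr in outputs:
            edges.add((op_node, attr))
    return edges


def subsystem_centrality(operations):
    """Assumed score: fraction of a subsystem's operations that read or write the attribute."""
    ops_per_subsystem = defaultdict(set)
    for subsystem, op, _, _ in operations:
        ops_per_subsystem[subsystem].add(op)

    touching = defaultdict(set)  # (attribute, subsystem) -> operations touching it
    for src, dst in build_service_network(operations):
        attr, op_node = (src, dst) if isinstance(dst, tuple) else (dst, src)
        subsystem, op = op_node
        touching[(attr, subsystem)].add(op)

    return {
        (attr, subsystem): len(ops) / len(ops_per_subsystem[subsystem])
        for (attr, subsystem), ops in touching.items()
    }


def redundancy_candidates(scores):
    """Assumed rule: flag an attribute in every subsystem where it scores below its best score."""
    best = defaultdict(float)
    for (attr, _), score in scores.items():
        best[attr] = max(best[attr], score)
    return sorted(key for key, score in scores.items() if score < best[key[0]])


scores = subsystem_centrality(OPERATIONS)
candidates = redundancy_candidates(scores)
print(candidates)
```

In this toy setting `address` and `personId` are flagged in the `vehicles` subsystem because they are more central in `population`. Whether such a candidate is really redundant (or, say, just a legitimate reference to another system) is exactly the kind of judgment call the precision figures below are about.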

The preceding figure summarizes our experimental results, which peaked at an F-score of 0.89. The figure shows the trade-off between precision and recall as the tuning parameters are varied. Essentially, the recall of the classifier can reach around 99% if certain conditions hold. From a practical point of view, this means that if an attribute could reasonably qualify as redundant, the classifier will find it. At around 80%, the precision is not optimal, but arguably still acceptable. One could argue that higher precision (at close to 100% recall) would be difficult to attain, given the subjectivity underpinning the notion of redundancy.
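
For readers less familiar with these measures, the following small Python sketch shows how precision, recall and the F-score relate. It assumes the balanced F-score (F1, the harmonic mean of precision and recall), and the attribute names are made-up toy values in the same range as the figures quoted above, not data from the experiment.

```python
def precision_recall_f1(flagged, truly_redundant):
    """Compare the attributes a classifier flagged against a manually labelled set."""
    true_positives = len(flagged & truly_redundant)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truly_redundant) if truly_redundant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-score (F1): harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels: five attributes flagged, four of them actually redundant.
flagged = {"address", "personId", "name", "phone", "email"}
truly_redundant = {"address", "personId", "name", "phone"}

precision, recall, f1 = precision_recall_f1(flagged, truly_redundant)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With one false alarm among five flagged attributes and nothing missed, precision is 0.8 and recall is 1.0, so a classifier can find everything that is redundant and still pay for it with some false positives.
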
If you are interested in learning more about the approach, either wait for WWW 2010 or contact me personally :) Anyway, see you at WWW 2010 ;)