The ERP and CRM directories each contain the data for one of the two
case studies.
In each directory, the following files are present:
- RawWithoutCRUD.txt. This contains "raw" similarities between all
pairs of files and services.
- RawWithCRUD.txt. This also contains "raw" similarities, but computed
while treating certain manually specified groups of semantically
equivalent words as though they were syntactically identical. The groups
are chosen using domain knowledge, and they improve the raw similarity
scores. The tool results were obtained using these raw similarities, and
not those in RawWithoutCRUD.txt.
For ERP, the groups of tokens are {create, insert}, {read, load},
{update, change}, and {delete, terminate}. For CRM, they are
{new, add, save}, {read, find, view}, {edit, change}, and {delete,
cancel}.
- FinalSimilaritiesCombined.txt. This contains the similarity scores
computed by our tool when both the hierarchical and the workflow
relationships are used.
- ManualMatching.txt. This contains the manually created gold-standard
matching. The format of the file is as follows. The file is divided into
clusters, separated by lines containing only "#" characters. Within each
cluster, all the services listed (on lines beginning with a "^" symbol)
match all the source file names that follow them. (Files that appear in
clusters without any services are unmatched files.)
- *DomainModel.txt. This contains the SAP domain model, which we
converted into a textual form. The file shows the groupings, collections,
and services in the domain model hierarchically.
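
The ManualMatching.txt layout described above can be read with a few
lines of Python. The following is only a minimal sketch, not part of our
tool. It assumes that one name appears per line, that blank lines can be
ignored, and that the leading "^" is a marker rather than part of the
service name. The function names and the sample entries (ServiceA,
file1.src, and so on) are hypothetical.

```python
import re
from typing import List, NamedTuple


class Cluster(NamedTuple):
    services: List[str]  # names taken from lines that begin with "^"
    files: List[str]     # source file names listed in the same cluster


def parse_manual_matching(text: str) -> List[Cluster]:
    """Split the text into clusters at lines made up only of '#'."""
    clusters: List[Cluster] = []
    services: List[str] = []
    files: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no meaning
        if re.fullmatch(r"#+", line):
            # Separator line: close the current cluster if it has content.
            if services or files:
                clusters.append(Cluster(services, files))
            services, files = [], []
        elif line.startswith("^"):
            # Assumption: the "^" prefix is not part of the service name.
            services.append(line.lstrip("^").strip())
        else:
            files.append(line)
    if services or files:
        clusters.append(Cluster(services, files))
    return clusters


def unmatched_files(clusters: List[Cluster]) -> List[str]:
    """Files in clusters that list no services are unmatched."""
    return [f for c in clusters if not c.services for f in c.files]


# Hypothetical sample in the described format (not real data).
sample = """\
^ServiceA
^ServiceB
file1.src
file2.src
####
file3.src
"""
clusters = parse_manual_matching(sample)
```

Every service in a cluster matches every file in that cluster, so the
gold-standard pairs are the cross product of `services` and `files`
within each cluster.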