Companies now maintain process repositories that can easily contain several hundred, several thousand or even tens of thousands of models. Handling this volume of data is increasingly challenging, for example in model search, model updating and model analysis. For consistent data management and modelling, models that are similar or identical and that share a significant proportion of corresponding nodes and edges are of particular interest. The RefMod-Miner offers a variety of process matching techniques and thus supports the (partially) automated identification of these correspondences.
How the RefMod-Miner works:
During process matching, the RefMod-Miner analyses the node labels and examines them for linguistic phenomena such as synonyms, homonyms and antonyms. Techniques that analyse the process flow are also used to identify correspondences. The approaches used range from simple linguistic and machine-sensory approaches to the use of lexical databases and clustering methods. On the basis of these analysis results, the RefMod-Miner colours similar nodes of two or more models and leaves nodes without any similarity uncoloured (see figure). The results of the process matching serve as a basis for further applications within the RefMod-Miner, for example the inductive generation of reference models or clone detection.
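To illustrate the idea of label-based matching, the following is a minimal sketch and not the RefMod-Miner's actual implementation. It assumes a small hand-written synonym table (`CANONICAL`, hypothetical) in place of a lexical database, compares node labels by the Jaccard overlap of their normalised words, and reports label pairs above an arbitrarily chosen threshold. All function names are illustrative.

```python
import re

# Hypothetical synonym table: maps a word to a canonical representative.
# A real system would consult a lexical database instead.
CANONICAL = {"verify": "check", "examine": "check", "bill": "invoice"}


def tokens(label):
    """Lower-case a label, split it into words and replace known synonyms."""
    words = re.findall(r"[a-z]+", label.lower())
    return {CANONICAL.get(word, word) for word in words}


def label_similarity(label_a, label_b):
    """Jaccard similarity of the normalised word sets of two labels."""
    set_a, set_b = tokens(label_a), tokens(label_b)
    if not set_a and not set_b:
        return 1.0  # two empty labels are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def match_nodes(labels_a, labels_b, threshold=0.5):
    """Return (label_a, label_b, score) for every pair at or above the threshold."""
    matches = []
    for a in labels_a:
        for b in labels_b:
            score = label_similarity(a, b)
            if score >= threshold:
                matches.append((a, b, score))
    return matches


model_1 = ["Check invoice", "Ship goods"]
model_2 = ["Verify bill", "Send goods"]
matches = match_nodes(model_1, model_2)
```

With these inputs, "Check invoice" and "Verify bill" are matched through the synonym table, whereas "Ship goods" and "Send goods" share only one of three distinct words and fall below the threshold. A purely label-based comparison like this ignores the process flow, which is why structural techniques are used in addition.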
Problems and general conditions:
• Manual process matching is always possible. However, it is very time-consuming, since, strictly speaking, every node must be compared with every other node. Example: 500 models with an average of 40 nodes are matched. This results in 20,000 nodes. Because the comparison is symmetric and a node need not be compared with itself, the number of pairs is 20,000 × (20,000 − 1) / 2 = 199,990,000. Thus some 200 million node pairs have to be compared.
• The meaning of a node usually only becomes apparent from the context in which it is embedded; in isolation, nodes are difficult to understand and/or can be interpreted differently. In an isolated comparison, this node context is not visible, since the graph structure is not apparent.
• A formal theory for the labelling of model elements does not yet exist, although there are already approaches that can be used. This is particularly problematic when company-specific terminology is used.
• Automated procedures are possible in principle. However, if the terminology (homonyms, synonyms, linguistic vagueness) and the type of identifiers are not harmonised, automated procedures can quickly reach their limits.
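The size of the manual comparison effort can be checked with a few lines of code. The sketch below assumes that every node is compared with every other node exactly once, that is, unordered pairs without self-comparisons; the function name `pair_count` is illustrative.

```python
from itertools import combinations


def pair_count(n):
    """Number of unordered pairs of distinct items among n items."""
    return n * (n - 1) // 2


# Figures from the example: 500 models with 40 nodes on average.
models = 500
avg_nodes_per_model = 40
total_nodes = models * avg_nodes_per_model
comparisons = pair_count(total_nodes)

# Sanity check of the formula on a small set by explicit enumeration.
small = list(range(6))
enumerated = len(list(combinations(small, 2)))
```

For the example figures this gives roughly 200 million node pairs. The count grows quadratically with the number of nodes, so doubling the repository size roughly quadruples the comparison effort.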
cf. Fettke, Peter: Analyse eines objektorientierten verteilten Basissystems zur Realisierung eines dezentralen PPS-Systems aus Anwendungs- und Implementierungssicht. PhD thesis, Westfälische Wilhelms-Universität Münster, 1996.
cf. Thaler, Tom; Hake, Philip; Fettke, Peter; Loos, Peter: Evaluating the Evaluation of Process Matching Techniques. In: Leena Suhl; Dennis Kundisch (eds.): Tagungsband der Multikonferenz Wirtschaftsinformatik. Multikonferenz Wirtschaftsinformatik (MKWI-14), February 26-28, Paderborn, Germany, Universität Paderborn, 2/2014.
cf. Hake, Philip; Fettke, Peter; Loos, Peter: Experimentelle Evaluation automatisierter Verfahren zur Bestimmung der Ähnlichkeit von Knoten in Geschäftsprozessmodellen. In: Leena Suhl; Dennis Kundisch (eds.): Tagungsband der Multikonferenz Wirtschaftsinformatik. Multikonferenz Wirtschaftsinformatik (MKWI-14), February 26-28, Paderborn, Germany, Universität Paderborn, 2/2014.
See also our student project: