Abstract:
Legacy systems remain important to business operations but are difficult to maintain. One cause of this difficulty is the large number of code clones in such systems; these clones implement similar functionality using loop idioms that are common within a company. Because the loop idioms were developed to implement frequently needed functionality, most of them are likely to be translatable into simple SQL statements in a new, modernized version of a system. To investigate the feasibility of this approach, we propose a method to automatically extract cloned loop idioms embedded in COBOL program files. We manually classified the extracted idioms and labeled them according to their functionality. We evaluated the accuracy of our classification with three experts.