Dissertation Defence Béla Gipp
On 02.09.2013, Béla Gipp successfully defended his dissertation, "Citation-based Plagiarism Detection - Applying Citation Pattern Analysis to Identify Currently Non-Machine-Detectable Disguised Plagiarism in Scientific Publications". The thesis was supervised in cooperation with UC Berkeley (Prof. James Pitman and Professor Erik Wilde) and was graded "Summa Cum Laude". The dissertation committee consisted of the chairman, Prof. Dr. Bernhard Preim (FIN-ISG); the reviewers, Prof. Dr. Andreas Nürnberger (FIN-ITI), Prof. Dr. Debora Weber-Wulff (HTW Berlin), and Prof. Dr. Birger Larsen (Royal School of Library and Information Science, Copenhagen, Denmark); and Prof. Dr. Klaus Turowski (FIN-ITI) as a further member. Our sincere congratulations!
Abstract
This doctoral thesis addresses a problem in information retrieval that has recently captured media attention: the software-based detection of disguised forms of plagiarism. State-of-the-art plagiarism detection methods are capable of identifying copy & paste plagiarism and, to some extent, lightly disguised plagiarism. However, even today’s best-performing systems cannot reliably identify more heavily disguised forms of plagiarism, including paraphrases, translated plagiarism, or idea plagiarism. This weakness of current systems results in a large percentage of disguised scientific plagiarism going undetected. While the easily recognizable copy & paste-type plagiarism typically occurs among students and has no serious consequences for society, disguised plagiarism in the sciences, such as plagiarized medical studies in which results are copied without the corresponding experiments having been performed, can jeopardize patient safety.
To address the weakness of current plagiarism detection systems, this thesis introduces Citation-based Plagiarism Detection (CbPD). Unlike existing character-based approaches, which perform text comparisons, CbPD does not consider text similarity alone, but uses citation patterns within documents as a unique, language-independent "semantic fingerprint" to identify potentially suspicious similarity among texts. The idea for CbPD originated from the observation that plagiarists commonly disguise academic misconduct by paraphrasing copied text, but typically do not substitute or significantly rearrange the citations. Motivated by these findings, the author developed various CbPD algorithms tailored to the different forms of plagiarism, and implemented them in the first citation-based plagiarism detection prototype capable of detecting heavily disguised plagiarism.
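The idea of using citation order as a fingerprint can be illustrated with a minimal sketch. It assumes that each document has already been reduced to the ordered list of sources it cites, and it scores two documents by the longest common subsequence of those citations. This measure, the function names, and the citation keys are illustrative assumptions; the abstract does not specify the algorithms actually implemented in the prototype.

```python
from typing import Sequence


def longest_common_citation_subsequence(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest subsequence of citations shared by a and b, in order.

    Illustrative measure only; not taken from the thesis.
    """
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row of the DP table
    for cited_a in a:
        curr = [0]
        for j, cited_b in enumerate(b, start=1):
            if cited_a == cited_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def citation_order_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Shared-order score in [0, 1], normalized by the shorter citation list."""
    if not a or not b:
        return 0.0
    return longest_common_citation_subsequence(a, b) / min(len(a), len(b))


# Hypothetical citation keys: the text may be paraphrased or translated,
# but the cited sources and their order largely survive.
original = ["Smith2001", "Lee1999", "Kim2005", "Chen2010"]
suspect = ["Smith2001", "Kim2005", "Park2012", "Chen2010"]
score = citation_order_similarity(original, suspect)
```

Because only identifiers of cited sources are compared, the score does not depend on the language or wording of either text. A high score marks a document pair for human inspection and is not by itself proof of plagiarism.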
The advantages of the CbPD approach were demonstrated in evaluations using three document collections. CbPD’s applicability for detecting strongly disguised plagiarism was first demonstrated using the plagiarized thesis of the former German Minister of Defense, K.-T. zu Guttenberg. While conventional approaches failed to detect a single instance of translated plagiarism in this thesis, CbPD identified 13 of the 16 translations. The effectiveness of the approach was further demonstrated when it was applied to other authors and plagiarism forms documented in the VroniPlag Wiki.
The practicality of the CbPD approach was demonstrated by the successful identification of several plagiarism cases in the biomedical publication collection PubMed Central Open Access Subset (PMC OAS). As a result of a user study utilizing the CbPD prototype, six plagiarism investigations have thus far been initiated and an additional medical study has since been retracted. The evaluation also showed that CbPD’s visualization of citation pattern similarities facilitates the verification of plagiarism. Additionally, CbPD was shown to be more computationally efficient than existing methods and to produce significantly fewer false positives. CbPD is not a substitute for existing methods, but rather a complement to them. Combining CbPD with existing methods in a hybrid system promises optimal detection of both short literal plagiarism and heavily disguised or translated plagiarism.
More information can be found here: www.citeplag.org