Dissertation Defense of Béla Gipp

02.09.2013

On 02.09.2013, Béla Gipp successfully defended his dissertation, titled "Citation-based Plagiarism Detection - Applying Citation Pattern Analysis to Identify Currently Non-Machine-Detectable Disguised Plagiarism in Scientific Publications". The thesis was supervised in cooperation with UC Berkeley, Prof. Dr. James Pitman and Prof. Dr. Erik Wilde, and was graded "Summa Cum Laude". The doctoral committee consisted of the chair, Prof. Dr. Bernhard Preim (FIN-ISG); the reviewers, Prof. Dr. Andreas Nürnberger (FIN-ITI), Prof. Dr. Debora Weber-Wulff (HTW Berlin), and Prof. Dr. Birger Larsen (Royal School of Library and Information Science, Copenhagen, Denmark); and the committee member, Prof. Dr. Klaus Turowski (FIN-ITI). Our warmest congratulations!

 

Abstract

This doctoral thesis addresses a problem in information retrieval that has recently captured the attention of the media: the software-based detection of disguised forms of plagiarism. State-of-the-art plagiarism detection methods are capable of identifying copy & paste plagiarism and, to some extent, lightly disguised plagiarism. However, even today’s best-performing systems cannot reliably identify more heavily disguised forms of plagiarism, including paraphrases, translated plagiarism, or idea plagiarism. This weakness of current systems results in a large percentage of disguised scientific plagiarism going undetected. While the easily recognizable copy & paste-type plagiarism typically occurs among students and has no serious consequences for society, disguised plagiarism in the sciences, such as plagiarized medical studies in which results are copied without the corresponding experiments having been performed, can jeopardize patient safety.

To address the weakness of current plagiarism detection systems, this thesis introduces Citation-based Plagiarism Detection (CbPD). Unlike existing character-based approaches, which perform text comparisons, CbPD does not consider text similarity alone, but uses citation patterns within documents as a unique, language-independent "semantic fingerprint" to identify potentially suspicious similarity among texts. The idea for CbPD originated from the observation that plagiarists commonly disguise academic misconduct by paraphrasing copied text, but typically do not substitute or significantly rearrange the citations. Motivated by these findings, the author developed various CbPD algorithms tailored to the different forms of plagiarism, and implemented them in the first citation-based plagiarism detection prototype capable of detecting heavily disguised plagiarism.
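The intuition that citation order survives paraphrasing and translation can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not specify the CbPD algorithms, so the example assumes a plain longest-common-subsequence measure over the in-text citation order. The function names and citation keys are hypothetical.

```python
from typing import Sequence


def longest_common_citation_sequence(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two citation sequences.

    Assumption: each document is reduced to the ordered list of sources it
    cites; the surrounding text is ignored entirely.
    """
    # Standard dynamic programme, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def citation_order_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Normalise the shared citation order by the shorter sequence (illustrative choice)."""
    if not a or not b:
        return 0.0
    return longest_common_citation_sequence(a, b) / min(len(a), len(b))


# Hypothetical citation keys, in order of appearance in each document.
source_doc = ["Smith2001", "Lee1999", "Kim2005", "Garcia2010"]
suspect_doc = ["Smith2001", "Lee1999", "Mueller2003", "Garcia2010"]
unrelated_doc = ["Brown1990", "Chen2012"]

score_suspect = citation_order_similarity(source_doc, suspect_doc)
score_unrelated = citation_order_similarity(source_doc, unrelated_doc)
```

Because only citation keys are compared, the score is unaffected by whether the suspect text was paraphrased or translated into another language, which is the property the abstract describes as language-independent.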

The advantages of the CbPD approach were demonstrated in evaluations using three document collections. CbPD’s applicability for detecting strongly disguised plagiarism was first demonstrated using the plagiarized thesis of former German Minister of Defense, K.-T. zu Guttenberg. While conventional approaches failed to detect a single instance of translated plagiarism in this thesis, CbPD identified 13 of the 16 translations. The effectiveness of the approach was further demonstrated when applied to other authors and plagiarism forms in the VroniPlag Wiki.

The practicality of the CbPD approach was demonstrated by the successful identification of several plagiarism cases in the biomedical publication collection PubMed Central Open Access Subset (PMC OAS). As a result of a user study utilizing the CbPD prototype, six plagiarism investigations have thus far been initiated and an additional medical study has since been retracted. The evaluation also showed that CbPD’s visualization of citation pattern similarities facilitates the verification of plagiarism. Additionally, CbPD was shown to be computationally more efficient than existing methods and to produce significantly fewer false positives. CbPD is not a substitute for existing methods, but rather a complement to them. Combining CbPD with existing methods in a hybrid system promises to ensure optimal detection of both short literal plagiarism and heavily disguised or translated plagiarism.
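The hybrid idea can be sketched as follows, under loudly stated assumptions: the text-based component is stood in for by word 3-gram Jaccard overlap, the citation-based component by the ratio of shared cited sources, and both thresholds are arbitrary illustrative values. None of these choices is taken from the thesis.

```python
from typing import Sequence, Set, Tuple


def word_ngrams(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Set of lowercase word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def text_overlap(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of word n-grams (stand-in for a text-based detector)."""
    ga, gb = word_ngrams(a, n), word_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def shared_citation_ratio(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of cited sources in common (stand-in for a citation-based detector)."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / min(len(sa), len(sb))


def hybrid_flag(text_a: str, cites_a: Sequence[str],
                text_b: str, cites_b: Sequence[str],
                text_threshold: float = 0.5,       # hypothetical value
                citation_threshold: float = 0.6    # hypothetical value
                ) -> bool:
    """Flag a document pair for human review if either signal is strong."""
    return (text_overlap(text_a, text_b) >= text_threshold
            or shared_citation_ratio(cites_a, cites_b) >= citation_threshold)


# Hypothetical documents: a literal copy, a translated copy, an unrelated text.
original = ("the drug reduced symptoms in most patients", ["A", "B", "C"])
literal = ("the drug reduced symptoms in most patients", [])
translated = ("das Medikament linderte die Beschwerden der meisten Patienten",
              ["A", "B", "C"])
unrelated = ("an entirely different topic about soil chemistry", ["X", "Y"])

flag_literal = hybrid_flag(*original, *literal)
flag_translated = hybrid_flag(*original, *translated)
flag_unrelated = hybrid_flag(*original, *unrelated)
```

In this toy setup the literal copy is caught by the text signal alone and the translated copy by the citation signal alone, which mirrors the complementary roles the abstract assigns to the two families of methods.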

Further information is available at: www.citeplag.org

Last modified: 10.09.2013