This corpus consists of 145,820 Amharic-English parallel sentences (segments) collected from various sources. It is larger than previously compiled Amharic-English parallel corpora. It is released for research purposes and can be used to train or support Amharic-English machine translation systems.


All documents in the corpus had been made publicly available on the Web, and the corpus was obtained by crawling the Web. For copyright reasons, the order of the sentences in this distribution is randomized. By downloading this corpus, you agree to use it only for research purposes.
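The file layout of the distribution is not described here. A common layout for parallel corpora is a pair of line-aligned UTF-8 files, where line i of the Amharic file is the translation of line i of the English file. The sketch below assumes that layout; the function name `read_parallel` and any file names are hypothetical. It loads the two sides into sentence pairs for MT training and checks that the line counts agree.

```python
from typing import List, Tuple


def read_parallel(src_path: str, tgt_path: str) -> List[Tuple[str, str]]:
    """Read two line-aligned UTF-8 files into (source, target) pairs.

    Hypothetical helper: assumes line i of src_path translates line i of
    tgt_path. The actual layout of this corpus is not specified.
    """
    # Iterating over a text-mode file splits on newlines only, so Unicode
    # separators inside a sentence do not break the alignment.
    with open(src_path, encoding="utf-8") as f:
        src_lines = [line.rstrip("\n") for line in f]
    with open(tgt_path, encoding="utf-8") as f:
        tgt_lines = [line.rstrip("\n") for line in f]

    # A count mismatch means the two sides are not aligned.
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"line counts differ: {len(src_lines)} vs {len(tgt_lines)}"
        )

    # Strip surrounding whitespace and drop pairs where either side is empty.
    pairs = []
    for am, en in zip(src_lines, tgt_lines):
        am, en = am.strip(), en.strip()
        if am and en:
            pairs.append((am, en))
    return pairs
```

For example, `read_parallel("train.am", "train.en")` (hypothetical file names) returns a list of `(amharic, english)` tuples. Because the distribution's sentence order is already randomized, the pairs carry no document context and are suited to sentence-level translation.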


When using this data, please cite the original publication:

Andargachew Mekonnen Gezmu, Andreas Nürnberger, and Tesfaye Bayu Bati. 2022. Extended Parallel Corpus for Amharic-English Machine Translation. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 6644–6653, Marseille, France. European Language Resources Association.

You may use the following BibTeX entry:

    @inproceedings{gezmu-etal-2022-extended,
        title = "Extended Parallel Corpus for {A}mharic-{E}nglish Machine Translation",
        author = {Gezmu, Andargachew Mekonnen  and
          N{\"u}rnberger, Andreas  and
          Bati, Tesfaye Bayu},
        booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
        month = jun,
        year = "2022",
        address = "Marseille, France",
        publisher = "European Language Resources Association",
        url = "",
        pages = "6644--6653",
    }


For more details about the corpus, refer to the original publication.

