Machine-translating legal language: error analysis on an Italian-German corpus of decrees
Keywords:
machine translation evaluation, legal language, legal terminology, gender bias
Abstract
This paper analyzes the most frequent error categories in a bidirectional corpus of machine-translated decrees for the language pair Italian and South Tyrolean German. The aim is to assess the translation issues that arise when a fine-tuned machine translation (MT) system is used to produce legal texts in an Italian province where German is an officially recognized minority language and where the local legal language differs from that used in other German-speaking legal systems. Our fine-tuned MT system struggles with features that are typical of legal language, e.g., legal phraseology, legal terminology (especially the specific local South Tyrolean terminology), and gender-sensitive language, which is a requirement for local legislation. The errors identified highlight the need to supply MT systems with terminological information, especially for low-resource language varieties such as South Tyrolean German. We consider our results key information for the training of post-editors, professional translators, and non-professional translators working in multilingual public administrations.
License
Copyright (c) 2024 Elena Chiocchetti, Flavia De Camillis

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
TSR is committed to Creative Commons Attribution Non-Commercial Licence (CC BY-NC). This licence permits users to use, reproduce, disseminate or display the article provided that the author is attributed as the original creator and that the reuse is restricted to non-commercial purposes i.e. research or educational use. The author has the copyright but TSR has the right of first publication. When the article is used e.g. for educational or other non-commercial purposes, the user is expected to mention the author(s), the title of the publication, the name and the number of the publication series and URL-address. The license of the published metadata is Creative Commons CC0 1.0 Universal (CC0 1.0).