Machine-translating legal language: error analysis on an Italian-German corpus of decrees

Authors

F. De Camillis, E. Chiocchetti

Keywords:

machine translation evaluation, legal language, legal terminology, gender bias

Abstract

This paper analyzes the most frequent error categories in a bidirectional corpus of machine-translated decrees in the Italian-South Tyrolean German language pair. The aim is to assess the translation issues that arise when a fine-tuned machine translation (MT) system is used to produce legal texts in an Italian province where German is an officially recognized minority language and the local legal language differs from that used in other German-speaking legal systems. Our fine-tuned MT system struggles with features that are typical of legal language, e.g., legal phraseology, legal terminology (especially the specific local South Tyrolean terminology), and gender-sensitive language. The latter is a requirement for local legislation. The errors identified highlight the need to feed MT systems with terminological information, especially for low-resource language varieties such as South Tyrolean German. We consider our results to be key information for the training of post-editors, professional translators, and non-professional translators working in multilingual public administrations.

Published

2024-12-11

How to Cite

De Camillis, F., & Chiocchetti, E. (2024). Machine-translating legal language: error analysis on an Italian-German corpus of decrees. Terminology Science & Research Terminologie : Science Et Recherche, (27), 1–27. Retrieved from https://journal-eaft-aet.net/index.php/tsr/article/view/8304