16.12.2023: After the publication of the official proceedings, we found several inconsistencies in authors’ submissions and our metrics’ performance. For the transparency and replicability of the research outcomes, we publish the updated and double-checked results with all necessary comments:
We apologize for finding the inconsistencies late, and we hope that the published updates will help.
16.02.2024: We also publish all submitted systems’ outputs:
The compiled dataset is also available on Huggingface and is the intended way to use the data.
Read the Findings of the WMT 2023 Shared Task on Machine Translation with Terminologies. Cite the work as:
@inproceedings{semenov-etal-2023-findings,
title = "Findings of the WMT 2023 Shared Task on Machine Translation with Terminologies",
author = "Kirill Semenov and
Vilém Zouhar and
Tom Kocmi and
Dongdong Zhang and
Wangchunshu Zhou and
Yuchen Eleanor Jiang",
booktitle = "Proceedings of the Eighth Conference on Machine Translation (WMT)",
month = dec,
year = "2023",
publisher = "Association for Computational Linguistics",
}
Consider the following English sentence and its German hypothesis, which is updated incrementally as terminology information is added and ultimately becomes the translation closest to the reference.
Source | The report is in accordance with ROA. |
Hypothesis 1 | Der Bericht steht im Einklang mit ROA. |
Help 1 | “ROA” → “FOG” |
Hypothesis 2 | Der Bericht steht im Einklang mit FOG. |
Help 2 | “is in accordance” → “entspricht” |
Hypothesis 3 (best) | Der Bericht entspricht FOG. |
Event | Date |
---|---|
Training data and test data ready to download | 16th March, 2023 |
Release of the blind test | 11th July, 2023 |
Submission deadline for the terminology task | 24th July, 2023 |
Paper submission deadline to WMT | 5th September, 2023 |
WMT Notification of acceptance | 6th October, 2023 |
WMT Camera-ready deadline | 18th October, 2023 |
Conference | 6-7 December, 2023 |
Make sure to submit translation results on both the dev and the test sets. All deadlines are in AoE (Anywhere on Earth). Dates are specified with respect to EMNLP 2023.
The primary goal of the WMT 2023 Terminology Shared Task is to evaluate the ability of machine translation systems to accurately translate technical terms and specialized vocabulary. The task aims to assess the extent to which machine translation models can utilize additional information regarding the translation of terminologies.
Another important goal of the shared task is to encourage the development of machine translation systems that are better equipped to handle the complex vocabulary of technical and specialized domains. By providing participants with these resources and evaluating their performance on both general translation quality and the effectiveness of terminology translation, the task hopes to encourage the development of systems that can accurately and consistently translate these types of texts.
Overall, the WMT 2023 Terminology Shared Task seeks to improve the state-of-the-art in machine translation by challenging participants to develop systems that can accurately and effectively translate technical terms and specialized vocabulary, with the ultimate goal of improving communication and understanding in specialized and emerging domains.
We focus on the following language pairs (one direction for each):
zh-en: Chinese → English
cs-en: English → Czech
de-en: German → English

The submissions were evaluated based on:
The final ranking of the submissions will be ascertained through a weighted average of the scores obtained from both the human evaluation and terminology translation assessments.
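As a minimal sketch of such a weighted average: the text does not state the actual weights or score scales, so the function name `final_score`, the default weight of 0.5, and the assumption that both scores are already on a common scale are illustrative only.

```python
def final_score(human_score, terminology_score, human_weight=0.5):
    """Combine two scores with a weighted average.

    ASSUMPTION: both scores are on the same scale, and the weight
    (0.5 by default) is a placeholder, not the organizers' value.
    """
    if not 0.0 <= human_weight <= 1.0:
        raise ValueError("human_weight must lie between 0 and 1")
    terminology_weight = 1.0 - human_weight
    return human_weight * human_score + terminology_weight * terminology_score


# Hypothetical scores for one system, both normalized to [0, 1].
example = final_score(human_score=0.8, terminology_score=0.6)
```

Systems would then be ranked by sorting on this combined value in descending order.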
You are invited to submit a short paper (4 to 6 pages) to WMT describing your system. Shared task submission description papers are non-archival, and you are not required to submit a paper if you do not want to. If you don’t, we ask that you give an appropriate reference describing your system that we can cite in the overview paper.
All data, including references, have now been published: wmt-terminology-task/data-2023. The use of any additional terminology-specific data beyond that provided in these resources is prohibited.
Considering the example in the motivation, the following source could appear in test.en-de.en:
The report is in accordance with ROA.
Then, the corresponding line in test.en-de.dict can be empty or contain a JSON array, such as:
[{"en": "ROA", "de": "FOG"}, {"en": "is in accordance", "de": "entspricht"}]
It is then up to the model to utilize any part of this additional information.
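A minimal sketch of how the two line-aligned files might be read together. The helper names `parse_dict_line` and `pair_with_terms` are hypothetical, and the in-memory lists stand in for the contents of test.en-de.en and test.en-de.dict.

```python
import json


def parse_dict_line(line):
    """Return the terminology entries found on one line of a .dict file."""
    line = line.strip()
    if not line:
        return []  # an empty line carries no terminology hints
    return json.loads(line)


def pair_with_terms(source_lines, dict_lines):
    """Pair each source sentence with its (possibly empty) list of entries."""
    if len(source_lines) != len(dict_lines):
        raise ValueError("source and dictionary files must have the same number of lines")
    return [(src.strip(), parse_dict_line(d)) for src, d in zip(source_lines, dict_lines)]


# In-memory stand-ins for the two files (assumption: one sentence per line).
source_lines = [
    "The report is in accordance with ROA.\n",
    "The report is in accordance with ROA.\n",
]
dict_lines = [
    '[{"en": "ROA", "de": "FOG"}, {"en": "is in accordance", "de": "entspricht"}]\n',
    "\n",
]
pairs = pair_with_terms(source_lines, dict_lines)
```

Each element of `pairs` holds the source sentence and a list of dictionaries keyed by language code, which a system can use in full, in part, or not at all.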
Note that the spans in the dictionary do not have to appear as consecutive words in the text (e.g. He turned around → Er drehte sich um. and turned → drehte um).
Furthermore, the alignment may not be perfect because it comes from a semi-automatic procedure. However, it is guaranteed that the individual words appear in the text.
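The difference between a span whose words are merely present and one that is contiguous can be sketched as follows. The tokenization here (whitespace splitting, ASCII punctuation stripping, lowercasing) is a simplifying assumption, and the function names are hypothetical.

```python
import string


def words(text):
    """Lowercased whitespace tokens with surrounding ASCII punctuation removed."""
    tokens = (tok.strip(string.punctuation).lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def span_words_present(span, sentence):
    """True if every word of the span occurs somewhere in the sentence."""
    sentence_words = set(words(sentence))
    return all(w in sentence_words for w in words(span))


def span_is_contiguous(span, sentence):
    """True if the span's words occur as one consecutive run in the sentence."""
    s, t = words(sentence), words(span)
    return any(s[i:i + len(t)] == t for i in range(len(s) - len(t) + 1))


# The example from the text: the words are present but not adjacent.
present = span_words_present("drehte um", "Er drehte sich um.")
contiguous = span_is_contiguous("drehte um", "Er drehte sich um.")
```

A system that matches terminology by exact substring search would miss such discontinuous spans, so a word-level check like `span_words_present` may be the safer starting point.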
Before you submit, please make sure your submission follows the submission format (example here). You can run your translation files through a validation script, which is now available here.
Your translations should be submitted through this form.
No, but we highly encourage it as it makes comparisons fairer. If you already have a system for a specific language pair, replicating it (without optimizing hyperparameters) for another language pair should be very easy and yield a good demonstration for your method.
The rules from the constrained track of general MT apply. Some positive exceptions may be granted if they do not violate the spirit of this task.
Yes, though ideally it would all be one model which possibly also takes in the translation dictionary.
There are essentially three modes to this task:
They’re interleaved in the data so that everything can be processed easily by a single model. Although the same source sentences recur, each occurrence comes with different additional “terminology” information, which creates the three modes.
In no particular order: