Termbases and Corpus Collections
In translation practice, we often encounter words and expressions that cannot be found in dictionaries. In such cases, termbases and corpora can help.
China Keywords:
http://www.china.org.cn/chinese/china_key_words/
Standardized termbase for the international translation of discourse with Chinese characteristics:
http://210.72.20.108/index/index.jsp
Chinese core vocabulary:
https://www.cnkeywords.net/index
Key Concepts in Chinese Thought and Culture:
https://www.chinesethought.cn/TermBase.aspx
UN termbase:
https://unterm.un.org/UNTERM/portal/welcome
Terminology Online:
https://www.termonline.cn/index
National Academy for Educational Research termbase:
https://terms.naer.edu.tw/download/
Chinese-English Dictionary of Ming Dynasty Officials:
https://escholarship.org/uc/uci_libs
Standardized Chinese terminology (CNKI):
https://shuyu.cnki.net/#/
Grand dictionnaire terminologique (GDT):
https://gdt.oqlf.gouv.qc.ca/
TERMIUM:
https://www.btb.termiumplus.gc.ca/tpv2alpha/alpha-eng.html?lang=eng
Yufan Glossary:
http://termbox.lingosail.com/
Microsoft Termbase:
https://www.microsoft.com/zh-cn/language
WHO terminology database:
https://www.who.int/home/cms-decommissioning
Glossary of Electronic Engineering Terms:
https://www.maximintegrated.com/cn/glossary/definitions.mvp/terms/all
FreeMdict offline dictionary collection (about 100 GB) download:
https://downloads.freemdict.com/
OneDict (patent termbase):
http://www.onedict.com/
National Standard "Logistics Terminology":
https://logistics.nankai.edu.cn/_upload/article/76/83/1c5da71e4b8e9838ae0843c8cb3d/3a1617ed-acfb-4504-9e18-c079e98e6154.pdf
Winter Olympic terminology query website:
owgt.lingosail.com/
Music term query:
http://dictionary.t-classical.com/
European Union Language and terminology:
https://eur-lex.europa.eu/summary/glossary.html?locale=en
IATE (Interactive Terminology for Europe), the EU's terminology database:
https://iate.europa.eu/home
Chinese and English terms in Hong Kong law:
https://www.elegislation.gov.hk/glossary/english
Magic Search:
https://magicsearch.org/
Linguee:
https://www.linguee.com/
The Free Dictionary:
https://www.thefreedictionary.com/
Glosbe:
https://glosbe.com/
Domestic corpora
BCC Corpus:
http://bcc.blcu.edu.cn/
Corpus online:
http://www.aihanyu.org/cncorpus/index.aspx
Center for Chinese Linguistics (CCL) corpus, Peking University:
ccl.pku.edu.cn
BFSU Linguistics Corpus:
bfsu-corpus.org/
Academia Sinica Balanced Corpus of Modern Chinese:
https://www.sinica.edu.tw/SinicaCorpus/
Ancient Chinese Corpus/Modern Chinese Tagged Corpus/Chinese Electronic Documents:
https://www.sinica.edu.tw/ch
Sinica Treebank:
http://treebank.sinica.edu.tw/
Search text and interpret words:
http://words.sinica.edu.tw/
Media Language Corpus (MLC):
https://ling.cuc.edu.cn/RawPub/
Harbin Institute of Technology Information Retrieval Laboratory shared corpus resources:
http://ir.hit.edu.cn/demo/ltp/Sharing_Plan.htm
Chinese Synchronous Corpus (LiVaC):
http://www.livac.org/index.php?lang=sc
Chinese Linguistic Data Consortium (Chinese LDC):
http://www.chineseldc.org/
Academia Sinica Tagged Corpus of Early Mandarin Chinese:
http://lingcorpus.iis.sinica.edu.tw/early/
"A Dream of Red Mansions" Chinese-English Parallel Corpus:
http://corpus.usx.edu.cn/hongloumeng/images/shiyongshuoming.htm
Foreign corpora
BNC - British National Corpus:
http://www.natcorp.ox.ac.uk/
BOE - Collins Corpus of English (the Bank of English):
http://www.collinslanguage.com/language-resources/dictionary-datasets/
ANC - American National Corpus:
https://www.anc.org/
Lancaster Corpus of Mandarin Chinese (LCMC):
http://ota.oucs.ox.ac.uk/scripts/download.php?otaid=2474
Sketch Engine multilingual corpora:
https://www.sketchengine.eu/
BASE - British Academic Spoken English Corpus:
https://warwick.ac.uk/fac/soc/al-archive-deleted/research/base
Lextutor:
http://www.lextutor.ca/
MyMemory:
https://mymemory.translated.net/
TAUS:
https://datamarketplace.taus.net/
TTMEM:
https://www.ttmem.com/terminology/download-translation-memory/
TinyTM:
http://tinytm.sourceforge.net/
DGT Translation Memory:
https://magmatranslation.com/en/free-translation-memory/
European Parliament Proceedings Parallel Corpus 1996-2011:
https://statmt.org/europarl/
University of Maryland Parallel Corpus Project: The Bible:
http://users.umiacs.umd.edu/~resnik/parallel/bible.html
Aligned Hansards of the 36th Parliament of Canada:
https://www.isi.edu/research_groups/nlg/home
Publications Office of the European Union:
https://op.europa.eu/en/web/general-publications/publications
Wikimedia Downloads:
https://dumps.wikimedia.org/backup-index.html
United Nations Parallel Corpus:
https://conferences.unite.un.org/UNCorpus/
European language pairs:
https://www.statmt.org/wmt13/translation-task.html#download
parallel corpus search:
http://paralela.clarin-pl.eu/
UM-Corpus: A Large English-Chinese Parallel Corpus (Natural Language Processing and Chinese-Portuguese Machine Translation Laboratory):
http://nlp2ct.cis.umac.mo/um-corpus/um-corpus-license.html
Clarin Parallel corpora:
https://www.clarin.eu/resource-families/parallel-corpora
The PKU 863 Chinese-English Parallel Corpus:
https://www.lancaster.ac.uk/fass/projects/corpus/863parallel/
BYU corpora:
https://corpus.byu.edu/
A collection of translated literature:
https://opus.nlpl.eu/Books.php
A collection of EU Translation Memories provided by the JRC:
https://opus.nlpl.eu/DGT.php
Documents from the Catalan Government:
https://opus.nlpl.eu/DOGC.php
European Central Bank corpus:
https://opus.nlpl.eu/ECB.php
European Medicines Agency documents:
https://opus.nlpl.eu/EMEA.php
The EU bookshop corpus:
https://opus.nlpl.eu/EUbookshop.php
The European constitution/European Parliament Proceedings:
https://opus.nlpl.eu/EUconst.php
French-English Giga-Word Corpus:
https://opus.nlpl.eu/giga-fren.php
GNOME localization files:
https://opus.nlpl.eu/GNOME.php
News stories in various languages:
https://opus.nlpl.eu/GlobalVoices.php
hrenWaC – Croatian-English parallel web corpus:
https://opus.nlpl.eu/hrenWaC.php
JRC-Acquis-legislative EU texts:
https://opus.nlpl.eu/JRC-Acquis.php
KDE4 – KDE4 localization files (v.2):
https://opus.nlpl.eu/KDE4.php
KDEdoc – the KDE manual corpus:
https://opus.nlpl.eu/KDEdoc.php
MBS – Belgian Official Gazette corpus:
https://opus.nlpl.eu/MBS.php
memat – Xhosa/English parallel data:
https://opus.nlpl.eu/memat.php
MontenegrinSubs – Montenegrin movie subtitles:
https://opus.nlpl.eu/MontenegrinSubs.php
MultiUN – Translated UN documents:
https://opus.nlpl.eu/MultiUN.php
News Commentary, v9.0, v9.1:
https://opus.nlpl.eu/News-Commentary-v11.php
OfisPublik – Breton – French parallel texts:
https://opus.nlpl.eu/OfisPublik.php
OO – the OpenOffice.org corpus:
https://opus.nlpl.eu/OpenOffice-v2.php
OpenOffice.org 3 corpus:
https://opus.nlpl.eu/OpenOffice-v3.php
OpenSubtitles – the opensubtitles.org corpus:
https://opus.nlpl.eu/OpenSubtitles-v1.php
OpenSubtitles2016 – snapshot from 2016:
https://opus.nlpl.eu/OpenSubtitles-v2016.php
OpenSubtitles2018 – new complete version:
http://opus.nlpl.eu/OpenSubtitles-v2018.php
ParaCrawl corpus:
https://opus.nlpl.eu/ParaCrawl.php
ParCor – A Parallel Pronoun-Coreference Corpus/PHP – the PHP manual corpus:
http://opus.nlpl.eu/ParCor
The government declaration – a tiny example corpus:
http://opus.nlpl.eu/RF.php
SETIMES – A parallel corpus of the Balkan languages:
http://opus.nlpl.eu/SETIMES.php
SPC – Stockholm Parallel Corpora:
https://opus.nlpl.eu/SPC.php
Tatoeba – A DB of translated sentences:
http://opus.nlpl.eu/Tatoeba.php
TedTalks – Croatian-English TED talks:
http://opus.nlpl.eu/TedTalks.php
TED Talks 2013:
http://opus.nlpl.eu/TED2013.php
Tanzil – A collection of Quran translations:
http://opus.nlpl.eu/Tanzil.php
TEP – The Tehran English-Persian subtitle corpus:
http://opus.nlpl.eu/TEP.php
Ubuntu – Ubuntu localization files:
http://opus.nlpl.eu/Ubuntu.php
UN – Translated UN documents:
http://opus.nlpl.eu/UN.php
Wikipedia – translated sentences from Wikipedia:
http://opus.nlpl.eu/Wikipedia.php
WikiSource – (small en-sv sample only):
http://opus.nlpl.eu/WikiSource.php
WMT News Test Sets:
http://opus.nlpl.eu/WMT-News.php
The Xhosa – English Navy corpus:
http://opus.nlpl.eu/XhosaNavy.php
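
Once a parallel corpus from the list above has been downloaded, a common way to use it is to search the source side for a term and read off how it was rendered on the target side. The following is only a minimal sketch, assuming the corpus is available as two line-aligned plain-text sequences (line N of one is the translation of line N of the other); the function name `find_term_pairs` and the sample sentences are invented for illustration and do not come from any of the corpora listed.

```python
from typing import Iterable, List, Tuple


def find_term_pairs(
    source_lines: Iterable[str],
    target_lines: Iterable[str],
    term: str,
    limit: int = 5,
) -> List[Tuple[str, str]]:
    """Return up to `limit` aligned (source, target) pairs whose source contains `term`.

    Matching is a plain case-insensitive substring test, so it does not
    handle inflection or tokenization; it is only meant as a first look.
    """
    needle = term.casefold()
    hits: List[Tuple[str, str]] = []
    for src, tgt in zip(source_lines, target_lines):
        if needle in src.casefold():
            hits.append((src.strip(), tgt.strip()))
            if len(hits) >= limit:
                break
    return hits


# Invented sample data standing in for two line-aligned corpus files.
sample_en = [
    "The committee adopted the draft resolution.\n",
    "The meeting was adjourned.\n",
    "A draft resolution was submitted by the delegation.\n",
]
sample_fr = [
    "Le comité a adopté le projet de résolution.\n",
    "La séance est levée.\n",
    "Un projet de résolution a été présenté par la délégation.\n",
]

for en, fr in find_term_pairs(sample_en, sample_fr, "draft resolution"):
    print(en)
    print(fr)
    print()
```

With real data, the two sample lists could be replaced by open file handles, for example `open("corpus.en", encoding="utf-8")` and `open("corpus.fr", encoding="utf-8")` (hypothetical file names), since the function only iterates over lines and stops reading once `limit` matches are found.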