Termbases and Corpus Collections

2022/11/21 20:59

In translation practice, we often encounter words and expressions that cannot be found in dictionaries. In such cases, termbases and corpora can help.


Online Termbases

Keywords to Understand China:

http://www.china.org.cn/chinese/china_key_words/

Standardized termbase for translating discourse with Chinese characteristics into foreign languages:

http://210.72.20.108/index/index.jsp

Chinese core vocabulary:

https://www.cnkeywords.net/index

Key Concepts in Chinese Thought and Culture:

https://www.chinesethought.cn/TermBase.aspx

UN termbase (UNTERM):

https://unterm.un.org/UNTERM/portal/welcome

Terminology Online:

https://www.termonline.cn/index

National Academy for Educational Research termbase:

https://terms.naer.edu.tw/download/

Chinese-English Dictionary of Ming Government Official Titles:

https://escholarship.org/uc/uci_libs

Chinese normative terms:

https://shuyu.cnki.net/#/

Grand dictionnaire terminologique (GDT):

https://gdt.oqlf.gouv.qc.ca/

TERMIUM:

https://www.btb.termiumplus.gc.ca/tpv2alpha/alpha-eng.html?lang=eng

Yufan Glossary:

http://termbox.lingosail.com/

Microsoft Termbase:

https://www.microsoft.com/zh-cn/language

WHO terminology database:

https://www.who.int/home/cms-decommissioning

Glossary of Electronic Engineering Terms:

https://www.maximintegrated.com/cn/glossary/definitions.mvp/terms/all

FreeMdict, a large (100 GB) collection of offline dictionaries for download:

https://downloads.freemdict.com/

OneDict (patent termbase):

http://www.onedict.com/

National Standard "Logistics Terminology":

https://logistics.nankai.edu.cn/_upload/article/76/83/1c5da71e4b8e9838ae0843c8cb3d/3a1617ed-acfb-4504-9e18-c079e98e6154.pdf

Winter Olympics terminology lookup:

owgt.lingosail.com/

Music terminology lookup:

http://dictionary.t-classical.com/

European Union Language and terminology:

https://eur-lex.europa.eu/summary/glossary.html?locale=en

IATE (Interactive Terminology for Europe), the EU's terminology database:

https://iate.europa.eu/home

Chinese and English terms in Hong Kong law:

https://www.elegislation.gov.hk/glossary/english

Magic Search:

https://magicsearch.org/

Linguee:

https://www.linguee.com/

The Free Dictionary:

https://www.thefreedictionary.com/

Glosbe:

https://glosbe.com/

Online Corpora

Domestic

BCC Corpus:

http://bcc.blcu.edu.cn/

Corpus Online:

http://www.aihanyu.org/cncorpus/index.aspx

Center for Chinese Linguistics, Peking University:

ccl.pku.edu.cn

BFSU Linguistics Corpus:

bfsu-corpus.org/

Academia Sinica Balanced Corpus of Modern Chinese (Sinica Corpus):

https://www.sinica.edu.tw/SinicaCorpus/

Ancient Chinese Corpus / Tagged Corpus of Modern Chinese / Chinese Electronic Documents:

https://www.sinica.edu.tw/ch

Sinica Treebank:

http://treebank.sinica.edu.tw/

Sou Wen Jie Zi (搜文解字):

http://words.sinica.edu.tw/

Media Language Corpus (MLC):

https://ling.cuc.edu.cn/RawPub/

Harbin Institute of Technology Information Retrieval Lab shared corpus resources:

http://ir.hit.edu.cn/demo/ltp/Sharing_Plan.htm

Chinese Synchronous Corpus (LiVaC):

http://www.livac.org/index.php?lang=sc

Chinese Linguistic Data Consortium (ChineseLDC):

http://www.chineseldc.org/

Academia Sinica Tagged Corpus of Early Mandarin Chinese:

http://lingcorpus.iis.sinica.edu.tw/early/

"A Dream of Red Mansions" Chinese-English Parallel Corpus:

http://corpus.usx.edu.cn/hongloumeng/images/shiyongshuoming.htm

Foreign

BNC - British National Corpus:

http://www.natcorp.ox.ac.uk/

BOE - Collins Corpus of English (the Bank of English):

http://www.collinslanguage.com/language-resources/dictionary-datasets/

ANC - American National Corpus:

https://www.anc.org/

Lancaster Corpus of Mandarin Chinese (LCMC):

http://ota.oucs.ox.ac.uk/scripts/download.php?otaid=2474

Sketch Engine multilingual corpora:

https://www.sketchengine.eu/

BASE - British Academic Spoken English Corpus:

https://warwick.ac.uk/fac/soc/al-archive-deleted/research/base

Lextutor:

http://www.lextutor.ca/

MyMemory:

https://mymemory.translated.net/

TAUS:

https://datamarketplace.taus.net/

TTMEM:

https://www.ttmem.com/terminology/download-translation-memory/

TinyTM:

http://tinytm.sourceforge.net/

DGT Translation Memory:

https://magmatranslation.com/en/free-translation-memory/

European Parliament Proceedings Parallel Corpus 1996-2011:

https://statmt.org/europarl/

University of Maryland Parallel Corpus Project: The Bible:

http://users.umiacs.umd.edu/~resnik/parallel/bible.html

Aligned Hansards of the 36th Parliament of Canada:

https://www.isi.edu/research_groups/nlg/home

Publications Office of the European Union:

https://op.europa.eu/en/web/general-publications/publications

Wikimedia Downloads:

https://dumps.wikimedia.org/backup-index.html

United Nations Parallel Corpus:

https://conferences.unite.un.org/UNCorpus/

European language pairs:

https://www.statmt.org/wmt13/translation-task.html#download

Parallel corpus search:

http://paralela.clarin-pl.eu/

UM-Corpus: A Large English-Chinese Parallel Corpus (Natural Language Processing and Chinese-Portuguese Machine Translation Laboratory):

http://nlp2ct.cis.umac.mo/um-corpus/um-corpus-license.html

CLARIN parallel corpora:

https://www.clarin.eu/resource-families/parallel-corpora

The PKU 863 Chinese-English Parallel Corpus:

https://www.lancaster.ac.uk/fass/projects/corpus/863parallel/

BYU corpora: 

https://corpus.byu.edu/
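
Translation memories like the ones listed above are often exchanged as TMX files, an XML format in which each translation unit (`tu`) holds one variant (`tuv`) per language. As a minimal sketch, assuming a TMX file has already been obtained, the Python snippet below extracts source/target segment pairs using only the standard library. The sample data, the function name `read_tmx_pairs`, and the decision to compare only the primary language subtag (so that `en-US` counts as `en`) are illustrative assumptions, not part of any listed service.

```python
import xml.etree.ElementTree as ET

# xml:lang belongs to the reserved XML namespace; ElementTree exposes it under this key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up sample for illustration only; real TMX files are much larger.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>translation memory</seg></tuv>
      <tuv xml:lang="zh"><seg>翻译记忆库</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>termbase</seg></tuv>
      <tuv xml:lang="zh-CN"><seg>术语库</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>corpus</seg></tuv>
      <tuv xml:lang="fr"><seg>corpus</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs for the two language codes."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # Older TMX versions use a plain "lang" attribute instead of xml:lang.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            primary = lang.split("-")[0].lower()  # "en-US" -> "en"
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                segments[primary] = "".join(seg.itertext()).strip()
        # Units that lack either language are skipped.
        if src_lang in segments and tgt_lang in segments:
            pairs.append((segments[src_lang], segments[tgt_lang]))
    return pairs


pairs = read_tmx_pairs(SAMPLE_TMX, "en", "zh")
for source, target in pairs:
    print(source, "->", target)
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`, which lets the parser honor the encoding declared in the file. For very large memories, `ET.iterparse` is likely a better fit than loading the whole tree at once.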


Other subcorpora (OPUS)


A collection of translated literature:

https://opus.nlpl.eu/Books.php

A collection of EU Translation Memories provided by the JRC:

https://opus.nlpl.eu/DGT.php

Documents from the Catalan Government:

https://opus.nlpl.eu/DOGC.php

European Central Bank corpus:

https://opus.nlpl.eu/ECB.php

European Medicines Agency documents:

https://opus.nlpl.eu/EMEA.php

The EU bookshop corpus:

https://opus.nlpl.eu/EUbookshop.php

The European constitution/European Parliament Proceedings:

https://opus.nlpl.eu/EUconst.php

French-English Giga-Word Corpus:

https://opus.nlpl.eu/giga-fren.php

GNOME localization files:

https://opus.nlpl.eu/GNOME.php

News stories in various languages:

https://opus.nlpl.eu/GlobalVoices.php

hrenWaC – Croatian-English WaC corpus:

https://opus.nlpl.eu/hrenWaC.php

JRC-Acquis – legislative EU texts:

https://opus.nlpl.eu/JRC-Acquis.php

KDE4 – KDE4 localization files (v.2):

https://opus.nlpl.eu/KDE4.php

KDEdoc – the KDE manual corpus:

https://opus.nlpl.eu/KDEdoc.php

MBS – Belgian Official Gazette corpus:

https://opus.nlpl.eu/MBS.php

memat – Xhosa/English parallel data:

https://opus.nlpl.eu/memat.php

MontenegrinSubs – Montenegrin movie subtitles:

https://opus.nlpl.eu/MontenegrinSubs.php

MultiUN – Translated UN documents:

https://opus.nlpl.eu/MultiUN.php

News Commentary, v9.0, v9.1:

https://opus.nlpl.eu/News-Commentary-v11.php

OfisPublik – Breton – French parallel texts:

https://opus.nlpl.eu/OfisPublik.php

OO – the OpenOffice.org corpus:

https://opus.nlpl.eu/OpenOffice-v2.php

OpenOffice.org 3 corpus:

https://opus.nlpl.eu/OpenOffice-v3.php

OpenSubtitles – the opensubtitles.org corpus:

https://opus.nlpl.eu/OpenSubtitles-v1.php

OpenSubtitles2016 – snapshot from 2016:

https://opus.nlpl.eu/OpenSubtitles-v2016.php

OpenSubtitles2018 – new complete version:

http://opus.nlpl.eu/OpenSubtitles-v2018.php

ParaCrawl corpus:

https://opus.nlpl.eu/ParaCrawl.php

ParCor – A Parallel Pronoun-Coreference Corpus/PHP – the PHP manual corpus:

http://opus.nlpl.eu/ParCor

The government declaration – a tiny example corpus:

http://opus.nlpl.eu/RF.php

SETIMES – A parallel corpus of the Balkan languages:

http://opus.nlpl.eu/SETIMES.php

SPC – Stockholm Parallel Corpora:

https://opus.nlpl.eu/SPC.php

Tatoeba – A DB of translated sentences:

http://opus.nlpl.eu/Tatoeba.php

TedTalks (hr, Croatian):

http://opus.nlpl.eu/TedTalks.php

TED Talks 2013:

http://opus.nlpl.eu/TED2013.php

Tanzil – A collection of Quran translations:

http://opus.nlpl.eu/Tanzil.php

TEP – The Tehran English-Persian subtitle corpus:

http://opus.nlpl.eu/TEP.php

Ubuntu – Ubuntu localization files:

http://opus.nlpl.eu/Ubuntu.php

UN – Translated UN documents:

http://opus.nlpl.eu/UN.php

Wikipedia – translated sentences from Wikipedia:

http://opus.nlpl.eu/Wikipedia.php

WikiSource – (small en-sv sample only):

http://opus.nlpl.eu/WikiSource.php

WMT News Test Sets:

http://opus.nlpl.eu/WMT-News.php

The Xhosa – English Navy corpus:

http://opus.nlpl.eu/XhosaNavy.php
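
Once a parallel corpus has been downloaded, a simple way to use it for translation is to look up a term on the source side and read off how it was rendered on the target side. The following is a minimal sketch, assuming the corpus is available as two line-aligned plain-text files (line i of one file is the translation of line i of the other), a layout that OPUS offers among its download formats. The sample sentences, the function name `parallel_concordance`, and the whole-word, case-insensitive matching rule are assumptions made for illustration.

```python
import re

# Made-up, line-aligned sample data standing in for two downloaded files.
SRC_LINES = [
    "The central bank raised the interest rate.",
    "The committee met on Monday.",
    "A lower interest rate supports lending.",
]
TGT_LINES = [
    "La banque centrale a relevé le taux d'intérêt.",
    "Le comité s'est réuni lundi.",
    "Un taux d'intérêt plus bas soutient le crédit.",
]


def parallel_concordance(src_lines, tgt_lines, term, max_hits=5):
    """Return up to max_hits (source, target) pairs whose source contains term."""
    # Whole-word, case-insensitive match; re.escape keeps the term literal.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for src, tgt in zip(src_lines, tgt_lines):
        if pattern.search(src):
            hits.append((src.strip(), tgt.strip()))
            if len(hits) >= max_hits:
                break
    return hits


hits = parallel_concordance(SRC_LINES, TGT_LINES, "interest rate")
for src, tgt in hits:
    print(src)
    print("    " + tgt)
```

With real files, the two lists can be replaced by file objects opened with `open(path, encoding="utf-8")`, since the function only iterates over lines. Whole-word matching suits languages written with spaces; for Chinese source text, a plain substring test (`term in src`) is probably the more useful rule.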