Corpora as Translation Tools

2022-05-05 11:59:16   第一文档网

Corpus (corpora)

A collection of written texts, especially the entire works of a particular author or a body of writing on a particular subject.

A collection of written or spoken material in machine-readable form, assembled for the purpose of studying linguistic structures or frequencies.

For translators, a corpus offers the chance to test one's own tentative translation (especially when translating into a foreign language) against the background of a large selection of original texts written in the target language.

Translation tools

Parallel corpora of source texts and translations are used:

In the formulation phase of a translation

In the field of translation memories

In the scholarly analysis of translation processes

Oral or written

Monolingual or multilingual

Regional, social and historical variations

Annotated texts and plain texts

Medium of original text representation

The individual texts may be in spoken or written form.

A spoken-language corpus has to be transcribed.

Speech-recognition technology might soon make it possible to analyze spoken-word corpora without the need for transcription.

The electronic representation of texts also allows for the integration of other media (audio and video). Such a multimedia corpus can be a valuable tool in the field of cross-cultural communication.

Monolingual corpus: British National Corpus, http://info.ox.ac.uk/bnc

Social and chronological focus

Many corpora represent the language use of a given social group (immigrants, children, men and women) or the language used in a particular historical period.

Chronological focus: diachronic or synchronic (the current state of language use).

Collins Cobuild Corpus: used by lexicographers for compiling dictionaries, and by translators for checking the validity of a given English word or phrase.

How to build a corpus

1. Find the relevant websites (for example, Yahoo's Spanish site).

2. Download files from these websites (one document at a time, or all the documents of one website at a time).
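Step 2 can be sketched in Python as follows. This is a minimal illustration, not a method given in these notes: the function names `fetch_url` and `build_corpus`, the output file naming, and the assumption that pages decode as UTF-8 are all hypothetical choices, and the sketch stores each page as downloaded without stripping HTML markup.

```python
import urllib.request
from pathlib import Path


def fetch_url(url: str) -> str:
    """Download one document and decode it as text (UTF-8 is an assumption)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def build_corpus(urls, out_dir, fetch=fetch_url):
    """Save each downloaded document as its own numbered file in out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start=1):
        target = out_path / f"doc_{index:04d}.txt"
        target.write_text(fetch(url), encoding="utf-8")
        saved.append(target)
    return saved
```

Passing a different `fetch` function makes it possible to try the routine on local test data before pointing it at real websites.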

Have access to software that retrieves data efficiently: a text analysis program such as WordSmith.

Wordlist: a list of all words occurring in the texts selected for analysis, sorted either alphabetically or by frequency
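The idea of a word list can be illustrated with a few lines of Python. This is only a sketch of the concept, not WordSmith's implementation: the tokenization rule (lowercased runs of letters, with an optional apostrophe) and the function names are assumptions made for the example.

```python
import re
from collections import Counter

# Assumed tokenization: runs of letters, optionally joined by one apostrophe.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")


def word_list(texts):
    """Count every word occurring in the texts selected for analysis."""
    counts = Counter()
    for text in texts:
        counts.update(WORD.findall(text.lower()))
    return counts


def sorted_alphabetically(counts):
    """Return (word, count) pairs in alphabetical order."""
    return sorted(counts.items())


def sorted_by_frequency(counts):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

For the two texts "The cat sat." and "The dog sat on the cat.", the frequency-sorted list starts with "the" (3 occurrences), followed by "cat" and "sat" (2 each).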


Keywords: compares a word list from a shorter article with the list from a larger text

Concordance: the occurrences of a given search term in their textual context

http://www.oup.com/elt/catalogue/Multimedia/WordSmithTools3.0.

In the start window: Tools menu, Settings / Choose Text, click the Store button, OK

Wordlist

In the File menu, select the start symbol

Alta Vista personal 97 and Alta Vista Discovery let you use the technology of the Alta Vista search engine to search documents on your local computer.
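The Keywords and Concordance functions described above can be approximated in Python. This sketch rests on several assumptions: the keyword score is a plain ratio of relative frequencies with add-one smoothing, chosen for simplicity (these notes do not say which statistic WordSmith uses), and the function names and the context width are hypothetical.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Split text into lowercased words (assumed rule: runs of letters)."""
    return TOKEN.findall(text.lower())


def keywords(short_text, reference_text, top=5):
    """Rank words that are unusually frequent in short_text
    compared with a larger reference_text."""
    short = Counter(tokenize(short_text))
    ref = Counter(tokenize(reference_text))
    short_total = sum(short.values())
    ref_total = sum(ref.values())
    scores = {}
    for word, count in short.items():
        rel_short = count / short_total
        # Add-one smoothing so words absent from the reference do not divide by zero.
        rel_ref = (ref[word] + 1) / (ref_total + 1)
        scores[word] = rel_short / rel_ref
    return sorted(scores, key=lambda w: (-scores[w], w))[:top]


def concordance(text, term, width=30):
    """Return one keyword-in-context line per occurrence of term."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines
```

Each concordance line places the search term in brackets at a fixed column, so that the left and right contexts of all occurrences can be compared at a glance.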

What is a parallel corpus

A corpus is a large collection of texts, stored on a computer. A parallel corpus contains texts in two languages. We can distinguish two main types of parallel corpus:

Comparable corpus: the texts are of the same kind and cover the same content. An example would be a corpus of articles about football from English and Danish newspapers; or legal contracts in Spanish and Greek.

Translation corpus: the texts in one language (L1) are translations of texts in the other language (L2).

(Not everyone uses exactly this terminology for different types of corpus.) Here I will concentrate on translation corpora. Many researchers have built translation corpora in the past decade, though unfortunately most of them are not easily available. For a useful survey of parallel corpora around the world, look at Michael Barlow's parallel corpora web page (Barlow n.d).

To use a translation corpus you need a special piece of software called a parallel concordancer. With this software you can ask the computer to find all the examples of a word or phrase in L1, along with all the corresponding translated sentences in L2. Two widely used parallel concordancers are ParaConc (see Michael Barlow's ParaConc web page (Barlow n.d2) for details) and Multiconcord (information at the Multiconcord web page (Johns 1998)).
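What a parallel concordancer does can be sketched in a few lines of Python. This is not how ParaConc or Multiconcord work internally. It assumes the corpus has already been aligned sentence by sentence and is held as a list of (L1, L2) pairs; the function name and the English-Spanish sample sentences are invented for illustration.

```python
import re

# Invented sample data: each pair is (L1 sentence, aligned L2 translation).
SAMPLE_PAIRS = [
    ("The contract is signed.", "El contrato está firmado."),
    ("The match starts today.", "El partido empieza hoy."),
    ("He signed a new contract.", "Firmó un nuevo contrato."),
]


def parallel_concordance(aligned_pairs, term):
    """Return every (L1, L2) pair whose L1 sentence contains the search term."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [(l1, l2) for l1, l2 in aligned_pairs if pattern.search(l1)]


if __name__ == "__main__":
    for l1, l2 in parallel_concordance(SAMPLE_PAIRS, "contract"):
        print(f"{l1}\t{l2}")
```

Searching the sample data for "contract" returns the first and third pairs, each L1 sentence side by side with its translation. The hard part of building a real translation corpus is the sentence alignment that this sketch takes for granted.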


Source: https://www.dywdw.cn/4e916c2eed630b1c59eeb57b.html
