Finding Translation Examples for Under-Resourced Language Pairs or for Narrow Domains; the Case for Machine Translation
SM ISO690:2012
TUFIŞ, Dan. Finding Translation Examples for Under-Resourced Language Pairs or for Narrow Domains; the Case for Machine Translation. In: Computer Science Journal of Moldova, 2012, no. 2(59), pp. 227-245. ISSN 1561-4042.
Computer Science Journal of Moldova
Issue 2(59) / 2012 / ISSN 1561-4042 / ISSNe 2587-4330

UDC: 004.9:81'246.3

Pp. 227-245

Tufiş Dan
 
Institute for Artificial Intelligence, Romanian Academy
 
 
Available in IBN: 30 November 2013


Abstract

Cyberspace is populated with valuable information sources expressed in about 1,500 different languages and dialects. Yet for the vast majority of Web surfers this wealth of information is practically inaccessible or meaningless. Recent advances in cross-lingual information retrieval, multilingual summarization, cross-lingual question answering and machine translation promise to narrow the linguistic gaps and lower the communication barriers between humans and/or software agents. Most of these language technologies are based on statistical machine learning techniques, which require large volumes of cross-lingual data. The most suitable type of cross-lingual data is the parallel corpus, a collection of reciprocal translations. However, it is not easy to find enough parallel data for every language pair that might be of interest. When the required parallel data pertains to specialized (narrow) domains, the scarcity of data becomes even more acute. Intelligent techniques for extracting information from comparable corpora provide one possible answer to this lack of translation data.

Keywords
alignment, document crawling, multilingual corpora, comparable corpora, machine learning