RoDia project of a regional and historical corpus for Romanian
SM ISO690:2012
MĂRĂNDUC, Cătălina; MALAHOV, Ludmila; PEREZ, Cenel-Augusto; COLESNICOV, Alexandru. RoDia project of a regional and historical corpus for Romanian. In: Conference on Mathematical Foundations of Informatics. 25-30 July 2016, Chișinău. Chișinău, Republic of Moldova: "VALINEX" SRL, 2016, pp. 268-284. ISBN 978-9975-4237-4-8.
Conference on Mathematical Foundations of Informatics 2016
Chișinău, Moldova, 25-30 July 2016

RoDia project of a regional and historical corpus for Romanian


Pp. 268-284

Mărănduc Cătălina 1,2, Malahov Ludmila 3, Perez Cenel-Augusto 1, Colesnicov Alexandru 3
 
1 „Alexandru Ioan Cuza” University, Iasi,
2 „Iorgu Iordan - Al. Rosetti” Institute of Linguistics of the Romanian Academy,
3 Institute of Mathematics and Computer Science ASM
 
Available in IBN: 30 March 2018


Abstract

Most large corpora consist of contemporary journalistic text, and parsers perform best on such standardized language. Recently, however, the geographic and historical variation of natural languages has become a focus of interest for linguists and computer scientists. We have observed the variety and creativity of Romanian while studying social media communication. Old Romanian shows even greater variety: it was written before the rules were established and is therefore also non-standardized. We will build tools for processing Old Romanian and its dialects spoken south of the Danube. We have compiled a large lexicon of Old Romanian containing about 150,000 inflected forms.

Keywords
linguistic variation, diachronic corpora, nonstandardized language, inflected forms, parser training, lexicon

DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns='http://datacite.org/schema/kernel-3' xsi:schemaLocation='http://datacite.org/schema/kernel-3 http://schema.datacite.org/meta/kernel-3/metadata.xsd'>
<creators>
<creator>
<creatorName>Mărănduc, C.</creatorName>
<affiliation>Universitatea "Alexandru Ioan Cuza", Iaşi, România</affiliation>
</creator>
<creator>
<creatorName>Malahov, L.A.</creatorName>
<affiliation>Institutul de Matematică şi Informatică al AŞM, Moldova, Republica</affiliation>
</creator>
<creator>
<creatorName>Perez, C.</creatorName>
<affiliation>Universitatea "Alexandru Ioan Cuza", Iaşi, România</affiliation>
</creator>
<creator>
<creatorName>Colesnicov, A.E.</creatorName>
<affiliation>Institutul de Matematică şi Informatică al AŞM, Moldova, Republica</affiliation>
</creator>
</creators>
<titles>
<title xml:lang='en'>RoDia project of a regional and historical corpus for Romanian</title>
</titles>
<publisher>Instrumentul Bibliometric National</publisher>
<publicationYear>2016</publicationYear>
<relatedIdentifier relatedIdentifierType='ISBN' relationType='IsPartOf'>978-9975-4237-4-8</relatedIdentifier>
<subjects>
<subject>linguistic variation</subject>
<subject>diachronic corpora</subject>
<subject>nonstandardized language</subject>
<subject>lexicon</subject>
<subject>inflected forms</subject>
<subject>parser training</subject>
</subjects>
<dates>
<date dateType='Issued'>2016</date>
</dates>
<resourceType resourceTypeGeneral='Text'>Conference Paper</resourceType>
<descriptions>
<description xml:lang='en' descriptionType='Abstract'>Most large corpora consist of contemporary journalistic text, and parsers perform best on such standardized language. Recently, however, the geographic and historical variation of natural languages has become a focus of interest for linguists and computer scientists. We have observed the variety and creativity of Romanian while studying social media communication. Old Romanian shows even greater variety: it was written before the rules were established and is therefore also non-standardized. We will build tools for processing Old Romanian and its dialects spoken south of the Danube. We have compiled a large lexicon of Old Romanian containing about 150,000 inflected forms.</description>
</descriptions>
<formats>
<format>application/pdf</format>
</formats>
</resource>
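
The export above can be read with any namespace-aware XML parser. The following is a minimal sketch, not part of the IBN record, that uses Python's standard `xml.etree.ElementTree` to pull the title, year, creators, and subjects out of a DataCite kernel-3 record. The function name `summarize` and the abbreviated `SAMPLE` record are illustrative assumptions; only element names that appear in the export are used.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the export; the "d" prefix is an arbitrary local alias.
DATACITE_NS = {"d": "http://datacite.org/schema/kernel-3"}

# Abbreviated copy of the record above, for illustration only.
SAMPLE = """<resource xmlns='http://datacite.org/schema/kernel-3'>
<creators>
<creator><creatorName>Mărănduc, C.</creatorName></creator>
<creator><creatorName>Malahov, L.A.</creatorName></creator>
</creators>
<titles>
<title xml:lang='en'>RoDia project of a regional and historical corpus for Romanian</title>
</titles>
<publicationYear>2016</publicationYear>
<subjects>
<subject>linguistic variation</subject>
<subject>diachronic corpora</subject>
</subjects>
</resource>"""


def summarize(xml_text):
    """Return the title, year, creators, and subjects of a DataCite record."""
    root = ET.fromstring(xml_text)
    title_el = root.find("d:titles/d:title", DATACITE_NS)
    # itertext() also collects text nested in child markup such as <p>.
    title = "".join(title_el.itertext()).strip() if title_el is not None else None
    return {
        "title": title,
        "year": root.findtext("d:publicationYear", namespaces=DATACITE_NS),
        "creators": [
            el.text
            for el in root.findall("d:creators/d:creator/d:creatorName", DATACITE_NS)
        ],
        "subjects": [
            el.text for el in root.findall("d:subjects/d:subject", DATACITE_NS)
        ],
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```

Because the record declares a default namespace, every path step needs the namespace prefix; an unprefixed query such as `root.find("titles")` would return nothing.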