SM ISO690:2012 MĂRĂNDUC, Cătălina, MALAHOV, Ludmila, PEREZ, Cenel-Augusto, COLESNICOV, Alexandru. RoDia project of a regional and historical corpus for Romanian. In: Conference on Mathematical Foundations of Informatics, Ed. 2016, 25-30 July 2016, Chișinău. Chișinău, Republic of Moldova: "VALINEX" SRL, 2016, pp. 268-284. ISBN 978‐9975‐4237‐4‐8.
Conference on Mathematical Foundations of Informatics 2016
Conference "Conference on Mathematical Foundations of Informatics" 2016, Chișinău, Moldova, 25-30 July 2016
Pages 268-284
Abstract
Most large corpora are written in a contemporary journalistic style, and parsers work best on such standardized text. Recently, however, the geographic and historical variation of natural languages has moved to the center of interest for linguists and computer scientists. We encountered the variety and creativity of Romanian while studying social media communication. Old Romanian shows even greater variety, because it was written before the rules were established and is therefore also non-standardized. We will construct tools for processing Old Romanian and its south-Danube dialects. We have built a large lexicon of Old Romanian containing about 150,000 inflected forms.
|
Keywords: linguistic variation, lexicon, diachronic corpora, nonstandardized language, inflected forms, parser training
|
|
Cerif XML Export
<?xml version='1.0' encoding='utf-8'?> <CERIF xmlns='urn:xmlns:org:eurocris:cerif-1.5-1' xsi:schemaLocation='urn:xmlns:org:eurocris:cerif-1.5-1 http://www.eurocris.org/Uploads/Web%20pages/CERIF-1.5/CERIF_1.5_1.xsd' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' release='1.5' date='2012-10-07' sourceDatabase='Output Profile'> <cfResPubl> <cfResPublId>ibn-ResPubl-60808</cfResPublId> <cfResPublDate>2016</cfResPublDate> <cfStartPage>268</cfStartPage> <cfISBN>978‐9975‐4237‐4‐8</cfISBN> <cfURI>https://ibn.idsi.md/ro/vizualizare_articol/60808</cfURI> <cfTitle cfLangCode='EN' cfTrans='o'>RoDia project of a regional and historical corpus for Romanian</cfTitle> <cfKeyw cfLangCode='EN' cfTrans='o'>linguistic variation; diachronic corpora; nonstandardized language; lexicon; inflected forms; parser training</cfKeyw> <cfAbstr cfLangCode='EN' cfTrans='o'><p>The majority of big corpora are in contemporary journalistic style. Parsers work better in the standardized style. But recently the geographic and historic variation of natural languages become in the center of the interest of linguists and computer scientists. We have experienced the variety and creativity of Romanian studying the Social Media communication. The old Romanian has a bigger variety; because it is written before the rules were established, being also non-standardized. We will construct tools for the old Romanian and its south Danube dialects processing. 
We made a big lexicon of Old Romanian, having about 150,000 inflected forms.</p></cfAbstr> <cfResPubl_Class> <cfClassId>eda2d9e9-34c5-11e1-b86c-0800200c9a66</cfClassId> <cfClassSchemeId>759af938-34ae-11e1-b86c-0800200c9a66</cfClassSchemeId> <cfStartDate>2016T24:00:00</cfStartDate> </cfResPubl_Class> <cfResPubl_Class> <cfClassId>e601872f-4b7e-4d88-929f-7df027b226c9</cfClassId> <cfClassSchemeId>40e90e2f-446d-460a-98e5-5dce57550c48</cfClassSchemeId> <cfStartDate>2016T24:00:00</cfStartDate> </cfResPubl_Class> <cfPers_ResPubl> <cfPersId>ibn-person-54031</cfPersId> <cfClassId>49815870-1cfe-11e1-8bc2-0800200c9a66</cfClassId> <cfClassSchemeId>b7135ad0-1d00-11e1-8bc2-0800200c9a66</cfClassSchemeId> <cfStartDate>2016T24:00:00</cfStartDate> </cfPers_ResPubl> <cfPers_ResPubl> <cfPersId>ibn-person-13029</cfPersId> <cfClassId>49815870-1cfe-11e1-8bc2-0800200c9a66</cfClassId> <cfClassSchemeId>b7135ad0-1d00-11e1-8bc2-0800200c9a66</cfClassSchemeId> <cfStartDate>2016T24:00:00</cfStartDate> </cfPers_ResPubl> <cfPers_ResPubl> <cfPersId>ibn-person-56433</cfPersId> <cfClassId>49815870-1cfe-11e1-8bc2-0800200c9a66</cfClassId> <cfClassSchemeId>b7135ad0-1d00-11e1-8bc2-0800200c9a66</cfClassSchemeId> <cfStartDate>2016T24:00:00</cfStartDate> </cfPers_ResPubl> <cfPers_ResPubl> <cfPersId>ibn-person-10474</cfPersId> <cfClassId>49815870-1cfe-11e1-8bc2-0800200c9a66</cfClassId> <cfClassSchemeId>b7135ad0-1d00-11e1-8bc2-0800200c9a66</cfClassSchemeId> <cfStartDate>2016T24:00:00</cfStartDate> </cfPers_ResPubl> </cfResPubl> <cfPers> <cfPersId>ibn-Pers-54031</cfPersId> <cfPersName_Pers> <cfPersNameId>ibn-PersName-54031-3</cfPersNameId> <cfClassId>55f90543-d631-42eb-8d47-d8d9266cbb26</cfClassId> <cfClassSchemeId>7375609d-cfa6-45ce-a803-75de69abe21f</cfClassSchemeId> <cfStartDate>2016T24:00:00</cfStartDate> <cfFamilyNames>Mărănduc</cfFamilyNames> <cfFirstNames>Cătălina</cfFirstNames> </cfPersName_Pers> </cfPers> <cfPers> <cfPersId>ibn-Pers-13029</cfPersId> <cfPersName_Pers> 
<cfPersNameId>ibn-PersName-13029-3</cfPersNameId> <cfClassId>55f90543-d631-42eb-8d47-d8d9266cbb26</cfClassId> <cfClassSchemeId>7375609d-cfa6-45ce-a803-75de69abe21f</cfClassSchemeId> <cfStartDate>2016T24:00:00</cfStartDate> <cfFamilyNames>Malahov</cfFamilyNames> <cfFirstNames>Ludmila</cfFirstNames> </cfPersName_Pers> </cfPers> <cfPers> <cfPersId>ibn-Pers-56433</cfPersId> <cfPersName_Pers> <cfPersNameId>ibn-PersName-56433-3</cfPersNameId> <cfClassId>55f90543-d631-42eb-8d47-d8d9266cbb26</cfClassId> <cfClassSchemeId>7375609d-cfa6-45ce-a803-75de69abe21f</cfClassSchemeId> <cfStartDate>2016T24:00:00</cfStartDate> <cfFamilyNames>Perez</cfFamilyNames> <cfFirstNames>Cenel-Augusto</cfFirstNames> </cfPersName_Pers> </cfPers> <cfPers> <cfPersId>ibn-Pers-10474</cfPersId> <cfPersName_Pers> <cfPersNameId>ibn-PersName-10474-3</cfPersNameId> <cfClassId>55f90543-d631-42eb-8d47-d8d9266cbb26</cfClassId> <cfClassSchemeId>7375609d-cfa6-45ce-a803-75de69abe21f</cfClassSchemeId> <cfStartDate>2016T24:00:00</cfStartDate> <cfFamilyNames>Colesnicov</cfFamilyNames> <cfFirstNames>Alexandru</cfFirstNames> </cfPersName_Pers> </cfPers> </CERIF>
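The export above can be read with any namespace-aware XML parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the function name `summarize_cerif` and the trimmed `SAMPLE` string are illustrative assumptions and are not part of the IBN export. It pulls the title from `cfTitle` and assembles author names from the `cfFirstNames` and `cfFamilyNames` elements.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <CERIF> root element of the export.
NS = {"cf": "urn:xmlns:org:eurocris:cerif-1.5-1"}

# Trimmed-down sample with the same element names as the export
# (hypothetical excerpt: only two of the four authors are included).
SAMPLE = """<CERIF xmlns='urn:xmlns:org:eurocris:cerif-1.5-1'>
  <cfResPubl>
    <cfResPublId>ibn-ResPubl-60808</cfResPublId>
    <cfTitle cfLangCode='EN' cfTrans='o'>RoDia project of a regional and historical corpus for Romanian</cfTitle>
  </cfResPubl>
  <cfPers>
    <cfPersId>ibn-Pers-54031</cfPersId>
    <cfPersName_Pers>
      <cfFamilyNames>Mărănduc</cfFamilyNames>
      <cfFirstNames>Cătălina</cfFirstNames>
    </cfPersName_Pers>
  </cfPers>
  <cfPers>
    <cfPersId>ibn-Pers-13029</cfPersId>
    <cfPersName_Pers>
      <cfFamilyNames>Malahov</cfFamilyNames>
      <cfFirstNames>Ludmila</cfFirstNames>
    </cfPersName_Pers>
  </cfPers>
</CERIF>"""


def summarize_cerif(xml_text):
    """Return the publication title and author names found in a CERIF document."""
    root = ET.fromstring(xml_text)
    title = root.findtext("cf:cfResPubl/cf:cfTitle", default="", namespaces=NS)
    authors = []
    # Each <cfPers> carries one <cfPersName_Pers> holding the name parts.
    for name in root.iterfind("cf:cfPers/cf:cfPersName_Pers", NS):
        family = name.findtext("cf:cfFamilyNames", default="", namespaces=NS)
        first = name.findtext("cf:cfFirstNames", default="", namespaces=NS)
        authors.append(f"{first} {family}".strip())
    return {"title": title, "authors": authors}


summary = summarize_cerif(SAMPLE)
print(summary["title"])
print(", ".join(summary["authors"]))
```

Authors are returned in document order, which in this export matches the order of the citation.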