The impact of data pre-processing on the assessment of the similarity of trend functions
Citation (SM ISO690:2012):
COANDĂ, Ilie. The impact of data pre-processing on the assessment of the similarity of trend functions. In: Competitivitatea şi inovarea în economia cunoaşterii: Culegere de rezumate, Ed. Ediția 27, 22-23 septembrie 2023, Chişinău. Chişinău Republica Moldova: "Print-Caro" SRL, 2023, Ediţia a 27-a, Volumul 1, p. 65. ISBN 978-9975-175-98-2.
Competitivitatea şi inovarea în economia cunoaşterii
27th edition, Volume 1, 2023
Conference "Competitivitate şi inovare în economia cunoaşterii"
27th edition, Chişinău, Moldova, 22-23 September 2023


JEL: C63, I21, I23, I25, I29

Pages 65-65

Coandă Ilie
 
Academy of Economic Studies of Moldova
 
 
Available in IBN: 15 February 2024


Abstract

An approach to the technologies for cleaning, completing, and smoothing large volumes of data prior to analysis is proposed. Depending on the field and on how the data are collected and recorded, data can be classified into at least two categories: precise data, recorded by automated techniques without any influence of the human factor, and approximate data, where a human participates in the collection or recording at some stage of the activity. When relatively many people take part in the same recording activity, the resulting records will differ in precision from those produced automatically. This work aims to highlight the importance and impact of the quality of the preliminary processing (smoothing, cleaning, etc.) of the primary data used in the analysis process.

In the case studies, the object of research is a set of time series of data collected on the spread of an epidemic. Recording such a phenomenon fits the studied case well, since the data are collected with intense human participation, which is characterized by frequent deviations from the prescribed regulations. Consequently, some data may be recorded with a delay, and people affected by the disease may report to a doctor at different times; such phenomena can create anomalies in the data structure. To highlight the impact of different smoothing and completion methods applied to the primary data, an approximating function was obtained for each time series after the series had first been "corrected" by: a) averaging neighbouring data; or b) excluding "suspicious" data.
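The abstract describes the two correction methods only at a conceptual level; a minimal pure-Python sketch of how they might look for an equally spaced time series is given below. The function names, the one-step neighbour window, and the `threshold` heuristic are illustrative assumptions, not the authors' actual procedure.

```python
def smooth_average(series):
    """Method (a): replace each interior point with the mean of itself
    and its two neighbours (a simple 3-point moving average)."""
    out = list(series)
    for i in range(1, len(series) - 1):
        out[i] = (series[i - 1] + series[i] + series[i + 1]) / 3.0
    return out

def drop_suspicious(series, threshold=3.0):
    """Method (b): exclude points that deviate from the mean of their
    neighbours by more than `threshold` times the mean absolute step
    of the series (an assumed definition of 'suspicious')."""
    steps = [abs(b - a) for a, b in zip(series, series[1:])]
    scale = sum(steps) / len(steps) if steps else 0.0
    kept = []
    for i, v in enumerate(series):
        nbrs = [series[j] for j in (i - 1, i + 1) if 0 <= j < len(series)]
        local = sum(nbrs) / len(nbrs)
        if scale == 0.0 or abs(v - local) <= threshold * scale:
            kept.append((i, v))
    return kept
```

Method (b) returns (index, value) pairs because excluding points leaves gaps at known time positions, which a subsequent regression step can handle directly.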
As a result, two sets of approximating functions are obtained (the approximating functions can be obtained by non-linear regression). By applying technologies for evaluating the similarity of functions, the distance (similarity level) between the functions within each set is calculated. Hierarchical clusterings of the two sets of approximating functions can then be built, and by comparing the hierarchical clusterings, the impact of correction methodologies a) and b) can be evaluated.
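The second stage can be sketched under the same assumptions: each "corrected" series is approximated by a fitted function (a least-squares line here stands in for the non-linear regressions the abstract mentions), pairwise distances between the fitted functions are computed, and a naive single-linkage hierarchy is built from the distance matrix. All names and the RMS distance choice are illustrative.

```python
def fit_line(ys):
    """Least-squares line y = a + b*t over t = 0..n-1 (a stand-in for
    the non-linear regressions mentioned in the abstract)."""
    n = len(ys)
    mt = sum(range(n)) / n
    my = sum(ys) / n
    b = (sum((t - mt) * (y - my) for t, y in enumerate(ys))
         / sum((t - mt) ** 2 for t in range(n)))
    return my - b * mt, b

def func_distance(f, g, n, grid=50):
    """Root-mean-square distance between two fitted lines,
    sampled on `grid` points over [0, n-1]."""
    (a1, b1), (a2, b2) = f, g
    pts = [i * (n - 1) / (grid - 1) for i in range(grid)]
    return (sum(((a1 + b1 * t) - (a2 + b2 * t)) ** 2 for t in pts) / grid) ** 0.5

def single_linkage(dist):
    """Naive single-linkage agglomerative clustering on a symmetric
    distance matrix; returns the merge order (a dendrogram skeleton)."""
    clusters = [{i} for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] |= clusters[j]
        del clusters[j]
    return merges
```

Running the pipeline on one set of corrected series yields one hierarchy; repeating it on the set corrected by the other method yields the second, and comparing the two merge orders gives a concrete handle on the impact of correction choices a) and b).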

Keywords
Methods, cleaning, smoothing, impact, similarity, functions, regression