The impact of data pre-processing on the assessment of the similarity of trend functions
Citation (SM ISO690:2012):
COANDĂ, Ilie. The impact of data pre-processing on the assessment of the similarity of trend functions. In: Competitivitatea şi inovarea în economia cunoaşterii: Culegere de rezumate, Ed. Ediția 27, 22-23 septembrie 2023, Chişinău. Chişinău Republica Moldova: "Print-Caro" SRL, 2023, Ediţia a 27-a, Volumul 1, p. 65. ISBN 978-9975-175-98-2.
Competitivitatea şi inovarea în economia cunoaşterii
27th edition, Volume 1, 2023
Conference "Competitivitate şi inovare în economia cunoaşterii"
27th edition, Chişinău, Moldova, 22-23 September 2023


JEL: C63, I21, I23, I25, I29

Pages 65-65

Coandă Ilie
 
Academy of Economic Studies of Moldova
 
 
Available in IBN: 15 February 2024


Abstract

An approach to the technologies for cleaning, completing, and smoothing large volumes of data prior to analysis is proposed. Depending on the field and on how the data are collected and recorded, data can be classified into at least two categories: precise data, recorded by automated techniques without any influence of the human factor, and approximate data, where a human participates in the collection or recording at some stage of the activity. When relatively many people take part in the same recording activity, the resulting records will differ in precision from those produced automatically. This work aims to highlight the importance and impact of the quality of the preliminary processing (smoothing, cleaning, etc.) of the primary data used in the analysis process.

In the case studies, the object of research is a set of time series of data collected on the spread of an epidemic. Recording such a phenomenon fits the studied case well, since the data are collected with intense human participation, which is characterized by frequent deviations from the prescribed regulations. Consequently, some data may be recorded with a delay, and people affected by the disease may report to a doctor at different times; such phenomena can create anomalies in the data structure. To highlight the impact of different smoothing and completion methods applied to the primary data, an approximating function was obtained for each time series after the series had first been "corrected" by: a) averaging neighbouring data; or b) excluding "suspicious" data.
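The abstract describes the two correction methods only at a conceptual level; a minimal pure-Python sketch of how they might look for an equally spaced time series is given below. The function names, the one-step neighbour window, and the `threshold` heuristic are illustrative assumptions, not the authors' actual procedure.

```python
def smooth_average(series):
    """Method (a): replace each interior point with the mean of itself
    and its two neighbours (a simple 3-point moving average)."""
    out = list(series)
    for i in range(1, len(series) - 1):
        out[i] = (series[i - 1] + series[i] + series[i + 1]) / 3.0
    return out

def drop_suspicious(series, threshold=3.0):
    """Method (b): exclude points that deviate from the mean of their
    neighbours by more than `threshold` times the mean absolute step
    of the series (an assumed definition of 'suspicious')."""
    steps = [abs(b - a) for a, b in zip(series, series[1:])]
    scale = sum(steps) / len(steps) if steps else 0.0
    kept = []
    for i, v in enumerate(series):
        nbrs = [series[j] for j in (i - 1, i + 1) if 0 <= j < len(series)]
        local = sum(nbrs) / len(nbrs)
        if scale == 0.0 or abs(v - local) <= threshold * scale:
            kept.append((i, v))
    return kept
```

Method (b) returns (index, value) pairs because excluding points leaves gaps at known time positions, which a subsequent regression step can handle directly.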
As a result, two sets of approximating functions are obtained (the approximating functions can be obtained by non-linear regression). By applying technologies for evaluating the similarity of functions, the distance (similarity level) between the functions within each set is calculated. Hierarchical clusterings of the two sets of approximating functions can then be built, and by comparing the hierarchical clusterings, the impact of correction methodologies a) and b) can be evaluated.
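The second stage can be sketched under the same assumptions: each "corrected" series is approximated by a fitted function (a least-squares line here stands in for the non-linear regressions the abstract mentions), pairwise distances between the fitted functions are computed, and a naive single-linkage hierarchy is built from the distance matrix. All names and the RMS distance choice are illustrative.

```python
def fit_line(ys):
    """Least-squares line y = a + b*t over t = 0..n-1 (a stand-in for
    the non-linear regressions mentioned in the abstract)."""
    n = len(ys)
    mt = sum(range(n)) / n
    my = sum(ys) / n
    b = (sum((t - mt) * (y - my) for t, y in enumerate(ys))
         / sum((t - mt) ** 2 for t in range(n)))
    return my - b * mt, b

def func_distance(f, g, n, grid=50):
    """Root-mean-square distance between two fitted lines,
    sampled on `grid` points over [0, n-1]."""
    (a1, b1), (a2, b2) = f, g
    pts = [i * (n - 1) / (grid - 1) for i in range(grid)]
    return (sum(((a1 + b1 * t) - (a2 + b2 * t)) ** 2 for t in pts) / grid) ** 0.5

def single_linkage(dist):
    """Naive single-linkage agglomerative clustering on a symmetric
    distance matrix; returns the merge order (a dendrogram skeleton)."""
    clusters = [{i} for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] |= clusters[j]
        del clusters[j]
    return merges
```

Running the pipeline on one set of corrected series yields one hierarchy; repeating it on the set corrected by the other method yields the second, and comparing the two merge orders gives a concrete handle on the impact of correction choices a) and b).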

Keywords
Methods, cleaning, smoothing, impact, similarity, functions, regression