Generating multidimensional clusters with support lines

dc.contributor.author: Fachada, Nuno
dc.contributor.author: de Andrade, Diogo
dc.contributor.institution: COPELABS - Cognitive and People-centric Computing
dc.date.issued: 2023
dc.description.abstract: Synthetic data is essential for assessing clustering techniques, complementing and extending real data, and allowing for more complete coverage of a given problem’s space. In turn, synthetic data generators have the potential of creating vast amounts of data, a crucial activity when real-world data is at a premium, while providing a well-understood generation procedure and an interpretable instrument for methodically investigating cluster analysis algorithms. Here, we present Clugen, a modular procedure for synthetic data generation, capable of creating multidimensional clusters supported by line segments using arbitrary distributions. Clugen is open source, comprehensively unit tested and documented, and is available for the Python, R, Julia, and MATLAB/Octave ecosystems. We demonstrate that our proposal can produce rich and varied results in various dimensions, is fit for use in the assessment of clustering algorithms, and has the potential to be a widely used framework in diverse clustering-related research tasks. Keywords: Synthetic data, Clustering, Data generation, Multidimensional data [pt]
dc.description.sponsorship: This work is supported by Fundação para a Ciência e a Tecnologia, Portugal under Grant UIDB/04111/2020 (COPELABS). The authors would also like to thank the anonymous referees for their valuable comments and helpful suggestions.
dc.format: application/pdf
dc.identifier.citation: Fachada, N & de Andrade, D 2023, 'Generating multidimensional clusters with support lines', Knowledge-Based Systems, vol. 277, no. 9, 110836. https://doi.org/10.1016/j.knosys.2023.110836
dc.identifier.doi: https://doi.org/10.1016/j.knosys.2023.110836
dc.identifier.issn: 0950-7051
dc.identifier.url: https://www.scopus.com/pages/publications/85169786618
dc.language.iso: eng
dc.peerreviewed: yes
dc.publisher: Elsevier B.V.
dc.relation.ispartof: Knowledge-Based Systems
dc.rights: closedAccess
dc.subject: CRIAÇÃO DE DADOS SINTÉTICOS
dc.subject: ANÁLISE DE CLUSTERS
dc.subject: CRIAÇÃO DE DADOS
dc.subject: INFORMÁTICA
dc.subject: SYNTHETIC DATA GENERATION
dc.subject: CLUSTER ANALYSIS
dc.subject: DATA GENERATION
dc.subject: COMPUTER SCIENCE
dc.title: Generating multidimensional clusters with support lines [en]
dc.type: article
