Joint Generation of Mixed Data of Different Variable Types in Pharmaceutical Sciences


Creative Commons License

Demirtaş H., Çankaya Ö., Altuntaş M., Coşar K., Yılmaztekin Y., Ye C., ...

Anatolian Journal of Pharmaceutical Sciences, vol.4, no.3, pp.175-209, 2025 (Peer-Reviewed Journal)

Abstract

This manuscript develops a unified framework for simultaneously generating datasets that contain four major types of variables (binary, ordinal, count, and continuous) with specified marginal distributions and an appropriate dependence structure, for use in simulation studies. Simulation-based approaches are widely employed in pharmaceutical research and practice. A key element of any simulation study is the characterization of the model components and parameters that jointly describe a scientific phenomenon. When such characterization cannot be achieved fully through deterministic methods, investigators frequently turn to random number generation (RNG) to produce simulation-driven solutions that capture the inherent randomness of the process. Although numerous RNG techniques have been proposed in the literature, a significant shortcoming is that most were not designed to accommodate all of these variable types at once. Consequently, these methods often provide only partial solutions, since real-world datasets typically contain variables of several different types. The present work extends current methodology by providing a systematic framework for, and an in-depth exploration of, mixed data generation. We introduce an algorithm for generating data with mixed marginals, describe its operational, computational, and practical aspects, and discuss potential extensions to more complex scenarios involving richer marginal features and dependence structures.
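The abstract does not spell out the algorithm itself, so the following is only an illustrative sketch of one widely used way to generate mixed data, a Gaussian copula (latent multivariate normal, in the spirit of the NORTA "normal to anything" idea). It is not presented as the authors' method. All function names (`cholesky`, `ordinal_from_u`, `poisson_inv_cdf`, `generate_mixed`) and parameter values are hypothetical. The sketch draws correlated standard normals, maps three of them to uniforms with the normal CDF, and then applies inverse CDFs to obtain a binary, an ordinal, and a Poisson count variable; the fourth stays continuous (normal). One important caveat: the correlation matrix is imposed on the latent normal scale, so the correlations observed among the generated binary, ordinal, and count variables will generally be smaller in magnitude. Matching target correlations on the observed scale would require solving for intermediate latent correlations, which this sketch does not attempt.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the CDF step


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def ordinal_from_u(u, probs):
    """Map u in (0, 1) to a category 1..len(probs) via cumulative probabilities."""
    cum = 0.0
    for category, p in enumerate(probs, start=1):
        cum += p
        if u <= cum:
            return category
    return len(probs)  # guards against rounding when sum(probs) is slightly below 1


def poisson_inv_cdf(u, lam):
    """Smallest k such that P(X <= k) >= u for X ~ Poisson(lam)."""
    k = 0
    p = math.exp(-lam)
    cdf = p
    while cdf < u:
        k += 1
        p *= lam / k
        cdf += p
    return k


def generate_mixed(n, corr, p_binary, ordinal_probs, lam, mu, sigma, seed=None):
    """Hypothetical Gaussian-copula generator for (binary, ordinal, count, continuous) rows.

    `corr` is the 4x4 correlation matrix of the LATENT normals, not of the
    generated variables themselves.
    """
    rng = random.Random(seed)
    L = cholesky(corr)
    rows = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(4)]  # independent N(0, 1) draws
        z = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(4)]  # correlated normals
        # Clip uniforms away from 0 and 1 so the inverse-CDF steps stay finite.
        u = [min(max(STD_NORMAL.cdf(zi), 1e-12), 1 - 1e-12) for zi in z[:3]]
        rows.append((
            int(u[0] > 1 - p_binary),              # binary with P(X = 1) = p_binary
            ordinal_from_u(u[1], ordinal_probs),    # ordinal categories 1..K
            poisson_inv_cdf(u[2], lam),             # Poisson count
            mu + sigma * z[3],                      # continuous N(mu, sigma^2)
        ))
    return rows


# Usage with illustrative (made-up) parameters; this matrix is positive definite.
corr = [
    [1.0, 0.4, 0.3, 0.2],
    [0.4, 1.0, 0.3, 0.2],
    [0.3, 0.3, 1.0, 0.2],
    [0.2, 0.2, 0.2, 1.0],
]
data = generate_mixed(1000, corr, p_binary=0.3, ordinal_probs=[0.2, 0.5, 0.3],
                      lam=3.0, mu=10.0, sigma=2.0, seed=1)
```

Every transformation above is monotone increasing in its latent normal, so a positive latent correlation yields a positive (though typically attenuated) association between the corresponding generated variables.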