Data-intensive applications such as cloud storage, big data analytics, the Internet of Things (IoT), and multimedia streaming demand cryptographic systems that provide strong security and high throughput at the same time. Existing solutions, however, struggle to balance scalability, performance, and secure key management in heterogeneous computing environments, so meeting this demand requires rethinking traditional cryptographic architectures. The Advanced Encryption Standard (AES) is commonly used to secure large data volumes, but as a symmetric cipher it provides no mechanism for secure key distribution. Elliptic Curve Cryptography (ECC) is well suited to key exchange, but its computational cost makes it inefficient for encrypting large data volumes. Furthermore, existing hybrid AES–ECC approaches typically focus on limited data sizes or specialized hardware platforms, leaving a gap in the evaluation of scalable solutions on commodity heterogeneous systems.
To address this gap, we propose a hybrid cryptosystem that combines the strengths of AES-256 and ECC. The inherently sequential ECC workflow runs on the CPU, whereas AES-256 is parallelized on the GPU using CUDA and performs the bulk data encryption and decryption. This workload partitioning strategy is designed to exploit the architectural strengths of both processing units while minimizing performance bottlenecks.
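The partitioning above depends on the bulk cipher being data-parallel. The following is a minimal Python sketch, not the CUDA implementation, of why a counter-style mode (such as the CTR mode used for the reported results) splits cleanly across workers: each keystream block depends only on the key, the nonce, and the block index. It assumes a stand-in keystream function built from HMAC-SHA256 so that it runs with the standard library alone. It is not AES, and the names `keystream_block`, `ctr_xor`, and `ctr_xor_parallel` are hypothetical.

```python
import hashlib
import hmac
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # AES block size in bytes; the stand-in output is truncated to this


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # ASSUMPTION: stand-in for AES-256 applied to (nonce || counter).
    # HMAC-SHA256 is used only so the sketch runs without third-party libraries.
    msg = nonce + counter.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:BLOCK]


def ctr_xor(key: bytes, nonce: bytes, data: bytes, first_block: int = 0) -> bytes:
    # XOR data with keystream blocks first_block, first_block + 1, ...
    # Applying the function twice with the same arguments restores the input.
    out = bytearray(len(data))
    for i in range(0, len(data), BLOCK):
        ks = keystream_block(key, nonce, first_block + i // BLOCK)
        chunk = data[i:i + BLOCK]
        out[i:i + len(chunk)] = bytes(a ^ b for a, b in zip(chunk, ks))
    return bytes(out)


def ctr_xor_parallel(key: bytes, nonce: bytes, data: bytes,
                     workers: int = 4, blocks_per_chunk: int = 64) -> bytes:
    # Chunks are aligned to block boundaries, so each worker needs only
    # its starting block index; no state is shared between chunks.
    size = blocks_per_chunk * BLOCK
    offsets = range(0, len(data), size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda off: ctr_xor(key, nonce, data[off:off + size], off // BLOCK),
            offsets,
        )
        return b"".join(parts)
```

The chunked version produces the same bytes as the sequential one regardless of the order in which chunks complete. This independence is the property a GPU implementation can exploit by assigning blocks to threads. The thread pool here only illustrates it and is not expected to give a speedup in Python.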
The proposed system was implemented, tested, and validated on an Intel Core i7-13620H processor with an NVIDIA RTX 4050 GPU, using inputs ranging from 10 MiB to 1 GiB. For a fair performance comparison, we also implemented AES on the CPU as a baseline. On this commodity heterogeneous CPU-GPU platform, the system achieved a throughput of 8.01 GiB/s for a 1 GiB input in Counter (CTR) mode, a 106× speedup over the CPU-only baseline. We further analyze the scalability regimes and show that the main performance constraint is data transfer overhead over PCIe, which accounts for 68.65% of the total GPU execution time. These results demonstrate the practicality and scalability of the proposed approach for large-scale data encryption in real-world computing environments.
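As a rough reading of the reported figures, the short Python calculation below derives the quantities they imply. It is only a sketch under two assumptions that the text does not state explicitly: that the 106× speedup is a ratio of throughputs on the same 1 GiB input, and that the 68.65% PCIe share applies to the same run as the 8.01 GiB/s figure. All variable names are illustrative.

```python
# Figures reported in the text (1 GiB input, CTR mode).
gpu_throughput_gib_s = 8.01   # hybrid CPU-GPU throughput
speedup = 106.0               # relative to the CPU-only AES baseline
pcie_share = 0.6865           # fraction of total GPU execution time spent on PCIe transfers

# ASSUMPTION: speedup = GPU throughput / CPU throughput on the same input.
cpu_throughput_gib_s = gpu_throughput_gib_s / speedup
cpu_throughput_mib_s = cpu_throughput_gib_s * 1024

# Implied wall time to process 1 GiB at each rate.
gpu_seconds = 1.0 / gpu_throughput_gib_s
cpu_seconds = 1.0 / cpu_throughput_gib_s

# Amdahl-style bounds on further gains, ASSUMING the PCIe share is fixed:
# - if transfers cost nothing, only the remaining 31.35% of the time is left;
# - if everything except transfers cost nothing, the 68.65% remains.
max_gain_without_transfers = 1.0 / (1.0 - pcie_share)
max_gain_without_compute = 1.0 / pcie_share

print(f"implied CPU baseline: {cpu_throughput_mib_s:.1f} MiB/s")
print(f"1 GiB: {gpu_seconds:.3f} s (hybrid) vs {cpu_seconds:.2f} s (CPU-only)")
print(f"upper bound with free transfers: {max_gain_without_transfers:.2f}x")
print(f"upper bound with free compute:   {max_gain_without_compute:.2f}x")
```

Under these assumptions the implied CPU baseline is roughly 77 MiB/s. Removing the transfer cost entirely would bound further improvement at about 3.2×, whereas optimizing everything except the transfers could yield at most about 1.46×, which is consistent with identifying PCIe transfer as the dominant constraint.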




