The issue of data over-collection in smart devices persists. Despite various attempts to address it, data over-collection remains a significant threat to users' sensitive information on smart devices, often occurring without explicit consent (Yibin et al., 2016). According to Yibin et al. (2016), data over-collection refers to the collection of and access to user data by smartphone applications beyond their intended purposes, even within permitted permissions. The widespread use of smart devices such as smartphones for storing and processing important information has increased the risk of sensitive data exposure, including personal and financial information, passwords, and contacts (Kang et al., 2013). This highlights the need for effective measures to protect users' sensitive data on smart devices.
Data over-collection without the user's knowledge has become an alarming challenge in cyber security and data protection (Yibin et al., 2016). Smartphone users now experience situations in which applications installed on the device collect and access user data while staying within the scope of their granted permissions, yet not for the users' intended purposes (Yibin et al., 2016). Since the advent of smart devices, there have been serious threats to the privacy of users' sensitive data. Combating data over-collection requires securing the privacy and protection of users' sensitive data; this motivates the design of a safe and secure machine learning model to prevent over-collection of data by apps.
RELATED WORKS
Alexander et al. (2020) developed a model to protect user privacy and data security in smart cities. Through an in-depth analysis of prominent smart city applications, they identified potential data security risks. Their study emphasized the importance of prioritizing user data security and privacy in the emergence of future smart cities. The researchers concluded that robust measures are necessary to safeguard user data in these urban environments (Alexander et al., 2020). However, their study did not employ a specific approach to achieve this goal.
Egele et al. (2011) used PiOS, a static analysis tool, to trace the flow of sensitive data in iOS apps and detect privacy leaks. PiOS runs three analyses on an application. First, it reconstructs the program's control flow graph to find code paths that connect sensitive sources to sinks. Second, it performs a traditional reachability analysis on the control flow graph to identify paths linking nodes that access sensitive data to nodes that interact with the network. Finally, it examines the data flow along those paths to verify that sensitive data actually moves from the source to the sink.
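The reachability step above can be sketched as a graph search. The following is a minimal illustration, not PiOS itself: the call-graph edges and the function names (`getContacts`, `sendRequest`, etc.) are invented for the example, and a real analysis would first reconstruct the graph from compiled code.

```python
from collections import deque

# Hypothetical call-graph edges: node -> callees (names are illustrative)
edges = {
    "getContacts": ["formatData"],   # source: reads sensitive data
    "formatData":  ["sendRequest"],
    "sendRequest": [],               # sink: network I/O
    "showUI":      [],               # benign code with no path to a sink
}
sources = {"getContacts"}
sinks = {"sendRequest"}

def reaches_sink(graph, sources, sinks):
    """BFS reachability: does any path lead from a source to a sink?"""
    queue, seen = deque(sources), set(sources)
    while queue:
        node = queue.popleft()
        if node in sinks:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(reaches_sink(edges, sources, sinks))  # True: a potential leak path exists
```

A positive result here only flags a candidate path; PiOS's third phase (data-flow analysis) is what confirms that sensitive data actually traverses it.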
Noerrrezam et al. (2014) developed a more secure model that relied on automated validation, designed to prevent mobile applications from being harmful. Users are expected to delete, or disable the permissions of, programs that gather an excessive amount of data. However, this technique consumes additional power during operation, a drawback that is particularly significant for smartphones with constrained resources.
Xiao et al. (2012) created a user-aware privacy control mechanism to expose how programs handle sensitive data. However, as technology develops, software developers will have ever more opportunities and methods for gathering user data.

Anand and Janani (2017) developed a strategy to address the issue of data over-collection in smart devices by combining a mobile cloud framework with a key-policy attribute-based encryption (KP-ABE) model. In their model, information is kept in the cloud, and any data must be confirmed by the user before it can be collected, so the user is aware of exactly what information will be gathered.

Andrew (2019) developed a methodology to handle the issue of data privacy by generating questionnaires, distributing them, and simulating the results, concluding with an examination of the user experience. Unfortunately, manual delivery of the surveys is a significant disadvantage, and the results obtained were ineffective.

Jena et al. (2021) used homomorphic encryption to determine how to increase consumers' trust in privacy-preserving machine learning, which uses cryptography to conduct ML training and testing on encrypted data. They concentrated on ensuring data privacy using machine learning for responsible data science. However, the machine learning technology is used only to protect data, rather than to prevent apps from obtaining unwanted access to users' data, and issues of trust arise because the data passes through multiple phases for analysis by different parties.
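The homomorphic property that makes computation on encrypted data possible can be shown with a toy example. The sketch below uses textbook RSA, which is multiplicatively homomorphic, with deliberately tiny parameters; it is an illustration of the property only, not the scheme Jena et al. used (practical privacy-preserving ML relies on schemes such as Paillier or CKKS with secure parameters).

```python
# Toy textbook-RSA demo of the multiplicative homomorphic property.
# Tiny, insecure parameters chosen for readability.
p, q = 61, 53
n = p * q                 # modulus
phi = (p - 1) * (q - 1)
e = 17                    # public exponent
d = pow(e, -1, phi)       # private exponent (modular inverse, Python 3.8+)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

m1, m2 = 7, 11
c1, c2 = enc(m1), enc(m2)
# Multiplying ciphertexts corresponds to multiplying plaintexts,
# so a third party can compute on data it cannot read:
assert dec((c1 * c2) % n) == (m1 * m2) % n  # 77
```

This is the core idea behind training and testing ML models on encrypted data: the computation happens over ciphertexts, and only the key holder can decrypt the result.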
In their state-of-the-art work on private decision trees, Chew et al. (2017) developed a privacy-preserving model using privacy-preserving machine learning. Building on the decision tree method, they employed a variety of machine learning privacy-protection techniques, including randomization and secure two-party computation. They concluded that integrating their new approach with currently used tactics would significantly increase data privacy guarantees. The disadvantage of this approach is that it works against machine learning's aim of building a model that accurately represents the patterns in the training data and performs well on fresh data: the privacy mechanisms can degrade that accuracy.
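Of the techniques mentioned, randomization is the simplest to illustrate. The sketch below shows randomized response, a classic randomization mechanism (not necessarily the exact variant Chew et al. used): each individual's true bit is reported honestly only with some probability, giving plausible deniability, while the aggregate rate can still be recovered by inverting the noise.

```python
import random

def randomized_response(true_value: bool, p_truth: float = 0.75) -> bool:
    """Report the true bit with probability p_truth; otherwise flip a fair coin.
    No single report reveals the individual's true value with certainty."""
    if random.random() < p_truth:
        return true_value
    return random.random() < 0.5

def estimate_true_rate(reports, p_truth: float = 0.75) -> float:
    """Invert the randomization: E[report] = p_truth * rate + (1 - p_truth) * 0.5."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth
```

The trade-off the paragraph describes is visible here: the injected noise protects individuals but adds estimation error, which is exactly what can reduce a model's accuracy on training and fresh data.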
MATERIAL AND METHOD
The System Model
The proposed model consists of two primary components, the mobile cloud and smartphones, which interact through an access control submodule. This submodule contains the Data Security Level Determination (DSD) module, which assesses the security level at which an app may access user data. Based on this assessment, an app can access only data classes whose status matches its authorized security level, preventing unauthorized access to other data. Once the app's data-access security level is verified, the cloud header request is linked to the app's security level, ensuring secure data access.

The access control module transmits data to the cloud, where it is processed by the app request router module. This module interprets the data, forwards it to the data-level security module, and updates the process status list. The data security and status list modules collaborate to decode the application's access level and authorized data class. The cryptography module then receives this information and handles the encryption and decryption of data on the cloud: data is encrypted before transmission to the storage service and decrypted upon receipt. The storage service manages data storage, access, and distribution on the cloud. The data router is responsible for routing user
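The core access-control rule above can be expressed compactly. The following is a minimal sketch of the idea only, assuming invented names: the data classes, security levels, and the `verified` permission flag are illustrative stand-ins, and the real DSD module's level-determination logic is not specified in this section.

```python
# Sketch of the DSD access-control rule: an app may read only data classes
# whose security level matches its authorized level. All names are assumptions.
DATA_CLASSES = {
    "contacts":  "high",
    "location":  "high",
    "wallpaper": "low",
}

def determine_security_level(app_permissions: set) -> str:
    """Toy stand-in for the DSD module: grant 'high' only to verified apps."""
    return "high" if "verified" in app_permissions else "low"

def can_access(app_permissions: set, data_class: str) -> bool:
    """An app accesses a data class only if the levels match exactly."""
    level = determine_security_level(app_permissions)
    return DATA_CLASSES.get(data_class) == level

assert can_access({"verified"}, "contacts")
assert not can_access(set(), "contacts")   # over-collection blocked
assert can_access(set(), "wallpaper")
```

In the full model this check sits in front of the cloud pipeline: only requests that pass it are tagged with the app's security level and forwarded to the app request router.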
Abimbola Mujidat Oketayo*
10.5281/zenodo.15051151