A Machine Learning Approach to Prevention of Data Over-Collection in Smart Devices

Abimbola Mujidat Oketayo

doi:10.5281/zenodo.15051151

Research Paper | Open Access
Volume 02 | Issue 03 | Article Id IJSRT/250302069

Research Fellow, Computer Science Department, National Mathematical Centre, Abuja, Nigeria

Abstract

The growing number of smart devices has led to excessive data collection, compromising user privacy and device efficiency. To mitigate this issue, we present a machine learning model that identifies and prevents unnecessary data collection in real time. Our approach analyzes device usage patterns, data flow, and system metrics to predict and detect over-collection. Experimental evaluations demonstrate that our model reduces data collection by 30% on average, without impacting device functionality. This study contributes to the development of privacy-preserving smart devices, promoting data protection and resource optimization in the IoT ecosystem. Our solution offers a scalable and adaptive framework for mitigating data over-collection, enhancing user trust, and ensuring responsible device management.

Keywords

Machine Learning, Smart device, Data over-collection, Privacy

Introduction

The issue of data over-collection in smart devices persists. Despite various attempts to address it, data over-collection remains a significant threat to users' sensitive information on smart devices, often occurring without explicit consent (Yibin et al., 2016). According to Yibin et al. (2016), data over-collection refers to the unauthorized collection and access of user data by smartphone applications beyond their intended purposes, even within permitted permissions. The widespread use of smart devices like smartphones for storing, processing, and saving important information has increased the risk of sensitive data exposure, including personal and financial information, passwords, and contacts (Kang et al., 2013). This highlights the need for effective measures to protect users' sensitive data on smart devices.

Data over-collection without the user's knowledge has become an alarming challenge in cybersecurity and data protection (Yibin et al., 2016). Smartphone users now face situations in which installed applications collect and access their data while staying within the scope of their permissions, but for purposes the users did not intend (Yibin et al., 2016). Since the advent of smart devices, users' sensitive data has faced substantial security and privacy threats. Combating data over-collection requires securing the privacy and protection of users' sensitive data, which motivates the design of a safe and secure machine learning model that prevents apps from over-collecting data.

RELATED WORKS

Alexander et al. (2020) developed a model to protect user privacy and data security in smart cities. Through an in-depth analysis of prominent smart city applications, they identified potential data security risks. Their study emphasized the importance of prioritizing user data security and privacy in the emergence of future smart cities. The researchers concluded that robust measures are necessary to safeguard user data in these urban environments (Alexander et al., 2020). However, their study did not employ a specific approach to achieve this goal.

Egele et al. (2011) used PiOS, a static analysis tool, to trace the flow of sensitive data in iOS apps and find privacy leaks. PiOS performs three steps on an iOS application. First, it reconstructs the program's control flow graph so that code paths connecting sensitive sources to sinks can be found. Second, it performs a traditional reachability analysis on the control flow graph to identify paths that link nodes accessing sensitive data to nodes interacting with the network. Finally, it analyzes data flow along those paths to verify that sensitive data actually moves from the source to the sink.

Noerrrezam et al. (2014) designed a more secure model that relied on automated validation to prevent mobile applications from being harmful. Users are expected to delete, or disable the permissions of, programs that gather an excessive amount of data. However, this technique uses more power when operating, which is a particular drawback for smartphones with constrained resources.

Xiao et al. (2012) created a user-aware privacy control mechanism to expose how programs handle sensitive data. However, as technology develops, software developers will have more opportunities and methods for gathering user data.

Anand and Janani (2017) developed a strategy to address data over-collection in smart devices by combining a mobile cloud framework with a key-policy attribute-based encryption (KP-ABE) model. In their model, information is kept in the cloud, and the user must confirm any data before it can be collected, so the user is aware of the information that will be gathered.

Andrews (2019) developed a methodology to handle the issue of data privacy by generating questionnaires, disseminating them, and simulating the results, and concluded by examining the user experience. Unfortunately, manual delivery of the surveys is a significant disadvantage, and the results obtained were ineffective.

Jena et al. (2021) used homomorphic encryption to study how to increase consumers' trust in privacy-preserving machine learning, which uses cryptography to conduct ML training and testing on encrypted data. They concentrated on ensuring data privacy using machine learning for responsible data science. However, the machine learning technology was used only to protect data, rather than to prevent apps from obtaining unwanted access to users' data. The issue of trust also arises, since data passes through multiple phases of analysis by different parties.

Chew et al. (2017) reviewed privacy protection in machine learning, presenting the state of the art for a private decision tree. Using the decision tree method, they employed a variety of machine learning privacy protection techniques, including randomization and secure two-party computation. They concluded that integrating their findings with currently used tactics would significantly increase data privacy guarantees. A limitation of this approach is that machine learning aims to build a model that accurately represents the patterns in the training data and performs well when applied to fresh data.

MATERIAL AND METHOD

The System Model

The proposed model consists of two primary components: mobile cloud and smartphones, which interact through an access control submodule. This submodule contains the Data Security Level Determination (DSD) module, which assesses the security level at which an app can access user data. Based on this assessment, the data status determines the data class that the app can access. An app can only access data classes with the same status as its authorized security level, preventing unauthorized access to other data. Once the app's data access security level is verified, the cloud header request is linked to the app's security level, ensuring secure data access. The access control module transmits data to the cloud, which is then processed by the app request router module. This module interprets the data, forwards it to the data-level security module, and updates the process status list. The data security and status list modules collaborate to decode the application's access level and authorized data class. The cryptography module then receives this information, facilitating the encryption and decryption of data on the cloud. Data is encrypted before transmission to the storage service and decrypted upon receipt. The storage service manages data storage, access, and distribution on the cloud. The data router is responsible for routing user

Figure 1: The Service Recommendation Model
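
The access rule described in the system model (an app may only reach data classes whose status matches its authorized security level) can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the names `DATA_CLASS_LEVEL`, `AppRequest`, `is_access_allowed`, and `build_cloud_header`, as well as the level assigned to each data class, are assumptions introduced here.

```python
from dataclasses import dataclass

# Hypothetical mapping from data class to its security status level.
# The concrete levels are illustrative, not taken from the paper.
DATA_CLASS_LEVEL = {
    "media": 1,
    "picture": 2,
    "phone_number": 3,
}


@dataclass(frozen=True)
class AppRequest:
    app_id: str
    authorized_level: int  # level assumed to be assigned by the DSD module
    data_class: str        # data class the app is asking for


def is_access_allowed(request: AppRequest) -> bool:
    """Allow access only when the data class status equals the app's level."""
    level = DATA_CLASS_LEVEL.get(request.data_class)
    # Unknown data classes are denied by default.
    return level is not None and level == request.authorized_level


def build_cloud_header(request: AppRequest) -> dict:
    """Link the verified security level to the cloud request header."""
    if not is_access_allowed(request):
        raise PermissionError(
            f"{request.app_id} may not access {request.data_class}"
        )
    return {
        "app_id": request.app_id,
        "security_level": request.authorized_level,
        "data_class": request.data_class,
    }


allowed = AppRequest("gallery_app", authorized_level=2, data_class="picture")
denied = AppRequest("gallery_app", authorized_level=2, data_class="phone_number")
print(is_access_allowed(allowed))  # True
print(is_access_allowed(denied))   # False
```

Denying unknown data classes by default keeps the check conservative: an app gains access only when a data class is explicitly mapped to its own level.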

Cloud component

User data is stored in the cloud through a cloud service. The cloud module comprises five (5) main parts: Apps Request Router, Level of Data Security and Status List, Cryptography, Storage Service, and Data Router.

(i) Request router / Data router

This module handles requests from the user's apps and routes data from the storage service to the requesting apps.

(ii) Security level and status list

This module determines the security risk level of users' data and the access status of each data class at that level, and stores the result.

(iii) Encryption and decryption

Encryption and decryption of the user's data take place here; data is encrypted before it is stored in the storage service.

(iv) Storage service

This module keeps the data that apps are to access and collect.

Figure 2: Flowchart for apps requesting access verification

Dataset Collection

A dataset generated by a mobile app simulator to simulate usage data was used. The dataset consists of data accessed by different apps and served as input to the machine learning algorithm; it is structured data labelled with security risk levels.

Data Preprocessing

The mobile app simulator produced a set of structured data with labels indicating different levels of security risk, which served as input for the machine learning algorithm. For every app scenario and every server interaction, the dataset included the security class, security level, probability, permission, and security risk.

Data Splitting

For the neural network, the dataset generated by the mobile app simulator was divided into two parts: a training set and a testing set. 80% of the collected data was used for training, and 20% was used for testing.
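
An 80/20 split of this kind can be sketched with the Python standard library. The record fields mirror those listed under Data Preprocessing, but the record values, the function name `split_dataset`, and the fixed random seed are assumptions made for illustration only.

```python
import random


def split_dataset(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of the records and split it into train and test parts."""
    shuffled = list(records)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical simulator output: one dictionary per app/server interaction.
records = [
    {
        "security_class": i % 3 + 1,
        "security_level": i % 3 + 1,
        "probability": (i % 10) / 10,
        "permission": i % 2,
        "security_risk": i % 3 + 1,
    }
    for i in range(100)
]

train_set, test_set = split_dataset(records)
print(len(train_set), len(test_set))  # 80 20
```

Shuffling before the cut avoids a split that simply follows the order in which the simulator emitted the records.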

Classification of Data

The dataset was classified using a neural network algorithm, since neural networks create a wide variety of classification models by sampling various portions of the original data set and then combining the results. To improve accuracy, labelled data was used in a supervised training phase.
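
The source does not state the network architecture or the exact input encoding, so the following is only a sketch of supervised training in its simplest form: a single sigmoid neuron trained by batch gradient descent. The two binary features, the toy labels, and the names `train_neuron` and `predict` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, labels, lr=0.5, epochs=2000):
    """Train one sigmoid neuron with batch gradient descent on log loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the pre-activation
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def predict(weights, bias, x) -> int:
    """Return 1 (over-collection) when the neuron output is at least 0.5."""
    return int(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5)


# Hypothetical toy data: [data_is_sensitive, outside_app_purpose].
# A request is labelled as over-collection only when both flags are set.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 0, 1]

weights, bias = train_neuron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)
```

A real model would use more features (for example the security class, level, probability, and permission fields of the dataset) and be evaluated on the held-out test portion rather than on its training samples.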

IMPLEMENTATION AND RESULT

The smartphone components and simulator were implemented using the Microsoft C# and Python programming languages in the Microsoft Visual Studio integrated development environment.

The Experiment

Four (4) real smartphones and one (1) smartphone acting as a simulated cloud were used to simulate the behaviour of applications that over-collect data, creating a basic mobile-cloud environment. The model's performance and feasibility were then evaluated in order to assess our strategy. Two settings were used: the control environment and the mobile-cloud framework environment. For each device in the experiment, a few applications were chosen in each environment to determine the Security Bridge level in the two situations. The Mobile-Cloud prototype framework was then simulated, as indicated by Table 1. Nodes in the control environment have unrestricted access to data stored on the server; in other words, all security risk parameters are removed from the server, making data easily accessible. We then evaluated the security risk associated with this behaviour. For the next batch of experiments, the security parameters were loaded onto the server and the limits they impose were enforced. This technique is quite simple, and it takes into consideration the experiment's connection and access limitations. It keeps the experiment simple and lets the researcher concentrate on the primary goal of the study, which is data security.

Table 1: The formal usages

DEVICE	PICTURE	AUDIO	MEDIA
DEVICE A	2.0MB	99MB	0MB
DEVICE B	1.7MB	2MB	32MB
DEVICE C	1.6MB	5.9MB	27MB
DEVICE D	0.9MB	3.1MB	30MB

Result from the Model

Using the backup function of each smartphone in the experiment, our model transmitted private data to the designated cloud, including NINs, phone numbers, pictures, emails, media, audio, and other files. Because there is currently no universally accepted benchmark for apps that gather excessive amounts of data on smart devices, a scoring system based on the defined model was developed to evaluate the security risk of applications. To demonstrate their diverse data over-collection behaviours and establish different security levels and class statuses for each, ID, phone number, photo, media, and password are entered into our evaluation mechanism, as shown in Tables 2 and 3.

Table 2: Result of Mobile apps evaluated based on Security Risk

SECURITY LEVEL	DESTINATION	PICTURE	PHONE NUMBER	MEDIA
CLASS LEVEL & STATUS	(3,(1,2,3))	(2,(1,2,3))	(1,(1,2,3))	(2,(1,2,3))

Table 3: Result of Data Over-Collection Behaviours

SECURITY LEVEL	DESTINATION	PICTURE	PHONE NUMBER	U & P
CLASS LEVEL & STATUS	(3,(1,2,3))	(2,(1,2,3))	(1,(1,2,3))	(2,(1,2,3))

Performance Evaluation

Table 4: Average result of the model

SMART DEVICE	WITHOUT CONTROL MODEL	WITH CONTROL MODEL
SMART A	41.50	< 20.35
SMART B	38.05	< 18.23
SMART C	38.91	< 24.87
SMART D	41.51	< 19.56

From Table 4, the average result with the control model across the four smart devices is calculated as:

$$\text{Average Result} = \frac{20.35 + 18.23 + 24.87 + 19.56}{4} = 20.75$$
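
The average over Table 4 can be checked with a few lines of Python. The values are copied from the table; treating the "<" upper bounds in the "with control model" column as plain numbers is an assumption made only for this check, and the variable names are illustrative.

```python
from statistics import mean

# Values from Table 4. The "<" bounds in the "with control model" column
# are treated as plain numbers here (an assumption for illustration).
without_control = {
    "SMART A": 41.50,
    "SMART B": 38.05,
    "SMART C": 38.91,
    "SMART D": 41.51,
}
with_control = {
    "SMART A": 20.35,
    "SMART B": 18.23,
    "SMART C": 24.87,
    "SMART D": 19.56,
}

avg_without = mean(without_control.values())
avg_with = mean(with_control.values())

print(f"Average without control model: {avg_without:.2f}")  # 39.99
print(f"Average with control model: {avg_with:.2f}")        # 20.75
```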

CONCLUSION

The recommendation model designed for privacy protection offers considerable benefits and utility in preventing smartphone apps from accessing and gathering sensitive data. Implementing machine learning algorithms to perform the necessary tasks is essential for effective privacy protection. Prevention alone is insufficient; further study should integrate models with machine learning algorithms and cloud computing to detect and eliminate potential bottlenecks that could restrict the smartphone's capacity to determine what kinds of data an app can gather. The combined approach would be able to prevent apps on smart devices from collecting too much data.

REFERENCES

Alexander A, Varfolomeev I, Liwa H and Zahraa C. (2020). "Overview of Five Techniques Used for Security and Privacy Insurance in Smart Cities". International Journal of Physics: Conference Series, November 2020. DOI: 10.1088/1742-6596/1897/1/012028.
Anand V.R.S and Janani E.S.V. (2017). Prevention of Data Over-Collection in Smart Devices. International Journal of Scientific & Engineering Research, 8(5).
Andrews, V. (2019). Analyzing Awareness on Data Privacy. ACM SE ’19: Proceedings of the ACM Southeast Conference, 198-201.
Chen, X., et al. (2017). "Smartphone-based data collection: A survey." IEEE Transactions on Mobile Computing, 16(10), 2822-283
Chew, Y.J. & Wong, Kok-Seng & Ooi, Shih Yin. (2017). Privacy protection in machine learning: The state-of-the-art for a private decision tree.
Egele M, Kruegel C, Kirda E, and Vigna G. (2011). "PiOS: Detecting privacy leaks in iOS applications," in Proceeding 18th Annual Network Distribution System Security Symposium, 1–15.
Enck W, P. Gilbert, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth (2010). "TaintDroid: An information-flow tracking system for real-time privacy monitoring on smartphones," in Proc. USENIX 9th Conference Operating System Design Implementation, 1–6.
Jena, M.D., Sunil, S.S., Bhabendu, K.M. & Somula, R. (2021). "Ensuring Data Privacy Using Machine Learning for Responsible Data Science". DOI: 10.1007/978-981-15-5679-1_49
Kang, J., et al. (2013). "Privacy concerns and behaviors of smartphone users." Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 637-640.
Noerrrezam Y., Sharifah S., Massila K., and Safiah S. (2014). "Validation of Security Requirement for Mobile Application: A Study". https://www.researchgate.net/publication/268434989.
Xiao X, N. Tillmann, M. Fahndrich, J. De Halleux, and M. Moskal (2012). "User-aware Privacy control via extended static-information-flow analysis," in Proceeding IEEE ACM 27th International Conference Automated Software Engineering, 80–89.
Yibin L, Wenyun D, Zhong M, Qiu M. (2016). "Privacy Protection for Preventing Data Over-Collection in Smart City". IEEE Transactions on Computers, 65(5):1339-150, doi:10.1109/TC.2015.2470247


Abimbola Mujidat Oketayo

Corresponding author

Research Fellow, Computer Science Department, National Mathematical Centre, Abuja, Nigeria

Abimbola Mujidat Oketayo, A Machine Learning Approach to Prevention of Data Over-Collection in Smart Devices, Int. J. Sci. R. Tech., 2025, 2 (3), 262-267. https://doi.org/10.5281/zenodo.15051151
