
  • A Machine Learning Approach to Prevention of Data Over-Collection in Smart Devices

  • Research Fellow, Computer Science Department, National Mathematical Centre, Abuja, Nigeria

Abstract

The growing number of smart devices has led to excessive data collection, compromising user privacy and device efficiency. To mitigate this issue, we present a machine learning model that identifies and prevents unnecessary data collection in real time. Our approach analyzes device usage patterns, data flow, and system metrics to predict and detect over-collection. Experimental evaluations demonstrate that our model reduces data collection by 30% on average, without impacting device functionality. This study contributes to the development of privacy-preserving smart devices, promoting data protection and resource optimization in the IoT ecosystem. Our solution offers a scalable and adaptive framework for mitigating data over-collection, enhancing user trust, and ensuring responsible device management.

Keywords

Machine Learning, Smart device, Data over-collection, Privacy

Introduction

The issue of data over-collection in smart devices persists. Despite various attempts to address it, data over-collection remains a significant threat to users' sensitive information on smart devices, often occurring without explicit consent (Yibin et al., 2016). According to Yibin et al. (2016), data over-collection refers to the unauthorized collection and access of user data by smartphone applications beyond their intended purposes, even within permitted permissions. The widespread use of smart devices like smartphones for storing, processing, and saving important information has increased the risk of sensitive data exposure, including personal and financial information, passwords, and contacts (Kang et al., 2013). This highlights the need for effective measures to protect users' sensitive data on smart devices.

Data over-collection without the user's knowledge has become an alarming challenge in cybersecurity and data protection (Yibin et al., 2016). Smartphone users now face situations in which installed applications collect and access their data while staying within the scope of their permissions, but not for the purposes the users intended (Yibin et al., 2016). Since the invention of smart devices, the privacy of users' sensitive data has faced serious security threats. Combating data over-collection requires ensuring the privacy and protection of users' sensitive data, which motivates the design of a safe and secure machine learning model that prevents apps from over-collecting data.

RELATED WORKS

Alexander et al. (2020) developed a model to protect user privacy and data security in smart cities. Through an in-depth analysis of prominent smart city applications, they identified potential data security risks. Their study emphasized the importance of prioritizing user data security and privacy in the emergence of future smart cities. The researchers concluded that robust measures are necessary to safeguard user data in these urban environments (Alexander et al., 2020). However, their study did not employ a specific approach to achieve this goal.

Egele et al. (2011) used PiOS, a static analysis tool, to trace the flow of sensitive data in iOS apps and find privacy leaks. PiOS performs three steps on an iOS application. First, it reconstructs the control flow graph of the program, from which the code paths connecting sensitive sources to sinks can be found. Second, it performs a traditional reachability analysis on the control flow graph to identify the paths that link nodes accessing sensitive data to nodes interacting with the network. Finally, it examines data flow along those paths to verify that sensitive data actually moves from the source to the sink.
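The reachability step described above can be illustrated with a breadth-first search over a graph. This is a minimal sketch and not the PiOS implementation: the graph encoding (a dictionary of successor lists) and the node names are hypothetical.

```python
from collections import deque


def reachable_sinks(cfg, sources, sinks):
    """Return the sink nodes reachable from any source node.

    cfg maps a node name to the list of its successor nodes
    (a hypothetical encoding of a control flow graph).
    """
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for successor in cfg.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen & set(sinks)


if __name__ == "__main__":
    # Hypothetical graph: reading contacts eventually leads to a network call.
    example_cfg = {
        "read_contacts": ["format_payload"],
        "format_payload": ["send_http"],
        "draw_ui": ["refresh"],
    }
    print(reachable_sinks(example_cfg, ["read_contacts"], ["send_http"]))
```

A non-empty result only says that a path exists. As the text notes, a separate data-flow check is still needed to confirm that sensitive data actually travels along it.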

Noerrrezam et al. (2014) designed a more secure model that relies on automated validation to prevent mobile applications from being harmful. Users are expected to delete programs that gather an excessive amount of data, or to disable their permissions. However, this technique consumes more power when operating, which is a particular drawback for smartphones with constrained resources.

Xiao et al. (2012) created a user-aware privacy control mechanism to expose how programs handle sensitive data. However, as technology develops, software developers will have more opportunities and methods for gathering user data.

Anand and Janani (2017) developed a strategy to address data over-collection in smart devices by combining a mobile cloud framework with a key-policy attribute-based encryption (KP-ABE) model. In their model, information is kept in the cloud, and the user must confirm any data before it can be collected, so the user is aware of the information that will be gathered.

Andrews (2019) developed a methodology to handle the issue of data privacy by generating questionnaires, disseminating them, and simulating the results, and concluded by examining the user experience. Unfortunately, manual delivery of the surveys is a significant disadvantage, and the results obtained were ineffective.

Jena et al. (2021) used homomorphic encryption to investigate how to increase consumers' trust in privacy-preserving machine learning, which uses cryptography to conduct ML training and testing on encrypted data. They concentrated on ensuring data privacy using machine learning for responsible data science. However, the machine learning technology was used only to protect data, not to prevent apps from obtaining unwanted access to users' data. The issue of trust also arises, since data passes through multiple phases of analysis by different parties.

Chew et al. (2017), in their study of the state of the art for a private decision tree, examined privacy protection in machine learning. Using the decision tree method, they employed a variety of machine learning privacy protection techniques, including randomization and secure two-party computation. They concluded that integrating their findings with currently used techniques would significantly increase data privacy guarantees. A disadvantage of this approach is that it competes with the goal of machine learning, which is to build a model that accurately represents the patterns in the training data and performs well when applied to fresh data.

MATERIAL AND METHOD

The System Model

The proposed model consists of two primary components, the mobile cloud and the smartphones, which interact through an access control submodule. This submodule contains the Data Security Level Determination (DSD) module, which assesses the security level at which an app can access user data. Based on this assessment, the data status determines the data class that the app can access. An app can only access data classes with the same status as its authorized security level, which prevents unauthorized access to other data. Once the app's data access security level is verified, the cloud header request is linked to the app's security level, ensuring secure data access. The access control module transmits data to the cloud, where it is processed by the app request router module. This module interprets the data, forwards it to the data-level security module, and updates the process status list. The data security and status list modules collaborate to decode the application's access level and authorized data class. The cryptography module then receives this information and handles the encryption and decryption of data on the cloud. Data is encrypted before transmission to the storage service and decrypted upon receipt. The storage service manages data storage, access, and distribution on the cloud. The data router is responsible for routing user data from the storage service to the requesting apps.

Figure 1: The Service Recommendation Model
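The access rule in the system model, where an app may only reach data classes whose status matches its authorized security level, can be sketched as a simple filter. This is an illustrative sketch under stated assumptions and not the DSD module itself: the `DataItem` type, the integer level scale, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    name: str
    status: int  # hypothetical integer status/class assigned to the data


def accessible_items(app_security_level, items):
    """Keep only the items whose status equals the app's authorized level."""
    return [item for item in items if item.status == app_security_level]


def handle_request(app_security_level, requested_name, items):
    """Grant a request only if the named item lies in the app's permitted class."""
    allowed = {item.name for item in accessible_items(app_security_level, items)}
    return requested_name in allowed


if __name__ == "__main__":
    # Hypothetical store; the levels are made up for illustration.
    store = [DataItem("picture", 2), DataItem("phone_number", 1), DataItem("id", 3)]
    print(handle_request(2, "picture", store))  # status matches the app's level
    print(handle_request(2, "id", store))       # status differs, so access is denied
```

The check uses strict equality because the text states that an app can only access data classes with the same status as its level. A design that also grants access to lower classes would replace `==` with `<=`.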

Cloud component

User data is stored in the cloud through the cloud service. The cloud module comprises five (5) main parts: the Apps Request Router, the Level of Data Security and Status List, the Cryptography part, the Service Storage, and the Data Router.

(i) Request router / Data router

This module handles requests from the user's apps and the routing of data from the storage service to the requesting apps.

(ii) Security level and status list

This module determines the security risk level of users' data and the access status for each data class, and stores the result.

(iii) Encryption and decryption

User data is encrypted here before it is stored in the storage service and decrypted when it is retrieved.

(iv) Storage service

Data that is to be accessed and collected by an app is kept in this module.
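The interplay between the cryptography part and the storage service (encrypt before storing, decrypt on retrieval) can be sketched as a storage wrapper with injected cipher functions. This is a minimal sketch, not the system's implementation: the class and method names are hypothetical, and the XOR transform is a toy placeholder that is NOT encryption. It stands in for whatever real cipher a deployment would use.

```python
from typing import Callable, Dict

Cipher = Callable[[bytes], bytes]


class StorageService:
    """Holds only transformed bytes; encrypts on write and decrypts on read."""

    def __init__(self, encrypt: Cipher, decrypt: Cipher) -> None:
        self._encrypt = encrypt
        self._decrypt = decrypt
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, plaintext: bytes) -> None:
        self._blobs[key] = self._encrypt(plaintext)

    def get(self, key: str) -> bytes:
        return self._decrypt(self._blobs[key])

    def raw(self, key: str) -> bytes:
        """Return the stored bytes as-is, to show plaintext is never kept."""
        return self._blobs[key]


def xor_placeholder(key: int) -> Cipher:
    """Toy reversible transform. NOT encryption; a stand-in for a real cipher."""
    return lambda data: bytes(b ^ key for b in data)


if __name__ == "__main__":
    service = StorageService(xor_placeholder(0x5A), xor_placeholder(0x5A))
    service.put("phone_number", b"0800-000-0000")
    print(service.raw("phone_number") != b"0800-000-0000")  # stored form differs
    print(service.get("phone_number"))                      # original bytes come back
```

Passing the cipher in as a pair of functions keeps the storage logic independent of the algorithm, so the placeholder can be swapped for a vetted cryptographic library without changing `StorageService`.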

Figure 2 shows the flowchart for apps requesting access verification.

Dataset Collection 

The dataset was generated by a mobile app simulator that simulates usage data. It consists of structured records of the data accessed by different apps, each labeled with a security risk level, and was used as input to the machine learning algorithm.

Data Preprocessing

The mobile app simulator produced structured data labeled with different levels of security risk, which served as the input to the machine learning algorithm. For every app scenario and every server interaction, the dataset records the security class, security level, probability, permission, and security risk.
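One way to turn such records into numeric inputs for a classifier is sketched below. The field names follow the text, but their types, the permission vocabulary, and the risk label set are assumptions made for illustration, since the article does not specify them.

```python
from dataclasses import dataclass

PERMISSIONS = ("contacts", "camera", "location")  # hypothetical vocabulary
RISK_LEVELS = ("low", "medium", "high")           # hypothetical label set


@dataclass
class AppRecord:
    security_class: int
    security_level: int
    probability: float
    permission: str
    security_risk: str  # the label to be predicted


def encode(record):
    """Return (feature vector, label index) for one simulator record."""
    # One-hot encode the permission so it carries no artificial ordering.
    one_hot = [1.0 if record.permission == p else 0.0 for p in PERMISSIONS]
    features = [
        float(record.security_class),
        float(record.security_level),
        record.probability,
    ] + one_hot
    label = RISK_LEVELS.index(record.security_risk)
    return features, label


if __name__ == "__main__":
    print(encode(AppRecord(2, 3, 0.4, "camera", "high")))
```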

Data Splitting

The dataset generated by the mobile app simulator was divided into two parts for the neural network: a training set and a testing set. 80% of the collected data was used for training, and 20% was used for testing.
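An 80/20 split of this kind can be sketched with the standard library alone. The function name and the fixed seed are assumptions. The article does not say whether the data was shuffled before splitting, so the shuffle here is a common default, not a description of the study.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of rows and cut it into training and testing parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, test = train_test_split(range(100))
    print(len(train), len(test))  # 80 20
```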

Classification of Data

The dataset was categorized using the Neural Network Algorithm since neural networks create a wide variety of classification models by sampling various portions of the original data set and then combining the results. To guarantee accuracy, supervised data was used in the training phase.
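For readers unfamiliar with supervised training, the sketch below fits a single sigmoid neuron, the smallest possible neural classifier, with gradient descent on labeled examples. It is not the network used in the study, whose architecture is not specified. The two features and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(features, labels, epochs=2000, learning_rate=0.5):
    """Fit one sigmoid neuron with per-sample gradient descent on log-loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            prediction = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            error = prediction - y  # gradient of log-loss w.r.t. the pre-activation
            weights = [w - learning_rate * error * v for w, v in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict(model, x):
    """Return 1 (risky) when the neuron's output is at least 0.5, else 0."""
    weights, bias = model
    return 1 if sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) >= 0.5 else 0


if __name__ == "__main__":
    # Hypothetical toy data: two normalized features per app, label 1 = risky.
    X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
    y = [0, 0, 0, 1, 1, 1]
    model = train_neuron(X, y)
    print([predict(model, x) for x in X])
```

A practical model would add hidden layers and be scored on the held-out testing part instead of on its own training examples.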

IMPLEMENTATION AND RESULT

The smartphone and simulator software was developed using the Microsoft C# and Python programming languages in the Microsoft Visual Studio integrated development environment.

The Experiment 

Four (4) real smartphones, plus one (1) smartphone acting as a simulated cloud, were used to simulate the behaviour of applications that acquire data excessively, creating a basic mobile-cloud environment. The model's performance and feasibility were then evaluated in order to assess our strategy. Two settings were used: the control environment and the mobile-cloud framework environment. For each device in the experiment, a few applications were chosen in each environment to determine the Security Bridge level in the two situations. Finally, the mobile-cloud prototype framework was simulated, as indicated by Table 1.

In the control environment, nodes have unrestricted access to data stored on the server. In other words, all security risk parameters are removed from the server, making data easily accessible, and we then evaluate the security risk associated with this behaviour. For the next batch of experiments, the security parameters are loaded onto the server and the limits they impose are enforced. This technique is simple, and it takes into account the experiment's connection and access limitations. It was chosen to simplify the experiment and keep the focus on the primary goal of the study, which is data security.

Table 1: The formal usages

| DEVICE | PICTURE | AUDIO | MEDIA |
|---|---|---|---|
| DEVICE A | 2.0 MB | 99 MB | 0 MB |
| DEVICE B | 1.7 MB | 2 MB | 32 MB |
| DEVICE C | 1.6 MB | 5.9 MB | 27 MB |
| DEVICE D | 0.9 MB | 3.1 MB | 30 MB |

Result from the Model

Our model transmitted private data, including NINs, phone numbers, pictures, emails, media, audio, and other files, into the designated cloud using the backup function of each smartphone used in the experiment. Because there is currently no universally accepted benchmark for apps that gather excessive amounts of data on smart devices, a scoring system based on a defined model was developed to evaluate the security risk of applications. To demonstrate the apps' diverse data over-collection behaviours and establish different security levels and class statuses for each, ID, phone number, photo, media, and password were entered into our evaluation mechanism, as shown in Tables 2 and 3.

Table 2: Result of Mobile apps evaluated based on Security Risk

| SECURITY LEVEL | ID | DESTINATION | PICTURE | PHONE NUMBER | MEDIA |
|---|---|---|---|---|---|
| CLASS LEVEL & STATUS | (3,(1,2,3)) | (3,(1,2,3)) | (2,(1,2,3)) | (1,(1,2,3)) | (2,(1,2,3)) |

Table 3: Result of Data Over-Collection Behaviours

| SECURITY LEVEL | ID | DESTINATION | PICTURE | PHONE NUMBER | U & P |
|---|---|---|---|---|---|
| CLASS LEVEL & STATUS | (3,(1,2,3)) | (3,(1,2,3)) | (2,(1,2,3)) | (1,(1,2,3)) | (2,(1,2,3)) |

Performance Evaluation

Table 4: Average result of the model

| SMART DEVICE | WITHOUT CONTROL MODEL | WITH CONTROL MODEL |
|---|---|---|
| SMART A | 41.50 | < 20.35 |
| SMART B | 38.05 | < 18.23 |
| SMART C | 38.91 | < 24.87 |
| SMART D | 41.51 | < 19.56 |

 

From Table 4, the average result of our model with control is calculated as:

$$\text{Average Result} = \frac{20.35 + 18.23 + 24.87 + 19.56}{4} = \frac{83.01}{4} \approx 20.75$$
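The averaging can be checked with a few lines of Python. The figures are copied from Table 4, the variable names are hypothetical, and the "<" bounds in the "with control" column are treated as plain values, which is an assumption. The same function is applied to the "without control" column for comparison.

```python
# Values copied from Table 4; the "<" bounds are treated as plain numbers (assumption).
without_control = {"SMART A": 41.50, "SMART B": 38.05, "SMART C": 38.91, "SMART D": 41.51}
with_control = {"SMART A": 20.35, "SMART B": 18.23, "SMART C": 24.87, "SMART D": 19.56}


def mean(values):
    """Arithmetic mean of a non-empty collection of numbers."""
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    print(round(mean(with_control.values()), 2))     # 20.75
    print(round(mean(without_control.values()), 2))  # 39.99
```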

CONCLUSION

It is impossible to overstate the benefits and utility of using the recommendation model designed for privacy protection to prevent smartphone apps from accessing and gathering sensitive data. The implementation of machine learning algorithms to perform the necessary tasks is essential for effective privacy protection. Prevention alone is insufficient; more study should be done to detect and eliminate any potential bottlenecks that could restrict the smartphone's capacity to determine what kinds of data an app can gather by integrating models with machine learning algorithms and cloud computing. The combined approach would be able to prevent apps on smart devices from collecting too much data.

REFERENCES

  1. Alexander A, Varfolomeev I, Liwa H and Zahraa C. (2020). “Overview of Five Techniques Used for Security and Privacy Insurance in Smart Cities.” International Journal of Physics: Conference Series, November 2020. DOI: 10.1088/1742-6596/1897/1/012028.
  2. Anand V.R.S and Janani E.S.V. (2017). “Prevention of Data Over-Collection in Smart Devices.” International Journal of Scientific & Engineering Research, 8(5).
  3. Andrews, V. (2019). “Analyzing Awareness on Data Privacy.” ACM SE ’19: Proceedings of the ACM Southeast Conference, 198-201.
  4. Chen, X., et al. (2017). “Smartphone-based data collection: A survey.” IEEE Transactions on Mobile Computing, 16(10), 2822-283
  5. Chew, Y.J., Wong, Kok-Seng and Ooi, Shih Yin. (2017). “Privacy protection in machine learning: The state-of-the-art for a private decision tree.”
  6. Egele M, Kruegel C, Kirda E, and G. Vigna. (2011). “PiOS: Detecting privacy leaks in iOS applications.” In Proceeding 18th Annual Network Distribution System Security Symposium, 1–15.
  7. Enck W, P. Gilbert, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth. (2010). “TaintDroid: An information-flow tracking system for real-time privacy monitoring on smartphones.” In Proc. USENIX 9th Conference Operating System Design Implementation, 1–6.
  8. Jena, M.D., Sunil, S.S., Bhabendu, K.M. and Somula, R. (2021). “Ensuring Data Privacy Using Machine Learning for Responsible Data Science.” DOI: 10.1007/978-981-15-5679-1_49
  9. Kang, J., et al. (2013). “Privacy concerns and behaviors of smartphone users.” Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 637-640.
  10. Noerrrezam Y., Sharifah S., Massila K., and Safiah S. (2014). “Validation of Security Requirement for Mobile Application: A Study.” https://www.researchgate.net/publication/268434989.
  11. Xiao X, N. Tillmann, M. Fahndrich, J. De Halleux, and M. Moskal. (2012). “User-aware Privacy control via extended static-information-flow analysis.” In Proceeding IEEE ACM 27th International Conference Automated Software Engineering, 80–89.
  12. Yibin L, Wenyun D, Zhong M, Qiu M. (2016). “Privacy Protection for Preventing Data Over-Collection in Smart City.” IEEE Transactions on Computers, 65(5):1339-150, doi:10.1109/TC.2015.2470247


Abimbola Mujidat Oketayo
Corresponding author

Research Fellow, Computer Science Department, National Mathematical Centre, Abuja, Nigeria

Abimbola Mujidat Oketayo, A Machine Learning Approach to Prevention of Data Over-Collection in Smart Devices, Int. J. Sci. R. Tech., 2025, 2 (3), 262-267. https://doi.org/10.5281/zenodo.15051151
