“Between the dawn of civilization and 2003, we only created five exabytes (1 exabyte=1,000,000,000 gigabytes) of data; now we’re creating that amount every two days. By 2020, that figure is predicted to sit at 53,000 exabytes” [13]. This statement by Eric Schmidt, then Chief Executive Officer of Google, illustrates how ubiquitous large amounts of data have become in today’s technology-driven world. With the generation of such massive quantities of data, corporations have developed new ways to buy, sell, and use it to their advantage. However, along with these developments come public concerns about potential violations of consumer privacy. Although it is natural for users to be concerned about their privacy online, big data mining has provided many more benefits than detriments. In general, consumers can greatly benefit from the effects of big data mining, whether it is being used to provide a seamless experience on their phones or safety in the event of an attack. Ultimately, the public should focus less on the privacy issues of data mining.
The term “data mining” first emerged in the early 1990s in the database community, when retail and financial companies started using the technique to improve their businesses. It can be defined as “[a]n information extraction activity whose goal is to discover hidden facts contained in databases” [1]. Originally, corporations used data mining to predict stock prices and customer demand. Although it is still used for those purposes today, data mining now has far more applications across a variety of industries. As society becomes more reliant on smartphone applications, corporations can easily collect data about what a consumer does online. This has led to a major increase in the amount of data available to use and analyze [3]. This data collection may seem like a threat, but it has given private corporations and the government the ability to gain valuable insights that benefit people’s daily lives in ways that they might not even realize.
In countless cases, big data mining has provided consumers with valuable experiences. The data collected, and subsequently analyzed, allows companies to gain a more detailed insight into each specific consumer and their interests. This not only allows businesses to provide a more tailored experience for the user, but also gives them the ability to communicate with the customer more easily. A common scenario where companies use data mining to improve the customer experience is in home media entertainment. Sites like Netflix use big data mining to provide suggestions based on content a specific user has watched in the past. With this technique, content is selected for each individual. As Netflix constantly adds and removes content, users have an increasingly difficult time navigating it. Julie Evers, Director of Communications at Netflix, says, “There are over 33 million versions of Netflix,” all tailored specifically to different types of users [14]. This is where data mining can help relieve decision anxiety and provide user-specific suggestions, improving user experience whether users realize it or not.
This application of data mining works not only in entertainment, but also in career-focused sites like LinkedIn. At LinkedIn, analysts use data mining techniques to analyze user profiles and match them to jobs that they would be a good fit for. As a student who has been searching for job and internship opportunities since my sophomore year, this feature of LinkedIn has been a resource that I have constantly relied on throughout my college career.
Not only does data mining enhance customer experience, it also helps companies market and sell their products better. Through personalized media, data mining allows consumers to connect with new and exciting products that they may not have discovered otherwise. In a process called customer segmentation, industries use customer data to split their consumer base into several different segments [12]. This allows better communication between the company and customer because the company can target specific services or products to the relevant subsections of its consumer base. The most common example of market segmentation in action is in the way corporations sell grooming and personal hygiene products. When you walk into any drugstore, there are clear sections for women’s products and men’s products. Companies use differences in packaging to signify what is meant for women and what is meant for men. For example, women’s toiletries often have feminine, flowery scents and girly, pink packaging while men’s products are grey and masculine. This tradition is based on research suggesting that certain scents and packaging appeal more to one group than to the other. These differences may seem surface-level, but they prove to be effective, and display the simplest form of market segmentation. In a subconscious way, consumers feel like they are being understood when corporations take the effort to market their products in a way that appeals to them [2, 4]. This same concept can be applied to more specific segments of consumers and allows companies to market products even more precisely. Individualized marketing not only provides up-sell and cross-sell opportunities to companies, but also helps customers through a custom, guided buying process.
In addition to the benefits data mining provides customers in terms of marketing and experience, big data can keep users safe. Although data mining is not typically known for this purpose, it can be used to identify suspicious individuals and groups and protect computer systems through malicious code detection. In malicious code detection, researchers mine data within large pieces of code to see if there is anything to suggest that there have been security intrusions. Similarly, anomaly detection is used to detect unusual patterns and behaviors within large datasets. This helps to establish a more secure network by identifying users with potentially harmful tendencies [11]. Perhaps the most beneficial application of data mining to public safety is the use of predictive analytics, which has a wide variety of potential applications. For instance, public safety organizations can use it to mine data from a large number of citizens and, based on a number of different factors, attempt to predict future violent crimes [8]. A predictive analysis approach can also be applied to natural disasters. In California, data analysts have been mining historical data to find indicators of when an earthquake might hit. This allows citizens to prepare for a disaster rather than simply deal with the ramifications once it has already occurred [8].
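To make the idea of anomaly detection concrete, the following is a minimal sketch in Python. It is not the method used by any organization cited here. It assumes one simple technique, the z-score, which flags values that sit unusually far from the average of a dataset. The function name `find_anomalies`, the threshold, and the login counts are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold.

    The z-score measures how many standard deviations a value lies
    from the mean of the whole dataset.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical daily login counts for one account (invented data).
logins_per_day = [4, 5, 3, 6, 4, 5, 4, 97, 5, 4]

print(find_anomalies(logins_per_day))  # prints [(7, 97)]
```

In this invented example, only the day with 97 logins is flagged. Real systems would likely combine many signals and more robust statistics, since a single extreme value also inflates the standard deviation it is measured against.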
Another industry where the benefits of data mining may not be as obvious is healthcare, where professionals have been using data to streamline some processes and make suggestions based on patient data. As expected, there is a large amount of patient data available, as most people have been a patient at least once in their lifetime. Not only does analyzing this data help healthcare companies, it also helps the patient. Some benefits of using data mining in the context of healthcare include helping physicians identify more effective and personalized treatments, and helping patients find more affordable healthcare services [9]. For example, one healthcare provider used data mining to help identify characteristics that would put an individual at risk for diabetes. To do so, the researchers analyzed a database consisting of seven variables of particular interest, including body mass index and exercise frequency. By classifying these people into different groups and conducting the relevant analysis, the researchers were able to effectively identify which individuals were more likely to develop diabetes. Doctors could then recommend patient-specific healthcare paths that could potentially prevent the disease [10].
Although there are flaws in the current data mining process in terms of consumer privacy, the industry is constantly making strides to improve. In fact, many corporations are establishing communication channels to inform customers about their data mining practices and get their consent. For example, the startup Digi.me provides a method for users to locally host their data and share it with companies on their own terms. This service provides a best-of-both-worlds approach where corporations can receive the benefits of the data, and the user does not have to worry about its misuse [6]. Given the polarization that results from the debate over the ethics of data mining, many companies already understand that having a transparent attitude about consumer data is the best way to earn customer trust and retain their business. Nans Sivaram, a client partner at IT consultancy and outsourcer Infosys, states that “consumers are willing to part with their personal information, provided there’s a good reason to.” As long as companies remain open and honest about the ways they are using consumer data and consumers can see the value in their personalized services, data mining will benefit both parties [5]. Similar to Digi.me, the government has developed its own methods to allow users to take control of their own data. The “Blue Button” initiative set in place by the Obama administration allows users to securely access their healthcare information in order to better their health care and finances. It also allows people to download and maintain their health records. There are many similar services, for example for energy information, federal loan information, and IRS tax services. In addition to these services, the government is developing legislation that protects consumers’ First Amendment rights in the face of data mining. This way, consumers can reap the benefits of the practice without worry [7], and an agreement can be reached from which users, corporations, and the government can all benefit.
Recent breaches of online data privacy have sparked panic in consumers that their data could be at risk of falling into the wrong hands. As a result, many companies are making moves to de-identify PII (personally identifiable information), separating it from simple PI (personal information). This means encoding personal data so that even when information leaks, it remains anonymous. This is just one of the many methods in development to retain customer privacy [11]. With the understanding that many corporations and the government are pushing to protect user privacy, customers can reliably use personalized services without worrying about privacy. Additionally, users can take preventative steps to make sure that they are practicing proper information security protocols. Data breaches might be inevitable, but how much a data breach affects us is within our own hands. In the recent Equifax data breach, hackers were able to get hold of the financial data of roughly 143 million people. In this situation, although the company should have had more stringent security, the public also holds responsibility for the consequences. Those who took immediate action to freeze their credit and protect their identities ultimately ended up facing little to no consequence [15].
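The idea of encoding personal data so that a leaked record stays anonymous can be illustrated with a short Python sketch. This is only one possible approach, assumed here for illustration: keyed hashing (HMAC with SHA-256), a form of pseudonymization. It is not a description of what any company mentioned in this article does. The function name `pseudonymize`, the field names, and the key are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(record, pii_fields, secret_key):
    """Return a copy of the record with PII fields replaced by keyed hashes.

    Non-PII fields are left unchanged so the data stays useful for analysis.
    """
    safe = dict(record)  # copy, so the original record is not modified
    for field in pii_fields:
        if field in safe:
            digest = hmac.new(
                secret_key,
                str(safe[field]).encode("utf-8"),
                hashlib.sha256,
            )
            safe[field] = digest.hexdigest()
    return safe


# Hypothetical customer record and key (invented data).
record = {"name": "Jane Doe", "email": "jane@example.com", "purchase": "headphones"}
key = b"hypothetical-secret-key"

safe_record = pseudonymize(record, ["name", "email"], key)
print(safe_record["purchase"])  # prints headphones
```

Because the same input and key always yield the same hash, an analyst can still tell that two purchases belong to the same customer without seeing who that customer is. The protection depends on the key staying secret, so this sketch should be read as a starting point rather than a complete privacy solution.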
The public should let go of the minutiae of privacy concerns and accept that data mining is extremely beneficial. Most importantly, data mining techniques can keep us safe and healthy with applications in the security and healthcare industries. As our world becomes increasingly connected via technology, it is important for consumers to know how corporations use their data and not to remain ignorant or fearful of the concept. If citizens remain informed about recent developments in technology, they will be able to let go of their fears of the unknown and reap the benefits of data mining. Similarly, if they are uncomfortable with particular information being shared, they can take matters into their own hands with applications like Digi.me. Data mining may not be perfect, and neither are the corporations that partake in it, but as engineers we must do whatever is in our power to help consumers understand the process, knowing that it provides them with numerous benefits.
By Chandini Ramesh, Viterbi School of Engineering, University of Southern California
Works Cited
[1] M. Berry and G. Linoff, Data mining techniques. Hoboken, N.J.: Wiley, 2011.
[2] “Using big data to improve customer experience and business performance – Nokia Bell Labs Journals & Magazine”, Ieeexplore.ieee.org, 2017. [Online]. Available: http://ieeexplore.ieee.org/document/6770344/.
[3] R. Li, R. Li, D. Tan and S. Joshi, “History of data mining | Hacker Bits”, Hackerbits.com, 2017. [Online]. Available: https://hackerbits.com/data/history-of-data-mining/.
[4] “Using Data Mining to Improve the Customer Experience in Your Call Center” 2016. [Online]. Available: http://blog.playvox.com/using-data-mining-to-improve-the-customer-experience-in-your-call-center.
[5] D. Kadlec, “Data Mining Helps Individuals Make Smarter Money Decisions”, TIME.com, 2017. [Online]. Available: http://business.time.com/2012/03/06/privacy-heres-how-data-mining-might-actually-help-consumers/.
[6] “This Data Mining Startup Gives Consumers the Tools to Own their Digital Footprints”, Forbes.com, 2017. [Online]. Available: https://www.forbes.com/sites/julianmitchell/2017/01/25/this-data-mining-startup-gives-consumers-the-tools-to-own-their-digital-footprint/#6f1220f18db1.
[7] “Giving Americans Easier Access to Their Own Data”, whitehouse.gov, 2017. [Online]. Available: https://obamawhitehouse.archives.gov/blog/2014/11/12/giving-americans-easier-access-their-own-data.
[8] B. Thuraisingham. Data Mining for Counter-Terrorism. 2017. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.9521&rep=rep1&type=pdf.
[9] H. Koh and G. Tan. “Data Mining Applications in Healthcare” 2017. [Online]. Available: https://pdfs.semanticscholar.org/433a/57b382c528c78395e317d9fee008fb8ed9de.pdf.
[10] “How Data Mining Is Helping Healthcare – Data Mining, Analytics and Predictive Modeling: Training & Consulting”, Data Mining, Analytics and Predictive Modeling: Training & Consulting, 2017. [Online]. Available: https://the-modeling-agency.com/how-data-mining-is-helping-healthcare/.
[11] S. Insights, “How Big Data can change the world of Public Safety”, Sas.com, 2017. [Online]. Available: https://www.sas.com/en_ca/insights/articles/big-data/local/how-big-data-changes-public-safety.html.
[12] A. Machauer and S. Morgner, “Segmentation of bank customers by expected benefits and attitudes”, International Journal of Bank Marketing, Vol. 19, Issue 1, pp. 6-18, https://doi.org/10.1108/02652320110366472.
[13] B. Carlson, “Quote of the Day: Google CEO Compares Data Across Millennia”, The Atlantic, 2017. [Online]. Available: https://www.theatlantic.com/technology/archive/2010/07/quote-of-the-day-google-ceo-compares-data-across-millennia/344989/.
[14] “How Netflix Uses Analytics To Select Movies, Create Content, & Make Multimillion Dollar Decisions”, 2017. [Online]. Available: https://blog.kissmetrics.com/how-netflix-uses-analytics/. [Accessed: 10-Oct-2017].
[15] L. Cole, “The Equifax breach may have exposed 143 million people’s Social Security numbers — but here’s why you shouldn’t freak out”, Business Insider, 2017. [Online]. Available: http://www.businessinsider.com/equifax-hack-dont-freak-out-2017-9.