Privacy and Paternalism: The Ethics of Student Data Collection
In February 2021, MIT’s Schwarzman College of Computing launched a specially commissioned series that aims to address the opportunities and challenges of the computing age. The MIT Case Studies in Social and Ethical Responsibilities of Computing (SERC) series features peer-reviewed, freely available cases by topic-area experts that introduce readers to a range of questions about computing, data, and society. The following article is excerpted from Kathleen Creel and Tara Dixit's case study, featured in the Summer 2022 issue of SERC.
—The Editors
For many high school students in the United States, classwork and homework begin the same way: by opening a laptop and logging into a learning platform. Students spend hours of their days composing essays, practicing math problems, emailing teachers, and taking tests online. At schools with policies governing in-class phone use, students message friends on school-provided devices that come to feel like their own. But many do not realize that schools and third-party companies can collect their personal communications as data and, in some cases, as evidence.
Schools want to protect students from cyberbullying by peers and from harming themselves or others. To address these issues, administrators install monitoring software. A single word in an email, instant message, or search bar indicating bullying (“gay”) or self-harm (“suicide”) can trigger an alert to school administrators and law enforcement. In September 2021, monitoring software in a Minneapolis school district sent over 1,300 alerts to school administrators stating that students were viewing “questionable content.” Each alert extracted flagged text from students’ messages, emails, or school documents and sent it to school administrators for review. In other districts, monitoring software escalated students’ mentions of suicide to police.
A national survey of parents of school-aged children found that a majority of parents did not want their child’s educational data to be shared with law enforcement. Yet despite widespread and increasing parental concern over how and with whom student data is shared, the near-ubiquitous use of learning platforms in the classroom leaves students and parents without practical ways to opt out of engagement with companies whose data collection policies and practices put students’ privacy at risk.
Protecting students is not the only rationale for data collection. Education researchers and makers of education software also want to use the rich trove of data students generate to improve students’ education. Artificial intelligence (AI)-based educational technology aims to “meet students where they are” by using the student’s data to track progress and create personalized learning plans. Makers of educational software hope to understand how students are interacting with software in order to improve future offerings, while education researchers hope to use student data to understand learning and educational disparities.
However, data collection comes with costs. In 2016, the Electronic Frontier Foundation studied 118 education technology software services and apps commonly used by schools. Seventy-eight retained data after students had graduated; only 48 encrypted personal information about students. Some school districts require educational software providers to disclose their data collection practices to parents and allow them to opt their child out, yet only 55 percent of parents surveyed had received a disclosure and 32 percent said that they were unable to opt out. This lack of data encryption, transparency, and choice is concerning. Despite policies that aim to provide both transparency and access, most students and parents are unaware of what data is being stored and who has access to it.
Adults are routinely subjected to similar invasions. Yet the special moral status of children makes the tension between their protection and education and the protection of their privacy especially fraught. On one model of childhood privacy, the paternalism view, students are children to be protected and educated by responsible adults who make decisions on their behalf. Parents and guardians may succeed in maintaining their child’s privacy from others by using laws like the Family Educational Rights and Privacy Act (FERPA) to shield their child’s student records. However, the child has no privacy from her guardians and educators: Parents and school administrators may read her social-emotional learning diaries and text messages at any time. The child is a “moral patient”: a being deserving of protection, but lacking the agency to make their own choices without oversight. Protecting her from a chance of harm from herself or others is worthwhile, no matter how small the chance of harm, because there is no countervailing duty to protect her privacy.
On another view, the essential role of the child is not to be protected but to “become herself.” Through exploration and play, the child develops her own agency, autonomy, and ability to freely choose. To the extent that a child is aware of being monitored in her experimentation and aware that her decisions and explorations may be interpreted out of context and punished accordingly, she cannot explore freely. The child also needs privacy in order to develop herself in relation to others. Respecting the privacy of communications between children allows them to develop genuine friendships, as friendship is a relationship between two people that cannot survive constant surveillance by a third party.
Respecting digital privacy also builds relationships of trust between children and adults, as philosopher Kay Mathiesen has argued. Trusting a child gives her an “incentive to do the very thing” she is trusted to do, building her capacity to behave in a trustworthy manner. By contrast, the act of surveillance can decrease both the child’s trust in the adult and the adult’s trust in the child, even if no malfeasance is discovered. Researchers have expressed concern that digital surveillance may also decrease a student’s civic participation and their willingness to seek help with mental health issues. As such, on this view, parents and school officials are obliged to respect the privacy rights of children unless there is a significant reason to violate them; perpetual background surveillance without cause is inappropriate.
On some issues, both perspectives align. Neither students nor their parents would choose to trigger false alarms that send law enforcement knocking or knowingly allow learning platforms to resell identifiable student data to commercial vendors. But the broader dilemma remains. Knowing when to treat children “as children” and when to treat them as responsible agents is, as philosopher Tamar Schapiro has argued, the essential predicament of childhood. We cannot wait for adult data privacy to be settled before tackling it.
Privacy and Contextual Integrity
Before the use of computers in education, students communicated primarily in person, which prevented most third parties from knowing the contents of their conversations, and did their schoolwork on paper or the ephemeral chalkboard. What data they did produce was confined to paper records locked in filing cabinets. Such data was difficult to aggregate: If shared at all, it was mimeographed and mailed or read aloud over the telephone. A third party aspiring to collect data on students would be stymied by the friction of acquiring the data and the expense of storing it.
Today, students generate electronic data every time they send email or messages, take quizzes and tests on learning platforms, and record themselves for class projects. The volume, velocity, and variety of the data they generate has shifted dramatically. As cloud computing prices fall, analytics companies become incentivized to collect and store as much data as possible for their future use. Educational software and websites are no exception. Companies analyze and mine data about students’ day-to-day activities to find useful patterns for their own purposes, such as product improvement or targeted advertising.
Without additional legal protection, the trend toward increasing student data collection, storage, and reuse is likely to continue. Given the contested moral status of children, how should their privacy and persons be protected? What policies or laws should we adopt? In order to choose one policy over another, we need first a reason for the choice — a justification of why a certain norm of privacy is correct based on a broader justificatory framework. We will seek it in Helen Nissenbaum’s classic analysis of privacy as “contextual integrity.”
Contextual integrity suggests that every social context, from the realm of politics to the dentist’s office, is governed by “norms of information flow.” Each social context is governed by different expectations regarding what people in different roles should do or say. For example, it would be appropriate for a dentist to ask a patient his age but unusual for a patient to reverse the question. In addition to norms of appropriateness, each social context has norms governing the proper flow or distribution of information. Thus, privacy is defined as the “appropriate flow of information” in a context, not as secrecy or lack of information flow.
According to Nissenbaum, there are five parameters that define the privacy norms in a context: Subject, Sender, Recipient, Information Type, and Transmission Principle. Any departure from the norms typical for a context constitutes a violation of contextual integrity. However, the principle is not fully conservative: New practices can be evaluated in terms of their effects on “justice, fairness, equality, social hierarchy, democracy,” and autonomy as well as their contribution to achieving the goals relevant for the context. On these grounds, the new privacy norm may be chosen above the old.
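To make the five parameters concrete, the following minimal sketch (an illustrative model only, not drawn from Nissenbaum's work or this case study; all names and categories are invented) represents a norm and an information flow as simple records and flags any flow whose parameters depart from the norm.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InformationFlow:
    """One transmission of information, described by the five parameters."""
    subject: str                 # whom the information is about
    sender: str                  # who transmits it
    recipients: frozenset[str]   # who receives it
    info_type: str               # what kind of information it is
    transmission_principle: str  # constraint under which the information flows

@dataclass(frozen=True)
class ContextNorm:
    """The flows a given social context (e.g., friendship) treats as appropriate."""
    context: str
    allowed_recipients: frozenset[str]
    allowed_info_types: frozenset[str]
    transmission_principle: str

def violates_contextual_integrity(flow: InformationFlow, norm: ContextNorm) -> bool:
    """A flow violates contextual integrity if any parameter departs from the norm.
    (A simplification: real analysis weighs norms qualitatively, not as string matches.)"""
    return (
        not flow.recipients <= norm.allowed_recipients
        or flow.info_type not in norm.allowed_info_types
        or flow.transmission_principle != norm.transmission_principle
    )

# Illustrative example: a weekend message between friends, re-routed to school officials.
friendship_norm = ContextNorm(
    context="friendship",
    allowed_recipients=frozenset({"friend"}),
    allowed_info_types=frozenset({"personal message"}),
    transmission_principle="no disclosure to third parties",
)
flagged_message = InformationFlow(
    subject="student_1",
    sender="student_1",
    recipients=frozenset({"friend", "monitoring_vendor", "school_administrator"}),
    info_type="personal message",
    transmission_principle="flagged content forwarded for review",
)
print(violates_contextual_integrity(flagged_message, friendship_norm))  # True
```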
The contextual integrity model can be used to evaluate reliance on educational software in schools. Any learning platform can be analyzed in terms of the appropriateness of its privacy policy to the privacy norms of the classroom. Consider Gaggle’s privacy policy. Gaggle, an online platform designed for classroom use that seeks to replace communication tools such as blogging software and email clients with similar software equipped with content filters, states that “Gaggle will not distribute to third parties any staff data or student data without the consent of either a parent/guardian or a qualified educational institution except in cases of Possible Student Situations (PSS), which may be reported to law enforcement.”
Imagine that a student sent a message to another student at 8 p.m. on a Saturday and Gaggle flagged it as a potential indicator that the student is depressed. Analyzing this scenario according to the contextual integrity norm, the five parameters would be:
- Subject: Student 1 (the author of the message) and Student 1’s mental health concerns
- Sender: Student 1 (the author of the message)
- Recipient: Student 2 and Gaggle. If Gaggle alerts Student 1’s parents, school administration, or law enforcement of student activity, they also become recipients.
- Information Type: Student data (in this case, a message and its associated metadata, such as sender, recipient, timestamp, and location)
- Transmission Principle: The recipient will not share the student data with third parties without the consent of the parent/guardian or educational institution, except in cases of Possible Student Situations (PSS).
The desire to protect students and intervene to help them when they struggle with depression or anxiety is laudable and may initially appear to justify Gaggle’s new norms for classroom communication. However, the context of childhood friendship before the introduction of digital messaging was one in which it was possible for a student to discuss their feelings of sadness with another student on a weekend, outside of the classroom, without being overheard and without school intervention. Whether in person or on the telephone, the context of the interaction between Sender and Recipient was one of friendship mediated by the transmission principle of the telephone, which permitted information flow without disclosure to a third party. Pre-Gaggle digital messaging likewise presumed a disclosure-free channel for communication between friends.
The introduction of Gaggle thus meaningfully alters both the transmission principle and the set of recipients, thereby violating the preexisting privacy norms. In the contextual integrity framework, in order to argue that the new privacy norm is beneficial, proponents could appeal to its positive contribution to “justice, fairness, equality, social hierarchy, democracy,” or autonomy. If the new privacy norms do not contribute to these goods, they must contribute instead to the goals of the context. In order to evaluate whether the privacy norms reshaped by Gaggle are beneficial, we must determine what goals, and whose goals, should be considered relevant to the context.
Student Privacy Laws
The foregoing analysis assumes that the relevant context of communications between children who are friends is that of friendship: namely, that the norms of privacy that apply to adult friendships and adult message communications should also apply to childhood friendships and childhood message communications. However, if the first, guardian-centered viewpoint presented above is correct, it may be that the relevant context of analysis is primarily that the sender and recipient are both children, not that they are friends. Legally, that is the case: In the United States, a child has no right to privacy from their parents. Parents may monitor the online communications of children as they please.
Privacy from school officials or other adults is dependent on school policy and on the wishes of the parents or guardian until the child reaches the age of 18. There are three primary federal laws in the United States that aim to protect student privacy and security: the Family Educational Rights and Privacy Act (FERPA), the Children’s Online Privacy Protection Act (COPPA), and the Protection of Pupil Rights Amendment (PPRA). Although each attempted to meet the information security needs of its day, the three collectively fall short in protecting student privacy from contemporary data collection.
FERPA provides the strongest federal privacy protection for student data, but has not been updated in the past decade. FERPA gives parents three rights: the right to access their child’s education records, the right to a hearing in which they may challenge inaccuracies in those records, and the right to prevent personally identifiable information (PII) from their child’s record from being disclosed to third parties without their written consent.
FERPA typically allows school officials to share student data only if both direct identifiers, such as student names, and indirect identifiers, such as a student’s date or place of birth, are removed. However, school officials may access PII data provided they have a “legitimate educational interest” in doing so. This privilege is extended at the school’s discretion to educational technology providers who take over roles previously performed by school officials, are under “direct control” of the district, and agree to be bound by the same provisions against reuse and reselling as a school official would be. Since most educational technology providers fall under these provisions, they are permitted to collect and store personally identifiable information about students. Other laws similarly allow third-party software providers to collect data without requiring transparency as to the uses of that data.
The patchwork of federal student privacy laws, supplemented by state legislation such as California’s Student Online Personal Information Protection Act (SOPIPA), has changed little in the last decade. But as the social practices to which the laws are applied change — as student data becomes more voluminous, its transmission becomes easier, and educational software providers take on activities once performed by school officials — the informational norms that privacy laws sought to protect are violated. The contextual integrity framework helps us understand why this is. Even if the prevailing social context (a high school), the subject of the data (students), the sender (school officials), many of the recipients (school officials, school districts), and the laws governing transmission remain the same, the addition of recipients such as the providers of educational software and the changes in the principle of transmission (paper to email, or email to centralized learning platform) generate a violation of contextual integrity and therefore of privacy.
In order to illustrate how a change in educational technology can violate contextual integrity without violating FERPA, consider the case of InBloom. InBloom was an ambitious nonprofit initiative launched in 2011 to improve the U.S. education system by creating a centralized, standardized, and open source student data-sharing platform for learning materials. Educators and software professionals from several states started building the platform. Although the platform complied with FERPA, it meaningfully changed the transmission principle under which student data was transmitted. Before, student data had been stored only locally, at the school level, and in the fragmented databases of educational technology providers. Now it would be pooled at the state and national levels, granting both school officials and their authorized educational software providers access to a much larger and more integrated database of student data, including personally identifiable information and school records. This is a violation of the contextual integrity of student data, and the backlash InBloom faced from parents, activists, and local school officials was on the grounds of the changes it would prompt in the transmission and storage of data. The InBloom incident highlights the need for updated student privacy legislation, and perhaps for legislation that incorporates principles of contextual integrity. While InBloom shut down in 2014, many of the parent and activist criticisms of its data pooling would apply equally to the for-profit educational technology companies that continue to collect and store data in its absence.
Privacy and Justice
Another factor relevant for the evaluation of contextual integrity is the social identities of the actors involved and how they interact with the roles they inhabit. Student privacy concerns can be compounded when combined with existing biases and discrimination, including those based on class, sexuality, race, and documentation status. According to a study by the Center for Democracy & Technology (CDT), low-income students are disproportionately subjected to digital surveillance. This is because many schools distribute laptops and other devices to low-income students, a side effect of which is increased school access to students’ online activity. In some cases, school officials can view the applications a student opens and their browsing history in real time. Privacy concerns have been exacerbated with virtual learning, as schools expanded laptop distribution significantly during the COVID-19 pandemic.
The privacy of LGBTQ students is also threatened by school surveillance software. In a Minneapolis school district, there were several incident reports of Gaggle flagging LGBTQ-specific words like “gay” and “lesbian” because they may signify bullying, which led to at least one LGBTQ student being unintentionally outed to their parents. Fear of outing may cause LGBTQ students to curtail their online discussions, which could be especially impactful since queer youth often seek meaningful connection to other queer youth online in order to understand their identities.
Learning platforms may exacerbate existing systemic racism and bias in school-based disciplinary measures, as Black and Hispanic students have been suspended, expelled, or arrested at higher rates than White students for similar offenses. Existing teacher biases, often associated with school-based disciplinary actions, may be embedded into AI-based education software, resulting in adverse impacts on marginalized students. Researchers at the University of Michigan studied the sociotechnical consequences of using ClassDojo, a data-driven behavior management software for K-8 students. ClassDojo allows teachers to add or subtract “Dojo points” from students for behaviors such as “helping others” or “being disrespectful.” The researchers found that use of ClassDojo had the potential to reinforce teacher biases, as when teacher stereotypes about which students were likely to be more “disrespectful” or “disruptive” were seemingly substantiated by ClassDojo behavior records gathered by the teachers themselves, and also found that ClassDojo had adverse psychological effects on students.
Violations of student privacy also disproportionately impact undocumented students. FERPA prohibits school officials from sharing student information, including immigration status, directly with government agencies such as Immigration and Customs Enforcement (ICE). However, ICE can access this data in other ways. The “third-party doctrine” holds that if individuals “voluntarily give their information to third parties” such as banks, phone companies, or software vendors, they no longer retain a “reasonable expectation of privacy.” When the makers of educational websites or purveyors of student surveys sell student data to brokers, they make it possible for ICE to buy data about undocumented students. This can have effects on the enrollment of undocumented students. As students realize that their documentation status is no longer private, they may withdraw from school out of concern that their family members may be targeted for deportation.
Addressing Privacy in Context
In addition to the planned and routine violations of contextual integrity that may occur when educational software providers resell supposedly anonymized data or school officials aggregate student data across contexts, there are the accidents. Large databases, or “data lakes” as they are evocatively called, are prone to rupture and spill. The U.S. Government Accountability Office analyzed 99 data breaches across 287 school districts, from July 2016 to May 2020. According to their report, thousands of students had their academic records and PII compromised. Bigger and more affluent school districts that used more technology and collected more student data were impacted most. The report states that compromised PII like social security numbers, names, addresses, and birth dates can be sold on the black market, causing financial harm to students who have otherwise clean credit histories. Compromised records containing special educational status or the medical records of students with disabilities who are on an Individualized Education Program (IEP) can lead to social and emotional harm.
The government’s awareness of the frequency of data leaks and their negative consequences for student privacy, financial well-being, and medical confidentiality establishes a context for legislative solutions. Many data leaks flow from third-party educational software providers who are not following information security best practices. For example, nearly 820,000 students’ personal data from the New York City public school system was compromised in early 2022. Before the leak, the school district was unaware that the educational software provider responsible had failed to take basic measures such as encrypting all student data.
In addition to encryption, other best practices include anonymizing data when possible and, for stronger guarantees, using techniques like differential privacy. Although anonymizing data by removing direct identifiers of students provides some measure of privacy, Latanya Sweeney’s work has shown that anonymized records can often be re-identified by linking them to other data sources, an approach known as a linkage attack. Most Americans can be identified by their “anonymized” Facebook accounts or their health care information.
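As an illustration of how a linkage attack works, the following sketch (with entirely fabricated data and column names) joins a “de-identified” student table to an identified public list on shared quasi-identifiers, re-attaching names to sensitive records.

```python
import pandas as pd

# Hypothetical, made-up data: a "de-identified" export of student records
# (direct identifiers such as names removed, quasi-identifiers retained).
deidentified = pd.DataFrame({
    "zip_code":   ["20151", "20151", "22030"],
    "birth_date": ["2006-05-14", "2007-01-02", "2006-05-14"],
    "sex":        ["F", "M", "F"],
    "iep_status": [True, False, False],   # sensitive attribute
})

# A second, identified data source (e.g., a published roster or directory).
public_list = pd.DataFrame({
    "name":       ["A. Student", "B. Student", "C. Student"],
    "zip_code":   ["20151", "20151", "22030"],
    "birth_date": ["2006-05-14", "2007-01-02", "2006-05-14"],
    "sex":        ["F", "M", "F"],
})

# The linkage attack: join on quasi-identifiers to re-attach names
# to supposedly anonymous records.
relinked = deidentified.merge(public_list, on=["zip_code", "birth_date", "sex"])
print(relinked[["name", "iep_status"]])
```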
Differential privacy protects against linkage attacks and further ensures privacy by introducing statistical “noise” into sensitive data, slightly modifying select data fields so that individuals are not identifiable. Differentially private individual data may be inaccurate; however, the aggregate results remain fairly accurate. Even if the data set is accessed, individuals’ privacy is less likely to be compromised.
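A minimal sketch of one common differential privacy mechanism, the Laplace mechanism applied to a counting query, is shown below; the data, epsilon value, and function names are illustrative, and real deployments require careful privacy budgeting across many queries.

```python
import numpy as np

def dp_count(values: list[bool], epsilon: float) -> float:
    """Differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one student
    changes the count by at most 1), so noise drawn from Laplace(1/epsilon)
    yields epsilon-differential privacy for this single query.
    """
    true_count = sum(values)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical example: how many students in a cohort flagged a survey item.
flags = [True] * 230 + [False] * 770          # 1,000 students, 230 true
print(round(dp_count(flags, epsilon=0.5)))    # close to 230, but not exact
# With only a handful of students, the same noise would swamp the signal,
# which is why the technique works best on large data sets.
```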
Differential privacy is used by researchers to secure their data, within companies that hold PII, and by the U.S. Census Bureau. The statistical algorithms that implement differential privacy are most effective on large data sets, since the added noise erodes the utility of small ones. Thus, school systems with large and sensitive data sets may increase privacy with this technology. Even so, differential privacy has not been widely adopted within educational technology. In addition to the complexity of implementing differential privacy compared to data anonymization, many educational technology systems need the ability to identify specific students and track their progress. Implementing differential privacy internally, as opposed to when releasing data to researchers, could impede these pedagogical functions.
Data stored long enough is likely to leak. Introducing requirements that stored student data be encrypted and anonymized would protect data subjects from reidentification when it does. But not all student data can be anonymized and still remain useful. For data that cannot, one proposed solution is the establishment of “information fiduciaries.” A fiduciary is a representative who is required to make decisions in the best interests of their clients. Stipulating that schools and the educational software providers with whom they contract must act as fiduciaries when they handle student data would confer additional legal obligations to act in the best interests of the students and their privacy.
The Value of Data Collection
Requiring that an information fiduciary act in the best interests of the students, however, returns us to our original questions: what are the best interests of students, how should their interests be weighted, and who should have the authority to decide? The paternalism perspective advocated for protecting students, even at the expense of their privacy, while they develop the capacity to make autonomous decisions. The second perspective suggested that prioritizing students’ privacy is in their best interests so that they may develop autonomy and maintain trusting relationships with the adults in their lives. Many adult caretakers take up a perspective between these two, changing their comparative valuation of children’s protection and autonomy as children age. However, people acting from either perspective would take themselves to be acting in the best interest of the child. Establishing an information fiduciary does not solve the substantive moral question of which of the child’s interests the fiduciary should prioritize.
One factor difficult to weigh is the educational value of big data collection for K-12 students. Since K-12 students are evaluated so frequently and thoroughly, completing daily or weekly homework assignments and in-class assessments, the sophisticated analytics used in the university setting are typically not needed to determine when they are struggling academically or what skills they need to focus on. Some studies show benefit from the use of educational data to personalize learning, although comprehensive reviews show that the hype surrounding personalized education often far outstrips its current benefits. The rhetoric of personalized learning was that “adaptive tutors” created with the help of big data analytics “would be more efficient at teaching students mastery of key concepts and that this efficiency would enable teachers to carry out rich project-based instruction,” writes educational researcher Justin Reich in his book “Failure to Disrupt: Why Technology Alone Can’t Transform Education.” Instead, the primary use of adaptive online learning, he explains, is in providing students with ancillary practice in math and early reading. The primary value of educational data collection has been for researchers: both academic researchers interested in education and educational disparities and educational technology researchers interested in improving the products they sell.
Conclusion
Educational software will continue to be adopted by school systems for its perceived value in increasing access to high-quality and personalized education. As a result, student privacy issues will escalate. Current federal privacy laws such as FERPA require updating in order to meet these challenges, as they do not hold school districts or educational software providers to the government’s own standards of student privacy and information security. Yet ensuring that school officials and educational software providers respect the contextual integrity of information transmission for student data, or adopt policies that represent a student-centric rather than guardian-centric perspective on children’s rights and privacy, might require more than a simple update.
Kathleen Creel is an assistant professor of philosophy and computer science at Northeastern University. Tara Dixit is a Chantilly High School senior. A full version of this article, as well as a bibliography, can be accessed here.