Unit 7: Ethics, Privacy and Consent
The application of learning analytics has huge potential to benefit both learners and institutions. Applied responsibly, targeted interventions based on observed behaviours or on predictive analytics, for example, can provide relevant and personalised prompts and improve learner outcomes. Core to the collection, analysis and use of data associated with learning analytics is a range of institutional moral and ethical duties, including learner privacy and consent issues.
In this unit, we explore some of the relevant issues, highlighting potential areas for focus and directing you toward relevant documents within your own institutions and national contexts.
After you have worked through this unit, you should be able to:
- explain the broad range of ethical issues associated with learning analytics
- critically assess the terminologies used, types of data collected, how that data might be used and who the data could be shared with, as indicated in the terms and conditions of different institutional websites
- examine legislation and/or policy frameworks to define and identify a range of elements such as “legitimate use,” “personal data,” “sensitive data,” “data sharing,” etc.
- explore the construct of the data privacy calculus — trust, control and benefits — to understand your own data-sharing practices
- draft a statement on learner consent for your local institutional or course context
Ethical implications around the collection, analysis, and uses of learner data should take account of the potentially conflicting interests of a range of stakeholders, such as learners and their institutions. Views on the benefits, risks and potential for harm from the collection, analysis and use of learner data will depend on the interests and perceptions of the particular stakeholder. In this section, we briefly introduce a number of practical considerations. First, have a look at the material in the box, which we will revisit later in the unit.
Look at the following statements and then answer the question that follows:
i. Consent to use learner data must be obtained at the point of registration.
ii. Once an institution can predict a likely learner outcome, it must act upon that insight.
iii. Learning analytics can only be used by trained analysts or statisticians.
iv. Once a learner registers, all of their data (e.g., provided at registration or later click data) becomes the property of the institution.
Which of these do you think is true?
- All of them
- (i) and (iv)
- (i), (ii) and (iv)
- Only (i)
- Only (iv)
- None of them
There are many ways to consider and categorise ethical issues. The following headings explore some of the issues you may wish to consider. (There are others!)
What data are collected, for what purpose, and who has access to it?
This concerns categories of data and intended use. That is, data should only be collected and analysed if they are relevant and required for the proposed learning analytics activity. Speculative data collection (harvested in case it should become of interest in the future) should be avoided. Legitimate use or purpose is typically defined as being specified, explicit and sanctioned. Such uses or purposes should ideally be set out in advance in such a way that all stakeholders understand them. Data collected for legitimate purposes should only be used for those purposes and not be further processed for other purposes; in an educational context, admissible processing would include what is needed for archiving or for research. In the absence of a clear policy, it is relatively easy for an institution to overlook the need to restrict data collection, analysis and use to a contained set of purposes. One example might be a higher education institution indicating (formally or informally) that learner (demographic and behavioural) data are collected and analysed specifically for learning design purposes, but then going on to share subsets of that data with third-party marketing companies. Most commonly though, there is the issue of finding useful things in a data set that push the use into an area not previously covered or agreed.
Some national and federal/state legislation prohibits or constrains the collection of personal or sensitive data, and you should ensure that you are aware of what data types fall into this category in your own context. For example, in Canada, “sensitive data” are not specifically defined under national data protection legislation or provincial data protection statutes, although data can be classified as sensitive, depending on the context. In the European Union, however, sensitive data are defined as: data consisting of racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data, data concerning health or data concerning a person’s sex life or sexual orientation.
Reflective action: become aware of relevant legislation
Depending on where your institution is based (and whether it has multiple international sites or third-party suppliers that transfer learner data between them), research the relevant national and federal/state legislation to identify sensitive and/or personal data categories that may not be collected and/or analysed. Assess whether there is any definition of legitimate use formally set out, as well as limits on how data may be shared between the institution and others. In addition, research whether there are any institutional policies in place that limit the collection and sharing of any data categories. Make a note of these.
Similarly, you should consider who has access to data and why they would need it. You might consider that “raw” data — e.g., taken directly from a learner’s record or from their online log files — should be restricted to those with a need to know. For example, support or administrative staff may need to understand a learner’s personal situation in order to provide relevant personalised advice and support. It is less likely that a course designer would need similar access to that raw data, although they might want to see processed, aggregated data for a learner cohort or class to better understand how to redesign an assessment strategy.
Reflective action: who and why
If you’ve enrolled in this course, there’s a good probability that you like data — understanding the patterns you might find, gaining insight from analyses, etc. However, although it is tempting to collect data and then see what you can extract from it, you should aim to reverse that thinking. Consider what you (or your institution) are seeking to achieve. As you did in Unit 3, record the purpose(s) of learning analytics in your context.
Note down all of the stakeholder groups that might have some involvement in learning analytics. Now think about the data categories available to you (excluding those considered off limits, as discussed above), and note down which categories of data are relevant and likely available to each stakeholder group. For example, you will almost certainly consider instructors or teachers to be a relevant group; what data do they have access to? What data might be considered out of bounds (irrelevant or sensitive)? We will revisit this activity in somewhat more detail in the next unit.
Transparency and consent
Transparency focuses on the extent to which an institution is open and clear about the purposes of learning analytics. This should include making those purposes accessible; there is little point in having a policy (see Unit 9) if learners and other stakeholders are unaware of it. Although setting out the purposes of learning analytics may feel obvious, it is fair to say that decisions about its use are often made as much for the benefit of the institution (maximising throughput) as for the individual learner (providing the “best” outcome, whether that relates to improving course scores or allowing free study choices).
Transparency also includes making clear what data are collected (and what are not), and any assumptions made about those data (where they may be incomplete or acting as a proxy for another measure, for instance). Raising awareness may itself introduce challenges, since many people are unaware of the extent to which data are now routinely collected. Of course, it is not always possible to be completely transparent — models built around regression approaches, for example, can be difficult to understand and interrogate. It is not always clear why one learner is identified as potentially more vulnerable than another. Nor is it always in the best interests of learners to communicate a predicted poor outcome (unless this is appropriately balanced with the provision of additional support or alternative study paths).
Transparency is necessary for meaningful consent. Often, consent to collect learner data for learning analytics (as opposed to collection for normal administrative uses, say) is sought at the point of registration, if at all. At this point, many learners will be largely unaware of learning analytics and how it may be used to provide ongoing insight. Consent at this point is certainly most convenient for the institution, but arguably less meaningful for the learner (unless there is an opportunity to later withdraw it). Various frameworks and policies propose different approaches here, often influenced by existing legislation in the national context. One broad approach might be to differentiate between initial broad consent for the collection of data and specific consent when data are used to intervene in the choices learners have, when adapting their learning experience, or when access to resources is suggested. Although there are practical difficulties in doing so, an expectation that users should consent to uses of personal data unknown at the point of registration is perhaps an unreasonable and unethical one.
Reflective action: review your terms and conditions
Find the terms and conditions statement for your own institution (or if you can’t easily find it, take a look at the terms of use for US MOOC provider edX, [28] and its embedded privacy policy[29]). How accessible do you find the terminology? Does it describe or define what type of data will be collected, how that data might be used and who the data could be shared with? Are there alternatives if a user objects?
[28] https://www.edx.org/edx-terms-service
[29] https://www.edx.org/edx-privacy-policy
Data storage
Data storage and security are usually determined by existing data protection policies (check yours to see whether you think it is clear and detailed enough for most learners to understand). Additional issues for consideration include data sharing between the institution and other third parties — usually those offering a service, such as marketing. Some institutions effectively outsource their learning analytics activities, either wholly or in part, by using proprietary software that may capture and store learner data. It is sensible to ensure that your institutional approach leaves learner (and other) data secure and not open to unintended consequences.
Data ownership
In a learning analytics context, the institutional presumption is often that data collected are owned by the institution. Some, though, see data as an extension of the learner’s identity; for example, “learner data is not something separate from learners’ identities, their histories, their beings [it] is therefore not something a learner owns but rather is” (Prinsloo, 2017). And although there is often clear existing legal protection relating to personal data, the lack of clarity around who owns the data muddies principles of meaningful consent. Another perspective then might be to assume that the institution has temporary stewardship of its learner data. Subject to legislation and policy, the institution may store data sets under certain conditions and for specified periods, but within a higher education context, the issue of ownership could remain open. In this situation, the institution would be able to collect and analyse learner data, and apply insights from learning analytics, but would be constrained from further exploitation of that data.
Interpretation issues
If analytics is predictive (i.e., not built only on known events, but involving statistical calculations), data sets should be complete and sufficient to ensure any calculations are robust. Further, the models used to analyse, interpret and communicate learning analytics to stakeholders (learners, teachers, support staff, advisers, etc.) should be free from algorithmic bias, transparent where possible, and clearly understood by the end users. This is not to say that everyone needs a postgraduate qualification in statistics, but we should be aware of the limitations of the outputs and understand that prediction is not equivalent to fact. Analytics outputs can be misinterpreted or misunderstood if the end user lacks the skills to extract clear meanings and causal relationships.
Some robustness may be lost when, for example, the thing that the user wants to get a feel for or measure is hard to quantify, so an available proxy is used instead. A classic example here is the measure for learner engagement — often extracted from time spent online as an achievable but arguably flawed measure.
The obligation to act
Should access to knowing, and understanding more about how our learners learn and how they are progressing, equate to a moral obligation to act? For example, having observed learners not submitting summative assignments, or having calculated the probabilities of course completion, is the institution obliged to act on what it has identified? Often, resources (usually staff time) are constrained, and it is not always easy to reach all learners identified as likely to benefit from additional support. In these cases, the institution might consider a type of “triage” policy by focusing available time and support on the group identified as, for example, most potentially vulnerable; or on learners in high-population courses; or on learners with particular characteristics (for example, those with known disabilities). Realistically, a line has to be drawn. However, the decision-making process should be transparent and align with other known strategies.
Privacy calculus theory suggests that individuals may be willing to share some or all of their data after evaluating the potential benefits and risks. Research by the Pew Research Center [30] in the US indicates a range of views around the acceptability of organisations sharing personal data — most often depending on the organisation and the perceived purpose. For example, their 2019 report states that “49% [of Americans] say it is acceptable for the government to collect data about all Americans to assess who might be a potential terrorist threat,” whereas “about four-in-ten are concerned a lot about what personal information social media sites . . . might know about them.” Research carried out in 2018 at The Open University, in the UK, looked at how learners viewed their own privacy calculus whilst also measuring what they actually did in practice. This research found that very few learners ever read the terms and conditions of websites visited, but that “93% of respondents indicated that it is ‘quite’ or ‘very important’ to be in control of who gets access to information shared online” (Slade et al., 2019). In the context of education and learning analytics, there was a higher level of trust (as might be hoped), with around three-quarters of learners stating that they trusted the university with their data. In reality, of course, user preferences can be irrelevant, since the options are most often limited to sharing data or being denied access to the service.
For more insight into the privacy calculus, watch this short video.
Watch Video: https://youtu.be/k0PW_5JlF88
Video attribution: “Unit 7: The Privacy Calculus” by Commonwealth of Learning is available under CC BY-SA 4.0
Reflective action: how do you share?
Think about the websites that you visit most often. Have you read their terms and conditions? (Be honest!) Do you know what data are collected by the site and how these might be shared? Would that matter to you? Most people have some concerns or feel some unease at the thought of their data being shared with other parties but often feel that access to the service is sufficient payment (or feel that the risk is low enough not to worry).
Reflective action: learner consent in your context
Having reached this stage in the course, you will have a better understanding of how learning analytics can be applied in educational settings, what data may be collected, and a range of ethical issues. Consider the issue of consent. If you were to take a position on consent in your own institutional (or course) context, what would you include?
This unit has given a quick overview of the complex area of ethical concerns over uses of (learner) data. None is necessarily more important than any other, and views on each will vary, depending on the institution and the stakeholder. Unit 9 considers how some of these issues can be formalised via the creation of a learning analytics policy.
Suggested readings
There is a growing body of research summarising these issues in learning analytics. The following are some useful readings.
Learning analytics: ethical issues and dilemmas: http://oro.open.ac.uk/36594/2/ECE12B6B.pdf
GDPR and Learning Analytics: frequently asked questions: https://analytics.jiscinvolve.org/wp/2018/06/01/gdpr-and-learning-analytics-frequently-asked-questions/
Ethical challenges for learning analytics: https://epress.lib.uts.edu.au/journals/index.php/JLA/article/view/6587
Practical ethics for building learning analytics: https://bera-journals.onlinelibrary.wiley.com/doi/abs/10.1111/bjet.12868
a. Having read a summary of potential ethical concerns, revisit the following statements and decide again:
i. Consent must be obtained at the point of registration.
ii. Once an institution can predict a likely learner outcome, it must act upon that insight.
iii. Learning analytics can only be used by trained analysts or statisticians.
iv. Once a learner registers, all of their data (provided at registration or click data) becomes the property of the institution.
Which of these is always true?
- All of them
- (i) and (iv)
- (i), (ii) and (iv)
- Only (i)
- Only (iv)
- None of them
b. Learning analytics can predict whether a learner will pass a course.
i. True
ii. False
c. What is the data privacy calculus?
i. a weighing up of the pros and cons of using learning analytics
ii. a weighing up of the pros and cons of sharing data in order to access a service
iii. a devilishly tricky math calculation
d. What is needed for meaningful consent? (choose all that apply)
i. a formal policy statement
ii. transparency about uses of learner data in learning analytics
iii. a robust approach to using algorithms
iv. an ability to change consent positions at later stages