Data Ownership
DSCI 200
Katie Burak, Gabriela V. Cohen Freue
Last modified – 17 March 2026
Attribution
This material is adapted from the following sources:
In-Class Activity: The Data Dilemma
Learning Objectives
- Understand the concept of open science and explain the benefits of open data, including transparency, collaboration, and innovation
- Analyze why some data cannot be fully open due to ethical, legal, and proprietary restrictions
- Apply FAIR principles to assess how data can be made findable, accessible, interoperable, and reusable, regardless of openness
- Evaluate data ownership and licensing frameworks to determine their impact on openness and FAIR compliance
- Compare FAIR and CARE principles to assess how Indigenous data sovereignty influences data management and sharing
Note: These slides provide an overview of some key considerations related to data ownership. For specific legal advice, it’s always best to consult a legal professional.
Open Science
- Open science is about making scientific research, data, and dissemination accessible to all.
- It promotes transparency, collaboration, and innovation in research.
- Includes open access publications, open data and open tools.
- Supported by initiatives like FOSTER (Facilitate Open Science Training for European Research).
What are Open Data?
- Open data refers to freely accessible, online data that can be used, reused, and shared with proper attribution given to the original source (FOSTER).
- Sharing and reusing open data helps make research more transparent and reproducible.
- Ethical considerations mean that not all data can (or should) be fully open (e.g., personal or sensitive data).
Why Open Data Matters
- Reproducibility: Enables verification and replication of research.
- Efficiency: Saves time and resources by reducing redundant data collection.
- Collaboration: Allows researchers to combine datasets for new insights.
- Innovation: Drives new discoveries and applications across disciplines.
Balancing Openness & Ethics
- Open science supports open data whenever ethically appropriate.
- Some data must remain restricted due to privacy, security, or legal constraints.
- Best practices help balance openness with responsibility.
- The goal: “As open as possible, as closed as necessary.”
FAIR Principles
- Accessibility alone isn’t enough: data must be usable!
- Proper formatting, clear metadata, and documentation are essential.
- The FAIR principles were published in 2016 (Wilkinson et al., 2016) as guidelines for managing and reusing the vast amount of data available.
What are FAIR Data?
- FAIR is an acronym for Findable, Accessible, Interoperable, Reusable.
- FAIR data ensures that both humans and machines can easily discover, access, and reuse data.
- Principles are high-level and domain-independent, allowing flexibility across disciplines.
Findable
- Data and metadata must be easy to find.
- Machine-readable metadata supports automated discovery.
- Key principles:
- F1. Data have a globally unique and persistent identifier (PID).
- F2. Data are described with rich metadata.
- F3. Metadata include clear links to the data they describe.
- F4. Data are indexed in searchable resources.
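As a rough illustration of F1 to F4, the sketch below stores a dataset's metadata in a plain Python dictionary and reports which Findable principles it appears to satisfy. The field names (`identifier`, `title`, `description`, `keywords`, `data_identifier`, `indexed_in`) and the DOI-like values are hypothetical and chosen only for illustration. Real repositories define their own metadata schemas.

```python
# Hypothetical metadata record; field names and values are illustrative only.
record = {
    "identifier": "doi:10.0000/example-dataset",        # made-up persistent identifier (F1)
    "title": "Example survey responses",                # descriptive metadata (F2)
    "description": "Anonymized responses to a fictional classroom survey.",
    "keywords": ["survey", "education"],
    "data_identifier": "doi:10.0000/example-dataset",   # link from metadata to data (F3)
    "indexed_in": ["example-repository-catalogue"],     # searchable resource (F4)
}


def check_findable(meta):
    """Map each Findable principle to True/False for one metadata record."""
    rich_fields = ("title", "description", "keywords")
    return {
        "F1": bool(meta.get("identifier")),
        "F2": all(meta.get(field) for field in rich_fields),
        "F3": bool(meta.get("data_identifier")),
        "F4": bool(meta.get("indexed_in")),
    }


print(check_findable(record))
```

A check like this only confirms that fields are present. Whether the metadata are rich enough is still a human judgment.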
Accessible
- Users must be able to access data when needed.
- Accessibility includes authentication and authorization when necessary.
- Key principles:
- A1. Data are retrievable via a standardized communication protocol.
- A1.1. The protocol is open, free, and universally implementable.
- A1.2. The protocol supports authentication and authorization if required.
- A2. Metadata remain accessible even if the data are no longer available.
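The interplay between A1.2 and A2 can be sketched as a small access policy: metadata are always returned, while the data themselves are released only when they still exist and the requester is allowed to see them. The function name `request_dataset`, the entry layout, and the status strings are assumptions made for this example, not part of the FAIR specification.

```python
def request_dataset(entry, user_is_authorized):
    """Return what a requester receives under a hypothetical access policy."""
    # A2: metadata stay accessible no matter what happens to the data.
    response = {"metadata": entry["metadata"]}
    if entry["data"] is None:
        response["status"] = "data no longer available"
    elif entry["restricted"] and not user_is_authorized:
        # A1.2: access requires authentication and authorization.
        response["status"] = "authorization required"
    else:
        response["status"] = "ok"
        response["data"] = entry["data"]
    return response


# Illustrative entry: a restricted dataset with public metadata.
entry = {
    "metadata": {"title": "Example clinical measurements"},
    "data": [4.2, 3.9, 5.1],
    "restricted": True,
}

print(request_dataset(entry, user_is_authorized=False)["status"])
print(request_dataset(entry, user_is_authorized=True)["status"])
```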
Interoperable
- Data should integrate easily with other datasets and systems.
- Interoperability ensures that data can be used across different platforms.
- Key principles:
- I1. Data use a formal, shared, and accessible language.
- I2. Metadata follow FAIR-aligned vocabularies.
- I3. Data include qualified references to related datasets.
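One simple reading of I1 is to store metadata in an open, widely supported format so that other tools can parse it. The sketch below writes a metadata record to JSON with Python's standard `json` module and reads it back. The field names, the identifiers, and the `isDerivedFrom` relation label are hypothetical placeholders for whatever vocabulary a community has agreed on.

```python
import json

# Hypothetical metadata record; all names and identifiers are made up.
metadata = {
    "identifier": "doi:10.0000/example-dataset",
    "title": "Example survey responses",
    "collected_on": "2025-01-15",  # ISO 8601 date string
    "related_datasets": [          # qualified reference to another dataset (I3)
        {"relation": "isDerivedFrom", "identifier": "doi:10.0000/example-source"}
    ],
}

# Serialize to a text format that most languages and tools can read.
serialized = json.dumps(metadata, indent=2, sort_keys=True)

# Any JSON-aware system can recover the same structure.
restored = json.loads(serialized)
print(restored == metadata)
```

The format alone does not make data interoperable. The keys and relation labels also need to come from a shared vocabulary (I2).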
Reusable
- Data should be well-described and structured for future use.
- Reusability supports reproducibility and new discoveries.
- Key principles:
- R1. Data have rich, accurate metadata.
- R1.1. Data include a clear usage license (more on licenses later).
- R1.2. Provenance (origin and history) is well-documented.
- R1.3. Data align with community standards.
Why FAIR Matters
- Improves data discovery, sharing, and reuse.
- Supports collaborative and reproducible research.
- Enables automation and machine learning applications.
- Encourages responsible data management, whether open or restricted.
Fun fact: The Canadian government embraces FAIR data! 🇨🇦
FAIR Data and Openness
- FAIR does not necessarily mean open data (data can be FAIR without being openly accessible).
- Licensing determines whether FAIR data can be shared, restricted, or fully open.
- FAIR metadata ensures discoverability and reuse, even if access is controlled.
Four Levels of FAIRness
- FAIR metadata only: Data has a PID and searchable metadata, but may not be accessible.
- Rich metadata: Includes detailed provenance and user-defined metadata.
- FAIR but restricted: Data elements follow FAIR principles but have access controls.
- FAIR and open: Data is publicly available with clear licenses.
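The four levels can be read as a ladder, which the sketch below encodes as a small classifier. Treating the levels as strictly cumulative is an assumption made for this example, as are the function name, the boolean inputs, and the extra "not yet FAIR" label for data without a PID.

```python
def fairness_level(has_pid, has_rich_metadata, data_follows_fair, is_open_with_license):
    """Return the highest FAIRness level reached, assuming levels are cumulative."""
    if not has_pid:
        return "not yet FAIR"          # label added for this sketch only
    if not has_rich_metadata:
        return "FAIR metadata only"
    if not data_follows_fair:
        return "Rich metadata"
    if not is_open_with_license:
        return "FAIR but restricted"
    return "FAIR and open"


# Example: well-described data that follow FAIR principles behind access controls.
print(fairness_level(True, True, True, False))
```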
FAIRness in Industry
- Although the concept of FAIR data originated in academia, the principles are highly relevant in industry, since corporations both produce and consume large amounts of data.
- FAIR practices break down internal data silos by making data findable and accessible across departments, improving collaboration.
- FAIR data also simplifies external integration by ensuring data can easily work with public and commercial sources, boosting innovation.
Example: Corporate pharmaceutical data is often more detailed but less FAIR than broader public datasets, which can hinder drug development and discovery.
FAIRness in Industry
- FAIR data doesn’t necessarily mean public data (recall the levels of FAIRness); companies can implement FAIR internally while protecting intellectual property and competitive advantage.
- Proper governance ensures FAIR data improves efficiency without exposing trade secrets or causing financial loss.
- FAIR data can be shared securely through controlled access, APIs, or licensing.
Question 1
Which of the following statements is TRUE about FAIR data?
- FAIR data must always be publicly available.
- FAIR ensures data is usable, but it may have access restrictions.
- FAIR and Open Data mean the same thing.
- FAIR data cannot have any licensing restrictions.
Question 2
A researcher wants to combine multiple datasets from different sources.
Which FAIR principle ensures that data can be used across systems?
- Findable
- Accessible
- Interoperable
- Reusable
Data Ownership & Licensing
- Understanding data ownership is essential before sharing or licensing data.
- Ownership depends on factors like:
- Who collected or created the data.
- Institutional policies and employment contracts.
- Funding agency requirements and agreements.
- The nature of the data: personal data may be subject to privacy laws (refer back to the Amazon example).
Copyright & Data
- Copyright protects creative works, but data ownership is more complex.
- Not all data are protected by copyright.
- Copyright protects original expressions but not raw facts or measurements.
- Some types of data may be protected due to how they are organized or presented.
- Proper licensing helps clarify rights and permissions for data sharing.
Copyright & Data
| Raw data (e.g., a measurement) | Data representations (e.g., tables, graphs) |
| | Datasets |
| | Data compilations |
| | Databases |
| | Purchased data (with conditions of use) |
| | Literary, musical, dramatic, or artistic works (e.g., photos) |
Which data are protected by copyright depends heavily on the situation; these are just guidelines. For example, a graph may be protected if it involves creative choices in how the data are organized, displayed, or interpreted. Simple graphs, such as bar graphs or line charts, typically aren’t copyrightable.
Ownership Considerations
| Primary Data | Collected directly by the organization or researcher for their own use. Likely owned by the organization but depends on institutional policies. |
| Secondary Data | Data collected by others and reused for a new purpose. Usually owned by the original data creator. |
| Tertiary Data | Summarized or synthesized data from multiple sources (e.g., reports, meta-analyses). Likely owned by others unless explicitly released under an open license. |
Note that data ownership depends on the nature of the data; for personal data, the owner is typically the individual about whom the data were recorded!
Data Rights and Protections in Canada
- In Canada, businesses don’t “own” data in the traditional sense. Data is protected through various legal frameworks.
- Key Legal Protections:
- Intellectual Property: Limited protection; patents and copyrights may apply to methods or databases, not the data itself.
- Contracts: Use contracts to define how data can be used and include remedies for misuse.
- Statutory Protection: Privacy laws such as PIPEDA govern how personal information is collected, used, and disclosed.
- Common Law: Claims like breach of confidence and negligence can protect against data misuse.
Data Licensing
- A license defines how data can be used, shared, and modified.
- Even if data are publicly available, they may have restrictions on reuse.
- Open licenses allow for greater transparency and accessibility.
- Choosing the right license ensures compliance with copyright and ethical guidelines.
Types of Open Licenses
- Creative Commons (CC) Licenses: Used for general data sharing.
- Open Data Commons (ODC) Licenses: Designed specifically for databases.
- Software Licenses: Applied to data-related software or scripts.
Types of Open Licenses
| CC BY 4.0 | ✅ Yes | ✅ Yes | ✅ Yes |
| CC0 (Public Domain) | ✅ Yes | ✅ Yes | ❌ No |
| ODbL 1.0 (Open Database License) | ✅ Yes | ✅ Yes | ✅ Yes (Share-Alike) |
| ODC-BY 1.0 (Attribution License) | ✅ Yes | ✅ Yes | ✅ Yes |
| PDDL 1.0 (Public Domain Dedication License) | ✅ Yes | ✅ Yes | ❌ No |
| MIT License (for software) | ✅ Yes | ✅ Yes | ✅ Yes |
| GNU GPL v3 (for software) | ✅ Yes | ✅ Yes | ✅ Yes |
| Apache 2.0 (for software) | ✅ Yes | ✅ Yes | ✅ Yes |
Choosing the Right License
Here are some general guidelines for choosing the right license:
- Want maximum openness? Use CC0 or PDDL to waive all rights.
- Need attribution? Use CC BY or ODC-BY.
- Concerned about modifications? Consider CC BY-ND (No Derivatives) but note it limits reuse.
- Working with databases? Use ODbL to ensure share-alike conditions.
- Publishing software or code? Use MIT, GPL, or Apache.
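The guidelines above can be sketched as a small decision helper. This is a simplified reading, not a rule: the function name `suggest_license`, its parameters, and especially the order in which the questions are asked are assumptions made for illustration.

```python
def suggest_license(kind, needs_attribution=False, forbid_derivatives=False,
                    share_alike=False):
    """Suggest candidate licenses for kind in {'data', 'database', 'software'}.

    The priority order of the checks is an assumption of this sketch.
    """
    if kind == "software":
        return ["MIT", "GPL", "Apache"]
    if kind == "database" and share_alike:
        return ["ODbL"]
    if forbid_derivatives:
        return ["CC BY-ND"]      # note: this limits reuse
    if needs_attribution:
        return ["CC BY", "ODC-BY"]
    return ["CC0", "PDDL"]       # maximum openness: waive all rights


print(suggest_license("data", needs_attribution=True))
print(suggest_license("database", share_alike=True))
```

The helper returns candidates rather than a single answer, since the final choice still depends on the data, the funder, and the institution.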
Why Licensing Matters for Open & FAIR Data
- Licensing determines whether data can be shared, restricted, or fully open.
- Even restricted data can be FAIR (Findable, Accessible, Interoperable, Reusable).
- Proper licensing ensures compliance with ethical and legal requirements.
- Data repositories often require specifying a license before publishing datasets.
CARE Principles for Indigenous Data Governance
- Open data and open science movements often overlook Indigenous Peoples’ rights and interests.
- FAIR principles focus on data sharing but ignore power imbalances.
- Indigenous Peoples seek greater control over their data to ensure collective benefit.
- The CARE Principles complement FAIR by centering people and purpose in data governance.
What Are Indigenous Data?
- Indigenous data refers to information and knowledge about:
- Individuals, groups, and organizations.
- Ways of knowing and living, including cultural practices and traditions.
- Languages, cultures, land, and natural resources.
Traditional Knowledge
- Indigenous data exists in many formats, including traditional knowledge passed down through generations. This includes:
- Languages, stories, ceremonies, and songs.
- Arts, hunting, trapping, gathering, and food preparation.
- Spirituality, beliefs, and worldviews.
- Indigenous data is foundational to community identity, cultural continuity, and self-determination.
Indigenous Data Sovereignty
- The right of Indigenous Peoples to govern the collection, ownership, and application of their data.
- Rooted in inherent rights to self-governance over peoples, lands, and resources.
- Knowledge belongs to the collective and is fundamental to Indigenous identity.
- Positioned within a human rights framework, including treaties, court cases, and legal recognition.
CARE Principles for Indigenous Data Governance
- The CARE framework centers on Collective Benefit, Authority to Control, Responsibility, and Ethics.
- Designed to complement FAIR by ensuring Indigenous control and benefit from data.
- Encourages ethical, inclusive, and just data practices.
Collective Benefit
- Indigenous data ecosystems should enable collective benefit for Indigenous Peoples.
- Principles:
- C1. For inclusive development and innovation.
- C2. For improved governance and citizen engagement.
- C3. For equitable outcomes.
Authority to Control
- Indigenous Peoples’ rights and authority over their data must be recognized and respected.
- Principles:
- A1. Recognizing rights and interests.
- A2. Data for governance.
- A3. Governance of data.
Responsibility
- Those working with Indigenous data must ensure its use supports self-determination and collective benefit.
- Principles:
- R1. For positive relationships.
- R2. For expanding capability and capacity.
- R3. For Indigenous languages and worldviews.
Ethics
- Indigenous rights and well-being must be the primary concern throughout the data life cycle.
- Principles:
- E1. For minimizing harm and maximizing benefit.
- E2. For justice.
- E3. For future use.
Supporting Indigenous Data Sovereignty
1. Recognize and promote Indigenous sovereignty.
2. Center Indigenous values in research and data practices.
3. Conduct scholarship in service to Indigenous communities.
4. Build research and data capacity within Indigenous communities.
5. Follow existing Indigenous data governance protocols.
6. Support Indigenous scholars in both academic and community settings.
FAIR and CARE: Complementary Frameworks
- FAIR focuses on data usability, while CARE ensures ethical and people-centered approaches.
- CARE builds on FAIR by emphasizing:
- People and purpose over just data sharing.
- Correcting historical power imbalances in data governance.
- Creating value for Indigenous communities through ethical data use.
- The goal is to “Be FAIR and CARE”!
What is OCAP®?
- OCAP® stands for Ownership, Control, Access, and Possession.
- It is a set of principles developed by the First Nations Information Governance Centre (FNIGC) in Canada.
- First Nations communities in Canada have historically been excluded from decisions about the collection, storage, and sharing of data related to their peoples.
- OCAP® helps ensure that First Nations communities have sovereignty over their data and information.
OCAP® is a registered trademark of the First Nations Information Governance Centre (FNIGC).
The Four Principles of OCAP®
- Ownership: Communities or groups collectively own their own knowledge, data, and information in the same way that individuals own their own personal information.
- Control: Communities have control and decision-making power over all aspects of research and information that affect them, at every stage from collection to storage.
- Access: Communities should be able to access their collective information and data, no matter its location. Communities should be able to manage and make decisions regarding the access to and control of their information.
- Possession: This is like Ownership, but more concrete. It is the physical control of data, the mechanism that asserts and protects ownership of information. It may also be thought of as stewardship.
Key Takeaways
- Open data encourages collaboration but must be handled ethically and legally to respect privacy, security, and intellectual property.
- FAIR and CARE principles guide the usability and governance of data, with FAIR focusing on technical management (making data findable, accessible, interoperable, and reusable) and CARE emphasizing community control over data.
- Indigenous data sovereignty, as seen with OCAP®, ensures that Indigenous communities maintain control over their data.