Per the Foundations for Evidence-Based Policymaking Act of 2018 (also referred to as the Evidence Act), and eloquently stated by the Federal CDO Council, the federal Chief Data Officer’s first responsibility is “Managing data at every stage of the data lifecycle by establishing effective procedures, standards, and controls…”
That’s a loaded sentence! All agencies have data lifecycles – some more mature than others – but integrating the data lifecycle with legacy processes is labor-intensive. When we look across agencies at their enterprise data management functions and strategies, we see similar challenges to establishing robust, effective, and yet flexible governance, including:
- The number of stakeholders involved can be overwhelming.
- Legacy processes in which to insert data governance controls are voluminous and complex.
- Enterprise data sharing and analytics introduce new challenges to legacy practices.
This article shares practical ideas and clear examples of data governance practices that align well with the DMBOK 2.0 data lifecycle phases. I hope these examples demonstrate how the DMBOK 2.0 data lifecycle can help federal data leaders introduce and align new data governance with their many existing processes and organizational units.
Data Governance is a Team Sport
To succeed in their roles, CDOs need to establish relationships with practically every business stakeholder group including Investment Management, Cybersecurity, Privacy/Compliance, Records Management, Data Stewards, IT Development & Operations Teams, and Data Product Developers and Consumers. And to make it more complex, the roles, people, and collaborations with these groups will naturally change from phase to phase throughout the data lifecycle.
The Enterprise Data Lifecycle can help CDOs and their teams manage data governance across these myriad, dynamic, enterprise-wide interactions. The following illustration shows examples of how this simple framework offers opportunities to propose data governance controls from numerous perspectives.
Governance Opportunities Throughout the Data Lifecycle
Let’s look in more detail at each of these examples as simple opportunities to establish data governance along the enterprise data lifecycle.
1. Establish a seat at the investment management table
The Planning phase presents the CDO’s first opportunity to capture key data governance information and build awareness of the pending data asset within the investment management workflow. CDOs can work with other C-Level officers to embed the CDO role and data perspectives from day 1 – even before the data asset exists, while it’s still being planned and budgeted. We can simply follow the lead of the “Chiefs” who came before us – the CTO, CIO, and CPO – who successfully integrated their respective practices at the outset of agency management processes.
A first suggestion here is to embed data perspectives into agency investment management processes and systems. This will help ensure the data asset is well understood at the outset and throughout its life. Specifically, consider incorporating key information about the data asset into investment management system workflows, such as Folio: the anticipated family of data the new asset will fit into, any stated ethical intent for the asset, its sensitivity level, and the formal data steward who will advocate for the asset throughout the entire data lifecycle. In addition, consider publishing a “coming soon” notice in the enterprise data catalog at this point of the data lifecycle, so data citizens understand what helpful data will soon become available.
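As a minimal sketch of what such a planning-phase record might look like, the following Python snippet models a hypothetical “coming soon” catalog entry carrying the fields discussed above. The class name, field names, and validation helper are illustrative assumptions, not any specific catalog product’s schema.

```python
from dataclasses import dataclass, asdict

@dataclass
class PlannedDataAsset:
    """Hypothetical catalog record captured during the Planning phase."""
    name: str
    data_family: str        # anticipated family of data the asset fits into
    ethical_intent: str     # stated ethical intent for the asset
    sensitivity_level: str  # e.g., "public", "sensitive", "high"
    data_steward: str       # formal steward who advocates across the lifecycle
    status: str = "coming soon"  # published to the catalog before the asset exists

def missing_fields(asset: PlannedDataAsset) -> list[str]:
    """Return the names of any planning-phase fields left blank."""
    return [k for k, v in asdict(asset).items() if not v]

# Example: a complete planning-phase entry (all values are made up)
grants = PlannedDataAsset(
    name="FY25 Grant Awards",
    data_family="Financial Assistance",
    ethical_intent="Transparency of award decisions",
    sensitivity_level="public",
    data_steward="Jane Analyst",
)
print(missing_fields(grants))  # [] – nothing missing, ready to publish
```

A simple completeness check like `missing_fields` gives the investment management workflow a concrete gate: the “coming soon” notice is only published once every planning-phase field has an owner and a value.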
2. Apply similar governance rigor for acquired data
Acquired data can easily slip through the governance “cracks”, especially publicly available data. Various teams around the business will bring in “public” data without thinking about how that data will be managed through the lifecycle. (CDOs clearly have a huge job to include this issue in data literacy training!)
I recommend that data teams include all acquired data as individual assets within the data catalog – even if an asset’s sole purpose is to supplement another formal data asset – so that all data citizens can clearly trace its provenance. This clarity will help future data governance and analytics teams better understand data dependencies and usage rights.
3. Pair data governance disciplines to platform governance
Many factors can lead to a rush to ingest data assets into new data management platforms. Specifically, investments in new data repositories, such as shared data lakes, often precede the data governance practices guiding their use. To limit the governance decisions that must be made at each step in the analytics process, consider a “just in time” governance approach to capturing requirements.
You don’t have to elaborate ALL analytics process requirements before ingesting data into the new platform. For example, the point at which a data asset is first ingested into a data lake will require a smaller set of data access controls, because analysis and data transformations will be limited to a small group of staff (e.g., data stewards and developers) before the asset is shared as an enterprise data product. However, when the resulting data product is ready to be shared with the entire enterprise, the number of data access, data usage, and data monitoring controls expands significantly to account for widespread review, analysis, and re-use.
Federal teams can benefit from applying just enough requirements at the point of ingest, rather than waiting until the larger set of data protection requirements is understood and approved for sharing. This can be especially helpful for agencies in which both data governance processes and data platforms are new.
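The staged expansion of controls described above can be sketched in a few lines of Python. The stage names, role names, and control lists here are illustrative assumptions, not a standard or any platform’s actual policy model; the point is that only the ingest-stage requirements must be approved before data lands in the platform.

```python
# "Just in time" governance sketch: controls are attached per lifecycle
# stage, so the small ingest-stage set can be approved early while the
# larger published-stage set is still being worked out.
STAGE_CONTROLS = {
    "ingest": {
        # Pre-publication access is limited to a small group of staff
        "allowed_roles": {"data_steward", "developer"},
        "required_controls": ["access restriction", "audit logging"],
    },
    "published": {
        # Enterprise sharing expands both the audience and the controls
        "allowed_roles": {"data_steward", "developer", "analyst", "data_consumer"},
        "required_controls": [
            "access restriction",
            "audit logging",
            "usage agreement",
            "aggregation/disclosure review",
            "re-use monitoring",
        ],
    },
}

def can_access(role: str, stage: str) -> bool:
    """True if the role may work with the asset at this lifecycle stage."""
    return role in STAGE_CONTROLS[stage]["allowed_roles"]

print(can_access("analyst", "ingest"))     # False – access is limited pre-publication
print(can_access("analyst", "published"))  # True – broader access once fully governed
```

Keeping the control sets explicit per stage also documents exactly which approvals are still outstanding when an asset moves from ingest toward enterprise publication.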
Finally, this attention to “just in time” requirements granularity can also provide clarity and help avoid inadvertent misconfiguration of data management platform services that would allow misuse of the data asset. Security misconfiguration – often driven by misunderstanding and lack of clarity – appears on the Open Worldwide Application Security Project (OWASP) Top 10 list of web application security risks.
4. Include data stewards and communications teams for pivotal data governance decisions
Consider adjusting the role of data stewards and communications teams (corporate comms and organizational change management comms) for Incident Management to account for the complexity of analytics models and to re-assert data governance approvals prior to re-publishing. Regarding the data steward’s role in discerning the resolution to a discovered incident, consider that even minor changes to complex data analytics can trigger unforeseen results.
For example, a subtle change to data access controls or aggregation parameters can trigger unforeseen data disclosure risks. Pay close attention to the communications required to re-publish complex analytics models to broad audiences. Consider creating a formal discernment, approval, and communication process for re-publishing a data asset – an important safeguard for high-profile information that can be used as evidence, interpreted, and re-used in many ways once it is made available. CDOs and data governance leaders should ensure through governance that data stewards re-evaluate previously approved analytics, and the impact of any changes, before re-publishing data externally.
5. Be judicious about retaining analytics versions of data
Organizations tend to default to “save everything.” Even people who are not data hoarders may rightly think they might need to use some information again – so why throw it away? But retention-management policies and processes rarely address the analytics versions of data. A simple set of retention criteria for analytics versions of data, such as the following, can be an easy and valuable update to retention policies:
- Can we regenerate the analytics data asset from well-understood PROD systems with clear data provenance? Do not retain it.
- Are the analytics data assets subject to a legal hold or a FOIA request? Retain it.
- Does a requirement exist to show how the data has been used historically? Consider only retaining the historical access logs that cannot be regenerated.
Organizing data governance along the phases of the DMBOK 2.0 Data Lifecycle can greatly help agencies find the path to federally compliant data governance. A phase-by-phase perspective makes it easier to identify the specific organizational stakeholders, process owners, and activities that new data governance practices will interact with.
These specific examples of new data governance practices within each phase of the enterprise data lifecycle will hopefully demonstrate that changes can be lightweight and simple because most agencies already have governance of many forms on which to piggyback. This article includes references to investment management, data acquisition, data intake, incident resolution, and retention. The stakeholders and the processes for these disciplines are well-established at most federal agencies.
In my recent work at Citizant supporting federal agencies, we have used our Middle-Out Architecture Approach to help federal data leaders define a clear methodology and build a data governance roadmap to integrate new data governance practices. By being a true partner to the federal CDO, we help the agency along the data governance journey to assess, identify, develop, and iteratively deploy data governance practices that accommodate the specific priorities of any federal agency, including:
- Comprehensive operational roadmaps for enterprise-level data governance that complement existing governance processes.
- Compliance with federal legislation such as the Evidence Act.
- Clarity of data assets and their usage, which yields improved business insights through analytics and trust in data assets.
- Reduced Agency, Departmental, and Legislative risk via NIST-compliant data security throughout the entire data lifecycle.