Abstract
Marketing analytics rises or falls on one quiet moment: the data definition phase, when teams decide what to measure and how. Despite its centrality, the field lacks a standardized methodology for doing this well. This study asks how practitioners actually run data definition and what distinguishes mature from ad-hoc practice across agencies, in-house teams, and freelancers. Evidence comes from 40 semi-structured interviews analyzed inductively with reliability checks. Four dimensions consistently shape outcomes: process formality, stakeholder collaboration, documentation and tooling, and recurring failure points such as definitional drift, onboarding friction, and governance gaps. Comparative patterns show agencies emphasize standardization, in-house results hinge on leadership and culture, and freelancer-driven gains fade without internal ownership. The study introduces a five-level Data Definition Maturity Model that specifies practical capabilities: aligned KPI glossaries, facilitated definition workshops, versioned metric repositories, and privacy checkpoints. Higher maturity reduces “whose numbers are right” disputes, speeds consensus, and improves analytic reliability. The contribution is a shared language and actionable roadmap for a phase too often improvised; we argue that rigorous data definition is a necessary precondition for reliable analytics and AI-ready marketing data ecosystems.
Keywords
Introduction
Marketing analytics is now a core capability for firms seeking to compete in data-rich environments. Organizations invest heavily in tools, dashboards, and advanced techniques such as machine learning and AI to optimize campaigns, personalization, and customer journeys (Fosso Wamba et al., 2017; Germann et al., 2013). Yet, as the oft-cited “garbage in, garbage out” principle reminds us, analytics quality is bounded by the quality of the underlying data and how it is conceptualized and captured in the first place.
A crucial but frequently under-specified part of this process is what we term the data definition phase: the work of articulating business objectives, translating them into metrics and KPIs, and specifying the events, entities, and rules that determine how data enters organizational systems. This phase involves agreeing on what constitutes a “lead,” a “conversion,” an “active user,” or “churn,” how these constructs are operationalized across channels, and how they are documented and governed. Decisions made here have long-lasting consequences for comparisons over time, cross-market reporting, and the perceived credibility of analytics outputs.
As organizations increasingly aspire to use predictive models and AI, the consequences of weak data definition practices become even more serious. Poorly specified metrics, inconsistent definitions across markets, and undocumented changes can propagate directly into training datasets, undermining model validity, introducing bias, and fueling costly AI failures or “pilot purgatory” (Abbasi et al., 2016; Mikalef et al., 2020). Reliable, AI-ready marketing analytics therefore presuppose consistent, well-governed data definitions.
Academic and practitioner literatures emphasize data quality, governance, and analytics capability more broadly (Abraham et al., 2019; Alhassan et al., 2016; Gupta & George, 2016; Król & Zdonek, 2020; Langer, 2025). However, most work treats the definition of metrics and events as a relatively technical detail within broader data governance or maturity frameworks, rather than as a distinct, socio-technical phase deserving focused empirical attention.
Existing analytics and maturity models typically assess infrastructure, skills, and organizational culture, with only limited treatment of how metrics and definitions are created, negotiated, and maintained over time (Grossman, 2018; Santos-Neto & Costa, 2019; Smajli et al., 2024). Data quality research identifies definitional clarity as a dimension of quality but provides little guidance on how organizations actually develop and govern those definitions in practice.
This gap is particularly salient in marketing, where metrics are often used across multiple stakeholders (marketing, sales, finance, IT, external agencies) and where digital transformation has multiplied the number of measurable touchpoints (Varshitha et al., 2023; Vial, 2019). Stakeholder theory and socio-technical perspectives suggest that cross-functional alignment and governance are crucial (Bhattacharya & Korschun, 2008; Levina & Vaast, 2005; Reich & Benbasat, 2000), yet we know little about how such alignment is (or is not) achieved in everyday data definition work.
Against this backdrop, we ask: How do marketing analytics professionals approach the data definition phase, and how does methodological maturity differ across organization types (agencies, freelancers, in-house teams)?
To address this question, the study pursues five objectives: (1) Identify and describe the common methodologies, processes, tools, and frameworks employed during the data definition phase. (2) Explore the role of stakeholder collaboration in defining data, metrics, and KPIs, including key enablers and barriers. (3) Uncover prevalent challenges and pain points in data definition and how practitioners attempt to overcome them. (4) Compare methodological maturity in data definition across agencies, freelancers, and in-house teams. (5) Propose a conceptual framework, the Data Definition Maturity Model (DDMM), to help organizations assess and improve their data definition practices and, by extension, their AI readiness.
The study is based on 40 semi-structured interviews (Kallio et al., 2016; Tate et al., 2023) with marketing analytics professionals across agencies, in-house teams, and independent consultants. An inductive thematic analysis is used to map current practices and synthesize a maturity model.
The paper makes three main contributions. First, at a conceptual level, it foregrounds the data definition phase as a distinct and critical component of the marketing analytics lifecycle, integrating insights from data quality, governance, and stakeholder collaboration literatures (Abraham et al., 2019; Bhattacharya & Korschun, 2008; Wand & Wang, 1996). Second, it develops the Data Definition Maturity Model (DDMM), a five-level framework that captures how organizations move from ad hoc, person-dependent practices to optimized, strategically aligned, and ethically governed data definition processes (Becker et al., 2009; Paulk et al., 1993; Wendler, 2012). Third, it provides a practical roadmap for governance and AI readiness, showing how improvements in data definition maturity support more reliable analytics and robust foundations for AI initiatives.
The remainder of the paper is structured as follows. We first review literature on marketing analytics, data quality, maturity models, organizational context, and stakeholder collaboration. We then outline the qualitative methodology. Next, we present the thematic findings, followed by a comparison of practices across organizational contexts. We subsequently develop the DDMM and discuss its theoretical and practical implications. Finally, we conclude and outline avenues for future research, including quantitative validation of the DDMM.
Literature Review and Theoretical Foundation
Marketing analytics encompasses the processes and technologies used to collect, analyze, and apply data to improve marketing decisions (Davenport & Harris, 2007; Wedel & Kannan, 2016). Frameworks typically describe a lifecycle from goal-setting and data collection through analysis, insight generation, and action (Germann et al., 2013; Varshitha et al., 2023). Within this lifecycle, data definition sits upstream of collection and analysis: it shapes what is captured, how it is categorized, and how it can be interpreted later.
While many lifecycle models implicitly assume that metrics and events are well-defined, practitioner accounts highlight that this is often not the case. Misalignment over metrics (e.g., different teams calculating “churn rate” or “conversion” differently) leads to disputes, rework, and loss of trust in data, a theme echoed in participant narratives in this study and in broader data quality research (Abbasi et al., 2016; Haug et al., 2011). In AI contexts, such misalignment directly contaminates training data and model evaluation, limiting the ability to move beyond dashboards to predictive or prescriptive analytics.
Data quality is multi-dimensional, encompassing accuracy, completeness, timeliness, and consistency (Haug et al., 2011; Wand & Wang, 1996). Many of these dimensions are determined “at the source”, when data is first defined and collected. Definitional choices (e.g., thresholds, attribution rules, cohort definitions) influence not only the values recorded but also which phenomena are rendered visible to decision-makers.
Data governance frameworks emphasize structures, roles, and processes to ensure that data is managed as an asset (Abraham et al., 2019; Alhassan et al., 2016; Khatri & Brown, 2010). However, governance discussions often focus on master data, security, and stewardship at an enterprise level; the concrete processes by which marketing teams agree on KPI definitions, document them, and manage changes over time are less frequently addressed. There is also growing recognition that governance must encompass ethical and privacy considerations, especially under regimes such as GDPR (Martin & Murphy, 2017). These considerations are increasingly relevant at the point of data definition, where decisions about what to measure and how can have compliance and reputational consequences.
Maturity models describe staged progressions through which capabilities become more systematic, controlled, and optimized (Becker et al., 2009; Paulk et al., 1993; Pöppelbuß & Röglinger, 2011; Wendler, 2012). In analytics, maturity models assess dimensions such as data infrastructure, analytical techniques, culture, and governance (Grossman, 2018; Król & Zdonek, 2020; Langer, 2025; Santos-Neto & Costa, 2019).
Recent reviews highlight both the proliferation of maturity models and their limitations, including overlapping constructs and limited empirical grounding (Langer, 2025; Smajli et al., 2024). Importantly, most analytics maturity frameworks treat the definition of metrics and KPIs as an implicit sub-dimension rather than a focal capability. This leaves a gap: we lack a model that explicitly conceptualizes maturity in the data definition phase, linking micro-level practices (templates, workshops, documentation) to macro-level outcomes such as data quality, governance, and AI readiness.
Organizational context shapes how analytics capabilities develop (Gupta & George, 2016; Mikalef et al., 2020; Vial, 2019). Factors such as organization size, digital orientation, sector, and leadership support influence the resources available for formalizing processes and investing in governance. For instance, large, data-driven firms may establish data councils and cross-functional committees, while small firms or start-ups may rely on a single analyst or external freelancer.
Service providers such as agencies must codify methods that work across diverse clients (Côrte-Real et al., 2017), while freelance consultants operate under tighter timeframes and limited formal authority. These contextual differences imply that one-size-fits-all prescriptions for data definition are unlikely to work. A maturity model that accounts for different organizational archetypes can help tailor expectations and pathways for improvement.
Data definition is inherently socio-technical: it involves not just specifying fields and events but aligning multiple stakeholders around shared meanings (Barki & Hartwick, 1989; Bhattacharya & Korschun, 2008). Marketing managers, analysts, IT specialists, product owners, legal/privacy experts, and external partners all have stakes in how metrics are defined and used.
Effective collaboration in this phase can lead to more relevant, trusted metrics and smoother downstream decision-making. Cross-functional alignment is a longstanding theme in IS and management research (Ancona & Caldwell, 1992; Reich & Benbasat, 2000), and boundary-spanning roles, such as business analysts or “analytics translators”, are often critical to bridge language and perspective gaps (Levina & Vaast, 2005).
However, collaboration is challenging. Power dynamics, “turf wars” over data ownership, and differences in data literacy can derail attempts at consensus. Participants in this study described both high-collaboration settings (joint KPI workshops, shared documentation) and siloed environments where misaligned definitions created persistent friction. These dynamics are central to our understanding of methodological maturity in data definition and are explicitly reflected in the DDMM.
Methodology
Given the exploratory nature of our research question, which seeks to understand how and why data definition practices vary across contexts, a qualitative design was appropriate (Edmondson & McManus, 2007). We conducted 40 semi-structured interviews with marketing analytics professionals, enabling in-depth exploration of practices, challenges, and perceptions across a relatively diverse sample (Boddy, 2016; Hennink et al., 2021; Malterud et al., 2016).
Sampling Strategy and Participant Recruitment
We employed purposive sampling to capture variation across organizational contexts (agencies/consultancies, in-house teams, independent freelancers), industries, company sizes, and regions (Jansen, 2010). Participants were recruited through professional networks, industry communities, and snowball referrals, targeting individuals with significant experience in digital analytics, marketing measurement, or related roles.
Inclusion criteria required direct involvement in defining metrics or implementation requirements (e.g., specifying events, tag plans, KPI definitions) rather than purely consuming reports. Recruitment continued until the research team judged that additional interviews were yielding diminishing new insights relative to the research objectives, consistent with qualitative “information power” principles (Fusch & Ness, 2015; Malterud et al., 2016).
The final sample comprised 40 professionals spanning agencies/consultancies, in-house analytics/marketing teams, and independent freelancers/consultants, working across sectors such as ecommerce, telecom, financial services, hospitality, and B2B services. Company sizes ranged from small start-ups to large multinationals, and participants were based primarily in Europe and North America, with some representation from other regions. Experience in marketing analytics and related roles varied from mid-career practitioners to senior leaders with over a decade of experience.
Figure 1 summarizes the sample distribution by organization type, sector, company size, digital maturity, and AI adoption.
Figure 1. Participant recruitment and sample characteristics.
Data Collection Procedure and Ethical Considerations
Interviews were conducted via videoconferencing platforms, consistent with contemporary qualitative practice and guidance on online interviewing (Archibald et al., 2019; Deakin & Wakefield, 2014; Lobe et al., 2020, 2022). Sessions typically lasted 45–75 min. An interview guide with open-ended prompts probed:
- How participants approach defining metrics and KPIs.
- Stakeholder involvement and collaboration.
- Tools, templates, and documentation used.
- Challenges, trade-offs, and illustrative incidents.
The guide was iteratively refined as interviews progressed.
All interviews were audio- or video-recorded with permission and transcribed verbatim. The study protocol was reviewed and approved by the Ethics Committee of the University of Deusto (Faculty of Engineering, Doctoral Programme in Engineering for the Information Society and Sustainable Development), and all participants gave informed consent prior to taking part.
Transcripts were de-identified before analysis: names, specific company/product references, and other direct identifiers were removed or replaced with neutral labels, following best practices on anonymisation and identity protection (Pascale et al., 2022; Saunders et al., 2015). Data was stored on secure, access-restricted drives. Where participants shared artifacts (e.g., measurement-plan templates, KPI dictionaries), these were used as contextual material with permission (DeJonckheere & Vaughn, 2019).
Data Analysis
We conducted an inductive thematic analysis, following Braun and Clarke’s (2006) six-phase approach. After familiarization with the transcripts, the first author generated initial codes capturing recurring practices (e.g., “ad hoc definitions,” “KPI glossary,” “alignment workshop”), challenges, and contextual factors. Codes were iteratively refined into broader themes and subthemes relating to formality, stakeholder collaboration, tools and documentation, and pain points.
To support this manual, reflexive process, we used lightweight natural language processing (NLP) tools to generate descriptive corpus summaries (e.g., frequency lists, keyword-in-context displays). These tools assisted with navigation and triangulation but did not replace interpretive coding decisions, which remained researcher-led in line with qualitative best practice (Braun & Clarke, 2006; Kornbluh, 2015; Korstjens & Moser, 2018).
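To give a concrete sense of the descriptive summaries mentioned above, the following minimal sketch builds a word-frequency list and a keyword-in-context (KWIC) display from transcript excerpts using only the Python standard library. It is not the tooling used in this study, which is not specified here; the function names, the simple regular-expression tokenizer, and the context window size are illustrative assumptions. Outputs of this kind help a researcher navigate a corpus; they do not code it.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; curly apostrophes are normalized so "it’s" stays one token."""
    return re.findall(r"[a-z']+", text.lower().replace("’", "'"))

def frequency_list(texts, top_n=10, stopwords=frozenset()):
    """Return the top_n (token, count) pairs across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common(top_n)

def kwic(texts, keyword, window=4):
    """Return (left context, keyword, right context) triples for every hit."""
    keyword = keyword.lower()
    hits = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((left, token, right))
    return hits

# Excerpts quoted elsewhere in this paper, used here only as toy input.
excerpts = [
    "If leadership asks for a new KPI, we scramble to define it and get the data.",
    "We run a KPI alignment workshop at project kickoff.",
]
print(frequency_list(excerpts, top_n=3))
for left, kw, right in kwic(excerpts, "kpi", window=3):
    print(f"{left:>20} [{kw}] {right}")
```

In a real corpus, a stopword list would normally be passed to `frequency_list` so that function words such as "a" and "we" do not dominate the top of the list.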
Coding reliability and interpretive rigour were enhanced through iterative team discussion, memoing, and comparison across cases and contexts (Edmondson & McManus, 2007; Kornbluh, 2015).
Findings: Current Practices and Challenges in Data Definition
Four interrelated themes describe how marketing analytics professionals currently approach the data definition phase: (1) Spectrum of Formality in Data Definition Approaches, (2) Stakeholder Collaboration: Dynamics, Enablers, and Barriers, (3) Tools and Templates Ecosystem, and (4) Challenges and Pain Points in the Data Definition Lifecycle. Each theme maps onto different levels of methodological maturity and indirectly signals AI readiness, as summarized at the end of each subsection.
Spectrum of Formality in Data Definition Approaches
Participants described a wide spectrum of practices, from highly improvised to well-structured processes. At the ad hoc end, metrics were often defined reactively, under time pressure, with little documentation. One in-house analyst noted: “If leadership asks for a new KPI, we scramble to define it and get the data… there’s no consistent system.”
In such settings, different teams or individuals might maintain their own implicit definitions. These organizations typically lacked templates or formal checkpoints; data definition was person-dependent and fragile, especially when staff turned over.
At the more formalized end of the spectrum, participants described standardized templates, checklists, and approval processes. Some organizations had formal measurement plans embedded into project kickoffs, with joint definition workshops and mandatory documentation of each KPI’s name, formula, data source, and owner. In one high-maturity in-house team, any new KPI required approval from a cross-functional “Data Council,” and definitions were stored in an enterprise-wide data dictionary. As the participant put it: “Now it’s part of our DNA to nail down definitions upfront.”
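The mandatory fields described above (name, formula, data source, owner) and the council approval step can be expressed as a minimal data-dictionary sketch. It illustrates the idea under stated assumptions and does not describe any participant's system: the class names, the `approved_by` field, and the rule that an existing KPI may only change through a separate change request are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Fields the text describes as mandatory for every KPI entry.
REQUIRED_FIELDS = ("name", "formula", "data_source", "owner")

@dataclass(frozen=True)
class KPIDefinition:
    name: str
    formula: str
    data_source: str
    owner: str
    approved_by: Optional[str] = None  # hypothetical: who signed off, e.g. a data council

    def missing_fields(self):
        """Names of required fields that are empty or whitespace-only."""
        return [f for f in REQUIRED_FIELDS if not getattr(self, f).strip()]

class KPIDictionary:
    """A toy enterprise data dictionary keyed by case-insensitive KPI name."""

    def __init__(self):
        self._entries = {}

    def register(self, kpi):
        missing = kpi.missing_fields()
        if missing:
            raise ValueError(f"{kpi.name or '<unnamed>'}: missing {', '.join(missing)}")
        if kpi.approved_by is None:
            raise PermissionError(f"{kpi.name}: needs approval before it can be published")
        key = kpi.name.strip().lower()
        if key in self._entries:
            raise ValueError(f"{kpi.name}: already defined; propose a change instead")
        self._entries[key] = kpi

    def lookup(self, name):
        return self._entries[name.strip().lower()]

dictionary = KPIDictionary()
dictionary.register(KPIDefinition(
    name="Conversion rate",
    formula="orders / sessions",         # illustrative formula only
    data_source="web analytics export",  # hypothetical source
    owner="Digital marketing lead",
    approved_by="Data Council",
))
print(dictionary.lookup("conversion RATE").formula)
```

Rejecting duplicate names, rather than silently overwriting them, is one way to keep a single agreed definition per metric; how changes are then proposed and approved is left open here.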
Between these poles lay emerging practices: rudimentary templates or glossaries existed, often introduced by a motivated individual or consultant, but usage was inconsistent, and documents quickly became outdated. Participants in this “in-between” space often said, “We know we should be doing X, we just haven’t gotten there yet.”
This spectrum forms the backbone of the DDMM. Level 1 corresponds to ad hoc, person-dependent practices; Level 2 to emerging structure; Level 3 and above to standardized, managed processes with governance. From an AI readiness perspective, organizations at Levels 1–2 lack the stable, consistent training data necessary for reliable models; any AI initiative operates on shifting definitional sand. Higher levels, by contrast, create more stable and interpretable data foundations for model development and monitoring.
Stakeholder Collaboration: Dynamics, Enablers, and Barriers
A second theme concerns how stakeholders collaborate (or fail to) during data definition. In more collaborative contexts, participants described cross-functional workshops where marketing, product, analytics, IT, and sometimes legal/privacy stakeholders jointly defined KPIs and implementation requirements. One analytics director explained: “We run a KPI alignment workshop at project kickoff. Marketing, Product, Data Science all in one room. It’s not always easy… but by the end, we have a single agreed definition for every metric. It has saved us so much grief later on.”
Such settings often featured a boundary-spanning role, a business analyst, analytics translator, or data steward, responsible for facilitating dialogue and documenting decisions (Barki & Hartwick, 1989; Levina & Vaast, 2005). These roles helped “get everyone speaking the same language” and were identified as key enablers of maturity.
In more siloed environments, different teams defined and used metrics independently. A marketing manager recounted a clash between web analytics and UX teams: “We realized the web analytics team defined ‘bounce rate’ differently from our UX team… both were right, but talking about different things. It was chaos.”
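The bounce-rate clash can be made concrete with a toy calculation. The two definitions below are hypothetical stand-ins, since the teams' actual definitions were not reported: one counts single-page sessions as bounces, the other counts any session shorter than 10 seconds. Applied to the same five invented sessions, each definition is internally consistent, yet they report different numbers.

```python
from dataclasses import dataclass

@dataclass
class Session:
    pageviews: int
    duration_s: float

def bounce_rate_single_page(sessions):
    """Hypothetical definition A: a bounce is a session with exactly one pageview."""
    if not sessions:
        raise ValueError("no sessions")
    return sum(1 for s in sessions if s.pageviews == 1) / len(sessions)

def bounce_rate_short_visit(sessions, threshold_s=10.0):
    """Hypothetical definition B: a bounce is any session shorter than threshold_s seconds."""
    if not sessions:
        raise ValueError("no sessions")
    return sum(1 for s in sessions if s.duration_s < threshold_s) / len(sessions)

# Invented sessions chosen so the two definitions disagree.
sessions = [
    Session(1, 4.0),    # single page, short: a bounce under both definitions
    Session(1, 95.0),   # single page, long read: a bounce under A only
    Session(3, 6.0),    # quick multi-page skim: a bounce under B only
    Session(2, 3.0),    # another quick skim: a bounce under B only
    Session(4, 120.0),  # engaged visit: not a bounce under either
]
print(bounce_rate_single_page(sessions))  # 0.4
print(bounce_rate_short_visit(sessions))  # 0.6
```

Neither function is wrong; the disagreement disappears only once one definition is agreed, named, and documented.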
Participants linked such conflicts to a lack of structured involvement, unclear ownership, and uneven data literacy. Departmental politics also surfaced: “Every team wants metrics that make them look good,” one respondent remarked.
Stakeholder collaboration is a defining dimension of the DDMM: Levels 1–2 are characterized by limited, informal collaboration, while Level 3 and above feature structured cross-functional involvement and clear roles. From an AI readiness lens, organizations with strong collaborative practices are better placed to define training labels, evaluate model outputs, and manage model governance in a way that reflects diverse perspectives and ethical considerations.
Tools and Templates Ecosystem
The third theme concerns the tools, templates, and documentation practices that support data definition. At higher maturity levels, organizations used shared artifacts such as KPI dictionaries, standardized tracking templates, and knowledge bases (e.g., Confluence, internal wikis, shared drives) to record definitions and governance rules. A telecom respondent explained: “We have a Confluence wiki page that lists all our standard metrics with definitions and data sources. It’s open to everyone.”
Agencies often developed reusable “measurement plan” templates and internal repositories of KPI examples by vertical, which consultants could adapt for new clients. Some organizations employed more specialized tools, such as data catalog features within BI platforms or dedicated tracking-plan software with version control. However, many high-maturity cases relied on relatively simple tools (spreadsheets, shared documents) used consistently.
By contrast, lower maturity environments either had no shared tools (“it’s basically tribal knowledge”) or had official documentation that was outdated and rarely consulted. One participant admitted that although their company had a KPI handbook, “no one updates it, so it’s out of date and people stopped looking at it.” The mere existence of a template or wiki was insufficient; processes and ownership were needed to keep these artifacts alive.
In the DDMM, the presence, quality, and use of tools and documentation differentiate levels. Level 1 has little or no documentation; Level 2 features basic, inconsistently used templates; Levels 3–5 involve systematic documentation, maintenance, and increasingly automated metadata management. For AI readiness, well-maintained data dictionaries and lineage information are essential for understanding and governing training data, tracing changes over time, and auditing model inputs.
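To illustrate what tracing definitional changes over time can mean in practice, the following sketch keeps every version of a metric definition with an effective date, so that any reporting date can be matched to the logic in force at that time. The `VersionedMetric` class, the "active customers" logic strings, and the dates are hypothetical; in practice, tracking-plan software or a version-controlled repository would usually play this role.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class DefinitionVersion:
    version: int
    effective_from: date
    logic: str        # human-readable rule; could also point to a query
    change_note: str  # why the definition changed

class VersionedMetric:
    """Append-only history of one metric's definition, ordered by effective date."""

    def __init__(self, name):
        self.name = name
        self._versions = []

    def add_version(self, effective_from, logic, change_note):
        if self._versions and effective_from <= self._versions[-1].effective_from:
            raise ValueError("a new version must take effect after the current one")
        v = DefinitionVersion(len(self._versions) + 1, effective_from, logic, change_note)
        self._versions.append(v)
        return v

    def as_of(self, day):
        """Return the definition in force on the given day."""
        i = bisect_right([v.effective_from for v in self._versions], day)
        if i == 0:
            raise LookupError(f"{self.name} was not defined on {day}")
        return self._versions[i - 1]

    def changed_between(self, start, end):
        """True if comparing these two dates means comparing different definitions."""
        return self.as_of(start).version != self.as_of(end).version

active = VersionedMetric("active customers")
active.add_version(date(2022, 1, 1), "purchase in last 12 months", "initial definition")
active.add_version(date(2023, 7, 1), "purchase or login in last 90 days", "aligned with CRM team")

print(active.as_of(date(2023, 3, 15)).version)                       # 1
print(active.changed_between(date(2022, 6, 30), date(2023, 12, 31)))  # True
```

A check such as `changed_between` is one simple way to warn analysts before they compare periods that were counted under different rules.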
Challenges and Pain Points in the Data Definition Lifecycle
The fourth theme captures the pain points that arise when data definition is weak or poorly governed. Across interviews, participants emphasized that “getting the numbers right” was often less a technical challenge than a definitional one, rooted in how concepts were framed, documented, and maintained over time.
A first recurring issue was inconsistency in definitions over time, which undermined longitudinal analysis. Teams would discover that metrics such as “active customers” had been counted differently in prior years than in the present, making year-over-year comparisons unreliable and forcing analysts to spend time reconstructing historical logic rather than interpreting results.
Participants also described persistent ambiguity and disputes within meetings. Considerable time was spent debating whose figures were “correct,” only to reveal that both sets of numbers were technically accurate but based on subtly different definitions. These definitional discrepancies eroded trust in dashboards and reports, and shifted attention away from substantive discussion toward clarification of basic terms.
A further pain point concerned onboarding. New analysts or marketers often struggled to decode undocumented acronyms, legacy naming conventions, and embedded business rules. In the absence of clear, accessible documentation, they were forced to reverse-engineer logic from queries or spreadsheets, slowing their integration into the team and increasing their dependence on informal knowledge holders.
Finally, participants highlighted compliance and ethical risks, particularly when defining metrics that involve personal data under GDPR or similar regulatory frameworks, where ambiguities in definition can translate into uncertainties about legal bases, consent, and appropriate data use (Martin & Murphy, 2017).
One in-house analyst described discovering that two teams had calculated “customer churn rate” differently for six months, forcing them to restate reports, an episode the analyst called “embarrassing and avoidable.” Another participant emphasized the ethical dimension: “When defining any metric that uses customer data, we now involve our privacy officer… we learned that lesson after nearly rolling out a location-tracking KPI that would have quietly violated GDPR.”
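The practice of involving a privacy officer can be approximated as a gate in the definition workflow: a proposed metric whose inputs include fields classified as personal data is held until it has been signed off. The sketch below assumes a hypothetical, hard-coded field classification; in practice, that classification and the legal assessment behind it would come from the organization's data inventory and privacy function, and passing such a check does not by itself establish GDPR compliance.

```python
from dataclasses import dataclass

# Hypothetical classification; a real list would come from the organization's data inventory.
PERSONAL_DATA_FIELDS = frozenset({"email", "precise_location", "device_id", "ip_address"})

@dataclass
class MetricProposal:
    name: str
    input_fields: frozenset
    privacy_signoff: bool = False

def privacy_checkpoint(proposal):
    """Return the personal-data fields blocking the proposal; an empty list means it may proceed."""
    flagged = sorted(proposal.input_fields & PERSONAL_DATA_FIELDS)
    return [] if proposal.privacy_signoff else flagged

location_kpi = MetricProposal("Store visit rate", frozenset({"precise_location", "visit_count"}))
print(privacy_checkpoint(location_kpi))  # ['precise_location']

reviewed = MetricProposal("Store visit rate", location_kpi.input_fields, privacy_signoff=True)
print(privacy_checkpoint(reviewed))  # []
```

Returning the flagged field names, rather than a bare yes or no, tells the privacy reviewer exactly which inputs need a documented justification.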
In more mature organizations, pain points shifted from firefighting to incremental improvement, such as automating access to metric definitions or refining governance workflows.
These challenges function as negative indicators in the DDMM: lower levels are characterized by frequent disputes, rework, and compliance scares; higher levels by their relative absence. For AI, unresolved definitional inconsistencies and undocumented changes translate directly into data drift and model risk, making maturity in data definition a practical control for AI governance.
Comparative Analysis: Organizational Context and Maturity Differences
A core objective of the study was to examine how methodological maturity in data definition varies across organizational contexts. We compare three archetypes (agencies/consultancies, in-house teams, and independent freelancers), highlighting typical maturity patterns and their implications for AI readiness.
Agencies/Consultancies
Agencies that provide marketing analytics services to multiple clients tended to exhibit higher baseline maturity in their internal data definition practices. Market pressure to deliver consistent quality across engagements encouraged them to codify repeatable methodologies (Côrte-Real et al., 2017). As one analytics director put it: “We can’t reinvent the wheel for every client… so we have a standardized approach and toolkit for defining metrics. It’s one of our selling points.”
Agencies often used generalized measurement-plan templates, later tailored to each client, and maintained internal repositories of KPI definitions by industry. They also played an educational role, guiding clients through the process of defining and documenting KPIs and sometimes introducing practices such as KPI glossaries that clients did not previously have.
However, agencies had to adapt to clients’ legacies and politics. One consultant recalled: “Client A insisted on using their existing metric definitions even though we saw issues with them. We had to compromise and work with what they had, and gradually suggest improvements.”
Thus, agencies often operated internally at Levels 3–4 but could be constrained to lower levels by client context.
Agencies’ relatively structured approaches, internal repositories, and cross-client experience generally positioned them well to support clients’ AI ambitions, at least from a data definition standpoint. Yet when clients resisted change, agencies’ ability to create AI-ready data ecosystems was limited, underscoring that AI readiness depends not only on technical capability but also on governance and client buy-in.
In-House Corporate Teams
In-house teams exhibited the widest range of maturity. At one end, data-driven firms, particularly in tech and digital-native sectors, had sophisticated governance: cross-functional data councils, enterprise KPI dictionaries, and formal review and change processes. These cases align with high levels in analytics maturity literature (Gupta & George, 2016; Langer, 2025).
At the other end, in more traditional or resource-constrained firms, data definition was ad hoc and reactive: “If leadership asks for a new KPI, we scramble… there’s no consistent system,” reported an analyst at a mid-size manufacturing company.
Many in-house teams fell in the middle, with partial structures (e.g., repositories that were not maintained) and episodic collaboration (e.g., workshops for major projects but not routine work). These mid-level maturity teams were often acutely aware of their gaps: “We know we should be doing X, we just haven’t gotten there yet.” Incidents such as misdefined KPIs leading to bad decisions sometimes triggered improvement efforts.
Champions, individuals who had seen more mature practices elsewhere, played an important role in raising maturity, echoing broader evidence on champion-driven change (Bonawitz et al., 2020; Vial, 2019).
In-house teams at Levels 1–2 struggled to maintain stable definitions even for reporting, making AI projects risky or premature. By contrast, high-maturity in-house teams (Levels 3–4) had governance structures (e.g., data councils, stewards) and documentation that provided firmer ground for AI initiatives, including the ability to trace and justify how training labels and evaluation metrics were defined.
Independent Freelancers/Consultants
Independent practitioners often brought well-developed personal methods but operated under constraints of time, authority, and sustainability. Many freelancers described having their own KPI templates and structured questioning approaches but using them “stealthily” in conversations so as not to overwhelm clients: “I have a template in my back pocket for defining KPIs, but I don’t present it as some formal thing to the client.”
They often delivered quick wins (clarifying key metrics, creating basic measurement plans) but worried about sustainability: “I usually set up a decent measurement plan for clients, but after my contract, I’m not sure they keep it up.”
Clients of freelancers were frequently at Level 1 before engagement and perhaps at early Level 2 when the consultant left. Without internal ownership, documentation and practices often atrophied. Freelancers, therefore, acted as temporary boundary spanners or catalysts (Levina & Vaast, 2005), accelerating maturity but not guaranteeing its persistence.
For smaller organizations relying on freelancers, AI-related goals were often aspirational rather than operational. Freelancers could introduce better definitions and highlight data quality needs, but in the absence of ongoing governance, the conditions for sustained, AI-ready data ecosystems remained fragile.
Discussion and Implications
Our findings offer several insights that enrich the academic discourse on marketing analytics and capability maturity, while also providing practical implications. In this section, we discuss the results in light of existing literature and highlight their significance for theory and practice. We then outline a roadmap of actionable steps for practitioners, and finally acknowledge limitations and future research directions.
This study advances research on marketing analytics and maturity modelling by foregrounding the micro-processes of data definition. Existing analytics maturity frameworks primarily emphasize infrastructure, analytical techniques, and culture (Grossman, 2018; Król & Zdonek, 2020; Langer, 2025). Building on general maturity principles (Becker et al., 2009; Paulk et al., 1993; Pöppelbuß & Röglinger, 2011), the DDMM specifies how data definition capabilities develop from ad hoc improvisation to routinized and strategically governed practices. In doing so, it offers a processual view of how organizations progress from fragmented to integrated approaches to defining marketing data.
We conceptualize data definition as a distinct and foundational phase of the marketing data lifecycle, in which business objectives are translated into operational constructs and constraints (Abraham et al., 2019; Wand & Wang, 1996). By delineating this phase, the DDMM integrates and extends literature on data quality, data governance, and stakeholder alignment: it shows how definitional decisions condition downstream analytic and strategic outcomes, rather than treating data definitions as a taken-for-granted backdrop to analytics.
The model contributes to socio-technical perspectives on data work by specifying how methodological maturity in data definition is jointly constituted by technical artifacts and social arrangements. Higher levels of the DDMM involve not only more sophisticated artifacts (documentation, catalogs, version control) but also more structured social processes (role configurations, cross-functional workshops, and training) (Barki & Hartwick, 1989; Bhattacharya & Korschun, 2008; Levina & Vaast, 2005). This socio-technical integration responds to calls to theorize how collaboration and stakeholder involvement shape data-intensive organizing.
The DDMM also extends research on data ethics in marketing by locating privacy and fairness concerns at the point of metric specification, rather than only at later stages of data use (Martin & Murphy, 2017). By embedding ethical checks at higher maturity levels, the model broadens governance perspectives from a focus on data quality towards questions about what is considered legitimate to measure, store, and model.
Finally, our findings position data definition maturity as a mechanism linking current marketing analytics practice to AI readiness. Lower maturity levels are characterized by unstable, undocumented, and inconsistently applied definitions that hinder the auditing of training data and render AI initiatives fragile and high-risk. Intermediate levels exhibit emerging routines and shared documentation that support more reliable feature, label, and metric design. At higher levels, continuous improvement, versioned metadata, and embedded ethical checks enable organizations to manage data quality over time and to align AI systems with evolving business and regulatory conditions. Thus, the DDMM reframes robust data definition as a necessary precondition for credible AI initiatives in marketing analytics, rather than a secondary technical detail.
Managerial Roadmap for Improving Data Definition Maturity
Source: Own elaboration based on qualitative findings.
For agencies and consultants, a further implication is to explicitly position data definition maturity as part of their value proposition, while being realistic about client constraints. Freelancers, in turn, may maximize impact by ensuring that clients leave their engagement with clear documentation and an internal “owner” for maintaining it.
While the DDMM is grounded in rich qualitative data, future research should quantitatively validate and refine it. One promising avenue is to develop a measurement instrument based on the model’s indicators (e.g., presence of templates, degree of stakeholder collaboration, governance mechanisms, ethical checks). This scale could be used in survey research to:
- Confirm the factor structure corresponding to maturity dimensions (e.g., via exploratory and confirmatory factor analysis).
- Assess relationships between data definition maturity and outcomes such as analytics performance, marketing decision quality, or AI project success (e.g., via regression or structural equation modelling).
- Explore contingent effects of context (sector, size, digital orientation) on the maturity–outcome link.
Such quantitative work would complement our qualitative findings and help embed the DDMM within broader maturity modelling and analytics effectiveness research (Germann et al., 2013; Grossman, 2018; Langer, 2025).
The Data Definition Maturity Model (DDMM)
In this section, we introduce the five-level Data Definition Maturity Model synthesized from our findings, linking each level to maturity theory and providing examples. Each level is characterized by practices identified in the Findings, with anonymized illustrations from participants. The DDMM serves as a framework for organizations to assess and improve their data definition maturity.
Level 1, Ad hoc: At Level 1, processes for data definition are chaotic or non-existent. Metrics and KPIs are defined reactively, often under pressure (e.g., when a report is needed immediately). There is little to no documentation of metric definitions. Practices are person-dependent: if a particular analyst or manager leaves, their understanding of the data leaves with them. In the language of capability maturity models, this corresponds to the Initial stage (Paulk et al., 1993). One participant described this stage succinctly: “It’s the wild west. Everyone has their own version of the truth.” The absence of methodology leads to highly variable outcomes that depend on individual skill and memory, echoing Doherty and King’s (2001) observation that a lack of methodology produces person-dependent results. Ethical and governance considerations are largely absent at this stage; it is purely survival mode, focused on delivering numbers by any means. The risk of errors, miscommunication, and poor decisions is highest at Level 1, as multiple incidents shared by participants illustrate. Organizations at this level often do not realize that the root cause of their analytics frustrations is the ad hoc nature of their data definitions. AI readiness is very low: data is unstable, poorly documented, and unsuitable as a reliable foundation for training or monitoring AI models.
Level 2, Emerging: Level 2 represents the introduction of basic structure and templates. There may be rudimentary templates for defining metrics (e.g., a one-page KPI definition form) or an initial attempt at a metrics glossary. However, their use is inconsistent. Documentation may exist but is often incomplete or quickly outdated. The process still relies on specific individuals to drive it; for example, one proactive analyst might maintain a spreadsheet of definitions without any institutional mandate. Stakeholder collaboration remains limited: definitions may be refined within the analytics team, but not all relevant departments are fully involved. This level is analogous to a Repeatable but not yet fully Defined process in maturity model terms (Becker et al., 2009). For example, a mid-size company created a basic KPI catalog after a consulting engagement, but months later only half of its new projects were actually registering their metrics in it. Level 2 is an important transitional stage: it is where awareness grows and quick wins (such as resolving a recurring misdefinition problem) demonstrate the value of a structured approach, building momentum for more formal practices. AI readiness is low: some definitional clarity exists, enabling exploratory analytics or pilots, but inconsistent adoption and weak governance create significant risk for sustained AI deployment.
Level 3, Collaborative: The process for data definition becomes standardized and documented across the organization. There is a defined procedure (e.g., a metric definition workshop is required as part of a marketing campaign kickoff). Key practices include cross-functional collaboration in defining metrics: marketing, analytics, IT, and other functions all have a voice (Barki & Hartwick, 1989; Bhattacharya & Korschun, 2008). Documentation is maintained, typically in a shared repository or wiki, and is kept reasonably up to date. The organization has moved beyond dependence on individuals; roles and responsibilities for data definition are assigned (for instance, “the Analytics Manager must approve all new KPI definitions”). This level reflects a mature Defined stage of process maturity, in which standards are in place (Wendler, 2012). Boundary-spanning roles often emerge at this stage to facilitate collaboration (Levina & Vaast, 2005). Many of the pain points of earlier levels (misunderstandings, conflicting definitions) are greatly reduced, as participants from such organizations attested. However, challenges can still arise, especially when the scope changes or new types of metrics emerge that the current process has not yet encountered. Ethically, Level 3 organizations begin integrating governance checks; for example, a legal review might be required if a new metric involves personal data, embedding privacy considerations into the definition process (Martin & Murphy, 2017). AI readiness is moderate to high: organizations can reliably define and track features and labels, monitor changes, and engage appropriate stakeholders in AI-related governance.
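The DDMM does not prescribe tooling, but the two Level 3 rules described above (an assigned approver for every new KPI definition, and a legal review when a metric involves personal data) can be made concrete as a simple publication gate. The following Python sketch is illustrative only: `MetricDefinition`, `missing_approvals`, `publish`, and the role names `analytics_manager` and `legal` are hypothetical and were not reported by participants.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical role names, mirroring the participant example
# "the Analytics Manager must approve all new KPI definitions".
APPROVER_ROLE = "analytics_manager"
LEGAL_ROLE = "legal"


@dataclass
class MetricDefinition:
    name: str
    description: str
    owner: str
    uses_personal_data: bool = False
    approvals: set[str] = field(default_factory=set)  # roles that have signed off


def missing_approvals(metric: MetricDefinition) -> list[str]:
    """Return the sign-offs still required before the metric may be published."""
    required = [APPROVER_ROLE]
    # Governance check: metrics that involve personal data also need legal review.
    if metric.uses_personal_data:
        required.append(LEGAL_ROLE)
    return [role for role in required if role not in metric.approvals]


def publish(metric: MetricDefinition, glossary: dict[str, MetricDefinition]) -> None:
    """Add the metric to the shared glossary only once every sign-off is recorded."""
    pending = missing_approvals(metric)
    if pending:
        raise ValueError(f"{metric.name}: awaiting approval from {', '.join(pending)}")
    glossary[metric.name] = metric


if __name__ == "__main__":
    glossary: dict[str, MetricDefinition] = {}
    lead = MetricDefinition(
        name="qualified_lead",
        description="Hypothetical example: form submission from a business email domain",
        owner="marketing",
        uses_personal_data=True,
    )
    lead.approvals.add(APPROVER_ROLE)
    print(missing_approvals(lead))  # ['legal']
    lead.approvals.add(LEGAL_ROLE)
    publish(lead, glossary)
    print(sorted(glossary))  # ['qualified_lead']
```

Encoding the rule as a check rather than as an informal convention is one way to reduce dependence on individuals, which is the shift the model associates with moving from Level 2 to Level 3.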
Level 4, Strategic: Level 4 represents a further refinement in which data definition practices are not only standardized but also managed for continuous improvement. Organizations at this level monitor and audit the data definition process itself. They institutionalize lessons learned: if a miscommunication occurs, they treat it as a process failure to be analyzed and fixed, not just a one-off error. In capability maturity parlance, this approaches the Managed stage, where processes are measured and controlled (Wendler, 2012). Our study yielded fewer clear examples at this level, as it is quite advanced, but one large tech firm came close: it had a governance committee that not only approved definitions (Level 3 behavior) but also regularly revisited whether those definitions still served business needs, and it used feedback from data consumers to refine metric definitions or documentation for clarity. Tools might include specialized metadata management software with features such as version control and automated lineage tracking (so that if a definition changes, reports using it are flagged). Ethical considerations and compliance are fully embedded: for example, when a new metric is defined, the system might automatically prompt a privacy impact assessment where needed, ensuring that no step is missed. Level 4 organizations thus experience very few data definition-related problems; they proactively manage the quality and appropriateness of their metrics. AI readiness is high: stable, well-governed definitional infrastructure enables systematic AI development and monitoring, including impact assessments and traceability of model inputs over time.
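The version control and automated lineage tracking mentioned for Level 4 (flagging reports when a definition they rely on changes) can be illustrated with a small versioned repository. This is a minimal Python sketch under simplifying assumptions: an in-memory store, one formula string per version, and reports that explicitly register the metrics they use. `MetricRepository` and its methods are hypothetical and do not correspond to any specific metadata management product.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DefinitionVersion:
    version: int
    formula: str
    rationale: str  # why the definition was introduced or changed


class MetricRepository:
    """Hypothetical versioned metric repository with simple report lineage."""

    def __init__(self) -> None:
        self.history: dict[str, list[DefinitionVersion]] = defaultdict(list)
        self.consumers: dict[str, set[str]] = defaultdict(set)  # metric -> reports using it
        self.flagged: set[str] = set()  # reports awaiting review after a change

    def define(self, metric: str, formula: str, rationale: str) -> DefinitionVersion:
        """Append a new version; earlier versions are kept for auditability."""
        versions = self.history[metric]
        new = DefinitionVersion(len(versions) + 1, formula, rationale)
        versions.append(new)
        # Any change to an existing definition flags every downstream report.
        if new.version > 1:
            self.flagged |= self.consumers[metric]
        return new

    def register_report(self, report: str, metrics: list[str]) -> None:
        """Record which metrics a report depends on (its lineage)."""
        for metric in metrics:
            self.consumers[metric].add(report)

    def current(self, metric: str) -> DefinitionVersion:
        """Return the latest version; raises KeyError if the metric was never defined."""
        if not self.history.get(metric):
            raise KeyError(metric)
        return self.history[metric][-1]


if __name__ == "__main__":
    repo = MetricRepository()
    repo.define("active_user", "logged in within the last 30 days", "initial definition")
    repo.register_report("monthly_dashboard", ["active_user"])
    repo.register_report("churn_model_labels", ["active_user"])
    repo.define("active_user", "performed a key action within the last 28 days",
                "hypothetical change agreed with product analytics")
    print(repo.current("active_user").version)  # 2
    print(sorted(repo.flagged))  # ['churn_model_labels', 'monthly_dashboard']
```

Keeping every version rather than overwriting it is what preserves comparability over time and supports the traceability of model inputs that the text associates with this level.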
Level 5, Optimized: Data definition maturity becomes part of the organization’s continuous improvement and strategic planning cycles. This is akin to the Optimizing stage in classic maturity models (Paulk et al., 1993). At this highest level, the organization not only has optimized internal processes but also dynamically aligns data definition practices with strategic objectives. For instance, when the business strategy shifts, there is a mechanism to revisit KPIs and realign their definitions with the new strategy, which Reich and Benbasat (2000) would regard as high strategic alignment in information systems. Organizations at Level 5 treat data definition as a strategic capability in itself: they might benchmark their practices against industry best practice, invest in training employees in data literacy and definition skills, and perhaps even influence external standards (for example, by contributing to industry-wide definitions of certain digital marketing metrics). A hallmark of this level is that methodological maturity in data definition feeds a broader culture of data-driven decision-making. Marketing teams trust the data because they trust the definitions and the process behind them, leading to widespread use of metrics in strategic decisions (Wedel & Kannan, 2016). Moreover, at Level 5, the organization is agile in its data definitions: it can evolve metrics responsibly as the business evolves, without losing comparability or stakeholders’ trust. In terms of ethics and governance, Level 5 organizations are likely to be leaders, ensuring that their metrics and data usage not only comply with laws but also set an example for transparency and accountability (Martin & Murphy, 2017). While it is unlikely that any organization in our study had fully achieved Level 5 (which remains aspirational), elements of it were visible in the most advanced ones, such as an attitude of continuous refinement and a view of the data definition process as vital infrastructure to be maintained.
Data Definition Maturity Model (DDMM): Summary of Levels 1–5
Source: Own elaboration based on qualitative findings.
Figure 2 illustrates the Data Definition Maturity Model (DDMM), tracing the evolution of data practices through real-world scenarios. The progression ranges from a startup’s Ad Hoc use of personal formulas and a mid-sized company’s Emerging spreadsheet tracking to the Defined governance of cross-functional wikis. It culminates in the Managed oversight of formal data councils and the Optimized, AI-driven strategies employed by top-tier tech firms.
Progression of real-world practices from reactive individual efforts to a proactively embedded data culture.
Conclusion
This paper has examined the often-overlooked data definition phase of marketing analytics. Through 40 qualitative interviews across agencies, in-house teams, and freelancers, we have shown that how organizations define, document, and govern metrics is highly variable and consequential. Low maturity produces recurring confusion, rework, and compliance risks; higher maturity supports more reliable analytics, stronger data cultures, and AI readiness.
The proposed Data Definition Maturity Model (DDMM) offers a structured way to conceptualize and improve methodological maturity in this phase, integrating insights from maturity modelling, stakeholder theory, data governance, and ethics. Practically, it provides marketing analytics leaders and practitioners with a roadmap to diagnose their current state, prioritize next steps, and make a persuasive case for investing in foundational work that often feels unglamorous compared to dashboards or models.
Ultimately, we argue that investing in rigorous data definition is not optional but foundational. Establishing shared, explicit definitions early aligns stakeholders, reduces ambiguity, and accelerates downstream value creation (Alhassan et al., 2016). As one participant succinctly put it, “If you don’t define what success looks like at the start, no amount of data later will give you a clear answer.” For organizations aiming not just to deploy analytics and AI, but to do so responsibly and effectively, the journey toward maturity begins with that definition work.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
