MNR-FP-Growth: An Intelligent Pattern Mining Algorithm for Energy-Aware Redundancy Reduction in Building Management

Abstract

Effective building energy management (BEM) relies on extracting actionable intelligence from large volumes of operational data. Association Rule Mining (ARM) is a cornerstone of knowledge discovery. However, traditional techniques face a dual challenge: they either generate a prohibitive volume of redundant patterns that obscure decision-making, or they employ aggressive pruning that compromises interpretability. In addition, existing approaches often treat energy awareness as a disjointed post-processing step rather than an intrinsic constraint. To address these issues, this paper proposes Minimal Non-Redundant FP-Growth (MNR-FP-Growth), an intelligent algorithm that integrates energy-semantic redundancy pruning directly into the frequent pattern mining process. The core innovation is a multi-criteria pruning mechanism: a pattern is eliminated only when it is both statistically redundant and operationally inefficient, that is, when it has the same support as an immediate subset and an equal or higher energy penalty. This approach retains a compact, non-redundant set of patterns with high operational utility. In an evaluation on one year of real-world building operational data, MNR-FP-Growth reduces the number of patterns by 11.2% relative to standard FP-Growth at a 2% minimum support threshold (18.5% at 0.5%). It also reduces runtime by 61% relative to the state-of-the-art FP-Close at 0.5% minimum support while maintaining high structural agreement with it (Jaccard similarity $\geq$ 0.96). In the rule-generation stage, 34% of the derived rules represent zero-cost energy-saving opportunities, such as optimizing comfort parameters during low-occupancy intervals. By unifying statistical rigor with energy semantics, MNR-FP-Growth offers a robust, computationally efficient foundation for intelligent data-driven optimization in smart building systems.

Keywords

Energy-aware data mining; MNR-FP-Growth; energy efficiency; intelligent building management; frequent pattern mining; redundancy pruning

1. Introduction

Operational efficiency in buildings is a crucial factor for global energy sustainability, as buildings account for a substantial share of worldwide electricity consumption and carbon emissions (Pérez-Lombard et al., 2008). Although advances in sensing and data acquisition have offered unprecedented insights into building operations, a significant challenge persists: converting extensive multivariate time-series data into actionable energy-saving measures while maintaining occupant comfort (Amasyali & El-Gohary, 2018).

Association Rule Mining (ARM) (Agrawal et al., 1993) has emerged as a powerful technique for extracting knowledge from large building-operation datasets. It is particularly effective at identifying significant co-occurrence relationships among operational states, such as occupancy, equipment settings, and environmental conditions (Fan et al., 2015). In practice, however, ARM often generates an excessive number of rules, which obscures genuinely actionable insights, increases computational demands, and complicates decision-making for facility managers. This phenomenon, known as pattern explosion, has motivated condensed representations, although these offer only partial mitigation and entail trade-offs. For example, FP-Close reduces the output size by retaining only closed itemsets (those with no proper superset of identical support) but often retains longer, less interpretable patterns (Pasquier et al., 1999). FP-Max achieves extreme compression by retaining only maximal itemsets (those with no frequent superset), at the expense of discarding intermediate patterns that are crucial for understanding operational sequences (Grahne & Zhu, 2003). Conversely, setting a high minimum support threshold to avoid pattern explosion introduces the complementary problem of missing patterns, in which rare but potentially valuable patterns are discarded (Kiran & Reddy, 2009). Additionally, while weighted ARM methods incorporate domain-specific importance, they generally apply it as a computationally inefficient post-processing step (Ahmed et al., 2008) and therefore do not address redundancy during the mining process itself.

In contrast, the proposed MNR-FP-Growth (Minimal Non-Redundant FP-Growth) introduces a fundamentally different pruning logic. Rather than examining all proper supersets (as in FP-Close) or all frequent supersets (as in FP-Max), it operates at the granularity of immediate subsets: a pattern is retained only when it adds new frequency information or provides an energy benefit over its immediate subsets. This yields the minimal representative of each support class, that is, the smallest itemset capturing the frequency information, unless a larger pattern is more energy-efficient. By integrating energy-aware pruning directly into the mining recursion, MNR-FP-Growth eliminates statistically redundant and energetically inefficient patterns at the point of discovery rather than through disjointed post-filtering. This during-mining approach avoids the overhead of generating exhaustive pattern sets only to prune them later, and it lets energy semantics guide the exploration of the search space. Figure 1 presents a conceptual comparison of MNR-FP-Growth with standard and condensed methods.

Figure 1.

Comparison of pattern mining approaches, positioning MNR-FP-Growth against standard and condensed methods.

The primary contributions of this work are as follows:

A novel pruning criterion applied during mining that simultaneously ensures pattern minimality while considering energy cost.

The first application of weight-aware pruning to the building energy management domain, where patterns carry operational semantics beyond statistical frequency.

Demonstration of practical impact: a 61% runtime reduction relative to FP-Close and 34% zero-cost energy-saving rules, translating algorithmic efficiency into actionable energy conservation.

The remainder of this paper is structured as follows: Section 2 reviews related work in building data mining and condensed representations. Section 3 details the methodology and the MNR-FP-Growth algorithm. Section 4 presents the experimental results and analysis. Section 5 discusses limitations. Finally, Section 6 offers concluding remarks.

2. Background

Modern buildings generate substantial amounts of diverse data from sensors, automation systems, and Internet of Things (IoT) platforms, creating opportunities to move from conventional rule-based control to more adaptive, data-driven strategies. In this context, frequent pattern mining has emerged as a potent tool for extracting operational insights. However, its direct application in energy domains faces several challenges, including rule redundancy, limited interpretability, and weak integration of domain semantics. The following subsections examine the foundations and limitations of existing approaches, which motivate the proposed MNR-FP-Growth algorithm.

2.1. Data-Driven Building Operations

The transition from static, schedule-based control to dynamic, data-driven management is a cornerstone of modern building energy efficiency. The availability of high-resolution data from building automation systems and IoT sensors has enabled a shift towards diagnostic and prognostic analytics (Drgoňa et al., 2020). Within this paradigm, unsupervised learning techniques such as clustering and pattern mining are particularly valuable for uncovering hidden structures without requiring pre-labeled data, which is often scarce in building applications (Miller et al., 2018). These methods provide interpretable insights that can directly inform operational strategies.

2.2. Pattern Mining for Energy Analytics

Association Rule Mining (ARM) (Agrawal et al., 1993) offers a framework for quantifying and extracting if-then relationships within operational data. While its applications originated in traditional market basket analysis, ARM now extends to complex systems like buildings, where it has been used to identify key drivers of energy demand (Wang et al., 2018), diagnose faulty operational sequences (Zhang et al., 2019), and analyze system-level loads (Fan et al., 2015). Additionally, the efficiency of algorithms such as Frequent Pattern Growth (FP-Growth) (Han et al., 2004) has made data-driven methods practical for large-scale building datasets, supporting applications including occupancy detection from smart meters (Kleiminger et al., 2013). More recently, in response to the challenge of overwhelming rule sets, post-mining approaches have been explored to filter or aggregate rules, aiming to distill actionable knowledge (Zhang et al., 2020).

The key features relevant to energy-aware pattern mining in building management, including temporal (hour, day, season), environmental (temperature, humidity, solar radiation), occupancy, HVAC operational state, end-use load, and thermal comfort features, are detailed in Section 3.1 and summarized in Table 1. Together, these features enable the discovery of actionable patterns for energy optimization.

Table 1.
Discretization Scheme with Normalized Illustrative Weights.

Variable	Categories (normalized illustrative weight)	Thresholds
Relative humidity	LowHumidity (0.0), NormalHumidity (0.2), HighHumidity (0.4)	<30%, 30–60%, >60%
Solar radiation	LowRadiation (0.0), MediumRadiation (0.3), HighRadiation (0.6)	≤100 W/m², 101–500 W/m², >500 W/m²
Occupancy	Empty (0.0), LowOccupancy (0.1), HighOccupancy (0.3)	0, 1–5, >5 occupants
HVAC load	LowHVAC (4.0), MediumHVAC (8.0), HighHVAC (12.0)	quantile-based
Lighting load	LowLighting (0.5), MediumLighting (1.0), HighLighting (1.5)	quantile-based
MELS load	LowMELS (0.2), MediumMELS (0.5), HighMELS (0.8)	quantile-based
Supply fan speed	FanOff (0.0), LowSpeed (0.4), MediumSpeed (0.8), HighSpeed (1.2)	0%, 1–30%, 31–70%, >70%
Zone temperature	ColdZone (0.5), ComfortZone (0.0), HotZone (0.7)	<18 °C, 18–24 °C, >24 °C

Recent advances have extended pattern mining to dynamic and weighted scenarios. Nguyen et al. (2023) introduced frequent weighted utility pattern mining with dynamic item weights, demonstrating the growing interest in weighted pattern mining. However, this approach does not integrate energy-aware pruning during FP-Growth recursion as proposed in this work.

2.3. Challenges of Rule Explosion

A well-known limitation of ARM is the exponential growth of its output. The vast number of generated rules creates interpretability and computational challenges for downstream applications (Zhang et al., 2020). To address this, compact representations based on closed and maximal itemsets have been developed. Closed itemset mining (Pasquier et al., 1999) retains only itemsets that have no proper superset with the same support, i.e., the largest itemset among those sharing the same supporting transactions, while maximal itemset mining (Grahne & Zhu, 2003) retains only frequent itemsets that have no frequent superset. Each method strikes a different balance between conciseness and completeness, but neither fully eliminates redundancy in energy analytics.

Recent studies on closed pattern mining have addressed the problem of pattern explosion in weighted and utility-based contexts. Le et al. (2026) proposed an efficient algorithm for mining frequent weighted utility closed patterns, demonstrating that closed representations can significantly reduce pattern counts while preserving information. However, these methods do not incorporate energy semantics or immediate-subset pruning during mining.

2.4. Incorporating Semantic Weights

An important evolution in pattern mining has been the shift from purely frequency-based discovery to approaches that incorporate domain semantics through weighted measures (Tseng et al., 2015). This perspective enables the mining process to prioritize patterns involving items of greater significance, such as high-impact energy states. Nevertheless, a common limitation of existing approaches is that item weights are often assumed to be fixed or are only introduced as a post-processing filter (Ahmed et al., 2008). Such decoupling reduces computational efficiency, since the mining procedure itself remains unaware of semantic importance during pattern construction.

The literature reports efforts to incorporate semantic or automated weights in pattern mining. Datta et al. (2021) proposed an automated weighting scheme for unweighted transactional databases based on inter-item links, while Wan et al. (2017) introduced semantic intensity for trajectory pattern mining, and Palmes et al. (2010) used tf-idf weighting for activity recognition. Despite demonstrating the value of automated weighting across various domains, none integrate automated weighting with immediate-subset pruning during FP-Growth recursion or address building energy management. This gap underscores the need for algorithms that deliver semantically meaningful, efficient, and interpretable patterns, specifically for building energy management.

3. Methodology

This section outlines the framework for discovering interpretable, energy-aware patterns from building operational data. The process begins with a description of the datasets and preprocessing steps to ensure high-quality inputs. Building on this foundation, the MNR-FP-Growth algorithm is then introduced, which extracts frequent patterns while pruning redundant itemsets. Once these patterns are identified, the methodology moves to the energy-aware association rule generation process, which extends mined patterns with operational weights to highlight low-cost energy-saving rules. By integrating these steps sequentially, this methodology combines statistical reliability with operational relevance, providing facility managers with actionable insights for energy-efficient control.

3.1. Data Collection and Preprocessing

The datasets used in this study comprised multi-source operational data from a commercial building in Berkeley, California, USA, spanning the full year of 2019 (Hong et al., 2021). The data included weather measurements (solar radiation, relative humidity), occupancy sensor readings, electricity end-uses (heating, ventilation, and air conditioning (HVAC); lighting; and miscellaneous electrical loads (MELS)), HVAC control signals (supply fan speed), and indoor zone temperatures. These heterogeneous streams exhibited temporal misalignment, differing resolutions, and missing values caused by sensor malfunctions and communication errors.

To ensure analytical consistency, a structured preprocessing pipeline consisting of temporal alignment, missing data imputation, and discretization was implemented. All datasets were unified into an hourly resolution framework comprising 8,760 timestamps (365 days $\times$ 24 hours).

Missing values were addressed using a tiered imputation strategy inspired by Cho et al. (2020), with the method selected according to the length of the missing gap:

Short gaps (<8 hours): linear interpolation to preserve local dynamics.

Moderate gaps (8–48 hours): K-Nearest Neighbors (KNN) imputation to exploit inter-variable correlations.

Extended gaps (>48 hours): matrix factorization (SoftImpute) to reconstruct long sequences from latent structures.

This strategy reduced the overall missing-data rate from 28.6% to 0.0%, while preserving both temporal patterns and inter-variable dependencies.
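The tiered strategy above amounts to classifying each run of missing hours by its length. The sketch below assumes the hourly series is a plain Python list with None marking missing values; it finds the gaps, assigns each to a tier using the thresholds stated above, and fills only short interior gaps by linear interpolation. The function names are hypothetical, and the KNN and SoftImpute tiers are not reproduced.

```python
from typing import List, Optional, Tuple

# Gap-length boundaries in hours, taken from the tiered strategy above.
SHORT_MAX = 8      # gaps shorter than 8 h -> linear interpolation
MODERATE_MAX = 48  # gaps of 8-48 h -> KNN; longer gaps -> matrix factorization


def find_gaps(series: List[Optional[float]]) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of consecutive missing (None) values."""
    gaps: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for idx, value in enumerate(series):
        if value is None and start is None:
            start = idx
        elif value is not None and start is not None:
            gaps.append((start, idx - start))
            start = None
    if start is not None:  # gap runs to the end of the series
        gaps.append((start, len(series) - start))
    return gaps


def gap_tier(length: int) -> str:
    """Map a gap length (hours) to the imputation tier responsible for it."""
    if length < SHORT_MAX:
        return "linear"
    if length <= MODERATE_MAX:
        return "knn"
    return "matrix_factorization"


def fill_short_gaps(series: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly interpolate interior gaps shorter than SHORT_MAX; leave other gaps as None."""
    filled = list(series)
    for start, length in find_gaps(series):
        end = start + length  # index of the first observed value after the gap
        if gap_tier(length) != "linear" or start == 0 or end == len(series):
            continue  # longer gaps and edge gaps are left for the other tiers
        left, right = series[start - 1], series[end]
        for k in range(1, length + 1):
            filled[start + k - 1] = left + (right - left) * k / (length + 1)
    return filled


# Hypothetical hourly readings: a 2-hour gap followed by a 10-hour gap.
hourly = [20.0, None, None, 23.0] + [None] * 10 + [25.0]
print([(start, gap_tier(n)) for start, n in find_gaps(hourly)])  # [(1, 'linear'), (4, 'knn')]
print(fill_short_gaps(hourly)[:4])                               # [20.0, 21.0, 22.0, 23.0]
```

The moderate tier could, for instance, be delegated to scikit-learn's KNNImputer; the configuration used in the original pipeline is not specified here.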

Since ARM requires categorical inputs, continuous features were discretized into semantically meaningful states derived from domain knowledge and building management practices. This transformation preserved interpretability for facility managers while enabling efficient symbolic pattern discovery.
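A minimal sketch of this discretization step, using the fixed thresholds listed in Table 1. How values falling between the printed ranges (for example 100.5 W/m² or a fan speed of 0.5%) were assigned is not stated, so the boundary handling here is an assumption. The quantile-based loads are likewise assumed to be split into terciles with Python's `statistics.quantiles`. All function names are hypothetical.

```python
from statistics import quantiles

# Fixed thresholds from Table 1; boundary handling between printed ranges is assumed.


def humidity_state(rh_percent: float) -> str:
    if rh_percent < 30:
        return "LowHumidity"
    return "NormalHumidity" if rh_percent <= 60 else "HighHumidity"


def solar_state(w_per_m2: float) -> str:
    if w_per_m2 <= 100:
        return "LowRadiation"
    return "MediumRadiation" if w_per_m2 <= 500 else "HighRadiation"


def occupancy_state(occupants: int) -> str:
    if occupants == 0:
        return "Empty"
    return "LowOccupancy" if occupants <= 5 else "HighOccupancy"


def fan_state(speed_percent: float) -> str:
    if speed_percent == 0:
        return "FanOff"
    if speed_percent <= 30:
        return "LowSpeed"
    return "MediumSpeed" if speed_percent <= 70 else "HighSpeed"


def zone_temp_state(temp_c: float) -> str:
    if temp_c < 18:
        return "ColdZone"
    return "ComfortZone" if temp_c <= 24 else "HotZone"


def tercile_states(values, suffix: str):
    """Quantile-based Low/Medium/High labels for a load series, e.g. suffix='HVAC'."""
    low_cut, high_cut = quantiles(values, n=3)  # two cut points -> three bins
    labels = []
    for v in values:
        level = "Low" if v <= low_cut else ("Medium" if v <= high_cut else "High")
        labels.append(f"{level}{suffix}")
    return labels


print(humidity_state(45), solar_state(350), fan_state(0), zone_temp_state(26))
print(tercile_states([1, 2, 3, 4, 5, 6, 7, 8, 9], "HVAC"))
```

Each hourly record then becomes one transaction, i.e. the set of its categorical states, as in Table 2.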

The numerical weights in Table 1 do not represent exact physical consumption values. Instead, they are normalized illustrative values that reflect relative operational intensities (e.g., HighHVAC > MediumHVAC > LowHVAC; HighLighting > LowLighting). For the proposed MNR pruning mechanism, only ordinal relationships are critical; precise magnitudes are not. In practical deployments, these weights can be replaced with measured energy consumption values from building meters or simulation tools.

The normalized weights in Table 1 are constructed as intra-variable ordinal scales, not as globally normalized values. For each variable, weights are assigned to preserve the relative ordering of operational intensity (e.g., HighHVAC > MediumHVAC > LowHVAC), but the numerical magnitudes are chosen for illustrative convenience rather than cross-variable comparability. This design choice is intentional and sufficient for the MNR pruning mechanism, which relies only on ordinal relationships within itemsets.

Specifically:

For HVAC load, weights 4.0, 8.0, 12.0 reflect that HighHVAC consumes more than MediumHVAC, which consumes more than LowHVAC.

For humidity, weights 0.0, 0.2, 0.4 preserve the same ordinal property but on a different numerical scale.

The ratio of 12.0 (HighHVAC) to 1.5 (HighLighting) does not imply that HVAC consumes exactly 8× more energy than lighting; these values are not directly comparable across variables.

When items from different variables are combined into an itemset, the total weight $E(X) = \sum_{i \in X} E(i)$ serves as a heuristic for aggregate operational intensity. The pruning condition $E(P) \geq E(S)$ remains valid because:

Ordinal relationships within each variable are preserved,

Summation preserves the relative ranking of itemsets as long as the ordinal structure is consistent,

The criterion uses comparison ( $\geq$ ), not absolute thresholds.

In practical deployments, these illustrative weights would be replaced with empirically measured energy values (e.g., kWh) from building meters, at which point cross-variable comparability is naturally established. The current weighting scheme is sufficient to demonstrate the algorithm’s functionality and is consistent with standard practice in weighted ARM literature (Ahmed et al., 2008).

Thus, the weights serve as a proxy for real consumption values. As shown in Section 4, even with normalized weights, the algorithm effectively identifies redundant patterns and highlights energy-saving rules.

The final preprocessed dataset contained 8,760 hourly instances, each represented by nine categorical attributes capturing environmental conditions, occupancy states, energy system loads, and thermal comfort levels. This symbolic representation provided a compact yet interpretable basis for subsequent knowledge discovery.

Table 2 presents a sample of the preprocessed dataset, illustrating the categorical representation used for pattern mining.

Table 2.
Sample of the Preprocessed Dataset (Representative Rows).

Time	Humidity	Solar	Occupancy	HVAC	Lighting	MELS	Fan	Temp
Night	LowHumidity	MediumRadiation	LowOccupancy	MediumHVAC	LowLighting	LowMELS	FanOff	ComfortZone
Morning	LowHumidity	LowRadiation	Empty	LowHVAC	LowLighting	LowMELS	FanOff	ComfortZone
Afternoon	NormalHumidity	MediumRadiation	Empty	MediumHVAC	LowLighting	LowMELS	FanOff	ComfortZone
Evening	LowHumidity	MediumRadiation	Empty	MediumHVAC	LowLighting	LowMELS	FanOff	ComfortZone

3.2. Minimal Non-Redundant FP-Growth

Frequent pattern mining in building energy datasets often yields numerous redundant itemsets: larger itemsets may share the same support as their immediate subsets, offering no new frequency information while increasing both output size and downstream processing cost. Several condensed representations have been proposed. FP-Close (Pasquier et al., 1999) favors longer supersets when supports match, reducing redundancy at the cost of compactness and interpretability, whereas FP-Max (Grahne & Zhu, 2003) discards every non-maximal itemset, risking the loss of informative intermediate patterns.

Building on these methods, this work defines pattern compactness as the total number of frequent itemsets generated by an algorithm for a given minimum support threshold; greater compactness corresponds to a lower pattern count, as more information is condensed into fewer itemsets. The evaluation compares compactness and information fidelity against the baseline methods (FP-Growth, FP-Close, and FP-Max), with fidelity measured as the Jaccard similarity to the FP-Close output.

To address the identified limitations, MNR-FP-Growth is proposed. This variant integrates redundancy pruning directly into the mining process, retaining only the shortest non-redundant itemsets, thereby preserving interpretability while reducing runtime and output size.

Each categorical state (e.g., HighHVAC, LowLighting) is associated with a normalized illustrative weight reflecting its relative operational intensity (see Table 1). For an itemset $X = \{i_1, i_2, \dots, i_n\}$, its total weight is defined as

$$E(X) = \sum_{k=1}^{n} E(i_k). \tag{1}$$

A pattern $P$ is defined as redundant if there exists at least one immediate subset $S \subset P$ (i.e., $S = P \setminus \{i\}$ for some $i \in P$) such that

$$\mathrm{support}(P) = \mathrm{support}(S) \quad \text{and} \quad E(P) \geq E(S). \tag{2}$$

Intuitively, if adding an item to $S$ does not change support and does not decrease the total weight, the larger pattern provides no additional frequency information and is no more favorable under the weight scale. Such patterns can therefore be safely pruned.

It is important to note that the pruning criterion uses only ordinal relationships between weights, not their absolute magnitudes. The condition $E(P) \geq E(S)$ requires only that the total weight of the larger pattern is not less than that of its subset, a comparison that remains valid even with approximate weights as long as the ordinal ranking of items by energy intensity is preserved (e.g., HighHVAC always has a greater weight than LowHVAC). This robustness to approximate weights is a key advantage of the MNR approach: precise metered values are not required, and only the relative ordering of operational intensity matters for redundancy detection.
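The robustness argument can be made concrete with one line of algebra. For an immediate subset $S = P \setminus \{i\}$ with $i \in P$, Eq. (1) gives

$$E(P) - E(S) = \sum_{j \in P} E(j) - \sum_{j \in P \setminus \{i\}} E(j) = E(i),$$

so the weight condition $E(P) \geq E(S)$ in Eq. (2) depends only on the weight of the single added item $i$, not on the weights of the items shared by $P$ and $S$ or on their cross-variable scale. For example, extending $\{\text{LowHVAC}\}$ by FanOff gives $E(P) - E(S) = E(\text{FanOff}) = 0.0$ under Table 1. Because every weight in Table 1 is non-negative, this difference is never negative under the illustrative scheme; only an item with a negative weight could make the weight condition fail.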

MNR-FP-Growth performs this redundancy check during mining while merging conditional subtree patterns. The recursive procedure is summarized in Algorithm 1.

Formally, the set of mined patterns is

$$F_{MNR} = \{\, P \subseteq I \mid \mathrm{support}(P) \geq \sigma \ \wedge\ \forall S \subset P \text{ with } |S| = |P| - 1 :\ \mathrm{support}(S) \neq \mathrm{support}(P) \ \lor\ E(S) > E(P) \,\}, \tag{3}$$

where $I$ is the universe of items and $\sigma$ is the minimum support threshold.
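To make Eqs. (1)–(3) concrete, the following sketch computes the MNR pattern set by brute force. It counts the support of every sub-itemset of a small, hypothetical transaction list, then keeps each frequent itemset that is not redundant with respect to any immediate subset. It is a reference filter for checking results on toy data, not the during-mining FP-tree procedure of Algorithm 1. It also assumes that singletons are always retained (the empty set is not used as an immediate subset) and that items absent from the weight table weigh 0.0.

```python
from collections import Counter
from itertools import combinations

# Illustrative weights from Table 1; items not listed default to 0.0 (assumption).
WEIGHTS = {
    "LowHVAC": 4.0, "MediumHVAC": 8.0, "HighHVAC": 12.0,
    "FanOff": 0.0, "LowSpeed": 0.4, "MediumSpeed": 0.8, "HighSpeed": 1.2,
    "ColdZone": 0.5, "ComfortZone": 0.0, "HotZone": 0.7,
}


def weight(itemset) -> float:
    """Eq. (1): total weight of an itemset."""
    return sum(WEIGHTS.get(item, 0.0) for item in itemset)


def frequent_itemsets(transactions, min_count):
    """Brute-force support counting over all sub-itemsets; only suitable for toy data."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {p: c for p, c in counts.items() if c >= min_count}


def is_redundant(pattern, support) -> bool:
    """Eq. (2): an immediate subset has equal support and weight not exceeding E(pattern)."""
    if len(pattern) == 1:
        return False  # assumption: singletons are not compared with the empty set
    for item in pattern:
        subset = pattern - {item}  # frequent by anti-monotonicity, so present in `support`
        if support[subset] == support[pattern] and weight(pattern) >= weight(subset):
            return True
    return False


def mnr_patterns(transactions, min_count):
    """Eq. (3): frequent itemsets that are not redundant w.r.t. any immediate subset."""
    support = frequent_itemsets(transactions, min_count)
    return {p: c for p, c in support.items() if not is_redundant(p, support)}


# Hypothetical toy transactions (not from the study's dataset).
TOY_TRANSACTIONS = [
    {"LowHVAC", "FanOff", "ComfortZone"},
    {"LowHVAC", "FanOff"},
    {"LowHVAC"},
    {"FanOff"},
    {"HighHVAC", "HighSpeed"},
    {"HighHVAC", "HighSpeed", "HotZone"},
]
kept = mnr_patterns(TOY_TRANSACTIONS, min_count=2)
```

With `min_count=2`, six itemsets are frequent and five survive. {HighHVAC, HighSpeed} is pruned because {HighHVAC} has the same support (2) and a lower weight (12.0 versus 13.2). {LowHVAC, FanOff} (support 2) is kept because both of its immediate subsets have support 3.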

Example

Using the weights in Table 1, consider the following case. If both $\{\text{LowHVAC}, \text{FanOff}\}$ and $\{\text{LowHVAC}\}$ occur 50 times, then

$$E(\{\text{LowHVAC}, \text{FanOff}\}) = 4.0 \geq E(\{\text{LowHVAC}\}) = 4.0,$$

so the larger pattern is pruned as redundant. Figure 2 illustrates this pruning on an FP-tree branch: the child node representing the redundant extension is removed because it does not change support and is not lighter under the weight scale (Table 3).

Figure 2.

Illustration of FP-tree branch pruning under the MNR criterion.

Table 3.

Illustrative Weights for the Example Pattern.

Item	Weight
LowHVAC	4.0
FanOff	0.0
ComfortZone	0.0

Implementation

All algorithms were implemented in Python 3.9 by extending the PyFPGrowth library. Experiments were conducted on a workstation running Windows 11 (64-bit) with an AMD Athlon Silver 3050U processor (2.30 GHz) and 8 GB RAM. Runtimes were measured in single-threaded mode to ensure a fair comparison across all methods.

Key modifications include:

Use of normalized illustrative weights for all discrete states.

Itemset weight computation to enable redundancy checks.

A pruning counter to track how many patterns are removed on-the-fly.

A toggle to switch between baseline FP-Growth (no pruning) and MNR-enabled mode.

By pruning redundant, higher-weight supersets during mining, MNR-FP-Growth produces a more compact, frequency-relevant pattern set without requiring a separate post-processing step.

3.3. Energy-Aware Association Rule Generation

Once frequent patterns are extracted using the proposed MNR-FP-Growth algorithm, the framework derives interpretable knowledge in the form of association rules. While classical ARM (Agrawal et al., 1993) evaluates rules based on support and confidence, the presented framework extends this process by incorporating an explicit energy-awareness component. This addition is crucial, as patterns alone do not indicate whether the discovered relations preserve operational efficiency or introduce additional operational intensity.

Formally, let $T$ denote the set of transactions and $P \subseteq I$ a frequent itemset, where $I$ is the set of all items. The support of $P$ is defined as

$$Supp(P) = \frac{|\{t \in T \mid P \subseteq t\}|}{|T|}.$$

Given a non-trivial partition $P = X \cup Y$ with $X \cap Y = \emptyset$ and $X, Y \neq \emptyset$, an association rule is denoted $X \to Y$, with confidence

$$Conf(X \to Y) = \frac{Supp(X \cup Y)}{Supp(X)}.$$

To integrate energy awareness, each item $i \in I$ is associated with a normalized illustrative weight $E(i)$ that reflects its relative operational intensity (e.g., HVAC load, lighting level, or occupancy state). The total weight of an itemset $S$ is

$$E(S) = \sum_{i \in S} E(i).$$

For a rule $X \to Y$, the additional operational weight introduced by the consequent is defined as

$$\Delta E(X \to Y) = E(Y),$$

since the weight of the antecedent, $E(X)$, is already present in the operating state. A rule is energy-saving if $\Delta E = 0$, i.e., it adds no extra operational intensity. Otherwise, $\Delta E > 0$ quantifies the additional operational weight required by the consequent.

This formulation allows rules to be evaluated from two complementary perspectives: (i) frequency-based measures, namely support and confidence, which quantify the statistical strength of associations in the transactional data following standard ARM convention (Agrawal et al., 1993); and (ii) operational relevance, namely the weight-based energy cost $\Delta E$, which reflects the practical impact of applying the rule. In the ARM literature, "statistical evaluation" refers to these probability-based measures, not to inferential statistics such as hypothesis testing.

The energy-aware rule generation procedure is summarized in Algorithm 2. For each frequent pattern $P \in F$, all possible antecedent–consequent splits are explored. Support and confidence are computed as in classical ARM, and the weights of the antecedent, consequent, and full pattern are calculated. Each rule is then labeled as energy-saving or costly depending on whether $\Delta E = 0$ or $\Delta E > 0$.
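The rule-generation step can be sketched as follows. The sketch assumes that supports are stored as absolute counts keyed by frozenset, that the support table covers every non-empty subset of each pattern being split, and a hypothetical confidence threshold of 0.5 (no value is given here). Rules that pass the threshold are labeled energy-saving when $\Delta E = E(Y) = 0$ and costly otherwise. They are then ordered by $\Delta E$ first and confidence second, one possible reading of "ranked jointly".

```python
from itertools import combinations

# Illustrative weights (subset of Table 1); unlisted items default to 0.0 (assumption).
WEIGHTS = {"LowHVAC": 4.0, "Empty": 0.0, "FanOff": 0.0, "LowLighting": 0.5}


def weight(items) -> float:
    """Total weight E(S) of an itemset."""
    return sum(WEIGHTS.get(i, 0.0) for i in items)


def energy_aware_rules(patterns, support, n_transactions, min_conf):
    """Split each pattern into X -> Y, keep rules with Conf >= min_conf, label by Delta E = E(Y).

    `support` maps frozensets to absolute counts and must cover every non-empty
    subset of each pattern in `patterns` (e.g. the full frequent-itemset table).
    """
    rules = []
    for pattern in patterns:
        count = support[pattern]
        items = sorted(pattern)
        for size in range(1, len(items)):  # antecedent sizes 1 .. |P| - 1
            for ante in combinations(items, size):
                x = frozenset(ante)
                y = pattern - x
                conf = count / support[x]
                if conf < min_conf:
                    continue  # weak rule: skip the weight computation
                delta_e = weight(y)  # antecedent weight is already in the operating state
                rules.append({
                    "antecedent": x,
                    "consequent": y,
                    "support": count / n_transactions,
                    "confidence": conf,
                    "delta_e": delta_e,
                    "label": "energy-saving" if delta_e == 0 else "costly",
                })
    return rules


# Hypothetical support counts over 10 transactions.
SUPPORT = {
    frozenset({"Empty"}): 6,
    frozenset({"LowHVAC"}): 5,
    frozenset({"Empty", "LowHVAC"}): 4,
}
rules = energy_aware_rules([frozenset({"Empty", "LowHVAC"})], SUPPORT, 10, min_conf=0.5)
ranked = sorted(rules, key=lambda r: (r["delta_e"], -r["confidence"]))
```

Here Empty → LowHVAC (confidence 4/6) is costly with $\Delta E = 4.0$, while LowHVAC → Empty (confidence 0.8) adds no weight and is ranked first.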

Complexity Analysis

Algorithm 2 iterates over each frequent pattern $P \in F$ and enumerates all $2^{| P |} - 2$ possible non-empty antecedent-consequent splits via combinations. For each candidate rule, confidence computation is $O (1)$ using precomputed supports, while weight calculation requires summing the weights of items in the antecedent and consequent, costing $O (| P |)$ . The total time complexity is therefore $O (\sum_{P \in F} 2^{| P |} \cdot | P |)$ . Notably, the patterns produced by MNR-FP-Growth tend to have small cardinality (small $| P |$ ), and the minimum confidence threshold prunes many candidates early. As a result, the algorithm runs efficiently on the pattern sets generated by the proposed method, supporting subsequent rule analysis.
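As a worked instance of this bound, the number of candidate splits per pattern grows as

$$|P| = 2:\ 2^{2} - 2 = 2, \qquad |P| = 3:\ 2^{3} - 2 = 6, \qquad |P| = 4:\ 2^{4} - 2 = 14,$$

with corresponding $2^{|P|} \cdot |P|$ cost terms of 8, 24, and 64. Each additional item therefore more than doubles the per-pattern cost, which is why the small cardinality of MNR patterns matters.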

Building on this efficiency, the resulting rules provide dual insight: (i) statistical reliability through support and confidence, and (ii) operational relevance through the weight-based energy proxy. These rules can be ranked jointly by $\Delta E$ and confidence, enabling the identification of cost-free operational adjustments ($\Delta E = 0$) as immediate energy-saving opportunities, while highlighting costly rules as trade-offs between reliability and operational impact.

To ensure a comprehensive evaluation, minsup is varied to assess scalability (Figure 3), while $minsup = 2\%$ is fixed for the energy-aware rule analysis (Section 4.3) to balance coverage and interpretability.

Figure 3.

Scalability comparison of redundancy-aware baselines across minsup thresholds. (a) Number of patterns vs. minsup and (b) Runtime vs. minsup

4. Results and Discussion

4.1. Comparison with Redundancy-Aware Baselines

The effectiveness of MNR-FP-Growth was evaluated against established redundancy-handling baselines—FP-Growth, FP-Close, and FP-Max—across minimum support thresholds from 0.5% to 30%.

The results, summarized in Table 4, reveal several key trends. As expected, higher minsup values yield fewer patterns and faster runtimes for all methods. Crucially, MNR-FP-Growth achieves high compactness, producing 10,985 patterns at 0.5% minsup compared to 13,476 for FP-Growth, an 18.5% reduction. This count is close to that of FP-Close (10,524), suggesting that MNR-FP-Growth preserves the informational content of closed itemsets while achieving comparable compactness. Unlike FP-Max, which achieves extreme compactness (1,108 patterns) by discarding intermediate patterns, MNR-FP-Growth retains fine-grained operational information essential for energy analysis.

Table 4.
Pattern Counts and Runtime Performance Across Different Minimum Support Thresholds.

Method	0.5%	2.0%	5.0%	10.0%	20.0%	30.0%
Number of Patterns
FP-Growth	13,476	4,547	1,662	556	156	49
FP-Close	10,524	4,025	1,521	520	150	49
FP-Max	1,108	494	239	107	48	17
MNR-FP-Growth	10,985	4,039	1,521	520	150	49
Runtime (seconds)
FP-Growth	5.47	4.44	2.87	1.95	0.81	0.37
FP-Close	15.33	5.73	3.07	1.92	0.82	0.36
FP-Max	5.67	4.19	2.96	2.18	0.81	0.35
MNR-FP-Growth	5.96	4.18	2.95	1.96	0.81	0.37

The Jaccard similarity (Jaccard, 1901) between the pattern sets mined by MNR-FP-Growth and FP-Close was consistently $\geq 0.96$, indicating that on-the-fly pruning removes redundancy while retaining essentially all useful knowledge. For example, at 0.5% minsup, MNR-FP-Growth prunes 2,491 redundant patterns (18.5% of the FP-Growth output) while maintaining coverage nearly indistinguishable from FP-Close. This reduces the analyst's cognitive load without sacrificing completeness.

Runtime comparisons highlight the efficiency advantage of MNR-FP-Growth. FP-Growth is fastest at the lowest threshold (5.47 s at 0.5% minsup) but retains all redundant patterns. FP-Close incurs a substantially higher computational cost at low thresholds because its closure checks compare each candidate against supersets in the conditional subtrees. At 0.5% minsup, FP-Close required 15.33 s versus 5.96 s for MNR-FP-Growth, a 61.1% reduction in runtime. From 5.0% minsup upward, the runtimes of all four methods converge (differences of about 0.3 s or less). FP-Max is also fast but discards fine-grained intermediate patterns. By contrast, MNR-FP-Growth delivers interpretability and coverage comparable to FP-Close while keeping runtimes competitive with FP-Growth and FP-Max.
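The 61.1% figure is a relative reduction in runtime rather than a speed ratio. As a worked check using the 0.5% minsup runtimes from Table 4 (with $t$ denoting runtime in seconds, notation introduced here for illustration), the two readings are:

$$\frac{t_{\text{FP-Close}} - t_{\text{MNR}}}{t_{\text{FP-Close}}} = \frac{15.33 - 5.96}{15.33} \approx 0.611, \qquad \frac{t_{\text{FP-Close}}}{t_{\text{MNR}}} = \frac{15.33}{5.96} \approx 2.57$$

That is, MNR-FP-Growth needs about 39% of FP-Close's runtime at this threshold, roughly a 2.6-fold speedup.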

Collectively, these results indicate that MNR-FP-Growth strikes an effective balance among pattern quality, interpretability, and computational efficiency. Unlike FP-Max, it retains fine-grained patterns; unlike FP-Close, it removes redundancy during a single mining pass rather than through costly closure checks. These characteristics make MNR-FP-Growth a scalable, practical approach for extracting interpretable, non-redundant knowledge from large building datasets, directly supporting energy-optimization tasks.

4.2. Comparison with Weighted ARM Approaches

Beyond condensed representations such as closed and maximal itemsets, MNR-FP-Growth also differs fundamentally from weighted association rule mining (WARM) approaches. Datta et al. (2021) introduced an automated weighting scheme for unweighted transactional databases based on inter-item links; however, their method operates within an Apriori-like framework and applies weights as a post-processing filter. Feng and Li (2017) used WARM to identify coordinated equipment control in building energy systems, demonstrating domain relevance but without pruning during mining. Zhang et al. (2020) developed a post-mining method for building operation data that filters results after rule generation rather than during the mining phase.

In contrast, MNR-FP-Growth combines three characteristics not found in previous WARM approaches: (i) energy-aware weighting during the FP-Growth recursion, (ii) pruning against immediate subsets using a dual-condition criterion of statistical redundancy and energetic non-improvement, and (iii) retention of the minimal representative for each support class. This combination improves computational efficiency, reducing runtime by 61% relative to FP-Close at 0.5% minsup, and enhances the interpretability of the resulting patterns.
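Criterion (ii) can be illustrated with a minimal Python sketch under stated assumptions: patterns are frozensets with a precomputed support table; the energy penalty is a caller-supplied function (here the mean of hypothetical per-item weights, since the exact weighting is not specified in this section); and, for readability, the check is applied as a filter over already-mined patterns, whereas MNR-FP-Growth performs it inside the FP-Growth recursion. Names such as `is_prunable`, `mnr_filter`, and `mean_weight` are illustrative, not the authors' implementation.

```python
from typing import Callable, Dict, FrozenSet, List

Pattern = FrozenSet[str]


def is_prunable(pattern: Pattern, support: Dict[Pattern, int],
                energy: Callable[[Pattern], float]) -> bool:
    """Prune if some immediate subset (one item removed) has the same support
    (statistical redundancy) and an energy penalty no higher than the
    pattern's own (energetic non-improvement)."""
    if len(pattern) < 2:
        return False  # singletons are always kept in this sketch
    for item in pattern:
        subset = pattern - {item}
        same_support = support.get(subset) == support[pattern]
        if same_support and energy(pattern) >= energy(subset):
            return True
    return False


def mnr_filter(support: Dict[Pattern, int],
               energy: Callable[[Pattern], float]) -> List[Pattern]:
    """Keep every pattern that fails the dual pruning condition."""
    return [p for p in support if not is_prunable(p, support, energy)]


# Hypothetical normalized per-item energy weights.
weights = {"LowOcc": 0.4, "FanOff": 0.0, "Night": 0.2, "LightsOn": 0.8}


def mean_weight(pattern: Pattern) -> float:
    # Hypothetical energy penalty: average of the per-item weights.
    return sum(weights[i] for i in pattern) / len(pattern)


fs = frozenset
support = {
    fs({"LowOcc"}): 40, fs({"FanOff"}): 30, fs({"Night"}): 35, fs({"LightsOn"}): 25,
    fs({"LowOcc", "FanOff"}): 30,    # same support as {FanOff}, higher penalty: pruned
    fs({"LowOcc", "Night"}): 20,     # lower support than both subsets: kept
    fs({"LightsOn", "FanOff"}): 25,  # same support as {LightsOn}, lower penalty: kept
}
kept = mnr_filter(support, mean_weight)
print(len(kept))  # 6
```

A mean-based penalty is used so that both outcomes of the energy condition can occur; with a plain sum of non-negative weights, a superset's penalty would never be lower than its subset's, and the rule would reduce to a pure support-equality test.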

4.3. Energy-Aware Rule Analysis

To assess practical impact, the patterns mined at 2% minsup were extended into energy-aware association rules. Throughout this section, $\Delta E$ denotes the additional energy cost of a rule. From 4,039 frequent patterns, 12,679 rules were generated, of which 4,316 (34.0%) were classified as energy-saving, i.e., rules with zero additional energy cost ($\Delta E = 0$). The average $\Delta E$ across all rules was 0.59, consistent with many discovered rules imposing little or no energy penalty.
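The rule-extension step is not detailed in this section. One plausible way to derive rules from a support table and tag them by $\Delta E$ is sketched below, assuming (hypothetically) that $\Delta E$ is the sum of the per-item energy weights of a rule's consequent and that a rule is energy-saving when $\Delta E = 0$. The item names, weights, and `min_conf` value are illustrative only.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List

Pattern = FrozenSet[str]


@dataclass(frozen=True)
class Rule:
    antecedent: Pattern
    consequent: Pattern
    support: int
    confidence: float
    delta_e: float  # hypothetical additional energy cost of the consequent

    @property
    def energy_saving(self) -> bool:
        # Treat values within floating-point tolerance of zero as zero cost.
        return math.isclose(self.delta_e, 0.0, abs_tol=1e-12)


def generate_rules(support: Dict[Pattern, int], weights: Dict[str, float],
                   min_conf: float) -> List[Rule]:
    """Enumerate A -> Z \\ A for every frequent itemset Z and non-empty proper subset A."""
    rules: List[Rule] = []
    for itemset, supp in support.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for k in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), k):
                antecedent = frozenset(ante)
                consequent = itemset - antecedent
                # Downward closure: every subset of a frequent itemset is in the table.
                confidence = supp / support[antecedent]
                if confidence >= min_conf:
                    delta_e = sum(weights[i] for i in consequent)
                    rules.append(Rule(antecedent, consequent, supp, confidence, delta_e))
    return rules


fs = frozenset
support = {
    fs({"LowOcc"}): 40, fs({"Comfort"}): 50, fs({"Cooling"}): 20,
    fs({"LowOcc", "Comfort"}): 36, fs({"Comfort", "Cooling"}): 18,
}
weights = {"LowOcc": 0.0, "Comfort": 0.0, "Cooling": 0.6}

rules = generate_rules(support, weights, min_conf=0.3)
saving = [r for r in rules if r.energy_saving]
print(len(rules), len(saving))  # 4 3
```

In this toy table only the rule Comfort → Cooling carries a positive $\Delta E$; the share of energy-saving rules is then computed exactly as for the 4,316 of 12,679 rules reported above.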

As shown in Figure 4, approximately one-third of all discovered rules are energy-saving, indicating that efficient patterns are common in the building operation data. This distribution suggests substantial potential for energy conservation through data-driven operational adjustments.

Figure 4.

Distribution of energy-saving vs. energy-cost rules among all generated association rules (minsup = 2%).

Figure 5 plots confidence against additional energy cost for the top 2,000 rules ranked by confidence. The concentration of points in the high-confidence, low-cost region indicates an abundance of reliable and efficient patterns.

Figure 5.

Confidence vs. additional energy cost ($\Delta E$) for the top 2,000 rules by confidence (minsup = 2%). Green points indicate energy-saving rules ($\Delta E = 0$), which incur no additional operational energy cost. Red points indicate rules with positive additional cost ($\Delta E > 0$); these are intentionally discarded by the MNR-FP-Growth pruning mechanism due to their higher energy impact, not because of low statistical support or confidence. Point size encodes support, showing that even high-support rules (large points) are pruned when they are energetically inefficient. This demonstrates that pruning is energy-aware rather than frequency-driven.

These rules provide interpretable insights that can directly inform building control strategies (Table 5), such as prioritizing fan-off operation during low occupancy or maintaining comfort-zone temperatures without additional energy consumption.

Table 5.

Representative Energy-Aware Association Rules (minsup = 2%).

Example Rule	Support (%)	Confidence	Category
IF Occupancy_Level=LowOccupancy AND SupplyFan_Speed=FanOff $\to$ Zone_Temperature=ComfortZone	2.1	0.964	Energy-Saving
IF Lighting_Load_Category=HighLighting AND Occupancy_Level=HighOccupancy $\to$ Zone_Temperature=ComfortZone	2.2	0.611	Energy-Saving
IF Occupancy_Level=LowOccupancy AND time_of_day=Night $\to$ Solar_Radiation=LowRadiation	2.3	0.944	Energy-Saving
IF Occupancy_Level=LowOccupancy AND Outdoor_Humidity=HighHumidity $\to$ Zone_Temperature=ComfortZone	2.5	0.750	Energy-Saving

Conversely, rules with positive $\Delta E$ highlight trade-offs between reliability and efficiency. While such rules may still offer strong predictive power, they also reveal configurations that increase energy use, thereby flagging operational “hotspots” for facility managers to monitor and address.

5. Limitations and Future Work

Although MNR-FP-Growth achieves effective redundancy reduction and energy-aware rule generation, several limitations warrant discussion. First, the normalized illustrative weights, while adequate for methodological validation, simplify the complex dynamics of real building energy use. In practice, they should be replaced with empirically measured consumption data from building meters or with outputs of high-fidelity simulations to provide a more accurate assessment of operational impacts. Second, although during-mining pruning improves computational efficiency, scalability issues may arise when the method is applied to large building portfolios or high-frequency IoT data streams. Third, the current framework assesses rules primarily from an energy perspective, without incorporating considerations such as thermal comfort, equipment lifespan, or occupant satisfaction.

6. Conclusion

This work introduced MNR-FP-Growth, a variant of FP-Growth that integrates minimal non-redundancy pruning and energy-aware weighting directly into the frequent pattern mining process. Unlike FP-Close, which examines all proper supersets, or FP-Max, which considers all frequent supersets, MNR-FP-Growth operates at the level of immediate subsets: it retains a pattern only when the pattern introduces new frequency information or provides an energy benefit relative to its immediate subsets. The method therefore produces the minimal representative for each support class, defined as the smallest itemset that captures the frequency information, unless a larger pattern demonstrates superior energy efficiency. By embedding energy-aware pruning within the mining recursion, the algorithm removes statistically redundant and energy-inefficient patterns at the point of discovery. To the authors' knowledge, this is the first application of weight-aware pruning in the building energy management domain, where patterns carry operational semantics in addition to statistical frequency.
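For reference, the three criteria contrasted above can be written as follows, where $\operatorname{supp}$ is support, $\sigma$ is minsup, and $E(\cdot)$ is the energy penalty of an itemset. The closed and maximal conditions are the standard definitions; the MNR condition is a formal reading of the prose description (equal support with an immediate subset combined with no energy improvement), not a formula quoted from the method section.

$$X \text{ is closed} \iff \neg\exists\, Y \supset X : \operatorname{supp}(Y) = \operatorname{supp}(X)$$

$$X \text{ is maximal} \iff \operatorname{supp}(X) \ge \sigma \ \wedge\ \neg\exists\, Y \supset X : \operatorname{supp}(Y) \ge \sigma$$

$$X \text{ is pruned by MNR} \iff |X| \ge 2 \ \wedge\ \exists\, i \in X : \operatorname{supp}(X \setminus \{i\}) = \operatorname{supp}(X) \ \wedge\ E(X) \ge E(X \setminus \{i\})$$

The first two conditions quantify over all proper supersets $Y$, whereas the MNR condition inspects only the $|X|$ immediate subsets $X \setminus \{i\}$, which is what allows the check to run during the mining recursion.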

Experimental results demonstrated that MNR-FP-Growth reduces the number of patterns relative to FP-Growth by 11.2% at 2% minsup and by 18.5% at 0.5% minsup, while achieving runtimes competitive with FP-Growth and up to 61% lower than FP-Close at low support thresholds. Unlike FP-Max, which sacrifices interpretability, or FP-Close, which incurs substantial computational overhead, the proposed method preserves fine-grained yet non-redundant patterns with significantly improved efficiency. When the patterns were extended into energy-aware association rules, 34% of the rules had $\Delta E = 0$ and thus correspond to cost-free operational adjustments, highlighting their direct value for energy conservation without compromising interpretability. Together, these results show how algorithmic efficiency can translate into actionable energy conservation.

The practical implications are clear: by transforming raw building data into interpretable and energy-conscious rules, MNR-FP-Growth provides facility managers with a scalable tool for identifying efficiency opportunities and supporting data-driven decision-making. Its efficiency also makes it suitable for periodic re-analysis as building conditions evolve, ensuring continued adaptability.

Future research should address the limitations noted in Section 5 by incorporating empirical energy measurements, extending to multi-objective optimization, and validating across diverse building types. Integration with predictive control frameworks would further broaden applicability.

In summary, MNR-FP-Growth demonstrates that integrating redundancy reduction and semantic weighting within a unified mining process can yield interpretable, efficient, and operationally significant patterns. This establishes a robust foundation for advancing sustainable, data-driven optimization in the built environment and beyond.

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions on the earlier versions of this paper.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

ORCID iDs

Khaoula Necibi

Makhlouf Ledmi

Toufik Messaoud Maarouk

References

Agrawal, Imieliński, & Swami (1993). Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD international conference on Management of data (pp. 207–216).

Ahmed, C. F., Tanbeer, S. K., Jeong, B. S., & Lee, Y. K. (2008). Mining weighted frequent patterns using adaptive weights. In International conference on intelligent data engineering and automated learning (pp. 258–265). Springer.

Amasyali & El-Gohary, N. M. (2018). A review of data-driven building energy consumption prediction studies. Renewable and Sustainable Energy Reviews, 81, 1192–1205. https://doi.org/10.1016/j.rser.2017.04.095

Cho, Dayrit, Gao, Wang, Hong, & Sim (2020). Effective missing value imputation methods for building monitoring data. In Proceedings of the 2020 IEEE international conference on big data (Big Data) (pp. 2866–2875). IEEE.

Datta, Mali, & Ghosh (2021). Weighted association rule mining over unweighted databases using inter-item link based automated weighting scheme. Arabian Journal for Science and Engineering, 46(4), 3169–3188. https://doi.org/10.1007/s13369-020-05085-2

Drgoňa, Arroyo, Figueroa, I. C., Blum, Arendt, Kim, Ollé, E. P., Oravec, Wetter, Vrabie, D. L., & Helsen (2020). All you need to know about model predictive control for buildings. Annual Reviews in Control, 50, 190–232. https://doi.org/10.1016/j.arcontrol.2020.09.001

Fan, Xiao, & Yan (2015). A framework for knowledge discovery in massive building automation data and its application in building diagnostics. Automation in Construction, 50, 81–90. https://doi.org/10.1016/j.autcon.2014.12.006

Feng & Li (2017). A methodology to identify multiple equipment coordinated control with power metering system. In Energy Procedia (Vol. 105, pp. 2499–2505).

Grahne & Zhu (2003). Efficiently using prefix-trees in mining frequent itemsets. In FIMI (Vol. 90, p. 65).

Han, Pei, Yin, & Mao (2004). Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, 8(1), 53–87. https://doi.org/10.1023/B:DAMI.0000005258.31418.83

Hong, Luo, Blum, & Wang (2021). A three-year building operational performance dataset for informing energy efficiency. https://datadryad.org/dataset/doi:10.7941/D1N33Q

Jaccard (1901). Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Société vaudoise des sciences naturelles, 37, 547–579.

Kiran, R. U., & Reddy, P. K. (2009). An improved frequent pattern-growth approach to discover rare association rules. In Proceedings of the international conference on knowledge discovery and information retrieval (KDIR) (Vol. 2, pp. 43–52). SCITEPRESS.

Kleiminger, Beckel, Staake, & Santini (2013). Occupancy detection from electricity consumption data. In Proceedings of the 5th ACM workshop on embedded systems for energy-efficient buildings (pp. 1–8).

Miller, Nagy, & Schlueter (2018). A review of unsupervised statistical learning and visual analytics techniques applied to performance analysis of non-residential buildings. Renewable and Sustainable Energy Reviews, 81, 1365–1377. https://doi.org/10.1016/j.rser.2017.05.124

Nguyen & Bui (2023). Mining frequent weighted utility patterns with dynamic weighted items from quantitative databases. Applied Intelligence, 53(16), 19629–19646. https://doi.org/10.1007/s10489-023-04554-z

Nguyen, Bui, & Yun (2026). Efficiently mining frequent weighted utility closed patterns with pruning strategies from dynamic quantitative databases. Journal of Supercomputing, 82(3), 129. https://doi.org/10.1007/s11227-026-08247-5

Palmes, Pung, H. K., Xue, & Chen (2010). Object relevance weight pattern mining for activity recognition and segmentation. Pervasive and Mobile Computing, 6(1), 43–57. https://doi.org/10.1016/j.pmcj.2009.10.004

Pasquier, Bastide, Taouil, & Lakhal (1999). Discovering frequent closed itemsets for association rules. In International conference on database theory (pp. 398–416). Springer.

Pérez-Lombard, Ortiz, & Pout (2008). A review on buildings energy consumption information. Energy and Buildings, 40(3), 394–398. https://doi.org/10.1016/j.enbuild.2007.03.007

Tseng, V. S., Wu, C. W., Fournier-Viger, & Yu, P. S. (2015). Efficient algorithms for mining top-k high utility itemsets. IEEE Transactions on Knowledge and Data Engineering, 28(1), 54–67. https://doi.org/10.1109/TKDE.2015.2458860

Wan, Zhou, & Pei (2017). Semantic-geographic trajectory pattern mining based on a new similarity measurement. ISPRS International Journal of Geo-Information, 6(7), 212. https://doi.org/10.3390/ijgi6070212

Wang, Duić, Hodge, B. M., Shafie-khah, & Catalão, J. P. (2018). Association rule mining based quantitative analysis approach of household characteristics impacts on residential electricity consumption patterns. Energy Conversion and Management, 171, 839–854. https://doi.org/10.1016/j.enconman.2018.06.017

Zhang, Xue, Zhao, & Zhang (2019). An improved association rule mining-based method for revealing operational problems of building heating, ventilation and air conditioning (HVAC) systems. Applied Energy, 253, 113492. https://doi.org/10.1016/j.apenergy.2019.113492

Zhang, Zhao, & Zhang (2020). A post mining method for extracting value from massive amounts of building operation data. Energy and Buildings, 223, 110096. https://doi.org/10.1016/j.enbuild.2020.110096