Fuzzy clustering as a new grouping technique to define the business size of SMEs through their financial information

Abstract

SMEs play a very important role in any modern economy because they contribute both to the generation of wealth and to the creation of jobs. In this article we analyze a sample of 12,658 Catalan SMEs and show the level of association that exists between some of their financial ratios in order to construct a synthetic measure that explains their business size. To study the financial ratios, we first used three data analysis techniques, and to find the optimal number of clusters we used the Two-Phase Clusters algorithm. Once the optimal number of groups was known, the most probable business size (cluster) for each company was calculated through a Fuzzy Clustering analysis. Finally, the cluster estimated for each company was validated with a Probit model. The results show that the size reported by each company is not necessarily equal to the synthetic measure proposed in this article. We suggest validating our results with an accounting analysis or with other equally robust methodologies or algorithms, such as neural networks.

Keywords

Fuzzy clustering; business size; SMEs; financial ratios; probit models

1 Introduction

The significance of this study lies in the importance that SMEs have in the economic development of a country or region [23]. Although their growth and development have been framed in a particular economic and political context, it is increasingly necessary to implement models that allow this sector to operate competitively under the conditions that global competition requires. In this context, there is no doubt that SMEs generate a large number of jobs that support the national and international economy [4].

Among the advantages of SMEs, the following stand out: they are an important engine for the development of a country; they have great mobility, which allows them to expand or reduce the size of their production plant as well as change their essential technical processes; owing to their dynamism, they have a high potential for growth and may become large companies; and they may employ a significant portion of the population. Among their disadvantages, [11] already highlighted the following: they find it difficult to reinvest their profits to improve their equipment and production techniques; they do not have specialized and trained personnel; they do not pay competitive salaries; they have deficiencies in the quality of their production because in many of them quality controls are minimal or simply do not exist; and they have problems arising from poor organization, such as insufficient sales, competitive weakness, poor customer service, products with high prices and poor quality, excessive fixed assets, poor location, lack of inventory, tax problems, lack of adequate and timely financing, and difficulty of access to technology, supplies, markets, information, credit and support services [5, 27]. On the other hand, much of the empirical evidence [10] has shown that small companies located in business concentrations are able to overcome some of the obstacles mentioned above, and that others, on the basis of their financial information, can be classified as medium-sized if that information is analyzed with a Probit probability model [26].

1.1 Conceptual framework

1.1.1 The definition of SMEs

One of the main problems in studying differences in economic and financial structure between business sizes is the criterion for classifying companies as medium, small or micro [6]. The problem concerns not only the variables that must be used, but also the limits or cut-off points for each of those variables 1. A review of the criteria used to define the size of a company allows us to make the following observations: I) The criteria used are, in all cases, one-dimensional, and the two classic approaches are 1) the number of employees and 2) the figures related to total sales. The first classic approach has the drawback that, by using only one productive factor (labor) without considering the technology of each sector, it undervalues the size of companies based on monetary capital and overestimates the size of companies based on the labor force. Furthermore, this approach does not consider temporary work or unpaid work (for example, the work of corporate partners).

The second classic approach has the drawback that, in addition to capturing size, it reflects the different degrees of efficiency of companies. That is, two companies with the same number of workers and the same volume of assets may belong, as a consequence of their degree of efficiency, to different groups of companies if their sales volumes differ; II) There is no general agreement on the limits that must be established for each of these criteria: in some works, SMEs are companies with fewer than 100 workers, while in others they are companies with up to 500 workers; and III) The use of different criteria and cut-off points makes it difficult to compare the results obtained in the many works carried out to date [9]. Of course, this problem prevents the rigorous application of existing laws on funding criteria and tax payments, which depend on the size that the companies themselves report [30].

This lack of agreement also exists among the different definitions of SMEs established by various institutions and public administrations. Without a doubt, this lack of clearly defined criteria constrains the companies that can and want to access different types of subsidies. For this reason, and in order to unify criteria, [19] proposes the following: when it is required to distinguish between small and medium-sized companies 2, a small company is defined as one that: a) has no more than 50 workers; b) has a turnover of not more than 7 million euros or a balance sheet of not more than 5 million euros; and c) complies with the independence criterion, that is, 25% (or more) of its capital or of its voting rights does not belong to another company or, jointly, to several companies [1].

1.2.1 Uses and precautions of financial ratios

A financial ratio, in very general terms, defines the relationship between two numbers. In accounting, financial ratios are a set of indices that express the result of relating at least two accounts in the Balance Sheet or the Statement of Profit and Loss [8]. These ratios provide important information intended to support sound decision-making by people or analysts who are interested in a particular company [20, 24]. The relative simplicity of calculating financial ratios can turn their use into a mechanical exercise, which can lead analysts to neglect their real contribution, that is, their interpretation or grouping [31]. A recurring mistake, made to ease the analysts' work, is to look for previously fixed rules that allow a faster and more comfortable interpretation but that, in general, do not achieve the desired objective [7]. This is because the realities and needs of the environment of each company are different [17].

On the other hand, the ease of constructing financial ratios has caused them to multiply. This generates a new problem, summarized in the following question: how should the financial ratios to be used be selected? 3 This question is not easy to answer because the task depends on different factors, among them the objective of the analysis, the analyst’s perspective and the prevailing economic reality; the analyst’s experience will be essential to define a typology or set of financial ratios that are important for the reality to be analyzed [28].

The objective of this article is to present the level of association between some previously analyzed financial ratios in order to construct a synthetic measure that defines the size of a company. Galindo’s [2] proposal is adopted, where it is shown that business size can be seen as a set of interrelated variables that act as a substitute for this concept. The main result of this work is a new approach for assigning size to a company through the financial information reported by a significant group of companies. The algorithms used to classify the 12,658 companies analyzed detected 2 groups of companies instead of 3 (24% of the companies were classified in the first group and 76% in the second). It follows from this classification that some companies should not have the size they report, which suggests complementing the results of this work with an accounting analysis to verify whether tax evasion is present in these companies or whether the size is reported in order to obtain better subsidies. The main limitation of this article is that the results depend on the financial information used, so if other financial ratios are considered, the results may not be the same. This article is organized as follows: Section 2 describes the methodology used, Section 3 presents the most outstanding results, and Section 4 presents the main conclusions.

2 Methodology

The database we use in this article contains 12,658 SMEs in Catalonia, Spain and 22 financial ratios reported by each of these companies (Table 1).

Table 1
Financial ratios

No.	Concept	Label
1	Current assets	Current_A
2	Net asset	Net_A
3	Profitability of sales	Profi_S
4	Liquidity	Liquidity
5	Operation	Operation
6	Credits	Credits
7	Personnel expenses per employee	Person_E
8	Own funds	Own_F
9	External financing of fixed assets	External_F
10	Margin	Margin
11	Indebtedness	Idebt
12	General resources	General_R
13	Economic profitability	Economic_P
14	Financial profit	Financial_P
15	Financial Expense Coverage	Financial_E
16	Cash Coverage	Cash_C
17	Rotation	Rotation
18	Solvency	Solvency
19	Asset Productivity	Asset_P
20	Productivity of fixed assets	Produc_F
21	Income Productivity	Income_P
22	VAB (gross value added) per occupant	VAB

We divide the analysis of financial information into two stages: 1) we analyze the financial ratios using the technique called Conglomerates in Two Phases (CTP) to obtain the optimal number of groups; and 2) once the number of groups is known, we use the Fuzzy Clustering algorithm to form the final clusters. We then use the final clusters as the initial variables for a Probit model, which gives the maximum probability of belonging to the cluster previously calculated by the Fuzzy algorithm. Below we describe the algorithms we use for data analysis.

2.1 Conglomerates in Two Phases (CTP)

The CTP procedure is a Machine Learning tool that discovers natural groupings (conglomerates) in a data set that would otherwise not be possible to detect. This technique has the great advantage that it compares the values of a selected model criterion and can thereby automatically determine the optimal number of clusters [14, 29]. In this algorithm, the likelihood distance measure assumes that the model variables are independent. Furthermore, it assumes that each variable has a normal distribution, but empirical tests indicate that this algorithm is quite robust to violations of both the independence assumption and the distributional assumptions. However, it is important to consider to what extent these assumptions are fulfilled [3]. We use this algorithm only to calculate the optimal number of conglomerates, so we do not analyze its statistical results.

2.2 Fuzzy C Means

A cluster is generally a set of elements that have a strong mathematical similarity to each other and, simultaneously, a weak similarity to other elements [21]. Clustering is the detection of subspaces (clusters) in a data space. In traditional clustering algorithms, each element belongs exclusively to a single cluster. The fuzzy clustering algorithm instead associates clusters with each element through membership functions [15]. The result of the fuzzy algorithm is therefore a grouping, not a mutually exclusive partition. Fuzzy grouping techniques emphasize, in much of the existing literature, the minimization of the distances between the elements belonging to a data sample [13, 16]. The objective function for a family of fuzzy grouping algorithms is as follows [16]: $F(Z, U, C) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^{m} \| z_{k} - c_{i} \|_{B}^{2}$ (1)

where Z = {z₁, z₂, …, z_N} are the data to be classified and U = [μ_ik] ∈ M_fc is a fuzzy partition matrix of Z. In addition, $0 < \sum_{k=1}^{N} \mu_{ik} < N \quad \text{with } i = 1, 2, \dots, c$ (2)

μ_ik represents the degree of membership of each element z_k in the cluster whose prototype (center) is c_i, where C = [c₁, c₂, …, c_c], with c_i ∈ Rⁿ, is the vector of centroids to be determined, and: $D_{ikB}^{2} = \| z_{k} - c_{i} \|_{B}^{2}$ (3)

Equation (3) is a squared distance norm determined by the choice of matrix B (for example, the identity matrix gives the Euclidean distance), and m ∈ (1, ∞) is an exponent that determines the “fuzziness” of the resulting classes (the higher m, the fuzzier the identified sets). The minimization of the objective function (1) is a nonlinear optimization problem that is solved with the Fuzzy C-Means algorithm. The stationary points of the objective function (1) are found by adding the condition that the memberships of an element in all the groups sum to 1. The conditions necessary for the objective function (1) to reach its minimum were derived by Dunn (1974): $\mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( \frac{D_{ikB}}{D_{jkB}} \right)^{2/(m-1)}}$ (4) with 1 ⩽ i ⩽ c and 1 ⩽ k ⩽ N, and $c_{i} = \frac{\sum_{k=1}^{N} (\mu_{ik})^{m} z_{k}}{\sum_{k=1}^{N} (\mu_{ik})^{m}}$ (5) with 1 ⩽ i ⩽ c.

Equation (5) provides a value for c_i as the weighted mean of the data that belongs to a class and where the weights are the membership functions.
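The alternating updates in Equations (4) and (5) can be illustrated with a minimal sketch in plain Python. It is not the Fuzme implementation used in this study: the function name `fuzzy_c_means`, the random initialization, the stopping tolerance and the toy data are all assumptions made for illustration, and B is taken to be the identity matrix (Euclidean distance).

```python
import math
import random


def fuzzy_c_means(data, c=2, m=1.7, max_iter=100, tol=1e-6, seed=0):
    """Illustrative Fuzzy C-Means with Euclidean distance (B = identity).

    Returns (centers, u), where u[k][i] is the membership of point k
    in cluster i, i.e. mu_ik in Equations (1)-(5).
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Assumed initialization: random memberships, each row summing to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Equation (5): centroids as means weighted by mu_ik ** m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            sw = sum(w)
            centers.append(
                [sum(w[k] * data[k][d] for k in range(n)) / sw for d in range(dim)]
            )
        # Equation (4): memberships from the ratios of distances.
        shift = 0.0
        for k in range(n):
            # A small floor avoids division by zero when a point sits on a centroid.
            dist = [max(math.dist(data[k], centers[i]), 1e-9) for i in range(c)]
            new = [
                1.0 / sum((dist[i] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                for i in range(c)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[k])))
            u[k] = new
        if shift < tol:  # memberships have stabilized
            break
    return centers, u


# Toy data (hypothetical): two well-separated groups of three points each.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers, u = fuzzy_c_means(points, c=2, m=1.7)
# Most probable cluster per point: the index of the largest membership.
labels = [row.index(max(row)) for row in u]
```

Assigning each element to the cluster with the largest membership, as in the last line, is one simple way to turn the fuzzy grouping into a single most probable cluster per company.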

2.3 Probit model

The Probit model measures the relationship between the intensity of a stimulus and the proportion of cases that present a response to this stimulus. This type of model is useful for situations in which a dichotomous response is available, which is thought to be influenced or caused by the levels of some independent variables. Furthermore, this probabilistic model is suitable for experimental data [12]. This type of data analysis allows estimating the intensity necessary for a stimulus to induce a certain proportion of responses [18].

For each value of the independent variable (or for each combination of values when there are multiple independent variables), the dependent variable must contain the count of cases that present the response of interest at those values of the independent variable(s). Probit analysis is closely related to logistic regression. In general, Probit analysis is appropriate for experimental designs, while logistic regression is more suitable for observational studies. The Probit model provides estimates of the effective values for the different response rates, while logistic regression provides estimates of the odds ratios for the independent variables [12]. Let p be the probability that y = 1 and (1 − p) the probability that y = 0. Then the expected value of y is the probability that the event will occur: $E(y) = p \cdot 1 + (1 - p) \cdot 0 = p$ (6)

If we now consider this probability as a function of a vector of explanatory variables x and a vector of unknown parameters β, then we can write the general binary choice model as: $Prob (y = 1 | x) = F (β^{'} x)$ (7)

For the Probit model, F is the standard normal cumulative distribution function, $F(\beta' x) = \Phi(\beta' x) = \int_{-\infty}^{\beta' x} \frac{1}{\sqrt{2\pi}} e^{-\frac{u^{2}}{2}} \, du$ (8), and the estimator of β under this specification will be inconsistent if the distribution is not normal or if the error term is heteroscedastic.

A Probit model allows us to estimate probabilities, marginal effects, and other ancillary outcomes, provided that one assumption about the data is made: the model assumes normally distributed random variables (independent variables in the model). Probit analysis is an alternative to the Logit method. In practice there are no significant differences between these two probability models, unless the sample contains numerous observations with extreme values [26].
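As a small illustration of Equations (7) and (8), the probability Prob(y = 1 | x) can be evaluated with the standard normal distribution function written in terms of the error function. The helper names and the coefficient values below are hypothetical and are not the estimates reported in this article.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, the integral in Equation (8)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(beta, x):
    """Prob(y = 1 | x) = F(beta' x) as in Equation (7)."""
    index = sum(b * xi for b, xi in zip(beta, x))  # linear index beta' x
    return std_normal_cdf(index)


# Hypothetical coefficients and regressors (the first regressor is a constant).
beta = [0.5, -1.2, 0.8]
x = [1.0, 0.3, 0.4]
p = probit_probability(beta, x)
```

A linear index of zero gives a probability of exactly 0.5, and larger positive values of β′x push the probability towards 1.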

3 Results

The 22 financial ratios in Table 1 presented perfect multicollinearity. To eliminate this problem, we opted for a distance analysis with respect to the financial ratio called “size” (tamany). We used the Chebyshev distance, standardized by its standard deviation (in absolute value) for each financial ratio. This analysis allowed us to better appreciate this distance and to carry out a selection of the financial ratios. In this way, we selected 19 financial ratios to be considered in this empirical study (Table 2).
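The distance analysis can be sketched as follows, under one possible reading of the procedure: each ratio is divided by its own standard deviation, absolute values are taken, and the Chebyshev (maximum coordinate-wise) distance to the equally rescaled reference variable is computed. This reading, the function name and the toy values are assumptions; the exact options used in the statistical package are not stated in the text.

```python
from statistics import stdev


def chebyshev_to_reference(columns, reference):
    """Chebyshev distance of each rescaled column to the rescaled reference.

    Assumption: rescaling means dividing by the sample standard deviation
    and taking absolute values, as one reading of the text suggests.
    """
    ref_sd = stdev(reference)
    ref = [abs(v / ref_sd) for v in reference]
    distances = {}
    for name, values in columns.items():
        sd = stdev(values)
        scaled = [abs(v / sd) for v in values]
        # Chebyshev distance: the largest coordinate-wise absolute difference.
        distances[name] = max(abs(a - b) for a, b in zip(scaled, ref))
    return distances


# Toy values (hypothetical), one entry per company.
size = [1.0, 2.0, 3.0]
ratios = {"ratio_a": [2.0, 4.0, 6.0], "ratio_b": [3.0, 1.0, 2.0]}
distances = chebyshev_to_reference(ratios, size)
```

Ratios whose distance to the reference variable is small carry similar information, which is the kind of comparison that can guide a selection.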

Table 2
Grouping of selected financial ratios

Group	No.	Label
A	1	Own_F
	2	Credits
B	3	External_F
	4	Current_A
C	5	Operation
	6	Income_P
	7	Cash_C
	8	Solvency
D	9	Financial_P
	10	Economic_P
	11	Margin
	12	Rotation
	13	Produc_F
	14	General_R
	15	Liquidity
	16	Financial_E
	17	Asset_P
	18	VAB
	19	Person_E

For the financial ratios in Table 2, a CTP analysis (performed with the SPSS statistical package) showed that the optimal number of conglomerates was two. The distribution of the companies in the two estimated conglomerates is shown in Table 3, while the importance of the attributes is presented in Fig. 1.

Table 3

Distribution of conglomerates

Grouping	N	% of combined	% of the total
Conglomerate 1	7,726	61.0%	61.0%
Conglomerate 2	4,932	39.0%	39.0%
Combined	12,658	100.0%	100.0%
Total	12,658	100.0%	100.0%

Fig. 1

Importance of attributes.

For all the financial ratios of the two estimated conglomerates we performed a robust test for equality of means. We use the Brown-Forsythe and Welch statistics because they are preferable to the F statistic if equality of variances between the financial ratios is not assumed. The results of these two statistics implied that the means differ statistically in each predicted conglomerate, since the level of significance is less than 5%. These results imply that all the financial ratios in Table 2 are important for our empirical study (Table 4).
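With only two groups, both robust statistics reduce to the squared Welch t statistic, which is why the Welch and Brown-Forsythe rows of Table 4 coincide. The sketch below illustrates this on toy data; the function names and the sample values are hypothetical.

```python
from statistics import mean, variance


def welch_two_groups(a, b):
    """Welch statistic for two groups: (F, df1, df2), with F = t squared."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2
    f_stat = (mean(a) - mean(b)) ** 2 / (v1 + v2)
    # Welch-Satterthwaite approximation for the denominator degrees of freedom.
    df2 = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return f_stat, 1, df2


def brown_forsythe_two_groups(a, b):
    """Brown-Forsythe F statistic for equality of means with two groups."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    grand = (sum(a) + sum(b)) / n
    between = n1 * (mean(a) - grand) ** 2 + n2 * (mean(b) - grand) ** 2
    within = (1 - n1 / n) * variance(a) + (1 - n2 / n) * variance(b)
    return between / within


# Toy samples (hypothetical) with unequal sizes and variances.
group_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
group_2 = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
welch_f, df1, df2 = welch_two_groups(group_1, group_2)
bf_f = brown_forsythe_two_groups(group_1, group_2)
```

Neither statistic pools the two variances, so both remain valid when the conglomerates have different dispersions.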

Table 4

Robust tests for equality of means

No.	Financial ratio	Test	Statistic (a)	df1	df2	Sig.
1	Financial profitability	Welch	17.373	1	10,689.009	0.000
		Brown-Forsythe	17.373	1	10,689.009	0.000
2	Economic profitability	Welch	666.224	1	8,228.731	0.000
		Brown-Forsythe	666.224	1	8,228.731	0.000
3	Margin	Welch	1,406.806	1	7,237.215	0.000
		Brown-Forsythe	1,406.806	1	7,237.215	0.000
4	Rotation	Welch	604.440	1	11,594.782	0.000
		Brown-Forsythe	604.440	1	11,594.782	0.000
5	Own funds / remunerated liabilities	Welch	2,405.869	1	9,417.170	0.000
		Brown-Forsythe	2,405.869	1	9,417.170	0.000
6	Creditors c / t / Remunerated liabilities	Welch	5,374.341	1	10,575.925	0.000
		Brown-Forsythe	5,374.341	1	10,575.925	0.000
7	Net property exploited / Net assets	Welch	4,974.558	1	7,862.782	0.000
		Brown-Forsythe	4,974.558	1	7,862.782	0.000
8	Operating Current Assets / Net Assets	Welch	5,140.607	1	7,954.176	0.000
		Brown-Forsythe	5,140.607	1	7,954.176	0.000
9	Consumption exploit / Ing exploit	Welch	4,514.059	1	9,081.182	0.000
		Brown-Forsythe	4,514.059	1	9,081.182	0.000
10	GVA (Gross Value Added) / Ing exploited	Welch	4,530.352	1	9,124.908	0.000
		Brown-Forsythe	4,530.352	1	9,124.908	0.000
11	GVA (Gross Value Added) / Net property exploited	Welch	964.707	1	12,600.848	0.000
		Brown-Forsythe	964.707	1	12,600.848	0.000
12	GVA (Gross Value Added) / Net operating assets	Welch	1,621.014	1	9,499.475	0.000
		Brown-Forsythe	1,621.014	1	9,499.475	0.000
13	Staff costs / n workers	Welch	113.251	1	10,259.782	0.000
		Brown-Forsythe	113.251	1	10,259.782	0.000
14	GVA (Gross Value Added) / Number of workers	Welch	155.846	1	10,347.259	0.000
		Brown-Forsythe	155.846	1	10,347.259	0.000
15	Exploited Treasury / Income	Welch	469.300	1	8,448.452	0.000
		Brown-Forsythe	469.300	1	8,448.452	0.000
16	Resources Generated	Welch	606.713	1	8,346.271	0.000
		Brown-Forsythe	606.713	1	8,346.271	0.000
17	Cash Flow / creditors a c / t	Welch	5,081.171	1	6,049.310	0.000
		Brown-Forsythe	5,081.171	1	6,049.310	0.000
18	Rene / Ing exploited	Welch	1,386.195	1	7,263.651	0.000
		Brown-Forsythe	1,386.195	1	7,263.651	0.000
19	Rene / Financial expenses	Welch	573.608	1	7,714.477	0.000
		Brown-Forsythe	573.608	1	7,714.477	0.000

(a) Asymptotically F distributed.

It is important to mention that if we omit the financial ratio that refers to the “number of workers” and consider only the 19 financial ratios of Table 2, then for initial groups 1 and 2 we obtain, in most of the financial ratios, statistical differences in both variance and mean. Furthermore, for initial groups 2 and 3 we find statistical evidence for equality of variances, but only slight evidence for equality of means. These results provide statistical evidence that in reality there must be two large groups of companies and not three. Based on these results, we opted to consider two clusters, as calculated from the beginning by the CTP analysis.

After we obtained the optimal number of conglomerates, we proceeded to calculate the two clusters using the Fuzzy C Means algorithm. To do this, we analyzed the 19 financial ratios in Table 2, without any transformation, using the Fuzme 3.5b statistical package to detect direct relationships between them and achieve a more representative grouping.

We carry out this analysis for the following fuzzy exponents: 1.1, 1.3, 1.5, 1.7 and 2. The membership level for each fuzzy clustering and for each fuzzy exponent is presented in Table 5. Also, the membership error for each Fuzzy clustering is shown in Table 6.

Table 5

Membership level with different fuzzy exponent

Cluster	Fuzzy clustering (exponent 2)	%	Fuzzy clustering (exponent 1.7)	%	Fuzzy clustering (exponent 1.5)	%	Fuzzy clustering (exponent 1.3)	%	Fuzzy clustering (exponent 1.1)	%
1	3,163	25.0	3,037	24.0	2,972	23.5	2,933	23.2	2,931	23.2
2	9,495	75.0	9,621	76.0	9,686	76.5	9,725	76.8	9,727	76.8
Total	12,658	100.0	12,658	100.0	12,658	100.0	12,658	100.0	12,658	100.0

Table 6

Membership error for each fuzzy exponent

Cluster	Exponent 1.1	Exponent 1.3	Exponent 1.5	Exponent 1.7	Exponent 2	Average	%	Belonging (frequencies)
1	2,931	2,933	2,972	3,037	3,163	3,007	24	3,007
2	9,727	9,725	9,686	9,621	9,495	9,650	76	9,650
Total	12,658	12,658	12,658	12,658	12,658	12,658	100	12,658
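The averages in Table 6 can be reproduced from the cluster sizes obtained for each fuzzy exponent. The short check below uses only the counts reported in the table; the variable names are illustrative.

```python
# Cluster sizes for fuzzy exponents 1.1, 1.3, 1.5, 1.7 and 2 (Table 6).
counts = {
    1: [2931, 2933, 2972, 3037, 3163],
    2: [9727, 9725, 9686, 9621, 9495],
}
total = 12658

# Average cluster size across the five exponents.
averages = {cluster: sum(v) / len(v) for cluster, v in counts.items()}
# Average size as a percentage of all companies.
shares = {cluster: round(100 * avg / total, 2) for cluster, avg in averages.items()}
```

The averages are 3,007.2 and 9,650.8 companies, that is, 23.76% and 76.24% of the sample, which the table reports as whole numbers.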

Once we had calculated the centroids for each fuzzy exponent, we considered, without loss of generality, the fuzzy exponent equal to 1.7. For this fuzzy exponent, we present in Table 7 the level of membership with respect to each initial group of companies.

Table 7

Level of membership

Initial group	Fuzzy cluster 1	%	Fuzzy cluster 2	%	Total
1	747	9.8	6,895	90.2	7,642
2	2,207	45.0	2,694	55.0	4,901
3	83	72.2	32	27.8	115
Total	3,037	24	9,621	76	12,658

The fuzzy algorithm allows us to observe that a large percentage (76% - 9,621 companies) of all the companies that we consider in this analysis can be classified in cluster 2, that is, as medium-sized companies. On the other hand, we calculate that 90% (6,895 companies) of small companies can be classified as medium-sized companies.
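The row percentages of Table 7 follow directly from the cross-tabulation of initial groups and fuzzy clusters. A minimal check, using only the counts in the table and illustrative variable names, is:

```python
# Counts from Table 7: initial group -> (fuzzy cluster 1, fuzzy cluster 2).
table7 = {1: (747, 6895), 2: (2207, 2694), 3: (83, 32)}

# Share of each initial group that falls in each fuzzy cluster (row percentages).
row_pct = {
    group: tuple(round(100 * count / sum(row), 1) for count in row)
    for group, row in table7.items()
}
# Column totals: size of each fuzzy cluster over all initial groups.
col_totals = tuple(sum(row[j] for row in table7.values()) for j in range(2))
```

For initial group 1, for example, 6,895 of 7,642 companies (90.2%) fall in fuzzy cluster 2.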

It is important to remember that a Probit model measures the relationship between the intensity of a stimulus and the proportion of cases that present a response to this stimulus. This probabilistic model is useful for situations where a dichotomous response is available and is thought to be influenced (or caused) by the levels of one or more independent variables, and it is particularly suitable for experimental data. Furthermore, a Probit analysis allows estimating the intensity necessary for a stimulus to induce a certain proportion of responses. The information on the financial ratios analyzed with the Probit model is shown in Table 8.

Table 8

Information on the data analysed using a Probit model

Cases		Number of cases
Valid		6,318
Rejected	Out of range (a)	0
	Missing	0
	Number of responses > Number of subjects	6,340
Control group		938
Size	1	3,093
	2	3,134
	3	91

(a) Cases rejected due to group values outside the range.

With this type of analysis, the model converged to an optimal solution after 89 iterations, which allows us to appreciate the consistency of the Probit model. Because a financial accounting analysis is not the main objective of our empirical study, we only report the estimated parameters (Table 9). In Table 9 we can see that 7 of the 19 financial ratios are statistically significant at the 95% confidence level and 12 are not, which suggests a refinement of the model.

Table 9

Estimation of the parameters using a Probit model

Model	Parameter		Estimate	Standard error	Z	Sig.	95% CI lower limit	95% CI upper limit
PROBIT (a)	Financial profitability		-1.100	1.081	-1.018	0.309	-3.218	1.018
	Economic profitability		-15.817	6.072	-2.605	0.009	-27.718	-3.916
	Margin		70.910	27.598	2.569	0.010	16.818	125.002
	Rotation		0.990	0.368	2.694	0.007	0.270	1.711
	Own funds / remunerated liabilities		-1.523	0.866	-1.760	0.078	-3.220	0.173
	Creditors c / t / Remunerated liabilities		2.624	0.942	2.786	0.005	0.778	4.470
	Net property exploited / Net assets		-4.272	2.933	-1.456	0.145	-10.020	1.477
	Operating Current Assets / Net Assets		-1.025	2.731	-0.375	0.708	-6.378	4.328
	Consumption exploit / Ing exploit		0.394	0.992	0.397	0.691	-1.550	2.338
	GVA / Ing exploited		-0.017	1.904	-0.009	0.993	-3.748	3.714
	GVA / Net property exploited		0.002	0.070	0.034	0.973	-0.136	0.140
	GVA / Net operating assets		-1.781	0.861	-2.067	0.039	-3.469	-0.092
	Staff costs / n workers		-0.040	0.034	-1.196	0.232	-0.107	0.026
	GVA / Number of workers		-0.003	0.024	-0.125	0.900	-0.051	0.045
	Exploited Treasury / Income		-2.650	1.480	-1.790	0.073	-5.551	0.252
	Resources Generated		-0.226	0.043	-5.262	0.000	-0.310	-0.142
	Cash Flow / creditors a c / t		0.504	1.105	0.457	0.648	-1.661	2.670
	Rene / Ing exploited		-57.014	26.457	-2.155	0.031	-108.870	-5.159
	Rene / Financial expenses		-0.014	0.037	-0.376	0.707	-0.087	0.059
	Intercept (b)	1	14.818	4.415	3.356	0.001	10.403	19.233
		2	15.208	4.495	3.383	0.001	10.713	19.703
		3	16.043	4.761	3.370	0.001	11.282	20.804

(a) PROBIT model: PROBIT (p) = Intercept + BX. (b) Corresponds to the grouping variable size.

The information in Table 9 helped us to know the maximum probability of belonging to each cluster predicted using the Fuzzy C Means algorithm. The results for this section of our analysis are presented in Table 10.

Table 10

Average probability of membership

Initial group	Forecasted group	Probability
1	1	0.69
	2	0.99
2	1	0.67
	2	0.99
3	1	0.65
	2	1.00

Through the Probit model, we found that cluster 2, calculated using the Fuzzy C Means algorithm, presented the highest probability of membership for all the companies analyzed. In other words, cluster 2 seems to absorb most of the companies that showed a change of cluster. However, this probability should not be understood as the probability of “attraction” to the cluster; rather, the Probit model provides the probability that the new cluster (if there was a cluster change) is correct. In Table 11 we show the result of the test for the correct specification of the model: we obtained, in this case, a significance level of 1. This last result confirms the validity and correct specification of the Probit model, but also suggests a refinement of the model.

Table 11

Result of Pearson’s goodness-of-fit test

Model	Test	Chi-squared	df (a)	Sig.
PROBIT	Pearson’s goodness-of-fit test	3,744.519	6,295	1.000

(a) Statistics based on individual cases differ from statistics based on aggregate cases.

In Table 12 we present the distribution of the predicted groups with respect to the initial groups. It is clear that the Probit model confirms the result obtained with the fuzzy algorithm, that is, a large percentage of small companies have financial information to classify them as medium-sized companies. Finally, the final distribution of all the companies analyzed is shown in Fig. 2.

Table 12

Results of the classification through a Probit model

Initial group	Total	Forecasted group 1	Forecasted group 2
1	7,642	747	6,895
2	4,901	2,207	2,694
3	115	83	32
Total	12,658	3,037	9,621

The results assume a fuzzy exponent of 1.7.

Fig. 2

Distribution of predicted groups.

4 Conclusion

The criteria most used to define the size of a company are quantitative, specifically the number of employees and total sales figures. In our article we do not establish quantitative limits for these two criteria; instead, we propose a novel methodology that uses a company’s financial information to determine whether it should belong to the group of small, medium or micro-companies.

In this work we obtained an optimal number of two clusters using the CTP algorithm. We used this result to form two fuzzy clusters with the Fuzzy C Means technique, where we obtained an average membership share of 76.24% for fuzzy cluster 2 and of 23.76% for fuzzy cluster 1. Also, the Probit analysis reached an optimal response after 89 iterations and was used to estimate the probability of belonging to each cluster predicted by the Fuzzy C Means algorithm: the group with the highest average probability was fuzzy cluster 2. We can therefore deduce that there is statistical evidence for refining the Probit model and for involving other statistical analyses in order to emphasize the consistency of the results. Nevertheless, the results we obtained with this methodology were quite acceptable.

For the 12,658 Catalan SMEs analyzed in this article, and based on 19 selected financial ratios, we present statistical evidence that the initial groups should not be three, but two. In other words, the size reported by each company analyzed is not necessarily equal to the size we calculated using the methodology of this article. These results are confirmed because we found that for companies with a size 3 (medium-sized company), 72% were classified with a size 1 (micro-company); 28% were classified with size 2 (small business) and none were classified with size 3 (medium business). Similarly, we found that 45% of small companies (size 2) have financial information to be classified as micro-companies (size 1). On the other hand, 90% of micro-companies (size 1) have financial information to be classified as small companies (size 2).

Although these results cannot be considered conclusive, they can be taken as a first study that presents a new methodology to define the size of a significant sample of companies. The result suggests carrying out an accounting analysis to verify whether some companies engage in tax evasion (those that report a small size but could be larger), or whether some companies report a larger size than they actually have in order to qualify for better financial subsidies from the government.

Footnotes

There are undoubtedly multiple factors that influence the classification of a company, including: other synthetic ratios, the territorial effect and the effect of the economic and/or political environment, to name a few.

In this article we define the following: 1 = Micro company; 2 = Small business and 3 = Medium business.

The work of [] proposes a series of financial ratios to be used specifically in business financial analysis, which seek to answer this question.

References

1. Calvo-Silvosa and Boedo, Incidencia do tamaño sobre o comportamento financeiro da empresa. Unha análise empírica con pemes galegas, Revista Galega de Economía 10(2) (2001), 1–23.

2. Galindo, El tamaño empresarial como factor de diversidad. Universidad de Cádiz y el Grupo de Investigación SEJ-366 del Plan Andaluz de Investigación, (2005), pp. 1–198.

3. Rencher A.C. and Christensen, Methods of Multivariate Analysis, 3rd Edition. USA: Wiley Series in Probability and Statistics, 2012.

4. Healy, O’Dwyer and Ledwith, An exploration of product advantage and its antecedents in SMEs, Journal of Small Business and Enterprise Development 25(1) (2018), 129–146.

5. Camisón and Villar-López, Effect of SMEs’ International Experience on Foreign Intensity and Economic Performance: The Mediating Role of Internationally Exploitable Assets and Competitive Strategy, Journal of Small Business Management 48(2) (2010), 116–151.

6. Serrano, Molinero, y M.C. and Larraz J.L., Country and size effects in financial ratios: A European perspective, Global Finance Journal 16 (2005), 26–47.

7. Trejo Pech C.O., Noguera and White, Financial ratios used by equity analysts in Mexico and stock returns, Contaduría y Administración 60 (2015), 578–592.

8. Etter, Lippincott R.B. and Reck, An analysis of U.S. and Latin American financial accounting ratios, Advances in International Accounting 19 (2006), 145–173.

9. Zevallos, Micro, Pequeñas y Medianas empresas en América Latina, Revista de la CEPAL 79 (2003), 53–70.

10. Schmitz, Collective Efficiency: Growth Path for Small-scale Industry, Journal of Development Studies 31(4) (1995), 529–566.

11. Schmitz, Growth Constraints on Small-scale Manufacturing in Developing Countries: A Critical Review, World Development 10 (1982), 429–450.

12. Bierens H.J., Introduction to the mathematical and statistical foundations of econometrics. USA: Cambridge University Press, (2007), pp. 1–344.

13. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, Journal of Cybernetics 3(3) (1974), 32–57.

14. Han and Kamber, Data Mining: Concepts and Techniques, Second Edition. USA: Morgan Kaufmann Publishers Inc. Elsevier, 2006.

15. Soto, Flores-Sintas and Vigo M.I., Marco formal para una nueva función objetivo en agrupación difusa, Revista Iberoamericana de Inteligencia Artificial 8(23) (2004), 35–41.

16. Díez J.L., Navarro J.L. and Sala, Algoritmos de agrupamiento en la identificación de modelos borrosos, Revista Iberoamericana de Automática e Informática Industrial 1(2) (2004), 32–41.

17. Gallizo J.L. and Salvador, Understanding the behavior of financial ratios: the adjustment process, Journal of Economics and Business 55 (2003), 267–283.

18. Wooldridge J.M., Introductory Econometrics: A Modern Approach, 5th Edition. United States: Cengage Learning, 2012.

19. Boedo and Calvo A.R., Incidencia del tamaño sobre el comportamiento financiero de la empresa. Un análisis empírico con PYMES Gallegas, Revista Galega de Economía 10(2) (2001), 1–23.

20. Rodrigues and Rodrigues, Economic-financial performance of the Brazilian sugarcane energy industry: An empirical evaluation using financial ratio, cluster and discriminant analysis, Biomass and Bioenergy 108 (2018), 289–296.

21. Halkidi, Batistakis and Vazirgiannis, On clustering validation techniques, Journal of Intelligent Information Systems 17(2-3) (2001), 107–145.

22. Saavedra and Hernández, Caracterización e importancia de las MIPYMES en Latinoamérica: un estudio comparativo, Actualidad Contable 17 (2008), 122–134.

23. Azari M.J., Madsen T.K. and Moen, Antecedent and outcomes of innovation-based growth strategies for exporting SMEs, Journal of Small Business and Enterprise Development 24(4) (2017), 733–752.

24. Linares-Mustarós, Coenders and Vives-Mestres, Financial performance and distress profiles. From classification according to financial ratios to compositional classification, Advances in Accounting 40 (2018), 1–10.

25. Sosa S.M.C., Breve inventario de los principales ratios financieros utilizados en el análisis financiero empresarial, Observatorio de la Economía Latinoamericana 152 (2011), 1–23.

26. Greene, Econometric Analysis (8th Edition), USA: Pearson, 2018.

27. Keogh and Galloway, Teaching enterprise in vocational disciplines: reflecting on positive experience, Management Decision 42 (2004), 531–541.

28. Xiao, Dang, Yang and Yang, Financial ratio selection for business failure prediction using soft set theory, Knowledge-Based Systems 63 (2014), 59–67.

29. Kumar, Quinlan J.R., Ghosh, Yang, Motoda, et al., Top 10 algorithms in data mining, Knowl Inf Syst 14 (2008), 1–37.

30. Chen Y.J., Liou W.Ch., Chen Y.M. and Wu J.H., Fraud detection for financial statements of business groups, International Journal of Accounting Information Systems 32 (2019), 1–23.

31. Yu-Jie and Hsuan-Shih, A clustering method to identify representative financial ratios, Information Sciences 178 (2008), 1087–1097.

Table 2. Grouping of selected financial ratios

Group | No. | Label
A | 1 | Own_F
A | 2 | Credits
B | 3 | External_F
B | 4 | Current_A
C | 5 | Operation
C | 6 | Income_P
C | 7 | Cash_C
C | 8 | Solvency
D | 9 | Financial_P
D | 10 | Economic_P
D | 11 | Margin
D | 12 | Rotation
D | 13 | Produc_F
D | 14 | General_R
D | 15 | Liquidity
D | 16 | Financial_E
D | 17 | Asset_P
D | 18 | VAB
D | 19 | Person_E
