A Fuzzy Clustering-based Approach for Classifying COVID-19 Patients by Age and Early Symptom Indicators

Abstract: COVID-19, a devastating illness affecting people worldwide, poses challenges in determining the severity of a patient's condition during the early stages of infection. To address this issue, we propose a fuzzy clustering-based model that accurately classifies COVID-19 patients based on age and the severity of early symptoms (fever, dry cough, breathing difficulties, headache, and smell and taste disturbance). This model aims to enable prompt and personalized therapy for diagnosed patients. Compared to previous hard clustering techniques, our method shows promising results in reducing COVID-19-related deaths and increasing the likelihood of full recovery for affected individuals.


INTRODUCTION
Coronaviruses have long been known to infect humans and animals, and they have been shown to be highly infectious [1], [2]. Coronaviruses are the causative agents of severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS). On January 7, 2020, a new coronavirus was discovered during the investigation of a cluster of viral pneumonia cases in Wuhan, China [3], [4]. People were first found to be infected with the COVID-19 virus in Wuhan, in a market that sold live animals. Although it has not yet been confirmed, a zoonotic origin is being investigated to determine where the virus came from; a zoonosis is a disease that is transmitted from animals to humans [5]. More than 151,000 individuals have died directly as a result of illness caused by the coronavirus, which has been detected in over 2.2 million people all over the globe [6], [7]. Classifying COVID-19 patients by age and by early signs and symptoms makes it simpler to deliver more individualized therapy. Two of the most immediate benefits of such therapy are a reduction in the number of hospitalizations and an improvement in patient outcomes [8], [9]. It is believed that the greatest long-term benefit could be obtained by limiting transmission, that is, by reducing the length of time a patient remains contagious [10]. Among machine learning approaches to identifying the disease, hard clustering has been found to be less successful than fuzzy clustering at refining the categories of COVID-19 patients [11], [12]. The present work offers the following key contributions:
• A fuzzy clustering-based model for accurately categorizing COVID-19 patients based on age and the severity of early symptoms.
• A comparative evaluation of the model against hard clustering approaches, highlighting its superiority in reducing COVID-19-related deaths and increasing the likelihood of full recovery.
These contributions address the critical need for efficient patient classification and treatment strategies during the early stages of COVID-19 infection, and thereby advance the field of COVID-19 management.

1.1. Fuzzy Clustering Algorithm
Fuzzy clustering is a clustering method that allows each data point to belong to several clusters simultaneously. Clustering is the process of assigning data points to clusters so that items in the same cluster are as similar as possible, while items in different clusters are as dissimilar as possible. Similarity can be measured in terms of distance, connectivity, or intensity [23]. An appropriate similarity measure can be chosen according to the data or the requirements of the application.

Fuzzy C-Means Clustering
In the fuzzy data clustering method known as fuzzy c-means (FCM), a data set is divided into N clusters, and each data point in the set belongs to every cluster to some degree [24]. For instance, a data point that lies close to the centre of a cluster has a high degree of membership in that cluster, whereas a data point that lies far from the centre of a cluster has a significantly lower degree of membership in it.

RESEARCH METHODOLOGY
The research methodology comprises the following steps: i) fuzzy c-means clustering is applied to the COVID-19 patient data set; ii) the appropriate number of clusters is determined with the help of the elbow method; iii) once the best number of clusters has been obtained, the degree of fuzzy overlap in fuzzy c-means clustering is adjusted by re-evaluating the impact of the exponent parameter value (m); iv) the membership matrix is used to generate the threshold values; v) finally, the severity of illness (SOI) classes for the COVID-19 patients are created using the threshold values. The stages of the research methodology are shown in Figure 1.

Fuzzy Classification of COVID-19 Patients
Fuzzy clustering is so called because each data point may belong to more than one cluster at the same time [13]. Cluster analysis is the process of grouping data points so that the points within a cluster are as similar to each other as feasible, while points in different clusters are as different from each other as possible [14]. The fuzzy clustering technique was used to categorize COVID-19 patients into N clusters, with 1 < N < D, where D is the number of data points and N is the number of clusters [15], [16]. This method estimates both a patient's degree of membership in a given cluster and the position of the cluster's centre. Patients whose membership value in a cluster is larger than zero are considered members of that cluster (Eq. 1).
The membership value µij gives the degree to which an individual COVID-19 patient i is associated with cluster j. It is subject to two restrictions: condition (a) requires every membership value to lie between 0 and 1, and condition (b) requires a patient's membership values across all clusters to sum to one. Figure 2 illustrates the membership values that link COVID-19 patients to the different clusters.
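In standard notation, and using the symbols defined in this section (D patients, N clusters, membership µij), the two conditions can be written as follows. This is the usual textbook form of the FCM membership constraints.

```latex
\begin{aligned}
\text{(a)}\quad & 0 \le \mu_{ij} \le 1, && i = 1,\dots,D,\quad j = 1,\dots,N \\
\text{(b)}\quad & \sum_{j=1}^{N} \mu_{ij} = 1, && i = 1,\dots,D
\end{aligned}
```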

Figure 2: Weighted Patient Belongingness Graph as a Bipartite Graph

Fuzzy C-Means Objective Function
When employing the fuzzy c-means clustering technique, each data point may belong to a number of different clusters, with a different degree of membership in each cluster [17], [18], [19]. The clustering is obtained by minimizing the objective function given in Eq. 3.
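In its standard formulation, the FCM objective function is the membership-weighted sum of squared distances between the data points and the cluster centres. The symbol c_j for the centre of the j-th cluster is an assumption of this sketch; the remaining symbols follow the definitions in this section.

```latex
J_m = \sum_{i=1}^{D} \sum_{j=1}^{N} \mu_{ij}^{\,m} \, \lVert x_i - c_j \rVert^{2}
```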

Where,
• D is the number of data points.
• N is the number of clusters.
• m is the fuzzy partition matrix exponent, which controls the degree of fuzzy overlap, with m > 1. Fuzzy overlap refers to the number of data points that have a high membership value in more than one cluster.
• µij is the degree of membership of xi in the j-th cluster, where xi is the i-th data point. For a given data point, the membership values over all clusters sum to 1.

FCM clustering consists of the following steps:
i. Randomly initialize the cluster membership values µij.
ii. Calculate the cluster centres using Eq. 4.
iii. Update the membership values µij according to Eq. 1.
iv. Calculate the objective function Jm.
v. Repeat steps ii to iv until Jm improves by less than a specified minimum threshold.
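The five steps can be sketched in plain Python as shown here. This is a minimal illustration, not the MATLAB implementation used in this work. It assumes the usual FCM update rules: centres are means weighted by µij^m, and memberships come from inverse relative distances. The function name `fcm_sketch`, its parameter names, and the toy points are all hypothetical.

```python
import random


def fcm_sketch(data, n_clusters, m=2.0, max_iter=100, min_improvement=1e-5, seed=0):
    """Fuzzy c-means on a list of points (each a list of floats).

    Returns (centres, memberships, history), where memberships[i][j] is the
    membership of point i in cluster j and history holds J_m per iteration.
    """
    rng = random.Random(seed)
    n_points, n_dims = len(data), len(data[0])

    # Step i: random memberships, normalised so each point's row sums to 1.
    u = []
    for _ in range(n_points):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([value / total for value in row])

    centres, history = [], []
    exponent = 1.0 / (m - 1.0)
    for _ in range(max_iter):
        # Step ii: each centre is the mean of the points weighted by u_ij^m.
        centres = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(n_points)]
            total = sum(weights)
            centres.append([sum(w * p[d] for w, p in zip(weights, data)) / total
                            for d in range(n_dims)])

        # Squared distances, floored so a point sitting on a centre
        # does not cause a division by zero.
        dist2 = [[max(sum((p[d] - c[d]) ** 2 for d in range(n_dims)), 1e-12)
                  for c in centres] for p in data]

        # Step iii: update memberships from the relative squared distances.
        u = [[1.0 / sum((row[j] / row[k]) ** exponent for k in range(n_clusters))
              for j in range(n_clusters)] for row in dist2]

        # Step iv: evaluate the objective function J_m.
        objective = sum(u[i][j] ** m * dist2[i][j]
                        for i in range(n_points) for j in range(n_clusters))
        history.append(objective)

        # Step v: stop once J_m improves by less than the minimum threshold.
        if len(history) > 1 and abs(history[-2] - history[-1]) < min_improvement:
            break

    return centres, u, history


# Hypothetical toy data: two well-separated groups in two dimensions.
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
centres, memberships, history = fcm_sketch(points, n_clusters=2)
```

Each row of `memberships` sums to one, matching condition (b) on the membership values, and `history` records the value of Jm at every iteration so that convergence can be inspected.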

IMPLEMENTATION OF THE FUZZY C-MEANS ALGORITHM IN MATLAB
The FCM algorithm is applied to the following patient data set in order to determine the severity of illness (SOI) class of each COVID-19 patient.

Discrete_Numbers
• Age of the COVID-19 patient.
• Columns C, D, E, F and G represent the early COVID-19 symptoms in terms of the following standards.

The FCM function takes the following inputs:
i. (Matrix) Data: the data to be clustered, in a multidimensional feature space, where each row describes one data point.
ii. (Integer) Cluster n: the number of clusters, specified as an integer greater than 1.
iii. (Vector) Options: the clustering options, specified as a vector with the following elements.
• Options (1): the exponent of the fuzzy partition matrix (U), specified as a scalar greater than 1.0. The default value is m = 2.0.
• Options (2): the maximum number of iterations, specified as a positive integer. The default value is 100.
• Options (3): the minimum improvement in the objective function between two consecutive iterations, specified as a positive scalar. The default value is 1e-5.
• Options (4): the information display flag, which determines whether the value of the objective function Jm is displayed after each iteration. The default value is 1.
If any element of Options is set to NaN, the default value for that option is used instead.

The function returns the following outputs:
i. (Matrix) Centers: the final cluster centres, containing the centre coordinates of each individual cluster.
ii. (Matrix) Membership U: the membership matrix of the clusters.
iii. (Vector) ObjFunc: the values of the objective function for each iteration of the algorithm, returned as a vector.
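The NaN-means-default convention for the options vector can be illustrated with a short Python sketch. The helper name `resolve_options` is hypothetical, and treating a shortened options list as "use defaults for the rest" is an assumption of this sketch rather than something stated above.

```python
import math

# Defaults as listed above: exponent m, maximum iterations,
# minimum improvement, and display flag.
DEFAULT_OPTIONS = [2.0, 100, 1e-5, 1]


def resolve_options(options=None):
    """Return a full four-element options list.

    NaN entries (and, by assumption, missing trailing entries) are
    replaced by the corresponding default value.
    """
    resolved = list(DEFAULT_OPTIONS)
    for position, value in enumerate(options or []):
        if position >= len(resolved):
            raise ValueError("at most four options are supported")
        is_nan = isinstance(value, float) and math.isnan(value)
        if not is_nan:
            resolved[position] = value
    if resolved[0] <= 1.0:
        raise ValueError("the exponent m must be greater than 1")
    return resolved


# Override only the exponent and the display flag; keep the other defaults.
options = resolve_options([2.1, float("nan"), float("nan"), 0])
```

Here `options` ends up with the exponent 2.1, the default iteration limit and tolerance, and the display flag switched off.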

Compute the Ideal Number of Clusters
Partitioning methods such as FCM clustering require the user to specify the number of clusters N to be formed, so determining the best number of clusters in a data set is a basic problem. The elbow method is a heuristic for choosing the number of clusters in cluster analysis. The technique consists of plotting the explained variation as a function of the number of clusters and picking the elbow of the curve as the number of clusters to use. As shown in Fig. 3, the elbow occurs at N = 4, so the optimal number of clusters for FCM clustering is 4. The objective function values 32.29766 and 32.291655 differ by only about 0.006 (roughly 0.02 percent), so the objective function has essentially stopped improving and the optimization has reached its conclusion.
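In this work the elbow is read off the plot in Fig. 3. As a rough programmatic stand-in, the bend can be located with a second-difference rule, which is a swapped-in heuristic and not the procedure used here. The function name `elbow_point` and the objective values in the example are hypothetical and are not results from this study.

```python
def elbow_point(objective_by_n):
    """Pick the cluster count where the curve bends most sharply.

    objective_by_n maps consecutive cluster counts to the final objective
    value obtained with that count. The bend at n is measured by the second
    difference f(n-1) - 2*f(n) + f(n+1); the largest value marks the elbow.
    """
    counts = sorted(objective_by_n)
    best_n, best_bend = None, float("-inf")
    for prev_n, n, next_n in zip(counts, counts[1:], counts[2:]):
        bend = (objective_by_n[prev_n]
                - 2 * objective_by_n[n]
                + objective_by_n[next_n])
        if bend > best_bend:
            best_n, best_bend = n, bend
    return best_n


# Hypothetical objective values for N = 2..6 (illustrative only).
example_curve = {2: 90.0, 3: 60.0, 4: 35.0, 5: 32.0, 6: 30.0}
chosen_n = elbow_point(example_curve)
```

For the illustrative curve above the rule selects 4, because the drop in the objective flattens most abruptly after four clusters. A visual check of the plot remains advisable, since the second-difference rule can be misled by noisy curves.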

Cluster Memberships Matrix
The membership matrix for the first ten patients, giving their degree of membership in each cluster, is provided in Table 4.

RE-EXAMINING THE INFLUENCE OF (M) EXPONENT PARAMETER VALUE
Each patient is allocated to a cluster based on the patient's greatest membership value. It is difficult to decide how to assign patients whose greatest membership value is shared by more than one cluster [20], [21]. For example, a maximum membership value of 0.5 indicates that the patient may be equally represented in more than one cluster.
>> index1 = find(max(P) <= 0.5)
   2 3 4 . . . . . . . . .
index1 lists the patients whose maximum membership value is at most 0.5 and who may therefore be split across more than one cluster depending on their membership values, as indicated in Figure 4. Patients with a small maximum membership value highlight a large degree of fuzzy overlap, as shown in Figure 5.
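The same selection can be sketched in Python. The sketch assumes the membership matrix is stored with one row per patient, which is the transpose of the MATLAB matrix P, where each column is a patient. The function name and the membership values are hypothetical.

```python
def ambiguous_patients(memberships, cutoff=0.5):
    """Return 1-based indices of patients whose largest membership is <= cutoff."""
    return [index
            for index, row in enumerate(memberships, start=1)
            if max(row) <= cutoff]


# Hypothetical membership rows for three patients and four clusters.
example_memberships = [
    [0.90, 0.05, 0.03, 0.02],  # clearly belongs to cluster 1
    [0.40, 0.30, 0.20, 0.10],  # no dominant cluster
    [0.25, 0.25, 0.25, 0.25],  # fully ambiguous
]
index1 = ambiguous_patients(example_memberships)
```

With these illustrative rows, patients 2 and 3 are flagged, since neither has a membership value above 0.5 in any cluster.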

: Fuzzy Cluster Analysis
A black "X" in Fig. 5 indicates a patient whose maximum membership value is less than or equal to 0.5; for these patients there is greater uncertainty about cluster membership. To reduce this ambiguity and to fine-tune the degree of fuzzy overlap in the fuzzy classification, we proceed in two steps. In the first step, we run fuzzy c-means with several values of the partition matrix exponent (m), namely 2.2, 2.1, 2.0, 1.9, and 1.8, to determine which exponent value is best for clustering the COVID-19 patient data set. In the second step, to assign each patient to clusters, we set a threshold value by observing the average maximum membership value. Table 5 displays the complete results for the different exponent values (m). These results show that an exponent parameter value of 2.1 is best for clustering the COVID-19 patient data set, since it has the lowest execution time, number of iterations, and objective function value of all the values examined.

THE AVERAGE MAXIMUM VALUE OF MEMBERSHIP MATRIX
Using the average maximum value of the membership matrix [22], we derived a threshold value for categorizing each patient into a cluster, which we then applied to the whole membership matrix.
>> maxP = max(P)
>> Average_Max = mean(maxP)
Average_Max = 0.2762
The average maximum membership value, Average_Max = 0.2762, provides a quantitative measure of the overlap.
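A Python counterpart of this calculation is sketched here, again assuming one row per patient (the transpose of P). The function name and the input values are hypothetical.

```python
def average_max_membership(memberships):
    """Mean, over all patients, of each patient's largest membership value."""
    if not memberships:
        raise ValueError("the membership matrix must not be empty")
    return sum(max(row) for row in memberships) / len(memberships)


# Hypothetical membership rows for two patients and four clusters.
average_max = average_max_membership([
    [0.30, 0.20, 0.25, 0.25],
    [0.40, 0.20, 0.20, 0.20],
])
```

A value close to 1/N, where N is the number of clusters, signals that memberships are spread almost evenly, while a value close to 1 signals nearly crisp clusters.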

Threshold Values (λ)
We defined two threshold values, λ1 = 0.30 and λ2 = 0.25, based on the average maximum membership value of 0.2762. λ1 is set slightly higher and λ2 slightly lower than this average, which gives a range that brackets the average maximum value for patient clustering within the fuzzy model. Different threshold values can produce different classification results. With λ1 = 0.30, a patient whose membership value in a cluster is equal to or greater than 0.30 is recognized as a member of that cluster; similarly, with λ2 = 0.25, a patient whose membership value in a cluster is equal to or greater than 0.25 is regarded as a member of that cluster. Table 6 shows the membership matrix values for the first ten patients with the new exponent parameter value m = 2.1. Table 7 shows the resulting cluster assignments of these first ten patients when the threshold is 0.3. Higher threshold values yield crisper clusters with less overlap, while lower threshold values allow more cluster overlap.
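The thresholding rule can be sketched in Python as follows. The function name and the membership values are hypothetical, and the matrix is assumed to hold one row per patient.

```python
def clusters_above_threshold(memberships, threshold):
    """For each patient, list the 1-based clusters whose membership is >= threshold."""
    return [[cluster
             for cluster, value in enumerate(row, start=1)
             if value >= threshold]
            for row in memberships]


# Hypothetical membership rows for two patients and four clusters.
example_rows = [
    [0.35, 0.27, 0.20, 0.18],
    [0.26, 0.26, 0.26, 0.22],
]
with_lambda1 = clusters_above_threshold(example_rows, 0.30)
with_lambda2 = clusters_above_threshold(example_rows, 0.25)
```

With the threshold 0.30 the first illustrative patient is assigned to cluster 1 only and the second to no cluster at all, whereas with 0.25 the first is assigned to clusters 1 and 2 and the second to clusters 1, 2 and 3. This shows how a lower threshold admits more overlap.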

RESULTS AND DISCUSSION
The severity of illness (SOI) is a measure of the extent to which a patient's organ systems are compromised or their physiology is altered as a direct consequence of their illness. It gives a medical classification of the disease as mild, moderate, significant, or severe, based on the severity of the symptoms. The goal of the SOI class is to provide a basis for analysing the use of hospital resources or developing standards for patient care. Based on the findings of this study, six SOI classes were determined with λ = 0.25. Patients classified in Cluster-1 were assigned to SOI Class 1, and patients classified in Cluster-2 were assigned to SOI Class 2.
Certain patients, on the other hand, were classified in more than one cluster simultaneously. Patients in SOI Class 3 were classified in both Cluster-2 and Cluster-3; patients in SOI Class 4 in Cluster-1, Cluster-2, and Cluster-3; and patients in SOI Class 5 in both Cluster-1 and Cluster-3. Lastly, patients in SOI Class 6 were classified only in Cluster-4.
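The mapping from cluster combinations to SOI classes can be written as a lookup table, sketched here in Python. The sketch assumes that SOI Classes 1 and 2 correspond to membership in Cluster-1 alone and Cluster-2 alone, and it returns None for any combination not listed in the text. The names `SOI_CLASS_BY_CLUSTERS` and `soi_class` and the example memberships are hypothetical.

```python
# Cluster combinations and their SOI classes, as described in the text.
# Classes 1 and 2 are assumed to mean "that cluster only".
SOI_CLASS_BY_CLUSTERS = {
    frozenset({1}): 1,
    frozenset({2}): 2,
    frozenset({2, 3}): 3,
    frozenset({1, 2, 3}): 4,
    frozenset({1, 3}): 5,
    frozenset({4}): 6,
}


def soi_class(membership_row, threshold=0.25):
    """Return the SOI class for one patient's membership values.

    A patient counts as a member of every cluster whose membership value
    is >= threshold. Combinations not described in the text yield None.
    """
    clusters = frozenset(cluster
                         for cluster, value in enumerate(membership_row, start=1)
                         if value >= threshold)
    return SOI_CLASS_BY_CLUSTERS.get(clusters)


# Hypothetical patient with comparable membership in clusters 1, 2 and 3.
example_class = soi_class([0.30, 0.30, 0.30, 0.10])
```

Returning None for unlisted combinations keeps the sketch from guessing a class the study does not define. How such patients should be handled is left open here.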

CONCLUSION
In this work, we evaluated an efficient way of categorizing COVID-19 patients based on their early symptoms and their ages using a fuzzy classification model. This method is more suitable than hard clustering techniques for forecasting a patient's severity of illness (SOI) class in the early infection phase, which can play an important role in the timely and individualized treatment of patients. A key benefit of designing a customized treatment plan based on SOI is that the patient's health may be improved in the early stages of the disease by shortening the time they are infectious and avoiding hospitalizations.