Data mining can be defined as a set of techniques or a process for analyzing data from different perspectives in order to discover relationships among separate data items. Different data mining tools are used to retrieve data from a data warehouse.

**Data Mining Assignment Help (Sample)**

Data mining is the process of extracting useful information or hidden, meaningful patterns from given data sets. It also involves data analysis and helps in viewing the effects of certain factors or attributes on a given outcome. It involves data pre-processing, data cleaning, data visualization, data classification, association, and clustering algorithms.

Searching for information with a query always requires specific code and a query format suited to the nature of the question asked. Moreover, SQL alone cannot reveal how different factors relate to a predicted attribute.

The data provides information about different environmental factors that would influence the use of autopilot or manual landing.

**Attribute Information:**

1. Class: noauto, auto (that is, advise using manual/automatic control)

2. STABILITY: stab, xstab

3. ERROR: XL, LX, MM, SS

4. SIGN: pp, nn

5. WIND: head, tail

6. MAGNITUDE: Low, Medium, Strong, OutOfRange

7. VISIBILITY: yes, no

There are several approaches possible (rules, trees, …) to create a model to make a decision for future landings. (This is a few pages including output from WEKA. Don’t just spit out material, explain output)

There are 15 instances and 7 attributes. The screenshot above shows all attributes; the first attribute has data type ‘Nominal’ with labels 1 and 2. All attributes, with their data types and values, are given below.

| Attribute | Data-Type | Data-Labels [count value] |
|---|---|---|
| Class | Nominal | 1-NoAuto [6], 2-AutoPilot [9] |
| Stability | Nominal | 1-Stab [12], 2-Xstab [1] |
| Error | Nominal | 1-XL [1], 2-LX [1], 3-MM [7], 4-SS [3] |
| Sign | Nominal | 1-pp [6], 2-nn [1] |
| Wind | Nominal | 1-head [3], 2-tail [4] |
| Magnitude | Nominal | 1-Low [1], 2-Med [3], 3-Strong [3], 4-OutOfRange [4] |
| Visibility | Nominal | 1-Yes [14], 2-No [1] |

The main problem is to decide about future landings. This can be analyzed through the relation of the given attributes with the class attribute value, that is, how the given attribute values affect the class attribute ('Auto-landing' or 'Manual-Landing').

I will apply different 'Decision Tree' classification algorithms in order to obtain the highest accuracy in predicting the class attribute value. The model that gives the highest accuracy would be the best one for analyzing the relation of the given attribute values with the class attribute value.

Secondly, ‘Association rule generation’ using ‘Apriori’ can also be applied in order to visualize association rules.

### a. Data Pre-processing

Data pre-processing techniques need to be applied first, as the given data set has missing values. The attributes with missing values are as follows.

**Stability: 2 missing**

**Error: 3 missing**

**Sign: 8 missing (53%)**

**Wind: 8 missing (53%)**

**Magnitude: 5 missing (33%)**

Without data pre-processing, the missing attribute values would affect the results when visualizing relations with the class attribute, so a missing-values replacement filter is applied here. Missing values of a nominal attribute are replaced by the mode of that attribute.

For example, after applying this filter, the missing values of the ERROR attribute are replaced by the attribute mode ‘3’, so the count of value ‘3’ is now 10, where it was previously 7.
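The mode replacement described above can be sketched in a few lines of Python. This is only an illustration, not WEKA's own code: the `impute_mode` helper is a hypothetical name, and the row order of the ERROR column is made up, with only the counts (1, 1, 7, 3 and three missing) taken from the figures above.

```python
from collections import Counter

# ERROR column coded as in the attribute table (1=XL, 2=LX, 3=MM, 4=SS).
# None marks a missing value; the row order is an assumption for illustration.
error = [1, 2] + [3] * 7 + [4] * 3 + [None] * 3


def impute_mode(values):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


filled = impute_mode(error)
counts = Counter(filled)
print(counts[3])  # 10: the 7 original MM values plus 3 replaced missing values
```

The three missing entries all take the mode value 3, which matches the jump from 7 to 10 reported above.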

To apply association rules for finding the relation of the given attributes with the class attribute, the attributes need to be rearranged: by default this data set has the 'Class' attribute at the first index, and it should be at the last index. This is done in the following way.

### Data attributes Reordering
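As a rough sketch of the reordering itself, the snippet below moves the first column to the end. The helper `move_first_to_last` and the single data row are hypothetical; the source does not say which WEKA filter or menu option was used for this step.

```python
# Column names follow the attribute list; the data row is a made-up example.
header = ["Class", "STABILITY", "ERROR", "SIGN", "WIND", "MAGNITUDE", "VISIBILITY"]
row = [2, 1, 4, 1, 2, 2, 1]


def move_first_to_last(seq):
    """Return a copy of seq with its first element moved to the end."""
    return list(seq[1:]) + [seq[0]]


new_header = move_first_to_last(header)
new_row = move_first_to_last(row)
print(new_header)  # 'Class' is now the last column
print(new_row)
```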

**Best rules found:**

- ERROR=4 3 ==> Class=2 3 conf:(1)
- MAGNITUDE=2 3 ==> Class=2 3 conf:(1)
- STABILITY=1 ERROR=4 3 ==> Class=2 3 conf:(1)
- STABILITY=1 MAGNITUDE=2 3 ==> Class=2 3 conf:(1)
- ERROR=4 SIGN=1 3 ==> Class=2 3 conf:(1)
- ERROR=4 WIND=2 3 ==> Class=2 3 conf:(1)
- ERROR=4 VISIBILITY=1 3 ==> Class=2 3 conf:(1)
- SIGN=1 MAGNITUDE=2 3 ==> Class=2 3 conf:(1)
- MAGNITUDE=2 VISIBILITY=1 3 ==> Class=2 3 conf:(1)
- STABILITY=1 ERROR=4 SIGN=1 3 ==> Class=2 3 conf:(1)

The above rules show how the given attribute values relate to class attribute value ‘2’ (autopilot landing), e.g. if Stability=1 (stab) and Magnitude=2 (medium), then the class is autopilot landing.

All the rules show that the occurrence of the values Magnitude=2 (Medium), Stability=1 (Stab), Error=4 (SS), and Sign=1 (pp) leads to the predicted landing value ‘Auto-Pilot’.
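In each rule listed above, the number after the antecedent is how many instances match it, the number after the consequent is how many of those also match the consequent, and `conf` is the ratio of the two. The sketch below shows that calculation. The `rule_stats` helper is a hypothetical name, and the records are toy values invented for illustration, not the original data set.

```python
def rule_stats(records, antecedent, consequent):
    """Return (antecedent count, joint count, confidence) for a rule."""
    def matches(rec, cond):
        return all(rec.get(k) == v for k, v in cond.items())

    n_ante = sum(1 for r in records if matches(r, antecedent))
    n_both = sum(
        1 for r in records if matches(r, antecedent) and matches(r, consequent)
    )
    conf = n_both / n_ante if n_ante else 0.0
    return n_ante, n_both, conf


# Hypothetical toy records, NOT the original landing data.
records = [
    {"ERROR": 4, "MAGNITUDE": 2, "Class": 2},
    {"ERROR": 4, "MAGNITUDE": 1, "Class": 2},
    {"ERROR": 4, "MAGNITUDE": 3, "Class": 2},
    {"ERROR": 3, "MAGNITUDE": 3, "Class": 1},
    {"ERROR": 3, "MAGNITUDE": 1, "Class": 2},
]

print(rule_stats(records, {"ERROR": 4}, {"Class": 2}))  # (3, 3, 1.0)
print(rule_stats(records, {"ERROR": 3}, {"Class": 2}))  # (2, 1, 0.5)
```

A confidence of 1 therefore only says that every instance matching the antecedent had that class. With counts as small as 1 or 3, such rules rest on very few instances.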

**Best rules found for class=1 Manual-landing**

- STABILITY=2 1 ==> Class=1 1 conf:(1)
- ERROR=1 1 ==> Class=1 1 conf:(1)
- ERROR=2 1 ==> Class=1 1 conf:(1)
- SIGN=2 1 ==> Class=1 1 conf:(1)
- MAGNITUDE=4 1 ==> Class=1 1 conf:(1)
- VISIBILITY=2 1 ==> Class=2 1 conf:(1)
- STABILITY=1 ERROR=1 1 ==> Class=1 1 conf:(1)
- STABILITY=1 ERROR=2 1 ==> Class=1 1 conf:(1)
- STABILITY=1 SIGN=2 1 ==> Class=1 1 conf:(1)
- STABILITY=1 MAGNITUDE=4 1 ==> Class=1 1 conf:(1)

All the rules show that the occurrence of the values Magnitude=4 (OutOfRange), Stability=1 (Stab), Error=2 (LX), and Sign=2 (nn) leads to the predicted landing value ‘Non-Auto’ (manual landing).

### Decision Tree

**J48**

Using a 66% percentage split, J48 gives its highest accuracy, 66%; changing the training and test set sizes decreases the accuracy. An unpruned tree is used here in order to visualize the relation of all attributes with the class attribute values, as shown in the tree below.

**J48 unpruned tree**

ERROR = 1: 1 (1.0)

ERROR = 2: 1 (1.0)

ERROR = 3

| MAGNITUDE = 1: 2 (5.0/2.0)

| MAGNITUDE = 2: 2 (2.0)

| MAGNITUDE = 3: 1 (2.0/1.0)

| MAGNITUDE = 4: 1 (1.0)

ERROR = 4: 2 (3.0)

Number of Leaves : 7, Size of the tree : 9

This tree clearly shows that ERROR values ‘1’ (XL) and ‘2’ (LX) are associated with class value ‘1’ (non-auto or manual landing), whereas ERROR value ‘3’ (MM) combined with Magnitude values ‘1’ or ‘2’ (low or medium), and ERROR value ‘4’ (SS), are associated with the class ‘Auto-pilot landing’.
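One way to read the J48 output is as nested conditions, sketched below as a hypothetical `classify` function that mirrors the printed tree. The leaf annotations are also tallied: in J48 output, `(5.0/2.0)` means 5 training instances reach that leaf and 2 of them are misclassified. The resulting figure is the accuracy on the training data itself, which is not the same measurement as the 66% obtained on the held-out split.

```python
def classify(error, magnitude):
    """Follow the printed J48 tree; returns 1 (manual) or 2 (autopilot)."""
    if error in (1, 2):
        return 1
    if error == 4:
        return 2
    if error == 3:
        # Only the ERROR = 3 branch looks at MAGNITUDE.
        return 2 if magnitude in (1, 2) else 1
    raise ValueError("unknown ERROR code: %r" % (error,))


# (instances reaching the leaf, misclassified among them), in printed order.
leaves = [(1, 0), (1, 0), (5, 2), (2, 0), (2, 1), (1, 0), (3, 0)]
total = sum(n for n, _ in leaves)
errors = sum(e for _, e in leaves)

print(classify(3, 2))            # 2 (autopilot)
print(classify(3, 4))            # 1 (manual)
print(total, errors)             # 15 3
print((total - errors) / total)  # 0.8 on the training data
```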

Yes, it classifies successfully as required for predicting future landing values by viewing the relation of the given attribute values. Association rule mining gives similar results.

The main challenge here is to achieve more accurate predictions, as the applied algorithm gives only 66% accuracy, and changing the split sizes decreases the accuracy rather than increasing it. Thus, other classifiers need to be tested to obtain higher accuracy.

Distance metrics are used to find similar data objects, which supports the development of robust algorithms for data mining tasks such as classification and clustering. A distance metric therefore plays a very important role in measuring the similarity among different data items. In clustering, each cluster groups similar items: the greater the similarity among the data in a cluster, the greater the chance that a particular item belongs to that cluster. In general, K-means is a heuristic algorithm that partitions a data set into K clusters by minimizing the sum of squared distances within each cluster. Commonly used distance metrics include the Euclidean, Manhattan, Chebyshev, and Minkowski distances.

In data mining, clustering minimizes a certain error criterion that measures the "distance" of each instance to its representative value. The best-known criterion is the sum of squared error (SSE), which measures the total squared Euclidean distance of instances to their representative values. The challenge is to choose clusters that have the lowest SSE.
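To make the metrics and the SSE criterion concrete, here is a minimal sketch in plain Python. The function names and the four sample points are assumptions chosen for illustration; the cluster mean is taken as the "representative value", as K-means does.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def chebyshev(a, b):
    """Largest coordinate-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def sse(clusters):
    """Sum of squared Euclidean distances of points to their cluster mean."""
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        centroid = [sum(p[i] for p in points) / len(points) for i in range(dim)]
        for p in points:
            total += sum((p[i] - centroid[i]) ** 2 for i in range(dim))
    return total


a, b = (0.0, 0.0), (3.0, 4.0)
print(manhattan(a, b), euclidean(a, b), chebyshev(a, b))  # 7.0 5.0 4.0

# Two ways of grouping the same four made-up points into K=2 clusters.
tight = [[(1.0, 1.0), (1.0, 2.0)], [(8.0, 8.0), (9.0, 8.0)]]
loose = [[(1.0, 1.0), (8.0, 8.0)], [(1.0, 2.0), (9.0, 8.0)]]
print(sse(tight), sse(loose))  # 1.0 99.0
```

The grouping that keeps nearby points together has the far lower SSE, which is the sense in which the criterion prefers compact clusters.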