---
title: "1R Mammals Report"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
rm(list=ls())
library(tidyverse)
library(OneR)
# included for confusion matrix
library(caret)
library(e1071)
```
**Name:** Austin Sampson
**eMail:** aws9t5@mst.edu
**Course:** CS 5402
**Date:** 02-14-2020
## Concept Description:
Train a system from existing data to classify animals as either mammal or non-mammal.
## Data Collection:
The data was provided by Perry B. Koob (not a professor or doctor). It is a modified version of the UCI Zoo data set, available on Canvas.
## Example Description:
**animal.name**
Nominal attribute giving the name of the animal or species
**hair**
Nominal boolean attribute that displays output as:
True
False
**feathers**
Nominal boolean attribute that displays output as:
True
False
**eggs**
Nominal boolean attribute that displays output as:
True
False
**milk**
Nominal boolean attribute that displays output as:
True
False
**airborne**
Nominal boolean attribute that displays output as:
True
False
**aquatic**
Nominal boolean attribute that displays output as:
True
False
**predator**
Nominal boolean attribute that displays output as:
True
False
**toothed**
Nominal boolean attribute that displays output as:
True
False
**backbone**
Nominal boolean attribute that displays output as:
True
False
**breathes**
Nominal boolean attribute that displays output as:
True
False
**venomous**
Nominal boolean attribute that displays output as:
True
False
**fins**
Nominal boolean attribute that displays output as:
True
False
**legs**
Ratio attribute giving the number of legs. A null value or 0
indicates the absence of legs.
**tail**
Nominal boolean attribute that displays output as:
True
False
**domestic**
Nominal boolean attribute that displays output as:
True
False
**catsize**
Nominal boolean attribute that displays output as:
True
False
**gestation**
Interval attribute that measures the gestation time of a species.
Two examples had missing values for this attribute; those examples were removed.
**type**
Nominal attribute and the main classification variable for this data set. Output displayed as:
mammal
fish
arthropod
bird
insect
amphibian
reptile
## Data Import and Wrangling:
Importing test and training data
```{r}
# Import the training and test files
train <- read.csv("animal-taxonomy-train.csv")
test <- read.csv("animal-taxonomy-test.csv")
```
Standardizing the classification attribute.
We are testing whether an animal is a mammal or a non-mammal, therefore we reclassify every type other than mammal as non-mammal
and remove the unused levels from type.
```{r}
# read.csv() only returns factors by default in R < 4.0, so convert explicitly
train$type <- as.factor(train$type)
# Add 'non-mammal' as a valid level before assigning it
levels(train$type) <- c(levels(train$type), "non-mammal")
# Convert every type other than mammal to non-mammal
train$type[train$type != "mammal"] <- "non-mammal"
# Drop the levels that are no longer used
train$type <- droplevels(train$type)
# Do the same thing for the test data
test$type <- as.factor(test$type)
levels(test$type) <- c(levels(test$type), "non-mammal")
test$type[test$type != "mammal"] <- "non-mammal"
test$type <- factor(test$type)
```
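The relabelling above can also be expressed in a single step. The following is a minimal sketch on a hypothetical toy vector (`toy_type` and `toy_binary` are illustrative names, not part of the report's data), assuming the class column holds the labels listed in the Example Description:

```r
# Hypothetical toy labels standing in for train$type
toy_type <- c("mammal", "fish", "bird", "mammal", "insect")

# Collapse every label other than "mammal" into "non-mammal"
toy_binary <- factor(
  ifelse(toy_type == "mammal", "mammal", "non-mammal"),
  levels = c("mammal", "non-mammal")
)

# Count how many examples fall into each of the two classes
table(toy_binary)
```

Fixing the levels explicitly keeps both classes present even if one of them does not occur in a given data set, which helps when predictions and references are compared later.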
## Mining and Analytics:
**Manual OneR classification**
Generate frequency tables (attribute value by class) for each attribute, excluding animal.name
```{r}
hair <- count(train, hair, type)
feathers <- count(train, feathers, type)
eggs <- count(train, eggs, type)
milk <- count(train, milk, type)
airborne <- count(train, airborne, type)
aquatic <- count(train, aquatic, type)
predator <- count(train, predator, type)
toothed <- count(train, toothed, type)
backbone <- count(train, backbone, type)
breathes <- count(train, breathes, type)
venomous <- count(train, venomous, type)
fins <- count(train, fins, type)
legs <- count(train, legs, type)
tail <- count(train, tail, type)
domestic <- count(train, domestic, type)
catsize <- count(train, catsize, type)
gestation <- count(train, gestation, type)
```
Display the frequency tables and rules
(note: errors were calculated manually)
```{r}
hair
```
Based on the frequency of mammal and non-mammal we generated the following rules:
if no hair -> non-mammal
if has hair -> mammal
```{r}
feathers
```
Based on the frequency of mammal and non-mammal we generated the following rules:
if no feathers -> non-mammal
if feathers -> non-mammal
```{r}
eggs
```
Based on the frequency of mammal and non-mammal we generated the following rules:
if no eggs -> mammal
if has eggs -> non-mammal
```{r}
milk
```
Based on the frequency of mammal and non-mammal we generated the following rules:
if no milk -> non-mammal
if has milk -> mammal
```{r}
airborne
```
Based on the frequency of mammal and non-mammal we generated the following rules:
if not airborne -> non-mammal
if is airborne -> non-mammal
```{r}
aquatic
```
Based on the frequency of mammal and non-mammal we generated the following rules:
if not aquatic -> mammal
if is aquatic -> non-mammal
```{r}
predator
```
Based on the frequency of mammal and non-mammal we generated the following rules:
if not predator -> non-mammal
if is predator -> non-mammal
```{r}
toothed
```
Based on the frequency of mammal and non-mammal we generated the following rules:
if not toothed -> non-mammal
if is toothed -> mammal
```{r}
backbone
```
Based on the frequency of mammal and non-mammal we generated the following rules:
if no backbone -> non-mammal
if has backbone -> non-mammal
```{r}
breathes
```
Based on the frequency of mammal and non-mammal we generated the following rules:
if does not breathe -> non-mammal
if does breathe -> non-mammal
```{r}
venomous
```
Based on the frequency of mammal and non-mammal we generated the following rules:
if not venomous -> non-mammal
if is venomous -> non-mammal
```{r}
fins
```
Based on the frequency of mammal and non-mammal we generated the following rules:
if no fins -> non-mammal
if has fins -> non-mammal
```{r}
legs
```
Based on the frequency of mammal and non-mammal we generated the following rules:
if legs < 2 -> non-mammal
if legs = 4 -> mammal
if legs >= 5 -> non-mammal
```{r}
tail
```
Based on the frequency of mammal and non-mammal we generated the following rules:
if no tail -> non-mammal
if has tail -> non-mammal
```{r}
domestic
```
Based on the frequency of mammal and non-mammal we generated the following rules:
if not domestic -> non-mammal
if domestic -> mammal
```{r}
catsize
```
Based on the frequency of mammal and non-mammal we generated the following rules:
if not catsize -> non-mammal
if is catsize -> mammal
```{r}
gestation
```
if gestation <= 56 -> non-mammal
if gestation > 56 -> mammal
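One way a cut point for a numeric attribute such as gestation could be chosen is to try the midpoint between each pair of neighbouring values and keep the one with the lowest training error. The sketch below uses hypothetical toy values (`toy`, `cut_error`, and `best_cut` are illustrative names); it is not the procedure used to obtain the threshold of 56 above.

```r
# Hypothetical toy data: gestation values (arbitrary units) and class labels
toy <- data.frame(
  gestation = c(10, 20, 30, 60, 70, 90),
  type = c("non-mammal", "non-mammal", "non-mammal", "mammal", "mammal", "mammal")
)

# Candidate cut points: midpoints between consecutive distinct values
values <- sort(unique(toy$gestation))
cuts <- (head(values, -1) + tail(values, -1)) / 2

# Training error when each side of the cut predicts its majority class
cut_error <- function(cut) {
  below <- toy$type[toy$gestation <= cut]
  above <- toy$type[toy$gestation > cut]
  wrong <- (length(below) - max(table(below))) +
    (length(above) - max(table(above)))
  wrong / nrow(toy)
}

errors <- sapply(cuts, cut_error)
best_cut <- cuts[which.min(errors)]
best_cut
```

Because the candidates are midpoints between observed values, both sides of every cut contain at least one example, so the majority class is always defined.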
**Errors for each ruleset**
(note: Errors were calculated manually)
Airborne Error = 0.384
Aquatic Error = 0.318
Backbone Error = 0.384
Breathes Error = 0.384
Catsize Error = 0.208
Domestic Error = 0.329
Eggs Error = 0.329
Feathers Error = 0.384
Fins Error = 0.384
Hair Error = 0.054
Tail Error = 0.384
Toothed Error = 0.219
Venomous Error = 0.384
Milk Error = 0.0
Predator Error = 0.3846
Legs Error = 0.1758
Gestation Error = 0.234
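The error of a 1R ruleset is the share of training examples that fall outside the majority class of their attribute value. As a cross-check on hand calculation, this could be computed directly from the data. The sketch below uses a hypothetical toy data frame (`toy_train`) and a hypothetical helper (`one_r_error`); neither is part of the report's code, and the toy values are not the report's data.

```r
# Hypothetical toy training data; the report itself uses `train`
toy_train <- data.frame(
  hair = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE),
  milk = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  type = c("mammal", "mammal", "non-mammal",
           "non-mammal", "non-mammal", "non-mammal")
)

# 1R error for one attribute: each attribute value predicts its majority
# class, so the errors are the rows outside the majority in each value group
one_r_error <- function(data, attribute, target = "type") {
  counts <- table(data[[attribute]], data[[target]])
  majority <- apply(counts, 1, max)
  (sum(counts) - sum(majority)) / sum(counts)
}

# Compute the error for every attribute except the class column
toy_attributes <- setdiff(names(toy_train), "type")
toy_errors <- sapply(toy_attributes, function(a) one_r_error(toy_train, a))
toy_errors
```

The attribute with the smallest value in `toy_errors` is the one a 1R classifier would select.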
**Getting the 1R rules from the OneR package**
```{r}
temp <- subset(train, select = -c(animal.name))
model <- OneR(temp, verbose = TRUE)
modelPredictions <- predict(model, test)
# Two instances will be removed due to missing values, as stated above in Example Description
```
## Evaluation:
**OneR package**
```{r}
eval_model(modelPredictions, test)
```
*F1 Score*
Precision = TP/(TP+FP) = 1
Recall = TP/(TP+FN) = 0.6
F1 Score = (2 * Precision * Recall)/(Precision + Recall) = 0.75
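The formulas above can be evaluated directly from predicted and actual labels. The following is a minimal sketch with hypothetical toy labels (`toy_actual` and `toy_predicted` are not the report's test set), treating mammal as the positive class:

```r
# Hypothetical predicted and actual labels, not the report's test data
toy_actual <- factor(
  c("mammal", "mammal", "mammal", "non-mammal", "non-mammal"),
  levels = c("mammal", "non-mammal")
)
toy_predicted <- factor(
  c("mammal", "mammal", "non-mammal", "non-mammal", "non-mammal"),
  levels = c("mammal", "non-mammal")
)

# Count true positives, false positives, and false negatives for "mammal"
tp <- sum(toy_predicted == "mammal" & toy_actual == "mammal")
fp <- sum(toy_predicted == "mammal" & toy_actual == "non-mammal")
fn <- sum(toy_predicted == "non-mammal" & toy_actual == "mammal")

precision <- tp / (tp + fp)
recall <- tp / (tp + fn)
f1 <- 2 * precision * recall / (precision + recall)

c(precision = precision, recall = recall, f1 = f1)
```

Computing the counts from the labels rather than by hand keeps precision, recall, and F1 consistent with the confusion matrix they are derived from.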
**Manual 1R classifier**
```{r}
# Both factors share the same levels so confusionMatrix() can compare them
classes <- c("mammal", "non-mammal")
# Reference classes taken from the test data
reference <- factor(test$type, levels = classes)
# Apply the manual milk rule to the test data to get the predictions
testing <- factor(ifelse(test$milk, "mammal", "non-mammal"), levels = classes)
confusionMatrix(testing, reference)
```
*F1 Score*
Precision = TP/(TP+FP) = 1
Recall = TP/(TP+FN) = 0.6
F1 Score = (2 * Precision * Recall)/(Precision + Recall) = 0.75
Both models have an accuracy of 1 and an F1 score of 0.75, therefore they can be considered
equivalently good models.
## Results
Since the hand-generated 1R classifier and the OneR package produced the same
result, a 1R classifier based on the milk attribute, and both reach 100% accuracy
on the training and test data, we can be confident that our model is a strong one.
Our Final model is:
if has milk = True -> mammal
if has milk = False -> non-mammal
## References:
https://ourcodingclub.github.io/2016/11/24/rmarkdown-1.html#create
https://ourcodingclub.github.io/2016/11/13/intro-to-r.html
https://data-flair.training/blogs/rstudio-tutorial/
https://www.tutorialspoint.com/r/r_data_frames.htm
https://stackoverflow.com/questions/38741997/how-to-solve-the-data-cannot-have-more-levels-than-the-reference-error-when-us
https://www.rdocumentation.org/packages/caret/versions/3.45/topics/confusionMatrix
https://stats.stackexchange.com/questions/138690/calculate-the-f1-score-of-precision-and-recall-in-r