Friday, March 29, 2019
Mosquito Species Detection using Smart Phone
Abstract

According to WHO (World Health Organization) reports, the mosquito is the most hazardous of all disease-transmitting insects. In 2015 alone, 214 million cases of malaria were registered worldwide. Zika virus is another harmful disease transmitted by mosquitoes. According to a CDC report, 62,500 suspected cases of Zika were reported to the Puerto Rico Department of Health (PRDH) in 2016, of which 29,345 were found positive. There are about 3,500 different species of mosquitoes in the world, of which about 175 are found in the United States, but only a few of them are responsible for the diseases mentioned above. Classification between hazardous and ordinary mosquitoes is therefore very important. For an ordinary person with no expertise in this field, it would be almost impossible to tell the difference. Even for a mosquito expert, identifying different species is a tedious and time-consuming job. Hence, in this paper, we have tried to classify 7 different species of dead mosquitoes, with 60 samples in total collected from the Hillsborough County Mosquito and Aquatic Weed Control Unit, Tampa, Florida, by capturing images with smartphone cameras. With our approach we want to enable the non-expert population to identify the risk early and act proactively. We pre-processed the images to remove noise and used the random forest classification algorithm to identify the various species. We achieved good precision, recall and F1-measure, and an aggregate accuracy of 83.3%. We are also planning to develop a smartphone application which will leverage this learning model and help empower the population to identify mosquito species without any knowledge in this field.

INTRODUCTION

Of all animals, mosquitoes are among the most deadly in spreading diseases.
Mosquito-borne diseases like malaria, dengue, West Nile fever and, most recently, Zika fever have exacted devastating tolls on humanity [1]. Combating the spread of mosquitoes is an important health-care agenda across the globe, and several organizations serve this purpose. For instance, the American Mosquito Control Association (AMCA) is spread over 50 countries and conducts many programs to educate citizens about the dangers posed by mosquitoes and how to control them. According to a CDC report, there are about 3,500 different species of mosquitoes in the world, of which about 175 are found in the USA.

Among programs designed to combat the spread of mosquitoes, identifying the type and number of species in any particular area is very important. Across the world, numerous mosquito control organizations have dedicated personnel who lay traps to catch mosquitoes in specific areas and then visually inspect each captured specimen (via a magnifying glass) to identify the type of mosquito. It takes up to a minute to identify each specimen, so with more specimens the identification can take hours and, naturally, significant manual effort.

Contributions of this Paper: In this paper, we aim to design a system that combines images from smartphone cameras with machine learning algorithms for automatic detection of the mosquito species from their images. To this end, our specific contributions are:

a) Building a database of mosquito images: We visited the Hillsborough County Mosquito and Aquatic Weed Control Unit in Tampa in Fall 2016 to collect numerous specimens of mosquitoes that were captured in traps set up by the county personnel. Subsequently, the personnel helped us visually identify the species of each sample. As a result, we collected 60 samples that belonged to seven different species.
Table I presents our database. Subsequently, each sample was imaged with a Samsung Galaxy S5 phone from multiple angles (under the same indoor lighting conditions) for a total of 200 images. This served as our database for subsequent classification.

b) Designing Pre-processing Techniques: Generally, images are vulnerable to different types of noise due to varying environmental conditions and user expertise. Therefore, images need to be pre-processed for noise removal and smoothing. During noise removal, we need to make sure that the edges and boundaries in the images are preserved; otherwise the images will lose key information. We used the median filter, as it works very effectively when edges need to be preserved. This filter is widely used in image processing [2].

c) Designing Random Forest Based Classifiers: Random forest is an ensemble supervised machine learning algorithm. It is a collection of decision trees, where each tree is grown using a randomly selected subset of the training dataset. In most cases, it has shown a significant improvement in accuracy compared to other classification algorithms. Apart from that, it works very well with outliers and noise. It handles larger datasets efficiently and quickly without over-fitting the model, as only a subset of the training set is selected for each split [3], [4].

We conducted an extensive performance evaluation of our proposed techniques. We evaluated our approach on 60 image samples of seven different species. The 10-fold cross-validation technique was used, and we achieved 83.3% accuracy using RGB features.

The rest of the paper is organized as follows. In Section II, related works are discussed. Section III follows, where the experimental set-up and data collection process are described.
Section IV contains the details of pre-processing the image data, extracting and selecting features, building the learning model using a classification method, and the different metrics used for showing the results. We discuss experimental evaluation and validation in detail in Section V. Finally, the discussion and conclusion are in Sections VI and VII respectively.

RELATED WORK

There are many studies dedicated to leveraging smartphone cameras for image recognition. In this section we highlight a few of the related and important works.

A. Related Work on Image Recognition

In [5], a system was developed for determining the effectiveness of soil treatment on plant stress using smartphone cameras. In that paper, 34 images of plant leaves were captured using a smartphone in two soils, namely biosolids and unamended tailings. Each image was then pre-processed using mean and median filters, followed by segmentation into pixels. The authors extracted RGB, R, G, B, HSV and YCbCr features from the segmented pixels. Random Forest, a supervised classification algorithm, was used to detect leaf stress and achieved 91.24% accuracy.

In [6], a survey was done on pixel-based skin color detection techniques. The authors applied various color spaces such as RGB, normalized RGB, HSV and YCrCb for recognizing skin. RGB is the most widely used color space for processing and storing digital images.

Wen et al. [7] proposed an image-based automated insect identification and classification method. In that paper, eight insect species were selected for the experiment. The insects were frozen to achieve a non-damaging kill and were then placed on a white balance panel under the reflectance light base of a Nikon stereoscopic zoom microscope SMZ1000 (Nikon, Tokyo) with a Plan Apochromat 0.5 objective. Images were taken by a DS-Fi1 color digital camera mounted on the microscope.
The features considered were color, texture, invariants, contour and geometric features. For color, HSV color space features were considered. Many classification algorithms, namely the minimum least square linear classifier (MLSLC), normal densities based linear classifier (NDLC), K nearest neighbor classifier (KNNC), nearest mean classifier (NMC) and decision tree (DT), were used for training and testing the model. Among these, the NDLC classifier outperformed the others.

1) Comparing our Work w.r.t. Related Work: Our work is focused on capturing mosquito images with a smartphone camera and using the captured images for training and testing the learning model. In [7], the authors identified insect species, but their approach needs a lab set-up with a microscope and a high-resolution digital camera, which is not widely available. We extracted RGB features for classification, RGB being the most widely used color space [6].

EXPERIMENTAL SETUP AND DATA COLLECTION

In this section, we discuss the data collection process of our experiment.

A. Data Collection

We collected dead mosquito samples from the Hillsborough County Mosquito and Aquatic Weed Control Unit, Tampa, Florida. We carefully identified seven species, listed in Table I, for our study.

Table I: Mosquito Species and Number of Samples
Species Name     Number of Samples
Cx Nigrip        10
An Quadrim       6
Ma Titillans     7
Ps Columpi       10
An Crucians      10
Ps Ferox         7
Cq Perturbans    10

Table II: Camera Specification
Camera Specification    Value
Sensor Resolution       16 MP
Focus Adjustment        Automatic
Special Effect          HDR
Camera Light Source     Daylight

Since the physical properties of dead mosquitoes, such as color and delicateness, change as time passes, images of the dead mosquitoes were taken in a single day to make sure the environmental conditions were the same for all images. A Samsung Galaxy S5 smartphone was used for capturing images in regular daylight.
Each sample image was taken based on the guidance described on the Mosquito and Aquatic Weed Control Unit web site. A total of 60 images were captured for our study with the camera configuration given in Table II.

OUR APPROACH

We implemented two steps in our approach. First, the images are pre-processed for noise removal, using filters such as median and mean, and for feature selection. Second, a learning model is built using a classification algorithm based on random forest.

Our main aim here is to build a learning model for identifying each mosquito species. The challenge we faced was the image size. Images captured from the smartphone are 2988 x 5322 pixels. We reduced their size to 256 x 256 pixels to decrease the data dimensionality. To remove the noise from each sample we applied the median filter technique, which is detailed in the next subsection.

Since our images were already dark in color, it is necessary to keep the background and foreground in contrast to build the model reasonably well. We therefore did not use any segmentation technique, as it converts the background to black.

We use Random Forest, a supervised learning algorithm, with the 10-fold cross-validation technique for training and testing. The process flow of our algorithm is described in Figure 2. To proceed, we need labeled image data for training the model. All images were tagged manually under the guidance of mosquito experts.

A. Noise Removal

Generally, digital images are susceptible to different types of noise, which can be introduced in several ways, such as during capture or transmission. Noise adversely affects the accuracy of the results. Many filters are used to remove or reduce noise in images.

Sharpening Filter: This is an enhancement technique which highlights edges and fine details in the image.
In this procedure, the original image is passed through a high-pass filter, which extracts its high-frequency components, and the scaled output of the high-pass filter is then added to the original image, resulting in a sharpened image [8].

Mean Filter: This filtering technique replaces each pixel value in an image with the mean of the values of its neighbors that fall within a sliding window of size n*n. It removes noise more effectively when a larger window size is used. This is also called the average filter [8].

Median Filter: This is a nonlinear filtering technique. It replaces each pixel value with the median of all pixel values in an n*n window around that pixel. It is widely used in digital image processing, and it preserves edges while removing noise. We used this filter with a 3*3 pixel window for removing noise from our digital images. The output with and without the median filter is shown in Figure 1 [2].

B. Feature Selection

Feature extraction and selection is a critical part of any supervised learning algorithm. Feature extraction is about reducing the data dimensionality: as the size of the data grows and its dimension increases, it becomes very difficult to handle manually, and the need for automation arises.

Feature selection is the process of selecting the features that are most relevant to the problem and eliminating unnecessary, irrelevant and redundant features that do not contribute to the accuracy of the learning model.

In our proposed model, we identify different species of mosquitoes. Each species has a different color. As we can see in Figure 3, the mosquitoes have similar shapes but different body and wing colors. Therefore, choosing the correct color channels, or combination of channels, is important when selecting features. Examples of color spaces are RGB and HSV. RGB has red, green and blue channels.
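Returning briefly to the noise-removal step, the 3*3 median filter can be illustrated with a short sketch that filters a single-channel image stored as a list of rows. This is a minimal pure-Python sketch and not the implementation used in the study: the function name `median_filter`, the border handling (the window is clipped at the image edge) and the use of the lower median for even-sized border windows are all assumptions made here.

```python
from statistics import median_low

def median_filter(image, size=3):
    """Apply a size x size median filter to a 2-D list of pixel intensities.

    Assumption: at the borders the window is clipped to the image, and
    median_low is used so the output stays an actual pixel value.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the neighbours that fall inside the sliding window.
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out[r][c] = median_low(window)
    return out

# A single bright "spike" in a flat region is removed ...
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
cleaned = median_filter(noisy)

# ... while a sharp vertical edge is left untouched.
edge = [[0, 0, 100, 100] for _ in range(4)]
filtered_edge = median_filter(edge)
```

For an RGB image the same function could be applied to each channel separately; that is also an assumption, since the text does not say how the channels were filtered.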
In RGB, each component supports a range of intensity levels from 0 to 255 (integer valued) [9]. We extracted RGB features from the mosquito image data. Then, for feature selection, we applied the Information-Gain attribute selection algorithm, which is a good measure for deciding the relevance of an attribute. This feature selection technique generally helps in achieving high accuracy; using it, we obtained 1000 features, which serve as the input vector x to the Random Forest classification algorithm for species detection. We calculated precision, recall and F1-measure, which are given in Table III.

Table III: Combination of color channels accuracy comparison
Combination    Precision    Recall    F1-measure
RGB            0.845        0.833     0.834

C. Classification Method

Random Forest Algorithm: Random Forests (RF) is an ensemble supervised machine learning algorithm. It consists of a set of decision trees $h(x, \theta_i)$, $i = 1, 2, \ldots$, where x is a feature vector extracted from the smartphone image data and each $\theta_i$ consists of K integers, the $\theta_i$ being independent, identically distributed random vectors. Each decision tree predicts a class independently. A vote is taken over the results from each decision tree, and the class that gets the majority vote is the final predicted class. This is illustrated in Figure 4. Given a data set that contains N feature vectors, each consisting of M features, the RF algorithm builds the trained model using the following steps:

1) N samples are selected at random with replacement from the data set for training the model of a particular tree.
2) K features are randomly selected from the set of available features, where K < M.
3) Among the values for each of the K features drawn, the best split is chosen according to the information gain IG(T, a) of the attribute.

Information gain is a measure of the decrease in entropy caused by splitting the samples on an attribute. T denotes the set of training samples for a single tree.
Each training sample has the form $(x, y) = (x_1, x_2, \ldots, x_k, y)$, where x is a single sample and y is its class label. The information gain for an attribute a is as follows:

$IG(T, a) = H(T) - \sum_{v \in vals(a)} \frac{|\{x \in T \mid x_a = v\}|}{|T|} \, H(\{x \in T \mid x_a = v\})$   (1) [10]

Here, $x_a \in vals(a)$ is the value of the a-th attribute of example x. Randomization is present in two ways:

a) Random selection of data for bootstrap samples, as is done in bagging.
b) Random selection of input features for creating the individual base decision trees.

Each tree grows to its maximum size, until the stopping criterion is met, and there is no tree pruning. Once the forest has been assembled, a testing data sample is labeled with one of the classes (species1, species2, ..., species7) by taking the majority vote, i.e. it is labeled with the class selected by the maximum number of trees.

Figure 1: a) Original image; b) Image after applying sharpening median filter
Figure 2: Process description of our experiment
Figure 3: Mosquito color images: a) Crucians, b) Columpi, c) Ferox, d) Nigrip, e) Perturbans, f) Quadrim, g) Titillans

In the RF approach, given a feature sample x to be classified, the conditional probabilities for each class are computed by taking the average of the conditional probabilities given by the trees making up the ensemble. These conditional probabilities are computed as follows. Given a decision tree T and an input feature sample x to be classified, let v(x) denote the leaf node where x falls when it is classified by T.
The probability $P(m \mid x, T)$ that the sample x belongs to the class m, where $m \in \{species1, species2, \ldots, species7\}$ (for the 7 species of interest in this paper), is estimated by the following equation:

$P(m \mid x, T) = \frac{n_m}{n}$   (2)

where $n_m$ is the number of training samples of class m falling into v(x) after learning and n is the total number of training samples assigned to v(x) by the training procedure. Given a forest consisting of L trees and an unknown feature sample x to be classified, the probability estimate $P(m \mid x)$ that x belongs to the species m is computed as follows:

$P(m \mid x) = \frac{1}{L} \sum_{i=1}^{L} P(m \mid x, T_i)$   (3)

where $P(m \mid x, T_i)$ is the conditional probability provided by the i-th tree, computed according to Eq. (2). As a consequence, for the sample x to be classified, the RF algorithm gives as output the vector

$\{P(species1 \mid x), P(species2 \mid x), \ldots, P(species7 \mid x)\}$   (4)

The class (species) with the highest probability in the set (4) is chosen as the classified class for the i-th tree. The final class of our RF algorithm is the one that gets the majority vote among all classes from all decision trees in the forest [11]. The workflow of the RF algorithm with pre-processing, training and testing phases is formally shown in Algorithm 1 [3], [12].

Figure 4: Workflow of the Random Forest algorithm

D. Metrics

The results of mosquito species detection are shown in terms of precision, recall, F1-measure and confusion matrix. Each metric is a function of the true positives (TP), false positives (FP) and false negatives (FN). Precision is the ratio of correctly classified samples of a class to the total number of samples predicted as that class:

$Precision = \frac{TP}{TP + FP}$   (5)

Recall is the ratio of correctly classified samples of a class to the total number of samples that actually belong to that class:

$Recall = \frac{TP}{TP + FN}$   (6)

The F1-measure is the harmonic mean of precision and recall:

$F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$   (7)

The confusion matrix (CM) is a table used to visualize the performance of a classification model.
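The precision, recall and F1 formulas of the Metrics section can be checked numerically with a short sketch. The function name `precision_recall_f1` and the example counts are hypothetical: TP = 8, FP = 1 and FN = 2 are chosen only because they are consistent with a species that has 10 samples, a precision of 0.889 and a recall of 0.8. The text itself does not report per-species counts.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one class from its TP, FP and FN counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for one species with 10 samples (not taken from the study).
p, r, f1 = precision_recall_f1(tp=8, fp=1, fn=2)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
# prints: precision=0.889 recall=0.800 F1=0.842
```

Because F1 is a harmonic mean, it is pulled towards the smaller of the two values, so a class cannot score well on F1 by doing well on precision alone or recall alone.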
Each column of the confusion matrix represents the instances in a predicted class, while each row represents the instances in an actual class (or vice versa) [13].

Precision indicates what fraction of the samples classified as a particular species actually belonged to that species. Recall gives the fraction of the samples of a species that are correctly classified. The F1-measure summarizes the classification model's accuracy; it is calculated as the harmonic mean of precision and recall. The confusion matrix makes it easy to see how the model's predictions are split between different species. For example, if a species is predicted correctly only 80% of the time, the matrix will show which other (wrongly predicted) species the algorithm confused it with the remaining 20% of the time.

RESULTS

Overview of Evaluation Methods: In this paper, we evaluated the performance of our system using 10-fold cross-validation, which is standard for our problem scope. Cross-validation is a model validation technique for assessing how the results of a classification model will generalize to an independent dataset. 10-fold cross-validation divides the dataset into 10 subsets and evaluates the model 10 times. Each time, one of the 10 subsets is used as the test set and the other 9 subsets are put together to form the training set. The average error across all 10 trials is then computed as the final result. This limits problems like over-fitting in the classification model.

Results and Interpretations: We used the RGB features mentioned earlier to train our classification model. To evaluate its accuracy we used the 10-fold cross-validation technique and calculated the precision, recall and F1-measure of each species independently. The evaluation measures for the RGB features are shown in Table IV and graphically in Figure 5.
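The 10-fold procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `k_fold_splits` and `cross_validated_error`, the fixed shuffle seed, and the `evaluate` callback (a stand-in for training a model on the training indices and returning its error on the test indices) are all hypothetical and do not come from the study.

```python
import random

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized subsets
    for i, test in enumerate(folds):
        # The other k - 1 folds are merged to form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_validated_error(n_samples, evaluate, k=10):
    """Average the error returned by evaluate(train, test) over the k folds."""
    errors = [evaluate(train, test) for train, test in k_fold_splits(n_samples, k)]
    return sum(errors) / len(errors)

# With 60 samples, every fold tests on 6 samples and trains on the other 54.
splits = list(k_fold_splits(60, k=10))
```

Each sample appears in exactly one test fold, so every sample is used for testing once and for training nine times. The plain random shuffle is a simplification; whether the folds in the study were balanced per species is not stated.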
The confusion matrix for the same is shown in Figure 6.

Algorithm 1: RF-based Algorithm for Mosquito-Species Detection

Training image dataset = Id
Testing image dataset = Ited
RGB features extracted from training image dataset = FtRGB
RGB features extracted from testing image dataset = FteRGB
Classified species from images = MS
Probability that feature F belongs to species MS = P(MS | F)
No. of trees in Random Forest = 121

Step 1: Pre-Processing
1) Median filters are applied to remove accidental spikes from Id and Ited.
2) Features FtRGB and FteRGB are extracted from the processed data Id and Ited obtained from (1).

Step 2: Training
Input: Training data set FtRGB
Train a Random Forest model to classify different species of mosquitoes.
1) Select a bootstrap sample of size N from the training data.
2) Build a decision tree T using the following steps:
   a) Select K features at random from the set of M features.
   b) Pick the best feature/split-point among the K.
   c) Split the node into two daughter nodes.
3) Grow the tree to its maximum size, which is 6, and leave the tree unpruned.

Step 3: Prediction
Input: Testing data set Ited
Output: Final mosquito species prediction MSs
1) Select the same attributes used for training the model from the testing feature set FteRGB.
2) Predict the species from the model using the features selected in the above step.

Table IV: RGB features accuracy of each species independently
Species          Precision    Recall    F1-measure
An Crucians      0.889        0.8       0.842
An Quadrim       0.571        0.667     0.615
Cq Perturbans    0.727        0.8       0.762
Cx Nigrip        0.889        0.8       0.842
Ma Titillans