A Framework for Dry Waste Detection Based on a Deep Convolutional Neural Network

Due to a lack of proper regulations in many areas of the world, consumers are not required to sort waste at the source. Moreover, manual sorting often suffers from human error and low accuracy. The intelligent detection system presented here attempts to separate a variety of household wastes, including plastic bottles, glass, metals, paper bags, compact plastics, paper and disposable containers. In this paper, a system operating on real waste images is investigated using a deep convolutional neural network, and an accuracy of 92.76% was achieved. doi: 10.5829/ijee.2020.11.04.01


INTRODUCTION
Environmental experts once referred to waste as dirty gold. Dirty gold recovery can play a great role in the economic cycle of countries by generating income and creating jobs [1].
Machine vision has grown considerably in recent years thanks to advances in both machine learning techniques and computing hardware. It is often linked to a category of machine learning problems known as classification. In each classification problem, building a model that predicts accurately depends on several components [2].
Tehrani and Karbasi [3] developed a hyperspectral imaging technique with a neural network-based algorithm in order to identify and separate different kinds of e-plastics. Because the spectral data were separable in the feature space, an Artificial Neural Network (ANN) was a suitable choice for classification. Quantifying the spectral features of the object was also sufficient to achieve efficient and reliable results.
Zhihong et al. [4] presented a garbage sorting system based on machine vision. The vision-based robotic grasping system detects, recognizes and grasps objects with different poses by employing a deep learning model. Here, machine vision addresses the problem of object identification, and the deep learning method is used to identify the target object against a complex background. To grasp the target object accurately, Region Proposal Generation (RPN) and the VGG-16 model are employed.
*Corresponding Author E-mail: j.kazemitabar@nit.ac.ir (J. Kazemitabar)
Sakr et al. [5] introduced automated waste sorting by applying machine learning techniques. Deep convolutional neural networks (CNN) and support vector machines (SVM) were used as candidate algorithms, each producing a different waste classifier. In that paper, the two machine learning techniques were compared, and the best model was then implemented on a Raspberry Pi 3 in order to measure its classification speed. The SVM achieved higher classification accuracy than the CNN and also adapted exceptionally well to different kinds of waste.
To address the problems of solid waste, Dhulekar et al. [6] implemented a bottle recycling machine (BRM). A novel machine learning algorithm was employed to collect used bottles and classify them, and a separation mechanism was implemented to recycle them. The system consists of a Raspberry Pi connected to a camera and an audio-visual system, and the algorithm was developed in Python. The proposed BRM provides accurate bottle identification and low-cost recycling.
Dharmana et al. [7] combined machine learning and Mel Frequency Cepstral Coefficients (MFCC) to detect plastic. Plastic is detected through audio signal analysis, using MFCC feature extraction and Artificial Neural Network (ANN) classification.
A large number of inputs with a variety of features is one of the important requirements for training a model that is both accurate and able to generalize its knowledge and be verified in real time. However, suitable databases are often scarce and, depending on the conditions and the type of data, require a great deal of time and expense to build [8]; hence the need to build our own deep learning dataset. The database consists of 5000 photographs, including plastic bottles, glass, metals, paper bags, condensed plastic, paper and disposable containers.
Initially, efforts were focused on sorting waste at the destination. In other words, waste was collected from cities and moved to other locations, where it was sorted by people working at waste collection centers. This led to health problems for the workers and to very low identification accuracy. Over the years, automated waste sorting systems have been developed, but they usually combine low accuracy with high cost. With the development of robotics, governments have moved toward waste sorting robots at the destination, which are generally large. Such robots are currently used in some European countries. Using a spectrometer and properties of matter such as density, waste items can be detected and sorted one by one. The accuracy of this method is usually high, but its use is limited by its high cost. Because of these problems with sorting at the destination, governments began to build infrastructure for sorting at the source. These solutions include the sorting of dry and wet waste in the first phase and the sorting of various types of waste such as paper, glass and plastics in the second phase. Some organizations have installed colored trash bins to carry out the first phase (sorting of dry and wet waste). In several countries, including Sweden and Finland, waste sorting machines that use magnetic fields and infrared wavelengths to identify large PET and metal bottles and then sort them have been deployed. These recycling machines detect waste using an infrared spectrometer. However, water inside a bottle can change the infrared reading, so machine vision can improve performance [9].
With the program introduced in this paper, the user takes a photo of the waste and the system determines its type. The identification takes place in two steps: in the first step, an image of the waste is captured, and in the second step, its type is detected. To the best of our knowledge, using image processing for waste management is unprecedented [10].
Because the number of classes is small and it was not possible to collect enough samples to train the model properly from scratch, transfer learning was used. This paper first focuses on the construction and architecture of the VGGNet neural network for classifying multiple photo categories and then discusses the results obtained by testing the network [11].
Available techniques for manual and automatic waste sorting
Different devices are currently used to sort waste into different classes. Some separators use a mechanism containing a rotating perforated plate, so that the waste is sorted by size: small particles pass through the holes while large particles are retained. Oka et al. [12] designed a spark current isolation technique to sort various metal materials in the waste. This technique uses an electromagnetic method to divide the waste into ferrous and nonferrous metals. Siddappaji and Radha [13] describe induction sorting, in which waste is placed on a conveyor fitted with a number of sensors. These sensors help to detect various types of metals in the waste. Volland et al. [14] used near-infrared (NIR) sensors, which use reflectance as the parameter for detecting various waste materials.
The examination, identification and classification of waste for efficient disposal and recycling using MATLAB and OpenCV software with SVM, CNN and KNN classifiers has been reported in the literature. Some sources have categorized waste into six groups with a recognition rate of 44.6% [15].
In sorting by size, a deformed can or bottle can easily be misidentified. Sorting using magnetic fields works only for metal waste and cannot detect aluminum cans. Moreover, in sorting using infrared and X-rays, the covalent bonds of water can change the frequency spectrum, so any amount of water inside a bottle can affect the measured spectrum. Therefore, the above methods can be applied only on a small scale and, given that 76 percent of the waste generated by humans can be recycled, cannot meet the needs of different communities [16].

Generating database
Seven classes were considered: plastic bottles, glass, metals, paper bags, compact plastics, paper and disposable containers. The Google photo collection (photos.google.com/albums) was used to build the database using predefined libraries. All learning algorithms depend on the database, and in the first stage a large amount of data must be produced and stored. Therefore, a basic database covering paper types, plastic bottles, compact plastics, metal bottles, glass bottles and paper bottles was produced and, after the images were reviewed, they were organized into seven general classes.
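As an illustration of how such a labeled database might be divided before training, the following is a minimal sketch of a stratified train/test split. The class names are taken from the abstract; the file names, the 10 images per class and the 20% test fraction are hypothetical placeholders, since the paper does not state how its split was made.

```python
import random
from collections import defaultdict

# Class names as listed in the abstract of this paper.
CLASSES = [
    "plastic bottles", "glass", "metals", "paper bags",
    "compact plastics", "paper", "disposable containers",
]

def stratified_split(samples, test_fraction, seed=0):
    """Split (path, label) pairs so every class keeps the same train/test ratio."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)
    rng = random.Random(seed)
    train, test = [], []
    for label, paths in sorted(by_label.items()):
        rng.shuffle(paths)
        cut = int(round(len(paths) * test_fraction))
        test += [(p, label) for p in paths[:cut]]
        train += [(p, label) for p in paths[cut:]]
    return train, test

# Hypothetical file names: 10 placeholder images per class (assumption).
samples = [(f"img_{label}_{i}.jpg", label) for label in CLASSES for i in range(10)]
train_set, test_set = stratified_split(samples, test_fraction=0.2)
print(len(train_set), len(test_set))
```

Splitting per class, rather than over the whole pool, keeps rare classes represented in both subsets.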
Transfer learning using VGG pre-trained model
Waste does not follow any specific rule, shape or packaging. This makes the database incomplete and reduces the accuracy of image recognition. In this paper, however, image detection performance was improved by using transfer learning via feature extraction. Transfer learning is the process of taking a network pre-trained on one dataset and using it to recognize object categories it was not originally trained on. Feature extraction is a technique that selects and/or combines variables into features, substantially decreasing the amount of data that must be processed while still accurately and completely describing the original dataset [17]. In this article, a sufficiently large labeled dataset was gathered and a convolutional neural network was trained to recognize it. Images of waste were then gathered and the model trained on the pre-labeled dataset was trained again. After that, transfer learning was applied via feature extraction: the VGG16 network architecture was employed to extract features from the final POOL layer of the network, and these features were fed into a logistic regression classifier to predict the class of an image with 93% accuracy.
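To make the feature extraction step more concrete, the following sketch estimates the length of the feature vector that a logistic regression classifier would receive from the final POOL layer. It assumes the standard VGG16 layout (five 2x2 max-pooling stages and 512 channels in the last block) and the 96*96 input size used in this paper; the function names are illustrative only.

```python
# Hedged sketch: size of the feature vector taken from VGG16's final POOL layer.
# Assumes the standard VGG16 layout: five 2x2, stride-2 max-pool stages and
# 512 channels in the last convolutional block.

def vgg16_pool5_shape(height, width, num_pools=5, channels=512):
    """Return (h, w, c) of the final pooling output for a given input size."""
    for _ in range(num_pools):
        height //= 2  # each 2x2, stride-2 max-pool halves the spatial size
        width //= 2
    return height, width, channels

def flattened_feature_length(shape):
    """Number of values per image once the pooled volume is flattened."""
    h, w, c = shape
    return h * w * c

shape_96 = vgg16_pool5_shape(96, 96)
features_96 = flattened_feature_length(shape_96)
print(shape_96, features_96)  # (3, 3, 512) 4608
```

Under these assumptions each 96*96 image is reduced to 4608 values, compared with 25088 values for the 224*224 input size commonly used with VGG16.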

Waste detection using a convolutional neural network (CNN)
There are different architectures for CNN that differ in the number and structure of the intermediate layers. One of the most useful architectures in the field of processing and extracting features is VGGNet.
The pre-trained VGGNet network was developed by Karen Simonyan and Andrew Zisserman [11] at the University of Oxford in 2015 and achieves high accuracy on the ImageNet dataset, which was developed by Fei-Fei Li [18] at Princeton University. Owing to its structure, simplicity and small number of layers, this network has been widely used in the machine vision field. It is also known for its pyramidal shape, in which the layers closer to the image are wider and the farther ones are deeper [19]. The input images were 96*96 with a depth of 3, and the convolution layers use filters with a 3*3 kernel. The ReLU (rectified linear activation function) is applied, followed by batch normalization. Dropout is also used in our network architecture. Dropout works by randomly disconnecting nodes between the current layer and the next layer. This random disconnection during training batches naturally introduces redundancy into the model, because no single node in the layer is responsible for predicting a certain class, object, edge or corner. Stacking multiple CONV and RELU layers together (prior to reducing the spatial dimensions of the volume) allows the network to learn a richer set of features. Feature extraction with VGG16 can compensate for the lack of data [11].
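The ordering of activation followed by batch normalization described above can be illustrated with a minimal, library-free sketch. It operates on one feature across a small mini-batch; the input values, the scale gamma, the shift beta and the epsilon constant are illustrative assumptions rather than values from the paper.

```python
import math

def relu(values):
    """Rectified linear activation: negative inputs become zero."""
    return [max(0.0, v) for v in values]

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

# Hypothetical pre-activation values of one feature for a batch of five images.
pre_activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
activated = relu(pre_activations)   # [0.0, 0.0, 0.0, 1.5, 3.0]
normalized = batch_norm(activated)  # zero mean, approximately unit variance
print(activated)
print(normalized)
```

In a trained network, gamma and beta are learned per feature and running statistics replace the batch statistics at inference time; the sketch omits both for brevity.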
The shallowest version of this network has 11 layers, consisting of eight convolutional layers and three fully connected layers, and the deepest version has 19 layers, consisting of 16 convolutional layers and three fully connected layers. The final model in this work is built on the convolutional layers of VGGNet16.
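The layer counts quoted above can be checked with a short sketch. The per-block convolution counts below follow the published VGG configurations as commonly summarized (five blocks, each closed by a pooling layer); the dictionary and function names are illustrative.

```python
# Convolutional layers in each of the five blocks of common VGG variants.
VGG_CONV_LAYERS_PER_BLOCK = {
    "VGG11": [1, 1, 2, 2, 2],
    "VGG16": [2, 2, 3, 3, 3],
    "VGG19": [2, 2, 4, 4, 4],
}
FULLY_CONNECTED_LAYERS = 3  # shared by all variants

def weight_layer_count(name):
    """Return (convolutional layers, total weight layers) for a VGG variant."""
    conv = sum(VGG_CONV_LAYERS_PER_BLOCK[name])
    return conv, conv + FULLY_CONNECTED_LAYERS

for name in VGG_CONV_LAYERS_PER_BLOCK:
    print(name, weight_layer_count(name))
```

Pooling layers carry no learnable weights, which is why they do not appear in the depth figures.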
The parameters of a convolution layer consist of a set of learnable filters. A filter in a convolutional network is similar to a weight matrix that is multiplied with the input image to obtain a computed output. As a result, separating the object from the background is not an issue for detection with a convolutional neural network. In a convolutional network, each filter is spatially small but extends through the full depth of the input volume; in simple terms, it is a three-dimensional volume with a length, a width and a depth. The number of classes is 7. In this paper, images of 96*96 with a depth of 3 (a three-dimensional array) are considered. A pooling layer is then used to shrink the input in order to reduce the computational load, the memory usage, the number of parameters and the risk of overfitting. Pooling layers, which reduce the image size in pixels and improve training performance, are placed between several consecutive convolutional layers in a convolutional architecture, as shown in Figure 1.
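The filter and pooling operations described above can be sketched on a single-channel toy image. This is a simplified illustration, not the network used in the paper: the 6*6 image, the all-ones kernel and the choice of 32 filters in the parameter count are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide a kernel over a 2-D image (no padding, stride 1).

    As in most CNN libraries, this is technically a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool_2x2(feature_map):
    """Keep the largest value of every non-overlapping 2x2 window."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]

def conv_layer_parameters(kernel_size, in_depth, num_filters):
    """Learnable weights plus one bias per filter."""
    return (kernel_size * kernel_size * in_depth + 1) * num_filters

image = [[r * 6 + c for c in range(6)] for r in range(6)]  # toy 6x6 input
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]                  # toy 3x3 filter
feature_map = conv2d_valid(image, kernel)  # 4x4 output
pooled = max_pool_2x2(feature_map)         # 2x2 output
print(pooled)
print(conv_layer_parameters(3, 3, 32))     # 32 filters is an assumed count
```

A 3*3 filter over a depth-3 input has only 27 weights and one bias regardless of image size, which is why pooling, rather than the filters themselves, is what reduces memory and computation downstream.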
By adding several consecutive convolutional layers, rich features can be extracted, and dropout is used to randomly disconnect nodes between the current layer and the next [20]. This random disconnection during each training epoch naturally improved the model, since no single node in the layer is responsible for predicting a particular class, object, edge or corner [21]. At each training step, the outputs of a certain percentage of the nodes are set to zero, which disables those nodes. The dropout rate is initially 25% and finally reaches 50% for the input layer and the hidden layers. This technique also helps prevent overfitting [11].
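A minimal sketch of the zeroing procedure follows, using the 25% and 50% rates mentioned above. The rescaling of surviving nodes ("inverted" dropout, as used by common deep learning libraries) is an assumption, as are the seed and the toy activation vector.

```python
import random

def dropout(activations, rate, rng):
    """Inverted dropout: zero each node with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(0)        # fixed seed so the sketch is repeatable
activations = [1.0] * 10000   # toy layer output

early_stage = dropout(activations, 0.25, rng)  # rate used initially
late_stage = dropout(activations, 0.50, rng)   # rate used finally

dropped_25 = early_stage.count(0.0) / len(activations)
dropped_50 = late_stage.count(0.0) / len(activations)
print(round(dropped_25, 2), round(dropped_50, 2))
```

Dropout is applied only during training; at inference time all nodes are kept, and the rescaling above keeps the expected activation the same in both modes.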

DISCUSSION OF THE MODEL
To examine the model, its evolution over each training epoch is plotted in Figure 2, which confirms the absence of overfitting. A total of 50 epochs was used. The classification accuracy is 92.76% on the training set and 90.16% on the test set.
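The relationship between the two reported accuracies can be expressed as a small calculation. The sketch below computes classification accuracy from predicted and actual labels and the gap between training and test accuracy; the four example labels are hypothetical, while 92.76 and 90.16 are the figures reported above.

```python
def accuracy(predicted, actual):
    """Percentage of samples whose predicted label matches the actual label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

def generalization_gap(train_accuracy, test_accuracy):
    """Difference in percentage points; a large gap suggests overfitting."""
    return train_accuracy - test_accuracy

# Hypothetical predictions for four images (illustration only).
toy_accuracy = accuracy(
    ["glass", "paper", "metals", "paper"],
    ["glass", "paper", "metals", "glass"],
)
gap = generalization_gap(92.76, 90.16)  # accuracies reported in this paper
print(toy_accuracy, round(gap, 2))      # 75.0 2.6
```

A gap of about 2.6 percentage points is consistent with the statement that the model does not overfit strongly.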

SIMULATION RESULTS
After the training and testing steps, the model's predictions are compared with the actual classes of real-world images in Figure 3. It should be noted that these images were not part of the training or testing datasets. As shown in Figure 4, the maximum accuracy of the network is reached at around 50 repetitions. In general, with a low number of repetitions the neural network is not well trained and cannot provide correct predictions. With too many repetitions, the training time increases and overfitting can occur, which reduces the accuracy, as shown in Figure 4.
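The trade-off between too few and too many repetitions can be sketched as a simple selection over an accuracy curve. The values below are hypothetical and are not the measurements behind Figure 4; they only mimic the qualitative shape described in the text.

```python
def best_repetition_count(test_accuracy_by_repetitions):
    """Return the (repetitions, accuracy) pair with the highest test accuracy."""
    return max(test_accuracy_by_repetitions.items(), key=lambda item: item[1])

# Hypothetical curve (NOT data from the paper): accuracy rises while the
# network is under-trained, peaks, then falls as overfitting sets in.
hypothetical_curve = {10: 71.0, 25: 84.0, 50: 90.0, 75: 88.5, 100: 87.0}

best_count, best_accuracy = best_repetition_count(hypothetical_curve)
print(best_count, best_accuracy)  # 50 90.0
```

In practice the same selection is often automated by monitoring accuracy on held-out data during training and stopping once it no longer improves.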

CONCLUSIONS
Due to a lack of proper regulations in many areas of the world, consumers are not required to sort waste at the source. Moreover, manual sorting often suffers from a lack of accuracy. The intelligent detection system presented here attempts to separate a variety of household wastes, including plastic bottles, glass, metals, paper bags, compact plastics, paper and disposable containers. In this paper, a system operating on real waste images was investigated using a deep convolutional neural network, and a high accuracy of 92.76% was achieved.