The research in this publication was partially or fully funded by Colorado Parks and Wildlife.

Dan Prenzlow, Director, Colorado Parks and Wildlife • Parks and Wildlife Commission: Marvin McDaniel, Chair • Carrie Besnette Hauser, Vice-Chair • Marie Haskett, Secretary • Taishya Adams • Betsy Blecha • Charles Garcia • Dallas May • Duke Phillips, IV • Luke B. Schafer • James Jay Tutchton • Eden Vardy

Received: 4 June 2020 | Revised: 29 June 2020 | Accepted: 31 July 2020
DOI: 10.1002/ece3.6692
Ecology and Evolution. 2020;10:10374–10383. www.ecolevol.org

ORIGINAL RESEARCH

Improving the accessibility and transferability of machine learning algorithms for identification of animals in camera trap images: MLWIC2

Michael A. Tabak (1,2) | Mohammad S. Norouzzadeh (3) | David W. Wolfson (4) | Erica J. Newton (5) | Raoul K. Boughton (6) | Jacob S. Ivan (7) | Eric A. Odell (7) | Eric S. Newkirk (7) | Reesa Y. Conrey (7) | Jennifer Stenglein (8) | Fabiola Iannarilli (9) | John Erb (10) | Ryan K. Brook (11) | Amy J. Davis (12) | Jesse Lewis (13) | Daniel P. Walsh (14) | James C. Beasley (15) | Kurt C. VerCauteren (16) | Jeff Clune (17) | Ryan S. Miller (18)

1 Quantitative Science Consulting, LLC, Laramie, WY, USA
2 Department of Zoology and Physiology, University of Wyoming, Laramie, WY, USA
3 Computer Science Department, University of Wyoming, Laramie, WY, USA
4 Minnesota Cooperative Fish and Wildlife Research Unit, Department of Fisheries, Wildlife and Conservation Biology, University of Minnesota, St. Paul, MN, USA
5 Wildlife Research and Monitoring Section, Ontario Ministry of Natural Resources and Forestry, Peterborough, ON, Canada
6 Range Cattle Research and Education Center, Wildlife Ecology and Conservation, University of Florida, Ona, FL, USA
7 Colorado Parks and Wildlife, Fort Collins, CO, USA
8 Wisconsin Department of Natural Resources, Madison, WI, USA
9 Conservation Sciences Graduate Program, University of Minnesota, St. Paul, MN, USA
10 Forest Wildlife Populations and Research Group, Minnesota Department of Natural Resources, Grand Rapids, MN, USA
11 Department of Animal and Poultry Science, University of Saskatchewan, Saskatoon, SK, Canada
12 National Wildlife Research Center, United States Department of Agriculture, Fort Collins, CO, USA
13 College of Integrative Sciences and Arts, Arizona State University, Mesa, AZ, USA
14 US Geological Survey, National Wildlife Health Center, Madison, WI, USA
15 Savannah River Ecology Laboratory, Warnell School of Forestry and Natural Resources, University of Georgia, Aiken, SC, USA
16 National Wildlife Research Center, United States Department of Agriculture, Animal and Plant Health Inspection Service, Fort Collins, CO, USA
17 OpenAI, San Francisco, CA, USA
18 Center for Epidemiology and Animal Health, United States Department of Agriculture, Fort Collins, CO, USA

Correspondence
Michael A. Tabak, Quantitative Science Consulting, LLC, Laramie, Wyoming. Email: tabakma@gmail.com

Abstract

Motion-activated wildlife cameras (or “camera traps”) are frequently used to remotely and noninvasively observe animals. The vast number of images collected from camera trap projects has prompted some biologists to employ machine learning algorithms to automatically recognize species in these images, or at least filter out images that do not contain animals. These approaches are often limited by model transferability, as a model trained to recognize species from one location might not work as well for the same species in different locations. Furthermore, these methods often require advanced computational skills, making them inaccessible to many biologists. We used 3 million camera trap images from 18 studies in 10 states across the United States of America to train two deep neural networks, one that recognizes 58 species, the “species model,” and one that determines if an image is empty or if it contains an animal, the “empty-animal model.” Our species model and empty-animal model had accuracies of 96.8% and 97.3%, respectively. Furthermore, the models performed well on some out-of-sample datasets, as the species model had 91% accuracy on species from Canada (accuracy range 36%–91% across all out-of-sample datasets) and the empty-animal model achieved an accuracy of 91%–94% on out-of-sample datasets from different continents. Our software addresses some of the limitations of using machine learning to classify images from camera traps. By including many species from several locations, our species model is potentially applicable to many camera trap studies in North America. We also found that our empty-animal model can facilitate removal of images without animals globally. We provide the trained models in an R package (MLWIC2: Machine Learning for Wildlife Image Classification in R), which contains Shiny Applications that allow scientists with minimal programming experience to use trained models and train new models in six neural network architectures with varying depths.

KEYWORDS
computer vision, deep convolutional neural networks, image classification, machine learning, motion-activated camera, R package, remote sensing, species identification

Disclaimer: This manuscript was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of the authors expressed herein do not necessarily state or reflect those of the United States Department of Agriculture, but do represent the views of the U.S. Geological Survey. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2020 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd

1 | INTRODUCTION

Motion-activated wildlife cameras (or “camera traps”) are frequently used to remotely observe wild animals, but images from camera traps must be classified to extract their biological data (O’Connell, Nichols, & Karanth, 2011). Manually classifying camera trap images is an encumbrance that has prompted scientists to use machine learning to automatically classify images (Norouzzadeh et al., 2018; Willi et al., 2019), but this approach has limitations.

We address two major limitations of using machine learning to automatically classify animals in camera trap images. First, machine learning models trained to recognize species from one location and in one camera trap setup might perform poorly when applied to images from camera traps in different conditions (i.e., these models can have low “out-of-sample” accuracy; Schneider, Greenberg, Taylor, & Kremer, 2020). This transferability, or generalizability, problem is thought to arise because different locations have different backgrounds (the part of the picture that is not the animal) and most models evaluate the entire image, including the background (Beery, Morris, & Yang, 2019; Miao et al., 2019; Norouzzadeh et al., 2019; Terry, Roy, & August, 2020; Wei, Luo, Ran, & Li, 2020). By including images from 18 different studies in North America, our objective was to train models with more variation in the backgrounds associated with each species. Furthermore, by training an additional model that distinguishes between images with and without animals, we provide an option that could be broadly applicable to camera trap studies worldwide.

Second, the use of machine learning in camera trap analysis is often limited to computer scientists, yet the need for image processing exceeds the availability of computer scientists in wildlife research. For example, several researchers have provided excellent Python repositories for using computer vision to analyze camera trap images (Beery et al., 2019; Beery, Wu, Rathod, Votel, & Huang, 2020; Norouzzadeh et al., 2018; Schneider et al., 2020). These software packages enable programmers to use and train models to detect, classify, and evaluate the behavior of animals in camera trap images. However, these packages require extensive programming experience in Python, a skill which is often lacking from wildlife research teams. To facilitate the use of this type of model by biologists with minimal programming experience, Machine Learning for Wildlife Image Classification (MLWIC2) includes an option to train and use models in user-friendly Shiny Applications (Chang, Cheng, Alaire, Xie, & McPherson, 2019), allowing users to point-and-click instead of using a command line. This facilitates easier site-specific model training when our models do not perform to expectations.

2 | MATERIALS AND METHODS

2.1 | Camera trap images

Images were collected from 18 studies using camera traps in 10 states in the United States of America (California, Colorado, Florida, Idaho, Minnesota, Montana, South Carolina, Texas, Washington, and Wisconsin; Appendix S1). Images were either classified by a single wildlife expert or classified independently by two biologists, with discrepancies settled by a third. An image was classified as containing an animal if it contained any part of an animal. Our initial dataset included 6.3 million images but was unbalanced with most images from a few species (e.g., 51% of all images were Bos taurus). We rebalanced the number of images by species and site to ensure that no one species or site dominated the training process. Previous work suggested that training a model with 100,000 images per species produces good performance (Tabak et al., 2019); therefore, we limited the number of images for a single species from one location to 100,000. When >100,000 images for a single species existed at one location, we randomly selected 100,000 of these images to include in the training/testing dataset.
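The cap of 100,000 images per species per location described above can be illustrated with a minimal Python sketch. This is not the authors' code: the record fields (`species`, `location`, `file`), the function name `cap_per_species_location`, and the fixed random seed are assumptions made only for illustration, and the toy example uses a cap of 5 instead of 100,000.

```python
import random
from collections import defaultdict


def cap_per_species_location(records, cap=100_000, seed=0):
    """Keep at most `cap` randomly chosen records per (species, location) pair.

    `records` is a list of dicts with hypothetical keys "species" and "location".
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["species"], rec["location"])].append(rec)

    kept = []
    for key in sorted(groups):
        group = groups[key]
        if len(group) > cap:
            # Too many images for this species at this location: subsample.
            group = rng.sample(group, cap)
        kept.extend(group)
    return kept


# Toy data: 10 cattle images and 3 bobcat images from one location.
records = (
    [{"species": "Bos taurus", "location": "A", "file": f"cow_{i}.jpg"} for i in range(10)]
    + [{"species": "Lynx rufus", "location": "A", "file": f"bobcat_{i}.jpg"} for i in range(3)]
)
balanced = cap_per_species_location(records, cap=5)
print(len(balanced))  # 8: five cattle images plus all three bobcat images
```

Grouping on the (species, location) pair, not on species alone, mirrors the text: a common species can still contribute up to the cap from each location where it was photographed.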
After rebalancing the data, we had a total of 2.98 million images; 90% were randomly selected for training, while 10% were used for testing. Images used in this study were either already a part of or were added to the North American Camera Trap Images dataset (lila.science/datasets/nacti; Tabak et al., 2019). Images from Canada were not used for training but were used to evaluate model transferability as an out-of-sample dataset.

2.2 | Training models

We trained deep convolutional neural networks using the ResNet-18 architecture (He, Zhang, Ren, & Sun, 2016) in the TensorFlow framework (Adabi et al., 2016) on a high-performance computing cluster, “Teton” (Advanced Research Computing Center, 2018). Models were trained for 55 epochs, with a ReLU activation function at every hidden layer and a softmax function in the output layer, mini-batch stochastic gradient descent with a momentum hyperparameter of 0.9 (Goodfellow, Bengio, & Courville, 2016), a batch size of 256 images, and learning rates and weight decays that varied by epoch number (described in Appendix S2). We trained a species model, which contained classes for 58 species or groups of species and one class for empty images (Table 1). We also trained an empty-animal model that contained only two classes, one for images containing an animal, and the other for images without animals.

2.3 | Model validation and transferability

We first evaluated our trained models by applying them to predicting species in the 10% of images that were withheld from training. Models were evaluated for each species using the recall, top-5 recall, and precision, which are values summarizing the number of true positives (TPs), false positives (FPs), and false negatives (FNs):

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

As recall is the proportion of images of each species that were correctly classified, top-5 recall is the proportion of images for each species in which one of the model's top five guesses is the correct species. We also calculated confidence intervals for recall and precision rates (Appendix S3). To evaluate transferability of the model, we conducted out-of-sample validation by applying our trained models to images from locations where the model was not trained. We evaluated the species model using four out-of-sample datasets from North America: the Caltech Camera Traps dataset (Beery, Van Horn, & Perona, 2018), the ENA24-detection dataset (Yousif, Kays, & He, 2019), the Saskatchewan, Canada dataset from this study, and the Missouri Camera Traps dataset (Zhang, He, Cao, & Cao, 2016). The empty-animal model was tested using the Wellington Camera Traps dataset from New Zealand (Anton, Hartley, Geldenhuis, & Wittmer, 2018), the Snapshot Serengeti dataset from Tanzania (Swanson et al., 2015), and the Snapshot Karoo dataset from South Africa (http://lila.science/datasets/snapshot-karoo).

To evaluate the effect of using multiple training datasets on model generalizability, we iteratively trained models using varying numbers of datasets (i.e., 1 dataset, 3 datasets, 6 datasets, … all 18 datasets) and tested the model on the out-of-sample datasets.

2.4 | R package development

MLWIC2 was developed using the R packages Shiny (Chang et al., 2019) and ShinyFiles (Pedersen, Nijs, Schaffner, & Nantz, 2019) so the user can choose to either use a programming console or a graphical user interface. Users can navigate to locations on their computer using a browser window instead of specifying paths. The package can classify images at a rate of 2,000 images per minute on a laptop with 16 gigabytes of random-access memory and without a graphics processing unit. MLWIC2 will optionally write the top guess from each model and confidence associated with these guesses to the metadata of the original image file. The function “write_metadata” and the associated R Shiny Application uses Exiftool (Harvey, 2016) to accomplish this. In addition, if scientists have labeled images, MLWIC2 has a Shiny app that allows users to train a new model to recognize species using one of six different convolutional neural network architectures (AlexNet, DenseNet, GoogLeNet, NiN, ResNet, and VGG) with different numbers of layers. We also trained models in these other architectures for comparison. Note that the time required to train a model depends on the number of images used for training and computing resources; operating MLWIC2 on a high-performance computing cluster requires programming experience.

TABLE 1 Comparison of validation accuracy (accuracy on the withheld dataset) using different architectures

Architecture | Validation accuracy (%)
ResNet-18 | 96.8
DenseNet-121 | 95.9
VGG-22 | 88.6
GoogleNet-32 | 88.1
AlexNet-8 | 85.4
NiN-16 | 84.3

3 | RESULTS

We found the highest validation accuracy (within sample validation) using ResNet-18 (Table 1), for which we found an overall accuracy of 96.8% for the species model and 97.3% for the empty-animal model. Several species (6 of 11) had recall of >95% with fewer than 2,000 images used for training (Table 2; Figure 1). A confusion matrix (Appendix S4) depicts how all images of each species were classified by the species model. When evaluated on out-of-sample images, the species model accuracy ranged from 36.3% to 91.3% (Table 3), with top-5 accuracy ranging from 65.2% to 93.8% (Figure 2), and the empty-animal model accuracy ranged from 90.6% to 94.1% (Table 3). When we iteratively trained the model on varying numbers of datasets, we found that accuracy on out-of-sample images increased with the number of datasets used to train the model (Figure 3).
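The recall, precision, and top-5 measures reported above follow directly from their definitions. The following minimal Python sketch shows one way to compute them from model output; it is not the MLWIC2 implementation, and the function names (`per_class_recall_precision`, `top_k_accuracy`) and the toy labels are hypothetical.

```python
from collections import Counter


def per_class_recall_precision(y_true, y_pred):
    """Per-class recall = TP / (TP + FN) and precision = TP / (TP + FP)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fn[truth] += 1  # the true class was missed
            fp[guess] += 1  # the guessed class was wrongly assigned
    results = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        recall_den = tp[cls] + fn[cls]
        precision_den = tp[cls] + fp[cls]
        results[cls] = {
            "recall": tp[cls] / recall_den if recall_den else None,
            "precision": tp[cls] / precision_den if precision_den else None,
        }
    return results


def top_k_accuracy(y_true, ranked_guesses, k=5):
    """Share of images whose true class is among the model's first k guesses."""
    hits = sum(1 for truth, guesses in zip(y_true, ranked_guesses) if truth in guesses[:k])
    return hits / len(y_true)


# Toy example with three classes and four images.
y_true = ["elk", "elk", "moose", "empty"]
y_pred = ["elk", "moose", "moose", "empty"]
ranked = [["elk", "moose"], ["moose", "elk"], ["moose", "elk"], ["elk", "moose"]]

metrics = per_class_recall_precision(y_true, y_pred)
print(metrics["elk"])    # {'recall': 0.5, 'precision': 1.0}
print(metrics["moose"])  # {'recall': 1.0, 'precision': 0.5}
print(top_k_accuracy(y_true, ranked, k=2))  # 0.75
```

In the toy data, the second elk image is misclassified as moose, which lowers elk recall and moose precision at the same time; it still counts as a hit for top-2 accuracy because elk is the model's second guess.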
TABLE 2 Mean recall and precision rates (along with 95% confidence intervals) for predicting species using the species model on the validation dataset (the 10% of images that were withheld from training)

Class name (scientific name) | Number of training images | Recall | Precision
Accipitridae family (Accipitridae) | 1,511 | 0.91 (0.67, 1) | 0.94 (0.89, 0.97)
American crow (Corvus brachyrhynchos) | 2,522 | 0.67 (0.61, 0.73) | 0.7 (0.64, 0.75)
American marten (Martes americana) | 51,081 | 0.96 (0.95, 0.97) | 0.96 (0.94, 0.97)
Anatidae family (Anatidae) | 1,071 | 0.97 (0.92, 0.99) | 0.97 (0.92, 0.99)
Armadillo (Cingulata) | 8,947 | 0.94 (0.59, 0.99) | 0.95 (0.94, 0.96)
Bighorn sheep (Ovis canadensis) | 1,189 | 1 (0.97, 1) | 1 (0.97, 1)
Black bear (Ursus americanus) | 111,426 | 0.97 (0.91, 0.99) | 0.99 (0.91, 0.99)
Black-billed magpie (Pica hudsonia) | 2,770 | 0.98 (0.95, 0.99) | 0.96 (0.91, 0.99)
Black-tailed jackrabbit (Lepus californicus) | 5,617 | 0.95 (0.93, 0.96) | 0.93 (0.91, 0.95)
Black-tailed prairie dog (Cynomys ludovicianus) | 43,999 | 0.93 (0.93, 0.94) | 0.95 (0.94, 0.96)
Bobcat (Lynx rufus) | 31,634 | 0.96 (0.95, 0.99) | 0.97 (0.96, 0.98)
California ground squirrel (Otospermophilus beecheyi) | 30,301 | 1 (1, 1) | 0.99 (0.98, 0.99)
California quail (Callipepla californica) | 2,046 | 0.97 (0.94, 0.99) | 0.99 (0.97, 1)
Canada lynx (Lynx canadensis) | 15,119 | 1 (0.99, 1) | 0.99 (0.98, 0.99)
Cattle (Bos taurus) | 269,963 | 0.97 (0.93, 0.98) | 0.98 (0.77, 0.99)
Clark's nutcracker (Nucifraga columbiana) | 2,785 | 0.94 (0.91, 0.96) | 0.92 (0.87, 0.95)
Common raven (Corvus corax) | 21,134 | 0.99 (0.91, 0.99) | 0.99 (0.98, 1)
Coyote (Canis latrans) | 41,512 | 0.96 (0.94, 0.98) | 0.97 (0.96, 0.99)
Cricetidae and Muridae families | 1,254 | 0.93 (0.87, 0.96) | 0.83 (0.7, 0.94)
Dog (Canis familiaris) | 1,136 | 0.82 (0.7, 0.98) | 0.78 (0.6, 0.99)
Domestic sheep (Ovis aries) | 16,340 | 0.99 (0.99, 1) | 0.99 (0.99, 1)
Donkey (Equus asinus) | 2,403 | 0.99 (0.97, 1) | 0.94 (0.9, 0.96)
Elk (Cervus canadensis) | 112,389 | 0.97 (0.95, 0.98) | 0.99 (0.86, 0.99)
Empty (no animal) | 907,096 | 0.97 (0.93, 0.98) | 0.95 (0.92, 0.97)
Fisher (Pekania pennanti) | 7,697 | 0.98 (0.97, 0.99) | 0.99 (0.96, 1)
Golden-mantled ground squirrel (Callospermophilus lateralis) | 1,587 | 0.89 (0.83, 0.92) | 0.86 (0.81, 0.91)
Grey fox (Urocyon cinereoargenteus) | 16,094 | 0.98 (0.96, 0.99) | 0.97 (0.95, 0.99)
Grey jay (Perisoreus canadensis) | 3,776 | 0.97 (0.87, 0.98) | 0.94 (0.8, 0.98)
Grey squirrel (Sciurus carolinensis) | 24,677 | 0.98 (0.64, 0.99) | 0.98 (0.64, 0.99)
Grizzly bear (Ursus arctos horribilis) | 843 | 0.99 (0.94, 1) | 0.99 (0.94, 1)
Gunnison's prairie dog (Cynomys gunnisoni) | 17,393 | 0.83 (0.82, 0.85) | 0.93 (0.91, 0.94)
Horse (Equus ferus) | 3,644 | 0.94 (0.53, 0.97) | 0.95 (0.45, 0.98)
Human (Homo sapiens) | 139,983 | 0.98 (0.97, 0.98) | 0.98 (0.97, 0.99)
Marmota genus (Marmota spp.) | 1,497 | 0.98 (0.95, 0.99) | 0.95 (0.91, 0.98)
Moose (Alces alces) | 11,741 | 0.99 (0.97, 1) | 0.99 (0.97, 1)
Mountain lion (Puma concolor) | 13,900 | 0.96 (0.95, 0.97) | 0.97 (0.96, 0.98)
Mule deer (Odocoileus hemionus) | 91,068 | 0.98 (0.95, 0.99) | 0.98 (0.93, 0.99)
Opossum (Didelphimorphia) | 5,782 | 0.94 (0.76, 0.98) | 0.97 (0.87, 0.99)
Other grouse (Tetraoninae) | 4,237 | 0.97 (0.91, 0.99) | 0.98 (0.96, 0.99)
Other mustelids (Mustelidae) | 2,467 | 0.89 (0.85, 0.92) | 0.91 (0.85, 0.96)
Other passerine birds (Passeriformes) | 3,363 | 0.86 (0.81, 0.9) | 0.88 (0.75, 0.94)
Porcupine (Erethizontidae and Hystricidae) | 6,608 | 0.97 (0.82, 0.99) | 0.98 (0.96, 0.98)
Prairie chicken (Tympanuchus cupido) | 815 | 1 (0.96, 1) | 0.98 (0.93, 1)
Pronghorn (Antilocapra americana) | 57,953 | 0.98 (0.97, 0.98) | 0.99 (0.98, 0.99)
Raccoon (Procyon lotor) | 51,439 | 0.9 (0.83, 0.99) | 0.93 (0.91, 0.99)
Red fox (Vulpes vulpes) | 43,433 | 0.98 (0.96, 0.99) | 0.98 (0.97, 0.99)
Red squirrel (Tamiasciurus hudsonicus) | 21,586 | 0.85 (0.84, 0.96) | 0.86 (0.88, 0.97)
River otter (Lontra canadensis) | 1,821 | 0.96 (0.92, 0.98) | 0.97 (0.93, 0.98)
Snowshoe hare (Lepus americanus) | 37,467 | 0.97 (0.94, 0.99) | 0.97 (0.95, 0.98)
Steller's jay (Cyanocitta stelleri) | 1,844 | 0.91 (0.8, 0.98) | 0.96 (0.87, 1)
Striped skunk (Mephitis mephitis) | 12,416 | 0.98 (0.9, 0.99) | 0.97 (0.96, 0.98)
Swift fox (Vulpes velox) | 3,266 | 0.85 (0.81, 0.88) | 0.95 (0.92, 0.97)
Sylvilagus family | 6,385 | 0.93 (0.82, 0.99) | 0.94 (0.86, 0.97)
Vehicle (truck, ATV, car) | 32,912 | 0.97 (0.96, 0.98) | 0.97 (0.97, 0.98)
White-tailed deer (Odocoileus virginianus) | 88,531 | 0.93 (0.83, 1) | 0.97 (0.84, 0.99)
Wild pig (Sus scrofa) | 243,344 | 0.98 (0.98, 0.99) | 0.99 (0.98, 1)
Wild turkey (Meleagris gallopavo) | 15,686 | 0.94 (0.88, 0.99) | 0.98 (0.95, 1)
Wolf (Canis lupus) | 3,070 | 0.96 (0.88, 1) | 0.95 (0.8, 1)
Wolverine (Gulo gulo) | 18,810 | 0.98 (0.96, 1) | 0.98 (0.97, 0.99)
Totals | 2,682,380 | 0.97 | 0.97

FIGURE 1 Within sample validation of the species model revealed high recall and precision for most species. Median values across datasets are presented along with 95% confidence intervals. The number of datasets for each species is included in the circle next to the species name (circle sizes are proportional to the number of datasets containing each species)

TABLE 3 Out-of-sample validation results. All out-of-sample images are available from lila.science/datasets

Dataset | Number of images tested | Model tested | Accuracy | Top-5 accuracy (a)
Snapshot Karoo (South Africa) | 38,101 | Empty-animal | 0.906 | -
Snapshot Serengeti (Tanzania) | 104,651 | Empty-animal | 0.941 | -
Wellington (New Zealand) | 266,966 | Empty-animal | 0.939 | -
Caltech Camera Traps (USA) | 218,147 | Species | 0.562 | 0.744
ENA24-Detection (USA) | 5,285 | Species | 0.507 | 0.649
Missouri Camera Traps (USA) | 5,008 | Species | 0.363 | 0.652
Saskatchewan (Canada) | 5,200 | Species | 0.913 | 0.938

(a) Top-5 accuracy is not relevant for the empty-animal model because there are only two classes.

FIGURE 2 Species model out-of-sample validation revealed variable recall and precision rates across species. Median values across datasets are presented along with 95% confidence intervals. The number of datasets for each species is included in the circle next to the species name
[Panels: Recall, Precision, and Recall (top 5), each on an axis from 0.0 to 1.0. Groups: Aves, Carnivora (large), Carnivora (small), Ungulata, Lagomorpha, Didelphimorphia, Human, Empty. Species (number of datasets): Wild turkey 1; American crow 1; Black bear 1; Grey fox 1; Coyote 2; Bobcat 2; Mountain lion 1; Dog 2; Striped skunk 3; Raccoon 3; Elk 1; Moose 1; White-tailed deer 3; Mule deer 2; Cow (domestic) 2; Horse 1; Wild pig 3; Sylvilagus sp. 1; Opossum 3; Vehicle 2; Empty 1.]

FIGURE 3 Models became more generalizable (i.e., out-of-sample accuracy increased) as the number of datasets used to train the model increased. Points represent median accuracy across out-of-sample datasets and lines connect the minimum and maximum of the 95% quantiles for accuracy values across these datasets
[Axes: number of studies used for training (0 to 18) against out-of-sample accuracy (0.0 to 1.0).]

4 | DISCUSSION

In MLWIC2, we provide two trained machine learning models, one classifying species and another distinguishing between images with animals and those that are empty, with 97% accuracy, which can potentially be used to rapidly classify camera trap images from many locations. While the species model performed well on out-of-sample images from Saskatchewan, Canada (91% overall accuracy), the model performed poorly on some out-of-sample datasets (Table 3; Figure 2). The discrepancy in model performance on images from different datasets indicates that transferability remains an issue and our species model will not be useful on all datasets; some users will need to train new models on images from their field sites, an option that is available in MLWIC2. Nevertheless, even in the Missouri dataset where our model performed worst, the top-5 accuracy, the rate at which the true species in an image was in the model's top-5 guesses, was 65% (Table 3). For some applications, for example, detection of invasive or rare species, such an out-of-sample top-5 recall rate may be sufficient to address scientific questions or meet monitoring objectives. Additionally, our empty-animal model performed well at distinguishing empty images from those containing animals in datasets from three different countries (91%–94% accuracy), indicating that this model may be broadly applicable for finding empty images in datasets globally. For many research projects, the task of simply removing empty images can save thousands of hours of labor. We propose a workflow for how users can apply these models to filter out empty images and train new models as necessary (Figure 4). By providing Shiny Applications to train models and classify images, we make this technology accessible to more scientists with minimal programming experience. Our finding that high recall (>95%) can be achieved with fewer than 2,000 images for some species (Table 2; Figure 1) suggests that smaller labeled image datasets can potentially be used to train models with this software.

Other researchers have developed models for recognizing animals in camera traps, with some success in out-of-sample identification. For example, Zilong software accurately removed 85% of empty images (Wei et al., 2020), MegaDetector had a precision of 89%–99% at detecting animals (Beery et al., 2019), and MLWIC achieved an accuracy of 82% at out-of-sample species classification (Tabak et al., 2018, 2019). We hypothesize that our models performed well on some out-of-sample datasets (Snapshot Serengeti, Snapshot Karoo, Wellington, and Saskatchewan; Table 3) because they were trained using camera trap images from multiple locations with different camera placement protocols, allowing the model to develop a search image for each species in multiple backgrounds (Figure 3).

Transferability of machine learning models remains a complication for implementing these models more broadly to camera trap data and, in many cases, it is most productive for scientists to build models that are trained directly on their study sites (see Figure 4 for more details). While such models will have less broad applicability (they are unlikely to be accurate globally), they can have high study-specific accuracies, thus reducing the burden of manual image classification. Our finding that models become more generalizable when more datasets are used to train the model (Figure 3) indicates that by including more diverse datasets when we train future models, we may be able to train a model that can be accurate in more locations.

FIGURE 4 Proposed workflow for using MLWIC2 models when classifying camera trap images

4.1 | Future directions

As this new technology becomes more widely available, ecologists will need to decide how it will be applied in ecological analyses. For example, when using machine learning model output to design occupancy and abundance models, we can incorporate accuracy estimates that were generated when conducting model testing. The error of a machine learning model in identifying species from camera traps is similar to the problem of imperfect detection of wildlife when conducting field surveys (McIntyre, Majelantle, Slip, & Harcourt, 2020). Wildlife are often not detected when they are present (false negatives) and occasionally detected when they are absent (false positives); ecologists have developed models to effectively estimate occupancy when data have these types of errors (Guillera-Arroita, Lahoz-Monfort, van Rooyen, Weeks, & Tingley, 2017; Royle & Link, 2006). We can use Bayesian occupancy and abundance models where the central tendencies of the prior distributions for the false-negative and false-positive error rates are derived from validation of our machine learning models. While we would expect false-positive rates in occupancy models to resemble the false-positive error rates for the machine learning model, false-negative error rates would be a function of both the machine learning model and the propensity for some species to avoid detection by cameras when they are present (Tobler, Zúñiga Hartley, Carrillo-Percastegui, & Powell, 2015).

Another area in need of consideration is how to group taxa when few images are available for the species. We generally grouped species when few images were available for model training using an arbitrary cutoff of approximately 1,000 images per group (Table 2). Nevertheless, we had relatively few images of grizzly bears (Ursus arctos horribilis; n = 843), but we included this species because it is of conservation concern, and found high rates of recall and precision (99% for each). We grouped members of Mustelidae (Mustela erminea, Mustela frenata, unknown Mustela spp., Neovison spp., and Taxidea taxus) together, and this group had relatively low recall and precision (89% and 91%, respectively). When researchers develop new models and decide which species to include and which to group, they will need to consider the available data, the species or groups in their study, and the ecological question that the model will help address.

ACKNOWLEDGEMENTS

Contributions of JCB were partially supported by the DOE under Award Number DE-EM0004391 to the University of Georgia Research Foundation. Support for this research was provided by the USFWS Pittman-Robertson Wildlife Restoration Program and Wisconsin Department of Natural Resources. For supplying camera trap images, we thank USDA Forest Service: Rocky Mountain Research station; Montana Fish, Wildlife and Parks; Wyoming Game and Fish Department; Washington Department of Fish and Wildlife; Idaho Department of Fish and Game; and Woodland Park Zoo.

CONFLICT OF INTEREST

The authors have no conflicts of interest to declare.

AUTHOR CONTRIBUTION

Michael A Tabak: Conceptualization (lead); Data curation (equal); Formal analysis (lead); Investigation (lead); Methodology (lead); Project administration (equal); Software (lead); Validation (lead); Visualization (equal); Writing-original draft (lead); Writing-review & editing (lead).
Mohammad Sadegh Norouzzadeh: Formal analysis (equal); Methodology (equal); Software (equal); Writing-review & editing (equal).
David Wolfson: Data curation (lead); Writing-review & editing (equal).
Erica Newton: Software (equal); Writing-review & editing (equal).
Raoul Boughton: Data curation (equal); Funding acquisition (equal); Writing-review & editing (equal).
Jacob Ivan: Data curation (equal); Writing-review & editing (equal).
Eric A Odell: Data curation (equal); Writing-review & editing (equal).
Eric S Newkirk: Data curation (equal); Writing-review & editing (equal).
Reesa Conrey: Data curation (equal); Writing-review & editing (equal).
Jennifer Leigh Stenglein: Data curation (equal); Writing-review & editing (equal).
Fabiola Iannarilli: Data curation (equal); Writing-review & editing (equal).
John D. Erb: Data curation (equal); Writing-review & editing (equal).
Ryan Kendall Brook: Data curation (equal); Writing-review & editing (equal).
Amy Davis: Data curation (equal); Writing-review & editing (equal).
Jesse S Lewis: Data curation (equal); Writing-review & editing (equal).
Daniel Walsh: Data curation (equal); Writing-review & editing (equal).
James Beasley: Data curation (equal); Writing-review & editing (equal).
Kurt Vercauteren: Conceptualization (equal); Data curation (equal); Writing-review & editing (equal).
Jeff Clune: Methodology (supporting); Software (supporting); Writing-review & editing (equal).
Ryan S Miller: Conceptualization (equal); Funding acquisition (lead); Project administration (equal); Visualization (lead); Writing-original draft (supporting); Writing-review & editing (equal).

AUTHOR CONTRIBUTIONS

MAT, RSM, and RKBoughton conceived of the project. DWW, RKB, JSI, EAO, ESN, RYC, JLS, FI, JE, RKB, AJD, JSS, DPW, JCB, and KCV oversaw the data collection and labeling processes. MSN and JC provided insight for model training. MAT developed MLWIC2 and led the writing of the manuscript. DWW and EJN assisted with MLWIC2 development. All authors contributed critically to drafts and gave final approval for submission.

DATA AVAILABILITY STATEMENT

The trained models described in this work are available in the MLWIC2 package (https://github.com/mikeyEcology/MLWIC2). Images used to train models are available in the North American Camera Trap Images dataset (lila.science/datasets/nacti). Data from validation tests are available from the dryad digital repository (https://doi.org/10.5061/dryad.x95x69pfx; Tabak, 2020).

ORCID

Michael A. Tabak https://orcid.org/0000-0002-2986-7885
David W. Wolfson https://orcid.org/0000-0003-1098-9206
Jennifer Stenglein https://orcid.org/0000-0003-4578-5908
Amy J. Davis https://orcid.org/0000-0002-4962-9753
Daniel P. Walsh https://orcid.org/0000-0002-7772-2445
James C. Beasley https://orcid.org/0000-0001-9707-3713
Ryan S. Miller https://orcid.org/0000-0003-3892-0251

REFERENCES

Adabi, M., Barhab, P., Chen, J., Chen, Z., Davis, A., Dean, J., … Zheng, X. (2016). TensorFlow: A system for large-scale machine learning (Vol. 16, pp. 265–283). Presented at the 12th USENIX Symposium on Operating Systems Design and Implementation, USENIX Association.

Advanced Research Computing Center (2018). Teton Computing Environment, Intel x86_64 cluster. Laramie, WY: University of Wyoming. Retrieved from https://doi.org/10.15786/M2FY47

Anton, V., Hartley, S., Geldenhuis, A., & Wittmer, H. U. (2018). Monitoring the mammalian fauna of urban areas using remote cameras and citizen science. Journal of Urban Ecology, 4(1), 1–9. https://doi.org/10.1093/jue/juy002

Beery, S., Morris, D., & Yang, S. (2019). Efficient pipeline for camera trap image review. Retrieved from http://arxiv.org/abs/1907.06772

Beery, S., Van Horn, G., & Perona, P. (2018). Recognition in terra incognita (pp. 456–473). Presented at the Proceedings of the European Conference on Computer Vision (ECCV). Retrieved from http://openaccess.thecvf.com/content_ECCV_2018/html/Beery_Recognition_in_Terra_ECCV_2018_paper.html

Beery, S., Wu, G., Rathod, V., Votel, R., & Huang, J. (2020). Context R-CNN: Long term temporal context for per-camera object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13075–13085.

Chang, W., Cheng, J., Alaire, J., Xie, Y., & McPherson, J. (2019). shiny: Web application framework for R (Version 1.4.0). Retrieved from https://CRAN.R-project.org/package=shiny

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning (1st ed.). Cambridge, MA: MIT Press.

Guillera-Arroita, G., Lahoz-Monfort, J. J., van Rooyen, A. R., Weeks, A. R., & Tingley, R. (2017). Dealing with false-positive and false-negative errors about species occurrence at multiple levels. Methods in Ecology and Evolution, 8(9), 1081–1091. https://doi.org/10.1111/2041-210X.12743

Harvey, P. (2016). ExifTool. Retrieved from https://exiftool.org/

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778). IEEE. https://doi.org/10.1109/CVPR.2016.90

McIntyre, T., Majelantle, T. L., Slip, D. J., & Harcourt, R. G. (2020). Quantifying imperfect camera-trap detection probabilities: Implications for density modelling. Wildlife Research, 47(2), 177–185. https://doi.org/10.1071/WR19040

Miao, Z., Gaynor, K. M., Wang, J., Liu, Z., Muellerklein, O., Norouzzadeh, M. S., … Getz, W. M. (2019). Insights and approaches using deep learning to classify wildlife. Scientific Reports, 9(1), 1–9. https://doi.org/10.1038/s41598-019-44565-w

Norouzzadeh, M. S., Morris, D., Beery, S., Joshi, N., Jojic, N., & Clune, J. (2019). A deep active learning system for species identification and counting in camera trap images. ArXiv:1910.09716 [Cs, Eess, Stat]. Retrieved from http://arxiv.org/abs/1910.09716

Norouzzadeh, M. S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M. S., Packer, C., & Clune, J. (2018). Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proceedings of the National Academy of Sciences of the United States of America, 115(25), E5716–E5725. https://doi.org/10.1073/pnas.1719367115

O’Connell, A. F., Nichols, J. D., & Karanth, K. U. (Eds.) (2011). Camera traps in animal ecology: Methods and analyses. Tokyo; New York: Springer.

Pedersen, T. L., Nijs, V., Schaffner, T., & Nantz, E. (2019).
shinyFiles: A server-side file system viewer for shiny (Version 0.7.5). Retrieved from https://CRAN.R-projec t.org/package=shinyFiles Royle, J. A., & Link, W. A. (2006). Generalized site occupancy models allowing for false positive and false negative errors. Ecology, 87(4), 835–841. https://doi.org/10.1890/0012-9658(2006)87[835:GSOMAF ]2.0.CO;2 Schneider, S., Greenberg, S., Taylor, G. W., & Kremer, S. C. (2020). Three critical factors affecting automated image species recognition performance for camera traps. Ecology and Evolution, 10(7), 3503–3517. https://doi.org/10.1002/ece3.6147 Swanson, A., Kosmala, M., Lintott, C., Simpson, R., Smith, A., & Packer, C. (2015). Snapshot Serengeti, high-frequency annotated camera trap images of 40 mammalian species in an African savanna. Scientific Data, 2, 150026. https://doi.org/10.1038/sdata.2015.26 10383 Tabak, M. A. (2020). Data from: Improving the accessibility and transferability of machine learning algorithms for identification of animals in camera trap images. Dryad, https://doi.org/10.5061/dryad.x95x6 9pfx Tabak, M. A., Norouzzadeh, M. S., Wolfson, D. W., Sweeney, S. J., VerCauteren, K. C., Snow, N. P., … Miller, R. S. (2018). MLWIC: Machine learning for wildlife image classification in R. Zenodo, https:// doi.org/10.5281/zenodo.1445736 Tabak, M. A., Norouzzadeh, M. S., Wolfson, D. W., Sweeney, S. J., Vercauteren, K. C., Snow, N. P., … Miller, R. S. (2019). Machine learning to classify animal species in camera trap images: Applications in ecology. Methods in Ecology and Evolution, 10(4), 585–590. https:// doi.org/10.1111/2041-210X.13120 Terry, J. C. D., Roy, H. E., & August, T. A. (2020). Thinking like a naturalist: Enhancing computer vision of citizen science images by harnessing contextual data. Methods in Ecology and Evolution, 11(2), 303–315. https://doi.org/10.1111/2041-210X.13335 Tobler, M. W., Zúñiga Hartley, A., Carrillo-Percastegui, S. E., & Powell, G. V. N. (2015). 
Spatiotemporal hierarchical modelling of species richness and occupancy using camera trap data. Journal of Applied Ecology, 52(2), 413–421. https://doi.org/10.1111/1365-2664.12399 Wei, W., Luo, G., Ran, J., & Li, J. (2020). Zilong: A tool to identify empty images in camera-trap data. Ecological Informatics, 55, 101021. https://doi.org/10.1016/j.ecoinf.2019.101021 Willi, M., Pitman, R. T., Cardoso, A. W., Locke, C., Swanson, A., Boyer, A., … Fortson, L. (2019). Identifying animal species in camera trap images using deep learning and citizen science. Methods in Ecology and Evolution, 10(1), 80–91. https://doi.org/10.1111/2041-210X.13099 Yousif, H., Kays, R., & He, Z. (2019). Dynamic programming selection of object proposals for sequence-level animal species classification in the wild. New York, NY: IEEE Transactions on Circuits and Systems for Video Technology. Zhang, Z., He, Z., Cao, G., & Cao, W. (2016). Animal detection from highly cluttered natural scenes using spatiotemporal object region proposals and patch verification. IEEE Transactions on Multimedia, 18(10), 2079–2092. https://doi.org/10.1109/TMM.2016.2594138 S U P P O R T I N G I N FO R M AT I O N Additional supporting information may be found online in the Supporting Information section. How to cite this article: Tabak MA, Norouzzadeh MS, Wolfson DW, et al. Improving the accessibility and transferability of machine learning algorithms for identification of animals in camera trap images: MLWIC2. Ecol Evol. 2020;10:10374– 10383. https://doi.org/10.1002/ece3.6692 � https://cpw.cvlcollections.org/files/original/d9d07298a80a557a425a7db17158dddc.zip 9ce841a4b9aebf7f0d378e4c1b1fe4a5 Dublin Core The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/. 
Title: Journal Articles
Description: CPW peer-reviewed journal publications

Text

Dublin Core
Title: Improving the accessibility and transferability of machine learning algorithms for identification of animals in camera trap images: MLWIC2
Description: Motion-activated wildlife cameras (or “camera traps”) are frequently used to remotely and noninvasively observe animals. The vast number of images collected from camera trap projects has prompted some biologists to employ machine learning algorithms to automatically recognize species in these images, or at least filter-out images that do not contain animals. These approaches are often limited by model transferability, as a model trained to recognize species from one location might not work as well for the same species in different locations. Furthermore, these methods often require advanced computational skills, making them inaccessible to many biologists. We used 3 million camera trap images from 18 studies in 10 states across the United States of America to train two deep neural networks, one that recognizes 58 species, the “species model,” and one that determines if an image is empty or if it contains an animal, the “empty-animal model.” Our species model and empty-animal model had accuracies of 96.8% and 97.3%, respectively. Furthermore, the models performed well on some out-of-sample datasets, as the species model had 91% accuracy on species from Canada (accuracy range 36%–91% across all out-of-sample datasets) and the empty-animal model achieved an accuracy of 91%–94% on out-of-sample datasets from different continents. Our software addresses some of the limitations of using machine learning to classify images from camera traps. By including many species from several locations, our species model is potentially applicable to many camera trap studies in North America. We also found that our empty-animal model can facilitate removal of images without animals globally. We provide the trained models in an R package (MLWIC2: Machine Learning for Wildlife Image Classification in R), which contains Shiny Applications that allow scientists with minimal programming experience to use trained models and train new models in six neural network architectures with varying depths.
Bibliographic Citation: Tabak, M. A., M. S. Norouzzadeh, D. W. Wolfson, E. J. Newton, R. K. Boughton, J. S. Ivan, E. A. Odell, E. S. Newkirk, R. Y. Conrey, J. Stenglein, F. Iannarilli, J. Erb, R. K. Brook, A. J. Davis, J. Lewis, D. P. Walsh, J. C. Beasley, K. C. VerCauteren, J. Clune, and R. S. Miller. 2020. Improving the accessibility and transferability of machine learning algorithms for identification of animals in camera trap images: MLWIC2. Ecology and Evolution 10:10374-10383. https://doi.org/10.1002/ece3.6692
Creator: Tabak, Michael A.; Norouzzadeh, Mohammad S.; Wolfson, David W.; Newton, Erica J.; Boughton, Raoul K.; Ivan, Jacob S.; Odell, Eric A.; Newkirk, Eric S.; Conrey, Reesa Y.; Stenglein, Jennifer; Iannarilli, Fabiola; Erb, John; Brook, Ryan K.; Davis, Amy J.; Lewis, Jesse; Walsh, Daniel P.; Beasley, James C.; VerCauteren, Kurt C.; Clune, Jeff; Miller, Ryan S.
Subject: Computer vision; Deep convolutional neural networks; Image classification; Machine learning; Motion-activated camera; R package; Remote sensing; Species identification
Extent: 10 pages
Date Created: 2020-09-16
Rights: In Copyright - Non-Commercial Use Permitted (http://rightsstatements.org/vocab/InC-NC/1.0/); Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
Format: application/pdf
Language: English
Is Part Of: Ecology and Evolution
Has Part: https://github.com/mikeyEcology/MLWIC2
Type: Article