I was recently reading the latest issue of R Weekly when a post by Kaylin Pavlik caught my attention. In her article, “Exploring the Relationship Between Dog Names and Breeds”, Kaylin used hierarchical clustering to plot a dendrogram showing which dog breeds are given similar names in New York City (NYC). Looking at her beautiful graphs, I wondered whether dog owners in NYC prefer large or small dogs. My first thought was that in a busy city like NYC, where living space is limited, most dogs would be small. On the other hand, the Labrador Retriever is the most popular dog breed in the USA (see ranking), so I wondered whether it was also the most popular breed in NYC. In addition, being the happy owner of a smart, beautiful and very large (about 95 pounds) Rough Collie myself, I was curious to find out whether any Rough Collies lived in NYC. To explore these questions, I used the same database as Kaylin.
The database comes from the project “Dogs of NYC” from the Department of Health and Mental Hygiene of New York City. This dataset is freely available and is updated annually as dog owners register their dogs to get their licenses. Unfortunately, the latest data are not yet publicly available, so I used the most recent available version, which dates from January 2013. A total of 81,542 dogs are listed, along with variables such as gender, age, dominant colours, name, and zip code. To analyze the data, I used R, which is my favourite statistical tool. I downloaded the data from the website and started by examining how many different dog breeds live in NYC. I discovered that a total of 138 different breeds are listed in the database, which means that almost every breed recognized by the AKC lives in NYC!
Finding breeds’ mean weight
Using the American Kennel Club (AKC) website, I classified the dog breeds identified in NYC into the seven groups defined by the AKC (toy, working, herding, etc.). From the same website, I also recorded the minimum and maximum weight (in pounds) of males and females for each breed. However, some very popular crossbreeds listed in the database (e.g. Puggle and Labradoodle) are not purebred dogs according to the AKC classification, so no data were available for these “breeds”. In addition, although other crossbreeds clearly display the physical characteristics of one breed (e.g. Beagle crossbreed, Labrador Retriever crossbreed, etc.), I could not find any data on their weight range.
To resolve these issues, I decided to attribute to these crossbreeds the same weight as their dominant breed. For instance, the German Shepherd weights (minimum and maximum) were attributed to the German Shepherd crossbreed. The same approach was used for all major crossbreeds identified in the database. I used this approach because many small dogs are not purebred, yet the majority of owners classify them according to their dominant breed. In fact, running a genome test on all the dogs in the dataset would probably reveal that most of the dogs listed as purebred are not purebred at all. For this reason, I am quite comfortable with the idea of grouping crossbreeds with their dominant breed. However, one large category (“Mixed/Other”, a total of 23,185 dogs) was discarded from my analysis because I could not identify any particular breed associated with these dogs.
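The grouping rule described above can be sketched in a few lines. This is only an illustration in Python, not the R code I actually ran: the function name `dominant_breed`, the " Crossbreed" suffix and the exact label spellings are assumptions made for the example, and the labels in the real dataset may be written differently.

```python
# Hypothetical label conventions; the real dataset may spell these differently.
CROSSBREED_SUFFIX = " Crossbreed"
UNIDENTIFIED_LABEL = "Mixed/Other"


def dominant_breed(label):
    """Map a breed label to the breed whose weight range it should borrow.

    Returns None for dogs with no identifiable breed, so that they can be
    dropped from the analysis.
    """
    if label == UNIDENTIFIED_LABEL:
        return None
    if label.endswith(CROSSBREED_SUFFIX):
        # Strip the suffix so the crossbreed inherits the dominant breed's weights.
        return label[: -len(CROSSBREED_SUFFIX)]
    return label


labels = ["German Shepherd Crossbreed", "Beagle", "Mixed/Other"]
# Keep only the dogs whose dominant breed could be identified.
kept = [dominant_breed(x) for x in labels if dominant_breed(x) is not None]
```

With a rule like this, a "German Shepherd Crossbreed" and a "German Shepherd" end up in the same row when the data are later grouped by breed.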
Some data manipulations
I made some transformations to the data. First, I removed one breed (Fila Brasileiro) that is unclassified by the AKC. Then, I combined some sub-breeds that the AKC classifies as the same breed (e.g. Rough and Smooth Coat Collies are classified as a single breed, the Collie). Next, as mentioned earlier, I combined some crossbreeds with their dominant breed. Also, since I was unable to find both a minimum and a maximum weight for every breed, I replaced all missing values (NA) using the proportional difference between the minimum and maximum weights of all breeds listed in my dataset: unknown minimum weights were estimated by multiplying the maximum weight by 0.925, and unknown maximum weights were estimated by multiplying the minimum weight by 1.09. Finally, I grouped the data by breed, summarized them, and calculated the mean body weight of each dog breed.
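To make the missing-value rule concrete, here is a minimal sketch, again in Python rather than the R I used. The two factors (0.925 and 1.09) come from the text above; the function names are hypothetical, and taking the breed's mean weight as the plain average of all filled minimum and maximum values (males and females together) is an assumption of this example.

```python
MIN_FROM_MAX = 0.925  # factor applied to a known maximum to estimate a missing minimum
MAX_FROM_MIN = 1.09   # factor applied to a known minimum to estimate a missing maximum


def fill_weight_range(min_w, max_w):
    """Fill a missing bound (None) of a weight range from the known bound."""
    if min_w is None and max_w is None:
        return None, None  # nothing to estimate from
    if min_w is None:
        min_w = max_w * MIN_FROM_MAX
    elif max_w is None:
        max_w = min_w * MAX_FROM_MIN
    return min_w, max_w


def mean_weight(ranges):
    """Average all bounds of a breed's (min, max) ranges, e.g. male and female.

    Assumption of this sketch: the breed mean is the plain average of the
    filled minimum and maximum values.
    """
    values = []
    for min_w, max_w in ranges:
        min_w, max_w = fill_weight_range(min_w, max_w)
        if min_w is not None:
            values.extend([min_w, max_w])
    return sum(values) / len(values) if values else None


# Illustrative numbers only, not taken from the AKC website.
example = mean_weight([(60, 70), (50, 60)])
```

Note that the two factors are roughly each other's inverse (1 / 0.925 is about 1.08), so filling a minimum from a maximum or the reverse gives ranges of similar relative width.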
Exploring the relationship between number of dogs and group mean weight
First, let's explore the relationship between the number of dogs in each group and the mean weight of the dogs in each group:
As you can see, this graph clearly reveals that New Yorkers prefer toy dogs (see the yellow dot in the upper left corner). It also suggests that the number of dogs in a group is inversely related to the mean weight of the dogs in that group: the heavier the group, the fewer dogs it contains. The relationship is far from linear, but it nevertheless suggests that smaller dogs are more likely to live in NYC than larger dogs. Thus, my hypothesis seems to be partially supported by these data.
Exploring the relationship between the number of dogs per breed and the mean body weight of each breed
Given that the data illustrated above combine the breeds for each of the seven groups defined by the AKC, it is probably more interesting to plot individual breeds as a function of their mean body weight. So here we go:
This plot is of particular interest for several reasons. It reveals that four of the five most popular dog breeds in NYC are toy dogs. This observation clearly confirms that New Yorkers prefer small dogs. This plot also shows that the Labrador Retriever, although a relatively large breed, is quite popular in NYC, with the Pit Bull and the Beagle not too far behind. Finally, and this is very surprising to me, this graph also reveals that in spite of their massive body weight and size, some very large working breeds (e.g. Mastiff, Saint Bernard and Great Dane) live in NYC (see the bottom right of the graph).
In summary, the portrait of dogs in NYC is quite different from the one across the USA. In the USA, the most popular breeds are the Labrador Retriever, the German Shepherd Dog and the Golden Retriever, which are all large dog breeds. In NYC, the most popular breeds are the Yorkshire Terrier and the Shih Tzu (two small breeds), followed by the Labrador Retriever. Thus, it seems that living in a large city like NYC increases the probability that dog lovers share their lives with small dogs. This observation is possibly due to: (1) the limited amount of living space, (2) pet weight restrictions imposed by building/apartment owners, and (3) the popular belief that smaller dogs, compared with larger dogs, are well adapted to living in a home with no backyard or a very small one.
And the number of Rough Collies is…
And finally, to satisfy my personal curiosity, the number of Rough Collies in 2013 in New York City was… 23!