Guidelines for Referring to Populations

Care is needed in labeling the populations when publishing or presenting the findings of studies that used samples from the NHGRI Repository. This document provides guidelines on how to refer to these populations.

Rationale

The way that a population is named in studies of genetic variation, such as the HapMap and 1000 Genomes Projects, has important scientific, cultural, and ethical ramifications. Scientifically, precision in describing the population from which the samples were collected is an essential component of sound study design: the source of the data must be described accurately for the data to be interpreted correctly. Culturally, precision in labeling reflects respect for the local norms of the communities that agreed to participate in the research, and acknowledges that populations in one part of the world are not all the same. Ethically, precision is part of researchers' obligation to participants, and it helps ensure that research findings are neither under-generalized nor over-generalized. Careless or inconsistent terminology for the populations is a failure on all three counts.

The populations whose samples are included in the NHGRI Repository should not be named in a way that singles out small, discrete communities and implies that those communities are somehow genetically unique or of special interest. Labels that are too specific could also infringe on the privacy interests of communities (or even of individual sample donors).

On the other hand, describing the populations in terms that are too broad could result in inappropriate over-generalization. It could lead those who interpret data from studies that use the samples to equate ancestry erroneously with race (an imprecise and socially constructed category that has very different meanings in different parts of the world). Such conflation could reinforce social and historical stereotypes and lead to group stigmatization and discrimination in places where members of the named populations, or of closely related communities, are minorities.

Recommended Descriptors

Recommended language has been developed for naming each population whose samples are included in the NHGRI Repository. Each recommended descriptor reflects the principles discussed above, as well as input from the sample donor communities about how they wished to be described.

The complete recommended language for naming the populations whose samples are included in the NHGRI Repository reflects both the ancestral geography or ethnicity of each population and the geographic location where the samples from that population were collected. Below are the official, approved descriptors for each of the populations whose samples are in the Repository. After the complete descriptor for a population has been given, it is acceptable to use that population's abbreviation (e.g., “YRI,” “JPT,” “CHB,” “CEU”) in the remainder of the article or presentation. Giving the full descriptor before using the abbreviation helps avoid the risks associated with over-generalizing findings.

Population Descriptor	Abbreviation
African Americans living in St. Louis, Missouri	ASL
African Ancestry in Southwest USA	ASW
African Caribbean in Barbados	ACB
Bengali in Bangladesh	BEB
British from England and Scotland	GBR
Chinese Dai in Xishuangbanna	CDX
Chinese in Metropolitan Denver, Colorado, USA	CHD
Colombian in Medellin, Colombia	CLM
Esan in Nigeria	ESN
Finnish in Finland	FIN
Gambian in Western Division – Mandinka	GWD
Gujarati Indians in Houston, Texas, USA	GIH
Han Chinese in Beijing, China	CHB
Han Chinese South, China	CHS
Iberian Populations in Spain	IBS
Indian Telugu in the UK	ITU
Japanese in Tokyo, Japan	JPT
Kinh in Ho Chi Minh City, Vietnam	KHV
Luhya in Webuye, Kenya	LWK
Maasai in Kinyawa, Kenya	MKK
Mende in Sierra Leone	MSL
Mexican Ancestry in Los Angeles, California, USA	MXL
Peruvian in Lima, Peru	PEL
Puerto Rican in Puerto Rico	PUR
Punjabi in Lahore, Pakistan	PJL
Sri Lankan Tamil in the UK	STU
Toscani in Italia	TSI
Yoruba in Ibadan, Nigeria	YRI

The sample sets should not be described as having come from “normal controls.” No phenotypic information was collected with the samples, so we do not know what medical conditions the donors had.

In some cases, in addition to giving the complete descriptor for each population at first mention, it may be appropriate to describe the criteria that were used to assign membership in each population. This information can be found in the Population Descriptions for each specific population.