[Facts] Re: Best source for US given name popularities? Project help needed
in reply to a message by Tempestgirl
I think you are wrong about the SSA data being based on "1% of total births". The "Background Information" page on their site now says:
All names are from Social Security card applications for births that occurred in the United States after 1879. Names are restricted to cases where the year of birth, sex, State of birth (50 States and District of Columbia) are on record, and where the given name is at least 2 characters long. Many people born before 1937 never applied for a Social Security card, so their names are not included in our data. For others who did apply, our records may not show the place of birth, and again their names are not included in our data.
All data are from a 100% sample of our records on Social Security card applications as of the end of February 2007.
Replies
I had read the same statement you posted, but then came across "Actuarial Note #139, Name Distributions in the Social Security Area, August 1997", found at the bottom of http://www.socialsecurity.gov/OACT/babynames/index.html#forms, which states:
Between 1954 and 1984 the Social Security Administration occasionally published a listing of surnames and their popularity in the Report of Distribution of Surnames in the Social Security Number File. This note expands on that project by presenting the most popular given names.
The source file is a one percent sample of Social Security Number card applications. ...This file is not limited to persons born in the United States but is representative of all Social Security Number card holders.
For purposes of this document, names have not been edited or grouped together according to spelling variations of the same name. People quoting from this document are urged to explicitly acknowledge this qualification.
This was published in the summer of 1998, and I wasn't sure whether it applied only to that year or to the survey as a whole. Having both figures listed is confusing, and I haven't managed to reach anyone about it. Obviously, popularity research is not their first priority, and they say as much, but it would be nice to have a clear answer. How would you read this information? I'm now inclined to believe that the data do use 100% of SS applicants, since the website was updated in May of last year and I'd hope the information there is up to date and correct.
What are your thoughts? Thanks for your assistance,
Tempestgirl
Sorry I didn't see this until today.
I believe that back when the SSA site did not have year-by-year lists before 1990, but only decade-by-decade lists for the 1880s, 1890s, 1900s, 1910s, etc., those lists were based on a 1% sample of all people with birthdates in those decades. But I believe that when the SSA expanded its site to include lists for every individual year from 1880 on, it created those lists from the full data set. I think that if they were still using just a 1% sample, there would have to be more ties in the numbers they report for the names on their year-by-year lists than there actually are. :)
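The tie argument can be illustrated with a rough simulation. This is only a sketch under stated assumptions, not a description of how the SSA actually built its lists: the name counts below are a made-up Zipf-like distribution for one hypothetical year (not real SSA figures), and `one_percent_sample` and `tied_names` are hypothetical helpers written just for this example. Sampling 1% of births squeezes most names into a narrow band of small counts, so many of them end up sharing the same number. Multiplying the sample back up by 100 would not help, because equal counts stay equal after scaling.

```python
import random
from collections import Counter


def tied_names(counts):
    """Number of names whose count is shared with at least one other name."""
    freq = Counter(counts)
    return sum(n for n in freq.values() if n > 1)


def one_percent_sample(counts, rate=0.01, seed=1):
    """Keep each individual birth independently with probability `rate`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(1 for _ in range(c) if rng.random() < rate) for c in counts]


# Hypothetical Zipf-like counts for 500 names in one year (NOT real SSA data).
full_counts = [200_000 // rank for rank in range(1, 501)]
sample_counts = one_percent_sample(full_counts)

print("tied names in full data:", tied_names(full_counts))
print("tied names in 1% sample:", tied_names(sample_counts))
```

With these made-up numbers, the sampled list shows many more tied names than the full list, which is the pattern you would expect to see in the published year-by-year counts if they still came from a 1% sample.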
Thanks!