grouper-dev - RE: [grouper-dev] help with math homework ?
Subject: Grouper Developers Forum
- From: caleb racey <>
- To: "McDermott, Michael" <>, Chris Hyzer <>
- Cc: Tom Zeller <>, Grouper Dev <>
- Subject: RE: [grouper-dev] help with math homework ?
- Date: Fri, 6 Jan 2012 13:46:38 +0000
- Accept-language: en-GB
Years back, when we were setting up our data infrastructure, I put a fair bit of effort into looking for a good torture-test identity data set to test our data feeds, e.g. a data set with names with apostrophes, accents, really long names, names without surnames, hyphenated names, multiple first names, multiple last names, etc. I had hoped that, since pretty much everyone who has processed name data will have been bitten by false assumptions, someone would have grouped together examples of all the edge cases so you could test against them. Unfortunately, all my research ever uncovered were numerous blog posts talking about how pretty much every assumption you make about name data format turns out to be false.

I still think a good torture-test data set would be immensely valuable but have yet to find an example. Generating fake data is a good approach but is unlikely to uncover all the nasty rare examples.

cal

From: [mailto:] On Behalf Of McDermott, Michael

There are libraries in Perl and Ruby at least that do this sort of thing (generate fake data); would they be helpful?

On Thu, Jan 5, 2012 at 11:23 PM, Chris Hyzer <> wrote:

Because the prop100k is not the proportion of the top 1000 names, it is the proportion of all names. So of all names that exist, if you take a sample of 100k, then 880 of them are Smith. If you look at 100k of the top 1000 names, then 2000 of them are Smith. The names you have in the list are only ~40% of all names. So if you take 2000 and multiply by 40% you get ~800... In this case, I think the 2000 is fine (well, actually I think a Cartesian product of 200 first and 500 last names is fine, but if we want to go weighted, then 2000 Smiths works for me :) ).
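The rescaling described in the quoted message can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 880 and ~40% figures are the ones quoted in the thread, while the function names, the "OtherName" entry and its weight are hypothetical placeholders, not part of any Grouper tooling.

```python
import random

# Figures quoted in the thread; illustrative only.
SMITH_PER_100K_ALL = 880   # prop100k for Smith, measured against ALL names
LIST_COVERAGE = 0.40       # the name list covers roughly 40% of all names


def rescale_to_list(per_100k_all, coverage):
    """Convert a per-100k rate over all names into a per-100k rate
    within a list that covers only `coverage` of all names."""
    return per_100k_all / coverage


def weighted_surnames(prop100k, count, seed=0):
    """Draw `count` surnames with probability proportional to prop100k.
    random.choices normalises the weights, so no manual rescaling is needed."""
    rng = random.Random(seed)
    names = list(prop100k)
    weights = [prop100k[name] for name in names]
    return rng.choices(names, weights=weights, k=count)


smith_within_list = rescale_to_list(SMITH_PER_100K_ALL, LIST_COVERAGE)
print(round(smith_within_list))  # 2200, in line with the "~2000" in the thread

# "OtherName" and its weight are made-up placeholders for the rest of the list.
sample = weighted_surnames({"Smith": 880, "OtherName": 1320}, 10)
print(sample)
```

Because weighted sampling only depends on the ratios between weights, the raw prop100k values can be used directly when drawing from the list; the division by coverage only matters when you want to state how many Smiths to expect per 100k generated subjects.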
- [grouper-dev] help with math homework ?, Tom Zeller, 01/05/2012
- RE: [grouper-dev] help with math homework ?, Chris Hyzer, 01/05/2012
- Re: [grouper-dev] help with math homework ?, McDermott, Michael, 01/06/2012
- RE: [grouper-dev] help with math homework ?, caleb racey, 01/06/2012
- Re: [grouper-dev] help with math homework ?, Peter Schober, 01/06/2012
- Re: [grouper-dev] help with math homework ?, Tom Zeller, 01/12/2012
- Re: [grouper-dev] help with math homework ?, GW Brown, Information Systems and Computing, 01/13/2012