grouper-dev - [grouper-dev] help with math homework ?
Subject: Grouper Developers Forum
- From: Tom Zeller <>
- To: Grouper Dev <>
- Subject: [grouper-dev] help with math homework ?
- Date: Thu, 5 Jan 2012 21:18:25 -0600
So I have an action item to grab 100,000 names from the Internet and
provision them to LDAP for grouperdemo.
Fortunately, Google clued me in that the census provides a
spreadsheet of the 1,000 most common surnames and their
proportion per 100,000 names. It looks like:
# From http://www.census.gov/genealogy/www/data/2000surnames
#
name,rank,count,prop100k,cum_prop100k,pctwhite,pctblack,pctapi,pctaian,pct2prace,pcthispanic
SMITH,1,2376206,880.85,880.85,73.35,22.22,0.4,0.85,1.63,1.56
...
I do not know what "rank" is, but the technical documentation from the
census data website may be of interest to identity matchers needing to
clean up name data. They had to clean up OCR.
Anyway, my issue is that the weighted random collection I googled for
returns approximately 2,000 SMITHs every time I generate 100,000
random surnames, when it should return about 881 (prop100k for SMITH
is 880.85 in the census data).
The weighted random collection is from
http://stackoverflow.com/questions/6409652/random-weighted-selection-java-framework
which lists the question as
"closed as not a real question"
which is perhaps why I get 2,000 SMITHs instead of about 881.
import java.util.NavigableMap;
import java.util.Random;
import java.util.TreeMap;

public class RandomCollection<E> {
    // Maps each cumulative weight to the entry whose range ends at that weight.
    private final NavigableMap<Double, E> map = new TreeMap<Double, E>();
    private final Random random;
    private double total = 0;

    public RandomCollection() {
        this(new Random());
    }

    public RandomCollection(Random random) {
        this.random = random;
    }

    public void add(double weight, E result) {
        if (weight <= 0) return; // ignore non-positive weights
        total += weight;
        map.put(total, result);
    }

    // Picks an entry with probability weight / total.
    public E next() {
        double value = random.nextDouble() * total;
        return map.ceilingEntry(value).getValue();
    }
}
I populated it from the census data:
RandomCollection<String> surnames ...
surnames.add(Double.parseDouble(prop100k)/100000.0, name);
where prop100k and name match the columns of the census data.
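One possible reading of these numbers, offered as a sketch rather than a confirmed diagnosis: RandomCollection.next() picks each entry with probability weight / total, so dividing every weight by 100000.0 does not change the outcome, and SMITH only comes out at about 881 per 100,000 draws if the loaded prop100k values add up to 100,000. A file that holds only the 1,000 most common surnames would presumably add up to less than that. The class name WeightNormalization, its helper names, and the 40000.0 total below are illustrative assumptions, not values taken from the census file.

```java
class WeightNormalization {

    /** Expected draws of one entry when its selection probability is weight / totalWeight. */
    static double expectedCount(double weight, double totalWeight, int draws) {
        return draws * (weight / totalWeight);
    }

    /** Weight for a placeholder entry so that all weights together sum to 100,000. */
    static double remainderPer100k(double listedTotal) {
        return 100000.0 - listedTotal;
    }

    public static void main(String[] args) {
        double smith = 880.85;        // prop100k for SMITH, from the census row quoted above
        double listedTotal = 40000.0; // ASSUMPTION: made-up sum of prop100k over the listed names
        int draws = 100000;

        // Only the listed names are loaded: SMITH is over-represented.
        System.out.println(expectedCount(smith, listedTotal, draws));

        // A placeholder entry absorbs the unlisted names: SMITH returns to its prop100k.
        double total = listedTotal + remainderPer100k(listedTotal);
        System.out.println(expectedCount(smith, total, draws));
    }
}
```

Summing the prop100k column of the actual file would show whether this applies. If it does, either add a placeholder entry with the remaining weight and redraw whenever it comes up, or accept that the sample is proportional among the listed surnames only.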
Any ideas ?
Thanks,
TomZ
P.S. I thought this would help with real-time provisioning testing as well.
- [grouper-dev] help with math homework ?, Tom Zeller, 01/05/2012
- RE: [grouper-dev] help with math homework ?, Chris Hyzer, 01/05/2012
- Re: [grouper-dev] help with math homework ?, McDermott, Michael, 01/06/2012
- RE: [grouper-dev] help with math homework ?, caleb racey, 01/06/2012
- Re: [grouper-dev] help with math homework ?, Peter Schober, 01/06/2012
- Re: [grouper-dev] help with math homework ?, Tom Zeller, 01/12/2012
- Re: [grouper-dev] help with math homework ?, GW Brown, Information Systems and Computing, 01/13/2012