Which law?

Oct. 9th, 2006 08:24 pm
[personal profile] sawyl
The name of the law I was completely and utterly unable to remember at lunch time was Benford's.

Date: 2006-10-09 07:53 pm (UTC)
From: [identity profile] gleet.livejournal.com
Benford's Law is fun:

bruno paulstevenson $ find . -type f -exec ls -l {} \; | awk '{print $5}' | cut -c1-1 | sort | uniq -c
23674 1
14912 2
10707 3
8644 4
7967 5
7423 6
3456 7
3040 8
6472 9

(don't know what's so special about 9)
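
For comparison with the counts above: Benford's Law predicts that the leading digit d turns up with probability log10(1 + 1/d). A minimal awk sketch that prints those expected percentages follows; the function name benford_expected is made up for illustration and is not part of the thread.

```shell
# Expected leading-digit percentages under Benford's Law:
# P(d) = log10(1 + 1/d), for d = 1..9.
# awk's log() is the natural log, so divide by log(10) to get log10.
benford_expected() {
  awk 'BEGIN {
    for (d = 1; d <= 9; d++)
      printf("%d %5.2f\n", d, 100 * log(1 + 1/d) / log(10))
  }'
}

benford_expected
```

The digit 1 should come out at roughly 30% and 9 at under 5%, which gives a baseline for judging the measured counts.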

Date: 2006-10-10 07:45 pm (UTC)
From: [identity profile] sawyl.livejournal.com
Very cool. Here are my numbers:
  23810 eld004> find . -type f -printf "%s\n" | awk '{f[substr($1,1,1)]++;t++}
  > END{for(i=1;i<11;++i){printf("%d  %5.2f\n",i%10,100*f[i%10]/t)}}'
  1  45.67
  2  27.94
  3   9.99
  4   4.06
  5   3.07
  6   2.37
  7   2.17
  8   2.12
  9   2.19
  0   0.43
  23811 eld004>

What do you reckon? Could this be the birth of a new meme?

Date: 2006-10-11 09:36 pm (UTC)
From: [identity profile] gleet.livejournal.com
ooh, you win with awk cleverness, but it could make a good meme in certain cirles. Unfortunately, I don't think I know anyone else who could do it.

Date: 2006-10-11 10:14 pm (UTC)
From: [identity profile] gleet.livejournal.com
Clearly I meant circles. But that -printf in the find arg list is not known to my Mac or Sun. Is it a common extension?

Date: 2006-10-12 08:15 am (UTC)
From: [identity profile] sawyl.livejournal.com
I think the -printf option is a GNU special. The portable way to do it is probably:

find . -type f | xargs ls -l | awk '{f[substr($5,1,1)]++;t++}
END{for(i=1;i<11;++i){printf("%d %5.2f\n",i%10,100*f[i%10]/t)}}'
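
One caveat on the xargs pipeline: xargs splits its input on whitespace, so file names containing spaces get mangled. A minimal sketch that sidesteps this, assuming a find that supports the POSIX -exec ... {} + form (older systems may not), is below. The function name leading_digits is hypothetical. It also starts substr at index 1, since awk strings are 1-based and implementations differ on how they treat a start index of 0.

```shell
# Tally the leading digit of every regular file's size under a directory.
# Assumes the size is the 5th field of `ls -l`, which holds as long as
# user and group names contain no spaces.
leading_digits() {
  find "${1:-.}" -type f -exec ls -l {} + |
    awk '{ f[substr($5, 1, 1)]++; t++ }
         END {
           # Print digits 1..9, then 0 (empty files), as percentages.
           if (t) for (i = 1; i < 11; i++)
             printf("%d %5.2f\n", i % 10, 100 * f[i % 10] / t)
         }'
}
```

Run it as `leading_digits .` or give it another directory as the first argument.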

Date: 2006-10-12 08:24 am (UTC)
From: [identity profile] gleet.livejournal.com
Or avoiding the empty files, and I guess stat is a little more efficient than ls, though I haven't timed both ways:

find . -type f ! -empty -exec stat -f %z {} \; | awk '{f[substr($1,1,1)]++;t++} END{for(i=1;i<11;++i){printf("%d %5.2f\n",i%10,100*f[i%10]/t)}}'
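
The stat -f %z spelling above is the BSD/macOS one; GNU coreutils stat prints the size with -c %s instead. A rough sketch that probes for the GNU form and falls back to the BSD form is below. The function name leading_digits_stat and the probe itself are assumptions for illustration, not something from the thread. It uses -size +0, which is in POSIX find, in place of ! -empty.

```shell
# Tally leading digits of non-empty file sizes, using whichever stat
# syntax this system accepts.
leading_digits_stat() {
  dir=${1:-.}
  if stat -c %s / >/dev/null 2>&1; then
    set -- stat -c %s   # GNU coreutils syntax
  else
    set -- stat -f %z   # BSD / macOS syntax
  fi
  # -size +0 matches files of more than zero 512-byte blocks (rounded up),
  # which means any non-empty file.
  find "$dir" -type f -size +0 -exec "$@" {} + |
    awk '{ f[substr($1, 1, 1)]++; t++ }
         END {
           if (t) for (i = 1; i <= 9; i++)
             printf("%d %5.2f\n", i, 100 * f[i] / t)
         }'
}
```

Because empty files are skipped, there is no row for 0. The {} + form also launches far fewer stat processes than {} \; does.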

Date: 2006-10-12 08:23 am (UTC)
From: [identity profile] sawyl.livejournal.com
The awk thing is a holdover from the Cray days, when the machines used to have problems (http://www.spikynorman.dsl.pipex.com/CrayWWWStuff/Cfaqp2.html#TOC7) launching new processes. We used to have macho competitions to see who could do the most with the fewest child processes, all of which culminated in my writing a vast, rambling workload manager daemon in a horrific combination of awk and Korn shell. Ouch.
