copyleft() {
cat<<-EOF
bars: print histogram of a column of data
Copyright (C) 2004 Tim Menzies
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation, version 2.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
EOF
}
Before learning, it's good to have a peek at data distributions.
usage() {
cat<<-EOF
Usage: bars [FLAGS]... FILE
Print histogram of a column of data
Flags:
-1 NUM space reserved for the histogram first column
(shows the bin's keys); default: $Q$one0$Q
-2 NUM space reserved for the histogram second column
(shows the bin's counter); default: $Q$two0$Q
-3 NUM space reserved for the histogram's third column
(shows the histogram bars); default: $Q$three0$Q
-c NUM column to process; default: $Q$c0$Q
-d CHAR column delimiters; default:$Q$sep0$Q
-h print this help text
-l copyright notice
-m CHAR mark for drawing histogram bars; default:$Q$mark0$Q
-r NUM rounding factor; default:$Q$round0$Q;
increase to decrease number of bins;
set to zero to disable rounding
-u don't sort output; default:$Q$unsort0$Q
-x run an example
EOF
}
How big are the files in my home directory right now? I could check using freqx ( http://www.cs.pdx.edu/~timm/dm/freqx.html ):
bash-2.05$ ls -sa $HOME | gawk 'NR>1{print $1}' | freqx
36 2
5 4
2 22
2 10
2 0
1 6
1 32
1 24
1 140
1 110
1 1
Can't read that. I want a histogram:
bash-2.05$ ls -sa $HOME | gawk 'NR>1{print $1}' | bars
0| 44| ********************
10| 3| **
20| 3| **
30| 1| *
110| 1| *
140| 1| *
Hmm, want more details on those smaller files. So I'll decrease the rounding factor.
bash-2.05$ ls -sa $HOME | gawk 'NR>1{print $1}' | bars -r4
0| 3| ***
4| 41| ****************************************
8| 1| *
12| 2| **
24| 3| ***
32| 1| *
112| 1| *
140| 1| *
Copy the following files to your own directory: http://www.cs.pdx.edu/~timm/dm/bars and http://www.cs.pdx.edu/~timm/dm/bars.awk.
Make bars executable:
chmod +x bars
Edit your paths (see the section Settings). The first line of bars should point to your local bash shell, and the gawk variable (below) should point to your local version of gawk.
Check that it all works:
bars -x
If the installation worked, you should see a histogram of the sizes of the files in your home directory (the demo runs on $HOME). For me, that looks like:
0| 44| **********************
10| 3| **
20| 3| **
30| 1| *
110| 1| *
140| 1| *
Defaults:
c0="NF" sep0="," mark0="*" round0=10 unsort0=0 one0=4 two0=4 three0=40
Paths:
gawk="/pkgs/gnu/bin/gawk" junk="/tmp/bars$$"
Minor details:
Q="\""
barsDemo() {
ls -as $1 | $gawk 'NR>1 {print $1}' > ${junk}.data
main ${junk}.data
}
main() {
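# FS is passed as an operand because the BEGIN block in bars.awk resets FS,
# which would override a separator given with -F.
$gawk -f bars.awk FS="$sep" Collect="$c" Mark="$mark" NoSort="$unsort" \
Col1="$one" Col2="$two" Col3="$three" \
Round="$round" \
$1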
}
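The call in main relies on awk applying name=value operands after the BEGIN block has run, so values passed on the command line replace the defaults set in BEGIN. A minimal sketch of that ordering, assuming a POSIX awk is on the path as awk (bars itself uses gawk); show_round is a made-up helper, not part of bars:

```shell
# show_round [ASSIGNMENT]: print the value of Round seen by the main rule.
# Hypothetical helper for illustration only.
show_round() {
  # BEGIN sets the default; an operand such as Round=4 is applied after BEGIN
  # and before the first input record is read.
  echo "x" | awk 'BEGIN { Round = 10 } { print Round }' "$@"
}

show_round           # no operand: the BEGIN default is used
show_round Round=4   # operand overrides the BEGIN default
```

The first call prints 10 and the second prints 4.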
BEGIN {
Command-line options
FS=","; #column separator
Round=10; #size of bins
Collect=2; #column to process
Mark="*"; #marks to draw histogram
NoSort=0; #disable sorting of keys
Col1=4; #width of histogram key column
Col2=3; #width of histogram key value column
Col3=20; #width of histogram bar display column
Internal globals
Num; #array where we store the numbers
Inf= 2**32; #the largest number we can process
Max = -1*Inf; #max count seen in any bucket
#initialized to the smallest number
Here; #the key of the current bucket
}
Convert the symbol ``NF'' to the number of the last field
NR==1 {if (Collect=="NF") Collect=NF;}
Collect the number, rounded.
{ if (Round) {
Here=round($Collect/Round)*Round}
else {Here=$Collect};
Num[Here]++;
if (Num[Here]>Max) Max=Num[Here];
}
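The binning rule above can be tried in isolation. This is only a sketch: it assumes a POSIX awk on the path as awk, reads one number per line, and uses a made-up function name (bucket). It is not the bars code itself, and it sorts with sort -n rather than gawk's asort.

```shell
# bucket ROUND: read one number per line on stdin and print "key count"
# for each bin, with keys in ascending numeric order.
# Hypothetical helper for illustration only.
bucket() {
  awk -v Round="$1" '
    function round(x) { return int(x + 0.5) }   # same rounding rule as bars.awk
    {
      if (Round) key = round($1 / Round) * Round   # snap to nearest multiple of Round
      else       key = $1                          # Round=0 disables binning
      Num[key]++
    }
    END { for (k in Num) print k, Num[k] }
  ' | sort -n
}

printf '%s\n' 2 4 6 22 24 | bucket 10
printf '%s\n' 2 4 6 22 24 | bucket 4
```

With a rounding factor of 10 the five inputs fall into bins 0, 10 and 20; with a factor of 4 they fall into bins 4, 8 and 24, which is why a smaller factor shows more detail among small values.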
Report
END {
if (NoSort) {
histogram(Num) }
else {
sortedgram(Num)}
}
Generate histogram bars for the entire histogram
function histogram(a, i) { for(i in a) print bar(a,i) }
Pre-sort the histogram and generate the bars in sorted order
function sortedgram(a, add,i,j,keys,n) {
for(i in a) {
if (Round) {
keys[j++]=i+0} #ensures numeric, not string, sort
else keys[j++]=i}
n=asort(keys);
for(i=1;i<=n;i++)
print bar(a,keys[i]);
}
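The i+0 in sortedgram matters because awk array subscripts are strings, and strings compare character by character. A small sketch of the difference, assuming a POSIX awk on the path as awk; compare_keys is a made-up name used only here:

```shell
# compare_keys: show how awk compares two bin keys as strings versus as numbers.
# Hypothetical helper for illustration only.
compare_keys() {
  awk 'BEGIN {
    a = "110"; b = "20"             # keys as they come back from "for (i in a)"
    print "string:",  (a < b)       # 1: "110" sorts before "20" character by character
    print "numeric:", (a+0 < b+0)   # 0: 110 is not less than 20
  }'
}

compare_keys
```

Without the conversion, a bin keyed 110 would be listed ahead of a bin keyed 20.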
Generate a single histogram bar of Marks, resized according to the column3 width.
function bar(a,i, scale) {
if (Max < Col3) {
scale=1 }
else {scale=Col3/Max};
if (Round) {
return sprintf(" %" Col1".0f|%" Col2 "d| %s",\
i, a[i],string(round(a[i]*scale),Mark))}
else {
return sprintf(" %"Col1"s|%"Col2"d| %s",\
i, a[i],string(round(a[i]*scale),Mark))}
}
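The scaling step in bar can be checked with a few numbers. The sketch below assumes a POSIX awk on the path as awk and uses a made-up function name (bar_len); it reproduces only the length calculation, not the formatting.

```shell
# bar_len COUNT MAX WIDTH: number of marks drawn for a bin holding COUNT items,
# when the fullest bin holds MAX items and the bar column is WIDTH wide.
# Hypothetical helper for illustration only.
bar_len() {
  awk -v n="$1" -v max="$2" -v width="$3" 'BEGIN {
    scale = (max < width) ? 1 : width / max   # shrink only when the fullest bin overflows
    print int(n * scale + 0.5)                # round to the nearest whole mark
  }'
}

bar_len 44 44 40   # fullest bin fills the column
bar_len 3 44 40    # smaller bins shrink in proportion
bar_len 3 10 40    # everything fits, so one mark per item
```

The three calls print 40, 3 and 3.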
Round a number
function round(x) { return int(x+0.5) }
Generate a string n long of characters c.
function string(n,c, s) { while(n--) {s=s c}; return s}
demo=""
while getopts "1:2:3:c:d:hlm:r:ux" flag
do case "$flag" in
1) one=$OPTARG;;
2) two=$OPTARG;;
3) three=$OPTARG;;
c) c=$OPTARG;;
d) sep=$OPTARG;;
h) usage;exit;;
l) copyleft; exit;;
m) mark=$OPTARG;;
r) round=$OPTARG;;
u) unsort=1;;
x) demo="barsDemo $HOME";;
esac
done
shift $(($OPTIND - 1))
one=${one:=$one0}
two=${two:=$two0}
three=${three:=$three0}
c=${c:=$c0}
sep=${sep:=$sep0}
mark=${mark:=$mark0}
round=${round:=$round0}
unsort=${unsort:=$unsort0}
[ -n "$demo" ] && $demo && exit;
main $*
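The block of ${name:=default} lines fills in any option the user did not set. A short sketch of that expansion on its own, with made-up names (with_default, value, fallback) that do not appear in bars:

```shell
# with_default VALUE DEFAULT: echo VALUE, or DEFAULT when VALUE is unset or empty.
# Hypothetical helper for illustration only.
with_default() {
  value=$1
  fallback=$2
  # ${value:=$fallback} assigns $fallback to value only if value is unset or empty
  echo "${value:=$fallback}"
}

with_default "" 10   # nothing supplied, so the default is used
with_default 4 10    # a supplied value wins
with_default 0 10    # "0" is not empty, so it is kept
```

The calls print 10, 4 and 0. The last case is what lets -r 0 reach the awk script and disable rounding.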
Tim Menzies,
tim@menzies.us,
http://menzies.us