AWK library utilities

Data Mining
CS 510 (DM)
Winter, 2004

Why all the scripting?

Motivation

Life is too short to re-code everything all the time.

Here are some useful AWK library functions.




Usage

 gawk -f lib.awk -f yourOtherFiles

Installation

Copy the following file to your own directory: http://www.cs.pdx.edu/~timm/dm/lib.awk



Source code

Useful Globals

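 BEGIN {
   Q = "\"";                     # a double quote, for building quoted strings
   _ = SUBSEP;                   # separator awk puts between the parts of an a[i,j] key
   Inf    = 1.99999*(2**(127));  # very large positive sentinel (not an IEEE infinity)
   NegInf = -1*2**(126);         # very large negative sentinel
   White  = "^[ \t\n\r]*$";      # matches empty or whitespace-only strings
   Number = "^[+-]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][+-]?[0-9]+)?$";  # integers, decimals, exponents
 }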
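A minimal sketch of how these globals might be used. The three globals are copied from above so the example runs by itself; the sample strings and the array name `seen` are made up for illustration.

```awk
BEGIN {
  # Copied from the library globals.
  _ = SUBSEP
  White  = "^[ \t\n\r]*$"
  Number = "^[+-]?([0-9]+[.]?[0-9]*|[.][0-9]+)([eE][+-]?[0-9]+)?$"

  # Classify a few hypothetical strings as number or symbol.
  n = split("42,-3.5e2,.5,abc,1.2.3", sample, ",")
  for (i = 1; i <= n; i++)
    print sample[i], ((sample[i] ~ Number) ? "number" : "symbol")

  # White matches empty or whitespace-only strings.
  print ((" \t" ~ White) ? "blank" : "not blank")     # blank

  # Joining with _ builds the same key as the comma form a[i,j].
  seen["age" _ "young"] = 1
  print ((("age", "young") in seen) ? "found" : "missing")   # found
}
```

The first loop prints "number" for 42, -3.5e2 and .5, and "symbol" for abc and 1.2.3.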

Printing magic

 function barph(str) {print str>"/dev/tty"}
 function die(str)   {barph(str); exit 1}

Array Magic

These arrays store their own size in position 0; that is, a[0] holds the number of items in array a.

 function top(a) {return a[a[0]]}
 function push(a,x,  i) {i=++a[0]; a[i]=x; return i}
 function push2(a,x,y,  i) {i=++a[x,0]; a[x,i]=y; return i} 
 function pop(a,   x,i) {
   i=a[0];
   if (!i) {return ""}   # empty stack: leave a[0] at zero rather than decrementing it
   x=a[i]; delete a[i]; a[0]=i-1; return x}
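A short sketch of the stack functions in use. The functions are repeated so the example runs by itself; the copy of pop used here tests the size before decrementing it, so popping an empty stack leaves a[0] at zero. The array names `stack` and `byColor` are hypothetical.

```awk
function top(a) {return a[a[0]]}
function push(a,x,  i) {i=++a[0]; a[i]=x; return i}
function push2(a,x,y,  i) {i=++a[x,0]; a[x,i]=y; return i}
function pop(a,   x,i) {
  i=a[0]
  if (!i) {return ""}
  x=a[i]; delete a[i]; a[0]=i-1; return x}

BEGIN {
  push(stack, "red"); push(stack, "green"); push(stack, "blue")
  print "size=" stack[0] " top=" top(stack)    # size=3 top=blue
  print pop(stack)                              # blue
  print pop(stack)                              # green
  print "size=" stack[0] " top=" top(stack)    # size=1 top=red

  # push2 keeps one stack per key x inside a single array.
  push2(byColor, "warm", "red"); push2(byColor, "warm", "orange")
  print byColor["warm",0], byColor["warm",2]    # 2 orange
}
```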

Arrays to strings.

 function saya(s,a) {print s; print a2s(a)}
 function a2s(a,  i,str) {
   for(i in a) str= str "[" i "]=[" a[i] "]\n";   # "for (i in a)" visits keys in no fixed order
   return str;
 }

String magic

 function number(x)    { return x ~  Number  }
 function symbol(x)    { return ! number(x)  }
 function blank(x)     { return x~/^[ \t]*$/ }
 function trimLeft(x)  { sub(/^[ \t]*/,"",x); return x}
 function trimRight(x) { sub(/[ \t]*$/,"",x); return x}
 function trim(x)      { return trimRight(trimLeft(x))}
 function str2keys(str,keys,sep,   n,i,tmp) {
   n=split(str,tmp,sep);
   for(i in tmp) keys[tmp[i]];
   keys[0]=n;
 } 
 function pairs2nums(str,pairs,sep,   n,i,tmp) {
   n=split(str,tmp,sep);
   for(i=1;i<=n;i=i+2) {
     pairs[tmp[i]]=tmp[i+1]+0;
     pairs[0]++;
   }
 }
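A sketch of the string helpers on some made-up input. The functions are repeated so the example runs by itself, and the copy of pairs2nums assumes that every assignment goes to the `pairs` argument. The array names `outlook` and `limit` are hypothetical.

```awk
function trimLeft(x)  { sub(/^[ \t]*/,"",x); return x}
function trimRight(x) { sub(/[ \t]*$/,"",x); return x}
function trim(x)      { return trimRight(trimLeft(x))}
function str2keys(str,keys,sep,   n,i,tmp) {
  n=split(str,tmp,sep)
  for(i in tmp) keys[tmp[i]]     # mentioning the element creates the key
  keys[0]=n
}
function pairs2nums(str,pairs,sep,   n,i,tmp) {
  n=split(str,tmp,sep)
  for(i=1;i<=n;i=i+2) {
    pairs[tmp[i]]=tmp[i+1]+0     # "+0" forces the value to a number
    pairs[0]++
  }
}

BEGIN {
  print "[" trim("  padded text \t") "]"         # [padded text]

  str2keys("sunny,overcast,rainy", outlook, ",")
  print outlook[0], ("rainy" in outlook)          # 3 1

  pairs2nums("low,10,high,90", limit, ",")
  print limit[0], limit["low"], limit["high"]     # 2 10 90
}
```

Note that str2keys stores the number of fields in keys[0], while pairs2nums counts pairs in pairs[0].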

Match magic

 function lhs(s) {return substr(s, 1, RSTART-1)}
 function mid(s,x,y) {return substr(s, RSTART + x, RLENGTH - x - y)}
 function rhs(s) {return substr(s, RSTART+RLENGTH)}
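These three read RSTART and RLENGTH, so they only make sense straight after a successful call to match(). A sketch, with the functions repeated so it runs by itself and a hypothetical input line:

```awk
function lhs(s) {return substr(s, 1, RSTART-1)}
function mid(s,x,y) {return substr(s, RSTART + x, RLENGTH - x - y)}
function rhs(s) {return substr(s, RSTART+RLENGTH)}

BEGIN {
  line = "name = \"tim\" # comment"
  if (match(line, /"[^"]*"/)) {       # sets RSTART=8, RLENGTH=5
    print "[" lhs(line) "]"           # [name = ]      text before the match
    print "[" mid(line, 0, 0) "]"     # ["tim"]        the whole match
    print "[" mid(line, 1, 1) "]"     # [tim]          match minus 1 char at each end
    print "[" rhs(line) "]"           # [ # comment]   text after the match
  }
}
```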

Number magic

 function odd(x)     {return x % 2}
 function even(x) {return ! odd(x)}
 function most(x,y)  { if (x>y) {return x} else {return y}}
 function least(x,y) { if (x<y) {return x} else {return y}}
 function round(x)   { if (x<0) {return int(x-0.5)} else {return int(x+0.5)}}
 function between(min,max) { 
        if (max==min) {return min} 
        else {return min+ ((max-min)*rand())}}
 function mean(sumX,n) {return sumX/n}
 function sd(sumSq,sumX,n) {return sqrt((sumSq-((sumX*sumX)/n))/(n-1))}
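Note that mean and sd work from running totals (the sum of the values and the sum of their squares), so a script can accumulate them in one pass; sd computes the sample standard deviation, sqrt((sumSq - sumX*sumX/n)/(n-1)), and needs n > 1. A sketch with a hypothetical sample, the functions repeated so it runs by itself:

```awk
function round(x)   { if (x<0) {return int(x-0.5)} else {return int(x+0.5)}}
function between(min,max) {
  if (max==min) {return min}
  else {return min+ ((max-min)*rand())}}
function mean(sumX,n) {return sumX/n}
function sd(sumSq,sumX,n) {return sqrt((sumSq-((sumX*sumX)/n))/(n-1))}

BEGIN {
  n = split("2 4 4 4 5 5 7 9", x, " ")
  for (i = 1; i <= n; i++) { sumX += x[i]; sumSq += x[i]*x[i] }
  print mean(sumX, n)                          # 5
  printf "%.3f\n", sd(sumSq, sumX, n)          # 2.138

  print round(2.5), round(-2.5), round(2.4)    # 3 -3 2

  r = between(10, 20)                          # uniform in [10,20)
  print (r >= 10 && r < 20)                    # 1
}
```

Since between calls rand(), call srand() first if each run should produce different values.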

File exists

 function exists(file,        dummy, ret) {
   ret=0;
   if ( (getline dummy < file) >=0 ) {ret = 1; close(file)};
   return ret;
 }
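The test relies on getline returning -1 when a file cannot be opened, and 0 or 1 otherwise, so an empty file still counts as existing. A sketch, with the function repeated so it runs by itself; the scratch file name is hypothetical and a Unix-like system with a writable /tmp is assumed.

```awk
function exists(file,        dummy, ret) {
  ret=0
  if ( (getline dummy < file) >=0 ) {ret = 1; close(file)}
  return ret
}

BEGIN {
  scratch = "/tmp/lib-awk-exists-demo.tmp"
  print "hello" > scratch; close(scratch)   # create the file
  print exists(scratch)                     # 1
  print exists("no/such/dir/file")          # 0
  system("rm -f " scratch)                  # tidy up
}
```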

Symbol table magic

The old symbol table trick. If the counter on this column is one...

 function novel(x,val,counter) {return ++counter[x,val]== 1}

... then this is the first time we have seen this symbol. So we have something new to remember.

 function remember(x,val,symbols) {symbols[x,++symbols[x,0]]=val}
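A sketch of the two working together, assuming novel increments the counter array passed as its third argument. The functions are repeated so the example runs by itself; the column name "outlook" and the arrays `count` and `names` are hypothetical.

```awk
function novel(x,val,counter) {return ++counter[x,val]== 1}
function remember(x,val,symbols) {symbols[x,++symbols[x,0]]=val}

BEGIN {
  # A made-up stream of values seen in one column.
  n = split("sunny rainy sunny overcast rainy", v, " ")
  for (i = 1; i <= n; i++)
    if (novel("outlook", v[i], count)) remember("outlook", v[i], names)

  print names["outlook", 0]        # 3 distinct symbols
  for (i = 1; i <= names["outlook", 0]; i++)
    print i, names["outlook", i], count["outlook", names["outlook", i]]
}
```

The loop prints each symbol in order of first appearance with its frequency: "1 sunny 2", "2 rainy 2", "3 overcast 1".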



Credits

Author

Tim Menzies, tim@menzies.us, http://menzies.us

Software

This page generated by Site: see http://www.cs.pdx.edu/~timm/dm/site.html



Legal

Copyright

Copyright (C) Tim Menzies 2004

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2; see http://www.gnu.org/copyleft/gpl.html. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Disclaimer

The content from or through this web page is provided 'as is' and the author makes no warranties or representations regarding the accuracy or completeness of the information. Your use of this web page and information is at your own risk. You assume full responsibility and risk of loss resulting from the use of this web page or information. If your use of materials from this page results in the need for servicing, repair or correction of equipment, you assume any costs thereof. Follow all external links at your own risk and liability.
