Assumption Management

Data Mining
CS 510 (DM)
Winter,2004
home | news | site map
review | project | subject | group
weka | mining | gawk | bash
modeling | reference | pods
Display: big | small

Why all the scripting?
 copyleft() {
        cat<<-EOF
        assume: guess variables from known legal ranges
        Copyright (C) 2004 Tim Menzies
        This program is free software; you can redistribute it and/or
        modify it under the terms of the GNU General Public License
        as published by the Free Software Foundation, version 2.
        This program is distributed in the hope that it will be useful,
        but WITHOUT ANY WARRANTY; without even the implied warranty of
        MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
        GNU General Public License for more details.
        You should have received a copy of the GNU General Public License
        along with this program; if not, write to the Free Software
        Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
        EOF
  }

Motivation

This is a demonstrator of simple assumption management.

For years, I've been building abductive inference engines that found all consistent subsets from a theory that lead to desired goals [Me96a]. Aduction is nice since it generates multiple options and lets you reason about them explictedly. Which is good for things like exploring requirements, model-based diagnosis, and much more besides.

Sad to say, all those adbuctive devices ran in time exponential on model size. So I started doing experiments with stochastic abduction. That is, generate one randomly selected world, then reset the assumptions, then build another, etc. etc. A repeated observation was that a generating a few randomly selected worlds reached nearly as many goals as found in any of the possible worlds.

So now, when exploring a space of options, I just randomly pick variables from known ranges, then give the results to a machine learner to find the key variables that distinguish between worlds.

This tool is the engine room of random variable selection stuff and works on variables with known types (list of legal values).

[TOP]


Usage

 usage() {
        cat<<-EOF
        Usage: assume  [FLAG]... FILE
        assume: simple assumption management
        Flags: 
          -b        brave mode- ignore certain runtime checks
          -h        print this help text
          -l        copyright notice
          -r FILE   read restraints from FILE; default=$Q$restraints0$Q
          -t FILE   read standard types from FILE;
                    default=$Q$types00$Q
          -x        run an example
        EOF
 }

Concepts

Types are defined using the syntax

 %RID typeLabel = RANGE

where RID is one of REAL, INTEGER, or DISCRETE. RANGE contains min to max values (for numbers) or a list of legal values (for discretes)

Variables are defined using the syntax

 %DEF varLabel : typeLabel

This lets multiiple variables share the same type. For variables that have a unique type, there is another syntax:

 %DEF varLabel = RID RANGE

If something is defined twice, then the second definition overrides the first. This is very handy when experimenting with control settings to a model. Some main types file can store all the usual types and some smaller restraints file can offer restrictions on the usual ranges. This smaller restraints file represents some control constraints that might be learnt by (e.g.) a machine learner.

Examples

A sample types file:

 #Real types are selected randomly from min to max
 %REAL     posint    =   0 Inf
 #Integer types are selected randomly from max to min, 
 #then rounded to ints.
 %INTEGER  percent   =   0 100
 #Discrete types are selected randomly from a
 #list of symbols
 %DISCRETE vlvh      =   n l h vl vh
 #Variables of a certain type
 %DEF scedPercent    : percent
 %DEF revl           : posint
 %DEF wheels         : vlvh
 %DEF acap           : vlvh
 #Variables with their own special type.
 %DEF loves          = REAL 0 10
 %DEF friends        = INTEGER 20 100

A sample restraints file:

 %DEF acap           = DISCRETE h vh

Installation

First, if you have installed anything from this site before, save your config file to somewhere safe.

Second, copy the following files to your directory (from either ~timm/public_html/dm or http://www.cs.pdx.edu/~timm/dm or from http://www.cs.pdx.edu/~timm/dm/assume.zip): config, assume, types0.dat, restraints0.dat, assume.awk, and assumeeg.awk.

Third, make assume executable:

 chmod +x assume

Fourth, compare your safe version of config with the new version you just copied and fix up any paths.

Five, edit this file and config. The first line of this file should point to your local bash shell. and you'll need to check at least the #paths section in config.

Check that all it works:

 assume

If the installation worked, then you should see a table where some variable is accessed N times, but only changes values after we forget the cached values. For example:

 scedPercent |       revl | wheels | acap | loves | friends
 ---------------------------------------------------------- <= forget
          26 |  412698134 |      h |   vh |  2.94 |  35
          26 |  412698134 |      h |   vh |  2.94 |  35
          26 |  412698134 |      h |   vh |  2.94 |  35
 ---------------------------------------------------------- <= forget
          60 | 1806293207 |      l |   vh |  5.59 |  27
          60 | 1806293207 |      l |   vh |  5.59 |  27
          60 | 1806293207 |      l |   vh |  5.59 |  27
 ---------------------------------------------------------- <= forget
          40 |  905520028 |      l |    h |  4.09 |  68
          40 |  905520028 |      l |    h |  4.09 |  68
          40 |  905520028 |      l |    h |  4.09 |  68
 ---------------------------------------------------------- <= forget
          90 | 3365035684 |     vh |    h |  5.65 |  53
          90 | 3365035684 |     vh |    h |  5.65 |  53
          90 | 3365035684 |     vh |    h |  5.65 |  53
 ---------------------------------------------------------- <= forget
          80 | 3100432077 |     vh |    h |  0.02 |  96
          80 | 3100432077 |     vh |    h |  0.02 |  96
          80 | 3100432077 |     vh |    h |  0.02 |  96

[TOP]


Source code

Settings

Defaults:

 nervous0="1"
 types0="types0.dat"
 restraints0="restraints0.dat"

Paths:

 . config

Minor details:

 Q="\""

Demo code

 assumeDemo() { 
        $gawk -f lib.awk -f assume.awk -f assumeeg.awk \
                 Nervous=$nervous $types $restraints
 }

assume.awk: The Worker

 BEGIN {

Command line variables

   Nervous=1;           #enables certain checks. disable to make run faster

Internal variables (not for command-line)

   _Ok           =1000  #magic flags defining status when processing types
   _NotKnown     =1001; 
   _BadRange     =1002;
   _Contradiction=1003;
   _Real         =1004; #magic flags defining each type
   _Integer      =1005;
   _Discrete     =1006;
   ASSUMED;             #place to store our assumptions

Start-up actions

   srand();
 }

Declare that a variable has its own special type

 /^[ \t]*%%/ {next}
 /^[ \t]*%DEF/ && $3=="=" && $4=="DISCRETE" {defineSpecial($2,_Discrete); next}
 /^[ \t]*%DEF/ && $3=="=" && $4=="INTEGER"  {defineSpecial($2,_Integer); next}
 /^[ \t]*%DEF/ && $3=="=" && $4=="REAL"     {defineSpecial($2,_Real); next}
 function defineSpecial(label,what,  temp) {
   temp=specialType(label);
   if (what ==  _Discrete) { 
     makeDiscrete(temp,1) }
   else {makeNum(temp,what,inf($5), inf($6))};
   define($2,temp);
 }
 function specialType(stem) { return stem "_" 1000000*rand() }

Define numeric types

 /^[ \t]*%REAL/    {makeNum($2, _Real,    inf($4), inf($5))}
 /^[ \t]*%INTEGER/ {makeNum($2, _Integer, inf($4), inf($5))}  
 function inf(n,  tmp) {
   tmp=n;
   if (n=="Inf")  
     tmp=Inf;
   else 
     if (n=="-Inf") 
       tmp= -1*Inf; 
   return tmp; 
 }
 function makeNum(label,type,min,max) {
   Type[label]= type;
   Type[label,"min"]=min;
   Type[label,"max"]=max;
 }

Define discrete types

 /^[ \t]*%DISCRETE/ {makeDiscrete($2)}
 function makeDiscrete(label,offset,   i) {
  unMakeDiscrete(label);
  Type[label]= _Discrete;
  for(i=4+offset;i<=NF;i++)  {
    pos=++Type[label,0];
    Type[label,pos]=$i; }
 }
 function unMakeDiscrete(label,  i) {
   delete Type[label]; 
   for(i=1;i<=Type[label,0];i++) 
     delete Type[label,i];
   delete Typle[label,0];
 }

Declare that a variable comes from a known type

 /^[ \t]*%DEF/ && $3=":" {define($2,$4)}
 function define(label,type) {
   if (Nervous && !knownType(type)) return die("unknown type " type);
   Range[label]=type;
 }

Checking that a type or range is known

 function knownType(x)   {return x in Type }
 function knownVar(x)     {return x in Range }

Checking the type of each variable

 function integer(x)      {return Type[Range[x]]== _Integer  } 
 function real(x)         {return Type[Range[x]]== _Real  } 
 function discrete(x)     {return Type[Range[x]]== _Discrete }
 function numeric(x)      {return integer(x) || real(x)  }

For numeric variables, we can find their min and max.

 function leastRange(x)        {return Type[Range[x],"min"]}
 function mostRange(x)         {return Type[Range[x],"max"]}

Discrete variables have cardinality equal to their number of distinct values.

 function cardinality(x)  {return Type[Range[x],0]}

Accessing the nth value of discrete variables

 function nthValue(x,n)   {return Type[Range[x],n]}

Show everything we know about a variable

 function describe(x, c,i,r,str,t) {
   if (! knownVar(x)) 
     return barph("unknown var " x);
   r=Range[x];
   if (! knownType(r)) 
     return barph("unknown type " r);
   t=Type[r];
   str= "[" x "] of [" r "]";
   if (t == _Discrete) {
     str= str  " is [_Discrete]";
     c=cardinality(x);
     str= str " has [" c "] values:\n\t'" nthValue(x,1);
     for(i=2;i<=c;i++) 
       str= str "' and '" nthValue(x,i);
     str=str "'";
   }
   else {
     if (t == _Integer)  
       str= str  " is [_Integer]";
     if (t == _Real)     
       str= str  " is [_Real]";
     str=str " in [" leastRange(x) ".." mostRange(x) "]"}; 
   print str;
 }

Checking if some values satisfies the type rules

 function legal(x,y,    i,j) {
  if (numeric(x)) {
    return y >= leastRange(x) && y <= mostRange(x)}
  else {
    i=cardinality(x)+1;
    while(--i) if (nthValue(x,i)==y) return 1;
    return 0;}
 }

Assuming variable values

Return a value that is picked randomly from that variable's type. As a side-effect, if this variable has no current value, cached the picked value.

 function assume(x) {
   if (Nervous && !knownVar(x)) return die("unknown variable " x);
   if (x in Assumed) {
     return Assumed[x]}
   else {return Assumed[x]=anyLegalValue(x)}
 }

Return a randomly selected legal value for variable x.

 function anyLegalValue(x,n)    {
   if (numeric(x)) {
     n=between(leastRange(x),mostRange(x));
     #bug here
     #if (integer(x))  n=round(n);
      }
   else {
     n=nthValue(x,round(between(1,cardinality(x))));
   };
   return n;
 }

Forget

Forget all cached values.

 function forget() { delete Assumed }

Assert variable values

Try to assert that variable x has value y. This is useful for pre-asserting some intiial settings prior to exploring other assumptions.

 function assert(x,y) {
   if (Nervous) {
     if (!knownVar(x)) return _NotKnown;
     if (!legal(x,y))  return _BadRange;
     if (x in Assumed && Assumed[x] != y) return _Contradiction};
   Assumed[x]=y;
   return _Ok;
 }

assumeeg.awk: Demo code

 END {
  j=5; 
  printf("%12s | %10s | %6s | %4s | %5s | %3s\n",\
         " scedPercent", "revl", "wheels", \
         "acap", "loves", "friends"); 
 while(j--) {
   print " -------------------"\
         "-------------------"\
         "-------------------- <= forget";
   forget(); 
   i=3
   while(i--) 
     printf("%12s | %10.0f | %6s | %4s | %5.2f | %3s\n",\
            assume("scedPercent"), assume("revl"), assume("wheels"),\
            assume("acap"), assume("loves"), assume("friends"));}
}

Command line processing

 while getopts "bhlp:s:x" flag
 do case "$flag" in
        b)  nervous=0;;
        h)  usage; exit ;;
        l)  copyleft; exit;;
        p)  project=$OPTARG;;
        s)  standard=$OPTARG;;
    esac
 done
 shift $(($OPTIND - 1))
 nervous=${nervous:=$nervous0}
 types=${types:=$types0}  
 restraints=${restraints:=$restraints0}
 assumeDemo

References

Me96a
Applications of Abduction: Knowledge Level Modeling by T.J. Menzies (1996) International Journal of Human Computer Studies, Vol. 45, pages 305-355, http://menzies.us/pdf/96abkl.pdf.

[TOP]


Credits

Author

Tim Menzies , tim@menzies.us, http://menzies.us

Software

This page generated by Site: see http://www.cs.pdx.edu/~timm/dm/site.html

Acknowledgements

This site is built using PerlPod.

Style sheet switching method taken from Eddie Traversa's excellent and simple-to-apply tutorial: http://dhtmlnirvana.com/content/styleswitch/styleswitch1.html.

Search engine powered by ATOMZ http://www.atomz.com/search/. Note, the indexes to this site are only updated weekly (heh, its a free service- what more ja want?).

Icons on this site come from http://www.sql-news.de/rubriken/olap.asp and http://www.ifnet.it/webif/centrodi/eng/toolbar.htm.

The JAVA machine learners used at this site come from the extensive data mining libraries found in the University of Waikato's Environment for Knowledge Analysis (the WEKA) http://www.cs.waikato.ac.nz/ml/weka/

[TOP]


Legal

Copyright

Copyright (C) Tim Menzies 2004

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2; see http://www.gnu.org/copyleft/gpl.html. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Disclaimer

The content from or through this web page are provided 'as is' and the author makes no warranties or representations regarding the accuracy or completeness of the information. Your use of this web page and information is at your own risk. You assume full responsibility and risk of loss resulting from the use of this web page or information. If your use of materials from this page results in the need for servicing, repair or correction of equipment, you assume any costs thereof. Follow all external links at your own risk and liability.

[TOP]