It is available by anonymous-ftp from: ftp.ee.pdx.edu in /pub/users/cat/rootd
The best way to view the FAQ is via the www, from http://www.ee.pdx.edu/~rootd/gene-linkage.html
I also send the FAQ to news.answers, and to Dave Kristofferson, so it should be included in the "standard" FAQ archives. Of course, I won't be able to test that till after this goes out :-(
I am Darrell Root, and I'm editing this in my own time. Unfortunately, I don't have all that much free time, so this FAQ is sorta haphazard and has some obvious holes (for example, some of the "software packages for linkage analysis" answers point out ftp sites which are not included in the "ftp site list"). In addition, I haven't double-checked much of the information which I received from people (and I may have made a typo or two), so if something appears incorrect, you're probably right.
Many thanks to everyone who sent me tons of information after the FAQ revision 1. Unfortunately, that's when things started to get "busy" and I'm just now doing the update (SIX MONTHS LATER). In addition, I moved the FAQ www site from http://www.ee.pdx.edu/rootd/gene-linkage.html to http://www.ee.pdx.edu/~rootd/gene-linkage.html. Sorry about that.
Tim Trautmann (timt@ee.pdx.edu) adapted the FAQ for www/Mosaic use (before I learned html). He's responsible for all the wonderful hypertext/ftp links. Great work Tim! (I'm afraid my hurried edits to get this revision out have not been perfect, and the FAQ's formatting is a little messed up--this is entirely my fault due to my haste: timt's formatting was perfect...)
This FAQ is not perfect, in fact, it's not even pretty. During my 18 months doing linkage analysis work, I searched the net trying to find stuff, and used up a bunch of time. This FAQ is sufficiently disorganized that it may take you half-a-day to sort through it, but I hope that will save you some time.
On a personal note, I'm continuing my career as a system administrator, and am no longer doing genetic linkage analysis. If I have time, I'll incorporate corrections/additions that people email me (rootd@ee.pdx.edu), but I'm not actively searching/editing the faq. In addition, someone who is doing linkage analysis would almost certainly do a better job (assuming they have the time :-). For this reason, I'm placing this FAQ in the public domain so anyone who wants to take over editing it can do so without restriction. If you have the time, and want to be a FAQ maintainer, send me some email.
My eternal thanks to those who sent me information. My repeated apologies for not updating the FAQ for six months.
Think back to the old times. What do you understand now, that you didn't understand then? What lack of knowledge caused you to waste the most time? What information would have helped you become productive more quickly? Share your hard-earned lessons with others!
There are a couple areas where I'd like to specifically request assistance:
Conference schedules/information (too volatile for a FAQ, let the journals handle it...but there's a nice gopher site in our gopher section :-)
Oops. My mistake. I tried to keep a list of everyone and their contribution, but didn't completely succeed (translation: I failed). My apologies. Send me email and I will make appropriate corrections...
There are many more sites with useful stuff. Email information to rootd@ohsu.edu and I will add them to this list.
There is a database program called archie, which maintains a list of all files in registered anonymous-ftp sites. You can telnet to an archie server, and have it search the database. Each site is updated every 30 days, so very recently posted programs might not be listed yet.
To use archie, you need to telnet to one of the archie server sites, which are:
(thanks to O'Reilly's Internet book for this list)
Use the login name "archie" and nothing as your password. Here is a simple archie login and search:
bigbox% telnet archie.unl.edu
login: archie
password:                <-- just hit return, not like anonymous-ftp
unl-archie> find linkmap
# Search type: sub.
# Your queue position: 2
# Estimated time for completion: 00:24
working...
Host gatekeeper.dec.com   (16.1.0.2)
Last updated 21:04  9 Apr 1994
    Location: /contrib/src/pa/m3-2.07/src/driver/boot-DS3100
      FILE -rw-r--r--   4000 bytes 23:00  2 Jun 1992 M3LinkMap_i.c
      FILE -rw-r--r--  14027 bytes 23:00  2 Jun 1992 M3LinkMap_m.c
    Location: /contrib/src/pa/m3-2.07/src/driver/linker/src
      FILE -rw-r--r--   1307 bytes 00:00  4 Dec 1991 M3LinkMap.i3
      FILE -rw-r--r--   3078 bytes 00:00  4 Dec 1991 M3LinkMap.m3
unl-archie>

Unfortunately, these linkmap programs have nothing to do with Lathrop and Ott's LINKAGE package. Most gene-linkage programs are not on archie-registered ftp sites.
Send email to archie-admin@bunyip.com with the domain-name of the ftp site and the email address of the administrator. If you are the administrator of the ftp-site, identify yourself as such.
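For example, such a registration note might look like this (the site name and admin address below are hypothetical placeholders, not a real site):

    To: archie-admin@bunyip.com
    Subject: please register our anonymous-ftp site

    Please add our anonymous-ftp site to the archie database.
    Site:          ftp.example.edu
    Administrator: ftpadmin@example.edu
    I am the administrator of this site.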
--> 5. Computational Molecular Biology- programs, documents, help/
--> 14. Upcoming-Conferences/
Jurg Ott's "Analysis of Human Genetic Linkage" is THE work in this area. It is available from Johns Hopkins University Press ($47.50).
J.D. Terwilliger & J. Ott, "Handbook of Human Genetic Linkage," Johns Hopkins University Press, 1994, $60. It grew out of the handouts for the linkage courses and provides detailed instructions on how to use the LINKAGE (and some other programs) on a PC.
Guide to Human Genome Computing, edited by Martin J. Bishop, and published by Academic Press (1994). It is very internet-oriented. The first chapter talks about ftp sites, etc. and Chapter 3 is dedicated to linkage analysis.($40)
E.A. Thompson: "Pedigree Analysis in Human Genetics", Johns Hopkins University Press, Baltimore and London, 1986 ($35).
K.E. Davies (editor): "Human Genetic Diseases - A Practical Approach". IRL Press, Oxford England and Washington, D.C., 1986 ($25, softbound; $40, hardbound).
Muin J Khoury, Terri H Beaty, Bernice H Cohen: "Fundamentals of Genetic Epidemiology." Oxford University Press, 1993. Monographs in Epidemiology and Biostatistics, Volume 19. "A good introductory book with 339 pages (note: several mistakes)."
Please send me other suggestions.
Medline is a database for searching for articles in journals. If your site is a member of NorthWestNet, you can get to Medline using telnet. Just telnet to uwin.u.washington.edu and go into the library databases. It can even email you the output if you wish! Many libraries and many internet service providers have Medline services online. Some interfaces are better than others (we don't even bother using the one at OHSU--it's too painful...) Your local library can probably supply you with information.
[cgochiku;2Aug94] posted this:
For those of you out there with Macs who use MEDLINE and would like a way to put those text files of downloaded references into a database, check out medline-hc.sit in the Stanford archives. It is a hypercard stack I wrote that allows fast importing of references, including the abstracts. The file is at sumex-aim.stanford.edu /info-mac/sci/medline-hc
Victor McKusick wrote a book: Mendelian Inheritance in Man. It is continuously updated online at Johns-Hopkins University (making it online-MIM or OMIM). Combined with the Genome-Data-Base, it is available via ftp at ftp.gdb.org. You need to get an account; send email to help@gdb.org for information. After you get an account, the telnet address is gdb.org. The GDB www address is gdbwww.gdb.org, which has a useful but restricted version of GDB available.
Here's an old workshop announcement that might be useful:
WWW stands for world-wide-web. People set up www servers (similar to anonymous ftp servers) that you can browse through. The webspinners (people who set up web sites) include "links" to other related sites. All you have to do is click a mouse-button on the link, and you will immediately go to the other site. The CEPH www site, for example, has a link to the genethon www site. This makes it very easy for you to get related information. My favorite www site has the before-repair and after-repair Hubble telescope pictures side-by-side.
Written by NCSA (the National Center for Supercomputing Applications), this program lets you look through www sites. It can spawn viewers to look at graphical data, output sound data on your computer's speaker (if your computer has a speaker), save your "favorite" www sites between sessions, and access automated www-search-engines (which search the www for you--similar to archie).
Lynx is another world-wide-web browser (like Mosaic). Lynx, however, uses a text interface, so you don't need a fancy x-window (or MacMosaic or WinMosaic) to browse the web (this is great when I dial in from home, and only have an ascii terminal). You can find lynx (and several other web-browsers) at ftp.isri.unlv.edu in /pub/mirror/infosystems/WWW/clients. Of course, you could always use archie to find other ftp sites with lynx.
Surprise, surprise: not everyone has a workstation or X Windows, and many scientists only have simple vt100 emulation on their desktop machine. They read about www, gopher, archie, etc., but due to hardware or software limitations they cannot get at any of the goodies on the net. There are not many "public access" sites that allow you to open up a telnet session and then choose from the most popular services on the net today.
Well, for anyone who can open up a telnet session, you can now play with the big boys even though your equipment is from the last decade.
Give the command:
telnet info.funet.fi
and you will be presented with the following menu.
The Cooperative Human Linkage Center (CHLC) can be contacted at:
Among other things, CHLC provides primer selection and linkage analysis via email. Information on those services can be found by sending email to:
According to Bob Stodola at CHLC:
"Currently, our email server is fairly crude -- it does crimap two-point analysis and maps the data with respect to the CHLC markers. Our plan is to include a substantially enhanced version which replicates what CHLC is using in terms of data diagnostics and mapping information."
And another center:
I am David Featherston, from the Dutch EMBnet Node, where we are starting a linkage analysis service: software availability and support/advice (at first), and (if I ever get my Drosophila/F2 geneticist's head wrapped around pedigrees and maximum likelihood) training and perhaps consultancy. At present, we have MapMaker/EXP 3.0b, MapMaker/QTL 1.1, Lathrop and Lalouel's Linkage programmes and Schaffer et al's Fastlink versions of ILINK, LINKMAP, LODSCORE and MLINK on offer. "On offer" means that if a user has a Genomics Package account at the CAOS/CAMM Center, they can use these programmes on our fast computers to analyse their data sets. Anyone is welcome to contact me for information about what else is included in a Genomics Package account, and for the details about opening one.
Their ftp site is camms1.caos.kun.nl, and the email contact is davidf@caos.kun.nl
Please send comments on database programs you use!
An upgrade from a previous version is $10 (current version = 4.4). Documentation costs $10 (get it). The full package including documentation costs $45. The best thing about (mac) peddraw is that the text file formats are included in the documentation. I have a sed-awk-sh script which converts linkage format to peddraw format, making generation of large pedigrees easy. My simple script is available via anonymous ftp at ftp.ee.pdx.edu in /pub/users/cat/rootd/convert.new
[Also see the "What is Cyrillic?" question]
Any family can be used for chromosome mapping, so CEPH has picked a particular family "shape" and generated a large database with these families. Programs designed for chromosome mapping can be optimized for using these families, reducing the time needed for calculations. Only families afflicted with a disease can be used for disease-gene-mapping. As a result, programs designed for disease-gene-mapping need to be able to deal with arbitrary pedigrees. In addition, these programs need to be able to handle incomplete-penetrance.
Cathy Falk writes: There ARE a couple of versions of LIPED around that work on the Sun, but each one seems to have its own developmental path (from the original), so it's not so easy to describe. We have a version that came from UCLA (Dr. Anne Spence) which we have had running on the Sun for some time. It accepts up to 6 alleles per locus, and we now want to increase that. It also has a somewhat different structure for the input files. Dr. Peggy Pericak-Vance, at Duke, has a version that accepts up to 8 alleles, but it is a modification of an earlier LIPED and is not totally compatible with our current (UCLA) version. Dr. Ott has a PC version which he thinks would be easy to modify for the Sun, and Dr. David Greenberg informed me that he has a version for DEC (VMS) machines.
The Affected Pedigree Member Method distribution contains the new APM programs, a new file conversion utility, and a histogram/statistics generator (all of which are version 2.0).
To build the entire distribution, you need C, Pascal, and Fortran compilers. A make utility is also helpful.
Instructions on building the distribution are in the file HowTo. Please read the file READ_ME_FIRST before doing anything. For an introduction to the APM programs, read the Intro file. For a list of known bugs, read the BUGS file.
By linkage data, I mean any genetic-linkage dataset, not just those for the Lathrop Linkage package. This is an important question, and I simply do not know the answer.
I've used the crimap-chrompic option, and played with xpic/phap a little bit, but I really hope some people send me some information on this topic.
Cyrillic is a pedigree editor with facilities for including marker data; you can then ask it to interface with LINKAGE, i.e. it creates the input files for MAKEPED and runs the whole show. It is Windows based, so input of the pedigree is very efficient. You also have a data form associated with each individual where you can store names, DNA numbers, etc. If you want, I can email you version 1.11 to have a look at. They also have technical support by email from Oxford. Let me know if you are interested. I had to learn to use the program here and teach everyone else in the lab; they had bought it just before I started working here. I also had to learn the old way of preparing the datain files for MAKEPED, and I promise you that I will never look back. There were some serious bugs in version 1, but as far as I can tell it has all been fixed quite nicely. There are of course some features that they are still busy implementing, but it is an excellent interface with LINKAGE!
If anyone's interested, DOLINK can downcode alleles automatically. I'm not sure if it uses the same algorithm as Ott's, but it's described in the documentation along with potential drawbacks. The main use of DOLINK is to prepare files for LINKAGE etc. from a database. It's at diamond.gene.ucl.ac.uk in /pub/packages/dcurtis. I've _nearly_ got a version ready to run under X (the current one is DOS only), and I will try to accelerate this if there is huge public interest.
Maxhap is the maximum possible number of haplotypes in your analysis. You multiply together the number of alleles at each locus used in a particular run (not all the loci in your dataset, just the loci you use). Remember that affection status counts as two alleles, regardless of the number of liability classes.
For example, if a dataset has the following information:
affection status: 4 liability classes
Marker A: 3 alleles
Marker B: 4 alleles
Marker C: 5 alleles

and your run includes a linkmap run between affection-status, A, and B, then your MAXHAP must be (at least) 2*3*4 = 24.
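If you'd rather not do the multiplication in your head, any Unix box will do it for you (a trivial check, using the numbers from the example above):

    bigbox% expr 2 \* 3 \* 4
    24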
Usually, there is no advantage to coding disease or marker loci as binary rather than as numeric with liability classes. Generally, binary coding is more complex in that we humans have a hard time thinking that way. Some co-dominant phenotypes lend themselves to binary coding, e.g. ABO blood types:
    A   - 1 0 1
    B   - 0 1 1
    O   - 0 0 1
    AB  - 1 1 1
    unk - 0 0 0

In this case, one codes the O type factor as present in all cases except unknowns. Since one cannot distinguish AO from AA at the phenotype level, one codes both genotypes as 1 0 1, presence of A and O. In reality O represents absence of both A and B. One cannot, however, code that using 0 0, since 0 0 would be an unknown.
Use of binary codes has decreased since DNA markers have come into use, as they allow one to type an individual with respect to genotype. One can use binary codes if one has phenotypic data which does not allow one to discriminate the underlying genotype exactly; one codes it as the presence (1) or absence (0) of factors such as the A and B antigens.
Most disease locus data can be coded very effectively using affected/unaffected and appropriate liability classes. Hope the explanation is sufficiently clear.
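As a concrete (and entirely hypothetical) illustration of the ABO table above, here is a throwaway awk script that maps an ABO phenotype to its three binary factors; it is not part of any linkage package, just a sketch:

    bigbox% cat abo2bin.awk
    # map an ABO phenotype (field 1) to its binary factors: A B O
    $1 == "A"  { print $1, "1 0 1"; next }
    $1 == "B"  { print $1, "0 1 1"; next }
    $1 == "O"  { print $1, "0 0 1"; next }
    $1 == "AB" { print $1, "1 1 1"; next }
               { print $1, "0 0 0" }       # anything else is unknown
    bigbox% echo AB | awk -f abo2bin.awk
    AB 1 1 1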
(and another answer from Jurg Ott)
Binary factor notation allows representing loci with codominant and dominant modes (full penetrance) of inheritance, while 'allele numbers' notation is good only for codominant loci. Few people use binary factor notation; they either use allele numbers for codominant loci, or 'affection status' notation for dominant loci (complete or incomplete penetrance). The main reason why binary factor notation is used is probably that CEPH's database is in that notation.

Jurg Ott
The best approach is to specify n+1 alleles, where n alleles are actually observed in the pedigree. Use the correct allele frequencies for the n observed alleles, and for the (n+1)th allele use 1 minus the sum of the frequencies of the observed alleles.
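A toy example with made-up frequencies: if you observe three alleles at 0.40, 0.35, and 0.15, declare four alleles and give the fourth a frequency of 1 - (0.40 + 0.35 + 0.15) = 0.10. The arithmetic is trivial to script:

    bigbox% echo "0.40 0.35 0.15" | awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print 1 - s }'
    0.1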
Related: see newfs(8) - create a new file system - for details on default values for file systems:

    inode      -- 2048 bytes/inode
    block      -- 8192 bytes/block
    frag(ment) -- 1024 bytes/fragment

Gerard Tromp notes that you can increase the speed of programs which create/access large files in the /tmp directory by creating a tmpfs filesystem. The stuff is complicated and I haven't fully assimilated/understood his email yet, so I'm not including it yet. I'll be happy to send any interested parties a forward of Gerard Tromp's email. I hope to have tmpfs information in the next edition of the FAQ.
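In the meantime, for the impatient: on SunOS a tmpfs /tmp is normally just a one-line /etc/fstab entry, like the one that also appears in the swap-space listing later in this FAQ. I haven't benchmarked this myself, so treat it as a pointer rather than a recommendation:

    swap    /tmp    tmp    rw 0 0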
Of course, buying more RAM will increase your speed. I've heard that increasing RAM from 16 to 32 megs will result in a large increase in speed. Increasing RAM from 32 to 64 megs will result in a significant increase. Increasing beyond 64 megs is not particularly helpful. Note that this data is anecdotal in nature (I haven't seen it myself), but it makes intuitive sense to me. If someone sends me some SIMMS for our sparcII, I'll be glad to test it out :-) A professor has offered to let me run a fastlink benchmark on his sparc10 with 128 megs of RAM. I'll post results as soon as they come in. Note: I run on a Sun sparcII. I'd like to hear data from people on other platforms. I'd especially like to hear data on the speed-RAM relationship.
Paging space is hard-drive space which is used as virtual RAM. Unix boxes use paging space constantly, swapping processes between the hard-drive and RAM. Remember that "paging space" is the same as "swap space". There are two types of paging-space on Sun systems (and many other types of Unix systems as well): paging files and paging partitions. Paging files are actual files (you can do an ls and find them in a directory somewhere) in the filesystem. Paging partitions are separate disk partitions, and as such are not in the filesystem.
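For reference, on a SunOS-4-style system a paging file is typically created and enabled something like this (a sketch only: the path is made up, and you should check mkfile(8) and swapon(8) on your own system first):

    bigbox# mkfile 100m /usr2/pagefile         # create a 100 meg paging file
    bigbox# /usr/etc/swapon /usr2/pagefile     # start paging to it immediately
    (add "/usr2/pagefile swap swap rw 0 0" to /etc/fstab so it survives a reboot)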
A filesystem has two types of overhead. Consider the following output:
bigbox% df
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/sd0a               7735    5471    1491    79%    /
/dev/sd0g             151399  127193    9067    93%    /usr
/dev/sd3a             306418  266644    9133    97%    /usr2
bigbox% df -i
Filesystem             iused   ifree  %iused  Mounted on
/dev/sd0a                951    3913    20%   /
/dev/sd0g              10218   66390    13%   /usr
/dev/sd3a               6278  150394     4%   /usr2

The top df command shows the space available on "bigbox" in k. Note that, although sd3a has 306 megs, of which 267 megs are used, only 9 megs are available. This is because the filesystem saves a "10%" rainy-day fund, so 10% of the filesystem is unusable. Although you can reduce this percentage (with the root password and using an arcane command), it is not recommended. According to Sun's documentation, when the filesystem gets more than 90% full the speed of the filesystem will begin to drop rapidly. When you have a 100 meg paging file, there is a corresponding 10 megs of "rainy-day fund" which you cannot access, so setting up a 100 meg paging file requires 110 megs of disk space. But when you use a separate partition as a paging partition, no 10% rainy-day fund is necessary: 100 megs of raw disk space will give you 100 megs of virtual RAM.
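For the terminally curious, the arcane command is tunefs; something like the line below would drop the reserve from 10% to 5%. Again, this is not recommended, and tunefs should only be run on an unmounted filesystem:

    bigbox# tunefs -m 5 /dev/rsd3a    # lower the minfree reserve to 5%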
The bottom df command shows the number of inodes available in the filesystem. An inode points to files, and is a part of the filesystem that you rarely need to look at. By default, when you create a filesystem in a partition, one inode is created for every 2k in the partition. The 306 meg partition has about 156,000 inodes, but only 4% of them are used. I don't know how large an inode is (a quick search through my documentation failed to find it) but I would guess that an inode is 256 bytes. If that's true, the 150,000 unused inodes above are wasting 37.5 megs of disk-space. One inode for every 2k is too much.

When you create a 100 meg paging file, you only use 1 inode, but that 100 megs of filesystem has a corresponding 50,000 inodes! If you create a paging-partition, you are not using a filesystem, so no inodes are necessary. In addition, when you create a filesystem, you can reduce the number of inodes to something more reasonable (like one inode for every 10k of disk space). I generally don't mess with the inode count on my / and /usr partitions, since they contain the operating system. Make certain not to reduce the default inode number too much: YOU DON'T WANT TO RUN OUT OF INODES.

We converted our 350 megs of paging files to a paging partition, and got another 70 megs of free disk space as a result (20%)!
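If you do build a fresh filesystem for data (not / or /usr), the inode density is set at newfs time. A sketch, with a made-up device name (read newfs(8) first, since newfs destroys whatever is on the partition):

    bigbox# newfs -i 10240 /dev/rsd3a    # one inode per 10k instead of the default one per 2k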
Unix system administration is a complex task which requires experience. An experienced sysadmin can do in minutes what it would take you hours (or days) to accomplish. In addition, an experienced sysadmin won't make stupid mistakes very often (let's see, while I was learning on-the-job I ruined our backup tape during an upgrade {luckily the upgrade was successful!}, moved a directory inside itself as root, botched email service a couple of times, and spent tons of time figuring out how to accomplish simple tasks).
Most universities have small budgets for their system administrators. Many head sysadmins have recruited students to assist them. Basically the students slave away for nothing, learn tons of stuff, barely pass their classes, become unix gods, and get hired for 40k+/year if/when they graduate/flunk out. If your university has a sysadmin group like this, you can probably "hire" them to support your machine for about $6/hour at about 4 hours/week*machine. The head-sysadmin will be happy to give some money to their more-experienced volunteers, the volunteers get another line on their resume+additional experience, and you get experienced sysadmins to run your machine. In addition, most sysadmin groups have an automated nightly backup. Just think: your machine gets backed up EVERY NIGHT AUTOMATICALLY!
At Portland State University the Electrical Engineering sysadmin group has been hired to maintain the unix machines of four other departments, at an average price of $15/week*machine (no additional price for xterms!) The quality of the service is excellent (especially since the most experienced volunteers are usually the ones given the money), there is no annual training-gap as people leave (since the experienced volunteers are constantly training the new ones) and you have the entire resources and experience of the sysadmin group to help you.
Of course, test them by deleting an unimportant file and seeing if they can restore it from backups (the backup test is the most important in system administration--have you tested your backups lately?). If they successfully restore the file from backups, give them the sun-optimization list (above two questions) and watch as the most experienced volunteer turns the optimization into a recruit-training session :-) They may even have a contest to see how small they can make your kernel-configuration file!
If your location doesn't have such a group, perhaps another university in town has one.
Paging space (also referred to as swap space) and its current use can be identified with:
pstat -s     (non-root users need to use: /usr/etc/pstat -s)

e.g.
> sanger 1% /usr/etc/pstat -s
> 11456k allocated + 3108k reserved = 14564k used, 252744k available
> sanger 2%

Swap space can be mounted on several disk partitions, that is, on several partitions on the same disk or on a partition on each of several disks.
e.g.
> sanger 2% cat /etc/fstab
> /dev/sd0a  /     4.2   rw 1 1
> /dev/sd0e  /usr  4.2   rw 1 2
> .
> ... several other partitions removed from listing
> .
> /dev/sd1b  swap  swap  rw 0 0
> /dev/sd2b  swap  swap  rw 0 0
> swap       /tmp  tmp   rw 0 0
> sanger 3%
The crimap utilities package contains genlink and linkgen, which convert between .gen files and linkage files. I am attempting to find an ftp site; if you know of one, let me know. I already have source. If I could find the authors, to have them authorize it, I'd be happy to put the entire crimap-utilities package on one of my ftp sites.
You can output the file in linkage format and use link2gen (if you have it, see F2). The disadvantage here is that your marker names are separated from your data, and it's easy to make a mistake and get them mixed up. You can output the file in ped.out format and use mkcrigen. mkcrigen is a great program which automatically transfers the marker-names with the data (eliminating one source of error). Unfortunately, I only have an executable with a hardcoded 80-marker maximum. Nobody can find the source code.
lnktocri is very similar to link2gen, and is included in the multimap tar file.
John Attwood has a ceph2cri program, which reads your ped.out file and outputs a .gen file. It is available via anonymous ftp from ftp.gene.ucl.ac.uk in /pub/packages/linkage_utils. It runs on DOS machines. According to John Attwood: "Making the Unix-based system available is much more complex, as it involves many scripts, Makefiles and executables, but I'll try to do it when I have time." If you need the unix version, send me email and I'll forward a summary to John Attwood. That way he won't waste time putting together a unix version unless there is definite interest.
There is an excellent program called Genetics Construction Kit that models fruit fly genetics - lots of features, and a pretty good interface. It comes on a CD with a bunch of other really good biology education software from a consortium called BioQuest ($89 for the CD, and it's really worth it - only mac stuff though). Look around on bulletin boards for the Intro to BioQuest hypercard stack, which gives their philosophy and a description of the programs they have.
Michael Bacon says:
Well, recently out of a genetics class, I can recommend a program called "Catlab." The idea is that you breed lots and lots of cats, and try to figure out what genes control the cat's coat and tail.
gen5ajt says:
We use Populus 3.x for DOS (Windows version out soonish). This is an excellent population genetics package; I can't recommend it highly enough. It's free and downloadable by ftp from somewhere.