CS410 Bahasa Indonesia Dictionary: Project Overview and Status
John L. Whiteman
Project Background
For the past year and a half, I've been working on a personal website
project to teach people how to read, write, and speak Bahasa Indonesia.
The domain is www.studyindonesian.com,
but the site hasn't been updated yet. I've already built lessons, homework, exams,
and games (JavaScript and Macromedia Flash MX). I hope to get it all
online by the end of the year.
What the site really needs is a comprehensive learner's dictionary.
Therefore, I've decided to build one for my open source project for
CS410 at Portland State University. The latest prototype can be found
here. The reality is that the dictionary will take months, if not
years, to become truly useful. The most important component of this
project is not the code, but the data that goes into the dictionary.
I expect people will download the project for the data files alone, once
those files reach an acceptable number of accurate entries (10,000 or more).
This is a public domain project.
Therefore, the intent of the code is not just to provide an interface for
people to search for words, but to provide a comprehensive management
system that supports the overall growth and quality of the dictionary.
Project Requirements
- A web frontend for people to search for words (exact and fuzzy matches).
Searches can go English to Indonesian or Indonesian to English.
- Linear (word-by-word) translation of fuzzy-searched words that don't
produce exact matches
- A comprehensive list of the words in the dictionary (ongoing). This list
can be made available as an ASCII file and used however the user wants,
without my code.
- A dictionary (database) management system that supports the enhancement
of the dictionary.
- Routines that allow the insertion of new words as well as checking
for duplicate entries
- Routines that track which words are missed. Those words are stored
in a database table, and each word's counter is incremented every time it
is encountered. A management system will monitor these missed words and
allow the administrator to easily insert them into the dictionary if they
are valid.
- Routines that track which words are found (exact matches). Again,
this may be useful for identifying the most common categories of words so
that effort can focus on those groups.
- Routines that also update and delete existing entries
- Routines that track web traffic
- Routines that allow addition of more administrators
- Routines that allow third-party uploads of dictionary text files.
For example, someone may create entries for business, technical, slang,
or adult words that are not easily found in publications
- Routines that backup the database from time to time as text files.
- Routines that provide mail support
- Routines that allow people to request words be added to the dictionary
- Routines that support various statistics
- The database will be MySQL
- The frontend and backend will be PHP
- If time permits, an independent Perl frontend and backend will also be coded
- Code will be written in an object-oriented fashion
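The miss-tracking requirement amounts to an upsert on a counter table. Below is a minimal sketch, assuming a hypothetical `misses` table keyed on the lowercased word; it uses Python's built-in sqlite3 as a stand-in for MySQL, where the same idea is usually written as `INSERT ... ON DUPLICATE KEY UPDATE hits = hits + 1`. The function names are illustrative only.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical misses table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE misses (word TEXT PRIMARY KEY, hits INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def record_miss(conn: sqlite3.Connection, word: str) -> None:
    """Store a missed word (normalized to lowercase) and bump its counter."""
    word = word.strip().lower()
    # Create the row with a zero count if it is new, then increment it.
    conn.execute("INSERT OR IGNORE INTO misses (word, hits) VALUES (?, 0)", (word,))
    conn.execute("UPDATE misses SET hits = hits + 1 WHERE word = ?", (word,))
    conn.commit()


def top_misses(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Most frequently missed words first, for the administrator to review."""
    return conn.execute(
        "SELECT word, hits FROM misses ORDER BY hits DESC, word LIMIT ?", (limit,)
    ).fetchall()


if __name__ == "__main__":
    db = open_db()
    for w in ["rumah", "Rumah ", "kucing"]:
        record_miss(db, w)
    print(top_misses(db))  # [('rumah', 2), ('kucing', 1)]
```

Sorting by count lets the administrator work through the most frequent misses first, which matches the goal of focusing on the most common words.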
Project Status
August 15, 2003 [HAPPY BIRTHDAY BULE!]
- Need to fix linear search so it doesn't do case-sensitive searches
- Need to modify linear search to strip punctuation word by word, not from
the phrase as a whole
- Linear search should recognize URLs and, if so, apply them!
- Need to build backup development script
- Need to build the task manager next...include bugs
- Should entries include -nya forms?
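The linear (word-by-word) fixes listed above can be sketched as follows. This assumes a plain in-memory mapping in place of the database and a hypothetical `linear_translate` function: lookups are lowercased, punctuation is stripped from each word separately and then reattached, URLs pass through untouched, and unknown words come back bracketed. The URL test is a rough assumption, and the -nya question is not addressed.

```python
import re
import string

# Hypothetical sample data; the real entries live in the database.
SAMPLE_DICT = {"saya": "I", "makan": "eat", "nasi": "rice"}

# Rough URL test (an assumption): starts with http://, https:// or www.
URL_RE = re.compile(r"^(?:https?://|www\.)\S+$", re.IGNORECASE)


def linear_translate(phrase: str, dictionary: dict) -> list:
    """Translate a phrase word by word; unknown words come back in [brackets]."""
    result = []
    for token in phrase.split():
        if URL_RE.match(token):
            result.append(token)  # leave URLs untouched
            continue
        # Strip punctuation from this word only, remembering it for reattachment.
        core = token.strip(string.punctuation)
        if not core:
            result.append(token)  # the token is pure punctuation
            continue
        lead_len = len(token) - len(token.lstrip(string.punctuation))
        lead, trail = token[:lead_len], token[lead_len + len(core):]
        # Case-insensitive lookup: dictionary keys are stored in lowercase.
        translated = dictionary.get(core.lower(), f"[{core}]")
        result.append(lead + translated + trail)
    return result


if __name__ == "__main__":
    print(" ".join(linear_translate("Saya makan nasi goreng.", SAMPLE_DICT)))
    # I eat rice [goreng].
```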
August 14, 2003
- Checked in source code using CVS at SourceForge
- Installed application at
http://bahasa.sourceforge.net/bahasa/search.php
Found a few problems with session variables, which I think are now fixed.
- I want to change how I handle page refreshes to prevent duplicate
actions to the database. getuid() is a possible alternative. It will
be grunt work, but at least the sessions will no longer
cause the annoying page expiration errors.
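One common way to stop a refresh from repeating a database action is a one-time form token: each rendered form carries a fresh random token, and the server accepts a submission only if it can consume that token. This is a minimal sketch in Python, not the getuid() approach mentioned above; `FormTokens` is a hypothetical name, and the `session` dictionary is an assumption standing in for PHP's `$_SESSION`.

```python
import secrets


class FormTokens:
    """Hypothetical one-time tokens that block duplicate form submissions."""

    def __init__(self, session: dict):
        # `session` stands in for PHP's $_SESSION (an assumption for this sketch).
        self.session = session
        self.session.setdefault("form_tokens", set())

    def issue(self) -> str:
        """Create a token to embed in a hidden field of the rendered form."""
        token = secrets.token_hex(16)
        self.session["form_tokens"].add(token)
        return token

    def consume(self, token: str) -> bool:
        """True the first time a token is submitted; False on a refresh or unknown token."""
        tokens = self.session["form_tokens"]
        if token in tokens:
            tokens.remove(token)
            return True
        return False
```

An alternative is the Post/Redirect/Get pattern: after a successful POST, redirect to a result page so that a refresh repeats only a harmless GET, which also avoids the browser's page-expiration prompt.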
August 12, 2003
- Check in source code at siruis.cs.pdx.edu:/disk/velveeta/oss/cvs/bahasa
as 'bahasa' using CVS [DONE]
- Build upload/replace file page [DONE]
- Build task management (todo list) feature
- Check in source code at SourceForge using CVS
- Finish linear statistics
- Build download page...Can't use FTP because most Windows boxes don't
have an FTP server by default
- Build alphabet editor
- Build pronunciation editor
- Build word category manager
- Copy website to SourceForge website
August 06, 2003
- Give presentation to class [DONE]
August 04, 2003
- Build linear searches [DONE]
- Include phonetic table
- Build "About the Indonesian Language" page (alphabet, phonetics) [STARTED]
- Merge hits table statistics into dictionary statistics [DONE]
- Build upload
- Build download
- Add Indonesian translation for parts of speech
- Add remote address security feature for logins [ALMOST DONE]
- Include "maybes" statistics (neither exact matches nor misses) [DONE]
August 03, 2003
- Add/delete administrators [DONE]
- Build dictionary statistics [DONE]
- Build database statistics [DONE]
- Build PHP statistics [DONE]
- Build visitors statistics [DONE]
- Enhance searches statistics [DONE]
- Build super user login [DONE]
- Build straight SQL interface for super user [DONE]
- Add remote address security feature for logins [ALMOST DONE]
- Finish fuzzy searches [DONE]
- Include insert hits in search routines [DONE]
- Include misses in search routines [DONE]
- Add parts of speech help to client insert [DONE]
August 01, 2003
- Build request form (very simplified form of insert) [DONE]
- Build requests form (something like misses) [DONE]
- Build database create configuration script [DONE]
- Once a missed word is inserted, delete its entry in db [DONE]
- Once a requested word is inserted, delete its entry in db [DONE]
July 31, 2003
- BUG: Qualify full paths in class files (Windows is too stupid) [DONE]
- BUG: For search, check status before checking offline
...go offline if 1 [DONE]
- Enhance admin.php to give suggestions if database is offline [DONE]
- BUG: Double slashes in Windows include paths even though PHP is smart
enough to figure it out [DONE]
- Test config.pl on windows and unix [DONE]
July 30, 2003
- Build fuzzy searches for everything...see like searches [DONE]
- Build email (should send just email) [DONE]
- Test config.pl script under UNIX and Windows for portability [PROBLEMS]
- Include alphabet table [DONE]
- Add hits calls in search [DONE]
- Include Indonesia calendar [DONE]
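The exact and fuzzy (LIKE) searches can be sketched as below. This assumes a hypothetical `entries` table with `indonesian` and `english` columns and uses sqlite3 in place of MySQL. User input is escaped so that a `%` or `_` typed by a visitor is matched literally instead of acting as a wildcard. SQLite's LIKE is case-insensitive for ASCII by default, and the exact branch lowercases both sides explicitly; in MySQL, backslash is already the default LIKE escape character, so the ESCAPE clause can be dropped there.

```python
import sqlite3


def escape_like(term: str) -> str:
    """Escape LIKE wildcards (and the escape character itself) in user input."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def search(conn: sqlite3.Connection, term: str, fuzzy: bool = False) -> list:
    """Exact, case-insensitive headword match, or a substring (LIKE) match when fuzzy."""
    if fuzzy:
        sql = (
            "SELECT indonesian, english FROM entries "
            "WHERE indonesian LIKE ? ESCAPE '\\' ORDER BY indonesian"
        )
        arg = "%" + escape_like(term) + "%"
    else:
        sql = (
            "SELECT indonesian, english FROM entries "
            "WHERE lower(indonesian) = lower(?) ORDER BY indonesian"
        )
        arg = term
    return conn.execute(sql, (arg,)).fetchall()
```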
July 29, 2003
- Build parts of speech editor interface [DONE]
- Build interface to set dictionary on/offline [DONE]
July 28, 2003
- Build update entry interface [DONE]
- Make admin smarter and sharper [DONE]
- Handle database and administrator login restrictions [DONE]
- Build e-mail form [DONE]
July 24, 2003
- Finish new insert [DONE]
- Finish replace insert [DONE]
- Add change password for administrators [DONE]
- Add new administrators [MAYBE NOT]
- Dictionary should look for data sql files [DONE]
- A backup should be provided for each table and all [DONE]
- Build routine to perform backup as mysql...overwrite old backup? [DONE]
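For the backup items above, here is a minimal sketch that dumps the whole database as SQL text and replaces the previous backup only once the new dump is complete, which is one possible answer to the "overwrite old backup?" question. It uses sqlite3's `Connection.iterdump()` as a stand-in for MySQL tooling such as mysqldump; `backup_to_sql` is a hypothetical name, and per-table backups are not covered.

```python
import os
import sqlite3
import tempfile


def backup_to_sql(conn: sqlite3.Connection, path: str) -> None:
    """Dump the database as SQL text, replacing the old backup only on success."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final rename is atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            for statement in conn.iterdump():
                out.write(statement + "\n")
        os.replace(tmp, path)  # the old backup survives if anything above failed
    except BaseException:
        os.remove(tmp)
        raise
```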
July 17, 2003
- Cleanup Dictionary.pm to use new environment [DONE]
- Make search.php work again [DONE]
- Add descriptions for searches [DONE]
- Add sequential translation radio button [DONE]
- Add hits and misses routines in Dictionary.pm [DONE]
July 16, 2003
- Finish PHP file configuration portion of Perl script config.pl [DONE]
July 15, 2003
- Found out that some Windows configurations don't support parts of
my require_file scheme for PHP. This means that all require statements
must use absolute paths for the given platform. Therefore, I started writing
a backend Perl setup script that will configure these files before
showtime.
- Some good came out of this, though, because Dictionary.pm can
be greatly simplified to care only about the installation path.
It will then assume the default files are present for it to continue.
A user who doesn't run the setup script will only have to hand edit
Dictionary.pm to give the installation directory. The downside is
that Windows requires full paths, so if the user doesn't run the setup
script, each frontend script will have to be hand edited to qualify its
paths. The setup script is being written to be, hopefully,
platform independent.
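The setup script's job of qualifying require paths can be sketched as a regular-expression rewrite. This is a minimal Python sketch, not the Perl config.pl itself; `qualify_requires` is a hypothetical helper, and it assumes require paths are plain single- or double-quoted literals. On Windows it doubles the backslashes so the path stays literal inside a PHP string.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

# Matches require/require_once followed by a quoted path, e.g. require_once('x.php')
REQUIRE_RE = re.compile(r"""(require(?:_once)?\s*\(?\s*)(["'])([^"']+)\2""")


def qualify_requires(php_source: str, install_dir: str, windows: bool = False) -> str:
    """Rewrite relative require paths to absolute paths under install_dir."""
    path_type = PureWindowsPath if windows else PurePosixPath
    base = path_type(install_dir)

    def fix(match: re.Match) -> str:
        prefix, quote, path = match.groups()
        if path_type(path).is_absolute():
            return match.group(0)  # already qualified; leave it alone
        full = str(base / path)
        if windows:
            full = full.replace("\\", "\\\\")  # keep backslashes literal inside PHP strings
        return f"{prefix}{quote}{full}{quote}"

    return REQUIRE_RE.sub(fix, php_source)
```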
July 14, 2003
- dictionary.pl contains redirect depending on results of
check_dictionary: 0 OK, 1 DB IS DOWN, 2 DB NOT INITIALIZED,
3 DICTIONARY TURNED OFF, 4 NO ENTRIES IN DB
- Change directory structures to match /bahasa/indonesia/php
- Build uninitialize using unschema.sql [DONE]
- Build administration table with offline flag and set in
initialize_dictionary [DONE]
- Insert default user and crypted password in
initialize_dictionary [DONE]
- Start check_dictionary [DONE]
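The check_dictionary return codes above map naturally onto an enumeration plus a redirect table. In this sketch only the codes and their meanings come from the entry; the redirect targets are assumptions (search.php and admin.php exist in the project, while `offline.php` is hypothetical), as is the fallback for unknown codes.

```python
from enum import IntEnum


class DictStatus(IntEnum):
    """Return codes of check_dictionary, as listed in the July 14 entry."""
    OK = 0
    DB_IS_DOWN = 1
    DB_NOT_INITIALIZED = 2
    DICTIONARY_TURNED_OFF = 3
    NO_ENTRIES_IN_DB = 4


# Hypothetical redirect targets; the real pages may differ.
REDIRECTS = {
    DictStatus.OK: "search.php",
    DictStatus.DB_IS_DOWN: "offline.php",
    DictStatus.DB_NOT_INITIALIZED: "admin.php",
    DictStatus.DICTIONARY_TURNED_OFF: "offline.php",
    DictStatus.NO_ENTRIES_IN_DB: "admin.php",
}


def redirect_for(code: int) -> str:
    """Map a raw status code to a page; unknown codes fall back to the offline page."""
    try:
        return REDIRECTS[DictStatus(code)]
    except ValueError:
        return "offline.php"
```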
July 13, 2003
- Use crypt to store and retrieve database passwords [DONE]
- Start dictionary.sql core data file [DONE]
- Research pronunciation guidelines [DONE]
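crypt() produces a one-way hash, so "retrieving" a password in practice means hashing what the user typed with the stored salt and comparing the results. A minimal sketch of that flow follows. It swaps in PBKDF2-HMAC-SHA256 from Python's hashlib instead of crypt (Python's own crypt module was deprecated and removed in Python 3.13); the "salt$hash" storage format and the iteration count are assumptions.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # work factor; an assumption, tune for the server


def hash_password(password: str) -> str:
    """Return 'salt$hash' (both hex) using PBKDF2-HMAC-SHA256 with a random salt."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt.hex() + "$" + digest.hex()


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$", 1)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(digest.hex(), digest_hex)
```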
July 11, 2003
- Build generic load and send member to allow uploading of
external SQL files. It splits the file on semicolons and explodes it
into an array of statements, since PHP's mysql_query doesn't handle multiple
SQL statements per invocation. This is a way around it. [DONE]
- Put configuration parameters in Dictionary.pm. [DONE]
- Move all default data to mysql directory as .sql files. This includes
schema.sql, parts_of_speech.sql, drop_tables.sql, and dictionary.sql [DONE]
- Create drop table SQL file [DONE]
- All methods return error strings when applicable; let the client figure it
out. Also stamp each error string with the class method name. [DONE]
- Add api comments [DONE]
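Splitting on every semicolon breaks as soon as a semicolon appears inside a string literal, for example in an English definition such as 'to eat; to consume'. Below is a minimal sketch of a splitter that ignores semicolons inside single- or double-quoted strings, honoring backslash escapes and doubled quotes. `split_sql` is a hypothetical helper, and SQL comments are deliberately not handled.

```python
def split_sql(script: str) -> list:
    """Split a SQL script on semicolons that fall outside quoted strings.

    A semicolon inside a -- or /* */ comment will still split the statement.
    """
    statements, current = [], []
    quote = None     # the active quote character, or None outside strings
    escaped = False  # True right after a backslash inside a quoted string
    for ch in script:
        current.append(ch)
        if quote:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                # A doubled quote ('') closes and immediately reopens the string,
                # so it needs no special case.
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == ";":
            stmt = "".join(current[:-1]).strip()
            if stmt:
                statements.append(stmt)
            current = []
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```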
July 10, 2003
- Build exact match routine to handle searches prior to inserts [DONE]
- Handle insert.php duplicate insertion when page is refreshed [DONE]
- Add count sequence in search page [DONE]
- Enhance menu item to not disable link when flag is set [DONE]
- Begin commenting code [DONE]
- Finish insert method in Dictionary.pm [DONE]
July 09, 2003
- Finish exact search routine [DONE]
- Finish logout page (session unregister) [DONE]
- Create a purge array function specifically for results [DONE]