IEUC Research — Projects and Code 9/5/10 — 4:55 EDT

Forging the Future for End Users Like You!
(Revision 1)
Like most research groups, the IEUC moves from project to project as resources and technical interests shift, building the occasional demo system to better understand the problems facing End Users and to explore potential solutions.
Most of our development work is Web-centric, which has made some of our past efforts moot while continually opening up new possibilities.
We are particularly interested in Programming Language Design and have coded in just about every mainstream language out there, including PHP, JavaScript, Scheme, Haskell, Prolog, Python, Ruby, Java, Clojure, and F#. We are constantly testing new libraries, tools, and frameworks, looking for new ways to integrate the work of others. Accordingly, many of our projects leverage code written in multiple languages.
THL — The Humane Login System
This project combined PHP and JavaScript code to offer a working implementation of the late Jef Raskin's recommendation to dispense with the User Name component of conventional login systems. Jef's key insight was that user names are essentially public information and thus provide no real security value, so they could be eliminated if they were replaced with globally unique pass-phrases; grammatically sound pass-phrases would also be significantly easier for users to memorize than random symbolic passwords.
Our working demo successfully implemented this idea until our hosting provider changed servers.
Pass-Phrase Generator
This project was the core component of our Humane Login System.
The technique can, however, stand on its own to generate pass-phrases for everyday accounts, so we are exploring ways to deploy it in a live setting.
In a nutshell, the algorithm is very simple.
First, download or compose several word lists corresponding to basic parts of speech: for example, 1,000 common nouns, a few hundred proper names of people, a list of locations, a large set of adjectives, lists of transitive and intransitive verbs, and so on.
The number of lists you use, along with their size and obscurity, will be determined by your security needs and by the number of users your system must serve.
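To make that trade-off concrete, the short Python sketch below counts how many distinct phrases a set of templates can produce and converts the count to bits of entropy. The list sizes and the two templates are hypothetical. Note that log2 of the count is a ceiling: if a template is picked first and then filled, phrases from smaller templates become more likely, so the effective entropy is somewhat lower.

```python
import math

# Hypothetical word-list sizes, chosen only for illustration.
LIST_SIZES = {"adjective": 2_000, "noun": 1_000, "intransitive_verb": 500}

# Each template is a sequence of parts of speech.
TEMPLATES = [
    ["adjective", "noun"],
    ["noun", "intransitive_verb"],
]


def template_keyspace(template, sizes=LIST_SIZES):
    """Distinct phrases one template can yield: the product of its slot-list sizes."""
    return math.prod(sizes[pos] for pos in template)


def total_keyspace(templates=TEMPLATES, sizes=LIST_SIZES):
    """Upper bound on distinct phrases across all templates.

    It is only an upper bound because two templates can produce the same
    string if a word appears in more than one list."""
    return sum(template_keyspace(t, sizes) for t in templates)


def entropy_bits(n_phrases):
    """Entropy, in bits, of a phrase drawn uniformly from n_phrases."""
    return math.log2(n_phrases)


def fraction_consumed(n_users, n_phrases):
    """Share of the phrase space used up once each user holds one unique phrase."""
    return n_users / n_phrases


if __name__ == "__main__":
    space = total_keyspace()
    print(space)                                # 2500000
    print(round(entropy_bits(space), 1))        # 21.3
    print(fraction_consumed(100_000, space))    # 0.04
```

With these figures, 100,000 users would consume about 4% of the phrase space, so roughly one freshly generated phrase in 25 would collide with an issued one and have to be regenerated.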
You will then create a MySQL database with one table for each part of speech. It will also have a table of MD5 hashes of previously generated pass-phrases and a table of hashes of candidate pass-phrases currently on offer.
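A minimal sketch of that schema follows, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. The table and column names are hypothetical. MySQL would spell `INSERT OR IGNORE` as `INSERT IGNORE`; the primary keys on the two hash tables mean the database itself refuses a duplicate hash.

```python
import sqlite3

# Hypothetical table names; sqlite3 stands in for MySQL so the sketch runs anywhere.
PARTS_OF_SPEECH = [
    "noun", "proper_name", "location",
    "adjective", "transitive_verb", "intransitive_verb",
]


def create_schema(conn):
    """One word table per part of speech, plus the two hash tables."""
    for pos in PARTS_OF_SPEECH:
        # Table names come from the fixed list above, never from user input.
        conn.execute(
            f"CREATE TABLE {pos} (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE)"
        )
    # An MD5 hex digest is 32 characters; the primary key makes lookups fast
    # and lets the database itself reject a duplicate hash.
    conn.execute("CREATE TABLE issued_hashes (md5 CHAR(32) PRIMARY KEY)")
    conn.execute("CREATE TABLE candidate_hashes (md5 CHAR(32) PRIMARY KEY)")
    conn.commit()


def load_words(conn, pos, words):
    """Bulk-load a word list, silently skipping duplicates."""
    if pos not in PARTS_OF_SPEECH:
        raise ValueError(f"unknown part of speech: {pos}")
    conn.executemany(
        f"INSERT OR IGNORE INTO {pos} (word) VALUES (?)",
        [(w,) for w in words],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    load_words(conn, "noun", ["lantern", "badger", "harbor"])
    print(conn.execute("SELECT COUNT(*) FROM noun").fetchone()[0])  # 3
```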
One or more grammar patterns are then chosen as pass-phrase templates (e.g. adjective noun; noun intransitive-verb; adjective adjective transitive-verb noun). For each template, a function is written that randomly fills its slots with entries from the table for each required part of speech.
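The template functions might look something like this Python sketch. The word lists are tiny placeholders for the database tables, and only the first two templates from the examples are shown. Because the output is a credential, the default random source is `secrets.SystemRandom`; a seeded `random.Random` is accepted only so results can be reproduced in tests.

```python
import random
import secrets

# Tiny hypothetical word lists standing in for the per-part-of-speech tables.
WORDS = {
    "adjective": ["quiet", "crimson", "clever", "ancient"],
    "noun": ["lantern", "badger", "harbor", "violin"],
    "intransitive_verb": ["sleeps", "wanders", "sings"],
}


def adjective_noun(rng):
    """Template 'adjective noun', e.g. 'crimson harbor'."""
    return f"{rng.choice(WORDS['adjective'])} {rng.choice(WORDS['noun'])}"


def noun_intransitive_verb(rng):
    """Template 'noun intransitive-verb', e.g. 'badger sleeps'."""
    return f"{rng.choice(WORDS['noun'])} {rng.choice(WORDS['intransitive_verb'])}"


# One filler function per grammar template.
TEMPLATES = [adjective_noun, noun_intransitive_verb]


def generate_phrase(rng=None):
    """Pick a template at random, then let it fill its own slots."""
    if rng is None:
        rng = secrets.SystemRandom()  # OS-backed randomness for real credentials
    template = rng.choice(TEMPLATES)
    return template(rng)


if __name__ == "__main__":
    print(generate_phrase(random.Random(7)))  # reproducible, for testing only
    print(generate_phrase())                  # secure default
```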
When a user requests a pass-phrase, the system randomly selects a template and uses it to generate a random phrase. The phrase's MD5 hash value is then computed. If that hash is found neither in the previously generated phrase table nor in the candidate phrase table, it is added to the candidate table and the phrase is added to an array. Additional candidate phrases are generated until the array is full, at which point it is passed to the user interface. The user is then prompted to accept a pass-phrase from the list or to request a fresh batch of options.
Once a choice is made, the hashes of any unchosen options are deleted from the candidate phrase table. The hash of the chosen phrase is added to the previously generated phrase table and is then deleted from the candidate table as well.
This approach ensures that each selected pass-phrase is globally unique, making it suitable for use as a Humane Login identifier.
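The request-and-accept cycle just described can be sketched in Python as follows. In-memory sets stand in for the previously generated and candidate hash tables, a single "adjective noun" template stands in for the full set, and the class and method names (`PassPhraseService`, `offer`, `accept`, `reject`) are hypothetical. As in the description, the MD5 digest serves only as a compact fingerprint for uniqueness checks.

```python
import hashlib
import random

# Hypothetical word lists and a single 'adjective noun' template.
ADJECTIVES = ["quiet", "crimson", "clever", "ancient", "gentle"]
NOUNS = ["lantern", "badger", "harbor", "violin", "meadow"]


def md5_hex(phrase):
    """MD5 fingerprint of a phrase, as a 32-character hex string."""
    return hashlib.md5(phrase.encode("utf-8")).hexdigest()


class PassPhraseService:
    """Sets stand in for the two hash tables described in the text."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.SystemRandom()
        self.issued = set()      # hashes of phrases users have accepted
        self.candidates = set()  # hashes currently on offer to some user

    def _random_phrase(self):
        return f"{self.rng.choice(ADJECTIVES)} {self.rng.choice(NOUNS)}"

    def offer(self, batch_size=4, max_attempts=1000):
        """Generate batch_size phrases that are neither issued nor on offer."""
        batch = []
        for _ in range(max_attempts):
            if len(batch) == batch_size:
                break
            phrase = self._random_phrase()
            digest = md5_hex(phrase)
            if digest in self.issued or digest in self.candidates:
                continue  # collision: try another phrase
            self.candidates.add(digest)
            batch.append(phrase)
        if len(batch) < batch_size:
            self.reject(batch)  # release what was reserved before giving up
            raise RuntimeError("phrase space too crowded; add words or templates")
        return batch

    def accept(self, batch, chosen):
        """Record the chosen phrase as issued, then release the whole batch."""
        if chosen not in batch:
            raise ValueError("chosen phrase was not offered")
        self.issued.add(md5_hex(chosen))
        self.reject(batch)

    def reject(self, batch):
        """Delete every phrase in the batch from the candidate set."""
        for phrase in batch:
            self.candidates.discard(md5_hex(phrase))
```

In a multi-user deployment, the check-then-insert step should rely on the database's UNIQUE or PRIMARY KEY constraint rather than on a separate lookup, so that two simultaneous requests cannot both claim the same phrase.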
PEG Board
We were the first to call for the creation of an IDE for developing Parsing Expression Grammars, with our proposal of the PEG Board project. Fortunately, the need for this work has largely been rendered moot by the release of several tools along these lines in the ANTLR parser generator ecosystem. We would still be interested in seeing an ANTLR-independent solution, so do let us know if you'd like to pick up the baton on this work.
Here is our description of the PEG Board Project:
If you have ever taken a Compilers course or worked with Regular Expressions (a terse symbolic notation used to define a restricted subset of unambiguous language grammars) in Perl or some other scripting language, you know how hard it can be to get the specification for even a small language grammar just right. Indeed, even if you are a non-technical End User who has ever been forced to look at the apparent gibberish of such regular expressions, you can probably appreciate how helpful it would be to have a better way to express these ideas.
Well, at last there is a better way to define grammars that are intended to be unambiguous. It isn't suitable for parsing Natural Language, but that has never really been the issue for designers of programming and End User scripting languages.
This new formalism is called a Parsing Expression Grammar (PEG), and it directly captures how a computer decides whether a given input is a valid sentence in some unambiguous command language. It has a number of highly desirable properties: because alternatives are tried in a fixed priority order, a PEG never assigns two different parses to the same input, and grammars written in it can be automatically converted into efficient parsers using an algorithm called Packrat Parsing, which memoizes intermediate results to guarantee linear-time parsing.
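To give a feel for how ordered choice and packrat memoization fit together, here is a toy Python recognizer for sums of integers. It is a hand-written sketch, not the Rats! API, and it only reports whether the input matches rather than building a syntax tree. The grammar uses repetition instead of left recursion, since a plain packrat parser cannot handle left-recursive rules.

```python
from functools import wraps


def memoized(rule):
    """Packrat memoization: cache each rule's result per input position."""
    @wraps(rule)
    def wrapper(self, pos):
        key = (rule.__name__, pos)
        if key not in self.memo:
            self.memo[key] = rule(self, pos)
        return self.memo[key]
    return wrapper


class SumRecognizer:
    """Packrat recognizer for the toy PEG

        Expr   <- Term ('+' Term)*
        Term   <- Number / '(' Expr ')'
        Number <- [0-9]+

    Each rule returns the input position just past its match, or None on failure.
    """

    def __init__(self, text):
        self.text = text
        self.memo = {}

    def literal(self, pos, s):
        return pos + len(s) if self.text.startswith(s, pos) else None

    @memoized
    def number(self, pos):
        end = pos
        while end < len(self.text) and self.text[end] in "0123456789":
            end += 1
        return end if end > pos else None  # '+' demands at least one digit

    @memoized
    def term(self, pos):
        # Ordered choice '/': the second alternative is tried only if the first fails.
        end = self.number(pos)
        if end is not None:
            return end
        p = self.literal(pos, "(")
        if p is None:
            return None
        p = self.expr(p)
        if p is None:
            return None
        return self.literal(p, ")")

    @memoized
    def expr(self, pos):
        p = self.term(pos)
        if p is None:
            return None
        # ('+' Term)*: repeat greedily; a failed iteration simply ends the loop.
        while True:
            q = self.literal(p, "+")
            if q is None:
                break
            q = self.term(q)
            if q is None:
                break
            p = q
        return p


def matches(text):
    """True if the whole input is a valid Expr."""
    return SumRecognizer(text).expr(0) == len(text)
```

For example, `matches("(1+2)+34")` returns True, while `matches("1+")` returns False because the trailing `+` is left unconsumed. The `memo` dictionary holds at most one entry per rule per input position, which is where the linear-time bound comes from.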
Even more exciting is the fact that the hardest part of making use of this work has already been done by an NYU researcher who developed a PEG-based Packrat Parser Generator called Rats!
This software is written in Java and presently has no user interface. As a result, to make use of it, one needs to understand both the PEG formalism and Java programming at a fairly advanced level. We would like to see the Java housekeeping details of using the Rats! system abstracted away and hidden behind a simple graphical user interface that would let users enter PEG grammars and test them against sample data without having to deal with Java directly.
We don't have any funding to support this work, but it would be a major contribution to the development community and lead instantly to a high level of personal recognition for a volunteer willing to meet the challenge. This work is suitable for an advanced undergraduate or graduate student to undertake and is well within the abilities of a serious Java hobbyist/open-source hacker.
If you think this is something you might want to take on, we would ask that you release your work as open source under the aegis of The Institute; please Contact Us for more details.
Survey Work
One of the drawbacks of living on the technological cutting edge is that it can be very challenging to retain the ability to look at problems with the eyes of a novice.
Likewise, it can be hard to put ourselves in the shoes of any number of domain experts and to discern the inner workings of communities we don't actively participate in.
To that end, we have installed a rather full-featured web survey system.
We are currently looking at using the system to collect examples of major computing annoyances that trigger Computer Rage. We also want to survey computer user groups and orphaned-technology enthusiasts to see what we can learn from them.
Zoomable Facetted Pie Menus — A Tool for Compact Inverse Parsing
This project started with the observation that many of the resources we wish to categorize lend themselves to a recursive facetted classification scheme such that facet sequencing would correspond to both a functional application of filters and a natural language description of what is being sought.
For example, let us posit a taxonomy that can deal with a set of categories like people, questions, organizations, projects, programs, and books. These are the primary categories, or base facets. Each category might be further refined with an optional set of descriptive categories like 'job title/academic rank', each backed by a hierarchical controlled vocabulary of sanctioned descriptors like 'students:undergrad, students:undergrad:freshman, students:undergrad:sophomore, students:undergrad:junior, students:undergrad:senior, students:graduate student', where ':' denotes a subcategory combinator.
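The colon-delimited vocabulary can be turned into a tree with a few lines of Python. This is a sketch under the assumption that descriptors arrive as plain strings; the function names are hypothetical. Matching is done segment by segment, so a filter on a descriptor also selects everything filed beneath it.

```python
# The controlled vocabulary from the example above, as one comma-separated string.
JOB_TITLES = ("students:undergrad, students:undergrad:freshman, "
              "students:undergrad:sophomore, students:undergrad:junior, "
              "students:undergrad:senior, students:graduate student")

SEP = ":"  # the subcategory combinator


def build_vocabulary(descriptors):
    """Turn 'a:b:c' descriptor paths into a nested-dict tree."""
    tree = {}
    for descriptor in descriptors:
        node = tree
        for part in descriptor.strip().split(SEP):
            node = node.setdefault(part.strip(), {})
    return tree


def children(tree, path):
    """Sorted sub-descriptors directly beneath a descriptor path."""
    node = tree
    for part in path.split(SEP):
        node = node[part]
    return sorted(node)


def matches_descriptor(tag, query):
    """True if tag equals query or lies beneath it in the hierarchy.

    Comparing whole segments avoids treating 'undergraduate' as a child of 'undergrad'."""
    t, q = tag.split(SEP), query.split(SEP)
    return t[:len(q)] == q


vocabulary = build_vocabulary(JOB_TITLES.split(","))
```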
A set of typed relational combinator functions can then be defined to operate on the juxtaposition of one or more pairs of qualified base facets. A representative relation might be a function of arity 2 taking the person or organization facet as its first parameter and any facet as its second parameter (including a repetition of its first parameter) to specify a "who study" filter.
By sequencing facets, descriptors, and combinators, one can construct highly refined queries against a sparsely populated n-dimensional data space. On the command line, this can be achieved by treating descriptors as prefix combinators with a higher precedence than relational combinators, which would be parsed as left-associative infix operators.
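These precedence rules can be expressed as a small recursive-descent parser. In this Python sketch the facet and relation vocabularies are hypothetical, queries are whitespace-separated, and any token that is neither a facet nor a relation is assumed to be a descriptor.

```python
# Hypothetical vocabularies; a real system would read these from the taxonomy.
FACETS = {"people", "questions", "organizations", "projects", "programs", "books"}
RELATIONS = {"study", "fund", "wrote"}


def parse_query(text):
    """Parse a whitespace-separated facet query into a nested tuple tree.

    Grammar (descriptors bind tighter than relations):
        query     := qualified (RELATION qualified)*     left-associative
        qualified := DESCRIPTOR qualified | FACET        prefix operator
    """
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def qualified():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("query ended where a facet was expected")
        if tok in RELATIONS:
            raise SyntaxError(f"relation {tok!r} where a facet was expected")
        pos += 1
        if tok in FACETS:
            return ("facet", tok)
        # Any other token is a descriptor applied to the qualified facet after it.
        return ("describe", tok, qualified())

    def query():
        nonlocal pos
        left = qualified()
        while peek() in RELATIONS:
            relation = tokens[pos]
            pos += 1
            left = ("relate", relation, left, qualified())  # fold to the left
        return left

    tree = query()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For instance, `parse_query("students:undergrad people study projects")` returns `("relate", "study", ("describe", "students:undergrad", ("facet", "people")), ("facet", "projects"))`, and `people study projects fund organizations` groups as `(people study projects) fund organizations`.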
The long-standing drawback to such an approach has been the requirement that a user memorize the taxonomy beforehand.
But we can combine a zoomable user interface with a pie menu to create a compact inverse parser that generates grammatically correct sentences in such a classification language.
It could work something like this: a pie menu with three concentric rings would be drawn with thin outer and inner borders. The innermost circle would display the database entity count. The outer and inner rings would initially be unlabeled. The middle ring would be formatted as a conventional pie menu listing the base facets.
Selecting a base facet would cause the center count to reflect only that subset of the database. It would also relabel the outer ring with descriptor facets and the inner ring with relation names. Selecting a descriptor in the outer ring (either through a modified click or a prolonged hover) would cause the inner border to thicken and zoom the pie, so that the innermost ring would contract away, the descriptor ring would become the middle ring, and a new outer ring of descriptor values would appear. Selecting one of these values would further contract the menu to zoom the next level of the descriptor hierarchy into view, while updating the hit count along the way. Mousing over the thick inner border would reverse the zoom, thickening the outer border and zooming the descriptors out of view, such that the outer ring would contain the base facets, the middle ring would list relations, and the inner ring would list those base facets with entities associated with the initial qualified facet by the selected relation.
If a large number of labels appear in any ring, a fisheye zoom could be applied to further compress the display, allowing arbitrarily complex queries to be specified in a fixed area. Re-selecting the relation's target facet would zoom the inner border between the relation and the target facet to allow descriptive facets to be inserted. A synchronized text field (with a recent-queries history popup) located below or alongside the pie menu would reflect the textualization of the query. A user could also type a query directly into this field to pre-populate the zoomable faceted pie menu, pending graphical refinement.
At each stage of query expansion, any sterile branches (i.e. potential descriptors and relations with no corresponding database entities) would be pruned. Likewise, the same interface could be invoked to assign classifications to a preselected entity or set of entities.
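One simple way to realize the fisheye compression is to give each slice of a ring an angular width that shrinks with its distance from the label under the pointer. The weighting function and its `magnification` and `falloff` parameters in this Python sketch are illustrative assumptions, not a prescribed design.

```python
def fisheye_widths(n_labels, focus, magnification=4.0, falloff=1.0):
    """Angular width in degrees for each of n_labels slices around a ring.

    A slice's weight is 1 + (magnification - 1) / (1 + falloff * d), where d is
    its circular distance (in slices) from the focused label. The focused slice
    has weight `magnification`, distant slices approach weight 1, and the
    weights are normalized so the ring still spans exactly 360 degrees.
    """
    if n_labels <= 0:
        raise ValueError("need at least one label")
    if not 0 <= focus < n_labels:
        raise ValueError("focus must index one of the labels")
    weights = []
    for i in range(n_labels):
        gap = abs(i - focus)
        d = min(gap, n_labels - gap)  # distance the short way round the ring
        weights.append(1 + (magnification - 1) / (1 + falloff * d))
    total = sum(weights)
    return [360.0 * w / total for w in weights]
```

Setting `magnification=1.0` turns the fisheye off and gives every label an equal slice.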
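Pruning falls out naturally if ring labels are computed from the data rather than from the full taxonomy. In this Python sketch, entities are hypothetical dictionaries with a base facet and a list of descriptor tags; only descriptor prefixes that actually occur are counted, so sterile branches never appear as labels.

```python
from collections import Counter

SEP = ":"

# Hypothetical sample data: one book and three people.
ENTITIES = [
    {"facet": "people", "tags": ["students:undergrad:junior"]},
    {"facet": "people", "tags": ["students:undergrad:junior", "students:undergrad:senior"]},
    {"facet": "people", "tags": ["students:graduate student"]},
    {"facet": "books", "tags": ["students:undergrad:freshman"]},
]


def live_descriptors(entities, base_facet):
    """Count entities of one base facet under every descriptor prefix.

    Prefixes with no entities never enter the Counter, so sterile branches
    are pruned automatically. Each entity counts at most once per prefix."""
    counts = Counter()
    for entity in entities:
        if entity["facet"] != base_facet:
            continue
        prefixes = set()
        for tag in entity["tags"]:
            parts = tag.split(SEP)
            for k in range(1, len(parts) + 1):
                prefixes.add(SEP.join(parts[:k]))
        counts.update(prefixes)
    return counts


def next_ring(counts, path=""):
    """Labels for the next ring out: live descriptors one level below `path`."""
    depth = len(path.split(SEP)) if path else 0
    prefix = path + SEP if path else ""
    return sorted(
        label for label in counts
        if label.startswith(prefix) and len(label.split(SEP)) == depth + 1
    )
```

Here the `freshman` descriptor never reaches the people ring, because the only entity carrying it is a book.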
A cross-platform Java-based implementation of this approach is theoretically possible; however, we are currently leaning toward a web-based proof of concept, contingent on the availability of CSS-based rotational transformations in the next generation of browsers.
We are currently seeking volunteers to implement this exciting new design!
(We would like to credit S. R. Ranganathan for his pioneering work on Faceted Classification, Chris Crawford for the idea of Inverse Parsers, Ben Bederson for his zoomable user interface work, and the graphic design team of the Stargate science fiction franchise for suggesting the possibility of using a sequence of pie menu selections to specify coordinates in a higher-dimensional space.)
Epimarkup — Semantic Web Authoring for Everyone
Authoring content for today's web is unduly complicated in the absence of a heavyweight content management system like Drupal or WordPress that can abstract away all of the underlying markup and scripting complexity. In the coming era of the Semantic Web, where hidden markup is expected to provide machine-readable representations of a page's substance, users will become even more dependent on markup-generator services with inconsistent, baroque graphical interfaces.
Yet even when such tools are used to create markup with syntactically valid microformats to carry such metadata payloads, the results will be treated with suspicion by legacy systems whose security filters are apt to strip them out, thus defeating all of the effort that went into their initial production.
This is just one facet of what is wrong with today's web, which still lacks tremendously important functionality that has been available in proprietary and research hypertext systems for decades: things like bi-directional links and Natural Language URIs that would be transparently updated if the underlying files moved.
Our working thesis is that we could readily support all of this missing hypertext functionality as well as a more designer-oriented layout and behavior language, without breaking any of our current Open Web infrastructure. We could also enhance accessibility by having the code for these features make sense to ordinary people if read as explanatory text. Moreover, accessibility features could be framed in terms of Use Cases invoked via an interactive dialog, rather than as annotations on GUI elements.
This would allow a script to pop up a command line or generate drop-down menus to support a dialog like this:
User: What can I do here?
System: You can access a page summary, create a user account, login, or review one of several info widgets.
User: What are the widgets?
System: An event calendar, local weather, and a horoscope.
User: Show me my horoscope.
In terms of implementation, commands could be stock English phrases like "Display the Photo of Big Ben", "Accept requests for notification if this page changes", or "Describe this graphic in detail as .......". These phrases could be detected by a script running either on the web server or in an End User's browser, which would then replace them with corresponding HTML markup that works with other scripts and browser plugins.
We call this approach Epimarkup.
Epimarkup uses Quasi Natural Language expressions (i.e. stock English phrases) to formally represent semantic content and to control the expression of implicit markup. Because it is not conventional markup set off from the text in angle brackets, it can pass unmolested through legacy systems before being transformed, either on the server or in the client, into conventional microformat-oriented markup. The injected microformat markup can then be detected and exploited by today's browser extensions like Operator and Zotero.
At its simplest, a phrase like "Frame this in a simulated wood frame decorative table." could be detected by client-side JavaScript, which would inject a 3-row, 3-column table with its class attribute set to "epimarkup-generated-simulated-wood-frame-decorative-table" around the element containing the epimarkup. A screen reader, however, would simply pass the text through, letting a blind web surfer know that the following text would have been displayed inside a wooden-frame graphical border.
A structured phrase like "Johnny Mnemonic is the title of a book written by William Gibson and it is the title of a movie adaptation of that book." could generate two hyperlinked, machine-readable bibliographic records.
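A server-side version of this phrase-to-markup substitution could be as simple as a table of patterns, as in the following Python sketch. The two rules, the class names, and the `data-subject` attribute are hypothetical; a production system would carry many more rules and emit proper microformat markup. Text that matches no rule is passed through, HTML-escaped.

```python
import html
import re


def _photo(match):
    # Hypothetical markup: a placeholder figure that a later script could fill in.
    subject = html.escape(match["subject"])
    return f'<figure class="epimarkup-photo" data-subject="{subject}"></figure>'


def _notify(match):
    # Hypothetical markup for a change-notification sign-up control.
    return '<button class="epimarkup-notify">Notify me of changes</button>'


# Each rule pairs a stock-phrase pattern with a function that renders markup.
RULES = [
    (re.compile(r"Display the Photo of (?P<subject>.+?)\.?", re.IGNORECASE), _photo),
    (re.compile(r"Accept requests for notification if this page changes\.?",
                re.IGNORECASE), _notify),
]


def expand(phrase):
    """Replace a recognized stock phrase with markup; escape anything else."""
    text = phrase.strip()
    for pattern, render in RULES:
        match = pattern.fullmatch(text)
        if match:
            return render(match)
    return html.escape(text)
```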
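The wood-frame example could be handled along these lines. The description calls for client-side JavaScript; this Python sketch shows the equivalent server-side transformation, which the text notes is also possible. The helper names are hypothetical, and the element (phrase included) is kept intact in the center cell so its text still reaches a screen reader.

```python
from html.parser import HTMLParser

FRAME_PHRASE = "Frame this in a simulated wood frame decorative table."
FRAME_CLASS = "epimarkup-generated-simulated-wood-frame-decorative-table"


class _TextCollector(HTMLParser):
    """Collects the text content of an HTML fragment, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(fragment):
    """Text of the fragment with runs of whitespace collapsed to single spaces."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()
    return " ".join("".join(collector.parts).split())


def wrap_in_frame(element_html):
    """Surround the element with a 3x3 table; the element sits in the center cell."""
    edge_row = "<tr><td></td><td></td><td></td></tr>"
    middle_row = f"<tr><td></td><td>{element_html}</td><td></td></tr>"
    return f'<table class="{FRAME_CLASS}">{edge_row}{middle_row}{edge_row}</table>'


def apply_frame_epimarkup(element_html):
    """Wrap the element only if its text contains the stock framing phrase."""
    if FRAME_PHRASE in visible_text(element_html):
        return wrap_in_frame(element_html)
    return element_html
```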
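Because such phrases follow a fixed template, a regular expression is enough to pull out the fields. In this Python sketch, the record layout and the `book-` and `movie-` identifier prefixes are hypothetical; the two records are cross-linked through their `adaptations` and `adaptation_of` fields.

```python
import re

# Hypothetical pattern for one stock sentence shape.
BOOK_AND_FILM = re.compile(
    r"(?P<title>.+?) is the title of a book written by (?P<author>.+?) "
    r"and it is the title of a movie adaptation of that book\.?"
)


def slug(text):
    """Lower-case, hyphen-separated identifier fragment, e.g. 'johnny-mnemonic'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def parse_book_and_adaptation(sentence):
    """Return [book_record, movie_record] for a matching sentence, else None."""
    match = BOOK_AND_FILM.fullmatch(sentence.strip())
    if match is None:
        return None
    title, author = match["title"], match["author"]
    book_id, movie_id = f"book-{slug(title)}", f"movie-{slug(title)}"
    book = {"id": book_id, "type": "book", "title": title,
            "author": author, "adaptations": [movie_id]}
    movie = {"id": movie_id, "type": "movie", "title": title,
             "adaptation_of": book_id}
    return [book, movie]
```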
Because the discourse domain for epimarkup is highly constrained in any given instance, it lends itself to simple, brute-force parsing using cookbook-style recipes.
We already have some simple in-house proof-of-concept test cases working and are looking for volunteers to help us expand this project into a useful Open Source library.

