MAKING THE GUI TALK
New technology holds promise for blind and learning-disabled people who live in a GUI-oriented world

by Richard S. Schwerdtfeger

More than 120,000 people in the U.S. are legally blind, according to figures from the American Foundation for the Blind in New York. In addition, the National Institutes of Health in Washington, D.C., states that 10 percent to 15 percent of the U.S. population have learning disabilities.

For some years now, blind people have communicated with the sighted world through computers via screen-reader systems (SRDs), which access a physical display buffer of characters and present the information orally (see "Opening Doors for the Disabled," August 1990 BYTE). In the past few years, however, the widespread trend toward the development and use of GUIs has caused a multitude of problems for the visually impaired and for some learning-disabled people.

The advent of GUIs has made it virtually impossible for blind people to use much of the industry's most popular software and has limited their chances for advancement in the now graphically oriented world. GUIs do not use a physical display buffer of characters; they employ a pixel-based display buffer instead, leaving conventional SRDs with no means of access to the information on the screen.

Fortunately, new technology is emerging to correct this problem. Companies are now enhancing SRDs to work by intercepting low-level graphics commands and constructing a text database that models the display. Developers have created new screen-reader software and combined it with a common user interface. One prototype of this type of application is Screen Reader/PM from IBM, an enhancement of the original DOS Screen Reader that the company brought out four years ago. The research project has a current installed base of more than 40 beta-test sites. With a system like this, the disabled user can maneuver a mouse over the display and use the keyboard or a separate keypad, and a voice synthesizer will describe an icon the GUI has displayed or read the graphical text shown on the screen.

This new technology has alleviated many of the problems that GUIs created for blind and some learning-disabled users, and their future now looks much more promising. Recent technological developments are putting the computer back into the hands of blind and learning-disabled people.

Also spurring development of adaptive technology for the disabled are two new laws. An amendment to Public Law 506 (29 U.S.C. 794d) states that a company can sell a product to the U.S. government only if that product is accessible to people with disabilities. The Americans with Disabilities Act, signed into law by President Bush in 1990, states, in part, that if your company has more than 25 employees, you must provide reasonable accommodation (including technology) in the form of job adaptations for the handicapped.

Taking the First Step

A package called OutSpoken is available that reads and vocalizes a GUI. It was designed and developed in 1989 by Berkeley Systems (BSI) for the Macintosh GUI. OutSpoken is a screen reader that communicates through voice synthesis with blind users as they move the mouse around the GUI and come in contact with text and graphical objects on the screen. To do this, BSI realized that it had to construct a database that the screen-reader software could access. This database, called an off-screen model (OSM), is the conceptual basis for the GUI screen readers that vendors are developing today.
The advent of OutSpoken was a significant breakthrough for the blind community, and other companies and software developers are now considering how to use this technology to make their own systems accessible to the disabled. In this article, I will explain what an OSM is, how it is created, and how SRDs access and use it. The information is based on my work on the Screen Reader project at IBM's T. J. Watson Research Center in Yorktown Heights, New York.

Off-Screen Model

An OSM is a database reconstruction of what is visible on the screen and of what is not. Like any database, the OSM must manage information resources, provide utilities for maintaining those resources, and supply utilities for accessing them. The resources the OSM must manage are text, off-screen bit maps, icons, and cursors.

Text is obviously the largest resource the OSM must maintain. It is arranged in the database relative to its position on the display or to the bit map on which it is drawn, which gives the model an appearance like that of the conventional display buffer. Each character must have certain information associated with it: foreground and background color; font family and typeface name; point size; font style, such as bold, italic, strike-out, and underscore; and width.

Merging text by baseline (i.e., position on the display) gives the model the physical-display-buffer appearance that current SRDs are accustomed to working with. Therefore, to merge associated text in the database, the model combines strings in a particular window whose characters share the same baseline. Each string of text in the OSM has an associated bounding rectangle used for model placement, a window handle, and a handle indicating whether the text is associated with a particular bit map or with the display.
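The exact record layout is an implementation choice. As a minimal sketch, a C structure collecting the attributes just described might look like this (all names here are hypothetical, for illustration only, and are not those of Screen Reader/PM or any other product):

    /* Hypothetical OSM text record. */
    typedef struct OSMRECT {
        int left, top, right, bottom;  /* pixel coordinates          */
    } OSMRECT;

    typedef struct OSMTEXTNODE {
        OSMRECT bounds;           /* bounding rectangle for placement */
        int     baseline;         /* y position used when merging     */
        unsigned long hwnd;       /* owning window handle             */
        unsigned long hbitmap;    /* 0 if drawn on the display;
                                     otherwise, the off-screen bit
                                     map the text was drawn into      */
        unsigned long fgColor;    /* foreground color                 */
        unsigned long bgColor;    /* background color                 */
        char    faceName[32];     /* font family and typeface name    */
        int     pointSize;
        unsigned styleFlags;      /* bold, italic, strike-out,
                                     underscore                       */
        int     width;            /* character width                  */
        char   *string;           /* the text itself                  */
        struct OSMTEXTNODE *next; /* next string on the same baseline */
    } OSMTEXTNODE;

Strings that share a window handle and baseline can then be chained together and read as one line, just as they appear on the screen.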
GUI text is not always placed directly onto the screen. Text can be drawn into memory in bit-map format and then transferred to the screen. It can also be clipped from a section of the display and saved in memory for placement back onto the screen later. An example of this is a drop-down menu: Text in the area to be overlaid by the menu is transferred to memory and later transferred back onto the screen when the menu is deleted. The operation that transfers bit maps is called a BitBlt, or bit block transfer. GUIs use BitBlts for speed. In a drop-down menu, for example, it is much faster to transfer the pixels cut from the screen back onto the screen than to have the application redraw or "paint" that portion of the screen from scratch.

Icons are pictorial representations of an idea or object. In the GUI, icons represent many things, ranging from programs that are currently running to objects that receive user input to perform a specific action (e.g., a push button or check box). Icons are read, identified, and then stored as single text images or as completely separate, nontext entities. People who are blind need to be able to use icons the way they use text: in other words, they need to be able to hear a verbal description of each icon. When an application is designed without a provision for keyboard access to particular icons, the blind user needs a mouse to locate them. A screen-reader application must be able to track mouse movements, match their positions with OSM icons, and vocalize the associated icon names.

The OSM must also provide cursor information so that a screen reader can vocalize the user's cursor position on the display. In windowing systems, more than one application can be displayed on the screen, but your keyboard input is directed to only one active window at a time. Each active window is often made up of many child windows, and the application's cursor is placed in at least one of them. The OSM must keep track of a cursor's window identification (i.e., handle) so that when a window becomes active, the SRD can determine whether it has a cursor and vocalize it.

An SRD cursor is the area on the display where the next potential action will occur or where users can enter their next text. Examples are the blinking insertion bar in an editor and the rectangular highlight over text in a menu of options. The OSM must keep track of the cursor's screen position, its dimensions, the associated text string (i.e., the speakable cursor text), and the string character position. If the cursor is a blinking insertion bar, its character position is that of the associated character in the OSM string; in this case, the cursor's text is the entire string. A wide rectangular cursor is called a selector, since it is used to isolate screen text to identify an action. Examples of selector cursors are those used in spreadsheets and drop-down menus. The text enclosed within the borders of the selector is the text that the screen reader speaks.

Construction Toolkit

Now, I'll build a toolkit of utility functions that you can use for constructing and maintaining the resource information. The software performing the display reconstruction is driven by low-level GUI graphics functions. Therefore, with some exceptions, the OSM utilities used to construct and maintain the model mirror operations performed by the graphics engine. These operations are performed on text or on database representations of bit maps, cursors, or icons, as opposed to pictures or actual bit maps. The utilities required to handle text are clipping, erasure, text merger, and text transfer; these tools must handle icons as well as text.

Developers do not normally worry about clipping text to the windows in their GUI applications. Text running outside the window is clipped off by the low-level graphics routines, and the code constructing the model must perform the same operation. Therefore, the model must provide tools to clip text to the clip region supplied by the software constructing the model. This region is represented by an array of rectangles defining the visible domain (see the sketch at the end of this section).

You use text-erasing tools when one window is placed over another or when a window is reduced in size. You use text-merger tools when you combine text with other text having the same handle and screen baseline. Simply put, text is merged into an OSM bit-map representation using common baselines. Performing a BitBlt requires transfer tools for text and cursors: for example, when you move a window from one part of the display to another, or when text that was previously drawn to a memory bit map is transferred to the screen. Therefore, tools are also required for moving text between the visible display portion of the database and the nonvisible off-screen portion where OSM representations of bit maps are stored.
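Here is one way a clipping utility might look, again only as a sketch in C using the hypothetical record defined earlier. A real implementation would clip character by character and split partially visible strings; this version simply reports whether any part of a node survives the clip region:

    /* Sketch of a toolkit clipping routine (hypothetical names).
       clipRects is the array of rectangles defining the visible
       domain, as handed down by the code constructing the model. */
    static int rectsOverlap(const OSMRECT *a, const OSMRECT *b)
    {
        return a->left < b->right  && b->left < a->right &&
               a->top  < b->bottom && b->top  < a->bottom;
    }

    int OsmClipNode(const OSMTEXTNODE *node,
                    const OSMRECT *clipRects, int nRects)
    {
        int i;
        for (i = 0; i < nRects; i++)
            if (rectsOverlap(&node->bounds, &clipRects[i]))
                return 1;   /* at least partly visible: keep it    */
        return 0;           /* fully clipped: the caller discards
                               the node (text erasure)             */
    }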
System Dependent

Now, on to building the model. The construction code, called the patch code, is operating-system dependent, because specific patch-code function calls are patched, or hooked, in with a subset of the low-level graphics calls. The calls to hook are those that draw text; perform BitBlts; save bit maps to display card memory; restore bit maps from display card memory; select and unselect bit maps for drawing; delete bit maps; map display buffers to windows; draw rectangles, boxes, borders, and regions; and update cursors.

For each of these graphics calls, there is a corresponding patch function call. The job of these patch functions is to mimic the low-level calls directly by performing the same operations on the model. For instance, a draw-text call would draw text on the screen as well as merge that text into the model.

How you hook these calls depends on the GUI system architecture. The X Window System is based on a client/server model. X applications (clients) communicate with the X server by sending and receiving messages over a socket. When the X server initializes, it creates a socket that clients use to establish new connections and returns the socket address to the calling environment. When a new client starts, it uses that socket address to connect to the X server. To establish a new connection, the client connects to the socket and receives a new socket to use for communication. When the X server accepts a connection, it receives a new socket connected only to the new client, leaving the original socket open for new connections. SRD patch software intercepts this communication by changing the socket address that clients use to connect to the X server. New clients use the modified socket address and connect directly to the screen reader instead of the X server. The SRD then connects to the real X server on behalf of the client and manages all socket I/O between the client and server.

Screen Reader's View

The OSM must notify the SRD of cursor changes and window updates. These notifications are performed either by setting flags, in the case of polling, or by various means of interprocess communication (IPC) on multitasking systems. A change notification causes the SRD to refer to the OSM and check for important updates. To keep those checks manageable, the SRD software writer needs to reduce the OSM viewing area to a particular application or window. This area of the model is called a view, and a subset of a view is called a viewport. The OSM software must provide tools to set up and maintain views and viewports.

An example of the way you use views and viewports is a Lotus 1-2-3/G spreadsheet. To construct a view of the spreadsheet, the SRD must pass the OSM the spreadsheet's bounding rectangle and all the window identifications composing the spreadsheet. The OSM then extracts all the view text and sorts it by baseline. The user might then ask the SRD to isolate a specific spreadsheet column for ease of use. Upon such a request, the SRD vocalizes each column (i.e., viewport) entry.

Turning On the Lights

Here's an example of how text drawn by a GUI application comes to be spoken by a screen-reader program. First, the application uses a standard application programming interface call to draw "Hello World" in an OS/2 Presentation Manager (PM) window; it could use a function called WinDrawText to place the text in the window. The WinDrawText function results in a low-level graphics engine call to GreCharStringPos. For GreCharStringPos, there is a corresponding call to the patch-code function OsmCharStringPos, passing it the same parameters.
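In outline, the patch function lets the engine do its work and then repeats the operation against the model. The sketch below is illustrative only: the real GreCharStringPos takes more parameters and is reached through the engine's dispatch table, and the helpers shown are the hypothetical toolkit entries from the earlier sketches, not actual Screen Reader/PM routines.

    /* Hedged sketch of a patch function, with a simplified,
       hypothetical signature. */
    typedef long (*DRAWTEXT_FN)(unsigned long hdc, int x, int y,
                                long count, const char *str);

    extern DRAWTEXT_FN    realCharStringPos;   /* saved engine entry */
    extern OSMTEXTNODE   *OsmBuildNode(unsigned long hdc, int x, int y,
                                       long count, const char *str);
    extern int            OsmClipNode(const OSMTEXTNODE *node,
                                      const OSMRECT *clipRects,
                                      int nRects);
    extern void           OsmMergeByBaseline(OSMTEXTNODE *node);
    extern void           OsmNotifySRD(void);
    extern const OSMRECT *currentClipRects;
    extern int            nClipRects;

    long OsmCharStringPos(unsigned long hdc, int x, int y,
                          long count, const char *str)
    {
        long rc;
        OSMTEXTNODE *node;

        /* 1. Let the graphics engine draw the text as usual. */
        rc = realCharStringPos(hdc, x, y, count, str);

        /* 2. Mimic the call on the model: build a node from the
              same parameters, clip it to the visible region, and
              merge it into the database by baseline. */
        node = OsmBuildNode(hdc, x, y, count, str);
        if (node != 0 && OsmClipNode(node, currentClipRects, nClipRects))
            OsmMergeByBaseline(node);

        /* 3. Tell the screen reader that the model has changed. */
        OsmNotifySRD();
        return rc;
    }

The real work hides inside building the node, which is where the description picks up below.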
PM is helpful in providing many of the query functions needed to build an OSM node before placing it into the model. The construction code uses functions such as GreGetClipRects and GreQueryFontAttributes to obtain clipping and font information. Once the necessary information has been acquired from the Gre functions, the patch code determines the text colors by performing its own color mixing. Next, a database node is constructed. OSM utilities create the node; add the font, color, spacing, text, and text bounds; and clip the node to the region. If the text is still visible, a window handle is retrieved from PM and placed in the node. Finally, OSM utilities merge the text into its proper place in the database: in this case, the visible (displayed) portion.

After the text is merged into the OSM, the SRD is notified. Assuming that "Hello World" is in the current view, on the next keystroke command the blind user could hear the SRD speak what is in the window. If "Hello World" were the only text displayed in the window, the user could program the SRD to speak it automatically on update.

Twisting Screen Readers' Arms

This is a good time to discuss what it takes to make current screen readers work in the new GUI system environment. Modifying them to accommodate the new GUI software is no easy task. Most of these systems run under DOS as TSR programs. Unlike DOS, leading GUI software operates in multitasking environments where applications run concurrently, and the SRD performs a juggling act as each application gets the user's input. Since an SRD is itself a new multitasking application, it must use forms of IPC, such as queues. PM and Windows use message queuing so that windows can receive keyboard and mouse input and communicate with other windows. A program remains dormant most of the time, until it receives a message (see the sketch at the end of this section).

Operating systems like OS/2 provide for multiple full-screen sessions as well as a GUI. Because of the protected-mode environment of systems like OS/2, SRD developers must now write device drivers to access the protected full-screen display-buffer memory of non-GUI applications.

With the macros or full profile language of current fully functional SRD systems, you can program how each application is read to the user. The blind user or the SRD developer must write profiles that employ multitasking constructs and can distinguish GUI constructs, such as the window classes of the various system and application controls. Writing new profiles for these systems will require major rewrites of the existing software base and is a hair-raising experience. If you resize a window, text or child windows within the frame may disappear; the profile writer can't always count on the text or icons remaining in the window.
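For readers unfamiliar with message queuing, this is the canonical PM message loop, shown here as a minimal skeleton. The program sleeps inside WinGetMsg until the system queues a message, such as a keystroke, a mouse event, or a notification posted on the OSM's behalf:

    /* The standard OS/2 PM message loop. */
    #define INCL_WIN
    #include <os2.h>

    int main(void)
    {
        HAB  hab = WinInitialize(0);
        HMQ  hmq = WinCreateMsgQueue(hab, 0);
        QMSG qmsg;

        /* WinGetMsg blocks until a message arrives; the program
           is dormant the rest of the time. */
        while (WinGetMsg(hab, &qmsg, NULLHANDLE, 0, 0))
            WinDispatchMsg(hab, &qmsg); /* route to window procedure */

        WinDestroyMsgQueue(hmq);
        WinTerminate(hab);
        return 0;
    }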
Capabilities and Shortcomings

With the new technology, screen readers will be able to work with most popular GUI software packages, including the new GUI versions of Microsoft Word and Excel and Lotus 1-2-3 for Windows and PM: any application that relies mainly on text or icons.

The new screen-reader world still won't be a utopian one. The disabled will have problems with packages such as CorelDraw, which draws pictures as well as text; a screen reader has no way to describe the pictures to the blind user. Disabled people will also have trouble with other packages that cannot tell them the location of graphical images on the screen. In addition, some software packages construct screen text from lines rather than from fixed fonts. In this case, the application performs the drawing itself instead of placing a fixed font on the screen, and access to this text requires that character recognition be built into off-screen models.

What Else Is Needed?

To fully exploit the new technology, developers should make OSM libraries accessible to the public. Access to an OSM would let screen-reader developers tailor their systems to people with learning disabilities. For example, a modified screen reader could say "file disposer" rather than "wastebasket" for the Macintosh GUI, an enhancement that would help learning-disabled people who have problems translating symbolic representations into meaning.

Other screen-reader companies will want to develop their own OSMs. Developing the hook code for AIX or PM is a very large undertaking. Therefore, GUI systems should provide hook facilities that allow applications to intercept the low-level graphics calls and construct their own OSMs. Release of this new software and promulgation of the new laws will motivate computer companies to develop more accessible systems. But there is still much more to be done.

Sneak Preview

For the past year, I have been part of a project team whose goal is to convert IBM's Screen Reader for DOS to Screen Reader for OS/2 and PM. In particular, I've been involved in developing PM's OSM. Commenting on the future of Screen Reader/PM, IBM's Jim Thatcher, research staff member and project manager, says, "Currently, Screen Reader for PM is an IBM Research Division prototype being used on the job by more than 40 people in IBM and other companies. It hasn't yet been determined whether or not it will become a product. However, we are demonstrating it at major conferences, including The World Congress on Technology in Washington, D.C., this month, and it is entered in the Johns Hopkins National Search for Computing to Assist Persons with Disabilities." (See the text box "Chaotic Progress.")

IBM's Screen Reader/PM is the first fully functional SRD for a GUI, and it is fully programmable. Project team members have completely rewritten the profile access language to accommodate PM and OS/2 multitasking constructs. Screen Reader/PM automatically switches profiles as the blind user switches between applications, whether on the PM GUI or in full-screen sessions.

The group is also porting Screen Reader/PM to AIX and X. Screen Reader/AIX is to be the first GUI screen reader for a Unix-based system; with this software, the blind user will also be able to work with Motif. Development of Screen Reader/AIX marks the beginning of an OSM that is portable between two large multitasking operating systems. Berkeley Systems, meanwhile, is rewriting OutSpoken for Windows 3.0 and developing a portable OSM called the GUI Access Toolkit. This toolkit will make it possible for third-party developers to create and market new access software for Windows and the Macintosh without having to develop their own OSMs.

These efforts forecast a new era of independence for the visually impaired community. For the first time, blind and visually impaired people will be able to work with multitasking systems. They will be able to perform tasks such as formatting a disk and running a spreadsheet calculation while they are logged onto a mainframe and listening to the daily announcements in a window. With software like IBM's Screen Reader/AIX, blind people will be able to use workstations. This access will open up many jobs now unattainable by visually impaired people.
----------------------------------------------------------------------
Richard S. Schwerdtfeger is an independent software consultant specializing in OS/2. He is working with the Screen Reader/PM development team at IBM's T. J. Watson Research Center in Yorktown Heights, New York. You can reach him on BIX as "rschwer."
----------------------------------------------------------------------

Hearing Graphics for the First Time
-----------------------------------
by Joseph J. Lazzaro

As computers become more graphically oriented with each new product release, vendors must forge tools so that blind people can access image-intensive systems. Screen Reader/PM may be the instrument with which visually impaired and learning-disabled people gain equal access to the GUI.

As a computer user who is blind and responsible for adapting job sites for a state and federally funded computer access project, I have tested nearly every screen-reader system (SRD) on virtually every text-based computer platform. Until now, most systems have been mute in graphical environments (see "Windows of Vulnerability," June BYTE). But new technology has surfaced that may help the visually impaired and learning disabled to operate in a GUI-oriented world. Screen Reader/PM from IBM is a breakthrough speech package. The company sent me a system and gave me the chance to check out a beta version of the software.

I evaluated Screen Reader/PM's capabilities using an IBM PS/2 Model 70 equipped with 4 MB of memory, a 70-MB hard drive, and an Accent stand-alone voice synthesizer from Aicom (San Jose, CA). Screen Reader/PM also works with other synthesizers on the market, so you can choose the voice unit that best suits the application at hand.

Installation is relatively easy. It involves copying the speech software to an appropriate subdirectory and modifying the CONFIG.SYS and STARTUP.CMD files. I had to connect the voice synthesizer to one of the serial communications ports, a trivial job, to say the least.

Screen Reader/PM comes bundled with a small keyboard, about the size of a numeric keypad. This keyboard plugs into the mouse port on any PS/2 computer and lets you control all aspects of the SRD. There are keys for reading lines, words, and characters, as well as commands for reading entire windows.

Once I completed the installation, I cranked up the software, which gave a voice to OS/2 and Presentation Manager. When the synthesizer came to life, I was amazed to find that, for the first time, I could actually use a graphics-based IBM system. As I moved the arrow keys, the synthesizer crisply verbalized each highlighted menu selection. I found Screen Reader/PM to be quite responsive, with little or no lag time between keystrokes and verbalizations. This is very important for a speech program, as a sluggish package would slow down productivity.

As I explored the external keyboard further, I discovered a dedicated help key that gave me verbal verification of every key in the system. This keyboard has both positive and negative implications. It prevents conflicts with applications that seize control of the main keyboard, an occasional problem with terminal-emulation hardware and software. But blind users who are more comfortable keeping their hands firmly fixed on the home row to control voice functions may at first find the use of a secondary control board slow and painful.
I was pleased to find that you can bypass this external keyboard and use the numeric keypad on the main keyboard if you wish, a feature that will satisfy a wide range of speech users.

All in all, I found the system fast and responsive, although there are a few rough edges in the current beta version. This is to be expected, especially when you consider that Screen Reader/PM is the first graphics-based speech package for the family of IBM computer platforms. The program lives up to its claims -- to allow blind people equal access to the OS/2 and Presentation Manager GUI, as well as to a host of programs running under OS/2. Screen Reader/PM may also help learning-disabled users who have difficulties with graphical representations of text. According to IBM, OS/2 2.0 will be able to run Windows applications; this may eventually give blind and learning-disabled people access to that graphics-based environment as well.

I think Screen Reader/PM will evolve into a stable and highly reliable access package, one that will provide many disabled users with the freedom the coming of the GUI almost took away.

----------------------------------------------------------------------
Joseph J. Lazzaro is the cofounder of Talking Computer Systems in Cambridge, Massachusetts. He is writing a book on assistive technology for the American Library Association and is project director for the adaptive-technology program of the Massachusetts Commission for the Blind. You can reach him on BIX as "lazzaro."
----------------------------------------------------------------------

Chaotic Progress
----------------
by Janet J. Barron

Suppose that, for some reason, society confined you to a self-contained world and tagged you as "disabled" or "handicapped." Although your mental capacities were intact, you couldn't travel around unaided or get a job to support yourself. With the advent of the microcomputer and devices that let it speak, however, you qualified for a job, and life changed for the better.

Then along came the GUI, a technology that quickly became popular and pervasive in the enabled world (see "Windows of Vulnerability," June BYTE). But because it couldn't talk, you began to wonder if you'd have to return to the kind of isolated existence you knew before GUIs. It was a pretty scary thought.

Now organizations and people have begun to develop ways to provide access to the GUI. And therein lies the most recent good news regarding adaptive technology. "Talking" graphics are on their way. Enhanced screen readers will let the disabled people who were almost shut out of the computing environment by the GUI's coming do their jobs and keep up with the changes in technology.

For a good while now, several companies have been working in the field of adaptive technology (e.g., Berkeley Systems of Berkeley, CA, and Dragon Systems of Waltham, MA). Also among the organizations involved in these efforts are the University of Wisconsin's Trace Research and Development Center and Johns Hopkins University's Applied Physics Laboratory in Laurel, Maryland. As a result of increased activity in assistive technology, some great things are happening.

Ten years ago, Johns Hopkins held a competition to motivate people to create innovative computer technology to help the disabled. The competition was very successful, and many of the ideas entered in the contest have become mainstream products and programs used by people who may be blind, deaf, learning disabled, developmentally disabled, or physically disabled.
The computing world has changed dramatically in the past 10 years. So Paul Hazan, the Johns Hopkins project director who directed the first national computer search, is doing it all over again this year. This year's contest is cosponsored by the National Science Foundation and MCI Communications. Besides the top prize of $10,000, there are numerous other awards and recognitions, including smaller amounts of money, computers, and certificates. Winning entries will be exhibited at the Smithsonian in Washington, D.C., in February. The main payoff, however, won't be the money or the other prizes. It will be new and innovative technology developed to help disabled people realize their potential.

The Trace Center works in several areas to address the communication needs of people who are nonspeaking and have other severe disabilities. Much of the work at Trace is directed toward studying ways in which nonspeaking and physically impaired people can converse and write using high-technology aids. Trace scientists are also researching and establishing standards for the control mechanisms used to operate computers, communication devices, and home environmental controls, and they are investigating ways to make computers and other electronic equipment more accessible to disabled people.

Gregg Vanderheiden is director of the Trace Center and a specialist in assistive technology. Early on, Vanderheiden realized that blind and learning-disabled computer users lacked ways to access GUIs. Since then, he has devoted a great deal of his time and effort to making GUIs accessible to the disabled.

Also working in the area of technology for disabled people is Joe Lazzaro, project director for the adaptive-technology program at the Massachusetts Commission for the Blind. His full-time job is trying out technological devices and matching and fitting them to disabled people so that they can obtain jobs and become self-sufficient. Joe writes for BYTE (see "Opening Doors for the Disabled," August 1990 BYTE), is a whiz-bang computer programmer and user, and is finishing up a book devoted to the subject of adaptive technology. Because he is legally blind, this subject is important to him: adaptive technology actually did open up his world. Now, because of computers, he can help others become enabled.

One man whom Joe and technology have helped become productive is Dennis, who is blind and wheelchair-bound, is missing fingers and a leg, and has other complications caused by diabetes. Joe's first adaptation for Dennis was speech output for MultiMate on his DOS-based computer. Dennis works as a counselor (ombudsman), helping disabled people obtain services and get help, so he must make phone calls and write effective formal letters to service companies and vendors. He must also prepare monthly reports on clients and contacts for his supervisors.

Because of Dennis's missing fingers, speech capabilities for his computer are not all that he needs. Joe is considering providing a program for him called Dragon Dictate by Dragon Systems. To use an application such as a word processor or a database, Dennis would load his software by speaking verbal commands into a desktop microphone. Then he could dictate rather than type his letters and documents and use verbal commands to edit, format, and access and control the information. The new technology will double Dennis's output almost immediately.
And as he grows more proficient with this type of technology, he can expect a four- to fivefold increase in his productivity. With a system such as Screen Reader/PM, Dennis would also have access to the GUI to produce pictorial representations, by job, of all the clients he deals with across the state. For instance, he could import his client database into his spreadsheet and produce a pie chart or another type of graph.

According to government figures, 25 million Americans are disabled. Two laws have recently been passed that, besides indicating concern for the quality of life of people who are disabled, have significantly stimulated new work in adaptive technology (see the main text). Because of these new laws and the efforts of those who are tweaking technology, many formerly disenfranchised people are finally reaping benefits from computers that open the world of the enabled to the world of the disabled. But, as Richard Schwerdtfeger says in the main text, there is still much to do.

----------------------------------------------------------------------
Janet J. Barron is a BYTE technical editor for State of the Art/Features. You can contact her on BIX as "neural."
----------------------------------------------------------------------