CSpeak - The Virtual UI

Overview

In order to provide the most flexibility possible with the largest variety of applications, Clinically Speaking has developed an abstraction of the user interface called the Virtual UI. The Virtual UI defines a set of common properties and functions that Clinically Speaking recognizes and expects a target UI to implement. Please keep in mind that some of the topics in this document are purposefully simplified to help with understanding. This document is not intended to be a full technical specification of any technology or framework.

Definitions

Accessible Rich Internet Applications (ARIA)

A specification created by the World Wide Web Consortium (W3C) that details how to create accessibility-enabled web pages. ARIA properties are generally mapped to UIA properties.

Basic Text Control

Basic text control is defined as text control where fewer than 4 of the text control operations exist or are available for a given text control. In most cases, when a text control is said to have basic text control, it can be assumed that the only available operation is Insert Text and that the insertion is not direct.

Full Text Control

Full text control is defined as text control where all 4 of the text control operations exist and are available for a given text control.
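
The distinction between basic and full text control can be sketched as a check over which operations are available. This is only an illustration: the TextOperations enum and the TextControlLevel.Classify helper are hypothetical names and are not part of the Virtual UI API.

```csharp
using System;

// Hypothetical flags for the 4 text control operations (not part of the Virtual UI API).
[Flags]
public enum TextOperations
{
    None = 0,
    GetSelection = 1,
    GetText = 2,
    InsertText = 4,
    SetSelection = 8,
    All = GetSelection | GetText | InsertText | SetSelection
}

public static class TextControlLevel
{
    // Full text control requires all 4 operations; anything less is basic.
    public static string Classify(TextOperations available)
    {
        return (available & TextOperations.All) == TextOperations.All ? "Full" : "Basic";
    }
}
```

For example, TextControlLevel.Classify(TextOperations.InsertText) returns "Basic", while passing TextOperations.All returns "Full".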

Direct Text Insertion (DTI)

Direct text insertion is defined as a text control operation that sends control signals directly to the control rather than through an intermediary such as the clipboard or virtual keyboard. A common DTI mechanism is the Value Pattern defined by Windows UI Automation.

Microsoft Active Accessibility (MSAA)

The predecessor to Microsoft UIA. This framework is now built into UIA and, generally speaking, is not adopted by modern applications. Some applications, such as those built on the Chromium runtime (e.g., Chrome, MS Edge), still require this API for automation purposes.

User Interface (UI)

The user interface is the collection of all elements and controls that compose the interface of a program that a user may access or see while navigating any application or webpage. For the purposes of this document, you can assume that the user interface is scoped to a single application or webpage.

User Interface Automation (UIA)

This is an API developed by Microsoft for the Windows OS. UIA provides definitions for common control patterns and functions that an application or control may implement. These control patterns and functions are publicly available to any automation client running in the current Windows session.

Text Control Operations

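The Virtual UI defines 4 text control operations that a text control must implement for full text control to exist: Get Selection, Get Text, Insert Text, and Set Selection. The 4 operations are accessed through an operation channel in the Virtual UI API. Replace Text, also described in this section, is a composite of Set Selection and Insert Text rather than a fifth operation.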

Get Selection

When called, this function must return the start of the selection and the length of the selection. The selection start is 0-indexed.

void GetSelection(ref int ASelectionStart, ref int ASelectionLength);

Get Text

When called, this function must return the entire body of text in plain text format. The preferred new line format is the Windows standard newline, “\r\n”, but this may differ depending on the UI framework.

string GetText();
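
Because the new line format can differ between UI frameworks, a client may want to normalize text to the preferred “\r\n” format before working with it. The sketch below shows one way to do that; the NewLines.ToCrLf helper is a hypothetical name, and the document does not state whether CSpeak performs this normalization itself.

```csharp
using System.Text.RegularExpressions;

public static class NewLines
{
    // Convert "\r\n", lone "\r", and lone "\n" to "\r\n".
    // "\r\n" is listed first so an existing pair is matched as one unit and not doubled.
    public static string ToCrLf(string text)
    {
        return Regex.Replace(text, "\r\n|\r|\n", "\r\n");
    }
}
```

For example, NewLines.ToCrLf("a\nb\r\nc") returns "a\r\nb\r\nc".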

Insert Text

When called, this function will insert the provided text argument, replacing the current selection.

void InsertText(string AText);

Replace Text

This is a common composite function found in the Windows UI ecosystem. It combines the Set Selection and Insert Text functions into a single call. If a control is built using the Win32 framework, it is likely you will see this function.

void ReplaceText(int ASelectionStart, int ASelectionLength, string AText) {
  SetSelection(ASelectionStart, ASelectionLength);
  InsertText(AText);
}

Set Selection

This function will set the selection start and length of the text control to the provided arguments. The selection should be 0-indexed.

void SetSelection(int ASelectionStart, int ASelectionLength);
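
To see how the operations above fit together, here is a minimal in-memory sketch that implements the same signatures against a plain string. The InMemoryTextControl class is hypothetical and only for illustration. Two behaviors are assumptions not specified in this document: out-of-range selections are clamped to the text bounds, and the caret collapses to the end of the inserted text after Insert Text.

```csharp
using System;

// Hypothetical in-memory stand-in for a text control (illustration only).
public sealed class InMemoryTextControl
{
    private string _text = string.Empty;
    private int _selectionStart;   // 0-indexed
    private int _selectionLength;

    public void GetSelection(ref int ASelectionStart, ref int ASelectionLength)
    {
        ASelectionStart = _selectionStart;
        ASelectionLength = _selectionLength;
    }

    public string GetText()
    {
        return _text;
    }

    public void SetSelection(int ASelectionStart, int ASelectionLength)
    {
        // Assumption: clamp the requested range to the bounds of the text.
        _selectionStart = Math.Min(Math.Max(ASelectionStart, 0), _text.Length);
        _selectionLength = Math.Min(Math.Max(ASelectionLength, 0), _text.Length - _selectionStart);
    }

    public void InsertText(string AText)
    {
        // Replace the selected range with the new text.
        _text = _text.Remove(_selectionStart, _selectionLength).Insert(_selectionStart, AText);
        // Assumption: the caret moves to the end of the inserted text.
        _selectionStart += AText.Length;
        _selectionLength = 0;
    }

    public void ReplaceText(int ASelectionStart, int ASelectionLength, string AText)
    {
        SetSelection(ASelectionStart, ASelectionLength);
        InsertText(AText);
    }
}
```

As a usage example, calling InsertText("Hello world") on a new instance and then ReplaceText(6, 5, "there") leaves GetText() returning "Hello there", with GetSelection reporting a start of 11 and a length of 0.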

Accessibility Tools

Windows Tools

1. Accessibility Insights for Windows

2. UIA Verification Tool (included with the Windows SDK)

3. Spy++ (included with Microsoft Visual Studio)

4. WinSpy++

Note: Spy++ and WinSpy++ are only useful for Win32-based controls and properties.

Web Accessibility Tools

1. Accessibility Insights for Web (Chrome or MS Edge)

Note: Most web browsers have built-in accessibility tools to help with verification.

Virtual UI Objects

In the Virtual UI API, there is a small collection of objects that you should be familiar with.

Operation Channel

An operation channel is used to call a specific text control operation for a specific framework. The only public property available to non-Clinically Speaking staff is the name of the operation channel. The name identifies the mechanism or framework used to call the text control operation. Operation channels are managed solely by Clinically Speaking. Any application you wish to interface with must use an established operation channel.

Note: If you are a third party looking to increase compatibility with your application and you require a new operation channel, please contact Clinically Speaking Support.

Process Template

A process template is an object used to identify a process through its metadata. The main identifying properties are the application name and process name. The process name is the primary property for identifying the process at runtime. A process template can also contain a series of edit control templates to override specific functionality for that process. If a process template has the ignored property enabled, the basic text edit control template will always be returned for controls in that process.
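
The lookup described above can be sketched as follows. The ProcessTemplate class, its property names, and the ProcessTemplates helper are hypothetical and only illustrate the idea. Matching the process name case-insensitively and without a file extension is also an assumption, not documented CSpeak behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a process template; property names are assumptions.
public sealed class ProcessTemplate
{
    public string ApplicationName { get; set; }
    public string ProcessName { get; set; }
    public bool Ignored { get; set; }
}

public static class ProcessTemplates
{
    // The process name is the primary identifier at runtime.
    // Returns null when no template matches.
    public static ProcessTemplate Find(IEnumerable<ProcessTemplate> templates, string processName)
    {
        return templates.FirstOrDefault(t =>
            string.Equals(t.ProcessName, processName, StringComparison.OrdinalIgnoreCase));
    }

    // An ignored process always resolves to the basic text edit control template.
    public static bool ForcesBasicTemplate(ProcessTemplate template)
    {
        return template != null && template.Ignored;
    }
}
```

With a list containing a template whose ProcessName is "wordpad", ProcessTemplates.Find(list, "WordPad") returns that template, and ForcesBasicTemplate returns true only if its Ignored property is set.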

Edit Control Template

An edit control template is an object used to identify an edit control through its metadata and define how CSpeak should interface with the edit control through operation channels. The main identifying properties are the name, automation ID, and class name properties of the control. These properties must be accessible by UIA for CSpeak to read them. An edit control template with a class name of “*” is a wildcard template and will be used if a specific template cannot be found in the parent process template.
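
The wildcard fallback can be sketched as follows. The EditControlTemplate class and the Resolve helper are hypothetical. Treating an unset (null) property as "matches anything" is an assumption, and the class name "RICHEDIT50W" used in the usage note is only an illustrative value.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of an edit control template; null means "not specified".
public sealed class EditControlTemplate
{
    public string Name { get; set; }
    public string AutomationId { get; set; }
    public string ClassName { get; set; }

    public bool IsWildcard
    {
        get { return ClassName == "*"; }
    }

    // A specific template matches when every property it specifies equals the control's value.
    public bool Matches(string name, string automationId, string className)
    {
        return !IsWildcard
            && (Name == null || Name == name)
            && (AutomationId == null || AutomationId == automationId)
            && (ClassName == null || ClassName == className);
    }
}

public static class EditControlTemplates
{
    // Prefer a specific match; otherwise fall back to the wildcard template, if one exists.
    public static EditControlTemplate Resolve(
        IEnumerable<EditControlTemplate> templates,
        string name, string automationId, string className)
    {
        var list = templates.ToList();
        return list.FirstOrDefault(t => t.Matches(name, automationId, className))
            ?? list.FirstOrDefault(t => t.IsWildcard);
    }
}
```

Given one template with ClassName set to "RICHEDIT50W" and one with ClassName set to "*", a control with class name "RICHEDIT50W" resolves to the first template, and a control with any other class name resolves to the wildcard template.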

Example: Creating Templates for WordPad (Workflow 1)

In this example we will use the CSpeak v7 client to map to WordPad in a Windows 11 environment.

Step 1: Open CSpeak and WordPad

To get started, open and log into CSpeak as an administrator or higher security level. Then open WordPad.exe from the Start menu.

Step 2: Open the Virtual UI Content Page

On the home content page in CSpeak you will see an option labelled “Virtual UI”. Click this option to be taken to the process templates list.

Step 3: Create the Process Template

If a process template doesn’t exist for WordPad, click the “+” button in the top right to create it. In the process template dialog, you need to supply the application name and process name. If you know these two values, you can enter them manually. Alternatively, you can select the process from the combo box that lists open processes. When you have entered the information, click “Submit”.

Find the process template in the list and double-click it to open it.

Step 4: Create the Edit Control Templates

Now that we have told CSpeak how to recognize the process, we need to tell CSpeak how to interact with the text controls inside the process. For WordPad this is straightforward, as there is only 1 text field to interact with. For this example, we are going to assume that you have the Accessibility Insights for Windows tool installed.

Open Accessibility Insights for Windows and dock it on the left side of your screen. Dock WordPad on the right side of your screen. With Accessibility Insights open, click into the text field inside WordPad. You will know that Accessibility Insights is tracking you when you see a border around the focused control.

With the properties now available to you, note the AutomationId, ClassName, and Name properties shown. These 3 properties can be used to create the edit control template. Back in the process template content page, click the “+” button to add an edit control template. The first 3 properties should match the properties shown in Accessibility Insights. If you know the new line format for the control, you can select it; otherwise the most likely option is CRLF.
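
If you prefer to read the same 3 properties programmatically, the managed UI Automation client API in the System.Windows.Automation namespace exposes them on the focused element. The following is a minimal sketch, assuming a Windows project that references the UIAutomationClient and UIAutomationTypes assemblies. The FocusedControlInfo class name and the 3 second delay are illustrative choices, and this is not how CSpeak itself reads the properties.

```csharp
using System;
using System.Threading;
using System.Windows.Automation; // requires UIAutomationClient and UIAutomationTypes

public static class FocusedControlInfo
{
    public static void Main()
    {
        // Pause so you have time to click into the target text field.
        Thread.Sleep(3000);

        AutomationElement focused = AutomationElement.FocusedElement;
        Console.WriteLine("AutomationId: " + focused.Current.AutomationId);
        Console.WriteLine("ClassName:    " + focused.Current.ClassName);
        Console.WriteLine("Name:         " + focused.Current.Name);

        // Report whether the control exposes the TextPattern and ValuePattern.
        object pattern;
        Console.WriteLine("TextPattern:  " + focused.TryGetCurrentPattern(TextPattern.Pattern, out pattern));
        Console.WriteLine("ValuePattern: " + focused.TryGetCurrentPattern(ValuePattern.Pattern, out pattern));
    }
}
```

Run the program, click into the WordPad text field within the delay, and compare the printed values with what Accessibility Insights shows.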

The final 4 properties to set are the operation channels. There isn’t a good way to guess which operation channel you should use for any given control, and most of the time you will have to run through a few iterations to determine the best option, or whether an option exists at all. In this example we will refer to the automation patterns that Accessibility Insights shows under the properties.

The “UIAutomation” operation channel uses the TextPattern automation pattern to interact with the text control. It provides the “Get Selection”, “Get Text”, and “Set Selection” text operations, which leaves “Insert Text”. In most cases a control can provide full text control with the Insert Text channel set to the “Clipboard” or “Keyboard” operation channel. For this example, let’s use the “PInvoke” operation channel.
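
The channel choices for this example can be summarized as a simple mapping from text operation to operation channel name. The WordPadChannels class below is a hypothetical illustration of that mapping and does not represent how CSpeak stores templates.

```csharp
using System.Collections.Generic;

public static class WordPadChannels
{
    // Text operation -> operation channel name, as chosen in this WordPad example.
    public static readonly IReadOnlyDictionary<string, string> Map =
        new Dictionary<string, string>
        {
            ["Get Selection"] = "UIAutomation",
            ["Get Text"] = "UIAutomation",
            ["Set Selection"] = "UIAutomation",
            ["Insert Text"] = "PInvoke",
        };
}
```

All 4 operations have a channel assigned, which is what the template needs in order to provide full text control.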

A quick note on the name, automation ID, and class name properties: you can use all 3 properties in an edit control template; however, it is generally advised that you use as few of them as possible so that the template has the broadest possible mapping. In order of importance, they are the automation ID, class name, and name. The only reason to use more than 1 of these properties is to provide a template for a specific instance of an edit control. In this example only the class name needs to be specified.

Once you are done with the dialog, click “Submit” to create the edit control template. At this point you will have a complete process template and edit control template that allow CSpeak to dictate into WordPad.

Example: Creating Templates for WordPad (Workflow 2)

In this example we will use the CSpeak v7 client to map to WordPad in a Windows 11 environment.

Step 1: Open CSpeak and WordPad

To get started, open and log into CSpeak, then open WordPad.exe from the Start menu. In CSpeak, click the “CSpeak Settings” navigation option, check the “Show Text Control Window” option, and save. Once the settings are saved, you can close the CSpeak settings window.

Step 2: Creating Templates with the Text Control Window

Click into the text control inside of WordPad and turn your microphone on and then off. You should see the text control window appear in the top left of your screen. This window shows you the currently applied template for the focused text control and allows you to modify that template in real time so that you can test on the fly.

From here you can try out the different operation channels. When you have found the template you would like to use, click the save icon to the right of the “Template” header. At this point your new process template and edit control template have been added to the client settings for CSpeak.

A quick note about this workflow. The edit control template that is created will have all 3 identification properties set. If you intend to broaden the set of controls this template will apply to, you should modify it in the Virtual UI content page to use fewer identification properties.