
COST 219ter WG3

Testing and evaluation – organisations and methods


Anna-Liisa Salminen
1st March 2006

The COST 219ter Testing for Accessibility working group (WG3) wanted to improve the testing and evaluation of accessibility for new generation network services and terminals for disabled & elderly people in Europe. The first step in this process was to identify European organisations that test and evaluate future telecommunication network services and devices. The second step was to find out what kind of methods organisations use when they are testing/evaluating telecommunication network services and equipment to be used by disabled and elderly people. This was done using two questionnaires. The results from these questionnaires are reported below.

Testing and evaluation organisations

The one-page questionnaire (see Appendix 1) was sent by e-mail to 200 potential organisations identified by COST219ter management committee members, and was also openly available on the COST219ter website. The questionnaire was distributed at the end of 2004. By 13th June 2005, 27 replies had been received: 8 from Finland, 3 each from Portugal and the UK, 2 each from Belgium, Germany, Sweden and the Netherlands, and 1 each from Greece, Denmark, Ireland, Austria and Spain.

Twenty-four out of 26 respondents replied that they test/evaluate next generation telecommunication network services and equipment to be used by disabled and elderly people. Four of them tested/evaluated only accessibility, three only usability, and 19 both accessibility and usability.

Their areas of interest in the field of testing/evaluation were research (n=21), product development (n=18), consumer services and services for consumer offices (n=16), testing services for companies (n=15), product approvals or accreditation (n=6), and manufacturing (n=5).

Twenty-one (n=21) of the respondents reported that they tested/evaluated products/services with both software and hardware, fifteen (n=15) tested/evaluated software products/services, and ten (n=10) hardware products/services.
           
All of the respondents allowed COST219ter to put the contact information of their department on the COST219ter web site.

All of the respondents were interested in participating in a test and evaluation co-operation organised by COST219ter, and 23 of them were prepared to fill in a second, more detailed questionnaire on their testing/evaluation procedures.

Testing and evaluation methods

The second questionnaire (Appendix 2) was designed to find out what kinds of methods organisations use when they test/evaluate telecommunication network services and equipment to be used by disabled and elderly people.

This second questionnaire was sent at the end of July 2005 to the 23 organisations that had agreed to complete it. By the end of August 2005, twelve (12) replies had been received: 5 from Finland, 2 from Belgium, and 1 each from Portugal, Germany, the Netherlands, Greece and Austria.

Products

The products that organisations test/evaluate are mobile phones and services (n=7), telecommunication services and software (n=9), hardware such as computers, PDAs and handsets (n=7), and smart homes (n=4). In addition, respondents reported that they test/evaluate alarm systems for the elderly, assistive technology, websites and related applications, different types of aids (CCTVs, UltraCane, GPS), prototypes of future information technology products, digital TV, usability of web pages, banking automats (ATMs), RFID tags and devices, complete computer systems and nearly every kind of device involving human-technology interaction.

Frequency

The frequency of testing and evaluation activities in the organisations varied: two of the organisations test/evaluate products on a daily basis, two weekly, three monthly, and five a few times a year.

Evaluation of customer support and other services

Customer support and other services related to the products/services were evaluated sometimes in nine organisations and never in three organisations.

User participation

Users participate in testing and evaluation of the products and services always in eight (n=8) organisations and sometimes in four (n=4) organisations.

The users are people from the general population (n=6), people with mobility impairments (n=6), people with seeing and related impairments (n=9), people with hearing impairments (n=5), people with communication impairments (n=5), people with problems in learning and applying knowledge (n=4), and elderly people (n=3). In most organisations the users represented several or all of these groups. Three organisations tested and evaluated products for people with seeing impairments only.
           
When users participate in the testing/evaluation, they perform tasks they are asked to do (n=8), they perform tasks that are realistic and appropriate to them (n=9), and their opinions about the product are sought (n=9). Co-operation with them is based on the idea of partnership (n=8). Three (n=3) organisations reported that users receive payment on the same basis as all other partners, and one (n=1) organisation reported that users receive reimbursement of their costs. One respondent reported that users are asked to sign a consent form.

Nine (n=9) organisations sometimes use other user types, such as experts, in their tests/evaluations, and three (n=3) always do. The expert types they use are relatives of the potential users (n=7), health care or rehabilitation workers (n=7), teachers (n=5) and, in particular, expert individuals in a specific field (n=12). One organisation reported that they use commercial/sales representatives.

Testing/evaluation context

Respondents reported that they conduct their testing/evaluation both in laboratories (n=7) and in users' natural environments (n=10). One respondent reported that they have a special test room, which is not really a laboratory, and another that they conduct testing in their offices and in collaboration with other organisations.

Three (n=3) of the organisations conduct tests/evaluations as one-time testing per project, four (n=4) use a longer testing period, and six (n=6) use iterative testing processes.

Materials and methods in tests and evaluations

The guidelines and checklists mainly used in the tests were:

Property checklists were used in three (n=3) organisations and self-made checklists in ten (n=10) organisations.

Other guidelines reported to be in use in one of the organisations were:

The instruments that organisations use in their tests are automated testing tools such as Bobby, A-Prompt, Lift and the CSS Validator in six (n=6) organisations, manual testing in nine (n=9) organisations, browser testing with browsers such as Lynx, Microsoft Internet Explorer, Navigator and Opera in six (n=6) organisations, and access technology such as screen readers, magnification software and Braille displays in eight (n=8) organisations. One respondent highlighted that whether they use comparative testing or outcome measures depends on the object being tested. One used multi-platform testing and in-house validation tools. One respondent reported that they use eye-tracking cameras and event-observer cameras.
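
To illustrate the kind of rule such automated tools apply, the following is a minimal sketch of one automated accessibility check, written in Python purely for illustration (the survey does not describe how any respondent's tools are implemented). It flags img elements that lack an alt attribute, a basic requirement corresponding to WCAG 1.0 checkpoint 1.1 and one of the checks performed by tools such as Bobby.

# Minimal illustrative sketch of a single automated accessibility check:
# report <img> elements that have no alt attribute.
from html.parser import HTMLParser

class ImgAltChecker(HTMLParser):
    """Collects the positions of <img> tags lacking an alt attribute."""
    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag
        if tag == "img" and "alt" not in dict(attrs):
            self.violations.append(self.getpos())  # (line, column)

def check_alt_text(html_text):
    checker = ImgAltChecker()
    checker.feed(html_text)
    return checker.violations

if __name__ == "__main__":
    sample = '<p><img src="logo.png"><img src="photo.png" alt="A photo"></p>'
    for line, col in check_alt_text(sample):
        print("Missing alt attribute at line %d, column %d" % (line, col))

A real validator combines many such rules and still leaves judgement calls to a human, which is consistent with the respondents' practice of combining automated tools with manual and assistive-technology testing.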

The selection of methods that organisations used in their tests/evaluations was broad: scenarios (n=7), prototyping (n=7), task analysis (n=8), cognitive walkthrough (n=2), video logging (n=5), laboratory observations (n=5), ergonomic assessment (n=5), think aloud / verbal protocols (n=5), observations (n=9), user groups and user panels (n=8), expert groups and expert panels (n=8), focus groups (n=8), questionnaires (n=9), interviews (n=11), diaries (n=3), heuristic tests (n=6), comparative testing (n=1), co-operative evaluations (n=1) and empathic modelling (n=1). The number of methods that were used in organisations varied from three to fourteen.

Examples of typical testing/evaluation protocols

The respondents described their typical testing/evaluation protocols, which are listed below:

Example 1:
Typically we follow a sequential evaluation approach (see Gabbard, Hix & Swan, 1999), which, as its name implies, involves a number of evaluation methods executed in sequence, such as cognitive walkthrough (Wharton et al., 1994), heuristic evaluation (Nielsen & Molich, 1990; Nielsen, 1994), formative evaluation (Scriven, 1967) and summative evaluation (Hix & Hartson, 1993). The typical sequence in this respect is:

  1. Task Analysis
  2. Expert-based evaluation (heuristics inspection or walkthrough methods applied on paper-based prototypes, digital mock-ups and interactive prototypes)
  3. (Laboratory-based) Co-operative evaluation (sometimes by means of empathic modelling; see question 10)
  4. (Laboratory-based) User Tests
  5. Remote or Field Studies

Example 2:
Testing is done by trying different aids that help in using computers. New products (joystick/head mouse, communication programs... ) are first looked at together with the user and the people close to him or her. The user then has a longer period to test these aids in his or her own daily surroundings. Over a process of 4-7 meetings, the solutions that suit the user best are found. The testing is essentially trying things out and seeing what works. The idea is to start from the user's expectations and hopes, and the work is done in close co-operation with the user and the user's close associates. The goals are set together and are evaluated during the process.

Example 3:
Our multidisciplinary research teams develop and apply usability and user research methods in the development of new technologies. Emphasis is placed on design methods which help us demonstrate to users the possibilities that the future technology presents, and with which we can gather user feedback while the product idea is still being defined. We encourage users to think of meaningful uses for new technologies. A typical research project produces both technical prototypes and studies of user feedback on product or service acceptability in the potential user groups. Field evaluations of products in their real contexts of use are our strongest expertise. We also conduct user and usage culture research, which enables us to better understand the users' everyday life, into which we are bringing the new technology. Our research is guided by the Design for All principle, involving different user groups in design activities and giving special attention to groups that especially benefit from the technology being developed.

Example 4:
We mostly perform evaluations, not actual testing, when we think that is appropriate, e.g. for new products on the market that address a need, such as talking GPS devices, sonar canes for blind people and video telephony for deaf people. We try to collect as much information as we can and discuss the use and usability with experts in the specific fields of interest. The result is a ‘point-of-view’ that we publish on our website and in other publications on paper.

Example 5:
For “technical accessibility issues” our methods are based upon the “Evaluating Web Sites for Accessibility” guidelines published by the W3C (http://www.w3.org/WAI/eval/). This is only our “basic” approach, as we have developed a more thorough in-house evaluation methodology. “Usability testing” has been part of our offer for two years and works very well. The usability testing we perform is based upon different international recommendations and procedures. We have also developed an in-house methodology based upon user interaction, scenarios, expert reviews, heuristic assessments, interviews, etc. All sessions are recorded on video and shown to the client. In partnership we have also added eye tracking to our evaluation methodology.

Example 6:
We test the usability of access technology, e.g. screen reader, Braille display, as well as the accessibility of internet websites. For each product or service we use different testing protocols.

Example 7:
At least two weeks prior to the test, each user receives:
General Introduction
Form for Informed Consent
Explanation for Informed Consent
Information about technical specifications of the prototype
Declaration of Confidentiality

Script during test:
1) Welcome (Room 1)
2) Optional: Introductory video (Room 1)
3) Declaration of Confidentiality (Room 1)
4) Demonstration (Room 2)
5) Instructions (Room 2)
6) Informed Consent (Room 2)
7) Test (Room 2)
8) Interview (Room 1)

References and further information provided by the respondents

Mourouzis, A., Antona, M., Boutsakis, E., & Stephanidis, C. (2005). An evaluation framework incorporating user interface accessibility. In Proceedings of the HCII 2005 Conference.

Mourouzis, A., Nota, S., Boutsakis, E., Kartakis, G., & Stephanidis, C. (2005). Expert-based assessment of the ARGO Web browser for people with disability. In Proceedings of the AAATE Conference 2005.

Maulsby, D., Greenberg, S., & Mander, R. (1993). Prototyping an intelligent agent through Wizard of Oz. In INTERCHI '93 Conference Proceedings.

Monk, A., Wright, P., Haber, J., & Davenport, L. (1993). Improving your human-computer interface: A practical technique. Prentice Hall International (UK) Ltd.

Poulson, D. (Ed.) (1996). USERfit: A practical handbook on user-centered design for Assistive Technology. European Commission, DGXIII TIDE Project 1062.

Law, C., Barnicle, K., & Henry, S. L. (2000). Usability screening techniques: Evaluating for a wider range of environments, circumstances and abilities. In Proceedings of the UPA 2000 Conference (Usability Professionals' Association annual conference).

Irish National Disability Authority IT accessibility guidelines (see http://accessit.nda.ie/)

Infovisie Magazine (see www.infovisie.be). In collaboration with two organisations for the visually impaired, Blindenzorg Licht en Liefde in Belgium and Visio in the Netherlands, test reports about technical aids for the target group are published in this Dutch-language magazine, which appears every three months.

www.incobs.de

www.bik-online.info

Gelderblom et al. (2001). Comparative product testing for electrical scooters: Tailored information for different stakeholders. In AAATE Conference Proceedings 2001.

Comments about testing and evaluation for accessibility and usability and its future development

Respondents highlighted the importance of testing and evaluation and the need to develop more systematic and common approaches:

Exploiting the benefits of computer technology and realizing the collective vision of an Information Society for All has proven somewhat elusive, especially in terms of accessibility, where progress is made slowly. This is mainly due to the limited availability of appropriate guidance and best-practice knowledge for building high-quality user interfaces (UIs) that are accessible and usable by a diverse user population with different abilities, skills, requirements and preferences. As a result, appropriate evaluation processes, as an integral part of the development lifecycle, become critical to their success. However, despite the numerous cases that report results of studies concerning people with disability, little attention has been paid to the actual process of conducting evaluations for diverse user groups, including disabled users. Typically, traditional evaluation methods and techniques originally developed for conducting conventional UI usability evaluations (e.g., of GUIs) are introduced in late development stages of UIs for people with disability and adapted on a case-by-case basis. Reportedly, such non-systematic approaches to evaluation often prove inefficient and ineffective in assessing accessibility or other qualities - such as utility and usability - of systems that are targeted at a diverse user population.

The evaluation of UIs for diverse user groups, including people with disability, requires more rigorous methods and more systematic approaches. An analysis of the sources of variability involved and other related issues reveals the necessity of adopting more comprehensive evaluation methodologies that address additional system qualities such as visibility-findability, perceived usefulness prior to access and use, availability-approachability, interaction qualities (e.g., accessibility, utility and usability), and user relationship maintainability.

The need for accessibility testing is growing because of JHS 129. The need for questionnaires based on the needs of different kinds of user groups will also grow.

We try to test as much as we can, as we cannot imagine delivering a website to a client that has not been thoroughly evaluated and tested by “real users”. We have noticed that clients are more demanding and are (slowly) becoming aware of the importance of usability and user testing. It is very important to harmonize the different (national) testing methodologies that currently exist in and around Europe. That is why we are a supporter of initiatives such as “euroaccessibility.org” and (more recently) the WAB Cluster.

The importance of ethical aspects and guidelines needs to be taken into account. The term "user friendly" does not necessarily mean "elderly friendly".

Appendix 1. First questionnaire

Want to be part of a new European approach to improving the testing & evaluation process?

How can it help me?
Testing and evaluation are essential components of the design and the adaptation process.

The COST 219ter Testing for Accessibility working group (WG3) exists to improve the testing and evaluation of accessibility for new generation network services and terminals for disabled & elderly people in Europe.

The first step in this process is to identify European organisations like yourselves that test and evaluate future telecommunication network services and devices.

Your contribution will be used to produce guidelines to improve testing for accessibility that are intended to benefit all participating testing organisations.

The guidelines are to be used to produce recommendations on testing and evaluation and to foster co-operation between you and other organisations by networking and organising workshops on testing and evaluation.

If you are interested in participating in this project, and the possibility of future co-operation with this group and other organisations like yourselves, please fill in the questionnaire.

The main objective of the COST 219ter Action is to increase the accessibility of next generation telecommunication network services and equipment to elderly people and people with disabilities by design or, alternatively, by adaptation when required.

Thank you in advance for your contribution.

Patrick Roe
Chairman, COST 219ter

 

Department and organisation:…………………………………………………………………
Contact person:…………………………………………………………………………………
Address:………………………………………………………………………………………….
Phone:……………………………………………………………………………………………
E-mail:……………………………………………………………………………………………
Web-site of the department/organisation:……………………………………………………

N.B. Only one questionnaire should be filled in per department/organisation

1. Do you test / evaluate next generation telecommunication network services and equipment to be used by disabled and elderly people?
yes / no

2. Do you test / evaluate:
only accessibility / only usability / both accessibility and usability

3. Your interest area/s in the field of testing / evaluation:
product development / manufacturing / testing services for companies / product approvals or accreditation / consumer services, services for consumer offices / research

4. What type of services and equipment are you testing/evaluating?
 software products/services / hardware products/services / products/services with both software and hardware / anything else, please specify:                                                          

5. Do you allow us to put the contact information of your department on the COST219ter web site?             
yes / no

6. Would your department be interested in participating in a test and evaluation co-operation organised by COST219ter (e.g. networks, workshops)?
yes / no

7. Would you be prepared to fill in a 2nd more detailed questionnaire on your testing / evaluation procedures?
yes / no 

Appendix 2. Second questionnaire

The COST 219ter Testing for Accessibility working group (WG3) exists to improve the testing and evaluation of accessibility for new generation network services and terminals for disabled & elderly people in Europe.

The first step in this process was to identify European organisations that test and evaluate future telecommunication network services and devices.

The second step in this process is to find out what kind of methods organisations use when they are testing / evaluating telecommunication network services and equipment to be used by disabled and elderly people. 

We are grateful that in the first questionnaire you indicated that you were prepared to fill in this second questionnaire. It complements the information you already provided in the first questionnaire and collects more detailed information on your testing/evaluation procedures.

Tick one or more choices for each question and fill in the open questions.

1. Background information

Organisation:
Contact person:

What kinds of products do you test/evaluate? Please describe:
mobile phones and services
telecommunication services and software
hardware (computers, PDAs, handsets)
smart homes
other, please specify………………………………………….

How often do you test/evaluate these products:
daily
weekly
monthly
a few times a year
more rarely  

2. Do you evaluate customer support and other services that are related to the products/services?
never / sometimes / always

3. Do users participate in the testing/evaluation of the products and services?
never / sometimes / always

If sometimes/always, they are:
              people from general population
              people with mobility impairments
              people with seeing and related impairments
              people with hearing impairments
              people with communication impairments
              people with problems in learning and applying knowledge
              elderly people (65+)
              other, please specify……………………………………………

4. When users participate in the testing/evaluation…
they perform tasks they are asked to do 
they perform tasks that are realistic and appropriate to them
their opinions about the product are sought
co-operation with them is based on the idea of partnership
they receive payments on the same basis as all other partners  
other, please specify………………………………………………

5. Do you use other user types such as experts in your tests/evaluations?
never
sometimes
always

If sometimes/always, what expert types do you use?
  relatives of the potential users
  health care or rehab workers
  teachers
  expert individuals in a specific field
  other, please specify …………………………………………………

6. In what kind of environments do you conduct testing/evaluation?
laboratories
users' natural environments
other, please specify………………………………………………….

7. Do you conduct tests/evaluations
as one time testing per project
during a longer testing period
as an iterative process
                                 
8. What kinds of guidelines and checklists do you use in your tests?
              Web Content Accessibility Guidelines
              User Agent Accessibility Guidelines (WAI)
              IMS Guidelines for Developing Accessible Learning Applications
              IBM Developer Guidelines for Web Accessibility
              IBM Developer Guidelines for Hardware Accessibility
              ONS capability scales
              property checklists
              self-made checklists 
              Nordic recommendations
              none
              other, please specify ……………………………………………

9. What kinds of instruments do you use in your tests?
automated testing tools (such as Bobby, A-Prompt, Lift, CSS Validator)
manual testing
browser testing (such as Lynx, Microsoft Internet Explorer, Navigator, Opera)
access technology (screen reader, magnification software, Braille display)
none
other, please specify………………………………………………..

10. What kinds of methods do you use in your tests/evaluations?
scenarios / prototyping / task analysis / cognitive walkthrough / video logging / laboratory observations / ergonomic assessment / think aloud/verbal protocols / observations / user groups and user panels / expert groups and expert panels / focus groups / questionnaires / interviews / diaries / heuristic tests / other, please specify……………………………………………

11. Describe your typical testing/evaluation protocol (with references to possible www-resources and publications).

12. Anything else you want to say about testing and evaluation for accessibility and usability and its future development?

 

 


