TECHNOLOGY AREA(S): Air Platform, Battlespace, Human Systems
OBJECTIVE: Develop an innovative software capability to improve the utility of structured automatic speech recognition (ASR) by allowing end-users to customize the set of supported utterances without external support.
DESCRIPTION: Over the past decade, there has been moderate demand for automatic speech recognition (ASR) integration with simulation-based training systems, coinciding with the rise of commercially available services such as Apple's Siri and Google Now. ASR gives training systems the ability to interpret human speech and react to it with appropriate actions (e.g., executing a spoken command) and responses (e.g., replying to a human with confirmation or a request for clarification). The Navy seeks a software capability that improves ASR effectiveness and sustainability by allowing instructors or scenario developers to expand and modify the speech recognized in the training environment. This software will provide a means to enhance training fidelity by ensuring the ASR remains robust as tactics and protocols change over time. Further, by putting this capability in the hands of the end users of the training systems, the product will reduce the cost and schedule associated with such updates (e.g., the cost of awarding a contract to make software updates; schedule delays to align software updates with planned engineering change requests or technology refreshes). Historically, ASR successes within simulation-based training systems have been modest, due largely to the complexity of proper implementation. Some domains have overcome these challenges by relying on enforced doctrinal phraseology, whose structure speech recognition technologies can exploit to recognize human utterances. However, in more complex and fluid training environments where such templates and standards do not exist, more sophisticated natural-language processing techniques are necessary. These environments require ASR systems flexible enough for the instructor to customize and edit the feature.
This inflexibility barrier remains, and it limits the utility of ASR for structured training domains. End users of ASR-enabled training systems have little to no ability to edit or customize the feature to better match their particular needs (e.g., unit-specific phraseology, local area references, and supported alternatives to known message patterns). Currently, if training personnel want to append a particular phrase or a specific term to the existing grammar, they will most likely have to contact the developer of the capability directly to have it added. Updating existing grammars through the original developer is burdensome in both time and cost. Further, depending on the architecture and lifecycle milestone of the training system, updating the software may be difficult or impossible without a larger engineering change request to the system. In contrast, an organic editor embedded within the existing ASR software can enable quick updates (minutes to hours) to the grammar. Within military domains where tactics and protocols adapt over time, the capability to make updates without a significant system upgrade is essential. Additionally, few ASR software solutions provide trainees with an opportunity to familiarize themselves with the ASR capability; this leads to a very high failure rate and, ultimately, dissatisfaction with the training system as a whole because the system's capabilities and limitations are misunderstood. An innovative software solution is needed to address this gap. The resulting software capability should be modular and flexible so that multiple aviation platforms can leverage the functionality. For example, consider U.S. Naval aviation crews that conduct similar mission sets but have their own unique doctrinal phraseology.
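To make the "organic editor" concept concrete, the following is a minimal sketch of an instructor-facing grammar editor that appends and removes accepted phrases without developer involvement. The file name, phrase format, and sample phrase are illustrative assumptions, not actual Navy phraseology or a mandated data format.

```python
# Hypothetical sketch of an "organic" grammar editor: an instructor adds or
# removes accepted utterances locally, in minutes, with no contract action.
# The JSON-file persistence and the flat phrase list are illustrative choices.
import json
from pathlib import Path


class GrammarEditor:
    """Minimal in-place editor for a flat list of accepted utterances."""

    def __init__(self, path):
        self.path = Path(path)
        # Load any previously saved grammar; start empty otherwise.
        self.phrases = set(json.loads(self.path.read_text())) if self.path.exists() else set()

    def add(self, phrase):
        """Append a phrase to the recognized grammar (normalized to lowercase)."""
        self.phrases.add(phrase.lower().strip())

    def remove(self, phrase):
        """Remove a phrase from the recognized grammar, if present."""
        self.phrases.discard(phrase.lower().strip())

    def save(self):
        """Persist the updated grammar so the ASR front end can reload it."""
        self.path.write_text(json.dumps(sorted(self.phrases), indent=2))

    def accepts(self, utterance):
        """Report whether an utterance is currently in the grammar."""
        return utterance.lower().strip() in self.phrases
```

A real editor would sit atop the training system's actual grammar format (e.g., a finite-state or rule-based grammar) rather than a flat phrase list, but the workflow it illustrates, local edit followed by reload, is the point of the requirement.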
Although each platform (e.g., P-8A, P-3C, MH-60R) may prosecute an anti-submarine warfare (ASW) mission similarly, its doctrinal phraseology is likely specific to that platform. The solution should be flexible enough to account for platform-specific changes or to accommodate multiple platforms. The resulting software capability should also include up-front "train-the-speaker" modules: independent training programs that familiarize trainees with best practices for interacting with the ASR feature, thereby reducing the rate of ASR failures during training. The vast majority of current ASR systems include no capacity for speaker training or practice. Although ASR accuracy can be improved by expanding and customizing the acceptable grammar parameters for each specific scenario, the pre-training software can further reduce the chance of system disuse, as trainees will better understand how to use the system.
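One way to accommodate multiple platforms is a layered grammar: a shared ASW mission grammar that each platform extends with its own phraseology. The sketch below illustrates that layering; every rule name and phrase template is an invented placeholder, not doctrinal language, and the dictionary-of-templates representation is an assumption for illustration.

```python
# Illustrative layered grammar: a common ASW base that a platform overlay
# (here, a hypothetical P-8A layer) extends and specializes.
# All rule names and phrases below are placeholders, not real phraseology.

BASE_ASW = {
    "report_contact": ["contact bearing <bearing>", "new contact <bearing>"],
    "drop_buoy": ["drop buoy <number>"],
}

P8A_OVERLAY = {
    # Platform-specific alternative wording for an existing command.
    "drop_buoy": ["deploy sonobuoy <number>"],
    # Platform-specific command absent from the base grammar.
    "set_altitude": ["set altitude <feet>"],
}


def merge_grammar(base, overlay):
    """Union the base grammar with a platform overlay, keeping the phrase
    sets from both layers when a rule exists in both."""
    merged = {rule: list(phrases) for rule, phrases in base.items()}
    for rule, phrases in overlay.items():
        merged.setdefault(rule, [])
        merged[rule] = sorted(set(merged[rule]) | set(phrases))
    return merged
```

Because the overlay only adds to the base, updating one platform's phraseology leaves the shared mission grammar, and every other platform, untouched, which is the modularity the topic asks for.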
PHASE I: Design a speech-recognition software suite for augmenting and editing ASR grammars and providing practice support for end-user trainees. Develop and demonstrate a proof-of-concept example of an ASR grammar customization capability in a relevant domain (e.g., the P-8A anti-submarine warfare mission). The software should be designed with sound human factors principles to ensure that it is usable by an end-user instructor or scenario designer, allowing them to take an existing grammar and add to, remove from, and modify it to meet additional or new requirements. Develop and demonstrate a training module that allows a trainee to interact with the updated grammar to test and validate the updates. Risk Management Framework guidelines should be considered in the initial design to support information assurance compliance throughout the effort.
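The Phase I training module, letting a trainee test an utterance against the updated grammar and see whether it is recognized, could be sketched as follows. The `<slot>` template convention and the sample templates are assumptions for illustration, not a required format.

```python
# Hedged sketch of a practice module: match a trainee's utterance against
# phrase templates (possibly just edited by an instructor) and report which
# template, if any, it satisfies. "<slot>" placeholders match one token.
import re


def template_to_regex(template):
    """Compile a template like 'drop buoy <number>' into a case-insensitive
    regex, turning each <slot> into a single-token wildcard."""
    pieces = re.split(r"(<[^>]+>)", template)
    out = []
    for piece in pieces:
        if piece.startswith("<") and piece.endswith(">"):
            out.append(r"\S+")          # slot: any one whitespace-free token
        else:
            out.append(re.escape(piece))  # literal phrase text
    return re.compile("^" + "".join(out) + "$", re.IGNORECASE)


def recognize(utterance, templates):
    """Return the first template the utterance matches, or None if the
    utterance falls outside the grammar (feedback for the trainee)."""
    for template in templates:
        if template_to_regex(template).match(utterance.strip()):
            return template
    return None
```

In a deployed practice tool, a miss (a `None` result) would trigger corrective feedback, showing the trainee the nearest supported phrasing, which is how pre-training can reduce in-scenario ASR failures.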
PHASE II: Refine the ASR customization software suite, targeting the representative domain. Demonstrate and evaluate the utility of the practice support tool in improving trainee speech recognition performance. Improvement should include gains in recognition accuracy relative to a baseline without the pre-training familiarization tools, as well as user satisfaction with the system; for ASR-related technologies, user satisfaction is as important as high recognition accuracy. Demonstrate and deliver a fully featured prototype. Investigate requirements for integration into end-user training systems. Risk Management Framework guidelines should be considered and adhered to during development to support information assurance compliance.
PHASE III: Extend the baseline functionality to meet the speech training requirements of multiple aviation platforms, including the P-8A. Implement Risk Management Framework guidelines to support information assurance compliance, including updates to any outputs needed to meet specific training systems' information assurance requirements. Integrate the resulting software suite into relevant training, scenario development, and/or speech systems to support test and demonstration of the technology in a relevant environment, such as part-task trainers for platforms running ASW mission sets. Private Sector Commercial Potential: The advancement of speech technologies in recent years continues to expand the commercial availability of such products. Advancements such as this will increase the feasibility and utility of speech technologies in domains such as educational/academic environments (e.g., intelligent tutors; computer-based, instructorless training environments such as virtual high schools), commercial aviation training, unmanned systems interfaces, multimedia environments (e.g., vehicle interfaces), and multilingual translation devices.
REFERENCES:
1. Begel, A., & Graham, S. L. (2006). An assessment of a speech-based programming environment. In Visual Languages and Human-Centric Computing (VL/HCC'06) (pp. 116-120), IEEE. Retrieved from http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1698772&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1698772
2. Crabbé, B., Duchier, D., Gardent, C., Le Roux, J., & Parmentier, Y. (2013). XMG: eXtensible MetaGrammar. Computational Linguistics, 39(3), 591-629. Retrieved from http://www.mitpressjournals.org/doi/abs/10.1162/COLI_a_00144#.V6JO-TZf17g
3. Duchier, D., & Parmentier, Y. (2015). High-level methodologies for grammar engineering, introduction to the special issue. Journal of Language Modelling, 3(1), 5-19. Retrieved from http://jlm.ipipan.waw.pl/index.php/JLM/article/view/117
4. Ehsani, F., & Knodt, E. (1998). Speech technology in computer-aided language learning: Strengths and limitations of a new CALL paradigm. Language Learning & Technology, 2(1), 45-60. Retrieved from http://llt.msu.edu/vol2num1/article3/
5. Godwin-Jones, R. (2009). Emerging technologies: Speech tools and technologies. Language Learning & Technology, 13(3), 4-11. Retrieved from http://llt.msu.edu/vol13num3/emerging.pdf
6. Liu, J., et al. (2013). Asgard: A portable architecture for multilingual dialogue systems. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE. Retrieved from http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=6639301
7. Sonntag, D., Nesselrath, R., Sonnenberg, G., & Herzog, G. (2009). Supporting a Rapid Dialogue System Engineering Process. Proceedings of the 1st IWSDS. Retrieved from https://www.researchgate.net/publication/236260346_Supporting_a_Rapid_Dialogue_Engineering_Process
8. Taylor, G., Frederiksen, R., Crossman, J., Quist, M., & Theisen, P. (2012). A multi-modal intelligent user interface for supervisory control of unmanned platforms. In Collaboration Technologies and Systems (CTS), 2012 International Conference on (pp. 117-124), IEEE. DOI: 10.1109/CTS.2012.6261037
9. Wald, M. (2006). Creating accessible educational multimedia through editing automatic speech recognition captioning in real time. Interactive Technology and Smart Education, 3(2), 131-141. Retrieved from http://www.emeraldinsight.com/doi/abs/10.1108/17415650680000058
KEYWORDS: Automatic Speech Recognition; Speech Understanding; Speech System Training; Training Fidelity; Customizable Speech; Speech Interface