Beyond Siri: DARPA’s BOLT
It’s 2020. A US soldier sits down with a village sheikh, with an unusual robot in tow. The sheikh greets him courteously, respectfully, in flowing Arabic. At the appropriate time, the robot offers the same speech in English. The soldier nods, speaks, and gives a command, whereupon the robot offers dependable translation that’s even customized to the local dialect. Offshore, an intelligence analyst sorts through a combination of intercepted emails, recorded cell phone conversations, and document archives, looking for patterns and connections. She’s not fluent in Arabic, but the same technology used by the soldier is providing usable translations for her searches – asking her questions as needed, and helped by embedded clarifications and tags.
Thanks to a 2003 DARPA program, the world got to know Siri, the show-stealing component of Apple’s iPhone 4S. DARPA’s 2011 BOLT program aims to take the next step, from a silicon intermediary between man and machine to an intermediary between people, even as it also provides a powerful back-end translation system for traditional intelligence tasks. It’s one of a family of ongoing translation research efforts, all aiming to solve a persistent and expensive problem for the US military.
The Broad Operational Language Technology Program (BOLT) has a goal of creating technology capable of translating multiple foreign languages in all genres, retrieving information from the translated material, and enabling bilingual communication via speech or text. Initial languages include up to 5 dialects of Arabic, plus Mandarin Chinese. The first 2 dialects will be addressed in Phases 1 and 2 of “Activity D”; the second 2 dialects will be addressed in Phases 3 and 4; and the final dialect will be addressed in Phase 5.
In the field, the current goal appears to be a robot, but BOLT will also be a back-end IT system to help analysts with translations of stored text and voice.
BOLT’s Technical Area 1, “Algorithmic Development and Integrated Systems,” involves 6 sub-segments for contractors: A) Genre-independent translation and information retrieval system; B) Human-machine communication system; C) Human-human dialogue system; D) Arabic dialect components; E) Grounded language acquisition; and F) Other basic technologies to enable Activities A-C.
The 6 goals overlap only in part with those segments, and include:
1. Translate multiple foreign languages in all genres, retrieving information from the translated material, and enabling bilingual communication via speech or text;
2. Accurate translation into English of informal speech and text such as occurs in e-mail, SMS messaging and conversational speech in Arabic (including up to 5 dialects such as Levantine or Iraqi) and in Mandarin Chinese, at an accuracy rate of 90%.
3. Voice communication in English and Arabic, which has a number of dimensions. One is bilingual human-human conversation in speech or text, with a test of 90% success over 5 complete turns. Another is human-machine communication by voice or other input modes, in English and Arabic, so you could issue verbal commands to a robot, for instance. The last bit involves human-machine dialogue in either English or Arabic, to help it with clarification and disambiguation.
4. Retrieve targeted information from multi-lingual (initially Arabic and Chinese) sources for which machine translation services exist, using natural language English queries. The system will have a human-machine clarification and disambiguation dialogue to help refine those queries, and means to annotate documents (text or spoken) for fast comprehensive search. The goal is 90% accuracy for 90% of the test queries, using textual natural language queries plus the ambiguity resolution dialogue.
5. Research in deep semantic language acquisition using robotic visual and tactile information as input for experiential learning of objects, actions, and consequences. This is known as “Activity E,” and the test is a robot equipped with vision and tactile inputs that can recognize 250 objects varying in color, shape, size, etc., and understand the consequences (pre-state and post-state) of 100 actions, so that it can execute complex commands with a 90% completion rate. It’s not exactly C-3PO… more like Johnny 5.
6. Research in basic technologies, which is to say, algorithms for Semantic Role Labeling (SRL), parsing, language modeling, discourse analysis, co-reference resolution, dialogue turn analysis, automatic evaluation, etc. that are essential to the success of the above.
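The 90% targets in goals 3 and 4 can be read in more than one way, and the article does not spell out DARPA’s scoring rules. The Python sketch below shows one plausible reading only: a retrieval test passes if at least 90% of queries score at least 0.90, and a dialogue test passes if at least 90% of conversations get through 5 complete turns without a failure. All function names, parameters, and the scoring interpretation are assumptions made for illustration, not the program’s actual evaluation method.

```python
from typing import Sequence


def fraction_meeting(scores: Sequence[float], threshold: float) -> float:
    """Share of items whose score is at least `threshold`."""
    if not scores:
        raise ValueError("no scores to evaluate")
    return sum(s >= threshold for s in scores) / len(scores)


def retrieval_goal_met(query_accuracies: Sequence[float],
                       per_query: float = 0.90,
                       coverage: float = 0.90) -> bool:
    """Assumed reading of "90% accuracy for 90% of the test queries"."""
    return fraction_meeting(query_accuracies, per_query) >= coverage


def dialogue_goal_met(dialogues: Sequence[Sequence[bool]],
                      turns: int = 5,
                      target: float = 0.90) -> bool:
    """Assumed reading of "90% success over 5 complete turns".

    Each dialogue is a list of per-turn success flags. A dialogue counts
    as a success only if it has at least `turns` turns and the first
    `turns` of them all succeeded.
    """
    if not dialogues:
        raise ValueError("no dialogues to evaluate")
    successes = sum(len(d) >= turns and all(d[:turns]) for d in dialogues)
    return successes / len(dialogues) >= target


# Hypothetical scores: 9 of 10 queries clear the 0.90 bar.
print(retrieval_goal_met([0.95] * 9 + [0.50]))
```

Under a different reading, such as averaging accuracy across all queries, the same raw scores could pass or fail differently, which is why the evaluation framework in Technical Area 3 matters.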
Technical Area 2 involves Data Collection, to create repositories for testing.
Activity 2-A involves collecting 2 million words each of unclassified email, messaging and conversation, in English, Levantine or Iraqi Arabic, and Mandarin Chinese. They will be translated, then aligned and annotated for parsing and semantic roles.
Activity 2-D involves 1 million words of text (any genre) and 1 million words of transcribed speech in 5 dialects of Arabic (2 in year 1, 2 in year 2, 1 in year 3).
Technical Area 3 will involve building evaluation frameworks, and measuring the performance of the technologies and/or systems developed in Technical Area 1, using the data provided by Technical Area 2. No firm picked for work in Technical Area 1, activities A-F, can work on evaluation.
If the research proves itself out, one of the big technical challenges faced in making BOLT an operational field system will revolve around how it’s split up.
Siri, for instance, lives in the cloud, and depends on a connection between the phone and data center in order to work her magic. In exchange, back-end updates and advances are seamless, learning happens to the system as a whole, and new services are instantly available.
The military, on the other hand, often operates in austere regions, and the explosion of data from command systems, UAVs, and surveillance devices has made bandwidth a scarce commodity. Architects will need to contemplate how much of BOLT’s field-facing functions need to be stored and run on the robot (or a future device), and what aspects, if any, can rely on the larger back-end system.
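One way to picture that split is a field device that prefers the back-end when the link is good enough, and falls back to whatever it carries on board when it isn’t. The Python sketch below is purely illustrative: the `LinkStatus` type, the `choose_translation_path` function, and the 64 kbps cutoff are all hypothetical, and nothing in the BOLT solicitation described here specifies such a design.

```python
from dataclasses import dataclass


@dataclass
class LinkStatus:
    """Hypothetical snapshot of the device's connection to the back-end."""
    connected: bool
    bandwidth_kbps: float


def choose_translation_path(link: LinkStatus, min_kbps: float = 64.0) -> str:
    """Pick where a translation request should run.

    Returns "backend" when the link is up and meets the (assumed)
    minimum bandwidth, otherwise "onboard".
    """
    if link.connected and link.bandwidth_kbps >= min_kbps:
        return "backend"
    return "onboard"


# A congested link in an austere region falls back to on-board models.
print(choose_translation_path(LinkStatus(connected=True, bandwidth_kbps=16.0)))
```

The more functions that must work in the "onboard" case, the more storage and processing the robot needs, and the harder it becomes to keep its models current.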
For the intelligence analysts who might use BOLT, searches and analysis of things like intercepted communications involve access to sensitive information. A system in the cloud has all of Siri’s advantages, but also becomes a logical central point of attack because of its wide access. A fragmented system of local installs, on the other hand, will create more troubleshooting issues, more expense, version control problems, slower machine learning and updates, etc.
Contracts & Key Events
Work on these DARPA contracts is expected to run until Sept 30/16.
Oct 25/11: SRI International in Menlo Park, CA receives a $7.1 million cost-plus-fixed-fee BOLT contract. Work will be performed in Menlo Park, CA (50.22%); Flushing, NY (2.49%); New York, NY (11.39%); Hong Kong (2.41%); Portland, Ore. (1.00%); Rochester, NY (1.58%); Edinburgh, UK (2.41%); Seattle, WA (6.56%); Marseille, France (2.56%); Amherst, MA (10.68%); Richardson, TX (1.26%); and Sunnyvale, CA (7.43%) (HR0011-12-C-0016).
SRI has some famous background in this area. A 2003 DARPA program, the $22M Personalized Assistant that Learns (PAL) project, led SRI to develop CALO software, which was spun out as its own company in 2007. Apple bought that company in 2010, and in 2011, the iPhone 4S introduced the world to… Siri.
Oct 14/11: Raytheon BBN Technologies Corp. in Cambridge, MA receives an $8.4 million cost-plus-fixed-fee BOLT contract. Specifically, Raytheon BBN will conduct work under Technical Area 1, Activities A, B, C and D.
Work will be performed in Cambridge, MA (65.91%); Marina del Rey, CA (11.70%); New York, NY (9.47%); Los Angeles, CA (8.05%); Chicago, IL (1.68%); Berkeley, CA (1.46%); Bedford, MA (0.95%); Ithaca, NY (0.39%); and Philadelphia, PA (0.39%) (HR0011-12-C-0014).
Oct 13/11: A BOLT from Big Blue. IBM Corp. in Yorktown Heights, NY receives a $6.6 million BOLT cost contract. Specifically, IBM will conduct work for Technical area 1 Activity A, “Genre-Independent Translation and Information Retrieval System,” and Activity C, “Human-Human Dialogue System.” Work will be performed in Yorktown Heights, NY (56.93%); Aachen, Germany (8.83%); College Park, MD (8.65%); Cambridge, United Kingdom (8.16%); Stanford, CA (8.10%); Baltimore, MD (4.12%); Le Mans, France (3.82%); and Cambridge, MA (1.39%) (HR0011-12-C-0015).
IBM has a number of resources it can call on, including past work on DARPA’s GALE (Global Autonomous Language Exploitation) program. Some of their work on WATSON, which had to figure out the rules of various Jeopardy TV show categories by seeing and understanding the patterns in answers from other participants, may help here.
April 4/11: DARPA issues BOLT’s initial synopsis & solicitation on FBO.gov.
- FBO.gov – DARPA-BAA-11-40: BOLT. Solicitation Number: DARPA-BAA-11-40
- Popular Science (April 6/11) – DARPA’s Newest Language Translator Would Be Less Handheld Device, More Robot Assistant
- WIRED Danger Room (April 5/11) – Military’s Newest Recruit: C-3P0
- Information Week (April 26/11) – DARPA Launches Translation Software Initiative
- SRI International – Siri, the Virtual Personal Assistant for the Apple iPhone 4S
- Popular Science (Feb 8/10) – Google’s Handheld Translator Seeks to Cross Language Barriers. But it’s still a ways from finished, and beware the “cascading errors” issue.
- Popular Science (July 2/09) – Speech Recognition iPhone App Translates Arabic On the Fly.