Instant Phone Translation

No Longer a Dream

Recognizing Voice and Translating Speech into Another Language

Many people wish they could overcome the language barrier and communicate with anyone they please. For years, the dream of a "voice translation device" has been limited to science fiction. But now, Japanese phone technology has finally made the dream come true.


Demonstration of the Phone Translation Service by NTT DOCOMO. The representative on the right speaks in Japanese, while the representative on the left speaks in English.

For a computer to translate the spoken word, three technologies are required: a voice recognition function that converts spoken words into text, a machine translation function that translates the text into another language, and a speech synthesis function that converts the translated text back into audible speech and sends this sound signal to a speaker. Work on these technologies has been steadily progressing around the world. In Japan, NTT and other telecommunications companies, as well as university research centers, worked to develop Japanese speech synthesis technology. At the same time, a national project headed by Kyoto University was at work on Japanese-to-English and English-to-Japanese machine translation technology. Both of these efforts resulted in functional technologies in the early 1980s, but it has taken much longer to produce a prototype of a "voice translation device." This is because, until recently, the processing speed of a supercomputer was needed to achieve nearly instantaneous and highly accurate voice recognition.
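The three stages described above can be sketched as a simple chain of functions. This is only an illustrative toy, not the method used by NTT or any other developer: the function names, the phrase table, and the trick of treating UTF-8 bytes as "audio" are all assumptions made for the example. Real systems replace each stub with a trained model.

```python
# Toy sketch of the three-stage chain: recognition -> translation -> synthesis.
# All names and data here are hypothetical placeholders, not a real API.

# Hypothetical Japanese-to-English phrase table standing in for machine translation.
PHRASE_TABLE = {
    "こんにちは": "hello",
    "ありがとう": "thank you",
}


def recognize_speech(audio: bytes) -> str:
    """Voice recognition stub: pretend the audio bytes are UTF-8 text."""
    return audio.decode("utf-8")


def translate_text(text: str) -> str:
    """Machine translation stub: look the phrase up, or pass it through unchanged."""
    return PHRASE_TABLE.get(text, text)


def synthesize_speech(text: str) -> bytes:
    """Speech synthesis stub: pretend UTF-8 bytes are the sound signal."""
    return text.encode("utf-8")


def voice_translate(audio: bytes) -> bytes:
    """Run the three stages in order, each feeding the next."""
    source_text = recognize_speech(audio)
    translated_text = translate_text(source_text)
    return synthesize_speech(translated_text)


if __name__ == "__main__":
    print(voice_translate("こんにちは".encode("utf-8")))
```

Because each stage only consumes the previous stage's output, any one of them can be improved or swapped out without touching the other two.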

Integrating Cloud Computing with Cell Phone Technology


A user can view his or her spoken words and their translation on a tablet (left) or smartphone (right).

The world moved one step closer to making "voice translation devices" practical as we entered the 21st century. Processing at a speed equivalent to that of the supercomputers of the 1990s had to wait for "cloud computing," which lets users work from any of their devices (computers, cell phones, or other things) through a connection to a server on the Internet.

In May 2011, NTT DOCOMO announced a Phone Translation Service for smartphones and certain cell phones. It translates the user's words into the language spoken by the person on the other end of the call almost instantaneously. A campaign to recruit users to try out the new system began in November 2011. They were provided with smartphones and cell phones (FOMA phones) and given access to both Japanese and English language services. Translation accuracy is constantly improving, and the new translation service should be available to general cell phone users early in 2012.


Sample screen shot. The words a speaker says are displayed as text, translated, and then transmitted to the listener in his or her own spoken language.

To begin using the service, users press the "begin translation" button on the touch screen and then speak. The spoken words are first displayed as text on the LCD screen of the speaker's handset. They are then translated and displayed in the language of the person on the other end of the call. Two to three seconds later, the translated text is converted into sound signals that the other person hears as speech in his or her own language. That person can then reply, for example to answer a question, by speaking in his or her own language. The reply is likewise converted into written words, translated, and sent back to the first speaker as a voice signal. Text display is available only on smartphones.
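The turn-by-turn flow of a call can be modeled roughly as follows. This is a minimal sketch under stated assumptions, not a description of NTT DOCOMO's actual system: the `Handset` class, the `relay_utterance` function, and the toy phrase table are hypothetical, and speech recognition is skipped by passing in the already recognized text. The sketch also assumes the text appears on the speaker's own screen, which the article implies but does not spell out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Handset:
    """Hypothetical model of one end of the call."""
    owner: str
    language: str
    is_smartphone: bool
    screen: List[str] = field(default_factory=list)  # text shown to the user
    heard: List[str] = field(default_factory=list)   # speech played to the user


# Toy phrase table keyed by (source language, target language).
TOY_PHRASES = {
    ("ja", "en"): {"予約をお願いします": "I would like to make a reservation"},
    ("en", "ja"): {"For what date?": "何日のご予約ですか"},
}


def translate(text: str, source: str, target: str) -> str:
    """Translation stub: look the phrase up, or pass it through unchanged."""
    return TOY_PHRASES.get((source, target), {}).get(text, text)


def relay_utterance(speaker: Handset, listener: Handset, recognized_text: str) -> str:
    """Relay one turn of the conversation from speaker to listener."""
    translated = translate(recognized_text, speaker.language, listener.language)
    # Text display is available only on smartphones.
    if speaker.is_smartphone:
        speaker.screen.append(recognized_text)
        speaker.screen.append(translated)
    # The listener hears the translation as synthesized speech.
    listener.heard.append(translated)
    return translated


if __name__ == "__main__":
    caller = Handset("caller", "ja", is_smartphone=True)
    clerk = Handset("clerk", "en", is_smartphone=False)
    relay_utterance(caller, clerk, "予約をお願いします")
    relay_utterance(clerk, caller, "For what date?")
    print(caller.screen, caller.heard, clerk.heard)
```

Note that the conventional cell phone in the example never shows any text, yet it still takes full part in the conversation through audio alone.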

Nearly Instant Voice Recognition, Machine Translation, and Speech Synthesis

The phone translation service records the spoken words in the cloud, transcribes them into written words, translates the written words, reconverts the translated text into a voice signal, and then sends that voice signal to the other person's handset. The key to the new service was to develop and implement an algorithm able to perform the voice recognition process quickly and accurately. Taking advantage of the widespread use of cloud computing, applications that make it possible for a network computer to translate words spoken into a smartphone have recently been developed. However, for this type of application to work, it must be installed on both users' smartphones. Japan's translation service, which can be used by both smartphones and conventional cell phones, is the first of its kind in the world.


How the Phone Translation Service works: Via the cloud, speech is recorded and processed using voice recognition, machine translation and speech synthesis technology. The translation is then heard on both phones.

Applications That Make Multi-Language Conversations Possible

In October 2011, the National Institute of Information and Communications Technology (NICT) developed an application that allows several people speaking in different languages to converse via smartphone. The application, known as ChaTra, allows up to five people to converse in up to six languages: Japanese, English, Chinese, Korean, Indonesian, and Vietnamese. It utilizes VoiceTra, speech translation software launched in August 2010.
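A multi-party chat of this kind can be pictured as a room that translates each message once per listener. The sketch below is purely illustrative and does not reflect how ChaTra or VoiceTra are actually built: the `ChatRoom` class, the language codes, and the tagging "translation" stub are assumptions. Only the limits of five participants and six languages come from the description above.

```python
from typing import Dict

# Limits taken from the article; the two-letter codes are an assumed encoding for
# Japanese, English, Chinese, Korean, Indonesian, and Vietnamese.
SUPPORTED_LANGUAGES = {"ja", "en", "zh", "ko", "id", "vi"}
MAX_PARTICIPANTS = 5


def translate(text: str, source: str, target: str) -> str:
    """Translation stub: tag the text with the language pair instead of translating."""
    if source == target:
        return text
    return f"[{source}->{target}] {text}"


class ChatRoom:
    """Hypothetical chat room that fans each message out in every listener's language."""

    def __init__(self) -> None:
        self.participants: Dict[str, str] = {}  # name -> language code

    def join(self, name: str, language: str) -> None:
        if language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language: {language}")
        if len(self.participants) >= MAX_PARTICIPANTS:
            raise ValueError("room is full")
        self.participants[name] = language

    def say(self, speaker: str, text: str) -> Dict[str, str]:
        """Return what each other participant receives, in his or her own language."""
        source = self.participants[speaker]
        return {
            name: translate(text, source, language)
            for name, language in self.participants.items()
            if name != speaker
        }


if __name__ == "__main__":
    room = ChatRoom()
    room.join("Aiko", "ja")
    room.join("Ben", "en")
    room.join("Linh", "vi")
    print(room.say("Ben", "Good morning"))
```

Translating per listener rather than per language keeps the sketch short; a real service would more likely translate once per distinct target language and reuse the result.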


Screen shots of chats using ChaTra, a translation application developed by NICT. © NICT

NICT is testing speech translation technology in cooperation with research institutes around the world. It is releasing the results of these tests to private companies and the technology is already being used in applications. In December 2011, NICT's speech translation technology was provided to Narita International Airport, which has released it as NariTra, a speech translation application for travelers.

Now you can talk on the phone or take part in conferences with people who speak other languages, make reservations at foreign hotels from home (using the telephone), have a conversation with local people in a foreign café (using a face-to-face translation application), or converse in multiple languages using special applications. The possibilities are almost endless. Japanese technology is turning the dream of using translation devices to tear down language barriers between people around the world into reality. (January 2012)
