{"id":335379,"date":"2017-03-09T15:45:00","date_gmt":"2017-03-09T07:45:00","guid":{"rendered":"https:\/\/www.engadget.com\/2017\/03\/09\/baidu-deep-voice-natural-sounding-speec\/21|21876774"},"modified":"2017-03-09T15:45:00","modified_gmt":"2017-03-09T07:45:00","slug":"baidus-deep-voice-can-quickly-synthesize-realistic-human-speech","status":"publish","type":"post","link":"https:\/\/people.utm.my\/asmawisham\/baidus-deep-voice-can-quickly-synthesize-realistic-human-speech\/","title":{"rendered":"Baidu&#8217;s Deep Voice can quickly synthesize realistic human speech"},"content":{"rendered":"<p>Google&#8217;s WaveNet can also synthesize realistic human speech, but it&#8217;s quite computationally demanding and hard to use for real-world applications at this point. Baidu says it solved WaveNet&#8217;s problem by using deep-learning techniques to convert text to phonemes, the smallest units of speech. It then turns those phonemes into sounds using its speech synthesis network. The system converts the word &#8220;hello,&#8221; for instance, into &#8220;(silence HH), (HH, EH), (EH, L), (L, OW), (OW, silence)&#8221; before the speech network pronounces it.<\/p>\n<p>Both steps rely on deep learning and don&#8217;t need human input. However, the system doesn&#8217;t control which phonemes or syllables are stressed and how long they&#8217;re pronounced. That&#8217;s where Baidu steps in &#8212; it switches them around to change the emotions it wants to convey.<\/p>\n<p>While the company says Deep Voice has solved WaveNet&#8217;s problem, it still requires a ton of computing power. A computer has to generate words to say in 20 microseconds to mimic human-like interaction. 
Baidu&#8217;s researchers explain:<\/p>\n<blockquote readability=\"10\">\n<p>&#8220;To perform inference at real-time, we must take great care to never recompute any results, store the entire model in the processor cache (as opposed to main memory), and optimally utilize the available computational units.&#8221;<\/p>\n<\/blockquote>\n<p>Still, the researchers believe real-time speech synthesis is possible. They&#8217;ve already created quickly generated samples and collected feedback through <a href=\"https:\/\/www.engadget.com\/2014\/12\/03\/amazon-mechanical-turk-workers-ask-for-respect\/\">Amazon&#8217;s Mechanical Turk<\/a>. They asked a large number of people through the service to rate the quality of their samples, and the results indicate that they&#8217;re of excellent quality.<\/p>\n","protected":false},"excerpt":{"rendered":"<p> <img decoding=\"async\" src=\"http:\/\/o.aolcdn.com\/dims-shared\/dims3\/GLOB\/crop\/3504x2336+0+0\/resize\/1600x1067!\/format\/jpg\/quality\/85\/http:\/\/o.aolcdn.com\/hss\/storage\/midas\/35028999a9b5090014b6bb1f862bb9c6\/202127669\/458699011.jpg\" \/>Baidu has been quietly working on other projects besides self-driving cars at its AI center in Silicon Valley, and now it has revealed one of them to MIT&#039;s Technology Review. 
Apparently, the Chinese tech titan has created a text-to-speech system call&#8230; <\/p>\n","protected":false},"author":5817,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[23],"tags":[27,59,26],"class_list":["post-335379","post","type-post","status-publish","format-standard","hentry","category-media","tag-engadget","tag-media","tag-technology"],"_links":{"self":[{"href":"https:\/\/people.utm.my\/asmawisham\/wp-json\/wp\/v2\/posts\/335379","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/people.utm.my\/asmawisham\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/people.utm.my\/asmawisham\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/people.utm.my\/asmawisham\/wp-json\/wp\/v2\/users\/5817"}],"replies":[{"embeddable":true,"href":"https:\/\/people.utm.my\/asmawisham\/wp-json\/wp\/v2\/comments?post=335379"}],"version-history":[{"count":0,"href":"https:\/\/people.utm.my\/asmawisham\/wp-json\/wp\/v2\/posts\/335379\/revisions"}],"wp:attachment":[{"href":"https:\/\/people.utm.my\/asmawisham\/wp-json\/wp\/v2\/media?parent=335379"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/people.utm.my\/asmawisham\/wp-json\/wp\/v2\/categories?post=335379"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/people.utm.my\/asmawisham\/wp-json\/wp\/v2\/tags?post=335379"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}