Speech Recognition « Another Word For It

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 29, 2018

Processing “Non-Hot Mike” Data (Audio Processing for Data Scientists)

Filed under: Ethics,Politics,Privacy,Speech Recognition — Patrick Durusau @ 6:32 pm

A “hot mike” is one that is transmitting your comments, whether you know the mike is activated or not.

For example, a “hot mike” in 2017 caught this jewel:

Israeli Prime Minister Benjamin Netanyahu called the European Union “crazy” at a private meeting with the leaders of four Central European countries, unaware that a microphone was transmitting his comments to reporters outside.

“The EU is the only association of countries in the world that conditions the relations with Israel, that produces technology and every area, on political conditions. The only ones! Nobody does it. It’s crazy. It’s actually crazy. There is no logic here,” Netanyahu said Wednesday in widely reported remarks.

Netanyahu was meeting with the leaders of Hungary, Slovakia, Czech Republic and Poland, known as the Visegrad Group.

The microphone was switched off after about 15 minutes, according to reports.
…

A common aspect of “hot mike” comments is that the speaker knew the microphone was present but assumed it was turned off. In “hot mike” cases, the speaker is known and the relevance of their comments is usually obvious.

But what about “non-hot mike” comments? That is, comments made by a speaker with no sign of a microphone?

Say, casual conversation in a restaurant, at a party, in a taxi, at home or work, or anywhere in between?

Laws governing the interception of conversations are vast and complex, so before processing any conversation data you suspect was intercepted, seek legal counsel. This post assumes you have been properly cautioned and have chosen to proceed with processing conversation data.

Royal Jain, in Intro to audio processing world for a Data scientist, begins a series of posts to help bridge the gap between NLP and speech/audio processing. Jain writes:

Coming from NLP background I had difficulties in understanding the concepts of speech/audio processing even though a lot of underlying science and concepts were the same. This blog series is an attempt to make the transition easier for people having similar difficulties. The First part of this series describes the feature space which is used by most machine learning/deep learning models.
…

Looking forward to more posts in this series!
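Jain's post is not reproduced here, so exactly which features it covers is an assumption. One representation commonly fed to speech models is the log-power spectrogram: slice the waveform into short overlapping frames, window each frame, and take the magnitude of its Fourier transform. A minimal sketch using only the Python standard library follows; the function names, frame length, hop size and test tone are all illustrative choices, not anything from Jain's series.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def log_power_spectrogram(signal, frame_len=64, hop=32):
    """Return one row per frame; each row holds log power (dB) for bins 0..frame_len//2."""
    window = hann_window(frame_len)
    spectrogram = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        row = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for bin k; a real pipeline would use an FFT instead.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            # Small constant avoids log10(0) on silent bins.
            row.append(10 * math.log10(abs(coeff) ** 2 + 1e-10))
        spectrogram.append(row)
    return spectrogram


# Hypothetical input: a 1 kHz tone sampled at 8 kHz (values chosen for illustration).
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]
spec = log_power_spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), "frames; strongest bin in first frame:", peak_bin)
```

With a 64-sample frame at 8 kHz each bin spans 125 Hz, so the 1 kHz tone should dominate bin 8 in every frame. For real audio you would swap the naive DFT for an FFT routine and usually go on to mel-scaled or cepstral features.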

Data science ethics advocates will quickly point out that privacy concerns surround the interception of private conversations.

They’re right!

But when the privacy in question belongs to those who plan, fund and execute regime-change wars, killing hundreds of thousands and making refugees out of millions more, generally increasing human misery on a global scale, I have an answer to the ethics question. My question is one of risk assessment.

You?


January 24, 2018

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

Filed under: Adversarial Learning,Speech Recognition — Patrick Durusau @ 4:56 pm

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text by Nicholas Carlini and David Wagner.

Abstract:

We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (at a rate of up to 50 characters per second). We apply our iterative optimization-based attack to Mozilla’s implementation DeepSpeech end-to-end, and show it has a 100% success rate. The feasibility of this attack introduce a new domain to study adversarial examples.

You can consult the data used and code at: http://nicholas.carlini.com/code/audio_adversarial_examples.

Important not only for defeating automatic speech recognition but also for establishing that the properties of audio recognition differ from those of visual recognition.

It is also a hint that automatic recognition properties cannot be assumed for unexplored domains.
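To make the "iterative optimization-based attack" in the abstract a little more concrete, here is a toy sketch. It is not the authors' method and does not touch DeepSpeech: the "recognizer" is a hypothetical random linear classifier, the "audio" is random noise, and the attack is plain projected sign-gradient ascent. The only point is the shape of the loop: repeatedly nudge the input toward a chosen target output while keeping every sample within a small bound (`eps`, an assumed value) of the original.

```python
import random


def logits(weights, x):
    """Scores of a toy linear 'recognizer': one dot product per class."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]


def predict(weights, x):
    scores = logits(weights, x)
    return max(range(len(scores)), key=lambda c: scores[c])


def targeted_attack(weights, x, target, eps=0.02, steps=100):
    """Nudge x toward `target`, keeping every sample within eps of the original."""
    step_size = eps / 10
    adv = list(x)
    for _ in range(steps):
        scores = logits(weights, adv)
        # Strongest competing class (any class other than the target).
        rival = max((c for c in range(len(scores)) if c != target),
                    key=lambda c: scores[c])
        if scores[target] > scores[rival]:
            break  # the toy model now outputs the target label
        for i in range(len(adv)):
            # Gradient of (target score - rival score) with respect to sample i.
            grad = weights[target][i] - weights[rival][i]
            moved = adv[i] + step_size * ((grad > 0) - (grad < 0))
            # Project back into the allowed band around the original sample.
            adv[i] = min(x[i] + eps, max(x[i] - eps, moved))
    return adv


# Hypothetical model and input, generated at random purely for illustration.
random.seed(0)
N_SAMPLES, N_CLASSES = 4000, 3
weights = [[random.gauss(0, 1) for _ in range(N_SAMPLES)] for _ in range(N_CLASSES)]
audio = [0.1 * random.gauss(0, 1) for _ in range(N_SAMPLES)]

original = predict(weights, audio)
target = (original + 1) % N_CLASSES
adversarial = targeted_attack(weights, audio, target)
max_change = max(abs(a - b) for a, b in zip(adversarial, audio))
print("original:", original, "target:", target,
      "after attack:", predict(weights, adversarial), "max change:", max_change)
```

Because the input has thousands of samples, many tiny per-sample changes add up to a large shift in the model's scores. That is the same intuition behind near-identical waveforms that transcribe differently, although the real attack has to optimize through a full speech-to-text network.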


December 19, 2014

DeepSpeech: Scaling up end-to-end speech recognition [Is Deep the new Big?]

Filed under: Deep Learning,Machine Learning,Speech Recognition — Patrick Durusau @ 5:18 pm

DeepSpeech: Scaling up end-to-end speech recognition by Awni Hannun, et al.

Abstract:

We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a “phoneme.” Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called DeepSpeech, outperforms previously published results on the widely studied Switchboard Hub5’00, achieving 16.5% error on the full test set. DeepSpeech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems.
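The "16.5% error" figure is easier to read with the metric in hand. Results on Switchboard are conventionally reported as word error rate (WER): the word-level edit distance between the reference transcript and the system output, divided by the number of reference words. A minimal sketch, with a made-up sentence pair and without the text normalization that official scoring tools apply:

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative pair: one substitution ("sat" -> "sit") and one deletion ("the").
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.1%}")
```

Here two errors against six reference words give a WER of one third. By the same arithmetic, 16.5% corresponds to roughly one word in six being substituted, dropped or inserted.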

Although the academic papers, so far, are using “deep learning” in a meaningful sense, early 2015 is likely to see many vendors rebranding their offerings as incorporating or being based on deep learning.

When approached with any “deep learning” application or service, check the Internet Archive Wayback Machine to see how the vendor was marketing its software/service before “deep learning” became popular.

Is there a GPU-powered box in your future?

I first saw this in a tweet by Andrew Ng.

Update: After posting I encountered: Baidu claims deep learning breakthrough with Deep Speech by Derrick Harris. Harris talks to Andrew Ng; a great write-up.
