
OK.  I've been MIA from this blog for much too long.  Lame, I know.  Hopefully this entry about a cool Microsoft automotive product with speech capabilities will make up for months of lameness.

Microsoft's collaboration with Italian automaker Fiat was announced a while back, but the press momentum is building, at least in Europe, where the Blue&Me-enabled cars are shipping.  It's too bad Fiat decided not to ship the Alfa Romeo in the States.  Those are pretty sweet-looking rides.  And you can talk to one and have it talk back to you in 9 different languages!  There's also a lot of positive buzz at the CeBIT trade show and the Geneva Motor Show.

Two of my favorite quotes:

"The most interesting aspect of the new system is the fact that it is more flexible and open than any other available today."  --Quattroruote

"The love affairs between Microsoft and car manufacturers are multiplying.  Today the Fiat Group is the new bride."  -- Le Blog Auto

 

An eggcorn is a new label for a spontaneously reshaped known expression.  

Huh? 

It's when a common expression like "free rein" is semantically re-analyzed and then re-written to reflect the new analysis, like "free reign".  Eggcorns aren't just misspellings.  Nor are they examples of folk etymology (e.g. "Jerusalem artichoke" for "girasole artichoke"), malapropisms (e.g. "fortuitous" for "fortunate") or mondegreens (e.g. "Excuse me while I kiss this guy").

This website has a database of eggcorns with sample citations and a brief description.  http://eggcorns.lascribe.net/browse-eggcorns/

Some of my favorites are: to besiege someone rather than beseeching them, to get balked down rather than getting bogged down, whorefrost for hoarfrost, rebel rousing instead of rabble rousing, and the self-congratulatory whoa is me for woe is me.  (Some of these may be simple misspellings.)

One of the coolest tools of the Microsoft Speech Server (MSS) platform (and heretofore most underappreciated in my opinion) has got to be the Prompt Engine.  I think this tool is a key differentiator for MSS and deserves more attention.  Managing and using recorded prompts is one of the most tedious, time-consuming aspects of developing speech applications.  As a former VoiceXML developer, I spent countless hours writing hundreds of lines of VoiceXML code like the following: 

<prompt>
    <audio src="you_ordered_a.wav"> You ordered a </audio>
    <audio src="../numbers/{%1}.wav"> {%1} </audio>
    <audio src="pizza.wav"> pizza. </audio>
    <audio src="and_your_telephone_number_is.wav"> And your telephone number is </audio>
    <audio src="../numbers/{%2}.wav"> {%2} </audio>
    <audio src="is_that_correct.wav"> Is that correct? </audio>
</prompt>

Some of the difficulties with this approach are:

  • You have to verify that each referenced .wav file exists, is named correctly, and is in the right directory.  (As your library of prompts grows, the directory structure to house all the prompts gets more sophisticated.  The more complicated your directory structure, the more difficult it becomes to reference the .wav files.) 
  • You also have to verify that the TTS fallback string is in sync with the recorded prompt.  Dealing with thousands of prompts, there will invariably be missing prompts, mis-named .wav files, and TTS strings that don't match their referenced .wav files.  Debugging these errors can be costly and trying.  Often there is no way to know what a .wav file contains without listening to it.   
  • Deploying the thousands of recorded prompts is painful as well - one corrupted .wav file could stymie the deployment of the rest of the prompts. 
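To give a feel for the bookkeeping involved, here is a minimal Python sketch of the kind of checker you end up writing by hand with this approach. It is only an illustration: the helper names (`audio_references`, `missing_recordings`) and the set of "delivered" files are made up for the example, and the markup is a cut-down version of the prompt above without the template placeholders.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the prompt above (hypothetical, no {%n} placeholders).
PROMPT = """
<prompt>
    <audio src="you_ordered_a.wav"> You ordered a </audio>
    <audio src="pizza.wav"> pizza. </audio>
    <audio src="is_that_correct.wav"> Is that correct? </audio>
</prompt>
"""


def audio_references(markup):
    """Return (wav path, TTS fallback text) pairs found in the markup."""
    root = ET.fromstring(markup.strip())
    return [(el.get("src"), (el.text or "").strip()) for el in root.iter("audio")]


def missing_recordings(markup, available):
    """List referenced .wav paths that are not in the set of available files."""
    return [src for src, _ in audio_references(markup) if src not in available]


# Assumption for the example: only two of the three recordings were delivered.
available_files = {"you_ordered_a.wav", "pizza.wav"}
print(missing_recordings(PROMPT, available_files))
```

Note that a checker like this catches only missing files. It cannot tell you whether a recording actually says what its fallback text claims, which is the part that still requires listening to the audio.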

And here are some of the reasons why I think the Microsoft Prompt Engine is such a godsend to speech application developers:

  • With the Microsoft Prompt Engine, you build a database of all your prompts and deploy a single binary file.  No risk of inadvertently losing one or two prompts during deployment. 
  • You don't have to reference each .wav file individually - the Prompt Engine automatically finds the extractions it needs from the right .wav files.  No more relative filepaths, complicated directory structures, or filename mismatches!  In fact, you don't have to have a separate .wav file for each concatenative part of your prompt anymore.  You can extract as many words or phrases from a single .wav file as you'd like.
  • There's no sync problem with the TTS fallback string, because the same string is used for TTS fallback and referencing the recorded prompt. 

The sample prompt above is much simpler on MSS with the Prompt Engine: 

sPrompt = "You ordered a <div/> " + {%1} + " <div/> pizza <div/> and your telephone number is <div/> " + {%2} + ".  <div/> Is that correct?"

(The <div/> tags specify which phrases (extractions) to look for in the prompt database.)
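The idea behind the <div/> tags can be sketched in a few lines of Python. To be clear, this is not the MSS API or the real database format: the `PROMPT_DB` dictionary, the recording ids, and the `plan_playback` helper are all assumptions made up to show how one string can drive both the recording lookup and the TTS fallback.

```python
# Toy stand-in for the prompt database: extraction text -> recording id.
# Both the structure and the ids are hypothetical.
PROMPT_DB = {
    "you ordered a": "rec_001",
    "pizza": "rec_002",
    "and your telephone number is": "rec_003",
    "is that correct?": "rec_004",
}


def normalize(text):
    """Lower-case and collapse whitespace so lookups are forgiving."""
    return " ".join(text.lower().split())


def plan_playback(prompt, db):
    """Split on <div/> and mark each piece as a recording or a TTS fallback."""
    plan = []
    for piece in prompt.split("<div/>"):
        key = normalize(piece)
        if not key:
            continue  # ignore empty pieces between adjacent tags
        if key in db:
            plan.append(("recorded", db[key]))
        else:
            plan.append(("tts", piece.strip()))
    return plan


size, phone = "large", "5551212"
s_prompt = (
    f"You ordered a <div/> {size} <div/> pizza <div/> "
    f"and your telephone number is <div/> {phone}. <div/> Is that correct?"
)
for step in plan_playback(s_prompt, PROMPT_DB):
    print(step)
```

Because the text that is looked up is the same text handed to TTS when the lookup fails, the two can't drift apart the way a .wav filename and its fallback string can.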

Of course, keeping track of which extractions you have in the database can be just as complicated as keeping track of which .wav files you've recorded.   No problem!  With the Prompt Validator, you simply enter the text you want your application to speak, and you can find out instantly which words or phrases you're missing.  In the sample screen below, you'll see that "large" is spoken in TTS.

[Screenshot: Prompt Validation results]
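The validation idea itself is easy to sketch. The Python below is a guess at the general technique (greedy longest-match of the text against the known extractions), not a description of how the Prompt Validator is actually implemented. The `validate_prompt` function and the sample extraction list are hypothetical.

```python
import re

WORD_RE = re.compile(r"[a-z0-9']+")


def validate_prompt(text, extractions):
    """Cover the text with known extractions (longest match first).

    Returns (covered phrases, words that would fall back to TTS).
    """
    words = WORD_RE.findall(text.lower())
    known = {tuple(WORD_RE.findall(e.lower())) for e in extractions}
    known.discard(())
    longest = max((len(k) for k in known), default=0)

    covered, missing = [], []
    i = 0
    while i < len(words):
        # Try the longest possible phrase starting at word i, then shorter ones.
        for n in range(min(longest, len(words) - i), 0, -1):
            if tuple(words[i:i + n]) in known:
                covered.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            # No extraction starts here, so this word would be spoken in TTS.
            missing.append(words[i])
            i += 1
    return covered, missing


extractions = ["You ordered a", "pizza", "Is that correct?"]
covered, missing = validate_prompt(
    "You ordered a large pizza. Is that correct?", extractions
)
print("recorded:", covered)
print("TTS:", missing)
```

With this toy database, "large" is the only word reported as missing, which is the kind of gap you would then go record.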

A bit long-winded, I know, but I think the MSS Prompt Engine is really cool.  If you're a speech app developer and you haven't tried it out yet, check it out!

I recently attended a wedding where, rather than using expensive, stamped RSVP cards, my friends Boots and Evelyn had their guests communicate their replies online.  While some may balk at the non-traditional, arguably non-romantic method of tallying the number of wedding guests, the advantages are obvious:

  • no postage ($0.37 x hundreds of guests = $$$ that could be spent on more wine at the reception)
  • no expensive stationery (again, more wine! AND you save trees)
  • no fear of lost replies in the mail
  • immediate tally of guests
  • no manual counting of hundreds of little reply cards
  • you can check the tally from any connected computer

Their web-based reservation system got me thinking that this would make a decent speech application.  Imagine a Wedding.com or some such outfit deploying a speech application that lets people search for a wedding by date, city-state, or name.  (If the number of weddings got to be so large that accuracy became a problem, the grammars could be constrained by month, state, etc.)  The application could then offer any or all of the following:

  • general information on the wedding: when, where, etc.
  • RSVPs - guest lists could be uploaded ahead of time to create the grammar.  If that doesn't work, guests can record their names.  The application could solicit whatever information is necessary: number of guests, choice of entree (btw, never get the chicken at weddings).
  • gift registries (sometimes you're out running errands when you remember you need to pick up a wedding gift, but you can't remember where your friends are registered.)  If the registry stores had their own speech-based retail applications, you could even purchase the wedding gift by phone!
  • audio "guest book" that lets guests record short little messages for the couple
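Two of the ideas above, narrowing the search by month and state and building a grammar from an uploaded guest list, can be sketched roughly as follows. Everything here is an assumption for illustration: the record fields, the guest names, and the helper names are invented, and the grammar is written in the W3C SRGS XML format simply because it is a common way to express a list of alternatives.

```python
from datetime import date
from xml.sax.saxutils import escape


def candidate_weddings(weddings, month=None, state=None):
    """Keep only weddings matching the month and/or state the caller gave."""
    return [
        w for w in weddings
        if (month is None or w["date"].month == month)
        and (state is None or w["state"] == state)
    ]


def guest_grammar(names, rule_id="guest"):
    """Build a minimal SRGS XML grammar that accepts any one name on the list."""
    items = "\n".join(
        f"      <item>{escape(name)}</item>" for name in sorted(set(names))
    )
    return (
        '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"\n'
        f'         xml:lang="en-US" root="{rule_id}">\n'
        f'  <rule id="{rule_id}">\n'
        "    <one-of>\n"
        f"{items}\n"
        "    </one-of>\n"
        "  </rule>\n"
        "</grammar>"
    )


# Hypothetical data: two weddings and a short uploaded guest list.
WEDDINGS = [
    {"couple": "Boots and Evelyn", "date": date(2006, 6, 17), "state": "WA"},
    {"couple": "Sam and Alex", "date": date(2006, 9, 2), "state": "OR"},
]
GUESTS = ["Jordan Lee", "Casey Park", "Jordan Lee"]

print([w["couple"] for w in candidate_weddings(WEDDINGS, month=6, state="WA")])
print(guest_grammar(GUESTS))
```

The smaller the candidate list, the smaller the grammar the recognizer has to cope with, which is the whole point of asking for the month or state first.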

Here's the clincher - the application could call those guests who haven't replied by a certain date and get the RSVP information.  The reminder call can begin with a brief message recorded by the couple: "We really hope you can make it to Seattle next month for our wedding.  We're very excited to see you!"  Now calling people on the phone is pretty intrusive, especially when the calling party is an automated system, but I suspect most people would be happy to get such a reminder call.
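Picking who gets a reminder call is the easy part. A minimal sketch, assuming a hypothetical guest record with `name`, `phone`, and `reply` fields (where `reply` is `None` until the guest responds):

```python
from datetime import date


def guests_to_call(guests, today, deadline):
    """Return (name, phone) for guests with no reply once the deadline has passed."""
    if today < deadline:
        return []  # too early to start nagging anyone
    return [(g["name"], g["phone"]) for g in guests if g["reply"] is None]


# Hypothetical guest records; the phone numbers are placeholders.
GUEST_LIST = [
    {"name": "Jordan Lee", "phone": "555-0101", "reply": "yes"},
    {"name": "Casey Park", "phone": "555-0102", "reply": None},
    {"name": "Robin Diaz", "phone": "555-0103", "reply": "no"},
]

print(guests_to_call(GUEST_LIST, today=date(2006, 5, 20), deadline=date(2006, 5, 15)))
```

The outbound call itself, including playing the couple's recorded greeting before collecting the RSVP, would be handled by whatever telephony platform hosts the application and is not shown here.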

Have I drunk too much of the Speech Kool-Aid in thinking this would be a cool, useful application?

Further consolidation in the speech space.  See the SpeechTech Magazine announcement for details on an IV conference call to discuss the acquisition.

At the risk of sounding like a dork, here's something that I think is funny.  (Beware the "humor" of a linguist!) :)

"This book fills a much needed void..."

At first glance this statement seems to be a compliment, until the brain finishes parsing the sentence and realizes that it's actually an insult - a scathing one at that.  At best, the sentence dismisses the book as being unnecessary.  It implies that the book, by its mere existence, is a disservice to the world at large.  What is "much needed" in the sentence above is the void, not the book. 

That's why I think it's so funny that the expression "fill a much needed void" has come to be so commonly misused today.  An MSN search on "fills a much needed void" and its various forms ("filled a much needed void", "filling a much needed void", etc.) returns nearly two thousand matches, the vast majority of which misuse this expression.  Devices, books, academic disciplines, illegal aliens, the venerable Head Start program, and just about anything in the world that someone would find indispensable are all subjected to this insult by some of their most ardent fans:

  • "The iTop is a very clever device and fills a much needed void in the vast iPod accessory market."
  • "This is a great book that fills a much needed void in the psychological and legal literature."
  • "The area of Clinical Rehabilitation Psychology fills a much needed void."
  • "...illegal immigrants fill a much needed void in the labor pool (cheap labor)..."
  • "...the Head Start programs fill a much-needed void for those who can’t afford typical childcare."

Even Microsoft products are fair game:

  • "Microsoft has recently deprecated the Jet engine, so VistaDB appears to be poised to fill a much needed void."

Geoffrey Moore's Crossing the Chasm is a popular book, first published in 1991 and revised in 1999, that makes the argument that high-tech products have the unique challenge of needing to "cross the chasm" from technically savvy early adopters to mainstream audiences if they are to be successful.  One of my favorite assessments of the speech space so far (from an analyst) is that the speech industry acts as though it's already crossed the chasm.  The data would prove otherwise, and I often wonder if the industry wouldn't benefit from a little more honesty and a little less bravado.

OK, for my first blog entry, that might come across as a tad negative, but I'm a firm believer in and advocate of speech.  I just think we should stop and take stock of where in the chasm we currently are.  We've collectively annoyed a lot of users to date, and speech is NOT a given for the majority of people.  I think acknowledging these facts will make it easier to cross the chasm... for real.

Thoughts?

 