Jen's WebLog

All things speech and language related.

  • Should we make better mistakes?

    Interesting article on using common sense information in speech recognition:

    http://www.trnmag.com/Stories/2005/032305/Common_sense_boosts_speech_software_032305.html

    Users appear to like their experience better if the result (even if incorrect) is semantically plausible.

  • Blogging, recruiting, and me

    A few weeks ago, I was asked to speak about what blogging means to me at Microsoft. It's part of a women's luncheon for my division and the overall goal is to tell people about different types of community involvement.  It reminded me of how long it has been since I've actually written here.  I'm constantly taking notes on neat speech, voice, and linguistics-related things on my computer with the intent of later publishing them here. I see this blog as a way to write out my thoughts about these areas and how they relate to the current work that I do.  Almost every day it seems I find out some other neat tidbit.

    I just returned from a recruiting trip for Microsoft. It's my second campus trip and I can't even begin to explain the renewed energy it gives me for my job here at Microsoft and for Microsoft's future. There are some amazing college students out there with a tremendous amount of potential. I'm proud to be able to talk to these individuals about technology, their future and the future of Microsoft. I think it is such an amazing opportunity.

  • Intro

    Better late than never, eh?

    Now that there are more speech bloggers out there, I should probably tell you who I am.  My name is Jen, obviously, and I've been working in speech here at Microsoft for about the past three years.  My primary focus is everything desktop related, but I spend a lot of time out here reading about customer experiences, wants, needs, etc. and will probably focus on that more than anything.  I'd like to see desktop speech in more mainstream use and I'm looking for ways to make that happen.

  • Healthcare and Technology

    http://informationweek.com/story/showArticle.jhtml;jsessionid=EYVCHTVEF1PQKQSNDBCCKHSCJUMEKJVN?articleID=59200094

    CEOs are pushing healthcare technology.  Speech, handwriting, and Tablet seem like natural extensions to aid the healthcare community.

  • UK urged to make computers more human

    I ran across this article today: http://www.vnunet.com/news/1160755 about the British Computer Society encouraging UK researchers to investigate implementing more human-like behavior in computers.  The article talks some about cognitive processing, investigating mental disorders and intelligent robots.

    But, surprisingly, speech was not mentioned.  I personally think that speech is the next natural interface with computers.  But does natural equal human-like?  In my opinion, yes.  But not as the current technology stands.  I think that for people to use speech widely on desktop systems, they will want an interaction that rivals speaking to a human - more along the lines of semantic interpretation.  We need to make speech exciting for the user and provide some value.  An alternate way of typing is novel, but not that thrilling - especially given that typing is taught to most students in high school or younger.  For persons with RSI or some other disability, I realize that dictation (good dictation and command and control) may be enough.  But the other users want more - they want to accomplish something that they can't do otherwise, or accomplish something in a faster, more natural way.  Currently, I don't think there's any SR system on the market that allows for this sort of experience.

    I think we are coming close in embedded applications, but we are limited by memory.  I've seen academic demos of some really amazing applications that do add value to the end user - online recognition and translation for example.  Things like controlling smart homes are intriguing as well.  I know that when I'm lying on the couch I would love to just be able to say "set temperature to 68 degrees" and have it work.  Save me some work and I'll use SR for everything.  That's the message I'm getting from most non-impaired users.
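
    To make the "set temperature to 68 degrees" idea a little more concrete, here is a minimal sketch in Python of the command-and-control half of the problem: turning text that has already been recognized into an action.  The pattern and the function name are hypothetical and don't come from any real SR product; a real system would get this text from a recognizer and would need to handle far more variation in phrasing.

```python
import re

# Hypothetical command pattern for illustration only; not taken from
# any real speech recognition engine or smart-home API.
COMMAND = re.compile(r"^set temperature to (\d{2,3}) degrees$")


def parse_command(text):
    """Return the requested temperature as an int, or None if the
    recognized text does not match the one command we know about."""
    match = COMMAND.match(text.strip().lower())
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(parse_command("Set temperature to 68 degrees"))
    print(parse_command("make it warmer"))
```

    Notice that "make it warmer" falls straight through.  That gap between a fixed command list and what a person would naturally say is exactly the semantic interpretation problem I mean.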

     

  • So much speech, so little time.

    Dear readers, today I have not one, but two articles for you.  I don't have much time to comment, but I'll try soon:

    http://www.physorg.com/news2731.html : Multi-lingual speech based technology

    http://www.vnunet.com/news/1160633 : IBM's more natural-sounding TTS

     

  • Linguistics and the web.

    So, I'm a linguist at heart.  Give me a sentence and I'll give you a syntactic and/or semantic breakdown for your viewing pleasure.  So, Fil sent around this article today and now I'm excited about linguistic search engines and corpora and everything.  I need a sentence to parse.

    Seriously though, I often think about the relation between SR engines and syntactic / semantic grammars.  I don't know of any engines (aside from research ones) that actually consider straight syntactic information in their evaluation.  What people really want is a semantic engine.  They want to speak to the computer like they talk to you or me.  Who wouldn't?

    If you like things like this as much as me (which is doubtful) then check out these fun places.  I won't tell.

    Original article: http://www.economist.com/science/displayStory.cfm?story_id=3576374

    Linguist's search engine: http://lse.umiacs.umd.edu:8080/ - so you can look for things like <noun phrase> <verb phrase> instead of specific words

    Language Log: http://itre.cis.upenn.edu/~myl/languagelog/ - a blog for linguistics!  My life is complete.

     

     

  • Speech Recognition instead of call centers?

    I was pointed to this article that talks about using AIM for conversations between hearing-impaired and non-hearing-impaired individuals.  This allows the two parties to communicate without the use of a TTY terminal.

    Still, there is a call center transcribing the information.  This seems like a really interesting application for speech recognition.  Assuming accuracy was high enough, this would remove the potential awkwardness of speaking while there was a transcriber present.  I know there are other things to consider here, but to me, the link seems obvious.

    http://www.betanews.com/article/AOL_MCI_Offer_Phone_Numbers_to_Deaf_with_IM/1102908638 

  • IBM Releases 500 Patents

    IBM has announced that they are releasing 500 of their patents to the open development community.  Speech related ones are below:

    • US6199043 Conversation management in speech recognition interfaces
    • US5649060 Automatic indexing and aligning of audio and text using speech recognition
    • US5636325 Speech synthesis and analysis of dialects
    • US6185529 Speech recognition aided by lateral profile image
    • US5615299 Speech recognition using dynamic features
    • US5615296 Continuous speech recognition and voice response system and method to enable conversational dialogues with microprocessors
    • US5263117 Method and apparatus for finding the best splits in a decision tree for a language model for a speech recognizer
    • US5222146 Speech recognition apparatus having a speech coder outputting acoustic prototype ranks

    See: http://news.bbc.co.uk/2/hi/technology/4163975.stm and http://www.ibm.com/news/us/en/2005/01/patents.html for a list of patents.


© 2008 Microsoft Corporation. All rights reserved.