US20020072914A1 - Method and apparatus for creation and user-customization of speech-enabled services - Google Patents

Method and apparatus for creation and user-customization of speech-enabled services

Info

Publication number
US20020072914A1
Authority
US
United States
Prior art keywords
exemplar
natural language
action
variant
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US09/732,600
Inventor
Hiyan Alshawi
Shona Douglas
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nuance Communications Inc
Original Assignee
AT&T Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AT&T Corp
Priority to US09/732,600 (US20020072914A1)
Assigned to AT&T CORP. Assignment of assignors interest (see document for details). Assignors: ALSHAWI, HIYAN; DOUGLAS, SHONA
Priority to EP01310087A (EP1215657A3)
Priority to US10/103,049 (US7212964B1)
Publication of US20020072914A1
Priority to US11/215,756 (US7912726B2)
Priority to US11/656,155 (US7467081B2)
Priority to US12/336,429 (US8073683B2)
Assigned to NUANCE COMMUNICATIONS, INC. Assignment of assignors interest (see document for details). Assignors: AT&T INTELLECTUAL PROPERTY II, L.P.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/45 Example-based machine translation; Alignment
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4936 Speech interaction details
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2201/00 Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M2201/40 Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition

Definitions

  • This invention relates generally to speech recognition technology. More particularly, the invention relates to development and customization of spoken language interfaces for a plurality of speech-enabled systems and sub-systems.
  • Speech recognition technology has been applied to a variety of interactive spoken language services to reduce costs.
  • Services benefiting from a spoken language interface may include, for example, services providing products and/or services, e-mail services, and telephone banking and/or brokerage services.
  • Speech-enabled systems permit users to verbally articulate to the system a command relating to desired actions. The speech-enabled system recognizes the command and performs the desired action.
  • the underlying technologies utilized in such speech-enabled systems include, for example, speech recognition and speech synthesis technologies, computer-telephony integration, language interpretation, and dialog and response generation technologies.
  • The role of each technology in speech-enabled systems is described below briefly.
  • speech recognition technology is used to convert an input of human speech into a digitized representation.
  • speech synthesis takes a digitized representation of human speech or a computer-generated command and converts these into outputs that can be perceived by a human—for example, a computer-generated audio signal corresponding to the text form of a sentence.
  • Known computer-telephony integration technology is typically used to interface the telephony network (which may be switched or packet-based) to, for example, a personal computer having the speech recognition and speech synthesis technologies.
  • the computer-telephony platform can send and receive, over a network, digitized speech (to support recognition and synthesis, respectively) to and from a user during a telephony call.
  • the computer-telephony integration technology is used to handle telephony signaling functions such as call termination and touch-tone detection.
  • Language interpretation systems convert the digitized representation of the human speech into a computer-executable action related to the underlying application and/or service for which the spoken language interface is used—for example, a speech-enabled e-mail service.
  • the dialog and response generation systems generate and control the system response for the speech-enabled service which may correspond to, for example, the answer to the user's question, a request for clarification or confirmation, or a request for additional information from the user.
  • the dialog and response systems typically utilize the speech synthesis systems or other output devices (e.g., a display) to present information to the user.
  • the dialog and response generation component may be responsible for predicting the grammar (also called the “language model”) that is to be used by the speech recognizer to constrain or narrow the required processing for the next spoken input by the user. For example, in a speech-enabled e-mail service, if the user has indicated the need to retrieve messages, the speech recognizer may limit its processing to the message-retrieval commands that the user is likely to use.
  • semantic representations are computer data structures or code intended to encode the meaning of a sentence (or multiple sentences) spoken by a user (e.g., in a language interpretation system), or to encode the intended meaning of the system's response to the user (e.g., in a dialog and response generation system).
  • Various types of such intermediate semantic representations are used including hierarchically embedded value-attribute lists (also called “frames”) as well as representations based on formal logic.
  • the language interpretation component converts the recognized word sequence (or digitized representation) into an instance of the intermediate semantic representation.
  • Various means have been used for this conversion step, including conversion according to rules that trigger off of keywords and phrases, and conversion according to a manually written or statistically trained transition network.
  • the resulting intermediate representation is then mapped into the actual executable application actions. This second conversion phase is often achieved by an exhaustive set of manually authored rules or by a computer program written specifically for this spoken language application.
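  • As a purely illustrative sketch (not taken from the patent), a frame-style intermediate semantic representation for a request such as “get my new messages from Kathleen” might be written as a nested attribute-value structure like the following Python dictionary; all attribute names here are invented.

        # Hypothetical hierarchically embedded value-attribute list ("frame")
        # of the kind used by conventional two-step interpreters; the keys
        # and values are illustrative only.
        frame = {
            "act": "retrieve",
            "object": {
                "type": "email_message",
                "filter": {"sender": "Kathleen", "status": "new"},
            },
        }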
  • This approach requires programming experts familiar with the speech-enabled interfaces and programming experts familiar with the underlying application programs. As a result, speech-enabled interfaces using this approach can be very expensive to develop and/or customize.
  • Another conventional method uses customized software modules for interfacing with the language interpretation system to determine which application-specific action to execute for a given recognized input sequence.
  • customized software modules need to be developed for each application and for handling the various application-specific commands.
  • this conventional approach for developing speech-enabled interfaces can be costly due to increased development times.
  • What is needed is a system and method for creating and customizing speech-enabled services that may solve the difficulties encountered using conventional approaches. For example, what is needed is an efficient speech-enabled interface that is not only robust and flexible, but can also be easily customized by users so that personal language preferences can be used.
  • Embodiments of the invention relate to a system and method for providing speech-enabled application programs.
  • the speech-enabled programs automatically execute requests input by users.
  • One or more natural language variants may be mapped with at least one natural language exemplar.
  • the natural language exemplar may correspond to a typical way to express a request relevant to the speech-enabled application program.
  • the natural language variant may correspond to an alternative way of expressing the same request.
  • a recognized input string is received and a prospective variant that most resembles the received recognized input string is selected from the natural language variants.
  • the natural language exemplar mapped to the prospective variant is identified.
  • An action instruction associated with the identified natural language exemplar is executed to fulfill the user's request.
  • users of the system can create a plurality of personalized natural language variants that represent preferred ways of expressing the desired requests. Accordingly, the system may be able to recognize the plurality of variants and execute the action as specified by the user's request.
  • FIG. 1 is a diagrammatic representation of a system in accordance with embodiments of the present invention.
  • FIG. 2 is a block diagram illustrating a system in accordance with an embodiment of the present invention.
  • FIG. 3 is a flow chart illustrating a method in accordance with an embodiment of the present invention.
  • FIGS. 4A and 4B show a flow chart illustrating an exemplary method in accordance with an embodiment of the present invention.
  • FIG. 5 is a diagrammatic representation of a customization module for use in the system as shown in FIG. 1.
  • Embodiments of the present invention relate to the creation, development and customization of spoken language interfaces for a plurality of speech-enabled services.
  • the invention provides a natural language interface to permit programmers and users to create and/or customize spoken language interfaces.
  • the invention may provide an efficient and cost-effective way of developing spoken language interfaces that can be easily adapted to different systems or services—for example, messaging systems, auction systems, or interactive voice recognition (IVR) systems.
  • the spoken language interface can be easily customized by end users based on their personal preferences and speech habits.
  • Embodiments of the present invention may use natural-language to natural-language mapping between user-specified commands and commands specified by, for example, an application program developer.
  • Embodiments of the present invention provide an efficient system for executing a plurality of user commands that may map to a finite number of executable actions as specified by the program developer. Accordingly, the program developer may need only specify a finite number of exemplary English (or other language) commands that may be related to application actions. These exemplary English commands may be mapped with a plurality of English variations that a user may use for the desired action. The user can customize the English variations to create preferred commands to execute a desired action.
  • Referring to FIG. 1, a block diagram of a speech-enabled system in accordance with embodiments of the present invention is shown.
  • User 101 may use terminal device 102 for access to the application program 115 .
  • the terminal device 102 may be, for example, a personal computer, a telephone, a mobile phone, a hand-held device, personal digital assistant (PDA) or other suitable device having suitable hardware and/or software to connect with network 120 and access application program 115 .
  • Terminal device 102 may be installed with suitable hardware and software, for example, an Internet browser and a modem for connection to the Internet.
  • Network 120 may include, for example, a public switched telephone network (PSTN), a cellular network, an Internet, an intranet, satellite network and/or any other suitable national and/or international communications network or combination thereof.
  • Network 120 may include a plurality of communications devices (e.g., routers, switches, servers, etc.) including at least one computer telephony platform 121 .
  • Platform 121 may be a high-capacity computer and/or server that has the capacity to send, receive, and/or process digitized speech (e.g., to support speech recognition and synthesis functions).
  • Platform 121 may be equipped to interface with a switch-based or packet-based network 120 .
  • platform 121 may be equipped with telephony interface to handle telephony signaling functions such as call termination and touch tone detection.
  • Platform 121 may be located within the network 120 or, alternatively, it may be located outside network 120 .
  • Platform 121 may serve as a gateway interface to spoken language processor (SLP) 104 .
  • Platform 121 may receive data from terminal device 102 and dispatch this information to SLP 104 .
  • platform 121 may dispatch data from SLP 104 to the terminal device 102 .
  • SLP 104 may be coupled to network 120 to provide a speech-enabled interface for an application programming interface (API) 114 and corresponding application or service 115 .
  • Application program 115 may support services or systems, for example, messaging systems, auction systems, interactive voice recognition (IVR) systems, or any other suitable system or service that may utilize a spoken language interface for automation.
  • API 114 may be a software interface that application 115 may use to request and carry out lower-level services performed by a computer's or telephone system's operating system.
  • An API 114 may include, for example, a set of standard software interrupts, calls, and data formats used by application program 115 to interface with network services, mainframe communications programs, telephone equipment or program-to-program communications.
  • SLP 104 may include a plurality of components, for example, an output synthesizer 105 , recognizer 106 , variation matcher 107 , variant database 108 , exemplar adjuster 110 , action invoker 111 and context specifications database 112 . It is recognized that output synthesizer 105 may provide data that can be presented to user's terminal device 102 .
  • Output synthesizer 105 and recognizer 106 are known per se.
  • Output synthesizer 105 may be a speech synthesizer or display formatter for delivering information to the user 101 .
  • the display formatter may produce data suitable for presentation on any physical display, for example, a cathode ray tube (CRT), liquid crystal display (LCD), flat plasma display, or any other type of suitable display.
  • any suitable speech synthesizer that can take unrestricted text or digital data as input and convert this text or data into an audio signal for output to the user 101 may be used in embodiments of the invention.
  • Recognizer 106 may receive a natural language request in the form of, for example, an audio or analog signal S from user 101 and may convert this signal into a digitized data string or recognized word sequence, W.
  • Signal S may be converted by the terminal device 102 and/or platform 121 to travel across network 120 and is delivered to recognizer 106 .
  • Digitized data string W may represent the natural language request S in the form of a digital signal as output by recognizer 106 .
  • W may be a sequence of words in text form.
  • Recognizer 106 may use any known process or system to convert signals S into a data string W.
  • recognizer 106 can load and switch between language models dynamically.
  • the English language is referred to herein as the spoken language for use with the speech-enabled services.
  • the present invention can be applied to spoken language interfaces for other languages.
  • the user's terminal device 102 may include a handwriting recognition device, keyboard, and/or dial pad the user 101 may use to input a command and generate signal S.
  • the generated signal S may be delivered to the recognizer 106 and processed as described above.
  • variation matcher 107 may use variant database 108 and a variation matching function (not shown) to map the digitized data string W (or the recognized word sequence) into an exemplary English sentence E (i.e., an exemplar).
  • the exemplar E may correspond to a typical way, as defined by an applications developer, of phrasing a particular request relevant to the current application program 115 .
  • the variation matcher 107 may further compute a string mapping function ⁇ that may indicate the difference in meaning between the recognized digitized data string W and the exemplar E.
  • variant database 108 may contain a language model (L) 130 related to a particular context and/or the application program or service 115 , currently accessed by the user 101 .
  • the language model 130 may be derived from the plurality of variant command files 109 using techniques known in the art for generating speech recognition language models from collections of sentences.
  • each file 109 may be pertinent to particular context C corresponding to language model 130 .
  • variant database 108 may contain a single language model 130 relating to a particular application program or, alternatively, may contain a plurality of language models 130 relating to various application programs.
  • Variant command file 109 for context C may contain, for example, an exemplar E1 related to context C and associated variants V1,1 to V1,n.
  • variant database 108 may store a set of related data of the form (C, V, E), where each V is an alternative way to phrase in natural language a request for the action A that is associated with exemplar E in context C. Since exemplars E may also be valid ways of phrasing application actions A, they are included in the variant database 108 as “variants” of themselves.
  • a set of exemplars E1 to Em associated with the particular context C of an application program may be provided by the developer of the spoken language interface or of the application program.
  • the developer may be an expert in the application API 114 .
  • the developer need not be an expert in speech-enabled services.
  • Each exemplar E may represent an exemplary way of phrasing, in English or any other suitable language, a particular executable command or action for context C (as will be discussed below in more detail).
  • the developer may map exemplar E to action A.
  • each file 109 may contain a plurality of English variants V1,1 to Vm,n.
  • Variants V1,1 to V1,n may represent different ways of saying or representing corresponding exemplar E1; variants V2,1 to V2,k may represent different ways of saying exemplar E2; etc.
  • These variants V1,1 to Vm,n may be created by anyone without requiring any relevant expertise or knowledge of the application program and/or speech-enabled technologies.
  • the user 101 may create variant V1,1 that represents the manner in which the user 101 typically refers to the desired action represented by exemplar E1.
  • the created variant(s) V1,1 to V1,n may be mapped to their associated exemplar E1 for a particular context C, for example, in the form (C, V, E), as indicated above.
  • C may correspond to the context of reading e-mail messages
  • E1 may be the exemplar “Retrieve my mail messages”
  • the variants may include, for example, V1,1 “get my mail”, V1,2 “fetch my e-mail”, V1,3 “fetch messages”.
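  • As a minimal illustrative sketch (not the patent's own code), such (C, V, E) records for the e-mail example above could be held in a simple Python structure; the context name "read_email" and the helper function are assumptions introduced here.

        # Hypothetical in-memory variant command file for a "read_email"
        # context C. Each record maps a variant V to the exemplar E it
        # paraphrases; exemplars also appear as variants of themselves,
        # as described above.
        VARIANT_DB = [
            # (context C, variant V, exemplar E)
            ("read_email", "Retrieve my mail messages", "Retrieve my mail messages"),
            ("read_email", "get my mail",               "Retrieve my mail messages"),
            ("read_email", "fetch my e-mail",           "Retrieve my mail messages"),
            ("read_email", "fetch messages",            "Retrieve my mail messages"),
        ]

        def variants_for_context(context):
            """Return the (variant, exemplar) pairs active in a given context."""
            return [(v, e) for (c, v, e) in VARIANT_DB if c == context]

        print(variants_for_context("read_email"))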
  • context specifications database 112 may contain a set of exemplar action specification files 113 for one application program or a plurality of different application programs.
  • Exemplar action files 113 may correspond and/or relate to variant files 109 .
  • the variants in a variant file 109 may be used to express the actions A in a corresponding action file 113 , and A may be available for execution by action invoker 111 .
  • exemplar-action specification file 113 may contain a plurality of contexts C1 to Cm, a plurality of associated exemplars E1 to Em, associated actions A1 to Am, and pointers to next contexts C′1 to C′x. Accordingly, each exemplar-action specification file 113 may contain a list of “exemplar-action” records stored or correlated as (C, E, A, C′). Each record (C, E, A, C′) may associate the exemplar E with a sequence A of action strings in the command language executable by the action invoker 111 in context C, and an identifier C′ of another, or the same, application context specification.
  • each exemplary action specification file in the set of files 113 may correspond to a stage of interaction with the user.
  • the application program is a speech-enabled e-mail service
  • the first action specification file 113 may contain actions relating to logging on, or identification of the user to the service
  • a second action specification file 113 may contain actions relating to sending or retrieving e-mail messages.
  • action specification file 113 related to actions required to identify the user may be activated, followed by activating the action specification file for, for example, retrieving e-mail messages.
  • the second action specification file may contain exemplars E and associated actions A relating to retrieving messages, for example, retrieving new messages, retrieving previously read messages, etc.
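  • A hedged sketch of such (C, E, A, C′) records, continuing the Python-style illustration above; the contexts, exemplars, and action strings below are invented for illustration and are not taken from the patent.

        # Hypothetical exemplar-action specification for a speech-enabled
        # e-mail service. Each record ties an exemplar E to an executable
        # action string A and names the next context C' to activate.
        EXEMPLAR_ACTIONS = [
            # (context C, exemplar E, action string A, next context C')
            ("login",      "Log me in as Joe",
             "login(user='Joe')",          "read_email"),
            ("read_email", "Retrieve my mail messages",
             "get_messages()",             "read_email"),
            ("read_email", "Get the messages with sender Joe",
             "get_messages(sender='Joe')", "read_email"),
        ]

        def action_for_exemplar(context, exemplar):
            """Look up the action string and next context paired with an exemplar."""
            for c, e, a, next_c in EXEMPLAR_ACTIONS:
                if c == context and e == exemplar:
                    return a, next_c
            return None

        print(action_for_exemplar("read_email", "Retrieve my mail messages"))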
  • a language model L for use by the speech recognizer may be built for each context, based on the variants specified for that context. These models may be augmented with lists of proper names that may be used instead of those present in the exemplars E and variants V. Standard techniques for language modeling can be used for deriving the language models from the set of variants.
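  • As a toy stand-in for such a per-context language model (real systems would derive n-gram or grammar-based models), the word list a recognizer grammar could be constrained to might be built from the variants plus a proper-name list; everything below is an illustrative assumption.

        # Toy per-context vocabulary derived from the variants active in a
        # context and augmented with proper names, as described above.
        CONTEXT_VARIANTS = {
            "read_email": ["Retrieve my mail messages", "get my mail",
                           "fetch my e-mail", "fetch messages"],
        }
        PROPER_NAMES = ["Joe", "Kathleen"]

        def build_vocabulary(context):
            words = {w for v in CONTEXT_VARIANTS[context] for w in v.lower().split()}
            words.update(name.lower() for name in PROPER_NAMES)
            return sorted(words)

        print(build_vocabulary("read_email"))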
  • variant database 108 and context specification database 112 are shown as two different databases, it is recognized that variant database 108 and context specification database 112 may be consolidated into a single database. It should be noted that descriptions of data flow and data configuration in databases 108 and 112 are given by way of example and variations may be made by one of ordinary skill in the art. For example, variations to the variant command files 109 or configuration or flow of the included data (e.g., C, V, E) and/or to the exemplar action specification files 113 may be made by one of ordinary skill in the art.
  • recognizer 106 may produce a digitized data string W that matches exemplar E or variant V, stored in the variant database 108 , exactly and the system can then proceed with invoking the corresponding application action A.
  • an exact match between data string W and a corresponding exemplar E or variant V may not be found.
  • recognition errors, requests for actions involving different objects (e.g., using different names) from those in the exemplars E or variants V, linguistic variation in the user utterances (including variants from their own customizations) and/or any combination of variations thereof may prevent exact matches from being found.
  • variation matcher 107 may seek to select a prospective variant V, in active context C, that most resembles, or most closely matches, the natural language request as represented by digitized data W.
  • Variation matcher 107 may also specify the necessary changes or adaptations (i.e., string mapping function Φ) to be made to, for example, the selected exemplar E and its associated action A, as described below.
  • Any known technique may be used to determine whether, for example, a given text or data sequence (e.g., a prospective variant) most resembles or closely matches the recognized word sequence. For example, known mathematical algorithms (as described below) may be applied to find such matches.
  • exemplar adjuster 110 may receive the exemplar E and string mapping function ⁇ from the variation matcher 107 .
  • Exemplar adjuster 110 with input from context specifications database 112 may apply the string mapping function ⁇ to an application action A (an API call) that is paired with the exemplar E (e.g., from the context specifications database) to produce the actual API call or adapted action A′.
  • Adapted action A′ may then be executed by the action invoker 111 to carry out the user's request.
  • Exemplar adjuster 110 may apply necessary adaptations to the action strings A to be invoked by the application and to the exemplar E (e.g., for confirmation purposes).
  • variation matcher 107 may compute a function f taking an input W and a sequence <(V1, E1), . . . , (Vn, En)> of pairs of strings.
  • the output of f may be one of the input pairs (Vi, Ei) together with a string mapping function Φ, that is:
  • the selected pair (Vi, Ei) may be the first pair in the input sequence for which the string distance function Δ is minimal: Δ(W, Vi) ≤ Δ(W, Vj) for all j from 1 to n.
  • String mapping function ⁇ may include a sequence of string editing operations, specifically insertions, deletions, and substitutions.
  • Exemplar adjuster 110 may fetch the action Ai associated with the exemplar Ei, where i may be any integer from 1 to m.
  • a second string mapping function ⁇ ′ may be derived from ⁇ , including only those string editing operations that are valid transformations of the action string Ai.
  • a valid transformation may be one that results in an action string A that is well formed in the sense that it is parsed successfully by the action invoker 111 .
  • Second string mapping function Φ′ is then applied by the exemplar adjuster 110 to both the exemplar Ei and the action Ai to produce the “adapted” pair (E′i, A′i).
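  • A hedged illustration of this adjustment step, assuming Φ is represented as a list of (operation, old, new) edit operations and approximating the well-formedness check by requiring that a substitution's old token actually occur in the action string; the action-string syntax is invented.

        # Illustrative exemplar adjuster: derive Phi' by keeping only the
        # substitutions that leave the action string well formed
        # (approximated here by a simple occurrence check), then apply
        # Phi' to both the exemplar and the action.
        def adjust(exemplar, action, phi):
            phi_prime = [(op, old, new) for (op, old, new) in phi
                         if op == "substitute" and old in action]
            e_adapted, a_adapted = exemplar, action
            for _op, old, new in phi_prime:
                e_adapted = e_adapted.replace(old, new)
                a_adapted = a_adapted.replace(old, new)
            return e_adapted, a_adapted

        # The misrecognized insertion of "few" is not a valid transformation
        # of the action string, while substituting "Joe" by "Kathleen" is.
        phi = [("insert", None, "few"), ("substitute", "Joe", "Kathleen")]
        print(adjust("Get the messages with sender Joe",
                     "get_messages(sender='Joe')", phi))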
  • the string distance ⁇ is the string edit distance and ⁇ is the corresponding edits found by the dynamic programming algorithm used to compute the minimal edit distance.
  • Such edit-distance computation algorithms are known in computer science and have been used in various applications such as document search and evaluating the outputs of speech recognition systems.
  • language and action strings may both be treated as sequences of tokens (typically words in the case of language strings).
  • Edit-distance functions rely on a table of token distances for use when comparing tokens.
  • Token distances can be uniform (e.g., two words that are different have a token distance of 1 and identical tokens have a distance of 0).
  • token distances can be provided in the form of a table that reflects the closeness in meaning between any two words.
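  • As a hedged sketch of edit-distance matching with a uniform token-distance table, the following Python fragment computes the token edit distance between a recognized string W and each stored variant V, selects the closest (V, E) pair, and reads back the edit script that plays the role of Φ; the function names and the shape of the edit script are assumptions made for illustration.

        # Illustrative variation matcher: uniform token distances (0 for
        # identical tokens, 1 otherwise), dynamic-programming edit distance,
        # and a backtrace that yields Phi as the edit operations turning the
        # variant V into the recognized string W.
        def edit_table(v_toks, w_toks):
            n, m = len(v_toks), len(w_toks)
            d = [[0] * (m + 1) for _ in range(n + 1)]
            for i in range(n + 1):
                d[i][0] = i
            for j in range(m + 1):
                d[0][j] = j
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = 0 if v_toks[i - 1] == w_toks[j - 1] else 1
                    d[i][j] = min(d[i - 1][j] + 1,        # delete from V
                                  d[i][j - 1] + 1,        # insert from W
                                  d[i - 1][j - 1] + cost) # keep or substitute
            return d

        def edit_script(v_toks, w_toks, d):
            ops, i, j = [], len(v_toks), len(w_toks)
            while i > 0 or j > 0:
                if (i > 0 and j > 0 and v_toks[i - 1] == w_toks[j - 1]
                        and d[i][j] == d[i - 1][j - 1]):
                    i, j = i - 1, j - 1
                elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
                    ops.append(("substitute", v_toks[i - 1], w_toks[j - 1]))
                    i, j = i - 1, j - 1
                elif j > 0 and d[i][j] == d[i][j - 1] + 1:
                    ops.append(("insert", None, w_toks[j - 1]))
                    j -= 1
                else:
                    ops.append(("delete", v_toks[i - 1], None))
                    i -= 1
            return list(reversed(ops))

        def match(w, variant_exemplar_pairs):
            """Pick the (V, E) pair whose variant is closest to W, plus Phi."""
            w_toks, best = w.lower().split(), None
            for v, e in variant_exemplar_pairs:
                v_toks = v.lower().split()
                d = edit_table(v_toks, w_toks)
                dist = d[len(v_toks)][len(w_toks)]
                if best is None or dist < best[0]:
                    best = (dist, v, e, edit_script(v_toks, w_toks, d))
            return best

        print(match("is there anything few from kathleen",
                    [("is there anything from joe",
                      "Get the messages with sender Joe")]))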
  • edit-distance matching may be used in conjunction with a natural language generation component.
  • Natural language generators are known per se. Natural language generators may be used to apply linguistic principles to generate a set of paraphrases, or close paraphrases, of English sentences. Such linguistic principles include syntactic transformations (e.g., the active-passive transformation) and paraphrases based on lexical semantics (e.g., “A sells X to B” is the same as “B buys X from A”).
  • a natural language generator may first be used to produce paraphrases of each of the variants present in a context. This may result in an expanded set of variants for the context to which edit-distance matching may then be applied as indicated above.
  • natural language generators may be used to automatically generate at least one variant V by generating paraphrases of an exemplar E.
  • Although only two embodiments of a variation matcher 107 have been described, it is recognized that alternative techniques may be applied in the variation matcher. For example, any suitable method that can measure the difference in meaning between two sentences and represent that difference as a string mapping function can be used as the basis for a variation matcher.
  • the action invoker 111 may be a command string interpreter capable of executing dynamically generated strings (e.g., method calls and database query requests) corresponding to actions in the API for the application.
  • the command interpreter may execute scripting languages (e.g., TCL), or procedure calls for languages with reflection (e.g., Java), or database query languages (e.g., SQL).
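  • As a hedged sketch only: a toy command-string interpreter in Python that parses an action string and dispatches it to a registered API function; the patent itself points to TCL, Java reflection, or SQL for this role, and the get_messages API below is invented for illustration.

        # Toy action invoker: parse a dynamically generated action string
        # such as "get_messages(sender='Kathleen')" and dispatch it to a
        # registered API function.
        import ast

        def get_messages(sender=None):
            return f"<messages from {sender or 'anyone'}>"

        API = {"get_messages": get_messages}

        def invoke(action_string):
            call = ast.parse(action_string, mode="eval").body
            if (not isinstance(call, ast.Call)
                    or not isinstance(call.func, ast.Name)
                    or call.func.id not in API):
                raise ValueError(f"action not well formed: {action_string!r}")
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
            return API[call.func.id](**kwargs)

        print(invoke("get_messages(sender='Kathleen')"))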
  • the exemplar adjuster 110 can ask user 101 for confirmation that adapted exemplar E′ may express the action that is desired by the user 101 . If the user confirms positively, the action invoker 111 may dispatch adapted action A′ to API 114 . Application program 115 may execute the dispatched action and return the resulting output O′ to the action invoker 111 . The session manager 103 may present output O′ to the user via output synthesizer 105 .
  • Session manager or controller 103 may be coupled with SLP 104 and may manage the plurality of components within the SLP 104 .
  • session manager 103 may provide data flow control for the various components of SLP 104 during a speech-enabled session.
  • the session manager 103 may maintain an active context C.
  • There may be an initial context specification associated with each program application.
  • Each context specification may be associated with a collection of variants in the variant database 108 .
  • session manager 103 is shown external to SLP 104 , it is recognized that alternatively session manager 103 may be incorporated within SLP 104 .
  • FIG. 2 is a component-level block diagram of a spoken language processing system 200 in accordance with an embodiment of the present invention.
  • the spoken language processing system 200 may be used as the speech-enabled interface for a desired service 210 .
  • a user or customer may, for example, input command S to be executed by the service 210 .
  • the user may input command S using terminal device 102 .
  • the user may articulate a spoken command into a microphone of, for example, the terminal device 102 (e.g., a telephone, PC, or other communication device).
  • the terminal device 102 may include handwriting recognition system, a dial-pad, a touch screen or keyboard or other input device that the user 101 may use to input command S.
  • a recognized input string W may be generated by the speech recognizer 106 .
  • the recognized input string W may be in the form of digitized data that represents a command (S) input by a user.
  • the recognizer may be located internal or external to the spoken language processing system 200 .
  • the recognizer 106 may be coupled to a processor 203 located in the spoken language system 200 of the present invention.
  • the processor may perform the functions of, for example, variation matcher 107 , exemplar adjuster 110 , action invoker 111 , and/or perform other processing functions that may be required by the system 200 .
  • the processor 203 may process the command S that is input by the user to generate recognized input string W.
  • Processor 203 may be coupled to a memory 204 and controller 202 .
  • the memory 204 may be used to store, for example, variant database 108 , context specification database 112 , and/or any other data or instructions that may be required by processor 203 and/or controller 202 . It is recognized that any suitable memory may be used in system 200 .
  • the databases 108 and/or 112 may be organized as contexts related to the desired service. Accordingly, depending on the service accessed or the stage of service, the processor may load the proper context.
  • processor 203 may use the variant database and a variation matching function to map the recognized input string W into a natural language exemplar E, stored in the variant database 108 in memory 204 .
  • the exemplar E may correspond to a typical way of phrasing a particular request relevant to the current application.
  • At least one natural language exemplar E may correspond to one or more natural language variants V.
  • These natural language variants V may represent alternative ways to express exemplar E.
  • These variants may also be stored in the variant database 108 and may be created by, for example, the user, application programmer, and/or speech interface developer.
  • processor 203 may select, from the one or more natural language variants V, a prospective variant that most resembles or closely matches the recognized word sequence using any known technique for matching as described above. After the selection is made, the corresponding natural language exemplar E may be identified.
  • the processor may identify an application action A (API call) corresponding to the exemplar E.
  • Action A and corresponding exemplar(s) may be stored in, for example, context specification database 112 stored in memory 204 .
  • controller 202 may cause the action A to be invoked by service 210 .
  • the processor 203 may also generate string mapping function ⁇ .
  • String mapping function ⁇ may specify the difference between the recognized word sequence W and the natural language exemplar E or between the recognized word sequence W and the natural language variant V.
  • the processor 203 may then apply the string mapping function ⁇ to the application action A that corresponds with the exemplar E, to produce the actual API call or adapted action A′.
  • the controller 202 may cause the actual API call A′ to be executed by the service 210 to carry out the user's request.
  • the processor may apply the string mapping function ⁇ to the exemplar E to produce an adapted exemplar E′.
  • the adapted exemplar E′ may be presented to the user via output synthesizer 105 .
  • the user may be asked to confirm whether the action desired by the user may be expressed by exemplar E or adapted exemplar E′. If the user accepts E or E′, the controller 202 executes action A or adapted action A′, respectively. If the user does not accept E or E′, then the processor 203 may continue processing the recognized input string W, as described above, until the user's request has been carried out. In alternative embodiments, if the user does not accept E or E′, the controller may ask the user to rephrase their request.
  • Application program 210 may execute the action A or adapted action A′ and return the resulting output O′ to the controller 202 .
  • the controller 202 may present output O′ to the user's terminal device 102 via output synthesizer 105 .
  • User 101 may access SLP 104 of a speech-enabled service in accordance with the present invention ( 301 ).
  • Session manager 103 may cause speech recognizer 106 to load (or switch to) the language model L for the active context related to the application program serviced by the SLP 104 ( 302 ).
  • the user 101 may be presented with a greeting via output synthesizer 105 , and the user may respond by articulating a command into an input of terminal device 102 ( 303 ).
  • the speech recognizer 106 may receive input S and produce an output data string W ( 304 ).
  • the output data string W may be a transcription hypothesis of the user's command.
  • Variation matcher 107 is applied to W to select an exemplar E from the active context C and to construct a string-mapping function ⁇ ( 305 ).
  • the exemplar adjuster 110 applies the string-mapping function ⁇ in order to construct an adapted exemplar E′ and an adapted executable action A′ ( 306 ).
  • the system asks the user for confirmation to proceed with the sequence of actions A′ by presenting to the user (via the output synthesizer 105 ) the English expression E′ ( 307 ) and asking user 101 whether the adapted action A′ as expressed by the adapted exemplar E′ is desired ( 308 ).
  • the session manager passes the adapted action A′ to the action invoker which executes the action A′ and returns any resulting output O′ to the user via the output synthesizer 105 ( 309 ).
  • the session manager may send this output (or a summary of it as appropriate) to the speech synthesizer or display.
  • the active context for handling the next request by the user is changed to the context C′ associated with E in C ( 310 ).
  • If the user does not confirm the adapted action in step 308 , the speech recognizer produces another output string W based on the command ( 304 ).
  • the speech recognizer 106 may produce another output string W that may be different from the previously created W.
  • the variation matcher 107 may receive another output string W or may receive the same output string W and the variation matcher 107 may select another exemplar E′ and mapping function ⁇ ′.
  • the system may, for example, re-execute steps 306 through 308 to construct an adapted action A′ and adapted exemplar E′ that is desired by the user.
  • the controller may ask the user to rephrase their request.
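  • The loop of FIG. 3 might be summarized in code roughly as follows; the component callables (recognize, match, adjust, confirm, invoke, present) stand for the recognizer, variation matcher, exemplar adjuster, confirmation dialog, action invoker, and output synthesizer described above, and their exact signatures are assumptions rather than the patent's design.

        # Illustrative session loop corresponding to steps 303-310 of FIG. 3.
        # The component functions are injected so the loop stays independent
        # of any particular recognizer, matcher, adjuster, or invoker.
        def run_session(recognize, match, adjust, confirm, invoke, present,
                        context):
            while True:
                w = recognize(context)                    # steps 303-304
                if w is None:                             # caller is done
                    break
                dist, v, e, phi = match(w, context)       # step 305
                e_adapted, a_adapted, next_ctx = adjust(e, phi, context)  # 306
                if confirm(e_adapted):                    # steps 307-308
                    present(invoke(a_adapted))            # step 309
                    context = next_ctx                    # step 310
                # otherwise loop: re-recognize or re-match the request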
  • FIGS. 4A and 4B show a flow chart applying embodiments of the present invention to an exemplary speech-enabled e-mail service.
  • a user may desire to retrieve e-mail messages and may log on via the Internet or call the speech-enabled e-mail service.
  • the user may articulate speech into a microphone of, for example, the terminal device 102 or a telephone or other communication device (not shown).
  • the controller 103 may load the active context language model for the speech-enabled e-mail service from variant database 108 of SLP 104 .
  • the user's input may be converted to an electrical signal that is passed to the recognizer as an input command S.
  • Input command S may be, for example, “Was there anything new from Kathleen?” ( 401 ).
  • Recognizer 106 may convert the command S into an output string W which may be interpreted as “Is there any thing few from Kathleen?” ( 402 ).
  • the recognizer 106 may be susceptible to errors depending on the clarity of the input or other external or internal variations; thus, for example, “new” may be interpreted as “few” by recognizer 106 .
  • Variation matcher 107 takes the string W and attempts to find a suitable match from the variant database 108 .
  • Variation matcher may retrieve a stored variant V “Is there anything from Joe?” ( 403 ).
  • the variant matcher 107 may retrieve exemplar E (e.g., Get the messages with sender Joe) that is associated with variant V of step 403 ( 404 ).
  • Variant matcher 107 may construct a string mapping function ⁇ , that expresses the difference between output string W of step 402 and variant V of step 403 ( 405 ).
  • String mapping function ⁇ indicates the insertion of the word “few” and the substitution of the word “Joe” by “Kathleen” ( 405 ).
  • various known techniques may be implemented to determine string-mapping function ⁇ .
  • the action A retrieved in step 406 (i.e., the action associated with the exemplar E of step 404 ) is an exemplary action expressed as a line of code that the application program understands and may be able to execute. It is recognized that the line of code for action A is given by way of example only and that many different expressions can be written.
  • a subset of string mapping function Φ as applicable to action A (denoted Φ_A) is generated and may be applied to action A ( 407 ).
  • Applying the applicable string mapping operations to exemplar E may generate the adapted exemplar E′, for example, "Get the messages with sender Kathleen" ( 409 ).
  • the adapted exemplar E′ may be presented to the user; and if the user confirms that the user desires the adapted action A′ as expressed by the exemplar E′, the adapted action A′ may be executed by the API 114 of application program 115 . Accordingly, messages from Kathleen, for example, may be presented to the user via output synthesizer 105 .
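  • The FIG. 4 example can be retraced with plain string operations as follows; the action-string syntax is invented here, and only the Joe-to-Kathleen substitution (not the misrecognized insertion of "few") is treated as a valid transformation of the action string, as described above.

        # Retracing the FIG. 4A/4B example (illustrative only).
        W = "Is there anything few from Kathleen"   # recognizer output (402)
        V = "Is there anything from Joe"            # closest stored variant (403)
        E = "Get the messages with sender Joe"      # exemplar mapped to V (404)
        A = "get_messages(sender='Joe')"            # hypothetical action for E

        # Phi: insert "few", substitute "Joe" -> "Kathleen" (405). Only the
        # substitution survives into Phi_A (407), so it is applied to both
        # the exemplar and the action.
        E_prime = E.replace("Joe", "Kathleen")  # exemplar E' of step 409
        A_prime = A.replace("Joe", "Kathleen")  # adapted action A'
        print(E_prime)
        print(A_prime)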
  • Embodiments of the present invention may permit users to customize the variant database 108 so that they can create variants that closely represent the manner in which the user would articulate a particular action.
  • FIG. 5 shows a block diagram showing a speech customization module 500 in accordance with embodiments of the present invention.
  • the customization module 500 may be used to add personalized variants relating to stored exemplars E in variant database 108 .
  • Users 101 may use, for example, a known web browser 502 to access context customizer 503 . Although a web browser is shown, a user may use a telephone or other suitable device to access context customizer 503 .
  • Context customizer 503 may be coupled to variant database 108 and customizer server 501 .
  • Users of the system 100 may access the generic context files Cg 109 stored in variant database 108 and create customized content files 504 stored in a customization server 501 .
  • Generic context files Cg 109 may contain, for example, context identifier C, a variant V and corresponding exemplar E.
  • Customization server 501 may contain customized context files 504 for a plurality of users U1-UN. Each customized file 504 may contain a personalized context containing personalized variants (e.g., V1,1, V1,2 to Vm,n) personal to the user.
  • User U 1 may create one or more variants V corresponding to, for example, exemplar E.
  • the user 101 may customize files 504 to reflect this preference. It is recognized that any language—for example, French, Spanish, etc. may be used in embodiments of the present invention.
  • user U 1 may customize a context Cu 1 , adding to the variants associated with C in the user's U 1 personal variant database file 504 by composing natural language requests V and associating them with natural language requests or lists of requests E which are taken from the exemplars E associated with context C.
  • the customization module 500 may permit a user to create and edit natural-language to natural-language (e.g., English-to-English) customization files stored, for example, on a server 501 using a standard HTTP browser.
  • User U 1 may be authenticated by the customization module 500 using known techniques and choosing an application, and within that a context C, to customize.
  • the user may construct pairs of the form “When I say V1, I mean E” by choosing an exemplar E from among the available exemplars in C and entering a personalized variant V1 to be associated with that exemplar.
  • the resulting variants may be uploaded into variant database 108 in the form (U1, V1, E, C), indicating that the customized variant V1 belongs to user U1 and is related to exemplar E in context C. Accordingly, when the user U1 uses system 100 of FIG. 1, the customized context, including the customized variants, will be available to the user U1 in addition to any variants that may already be present in the database for all users. In embodiments of the present invention, for subsequent customizations, the user may be presented with their own custom version of any context they have customized in the past. Additionally, users may be able to revert back to the generic context Cg when desired.
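  • A hedged sketch of how such per-user (U, V, E, C) customization records might be stored and merged with the generic variants at run time; the storage layout and helper names are assumptions for illustration.

        # Illustrative "When I say V, I mean E" records for user U1 in
        # context C, merged with the generic (V, E) pairs for that context.
        CUSTOM_VARIANTS = []

        def add_custom_variant(user, variant, exemplar, context):
            CUSTOM_VARIANTS.append((user, variant, exemplar, context))

        def pairs_for_user(user, context, generic_pairs):
            personal = [(v, e) for (u, v, e, c) in CUSTOM_VARIANTS
                        if u == user and c == context]
            return generic_pairs + personal

        add_custom_variant("U1", "anything new from my sister",
                           "Get the messages with sender Kathleen", "read_email")
        print(pairs_for_user("U1", "read_email",
                             [("fetch messages", "Retrieve my mail messages")]))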

Abstract

Embodiments of the invention relate to a system and method for providing speech-enabled application programs. The speech-enabled programs automatically execute requests input by users. One or more natural language variants may be mapped with at least one natural language exemplar. The natural language exemplar may correspond to a typical way to express a request relevant to the speech-enabled application program. The natural language variant may correspond to an alternative way of expressing the same request. A recognized input string is received and a prospective variant that most resembles the received recognized input string is selected from the natural language variants. The natural language exemplar mapped to the prospective variant is identified. An action instruction associated with the identified natural language exemplar is executed to fulfill the user's request.

Description

    FIELD OF INVENTION
  • This invention relates generally to speech recognition technology. More particularly, the invention relates to development and customization of spoken language interfaces for a plurality of speech-enabled systems and sub-systems. [0001]
  • BACKGROUND OF THE INVENTION
  • In recent years, the desire to use speech-enabled systems has increased. Speech recognition technology has been applied to a variety of interactive spoken language services to reduce costs. Services benefiting from a spoken language interface may include, for example, services providing products and/or services, e-mail services, and telephone banking and/or brokerage services. Speech-enabled systems permit users to verbally articulate to the system a command relating to desired actions. The speech-enabled system recognizes the command and performs the desired action. [0002]
  • Typically, the underlying technologies utilized in such speech-enabled systems include, for example, speech recognition and speech synthesis technologies, computer-telephony integration, language interpretation, and dialog and response generation technologies. The role of each technology in speech-enabled systems is described below briefly. [0003]
  • As is known, speech recognition technology is used to convert an input of human speech into a digitized representation. Conversely, speech synthesis takes a digitized representation of human speech or a computer-generated command and converts these into outputs that can be perceived by a human—for example, a computer-generated audio signal corresponding to the text form of a sentence. [0004]
  • Known computer-telephony integration technology is typically used to interface the telephony network (which may be switched or packet-based) to, for example, a personal computer having the speech recognition and speech synthesis technologies. Thus, the computer-telephony platform can send and receive, over a network, digitized speech (to support recognition and synthesis, respectively) to and from a user during a telephony call. Additionally, the computer-telephony integration technology is used to handle telephony signaling functions such as call termination and touch-tone detection. [0005]
  • Language interpretation systems convert the digitized representation of the human speech into a computer-executable action related to the underlying application and/or service for which the spoken language interface is used—for example, a speech-enabled e-mail service. [0006]
  • The dialog and response generation systems generate and control the system response for the speech-enabled service which may correspond to, for example, the answer to the user's question, a request for clarification or confirmation, or a request for additional information from the user. The dialog and response systems typically utilize the speech synthesis systems or other output devices (e.g., a display) to present information to the user. Additionally, the dialog and response generation component may be responsible for predicting the grammar (also called the “language model”) that is to be used by the speech recognizer to constrain or narrow the required processing for the next spoken input by the user. For example, in a speech-enabled e-mail service, if the user has indicated the need to retrieve messages, the speech recognizer may limit its processing to the message-retrieval commands that the user is likely to use. [0007]
  • Using one conventional method, language interpretation and dialog and response generation are mediated by intermediate representations, often referred to as semantic representations. These representations are computer data structures or code intended to encode the meaning of a sentence (or multiple sentences) spoken by a user (e.g., in a language interpretation system), or to encode the intended meaning of the system's response to the user (e.g., in a dialog and response generation system). Various types of such intermediate semantic representations are used including hierarchically embedded value-attribute lists (also called “frames”) as well as representations based on formal logic. [0008]
  • To facilitate this intermediate representation process, a two-step process is typically used. First, the language interpretation component converts the recognized word sequence (or digitized representation) into an instance of the intermediate semantic representation. Various means have been used for this conversion step, including conversion according to rules that trigger off of keywords and phrases, and conversion according to a manually written or statistically trained transition network. Second, the resulting intermediate representation is then mapped into the actual executable application actions. This second conversion phase is often achieved by an exhaustive set of manually authored rules or by a computer program written specifically for this spoken language application. This approach requires programming experts familiar with the speech-enabled interfaces and programming experts familiar with the underlying application programs. As a result, speech-enabled interfaces using this approach can be very expensive to develop and/or customize. [0009]
  • Alternatively, another conventional method uses customized software modules for interfacing with the language interpretation system to determine which application-specific action to execute for a given recognized input sequence. Using this approach, customized software modules need to be developed for each application and for handling the various application-specific commands. As a result, this conventional approach for developing speech-enabled interfaces can be costly due to increased development times. [0010]
  • Using conventional approaches, development of speech-enabled services requires skills different from, and in addition to, skills needed for programming the underlying application program for the service. Even for skilled spoken language system engineers, development of robust interfaces can be difficult and time-consuming with current technology. This increases the development time for such services and more generally slows widespread adoption of spoken language interface technology. [0011]
  • Since these conventional approaches require specialized programming skills, customizing these speech-enabled services, by users, based on personal language preferences, if at all possible, can be very difficult. [0012]
  • What is needed is a system and method for creating and customizing speech-enabled services that may solve the difficulties encountered using conventional approaches. For example, what is needed is an efficient speech-enabled interface that is not only robust and flexible, but can also be easily customized by users so that personal language preferences can be used. [0013]
  • SUMMARY OF THE INVENTION
  • Embodiments of the invention relate to a system and method for providing speech-enabled application programs. The speech-enabled programs automatically execute requests input by users. One or more natural language variants may be mapped with at least one natural language exemplar. The natural language exemplar may correspond to a typical way to express a request relevant to the speech-enabled application program. The natural language variant may correspond to an alternative way of expressing the same request. A recognized input string is received and a prospective variant that most resembles the received recognized input string is selected from the natural language variants. The natural language exemplar mapped to the prospective variant is identified. An action instruction associated with the identified natural language exemplar is executed to fulfill the user's request. [0014]
  • In embodiments of the invention, users of the system can create a plurality of personalized natural language variants that represent preferred ways of expressing the desired requests. Accordingly, the system may be able to recognize the plurality of variants and execute the action as specified by the user's request. [0015]
  • The above and other features and advantages of the present invention will be readily apparent and fully understood from the following detailed description of preferred embodiments, taken in connection with the appended drawings.[0016]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a diagrammatic representation of a system in accordance with embodiments of the present invention. [0017]
  • FIG. 2 is a block diagram illustrating a system in accordance with an embodiment of the present invention. [0018]
  • FIG. 3 is a flow chart illustrating a method in accordance with an embodiment of the present invention. [0019]
  • FIGS. 4A and 4B show a flow chart illustrating an exemplary method in accordance with an embodiment of the present invention. [0020]
  • FIG. 5 is a diagrammatic representation of a customization module for use in the system as shown in FIG. 1.[0021]
  • DETAILED DESCRIPTION OF THE INVENTION
  • Embodiments of the present invention relate to the creation, development and customization of spoken language interfaces for a plurality of speech-enabled services. The invention provides a natural language interface to permit programmers and users to create and/or customize spoken language interfaces. The invention may provide an efficient and cost-effective way of developing spoken language interfaces that can be easily adapted to different systems or services—for example, messaging systems, auction systems, or interactive voice recognition (IVR) systems. Advantageously, the spoken language interface can be easily customized by end users based on their personal preferences and speech habits. [0022]
  • Embodiments of the present invention may use natural-language to natural-language mapping between user-specified commands and commands specified by, for example, an application program developer. Embodiments of the present invention provide an efficient system for executing a plurality of user commands that may map to a finite number of executable actions as specified by the program developer. Accordingly, the program developer may need only specify a finite number of exemplary English (or other language) commands that may be related to application actions. These exemplary English commands may be mapped with a plurality of English variations that a user may use for the desired action. The user can customize the English variations to create preferred commands to execute a desired action. [0023]
  • Referring to FIG. 1, a block diagram of a speech-enabled system in accordance with embodiments of the present invention is shown. [0024] User 101 may use terminal device 102 for access to the application program 115. The terminal device 102 may be, for example, a personal computer, a telephone, a mobile phone, a hand-held device, personal digital assistant (PDA) or other suitable device having suitable hardware and/or software to connect with network 120 and access application program 115. Terminal device 102 may be installed with suitable hardware and software, for example, an Internet browser and a modem for connection to the Internet.
  • [0025] Network 120 may include, for example, a public switched telephone network (PSTN), a cellular network, the Internet, an intranet, a satellite network, and/or any other suitable national and/or international communications network or combination thereof.
  • [0026] Network 120 may include a plurality of communications devices (e.g., routers, switches, servers, etc.) including at least one computer telephony platform 121. Platform 121 may be a high-capacity computer and/or server that has the capacity to send, receive, and/or process digitized speech (e.g., to support speech recognition and synthesis functions). Platform 121 may be equipped to interface with a switch-based or packet-based network 120. Additionally, platform 121 may be equipped with a telephony interface to handle telephony signaling functions such as call termination and touch tone detection. Platform 121 may be located within the network 120 or, alternatively, it may be located outside network 120. Platform 121 may serve as a gateway interface to spoken language processor (SLP) 104. Platform 121 may receive data from terminal device 102 and dispatch this information to SLP 104. Conversely, platform 121 may dispatch data from SLP 104 to the terminal device 102.
  • In accordance with embodiments of the invention, [0027] SLP 104 may be coupled to network 120 to provide a speech-enabled interface for an application programming interface (API) 114 and corresponding application or service 115. Application program 115 may support services or systems, for example, messaging systems, auction systems, interactive voice recognition (IVR) systems, or any other suitable system or service that may utilize a spoken language interface for automation. API 114 may be a software interface that application 115 may use to request and carry out lower-level services performed by a computer's or telephone system's operating system. An API 114 may include, for example, a set of standard software interrupts, calls, and data formats used by application program 115 to interface with network services, mainframe communications programs, telephone equipment or program-to-program communications.
  • In embodiments of the invention, [0028] SLP 104 may include a plurality of components, for example, an output synthesizer 105, recognizer 106, variation matcher 107, variant database 108, exemplar adjuster 110, action invoker 111 and context specifications database 112. It is recognized that output synthesizer 105 may provide data that can be presented to user's terminal device 102.
  • [0029] Output synthesizer 105 and recognizer 106 are known per se. Output synthesizer 105 may be a speech synthesizer or display formatter for delivering information to the user 101. The display formatter may produce data suitable for presentation on any physical display, for example, a cathode ray tube (CRT), liquid crystal display (LCD), flat plasma display, or any other type of suitable display. Alternatively or additionally, any suitable speech synthesizer that can take unrestricted text or digital data as input and convert this text or data into an audio signal for output to the user 101 may be used in embodiments of the invention.
  • [0030] Recognizer 106 may receive a natural language request in the form of, for example, an audio or analog signal S from user 101 and may convert this signal into a digitized data string or recognized word sequence, W. Signal S may be converted by the terminal device 102 and/or platform 121 to travel across network 120 and delivered to recognizer 106. Digitized data string W may represent the natural language request S in the form of a digital signal as output by recognizer 106. For example, W may be a sequence of words in text form. Recognizer 106 may use any known process or system to convert signal S into a data string W. In one embodiment, recognizer 106 can load and switch between language models dynamically. For simplicity, the English language is referred to herein as the spoken language for use with the speech-enabled services. However, the present invention can be applied to spoken language interfaces for other languages.
  • In alternative embodiments of the invention, the user's [0031] terminal device 102 may include a handwriting recognition device, keyboard, and/or dial pad the user 101 may use to input a command and generate signal S. The generated signal S may be delivered to the recognizer 106 and processed as described above.
  • Advantageously, in accordance with embodiments of the present invention, [0032] variation matcher 107 may use variant database 108 and a variation matching function (not shown) to map the digitized data string W (or the recognized word sequence) into an exemplary English sentence E (i.e., an exemplar). The exemplar E may correspond to a typical way, as defined by an application developer, of phrasing a particular request relevant to the current application program 115. The variation matcher 107 may further compute a string mapping function φ that may indicate the difference in meaning between the recognized digitized data string W and the exemplar E.
  • In embodiments of the invention, [0033] variant database 108 may contain a language model (L) 130 related to a particular context and/or the application program or service 115 currently accessed by the user 101. The language model 130 may be derived from the plurality of variant command files 109 using techniques known in the art for generating speech recognition language models from collections of sentences. In embodiments of the invention, each file 109 may be pertinent to a particular context C corresponding to language model 130. It is recognized that variant database 108 may contain a single language model 130 relating to a particular application program or, alternatively, may contain a plurality of language models 130 relating to various application programs.
  • [0034] Variant command file 109 for context C may contain, for example, an exemplar E1 related to context C and associated variants V1 1 to V1 n. For each exemplar E in a context C, there may be a collection of, for example, English variants V1 1−V1 n. These variants V, together with their associated exemplars E, are stored in the variant database 108. The database 108 may store a set of related data of the form (C, V, E), where each V is an alternative way to phrase in natural language a request for the action A that is associated with exemplar E in context C. Since exemplars E may also be valid ways of phrasing application actions A, they are included in the variant database 108 as “variants” of themselves.
  • In embodiments of the invention, [0035] a set of exemplars E1 to Em associated with the particular context C of an application program may be provided by the developer of the spoken language interface or of the application program. The developer may be an expert in the application API 114. The developer need not be an expert in speech-enabled services. Each exemplar E may represent an exemplary way of phrasing, in English or any other suitable language, a particular executable command or action for context C (as will be discussed below in more detail). The developer may map exemplar E to action A. For a particular context C, each file 109 may contain a plurality of English variants V1 1−Vm n. Variants V1 1−V1 n may represent different ways of saying or representing corresponding exemplar E1; variants V2 1−V2 k may represent different ways of saying exemplar E2; etc. These variants V1 1−Vm n may be created by anyone without requiring any relevant expertise or knowledge of the application program and/or speech-enabled technologies. For example, the user 101 may create variant V1 1 that represents the manner in which the user 101 typically refers to the desired action represented by exemplar E1. In embodiments of the present invention, the created variant(s) V1 1−V1 n may be mapped to their associated exemplar E1 for a particular context C, for example, in the form (C, V, E), as indicated above.
  • As a specific example, [0036] C may correspond to the context of reading e-mail messages, E1 may be the exemplar "Retrieve my mail messages", and the variants may include, for example, V1 1 "get my mail", V1 2 "fetch my e-mail", V1 3 "fetch messages".
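  • As an illustrative sketch only (not part of the patent disclosure), the e-mail example above could be held in a variant database as (C, V, E) records along the following lines; the context name "read_email" and the helper function are assumptions made for illustration:

      # Minimal sketch of variant command records (C, V, E); all names are illustrative.
      from collections import defaultdict

      variant_db = defaultdict(list)   # context C -> list of (variant V, exemplar E) pairs

      def add_variant(context, variant, exemplar):
          # Exemplars are also stored as "variants" of themselves, as described above.
          variant_db[context].append((variant, exemplar))

      E1 = "Retrieve my mail messages"
      add_variant("read_email", E1, E1)                  # exemplar as its own variant
      add_variant("read_email", "get my mail", E1)       # V1_1
      add_variant("read_email", "fetch my e-mail", E1)   # V1_2
      add_variant("read_email", "fetch messages", E1)    # V1_3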
  • Referring again to FIG. 1, [0037] context specifications database 112 may contain a set of exemplar action specification files 113 for one application program or a plurality of different application programs. Exemplar action files 113 may correspond and/or relate to variant files 109. For example, the variants in a variant file 109 may be used to express the actions A in a corresponding action file 113, and A may be available for execution by action invoker 111.
  • For a given context C in exemplar action specification files 113, [0038] certain application actions A may be valid. These actions may relate to a specific context for a given application program. In embodiments of the present invention, exemplar-action specification file 113 may contain a plurality of contexts C1−Cm, a plurality of associated exemplars E1−Em, associated actions A1−Am, and a pointer to a next context C′−Cx. Accordingly, each exemplar-action specification file 113 may contain a list of "exemplar-action" records stored or correlated as (C, E, A, C′). Each record (C, E, A, C′) may associate the exemplar E with a sequence A of action strings in the command language executable by the action invoker 111 in context C, and an identifier C′ of another, or the same, application context specification.
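  • A hedged sketch of how such (C, E, A, C′) exemplar-action records might be represented is given below; the mail-agent call strings and context names are invented for illustration and are not the patent's API:

      # Sketch of exemplar-action records (C, E, A, C'); identifiers are illustrative only.
      exemplar_actions = [
          # (context C, exemplar E, action strings A, next context C')
          ("login", "Log me in as Joe",
           ['mailAgent.login("Joe")'], "read_email"),
          ("read_email", "Retrieve my mail messages",
           ['mailAgent.setFolder("INBOX")', 'mailAgent.getMessages("")'], "read_email"),
          ("read_email", "Get the messages with sender Joe",
           ['mailAgent.setFolder("INBOX")', 'mailAgent.getMessages("From=Joe")'], "read_email"),
      ]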
  • In embodiments of the present invention, each exemplary action specification file in the set of [0039] files 113 may correspond to a stage of interaction with the user. For example, if the application program is a speech-enabled e-mail service, the first action specification file 113 may contain actions relating to logging on, or identification of the user to the service, and a second action specification file 113 may contain actions relating to sending or retrieving e-mail messages. Thus, once the user 101 has accessed the service, action specification file 113 related to actions required to identify the user may be activated, followed by activating the action specification file for, for example, retrieving e-mail messages. The second action specification file may contain exemplars E and associated actions A relating to retrieving messages, for example, retrieving new messages, retrieving previously read messages, etc.
  • A language model L for use by the speech recognizer may be built for each context, based on the variants specified for that context. These models may be augmented with lists of proper names that may be used instead of those present in the exemplars E and variants V. Standard techniques for language modeling can be used for deriving the language models from the set of variants. [0040]
  • Although, in FIG. 1, [0041] variant database 108 and context specification database 112 are shown as two different databases, it is recognized that variant database 108 and context specification database 112 may be consolidated into a single database. It should be noted that descriptions of data flow and data configuration in databases 108 and 112 are given by way of example and variations may be made by one of ordinary skill in the art. For example, variations to the variant command files 109 or configuration or flow of the included data (e.g., C, V, E) and/or to the exemplar action specification files 113 may be made by one of ordinary skill in the art.
  • In one embodiment, [0042] recognizer 106 may produce a digitized data string W that matches exemplar E or variant V, stored in the variant database 108, exactly and the system can then proceed with invoking the corresponding application action A. In alternative embodiments, an exact match between data string W and a corresponding exemplar E or variant V may not be found. For example, recognition errors, requests for actions involving different objects (e.g., using different names) from those in the exemplars E or variants V, linguistic variation in the user utterances (including variants from their own customizations) and/or any combination of variations thereof may prevent exact matches from being found.
  • In embodiments of the invention, [0043] variation matcher 107 may seek to select a prospective variant V, in active context C, that most resembles, or most closely matches, the natural language request as represented by digitized data W. Variation matcher 107 may also specify the necessary changes or adaptations (i.e., string mapping function φ) to be made to, for example, the selected exemplar E and its associated action A. Any known technique may be used to determine whether, for example, a given text or data sequence (e.g., a prospective variant) most resembles or closely matches the recognized word sequence. For example, known mathematical algorithms (as described below) may be applied to find such matches.
  • In embodiments of the present invention, [0044] exemplar adjuster 110 may receive the exemplar E and string mapping function φ from the variation matcher 107. Exemplar adjuster 110 with input from context specifications database 112 may apply the string mapping function φ to an application action A (an API call) that is paired with the exemplar E (e.g., from the context specifications database) to produce the actual API call or adapted action A′. Adapted action A′ may then be executed by the action invoker 111 to carry out the user's request.
  • [0045] Exemplar adjuster 110 may apply necessary adaptations to the action strings A to be invoked by the application and to the exemplar E (e.g., for confirmation purposes).
  • In embodiments of the present invention, [0046] variation matcher 107 may compute a function f taking an input W and a sequence <(V1, E1), . . . , (Vn, En)> of pairs of strings. The output of f may be one of the input pairs (Vi, Ei) together with a string mapping function φ, that is:
  • f(W, <(V1, E1), . . . , (Vn, En)>) → (Vi, Ei, φ)
  • The selected pair (Vi, Ei) may be the first pair in the input sequence for which a string distance function μ is minimal: [0047]
  • min_{1≤j≤i−1} μ(W, Vj) > μ(W, Vi) ≤ min_{i+1≤k≤n} μ(W, Vk)
  • String mapping function φ may include a sequence of string editing operations, specifically insertions, deletions, and substitutions. [0048]
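  • One way to realize the matching function f, offered here only as a sketch under the edit-distance embodiment described below, is a word-level minimal edit distance computed by dynamic programming, with the recovered edit operations serving as the string mapping function φ; the function names are illustrative:

      # Sketch only: word-level edit distance used to select the variant V closest to the
      # recognized string W, returning the paired exemplar E and the edit operations phi.
      def edit_ops(v_tokens, w_tokens):
          n, m = len(v_tokens), len(w_tokens)
          # dp[i][j] = minimal edits turning v_tokens[:i] into w_tokens[:j]
          dp = [[0] * (m + 1) for _ in range(n + 1)]
          for i in range(n + 1):
              dp[i][0] = i
          for j in range(m + 1):
              dp[0][j] = j
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  cost = 0 if v_tokens[i - 1] == w_tokens[j - 1] else 1
                  dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                                 dp[i][j - 1] + 1,         # insertion
                                 dp[i - 1][j - 1] + cost)  # match or substitution
          ops, i, j = [], n, m                             # trace back to recover phi
          while i > 0 or j > 0:
              if i > 0 and j > 0 and v_tokens[i - 1] == w_tokens[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
                  i, j = i - 1, j - 1
              elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
                  ops.append(("substitute", v_tokens[i - 1], w_tokens[j - 1]))
                  i, j = i - 1, j - 1
              elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
                  ops.append(("insert", w_tokens[j - 1]))
                  j -= 1
              else:
                  ops.append(("delete", v_tokens[i - 1]))
                  i -= 1
          return dp[n][m], list(reversed(ops))

      def match_variation(w, variant_exemplar_pairs):
          # Returns (V, E, phi) for the first pair whose variant has minimal distance to W.
          w_tokens = w.split()
          best = None
          for v, e in variant_exemplar_pairs:
              dist, ops = edit_ops(v.split(), w_tokens)
              if best is None or dist < best[0]:
                  best = (dist, v, e, ops)
          return best[1], best[2], best[3]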
  • [0049] Exemplar adjuster 110 may fetch the action Ai associated with the exemplar Ei, where i may be any integer from 1 to m. A second string mapping function φ′ may be derived from φ, including only those string editing operations that are valid transformations of the action string Ai. A valid transformation may be one that results in an action string A that is well formed in the sense that it is parsed successfully by the action invoker 111. Second string mapping function φ′ is then applied to both sides of the selected pair by the exemplar adjuster 110 to produce the "adapted" pair {(E′i, A′i)}.
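  • The following fragment is a simplified sketch of this adjustment step, not the parser-based validity test described above: it keeps only those substitution operations whose original token occurs in the action string, treats them as φ′, and applies them to both the exemplar and the action; the helper names are assumptions:

      # Sketch of the exemplar adjuster: phi' keeps the substitutions that can validly be
      # applied to the action string A (approximated here by a substring test), and is then
      # applied to both E and A to give the adapted pair (E', A'). Names are illustrative.
      def apply_substitutions(text, substitutions):
          for old, new in substitutions:
              text = text.replace(old, new)
          return text

      def adjust(exemplar, action, phi):
          substitutions = [(old, new) for op, old, new in
                           [o for o in phi if o[0] == "substitute"]
                           if old in action]
          return (apply_substitutions(exemplar, substitutions),
                  apply_substitutions(action, substitutions))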
  • In one embodiment of the [0050] variation matcher 107, the string distance μ is the string edit distance and φ is the corresponding edits found by the dynamic programming algorithm used to compute the minimal edit distance. Such edit-distance computation algorithms are known in computer science and have been used in various applications such as document search and evaluating the outputs of speech recognition systems. In this embodiment, language and action strings may both be treated as sequences of tokens (typically words in the case of language strings).
  • Edit-distance functions rely on a table of token distances for use when comparing tokens. [0051] Token distances can be uniform (e.g., two words that are different have a token distance of 1 and identical tokens have a distance of 0). Alternatively, token distances can be provided in the form of a table that reflects the closeness in meaning between any two words.
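  • A non-uniform token-distance table of the kind mentioned above could, as a sketch, look like the following; the near-synonym costs are invented values and would plug into the substitution cost of the edit-distance fragment shown earlier:

      # Sketch of a non-uniform token distance: identical tokens cost 0, listed
      # near-synonyms cost less than unrelated tokens (the table is illustrative only).
      NEAR_SYNONYM_COST = {("mail", "e-mail"): 0.2, ("get", "fetch"): 0.2, ("new", "unread"): 0.3}

      def token_distance(a, b):
          if a == b:
              return 0.0
          return NEAR_SYNONYM_COST.get((a, b), NEAR_SYNONYM_COST.get((b, a), 1.0))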
  • In an alternative embodiment of a [0052] variation matcher 107, edit-distance matching may be used in conjunction with a natural language generation component. Natural language generators are known per se. Natural language generators may be used to apply linguistic principles to generate a set of paraphrases, or close paraphrases, of English sentences. Such linguistic principles include syntactic transformations (e.g., the active-passive transformation) and paraphrases based on lexical semantics (e.g., “A sells X to B” is the same as “B buys X from A”). In this embodiment of the variation matcher 107, a natural language generator may first be used to produce paraphrases of each of the variants present in a context. This may result in an expanded set of variants for the context to which edit-distance matching may then be applied as indicated above. In embodiments, natural language generators may be used to automatically generate at least one variant V by generating paraphrases of an exemplar E.
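  • As a toy illustration only, a single pattern-based rule in the spirit of the "A sells X to B" / "B buys X from A" example could expand the variant set before matching; a real natural language generator would apply much richer syntactic and lexical transformations:

      import re

      # Toy paraphrase rule (illustrative, not a real natural language generator).
      def paraphrases(sentence):
          m = re.match(r"(?P<a>\w+) sells (?P<x>\w+) to (?P<b>\w+)$", sentence)
          return [f"{m.group('b')} buys {m.group('x')} from {m.group('a')}"] if m else []

      def expand_variants(pairs):
          # Each generated paraphrase is paired with the same exemplar as its source variant.
          expanded = list(pairs)
          for v, e in pairs:
              expanded.extend((p, e) for p in paraphrases(v))
          return expanded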
  • Although only two embodiments of a [0053] variation matcher 107 have been described, it is recognized that alternative techniques may be applied in the variation matcher. For example, any suitable method that can measure the difference in meaning between two sentences and represent that difference as a string mapping function can be used as the basis for a variation matcher.
  • The action invoker [0054] 111 may be a command string interpreter capable of executing dynamically generated strings (e.g., method calls and database query requests) corresponding to actions in the API for the application. For example, the command interpreter may execute scripting languages (e.g., TCL), or procedure calls for languages with reflection (e.g., Java), or database query languages (e.g., SQL).
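  • As a hedged sketch of such a command string interpreter (using Python attribute dispatch as a stand-in for the TCL, Java reflection, or SQL interpreters named above), the fragment below parses strings of the form object.method("arg") and dispatches them to registered objects; the MailAgent class and registry are invented for illustration:

      import re

      # Sketch of a tiny action invoker: parse dynamically generated call strings and
      # dispatch them to registered objects. All classes and names are illustrative.
      class MailAgent:
          def setFolder(self, name):
              print(f"folder set to {name}")
          def getMessages(self, query):
              print(f"fetching messages where {query}")

      REGISTRY = {"mailAgent": MailAgent()}
      CALL = re.compile(r'(\w+)\.(\w+)\("([^"]*)"\)')

      def invoke(action_string):
          for obj_name, method, arg in CALL.findall(action_string):
              getattr(REGISTRY[obj_name], method)(arg)

      invoke('{mailAgent.setFolder("INBOX"); mailAgent.getMessages("From=Kathleen")}')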
  • In embodiments of the present invention, the [0055] exemplar adjuster 110 can ask user 101 for confirmation that adapted exemplar E′ may express the action that is desired by the user 101. If the user confirms positively, the action invoker 111 may dispatch adapted action A′ to API 114. Application program 115 may execute the dispatched action and return the resulting output O′ to the action invoker 111. The session manager 103 may present output O′ to the user via output synthesizer 105.
  • Session manager or [0056] controller 103 may be coupled with SLP 104 and may manage the plurality of components within the SLP 104. For example, session manager 103 may provide data flow control for the various components of SLP 104 during a speech-enabled session. At any point in interacting with a user, the session manager 103 may maintain an active context C. There may be an initial context specification associated with each program application. Each context specification may be associated with a collection of variants in the variant database 108. Although session manager 103 is shown external to SLP 104, it is recognized that alternatively session manager 103 may be incorporated within SLP 104.
  • FIG. 2 is a component-level block diagram of a spoken language processing system 200 in accordance with an embodiment of the present invention. [0057] The spoken language processing system 200 may be used as the speech-enabled interface for a desired service 210. Thus, using the spoken language processing system 200, a user or customer may, for example, input command S to be executed by the service 210. The user may input command S using terminal device 102. The user may articulate a spoken command into a microphone of, for example, the terminal device 102 (e.g., a telephone, PC, or other communication device). In alternative embodiments of the invention, for example, the terminal device 102 may include a handwriting recognition system, a dial-pad, a touch screen, a keyboard, or other input device that the user 101 may use to input command S.
  • A recognized input string W may be generated by the [0058] speech recognizer 106. The recognized input string W may be in the form of digitized data that represents a command (S) input by a user. The recognizer may be located internal to or external to the natural language processing system 200. The recognizer 106 may be coupled to a processor 203 located in the spoken language system 200 of the present invention. The processor may perform the functions of, for example, variation matcher 107, exemplar adjuster 110, action invoker 111, and/or perform other processing functions that may be required by the system 200. In embodiments of the present invention, the processor 203 may process the command S that is input by the user to generate recognized input string W.
  • [0059] Processor 203 may be coupled to a memory 204 and controller 202. The memory 204 may be used to store, for example, variant database 108, context specification database 112, and/or any other data or instructions that may be required by processor 203 and/or controller 202. It is recognized that any suitable memory may be used in system 200. The databases 108 and/or 112 may be organized as contexts related to the desired service. Accordingly, depending on the service accessed or the stage of service, the processor may load the proper context. In embodiments of the invention, processor 203 may use the variant database and a variation matching function to map the recognized input string W into a natural language exemplar E, stored in the variant database 108 in memory 204. As described above, the exemplar E may correspond to a typical way of phrasing a particular request relevant to the current application.
  • In embodiments of the present invention, at least one natural language exemplar E may correspond to one or more natural language variants V. These natural language variants V may represent alternative ways to express exemplar E. These variants may also be stored in the [0060] variant database 108 and may be created by, for example, the user, application programmer, and/or speech interface developer. In this case, processor 203 may select, from the one or more natural language variants V, a prospective variant that most resembles or closely matches the recognized word sequence using any known technique for matching as described above. After the selection is made, the corresponding natural language exemplar E may be identified.
  • In any case, if an exact match for the natural language exemplar E corresponding to the recognized input string W is identified, the processor may identify an application action A (API call) corresponding to the exemplar E. Action A and corresponding exemplar(s) may be stored in, for example, [0061] context specification database 112 stored in memory 204. After the action A has been identified, controller 202 may cause the action A to be invoked by service 210.
  • In alternative embodiments, if there exists a difference between the recognized word sequence W and the natural language exemplar E or between the recognized word sequence W and the natural language variant V, the [0062] processor 203 may also generate string mapping function φ. String mapping function φ may specify the difference between the recognized word sequence W and the natural language exemplar E or between the recognized word sequence W and the natural language variant V. In this case, the processor 203 may then apply the string mapping function φ to the application action A that corresponds with the exemplar E, to produce the actual API call or adapted action A′. The controller 202 may cause the actual API call A′ to be executed by the service 210 to carry out the user's request.
  • In alternative embodiments of the invention, the processor may apply the string mapping function φ to the exemplar E to produce an adapted exemplar E′. The adapted exemplar E′ may be presented to the user via [0063] output synthesizer 105. The user may be asked to confirm whether the action desired by the user may be expressed by exemplar E or adapted exemplar E′. If the user accepts E or E′, the controller 202 executes action A or adapted action A′, respectively. If the user does not accept E or E′, then the processor 203 may continue processing the recognized input string W, as described above, until the user's request has been carried out. In alternative embodiments, if the user does not accept E or E′, the controller may ask the user to rephrase their request.
  • [0064] Application program 210 may execute the action A or adapted action A′ and return the resulting output O′ to the controller 202. The controller 202 may present output O′ to the user's terminal device 102 via output synthesizer 105.
  • Now the operation of an exemplary embodiment of the present invention will be described with reference to FIG. 3. [0065] User 101 may access SLP 104 of a speech-enabled service in accordance with the present invention (301). Session manager 103 may cause speech recognizer 106 to load (or switch to) the language model L for the active context related to the application program serviced by the SLP 104 (302). The user 101 may be presented with a greeting via output synthesizer 105, and the user may respond by articulating a command into an input of terminal device 102 (303). The speech recognizer 106 may receive input S and produce an output data string W (304). The output data string W may be a transcription hypothesis of the user's command.
  • [0066] Variation matcher 107 is applied to W to select an exemplar E from the active context C and to construct a string-mapping function φ (305). The exemplar adjuster 110 applies the string-mapping function φ in order to construct an adapted exemplar E′ and an adapted executable action A′ (306). The system asks the user for confirmation to proceed with the sequence of actions A′ by presenting to the user (via the output synthesizer 105) the English expression E′ (307) and asking user 101 whether the adapted action A′ as expressed by the adapted exemplar E′ is desired (308).
  • If the user selects, or says, "Yes," the session manager passes the adapted action A′ to the action invoker, which executes the action A′ and returns any resulting output O′ to the user via the output synthesizer 105 (309). [0067] The session manager may send this output (or a summary of it as appropriate) to the speech synthesizer or display. Based on the record (C, E, A, C′) in the active context specification, the active context for handling the next request by the user is changed to the context C′ associated with E in C (310).
  • [0068] If, in step 308, the user selects, or says, "No," indicating that the exemplar E′ does not express the action desired by the user, the speech recognizer produces another output string W based on the command (304). In embodiments of the present invention, the speech recognizer 106 may produce another output string W that may be different from the previously created W. The variation matcher 107 may receive another output string W, or may receive the same output string W and select a different exemplar E and mapping function φ. The system may, for example, re-execute steps 306 through 308 to construct an adapted action A′ and adapted exemplar E′ that is desired by the user. In other embodiments of the present invention, the controller may ask the user to rephrase their request.
  • FIGS. 4A and 4B show a flow chart applying embodiments of the present invention to an exemplary speech-enabled e-mail service. A user may desire to retrieve e-mail messages and may log on via the Internet or call the speech-enabled e-mail service. The user may articulate speech into a microphone of, for example, the [0069] terminal device 102 or a telephone or other communication device (not shown). The controller 103 may load the active context language model for the speech-enabled e-mail service from variant database 108 of SLP 104.
  • The user's input may be converted to an electrical signal that is passed to the recognizer as an input command S. Input command S may be, for example, "Was there anything new from Kathleen?" (401). [0070] Recognizer 106 may convert the command S into an output string W, which may be interpreted as "Is there any thing few from Kathleen?" (402). As indicated above, the recognizer 106 may be susceptible to errors depending on the clarity of the input or other external or internal variations; thus, for example, "new" may be interpreted as "few" by recognizer 106. Variation matcher 107 takes the string W and attempts to find a suitable match from the variant database 108. Variation matcher 107 may retrieve a stored variant V "Is there anything from Joe?" (403).
  • Based on the variant V, the [0071] variation matcher 107 may retrieve exemplar E (e.g., "Get the messages with sender Joe") that is associated with variant V of step 403 (404). Variation matcher 107 may construct a string mapping function φ that expresses the difference between output string W of step 402 and variant V of step 403 (405). String mapping function φ indicates the insertion of the word "few" and the substitution of the word "Joe" by "Kathleen" (405). In embodiments of the invention, various known techniques may be implemented to determine string-mapping function φ.
  • [0072] Variation matcher 107 may select an action A as "{mailAgent.setFolder("INBOX"); mailAgent.getmessages("From=Joe")}" based on the exemplar E of step 404 (406). The action A of step 406 is an exemplary action expressed as a line of code that the application program understands and may be able to execute. It is recognized that the line of code for action A is given by example only and that many different expressions can be written. A subset of string mapping function φ as applicable to action A (φA) is generated and may be applied to action A (407).
  • Adapted action A′ may be generated by applying φA to action A, [0073] resulting in the line of code, for example, "{mailAgent.setFolder("INBOX"); mailAgent.getmessages("From=Kathleen")}" (408). Adapted exemplar E′ may be, for example, "Get the messages with sender Kathleen" (409). The adapted exemplar E′ may be presented to the user; and if the user confirms that the user desires the adapted action A′ as expressed by the exemplar E′, the adapted action A′ may be executed by the API 114 of application program 115. Accordingly, messages from Kathleen, for example, may be presented to the user via output synthesizer 105.
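  • Tying the earlier fragments together (the match_variation and adjust sketches introduced above, with all names still illustrative), the Kathleen example of FIGS. 4A and 4B could be exercised roughly as follows:

      # End-to-end sketch of the FIGS. 4A-4B example using the illustrative helpers above.
      W = "Is there any thing few from Kathleen"
      pairs = [("Is there anything from Joe", "Get the messages with sender Joe")]
      action = 'mailAgent.setFolder("INBOX"); mailAgent.getMessages("From=Joe")'

      v, e, phi = match_variation(W, pairs)          # phi includes ("substitute", "Joe", "Kathleen")
      adapted_e, adapted_a = adjust(e, action, phi)
      # adapted_e -> "Get the messages with sender Kathleen"
      # adapted_a -> 'mailAgent.setFolder("INBOX"); mailAgent.getMessages("From=Kathleen")'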
  • Embodiments of the present invention may permit users to customize the [0074] variant database 108 so that they can create variants that closely represent the manner in which they would articulate a particular action. FIG. 5 shows a block diagram of a speech customization module 500 in accordance with embodiments of the present invention. The customization module 500 may be used to add personalized variants relating to stored exemplars E in variant database 108. Users 101 may use, for example, a known web browser 502 to access context customizer 503. Although a web browser is shown, a user may use a telephone or other suitable device to access context customizer 503.
  • [0075] Context customizer 503 may be coupled to variant database 108 and customization server 501. Users of the system 100 may access the generic context files Cg 109 stored in variant database 108 and create customized context files 504 stored in a customization server 501. Generic context files Cg 109 may contain, for example, a context identifier C, a variant V, and a corresponding exemplar E. Customization server 501 may contain customized context files 504 for a plurality of users U1-UN. Each customized file 504 may contain a personalized context containing personalized variants (e.g., V1 1, V1 2 to Vm n) personal to the user. User U1 may create one or more variants V corresponding to, for example, exemplar E. Thus, if the user U1 prefers to refer to a single action using varying commands, the user 101 may customize files 504 to reflect this preference. It is recognized that any language, for example, French, Spanish, etc., may be used in embodiments of the present invention.
  • In embodiments of the invention, [0076] user U1 may customize a context Cu1, adding to the variants associated with C in user U1's personal variant database file 504 by composing natural language requests V and associating them with natural language requests or lists of requests E, which are taken from the exemplars E associated with context C.
  • The [0077] customization module 500 may permit a user to create and edit natural-language to natural-language (e.g., English-to-English) customization files stored, for example, on a server 501, using a standard HTTP browser. User U1 may be authenticated by the customization module 500 using known techniques and may choose an application, and within that a context C, to customize. In one embodiment, the user may construct pairs of the form "When I say V1, I mean E" by choosing an exemplar E from among the available exemplars in C and entering a personalized variant V1 to be associated with that exemplar.
  • Once the user U1 customizes [0078] file 504 to reflect personalized variants, the resulting variants may be uploaded into variant database 108 in the form (U1, V1, E, C), indicating that the customized variant V1 belongs to user U1 and is related to exemplar E in context C. Accordingly, when the user U1 uses system 100 of FIG. 1, the customized context, including the customized variants, will be available to the user U1 in addition to any variants that may already be present in the database for all users. In embodiments of the present invention, for subsequent customizations, the user may be presented with their own custom version of any context they have customized in the past. Additionally, users may be able to revert back to the generic context Cg when desired.
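  • A sketch of how such personalized (U, V, E, C) records might be merged with the generic context at session time is given below; the record contents and helper function are illustrative assumptions:

      # Sketch: personalized variant records (user U, variant V, exemplar E, context C)
      # merged with the generic (V, E) pairs of that context when the user starts a session.
      personal_variants = [
          ("U1", "anything new in my inbox", "Retrieve my mail messages", "read_email"),
      ]

      def variants_for(user, context, generic_pairs):
          personal = [(v, e) for u, v, e, c in personal_variants
                      if u == user and c == context]
          return generic_pairs + personal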
  • The present invention has been described in terms of preferred and exemplary embodiments thereof. Numerous other embodiments, modifications and variations within the scope and spirit of the appended claims will occur to persons of ordinary skill in the art from a review of this disclosure. [0079]

Claims (61)

What is claimed is:
1. A method for providing speech-enabled application programs comprising:
responsive to an input string, selecting from one or more natural language variants a prospective variant that most resembles the input string; and
identifying a natural language exemplar via a mapping between the exemplar and the prospective variant.
2. The method of claim 1, wherein the mapping comprises:
mapping the one or more natural language variants with at least one natural language exemplar.
3. The method of claim 2, wherein the prospective variant corresponds to at least one natural language exemplar.
4. The method of claim 1, further comprising:
executing an action instruction associated with the identified natural language exemplar.
5. The method of claim 1, further comprising:
mapping a plurality of action instructions with a plurality of natural language exemplars, wherein each action instruction is associated with at least one natural language exemplar.
6. The method of claim 5, further comprising:
generating a mapping function that specifies a difference between the input string and the prospective variant.
7. The method of claim 6, further comprising:
applying the mapping function to the action instruction associated with the identified natural language exemplar to produce an adapted action instruction.
8. The method of claim 7, further comprising:
executing the produced adapted action instruction.
9. The method of claim 6, further comprising:
applying the mapping function to the identified natural language exemplar to produce an adapted exemplar.
10. The method of claim 9, further comprising
forwarding the adapted exemplar to a user to confirm whether the user desires an adapted action corresponding to the adapted exemplar.
11. The method of claim 10, further comprising:
executing the adapted action if the user confirms that an adapted exemplar expresses the action desired by the user.
12. The method of claim 11, further comprising:
if the user does not accept that the adapted exemplar expresses the action desired by the user, selecting from the one or more natural language variants an alternative prospective variant that most resembles the input string; and
identifying a natural language exemplar via a mapping between the exemplar and the alternative prospective variant.
13. The method of claim 12, further comprising:
executing an action instruction associated with the identified natural language exemplar.
14. The method of claim 2, further comprising:
storing one or more natural language variants mapped to at least one natural language exemplar in a memory.
15. The method of claim 14, wherein at least one natural language variant is input by a user.
16. The method of claim 14, wherein at least one natural language variant is input by an application developer.
17. The method of claim 14, wherein the at least one natural language exemplar is input by an application developer.
18. The method of claim 14, wherein the at least one natural language exemplar is produced automatically by a natural language generator.
19. The method of claim 14, further comprising:
producing at least one natural language variant by automatically generating paraphrases of the natural language exemplar.
20. The method of claim 1, further comprising:
loading an active context file relating to a service accessed by a user, the active context file containing the one or more natural language variants and the natural language exemplar.
21. The method of claim 1, further comprising:
comparing the input string with the one or more natural language variants.
22. The method of claim 1, wherein the input string is input by at least one of a keyboard, handwriting recognition device, a dial pad, and a speech recognition device.
23. A system for providing speech-enabled application programs comprising:
a voice recognizer to receive an input string and produce a recognized input string;
a memory to store one or more natural language variants corresponding to at least one natural language exemplar; and
a processor to:
select from the one or more natural language variants a prospective variant that most resembles the received recognized input string; and
identify the at least one natural language exemplar corresponding to the prospective variant.
24. The system of claim 23, further comprising:
a controller adapted to execute an action instruction associated with the identified natural language exemplar corresponding to the prospective variant.
25. The system of claim 23, the processor adapted to map a plurality of action instructions with a plurality of natural language exemplars, wherein each action instruction is associated with at least one natural language exemplar and the memory to store the mapped action instructions.
26. The system of claim 25, the processor adapted to further generate a mapping function that specifies a difference between the received recognized input string and the prospective variant.
27. The system of claim 26, the processor adapted to apply the mapping function to the action instruction associated with the identified natural language exemplar mapped to the prospective variant to produce an adapted action instruction.
28. The system of claim 27, the controller adapted to execute the produced adapted action instruction.
29. The system of claim 28, further comprising:
an output synthesizer to present a result of the executed instruction by providing data that can be presented to an audio or visual terminal device.
30. The system of claim 29, wherein the output synthesizer is at least one of a display format and a speech synthesizer.
31. The system of claim 23, further comprising:
an input device to generate an input string.
32. The system of claim 31, wherein said input device is at least one of a keyboard, handwriting recognition device, a dial pad, and a speech recognition device.
33. A machine-readable medium having stored thereon executable instructions for performing a method comprising:
responsive to an input string, selecting from one or more natural language variants a prospective variant that most resembles the input string; and
identifying a natural language exemplar via a mapping between the exemplar and the prospective variant.
34. The machine-readable medium of claim 33 having stored thereon further executable instructions for performing a method comprising:
mapping the one or more natural language variants with at least one natural language exemplar.
35. The machine-readable medium of claim 33 having stored thereon further executable instructions for performing a method comprising:
executing an action instruction associated with the identified natural language exemplar.
36. The machine-readable medium of claim 33 having stored thereon further executable instructions for performing a method comprising:
mapping a plurality of action instructions with a plurality of natural language exemplars, wherein each action instruction is associated with at least one natural language exemplar.
37. The machine-readable medium of claim 36 having stored thereon further executable instructions for performing a method comprising:
generating a mapping function that specifies a difference between the input string and the prospective variant.
38. The machine-readable medium of claim 37 having stored thereon further executable instructions for performing a method comprising:
applying the mapping function to the action instruction associated with the identified natural language exemplar to produce an adapted action instruction.
39. The machine-readable medium of claim 38 having stored thereon further executable instructions for performing a method comprising:
executing the produced adapted action instruction.
40. The machine-readable medium of claim 37 having stored thereon further executable instructions for performing a method comprising:
applying the mapping function to the identified natural language exemplar to produce an adapted exemplar.
41. The machine-readable medium of claim 40 having stored thereon further executable instructions for performing a method comprising:
forwarding the adapted exemplar to a user to confirm whether the user desires an adapted action corresponding to the adapted exemplar.
42. The machine-readable medium of claim 41 having stored thereon further executable instructions for performing a method comprising:
executing the adapted action if the user confirms that an adapted exemplar expresses the action desired by the user.
43. The machine-readable medium of claim 42 having stored thereon further executable instructions for performing a method comprising:
selecting from the one or more natural language variants an alternative prospective variant that most resembles the input string, if the user does not accept that the adapted exemplar expresses the action desired by the user; and
identifying a natural language exemplar via a mapping between the exemplar and the alternative prospective variant.
44. The machine-readable medium of claim 43 having stored thereon further executable instructions for performing a method comprising:
executing an action instruction associated with the identified natural language exemplar.
45. In a speech-enabled service, a method for creating customized files containing personalized command variants relating to the speech-enabled service, the method comprising:
accessing a context file relating to the speech enabled service, the context file containing a natural language exemplar associated with a desired action;
creating a customized variant for the desired action; and
correlating the created variant with the natural language exemplar.
46. The method of claim 45, wherein the created variant represents one preferred way of expressing the desired action.
47. The method of claim 46, further comprising:
storing the created variant in a customized context file, wherein during service access by a user the personalized context file is uploaded by the speech-enabled service allowing the user to express the desired action using the created variant.
48. The method of claim 45, wherein the context file is accessed using a web browser.
49. The method of claim 45, wherein the context file is accessed using a telephone.
50. A system for providing speech-enabled application programs comprising:
a memory to store one or more natural language variants corresponding to a natural language exemplar; and
a processor to:
select from the one or more natural language variants a prospective variant that most resembles an input string; and
identify a natural language exemplar via a mapping between the exemplar and the prospective variant.
51. The system of claim 50, further comprising:
a voice recognizer to receive the input string and produce a recognized input string.
52. The system of claim 50, further comprising:
a controller adapted to execute an action instruction associated with the identified natural language exemplar.
53. The system of claim 50, the processor adapted to map the one or more natural language variants with the natural language exemplar.
54. The system of claim 50, the processor adapted to map a plurality of action instructions with a plurality of natural language exemplars, wherein each action instruction is associated with at least one natural language exemplar and the memory to store the mapped action instructions.
55. The system of claim 51, the processor adapted to generate a mapping function that specifies a difference between the recognized input string and the prospective variant.
56. The system of claim 55, the processor adapted to apply the mapping function to an action instruction associated with the identified natural language exemplar to produce an adapted action instruction.
57. The system of claim 56, further comprising:
a controller adapted to execute the produced adapted action instruction.
58. The system of claim 57, further comprising:
an output synthesizer to present a result of the executed instruction by providing data that can be presented to an audio or visual terminal device.
59. The system of claim 58, wherein the output synthesizer is at least one of a display format and a speech synthesizer.
60. The system of claim 50, further comprising:
an input device to generate the input string.
61. The system of claim 60, wherein said input device is at least one of a keyboard, handwriting recognition device, a dial pad, and a speech recognition device.
US09/732,600 2000-12-08 2000-12-08 Method and apparatus for creation and user-customization of speech-enabled services Abandoned US20020072914A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US09/732,600 US20020072914A1 (en) 2000-12-08 2000-12-08 Method and apparatus for creation and user-customization of speech-enabled services
EP01310087A EP1215657A3 (en) 2000-12-08 2001-11-30 Method and apparatus for creation and user-customisation of speech enabled services
US10/103,049 US7212964B1 (en) 2000-12-08 2002-03-22 Language-understanding systems employing machine translation components
US11/215,756 US7912726B2 (en) 2000-12-08 2005-08-30 Method and apparatus for creation and user-customization of speech-enabled services
US11/656,155 US7467081B2 (en) 2000-12-08 2007-01-22 Language-understanding training database action pair augmentation using bidirectional translation
US12/336,429 US8073683B2 (en) 2000-12-08 2008-12-16 Language-understanding training database action pair augmentation using bidirectional translation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/732,600 US20020072914A1 (en) 2000-12-08 2000-12-08 Method and apparatus for creation and user-customization of speech-enabled services

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US10/103,049 Continuation-In-Part US7212964B1 (en) 2000-12-08 2002-03-22 Language-understanding systems employing machine translation components
US11/215,756 Division US7912726B2 (en) 2000-12-08 2005-08-30 Method and apparatus for creation and user-customization of speech-enabled services

Publications (1)

Publication Number Publication Date
US20020072914A1 true US20020072914A1 (en) 2002-06-13

Family

ID=24944199

Family Applications (5)

Application Number Title Priority Date Filing Date
US09/732,600 Abandoned US20020072914A1 (en) 2000-12-08 2000-12-08 Method and apparatus for creation and user-customization of speech-enabled services
US10/103,049 Expired - Fee Related US7212964B1 (en) 2000-12-08 2002-03-22 Language-understanding systems employing machine translation components
US11/215,756 Expired - Fee Related US7912726B2 (en) 2000-12-08 2005-08-30 Method and apparatus for creation and user-customization of speech-enabled services
US11/656,155 Expired - Lifetime US7467081B2 (en) 2000-12-08 2007-01-22 Language-understanding training database action pair augmentation using bidirectional translation
US12/336,429 Expired - Fee Related US8073683B2 (en) 2000-12-08 2008-12-16 Language-understanding training database action pair augmentation using bidirectional translation

Family Applications After (4)

Application Number Title Priority Date Filing Date
US10/103,049 Expired - Fee Related US7212964B1 (en) 2000-12-08 2002-03-22 Language-understanding systems employing machine translation components
US11/215,756 Expired - Fee Related US7912726B2 (en) 2000-12-08 2005-08-30 Method and apparatus for creation and user-customization of speech-enabled services
US11/656,155 Expired - Lifetime US7467081B2 (en) 2000-12-08 2007-01-22 Language-understanding training database action pair augmentation using bidirectional translation
US12/336,429 Expired - Fee Related US8073683B2 (en) 2000-12-08 2008-12-16 Language-understanding training database action pair augmentation using bidirectional translation

Country Status (2)

Country Link
US (5) US20020072914A1 (en)
EP (1) EP1215657A3 (en)

Cited By (130)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030061054A1 (en) * 2001-09-25 2003-03-27 Payne Michael J. Speaker independent voice recognition (SIVR) using dynamic assignment of speech contexts, dynamic biasing, and multi-pass parsing
US20030061053A1 (en) * 2001-09-27 2003-03-27 Payne Michael J. Method and apparatus for processing inputs into a computing device
US20030060181A1 (en) * 2001-09-19 2003-03-27 Anderson David B. Voice-operated two-way asynchronous radio
US20030130875A1 (en) * 2002-01-04 2003-07-10 Hawash Maher M. Real-time prescription renewal transaction across a network
US20030130868A1 (en) * 2002-01-04 2003-07-10 Rohan Coelho Real-time prescription transaction with adjudication across a network
US20030216913A1 (en) * 2002-05-14 2003-11-20 Microsoft Corporation Natural input recognition tool
US20040030559A1 (en) * 2001-09-25 2004-02-12 Payne Michael J. Color as a visual cue in speech-enabled applications
US20040092293A1 (en) * 2002-11-06 2004-05-13 Samsung Electronics Co., Ltd. Third-party call control type simultaneous interpretation system and method thereof
US20050246177A1 (en) * 2004-04-30 2005-11-03 Sbc Knowledge Ventures, L.P. System, method and software for enabling task utterance recognition in speech enabled systems
US20050283367A1 (en) * 2004-06-17 2005-12-22 International Business Machines Corporation Method and apparatus for voice-enabling an application
US6985865B1 (en) * 2001-09-26 2006-01-10 Sprint Spectrum L.P. Method and system for enhanced response to voice commands in a voice command platform
US20060056602A1 (en) * 2004-09-13 2006-03-16 Sbc Knowledge Ventures, L.P. System and method for analysis and adjustment of speech-enabled systems
US20060069569A1 (en) * 2004-09-16 2006-03-30 Sbc Knowledge Ventures, L.P. System and method for optimizing prompts for speech-enabled applications
US20060136222A1 (en) * 2004-12-22 2006-06-22 New Orchard Road Enabling voice selection of user preferences
US7231343B1 (en) * 2001-12-20 2007-06-12 Ianywhere Solutions, Inc. Synonyms mechanism for natural language systems
US20080133220A1 (en) * 2006-12-01 2008-06-05 Microsoft Corporation Leveraging back-off grammars for authoring context-free grammars
US20080183474A1 (en) * 2007-01-30 2008-07-31 Damion Alexander Bethune Process for creating and administrating tests made from zero or more picture files, sound bites on handheld device
US20100202598A1 (en) * 2002-09-16 2010-08-12 George Backhaus Integrated Voice Navigation System and Method
US20100281435A1 (en) * 2009-04-30 2010-11-04 At&T Intellectual Property I, L.P. System and method for multimodal interaction using robust gesture processing
CN102681463A (en) * 2012-05-22 2012-09-19 青岛四方车辆研究所有限公司 Compact-type expanded input-output (IO) device
US8606584B1 (en) * 2001-10-24 2013-12-10 Harris Technology, Llc Web based communication of information with reconfigurable format
CN103901795A (en) * 2012-12-26 2014-07-02 中国科学院软件研究所 CPLD (Complex Programmable Logic Device)-based IO-station digital input module and input method
WO2014144949A3 (en) * 2013-03-15 2014-11-20 Apple Inc. Training an at least partial voice command system
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US9300784B2 (en) 2013-06-13 2016-03-29 Apple Inc. System and method for emergency calls initiated by voice command
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US9633660B2 (en) 2010-02-25 2017-04-25 Apple Inc. User profiling for voice input processing
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US9697822B1 (en) 2013-03-15 2017-07-04 Apple Inc. System and method for updating an adaptive speech recognition model
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
US9953027B2 (en) 2016-09-15 2018-04-24 International Business Machines Corporation System and method for automatic, unsupervised paraphrase generation using a novel framework that learns syntactic construct while retaining semantic meaning
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
US9984063B2 (en) * 2016-09-15 2018-05-29 International Business Machines Corporation System and method for automatic, unsupervised paraphrase generation using a novel framework that learns syntactic construct while retaining semantic meaning
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US10169329B2 (en) 2014-05-30 2019-01-01 Apple Inc. Exemplar-based natural language processing
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US20190075167A1 (en) * 2017-09-07 2019-03-07 Samsung Electronics Co., Ltd. Electronic device, server and recording medium supporting task execution using external device
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US20190124031A1 (en) * 2017-10-20 2019-04-25 Sap Se Message processing for cloud computing applications
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US10568032B2 (en) 2007-04-03 2020-02-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US11126446B2 (en) * 2019-10-15 2021-09-21 Microsoft Technology Licensing, Llc Contextual extensible skills framework across surfaces
US11217255B2 (en) 2017-05-16 2022-01-04 Apple Inc. Far-field extension for digital assistant services
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020072914A1 (en) * 2000-12-08 2002-06-13 Hiyan Alshawi Method and apparatus for creation and user-customization of speech-enabled services
DE10203368B4 (en) * 2002-01-29 2007-12-20 Siemens Ag Method and device for establishing a telephone connection
EP1447793A1 (en) * 2003-02-12 2004-08-18 Hans Dr. Kuebler User-specific customization of voice browser for internet and intranet
FR2868588A1 (en) * 2004-04-02 2005-10-07 France Telecom VOICE APPLICATION SYSTEM
US8942985B2 (en) 2004-11-16 2015-01-27 Microsoft Corporation Centralized method and system for clarifying voice commands
US7703037B2 (en) 2005-04-20 2010-04-20 Microsoft Corporation Searchable task-based interface to control panel functionality
US7925975B2 (en) 2006-03-10 2011-04-12 Microsoft Corporation Searching for commands to execute in applications
US7848915B2 (en) * 2006-08-09 2010-12-07 International Business Machines Corporation Apparatus for providing feedback of translation quality using concept-based back translation
CN105117376B (en) * 2007-04-10 2018-07-10 Google LLC Multi-mode input method editor
US9779079B2 (en) * 2007-06-01 2017-10-03 Xerox Corporation Authoring system
JP5235344B2 (en) * 2007-07-03 2013-07-10 Toshiba Corporation Apparatus, method and program for machine translation
US8635069B2 (en) 2007-08-16 2014-01-21 Crimson Corporation Scripting support for data identifiers, voice recognition and speech in a telnet session
JP5100445B2 (en) * 2008-02-28 2012-12-19 Toshiba Corporation Machine translation apparatus and method
US8521516B2 (en) * 2008-03-26 2013-08-27 Google Inc. Linguistic key normalization
US8700385B2 (en) * 2008-04-04 2014-04-15 Microsoft Corporation Providing a task description name space map for the information worker
US8352244B2 (en) * 2009-07-21 2013-01-08 International Business Machines Corporation Active learning systems and methods for rapid porting of machine translation systems to new language pairs or new domains
JP2011033680A (en) * 2009-07-30 2011-02-17 Sony Corp Voice processing device and method, and program
US9063931B2 (en) * 2011-02-16 2015-06-23 Ming-Yuan Wu Multiple language translation system
CN104040238B (en) 2011-11-04 2017-06-27 Handylab Inc Polynucleotides sample preparation apparatus
US9336302B1 (en) 2012-07-20 2016-05-10 Zuci Realty Llc Insight and algorithmic clustering for automated synthesis
US10033797B1 (en) 2014-08-20 2018-07-24 Ivanti, Inc. Terminal emulation over HTML
JP6466138B2 (en) 2014-11-04 2019-02-06 Toshiba Corporation Foreign language sentence creation support apparatus, method and program
US9472196B1 (en) * 2015-04-22 2016-10-18 Google Inc. Developer voice actions system
DE102015006662B4 (en) 2015-05-22 2019-11-14 Audi Ag Method for configuring a voice control device
US9401142B1 (en) 2015-09-07 2016-07-26 Voicebox Technologies Corporation System and method for validating natural language content using crowdsourced validation jobs
US9519766B1 (en) 2015-09-07 2016-12-13 Voicebox Technologies Corporation System and method of providing and validating enhanced CAPTCHAs
WO2017044409A1 (en) 2015-09-07 2017-03-16 Voicebox Technologies Corporation System and method of annotating utterances based on tags assigned by unmanaged crowds
WO2017044415A1 (en) * 2015-09-07 2017-03-16 Voicebox Technologies Corporation System and method for eliciting open-ended natural language responses to questions to train natural language processors
JP6481643B2 (en) * 2016-03-08 2019-03-13 Toyota Motor Corporation Audio processing system and audio processing method
US11100278B2 (en) 2016-07-28 2021-08-24 Ivanti, Inc. Systems and methods for presentation of a terminal application screen
US11205103B2 (en) 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis
US11354521B2 (en) 2018-03-07 2022-06-07 Google Llc Facilitating communications with automated assistants in multiple languages
AU2018412575B2 (en) * 2018-03-07 2021-03-18 Google Llc Facilitating end-to-end communications with automated assistants in multiple languages
JP7132090B2 (en) * 2018-11-07 2022-09-06 Toshiba Corporation Dialogue system, dialogue device, dialogue method, and program
US11575999B2 (en) 2020-01-16 2023-02-07 Meta Platforms Technologies, Llc Systems and methods for hearing assessment and audio adjustment
RU2758683C2 (en) * 2020-04-28 2021-11-01 Public Joint Stock Company Sberbank of Russia (PJSC Sberbank) System and method for augmentation of the training sample for machine learning algorithms
US11664010B2 (en) 2020-11-03 2023-05-30 Florida Power & Light Company Natural language domain corpus data set creation based on enhanced root utterances
US20230214604A1 (en) * 2022-01-06 2023-07-06 PRIVACY4CARS, Inc. Translating technical operating instruction

Family Cites Families (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5454062A (en) * 1991-03-27 1995-09-26 Audio Navigation Systems, Inc. Method for recognizing spoken words
JPH05197573A (en) * 1991-08-26 1993-08-06 Hewlett Packard Co <Hp> Task controlling system with task oriented paradigm
US5493692A (en) * 1993-12-03 1996-02-20 Xerox Corporation Selective delivery of electronic messages in a multiple computer system based on context and environment of a user
US5544354A (en) * 1994-07-18 1996-08-06 Ikonic Interactive, Inc. Multimedia matrix architecture user interface
JP3066274B2 (en) * 1995-01-12 2000-07-17 Sharp Corporation Machine translation equipment
JPH09128396A (en) * 1995-11-06 1997-05-16 Hitachi Ltd Preparation method for bilingual dictionary
US5823879A (en) * 1996-01-19 1998-10-20 Sheldon F. Goldberg Network gaming system
US6341372B1 (en) 1997-05-01 2002-01-22 William E. Datig Universal machine translator of arbitrary languages
US5974413A (en) * 1997-07-03 1999-10-26 Activeword Systems, Inc. Semantic user interface
WO1999046763A1 (en) 1998-03-09 1999-09-16 Lernout & Hauspie Speech Products N.V. Apparatus and method for simultaneous multimode dictation
JP3059413B2 (en) * 1998-03-16 2000-07-04 ATR Interpreting Telecommunications Research Laboratories Natural language understanding device and natural language understanding system
US7051277B2 (en) * 1998-04-17 2006-05-23 International Business Machines Corporation Automated assistant for organizing electronic documents
US6070142A (en) * 1998-04-17 2000-05-30 Andersen Consulting Llp Virtual customer sales and service center and method
US6345243B1 (en) * 1998-05-27 2002-02-05 Lionbridge Technologies, Inc. System, method, and product for dynamically propagating translations in a translation-memory system
US6144375A (en) * 1998-08-14 2000-11-07 Praja Inc. Multi-perspective viewer for content-based interactivity
US6327346B1 (en) * 1998-09-01 2001-12-04 At&T Corp. Method and apparatus for setting user communication parameters based on voice identification of users
US6453292B2 (en) 1998-10-28 2002-09-17 International Business Machines Corporation Command boundary identifier for conversational natural language
US7082397B2 (en) 1998-12-01 2006-07-25 Nuance Communications, Inc. System for and method of creating and browsing a voice web
US6275789B1 (en) * 1998-12-18 2001-08-14 Leo Moser Method and apparatus for performing full bidirectional translation between a source language and a linked alternative language
US6978262B2 (en) * 1999-01-05 2005-12-20 Tsai Daniel E Distributed database schema
US6397212B1 (en) * 1999-03-04 2002-05-28 Peter Biffar Self-learning and self-personalizing knowledge search engine that delivers holistic results
JP3016779B1 (en) * 1999-03-08 2000-03-06 ATR Interpreting Telecommunications Research Laboratories Voice understanding device and voice understanding system
US6408272B1 (en) * 1999-04-12 2002-06-18 General Magic, Inc. Distributed voice user interface
US6745165B2 (en) * 1999-06-16 2004-06-01 International Business Machines Corporation Method and apparatus for recognizing from here to here voice command structures in a finite grammar speech recognition system
US6178404B1 (en) 1999-07-23 2001-01-23 Intervoice Limited Partnership System and method to facilitate speech enabled user interfaces by prompting with possible transaction phrases
US6658388B1 (en) 1999-09-10 2003-12-02 International Business Machines Corporation Personality generator for conversational systems
US6684183B1 (en) 1999-12-06 2004-01-27 Comverse Ltd. Generic natural language service creation environment
US6748361B1 (en) * 1999-12-14 2004-06-08 International Business Machines Corporation Personal speech assistant supporting a dialog manager
US6701362B1 (en) * 2000-02-23 2004-03-02 Purpleyogi.Com Inc. Method for creating user profiles
US7249159B1 (en) * 2000-03-16 2007-07-24 Microsoft Corporation Notification platform architecture
US6782356B1 (en) * 2000-10-03 2004-08-24 Hewlett-Packard Development Company, L.P. Hierarchical language chunking translation table
US6922670B2 (en) 2000-10-24 2005-07-26 Sanyo Electric Co., Ltd. User support apparatus and system using agents
US20020072914A1 (en) * 2000-12-08 2002-06-13 Hiyan Alshawi Method and apparatus for creation and user-customization of speech-enabled services

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5729659A (en) * 1995-06-06 1998-03-17 Potter; Jerry L. Method and apparatus for controlling a digital computer using oral input
US5675707A (en) * 1995-09-15 1997-10-07 At&T Automated call router system and method
US6138100A (en) * 1998-04-14 2000-10-24 At&T Corp. Interface for a voice-activated connection system
US6311159B1 (en) * 1998-10-05 2001-10-30 Lernout & Hauspie Speech Products N.V. Speech controlled computer user interface
US6122614A (en) * 1998-11-20 2000-09-19 Custom Speech Usa, Inc. System and method for automating transcription services
US6327566B1 (en) * 1999-06-16 2001-12-04 International Business Machines Corporation Method and apparatus for correcting misinterpreted voice commands in a speech recognition system
US6324512B1 (en) * 1999-08-26 2001-11-27 Matsushita Electric Industrial Co., Ltd. System and method for allowing family members to access TV contents and program media recorder over telephone or internet

Cited By (182)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
US7158499B2 (en) * 2001-09-19 2007-01-02 Mitsubishi Electric Research Laboratories, Inc. Voice-operated two-way asynchronous radio
US20030060181A1 (en) * 2001-09-19 2003-03-27 Anderson David B. Voice-operated two-way asynchronous radio
US20040030559A1 (en) * 2001-09-25 2004-02-12 Payne Michael J. Color as a visual cue in speech-enabled applications
US20030061054A1 (en) * 2001-09-25 2003-03-27 Payne Michael J. Speaker independent voice recognition (SIVR) using dynamic assignment of speech contexts, dynamic biasing, and multi-pass parsing
US6985865B1 (en) * 2001-09-26 2006-01-10 Sprint Spectrum L.P. Method and system for enhanced response to voice commands in a voice command platform
US20030061053A1 (en) * 2001-09-27 2003-03-27 Payne Michael J. Method and apparatus for processing inputs into a computing device
US8606584B1 (en) * 2001-10-24 2013-12-10 Harris Technology, Llc Web based communication of information with reconfigurable format
US20090144248A1 (en) * 2001-12-20 2009-06-04 Sybase 365, Inc. Context-Based Suggestions Mechanism and Adaptive Push Mechanism for Natural Language Systems
US8036877B2 (en) 2001-12-20 2011-10-11 Sybase, Inc. Context-based suggestions mechanism and adaptive push mechanism for natural language systems
US7231343B1 (en) * 2001-12-20 2007-06-12 Ianywhere Solutions, Inc. Synonyms mechanism for natural language systems
US20030130868A1 (en) * 2002-01-04 2003-07-10 Rohan Coelho Real-time prescription transaction with adjudication across a network
US20030130875A1 (en) * 2002-01-04 2003-07-10 Hawash Maher M. Real-time prescription renewal transaction across a network
US20030216913A1 (en) * 2002-05-14 2003-11-20 Microsoft Corporation Natural input recognition tool
US7380203B2 (en) * 2002-05-14 2008-05-27 Microsoft Corporation Natural input recognition tool
US20100202598A1 (en) * 2002-09-16 2010-08-12 George Backhaus Integrated Voice Navigation System and Method
US8145495B2 (en) * 2002-09-16 2012-03-27 Movius Interactive Corporation Integrated voice navigation system and method
US20040092293A1 (en) * 2002-11-06 2004-05-13 Samsung Electronics Co., Ltd. Third-party call control type simultaneous interpretation system and method thereof
US20050246177A1 (en) * 2004-04-30 2005-11-03 Sbc Knowledge Ventures, L.P. System, method and software for enabling task utterance recognition in speech enabled systems
US20050283367A1 (en) * 2004-06-17 2005-12-22 International Business Machines Corporation Method and apparatus for voice-enabling an application
US8768711B2 (en) * 2004-06-17 2014-07-01 Nuance Communications, Inc. Method and apparatus for voice-enabling an application
US20070027694A1 (en) * 2004-09-13 2007-02-01 Bushey Robert R System and method for analysis and adjustment of speech-enabled systems
US7110949B2 (en) 2004-09-13 2006-09-19 At&T Knowledge Ventures, L.P. System and method for analysis and adjustment of speech-enabled systems
US8117030B2 (en) 2004-09-13 2012-02-14 At&T Intellectual Property I, L.P. System and method for analysis and adjustment of speech-enabled systems
US20060056602A1 (en) * 2004-09-13 2006-03-16 Sbc Knowledge Ventures, L.P. System and method for analysis and adjustment of speech-enabled systems
US7653549B2 (en) 2004-09-16 2010-01-26 At&T Intellectual Property I, L.P. System and method for facilitating call routing using speech recognition
US20060143015A1 (en) * 2004-09-16 2006-06-29 Sbc Technology Resources, Inc. System and method for facilitating call routing using speech recognition
US20060069569A1 (en) * 2004-09-16 2006-03-30 Sbc Knowledge Ventures, L.P. System and method for optimizing prompts for speech-enabled applications
US7043435B2 (en) 2004-09-16 2006-05-09 Sbc Knowledge Ventures, L.P. System and method for optimizing prompts for speech-enabled applications
US20080040118A1 (en) * 2004-09-16 2008-02-14 Knott Benjamin A System and method for facilitating call routing using speech recognition
US9083798B2 (en) * 2004-12-22 2015-07-14 Nuance Communications, Inc. Enabling voice selection of user preferences
US20060136222A1 (en) * 2004-12-22 2006-06-22 New Orchard Road Enabling voice selection of user preferences
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US20120095752A1 (en) * 2006-12-01 2012-04-19 Microsoft Corporation Leveraging back-off grammars for authoring context-free grammars
US20080133220A1 (en) * 2006-12-01 2008-06-05 Microsoft Corporation Leveraging back-off grammars for authoring context-free grammars
US8108205B2 (en) * 2006-12-01 2012-01-31 Microsoft Corporation Leveraging back-off grammars for authoring context-free grammars
US8862468B2 (en) * 2006-12-01 2014-10-14 Microsoft Corporation Leveraging back-off grammars for authoring context-free grammars
US20080183474A1 (en) * 2007-01-30 2008-07-31 Damion Alexander Bethune Process for creating and administrating tests made from zero or more picture files, sound bites on handheld device
US10568032B2 (en) 2007-04-03 2020-02-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US10381016B2 (en) 2008-01-03 2019-08-13 Apple Inc. Methods and apparatus for altering audio output signals
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US9865248B2 (en) 2008-04-05 2018-01-09 Apple Inc. Intelligent text-to-speech conversion
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US10108612B2 (en) 2008-07-31 2018-10-23 Apple Inc. Mobile device having human language translation capability with positional feedback
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US20100281435A1 (en) * 2009-04-30 2010-11-04 At&T Intellectual Property I, L.P. System and method for multimodal interaction using robust gesture processing
US10475446B2 (en) 2009-06-05 2019-11-12 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10795541B2 (en) 2009-06-05 2020-10-06 Apple Inc. Intelligent organization of tasks items
US11080012B2 (en) 2009-06-05 2021-08-03 Apple Inc. Interface for a virtual digital assistant
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US9548050B2 (en) 2010-01-18 2017-01-17 Apple Inc. Intelligent automated assistant
US10706841B2 (en) 2010-01-18 2020-07-07 Apple Inc. Task flow identification based on user intent
US10049675B2 (en) 2010-02-25 2018-08-14 Apple Inc. User profiling for voice input processing
US9633660B2 (en) 2010-02-25 2017-04-25 Apple Inc. User profiling for voice input processing
US10102359B2 (en) 2011-03-21 2018-10-16 Apple Inc. Device access using voice authentication
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
CN102681463A (en) * 2012-05-22 2012-09-19 青岛四方车辆研究所有限公司 Compact-type expanded input-output (IO) device
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
CN103901795A (en) * 2012-12-26 2014-07-02 中国科学院软件研究所 CPLD (Complex Programmable Logic Device)-based IO-station digital input module and input method
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
WO2014144949A3 (en) * 2013-03-15 2014-11-20 Apple Inc. Training an at least partial voice command system
US9922642B2 (en) 2013-03-15 2018-03-20 Apple Inc. Training an at least partial voice command system
US9697822B1 (en) 2013-03-15 2017-07-04 Apple Inc. System and method for updating an adaptive speech recognition model
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US9966060B2 (en) 2013-06-07 2018-05-08 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US10657961B2 (en) 2013-06-08 2020-05-19 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US9300784B2 (en) 2013-06-13 2016-03-29 Apple Inc. System and method for emergency calls initiated by voice command
US10497365B2 (en) 2014-05-30 2019-12-03 Apple Inc. Multi-command single utterance input method
US10169329B2 (en) 2014-05-30 2019-01-01 Apple Inc. Exemplar-based natural language processing
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US10904611B2 (en) 2014-06-30 2021-01-26 Apple Inc. Intelligent automated assistant for TV user interactions
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US9668024B2 (en) 2014-06-30 2017-05-30 Apple Inc. Intelligent automated assistant for TV user interactions
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US10431204B2 (en) 2014-09-11 2019-10-01 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9986419B2 (en) 2014-09-30 2018-05-29 Apple Inc. Social reminders
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US11556230B2 (en) 2014-12-02 2023-01-17 Apple Inc. Data detection
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US10311871B2 (en) 2015-03-08 2019-06-04 Apple Inc. Competing devices responding to voice triggers
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
US11069347B2 (en) 2016-06-08 2021-07-20 Apple Inc. Intelligent automated assistant for media exploration
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US9984063B2 (en) * 2016-09-15 2018-05-29 International Business Machines Corporation System and method for automatic, unsupervised paraphrase generation using a novel framework that learns syntactic construct while retaining semantic meaning
US9953027B2 (en) 2016-09-15 2018-04-24 International Business Machines Corporation System and method for automatic, unsupervised paraphrase generation using a novel framework that learns syntactic construct while retaining semantic meaning
US10553215B2 (en) 2016-09-23 2020-02-04 Apple Inc. Intelligent automated assistant
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US11217255B2 (en) 2017-05-16 2022-01-04 Apple Inc. Far-field extension for digital assistant services
US20190075167A1 (en) * 2017-09-07 2019-03-07 Samsung Electronics Co., Ltd. Electronic device, server and recording medium supporting task execution using external device
US11032374B2 (en) * 2017-09-07 2021-06-08 Samsung Electronics Co., Ltd. Electronic device, server and recording medium supporting task execution using external device
US11765234B2 (en) 2017-09-07 2023-09-19 Samsung Electronics Co., Ltd. Electronic device, server and recording medium supporting task execution using external device
US20190124031A1 (en) * 2017-10-20 2019-04-25 Sap Se Message processing for cloud computing applications
US10826857B2 (en) * 2017-10-20 2020-11-03 Sap Se Message processing for cloud computing applications
US11126446B2 (en) * 2019-10-15 2021-09-21 Microsoft Technology Licensing, Llc Contextual extensible skills framework across surfaces

Also Published As

Publication number Publication date
US7212964B1 (en) 2007-05-01
US7912726B2 (en) 2011-03-22
US7467081B2 (en) 2008-12-16
EP1215657A3 (en) 2005-04-27
EP1215657A2 (en) 2002-06-19
US20060004575A1 (en) 2006-01-05
US8073683B2 (en) 2011-12-06
US20090099837A1 (en) 2009-04-16
US20070118352A1 (en) 2007-05-24

Similar Documents

Publication Publication Date Title
US7912726B2 (en) Method and apparatus for creation and user-customization of speech-enabled services
US7869998B1 (en) Voice-enabled dialog system
US8645122B1 (en) Method of handling frequently asked questions in a natural language dialog service
EP1380153B1 (en) Voice response system
US7197460B1 (en) System for handling frequently asked questions in a natural language dialog service
US7024363B1 (en) Methods and apparatus for contingent transfer and execution of spoken language interfaces
Reddy et al. Speech to text conversion using android platform
US6366882B1 (en) Apparatus for converting speech to text
US6801897B2 (en) Method of providing concise forms of natural commands
EP1602102B1 (en) Management of conversations
Black et al. Building synthetic voices
US7146323B2 (en) Method and system for gathering information by voice input
RU2352979C2 (en) Synchronous comprehension of semantic objects for highly active interface
US6246989B1 (en) System and method for providing an adaptive dialog function choice model for various communication devices
US20080208586A1 (en) Enabling Natural Language Understanding In An X+V Page Of A Multimodal Application
GB2323694A (en) Adaptation in speech to text conversion
EP1215656B1 (en) Idiom handling in voice service systems
CA2346145A1 (en) Speech controlled computer user interface
JP6625772B2 (en) Search method and electronic device using the same
Primorac et al. Android application for sending SMS messages with speech recognition interface
US7069513B2 (en) System, method and computer program product for a transcription graphical user interface
JPH07222248A (en) System for utilizing speech information for portable information terminal
Davies et al. The IBM conversational telephony system for financial applications.
US20060031853A1 (en) System and method for optimizing processing speed to run multiple dialogs between multiple users and a virtual agent
US20020138276A1 (en) System, method and computer program product for a distributed speech recognition tuning platform

Legal Events

Date Code Title Description
AS Assignment

Owner name: AT&T CORP., NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ALSHAWI, HIYAN;DOUGLAS, SHONA;REEL/FRAME:011382/0616

Effective date: 20001207

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: NUANCE COMMUNICATIONS, INC., MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AT&T INTELLECTUAL PROPERTY II, L.P.;REEL/FRAME:041512/0608

Effective date: 20161214