US20020072914A1 - Method and apparatus for creation and user-customization of speech-enabled services - Google Patents
- Publication number
- US20020072914A1 (application US09/732,600)
- Authority
- US
- United States
- Prior art keywords
- exemplar
- natural language
- action
- variant
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
- G06F40/45—Example-based machine translation; Alignment
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/487—Arrangements for providing information services, e.g. recorded voice services or time announcements
- H04M3/493—Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
- H04M3/4936—Speech interaction details
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/40—Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
Definitions
- This invention relates generally to speech recognition technology. More particularly, the invention relates to development and customization of spoken language interfaces for a plurality of speech-enabled systems and sub-systems.
- Speech recognition technology has been applied to a variety of interactive spoken language services to reduce costs.
- Services benefiting from a spoken language interface may include, for example, services providing products and/or services, e-mail services, and telephone banking and/or brokerage services.
- Speech-enabled systems permit users to verbally articulate to the system a command relating to desired actions. The speech-enabled system recognizes the command and performs the desired action.
- the underlying technologies utilized in such speech-enabled systems include, for example, speech recognition and speech synthesis technologies, computer-telephony integration, language interpretation, and dialog and response generation technologies. The role of each technology in speech-enabled systems is briefly described below.
- speech recognition technology is used to convert an input of human speech into a digitized representation.
- speech synthesis takes a digitized representation of human speech or a computer-generated command and converts these into outputs that can be perceived by a human—for example, a computer-generated audio signal corresponding to the text form of a sentence.
- Known computer-telephony integration technology is typically used to interface the telephony network (which may be switched or packet-based) to, for example, a personal computer having the speech recognition and speech synthesis technologies.
- the computer-telephony platform can send and receive, over a network, digitized speech (to support recognition and synthesis, respectively) to and from a user during a telephony call.
- the computer-telephony integration technology is used to handle telephony signaling functions such as call termination and touch-tone detection.
- Language interpretation systems convert the digitized representation of the human speech into a computer-executable action related to the underlying application and/or service for which the spoken language interface is used—for example, a speech-enabled e-mail service.
- the dialog and response generation systems generate and control the system response for the speech-enabled service which may correspond to, for example, the answer to the user's question, a request for clarification or confirmation, or a request for additional information from the user.
- the dialog and response systems typically utilize the speech synthesis systems or other output devices (e.g., a display) to present information to the user.
- the dialog and response generation component may be responsible for predicting the grammar (also called the “language model”) that is to be used by the speech recognizer to constrain or narrow the required processing for the next spoken input by the user. For example, in a speech-enabled e-mail service, if the user has indicated the need to retrieve messages, the speech recognizer may limit processing for possible commands relating to retrieving messages the user may use.
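As an illustrative sketch (not taken from the patent), the grammar-prediction step just described can be modeled as a lookup from dialog state to the constrained set of phrases the recognizer will accept on the next turn. The state names and phrases below are hypothetical placeholders.

```python
# Hedged sketch of grammar prediction: the dialog component selects the
# language model (here, simply a set of allowed phrases) that constrains
# the recognizer for the next user turn. States and phrases are
# illustrative assumptions.
GRAMMARS = {
    "main_menu": {"retrieve my messages", "send a message", "log out"},
    "retrieving": {"read the next message", "delete this message", "go back"},
}

def predict_grammar(dialog_state):
    """Return the constrained grammar for the recognizer's next turn."""
    return GRAMMARS[dialog_state]
```

After the user asks to retrieve messages, the recognizer would load `GRAMMARS["retrieving"]` and need only consider message-related commands.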
- semantic representations are computer data structures or code intended to encode the meaning of a sentence (or multiple sentences) spoken by a user (e.g., in a language interpretation system), or to encode the intended meaning of the system's response to the user (e.g., in a dialog and response generation system).
- Various types of such intermediate semantic representations are used including hierarchically embedded value-attribute lists (also called “frames”) as well as representations based on formal logic.
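A hedged sketch of what a hierarchically embedded value-attribute list (a "frame") might look like in code; the attribute names and the request being encoded are illustrative assumptions, not structures from the patent.

```python
# One possible encoding of the meaning of "Read my new messages from
# Alice" as a nested attribute-value frame. All attribute names here are
# hypothetical.
frame = {
    "act": "retrieve",
    "object": {
        "type": "email_message",
        "status": "new",
        "sender": "Alice",
    },
}

def frame_attribute(frame, path):
    """Walk a nested frame along a sequence of attribute names."""
    node = frame
    for key in path:
        node = node[key]
    return node
```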
- the language interpretation component converts the recognized word sequence (or digitized representation) into an instance of the intermediate semantic representation.
- Various means have been used for this conversion step, including conversion according to rules that trigger off of keywords and phrases, and conversion according to a manually written or statistically trained transition network.
- the resulting intermediate representation is then mapped into the actual executable application actions. This second conversion phase is often achieved by an exhaustive set of manually authored rules or by a computer program written specifically for this spoken language application.
- This approach requires programming experts familiar with the speech-enabled interfaces and programming experts familiar with the underlying application programs. As a result, speech-enabled interfaces using this approach can be very expensive to develop and/or customize.
- Another conventional method uses customized software modules for interfacing with the language interpretation system to determine which application-specific action to execute for a given recognized input sequence.
- customized software modules need to be developed for each application and for handling the various application-specific commands.
- this conventional approach for developing speech-enabled interfaces can be costly due to increased development times.
- What is needed is a system and method for creating and customizing speech-enabled services that may solve the difficulties encountered using conventional approaches. For example, what is needed is an efficient speech-enabled interface that is not only robust and flexible, but can also be easily customized by users so that personal language preferences can be used.
- Embodiments of the invention relate to a system and method for providing speech-enabled application programs.
- the speech-enabled programs automatically execute requests input by users.
- One or more natural language variants may be mapped with at least one natural language exemplar.
- the natural language exemplar may correspond to a typical way to express a request relevant to the speech-enabled application program.
- the natural language variant may correspond to an alternative way of expressing the same request.
- a recognized input string is received and a prospective variant that most resembles the received recognized input string is selected from the natural language variants.
- the natural language exemplar mapped to the prospective variant is identified.
- An action instruction associated with the identified natural language exemplar is executed to fulfill the user's request.
- users of the system can create a plurality of personalized natural language variants that represent preferred ways of expressing the desired requests. Accordingly, the system may be able to recognize the plurality of variants and execute the action as specified by the user's request.
- FIG. 1 is a diagrammatic representation of a system in accordance with embodiments of the present invention.
- FIG. 2 is a block diagram illustrating a system in accordance with an embodiment of the present invention.
- FIG. 3 is a flow chart illustrating a method in accordance with an embodiment of the present invention.
- FIGS. 4A and 4B show a flow chart illustrating an exemplary method in accordance with an embodiment of the present invention.
- FIG. 5 is a diagrammatic representation of a customization module for use in the system as shown in FIG. 1.
- Embodiments of the present invention relate to the creation, development and customization of spoken language interfaces for a plurality of speech-enabled services.
- the invention provides a natural language interface to permit programmers and users to create and/or customize spoken language interfaces.
- the invention may provide an efficient and cost-effective way of developing spoken language interfaces that can be easily adapted to different systems or services—for example, messaging systems, auction systems, or interactive voice response (IVR) systems.
- the spoken language interface can be easily customized by end users based on their personal preferences and speech habits.
- Embodiments of the present invention may use natural-language to natural-language mapping between user-specified commands and commands specified by, for example, an application program developer.
- Embodiments of the present invention provide an efficient system for executing a plurality of user commands that may map to a finite number of executable actions as specified by the program developer. Accordingly, the program developer may need only specify a finite number of exemplary English (or other language) commands that may be related to application actions. These exemplary English commands may be mapped with a plurality of English variations that a user may use for the desired action. The user can customize the English variations to create preferred commands to execute a desired action.
- Referring to FIG. 1, a block diagram of a speech-enabled system in accordance with embodiments of the present invention is shown.
- User 101 may use terminal device 102 for access to the application program 115 .
- the terminal device 102 may be, for example, a personal computer, a telephone, a mobile phone, a hand-held device, personal digital assistant (PDA) or other suitable device having suitable hardware and/or software to connect with network 120 and access application program 115 .
- Terminal device 102 may be installed with suitable hardware and software, for example, an Internet browser and a modem for connection to the Internet.
- Network 120 may include, for example, a public switched telephone network (PSTN), a cellular network, an Internet, an intranet, satellite network and/or any other suitable national and/or international communications network or combination thereof.
- Network 120 may include a plurality of communications devices (e.g., routers, switches, servers, etc.) including at least one computer telephony platform 121 .
- Platform 121 may be a high-capacity computer and/or server that has the capacity to send, receive, and/or process digitized speech (e.g., to support speech recognition and synthesis functions).
- Platform 121 may be equipped to interface with a switch-based or packet-based network 120 .
- platform 121 may be equipped with a telephony interface to handle telephony signaling functions such as call termination and touch-tone detection.
- Platform 121 may be located within the network 120 or, alternatively, it may be located outside network 120 .
- Platform 121 may serve as a gateway interface to spoken language processor (SLP) 104 .
- Platform 121 may receive data from terminal device 102 and dispatch this information to SLP 104 .
- platform 121 may dispatch data from SLP 104 to the terminal device 102 .
- SLP 104 may be coupled to network 120 to provide a speech-enabled interface for an application programming interface (API) 114 and corresponding application or service 115 .
- Application program 115 may support services or systems, for example, messaging systems, auction systems, interactive voice response (IVR) systems, or any other suitable system or service that may utilize a spoken language interface for automation.
- API 114 may be a software interface that application 115 may use to request and carry out lower-level services performed by a computer's or telephone system's operating system.
- An API 114 may include, for example, a set of standard software interrupts, calls, and data formats used by application program 115 to interface with network services, mainframe communications programs, telephone equipment or program-to-program communications.
- SLP 104 may include a plurality of components, for example, an output synthesizer 105 , recognizer 106 , variation matcher 107 , variant database 108 , exemplar adjuster 110 , action invoker 111 and context specifications database 112 . It is recognized that output synthesizer 105 may provide data that can be presented to user's terminal device 102 .
- Output synthesizer 105 and recognizer 106 are known per se.
- Output synthesizer 105 may be a speech synthesizer or display formatter for delivering information to the user 101 .
- the display formatter may produce data suitable for presentation on any physical display, for example, a cathode ray tube (CRT), liquid crystal display (LCD), flat plasma display, or any other type of suitable display.
- any suitable speech synthesizer that can take unrestricted text or digital data as input and convert this text or data into an audio signal for output to the user 101 may be used in embodiments of the invention.
- Recognizer 106 may receive a natural language request in the form of, for example, an audio or analog signal S from user 101 and may convert this signal into a digitized data string or recognized word sequence, W.
- Signal S may be converted by the terminal device 102 and/or platform 121 for transmission across network 120 and is then delivered to recognizer 106.
- Digitized data string W may represent the natural language request S in the form of a digital signal as output by recognizer 106 .
- W may be a sequence of words in text form.
- Recognizer 106 may use any known process or system to convert signals S into a data string W.
- recognizer 106 can load and switch between language models dynamically.
- the English language is referred to herein as the spoken language for use with the speech-enabled services.
- the present invention can be applied to spoken language interfaces for other languages.
- the user's terminal device 102 may include a handwriting recognition device, keyboard, and/or dial pad the user 101 may use to input a command and generate signal S.
- the generated signal S may be delivered to the recognizer 106 and processed as described above.
- variation matcher 107 may use variant database 108 and a variation matching function (not shown) to map the digitized data string W (or the recognized word sequence) into an exemplary English sentence E (i.e., an exemplar).
- the exemplar E may correspond to a typical way, as defined by an applications developer, of phrasing a particular request relevant to the current application program 115 .
- the variation matcher 107 may further compute a string mapping function δ that may indicate the difference in meaning between the recognized digitized data string W and the exemplar E.
- variant database 108 may contain a language model (L) 130 related to a particular context and/or the application program or service 115 , currently accessed by the user 101 .
- the language model 130 may be derived from the plurality of variant command files 109 using techniques known in the art for generating speech recognition language models from collections of sentences.
- each file 109 may be pertinent to particular context C corresponding to language model 130 .
- variant database 108 may contain a single language model 130 relating to a particular application program or, alternatively, may contain a plurality of language models 130 relating to various application programs.
- Variant command file 109 for context C may contain, for example, an exemplar E1 related to context C and associated variants V1,1 to V1,n.
- variant database 108 may store a set of related data of the form (C, V, E), where each V is an alternative way to phrase in natural language a request for the action A that is associated with exemplar E in context C. Since exemplars E may also be valid ways of phrasing application actions A, they are included in the variant database 108 as “variants” of themselves.
- a set of exemplars E1 to Em associated with the particular context C of an application program may be provided by the developer of the spoken language interface or of the application program.
- the developer may be an expert in the application API 114 .
- the developer need not be an expert in speech-enabled services.
- Each exemplar E may represent an exemplary way of phrasing, in English or any other suitable language, a particular executable command or action for context C (as will be discussed below in more detail).
- the developer may map exemplar E to action A.
- each file 109 may contain a plurality of English variants V1,1 to Vm,n.
- Variants V1,1 to V1,n may represent different ways of saying or representing corresponding exemplar E1; variants V2,1 to V2,k may represent different ways of saying exemplar E2; etc.
- These variants V1,1 to Vm,n may be created by anyone without requiring any relevant expertise or knowledge of the application program and/or speech-enabled technologies.
- the user 101 may create variant V1,1 that represents the manner in which the user 101 typically refers to the desired action represented by exemplar E1.
- the created variant(s) V1,1 to V1,n may be mapped to the associated exemplar E1 for a particular context C, for example, in the form (C, V, E), as indicated above.
- For example, C may correspond to the context of reading e-mail messages, E1 may be the exemplar “Retrieve my mail messages”, and the variants may include, for example, V1,1 “get my mail”, V1,2 “fetch my e-mail”, and V1,3 “fetch messages”.
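The (C, V, E) triples described above can be sketched in code using the e-mail example; this is an illustrative data layout under assumed names, not the patent's storage format.

```python
# Hedged sketch of the variant database as (C, V, E) triples: context,
# variant phrasing, and the exemplar the variant maps to. Exemplars are
# stored as variants of themselves, as the description notes.
CONTEXT = "read_mail"
EXEMPLAR = "retrieve my mail messages"

variant_db = [
    (CONTEXT, "retrieve my mail messages", EXEMPLAR),  # exemplar as its own variant
    (CONTEXT, "get my mail", EXEMPLAR),
    (CONTEXT, "fetch my e-mail", EXEMPLAR),
    (CONTEXT, "fetch messages", EXEMPLAR),
]

def variants_for_context(db, context):
    """Return the (variant, exemplar) pairs active in a given context."""
    return [(v, e) for (c, v, e) in db if c == context]
```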
- context specifications database 112 may contain a set of exemplar action specification files 113 for one application program or a plurality of different application programs.
- Exemplar action files 113 may correspond and/or relate to variant files 109 .
- the variants in a variant file 109 may be used to express the actions A in a corresponding action file 113 , and A may be available for execution by action invoker 111 .
- exemplar-action specification file 113 may contain a plurality of contexts C1 to Cm, a plurality of associated exemplars E1 to Em, associated actions A1 to Am, and pointers to next contexts C′1 to C′x. Accordingly, each exemplar-action specification file 113 may contain a list of “exemplar-action” records stored or correlated as (C, E, A, C′). Each record (C, E, A, C′) may associate the exemplar E with a sequence A of action strings in the command language executable by the action invoker 111 in context C, and an identifier C′ of another, or the same, application context specification.
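One such (C, E, A, C′) record can be sketched as follows; the action string, its command syntax, and the context names are hypothetical placeholders, not API calls defined by the patent.

```python
# Hedged sketch of a single "exemplar-action" record (C, E, A, C').
record = {
    "context": "read_mail",                     # C
    "exemplar": "retrieve my mail messages",    # E
    "action": "mail.retrieve(folder='inbox')",  # A, in the invoker's command language
    "next_context": "read_mail",                # C'
}

def action_for_exemplar(records, context, exemplar):
    """Look up the action string and next context for an exemplar in a context."""
    for r in records:
        if r["context"] == context and r["exemplar"] == exemplar:
            return r["action"], r["next_context"]
    return None
```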
- each exemplary action specification file in the set of files 113 may correspond to a stage of interaction with the user.
- the application program is a speech-enabled e-mail service
- the first action specification file 113 may contain actions relating to logging on, or identification of the user to the service
- a second action specification file 113 may contain actions relating to sending or retrieving e-mail messages.
- action specification file 113 related to actions required to identify the user may be activated, followed by activating the action specification file for, for example, retrieving e-mail messages.
- the second action specification file may contain exemplars E and associated actions A relating to retrieving messages, for example, retrieving new messages, retrieving previously read messages, etc.
- a language model L for use by the speech recognizer may be built for each context, based on the variants specified for that context. These models may be augmented with lists of proper names that may be used instead of those present in the exemplars E and variants V. Standard techniques for language modeling can be used for deriving the language models from the set of variants.
- variant database 108 and context specification database 112 are shown as two different databases, it is recognized that variant database 108 and context specification database 112 may be consolidated into a single database. It should be noted that descriptions of data flow and data configuration in databases 108 and 112 are given by way of example and variations may be made by one of ordinary skill in the art. For example, variations to the variant command files 109 or configuration or flow of the included data (e.g., C, V, E) and/or to the exemplar action specification files 113 may be made by one of ordinary skill in the art.
- recognizer 106 may produce a digitized data string W that exactly matches an exemplar E or variant V stored in the variant database 108, and the system can then proceed with invoking the corresponding application action A.
- an exact match between data string W and a corresponding exemplar E or variant V may not be found.
- recognition errors, requests for actions involving different objects (e.g., using different names) from those in the exemplars E or variants V, linguistic variation in the user utterances (including variants from their own customizations) and/or any combination of variations thereof may prevent exact matches from being found.
- variation matcher 107 may seek to select a prospective variant V, in active context C, that most resembles, or most closely matches, the natural language request as represented by digitized data W.
- Variation matcher 107 may also specify the necessary changes or adaptations (i.e., string mapping function δ) to be applied by, for example, the exemplar adjuster 110.
- Any known technique may be used to determine whether, for example, a given text or data sequence (e.g., a prospective variant) most resembles or closely matches the recognized word sequence. For example, known mathematical algorithms (as described below) may be applied to find such matches.
- exemplar adjuster 110 may receive the exemplar E and string mapping function ⁇ from the variation matcher 107 .
- Exemplar adjuster 110 with input from context specifications database 112 may apply the string mapping function ⁇ to an application action A (an API call) that is paired with the exemplar E (e.g., from the context specifications database) to produce the actual API call or adapted action A′.
- Adapted action A′ may then be executed by the action invoker 111 to carry out the user's request.
- Exemplar adjuster 110 may apply necessary adaptations to the action strings A to be invoked by the application and to the exemplar E (e.g., for confirmation purposes).
- variation matcher 107 may compute a function f taking an input W and a sequence ⟨(V1, E1), . . . , (Vn, En)⟩ of pairs of strings.
- the output of f may be one of the input pairs (Vi, Ei), together with a string mapping function δ.
- the selected pair (Vi, Ei) may be the first pair in the input sequence for which a string distance function Δ(W, Vi) is minimal.
- String mapping function δ may include a sequence of string editing operations, specifically insertions, deletions, and substitutions.
- Exemplar adjuster 110 may fetch the action Ai associated with the exemplar Ei, where i may be any integer from 1 to m.
- a second string mapping function δ′ may be derived from δ, including only those string editing operations that are valid transformations of the action string Ai.
- a valid transformation may be one that results in an action string A that is well formed in the sense that it is parsed successfully by the action invoker 111 .
- Second string mapping function δ′ is then applied to both sides of the selected pair by the exemplar adjuster 110 to produce the “adapted” pair (E′i, A′i).
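The adaptation step just described can be sketched as follows: apply the word substitutions from the mapping function to both the exemplar and its action string, keeping only substitutions that leave the action well formed. The well-formedness check and all names below are simplified stand-ins, not the action invoker's actual parser.

```python
# Hedged sketch of deriving δ' and applying it to the pair (E, A): only
# substitutions that keep the action string well formed are retained.
def adapt(exemplar, action, substitutions, is_well_formed):
    for old, new in substitutions:
        candidate = action.replace(old, new)
        if is_well_formed(candidate):          # δ' keeps only valid edits
            action = candidate
            exemplar = exemplar.replace(old, new)
    return exemplar, action

# Example: the user said "Bob" where the variant said "Alice"; a balanced-
# parentheses check stands in for parsing by the action invoker.
adapted_e, adapted_a = adapt(
    "retrieve messages from Alice",
    "mail.retrieve(sender='Alice')",
    [("Alice", "Bob")],
    is_well_formed=lambda a: a.count("(") == a.count(")"),
)
```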
- the string distance Δ is the string edit distance and δ is the corresponding sequence of edits found by the dynamic programming algorithm used to compute the minimal edit distance.
- Such edit-distance computation algorithms are known in computer science and have been used in various applications such as document search and evaluating the outputs of speech recognition systems.
- language and action strings may both be treated as sequences of tokens (typically words in the case of language strings).
- Edit-distance functions rely on a table of token distances for use when comparing tokens.
- Token distances can be uniform (e.g., two words that are different have a token distance of 1 and identical tokens have a distance of 0).
- token distances can be provided in the form of a table that reflects the closeness in meaning between any two words.
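The edit-distance matching described above can be sketched with uniform token costs; this is the standard dynamic-programming algorithm, with backtracking to recover the edit sequence that plays the role of δ. It is an illustrative implementation, not the patent's.

```python
# Hedged sketch: token-level edit distance with uniform costs
# (mismatch = 1, match = 0), recovering the edits by backtracking
# through the dynamic-programming table.
def edit_distance(src, tgt):
    src, tgt = src.split(), tgt.split()
    n, m = len(src), len(tgt)
    # dp[i][j] = minimal cost of transforming src[:i] into tgt[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    # Backtrack to recover the edit operations (the mapping function δ).
    edits, i, j = [], n, m
    while i > 0 or j > 0:
        sub_cost = 0 if i > 0 and j > 0 and src[i - 1] == tgt[j - 1] else 1
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + sub_cost:
            if src[i - 1] != tgt[j - 1]:
                edits.append(("sub", src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            edits.append(("del", src[i - 1]))
            i -= 1
        else:
            edits.append(("ins", tgt[j - 1]))
            j -= 1
    return dp[n][m], list(reversed(edits))
```

A non-uniform token-distance table (reflecting closeness in meaning) could replace the `cost` computation without changing the rest of the algorithm.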
- edit-distance matching may be used in conjunction with a natural language generation component.
- Natural language generators are known per se. Natural language generators may be used to apply linguistic principles to generate a set of paraphrases, or close paraphrases, of English sentences. Such linguistic principles include syntactic transformations (e.g., the active-passive transformation) and paraphrases based on lexical semantics (e.g., “A sells X to B” is the same as “B buys X from A”).
- a natural language generator may first be used to produce paraphrases of each of the variants present in a context. This may result in an expanded set of variants for the context to which edit-distance matching may then be applied as indicated above.
- natural language generators may be used to automatically generate at least one variant V by generating paraphrases of an exemplar E.
- Although only two embodiments of a variation matcher 107 have been described, it is recognized that alternative techniques may be applied in the variation matcher. For example, any suitable method that can measure the difference in meaning between two sentences and represent that difference as a string mapping function can be used as the basis for a variation matcher.
- the action invoker 111 may be a command string interpreter capable of executing dynamically generated strings (e.g., method calls and database query requests) corresponding to actions in the API for the application.
- the command interpreter may execute scripting languages (e.g., TCL), or procedure calls for languages with reflection (e.g., Java), or database query languages (e.g., SQL).
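A minimal sketch of such a command-string interpreter, assuming a simplified `name(key='value', ...)` command syntax; real embodiments might instead use TCL, Java reflection, or SQL, and the API names below are hypothetical.

```python
# Hedged sketch of the action invoker as a small command-string
# interpreter over a dictionary of handlers standing in for the
# application API.
import re

def make_invoker(api):
    def invoke(action_string):
        # Parse strings of the assumed form name(key='value', ...).
        match = re.fullmatch(r"(\w[\w.]*)\((.*)\)", action_string)
        if match is None:
            raise ValueError("malformed action string")
        name, arg_text = match.groups()
        kwargs = dict(re.findall(r"(\w+)='([^']*)'", arg_text))
        return api[name](**kwargs)
    return invoke

invoke = make_invoker({"mail.retrieve": lambda folder: f"messages in {folder}"})
```

A failed parse here corresponds to an action string that is not "well formed" in the sense used for valid transformations above.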
- the exemplar adjuster 110 can ask user 101 for confirmation that adapted exemplar E′ expresses the action desired by the user 101. If the user confirms, the action invoker 111 may dispatch adapted action A′ to API 114. Application program 115 may execute the dispatched action and return the resulting output O′ to the action invoker 111. The session manager 103 may present output O′ to the user via output synthesizer 105.
- Session manager or controller 103 may be coupled with SLP 104 and may manage the plurality of components within the SLP 104 .
- session manager 103 may provide data flow control for the various components of SLP 104 during a speech-enabled session.
- the session manager 103 may maintain an active context C.
- There may be an initial context specification associated with each program application.
- Each context specification may be associated with a collection of variants in the variant database 108 .
- session manager 103 is shown external to SLP 104 , it is recognized that alternatively session manager 103 may be incorporated within SLP 104 .
- FIG. 2 is a component-level block diagram of a spoken language processing system 200 in accordance with an embodiment of the present invention.
- the spoken language processing system 200 may be used as the speech-enabled interface for a desired service 210 .
- a user or customer may, for example, input command S to be executed by the service 210 .
- the user may input command S using terminal device 102 .
- the user may articulate a spoken command into a microphone of, for example, the terminal device 102 (e.g., a telephone, PC, or other communication device).
- the terminal device 102 may include handwriting recognition system, a dial-pad, a touch screen or keyboard or other input device that the user 101 may use to input command S.
- a recognized input string W may be generated by the speech recognizer 106 .
- the recognized input string W may be in the form of digitized data that represents a command (S) input by a user.
- the recognizer may be located internal to or external to the spoken language processing system 200.
- the recognizer 106 may be coupled to a processor 203 located in the spoken language system 200 of the present invention.
- the processor may perform the functions of, for example, variation matcher 107, exemplar adjuster 110, action invoker 111, and/or perform other processing functions that may be required by the system 200.
- the processor 203 may process the command S that is input by the user to generate recognized input string W.
- Processor 203 may be coupled to a memory 204 and controller 202 .
- the memory 204 may be used to store, for example, variant database 108 , context specification database 112 , and/or any other data or instructions that may be required by processor 203 and/or controller 202 . It is recognized that any suitable memory may be used in system 200 .
- the databases 108 and/or 112 may be organized as contexts related to the desired service. Accordingly, depending on the service accessed or the stage of service, the processor may load the proper context.
- processor 203 may use the variant database 108 and a variation matching function to map the recognized input string W to a natural language exemplar E stored in the variant database 108 in memory 204 .
- the exemplar E may correspond to a typical way of phrasing a particular request relevant to the current application.
- At least one natural language exemplar E may correspond to one or more natural language variants V.
- These natural language variants V may represent alternative ways to express exemplar E.
- These variants may also be stored in the variant database 108 and may be created by, for example, the user, application programmer, and/or speech interface developer.
- processor 203 may select, from the one or more natural language variants V, a prospective variant that most resembles or closely matches the recognized word sequence using any known technique for matching as described above. After the selection is made, the corresponding natural language exemplar E may be identified.
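By way of a non-limiting illustration, this selection step may be sketched in Python; the phrases, the variant store, and the `match_variant` helper below are assumptions made for illustration and are not the specific matching technique contemplated above:

```python
from difflib import SequenceMatcher

# Hypothetical variant store: each natural language variant V maps to
# its associated natural language exemplar E.
VARIANTS = {
    "get my mail": "Retrieve my mail messages",
    "fetch my e-mail": "Retrieve my mail messages",
    "is there anything from joe": "Get the messages with sender Joe",
}

def match_variant(recognized: str):
    """Select the prospective variant V that most resembles the
    recognized input string W, and return it with its exemplar E."""
    w_words = recognized.lower().split()
    def similarity(variant: str) -> float:
        # Word-level similarity between W and the candidate variant.
        return SequenceMatcher(None, w_words, variant.split()).ratio()
    best = max(VARIANTS, key=similarity)
    return best, VARIANTS[best]
```

For a recognized string such as "Is there anything few from Kathleen", the closest stored variant is "is there anything from joe", whose associated exemplar identifies the intended request even though the wording differs.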
- the processor may identify an application action A (API call) corresponding to the exemplar E.
- Action A and corresponding exemplar(s) may be stored in, for example, context specification database 112 stored in memory 204 .
- controller 202 may cause the action A to be invoked by service 210 .
- the processor 203 may also generate string mapping function ⁇ .
- String mapping function ⁇ may specify the difference between the recognized word sequence W and the natural language exemplar E or between the recognized word sequence W and the natural language variant V.
- the processor 203 may then apply the string mapping function ⁇ to the application action A that corresponds with the exemplar E, to produce the actual API call or adapted action A′.
- the controller 202 may cause the actual API call A′ to be executed by the service 210 to carry out the user's request.
- the processor may apply the string mapping function ⁇ to the exemplar E to produce an adapted exemplar E′.
- the adapted exemplar E′ may be presented to the user via output synthesizer 105 .
- the user may be asked to confirm whether the action desired by the user may be expressed by exemplar E or adapted exemplar E′. If the user accepts E or E′, the controller 202 executes action A or adapted action A′, respectively. If the user does not accept E or E′, then the processor 203 may continue processing the recognized input string W, as described above, until the user's request has been carried out. In alternative embodiments, if the user does not accept E or E′, the controller may ask the user to rephrase their request.
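One plausible realization of the string mapping function, sketched here under the simplifying assumption that word-level substitutions suffice, derives the substitutions from an alignment of W with the matched variant and applies them to the exemplar E (or to an action string A) to produce E′ (or A′); the helper names and example strings are illustrative only:

```python
from difflib import SequenceMatcher

def build_phi(recognized: str, variant: str) -> dict:
    """Derive a string mapping function as the word-level substitutions
    that turn the matched variant V into the recognized string W
    (insertions and deletions are ignored in this sketch)."""
    v_words = variant.lower().split()
    w_words = recognized.lower().split()
    phi = {}
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, v_words,
                                               w_words).get_opcodes():
        if tag == "replace":
            phi[" ".join(v_words[i1:i2])] = " ".join(w_words[j1:j2])
    return phi

def apply_phi(phi: dict, text: str) -> str:
    """Apply the substitutions to an exemplar E (or action string A)
    to produce the adapted E' (or A')."""
    for old, new in phi.items():
        text = text.replace(old, new)
    return text
```

With the illustrative strings "is there anything few from kathleen" (W) and "is there anything from joe" (V), the derived mapping substitutes "joe" with "kathleen", and applying it to the exemplar "get the messages with sender joe" yields the adapted form naming Kathleen.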
- Application program 210 may execute the action A or adapted action A′ and return the resulting output O′ to the controller 202 .
- the controller 202 may present output O′ to the user's terminal device 102 via output synthesizer 105 .
- User 101 may access SLP 104 of a speech-enabled service in accordance with the present invention ( 301 ).
- Session manager 103 may cause speech recognizer 106 to load (or switch to) the language model L for the active context related to the application program serviced by the SLP 104 ( 302 ).
- the user 101 may be presented with a greeting via output synthesizer 105 , and the user may respond by articulating a command into an input of terminal device 102 ( 303 ).
- the speech recognizer 106 may receive input S and produce an output data string W ( 304 ).
- the output data string W may be a transcription hypothesis of the user's command.
- Variation matcher 107 is applied to W to select an exemplar E from the active context C and to construct a string-mapping function ⁇ ( 305 ).
- the exemplar adjuster 110 applies the string-mapping function ⁇ in order to construct an adapted exemplar E′ and an adapted executable action A′ ( 306 ).
- the system asks the user for confirmation to proceed with the sequence of actions A′ by presenting to the user (via the output synthesizer 105 ) the English expression E′ ( 307 ) and asking user 101 whether the adapted action A′ as expressed by the adapted exemplar E′ is desired ( 308 ).
- the session manager passes the adapted action A′ to the action invoker which executes the action A′ and returns any resulting output O′ to the user via the output synthesizer 105 ( 309 ).
- the session manager may send this output (or a summary of it as appropriate) to the speech synthesizer or display.
- the active context for handling the next request by the user is changed to the context C′ associated with E in C ( 310 ).
- if the user does not confirm the adapted action in step 308 , the speech recognizer produces another output string W based on the command ( 304 ).
- the speech recognizer 106 may produce another output string W that may be different from the previously created W.
- the variation matcher 107 may receive another output string W or may receive the same output string W and the variation matcher 107 may select another exemplar E′ and mapping function ⁇ ′.
- the system may, for example, re-execute steps 306 through 308 to construct an adapted action A′ and adapted exemplar E′ that is desired by the user.
- the controller may ask the user to rephrase their request.
- FIGS. 4A and 4B show a flow chart applying embodiments of the present invention to an exemplary speech-enabled e-mail service.
- a user may desire to retrieve e-mail messages and may log on via the Internet or call the speech-enabled e-mail service.
- the user may articulate speech into a microphone of, for example, the terminal device 102 or a telephone or other communication device (not shown).
- the session manager 103 may load the active context language model for the speech-enabled e-mail service from variant database 108 of SLP 104 .
- the user's input may be converted to an electrical signal that is passed to the recognizer as an input command S.
- Input command S may be, for example, “Was there anything new from Kathleen?” ( 401 ).
- Recognizer 106 may convert the command S into an output string W which may be interpreted as “Is there any thing few from Kathleen?” ( 402 ).
- the recognizer 106 may be susceptible to errors depending on the clarity of the input or other external or internal variations; thus, for example, “new” may be interpreted as “few” by recognizer 106 .
- Variation matcher 107 takes the string W and attempts to find a suitable match from the variant database 108 .
- Variation matcher may retrieve a stored variant V “Is there anything from Joe?” ( 403 ).
- the variant matcher 107 may retrieve exemplar E (e.g., Get the messages with sender Joe) that is associated with variant V of step 403 ( 404 ).
- Variant matcher 107 may construct a string mapping function ⁇ , that expresses the difference between output string W of step 402 and variant V of step 403 ( 405 ).
- String mapping function ⁇ indicates the insertion of the word “few” and the substitution of the word “Joe” by “Kathleen” ( 405 ).
- various known techniques may be implemented to determine string-mapping function ⁇ .
- the action A of step 406 is an exemplary action expressed as a line of code that the application program understands and may be able to execute. It is recognized that the line of code for action A is given by way of example only and that many different expressions can be written.
- a subset of string mapping function ⁇ as applicable to action A ( ⁇ A ) is generated and may be applied to action A ( 407 ).
- The adapted exemplar E′ generated may be, for example, “Get the messages with sender Kathleen” ( 409 ).
- the adapted exemplar E′ may be presented to the user; and if the user confirms that the user desires the adapted action A′ as expressed by the exemplar E′, the adapted action A′ may be executed by the API 114 of application program 115 . Accordingly, messages from Kathleen, for example, may be presented to the user via output synthesizer 105 .
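The restriction of the string mapping function to the action ( 407 ) can be illustrated as follows; the action string `get_messages(sender="Joe")` and the tuple encoding of the mapping are hypothetical stand-ins, not the command language of the application program:

```python
# Hypothetical action A associated with the exemplar
# "Get the messages with sender Joe".
action = 'get_messages(sender="Joe")'

# The string mapping function from the example: the spurious insertion
# of "few" by the recognizer, and the substitution of "Joe" by "Kathleen".
phi = [("insert", None, "few"), ("substitute", "Joe", "Kathleen")]

# The subset applicable to the action keeps only substitutions whose
# left-hand side actually occurs in the action string; the inserted
# word "few" is irrelevant to A and is dropped.
phi_a = [(op, old, new) for op, old, new in phi
         if op == "substitute" and old in action]

adapted_action = action
for _op, old, new in phi_a:
    adapted_action = adapted_action.replace(old, new)
# adapted_action is now 'get_messages(sender="Kathleen")'
```

The recognizer's erroneous insertion of "few" thus never reaches the application program, while the substitution that matters to the request (the sender's name) is carried through to the executable call.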
- Embodiments of the present invention may permit users to customize the variant database 108 so that they can create variants that closely represent the manner in which the user would articulate a particular action.
- FIG. 5 shows a block diagram showing a speech customization module 500 in accordance with embodiments of the present invention.
- the customization module 500 may be used to add personalized variants relating to stored exemplars E in variant database 108 .
- Users 101 may use, for example, a known web browser 502 to access context customizer 503 . Although a web browser is shown, a user may use a telephone or other suitable device to access context customizer 503 .
- Context customizer 503 may be coupled to variant database 108 and customizer server 501 .
- Users of the system 100 may access the generic context files Cg 109 stored in variant database 108 and create customized context files 504 stored in a customization server 501 .
- Generic context files Cg 109 may contain, for example, context identifier C, a variant V and corresponding exemplar E.
- Customization server 501 may contain customized context files 504 for a plurality of users U 1 -UN. Each customized file 504 may contain a personalized context containing personalized variants (e.g., V 1 1 , V 1 2 to V m n ) for that user.
- User U 1 may create one or more variants V corresponding to, for example, exemplar E.
- the user 101 may customize files 504 to reflect the user's personal language preferences. It is recognized that any language (for example, French or Spanish) may be used in embodiments of the present invention.
- user U 1 may customize a context Cu 1 , adding to the variants associated with C in user U 1 's personal variant database file 504 by composing natural language requests V and associating them with natural language requests, or lists of requests, E taken from the exemplars E associated with context C.
- the customization module 500 may permit a user to create and edit natural-language to natural-language (e.g., English-to-English) customization files stored, for example, on a server 501 using a standard HTTP browser.
- User U 1 may be authenticated by the customization module 500 using known techniques and may choose an application, and within that application a context C, to customize.
- the user may construct pairs of the form “When I say V 1 , I mean E” by choosing an exemplar E from among the available exemplars in C and entering a personalized variant V 1 to be associated with that exemplar.
- the resulting variants may be uploaded into variant database 108 in the form: (U 1 , V 1 , E, C), indicating that the customized variant V 1 belongs to user U 1 and is related to exemplar E in context C. Accordingly, when the user U 1 uses system 100 of FIG. 1, the customized context will be available to the user U 1 , including the customized variants in addition to any variants that may already be present in the database for all users. In embodiments of the present invention, for subsequent customizations, the user may be presented with their own custom version of any context they have customized in the past. Additionally, users may be able to revert to the generic context Cg when desired.
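The record layout described above can be sketched as follows; the context name, the phrasings, and the user identifiers are illustrative assumptions rather than the stored format of the system:

```python
# Generic context file Cg: shared (C, V, E) triples available to all users.
GENERIC = [
    ("read_mail", "get my mail", "Retrieve my mail messages"),
]

# Personalized variants uploaded as (U, V, E, C) records.
CUSTOM = [
    ("U1", "any word from the boss", "Retrieve my mail messages", "read_mail"),
]

def variants_for(user: str, context: str):
    """Return the generic variants for a context merged with the
    user's own customized variants for that context."""
    merged = [(v, e) for c, v, e in GENERIC if c == context]
    merged += [(v, e) for u, v, e, c in CUSTOM
               if u == user and c == context]
    return merged
```

Under this sketch, user U1 sees both the shared variant and the personalized one, while any other user sees only the generic entries, matching the behavior described above.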
Abstract
Embodiments of the invention relate to a system and method for providing speech-enabled application programs. The speech-enabled programs automatically execute requests input by users. One or more natural language variants may be mapped with at least one natural language exemplar. The natural language exemplar may correspond to a typical way to express a request relevant to the speech-enabled application program. The natural language variant may correspond to an alternative way of expressing the same request. A recognized input string is received and a prospective variant that most resembles the received recognized input string is selected from the natural language variants. The natural language exemplar mapped to the prospective variant is identified. An action instruction associated with the identified natural language exemplar is executed to fulfill the user's request.
Description
- This invention relates generally to speech recognition technology. More particularly, the invention relates to development and customization of spoken language interfaces for a plurality of speech-enabled systems and sub-systems.
- In recent years, the desire to use speech-enabled systems has increased. Speech recognition technology has been applied to a variety of interactive spoken language services to reduce costs. Services benefiting from a spoken language interface may include, for example, services providing products and/or services, e-mail services, and telephone banking and/or brokerage services. Speech-enabled systems permit users to verbally articulate to the system a command relating to desired actions. The speech-enabled system recognizes the command and performs the desired action.
- Typically, the underlying technologies utilized in such speech-enabled systems include, for example, speech recognition and speech synthesis technologies, computer-telephony integration, language interpretation, and dialog and response generation technologies. The role of each technology in speech-enabled systems is described below briefly.
- As is known, speech recognition technology is used to convert an input of human speech into a digitized representation. Conversely, speech synthesis takes a digitized representation of human speech or a computer-generated command and converts these into outputs that can be perceived by a human—for example, a computer-generated audio signal corresponding to the text form of a sentence.
- Known computer-telephony integration technology is typically used to interface the telephony network (which may be switched or packet-based) to, for example, a personal computer having the speech recognition and speech synthesis technologies. Thus, the computer-telephony platform can send and receive, over a network, digitized speech (to support recognition and synthesis, respectively) to and from a user during a telephony call. Additionally, the computer-telephony integration technology is used to handle telephony signaling functions such as call termination and touch-tone detection.
- Language interpretation systems convert the digitized representation of the human speech into a computer-executable action related to the underlying application and/or service for which the spoken language interface is used—for example, a speech-enabled e-mail service.
- The dialog and response generation systems generate and control the system response for the speech-enabled service, which may correspond to, for example, the answer to the user's question, a request for clarification or confirmation, or a request for additional information from the user. The dialog and response systems typically utilize the speech synthesis systems or other output devices (e.g., a display) to present information to the user. Additionally, the dialog and response generation component may be responsible for predicting the grammar (also called the “language model”) that is to be used by the speech recognizer to constrain or narrow the required processing for the next spoken input by the user. For example, in a speech-enabled e-mail service, if the user has indicated the need to retrieve messages, the speech recognizer may limit processing to the possible message-retrieval commands the user may use.
- Using one conventional method, language interpretation and dialog and response generation are mediated by intermediate representations, often referred to as semantic representations. These representations are computer data structures or code intended to encode the meaning of a sentence (or multiple sentences) spoken by a user (e.g., in a language interpretation system), or to encode the intended meaning of the system's response to the user (e.g., in a dialog and response generation system). Various types of such intermediate semantic representations are used including hierarchically embedded value-attribute lists (also called “frames”) as well as representations based on formal logic.
- To facilitate this intermediate representation process, a two-step process is typically used. First, the language interpretation component converts the recognized word sequence (or digitized representation) into an instance of the intermediate semantic representation. Various means have been used for this conversion step, including conversion according to rules that trigger off of keywords and phrases, and conversion according to a manually written or statistically trained transition network. Second, the resulting intermediate representation is then mapped into the actual executable application actions. This second conversion phase is often achieved by an exhaustive set of manually authored rules or by a computer program written specifically for this spoken language application. This approach requires programming experts familiar with the speech-enabled interfaces and programming experts familiar with the underlying application programs. As a result, speech-enabled interfaces using this approach can be very expensive to develop and/or customize.
- Alternatively, another conventional method uses customized software modules for interfacing with the language interpretation system to determine which application-specific action to execute for a given recognized input sequence. Using this approach, customized software modules need to be developed for each application and for handling the various application-specific commands. As a result, this conventional approach for developing speech-enabled interfaces can be costly due to increased development times.
- Using conventional approaches, development of speech-enabled services requires skills different from, and in addition to, skills needed for programming the underlying application program for the service. Even for skilled spoken language system engineers, development of robust interfaces can be difficult and time-consuming with current technology. This increases the development time for such services and more generally slows widespread adoption of spoken language interface technology.
- Since these conventional approaches require specialized programming skills, customizing these speech-enabled services, by users, based on personal language preferences, if at all possible, can be very difficult.
- What is needed is a system and method for creating and customizing speech-enabled services that may solve the difficulties encountered using conventional approaches. For example, what is needed is an efficient speech-enabled interface that is not only robust and flexible, but can also be easily customized by users so that personal language preferences can be used.
- Embodiments of the invention relate to a system and method for providing speech-enabled application programs. The speech-enabled programs automatically execute requests input by users. One or more natural language variants may be mapped with at least one natural language exemplar. The natural language exemplar may correspond to a typical way to express a request relevant to the speech-enabled application program. The natural language variant may correspond to an alternative way of expressing the same request. A recognized input string is received and a prospective variant that most resembles the received recognized input string is selected from the natural language variants. The natural language exemplar mapped to the prospective variant is identified. An action instruction associated with the identified natural language exemplar is executed to fulfill the user's request.
- In embodiments of the invention, users of the system can create a plurality of personalized natural language variants that represent preferred ways of expressing the desired requests. Accordingly, the system may be able to recognize the plurality of variants and execute the action as specified by the user's request.
- The above and other features and advantages of the present invention will be readily apparent and fully understood from the following detailed description of preferred embodiments, taken in connection with the appended drawings.
- FIG. 1 is a diagrammatic representation of a system in accordance with embodiments of the present invention.
- FIG. 2 is a block diagram illustrating a system in accordance with an embodiment of the present invention.
- FIG. 3 is a flow chart illustrating a method in accordance with an embodiment of the present invention.
- FIGS. 4A and 4B show a flow chart illustrating an exemplary method in accordance with an embodiment of the present invention.
- FIG. 5 is a diagrammatic representation of a customization module for use in the system as shown in FIG. 1.
- Embodiments of the present invention relate to the creation, development and customization of spoken language interfaces for a plurality of speech-enabled services. The invention provides a natural language interface to permit programmers and users to create and/or customize spoken language interfaces. The invention may provide an efficient and cost-effective way of developing spoken language interfaces that can be easily adapted to different systems or services—for example, messaging systems, auction systems, or interactive voice recognition (IVR) systems. Advantageously, the spoken language interface can be easily customized by end users based on their personal preferences and speech habits.
- Embodiments of the present invention may use natural-language to natural-language mapping between user-specified commands and commands specified by, for example, an application program developer. Embodiments of the present invention provide an efficient system for executing a plurality of user commands that may map to a finite number of executable actions as specified by the program developer. Accordingly, the program developer may need only specify a finite number of exemplary English (or other language) commands that may be related to application actions. These exemplary English commands may be mapped with a plurality of English variations that a user may use for the desired action. The user can customize the English variations to create preferred commands to execute a desired action.
- Referring to FIG. 1, a block diagram of a speech-enabled system in accordance with embodiments of the present invention is shown.
User 101 may use terminal device 102 for access to the application program 115 . The terminal device 102 may be, for example, a personal computer, a telephone, a mobile phone, a hand-held device, a personal digital assistant (PDA), or another suitable device having suitable hardware and/or software to connect with network 120 and access application program 115 . Terminal device 102 may be installed with suitable hardware and software, for example, an Internet browser and a modem for connection to the Internet. -
Network 120 may include, for example, a public switched telephone network (PSTN), a cellular network, an Internet, an intranet, a satellite network, and/or any other suitable national and/or international communications network or combination thereof. -
Network 120 may include a plurality of communications devices (e.g., routers, switches, servers, etc.) including at least one computer telephony platform 121 . Platform 121 may be a high-capacity computer and/or server that has the capacity to send, receive, and/or process digitized speech (e.g., to support speech recognition and synthesis functions). Platform 121 may be equipped to interface with a switch-based or packet-based network 120 . Additionally, platform 121 may be equipped with a telephony interface to handle telephony signaling functions such as call termination and touch-tone detection. Platform 121 may be located within the network 120 or, alternatively, it may be located outside network 120 . Platform 121 may serve as a gateway interface to spoken language processor (SLP) 104 . Platform 121 may receive data from terminal device 102 and dispatch this information to SLP 104 . Conversely, platform 121 may dispatch data from SLP 104 to the terminal device 102 . - In accordance with embodiments of the invention,
SLP 104 may be coupled to network 120 to provide a speech-enabled interface for an application programming interface (API) 114 and corresponding application or service 115 . Application program 115 may support services or systems, for example, messaging systems, auction systems, interactive voice recognition (IVR) systems, or any other suitable system or service that may utilize a spoken language interface for automation. API 114 may be a software interface that application 115 may use to request and carry out lower-level services performed by a computer's or telephone system's operating system. An API 114 may include, for example, a set of standard software interrupts, calls, and data formats used by application program 115 to interface with network services, mainframe communications programs, telephone equipment, or program-to-program communications. - In embodiments of the invention,
SLP 104 may include a plurality of components, for example, an output synthesizer 105 , recognizer 106 , variation matcher 107 , variant database 108 , exemplar adjuster 110 , action invoker 111 , and context specifications database 112 . It is recognized that output synthesizer 105 may provide data that can be presented to the user's terminal device 102 . -
Output synthesizer 105 and recognizer 106 are known per se. Output synthesizer 105 may be a speech synthesizer or display formatter for delivering information to the user 101 . The display formatter may produce data suitable for presentation on any physical display, for example, a cathode ray tube (CRT), liquid crystal display (LCD), flat plasma display, or any other type of suitable display. Alternatively or additionally, any suitable speech synthesizer that can take unrestricted text or digital data as input and convert this text or data into an audio signal for output to the user 101 may be used in embodiments of the invention. -
Recognizer 106 may receive a natural language request in the form of, for example, an audio or analog signal S from user 101 and may convert this signal into a digitized data string or recognized word sequence, W. Signal S may be converted by the terminal device 102 and/or platform 121 to travel across network 120 and delivered to recognizer 106 . Digitized data string W may represent the natural language request S in the form of a digital signal as output by recognizer 106 . For example, W may be a sequence of words in text form. Recognizer 106 may use any known process or system to convert signal S into a data string W. In one embodiment, recognizer 106 can load and switch between language models dynamically. For simplicity, the English language is referred to herein as the spoken language for use with the speech-enabled services. However, the present invention can be applied to spoken language interfaces for other languages. - In alternative embodiments of the invention, the user's
terminal device 102 may include a handwriting recognition device, keyboard, and/or dial pad that the user 101 may use to input a command and generate signal S. The generated signal S may be delivered to the recognizer 106 and processed as described above. - Advantageously, in accordance with embodiments of the present invention,
variation matcher 107 may use variant database 108 and a variation matching function (not shown) to map the digitized data string W (or the recognized word sequence) into an exemplary English sentence E (i.e., an exemplar). The exemplar E may correspond to a typical way, as defined by an applications developer, of phrasing a particular request relevant to the current application program 115 . The variation matcher 107 may further compute a string mapping function φ that may indicate the difference in meaning between the recognized digitized data string W and the exemplar E. - In embodiments of the invention,
variant database 108 may contain a language model (L) 130 related to a particular context and/or the application program or service 115 currently accessed by the user 101 . The language model 130 may be derived from the plurality of variant command files 109 using techniques known in the art for generating speech recognition language models from collections of sentences. In embodiments of the invention, each file 109 may be pertinent to a particular context C corresponding to language model 130 . It is recognized that variant database 108 may contain a single language model 130 relating to a particular application program or, alternatively, may contain a plurality of language models 130 relating to various application programs. -
- Variant command file 109 for context C may contain, for example, an exemplar E1 related to context C and associated variants V1 1 to V1 n. For each exemplar E in a context C, there may be a collection of, for example, English variants V1 1−V1 n. These variants V, together with their associated exemplars E, are stored in the variant database 108 . The database 108 may store a set of related data of the form (C, V, E), where each V is an alternative way to phrase in natural language a request for the action A that is associated with exemplar E in context C. Since exemplars E may also be valid ways of phrasing application actions A, they are included in the variant database 108 as “variants” of themselves. - In embodiments of the invention, a set of exemplars E1 to Em associated with the particular context C of an application program may be provided by the developer of the spoken language interface or of the application program. The developer may be an expert in the
application API 114 . The developer need not be an expert in speech-enabled services. Each exemplar E may represent an exemplary way of phrasing, in English or any other suitable language, a particular executable command or action for context C (as will be discussed below in more detail). The developer may map exemplar E to action A. For a particular context C, each file 109 may contain a plurality of English variants V1 1−Vm n. Variants V1 1−V1 n may represent different ways of saying or representing corresponding exemplar E1; variants V2 1−V2 k may represent different ways of saying exemplar E2; etc. These variants V1 1−Vm n may be created by anyone without requiring any relevant expertise or knowledge of the application program and/or speech-enabled technologies. For example, the user 101 may create variant V1 1 that represents the manner in which the user 101 typically refers to the desired action represented by exemplar E1. In embodiments of the present invention, the created variant(s) V1 1−V1 n may be mapped to the associated exemplar E1 for a particular context C, for example, in the form (C, V, E), as indicated above.
- Referring again to FIG. 1,
context specifications database 112 may contain a set of exemplar action specification files 113 for one application program or a plurality of different application programs. Exemplar action files 113 may correspond and/or relate to variant files 109. For example, the variants in avariant file 109 may be used to express the actions A in acorresponding action file 113, and A may be available for execution byaction invoker 111. - For a given context C in exemplar action specification files113, certain application actions A may be valid. These actions may relate to a specific context for a given application program. In embodiments of the present invention, exemplar-
action specification file 113 may contain a plurality of contexts C1−Cm, a plurality of associated exemplars E1−Em, associated actions A1−Am, and pointers to next contexts C′1−C′x. Accordingly, each exemplar-action specification file 113 may contain a list of “exemplar-action” records stored or correlated as (C, E, A, C′). Each record (C, E, A, C′) may associate the exemplar E with a sequence A of action strings in the command language executable by the action invoker 111 in context C, and an identifier C′ of another, or the same, application context specification.
files 113 may correspond to a stage of interaction with the user. For example, if the application program is a speech-enabled e-mail service, the first action specification file 113 may contain actions relating to logging on, or identification of the user to the service, and a second action specification file 113 may contain actions relating to sending or retrieving e-mail messages. Thus, once the user 101 has accessed the service, the action specification file 113 related to actions required to identify the user may be activated, followed by activation of the action specification file for, for example, retrieving e-mail messages. The second action specification file may contain exemplars E and associated actions A relating to retrieving messages, for example, retrieving new messages, retrieving previously read messages, etc. - A language model L for use by the speech recognizer may be built for each context, based on the variants specified for that context. These models may be augmented with lists of proper names that may be used instead of those present in the exemplars E and variants V. Standard techniques for language modeling can be used for deriving the language models from the set of variants.
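The “exemplar-action” records (C, E, A, C′) described above can likewise be sketched as a table keyed by context and exemplar. The action strings, context names, and lookup helper below are hypothetical illustrations, not the patent's actual command language:

```python
# Hypothetical exemplar-action specification: (C, E, A, C') records, where A is
# a sequence of action strings and C' names the next active context.
exemplar_actions = [
    ("login",      "Log me in",
     ['auth.identify(user)'],                                      "read_email"),
    ("read_email", "Retrieve my mail messages",
     ['mailAgent.setFolder("INBOX")', 'mailAgent.getMessages()'],  "read_email"),
]

def action_for(context, exemplar):
    """Look up the action strings and next context for an exemplar in a context."""
    for c, e, a, c_next in exemplar_actions:
        if c == context and e == exemplar:
            return a, c_next
    return None, context  # no match: keep the current context
```

The second record models the stage-of-interaction idea: confirming the exemplar in the read_email context yields both the executable action strings and the context to activate next.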
- Although, in FIG. 1,
variant database 108 and context specification database 112 are shown as two different databases, it is recognized that variant database 108 and context specification database 112 may be consolidated into a single database. It should be noted that the descriptions of data flow and data configuration in databases 108 and 112 are given by way of example only. - In one embodiment,
recognizer 106 may produce a digitized data string W that exactly matches an exemplar E or variant V stored in the variant database 108, and the system can then proceed with invoking the corresponding application action A. In alternative embodiments, an exact match between data string W and a corresponding exemplar E or variant V may not be found. For example, recognition errors, requests for actions involving different objects (e.g., using different names) from those in the exemplars E or variants V, linguistic variation in the user utterances (including variants from their own customizations), and/or any combination of variations thereof may prevent exact matches from being found. - In embodiments of the invention,
variation matcher 107 may seek to select a prospective variant V, in active context C, that most resembles, or most closely matches, the natural language request as represented by digitized data W. Variation matcher 107 may also specify the necessary changes or adaptations (i.e., string mapping function φ) to be made to, for example, the exemplar E. Any known technique may be used to determine whether, for example, a given text or data sequence (e.g., a prospective variant) most resembles or closely matches the recognized word sequence. For example, known mathematical algorithms (as described below) may be applied to find such matches. - In embodiments of the present invention,
exemplar adjuster 110 may receive the exemplar E and string mapping function φ from the variation matcher 107. Exemplar adjuster 110, with input from context specifications database 112, may apply the string mapping function φ to an application action A (an API call) that is paired with the exemplar E (e.g., from the context specifications database) to produce the actual API call or adapted action A′. Adapted action A′ may then be executed by the action invoker 111 to carry out the user's request. -
Exemplar adjuster 110 may apply necessary adaptations to the action strings A to be invoked by the application and to the exemplar E (e.g., for confirmation purposes). - In embodiments of the present invention,
variation matcher 107 may compute a function f taking an input W and a sequence <(V1, E1), . . . , (Vn, En)> of pairs of strings. The output of f may be one of the input pairs (Vi, Ei) together with a string mapping function φ, that is: - f(W, <(V1, E1), . . . , (Vn, En)>) → (Vi, Ei, φ)
- The selected pair (Vi, Ei) may be the first pair in the input sequence for which the string distance function μ is minimal:
- min1≦j≦i−1 μ(W, Vj) > μ(W, Vi) ≦ mini+1≦k≦n μ(W, Vk)
- String mapping function φ may include a sequence of string editing operations, specifically insertions, deletions, and substitutions.
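The distance μ and the edit operations making up φ can both be recovered from the standard dynamic-programming edit-distance table. The sketch below treats strings as token sequences with uniform token distances, as in the first embodiment described; it is an illustrative reconstruction, not the patent's actual implementation:

```python
def edit_ops(w, v):
    """Minimal edit distance between token sequences w and v, plus the edit
    operations (the mapping function phi) recovered by backtracing the table.
    Each op is ('sub'|'ins'|'del', token_in_v, token_in_w)."""
    m, n = len(w), len(v)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if w[i - 1] == v[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrace to recover the sequence of edit operations.
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (0 if w[i - 1] == v[j - 1] else 1):
            if w[i - 1] != v[j - 1]:
                ops.append(("sub", v[j - 1], w[i - 1]))  # v's token replaced by w's
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("ins", None, w[i - 1]))          # token present only in W
            i -= 1
        else:
            ops.append(("del", v[j - 1], None))          # token present only in V
            j -= 1
    return d[m][n], list(reversed(ops))
```

On the e-mail example from the figures, matching W "is there any thing few from kathleen" against variant V "is there anything from joe" yields, among other operations, the substitution of "joe" by "kathleen".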
-
Exemplar adjuster 110 may fetch the action Ai associated with the exemplar Ei, where i may be any integer from 1 to m. A second string mapping function φ′ may be derived from φ, including only those string editing operations that are valid transformations of the action string Ai. A valid transformation may be one that results in an action string A that is well formed in the sense that it is parsed successfully by the action invoker 111. Second string mapping function φ′ is then applied to both sides of the selected pair by the exemplar adjuster 110 to produce the “adapted” pair (E′i, A′i). - In one embodiment of the
variation matcher 107, the string distance μ is the string edit distance and φ is the corresponding sequence of edits found by the dynamic programming algorithm used to compute the minimal edit distance. Such edit-distance computation algorithms are known in computer science and have been used in various applications such as document search and evaluating the outputs of speech recognition systems. In this embodiment, language and action strings may both be treated as sequences of tokens (typically words in the case of language strings). - Edit-distance functions rely on a table of token distances for use when comparing tokens. Token distances can be uniform (e.g., two words that are different have a token distance of 1 and identical tokens have a distance of 0). Alternatively, token distances can be provided in the form of a table that reflects the closeness in meaning between any two words.
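The derivation of φ′ and its application to both sides of the pair (Ei, Ai), as described above, might be sketched as follows. The "valid transformation" test here is crudely approximated by checking whether an operation applies to the action string at all; a real implementation would re-parse the result with the action invoker, and all strings shown are invented for illustration:

```python
def adapt(exemplar, action, ops):
    """Apply the substitution operations from a string mapping function to both
    the exemplar and its paired action string, keeping only operations that
    actually transform the action string (a stand-in for validity checking)."""
    e_adapted, a_adapted = exemplar, action
    for kind, old, new in ops:
        if kind != "sub" or old is None:
            continue  # insertions/deletions are ignored in this sketch
        if old in a_adapted:  # keep only ops that apply to the action string
            a_adapted = a_adapted.replace(old, new)
            e_adapted = e_adapted.replace(old, new)
    return e_adapted, a_adapted
```

With φ containing an insertion of "few" and a substitution of "Joe" by "Kathleen", only the substitution survives as φ′, adapting both the exemplar and the API call.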
- In an alternative embodiment of a
variation matcher 107, edit-distance matching may be used in conjunction with a natural language generation component. Natural language generators are known per se. Natural language generators may be used to apply linguistic principles to generate a set of paraphrases, or close paraphrases, of English sentences. Such linguistic principles include syntactic transformations (e.g., the active-passive transformation) and paraphrases based on lexical semantics (e.g., “A sells X to B” is the same as “B buys X from A”). In this embodiment of the variation matcher 107, a natural language generator may first be used to produce paraphrases of each of the variants present in a context. This may result in an expanded set of variants for the context to which edit-distance matching may then be applied as indicated above. In embodiments, natural language generators may be used to automatically generate at least one variant V by generating paraphrases of an exemplar E. - Although only two embodiments of a
variation matcher 107 have been described, it is recognized that alternative techniques may be applied in the variation matcher. For example, any suitable method that can measure the difference in meaning between two sentences and represent that difference as a string mapping function can be used as the basis for a variation matcher. - The action invoker 111 may be a command string interpreter capable of executing dynamically generated strings (e.g., method calls and database query requests) corresponding to actions in the API for the application. For example, the command interpreter may execute scripting languages (e.g., TCL), or procedure calls for languages with reflection (e.g., Java), or database query languages (e.g., SQL).
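A toy command-string interpreter in the spirit of the action invoker might look like the sketch below. The handler registry, command name, and argument format are all invented; the embodiments described use TCL, Java reflection, or SQL rather than this scheme:

```python
import re

# Hypothetical registry mapping command names to handler functions.
HANDLERS = {}

def register(name):
    """Decorator that registers a handler under a command name."""
    def wrap(fn):
        HANDLERS[name] = fn
        return fn
    return wrap

@register("mailAgent.getMessages")
def get_messages(arg):
    # A real handler would query the mail store; this one echoes its argument.
    return f"messages where {arg}"

def invoke(action_string):
    """Parse a dynamically generated command string and dispatch it."""
    m = re.match(r'(\w[\w.]*)\("([^"]*)"\)\Z', action_string)
    if m is None or m.group(1) not in HANDLERS:
        raise ValueError(f"malformed or unknown action: {action_string}")
    return HANDLERS[m.group(1)](m.group(2))
```

The point of the sketch is that adapted action strings such as those produced by the exemplar adjuster can be executed by ordinary string parsing and dispatch.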
- In embodiments of the present invention, the
exemplar adjuster 110 can ask user 101 for confirmation that adapted exemplar E′ may express the action that is desired by the user 101. If the user confirms positively, the action invoker 111 may dispatch adapted action A′ to API 114. Application program 115 may execute the dispatched action and return the resulting output O′ to the action invoker 111. The session manager 103 may present output O′ to the user via output synthesizer 105. - Session manager or
controller 103 may be coupled with SLP 104 and may manage the plurality of components within the SLP 104. For example, session manager 103 may provide data flow control for the various components of SLP 104 during a speech-enabled session. At any point in interacting with a user, the session manager 103 may maintain an active context C. There may be an initial context specification associated with each program application. Each context specification may be associated with a collection of variants in the variant database 108. Although session manager 103 is shown external to SLP 104, it is recognized that alternatively session manager 103 may be incorporated within SLP 104. - FIG. 2 is a component-level block diagram of a spoken
language processing system 200 in accordance with an embodiment of the present invention. The spoken language processing system 200 may be used as the speech-enabled interface for a desired service 210. Thus, using the spoken language processing system 200, a user or customer may, for example, input command S to be executed by the service 210. The user may input command S using terminal device 102. The user may articulate a spoken command into a microphone of, for example, the terminal device 102 (e.g., a telephone, PC, or other communication device). In alternative embodiments of the invention, for example, the terminal device 102 may include a handwriting recognition system, a dial pad, a touch screen, a keyboard, or other input device that the user 101 may use to input command S. - A recognized input string W may be generated by the
speech recognizer 106. The recognized input string W may be in the form of digitized data that represents a command (S) input by a user. The recognizer may be located internal to or external to the natural language processing system 200. The recognizer 106 may be coupled to a processor 203 located in the spoken language system 200 of the present invention. The processor may perform the functions of, for example, variation matcher 107, exemplar adjuster 110, action invoker 111, and/or perform other processing functions that may be required by the system 200. In embodiments of the present invention, the processor 203 may process the command S that is input by the user to generate recognized input string W. -
Processor 203 may be coupled to a memory 204 and controller 202. The memory 204 may be used to store, for example, variant database 108, context specification database 112, and/or any other data or instructions that may be required by processor 203 and/or controller 202. It is recognized that any suitable memory may be used in system 200. The databases 108 and/or 112 may be organized as contexts related to the desired service. Accordingly, depending on the service accessed or the stage of service, the processor may load the proper context. In embodiments of the invention, processor 203 may use the variant database and a variation matching function to map the recognized input string W into an exemplary natural language exemplar E, stored in the variant database 108 in memory 204. As described above, the exemplar E may correspond to a typical way of phrasing a particular request relevant to the current application. - In embodiments of the present invention, at least one natural language exemplar E may correspond to one or more natural language variants V. These natural language variants V may represent alternative ways to express exemplar E. These variants may also be stored in the
variant database 108 and may be created by, for example, the user, application programmer, and/or speech interface developer. In this case, processor 203 may select, from the one or more natural language variants V, a prospective variant that most resembles or closely matches the recognized word sequence using any known technique for matching as described above. After the selection is made, the corresponding natural language exemplar E may be identified. - In any case, if an exact match for the natural language exemplar E corresponding to the recognized input string W is identified, the processor may identify an application action A (API call) corresponding to the exemplar E. Action A and corresponding exemplar(s) may be stored in, for example,
context specification database 112 stored in memory 204. After the action A has been identified, controller 202 may cause the action A to be invoked by service 210. - In alternative embodiments, if there exists a difference between the recognized word sequence W and the natural language exemplar E or between the recognized word sequence W and the natural language variant V, the
processor 203 may also generate string mapping function φ. String mapping function φ may specify the difference between the recognized word sequence W and the natural language exemplar E or between the recognized word sequence W and the natural language variant V. In this case, the processor 203 may then apply the string mapping function φ to the application action A that corresponds with the exemplar E, to produce the actual API call or adapted action A′. The controller 202 may cause the actual API call A′ to be executed by the service 210 to carry out the user's request. - In alternative embodiments of the invention, the processor may apply the string mapping function φ to the exemplar E to produce an adapted exemplar E′. The adapted exemplar E′ may be presented to the user via
output synthesizer 105. The user may be asked to confirm whether the action desired by the user may be expressed by exemplar E or adapted exemplar E′. If the user accepts E or E′, the controller 202 executes action A or adapted action A′, respectively. If the user does not accept E or E′, then the processor 203 may continue processing the recognized input string W, as described above, until the user's request has been carried out. In alternative embodiments, if the user does not accept E or E′, the controller may ask the user to rephrase their request. -
Application program 210 may execute the action A or adapted action A′ and return the resulting output O′ to the controller 202. The controller 202 may present output O′ to the user's terminal device 102 via output synthesizer 105. - Now the operation of an exemplary embodiment of the present invention will be described with reference to FIG. 3.
User 101 may access SLP 104 of a speech-enabled service in accordance with the present invention (301). Session manager 103 may cause speech recognizer 106 to load (or switch to) the language model L for the active context related to the application program serviced by the SLP 104 (302). The user 101 may be presented with a greeting via output synthesizer 105, and the user may respond by articulating a command into an input of terminal device 102 (303). The speech recognizer 106 may receive input S and produce an output data string W (304). The output data string W may be a transcription hypothesis of the user's command. -
Variation matcher 107 is applied to W to select an exemplar E from the active context C and to construct a string-mapping function φ (305). The exemplar adjuster 110 applies the string-mapping function φ in order to construct an adapted exemplar E′ and an adapted executable action A′ (306). The system asks the user for confirmation to proceed with the sequence of actions A′ by presenting to the user (via the output synthesizer 105) the English expression E′ (307) and asking user 101 whether the adapted action A′ as expressed by the adapted exemplar E′ is desired (308). - If the user selects, or says, “Yes,” the session manager passes the adapted action A′ to the action invoker, which executes the action A′ and returns any resulting output O′ to the user via the output synthesizer 105 (309). The session manager may send this output (or a summary of it as appropriate) to the speech synthesizer or display. Based on the record (C, E, A, C′) in the active context specification, the active context for handling the next request by the user is changed to the context C′ associated with E in C (310).
- If in
step 308, the user selects, or says, “No,” indicating that the exemplar E′ does not express the action desired by the user, the speech recognizer produces another output string W based on the command (304). In embodiments of the present invention, the speech recognizer 106 may produce another output string W that may be different from the previously created W. The variation matcher 107 may receive another output string W or may receive the same output string W, and the variation matcher 107 may select another exemplar E′ and mapping function φ′. The system may, for example, re-execute steps 306 through 308 to construct an adapted action A′ and adapted exemplar E′ that is desired by the user. In other embodiments of the present invention, the controller may ask the user to rephrase their request. - FIGS. 4A and 4B show a flow chart applying embodiments of the present invention to an exemplary speech-enabled e-mail service. A user may desire to retrieve e-mail messages and may log on via the Internet or call the speech-enabled e-mail service. The user may articulate speech into a microphone of, for example, the
terminal device 102 or a telephone or other communication device (not shown). The controller 103 may load the active context language model for the speech-enabled e-mail service from variant database 108 of SLP 104. - The user's input may be converted to an electrical signal that is passed to the recognizer as an input command S. Input command S may be, for example, “Was there anything new from Kathleen?” (401).
Recognizer 106 may convert the command S into an output string W which may be interpreted as “Is there any thing few from Kathleen?” (402). As indicated above, the recognizer 106 may be susceptible to errors depending on the clarity of the input or other external or internal variations; thus, for example, “new” may be interpreted as “few” by recognizer 106. Variation matcher 107 takes the string W and attempts to find a suitable match from the variant database 108. Variation matcher 107 may retrieve a stored variant V “Is there anything from Joe?” (403). - Based on the variant V, the
variation matcher 107 may retrieve exemplar E (e.g., “Get the messages with sender Joe”) that is associated with variant V of step 403 (404). Variation matcher 107 may construct a string mapping function φ that expresses the difference between output string W of step 402 and variant V of step 403 (405). String mapping function φ indicates the insertion of the word “few” and the substitution of the word “Joe” by “Kathleen” (405). In embodiments of the invention, various known techniques may be implemented to determine string-mapping function φ. -
Variation matcher 107 may select an action A as “{mailAgent.setFolder(“INBOX”); mailAgent.getmessages(“From=Joe”)}” based on the exemplar E of step 404 (406). The action A of step 406 is an exemplary action expressed as a line of code that the application program understands and may be able to execute. It is recognized that the line of code for action A is given by example only and that many different expressions can be written. A subset of string mapping function φ as applicable to action A (φA) is generated and may be applied to action A (407). - Adapted action A′ may be generated by applying φA to action A, resulting in the line of code, for example, “{mailAgent.setFolder(“INBOX”); mailAgent.getmessages(“From=Kathleen”)}” (408). Adapted exemplar E′ may be generated as, for example, “Get the messages with sender Kathleen” (409). The adapted exemplar E′ may be presented to the user; and if the user confirms that the user desires the adapted action A′ as expressed by the exemplar E′, the adapted action A′ may be executed by the
API 114 of application program 115. Accordingly, messages from Kathleen, for example, may be presented to the user via output synthesizer 105. - Embodiments of the present invention may permit users to customize the
variant database 108 so that they can create variants that closely represent the manner in which the user would articulate a particular action. FIG. 5 shows a block diagram of a speech customization module 500 in accordance with embodiments of the present invention. The customization module 500 may be used to add personalized variants relating to stored exemplars E in variant database 108. Users 101 may use, for example, a known web browser 502 to access context customizer 503. Although a web browser is shown, a user may use a telephone or other suitable device to access context customizer 503. -
Context customizer 503 may be coupled to variant database 108 and customization server 501. Users of the system 100 may access the generic context files Cg 109 stored in variant database 108 and create customized context files 504 stored in a customization server 501. Generic context files Cg 109 may contain, for example, context identifier C, a variant V, and corresponding exemplar E. Customization server 501 may contain customized context files 504 for a plurality of users U1-UN. Each customized file 504 may contain a personalized context containing personalized variants (e.g., V1,1, V1,2 to Vm,n) personal to the user. User U1 may create one or more variants V corresponding to, for example, exemplar E. Thus, if the user U1 prefers to refer to a single action using varying commands, the user 101 may customize files 504 to reflect this preference. It is recognized that any language (for example, French, Spanish, etc.) may be used in embodiments of the present invention.
variant database file 504 by composing natural language requests V and associating them with natural language requests or lists of requests E which are taken from the exemplars E associated with context C. - The
customization module 500 may permit a user to create and edit natural-language to natural-language (e.g., English-to-English) customization files stored, for example, on a server 501 using a standard HTTP browser. User U1 may be authenticated by the customization module 500 using known techniques, choosing an application, and within that a context C, to customize. In one embodiment, the user may construct pairs of the form “When I say V1, I mean E” by choosing an exemplar E from among the available exemplars in C and entering a personalized variant V1 to be associated with that exemplar.
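The “When I say V, I mean E” customization and the resulting (U, V, E, C) records might be sketched as follows. The user identifier, variant strings, and merge helper are hypothetical illustrations of the described record form, not the actual storage format:

```python
# Hypothetical customized-variant records of the form (user, variant, exemplar, context),
# as produced by "When I say V, I mean E" customization.
custom_variants = [
    ("U1", "anything from the boss", "Get the messages with sender Joe", "read_email"),
]

def variants_for(user, context, generic_pairs):
    """Merge a user's personalized variants with the generic (variant, exemplar)
    pairs available to all users in the given context."""
    personal = [(v, e) for u, v, e, c in custom_variants
                if u == user and c == context]
    return personal + generic_pairs

# Generic pairs from the shared variant database.
generic_pairs = [("get my mail", "Retrieve my mail messages")]
```

When user U1 later accesses the system, both the personalized and the generic variants are available, matching the upload behavior described next.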
variant database 108 in the form: (U1, V1, E, C), indicating that the customized variant V1 belongs to user U1 and is related to exemplar E in context C. Accordingly, when the user U1 uses system 100 of FIG. 1, the customized context will be available to the user U1, including the customized variants in addition to any variants that may already be present in the database for all users. In embodiments of the present invention, for subsequent customizations, the user may be presented with their own custom version of any context they have customized in the past. Additionally, users may be able to revert back to the generic context Cg when desired. - The present invention has been described in terms of preferred and exemplary embodiments thereof. Numerous other embodiments, modifications and variations within the scope and spirit of the appended claims will occur to persons of ordinary skill in the art from a review of this disclosure.
Claims (61)
1. A method for providing speech-enabled application programs comprising:
responsive to an input string, selecting from one or more natural language variants a prospective variant that most resembles the input string; and
identifying a natural language exemplar via a mapping between the exemplar and the prospective variant.
2. The method of claim 1 , wherein the mapping comprises:
mapping the one or more natural language variants with at least one natural language exemplar.
3. The method of claim 2 , wherein the prospective variant corresponds to at least one natural language exemplar.
4. The method of claim 1 , further comprising:
executing an action instruction associated with the identified natural language exemplar.
5. The method of claim 1 , further comprising:
mapping a plurality of action instructions with a plurality of natural language exemplars, wherein each action instruction is associated with at least one natural language exemplar.
6. The method of claim 5 , further comprising:
generating a mapping function that specifies a difference between the input string and the prospective variant.
7. The method of claim 6 , further comprising:
applying the mapping function to the action instruction associated with the identified natural language exemplar to produce an adapted action instruction.
8. The method of claim 7 , further comprising:
executing the produced adapted action instruction.
9. The method of claim 6 , further comprising:
applying the mapping function to the identified natural language exemplar to produce an adapted exemplar.
10. The method of claim 9 , further comprising
forwarding the adapted exemplar to a user to confirm whether the user desires an adapted action corresponding to the adapted exemplar.
11. The method of claim 10 , further comprising:
executing the adapted action if the user confirms that an adapted exemplar expresses the action desired by the user.
12. The method of claim 11 , further comprising:
if the user does not accept that the adapted exemplar expresses the action desired by the user, selecting from the one or more natural language variants an alternative prospective variant that most resembles the input string; and
identifying a natural language exemplar via a mapping between the exemplar and the alternative prospective variant.
13. The method of claim 12 , further comprising:
executing an action instruction associated with the identified natural language exemplar.
14. The method of claim 2 , further comprising:
storing one or more natural language variants mapped to at least one natural language exemplar in a memory.
15. The method of claim 14 , wherein at least one natural language variant is input by a user.
16. The method of claim 14 , wherein at least one natural language variant is input by an application developer.
17. The method of claim 14 , wherein the at least one natural language exemplar is input by an application developer.
18. The method of claim 14 , wherein the at least one natural language exemplar is produced automatically by a natural language generator.
19. The method of claim 14 , further comprising:
producing at least one natural language variant by automatically generating paraphrases of the natural language exemplar.
20. The method of claim 1 , further comprising:
loading an active context file relating to a service accessed by a user, the active context file containing the one or more natural language variants and the natural language exemplar.
21. The method of claim 1 , further comprising:
comparing the input string with the one or more natural language variants.
22. The method of claim 1 , wherein the input string is input by at least one of a keyboard, handwriting recognition device, a dial pad, and a speech recognition device.
23. A system for providing speech-enabled application programs comprising:
a voice recognizer to receive an input string and produce a recognized input string;
a memory to store one or more natural language variants corresponding to at least one natural language exemplar; and
a processor to:
select from the one or more natural language variants a prospective variant that most resembles the received recognized input string; and
identify the at least one natural language exemplar corresponding to the prospective variant.
24. The system of claim 23 , further comprising:
a controller adapted to execute an action instruction associated with the identified natural language exemplar corresponding to the prospective variant.
25. The system of claim 23 , the processor adapted to map a plurality of action instructions with a plurality of natural language exemplars, wherein each action instruction is associated with at least one natural language exemplar and the memory to store the mapped action instructions.
26. The system of claim 25 , the processor adapted to further generate a mapping function that specifies a difference between the received recognized input string and the prospective variant.
27. The system of claim 26 , the processor adapted to apply the mapping function to the action instruction associated with the identified natural language exemplar mapped to the prospective variant to produce an adapted action instruction.
28. The system of claim 27 , the controller adapted to execute the produced adapted action instruction.
29. The system of claim 28 , further comprising:
an output synthesizer to present a result of the executed instruction by providing data that can be presented to an audio or visual terminal device.
30. The system of claim 29 , wherein the output synthesizer is at least one of a display format and a speech synthesizer.
31. The system of claim 23 , further comprising:
an input device to generate an input string.
32. The system of claim 31 , wherein said input device is at least one of a keyboard, handwriting recognition device, a dial pad, and a speech recognition device.
33. A machine-readable medium having stored thereon executable instructions for performing a method comprising:
responsive to an input string, selecting from one or more natural language variants a prospective variant that most resembles the input string; and
identifying a natural language exemplar via a mapping between the exemplar and the prospective variant.
34. The machine-readable medium of claim 33 having stored thereon further executable instructions for performing a method comprising:
mapping the one or more natural language variants with at least one natural language exemplar.
35. The machine-readable medium of claim 33 having stored thereon further executable instructions for performing a method comprising:
executing an action instruction associated with the identified natural language exemplar.
36. The machine-readable medium of claim 33 having stored thereon further executable instructions for performing a method comprising:
mapping a plurality of action instructions with a plurality of natural language exemplars, wherein each action instruction is associated with at least one natural language exemplar.
37. The machine-readable medium of claim 36 having stored thereon further executable instructions for performing a method comprising:
generating a mapping function that specifies a difference between the input string and the prospective variant.
38. The machine-readable medium of claim 37 having stored thereon further executable instructions for performing a method comprising:
applying the mapping function to the action instruction associated with the identified natural language exemplar to produce an adapted action instruction.
39. The machine-readable medium of claim 38 having stored thereon further executable instructions for performing a method comprising:
executing the produced adapted action instruction.
40. The machine-readable medium of claim 37 having stored thereon further executable instructions for performing a method comprising:
applying the mapping function to the identified natural language exemplar to produce an adapted exemplar.
41. The machine-readable medium of claim 40 having stored thereon further executable instructions for performing a method comprising:
forwarding the adapted exemplar to a user to confirm whether the user desires an adapted action corresponding to the adapted exemplar.
42. The machine-readable medium of claim 41 having stored thereon further executable instructions for performing a method comprising:
executing the adapted action if the user confirms that an adapted exemplar expresses the action desired by the user.
43. The machine-readable medium of claim 42 having stored thereon further executable instructions for performing a method comprising:
selecting from the one or more natural language variants an alternative prospective variant that most resembles the input string, if the user does not accept that the adapted exemplar expresses the action desired by the user; and
identifying a natural language exemplar via a mapping between the exemplar and the alternative prospective variant.
44. The machine-readable medium of claim 43 having stored thereon further executable instructions for performing a method comprising:
executing an action instruction associated with the identified natural language exemplar.
45. In a speech-enabled service, a method for creating customized files containing personalized command variants relating to the speech-enabled service, the method comprising:
accessing a context file relating to the speech-enabled service, the context file containing a natural language exemplar associated with a desired action;
creating a customized variant for the desired action; and
correlating the created variant with the natural language exemplar.
46. The method of claim 45, wherein the created variant represents one preferred way of expressing the desired action.
47. The method of claim 46, further comprising:
storing the created variant in a customized context file, wherein during service access by a user the customized context file is uploaded by the speech-enabled service, allowing the user to express the desired action using the created variant.
48. The method of claim 45, wherein the context file is accessed using a web browser.
49. The method of claim 45, wherein the context file is accessed using a telephone.
50. A system for providing speech-enabled application programs comprising:
a memory to store one or more natural language variants corresponding to a natural language exemplar; and
a processor to:
select from the one or more natural language variants a prospective variant that most resembles an input string; and
identify a natural language exemplar via a mapping between the exemplar and the prospective variant.
51. The system of claim 50, further comprising:
a voice recognizer to receive the input string and produce a recognized input string.
52. The system of claim 50, further comprising:
a controller adapted to execute an action instruction associated with the identified natural language exemplar.
53. The system of claim 50, the processor adapted to map the one or more natural language variants with the natural language exemplar.
54. The system of claim 50, the processor adapted to map a plurality of action instructions with a plurality of natural language exemplars, wherein each action instruction is associated with at least one natural language exemplar and the memory to store the mapped action instructions.
55. The system of claim 51, the processor adapted to generate a mapping function that specifies a difference between the recognized input string and the prospective variant.
56. The system of claim 55, the processor adapted to apply the mapping function to an action instruction associated with the identified natural language exemplar to produce an adapted action instruction.
57. The system of claim 56, further comprising:
a controller adapted to execute the produced adapted action instruction.
58. The system of claim 57, further comprising:
an output synthesizer to present a result of the executed instruction by providing data that can be presented to an audio or visual terminal device.
59. The system of claim 58 , wherein the output synthesizer is at least one of a display format and a speech synthesizer.
60. The system of claim 50, further comprising:
an input device to generate the input string.
61. The system of claim 60, wherein said input device is at least one of a keyboard, a handwriting recognition device, a dial pad, and a speech recognition device.
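The mechanism recited in claims 50, 55-56, and 38-41 can be sketched in a few lines: select the stored variant most resembling the input string, follow the variant-to-exemplar mapping, derive a mapping function from the input/variant difference, and apply that function to both the action instruction and the exemplar. The phrase tables, the action syntax, and the use of `difflib.SequenceMatcher` as the resemblance measure are illustrative assumptions, not details taken from the patent.

```python
from difflib import SequenceMatcher

# Hypothetical variant -> exemplar and exemplar -> action tables; the phrases
# and the action-instruction syntax are illustrative only.
VARIANT_TO_EXEMPLAR = {
    "ring alice at home": "call alice at home",
    "dial alice at home": "call alice at home",
}
EXEMPLAR_TO_ACTION = {
    "call alice at home": "dial(contact='alice', location='home')",
}

def most_similar_variant(input_string, variants):
    # Claim 50: select the stored variant that most resembles the input string.
    return max(variants, key=lambda v: SequenceMatcher(None, input_string, v).ratio())

def mapping_function(input_string, variant):
    # Claims 37/55: capture the difference between the input string and the
    # prospective variant, here as naive word-for-word substitutions.
    return {exp: got
            for got, exp in zip(input_string.split(), variant.split())
            if got != exp}

def interpret(input_string):
    variant = most_similar_variant(input_string, VARIANT_TO_EXEMPLAR)
    exemplar = VARIANT_TO_EXEMPLAR[variant]  # mapping: variant -> exemplar
    subs = mapping_function(input_string, variant)
    action = EXEMPLAR_TO_ACTION[exemplar]
    adapted_exemplar = exemplar
    for exp, got in subs.items():
        # Claims 38/56: apply the mapping function to the action instruction;
        # claims 40-41: apply it to the exemplar for user confirmation.
        action = action.replace(exp, got)
        adapted_exemplar = adapted_exemplar.replace(exp, got)
    return adapted_exemplar, action

adapted_exemplar, action = interpret("ring bob at home")
print(adapted_exemplar)  # call bob at home
print(action)            # dial(contact='bob', location='home')
```

The adapted exemplar is what the system would read back to the user for confirmation (claim 41); only on acceptance would the adapted action instruction be executed (claim 42).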
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/732,600 US20020072914A1 (en) | 2000-12-08 | 2000-12-08 | Method and apparatus for creation and user-customization of speech-enabled services |
EP01310087A EP1215657A3 (en) | 2000-12-08 | 2001-11-30 | Method and apparatus for creation and user-customisation of speech enabled services |
US10/103,049 US7212964B1 (en) | 2000-12-08 | 2002-03-22 | Language-understanding systems employing machine translation components |
US11/215,756 US7912726B2 (en) | 2000-12-08 | 2005-08-30 | Method and apparatus for creation and user-customization of speech-enabled services |
US11/656,155 US7467081B2 (en) | 2000-12-08 | 2007-01-22 | Language-understanding training database action pair augmentation using bidirectional translation |
US12/336,429 US8073683B2 (en) | 2000-12-08 | 2008-12-16 | Language-understanding training database action pair augmentation using bidirectional translation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/732,600 US20020072914A1 (en) | 2000-12-08 | 2000-12-08 | Method and apparatus for creation and user-customization of speech-enabled services |
Related Child Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/103,049 Continuation-In-Part US7212964B1 (en) | 2000-12-08 | 2002-03-22 | Language-understanding systems employing machine translation components |
US11/215,756 Division US7912726B2 (en) | 2000-12-08 | 2005-08-30 | Method and apparatus for creation and user-customization of speech-enabled services |
Publications (1)
Publication Number | Publication Date |
---|---|
US20020072914A1 true US20020072914A1 (en) | 2002-06-13 |
Family
ID=24944199
Family Applications (5)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/732,600 Abandoned US20020072914A1 (en) | 2000-12-08 | 2000-12-08 | Method and apparatus for creation and user-customization of speech-enabled services |
US10/103,049 Expired - Fee Related US7212964B1 (en) | 2000-12-08 | 2002-03-22 | Language-understanding systems employing machine translation components |
US11/215,756 Expired - Fee Related US7912726B2 (en) | 2000-12-08 | 2005-08-30 | Method and apparatus for creation and user-customization of speech-enabled services |
US11/656,155 Expired - Lifetime US7467081B2 (en) | 2000-12-08 | 2007-01-22 | Language-understanding training database action pair augmentation using bidirectional translation |
US12/336,429 Expired - Fee Related US8073683B2 (en) | 2000-12-08 | 2008-12-16 | Language-understanding training database action pair augmentation using bidirectional translation |
Family Applications After (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/103,049 Expired - Fee Related US7212964B1 (en) | 2000-12-08 | 2002-03-22 | Language-understanding systems employing machine translation components |
US11/215,756 Expired - Fee Related US7912726B2 (en) | 2000-12-08 | 2005-08-30 | Method and apparatus for creation and user-customization of speech-enabled services |
US11/656,155 Expired - Lifetime US7467081B2 (en) | 2000-12-08 | 2007-01-22 | Language-understanding training database action pair augmentation using bidirectional translation |
US12/336,429 Expired - Fee Related US8073683B2 (en) | 2000-12-08 | 2008-12-16 | Language-understanding training database action pair augmentation using bidirectional translation |
Country Status (2)
Country | Link |
---|---|
US (5) | US20020072914A1 (en) |
EP (1) | EP1215657A3 (en) |
Cited By (130)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030061054A1 (en) * | 2001-09-25 | 2003-03-27 | Payne Michael J. | Speaker independent voice recognition (SIVR) using dynamic assignment of speech contexts, dynamic biasing, and multi-pass parsing |
US20030061053A1 (en) * | 2001-09-27 | 2003-03-27 | Payne Michael J. | Method and apparatus for processing inputs into a computing device |
US20030060181A1 (en) * | 2001-09-19 | 2003-03-27 | Anderson David B. | Voice-operated two-way asynchronous radio |
US20030130875A1 (en) * | 2002-01-04 | 2003-07-10 | Hawash Maher M. | Real-time prescription renewal transaction across a network |
US20030130868A1 (en) * | 2002-01-04 | 2003-07-10 | Rohan Coelho | Real-time prescription transaction with adjudication across a network |
US20030216913A1 (en) * | 2002-05-14 | 2003-11-20 | Microsoft Corporation | Natural input recognition tool |
US20040030559A1 (en) * | 2001-09-25 | 2004-02-12 | Payne Michael J. | Color as a visual cue in speech-enabled applications |
US20040092293A1 (en) * | 2002-11-06 | 2004-05-13 | Samsung Electronics Co., Ltd. | Third-party call control type simultaneous interpretation system and method thereof |
US20050246177A1 (en) * | 2004-04-30 | 2005-11-03 | Sbc Knowledge Ventures, L.P. | System, method and software for enabling task utterance recognition in speech enabled systems |
US20050283367A1 (en) * | 2004-06-17 | 2005-12-22 | International Business Machines Corporation | Method and apparatus for voice-enabling an application |
US6985865B1 (en) * | 2001-09-26 | 2006-01-10 | Sprint Spectrum L.P. | Method and system for enhanced response to voice commands in a voice command platform |
US20060056602A1 (en) * | 2004-09-13 | 2006-03-16 | Sbc Knowledge Ventures, L.P. | System and method for analysis and adjustment of speech-enabled systems |
US20060069569A1 (en) * | 2004-09-16 | 2006-03-30 | Sbc Knowledge Ventures, L.P. | System and method for optimizing prompts for speech-enabled applications |
US20060136222A1 (en) * | 2004-12-22 | 2006-06-22 | New Orchard Road | Enabling voice selection of user preferences |
US7231343B1 (en) * | 2001-12-20 | 2007-06-12 | Ianywhere Solutions, Inc. | Synonyms mechanism for natural language systems |
US20080133220A1 (en) * | 2006-12-01 | 2008-06-05 | Microsoft Corporation | Leveraging back-off grammars for authoring context-free grammars |
US20080183474A1 (en) * | 2007-01-30 | 2008-07-31 | Damion Alexander Bethune | Process for creating and administrating tests made from zero or more picture files, sound bites on handheld device |
US20100202598A1 (en) * | 2002-09-16 | 2010-08-12 | George Backhaus | Integrated Voice Navigation System and Method |
US20100281435A1 (en) * | 2009-04-30 | 2010-11-04 | At&T Intellectual Property I, L.P. | System and method for multimodal interaction using robust gesture processing |
CN102681463A (en) * | 2012-05-22 | 2012-09-19 | 青岛四方车辆研究所有限公司 | Compact-type expanded input-output (IO) device |
US8606584B1 (en) * | 2001-10-24 | 2013-12-10 | Harris Technology, Llc | Web based communication of information with reconfigurable format |
CN103901795A (en) * | 2012-12-26 | 2014-07-02 | 中国科学院软件研究所 | CPLD (Complex Programmable Logic Device)-based IO-station digital input module and input method |
WO2014144949A3 (en) * | 2013-03-15 | 2014-11-20 | Apple Inc. | Training an at least partial voice command system |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9626955B2 (en) | 2008-04-05 | 2017-04-18 | Apple Inc. | Intelligent text-to-speech conversion |
US9633660B2 (en) | 2010-02-25 | 2017-04-25 | Apple Inc. | User profiling for voice input processing |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9953088B2 (en) | 2012-05-14 | 2018-04-24 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US9953027B2 (en) | 2016-09-15 | 2018-04-24 | International Business Machines Corporation | System and method for automatic, unsupervised paraphrase generation using a novel framework that learns syntactic construct while retaining semantic meaning |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
US9984063B2 (en) * | 2016-09-15 | 2018-05-29 | International Business Machines Corporation | System and method for automatic, unsupervised paraphrase generation using a novel framework that learns syntactic construct while retaining semantic meaning |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US10169329B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Exemplar-based natural language processing |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US20190075167A1 (en) * | 2017-09-07 | 2019-03-07 | Samsung Electronics Co., Ltd. | Electronic device, server and recording medium supporting task execution using external device |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US20190124031A1 (en) * | 2017-10-20 | 2019-04-25 | Sap Se | Message processing for cloud computing applications |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10283110B2 (en) | 2009-07-02 | 2019-05-07 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US10568032B2 (en) | 2007-04-03 | 2020-02-18 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US11126446B2 (en) * | 2019-10-15 | 2021-09-21 | Microsoft Technology Licensing, Llc | Contextual extensible skills framework across surfaces |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
Families Citing this family (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020072914A1 (en) * | 2000-12-08 | 2002-06-13 | Hiyan Alshawi | Method and apparatus for creation and user-customization of speech-enabled services |
DE10203368B4 (en) * | 2002-01-29 | 2007-12-20 | Siemens Ag | Method and device for establishing a telephone connection |
EP1447793A1 (en) * | 2003-02-12 | 2004-08-18 | Hans Dr. Kuebler | User-specific customization of voice browser for internet and intranet |
FR2868588A1 (en) * | 2004-04-02 | 2005-10-07 | France Telecom | VOICE APPLICATION SYSTEM |
US8942985B2 (en) | 2004-11-16 | 2015-01-27 | Microsoft Corporation | Centralized method and system for clarifying voice commands |
US7703037B2 (en) | 2005-04-20 | 2010-04-20 | Microsoft Corporation | Searchable task-based interface to control panel functionality |
US7925975B2 (en) | 2006-03-10 | 2011-04-12 | Microsoft Corporation | Searching for commands to execute in applications |
US7848915B2 (en) * | 2006-08-09 | 2010-12-07 | International Business Machines Corporation | Apparatus for providing feedback of translation quality using concept-based back translation |
CN105117376B (en) * | 2007-04-10 | 2018-07-10 | 谷歌有限责任公司 | Multi-mode input method editor |
US9779079B2 (en) * | 2007-06-01 | 2017-10-03 | Xerox Corporation | Authoring system |
JP5235344B2 (en) * | 2007-07-03 | 2013-07-10 | 株式会社東芝 | Apparatus, method and program for machine translation |
US8635069B2 (en) | 2007-08-16 | 2014-01-21 | Crimson Corporation | Scripting support for data identifiers, voice recognition and speech in a telnet session |
JP5100445B2 (en) * | 2008-02-28 | 2012-12-19 | 株式会社東芝 | Machine translation apparatus and method |
US8521516B2 (en) * | 2008-03-26 | 2013-08-27 | Google Inc. | Linguistic key normalization |
US8700385B2 (en) * | 2008-04-04 | 2014-04-15 | Microsoft Corporation | Providing a task description name space map for the information worker |
US8352244B2 (en) * | 2009-07-21 | 2013-01-08 | International Business Machines Corporation | Active learning systems and methods for rapid porting of machine translation systems to new language pairs or new domains |
JP2011033680A (en) * | 2009-07-30 | 2011-02-17 | Sony Corp | Voice processing device and method, and program |
US9063931B2 (en) * | 2011-02-16 | 2015-06-23 | Ming-Yuan Wu | Multiple language translation system |
CN104040238B (en) | 2011-11-04 | 2017-06-27 | 汉迪拉布公司 | Polynucleotides sample preparation apparatus |
US9336302B1 (en) | 2012-07-20 | 2016-05-10 | Zuci Realty Llc | Insight and algorithmic clustering for automated synthesis |
US10033797B1 (en) | 2014-08-20 | 2018-07-24 | Ivanti, Inc. | Terminal emulation over HTML |
JP6466138B2 (en) | 2014-11-04 | 2019-02-06 | 株式会社東芝 | Foreign language sentence creation support apparatus, method and program |
US9472196B1 (en) * | 2015-04-22 | 2016-10-18 | Google Inc. | Developer voice actions system |
DE102015006662B4 (en) | 2015-05-22 | 2019-11-14 | Audi Ag | Method for configuring a voice control device |
US9401142B1 (en) | 2015-09-07 | 2016-07-26 | Voicebox Technologies Corporation | System and method for validating natural language content using crowdsourced validation jobs |
US9519766B1 (en) | 2015-09-07 | 2016-12-13 | Voicebox Technologies Corporation | System and method of providing and validating enhanced CAPTCHAs |
WO2017044409A1 (en) | 2015-09-07 | 2017-03-16 | Voicebox Technologies Corporation | System and method of annotating utterances based on tags assigned by unmanaged crowds |
WO2017044415A1 (en) * | 2015-09-07 | 2017-03-16 | Voicebox Technologies Corporation | System and method for eliciting open-ended natural language responses to questions to train natural language processors |
JP6481643B2 (en) * | 2016-03-08 | 2019-03-13 | トヨタ自動車株式会社 | Audio processing system and audio processing method |
US11100278B2 (en) | 2016-07-28 | 2021-08-24 | Ivanti, Inc. | Systems and methods for presentation of a terminal application screen |
US11205103B2 (en) | 2016-12-09 | 2021-12-21 | The Research Foundation for the State University | Semisupervised autoencoder for sentiment analysis |
US11354521B2 (en) | 2018-03-07 | 2022-06-07 | Google Llc | Facilitating communications with automated assistants in multiple languages |
AU2018412575B2 (en) * | 2018-03-07 | 2021-03-18 | Google Llc | Facilitating end-to-end communications with automated assistants in multiple languages |
JP7132090B2 (en) * | 2018-11-07 | 2022-09-06 | 株式会社東芝 | Dialogue system, dialogue device, dialogue method, and program |
US11575999B2 (en) | 2020-01-16 | 2023-02-07 | Meta Platforms Technologies, Llc | Systems and methods for hearing assessment and audio adjustment |
RU2758683C2 (en) * | 2020-04-28 | 2021-11-01 | Public Joint-Stock Company Sberbank of Russia (PAO Sberbank) | System and method for augmentation of the training sample for machine learning algorithms |
US11664010B2 (en) | 2020-11-03 | 2023-05-30 | Florida Power & Light Company | Natural language domain corpus data set creation based on enhanced root utterances |
US20230214604A1 (en) * | 2022-01-06 | 2023-07-06 | PRIVACY4CARS, Inc. | Translating technical operating instruction |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5675707A (en) * | 1995-09-15 | 1997-10-07 | At&T | Automated call router system and method |
US5729659A (en) * | 1995-06-06 | 1998-03-17 | Potter; Jerry L. | Method and apparatus for controlling a digital computer using oral input |
US6122614A (en) * | 1998-11-20 | 2000-09-19 | Custom Speech Usa, Inc. | System and method for automating transcription services |
US6138100A (en) * | 1998-04-14 | 2000-10-24 | At&T Corp. | Interface for a voice-activated connection system |
US6311159B1 (en) * | 1998-10-05 | 2001-10-30 | Lernout & Hauspie Speech Products N.V. | Speech controlled computer user interface |
US6324512B1 (en) * | 1999-08-26 | 2001-11-27 | Matsushita Electric Industrial Co., Ltd. | System and method for allowing family members to access TV contents and program media recorder over telephone or internet |
US6327566B1 (en) * | 1999-06-16 | 2001-12-04 | International Business Machines Corporation | Method and apparatus for correcting misinterpreted voice commands in a speech recognition system |
Family Cites Families (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5454062A (en) * | 1991-03-27 | 1995-09-26 | Audio Navigation Systems, Inc. | Method for recognizing spoken words |
JPH05197573A (en) * | 1991-08-26 | 1993-08-06 | Hewlett Packard Co <Hp> | Task controlling system with task oriented paradigm |
US5493692A (en) * | 1993-12-03 | 1996-02-20 | Xerox Corporation | Selective delivery of electronic messages in a multiple computer system based on context and environment of a user |
US5544354A (en) * | 1994-07-18 | 1996-08-06 | Ikonic Interactive, Inc. | Multimedia matrix architecture user interface |
JP3066274B2 (en) * | 1995-01-12 | 2000-07-17 | シャープ株式会社 | Machine translation equipment |
JPH09128396A (en) * | 1995-11-06 | 1997-05-16 | Hitachi Ltd | Preparation method for bilingual dictionary |
US5823879A (en) * | 1996-01-19 | 1998-10-20 | Sheldon F. Goldberg | Network gaming system |
US6341372B1 (en) | 1997-05-01 | 2002-01-22 | William E. Datig | Universal machine translator of arbitrary languages |
US5974413A (en) * | 1997-07-03 | 1999-10-26 | Activeword Systems, Inc. | Semantic user interface |
WO1999046763A1 (en) | 1998-03-09 | 1999-09-16 | Lernout & Hauspie Speech Products N.V. | Apparatus and method for simultaneous multimode dictation |
JP3059413B2 (en) * | 1998-03-16 | 2000-07-04 | 株式会社エイ・ティ・アール音声翻訳通信研究所 | Natural language understanding device and natural language understanding system |
US7051277B2 (en) * | 1998-04-17 | 2006-05-23 | International Business Machines Corporation | Automated assistant for organizing electronic documents |
US6070142A (en) * | 1998-04-17 | 2000-05-30 | Andersen Consulting Llp | Virtual customer sales and service center and method |
US6345243B1 (en) * | 1998-05-27 | 2002-02-05 | Lionbridge Technologies, Inc. | System, method, and product for dynamically propagating translations in a translation-memory system |
US6144375A (en) * | 1998-08-14 | 2000-11-07 | Praja Inc. | Multi-perspective viewer for content-based interactivity |
US6327346B1 (en) * | 1998-09-01 | 2001-12-04 | At&T Corp. | Method and apparatus for setting user communication parameters based on voice identification of users |
US6453292B2 (en) | 1998-10-28 | 2002-09-17 | International Business Machines Corporation | Command boundary identifier for conversational natural language |
US7082397B2 (en) | 1998-12-01 | 2006-07-25 | Nuance Communications, Inc. | System for and method of creating and browsing a voice web |
US6275789B1 (en) * | 1998-12-18 | 2001-08-14 | Leo Moser | Method and apparatus for performing full bidirectional translation between a source language and a linked alternative language |
US6978262B2 (en) * | 1999-01-05 | 2005-12-20 | Tsai Daniel E | Distributed database schema |
US6397212B1 (en) * | 1999-03-04 | 2002-05-28 | Peter Biffar | Self-learning and self-personalizing knowledge search engine that delivers holistic results |
JP3016779B1 (en) * | 1999-03-08 | 2000-03-06 | 株式会社エイ・ティ・アール音声翻訳通信研究所 | Voice understanding device and voice understanding system |
US6408272B1 (en) * | 1999-04-12 | 2002-06-18 | General Magic, Inc. | Distributed voice user interface |
US6745165B2 (en) * | 1999-06-16 | 2004-06-01 | International Business Machines Corporation | Method and apparatus for recognizing from here to here voice command structures in a finite grammar speech recognition system |
US6178404B1 (en) | 1999-07-23 | 2001-01-23 | Intervoice Limited Partnership | System and method to facilitate speech enabled user interfaces by prompting with possible transaction phrases |
US6658388B1 (en) | 1999-09-10 | 2003-12-02 | International Business Machines Corporation | Personality generator for conversational systems |
US6684183B1 (en) | 1999-12-06 | 2004-01-27 | Comverse Ltd. | Generic natural language service creation environment |
US6748361B1 (en) * | 1999-12-14 | 2004-06-08 | International Business Machines Corporation | Personal speech assistant supporting a dialog manager |
US6701362B1 (en) * | 2000-02-23 | 2004-03-02 | Purpleyogi.Com Inc. | Method for creating user profiles |
US7249159B1 (en) * | 2000-03-16 | 2007-07-24 | Microsoft Corporation | Notification platform architecture |
US6782356B1 (en) * | 2000-10-03 | 2004-08-24 | Hewlett-Packard Development Company, L.P. | Hierarchical language chunking translation table |
US6922670B2 (en) | 2000-10-24 | 2005-07-26 | Sanyo Electric Co., Ltd. | User support apparatus and system using agents |
US20020072914A1 (en) * | 2000-12-08 | 2002-06-13 | Hiyan Alshawi | Method and apparatus for creation and user-customization of speech-enabled services |
- 2000
  - 2000-12-08 US US09/732,600 patent/US20020072914A1/en not_active Abandoned
- 2001
  - 2001-11-30 EP EP01310087A patent/EP1215657A3/en not_active Withdrawn
- 2002
  - 2002-03-22 US US10/103,049 patent/US7212964B1/en not_active Expired - Fee Related
- 2005
  - 2005-08-30 US US11/215,756 patent/US7912726B2/en not_active Expired - Fee Related
- 2007
  - 2007-01-22 US US11/656,155 patent/US7467081B2/en not_active Expired - Lifetime
- 2008
  - 2008-12-16 US US12/336,429 patent/US8073683B2/en not_active Expired - Fee Related
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5729659A (en) * | 1995-06-06 | 1998-03-17 | Potter; Jerry L. | Method and apparatus for controlling a digital computer using oral input |
US5675707A (en) * | 1995-09-15 | 1997-10-07 | At&T | Automated call router system and method |
US6138100A (en) * | 1998-04-14 | 2000-10-24 | At&T Corp. | Interface for a voice-activated connection system |
US6311159B1 (en) * | 1998-10-05 | 2001-10-30 | Lernout & Hauspie Speech Products N.V. | Speech controlled computer user interface |
US6122614A (en) * | 1998-11-20 | 2000-09-19 | Custom Speech Usa, Inc. | System and method for automating transcription services |
US6327566B1 (en) * | 1999-06-16 | 2001-12-04 | International Business Machines Corporation | Method and apparatus for correcting misinterpreted voice commands in a speech recognition system |
US6324512B1 (en) * | 1999-08-26 | 2001-11-27 | Matsushita Electric Industrial Co., Ltd. | System and method for allowing family members to access TV contents and program media recorder over telephone or internet |
Cited By (182)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US7158499B2 (en) * | 2001-09-19 | 2007-01-02 | Mitsubishi Electric Research Laboratories, Inc. | Voice-operated two-way asynchronous radio |
US20030060181A1 (en) * | 2001-09-19 | 2003-03-27 | Anderson David B. | Voice-operated two-way asynchronous radio |
US20040030559A1 (en) * | 2001-09-25 | 2004-02-12 | Payne Michael J. | Color as a visual cue in speech-enabled applications |
US20030061054A1 (en) * | 2001-09-25 | 2003-03-27 | Payne Michael J. | Speaker independent voice recognition (SIVR) using dynamic assignment of speech contexts, dynamic biasing, and multi-pass parsing |
US6985865B1 (en) * | 2001-09-26 | 2006-01-10 | Sprint Spectrum L.P. | Method and system for enhanced response to voice commands in a voice command platform |
US20030061053A1 (en) * | 2001-09-27 | 2003-03-27 | Payne Michael J. | Method and apparatus for processing inputs into a computing device |
US8606584B1 (en) * | 2001-10-24 | 2013-12-10 | Harris Technology, Llc | Web based communication of information with reconfigurable format |
US20090144248A1 (en) * | 2001-12-20 | 2009-06-04 | Sybase 365, Inc. | Context-Based Suggestions Mechanism and Adaptive Push Mechanism for Natural Language Systems |
US8036877B2 (en) | 2001-12-20 | 2011-10-11 | Sybase, Inc. | Context-based suggestions mechanism and adaptive push mechanism for natural language systems |
US7231343B1 (en) * | 2001-12-20 | 2007-06-12 | Ianywhere Solutions, Inc. | Synonyms mechanism for natural language systems |
US20030130868A1 (en) * | 2002-01-04 | 2003-07-10 | Rohan Coelho | Real-time prescription transaction with adjudication across a network |
US20030130875A1 (en) * | 2002-01-04 | 2003-07-10 | Hawash Maher M. | Real-time prescription renewal transaction across a network |
US20030216913A1 (en) * | 2002-05-14 | 2003-11-20 | Microsoft Corporation | Natural input recognition tool |
US7380203B2 (en) * | 2002-05-14 | 2008-05-27 | Microsoft Corporation | Natural input recognition tool |
US20100202598A1 (en) * | 2002-09-16 | 2010-08-12 | George Backhaus | Integrated Voice Navigation System and Method |
US8145495B2 (en) * | 2002-09-16 | 2012-03-27 | Movius Interactive Corporation | Integrated voice navigation system and method |
US20040092293A1 (en) * | 2002-11-06 | 2004-05-13 | Samsung Electronics Co., Ltd. | Third-party call control type simultaneous interpretation system and method thereof |
US20050246177A1 (en) * | 2004-04-30 | 2005-11-03 | Sbc Knowledge Ventures, L.P. | System, method and software for enabling task utterance recognition in speech enabled systems |
US20050283367A1 (en) * | 2004-06-17 | 2005-12-22 | International Business Machines Corporation | Method and apparatus for voice-enabling an application |
US8768711B2 (en) * | 2004-06-17 | 2014-07-01 | Nuance Communications, Inc. | Method and apparatus for voice-enabling an application |
US20070027694A1 (en) * | 2004-09-13 | 2007-02-01 | Bushey Robert R | System and method for analysis and adjustment of speech-enabled systems |
US7110949B2 (en) | 2004-09-13 | 2006-09-19 | At&T Knowledge Ventures, L.P. | System and method for analysis and adjustment of speech-enabled systems |
US8117030B2 (en) | 2004-09-13 | 2012-02-14 | At&T Intellectual Property I, L.P. | System and method for analysis and adjustment of speech-enabled systems |
US20060056602A1 (en) * | 2004-09-13 | 2006-03-16 | Sbc Knowledge Ventures, L.P. | System and method for analysis and adjustment of speech-enabled systems |
US7653549B2 (en) | 2004-09-16 | 2010-01-26 | At&T Intellectual Property I, L.P. | System and method for facilitating call routing using speech recognition |
US20060143015A1 (en) * | 2004-09-16 | 2006-06-29 | Sbc Technology Resources, Inc. | System and method for facilitating call routing using speech recognition |
US20060069569A1 (en) * | 2004-09-16 | 2006-03-30 | Sbc Knowledge Ventures, L.P. | System and method for optimizing prompts for speech-enabled applications |
US7043435B2 (en) | 2004-09-16 | 2006-05-09 | Sbc Knowledge Ventures, L.P. | System and method for optimizing prompts for speech-enabled applications |
US20080040118A1 (en) * | 2004-09-16 | 2008-02-14 | Knott Benjamin A | System and method for facilitating call routing using speech recognition |
US9083798B2 (en) * | 2004-12-22 | 2015-07-14 | Nuance Communications, Inc. | Enabling voice selection of user preferences |
US20060136222A1 (en) * | 2004-12-22 | 2006-06-22 | New Orchard Road | Enabling voice selection of user preferences |
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US20120095752A1 (en) * | 2006-12-01 | 2012-04-19 | Microsoft Corporation | Leveraging back-off grammars for authoring context-free grammars |
US20080133220A1 (en) * | 2006-12-01 | 2008-06-05 | Microsoft Corporation | Leveraging back-off grammars for authoring context-free grammars |
US8108205B2 (en) * | 2006-12-01 | 2012-01-31 | Microsoft Corporation | Leveraging back-off grammars for authoring context-free grammars |
US8862468B2 (en) * | 2006-12-01 | 2014-10-14 | Microsoft Corporation | Leveraging back-off grammars for authoring context-free grammars |
US20080183474A1 (en) * | 2007-01-30 | 2008-07-31 | Damion Alexander Bethune | Process for creating and administrating tests made from zero or more picture files, sound bites on handheld device |
US10568032B2 (en) | 2007-04-03 | 2020-02-18 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US10381016B2 (en) | 2008-01-03 | 2019-08-13 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9865248B2 (en) | 2008-04-05 | 2018-01-09 | Apple Inc. | Intelligent text-to-speech conversion |
US9626955B2 (en) | 2008-04-05 | 2017-04-18 | Apple Inc. | Intelligent text-to-speech conversion |
US10108612B2 (en) | 2008-07-31 | 2018-10-23 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US20100281435A1 (en) * | 2009-04-30 | 2010-11-04 | At&T Intellectual Property I, L.P. | System and method for multimodal interaction using robust gesture processing |
US10475446B2 (en) | 2009-06-05 | 2019-11-12 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10795541B2 (en) | 2009-06-05 | 2020-10-06 | Apple Inc. | Intelligent organization of tasks items |
US11080012B2 (en) | 2009-06-05 | 2021-08-03 | Apple Inc. | Interface for a virtual digital assistant |
US10283110B2 (en) | 2009-07-02 | 2019-05-07 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US9548050B2 (en) | 2010-01-18 | 2017-01-17 | Apple Inc. | Intelligent automated assistant |
US10706841B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Task flow identification based on user intent |
US10049675B2 (en) | 2010-02-25 | 2018-08-14 | Apple Inc. | User profiling for voice input processing |
US9633660B2 (en) | 2010-02-25 | 2017-04-25 | Apple Inc. | User profiling for voice input processing |
US10102359B2 (en) | 2011-03-21 | 2018-10-16 | Apple Inc. | Device access using voice authentication |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9953088B2 (en) | 2012-05-14 | 2018-04-24 | Apple Inc. | Crowd sourcing information to fulfill user requests |
CN102681463A (en) * | 2012-05-22 | 2012-09-19 | 青岛四方车辆研究所有限公司 | Compact-type expanded input-output (IO) device |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
CN103901795A (en) * | 2012-12-26 | 2014-07-02 | 中国科学院软件研究所 | CPLD (Complex Programmable Logic Device)-based IO-station digital input module and input method |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
WO2014144949A3 (en) * | 2013-03-15 | 2014-11-20 | Apple Inc. | Training an at least partial voice command system |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US9966060B2 (en) | 2013-06-07 | 2018-05-08 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US10657961B2 (en) | 2013-06-08 | 2020-05-19 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US10497365B2 (en) | 2014-05-30 | 2019-12-03 | Apple Inc. | Multi-command single utterance input method |
US10169329B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Exemplar-based natural language processing |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10904611B2 (en) | 2014-06-30 | 2021-01-26 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9668024B2 (en) | 2014-06-30 | 2017-05-30 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US10431204B2 (en) | 2014-09-11 | 2019-10-01 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9986419B2 (en) | 2014-09-30 | 2018-05-29 | Apple Inc. | Social reminders |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US11556230B2 (en) | 2014-12-02 | 2023-01-17 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US10311871B2 (en) | 2015-03-08 | 2019-06-04 | Apple Inc. | Competing devices responding to voice triggers |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US11069347B2 (en) | 2016-06-08 | 2021-07-20 | Apple Inc. | Intelligent automated assistant for media exploration |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US9984063B2 (en) * | 2016-09-15 | 2018-05-29 | International Business Machines Corporation | System and method for automatic, unsupervised paraphrase generation using a novel framework that learns syntactic construct while retaining semantic meaning |
US9953027B2 (en) | 2016-09-15 | 2018-04-24 | International Business Machines Corporation | System and method for automatic, unsupervised paraphrase generation using a novel framework that learns syntactic construct while retaining semantic meaning |
US10553215B2 (en) | 2016-09-23 | 2020-02-04 | Apple Inc. | Intelligent automated assistant |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
US20190075167A1 (en) * | 2017-09-07 | 2019-03-07 | Samsung Electronics Co., Ltd. | Electronic device, server and recording medium supporting task execution using external device |
US11032374B2 (en) * | 2017-09-07 | 2021-06-08 | Samsung Electronics Co., Ltd. | Electronic device, server and recording medium supporting task execution using external device |
US11765234B2 (en) | 2017-09-07 | 2023-09-19 | Samsung Electronics Co., Ltd. | Electronic device, server and recording medium supporting task execution using external device |
US20190124031A1 (en) * | 2017-10-20 | 2019-04-25 | Sap Se | Message processing for cloud computing applications |
US10826857B2 (en) * | 2017-10-20 | 2020-11-03 | Sap Se | Message processing for cloud computing applications |
US11126446B2 (en) * | 2019-10-15 | 2021-09-21 | Microsoft Technology Licensing, Llc | Contextual extensible skills framework across surfaces |
Also Published As
Publication number | Publication date |
---|---|
US7212964B1 (en) | 2007-05-01 |
US7912726B2 (en) | 2011-03-22 |
US7467081B2 (en) | 2008-12-16 |
EP1215657A3 (en) | 2005-04-27 |
EP1215657A2 (en) | 2002-06-19 |
US20060004575A1 (en) | 2006-01-05 |
US8073683B2 (en) | 2011-12-06 |
US20090099837A1 (en) | 2009-04-16 |
US20070118352A1 (en) | 2007-05-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7912726B2 (en) | Method and apparatus for creation and user-customization of speech-enabled services | |
US7869998B1 (en) | Voice-enabled dialog system | |
US8645122B1 (en) | Method of handling frequently asked questions in a natural language dialog service | |
EP1380153B1 (en) | Voice response system | |
US7197460B1 (en) | System for handling frequently asked questions in a natural language dialog service | |
US7024363B1 (en) | Methods and apparatus for contingent transfer and execution of spoken language interfaces | |
Reddy et al. | Speech to text conversion using android platform | |
US6366882B1 (en) | Apparatus for converting speech to text | |
US6801897B2 (en) | Method of providing concise forms of natural commands | |
EP1602102B1 (en) | Management of conversations | |
Black et al. | Building synthetic voices | |
US7146323B2 (en) | Method and system for gathering information by voice input | |
RU2352979C2 (en) | Synchronous comprehension of semantic objects for highly active interface | |
US6246989B1 (en) | System and method for providing an adaptive dialog function choice model for various communication devices | |
US20080208586A1 (en) | Enabling Natural Language Understanding In An X+V Page Of A Multimodal Application | |
GB2323694A (en) | Adaptation in speech to text conversion | |
EP1215656B1 (en) | Idiom handling in voice service systems | |
CA2346145A1 (en) | Speech controlled computer user interface | |
JP6625772B2 (en) | Search method and electronic device using the same | |
Primorac et al. | Android application for sending SMS messages with speech recognition interface | |
US7069513B2 (en) | System, method and computer program product for a transcription graphical user interface | |
JPH07222248A (en) | System for utilizing speech information for portable information terminal | |
Davies et al. | The IBM conversational telephony system for financial applications. | |
US20060031853A1 (en) | System and method for optimizing processing speed to run multiple dialogs between multiple users and a virtual agent | |
US20020138276A1 (en) | System, method and computer program product for a distributed speech recognition tuning platform |
Legal Events
Date | Code | Title | Description
---|---|---|---
| AS | Assignment | Owner name: AT&T CORP., NEW YORK. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: ALSHAWI, HIYAN; DOUGLAS, SHONA; REEL/FRAME: 011382/0616. Effective date: 20001207 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
| AS | Assignment | Owner name: NUANCE COMMUNICATIONS, INC., MASSACHUSETTS. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: AT&T INTELLECTUAL PROPERTY II, L.P.; REEL/FRAME: 041512/0608. Effective date: 20161214 |