This page describes platform-specific JSSCxml extensions related to speech interaction in supported Web browsers. These extensions are not guaranteed to interoperate with other SCXML implementations. JSSCxml currently supports only speech synthesis out of the box.
The `<speak>` element can be used wherever executable content is allowed. It connects SCXML to the speech synthesis part of the Web Speech API, recently implemented in several Web browsers, and exposes most of its functionality.
In the datamodel, `_x.voices` is the list of voices supported by the platform, as returned by the `SpeechSynthesis.getVoices()` method. Items of that list are not voiceURIs, but the full `SpeechSynthesisVoice` objects, with their `voiceURI`, `name`, and `lang` properties.
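For example, an author can inspect the available voices from executable content (a minimal sketch; the actual output depends on the voices installed in the browser):

```xml
<!-- list every available voice name and language in the log -->
<log label="voices"
     expr="_x.voices.map(function(v){ return v.name + ' (' + v.lang + ')' }).join(', ')"/>
```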
The namespace for `<speak>` must be `"http://www.jsscxml.org"`, for which I suggest the shorthand `jssc`. Thus:

```xml
<scxml xmlns="http://www.w3.org/2005/07/scxml"
       xmlns:jssc="http://www.jsscxml.org">
  …
  <jssc:speak text="Hello world!" xml:lang="en-US" … />
  …
</scxml>
```
The reason JSSCxml does not use the SSML namespace is that the element carries some new attributes, and that inline SSML content would look good but be completely inflexible. Instead, SSML documents can be created (or parsed from actual SSML content in a `<data>` element), manipulated by ECMAScript code, and finally passed to the `<speak>` element through its `expr` attribute.
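A minimal sketch of that workflow (the `id` and the SSML content are hypothetical, and JSSCxml's exact handling of XML content in `<data>` may differ):

```xml
<datamodel>
  <!-- hypothetical SSML document stored in the datamodel -->
  <data id="greeting">
    <speak xmlns="http://www.w3.org/2001/10/synthesis"
           version="1.0" xml:lang="en-US">
      Hello, <emphasis>world</emphasis>!
    </speak>
  </data>
</datamodel>
…
<!-- later, in executable content -->
<jssc:speak expr="greeting"/>
```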
| Name | Required | Default value | Valid values | Description |
|------|----------|---------------|--------------|-------------|
| `text` | yes*, but no more than one of `text`/`expr` | none | text with optional SSML tags | The text or SSML that will be spoken |
| `expr` | yes*, but no more than one of `text`/`expr` | none | an expression evaluating to a string or an SSML Document | Evaluated when the `<speak>` element is executed; its result will be spoken |
| `xml:lang` | no, and only one of `xml:lang`/`langexpr` | platform-specific | any RFC 3066 language code supported by the platform | the language of the text to be read |
| `langexpr` | no, and only one of `xml:lang`/`langexpr` | none | an expression evaluating to such a language code | Evaluated when the `<speak>` element is executed |
| `voice` | no | platform-specific | an expression that must evaluate to a member of `_x.voices` | the voice used to read the text |
| `volume` | no | 1 | 0 – 1 | how loud the text will be spoken |
| `rate` | no | 1 | 0.1 – 10 | how fast the text will be spoken |
| `pitch` | no | 1 | 0 – 2 | pitch modifier for the synthesized voice |
| `interrupt` / `nomore` | no* | false | boolean | stops speaking and cancels queued utterances |
* The boolean attribute (`interrupt`, or its synonym `nomore`) may appear alone in the `<speak>` element, in which case neither `text` nor `expr` is required.
Note that the DOMParser API used by JSSCxml to parse all this may reject SCXML documents where boolean attributes are written without a value. It is therefore advised to give them the value `"true"` (although the interpreter will be happy with any value at all, as long as it gets well-formed XML).
When executed, the `<speak>` element causes its text or SSML content to be read by the platform's SpeechSynthesis implementation, using the supplied parameters. When the synthesizer has reached the end of the utterance, a `speak.end` event will be placed in the external queue, with its `data` field containing a reference to the underlying `SpeechSynthesisUtterance` object.
If the `interrupt` attribute is present, current and queued utterances will be cancelled first, so that the new utterance (if supplied) is spoken immediately, no matter what. The interpreter will place a `speak.error` event in the external queue for each cancelled utterance.
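Putting this together, a state might speak a prompt and wait for the synthesizer to finish (a sketch; the state names are illustrative):

```xml
<state id="prompting">
  <onentry>
    <jssc:speak text="Please make a selection." xml:lang="en-US"/>
  </onentry>
  <!-- the synthesizer reached the end of the utterance -->
  <transition event="speak.end" target="listening"/>
  <!-- the utterance was cancelled, e.g. by a later interrupt -->
  <transition event="speak.error" target="idle"/>
</state>
```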
At this time, Chrome and Safari's implementations disagree on the way to select a voice. Chrome's utterance objects have a
voiceURI property which can be set to the
voiceURI value of a voice, whereas Safari's utterance objects have a
voice property which accepts only references to whole
SpeechSynthesisVoice objects. In order to hide this misbehavior from authors, the
voice attribute defined here always takes a reference, and JSSCxml will ensure that each browser gets what it expects.
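For example, an author might pick a French voice once and reuse it (a sketch; it assumes the platform actually provides a `fr-FR` voice, otherwise the expression evaluates to `undefined`):

```xml
<datamodel>
  <!-- first matching French voice, or undefined if none is installed -->
  <data id="frVoice"
        expr="_x.voices.filter(function(v){ return v.lang === 'fr-FR' })[0]"/>
</datamodel>
…
<jssc:speak text="Bonjour tout le monde !" voice="frVoice" xml:lang="fr-FR"/>
```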
If no voice is specified, the
xml:lang attribute will cause the platform to choose the default voice for that language, if any is available, or at least for another geographical variation of that language. The language defined by
xml:lang higher in the document hierarchy (typically on the root element) is inherited by
<speak> elements, so there is no need to repeat it all the time.
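Thanks to inheritance, the language can be declared once on the root element (a minimal sketch):

```xml
<scxml xmlns="http://www.w3.org/2005/07/scxml"
       xmlns:jssc="http://www.jsscxml.org"
       xml:lang="en-GB">
  …
  <!-- inherits xml:lang="en-GB" from the root element -->
  <jssc:speak text="Mind the gap."/>
  …
</scxml>
```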
speak.* events queued when using speech synthesis have the DOM Event
origintype, but their
origin is the corresponding
SpeechSynthesisUtterance object rather than a node. There is no reason to
<send> any event back to those objects (and the interpreter won't take them as a valid target anyway), but their
text property allows you to track which utterance it is that has started, ended, or been cancelled.
Their `data` field will contain the `name` property of the original DOM event instead of a copy of the event itself, as would be the case for DOM events converted in the usual way by JSSCxml.
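Tracking a specific utterance through its `text` property could look like this (a sketch; the condition string is illustrative):

```xml
<!-- only react when the "Goodbye." utterance has finished -->
<transition event="speak.end"
            cond="_event.origin.text === 'Goodbye.'"
            target="final"/>
```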