# asr_sdk_ja

speech-python-rec-sdk
============================

Summary
-----

The Python SDK for voice interaction services.
Supported services: one-sentence recognition and real-time speech recognition.

<font color="red"> ***Please carefully read the instruction document*** </font>

### SDK files description

| File/Directory            | Description                                |
| ------------------------- | ------------------------------------------ |
| speech_rec                | SDK related files                          |
| demo                      | Example code                               |
| &emsp;├─ transcriber_demo.py | Real time speech recognition example code  |
| &emsp;├─ recognizer_demo.py | One sentence recognition example code      |
| &emsp;├─ demo.wav         | Chinese Mandarin Sample Audio (WAV Format) |
| &emsp;├─ demo.mp3         | Chinese Mandarin Sample Audio (MP3 Format) |
| setup.py                  | Installation script                        |
| README.md                 | This document                              |

**Note**: The recognition results for the sample audio files provided in the SDK are consistent with each other. MP3 is the default audio format; audio supplied in WAV or another format is converted to MP3.

Operating environment
--------

Python 3.4 or later and ffmpeg are required. Note that the demo scripts use f-strings, which need Python 3.6 or later. A separate Python runtime environment is recommended; otherwise version conflicts may occur.
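
As a quick sanity check of these requirements, something like the following could be run before installing. It is only a sketch: `check_environment` is a name made up for this example and is not part of the SDK, and it relies solely on the Python standard library.

```python
import shutil
import sys


def check_environment(min_version=(3, 4)):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper for illustration, not part of the SDK.
    """
    problems = []
    # Compare only (major, minor) against the documented minimum version
    if sys.version_info[:2] < min_version:
        problems.append("Python %d.%d or later is required" % min_version)
    # shutil.which returns None when the executable is not on PATH
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems
```

Calling `check_environment()` and printing each returned entry is enough to see what is missing, if anything.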

Installation method
--------

1. Make sure the Python packaging tool setuptools is installed. If it is not, install it from the command line:

       $ pip install setuptools


2. Unzip the SDK, change to the SDK directory (where the `setup.py` file is located), and run the following command:

       # Install
       $ python setup.py install

   Notes:

   - The `pip` and `python` commands above must correspond to Python 3.
   - The installation is successful if the following message is displayed: `Finished processing dependencies for speech-python-rec-sdk==1.0.0.8`
   - After installation, `build`, `dist`, and `speech_python_rec_sdk.egg-info` are generated in the SDK directory.

3. Edit the parameters in the files under `demo`. `recognizer_demo.py` and `transcriber_demo.py` are the scripts for one-sentence recognition and real-time speech recognition, respectively:

       # The appID you receive when you purchase a service on the platform
       app_id = '#####'

       # The appSecret you receive when you purchase a service on the platform
       app_secret = '#####'

       # Path of the audio file to recognize; change it to the path of your own audio file
       audio_path = '####'

       # Input language; for the format, see Speech recognition > Development Guide in the platform documentation center
       lang_type = 'zh-cmn-Hans-CN'

4. Run `recognizer_demo.py` or `transcriber_demo.py` to recognize the speech. If the token is invalid or has expired, delete the local **SpeechRecognizer_token.txt** or **SpeechTranscriber_token.txt** file and try again. If the token is still reported as expired, contact technical support.

       # Run these commands in the demo directory after setting app_id and the other parameters in the demo files.
       $ python recognizer_demo.py
       $ python transcriber_demo.py
       # After a successful run, SpeechRecognizer_token.txt or SpeechTranscriber_token.txt is generated in the directory where the demo was run.
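
The demo scripts cache the token in these files and request a new one once the cached token is older than `expire_time - 1` days (`expire_time` is set to 7 in the demo code). The rule can be sketched as follows; `token_needs_refresh` and its parameter names are invented for this illustration and are not SDK functions.

```python
import time

SECONDS_PER_DAY = 60 * 60 * 24


def token_needs_refresh(issued_at, lifetime_days=7, now=None):
    """Return True when a token issued at `issued_at` (Unix seconds) is stale.

    Mirrors the demo's rule of refreshing one day before the lifetime ends.
    Hypothetical helper, not part of the SDK.
    """
    if now is None:
        now = time.time()
    return now - issued_at > SECONDS_PER_DAY * (lifetime_days - 1)
```

With the default lifetime of 7 days, a token becomes stale as soon as it is more than 6 days old.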

Note:

- If "timestamp timeout" or "timestamp is greater than the current time" is displayed, the local time is inconsistent with the server time. Based on the time difference given in the error message, edit the `_token.py` file in speech-python-rec and adjust the line `timestamp = int(t)`, for example to `timestamp = int(t) + 1` (or 2, 3, 4, and so on) or `timestamp = int(t) - 1` (or 2, 3, 4, and so on).
- Changes to `_token.py` take effect only after the SDK is rebuilt and reinstalled:
  1. Delete `build`, `dist`, and `speech_python_rec_sdk.egg-info` generated in the SDK directory.
  2. Uninstall the SDK with `pip uninstall speech-python-rec-sdk`, then repeat steps 2, 3, and 4.
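
The timestamp adjustment amounts to adding a signed offset to the local time. The sketch below only illustrates that arithmetic, assuming `t` is a Unix time in seconds; `adjusted_timestamp` and `offset_seconds` are names invented here, and the actual contents of `_token.py` may differ.

```python
import time


def adjusted_timestamp(offset_seconds=0, now=None):
    """Return the integer timestamp shifted by `offset_seconds`.

    A positive offset moves the timestamp forward, a negative one backward.
    Hypothetical helper, not taken from the SDK.
    """
    t = time.time() if now is None else now
    return int(t) + offset_seconds
```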

## Parameter description

### Using the real-time speech recognition demo

`speech_rec/demo/transcriber_demo.py` is the real-time speech recognition demo and can be run directly.

#### Key interface description

Real-time speech recognition is mainly handled by the `Transcriber` class, and authorization by the `Token` class. The call steps are:

1. Acquire the token by calling the `get_token()` method in the `SpeechClient` class.
1. Create the `Callback` instance.
1. Create a `SpeechTranscriber` instance.
1. Call the `set_token()` method of the `SpeechTranscriber` instance to set the parameters.
1. Connect to the server by calling the `start()` method of the `SpeechTranscriber` instance.
1. Call the `send()` method of the `SpeechTranscriber` instance to send audio.
1. Call the `stop()` method of the `SpeechTranscriber` instance to stop the transmission.
1. Disconnect from the server by calling the `close()` method of the `SpeechTranscriber` instance.
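
The connect, send, stop, and close steps can be sketched as a small helper. This is an illustration rather than SDK code: `stream_audio` is a made-up name, and it only assumes an object with the `start()`, `send()`, `stop()`, and `close()` methods listed above, where a negative return value signals an error (as the demo script treats it).

```python
def stream_audio(transcriber, audio_chunks):
    """Run one session: start, send every chunk, stop, and always close.

    `transcriber` is assumed to expose start(), send(), stop() and close().
    Returns the last status code; negative values indicate failure.
    """
    try:
        ret = transcriber.start()
        if ret < 0:
            # Connection failed, nothing to send
            return ret
        for chunk in audio_chunks:
            ret = transcriber.send(chunk)
            if ret < 0:
                # Stop sending as soon as the server rejects a chunk
                break
        transcriber.stop()
        return ret
    finally:
        # Release the connection even if an exception was raised
        transcriber.close()
```

In the demo script, the object passed in would be the one returned by `client.create_transcriber(callback)` after `set_app_id()` and `set_token()` have been called on it.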

#### Parameter description

| **Parameter name** | **Type** | **Description** | **Default value** |
| --- | --- | --- | --- |
| app_id | String | Application ID | required |
| token | String | Access token, obtained through authentication (Auth) | required |
| lang_type | String | Recognition language | required |
| format | String | Audio encoding format | mp3 |
| sample_rate | Integer | Audio sampling rate | 16000 |
| enable_intermediate_result | Boolean | Whether to return intermediate recognition results | false |
| enable_punctuation_prediction | Boolean | Whether to add punctuation in post-processing | false |
| enable_inverse_text_normalization | Boolean | Whether to apply inverse text normalization (ITN) in post-processing | false |
| max_sentence_silence | Integer | Sentence-break detection threshold. If the silence lasts longer than this threshold, the sentence is considered ended. Valid range: 200 to 2000 (ms). | 450 |
| enable_words | Boolean | Whether to return word-level information | false |
| enable_modal_particle_filter | Boolean | Whether to filter modal particles | false |
| hotwords_id | String | Hot word ID | none |
| hotwords_weight | Float | Hot word weight, value range [0.1, 1.0] | 0.4 |
| correction_words_id | String | Forced-replacement lexicon ID. Multiple IDs are supported, separated by a vertical bar \|; `all` means that all forced-replacement lexicon IDs are used. | none |
| forbidden_words_id | String | Sensitive word ID. Multiple IDs are supported, separated by a vertical bar \|; `all` means that all sensitive word IDs are used. | none |
| speaker_id | String | Speaker number. `speaker_id` supports at most 36 characters; any excess is truncated and discarded. If `speaker_id` is not passed in the SpeakerStart event, `speaker_id` in the returned result is empty. The SpeakerStart event triggers a forced sentence break, so send it only once before switching speakers. | none |
| enable_save_log | Boolean | Whether to allow voice data and recognition result logs to be saved and used to improve product and service quality | true |
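
Some of the documented value ranges can be checked on the client before a request is sent. The helper below is a sketch under assumptions: `build_payload` is a hypothetical function, not part of the SDK, and it covers only a few of the parameters in the table. The dictionary keys follow the parameter names above.

```python
def build_payload(lang_type, max_sentence_silence=450, hotwords_weight=0.4,
                  speaker_id=None, forbidden_words_ids=None):
    """Validate a few documented ranges and return a payload dict.

    Hypothetical helper for illustration only.
    """
    if not 200 <= max_sentence_silence <= 2000:
        raise ValueError("max_sentence_silence must be between 200 and 2000 ms")
    if not 0.1 <= hotwords_weight <= 1.0:
        raise ValueError("hotwords_weight must be within [0.1, 1.0]")
    payload = {
        "lang_type": lang_type,
        "max_sentence_silence": max_sentence_silence,
        "hotwords_weight": hotwords_weight,
    }
    if speaker_id is not None:
        # Characters beyond the 36th are discarded anyway, so truncate up front
        payload["speaker_id"] = speaker_id[:36]
    if forbidden_words_ids:
        # Multiple IDs are joined with a vertical bar
        payload["forbidden_words_id"] = "|".join(forbidden_words_ids)
    return payload
```

Rejecting out-of-range values locally gives an immediate error message instead of a failed recognition task.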
#### **Real-time speech recognition example code**

For the full code, see the `speech_python_rec/demo/transcriber_demo.py` file in the SDK.

```python
# -*- coding: utf-8 -*-
import json
import os.path
import time
import threading
import traceback

import speech_rec
from speech_rec.callbacks import SpeechTranscriberCallback
from speech_rec.parameters import DefaultParameters, Parameters

token = None
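expire_time = 7  # Token lifetime in days; the demo refreshes the token one day earlier

# [left-channel sentences, right-channel sentences, "recognition completed" flag]
info_list = [[], [], False]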


class MyCallback(SpeechTranscriberCallback):
    """
    The parameters of the constructor are not required. You can add them as needed
    The name parameter in the example can be used as the audio file name to be recognized for distinguishing in multithreading
    """

    def __init__(self, name='default'):
        self._name = name

    def started(self, message):
        self.print_message(message)

    def result_changed(self, message):
        self.print_message(message)

    def sentence_begin(self, message):
        self.print_message(message)

    def sentence_end(self, message):
        global info_list
        channel = message['header']['user_id']
        begin_time = message['payload']['begin_time']
        end_time = message['payload']['time']
        result = message['payload']['result']
        if channel in ("left", "right"):
            if result:
                # Slot 0 collects left-channel sentences, slot 1 right-channel sentences
                target = info_list[0] if channel == "left" else info_list[1]
                target.append([channel, begin_time, end_time, result])
            self.print_info()
        else:
            print(message)

    def completed(self, message):
        try:
            print(message)
        except Exception as ee:
            print(ee)
            traceback.print_exc()
        global info_list
        info_list[2] = True

    def print_info(self):
        left_list = info_list[0]
        right_list = info_list[1]
        if_end = info_list[2]

        def format_string(data_list, list_name):
            channel, begin_time, end_time, result = data_list[0]
            if list_name == "left_list":
                info_list[0].pop(0)
            else:
                info_list[1].pop(0)

            return f"channel:{channel}\tbegin_time:{begin_time}\tend_time:{end_time}\tresult:{result}"

        if left_list and right_list:
            while True:
                if not left_list and not right_list:
                    break
                if left_list and right_list:
                    left_begin_time = left_list[0][1]
                    left_end_time = left_list[0][2]
                    right_begin_time = right_list[0][1]
                    right_end_time = right_list[0][2]
                    if left_begin_time == right_begin_time and left_end_time > right_end_time:
                        print(format_string(right_list, "right_list"))
                    elif left_begin_time == right_begin_time and left_end_time <= right_end_time:
                        print(format_string(left_list, "left_list"))
                    elif left_begin_time < right_begin_time:
                        print(format_string(left_list, "left_list"))
                    elif left_begin_time >= right_begin_time:
                        print(format_string(right_list, "right_list"))
                if left_list and not right_list:
                    if left_end_time > right_end_time:
                        break
                    else:
                        print(format_string(left_list, "left_list"))
                if not left_list and right_list:
                    if right_end_time > left_end_time:
                        print(format_string(right_list, "right_list"))
                    else:
                        break
        elif if_end:
            while left_list:
                print(format_string(left_list, "left_list"))
            while right_list:
                print(format_string(right_list, "right_list"))

    def print_message(self, message):
        channel = message['header']['user_id']
        if channel == "left" or channel == "right":
            pass
        else:
            print(message)

    def task_failed(self, message):
        print(message)

    def warning_info(self, message):
        print(message)

    def channel_closed(self):
        print('MyCallback.OnTranslationChannelClosed')


def solution(client, app_id, app_secret, audio_path, lang_type, kwargs):
    """
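    Transcribe speech, single thread
    :param client: SpeechClient
    :param app_id: Your app_id
    :param app_secret: Your app_secret
    :param audio_path: Audio path
    :param lang_type: Language type
    :param kwargs: Optional settings: audio_format, field, sample_rate, user_id
    """
    each_audio_format = kwargs.get("audio_format", DefaultParameters.MP3)
    field_ = kwargs.get("field", DefaultParameters.FIELD)
    user_id = kwargs.get("user_id", "default")
    # Take the sample rate from kwargs rather than relying on a module-level global
    sample_rate = kwargs.get("sample_rate", DefaultParameters.SAMPLE_RATE_16K)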
    assert os.path.exists(audio_path), "Audio file path error, please check your audio path."
    if judging_expire_time(app_id, app_secret, expire_time):
        callback = MyCallback(audio_path)
        transcriber = client.create_transcriber(callback)
        transcriber.set_app_id(app_id)
        transcriber.set_token(token)
        # fixme You can customize the configuration according to the official website documentation
        payload = {
            "lang_type": lang_type,
            "format": each_audio_format,
            "field": field_,
            "sample_rate": sample_rate,
            "user_id": user_id
        }
        transcriber._payload.update(**payload)
        try:
            ret = transcriber.start()
            if ret < 0:
                return ret
            with open(audio_path, 'rb') as f:
                audio = f.read(7680)
                cnt = 0
                while audio:
                    ret = transcriber.send(audio)
                    # fixme: To force a sentence break or set the speaker ID yourself, use the code below

                    # Default values; customize as needed
                    # if cnt % 768000 == 0:
                    #     # Mandatory clause setting
                    #     transcriber.set_mandatory_clause(True)
                    #     transcriber._header = transcriber.get_mandatory_clause()
                    #     transcriber.send(json.dumps({Parameters.HEADER: transcriber._header}), False)
                    #     # Set speaker ID
                    #     transcriber.set_speaker_id(speaker_id)
                    #     speaker_id_info = transcriber.get_speaker_id()
                    #     transcriber.send(json.dumps(speaker_id_info), False)
                    #     print("Mandatory and Set speaker:",transcriber._payload)

                    if ret < 0:
                        break
                    cnt += 7680
                    time.sleep(0.24)
                    audio = f.read(7680)
            transcriber.stop()
        except Exception as e:
            print(e)
        finally:
            transcriber.close()
    else:
        print("token expired")


def judging_expire_time(app_id, app_secret, extime):
    global token
    token_file = "SpeechTranscriber_token.txt"
    new_time = time.time()
    if not os.path.exists(token_file):
        client.get_token(app_id, app_secret, token_file)

    with open(token_file, "r", encoding="utf-8") as fr:
        token_info = eval(fr.read())
    old_time = token_info['time']
    token = token_info['token']
    flag = True
    if new_time - old_time > 60 * 60 * 24 * (extime - 1):
        flag, _ = client.get_token(app_id, app_secret, token_file)
        if flag:
            flag = True
            pass
        else:
            for i in range(7):
                flag, _ = client.get_token(app_id, app_secret, token_file)
                if flag is not None:
                    flag = True
                    break
    return flag


def channels_split_solution(audio_path, right_path, left_path, **kwargs):
    client = kwargs.get('client')
    appid = kwargs.get('app_id')
    appsecret = kwargs.get('app_secret')
    langtype = kwargs.get('lang_type')
    remove_audio = kwargs.get('rm_audio', True)
    client.auto_split_audio(audio_path, right_path, left_path)
    thread_list = []
    right_kwargs = kwargs.copy()
    right_kwargs["user_id"] = "right"
    thread_r = threading.Thread(target=solution, args=(client, appid, appsecret, right_path, langtype, right_kwargs))
    thread_list.append(thread_r)
    left_kwargs = kwargs.copy()
    left_kwargs["user_id"] = "left"
    thread_l = threading.Thread(target=solution, args=(client, appid, appsecret, left_path, langtype, left_kwargs))
    thread_list.append(thread_l)
    for thread in thread_list:
        thread.start()
    for thread in thread_list:
        thread.join()
    if remove_audio:
        try:
            os.remove(right_path)
            os.remove(left_path)
        except Exception as ee:
            print(ee)
            traceback.print_exc()


if __name__ == "__main__":
    client = speech_rec.SpeechClient()
    # Set the log level: DEBUG, INFO, WARNING, or ERROR
    client.set_log_level('INFO')
    # Type your app_id and app_secret
    app_id = ""  # your app id
    app_secret = ""  # your app secret
    audio_path = ""  # audio path
    lang_type = ""  # lang type
    field = ""  # field
    sample_rate = 16000  # sample rate [int] 16000 or 8000
    audio_format = ""  # audio format
    assert app_id and app_secret and audio_path and lang_type and field and sample_rate and audio_format, "Please check args"
    channel = client.get_audio_info(audio_path)['channel']
    # fixme This is just a simple example, please modify it according to your needs.
    if channel == 1:
        kwargs = {
            "field": field,
            "sample_rate": sample_rate,
            "audio_format": audio_format,
            "user_id": "",
        }
        solution(client, app_id, app_secret, audio_path, lang_type, kwargs)
    elif channel == 2:
        # Dual channel 8K audio solution
        channels_split_solution(audio_path=audio_path,
                                left_path=f"left.{audio_format}",
                                right_path=f"right.{audio_format}",
                                client=client,
                                app_id=app_id,
                                app_secret=app_secret,
                                lang_type=lang_type,
                                field=field,
                                sample_rate=sample_rate,
                                audio_format=audio_format,
                                )
```
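
The example sends 7680-byte chunks and sleeps 0.24 s between them. For uncompressed 16 kHz, 16-bit, mono PCM these two numbers match, so the audio would be sent at roughly real-time speed. That PCM reading is an assumption made here (the default format is MP3, where byte counts do not map directly to duration), and the helper names below are invented for the illustration.

```python
def chunk_duration_seconds(chunk_bytes, sample_rate=16000,
                           bytes_per_sample=2, channels=1):
    """Duration in seconds of a raw PCM chunk (assumes uncompressed PCM)."""
    return chunk_bytes / (sample_rate * bytes_per_sample * channels)


def chunk_size_bytes(duration_seconds, sample_rate=16000,
                     bytes_per_sample=2, channels=1):
    """Number of raw PCM bytes that cover `duration_seconds` of audio."""
    return int(round(duration_seconds * sample_rate * bytes_per_sample * channels))
```

Under the same assumption, 8 kHz audio sent at the same 0.24 s interval would use chunks half that size.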

### Using the one-sentence recognition demo

`speech_rec/demo/recognizer_demo.py` is the one-sentence recognition demo and can be run directly.
#### Key interface description

One-sentence recognition is mainly handled by the `Recognizer` class, and authorization by the `Token` class. The call steps are:

1. Acquire the token by calling the `get_token()` method in the `SpeechClient` class.
1. Create the `Callback` instance.
1. Create a `SpeechRecognizer` instance.
1. Call the `set_token()` method of the `SpeechRecognizer` instance to set the parameters.
1. Connect to the server by calling the `start()` method of the `SpeechRecognizer` instance.
1. Call the `SpeechRecognizer` instance's `send()` method to send audio.
1. Call the `stop()` method of the `SpeechRecognizer` instance to stop the transmission.
1. Disconnect from the server by calling the `close()` method of the `SpeechRecognizer` instance.

| **Parameter name** | **Type** | **Description** | **Default value** |
| --- | --- | --- | --- |
| app_id | String | Application ID | required |
| token | String | Access token, obtained through authentication (Auth) | required |
| lang_type | String | Recognition language | required |
| format | String | Audio encoding format | mp3 |
| sample_rate | Integer | Audio sampling rate | 16000 |
| enable_intermediate_result | Boolean | Whether to return intermediate recognition results | false |
| enable_punctuation_prediction | Boolean | Whether to add punctuation in post-processing | false |
| enable_inverse_text_normalization | Boolean | Whether to apply inverse text normalization (ITN) in post-processing | false |
| max_sentence_silence | Integer | Sentence-break detection threshold. If the silence lasts longer than this threshold, the sentence is considered ended. Valid range: 200 to 2000 (ms). | 450 |
| enable_words | Boolean | Whether to return word-level information | false |
| enable_modal_particle_filter | Boolean | Whether to filter modal particles | false |
| hotwords_id | String | Hot word ID | none |
| hotwords_weight | Float | Hot word weight, value range [0.1, 1.0] | 0.4 |
| correction_words_id | String | Forced-replacement lexicon ID. Multiple IDs are supported, separated by a vertical bar \|; `all` means that all forced-replacement lexicon IDs are used. | none |
| forbidden_words_id | String | Sensitive word ID. Multiple IDs are supported, separated by a vertical bar \|; `all` means that all sensitive word IDs are used. | none |
| enable_save_log | Boolean | Whether to allow voice data and recognition result logs to be saved and used to improve product and service quality | true |
#### One-sentence recognition example code

For the full code, see the `speech_python_rec/demo/recognizer_demo.py` file in the SDK.

```python
# -*- coding: utf-8 -*-
import os
import time
import threading
import speech_rec
from speech_rec.callbacks import SpeechRecognizerCallback
from speech_rec.parameters import DefaultParameters

token = None
expire_time = 7  # Token lifetime in days; the demo refreshes the token one day earlier


class Callback(SpeechRecognizerCallback):
    """
    The parameters of the constructor are not required. You can add them as needed
    The name parameter in the example can be used as the audio file name to be recognized for distinguishing in multithreading
    """

    def __init__(self, name='SpeechRecognizer'):
        self._name = name

    def started(self, message):
        print('MyCallback.OnRecognitionStarted: %s' % message)

    def result_changed(self, message):
        print('MyCallback.OnRecognitionResultChanged: file: %s, task_id: %s, payload: %s' % (
            self._name, message['header']['task_id'], message['payload']))

    def completed(self, message):
        print('MyCallback.OnRecognitionCompleted: file: %s, task_id:%s, payload:%s' % (
            self._name, message['header']['task_id'], message['payload']))

    def task_failed(self, message):
        print(message)

    def warning_info(self, message):
        print(message)

    def channel_closed(self):
        print('MyCallback.OnRecognitionChannelClosed')

def solution(client, app_id, app_secret, audio_path, lang_type, kwargs):
    """
    Recognize speech, single thread
    :param client: SpeechClient
    :param app_id: Your app_id
    :param app_secret: Your app_secret
    :param audio_path: Audio path
    :param lang_type: Language type
    :param kwargs: Optional settings: sample_rate, audio_format, field
    """
    assert os.path.exists(audio_path), "Audio file path error, please check your audio path."
    sample_rate = kwargs.get("sample_rate", DefaultParameters.SAMPLE_RATE_16K)
    each_audio_format = kwargs.get("audio_format", DefaultParameters.MP3)
    field_ = kwargs.get("field", DefaultParameters.FIELD)

    if judging_expire_time(app_id, app_secret, expire_time):
        callback = Callback(audio_path)
        recognizer = client.create_recognizer(callback)
        recognizer.set_app_id(app_id)
        recognizer.set_token(token)
        # fixme You can customize the configuration according to the official website documentation
        payload = {
            "lang_type": lang_type,
            "format": each_audio_format,
            "field": field_,
            "sample_rate": sample_rate,
        }
        recognizer._payload.update(**payload)
        try:
            ret = recognizer.start()
            if ret < 0:
                return ret
            print('sending audio...')
            with open(audio_path, 'rb') as f:
                # Send the file in 7680-byte chunks, pausing 0.24 s after each one
                audio = f.read(7680)
                while audio:
                    ret = recognizer.send(audio)
                    if ret < 0:
                        break
                    time.sleep(0.24)
                    audio = f.read(7680)
            recognizer.stop()
        except Exception as ee:
            print(f"send error: {ee}")
        finally:
            recognizer.close()
    else:
        print("token expired")


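def judging_expire_time(app_id, app_secret, extime):
    """
    Load the cached token into the global `token`, refreshing it first when it
    is within one day of expiring. Uses the module-level `client`.
    :return: True if a usable token is available, otherwise False
    """
    global token
    import ast  # local import keeps this function self-contained

    token_file = "SpeechRecognizer_token.txt"

    def load_token_info():
        # literal_eval parses the saved dict without executing arbitrary code
        with open(token_file, "r", encoding="utf-8") as fr:
            return ast.literal_eval(fr.read())

    if not os.path.exists(token_file):
        client.get_token(app_id, app_secret, token_file)
    token_info = load_token_info()
    flag = True
    if time.time() - token_info['time'] > 60 * 60 * 24 * (extime - 1):
        flag = False
        for _attempt in range(8):  # first attempt plus up to 7 retries
            ok, _ = client.get_token(app_id, app_secret, token_file)
            if ok:
                flag = True
                # get_token rewrote the file, so pick up the new token
                token_info = load_token_info()
                break
    token = token_info['token']
    return flag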


def channels_split_solution(audio_path, right_path, left_path, **kwargs):
    """
    Split a two-channel audio file into right and left channel files and
    recognize each one in its own thread.
    """
    client = kwargs.get('client')
    appid = kwargs.get('app_id')
    appsecret = kwargs.get('app_secret')
    langtype = kwargs.get('lang_type')
    remove_audio = kwargs.get('rm_audio', True)
    client.auto_split_audio(audio_path, right_path, left_path)
    thread_list = [
        threading.Thread(target=solution, args=(client, appid, appsecret, path, langtype, kwargs))
        for path in (right_path, left_path)
    ]
    for thread in thread_list:
        thread.start()
    for thread in thread_list:
        thread.join()
    # Delete the temporary per-channel files unless rm_audio=False was passed
    if remove_audio:
        os.remove(right_path)
        os.remove(left_path)


if __name__ == "__main__":
    client = speech_rec.SpeechClient()
    # Set the log level: DEBUG, INFO, WARNING, or ERROR
    client.set_log_level('INFO')
    # Type your app_id and app_secret
    app_id = ""  # your app id
    app_secret = ""  # your app secret
    audio_path = ""  # audio path
    lang_type = ""  # lang type
    field = ""  # field
    sample_rate = 16000  # sample rate [int] 16000 or 8000
    audio_format = ""  # audio format
    assert app_id and app_secret and audio_path and lang_type and field and sample_rate and audio_format, "Please check args"
    channel = client.get_audio_info(audio_path)['channel']
    # FIXME: This is only a simple example; modify it to suit your needs.
    multi = False
    process_num = 4
    if channel == 1:
        kwargs = {
            "field": field,
            "sample_rate": sample_rate,
            "audio_format": audio_format
        }
        solution(client, app_id, app_secret, audio_path, lang_type, kwargs)

    elif channel == 2:
        # Dual channel 8K audio solution
        channels_split_solution(audio_path=audio_path,
                                left_path=f"left.{audio_format}",
                                right_path=f"right.{audio_format}",
                                client=client,
                                app_id=app_id,
                                app_secret=app_secret,
                                lang_type=lang_type,
                                field=field,
                                sample_rate=sample_rate,
                                audio_format=audio_format)
```
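
The example above sends the audio in 7680-byte chunks and sleeps 0.24 s after each one. For uncompressed 16-bit mono PCM at 16000 Hz, 7680 bytes is exactly 0.24 s of audio, so the loop streams at roughly real-time speed. The sketch below shows that arithmetic and the same pacing loop in isolation. The helper names `pcm_chunk_bytes`, `iter_chunks`, and `stream_paced` are hypothetical and are not part of the SDK. The duration calculation assumes raw PCM; for compressed formats such as MP3, a fixed byte count does not correspond to a fixed duration.

```python
import io
import time


def pcm_chunk_bytes(sample_rate, chunk_seconds, sample_width=2, channels=1):
    """Return the number of bytes of uncompressed PCM that span chunk_seconds."""
    return int(round(sample_rate * chunk_seconds * sample_width * channels))


def iter_chunks(stream, chunk_size):
    """Yield successive blocks of up to chunk_size bytes until the stream ends."""
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        yield block


def stream_paced(stream, send, chunk_size, chunk_seconds, sleep=time.sleep):
    """Send each block, then wait chunk_seconds.

    Stops early if send() returns a negative value, mirroring the demo loop.
    Returns the number of blocks sent successfully.
    """
    sent = 0
    for block in iter_chunks(stream, chunk_size):
        if send(block) < 0:
            break
        sent += 1
        sleep(chunk_seconds)
    return sent
```

In the demo, `send` would be `recognizer.send` and `stream` the opened audio file. For 8000 Hz PCM, the same 0.24 s interval corresponds to `pcm_chunk_bytes(8000, 0.24)`, which is 3840 bytes.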
