Develop With OCI Real-Time Speech Transcription – DZone

Speak in your natural language, ask questions about your data, and have the answers returned to you in your natural language as well: that is the objective. It’s what I’ll show in this quick blog, and, as always, I provide full src repos for it as well. I’ll leave the use cases up to you from there.

You can learn more about these Oracle Database features here for the free cloud version and here for the free container/image version. Also, you can check out the Develop with Oracle AI and Database Services: Gen, Vision, Speech, Language, and OML workshop, which explains how to create this application and numerous other examples, as well as the GitHub repos that contain all the src code.

Now, let’s get into it. First, I’ll show the setup for the Select AI database side (which, in turn, calls the Gen AI service), then the OCI Real-time Speech AI Transcription service, and finally the front-end Python app that brings it all together.

Oracle Database NL2SQL/Select AI (With Gen AI)

While Oracle Database 23ai contains a number of AI features, such as vector search, RAG, and Spatial AI, NL2SQL/Select AI was introduced in version 19c.

We have a stateless Python application, so we’ll be making calls to:

DBMS_CLOUD_AI.GENERATE(
            prompt       => :prompt,
            profile_name => :profile_name,
            action       => :action)

Let’s look at each of these three arguments.

  • The prompt is the natural language string, starting with “select ai.”
  • The profile_name is the name of the AI profile created in the database for OCI Generative AI (or whichever AI service is being used), with the credential information and, optionally, an object_list with metadata about the data. Oracle Autonomous Database supports models from OCI Generative AI, Azure OpenAI, OpenAI, and Cohere. In our sample app, we use the Llama 3 model provided by OCI Generative AI. Here is example code to create_profile:
-- Run as ADMIN: allow the MOVIESTREAM user to use the OCI resource principal credential
BEGIN
    dbms_cloud_admin.enable_resource_principal(username => 'MOVIESTREAM');
END;
/

-- Run as MOVIESTREAM: create an AI profile for OCI Generative AI and
-- limit the metadata Select AI uses to the tables in object_list
BEGIN
    dbms_cloud_ai.create_profile(
        profile_name => 'genai',
        attributes   =>
            '{"provider": "oci",
              "credential_name": "OCI$RESOURCE_PRINCIPAL",
              "comments": "true",
              "object_list": [
                  {"owner": "MOVIESTREAM", "name": "GENRE"},
                  {"owner": "MOVIESTREAM", "name": "CUSTOMER"},
                  {"owner": "MOVIESTREAM", "name": "PIZZA_SHOP"},
                  {"owner": "MOVIESTREAM", "name": "STREAMS"},
                  {"owner": "MOVIESTREAM", "name": "MOVIES"},
                  {"owner": "MOVIESTREAM", "name": "ACTORS"}
              ]
             }'
    );
END;
/
  • Finally, the action selects the kind of interaction and the type/format of the answer returned to you from Oracle Database’s Select AI feature. In our sample app, we use narrate; however, we could use others, including:
    • narrate returns the answer as a narration in natural language.
    • chat returns the answer as a chat exchange in natural language.
    • showsql returns the raw SQL for the answer/query.
    • runsql generates the SQL, runs it, and returns the raw query results.
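Because the app is stateless, every question becomes one SELECT ... FROM dual with the three arguments above. The sketch below binds all three, as in the DBMS_CLOUD_AI.GENERATE call shown earlier, and rejects any action other than the four listed here before it reaches the database. build_select_ai_query is a hypothetical helper (not part of any Oracle API), and it assumes python-oracledb’s named bind variables; Select AI itself may accept further actions.

```python
# Sketch: build a stateless Select AI call with all three arguments bound.
# build_select_ai_query is a hypothetical helper, not part of any Oracle API.

SELECT_AI_SQL = """SELECT DBMS_CLOUD_AI.GENERATE(
            prompt       => :prompt,
            profile_name => :profile_name,
            action       => :action)
        FROM dual"""

# The four actions described in this post; Select AI may accept others.
SUPPORTED_ACTIONS = ("narrate", "chat", "showsql", "runsql")


def build_select_ai_query(prompt, profile_name="genai", action="narrate"):
    """Return (sql, binds) suitable for cursor.execute(sql, binds)."""
    action = action.strip().lower()
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported Select AI action: {action!r}")
    if not prompt or not prompt.strip():
        raise ValueError("prompt must not be empty")
    binds = {"prompt": prompt.strip(), "profile_name": profile_name, "action": action}
    return SELECT_AI_SQL, binds
```

With a python-oracledb connection, the result would be used as sql, binds = build_select_ai_query("select ai list the movie genres", "genai", "showsql") followed by cursor.execute(sql, binds). Binding the action as well as the prompt keeps the SQL text constant, so switching between narrate and showsql does not require building a new statement string.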

OCI Real-Time Speech Transcription

OCI Real-time Speech Transcription is expected to be released within the month and includes Whisper model multilingual support with diarization capabilities.

Using this service simply requires that certain policies be created to provide access for a given user/compartment/group/tenancy. These can be specified at various levels and would typically be more restrictive than the following, but this list shows the resources needed.

allow any-user to manage ai-service-speech-family in tenancy
allow any-user to manage object-family in tenancy
allow any-user to read tag-namespaces in tenancy
allow any-user to use ons-family in tenancy
allow any-user to manage cloudevents-rules in tenancy
allow any-user to use virtual-network-family in tenancy
allow any-user to manage function-family in tenancy
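As a sketch of what a more restrictive version can look like, the hypothetical helper below keeps the same verb/resource-type pairs but grants them to a single group in a single compartment instead of any-user across the tenancy. The group and compartment names are placeholders, and whether each resource type (tag-namespaces, for example) is best granted at compartment or tenancy level is an assumption worth checking against the OCI IAM policy reference.

```python
# Verb/resource-type pairs taken from the tenancy-wide policies above.
SPEECH_APP_PERMISSIONS = [
    ("manage", "ai-service-speech-family"),
    ("manage", "object-family"),
    ("read", "tag-namespaces"),
    ("use", "ons-family"),
    ("manage", "cloudevents-rules"),
    ("use", "virtual-network-family"),
    ("manage", "function-family"),
]


def scoped_policies(group, compartment, permissions=SPEECH_APP_PERMISSIONS):
    """Return one policy statement per permission, limited to a group and compartment."""
    return [
        f"allow group {group} to {verb} {resource} in compartment {compartment}"
        for verb, resource in permissions
    ]


if __name__ == "__main__":
    # Placeholder names (hypothetical); substitute your own group and compartment.
    for statement in scoped_policies("SpeechAppUsers", "speech-demo"):
        print(statement)
```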

The options for accessing the service from an external client are essentially the same as for accessing any OCI/cloud service. In this case, we use an OCI config file and generate a security_token using the following:

oci session authenticate
oci iam region list --config-file /Users/YOURHOMEDIR/.oci/config --profile MYSPEECHAIPROFILE --auth security_token

From there, it’s just a matter of using the preferred SDK client libraries to call the speech service. In our case, we’re using Python.

The Python App

Here is the output of our application, where we can see:

  • A printout of the words (natural language) spoken into the microphone and transcribed by the real-time transcription service.
  • The triggering of a Select AI command, with the “narrate” action, in response to the user saying “select ai.”
  • The results of the call to the Oracle Database Select AI function, returned in natural language.

Let’s take the application step by step.

First, we see the Python imports:

  • asyncio for the event processing loop
  • getpass to get the database and wallet/ewallet.pem passwords from the application prompt
  • pyaudio for processing microphone events/sound
  • oracledb thin driver for accessing the Oracle database and making Select AI calls
  • oci SDK core and speech libraries for real-time speech transcription calls
import asyncio
import getpass

import pyaudio
import oracledb
import oci
from oci.config import from_file
from oci.auth.signers.security_token_signer import SecurityTokenSigner
from oci.ai_speech_realtime import (
    RealtimeClient,
    RealtimeClientListener,
    RealtimeParameters,
)

Then we see the main loop, where sound from the microphone is fed to the OCI real-time speech transcription API client and on to the cloud service via WebSocket. The client is created by specifying the OCI config mentioned earlier, along with the URL of the speech service and the compartment ID.

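# Load the OCI config profile used with `oci session authenticate`
config = from_file("~/.oci/config", "MYSPEECHAIPROFILE")


def authenticator():
    # Build a request signer from the session token and key in the config profile
    with open(config["security_token_file"], "r") as f:
        token = f.read()
    private_key = oci.signer.load_private_key_from_file(config["key_file"])
    return SecurityTokenSigner(token, private_key)


def message_callback(message):
    print(f"Received message: {message}")


# Transcription parameters: US English, generic domain model
realtime_speech_parameters: RealtimeParameters = RealtimeParameters()
realtime_speech_parameters.language_code = "en-US"
realtime_speech_parameters.model_domain = (
    realtime_speech_parameters.MODEL_DOMAIN_GENERIC
)
realtime_speech_parameters.partial_silence_threshold_in_ms = 0
realtime_speech_parameters.final_silence_threshold_in_ms = 2000
realtime_speech_parameters.should_ignore_invalid_customizations = False
realtime_speech_parameters.stabilize_partial_results = (
    realtime_speech_parameters.STABILIZE_PARTIAL_RESULTS_NONE
)
realtime_speech_url = "wss://realtime.aiservice.us-phoenix-1.oci.oraclecloud.com"
# SpeechListener, send_audio(), check_idle() and the PyAudio `stream` are
# defined elsewhere in the full application source
client = RealtimeClient(
    config=config,
    realtime_speech_parameters=realtime_speech_parameters,
    listener=SpeechListener(),
    service_endpoint=realtime_speech_url,
    signer=authenticator(),
    compartment_id="ocid1.compartment.oc1..MYcompartmentID",
)
loop = asyncio.get_event_loop()
# Stream microphone audio and watch for idle periods concurrently
loop.create_task(send_audio(client))
loop.create_task(check_idle())
# Open the WebSocket connection and run until it closes
loop.run_until_complete(client.connect())
if stream.is_active():
    stream.close()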
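The send_audio task started in the main loop isn’t shown in the post. One way such a task can work, sketched under assumptions: the blocking microphone read (for example, a wrapper around PyAudio’s stream.read(1024, exception_on_overflow=False)) is moved off the event loop with asyncio.to_thread (Python 3.9+) so the WebSocket traffic keeps flowing, and each chunk is handed to an async send function. pump_audio, read_chunk, and send_chunk are hypothetical names, and the real speech client’s send method may differ.

```python
import asyncio


async def pump_audio(read_chunk, send_chunk, stop):
    """Forward audio chunks from a blocking reader to an async sender.

    read_chunk: blocking callable returning bytes; b"" means the source ended.
    send_chunk: coroutine function that delivers one chunk to the speech client.
    stop: asyncio.Event that ends the loop when set.
    Returns the number of chunks sent.
    """
    sent = 0
    while not stop.is_set():
        # Run the blocking read in a worker thread so the event loop
        # (and the WebSocket connection) keeps running meanwhile.
        chunk = await asyncio.to_thread(read_chunk)
        if not chunk:
            break
        await send_chunk(chunk)
        sent += 1
    return sent
```

Keeping the reader and sender as parameters means the same loop can be exercised with a canned list of byte chunks instead of a live microphone.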

If the transcribed speech contains “select ai”, the application waits for 2 seconds and, if there is no further speech, takes the command from “select ai” on and sends it to the database server using the Oracle Python driver. The following is the code for creating the connection and executing the command using DBMS_CLOUD_AI.GENERATE (prompt, profile_name, action), described earlier.

pw = getpass.getpass("Enter database user password:")
# Use this when making a connection with a wallet (thin mode reads ewallet.pem,
# which is protected by the wallet password)
connection = oracledb.connect(
    user="moviestream",
    password=pw,
    dsn="selectaidb_high",
    config_dir="/Users/pparkins/Downloads/Wallet_SelectAIDB",
    wallet_location="/Users/pparkins/Downloads/Wallet_SelectAIDB",
    wallet_password=getpass.getpass("Enter wallet password:"),
)

# Transcribed text accumulated by the speech listener
cummulativeResult = ""


def executeSelectAI():
    global cummulativeResult
    print(f"executeSelectAI called cumulative result: {cummulativeResult}")
    # for example prompt => 'select ai I am looking for the top 5 selling movies for the latest month please',
    # profile_name is the AI profile created earlier with dbms_cloud_ai.create_profile
    query = """SELECT DBMS_CLOUD_AI.GENERATE(
                prompt       => :prompt,
                profile_name => 'genai',
                action       => 'narrate')
            FROM dual"""
    with connection.cursor() as cursor:
        cursor.execute(query, prompt=cummulativeResult)
        result = cursor.fetchone()
        # GENERATE returns a CLOB, which the driver fetches as a LOB object
        if result and isinstance(result[0], oracledb.LOB):
            text_result = result[0].read()
            print(text_result)
        else:
            print(result)
    # Reset cummulativeResult after execution
    cummulativeResult = ""
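The “wait 2 seconds, then run everything from select ai on” behavior can be kept separate from the audio and database code. The sketch below is an assumption about how that logic might be structured, not the author’s check_idle implementation: extract_select_ai_command and SelectAITrigger are hypothetical names, and the clock is injectable so the timing can be tested without sleeping. A check_idle-style loop could call poll() periodically and hand any returned command to the database call (for example, by setting cummulativeResult and calling executeSelectAI()).

```python
import re
import time

# "select ai" as whole words, any case, any whitespace between them.
_TRIGGER = re.compile(r"\bselect\s+ai\b", re.IGNORECASE)


def extract_select_ai_command(transcript):
    """Return the transcript from the first "select ai" onward, or None."""
    match = _TRIGGER.search(transcript)
    if match is None:
        return None
    return transcript[match.start():].strip()


class SelectAITrigger:
    """Collect transcribed text and release a Select AI command after a pause."""

    def __init__(self, idle_seconds=2.0, clock=time.monotonic):
        self.idle_seconds = idle_seconds
        self.clock = clock
        self.buffer = ""
        self.last_speech = None  # time of the most recent transcription

    def on_transcript(self, text):
        # Called for each transcription result from the speech listener.
        self.buffer = f"{self.buffer} {text}".strip()
        self.last_speech = self.clock()

    def poll(self):
        """Return a command once speech has been idle long enough, else None.

        After an idle period the buffer is cleared whether or not it contained
        "select ai", so stale speech is not carried into the next command.
        """
        if self.last_speech is None:
            return None
        if self.clock() - self.last_speech < self.idle_seconds:
            return None
        command = extract_select_ai_command(self.buffer)
        self.buffer = ""
        self.last_speech = None
        return command
```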

Video

A walkthrough of this content can also be viewed here:

Concluding Notes

The next logical step, of course, is to add text-to-speech (TTS) functionality for the answer, and OCI has a new service for that as well. I’ll post an updated example that includes this in the near future.

Thanks for reading, and please don’t hesitate to contact me with any questions or feedback you may have. I’d love to hear from you.
