OpenAI's Whisper API for Transcription and Translation

Republished by Plato


Illustration by Author | Source: Flaticon

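Have you piled up a lot of recordings but lack the energy to start listening to and transcribing them? When I was a student, I remember struggling every day through hours of recorded lessons, and most of my time went into transcription. On top of that, the recordings were not in my native language, so I had to paste every sentence into Google Translate to convert it into Italian.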

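Now, manual transcription and translation are just a memory. OpenAI, the research company famous for ChatGPT, has launched the Whisper API for speech-to-text conversion! With a few lines of Python code, you can call this powerful speech recognition model, take the chore off your mind, and focus on other activities, such as practicing with data science projects and improving your portfolio. Let's get started!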

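Whisper is a neural-network model developed by OpenAI for speech-to-text tasks. It is a Transformer-based encoder-decoder model (separate from the GPT-3 family) and has become very popular for its ability to transcribe audio into text with high accuracy.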

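It is not limited to English: its capabilities extend to more than 50 languages. If you want to know whether your language is included, check the list of supported languages in the OpenAI documentation. In addition, it can translate audio from any of the supported languages into English.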

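As with other OpenAI products, there is an API for accessing these speech recognition services, which lets developers and data scientists integrate Whisper into their platforms and applications.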

GIF by Author

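Before going further, you need to complete a few steps to get access to the Whisper API. First, log in to the OpenAI API website. If you don't have an account yet, you will need to create one. Once logged in, click on your username and select the "View API keys" option. Then click the "Create new API key" button and copy the newly created API key into your Python code.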
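Pasting the key directly into a script makes it easy to leak through version control. A minimal sketch of an alternative, assuming you have exported the key in an environment variable named `OPENAI_API_KEY` (the helper name `load_api_key` is hypothetical and not part of the openai library):

```python
import os


def load_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable."""
    # Treat a missing or blank variable the same way: fail early and loudly.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

With this in place, `API_KEY = load_api_key()` could stand in for a hard-coded key string.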

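First, let's download a YouTube video by Kevin Stratvert, a very popular YouTuber who helps students all over the world master technology and improve their skills by learning tools such as Power BI, video editing, and AI products. As an example, suppose we want to transcribe the video "3 Mind-blowing AI Tools".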

We can download this video directly with the pytube library. To install it, run the following commands:

pip install pytube
# the openai.Audio calls used in this tutorial exist only in openai versions before 1.0
pip install "openai<1.0"

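We also install the openai library, since it is used later in this tutorial. Once all the Python libraries are installed, we just pass the video's URL to the YouTube object. After that, we get the highest-resolution video stream and download the video.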

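from pytube import YouTube

video_url = "https://www.youtube.com/watch?v=v6OB80Vt1Dk&t=1s&ab_channel=KevinStratvert"
yt = YouTube(video_url)
# pick the highest-resolution stream that contains both video and audio
stream = yt.streams.get_highest_resolution()
# save the file under the path that is opened later in the tutorial
stream.download(output_path='audio', filename='5_tools_audio.mp4')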
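One caveat: the hosted Whisper endpoint limits the size of uploaded files (25 MB at the time of writing, as far as I know), and a highest-resolution video can easily exceed that. A small sketch of a pre-flight check follows. `fits_upload_limit` is a hypothetical helper, and the default limit is an assumption that should be verified against OpenAI's current documentation.

```python
import os

# Assumed upload limit in bytes; check the current API documentation.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def fits_upload_limit(path, max_bytes=MAX_UPLOAD_BYTES):
    """Return True if the file at `path` is no larger than `max_bytes`."""
    return os.path.getsize(path) <= max_bytes
```

If the check fails, downloading an audio-only stream instead (for example with pytube's `yt.streams.get_audio_only()`) usually produces a much smaller file.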

Once the file is downloaded, the fun part begins!

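import openai

API_KEY = 'your_api_key'  # replace with your own key
model_id = 'whisper-1'    # name of the Whisper model exposed by the API
language = "en"           # language spoken in the audio
audio_file_path = 'audio/5_tools_audio.mp4'
audio_file = open(audio_file_path, 'rb')  # the API expects a binary file handle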

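With the parameters set and the audio file opened, we can transcribe the audio; the resulting text can then be printed or saved to a .txt file.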

response = openai.Audio.transcribe(
    api_key=API_KEY,
    model=model_id,
    file=audio_file,
    language=language
)
transcription_text = response.text
print(transcription_text)
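The snippet above only prints the text. To keep it on disk, a minimal sketch could look like this; `save_transcript` and the output file name in the usage line are illustrative choices, not part of the openai library.

```python
from pathlib import Path


def save_transcript(text, out_path):
    """Write `text` to `out_path` as UTF-8, creating parent folders if needed."""
    out_file = Path(out_path)
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(text, encoding="utf-8")
    return out_file
```

For example, `save_transcript(transcription_text, "transcripts/5_tools.txt")` would store the transcription next to the script.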

Output:

Hi everyone, Kevin here. Today, we're going to look at five different tools that leverage artificial intelligence in some truly incredible ways. Here for instance, I can change my voice in real time. I can also highlight an area of a photo and I can make that just automatically disappear. Uh, where'd my son go? I can also give the computer instructions, like, I don't know, write a song for the Kevin cookie company....

As expected, the output is very accurate. Even the punctuation is so precise that I was impressed!

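This time, we will translate audio from Italian into English. As before, we download the audio file. In my example, I use a YouTube video by Piero Savastano, a popular Italian YouTuber who teaches machine learning in a very simple and fun way. You just need to copy the previous code and change the URL, making sure the downloaded file ends up at the path opened below. Once it is downloaded, we open the audio file as before: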

audio_file_path = 'audio/ml_in_python.mp4'
audio_file = open(audio_file_path, 'rb')

Then, we can generate the English translation of the Italian audio.

response = openai.Audio.translate(
    api_key=API_KEY,
    model=model_id,
    file=audio_file
)
translation_text = response.text
print(translation_text)
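Both API calls in this tutorial leave the file handle open. A hedged sketch of a small wrapper that closes it automatically: `run_whisper` is a hypothetical helper, and `api_call` is assumed to be either `openai.Audio.transcribe` or `openai.Audio.translate` from the pre-1.0 openai package.

```python
def run_whisper(api_call, audio_path, **params):
    """Open `audio_path`, pass the handle to `api_call`, and return the text.

    `api_call` is any callable that accepts a `file` keyword plus extra
    parameters and returns an object with a `.text` attribute.
    """
    # The context manager closes the file even if the request raises.
    with open(audio_path, "rb") as audio_file:
        response = api_call(file=audio_file, **params)
    return response.text
```

A translation request would then read `run_whisper(openai.Audio.translate, 'audio/ml_in_python.mp4', api_key=API_KEY, model=model_id)`.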

Output:

We also see some graphs in a statistical style, so we should also understand how to read them. One is the box plot, which allows to see the distribution in terms of median, first quarter and third quarter. Now I'm going to tell you what it means. We always take the data from the data frame. X is the season. On Y we put the count of the bikes that are rented. And then I want to distinguish these box plots based on whether it is a holiday day or not. This graph comes out. How do you read this? Here on the X there is the season, coded in numerical terms. In blue we have the non-holiday days, in orange the holidays. And here is the count of the bikes. What are these rectangles? Take this box here. I'm turning it around with the mouse....

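That's it! I hope this tutorial helps you get started with the Whisper API. In this case study it was applied to YouTube videos, but you can also try podcasts, Zoom calls, and conferences. I found the output of both transcription and translation very impressive! This AI tool can surely help a lot of people right now. The only limitation is that it can translate into English text only, not the other way around, but I am sure OpenAI will provide that soon. Thanks for reading! Have a nice day!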

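Eugenia Anello is currently a research fellow at the Department of Information Engineering of the University of Padova, Italy. Her research project focuses on continual learning combined with anomaly detection.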