Blog

Thoughts, insights, and updates on AI, Machine Learning, and Technology

OpenAI Whisper Inference Guide

OpenAI Whisper is an open-source automatic speech recognition system developed by OpenAI. It is built on a Transformer encoder-decoder architecture and trained on a large-scale multilingual dataset collected from the web. Whisper can transcribe speech in multiple languages, identify the spoken language, and translate speech directly into English.

Inference with Whisper is straightforward. After installing the package, a pretrained model can be loaded with a single function call. The library handles audio preprocessing automatically, including resampling to 16 kHz and conversion to log-Mel spectrogram features. Once the audio is processed, the decoder generates text autoregressively, producing a transcription along with metadata such as the detected language and segment timestamps.

Whisper comes in multiple model sizes, so users can trade accuracy against computational cost. Smaller models suit low-latency or edge environments, while larger models are more robust to accents and background noise. GPU acceleration through PyTorch significantly improves inference speed, especially for the medium and large checkpoints.
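The workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes the `openai-whisper` package, PyTorch, and `ffmpeg` are installed. The helper names `transcribe_file` and `format_segments`, and the default `"base"` checkpoint, are choices made for this example and are not part of the library.

```python
from typing import Any


def transcribe_file(audio_path: str, model_name: str = "base") -> dict[str, Any]:
    """Transcribe one audio file with a pretrained Whisper checkpoint.

    Hypothetical helper for illustration; model_name is any released
    checkpoint name such as "tiny", "base", "small", "medium", or "large".
    """
    # Imported lazily so format_segments() below works without the packages.
    import torch
    import whisper  # installed via: pip install openai-whisper

    # Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = whisper.load_model(model_name, device=device)

    # transcribe() loads the audio, resamples it to 16 kHz, computes log-Mel
    # features, and decodes text autoregressively.
    return model.transcribe(audio_path)


def format_segments(result: dict[str, Any]) -> list[str]:
    """Render the timestamped segments of a transcription result as text lines."""
    lines = []
    for seg in result.get("segments", []):
        lines.append(f"[{seg['start']:.2f} -> {seg['end']:.2f}] {seg['text'].strip()}")
    return lines
```

Calling `transcribe_file("audio.mp3")` (a placeholder path) returns a dictionary whose `"text"` key holds the full transcription, `"language"` the detected language code, and `"segments"` the timestamped pieces that `format_segments` prints. Swapping `"base"` for a larger checkpoint trades speed for robustness, as described above.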

Ayush Kumar Bar
#ai #whisper #machinelearning #openai #asr #ml
All About Transformers by Hugging Face

Transformers is the model-definition framework for state-of-the-art machine learning models across text, computer vision, audio, video, and multimodal tasks, for both inference and training.

Ayush Kumar Bar
#AI #ML #Transformers #VLLM #ASR #LLM #Qlora

Test Post 1

A simple test post 1

Ayush Kumar Bar
Welcome to My Blog

This is a sample blog post demonstrating the MDX-based CMS.

Ayush Kumar Bar
#welcome #blog #cms