Real-time implementation of speaker recognition technology involves multiple application-specific tradeoffs such as cost, performance, robustness, enrollment procedures, time to train, adaptability, and response time [2]. As the size of databases for real-world recognition tasks continues to increase, large-scale speaker recognition systems pose challenges such as long training times, large memory requirements, and poor response times [3]. While accuracy is always the first consideration, efficient recognition and adaptability are also significant in many real-world speaker recognition systems under large-scale data conditions. This motivates the study of new methods for the various stages of a typical speaker recognition system. The typical speaker recognition process consists of two phases: training (also known as enrollment) and identification [4]. The training phase sequentially collects speaker-specific information from the speech signal to develop speaker models. The collection of these models in turn constitutes the speaker database for the subsequent identification phase. During the identification phase, the model of an unknown speaker is compared against the existing database to produce the recognition result. Both phases include feature extraction, which transforms the raw speech signal into a compact but effective representation that is more stable and discriminative than the original signal. A typical speaker recognition system includes the following steps: feature extraction, dimensionality reduction, and classification [3]. In this work, in view of the development of an effective large-scale recognition system, the state of the art has met... half of the article... experiments conducted by combining different dimensionality reduction techniques which are chosen .
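As a rough illustration of the two-phase process described above, the following minimal sketch enrolls two speakers and then identifies an unknown one. It rests on assumptions that are not part of this work: the feature vectors are synthetic Gaussian data rather than features extracted from speech, ordinary PCA stands in for the dimensionality reduction step, and a nearest-centroid rule stands in for the classifier. All function and variable names are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit an ordinary PCA projection on the rows of X (a stand-in reducer)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, components):
    """Map feature vectors into the reduced space."""
    return (X - mean) @ components.T

def enroll(features_by_speaker, n_components):
    """Training (enrollment) phase: build one centroid model per speaker."""
    all_frames = np.vstack(list(features_by_speaker.values()))
    mean, comps = fit_pca(all_frames, n_components)
    models = {
        spk: project(frames, mean, comps).mean(axis=0)
        for spk, frames in features_by_speaker.items()
    }
    return mean, comps, models

def identify(frames, mean, comps, models):
    """Identification phase: return the enrolled speaker with the closest model."""
    query = project(frames, mean, comps).mean(axis=0)
    return min(models, key=lambda spk: np.linalg.norm(models[spk] - query))

# Synthetic 20-dimensional feature vectors (assumption: real features
# would come from a speech front end, which is not modeled here).
rng = np.random.default_rng(0)
train = {
    "spk_a": rng.normal(loc=0.0, scale=1.0, size=(200, 20)),
    "spk_b": rng.normal(loc=3.0, scale=1.0, size=(200, 20)),
}
mean, comps, models = enroll(train, n_components=5)

# Unknown utterance drawn from the same distribution as spk_b.
test_frames = rng.normal(loc=3.0, scale=1.0, size=(50, 20))
predicted = identify(test_frames, mean, comps, models)
print(predicted)
```

The speaker database here is simply the `models` dictionary. In a large-scale setting, the reducer and the classifier are the components whose training time and memory footprint dominate, which is why they are written as separate, replaceable functions.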
While the proposed algorithms are not strictly new, the approaches are very efficient for large datasets in classification tasks such as pattern and speech recognition. This work presents original contributions of interest to the field of large-scale speaker recognition systems. The following section provides a brief description of feature extraction from excitation source information. Section III describes the computation of the MPCA, PFA, and MLFA dimensionality reduction techniques. Section IV briefly outlines four SVM training algorithms suitable for large-scale data. In Section V, the experimental results are summarized and the performance of the methods is compared. The last section presents the conclusions and proposes directions for future work.