Saturday, May 9, 2020

C++ Multiple Implementation Design Pattern

It is common to have multiple implementations of a given interface. In the STL, for example, you can build a queue on top of a deque or a list (std::vector does not qualify, because std::queue needs pop_front()), e.g.,

std::queue<int, std::deque<int>> queue_using_deque;
std::queue<int, std::list<int>> queue_using_list;

Here, the container is provided as a template argument and therefore must be fixed at compile time. What if I want to choose the implementation at run time instead?

In this post, I want to show the closest solution I know of to the problem above. Consider the following scenario. I want to build a queue that uses some container APIs. The container can be implemented either with an array or with a list, and I want to be able to decide which one at run time.

The design pattern takes four steps. First, you declare the container APIs in a purely abstract class, i.e., you define the container interface. Then, you implement the APIs twice: once using an array and once using a list. Next, you define a concrete container class which takes a pointer to either one of the implementations. Finally, you use this concrete container class to implement a queue.


Cool, right? Happy hacking!

Friday, March 6, 2020

Docker Cheat Sheet

Concepts:
- container: a running instance of an image

View images
$ docker images

Delete an image
$ docker image rm IMAGE_ID

View running containers
$ docker container ls

View all containers
$ docker container ls --all

Delete a container
$ docker container rm CONTAINER_NAME [--force]

Delete all stopped containers
$ docker container prune

Run a new Ubuntu container and start a terminal
$ docker run -it ubuntu

Run bash in a running container
$ docker exec -it CONTAINER_NAME bash

Monday, February 17, 2020

Kaldi series 1 - setup/debug with CLion

In this series of posts, I will describe how to run an automatic speech recognition (ASR) system with Kaldi. For the best debugging experience, I will give step-by-step run/debug instructions for CLion, the best C++ IDE on non-Windows systems. FYI, I am running these commands on macOS 10.15 (Catalina), but the steps should be similar on Linux systems.

In this first post of the series, we will simply set up the Kaldi project in CLion for running and debugging.
$ git clone https://github.com/kaldi-asr/kaldi.git && cd kaldi

Kaldi recently added CMake support (thank you so much!), which makes it much easier for CLion to load the project. Run CLion and open the Kaldi directory. Run Build --> Build All in Debug. This process takes quite some time, so please be patient.

Unfortunately, there are other things to take care of. The following commands will take some time to run, so be patient.
$ cd tools && make -j4
$ extras/install_irstlm.sh && cd ..

Once you are done, let's run a pre-trained model to see if it works fine.
$ cd egs/apiai_decode/s5
$ ./download-model.sh

We also need to make the scripts use the CMake-built binaries. Edit path.sh as below:
export KALDI_ROOT=`pwd`/../../..
export KALDI_CMAKE_ROOT=`pwd`/../../../cmake-build-debug
[ -f $KALDI_ROOT/tools/env.sh ] && . $KALDI_ROOT/tools/env.sh
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH
[ ! -f $KALDI_ROOT/tools/config/common_path.sh ] && echo >&2 "The standard file $KALDI_ROOT/tools/config/common_path.sh is not present -> Exit!" && exit 1
. $KALDI_ROOT/tools/config/common_path.sh
export LC_ALL=C

Lastly, edit tools/config/common_path.sh, replacing KALDI_ROOT with KALDI_CMAKE_ROOT as follows:
# we assume KALDI_CMAKE_ROOT is already defined
[ -z "$KALDI_CMAKE_ROOT" ] && echo >&2 "The variable KALDI_CMAKE_ROOT must be already defined" && exit 1
# The formatting of the path export command is intentionally weird, because
# this allows for easy diff'ing

export PATH=\
${KALDI_CMAKE_ROOT}/src/bin:\
${KALDI_CMAKE_ROOT}/src/chainbin:\
${KALDI_CMAKE_ROOT}/src/featbin:\
${KALDI_CMAKE_ROOT}/src/fgmmbin:\
${KALDI_CMAKE_ROOT}/src/fstbin:\
${KALDI_CMAKE_ROOT}/src/gmmbin:\
${KALDI_CMAKE_ROOT}/src/ivectorbin:\
${KALDI_CMAKE_ROOT}/src/kwsbin:\
${KALDI_CMAKE_ROOT}/src/latbin:\
${KALDI_CMAKE_ROOT}/src/lmbin:\
${KALDI_CMAKE_ROOT}/src/nnet2bin:\
${KALDI_CMAKE_ROOT}/src/nnet3bin:\
${KALDI_CMAKE_ROOT}/src/nnetbin:\
${KALDI_CMAKE_ROOT}/src/online2bin:\
${KALDI_CMAKE_ROOT}/src/onlinebin:\
${KALDI_CMAKE_ROOT}/src/rnnlmbin:\
${KALDI_CMAKE_ROOT}/src/sgmm2bin:\
${KALDI_CMAKE_ROOT}/src/sgmmbin:\
${KALDI_CMAKE_ROOT}/src/tfrnnlmbin:\
${KALDI_CMAKE_ROOT}/src/cudadecoderbin:\
$PATH

The tedious Kaldi setup is finally done. Now you need an audio file for testing, so simply record a wav file of your voice, saying whatever you want to be transcribed (in English). Make sure to use a 16 kHz sampling rate with 16-bit encoding. Save this file as test.wav. Let's run it!

$ ./recognize-wav.sh /PATH/TO/YOUR/WAV/test.wav

You should see its transcript in the log. Now let's debug one of the steps, decoding for example, with CLion. As you can see from the log, the main decoding command is as follows:
nnet3-latgen-faster --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/api.ai-model/words.txt exp/api.ai-model/final.mdl exp/api.ai-model//HCLG.fst 'ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/test-corpus/utt2spk scp:data/test-corpus/cmvn.scp scp:data/test-corpus/feats.scp ark:- |' 'ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:-  >exp/lat.1'

This big command consists of multiple programs piped together in a convoluted way, so let's take them one by one. The main binary nnet3-latgen-faster takes 4 arguments, as you can see from the usage message it prints when run without arguments:
$ nnet3-latgen-faster

By the way, you will likely get a "command not found" error, so let's put the binaries on PATH first:
$ export KALDI_CMAKE_ROOT=$(pwd)/../../../cmake-build-debug
$ source ../../../tools/config/common_path.sh

Now, try again
$ nnet3-latgen-faster

The first two arguments are provided from the files, i.e. exp/api.ai-model/final.mdl and exp/api.ai-model/HCLG.fst.

The third argument is the features, which are read from the piped output of the command
apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/test-corpus/utt2spk scp:data/test-corpus/cmvn.scp scp:data/test-corpus/feats.scp ark:-

We will create this features file separately, by running
$ apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/test-corpus/utt2spk scp:data/test-corpus/cmvn.scp scp:data/test-corpus/feats.scp ark:features.feat

You should see that the features.feat file has been created. We can now run the decoding with this file as an input:
nnet3-latgen-faster --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/api.ai-model/words.txt exp/api.ai-model/final.mdl exp/api.ai-model/HCLG.fst ark:features.feat ark:lat.1

Here, I simply replaced the third argument with the feature file and the fourth argument with the lat.1 output file, without piping to lattice-scale.

Finally, it is time to run this in CLion's debug mode. In CLion's Edit Configurations dialog, select nnet3-latgen-faster. Enter the program arguments, copied from the command above, and make sure to set the working directory to the current directory, i.e., egs/apiai_decode/s5. You can set a breakpoint in the main function, say at line 38, and start debugging with CLion. It should all work well!


Tuesday, November 19, 2019

Debug Pybind11 C++ Extension with CLion

OK, so I need to be able to debug the C++ part of the code, which is called from Python 3 using Pybind11, and I don't want to do it with lldb or gdb, i.e., a bare terminal debugger. In fact, I develop C++ extensions with CLion extensively, so I want to be able to debug/step within CLion. Here is how to do so.

I'm going to use the Pybind11's cmake-example, since we want to use CMake with CLion.

First, download the repo
$ git clone --recursive https://github.com/pybind/cmake_example.git && cd cmake_example

From now on, I'm going to assume $ROOT is the path to this cmake_example repository.

Next, import the directory with CLion
CLion --> Open --> select $ROOT folder

Add debug symbols and turn off optimization by adding the set(CMAKE_CXX_FLAGS ...) line to the CMakeLists.txt file:
cmake_minimum_required(VERSION 2.8.12)
project(cmake_example)
set(CMAKE_CXX_FLAGS "-g -O0")

add_subdirectory(pybind11)
pybind11_add_module(cmake_example src/main.cpp)

Edit the Run/Debug configuration for cmake_example as follows:
target: cmake_example
executable: /your/python3/binary
program arguments: tests/test.py
working directory: $ROOT
environment variables: PYTHONPATH=$ROOT/cmake-build

Now, debug with this configuration. You'll probably hit the version assertion error. Let's just comment out that line in tests/test.py:
import cmake_example as m

#assert m.__version__ == '0.0.1'
assert m.add(1, 2) == 3
assert m.subtract(1, 2) == -1

Now, re-run the debugger with a breakpoint at line 4 of src/main.cpp.
CLion should break there!

Sunday, October 27, 2019

WTF? Fix to "string.h not found" in macOS

OK, I love macOS but I sometimes hate the hassle when it comes to Xcode and its toolchains. Randomly I get errors like "string.h" not found... WTF?

Here is the fix. You probably have the Xcode command line tools installed; run
open /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.1X.pkg

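where you need to substitute the right version for X yourself (X = 4 for Mojave). As far as I know, the Catalina command line tools no longer ship this package, so on 10.15 go straight to the steps below.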

If you don't have the package, try deleting and re-installing the tools and re-try
sudo rm -rf /Library/Developer/CommandLineTools
xcode-select --install

If that does not work, try
sudo xcode-select -s /Applications/Xcode.app/Contents/Developer

Hope this fixes it!

*** EDIT ***
Sometimes with CLion, you may get a similar error. For that, the fix is rather simple. Go to Tools --> CMake --> Reset Cache and Reload Project. That's it!

Thursday, October 3, 2019

Load Makefile Projects on CLion

If you are like me, there is no better C/C++ IDE than JetBrains' CLion. I absolutely love it and refuse to use any other IDE.

There is one problem, however. CLion only supports CMake projects. There is a way to import Makefile projects using compiledb, but it is not trivial for projects that rely heavily on GNU toolchains.

In this post, I will go over how to import Makefile projects, such as openfst, that cannot be imported properly by following JetBrains' tutorial and tutorial2. There is one trick: when running make, add the -w option, which makes make print a message whenever it enters or leaves a directory. Without this option, the generated compile commands will not point to the true paths of the source files.

That is, run
$ compiledb make -w

By the way, never use the parallel-jobs option -jN here, because it interleaves the output of the parallel jobs, and compiledb will then no longer be able to reproduce all the make commands with the right directories.

That's it! Happy hacking.

Wednesday, June 19, 2019

Compile GNU Coreutils from Scratch on macOS Mojave

Here, I will discuss how to compile GNU coreutils from scratch. You have two options. I recommend Option 2 below.

Option 1:
First, download the source code from its repo. I will use v8.31
$ git clone https://github.com/coreutils/coreutils.git -b v8.31
$ cd coreutils

Next, fetch the git submodules
$ git submodule update --init

Next, bootstrap
$ ./bootstrap
./bootstrap: line 470: autopoint: command not found
./bootstrap: Error: 'autopoint' not found
./bootstrap: line 470: gettext: command not found
./bootstrap: Error: 'gettext' not found
./bootstrap: Error: 'makeinfo' version == 4.8 is too old
./bootstrap:        'makeinfo' version >= 6.1 is required

./bootstrap: See README-prereq for how to get the prerequisite programs

Well, I need to get the prerequisite programs first. Install gettext using brew
$ brew install gettext && brew link gettext --force

Let's try again
$ ./bootstrap 
./bootstrap: Error: 'makeinfo' version == 4.8 is too old
./bootstrap:        'makeinfo' version >= 6.1 is required

./bootstrap: See README-prereq for how to get the prerequisite programs

To check the version, run
$ makeinfo --version
makeinfo (GNU texinfo) 4.8

Copyright (C) 2004 Free Software Foundation, Inc.
There is NO warranty.  You may redistribute this software
under the terms of the GNU General Public License.
For more information about these matters, see the files named COPYING.

So, I do need to update makeinfo. Note that this is the same as texi2any from GNU texinfo package.
$ pushd ~/Downloads
$ wget http://ftp.gnu.org/gnu/texinfo/texinfo-6.6.tar.gz
$ tar xzf texinfo-6.6.tar.gz
$ cd texinfo-6.6
$ ./configure
$ make -j4
$ sudo make install
$ makeinfo --version
texi2any (GNU texinfo) 6.6

Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Now, we are ready to bootstrap again
$ popd && ./bootstrap

Option 2:
Download the distribution source code
$ wget https://ftp.gnu.org/gnu/coreutils/coreutils-8.31.tar.xz
$ tar xf coreutils-8.31.tar.xz && cd coreutils-8.31

------------------------------------------------------------------
The remaining steps are easy and are the same for both options.
$ ./configure

Finally, we should be able to build it
$ make -j4
$ sudo make install

Happy hacking!

Saturday, May 4, 2019

Copy to Clipboard in Terminal

Install xclip
$ sudo apt install xclip

Copy to clipboard
$ cat some_file.txt | xclip -selection clipboard

Paste anywhere!

Sunday, April 21, 2019

N-Gram ARPA Model

This post summarizes how the probability of a word sequence is calculated from an ARPA model.

Consider test.arpa and the following sequences of words:
look beyond
more looking on

Let's consider look beyond first. The log10 probability of seeing beyond conditioned upon look, i.e., log10(P(beyond | look)) = -0.2922095. This is directly from the test.arpa file, line 78.

What is, then, the probability of seeing look beyond? Well, this is by the chain rule of conditional probabilities

log10(P(look beyond))
= log10(P(look) * P(beyond | look)) 
= log10(P(look)) + log10(P(beyond | look)) 
= -1.687872 + -0.2922095 = -1.980081558227539, 

which can be verified with python code

import kenlm
model = kenlm.LanguageModel('test.arpa')
print(model.score('look beyond', eos=False, bos=False))


Let's try the next sequence more looking on. Let us start with the chain rule

log10(P(more looking on))
= log10(P(more)) + log10(P(looking | more)) + log10(P(on | more looking))

The first term on the RHS is easy: log10(P(more)) = -1.206319 from line 34

The second term is a bit tricky, because we cannot find the bi-gram more looking from the model. Hence, we use the following formula:
P(looking | more) = P(looking) * BW(more)
where log10(P(looking)) = -1.285941 from line 33, and log10(BW(more)) = -0.544068 is the back-off weight, which can be read off from line 34.

Lastly, the trigram more looking on needed for the third term is again not present in the model, so we back off to
P(on | more looking) = P(on | looking) * BW(more looking)
where log10(P(on | looking)) = -0.4638903 from line 80, and the back-off weight BW(more looking) is taken to be 1, i.e., log10(BW(more looking)) = 0, because the bigram more looking does not exist in the model.

Thus, we get log10(P(more looking on)) = -(1.206319 + 1.285941 + 0.544068 + 0.4638903) = -3.5002183, i.e., about -3.5.

For more details, refer to this document. I also find this answer very helpful.

Wednesday, March 27, 2019

Remote Debugging with Eclipse

I am not familiar with Eclipse, as I prefer to use CLion. However, for projects that do not use the CMake build system, I have to use Eclipse.

In this post, I will discuss how to remote-debug with Eclipse. The setup is as follows:

target (local): runs the application, say from a terminal
host (Eclipse): debugs the program as it runs

First, open up the project with Eclipse. Make sure that Eclipse can build the project.

Next, set up gdbserver on the target:
$ gdbserver :7777 EXECUTABLE ARG1 ARG2 ...

Here, 7777 is the port we will use for remote-debugging, EXECUTABLE is the binary file we are going to debug as it is running, and ARG1, ARG2, ... are appropriate arguments for this program.

Next, we set up Eclipse debugging.
From the menu, select Run --> Debug Configurations... --> C/C++ Remote Application (double click) --> Using GDB (DSF) Auto Remote Debugging Launcher (Select other) --> GDB (DSF) Manual Remote Debugging Launcher --> OK. Basically, we have selected "manual" remote debugging configuration here.

Make sure the Project and C/C++ Application fields are properly filled in, i.e., you should be able to select the project from the drop-down menu if the project import/build was successful, and choose the EXECUTABLE for C/C++ Application.

In the Debugger tab --> Connection tab, change Port Number to 7777.

Finally, click the Debug button. You should now be able to remote-debug with Eclipse.

Happy hacking!