Speech to Text is finally ready – Whisper review

Many years ago I developed and published an Android app called “Made Up Stories”, which I used to record the bedtime stories I told to my young son. Many of the stories were sanitized versions of movies and novels, but I also had a lot of fun creating my own characters and plots on the fly. Years later, I have a collection of over 300 saved stories, and I started to think that it would be nice to turn some of the original ones into illustrated children’s books. The time it would take to transcribe these audio files was daunting, though: I am a reasonably fast typist, but not quite up to the speed of the spoken word.

Transcription software

In 2019/2020 I decided to try out transcription software, so I fed a sample audio file into Mozilla DeepSpeech and Google Speech-to-Text, among others. Unfortunately, due to my South African accent, none of the transcribers I tried came anywhere near accurate, and I shelved the idea: literally every second word was wrong. DeepSpeech also required a very specific audio sample rate, and Google Speech-to-Text isn’t free (I only had a free trial). *They may have improved by now; I didn’t check.
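For anyone still trying DeepSpeech, a quick way to see whether a recording is in the format it expects is to read the WAV header with Python’s standard wave module. This is a minimal sketch, assuming the 16 kHz, mono, 16-bit PCM format that DeepSpeech’s released English models were built for; the function names are my own and not part of any DeepSpeech API.

```python
import wave
from typing import Tuple


def wav_format(path: str) -> Tuple[int, int, int]:
    """Return (sample_rate_hz, channels, bits_per_sample) for a PCM WAV file."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth() * 8


def fits_deepspeech(path: str, expected_rate: int = 16000) -> bool:
    """Check a WAV file against the assumed DeepSpeech input format.

    Assumption: 16 kHz, mono, 16-bit PCM (verify against the docs for
    the model version you are using).
    """
    rate, channels, bits = wav_format(path)
    return rate == expected_rate and channels == 1 and bits == 16
```

If the check fails, a conversion along the lines of ffmpeg -i story.mp3 -ar 16000 -ac 1 story.wav should produce a compatible file.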

Discovering Whisper

Last week I saw an article on The Verge singing the praises of “Whisper”, OpenAI’s new open source transcription engine. It was incredibly easy to set up and, best of all, runs entirely on my laptop, for free. After installation, getting an audio file into text format is a simple one-liner on the command line. Better still, it is really accurate: I read in their paper that the model was trained on a much larger and more diverse range of speech (680,000 hours of audio collected from the web) and takes a different approach from other systems. Well, for me at least, it works!

There is still a lot to do with the transcriptions, which come out as one long line of text with barely any punctuation. And don’t get me wrong, there are still some errors (admittedly, I was using the “tiny” model, which is only 350 MB; the “large” model is apparently even more accurate but takes up 5-6 GB of space).
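Whisper also has a Python API, so the command-line one-liner can be replaced by a couple of calls, and the timestamped segments it returns offer a crude way to break that one long line into paragraphs. This is a sketch assuming the openai-whisper package (whisper.load_model and model.transcribe, which returns a dict with a "segments" list whose items carry "start", "end" and "text"); starting a new paragraph after a pause longer than two seconds is my own heuristic, not something Whisper does.

```python
from typing import Dict, List


def transcribe(path: str, model_name: str = "tiny") -> dict:
    """Run Whisper on one audio file (requires: pip install openai-whisper)."""
    import whisper  # imported here so the formatter below works without Whisper installed

    model = whisper.load_model(model_name)
    return model.transcribe(path)


def segments_to_paragraphs(segments: List[Dict], pause_seconds: float = 2.0) -> List[str]:
    """Join Whisper segments into paragraphs.

    A new paragraph starts whenever the silence between the end of one
    segment and the start of the next exceeds pause_seconds.
    """
    paragraphs: List[str] = []
    current: List[str] = []
    last_end = 0.0
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip empty segments
        if current and seg["start"] - last_end > pause_seconds:
            paragraphs.append(" ".join(current))
            current = []
        current.append(text)
        last_end = seg["end"]
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

For example, segments_to_paragraphs(transcribe("story.mp3")["segments"]) gives a list of paragraphs ready to paste into an editor; the equivalent command-line run is simply whisper story.mp3 --model tiny. Punctuation and capitalisation still need a human pass.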

Finishing touches

Well, I have cleaned up the first story, “Pinky the Amazon river dolphin visits the Mesozoic”. My wife is an amazing artist and has agreed to illustrate it, but I wonder: could AI help with that as well?

Dinosaurs on an ancient river bank – Craiyon generator
Underwater cave – Stable Diffusion

Maybe not. (The dolphin images I have managed to generate so far have only succeeded in making me feel ill.) I have signed up for “Midjourney” (which is DOWN at the time of writing). Apparently it is the best of the bunch, so who knows, maybe I can automate some backgrounds to save time illustrating the stories.

Why you should hire a juggler (Or the similarities between learning to juggle and learning to code)

Learning new juggling moves is a task that benefits from breaking down into smaller, more achievable goals. In this respect it is very similar to learning a new coding language or framework.

Take my favorite three-ball juggling move, for example: the Box.

In order to learn this we need to work out what is going on. There are three separate throws, two going up (one in each hand) and one going across.

Steps to learn the box:

1. One ball, up, across, up again

2. The hard part on both sides, with two balls (a simultaneous up and side throw; not very intuitive, but once you have it, it feels great)

3. Go for it. If you spent enough time on step 2, you can do it!

By the way, the example above is from https://libraryofjuggling.com/

Similarly, in programming we have to break tasks down into pieces and only put them together at the end. Recently I got a new job building an online app using Flask, JavaScript and a little bit of jQuery, Jinja and something called “Tabulator” for web tables (and Bootstrap for buttons).

Learning

Previously I had only worked in Android (Java) and Arduino (C++), so at least the JavaScript syntax on the front end made sense, but to get the basics down I started by working through the freeCodeCamp examples. Once I knew how to call a function, loop through an array, set up a library and so on, it was time to move on to the next step: building a basic program that works.

Finally I gained enough understanding of the inner workings that I was able to add/remove features without being afraid of breaking things and could start to enjoy building something.

Deploying

In juggling, I learned that being able to keep up the pattern without dropping is only part of the process. As soon as I wanted to show off my new moves, I was faced with another set of problems to solve. After some more practice, I learned that every juggling routine needs a flashy start and finish (to get people’s attention). It’s also important to be able to juggle without looking at the pattern, while talking at the same time.

In the software world this is called deployment. Personally, I like using my own Ubuntu-based server (hosted by DigitalOcean), but there are many options out there, each with its own requirements to learn. The app for the new job I mentioned is hosted on Google Cloud.

Conclusion

Takeaway message: practice is always an important part of learning anything, but the basics need to be covered first, and you should definitely hire a juggler.

Check out my CV! (soon to include Python, Flask, JavaScript, Jinja, Google Cloud Services, and more)

ESP8266 libraries treasure trove

While looking for a new WiFi manager for my SmartPoi project, I stumbled upon a great resource: https://www.arduinolibraries.info/architectures/esp8266 – a list of Arduino libraries broken down by architecture.

From a quick look, I noticed some great libraries that could help improve my ESP8266-based projects (I haven’t had a chance to try them yet, but I’m looking forward to it!):

Some interesting ESP8266 libraries:

  1. https://www.arduinolibraries.info/libraries/esp8266-timer-interrupt
    – interrupts for ESP8266! So useful.
  2. https://www.arduinolibraries.info/libraries/esp_eeprom
    – speed up EEPROM and add wear levelling
  3. https://www.arduinolibraries.info/libraries/firebase-esp8266-client
    – Firebase? On ESP8266? Sounds like a challenge!
  4. https://www.arduinolibraries.info/libraries/mini-grafx
    – graphics library, not sure which displays this supports…
  5. https://www.arduinolibraries.info/libraries/process-scheduler
    – process scheduler, is this easy to use though?
  6. https://www.arduinolibraries.info/libraries/restfully
    – hopefully this is better than doing it manually
  7. https://www.arduinolibraries.info/libraries/rich-http-server
    – more HTTP request wrappers
  8. https://www.arduinolibraries.info/libraries/settings-manager
    – store settings as JSON
  9. Too many WiFi config libraries to list here; I saw at least 15!

These are only the few I was personally interested in; the site lists 244 libraries for ESP8266. Check it out!

Advanced Arduino editing part 2

Now that I’ve had a chance to play with it for a bit, I really like VSCode.

Here is my current setup for Arduino editing:

  1. Add the arduino-snippets plugin (autocompletes Arduino code such as “millis()” or “loop()”)
  2. In the C++ highlighting plugin (installed by default), disable error squiggles
  3. Right-click and open the folder (the one with the .ino files in it)
  4. Open a terminal (within VSCode)
  5. Run arduino --upload main.ino

This setup does require the Arduino IDE to be installed and set up separately. The upload command, for example, is part of the Arduino install, and it uses the last settings (such as board and COM port) that you selected inside the Arduino IDE.
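The arduino command can also take the board and port on the command line, which is handy when switching between projects without reopening the IDE. Below is a small Python helper that builds (and optionally runs) that command. It assumes the Arduino IDE 1.x command-line options --upload, --verify, --board and --port; the ESP8266 board ID and serial port in the comments are only examples.

```python
import shlex
import subprocess
from typing import List, Optional


def arduino_cmd(sketch: str, board: Optional[str] = None,
                port: Optional[str] = None, upload: bool = True) -> List[str]:
    """Build an Arduino IDE 1.x command line for a sketch.

    Without board/port, the IDE falls back to the last settings chosen in the GUI.
    """
    cmd = ["arduino", "--upload" if upload else "--verify"]
    if board:
        cmd += ["--board", board]  # e.g. "esp8266:esp8266:nodemcuv2" (example board ID)
    if port:
        cmd += ["--port", port]    # e.g. "/dev/ttyUSB0" on Linux (example port)
    cmd.append(sketch)
    return cmd


def run(cmd: List[str]) -> None:
    """Echo the command, then run it; raises CalledProcessError on failure."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

From the VSCode terminal, run(arduino_cmd("main.ino")) behaves like the plain one-liner, while passing board and port pins the target explicitly instead of relying on whatever the IDE last used.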

I still think I might move over to Platform.io eventually, but at least with this setup I don’t have to re-do all of my code.

Advanced Arduino Editing

My favorite practical Arduino project is getting a bit large for the Arduino IDE, so I am looking to move development over to a “real” IDE. During the past few months I have enjoyed using Visual Studio Code (on my laptop running Xubuntu) for HTML editing. Since it has plugins, I thought I would give it a go.

First attempt: Platform.io

Platform.io has some impressive marketing out there, and it supports the ESP8266, which for me is a must. Unfortunately it is another setup that requires an entire rewrite of the code (renaming all .ino files to .cpp, for example), and with the number of projects I have lying around and return to regularly, that’s a definite pass. *UPDATE: unless there is no other choice!
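If it ever does come to that, the mechanical part of the conversion can at least be scripted. As I understand it, the Arduino IDE joins a sketch’s .ino tabs into one file before compiling (roughly: main tab first, then the others alphabetically) and generates function prototypes automatically. The hypothetical helper below only does the joining, into one C++ source with #include <Arduino.h> at the top; it is not a Platform.io tool, and any function used before it is defined will still need a prototype added by hand.

```python
from pathlib import Path


def merge_ino_tabs(sketch_dir: str) -> str:
    """Join a sketch's .ino tabs into one C++ source string.

    Order: the main tab (named after the folder) first, then the other
    tabs alphabetically. Function prototypes are NOT generated.
    """
    folder = Path(sketch_dir).resolve()
    tabs = sorted(folder.glob("*.ino"))
    main_tab = folder / f"{folder.name}.ino"
    ordered = [t for t in tabs if t == main_tab] + [t for t in tabs if t != main_tab]

    parts = ["#include <Arduino.h>\n"]
    for tab in ordered:
        parts.append(f"\n// ---- from {tab.name} ----\n")
        code = tab.read_text()
        parts.append(code if code.endswith("\n") else code + "\n")
    return "".join(parts)
```

One way to use it would be to write the result into a new project, e.g. Path("src/main.cpp").write_text(merge_ino_tabs("MySketch")), with the src folder already created.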

Arduino Plugin:

There is a nice plugin for Arduino, however. It does require you to already have Arduino set up on your system (check). To set up the VSCode plugin you have to point it to the Arduino installation folder.

https://maker.pro/arduino/tutorial/how-to-use-visual-studio-code-for-arduino

The ESP8266 is also supported! (You just need to add the board repository, much like you do in the Arduino IDE.)

I also found a plugin to upload SPIFFS (the ESP8266 file system): https://marketplace.visualstudio.com/items?itemName=kash4kev.vscode-esp8266fs *UPDATE: unfortunately SPIFFS is now deprecated, and there is no upload option for LittleFS, its replacement.

So far in testing the whole thing works, uploading sketches just the same as the Arduino IDE, except now I have tab completion (the IDE completes your commands for you when you press <TAB>) and advanced syntax highlighting. Best of all, VSCode comes with Dark Mode!

I think I might enjoy it going forward. The only issue so far is that VSCode is not lightweight; it seems to use a fair amount of resources. Nowhere near as much as Android Studio, however.

*UPDATE: Unfortunately the Arduino plugin fails to support sketches with multiple .ino files (!!!!!). Here is the bug report: https://github.com/microsoft/vscode-arduino/issues/271

This makes it simply unusable (sigh).

Guess I’m having another look at Platform.io – for lack of alternatives!

If you stumble upon this post and have a solution, please send me an email: tomjuggler at gmail dot com

PS: I did try the alpha version of the Arduino Pro IDE (https://www.arduino.cc/pro/software), but it’s just that: alpha. 106 errors in my code? But it still compiles and uploads fine?

Whatever happened to processing.js

I have mentioned this before: Processing is my favorite coding tool because it provides easy access to so many creative coding options. The main reason I love it so much is that it is cross-platform. I use the same code for web, desktop and mobile (Android) apps.

Now one of those options, processing.js, is less accessible to new coders.

What is processing.js?

If you don’t know, processing.js is simply a way for your Processing (Java) code to be translated into JavaScript and run in a browser canvas.

Why is it cool?

I love processing.js because it’s the easiest way to use the same code and get web-based sketches running on my own server. Just include the processing.js file and the Processing sketch (with a tiny bit of HTML) and it works.
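For reference, that tiny bit of HTML is a script tag that loads processing.js plus a canvas whose data-processing-sources attribute points at the .pde file. The Python snippet below just generates such a page so the markup is in one place; the file names are placeholders. Because the library fetches the .pde file over HTTP, the page generally needs to be served from a web server rather than opened as a local file.

```python
from html import escape


def processing_js_page(sketch: str, lib: str = "processing.js", title: str = "Sketch") -> str:
    """Return a minimal HTML page that runs a Processing sketch with processing.js."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '  <meta charset="utf-8">\n'
        f"  <title>{escape(title)}</title>\n"
        f'  <script src="{escape(lib)}"></script>\n'
        "</head>\n<body>\n"
        f'  <canvas data-processing-sources="{escape(sketch)}"></canvas>\n'
        "</body>\n</html>\n"
    )
```

Writing processing_js_page("mysketch.pde") to index.html, next to processing.js and mysketch.pde on the server, should be all it takes.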

What to do now?

Processing wants us to start using p5.js, which offers the functionality of Processing using JavaScript syntax. I am mainly an Android and Java programmer, so for me this is an unnecessary step (mainly involving changing all ints and floats to var), and I personally will continue using processing.js.
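To show how mechanical that step is, here is a deliberately naive Python sketch that strips Java-style types from simple Processing code: typed function headers become function, parameter types are dropped, and declarations using int, float (plus a few other types I chose) become var. It is an illustration under those assumptions, not a real converter; it ignores arrays, classes and casts, and leaves Processing-specific calls such as size() alone, even though p5.js uses createCanvas() instead.

```python
import re

# Types the naive converter knows about (an assumption; extend as needed).
TYPES = r"(?:void|int|float|boolean|String)"


def naive_processing_to_p5(src: str) -> str:
    """Strip Java-style types from simple Processing code, p5.js style."""

    def fix_header(m):
        # "float area(float r)" -> "function area(r)"
        params = re.sub(rf"\b{TYPES}\s+", "", m.group(2))
        return f"function {m.group(1)}({params})"

    # 1. Typed function headers, including their parameter lists.
    src = re.sub(rf"\b{TYPES}\s+([A-Za-z_]\w*)\s*\(([^)]*)\)", fix_header, src)
    # 2. Remaining typed variable declarations: "int x = 5;" -> "var x = 5;"
    src = re.sub(rf"\b{TYPES}\s+(?=[A-Za-z_])", "var ", src)
    return src
```

The \b word boundaries keep names like println or point from being mangled, and casts such as int(x) are untouched because no whitespace follows the type.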

But they took the website away!

Now we come to the reason for this post: they took away the website! If you try to go to processingjs.org now, you will find it has been taken down by the maintainers. Only the GitHub code is left for posterity. Luckily there is always the Wayback Machine: https://web.archive.org/web/20180510071709/http://processingjs.org

Processing.js is alive and well, though. It still works on my site: you can see plenty of sketches running all over this site, and even on my CV.