Solving a simple problem with ML

ChatGPT has been helping me out

A fair amount if I am going to be honest here. The problem for me always comes down to syntax. Currently focused on Full Stack IOT development my day consists of switching between coding in Python (Flask api and web back-end), JavaScript (with JQuery for front-end) and C++ (embedded).

I find that one of the more interesting parts of any Full Stack IOT code base comes down to dealing with requests and responses. A website has a login page, sends a request to the server. An IOT device has a key to access the online api, sends a request to the server. The server needs to respond but first check the input is valid (sanitize it).

Python/JavaScript/C++ all have their own ways of working with text input. In JavaScript for example you need to trim() but in Python it’s strip() – and in C++ (embedded) it’s a bit more complicated operation involving char arrays. The point is that when switching around between languages I sometimes find myself going blank on the specifics of each approach. That’s when ChatGPT comes in handy for me – as a helper, faster than a google search.

I recently came across an issue which was helped by ML in a different way – namely parsing unstructured text input.

The Problem

I had a series of spreadsheets which I was extracting data from (using Pandas) but the date format was unfortunately unstructured. For example:

12th, 13th Dec 2022 and 5th-8th Jan 2023

In order to continue with the project I needed the date in Pandas date format

2022-12-12

If you give it “12th December 2022” as an input, Pandas can do this for you, but it couldn’t cope with the format in the spreadsheets. Regular expressions weren’t going to do the job either, considering the range of formats. So I asked ChatGPT. The prompt:

Please can you parse the following dates as separate dates in list format, suitable for python pandas usage:
12th, 13th Dec 2022 and 5th-8th Jan 2023

The reply:

['2022-12-12', '2022-12-13', '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08']

As you can see, this is exactly what I was looking for! Unfortunately my ChatGPT free trial has ended – not that I am opposed to paying for software but that’s not the only issue stopping me from using the api for this. I have been burned in the past by proprietary software solutions which have arbitrarily changed and/or cancelled a product which I relied on. This is a simple use case but I can imagine myself taking advantage of it more and more, then OpenAI pulling the rug (or just raising prices) and leaving me with a whole bunch of useless code.

The Solution:

After looking around I found NLPCloud (no affiliation, just it looks cool and works for me). They use Open Source Generative AI models which in theory I could replicate and host anywhere. I decided to give it a try.

It works! The free plan api key is now plugged into my program and when my program encounters a date format it can’t deal with, a query is sent.

response = client.chatbot(f"Please can you parse the following dates as separate dates in list format, suitable for python pandas usage: {dates_unformatted}", "Human asking AI for help with a text summarization task. ", [])  #text, context, history

The response is exactly the same quality as the ChatGPT one, using the “finetuned-gpt-neox-20b” model.

What simple things has ML been helping you out with?

K8 Virtual Juggling with working remote

K8 are my favourite type of LED juggling equipment. Recently I updated my virtual juggling web app to include a working remote control – just like the real thing.

The App

So the app consists of an animated juggler, a remote control and “Change your pattern” button.

While the juggler is juggling, you can press buttons on the remote to change the juggling ball colours (accurately emulating the actual K8 equipment settings)

The “Change your pattern” menu is a large list of different juggling patterns, which when selected will change the animation displayed.

How it works: back end

The back end is based on Flask. I am using the beautiful soup library to fetch the menu of juggling patterns from the awesome library of juggling website. Once selected, the gif of the pattern is fetched. It is then processed (a script inverts the colours and makes the juggling balls transparent) – if that hasn’t already been done for the particular pattern.

How it works: front end

The juggling animation and remote control are written with P5.js. The juggling ball colours are implemented as a background which shows through the transparent balls. The button co-ordinates are relative, so work on any size screen (looks best on Desktop)

Magic Poi 2022 update

Current state of Magic Poi – and some ideas for the future.

First of all, an announcement: Magic Poi is now available for ESP32, as well as ESP8266 architecture. This will bring improvements in performance. I plan on continuing support for both, and in the near future a combined code base will be provided.

I am going to list current features here, and improvements I plan to implement.

On-board images:

  • I have partnered with EnterAction, an awesome Sydney based fabrication company who are taking over the hardware development from now on. Improvements will include an SD card add-on for limitless on-board storage. This will require changes to the code, as currently the maximum is 52 images supported.

UDP streaming:

  • this is a defining feature of Magic Poi. The images are generated off-device, and “streamed” via UDP pixel by pixel. I plan to keep improving this functionality but change it to not be the default mode. Due to WiFi interference the UDP stream is sometimes interrupted, making the LED’s stutter, so work is being done to mitigate that.

“Timeline” – images changing in time to music:

  • currently there is a desktop app to generate the timeline (and associated images) and save as a zip file, which needs to be uploaded to the Android app in order to be “streamed” to the poi. I plan on changing this functionality to rather happen in the poi code, thus avoiding the WiFi interference problem. The timeline editor will be made into a web app, with the option to download directly to the poi.

Station mode:

  • poi connected to a router provides more stable WiFi than the current AP mode. I have made a start on providing a way to use this mode.

Online account:

  • like a PlayStation or Kindle, there is a benefit to having a cloud aspect to any product that consumes media. The Magic Poi website is going to be a place where you can upload and share images and timelines, as well as interact with other poi owners. All uploaded images will be private of course, unless shared. I have made a start on this cloud aspect, with an option in testing to download images directly from your cloud account to the poi. The ultimate goal is to be able to sync any two pairs of poi with two clicks!

Android app:

  • Still not working: text to image (stream words directly to the poi).
  • Once the online portal is finished, this will be added to the app, so shared images and timelines can be viewed without need for a web browser.

The above is a small part of the list – thanks to EnterAction taking over the hardware development side, I will have more time to devote to the software improvements. We also plan on adding a battery level indicator, and a higher power battery for more play time.

Thanks for reading!

Keep an eye on this blog, and sign up to the newsletter (if you haven’t already) for more updates as Magic Poi moves forward towards it’s inevitable crowd funder launch!

UPDATES:

Sign up for our update alerts:

Unpacking an inverted index with Python

Lately I have been doing a lot of data processing in my job for UCL. I recently came across a problem which didn’t have a ready made solution, so here is my take on it.

Inverted Index

An inverted index is simply a list of all the unique words in a document, labeled by their position. Something like {‘Hello’: [1], ‘world.’: [2]}. Except that they are rarely in order and there can be multiple instances of any given word in a document.

I was given the task of ‘unpacking’ one of these (actually thousands, but if you can do one..). The inverted index came in the form of a dictionary of words and their positions, returned from an api call. Since I couldn’t find a ready made solution, here is my take on it, in Python 3.

The solution

You can try it out with the included example. I hope this helps someone with a similar problem to solve – do let me know in the comments if you have a different solution.