English-Thai Dictionary Chrome Extension - the Migration from Django and MySQL to AWS Lambda and DynamoDB

10 December 2017 aws lambda, dynamodb, side-project

I made the English-Thai Dictionary Chrome extension for my mom back in 2013. Back then, I had just learned Django, so I decided to put it into practice. I hosted the backend API (Django + MySQL) on a $5/month DigitalOcean droplet and discontinued it in 2015 after learning that the droplet had been compromised. In an effort to revamp my old projects, I looked for an easier and cheaper way to host the API. This blog post documents the journey and some surprising findings along the way.

1. How the English-Thai Chrome Extension Works

The Chrome extension displays the Thai definitions of the English word the user double-clicks.

Display Thai Definition of Double-Clicked English Word

Upon detecting a double-click word highlight, the Chrome extension makes a request to the API, passing the English word as a path variable. For example, when a user double-clicks the word "hello", the Chrome extension makes an AJAX call to www.somebaseurl.com/lookupEnglishThai/hello

The server then responds with a JSON object containing a list of the Thai definitions, like so:

{
  "found": true,
  "search_term": "hello",
  "closest_search_term": "hello",
  "definitions": [
    "สวัสดี (คำทักทาย)"
  ]
}
What happens when a user double-clicks on the word 'hello'

2. Moving from MySQL to DynamoDB

The English-Thai dictionary data was taken from an XML file from the Lexitron project. My tasks broke down as follows:

  1. Create a new table in DynamoDB that I can use to efficiently look up word definitions

  2. Parse and load data from the XML file to the DynamoDB table

2.1 New Table in DynamoDB

For reference, the XML representation of an English-Thai pair in Lexitron's XML file is in this format:

<entry>
    <esearch>prize</esearch>
    <eentry>prize1</eentry>
    <tentry>ของมีค่า</tentry>
    <ecat>N</ecat>
    <ethai>ชั้นยอด, ดีพอที่จะได้รับรางวัล</ethai>
    <id>52005</id>
</entry>

MySQL Schema

The MySQL schema used is a one-to-one mapping of each "entry" in the XML.

Field     Type            Sample Value
id        integer         52005
esearch   varchar(128)    prize
eentry    varchar(128)    prize1
tentry    varchar(1028)   ของมีค่า
tcat      varchar(10)     N

DynamoDB Table

DynamoDB is schemaless, but it needs a primary partition key. I chose english_word as the primary partition key since that's what is used to query the table. Each item also has an additional attribute called thai_definitions with the String Set data type.

Attribute          Type         Sample Value
english_word       string       prize
thai_definitions   string set   { "การจับกุม", "ของมีค่า", "ตีราคาสูง", "ที่ได้รับรางวัล", "ยอดเยี่ยม", "รางวัล", "สิ่งของที่ยึดได้" }
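As a rough sketch, the table definition can be written out as a boto3 create_table spec. The table name "EnglishThaiDictionary" and the read capacity here are placeholders I made up; only the english_word partition key and the initial write capacity of 1 come from this post.

```python
# Sketch of the DynamoDB table definition as a boto3 create_table spec.
# The table name and read capacity are placeholders, not the real settings.
TABLE_SPEC = {
    "TableName": "EnglishThaiDictionary",
    # english_word is the primary partition (HASH) key...
    "KeySchema": [{"AttributeName": "english_word", "KeyType": "HASH"}],
    # ...and key attributes must be declared with a type: "S" for String.
    # Non-key attributes like thai_definitions are not declared (schemaless).
    "AttributeDefinitions": [{"AttributeName": "english_word", "AttributeType": "S"}],
    # The initial write capacity of 1 WCU is what later throttled the bulk load.
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 1},
}

# Creating the table requires AWS credentials, e.g.:
# import boto3
# table = boto3.resource("dynamodb").create_table(**TABLE_SPEC)
# table.wait_until_exists()
```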

2.2 Parse and load 53,000 items into DynamoDB table

I used boto3's batch_writer to batch up the put_item requests and quickly noticed that the speed of inserting items into DynamoDB dropped off sharply after about 200 requests. After some investigation, I discovered that my Write Capacity Unit (WCU) setting for the table was 1 per second, and hundreds of requests were being throttled. After increasing the WCU to 600, the number of throttled requests decreased and I could see from the log that the put_item requests completed much more quickly.

Throttled requests metric showing hundreds of throttled writes
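The parse step can be sketched as below. Grouping the tentry values by esearch is my reading of the XML sample above, and the table name is a placeholder; the batch_writer call itself is left commented out since it needs AWS credentials.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny inline sample standing in for the full 53,000-entry Lexitron file.
SAMPLE_XML = """<lexitron>
  <entry><esearch>prize</esearch><tentry>ของมีค่า</tentry></entry>
  <entry><esearch>prize</esearch><tentry>รางวัล</tentry></entry>
</lexitron>"""

def build_items(xml_text):
    # Group Thai entries by the lower-cased search term so that each English
    # word becomes one DynamoDB item with a string set of definitions.
    defs = defaultdict(set)
    for entry in ET.fromstring(xml_text).iter("entry"):
        word = (entry.findtext("esearch") or "").strip().lower()
        thai = (entry.findtext("tentry") or "").strip()
        if word and thai:
            defs[word].add(thai)
    return [{"english_word": w, "thai_definitions": d} for w, d in defs.items()]

items = build_items(SAMPLE_XML)

# Loading with boto3's batch_writer (requires AWS credentials):
# import boto3
# table = boto3.resource("dynamodb").Table("EnglishThaiDictionary")
# with table.batch_writer() as batch:
#     for item in items:
#         batch.put_item(Item=item)
```

boto3 maps a Python set of strings to the DynamoDB String Set type automatically, which is why the definitions are collected into a set rather than a list.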

3. Moving from Django to AWS Lambda

Switching from Django to AWS Lambda was easy because I had just 1 API endpoint. The logic for guessing the "closest" search term (in case the user looks up a word in past tense, e.g. "liked", "married", or in plural form, e.g. "guests", "ferries") remains the same. With Django, I had about 7 different files. With AWS Lambda, the number of files dropped to just 2: 1 for the Lambda handler, 1 for the unit test.

I chose the microservice-http-endpoint-python3 blueprint as the starting point for my function because it covers my basic use case of setting up a RESTful API endpoint that reads from DynamoDB.

microservice-http-endpoint-python3 blueprint

The tricky part came when I was setting up API Gateway as a trigger for my function. I wanted the user's search term to be part of the URL path (/lookupEnglishThai/hello) instead of part of the query string (/lookupEnglishThai?search_term=hello).

It was not obvious from the documentation how to configure a path parameter, but after clicking around, I figured out that it can be done by creating a child resource with the {path_parameter_here} syntax, as in the screenshot below.

Creating resource with path parameter called 'search_term'
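A minimal sketch of what the resulting handler might look like. The DynamoDB read is stubbed out with a dict so the shape of the API Gateway proxy event and the JSON response can be shown without AWS access, and the closest-search-term guessing is omitted; the real handler would call table.get_item instead.

```python
import json

# Stand-in for the DynamoDB table lookup; illustrative data only.
FAKE_TABLE = {"hello": ["สวัสดี (คำทักทาย)"]}

def lambda_handler(event, context):
    # API Gateway passes the {search_term} path variable in "pathParameters".
    word = event["pathParameters"]["search_term"].lower()
    definitions = FAKE_TABLE.get(word)
    body = {
        "found": definitions is not None,
        "search_term": word,
        "closest_search_term": word,  # real handler guesses e.g. "liked" -> "like"
        "definitions": definitions or [],
    }
    return {"statusCode": 200, "body": json.dumps(body, ensure_ascii=False)}

response = lambda_handler({"pathParameters": {"search_term": "hello"}}, None)
```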

4. Closing Thoughts

The setup works pretty well and I'm happy with the performance. I don't have stats for the old API calls, but with the new API, the total time taken by the browser is about 330 ms, half of which is DNS lookup. I'm making that HTTP request from Singapore, and the AWS region is in Singapore; the latency shouldn't be noticeably different for users in Thailand.

Network Response Timing

I didn't know a thing about designing a table structure in a NoSQL database prior to this, but watching AWS DynamoDB Deep Dive on Pluralsight helped ease my fear a little. I had a lot of fun, and I'm really glad I finally fixed the API endpoint. There are three other old projects I want to spring-clean, and they're coming up in a bit!

You can find the source code of this project on GitHub: https://github.com/veeraya/lexitronapi

Free and Easy Ways to Prevent Heroku Free Dyno From Sleeping

2 December 2017 aws lambda, heroku

Heroku offers 550 free dyno hours per month to all users. The free dyno, however, sleeps after 30 minutes of inactivity. This is really not ideal if you, like me, want to run a small free application for others to use without incurring too much cost. It posed a problem for me recently, since I've been trying to use Heroku as affordable, easy hosting for my low-volume YouTube-to-MP3 LINE bot.

Here are some of the ways you can prevent your Heroku dyno from sleeping.

New Relic APM Heroku Add-On

The Wayne tier of the New Relic APM Heroku add-on is free. The add-on will ping your application at an interval, preventing it from sleeping.

heroku addons:create newrelic:wayne

Pricing: Free plan is sufficient for basic usage.

Approximate time spent: 5 minutes

Pros: Easy to set up

Cons: -

Personal Review: I used the New Relic add-on with my Ellen Ticket Tracker a few years back, and I remember it working just fine.

AWS Lambda

AWS Lambda is a serverless compute service, meaning you can run your code without having a backend server.

To keep Heroku dyno up, we can set up a lambda function to ping our service at a specific interval.

The ready-made canary blueprint for AWS Lambda does just that. The official AWS Lambda Scheduled Events tutorial takes you through the initial process of setting up the function, all the way to sending an email notification. It took me about 10 minutes overall to set it up and test it out.

Lambda Canary Template.
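A canary function of this kind boils down to a scheduled GET request. The sketch below is not the blueprint's actual code: the Heroku URL is a placeholder, and the HTTP opener is injectable so the logic can be exercised without network access.

```python
import urllib.request

# Placeholder URL for the Heroku app to keep awake.
SITE = "https://your-app.herokuapp.com/"

def ping(url=SITE, opener=urllib.request.urlopen):
    # `opener` is injectable purely to make the function testable offline.
    with opener(url, timeout=10) as resp:
        return resp.status

def lambda_handler(event, context):
    # Invoked by the scheduled CloudWatch Event; a simple GET keeps the dyno awake.
    status = ping()
    if status != 200:
        raise RuntimeError(f"Ping failed: HTTP {status}")
    return {"pinged": SITE, "status": status}
```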

Pricing: The AWS Lambda free tier includes 1 million requests per month. See pricing details here.

Approximate time spent: 10 minutes

Pros: Configurable ping rate, good introduction to AWS Lambda

Cons: Slightly more involved than using a managed ping service like New Relic

Personal review: I use AWS Lambda for pinging my LINE bot application on Heroku, and I highly recommend it. I like that I can configure exactly how frequently it pings as well as the URL to ping. If configurability is what you're looking for, AWS Lambda is the service for you.

How I Built A Task Notifier With AWS Lambda and Telegram Bot

11 November 2017 aws lambda, chatbot, side-project

Last weekend I set out to create an app that would let me build a reading queue and nag me daily to act on the top entry until I've marked it as read. This stems primarily from my tendency to keep a loooong list of articles I want to read but never get around to reading.

1. The Goals

  1. The ability to add and remove tasks I want to accomplish in a queue.

  2. A scheduled notification which sends me the top task from the queue daily.

  3. I should be able to easily mark the task as “done” (i.e. mark the article as “read”) and it should drop off the queue.

2. The Architecture

How it works

3. The Process

3.1 CRUD for my reading list

I spent quite some time trying to implement a simple CRUD app for my reading list, but it was boring as hell, so I switched to using Todoist as the "datastore" instead. I had been using Todoist on Android for the past few months, and I was pleasantly surprised to find out that it provides dead-simple APIs.

Screenshot of the Todoist app on my phone.

3.2 Building Lambda Function and Setting Up Scheduled Event

Spinning up an EC2 instance just to run cron jobs seemed like overkill for this very simple task. All I want is a daily job that calls the Todoist API to fetch the content of the top task I've queued, then calls another API to send me the task of the day. AWS Lambda fits the bill since I can just upload my function and have it run on a scheduled CloudWatch Event, which acts as the trigger.

Screenshot of the CloudWatch Event used as the trigger for my Lambda function.
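A rough sketch of what the daily job looks like. The Todoist endpoint shown is the current REST v2 API (this post predates it, so treat the URL as illustrative), all tokens are placeholders, and the helper names are mine; in a real deployment the secrets would come from Lambda environment variables.

```python
import json
import urllib.parse
import urllib.request

# Placeholder credentials -- set these as Lambda environment variables in practice.
TODOIST_TOKEN = "your-todoist-token"
TELEGRAM_TOKEN = "your-telegram-bot-token"
CHAT_ID = "your-chat-id"

def top_task(tasks):
    # Pure helper: pick the content of the first task in the queue, if any.
    return tasks[0]["content"] if tasks else None

def lambda_handler(event, context):
    # Fetch the open tasks from Todoist.
    req = urllib.request.Request(
        "https://api.todoist.com/rest/v2/tasks",
        headers={"Authorization": f"Bearer {TODOIST_TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        tasks = json.load(resp)

    task = top_task(tasks)
    if task is None:
        return {"sent": False}

    # Send the task of the day through the Telegram Bot API.
    data = urllib.parse.urlencode({"chat_id": CHAT_ID, "text": f"Read today: {task}"}).encode()
    urllib.request.urlopen(f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage", data)
    return {"sent": True}
```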

3.3 Setting up Telegram Bot to notify me of the task of the day

Telegram’s Bot creation is real simple. I chatted with BotFather and my bot was created within seconds. You’d get an API access token once the bot is created.

Screenshot of the Telegram bot creation process.

3.4 Creating an API using Amazon API Gateway and Lambda function to mark the task as done

Amazon API Gateway lets you easily create a REST API to expose your Lambda function to an HTTP call. I wrote another Lambda function that moves the task out of the queue into an archive folder in Todoist, and configured API Gateway so that a user can click a link to mark the task as done.

Lambda function with API Gateway as the trigger.
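A sketch of the mark-as-done function. For simplicity it just closes the task via Todoist's REST API rather than moving it to an archive folder, and the endpoint, token, and task_id path parameter are all illustrative.

```python
import urllib.request

TODOIST_TOKEN = "your-todoist-token"  # placeholder

def close_url(task_id):
    # Pure helper: build the Todoist "close task" endpoint (REST v2; illustrative).
    return f"https://api.todoist.com/rest/v2/tasks/{task_id}/close"

def lambda_handler(event, context):
    # API Gateway passes the clicked link's {task_id} path variable.
    task_id = event["pathParameters"]["task_id"]
    req = urllib.request.Request(
        close_url(task_id),
        method="POST",
        headers={"Authorization": f"Bearer {TODOIST_TOKEN}"},
    )
    urllib.request.urlopen(req)
    return {"statusCode": 200, "body": "Marked as done!"}
```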

4. The End Product

What I received every day at 10 AM

5. Closing Thoughts

I’ve had this task (article-of-the-day) notifier running for a week, and so far I’ve read 4 articles from the reading queue. That’s a big improvement.

On the technical side, now that I’ve gotten the hang of Lambda functions, I find them incredibly useful and easy to set up. I have configured another Lambda function to ping my free Heroku instance so it won’t sleep after 30 minutes of inactivity. I’m planning to put the code for this project on GitHub once I clean it up.