Skip to main content

· 5 min read

CS3282 THEMATIC SYSTEMS PROJECT II

This is a continuation of the CS3281 OSS project, where we now take on the role of a senior developer, overseeing and guiding a new batch of students contributing to the project. In this project, we are presented with the option to either continue our work on the same project from CS3281 or venture into the realm of larger, external Open Source Software (OSS) projects. I must admit that I thoroughly enjoyed this module, as it provides the freedom to delve into what I am genuinely passionate about. For a more detailed account of my learning experiences, feel free to explore my write-up here. Now, let's delve into some high-level insights I gained from this journey.

Working on OSS: One aspect that struck me profoundly during this project was the dedication and commitment of independent developers who voluntarily invest their free time into contributing to and maintaining open-source software, all without monetary incentives. Witnessing this selfless dedication gave me a new appreciation for the sheer amount of work and effort OSS project maintainers put in. From triaging issues and discussing improvements to reviewing Pull Requests, they handle endless updates and upgrades. I can only imagine that for immensely popular projects, the workload may feel never-ending. However, this commitment has a flip side—the strong sense of community and the collective goal of creating high-quality software.

Being a Good Developer: Throughout the module, we participated in three lightning talks. Personally, presenting has never been my strongest suit, but these opportunities allowed me to practice and improve my public speaking skills.

100% will recommend.

CS4218 SOFTWARE TESTING

Getting into the depths of software testing made me realize that this aspect of software is not easy to manage at all. The goal of testing is to ensure that the software not only gets shipped but also works well. This involves a lot of attention to details like correctness, performance, etc. Just on validation and verification itself, there is a need for deep technical knowledge on how we model the software and what sort of testing we perform. While it doesn't always apply, it seems to me that it's a problem in the industry that we are not yet equipped with such knowledge and willingness to do so. I should say that even with the knowledge, testing itself seems to be difficult and not something that can be done without extra effort.

In this course, I learned a lot from the group project where we worked on implementing shell functions in Java and wrote tests for those functions. The arrangement was quite interesting in that we first implemented the functionality, then wrote the tests, and were also "forced" to practice TDD (Test-Driven Development) by writing code based on test cases that other teams wrote. There was also a "hackathon" where we spent time spotting bugs in other team's projects.

Overall, I think the stress didn't stem from the workload...but our team did work after midnight to finish a submission. At this moment, I have pretty much forgotten most of what I learned in this course, but I think it left me with a good impression of what software testing is about and how complex it can be.

I will recommend it.

CS4240 INTERACTION DESIGN FOR VIRTUAL AND AUGMENTED REALITY

I took this course to learn Unity and VR development. While the course itself does not teach you all the necessary details about AR/VR development, the professor provides high-level overviews and discusses concerns and considerations when developing such applications. The learning from this course is very much dependent on how hands-on you are. There are individual assignments where you can go all-out to complete the rather loosely defined deliverables, or you could do the minimum to meet the requirements. From my experience, there was a 3D game, an AR application, a group VR game, and a final group VR project. The requirements are not cast in stone, and the professor is quite flexible in terms of giving us the choice to make what we want for the final project. I enjoyed that it was very open-ended, and we could decide what we wanted to do. Overall, I think I gained some practical knowledge operating Unity and some basic ideas of how to implement a VR application.

CP4101 B.COMP.DISSERTATION (FINAL YEAR PROJECT)

This is my foray into research. To keep it short, I think research is not an easy job. The biggest difficulty I feel lies in the uncertainty that you are exploring. To construct a research plan that succinctly captures the core concept and the steps to achieve it is like telling a good story. You will need the ingredients, the preparation, and you may stay nervous and unsure all the time because you are not sure who will be in your audience. I'm simplifying a lot there, as the other aspect that I am still grappling with is that the devil is in the details. I think this is a mentally draining module, and I am really going into uncharted territories here. Even if I don't continue higher learning after my undergraduate days, I'm partially glad that I have this experience to understand what research is about.

GEH1045 World Religions

  • SU exercised
  • took the module to clear requirement
  • interesting spread of content covering different religions
  • improved my understanding of the context/origin of different religions
  • SU exercised
  • took the module to clear requirement
  • content and workload is manageable
  • essay writing skill is cruicial
  • I enjoyed watching the movie (a long long time ago 2) as part of the module

· 2 min read

Motivation

Need a ChatGPT API server for quick prototyping? Let's make an endpoint and deploy it with Vercel, for free!

TLDR:

  • you get an endpoint that you can POST JSON to
  • the endpoint will return JSON with the response from "ChatGPT"
  • it's done via OpenAI's official text completion API
  • using the "text-davinci-003" model
  • OpenAI API is not free, but you are given some free credits to try it out
  • you will need OPENAI_API_KEY

Pre-requisites

  • Get a OpenAI API key
  • Have a GitHub account and a Vercel account that is linked to your GitHub account

Steps

If the pre-requisites are met, you can simply click on the button below to deploy. It will guide you through the process of cloning the repository and filling in the environment variable.

Deploy with Vercel


The exact steps are as follows:

  1. Clone/fork my template repository
  2. Go to the Vercel dashboard and click on "Add New..." to import your new repository
  3. Enter the environment variable OPENAI_API_KEY
  4. Deploy the project

DONE!

  • The deployed API will be available at https://your-vercel-project-name.vercel.app/api/chat (or whatever domain that you have configured)

  • Send a POST request with a JSON body of {"message": "Something to ChatGPT"}

E.g. using curl:

curl -i -X POST \
-H "Content-Type:application/json" \
-d \
'{
"message": "Hello, how are you today?"
}' \
'https://your-vercel-project-name.vercel.app/api/chat'
  • The response will be a JSON with the key "response" containing the response from your API server.
{
"response": "I'm doing great! How are you?"
}

Conclusion

That's it! You've now deployed a "ChatGPT" API server with Vercel and OpenAI's official text completion API in no time. If you want to customize your API, you can modify the API code in the cloned repository.

References

· 9 min read

Motivation

Markdown is a markup language that has gained immense popularity in recent years. Besides being used as a convenient way to create content that generates full-blown static websites (via engines such as Gatsby.js and MarkBind), I also started to see widespread usage of Markdown in knowledge management systems such as Obsidian and Dendron.

I write articles like this one using Markdown and I am also actively exploring the use of Markdown in the above-mentioned capacities this year. As a result, I decided to dive deep into how Markdown works and hence this article.

I realized that there are two extremes in software projects:

  • the most popular/battle-tested/enterprise-grade projects that define the "standard" for a particular domain
  • tutorial examples/toy projects for educational purposes

While the former is complex and production ready, the latter is simple and easy to understand. The problem is that there's a huge gap between creating something simple to something complex. Should you want to do it, there's less help and at times you are basically on your own to read the code and figure out how the complex implementation works. Nonetheless, there are values in the toy examples, which is what (and why) I will be going through in this article. A simple, starter-friendly implementation.

To understand how Markdown works, I intend to implement several Markdown parsers according to the tutorials/articles that I can find online and work from simple/naive implementations to (hopefully) a more realistic implementation that can be used in production. This is the first one in the "series" and hence the elaborated introduction.

What is Markdown

As Markdown was born without a well-defined set of rules or tests, it has evolved to have a few different flavors. The most well-known flavor of Markdown is CommonMark, which provides a standard set of rules for the language. Borrowing their Markdown reference as seen here, a common set of Markdown syntax looks like this:

*This text will be italic*
**This text will be bold**
# heading 1
## heading 2
* List
* List
* List
1. One
2. Two
3. Three
---
[link](https://www.google.com)
![image](https://xxx.png)
> and more!

The syntax available in Markdown allows you to style plain text using simple "decorators" such as * and #. It is easy to write and even reads well without the need for a rendered HTML preview.

How it works

The simplest idea for a working Markdown parser is probably using regular expressions. They help you match patterns in a string and you can thereafter replace them with the formatted version. For example, you can grab text surrounded by ** (e.g. **text**) and replace them with <b>text<b>. With this mechanism, we can establish a set of regular expressions and their string replacement strategy, and then just iteratively apply them to the input text. However, it is important to note that this approach has obvious limitations which will be discussed later.

With that, we will examine how a simple Markdown parser can be implemented.

(Note that the following sections will be brief in certain areas that are trivial. You can check the codebase for reference)

Setup

As the title suggests, we will be building our parser using TypeScript. Here are the steps to get started:

  1. Create a new project with npm init -y
  2. Install the dev dependencies with npm i -D typescript parcel jest ts-jest @types/jest
    • typescript is the TypeScript compiler
    • parcel is a bundler that we will use to bundle our code into a single HTML file
    • jest and the related packages are going to help during unit testing
  3. Initialize the TypeScript project with npx tsc --init
  4. Ensure that the tsconfig.json file generated is configured properly

With the setup done, we can start building just a few simple components.

Pattern

The Pattern class in src/Pattern.ts is an abstraction that holds the regular expression and the string replacement strategy. It also provides the method to apply the regular expression.

export class Pattern {
regex: RegExp;
replacement: string;
constructor(regex: RegExp, replacement: string) {
this.regex = regex;
this.replacement = replacement;
}

apply(raw: string): string {
return raw.replace(this.regex, this.replacement);
}
}

As an aside, the above can be simplified by using the public modifier.

export class Pattern {
constructor(public regex: RegExp, public replacement: string) {}

apply(raw: string): string {
return raw.replace(this.regex, this.replacement);
}
}

Rule

From patterns, we create a higher-level abstraction which is the Rule. It is a collection of patterns that are applied in a sequence. The reason why we have a collection of patterns is that in Markdown, there can be more than one way to achieve the same formatting. For example, you can use * or _ to achieve italic text. The Rule class in src/Rule.ts is defined as follows:

import { Pattern } from './Pattern';

export class Rule {
name: string;
patterns: Pattern[];
constructor(name: string, patterns: Pattern[]) {
this.name = name;
this.patterns = patterns;
}

apply(raw: string): string {
return this.patterns.reduce(
(result, pattern) => pattern.apply(result),
raw
);
}
}

RMark

With Pattern and Rule, we can now use them to create a RMark class that will be the Markdown parser. RMark is just a convenient name for "Regex Markdown" and it is defined in src/index.ts as follows:

import { Rule } from './Rule';
import { Pattern } from './Pattern';

const defaultRules: Rule[] = [
new Rule('header', [
new Pattern(/^#{6}\s?([^\n]+)/gm, '<h6>$1</h6>'),
new Pattern(/^#{5}\s?([^\n]+)/gm, '<h5>$1</h5>'),
new Pattern(/^#{4}\s?([^\n]+)/gm, '<h4>$1</h4>'),
new Pattern(/^#{3}\s?([^\n]+)/gm, '<h3>$1</h3>'),
new Pattern(/^#{2}\s?([^\n]+)/gm, '<h2>$1</h2>'),
new Pattern(/^#{1}\s?([^\n]+)/gm, '<h1>$1</h1>'),
]),
new Rule('bold', [
new Pattern(/\*\*\s?([^\n]+)\*\*/g, '<b>$1</b>'),
new Pattern(/\_\_\s?([^\n]+)\_\_/g, '<b>$1</b>'),
]),
new Rule('italic', [
new Pattern(/\*\s?([^\n]+)\*/g, '<i>$1</i>'),
new Pattern(/\_\s?([^\n]+)\_/g, '<i>$1</i>'),
]),
new Rule('image', [
new Pattern(/\!\[([^\]]+)\]\((\S+)\)/g, '<img src="$2" alt="$1" />'),
]),
new Rule('link', [
new Pattern(
/\[([^\n]+)\]\(([^\n]+)\)/g,
'<a href="$2" target="_blank" rel="noopener">$1</a>'
),
]),
new Rule('paragraph', [
new Pattern(/([^\n]+\n?)/g, '\n<p>$1</p>\n'),
]),
];

export class RMark {
private rules: Rule[] = defaultRules;

public addRuleBefore(rule: Rule, before: string): RMark {
const index = this.rules.findIndex((r) => r.name === before);
if (index !== -1) {
this.rules.splice(index, 0, rule);
}
return this;
}

public addRule(rule: Rule): RMark {
this.addRuleBefore(rule, 'paragraph');
return this;
}

public render(raw: string) {
let result = raw;
this.rules.forEach((rule) => {
result = rule.apply(result);
});
return result;
}
}

There are two parts in src/index.ts, one being the default rules and the other being the RMark class. The default rules are the Markdown syntax that we support. As for the RMark class, its render method simply iterates through the rules and applies them to the input text. It also has the addRuleBefore and addRule methods that allow us to add new rules to the parser.

Result

Now, the parser is ready to be called via new RMark().render('input text').

A set of unit tests have been written to showcase the result:

  test('should render bold', () => {
expect(new RMark().render('**Bold**')).toBe('\n<p><b>Bold</b></p>\n');
expect(new RMark().render('__Bold__')).toBe('\n<p><b>Bold</b></p>\n');
expect(new RMark().render('This is **Bold**')).toBe(
'\n<p>This is <b>Bold</b></p>\n'
);
});

By using parcel, a simple HTML example is created to see the rendered result in the browser (by running npm run build and npm run serve in the rmark repository):

rendered page

The source code below for the screenshot above can be found in src/page.ts (index.html is also created for this example to work).

import { RMark } from '.';

const sampleText = `# Header 1
## Header 2
### Header 3
#### Header 4
##### Header 5
###### Header 6

**Bold**
*Italic*

[Link](https://github.com/tlylt/rmark)
![Image](https://raw.githubusercontent.com/tlylt/rmark/main/static/logo.svg)

This is **Bold** and this is *Italic*.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam ornare erat facilisis odio viverra gravida. Phasellus in finibus libero. Duis eget pellentesque arcu, ut lobortis mi. Praesent vitae nulla sed leo dignissim finibus eget hendrerit arcu. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc vestibulum enim nibh, eu pellentesque tellus fermentum venenatis. Nam consectetur sem a magna mattis, sed luctus purus tincidunt. Nam faucibus tellus sed ligula molestie pulvinar. Mauris facilisis felis ex, eu tempor justo commodo et. Aenean lobortis dignissim diam eget tempor.

Sed pellentesque nulla sit amet tincidunt sagittis. Phasellus eget justo nulla. Cras nisi odio, lobortis nec ante eget, commodo euismod
turpis. Cras id orci dolor. Etiam auctor, nisl luctus volutpat lacinia, turpis orci euismod magna, pharetra eleifend massa metus aliquet
`;

const page = document.getElementById('page');

if (page) {
page.innerHTML = new RMark().render(sampleText);
}

Limitations

Besides offering only an incomplete set of Markdown features, there are other limitations to the simple rmark parser above.

  • The regular expression approach may not be the most efficient way to parse the text
  • The regular expression is difficult to write and understand for complex syntax
  • The current approach does not strictly obey the CommonMark spec in terms of the expected HTML output

The limitations are frustrating because while the HTML generated looks almost there/mostly identical, it is not the same as the one generated by the referenced implementation. You can compare the difference with this markdown-it playground (Click on source in the right pane to view the HTML code as well).

One reason is that in the above simple implementation, the paragraph tags are added even for those that do not need them. For example, the heading tags should not be wrapped. However, to tweak the implementation to be more compliant with the CommonMark spec, the regular expression turned out to be difficult to create. If it stops adding the extra paragraph tags, it starts breaking other specs, such as not adding paragraphs for the block of lines. Additionally, the parser is likely to fail when handling nested Markdown syntax.

Conclusion

And...this is where most tutorials on (regex-based) Markdown parsers end! We are left with a simple parser that can handle a few Markdown features. Hopefully, this time it is going to be different. I am interested to find out how the edge cases can be handled, and how the parser can be made more efficient. So, I will stop here for now but continue with a more advanced implementation in the next post of the series.

References