14 posts tagged with "SOC"

Prove that every sorting algorithm must make at least lg(n!) comparisons

January 16, 2022 · 5 min read

P.S. My proof attempt/rewrite based on the solution discussed, uploaded mainly for my own reference.(warning: possibly erroneous)

Claim: Every sorting algorithm must make at least lg(n!) comparisons

the log here is of base 2
Prove by making an adversary argument

When we talk about running time, input size is an important consideration.
The adversary wants to make life difficult for the algorithm, hence it will make choices that are against the algorithm's interest.
Suppose we are given an array of n integers (input size = n), there are n! unique permutations of the set {1, ..., n}.
- They are unique because if they are the same, then sorting the same array will take the same steps.
The adversary could chose any of the permutations. It does not fix the input in advance in the hope that if the algorithm makes too few comparisons, there will be an example of how the algorithm fails to sort the array. In particular, there will be two inputs that are treated the same from the algorithm's perspective. And yet, one of them will not be correctly sorted.
What the (sorting) algorithm can do is to issue a comparison query to ask whether the integer at the ith position is larger than the integer at the jth position.
The adversary takes the following strategy to respond:
- It looks through the current set of permutations(P), makes the comparison as required and takes note of the number(X) of permutations in P that satisfy the query i.e the ith integer is indeed larger than the jth integer.
- It takes n comparisons in the very first run, since there are n permutations in P.
- Suppose the number(X) of permutations that satisfy the query is larger or equal to the total size of the permutations / 2, the adversary returns YES and update the permutation set(P) to the satifying set.
- Else (meaning less than half of the permutations satisfy the comparison query, more than half does not satisfy the comparison query), the adversary replies NO and set the permutations(P) to the larger set of permutations that does not satisfy the comparison query.
- Either way the set of permutations to be examined next decreases. With the above strategy, the adversary is able to decrease at a slower rate, as it always try to keep at least half of the permutations. This makes sense because the goal of the adversary as mentioned earlier, is to come to a situation where there are more than one permutations left.
Since the permutations decreases by at most half for every comparison query issued by the algorithm, there are at least two distinct permutations left. The algorithm will re-order both of the permutations in the same way, and must make one mistake. As all permutations are distinct, it is not possible to sort and re-order two different permutations in the same way to get the correct output.

Concrete Example

n = 3
current set of permutations = {[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]}
- 6 different permutations
The algorithm will make less than lg(3!) = lg(6) = 2.584963 = ceil(2.584963) = 3 comparisons.
- meaning it asks at most 2 comparison queries
- trying to sort 3 items with 2 comparisons is not always possible (but ok for finding maximum)
The algorithm asks(0 indexed): is 0th > 1st?
The adversary checks the current set:
- [1, 2, 3] No
- [1, 3, 2] No
- [2, 1, 3] Yes
- [2, 3, 1] No
- [3, 1, 2] Yes
- [3, 2, 1] Yes
X = 3 (count of permutations that satisfy the comparison query)
The adversary answers Yes (Or No, but it does not matter)
The current set becomes {[2, 1, 3], [3, 1, 2], [3, 2, 1]}
The algorithm asks(0 indexed): is 1st > 2nd?
The adversary checks the current set:
- [2, 1, 3] No
- [3, 1, 2] No
- [3, 2, 1] Yes
X = 1
The adversary answers No (to go against the algorithm's interest)
The current set becomes {[2, 1, 3], [3, 1, 2]}
The algorithm now makes a decision about how to sort the array, given the response.
To recall,
- 0th > 1st is True
- 1st > 2nd is False
- So the algorithm does not know whether 0th is larger or smaller than 2nd. Suppose the algorithm choose 0th to be larger and arrange as such: 0th > 2nd > 1st
In the available set, [3,1,2] will be sorted correctly. However, the set [2, 1, 3] does not.
- [2, 1, 3] becomes [2, 3, 1], which is still not sorted!
The algorithm fails at sorting with less than lg(n!) comparisons.

End of University Year 2 Semester 1

December 12, 2021 · 6 min read

Thoughts

This semester has been intense. Much of the stress I attribute to the heavy workload of CS3216. It has indeed lived up to its reputation and I am just glad that it's over. Other than that, I am happy with my selection of modules this semester. Many of them turned out to be modules that I would recommend or won't mind doing. Note that my reviews below are not meant to be comprehensive but just my random thoughts.

Module Review

CS2102: Database Systems

Before taking this class, I had a surface understanding of SQL and databases based on my previous experiences doing web development work. There was a time when I first started working with MySQL and had to create database schemas, I was lost and so wanted to take a proper introductory module about databases. Now that I have taken this introductory-level database module, I do appreciate database systems more and have a better idea of the syntax and the power of SQL. In this module, I learned about the design and analysis of relational databases. We started with learning about ER diagrams to capture the requirements of a database application. With that, we explored SQL statements to define the table structures and query from the database. Advanced features such as triggers and functions were also covered. For the second half of the semester, we looked into analyzing the redundancy and dependency preserving properties of database schemas. Functional dependencies, BCNF, and 3NF were also highlighted. I went through the first half of the semester barely wrote a single line of SQL. That was a poor decision and I only came to appreciate SQL better when I started to work out the lecture examples and tutorial exercises. Overall, I would recommend this module as a great source of general knowledge about databases.

CS2105: Introduction to Computer Networks

The computer network is something that I often treat as a black box. In rare times when I had to fix my internet connection or try to configure my router, I realized that I hardly know what went on in these much-needed devices. This module does a quick overview of the five layers of the internet protocol stack. The content is not math-intensive nor difficult to understand. The assignments could take a while to figure out how to handle the input/output as we had to create programs that somewhat simulated how UDP/TCP server operates. But, they are at a reasonable difficulty and could be solved before the deadline. An important concept that I learned from it is about the reliable transfer of data. Given the unreliable channels of communication, how can we create a protocol that delivers data efficiently, securely, and without errors? It is amazing to me that mechanisms such as sequence number, ACK/NAK, a timer can provide a simple solution to these hard problems. Of course, the details are more complicated in real-world systems. I feel that this module provided me with a rough idea of how certain difficult problems have been solved in practice. For example, how IP addresses can be used to identify the hosts in the network across the globe. Overall, I would say this module's workload is reasonable and not extreme. While content can be technical and plain at times, I do think that it's good to know about them even if you don't remember all the details afterward.

CS3216: Software Product Engineering in Digital Markets

I have posted my reflection about this module here. To say the least, I think different people will experience this module differently. It was a good challenge that I decided to take on and I was fortunate enough to participate. Even though this is a 3K CS module, I learned more non-technical knowledge from the prof, the TAs, my peers, and the various invited speakers. Overall it was a good run and apply if you want a challenge too.

CS3243: Introduction to Artificial Intelligence

I borrowed the textbook for this module and indeed CS3243 only covers a selection of topics in Artificial Intelligence. I find this module interesting as I am sure many will agree, simply because it's just cool to learn how we are going to build intelligent agents with code (and also the hardware). Starting from searching with heuristics to Constraint Satisfaction Problems (CSP), I learned about how solutions can be discovered by representing them well. Later topics such as MDP and Q learning showed me a way that we can build programs to explore and learn. The projects were fun to work on and the course was well-organized. Overall I will recommend this module!

CP3108A: Independent Work

I took this module as a way to consistently contribute to Open-Source projects. In my case, I worked on MarkBind, a tool for generating content-heavy websites from source files in Markdown format. It is also the tool behind the course website for CS2103T. I have my progress report available here. I am happy that I took this module and was given an excuse to properly learn an unfamiliar repository and help to resolve pending issues. It could be somewhat boring if you just want to build features. I spent most of my time trying to understand the code and experimenting with solutions to solve an existing bug. It was also a good practice for me to write my PR descriptions properly and document my code well. Some fun that I had doing this module includes finding out how absurd the idea of Open-Source can be. While we all know the collective effort can bring us to greater heights, relying on someone's passion to contribute without any potential reward is almost against human selfish nature. There were a few times that I ran into old issues in OSS repositories where the author had moved on and the rest of the world still keep on asking questions and wanting help with certain packages. It is not uncommon to see abandoned packages lying around. The workload for this module is reasonable as Prof Damith who is in charge of monitoring your progress provides a frequent update to let you know how much work is needed to fulfill the requirements. Overall, I will recommend taking this module if you want to explore OSS.

GEQ1000: Asking Questions

It is a required university-level module so I took it to clear my graduation requirements. While I admittedly did not spend much time going through the materials, I would say that the module is well structured and conducted professionally. Topics from different disciplines are covered to promote learning about all aspects of questioning. There are many pre-recorded video series to watch as part of the weekly lecture (it seems fine to skip them). The bi-weekly in-person tutorial is very manageable. The tutor for my group was able to facilitate the lesson well. Overall I think this module is low maintenance and well organized, suitable to take it in a semester of high workload (this module is pass-fail-based).

CS3216 "The Last Lecture"

December 2, 2021 · 5 min read

A few years ago, I chanced upon the video "The Last Lecture" given by Randy Pausch. Randy, a computer science professor at Carnegie Mellon University, delivered a lecture titled "Really Achieving Your Childhood Dreams" in which he talked about his life lessons. The one thing that I took away from his talk was his point on failure and setbacks. In particular, I printed a photo of a brick wall and taped it on the wall right in front of my desk to remind me that:

"Brick walls are there for a reason: they let us prove how badly we want things". I was, therefore, excited to enroll in CS3216, a module modeled after a course taught by Randy. Now it's over => finally time to review and reflect! If you don't know what this module is about, see here for the official course website and assignment details. Below I use "A1", "A2" etc to denote the four major module assignments (again, full and official details are available here).

A1

The assignment required us to find a problem worth solving and design a solution for it. Even though we do not have to implement the solution, it was not a breeze. In a short amount of time, we have to (as a team) decide on a probable idea and validate it with the target audience. I have never done so many user interviews before and I have to say that the whole experience drives home the point that users matter. Defining a problem that needs to be solved and having potential customers that are ready to give advice and feedback are not that easy. But, they are truly valuable. Taking our team's project on improving the module registration workflow as an example, our testers pointed out various blind spots in our design, where we thought that a certain feature would be useful to the target audience but none of them actually noticed it during testing. These 'talk to your users" practices made me realize that when we build products for other human beings, it makes no sense to not listen to them attentively. I may have overlooked the users way too often in pet projects or assignments of school modules. A1 was a much-needed reminder.

A2

In this assignment, we had to identify an innovation on the market and make a presentation for it. The added challenge is to do it the Pecha Kucha way (20 slides, auto-advance every 20 seconds). While this is certainly a test of our presentation skills, the technologies that we discussed and heard from our peers were what I found to be interesting. Our team gave a presentation on Neuralink, a "mind-blowing" piece of technology that aimed to create a human-computer interface to drastically improve the lives of paralyzed patients. During the research, I realized that even though the idea of an implant that inserts threads into areas of the brain can deter many of us, the methodical approach with the help of precision surgical robots gave me great confidence that the future may be wild. It was refreshing to look at the current work done by others in the industry and ponder about what counts as innovative today.

A3 & A4

A3 is an assignment that required us to build a progressive web application, from "front" to "back". Not only do we have to think fast, but we also need to act even faster. In retrospect, I think this assignment is all about execution. Within the short 2-3 weeks, the team needs to produce an application that works with a database and handles authentication, among other things. Personally, I got to work on the backend and played around with Django REST Framework. Knowing web development and being familiar with the relevant tools beforehand are going to be helpful.

Following the end of A3, we moved on to our final project phase. A4 is an open-ended challenge to solve a problem with a (creative) solution. I may have hinted in my previous paragraphs that finding the right problem to solve is hard, but an innovative solution is equally hard to come by. I am not going into the details about these two assignments because, at this time in the semester, the fatigue and the workload started to drain all my energy. Jokes aside, I think my experience may not generalize well. Of course, I learned about how to work in a team and deliver a product. But, more importantly, I learned that life is more than CS3216/School/things at hand.

Guest Lectures & Writing assignments

The weekly lectures and writing assignments turned out to be rather enjoyable. I learned many things from the various guest speakers. One notable aspect of it is how "CS" or technical people view problems and optimize for a solution. I joined in the lament that "people problem" is the harder problem to solve. On writing assignments, it was fun to try, over the semester, to appeal to Uncle Soo's taste and get that coveted full marks for each writing.

Final thoughts

Just like the last lecture given by Randy, CS3216 is all about head fakes. What you will learn will be drastically different from the intended outcome. Coming into this module, many people, myself included, were looking forward to practicing and gaining technical skills in a team setting. In the end, this module may have disguised itself as a project module, but it is not all about the projects you build, it is all about YOU.

P.S This essay was written as part of my final writing assignment. All I can say is...What a module!

Claim: Every sorting algorithm must make at least lg(n!) comparisons​

Concrete Example​

Thoughts​

Module Review​

CS2102: Database Systems​

CS2105: Introduction to Computer Networks​

CS3216: Software Product Engineering in Digital Markets​

CS3243: Introduction to Artificial Intelligence​

CP3108A: Independent Work​

GEQ1000: Asking Questions​

A1​

A2​

A3 & A4​

Guest Lectures & Writing assignments​

Final thoughts​