Sunday, September 20, 2015

High School Programming Competition: A Series


When I was a junior in high school, I attended ACM@UVa's High School Programming Competition (HSPC).   A friend told me about the event a couple of months before it was to be held, but I did not know anyone else in my hometown who would be interested in (or even capable of) driving multiple hours and spending a Saturday looking at code on a computer screen--so, I sent an email off to the organizers asking if someone could participate as a team of one.  They said yes, and I rolled up early on a Saturday morning in March to find I was the only team with fewer than 3 people, with most teams having 4.  Several hours later, I emerged in 5th place out of the 17 teams, solving as many problems as the 3rd place team (but slower).

The thrill of competition is like nothing else; I had to do this again, but I knew I would need a team.  I spent the following summer and start of fall searching for teammates, and found one other person with programming experience and two more who loved to solve problems and had analytical thinking skill.  We spent the rest of fall and spring preparing for the competition, and returned with a 2nd place standing.

As team lead, I found that there is a surprising lack of material explaining how a team should prepare for a high school programming competition (hereinafter, HSPC).  As I now organize the ACM@UVa HSPC, I have received email and verbal feedback from coaches asking how to prepare, so I know content in this area is still lacking.  I aim to resolve this issue by writing a series of blog posts to convey the lessons I learned in preparing a team.

The advice I have will apply primarily to ICPC-style programming contests, like those held by ACM@UVa, UMD, VCU, Cornell, and others.  However, many of these concepts will cross-apply to other contest styles.

The topics I'll cover are:
  1. Finding a team - If I don't have a team, how can I grow interest?
  2. Language Selection - What language should I use, and any good libraries?
  3. Contest Environment - What will the contest computers be like?
  4. Problem Types - How do I solve this problem?
  5. Meeting Agenda - I'm stuck running this! What should I do?
  6. Resources - Where else can I get help?

At the end of the series, I'll post a list of contests organized by region and timeframe.  If you're reading this and have questions or suggestions for other topics (even if you're reading this years after I originally posted), leave a comment!

I believe that competition drives performance and is a necessary part of the educational process.  My experience in high school programming competitions was a major factor in my higher education decision process (both major and school selection) and pushed me to improve.  Because this was such a large part of my life, I want to encourage others considering this activity.

I should note that what I write here in no way reflects anything official about the ACM@UVa HSPC.

Tuesday, June 9, 2015

Eight Books on Math and Computer Science

A friend recently emailed me asking for titles of books I'd recommend to read over the summer, particularly to prepare for computer science and mathematics.  I've adapted my suggestions into this post.  I'd like to note that I've restricted my responses to "non-textbooks;" otherwise, I'd have several more additions that would increase the average page count and price quite drastically.  As such, these books don't have problems to work or present an extreme level of detail, but in many cases they present enough information to provide a strong foundation and context for math and CS classes.

From Mathematics to Generic Programming
Alexander Stepanov and Daniel Rose (on Amazon)

I will most likely write a separate blog post about this book.  I read it during the end of the fall semester and found that it presented a very interesting approach to designing reusable code by utilizing principles from abstract algebra.  It's written to be accessible by someone who hasn't studied abstract algebra yet, which means it also can serve as an introduction to that subject.


CODE: The Hidden Language of Computer Hardware and Software
Charles Petzold (on Amazon)

Four years ago, I wrote a review of this book on RoboDesigners.  At that time, my perspective was that of a high school student and I thought the book was interesting; with the additional perspective of a year of college study in Computer Science, I cannot recommend this book highly enough.

By "building" a computer piece-by-piece from the idea of a relay through developing a simple assembly language, it covers nearly all of the material as the Digital Logic Design course I took, but in an easy-to-read book.  If you comprehend the material in this book, you will be able to coast through DLD.

A Mathematician's Apology
G. H. Hardy (on Amazon) 

When a mathematician with Hardy's stature writes a book on why he studies math, it's probably advisable to read it!  Multiple professors of mine have said it's a book any mathematician should read and I wholeheartedly agree.  It's really short (the printing I've linked above is only 154 pages), but the content is amazing.  Hardy addresses the complaints many have with pure math and embodies the spirit of "doing mathematics for mathematics' sake."  If you are thinking about pursuing a theoretical route in either CS or math, I highly recommend you read this book.

The Code Book
Simon Singh (on Amazon)

I love codes--anything resembling secret or hidden knowledge has a particular allure.  Singh does a great job discussing the past, present, and likely future of codes, ciphers, and cryptography in this book.  Starting with Caesar Shift (what good historic code book doesn't?) and traveling through centuries to discuss the Vigenère cipher, RSA encryption, and the general idea of quantum cryptography, this book gave me a broad understanding of where we are at now with codes and how we got here.  It also includes the clearest description of the actual flaws in Enigma that I've ever read.

In Pursuit of the Unknown: 17 Equations that Changed the World
Ian Stewart (on Amazon)

Every equation has a story--how was this truth discovered, who discovered it, and what exists now because of it?  This book examines 17 equations and their impact on society.  If you're actually reading this blog, the early chapters will probably be old hat, but after (and including) Chapter 4 the content becomes quite interesting. 

Reading Chapter 4 helped me visualize vector fields for Vector Calculus and Differential Equations in the context of planetary movement (and it finally made me understand the age-old analogy for dense objects warping the fabric of spacetime).  The last chapter was also memorable for its description of a particular equation used in high-frequency stock trades, the misapplication of which Stewart claims was partially responsible for the 2008 recession.

The Music of the Primes
Marcus du Sautoy (on Amazon)
It is said that Gauss once asserted that "Mathematics is the queen of the sciences and number theory is the queen of mathematics."  This book outlines the history parts of this "queen of mathematics" that relate to prime numbers.  It includes mini-biographies on the people who made great breakthroughs in the search for a formula for the nth prime; obviously the story isn't finished yet, but it's a pretty neat overview of how primes are important, who made them important, and other related topics.

As someone who finds Number Theory fascinating, I'd recommend this book especially to people who like Project Euler problems--not because it will help you solve the problems per-se, but because it provides some historical background to the people who developed the equations you use to solve Project Euler problems.

To Engineer Is Human
Henry Petroski (on Amazon)

This is a wonderfully dry book--one that might bore some people, but I loved it.  (I actually read this right after reading CODE.)   Petroski discusses the role of failure in design, why the principles of engineering are intuitively "built-in" to the human brain, and how engineers must account for error in their designs.  The most thought-provoking and the single most vivid idea that has stuck with me since reading this book was a connection he drew between computerized design and an increased failure rate in new products (toys, furniture, or even buildings).

The Abolition of Man
C. S. Lewis (on Amazon)

C. S. Lewis is known primarily as the author of The Chronicles of Narnia, then as a Christian Apologist.  So then, why am I asserting that this particular book is relevant to all computer scientists?  In fact, this book is of particular interest to anyone working in a field of applied science.  It provokes thought on why we're solving the problems that we are and why we're even interested in the sciences.  In particular, it discusses the effect of a morality outside ourselves on the purpose of science; the third and final chapter paints a vivid picture of what happens when we reduce mankind simply to "nature."  This book bears re-reading, and perhaps someday I'll write another blog post on why every scientist (applied or not) should read this book.


Thursday, July 17, 2014

Factorization and Number of Divisors

How many divisors are there of the number $1281942112$?  It turns out that determining the answer to this problem is (at most) only as difficult as determining the prime factorization of the number.  In this blog post, I will outline a solution to this (and similar) problems.

The Math

The Fundamental Theorem of Arithmetic guarantees each positive integer greater than $1$ a unique prime factorization.  We write this factorization as:
$$N = p_0^{e_0}p_1^{e_1}\cdots p_n^{e_n}$$
where $p_k$ is a prime number, and $e_k$ is its corresponding exponent.  This provides us with useful information regarding divisors of $N$: any divisor of $N$ must be comprised of some combination of those prime factors (and exponents).  Specifically, we can define the divisor, $D$, as:
$$D = p_0^{a_0}p_1^{a_1}\cdots p_n^{a_n}$$
where the $p_k$ are the same as in the factorization of $N$ and $a_k \in \{0, 1, \ldots, e_k\}$.  To find the total number of divisors, we multiply together the number of options we have for each exponent.  That is,
$$\text{Number of Divisors}\; = (e_0+1)(e_1+1)\cdots(e_n + 1)$$

Example:  Consider $N = 20$.  In this case, $N$ has $6$ divisors; to determine this without needing to list them all, we may note that $N = 2^2\cdot 5^1$.  Using the notation described above, this means that $p_0 = 2,\;p_1 = 5$ and $e_0 = 2\;e_1 = 1$.  Each of our divisors will be of the form $2^{a_0}\cdot 5^{a_1}$, where $a_0$ could be $0, 1,$ or $2$ and $a_1$ could be $0$ or $1$.  Since we have $e_0+1 = 3$ options for $a_0$ and $e_1+1 = 2$ options for $a_1$, we have $3\cdot 2 = 6$ total divisors.  In case you were wondering, the list of divisors is:
$$\{2^0 5^0, 2^1 5^0,2^2 5^0,2^0 5^1,2^1 5^1,2^2 5^1\}$$

The Program

We're not out of the woods yet--we have a formula, but we need to write a program to make use of it.  The first thing our program needs is a list of primes.  I'm going to assume you have a function already that can generate a list of primes.  A prime-listing function is an important tool in any programmer's toolkit, but I'll save that for a future post.

The pseudocode for our program is below:
numberOfDivisors: int N -> int divisorCount
  divisorCount = 1
  for (p = 1 to floor(sqrt(N)) && p prime):
    exponent = 0 

    //Determine exponent of p in prime factorization
    while (p divides N):
      exponent++
      N = N / p


     //Update divisorCount
     divisorCount = divisorCount * (exponent + 1)


    //In this case, there is one prime factor greater than the square root of N
    if (N != 1) divisorCount = divisorCount * 2


  return divisorCount

This is mostly straightforward: We iterate through all prime numbers less than the square root of N.  For each prime, we determine how many times it divides N--this is that prime's exponent.  We then multiply the current divisor count by one more than the exponent.  I have pushed an update to my math GitHub repository that includes a Java version of this algorithm in NumberTheory.java.

If we kept track of which primes divide $N$ (for example, adding them to a List whenever we enter the while loop) this program is easily modified to output the prime factorization of a number.

The Analysis

Before analyzing the performance of the algorithm, it would be best to explain why we only need to use primes less than $\sqrt{N}$, not primes less than $N$.  This is because there can only be one prime factor of $N$ greater than $\sqrt{N}$, and (if there is one) it must have only be raised to the $1$st power.  A proof by contradiction works well here (I'm skipping some rigor, please don't kill me):

Assume that there are two prime factors (not necessarily unique) $p$ and $q$ of $N$, such that $p,q \gt \sqrt{N}$.  Let the product of the remaining prime factors be some integer $m$.  Then we have:
$$\begin{align}
N &= p\cdot q\cdot m \\
&\le p\cdot q \\
&\lt \sqrt{N}\sqrt{N}\\
&\lt N\end{align}$$
This is clearly a contradiction, thus we have proven that there cannot be at least two prime factors of $N$ greater than $\sqrt{N}$.  Equivalently, there may only at most one prime factor of $N$ greater than $\sqrt{N}$.

This explains why we don't need to use primes greater than $\sqrt{N}$: if, after "dividing out" all primes less than $\sqrt{N}$, we are left with a number, then that number must be the single prime factor of $N$ greater than $\sqrt{N}$.

On to the performance of the algorithm.  Assuming we are given a list of prime numbers (and don't have to compute them), this procedure has a time complexity of $\mathcal{O}\left(\pi\left(\sqrt{N}\right)\text{lg}(N)\right)$ and $\Omega\left(\pi\left(\sqrt{N}\right)\right)$, where $\pi(x)$ is the prime counting function and $N$ is the input number.  ($\Omega$ provides a lower bound, and $\mathcal{O}$ provides an upper bound.)  Let's see why.

We have an outer-most for loop that makes one iteration for each prime less than $\sqrt{N}$.  This gives us the "$\pi\left(\sqrt{N}\right)$" part of the bounds.  For the lower bound, we would assume the inside of the for loop executes in constant time, every time.  (That is, we never enter the while loop.)  This occurs when $N$ is a prime number.  For the upper bound, we may assume that we execute the while loop at most $\text{lg}(N)$ times each iteration.  This is because $\text{lg}(N) = \log_2(N) \gt \log_b(N)$ for $N \gt 2$ and integer $b\gt 2$, and $\lfloor\log_{p_k}(N)\rfloor$ provides a fairly close upper bound on the exponent of $p_k$ in the prime factorization of $N$.

The upper bound can be improved by performing some summation and simplification, but it's close enough to show how fast this algorithm is.

Monday, July 14, 2014

How to Learn Haskell

To grow my programming repertoire, I decided to learn a functional language; at the recommendation of a friend, I selected Haskell. Thus far, it seems great.  As a mathematician at heart, I love the way that the notation and language constructs resemble math (list comprehensions, tuples, function composition, etc).  In this blog post, I will outline the major resources I am using to learn Haskell.

To learn Haskell, I am using the ebook Learn You a Haskell for Great Good.  Yes--terrible grammar in the title, but it's (fairly) grammatically correct on the inside.  This is a great introduction to Haskell, although I'd highly recommend prior knowledge of another programming language like Java or C++.

Unfortunately, that ebook is somewhat lacking in practice problems.  It does have examples, but there isn't a true "exercise" section like one would find in a textbook.  This is a common fault with online programming tutorials; to be honest, creating a good exercise set is hard work.  To remedy this problem, I turned to a favorite site of mine, HackerRank.com.  While designed for competitive programmers, this site also has an "introductory" set of functional programming challenges (see here).  These range in difficulty from very easy to extremely hard.  This provides a great compliment to the tutorial I referenced above.

Finally, one last resource I am going to use after finishing Learn You a Haskell is a set of lectures by former University of Virginia student-teacher Nishant Shukla.  Although I have not been able to view them in great detail, they present a great introduction to Haskell.

Sunday, June 22, 2014

Sad news: Project Euler

As many in the Competition Programming circuit already know, the amazing website ProjectEuler.net was taken offline on June 15, 2014 due to a security risk. Fortunately, the problem list was restored and is available, but account functionality has been disabled.

If you have an account on Project Euler, it is highly recommended to change reused passwords on other websites.

EDIT (in case anyone stumbles upon this in the future): The good news is that Project Euler is up and running again!  The bad news is that I managed to lose my login credentials in the shuffle, so I'm starting over.

Monday, June 9, 2014

"Big-O" notation: An Introduction to Asymptotics of Loops

Algorithmic efficiency is imperative for success in programming competitions; your programs must be accurate and fast.  To help evaluate algorithms for speed, computer scientists focus on what is called  "asymptotics," or "asymptotic analysis."  The key question answered by asymptotics is: "When your input gets really big, how many steps does your program take?"  This post seeks to explain basic asymptotic analysis and its application to computing simple program runtime.

The underlying principle of asymptotic analysis is that a program's runtime depends on the number of elementary operations it performs.  The fewer elementary operations, the faster the program (and vice-versa).  What do I mean by "elementary operation?"  By this, I refer to any operation such that the runtime is not affected by the input size.  This is more commonly referred to as a constant-time operation.  Examples of such operations are assignment, basic arithmetic operations (+, -, *, /, %), accessing an array element, increment/decrement operations, function returns, and boolean expressions. 

A First Example

So, a good way of gauging the runtime of a program is to count the number of elementary operations it performs.  Let's jump right in by analyzing a simple program. 

public static int test(int N) {
  int i = 0;

  while(i < N) {
    i++;
  }
  return i;
}

Obviously, this program always returns N, so the loop is unnecessary.  However, let's just analyze the method as-is.

Lines 2 and 7 each contribute one constant-time operation.  The loop contributes two constant-time operations per iteration (one for the comparison, one for the increment), plus one extra constant-time operation for the final comparison that terminates the loop.  So, the total number of operations is:

$$1 + 1 + \underbrace{\sum_{i = 0}^N 2}_{\text{loop operations}} + 1 = 3 + 2N$$
(Notice how I used sigma (summation) notation for counting a loop's operation. This is useful, because loops and sigma notation behave in much the same way.)

Thus, it will take $3+2N$ operations to perform that method, given an input $N$.  If each operation takes $2\times 10^{-9}$ (about the speed of a 2 GHz processor), it would take 5 seconds to run this program for an input of $N=10^10$.

Let's make that easier...

That was a lot of work for such a simple result; is there an easier way to get a similar answer?  Fortunately, the answer is yes!
First, let us introduce something that we will call "Big-O notation."  This is a way of describing the long-term growth of a function.  The rigorous definition of Big-O is beyond the scope of this blog, but the following should suffice:
We say $f(n)$ is $\mathcal{O}(g(n))$ if and only if a constant multiple of $g(n)$ is greater than $f(n)$, when $n$ is sufficiently large.  Simply put, this means that, in the long term, $g$ grows as fast or faster than $f$.
As an example, we can say that $f(n) = 3n+2$ is $\mathcal{O}(n)$, because the function $g(n) = n$ grows exactly as fast as $f(n)$.  Or, we can say, $f$ is $\mathcal{O}(n^2)$, because $n^2$ grows faster than $f$, for sufficiently large $n$.  Basically, this means we can ignore two things:
  1. We can ignore anything that is "small" in the long-term.  For example, if $f(x) = 4x^3 + 2x + 3 + \frac{1}{x}$, everything except the "$4x^3$" part becomes small (in comparison) as $x$ gets big.
  2. We can also ignore coefficients.  That is, we don't have to worry about the difference between $4x^3$ and $x^3$.  As $x$ gets really big, the two graphs are so close that it doesn't really matter.

To apply this to algorithm analysis, this means that we only have to worry about the "biggest time-user," rather than all the individual steps.  For most simple programs, this means focusing on loops.  (In advanced problems, you must account for recursion.)

Next, we recall that a single loop can be represented with a single summation sign.  One can fairly quickly see that a nested loop can be represented with a "sum of sums," or multiple, nested summation signs.  It can be proven that:
$$\underbrace{\sum_N\left(\sum_N\left(cdots\sum_N f(N)\right)\right)}_{k \text{ summation signs}} = \mathcal{O}(N^k\cdot f(N))$$

Important Result:
Interpreted into programmer-speak, this means that a program with nested loops (each executing ~$N$ times) to a maximum depth of $k$ will take $\mathcal{O}(N^k)$ operations to complete said loops. 

Another Example

So, let's apply this idea to a bit more complicated program:

public static int test(int N) {
  int total = 0;
  
  for (int i = 0; i < N; i++) {
    for (int j = i; j < N; j++) {
      total++;
    }
  }
  
  return total;
}
Now we have a nested loop!  Looking at this program, we realize that the "deepest" nesting is only $2$ deep.  Thus, by our important result, we know that this program runs in $\mathcal{O}(N^2)$ time.

This means, that as $N$ gets very large, doubling the input will result in a quadruple increase in runtime.

Other Notes

Obviously, there are more cases that can arise in algorithm analysis, instead of the simple loops given above.  For example, recursion and atypical loops (e.g. loops that double the counter each iteration, rather than adding one) require other methods than the "Important Result" I gave here.  Fortunately, there are a few common designations that arise:
$$\mathcal{O}(\log_2(n)),\;\mathcal{O}(n^k),\;\mathcal{O}(2^n),\;\mathcal{O}(n!),\;\mathcal{O}(n^n)$$
I will note that I have written the above in increasing order of runtime.  That is, an algorithm that runs in $\mathcal{O}(\log_2(n))$ is faster than one that runs in $\mathcal{O}(2^n)$, etc.

One can spend many hours studying asymptotic calculations.  In fact, there's an entire chapter devoted to this in Concrete Mathematics by Graham, Knuth, and Patashink.  (I highly recommend this book to anyone interested in programming; it is, quite literally, the best book I have ever opened related to computer science.)  For a thorough guide of the application of asymptotic calculations to programs, I recommend consulting a good Algorithms and Data Structures text.

Thursday, June 5, 2014

Green's Theorem and Area of Polygons

I am an avid member of the Math.StackExchange community.  We have recently reached a milestone, as our request to create a site blog has been approved by the Stack Exchange administration. I volunteered to write a post which I believe should be useful to competition programmers.

Using Green's Theorem, this post derives a formula for the area of any simple polygon, dependent solely on the coordinates of the vertices.  This is useful for some computational geometry problems in programming; for example, the formula can be used to compute the area of the convex hull of a set of points.