Sunday, May 25, 2008

Let the coding begin! The Google Summer of Code 2008 coding session has begun.

I am maintaining this blog as a way of recording my progress for others to follow (and to prevent me from slacking off).

I am working on integrating Memcached with the MySQL Query Cache (mentored by Brian Aker from MySQL).

I've done some bonding through the mailing list, blogs, and IRC, but I feel I should have done a bit more. I just moved back from college to my parents' house and started working full time in Boston (wow, what a commute). I've read through the code a few times and feel I have an understanding of how it works. It compiles, tests, and runs on my dev box (Gentoo, 2.6 kernel). I've started reading the book Google sent me (SPOILER) and it is wonderful.


In reading through the MySQL codebase I've come up with a list of design decisions I will need to make in the first week of my project.

1)
Implement this as an alternative query cache storage engine of sorts by building an interface. One version would use the classic cache held in a single system's memory; the other would use Memcached. A flag somewhere would select between the two.
OR
Code this as a straight patch.

2)
How do I optimize the cache itself? Tables that update often are actually slower when cached, due to the way the cache is currently pruned. Is there a cleaner way to do this? Do I pursue this concurrently, or wait until I've finished the initial goal of Memcached support and then try to rework this?

3)
Using one Memcached cluster for several databases would be a good way to save on memory, but there are several things to consider in order to retain 'ACID' principles. Again, do I try to work on this concurrently, or be more cautious so as not to bite off more than I can chew (which I have a tendency to do...)?

4)
How much time should I leave at the end for performance testing, regression testing, etc.?
Making this work is wonderful, but if it slows down the cache further, is it really worthwhile?
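
To make the first option in decision 1 a little more concrete, here is a rough sketch of what "an interface with two versions behind a flag" might look like. Everything in it is an assumption for illustration: QueryCacheBackend, LocalMemoryBackend, MemcachedBackend, CacheBackendKind, and make_backend are made-up names, not anything from the MySQL source, and the Memcached version is only a placeholder backed by a map so the sketch compiles and runs on its own.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical interface: none of these names exist in the MySQL source.
class QueryCacheBackend {
public:
    virtual ~QueryCacheBackend() = default;
    // Remember the result text for a query.
    virtual void store(const std::string& query, const std::string& result) = 0;
    // Returns true and fills *result on a hit, returns false on a miss.
    virtual bool lookup(const std::string& query, std::string* result) const = 0;
    virtual const char* name() const = 0;
};

// Stand-in for the classic cache held in one system's memory.
class LocalMemoryBackend : public QueryCacheBackend {
public:
    void store(const std::string& query, const std::string& result) override {
        entries_[query] = result;
    }
    bool lookup(const std::string& query, std::string* result) const override {
        auto it = entries_.find(query);
        if (it == entries_.end()) return false;
        *result = it->second;
        return true;
    }
    const char* name() const override { return "local"; }
private:
    std::unordered_map<std::string, std::string> entries_;
};

// Placeholder for the Memcached version. A real one would call a Memcached
// client library; this one uses a map only so the sketch is self-contained.
class MemcachedBackend : public QueryCacheBackend {
public:
    void store(const std::string& query, const std::string& result) override {
        fake_remote_[query] = result;
    }
    bool lookup(const std::string& query, std::string* result) const override {
        auto it = fake_remote_.find(query);
        if (it == fake_remote_.end()) return false;
        *result = it->second;
        return true;
    }
    const char* name() const override { return "memcached"; }
private:
    std::unordered_map<std::string, std::string> fake_remote_;
};

// The "flag somewhere": a hypothetical setting that picks the backend.
enum class CacheBackendKind { Local, Memcached };

std::unique_ptr<QueryCacheBackend> make_backend(CacheBackendKind kind) {
    if (kind == CacheBackendKind::Memcached) {
        return std::unique_ptr<QueryCacheBackend>(new MemcachedBackend());
    }
    return std::unique_ptr<QueryCacheBackend>(new LocalMemoryBackend());
}
```

With something like this, the rest of the server would only ever call store and lookup on whatever make_backend returns, so switching caches becomes a configuration change. The straight-patch option skips this layer entirely, which is less code up front but ties the cache logic directly to Memcached.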


Goals for Monday, June 2nd:
-Communicate more.
-Have a solid timeline complete with estimated goals that I can rework at each iteration.
-Have answers to these design decisions that I can begin actively working on.
-Have real code that compiles and maybe does something (stubs and such).