Saturday, September 06, 2008

And with one final click, the project is complete.
I've uploaded the source of my project to Google Code.
This whole project has been quite the experience.

I plan on continuing this blog, but don't expect timely updates.
I'm doing grad work full-time at WPI (www.wpi.edu) as well as some part-time work.

Sunday, August 10, 2008

IT WORKS!!!

And not a moment too soon.
I've finished my day job for the summer and have been poring over the code the last few days trying to isolate the last two bugs.
I've also received word that I don't have 8 days left, I have 1...so I'm a bit rushed but still hopeful.

As part of my bug fixing for the hit code (which works beautifully now) I was able to remove all of the unnecessary mutex locking from within the cache. This is a nice speed boost as there is no longer any need to lock and unlock when working with the cache. It just works (tm).

I have a fairly rough implementation of the invalidate code that looks something like this:

if (anything changed) {
    nuke_the_cache();
}

It works, and it allows for multiple instances to use the same cache.
My next task is to make this a bit more refined (per table invalidation) by using 'namespacing' within memcache. The idea works like this:
For each table, store a "0" in memcache with 'db_name:table_name' as the key.
Each time a table changes, increment that key.
When formulating the key for a query result, include each relevant get('db_name:table_name') in the key hash.

The nice thing about this approach is no matter how much data is in the cache,
invalidating is always O(n) where n is the number of tables involved.
The previous query_cache could slow down a MySQL instance in a write-heavy environment due to searching for and invalidating entries in the cache when a table changed.
With this implementation, everything is non-blocking and when a table changes a quick call to 'memcached_increment()' is all that's needed. Memcached will remove stale entries after they've expired, since once the namespace has changed they will never be read again.
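The namespacing idea can be sketched in a few lines of C++. This is only a rough illustration, not the code from the project: Fake_memcache is a hypothetical in-memory stand-in for a real memcached client (a real build would go through libmemcached), and query_key() and table_changed() are names made up for this example. Unlike real memcached, the stand-in's increment() creates a missing counter instead of failing.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory stand-in for a memcached client (assumption:
// the real project would call libmemcached here instead).
class Fake_memcache {
public:
    void set(const std::string& key, const std::string& value) { data_[key] = value; }

    // Copies the stored value into *value and returns true on a hit.
    bool get(const std::string& key, std::string* value) const {
        auto it = data_.find(key);
        if (it == data_.end()) return false;
        *value = it->second;
        return true;
    }

    // Treats the stored value as a decimal counter and adds one.
    // Simplification: a missing key is treated as "0" rather than an error.
    std::uint64_t increment(const std::string& key) {
        std::string current = "0";
        get(key, &current);
        const std::uint64_t next = std::stoull(current) + 1;
        data_[key] = std::to_string(next);
        return next;
    }

private:
    std::unordered_map<std::string, std::string> data_;
};

// Builds the cache key for a query from its text plus the current version
// counter of every table it reads ('db_name:table_name' keys).
std::string query_key(const Fake_memcache& mc, const std::string& sql,
                      const std::vector<std::string>& tables) {
    std::string material = sql;
    for (const std::string& table : tables) {
        std::string version = "0";
        mc.get(table, &version);  // stays "0" if the counter is missing
        material += '|' + table + '=' + version;
    }
    return "q:" + std::to_string(std::hash<std::string>{}(material));
}

// Invalidation is one increment per changed table; nothing is searched
// or deleted, old entries simply stop being reachable.
void table_changed(Fake_memcache& mc, const std::string& table) {
    mc.increment(table);
}

// Usage: a write to one table makes the old result unreachable.
void usage_example() {
    Fake_memcache mc;
    mc.set("shop:orders", "0");
    const std::vector<std::string> tables = {"shop:orders"};
    const std::string before = query_key(mc, "SELECT * FROM orders", tables);
    mc.set(before, "cached rows");
    table_changed(mc, "shop:orders");
    const std::string after = query_key(mc, "SELECT * FROM orders", tables);
    std::string rows;
    assert(after != before);
    assert(!mc.get(after, &rows));  // the new key has nothing cached yet
}
```

The stale entry stored under the old key is never deleted explicitly in this sketch; as described above, it is left for the cache to expire on its own.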

It looks like I will not have time to refactor this into a plug-in during the Summer of Code timeline. But I've met and exceeded almost all of my other goals, so I'm not too worried.
I'll have a tarball, a diff, and hopefully a checkin with bzr done tomorrow to prepare for my final review.

Best of luck to all!
We're almost done!

Monday, July 07, 2008

IT WORKS!!!

I've been stressing for the first day of midterm review and accomplished quite a bit.
I've managed to work out some of the bugs to the point that it now caches!
It doesn't hit yet but I might just be able to pull that off by the end of the week.

I've hit my milestone and I'm much more confident about the review this week.
I haven't heard back on my pre-review review from my mentor, but with the July 4th holiday weekend I really can't blame him; I celebrated too.

Went to a local amusement park and visited with my friend Eric who I haven't seen much of.
Definitely need to brush up on my air-hockey skills after this project is completed.

Just eight more weeks to completion!

Thursday, July 03, 2008

I finally got the Eclipse debugger to play nicely with MySQL!!!

Steps:
make clean
./configure --with-debug (I also added -g to my flags, though it's not required)

then under Eclipse I did a 'make all'
used --gdb --one-thread when running
be sure to close the 'registers' pane or else MySQL will crash and require a kill -9

I can now step through my code and see it in action,
it really is doing just what I thought it should!
(though I found the MySQL GUI tools do a lot of things I didn't know about)

I'll be submitting an early review to my mentor tonight.
I'm not too familiar with submitting a diff so I'll probably attach the full source files as well in case I mess it up.
As of tonight I have surpassed 100 hours working on this project!
In a way I wish all this really was for JUST a t-shirt; adding money to the mix is both an incentive and a curse. I can always continue working on the project even if I don't make the cut, though... :)
I just hope my results add up to the effort I've put in.

Monday, June 30, 2008

As midterm reviews approach I've been working on cleaning up any and all messy hacks so I can have a nice presentation. Last week I started using the Makefile.am instead of bash scripts. This week I tried cleaning it up further so that libmemcached is compiled automatically as needed instead of as a pre-installed dependency. Now any developer can download and compile my modifications without having to jump through hoops.

I'm beginning to feel as though the build tools are more difficult to edit than the code itself.
After numerous 'unable to find header' errors I think I have finally fixed every INCLUDE declaration necessary. This has put me a bit behind for the midterm review and caused some stress, but it should save me a great deal of time overall.

Had my first segfault while working on the project last week, and it was a pain to track down without a debugger. The Eclipse debugger doesn't play nicely with MySQL on my machine at the moment. I've been poking around with NetBeans as well, though I haven't tried debugging with it yet.

I am still shooting for being able to cache and hit by next week.
That is my goal for a "Hello World" of sorts.

The one question that worries me on the review sheet is along the lines of "have they earned $2,000?". I have a great deal of difficulty associating time and code with money. I've been coding since middle school... I code in my spare time, I code for fun, I code at work, but coding and money aren't linear in my mind. If you based it just on hours spent, I don't think I've put 200 hours into this project yet, though I am getting close.

I worry too much, back to coding!

Tuesday, June 24, 2008

Alright! Thanks to my mentor Brian I now have a semi-clean build procedure in place.
I just type 'make' and it compiles both libmemcached and mysql together nicely.

The previous setup was rather hackish and involved some bash scripts I'm not proud of, but it worked. I suppose since Google sent me a book on how to write Beautiful Code, I should really be writing code accordingly.

Just two weeks until 'midterm review' ... *panics*
I feel like I am making good progress despite my lack of time.
I'd really like to have something functional in the next two weeks, even though bugs are inevitable.
My mentor said that it was important to find my "Hello World!", as he put it, when working on this project. I'm not sure how to define that, though. It compiles, it runs, it connects to memcached, but is that Hello World worthy enough, or am I worrying too much?

I'm going to stick to my goal of trying to have it at least try to cache and hit before the review, but I will likely fall short.

I should blog more. The weekly updates are nice, but a more frequent update cycle would help to keep me on track.

Happy coding!

Monday, June 16, 2008

I'm finally starting to feel at home with my build environment.
I have my timeline, I have my tools, and I have motivation.
I've gotten Eclipse/CDT working just right, I'm making small iterative improvements, compiling, debugging, and then repeating.

It doesn't quite feel like a Monday: I worked from home today and had a nice long talk with a friend of mine this evening. Everything seems too at ease for a Monday. I only hope they'll forgive me for turning in my report a few hours late ... again.

The code:
-Compiles!
-Runs!
-Doesn't cache or hit.

But there is definite progress: the framework is all laid out, and now I just need to fill in the methods and tie in memcache. I should have working software ready for midterm review, and even better software for the final review. It would be nice to see this code in production someday.

Monday, June 09, 2008

I attended the Boston MySQL meetup today and it was wonderful.
(Thanks to Sheeri Cabral for the invite)
Food, beverages, and swag were provided for free and the presentation was great.
The talk went into great detail about database backups and was one of the most practical talks I have attended.

Coding is going well, just wish I had more time to do so.

I think I might actually submit a report on time today!

Sunday, June 08, 2008

I've accomplished quite a bit in the last week.
On top of my day job, I cleaned out my basement, fixed my car, and got Xen working on my server.

As for the project, I feel I have a solid understanding of how the current cache works and exactly what needs to go into the interface. I'm a bit worried about the invalidate_by_MyISAM_filename stuff because I'm not familiar with the storage engine code, but everything else appears to be smooth sailing.

I've solved the interface problem I was researching by avoiding it completely.
With only two cache backends, an interface in comments alone is sufficient at the moment; a simple #define can switch between them at compile time without any performance hit. This can fairly easily be improved upon by future developers so long as I document appropriately.
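As a rough sketch of what a compile-time switch like that could look like (the class names, the name() method, and the USE_QUERY_MEMCACHE flag are all made up for this example and are not the real MySQL symbols):

```cpp
#include <cassert>
#include <string>

// Both backends promise the same methods "by comment" only; there is no
// shared base class and therefore no virtual-call overhead.
// Interface: const char* name() const;

class Local_query_cache {
public:
    const char* name() const { return "local"; }
};

class Memcache_query_cache {
public:
    const char* name() const { return "memcached"; }
};

// Hypothetical flag: build with -DUSE_QUERY_MEMCACHE to pick the
// memcached-backed class; otherwise the classic local cache is used.
#ifdef USE_QUERY_MEMCACHE
typedef Memcache_query_cache Active_query_cache;
#else
typedef Local_query_cache Active_query_cache;
#endif

// Calling code only ever names Active_query_cache.
std::string active_backend() {
    Active_query_cache cache;
    return cache.name();
}

// Usage: the caller is identical whichever backend was compiled in.
void usage_example() {
    const std::string backend = active_backend();
    assert(backend == "local" || backend == "memcached");
}
```

The trade-off is that nothing but discipline keeps the two classes in sync, which is why the documentation matters so much here.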

I've got a skeleton Query_memcache class going with all the necessary methods defined and commented with what needs to be done. Since not everyone will want to compile memcache into MySQL, I'm thinking of expanding on the existing Query_cache #defines to integrate the memcache stuff. I've looked into how to write a plug-in but I'm hesitant to go that route yet. If I can refactor working code at the end of the project without breaking anything, then I might submit it as a plug-in as well.

I still need to figure out exactly how MySQL starts up and initializes everything so I can add in a few things to init my code. The tools cat, grep, and less only get one so far...

4 weeks till midterm evaluations and I feel worlds better after this week.
I really feel like I'll make it now.

My short term goal is to have working code that compiles cleanly, connects to a memcached server, and can cache things without breaking anything.
Further goals are more robust code, quality tests, better performance, complete documentation.
Lofty goals would be to refactor it into a plug-in, cache across multiple servers, and make fries.

Estimates:
2 weeks - code connected in and tested, caches but never hits.
4 weeks - caches, hits, working list of bugs
6 weeks - iron out most bugs, clean things up
I'll try and rework these estimates as time goes on.

The weather here in New England is currently hot and muggy, not the best of weather for coding, but a cold beverage and a comfy chair in the basement make up for it.

Sunday, June 01, 2008

Still haven't heard back from my mentor but I've been forging ahead. I know how busy things can get.

I've been working on interfacing out the caching system and attempting to refactor existing code.
I want to keep things fast and efficient, but simple enough that the interface applies to any caching system. I've been trying to use Eclipse and the CDT plugin to trace where the Query cache is called and how it is currently used.

For any caching solution there are a few basic methods:
-init() //gets things started
-resize() //resizes the cache to a specified size

-store() //stores an entry in the cache
-retrieve() //retrieves an entry from the cache

-invalidate() //invalidates all entries that match certain parameters
-flush() //flush the whole cache

It would be nice to refactor the interface down to just these few methods but the way things look at the moment it may not be that simple...
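To make the list above concrete, here is one way the six methods might look as a C++ interface. Everything in it is an assumption for illustration: the names Cache_backend and Map_cache, the signatures, the default size of 1024 entries, and the choice to invalidate by key prefix are mine, not MySQL's.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical interface: method names follow the list above, while the
// signatures are guesses made for this sketch.
class Cache_backend {
public:
    virtual ~Cache_backend() {}
    virtual bool init() = 0;
    virtual void resize(std::size_t max_entries) = 0;
    virtual bool store(const std::string& key, const std::string& value) = 0;
    virtual bool retrieve(const std::string& key, std::string* value) const = 0;
    virtual void invalidate(const std::string& key_prefix) = 0;
    virtual void flush() = 0;
};

// Toy backend over std::map, only to show the interface being exercised.
class Map_cache : public Cache_backend {
public:
    Map_cache() : max_entries_(0) {}

    bool init() override {
        entries_.clear();
        max_entries_ = 1024;  // arbitrary default for the sketch
        return true;
    }

    void resize(std::size_t max_entries) override {
        max_entries_ = max_entries;
        // Drop entries (in key order) until the cache fits the new size.
        while (entries_.size() > max_entries_) entries_.erase(entries_.begin());
    }

    bool store(const std::string& key, const std::string& value) override {
        // Refuse new keys when full; overwriting an existing key is allowed.
        if (entries_.count(key) == 0 && entries_.size() >= max_entries_) return false;
        entries_[key] = value;
        return true;
    }

    bool retrieve(const std::string& key, std::string* value) const override {
        auto it = entries_.find(key);
        if (it == entries_.end()) return false;
        *value = it->second;
        return true;
    }

    void invalidate(const std::string& key_prefix) override {
        // std::map is ordered, so keys sharing a prefix sit next to each other.
        auto it = entries_.lower_bound(key_prefix);
        while (it != entries_.end() &&
               it->first.compare(0, key_prefix.size(), key_prefix) == 0) {
            it = entries_.erase(it);
        }
    }

    void flush() override { entries_.clear(); }

private:
    std::map<std::string, std::string> entries_;
    std::size_t max_entries_;
};

// Usage: invalidating one table's prefix leaves other entries alone.
void usage_example() {
    Map_cache cache;
    cache.init();
    cache.store("shop.orders:q1", "rows A");
    cache.store("shop.users:q2", "rows B");
    cache.invalidate("shop.orders:");
    std::string out;
    assert(!cache.retrieve("shop.orders:q1", &out));
    assert(cache.retrieve("shop.users:q2", &out) && out == "rows B");
}
```

The virtual calls add a small cost on every lookup, which is one reason a simpler approach might end up being preferable in a hot path like the query cache.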

As wonderful and useful as Eclipse is with Java, the CDT plugin has a long way to go in terms of stability and features. It hurts almost as much as it helps! *crash*

Reports are due Monday and I really need to put in more time to have something to show.

Sunday, May 25, 2008

Let the coding begin! The Google Summer of Code 2008 coding session has begun.

I am maintaining this blog as a way of recording my progress for others to follow (and to prevent me from slacking off).

I am working on integrating Memcached with the MySQL Query Cache (mentored by Brian Aker from MySQL).

I've done some bonding through the mailing list, blogs, and IRC but I feel I should have done a bit more. I just moved back from college to my parents' house and started working full time in Boston (wow, what a commute). I've read through the code a few times and feel I have an understanding of how it works. It compiles, tests, and runs on my dev box (Gentoo 2.6). I've started reading the book Google sent me (SPOILER) and it is wonderful.


In reading through the MySQL codebase I've come up with a list of design decisions I will need to make in the first week of my project.

1)
Implement this as a different query cache storage engine of sorts by building an interface. One version uses the classic single system memory, the other uses Memcached and is controlled by a flag somewhere.
OR
Code this as a straight patch.

2)
How to optimize the functionality of the cache? Tables that update often are actually slower when cached due to the way caches are pruned currently. Is there a cleaner way to do this? Do I pursue this concurrently or wait till I've finished the initial goal of Memcached support and then try to rework this?

3)
Using one Memcached cluster for several databases would be a good way to save on memory, but there are several things to consider in order to retain 'ACID' principles. Again, do I try to work on this concurrently, or be more cautious so as not to bite off more than I can chew (which I have a tendency to do...)?

4)
How much time should I leave at the end for perf testing, regression testing, etc.?
Making this work is wonderful, but if it slows down the cache further is it really worthwhile?


Goals for Monday June 2nd:
-Communicate more.
-Have a solid timeline complete with estimated goals that I can rework at each iteration.
-Have answers to these design decisions that I can begin actively working on.
-Have real code, that compiles, and maybe does something (stubs and such).