21 Jul 2005

sorry about the geek topics

i've just noticed that the last 10 posts i've made to this blog are all on geeky topics. i'll try to intersperse my blog with non-geeky stuff more often. but right now, most of my spare time is consumed with this, so it's kinda hard to talk about something else!

well, i'll make an attempt.

this saturday, my supervisor (and lab head) is having a bbq party at his ranch (i don't know, apparently it's huge.) back in the good ole days, he was famous for holding huge bbqs for the lab and his various commercial offspring. that was when the dot com boom was all well and good, and money for parties seemed like a good idea. i missed out on one last year because i was in the US, but this year, i'm not going to miss it!

on saturday night, i'm going to a good friend's birthday party, for which i was told to bring a sleeping bag and swimmers. it will also involve alcohol and beaches (not the sand kind -- but the pebble kind.)

in other news, my flatmate and i have been house hunting for a place to stay after next month. there ain't that many properties in cambridge right now, which is quite annoying. and of those available, there are a lot of people vying for them! hopefully i won't be homeless when our contract expires!

also, my tolerance levels have been truly tested these couple of weeks. i can't reveal too much about it, but very soon, it will be all over. until then, i will have to train to become the zen master -- watch me ignore my surroundings and blend into the background like a ninja! oh yeah, and i watched fight club for the 50+th time or so, still extremely enjoyable! check out the NIN music video done by the director of fight club. looks incredibly awesome!


21 Jul 2005

pyobjc hacker goes all javascripty

emm, i'm subscribed to bob ippolito's blog, where he talks about stuff relating to pyobjc. recently, his posts have turned all javascripty and it seems like he's working on some big javascript project. i still maintain that it is near impossible to do a large scale project in javascript.

but people have done things like fckeditor, google maps, google suggest and tiddlywiki. bob ippolito has been writing a framework called mochikit, modelled on some oo-concepts in python, that makes javascript slightly more bearable. their sortable table example does look quite good!


21 Jul 2005

parsing apache logs in python

recently, i've been intrigued with getting more data from my apache logs than current apache log analysers allow. i'm currently using modlogan; while it is powerful and fast, it is a bit limiting in what i can do. for instance, i want to run queries on the log data instead of looking at a bunch of spam referrers in my logs.

i want to do queries like:

"tell me what user agents most referral spammers use"
"tell me the number of referral spammers who do not get a 403"
"in the last 7 days, how many people were referred from apple.com and rank them by their user agents"

other things i want to be able to do include having a list of top referrers, and being able to click on each one to see the breakdown of statistics for that referrer. it would make procrastination a lot more interesting.

so i've hacked up some neat modules over the weekend that will eventually end up as a collection of components that let me make interesting statistics from my logs. right now they're in my subversion repository, viewable here. (NOTE: the code is really sucky!! but it does the job - i'll rewrite it when i figure out what exactly works and doesn't work!)

the system is a bit odd and relies on sqlite. i chose sqlite as a storage medium because it allows construction of sql queries. once i'm satisfied that sql is a good way to search through my apache logs, i'll write the db backend to use pgsql (which should be relatively easy given that pysqlite conforms to the python DB-API PEP, and i can only assume there's a postgresql python binding that does the same!)

the backend simply looks at each log line, extracts all the vital components using an ugly looking regex (in apache.py), then runs through a series of regex patterns on specific components to "tag" the log line.
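to give a rough idea of the "extract the vital components" step, here is a minimal sketch. it is not the regex from apache.py; the pattern, the field names and the sample line are all made up for illustration, and it assumes apache's "combined" log format with no escaped quotes inside the quoted fields.

```python
import re

# hypothetical pattern for the "combined" log format (NOT the one in apache.py)
LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


# made-up sample line, only here to exercise the pattern
sample = ('10.0.0.1 - - [21/Jul/2005:10:15:32 +0100] '
          '"GET /blog/ HTTP/1.1" 200 5120 '
          '"http://www.example.com/search?q=widgets" '
          '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"')
fields = parse_line(sample)
```

every value comes back as a string (status included), so anything numeric needs converting before it goes into a database.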

for instance, i go through the referrers to look for search engines and tag those lines as "ref_search", and tag the ones with spam referrers as "ref_spam". other interesting and fun stats include looking at user agents and determining what OSes and browsers people are using. those are tagged like "os_series60" (YES! someone wasted their bandwidth by looking at my webpage on a mobile phone! i hope that was a well spent $0.60 for my massively useless weblog :) ). you can also define a bunch of custom tags that match anything you find particularly interesting. for mine, i've just written some for the file types that i mostly see in my logs and for major referrers. i like to be able to construct a query that filters them all out so i can see if there are any tiny blogs that, for some reason or another, are linking to me :)

after tagging each line, the engine puts all this into an sqlite relational database. two weeks' worth of logs equates to around 200K lines. i have around 317 unique tags; this includes browser make and version numbers, eg. cl_msie_5.5 or cl_safari_412. with just 317 tags against 200K lines, i end up with nearly 1.2M associations (log <-> tag). searching through them using SQL on sqlite is not fun. it's actually pretty slow to get a match on keywords.
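the tagging pass could look something like this sketch. the rule table, the patterns and the function name are hypothetical (the real rules are not shown in this post); only the tag names follow the style used here.

```python
import re

# hypothetical rules: (field to inspect, pattern, tag template)
# a captured group, if any, is substituted into the template
TAG_RULES = [
    ("referrer", re.compile(r"google\.|search\.yahoo\.", re.I), "ref_search"),
    ("referrer", re.compile(r"poker|casino|pills", re.I), "ref_spam"),
    ("agent", re.compile(r"Series60", re.I), "os_series60"),
    ("agent", re.compile(r"MSIE (\d+\.\d+)"), "cl_msie_{0}"),
    ("agent", re.compile(r"Safari/(\d+)"), "cl_safari_{0}"),
]


def tag_fields(fields):
    """Return the set of tags whose pattern matches the given parsed log line."""
    tags = set()
    for field, pattern, template in TAG_RULES:
        match = pattern.search(fields.get(field, ""))
        if match:
            tags.add(template.format(*match.groups()))
    return tags


# made-up parsed line, only here to exercise the rules
msie_tags = tag_fields({
    "referrer": "http://www.google.com/search?q=widgets",
    "agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)",
})
```

custom tags are then just extra rows in the rule table, so nothing else has to change when you find something new that looks interesting.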
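for scale: nearly 1.2M associations over 200K lines works out to roughly six tags per log line. a layout that fits the description is one table for log lines, one for tags, and a link table between them. the sketch below is an assumption, not the real schema; the table and column names are made up, and it uses python's bundled sqlite3 module.

```python
import sqlite3

# hypothetical three-table layout; the real schema is not shown in the post
SCHEMA = """
CREATE TABLE log (
    id INTEGER PRIMARY KEY,
    host TEXT, status INTEGER, referrer TEXT, agent TEXT
);
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE log_tag (
    log_id INTEGER REFERENCES log(id),
    tag_id INTEGER REFERENCES tag(id),
    PRIMARY KEY (log_id, tag_id)
);
"""


def store(conn, fields, tags):
    """Insert one parsed log line plus one log <-> tag row per tag."""
    cur = conn.execute(
        "INSERT INTO log (host, status, referrer, agent) VALUES (?, ?, ?, ?)",
        (fields["host"], int(fields["status"]), fields["referrer"], fields["agent"]),
    )
    log_id = cur.lastrowid
    for name in tags:
        # create the tag on first sight, then look up its id
        conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
        (tag_id,) = conn.execute(
            "SELECT id FROM tag WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO log_tag (log_id, tag_id) VALUES (?, ?)", (log_id, tag_id)
        )
    return log_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# made-up rows, only here to exercise the schema
store(conn, {"host": "10.0.0.1", "status": "200",
             "referrer": "http://www.google.com/search?q=widgets",
             "agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"},
      {"ref_search", "cl_msie_5.5"})
store(conn, {"host": "10.0.0.2", "status": "403", "referrer": "-",
             "agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"},
      {"cl_msie_5.5"})
```

storing each tag name once and linking by integer id keeps the big link table down to two small numbers per row.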

that may be due to the fact that i never took a database course, so i went googling for some of the basics of doing efficient joins and narrowing down searches in tag-like systems. i've come across some sites that talk about tagging and theorise about how things like flickr and delicious work efficiently (if at all!), but it seems a lot of them don't have much of an idea of how to build reasonable, scalable tag databases that can be searched efficiently and with arbitrarily complex logic. for instance, delicious only allows searching with intersections and not unions, possibly for that reason.

maybe i'm an sql dummy, but on my p4 2.4GHz machine, a search for a single keyword takes around 10 seconds. my sqlite database is over 100MB as well, although it is very compressible, mainly due to sqlite storing things as text (i presume.)
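for what it's worth, both intersections and unions can be written as one query against a log <-> tag link table. this is only a sketch: the table names, the helper function and the sample rows are all hypothetical, and it says nothing about how flickr or delicious actually do it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# hypothetical tag and link tables with a few made-up rows
conn.executescript("""
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE log_tag (log_id INTEGER, tag_id INTEGER);
    INSERT INTO tag (id, name)
        VALUES (1, 'ref_spam'), (2, 'cl_msie_5.5'), (3, 'ref_search');
    INSERT INTO log_tag (log_id, tag_id)
        VALUES (1, 1), (1, 2), (2, 1), (3, 2), (3, 3);
""")


def find_logs(conn, names, match_all):
    """Return ids of log lines carrying all (intersection) or any (union) of the tags."""
    names = sorted(set(names))
    placeholders = ", ".join("?" for _ in names)
    sql = (
        "SELECT log_tag.log_id FROM log_tag "
        "JOIN tag ON tag.id = log_tag.tag_id "
        "WHERE tag.name IN (" + placeholders + ") "
        "GROUP BY log_tag.log_id"
    )
    params = list(names)
    if match_all:
        # intersection: keep only lines that matched every requested tag
        sql += " HAVING COUNT(DISTINCT tag.name) = ?"
        params.append(len(names))
    sql += " ORDER BY log_tag.log_id"
    return [row[0] for row in conn.execute(sql, params)]


both = find_logs(conn, ["ref_spam", "cl_msie_5.5"], match_all=True)
either = find_logs(conn, ["ref_spam", "cl_msie_5.5"], match_all=False)
```

the only difference between the two modes is the HAVING clause, so more complex logic (tag A and not tag B, say) would need a different query shape than this one.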
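i can't tell from here what is actually causing the 10 seconds, but a common culprit for slow tag lookups is a link table with no index on the tag column, which forces a scan over every association. the sketch below assumes that is the problem; the schema, the index name and the rows are made up. the query is the first one from the wish list above (which user agents referral spammers use most).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE log (id INTEGER PRIMARY KEY, agent TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE log_tag (log_id INTEGER, tag_id INTEGER);
    -- hypothetical index: lets sqlite go straight to the rows for one tag
    CREATE INDEX log_tag_by_tag ON log_tag (tag_id, log_id);
    -- made-up sample data
    INSERT INTO log (id, agent) VALUES
        (1, 'SpamBot/1.0'), (2, 'SpamBot/1.0'), (3, 'Mozilla/4.0'), (4, 'Safari/412');
    INSERT INTO tag (id, name) VALUES (1, 'ref_spam');
    INSERT INTO log_tag (log_id, tag_id) VALUES (1, 1), (2, 1), (3, 1);
""")

# user agents of lines tagged ref_spam, most common first
spam_agents = conn.execute("""
    SELECT log.agent, COUNT(*) AS hits
    FROM tag
    JOIN log_tag ON log_tag.tag_id = tag.id
    JOIN log ON log.id = log_tag.log_id
    WHERE tag.name = ?
    GROUP BY log.agent
    ORDER BY hits DESC, log.agent
""", ("ref_spam",)).fetchall()
```

on a real database, running the same SELECT with EXPLAIN QUERY PLAN in front of it shows whether sqlite is using the index or scanning the whole table.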

anyway more updates as i squeeze small amounts of time to hack on this. statistics is an oddly addictive thing!


16 Jul 2005

open source, freeware, widgets and macs.

so why did i open source my widget and the underlying code?

1. freeware vs open source.

my philosophy is that if you're not planning to sell your software, why would you hide the source code? maybe it is because i am a programmer, but i believe that everyone should open source code that they are not planning to sell!

the only reason not to open source the code is that you are worried people will see your pitiful code. because i am shameless, i don't care about that!

by open sourcing your code, other budding coders can learn from your source code, and people who are capable can dive in and add features that they like, fix bugs and do all sorts of crazy stuff. the code is sitting around rotting on your computer, so why not put it on the internet and let the world see! it's just like blogging, only slightly more useful!

2. widgets/javascript are ultimately open anyway

if you look inside the widget, you can pretty much see all the code there is. most of the ones that don't use cocoa plugins you can edit and play with as much as you like. so, if the javascript is open already, why not release the whole thing? don't just hide bits and pieces of it, that just frustrates people.

3. widget development is tedious

widgets are not hard to write, but very tedious to debug and maintain. mac os x 10.4.2 isn't helping much either with its insistence on copying the widget into ~/Library/Widgets. the first time that happened, i panicked, wondering where the hell my widget went. maybe there's a way to switch that off, but i haven't found it yet.

anyway, the best way to start a widget is not from apple's programming guide, but to modify the code of another one. so what better than to make as much code available as possible, so that more people can try writing one.

widgets = creative apps?

mac people are (mostly) creative people. popular apps like imovie, garageband and comic life exploit people's creativity. i see dashboard falling into the same category as applescript and automator. these apps are trying to entice people's creativity in gluing things together, just like garageband lets you glue musical instruments together, and imovie lets you glue movies, photos and music together. dashboard allows you to glue small functional parts of the web on to your desktop (like amazon browsers, google maps, weather, etc). you control what you want to glue into there. just like automator is about gluing small functions from a large range of apps into an (albeit less pretty) app of your own.

you don't need to know about memory allocation and pointers to write your first widget. these are things normal people cannot come to grips with. javascript isn't the ideal language (if i designed it i would have used python :), but it is half way there.

the computer is not there for you to use and browse information. it should be there for you to create things. i think that is what separates macs and windows pcs. there aren't many creative applications on windows, but in apple's world, they want you to create as much as possible, from music, to movies, to photos and even to programs (eg. xcode is free, script editor is free, automator is free, etc.)

if you don't believe that macs are all about creating, check out quartz composer -- this is a perfect example of letting users harness the raw graphics power of their macs without writing a single line of code.


16 Jul 2005

EyeTunes.framework 0.1 and Album Art Widget source code

as promised, i'm releasing the EyeTunes cocoa framework that allows you to query and control iTunes programmatically, but without having to resort to learning Applescript or AppleEvents!

there are still some gaps in the code because i only wrote enough to get all my widget's functions to work. other things like getting specific playlists and finding out what playlist a track belongs to aren't implemented yet, but i suspect they aren't too hard. so i'd really appreciate any contributions or bug reports. but basically, the code is proven to work for my Album Art Widget and now has reliable artwork saving. (yes, it does all the NSImage, NSString (path) conversion!) the framework is released under BSD, so use it wherever you want.

also, and maybe just as important, i'm open sourcing my widget's code. so those enterprising people who think they can do much better with the artwork or interface, you can play around with that. i think even the photoshop files are in there in case you can improve on my crappy artwork :)

btw, you need to have the EyeTunes framework code to build the Album Art Widget Plugin. i'll put some detailed instructions on how to build it on the wiki page.


16 Jul 2005

here comes the source code

i've just finished setting up a trac repository for my open source code. you'll find a new item in the menu called "source." this links to a starter page which leads you to three repositories (there is going to be an extra one) for various bits of code that i have always planned to open source but never got around to.

i haven't gotten around to making tarball checkouts of all the code, but i will get around to that. there are still bits of code sitting on my computer that i plan to open source but that don't fit into any of mac, python or linux. these are things such as drupal modules (maybe i should ask for drupal cvs access and put them into the contributions.)

so my plan is to maintain the "software" page on my drupal site, and the "source" site is for people who want to report bugs and contribute to the code.

anyway, that's finally one big thing off my todo list!


12 Jul 2005

gamma in photoshop is annoying!

being an occasional photoshop user, i have no idea about the mountains of preferences and options a user can tweak.

i just changed the gamma value in my photoshop a couple of days ago and now all the PNGs i've created since then have come out in some off colour. i'm not sure how the gamma information is incorporated or used by macs when graphic files are rendered, but it's really annoying that certain things are slightly off.

incredibly annoying -- who invented this gamma craziness?


07 Jul 2005

podcasts? what are they all about?

so what is this podcast thing people keep talking about? think of it as recorded radio; the difference is that a lot of people are doing it, from commercial entities to amateurs.

i was pretty skeptical initially, and i'm sure most people are. i tried ipodder (probably one of the first couple of podcasting programs around) but i never successfully did anything with it.

now that itunes 4.9 has podcasting support built in, i decided to give it another go. and i'm happy to say that i'm slightly hooked. i'm quite picky about the stuff i want to hear. i started by sampling the stuff from the top 10 list in itunes, but most of it was pretentious radio or news that i've since taken off my subscription list; other stuff just plain doesn't work.

so what is on my subscription list at the moment? it's Triple-M's 104.9 Podcast (xml), Triple-J's Dr. Karl's podcast (xml), Engadget Podcast (xml) and Adam Curry's Daily Source Code (xml).

Triple-M's podcast is actually snippets from a popular radio station in Sydney. the coolest thing about this podcast is that it summarises the best parts of the funny shows on that day (week?). feels like home (and with no ads!)!!

Triple-J's Dr Karl is an eccentric but popular scientist who has a regular slot on radio and had a tv-series on australian tv. he talks about some interesting science things in layman's terms. i had the pleasure of listening to him explain the theory of relativity at my old high school. i'm sort of on the fence on whether to keep this one up.

Engadget podcast, well, it's very amateurishly done, basically two guys arguing about gadgets. despite my terse description, it's actually quite interesting :) too american centric though (eg. arguments about t-mobile or cingular). i kinda like the amateurish nature of this podcast, the guys do a pretty good job, and they have this non-stop self-promotion aspect to them!

finally, Adam Curry's Daily Source Code is an eclectic mix of tech geek meets conventional radio. sometimes there are some weird radio gimmicks. he was featured in Steve Jobs' keynote. the guy is from the UK.

anyway, that's all for now, maybe i'll promote some new stuff when i come across it. for now, i just wonder what porn podcasts are? hrmmmm....


04 Jul 2005

hk ppl don't know where their sex organs are?

AFP has an article on what is seemingly an alarming problem of couples in hk not knowing how to have sex. i think i know just the solution!


01 Jul 2005

blogs of employees of companies you want to work for

i guess, if you want to work at these companies (eg. google and yahoo), then you might want to read their employees' blogs. maybe it'll make you want to work for them more, or maybe you'll find that those people are so boring that you might think again?

someone thought it was a good idea to ask google answers who actually has a blog at google and yahoo. an answer that was worth $100!
