Sunday, April 07, 2013

Videolamer Search Project Redux: Importing the Site

Task: Clean up Wordpress installation
Estimated Pomodoros: 1
Actual Pomodoros: 1


When the import process completed, Wordpress reported that some content was not retrieved.  I decided that I would try to obtain and add these files manually.  While I was at it, I figured I could do some additional housekeeping within Wordpress.  For instance, I could delete the video and audio files that were imported.  I could also reset the passwords on all the user accounts (in case I needed to access them).

I only managed to complete the second of these three tasks.  I gave up on password management after realizing that there were 40 different accounts on the site (and after I realized that with the admin account, I could do whatever I needed to).  I gave up on adding the missing content after realizing that all images in VL articles are referenced by upload date.  If I added them myself, they'd all be dated for today.  Sure, I could then go to the affected posts and update the links, but I decided instead that a dozen broken images weren't going to be a dealbreaker.  In retrospect, one could argue that this goes against my promise to be patient, but my decision was based not on the amount of time it would take, but on the fact that it felt kludgy and inaccurate to patch them back in.

This task ended up being a waste of time in the long run. Not enough usefulness for the time spent.
 
Task: Create new MySQL Column
Estimated Pomodoros: 1
Actual Pomodoros: 1

This was another worthless task. The VL site uses a PHP function to display a small excerpt of each article on the home page.  I really need a similar excerpt to display on my site, so I thought I would create a new database column that contains a close approximation.  Not a bad idea in theory, and I actually got it to work, but I forgot that the actual article contents contain HTML tags.  This meant that my excerpts would look very funny when displayed as text.  I know there are ways I could strip out the HTML, but I'm not sure if it is worth it.
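
For reference, stripping the HTML before building an excerpt does not take much code. The following is only a rough sketch in Python, not what the VL site's PHP function does: the function name `make_excerpt`, the 40-word limit, and the list of block-level tags are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags that should act as word separators once they are removed (an assumed list).
BLOCK_TAGS = {"p", "br", "div", "li", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects the text of an HTML fragment and drops the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def make_excerpt(html, max_words=40):
    """Return a plain-text excerpt of at most max_words words (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    words = "".join(parser.parts).split()
    excerpt = " ".join(words[:max_words])
    if len(words) > max_words:
        excerpt += "..."
    return excerpt
```

Something like `make_excerpt(post_content)` could then be run once per article before the result is written to the new column, or at display time instead of storing anything.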

This task was especially useless considering that before I began it, I thought that I could simply process the excerpt on the fly via Javascript or something.  Why I decided to do this task, then, is beyond me.  I think I shouldn't have drunk that beer while working.  Or maybe I simply need to take a break and come back later.

Videolamer Search Project Redux: Importing the Site

Task: Import Videolamer.com content into local Wordpress installation
Estimated Pomodoros: 1
Actual Pomodoros: 1

Pre Task Thoughts

I'm estimating this at one pomodoro.  If memory serves me, the importation process isn't hard, so I'm taking my chances and assuming all will go well.

The reason for this task is simple - I want VL's content in a MySQL database, and I don't have shell access to the VL server.  The only way to get to the DB is for me to duplicate it, and the easiest way to do that is to create a local Wordpress install (which I've done) and import the entire site into it.

Post Task Thoughts

The import was quick and easy to start, but the importation process itself took well over one pomodoro to complete.  I'm going to make a judgement call and not count that import time against the pomodoro.

Task: Create personal reminder README files
Estimated Pomodoros: < 1
Actual Pomodoros: < 1

This task involves an idea I just came up with today.  I often install a piece of software, experiment with it, and then abandon it for months at a time.  When I come back to it, I don't always know exactly what I did, and what tweaks I might have made.  In order to remind me, I'm going to add files into each folder (named CMWREADME) which will contain notes on what I've done so far.

For example, in the MAMP folder, the CMWREADME explains what changes have been made to the default configuration files, as well as what's in the htdocs folder.  Then, in my working folder for this project, there's another CMWREADME which explains the purpose of some of the folders I created, the state of the Wordpress installation, etc.  I'm hoping that if I keep these files up to date, they will help me get back up to speed after a hiatus of any length.

I worked on this task while the site importation was running, and it rounded out the time for most of the pomodoro.  I still had some time left over, however, and I used it to clean up and reorganize my development folder.

Videolamer Search Project Redux: Install Wordpress

Task: Install a local copy of Wordpress
Estimated Pomodoros: 1
Actual Pomodoros: 2

Pre-Goal Thoughts
Wordpress claims to have a five minute installation process.  While I don't think it takes that little time, it is pretty damn fast, if memory serves.

Last time I tried this,  I used a MAMP instance for installation.  I still have MAMP, but I might just try using the local version of Apache provided by OS X, along with the MySQL server I compiled from source a few months back.  Or maybe use the MySQL instance in MAMP, and the local Apache?

Post-Goal Thoughts
I decided to use MAMP, as I remembered that Wordpress requires PHP.  I still ended up wasting time during my first Pomodoro.  Let me count the ways....

  1. First, I tried using the client program from my source installation of MySQL to connect to MAMP's instance. 
  2. Then I decided I'd just delete my source install (I didn't like where I had originally installed it, I guess?).  
  3. I fiddled with the ports that MAMP uses for Apache and MySQL, before reverting them back to their original states.  
  4. Once I began the Wordpress install in earnest, I started using phpMyAdmin to create the database and user it required.  Then I decided to ditch that and find a way to use the command line client.

Once I finally got MAMP's command line MySQL client working, I started making real progress. By the end of the first Pomodoro, I had the DB and user set up.

The second Pomodoro began with me trying to create a virtual host in Apache to point to my Wordpress folder. I like keeping all of my programming related material in a specific folder, as it makes it easier to keep track of everything.  A nice virtual host pointing to Wordpress would eliminate the need to copy files into MAMP's htdocs folder.  All in all, this task took up too much time, and failed to work.  I didn't go looking into the cause, as I decided it would be easier for the time being to create a symlink in htdocs that pointed to Wordpress.  I'll add a note somewhere to remind myself to clean that out when I'm done (if I'm going to use MAMP, I'd like to keep it as clean as possible, so I don't come back another day and wonder what the hell all these files are, and what I did to the configuration).

My only other small hiccup had to do with database privs.  I typed them in wrong, so Wordpress couldn't install.  With a simple fix, I had it up and running just before the end of the second Pomodoro.

Lessons Learned
This goal took me twice as long as I expected. I may have to adjust my estimation of future goals accordingly.  This may be difficult, since these tasks have fewer and fewer concrete steps that I can identify (and far more potential hurdles).

Videolamer Search Project Redux: Exporting Data

Task: Export data from Videolamer.com
Estimated Pomodoros: less than 1
Actual Pomodoros: less than 1

I already exported the data last time, and I do know where it is, but I said I was starting this project from scratch, and I meant it. Besides, there seem to be one or two new articles posted to the site since then. Might as well grab them.

Thanks to Wordpress, this process is very easy.  Data exportation results in a custom XML file which can be imported into another Wordpress installation with little to no pain.
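
Because the export is plain XML, it can also be inspected without Wordpress. Below is a minimal Python sketch, assuming the export follows the usual RSS layout with one `<item>` per entry and the body in `<content:encoded>`. The function name `read_posts` is made up for this example, and the layout should be checked against the actual export file.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, which carries <content:encoded>.
NS = {"content": "http://purl.org/rss/1.0/modules/content/"}


def read_posts(xml_text):
    """Return a list of (title, body_html) pairs from an export (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        body = item.findtext("content:encoded", default="", namespaces=NS)
        posts.append((title, body))
    return posts
```

The export most likely contains items other than articles (pages and attachments, for example), so a real script would probably need to filter the results before using them.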

Videolamer Search Project Redux

A little over a year ago (has it really been over a year?), I tried my hand at a personal coding project. I wanted to index all of the Videolamer archives into an Apache Solr instance, to allow for full text searching on all of the content. I got pretty far, far enough to get a basic webpage up. Unfortunately, the work stopped cold one day.

This happens to me all the time, and I'm getting tired of it. Just once I'd like to finish something I start, so I'm picking this project back up again, from scratch. I'm going to do things differently this time, to see if it makes a difference.

  1. I'm going to break up the work using the Pomodoro system. This will entail breaking the work up into subtasks, as well as making time estimates.
  2. I'm going to try and document the process from the very start.
  3. I'm not going to be impatient. If there is one problem which I know has undone me in the past, it is sloppy results due to a lack of patience.

Here is my initial breakdown of the major tasks for this project (these will be broken down further into subtasks):

  • Export the data from the VL site.
  • Install a local Wordpress instance.
  • Import VL data into local Wordpress instance.
  • Clean up VL articles.  This is something I learned from the last attempt - the formatting used in each article is not consistent.  This will be detrimental when it comes to indexing the articles in Solr.  I need to make sure they all have proper HTML markup, and that their intro paragraph has some sort of "intro" id that I can use for returning the opening paragraph in search results.
  • Index local SQL database (containing all the articles) into Solr.
  • Test article search using the Solr interface.  Tweak if necessary.
  • Prepare the website.  I will once again try and just use ajax-solr for this part.  If it doesn't work well enough, I'll consider switching over to a Rails app.
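
As an illustration of the cleanup step in the list above, tagging the opening paragraph could look roughly like the Python sketch below. It assumes each article already wraps its paragraphs in `<p>` tags, which the inconsistent formatting may not guarantee. The helper name `mark_intro` and the regular expressions are assumptions for the example, not a tested cleanup routine.

```python
import re

# Matches an opening <p> tag, with or without attributes (but not <pre>, <param>, ...).
_FIRST_P = re.compile(r"<p(\s[^>]*)?>", re.IGNORECASE)
# Detects an existing id attribute (but not data-id and similar).
_HAS_ID = re.compile(r"(?<![\w-])id\s*=", re.IGNORECASE)


def mark_intro(html, intro_id="intro"):
    """Add id="intro" to the first <p> tag of an article (hypothetical helper)."""
    match = _FIRST_P.search(html)
    if match is None:
        return html  # no paragraph markup at all; leave the article untouched
    attrs = match.group(1) or ""
    if _HAS_ID.search(attrs):
        return html  # the first paragraph already has an id; do not overwrite it
    tagged = '<p id="%s"%s>' % (intro_id, attrs)
    return html[:match.start()] + tagged + html[match.end():]
```

Articles that come back unchanged are the ones worth reviewing by hand, since they either lack paragraph markup or already carry an id on their first paragraph.
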

PS - This post took less than 1 Pomodoro to write.