A little over a year ago (has it really been over a year?), I tried my hand at a personal coding project. I wanted to index all of the Videolamer archives into an Apache Solr instance, to allow for full-text searching on all of the content. I got pretty far, far enough to get a basic webpage up. Unfortunately, the work stopped cold one day.
This happens to me all the time, and I'm getting tired of it. Just once I'd like to finish something I start, so I'm picking this project back up again, from scratch. I'm going to do things differently this time, to see if it makes a difference.
- I'm going to break up the work using the Pomodoro system. This will entail breaking the work up into subtasks, as well as making time estimates.
- I'm going to try and document the process from the very start.
- I'm not going to be impatient. If there is one problem which I know has undone me in the past, it is sloppy results due to a lack of patience.
Here are the subtasks as I see them right now:
- Export the data from the VL site.
- Install a local WordPress instance.
- Import VL data into local WordPress instance.
- Clean up VL articles. This is something I learned from the last attempt - the formatting used in each article is not consistent. This will be detrimental when it comes to indexing the articles in Solr. I need to make sure they all have proper HTML markup, and that each article's intro paragraph has some sort of "intro" id that I can use for returning the opening paragraph in search results.
- Index local SQL database (containing all the articles) into Solr.
- Test article search using the Solr interface. Tweak if necessary.
- Prepare the website. I will once again try and just use ajax-solr for this part. If it doesn't work well enough, I'll consider switching over to a Rails app.
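To make the cleanup step a bit more concrete, here is a rough Python sketch of what it might look like. Nothing here is settled: Python itself is just a convenient choice for illustration, the function names (`normalize_article`, `extract_intro`, `to_solr_doc`) are made up, and the Solr field names (`id`, `title`, `intro`, `content`) are placeholders rather than a real schema. It also assumes paragraphs in the exported articles are separated by blank lines, which may not hold for every article.

```python
import re

# Block-level tags that should NOT be wrapped in <p> (an assumed, incomplete list).
BLOCK_TAG = re.compile(r"<(p|h[1-6]|ul|ol|blockquote|div|table|pre)\b", re.IGNORECASE)
PARAGRAPH_TAG = re.compile(r"<p\b", re.IGNORECASE)


def normalize_article(raw: str) -> str:
    """Wrap bare text blocks in <p> tags and give the first paragraph id="intro".

    Assumes blocks are separated by one or more blank lines and that no
    paragraph already carries an id attribute.
    """
    blocks = [b.strip() for b in re.split(r"\n\s*\n", raw) if b.strip()]

    # Wrap anything that does not already start with a block-level tag.
    blocks = [b if BLOCK_TAG.match(b) else f"<p>{b}</p>" for b in blocks]

    # Tag the first <p> block as the intro, keeping any existing attributes.
    for i, block in enumerate(blocks):
        if PARAGRAPH_TAG.match(block):
            blocks[i] = PARAGRAPH_TAG.sub('<p id="intro"', block, count=1)
            break

    return "\n\n".join(blocks)


def extract_intro(html: str) -> str:
    """Return the inner text of the paragraph tagged id="intro", or ""."""
    match = re.search(r'<p id="intro"[^>]*>(.*?)</p>', html, re.DOTALL)
    return match.group(1).strip() if match else ""


def to_solr_doc(post_id: int, title: str, html: str) -> dict:
    """Build a dict shaped like a Solr document (field names are hypothetical)."""
    return {
        "id": str(post_id),
        "title": title,
        "intro": extract_intro(html),
        "content": html,
    }


if __name__ == "__main__":
    raw = "Opening paragraph.\n\n<h2>Section</h2>\n\nMore text."
    cleaned = normalize_article(raw)
    print(cleaned)
    print(to_solr_doc(1, "Example article", cleaned))
```

Regular expressions are a blunt tool for HTML, so a real pass over the archive would probably need a proper parser once the inconsistent formatting shows itself. The point of the sketch is only that the cleanup and the "intro" id can be handled in one pass before anything is sent to Solr.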