BIGBABBLEBLOG :-: STORY

July 31, 2005

Hello again!


It's been a while, no?! A lot has happened, but I continue to work on a few main things, like quotationsbook. I have a new project in my day job, but I can't write about the details here, so I'll talk about quotationsbook instead! Since my web host can no longer take the load, I bought a dedicated server this morning, which has a permanent, unlimited 512Kbps dedicated connection to the internet backbone. Let me know if any of you need high-capacity hosting at scorching speed! I can control everything about the server, including hardware reboots. The scary part is that I now have to migrate each site one by one to the nameserver of my new dedicated server, which is not as simple as on a shared web host. Blinder.

I met Dr. Clark and Sally of LitEncyc.com at their home in Angel last week. I am pleased to announce a partnership between our sites. Specifically, we will share content between our sites using private RSS feeds. This is virgin territory but will benefit us both. I am excited by this development, the single most significant event since QuotationsBook.com launched. I will have extracts of LitEncyc biographies on my site, and LitEncyc will have quotation extracts from my site. LitEncyc has many academic volunteers, and has amongst the most thorough content and meta-data about writers on the web. Robert and Sally are also pleasant people!

Birke put her photos up on FlickR. She takes great photos, and there are some from when she came to London. Also, Gaz looks like he's getting Hesam into the great sights, sounds and smells of England!

Continue reading this post for an account of the evolution of QuotationsBook.


QuotationsBook.com - the story of a content-rich website.

This article collects my notes so far on a new venture - a literary resource.

My aim for this site was to complement quotesandsayings.com by building what was not possible for quotesandsayings, and then expanding its coverage into one of the most content-rich quotations websites available. The idea for this site was born last year: to create a huge content resource, and use SEO techniques to drive search engine visitors to it, especially from Google. The basic theory came from a thought I had about the long tail for internet businesses. I coined the idea of creating the long tail of content. If I had huge content diversity, I asked, "Could I tap into the long tail of content demand?" Furthermore, I propose this is pioneer territory about to strike the "hard middle", a demonstration that syndicating wrapper content is the way forward for content silos. This project, far from being an experiment, is a solid foundation for a growing website community. The Crafter's Manifesto inspired me. Make user contributions intrinsic and valued, and you open up the most powerful technique for growing an internet presence.

Data

I acquired and processed data for quotationsbook over 6 months. Rent-a-coder and other sources were used to create separate databases of quotations (over 40,000), proverbs (over 4,000) and fortunes. The most significant dataset is fortunes, which number over 70,000 and are presented on a sub-domain. This was achieved with many Perl scripts, commissioned through a coding service, that extract and process many Unix and Linux datasets, which were (as far as I know) in the public domain. I'm also thinking about data which is not static and involves user interactivity, such as commenting, rating and forums, which should grow the content set. The integration of FlickR photographs and Yahoo! News has created a huge mass of syndicated content. Furthermore, and significantly, I'm creating ways to mine author content, like autobiographies. The breakthrough agreement last week with Dr. Clark and Sally Roe of LitEncyc was very significant and opened the doors to syndication with many other sites (for both of us). Over a long conversation in London, we agreed on a framework for sharing content.

Technical Development and Feed Integration

Using some basic groundwork from a development project I assigned to someone, I redeveloped the site from the ground up in PHP/MySQL/CSS. Magpie is used to cache RSS feeds from FlickR (the use of which generated interest) and Yahoo! News. RSS Writer handles the writing back of RSS files. The search engine is only basic, and I'm using an article from FlickR technical lead Cal Henderson to better handle multi-word queries in the search engine. I also divided the search area neatly into the various match categories possible (sample search). MySQL full-text support doesn't meet all my requirements. A mention must be made of something that's important to me: usability. I mostly relied on CSS and applied generally accepted usability principles for compliance with certain web standards. However, I did this knowing that their application is one part of my programme for search engine optimisation (SEO). It should be clear by now that SEO is core to content sites, since their sole monetisation is through page views.
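
To illustrate what handling multiple words in a search box can involve, here is a minimal sketch in Python. It is not the code the site runs (that is PHP) and not the method from Cal Henderson's article. It simply splits a query into terms, keeps "quoted phrases" together, and builds a parameterised condition that requires every term to match. The column name `body` and the `%s` placeholder style (as used by common MySQL drivers) are assumptions for illustration.

```python
import re

# A term is either a "quoted phrase" or a run of non-space characters.
TERM = re.compile(r'"([^"]+)"|(\S+)')

def split_terms(query):
    """Split a search string into terms, keeping quoted phrases together."""
    terms = []
    for phrase, word in TERM.findall(query):
        term = (phrase or word).strip('"').strip()  # drop stray quote marks
        if term:
            terms.append(term)
    return terms

def escape_like(term):
    """Escape LIKE wildcards so user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def build_where(terms, column="body"):
    """Return (sql_fragment, params) requiring every term to appear."""
    if not terms:
        return "1=1", []
    clause = " AND ".join(f"{column} LIKE %s" for _ in terms)
    params = ["%" + escape_like(t) + "%" for t in terms]
    return clause, params

if __name__ == "__main__":
    clause, params = build_where(split_terms('love "long tail" content'))
    print(clause)
    print(params)
```

Passing the terms as parameters rather than pasting them into the SQL string keeps user input out of the query text, which matters for any public search box.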

Traffic

On the first day, I transferred an abnormal 150Mb, despite the fact that only 120 unique visitors came that day. I realised that because of my search engine submissions, some search engine bot, which wouldn't normally give me traffic, was pounding the site. To try to contain such bots, I consulted webmasterworld, and placed a robots.txt file in my site root. The traffic and referrals to this site are building from sources which I'm expanding steadily. Page view traffic is currently growing faster than organic growth alone would explain, while Google has consumed about 150,000 of my unique URLs. This process revealed that the Googlebot doesn't index whole sites at first pass, and in fact, I observed on average 2,000 URLs added every day as the days passed. On Friday 10/06, I checked my bandwidth logs to find transfer for the day was an astonishing 900Mb, although visitors were level. I can't decide if it's an abnormal log entry or, more likely, a bot banging away at the site.
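
One way to settle the "abnormal log entry or bot" question is to total the bytes served per user agent. The sketch below is in Python and assumes the server writes the Apache "combined" log format, which I have not confirmed for this host. The sample lines, addresses and the agent name `ExampleBot/1.0` are invented purely for illustration.

```python
import re
from collections import Counter

# Assumed format: Apache "combined" access log.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) '
    r'(?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

def bytes_by_agent(lines):
    """Sum response sizes per user agent, skipping lines that do not parse."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        size = match.group("size")
        # A size of "-" means no body was sent (e.g. a 304 response).
        totals[match.group("agent")] += 0 if size == "-" else int(size)
    return totals

# Invented sample data, not real log entries.
SAMPLE = [
    '192.0.2.10 - - [10/Jun/2005:03:12:01 +0100] "GET /quote/17 HTTP/1.0" 200 400000 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [10/Jun/2005:03:12:02 +0100] "GET /quote/18 HTTP/1.0" 200 500000 "-" "ExampleBot/1.0"',
    '192.0.2.77 - - [10/Jun/2005:09:30:00 +0100] "GET / HTTP/1.1" 200 5000 "-" "Mozilla/4.0"',
    '192.0.2.77 - - [10/Jun/2005:09:30:05 +0100] "GET / HTTP/1.1" 304 - "-" "Mozilla/4.0"',
    'this line is malformed and is ignored',
]

if __name__ == "__main__":
    for agent, total in bytes_by_agent(SAMPLE).most_common():
        print(agent, total)
```

If one agent accounts for most of the day's transfer while visitor numbers stay level, a bot is the likelier explanation, and its name is what a robots.txt rule would target.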

Since I have RSS feeds, I've outsourced feed hosting to FeedBurner, which should spare me feed format issues and also prevent a spike in traffic from user agents that keep requesting the RSS file every few seconds, as a friend with a similar site has experienced. My email mailing list is outsourced to Google Groups. I'm currently writing press releases to go out soon through wire services.

As of today (31/07/2005), I plan to migrate to a dedicated server, since I need to plan ahead for 1Gb - 6Gb of transfer a day, based on a set of syndication deals that are planned, including the My Quotations feature.
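
A quick back-of-envelope check on those transfer figures against the 512Kbps line mentioned at the top of this post. This is a sketch in Python under two assumptions of mine: that "Kbps" means 1,000 bits per second, and that "Gb" here means gigabytes of 10^9 bytes. If either is read differently, the numbers shift.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def max_gb_per_day(kbps):
    """Most data a line of the given speed can move in a day, in GB (10^9 bytes)."""
    bytes_per_second = kbps * 1000 / 8
    return bytes_per_second * SECONDS_PER_DAY / 1e9

def kbps_needed(gb_per_day):
    """Average line speed needed to move the given GB per day, in Kbps."""
    bits_per_day = gb_per_day * 1e9 * 8
    return bits_per_day / SECONDS_PER_DAY / 1000

if __name__ == "__main__":
    print("512Kbps ceiling, GB/day:", round(max_gb_per_day(512), 2))
    for gb in (1, 6):
        print(gb, "GB/day needs Kbps:", round(kbps_needed(gb), 1))
```

On those assumptions, a fully saturated 512Kbps line tops out at roughly 5.5 GB a day, so the lower end of the planned range fits comfortably while the upper end would need a little more than the line can carry, before allowing for traffic peaks.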

Metrics and Monetisation

I've put processes in place to track the "recommend a friend" features, as well as gross throughput from the search engine and various areas of the site. Google's recent Sitemap XML announcement is not something I'd consider.

The whole idea of this venture is, of course, to reach a state where there's a steady income from advertising on the site. However, I don't plan to put on any obvious ads until I break 5,000 uniques/day. Currently, the Ask Jeeves search boxes use an affiliate tracker that pays me cost per click - this is the only monetisation on the site. Rather than get loaded on my own site, I'm playing on the theory and spirit of data sharing/services to integrate bits of other sites and allow other sites to integrate bits of mine. This will go beyond "put our quotes on your site". The exact ideas are still being played through in the lab inside my mind. There are currently deals in the pipeline with websites and feed software providers.

Features coming soon

Each quote will have other real-world usability features, such as sending flowers or chocolates with quotes. A major site feature (the whole of QuotationsBook version 2) is "My Quotations Book", where people will be able not only to file their favourite quotations into their own account, but also to write and store their own quotes. They will have their own "favourite quotes" URL that they can then share with friends by email and RSS. Other features are planned that will set the scene for QuotationsBook making the next mega-move - into e-greetings. I have also proposed sharing a message board/forum with another quotations website, but have received no response so far.

Support

I'm indebted to friends for supporting this venture with a vote of confidence in its very early stages, especially Gareth, Hesam, Pearl, Marcus, Ruchi and Josh. Let's hope the traffic here takes off. It's about time really!

Posted by amitkoth at July 31, 2005 01:49 PM