Saint's Log

30 Apr 2011

Setting Up a Tag Database for the Delicious Data Set

One of my assignments this past semester involved running search operations on the tags supplied with the Delicious data set. I created a database named "contentindex", then created a table for the tag information from the XML file using the following SQL:

CREATE TABLE `tag` (
	`docname` VARCHAR(255) NOT NULL,
	`tagname` VARCHAR(255) NOT NULL,
	`weight` INT NOT NULL,
	UNIQUE index_doc_tag_pair_weight (`docname`, `tagname`)
)
ENGINE = InnoDB;

I set up an Eclipse Java project for the assignment. The MySQL Connector/J JDBC driver is required for interaction with the database. Unzip the download and add the driver's JAR file to the Delicious Eclipse project. To do so, right click on the project -> Build Path -> Add External Archives...
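With the driver on the build path, writing a row into the `tag` table could look roughly like the sketch below. This is not the code from the assignment: the class name `TagStore`, the host, the port, and the sample row are all assumptions made for illustration, and the credentials are read from the command line.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper class; the name and layout are illustrative only.
class TagStore {
    // Parameterized insert matching the columns of the `tag` table.
    static final String INSERT_SQL =
        "INSERT INTO tag (docname, tagname, weight) VALUES (?, ?, ?)";

    // Builds a Connector/J style JDBC URL. Host and port are assumptions
    // for a local MySQL server.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    // Inserts one (document, tag, weight) row and returns the affected row count.
    // A duplicate (docname, tagname) pair is rejected by the table's UNIQUE index.
    static int insertTag(Connection conn, String docname, String tagname, int weight)
            throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(INSERT_SQL)) {
            stmt.setString(1, docname);
            stmt.setString(2, tagname);
            stmt.setInt(3, weight);
            return stmt.executeUpdate();
        }
    }

    public static void main(String[] args) throws SQLException {
        // args[0] = MySQL user, args[1] = password (supplied by the caller).
        String url = jdbcUrl("localhost", 3306, "contentindex");
        try (Connection conn = DriverManager.getConnection(url, args[0], args[1])) {
            // Sample values, made up for this sketch.
            insertTag(conn, "example-doc.html", "java", 3);
        }
    }
}
```

Using a PreparedStatement rather than string concatenation keeps tag names containing quotes or other special characters from breaking the statement.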

A list of XML parsers that could be used on the tag info XML file is available on the SAX website. I settled on the Xerces parser. Setup was straightforward: extract the download into a folder, create a corresponding Java project in Eclipse, and add the Xerces project to the Delicious Eclipse project's build path. To do so, right click on the project -> Build Path -> Configure Build Path... -> "Projects" Tab -> Add...
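For a rough idea of what SAX parsing of the tag info looks like, here is a minimal sketch that uses the standard JAXP/SAX API, which Xerces implements. The element and attribute names (`document`, `tag`, `name`, `weight`) are assumptions chosen for illustration and may not match the actual layout of the Delicious XML file; the class name `TagInfoHandler` is likewise hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; the XML layout it expects is an assumption.
class TagInfoHandler extends DefaultHandler {
    // Each row holds {docname, tagname, weight} as strings.
    final List<String[]> rows = new ArrayList<>();
    private String currentDoc;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("document".equals(qName)) {
            // Remember which document the following <tag> elements belong to.
            currentDoc = attrs.getValue("name");
        } else if ("tag".equals(qName) && currentDoc != null) {
            rows.add(new String[] {
                currentDoc, attrs.getValue("name"), attrs.getValue("weight")
            });
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("document".equals(qName)) {
            currentDoc = null;
        }
    }

    // Parses an XML string and returns the collected rows.
    static List<String[]> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TagInfoHandler handler = new TagInfoHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.rows;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input in the assumed layout.
        String xml = "<documents><document name=\"a.html\">"
                   + "<tag name=\"java\" weight=\"3\"/><tag name=\"xml\" weight=\"1\"/>"
                   + "</document></documents>";
        for (String[] row : parse(xml)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Because SAX streams the file and reports elements as it encounters them, the handler only needs to keep the current document name in memory, which suits a large tag file better than building a full DOM tree.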

The source code for the XML tag info parser is available on my GitHub repo.