Online Advertisement

Analyzing the cost of bad online advertisements through empirical studies on Amazon MTurk

Aug, 2017 –

The Amazon MTurk research project is conducted under the guidance of Dr. Vilma Todri from the Department of Information Systems and Operations Management, Goizueta Business School.

Based on Dr. Goldstein’s work, which argues that annoying online advertisements can cost publishers more money than they earn, we would like to take a closer look at how ad placement and page layout affect viewers’ online experience by setting up task experiments on Amazon MTurk, a crowdsourcing marketplace for human intelligence tasks.

I’m working on designing a classification task on MTurk that uses the Enron email database while displaying random page layouts and good/bad advertisements in the sidebars. This project involves using HTML and JavaScript (and maybe Ruby?) to design the desired MTurk HIT page and (possibly) integrating a SQL database to store the emails.

Sample HIT Task


  • The Enron database itself is neither processed nor categorized.
  • The email content is raw: the text needs pre-processing, and private information has not been removed. MTurk also doesn’t allow non-UTF-8 characters.
  • Allowing randomization of ad types (Medium Rectangle, Leaderboard, Skyscraper), page layout (ads on the left or right sidebar, in the header or footer, or within the text on the left or right; ad or no ad), and HIT bonus (a random bonus given to the worker for completing the classification task).
  • Building our own database of good/bad and animated/still advertisements.
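
To make the randomization item above more concrete, here is a minimal Python sketch of drawing one random condition per HIT page. The ad types and positions come from the list above; the bonus amounts, the field names, and the helper `draw_condition` are placeholders invented for illustration, not the values used in the actual study.

```python
import random

# Ad formats and page positions named in the list above.
AD_TYPES = ["Medium Rectangle", "Leaderboard", "Skyscraper"]
AD_POSITIONS = [
    "none",  # control condition: no ad on the page
    "sidebar-left", "sidebar-right",
    "header", "footer",
    "in-text-left", "in-text-right",
]
# Hypothetical bonus levels in US dollars; the real amounts are not stated.
BONUS_LEVELS = [0.00, 0.05, 0.10]


def draw_condition(rng):
    """Draw one random experimental condition for a single HIT page."""
    position = rng.choice(AD_POSITIONS)
    # An ad type is only meaningful when an ad is actually shown.
    ad_type = rng.choice(AD_TYPES) if position != "none" else ""
    return {
        "ad_position": position,
        "ad_type": ad_type,
        "bonus": rng.choice(BONUS_LEVELS),
    }


rng = random.Random(2017)  # fixed seed so the assignment can be reproduced
conditions = [draw_condition(rng) for _ in range(5)]
for condition in conditions:
    print(condition)
```

Passing a seeded `random.Random` instance keeps the assignment reproducible, which helps when the conditions later need to be matched back to worker responses.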


  • Thanks to the UCB Enron Email Analysis project, we have a labeled, semi-processed subset of the email database.
  • Preprocess the email database: clean up the format, remove all Enron-related words, and use a UTF-8 decoder to clean up the content. Then write a function that randomly matches each page with an advertisement.
  • Since MTurk only takes a CSV file as input, write a JavaScript script that reads the page’s random settings (position, image location, image URL) from the CSV file, so that we can manually control the randomness of the webpage.
  • Write a scraper to collect GIF ads from designated websites and generate a JPEG version of each. Upload all these images to the WP media library to get permanent URLs. Then set up a second HIT to have users rate how annoying these ads are.
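
The preprocessing and CSV steps above could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the column names (`email_text`, `ad_url`, `ad_position`), the example URLs, the single-word Enron filter, and the helpers `clean_email` and `write_hit_csv` are all hypothetical. It also assumes each CSV column is exposed to the HIT template as a variable of the same name.

```python
import csv
import io
import random
import re

# Hypothetical filter: the real list of Enron-related words is not given.
ENRON_WORDS = re.compile(r"\benron\b", re.IGNORECASE)


def clean_email(raw_bytes):
    """Decode as UTF-8, drop undecodable bytes, remove Enron words, tidy spaces."""
    text = raw_bytes.decode("utf-8", errors="ignore")
    text = ENRON_WORDS.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def write_hit_csv(emails, ad_urls, positions, rng, out):
    """Write one CSV row per email, each paired with a random ad and position."""
    fields = ["email_text", "ad_url", "ad_position"]  # assumed column names
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for raw in emails:
        writer.writerow({
            "email_text": clean_email(raw),
            "ad_url": rng.choice(ad_urls),
            "ad_position": rng.choice(positions),
        })


buffer = io.StringIO()
write_hit_csv(
    emails=[b"Meeting at Enron HQ \xff tomorrow", b"Lunch\r\non Friday?"],
    ad_urls=["https://example.com/ad1.jpg", "https://example.com/ad2.gif"],
    positions=["sidebar-left", "header"],
    rng=random.Random(0),  # fixed seed so the pairing can be reproduced
    out=buffer,
)
print(buffer.getvalue())
```

Doing the random pairing offline and storing it in the CSV means the page script only has to read its settings, and the exact condition shown to each worker stays on record.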