AMNH Internship: Data Analysis
In previous posts, I discussed the projects I am working on and the data collection that goes with them. While the projects might be high tech, observing and collecting the data is done with pen and paper. Once the data is collected, I head back to the computer for the analysis.
The first step in the process is simply to enter all of the data from the observation sheets into a spreadsheet. As the museum, like NYU, runs on Google Apps, half the battle the first time I did the data entry was getting used to Google products after relying on Excel for many years. The other half was turning our observation notes into something computer-readable once they were copied into the spreadsheet. Although I was somewhat aware of the concept, I had never before needed to take a collection of survey responses and assign them to groups for later analysis.
Most of this grouping and analysis work took place on questions related to our learning goals. An example is our goal relating to location. We wanted visitors to be able to point out the location of Haida Gwaii and the other northwest coast Nations on a map. We surveyed the participants in two of our activities: the virtual tour (where visitors controlled a telepresence robot on Haida Gwaii) and the virtual guide (where the curator of the Haida Gwaii museum controlled a telepresence robot at AMNH). We asked them simply, “Where were they (the guide) talking to you from? Can you show us on a map?” and flipped over the observation sheet to reveal a map for them to mark their response.
This method does have some weaknesses, the main one being that if someone cannot read a map, they will likely be unable to show us the location even if they understood from talking to the guide that it is in Canada, to the south of Alaska. It does not appear that this affected the results greatly, but we were cognizant of the issue: if a visitor named the correct location but then circled a location in South America, we counted them as having given a correct response.
When going through responses given by visitors, I created three categories: correct, somewhat correct, and incorrect. While somewhat correct is not part of the binary that correct and incorrect imply, this category captured responses from visitors who clearly had the right idea about the location but either lacked the detailed knowledge or weren’t familiar with that region of the world. In addition, the countries on our map were unlabeled, which increased the difficulty. An example of a “somewhat correct” response is a visitor who circled a location in the Aleutian Islands (a location in the Pacific Ocean technically to the “south of Alaska,” as our guide often explained) or a visitor who circled a large swath of inland British Columbia.
There were occasional responses which broke the rules I had created, such as a visitor who circled the entire country of Canada, encompassing locations I had defined as correct, somewhat correct, and incorrect all at once. After some discussion, this response was categorized as “somewhat correct”: the visitor had the correct idea (Canada) but did not show the necessary specificity. Although this might be too relaxed an approach in the context of a study we would hope to publish, in the context of internal, formative research we had more flexibility.
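The coding rules described above could be sketched as a small decision function. This is purely an illustration: the region labels and the rule structure here are my own hypothetical stand-ins, not the actual codebook we used.

```python
# Hypothetical sketch of the response-coding rules described above.
# Region labels are invented stand-ins for the marks visitors made on the map.

CORRECT_REGIONS = {"haida_gwaii"}
SOMEWHAT_REGIONS = {"aleutian_islands", "inland_bc", "all_of_canada"}

def code_response(circled_region, named_location_correctly=False):
    """Assign a map response to correct / somewhat correct / incorrect."""
    # If the visitor named the right place aloud, we counted the response
    # as correct even if their mark on the map was off.
    if named_location_correctly:
        return "correct"
    if circled_region in CORRECT_REGIONS:
        return "correct"
    if circled_region in SOMEWHAT_REGIONS:
        return "somewhat correct"
    return "incorrect"
```

For example, `code_response("south_america", named_location_correctly=True)` returns `"correct"`, matching the rule about visitors who named the location but circled the wrong continent.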
I followed a similar process for assessing the other educational goals of the project. Responses about the current status of the Haida Nation (whether they exist or not) and the relationship of the Haida Nation with the Hall of Northwest Coast Indians (the hall contains objects from their culture) were also coded. With these categories decided and assigned to responses, we could begin looking for patterns. The first visualizations I created were simply to look at how many participants were achieving the goals we set for them and how many were not. But beyond this, I could look for interesting and surprising correlations.
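That first pass, counting how many participants landed in each category, is simple once the responses are coded. A minimal sketch, with a made-up list of coded responses for illustration:

```python
from collections import Counter

# Made-up coded responses for one learning goal, for illustration only.
coded = ["correct", "somewhat correct", "correct", "incorrect",
         "correct", "somewhat correct", "incorrect", "correct"]

counts = Counter(coded)
total = len(coded)
for category, n in counts.most_common():
    print(f"{category}: {n} ({n / total:.0%})")
```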
At one point the data pointed to a strange correlation between days of the week and visitor responses. One of the tasks I took on in these projects was collecting baseline data about visitors passing through the hall and visiting the Haida alcove. According to the data I collected, there was an odd correlation between the day of the week and the number of correct answers: Wednesday visitors gave noticeably more correct responses than Friday visitors. I collected more data in part because I was curious to see whether the pattern would hold over multiple weeks, but even then there wasn’t much we could do with it. It wasn’t really useful for informing how to implement the activities, and it still showed a deficit in the learning goals which we hope the activities are addressing. In addition, after discussing it with Barry and Hannah, they said it may just be a fluke that would disappear if we had 300 or 1,000 responses instead of just 150.
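Barry and Hannah's point about flukes can be checked with a quick significance test. Here is a minimal sketch using a hand-rolled chi-square statistic for a 2×2 table; the counts are invented for illustration (75 visitors per day, not the study's actual numbers):

```python
# Chi-square test of independence for a 2x2 table, implemented by hand.
# The counts below are invented for illustration, not the study's data.

def chi_square_2x2(a, b, c, d):
    """Rows: Wednesday / Friday; columns: correct / not correct."""
    n = a + b + c + d
    # Standard shortcut formula for a 2x2 contingency table.
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

# e.g. Wednesday: 45 of 75 correct; Friday: 35 of 75 correct.
stat = chi_square_2x2(45, 30, 35, 40)
# With 1 degree of freedom the statistic must exceed ~3.84 to be
# significant at p < 0.05, so this gap would not qualify.
print(round(stat, 2))  # -> 2.68
```

Notably, doubling every cell count doubles the statistic, so the same proportions with 300 responses instead of 150 would cross the significance threshold. That is exactly the intuition behind "it may just be a fluke at this sample size."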
My experiences with processing data have in many ways surprised me, showing me (or reminding me) that I actually like working with numbers and looking for insights in a data set. While I would love to be doing design work in my future career, I think I would also like to be involved in evaluation.