The last session I am attending at Analytics Camp 2013 is on Making Segmentation Work with Aaron Terry from BCBS of NC. This is completely out of my domain. Marketing analysis is totally foreign to my world.
Aaron Terry of BCBS of NC Discussing Marketing Segmentation
Different groups within an organization may use segmentation, but other groups may not have a clue what to do with the information.
Below is a copy of his hand-out which is Aaron’s work. My notes are mostly next to his handout text.
1. Ways to leverage an existing segmentation.
- Example of market segments: family balance, cooking enthusiasts, health nuts, etc. How many segments are optimal? Five to seven, due to the cost of handling more.
- Roll-ups/sub-segmenting: sub-segmenting is splitting segments into sub-segments (e.g., health nuts into sub-segments by interest). Roll-ups are the opposite of sub-segmenting, i.e., combining segments into larger groups.
- Short form (for inclusion in future surveys). This is a short survey of six or seven questions.
- - Database scoring: score internal data using the same model.
- - Secondary data (such as Acxiom or Experian): you may be able to map your data to external databases and use them in your segmentation.
- Segment profiling (the idea is to profile the segment so you can market better to those people)
- - Crossed by other “segmentations” such as customer type, demographics, etc.
- - Survey data: an expensive way to get data, but used.
- - Internal data: databases you already have but don’t use.
- - Secondary data: you can purchase databases of information.
- Estimating changes in segment sizes over time
- Using segment flags as inputs in modeling
- Other ideas?
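The ideas above (database scoring, sub-segmenting, roll-ups) can be illustrated with a toy sketch. All segment names, fields, and rules below are hypothetical examples, not from the session.

```python
# Toy sketch of segment scoring, sub-segmenting, and roll-ups.
# Every segment name, field, and threshold here is made up for illustration.

def score_segment(record):
    """Assign a segment from internal database fields -- a stand-in for
    'database scoring' with the same model used on survey data."""
    if record["gym_visits_per_month"] >= 8:
        return "health nut"
    if record["cooking_purchases_per_year"] >= 12:
        return "cooking enthusiast"
    return "family balance"

def sub_segment(record):
    """Sub-segmenting: split one segment into finer groups by interest."""
    seg = score_segment(record)
    if seg == "health nut":
        return (seg, "runner" if record.get("runs") else "gym-goer")
    return (seg, None)

# Roll-up: the opposite of sub-segmenting -- map segments to coarser groups.
ROLL_UP = {
    "health nut": "active",
    "cooking enthusiast": "home-focused",
    "family balance": "home-focused",
}

customers = [
    {"gym_visits_per_month": 10, "cooking_purchases_per_year": 2, "runs": True},
    {"gym_visits_per_month": 1, "cooking_purchases_per_year": 15},
    {"gym_visits_per_month": 0, "cooking_purchases_per_year": 3},
]

for c in customers:
    seg, sub = sub_segment(c)
    print(seg, sub, ROLL_UP[seg])
```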
2. Challenges in implementing segmentation
– Finding segment members
– Other Challenges
3. What other questions do you have about segmentation / how to make it work better?
Roger: this was interesting to me since I learned a bit about a domain I was not familiar with. My notes are a bit slim since I was concentrating on understanding the terminology and concepts.
Session 4 I attended at Analytics Camp 2013 was on documenting mathematical models.
Steve Burnett and Melinda Thielbar are the presenters. I moved to the much smaller room.
MT: example of a new employee asking for the documentation of the models her new banking employer uses. They gave her a stack of SAS programs with no instructions on what to run first.
MT: most analysis works on data in rows and columns. Are they continuous, categorical, predictor(?), outcome(?)
The matrix goes into the magic that flows to the answer.
SB: “raw” data is where you start. There might be multiple data tables that get joined together, or a filter that combines multiple tables into a shorter table.
Find the TEDx talk about outliers. Part of the documentation should be how to deal with outliers, along with definitions of terms.
Data cleaning should be documented: what processes are used to remove dirty, outlier, and impossible data points?
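As a minimal sketch of what such documented cleaning might look like, here is one common approach (the bounds and the interquartile-range rule are my illustrative choices, not anything prescribed in the session):

```python
# Minimal sketch of documented data cleaning: drop impossible values,
# then filter outliers with the interquartile-range (IQR) rule.
# The bounds and the 1.5*IQR multiplier are conventional illustrative choices.

def clean(values, lo=0, hi=None, k=1.5):
    """Drop impossible values outside [lo, hi], then IQR-filter outliers.

    Quartiles are taken by simple index (a rough approximation, fine
    for a sketch)."""
    data = sorted(v for v in values
                  if v >= lo and (hi is None or v <= hi))
    n = len(data)
    q1 = data[n // 4]            # approximate first quartile
    q3 = data[(3 * n) // 4]      # approximate third quartile
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if low <= v <= high]
```

Usage: `clean([1, 2, 3, 4, 5, 100, -7])` drops -7 as impossible (negative) and 100 as an outlier. The point is that the rule itself, whatever it is, belongs in the documentation.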
Steve Burnett and Melinda Thielbar Tag Teaming the Session
What does she mean by magic?
Adjacency matrix – discussion within the session.
Melinda does math. “Description of variation in yi”
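I take “description of variation in yi” to refer to the standard sum-of-squares decomposition one would document for a regression model; this reconstruction is mine, not from the session:

```latex
% Total variation in y_i splits into explained and residual parts:
\sum_{i=1}^{n} (y_i - \bar{y})^2
  = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2
  + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
% SST (total) = SSR (explained by the model) + SSE (error)
```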
Documentation of your model is one of the ways you can scale and not iterate one-on-one.
Glossary of terms. You don’t have to define something that you can look up. Overview. Documenting of data preparation. Filtering criteria. Definitions for rows and columns.
Document how to use the answer you get.
Document how you verify the answers you get.
Session 3 of Analytics Camp 2013 for me is Brain Friendly Presentations with Sidd Chopra
Chopra is an experienced speaker with Toastmasters. He once worked at SAS.
Started with several slides from the O-ring analysis of the shuttle booster rockets, ending with the outcome: the Challenger explosion. “PowerPoint makes us stupid.” (Gen. James N. Mattis)
This session is a little different from the “normal” unconference session in that he has prepared well.
Sometimes the visualization fails to tell the story. We have gotten advanced in how we present data, but our brains can’t process it; our brains need explanation. He shows different-size pizzas and asks: which is bigger, and by how much?
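The pizza question is hard to eyeball because area grows with the square of the diameter. A quick check (the sizes here are my own example, not the ones he showed):

```python
# Area grows with the square of the diameter, which is why size
# comparisons are hard to judge visually. Sizes are illustrative.
import math

def pizza_area(diameter):
    """Area of a round pizza of the given diameter."""
    return math.pi * (diameter / 2) ** 2

# A 16" pizza vs a 12" pizza: only 4" wider, but ~78% more pizza.
ratio = pizza_area(16) / pizza_area(12)  # equals (16/12)**2
```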
We spend a lot of money developing products to visualize data, but the flatter graphic may be clearer.
We go back to the question we are asking. How do you decide which type of visualization to use? What decision needs to be made? Who decides that?
Brains focus on things they find interesting. They get analysis paralysis at a molecular level. They are wired to find out what is important first, then the details. We often hide conclusions in order to show the data.
Q: What is the purpose of the brain? A: to control movement.
Example of a sea squirt with a brain set solely on finding a place to land. Once it settles, it eats its own brain.
What do decision makers fear? They don’t like surprises. They want clarity and warnings of threats. They want an Easy Button. Data visualization should provide an easy way to see what they want. We often overload the audience with data dumps. Our objective is to make the decision maker’s life easier.
“I’m only responsible for what I say, not for what you understand.” This is what we have been saying, rather than the opposite.
Decisions need to be made.
But our brains are flawed; they process information differently. Even people who aren’t color blind perceive colors differently.
Biases. Shows the innocence test.
They teach a three-day workshop on how the brain works.
Told the Richard Feynman story about testing the O-rings on the shuttle disaster and how powerful the demo was for everyone. The folks could understand it.
Brains are designed for efficiency, not accuracy.
We are not logical decision makers.
You are competing for mindshare.
Confusion never sells, but simplicity does.
Session 2 I attended was a panel discussion on Data Visualization. There was some confusion in that there really wasn’t a defined leader at first. What happened next is part of the unconference magic. People started sharing their experiences and at least 6-8 people were giving some really good information.
Spotfire was mentioned as well as Tableau. Tableau was thought to be a little difficult to share with clients. Tableau’s online demos were good and can be shared via the cloud, though you have to be careful not to use confidential data.
SAS was represented, so there was discussion of their visualization applications. There seem to be a lot of new applications coming out.
JMP (from SAS) was discussed.
What are some of the free data visualization software packages? You can google for those and there are some good ones.
Many Eyes from IBM was mentioned with the caveat that you have to upload your data so proprietary data is problematic.
Discussion of medical data and of depending on difficult data to make decisions. There are many problems, including the difficulty of doing no harm. One issue is what information they had before.
Discussion of what types of graphics will actually present your idea the best. Stephen Few and Edward Tufte. Tufte does one-day workshops.
“Don’t show your client something that you can’t do in crayon.”
How to deal with multiple axes with different scales. It can be misleading.
Book: The Back of the Napkin was mentioned.
Q: How do you validate data visualization graphics? A: validate the data before you visualize. Also, presentation can affect what people see in the visualization. They might interpret the graphics and come to conclusions that might not follow the data. Of course, the visualization may also give you insight that you can’t perceive from looking at the table of data.
Wordle was mentioned as a word visualization tool.
1st Session: Data Built to Last with Melinda Thielbar
She’s a data scientist in the RTP area.
Three things: Actionable, Verifiable, Repeatable.
1. I wish I knew this. This is what I would do if I knew this.
Who is fraudulent? How can I stop them?
2. What do I know now?
Technical process is taking the thing you wish you knew and turning it into what to do now.
You need a feedback loop from what you wish you knew and what you know now. (agile process methods?)
Take action and see if it worked. First verification should be cheap before you go wild.
A/B testing: a cheap way to run experiments.
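One common cheap verification is a two-proportion z-test on an A/B experiment. The test itself is standard; the conversion counts below are made-up numbers for illustration.

```python
# Cheap verification via A/B test: a two-proportion z-test on
# conversion counts. All counts here are made up for illustration.
import math

def two_prop_z(conv_a, n_a, conv_b, n_b):
    """z statistic for H0: conversion rates of A and B are equal."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)               # pooled rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # pooled std. error
    return (p_b - p_a) / se

# 20.0% vs 26.0% conversion on 1000 users each:
z = two_prop_z(200, 1000, 260, 1000)  # |z| > 1.96 -> significant at 5%
```

If |z| exceeds 1.96, the difference is unlikely to be chance at the 5% level, which is the kind of cheap check worth running before "going wild."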
Build a process for the end user so you don’t have to babysit them.
Repeatable: use programmers to build something that is repeatable.
The Endeavor Blog is a good resource.
A learning resources page is on the wiki. There is a Coursera class on data analytics.
Follow hmason (Hilary Mason) on Twitter.
openIntro.org/stat starts you with statistics.
More resources and web sites on the analytics camp wiki.
In January 2012, I attended Science Online 2012 which is an unconference for scientists, science bloggers, and science writers. I had followed the conference for years, but could never get in or it wouldn’t fit in my schedule. This year, I was able to attend and it was a blast. I learned a lot and scored a lot of science books in the book lottery.
Bora, one of the organizers, asked a number of attendees to fill out an interview for the Scientific American Blogs as a way for people to get to know other attendees. I submitted my interview and he published it on Monday, March 18th. It was fun to write, and I hope I get to talk to a lot more people at Science Online 2013 because I definitely want to return.
Dean Peters is an analytics API junkie. He works for McClatchy (owner of the News & Observer). He is the son of the author of Web Pages That Suck.
Most Popular Pages Widgets. You see it a lot on news media sites.
Where do they get the data? Google Analytics and Omniture.
You can do this with your blog. You need an analytics service with an API like Google Analytics. You need a programming language. He uses Perl.
Showed the GA dashboard. Showed Standard reporting and Custom reporting.
Why do you need an API? Exporting from the dashboard is a tedious and unsustainable process. Using an API lets you automate the process. Using a programming language lets you fine-tune the data and reports you need.
Programmatic access to Google Analytics report data and stats.
Dimensions and Metrics
Dimensions are rows of data.
Metrics are columns of data.
You need to be authenticated (gmail id and password)
You need a Profile ID not the UA key
You need to filter the data.
Coding for the API
With Perl we use Net::Google::AuthSub (authentication), Net::Google::Analytics (querying with the Profile ID), and XML::FeedPP (parsing feeds).
Showed the Perl code and discussed what its parts did. He will post the code afterward.
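To make the dimensions/metrics and Profile ID points concrete, here is a Python sketch of the kind of "most popular pages" query such a script builds against the (legacy, circa-2013) Google Analytics Core Reporting API v3. The profile ID, dates, and metric/dimension names are placeholders, and authentication (the OAuth token) is omitted.

```python
# Sketch of a "most popular pages" query against the legacy Google
# Analytics Core Reporting API v3. Profile ID and dates are placeholders;
# authentication is omitted.
GA_ENDPOINT = "https://www.googleapis.com/analytics/v3/data/ga"

def build_query(profile_id, start, end, metrics, dimensions, max_results=10):
    """Return the query parameters for a pageview report."""
    return {
        "ids": f"ga:{profile_id}",           # Profile ID, not the UA key
        "start-date": start,
        "end-date": end,
        "metrics": ",".join(metrics),        # metrics are the columns
        "dimensions": ",".join(dimensions),  # dimensions are the rows
        "sort": "-" + metrics[0],            # most popular first
        "max-results": max_results,
    }

params = build_query("12345678", "2013-03-01", "2013-03-31",
                     ["ga:pageviews"], ["ga:pagePath", "ga:pageTitle"])
# An HTTP GET of GA_ENDPOINT with these params (plus an auth token)
# would return the rows for a most-popular-pages widget.
```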