Interesting book on how humans make immediate decisions without a lot of obvious information. It opened up a lot of interesting ideas to pursue and offers a number of links to other books I want to read.
In January 2012, I attended Science Online 2012, which is an unconference for scientists, science bloggers, and science writers. I had followed the conference for years, but either I could never get in or it wouldn’t fit in my schedule. This year, I was able to attend and it was a blast. I learned a lot and scored a lot of science books in the book lottery.
Bora, one of the organizers, asks a number of attendees to fill out an interview for blogging on the Scientific American Blogs as a way for people to get to know other attendees. I submitted my interview and he published it on Monday, March 18th. It was fun to write and I hope I get to talk to a lot more people at Science Online 2013 because I definitely want to return.
I played high school football in western North Carolina near the small town of Hickory. It isn’t so small anymore, and it is not as backwoods as it was in my youth. Instead of the mill town feel, it has a more metropolitan feel. But the town I remember from my youth didn’t even have a pizza parlor until I left for college.
My father always wanted me to be an athlete so I played a good many sports when I was young. He wanted me to get a scholarship for college in football or something. Unfortunately, my talents lay elsewhere. I wasn’t exactly first stringer material and only got in the game a few times from the time I was a freshman until my junior year. (I finally realized as a senior that I could quit since I really wanted to write on the school paper more than play sports.)
While I was playing, I ended up playing center on offense and interior lineman or linebacker on defense. I played on the second string. Since we were in the foothills of North Carolina, there were a number of high schools with coaches who had been taught under Clarence Stasavich of Lenoir-Rhyne College. This meant they played single-wing offense which was taught by Coach Stasavich religiously. (No pun intended as LR is a Lutheran college.)
Playing single-wing is something that I will never forget. It is an intricate dance by the offensive team that is a thing of beauty when done well. Blocking, faking, laterals, and feinting are all done with beautiful precision. I loved going to LR games to watch them as they were some of the best. Others such as Appalachian State and East Carolina played it, but never to the precision of LR in the good years.
As center, you had to long snap every play to one of three different backs depending on the play called. Once snapped, the linemen would do elaborate cross blocking and pulling which meshed with the dance that the blocking back and the fullback performed with the wing back. All this movement was choreographed in a way that defenses had to be very good or they would be left behind in a confused state. There were times when our second string group would score on the first string defense since we got good at running the offense. The only thing that comes close today is the wishbone offense, with the many different ways you can run out of it.
The demise of the single wing came fairly quickly as I remember. People just weren’t able to sustain the high level of talent needed to run it. It is hard to keep a cohesive unit together and it is very hard to coach. Defenses also evolved into more than the five man front that was played at the time. Stasavich was a master though and I will never forget watching his LR teams on Saturday nights with my Dad. I’ll also never forget playing the little bit of it in those three short years.
Tim Ross is giving an intro to big data systems such as Hadoop, plus NoSQL systems.
Hadoop was inspired by papers Google published on its distributed data systems; much of its early development happened at Yahoo!.
Apache Hadoop is a distributed framework for data processing and storage.
Google’s Map-Reduce paper describes the process. Tim explained how the map-reduce framework works by distributing the data across a number of lower powered commodity servers. Map-reduce processes the data on each server and collapses it to another server which then sends to the consumer of the data. Splitting up the data allows much faster, concurrent data access.
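The flow Tim described can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Hadoop code: the `mapper`, `reducer`, and `run_job` names and the two sample shards are made up for the example, and the "servers" are just lists in memory.

```python
from collections import defaultdict

def mapper(line):
    """Map step: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    """Reduce step: collapse all the values for one key into a total."""
    return word, sum(counts)

def run_job(shards):
    """Run the mapper over each shard, group the pairs by key, then reduce."""
    grouped = defaultdict(list)
    for shard in shards:  # each shard stands in for one commodity server
        for line in shard:
            for word, count in mapper(line):
                grouped[word].append(count)  # group values by key (the shuffle)
    return dict(reducer(word, counts) for word, counts in grouped.items())

# Hypothetical data, split across two "servers".
shards = [
    ["big data is big"],
    ["data systems process data"],
]

totals = run_job(shards)
print(totals["data"], totals["big"])
```

In a real cluster the mappers run at the same time on different machines, which is where the speedup comes from. Here they run one after another just to show the shape of the job.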
Hadoop is written in Java. It is open source. Many use cases are batch.
HBase is a NoSQL database that is gaining traction. Modeled on Google’s Bigtable. Partition tolerant. Can be used with Hadoop.
LexisNexis open-sourced HPCC, marketed as a Hadoop killer.
Conrad evaluated Pentaho. Analysis and reporting tools. Frontend tools resource. Focuses on analysis. Strength is building the cubes for processing. Uses the MDX language (sort of like SQL) for processing.
Business Intelligence was mentioned.
Tim used a presentation he did in the past on Hadoop. When you write it, you write a mapper and a reducer. He used it on a genomics project.
IRODS was mentioned.
Tim showed the Yahoo! Developer Network site “Module 2: The Hadoop Distributed File System”.
Column oriented RDBMS systems mentioned.
Melinda Thielbar discussing data science in the 1pm session.
Talks about using a distribution curve of % of customers versus $ Sales as an example.
Most people underestimate the difference between the mean and the median.
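A quick way to see the gap is a skewed sales list. The numbers below are invented for illustration and are not from Melinda's talk; the point is only that one very large customer drags the mean far away from what a typical customer spends.

```python
from statistics import mean, median

# Hypothetical sales per customer in dollars: nine small buyers, one very large one.
sales = [20, 25, 30, 35, 40, 45, 50, 60, 75, 5000]

mean_sales = mean(sales)      # pulled upward by the single large customer
median_sales = median(sales)  # the middle of the distribution

print(f"mean:   ${mean_sales:,.2f}")
print(f"median: ${median_sales:,.2f}")
```

Nine of the ten customers spend less than the mean here, which is why a single summary number can mislead.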
What does a data scientist do? Each part of being a data scientist actually is a scientific discipline in itself.
Many get caught up with having a single number that describes something, but it is the complexity of the data within the curve that is the important part.
Many times filtering of complex data will exclude information that may provide trends.
You can have success in analyzing data that others are excluding.
Lots of discussion on interactions of data scientists and other scientists with other disciplines as well as with each other. Can people from different worlds interact with respect? Can they communicate?
Data scientists understand randomness and understand it in all its forms and where it comes from.
A better data set is better than a bigger dataset. Discussion of experimental design. Different algorithms and tools can be used to reduce the number of data points. There is a lot of stuff in control theory.
Jim Hazen from SAS led the session on introduction to Google Analytics.
Concepts in analytics. Described the triangle concept of Hits, Page Views, Visits, Visitors, and Individuals.
Use of a permanent cookie to determine a specific visitor. Use of authentication to determine a specific visitor.
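As a rough sketch of how those counts narrow as you move through the triangle, here is a toy log in Python. The record layout (`cookie_id`, `visit_id`, `is_page_view`) is an assumption made up for this example and is not how Google Analytics stores its data. Individuals are left out because a cookie alone cannot tell you whether two browsers belong to the same person.

```python
# Each hit: (cookie_id, visit_id, is_page_view). Hypothetical records.
hits = [
    ("cookie-a", 1, True),
    ("cookie-a", 1, False),  # an image or script request, not a page view
    ("cookie-a", 1, True),
    ("cookie-a", 2, True),   # the same visitor coming back later
    ("cookie-b", 1, True),
    ("cookie-b", 1, False),
]

total_hits = len(hits)
page_views = sum(1 for _, _, is_page in hits if is_page)
visits = len({(cookie, visit) for cookie, visit, _ in hits})
visitors = len({cookie for cookie, _, _ in hits})

print(total_hits, page_views, visits, visitors)
```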
Time on site explanation. Interesting idea of how they determine how much time people spend on a site. The last page they visited is indeterminate. Only n-1 pages can be measured since they are measuring the difference between visit timestamps.
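The n-1 point is easy to see in code. This is a minimal sketch assuming a visit is just an ordered list of page requests with timestamps; the `time_on_pages` function and the sample visit are hypothetical and are not taken from Google Analytics.

```python
from datetime import datetime

def time_on_pages(page_hits):
    """Return (page, seconds) pairs; the last page gets None since no later timestamp exists."""
    durations = []
    for (page, ts), (_, next_ts) in zip(page_hits, page_hits[1:]):
        durations.append((page, (next_ts - ts).total_seconds()))
    if page_hits:
        durations.append((page_hits[-1][0], None))
    return durations

# Hypothetical visit: three page views in order.
visit = [
    ("/home", datetime(2012, 3, 1, 10, 0, 0)),
    ("/products", datetime(2012, 3, 1, 10, 2, 30)),
    ("/contact", datetime(2012, 3, 1, 10, 3, 0)),
]

durations = time_on_pages(visit)
measured = sum(seconds for _, seconds in durations if seconds is not None)

for page, seconds in durations:
    print(page, "unknown" if seconds is None else f"{seconds:.0f}s")
print(f"measured time on site: {measured:.0f}s")
```

However long the visitor stayed on the last page, it never shows up in the measured total.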
No absolute in web analytics, this isn’t finance. Trends are more important than absolutes.
Much on the Google Analytics main page is crap. It is based on older web analytics. You have to drill down more to answer any questions about it.
Shows Visitors Overview page graphics by day, month, year. Showed location data using SAS site as the example. Showed how to use the filters to get rid of the noise.
Interesting idea for looking at what types of devices are accessing your site. Is mobile important? You can answer the question about how to divert resources for mobile development versus web development.
People are bidding for you when you search Google. Google is asking for bids for ads from vendors when you search.
When you click on an ad link at the top of the search, someone pays Google a specific amount. Google is optimizing it for their benefit. All the ads have tracking codes.
Organic search items are below (in white) under the paid search items.
The Sources page is @DeanPeters’ (SAS) favorite report because it tells you how people are getting to your site.
Very interesting discussion that had to be cut short.
This post is inspired by a session at Science Online 2012 that I attended in January 2012. The session was on the information overload that many of us endure in our personal and professional lives. There is a constant stream of tweets, email messages, IMs, Google+ posts, Facebook, etc. Some people become engrossed by all the information and it destroys productivity. I have dealt with email for a long time and have looked into ways to get a handle on it.
I compare the problem to that of a sick and bleeding patient. You deal with the critical thing first. The first thing to consider is to stop the bleeding rather than try to cure the illness. Minimizing the incoming stream can be accomplished in most cases. First, analyze who is sending you messages so you have some idea how you can manage the flood. Most email is from people on your workgroup, friends & family, email distribution lists, and miscellaneous senders.
For project teams or workgroups, there are effective techniques for managing email. Frankly, good management will have developed a communications plan even if it is loosely defined. It is important for a manager to set up rules of engagement for communications. Put together a communications plan where team members understand when to email and when to use other communications.
For example, a manager may want to limit email to weekly updates (daily in some cases.) Individuals may be assigned to deliver their input to one person and that person might send one message a week to the team. I see a lot of people in a constant email stream defining work assignments. These can be multiple messages where one answer ends up causing additional questions. Suddenly, you have 10-20 messages to define simple tasks. Email probably isn’t the best way to handle these types of problems, but I see this happening all the time. Use the telephone or walk down to the person rather than have a long email stream back and forth.
Carbon copy is one of the most abused parts of email systems. Each person should think about who really needs to be copied on messages. There are very few times when everyone on a distribution list needs to have a copy of a message between two other people. This means that all those people need to manage the message with no gain in productivity. Don’t do it.
The email system should not be a file storage where people attach documents and send them back and forth. That is what file systems do. Also, FTP, wikis, and all sorts of collaboration tools are better ways of handling the file distribution problem. The versioning issue with files is also a big problem when people email copies of files around for review or editing. It can be very hard to determine the latest version and the correct version of a document.
A manager should also consider other forms of communication. A wiki or blog system may be a better mechanism for a team. Those will allow other team members to see Q&A and not have to repeat the same discussion on another email thread. Collaboration tools should be used for collaboration rather than having that stuffed into a messaging system.
Getting the communication plan in place at the start of a project is a good way to limit the number of messages you have to manage. After a few projects, you will notice a reduction in the number of messages due to frequency and the number of CCs that you get. The idea is to reduce the number of messages in total and only message when information is useful rather than CYA which happens a lot in business.
Business people should also consider the sources of their messages and ways to compartmentalize groups of messages into separate folders. Most email systems allow you to filter messages by a number of criteria. You can send email lists to specific folders. You can send messages from the boss or special accounts into the “important” folder. Family and personal messages can be sorted to a “Personal” folder automatically by the sender’s address. Email distribution list messages can be sorted to different folders. Alerts from machines and systems that many of us receive can be moved to an “Alerts” folder. Getting things out of the Inbox and into specific folders allows you to deal with the messages appropriately (or not at all.)
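Most mail clients let you set these rules up through a settings screen, but the logic behind them is simple enough to sketch in Python. Everything here is hypothetical: the addresses, the folder names, and the `choose_folder` function are made up to show the idea of ordered rules where the first match wins.

```python
# Ordered rules: (test on the message, destination folder). The first match wins.
RULES = [
    (lambda m: m["from"] == "boss@example.com", "Important"),
    (lambda m: m["from"].endswith("@family.example"), "Personal"),
    (lambda m: m["from"].startswith("alerts@"), "Alerts"),
    (lambda m: m.get("list_id") is not None, "Lists"),
]

def choose_folder(message):
    """Return the folder for a message, falling back to the Inbox."""
    for matches, folder in RULES:
        if matches(message):
            return folder
    return "Inbox"

# Hypothetical incoming messages.
inbox = [
    {"from": "boss@example.com", "subject": "Budget"},
    {"from": "mom@family.example", "subject": "Dinner Sunday?"},
    {"from": "alerts@monitor.example.com", "subject": "Disk 90% full"},
    {"from": "someone@example.org", "list_id": "geeks.lists.example.org", "subject": "Re: Re: Re:"},
    {"from": "stranger@example.net", "subject": "Hello"},
]

folders = [choose_folder(message) for message in inbox]
print(folders)
```

Putting the boss rule first means a message from the boss lands in “Important” even if it also arrives through a distribution list.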
If you find that you don’t read all those messages from professional email lists, you should consider unsubscribing from them and using the web based system to peruse messages. That gets the messages out of your system, but they are still available on-line at your leisure. Some of the geek lists I received would have hundreds of messages a day. No one can deal with a job and deal with lists like that.
Unless you are told by your management that you must be available in real time, consider turning off your email client for most of the day. Check email two to four times a day at specific times. Some people do it twice at 11am and 4pm. Obviously, this doesn’t work for everyone and it should be cleared by your management. This goes to another issue of time management, but it is very much related. That’s another blog post though.
The conclusion is that the best way to deal with email is to not get it in the first place. Determine a communications plan for any teams that you manage. Figure out ways to reduce message count. Delete messages viciously. Sort and file any incoming messages and plan time to deal with them. If you see that you don’t deal with them over time, maybe you should look at why you are getting the messages in the first place. Life is too short to worry about missing something!