
Archive for the ‘Meeting Notes’ Category

TACFUG Meeting Notes 2012-03-15

March 16, 2012

Eleven of us met at Open Intelligence in Raleigh on March 15th for the monthly TACFUG meeting. Anant Pradhan gave a presentation titled Design Patterns for Everyday Use, a preview of his talk at CF.Objective in May.

Anant is a young developer at UNC-CH who is taking graduate classes there while working. He started with a general description of design patterns, explaining that they are best practices rather than recipes for code. The same problems come up again and again in programming, and similar solutions have been found for them. These patterns have been organized into types including creational, structural, behavioral, and concurrency. Anant explained that only the first three are really used these days.

He went on to describe the three types in detail, listing a number of subtypes. I won't give away any spoilers, so you will need to go to his talk in Minnesota to get all the good stuff. He is still working on parts of his talk, especially code examples that would help explain some of the material. It looks like this is going to be a good session for folks who didn't grow up using Java.

There was discussion among the group after Anant's preso, and he got a lot of good feedback on pretty much everything. Dan Wilson mentioned a book that helped him get started with design patterns: Refactoring to Patterns by Joshua Kerievsky.

Please send me any changes or comments on my notes. I write them up so I remember them.

Analytics Camp 2012 Big Data Intro and Roundtable

February 25, 2012

Big Data session at Analytics Camp 2012

Tim Ross is giving an introduction to big data systems such as Hadoop, as well as NoSQL systems.

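Hadoop grew out of the distributed data processing ideas Google published in its MapReduce and Google File System papers; the open-source implementation itself was developed largely at Yahoo!.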

Apache Hadoop is a distributed framework for data processing and storage.

Google's MapReduce paper describes the process. Tim explained how the MapReduce framework works by distributing the data across a number of lower-powered commodity servers. The map step processes the data locally on each server, and the reduce step collapses those intermediate results on another server, which then sends the output to the consumer of the data. Splitting up the data allows much faster, concurrent data processing.
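Tim didn't show code for this part, so the following is only a minimal single-process sketch of the map, shuffle, and reduce flow, using word counting as an assumed example workload. Each inner list stands in for the data held on one commodity server; a real Hadoop job runs these steps in parallel on separate machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(records: Iterable[str]) -> List[Tuple[str, int]]:
    """Runs on one server: emit (word, 1) for every word in its local records."""
    return [(word.lower(), 1) for line in records for word in line.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group the intermediate pairs from every server by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Collapse each key's list of values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def map_reduce(partitions: List[List[str]]) -> Dict[str, int]:
    # Each partition plays the role of the data stored on one server.
    mapped = [map_phase(partition) for partition in partitions]
    return reduce_phase(shuffle(mapped))
```

For example, `map_reduce([["big data big servers"], ["data on commodity servers"]])` counts `big`, `data`, and `servers` twice each even though their occurrences are spread over two partitions.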

Hadoop is written in Java. It was open sourced. Many use cases are batch processing.

HBase is a NoSQL database that is gaining traction. It is modeled on Google's Bigtable, is partition tolerant, and runs on top of Hadoop's HDFS.

HPCC: LexisNexis open sourced HPCC and marketed it as a Hadoop killer.

Conrad evaluated Pentaho, a set of front-end analysis and reporting tools focused on analysis. Its strength is building cubes for processing, and it uses the MDX query language (sort of like SQL) for processing.
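Pentaho's actual cube engine and its MDX queries aren't shown here. As a rough, hypothetical illustration of what "building the cube" means, this sketch pre-aggregates a few made-up sales rows for every combination of dimensions, with `*` standing for "all values", so any roll-up becomes a lookup instead of a scan. The dimension names and rows are invented for the example.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical fact rows: (region, product, year, sales)
FACTS: List[Tuple[str, str, int, float]] = [
    ("East", "Widgets", 2011, 100.0),
    ("East", "Gadgets", 2011, 50.0),
    ("West", "Widgets", 2012, 75.0),
]
DIMENSIONS = ("region", "product", "year")


def build_cube(facts: List[Tuple[str, str, int, float]]) -> Dict[Tuple, float]:
    """Pre-aggregate sales for every subset of dimensions; '*' means 'all'."""
    cube: Dict[Tuple, float] = defaultdict(float)
    for *dims, sales in facts:
        # Add this row's sales to every cell it contributes to:
        # keep the chosen dimensions, roll the others up to '*'.
        for r in range(len(DIMENSIONS) + 1):
            for keep in combinations(range(len(DIMENSIONS)), r):
                key = tuple(d if i in keep else "*" for i, d in enumerate(dims))
                cube[key] += sales
    return dict(cube)


cube = build_cube(FACTS)
# cube[("*", "*", "*")] is the grand total; cube[("East", "*", "*")] is East's total.
```

The trade-off is the usual one for cubes: more storage and up-front work in exchange for instant answers to slice-and-dice questions.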

Business Intelligence was mentioned.

Tim used a presentation he had given in the past on Hadoop. When you write a Hadoop job, you write a mapper and a reducer. He used Hadoop on a genomics project.
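The genomics code itself wasn't shown, so here is a hypothetical mapper/reducer pair in the style of Hadoop Streaming, which runs any executable that reads lines on stdin and writes tab-separated key/value lines on stdout. The example counts k-mers (length-3 DNA substrings); the value of `K` and the script name used below are assumptions for illustration.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator

K = 3  # hypothetical k-mer length, chosen only for illustration


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'kmer<TAB>1' for every length-K substring of each sequence line."""
    for line in lines:
        seq = line.strip().upper()
        for i in range(len(seq) - K + 1):
            yield f"{seq[i:i + K]}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for kmer, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{kmer}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run as "map" or "reduce"; the framework sorts mapper output by key
    # before handing it to the reducer.
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

A common way to test this kind of job locally is to let `sort` stand in for Hadoop's shuffle: `cat seqs.txt | python kmers.py map | sort | python kmers.py reduce`.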

iRODS was mentioned.

Tim showed the Yahoo! Developer Network site "Module 2: The Hadoop Distributed File System".

Column-oriented RDBMSs were mentioned.

There is a Hadoop user group that meets in Durham. There is also a Triangle Pentaho user group on Meetup.com.

Analytics Camp 2012 Data Science

February 25, 2012

Data Science

Melinda Thielbar is discussing data science in the 1 pm session.

She uses a distribution curve of the percentage of customers versus dollar sales as an example.

Example curve dealing with mean vs median

Most people underestimate how far apart the mean and median can be.
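A quick way to see the gap is a right-skewed sales distribution like the one on the curve. The numbers below are made up for illustration: most customers spend a little and one customer spends a lot, so the single large order drags the mean far above what a typical customer spends.

```python
import statistics

# Hypothetical per-customer sales in dollars (right-skewed: one big customer).
sales = [10, 12, 15, 18, 20, 25, 30, 40, 60, 2000]

mean = statistics.mean(sales)      # pulled up by the single large order
median = statistics.median(sales)  # what the "middle" customer spends

print(f"mean = ${mean:.2f}, median = ${median:.2f}")
```

Here the mean is about ten times the median, and neither number alone describes the shape of the curve.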

What does a data scientist do? Each part of the job is actually a scientific discipline in its own right.

Many people get caught up in finding a single number that describes something, but the important part is the complexity of the data within the curve.

Filtering complex data often excludes information that may reveal trends.

You can succeed by analyzing the data that others are excluding.

Lots of discussion about how data scientists and other scientists interact with other disciplines and with each other. Can people from different worlds interact with respect? Can they communicate?

Data scientists understand randomness in all its forms and where it comes from.

A better dataset beats a bigger dataset. Discussion of experimental design. Different algorithms and tools can be used to reduce the number of data points. Control theory has a lot to offer here.

Analytics Camp 2012 Introduction to Google Analytics

February 25, 2012

Jim Hazen from SAS led the session on introduction to Google Analytics.

Jim Hazen from SAS talking about basic web site analytics

Jim Hazen at Analytics Camp 2012

Concepts in analytics. He described the triangle concept of hits, page views, visits, visitors, and individuals.

A persistent cookie can be used to identify a specific visitor, and so can authentication.
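To make the levels of the triangle concrete, here is a hypothetical sketch that counts hits, page views, visits, and visitors from a tiny log. It assumes each request carries a persistent cookie ID and that a visit ends after 30 minutes of inactivity; both the record layout and the timeout are assumptions for illustration, not how Google Analytics is implemented. Individuals can't be counted from cookies alone, since one person using two browsers shows up as two visitors.

```python
from dataclasses import dataclass
from typing import Dict, List

SESSION_TIMEOUT = 30 * 60  # assumed: seconds of inactivity that end a visit


@dataclass
class Hit:
    cookie_id: str  # persistent cookie identifying a browser ("visitor")
    timestamp: int  # seconds since some starting point
    is_page: bool   # False for images, scripts, stylesheets, etc.


def summarize(hits: List[Hit]) -> Dict[str, int]:
    page_views = [h for h in hits if h.is_page]
    visits = 0
    last_seen: Dict[str, int] = {}
    for h in sorted(page_views, key=lambda h: h.timestamp):
        prev = last_seen.get(h.cookie_id)
        # A new visit starts on a visitor's first page view or after a long gap.
        if prev is None or h.timestamp - prev > SESSION_TIMEOUT:
            visits += 1
        last_seen[h.cookie_id] = h.timestamp
    return {
        "hits": len(hits),  # every request, including images
        "page_views": len(page_views),
        "visits": visits,
        "visitors": len({h.cookie_id for h in hits}),
    }


log = [
    Hit("a", 0, True), Hit("a", 1, False), Hit("a", 1, False),  # page + 2 images
    Hit("a", 120, True),
    Hit("a", 4000, True),  # more than 30 minutes later: a second visit
    Hit("b", 50, True),
]
```

Each level is smaller than the one above it: `summarize(log)` yields 6 hits, 4 page views, 3 visits, and 2 visitors.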

He explained how time on site is measured, which was an interesting idea. Time is computed from the differences between page-view timestamps, so the time spent on the last page of a visit is indeterminate: only n-1 of the n pages can be measured.
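The n-1 rule falls straight out of the arithmetic. In this minimal sketch (the function names are hypothetical), each page's duration is the gap until the next page-view timestamp, so the last page gets no value; a one-page visit therefore measures as zero seconds.

```python
from typing import List, Optional


def time_on_pages(timestamps: List[int]) -> List[Optional[int]]:
    """Seconds spent on each page of one visit, from page-view timestamps.

    The last page has no following timestamp, so its duration is None.
    """
    ts = sorted(timestamps)
    gaps: List[Optional[int]] = [b - a for a, b in zip(ts, ts[1:])]
    return gaps + [None] if ts else []


def time_on_site(timestamps: List[int]) -> int:
    """Measured visit length: only the n-1 known gaps are counted."""
    return sum(d for d in time_on_pages(timestamps) if d is not None)
```

For page views at 0, 30, 150, and 160 seconds, the per-page times are 30, 120, 10, and unknown, and the measured time on site is 160 seconds, however long the visitor actually stayed on the last page.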

There are no absolutes in web analytics; this isn't finance. Trends are more important than absolutes.

Much on the Google Analytics main page is crap. It is based on older web analytics. You have to drill down further to answer any real questions.

He showed the Visitors Overview page graphs by day, month, and year. He showed location data using the SAS site as the example, and how to use the filters to get rid of the noise.

An interesting idea is to look at what types of devices are accessing your site to decide whether mobile is important. That answers the question of how to allocate resources between mobile development and web development.

People are bidding for you when you search Google: Google solicits bids for ad placement from vendors on each search.

When you click on an ad link at the top of the search results, someone pays Google a specific amount. Google optimizes this for its own benefit. All the ads have tracking codes.

Organic search results appear below the paid ads, on a white background.

The Sources page is the favorite report of @DeanPeters (SAS) because it tells you how people are getting to your site.

Very interesting discussion that had to be cut short.

TACFUG 2012-02-23 Meeting Notes

February 24, 2012

Quick and dirty: things I wrote down at the meeting. Jim Priest (@TheCrumb) presented on code reviews, material he is developing for a CF.Objective talk.

BitNami: Jim uses a lot of tools from that site. We somehow got into a discussion on coding standards; I think I asked about that. A good set of standards can be found on the ColdBox site. CodeCop was mentioned, as was VarScoper.

Ethervane Echo was described as a multi-clip replacement for the Windows clipboard. It saves everything you have clipped so you can go back to any of those clips and even combine them to insert. I thought of it as a great quick snippet tool. Gerry Gurvich (sp?) also mentioned CLCL as another multi-clip clipboard replacement.

Someone mentioned Cropper for cropping screenshots. Skitch was also mentioned. I don't remember the context other than that Dan Wilson was a fan.

Thanks to Dan Wilson for the pizza. I forgot to tell him; I shift the blame to the usual suspects. Those were the largest pizzas I have ever witnessed. Also, thanks to SCI Consulting for letting us use their conference room for the meeting.

Update: Charlie Arehart mentioned ClipX on the TACFUG email list.
