Two Graphs and a Question

Social media analysis can be done fairly consistently, but it must be filtered through one or both of two statistical graphs and a question. The graphs represent a wide range of relationships, which brings us to the question:
Where are the relationships?
If you plan on doing social media analysis, say predicting sales, you must understand the statistical relationships in the data. So the data work always starts with that question, though the question may take a few different shapes.
Example: How many iPads will be sold this quarter?
Translates to: “When filtered to likely purchases, how many sales does one social media share represent?”
Side note: I have it at 15 to 17 million iPads for the quarter based on only 30 days of social media data, which is slightly under estimates…
How do I know this? Well, let's take a look at the first graph…

In statistics we call this a normal distribution, or bell curve. The idea is that the bulk of the data clumps around a center line (the mean): for a normally distributed data set, about 95% of all data points fall within 2 standard deviations of the mean. (Look it up on Wikipedia if you are stumped.)
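If you would rather check the 95% figure than take my word for it, here is a minimal Python sketch. It assumes a perfectly normal curve, and the function name fraction_within is just something I made up for illustration.

```python
import math

def fraction_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    # For a normal curve, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

Two standard deviations comes out a little above 95%, which is why "about 95%" is the usual shorthand.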
Back to the iPad: Online we have some magical ratios that have long been established. The first is 1 to 100, the standard ratio of comments to visitors on a given site when there is no friction. The other is 1 to 500: a study suggests any event, even a funeral, attended by 500 or more people in the western world will get at least one public share on social media. These are golden ratios. There are some other important relationships on the demographic front that start making normally distributed curves within this curve, but that is too complex to explain in this blog. Putting the share ratio on the Y-axis, with 1 to 100 at the low end of 2 standard deviations and 1 to 500 at the upper end, we get a range (a rather large range, but a range)…
To shrink the range we cross it with other population data and trends. Finally, predicting iPad sales (units, not profit) becomes a simple process of selecting the correct social share ratio based on posts around purchase intent.
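The arithmetic behind that range can be sketched roughly like this. It is only an illustration under my own assumptions: the function name sales_range and the share count of 40,000 are made up (it is not my real iPad data), and treating the two ratios as the ends of a 2 standard deviation band means the band is 4 standard deviations wide.

```python
def sales_range(intent_shares, low_ratio=100, high_ratio=500):
    """Turn a count of purchase-intent shares into a low/mid/high unit estimate.

    low_ratio and high_ratio are sales per share at -2 SD and +2 SD.
    """
    mean_ratio = (low_ratio + high_ratio) / 2   # center line of the curve
    sd_ratio = (high_ratio - low_ratio) / 4     # the band spans 4 SDs in total
    return {
        "low": intent_shares * low_ratio,
        "mid": intent_shares * mean_ratio,
        "high": intent_shares * high_ratio,
        "sd_ratio": sd_ratio,
    }

# Hypothetical input: 40,000 shares filtered to likely purchases.
estimate = sales_range(40_000)
print(estimate)
```

With the default ratios the high estimate is five times the low one, which is the "rather large range" that the population data and trends are meant to narrow.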
The next graph is a little more complex (sorry if that scares you). It's called the linear regression model…

The idea with linear regressions is that they identify the central trend among seemingly random points of data on a graph. There is a ton the social media analyst can do with linear regression models or chi-square tests. Linear regressions help make assumptions (i.e., selecting the correct share ratio) and make predictions about trends. Other uses are more nuanced and complicated.
If you are still with me at this point, you should understand that most predictions made using social media data have no statistical soundness. However, it doesn’t need to be that way: using simple statistical analysis, reasonable assumptions can be made with a degree of certainty. That said, outliers can always create problems, but that is assumed…
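For the curious, here is a minimal sketch of a straight-line fit using ordinary least squares, the most common flavor of linear regression. The weekly share counts are invented for illustration, and fit_line is a name I made up, not part of any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: week number vs. purchase-intent shares seen that week.
weeks = [1, 2, 3, 4, 5]
shares = [120, 150, 145, 190, 210]

slope, intercept = fit_line(weeks, shares)
next_week = slope * 6 + intercept  # extend the trend line one week ahead
print(f"trend: {slope:.1f} shares/week, week 6 forecast: {next_week:.0f}")
```

The fitted line is only as good as the points behind it, so a single outlier week can tilt the slope noticeably.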


