Statistics Functions


One Attribute

count

Returns the number of cases for which the expression is true. For example, count(NumberOfPets > 0) will return the number of cases for which NumberOfPets is greater than zero. Similarly, count(exists(Gender)) will return the number of cases for which the attribute Gender is defined. count( ) returns the number of cases in the collection. For an attribute whose values are true and false, count will return the number of cases for which the value is true.

first

Returns the first value in the data set for the given attribute; for example, first(height) would be 61 for a collection of people in which the first person's height is 61 inches.

iqr

Interquartile range, for example, iqr(blood_pressure). This function returns the value at the 75th percentile minus the value at the 25th percentile.

last

Returns the last value in the collection for the given attribute; for example, last(name) would be Zelda for a collection of ducks in which the last duck's name is Zelda.

max

Maximum value; for example, max(age).

mean

The arithmetic mean; for example, mean(height).

median

The median; for example, median(speed).

min

Minimum value; for example, min(salary).

percentile

Returns the value at a given percentile. For example, percentile(50, speed) is another way to compute the median, and percentile(95, score) returns the score at the 95th percentile.

popStdDev

The population standard deviation of the attribute you give it: the square root of the average squared deviation from the mean, with the number of cases n as the divisor.

popVariance

The population variance of the values. This is the same as popStdDev squared.

proportion

Gives the proportion of cases for which the argument is true. For example, if 12 out of 24 people are over 12 years old, proportion(age > 12) will yield 0.5.
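
As a rough illustration of how count and proportion relate, the Python sketch below treats a collection as a list of dictionaries. The helper names, the sample data, and the handling of missing values are assumptions made for this sketch; they are not part of TinkerPlots.

```python
# Hypothetical collection: each case is a dict; None marks a missing value.
cases = [
    {"age": 9, "NumberOfPets": 0},
    {"age": 14, "NumberOfPets": 2},
    {"age": 15, "NumberOfPets": 1},
    {"age": 11, "NumberOfPets": None},
]

def count(cases, predicate=None):
    """Number of cases for which predicate is true; all cases if it is omitted."""
    if predicate is None:
        return len(cases)
    return sum(1 for case in cases if predicate(case))

def proportion(cases, predicate):
    """Fraction of all cases for which predicate is true."""
    return count(cases, predicate) / count(cases)

def has_pets(case):
    # Assumption of this sketch: a missing value never satisfies the comparison.
    return case["NumberOfPets"] is not None and case["NumberOfPets"] > 0

pet_owners = count(cases, has_pets)
over_12 = proportion(cases, lambda case: case["age"] > 12)
```

Here proportion is simply count with a condition divided by count with no condition, which mirrors the relationship between the two functions described above.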

Q1

The value that lies at the 25th percentile; that is, the first quartile.

Q3

The value that lies at the 75th percentile; that is, the third quartile.

sampleStdDev

Computes the sample standard deviation according to the formula $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$, where $\bar{x}$ is the mean and $n$ is the number of cases. Its square, the sample variance, is an unbiased estimate of the population variance. For example, sampleStdDev(pressure) computes the sample standard deviation of the attribute pressure.

sampleVariance

Computes the square of the sample standard deviation according to the formula $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the mean and $n$ is the number of cases. For example, sampleVariance(voltage) computes the sample variance of the attribute voltage.

stdDev

Standard deviation; for example, stdDev(score). Computes the standard deviation of the cases in the collection using the formula TinkerPlotsHelp-1-118-006.

stdError

Returns the standard error; for example, stdError(score). The formula that is used is $s / \sqrt{n}$, where s is the sample standard deviation and n is the number of cases.
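
To make the difference between the population and sample versions concrete, here is a minimal Python sketch using the standard library's statistics module. The data are made up, and the variable names are only labels for this example; the sketch shows the textbook definitions, not TinkerPlots' internal code.

```python
import math
import statistics

scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up data
n = len(scores)

# Population versions divide the sum of squared deviations by n.
pop_variance = statistics.pvariance(scores)
pop_std_dev = statistics.pstdev(scores)

# Sample versions divide by n - 1.
sample_variance = statistics.variance(scores)
sample_std_dev = statistics.stdev(scores)

# Standard error: sample standard deviation over the square root of n.
std_error = sample_std_dev / math.sqrt(n)
```

For these eight values the sum of squared deviations is 32, so the population variance is 32/8 = 4 and the sample variance is 32/7, which is slightly larger.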

sum

Returns the sum of the values over all the cases. For example, sum(time)/count(isNumber(time)) is another way to compute the mean of the attribute time.
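
The sum-over-count idea can be sketched in Python as follows. This assumes that missing values are skipped when summing and that isNumber is true only for numeric values; both are assumptions of the sketch, and the data and helper name are hypothetical.

```python
times = [12.5, None, 9.0, 10.5, None]  # None stands in for a missing value

def is_number(value):
    """True for ints and floats, but not for booleans or missing values."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

numeric_times = [t for t in times if is_number(t)]

# Analogue of sum(time)/count(isNumber(time)): total over the number of numeric cases.
mean_time = sum(numeric_times) / len(numeric_times)
```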

uniqueValues

The number of unique values that an attribute has in the data set. For example, uniqueValues(sex) will be 2 if there are only two values ("male" and "female") for sex. (Missing values are ignored.)

variance

Computes the variance of an attribute, that is, the square of the standard deviation, according to the formula TinkerPlotsHelp-1-118-008. For example, variance(before - after) computes the variance of the difference of the two attributes before and after.

Transformations

bin

Takes the form bin(a, bin, min, max), where a is the attribute, bin is the bin width, min is the start of the first bin, and max is the end of the last bin. The last two arguments are optional. bin returns a string (a category value) naming the bin that the value of a falls into. For example, bin(3.14, 2, 0, 10) gives "b02" because the value 3.14 falls in the second bin when the interval [0, 10] is divided into bins of width 2.
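
The binning rule can be sketched in Python like this. The function name bin_label, the default start of 0, the half-open treatment of bin boundaries, the two-digit label format, and returning None for out-of-range values are all assumptions of this sketch; only the "b02" example comes from the description above.

```python
import math

def bin_label(value, width, start=0.0, end=None):
    """Return a label such as 'b02' for the 1-based bin that contains value."""
    if value < start or (end is not None and value >= end):
        return None  # assumption: values outside [start, end) get no bin
    index = math.floor((value - start) / width) + 1  # bins are numbered from 1
    return f"b{index:02d}"

label = bin_label(3.14, 2, 0, 10)
```

With a width of 2 starting at 0, the first bin covers 0 up to 2 and the second covers 2 up to 4, so 3.14 lands in the second bin.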

next

The value for the next case. If this is the last case, next returns 0. For example, next(year) returns, for each case, the value of the next year. As with prev, next takes an optional second argument that specifies the value to be returned for the last case.

popZScore

Returns the number of population standard deviations a value is from the mean. For example, popZScore(finalExam) computes a standard score for each value of the attribute finalExam.

prev

The value for the previous case. If this is the first case, prev returns 0. For example, prev(year) returns, for each case, the value of the previous year. An optional second argument allows you to specify the value prev should take if there is no previous case. For example, prev(Factor, 1) will return the previous value of Factor for all cases except the first, for which it returns 1.
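
The behavior of prev and next, including the optional second argument, can be sketched in Python as list operations. The function names are hypothetical stand-ins, and the sketch assumes a collection is just an ordered list of values.

```python
def prev_values(values, default=0):
    """For each case, the value of the preceding case; default for the first case."""
    if not values:
        return []
    return [default] + list(values[:-1])

def next_values(values, default=0):
    """For each case, the value of the following case; default for the last case."""
    if not values:
        return []
    return list(values[1:]) + [default]

years = [2001, 2002, 2003]
previous_years = prev_values(years)      # first case falls back to 0
following_years = next_values(years)     # last case falls back to 0
factors = prev_values([2, 3, 5], 1)      # like prev(Factor, 1)
```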

rank

Returns the position of the value when cases are ordered from lowest to highest. For example, rank(Population) used as an attribute in a collection of states assigns to each state its rank according to population. Note that if there are duplicate values, all of them receive the same rank, which may be fractional. See also uniqueRank.

runLength

This one's wild! It gives the number of identical values immediately prior to and including the current value. For example, if flip contained {H, H, H, T, H, T, T}, runLength(flip) would return {1, 2, 3, 1, 1, 1, 2}. You can use max(runLength(flip)) to compute the longest streak of heads or tails in a coin-flipping simulation.
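
A minimal Python sketch of the run-length idea, using the coin-flip values from the example above. The function name run_length is a hypothetical stand-in for the TinkerPlots function.

```python
def run_length(values):
    """For each position, count the identical values ending at that position."""
    result = []
    for i, value in enumerate(values):
        if i > 0 and value == values[i - 1]:
            result.append(result[-1] + 1)  # the current run continues
        else:
            result.append(1)               # a new run starts here
    return result

flips = list("HHHTHTT")
streaks = run_length(flips)
longest_streak = max(streaks)
```

Taking the maximum of the run lengths gives the longest streak, which is 3 for this sequence.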

sampleZScore

Returns the number of sample standard deviations a value is from the mean. For example, sampleZScore(height) computes a standard score for each value of the attribute height.

uniqueRank

Returns the unique position of a value in a list of values sorted from smallest to largest. Each value in the list gets assigned a different rank, even if there are duplicate values. For example, if the attribute N contains the values {1, 2, 3, 2}, an attribute using the expression uniqueRank(N) will have values {1, 2, 4, 3}. See also rank.
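
The difference between rank and uniqueRank can be sketched in Python as follows. The sketch assumes that uniqueRank breaks ties by case order (which matches the example above) and that rank gives tied values the average of the positions they occupy; the averaging rule is an assumption, and the function names are hypothetical.

```python
def unique_rank(values):
    """Rank from smallest to largest; ties are broken by case order."""
    # sorted() is stable, so equal values keep their original order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def rank(values):
    """Like unique_rank, but tied values share the average of their positions."""
    positions = {}
    for value, r in zip(values, unique_rank(values)):
        positions.setdefault(value, []).append(r)
    return [sum(positions[v]) / len(positions[v]) for v in values]

n_values = [1, 2, 3, 2]
unique = unique_rank(n_values)
shared = rank(n_values)
```

For {1, 2, 3, 2}, the two 2s occupy positions 2 and 3, so under the averaging assumption each gets rank 2.5.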

zScore

Same as sampleZScore.
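
The two kinds of standard score differ only in which standard deviation is used as the yardstick. A small Python sketch with made-up heights follows; the function names are hypothetical.

```python
import statistics

def sample_z_scores(values):
    """Deviations from the mean in units of the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # divides by n - 1
    return [(v - mean) / sd for v in values]

def pop_z_scores(values):
    """Deviations from the mean in units of the population standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # divides by n
    return [(v - mean) / sd for v in values]

heights = [60.0, 62.0, 64.0, 66.0, 68.0]  # made-up data
sample_z = sample_z_scores(heights)
pop_z = pop_z_scores(heights)
```

Because the sample standard deviation is the larger of the two, each sample z-score is slightly closer to zero than the corresponding population z-score.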

Two Attributes

correlation

Returns the correlation coefficient for two continuous attributes. For example, correlation(stories, height) will return the correlation coefficient for stories and height. This value will be between -1 and +1 and is a measure of how closely the values of one attribute follow those of the other.

covariance

Returns the average of the products of the deviations of the two attributes from their respective means. For example, covariance(hp, mpg)/variance(hp) would give the slope of the least-squares regression line for predicting mpg from hp.
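
The slope identity can be checked with a short Python sketch. The sketch assumes covariance and variance use the same divisor (n - 1 here), so the divisor cancels in the ratio; the data are made up and the function names are stand-ins rather than TinkerPlots code.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1 (assumed)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def variance(xs):
    """Variance as the covariance of an attribute with itself."""
    return covariance(xs, xs)

hp = [100.0, 150.0, 200.0, 250.0]   # made-up horsepower values
mpg = [30.0, 26.0, 21.0, 17.0]      # made-up fuel economy values

# Least-squares line for predicting mpg from hp.
slope = covariance(hp, mpg) / variance(hp)
intercept = mean(mpg) - slope * mean(hp)
```

The intercept follows from the fact that the least-squares line passes through the point of means.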

linRegrIntercept

Returns the intercept of the least-squares regression line with x as the independent attribute and y as the dependent attribute.

linRegrSlope

Returns the slope of the least-squares regression line with x as the independent attribute and y as the dependent attribute.

rSquared

The square of the correlation coefficient for two attributes. rSquared(x, y) represents the proportion of the variation of y that is accounted for by the variation in x. It takes on values between 0 and 1.
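
As a sketch of how the correlation coefficient and its square behave, the Python code below computes both from the standard Pearson formula. The function names and the data are hypothetical and only serve to illustrate the definitions.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared(xs, ys):
    """Square of the correlation coefficient."""
    return correlation(xs, ys) ** 2

stories = [10.0, 20.0, 30.0, 40.0]       # made-up data
height = [120.0, 250.0, 350.0, 500.0]    # made-up data
r = correlation(stories, height)
r2 = r_squared(stories, height)
```

A perfectly linear relationship gives a correlation of +1 or -1 and an r-squared of 1; squaring discards the sign, which is why r-squared stays between 0 and 1.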

 

Also see Special Functions.

 


TinkerPlots Help

© 2012 Clifford Konold and Craig D. Miller