Heads up: this is our detailed methodology. Looking for a shorter overview? Check out How ratings work.
Impact ratings show how well a brand, company, or investment supports a cause, or a combination of causes you select. Here's our methodology for creating ratings:
For example, the cause Gender Equality includes metrics such as:
Ethos currently uses ~250 metrics across all causes.
We then aggregate data points for each metric. Data comes from companies (including SEC filings and annual reports), government agencies, and independent third parties. For example, the metric “Leaders in supporting working mothers” assesses company policies for working mothers, using data from Working Mother. When Working Mother has new data available, we update our raw data for the metric.
Once we have raw data for a metric, we calculate a z-score, or standard score, for each raw data point. Z-score is a measure of how many standard deviations a number is above or below the mean. Raw scores above the mean have positive z-scores, while those below the mean have negative z-scores.
Depending on the metric, we calculate mean and standard deviation for a company’s peer group or for all companies on Ethos. If metric data varies a great deal among industries (e.g., carbon emissions), we use peer group as a more appropriate population (so an airline’s carbon emissions are measured against other airlines rather than all companies, for example). If the metric is not industry-specific (e.g., percent of women on the board of directors), we use all companies as the population.
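The z-score step can be sketched as follows. The peer-group data is hypothetical, and handling of metric direction (e.g., lower emissions being better) is omitted for brevity:

```python
from statistics import mean, pstdev

def z_scores(raw_values):
    """Standard score for each raw data point relative to its population
    (a peer group or all companies, depending on the metric)."""
    mu = mean(raw_values)
    sigma = pstdev(raw_values)  # population standard deviation
    return [(x - mu) / sigma for x in raw_values]

# Hypothetical carbon-emissions data for an airline peer group:
airline_emissions = [100.0, 120.0, 80.0, 100.0]
scores = z_scores(airline_emissions)  # [0.0, ~1.41, ~-1.41, 0.0]
```

Using the peer group as the population means each airline is measured against other airlines, as described above.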
If some industries have a larger impact on a metric (e.g., the Transportation industry for carbon emissions), we apply an industry “materiality factor” to the z-score of companies in that industry. For example, the Transportation industry might have a materiality factor of 2 for carbon emissions metrics, meaning z-scores of transportation companies are multiplied by 2. This increases their normalized score relative to other companies if they scored above average, or decreases their normalized score if they performed below average.
This “materiality factor” is used to give greater “weight” to companies that are in high-impact industries for a particular metric. It rewards companies making outsized positive contributions to improving a metric, and lowers ratings for companies making an outsized negative contribution to a metric.
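In code, the materiality adjustment is a straightforward multiplication; the factor of 2 below is the hypothetical example from the text:

```python
def apply_materiality(z_score, factor):
    """Scale a z-score by an industry materiality factor. A factor > 1
    amplifies both above-average (positive) and below-average
    (negative) performance for companies in high-impact industries."""
    return z_score * factor

# Transportation company, hypothetical materiality factor of 2
# for carbon emissions metrics:
above_avg = apply_materiality(1.2, 2)   # rewarded further: 2.4
below_avg = apply_materiality(-0.5, 2)  # penalized further: -1.0
```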
Most z-scores fall within +/- 3 standard deviations (~99.7% of data points in an approximately normal data set). To translate this to an approximate 0-100 scale, we multiply each z-score by 25 (so one standard deviation corresponds to 25 points) and then add 50 to the result (moving the mean of all data points to 50).
This means most data points will fall in the range of -25 (-3 standard deviations) to 125 (+3 standard deviations). To deal with outliers, we winsorize, or "cap", all scores at +/- 3 standard deviations, i.e., a minimum score of -25 and a maximum of 125. Since a company's final rating is a weighted average of many metric scores (typically 20-50), individual metric scores below 0 or above 100 (which are uncommon) almost never pull a final rating below 0 or above 100. When they do, we cap the final rating at a minimum of 0 and a maximum of 100.
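The scaling and winsorizing just described can be expressed in a few lines:

```python
def to_scaled_score(z):
    """Map a z-score to the ~0-100 scale: one standard deviation is
    worth 25 points, the mean maps to 50, and scores are winsorized
    at +/- 3 standard deviations (i.e., capped at -25 and 125)."""
    z = max(-3.0, min(3.0, z))  # winsorize outliers
    return 50 + 25 * z

to_scaled_score(0.0)   # the mean -> 50
to_scaled_score(1.0)   # +1 standard deviation -> 75
to_scaled_score(4.2)   # outlier, capped at +3 SD -> 125
```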
For metrics based on "Best of" or "Worst of" lists with a few hundred or fewer companies, we assume the list intends for included companies to rate highly ("Best of" lists) or poorly ("Worst of" lists) relative to companies not on the list. To account for this, we create a distribution from 100-60 ("A" and "B" scores) for companies making a "Best of" list, or from 40-0 ("D" and "F" scores) for companies making a "Worst of" list. Companies not on the list receive a uniform score of 0 for "Best of" lists (since they were uniformly measured as not good enough to make the list) or 100 for "Worst of" lists (since they were uniformly measured as good enough NOT to make the list). Because this flattens any remaining distribution of performance among companies not on the lists, these metrics are usually given a small weight in calculating final company ratings.
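One way to implement the list-based distribution is a linear spread by rank; the ranking logic and company names below are illustrative, not Ethos's exact formula:

```python
def best_of_scores(ranked_list, universe):
    """Spread companies on a "Best of" list linearly from 100 down to 60
    by rank; companies not on the list get a uniform 0. (A "Worst of"
    list would flip the bands: 40 down to 0 on-list, 100 off-list.)"""
    n = len(ranked_list)
    scores = {}
    for rank, company in enumerate(ranked_list):
        scores[company] = 100 - 40 * rank / max(n - 1, 1)
    for company in universe:
        scores.setdefault(company, 0)
    return scores

# Hypothetical three-company list in a five-company universe:
s = best_of_scores(["Acme", "Beta", "Cove"],
                   ["Acme", "Beta", "Cove", "Dex", "Echo"])
# Acme 100.0, Beta 80.0, Cove 60.0; Dex and Echo 0
```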
For metrics where raw data has already been distributed on a 0-100 or equivalent scale (e.g., 0-5), we skip steps 3 and 4. When this is the case, further normalizing would skew the intended distribution of scores from the raw source. In cases where the data is on an equivalent scale (e.g., 0-5), we simply multiply raw data to convert to the 0-100 scale (e.g., multiply data on a 0-5 scale by 20).
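The straight-multiplication conversion for equivalently scaled data looks like:

```python
def rescale(value, source_max):
    """Convert data already on an equivalent scale (e.g., 0-5) to 0-100
    by straight multiplication, skipping z-score normalization so the
    source's intended distribution is preserved."""
    return value * (100 / source_max)

rescale(4, 5)  # a 4-out-of-5 rating becomes 80.0
```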
The goal of each strategy above is to preserve the original data's presentation of company performance as faithfully as possible, while aggregating data points into the most accurate view of company performance we can.
Ethos tests all normalization strategies for each dataset (metric) to assess which most accurately represents the distribution of company performance for that dataset. We look both at the relationship among companies (e.g., are data points clustered in a certain range?) and at external, absolute gauges of company performance (e.g., does a credible third party say the best-performing company in something like gender pay equality should only be at a "B" or "C" level?). We then use these assessments to choose the best normalization strategy for each dataset.
We then combine normalized metric data to create a single rating for each company on Ethos with sufficient data (usually at least 60% of metrics for a particular cause) for each of the 45 causes on Ethos.
To combine the normalized data, we first determine a weight for every metric within each impact area. For example, the Gender Equality rating might give 10% weight to the gender pay gap metric, in which case 10% of a company's Gender Equality rating would be composed of its gender pay gap score. To determine an appropriate weight for each metric, we look at:
After assigning weights to each metric, we multiply each metric score by its weight and sum the results to get the rating for a company.
We then calculate z-scores for each company rating and normalize to a 0-100 scale, following a similar approach to that in steps 3 and 4.
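The weighting, summing, and re-normalizing steps could be sketched as follows; the metric names, scores, and weights are hypothetical:

```python
from statistics import mean, pstdev

def weighted_rating(metric_scores, metric_weights):
    """Weighted average of a company's normalized metric scores for one
    cause; weights are assumed to sum to 1."""
    return sum(metric_scores[m] * metric_weights[m] for m in metric_weights)

def renormalize(ratings):
    """Re-standardize company ratings and map back to 0-100, as in
    steps 3 and 4, capping final ratings at 0 and 100."""
    mu, sigma = mean(ratings), pstdev(ratings)
    return [min(100.0, max(0.0, 50 + 25 * (r - mu) / sigma))
            for r in ratings]

# Hypothetical Gender Equality metric scores for one company:
scores = {"gender_pay_gap": 70, "board_representation": 55}
weights = {"gender_pay_gap": 0.10, "board_representation": 0.90}
rating = weighted_rating(scores, weights)  # ~56.5
```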
Scores are now equal to the ratings you can see on each cause and company profile on Ethos.
Fund ratings are a weighted average of ratings for each company held by the fund. For example, if a fund includes 1% Company A stock and Company A has a rating of 80 for a particular cause, 1% of the fund's rating for that cause will be 80. The other 99% of the fund's rating will be made up of ratings for the fund's other company holdings.
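A minimal sketch of the fund calculation, using the 1% example from the text and assuming the remaining 99% of holdings average a rating of 60 for this cause:

```python
def fund_rating(holdings):
    """Weighted average of company ratings, weighted by each holding's
    share of the fund. `holdings` is a list of (rating, weight) pairs
    whose weights sum to 1."""
    return sum(rating * weight for rating, weight in holdings)

# 1% in Company A (rated 80), 99% in holdings averaging 60:
overall = fund_rating([(80, 0.01), (60, 0.99)])  # ~60.2
```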
When you take an Impact Assessment on Ethos, we use your input about what’s most important to you to weight ratings of companies and funds. For example, if you picked “Reduced greenhouse gas emissions”, “Renewable energy growth”, and “Disaster readiness and effective aid” and rated them all a 4 out of 7 in importance, your personalized ratings of companies would be calculated using 1/3 of a company’s rating for each cause. You can pick as many causes as you want for your personalized ratings and give them any importance you want, and Ethos will weight your ratings accordingly.
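Personalized weighting amounts to normalizing your importance ratings into fractions. This sketch uses the three-cause example above, with shortened hypothetical cause keys:

```python
def personalized_rating(cause_ratings, importance):
    """Weight a company's cause ratings by the user's stated importance
    (1-7). Importance values are normalized to fractions that sum to 1,
    so three causes all rated 4 each contribute 1/3."""
    total = sum(importance.values())
    return sum(cause_ratings[c] * imp / total
               for c, imp in importance.items())

ratings = {"ghg": 90, "renewables": 60, "disaster_aid": 30}
importance = {"ghg": 4, "renewables": 4, "disaster_aid": 4}
personalized_rating(ratings, importance)  # (90 + 60 + 30) / 3 = 60.0
```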
Please contact us at firstname.lastname@example.org with questions.