
Intel, the semiconductor manufacturing powerhouse, is planning on building a new data center. Energy availability and usage are some of the key considerations in deciding on a location of the data center. For example, which regions produce a surplus of energy, and are therefore more likely to provide energy at cheaper prices? Which regions rely more on renewable energy sources?
In this project, co-designed with Intel’s Sustainability Team, I’ll write SQL queries that will power the analysis and create visualizations that will help the Intel team select the best location for the new data center.

— Data Set Descriptions
In this project I’ll query 3 datasets as well as write a query to generate a new dataset that I’ll use in my tableau visualizations. The intel.energy_data dataset will be the main dataset I’ll be working with. The intel.energy_by_plant and intel.power_plants datasets will be joined for an in-depth analysis of energy production at the power plant level.
Read below to learn more about the datasets and their features.
intel.energy_data: Contains information about daily energy production and consumption for different regions in the United States.
- balancing_authority – A Balancing Authority is responsible for maintaining the electricity balance within its region. This is a company that makes sure electricity is being exchanged between electric providers and regions so that no region runs out of electricity due to high demand.
- date – The date the energy was produced.
- region – The electric service area within a geographic area of the USA. e.g. California, Midwest, etc.
- time_at_end_of_hour – The time and date after energy was generated, .e.g., energy generated between 1pm-2pm will show up as 2pm in this field.
- demand – The energy demand in megawatts (MW) on the grid (what the houses/business are using).
- net_generation – The energy produced in MW in the region by all sources e.g., wind, coal, nuclear, etc.
- all_petroleum_products – The energy produced in MW by petroleum products.
- coal – The energy produced in MW by all coal products
- hydropower_and_pumped_storage – The energy produced in MW by water power and pumped heat sources.
- natural_gas – The energy produced in MW by natural gas sources
- nuclear – The energy produced in MW from nuclear fuel sources
- solar – The energy produced in MW by solar panels and other solar energy capturing methods.
- wind – The energy produced in MW from wind turbines and other wind sources.
intel.power_plants: Contains general information about power plants in the United States.
- plant_name – The name of the power plant.
- plant_code – The unique identifier of the plant.
- region – The region in the US where the power plant is located. Matches the regions in the intel.energy_data
- state – The state where the power plant is located.
- primary_technology – The primary technology used to generate electricity at the power plant.
intel.energy_by_plant: Contains total energy production information at the plant for the year 2022.
- plant_name – The name of the power plant.
- plant_code – The unique identifier of the plant.
- energy_type – The kind of energy generated by the power plant. Either renewable energy or fossil fuel.
- energy_generated_mw – The total energy generated, in MegaWatts, at the plant for the year 2022.

SELECT
region,
SUM(net_generation-demand) as Total_net_generation
FROM
intel.energy_data
GROUP BY
region
ORDER BY total_net_generation

I included this map to help me better contextualize the output of my query above.
According to the Data, the Mid Atlantic has the highest net energy generation, sitting at 31,693,087 mw.
I found this strange because New York is listed as a separate region with a net negative generation of -28,310,973 mw, but according to this map, and one other source I found, New York is part of the Mid-Atlantic, this leads me to believe that the mid-Atlantic is not actually the largest net energy producer due to New York being so heavily in the negative with its energy production.
Intel is headquartered in the Northwest, this region has a net generation of 21,685,235 mw, and according to the data, is the second highest energy producer (Net production-demand). Intuitively, it would make sense for this region to be the highest net energy producer.
Assuming I am correct in my assumption that New York is a part of the Mid-Atlantic region and should be included in its calculations, then the Northwest would become the highest energy producer.
- Intel is interested in regions that generate a large amount of energy from renewable sources. Renewable energy is defined as any energy generated from hydropower_and_pumped_storage, wind, and solar sources.
I need to write a query that calculates the sum total of renewable energy by region. Sort the output by the region with the highest renewable energy. What are the top two regions for total renewable energy production?
SELECT
region,
SUM(net_generation-demand) AS net_gener_minus_demand,
SUM(hydropower_and_pumped_storage + wind + solar) AS sum_hydro_solar_wind_prod
FROM
intel.energy_data
GROUP BY
region
ORDER BY
sum_hydro_solar_wind_prod DESC
The Northwest generates 199,266,574 mw and Texas generates 131,367,234 mw. These are the top 2 with regard to total renewable energy generation. Notably, the Central Region is third in total renewable energy generation, with 120,536,677 mw.
B. Modify your query slightly so that it calculates the percentage of renewable energy by region.
HINT: Divide the amount of renewable energy by the sum total of net_generation, and then multiply the result by 100.
SELECT
region,
SUM(net_generation-demand) AS net_gener_minus_demand,
SUM(hydropower_and_pumped_storage + wind + solar) AS sum_hydro_solar_wind_prod,
SUM(hydropower_and_pumped_storage + wind + solar)/sum(net_generation)*100 AS percent_renewable
FROM
intel.energy_data
GROUP BY
region
ORDER BY percent_renewable DESC
C. Which regions change from the top 3 when looking at total renewable energy vs percentage of renewable energy?
Converting the renewable energy generation of each region into a percentage of it’s overall energy generation, paints a different picture, The top region is still the northwest region (52% of total energy generated is green), but the other 2 of the top 3 become regions become the Central region (41% of the total energy generated is green) and the California region (37% of the total energy generated is green).
— Task 2: Generating New Data by Energy Type
Intel would like to know how renewable energy and fossil fuels trend over time. In order to do this, you will first need to generate a new table using your SQL knowledge and the intel.energy_data table before visualizing trends in Tableau Cloud.
- I need to write a query that calculates the renewable energy generated for each row. Return only the date, region, and energy_generated_mw columns.
Note: energy_generated_mw is the alias for hydropower_and_pumped_storage + wind + solar.
SELECT
date,
region,
SUM(hydropower_and_pumped_storage + wind + solar) AS energy_generated_mw
FROM
intel.energy_data
GROUP BY
region,
date
ORDER BY
date desc
After showing the result of the query to your manager, she tells you that she wants it to be clear that the energy_generated_mw column is referring to renewable energy types. She asks you to create a new column called energy_type that has the value ‘renewable energy’ for each row.
A colleague teaches you a simple method to do this. When writing your query, add an additional column after your select statement. Here is an example:
SELECT
*, -- any relevant fields to the query
'renewable energy' AS energy_type
FROM intel.energy_data
B. I need to modify my query from Part A. to include the energy_type column.
SELECT
date,
region,
SUM(hydropower_and_pumped_storage + wind + solar) AS energy_generated_mw,
'renewable energy' AS energy_type
FROM
intel.energy_data
GROUP BY
region,
date
ORDER BY
date desc
C. Next, I need to write a new query that calculates the fossil fuel energy generated for each row. As in Part A., return only the date, region, and energy_generated_mw columns, where energy_generated_mw is now the alias for all_petroleum_products + coal + natural_gas + nuclear + other_fuel_sources.
SELECT
date,
region,
SUM(natural_gas + coal + all_petroleum_products + nuclear + other_fuel_sources) AS energy_generated_mw,
'fossil fuel' AS energy_type
FROM
intel.energy_data
GROUP BY
region,
date
ORDER BY
date desc
D. I need to modify my query in Part C. to include the energy_type column. This column should have the value ‘fossil fuel’ for each row.
HINT: This is very similar to Part B.!
E. My queries from Parts B. and D. should both have the columns date, region, energy_generated, and energy_type. I need to write one final query that UNIONs these two together.
SELECT
date,
region,
SUM(
natural_gas + coal + all_petroleum_products + nuclear + other_fuel_sources
) AS energy_generated_mw,
'fossil fuel' AS energy_type
FROM
intel.energy_data
GROUP BY
region,
date
UNION
SELECT
date,
region,
SUM(hydropower_and_pumped_storage + wind + solar) AS energy_generated_mw,
'renewable' AS energy_type
FROM
intel.energy_data
GROUP BY
region,
date
ORDER BY
date DESC;
Task 3: Aggregating Power Plant Data
Intel has provided you with additional data in order to reach the best conclusion about the location of its next data center. In this task you will be working with two tables intel.power_plants and intel.energy_by_power_plant. You will need to join these tables before you can aggregate them to help the Intel team with their analysis.
- I need to join the intel.power_plants and intel.energy_by_power_plant data on the plant_code. This joined table will form the basis for the rest of the task.
If done correctly, the output will have 2,504 rows.
SELECT
pp.plant_name,
pp.plant_code,
pp.region,
pp.state,
pp.primary_technology,
ebp.energy_type,
ebp.energy_generated_mw
FROM intel.power_plants AS pp
JOIN intel.energy_by_plant AS ebp
ON pp.plant_code = ebp.plant_code
Note: It is recommended to use the WITH keyword for the remainder of this Task to simplify your queries. For a refresher, rewatch “🍿The WITH Keyword” in SkillBuilder 6.
B. I need to write a query that returns the total number of renewable energy power plants for each region. Which region has the most renewable power plants?
WITH renewable_plants AS (
SELECT
pp.region,
pp.plant_code
FROM intel.power_plants AS pp
JOIN intel.energy_by_plant AS ebp
ON pp.plant_code = ebp.plant_code
WHERE ebp.energy_type = 'renewable_energy'
)
SELECT
region,
COUNT(DISTINCT plant_code) AS total_renewable_plants
FROM renewable_plants
GROUP BY region
ORDER BY total_renewable_plants DESC
The Midwest has a total of 234 renewable plants, the most total number of plants out of any other region.
C. Next, I want to write a query that returns both the total number of power plants and the total energy generated, specifically from plants that use “Solar Photovoltaic” technology, grouped by each region.
WITH solar_plants AS (
SELECT
pp.region,
pp.plant_code,
ebp.energy_generated_mw
FROM intel.power_plants AS pp
JOIN intel.energy_by_plant AS ebp
ON pp.plant_code = ebp.plant_code
WHERE pp.primary_technology = 'Solar Photovoltaic'
)
SELECT
region,
COUNT(DISTINCT plant_code) AS total_solar_plants,
SUM(energy_generated_mw) AS total_solar_energy_mw
FROM solar_plants
GROUP BY region
ORDER BY total_solar_energy_mw DESC;
D. I want to modify my query in part C to only show regions having at least 50 power plants that use “Solar Photovoltaic” technology. What can you infer about the efficiency (or size) of the power plants in the Midwest region relative to the other regions in your output?
WITH solar_plants AS (
SELECT
pp.region,
pp.plant_code,
ebp.energy_generated_mw
FROM intel.power_plants AS pp
JOIN intel.energy_by_plant AS ebp
ON pp.plant_code = ebp.plant_code
WHERE pp.primary_technology = 'Solar Photovoltaic'
)
SELECT
region,
COUNT(DISTINCT plant_code) AS total_solar_plants,
SUM(energy_generated_mw) AS total_solar_energy_mw,
ROUND(SUM(energy_generated_mw) / COUNT(DISTINCT plant_code), 2) AS avg_energy_per_plant_mw
FROM solar_plants
GROUP BY region
HAVING COUNT(DISTINCT plant_code) >= 50 -- Only regions with 50+ plants
ORDER BY total_solar_energy_mw DESC;
Surprisingly, Florida has 79, the highest number of solar plants, and they generate 10,264,795 mw. Notably, Texas generates 17,425,264 mw from only 57 solar plants, this implies that Texas might be more efficient in producing solar electric energy.
Note: There is more Tableau work up ahead! If you want to skip the Level Up jump straight to Task 4 below!
— LevelUp: Hourly Trends in Renewable Energy
Before moving on to your Tableau Visualizations, let’s investigate how renewable energy generation fluctuates with the time of day.
- I want to write a query that calculates the total renewable energy generated in each region for each hour of the day.
HINT: I’ll need to use the date_part function to get the hour from the time_at_end_of_hour column. Your result should only have the values 0-23 for that new column.
WITH renewable_energy AS (
SELECT
region,
DATE_PART('hour', time_at_end_of_hour) AS hour,
(hydropower_and_pumped_storage + wind + solar) AS renewable_energy_mw
FROM intel.energy_data
)
SELECT
region,
hour,
SUM(renewable_energy_mw) AS total_renewable_energy_mw
FROM renewable_energy
GROUP BY region, hour
ORDER BY region, hour;
B.I want to Modify the query to filter to the ‘California’ and ‘Northwest’ regions only.
WITH renewable_energy AS (
SELECT
region,
DATE_PART('hour', time_at_end_of_hour) AS hour,
(hydropower_and_pumped_storage + wind + solar) AS renewable_energy_mw
FROM intel.energy_data
WHERE region IN ('California', 'Northwest')
)
SELECT
region,
hour,
SUM(renewable_energy_mw) AS total_renewable_energy_mw
FROM renewable_energy
GROUP BY region, hour
ORDER BY region, hour;
C. Using the built-in visualizer in the SQL app to plot a line graph of the energy generated for each hour of the day and colored by the region. If done correctly you should have two lines in your visualization.

California

Northwest
D. What can you say about the renewable energy generation between California (CAL) and the Pacific Northwest (NW)?
California typically has under 2 million mw, between the 0 hour and 6hour milestones of the day, followed by a spike at around the 10th hour, to around 5 million mw, which lasts until approximately the 15th hour of the day where it begins to fall back to previous levels. The Northwest region seems more stable for some reason. Between hours 2 and 5 we see somewhat of a plateau at around 7 million mw, followed by a rise to 9 million mw at around the 10th hour lasting until the 16th |
— Task 4: Visualizing and Analyzing Using Tableau
Phew! Now that you’ve gotten the queries out of the way, you’re ready to dive into investigating the best regions for Intel to put its next data center. The remaining Tasks will be completed in Tableau, and will focus on visualizing and analyzing your results. Click this link to navigate to the workbook you’ll use to complete the remainder of this Project.
Once you’ve published your Tableau Workbook, paste the Share Link in the box below.
Note: Your share link must begin with:
https://prod-useast-b.online.tableau.com/#/site/globaltech/workbooks/…
Link to the Actual Tableau Dataset.
Continue to post your answers in the provided boxes: purple boxes for your visualizations, and blue boxes for text-based answers.
- On the “Net Production” sheet, create a bar chart of net production , by region. Sort the chart in descending order, from tallest to smallest.
The net energy produced is calculated by subtracting the total energy demand from the total energy generation. This is already created in the field called Net Production.

B. Next, on the “Renewable Energy” sheet, create a bar chart illustrating which regions generate the greatest percentage of renewable energy.
HINT: In Tableau, you have a field called Percent Renewable Energy.
I want to Create a bar chart in descending order of regions with the most renewable energy percentage.

C. On the “Energy Source by Region” sheet, create a line chart of the energy generated for each energy source (fossil fuels & renewable energy) at the weekly date level. Add a filter for the region to your chart.
For this chart, you will use the energy_by_source dataset loaded into your Tableau workbook.

☝️ Petroleum as Energy source, production over time by region – Fairly Flat, I’m going to assume those fluctuations are mainly due to the inefficiencies related to fossil fuel use.

☝️ Coal as Energy source, production over time by region

☝️ Solar as Energy source, production over time by region

☝️ Wind as Energy source, production over time by region

☝️ Hydropower as Energy source, production over time by region

☝️ Nuclear as Energy source, production over time by region
D. On the “Energy by State” sheet create a bar chart of the total energy generated by each state and energy type. Color the bars by energy type. Include a region filter in your chart to reduce the amount of bars shown.
For this chart you will use the power_plant_energy dataset that you created. You can select the data source in the upper left hand column in Tableau.


— Task 5: Communicating Results
Your manager wants you to share the visualizations you created in Task 3 with the Sustainability team for visibility. She has created a dashboard with your visualizations (see the “Dashboard” sheet in Tableau) and has asked you to write a short paragraph explaining which region you recommend that the next data center be built.
- In 1-2 paragraphs, summarize what can be gleaned from your visualizations. What region and state do you think is best and why?
Before jumping into the decision, I noticed a few key things that I’d like to dig deeper into:
- Based on the Net Production by Region, California is the greatest under performer sitting at a deficit of over 50 million megawatt hours, based on that datapoint, combined with the high cost of living in the region, and other geopolitical risks, I’m going to remove California from any further consideration regarding new Datacenters at this time.

Based on the Energy Source by Region & the above graph referencing solar production by region over time, it seems that all renewable energy sources dip over time, this pattern is especially pronounced in states like Texas, regions such as the Midwest and North-Atlantic. Notably, California also shows this dip as well. This leads me to an assumption that I’d like to explore further; is it the case that, the technology associated with renewable energies, is such that, ‘the point of diminishing returns’ manifests faster than with fossil fuel technology? An example of this could be that solar panels need regular cleaning in order to continue to convert energy from the sun at peak efficiency, if the solar panels in question are not being cleaned regularly, or at all, this could explain this diminishing solar production over time, whereas, in the case of fossil fuel generation the maintenance costs are likely to be less due to the maturity of that technology. This question ,and it’s potentially premature assumption, leads me to my next question; is the maintenance and replacement of solar panels renewable? Are older, less efficient solar panels being used to manufacture newer solar panels in a circular, renewable fashion of resource allocation? or is this avenue of energy consumption creating waste that cannot be redirected or reused in any fashion?

This graph shows Nuclear as an Energy source, it’s production over time by region, the consistency of nuclear energy combined with the advent of SMR (Small Modular Reactors) deserves exploration. Notably, Texas and Florida both have nuclear output, but not as much as the mid atlantic.
And now for my Decision; The Risk-free choice is that we establish the new plant in the Northwest, given its proven track record of sustainable energy generation, it’s proximity to headquarters and the increased accountability associated with being so close to executive level management.
The Riskier choice, with the potential for Greater reward, and my personal recommendation to senior management, is Texas, in the Southwest region. Texas has close proximity to fossil Fuels, is an up-and-coming economy, has lower cost of living and proximity to good talent, has the potential for future nuclear use, and is THE most efficient solar energy creation region. It lasts in terms of being a net negative energy consumer which has the potential to get worst but can also get better if provided with the right guidance and staffing. I also recommend some of the senior management be stationed in Texas as well, I’m sure someone in senior management is beginning to feel the cold in their bones and is yearning for a warmer climate as well.
Honorable Mention: Florida
Florida is a beautiful state with significant economic potential. However, due to its unique geography and climate, it poses challenges for renewable energy generation. A single-source approach, such as relying solely on solar, hydro, or wind power, is not advisable. Instead, a well-balanced mix of multiple renewable sources, supplemented by nuclear energy, may be necessary.
Florida’s soft terrain, unpredictable weather patterns, and limited hydroelectric potential make it a difficult candidate for large-scale renewable energy investments. However, the state’s rich talent pool and growing economy position it as a promising hub for Research & Development in clean energy. Through strategic community engagement and investment, Florida could become a leading center for innovation in renewable energy solutions.
Leave a Reply