Subject Notes #12

Open
andrewfagerheim opened this issue Jun 29, 2023 · 8 comments

Comments

@andrewfagerheim
Collaborator

The main Project Organization issue (#10) is getting clogged with notes on specific topics, whiteboard brainstorms, etc., so I made this issue dedicated to them. #10 will remain dedicated to meeting notes, action items, checklists, etc. --> broader information on the project's direction and management.

@andrewfagerheim
Collaborator Author

andrewfagerheim commented Jun 29, 2023

22 June 2023: A Note on EKE

Below is my work from the whiteboards, with a test-case also included. I think my general thoughts are that it would be better to switch to the EKE system of definitions because it minimizes the magnitude & frequency of negative variance values, where EKE has negative values anytime a filtered profile at a higher scale has a value less than a filtered profile at a lower scale. The next step is to try implementing this method with the applications considered above, namely the ratio method and Steinberg Figure 5 plot.

@andrewfagerheim
Collaborator Author

29 June 2023: A Note on Ratios

To better understand the output of each ratio, consider the profiles (i, j, k) of density and spice below
profs

1) Extrema Ratio, $R_{ex}$

This is the ratio of everything below the smallest scale over everything above the largest scale, i.e.
$R_{ex} = EKE_{l=1} / MKE_{l=N}$
As expected, the spice ratio has higher variance, and there's so much variability in the profile that it's very difficult to interpret -- likely because there is so little variance below $l_1$ = 100 m. Density has a lower variance, but it's easier to pick apart.
R_ex

2) Partition Ratio, $R_{part}$

This is the ratio of everything below one scale over everything above that scale, i.e.
$R_{part} = EKE_{l} / MKE_{l}$
Interestingly, $R_{part, l=1}$ looks very similar to $R_{ex}$, which is likely because they have the same numerator and only the denominator changes, from $MKE_{l=1}$ to $MKE_{l=3}$. Also of note, as the filter scale increases for $R_{part}$, the general magnitude and shape of the curves are preserved but the noise decreases, making the plots much easier to read.
R_part

@andrewfagerheim
Collaborator Author

30 June 2023: Another Note on EKE

To better understand the differences in MKE and EKE plots (based off first results in the steinberg notebook), I took a much closer look at each of the terms being calculated by using the substitution u = <u> + u'. One mistake I made initially that is good to remember: <<u>u'> DOES NOT EQUAL <<u>><u'>.
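A quick numerical check of that last point, as a minimal sketch only. It assumes the filter <·> is a simple running mean (boxcar) and uses a synthetic profile; the filter actually used in the notebooks may differ, and the names `boxcar`, `u_mean`, and `u_prime` are made up for illustration.

```python
import numpy as np

def boxcar(x, width):
    """Running mean over `width` points (zero-padded at the edges)."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

rng = np.random.default_rng(0)
u = np.cumsum(rng.standard_normal(500))   # synthetic red-noise "profile"

u_mean = boxcar(u, 25)    # <u>
u_prime = u - u_mean      # u'

lhs = boxcar(u_mean * u_prime, 25)               # <<u> u'>
rhs = boxcar(u_mean, 25) * boxcar(u_prime, 25)   # <<u>> <u'>

interior = slice(50, -50)  # skip the edges, where zero padding distorts the mean
max_diff = np.max(np.abs(lhs[interior] - rhs[interior]))
print(max_diff)            # nonzero: filtering a product is not the product of filtered terms
```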

Below is the whiteboard with the substitution into a one-scale EKE/MKE division, and a binning method with multiple scales. Note the color coded underlines denoting terms for large scale, cross terms, and small scale variance. I'll add the plots I've made testing this decomposition when I've finished checking everything over.

IMG_5391

@andrewfagerheim
Collaborator Author

andrewfagerheim commented Jul 7, 2023

7 July 2023: Argo Box Analysis

I made a new notebook (argo_box_analysis) to walk through all steps in the process of loading a new box and analyzing it with all the methods we've discussed. The goal was to get everything in one place, make sure everything has been done correctly, and organize any questions/errors/etc. This note is to walk through each part of that notebook and note comments/questions here.

I picked a small box because I was hoping this might allow for a more cohesive seasonal signal to appear and the Southern Ocean because I was hoping it would have seasonal variation.

Plot Tracers and Profiles

  • I plotted colormaps of CT, SA, SIG0, Spice, and Spice Anomaly. This is fairly straightforward; it's just helpful to note that spice anomaly is calculated by
    mean_prof = ds.isel(pressure).mean() and anom_prof = ds.isel(pressure) - mean_prof
  • I plotted profiles of CT, SA, SIG0, and Spice and the profiles filtered at l1=100, l2=200, and l3=400. The main thing to note is that I've applied the masking function that uses the following boundaries:
    • upper bound: MLD + l
    • lower bound: 2000 - l
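The boundary mask described above can be sketched as follows. This is an illustration under assumptions, not the notebook's masking function: the name `boundary_mask`, the inclusive bounds, and the example MLD of 50 m are all hypothetical.

```python
import numpy as np

def boundary_mask(pressure, mld, l, bottom=2000.0):
    """True where a profile filtered at scale l is kept: MLD + l <= p <= bottom - l."""
    pressure = np.asarray(pressure, dtype=float)
    return (pressure >= mld + l) & (pressure <= bottom - l)

pressure = np.arange(0.0, 2001.0, 2.0)   # 0 to 2000 in 2 m steps
mask_100 = boundary_mask(pressure, mld=50.0, l=100.0)
mask_400 = boundary_mask(pressure, mld=50.0, l=400.0)

# Larger filter scales keep a narrower depth range.
print(pressure[mask_100].min(), pressure[mask_100].max())
print(pressure[mask_400].min(), pressure[mask_400].max())
```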

Plot MLD

  • Here MLD is calculated using the threshold method with a threshold of 0.03 kg/m^3: the function takes the density at the surface, adds 0.03, finds the density closest to this value, then considers that depth to be the mixed layer depth (MLD).
  • I've plotted the mean density and spice profile for each month (line), and the density value that corresponds to the mean MLD (marker). Some of the profile/MLD pairs didn't look right, so I tried plotting one, and I think this is simply a result of averaging with noisy data.

Plot Spectra

  • I plotted two sets of spectra, one using the mean profiles of density and salinity from the whole box and the other using the mean profile of salinity from each season. The seasonal spectra seem incredibly similar; I can't visually spot any differences between them.
  • The spice spectra seem to have a slight spectral slope shift from -3 at large scales to -2 at small scales, which might occur ~100m?
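One way to put numbers on a slope change like this is to fit the log-log slope separately on either side of a candidate break scale. The sketch below is an assumption-laden illustration, not the notebook's spectral code: it assumes a uniformly spaced profile, a Hann window, and hypothetical function names, and the spectrum is normalized only up to a constant factor (which does not affect the slope).

```python
import numpy as np

def vertical_spectrum(profile, dz):
    """Windowed periodogram of a uniformly sampled profile (zero wavenumber dropped)."""
    x = np.asarray(profile, dtype=float)
    x = x - x.mean()                              # remove the mean before windowing
    window = np.hanning(x.size)
    psd = np.abs(np.fft.rfft(x * window)) ** 2 * dz / np.sum(window ** 2)
    k = np.fft.rfftfreq(x.size, d=dz)             # cycles per metre
    return k[1:], psd[1:]

def spectral_slope(k, psd, kmin, kmax):
    """Least-squares slope of log10(psd) against log10(k) for kmin <= k <= kmax."""
    sel = (k >= kmin) & (k <= kmax)
    slope, _ = np.polyfit(np.log10(k[sel]), np.log10(psd[sel]), 1)
    return slope

# Check the fit on exact power laws before trusting it on data.
k = np.logspace(-3, -1, 50)
slope_large = spectral_slope(k, k ** -3.0, 1e-3, 1e-2)   # scales larger than 100 m
slope_small = spectral_slope(k, k ** -2.0, 1e-2, 1e-1)   # scales smaller than 100 m
print(slope_large, slope_small)
```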

Plot Ratios

  • This is where EKE & MKE are calculated, so let's start with identifying how they have been defined.
    • MKE bin 0 = EKE bin 0 = $EKE_{l=1}$
    • EKE bin 1 = $EKE_{l=1} - EKE_{l=2}$
    • MKE bin 1 = $MKE_{l=2} - MKE_{l=1}$
    • EKE bin 2 = $EKE_{l=2} - EKE_{l=3}$
    • MKE bin 2 = $MKE_{l=3} - MKE_{l=2}$
    • EKE bin 3 = MKE bin 3 = $MKE_{l=3}$
  • When each of these is defined, I specify bound=True and apply the boundary mask, which applies the upper and lower bounds as described above. The plots of MKEs and EKEs are displayed, which shows the masking for each.
  • Now for the ratios themselves, which are defined below:
    • $R_{exclusive} = EKE_{l=1} / MKE_{l=3}$
    • $R_{partition, l=100} = EKE_{l=1} / MKE_{l=1}$
    • $R_{partition, l=200} = EKE_{l=2} / MKE_{l=2}$
    • $R_{partition, l=400} = EKE_{l=3} / MKE_{l=3}$
  • The density ratio is always lower (which makes sense because we expect it to have lower small-scale variance), $R_{exclusive}$ looks the most like $R_{partition, l=100}$, and the partition ratio becomes smoother as l becomes larger. Additionally, as filter scale gets larger, more of the profile is masked out (as expected).
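The ratio definitions above can be sketched as a small helper. This assumes EKE and MKE profiles have already been computed for each filter scale and are ordered from smallest to largest scale; the function name `variance_ratios` is hypothetical and the input numbers are placeholders, not results.

```python
import numpy as np

def variance_ratios(eke, mke, scales):
    """Return (R_exclusive, {scale: R_partition}) from per-scale EKE and MKE profiles."""
    r_exclusive = eke[0] / mke[-1]   # smallest-scale EKE over largest-scale MKE
    r_partition = {l: e / m for l, e, m in zip(scales, eke, mke)}
    return r_exclusive, r_partition

# Placeholder two-level profiles for l = 100, 200, 400 (made-up numbers).
scales = [100, 200, 400]
eke = [np.array([1.0, 2.0]), np.array([2.0, 4.0]), np.array([4.0, 8.0])]
mke = [np.array([8.0, 16.0]), np.array([4.0, 8.0]), np.array([2.0, 4.0])]

r_exclusive, r_partition = variance_ratios(eke, mke, scales)
print(r_exclusive, r_partition[100], r_partition[400])
```

If the boundary mask is applied first, masked depths would typically be NaN in both inputs and stay NaN in the ratios.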

Plot EKE/MKE by Scale and Depth

  • These plots display the variance quantity summed over depth where the mask term == 1. In other words, they show the variance summed from depth (MLD + l) to (2000 - l).
  • The first two panels are for density and spice (respectively). They each show the MKE and EKE, and the difference between these methods. Unfortunately, there is still a significant difference between these two methods, especially in the middle two bins, which are much higher in the MKE method than the EKE method.
  • The next two panels are also for density and spice (respectively). They only show plots of EKE, but with variance separated by depth and filter scale. Generally the trends seem to be as expected: highest variance at large-scales, with small-scale variance being the highest closer to the surface.
  • QUESTION: I'm still not sure how to resolve the issue of the EKE and MKE methods not lining up, and I think masking only exacerbates the issue.
  • NOTE: For a completely fair comparison between how masking and not masking affect plotting variance quantities. There seem to be some real differences between the ordering of scales/depths when comparing these two methods side-by-side as well, which feels important to note.

GENERAL QUESTION: Should I compute a quantity for all profiles, then display the mean? Or should I take the mean of all profiles, then compute the quantity? (For example for spectra or MLD)
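The two orders give different answers whenever the quantity is nonlinear in the profile. A minimal illustration for spectra, using synthetic uncorrelated noise rather than Argo data (so the size of the gap here is an extreme case, not a prediction for the box):

```python
import numpy as np

rng = np.random.default_rng(1)
profiles = rng.standard_normal((20, 256))   # 20 synthetic profiles, 256 levels each

def power(x):
    """Raw power spectrum along the last axis."""
    return np.abs(np.fft.rfft(x, axis=-1)) ** 2

mean_of_spectra = power(profiles).mean(axis=0)    # compute per profile, then average
spectrum_of_mean = power(profiles.mean(axis=0))   # average profiles, then compute

# Averaging first cancels variability that is not coherent between profiles,
# so the spectrum of the mean profile can never exceed the mean of the spectra.
print(mean_of_spectra.sum() / spectrum_of_mean.sum())
```

For a linear quantity (such as the mean profile itself) the order would not matter. For spectra or a threshold-based MLD it does, so the choice depends on whether profile-to-profile variability is meant to be part of the result.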

@andrewfagerheim
Collaborator Author

28 July 2023: Sampling Rates

The problem is that float sampling rates can change much more dramatically than I expected or really accounted for in my current method of loading boxes. If the rate changes from ~2m to ~5m that's probably fine, but ~100m definitely isn't. I see a few potential solutions:

  • just remove any profile that has a sampling rate above 5m or 10m anywhere along the profile
  • calculate the sample rate along the entire profile and add this as another dimension in the interpolated dataset
  • calculate the sample rate along the entire profile and add a mask that removes areas coarser than 5m or 10m

Initially, I'm drawn to the first option because it's easy and I know exactly how to do it. However, I'm worried that it could remove lots of profiles, which would not be great for the analysis. The other two options could retain every profile within the box, but I don't know exactly how to convert from N_LEVELS (the dimension sampling rate would be on) to PRES_INTERPOLATED (the end dimension for depth/pressure) because N_LEVELS depends on the sampling rate.

Now that I say this though, I feel like if you added sample rate as a coordinate it would be interpolated just like any other coordinate in the ds, so this is trivial anyway. (Wait, but won't it linearly interpolate between each sample rate?? No, I don't think this is going to work...)

I think the way forward then is to load a box where you calculate sample rate, add it as a coordinate, and see if it's interpolated correctly. If it is, check how many profiles would be usable. If not, you can either keep troubleshooting or just remove any profile with a sample rate greater than 6m anywhere.
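One possible way around the linear-interpolation worry, sketched with plain NumPy on a single made-up profile: instead of interpolating the sample rate, assign each point on the interpolated pressure grid the spacing of the two original samples that bracket it. The function names and the 10 m cutoff below are assumptions for illustration, and this has not been tested against the actual box-loading code.

```python
import numpy as np

def max_gap(pres):
    """Largest spacing between consecutive samples (option 1: reject if too large)."""
    return float(np.max(np.diff(np.asarray(pres, dtype=float))))

def spacing_on_grid(pres, grid):
    """Spacing of the original samples bracketing each grid pressure (NaN outside)."""
    pres = np.asarray(pres, dtype=float)
    grid = np.asarray(grid, dtype=float)
    gaps = np.diff(pres)
    # Index i such that pres[i] <= p < pres[i + 1], clipped to a valid gap index.
    idx = np.clip(np.searchsorted(pres, grid, side="right") - 1, 0, gaps.size - 1)
    inside = (grid >= pres[0]) & (grid <= pres[-1])
    return np.where(inside, gaps[idx], np.nan)

pres = np.array([0.0, 2.0, 4.0, 6.0, 106.0, 108.0])   # one 100 m jump in sampling
grid = np.arange(0.0, 110.0, 2.0)                      # target grid, 0 to 108

spacing = spacing_on_grid(pres, grid)
usable = spacing <= 10.0    # option 3: mask regions coarser than the cutoff
print(max_gap(pres), usable.sum(), grid.size)
```

Because the spacing is looked up rather than interpolated, the 100 m gap stays a sharp block on the grid instead of being smeared into its neighbours.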

@andrewfagerheim
Collaborator Author

andrewfagerheim commented Aug 4, 2023

4 August 2023: Data loading issues

More on cache & chunks: https://argopy.readthedocs.io/en/latest/performances.html
Notebooks: problem_floats and problem_errdac

Notes:

  • I feel like I'm still not clear how to proceed with this. Loading with erddap doesn't seem to encounter the PSAL issues, so that's encouraging. It was able to load boxes that previously ran into PSAL issues which is good. However, there's this new issue of loading particularly large boxes irregularly and slowly.
  • I tried setting cache=True and changing the number of chunks but this didn't seem to increase reliability significantly. To be clear, I'm now able to load boxes eventually. But this often takes 4+ attempts, which just doesn't seem sustainable long-term.
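Until the underlying reliability issue is understood, one stopgap is to automate the repeated attempts. The helper below is a generic sketch, not part of argopy: `load_with_retry` and its parameters are hypothetical, and `fetch` is any zero-argument callable that performs the load.

```python
import time

def load_with_retry(fetch, attempts=5, wait=1.0, backoff=2.0):
    """Call fetch() until it succeeds, sleeping longer after each failure."""
    last_error = None
    for i in range(attempts):
        try:
            return fetch()
        except Exception as err:      # broad on purpose: fetch failures vary
            last_error = err
            if i < attempts - 1:
                time.sleep(wait)
                wait *= backoff       # exponential backoff between attempts
    raise RuntimeError(f"failed after {attempts} attempts") from last_error

# Stand-in for a flaky box load: fails twice, then succeeds.
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("simulated timeout")
    return "dataset"

result = load_with_retry(flaky_fetch, wait=0.0)
print(result, calls["n"])
```

With argopy this might wrap something like `lambda: DataFetcher(src="erddap", cache=True).region(box).to_xarray()`, where `box` is the usual region list. With caching on, a retry may be able to reuse requests that already succeeded, though I haven't verified how that interacts with chunked loads.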

@andrewfagerheim
Collaborator Author

andrewfagerheim commented Jan 30, 2024

Notes on EKE for simple profile

Blackboard notes from #13 (comment) with @dhruvbalwada

image
image
image
image

@andrewfagerheim
Collaborator Author

Notes on Ferrari & Polzin 2005

IMG_7992
IMG_7993
IMG_7994
IMG_7995
IMG_7996
IMG_7997
IMG_7998
IMG_7999
