Add podcast feature to website #992

bitschmidty · 2023-01-23T19:48:50Z

This PR adds an Optech Podcast to the site. Some notes:

Audio files and RSS feed are hosted at anchor.fm
Uses an HTML audio player instead of the embedded anchor.fm player, since the HTML player requires no JavaScript and lets us link within the episode to play individual segments
There is JavaScript for seeking within a podcast's audio, but it is used sparingly and has a noscript fallback
Used the first Optech Recap Twitter Space, from news210, as an example to test features
Used whisper from OpenAI for automatic transcription (medium.en setting). Did not edit the transcription manually (except where a duplicate word broke the build).
Show notes link to the relevant newsletter content, as does the transcription
Links for each covered segment added from the original newsletter (210) to the podcast in which it is covered
Original audio recording was edited by a third party to clean it up. See the transcription difference in the "post edit transcription update" commit
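
As a rough illustration of linking to a segment within an episode: one mechanism that works without JavaScript is the temporal fragment from the W3C Media Fragments URI syntax (`#t=<seconds>`), which browsers generally honor on audio URLs. Whether this PR uses that mechanism is an assumption, and the helper names and URL below are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def timestamp_to_seconds(timestamp: str) -> int:
    """Convert "SS", "MM:SS", or "HH:MM:SS" into a whole number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def segment_link(audio_url: str, start: str) -> str:
    """Return audio_url with a temporal media fragment (#t=<seconds>) set."""
    parts = urlsplit(audio_url)
    fragment = f"t={timestamp_to_seconds(start)}"
    return urlunsplit(parts._replace(fragment=fragment))


# Hypothetical audio URL, for illustration only.
print(segment_link("https://example.com/recap.mp3", "12:34"))
# → https://example.com/recap.mp3#t=754
```

A plain link built this way still opens the audio at the right offset when scripts are disabled, which is the kind of behavior a noscript fallback needs.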

In addition to feedback on approach and features, remaining todos:

If team agrees on approach, finish setting up Apple, Google and other podcasts with Anchor.fm and update podcast links accordingly
additional testing (I've tested on macOS with Chrome, Firefox, and Safari, and in Chrome's mobile device emulator)
determine whether to append remaining shows to this PR or submit in batches in future PRs (will do separate individual PRs, starting with most recent)
put the headphones icon at the end of an item in the newsletter references

murchandamus · 2023-01-23T20:18:43Z

I think we can afford a separate PR per audio recap. The integration with the newsletter is slick, I like it!
Is it possible to get the automatic transcription also output speaker designations? The transcript would be much easier to follow if the speakers were labeled.

I’ve listened to a couple of minutes, sounds good. I’ll let others chime in more, for now.

bitschmidty · 2023-01-23T20:24:17Z

> Is it possible to get the automatic transcription also output speaker designations? The transcript would be much easier to follow if the speakers were labeled.

There are some folks doing this (speaker diarization). I can dig a bit deeper after some more feedback.

harding

Wow! I took a quick look at the preview, and I'm impressed! I left a few minor comments.

I'm in favor of submitting additional shows in batches (or individually). Maybe work from the present backwards so we can show the feature off sooner rather than later (but I think that's up to whoever's doing the work).

_includes/functions/podcast-note.md

_posts/en/newsletters/2022-07-27-newsletter.md

harding · 2023-01-26T00:24:59Z

Oh, can we maybe also say something about the licensing on the podcast? Y'all have been doing it, so I think it's up to you to choose a license (even if you want it proprietary), but I think we should explicitly mention the copyright either on the individual podcast pages or on the /en/podcast/ page.

murchandamus · 2023-02-06T22:24:46Z

@bitschmidty: I’d be open to a permissive license, e.g. CC BY. Given our mission it makes sense to me that we allow commercial use, but I’d be okay to restrict it to share-alike (CC BY-SA), or even No Derivatives (CC BY-ND), if you would prefer it not to be remixed.

On second thought, I see that the Bitcoin Optech content is generally licensed per MIT license, that’s fine with me, too.

harding · 2023-02-07T18:23:00Z

> On second thought, I see that the Bitcoin Optech content is generally licensed per MIT license, that’s fine with me, too.

FWIW, I'd prefer to see Optech move to a CC license, as I don't think MIT is really designed for natural-language content (even if it does mention documentation). My preference would be CC-BY-SA as I've always been a copyleft guy. But when it comes to the podcast, I again think that's a decision for y'all.

bitschmidty · 2023-02-08T12:54:23Z

License: I'd like to allow folks to reuse with attribution.

"putting the headphones icon at the end of an item": Agreed!

Transcripts: I had a vendor submit a free sample transcription to compare. See attached for quality. They estimate £84.00/hr of audio for a turnaround of 6+ days. This is the team that transcribes the What Bitcoin Did Podcast so they are somewhat familiar with the Bitcoin jargon.

harding · 2023-02-08T18:47:08Z

> Transcripts: I had a vendor submit a free sample transcription to compare. See attached for quality. They estimate £84.00/hr of audio for a turnaround of 6+ days. This is the team that transcribes the What Bitcoin Did Podcast so they are somewhat familiar with the Bitcoin jargon.

On a quick skim, that looks really good to me. Doing the quick math of a rough upper bound of two hours per week, 51 newsletters per year, times a 1.21 exchange rate comes to ~$10k/year. That doesn't sound like much to me compared to the benefit of getting top-quality transcripts of conversations with subject matter experts about topics important to the future development of Bitcoin and LN.

If they have a lower rate for slower turnaround, or if 6 days isn't fast enough for us, we could do the initial transcription for new episodes using software and then update with the human-transcribed version later.
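
The ~$10k/year estimate in this comment can be reproduced from the figures quoted in the thread. This is only a worked check of the arithmetic; the variable names are illustrative, and the exchange rate is the one stated in the comment, not a current rate.

```python
# Figures quoted in the thread.
rate_gbp_per_audio_hour = 84.00  # vendor estimate
hours_per_week = 2               # rough upper bound
newsletters_per_year = 51
usd_per_gbp = 1.21               # exchange rate used in the comment

annual_gbp = rate_gbp_per_audio_hour * hours_per_week * newsletters_per_year
annual_usd = annual_gbp * usd_per_gbp

print(f"GBP {annual_gbp:,.0f}/year ~ USD {annual_usd:,.0f}/year")
# → GBP 8,568/year ~ USD 10,367/year
```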

bitschmidty · 2023-02-27T18:35:54Z

Pushed updates:

headphone icon after newsletter text
filled in podcast listener app links and signed up at each provider
removed podcast 210
added podcast for newsletter 239
239 transcription added (professional transcription)
podcast episodes will show on the homepage (with newsletter and blog post entries)

I think we are ready to go live with this if all looks good.

Edit: preview of news 239 podcast page: https://deploy-preview-992--bitcoinops.netlify.app/en/podcast/2023/02/23/

harding

ACK with one non-critical suggestion.

I reviewed the diff, visited each of the edited pages, clicked a bunch of links, tested all of the podcast links I could, read a bunch of the transcript, and listened to a few segments using the JS player.

This is absolutely incredible. I'm just blown away. Thank you so much @bitschmidty for both the idea to do this and the follow-through. ❤️ ❤️ ❤️

harding · 2023-02-28T00:30:38Z

en/publications.md


-{% else %}
 {:.center}
 Recent publications from our [blog posts][] and [newsletters][].


Now that we have the list, I think this "Recent publications..." line is redundant and should be removed.

bitschmidty · 2023-02-28T15:38:59Z

Pushed changes for:

Fixed the spacing nit @harding pointed out
removed the Recent publications text
added podcast content and transcription license (CC BY-SA 2.0) per harding's recommendation

harding · 2023-02-28T15:46:53Z

ACK ab113e1 with enthusiasm!

murchandamus

I’d like to second @harding's enthusiasm: this is amazing work, thank you very much @bitschmidty. I clicked through the preview a bit, tried a bunch of links for jumping to parts of the transcript and to timestamps in the audio, and added the podcast to my podcast app; it all worked great. I just have a tiny nit that you can feel free to ignore if you don’t share it.

ACK ab113e1

_posts/en/podcast/2023-02-23-newsletter-recap.md

bitschmidty · 2023-02-28T16:52:29Z

Took @xekyo's suggestion, squashed, and merged! 🚀

kouloumos · 2024-03-08T14:47:50Z

At bitcointranscripts we developed a transcription pipeline that streamlines the AI generation, human review, and publication of technical Bitcoin transcripts. I believe that incorporating Newsletter recaps into this system would be a perfect synergy. Adding the Newsletter recaps to our transcription pipeline would lead to faster transcript turnaround times while giving the community a different path to delve into the topics and engage with the discussions.

I already pushed yesterday's recap into the pipeline. The AI-generated transcript is now available, awaiting claim by a reviewer at review.btctranscripts.com. Once claimed, it will be reviewed, edited, and submitted for evaluation. Upon approval, the finalized transcript will be accessible at the original link.

If that sounds interesting, I would love to explore what we need to do to integrate with your existing workflow.

bitschmidty · 2024-03-08T17:50:54Z

@kouloumos For context, here is what I currently do, including audio editing:

Use ffmpeg to pull down the Spaces audio file
Edit the audio: cut the opening banter and any technical difficulties, remove silence and background noise, and auto-level
Create the podcast web page (thank you @kouloumos!) and add timestamps during audio editing
Send the edited audio file to a human transcriptionist
Publish the audio file to anchor.fm for the podcast feeds
Publish the podcast page on Optech via PR
After ~3 business days, receive the transcription, address any queries on it, convert it to Markdown, and add the transcript to the podcast page
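
The first step of this workflow could look roughly like the sketch below. The exact ffmpeg invocation is not given in the thread, so the flags here are an assumption (`-i` for the input, `-vn` to skip video, `-c:a copy` to keep the audio stream without re-encoding), and the URL, filename, and function name are hypothetical.

```python
import shlex
from typing import List


def build_download_command(source_url: str, output_path: str) -> List[str]:
    """Build an ffmpeg argument list that saves the audio from source_url."""
    return [
        "ffmpeg",
        "-i", source_url,  # input: the recording's stream or playlist URL
        "-vn",             # ignore any video stream
        "-c:a", "copy",    # copy the audio stream as-is (assumes a compatible container)
        output_path,
    ]


# Hypothetical URL and filename, for illustration only.
cmd = build_download_command("https://example.com/space/playlist.m3u8", "recap-raw.aac")
print(shlex.join(cmd))
```

The function only builds the command; passing `cmd` to `subprocess.run(cmd, check=True)` would execute it on a machine with ffmpeg installed.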

Pros of btc transcripts proposed setup: quicker initial transcription, involvement of community, no $ cost
Pros of current setup: predictability, quality (? have not compared)

Open to others' feedback, @murchandamus @harding

murchandamus · 2024-03-08T18:49:24Z

I must admit I do like the predictable turn-around and high quality of our current pipeline. I would imagine (and at a cursory glance seem to be correct) that the initial automatic transcript would not measure up to our human transcription and I would also guess that the improved transcript would take an unpredictable time on btctranscripts. That implies to me that we would first have a lower quality transcript and touch it up later again when we get the improved version. Overall, it seems like it would be higher touch to go with the new proposed route, and I’m not sure I see major upsides beyond a lower cost. @kouloumos, could you perhaps compare the advantages and disadvantages as you see them? While I don’t know how much the transcription costs, it seems to me that the current approach works well, but then @bitschmidty is doing all the work for that these days, so if @bitschmidty prefers what you propose, I’m happy to roll with it.

harding · 2024-03-09T00:12:51Z

Strongly agree with this:

> @bitschmidty is doing all the work for that these days, so if @bitschmidty prefers what you propose, I’m happy to roll with it.

Having high quality transcripts is really important to me but the method we use to get there isn't something I care much about (so long as it's ethical).

I would like to continue hosting transcripts on our site as I want to incorporate them with our topics index and other site features in the future (when I finish the transition to Hugo), although I don't mind other sites also hosting them.

kouloumos · 2024-05-09T08:22:16Z

Thank you all for your feedback and insights. After reflecting on your comments, I let some time pass to observe the review process for the AI-generated newsletter transcript we added. Unfortunately, it took a month for the transcript to be claimed and reviewed by a human, which is less than ideal. Moreover, there's a bottleneck in the evaluation stage, primarily because I'm currently managing these evaluations myself.

Given these observations, I agree about the unpredictability in both the timing and quality of reviews. It's clear that now is not the right time to incorporate Newsletter recaps into our system. However, I'm optimistic about future improvements that could address these concerns:

Unpredictable Timing: We can introduce a feature to prioritize certain transcripts, moving them to the top of the list to encourage quicker claims. We could offer incentives, either through social recognition (acknowledging the reviewer's contribution) or monetary compensation (extra sats for specific transcripts).
Unpredictable Quality: While we strive for consistency, the quality still varies depending on the reviewer. We're planning to implement a tiered or reputation system for reviewers based on their track record. This could ensure that only the most reliable reviewers can claim important transcripts.

I'm aware that our current system doesn't yet match the predictability and quality of your existing setup. However, I believe the proposed improvements could align our turnaround times and maintain high standards, with the added benefits of automatic timestamp inclusion, quicker initial transcription, community involvement, minimal $ cost, and streamlined processes that require minimal involvement from you. Here’s how I envision the revised process:

Audio editing and publishing to podcast feeds (handled by you)
Downloading the audio and generating the AI transcript (handled by us)
Publishing the AI transcript, making it available for review (handled by us)
Editing of the transcript by reviewers, followed by submission for evaluation (handled by us)
Submission evaluation (handled by you or other members of the community)
Publishing the final transcript (handled by us) - the transcript could be hosted on both websites

We aren’t there yet, but your feedback is invaluable. I plan to revisit this proposal once we've implemented the necessary changes and feel confident in the enhanced system. In the meantime, any further input you have would be greatly appreciated. Thank you again for considering this integration and for the thoughtful discussion.

bitschmidty marked this pull request as draft January 23, 2023 20:26

harding reviewed Jan 26, 2023

bitschmidty force-pushed the 2023-01-add-podcast branch from bec4b15 to e3bd55e Compare February 27, 2023 18:13

bitschmidty marked this pull request as ready for review February 27, 2023 18:43

bitschmidty changed the title ~~WIP: Add podcast feature to website~~ Add podcast feature to website Feb 27, 2023

harding reviewed Feb 28, 2023

murchandamus approved these changes Feb 28, 2023

Add Podcast publication to the site, add pod239

84543ac

bitschmidty force-pushed the 2023-01-add-podcast branch from fb3471f to 84543ac Compare February 28, 2023 16:46

bitschmidty merged commit 63b1043 into bitcoinops:master Feb 28, 2023
