Add podcast feature to website #992

Merged 1 commit on Feb 28, 2023

Contributor

@bitschmidty bitschmidty commented Jan 23, 2023

Preview link of first episode

This PR adds an Optech Podcast to the site. Some notes:

  • Audio files and RSS feed are hosted at anchor.fm
  • Uses an HTML audio player instead of the embedded anchor.fm player, as the HTML player requires no JavaScript and lets us link within the episode to play segments
  • JavaScript is used for seeking within a podcast's audio, but sparingly and with a noscript fallback
  • Used the first Optech Recap Twitter Space, from news210, as an example to test features
  • Used whisper from OpenAI for automatic transcription (medium.en setting). Did not edit the transcription manually (except where a duplicate word broke the build).
  • Show notes link to the relevant newsletter content, as does the transcription
  • Links added from each covered segment in the original newsletter (210) to the podcast in which it is covered
  • The original audio recording was edited by a third party to clean it up. See the transcription difference in the "post edit transcription update" commit
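As a rough illustration of the "link within the episode" idea above, here is a minimal Python sketch that turns a show-notes timestamp into a link that starts playback at that point. It assumes the link uses a W3C media fragment (`#t=<seconds>`); the PR does not say which mechanism the site actually uses, and the function names and URL are hypothetical.

```python
def timestamp_to_seconds(timestamp: str) -> int:
    """Convert "SS", "MM:SS" or "HH:MM:SS" into a whole number of seconds."""
    parts = [int(p) for p in timestamp.split(":")]
    if not 1 <= len(parts) <= 3 or any(p < 0 for p in parts):
        raise ValueError(f"unsupported timestamp: {timestamp!r}")
    seconds = 0
    for part in parts:
        # Each step shifts the running total up one unit (hours -> minutes -> seconds).
        seconds = seconds * 60 + part
    return seconds


def segment_link(audio_url: str, timestamp: str) -> str:
    """Build a link that asks the player to start at `timestamp`.

    Assumption: a media fragment (#t=...) is used; the real site may differ.
    """
    return f"{audio_url}#t={timestamp_to_seconds(timestamp)}"


if __name__ == "__main__":
    # Hypothetical audio URL, for illustration only.
    print(segment_link("https://example.com/episode.mp3", "12:34"))
```

A link of this shape works without JavaScript in browsers that honor media fragments, which is one plausible way to provide a noscript fallback.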

In addition to feedback on approach and features, remaining todos:

  • If team agrees on approach, finish setting up Apple, Google and other podcasts with Anchor.fm and update podcast links accordingly
  • additional testing (I've tested on macOS w/ Chrome, Firefox, Safari and in Chrome's mobile device emulator)
  • determine whether to append remaining shows to this PR or submit in batches in future PRs (will do separate individual PRs, starting with most recent)
  • put the headphones icon at the end of an item in the newsletter references

@murchandamus
Collaborator

I think we can afford a separate PR per audio recap. The integration with the newsletter is slick, I like it!
Is it possible to get the automatic transcription also output speaker designations? The transcript would be much easier to follow if the speakers were labeled.

I’ve listened to a couple of minutes; it sounds good. I’ll let others chime in more, for now.

@bitschmidty
Contributor Author

Is it possible to get the automatic transcription also output speaker designations? The transcript would be much easier to follow if the speakers were labeled.

There are some folks doing this (speaker diarization). I can dig a bit deeper after some more feedback.

@bitschmidty bitschmidty marked this pull request as draft January 23, 2023 20:26
Collaborator

@harding harding left a comment

Wow! I took a quick look at the preview, and I'm impressed! I left a few minor comments.

I'm in favor of submitting additional shows in batches (or individually). Maybe work from the present backwards so we can show the feature off sooner rather than later (but I think that's up to whoever's doing the work).

@harding
Collaborator

harding commented Jan 26, 2023

Oh, can we maybe also say something about the licensing on the podcast? Y'all have been doing it, so I think it's up to you to choose a license (even if you want it proprietary), but I think we should explicitly mention the copyright either on the individual podcast pages or on the /en/podcast/ page.

@murchandamus
Collaborator

@bitschmidty: I’d be open to a permissive license, e.g. CC BY. Given our mission it makes sense to me that we allow commercial use, but I’d be okay to restrict it to share-alike (CC BY-SA), or even No Derivatives (CC BY-ND), if you would prefer it not to be remixed.

On second thought, I see that the Bitcoin Optech content is generally licensed per MIT license, that’s fine with me, too.

@harding
Collaborator

harding commented Feb 7, 2023

On second thought, I see that the Bitcoin Optech content is generally licensed per MIT license, that’s fine with me, too.

FWIW, I'd prefer to see Optech move to a CC license, as I don't think MIT is really designed for natural-language content (even if it does mention documentation). My preference would be CC-BY-SA as I've always been a copyleft guy. But when it comes to the podcast, I again think that's a decision for y'all.

@bitschmidty
Contributor Author

License: I'd like to allow folks to reuse with attribution.

"putting the headphones icon at the end of an item": Agreed!

Transcripts: I had a vendor submit a free sample transcription to compare. See attached for quality. They estimate £84.00/hr of audio for a turnaround of 6+ days. This is the team that transcribes the What Bitcoin Did Podcast so they are somewhat familiar with the Bitcoin jargon.

@harding
Collaborator

harding commented Feb 8, 2023

Transcripts: I had a vendor submit a free sample transcription to compare. See attached for quality. They estimate £84.00/hr of audio for a turnaround of 6+ days. This is the team that transcribes the What Bitcoin Did Podcast so they are somewhat familiar with the Bitcoin jargon.

On a quick skim, that looks really good to me. Doing the quick math, a rough upper bound of two hours per week, times 51 newsletters per year, times £84/hr, times a 1.21 exchange rate comes to ~$10k/year. That doesn't sound like much to me compared to the benefit of getting top-quality transcripts of conversations with subject matter experts about topics important to the future development of Bitcoin and LN.
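The estimate above can be checked with a few lines of Python. The inputs are the figures quoted in this thread (two hours per week as an upper bound, 51 newsletters per year, £84.00 per hour of audio, a 1.21 USD/GBP exchange rate); the constant and function names are just illustrative.

```python
HOURS_PER_WEEK = 2            # rough upper bound on recap length, from the comment
NEWSLETTERS_PER_YEAR = 51     # from the comment
GBP_PER_AUDIO_HOUR = 84.00    # vendor estimate quoted in the thread
USD_PER_GBP = 1.21            # exchange rate used in the comment


def annual_cost_usd(hours_per_week: float = HOURS_PER_WEEK,
                    episodes_per_year: int = NEWSLETTERS_PER_YEAR,
                    gbp_per_hour: float = GBP_PER_AUDIO_HOUR,
                    usd_per_gbp: float = USD_PER_GBP) -> float:
    """Return the yearly transcription cost in USD for the given assumptions."""
    audio_hours = hours_per_week * episodes_per_year
    return audio_hours * gbp_per_hour * usd_per_gbp


if __name__ == "__main__":
    print(f"~${annual_cost_usd():,.0f} per year")
```

With those inputs, 102 hours of audio cost £8,568, or about $10,367, which matches the ~$10k/year figure.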

If they have a lower rate for slower turnaround, or if 6 days isn't fast enough for us, we could do the initial transcription for new episodes using software and then update with the human-transcribed version later.

@bitschmidty
Contributor Author

bitschmidty commented Feb 27, 2023

Pushed updates:

  • headphone icon after newsletter text
  • filled in podcast listener app links and signed up at each provider
  • removed podcast 210
  • added podcast for newsletter 239
  • 239 transcription added (professional transcription)
  • podcast episodes will show on the homepage (with newsletter and blog post entries)

I think we are ready to go live with this if all looks good.

Edit: preview of news 239 podcast page: https://deploy-preview-992--bitcoinops.netlify.app/en/podcast/2023/02/23/

@bitschmidty bitschmidty marked this pull request as ready for review February 27, 2023 18:43
@bitschmidty bitschmidty changed the title WIP: Add podcast feature to website Add podcast feature to website Feb 27, 2023
Collaborator

@harding harding left a comment

ACK with one non-critical suggestion.

I reviewed the diff, visited each of the edited pages, clicked a bunch of links, tested all of the podcast links I could, read a bunch of the transcript, and listened to a few segments using the JS player.

This is absolutely incredible. I'm just blown away. Thank you so much @bitschmidty for both the idea to do this and the follow-through. ❤️ ❤️ ❤️


{% else %}
{:.center}
Recent publications from our [blog posts][] and [newsletters][].
Collaborator

Now that we have the list, I think this "Recent publications..." line is redundant and should be removed.

@bitschmidty
Contributor Author

Pushed changes for:

@harding
Collaborator

harding commented Feb 28, 2023

ACK ab113e1 with enthusiasm!

Collaborator

@murchandamus murchandamus left a comment

I’d like to second @harding's enthusiasm. This is amazing work, thank you a lot @bitschmidty. I clicked through the preview a bit, tried a bunch of links for jumping to parts of the transcript and to timestamps in the audio, and added the podcast to my podcast app, and it all worked great. I just have a tiny nit that you can feel free to ignore if you don’t share it.

ACK ab113e1

@bitschmidty bitschmidty merged commit 63b1043 into bitcoinops:master Feb 28, 2023
@bitschmidty
Contributor Author

Took @xekyo's suggestion, squashed, and merged! 🚀

@kouloumos
Contributor

At bitcointranscripts we developed a transcription pipeline that streamlines the AI generation, human review, and publication of technical Bitcoin transcripts. I believe that incorporating Newsletter recaps into this system would be a perfect fit. Adding the Newsletter recaps to our transcription pipeline would lead to faster transcript turnaround times while giving the community another way to delve into the topics and engage with the discussions.

I already pushed yesterday's recap into the pipeline. The AI-generated transcript is now available, awaiting claim by a reviewer at review.btctranscripts.com. Once claimed, it will be reviewed, edited, and submitted for evaluation. Upon approval, the finalized transcript will be accessible at the original link.

If that sounds interesting, I would love to explore what we need to do to integrate with your existing workflow.

@bitschmidty
Contributor Author

@kouloumos For context, here is what I currently do, including audio editing:

  • Use ffmpeg to pull down the Spaces audio file
  • Edit the audio: cut the beginning banter and any technical difficulties, remove silence and background noise, and autolevel
  • Create the podcast web page (thank you @kouloumos!) and put in timestamps during audio editing
  • Send the edited audio file to a human transcriptionist
  • Publish the audio file to anchor.fm for the podcast feeds
  • Publish the podcast page on Optech via PR
  • After ~3 business days, get the transcription, address any queries on it, convert it to Markdown, and add the transcript to the podcast page
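The first two steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the thread confirms that ffmpeg is used to pull down the audio, but not the exact command, and it does not say which tool does the editing. The stream URL, file names, and filter parameter values are hypothetical; `silenceremove` and `loudnorm` are standard ffmpeg audio filters.

```python
import subprocess
from typing import List


def download_command(source_url: str, output_path: str) -> List[str]:
    """ffmpeg command that copies the audio stream to a local file without re-encoding.

    Assumption: the Space's audio is reachable as a stream URL that ffmpeg can read.
    """
    return ["ffmpeg", "-i", source_url, "-c", "copy", output_path]


def cleanup_command(input_path: str, output_path: str) -> List[str]:
    """ffmpeg command that trims long silences and normalizes loudness.

    Assumption: the filter settings below are illustrative, not the author's.
    """
    filters = ",".join([
        "silenceremove=stop_periods=-1:stop_duration=2:stop_threshold=-50dB",
        "loudnorm",
    ])
    return ["ffmpeg", "-i", input_path, "-af", filters, output_path]


def run(command: List[str]) -> None:
    """Run a command and raise if ffmpeg exits with an error."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; swap print for run() to execute.
    print(download_command("https://example.com/space.m3u8", "raw.m4a"))
    print(cleanup_command("raw.m4a", "clean.m4a"))
```

Cutting banter and technical difficulties still needs a human ear, so a script like this would cover only the mechanical parts of the edit.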

Pros of the setup proposed by btc transcripts: quicker initial transcription, community involvement, no monetary cost
Pros of the current setup: predictability, quality (? have not compared)

Open to others' feedback, @murchandamus @harding

@murchandamus
Collaborator

I must admit I do like the predictable turnaround and high quality of our current pipeline. I would imagine (and, at a cursory glance, I seem to be correct) that the initial automatic transcript would not measure up to our human transcription, and I would also guess that the improved transcript would take an unpredictable amount of time on btctranscripts. That implies to me that we would first have a lower-quality transcript and then touch it up again later when we get the improved version. Overall, it seems like the newly proposed route would be higher touch, and I’m not sure I see major upsides beyond a lower cost. @kouloumos, could you perhaps compare the advantages and disadvantages as you see them? While I don’t know how much the transcription costs, it seems to me that the current approach works well, but then @bitschmidty is doing all the work for that these days, so if @bitschmidty prefers what you propose, I’m happy to roll with it.

@harding
Collaborator

harding commented Mar 9, 2024

Strongly agree with this:

@bitschmidty is doing all the work for that these days, so if @bitschmidty prefers what you propose, I’m happy to roll with it.

Having high quality transcripts is really important to me but the method we use to get there isn't something I care much about (so long as it's ethical).

I would like to continue hosting transcripts on our site as I want to incorporate them with our topics index and other site features in the future (when I finish the transition to Hugo), although I don't mind other sites also hosting them.

@kouloumos
Contributor

kouloumos commented May 9, 2024

Thank you all for your feedback and insights. After reflecting on your comments, I let some time pass to observe the review process for the AI-generated newsletter transcript we added. Unfortunately, it took a month for the transcript to be claimed and reviewed by a human, which is less than ideal. Moreover, there's a bottleneck in the evaluation stage, primarily because I'm currently managing these evaluations myself.

Given these observations, I agree about the unpredictability in both the timing and quality of reviews. It's clear that now is not the right time to incorporate Newsletter recaps into our system. However, I'm optimistic about future improvements that could address these concerns:

  • Unpredictable Timing: We can introduce a feature to prioritize certain transcripts, moving them to the top of the list to encourage quicker claims. We could offer incentives, either through social recognition (acknowledging the reviewer's contribution) or monetary compensation (extra sats for specific transcripts).
  • Unpredictable Quality: While we strive for consistency, the quality still varies depending on the reviewer. We're planning to implement a tiered or reputation system for reviewers based on their track record. This could ensure that only the most reliable reviewers can claim important transcripts.

I'm aware that our current system doesn't yet match the predictability and quality of your existing setup. However, I believe the proposed improvements could align our turnaround times and maintain high standards, with the added benefits of automatic timestamp inclusion, quicker initial transcription, community involvement, minimal monetary cost, and streamlined processes that require minimal involvement from you. Here’s how I envision the revised process:

  1. Audio editing and publishing to podcast feeds (handled by you)
  2. Downloading the audio and generating the AI transcript (handled by us)
  3. Publishing the AI transcript, making it available for review (handled by us)
  4. Editing of the transcript by reviewers, followed by submission for evaluation (handled by us)
  5. Submission evaluation (handled by you or other members of the community)
  6. Publishing the final transcript (handled by us) - the transcript could be hosted on both websites
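The division of labor in the steps above can be written down as a small data structure, which makes it easy to see how little would remain on the Optech side. This sketch reads "you" as Optech and "us" as bitcointranscripts, which is an interpretation of the comment; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    order: int
    description: str
    handled_by: str  # "Optech", "bitcointranscripts", or "Optech or community"


# The six proposed steps, with "you" read as Optech and "us" as bitcointranscripts.
PROPOSED_PROCESS: List[Stage] = [
    Stage(1, "Edit audio and publish to podcast feeds", "Optech"),
    Stage(2, "Download audio and generate the AI transcript", "bitcointranscripts"),
    Stage(3, "Publish the AI transcript for review", "bitcointranscripts"),
    Stage(4, "Reviewers edit the transcript and submit it for evaluation", "bitcointranscripts"),
    Stage(5, "Evaluate the submission", "Optech or community"),
    Stage(6, "Publish the final transcript", "bitcointranscripts"),
]


def stages_for(owner: str) -> List[int]:
    """Return the step numbers in which `owner` is involved."""
    return [stage.order for stage in PROPOSED_PROCESS if owner in stage.handled_by]


if __name__ == "__main__":
    print("Optech:", stages_for("Optech"))
    print("bitcointranscripts:", stages_for("bitcointranscripts"))
```

Under this reading, Optech would be involved only in steps 1 and 5, and step 5 could also be taken on by other community members.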

We aren’t there yet, but your feedback is invaluable. I plan to revisit this proposal once we've implemented the necessary changes and feel confident in the enhanced system. In the meantime, any further input you have would be greatly appreciated. Thank you again for considering this integration and for the thoughtful discussion.
