Audio transcription not working right, shows repeating text (Android, PC fine)

I have turned on the option to use audio which provides a transcript to send to the LLM.

When I use a PC, it works; the transcript is fine.

When I use my Android phone (Samsung), the transcription breaks badly. Reading “1, 2, 3, 4, 5” produced a transcript of “1 1 2 1 2 3 1 2 3 1 2 3 1 2 3 4 1 2 3 4 5” the first time, and “1 1 1 2 1 2 3 1 2 3 1 2 3 1 2 3 4 1 2 3 4 5” the second time. It’s like it repeats the early part many times; not sure why. I tried a few phone browsers and it happens on all of them. It works well on a PC.

Hi @david-thompson,

Welcome to the Pickaxe community!

Thanks for flagging this and sharing the details. Sometimes transcription issues can also be related to the phone’s microphone, which is why I’d love to know if you’ve had a chance to test this on another Android device apart from your Samsung. That will help us see if the issue is device-specific.

Could you also please share the Pickaxe link you’re using and send it over to us at info@pickaxeproject.com so we can take a closer look?

Appreciate your patience while we work through this with you.

The same thing happened to my users yesterday, who were using iPhones. It works fine on all my devices.

Speech-to-text is tricky and dependent on the quality of the user’s microphone. Even with a pro podcasting mic, I still find myself editing text I dictated to the chatbot, on Pickaxe, ChatGPT, and Gemini.

I’ve tried a few phones and they all have the same issue. I’ve sent an email with a video and a link to my studio.

If it can’t be fixed, is there a way to submit audio files? I see Gemini accepts audio as an input.

Thanks!
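For reference, Gemini’s API can take audio directly as a request part. A minimal sketch, assuming the `@google/generative-ai` Node SDK; the model name and `API_KEY` are placeholders, and only the `audioPart` helper is runnable as-is:

```javascript
// Build the inline-audio part the Gemini API expects: base64 data plus a MIME type.
function audioPart(buffer, mimeType) {
  return { inlineData: { data: buffer.toString('base64'), mimeType } };
}

// Hypothetical usage with the @google/generative-ai SDK (needs a real API key):
// const fs = require('fs');
// const { GoogleGenerativeAI } = require('@google/generative-ai');
// const model = new GoogleGenerativeAI(API_KEY).getGenerativeModel({ model: 'gemini-1.5-flash' });
// const result = await model.generateContent([
//   audioPart(fs.readFileSync('clip.mp3'), 'audio/mp3'),
//   'Transcribe this audio.',
// ]);
// console.log(result.response.text());
```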

Our team spent several days troubleshooting this across 12 devices and 5 browsers. What we found is that it’s not the microphone: we got perfect transcription on Android and iPhone using the native voice-to-text (the microphone key when the keyboard opens). The best option is to bypass the Pickaxe voice-to-text entirely and use the HTML footer code injection at the studio level.
Attached is the HTML to create our own microphone button (with a pulsing animation). It’s worth noting that iPhone will not allow the microphone in Chrome, so there is a pop-up message telling users to use the native mic near the keyboard.

<style>
  @keyframes pulse {
      0% { opacity: 1; }
      50% { opacity: 0.5; }
      100% { opacity: 1; }
  }
</style>
<script>
(function() {
  // Patched to handle Chrome iOS limitations, continuous recognition, and submission.
  const MIC_CLASS = 'mic-button';

  let recognition = null;
  let isMicActive = false;
  let finalTranscript = '';

  const stopRecognition = () => {
      if (!isMicActive || !recognition) return;
      isMicActive = false;
      recognition.onend = null; // prevent the auto-restart handler from firing
      recognition.stop();
      finalTranscript = '';
      // Detach the Enter-key listener so it doesn't linger after recording stops.
      const ta = document.querySelector('textarea.resize-none');
      if (ta) ta.removeEventListener('keydown', handleEnterKey);
      const micBtn = document.querySelector('.' + MIC_CLASS);
      if (micBtn) {
          micBtn.style.setProperty('background-color', '#A9A9A9', 'important');
          micBtn.style.setProperty('animation', 'none', 'important');
          micBtn.style.setProperty('border', 'none', 'important');
      }
  };

  const handleEnterKey = (event) => {
      if (event.key === 'Enter' && !event.shiftKey) {
          setTimeout(() => {
              stopRecognition();
          }, 100);
      }
  };

  function injectButtons() {
      document.querySelectorAll('div.flex.items-center.gap-x-2').forEach(container => {
          let micBtn = container.querySelector('button.' + MIC_CLASS);
          const isThisContainerThinking = container.classList.contains('rounded') && container.classList.contains('px-2') && !container.querySelector('textarea');
          if (isThisContainerThinking) {
              if (micBtn) micBtn.style.display = 'none';
              stopRecognition();
              return;
          }

          if (micBtn) {
              micBtn.style.display = 'flex';
          } else {
              micBtn = document.createElement('button');
              micBtn.className = `${MIC_CLASS} outline-none w-8 h-8 flex items-center justify-center rounded-full duration-200 transition-colors ease-in-out`;
              micBtn.style.backgroundColor = '#A9A9A9';
              micBtn.style.marginLeft = '4px';
              micBtn.innerHTML = `<svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24" class="h-6 w-6 shrink-0 duration-200 transition-colors ease-in-out" style="color: #333333;"><path fill="currentColor" d="M12 14a3 3 0 0 0 3-3V6a3 3 0 0 0-6 0v5a3 3 0 0 0 3 3Zm5-3a5 5 0 0 1-10 0H5a7 7 0 0 0 14 0h-2ZM11 21h2v-2h-2v2Z"/></svg>`;

              micBtn.addEventListener('click', () => {
                  // --- CHROME ON IOS CHECK ---
                  const isChromeIOS = /CriOS/i.test(navigator.userAgent);
                  if (isChromeIOS) {
                      alert("Speech-to-text is not supported in Chrome on iOS. For the best experience, please use Safari or the native keyboard microphone.");
                      return;
                  }

                  const ta = document.querySelector('textarea.resize-none');
                  // Prefer the standard constructor; fall back to the WebKit-prefixed one.
                  const SpeechRec = window.SpeechRecognition || window.webkitSpeechRecognition;
                  if (!ta || !SpeechRec) {
                      return alert('Your browser does not support SpeechRecognition');
                  }

                  if (!isMicActive) {
                      recognition = new SpeechRec();
                      recognition.lang = 'en-US';
                      recognition.interimResults = true;
                      recognition.maxAlternatives = 1;
                      finalTranscript = '';

                      recognition.onresult = e => {
                          let interim = '';
                          for (let i = e.resultIndex; i < e.results.length; i++) {
                              const r = e.results[i];
                              if (r.isFinal) {
                                  finalTranscript += r[0].transcript + ' ';
                              } else {
                                  interim += r[0].transcript;
                              }
                          }
                          const text = (finalTranscript + interim).trim();
                          const setter = Object.getOwnPropertyDescriptor(HTMLTextAreaElement.prototype, 'value').set;
                          setter.call(ta, text);
                          ta.dispatchEvent(new Event('input', { bubbles: true }));
                      };

                      recognition.onerror = err => console.error('SpeechRecognition Error:', err.error);
                      recognition.onend = () => { if (isMicActive) recognition.start(); };

                      ta.addEventListener('keydown', handleEnterKey);
                      recognition.start();
                      micBtn.style.setProperty('background-color', '#FF0000', 'important');
                      micBtn.style.setProperty('animation', 'pulse 1s infinite', 'important');
                      micBtn.style.setProperty('border', '2px solid #FF4444', 'important');
                      isMicActive = true;
                  } else {
                      stopRecognition();
                      const ta = document.querySelector('textarea.resize-none');
                      if(ta) {
                         ta.removeEventListener('keydown', handleEnterKey);
                      }
                  }
              });
              container.appendChild(micBtn);
          }
      });
  }

  const chatRoot = document.querySelector('div.fixed.flex.w-full.flex-col');
  const target = chatRoot || document.body;
  new MutationObserver(injectButtons).observe(target, { childList: true, subtree: true });

})();
</script>
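A side note on the original symptom: the repeating “1 1 2 1 2 3 …” pattern is exactly what you get when an `onresult` handler re-reads every result from index 0 on each event instead of starting at `event.resultIndex` (which is why the handler above iterates from `e.resultIndex`). A minimal sketch with fake events; this is an assumption about the cause, not something Pickaxe has confirmed:

```javascript
// Fake SpeechRecognition result events: each event re-delivers earlier
// results, and resultIndex marks where the new ones start.
const events = [
  { resultIndex: 0, results: ['1'] },
  { resultIndex: 1, results: ['1', '2'] },
  { resultIndex: 2, results: ['1', '2', '3'] },
];

// Buggy: re-appends every result on every event.
function naiveTranscript(events) {
  const out = [];
  for (const e of events) {
    for (let i = 0; i < e.results.length; i++) out.push(e.results[i]);
  }
  return out.join(' ');
}

// Correct: only consume the results the event marks as new.
function indexedTranscript(events) {
  const out = [];
  for (const e of events) {
    for (let i = e.resultIndex; i < e.results.length; i++) out.push(e.results[i]);
  }
  return out.join(' ');
}

console.log(naiveTranscript(events));   // "1 1 2 1 2 3" – the repeating pattern
console.log(indexedTranscript(events)); // "1 2 3"
```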

Update: most of our issues were with Safari on iOS (WebKit API).
Anything on Android was great (Google Cloud speech-to-text API).

I’ve uploaded this code to my studio. Initially it worked, but now it doesn’t: the new mic icon no longer appears. I saw the issue on several Android phones; not sure about iOS. I copy-pasted the code again, but that didn’t solve it.
Maybe there is some stupid-simple solution? I’m not a developer :sweat_smile:
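One possible stupid-simple cause (an assumption, without seeing the studio): the injected script finds the input row through Pickaxe’s CSS utility classes (`div.flex.items-center.gap-x-2` and `textarea.resize-none`), and if a Pickaxe update renames those, the button is silently never created. A quick check you can run from the browser DevTools console on the published page:

```javascript
// Reports which of the script's hard-coded selectors no longer match anything.
function diagnose(containerCount, textareaCount) {
  if (containerCount === 0) return 'container selector no longer matches';
  if (textareaCount === 0) return 'textarea selector no longer matches';
  return 'selectors ok';
}

// In the browser console:
// diagnose(
//   document.querySelectorAll('div.flex.items-center.gap-x-2').length,
//   document.querySelectorAll('textarea.resize-none').length
// );
```

If either selector reports zero matches, the class names need updating to whatever the current Pickaxe markup uses.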