Pan narrans - the story telling ape

Review Bounties

Richard J. Acton — Sat, 24 Jun 2023 00:00:00 GMT

There is legitimate added value in some of the conventional functions of the academic publishing industry and the people adding that value should be remunerated fairly for their labour.

The ‘journal’ as currently constituted provides:

The administrative functions of coordinating the review
- Finding, vetting, & corralling, suitable reviewers
- coordinating the correspondence with authors
Proofreading & typesetting
Hosting the websites from which the papers are served
There is a ‘curatorial’ function of dubious utility because of its biases and the disproportionate impact these unaccountable bodies have on which papers gain prominence.
- In my view, deciding which papers to ‘bench reject’ and which to send out for review is a dated and inadequate approach to the curation problem anyway.

The bulk of the most valuable labour in this process is performed by the reviewers, who are unpaid. The rest of the labour and costs simply detract from the bottom line of the publishers who, because of the vertically integrated nature of the publishing pipeline, have little incentive to provide a high quality service in areas such as proofreading. The bulk of the value accrues to the owners of the publishing companies, who can extract rents on the copyrighted articles and restrict access to artificially limited high-prestige publication space, which they charge to access via article processing charges (APCs). They pay their professional editors relatively poorly, whilst expecting them to judge the quality of an unreasonably large number of articles per unit time according to dubious criteria. This leaves the door open to considerable corruption and gaming of the system.

As an alternative to this way of working I propose ‘review bounties’.

In its simplest form it would look like this:

Instead of paying an article processing charge, authors offer a review bounty. The ‘editor’ of a journal agrees to mediate the review process for some fraction of the review bounty. A portion of the remaining bounty is divided among the reviewers of the paper, provided the editor deems each review to be of sufficient quality according to clearly stated expectations of what constitutes a good quality review. The remainder of the bounty is offered as a ‘bug bounty’, such that anyone identifying an error which materially alters a conclusion of the work can claim it.
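
As a rough sketch of the arithmetic, the split might be computed as follows. The function name, the specific fractions, and the rule that shares withheld from inadequate reviews roll into the bug bounty are all assumptions made for illustration; the proposal itself does not fix them.

```python
from dataclasses import dataclass


@dataclass
class BountySplit:
    editor: float      # fee for mediating the review
    reviewers: list    # one payout per reviewer, in order
    bug_bounty: float  # claimable by anyone who finds a material error


def split_bounty(total, editor_fraction, reviewer_fraction, review_ok):
    """Divide a review bounty between editor, reviewers and a bug bounty.

    review_ok holds one boolean per reviewer: did the editor judge that
    review to be of sufficient quality?
    """
    if not 0 <= editor_fraction <= 1 or not 0 <= reviewer_fraction <= 1:
        raise ValueError("fractions must lie between 0 and 1")
    editor_fee = total * editor_fraction
    remaining = total - editor_fee
    reviewer_pool = remaining * reviewer_fraction
    share = reviewer_pool / len(review_ok) if review_ok else 0.0
    payouts = [share if ok else 0.0 for ok in review_ok]
    # Assumption: shares withheld from inadequate reviews join the bug bounty.
    bug_bounty = total - editor_fee - sum(payouts)
    return BountySplit(editor_fee, payouts, bug_bounty)


# Hypothetical numbers: a bounty of 2000, 25% to the editor, half of the
# remainder to three reviewers, one of whose reviews was judged inadequate.
print(split_bounty(2000, 0.25, 0.5, [True, True, False]))
```

With these made-up numbers the editor receives 500, two reviewers receive 250 each, and 1000 is left as the bug bounty.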

Bug bounty claims would be adjudicated by a pre-specified procedure or rules for deciding if the claim is valid, for example:

If the authors agree,
or if the editor and a majority of the reviewers agree (editor breaks ties)
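
One possible reading of this example rule can be written down as a small function. The interpretation here is an assumption: the reviewers vote, a strict majority decides, and the editor's view is only consulted when the reviewer vote is tied. Other readings (for instance, the editor voting alongside the reviewers) are equally plausible.

```python
def claim_is_valid(authors_agree, editor_agrees, reviewer_votes):
    """Return True if a bug bounty claim should be paid out.

    reviewer_votes is a list of booleans, one per reviewer.
    """
    if authors_agree:
        # The authors conceding the error settles the matter.
        return True
    votes_for = sum(reviewer_votes)
    votes_against = len(reviewer_votes) - votes_for
    if votes_for == votes_against:
        # Tie (including the case of no reviewer votes): the editor decides.
        return editor_agrees
    return votes_for > votes_against


# Authors dispute the claim, reviewers split 1-1, editor sides with the claimant.
print(claim_is_valid(False, True, [True, False]))
```

Writing the rule out like this before any claim is made is the point of a pre-specified procedure: everyone can see in advance how a dispute would be resolved.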

Under this model the journal performs its conventional functions of arranging review and hosting / distributing the published paper, but the reviewers get paid for their labour and an incentive is created to find errors in the published literature. The papers are published under an open license such as CC-BY or (preferably in my view) CC-BY-SA so that they can become a part of the knowledge commons. This incentive structure encourages authors and reviewers to avoid errors in the first place, as they are staking some cash on the assertion that they have not made any, whilst also incentivising 3rd parties to try to spot errors in the published literature.

Extensions to the review bounty model

In its simplest form the ‘journal’ combines a number of potential functions.

Additional parties
- Proofreaders & typesetters: instead of having a journal do this in house, they could be cut in on the bounty directly.
- Hosting: hosts of sites for serving publications, and services for administering their publication, could also be directly included, especially in a context where journals are no longer as relevant and publications might be shared independently of them.
Extended bounties from interested parties, e.g. if a pharma company is contemplating starting a new programme based on this work, they can put a large bug bounty on it to attract additional 3rd party scrutiny. If an error is discovered early, this saves them money in the long run by avoiding investment in a dead end.
Grant-awarding bodies could require minimum bug bounty amounts / proportions to ensure that the work that they fund gets adequate scrutiny.

Integration into a larger picture of publication workflow reform

This proposal forms a part of a larger set of reforms to the conventional publishing model that I will be writing about here under the heading ‘literate science’.


Reuse

CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)

Podcast appearance on fixing academic publishing

Richard J. Acton — Wed, 17 May 2023 00:00:00 GMT

I made another appearance on The Bayesian Conspiracy podcast to talk about the sorry state of academic publishing and some things we might be able to do to fix it, including my concept of Review Bounties.


Showing our working

Richard J. Acton — Mon, 03 Apr 2023 00:00:00 GMT

Why show our working?

We all remember being told to show our working in school, usually in maths 🧮. Why were we asked to do this? So that someone else can follow our reasoning, step-by-step, and see for themselves (with a little effort) whether they find our reasons sound, or at least partially so. This is one of the fundamental motivating factors for open science: sharing our data and methods so that others can assess our conclusions for themselves with all of the same information. This is core to the corrective mechanism that drives scientific progress. You can’t progress if you can’t spot mistakes, gaps, and misunderstandings; importantly, others can’t spot these if they can’t see our working. Ideally anyone can ask: “how do you know what you think you know?” and we can provide a detailed and compelling answer that anyone can challenge and, with some effort, check for themselves. Trust in our conclusions is, rightly, derived from the transparency and accountability of our processes. This applies both within the scientific community and to our relationship with the public.

The number and complexity of the steps that take us from our starting point to our conclusions in modern science has grown as the depth of our understanding of the world has increased. This has made it harder, as a practical matter, to show all of our working from start to finish. Doing so, however, is no less important now than it has ever been; if anything the complexity of modern science makes it more important than ever. The story that now has to be told to get from basic assumptions to conclusions is quite long. In many cases there is also a great deal of context needed to understand some of the questions we now tackle. This can present a significant communication challenge when interacting with the public and even with specialists in other disciplines. The effort necessary to assess the strength of others’ conclusions has risen, which makes explanatory clarity more important than ever. One of the factors which makes completeness of description challenging is that it is rarely one person, or even one research group, that is responsible for the full chain of steps that produce a modern research paper, especially ‘prestigious’ ones with lots of experiments, often making use of varied methods. Thus, there is no single person with full insight into the granular details of every experiment and method used in many modern papers. Consequently, ensuring that every detail needed for reproducible work is recorded can be a significant coordination challenge among co-authors.

So how are we, the scientific community, doing at this task of showing our work? Unfortunately, not as well as you might hope.

Are we any good at showing our working? How hard can it be?

To start with, across disciplines our work is getting harder to read. It has become laden with ‘science-ese’ or general scientific jargon, or so conclude Plavén-Sigray and co-authors in a 2017 paper in eLife, “The readability of scientific texts is decreasing over time” (Plavén-Sigray et al. 2017). This does not help the general accessibility of our work to colleagues, students, science journalists or the public. Nor does it appear to be driven by specific technical jargon, which is a useful communication shorthand; it’s apparently mostly the addition of seemingly superfluous polysyllabic obfuscationalisms, presumably so that we can show off our erudition 😉.

Richard F. Harris’s 2017 book “Rigor mortis: how sloppy science creates worthless cures, crushes hope, and wastes billions” (Harris 2017) and Stuart Ritchie’s 2020 book “Science fictions: exposing fraud, bias, negligence and hype in science” (Ritchie 2020) called popular attention to issues of reproducibility, especially in the life sciences. In 2021 a series of papers was published by the ‘Reproducibility Project: Cancer Biology’ summarising the results of efforts spanning 8 years to reproduce the findings of 193 experiments from 53 prominent papers in the field of cancer biology. Their results were not particularly reassuring. 0 of the 193 experiments were described in sufficient detail for the project team to design protocols to repeat them 😅. Yes, none of them could be repeated without additional information from the original researchers. Of the 193 experiments, the researchers were eventually able to repeat 50, in cases where they were able to get some additional information from the original researchers, 32% of whom did not respond helpfully or at all to their inquiries. Between 40% and 80% of these 50 experiments were successfully replicated, depending on how the criteria for successful replication were applied. Reproducing biological work is genuinely a difficult task: lab work requires significant skill that is often hard to fully codify in protocols. Biology has a lot of inherent variability; even when you go to great lengths to get the same starting conditions, there are sometimes factors that are difficult to control. Analysing your data though, reproducing that should be easy, right 😎? It’s all on the computer, you can just run the same thing again, right 🥺? Alas, it is not as simple as you might think 🤦. It can take several months to reproduce a bioinformatic analysis in a published paper, if it is possible at all; see “Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome” (Garijo et al. 2013).
(Significant strides have been made in reproducible analysis since that paper was written but I’ve not found a more recent attempt to quantify the problem.)

Why is this the case? What makes reproducibility hard and what can we do to make it better?

How can we do better?

As the data outputs manager for the Human Developmental Biology Initiative (HDBI), it’s my role to make it easier for the other scientists in HDBI to make their data available to others, and easier for them to make both their experiments and analyses reproducible.

What’s needed for data & methods to be open and reproducible?

If you begin looking into the area of reproducibility you will, before too long, encounter the somewhat nebulous term ‘metadata’. ‘Meta’ is from the Greek, meaning something like “higher” or “beyond”; metadata, therefore, is data about data. So what sorts of things are metadata? Who generated the data? When? With what machine? With what settings on that machine? Why? What properties did the samples measured have? Species, type(s) of cell, stage of development? What were the sample preparation steps? The answers to all these questions and more can be considered metadata. This abundance of possible properties that could be recorded leads to the following questions:

How do we decide which pieces of metadata we need for a given experiment?
How do we store and organise them nicely so people and computers can read them?

The answers to these questions are complex and context dependent but in general scientific communities have to get together and decide for themselves what they need to know and how best to represent that information by creating community standards that they can agree to adhere to.

This has varying degrees of success… 🤷

Fortunately, there are some general principles that we can adhere to when designing domain-specific metadata standards, and things it is sensible for everyone to consider when sharing their data. These are the principles of ‘linked data’, where we can use the ‘resource description framework’ (RDF) to devise a suitable representation of our metadata. This is not always the most practical format of metadata for day-to-day use, but if you can automatically translate your working format to a linked data format then others can use a shared set of tools to combine data from your domain with data from theirs. In this way components common to different domains can be re-used, e.g. an agreed set of standard terms to refer to certain classes of things like cell types or sequencing technologies, so that we don’t use different words for the same thing. For more on metadata standards check out the section in my book.
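
To make "automatically translate your working format to a linked data format" a little more concrete, here is a minimal sketch that turns a flat working record into JSON-LD, one of the standard serialisations of RDF. The choice of the schema.org vocabulary, the field names of the working record, and all of the values (including the ORCID iD, which is a placeholder) are assumptions for illustration only; a real community standard would specify its own terms.

```python
import json


def to_json_ld(record):
    """Translate a flat working record into a schema.org JSON-LD document."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": record["title"],
        "dateCreated": record["created"],
        "measurementTechnique": record["technique"],
        "creator": {
            "@type": "Person",
            # A resolvable identifier lets other tools link to the same person.
            "@id": "https://orcid.org/" + record["orcid"],
            "name": record["who"],
        },
    }


# Hypothetical working record, e.g. one row of a lab spreadsheet.
working_record = {
    "title": "Example imaging run",
    "who": "Steven Stickler",
    "orcid": "0000-1234-1234-1234",  # placeholder, not a real ORCID iD
    "created": "2023-03-30T15:34:00+00:00",
    "technique": "confocal microscopy",
}

print(json.dumps(to_json_ld(working_record), indent=2))
```

The day-to-day record stays simple, while the translated version uses shared terms that generic linked data tools can understand.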

One way of thinking about making our data available to others, so that they can not only use it in their own work but also check existing conclusions, is based around the acronym FAIR:

FAIR (Findable, Accessible, Interoperable, Re-usable)

Findable
- Has a unique identifier that can be looked up in a database, plus some associated terms so you can find the id with a search.
Accessible
- If I’ve got the ID I can download a copy, or figure out who to ask for permission to download a copy if there are e.g. privacy restrictions.
Interoperable
- It’s in a file format I can read (without expensive proprietary software).
- It’s described in standard terminology so I can easily search for it and connect it with data about the same/related things.
Re-usable
- Where it came from/how it was generated and other attributes are well documented according to community standards.
- It’s licensed so it can be used.
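
The checklist above can be roughly approximated in code. The sketch below checks a metadata record for the most basic FAIR ingredients; the field names (`identifier`, `keywords`, `download_url`, and so on) and the short list of open file formats are hypothetical, and a real assessment would be far more nuanced than a handful of presence checks.

```python
OPEN_FORMATS = {".csv", ".tsv", ".json", ".txt"}  # illustrative, not exhaustive


def fair_gaps(record):
    """Return a list of FAIR criteria that a metadata record fails to meet."""
    gaps = []
    if not record.get("identifier"):
        gaps.append("Findable: no unique identifier")
    if not record.get("keywords"):
        gaps.append("Findable: no search terms")
    if not (record.get("download_url") or record.get("access_contact")):
        gaps.append("Accessible: no download link or access contact")
    filename = record.get("filename", "")
    if not any(filename.lower().endswith(ext) for ext in OPEN_FORMATS):
        gaps.append("Interoperable: file format not on the open-format list")
    if not record.get("provenance"):
        gaps.append("Re-usable: provenance not documented")
    if not record.get("license"):
        gaps.append("Re-usable: no licence")
    return gaps


# Hypothetical record with no licence and an unrecognised file format.
example = {
    "identifier": "doi:10.0000/example",
    "keywords": ["example"],
    "download_url": "https://example.org/data.xyz",
    "filename": "data.xyz",
    "provenance": "see linked protocol",
}
for gap in fair_gaps(example):
    print(gap)
```

A check like this could run automatically when data is deposited, so that gaps are flagged while the person who generated the data still remembers the details.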

Reproducing lab work 👨‍🔬 and computational work 👨‍💻 analogises quite well to cooking 👨‍🍳. A recipe comprises a list of ingredients (Data, Materials & Reagents), a set of steps to follow (Code, Protocols), descriptions of the environment in which the food is cooked, e.g. the type and temperature of the oven (Compute environment, Lab environment), and additional information that helps me find, contextualise and appropriately use the recipe (metadata). Some recipes are overly vague and some highly specific; this might depend on the difficulty and stakes of getting a tasty, or at least edible, result. Science can be like high stakes cooking (think: a meal for diplomats from two countries with a tense relationship who also happen to be restaurant critics with a bunch of mutually incompatible food allergies and religious dietary restrictions), so the recipes have to be good, really good 😅.

Here’s a table fleshing out the analogy with some small examples of the sorts of information that can fall into these categories as they apply to these different disciplines:

| | Cooking 🧑‍🍳 | Lab Work 🧑‍🔬 | Computational Analysis 🧑‍💻 |
|---|---|---|---|
| Inputs | Ingredients: ❌ 7oz Flour (vague); ✅ 200g Plain Wheat Flour (all-purpose/550/55/0) | Materials & Reagents: ❌ HeLa cells; ✅ UKBi001-A | Data: ❌ Human Genome; ✅ Homo sapiens (NCBI:txid9606) genome (Ensembl 109, GRCh38.p13) |
| Process | Cooking Instructions: ❌ Bake at medium temperature until bronzed (vague); ✅ Bake at 190C for 35mins if using a fan oven | Protocols: ❌ Gel electrophoresis (vague); ✅ Agarose gel electrophoresis (0.5%) TBE buffer, 120V, ethidium bromide, 10kb ladder from… | Code: ❌ `random-script-from-email.R`; ✅ `git` repository |
| Environment | Kitchen Conditions: Ambient temperature, pressure, and humidity (What’s the boiling point of water in your kitchen?) | Lab Environment: Ambient temperature, pressure, and humidity (What’s the boiling point of water in your ~~kitchen~~ Lab?); How you wash your glassware (no really, this has affected the reproducibility of experiments) | Compute Environment: OS 🐧/🪟/🍎; 🩹 R v4.2.0; ✅ Environment management tools & Containers, `renv.lock` |
| Context | Discernment: History, Culture, & Origins of a Dish 🌍; Allergens 😵; Appropriate pairings 🍷 | Metadata: ❌ Who: Steve, When: Tuesday (vague); ✅ Who: Steven Stickler (ORCID: 0000-1234-1234-1234), When: 2023-03-30 15:34 (UTC+0) | Metadata (same as Lab +): Who: PGP fingerprint: 96C2 0929 FA88 DD89 9270; When: commit hash: 954086fdf13d8… |

When a protocol can’t quite capture your bench work well enough that someone else could do your experiment after reading it, you can take the approach of JoVE (the Journal of Visualized Experiments). If that’s a bit much, a less formal approach is available: anyone with a smartphone or action camera can film their experimental work, upload it to figshare, and get a DOI to reference in a protocol on protocols.io or in a paper. (Enlist the help of a student; an alarming fraction of them are surprisingly capable videographers thanks to social media.)

What does a ‘computational environment’ mean? When doing an analysis you don’t re-implement all steps from scratch; you use existing tools to perform many calculations, and these in turn use other tools, creating a ‘tree’ of ‘dependencies’. The way these tools work can change when the software gets updated, so to re-run your analysis exactly I need to know not just the steps that you took but the versions of the tools you were using, and the versions of the tools your tools were using, and so on. It’s tools all the way down. Fortunately there are tools for taking inventory of all the versions of all the tools that you’ve used, sharing this list, and even re-creating the same computational environment from these inventories. Check out the section in my book to learn more about this.
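
To give a flavour of what "taking inventory" involves, here is a minimal sketch in Python using only the standard library. It records the interpreter version, the operating system, and the version of every installed package. It is an illustration rather than a replacement for dedicated environment management tools, which also handle re-creating the environment; the function names are made up for this example.

```python
import platform
from importlib import metadata


def environment_inventory():
    """Return the interpreter, OS, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with incomplete metadata
            packages[name] = version
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def as_requirements(inventory):
    """Render the package inventory as pinned 'name==version' lines."""
    return [f"{name}=={version}" for name, version in inventory["packages"].items()]


if __name__ == "__main__":
    inventory = environment_inventory()
    print(f"# Python {inventory['python']} on {inventory['os']}")
    print("\n".join(as_requirements(inventory)))
```

Saving this output alongside an analysis gives a reader the list of versions they would need in order to attempt an exact re-run.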

How can we encourage the adoption of these practices?

So why are we not working more reproducibly already? It’s quite hard to do in certain cases, often because tooling and automation have not caught up to make it easier. It’s also not yet a norm to which we expect one another to conform in the scientific community, either when we review others’ work or when we have our own reviewed. In his article “Five selfish reasons to work reproducibly” (Markowetz 2015), Florian Markowetz lays out some excellent reasons to get ahead of the curve on working reproducibly.

The laboriousness of recording and providing this level of detail can be a major impediment to researchers actually sharing their processes if they don’t feel that doing so will be time well spent. So we should ask what can be automated, what tools, practices and standard procedures can scientists adopt to make the provision of sufficiently detailed structured information a part of their workflow that does not get in their way, and if anything makes their lives easier?

Crafting a ‘pit of success’

Make FAIR data and reproducibility the default and the expectation in every area, such that if you ‘go with the flow’ your work will be FAIR and reproducible. Not everyone will have the time, inclination or incentive to strive relentlessly towards a ‘pinnacle of excellence’, so raise the floor, not the ceiling, and construct a ‘pit of success’ into which we can all trip without trying too hard. We can do our best to make it easy & useful to do so, but this may not be quite enough to get over the hump in all cases. So in some cases we may have to take a slightly more “Nice paper/grant/dataset you’ve got there, ’be a shame if you couldn’t publish/fund/generate it, unless…” approach.

Here is my advice for people in various roles on using a mixture of carrots 🥕 and sticks 📏 to improve FAIR data practices and reproducible working.

Core facility Staff
- If you run/work in microscopy, flow cytometry, bioinformatics, proteomics, sequencing, etc. facilities, develop policies which make good metadata annotations a condition of researchers using your facilities and getting access to the data once it has been generated. Make it as frictionless as possible so as not to put them off, and showcase how useful it is to be able to call on well structured and annotated datasets.
Human Resources
- Make proper data stewardship part of the on-boarding and leaving process. You shouldn’t be able to clear HR when you are leaving if your data is not in a suitable state to hand over to others.
Software Developers & IT Staff
- Build tools and systems which make the ingestion, annotation, sharing, accessing, and processing of data as intuitive, seamless and integrated a process as possible with open-source tools.
- (Don’t develop a proprietary platform or product to try and solve these issues, or people like me will tell others not to use it, as the incentives don’t line up with the degree of interoperability and portability needed in science. Go with an open-source business model that is compatible with the openness required by the scientific process.)
- Package your software so that it can easily be included in portable reproducible environments like Docker simply and declaratively.
Peer Reviewers
- When you review things ask questions about reproducibility, and FAIR data. This is where the expectation of higher standards in this area can begin to be set. (Also if you get asked to review stuff a lot and would like to reduce incoming requests suddenly becoming a reproducibility nut might lighten your load 😉). If you are asking these questions in your reviews of papers it may clue editors in that academics now expect this and shape their decisions on what to put out for review in the first place.
Journal Editors
- If you are a journal editor and you are deciding between many good submissions make this a criterion for what you choose.
Grant Reviewers
- If you review grants, ask about people’s plans for making their data FAIR and their analyses reproducible. This will get them thinking about it well in advance and hopefully planning for it, especially if they think it might be the difference between getting funded or not.
Press
- If you are a member of the press or a populariser of science, report favourably on publications that show their work and skeptically on those that don’t.
- Ask questions about reproducibility and openness when interviewing scientists, and push university PR departments about these qualities in the papers they choose to make press releases about.
Public (also applies if you fall into any of the other categories)
- Ask your elected representative or the relevant minister (at time of writing, Michelle Donelan, Secretary of State for Science, Innovation and Technology) why the research councils aren’t holding their grant awardees to a higher standard on reproducibility and FAIR data, so that the best use can be made of public research funds.

Where Can I Start / Learn More?

I’ve written a short ebook as a resource for HDBI members, “Data: Inception to Publication & Beyond”. It is directed at a more technical audience but aims to be written in an accessible style. It features many links to external resources, in various media, to learn more about a given topic. It’s a living document and I’m working on some new material for it; I’m always looking for feedback, comments and suggestions for improvement from any readers.

To get a sense of what it covers here’s a table of contents:

What Constitutes Data?
When Should I Generate Data?
How to Store Your Data
Working With Data
When to Publish Data
Where to Publish Data
What Data to Publish
How to License Your Data
How to Manage References


References

Garijo, Daniel, Sarah Kinnings, Li Xie, Lei Xie, Yinliang Zhang, Philip E. Bourne, and Yolanda Gil. 2013. “Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome.” Edited by Christos A. Ouzounis. PLoS ONE 8 (11): e80278. https://doi.org/10.1371/journal.pone.0080278.

Harris, Richard F. 2017. Rigor Mortis: How Sloppy Science Creates Worthless Cures, Crushes Hope, and Wastes Billions. New York: Basic Books.

Markowetz, Florian. 2015. “Five Selfish Reasons to Work Reproducibly.” Genome Biology 16 (1). https://doi.org/10.1186/s13059-015-0850-7.

Plavén-Sigray, Pontus, Granville James Matheson, Björn Christian Schiffler, and William Hedley Thompson. 2017. “The Readability of Scientific Texts Is Decreasing over Time.” eLife 6 (September). https://doi.org/10.7554/elife.27725.

Ritchie, Stuart. 2020. Science Fictions: Exposing Fraud, Bias, Negligence and Hype in Science. London: The Bodley Head.

Reuse

CC-BY 4.0

My new ebook ‘Data: Inception to Publication & Beyond’

Richard J. Acton — Mon, 30 Jan 2023 00:00:00 GMT

I recently published a short ebook that I’ve been working on as a part of my role at the Human Developmental Biology Initiative. I’m calling it “Data: Inception to Publication & Beyond”.

The focus is on creating a workflow for streamlined generation, analysis, & publication of FAIR research data with reproducible analyses both in theory and practice.

HDBI data resource

To anyone who reads it: I would much appreciate any feedback or contributions. Details of how best to provide them are in the intro to the book.


FLOSS Exobrains Braindump

Richard J. Acton — Sun, 12 Jun 2022 00:00:00 GMT

brain-1 icon by Servier https://smart.servier.com/ is licensed under CC-BY 3.0 Unported https://creativecommons.org/licenses/by/3.0/

LAST UPDATED: 2023-09-22 (see this file’s history in the repo)

I set myself the constraint of using Free/Libre and open source software (FLOSS) tools for the task of creating an exobrain.

Here are some of the tools I’ve considered, and in some cases am using, to do this. First, why did I impose this constraint? I regard it as an essential feature of an exobrain that it can be private and secure, and that it will always be accessible to me in an open and interoperable format into the future. This way you can own the data in your exobrain and still have access to it if the company that makes the product/service that you are using goes out of business. I have philosophical and ethical problems with proprietary software (see: GNU philosophy) and consider a right to privacy to be of fundamental political importance (see: Privacy is Power). I’ve added a password manager as an essential component of a modern exobrain, and some notes on private communications.

meta (apps / platforms / tools)

These are tools which facilitate building an exobrain on your own private cloud so you can access it anywhere and retain data ownership.

Nextcloud is the first (meta)application to mention. It is a self-hosted Google Drive / Dropbox / OneDrive etc. replacement. It has many features and extensions that could potentially fulfil all your exobrain app categories for notes, calendar & tasks. The key feature for me is calendar and contacts (this can include calendar-based tasks for those who use their email’s todo list).
- Nextcloud features (too many to mention, so some highlights; there are also extensions with apps)
  - Nextcloud Deck for Kanban - style task boards
  - Calendar/contacts sync and management via webDAV with web interfaces
  - Integration with a choice of 2 collaborative document editors: Collabora (LibreOffice) and ONLYOFFICE
  - Dedicated mobile apps for tasks, recipes and other things.
- How to host Nextcloud
  - To self-host Nextcloud I recommend setting up a TrueNAS system on which to host a Nextcloud instance. It has a simple interface to do this and is built on the highly robust ZFS file system, which provides the advanced user with what I regard as the best guarantees of data integrity and security available in any storage solution, especially for secure encrypted off-site backups.
    - Advanced note: encrypted ZFS raw snapshots should be pulled from your NAS by a remote backup system which never sees the encryption key, by a user with only the permissions necessary to do this sync over ssh. Your NAS should not be able to access the remote system over ssh, only the other way around, with that access limited to running only the sync command by configuring restrictions in authorized_keys (consult Jim Salter’s Ars Technica blog for advanced ZFS tips).
  - Advanced mode - roll your own Linux host, there are many ways to do this.
  - Many virtual private server providers offer ‘1-click’ deployments of Nextcloud, e.g. Linode; however, you are still entrusting your data to a 3rd party if you take this approach
- Nextcloud alternatives:
  - Cryptpad - very good, but lacks the extensibility of Nextcloud; has a paid hosting option. Security and encryption model are baked into the design, unlike Nextcloud, which is easier to misconfigure insecurely
  - File sync only: syncthing
NOT FULLY OPEN: tailscale VPN. This tool makes it very simple to keep your home servers on a closed VPN whilst easily being able to access them from anywhere. The service, for which there is a free (financially) tier, handles the key exchange, but once established all traffic is peer to peer over a WireGuard-based mesh network. There is a self-hostable but less featureful key exchange server for the uber-paranoid with some extra computer networking skills. (Note that this is distinct from using a VPN to obscure the point of origin of your web traffic; for this I currently recommend Mullvad VPN for a number of reasons, including that you can pay anonymously with Monero.)
Notes Apps
- logseq - My Current Favourite. Somewhat similar to obsidian
- emacs org mode & org roam (emacs is ostensibly a text editor but it’s more of a lifestyle)
- Carnet a google keep-like Nextcloud note app
- Joplin - markdown based notes app (web/desktop/mobile), cloud hosted option or many other cloud sync options including Nextcloud, quick grab browser plugins
- tiddlywiki A non-linear personal notebook in the form of a WIKI, local or hostable
- focalboard notion-like, same dev as mattermost
- appflowy notion-like
- standard notes notion-like, open? but with paid option that has annoying up-sells?
- notesnook went open not originally so, open server is in the roadmap - we’ll see
- AFFiNE monday / notion-like, some interesting UI innovations, (MIT client, source available server)
- anytype - notion-like source available
- trilium - extensible, scriptable and customisable notes app with md, freehand drawing, latex, mermaid etc.
- zettlr - MD editor with Zotero integration and pandoc document export.
- QOwnNotes - simple MD notes with tagging, built with the Qt framework
- Stylus / handwritten
  - Rnote good alternative to Xournal++ with cleaner UI great for stylus / handwritten notes
  - styluslabs
- mind map
  - minder for fans of mind mapping
- Dendron for VScode/(VScodium) aficionados
- Zotero - ideal for academics who need a reference manager; supports notes, PDF annotations, tags etc. and has many integrations, including with logseq. This makes a good supplement to a notes system for managing source materials that you reference in your notes. It also has a built-in feed reader which facilitates the import of new articles etc. into your library. For biosciences people this pairs nicely with the feature in PubMed that lets you turn any search into an RSS feed (just click the create RSS button under the search box). There is a web version with a nice sharing feature.
Task management
- super-productivity - sleek task and time management with web, desktop & mobile apps. Syncs via popular cloud storage options and webDAV
- Nextcloud deck - kanban like task management within Nextcloud
- Nextcloud Tasks
- planner - has todoist integration and sync, alternatively CalDAV based sync (desktop only)
- todo.txt format editors e.g. go-for-it & sleek for those who like text files but also pretty guis. sync is just file sync.
- WeekToDo a task manager/planner which may integrate well with the weekly review strategy (desktop & webapp only)
Calendar
- KOrganizer - a desktop calendar client which can sync with remote calendars, excellent views and highly customisable (alternatively kalendar for a slightly simpler interface)
- Nextcloud Calendar
- lightning calendar for thunderbird mail interfaces with a calDAV calendar
- (NON-FREE proton and tutanota calendars for privacy)
Password / private key management
- bitwarden
  - cloud based password manager with a self-hostable option, very featureful
- keepass
  - see keepassxc for a better frontend
- NOT FULLY OPEN but get some Yubikeys to use as your Second factor in multi factor authentication, this is better than TOTP codes. (avoid SMS 2FA if at all possible, number spoofing is an issue)
(Specially for data science types)
- If you need a collaborative environment for analysis/development with reproducible containerised computation in Python/R (with jupyter/Rstudio) and dataset sharing, gitlab integrations and git-lfs then Renku is for you

Private comms

email
- self-hosting email is nightmarish if you want it to be actually reliable for communications; it is unlikely to be a good idea for you unless you have a very specific threat model.
  - This said what are the best Non-free options?
    - ProtonMail - mostly interoperable with standard PGP email encryption
    - tutanota - different approach to mail encryption, better metadata protection
  - Use these in conjunction with an email aliasing service such as simplelogin (now owned by proton). This also has integrations with bitwarden so that you can generate a random email alias as well as a random password when signing up for services. (This reduces your risk when account info is lost in data breaches, as your email no longer identifies you for things like credential stuffing attacks.)
Secure Messaging apps
- Signal - perhaps the most popular alternative, does have a single central server infrastructure
- Element (and other matrix protocol apps) - operates a federated server model, encryption is not required by the protocol
  - Element - somewhat discord like in its community and group features
  - Coming soon to the matrix protocol is RocketChat, an excellent mature slack alternative
- Zulip - highly performant team chat app replacement for slack, teams etc.
- simpleX - messenger with no fixed user ID and a decentralised network
- Session - a Signal fork with no need for a phone number, blockchain based global unique identifiers
- Briar XMPP protocol based with local mesh network options over wifi/bluetooth when cellular networks are unavailable - good for disasters when infrastructure might be down
Side note on Phone security
- Phone security is particularly challenging right now; there are no truly good options which do not entail significant functionality compromise. Hardware wise nothing ideal exists; what would be desirable is a phone with hardware dip switches which physically disconnect the wi-fi/bluetooth, cellular modem, camera & microphone, as well as having a removable battery, an eSIM and a re-lockable boot-loader. Many phones running a mainline linux kernel still lack basic functionality or have buggy versions of it, though android app emulation options are improving with the waydroid project.
  - GrapheneOS probably the best secure degoogled android option, limited to google pixel hardware for the secure bootloader functionality, good profiles system that will allow you to install google play services if you need it but with limited privileges instead of their default system level access
  - calyxOS is among my preferred de-googled android options, but limited to specific hardware, mainly google pixel phones
  - Lineage OS is less hardcore and will permit you to have a minimum viable google services option and work if you don’t have a locked boot-loader, available on many more handsets
- silent-link will permit you to acquire an anonymous phone number which can be used with your calyxOS phone with an eSIM. This number is data only, or SMS and inbound voice only: NO OUTBOUND CALLS. It should be used in conjunction with a VOIP service for all actual calls. It can be paid for anonymously with monero.
  - note: use a roaming phone, not one from your own country; this dramatically reduces the data available about you to the telecoms providers.
  - Your phone’s IMEI is still a unique identifier so you can be de-anonymised by cross referencing it with your location data from other sources. It is often illegal to change this number, but swapping between a few different eSIMs can get you part of the way there, as this changes your subscriber identity (IMSI) with each eSIM even though the IMEI itself stays the same.
- Desktop PCs are if anything worse in some regards than phones, but less of a liability with respect to location data, and at least have better alternative software options which match or exceed the capabilities of the main commercial options. Use a well regarded Linux distro, encrypt your data and update regularly; macOS and windows are privacy nightmares.

:::

Ageing & Immortality Special Episode

Richard J. Acton — Sun, 15 May 2022 00:00:00 GMT

progression of silhouettes from an infant to an old man

Episode 53 of Xenothesis, the podcast I cohost with my friend Michael Glinka, is a special episode on Ageing & Immortality

:::

Performant R - how to do things faster in R

Richard J. Acton — Thu, 12 May 2022 00:00:00 GMT

Logo of the R programming language

I recently ran a workshop on writing performant R code; you can find the slides at Performant R.

I took a very wide overview of the subject, covering profiling and benchmarking, vectorising, parallelism, Rcpp, and working with larger than memory datasets, as well as caching of computational results with compressed data object outputs and pipeline management tools. There are links to further reading/watching in the presenter notes and the gitlab repo.

[UPDATE 2022-11-12] I ran an extended version of this workshop at the Babraham Institute and migrated the code to Renku so that you can spin up an R environment to follow along with the exercises in the cloud or in a local container; see the Renku project page.

:::

Why multi-panel figures are terrible & we should stop using them.

Richard J. Acton — Sat, 02 Apr 2022 00:00:00 GMT

Literally from the first Cell paper I found on their site: https://doi.org/10.1016/j.cub.2023.02.013 (no offense if this was yours; it’s a systemic thing, I’m not picking on this figure in particular)

We need to talk about Multi-panel figures. To start with let us consider what information do you usually need to interpret a panel in a ‘figure’?

The figure itself
The figure legend
The place in the text which references the figure
more often than not at least one disambiguation of a novel acronym

Each of these 4 pieces of information can be located on different pages in a modern manuscript. Yes, the figure and its legend are sometimes on different pages (I’m looking at you Cell, WTF?). To drive this point home, you can be looking at a situation like this: Figure 6(i) is discussed on page 4, printed on page 5, and its legend is, for some ungodly reason, printed on page 6! Oh, and the title for the figure is not on the figure but in the legend, and helpfully features (YATA) yet another terrible acronym that was defined once in the abstract on page 1.

I don’t know about you but a mere mortal such as myself needs some working memory spare to think about the underlying concepts that a manuscript is trying to communicate to me, and I do not appreciate having to cache as many as 4 look-up tables in my head just to be able to effectively read a figure, let alone reason further about it!

I contend that this problem is worse when looking at electronic representations of formats meant to be printed, at least in part because it is easier to form a mental map of the relationships between these different parts of the manuscript when it is printed. When you have a physical copy the parts have relationships to one another in 3D space that take less effort to construct a mental model of. Also, it’s just easier to arrange the pages so that you can actually look at the different parts at the same time. E-reader like formats also suffer from this issue as they don’t have fixed pagination, so the relationships between objects are still harder to model as they are not even in a fixed relationship to one another in a virtual 2D space.

I hypothesize that breaking up figure panels and placing figures in line in the text as they occur would increase reading speed and comprehension, especially in electronic media. I term this information-non locality minimization. This also entails placing information such as titles, abbreviations, and other small details like summary statistics in the figure itself, instead of unnecessarily relegating this information to figure legends.

Specific hypotheses I’d like to see tested:

1. Time taken to read a paper with multi-panel figures will be longer than the time taken when information-non locality is minimised.
2. Reading comprehension tests will be worse for papers with multi-panel figures than when information-non locality is minimised.
3. Hypotheses 1 & 2 will be true of both electronic and physical media but the effect size will be larger for electronic media.
4. Bound copies of papers with multi-panel figures will provide less of an advantage over information-non locality minimised copies than loose-leaf / unbound prints.
5. Effect size of the advantage information-non locality minimized versions have over multi-panel figure versions will be larger in people with dyslexia. (I’m dyslexic and it’s been thought of as an issue related to working memory, it may be why this issue jumps out at me.)

If anyone wants to do this research please get in touch. If my hypotheses on reading speed and comprehension work out it would be a ‘fun’ exercise to compute an estimate for how much of our collective time and money we have wasted because of bad formatting choices.

Why are we wasting time, energy and mental cycles mentally reassembling properly constructed figures from their component parts which we have strewn willy-nilly throughout our manuscripts? To sum up, multi-panel figures are an anachronistic concession to typesetting colour prints in a compact format for printing and they have no place in modern electronic publishing. They ruin the flow of reading a paper. They represent an unreasonable assault on the working memory resources of a reader who is trying to understand what is likely a complex topic, and create undue cognitive load for readers who need their full mental faculties about them to grapple with complex scientific ideas.

Additionally, I think that they encourage bad graphical practices. Panels are labelled not with meaningfully interpretable titles, as both our schooling and basic common sense about presenting information dictate they should be, but with alphabet soup. These are bad practices for which we take school children to task in science classes but which we seem to have collectively forgotten in the peer-reviewed literature. Multi-panel figures should be used ONLY when it is actually useful for the understanding of the content for visuals to be placed together. We don’t need to optimize for printing in electronic formats; we can have as many full-size colour figures as we need to optimally convey our point. This would be an improvement even if we stick to pdfs as the primary endpoint paradigm for published articles, the case against which deserves its own discussion.

:::

Why rationalists should care (more) about free software

Richard J. Acton — Sun, 23 Jan 2022 00:00:00 GMT

Tux the Linux penguin and the GNU project’s Gnu

cross-posted to lesswrong

Why rationalists should care (more) about free software

especially if you want to upload your brain

In the limit condition freedom of compute is freedom of thought.

As we offload more of our cognition to our computational devices we expose a new threat surface for attacks on our ability to think free of malign or otherwise misaligned influence. The parties who control the computational systems to which you have outsourced your cognition have a vector by which to influence your thinking. This may be a problem for you if their interests are not aligned with your own as they can use this power to manipulate you in service of their goals and against your own.

The fundamental operations of our brains remain difficult to reliably and effectively interfere with, primarily because of our ignorance of how to achieve this. This, however, may change as understanding of our wetware increases and subtle direct manipulations of our brain chemistry can be employed to influence our behaviour. A highly granular version of this approach is likely still quite far off but it generally feels more viscerally scary than influencing us via our technology. Surfing the web without ad-block already feels uncomfortably close to the Futurama gag about ads in your dreams. Increasingly, though, this is amounting to the same thing. Indeed our technology is already doing this to us, albeit fairly crudely for now, by exploiting our reward circuits and many other subtle systematic flaws in the human psyche.

What is “free” software? Free as in liberty not as in gratuity, as in speech not beer, politically and not necessarily financially. The Free Software Foundation defines free software as adhering to the four essential freedoms which I paraphrase here:

0. The freedom to run the code however you wish

1. The freedom to examine its source code so that you can understand and modify it for your own purposes

2. The freedom to distribute the source code as is

3. The freedom to distribute modified versions of the source code

Note that code which is ‘source available’ only really gets you freedom 1; depending on how the code is licenced and built this may not get you any of the others, including freedom 0. Much ink has been spilt over the term ‘open source’ not going far enough as a result. Free software is often referred to by the acronyms FOSS & FLOSS (Free/Libre and open source software).

The occasionally controversial but ever prescient Richard Stallman (AKA RMS, AKA saint IGNUcius) has been banging on about the problems of proprietary software for nearly forty years at this point, having essentially predicted the abuses of today’s software giants because he got a bad printer driver in the early 1980s.

The problem that Stallman saw with ‘proprietary’ software, i.e. software which does not meet the criteria of the four essential freedoms, is one of game theoretic incentives. Making software free serves as a pre-commitment mechanism by the software authors to not abuse the users of their software. This works by empowering users to exercise a credible threat of forking the project and cutting devs abusing their position out of the project and any associated revenue streams. Revenue from free software projects can take a number of forms e.g. premium-hosting, donations/pay-what-it’s-worth schemes, & service/support agreements, though how to successfully monetise free software remains a hard problem.

As the maker of a piece of proprietary software, you are not subject to this kind of check on your power and it is often in your interest to increase lock-in to your product from your users to make it hard for them to leave for a competitor, should they become dissatisfied. The lack of transparency on how proprietary software works can also hide a multitude of sins such as bad security practices and provides scope for extensive surveillance of the users whilst maintaining deniability. Thus free software can serve as a solution to an alignment problem between makers and users of the software.

The speculative fiction of Cory Doctorow and Greg Egan in ‘Permutation City’, along with the speculative (non-fiction?) of Robin Hanson in ‘Age of Em’, have painted pictures of numerous diverse dystopian futures in which software is used to curtail individual liberties, as well as to gas-light, frame control, and otherwise manipulate or abuse people and other conscious entities.

Concerns over these potential abuses have been gaining increasing popular attention in recent years, though the emphasis has been placed on Shoshana Zuboff’s concept of surveillance capitalism rather than framing the problem, as I suspect Stallman would, as having its root causes in non-free software. In particular, the popularity of the Netflix documentary ‘The Social Dilemma’, made in collaboration with Tristan Harris & Aza Raskin’s Center for Humane Technology, has increased public awareness of the problems; solutions, however, remain relatively unspecified.

Computing is becoming ever more ubiquitous and connected, and is beginning to be embedded in our bodies, though mostly still as medical devices for now. Whose phone numbers do you know? What about addresses, or how to travel there? How’s your mental arithmetic? How good is your recall of your chat history with all your friends - would you notice if it was subtly edited in retrospect? Do you have a voice assistant? When was the last time you left your house without your phone? The more of our cognition takes place external to our brains the more vulnerable we are to the technological capture of our thought processes by misaligned entities. If we do not take measures to ensure the alignment of software makers’ interests with those of software users we invite dystopias galore.

Over the years there have been many explicit efforts by technology companies to lock general-purpose computing devices to vendor-approved applications (e.g. many game consoles & iPhones). This is often in the name of copyright protection and increasingly in recent years in the name of providing better security. ‘Better security’ of course begs the question, against what threat model? It’s better security against malicious 3rd parties but what if I’m worried about what the 1st parties are doing? It comes down to the question of who holds the keys to the locks. I know I’d want to be the one deciding whose signing keys I trust to go in the future-TPM-analog of the computer system emulating my brain, and given their track records it’s probably not Google, Apple, Amazon, Facebook I’m sorry Meta - rolls eyes, or Microsoft. (The basic competencies, understanding, and good widely adopted low friction systems needed for individuals to be good stewards of their own private keys is a problem in the locked bootloader space as well as the cryptocurrency space.) It is worth noting that at this point in time it is almost impossible and extremely impractical to get a completely free software computer down to the firmware level.

I think a strong case could be made that a ‘freedom of compute’ should be enshrined in future constitutional settlements on par with freedom of speech as a protection of fundamental freedoms, in service to preserving freedom of thought. FOSS development has been discussed in the EA community as a potentially valuable intervention. Developers seem to be overrepresented in the rationalist community so maybe this is a bit of a touchy subject for any of us working on proprietary code. I’m of the opinion that we as a community should advocate for free software and that there is a certain synergy between the free software movement’s goals and those of the rationality community; I’d be interested to hear contrary opinions.

Well-aligned software has the potential to massively improve our lives both at the individual and societal levels; look at what Taiwan is doing with open software in digital governance. Some of the same behavioural modification tricks currently used to sell us crap we don’t need, and to immiserate us as a side effect so that we can be sold the cure, can be turned to good: helping us to establish good habits, to break bad ones and beat akrasia. To collaborate and communicate more deeply and effectively, instead of more shallowly and ineffectually. To be understood not misunderstood, seen for who we are and not through the funhouse mirror of beautification filters. To build a fun world together, not a depressing and solipsistic one.

Disclosure: I am an associate member of the FSF, and pay them an annual membership fee & the link on ‘beginning to be embedded in our bodies’ is a shamelessly self-promotional link to an episode of my podcast where my co-host and I discuss embedded tech and its implications at length

:::

My First First Author paper!

Richard J. Acton — Wed, 06 Oct 2021 00:00:00 GMT

I published my first first author paper: The genomic loci of specific human tRNA genes exhibit ageing-related DNA hypermethylation Download PDF. Many thanks to my PhD supervisor and main co-author on this Chris Bell for all his help getting this out.

:::

PhD Thesis

Richard J. Acton — Wed, 06 Oct 2021 00:00:00 GMT

Now that my PhD Thesis is available online I thought I would update my blog to point to it in case anyone wants to read it. It’s hosted here on github pages and was written in Rmarkdown and built with the bookdown package.

:::

Xenothesis

Richard J. Acton — Wed, 06 Oct 2021 00:00:00 GMT

I have a podcast with my friend Michael Glinka; it’s called Xenothesis. In it Michael reads Octavia E. Butler’s Xenogenesis trilogy (alternatively known as Lilith’s Brood) for the first time and we discuss the books, including any scientific topics, especially biological ones, that come up. I regard Xenogenesis as essential sci-fi for anyone working in the biological sciences.

We have also done two special episodes, one discussing cyberpunk and one on GATTACA.

:::

Podcast appearance on aging biology

Richard J. Acton — Wed, 06 Oct 2021 00:00:00 GMT

I made an appearance on the Bayesian Conspiracy podcast to talk about the current understanding of aging in biology and share my thoughts on the field.

:::

Cyberpunk Special Episode

Richard J. Acton — Sun, 03 Jan 2021 00:00:00 GMT

Episode 20 of Xenothesis, the podcast I cohost with my friend Michael Glinka is a special episode on the Cyberpunk genre

:::

GATTACA Special Episode

Richard J. Acton — Sun, 03 Jan 2021 00:00:00 GMT

half cell half ringed planet with GATTACA superimposed

Episode 25 of Xenothesis, the podcast I cohost with my friend Michael Glinka is a special episode on the 1997 film GATTACA

:::

Advanced Rmarkdown YAML headers

Richard J. Acton — Wed, 24 Apr 2019 00:00:00 GMT

Rnotebooks - What and Why?

For anyone unfamiliar with Rnotebooks here is a quick overview of why you might want to use them; more experienced users can skip ahead. Rnotebooks are scientific notebooks for R, somewhat like jupyter for anyone coming from python, but baked right into the Rstudio IDE, which offers some benefits over the browser based interface of jupyter. They permit you to organise your code, notes, reasoning and references in one place. Combining Rnotebooks with a version management system such as git gives a robustness similar to paper lab book records when it comes to seeing what you did and when, coupled with the dynamism, portability, share-ability and ease of backup of electronic working. Rnotebooks use a simple flavour of markdown with options to render output to HTML and PDF (via LaTeX) formats. Rnotebooks also have big pluses for reproducibility: creating an Rnotebook that does, explains and references your analysis makes it very easy to give it to another at least somewhat competent R user and have them re-run your analysis, potentially with their own variants. Reproducibility and verifiability are substantial issues in scientific computing, including in my own field of biology. A recent article in PeerJ provides a nice discussion of these issues and a look at what the future of scientific computing notebooks might resemble.

Basic Structure

Raw Rmarkdown looks like this:

---
title: "Example Rnotebook" # a yaml header with document properties and options
---

# Introduction

Markdown formatted text, with __Bold__ and *exciting!* Scientific claims
about $in-line math^2$ at least according to @Smith2007.


```{r}
print("some actual R code in chunks")
```


saved with a .Rmd file extension

Rstudio of course adds nice syntax highlighting, and various bells and whistles.

The actual YAML header stuff

Inline R

You can use inline R in the YAML header of an Rnotebook to produce dynamic content. This takes the general form:

option: "`r <some R code>`"

For example you can include the current date with:

date: "`r Sys.Date()`"

I’m partial to the YYYY-MM-DD format due to its unambiguousness and nice sorting behaviour but you can of course employ format() to render the date in other ways.
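As a small sketch of the format() route mentioned above, the snippet below renders one date in a few ways. It uses a fixed example date so that the results are repeatable, and the variable names are purely illustrative.

```r
# A fixed example date keeps the output repeatable; swap in Sys.Date() for "today"
d <- as.Date("2024-03-16")

iso   <- format(d)              # default YYYY-MM-DD style: "2024-03-16"
uk    <- format(d, "%d/%m/%Y")  # day/month/year: "16/03/2024"
stamp <- format(d, "%Y%m%d")    # compact form, handy in file names: "20240316"

print(c(iso, uk, stamp))
```

In a header the same call would sit inside inline R, for example date: "`r format(Sys.Date(), '%d/%m/%Y')`".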

Params

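The params option allows you to add arguments to your Rnotebook. The params you add to your header are accessible from within the notebook from the immutable params list. Rstudio makes the contents of this list available in interactive sessions, so you can use them whilst working on your code, not just when you build the notebook. Chunk options accept params too: a chunk given the options eval=params$includeThing and include=params$includeThing is only run and shown when that param is TRUE, which is what the example below is intended to illustrate. Note that you can also reference params in other header options (see the bibliography section below).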

---
params:
  includeThing: TRUE # set the default to TRUE
---

::: {.cell}

```{.r .cell-code}
print(
    paste0(
        "Some R that is only evaluated if and included in the notebook if",
        " params$includeThing is true"
    )
)
```

::: {.cell-output .cell-output-stdout}

```
[1] "Some R that is only evaluated if and included in the notebook if params$includeThing is true"
```


:::
:::

Document Format Options

For an HTML output these are a few of my favourite options. There are numerous additional options described in the outputs section of the manual, setting the depth of the table of contents for example.

---
output:
  html_document:
    df_print: paged       # print paged tables - like the default 'html_notebook' format
    fig_caption: yes
    number_sections: yes  # prepend x.y style numbering to your sections
    toc: yes              # Add a table of contents
    toc_float: yes        # have the TOC float at the side of your HTML page so you don't have to keep scrolling to the top
---

For a PDF output pdf_document can be used instead of html_document though my preferred table format for PDF is df_print: kable. More advanced LaTeX customisations can also be used in conjunction with PDF outputs.

Bibliography and Citation YAML options

Placing a bibliography option in your Rnotebook’s header and pointing it to a bibtex file containing your citation information permits you to create citations in Rnotebooks using the following syntax: @Smith2016 for an in-line citation e.g. ‘work by Smith et al. 2016 showed that cheese…’ or [@Smith2016] for a reference like this: ‘assertion (Smith et al. 2016)’, or even lists of citations to be contracted where possible given the citation style e.g. [@Smith2016; @Jones2018], (note the semi-colon list separator) yielding something like this: ‘assertion [1-2]’

I frequently use a header that contains code like this:

---
bibliography: "`r normalizePath(params$bib)`" # inline R: resolved to an absolute path when the notebook is built
params:
  bib: "~/Documents/bibtex/library.bib"
---

The reason I do this is my bibliography has the same path relative to my home directory on my laptop, desktop and computing clusters but the absolute paths differ and these headers seem to prefer absolute paths. Thus, if I compose a notebook on one system it won’t execute on another unless I change the path or use a set-up like this to do so dynamically when building the notebook.

I also frequently set the path to my working directory as a parameter to my Rnotebooks and use relative paths to any files I want to load/write in the body of the Rnotebook, so as to achieve similar portability between the different systems I work on as I get with my bibliography files.

The following chunk sets the working directory for when you ‘knit’ your Rnotebook into the desired format in the first line and for interactive sessions in the second.

::: {.cell}

```{.r .cell-code}
knitr::opts_knit$set(root.dir = normalizePath(params$pwd))
setwd(params$pwd)
```
:::

A note on generating your bibtex file(s). I currently use Mendeley as my reference manager and it has a nice bibtex output option which is automatically updated whenever you sync. (On balance I would probably recommend Zotero to someone starting out afresh with reference management, but its bibtex output is not quite as convenient as Mendeley’s.)

If you have multiple bibliography files this can be done:

bibliography: [multiple.bib, dotbib.bib, files.bib]

Including a csl option allows you to specify a citation style using the .csl format. The specific citations styles of numerous journals in .csl format can be found here. Including the link-citations: yes option will create hyperlinks from the in-text references to the full citations at the end of the document.

By default the bibliography is placed at the very end of your document, so simply placing a # References header at the end of your document helps to separate your bibliography from the body of your text and puts an entry for it in the table of contents. If however you have some appendices to add after your references, placing this HTML snippet in your Rnotebook should set the position at which the references will be rendered:

<div id="refs"></div>

Helpfully this will set the position in both HTML and PDF outputs. (This may not work with older versions of pandoc).

Executing an Rnotebook with params

Whilst you can render your Rnotebook with a one line R command from your terminal, if you have a lot of params it can get unwieldy; you may also want to be able to reproduce your render at a later time or even submit it as a job to a batch computing manager. To do this you can create simple bash scripts like the one below to render your Rnotebook.

#!/bin/bash
R --no-save --no-restore << EOF
rmarkdown::render(
  'notebook.Rmd',
  output_file = 'notebook.html',
  params = list(
      bib = "path/to/some/bib.bib"
  )
)
EOF

The --no-save option prevents R from saving your notebook’s R session, and the --no-restore option prevents your Rnotebook from loading whatever random previous R session files you have lying around in your working directory into its session.
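If the list of params grows, one option is to build it from key=value strings rather than typing it out inside the script. The following is only a sketch under my own assumptions: parse_params is a hypothetical helper (it is not part of rmarkdown), the pwd value is made up, and every value comes through as a character string.

```r
# Hypothetical helper: turn "key=value" strings into a named list for `params`
parse_params <- function(args) {
  parts <- strsplit(args, "=", fixed = TRUE)
  # everything after the first "=" is the value, so values may contain "="
  values <- lapply(parts, function(p) paste(p[-1], collapse = "="))
  names(values) <- vapply(parts, function(p) p[1], character(1))
  values
}

# Stand-in for commandArgs(trailingOnly = TRUE) when run via Rscript
args <- c("bib=path/to/some/bib.bib", "pwd=/tmp/analysis")
render_params <- parse_params(args)
str(render_params)

# With the rmarkdown package installed the list can be passed straight on:
# rmarkdown::render("notebook.Rmd", output_file = "notebook.html",
#                   params = render_params)
```

The render call is left commented out so that the snippet runs without rmarkdown or a notebook file to hand.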

Full Length Example YAML header

---
title: "A thing I'm Working on - Ideally with a more descriptive title"
author: "Richard J. Acton"
date: "2024-03-16"
output: # Specifying multiple outputs appears to favour the first
  pdf_document:
    toc: yes
    fig_caption: yes
    df_print: kable
  html_document:
    fig_caption: yes
    number_sections: yes
    toc: yes
    toc_float: yes
    df_print: paged
  html_notebook: # This determines the RStudio preview format
    fig_caption: yes
    number_sections: yes
    toc: yes
    toc_float: yes
bibliography: "/root/Documents/bibtex/library.bib"
csl: "/root/Documents/bibtex/genomebiology.csl"
link-citations: yes # make citations hyperlinks
linkcolor: blue
---

Resources

Feedback is always welcome, especially if you spot any mistakes.

:::

An intuitive explanation of inferential distance

Richard J. Acton — Sun, 26 Nov 2017 00:00:00 GMT

Let us represent our beliefs as nodes in a network. Most nodes in your network of beliefs have dependencies, they are connected to other beliefs the truth value of which they are contingent on.

Picture a subset of your network of beliefs, let us take a simplified section that looks like a tree, stripping away the complex interconnections to aid in visualization.

Beliefs B and C are predicated on belief A, beliefs D and E on belief B, and beliefs F and G on belief C. Thus belief A is needed for beliefs B and C, belief B for D and E, and C for F and G. (see the figure)

The One: Apples grow on trees - let us call this belief F.

The Other: What are trees? and why are you labeling your beliefs with letters?

The one is trying to convince the other of F, but the other lacks belief C (the belief in trees). The one is unable to successfully graft the branch of their tree containing F onto the other’s tree, as there is no suitable node at which to affix it.

The one must then traverse back down their tree to node C and first convince the other of C. The one made the mistake of thinking the inferential distance between themself and the other was but 1 node when in actuality it was 2. The one must convince the other not merely that apples grow on trees but must also introduce the concept of trees.

The One: Trees are large upright plants with stiff bodies due in part to large amounts of lignin in some of their cell walls, a property not shared by their bendier cousins. There is one over there with apples growing on it. (points to a nearby apple tree)

Assessing the inferential distance between yourself and your interlocutor can often be accomplished by asking questions which prompt an explication of their belief structure. It also helps here if you have exercised good epistemic hygiene and can clearly identify the path you must take in order to converge. The concept of inferential distance is interestingly related to finding the double crux of a disagreement: you must traverse back through your belief network until you find the relevant point of difference in your networks, the genuine source of your disagreement. Indeed inferential distance may be measured by the number of iterations of the double crux game algorithm you have to execute to find the crux. Once the extent of the gulf between you has been assayed you may begin to devise a path to traverse it.

There is a distinction to be drawn between adding a new node to an interlocutor’s belief network when the desired new node is not in obvious tension with other beliefs held by the interlocutor, and the condition where you are attempting to displace an existing node with a different mutually exclusive one. The former is often much easier than the latter.

The Other: There is no such thing as trees.

The One: Let us call that belief C’. What then is yonder plant with apples growing on it? (points to a nearby apple tree)

The Other: That is just an unusually large fern.

The One: sigh

The Other believes C’ which they are aware is mutually exclusive with C.

Playing the double crux game with those not initiated in it (the weak version) can be challenging. Taking a Socratic approach, with genuine expressions of curiosity about what exactly your interlocutor believes, can be effective at causing your interlocutor to notice when they are expressing a belief in belief as opposed to one that pays rent in anticipated experiences, as your curiosity should encourage them to explain to you exactly why they believe what they believe.

Depending on your interlocutor and subject you may wish to begin incrementally to bridge the void, node by node. It is a common error to attempt to plant whole sections of your belief tree in infertile soil and to grow frustrated when it fails to take root. Complex, obscure or controversial issues are frequently best tackled stepwise, especially when in dialog with someone not yet explicitly committed to a rationalist approach or otherwise high in Doxastic Openness.

Some branches of the tree which you may wish to graft onto others will have deeply rooted dependencies which run back to fundamental epistemic differences, and making them take successfully will present a major challenge. Some branches however have shallow dependencies and can easily be transferred even between those with but a single necessary belief in common. This can be strategically important, as people may believe the same thing for different or even invalid reasons, and pointing this out can be detrimental in the short term if you are running a “Mothers Against Drunk Driving” style single issue campaign (Potential Dark Side Applications Warning).

See this post at LessWrong

:::

Cognitive Bias Cards!

Richard J. Acton — Sat, 08 Jul 2017 00:00:00 GMT

Check out the Cognitive Bias Cards Repo for a deck of cards featuring 104 cognitive biases.

Among other things these can be used to play Biased Pandemic: http://lesswrong.com/lw/ar2/biased_pandemic/

:::
